This repository reproduces the results reported in version 2 of the arXiv preprint of the following paper:
Z. Niu, J. Ray Choudhury, E. Katsevich. “Computationally efficient and statistically accurate conditional independence testing with spaCRT.” (arXiv)
First, clone the spacrt-manuscript repository onto your machine.
git clone git@github.com:Katsevich-Lab/spacrt-manuscript.git
You can either rerun the simulations and the real data analysis to generate the figures yourself, or download our precomputed results from Dropbox and use our plotting code to reproduce the figures. We describe these two routes separately.
The results are stored in .rds format. Download the simulation results and the real data results from the Dropbox simulation results repository and the Dropbox real data results repository, respectively. The commands below reproduce the plots for the real data analysis and for the simulations, respectively.
Before running the command below, change data_dir in realdata-code/plotting-code.R to the directory containing the downloaded real data results, and set max_cutoff to 100.
Rscript realdata-code/plotting-code.R $max_cutoff
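If you prefer not to retype the argument, the command can be wrapped in a small shell function. This is a sketch, not part of the repository: plot_realdata and the DRY_RUN switch are hypothetical names introduced here, and the check that max_cutoff is a positive integer is our assumption about what the script accepts. The function defaults max_cutoff to 100, the value recommended above, and must be called from the repository root after data_dir has been edited.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the real-data plotting command (not in the repo).
# Assumes: called from the spacrt-manuscript root, data_dir already edited,
# and max_cutoff is a positive integer (an assumption, not documented).
plot_realdata() {
  local max_cutoff="${1:-100}"   # default to the recommended value of 100
  # Reject anything that is not a positive integer before calling R.
  if [[ ! "$max_cutoff" =~ ^[1-9][0-9]*$ ]]; then
    echo "max_cutoff must be a positive integer, got: $max_cutoff" >&2
    return 2
  fi
  local cmd=(Rscript realdata-code/plotting-code.R "$max_cutoff")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 plot_realdata prints the command that would be run, and plot_realdata (or plot_realdata 100) runs it.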
The following commands reproduce the plots for the simulation results. Before running them, set the path_rds variable in each of these R scripts to the path of the downloaded simulation results.
Rscript -e 'source("simulation-code/plotting-code/assemble-plots-NB-disp-5e-2.R")'
Rscript -e 'source("simulation-code/plotting-code/assemble-plots-NB-disp-1.R")'
Rscript -e 'source("simulation-code/plotting-code/assemble-plots-NB-disp-10.R")'
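The three simulation plotting scripts differ only in the NB-disp value in their file names (5e-2, 1, and 10), so they can also be run in a single loop. This is a sketch that assumes the file names shown above and that path_rds has already been set in each script; run_simulation_plots and DRY_RUN are hypothetical names introduced here, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the three simulation plotting scripts in turn.
# Assumes the current directory is the spacrt-manuscript root and that
# path_rds has been edited in each script. DRY_RUN=1 only prints commands.
run_simulation_plots() {
  local disp script
  for disp in 5e-2 1 10; do
    script="simulation-code/plotting-code/assemble-plots-NB-disp-${disp}.R"
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "Rscript -e 'source(\"${script}\")'"
    else
      # Stop at the first script that fails.
      Rscript -e "source(\"${script}\")" || return 1
    fi
  done
}
```

Running DRY_RUN=1 run_simulation_plots prints the same three commands listed above; run_simulation_plots executes them.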
If you would like to rerun the simulations and the real data analysis from scratch, do not download the results; instead, follow the steps in the next section.
First, install the spacrt package from the Katsevich-Lab GitHub organization using the following R code.
library(devtools)
install_github("katsevich-lab/spacrt")
We use a config file to make our code portable across machines. Create a config file called .research_config in your home directory.
cd
touch ~/.research_config
Define the following variable in this file: LOCAL_SPACRT_DATA_DIR, the directory in which to store results.
The contents of the .research_config file should look something like the following.
LOCAL_INTERNAL_DATA_DIR="/Users/ziangniu/Documents/Projects/HPCC/data/projects/"
LOCAL_SPACRT_DATA_DIR=$LOCAL_INTERNAL_DATA_DIR"spacrt/"
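The file can also be set up from the shell. The sketch below appends the two variables to ~/.research_config only if they are not already defined, so any existing entries are kept, and then sources the file to check that LOCAL_SPACRT_DATA_DIR resolves. It assumes the file uses shell assignment syntax, as the example above suggests; the base directory $HOME/research-data/ and the helper add_if_missing are hypothetical and should be adapted to your machine.

```shell
#!/usr/bin/env bash
# Sketch: add the spacrt variables to ~/.research_config (keeping any existing
# entries), then source the file and create the results directory.
# Assumes shell assignment syntax; base_dir is a hypothetical placeholder.
config_file="$HOME/.research_config"
base_dir="$HOME/research-data/"

touch "$config_file"

# Append "NAME=..." to the config file unless NAME is already defined there.
add_if_missing() {
  local name="$1" line="$2"
  grep -q "^${name}=" "$config_file" || printf '%s\n' "$line" >> "$config_file"
}

add_if_missing LOCAL_INTERNAL_DATA_DIR "LOCAL_INTERNAL_DATA_DIR=\"${base_dir}\""
# Single quotes keep $LOCAL_INTERNAL_DATA_DIR literal, so it expands on sourcing.
add_if_missing LOCAL_SPACRT_DATA_DIR 'LOCAL_SPACRT_DATA_DIR=$LOCAL_INTERNAL_DATA_DIR"spacrt/"'

# Load the variables into the current shell and check where results will go.
source "$config_file"
mkdir -p "$LOCAL_SPACRT_DATA_DIR"
echo "Results will be stored in: $LOCAL_SPACRT_DATA_DIR"
```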
Navigate to the spacrt-manuscript directory; all scripts below must be executed from this directory. The commands below reproduce the results and create the figures automatically.
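Before submitting jobs, it can help to confirm that the environment is set up. The function below is a hypothetical pre-submission check, not part of the repository: it verifies that run_all_simulation.sh and run_all_realdata.sh are in the current directory, that ~/.research_config defines LOCAL_SPACRT_DATA_DIR, and that qsub is on the PATH, and it reports every problem it finds rather than stopping at the first.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submission check (not part of the repository).
# Returns nonzero, with a message per problem, if the shell is not in the
# repository root, the config variable is missing, or qsub is unavailable.
preflight() {
  local status=0 f
  for f in run_all_simulation.sh run_all_realdata.sh; do
    if [[ ! -f "$f" ]]; then
      echo "missing ./$f: cd into spacrt-manuscript first" >&2
      status=1
    fi
  done
  if ! grep -q '^LOCAL_SPACRT_DATA_DIR=' "$HOME/.research_config" 2>/dev/null; then
    echo "LOCAL_SPACRT_DATA_DIR is not set in ~/.research_config" >&2
    status=1
  fi
  if ! command -v qsub >/dev/null 2>&1; then
    echo "qsub not found on PATH" >&2
    status=1
  fi
  return "$status"
}
```

For example: preflight && qsub run_all_simulation.sh.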
Depending on the resource limits of your cluster, you may need to change the max_gb and max_hours parameters used by the commands below; run_all_simulation.sh sets them to 16 and 4, respectively.
qsub run_all_simulation.sh
To reproduce the real data analysis results, run the following command.
qsub run_all_realdata.sh
Table 3 in the paper can be created using realdata-code/sparsity_dataset.R.