Reproducing the analyses reported in ‘Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection’
Timothy Barry, Kaishu Mason, Kathryn Roeder, Eugene Katsevich
This repository contains code to reproduce the analyses reported in the paper “Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection” (Genome Biology, 2024).
Ensure that your system has the required software dependencies.
- GCC version 11.1.0 or higher
- R version 4.1.2 or higher
- Python version 3.10.4 or higher
- Java version 8 or higher
- Nextflow version 21.10.6 or higher
First, clone the sceptre2-manuscript
repository onto your machine.
git clone [email protected]:Katsevich-Lab/sceptre2-manuscript.git
Next, manually download the processed datasets from Dropbox. The data
are stored in ondisc
(version 1.1.0) format. Download the following
three directories from the Dropbox data
repository:
frangieh-2021
, papalexi-2021
, schraivogel-2020
. The code
originally used to download and process these datasets (alongside
documentation) is available in the following repositories:
Frangieh,
Papalexi,
Schraivogel.
We used a config file to increase the portability of our code across
machines. Create a config file called .research_config
in your home
directory.
cd
touch ~/.research_config
Define the following variables within this file:
-
LOCAL_PAPALEXI_2021_DATA_DIR
: the location of thepapalexi-2021
data directory -
LOCAL_SCHRAIVOGEL_2020_DATA_DIR
: the location of theschraivogel-2020
data directory -
LOCAL_FRANGIEH_2021_DATA_DIR
: the location of thefrangieh-2021
data directory -
LOCAL_CODE_DIR
: the location of the directory in which the clonedsceptre2-manuscript
repository is located -
LOCAL_SCEPTRE2_DATA_DIR
: the location of the directory in which to store files (e.g., auxiliary data, results) associated with the paper. -
KATS_SOFT_DIR
: the location of the directory in which to create the Python virtual environment
The contents of the .research_config
file should look like something
along the following lines.
LOCAL_PAPALEXI_2021_DATA_DIR=/Users/timbarry/research_offsite/external/papalexi-2021/
LOCAL_SCHRAIVOGEL_2020_DATA_DIR=/Users/timbarry/research_offsite/external/schraivogel-2020/
LOCAL_FRANGIEH_2021_DATA_DIR=/Users/timbarry/research_offsite/external/frangieh-2021/
LOCAL_CODE_DIR=/Users/timbarry/research_code/
LOCAL_SCEPTRE2_DATA_DIR=/Users/timbarry/research_offsite/projects/sceptre2/
KATS_SOFT_DIR=/Users/timbarry/
Next, create an .Rprofile
file in your home directory (if you have not
yet done so).
cd
touch .Rprofile
Add the following commands to your .Rprofile
.
suppressMessages(library(conflicted))
.get_config_path <- function(dir_name) {
cmd <- paste0("source ~/.research_config; echo $", dir_name)
system(command = cmd, intern = TRUE)
}
Sys.setenv(RETICULATE_PYTHON = paste0(.get_config_path("KATS_SOFT_DIR"), "py/lowmoi-venv/bin/python"))
utils::setRepositories(ind = 1:4)
Navigate to the bash_scripts
subdirectory of the sceptre2-manuscript
directory. Execute the following command to create a Python virtual
environment called py/lowmoi-venv
and download the necessary Python
packages. This script assumes that the command python3
can be called
from the command line.
cd bash_scripts
bash install_python_packages.sh
If you are having trouble initializing the Python virtual environment,
consult the following
documentation.
Next, install the required R packages. Note that this script installs
version 1.1.0
of ondisc
and version 0.3.0
of sceptre
.
Rscript ../R_scripts/install_R_packages.R
Finally, complete the rest of the setup, which consists of initializing the directory structure of the SCEPTRE2 data directory, initializing the symbolic links, creating the synthetic data (both negative and positive control), performing quality control on the data, and computing sample sizes. The following code takes about 10 minutes to execute.
# 4. set up the offsite directory structure
bash setup.sh
# 5. create the symbolic links within the offsite directory structure
bash create_sym_links.sh
# 6. create the synthetic data
Rscript ../R_scripts/create_synthetic_data.R
# 7. run the low MOI uniform processing step (which includes cell and feature qc)
Rscript ../R_scripts/uniform_processing_lowmoi.R
# 8. compute dataset sample sizes
Rscript ../R_scripts/compute_dataset_sample_sizes.R
We recommend downloading the results from the Dropbox results repository, so that you can reproduce the figures without having to rerun the analyses. This can be executed via the following commands.
source ~/.research_config
cd $LOCAL_SCEPTRE2_DATA_DIR"/results"
wget --max-redirect=20 -O download.zip https://www.dropbox.com/sh/yuubaoro3k75c61/AACPi9LDjY0B1pxwMxUMSoTca?dl=1
unzip -o download.zip
All results will be placed into the correct locations.
The next step is to run a sequence of Nextflow pipelines to carry out
the main analyses of the paper. These pipelines should be run
one-by-one. Each of these pipelines takes as an argument a parameter
file contained within the directory /param_files
. The parameter file
passed as an argument to each pipeline is indicated as a comment.
# 9. Undercover analysis
# Figure 1: c-f
# Figure 2: c, e, f
# Figure 4: a-c
# Figure S1-S5
# parameter file: undercover/params_undercover_0523.groovy
qsub ../pipeline_launch_scripts/undercover/grp_size_1_0523/grp_size_1.sh
# 10. Positive control analysis
# Figure 4: d
qsub ../pipeline_launch_scripts/positive_control/pc_0523/pc_analysis.sh
# 11. MW resampling statistics analysis
# Figure 2: b
qsub ../pipeline_launch_scripts/resampling_distributions/seurat_at_scale/seurat_at_scale.sh
# 12. Unfiltered SCEPTRE positive control analysis
# Figure S9
qsub ../pipeline_launch_scripts/positive_control/sceptre_unfiltered_pc_0523/pc_analysis.sh
# 13. Discovery analysis
# Figure S7
qsub ../pipeline_launch_scripts/discovery_analysis/
We carried out our analyses on an SGE cluster; thus, we submitted the
Nextflow jobs to the scheduler via qsub
. If you instead are running
the analyses on a SLURM cluster, for example, then you should submit the
jobs via sbatch
. If you are running the analyses locally, then you
should submit the jobs via bash
, etc.
Once the jobs have completed, process the results.
Rscript ../R_scripts/process_results.R
The next step is to run several auxiliary analyses. These analyses are smaller and thus do not necessitate Nextflow pipelines.
# 14. run the confounding analyses for figure 2 and figure s6
Rscript ../R_scripts/fig_s6_analyses.R
# 15. compute the dataset statistical details for table s2
Rscript ../R_scripts/get_dataset_statistical_details.R
# 16. run the analysis for supplementary figure s7 (CAMP)
Rscript ../R_scripts/camp_simulation.R
# 17. run the analysis for supplementary figure s11 (score vs. resid simulation)
# note that this step is carried out on a SGE cluster
bash run_simulation_study.sh
# 18. run the analysis for supplementary table 4 (spectral vs qr score computation)
Rscript ../R_scripts/score_test_benchmark.R
# 19. run the analyses for figure 5
Rscript ../R_scripts/save_datasets_as_r_objects.R
bash run_discovery_analyses_fig_5.sh
# 20. run the analyses for figure s12
# first, install 'resid_statistic' branch of the sceptre package
Rscript -e "devtools::install_github('katsevich-lab/sceptre@resid_statistic')"
# next, run the analyses
bash run_discovery_analyses_fig_s12.sh
# redownload sceptre 0.3.0 for good measure
Rscript -e "devtools::install_github('katsevich-lab/sceptre', ref = 'v0.3.0')"
# 21. ChIP-seq
Rscript download_additional_data.R
Rscript get_TF_targets_papalexi_chipseq.R
The final step is to create the figures. To do so issue the following commands.
Rscript ../R_scripts/figure_creation/fig_1/fig_1.R
Rscript ../R_scripts/figure_creation/fig_2/fig_2.R
Rscript ../R_scripts/figure_creation/fig_3/fig_3.R
Rscript ../R_scripts/figure_creation/fig_4/fig_4.R
Rscript ../R_scripts/figure_creation/fig_5/fig_5.R
Rscript ../R_scripts/figure_creation/fig_s1_s3/fig_s1_s3.R
Rscript ../R_scripts/figure_creation/fig_s4/fig_s4_s5.R
Rscript ../R_scripts/figure_creation/fig_s6/fig_s6.R
Rscript ../R_scripts/figure_creation/fig_s7/fig_s7.R
Rscript ../R_scripts/figure_creation/fig_s8/fig_s8.R
Rscript ../R_scripts/figure_creation/fig_s9/fig_s9.R
Rscript ../R_scripts/figure_creation/fig_s10/fig_s10.R
Rscript ../R_scripts/figure_creation/fig_s11/fig_s11.R
Rscript ../R_scripts/figure_creation/fig_s12/fig_s12.R
Rscript ../R_scripts/figure_creation/fig_s13/fig_s13.R
This codebase is large and complex. Contact the authors if you seek help with reproducing an aspect of the analysis.
-
Timothy (Tim) Barry: [email protected]
-
Eugene (Gene) Katsevich: [email protected]