Reproducing the analyses reported in ‘Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection’

Timothy Barry, Kaishu Mason, Kathryn Roeder, Eugene Katsevich

This repository contains code to reproduce the analyses reported in the paper “Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection” (Genome Biology, 2024).

Dependencies

Ensure that your system has the required software dependencies.

GCC version 11.1.0 or higher
R version 4.1.2 or higher
Python version 3.10.4 or higher
Java version 8 or higher
Nextflow version 21.10.6 or higher

Get started

First, clone the sceptre2-manuscript repository onto your machine.

git clone [email protected]:Katsevich-Lab/sceptre2-manuscript.git

Next, manually download the processed datasets from Dropbox. The data are stored in ondisc (version 1.1.0) format. Download the following three directories from the Dropbox data repository: frangieh-2021, papalexi-2021, schraivogel-2020. The code originally used to download and process these datasets (alongside documentation) is available in the following repositories: Frangieh, Papalexi, Schraivogel.

We used a config file to increase the portability of our code across machines. Create a config file called .research_config in your home directory.

cd
touch ~/.research_config

Define the following variables within this file:

LOCAL_PAPALEXI_2021_DATA_DIR: the location of the papalexi-2021 data directory
LOCAL_SCHRAIVOGEL_2020_DATA_DIR: the location of the schraivogel-2020 data directory
LOCAL_FRANGIEH_2021_DATA_DIR: the location of the frangieh-2021 data directory
LOCAL_CODE_DIR: the location of the directory in which the cloned sceptre2-manuscript repository is located
LOCAL_SCEPTRE2_DATA_DIR: the location of the directory in which to store files (e.g., auxiliary data, results) associated with the paper.
KATS_SOFT_DIR: the location of the directory in which to create the Python virtual environment

The contents of the .research_config file should look like something along the following lines.

LOCAL_PAPALEXI_2021_DATA_DIR=/Users/timbarry/research_offsite/external/papalexi-2021/
LOCAL_SCHRAIVOGEL_2020_DATA_DIR=/Users/timbarry/research_offsite/external/schraivogel-2020/
LOCAL_FRANGIEH_2021_DATA_DIR=/Users/timbarry/research_offsite/external/frangieh-2021/
LOCAL_CODE_DIR=/Users/timbarry/research_code/
LOCAL_SCEPTRE2_DATA_DIR=/Users/timbarry/research_offsite/projects/sceptre2/
KATS_SOFT_DIR=/Users/timbarry/

Next, create an .Rprofile file in your home directory (if you have not yet done so).

cd
touch .Rprofile

Add the following commands to your .Rprofile.

suppressMessages(library(conflicted))
.get_config_path <- function(dir_name) {
  cmd <- paste0("source ~/.research_config; echo $", dir_name)
  system(command = cmd, intern = TRUE)
}
Sys.setenv(RETICULATE_PYTHON = paste0(.get_config_path("KATS_SOFT_DIR"), "py/lowmoi-venv/bin/python"))
utils::setRepositories(ind = 1:4)

Set up the analyses

Navigate to the bash_scripts subdirectory of the sceptre2-manuscript directory. Execute the following command to create a Python virtual environment called py/lowmoi-venv and download the necessary Python packages. This script assumes that the command python3 can be called from the command line.

cd bash_scripts
bash install_python_packages.sh

If you are having trouble initializing the Python virtual environment, consult the following documentation. Next, install the required R packages. Note that this script installs version 1.1.0 of ondisc and version 0.3.0 of sceptre.

Rscript ../R_scripts/install_R_packages.R

Finally, complete the rest of the setup, which consists of initializing the directory structure of the SCEPTRE2 data directory, initializing the symbolic links, creating the synthetic data (both negative and positive control), performing quality control on the data, and computing sample sizes. The following code takes about 10 minutes to execute.

# 4. set up the offsite directory structure
bash setup.sh

# 5. create the symbolic links within the offsite directory structure
bash create_sym_links.sh

# 6. create the synthetic data
Rscript ../R_scripts/create_synthetic_data.R

# 7. run the low MOI uniform processing step (which includes cell and feature qc)
Rscript ../R_scripts/uniform_processing_lowmoi.R

# 8. compute dataset sample sizes
Rscript ../R_scripts/compute_dataset_sample_sizes.R

We recommend downloading the results from the Dropbox results repository, so that you can reproduce the figures without having to rerun the analyses. This can be executed via the following commands.

source ~/.research_config
cd $LOCAL_SCEPTRE2_DATA_DIR"/results"
 wget --max-redirect=20 -O download.zip https://www.dropbox.com/sh/yuubaoro3k75c61/AACPi9LDjY0B1pxwMxUMSoTca?dl=1
 unzip -o download.zip

All results will be placed into the correct locations.

Run the Nextflow pipelines

The next step is to run a sequence of Nextflow pipelines to carry out the main analyses of the paper. These pipelines should be run one-by-one. Each of these pipelines takes as an argument a parameter file contained within the directory /param_files. The parameter file passed as an argument to each pipeline is indicated as a comment.

# 9. Undercover analysis
# Figure 1: c-f
# Figure 2: c, e, f
# Figure 4: a-c
# Figure S1-S5
# parameter file: undercover/params_undercover_0523.groovy
qsub ../pipeline_launch_scripts/undercover/grp_size_1_0523/grp_size_1.sh

# 10. Positive control analysis
# Figure 4: d
qsub ../pipeline_launch_scripts/positive_control/pc_0523/pc_analysis.sh

# 11. MW resampling statistics analysis
# Figure 2: b
qsub ../pipeline_launch_scripts/resampling_distributions/seurat_at_scale/seurat_at_scale.sh

# 12. Unfiltered SCEPTRE positive control analysis
# Figure S9
qsub ../pipeline_launch_scripts/positive_control/sceptre_unfiltered_pc_0523/pc_analysis.sh

# 13. Discovery analysis
# Figure S7
qsub ../pipeline_launch_scripts/discovery_analysis/

We carried out our analyses on an SGE cluster; thus, we submitted the Nextflow jobs to the scheduler via qsub. If you instead are running the analyses on a SLURM cluster, for example, then you should submit the jobs via sbatch. If you are running the analyses locally, then you should submit the jobs via bash, etc.

Once the jobs have completed, process the results.

Rscript ../R_scripts/process_results.R

Run the auxiliary analyses

The next step is to run several auxiliary analyses. These analyses are smaller and thus do not necessitate Nextflow pipelines.

# 14. run the confounding analyses for figure 2 and figure s6
Rscript ../R_scripts/fig_s6_analyses.R

# 15. compute the dataset statistical details for table s2
Rscript ../R_scripts/get_dataset_statistical_details.R

# 16. run the analysis for supplementary figure s7 (CAMP)
Rscript ../R_scripts/camp_simulation.R

# 17. run the analysis for supplementary figure s11 (score vs. resid simulation)
# note that this step is carried out on a SGE cluster
bash run_simulation_study.sh

# 18. run the analysis for supplementary table 4 (spectral vs qr score computation)
Rscript ../R_scripts/score_test_benchmark.R

# 19. run the analyses for figure 5
Rscript ../R_scripts/save_datasets_as_r_objects.R
bash run_discovery_analyses_fig_5.sh

# 20. run the analyses for figure s12
# first, install 'resid_statistic' branch of the sceptre package
Rscript -e "devtools::install_github('katsevich-lab/sceptre@resid_statistic')"
# next, run the analyses
bash run_discovery_analyses_fig_s12.sh

# redownload sceptre 0.3.0 for good measure
Rscript -e "devtools::install_github('katsevich-lab/sceptre', ref = 'v0.3.0')"

# 21. ChIP-seq
Rscript download_additional_data.R
Rscript get_TF_targets_papalexi_chipseq.R

Create the figures

The final step is to create the figures. To do so issue the following commands.

Rscript ../R_scripts/figure_creation/fig_1/fig_1.R
Rscript ../R_scripts/figure_creation/fig_2/fig_2.R
Rscript ../R_scripts/figure_creation/fig_3/fig_3.R
Rscript ../R_scripts/figure_creation/fig_4/fig_4.R
Rscript ../R_scripts/figure_creation/fig_5/fig_5.R
Rscript ../R_scripts/figure_creation/fig_s1_s3/fig_s1_s3.R
Rscript ../R_scripts/figure_creation/fig_s4/fig_s4_s5.R
Rscript ../R_scripts/figure_creation/fig_s6/fig_s6.R
Rscript ../R_scripts/figure_creation/fig_s7/fig_s7.R
Rscript ../R_scripts/figure_creation/fig_s8/fig_s8.R
Rscript ../R_scripts/figure_creation/fig_s9/fig_s9.R
Rscript ../R_scripts/figure_creation/fig_s10/fig_s10.R
Rscript ../R_scripts/figure_creation/fig_s11/fig_s11.R
Rscript ../R_scripts/figure_creation/fig_s12/fig_s12.R
Rscript ../R_scripts/figure_creation/fig_s13/fig_s13.R

This codebase is large and complex. Contact the authors if you seek help with reproducing an aspect of the analysis.

Timothy (Tim) Barry: [email protected]
Eugene (Gene) Katsevich: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 809 Commits
R_scripts		R_scripts
bash_scripts		bash_scripts
docs		docs
param_files		param_files
pipeline_launch_scripts		pipeline_launch_scripts
revision		revision
.RData		.RData
.Rhistory		.Rhistory
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reproducing the analyses reported in ‘Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection’

Dependencies

Get started

Set up the analyses

Run the Nextflow pipelines

Run the auxiliary analyses

Create the figures

About

Releases

Packages

Contributors 3

Languages

License

Katsevich-Lab/sceptre2-manuscript

Folders and files

Latest commit

History

Repository files navigation

Reproducing the analyses reported in ‘Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection’

Dependencies

Get started

Set up the analyses

Run the Nextflow pipelines

Run the auxiliary analyses

Create the figures

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages