Preprocess scRNAseq data

This workflow is specifically designed to work with the CRUK CI computational infrastructure. It is not intended for general use.

The workflow downloads the single-cell data as fastq files from the Genomics server using Clarity tools, then renames the fastq files to conform to CellRanger's expected input file name format. It then runs CellRanger count on the renamed fastq files to generate the gene expression matrix.

Currently it is only configured to run CellRanger count; the intention is to add functionality for other assays/chemistries as needed.

Provide the SLX id

The SLX ID should be specified in a parameters YAML file or on the command line:

slx_id: The SLX ID of the sequencing run

All of the workflow outputs will be published into a directory named after the SLX ID.
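
As a minimal sketch, a parameters file could be written as below. The file name `params.yaml` and the value `SLX-00000` are placeholders chosen for illustration, and `species` is the parameter described in the next section.

```shell
# Write a minimal parameters file.
# "SLX-00000" is a placeholder, not a real sequencing run.
cat > params.yaml <<'EOF'
slx_id: "SLX-00000"
species: "homo_sapiens"
EOF

# Show what was written.
cat params.yaml
```

A file like this can be passed to Nextflow with its `-params-file params.yaml` option instead of giving each parameter on the command line.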

CellRanger reference

To specify the reference you can just provide the species in the parameters:

species - "mus_musculus" or "homo_sapiens"

This will cause the workflow to download the reference data for the specified species from 10X:

homo_sapiens: https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz
mus_musculus: https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz

The reference data will be published into the directory:

${launchDir}/references
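
For anyone who wants to fetch a reference by hand, the species to URL mapping above can be sketched as a small shell function. This is an illustration only, not the workflow's own implementation: the function name `reference_url` is hypothetical, and the download commands are printed rather than run because the tarballs are large.

```shell
# Map a species name to the 10x reference tarball URL listed above.
# reference_url is a hypothetical helper, not part of the workflow.
reference_url() {
    case "$1" in
        homo_sapiens)
            echo "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz" ;;
        mus_musculus)
            echo "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz" ;;
        *)
            echo "unsupported species: $1" >&2
            return 1 ;;
    esac
}

# Print (but do not run) the commands for a manual download.
species="mus_musculus"
url=$(reference_url "$species")
tarball=$(basename "$url")
echo "mkdir -p references"
echo "curl -L -o references/$tarball $url"
echo "tar -xzf references/$tarball -C references"
```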

Alternatively, if you wish to use an existing reference, specify:

reference_dir - The path to an existing CellRanger reference directory
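
Before pointing `reference_dir` at a directory, it can help to check that it looks like a CellRanger reference. The sketch below assumes the usual layout of the prebuilt 10x references (a `reference.json` file plus `fasta`, `genes` and `star` subdirectories). The function name and the example path are hypothetical, and a custom-built reference may differ.

```shell
# Rough sanity check for a CellRanger reference directory.
# Assumes the usual layout of prebuilt 10x references:
# reference.json plus fasta/, genes/ and star/ subdirectories.
looks_like_cellranger_reference() {
    ref_dir="$1"
    [ -f "$ref_dir/reference.json" ] || return 1
    for sub in fasta genes star; do
        [ -d "$ref_dir/$sub" ] || return 1
    done
    return 0
}

# Hypothetical path, replace with the real reference location.
ref="/path/to/refdata-gex-GRCh38-2024-A"
if looks_like_cellranger_reference "$ref"; then
    echo "$ref looks like a CellRanger reference"
else
    echo "$ref does not look like a CellRanger reference"
fi
```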

CellRanger software

The workflow uses a Singularity container that contains CellRanger version 9.0.1 and an R installation. It is currently located at:

/home/bioinformatics/software/containers/cruk_ci_preprocess_scSeq-8.0.1.sif

The intention is to keep the container up to date as new versions of CellRanger are released.
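
To confirm which CellRanger version a given image provides, the container can be queried directly. This is a sketch that assumes `singularity` is available on the PATH and that `cellranger` is on the PATH inside the container; neither is stated above. The function name `report_cellranger_version` is hypothetical.

```shell
container="/home/bioinformatics/software/containers/cruk_ci_preprocess_scSeq-8.0.1.sif"

# Print the CellRanger version found inside a container image.
# Assumes cellranger is on the PATH inside the container.
report_cellranger_version() {
    image="$1"
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity not found on PATH" >&2
        return 1
    fi
    if [ ! -f "$image" ]; then
        echo "container image not found: $image" >&2
        return 1
    fi
    singularity exec "$image" cellranger --version
}

report_cellranger_version "$container" || echo "could not query the container"
```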

If you wish to use a different version of CellRanger, specify:

cellranger_dir - The path to the CellRanger software directory

Outputs

The workflow will generate the following outputs in the SLX directory:

fastq - The raw fastq files and accompanying files, as downloaded from the Genomics server using Clarity tools
reports/:
- ./<Barcode>.web_summary.html - The CellRanger web summary report for each sample, with the barcode added to the file name
- ./collated_metrics_file.csv - The summary metrics for all samples, collated into a single file
- ./summary_metrics.pdf - Bar plots of read depth and cell count per sample
<Barcode> - One directory of CellRanger count output per sample, named after the sample barcode
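
After a run, the presence of the fixed-name outputs listed above can be checked with a short script. This is a sketch: the function name `check_outputs` and the directory `SLX-00000` are placeholders, and the per-sample `<Barcode>` files are not checked because their names depend on the run.

```shell
# Check that the fixed-name outputs exist under an SLX output directory.
# Per-sample <Barcode> outputs are not checked, as their names vary by run.
check_outputs() {
    slx_dir="$1"
    missing=0
    for path in fastq reports \
                reports/collated_metrics_file.csv \
                reports/summary_metrics.pdf; do
        if [ ! -e "$slx_dir/$path" ]; then
            echo "missing: $slx_dir/$path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# "SLX-00000" is a placeholder directory name.
check_outputs "SLX-00000" || echo "some expected outputs are missing"
```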

Running the pipeline

The pipeline can be run using the following command:

nextflow run crukci-bioinformatics/nf_preprocess_scSeq_CRUKCI \
    --slx_id {slxID} \
    --species {species} \
    -profile epyc
