Download single cell data from Genomics server and process with CellRanger - specific to CRUK CI infrastructure

crukci-bioinformatics/nf_preprocess_scSeq_CRUKCI

Preprocess scRNAseq data

This workflow is specifically designed to work with the CRUK CI computational infrastructure. It is not intended for general use.

The workflow downloads the single cell data as fastq files from the Genomics server using Clarity tools, then renames the fastq files to conform to CellRanger's expected input file name format. It then runs CellRanger count on the fastq files to generate the gene expression matrix.
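As an illustration of the renaming step, CellRanger expects fastq files named `<Sample>_S<n>_L<lane>_<read>_001.fastq.gz`, with a three-digit lane number. The sketch below only builds such a name from its parts; the helper `cellranger_fastq_name` and the sample name are hypothetical, the sample index is fixed at `S1` for simplicity, and the way the workflow itself derives these parts from the downloaded file names is not shown here.

```shell
# Hypothetical helper: build a CellRanger-style fastq file name from its parts.
# Arguments: sample name, lane number (plain decimal, e.g. 2), read type (R1, R2, I1 or I2).
cellranger_fastq_name() {
    sample="$1"
    lane="$2"
    read_type="$3"
    # %03d pads the lane number to three digits, e.g. 2 -> 002.
    printf '%s_S1_L%03d_%s_001.fastq.gz\n' "$sample" "$lane" "$read_type"
}

cellranger_fastq_name SampleA 2 R1
# prints SampleA_S1_L002_R1_001.fastq.gz
```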

Currently it is only configured to run CellRanger count; the intention is to add functionality for other assays/chemistries as needed.

Provide the SLX id

The SLX ID should be specified in a parameters YAML file or on the command line:

  • slx_id: The SLX ID of the sequencing run

All of the workflow outputs will be published into a directory named after the SLX ID.
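A minimal sketch of a parameters file follows. The file name `params.yml` and the SLX ID `SLX-12345` are placeholders rather than values taken from the workflow; `-params-file` is Nextflow's standard option for passing such a file.

```shell
# Write a minimal parameters file (file name and SLX ID are placeholders).
cat > params.yml <<'EOF'
slx_id: "SLX-12345"
EOF

# The file could then be passed to Nextflow, for example:
#   nextflow run crukci-bioinformatics/nf_preprocess_scSeq_CRUKCI -params-file params.yml -profile epyc
```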

CellRanger reference

To specify the reference you can just provide the species in the parameters:

  • species - "mus_musculus" or "homo_sapiens"

This will cause the workflow to download the reference data for the specified species from 10X:

homo_sapiens: https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz
mus_musculus: https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz

The reference data will be published into the directory:

  • ${launchDir}/references
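The species-to-reference mapping above can be sketched as a small shell function. The function name `reference_url` is hypothetical and this is not the workflow's own code; only the two URLs are taken from the list above.

```shell
# Hypothetical helper: map a species name to the 10X reference URL listed above.
reference_url() {
    case "$1" in
        homo_sapiens)
            echo "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz" ;;
        mus_musculus)
            echo "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz" ;;
        *)
            # Any other value is rejected rather than guessed at.
            echo "Unsupported species: $1" >&2
            return 1 ;;
    esac
}

reference_url mus_musculus
```

Rejecting unknown species explicitly means a typo in the parameter fails early instead of starting a run without a reference.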

Alternatively, if you wish to use an existing reference, specify:

  • reference_dir - The path to the CellRanger reference directory

CellRanger software

The workflow has a Singularity container that contains the CellRanger software version 9.0.1 and an R installation. This is currently located at:

  • /home/bioinformatics/software/containers/cruk_ci_preprocess_scSeq-8.0.1.sif

The intention is to keep the container up to date as new versions of CellRanger are released.

If you wish to use a different version of CellRanger, specify:

  • cellranger_dir - The path to the CellRanger software directory

Outputs

The workflow will generate the following outputs in the SLX directory:

  • fastq - the raw fastq files and accompanying files as downloaded from the genomics server using Clarity tools
  • reports/:
    • ./<Barcode>.web_summary.html - The CellRanger web summary reports for each sample, with the barcode added to file name
    • ./collated_metrics_file.csv - The summary metrics for all samples collated into a single file
    • ./summary_metrics.pdf - Bar plots of read depth and cell count per sample
  • <Barcode> - One directory of CellRanger count output for each sample, named according to the sample barcode
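After a run, the layout above can be checked with a short script. This is a sketch under stated assumptions: the function name `check_outputs` is hypothetical, and the SLX directory and barcodes are supplied by the caller.

```shell
# Sketch: report which of the documented outputs are missing from an SLX directory.
# Usage: check_outputs <slx_dir> <barcode>...
check_outputs() {
    slx_dir="$1"
    shift
    missing=0
    # Outputs that exist once per run.
    for path in \
        "$slx_dir/fastq" \
        "$slx_dir/reports/collated_metrics_file.csv" \
        "$slx_dir/reports/summary_metrics.pdf"
    do
        [ -e "$path" ] || { echo "missing: $path" >&2; missing=1; }
    done
    # Outputs that exist once per sample barcode.
    for barcode in "$@"; do
        for path in \
            "$slx_dir/reports/$barcode.web_summary.html" \
            "$slx_dir/$barcode"
        do
            [ -e "$path" ] || { echo "missing: $path" >&2; missing=1; }
        done
    done
    return "$missing"
}
```

The function returns a non-zero status if anything is missing, so it can be used directly in an `if` statement or a job script.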

Running the pipeline

The pipeline can be run using the following command:

nextflow run crukci-bioinformatics/nf_preprocess_scSeq_CRUKCI \
    --slx_id {slxID} \
    --species {species} \
    -profile epyc
