Commit 6119a2e (0 parents)
Showing 9 changed files with 59,752 additions and 0 deletions.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# CLIER | ||
|
||
This repository contains code and test data for the protocol **"Protocol for interpretable and context-specific single-cell informed deconvolution of bulk RNA-seq data"**, currently under review for STAR Protocols. | ||
|
||
## Detailed Description of Files | ||
|
||
### Code | ||
- **protocol_code.R**: Contains the lines of code included in the manuscript (excluding the processing from FASTQ to TPM). | ||
- **aux_functions.R**: Contains all the R functions necessary for executing the protocol. | ||
- **align_fastq.sh**: Automates the process of downloading FASTQ files and aligning paired-end RNA-Seq data using the STAR aligner. | ||
|
||
### Data | ||
- **kidney_atlas_matrix.rds**: Contains the single-cell signatures atlas built in "A transfer learning framework to elucidate the clinical relevance of altered proximal tubule cell states in kidney disease" (Legouis et al., 2024). | ||
- **kidney_atlas_info.xlsx**: Contains descriptions of the signatures included in the single-cell signatures atlas built in Legouis et al., 2024. | ||
- **DKD_tpm.rds**: Contains a processed version (TPM) of the dataset GSE142025, also used in Legouis et al., 2024. | ||
- **DKD_clin.rds**: Contains clinical information (fibrosis) regarding the dataset GSE142025. | ||
- **genelength.txt**: Contains gene lengths (to be used in data processing). | ||
|
||
## On the Execution | ||
|
||
The code in **protocol_code.R** can be fully executed using the test data provided in this repository. Users who want to skip the training phase (which takes approximately 9 hours) and test a pre-trained model can find the KCLIER model [here](https://drive.switch.ch/index.php/s/OpvMh1vGRgRmKKf), together with other intermediate files produced during the execution. We share these files separately because they are too large to host on GitHub. |
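
Before launching the protocol, it can help to confirm that the files listed above are actually present. The following is a minimal sketch, not part of the repository: the function name `check_protocol_files` is hypothetical, and it assumes all files sit directly in one directory, which the README does not state.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of the repository).
# ASSUMPTION: all files listed in the README live directly in one directory.
check_protocol_files() {
    local repo_dir="${1:-.}"
    local missing=0
    local f
    for f in protocol_code.R aux_functions.R kidney_atlas_matrix.rds \
             kidney_atlas_info.xlsx DKD_tpm.rds DKD_clin.rds genelength.txt; do
        if [ ! -f "$repo_dir/$f" ]; then
            # Report each missing file on stderr and remember the failure
            echo "Missing: $f" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything was found, 1 otherwise
    return "$missing"
}
```

A possible use is `check_protocol_files /path/to/CLIER && Rscript protocol_code.R`, so that the R script only starts when every input was found.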
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
#!/bin/bash | ||
|
||
# Define the array of URLs | ||
URL_LIST=( | ||
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/031/SRR10691631/SRR10691631_1.fastq.gz" | ||
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/031/SRR10691631/SRR10691631_2.fastq.gz" | ||
) | ||
|
||
# Define the fastq files, output and STAR genome directories | ||
FASTQ_DIR=/fastq_folder | ||
OUTPUT_DIR=/output_folder | ||
STAR_GENOME=/yourstargenomefolder | ||
|
||
# Create the directories if they don't exist | ||
mkdir -p "$FASTQ_DIR" "$OUTPUT_DIR" | ||
|
||
for URL in "${URL_LIST[@]}"; do | ||
# Extract filename from URL | ||
FILE_NAME=$(basename "$URL") | ||
|
||
# Download the fastq file | ||
curl -L "$URL" -o "$FASTQ_DIR/$FILE_NAME" | ||
|
||
# Check if the download was successful | ||
|
||
if [ $? -eq 0 ]; then | ||
echo "Download of $FILE_NAME completed successfully." | ||
else | ||
echo "Error in downloading $FILE_NAME." | ||
fi | ||
done | ||
|
||
for R1 in "${FASTQ_DIR}"/*_1.fastq.gz | ||
do | ||
# Skip the literal pattern when no R1 files are present | ||
[ -e "$R1" ] || continue | ||
# Derive R2 file by replacing the "_1.fastq.gz" suffix with "_2.fastq.gz" | ||
R2="${R1%_1.fastq.gz}_2.fastq.gz" | ||
# Extract the sample name (e.g., SRR10691631 from SRR10691631_1.fastq.gz) | ||
sample_name=$(basename "${R1}" _1.fastq.gz) | ||
# Define the output prefix | ||
output_prefix="${OUTPUT_DIR}/${sample_name}." | ||
# Run STAR alignment (threads taken from $ncpus if set, otherwise 1) | ||
STAR --runThreadN "${ncpus:-1}" \ | ||
--genomeDir "${STAR_GENOME}" \ | ||
--outFileNamePrefix "${output_prefix}" \ | ||
--readFilesIn "${R1}" "${R2}" \ | ||
--readFilesCommand zcat \ | ||
--quantMode GeneCounts \ | ||
--twopassMode Basic | ||
done |
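
With `--quantMode GeneCounts` and the prefix used above, STAR writes one `<sample_name>.ReadsPerGene.out.tab` file per sample. Its first four rows are summary counters (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`), and column 2 holds the unstranded counts. The commit does not show how the authors go from these counts to TPM, so the following is only a sketch of the standard TPM formula. The function name `counts_to_tpm` is hypothetical, and the two-column, header-less layout assumed for the gene-length file may not match the real **genelength.txt**.

```shell
#!/bin/bash
# Sketch: convert a STAR ReadsPerGene.out.tab file to TPM (not the authors' code).
# ASSUMPTION: the length file is tab-separated "gene_id<TAB>length_in_bp", no header.
# ASSUMPTION: unstranded counts (column 2) are the right choice for the library.
counts_to_tpm() {
    local counts_file="$1"   # e.g. /output_folder/SRR10691631.ReadsPerGene.out.tab
    local length_file="$2"
    awk -F'\t' '
        # First file: store gene lengths
        NR == FNR { len[$1] = $2; next }
        # Second file: skip the four N_* summary rows
        $1 ~ /^N_/ { next }
        # Keep genes with a known, positive length
        ($1 in len) && len[$1] > 0 {
            rate[$1] = $2 / (len[$1] / 1000)   # reads per kilobase
            total += rate[$1]
            order[++n] = $1                    # preserve input order
        }
        END {
            # Scale each rate so that all values sum to one million
            for (i = 1; i <= n; i++) {
                g = order[i]
                printf "%s\t%.4f\n", g, (total > 0 ? rate[g] / total * 1e6 : 0)
            }
        }
    ' "$length_file" "$counts_file"
}
```

Genes absent from the length file are dropped silently in this sketch; a stricter version could report them instead.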