Commit: Update documentation based on PR feedback
fgypas committed Dec 22, 2024
1 parent 3684cd1 commit d7af25e
Showing 7 changed files with 31 additions and 44 deletions.
8 changes: 3 additions & 5 deletions README.md
@@ -12,9 +12,7 @@

_RNA-seq analysis doesn't get simpler than that!_

ZARP relies on publicly available bioinformatics tools and currently handles single or paired-end stranded bulk RNA-seq data. The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community.

ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualitations will give you meaningful initial insights in the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _ZARP 'em_!
The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community. ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualizations will give you meaningful initial insights in the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _zarp 'em_!

<div align="center">
<img width="60%" src=images/zarp_schema.png>
@@ -37,7 +35,7 @@ Quick installation requires the following:
- Git
- [Conda][conda] >= 22.11.1
- [Mamba][mamba] >=1.3.0 <2
- [Singularity][singularity] >=3.5.2 (Required only if you want to use Singulaarity for the dependencies)
- [Singularity][singularity] >=3.5.2 (Required only if you want to use Singularity for the dependencies)

```bash
git clone https://github.com/zavolanlab/zarp.git
```

@@ -96,7 +94,7 @@ your run.
**OR**
Runner script for _Slurm cluster exection_ (note that you may need
Runner script for _Slurm cluster execution_ (note that you may need
to modify the arguments to `--jobs` and `--cores` in the file:
`profiles/slurm-singularity/config.yaml` depending on your HPC
and workload manager configuration):
22 changes: 4 additions & 18 deletions docs/README.md
@@ -6,33 +6,19 @@

**Welcome to the _ZARP_ documentation pages!**

_ZARP_ is a generic RNA-Seq analysis workflow that allows users to process and analyze
Illumina short-read sequencing libraries with minimum effort. Better yet: With our
companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line
interface, you can start ZARP runs with the simplest and most intuitive commands.
_ZARP_ is a generic RNA-Seq analysis workflow that allows users to process and analyze Illumina short-read sequencing libraries with minimum effort. Better yet: With our companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line interface, you can start ZARP runs with the simplest and most intuitive commands.

_RNA-seq analysis doesn't get simpler than that!_

ZARP relies on publicly available bioinformatics tools and currently handles
single or paired-end stranded bulk RNA-seq data. The workflow is developed in
[Snakemake][snakemake], a widely used workflow management system in the
bioinformatics community.

ZARP will pre-process, align and quantify your single- or paired-end stranded
bulk RNA-seq sequencing libraries with publicly available state-of-the-art
bioinformatics tools. ZARP's browser-based rich reports and visualitations will
give you meaningful initial insights in the quality and composition of your
sequencing experiments - fast and simple. Whether you are an experimentalist
struggling with large scale data analysis or an experienced bioinformatician,
when there's RNA-seq data to analyze, just _zarp 'em_!
The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community. ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualizations will give you meaningful initial insights in the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _zarp 'em_!

## How does it work?

ZARP requires conda or mamba to install the basic dependencies. Each individual step of the workflow run either in its own Apptainer (Singularity) container or in its own Conda virtual environemnt.
ZARP requires conda or mamba to install the basic dependencies. Each individual step of the workflow runs either in its own Apptainer (Singularity) container or in its own Conda virtual environment.

Once the installation is complete, you fill in a [config.yaml](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/config.yaml) file with parameters and a [samples.tsv](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/samples.tsv) file with sample specific information. You can easily trigger ZARP by making a call to snakemake with the appropriate parameters.

The pipeline can be executed in different systems or HPC clusters. ZARP generates multiple output files that help you QC your data and proceed with downstream analyses. Apart from running the main ZARP workflow, you can also run a second pipeline that downloads data from SRA, and a third pipeline that populates a file with the samples and determines sample specific parameters.
The pipeline can be executed in different systems or HPC clusters. ZARP generates multiple output files that help you Quality Control (QC) your data and proceed with downstream analyses. Apart from running the main ZARP workflow, you can also run a second pipeline that pulls sequencing sample data from the Sequence Read Archive (SRA), and a third pipeline that populates a file with the samples and infers missing metadata.

## How to cite

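The `config.yaml` mentioned in this hunk is a plain key-value file. As a rough orientation only — the key names below are illustrative, not the authoritative schema, which lives in the linked `tests/input_files/config.yaml` — a run configuration takes this shape:

```yaml
# Illustrative sketch of a ZARP run configuration.
# Key names are examples only; consult the linked
# tests/input_files/config.yaml for the real schema.
samples: "path/to/samples.tsv"   # sample sheet with per-sample metadata
output_dir: "results"            # where workflow outputs are written
log_dir: "logs"                  # per-rule log files
```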
32 changes: 17 additions & 15 deletions docs/guides/installation.md
@@ -7,16 +7,14 @@ On this page, you will find out how to install _ZARP_ on your system.
Installation requires the following:

- Linux (tested with Ubuntu 20.04; macOS has not been tested yet)
- [Conda][conda] (tested with `conda 22.11.1`)
- [Mamba][mamba] (tested with `mamba 1.3.0`)
- [Singularity][singularity] (tested with `singularity 3.8.6`; not required
- [Conda][conda] (tested with `Conda 22.11.1`)
- [Mamba][mamba] (tested with `Mamba 1.3.0`)
- [Singularity][singularity] (tested with `Singularity 3.8.6`; not required
if you have root permissions on the machine you would like to install _ZARP_
on; in that case, see [below](#2-set-up-conda-environment))

> Other versions, especially older ones, are not guaranteed to work.
**TODO:** Add/replace requirements/versions and check that everything is correct

## Installation steps

### 1. Clone ZARP
@@ -42,8 +40,9 @@ conda 22.11.1
```
If it is not installed, you will see a <code style="color: red;">command not found</code> error.

### Conda installation
If Conda is not installed, you can install Miniconda by following these steps:
#### Conda installation

There are different ways to install Conda. We recommend using [Miniconda][miniconda]. Please refer to the official documentation for the latest installation instructions. Alternatively, you can follow these steps:

**1. Download the Miniconda installer:**

@@ -68,7 +67,7 @@ conda install conda=22.11.1
```
> This update includes a step to install a specific version of Conda, ensuring that users have a version tested to be compatible with ZARP.
### Conda installation if you already have Conda and do NOT want to change its version
#### Conda installation if you already have Conda and do NOT want to change its version

If you already have a specific conda version on your system which is not compatible with ZARP and do not want to change it, no worries. You can have more than one Conda version installed:

@@ -144,9 +143,9 @@ Activate the Conda environment with:
conda activate zarp
```

# Extra installation steps (optional)
## 6. Optional installation steps

## 6. Non-essential dependencies installation
### Install test dependencies

Most tests have additional dependencies. If you are planning to run tests, you
will need to install these by executing the following command _in your active
@@ -156,7 +155,7 @@ Conda environment_:
mamba env update -f install/environment.dev.yml
```

## 6a. Successful installation tests
### Run installation tests

We have prepared several tests to check the integrity of the workflow and its
components. These can be found in subdirectories of the `tests/` directory.
@@ -166,23 +165,26 @@ successfully, [additional dependencies](#installing-non-essential-dependencies)
need to be installed.
Execute one of the following commands to run the test workflow
on your local machine:
* Test workflow on local machine with **Singularity**:


#### Test workflow on local machine with **Singularity**:
```bash
bash tests/test_integration_workflow/test.local.sh
```
* Test workflow on local machine with **Conda**:
#### Test workflow on local machine with **Conda**:
```bash
bash tests/test_integration_workflow_with_conda/test.local.sh
```
Execute one of the following commands to run the test workflow
on a [Slurm][slurm]-managed high-performance computing (HPC) cluster:

* Test workflow with **Singularity**:
#### Test workflow with **Singularity**:

```bash
bash tests/test_integration_workflow/test.slurm.sh
```
* Test workflow with **Conda**:

#### Test workflow with **Conda**:

```bash
bash tests/test_integration_workflow_with_conda/test.slurm.sh
```
8 changes: 4 additions & 4 deletions docs/guides/outputs.md
@@ -26,7 +26,7 @@ After a run you will find the following structure within the `results` directory
└── zpca
```

A descrpition of the different directories is shown below:
A description of the different directories is shown below:

- `results`: The main output directory for the ZARP workflow.
- `mus_musculus`: A subdirectory for the organism-specific results.
@@ -109,7 +109,7 @@ On the left you can find a navigation bar that takes you into different sections
<img width="80%" src=../images/zarp_multiqc_kallisto_alignment.png>
</div>

- Finally the `zpca` salmon and kallisto sections show PCA plots for expression levels of genes and transcripts.
- Finally the `zpca` Salmon and Kallisto sections show PCA plots for expression levels of genes and transcripts.

<div align="center">
<img width="80%" src=../images/zarp_multiqc_zpca.png>
@@ -137,7 +137,7 @@ Within the `samples` directory, you can find a directory for each sample, and wi
- In the `bigWig` directory you can find two folders. `UniqueMappers` and `MultimappersIncluded`. Within these files you find the bigWig files for the plus and minus strand. These files are convenient to load in a genome browser (like igv) to view the genome coverage of the mappings.


## Outputs of downnload SRA data
## Outputs of download SRA data

Once you run the pipeline that downloads data from the Sequence Read Archive (SRA) you can find the following file structure:

@@ -181,7 +181,7 @@ SRR18552868 results/sra_downloads/compress/SRR18552868/SRR18552868.fastq.gz
SRR18549672 results/sra_downloads/compress/SRR18549672/SRR18549672_1.fastq.gz results/sra_downloads/compress/SRR18549672/SRR18549672_2.fastq.gz
ERR2248142 results/sra_downloads/compress/ERR2248142/ERR2248142.fastq.gz
```
Some of the filenames indicate if the experiment was sequnced with `SINGLE (se)` or `PAIRED (pe)` end mode.
Some of the filenames indicate if the experiment was sequenced with `SINGLE (se)` or `PAIRED (pe)` end mode.

## Outputs of HTSinfer

2 changes: 1 addition & 1 deletion docs/guides/parameterization.md
@@ -14,7 +14,7 @@ remove_adapters_cutadapt:
-n: '2'
# Discard processed reads that are shorter than m; note that cutadapt uses
# a default value of m=0, causing reads without any nucleotides remaining
# after proessing to be retained; as "empty reads" will cause errors in
# after processing to be retained; as "empty reads" will cause errors in
# downstream applications in ZARP, we have changed the default to m=1,
# meaning that only read fragments of at least 1 nt will be retained after
# processing. The default will be overridden by the value specified here,
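Pieced together from the comment lines shown in this hunk, the cutadapt parameter block being documented looks roughly like this (a sketch; surrounding keys are elided in the diff):

```yaml
remove_adapters_cutadapt:
  # Try to remove adapters up to 2 times per read
  -n: '2'
  # Keep only reads with at least 1 nt left after trimming;
  # cutadapt's own default (m=0) would retain "empty reads",
  # which break downstream ZARP steps
  -m: '1'
```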
2 changes: 1 addition & 1 deletion docs/guides/usage.md
@@ -54,7 +54,7 @@ your run.
**OR**
Runner script for _Slurm cluster exection_ (note that you may need
Runner script for _Slurm cluster execution_ (note that you may need
to modify the arguments to `--jobs` and `--cores` in the file:
`profiles/slurm-singularity/config.yaml` depending on your HPC
and workload manager configuration):
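For orientation: a Snakemake profile maps command-line flags to YAML keys, so the tuning described above happens in entries like the following (values are placeholders to adapt to your HPC; the exact contents of `profiles/slurm-singularity/config.yaml` may differ):

```yaml
# Excerpt sketch of profiles/slurm-singularity/config.yaml
jobs: 100    # upper bound on concurrently submitted Slurm jobs (--jobs)
cores: 256   # total cores Snakemake may use across jobs (--cores)
```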
1 change: 1 addition & 0 deletions docs/includes/references.md
@@ -10,3 +10,4 @@
[zarp-issue-tracker]: <https://github.com/zavolanlab/zarp/issues>
[zarp-qa]: <https://github.com/zavolanlab/zarp/discussions>
[zavolab-gh]: <https://github.com/zavolanlab>
[miniconda]: <https://docs.anaconda.com/miniconda/>
