Commit: Update documentation based on PR feedback
fgypas committed Dec 22, 2024
1 parent 3684cd1 commit d7af25e
Showing 7 changed files with 31 additions and 44 deletions.
8 changes: 3 additions & 5 deletions README.md
@@ -12,9 +12,7 @@

_RNA-seq analysis doesn't get simpler than that!_

ZARP relies on publicly available bioinformatics tools and currently handles single or paired-end stranded bulk RNA-seq data. The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community.

ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualitations will give you meaningful initial insights in the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _ZARP 'em_!
The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community. ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualizations will give you meaningful initial insights in the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _zarp 'em_!

<div align="center">
<img width="60%" src=images/zarp_schema.png>
@@ -37,7 +35,7 @@ Quick installation requires the following:
- Git
- [Conda][conda] >= 22.11.1
- [Mamba][mamba] >=1.3.0 <2
- [Singularity][singularity] >=3.5.2 (Required only if you want to use Singulaarity for the dependencies)
- [Singularity][singularity] >=3.5.2 (Required only if you want to use Singularity for the dependencies)

```bash
git clone https://github.com/zavolanlab/zarp.git
```

@@ -96,7 +94,7 @@ your run.
**OR**
Runner script for _Slurm cluster exection_ (note that you may need
Runner script for _Slurm cluster execution_ (note that you may need
to modify the arguments to `--jobs` and `--cores` in the file:
`profiles/slurm-singularity/config.yaml` depending on your HPC
and workload manager configuration):
22 changes: 4 additions & 18 deletions docs/README.md
@@ -6,33 +6,19 @@

**Welcome to the _ZARP_ documentation pages!**

_ZARP_ is a generic RNA-Seq analysis workflow that allows users to process and analyze
Illumina short-read sequencing libraries with minimum effort. Better yet: With our
companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line
interface, you can start ZARP runs with the simplest and most intuitive commands.
_ZARP_ is a generic RNA-Seq analysis workflow that allows users to process and analyze Illumina short-read sequencing libraries with minimum effort. Better yet: With our companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line interface, you can start ZARP runs with the simplest and most intuitive commands.

_RNA-seq analysis doesn't get simpler than that!_

ZARP relies on publicly available bioinformatics tools and currently handles
single or paired-end stranded bulk RNA-seq data. The workflow is developed in
[Snakemake][snakemake], a widely used workflow management system in the
bioinformatics community.

ZARP will pre-process, align and quantify your single- or paired-end stranded
bulk RNA-seq sequencing libraries with publicly available state-of-the-art
bioinformatics tools. ZARP's browser-based rich reports and visualitations will
give you meaningful initial insights in the quality and composition of your
sequencing experiments - fast and simple. Whether you are an experimentalist
struggling with large scale data analysis or an experienced bioinformatician,
when there's RNA-seq data to analyze, just _zarp 'em_!
The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community. ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualizations will give you meaningful initial insights in the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _zarp 'em_!

## How does it work?

ZARP requires conda or mamba to install the basic dependencies. Each individual step of the workflow run either in its own Apptainer (Singularity) container or in its own Conda virtual environemnt.
ZARP requires conda or mamba to install the basic dependencies. Each individual step of the workflow runs either in its own Apptainer (Singularity) container or in its own Conda virtual environment.

Once the installation is complete, you fill in a [config.yaml](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/config.yaml) file with parameters and a [samples.tsv](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/samples.tsv) file with sample specific information. You can easily trigger ZARP by making a call to snakemake with the appropriate parameters.

The pipeline can be executed in different systems or HPC clusters. ZARP generates multiple output files that help you QC your data and proceed with downstream analyses. Apart from running the main ZARP workflow, you can also run a second pipeline that downloads data from SRA, and a third pipeline that populates a file with the samples and determines sample specific parameters.
The pipeline can be executed in different systems or HPC clusters. ZARP generates multiple output files that help you Quality Control (QC) your data and proceed with downstream analyses. Apart from running the main ZARP workflow, you can also run a second pipeline that pulls sequencing sample data from the Sequence Read Archive (SRA), and a third pipeline that populates a file with the samples and infers missing metadata.

## How to cite

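The `config.yaml` mentioned in this hunk is a plain key-value file. As a rough orientation only — the key names below are illustrative, not the authoritative schema, which lives in the linked `tests/input_files/config.yaml` — a run configuration takes this shape:

```yaml
# Illustrative sketch of a ZARP run configuration.
# Key names are examples only; consult the linked
# tests/input_files/config.yaml for the real schema.
samples: "path/to/samples.tsv"   # sample sheet with per-sample metadata
output_dir: "results"            # where workflow outputs are written
log_dir: "logs"                  # per-rule log files
```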
32 changes: 17 additions & 15 deletions docs/guides/installation.md
@@ -7,16 +7,14 @@ On this page, you will find out how to install _ZARP_ on your system.
Installation requires the following:

- Linux (tested with Ubuntu 20.04; macOS has not been tested yet)
- [Conda][conda] (tested with `conda 22.11.1`)
- [Mamba][mamba] (tested with `mamba 1.3.0`)
- [Singularity][singularity] (tested with `singularity 3.8.6`; not required
- [Conda][conda] (tested with `Conda 22.11.1`)
- [Mamba][mamba] (tested with `Mamba 1.3.0`)
- [Singularity][singularity] (tested with `Singularity 3.8.6`; not required
if you have root permissions on the machine you would like to install _ZARP_
on; in that case, see [below](#2-set-up-conda-environment))

> Other versions, especially older ones, are not guaranteed to work.
**TODO:** Add/replace requirements/versions and check that everything is correct

## Installation steps

### 1. Clone ZARP
@@ -42,8 +40,9 @@ conda 22.11.1
```
If it is not installed, you will see a <code style="color: red;">command not found</code> error.

### Conda installation
If Conda is not installed, you can install Miniconda by following these steps:
#### Conda installation

There are different ways to install Conda. We recommend using [Miniconda][miniconda]. Please refer to the official documentation for the latest installation instructions. Alternatively, you can follow these steps:

**1. Download the Miniconda installer:**

@@ -68,7 +67,7 @@ conda install conda=22.11.1
```
> This update includes a step to install a specific version of Conda, ensuring that users have a version tested to be compatible with ZARP.
### Conda installation if you already have Conda and do NOT want to change its version
#### Conda installation if you already have Conda and do NOT want to change its version

If you already have a specific conda version on your system which is not compatible with ZARP and do not want to change it, no worries. You can have more than one Conda version installed:

@@ -144,9 +143,9 @@ Activate the Conda environment with:
conda activate zarp
```

# Extra installation steps (optional)
## 6. Optional installation steps

## 6. Non-essential dependencies installation
### Install test dependencies

Most tests have additional dependencies. If you are planning to run tests, you
will need to install these by executing the following command _in your active
@@ -156,7 +155,7 @@ Conda environment_:
mamba env update -f install/environment.dev.yml
```

## 6a. Successful installation tests
### Run installation tests

We have prepared several tests to check the integrity of the workflow and its
components. These can be found in subdirectories of the `tests/` directory.
@@ -166,23 +165,26 @@ successfully, [additional dependencies](#installing-non-essential-dependencies)
need to be installed.
Execute one of the following commands to run the test workflow
on your local machine:
* Test workflow on local machine with **Singularity**:


#### Test workflow on local machine with **Singularity**:
```bash
bash tests/test_integration_workflow/test.local.sh
```
* Test workflow on local machine with **Conda**:
#### Test workflow on local machine with **Conda**:
```bash
bash tests/test_integration_workflow_with_conda/test.local.sh
```
Execute one of the following commands to run the test workflow
on a [Slurm][slurm]-managed high-performance computing (HPC) cluster:

* Test workflow with **Singularity**:
#### Test workflow with **Singularity**:

```bash
bash tests/test_integration_workflow/test.slurm.sh
```
* Test workflow with **Conda**:

#### Test workflow with **Conda**:

```bash
bash tests/test_integration_workflow_with_conda/test.slurm.sh
```
8 changes: 4 additions & 4 deletions docs/guides/outputs.md
@@ -26,7 +26,7 @@ After a run you will find the following structure within the `results` directory
└── zpca
```

A descrpition of the different directories is shown below:
A description of the different directories is shown below:

- `results`: The main output directory for the ZARP workflow.
- `mus_musculus`: A subdirectory for the organism-specific results.
@@ -109,7 +109,7 @@ On the left you can find a navigation bar that takes you into different sections
<img width="80%" src=../images/zarp_multiqc_kallisto_alignment.png>
</div>

- Finally the `zpca` salmon and kallisto sections show PCA plots for expression levels of genes and transcripts.
- Finally the `zpca` Salmon and Kallisto sections show PCA plots for expression levels of genes and transcripts.

<div align="center">
<img width="80%" src=../images/zarp_multiqc_zpca.png>
@@ -137,7 +137,7 @@ Within the `samples` directory, you can find a directory for each sample, and wi
- In the `bigWig` directory you can find two folders. `UniqueMappers` and `MultimappersIncluded`. Within these files you find the bigWig files for the plus and minus strand. These files are convenient to load in a genome browser (like igv) to view the genome coverage of the mappings.


## Outputs of downnload SRA data
## Outputs of download SRA data

Once you run the pipeline that downloads data from the Sequence Read Archive (SRA) you can find the following file structure:

@@ -181,7 +181,7 @@ SRR18552868 results/sra_downloads/compress/SRR18552868/SRR18552868.fastq.gz
SRR18549672 results/sra_downloads/compress/SRR18549672/SRR18549672_1.fastq.gz results/sra_downloads/compress/SRR18549672/SRR18549672_2.fastq.gz
ERR2248142 results/sra_downloads/compress/ERR2248142/ERR2248142.fastq.gz
```
Some of the filenames indicate if the experiment was sequnced with `SINGLE (se)` or `PAIRED (pe)` end mode.
Some of the filenames indicate if the experiment was sequenced with `SINGLE (se)` or `PAIRED (pe)` end mode.

## Outputs of HTSinfer

2 changes: 1 addition & 1 deletion docs/guides/parameterization.md
@@ -14,7 +14,7 @@ remove_adapters_cutadapt:
-n: '2'
# Discard processed reads that are shorter than m; note that cutadapt uses
# a default value of m=0, causing reads without any nucleotides remaining
# after proessing to be retained; as "empty reads" will cause errors in
# after processing to be retained; as "empty reads" will cause errors in
# downstream applications in ZARP, we have changed the default to m=1,
# meaning that only read fragments of at least 1 nt will be retained after
# processing. The default will be overridden by the value specified here,
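Pieced together from the comment lines shown in this hunk, the cutadapt parameter block being documented looks roughly like this (a sketch; surrounding keys are elided in the diff):

```yaml
remove_adapters_cutadapt:
  # Try to remove adapters up to 2 times per read
  -n: '2'
  # Keep only reads with at least 1 nt left after trimming;
  # cutadapt's own default (m=0) would retain "empty reads",
  # which break downstream ZARP steps
  -m: '1'
```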
2 changes: 1 addition & 1 deletion docs/guides/usage.md
@@ -54,7 +54,7 @@ your run.
**OR**
Runner script for _Slurm cluster exection_ (note that you may need
Runner script for _Slurm cluster execution_ (note that you may need
to modify the arguments to `--jobs` and `--cores` in the file:
`profiles/slurm-singularity/config.yaml` depending on your HPC
and workload manager configuration):
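For orientation: a Snakemake profile maps command-line flags to YAML keys, so the tuning described above happens in entries like the following (values are placeholders to adapt to your HPC; the exact contents of `profiles/slurm-singularity/config.yaml` may differ):

```yaml
# Excerpt sketch of profiles/slurm-singularity/config.yaml
jobs: 100    # upper bound on concurrently submitted Slurm jobs (--jobs)
cores: 256   # total cores Snakemake may use across jobs (--cores)
```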
1 change: 1 addition & 0 deletions docs/includes/references.md
@@ -10,3 +10,4 @@
[zarp-issue-tracker]: <https://github.com/zavolanlab/zarp/issues>
[zarp-qa]: <https://github.com/zavolanlab/zarp/discussions>
[zavolab-gh]: <https://github.com/zavolanlab>
[miniconda]: <https://docs.anaconda.com/miniconda/>
