Add docker instructions
fgypas committed Oct 27, 2024
1 parent f5b780e commit 062005f
Showing 3 changed files with 60 additions and 10 deletions.
6 changes: 1 addition & 5 deletions docs/README.md
@@ -30,7 +30,7 @@ when there's RNA-seq data to analyze, just _zarp 'em_!

ZARP requires Conda or Mamba to install the basic dependencies. Each individual step of the workflow runs either in its own Apptainer (Singularity) container or in its own Conda virtual environment.

Once the installation is complete, you fill in a [config.yaml](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/config.yaml) file with parameters and a [samples.tsv](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/samples.tsv) file with sample specific installation. You can easily trigger ZARP by making a call to snakemake with the appropriate parameters.
Once the installation is complete, you fill in a [config.yaml](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/config.yaml) file with parameters and a [samples.tsv](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/samples.tsv) file with sample specific information. You can easily trigger ZARP by making a call to snakemake with the appropriate parameters.

The pipeline can be executed in different systems or HPC clusters. ZARP generates multiple output files that help you QC your data and proceed with downstream analyses. Apart from running the main ZARP workflow, you can also run a second pipeline that downloads data from SRA, and a third pipeline that populates a file with the samples and determines sample specific parameters.

@@ -46,10 +46,6 @@ Alexander Kanitz_
F1000Research 2024, 13:533
<https://doi.org/10.12688/f1000research.149237.1>

## Training materials

Coming soon...

## Info materials

### Poster
3 changes: 1 addition & 2 deletions docs/guides/installation.md
@@ -24,8 +24,7 @@ Installation requires the following:
Clone the [ZARP workflow repository][zarp] with:

```sh
git clone [email protected]:zavolanlab/zarp
# or: git clone https://github.com/zavolanlab/zarp.git
git clone https://github.com/zavolanlab/zarp.git
```

### 2. Set up Conda environment
61 changes: 58 additions & 3 deletions docs/guides/usage.md
@@ -1,6 +1,6 @@
# Execution of pipelines

ZARP consists of three different pipelines. The main pipeline that processes the data, the second allows you to download the sequencing libraries from the Sequence Read Archive (SRA), and the third that populates a file with the samples and determines sample specific parameters.
ZARP consists of three different pipelines. The main pipeline processes the data, the second allows you to download the sequencing libraries from the Sequence Read Archive (SRA), and the third populates a file with the samples and determines sample specific parameters.

If you can create a `samples.tsv` file and fill in the metadata for the different sequencing experiments, then the main pipeline can analyze your data.
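A quick, purely illustrative sanity check for such a table (the column names below are made up for this example, not ZARP's actual schema): verify that every row has the same number of tab-separated fields as the header before handing the file to the pipeline.

```sh
# Illustrative only: write a tiny two-column-plus-header example table.
printf 'sample\tfq1\tfq2\nmysample\treads_1.fq.gz\treads_2.fq.gz\n' > samples_check.tsv
# awk exits non-zero if any row's field count differs from the header's.
awk -F'\t' 'NR==1 {n=NF} NF!=n {bad=1} END {exit bad}' samples_check.tsv \
  && echo "samples table OK"
```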

@@ -71,7 +71,7 @@ your run.
EOF
```
> Note: When running the pipeline with *conda* you should use `local-conda` and
> Note: When running the pipeline with *Conda* you should use `local-conda` and
`slurm-conda` profiles instead.
> Note: The slurm profiles are adapted to a cluster that uses the quality-of-service (QOS) keyword. If QOS is not supported by your Slurm instance, remove all lines containing "qos" from `profiles/slurm-config.json`.
@@ -162,4 +162,59 @@ snakemake \
However, this call will exit with an error, as not all parameters can be inferred from the example files. The argument `--keep-incomplete` makes sure the `samples_htsinfer.tsv` file can nevertheless be inspected.
After successful execution, if all parameters could either be inferred or were specified by the user, `[OUTDIR]/[SAMPLES_OUT]` should contain a populated table with the parameters `seqmode`, `f1_3p`, `f2_3p`, `organism`, `libtype` and `index_size`.
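For orientation, a populated table might look roughly like this. All values are purely illustrative placeholders, not the output of an actual run:

```
sample    seqmode  f1_3p          f2_3p          organism      libtype  index_size
mysample  pe       AGATCGGAAGAGC  AGATCGGAAGAGC  homo_sapiens  A        100
```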
# Execution with Docker
ZARP is optimised for Linux users, as all packages are available via Conda or Apptainer (Singularity). On other systems, such as macOS, some of these packages do not work, in particular due to the transition from Intel to ARM (M-series) processors. Nevertheless, we built a Docker container that can be used to run ZARP in such environments.
1. Install Docker following the instructions [here](https://docs.docker.com/desktop/install/mac-install/).
2. Pull the Docker image that contains the necessary dependencies:
```sh
docker pull zavolab/zarp:1.0.0-rc.1
```
3. Create a directory (e.g. `data`) and store in it all the files required for a run:
- The genome sequence FASTA file
- The annotation GTF file
- The FASTQ files of your experiments
- The `rule_config.yaml` for the parameters
- The `samples.tsv` containing the metadata of your samples
- The `config.yaml` file with parameters. Below is an example file; note that it points to files in the `data` directory.
```yaml
---
# Required fields
samples: "data/samples_docker.tsv"
output_dir: "data/results"
log_dir: "data/logs"
cluster_log_dir: "data/logs/cluster"
kallisto_indexes: "data/results/kallisto_indexes"
salmon_indexes: "data/results/salmon_indexes"
star_indexes: "data/results/star_indexes"
alfa_indexes: "data/results/alfa_indexes"
# Optional fields
rule_config: "data/rule_config.yaml"
report_description: "No description provided by user"
report_logo: "../../images/logo.128px.png"
report_url: "https://zavolan.biozentrum.unibas.ch/"
author_name: "NA"
author_email: "NA"
...
```
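Putting the list above together, the `data` directory could be laid out like this. This is a minimal sketch that reuses the file names from the example config; the genome, annotation, and FASTQ names are hypothetical placeholders for your real inputs.

```sh
# Sketch only: create the layout with empty placeholder files.
mkdir -p data
touch data/config.yaml data/samples_docker.tsv data/rule_config.yaml
touch data/genome.fa data/annotation.gtf  # placeholders for your real inputs
ls data
```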
4. Execute ZARP as follows:
```sh
docker run \
--platform linux/x86_64 \
--mount type=bind,source=$PWD/data,target=/data \
zavolab/zarp:1.0.0-rc.1 \
snakemake \
-p \
--snakefile /workflow/Snakefile \
--configfile data/config.yaml \
--cores 4 \
--use-conda \
--verbose
```
The command runs the Docker container `zavolab/zarp:1.0.0-rc.1` that we pulled earlier. The `--platform linux/x86_64` flag executes it as it would be done on a Linux platform. The `--mount` option binds the local `data` directory that contains the input files to the `/data` directory in the container. The pipeline is stored in the container at the path `/workflow/Snakefile`. Once ZARP is complete, the results will be stored in the `data/results` directory.
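Before a full run, it can be useful to validate the setup with Snakemake's dry-run flag (`-n`), which resolves the workflow without executing any rules. Below is a sketch of the same container invocation with the flag added, built as a string so it can be inspected before being run:

```sh
# Same container invocation as above, with -n (dry run) added.
# Stored in a variable so the command can be reviewed before execution.
DRYRUN_CMD="docker run \
  --platform linux/x86_64 \
  --mount type=bind,source=$PWD/data,target=/data \
  zavolab/zarp:1.0.0-rc.1 \
  snakemake -n --snakefile /workflow/Snakefile --configfile data/config.yaml"
echo "$DRYRUN_CMD"
```

Run it with `eval "$DRYRUN_CMD"` once the printed command looks right.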
