Overall steps:

The pipeline for a single paired-end library consists of the following 15 steps:

report read length
alignment
flagstat bam
post align: dedup bam
post align: picard markdup
post align: dedup bam (again) - final bam file
post align: name sort bam
post align: bam to bedpe
bedpe to tagalign
shift tagalign
xcor subset sample
xcor calculation use subset
macs2 peak calling
filter peaks
ataqc

Individual steps

Alignment

bowtie2 -X2000 --mm --local ... | samtools view -Su /dev/stdin | samtools sort -o xxx.PE2SE.bam - && samtools index xxx.PE2SE.bam

For bowtie2:

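--mm: use memory-mapped I/O to load the index, so that concurrent bowtie2 processes on the same machine can share it;
-X2000: the maximum fragment length for valid paired-end alignments is 2000 bp;
--local: local alignment mode, in which read ends may be soft-clipped; the default preset in this mode is --sensitive-local.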

For samtools view:

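-S: ignored; accepted only for compatibility with older samtools versions (the input format is auto-detected)
-u: output uncompressed BAM, which avoids compression overhead when piping to another command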

Filter & deduplicate bam

samtools view -F 1804 -f 2 -u -q 30 xxx.PE2SE.bam | sambamba sort -n  /dev/stdin -o /output_dir/xxx.PE2SE.dupmark.bam

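-F 1804: remove reads with any of these flags set: unmapped, mate unmapped, not primary alignment, failed platform/vendor QC, PCR or optical duplicate
-q 30: remove reads with a mapping quality below 30
-f 2: keep only reads mapped in a proper pair (both the forward and the reverse mate mapped)
-u: output uncompressed BAM
Sort the BAM by read name (-n) to prepare for the deduplication step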
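
The value 1804 passed to -F is the sum of several SAM flag bits. As a minimal sketch (the helper name decode_flag is hypothetical, and the bit names are shorthand for the definitions in the SAM specification), the following shell function lists the bits contained in a flag value:

```shell
# Decode a SAM FLAG value into the names of the bits that are set.
# decode_flag is an illustrative helper, not part of samtools.
decode_flag() {
  flag=$1
  out=""
  for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
              16:reverse 32:mate_reverse 64:read1 128:read2 \
              256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
    bit=${pair%%:*}    # numeric value of the bit
    name=${pair#*:}    # short name of the bit
    if [ $(( $flag & $bit )) -ne 0 ]; then
      out="${out:+$out,}$name"
    fi
  done
  printf '%s\n' "$out"
}

decode_flag 1804
# prints: unmapped,mate_unmapped,secondary,qc_fail,duplicate
```

Since 1804 = 4 + 8 + 256 + 512 + 1024, a read is discarded by -F 1804 if any one of these five bits is set.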

samtools fixmate -r xxx.PE2SE.dupmark.bam (tmp)  xxx.PE2SE.dupmark.bam.fixmate.bam (tmp)

Fill in mate coordinates, ISIZE (insert size), and mate-related flags from the name-sorted BAM, and remove secondary and unmapped reads (-r)

samtools view -F 1804 -f 2 -u xxx.PE2SE.dupmark.bam.fixmate.bam | sambamba sort /dev/stdin -o xxx.PE2SE.filt.bam

Call peaks

macs2 callpeak -t xxx.PE2SE.nodup.tn5.tagAlign.gz -f BED \
-n xxx.PE2SE.nodup.tn5.pf -g hs -p 0.01 --nomodel \
--shift -75 --extsize 150 -B --SPMR --keep-dup all --call-summits
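
With --nomodel, MACS2 moves the 5' end of each read by --shift and then extends it by --extsize in the 5'->3' direction, so --shift -75 --extsize 150 gives a 150 bp window centered on the 5' end of the read (the Tn5 cut site). The sketch below only illustrates that arithmetic on tagAlign-style records. It is not MACS2's own code, the helper name cut_site_window is hypothetical, and it ignores the one-base offset that BED's half-open end coordinate introduces on the minus strand.

```shell
# Illustrate how --shift and --extsize turn each read into a smoothing window.
# Input: tagAlign/BED6 records (chrom, start, end, name, score, strand).
# $1 = shift (negative moves the 5' end upstream), $2 = extsize.
cut_site_window() {
  awk -v off="$1" -v ext="$2" 'BEGIN { OFS = "\t" }
  {
    if ($6 == "+") {
      five = $2            # 5-prime end of a plus-strand read is its start
      s = five + off       # shift, then extend toward higher coordinates
      e = s + ext
    } else {
      five = $3            # 5-prime end of a minus-strand read is its end
      e = five - off       # shift, then extend toward lower coordinates
      s = e - ext
    }
    if (s < 0) s = 0       # do not run off the start of the chromosome
    print $1, s, e, $6
  }'
}

printf 'chr1\t1000\t1050\tN\t1000\t+\nchr1\t2000\t2050\tN\t1000\t-\n' |
  cut_site_window -75 150
# chr1  925   1075  +
# chr1  1975  2125  -
```

In both cases the resulting window spans 75 bp on either side of the read's 5' end (1000 and 2050 respectively).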

Sort by column 8 (-log10 p-value) in descending order and replace the long peak names in column 4 with Peak_<rank>

sort -k 8gr,8gr xxx.PE2SE.nodup.tn5.pf_peaks.narrowPeak | awk 'BEGIN{OFS="\t"}{$4="Peak_"NR ; print $0}' | gzip -nc > xxx.PE2SE.nodup.tn5.pf.narrowPeak.gz
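
To see what the sort and awk stages do, here is a small self-contained sketch that runs the same two commands on three made-up narrowPeak records. The records and the helper name rank_peaks are illustrative only, and LC_ALL=C is an addition so that decimal points are parsed the same way in every locale.

```shell
# Rank narrowPeak records by column 8 (descending) and rename them Peak_<rank>.
rank_peaks() {
  LC_ALL=C sort -k 8gr,8gr |
    awk 'BEGIN { OFS = "\t" } { $4 = "Peak_" NR; print $0 }'
}

# Three invented records; column 8 holds 5.2, 12.7 and 1.3.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 100 200 long_peak_name_1 50 . 3.1 5.2  2.0 40 \
  chr1 500 650 long_peak_name_2 90 . 8.4 12.7 9.1 70 \
  chr2 300 420 long_peak_name_3 20 . 1.5 1.3  0.2 55 |
  rank_peaks | cut -f 1-4,8
# chr1  500  650  Peak_1  12.7
# chr1  100  200  Peak_2  5.2
# chr2  300  420  Peak_3  1.3
```

The peak name therefore encodes the rank of the peak by significance, with Peak_1 being the most significant.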

macs2 bdgcmp -t xxx.PE2SE.nodup.tn5.pf_treat_pileup.bdg -c xxx.PE2SE.nodup.tn5.pf_control_lambda.bdg \
--o-prefix xxx.PE2SE.nodup.tn5.pf -m FE

slopBed -i xxx.PE2SE.nodup.tn5.pf_FE.bdg -g hg38.chrom.sizes -b 0 | bedClip stdin hg38.chrom.sizes xxx.PE2SE.nodup.tn5.pf.fc.signal.bedgraph

sort -k1,1 -k2,2n xxx.PE2SE.nodup.tn5.pf.fc.signal.bedgraph > xxx.PE2SE.nodup.tn5.pf.fc.signal.srt.bedgraph


bedGraphToBigWig xxx.PE2SE.nodup.tn5.pf.fc.signal.srt.bedgraph hg38.chrom.sizes xxx.PE2SE.nodup.tn5.pf.fc.signal.bigwig

Lib QCs

Some concepts:

insert size
fragment distribution

TSS enrichment calculation

Calculate from the final BAM file
Extend each TSS to -/+2 kb
Use the metaseq package to create a BamSignal object and calculate the coverage over the TSS features; the result is stored in a NumPy array of shape (number of features, number of bins)
Shift the reads by half of the read length in the 5' direction
Reverse the promoters on the minus strand
Normalize using the method from Greenleaf et al. 2013:
- background noise = average coverage over the 100 bp at each end of the window
- enrichment = coverage / background noise
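
The normalization can be sketched with awk on a plain coverage profile. This assumes the input is one aggregate coverage value per position across the TSS window, one per line. The helper name tss_enrichment and the toy profile are hypothetical, and the flank width is a parameter (100 in the description above, 2 in the toy run) so that the example stays short.

```shell
# Normalize a coverage profile by the average coverage of its two flanks.
# $1 = number of positions at each end that count as background.
tss_enrichment() {
  awk -v flank="$1" '
    { cov[NR] = $1 }
    END {
      if (NR < 2 * flank) {
        print "profile is shorter than the two flanks" > "/dev/stderr"
        exit 1
      }
      # background = mean coverage over the first and last <flank> positions
      for (i = 1; i <= flank; i++) bg += cov[i] + cov[NR - i + 1]
      bg /= (2 * flank)
      if (bg == 0) {
        print "background coverage is zero" > "/dev/stderr"
        exit 1
      }
      # enrichment = coverage / background, position by position
      for (i = 1; i <= NR; i++) printf "%.2f\n", cov[i] / bg
    }'
}

# Toy profile of 9 positions with the TSS in the middle; flank of 2 positions.
printf '%s\n' 2 2 4 10 20 10 4 2 2 | tss_enrichment 2
# background = (2 + 2 + 2 + 2) / 4 = 2, so the middle position prints 10.00
```

Piping the output through `sort -gr | head -n 1` would give the maximum of the normalized profile, which is one common way to report a single enrichment value; whether this pipeline reports the maximum is an assumption here.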

Reference

ENCODE Standards
