Name

genome_ref_cal.sh: Script for inferring the genome reference from a VCF file. Works with the GRCh37, GRCh38 and hg17 reference genomes.

Synopsis

This script infers the genome reference from a VCF file. We extracted the unique nucleotides per position in hg17, GRCh37 and GRCh38.

We select the chromosome with the most variants in the VCF file and run the script for that specific chromosome.
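
The chromosome selection can be pictured with standard Bash commands. This is a minimal sketch, not the code from genome_ref_cal.sh: the function name `top_chromosome` is hypothetical, and it only assumes that the input is a gzipped VCF whose first column is the chromosome.

```shell
# Hypothetical helper, not taken from genome_ref_cal.sh.
# Prints the chromosome that has the most variant records in a gzipped VCF.
top_chromosome() {
    gzip -dc "$1" |      # decompress (also reads bgzipped files)
        grep -v '^#' |   # drop header lines
        cut -f1 |        # keep the chromosome column
        sort |
        uniq -c |        # count records per chromosome
        sort -k1,1nr |   # highest count first
        head -n 1 |
        awk '{print $2}'
}
```

Calling `top_chromosome input.vcf.gz` prints a single chromosome name, which can then be used to restrict the rest of the analysis.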

It searches for matches in three dictionaries (one per reference genome) and takes the single dictionary that has matches.
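
The dictionary lookup might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the dictionary layout (two tab-separated columns, position and nucleotide), the file names `hg17.dic`, `grch37.dic` and `grch38.dic`, and the function names `count_matches` and `infer_reference`. The real files in './ref_dics' may be organized differently.

```shell
# Hypothetical sketch: the dictionary layout (POS<TAB>NUCLEOTIDE) is an assumption.
# Counts how many variants (POS<TAB>REF lines) are present in one dictionary.
count_matches() {
    local dict=$1 variants=$2
    awk -F'\t' '
        NR == FNR { seen[$1 FS $2] = 1; next }      # first file: load the dictionary
        { key = $1 FS $2; if (key in seen) n++ }    # second file: count matching variants
        END { print n + 0 }
    ' "$dict" "$variants"
}

# Prints the one reference whose dictionary has matches.
# Prints nothing if zero or several dictionaries match.
infer_reference() {
    local dic_dir=$1 variants=$2 ref n
    local hits=()
    for ref in hg17 grch37 grch38; do
        n=$(count_matches "$dic_dir/$ref.dic" "$variants")   # hypothetical file names
        if [ "$n" -gt 0 ]; then
            hits+=("$ref")
        fi
    done
    if [ "${#hits[@]}" -eq 1 ]; then
        echo "${hits[0]}"
    fi
}
```

Requiring exactly one matching dictionary keeps the result unambiguous: if two references match, the sketch reports nothing instead of guessing.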

At most 10K variants and at most 100 matches are analyzed per chromosome.
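
A cap on the number of variants can be sketched with `awk`, which stops reading once the limit is reached. The function name `sample_variants` and its output layout (position and REF allele, tab-separated) are hypothetical; how the script itself applies the limits is not shown here. A cap on matches could be applied in the same way.

```shell
# Hypothetical helper: prints POS<TAB>REF for at most `max` variants
# of one chromosome from a gzipped VCF (default limit: 10000).
sample_variants() {
    local vcf=$1 chrom=$2 max=${3:-10000}
    gzip -dc "$vcf" |
        awk -F'\t' -v chrom="$chrom" -v max="$max" '
            /^#/ { next }                 # skip header lines
            $1 == chrom {
                print $2 "\t" $4          # VCF column 2 is POS, column 4 is REF
                if (++n >= max) exit      # stop once the cap is reached
            }
        '
}
```

For example, `sample_variants input.vcf.gz 22` would print up to 10000 position and REF pairs from chromosome 22.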

The script includes the dictionaries for inferring the genome reference.

Run

Before running it for the first time, you need to set the variable 'path_dic=' inside 'genome_ref_cal.sh'. In the installation directory, the dictionaries are inside a folder named './ref_dics'.
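
As an illustration, the assignment might look like the line below. The installation directory `/opt/EGA_genomeref` is a made-up example; use the absolute path of your own copy, and check the script for whether it expects a trailing slash.

```shell
# Inside genome_ref_cal.sh. The directory below is a hypothetical example path.
path_dic=/opt/EGA_genomeref/ref_dics
```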

The script runs on Bash and uses standard Bash commands. It was tested on Debian-based distributions.

Please note that the input VCF file has to be gzipped (or bgzipped).

bash /path/genome_ref_cal.sh input.vcf.gz
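
If the VCF is not compressed yet, it can be gzipped first. The helper below is a sketch and not part of the distribution; the name `ensure_gzipped` is hypothetical. It relies on `gzip -t`, which fails on files that are not in gzip format.

```shell
# Hypothetical helper: prints the name of a gzipped copy of the VCF.
# The file is compressed only if it is not already in gzip format.
ensure_gzipped() {
    local vcf=$1
    if gzip -t "$vcf" 2>/dev/null; then
        echo "$vcf"                                   # already gzipped or bgzipped
    else
        gzip -c "$vcf" > "$vcf.gz" && echo "$vcf.gz"  # original file is kept
    fi
}
```

The result can be passed straight to the script, for instance `bash /path/genome_ref_cal.sh "$(ensure_gzipped input.vcf)"`.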

Demo

The demo folder contains a subset of a VCF from chromosome 22 of the 1000 Genomes data for testing purposes.

*.final: contains the total number of matches for each genome reference.
demo_subset.chr: generated file with the number of chromosomes and variants present in the demo VCF.
infer_ref: contains the inferred genome reference.
log

Author

Written by Dietmar Fernandez, PhD. Info about EGA can be found at https://ega-archive.org/.

Copyright

This bash script is copyrighted. See the LICENSE file included in this distribution.
