Python
Some python modules have their own dependancies, see links:
ete2
pycogent
Biopython
Dendropy
psiblast_ugene_TREs_search_pipline.py
Perl
BioPerl
Executables
MAFFT 7
FastTreeMP 2.0
Consel 0.20
TrimAl 1.2
RAxML 7.3.5
UGENE 1.12 (v 1.13 incompatible, because of changes in the output format)
PSITBLASTN 2.2.28+, PSIBLAST 2.2.28+, MAKBLASTDB
psiblast_ugene_TREs_search_pipline.py - the pipline
Fitch2013_Adapted_DeLey2006.tre - A cartoon Nematoda tree to draw results
nematode_species.csv - code name and taxonomy of the studied species
renamed_nodes.txt - corrects a few pipeline derived classifications blased on preliminary results
GG.msa - query for building a Gag domain sequence model
one_seq.txt - subject file for the construction of the Gag model
query_model_seeds/YR.msa - query for building a YR domain sequence model
query_model_seeds/RT.msa - query for building a RT domain sequence model
query_model_seeds/MT.msa - query for building a MT domain sequence model
query_model_seeds/All_proteins.fasta - all the proteins from TEs from RepBase, Retrobase and Pat1 elements to be used for building the protein sequence model
LTR_sequences/RT_outgroup.fas - to be added to the RT dataset for phylogenetic analysis
import osfor folder in ('comparable_trees', 'GG', 'RT', 'YR_tree', 'psiblast_ugene_RESULTS', 'query_model_files'): if not os.path.exists(folder): os.makedirs(folder) else: raise Exception('folder ' + folder + ' already exists')
I am using seed query alignments to query all the DIRS, Ngaro and crypton proteins from repbase, retrobase and some additional pat elements and get an extended MSA file to serve as a basis for query models.
The seed alignments are as follows:
For RT, MT and YR I am using the query alignment from Piednoël et al. (2012). For GAG I am using some proteins from Repase and retrobase.
from Bio.Blast.Applications import NcbipsiblastCommandline import osfiles = os.listdir('query_model_seeds') MSAs = []
for name in files: if '.msa' in name: MSAs.append(name)
os.system('makeblastdb -in query_model_seeds/All_proteins.fasta -dbtype prot')
for msa in MSAs: ckp = msa.split('.')[0] + '.ckp' psiblast_cline = NcbipsiblastCommandline(in_msa= 'query_model_seeds/' + msa, db='query_model_seeds/All_proteins.fasta', evalue=0.00000000001, num_iterations = 10, out_pssm= 'query_model_files/' + ckp) stdout, stderr = psiblast_cline()
The following cell contains a scripts that runs psiblast_ugene_TREs_search_pipline.py for each of the assemblies. The script itself is an external file that is provided in the folder and is not writen in this notebook. The pipeline first searches for YR domains using PSITBLASTN. Then it extends the hits by 10 kbp in each direction and searhces RT and MT domains as well as direct and inverted repeats on the resulting fragments using UGENE. The annotated fragments are searched for a distinct element structure. Following is a short discription of the used parameters and the output files for each assembly.
>> /path/to/folder/psiblast_ugene_TREs_search_pipline.py [species_code_name][path_to_assembly_fasta] [assembly_fasta_file][comment] [path_to_results_folder][path_to_query_model_files]
-
extend: the fragment length in base-pairs from each side of the YR hit, that will be searched for MT and RT domains, as well as repeat sequences (current setting 10,000, i.e., the fragment will be 20,000 bp long plus the length of the YR hit, unless the contig is shorter than that).
This value was chosen based on the assumption that DIRS elements are less than 6 kb in length (Piednoël et al. BMC Genomics 2011, 12:621). An artifact of this may be the inclussion of protein domains from different elements as one element. To avoid this, only RT and MT domains that are circumscribe between element related repeats together with a YR domain were included in the final dataset. -
intron: the length of allowed intron within a domain. Two hits of the same domains will be concatenated if the gap between them is less than the intron maximum length (current setting 100).
Concatenated sequences aligned nicely with other elements. -
e_vlaue: the psitblastn e-value cutoff (current setting 0.01).
This value was chosen following (Piednoël et al. BMC Genomics 2011, 12:621). In addition, further support for the homology of the hits is obtained when the structure of a whole element is detected. - maxdist: the maximum distance between repeat pair partners (ugene, current setting 20,000).
-
mindist: the minimum distance between repeat pair partners (ugene, current setting 0).
Since the fragment are about 20,000 bp long, the maxdist and mindist are non-restrictive and are making no assumptions. -
minlength: the minimal repeat length (ugene, current setting 20).
Based on the results of (Piednoël et al. BMC Genomics 2011, 12:621), the shortest repeat sequence is a central repeat sequence 23 bp long. Central repeats of such length are always identical to the part of the terminal repeat they align with.
UGENE parameters may determine whether a sturcture is identified or not for a minority of the identified elements.
- _table.out: A table with the following info for each identified TRE: Fragment(species:contig:YR_start_on_contig), Structure_Indication, start_on_contig, end_on_contig, Contig_length, Fragment_length, YR_evalue, RT_evalue and MT_evalue.
- _YR_fragments.embl: YR fragment sequences in embl format. The features include all the protein domains and repeat sequences found. The TE feature provide the element's approximate borders. Since only identical repeats are searched for, the repeats borders are approximate. The locations provided are fuzzy, although they are formated as precise.
- _YR_fragments.fas: YR fragment sequences in fasta format.
- .xml: The PSIBLAST results. Three .xml files exist, one for each domain (YR, RT and MT). The YR results provide the location of the YR domain on the contig and the RT and MT results provide the location on the YR fragment (ie, the YR extanded hit)
- _aa.fas: Fasta file containing the protein domains aa sequeces. There are three files, one for each domain (YR, RT and MT)
- _direct.out and _inverted.out: UGENE results in GB format.
- The fragments_diagrams and TEs_diagrams folders: Contain diagrams of all the features found on each fragment or only the features between the element borders, respectively
- Pat1-like: ==>=RT=MT=>>=>=YR=>>=
- Kangaroo: ==>=RT=MT=>>=>=RY=>>=
- Undetermined PAT: =>RT=MT=YR=>>=>=>>=
- Dirs1-like: =>=RT=MT=YR=<>>=<<=
- Ngaro: =>RT=YR=>>=>=>>=
- YR-only:Only a YR domain was found. Could indicate a Crypton or a truncated element.
- Inverted domains: Any case in which there are protein domains on different strands. Could be a truncated Kangaroo, some derived form or two adjacent unrelated elements. The latter posibility could potentialy cause two elements to be mistaken as one. To resolve this, only when RT and MT domains are found together with a YR domain within a repeat structure, will the RT and MT domains will be included in the resulting sequence datatasets.
- Two-domains: Any case in which a YR and another domain are identified, either RT or MT. Could be a truncated Ngaro or a degenerate DIRS. Could also be two adjacent unrelated elements. The latter posibility could potentialy cause two elements to be mistaken as one. To resolve this, only when RT and MT domains are found together with a YR domain within a repeat structure, will the RT and MT domains will be included in the resulting sequence datatasets.
- Bxyl Bursaphelenchus xylophilus ftp://ftp.sanger.ac.uk/pub/pathogens/Bursaphelenchus/xylophilus/Assembly-v1.2/BurXv1.2.supercontigs.fa.gz
- Asuu Ascaris suum ftp://ftp.wormbase.org/pub/wormbase/species/a_suum/sequence/genomic/a_suum.PRJNA62057.WS238.genomic.fa.gz
- Wban Wuchereria bancrofti http://www.broadinstitute.org/annotation/genome/filarial_worms/download/?sp=EASupercontigsFasta&sp=SWuchereria_bancrofti_V1&sp=S.zip
- Ppac Pristionchus pacificus ftp://ftp.wormbase.org/pub/wormbase/species/p_pacificus/sequence/genomic/p_pacificus.PRJNA12644.WS238.genomic.fa.gz
- Ebre Enoplus brevis Genepool, in annotation
- Rcul Romanomermis culicivorax Genepool, in annotation
- Dviv Dictyocaulus viviparus Genepool, in annotation
- Pred Panagrellus redivivus http://www.ncbi.nlm.nih.gov/nuccore?term=KB454917:KB455574[PACC]
- Cang Caenorhabditis angaria ftp://ftp.wormbase.org/pub/wormbase/species/c_angaria/sequence/genomic/c_angaria.PRJNA51225.WS238.genomic.fa.gz
- Cbre Caenorhabditis brenneriftp://ftp.wormbase.org/pub/wormbase/species/c_brenneri/sequence/genomic/c_brenneri.PRJNA20035.WS238.genomic.fa.gz
- briC Caenorhabditis briggsae ftp://ftp.wormbase.org/pub/wormbase/species/c_briggsae/sequence/genomic/c_briggsae.PRJNA10731.WS238.genomic.fa.gz
- Cele Caenorhabditis elegans ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/sequence/genomic/c_elegans.WS235.genomic.fa.gz
- Cjap Caenorhabditis japonica ftp://ftp.wormbase.org/pub/wormbase/species/c_japonica/sequence/genomic/c_japonica.PRJNA12591.WS238.genomic.fa.gz
- Crem Caenorhabditis remanei ftp://ftp.wormbase.org/pub/wormbase/species/c_remanei/sequence/genomic/c_remanei.PRJNA53967.WS238.genomic.fa.gz
- C5sp Caenorhabditis sp. 5 ftp://ftp.wormbase.org/pub/wormbase/releases/WS230/species/c_sp5/c_sp5.WS230.genomic.fa.gz
- C11sp Caenorhabditis sp11 http://genome.wustl.edu/pub/organism/Invertebrates/Caenorhabditis_sp11_JU1373/assembly/Caenorhabditis_sp11_JU1373-3.0.1/output/supercontigs.fa.gz
- Hbac Heterorhabditis bacteriophora http://genome.wustl.edu/pub/organism/Invertebrates/Heterorhabditis_bacteriophora/assembly/Heterorhabditis_bacteriophora-1.2.1/output/contigs.fa.gz
- Otip Oscheius tipulae http://nematodes.org/downloads/959nematodegenomes/blast/db/Oscheius_tipulae_clc3_1.fna
- Srat Strongyloides ratti ftp://ftp.sanger.ac.uk/pub/pathogens/Strongyloides/ratti/version_4/Sratti_v4.genome.fa
- Avit Acanthocheilonema viteae http://nematodes.org/downloads/959nematodegenomes/blast/db2/Acanthocheilonema_viteae_v1.0.fna.gz
- Bmal Brugia malayi ftp://ftp.wormbase.org/pub/wormbase/species/b_malayi/sequence/genomic/b_malayi.PRJNA10729.WS238.genomic.fa.gz
- Dimm Dirofilaria immitis http://nematodes.org/downloads/959nematodegenomes/blast/db2/Dirofilaria_immitis_2.2.fna.gz
- Lsig Litomosoides sigmodontis http://badger.bio.ed.ac.uk/filarial/fileDownload/zip_download?fileName=Litomosoides_sigmodontis_2.1.fna
- Lloa Loa loa http://www.broadinstitute.org/annotation/genome/filarial_worms/download/?sp=EASupercontigsFasta&sp=SLoa_loa_V3&sp=S.zip
- Ooch Onchocerca ochengi http://www.nematodes.org/downloads/959nematodegenomes/blast/db2/Onchocerca_ochengi_nuclear_assembly_nOo.2.0.fna.gz
- Ovol Onchocerca volvulus http://www.broadinstitute.org/annotation/genome/filarial_worms/download/?sp=EASupercontigsFasta&sp=SOnchocerca_volvulus_V1&sp=S.zip
- Hcon Haemonchus contortus ftp://ftp.wormbase.org/pub/wormbase/species/h_contortus/sequence/genomic/h_contortus.PRJEB506.WS238.genomic.fa.gz
- Tspi Trichinella spiralis ftp://ftp.wormbase.org/pub/wormbase/species/t_spiralis/sequence/genomic/t_spiralis.PRJNA12603.WS238.genomic.fa.gz
-
Tmur Trichuris muris ftp://ftp.sanger.ac.uk/pub/pathogens/Trichuris/muris/genome/version_2b/Tmuris_v2b.genome_scaffolds.fa
- Gpal Globodera pallida ftp://ftp.sanger.ac.uk/pub/pathogens/Globodera/pallida/Assembly/ARCHIVE/Gpal.genome.30052012.scaffolds.fa.gz
- Haor Howardula aoronymphium http://nematodes.org/downloads/959nematodegenomes/blast/db/Howardula_aoronymphium_clc_1.fna
- Mflo Meloidogyne floridensis Genepool, in annotation
- Mhap Meloidogyne hapla ftp://ftp.wormbase.org/pub/wormbase/species/m_hapla/sequence/genomic/m_hapla.PRJNA29083.WS238.genomic.fa.gz
- Minc Meloidogyne incognita ftp://ftp.wormbase.org/pub/wormbase/species/m_incognita/sequence/genomic/m_incognita.PRJEA28837.WS238.genomic.fa.gz
-
Hduj Hypsibius dujardini http://badger.bio.ed.ac.uk/H_dujardini/fileDownload/zip_download?fileName=nHd.2.3.abv500.fna
- Dpul Daphnia pulex http://genome.jgi.doe.gov/Dappu1/download/Daphnia_pulex.fasta.gz
- Agam Anopheles gambiae http://genome.wustl.edu/pub/organism/Invertebrates/Anopheles_gambiae/assembly/Anopheles_gambiae_S-4.2/output/supercontigs.fa.gz
- Dsim Drosophila simulans ftp://ftp.flybase.net/genomes/Drosophila_simulans/dsim_r1.4_FB2012_03/fasta/dsim-all-chromosome-r1.4.fasta.gz
- Nvi Nasonia vitripennis ftp://ftp.hgsc.bcm.edu/Nvitripennis/fasta/Nvit_2.0/linearScaffolds/Nvit_2.0.linear.fa.gz
- Nve Nematostella vectensis ftp://ftp.jgi-psf.org/pub/JGI_data/Nematostella_vectensis/v1.0/assembly/Nemve1.fasta.gz
- Acal Aplysia californica http://www.broadinstitute.org/ftp/pub/assemblies/invertebrates/aplysia/AplCal3/A_californica_v0.assembly.fasta.gz
- Lgi Lottia gigantea ftp://ftp.jgi-psf.org/pub/JGI_data/Lottia_gigantea/v1.0/Lotgi1_assembly_scaffolds.fasta.gz
- Acas Acanthamoeba castellanii ftp://ftp.hgsc.bcm.edu/AcastellaniNeff/Acas20100210/LinearScaffolds/Acas20100210.contigs.agp.linear.fa
- Alyr Arabidopsis lyrata ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Alyrata/assembly/Alyrata_107.fa.gz
- Ath Arabidopsis thaliana ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/
- Vcar Volvox carteri ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Vcarteri/assembly/Vcarteri_199.fa.gz
- PAT Pristionchus pacificus Pat1-like Mathieu Piednoël, personal communication
- BAS RepBase_Crypton http://www.girinst.org/protected/repbase_extract.php?division=&customdivision=&rank=&type=Crypton&autonomous=1&nonautonomous=&simple=1&format=EMBL&sa=Download
- BAS RepBase_DIRS http://www.girinst.org/protected/repbase_extract.php?division=&customdivision=&rank=&type=DIRS&autonomous=1&nonautonomous=&simple=1&format=EMBL&sa=Download
- RET Retrobase_DIRS http://biocadmin.otago.ac.nz/fmi/xsl/retrobase/type1.xsl?-db=retrobase.fp7&-lay=AllFieldsLayout&-max=all&-sortfield.1=Subtype&-sortfield.2=Type&-sortfield.3=Family&Format=Method&-find
import osgenomes_path = '/media/amir/DATA/work/Genomes/'
genomes = { 'PAT_DNA_seqs_nematoda.fas': 'PAT', 'DIRS_crypton_autonomous.fasta': 'BAS', 'RetroBase.fas': 'RET', 'meloidogyne.velvet.fa': 'Mflo', 'dictyocaulus.velvet.fa': 'Dviv', 'enoplus.clc.fa': 'Ebre', 'romanomermis.clc.fa': 'Rcul', 'Acanthocheilonema_viteae_v1.0.fa': 'Avit', 'a_suum.PRJNA62057.WS238.genomic.fa': 'Asuu', 'BurXv1.2.supercontigs.fa': 'Bxyl', 'Heterorhabditis_bacteriophora-1.2.1.contigs.fa': 'Hbac', 'Howardula_aoronymphium_clc_1.fa': 'Haor', 'h_contortus.PRJEB506.WS238.genomic.fa': 'Hcon', 'Sratti_v4.genome.fa': 'Srat', 'c_briggsae.PRJNA10731.WS238.genomic.fa': 'briC', 'c_elegans.WS235.genomic.fa': 'Cele', 'c_japonica.PRJNA12591.WS238.genomic.fa': 'Cjap', 'b_malayi.PRJNA10729.WS238.genomic.fa': 'Bmal', 'c_brenneri.PRJNA20035.WS238.genomic.fa': 'Cbre', 'c_remanei.PRJNA53967.WS238.genomic.fa': 'Crem', 'Gpal.genome.30052012.scaffolds.fa': 'Gpal', 'Litomosoides_sigmodontis_2.1.fna': 'Lsig', 'p_pacificus.PRJNA12644.WS238.genomic.fa': 'Ppac', 'loa_loa_v3_3_supercontigs.fasta': 'Lloa', 'm_hapla.PRJNA29083.WS238.genomic.fa': 'Mhap', 'm_incognita.PRJEA28837.WS238.genomic.fa': 'Minc', 'c_sp5.WS230.genomic.fa': 'C5sp', 'Dirofilaria_immitis_2.2.fa': 'Dimm', 'Onchocerca_ochengi_nuclear_assembly_nOo.2.0.fna': 'Ooch', 'onchocerca_volvulus_1_supercontigs.fasta': 'Ovol', 'Oscheius_tipulae_clc3_1.fa': 'Otip', 'Panagrellus_redivivus.KB454917-KB455574.fasta': 'Pred', 'Caenorhabditis_sp11_JU1373-3.0.1.supercontigs.fa': 'C11sp', 'c_angaria.PRJNA51225.WS238.genomic.fa': 'Cang', 'Tmuris_v2b.genome_scaffolds.fa': 'Tmur', 't_spiralis.PRJNA12603.WS238.genomic.fa': 'Tspi', 'wuchereria_bancrofti_1_supercontigs.fasta': 'Wban', 'Acas20100210.contigs.agp.linear.fa': 'Acas', 'Anopheles_gambiae._S-4.2.supercontigs.fa': 'Agam', 'A_californica_v0.assembly.fasta': 'Acal', 'Daphnia_pulex.fasta': 'Dpul', 'Drosophila_simulans_w501-1.1.contigs.fa': 'Dsim', 'Lotgi1_assembly_scaffolds.fasta': 'Lgi', 'Nvit_2.0.linear.fa': 'Nvi', 'Nemve1.fasta': 'Nve', 'Ath_TAIR10.fas': 'Ath', 'Vcarteri_199.fa': 'Vcar', 'Alyrata_107.fa': 'Alyr', 'Hypsibius_dujardini_nHd.2.3.abv500.fna': 'Hduj', } for genome_name in genomes.keys(): os.system('python psiblast_ugene_TREs_search_pipline.py ' + genomes[genome_name] + ' ' + genomes_path + ' ' + genome_name + ' ' + genome_name + ' psiblast_ugene_RESULTS/ ' + 'query_model_files/')
%%script perl #!/usr/bin/perluse strict; use warnings; use Bio::SeqIO;
my $folder = 'psiblast_ugene_RESULTS/'; my $PT = "TE_SEQS"; my $outfile = $folder.$PT.".fasta"; open OUT, ">$outfile"; my $x = $folder.'/_YR_fragments.embl';
for my $d (< $x >){ $d =~ m/.+<span class="o">/([^_]+)_YR_fragments.embl/; my $name = $1; my $format = "embl"; my $inseq = Bio::SeqIO->new(-file => $d, -format => $format);
<span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="n">e</span><span class="p">(</span><span class="err">$</span><span class="n">d</span><span class="p">)){</span> <span class="k">while</span> <span class="p">(</span><span class="n">my</span> <span class="err">$</span><span class="n">seq</span> <span class="o">=</span> <span class="err">$</span><span class="n">inseq</span><span class="o">-></span><span class="n">next_seq</span><span class="p">){</span> <span class="k">for</span> <span class="n">my</span> <span class="err">$</span><span class="n">feature</span><span class="p">(</span><span class="err">$</span><span class="n">seq</span> <span class="o">-></span> <span class="n">get_SeqFeatures</span><span class="p">){</span> <span class="k">if</span> <span class="p">(</span><span class="err">$</span><span class="n">feature</span> <span class="o">-></span> <span class="n">primary_tag</span> <span class="n">eq</span> <span class="s">"TE"</span><span class="p">){</span> <span class="n">my</span> <span class="err">$</span><span class="n">cds</span> <span class="o">=</span> <span class="err">$</span><span class="n">feature</span> <span class="o">-></span> <span class="n">spliced_seq</span> <span class="o">-></span> <span class="n">seq</span><span class="p">;</span> <span class="k">for</span> <span class="n">my</span> <span class="err">$</span><span class="n">tag</span> <span class="p">(</span><span class="err">$</span><span class="n">feature</span> <span class="o">-></span> <span class="n">get_all_tags</span><span class="p">){</span> <span class="k">if</span> <span class="p">(</span><span class="err">$</span><span class="n">tag</span> <span class="n">eq</span> <span class="s">"element"</span><span class="p">){</span> <span class="n">my</span> <span class="err">$</span><span class="n">value</span><span class="p">;</span> <span class="k">for</span> <span class="n">my</span> <span class="err">$</span><span class="n">i</span> <span class="p">(</span><span class="err">$</span><span class="n">feature</span> <span class="o">-></span> <span class="n">get_tag_values</span><span class="p">(</span><span class="err">$</span><span class="n">tag</span><span class="p">)){</span> <span class="err">$</span><span class="n">value</span> <span class="o">=</span> <span class="err">$</span><span class="n">i</span><span class="p">;</span> <span class="p">}</span> <span class="k">print</span> <span class="n">OUT</span> <span class="s">">"</span><span class="p">,</span> <span class="err">$</span><span class="n">name</span><span class="p">,</span> <span class="s">'_'</span><span class="p">,</span> <span class="err">$</span><span class="n">seq</span> <span class="o">-></span> <span class="n">accession</span><span class="p">,</span> <span class="s">'_'</span><span class="p">,</span> <span class="err">$</span><span class="n">value</span> <span class="p">,</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="err">$</span><span class="n">cds</span><span class="p">,</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span>
}
close OUT;
import os, sys, re from Bio import SeqIO, SearchIO from Bio.SeqIO import SeqRecord from Bio.Seq import Seq from Bio.Alphabet import IUPACfrom Bio.Blast.Applications import NcbitblastnCommandline, NcbipsiblastCommandline
#first search for GAG proteins with a query adapted for PAT elements
folder = 'psiblast_ugene_RESULTS/' path = './'
os.system('makeblastdb -in ' + folder + 'TE_SEQS.fasta -dbtype nucl')
#make PAT adapted GAG model to search for zinc-fingers in PAT elements psiblast_cline = NcbipsiblastCommandline(in_msa= './GG.msa', subject='./one_seq.txt', out_pssm= 'GG/GG.ckp') stdout, stderr = psiblast_cline()
zf_count = 0 iterate = 1
seen_zf = [] while iterate == 1:
<span class="c">#search TE DNA for GAG AA</span> <span class="n">psiblast_cline</span> <span class="o">=</span> <span class="n">NcbitblastnCommandline</span><span class="p">(</span><span class="n">in_pssm</span><span class="o">=</span><span class="s">'GG/GG.ckp'</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="n">folder</span> <span class="o">+</span> <span class="s">'TE_SEQS.fasta'</span><span class="p">,</span> <span class="n">evalue</span><span class="o">=</span><span class="mf">0.0001</span><span class="p">,</span> <span class="n">outfmt</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">out</span><span class="o">=</span> <span class="n">path</span> <span class="o">+</span> <span class="s">'GG/psiblast.out'</span><span class="p">)</span> <span class="n">stdout</span><span class="p">,</span> <span class="n">stderr</span> <span class="o">=</span> <span class="n">psiblast_cline</span><span class="p">()</span> <span class="c">#make GAG AA fasta file</span> <span class="n">GG_filename</span> <span class="o">=</span> <span class="n">path</span> <span class="o">+</span> <span class="s">'GG/GG.fas'</span> <span class="n">GG_blast_results</span> <span class="o">=</span> <span class="n">SearchIO</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">path</span> <span class="o">+</span> <span class="s">'GG/psiblast.out'</span><span class="p">,</span> <span class="s">'blast-xml'</span><span class="p">)</span> <span class="n">GG_seqs</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">hit</span> <span class="ow">in</span> <span class="n">GG_blast_results</span><span class="p">:</span> <span class="k">for</span> <span class="n">hsp</span> <span class="ow">in</span> <span class="n">hit</span><span class="o">.</span><span class="n">hsps</span><span class="p">:</span> <span class="n">GG_seqs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">hsp</span><span class="o">.</span><span class="n">hit</span><span class="p">)</span> <span class="c">#make singular names and remove gaps</span> <span class="n">singular</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">record</span> <span class="ow">in</span> <span class="n">GG_seqs</span><span class="p">:</span> <span class="n">record</span><span class="o">.</span><span class="n">seq</span> <span class="o">=</span> <span class="n">Seq</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s">'-'</span><span class="p">,</span><span class="s">''</span><span class="p">,</span><span class="nb">str</span><span class="p">(</span><span class="n">record</span><span class="o">.</span><span class="n">seq</span><span class="p">)),</span> <span class="n">IUPAC</span><span class="o">.</span><span class="n">protein</span><span class="p">)</span> <span class="n">record</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">singular</span><span class="p">)</span> <span class="o">+</span> <span class="s">'_'</span> <span class="o">+</span> <span class="n">record</span><span class="o">.</span><span class="n">id</span> <span class="n">singular</span> <span class="o">+=</span> <span class="mi">1</span> <span class="n">SeqIO</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">GG_seqs</span><span class="p">,</span> <span class="n">GG_filename</span><span class="p">,</span> <span class="s">"fasta"</span><span class="p">)</span> <span class="c">#Search for zinc finger motifs</span> <span class="n">ZF_pattrns</span> <span class="o">=</span> <span class="p">{</span> <span class="s">'dirs1'</span><span class="p">:</span> <span class="p">[</span><span class="s">'C[A-Z]{3}C[A-Z]{9}H[A-Z]{2}C[A-Z]{2}C[A-Z]{4}H[A-Z]{9}C[A-Z]HC'</span><span class="p">,</span> <span class="s">'C[A-Z]{3}C[A-Z]{9}H[A-Z]{2}C[A-Z]{2}C[A-Z]{4}H[A-Z]{10}C[A-Z]HC'</span><span class="p">,</span> <span class="s">'C[A-Z]{2}CC[A-Z]{6}H[A-Z]{4}C[A-Z]{2}C'</span><span class="p">,</span> <span class="s">'C[A-Z]{2}C[A-Z]{9}H[A-Z]{2}C[A-Z]{2}C[A-Z]{4}H[A-Z]{4}C[A-Z]{2}C'</span> <span class="p">],</span> <span class="s">'pat1_Ngaro'</span><span class="p">:</span> <span class="p">[</span><span class="s">'C[A-Z]{2}C[A-Z]{4}H[A-Z]{4}C'</span><span class="p">],</span> <span class="s">'Ngaro'</span><span class="p">:</span> <span class="p">[</span><span class="s">'C[A-Z]{7}C[A-Z]{2}H[A-Z]C'</span><span class="p">,</span> <span class="s">'C[A-Z]{2}C[A-Z]{3}H[A-Z]{4}C'</span><span class="p">,</span> <span class="s">'C[A-Z]{7}C[A-Z]{5}C[A-Z]{3}H'</span><span class="p">,</span> <span class="s">'C[A-Z]{8}C[A-Z]{5}C[A-Z]{3}H'</span><span class="p">,</span> <span class="s">'C[A-Z]{2}C[A-Z]{3}H[A-Z]{6}C'</span><span class="p">,</span> <span class="s">'C[A-Z]{8}C[A-Z]{4}H[A-Z]{3}H'</span><span class="p">,</span> <span class="s">'C[A-Z]{2}C[A-Z]{4}H[A-Z]{4}H'</span><span class="p">,</span> <span class="s">'C[A-Z]{11}C[A-Z]H[A-Z]{6}H'</span><span class="p">,</span> <span class="s">'C[A-Z]{2}C[A-Z]{4}H[A-Z]{6}C'</span><span class="p">,</span> <span class="s">'C[A-Z]{13}C[A-Z]{5}C[A-Z]{5}H'</span><span class="p">,</span> <span class="p">]</span> <span class="p">}</span> <span class="n">new_zf_count</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">ZF_hits</span> <span class="o">=</span> <span class="p">{}</span> <span class="n">records</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">SeqIO</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">GG_filename</span><span class="p">,</span> <span class="s">'fasta'</span><span class="p">))</span> <span class="k">for</span> <span class="n">record</span> <span class="ow">in</span> <span class="n">records</span><span class="p">:</span> <span class="n">ZF_hits</span><span class="p">[</span><span class="n">record</span><span class="o">.</span><span class="n">id</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span> <span class="k">for</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">ZF_pattrns</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="n">ZF_hits</span><span class="p">[</span><span class="n">record</span><span class="o">.</span><span class="n">id</span><span class="p">][</span><span class="n">element</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">pattern</span> <span class="ow">in</span> <span class="n">ZF_pattrns</span><span class="p">[</span><span class="n">element</span><span class="p">]:</span> <span class="n">hits</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">record</span><span class="o">.</span><span class="n">seq</span><span class="p">),</span> <span class="n">re</span><span class="o">.</span><span class="n">IGNORECASE</span><span class="p">)</span> <span class="n">ZF_hits</span><span class="p">[</span><span class="n">record</span><span class="o">.</span><span class="n">id</span><span class="p">][</span><span class="n">element</span><span class="p">]</span> <span class="o">+=</span> <span class="nb">len</span><span class="p">(</span><span class="n">hits</span><span class="p">)</span> <span class="k">if</span> <span class="n">ZF_hits</span><span class="p">[</span><span class="n">record</span><span class="o">.</span><span class="n">id</span><span class="p">][</span><span class="n">element</span><span class="p">]</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span> <span class="n">new_zf_count</span> <span class="o">+=</span> <span class="mi">1</span> <span class="n">line</span> <span class="o">=</span> <span class="p">(</span> <span class="n">record</span><span class="o">.</span><span class="n">id</span> <span class="o">+</span> <span class="s">' '</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ZF_hits</span><span class="p">[</span><span class="n">record</span><span class="o">.</span><span class="n">id</span><span class="p">][</span><span class="n">element</span><span class="p">])</span> <span class="o">+</span> <span class="s">' '</span> <span class="o">+</span> <span class="n">element</span> <span class="p">)</span> <span class="n">line_parts</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'_'</span><span class="p">)</span> <span class="n">line</span> <span class="o">=</span> <span class="s">'_'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">line_parts</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">seen_zf</span><span class="p">:</span> <span class="n">seen_zf</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span> <span class="k">if</span> <span class="n">new_zf_count</span> <span class="o">==</span> <span class="n">zf_count</span><span class="p">:</span> <span class="n">iterate</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">zf_count</span> <span class="o">=</span> <span class="n">new_zf_count</span> <span class="c">#align hit GAG AA sequences</span> <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'mafft '</span> <span class="o">+</span> <span class="n">GG_filename</span> <span class="o">+</span> <span class="s">' > '</span> <span class="o">+</span> <span class="n">GG_filename</span> <span class="o">+</span><span class="s">'.aln'</span><span class="p">)</span> <span class="c">#convert to blast in_msa format</span> <span class="n">SeqIO</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="n">GG_filename</span> <span class="o">+</span><span class="s">'.aln'</span><span class="p">,</span> <span class="s">'fasta'</span><span class="p">,</span> <span class="n">GG_filename</span> <span class="o">+</span><span class="s">'.msa'</span><span class="p">,</span> <span class="s">'clustal'</span><span class="p">)</span> <span class="n">IN</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">GG_filename</span> <span class="o">+</span><span class="s">'.msa'</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span><span class="o">.</span><span class="n">readlines</span><span class="p">()</span> <span class="n">OUT</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">GG_filename</span> <span class="o">+</span><span class="s">'.msa'</span><span class="p">,</span> <span class="s">'wt'</span><span class="p">)</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">IN</span><span class="p">[</span><span class="mi">3</span><span class="p">:]:</span> <span class="n">OUT</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">line</span><span class="p">)</span> <span class="n">OUT</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c">#make new ckp file from new alignment</span> <span class="n">psiblast_cline</span> <span class="o">=</span> <span class="n">NcbipsiblastCommandline</span><span class="p">(</span><span class="n">in_msa</span> <span class="o">=</span> <span class="n">GG_filename</span> <span class="o">+</span><span class="s">'.msa'</span><span class="p">,</span> <span class="n">subject</span> <span class="o">=</span> <span class="s">'./one_seq.txt'</span><span class="p">,</span> <span class="n">out_pssm</span> <span class="o">=</span> <span class="s">'./GG/GG.ckp'</span><span class="p">)</span> <span class="n">stdout</span><span class="p">,</span> <span class="n">stderr</span> <span class="o">=</span> <span class="n">psiblast_cline</span><span class="p">()</span>
for line in seen_zf: print line
Lgi_150_453169_PAT 1 pat1_Ngaro Lgi_59_1690142_DIRS 1 pat1_Ngaro Lgi_59_1685024_DIRS 1 pat1_Ngaro Lgi_246_5989_DIRS 1 pat1_Ngaro Lgi_246_5423_DIRS 1 pat1_Ngaro Nve_318_58153_PAT 1 pat1_Ngaro Nve_318_58227_PAT 1 pat1_Ngaro Hcon_0059030_17723_pat1-like 1 pat1_Ngaro Hcon_0059030_17196_pat1-like 1 pat1_Ngaro Rcul_11614_5929_DIRS 1 pat1_Ngaro Rcul_65775_27385_DIRS 1 pat1_Ngaro
/usr/local/lib/python2.7/dist-packages/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release. BiopythonExperimentalWarning)
For the purpose of reconstructing a phylogenetic tree, RT hits from all the assemblies are joind into one file
import os, sys from Bio import SeqIO folder = 'psiblast_ugene_RESULTS/' dirs = os.listdir(folder) DIRS = [] for name in dirs: if not '.' in name: DIRS.append(name)RT_records = []
bad_hits = ( #These are crypton sequences with a short RT like hit inside their YR orf 'Lgi:sca_27:177842:1:two_domains', 'Lgi:sca_268:84059:1:two_domains', 'Lgi:sca_268:84059:2:two_domains', 'Lgi:sca_268:92085:1:two_domains', 'Lgi:sca_268:92085:2:two_domains', 'Lgi:sca_46:1937219:1:two_domains', 'Lgi:sca_197:139221:1:two_domains', 'BAS:CryptonA-1_LG:614:1:two_domains', 'Lgi:sca_59:5984:1:two_domains', 'Lgi:sca_225:109912:1:two_domains', 'Lgi:sca_38:570163:1:two_domains', 'Lgi:sca_113:551573:1:two_domains', 'Lgi:sca_113:551191:1:two_domains', 'Hduj:nHd.2.3.scaf00001:126730:1:two_domains', #These are elements with multiple RT sequences that cluster 'Hduj:nHd.2.3.scaf00001:127260:1:two_domains', #with the outgroup. ie, they have adjecent LTR which wrongly 'Lgi:sca_75:937380:1:DIRS', #passed previous filters 'Lgi:sca_18:2161112:1:two_domains', 'Lgi:sca_268:84095:1:two_domains', 'Nve:scaffold_182:226621:1:two_domains', 'Ppac:Ppa_Contig83:451527:1:DIRS', 'Acal:scaffold01513:41419:1:two_domains' ) for dir in DIRS: if os.path.exists(folder + dir + '/' + dir + '_RT_aa.fas'): records = SeqIO.parse(folder + dir + '/' + dir + '_RT_aa.fas', 'fasta') for record in records: if not record.description in bad_hits: RT_records.append(record)
for records in [RT_records]: print str(len(records)) SeqIO.write(RT_records, 'RT/all_RT.fas', 'fasta')
550
550
import os, sys from Bio import SeqIO folder = 'psiblast_ugene_RESULTS/' dirs = os.listdir(folder) DIRS = [] for name in dirs: if not '.' in name: DIRS.append(name)YR_records = [] for dir in DIRS: if os.path.exists(folder + dir + '/' + dir + '_YR_aa.fas'): records = SeqIO.parse(folder + dir + '/' + dir + '_YR_aa.fas', 'fasta') for record in records: YR_records.append(record)
SeqIO.write(YR_records, 'YR_tree/all_YR.fas', 'fasta')
2753
import os, sys from Bio import SeqIO folder = './'in_YR_records = list(SeqIO.parse(folder + 'YR_tree/all_YR.fas','fasta')) out_YR_records = []
in_RT_records = list(SeqIO.parse(folder + 'RT/all_RT.fas','fasta')) out_RT_records = []
for rt_record in in_RT_records: for yr_record in in_YR_records: if rt_record.id == yr_record.id: out_RT_records.append(rt_record) out_YR_records.append(yr_record)
records = {'YR': out_YR_records, 'RT': out_RT_records} for i in records.keys(): print str(len(records[i])), ' sequences selected for ', i
SeqIO.write(out_YR_records, folder + 'comparable_trees/YR.fas', 'fasta') SeqIO.write(out_RT_records, folder + 'comparable_trees/RT.fas', 'fasta')
489 sequences selected for RT 489 sequences selected for YR
489
from Bio import SeqIO folder = './'RT_no_outgroup_records = open(folder + 'comparable_trees/RT.fas', 'r').read() YR_no_outgroup_records = list(SeqIO.parse(folder + 'comparable_trees/YR.fas', 'fasta'))
RT_with_outgroup_records = open(folder + 'comparable_trees/RT_outgrouped.fas', 'wt') YR_with_outgroup_records = YR_no_outgroup_records
RT_outgroup = open(folder + 'LTR_sequences/RT_outgroup.fas', 'r').read()
RT_with_outgroup_records.write(RT_no_outgroup_records + RT_outgroup) RT_with_outgroup_records.close()
all_YR_records = list(SeqIO.parse(folder + 'YR_tree/all_YR.fas','fasta')) for yr_record in all_YR_records: if 'rypton' in yr_record.id: YR_with_outgroup_records.append(yr_record) SeqIO.write(YR_with_outgroup_records, folder + 'comparable_trees/YR_outgrouped.fas', 'fasta')
520
The following script replaces offensive header characters and eliminates gaps in the some of the sequences that were added manually
import os, sys, re from Bio import SeqIOpath = './' aa_files = ['YR_tree/all_YR.fas', 'comparable_trees/RT.fas', 'comparable_trees/RT_outgrouped.fas', 'comparable_trees/YR.fas', 'comparable_trees/YR_outgrouped.fas']
for aa_file in aa_files: corrected_filename = aa_file.split('.')[0] + 'corrected.fas' corrected_file = open(path + corrected_filename,'wt') records = SeqIO.parse(path + aa_file, 'fasta') for record in records: record.description = re.sub(':','',record.description) record.description = re.sub(';','',record.description) record.description = re.sub('|','',record.description) record.description = re.sub('*','',record.description) record.description = re.sub('-','',record.description) record.description = re.sub(',','',record.description) record.description = re.sub('/','',record.description) record.description = re.sub(' ','',record.description) record.description = re.sub('.','',record.description) corrected_file.write('>' + record.description.split(' ')[0] + '\n') sequence = re.sub('*','X',str(record.seq)) sequence = re.sub('J','X',sequence) sequence = re.sub('-','',sequence) corrected_file.write(sequence + '\n') corrected_file.close()
from Bio import SeqIO minimum_length = 90 path = './' trimmed_records = []records = list(SeqIO.parse(path + 'YR_tree/all_YR_corrected.fas', 'fasta'))
for record in records: res_count = 0 sequence = str(record.seq) for res in sequence: if not res == '-': res_count += 1
<span class="k">if</span> <span class="n">res_count</span> <span class="o">>=</span> <span class="n">minimum_length</span><span class="p">:</span> <span class="n">trimmed_records</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">record</span><span class="p">)</span>
SeqIO.write(trimmed_records, path + 'YR_tree/all_YR_corrected.fas', 'fasta')
1842
import ospath = './' aa_files = ['YR_tree/all_YR_corrected.fas', 'comparable_trees/RT_corrected.fas', 'comparable_trees/RT_outgrouped_corrected.fas', 'comparable_trees/YR_corrected.fas', 'comparable_trees/YR_outgrouped_corrected.fas'] for corrected_aa_file in aa_files: input_file = path + corrected_aa_file output_file = path + corrected_aa_file.split('.')[0] + '.mafft.fas' os.system('mafft ' + input_file + ' > ' + output_file)
This is done using the program TrimAl
http://trimal.cgenomics.org/
import os from Bio import SeqIOpath = './' aa_files = ['YR_tree/all_YR_corrected.mafft.fas', 'comparable_trees/RT_corrected.mafft.fas', 'comparable_trees/RT_outgrouped_corrected.mafft.fas', 'comparable_trees/YR_corrected.mafft.fas', 'comparable_trees/YR_outgrouped_corrected.mafft.fas'] for mafft_aa_file in aa_files: input_file = path + mafft_aa_file output_file = path + mafft_aa_file.split('.')[0] + '.mafft_trim07.fas' os.system('trimal -in ' + input_file + ' -out ' + output_file + ' -gt 0.3')
aa_files = ['YR_tree/all_YR_corrected.mafft_trim07.fas', 'comparable_trees/RT_corrected.mafft_trim07.fas', 'comparable_trees/RT_outgrouped_corrected.mafft_trim07.fas', 'comparable_trees/YR_corrected.mafft_trim07.fas', 'comparable_trees/YR_outgrouped_corrected.mafft_trim07.fas']
for trimal_aa_file in aa_files: records = list(SeqIO.parse(trimal_aa_file, 'fasta')) for record in records: record.description = '' SeqIO.write(records, trimal_aa_file, 'fasta')
from Bio.SeqRecord import SeqRecord from Bio import SeqIOpath = './' RT_records = list(SeqIO.parse(path + 'comparable_trees/RT_corrected.mafft_trim07.fas','fasta')) YR_records = list(SeqIO.parse(path + 'comparable_trees/YR_corrected.mafft_trim07.fas', 'fasta')) RT_YR_records = []
for rt_record in RT_records: for yr_record in YR_records: if rt_record.id == yr_record.id: RT_YR_record = SeqRecord(seq=rt_record.seq+yr_record.seq, id=yr_record.id) RT_YR_records.append(RT_YR_record)
SeqIO.write(RT_YR_records, path + 'comparable_trees/RTYR_corrected.mafft_trim07.fas','fasta')
489
from Bio import SeqIO minimum_length = 60 path = './' trimmed_records = []records = list(SeqIO.parse(path + 'YR_tree/all_YR_corrected.mafft_trim07.fas', 'fasta'))
for record in records: res_count = 0 sequence = str(record.seq) for res in sequence: if not res == '-': res_count += 1
<span class="k">if</span> <span class="n">res_count</span> <span class="o">>=</span> <span class="n">minimum_length</span><span class="p">:</span> <span class="n">trimmed_records</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">record</span><span class="p">)</span>
SeqIO.write(trimmed_records, path + 'YR_tree/all_YR_corrected.mafft_trim07_minlen30.fas', 'fasta')
1662
import os path = './' aa_files = ['YR_tree/all_YR_corrected.mafft_trim07_minlen30.fas', 'comparable_trees/RT_corrected.mafft_trim07.fas', 'comparable_trees/RT_outgrouped_corrected.mafft_trim07.fas', 'comparable_trees/YR_corrected.mafft_trim07.fas', 'comparable_trees/YR_outgrouped_corrected.mafft_trim07.fas', 'comparable_trees/RTYR_corrected.mafft_trim07.fas' ] for aa_file in aa_files: input_file = path + aa_file output_file = path + aa_file.split('.')[0] + '.tre' os.system('FastTreeMP -gamma -spr 4 -mlacc 2 -slownni -pseudo ' + input_file + ' > ' + output_file) #model is JTT
The RT dataset excludes LTR sequences and the YR dataset excludes crypton sequences in order to have identical element representations in all three datasets. Per-site log likelihoods were computed with RAxML 7.3.5. The AU test was computed with Consel 0.20. The results did not support partition concatenation.
from ete2 import Tree import dendropy, re, ospath = './' if os.path.exists('consel.log'): os.remove('consel.log')
#make tree file with resolved polytomies tree_count = 1
consel_trees = open('consel_trees.tre','wt') for tree in ['comparable_trees/RT_corrected.tre', 'comparable_trees/YR_corrected.tre', 'comparable_trees/RTYR_corrected.tre' ]: os.system('echo ' + str(tree_count) + '_' + tree + ' >> consel.log') tree_count += 1 t = Tree(tree) t.resolve_polytomy(recursive=True) t.set_outgroup(t&'BAS_CcNgaro3_4209_1_viper_or_Ngaro') consel_trees.write(t.write(format = 5) + '\n') consel_trees.close()
for dataset in ['comparable_trees/RT_corrected.mafft_trim07.fas', 'comparable_trees/YR_corrected.mafft_trim07.fas', 'comparable_trees/RTYR_corrected.mafft_trim07.fas']:
<span class="c">#convert to phylip for raxml</span> <span class="n">d</span> <span class="o">=</span> <span class="n">dendropy</span><span class="o">.</span><span class="n">DataSet</span><span class="o">.</span><span class="n">get_from_path</span><span class="p">(</span> <span class="n">dataset</span><span class="p">,</span> <span class="s">"fasta"</span><span class="p">,</span> <span class="c"># taxon_set=None,</span> <span class="n">data_type</span><span class="o">=</span><span class="s">"protein"</span><span class="p">,</span> <span class="n">row_type</span><span class="o">=</span><span class="s">'rich'</span><span class="p">)</span> <span class="n">d</span><span class="o">.</span><span class="n">write_to_path</span><span class="p">(</span> <span class="n">dataset</span> <span class="o">+</span> <span class="s">".phy"</span><span class="p">,</span> <span class="s">'phylip'</span><span class="p">,</span> <span class="n">taxon_set</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">strict</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">space_to_underscores</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">force_unique_taxon_labels</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="c">#make a correction to fasta headers</span> <span class="n">IN</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">dataset</span> <span class="o">+</span> <span class="s">".phy"</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span><span class="o">.</span><span class="n">readlines</span><span class="p">()</span> <span class="n">newlines</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">IN</span><span class="p">:</span> <span class="n">newline</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s">' \<unknown description\>'</span><span class="p">,</span> <span class="s">''</span><span class="p">,</span> <span class="n">line</span><span class="p">)</span> <span class="n">newlines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">newline</span><span class="p">)</span> <span class="n">OUT</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">dataset</span> <span class="o">+</span> <span class="s">".phy"</span><span class="p">,</span> <span class="s">'wt'</span><span class="p">)</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">newlines</span><span class="p">:</span> <span class="n">OUT</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">line</span><span class="p">)</span> <span class="n">OUT</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c">#compute per-site log likelihoods</span> <span class="kn">import</span> <span class="nn">os</span> <span class="n">cwd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getcwd</span><span class="p">()</span> <span class="n">dataname</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'/'</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'_'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="n">files</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="s">'./'</span><span class="p">)</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span> <span class="k">if</span> <span class="s">'RAxML'</span> <span class="ow">in</span> <span class="n">name</span><span class="p">:</span> <span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'raxmlHPC -f g -s '</span> <span class="o">+</span> <span class="n">dataset</span> <span class="o">+</span> <span class="s">'.phy -z consel_trees.tre -m PROTGAMMAJTT -n '</span> <span class="o">+</span> <span class="n">dataname</span> <span class="o">+</span> <span class="s">' -w '</span> <span class="o">+</span> <span class="n">cwd</span><span class="p">)</span> <span class="c">#au-test</span> <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'makermt -b 10 --puzzle RAxML_perSiteLLs.'</span> <span class="o">+</span> <span class="n">dataname</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'consel RAxML_perSiteLLs'</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'echo >> consel.log; echo '</span> <span class="o">+</span> <span class="n">dataset</span> <span class="o">+</span> <span class="s">' >> consel.log; catpv RAxML_perSiteLLs >> consel.log'</span> <span class="p">)</span>
log = open('consel.log', 'r').read() print log
files = os.listdir('./') for name in files: if 'RAxML' in name: os.remove(name)
1_comparable_trees/RT_corrected.tre 2_comparable_trees/YR_corrected.tre 3_comparable_trees/RTYR_corrected.trecomparable_trees/RT_corrected.mafft_trim07.fas
comparable_trees/YR_corrected.mafft_trim07.fas
comparable_trees/RTYR_corrected.mafft_trim07.fas
Following a preliminary tree reconstruction, the diagrams of elements that seemed misplaced were examined and their classification was manually corrected if the pipeline indication did not seem to feet the diagram (see above for reasons why).
from ete2 import Tree, NodeStyle, TreeStyle, TextFace, PhyloTree import syspath = './' tre_files = ['YR_tree/all_YR_corrected.tre', 'comparable_trees/RT_corrected.tre', 'comparable_trees/RT_outgrouped_corrected.tre', 'comparable_trees/YR_corrected.tre', 'comparable_trees/YR_outgrouped_corrected.tre', 'comparable_trees/RTYR_corrected.tre' ] name_replacements = open(path + 'renamed_nodes.txt','r').readlines()
for tre_file in tre_files: replaced_nodes = open(path + tre_file.split('.')[0] + '_replaced_nodes.txt', 'wt') replaced_count = 0 tree = Tree(path + tre_file) for line in name_replacements: line = line.rstrip('\n') (old_name, new_name) = line.split('\t') if old_name in tree.get_leaf_names(): (tree&old_name).name = new_name replaced_count += 1 replaced_nodes.write(old_name + '\t' + new_name + '\n') print 'Replaced ' + str(replaced_count) + ' node names out of ' + str(len(tree)) + ' in tree ' + tre_file print 'Replacements are in ' + tre_file.split('.')[0] + '_replaced_nodes.txt' print replaced_nodes.write('Replaced ' + str(replaced_count) + ' node names out of ' + str(len(tree)) + ' in tree ' + tre_file + '\nReplacements are in ' + tre_file.split('.')[0] + '_replaced_nodes.txt') replaced_nodes.close() tree.write(outfile= path + tre_file.split('.')[0] + '_reviewed.tre', format=1, features=['support'])
Replaced 9 node names out of 1662 in tree YR_tree/all_YR_corrected.tre Replacements are in YR_tree/all_YR_corrected_replaced_nodes.txtReplaced 1 node names out of 489 in tree comparable_trees/RT_corrected.tre Replacements are in comparable_trees/RT_corrected_replaced_nodes.txt
Replaced 1 node names out of 520 in tree comparable_trees/RT_outgrouped_corrected.tre Replacements are in comparable_trees/RT_outgrouped_corrected_replaced_nodes.txt
Replaced 1 node names out of 489 in tree comparable_trees/YR_corrected.tre Replacements are in comparable_trees/YR_corrected_replaced_nodes.txt
Replaced 1 node names out of 520 in tree comparable_trees/YR_outgrouped_corrected.tre Replacements are in comparable_trees/YR_outgrouped_corrected_replaced_nodes.txt
Replaced 1 node names out of 489 in tree comparable_trees/RTYR_corrected.tre Replacements are in comparable_trees/RTYR_corrected_replaced_nodes.txt
Nodes were collapsed if they had a sh-like support grater than 0.7 in the reduced YR tree or 0.9 in the full YR tree and if they did not contain conflicting classificaionts. In addition, in unsupported clades (sh-support < 0.95) nodes that have long branches (4 times the median branch length in the clade) are removed for the sake of the comparisson between the full YR tree and the reduced YR tree.
from ete2 import Tree, NodeStyle, TreeStyle, TextFace, PhyloTree, CircleFace, PieChartFace import sys, numpy path = './'def blen_cutoff(node,brlns_cutoff_factor): blens = [] for br in node.traverse(): if not br.name == node.name and not br.dist == 0: blens.append(float(br.dist)) return numpy.median(blens)*brlns_cutoff_factor
tree_string_files = [ 'YR_tree/all_YR_corrected_reviewed.tre', #'comparable_trees/RT_corrected_reviewed.tre', 'comparable_trees/RT_outgrouped_corrected_reviewed.tre', #'comparable_trees/RTYR_corrected_reviewed.tre', #'comparable_trees/YR_corrected_reviewed.tre', 'comparable_trees/YR_outgrouped_corrected_reviewed.tre' ]
for tree_string_file in tree_string_files: support_cutoff = 0.01 if tree_string_file == 'YR_tree/all_YR_corrected_reviewed.tre': support_cutoff = 0.9 #higher cutoff for phylo-placing than for just phylo presenting elif tree_string_file == 'comparable_trees/YR_outgrouped_corrected_reviewed.tre': support_cutoff = 0.7 #higher cutoff for phylo-placing than for just phylo presenting
<span class="n">tree_string</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span> <span class="o">+</span> <span class="n">tree_string_file</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="n">t</span> <span class="o">=</span> <span class="n">Tree</span><span class="p">(</span><span class="n">tree_string</span><span class="p">,</span> <span class="n">format</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">print</span> <span class="nb">len</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="n">node_number</span> <span class="o">=</span> <span class="mi">1</span> <span class="c">#We are going to give running numbers to all the nodes for easier access later on</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">t</span><span class="o">.</span><span class="n">traverse</span><span class="p">():</span> <span class="k">if</span> <span class="n">n</span><span class="o">.</span><span class="n">is_leaf</span> <span class="ow">and</span> <span class="s">'rypton'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">n</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">n</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'_crypton'</span> <span class="c"># This will make the annotation easier</span> <span class="k">if</span> <span class="n">n</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="n">support_cutoff</span><span class="p">:</span> <span class="n">n</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span> <span class="c"># multifrucate unsupported bifrucarions</span> <span class="n">print_collapsed_nodes</span> <span class="o">=</span> <span class="mi">1</span> <span class="c"># will make a table of the collaped nodes, named by running numbers, and their </span> <span class="c"># leaf content</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">t</span><span class="o">.</span><span class="n">traverse</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">n</span><span class="o">.</span><span class="n">is_leaf</span><span class="p">():</span> <span class="n">organisms_in_node</span> <span class="o">=</span> <span class="p">{</span> <span class="c"># will be used to count taxa in collapsed nodes </span> <span class="s">'PAT'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'BAS'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'RET'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Mflo'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Dviv'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Ebre'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Rcul'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Avit'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Asuu'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Bxyl'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Hbac'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Haor'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Hcon'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Srat'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'briC'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Cele'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Cjap'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Bmal'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Cbre'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Crem'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Gpal'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Lsig'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Ppac'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Lloa'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Mhap'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Minc'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'C5sp'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Dimm'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Ooch'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Ovol'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Otip'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Pred'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'C11sp'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Cjap'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Cang'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Tmur'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Tspi'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Wban'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Acas'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Agam'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Acal'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Dpul'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Dsim'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Lgi'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Nvi'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Nve'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Ath'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Vcar'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Alyr'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'Hduj'</span><span class="p">:</span> <span class="mi">0</span> <span class="p">}</span> <span class="n">elements_in_node</span> <span class="o">=</span> <span class="p">{</span> <span class="c">#will be used to count element types in collapsed nodes</span> <span class="s">'dirs1_like'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'pat1_like'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'PAT'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'viper_or_Ngaro'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'kangaroo'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'DIRS'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'domains_inverted'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'two_domains'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'crypton'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'DERIVED'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'GYPSY'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'COPIA'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">'BEL'</span><span class="p">:</span> <span class="mi">0</span> <span class="p">}</span> <span class="c"># node will be collapsed only if all the leafs can be placed in only one of these arrays</span> <span class="n">dirs1_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'dirs1_like'</span><span class="p">,</span> <span class="s">'DIRS'</span><span class="p">,</span> <span class="s">'two_domains'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">)</span> <span class="n">pat_like_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'pat1_like'</span><span class="p">,</span> <span class="s">'DIRS'</span><span class="p">,</span> <span class="s">'two_domains'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">)</span> <span class="n">toc_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'PAT'</span><span class="p">,</span> <span class="s">'DIRS'</span><span class="p">,</span> <span class="s">'two_domains'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">)</span> <span class="n">viper_or_Ngaro_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'viper_or_Ngaro'</span><span class="p">,</span> <span class="s">'two_domains'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">)</span> <span class="n">kangaroo_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'kangaroo'</span><span class="p">,</span> <span class="s">'domains_inverted'</span><span class="p">,</span><span class="s">'two_domains'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">,</span> <span class="s">'DIRS'</span><span class="p">)</span> <span class="n">DIRS_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'DIRS'</span><span class="p">,</span> <span class="s">'two_domains'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">)</span> <span class="n">domains_inverted_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'domains_inverted'</span><span class="p">,</span><span class="s">'two_domains'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">)</span> <span class="n">two_domains_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'two_domains'</span><span class="p">,</span><span class="s">'domains_inverted'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">,</span><span class="s">'DERIVED'</span><span class="p">)</span> <span class="n">crypton_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'crypton'</span><span class="p">,</span> <span class="s">'YR_only'</span><span class="p">)</span> <span class="n">YR_hirarchy</span> <span class="o">=</span> <span class="n">crypton_hirarchy</span> <span class="n">GYPSY_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'GYPSY'</span><span class="p">)</span> <span class="n">COPIA_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'COPIA'</span><span class="p">)</span> <span class="n">BEL_hirarchy</span> <span class="o">=</span> <span class="p">(</span><span class="s">'BEL'</span><span class="p">)</span> <span class="n">n</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">'node'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_number</span><span class="p">)</span> <span class="c">#naming the node</span> <span class="n">node_number</span> <span class="o">+=</span> <span class="mi">1</span> <span class="c"># prepare species counts and element counts to be added to all nodes</span> <span class="k">for</span> <span class="n">leaf</span> <span class="ow">in</span> <span class="n">n</span><span class="p">:</span> <span class="k">for</span> <span class="n">orgn</span> <span class="ow">in</span> <span class="n">organisms_in_node</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="c"># orgn is the species key</span> <span class="n">search_length</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">orgn</span><span class="p">)</span> <span class="k">if</span> <span class="n">leaf</span><span class="o">.</span><span class="n">name</span><span class="p">[:</span><span class="n">search_length</span><span class="p">]</span> <span class="o">==</span> <span class="n">orgn</span><span class="p">:</span> <span class="c"># if begining of the leaf name same as orgn</span> <span class="n">organisms_in_node</span><span class="p">[</span><span class="n">orgn</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="c"># count it</span> <span class="k">for</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">elements_in_node</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="n">search_length</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">element</span><span class="p">)</span> <span class="k">if</span> <span class="n">leaf</span><span class="o">.</span><span class="n">name</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="o">*</span><span class="n">search_length</span><span class="p">:]</span> <span class="o">==</span> <span class="n">element</span><span class="p">:</span> <span class="c"># if end of the leaf name same as element</span> <span class="n">elements_in_node</span><span class="p">[</span><span class="n">element</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="c"># count it </span> <span class="k">for</span> <span class="n">orgn</span> <span class="ow">in</span> <span class="n">organisms_in_node</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="c"># get read of species codes with zero count </span> <span class="k">if</span> <span class="n">organisms_in_node</span><span class="p">[</span><span class="n">orgn</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="n">organisms_in_node</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">orgn</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="k">for</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">elements_in_node</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="c"># get read of elements with zero count</span> <span class="k">if</span> <span class="n">elements_in_node</span><span class="p">[</span><span class="n">element</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="n">elements_in_node</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">element</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="n">n</span><span class="o">.</span><span class="n">add_features</span><span class="p">(</span><span class="n">species</span><span class="o">=</span><span class="n">organisms_in_node</span><span class="p">,</span> <span class="n">elements</span><span class="o">=</span><span class="n">elements_in_node</span><span class="p">)</span> <span class="c"># add speceis count and element counts</span> <span class="c"># as node features</span> <span class="c"># write tree with node names for future reference </span> <span class="n">t</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">outfile</span><span class="o">=</span><span class="n">path</span> <span class="o">+</span> <span class="n">tree_string_file</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'.'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="s">'_internal_nodes.tre'</span><span class="p">,</span> <span class="n">format</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">features</span><span class="o">=</span><span class="p">[</span><span class="s">"support"</span><span class="p">])</span> <span class="n">collapsed_nodes_file</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span> <span class="o">+</span> <span class="n">tree_string_file</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'.'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="s">'_collapsed_nodes.txt'</span><span class="p">,</span><span class="s">'wt'</span><span class="p">)</span> <span class="n">collapsed_nodes</span> <span class="o">=</span> <span class="p">[]</span> <span class="c"># decside which node to collapse based on the presence absence of conflicting element forms</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">t</span><span class="o">.</span><span class="n">traverse</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">n</span><span class="o">.</span><span class="n">is_leaf</span><span class="p">():</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">1</span> <span class="k">if</span> <span class="s">'dirs1_like'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">dirs1_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'pat1_like'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">pat_like_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'PAT'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">toc_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'kangaroo'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">kangaroo_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'viper_or_Ngaro'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">viper_or_Ngaro_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'DIRS'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">DIRS_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'domains_inverted'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">domains_inverted_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'two_domains'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">two_domains_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'COPIA'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">COPIA_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'BEL'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">BEL_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'GYPSY'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">GYPSY_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'crypton'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">crypton_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">elif</span> <span class="s">'YR_only'</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">YR_hirarchy</span><span class="p">:</span> <span class="n">collapse</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">if</span> <span class="n">collapse</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span> <span class="k">if</span> <span class="n">tree_string_file</span> <span class="o">==</span> <span class="s">'YR_tree/all_YR_corrected_reviewed.tre'</span> <span class="ow">and</span> <span class="n">n</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="mf">0.95</span><span class="p">:</span> <span class="n">n_descendants</span> <span class="o">=</span> <span class="n">n</span><span class="o">.</span><span class="n">get_descendants</span><span class="p">()</span> <span class="n">cutoff</span> <span class="o">=</span> <span class="n">blen_cutoff</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">n_descendants</span><span class="p">:</span> <span class="k">if</span> <span class="n">d</span><span class="o">.</span><span class="n">dist</span> <span class="o">></span> <span class="n">cutoff</span><span class="p">:</span> <span class="n">d</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span> <span class="k">if</span> <span class="n">print_collapsed_nodes</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span> <span class="c">#print all leaves in a file. It will be used later </span> <span class="k">for</span> <span class="n">leaf</span> <span class="ow">in</span> <span class="n">n</span><span class="p">:</span> <span class="k">if</span> <span class="ow">not</span> <span class="s">'node'</span> <span class="ow">in</span> <span class="n">leaf</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">collapsed_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">leaf</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">n</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span><span class="p">)</span> <span class="k">for</span> <span class="n">C</span> <span class="ow">in</span> <span class="n">n</span><span class="o">.</span><span class="n">get_descendants</span><span class="p">():</span> <span class="c"># and remove their leaves</span> <span class="n">C</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span> <span class="n">ts</span> <span class="o">=</span> <span class="n">TreeStyle</span><span class="p">()</span> <span class="n">ts</span><span class="o">.</span><span class="n">show_leaf_name</span> <span class="o">=</span> <span class="bp">False</span> <span class="c">#ts.scale = 150</span> <span class="n">ts</span><span class="o">.</span><span class="n">show_branch_support</span> <span class="o">=</span> <span class="bp">False</span> <span class="n">t</span><span class="o">.</span><span class="n">ladderize</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="n">R</span> <span class="o">=</span> <span class="n">t</span><span class="o">.</span><span class="n">get_midpoint_outgroup</span><span class="p">()</span> <span class="n">ns</span> <span class="o">=</span> <span class="n">NodeStyle</span><span class="p">()</span> <span class="n">ns</span><span class="p">[</span><span class="s">'hz_line_width'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">5</span> <span class="n">ns</span><span class="p">[</span><span class="s">'vt_line_width'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">5</span> <span class="n">ns</span><span class="p">[</span><span class="s">'fgcolor'</span><span class="p">]</span> <span class="o">=</span> <span class="s">'black'</span> <span class="n">ns</span><span class="p">[</span><span class="s">'size'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">t</span><span class="o">.</span><span class="n">set_outgroup</span><span class="p">(</span><span class="n">R</span><span class="p">)</span> <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">t</span><span class="o">.</span><span class="n">traverse</span><span class="p">():</span> <span class="n">node</span><span class="o">.</span><span class="n">set_style</span><span class="p">(</span><span class="n">ns</span><span class="p">)</span> <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">is_leaf</span><span class="p">():</span> <span class="k">if</span> <span class="ow">not</span> <span class="s">'node'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="c">#add faces to leaves</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'gray'</span> <span class="k">if</span> <span class="s">'rypton'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'sandybrown'</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="s">'Crypton'</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="k">elif</span> <span class="s">'kangaroo'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'red'</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="s">'kangaroo'</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="k">elif</span> <span class="s">'dirs1'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'darkviolet'</span> <span class="k">elif</span> <span class="s">'PAT'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'green'</span> <span class="k">elif</span> <span class="s">'Ngaro'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'black'</span> <span class="k">if</span> <span class="p">(</span><span class="s">'BAS'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="ow">or</span> <span class="s">'RET'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="s">'DIRS'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="s">'Ngaro'</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="k">elif</span> <span class="s">'ypsy'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">'GYPSY'</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'black'</span> <span class="n">name_parts</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'_'</span><span class="p">)</span> <span class="n">name</span> <span class="o">=</span> <span class="n">name_parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="n">get_name_part</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">part</span> <span class="ow">in</span> <span class="n">name_parts</span><span class="p">:</span> <span class="k">if</span> <span class="n">get_name_part</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span> <span class="n">name</span> <span class="o">=</span> <span class="n">name</span> <span class="o">+</span> <span class="s">'_'</span> <span class="o">+</span> <span class="n">part</span> <span class="k">elif</span> <span class="n">part</span> <span class="o">==</span> <span class="s">'1'</span><span class="p">:</span> <span class="n">get_name_part</span> <span class="o">=</span> <span class="mi">1</span> <span class="n">hola</span> <span class="o">=</span> <span class="n">TextFace</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">fsize</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">fgcolor</span><span class="o">=</span><span class="n">color</span><span class="p">)</span> <span class="c">#node.add_face(hola, column=0, position = "branch-right")</span> <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s">'GYPSY'</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">hola</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"branch-right"</span><span class="p">)</span> <span class="k">elif</span> <span class="n">color</span> <span class="o">==</span> <span class="s">'gray'</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'gray'</span><span class="p">,</span> <span class="n">style</span> <span class="o">=</span> <span class="s">'sphere'</span><span class="p">),</span> <span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"branch-right"</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="n">color</span><span class="p">),</span> <span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"branch-right"</span><span class="p">)</span> <span class="k">if</span> <span class="s">'RET_'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="ow">or</span> <span class="s">'PAT_'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="ow">or</span> <span class="s">'BAS_'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="k">if</span> <span class="s">'dirs1_like'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="s">'dirs1_like'</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="k">elif</span> <span class="s">'pat1_like'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="s">'pat1_like'</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="k">elif</span> <span class="s">'_PAT'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="s">'PAT'</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="o">+</span> <span class="s">'</span><span class="se">\t</span><span class="s">'</span> <span class="o">+</span> <span class="s">'None'</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="k">elif</span> <span class="s">'node'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="c"># add faces to collapsed nodes</span> <span class="n">bcolor</span> <span class="o">=</span> <span class="s">'white'</span> <span class="n">species</span> <span class="o">=</span> <span class="s">''</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">species</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="n">species</span> <span class="o">=</span><span class="n">species</span> <span class="o">+</span> <span class="n">key</span> <span class="o">+</span> <span class="s">'_'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">species</span><span class="p">[</span><span class="n">key</span><span class="p">])</span> <span class="o">+</span> <span class="s">'_'</span> <span class="n">count</span> <span class="o">+=</span> <span class="n">node</span><span class="o">.</span><span class="n">species</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="n">elements</span> <span class="o">=</span> <span class="s">''</span> <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">elements</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="n">elements</span> <span class="o">=</span> <span class="n">elements</span> <span class="o">+</span> <span class="n">key</span> <span class="o">+</span> <span class="s">'_'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">elements</span><span class="p">[</span><span class="n">key</span><span class="p">])</span> <span class="o">+</span> <span class="s">' '</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'None'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'black'</span> <span class="k">if</span> <span class="s">'dirs1_like'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'dirs1_like'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'darkviolet'</span> <span class="k">elif</span> <span class="s">'pat1_like'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'pat1_like'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'blue'</span> <span class="k">elif</span> <span class="s">'PAT'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'PAT'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'green'</span> <span class="k">elif</span> <span class="s">'kangaroo'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'kangaroo'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'red'</span> <span class="k">elif</span> <span class="s">'Ngaro'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'Ngaro'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'black'</span> <span class="k">elif</span> <span class="s">'crypton'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'Crypton'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'sandybrown'</span> <span class="k">elif</span> <span class="s">'BEL'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'BEL'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'black'</span> <span class="k">elif</span> <span class="s">'COPIA'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'COPIA'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'black'</span> <span class="k">elif</span> <span class="s">'GYPSY'</span> <span class="ow">in</span> <span class="n">elements</span><span class="p">:</span> <span class="n">verdict</span> <span class="o">=</span> <span class="s">'GYPSY'</span> <span class="n">verdict_color</span> <span class="o">=</span> <span class="s">'black'</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">collapsed_nodes</span><span class="p">:</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'</span><span class="se">\t</span><span class="s">'</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">line</span> <span class="o">+</span> <span class="n">verdict</span> <span class="o">+</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span> <span class="n">el</span> <span class="o">=</span> <span class="n">TextFace</span><span class="p">(</span><span class="n">elements</span><span class="p">,</span> <span class="n">fsize</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">fgcolor</span> <span class="o">=</span> <span class="s">'dimgray'</span><span class="p">)</span> <span class="k">if</span> <span class="n">bcolor</span> <span class="o">==</span> <span class="s">'black'</span><span class="p">:</span> <span class="n">el</span><span class="o">.</span><span class="n">fgcolor</span> <span class="o">=</span> <span class="s">'white'</span> <span class="n">sp</span> <span class="o">=</span> <span class="n">TextFace</span><span class="p">(</span><span class="n">species</span><span class="p">,</span> <span class="n">fsize</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">fgcolor</span> <span class="o">=</span> <span class="s">'dimgray'</span><span class="p">)</span> <span class="k">if</span> <span class="n">count</span> <span class="o"><</span> <span class="mi">15</span><span class="p">:</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">15</span> <span class="c">#if len(species) > 0:</span> <span class="c">#species_line_to_pie_chart_face(node, species,species_groupings,float(count)*1+5)</span> <span class="c">#sp.border.width = 1</span> <span class="n">vr</span> <span class="o">=</span> <span class="n">TextFace</span><span class="p">(</span><span class="n">verdict</span><span class="p">,</span> <span class="n">fsize</span><span class="o">=</span><span class="n">count</span><span class="p">,</span> <span class="n">fgcolor</span><span class="o">=</span><span class="n">verdict_color</span><span class="p">)</span> <span class="c">#if verdict == 'pat1_like':</span> <span class="c">#vr.background.color = 'lightgray'</span> <span class="c">#node.add_face(sp, column=2, position = "branch-right")</span> <span class="c">#node.add_face(el, column=2, position = "branch-right")</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">verdict</span> <span class="o">==</span> <span class="s">'None'</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">verdict</span> <span class="o">==</span> <span class="s">'COPIA'</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">verdict</span> <span class="o">==</span> <span class="s">'BEL'</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">verdict</span> <span class="o">==</span> <span class="s">'GYPSY'</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="n">verdict_color</span><span class="p">),</span> <span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"branch-right"</span><span class="p">)</span> <span class="c">#node.add_face(vr, column=1, position = "branch-right")</span> <span class="k">elif</span> <span class="ow">not</span> <span class="n">verdict</span> <span class="o">==</span> <span class="s">'COPIA'</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">verdict</span> <span class="o">==</span> <span class="s">'BEL'</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">verdict</span> <span class="o">==</span> <span class="s">'GYPSY'</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'gray'</span><span class="p">,</span> <span class="n">style</span> <span class="o">=</span> <span class="s">'sphere'</span><span class="p">),</span> <span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"branch-right"</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">vr</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"branch-right"</span><span class="p">)</span> <span class="n">hola</span> <span class="o">=</span> <span class="n">TextFace</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">fsize</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">fgcolor</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span> <span class="c">#node.add_face(hola, column=0, position = "branch-right")</span> <span class="n">score</span> <span class="o">=</span> <span class="n">TextFace</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">support</span><span class="p">),</span><span class="n">fsize</span><span class="o">=</span><span class="mi">6</span><span class="p">,</span> <span class="n">fgcolor</span><span class="o">=</span><span class="s">'black'</span><span class="p">)</span> <span class="c">#node.add_face(score,column=0, position = "branch-right")</span> <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'black'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">>=</span> <span class="mf">0.95</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'dimgray'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="mf">0.95</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">>=</span> <span class="mf">0.9</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'gray'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="mf">0.9</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">>=</span> <span class="mf">0.8</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'gainsboro'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="c">#add faces to internal nodes</span> <span class="n">hola</span> <span class="o">=</span> <span class="n">TextFace</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">fsize</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">fgcolor</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span> <span class="c">#node.add_face(hola, column=0, position = "branch-right")</span> <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'black'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">>=</span> <span class="mf">0.95</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'dimgray'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="mf">0.95</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">>=</span> <span class="mf">0.9</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'gray'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o"><</span> <span class="mf">0.9</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">support</span> <span class="o">>=</span> <span class="mf">0.8</span><span class="p">:</span> <span class="n">node</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">CircleFace</span><span class="p">(</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'gainsboro'</span><span class="p">),</span><span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"float"</span><span class="p">)</span> <span class="n">t</span><span class="o">.</span><span class="n">dist</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">t</span><span class="o">.</span><span class="n">support</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">t</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">tree_style</span> <span class="o">=</span> <span class="n">ts</span><span class="p">)</span> <span class="c">#t.render(path + tree_string_file.split('.')[0] + '_Figure.tif', w = 1000, tree_style = ts)</span> <span class="n">collapsed_nodes_file</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
1662 520 520
Note: If the trees are recalculated the output of this cell will change.
For each well supported and well classified clade in the reduced YR tree, the leaves of the clade were manually searched in the full YR tree. In a case that a leaf was found in a clade that had branch support of 0.9 or more, all the leaves of this clade from the full tree, were considered to belong to the original clade from the reduced YR tree. This produced a set of groups that can be manually grouped by editing the dictionary based on the phylogenetic tree, or they can be automatically binned into Dirs1, Pat1, PAT, kangaroo, Crypton and Ngaro using the next script.
reduced_tree_nodes = open('comparable_trees/YR_outgrouped_corrected_reviewed_collapsed_nodes.txt').readlines() full_tree_nodes = open('YR_tree/all_YR_corrected_reviewed_collapsed_nodes.txt').readlines()node_to_nodes = {}
for line in reduced_tree_nodes: leaf_name, node_name, verdict = line.rstrip('\n').split('\t') if not verdict == 'None' and not node_name in node_to_nodes.keys(): node_to_nodes[node_name] = {} node_to_nodes[node_name]['verdict'] = verdict node_to_nodes[node_name]['nodes'] = []
for full_t_line in full_tree_nodes: full_t_leaf_name, full_t_node_name, full_t_verdict = full_t_line.rstrip('\n').split('\t') for line in reduced_tree_nodes: leaf_name, node_name, verdict = line.rstrip('\n').split('\t') if full_t_leaf_name == leaf_name: if (full_t_verdict == 'None' or full_t_verdict == verdict ): if (node_name in node_to_nodes.keys() and not full_t_node_name in node_to_nodes[node_name]['nodes']): node_to_nodes[node_name]['nodes'].append(full_t_node_name)
dict_for_figure = {} for node in node_to_nodes.keys(): node_verdict = node_to_nodes[node]['verdict'] if node_verdict in ['dirs1_like','pat1_like','kangaroo']: if not node_verdict in dict_for_figure.keys(): dict_for_figure[node_verdict] = {} dict_for_figure[node_verdict]['reduced'] = [] dict_for_figure[node_verdict]['full'] = [] dict_for_figure[node_verdict]['reduced'].append(node) for n in node_to_nodes[node]['nodes']: if not n in dict_for_figure[node_verdict]['full']: dict_for_figure[node_verdict]['full'].append(n) else: key = node_verdict + '__' + node dict_for_figure[key] = {} dict_for_figure[key]['reduced'] = [node] dict_for_figure[key]['full'] = node_to_nodes[node]['nodes']
print dict_for_figure
{'PAT__node235': {'full': ['node78'], 'reduced': ['node235']}, 'PAT__node236': {'full': [], 'reduced': ['node236']}, 'PAT__node237': {'full': ['node210'], 'reduced': ['node237']}, 'Crypton__node165': {'full': ['RET_CryptonCn1_1358_1_YR_only_crypton', 'node148'], 'reduced': ['node165']}, 'Ngaro__node167': {'full': ['node102', 'node82', 'BAS_DIRS_53_XT_3142_1_viper_or_Ngaro', 'node263', 'node448', 'node546'], 'reduced': ['node167']}, 'Crypton__BAS_CryptonA_1_OL_300_1_YR_only_crypton': {'full': ['BAS_CryptonA_1_OL_300_1_YR_only_crypton'], 'reduced': ['BAS_CryptonA_1_OL_300_1_YR_only_crypton']}, 'Ngaro__node222': {'full': [], 'reduced': ['node222']}, 'pat1_like': {'full': ['node224', 'node219', 'node119', 'node217'], 'reduced': ['node194', 'node197', 'node196', 'node262', 'node260', 'node325']}, 'Crypton__BAS_CryptonF_6_PI_451_1_YR_only_crypton': {'full': ['node204'], 'reduced': ['BAS_CryptonF_6_PI_451_1_YR_only_crypton']}, 'Crypton__BAS_CryptonV_1_NV_2095_1_YR_only_crypton': {'full': ['BAS_CryptonV_1_NV_2095_1_YR_only_crypton'], 'reduced': ['BAS_CryptonV_1_NV_2095_1_YR_only_crypton']}, 'PAT__node266': {'full': ['node133'], 'reduced': ['node266']}, 'PAT__node229': {'full': ['node213'], 'reduced': ['node229']}, 'Crypton__node173': {'full': ['BAS_Crypton_10_CGi_839_1_YR_only_crypton', 'BAS_Crypton_9_CGi_1469_1_YR_only_crypton', 'BAS_CryptonA_3_SK_54_1_YR_only_crypton', 'node240'], 'reduced': ['node173']}, 'PAT__node160': {'full': ['node79', 'node56'], 'reduced': ['node160']}, 'kangaroo': {'full': ['node93'], 'reduced': ['Vcar_scaffold_38_112287_1_kangaroo']}, 'Ngaro__node135': {'full': ['node102', 'node155'], 'reduced': ['node135']}, 'dirs1_like': {'full': ['node104', 'node103', 'node382', 'node98', 'node403', 'node299', 'node298', 'node293', 'node78', 'node112', 'node294', 'node381', 'node287', 'node111', 'node110', 'node109', 'node95', 'node120', 'node105', 'RET_RoDirs1_3186_1_dirs1_like', 'node373', 'BAS_DIRS_4_SK_3309_1_DIRS', 'node295', 'node107', 'node291', 'node290', 'node379', 'node376', 'node377', 'node292'], 'reduced': ['node77', 'node73', 'node17', 'node32', 'node30', 'node191', 'node193', 'node8', 'node95', 'node94', 'node50', 'node131', 'node180', 'RET_TcDirs1_3616_1_dirs1_like', 'RET_RoDirs1_3186_1_dirs1_like', 'node68', 'node46', 'node51', 'node3', 'node66', 'node161', 'node121', 'node122', 'node5', 'node4', 'node7']}, 'Crypton__BAS_Crypton_1_NV_637_1_YR_only_crypton': {'full': ['node683'], 'reduced': ['BAS_Crypton_1_NV_637_1_YR_only_crypton']}, 'PAT__node192': {'full': ['node90'], 'reduced': ['node192']}, 'Crypton__BAS_CryptonA_1_LG_614_1_two_domains_crypton': {'full': ['BAS_CryptonA_1_LG_614_1_two_domains_crypton'], 'reduced': ['BAS_CryptonA_1_LG_614_1_two_domains_crypton']}, 'Crypton__node134': {'full': ['node215'], 'reduced': ['node134']}, 'Crypton__BAS_CryptonS_2_PU_272_1_YR_only_crypton': {'full': ['BAS_CryptonS_2_PU_272_1_YR_only_crypton'], 'reduced': ['BAS_CryptonS_2_PU_272_1_YR_only_crypton']}, 'Crypton__BAS_Crypton_1_NV_309_1_YR_only_crypton': {'full': [], 'reduced': ['BAS_Crypton_1_NV_309_1_YR_only_crypton']}}
elements with ZF
Lgi_150_453169_PAT 1 pat1_Ngaro
Lgi_59_1690142_DIRS 1 pat1_Ngaro
Lgi_59_1685024_DIRS 1 pat1_Ngaro
Lgi_246_5989_DIRS 1 pat1_Ngaro
Lgi_246_5423_DIRS 1 pat1_Ngaro
Nve_318_58153_PAT 1 pat1_Ngaro
Nve_318_58227_PAT 1 pat1_Ngaro
Hcon_0059030_17723_pat1-like 1 pat1_Ngaro
Hcon_0059030_17196_pat1-like 1 pat1_Ngaro
Rcul_11614_5929_DIRS 1 pat1_Ngaro
Rcul_65775_27385_DIRS 1 pat1_Ngaro
TEs_with_ZF = """Lgi_150_453169_PAT 1 pat1_Ngaro Lgi_59_1690142_DIRS 1 pat1_Ngaro Lgi_59_1685024_DIRS 1 pat1_Ngaro Lgi_246_5989_DIRS 1 pat1_Ngaro Lgi_246_5423_DIRS 1 pat1_Ngaro Nve_318_58153_PAT 1 pat1_Ngaro Nve_318_58227_PAT 1 pat1_Ngaro Hcon_0059030_17723_pat1-like 1 pat1_Ngaro Hcon_0059030_17196_pat1-like 1 pat1_Ngaro Rcul_11614_5929_DIRS 1 pat1_Ngaro Rcul_65775_27385_DIRS 1 pat1_Ngaro"""TEs_with_ZF = TEs_with_ZF.split('\n') collapsed_nodes = open('comparable_trees/YR_outgrouped_corrected_reviewed_collapsed_nodes.txt', 'r').readlines()
keys_to_replace_with = []
for line in collapsed_nodes: line_contig_and_start = line.split('')[2] + '' + line.split('')[3] node_number = line.split('\t')[1] for hit in TEs_with_ZF: hit_contig_and_start = hit.split('')[1] + '' + hit.split('')[2] hit_verdict = hit.split('_')[3] hit_verdict = hit_verdict.split(' ')[0] if hit_contig_and_start == line_contig_and_start and hit_verdict == 'PAT': print hit_contig_and_start, line_contig_and_start for key in dict_for_figure.keys(): if [node_number] == dict_for_figure[key]['reduced']: print key node_in_key = key.split('')[1] new_key = 'pat1_like' + '' + node_number keys_to_replace_with.append([key, new_key])
for replace in keys_to_replace_with: if replace[0] in dict_for_figure.keys(): dict_for_figure[replace[1]] = dict_for_figure[replace[0]] del dict_for_figure[replace[0]]
318_58153 318_58153 PAT__node235 150_453169 150_453169 PAT__node160 318_58227 318_58227 PAT__node160
This is a list of leaves form the nodes 235 and 160 in the small YR tree
and the leaves from associated nodes in the full YR tree Vcar_scaffold_32_252119_1_DIRS
Vcar_scaffold_4_1088875_1_YR_only
Vcar_scaffold_9_651715_1_YR_only
Vcar_scaffold_84_98418_1_YR_only
Vcar_scaffold_27_1811040_1_YR_only
Vcar_scaffold_1_1672594_1_two_domains
Vcar_scaffold_31_1028164_1_DIRS
Vcar_scaffold_9_264930_1_PAT
Vcar_scaffold_2_3631086_1_PAT
Vcar_scaffold_1_3941792_1_PAT
Vcar_scaffold_21_2126231_1_PAT
Vcar_scaffold_21_80243_1_PAT
Vcar_scaffold_9_3478209_1_PAT
Vcar_scaffold_2_1384943_1_PAT
Vcar_scaffold_684_1328_1_two_domains
Vcar_scaffold_21_4474_1_PAT
Vcar_scaffold_77_345705_1_YR_only
BAS_DIRS_3_CGi_3386_1_DIRS
Lgi_sca_38_1319572_1_two_domains
Lgi_sca_13_1596769_1_DIRS
Lgi_sca_246_5423_1_DIRS
Lgi_sca_114_28036_1_two_domains
Lgi_sca_31_2840409_1_DIRS
Lgi_sca_59_1685024_1_DIRS
BAS_DIRS_1_RO_3043_1_DIRS
Lgi_sca_150_453169_1_PAT
RET_SpPat1_3157_1_DIRS
Rcul_scaffold5760_1_size11614_5929_1_DIRS
Nve_scaffold_318_58153_1_PAT
Nve_scaffold_17_1547274_1_DIRS
Nve_scaffold_318_58227_1_PAT
The groups in dict_for_figure are collapsed into bins according to their feature order based calssification. The phylogenetic information is retained in figure 2.
collapse_dict_for_figure = {} bins = ['Ngaro','PAT','kangaroo','crypton','pat1_like','dirs1_like'] for bin in bins: collapse_dict_for_figure[bin] = {'full': [], 'reduced': []} for key in dict_for_figure.keys(): if bin in key: collapse_dict_for_figure[bin]['full'] = collapse_dict_for_figure[bin]['full'] + dict_for_figure[key]['full'] collapse_dict_for_figure[bin]['reduced'] = collapse_dict_for_figure[bin]['reduced'] + dict_for_figure[key]['reduced'] print collapse_dict_for_figure
{'PAT': {'full': ['node210', 'node133', 'node213', 'node90'], 'reduced': ['node236', 'node237', 'node266', 'node229', 'node192']}, 'crypton': {'full': ['BAS_CryptonA_1_OL_300_1_YR_only_crypton', 'node204', 'BAS_CryptonV_1_NV_2095_1_YR_only_crypton', 'node683', 'BAS_CryptonA_1_LG_614_1_two_domains_crypton', 'BAS_CryptonS_2_PU_272_1_YR_only_crypton'], 'reduced': ['BAS_CryptonA_1_OL_300_1_YR_only_crypton', 'BAS_CryptonF_6_PI_451_1_YR_only_crypton', 'BAS_CryptonV_1_NV_2095_1_YR_only_crypton', 'BAS_Crypton_1_NV_637_1_YR_only_crypton', 'BAS_CryptonA_1_LG_614_1_two_domains_crypton', 'BAS_CryptonS_2_PU_272_1_YR_only_crypton', 'BAS_Crypton_1_NV_309_1_YR_only_crypton']}, 'Ngaro': {'full': ['node102', 'node82', 'BAS_DIRS_53_XT_3142_1_viper_or_Ngaro', 'node263', 'node448', 'node546', 'node102', 'node155'], 'reduced': ['node167', 'node222', 'node135']}, 'dirs1_like': {'full': ['node104', 'node103', 'node382', 'node98', 'node403', 'node299', 'node298', 'node293', 'node78', 'node112', 'node294', 'node381', 'node287', 'node111', 'node110', 'node109', 'node95', 'node120', 'node105', 'RET_RoDirs1_3186_1_dirs1_like', 'node373', 'BAS_DIRS_4_SK_3309_1_DIRS', 'node295', 'node107', 'node291', 'node290', 'node379', 'node376', 'node377', 'node292'], 'reduced': ['node77', 'node73', 'node17', 'node32', 'node30', 'node191', 'node193', 'node8', 'node95', 'node94', 'node50', 'node131', 'node180', 'RET_TcDirs1_3616_1_dirs1_like', 'RET_RoDirs1_3186_1_dirs1_like', 'node68', 'node46', 'node51', 'node3', 'node66', 'node161', 'node121', 'node122', 'node5', 'node4', 'node7']}, 'pat1_like': {'full': ['node224', 'node219', 'node119', 'node217', 'node79', 'node56', 'node78'], 'reduced': ['node194', 'node197', 'node196', 'node262', 'node260', 'node325', 'node160', 'node235']}, 'kangaroo': {'full': ['node93'], 'reduced': ['Vcar_scaffold_38_112287_1_kangaroo']}}
from ete2 import Tree, NodeStyle, TreeStyle, TextFace, PhyloTree import numpyreduced_tree = Tree('comparable_trees/YR_outgrouped_corrected_reviewed_internal_nodes.tre', format=1) full_tree = Tree('YR_tree/all_YR_corrected_reviewed_internal_nodes.tre', format=1)
def get_all_leaves(tree, node_name): if 'node' in node_name: leaves = (tree&node_name).get_leaf_names() for leaf_name in leaves: if 'node' in leaf_name: leaves.pop(leaves.index(leaf_name)) return leaves else: print 'trying to get leaves from a leaf'
def merge_objects(a, b): merged = [] if isinstance(a, list) and isinstance(b, list): merged = a for i in b: if not i in merged: merged.append(i) elif isinstance(a, list) and not isinstance(b, list): merged = a if not b in merged: merged.append(b) elif not isinstance(a, list) and isinstance(b, list): merged = b if not a in merged: merged.append(a) elif not isinstance(a, list) and not isinstance(b, list): if not a == b: merged = (a, b) return merged
count_elements_per_species = {}
all_leaf_names_per_element = {}
all_elements_nodes = collapse_dict_for_figure
for element in all_elements_nodes.keys(): #compose list of all nodes full_list_of_leaves_for_this_elements = [] for tree in all_elements_nodes[element].keys(): t = reduced_tree if tree == 'full': t = full_tree if isinstance(all_elements_nodes[element][tree], str): if not 'node' in all_elements_nodes[element][tree]: full_list_of_leaves_for_this_elements = merge_objects(full_list_of_leaves_for_this_elements, all_elements_nodes[element][tree]) elif 'node' in all_elements_nodes[element][tree]: leaves = get_all_leaves(t, all_elements_nodes[element][tree]) full_list_of_leaves_for_this_elements = merge_objects(full_list_of_leaves_for_this_elements, leaves) else: for node_name in all_elements_nodes[element][tree]: if not 'node' in node_name: full_list_of_leaves_for_this_elements = merge_objects(full_list_of_leaves_for_this_elements, node_name) elif 'node' in node_name: leaves = get_all_leaves(t, node_name) full_list_of_leaves_for_this_elements = merge_objects(full_list_of_leaves_for_this_elements, leaves) all_leaf_names_per_element[element] = full_list_of_leaves_for_this_elements
#Write a file of all the elements that were classified phylogenetically #Write a file of all the elements from REFERECE sequences that were classified phylogenetically
elements_list = open('./psiblast_ugene_RESULTS/all_elements_found.txt','wt') DB_elements_list = open('./psiblast_ugene_RESULTS/all_DB_elements_found.txt','wt') for element in all_leaf_names_per_element.keys(): elements_list.write(element + '\n-------\n') DB_elements_list.write(element + '\n-------\n') for leaf in all_leaf_names_per_element[element]: elements_list.write(leaf + '\n') if 'RET' in leaf or 'PAT' in leaf or 'BAS' in leaf: DB_elements_list.write(leaf + '\n') elements_list.write('\n') DB_elements_list.write('\n') elements_list.close() DB_elements_list.close()
#Summarize and write table
elements = all_leaf_names_per_element.keys() all_species = [] for element in elements: #count the nodes for each species leaves = all_leaf_names_per_element[element] count_elements_per_species[element] = {} for leaf in leaves: species = leaf.split('_')[0] if not species in all_species: all_species.append(species) if species in count_elements_per_species[element].keys(): count_elements_per_species[element][species] += 1 else: count_elements_per_species[element][species] = 1
from cogent import LoadTable from cogent.util.table import Table
column_headings = ['species'] for el in elements: column_headings.append(el) lines = [] for species in all_species: line = [] line.append(species) for element in elements: if species in count_elements_per_species[element].keys(): line.append(count_elements_per_species[element][species]) else: line.append(0) lines.append(line) table = Table(header = column_headings, rows = lines) print table table.writeToFile('element_hits_summary.csv', sep='\t')
========================================================================= species PAT crypton Ngaro dirs1_like pat1_like kangaroo ------------------------------------------------------------------------- Vcar 70 0 0 0 17 24 Acas 10 0 0 11 0 0 BAS 0 7 7 120 3 0 Nvi 0 1 0 46 0 0 Nve 0 1 114 59 3 0 Lgi 0 0 53 94 8 0 Hduj 0 0 14 3 0 0 Ebre 0 0 2 7 0 0 Dpul 0 0 1 182 0 0 Rcul 0 0 0 68 2 0 Ppac 0 0 0 1 32 0 Acal 0 0 0 33 0 0 RET 0 0 0 5 1 0 Hcon 0 0 0 0 18 0 PAT 0 0 0 0 4 0 Cele 0 0 0 0 1 0 Cjap 0 0 0 0 50 0 Crem 0 0 0 0 11 0 Cang 0 0 0 0 2 0 briC 0 0 0 0 25 0 -------------------------------------------------------------------------
from ete2 import Tree, NodeStyle, TreeStyle, TextFace, PhyloTree,PieChartFace, CircleFace, BarChartFace from cogent import LoadTable from cogent.util.table import Table import numpyNematoda_tree = Tree('Fitch2013_Adapted_DeLey2006.tre', format = 1) Nematoda_tree.ladderize(1)
ns = NodeStyle() ns['hz_line_width']=9 ns['vt_line_width']=9 ns["shape"]='square' ns["fgcolor"] = "black" ns["size"] = 9
for node in Nematoda_tree.traverse(): node.set_style(ns)
hits_table = LoadTable(filename = 'element_hits_summary.csv', header = True, with_title = False, sep = '\t') species = [] for row in hits_table: species.append(row['species']) elements = hits_table._header elements.pop(0)
nematode_species_table =LoadTable(filename = 'nematode_species.csv', header = True, with_title = False, sep = '\t')
colors = { 'pat1_like': 'blue', 'dirs1_like': 'darkviolet', 'PAT': 'green', 'kangaroo': 'red', 'Ngaro':'black', 'crypton': 'white', }
color_list = [] for el in elements: color_list.append(colors[el])
for leaf in Nematoda_tree: hola = TextFace(leaf.name + ' ', fsize = 50) leaf.add_face(hola, column=0, position = "branch-right")
leaf.add_features(species = {}, total_counts_per_speices = {}, per_element_counts_per_species = {}, per_element_percents_per_species = {}) for row in nematode_species_table: if row['LowerRank'] == leaf.name: leaf.species[row['Abbreviation']] = row['Species'] for row in hits_table: if row['species'] in leaf.species.keys(): counts = [] for i in row[1:]: counts.append(i) total_count = sum(counts) percents = [] for count in counts: percent = float(count)/total_count100 percents.append(percent) leaf.total_counts_per_speices[row['species']] = total_count leaf.per_element_counts_per_species[row['species']] = counts leaf.per_element_percents_per_species[row['species']] = percents species_num = 1 code_list = leaf.species.keys() if leaf.name == 'Rhabditomorpha': code_list = ['Cele', 'briC', 'Cjap', 'Crem', 'Cbre', 'C5sp', 'C11sp', 'Otip', 'Hbac', 'Hcon', 'Dviv'] # to order by taxonomy for sp in code_list: hola = TextFace(leaf.species[sp].split(' ')[0][0]+ '. ' + leaf.species[sp].split(' ')[1] + ' ', fsize = 55, tight_text=False, fstyle='italic') hola.margin_bottom=35 hola.margin_bottom=35 leaf.add_face(hola, column=1, position = "aligned") if sp in leaf.total_counts_per_speices.keys(): size = 30 if leaf.total_counts_per_speices[sp] > 10 and leaf.total_counts_per_speices[sp] < 190: size = leaf.total_counts_per_speices[sp]3 elif leaf.total_counts_per_speices[sp] >= 190: size = 571 count_face = TextFace(str(leaf.total_counts_per_speices[sp])+' ', fsize = 50, fgcolor = 'black') count_face.margin_bottom=30 count_face.margin_bottom=30 leaf.add_face(count_face, column=2, position = "aligned") pie = PieChartFace(leaf.per_element_percents_per_species[sp], size, size, color_list, line_color='black') max_val = 5 # y axis maximum for bar chart faces m = max(leaf.per_element_counts_per_species[sp]) if m > max_val and m < 100: max_val = 100 elif m > max_val: max_val = m #max_val = m bar = BarChartFace(leaf.per_element_counts_per_species[sp], colors=color_list, min_value=0, max_value=max_val, width=260, height=130,) if not leaf in (Nematoda_tree&'Nematoda'): pie.margin_bottom=65 pie.margin_top=65 bar.margin_bottom=60 bar.margin_top=60 leaf.add_face(pie, column=3, position = "aligned") #leaf.add_face(bar, column=3, position = "aligned") else: count_face = TextFace('0 ', fsize = 50, fgcolor = 'black') count_face.margin_bottom=35 count_face.margin_bottom=35 leaf.add_face(count_face, column=2, position = "aligned") dummy = CircleFace(0, color = 'black') dummy.margin_bottom=35 dummy.margin_bottom=35 leaf.add_face(dummy, column=3, position = "aligned") species_num += 1for n in Nematoda_tree.traverse(): if not n.is_leaf() and n.name == 'Nematoda': hola = TextFace(n.name, fsize = 50, fgcolor = 'black') if n.name == 'Nematoda': nst = NodeStyle() nst['hz_line_width']=9 nst['vt_line_width']=9 n.set_style(nst)
<span class="n">n</span><span class="o">.</span><span class="n">add_face</span><span class="p">(</span><span class="n">hola</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">position</span> <span class="o">=</span> <span class="s">"branch-right"</span><span class="p">)</span>
Nematoda_tree.write(outfile='./Fitch2013_Adapted_DeLey2006.nhx', format=1, features=["species", "total_counts_per_speices", "per_element_counts_per_species", "per_element_percents_per_species"])
Nematoda_tree.convert_to_ultrametric(10)
legend = [ ['PAT and Kangaroo', 'green'], ['pat1_like', 'blue'], ['dirs1_like', 'darkviolet'], ['Ngaro', 'black'], ['Crypton', 'white'], ]
tsn = TreeStyle() pos = 0 #for item in legend: # tsn.legend.add_face(TextFace(' ' + item[0] + ' ', fsize=50), column=pos+1) # tsn.legend.add_face(CircleFace(50, item[1], style='circle'), column=pos) # pos += 2 tsn.scale = 10 tsn.allow_face_overlap=False tsn.show_leaf_name = False tsn.min_leaf_separation = 90 tsn.draw_guiding_lines = False #tsn.legend_position=3 Nematoda_tree.dist = 0 (Nematoda_tree&'Nematoda').convert_to_ultrametric(10)
stdin = (Nematoda_tree&'Nematoda').render('Figure3_inset.tif', w=3000, tree_style = tsn) #stdin = (Nematoda_tree&'Nematoda').render('Figure3_inset_BARS.tif', w=3000, tree_style = tsn) for n in (Nematoda_tree&'Nematoda').get_children(): n.detach() Nematoda_tree.convert_to_ultrametric(10) stdin = Nematoda_tree.render('Figure3_main.tif', w=3000, tree_style = tsn) #stdin = Nematoda_tree.render('Figure3_main_BARS.tif', w=3000, tree_style = tsn)
from ete2 import Tree, TreeFace, TreeStyle, CircleFace, TextFace, NodeStyle import replacement = {'PAT__node235': {'full': ['node78'], 'reduced': ['node235']}, 'PAT__node236': {'full': [], 'reduced': ['node236']}, 'PAT__node237': {'full': ['node210'], 'reduced': ['node237']}, 'Crypton__node165': {'full': ['RET_CryptonCn1_1358_1_YR_only_crypton', 'node148'], 'reduced': ['node165']}, 'Ngaro__node167': {'full': ['node102', 'node82', 'BAS_DIRS_53_XT_3142_1_viper_or_Ngaro', 'node263', 'node448', 'node546'], 'reduced': ['node167']}, 'Crypton__BAS_CryptonA_1_OL_300_1_YR_only_crypton': {'full': ['BAS_CryptonA_1_OL_300_1_YR_only_crypton'], 'reduced': ['BAS_CryptonA_1_OL_300_1_YR_only_crypton']}, 'Ngaro__node222': {'full': [], 'reduced': ['node222']}, 'pat1_like': {'full': ['node224', 'node219', 'node119', 'node217'], 'reduced': ['node194', 'node197', 'node196', 'node262', 'node260', 'node325']}, 'Crypton__BAS_CryptonF_6_PI_451_1_YR_only_crypton': {'full': ['node204'], 'reduced': ['BAS_CryptonF_6_PI_451_1_YR_only_crypton']}, 'Crypton__BAS_CryptonV_1_NV_2095_1_YR_only_crypton': {'full': ['BAS_CryptonV_1_NV_2095_1_YR_only_crypton'], 'reduced': ['BAS_CryptonV_1_NV_2095_1_YR_only_crypton']}, 'PAT__node266': {'full': ['node133'], 'reduced': ['node266']}, 'PAT__node229': {'full': ['node213'], 'reduced': ['node229']}, 'Crypton__node173': {'full': ['BAS_Crypton_10_CGi_839_1_YR_only_crypton', 'BAS_Crypton_9_CGi_1469_1_YR_only_crypton', 'BAS_CryptonA_3_SK_54_1_YR_only_crypton', 'node240'], 'reduced': ['node173']}, 'PAT__node160': {'full': ['node79', 'node56'], 'reduced': ['node160']}, 'kangaroo': {'full': ['node93'], 'reduced': ['Vcar_scaffold_38_112287_1_kangaroo']}, 'Ngaro__node135': {'full': ['node102', 'node155'], 'reduced': ['node135']}, 'dirs1_like': {'full': ['node104', 'node103', 'node382', 'node98', 'node403', 'node299', 'node298', 'node293', 'node78', 'node112', 'node294', 'node381', 'node287', 'node111', 'node110', 'node109', 'node95', 'node120', 'node105', 'RET_RoDirs1_3186_1_dirs1_like', 'node373', 'BAS_DIRS_4_SK_3309_1_DIRS', 'node295', 'node107', 'node291', 'node290', 'node379', 'node376', 'node377', 'node292'], 'reduced': ['node77', 'node73', 'node17', 'node32', 'node30', 'node191', 'node193', 'node8', 'node95', 'node94', 'node50', 'node131', 'node180', 'RET_TcDirs1_3616_1_dirs1_like', 'RET_RoDirs1_3186_1_dirs1_like', 'node68', 'node46', 'node51', 'node3', 'node66', 'node161', 'node121', 'node122', 'node5', 'node4', 'node7']}, 'Crypton__BAS_Crypton_1_NV_637_1_YR_only_crypton': {'full': ['node683'], 'reduced': ['BAS_Crypton_1_NV_637_1_YR_only_crypton']}, 'PAT__node192': {'full': ['node90'], 'reduced': ['node192']}, 'Crypton__BAS_CryptonA_1_LG_614_1_two_domains_crypton': {'full': ['BAS_CryptonA_1_LG_614_1_two_domains_crypton'], 'reduced': ['BAS_CryptonA_1_LG_614_1_two_domains_crypton']}, 'Crypton__node134': {'full': ['node215'], 'reduced': ['node134']}, 'Crypton__BAS_CryptonS_2_PU_272_1_YR_only_crypton': {'full': ['BAS_CryptonS_2_PU_272_1_YR_only_crypton'], 'reduced': ['BAS_CryptonS_2_PU_272_1_YR_only_crypton']}, 'Crypton__BAS_Crypton_1_NV_309_1_YR_only_crypton': {'full': [], 'reduced': ['BAS_Crypton_1_NV_309_1_YR_only_crypton']}}
# This is the unedited dictionary
full_tree = Tree('YR_tree/all_YR_corrected_reviewed_internal_nodes.tre', format=1) full_tree_descendent_names = [] for node in full_tree.get_descendants(): full_tree_descendent_names.append(node.name) reduced_tree = Tree('comparable_trees/YR_outgrouped_corrected_reviewed_internal_nodes.tre', format=1) reduced_tree_descendent_names = [] for node in reduced_tree.get_descendants(): reduced_tree_descendent_names.append(node.name)
def add_node_support(node): if node.support == 1: node.add_face(CircleFace(radius = 4, color = 'black'),column=0, position = "float") elif node.support < 1 and node.support >= 0.95: node.add_face(CircleFace(radius = 4, color = 'gray'),column=0, position = "float") elif node.support < 0.95 and node.support >= 0.9: node.add_face(CircleFace(radius = 4, color = 'gray'),column=0, position = "float") elif node.support < 0.9 and node.support >= 0.8: node.add_face(CircleFace(radius = 4, color = 'gainsboro'),column=0, position = "float")
def add_leaf_name(leaf,color,size): name = leaf.name pattern = re.search(r'(^[^_]+).+(\d+)1(.+)', name) new_name = pattern.group(1) + ' ' + pattern.group(2) + ' ' + pattern.group(3) leaf_name = TextFace(new_name,fsize=size, fgcolor=color) leaf.add_face(leaf_name,0,position='branch-right')
def find_identical_leaves_between_clades_of_different_trees(node_a,node_b): matching_leaf_names = [] leaves_a = node_a.get_leaf_names() leaves_b = node_b.get_leaf_names() matching_leaf_names = list(set(leaves_a).intersection(leaves_b)) return matching_leaf_names
reduced_tree_leaves_that_were_found_in_the_full_tree = []
ts = TreeStyle() ts.draw_guiding_lines = False ts.show_leaf_name = False ts.scale = 50
ns = NodeStyle(hz_line_color = 'black', vt_line_color = 'black', hz_line_width = 3, vt_line_width = 3, fgcolor = 'black', size = 0)
ts2 = TreeStyle() ts2.draw_guiding_lines = False ts2.show_leaf_name = False ts2.scale = 5
for group_name in placement.keys(): reduced_tree_clades = [] full_tree_clades = [] if 'reduced' in placement[group_name].keys() and 'full' in placement[group_name].keys(): reduced_tree_clades = placement[group_name]['reduced'] full_tree_clades = placement[group_name]['full'] full_tree = Tree('YR_tree/all_YR_corrected_reviewed_internal_nodes.tre', format=1) if len(full_tree_clades) > 0 and len(reduced_tree_clades) > 0: for full_tree_clade in full_tree_clades: if full_tree_clade in full_tree_descendent_names: found = 0 for reduced_tree_clade in reduced_tree_clades: if found == 0 and reduced_tree_clade in reduced_tree_descendent_names: matches = find_identical_leaves_between_clades_of_different_trees(full_tree&full_tree_clade,reduced_tree&reduced_tree_clade) if len(matches) > 0: found = 1 reduced_tree_leaves_that_were_found_in_the_full_tree = reduced_tree_leaves_that_were_found_in_the_full_tree + matches full_tree = Tree('YR_tree/all_YR_corrected_reviewed_internal_nodes.tre', format=1) (full_tree&full_tree_clade).dist = 0 Nstyle = NodeStyle(hz_line_color = 'black', vt_line_color = 'black', hz_line_width = 3, vt_line_width = 3, fgcolor = 'black', size = 0) Nstyle["bgcolor"] = 'gainsboro' (full_tree&full_tree_clade).set_style(Nstyle) for n in (full_tree&full_tree_clade).traverse(): n.set_style(Nstyle) for leaf in full_tree&full_tree_clade: if leaf.name in matches: add_leaf_name(leaf,'black',15) else: add_leaf_name(leaf,'black',9) add_node_support(full_tree&full_tree_clade) clade_face = TreeFace(full_tree&full_tree_clade, ts2) (reduced_tree&matches[0]).add_face(clade_face, 4, position = 'aligned')
for leaf in reduced_tree: if leaf.name in reduced_tree_leaves_that_were_found_in_the_full_tree and 'RET' in leaf.name or 'BAS' in leaf.name or 'PAT ' in leaf.name: add_leaf_name(leaf,'green',15) elif leaf.name in reduced_tree_leaves_that_were_found_in_the_full_tree: add_leaf_name(leaf,'black',15) elif 'RET' in leaf.name or 'BAS' in leaf.name or 'PAT ' in leaf.name: add_leaf_name(leaf,'black',15) else: add_leaf_name(leaf,'black',9)
reduced_tree.ladderize() R = reduced_tree.get_midpoint_outgroup() reduced_tree.set_outgroup(R) for node in reduced_tree.traverse(): if not node.is_leaf(): add_node_support(node) for n in reduced_tree.traverse(): n.set_style(ns) reduced_tree.set_style(ns) #reduced_tree.show(tree_style = ts) print 'rendering figure' stdout = reduced_tree.render('./FigureS2.pdf', w=1000, tree_style = ts)
rendering figure
from ete2