Enhance cooccurring_mutations.csv to include all haplotypes per mutation #45

Open
LaraFuhrmann opened this issue Feb 26, 2025 · 2 comments

Comments

@LaraFuhrmann
Collaborator

Description:

Currently, the cooccurring_mutations.csv file lists one row per mutation. When a mutation occurs in multiple haplotypes, only one of them is listed in the haplotype column, so the other haplotypes carrying that mutation are missing from the output. This can lead to incomplete downstream analysis.

Proposed Enhancement:

We suggest modifying the output to include one row per mutation occurrence in each haplotype. This means:

  • If a mutation occurs in multiple haplotypes, there should be a separate row for each of these haplotypes.
  • Each row will contain the mutation information along with its corresponding haplotype.
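
The proposed layout can be sketched as follows. This is only an illustration, not VILOCA's actual code: the mapping `mutation_to_haplotypes`, the function `expand_rows`, and the dictionary keys are hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a mutation is (position, ref, var) and maps
# to every haplotype that carries it. Not taken from the VILOCA code base.
Mutation = Tuple[int, str, str]


def expand_rows(mutation_to_haplotypes: Dict[Mutation, List[str]]) -> List[dict]:
    """Return one row per (mutation, haplotype) pair instead of one per mutation."""
    rows = []
    for (pos, ref, var), haplotypes in mutation_to_haplotypes.items():
        for hap in haplotypes:
            rows.append({"haplotype": hap, "position": pos, "ref": ref, "var": var})
    return rows


example = {
    (2432, "T", "G"): ["hap_0-2268-2468", "hap_1-2335-2535"],
    (2453, "T", "C"): ["hap_1-2268-2468"],
}
rows = expand_rows(example)
for row in rows:
    print(row)
```

With this layout, a mutation shared by two haplotypes contributes two rows, and a mutation found in a single haplotype still contributes exactly one.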
LaraFuhrmann added a commit that referenced this issue Feb 26, 2025
@LaraFuhrmann
Collaborator Author

I have tested the fix on tests/data_1 with the command:

viloca run -a 0.1 -w 201 --mode shorah -x 100000 -p 0.9 -c 0 -r HXB2:2469-3713 -R 42 -f test_ref.fasta -b test_aln.cram --out_format csv "$@"

The fix passes the tests, and the cooccurring_mutations.csv file now looks as intended.
See rows 9 and 19 below: they are the same mutation (T to G at 2432) occurring in two different haplotypes.

9,w-HXB2-2268-2468.reads.fas,hap_0-2268-2468,HXB2,2268,2468,39,2432,T,G,0.46194355049812913,1.0
10,w-HXB2-2268-2468.reads.fas,hap_0-2268-2468,HXB2,2268,2468,39,2439,C,G,0.46194355049812913,1.0
11,w-HXB2-2268-2468.reads.fas,hap_0-2268-2468,HXB2,2268,2468,39,2440,T,A,0.46194355049812913,1.0
12,w-HXB2-2268-2468.reads.fas,hap_1-2268-2468,HXB2,2268,2468,39,2453,T,C,0.5380564495018709,1.0
13,w-HXB2-2268-2468.reads.fas,hap_0-2268-2468,HXB2,2268,2468,39,2467,T,A,0.46194355049812913,1.0
14,w-HXB2-2335-2535.reads.fas,hap_1-2335-2535,HXB2,2335,2535,87,2357,A,C,0.5184725950715671,1.0
15,w-HXB2-2335-2535.reads.fas,hap_1-2335-2535,HXB2,2335,2535,87,2361,A,G,0.5184725950715671,1.0
16,w-HXB2-2335-2535.reads.fas,hap_0-2335-2535,HXB2,2335,2535,87,2362,G,A,1.0,1.0
17,w-HXB2-2335-2535.reads.fas,hap_1-2335-2535,HXB2,2335,2535,87,2363,T,G,0.5184725950715671,1.0
18,w-HXB2-2335-2535.reads.fas,hap_1-2335-2535,HXB2,2335,2535,87,2372,A,G,0.5184725950715671,1.0
19,w-HXB2-2335-2535.reads.fas,hap_1-2335-2535,HXB2,2335,2535,87,2432,T,G,0.5184725950715671,1.0
20,w-HXB2-2335-2535.reads.fas,hap_1-2335-2535,HXB2,2335,2535,87,2439,C,G,0.5184725950715671,1.0
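
A quick way to check for such repeated mutations is to group the rows by mutation and list the haplotypes for each. The sketch below is a minimal example using three rows copied from the output above. The excerpt has no header row, so the column positions (haplotype in the third field, then position, reference base, and variant base in the eighth to tenth fields) are assumptions read off the data, and `haplotypes_per_mutation` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Rows 9, 12 and 19 copied from the output above.
sample = """\
9,w-HXB2-2268-2468.reads.fas,hap_0-2268-2468,HXB2,2268,2468,39,2432,T,G,0.46194355049812913,1.0
12,w-HXB2-2268-2468.reads.fas,hap_1-2268-2468,HXB2,2268,2468,39,2453,T,C,0.5380564495018709,1.0
19,w-HXB2-2335-2535.reads.fas,hap_1-2335-2535,HXB2,2335,2535,87,2432,T,G,0.5184725950715671,1.0
"""

# Assumed zero-based column positions; the excerpt shows no header.
HAP_COL, POS_COL, REF_COL, VAR_COL = 2, 7, 8, 9


def haplotypes_per_mutation(text: str) -> dict:
    """Map each (position, ref, var) to the haplotypes listed for it."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        key = (int(row[POS_COL]), row[REF_COL], row[VAR_COL])
        groups[key].append(row[HAP_COL])
    return dict(groups)


# Keep only mutations that appear in more than one haplotype.
shared = {m: h for m, h in haplotypes_per_mutation(sample).items() if len(h) > 1}
print(shared)
```

For a real run, the same function could be fed the contents of cooccurring_mutations.csv, adjusting the column indices (or switching to `csv.DictReader`) if the file carries a header.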

@LaraFuhrmann
Collaborator Author

  • Updated the behaviour so that the posterior threshold (-p) is no longer applied when filtering mutations for cooccurring_mutations.csv.
  • Updated the README accordingly.

Tested it again on tests/data_1 with the command:

viloca run -a 0.1 -w 201 --mode shorah -x 100000 -p 0.9 -c 0 -r HXB2:2469-3713 -R 42 -f test_ref.fasta -b test_aln.cram --out_format csv "$@"
