Tumor-normal variant calling results change drastically after TrimGalore's application #201
Comments
Hi there, Looking at your command, I don't think that there is anything wrong with it. It should remove Illumina adapters and further filter reads for a quality threshold of Phred 20, and not do anything sinister to the reads that would prevent variant callers from working. Trying to understand what LowEVS is, it seems that others have experienced the same thing: All variants being labeled as "LowEVS" #231 Maybe you could try disabling the quality trimming ( Regarding the discrepancy of numbers: The 41% statistic is a cumulative value of all sequences that could contain the adapter, so for the Illumina adapter
Just relaying some further information here from @maxulysse and @tdanhorn: "The LowEVS filter is Strelka's attempt to score its confidence in variants. I know nothing about it, but it might be based on machine learning, the key word (and E in EVS) being "empirical", which means based on data, rather than theory. I would assume that this was trained on data as it comes off the sequencer (i.e. not quality filtered, but probably adapter-trimmed). If you throw out a bunch of reads based on Q-scores, the data may no longer conform to the assumptions the EVS is based on. Furthermore, you would be reducing the number of reads, and if the coverage is already low or borderline, this would certainly affect variant calling (since in most sequence analysis applications confidence comes from high numbers)." and
"That is my gut feeling, yes. Whenever you filter or trim, you are throwing away information, so I am generally wary of doing it just to make data "clean and nice" (obviously, adapters are not helpful and should be removed)." So yes, I think the current consensus is what I also suggested above, namely to try disabling the quality trimming ( Please let us know how you get on!
Hi Felix, thank you so much for your replies! I looked into this more closely, and I realized there was a (very) dumb mistake in the way my scripts were parsing fastq files. Incredibly, I didn't notice this for days while testing the various options. Sorry about this. Now that everything seems to work fine, I can confirm that results look good when applying TrimGalore prior to mapping. Both adapter trimming and quality filtering seem to affect the number of LowEVS-flagged variants, but neither of them seems to change the number of PASS calls substantially. To put it in practical terms: without running TrimGalore, Strelka2 calls 8316 variants, including 852 PASS and 7462 LowEVS. Running TrimGalore with its default quality filtering (q>20), the total number of variants increases to 9239, including however a very similar number of PASS calls (863) and a higher number of LowEVS variants (8373). Running TrimGalore without quality filtering ( So, in this case at least, it looks like adapter trimming has a bigger effect on the number of LowEVS calls than read quality filtering, but overall the number of PASS variants remains constant (which I take as a good sign). Based on this, I would probably be inclined to use TrimGalore and let it perform its default read quality filtering. If you think it would be wise to disable that, I will proceed that way. Thanks again for your assistance!
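For anyone wanting to make the same comparison, the PASS and LowEVS tallies quoted above can be obtained by counting the FILTER column of each VCF. This is only a minimal sketch, assuming a standard tab-separated VCF with FILTER as the seventh column; the function name `count_filters` and the file name in the usage note are hypothetical, not part of Strelka2 or TrimGalore.

```python
from collections import Counter


def count_filters(vcf_lines):
    """Tally FILTER column values across the records of a VCF.

    `vcf_lines` is any iterable of text lines (for example an open file).
    A record carrying several filters separated by ';' is counted once
    under each filter name, so the totals can exceed the record count.
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and the '##' / '#CHROM' header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        for name in fields[6].split(";"):
            counts[name] += 1
    return counts
```

Usage would look something like `with gzip.open("somatic.snvs.vcf.gz", "rt") as fh: print(count_filters(fh))` (the path is a placeholder). Running it on the untrimmed and trimmed call sets side by side makes it easy to see whether a change in the total comes from PASS calls or from filtered ones.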
Thanks for taking a closer look and reporting back here, this is good news all round! It's good to see this put to the test. I am glad that the quality trimming doesn't seem to make much of a difference (and that, if anything, the default Trim Galore results come out so well). Wishing you all the best with your downstream analyses!
Hello,
I am processing 150bp Illumina paired-end whole-exome sequenced tumor-normal sample pairs, with an ensemble of 6 somatic variant callers.
I ran TrimGalore with the following command (which I then repeat on the normal control sample):
However, variant calling results change drastically without vs with trimming, to the point that there is clearly something wrong (e.g. Strelka2 goes from calling hundreds of PASS variants - including some lab-validated ones, to flagging all calls with the LowDepth/LowEVS filter - other callers also go from calling hundreds of variants to none). Average depth across the target region is >600x for the tumor and >200x for the normal sample, so the Strelka2 LowDepth filters don't make sense to me.
One thing I don't understand (and that I suspect may explain what I see) is that TrimGalore recognizes the Illumina universal adapter in 96741 reads (9.6%) of the first million reads of the tumor R1 file, but then the summary says that "reads with adapters" were 41% (39M out of about 95.6M). A similar discrepancy is present in all files (tumor/normal R1/R2).
Is it possible I am using a wrong command? And is there something that can explain such a large discrepancy between the estimated 9.6% reads with adapters and the effective 41%?
Thanks very much
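One way to see how two "reads with adapters" percentages for the same file can differ is to count adapter hits under two definitions: reads that contain the whole adapter sequence, and reads whose 3' end merely overlaps the start of the adapter by a few bases. The sketch below is illustrative only. It assumes the adapter prefix `AGATCGGAAGAGC`, uses exact string matching with no mismatches, and treats the summary statistic as the looser count; none of this is taken from TrimGalore's or Cutadapt's code, and all function names are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed Illumina universal adapter prefix


def has_full_adapter(read, adapter=ADAPTER):
    """True if the complete adapter sequence occurs anywhere in the read."""
    return adapter in read


def has_3prime_overlap(read, adapter=ADAPTER, min_overlap=1):
    """True if the read contains the adapter, or ends with an adapter
    prefix of at least `min_overlap` bases (exact matching only)."""
    if adapter in read:
        return True
    longest = min(len(adapter) - 1, len(read))
    return any(
        read.endswith(adapter[:k]) for k in range(min_overlap, longest + 1)
    )


def adapter_fractions(reads, min_overlap=1):
    """Return (full_match_fraction, overlap_fraction) for a set of reads."""
    reads = list(reads)
    if not reads:
        return 0.0, 0.0
    full = sum(has_full_adapter(r) for r in reads)
    partial = sum(has_3prime_overlap(r, min_overlap=min_overlap) for r in reads)
    return full / len(reads), partial / len(reads)
```

With `min_overlap=1`, any read that happens to end in `A` is counted, so roughly a quarter of adapter-free reads would be flagged by chance alone. Raising `min_overlap` narrows the gap between the two fractions, which is one possible reason a strict detection step and a permissive trimming summary could report such different numbers.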