Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'raw' #95
Comments
I traced the bug to something even weirder. Somehow this code changes bam$subseq, which isn't even referenced in the as(as(as.raw(x), "XRaw"), "BString") code. It is hard to explain without sharing the raw data and exact code, which I can do offline because of the large data size. But somehow, modifying one element of a list with the string conversion above causes a later access to another element of that list to throw the error "Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'raw'". And this only happens in Rscript mode, not in interactive mode. There must be some really deep bug somewhere in Biostrings or one of its dependencies, or maybe in base R itself. I will keep trying to narrow down the exact source and produce a minimal reproducible example, if you'd like to wait for that.
Hi @gevro,
Maybe Rscript is not using the same R as the one you use interactively? Please show us the output of:
Also show us the output of:
Thanks,
Interactive and Rscript both seem to use the same R:
Interactive:
Rscript:
Also, overnight, I found even stranger things suggesting that memory is somehow leaking across variables, which makes this bug really weird. As mentioned above, I traced the issue to a specific line that I wrote based on your earlier suggestion for issue #65. Somehow that line corrupts a different variable that I'm not even accessing.
Notice that the command between the two head(bam$subseq[includereads]) calls doesn't even modify the bam variable that those calls query. Also, when I remove the outermost lapply from the problematic line, I get a different error:
This also happens without the D.letter option:
Note that this line of code works fine for hundreds of other input files; something specific about this input file triggers the error. But the code should be robust. I don't understand what might be happening, or how assigning to one variable can cause a completely different variable to throw an error, not to mention the difference between interactive and Rscript mode. For context, this line converts the ip and pw tags (inter-pulse duration and pulse width) of a PacBio BAM file to a BStringSet and then layers them onto the read according to its CIGAR. The ip and pw tag specs are defined here: https://pacbiofileformats.readthedocs.io/en/latest/BAM.html Example tag:
Update: I found a workaround, though I still have no idea what causes the bug. Before this step:
Do this:
i.e., write the bam$subseq objects to disk and read them back. Why this works is a mystery.
Hi,
This bug is happening again. I have created reproducible code and input data. Can someone connect with me so I can transfer the code and data needed to reproduce it? This is a very bizarre bug, likely some kind of data/buffer overflow. Schematically, I'm seeing weird things like this:
data <- list()
However, with this code, the bug does NOT occur. Somehow, saving data$A is affected by a prior step of loading data$B, which doesn't make sense.
Also, with this code, the bug does NOT occur. Somehow, saving data$A and reading it back in before the step of loading data$B prevents the bug from occurring.
Please post the output of sessionInfo() from the session in which the bug is triggered, after the error occurs.
Please ensure that BiocManager::valid() returns TRUE.
I think it would also be illuminating for you to set options(error=recover) before running the code that triggers the bug. You will get a stack trace that will help pin down where the bug is. Post the whole trace.
Thanks.
I have the ~2 GB of input data and code that should reproduce the bug, if you or anyone else would like to examine it.
You're using Biostrings 2.68.1, which belongs to Bioconductor 3.17. Please update your installation to the latest version of Bioconductor (
Interesting: it looks like this error no longer happens after the upgrade! I wonder what the issue was. This was the strangest bug I've ever seen in R.
Update: The error happened again with the latest Biostrings and R versions, this time with a different chunk of the data. Fortunately, it now happens with an even smaller set of code lines, which helps me narrow down what might be going on. I will let you know once I have minimal reproducible code.
@hpages: OK, I have a minimal reproducible example: ~200 lines of code and an 845 MB input file. Would you like me to share it with you? It is a very bizarre bug. Changing most of these 200 lines, even lines unrelated to the object whose manipulation triggers the error, affects whether the error occurs.
@gevro Thanks for your hard work tracking and narrowing down this nasty bug. Much appreciated. Can you please attach the file containing the code to your next comment? For the data, it would be great if you could put it on a file sharing service like Dropbox or similar. Thanks again!
Thanks. Since I need to protect the input data (I couldn't make a reproducible version with dummy data), I can send it to you by e-mail.
Unfortunately, if the data is 845 MB, email is not going to work (generally speaking, email attachments cannot or should not exceed 10 MB or 20 MB).
Oh, I meant I will send a Dropbox link to the file.
Ah, of course, that makes sense. Please send it to
Hi, I sent the Dropbox link with the script and files to reproduce the bug. Thanks!
Hi, I also e-mailed @hpages a Docker image that reproduces the bug. It seems like a memory overflow bug.
Hi, just checking whether someone from the developer team has been able to reproduce and look into this bug? Thank you!
hpages is on vacation; send the Dropbox link to carey dot vj at gmail and I will have a look.
Thanks, sent!
Hi all, I managed to reproduce the bug inside the Docker container while running with valgrind. It looks like the bug might be at XStringSet_class.c:141, or at least that is the line triggering the memory leak. But this is the first time I'm running valgrind, so to be honest, I don't really know how to interpret the output. I previously sent @vjcitn and @hpages the Docker image to reproduce this, and I have now updated it to include valgrind. Output below:
Hi all, just bumping this issue again, since I think I've narrowed it down to a specific line in the Biostrings C code. Realistically, only someone familiar with that code will be able to figure out the memory overflow bug. Thanks!
@ahl27, are you able to look at this?
Yep, I'll take a look at it tomorrow. Thanks for the ping!
Hi @ahl27, thanks! What is your email address, so I can send you a link and instructions for a Docker image that reproduces the bug exactly?
You can send it to [email protected]. It's been a while since I read through this bug report, so I'll reread it tomorrow. It's been on my list for a while, and now that unit testing is done I can actually spend some time on it.
Just a quick update, since it's been a week: I did get the data from @gevro, and I am able to reproduce the bug in a Docker container. I can also reproduce it on my own machine, so that's progress. I haven't yet been able to track down exactly what causes the bug, and I'm not yet convinced it comes from Biostrings. There is certainly a memory issue somewhere, but I haven't yet determined whether it comes from Biostrings misallocating memory or from a different function call corrupting memory allocations. This week is particularly busy for me, but I'm planning to spend a lot more time on this next week and hopefully reach a resolution. This kind of bug feels a lot like an issue with PROTECT calls. To start testing that, I re-ran the examples using
Hopefully I'll have more progress to report next week.
Hi,
I'm encountering a very strange bug/error.
Code causing the error:
Error:
I have upgraded to the latest versions of Biostrings, GenomicAlignments, Rsamtools, etc., current as of yesterday.
Since I cannot share the input data here: if any of you has a sense of what this might be, please share your e-mail address and I can send you example data offline.