Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow comparison of XVector objects (part 1 of 2) #127

Open
wants to merge 3 commits into
base: devel
Choose a base branch
from

Conversation

ahl27
Copy link

@ahl27 ahl27 commented Jan 7, 2025

This is one of two PRs intended to fix a bug identified in Biostrings here: Bioconductor/Biostrings#51. Part 2 of the PR is located here: Bioconductor/XVector#6.

The bug identified was that comparing a DNAString with a single character caused a C stack too close to limit error. It turns out that the bug ran a lot deeper--comparison between XVector objects was completely broken. For example:

x <- XVector(10)
y <- sample(10)
x == y
## Error: C stack usage  7953792 is too close to the limit

XInteger(10) == XInteger(10)
## works fine

XInteger(10) < XInteger(10)
## Error: C stack usage  7953792 is too close to the limit

The culprit stems originally from the S4Vectors package. pcompare is defined for Vector objects, which uses order and sameAsPreviousROW to perform element-wise comparisons. However, neither of these methods are written for SharedVector objects, so they default to whatever we get from dispatch. order dispatches to the standard base::order function, which always causes a stack overflow when called on a SharedVector object. sameAsPreviousROW calls the default method in S4Vector, which calls .XVector.equals, which will always return c(FALSE, FALSE) or c(FALSE, TRUE) regardless of input size.

This PR adds in the functionality required in the S4Vectors package to support comparison between SharedVector objects. This is primarily just a C-level method to order an array -- this was previously in the package, but only supported integer vectors. This comparison uses memcmp when possible, but a separate method is included for numeric types because memcmp has weird behavior on doubles. Eventually will need to update the code to support the smart dispatch between sorting algorithms like in the other sorting functions, but for now this is sufficient.

@ahl27
Copy link
Author

ahl27 commented Jan 7, 2025

As with Bioconductor/XVector#4, I'll go through these methods to make sure there's consistency in their ordering -- based on a quick scan there are some that need to be moved around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant