findOverlaps with type="equal" and a GRangesList #16

Open
kasperdanielhansen opened this issue Oct 4, 2018 · 6 comments

@kasperdanielhansen

When I use findOverlaps with type="equal" and a GRangesList I get an error:

> findOverlaps(gr, grIntrons.24, type = "equal")
Error in match.arg(type) : 
  'arg' should be one of “any”, “start”, “end”, “within”

(in this case gr is a GRanges and grIntrons.24 is a GRangesList).

This is made even more confusing by the generic

> findOverlaps
standardGeneric for "findOverlaps" defined from package "IRanges"

function (query, subject, maxgap = -1L, minoverlap = 0L, type = c("any", 
    "start", "end", "within", "equal"), select = c("all", "first", 
    "last", "arbitrary"), ...) 
standardGeneric("findOverlaps")
<bytecode: 0x7f8a47c5aea8>
<environment: 0x7f8a48e3a1e0>
Methods may be defined for arguments: query, subject
Use  showMethods("findOverlaps")  for currently available ones.

which strongly suggests type="equal" is valid.

@lawremi
Contributor

lawremi commented Jan 25, 2019

I've recently encountered a use case for the "equal" type. I've defined it along the lines of setequal(), so duplicates and order are ignored when determining whether two compound ranges are "equal". This is consistent with type "within", which checks whether one is a subset of the other. If this sounds OK then I will push it.

@hpages
Contributor

hpages commented Jan 26, 2019

type="within" does not seem to treat a compound range as a set of positions:

gr <- GRanges("chr1:11-15")
grl <- GRangesList(GRanges(c("chr1:11-13", "chr1:12-15")))
findOverlaps(gr, grl, type="within")
# Hits object with 0 hits and 0 metadata columns:
#    queryHits subjectHits
#    <integer>   <integer>
#   -------
#   queryLength: 1 / subjectLength: 1

Given that type "equal" is expected to be more stringent than type "within", it would be counter-intuitive to get a hit in the above situation when replacing type="within" with type="equal".

@lawremi
Contributor

lawremi commented Jan 26, 2019

I guess what I meant is that type="within" requires a within-match for all query ranges (so the query is a subset of the subject in a more general sense), while type="equal" requires equality for all query ranges, and all subject ranges (so it is more like set equality). This is at the range level, not position level.
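
To make the range-level reading concrete, here is a minimal base R sketch of the two semantics described above. It is not the GenomicRanges implementation: `compound()`, `compound_within()` and `compound_equal()` are hypothetical helpers invented for illustration, and a compound range is modelled as a plain data frame of start/end coordinates on a single sequence.

```r
# Illustrative sketch only: these helpers are hypothetical and are NOT the
# GenomicRanges implementation. A compound range is modelled as a data frame
# of start/end coordinates on a single sequence.
compound <- function(start, end) data.frame(start = start, end = end)

# "within" at the range level: every query range must lie inside at least
# one subject range.
compound_within <- function(query, subject) {
  hit <- vapply(seq_len(nrow(query)), function(i) {
    any(subject$start <= query$start[i] & query$end[i] <= subject$end)
  }, logical(1))
  all(hit)
}

# "equal" at the range level: the two sets of ranges coincide, ignoring
# duplicates and order (the setequal() idea).
compound_equal <- function(query, subject) {
  key <- function(x) paste(x$start, x$end, sep = "-")
  setequal(key(query), key(subject))
}

gr <- compound(11, 15)
compound_within(gr, compound(c(11, 12), c(13, 15)))  # FALSE
compound_equal(gr, compound(c(11, 11), c(15, 15)))   # TRUE
compound_equal(gr, compound(c(11, 11), c(15, 14)))   # FALSE
```

Under this reading, "equal" is still more stringent than "within": if the two sets of ranges coincide, every query range trivially lies inside a subject range, so an "equal" hit implies a "within" hit.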

@hpages
Contributor

hpages commented Jan 26, 2019

mmh I see. So IIUC in the GRanges#GRangesList case (Kasper's use case), type="equal" should report a hit when:

gr <- GRanges("chr1:11-15")
grl <- GRangesList(GRanges(c("chr1:11-15", "chr1:11-15")))

but not when:

grl <- GRangesList(GRanges(c("chr1:11-15", "chr1:11-14")))

@kasperdanielhansen is that the semantics you're after?

@lawremi
Contributor

lawremi commented Jan 26, 2019

That's right. Ignoring duplicates and order was easier to program and more efficient, and it was good enough for my use case. That was the only rationale.

@hpages
Contributor

hpages commented Jun 23, 2019

Looks like @lawremi pushed this back in March (commit f25a45f). Michael, any chance you can add a few unit tests and maybe an example in the man page for this new feature?

Feel free to close the issue. Thanks!
