Allow issues = "*" to get all issues of a single data point #260

Open
ryantibs opened this issue Nov 10, 2020 · 5 comments
Labels
covidcast Python package, covidcast R package, engineering (used to filter issues when syncing with Asana), good first issue (good for newcomers)

Comments

@ryantibs
Member

ryantibs commented Nov 10, 2020

Currently as I understand it, there's not a super convenient way in the covidcast_signal() function (in the covidcast R package) to specify that I want all issues of a single data point. To get all issues of the JHU deaths in PA on Sept 1, I could use, for example:

covidcast::covidcast_signal(
  data_source = "jhu-csse",
  signal = "deaths_incidence_num",
  geo_type = "state",
  geo_values = "pa",
  start_day = "2020-09-01",
  end_day = "2020-09-01",
  issues = c("2020-09-01", "2020-11-10")
)

It would be convenient if I could set issues = "*" to return the same thing. Similarly for Python.

Tagging @sarah-colq @chinandrew to draw their attention. This is pretty low priority, but should be an easy fix.

@capnrefsmmat
Contributor

The Epidata API actually doesn't support fetching all issues, so this will have to be supported on the server side before clients can add it. cc @krivard to add this to the Epidata wishlist

@ryantibs
Member Author

Why couldn't this just be interpreted within the covidcast_signal() function as issues = c(start_day, Sys.Date())?

When the Epidata API itself allows for issues = "*", we can use that instead.
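For the Python package, the client-side interpretation suggested here might look like the following minimal sketch. The helper name `resolve_issues` and its `today` parameter are hypothetical and not part of the covidcast package; the sketch only assumes that a two-element tuple of dates is read as an inclusive issue range.

```python
from datetime import date


def resolve_issues(issues, start_day, today=None):
    """Expand the wildcard "*" into an explicit (first, last) issue range.

    Hypothetical helper: anything other than "*" is passed through unchanged.
    """
    if isinstance(issues, str) and issues == "*":
        last = today if today is not None else date.today()
        return (start_day, last)
    return issues


# The Sept 1 example from the first comment, pinned to the day it was posted.
print(resolve_issues("*", date(2020, 9, 1), today=date(2020, 11, 10)))
```

Passing `today` explicitly keeps the helper deterministic in tests; a real client would most likely default to the current date, as `Sys.Date()` does in the R suggestion.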

@krivard
Contributor

krivard commented Nov 11, 2020

We probably need to use max(start_day, min_issue) since we have a bunch of signals whose first issue included data from multiple months beforehand. min_issue is not supported yet in metadata. There's a PR for it (cmu-delphi/delphi-epidata#236) but the logic is tricky and we don't want to inadvertently double the running time of the meta cache updates.
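A rough Python sketch of the max(start_day, min_issue) rule, under the assumption that min_issue eventually becomes available from metadata. The function name `wildcard_issue_range` and the example dates are hypothetical; since min_issue is not in metadata yet, the sketch accepts `None` and falls back to start_day.

```python
from datetime import date


def wildcard_issue_range(start_day, min_issue, today):
    """Return an inclusive (first, last) issue range for issues = "*".

    min_issue is a hypothetical metadata field (earliest issue of the signal);
    pass None while it is unavailable, which falls back to start_day.
    """
    first = start_day if min_issue is None else max(start_day, min_issue)
    return (first, today)


# A made-up signal first issued in June that backfilled data from March on:
# the range starts at the first issue rather than at the requested start_day.
print(wildcard_issue_range(date(2020, 3, 1), date(2020, 6, 15), date(2020, 11, 11)))
```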

SumitDELPHI added the engineering label on Dec 2, 2020

@brookslogan
Collaborator

brookslogan commented Aug 7, 2022

We use this kind of query to build epi_archives in epiprocess. In case this is relevant for v4 or implementing issues="*" in the API:

  • If we ever include forecasting signals, max(start_day, min_issue) would be buggy. We might favor min_issue instead, but that might break covidcast_days. On the API server side, issues="*" could take the simpler approach of just not filtering by issue at all.
  • This is already easy in epidatr by using covidcast(......., issues = epirange(12340101, 34560101)), but this approach looks a little ugly (and will be buggy in a short ~1400 years), and could not be simply applied to the covidcast library because covidcast_days works off of an issue sequence built from the requested range.
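The server-side option in the first bullet (not filtering by issue at all) can be illustrated with a toy Python sketch. Everything here is hypothetical: `filter_by_issue`, the row layout, and the data are made up for illustration and do not reflect how the Epidata server is implemented. Dates are YYYYMMDD integers, as in the epirange call above.

```python
def filter_by_issue(rows, issues):
    """Keep rows whose issue lies in the inclusive (first, last) range.

    Hypothetical server-side filter; "*" skips issue filtering entirely.
    """
    if isinstance(issues, str) and issues == "*":
        return list(rows)
    first, last = issues
    return [row for row in rows if first <= row["issue"] <= last]


# Made-up rows; the last one mimics a forecast, issued before the date
# it refers to.
rows = [
    {"time_value": 20200901, "issue": 20200902},
    {"time_value": 20200901, "issue": 20201110},
    {"time_value": 20200901, "issue": 20200825},
]

ranged = filter_by_issue(rows, (20200901, 20201110))
everything = filter_by_issue(rows, "*")
print(len(ranged), len(everything))
```

In this toy data, a range that starts at the time value drops the forecast-like row, while "*" returns all three rows without needing sentinel dates.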

@krivard
Contributor

krivard commented Aug 8, 2022

> using covidcast(......., issues = epirange(12340101, 34560101))

@melange396 was this the usage you thought was causing performance problems?


5 participants