Task: Find all APIs on apis.guru that are categorized as "open_data" #65

Open
jonthegeek opened this issue Feb 7, 2024 · 1 comment

Comments

@jonthegeek
Owner

jonthegeek commented Feb 7, 2024

While not strictly NECESSARY, it's easiest to do this with rectangled data. Those are the LOs I'm expecting here.

This data tibblifies poorly. Don't go into tibblify here yet; save that for a later discussion of the pros and cons of tibblify.
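For contrast, here is a minimal sketch of the non-rectangled route, to show why rectangling is the easier path. It assumes each API entry has `preferred` and `versions` fields and that categories live under `info$"x-apisguru-categories"`. The `all_apis_mock` object and its API names are hypothetical stand-ins, not the real apis.guru data.

```r
# Hypothetical stand-in for `all_apis` (an assumption, not the real data): a
# named list where each API has `added`, `preferred`, and `versions`.
all_apis_mock <- list(
  "alpha.example" = list(
    added = "2020-01-01",
    preferred = "2.0",
    versions = list(
      "1.0" = list(info = list("x-apisguru-categories" = list("location"))),
      "2.0" = list(info = list("x-apisguru-categories" = list("open_data")))
    )
  ),
  "beta.example" = list(
    added = "2019-06-01",
    preferred = "v1",
    versions = list(
      v1 = list(info = list(title = "Beta")) # no categories at all
    )
  )
)

# For each API, look up its preferred version and check that version's
# categories. `unlist()` copes with categories stored as a list or a vector,
# and a missing field (NULL) simply yields FALSE.
is_open_data <- vapply(
  all_apis_mock,
  function(api) {
    categories <- api$versions[[api$preferred]]$info[["x-apisguru-categories"]]
    "open_data" %in% unlist(categories)
  },
  logical(1)
)
open_data_names <- names(all_apis_mock)[is_open_data]
open_data_names
```

This works, but every step requires knowing the nesting ahead of time, and nothing is left behind to inspect. The rectangled version makes each level visible as a column.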

@jonthegeek
Owner Author

# Named list --> nested tibble
all_apis_df <- all_apis |>
  tibble::enframe(name = "api_name")
all_apis_df
all_apis_df$value |> lengths() |> unique()
all_apis_df$value[[1]] |> names()
setdiff(
  names(all_apis_df$value[[1]]),
  names(all_apis_df$value[[11]])
)

# all_apis_df$value contains length-3 named lists. Each value looks like a
# column.
all_apis_versions <- all_apis_df |>
  tidyr::unnest_wider(value)
all_apis_versions
all_apis_versions$versions |> lengths() |> unique()
all_apis_versions$versions |> lengths() |> head(10)
all_apis_versions$versions[[10]] |> names()
setdiff(
  names(all_apis_versions$versions[[10]]),
  names(all_apis_versions$versions[[1]])
)
# Each element of `versions` is a separate API version, and the version names
# are not standardized across APIs. Prime case for unnesting longer.
all_apis_preferred <- all_apis_versions |>
  tidyr::unnest_longer(versions, indices_to = "version") |>
  # We only care about the "preferred" versions.
  dplyr::filter(preferred == version) |>
  # "preferred" and "version" now contain the same info by definition. In this
  # case "added" is duplicated in versions, so let's get rid of it, too. We also
  # want to reorder, so we'll select the columns we care about.
  dplyr::select(api_name, version, versions)
all_apis_preferred
all_apis_preferred$versions |> lengths() |> unique()
setdiff(
  names(all_apis_preferred$versions[[7]]),
  names(all_apis_preferred$versions[[1]])
)
# It looks like there's an optional field, but otherwise these are
# rectangle-able.
all_apis_preferred_wide <- all_apis_preferred |>
  tidyr::unnest_wider(versions)
all_apis_preferred_wide
all_apis_preferred_wide$info |> lengths() |> unique()
all_apis_preferred_wide$info |> lengths() |> head()
setdiff(
  names(all_apis_preferred_wide$info[[4]]),
  names(all_apis_preferred_wide$info[[1]])
)
# all_apis_preferred_wide$info is a list of many possible columns. We don't want
# all of them; we just want the categories.
all_apis_preferred_wide |>
  tidyr::hoist(info, categories = "x-apisguru-categories") |>
  tidyr::unnest_longer(categories) |>
  dplyr::filter(categories == "open_data")
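To try the steps above without the real `all_apis` object, the same pipeline can be run end to end on a small mock. This is only a sketch: `apis_guru_mock`, its API names, and its field values are hypothetical, and storing categories as character vectors is an assumption about how the data was parsed.

```r
# Hypothetical mock shaped like `all_apis` (an assumption, not real data).
apis_guru_mock <- list(
  "alpha.example" = list(
    added = "2020-01-01",
    preferred = "2.0",
    versions = list(
      "1.0" = list(
        added = "2020-01-01",
        info = list(title = "Alpha", "x-apisguru-categories" = c("location"))
      ),
      "2.0" = list(
        added = "2021-01-01",
        info = list(
          title = "Alpha",
          "x-apisguru-categories" = c("open_data", "location")
        )
      )
    )
  ),
  "beta.example" = list(
    added = "2019-06-01",
    preferred = "v1",
    versions = list(
      v1 = list(
        added = "2019-06-01",
        info = list(title = "Beta", "x-apisguru-categories" = c("financial"))
      )
    )
  ),
  "gamma.example" = list(
    added = "2022-03-01",
    preferred = "1",
    versions = list(
      "1" = list(added = "2022-03-01", info = list(title = "Gamma"))
    )
  )
)

open_data_apis <- apis_guru_mock |>
  tibble::enframe(name = "api_name") |>
  tidyr::unnest_wider(value) |>
  # One row per API version, with the version name in its own column.
  tidyr::unnest_longer(versions, indices_to = "version") |>
  dplyr::filter(preferred == version) |>
  dplyr::select(api_name, version, versions) |>
  tidyr::unnest_wider(versions) |>
  # Pull only the categories out of `info`, then one row per category.
  tidyr::hoist(info, categories = "x-apisguru-categories") |>
  tidyr::unnest_longer(categories) |>
  dplyr::filter(categories == "open_data")
open_data_apis
```

Note that an API with no categories (the hypothetical "gamma.example") drops out at `unnest_longer(categories)`, since empty elements are dropped by default.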
