Incorrect county FIPS code for Bedford, VA #3531

Open
gschivley opened this issue Apr 2, 2024 · 10 comments

@gschivley
Contributor

Describe the bug

The addfips package is labeling Bedford, VA as '51515', which is the code for Bedford City. It should actually be '51019' (Bedford County). See their list of FIPS codes.

Bug Severity

How badly is this bug affecting you?
Medium: I was able to identify and fix the bug in my own workflow but it might affect other people.

To Reproduce

I found the error in the core_eia861__yearly_service_territory table. Census population files do not have the FIPS code 51515.
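A workflow-level workaround along the lines described above could be sketched as follows. This is only an illustration, not the fix that was actually used: `FIPS_CORRECTIONS`, `correct_county_fips`, and `unknown_fips` are hypothetical names, and the only correction listed is the one reported in this issue.

```python
# Hypothetical correction table: code emitted by the lookup -> intended code.
# 51515 (Bedford City) -> 51019 (Bedford County), as reported in this issue.
FIPS_CORRECTIONS = {"51515": "51019"}


def correct_county_fips(fips):
    """Return the corrected county FIPS code, or the input if no fix applies."""
    if fips is None:
        return None
    return FIPS_CORRECTIONS.get(fips, fips)


def unknown_fips(codes, census_codes):
    """Return the non-null codes that are absent from a Census reference set."""
    return {code for code in codes if code is not None and code not in census_codes}


# Example rows as they might come out of a county FIPS column.
rows = ["51019", "51515", "51059", None]
fixed = [correct_county_fips(code) for code in rows]
```

Running `unknown_fips` against a set of codes taken from a Census population file, before and after the correction, is one way to confirm that no unmatched codes remain.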

@gschivley gschivley added the bug Things that are just plain broken. label Apr 2, 2024
@e-belfer
Member

e-belfer commented Apr 3, 2024

fitnr/addfips#8
It looks like this particular issue was flagged by @TrentonBush almost a year ago, and addressed in the underlying package but not released. So perhaps we just need to bug them to cut a new release.

@e-belfer
Member

Update just got pushed! Should be a simple matter of updating dependencies, I'll throw this issue into this sprint.

@jdangerx
Member

As far as I can tell we're still waiting on the maintainer to merge their fix commit which apparently didn't make it into the release. I'll bump them again.

@jdangerx
Member

I guess we could also pin to their fix/8 branch, but we'll see if they respond to my nag first.

@TrentonBush
Contributor

addfips exists to do one job and it fails to do it. Considering the whole package is like 300 lines and the maintainer doesn't maintain it, I think we should replace it. One option is to simply vendor it, another would be to replace it with something like Google's geocoder, which is much more powerful. I have used Google geocoder in a client project for years with good results.

@jdangerx
Member

By Google's geocoder, you mean https://geocoder.readthedocs.io/index.html? Just poking around, it seems like you'd need a TAMU key to pull FIPS codes out of county names. But it also seems like there are some federal APIs we could hit to get the FIPS codes?

@TrentonBush
Contributor

I meant Google Maps Platform's Geocoding API. IMO the primary advantages are that:

  1. they already implemented fuzzy matching (good for manually entered data with misspellings)
  2. it can handle any granularity from street address or lat/lon up to country name.

The disadvantages I am aware of are:

  • I don't think you can select a historical map to reference
  • it will update the reference maps on its schedule, not yours
  • if you're running it on every automated build, you'll need to make a caching layer or suffer network latency and per-call costs.

I use a cache layer and my usage always fits in the (generous) free tier. Occasionally cache invalidation issues cause minor annoyance, but it is easy to fix with a refresh.

@jdangerx
Member

Ah sweet! What do you do for a caching layer?

I also just spent a few minutes poking around the documentation and couldn't see where a FIPS code would get returned, unless it comes back as the short_name of an administrative_area_level_2 address component. Has that been your experience?
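For reference, here is a rough sketch of pulling the county-level component out of a Geocoding API JSON response. The component type for counties is `administrative_area_level_2`. `SAMPLE_RESPONSE` is a hand-written, trimmed payload for illustration, not output captured from the API, and `county_from_geocode` is a hypothetical helper name.

```python
def county_from_geocode(response):
    """Return (long_name, short_name) of the first county-level component, or None."""
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "administrative_area_level_2" in component.get("types", []):
                return component["long_name"], component["short_name"]
    return None


# Hand-written sample in the documented response shape (illustrative only).
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [
        {
            "address_components": [
                {
                    "long_name": "Bedford County",
                    "short_name": "Bedford County",
                    "types": ["administrative_area_level_2", "political"],
                },
                {
                    "long_name": "Virginia",
                    "short_name": "VA",
                    "types": ["administrative_area_level_1", "political"],
                },
                {
                    "long_name": "United States",
                    "short_name": "US",
                    "types": ["country", "political"],
                },
            ]
        }
    ],
}

county = county_from_geocode(SAMPLE_RESPONSE)
```

In this sample both name fields carry a county name rather than a code, so a FIPS code would still have to come from a separate lookup.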

@TrentonBush
Contributor

TrentonBush commented Sep 19, 2024

Ah ya I use this as a cleaning/standardization function to convert dirty inputs to the official county names. Then you can do a simple join against the official Census data to get FIPS codes. But you need both!
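The two-step approach described above (standardize the name, then join against Census data) could be sketched like this. Everything here is an assumption for illustration: `ALIASES` stands in for the geocoder's cleaning step, `CENSUS_COUNTY_FIPS` is a two-row excerpt standing in for the full Census county table, and the function names are hypothetical.

```python
# Illustrative excerpt of a Census county reference table: (state, name) -> FIPS.
CENSUS_COUNTY_FIPS = {
    ("VA", "Bedford County"): "51019",
    ("VA", "Fairfax County"): "51059",
}

# Hypothetical stand-in for the geocoder: maps dirty input to an official name.
ALIASES = {
    "bedford": "Bedford County",
    "bedford co": "Bedford County",
    "bedford county": "Bedford County",
    "fairfax": "Fairfax County",
}


def standardize_county(raw_name):
    """Normalize case, periods, and whitespace, then map to an official name."""
    key = " ".join(raw_name.lower().replace(".", " ").split())
    return ALIASES.get(key)


def lookup_fips(raw_name, state):
    """Step 1: clean the name. Step 2: join against the Census table."""
    official = standardize_county(raw_name)
    if official is None:
        return None
    return CENSUS_COUNTY_FIPS.get((state, official))
```

Keeping the two steps separate means the reference table can be swapped for a different Census vintage without touching the cleaning step.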

Also I now realize the work I was referencing is actually public, so I'll just link to it. Sorry in advance for the data scientist quality code 😇

The row-level memory cache saves duplicate API calls per session (e.g., looking up the same county 1000 times), and the dataframe-level disk cache saves duplicate calls between runs (when a source dataset is unchanged).

I didn't automate the cache invalidation, I just do it manually because updates are infrequent. But the free tier resets each month, so a monthly clear could make sense.
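A minimal sketch of the two cache levels described above, not the linked implementation. `_fake_geocode` is a hypothetical stand-in for the network call, and plain lists are used in place of a dataframe so the example stays self-contained.

```python
import functools
import hashlib
import json
from pathlib import Path

CALLS = {"count": 0}  # counts simulated API calls, for demonstration only


def _fake_geocode(name, state):
    """Hypothetical stand-in for a network geocoding call."""
    CALLS["count"] += 1
    return f"{name.strip().title()} County"


@functools.lru_cache(maxsize=None)
def geocode_cached(name, state):
    """Row-level memory cache: repeated (name, state) pairs hit the API once per session."""
    return _fake_geocode(name, state)


def geocode_table(rows, cache_dir):
    """Table-level disk cache: reuse stored results when the input rows are unchanged."""
    # The cache key is a hash of the input, so any change to the rows misses the cache.
    key = hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = [geocode_cached(name, state) for name, state in rows]
    cache_file.write_text(json.dumps(result))
    return result
```

With this layout, manual invalidation amounts to deleting the files in `cache_dir`; a monthly clear would do the same on a schedule.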

@e-belfer
Member

Migrating this discussion over to #3884 to discuss options for fixing this!
