-
-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sec10k metadata directly in PUDL #4035
Merged
Merged
Changes from 1 commit
Commits
Show all changes
23 commits
Select commit
Hold shift + click to select a range
900f926
Add sec10k metadata directly in PUDL
zschira 5931d9b
Remove deltalake dependency
zschira 31faab3
Add missing sec10k resource metadata
zschira 26140ec
Improve sec10k schema definitions
zschira 90aab59
Override fraction_owned desc for sec tables
zschira 5620521
Improve company_id_sec description
zschira e2bca00
Convert year_quarters to dates
zschira f07f9f9
Only load sec assets if env variable is set
zschira d875d4f
Merge branch 'main' into add-sec-metadata
zaneselvans 3858ba5
Merge in main and relock dependencies
zaneselvans 94dda09
Fix inconsistent column name
zschira fb3f1ad
Remove unnecessary column
zschira 9f86921
improve description of company_id_sec
zschira f091f24
Split sic into two columns
zschira 29838e1
Fix date types
zschira ef4e91d
Add sec tables to sqlite
zschira 3c765cb
Merge branch 'add-sec-metadata' of github.com:catalyst-cooperative/pu…
zschira ef7eb3d
Fix conflicting heads
zschira f9c3e76
Fix date column name conventions
zschira b3cf21c
Make subsidiary references consistent
zschira 61163e0
Actually fix alembic issues (hopefully)
zschira 8ecac80
Update SEC 10-K column naming and SIC industry code dtype.
zaneselvans fa201ff
Update micromamba docker image.
zaneselvans File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels odd that this is under the
analysis
subpackage given that the analysis isn't done here. It seems more like a straightforward ETL of some tables, albeit tables that we ourselves have produced elsewhere. Would it be more legible if it followed the same pattern as for other datasets, adding ansec10k
module under theextract
andtransform
subpackages?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the organization feels weird, but I don't think that would quite make sense either. We are doing some basic transformations in PUDL right now, but ideally these would all be moved back upstream so all we actually do in PUDL is read the parquet files in so they can be distributed. In that case the extract step would be creating
core
/out
tables which doesn't feel right. Would it make sense to add this to theetl
directory?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is just a new class of externally processed table? If it doesn't clearly fit into one of the other categories right now then I guess we can figure out where to put it when other tables like this show up.
The
etl
directory feels like a weird junk drawer to me. I still like the idea of organizing the subpackages by data source, given that different data sources have ended up having slightly different ETL journeys that don't necessarily fit neatly in to E, T, and L (and the fact that L has been almost totally take over by Dagster infra)