Resource field description cleanup #3283
* Remove fields that are not in any PUDL tables.
* Add descriptions to fields that do not have a description.
* Make field descriptions mandatory.
* Update unit tests to have description fields when making test Resources.
* Fix docstring examples so the builds don't fail.

* Add a description to `RESOURCE_METADATA` for tables missing a description.
* Make the `description` argument required for `Resource()` instances.
* Update unit tests to have description fields for `Resource()` instances.
* Fix docstring examples so the builds don't fail.
* Tack on the alembic update that should have gone with the previous commit.

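The PR doesn't show PUDL's actual `Field` and `Resource` classes, so the sketch below uses a hypothetical dataclass to illustrate what making the description mandatory amounts to: constructing a resource without a `description` fails immediately. Rejecting blank descriptions as well is an added assumption, not something this PR states.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a table-metadata class (not PUDL's real model)."""

    name: str
    # No default value: omitting ``description`` raises TypeError at construction.
    description: str
    fields: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: also reject empty or whitespace-only descriptions,
        # which a plain required argument would otherwise let through.
        if not self.description.strip():
            raise ValueError(f"Resource {self.name!r} needs a non-empty description.")
```

With this shape, `Resource(name="example_table")` raises `TypeError`, which is why the unit tests that build test `Resource` objects had to start passing a description.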
* Remove the `xfail` from the `test_defined_fields_are_used()` function.
* Remove three unused fields that I missed before.
* Update the alembic file for those three removed fields.
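`test_defined_fields_are_used()` checks that every defined field appears in at least one table. A minimal sketch of that check follows. It assumes that field metadata is a dict keyed by field name and that each resource lists its columns under `["schema"]["fields"]`. The sample entries are hypothetical placeholders, not PUDL's real metadata.

```python
# Hypothetical, heavily simplified metadata for illustration only.
FIELD_METADATA = {
    "plant_id_eia": {"type": "integer"},
    "report_year": {"type": "integer"},
    "virtual_customers": {"type": "integer"},  # defined but never used below
}
RESOURCE_METADATA = {
    "example_table": {"schema": {"fields": ["plant_id_eia", "report_year"]}},
}


def unused_fields(field_meta: dict, resource_meta: dict) -> set[str]:
    """Return the names of fields that no resource schema references."""
    used = {
        name
        for resource in resource_meta.values()
        for name in resource["schema"]["fields"]
    }
    return set(field_meta) - used


def assert_defined_fields_are_used(field_meta: dict, resource_meta: dict) -> None:
    """Fail with a readable message listing every unused field."""
    unused = unused_fields(field_meta, resource_meta)
    assert not unused, f"Defined but unused fields: {sorted(unused)}"
```

Here `unused_fields(FIELD_METADATA, RESOURCE_METADATA)` returns `{"virtual_customers"}`. The check therefore fails until that field is either deleted, as this PR does for the fields listed in the description, or added to a table.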
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff:

| | main | #3283 | +/- |
|---|---|---|---|
| Coverage | 92.7% | 92.7% | -0.0% |
| Files | 145 | 144 | -1 |
| Lines | 13100 | 13091 | -9 |
| Hits | 12142 | 12133 | -9 |
| Misses | 958 | 958 | |
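The "-0.0%" change can look odd, so this short Python check redoes the arithmetic using only the numbers from the Codecov report. The PR removed 9 lines, and all of them were covered. Misses therefore stay at 958, while the covered fraction drops by about 0.005 percentage points, which rounds to -0.0%.

```python
# Numbers copied from the Codecov report.
base_hits, base_lines = 12142, 13100  # main
head_hits, head_lines = 12133, 13091  # this PR

base_cov = 100 * base_hits / base_lines
head_cov = 100 * head_hits / head_lines
delta = head_cov - base_cov

# Prints: main: 92.7%  PR: 92.7%  change: -0.0%
print(f"main: {base_cov:.1f}%  PR: {head_cov:.1f}%  change: {delta:+.1f}%")
```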
One question I have is how or whether we will create descriptions for the output tables (all tables with the …
I think given that we intend the …
Mostly the resource descriptions look good! I made a few suggestions.
We should remove the `xfail` from the two unit tests that check whether all fields and resources have descriptions. They're currently XPASS in the unit tests, so it seems like they should pass. Alternatively, we could remove them entirely, since we're now requiring descriptions.
It does seem like there are some table name clarifications/simplifications that would be good to do here while we're still cleaning up old names. @bendnorman, what do you think about getting this into the Great Rename? The old EIA-861 table names were pulled from the EIA-861 forms/spreadsheets without much intention, since they were "interim" 😆.
@aesharpe, would you be willing to put your detailed per-field "suggested fixes" in context in the `fields.py` file as a self-review? I don't have enough context to understand what's going on with all those fields. I would end up trying to transfer all the comments myself to see what's going on, and you've got more context on them.
I have not yet reviewed `fields.py` (converting to draft so the …
I think it makes sense to remove them because of the required description field.
…cription fields because now they are required (#3283).
…mbine the `energy_displaced_mwh` column with the `sold_to_utility_mwh` column. The former only shows up in 2007–2009 and, on further inspection, seems analogous to the latter. Removed the former from the schema and updated the column map to point the old `energy_displaced_mwh` columns at the `sold_to_utility_mwh` column.
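The column-map change above works because each year's raw header can be renamed to the same standardized column. As a result, 2007–2009 data lands in `sold_to_utility_mwh` alongside later years. Below is a minimal sketch of the idea, using a hypothetical per-year dict and helper; the raw header names and years are illustrative placeholders, not PUDL's real EIA-861 column maps.

```python
# Hypothetical per-year column maps: raw spreadsheet header -> standardized column.
# Raw header names are placeholders, not the actual EIA-861 headers.
COLUMN_MAP = {
    2008: {"utility_id": "utility_id_eia", "energy_displaced": "sold_to_utility_mwh"},
    2010: {"utility_id": "utility_id_eia", "sold_to_utility": "sold_to_utility_mwh"},
}


def standardize(year: int, raw_rows: list[dict]) -> list[dict]:
    """Rename one year's raw columns; columns missing from the map are dropped."""
    mapping = COLUMN_MAP[year]
    return [
        {mapping[col]: val for col, val in row.items() if col in mapping}
        for row in raw_rows
    ]


rows = standardize(2008, [{"utility_id": 1, "energy_displaced": 10.0}])
rows += standardize(2010, [{"utility_id": 1, "sold_to_utility": 12.5}])
```

Both years now share one `sold_to_utility_mwh` column. Silently dropping unmapped columns is a simplification in this sketch. A real pipeline would more likely flag them.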
If you feel pretty confident about the "total energy displaced" and "sold to utility" re-org then I think this is good to go assuming all the tests pass.
I am playing "Human Merge Queue" and just kicked off the CI on #3294. Once it finishes and merges in (or fails), I'll kick it off here too and get this PR merged in.
This PR addresses the fixes laid out in #3224:

* … `Field` and `Resource` instances
* … `test_defined_fields_are_used` test so we make sure all fields are being used

Notes on unused fields that were removed:
(I went through and checked when each description-less field was last edited to see what might have happened)
* `active`: no record of addition
* `country`: changed to `owner_country` in the `out_eia860__yearly_ownership` table (from the `ownership_eia860` table)
* `credits_or_adjustments`: no record of addition
* `delivery_customers`: no record of addition
* `depreciation_amortization_value`: from the `f1_dacs_epda` table; I think it was changed to `dollar_value` in the `out_ferc1__yearly_depreciation_summary_sched336` table
* `electric_plant`: no record of addition
* `energy_source`: no record of addition
* `environmental_equipment_name`: from the `raw_eia923__emissions_control` table, which hasn't been transformed yet
* `expense`: changed to `dollar_value` in the `core_ferc1__yearly_operating_expenses_sched320` table
* `fuel_transportation_mode`: no record of addition
* `future_plant`: no record of addition
* `income`: changed to `dollar_value` in the `core_ferc1__yearly_income_statements_sched114` table
* `is_total`: used to identify total rows in FERC Form 1 tables
* `leased_plant`: no record of addition
* `line_id`: no record of addition
* `month`: no record of addition
* `notes`: used to show any notes extracted from FERC Form 1 tables
* `operator_name`: not exactly sure; maybe dropped from the EIA-860 ownership `out_` table?
* `operator_state`: not exactly sure; maybe dropped from the EIA-860 ownership `out_` table?
* `operator_utility_id_eia`: not exactly sure; maybe dropped from the EIA-860 ownership `out_` table?
* `other`: no record of addition
* `other_total`: no record of addition
* `owner_name`: owner_name
* `peak_demand_summer_mw`: old FERC-714 table `description_pa_ferc714`
* `peak_demand_winter_mw`: old FERC-714 table `description_pa_ferc714`
* `period_nox`: from the `raw_eia860__boiler_info` table, which gets pulled into the `_core_eia860__boilers` table, but I guess the columns get dropped
* `period_particulate`: from the `raw_eia860__boiler_info` table, which gets pulled into the `_core_eia860__boilers` table, but I guess the columns get dropped; not sure if it's intentional?
* `period_so2`: from the `raw_eia860__boiler_info` table, which gets pulled into the `_core_eia860__boilers` table, but I guess the columns get dropped; not sure if it's intentional?
* `prime_mover`: no record of addition
* `retail_sales`: no record of addition
* `sales_for_resale`: no record of addition
* `status`: no record of addition
* `storage_capacity_mw`: no record of addition
* `storage_customers`: no record of addition
* `total`: no record of addition
* `total_meters`: no record of addition
* `transmission`: no record of addition
* `unbundled_revenues`: no record of addition
* `utility_attn`: no record of addition
* `utility_pobox`: no record of addition
* `virtual_capacity_mw`: no record of addition; probably from pre-normalized EIA-861 data
* `virtual_customers`: no record of addition; probably from pre-normalized EIA-861 data

Suggested fixes based on going through lots of field metadata:
High priority fixes:

* `business_model`: this is a weird column in `core_eia861__yearly_sales` that contains a letter; I'm not sure if we've coded it correctly.
* `fuel_pct`: kind of a funky carve-out.

Low priority fixes (probably out of scope):

* `demand_mwh`: could be more specific; add "hourly_demand"?
* `incremental_energy_savings_mwh`: appears to align with the old column `energy_efficiency_incremental_effects_mwh`. Maybe we want them to have the same name? (From the 2012 split of the DSM table into the EE/DR tables.)
* `incremental_peak_reduction_mw`: appears to align with the old column `energy_efficiency_incremental_actual_peak_reduction_mw`. Maybe we want them to have the same name? (From the 2012 split of the DSM table into the EE/DR tables.)
* `respondent_type`: maybe this should be more specific, like `respondent_type_ferc714`.
* `utc_datetime`: there's another field similar to this; investigate whether they are the same thing. On a similar note, it might be worth going through the fields to make sure there are no duplicates. I feel like there are a lot of categorical fields like "respondent type" that are very similar across data sources. We should either make them more specific by adding data-source suffixes, or make them more generic and apply them to all tables. There's definitely an opportunity to prune some fields, but it's not a small task.
* `utility_owned_capacity_mw`: could make this column name more specific to non-net-metered capacity.

Suggested fixes based on going through lots of resource metadata:
Low priority fixes (probably out of scope):

* … `dispersed_generation` … tables (part of the raw `distributed_generation` spreadsheets). Is that something we want to do? (distributed = grid connected, dispersed = not grid connected)
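If the dispersed and distributed generation tables were consolidated, the distinction noted above (distributed = grid connected, dispersed = not grid connected) could be kept as a flag rather than as separate tables. This is only a sketch under loud assumptions. The record layout, the column names, and the `is_grid_connected` flag are hypothetical, and the sketch assumes both tables already share the same columns.

```python
# Hypothetical records from the two tables; column names are illustrative only.
distributed_rows = [{"utility_id_eia": 1, "report_year": 2015, "capacity_mw": 3.0}]
dispersed_rows = [{"utility_id_eia": 1, "report_year": 2015, "capacity_mw": 0.5}]


def combine_generation(distributed: list[dict], dispersed: list[dict]) -> list[dict]:
    """Stack both tables, recording which one each row came from.

    distributed = grid connected, dispersed = not grid connected.
    """
    combined = [{**row, "is_grid_connected": True} for row in distributed]
    combined += [{**row, "is_grid_connected": False} for row in dispersed]
    return combined


combined = combine_generation(distributed_rows, dispersed_rows)
```

A flag column keeps the two populations separable after the merge. If the real tables have different columns, they would first need to be aligned, which this sketch does not handle.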