From 3d2c405b880724ebc9ba83bebef64771f24f2640 Mon Sep 17 00:00:00 2001
From: Dazhong Xia
Date: Sat, 11 Nov 2023 12:56:31 -0500
Subject: [PATCH] Reorganize contributing docs + add process description.

---
 .github/ISSUE_TEMPLATE/new_dataset.md         |  30 ++++
 .github/pull_request_template.md              |  52 ++-----
 .github/workflows/build-deploy-pudl.yml       |   1 +
 CONTRIBUTING.rst                              |  97 ++++++++++++
 docs/CONTRIBUTING.rst                         | 142 +++++++-----------
 .../resources/ferc1_eia_record_linkage.py     |   4 +-
 6 files changed, 197 insertions(+), 129 deletions(-)
 create mode 100644 .github/ISSUE_TEMPLATE/new_dataset.md
 create mode 100644 CONTRIBUTING.rst

diff --git a/.github/ISSUE_TEMPLATE/new_dataset.md b/.github/ISSUE_TEMPLATE/new_dataset.md
new file mode 100644
index 0000000000..f66566062c
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/new_dataset.md
@@ -0,0 +1,30 @@
+---
+name: New dataset
+about: Provide information about a new dataset you'd like to see in PUDL
+title: ''
+labels: new-data
+assignees: ''
+---
+
+### Overview
+
+What is this dataset?
+
+Why do you want it in PUDL?
+
+Is it already partially in PUDL, or do we need to start from scratch?
+
+### Logistics
+
+Is this dataset publicly available?
+
+Where can one download the actual data?
+
+How often does this dataset get updated?
+
+What licensing restrictions apply?
+
+### What do you know about it so far?
+
+What have you done with this dataset so far? Have you run into any problems with
+it yet?
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 325f1bcb8a..6bdff4f217 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -1,49 +1,25 @@
+# Overview
-# PR Overview
+Closes #XXXX.
-
+# Testing
-# PR Checklist
+How did you make sure this worked? How can a reviewer verify this?
-- [ ] Merge the most recent version of the branch you are merging into (probably `dev`).
-- [ ] All CI checks are passing. [Run tests locally to debug failures](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#running-tests-with-tox)
-- [ ] Make sure you've included good docstrings.
+```[tasklist]
+# To-do list
+- [ ] Make sure full ETL runs & `make pytest-integration-full` passes locally
 - [ ] For major data coverage & analysis changes, [run data validation tests](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#data-validation)
-- [ ] Include unit tests for new functions and classes.
-- [ ] Defensive data quality/sanity checks in analyses & data processing functions.
-- [ ] Update the [release notes](https://catalystcoop-pudl.readthedocs.io/en/latest/release_notes.html) and reference reference the PR and related issues.
-- [ ] Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.
+- [ ] If updating analyses or data processing functions: write data quality checks
+- [ ] Update the [release notes](../docs/release_notes.rst): reference the PR and related issues.
+- [ ] Review the PR yourself and call out any questions or issues you have
+```
diff --git a/.github/workflows/build-deploy-pudl.yml b/.github/workflows/build-deploy-pudl.yml
index 9419258d18..eec4318ecf 100644
--- a/.github/workflows/build-deploy-pudl.yml
+++ b/.github/workflows/build-deploy-pudl.yml
@@ -139,4 +139,5 @@ jobs:
           channel-id: "C03FHB9N0PQ"
           slack-message: "build-deploy-pudl status: ${{ job.status }}\n${{ env.COMMIT_TIME}}-${{ env.SHORT_SHA }}-${{ env.COMMIT_BRANCH }}"
         env:
+          channel-id: "C03FHB9N0PQ"
           SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }}
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
new file mode 100644
index 0000000000..a8a53bacf4
--- /dev/null
+++ b/CONTRIBUTING.rst
@@ -0,0 +1,97 @@
+--------------------
+Contributing to PUDL
+--------------------
+
+Welcome! We're so glad you're interested in contributing to PUDL! We would love
+some help making PUDL data as complete as possible.
+
+.. _after-intro:
+
+.. IMPORTANT:: Already have a dataset in mind?
+
+   If you **need data that's not in PUDL**,
+   `open an issue `__
+   to tell us more about it!
+
+   If you've **already written some code to wrangle a dataset**, find us at
+   `office hours `__ and we
+   can talk through next steps.
+
+
+Your first contribution
+-----------------------
+
+**Setup**
+
+You'll need to fork this repository and get the
+`dev environment set up `__.
+
+**Pick an issue**
+
+* Look for issues with the `good first issue
+  `__
+  tag in our `Community Kanban Board
+  `__. These
+  are issues that don't require a ton of PUDL-specific context, and are
+  relatively tightly scoped.
+
+* Comment on the issue and tag ``@com-dev`` (our Community Development Team) to
+  let us know you're working on it. Feel free to ask any questions you might
+  have!
+
+* Once you have an idea of how you want to tackle this issue, write out your
+  plan so we can guide you around obstacles in your way! Post a comment outlining:
+  * what steps have you broken this down into?
+  * what is the output of each step?
+  * how will one know that each step is working?
+
+**Work on it!**
+
+* Make a branch on your fork and open a draft pull request (PR) early so we can
+  discuss concrete code! **Set the base branch to ``dev`` unless there's a good
+  reason otherwise.** Please don't wait until it's all polished up - it's much
+  easier for us to help you when we can see the code evolve over time.
+
+* Please make sure to write tests and documentation for your code - if you run
+  into trouble with writing tests, let us know in the comments and we can help!
+  We automatically run the test suite for all PRs, but some of those will have
+  to be manually approved by Catalyst members for safety reasons.
+
+* **Try to keep your changes relatively small:** stuff happens, and one's
+  bandwidth for volunteer work can fluctuate frequently. If you make a bunch of
+  small changes, it's much easier to pause on a project without losing a ton of
+  context. We try to keep PRs to **less than 500 lines of code.**
+
+**Get it merged in!**
+
+* Turn the draft PR into a normal PR and ping ``@com-dev``. We'll try to get
+  back to you within a few days - the smaller/simpler the PR, the faster we'll
+  be able to get back to you.
+
+* The reviewer will leave comments - if they request changes, address their
+  concerns and re-request review.
+
+* There will probably be some back-and-forth until your PR is approved - this
+  is normal and a sign of good communication on your part! Don't be shy about
+  asking us for updates and re-requesting review!
+
+* Don't accidentally "start a review" when responding to comments! If this does
+  happen, don't forget to submit the review you've started so the other PR
+  participants can see your comments (they are invisible to others if marked
+  "Pending").
+
+Next contributions
+------------------
+
+Hooray! You made your first contribution! To find another issue to tackle, check
+out the `Community Kanban board
+`__ where
+we've picked out some issues that are
+
+* useful to work on
+
+* unlikely to become super time-sensitive
+
+* have some context, success criteria, and next steps information.
+
+Pick one of these and follow the contribution flow above!
diff --git a/docs/CONTRIBUTING.rst b/docs/CONTRIBUTING.rst
index 9e5bffc4c3..ee5c8d9a7c 100644
--- a/docs/CONTRIBUTING.rst
+++ b/docs/CONTRIBUTING.rst
@@ -2,111 +2,75 @@
 Contributing to PUDL
 ===============================================================================
+
 Welcome! We're excited that you're interested in contributing to the Public Utility
-Data Liberation effort! The work is currently being coordinated by the members of the
-`Catalyst Cooperative `__. PUDL is meant to serve a wide
-variety of public interests including academic research, climate advocacy, data
-journalism, and public policy making. This open source project has been supported by
-a combination of volunteer contributions, grant funding from the `Alfred P. Sloan
-Foundation `__, and reinvestment of net income from the
-cooperative's client projects.
+Data Liberation effort!
+
+We need lots of help with :ref:`user-feedback`, we welcome :ref:`code-contribs`, and
+it would be great to :ref:`connect-orgs` that we can work with.
+
+---------------
+Code of Conduct
+---------------

 Please make sure you review our :doc:`code of conduct `, which is based on the
 `Contributor Covenant `__. We want to make the
 PUDL project welcoming to contributors with different levels of experience and diverse
 personal backgrounds.

--------------------------------------------------------------------------------
-How to Get Involved
--------------------------------------------------------------------------------
+.. _user-feedback:

-We welcome just about any kind of contribution to the project. Alone, we'll never be
-able to understand every use case or integrate all the available data. The project
-will serve the community better if other folks get involved.
+-------------
+User feedback
+-------------

-There are lots of ways to contribute -- it's not all about code!
+PUDL's goal is to help people use data to make change in the US energy landscape.
+As such, it's critical that we understand our users' needs! `GitHub Discussions
+`__ is our main forum
+for all this. Since it's publicly readable, any conversation here can
+potentially benefit other users too!

-* If you need help, someone else might need it too - ask for help in `Github
-  Discussions
+We'd love it if you could:
+
+* Tell us what problems you're running into, in the `Help Me!
   `__
-  and maybe the ensuing discussion will be useful to other people too!
-* `Suggest new data and features `__ that would be useful.
+  discussion board
+* Tell us about what data you're looking for by opening an `issue
+  `__
+* Tell us what you're trying to do with PUDL data in `this thread
+  `__
 * `File bug reports `__ on Github.
-* Help expand and improve the documentation, or create new
-  `example notebooks `__
-* Help us create more and better software :doc:`test cases `.
-* Give us feedback on overall usability using `GitHub Discussions
+* Tell us what you'd like to see in PUDL in the `Ideas
   `__
-  -- what's confusing?
-* Tell us a story about how you're using of the data.
-* Point us at interesting publications related to open energy data, open source energy
-  system modeling, how energy policy can be affected by better data, or open source
-  tools we should check out.
-* Cite PUDL using
-  `DOIs from Zenodo `__
-  if you use the software or data in your own published work.
+  discussion board
+
+.. _code-contribs:
+
+--------------------
+Code contributions
+--------------------
+
+.. include:: ../CONTRIBUTING.rst
+   :start-after: after-intro:
+
+.. _connect-orgs:
+
+-----------------------------------
+Connect us with other organizations
+-----------------------------------
+
+For PUDL to make a bigger impact, we need to find more people who need the data.
+Here's how you can help:
+
+* Cite PUDL using `DOIs from Zenodo
+  `__ if you use the
+  software or data in your own published work.
 * Point us toward appropriate grant funding opportunities and meetings where we might
   present our work.
+* Point us at interesting publications related to open energy data, open source
+  energy system modeling, how energy policy can be affected by better data, or
+  open source tools we should check out.
 * Share your Jupyter notebooks and other analyses that use PUDL.
 * `Hire Catalyst `__ to do analysis for your
   organization using the PUDL data -- contract work helps us self-fund ongoing open
   source development.
-* Contribute code via
-  `pull requests `__.
-  See the :doc:`developer setup ` for more details.
-* And of course... we also appreciate
-  `financial contributions `__.
-
-.. seealso::
-
-    * :doc:`dev/dev_setup` for instructions on how to set up the PUDL
-      development environment.
-
--------------------------------------------------------------------------------
-Find us on GitHub
--------------------------------------------------------------------------------
-Github is the primary platform we use to manage the project, integrate
-contributions, write and publish documentation, answer user questions, automate
-testing & deployment, etc.
-`Signing up for a GitHub account `__
-(even if you don't intend to write code) will allow you to participate in
-online discussions and track projects that you're interested in.
-
-Asking (and answering) questions is a valuable contribution! As noted in `How to
-support open-source software and stay sane
-`__, it's much more efficient to
-ask and answer questions in a public forum because then other users and contributors
-who are having the same problem can find answers without having to re-ask the same
-question. The forum we're using is our `Github discussions
-`__.
-
-Even if you feel like you have a basic question, we want you to feel
-comfortable asking for help in public -- we (Catalyst) only recently came to
-this data work from being activists and policy wonks -- so it's easy for us to
-remember when it all seemed frustrating and alien! Sometimes it still does. We
-want people to use the software and data to do good things in the world. We
-want you to be able to access it. Using a public forum also enables the
-community of users to help each other!
-
-Don't hesitate to post a discussion with a `feature request
-`__,
-a pointer to energy data that needs liberating, or a reference to documentation
-that's out of date, unclear, or missing. Understanding how people are using the
-software, and how they would *like* to be using the software, is very valuable and
-will help us make it more useful and usable.
-
--------------------------------------------------------------------------------
-Our design process
--------------------------------------------------------------------------------
-
-We do our technical design out in the open, so that community members can weigh
-in. Here's the process we usually follow:
-
-1. Someone has a problem they'd like to solve. They post in the `Ideas
-   `__
-   forum with their problem and some context.
-
-2. Discussion ensues.
-
-3. When the open questions are answered, we create an issue from the discussion,
-   which holds the conclusions of the discussion.
diff --git a/src/pudl/metadata/resources/ferc1_eia_record_linkage.py b/src/pudl/metadata/resources/ferc1_eia_record_linkage.py
index e1a5f89032..c60ecedf3f 100644
--- a/src/pudl/metadata/resources/ferc1_eia_record_linkage.py
+++ b/src/pudl/metadata/resources/ferc1_eia_record_linkage.py
@@ -23,8 +23,8 @@
 Because generators are often owned by multiple utilities, another dimension of
 this plant part table involves generating two records for each owner: one for
 the portion of the plant part they own and one for the plant part as a whole. The
-portion records are labeled in the "ownership_record_type" column as "owned"
-and the total records are labeled as "total".
+portion records are labeled in the ``ownership_record_type`` column as ``owned``
+and the total records are labeled as ``total``.

 This table includes A LOT of duplicative information about EIA plants. It is primarily
 meant for use as an input into the record linkage between FERC1 plants and EIA.""",