Add CI targets in Makefile. Docs cleanup.
zaneselvans committed Nov 14, 2023
1 parent ff20c66 commit 08f318f
Showing 5 changed files with 45 additions and 18 deletions.
22 changes: 19 additions & 3 deletions Makefile
@@ -22,7 +22,7 @@ VPATH = environments:${PUDL_OUTPUT}
########################################################################################
.PHONY: dagster
dagster:
dagster dev -m pudl.etl -m pudl.ferc_to_sqlite
dagster-webserver -m pudl.etl -m pudl.ferc_to_sqlite

.PHONY: jlab
jlab:
@@ -54,6 +54,7 @@ conda-lock.yml: pyproject.toml
pudl-dev: conda-lock.yml
${mamba} run --name base ${mamba} env remove --name pudl-dev
conda-lock install --name pudl-dev --${mamba} --dev environments/conda-lock.yml
echo "To activate the fresh environment run: mamba activate pudl-dev"

.PHONY: install-pudl
install-pudl: pudl-dev
@@ -104,7 +105,7 @@ pudl:
coverage run ${covargs} -- src/pudl/cli/etl.py ${gcs_cache_path} ${etl_full_yml}

########################################################################################
# pytest
# Targets that are coordinated by pytest -- mostly they're actual tests.
########################################################################################
.PHONY: pytest-unit
pytest-unit:
@@ -119,9 +120,12 @@ coverage-erase:
coverage erase

.PHONY: pytest-coverage
pytest-coverage: coverage-erase docs-build pytest-unit pytest-integration
pytest-coverage: coverage-erase docs-build pytest-ci
${coverage_report}

.PHONY: pytest-ci
pytest-ci: pytest-unit pytest-integration

.PHONY: pytest-integration-full
pytest-integration-full:
pytest ${pytest_args} -n auto --live-dbs --etl-settings ${etl_full_yml} test/integration
@@ -165,3 +169,15 @@ unmapped-ids:
--ignore-foreign-key-constraints \
--etl-settings ${etl_full_yml} \
test/integration/glue_test.py

########################################################################################
# Continuous Integration Tests
########################################################################################
.PHONY: pre-commit
pre-commit:
pre-commit run --all-files

# This target will run all the tests that typically take place in our continuous
# integration tests on GitHub (with the exception of building our docker container).
.PHONY: ci
ci: pre-commit pytest-coverage
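The dependency chain of the new ``ci`` target can be sketched with a tiny stand-in Makefile. This is only an illustration of how phony prerequisites chain: the ``echo`` recipes below are placeholders, not the real PUDL recipes.

```shell
# Build a minimal Makefile mirroring the new target structure: `ci` runs
# its phony prerequisites in order and stops at the first failure.
printf '.PHONY: ci pre-commit pytest-coverage\n'     >  /tmp/ci_demo.mk
printf 'ci: pre-commit pytest-coverage\n'            >> /tmp/ci_demo.mk
printf 'pre-commit:\n\t@echo pre-commit\n'           >> /tmp/ci_demo.mk
printf 'pytest-coverage:\n\t@echo pytest-coverage\n' >> /tmp/ci_demo.mk

make -f /tmp/ci_demo.mk ci
```

In the real Makefile, running ``make ci`` from the repository root exercises the same checks locally, minus the docker container build noted in the comment above.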
2 changes: 1 addition & 1 deletion docs/dev/clone_ferc1.rst
@@ -41,7 +41,7 @@ or with the ``ferc_to_sqlite`` script (see :ref:`run-cli`).

.. note::

We recommend using Dagit to execute the ETL as it provides additional
We recommend using the Dagster UI to execute the ETL as it provides additional
functionality for re-execution and viewing dependencies.

Executing a ``ferc_to_sqlite`` job will create several outputs that you can
6 changes: 3 additions & 3 deletions docs/dev/dev_dagster.rst
@@ -23,7 +23,7 @@ UI for a given Run and Dagster related logs will appear at the bottom of the UI:

.. image:: ../images/dagster_ui_logs.png
:width: 800
:alt: Dagit logs
:alt: Dagster UI logs

To view logs from previous runs, click on the Run tab in the upper left hand
corner, then click the Run ID of the desired run to view the dagster logs.
@@ -77,12 +77,12 @@ of a given resource. PUDL currently has three resources:
The ``dataset_settings`` resource tells the PUDL ETL which years
of data to process. You can configure the dataset settings
by holding shift while clicking the "Materialize All" button in the upper
right hand corner of the Dagit interface. This will bring up a window
right hand corner of the Dagster UI. This will bring up a window
where you can change how the resource is configured:

.. image:: ../images/dataset_settings_config.png
:width: 800
:alt: Dagit home
:alt: Dagster UI home

.. note::

29 changes: 18 additions & 11 deletions docs/dev/run_the_etl.rst
@@ -157,8 +157,8 @@ Both definitions have two preconfigured jobs:

.. _run-dagster-ui:

Running the ETL with Dagit
--------------------------
Running the ETL via the Dagster UI
----------------------------------

Dagster needs a directory to store run logs and some interim assets. We don't
distribute these outputs, so we want to store them separately from
@@ -289,7 +289,7 @@ To view the status of the run, click the date next to "Latest run:".

.. image:: ../images/dagster_ui_pudl_etl.png
:width: 800
:alt: Dagit pudl_etl
:alt: Dagster UI pudl_etl

You can also re-execute specific assets by selecting one or
multiple assets in the "Overview" tab and clicking "Materialize selected".
@@ -298,14 +298,14 @@ want to rerun the entire ETL.

.. note::

Dagster does not allow you to select asset groups for a specific job.
For example, if you click on the ``raw_eia860`` asset group in Dagit,
click "Materialize All", the default configuration values will be used
so all available years of the data will be extracted.
Dagster does not allow you to select asset groups for a specific job. For example, if
you click on the ``raw_eia860`` asset group in the Dagster UI and click "Materialize
All", the default configuration values will be used, so all available years of the data
will be extracted.

To process a subset of years for a specific asset group, select the
asset group, shift+click "Materialize all" and configure the
``dataset_settings`` resource with the desired years.
To process a subset of years for a specific asset group, select the asset group,
shift+click "Materialize all" and configure the ``dataset_settings`` resource with the
desired years.

.. note::

@@ -325,7 +325,7 @@ Dagster's job execution API.

.. note::

We recommend using Dagit to execute the ETL as it provides additional
We recommend using the Dagster UI to execute the ETL as it provides additional
functionality for re-execution and viewing asset dependencies.

There are two main CLI commands for executing the PUDL processing pipeline:
@@ -334,6 +334,13 @@ There are two main CLI commands for executing the PUDL processing pipeline:
You must run this script before you can run ``pudl_etl``.
2. ``pudl_etl`` executes the ``pudl.etl`` asset graph.

We also have targets set up in the ``Makefile`` for running these scripts:

.. code-block:: console

    $ make ferc
    $ make pudl

Settings Files
--------------
These CLI commands use YAML settings files in place of command line arguments.
4 changes: 4 additions & 0 deletions docs/dev/testing.rst
@@ -67,6 +67,8 @@ above there are also:
* ``pytest-minmax-rows``: Check that various database tables have the expected number of
records in them, and report back the actual number of records found. Requires an
existing PUDL DB.
* ``pytest-coverage``: Run all the software tests and generate a test coverage report.
* ``pytest-ci``: Run the unit and integration tests (those tests that get run in CI).
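The new targets compose as described above; a usage sketch (run from the repository root, target names taken from the Makefile changes in this commit):

```shell
# Just the unit and integration tests that CI runs (no coverage bookkeeping):
make pytest-ci

# Erase stale coverage data, build the docs, run the unit and
# integration tests, and print a combined coverage report:
make pytest-coverage
```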

Running Other Commands with Make
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -89,6 +91,8 @@ There are a number of non-test ``make`` targets. To see them all, open up the
kill it with ``Control-C``).
* ``jlab``: start up a JupyterLab notebook server (will remain running in your terminal
until you kill it with ``Control-C``).
* ``ci``: Run all the checks that would be run in CI on GitHub, including the pre-commit
hooks, docs build, and software unit and integration tests.

-------------------------------------------------------------------------------
Selecting Input Data for Integration Tests
