Add CI targets in Makefile. Docs cleanup.
zaneselvans committed Nov 14, 2023
1 parent ff20c66 commit 08f318f
Showing 5 changed files with 45 additions and 18 deletions.
22 changes: 19 additions & 3 deletions Makefile
@@ -22,7 +22,7 @@ VPATH = environments:${PUDL_OUTPUT}
########################################################################################
.PHONY: dagster
dagster:
dagster dev -m pudl.etl -m pudl.ferc_to_sqlite
dagster-webserver -m pudl.etl -m pudl.ferc_to_sqlite

.PHONY: jlab
jlab:
@@ -54,6 +54,7 @@ conda-lock.yml: pyproject.toml
pudl-dev: conda-lock.yml
${mamba} run --name base ${mamba} env remove --name pudl-dev
conda-lock install --name pudl-dev --${mamba} --dev environments/conda-lock.yml
echo "To activate the fresh environment run: mamba activate pudl-dev"

.PHONY: install-pudl
install-pudl: pudl-dev
@@ -104,7 +105,7 @@ pudl:
coverage run ${covargs} -- src/pudl/cli/etl.py ${gcs_cache_path} ${etl_full_yml}

########################################################################################
# pytest
# Targets that are coordinated by pytest -- mostly they're actual tests.
########################################################################################
.PHONY: pytest-unit
pytest-unit:
@@ -119,9 +120,12 @@ coverage-erase:
coverage erase

.PHONY: pytest-coverage
pytest-coverage: coverage-erase docs-build pytest-unit pytest-integration
pytest-coverage: coverage-erase docs-build pytest-ci
${coverage_report}

.PHONY: pytest-ci
pytest-ci: pytest-unit pytest-integration

.PHONY: pytest-integration-full
pytest-integration-full:
pytest ${pytest_args} -n auto --live-dbs --etl-settings ${etl_full_yml} test/integration
@@ -165,3 +169,15 @@ unmapped-ids:
--ignore-foreign-key-constraints \
--etl-settings ${etl_full_yml} \
test/integration/glue_test.py

########################################################################################
# Continuous Integration Tests
########################################################################################
.PHONY: pre-commit
pre-commit:
pre-commit run --all-files

# This target will run all the tests that typically take place in our continuous
# integration tests on GitHub (with the exception of building our docker container).
.PHONY: ci
ci: pre-commit pytest-coverage
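The dependency chain of the new ``ci`` target can be sketched with a tiny stand-in Makefile. This is only an illustration of how phony prerequisites chain: the ``echo`` recipes below are placeholders, not the real PUDL recipes.

```shell
# Build a minimal Makefile mirroring the new target structure: `ci` runs
# its phony prerequisites in order and stops at the first failure.
printf '.PHONY: ci pre-commit pytest-coverage\n'     >  /tmp/ci_demo.mk
printf 'ci: pre-commit pytest-coverage\n'            >> /tmp/ci_demo.mk
printf 'pre-commit:\n\t@echo pre-commit\n'           >> /tmp/ci_demo.mk
printf 'pytest-coverage:\n\t@echo pytest-coverage\n' >> /tmp/ci_demo.mk

make -f /tmp/ci_demo.mk ci
```

In the real Makefile, running ``make ci`` from the repository root exercises the same checks locally, minus the docker container build noted in the comment above.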
2 changes: 1 addition & 1 deletion docs/dev/clone_ferc1.rst
@@ -41,7 +41,7 @@ or with the ``ferc_to_sqlite`` script (see :ref:`run-cli`).

.. note::

We recommend using Dagit to execute the ETL as it provides additional
We recommend using the Dagster UI to execute the ETL as it provides additional
functionality for re-execution and viewing dependencies.

Executing a ``ferc_to_sqlite`` job will create several outputs that you can
6 changes: 3 additions & 3 deletions docs/dev/dev_dagster.rst
@@ -23,7 +23,7 @@ UI for a given Run and Dagster related logs will appear at the bottom of the UI:

.. image:: ../images/dagster_ui_logs.png
:width: 800
:alt: Dagit logs
:alt: Dagster UI logs

To view logs from previous runs, click on the Run tab in the upper left hand
corner, then click the Run ID of the desired run to view the dagster logs.
@@ -77,12 +77,12 @@ of a given resource. PUDL currently has three resources:
The ``dataset_settings`` resource tells the PUDL ETL which years
of data to process. You can configure the dataset settings
by holding shift while clicking the "Materialize All" button in the upper
right hand corner of the Dagit interface. This will bring up a window
right hand corner of the Dagster UI. This will bring up a window
where you can change how the resource is configured:

.. image:: ../images/dataset_settings_config.png
:width: 800
:alt: Dagit home
:alt: Dagster UI home

.. note::

29 changes: 18 additions & 11 deletions docs/dev/run_the_etl.rst
@@ -157,8 +157,8 @@ Both definitions have two preconfigured jobs:

.. _run-dagster-ui:

Running the ETL with Dagit
--------------------------
Running the ETL via the Dagster UI
----------------------------------

Dagster needs a directory to store run logs and some interim assets. We don't
distribute these outputs, so we want to store them separately from
@@ -289,7 +289,7 @@ To view the status of the run, click the date next to "Latest run:".

.. image:: ../images/dagster_ui_pudl_etl.png
:width: 800
:alt: Dagit pudl_etl
:alt: Dagster UI pudl_etl

You can also re-execute specific assets by selecting one or
multiple assets in the "Overview" tab and clicking "Materialize selected".
@@ -298,14 +298,14 @@ want to rerun the entire ETL.

.. note::

Dagster does not allow you to select asset groups for a specific job.
For example, if you click on the ``raw_eia860`` asset group in Dagit,
click "Materialize All", the default configuration values will be used
so all available years of the data will be extracted.
Dagster does not allow you to select asset groups for a specific job. For example, if
you click on the ``raw_eia860`` asset group in the Dagster UI and click "Materialize
All", the default configuration values will be used, so all available years of the data
will be extracted.

To process a subset of years for a specific asset group, select the
asset group, shift+click "Materialize all" and configure the
``dataset_settings`` resource with the desired years.
To process a subset of years for a specific asset group, select the asset group,
shift+click "Materialize all" and configure the ``dataset_settings`` resource with the
desired years.

.. note::

@@ -325,7 +325,7 @@ Dagster's job execution API.

.. note::

We recommend using Dagit to execute the ETL as it provides additional
We recommend using the Dagster UI to execute the ETL as it provides additional
functionality for re-execution and viewing asset dependencies.

There are two main CLI commands for executing the PUDL processing pipeline:
@@ -334,6 +334,13 @@ There are two main CLI commands for executing the PUDL processing pipeline:
You must run this script before you can run ``pudl_etl``.
2. ``pudl_etl`` executes the ``pudl.etl`` asset graph.

We also have targets set up in the ``Makefile`` for running these scripts:

.. code-block:: console

    $ make ferc
    $ make pudl

Settings Files
--------------
These CLI commands use YAML settings files in place of command line arguments.
4 changes: 4 additions & 0 deletions docs/dev/testing.rst
@@ -67,6 +67,8 @@ above there are also:
* ``pytest-minmax-rows``: Check that various database tables have the expected number of
records in them, and report back the actual number of records found. Requires an
existing PUDL DB.
* ``pytest-coverage``: Run all the software tests and generate a test coverage report.
* ``pytest-ci``: Run the unit and integration tests (those tests that get run in CI).
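The new targets compose as described above; a usage sketch (run from the repository root, target names taken from the Makefile changes in this commit):

```shell
# Just the unit and integration tests that CI runs (no coverage bookkeeping):
make pytest-ci

# Erase stale coverage data, build the docs, run the unit and
# integration tests, and print a combined coverage report:
make pytest-coverage
```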

Running Other Commands with Make
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -89,6 +91,8 @@ There are a number of non-test ``make`` targets. To see them all, open up the
kill it with ``Control-C``).
* ``jlab``: start up a JupyterLab notebook server (will remain running in your terminal
until you kill it with ``Control-C``).
* ``ci``: Run all the checks that would be run in CI on GitHub, including the pre-commit
hooks, docs build, and software unit and integration tests.

-------------------------------------------------------------------------------
Selecting Input Data for Integration Tests
