Consolidate optional dependencies; test nuke; parallelize tests
* Move previous dev, test, and datasette optional dependencies into
  the required dependencies to simplify application installation.
* Test make nuke; parallelize --live-dbs tests
* Move prettier into conda-only dependencies
* Update conda-lock.yml and rendered conda environment files.
* Remove action test file hashlog
* Remove merge markers.
* Remove transitive astroid dependency that's now correctly included in solve.
* Use the real immature library version of dagster-postgres (0.21.6)
  rather than the accidentally packaged 1.5.6 version found in conda.
  We'll need to keep an eye out for when dagster-postgres graduates
  to the stable versioning and update it. This is a bit of a mess
  because of some broken automation in the conda packaging for dagster
  which has now been fixed.
* Update "make pudl" to remove the old PUDL DB and reinitialize with
  alembic, rather than writing to the DB that already exists.
* Fixed some groupby.agg() deprecation warnings.
* Fix dagster-postgres version (again).
* Update username in path to settings file
* Avoid bugs in ferc_to_sqlite --clobber; don't use cache_dir for pip install.
* Make FERC extraction output removal more specific.
zaneselvans committed Nov 8, 2023
1 parent 614e4ee commit 6952059
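
The `make pudl` item in the commit message is easier to see with a toy example. The sketch below uses only Python's standard `sqlite3` module to show why deleting the old database file before initializing the schema matters: a table left over from an earlier run survives if the file is reused. The names `init_schema`, `plants`, and `obsolete_table` are hypothetical stand-ins. In particular, `init_schema` only imitates the role of `alembic upgrade head` and is not PUDL code.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def init_schema(db_path: Path) -> None:
    """Hypothetical stand-in for `alembic upgrade head`: create the current schema."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS plants (plant_id INTEGER PRIMARY KEY)")
        conn.commit()


def table_names(db_path: Path) -> set[str]:
    """Return the names of all tables currently in the database file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return {name for (name,) in rows}


def rebuild(db_path: Path, remove_first: bool) -> set[str]:
    """Initialize the schema, optionally deleting the old file first (like `rm -f`)."""
    if remove_first:
        db_path.unlink(missing_ok=True)
    init_schema(db_path)
    return table_names(db_path)


def demo() -> tuple[set[str], set[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "pudl.sqlite"
        # Simulate a database left behind by an earlier run with an outdated table.
        with closing(sqlite3.connect(db_path)) as conn:
            conn.execute("CREATE TABLE obsolete_table (x INTEGER)")
            conn.commit()
        reused = rebuild(db_path, remove_first=False)  # old behavior: write into existing DB
        fresh = rebuild(db_path, remove_first=True)    # new behavior: remove, then initialize
    return reused, fresh


reused, fresh = demo()
print("reusing the old file:", sorted(reused))
print("removing it first:  ", sorted(fresh))
```

Reusing the file keeps `obsolete_table` alongside the new schema, while removing it first leaves only the tables the schema step creates.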
Showing 15 changed files with 2,314 additions and 1,510 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-deploy-pudl.yml
@@ -117,7 +117,7 @@ jobs:
 --container-env DAGSTER_PG_PASSWORD="$DAGSTER_PG_PASSWORD" \
 --container-env DAGSTER_PG_HOST="104.154.182.24" \
 --container-env DAGSTER_PG_DB="dagster-storage" \
---container-env PUDL_SETTINGS_YML="/home/catalyst/src/pudl/package_data/settings/etl_full.yml" \
+--container-env PUDL_SETTINGS_YML="/home/mambauser/src/pudl/package_data/settings/etl_full.yml" \
 # Start the VM
 - name: Start the deploy-pudl-vm
3 changes: 0 additions & 3 deletions .github/workflows/tox-pytest.yml
@@ -34,7 +34,6 @@ jobs:
 environment-file: environments/conda-lock.yml
 environment-name: pudl-dev
 cache-environment: true
-create-args: --category main dev docs test datasette

 - name: Log environment details
 run: |
@@ -74,7 +73,6 @@ jobs:
 environment-file: environments/conda-lock.yml
 environment-name: pudl-dev
 cache-environment: true
-create-args: --category main dev docs test datasette

 - name: Log environment details
 run: |
@@ -125,7 +123,6 @@ jobs:
 environment-file: environments/conda-lock.yml
 environment-name: pudl-dev
 cache-environment: true
-create-args: --category main dev docs test datasette

 - name: Log environment details
 run: |
2 changes: 1 addition & 1 deletion .readthedocs.yaml
@@ -10,7 +10,7 @@ version: 2
 build:
 os: ubuntu-22.04
 tools:
-python: mambaforge-4.10
+python: mambaforge-22.9

 # Define the python environment using conda / mamba
 conda:
41 changes: 18 additions & 23 deletions Makefile
@@ -5,7 +5,6 @@ coverage_report := coverage report --sort=cover
 pytest_args := --durations 20 ${pytest_covargs} ${gcs_cache_path}
 etl_fast_yml := src/pudl/package_data/settings/etl_fast.yml
 etl_full_yml := src/pudl/package_data/settings/etl_full.yml
-pip_install_pudl := pip install --no-deps --editable ./

 # We use mamba locally, but micromamba in CI, so choose the right binary:
 ifdef GITHUB_ACTION
@@ -47,25 +46,17 @@ conda-lock.yml: pyproject.toml
 cd environments && conda-lock render \
 --kind env \
 --dev-dependencies \
---extras docs \
---extras datasette \
 conda-lock.yml
 prettier --write environments/*.yml

 # Create the pudl-dev conda environment based on the universal lockfile
 .PHONY: pudl-dev
 pudl-dev: conda-lock.yml
-conda-lock install \
---name pudl-dev \
---${mamba} \
---dev \
---extras docs \
---extras datasette \
-environments/conda-lock.yml
+conda-lock install --name pudl-dev --${mamba} --dev environments/conda-lock.yml

 .PHONY: install-pudl
 install-pudl: pudl-dev
-${mamba} run --name pudl-dev pip install --no-deps --editable .
+${mamba} run --name pudl-dev pip install --no-cache-dir --no-deps --editable .

 ########################################################################################
 # Build documentation for local use or testing
@@ -92,19 +83,24 @@ docs-build: docs-clean
 ########################################################################################

 # Extract all FERC DBF and XBRL data to SQLite.
-ferc1.sqlite ferc1_xbrl.sqlite:
+.PHONY: ferc
+ferc:
+rm -f ${PUDL_OUTPUT}/ferc*.sqlite
+rm -f ${PUDL_OUTPUT}/ferc*_xbrl_datapackage.json
+rm -f ${PUDL_OUTPUT}/ferc*_xbrl_taxonomy_metadata.json
 coverage run ${covargs} -- \
 src/pudl/ferc_to_sqlite/cli.py \
---clobber \
 ${gcs_cache_path} \
 ${etl_full_yml}

-# Run the full PUDL ETL
-pudl.sqlite:
-coverage run ${covargs} -- \
-src/pudl/cli/etl.py \
-${gcs_cache_path} \
-${etl_full_yml}
+# Remove the existing PUDL DB if it exists.
+# Create a new empty DB using alembic.
+# Run the full PUDL ETL.
+.PHONY: pudl
+pudl:
+rm -f ${PUDL_OUTPUT}/pudl.sqlite
+alembic upgrade head
+coverage run ${covargs} -- src/pudl/cli/etl.py ${gcs_cache_path} ${etl_full_yml}

 ########################################################################################
 # pytest
@@ -140,11 +136,10 @@ pytest-validate:
 # Backgrounding the data validation and integration tests and using wait allows them to
 # run in parallel.
 .PHONY: nuke
-nuke: coverage-erase docs-build pytest-unit ferc1.sqlite ferc1_xbrl.sqlite pudl.sqlite
+nuke: coverage-erase docs-build pytest-unit ferc pudl
 pudl_check_fks
-pytest ${pytest_args} --live-dbs --etl-settings ${etl_full_yml} test/integration & \
-pytest ${pytest_args} --live-dbs test/validate & \
-wait
+pytest ${pytest_args} -n auto --live-dbs --etl-settings ${etl_full_yml} test/integration
+pytest ${pytest_args} -n auto --live-dbs test/validate
 ${coverage_report}

# Check that designated Jupyter notebooks can be run against the current DB
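
One property of the removed `& ... wait` recipe in the `nuke` target is worth spelling out. In a POSIX shell, `wait` with no arguments returns exit status 0 regardless of how the background jobs finished, so a failing backgrounded `pytest` would not by itself fail that recipe line. The commit does not say whether this motivated the change. The sketch below only demonstrates the shell behavior, with `false` and `true` standing in for a failing and a passing test run, and it assumes a POSIX `sh` is available on the PATH.

```python
import subprocess


def exit_status(script: str) -> int:
    """Run a snippet with the system's `sh` and return its exit status."""
    return subprocess.run(["sh", "-c", script], check=False).returncode


# Old recipe shape: background both jobs, then call a bare `wait`.
# `false` stands in for a failing pytest run, `true` for a passing one.
backgrounded = exit_status("false & true & wait")

# New recipe shape: one command per recipe line. make stops at the first
# failing line, which `set -e` imitates here.
sequential = exit_status("set -e; false; true")

print(f"background + wait exit status: {backgrounded}")
print(f"sequential exit status:        {sequential}")
```

The first status is 0 even though one job failed, while the second is non-zero. In the new recipe the parallelism comes from the `-n auto` option, which is provided by the `pytest-xdist` package that this commit adds to the lockfile.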
2 changes: 1 addition & 1 deletion docker/Dockerfile
@@ -40,7 +40,7 @@ COPY docker/dagster.yaml ${DAGSTER_HOME}/dagster.yaml

 # Create a conda environment based on the specification in the repo
 COPY environments/conda-lock.yml environments/conda-lock.yml
-RUN micromamba create --prefix ${CONDA_PREFIX} --yes --category main dev docs test datasette --file environments/conda-lock.yml && \
+RUN micromamba create --prefix ${CONDA_PREFIX} --yes --file environments/conda-lock.yml && \
 micromamba clean -afy
 # Copy the cloned pudl repository into the user's home directory
 COPY --chown=${MAMBA_USER}:${MAMBA_USER} . ${CONTAINER_HOME}
16 changes: 9 additions & 7 deletions docker/gcp_pudl_etl.sh
@@ -24,21 +24,23 @@ function run_pudl_etl() {
 alembic upgrade head && \
 pudl_setup && \
 ferc_to_sqlite \
---loglevel=DEBUG \
---gcs-cache-path=gs://internal-zenodo-cache.catalyst.coop \
---workers=8 \
+--loglevel DEBUG \
+--gcs-cache-path gs://internal-zenodo-cache.catalyst.coop \
+--workers 8 \
 $PUDL_SETTINGS_YML && \
 pudl_etl \
 --loglevel DEBUG \
 --gcs-cache-path gs://internal-zenodo-cache.catalyst.coop \
 $PUDL_SETTINGS_YML && \
 pytest \
---gcs-cache-path=gs://internal-zenodo-cache.catalyst.coop \
---etl-settings=$PUDL_SETTINGS_YML \
+-n auto \
+--gcs-cache-path gs://internal-zenodo-cache.catalyst.coop \
+--etl-settings $PUDL_SETTINGS_YML \
 --live-dbs test/integration test/unit && \
 pytest \
---gcs-cache-path=gs://internal-zenodo-cache.catalyst.coop \
---etl-settings=$PUDL_SETTINGS_YML \
+-n auto \
+--gcs-cache-path gs://internal-zenodo-cache.catalyst.coop \
+--etl-settings $PUDL_SETTINGS_YML \
 --live-dbs test/validate
}
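
Most of the option edits in this hunk switch from the `--option=value` spelling to `--option value`. Parsers such as Python's `argparse` treat the two spellings identically, which suggests that part of the change is cosmetic and that the functional addition is `-n auto`. The sketch below uses `argparse` purely as a stand-in: the diff does not show which parser `ferc_to_sqlite` actually uses, and the option types and the `ferc_to_sqlite_standin` name are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options seen in the script."""
    parser = argparse.ArgumentParser(prog="ferc_to_sqlite_standin")
    parser.add_argument("--loglevel", default="INFO")
    parser.add_argument("--gcs-cache-path")
    parser.add_argument("--workers", type=int)
    parser.add_argument("settings_file")
    return parser


parser = build_parser()

# Spelling used before the commit: --option=value
old_style = parser.parse_args([
    "--loglevel=DEBUG",
    "--gcs-cache-path=gs://internal-zenodo-cache.catalyst.coop",
    "--workers=8",
    "etl_full.yml",
])

# Spelling used after the commit: --option value
new_style = parser.parse_args([
    "--loglevel", "DEBUG",
    "--gcs-cache-path", "gs://internal-zenodo-cache.catalyst.coop",
    "--workers", "8",
    "etl_full.yml",
])

print(vars(old_style) == vars(new_style))
```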

58 changes: 37 additions & 21 deletions environments/conda-linux-64.lock.yml
@@ -1,6 +1,6 @@
# Generated by conda-lock.
# platform: linux-64
# input_hash: 874b20e543d8ad29e094c126a85321cefaf15e80f7f713bc52572ed16a1facf3
# input_hash: 046905504b546bce6941e51f6aaa9b8aabf01ce6c8b2a57ca37c85aefa112f30

channels:
- conda-forge
@@ -25,13 +25,15 @@ dependencies:
- fonts-conda-ecosystem=1=0
- libgcc-ng=13.2.0=h807b86a_2
- aws-c-common=0.9.3=hd590300_0
- bzip2=1.0.8=h7f98852_4
- bzip2=1.0.8=hd590300_5
- c-ares=1.21.0=hd590300_0
- fribidi=1.0.10=h36c2ea0_0
- geos=3.12.0=h59595ed_0
- gettext=0.21.1=h27087fc_0
- gflags=2.2.2=he1b5a44_1004
- giflib=5.2.1=h0b41bf4_3
- gmp=6.2.1=h58526e2_0
- graphite2=1.3.13=h58526e2_1001
- icu=73.2=h59595ed_0
- json-c=0.17=h7ab15ed_0
- keyutils=1.6.1=h166bdaf_0
@@ -51,6 +53,7 @@ dependencies:
- libnuma=2.0.16=h0b41bf4_1
- libsodium=1.0.18=h36c2ea0_1
- libspatialindex=1.9.3=h9c3ff4c_4
- libtool=2.4.7=h27087fc_0
- libutf8proc=2.8.0=h166bdaf_0
- libuuid=2.38.1=h0b41bf4_0
- libuv=1.46.0=hd590300_0
@@ -115,7 +118,7 @@ dependencies:
- freetype=2.12.1=h267a509_2
- krb5=1.21.2=h659d440_0
- libarchive=3.7.2=h039dbb9_0
- libglib=2.78.0=hebfc3b9_0
- libglib=2.78.1=hebfc3b9_0
- libgrpc=1.57.0=ha4d0f93_2
- libopenblas=0.3.24=pthreads_h413a1c8_0
- libthrift=0.19.0=hb90f79a_1
@@ -134,6 +137,7 @@ dependencies:
- anyascii=0.3.2=pyhd8ed1ab_0
- appdirs=1.4.4=pyh9f0ad1d_0
- astroid=3.0.1=py311h38be061_0
- atk-1.0=2.38.0=hd4edc92_1
- attrs=23.1.0=pyh71513ae_1
- aws-c-event-stream=0.3.2=h6fea174_2
- aws-c-http=0.7.13=hb59894b_2
@@ -157,7 +161,7 @@ dependencies:
- colorama=0.4.6=pyhd8ed1ab_0
- crashtest=0.4.1=pyhd8ed1ab_0
- cycler=0.12.1=pyhd8ed1ab_0
- dagster-pipes=1.5.6=pyhd8ed1ab_1
- dagster-pipes=1.5.6=pyhd8ed1ab_2
- dataclasses=0.8=pyhc8e2a94_3
- dbus=1.13.6=h5008d03_3
- debugpy=1.8.0=py311hb755f60_1
@@ -169,15 +173,18 @@ dependencies:
- entrypoints=0.4=pyhd8ed1ab_0
- et_xmlfile=1.1.0=pyhd8ed1ab_0
- exceptiongroup=1.1.3=pyhd8ed1ab_0
- execnet=2.0.2=pyhd8ed1ab_0
- executing=2.0.1=pyhd8ed1ab_0
- filelock=3.13.1=pyhd8ed1ab_0
- fontconfig=2.14.2=h14ed4e7_0
- freexl=2.0.0=h743c826_0
- frozenlist=1.4.0=py311h459d7ec_1
- fsspec=2023.10.0=pyhca7485f_0
- gdk-pixbuf=2.42.10=h829c605_4
- google-cloud-sdk=453.0.0=py311h38be061_0
- greenlet=3.0.1=py311hb755f60_0
- grpcio=1.57.0=py311ha6695c7_2
- gts=0.7.6=h977cf35_4
- hpack=4.0.0=pyh9f0ad1d_0
- httptools=0.6.1=py311h459d7ec_0
- humanfriendly=10.0=pyhd8ed1ab_6
@@ -199,6 +206,7 @@ dependencies:
- libblas=3.9.0=19_linux64_openblas
- libcurl=8.4.0=hca28451_0
- libpq=16.0=hfc447b1_1
- libwebp=1.3.2=h658648e_1
- llvmlite=0.40.1=py311ha6695c7_0
- locket=1.0.0=pyhd8ed1ab_0
- lxml=4.9.3=py311h1a07684_1
@@ -333,7 +341,7 @@ dependencies:
- html5lib=1.1=pyh9f0ad1d_0
- hypothesis=6.88.3=pyha770c72_0
- importlib-metadata=6.8.0=pyha770c72_0
- importlib_resources=6.1.0=pyhd8ed1ab_0
- importlib_resources=6.1.1=pyhd8ed1ab_0
- isodate=0.6.1=pyhd8ed1ab_0
- janus=1.0.0=pyhd8ed1ab_0
- jaraco.classes=3.3.0=pyhd8ed1ab_0
@@ -344,6 +352,7 @@ dependencies:
- jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
- latexcodec=2.0.1=pyh9f0ad1d_0
- libcblas=3.9.0=19_linux64_openblas
- libgd=2.3.3=h119a65a_9
- libgoogle-cloud=2.12.0=h8d7e28b_2
- liblapack=3.9.0=19_linux64_openblas
- linear-tsv=1.1.0=py_1
@@ -390,8 +399,8 @@ dependencies:
- arrow=1.3.0=pyhd8ed1ab_0
- async-timeout=4.0.3=pyhd8ed1ab_0
- aws-c-s3=0.3.17=hfb4bb88_4
- botocore=1.31.78=pyhd8ed1ab_0
- branca=0.6.0=pyhd8ed1ab_0
- botocore=1.31.79=pyhd8ed1ab_0
- branca=0.7.0=pyhd8ed1ab_1
- croniter=2.0.1=pyhd8ed1ab_0
- cryptography=41.0.5=py311h63ff55d_0
- fqdn=1.5.1=pyhd8ed1ab_0
@@ -402,13 +411,14 @@ dependencies:
- gql=3.4.1=pyhd8ed1ab_0
- graphql-relay=3.2.0=pyhd8ed1ab_0
- grpcio-health-checking=1.57.0=pyhd8ed1ab_0
- harfbuzz=8.2.1=h3d44ed6_0
- httpcore=1.0.1=pyhd8ed1ab_0
- importlib_metadata=6.8.0=hd8ed1ab_0
- jsonschema-specifications=2023.7.1=pyhd8ed1ab_0
- jupyter_server_terminals=0.4.4=pyhd8ed1ab_1
- kealib=1.5.2=hcd42e92_1
- libnetcdf=4.9.2=nompi_h80fb2b6_112
- libspatialite=5.1.0=h090f1da_0
- libspatialite=5.1.0=h090f1da_1
- mako=1.2.4=pyhd8ed1ab_0
- numpy=1.24.4=py311h64a7726_0
- pbr=5.11.1=pyhd8ed1ab_0
@@ -418,10 +428,11 @@ dependencies:
- psycopg2-binary=2.9.7=pyhd8ed1ab_1
- pybtex=0.24.0=pyhd8ed1ab_2
- pydantic=1.10.13=py311h459d7ec_1
- pyproj=3.6.1=py311h1facc83_3
- pyproj=3.6.1=py311h1facc83_4
- pytest-console-scripts=1.4.1=pyhd8ed1ab_0
- pytest-cov=4.1.0=pyhd8ed1ab_0
- pytest-mock=3.12.0=pyhd8ed1ab_0
- pytest-xdist=3.3.1=pyhd8ed1ab_0
- python-build=1.0.3=pyhd8ed1ab_0
- requests=2.31.0=pyhd8ed1ab_0
- rich=13.6.0=pyhd8ed1ab_0
@@ -444,7 +455,7 @@ dependencies:
- dask-core=2023.10.1=pyhd8ed1ab_0
- dnspython=2.4.2=pyhd8ed1ab_1
- ensureconda=1.4.3=pyhd8ed1ab_0
- folium=0.14.0=pyhd8ed1ab_0
- folium=0.15.0=pyhd8ed1ab_0
- google-resumable-media=2.6.0=pyhd8ed1ab_0
- graphene=3.3=pyhd8ed1ab_0
- grpcio-status=1.57.0=pyhd8ed1ab_0
@@ -459,6 +470,7 @@ dependencies:
- numexpr=2.8.7=py311h039bad6_104
- oauthlib=3.2.2=pyhd8ed1ab_0
- pandas=2.1.2=py311h320fe9a_0
- pango=1.50.14=ha41ecd1_2
- prompt-toolkit=3.0.39=pyha770c72_0
- pybtex-docutils=1.0.3=py311h38be061_1
- pyopenssl=23.3.0=pyhd8ed1ab_0
@@ -475,9 +487,9 @@ dependencies:
- uvicorn-standard=0.24.0=h38be061_0
- virtualenv=20.24.6=pyhd8ed1ab_0
- aws-sdk-cpp=1.11.156=h314d761_4
- boto3=1.28.78=pyhd8ed1ab_0
- boto3=1.28.79=pyhd8ed1ab_0
- cachecontrol-with-filecache=0.13.1=pyhd8ed1ab_0
- dagster=1.5.6=pyhd8ed1ab_1
- dagster=1.5.6=pyhd8ed1ab_2
- datasette=0.64.4=pyhd8ed1ab_1
- doc8=1.1.1=pyhd8ed1ab_0
- email-validator=2.1.0.post1=pyhd8ed1ab_0
@@ -486,9 +498,11 @@ dependencies:
- geopandas-base=0.14.0=pyha770c72_1
- google-auth=2.23.4=pyhca7485f_0
- gql-with-requests=3.4.1=pyhd8ed1ab_0
- gtk2=2.24.33=h90689f9_2
- jsonschema-with-format-nongpl=4.19.2=pyhd8ed1ab_0
- jupyter_client=8.5.0=pyhd8ed1ab_0
- jupyter_client=8.6.0=pyhd8ed1ab_0
- keyring=24.2.0=py311h38be061_1
- librsvg=2.56.3=h98fae49_0
- matplotlib-base=3.8.1=py311h54ef318_0
- nbformat=5.9.2=pyhd8ed1ab_0
- pandera-core=0.17.2=pyhd8ed1ab_0
@@ -499,37 +513,39 @@ dependencies:
- timezonefinder=6.2.0=py311h459d7ec_2
- catalystcoop.ferc_xbrl_extractor=1.2.1=pyhd8ed1ab_0
- conda-lock=2.4.2=pyhd8ed1ab_0
- dagster-graphql=1.5.6=pyhd8ed1ab_1
- dagster-postgres=1.5.6=pyhd8ed1ab_1
- dagster-graphql=1.5.6=pyhd8ed1ab_2
- dagster-postgres=0.21.6=pyhd8ed1ab_1
- fiona=1.9.5=py311hbac4ec9_0
- google-api-core=2.12.0=pyhd8ed1ab_0
- google-auth-oauthlib=1.1.0=pyhd8ed1ab_0
- graphviz=8.1.0=h28d9a01_0
- ipython=8.17.2=pyh41d4057_0
- jupyter_events=0.8.0=pyhd8ed1ab_0
- jupyter_events=0.9.0=pyhd8ed1ab_0
- libarrow=13.0.0=h0f80be4_7_cpu
- mapclassify=2.6.1=pyhd8ed1ab_0
- nbclient=0.8.0=pyhd8ed1ab_0
- recordlinkage=0.16=pyhd8ed1ab_0
- tabulator=1.53.5=pyhd8ed1ab_0
- dagster-webserver=1.5.6=pyhd8ed1ab_1
- dagster-webserver=1.5.6=pyhd8ed1ab_2
- geopandas=0.14.0=pyhd8ed1ab_1
- google-cloud-core=2.3.3=pyhd8ed1ab_0
- ipykernel=6.26.0=pyhf8b6a83_0
- ipywidgets=8.1.1=pyhd8ed1ab_0
- nbconvert-core=7.10.0=pyhd8ed1ab_0
- nbconvert-core=7.11.0=pyhd8ed1ab_0
- pyarrow=13.0.0=py311h39c9aba_7_cpu
- pygraphviz=1.11=py311h72a77b7_1
- tableschema=1.19.3=pyh9f0ad1d_0
- datapackage=1.15.2=pyh44b312d_0
- google-cloud-storage=2.13.0=pyhca7485f_0
- jupyter_console=6.6.3=pyhd8ed1ab_0
- jupyter_server=2.9.1=pyhd8ed1ab_0
- nbconvert-pandoc=7.10.0=pyhd8ed1ab_0
- jupyter_server=2.10.0=pyhd8ed1ab_0
- nbconvert-pandoc=7.11.0=pyhd8ed1ab_0
- qtconsole-base=5.5.0=pyha770c72_0
- gcsfs=2023.10.0=pyhd8ed1ab_0
- jupyter-lsp=2.2.0=pyhd8ed1ab_0
- jupyter-resource-usage=1.0.1=pyhd8ed1ab_0
- jupyterlab_server=2.25.0=pyhd8ed1ab_0
- nbconvert=7.10.0=pyhd8ed1ab_0
- nbconvert=7.11.0=pyhd8ed1ab_0
- notebook-shim=0.2.3=pyhd8ed1ab_0
- jupyterlab=4.0.8=pyhd8ed1ab_0
- notebook=7.0.6=pyhd8ed1ab_0
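
The lockfile now pins `dagster-postgres=0.21.6` next to `dagster=1.5.6`. Under the convention Dagster appears to use, integration libraries that have not yet reached a stable release are versioned `0.(16 + minor).patch` relative to core `1.minor.patch`. That offset of 16 is an assumption here, supported within this diff only by the single 1.5.6 / 0.21.6 pair. The hypothetical helper below (not part of PUDL or Dagster) makes the relationship explicit, which may help when checking future pins.

```python
def dagster_library_version(core_version: str, offset: int = 16) -> str:
    """Map a Dagster core version to the matching pre-1.0 library version.

    Assumes the library minor version equals the core minor version plus
    `offset`; the default of 16 is an assumption, not taken from this commit.
    """
    major, minor, patch = (int(part) for part in core_version.split("."))
    if major != 1:
        raise ValueError("this sketch only covers the 1.x core series")
    return f"0.{minor + offset}.{patch}"


# The pair pinned in this lockfile: dagster 1.5.6 and dagster-postgres 0.21.6.
expected_pin = dagster_library_version("1.5.6")
print(expected_pin)
```

Once `dagster-postgres` moves to stable versioning, as the commit message anticipates, a mapping like this would no longer apply and the pin would need to be revisited.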