Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run ferc_to_sqlite and pudl_etl independent of integration tests #2825

Open
wants to merge 113 commits into
base: main
Choose a base branch
from
Open
Changes from 109 commits
Commits
Show all changes
113 commits
Select commit Hold shift + click to select a range
6bf1115
Try etl/integraiton separation in tests.
rousik Sep 2, 2023
d89e1f4
fix some formatting in the python thingies
rousik Sep 3, 2023
3b43bef
Minor changes to the tox-pytest setup
rousik Sep 4, 2023
a8fb7dc
Comment out most things to isolate ci-integration-etl
rousik Sep 6, 2023
8587dc7
fix few things
rousik Sep 6, 2023
a5b2867
fix condarc
rousik Sep 6, 2023
4916320
install snappy before pudl
rousik Sep 6, 2023
2b1a116
Fix up integration tests, run on the large machine.
rousik Sep 6, 2023
bb623af
add alembic setup
rousik Sep 6, 2023
da07819
run conda info ahead of running the etl
rousik Sep 6, 2023
7dc86aa
Log conda env
rousik Sep 6, 2023
162f62a
Disable large runners for now
rousik Sep 6, 2023
68c6dee
fix exec shell for the action
rousik Sep 6, 2023
09fb56d
fix exec shell for the action
rousik Sep 6, 2023
f05fcd3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 6, 2023
173ad8c
Fix path to integration tests.
zaneselvans Sep 6, 2023
c9b83ab
Add concurrency to cancel previous runs
rousik Sep 6, 2023
e35efc9
Merge branch 'parallel-pytest' of github.com:catalyst-cooperative/pud…
rousik Sep 6, 2023
a81ebf0
Fix path to integration tests
rousik Sep 6, 2023
364b053
fix pytest paths
rousik Sep 6, 2023
d4b46fc
restructure etl steps
rousik Sep 6, 2023
ffd56e0
fix typo in alembic setup
rousik Sep 6, 2023
2edea32
uncomment steps and fix mamba setup
rousik Sep 6, 2023
d5fcc73
Install test dependencies with pip
zaneselvans Sep 6, 2023
4f42efa
resolve conflicts with dev branch
zaneselvans Sep 6, 2023
ac95389
Merge branch 'dev' into parallel-pytest
zaneselvans Sep 6, 2023
61bb7b6
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Sep 11, 2023
b1f3b95
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Sep 25, 2023
fb2892f
Add test deps to ci steps that use coverage.
rousik Sep 25, 2023
678d54e
I don't understand tox substitution :-/
rousik Sep 26, 2023
6ec7d99
Fix ETL command to run. Ugh.
rousik Sep 26, 2023
2f43405
Include datasette as tox dependency.
rousik Sep 26, 2023
66f5f54
Fix the input/output path override behavior when running in gha context.
rousik Sep 26, 2023
3ac2879
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Sep 26, 2023
12903a0
Merge branch 'dev' into parallel-pytest
rousik Oct 3, 2023
28af787
Rename github_ tox steps, make them fail unless GITHUB_ACTIONS env ex…
rousik Oct 18, 2023
a725c07
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Oct 18, 2023
00e41fb
Merge branch 'dev' into parallel-pytest
zaneselvans Oct 19, 2023
645ae4f
Merge branch 'dev' into parallel-pytest
zaneselvans Oct 19, 2023
051d2b1
Explicitly demand tox>=4.3.3 for the env reuse.
rousik Oct 21, 2023
58597e4
Drop usage of tox in the new ci-integration test.
rousik Oct 25, 2023
5b162bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 25, 2023
b7afb1e
Put back diagnostic steps and install PUDL dependencies.
rousik Oct 25, 2023
e17f35c
Few cleanup tasks.
rousik Oct 25, 2023
64500b7
Fix up the coverage commands.
rousik Oct 25, 2023
3a7b30a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 25, 2023
2a9f5a2
Fix the path to etl.py.
rousik Oct 25, 2023
f226f38
Run alembic before ETL.
rousik Oct 25, 2023
583f7f8
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Oct 25, 2023
0590f29
Revert changes to other files.
rousik Oct 25, 2023
fa814ea
Also install datasette dependencies.
rousik Oct 28, 2023
c3da09e
Revert changes to .coveragerc
rousik Oct 29, 2023
0fdd73d
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Oct 29, 2023
171dab5
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 1, 2023
2c96289
Merge branch 'dev' into parallel-pytest
rousik Nov 6, 2023
666f1e9
add option to pick runner when running from dispatch
rousik Nov 6, 2023
31dde43
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 6, 2023
297eca0
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 6, 2023
fb30bd5
Fix workflow for runner size determination.
rousik Nov 7, 2023
6ff74f5
Run matrix of integration tests to check scaling.
rousik Nov 7, 2023
567da80
Fix integration workflow file.
rousik Nov 7, 2023
ca9854e
Drop support for matrix evaluation of ci-integration
rousik Nov 7, 2023
b08ead3
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 14, 2023
a23f7eb
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 27, 2023
e834f03
Update conda-lock.yml and rendered conda environment files.
rousik Nov 27, 2023
6d14c83
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 27, 2023
8414ce3
Merge branch 'dev' into parallel-pytest
zaneselvans Nov 28, 2023
65aaa60
Update conda-lock.yml and rendered conda environment files.
zaneselvans Nov 28, 2023
48726c0
Replace ls -R with find -type f.
rousik Nov 27, 2023
cc1f6a9
Revert renames to be more friendly.
rousik Nov 29, 2023
14d9a4a
Fix some more code review comments.
rousik Nov 29, 2023
1c8dd9c
Fix error in call to pytest -n auto
zaneselvans Nov 29, 2023
6e4770c
Merge branch 'dev' into parallel-pytest
zaneselvans Nov 29, 2023
04812b5
Update conda-lock.yml and rendered conda environment files.
zaneselvans Nov 29, 2023
fc023db
Fail gha step if coverage fails to upload.
rousik Nov 29, 2023
7ac4fbb
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 29, 2023
1ea46f5
Use specific coverage file names to aid the process.
rousik Nov 29, 2023
ba40c9b
add some debug options for coverage
rousik Dec 1, 2023
5515cbf
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Dec 1, 2023
5fa6f5a
Update conda-lock.yml and rendered conda environment files.
rousik Dec 1, 2023
9f2ad16
Checksum partial coverage files before merging.
rousik Dec 1, 2023
c039a0a
Explicitly include src and test as source directories.
rousik Dec 1, 2023
b0105f3
Use src/pudl as a source definition.
rousik Dec 2, 2023
dd6d357
Use --source=src/pudl when running coverage
rousik Dec 3, 2023
908e83c
Remove option that can't be set via cmdline.
rousik Dec 4, 2023
2fb0603
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Dec 10, 2023
bf77921
Use right parameters for coverage.run tool.
rousik Dec 10, 2023
a9f6053
Update conda-lock.yml and rendered conda environment files.
rousik Dec 10, 2023
9976578
Cast 0 xbrl workers back to None.
rousik Dec 10, 2023
87c6ab4
Merge branch 'parallel-pytest' of github.com:catalyst-cooperative/pud…
rousik Dec 10, 2023
d5c5b91
Drop COVERAGE_OPTIONS, everything is now in pyproject.toml
rousik Dec 10, 2023
a8e0eaa
Fix command for pudl_etl
rousik Dec 13, 2023
910977b
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Dec 13, 2023
c65426b
run coverage combine when building docs
rousik Dec 13, 2023
50eec4f
Use multiprocessing only when dealing with ci-integration.
rousik Dec 13, 2023
da8dbcc
Update conda-lock.yml and rendered conda environment files.
rousik Dec 13, 2023
5c9a194
Blablabla.
rousik Dec 13, 2023
72ce2f2
Update conda-lock.yml and rendered conda environment files.
rousik Dec 13, 2023
6d032f9
Try to tune coverage parameters once more.
rousik Dec 14, 2023
892adac
Update conda-lock.yml and rendered conda environment files.
rousik Dec 14, 2023
7f8619a
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Dec 14, 2023
e94e277
Reset lock files to match dev branch.
rousik Dec 14, 2023
da3b055
Merge branch 'dev' into parallel-pytest
zaneselvans Dec 15, 2023
5d49446
Merge branch 'dev' into parallel-pytest
zaneselvans Dec 15, 2023
6696f2f
Record coverage of unit and integration tests themselves
zaneselvans Dec 15, 2023
fec6061
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Dec 29, 2023
c201637
Merge branch 'dev' into parallel-pytest
rousik Jan 3, 2024
8ec32bb
Merge branch 'main' into parallel-pytest
zaneselvans Jan 5, 2024
e52c349
Merge remote-tracking branch 'origin/main' into parallel-pytest
rousik Jan 18, 2024
131b9fa
Make some changes as per feedback.
rousik Jan 18, 2024
d17fa54
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 18, 2024
01a5efc
Merge branch 'main' into parallel-pytest
zaneselvans Jan 19, 2024
c51d8e2
Rename glue.py unit tests so pytest picks them up.
zaneselvans Jan 19, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
118 changes: 72 additions & 46 deletions .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
@@ -8,11 +8,16 @@ on:
- opened
- synchronize
- ready_for_review
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
cancel-in-progress: true
zaneselvans marked this conversation as resolved.
Show resolved Hide resolved

env:
PUDL_OUTPUT: /home/runner/pudl-work/output/
PUDL_INPUT: /home/runner/pudl-work/input/
DAGSTER_HOME: /home/runner/pudl-work/dagster_home/
ETL_CONFIG: src/pudl/package_data/settings/etl_fast.yml
ETL_COMMANDLINE_OPTIONS: --gcs-cache-path=gs://zenodo-cache.catalyst.coop

jobs:
ci-docs:
@@ -64,7 +69,6 @@ jobs:
defaults:
run:
shell: bash -l {0}

steps:
- uses: actions/checkout@v4
with:
@@ -105,31 +109,31 @@ jobs:
path: coverage.xml

ci-integration:
runs-on:
group: large-runner-group
labels: ubuntu-22.04-4core
needs:
- ci-unit
zaneselvans marked this conversation as resolved.
Show resolved Hide resolved
runs-on: ubuntu-22.04-4core
if: github.event.pull_request.draft == false
permissions:
contents: read
id-token: write
strategy:
fail-fast: false
Comment on lines -115 to -116
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want fail-fast enabled, given that we've already the dependency above specified? I think this would mean that if the docs build fails, the unit and integration tests would be canceled, right?

defaults:
run:
shell: bash -l {0}

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 2

- name: Install Conda environment using mamba
- name: Install conda-lock environment with micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true

- name: Install PUDL and its dependencies
run: pip install --no-deps --no-cache-dir .

- name: Log environment details
run: |
conda info
@@ -169,22 +173,41 @@ jobs:
workload_identity_provider: "projects/345950277072/locations/global/workloadIdentityPools/gh-actions-pool/providers/gh-actions-provider"
service_account: "tox-pytest-github-action@catalyst-cooperative-pudl.iam.gserviceaccount.com"

- name: Run integration tests, trying to use GCS cache if possible
- name: Run ferc_to_sqlite
env:
COVERAGE_FILE: .coverage.ferc_to_sqlite
run: |
pip install --no-deps --editable .
pudl_datastore --dataset epacems --partition year_quarter=2022q1
make pytest-integration

coverage run --concurrency=multiprocessing \
src/pudl/ferc_to_sqlite/cli.py --clobber ${{ env.ETL_COMMANDLINE_OPTIONS }} ${{ env.ETL_CONFIG }}
- name: Run pudl_etl
env:
COVERAGE_FILE: .coverage.pudl_etl
run: |
alembic upgrade head
coverage run --concurrency=multiprocessing \
src/pudl/etl/cli.py ${{ env.ETL_COMMANDLINE_OPTIONS }} ${{ env.ETL_CONFIG }}
- name: Run integration tests
env:
COVERAGE_FILE: .coverage.pytest
run: |
coverage run --concurrency=multiprocessing \
-m pytest -n auto --live-dbs test/integration
- name: Checksum coverage files
run: ls .coverage* | xargs md5sum | sort
zaneselvans marked this conversation as resolved.
Show resolved Hide resolved
- name: Generate coverage
run: |
coverage --version
coverage combine
zaneselvans marked this conversation as resolved.
Show resolved Hide resolved
coverage xml
coverage report
- name: Upload coverage
uses: actions/upload-artifact@v4
with:
name: coverage-integration
path: coverage.xml

- name: Log post-test Zenodo datastore contents
run: find ${{ env.PUDL_INPUT }}

ci-coverage:
name: Upload coverage to CodeCov
runs-on: ubuntu-latest
needs:
- ci-docs
@@ -198,37 +221,40 @@ jobs:
with:
path: coverage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use path: coverage here but when we're uploading the artifact above we use path: coverage.xml?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is a bit of confusing api. When uploading artifacts, path tells you what files/directories you want to upload, and optionally you can give them name to uniquely idenfiy them. When you're downloading artifacts, path tells you where to download things. You can either pick specific artifact names to download, or you can just download everything (which is happening here). What happens is that you'll end up with coverage/coverage-unit/coverage.xml, coverage/coverage-integration/coverage.xml and so on.

- name: List downloaded files
run: |
ls -R
run: find -type f
- name: Upload test coverage report to CodeCov
uses: codecov/codecov-action@v3
with:
directory: coverage

ci-notify:
runs-on: ubuntu-latest
if: ${{ always() }}
needs:
- ci-docs
- ci-unit
- ci-integration
steps:
- name: Inform the Codemonkeys
uses: 8398a7/action-slack@v3
continue-on-error: true
with:
status: custom
fields: workflow,job,commit,repo,ref,author,took
custom_payload: |
{
username: 'action-slack',
icon_emoji: ':octocat:',
attachments: [{
color: '${{ needs.ci-test.result }}' === 'success' ? 'good' : '${{ needs.ci-test.result }}' === 'failure' ? 'danger' : 'warning',
text: `${process.env.AS_REPO}@${process.env.AS_REF}\n ${process.env.AS_WORKFLOW} (${process.env.AS_COMMIT})\n by ${process.env.AS_AUTHOR}\n Status: ${{ needs.ci-test.result }}`,
}]
}
env:
GITHUB_TOKEN: ${{ github.token }} # required
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # required
MATRIX_CONTEXT: ${{ toJson(matrix) }} # required
fail_ci_if_error: true

# TODO(rousik): The following slack notification gives no value and
# needs to be fixed. Until then, it might be better to do nothing
rousik marked this conversation as resolved.
Show resolved Hide resolved
# at all.
# ci-notify:
# name: Notify slack
# runs-on: ubuntu-latest
# if: ${{ always() }}
# needs:
# - ci-unit
# - ci-integration
# steps:
# - name: Inform the Codemonkeys
# uses: 8398a7/action-slack@v3
# continue-on-error: true
# with:
# status: custom
# fields: workflow,job,commit,repo,ref,author,took
# custom_payload: |
# {
# username: 'action-slack',
# icon_emoji: ':octocat:',
# attachments: [{
# color: '${{ needs.ci-test.result }}' === 'success' ? 'good' : '${{ needs.ci-test.result }}' === 'failure' ? 'danger' : 'warning',
# text: `${process.env.AS_REPO}@${process.env.AS_REF}\n ${process.env.AS_WORKFLOW} (${process.env.AS_COMMIT})\n by ${process.env.AS_AUTHOR}\n Status: ${{ needs.ci-test.result }}`,
# }]
# }
# env:
# GITHUB_TOKEN: ${{ github.token }} # required
# SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # required
# MATRIX_CONTEXT: ${{ toJson(matrix) }} # required
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -80,6 +80,7 @@ docs-clean:
docs-build: docs-clean
doc8 docs/ README.rst
coverage run ${covargs} -- ${CONDA_PREFIX}/bin/sphinx-build -W -b html docs docs/_build/html
coverage combine
coverage xml

########################################################################################
22 changes: 21 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
@@ -310,11 +310,31 @@ curl = ">=8.4.0"

[tool.coverage.run]
# See note above on need to specify separate sources for pytest-coverage and coverage.
source = ["src/pudl/", "test/integration/", "test/unit/"]
# source = ["src/pudl/", "test/integration/", "test/unit/"]
include = [
"src/pudl/**",
"test/integration/**",
"test/unit/**",
"*/site-packages/pudl/**",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on with the change from source to include here? Should these be identical to the pudl_sources below? Should the commented out source line be removed? Is the comment referring above still relevant? Does switching from source to include mean we don't need the --cov=... arguments above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not installing as editable, pudl will be loaded from site-packages/pudl and I feel like source was misbehaving and did not quite work, while include does.

AFAIK in the new ci-integration that is part of this PR, the coverage commands should work w/o the need of --cov arguments above. I'm not even sure that was there when I was working on this PR initially.

pudl_sources below is a translation block that maps site-packages/pudl/foo.py onto src/pudl/foo.py, so that the coverage works even when the libraries are loaded form slightly different location.

I suppose all of this could be made to work if we simply install pudl as editable.

omit = [
# Never hit by integration tests:
"src/pudl/validate.py",
]
sigterm = true
concurrency=["multiprocessing"]
debug = ["config", "trace"]

[tool.coverage.paths]
# When running pudl tools installed with pip, the sources are imported
# from package-data/pudl directory. The following maps this to raw
# source files.
pudl_sources = [
"src/pudl/",
"*/site-packages/pudl/",
"test/unit",
"test/integration",
]

[tool.coverage.report]
precision = 1
2 changes: 2 additions & 0 deletions src/pudl/extract/xbrl.py
Original file line number Diff line number Diff line change
@@ -68,6 +68,8 @@ def xbrl2sqlite(context) -> None:
clobber = context.op_config["clobber"]
batch_size = context.op_config["batch_size"]
workers = context.op_config["workers"]
if workers == 0:
workers = None
ferc_to_sqlite_settings = context.resources.ferc_to_sqlite_settings
datastore = context.resources.datastore
datastore = FercXbrlDatastore(datastore)