Fetch datasets as release assets (instead of Git LFS pull) #190

Open
anthonyfok opened this issue Apr 9, 2022 · 0 comments

Large datasets, mostly CSV files, are currently fetched directly from Git LFS, which incurs significant Git LFS bandwidth costs.

Fetching these datasets as pre-compressed release assets will reduce download time and eliminate most GitHub Git LFS bandwidth costs. Thanks to @jvanulde for the idea and @DamonU2 for the pioneering work.

This, I think, is easier to implement and maintain, and therefore more robust and less error-prone, than my previous, unimplemented "XZ-compressed copies of repos" idea.

Data source repos:

  • OpenDRR/openquake-inputs
  • OpenDRR/model-inputs
  • OpenDRR/canada-srm2
  • OpenDRR/earthquake-scenarios

Scripts that fetch from these repos include (but may not be limited to):

  • python/add_data.sh (OpenDRR/opendrr-api)
  • scripts/DSRA_outputs2postgres_lfs.py (OpenDRR/model-factory)

Cf. these commands found in add_data.sh, for example:

fetch_csv openquake-inputs ...
fetch_csv model-inputs ...
curl -L https://api.github.com/repos/OpenDRR/canada-srm2/contents/cDamage/output?ref=tieg_natmodel2021
curl -L https://api.github.com/repos/OpenDRR/earthquake-scenarios/contents/FINISHED
python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=$DSRA_REPOSITORY --columnsINI=DSRA_outputs2postgres.ini --eqScenario="$eqscenario"
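
For comparison, fetching a pre-compressed release asset could look roughly like the sketch below. It relies on GitHub's public release download URL pattern (`https://github.com/OWNER/REPO/releases/download/TAG/ASSET`) and on Python's standard library only. The tag `v0.0.0-example` and the asset name `example.csv.xz` are hypothetical placeholders, not existing releases, and the function names are made up for illustration; this is not how add_data.sh or DSRA_outputs2postgres_lfs.py currently work.

```python
import lzma
import shutil
import urllib.request
from pathlib import Path


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the public download URL that GitHub serves for a release asset."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def fetch_and_decompress(url: str, dest: Path) -> Path:
    """Stream an .xz file from `url` and write the decompressed bytes to `dest`."""
    with urllib.request.urlopen(url) as response:
        # lzma.open accepts a file-like object, so the asset is decompressed
        # as it streams instead of being held in memory as a whole.
        with lzma.open(response) as xz_stream, open(dest, "wb") as out:
            shutil.copyfileobj(xz_stream, out)
    return dest


if __name__ == "__main__":
    # Hypothetical tag and asset name, for illustration only.
    url = release_asset_url(
        "OpenDRR", "model-inputs", "v0.0.0-example", "example.csv.xz"
    )
    print(url)
```

Release asset downloads do not count against Git LFS bandwidth, which is the point of the proposal; `urlopen` follows the redirect that GitHub issues for asset downloads on its own.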

XZ or Zstd compression? (compressed file sizes vs. decompression speed)
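
One way to answer this on real data would be a small measurement harness like the sketch below. It reports the compressed-to-original size ratio and the decompression time per codec. XZ comes from Python's standard `lzma` module; Zstd is included only if the third-party `zstandard` package happens to be installed. The helper names, the compression level of 19, and the synthetic sample are assumptions for illustration; the sample should be replaced with an actual CSV from one of the data source repos before drawing any conclusion.

```python
import lzma
import time


def measure(name, compress, decompress, data):
    """Return the size ratio and decompression time for one codec."""
    packed = compress(data)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    if restored != data:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "codec": name,
        "ratio": len(packed) / len(data),  # smaller is better
        "decompress_s": elapsed,
    }


def available_codecs():
    """Map codec names to (compress, decompress) callables."""
    codecs = {"xz": (lzma.compress, lzma.decompress)}
    try:
        import zstandard  # third-party and optional; skipped if missing
    except ImportError:
        return codecs
    codecs["zstd"] = (
        zstandard.ZstdCompressor(level=19).compress,
        zstandard.ZstdDecompressor().decompress,
    )
    return codecs


if __name__ == "__main__":
    # Synthetic CSV-like bytes; substitute a real dataset for a meaningful test.
    sample = b"id,lon,lat,value\n" + b"1,-123.1,49.3,0.25\n" * 200_000
    for name, (compress, decompress) in available_codecs().items():
        print(measure(name, compress, decompress, sample))
```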
