Fetch datasets as release assets (instead of Git LFS pull) #190

anthonyfok · 2022-04-09T13:41:31Z

Large datasets, mostly CSV files, are currently fetched directly from Git LFS, which incurs significant Git LFS bandwidth costs.

Fetching these datasets as pre-compressed release assets will reduce download time and eliminate most GitHub Git LFS bandwidth costs. Thanks to @jvanulde for the idea and @DamonU2 for the pioneering work.

This, I think, is easier to implement and maintain, and thus more robust and less error-prone, than my previous (unimplemented) "XZ-compressed copies of repos" idea:

Create XZ-compressed Git repos and download from them #91
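A minimal sketch of the consumer side, assuming each CSV is published as an XZ-compressed asset on a GitHub release. It builds the standard GitHub release download URL and decompresses the asset while it downloads. The release tag and asset names are hypothetical; this issue does not fix a naming scheme.

```shell
#!/usr/bin/env bash
# Sketch only: assumes datasets are published as .xz assets on a GitHub
# release. Tags and asset names passed in are hypothetical.
set -euo pipefail

# Build the standard GitHub download URL for a release asset:
#   https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>
release_asset_url() {
  local owner="$1" repo="$2" tag="$3" asset="$4"
  printf 'https://github.com/%s/%s/releases/download/%s/%s\n' \
    "$owner" "$repo" "$tag" "$asset"
}

# Download one .xz asset and decompress it on the fly.
# -f: fail on HTTP errors; -L: follow the redirect to GitHub's asset storage.
# Decompressing into "$out.tmp" and renaming afterwards means a failed
# download never leaves a truncated file at the final path "$out".
fetch_release_asset() {
  local url="$1" out="$2"
  curl -fsSL "$url" | xz -dc > "$out.tmp"
  mv "$out.tmp" "$out"
}
```

For example, with RELEASE_TAG and ASSET as hypothetical variables: fetch_release_asset "$(release_asset_url OpenDRR model-inputs "$RELEASE_TAG" "$ASSET")" "${ASSET%.xz}". Streaming curl into xz avoids storing the compressed copy on disk.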

Data source repos:

OpenDRR/openquake-inputs
OpenDRR/model-inputs
OpenDRR/canada-srm2
OpenDRR/earthquake-scenarios
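The issue does not describe how the assets would be produced. One possible publishing step for each data source repo, sketched under assumptions: compress every CSV in a checkout with xz, flatten subdirectory paths into asset names (release assets form a flat list; the "__" separator is a hypothetical convention), then upload them with the GitHub CLI to a release that already exists.

```shell
#!/usr/bin/env bash
# Sketch only: naming convention and release tag are hypothetical.
set -euo pipefail

# Compress every *.csv under src_dir into out_dir, e.g.
#   src_dir/sub/x.csv -> out_dir/sub__x.csv.xz
compress_csvs() {
  local src_dir="${1%/}" out_dir="$2" f rel name
  mkdir -p "$out_dir"
  while IFS= read -r -d '' f; do
    rel="${f#"$src_dir"/}"          # path relative to the checkout
    name="${rel//\//__}.xz"         # replace each "/" with "__"
    xz -9 -T0 -c "$f" > "$out_dir/$name"
  done < <(find "$src_dir" -type f -name '*.csv' -print0)
}

# Upload the compressed files to an existing release using the GitHub CLI.
# --clobber replaces assets that already have the same name.
publish_assets() {
  local repo="$1" tag="$2" out_dir="$3"
  gh release upload "$tag" "$out_dir"/*.xz --repo "$repo" --clobber
}
```

For example: compress_csvs ./model-inputs dist && publish_assets OpenDRR/model-inputs "$RELEASE_TAG" dist, where dist and RELEASE_TAG are placeholders.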

Scripts that fetch from these repos include (but may not be limited to):

python/add_data.sh (OpenDRR/opendrr-api)
scripts/DSRA_outputs2postgres_lfs.py (OpenDRR/model-factory)

Cf. these commands found in add_data.sh, for example:

fetch_csv openquake-inputs ...
fetch_csv model-inputs ...
curl -L https://api.github.com/repos/OpenDRR/canada-srm2/contents/cDamage/output?ref=tieg_natmodel2021
curl -L https://api.github.com/repos/OpenDRR/earthquake-scenarios/contents/FINISHED
python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=$DSRA_REPOSITORY --columnsINI=DSRA_outputs2postgres.ini --eqScenario="$eqscenario"
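The two curl calls above list directory contents through the GitHub contents API. With release assets, a comparable listing could come from the releases REST endpoint GET /repos/{owner}/{repo}/releases/tags/{tag}, whose JSON response lists each asset with a browser_download_url field. A minimal sketch, assuming jq is not available (hence grep/sed); the tag is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the tag argument is hypothetical; no jq dependency assumed.
set -euo pipefail

# Read a GitHub "get a release" JSON document on stdin and print each
# asset's browser_download_url, one per line.
extract_asset_urls() {
  grep -o '"browser_download_url": *"[^"]*"' | sed -e 's/^.*: *"//' -e 's/"$//'
}

# Query the releases API for one tag and list its asset download URLs.
list_release_assets() {
  local owner="$1" repo="$2" tag="$3"
  curl -fsSL "https://api.github.com/repos/${owner}/${repo}/releases/tags/${tag}" \
    | extract_asset_urls
}
```

Each printed URL could then be fed to a download-and-decompress step. If jq is installed, jq -r '.assets[].browser_download_url' is a sturdier parser than grep/sed.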

XZ or Zstd compression? (trade-off: compressed file size vs. decompression speed)
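One way to settle the XZ vs. Zstd question is to measure both on a representative CSV. A rough sketch, assuming GNU date (for %N) and both xz and zstd on PATH; the compression levels (-9 for xz, -19 for zstd) are illustrative choices, not project settings.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date and installed xz/zstd; levels are illustrative.
set -euo pipefail

# Print "<codec> <compressed bytes> <decompression seconds>" for xz and zstd.
compare_codecs() {
  local src="$1" tmp
  tmp="$(mktemp -d)"
  cp "$src" "$tmp/data"

  # -k keeps the input file; -T0 lets xz use all cores; -q quiets zstd.
  xz -9 -T0 -k "$tmp/data"
  zstd -19 -q -k "$tmp/data" -o "$tmp/data.zst"

  local codec file start end
  for codec in xz zstd; do
    if [ "$codec" = xz ]; then file="$tmp/data.xz"; else file="$tmp/data.zst"; fi
    start="$(date +%s.%N)"
    "$codec" -dc "$file" > /dev/null   # time decompression only
    end="$(date +%s.%N)"
    printf '%s %s %s\n' "$codec" "$(wc -c < "$file" | tr -d ' ')" \
      "$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')"
  done
  rm -rf "$tmp"
}
```

Running compare_codecs on a few of the larger CSVs (for instance from model-inputs) would give concrete numbers for both sides of the trade-off; a single run is noisy, so repeating it a few times is advisable.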

anthonyfok added the labels "Enhancement" (New feature or request) and "Priority: Should Have" Apr 9, 2022

anthonyfok assigned anthonyfok and DamonU2 Apr 9, 2022

anthonyfok pinned this issue Apr 9, 2022

anthonyfok mentioned this issue Apr 9, 2022

Download model-factory from release tarball instead of git clone #155

Closed


anthonyfok commented Apr 9, 2022