Persisting Data

isaacmg edited this page Jun 10, 2020 · 11 revisions

Data Pipelines

With new COVID-19 data arriving daily, we need pipelines to join and stash the relevant data sources. The data should be easy to track and version so that models are reproducible.

Airflow

Airflow will be used to schedule daily jobs that persist data to GCS and Dataverse.
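To make the tracking and versioning goal concrete, a daily job could describe each stashed snapshot with a date and a content checksum, so a model run can record exactly which data it saw. The following is only a minimal sketch using the Python standard library. The function name `snapshot_record`, the manifest fields, and the object naming scheme are assumptions for illustration, not the project's actual code.

```python
import hashlib
import json
from datetime import date


def snapshot_record(name: str, payload: bytes, day: date) -> dict:
    """Describe one daily snapshot: where it could be stored and how to verify it."""
    digest = hashlib.sha256(payload).hexdigest()
    return {
        "source": name,
        "date": day.isoformat(),
        # Hypothetical naming scheme: <source>/<YYYY-MM-DD>/<first 12 hex chars>.csv
        "object_name": f"{name}/{day.isoformat()}/{digest[:12]}.csv",
        "sha256": digest,
        "size_bytes": len(payload),
    }


if __name__ == "__main__":
    # Example payload; a real job would pass the joined data for that day.
    record = snapshot_record("cases", b"county,cases\nA,1\n", date(2020, 6, 10))
    print(json.dumps(record, indent=2))
```

A function like this could be called from a scheduled task before the upload step, with the resulting record stored next to the data as a small manifest. Because the checksum depends only on the bytes, re-running the job on unchanged data yields the same object name.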

GCS Layout

GCS will be organized into buckets based on directory.
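One possible reading of this layout is that each top-level data directory maps to its own bucket, with the rest of the path kept as the object name. The sketch below illustrates that reading only. The helper `gcs_uri_for`, the `example-project` bucket prefix, and the directory names are hypothetical and do not reflect the project's real bucket names.

```python
from pathlib import PurePosixPath


def gcs_uri_for(relative_path: str, bucket_prefix: str = "example-project") -> str:
    """Map '<directory>/<rest...>' to 'gs://<prefix>-<directory>/<rest...>'."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 2:
        raise ValueError("expected a path of the form '<directory>/<file...>'")
    top, rest = parts[0], parts[1:]
    # Lowercase and use dashes so the directory name reads cleanly in a bucket name.
    bucket = f"{bucket_prefix}-{top.lower().replace('_', '-')}"
    return f"gs://{bucket}/{'/'.join(rest)}"


if __name__ == "__main__":
    print(gcs_uri_for("county_data/2020-06-10/cases.csv"))
```

Keeping the mapping in one small function means the directory-to-bucket convention is defined in a single place and can be changed without touching the jobs that upload data.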