CoronaWhy Spanish Flu research framework

This framework uses ML techniques to overlay CoronaWhy Spanish flu data on COVID-19. All datasets are published in the CoronaWhy Data Lake.

Regular meetings

We share all meetings on YouTube. Feel free to join us if you would like to contribute.

Datasets

Download and unpack the latest version of the KB Spanish flu dataset:

wget http://datasets.coronawhy.org/api/access/datafile/503748 -O data.tar.gz && tar xzf data.tar.gz
wget http://datasets.coronawhy.org/api/access/datafile/741787 -O congress.tar.gz && tar xzf congress.tar.gz

Framework installation

Download the fastText language identification model:

wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin

Install the fasttext module:

pip install fasttext

Usage

Run the language detection process:

python3 ./main.py
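
For orientation, language detection with the downloaded lid.176.bin model can be sketched roughly as follows. This is a minimal illustration, not the contents of main.py: the helper names strip_label and detect_language and the sample sentence are assumptions, and the model file is assumed to sit in the current directory.

```python
LABEL_PREFIX = "__label__"


def strip_label(label):
    """Turn a fastText label such as '__label__nl' into the bare code 'nl'."""
    if label.startswith(LABEL_PREFIX):
        return label[len(LABEL_PREFIX):]
    return label


def detect_language(model, text):
    """Return (language_code, probability) for the most likely language.

    `model` is any object with a fastText-style predict(text, k=...) method.
    fastText predicts one line at a time, so newlines are replaced first.
    """
    labels, probabilities = model.predict(text.replace("\n", " "), k=1)
    return strip_label(labels[0]), float(probabilities[0])


if __name__ == "__main__":
    # Imported here so the helpers above can be used without fasttext installed.
    import fasttext

    # Assumes lid.176.bin was downloaded into the current directory.
    model = fasttext.load_model("lid.176.bin")
    # Hypothetical sample sentence, not taken from the dataset.
    print(detect_language(model, "De Spaansche griep heerst in de stad."))
```

Passing the model in as an argument keeps the expensive load_model call out of the per-document loop and makes the helper easy to exercise with a stub.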

Results

The run produces the file citations.txt, which contains the relevant text fragments matched against the keywords defined in config.py.
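
Keyword-based fragment extraction of this kind might look like the sketch below. It is an assumption about the general approach, not the framework's actual code: the KEYWORDS list, the window parameter, and the function names find_fragments and write_citations are all hypothetical, and the real keywords live in config.py.

```python
import re

# Hypothetical keyword list; the real one is defined in config.py.
KEYWORDS = ["influenza", "spaansche griep"]


def find_fragments(text, keywords, window=60):
    """Return a fragment for every keyword hit in `text`.

    Each fragment spans up to `window` characters before and after the match.
    Matching is case-insensitive and keywords are treated as literal strings.
    """
    fragments = []
    for keyword in keywords:
        for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            fragments.append(text[start:end].strip())
    return fragments


def write_citations(fragments, path="citations.txt"):
    """Write one fragment per line to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        for fragment in fragments:
            handle.write(fragment + "\n")


if __name__ == "__main__":
    sample = "In 1918 the influenza epidemic spread quickly."
    write_citations(find_fragments(sample, KEYWORDS))
```

A fixed character window is the simplest choice; splitting on sentence boundaries would give cleaner fragments at the cost of a language-aware tokenizer.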

CoronaWhy infrastructure

You can also run full-text searches over the whole collection by querying the Elasticsearch index spanishflu:

curl "http://search.coronawhy.org/spanishflu/_search?pretty=true&q=*"
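
The same query can be issued from Python with only the standard library. The sketch below assumes the endpoint accepts the standard Elasticsearch URI search parameters q and size and returns the usual hits.hits structure; the function names and the example query term are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://search.coronawhy.org/spanishflu/_search"


def build_search_url(query, size=10):
    """Build a URI-search URL for the spanishflu index."""
    return BASE_URL + "?" + urlencode({"q": query, "size": size})


def search(query, size=10):
    """Run the query and return the list of hits (empty if none are found)."""
    with urlopen(build_search_url(query, size), timeout=30) as response:
        body = json.load(response)
    return body.get("hits", {}).get("hits", [])


if __name__ == "__main__":
    # Hypothetical query term; any Lucene query string should work here.
    for hit in search("influenza", size=3):
        print(hit.get("_id"), hit.get("_score"))
```

Building the URL with urlencode takes care of escaping spaces and special characters in the query string.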