Default program execution
The example config.ini file provided in the WikiDAT/wikidat directory includes some default options that you can use to test your local installation of WikiDAT. With these defaults, the program downloads the complete dump of scowiki with its full revision history, verifies the integrity of the downloaded files, and finally loads the extracted information into a local database named scowiki_20140527. This example uses only one worker process for each type of element (one worker for pages, another for revisions) and only one ETL line (as there is only a single dump file to extract). Please read the section "Understanding the ETL process(es)" below for additional details on how to parallelize data extraction.
In addition, this example file contains dummy values for the user and password to connect to your local database. Therefore, you must edit these two fields and provide a valid user name and password before running the test, or the program will complain about invalid parameters to connect to your local database.
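If you prefer to set the credentials from a script rather than by hand, the following minimal sketch shows one way to do it with Python's standard configparser module. The section name "Database" and the option names "db_user" and "db_passw" are assumptions made for illustration; check your own config.ini for the actual names before using it. Note also that configparser rewrites the file without its comments, so editing by hand may be preferable if you want to keep them.

```python
import configparser


def set_db_credentials(ini_path, user, password,
                       section="Database",      # assumed section name
                       user_key="db_user",      # assumed option name
                       password_key="db_passw"):  # assumed option name
    """Store a database user and password in an .ini file, in place."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)

    # parser.read() returns the list of files it managed to parse.
    if not parser.read(ini_path):
        raise FileNotFoundError(ini_path)

    if not parser.has_section(section):
        parser.add_section(section)

    parser.set(section, user_key, user)
    parser.set(section, password_key, password)

    # Rewrite the file; comments in the original are not preserved.
    with open(ini_path, "w") as handle:
        parser.write(handle)
```

For example, set_db_credentials("config.ini", "myuser", "mypassword") would replace the dummy values, provided the assumed section and option names match your file.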
Once you have provided a valid user and password for your local database, open a command-line terminal, go to the WikiDAT/wikidat directory and execute the program:
user@host:somepath/WikiDAT/wikidat$ python main.py
To customize the execution of WikiDAT for your own needs, it is recommended to create a separate .ini file. You can achieve this by copying the template from the example config.ini file and giving the copy a different name, for instance config_local.ini. To use this new file, you must tell the program where to find the new configuration with the -c or --config_file command-line option:
user@host:somepath/WikiDAT/wikidat$ python main.py --config_file config_local.ini
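To illustrate how an option of this kind is typically handled, here is a minimal sketch using Python's standard argparse module. It is not taken from WikiDAT's main.py, whose actual implementation may differ; the function name parse_args and the fallback to config.ini when no option is given are assumptions based on the behaviour described above.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line and return the selected configuration file."""
    parser = argparse.ArgumentParser(
        description="Illustrative launcher that accepts a configuration file."
    )
    parser.add_argument(
        "-c", "--config_file",
        default="config.ini",  # assumed default when the option is omitted
        help="path to the .ini file with the execution options",
    )
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("Using configuration file:", args.config_file)
```

Both spellings of the option populate the same attribute, so -c config_local.ini and --config_file config_local.ini are equivalent in this sketch.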
Please refer to the following sections to learn more about the different configuration options available in WikiDAT.
WikiDAT: Wikipedia Data Analysis Toolkit. CC-BY-SA 3.0 Felipe Ortega. Icons: Font Awesome