
1 Download raw data

Hou Yujun edited this page Aug 15, 2024 · 3 revisions

Please download the folder code/raw_download.

Set up environment with requirements-non_cv.txt.

An access token is required to download data from Mapillary. You can register for one for free with Mapillary. Then update your Mapillary token in the following files:

  • code/raw_download/raw_download.py
  • code/raw_download/download_mly_points.py

How the code works

raw_download.py finds the level-14 vector tile associated with each input city's location, then downloads and merges the metadata of all available SVIs that fall within this tile, from both Mapillary and KartaView.
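To illustrate what "the level-14 vector tile associated with a location" means, here is a minimal sketch that maps a longitude/latitude pair to tile indices. It assumes the standard Web Mercator (slippy map) tiling scheme; the function names lonlat_to_tile and tile_bounds are hypothetical and are not taken from raw_download.py, which may compute the tile differently.

```python
import math


def lonlat_to_tile(lon, lat, zoom=14):
    """Return the (x, y) indices of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the outer edge of the map stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_bounds(x, y, zoom=14):
    """Return (west, south, east, north) in degrees for a tile."""
    n = 2 ** zoom

    def lon_of(col):
        return col / n * 360.0 - 180.0

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon_of(x), lat_of(y + 1), lon_of(x + 1), lat_of(y)


# Illustrative coordinates only; the script reads each city's location
# from worldcities.csv.
x, y = lonlat_to_tile(103.8, 1.3)
west, south, east, north = tile_bounds(x, y)
```

Only SVIs whose coordinates fall inside the returned bounding box would belong to that tile, which is why a single tile covers just a small area around the city's listed location.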

Input

A list of city IDs (int)

Example: targets = [1702341327, 1276171358] # Singapore and Stuttgart

As city names are not unique, unique city IDs are used as input. They can be found in code/raw_download/data/worldcities.csv, which is the same file available on simplemaps.
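A city ID can be looked up by name before filling in targets. The sketch below assumes the simplemaps column names city_ascii, country and id, and uses a tiny in-memory extract so that it runs on its own; find_city_ids is a hypothetical helper, not part of the repository.

```python
import csv
import io

# Illustrative extract only: the real worldcities.csv has more columns and rows.
SAMPLE = """city_ascii,country,id
Singapore,Singapore,1702341327
Stuttgart,Germany,1276171358
"""


def find_city_ids(csv_file, name):
    """Return (city_ascii, country, id) for every row whose name matches."""
    reader = csv.DictReader(csv_file)
    wanted = name.strip().lower()
    return [
        (row["city_ascii"], row["country"], int(row["id"]))
        for row in reader
        if row["city_ascii"].strip().lower() == wanted
    ]


matches = find_city_ids(io.StringIO(SAMPLE), "stuttgart")
targets = [city_id for _, _, city_id in matches]
```

With the real file, pass an open file handle instead of the in-memory sample. Because several cities can share a name, check the country column of each match before copying its ID into targets.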

Output

For each city, a CSV file containing the metadata of both Mapillary and KartaView SVIs downloaded from the level-14 vector tile associated with the city's location (the location is taken from worldcities.csv). The file is named after the city name (as an ASCII string) and the city ID: cityName_cityId.csv

Example: Singapore_1702341327.csv, Stuttgart_1276171358.csv
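The naming pattern can be expressed as a pair of small helpers, for instance to locate or parse output files in later steps. This is a sketch of the cityName_cityId.csv convention only; both function names are hypothetical, and how raw_download.py handles spaces or special characters in city names is not covered here.

```python
def output_filename(city_ascii, city_id):
    """Build the output file name for one city."""
    return f"{city_ascii}_{city_id}.csv"


def parse_output_filename(filename):
    """Split an output file name back into (city name, city ID)."""
    suffix = ".csv"
    if not filename.endswith(suffix):
        raise ValueError(f"not a CSV file name: {filename!r}")
    stem = filename[: -len(suffix)]
    # Split on the last underscore so names containing underscores survive.
    name, sep, city_id = stem.rpartition("_")
    if not sep:
        raise ValueError(f"missing city ID in: {filename!r}")
    return name, int(city_id)
```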

Adjustable variables

Users can adjust the following variables in raw_download.py to suit their needs:

  • access_token (str):
    • Insert your Mapillary access token.
  • targets (list):
    • Insert the list of your target city IDs.
  • save_folder (str):
    • Insert the path to your output directory.
  • wc (str):
    • Insert the path to worldcities.csv.
  • reproduce (bool):
    • Set this to True to reproduce the dataset. No new UUIDs (universally unique identifiers that we generate to identify each SVI) are generated for the downloaded data.
    • Set this to False to update or expand the dataset (i.e. download new data that is not already provided in the dataset). New UUIDs are generated for the downloaded data.
  • start_date (str) ('YYYY-MM-DD'):
    • Download data from this date onward.
    • If set to None, downloading starts from the earliest available data.
  • end_date (str) ('YYYY-MM-DD'):
    • Download data until this date.
    • If set to None, downloading continues up to the latest available data.
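As an illustration, the variables above might be filled in as follows. The token and paths are placeholders, and parse_date and in_range are hypothetical helpers that only show how the 'YYYY-MM-DD' strings and None values can be interpreted. Treating both bounds as inclusive is an assumption, not something confirmed by raw_download.py.

```python
from datetime import date, datetime

access_token = "YOUR_MAPILLARY_ACCESS_TOKEN"  # placeholder, replace with your own
targets = [1702341327, 1276171358]  # Singapore and Stuttgart
save_folder = "output/raw"  # hypothetical output directory
wc = "data/worldcities.csv"  # hypothetical relative path
reproduce = False  # False: new UUIDs are generated for downloaded data
start_date = "2024-04-01"
end_date = None  # None: no upper limit


def parse_date(value):
    """Convert a 'YYYY-MM-DD' string to a date; pass None through."""
    if value is None:
        return None
    return datetime.strptime(value, "%Y-%m-%d").date()


def in_range(captured, start, end):
    """Check a capture date against optional bounds (assumed inclusive)."""
    if start is not None and captured < start:
        return False
    if end is not None and captured > end:
        return False
    return True


start, end = parse_date(start_date), parse_date(end_date)
```

Parsing the dates up front catches a malformed value such as '2024/04/01' with a ValueError before any download starts.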

How to run the code

Set up environment with requirements-non_cv.txt.

To reproduce sample_output

Insert your access_token.

Modify save_folder to your output folder.

Then run:

python3 raw_download.py

To download other data

After modifying the adjustable variables, run:

python3 raw_download.py

Notes

download_mly_points.py and download_kv_points.py contain functions that raw_download.py imports, but they can also be run on their own to download Mapillary or KartaView SVI metadata from the level-14 vector tile associated with each input city's location.

For this reason, keep download_mly_points.py, download_kv_points.py, and raw_download.py in the same folder; otherwise raw_download.py will not work.

Sample output

As an example, we downloaded raw metadata for Singapore and Stuttgart in two different ways:

  • reproduce = False, start_date = '2024-04-01', end_date = None
  • reproduce = True, start_date = None, end_date = '2024-04-01'

The sample output is stored in code/raw_download/sample_output/reproduce_false and code/raw_download/sample_output/reproduce_true respectively.

Follow code/raw_download/visualise_output.ipynb to map the sample output and merge data.
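For readers who only need the merge step, the sketch below concatenates all per-city CSV files in a folder into a single file using the standard library. It is not the notebook's code: merge_csvs is a hypothetical helper, and the notebook may merge, deduplicate, or reorder columns differently.

```python
import csv
from pathlib import Path


def merge_csvs(folder, out_path):
    """Concatenate every CSV in `folder` into one file and return the row count."""
    files = sorted(Path(folder).glob("*.csv"))
    fieldnames = []
    rows = []
    for path in files:
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Keep the union of all columns, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with Path(out_path).open("w", newline="", encoding="utf-8") as f:
        # Columns missing from a given file are written as empty strings.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Write the merged file outside the input folder (for example, one level up) so that a second run does not read its own output as an input.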

The data merged from code/raw_download/sample_output/reproduce_false, which results in code/raw_download/sample_output/points.csv, is also used as the input to demonstrate the subsequent processes: