Commit

reasme updated
imbilalbutt committed Jan 31, 2024
1 parent 76d72cf commit b2ac97e
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions exercises/README.md
@@ -154,4 +154,38 @@ Goal
- Use fitting SQLite types (e.g., BIGINT, TEXT or FLOAT) for all columns

- Write data into a SQLite database called “temperatures.sqlite”, in the table “temperatures”


## Exercise 5

- Build an automated data pipeline for the following source:

- Direct download link: https://gtfs.rhoenenergie-bus.de/GTFS.zip

Goal

- Work with GTFS data

- For Python, consider using `urllib.request.urlretrieve` instead of the requests library to download the ZIP file

- For Jayvee, if you use the FilePicker, do not use a leading dot in file paths; see this bug: https://github.com/jvalue/jayvee/issues/381

- Pick out only stops (from stops.txt)

- Only the columns stop_id, stop_name, stop_lat, stop_lon, zone_id with fitting data types

- Filter data

- Only keep stops from zone 2001

- Validate data

- stop_name can be any text but must preserve German umlauts

- stop_lat/stop_lon must be geographic coordinates between -90 and 90, including the upper and lower bounds

- Drop rows containing invalid data

- Use fitting SQLite types (e.g., BIGINT, TEXT or FLOAT) for all columns

- Write data into a SQLite database called “gtfs.sqlite”, in the table “stops”
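
The steps of Exercise 5 could be sketched in Python roughly as follows. This is only one possible approach that uses the standard library, not a reference solution. The function names `read_stops`, `write_stops` and `run_pipeline` are hypothetical, the file is assumed to be UTF-8 encoded, and storing stop_id as TEXT and zone_id as BIGINT is an assumption about what counts as a "fitting" type for this feed.

```python
import csv
import io
import sqlite3
import urllib.request
import zipfile

GTFS_URL = "https://gtfs.rhoenenergie-bus.de/GTFS.zip"


def read_stops(zip_path, zone=2001):
    """Return validated (stop_id, stop_name, stop_lat, stop_lon, zone_id) rows."""
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open("stops.txt") as raw:
            # Assumption: the file is UTF-8; utf-8-sig also drops a leading BOM,
            # so German umlauts in stop_name survive unchanged.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for record in csv.DictReader(text):
                try:
                    row = (
                        record["stop_id"],          # kept as text (assumption)
                        record["stop_name"],
                        float(record["stop_lat"]),
                        float(record["stop_lon"]),
                        int(record["zone_id"]),
                    )
                except (KeyError, TypeError, ValueError):
                    continue  # drop rows with missing or malformed values
                # Bounds are inclusive, as the exercise requires.
                in_range = -90 <= row[2] <= 90 and -90 <= row[3] <= 90
                if row[4] == zone and in_range:
                    rows.append(row)
    return rows


def write_stops(rows, db_path="gtfs.sqlite"):
    """Write the rows into the table "stops", replacing any earlier run."""
    connection = sqlite3.connect(db_path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute("DROP TABLE IF EXISTS stops")
            connection.execute(
                "CREATE TABLE stops (stop_id TEXT, stop_name TEXT, "
                "stop_lat FLOAT, stop_lon FLOAT, zone_id BIGINT)"
            )
            connection.executemany(
                "INSERT INTO stops VALUES (?, ?, ?, ?, ?)", rows
            )
    finally:
        connection.close()


def run_pipeline():
    """Download the archive, then filter, validate and store the stops."""
    zip_path, _ = urllib.request.urlretrieve(GTFS_URL, "GTFS.zip")
    write_stops(read_stops(zip_path))
```

Calling `run_pipeline()` performs the download and writes `gtfs.sqlite` to the current directory. Keeping the download separate from `read_stops` makes it possible to test the filtering and validation against a small local ZIP file.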
