Building an ETL pipeline that uses Python and Pandas to extract and transform data, and PostgreSQL with pgAdmin to load it
- Original source data (Excel files) can be found in the Resources folder, along with the images used in this README.
- All project deliverables can be found in the Project_Files folder.
- All CSV output files generated from the transformations can be found in the Output folder.
Jupyter Notebook - Extract/Transform: Crowdfunding_ETL.ipynb
Entity Relationship Diagram: Crowdfunding_ERD.png
Database Schema - Load: crowdfunding_db_schema.sql
A Jupyter Notebook that extracts and transforms the Excel data into four separate cleaned DataFrames, as shown below, and then exports them to CSV files.
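As a rough sketch of the extract-and-transform pattern the notebook follows (the column names and split logic below are illustrative assumptions, not the notebook's actual code — the real notebook reads the Excel files in the Resources folder with `pd.read_excel()`):

```python
import pandas as pd

# Hypothetical raw data standing in for the Excel source.
raw = pd.DataFrame({
    "id": [1, 2, 3],
    "category & sub-category": ["food/pizza", "music/jazz", "food/bbq"],
})

# Split the combined column into two fields (illustrative transformation).
raw[["category", "subcategory"]] = (
    raw["category & sub-category"].str.split("/", expand=True)
)

# Build a separate, de-duplicated category DataFrame with its own ids.
category_df = pd.DataFrame({"category": raw["category"].unique()})
category_df["category_id"] = "cat" + (category_df.index + 1).astype(str)

# Export the cleaned DataFrame to CSV, as the notebook does for each table.
category_df.to_csv("category.csv", index=False)
```

The same clean-then-export step is repeated for each of the four DataFrames.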
An Entity Relationship Diagram visualizing the cleaned tables and their relationships:
A SQL schema file that does the following:
- Creates a database in pgAdmin using PostgreSQL
- Creates tables for the above CSV files
- Runs some test queries to verify the tables were imported correctly
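In miniature, the create-tables step looks like the following sketch (illustrated with Python's built-in sqlite3 purely so it is self-contained and runnable; the real schema is PostgreSQL DDL run in pgAdmin, and the table and column names here are hypothetical):

```python
import sqlite3

# In-memory stand-in for the crowdfunding_db database created in SECTION ONE.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# SECTION TWO analogue: one CREATE TABLE per CSV file
# (hypothetical columns; the real schema defines four related tables).
cur.execute("""
    CREATE TABLE category (
        category_id TEXT PRIMARY KEY,
        category    TEXT NOT NULL
    )
""")

# Confirm the table exists before moving on to the import step.
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```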
To run this file, please follow these steps:
- Run the code in "SECTION ONE" in pgAdmin to create the database.
- Open a new Query Tool in the new database and run the code in "SECTION TWO" to create the tables.
- Refresh the database.
- Import each CSV file into its corresponding table, in the order the tables were created, using the default settings.
- Run each query statement in "SECTION THREE" to verify the tables were imported correctly. The last query should return results similar to those in the image below.
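The import-then-verify loop above can be sketched as follows (again using sqlite3 only as a runnable stand-in; in the actual project the import happens through pgAdmin's import dialog, and the verification queries live in "SECTION THREE" of the schema file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table matching one of the CSV files.
cur.execute(
    "CREATE TABLE category (category_id TEXT PRIMARY KEY, category TEXT)"
)

# Stand-in for pgAdmin's CSV import (hypothetical rows).
rows = [("cat1", "food"), ("cat2", "music")]
cur.executemany("INSERT INTO category VALUES (?, ?)", rows)

# Verification query in the spirit of SECTION THREE: the row count
# should match the number of records in the imported CSV.
count = cur.execute("SELECT COUNT(*) FROM category").fetchone()[0]
assert count == len(rows)
```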
Data for this dataset was generated by edX Boot Camps LLC, and is intended for educational purposes only.