
Crowdfunding ETL Mini-Project

Building an ETL pipeline that uses Python and Pandas to extract and transform the data, and PostgreSQL with pgAdmin to load it.



Project Structure

  • Original source data (Excel files) can be found in the Resources folder, along with the images used in this README.
  • All project deliverables can be found in the Project_Files folder.
  • All CSV output files generated from the transformations can be found in the Output folder.

Deliverable 1:

Jupyter Notebook - Extract/Transform: Crowdfunding_ETL.ipynb

Deliverable 2:

Entity Relationship Diagram: Crowdfunding_ERD.png

Deliverable 3:

Database Schema - Load: crowdfunding_db_schema.sql


Deliverable 1:

A Jupyter Notebook that extracts and transforms the Excel source data into four separate cleaned DataFrames, shown below, and then exports them to CSV files (a sketch of this workflow follows the table previews).

Campaign:

campaign_table

Campaign Category:

category_table

Campaign Subcategory:

subcategory_table

Contact Information:

contacts_table
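
A minimal sketch of how this extract/transform step could look in Pandas. The file names, sheet layout, column names, and id prefixes below are assumptions for illustration and may differ from the actual notebook:

```python
import pandas as pd

# Extract: read the raw crowdfunding and contacts workbooks
# (file names are assumptions; see the Resources folder for the actual sources)
crowdfunding_df = pd.read_excel("Resources/crowdfunding.xlsx")
contacts_df = pd.read_excel("Resources/contacts.xlsx", header=3)

# Transform: split the combined "category & sub-category" column into two columns
crowdfunding_df[["category", "subcategory"]] = (
    crowdfunding_df["category & sub-category"].str.split("/", n=1, expand=True)
)

# Build the category and subcategory lookup tables with simple surrogate keys
category_df = pd.DataFrame({"category": crowdfunding_df["category"].unique()})
category_df["category_id"] = "cat" + (category_df.index + 1).astype(str)

subcategory_df = pd.DataFrame({"subcategory": crowdfunding_df["subcategory"].unique()})
subcategory_df["subcategory_id"] = "subcat" + (subcategory_df.index + 1).astype(str)

# Build the campaign table by merging the ids back in and dropping the raw columns
campaign_df = (
    crowdfunding_df
    .merge(category_df, on="category")
    .merge(subcategory_df, on="subcategory")
    .drop(columns=["category & sub-category", "category", "subcategory"])
)

# Export each cleaned DataFrame to a CSV file in the Output folder
for name, df in {
    "campaign": campaign_df,
    "category": category_df,
    "subcategory": subcategory_df,
    "contacts": contacts_df,
}.items():
    df.to_csv(f"Output/{name}.csv", index=False)
```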


Deliverable 2:

An Entity Relationship Diagram visualization of the cleaned tables & their relationships:

Campaign_ERD_Map


Deliverable 3:

A SQL schema file that does the following:

  • Creates a database in pgAdmin using PostgreSQL
  • Creates tables for the above CSV files
  • Runs some test queries to verify the tables were imported correctly

Instructions for Use

To run this file, please follow these steps:

  1. Run the code in "SECTION ONE" in pgAdmin to create the database.
  2. Open a new Query Tool in the new database & run the code in "SECTION TWO" to create the tables.
  3. Refresh the database.
  4. Import the CSV file for each table, in the order the tables were created, using the default import settings.
  5. Run each query statement in "SECTION THREE" to verify the tables were imported correctly. For the last query, you should see results similar to those in the image below.

pgAdmin_screenshot
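
Steps 4 and 5 can also be done programmatically rather than through pgAdmin's import dialog. Below is a minimal sketch using pandas and SQLAlchemy; it is not part of the deliverables, and the connection string, table names, and CSV file names are assumptions:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Connection string is an assumption -- adjust user, password, and port as needed
engine = create_engine("postgresql://postgres:postgres@localhost:5432/crowdfunding_db")

# Import each CSV in the same order the tables were created,
# so foreign-key references already exist when dependent rows are inserted
for table in ["category", "subcategory", "contacts", "campaign"]:
    df = pd.read_csv(f"Output/{table}.csv")
    df.to_sql(table, engine, if_exists="append", index=False)

# Quick verification, similar in spirit to the SECTION THREE test queries
with engine.connect() as conn:
    for table in ["category", "subcategory", "contacts", "campaign"]:
        count = conn.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar()
        print(f"{table}: {count} rows")
```

Using if_exists="append" inserts into the tables already defined by the schema file, so the column types and constraints from SECTION TWO are preserved rather than letting pandas create its own tables.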


References

The data in this dataset was generated by edX Boot Camps LLC and is intended for educational purposes only.
