Extract, transform, and load (ETL) is the process
of combining data from multiple sources into a
large, central repository called a data warehouse.
ETL uses a set of business rules to clean and organize
raw data and prepare it for storage, data analytics,
and machine learning (ML). You can address specific
business intelligence needs through data analytics,
such as predicting the outcome of business decisions,
generating reports and dashboards, and reducing
operational inefficiency.
source: https://aws.amazon.com/what-is/etl/
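The ETL flow described above can be sketched with a minimal, self-contained example. This is an illustrative assumption, not a real pipeline: the two "source systems", the field names, and the in-memory SQLite database standing in for the data warehouse are all made up, and the business rules are just trim-and-lowercase.

```python
import sqlite3

# Extract: raw records from two hypothetical source systems (assumed data).
crm_rows = [("Alice", " alice@example.com "), ("Bob", "BOB@EXAMPLE.COM")]
web_rows = [("Carol", "carol@example.com"), ("Bob", "bob@example.com")]

def transform(rows):
    # Apply simple business rules BEFORE loading: trim whitespace,
    # normalize e-mail addresses to lowercase.
    return [(name, email.strip().lower()) for name, email in rows]

# Load: write the cleaned, combined data into a central repository
# (an in-memory SQLite database stands in for the data warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (name TEXT, email TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    transform(crm_rows) + transform(web_rows),
)

# Because the data was cleaned before loading, duplicates across
# sources can be resolved with a plain DISTINCT in the warehouse.
distinct = warehouse.execute(
    "SELECT COUNT(DISTINCT email) FROM customers"
).fetchone()[0]
print(distinct)  # Bob appears in both sources -> 3 distinct e-mails
```

The defining trait of ETL shows up in the ordering: `transform()` runs on the way in, so only cleaned data ever reaches the warehouse.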
ELT, which stands for “Extract, Load, Transform,”
is another data integration process, the counterpart
of ETL, “Extract, Transform, Load”. It also moves raw
data from a source system to a destination resource,
such as a data warehouse, but it reverses the last two
steps: the raw data is loaded first and then transformed
inside the destination. This approach to data
pre-processing has gained adoption more recently with
the transition to cloud environments, whose warehouses
have the compute capacity to run transformations themselves.
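To contrast with ETL, here is a minimal ELT sketch under the same assumptions (sample rows, table names, and an in-memory SQLite database standing in for a cloud warehouse are all illustrative): the raw data lands unchanged in a staging table, and the transformation then runs as SQL inside the destination engine.

```python
import sqlite3

# Extract: raw, uncleaned records from an assumed source system.
raw_orders = [("  widget ", "12.50"), ("GADGET", "7.00"), ("  widget ", "3.25")]

dwh = sqlite3.connect(":memory:")

# Load FIRST: land the raw data as-is in a staging table.
dwh.execute("CREATE TABLE staging_orders (product TEXT, amount TEXT)")
dwh.executemany("INSERT INTO staging_orders VALUES (?, ?)", raw_orders)

# Transform LAST: the destination engine does the cleaning and typing,
# producing a curated table from the staged raw data.
dwh.execute("""
    CREATE TABLE orders AS
    SELECT lower(trim(product)) AS product,
           CAST(amount AS REAL)  AS amount
    FROM staging_orders
""")

total = dwh.execute(
    "SELECT product, SUM(amount) FROM orders GROUP BY product ORDER BY product"
).fetchall()
print(total)  # [('gadget', 7.0), ('widget', 15.75)]
```

Keeping the untouched staging table around is one practical reason for ELT: if a business rule changes, the transformation can be re-run against the original raw data without re-extracting from the source.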
- The most underestimated process in DW development
- The most time-consuming process in DW development
  - Up to 80% of the development time is spent on ETL!
- Extract relevant data
  - Extraction can be from many data sources
- Transform data to DW format
  - Build DW keys, etc.
  - Cleansing of data
- Load data into DW
  - Build aggregates, etc.
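The DW-specific sub-steps above (building DW keys, loading, building aggregates) can be sketched as follows. This is a minimal illustration, assuming already-cleansed sample rows, a simple surrogate-key scheme, and an in-memory SQLite database as the warehouse:

```python
import sqlite3

# Assumed cleansed source rows: (customer natural key, sale amount).
sales = [("C-7", 100.0), ("C-2", 40.0), ("C-7", 60.0)]

dw = sqlite3.connect(":memory:")

# Build DW keys: a dimension table assigns each business (natural) key
# a surrogate key, which fact rows reference instead of the source key.
dw.execute("""CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,
    natural_key TEXT UNIQUE)""")
dw.execute("""CREATE TABLE fact_sales (
    customer_sk INTEGER REFERENCES dim_customer(customer_sk),
    amount REAL)""")

# Load data into DW: look up (or create) the surrogate key per row.
for natural_key, amount in sales:
    dw.execute(
        "INSERT OR IGNORE INTO dim_customer (natural_key) VALUES (?)",
        (natural_key,),
    )
    sk = dw.execute(
        "SELECT customer_sk FROM dim_customer WHERE natural_key = ?",
        (natural_key,),
    ).fetchone()[0]
    dw.execute("INSERT INTO fact_sales VALUES (?, ?)", (sk, amount))

# Build aggregates: a summary table precomputed from the fact table.
dw.execute("""CREATE TABLE agg_sales_by_customer AS
    SELECT customer_sk, SUM(amount) AS total
    FROM fact_sales GROUP BY customer_sk""")

rows = dw.execute(
    "SELECT customer_sk, total FROM agg_sales_by_customer ORDER BY customer_sk"
).fetchall()
print(rows)  # [(1, 160.0), (2, 40.0)]
```

The surrogate key decouples the warehouse from source-system identifiers, and the precomputed aggregate table is what later reports and dashboards would query.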
4.1.0 Create Parquet File by PySpark (as a data source)
4.1.1 Create Parquet File by PySpark (log file)
4.2 ETL: 1. extract, 2. transform, and 3. load
5.1.0 Create Parquet File by PySpark (as a data source)
5.1.1 Create Parquet File by PySpark (log file)
5.2 ELT: 1. extract, 2. load, and 3. transform