
MSBD 5014 IP Project Repository


This repository contains the source code for the MSBD 5014 IP project, along with various materials I collected while learning Flink over the semester.

Table of Contents

  • Data Generation
  • Environment Setup
  • Project Structure
  • Usage
  • Algorithm
  • References
  • Contributors

Data Generation

The dataset is generated with the TPC-H DBGEN tool, which is included in the root directory of this repository. You can refer to the detailed documentation here.

You can also find a useful guide on GitHub: TPCH DBGEN Guide.

Important Parameters:

  • -s <scale>: Scale of the database population. Scale 1.0 represents ~1 GB of data.
  • -T <table>: Generate data for a particular table ONLY. Arguments:
    • p -- part/partsupp
    • c -- customer
    • s -- supplier
    • o -- orders/lineitem
    • n -- nation
    • r -- region
    • l -- code (same as n and r)
    • O -- orders
    • L -- lineitem
    • P -- part
    • S -- partsupp
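
For example, running ./dbgen -s 1 generates the full dataset at scale factor 1 (roughly 1 GB), while ./dbgen -s 1 -T L generates only the lineitem table (this assumes dbgen has already been built in the root directory, as described in the documentation linked above).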

Note for macOS Users:

If you encounter errors such as:

bm_utils.c:71:10: fatal error: 'malloc.h' file not found
varsub.c:44:10: fatal error: 'malloc.h' file not found

Solution: Open bm_utils.c and varsub.c, locate the include directive #include <malloc.h>, and change it to #include <sys/malloc.h>.

Environment Setup

To run this project, you need to have Maven and MySQL installed on your system.

Project Structure

  • tpc_query: Contains the source code for the TPC-H queries.
  • learning: Contains various materials and resources collected during the semester.

Usage

Running the Project

  1. Navigate to the tpc_query directory.
  2. Configure the build environment via pom.xml; Maven resolves the project dependencies declared there.
  3. Run Main.java to simulate TPC-H Query 7.

Query Execution

You can choose between querying directly in MySQL and testing the AJU algorithm implemented with Flink by selecting the corresponding sink:

// Select the output sink: "MySQL" writes the results to MySQL so they can be
// queried there directly; any other value keeps them in the in-memory sink
// used to test the Flink-based AJU implementation.
String choice = "Memory";
if (choice.equals("MySQL")) {
    MySQLSink mySQLSink = new MySQLSink();
    dataSource.addSink(mySQLSink);
} else {
    MemorySink memorySink = new MemorySink();
    dataSource.addSink(memorySink);
}

Currently, Q7 and Q5 are supported. To try other queries, you can add them in the /tpc_query/src/main/java/tpc_query/Query directory following the existing format.

Algorithm

This section provides an overview of the four main algorithms implemented in this project. Each algorithm has an associated image to illustrate the process.

Insert Algorithm

This algorithm handles the insertion of new tuples into the database while maintaining the acyclic foreign-key join structure.
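
As a rough, self-contained sketch of the bookkeeping involved (plain Java, not the project's actual classes; the single orders/lineitem foreign key and the counter scheme are simplifying assumptions made for brevity), an orders tuple can be treated as live, and emitted to the result, once at least one lineitem tuple referencing it has been inserted:

import java.util.HashMap;
import java.util.Map;

// A minimal illustration (not the project's actual code) of counter-based
// insert maintenance: an orders tuple is "live" once at least one lineitem
// tuple referencing it has arrived, and only live tuples are emitted.
public class InsertSketch {

    // orderKey -> number of lineitem tuples seen so far that reference it
    private static final Map<Long, Integer> lineitemCount = new HashMap<>();
    // orderKey -> payload of the orders tuple (a plain String here for brevity)
    private static final Map<Long, String> orders = new HashMap<>();

    static void insertOrder(long orderKey, String payload) {
        orders.put(orderKey, payload);
        // If matching lineitems already arrived, the new order is live immediately.
        if (lineitemCount.getOrDefault(orderKey, 0) > 0) {
            emit(orderKey);
        }
    }

    static void insertLineitem(long orderKey) {
        int count = lineitemCount.merge(orderKey, 1, Integer::sum);
        // Liveness of the referenced order changes only on the 0 -> 1 transition.
        if (count == 1 && orders.containsKey(orderKey)) {
            emit(orderKey);
        }
    }

    private static void emit(long orderKey) {
        System.out.println("order " + orderKey + " became live: " + orders.get(orderKey));
    }

    public static void main(String[] args) {
        insertLineitem(1L);           // child arrives first: counted, nothing emitted yet
        insertOrder(1L, "order #1");  // parent arrives with a match already present -> emitted
        insertOrder(2L, "order #2");  // parent arrives first: not live yet
        insertLineitem(2L);           // 0 -> 1 transition makes order #2 live -> emitted
    }
}

In the full algorithm (see the references below), this kind of bookkeeping is maintained along every foreign-key edge of the acyclic join tree rather than for a single pair of tables.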

Insert-Update Algorithm

This algorithm deals with the insertion and updating of tuples simultaneously. It ensures that the database remains consistent and acyclic.

Delete Algorithm

This algorithm manages the deletion of tuples from the database, ensuring that the foreign-key constraints and the acyclic nature of the joins are preserved.
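
Again as a rough illustration only (same simplifying assumptions as the insert sketch above, not the project's actual code), deleting the last lineitem tuple that references an order makes that order non-live, at which point a retraction is issued:

import java.util.HashMap;
import java.util.Map;

// A minimal illustration (not the project's actual code) of the corresponding
// bookkeeping on deletion: removing the last lineitem referencing an order
// makes that order non-live again, so a retraction is issued for it.
public class DeleteSketch {

    // orderKey -> number of lineitem tuples currently referencing it
    private static final Map<Long, Integer> lineitemCount = new HashMap<>();

    static void insertLineitem(long orderKey) {
        lineitemCount.merge(orderKey, 1, Integer::sum);
    }

    static void deleteLineitem(long orderKey) {
        int remaining = lineitemCount.merge(orderKey, -1, Integer::sum);
        // Liveness is lost only on the 1 -> 0 transition; issue a retraction then.
        if (remaining == 0) {
            lineitemCount.remove(orderKey);
            System.out.println("retract order " + orderKey + " from the join result");
        }
    }

    public static void main(String[] args) {
        insertLineitem(1L);
        insertLineitem(1L);
        deleteLineitem(1L); // one match still remains, order 1 stays live
        deleteLineitem(1L); // last match gone -> retraction for order 1
    }
}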

Delete-Update Algorithm

This algorithm handles the deletion and updating of tuples in tandem, maintaining the integrity and acyclic structure of the foreign-key joins.

References

  1. Maintaining Acyclic Foreign-Key Joins under Updates
  2. Cquirrel: Continuous Query Processing over Acyclic Relational Schemas
  3. Flink: Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.

Feel free to explore the repository and use the provided resources. If you have any questions or need further assistance, please contact me.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Michael (💻 code)

This project follows the all-contributors specification. Contributions of any kind welcome!
