This repository contains the source code for the MSBD 5014 IP project, as well as materials I collected this semester while learning Flink.
The dataset is generated with the DBGEN tool, which is included in the root directory of this repository. You can refer to the detailed documentation here.
You can also find a useful guide on GitHub: TPCH DBGEN Guide.
-s <scale>: Scale of the database population. Scale 1.0 represents ~1 GB of data.
-T <table>: Generate data for a particular table ONLY. Arguments:
    p -- part/partsupp
    c -- customers
    s -- supplier
    o -- orders/lineitem
    n -- nation
    r -- region
    l -- code (same as n and r)
    O -- orders
    L -- lineitem
    P -- part
    S -- partsupp
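DBGEN writes each table to a pipe-delimited `.tbl` file in which every row also ends with a trailing `|`. The sketch below shows one way to split such a row into fields in Java. The class name `TblLineParser` and the sample comment text are assumptions made for illustration and are not part of this repository; the column order used in the example (nation key, name, region key, comment) is that of the TPC-H `nation` table.

```java
import java.util.Arrays;
import java.util.List;

public class TblLineParser {
    /** Splits one pipe-delimited dbgen row into fields, dropping the trailing delimiter. */
    static List<String> parseLine(String line) {
        // A limit of -1 keeps empty fields. The trailing '|' then produces one
        // extra empty element at the end, which is dropped below.
        String[] parts = line.split("\\|", -1);
        int n = parts.length;
        if (n > 0 && parts[n - 1].isEmpty()) {
            n--;
        }
        return Arrays.asList(Arrays.copyOf(parts, n));
    }

    public static void main(String[] args) {
        // Illustrative row in the shape of nation.tbl; the comment field is made up.
        String row = "0|ALGERIA|0|illustrative comment text|";
        List<String> fields = parseLine(row);
        System.out.println("fields: " + fields.size());
        System.out.println("name:   " + fields.get(1));
    }
}
```

Keeping empty fields matters because a plain `split("\\|")` would silently drop trailing empty columns and shift nothing into their place, which makes positional access unreliable.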
If you encounter errors such as:
bm_utils.c:71:10: fatal error: 'malloc.h' file not found
varsub.c:44:10: fatal error: 'malloc.h' file not found
Solution: Open bm_utils.c and varsub.c, locate the include directive #include <malloc.h>, and change it to #include <sys/malloc.h>.
To run this project, you need to have Maven and MySQL installed on your system.
- tpc_query: Contains the source code for the TPC-H queries.
- learning: Contains various materials and resources collected during the semester.
- Navigate to the tpc_query directory.
- Configure the environment using pom.xml.
- Run Main.java to simulate TPC-H Query 7.
You can choose between querying directly in MySQL or testing the AJU algorithm implemented using Flink.
    // "MySQL" routes the stream to MySQLSink; any other value routes it to MemorySink.
    String choice = "Memory";
    if (choice.equals("MySQL")) {
        MySQLSink mySQLSink = new MySQLSink();
        dataSource.addSink(mySQLSink);
    } else {
        MemorySink memorySink = new MemorySink();
        dataSource.addSink(memorySink);
    }
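Note that the string comparison above falls back to the in-memory sink for any value other than the exact string "MySQL", including a typo such as "mysql". One possible alternative, sketched here with hypothetical names (`SinkChoiceDemo`, `SinkKind`) that are not part of this repository, is to parse the choice into an enum so that unknown values are rejected. The print statements stand in for attaching the real sinks.

```java
import java.util.Locale;

public class SinkChoiceDemo {
    /** Hypothetical enum naming the two output targets described above. */
    enum SinkKind { MYSQL, MEMORY }

    /** Parses the choice case-insensitively; throws IllegalArgumentException if unknown. */
    static SinkKind parse(String choice) {
        return SinkKind.valueOf(choice.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        SinkKind kind = parse("Memory");
        switch (kind) {
            case MYSQL:
                // Placeholder: the real code would attach the MySQL sink here.
                System.out.println("would attach the MySQL sink");
                break;
            case MEMORY:
                // Placeholder: the real code would attach the in-memory sink here.
                System.out.println("would attach the in-memory sink");
                break;
        }
    }
}
```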
Currently, Q7 and Q5 are supported. To try other queries, you can add them in the /tpc_query/src/main/java/tpc_query/Query directory following the existing format.
This section provides an overview of the four main algorithms implemented in this project. Each algorithm has an associated image to illustrate the process.
The insert algorithm handles the insertion of new tuples into the database while maintaining the acyclic foreign-key join structure.
The insert-update algorithm handles the insertion and updating of tuples together. It ensures that the database remains consistent and acyclic.
The delete algorithm manages the deletion of tuples from the database, ensuring that the foreign-key constraints and the acyclic nature of the joins are preserved.
The delete-update algorithm handles the deletion and updating of tuples in tandem, maintaining the integrity and acyclic structure of the foreign-key joins.
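To make the idea of maintaining a foreign-key join under insertions and deletions more concrete, here is a deliberately simplified sketch. It is not the algorithm from the papers listed below and not the code in this repository: it covers only two relations (a parent and a child that references it), ignores the update variants, and assumes child ids are unique. All names (`FkJoinSketch`, `insertParent`, and so on) are hypothetical. A child tuple is part of the join result only while the parent it references is present.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FkJoinSketch {
    // Primary keys of the parent tuples currently present.
    private final Set<Integer> parentKeys = new HashSet<>();
    // Child id -> foreign key it references (child ids assumed unique).
    private final Map<Integer, Integer> childToFk = new HashMap<>();
    // Foreign key -> ids of all children referencing it, joined or not.
    private final Map<Integer, Set<Integer>> childrenByFk = new HashMap<>();
    // Current join result: ids of children whose parent is present.
    private final Set<Integer> joined = new HashSet<>();

    void insertParent(int key) {
        parentKeys.add(key);
        // Children that arrived before their parent start joining now.
        joined.addAll(childrenByFk.getOrDefault(key, Collections.emptySet()));
    }

    void insertChild(int id, int fk) {
        childToFk.put(id, fk);
        childrenByFk.computeIfAbsent(fk, k -> new HashSet<>()).add(id);
        if (parentKeys.contains(fk)) {
            joined.add(id);
        }
    }

    void deleteParent(int key) {
        if (parentKeys.remove(key)) {
            // The children stay stored but no longer have a join partner.
            joined.removeAll(childrenByFk.getOrDefault(key, Collections.emptySet()));
        }
    }

    void deleteChild(int id) {
        Integer fk = childToFk.remove(id);
        if (fk == null) {
            return; // unknown child: nothing to do
        }
        Set<Integer> siblings = childrenByFk.get(fk);
        siblings.remove(id);
        if (siblings.isEmpty()) {
            childrenByFk.remove(fk);
        }
        joined.remove(id);
    }

    /** Returns a snapshot of the current join result. */
    Set<Integer> result() {
        return new HashSet<>(joined);
    }

    public static void main(String[] args) {
        FkJoinSketch join = new FkJoinSketch();
        join.insertChild(100, 1);   // parent 1 is missing, so 100 does not join yet
        join.insertParent(1);       // 100 joins once its parent arrives
        join.insertChild(101, 1);   // joins immediately
        join.deleteChild(100);
        System.out.println(join.result());
        join.deleteParent(1);       // the remaining child loses its join partner
        System.out.println(join.result());
    }
}
```

Each operation touches only the tuples that reference the affected key, so the result is adjusted incrementally rather than recomputed from scratch.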
- Maintaining Acyclic Foreign-Key Joins under Updates
- Cquirrel: Continuous Query Processing over Acyclic Relational Schemas
- Flink: Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.
Feel free to explore the repository and use the provided resources. If you have any questions or need further assistance, please contact me.
Thanks go to these wonderful people (emoji key):
Michael 💻
This project follows the all-contributors specification. Contributions of any kind welcome!