Dynamic Load Balancing for a hp-adaptive Discontinuous Galerkin Wave Equation Solver via Space-Filling Curve and Advanced Data Structure
We combine a high-order method, the discontinuous Galerkin spectral element method (DG-SEM), with parallel adaptive mesh refinement and coarsening (AMR) techniques, and apply it to a two-dimensional wave equation solver.
Advanced data structures and dynamic load balancing are applied to the solver to achieve efficient data management and high-level parallelism.
```bash
mkdir build && cd build
cmake ..
make
mpirun -np 4 main
```
You can also use parallel make, e.g., `make -j 16`, which compiles the code with 16 threads.
`mpirun -np 4 main` executes the program with 4 MPI processes; you can change the number of processes as needed.
Detailed documentation can be found here.
Source code explanation: Source code documentation
The basic model of wave propagation is the wave equation:
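$$\frac{\partial^2 p}{\partial t^2} = c^2\left(\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2}\right)$$

(The standard two-dimensional second-order form is shown here; the solver itself may work with an equivalent formulation.)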
The variable $p$ represents the acoustic pressure and $c$ is the sound speed.
Two types of refinements are implemented in this work: h-refinement and p-refinement.
- h-refinement: subdivide an element into child elements.
- p-refinement: raise the polynomial order inside the targeted element.

Instead of using a conventional tree data structure, a hash table data structure is employed.
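As a rough illustration of this idea (the key layout, names, and hash function below are assumptions for the sketch, not the project's actual data structure), each element can be addressed by a key built from its refinement level and integer coordinates and stored in a hash map:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Illustrative sketch only: elements keyed by (level, i, j) in a hash map,
// rather than stored in an explicit refinement tree.
struct ElementKey {
    int level;   // h-refinement depth
    int i, j;    // integer element coordinates on that level
    bool operator==(const ElementKey& o) const {
        return level == o.level && i == o.i && j == o.j;
    }
};

struct ElementKeyHash {
    std::size_t operator()(const ElementKey& k) const {
        // Pack the fields into one 64-bit word and mix them; any decent
        // integer hash works here.
        std::uint64_t v = (std::uint64_t(std::uint32_t(k.level)) << 48) ^
                          (std::uint64_t(std::uint32_t(k.i)) << 24) ^
                           std::uint64_t(std::uint32_t(k.j));
        v ^= v >> 33; v *= 0xff51afd7ed558ccdULL; v ^= v >> 33;
        return static_cast<std::size_t>(v);
    }
};

struct Element {
    int poly_order = 1;   // p-refinement: polynomial order inside the element
    // ... solution coefficients, geometry, neighbour info, etc.
};

using Mesh = std::unordered_map<ElementKey, Element, ElementKeyHash>;

// h-refinement sketch: replace a parent entry by its four children.
inline void refine(Mesh& mesh, const ElementKey& parent) {
    const Element data = mesh.at(parent);   // children inherit the parent data
    mesh.erase(parent);
    for (int di = 0; di < 2; ++di)
        for (int dj = 0; dj < 2; ++dj)
            mesh[{parent.level + 1, 2 * parent.i + di, 2 * parent.j + dj}] = data;
}
```

With such keys, neighbours and children can be looked up in expected constant time without walking a tree.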
The adaptivity of the program introduces load imbalance among the processors, causing the program to slow down. In order to achieve high-level parallelism, a dynamic load balancing strategy is applied to balance the workload.
The workload redistribution problem is an optimization problem with two main goals: the workload should be distributed evenly among the processors with small memory overhead, and the interface boundaries between partitions should be as small as possible. This optimization problem has been proven to be NP-hard.
Two families of partitioning approaches are commonly considered: graph-based partitioning and space-filling curve (SFC) based partitioning.

Graph-based partitioning
pros:
- Well-studied.
- Multiple libraries.
cons:
- Reached scalability limits (requires "global" graph knowledge).
- High memory consumption.
Space-filling curve (SFC) partitioning
pros:
- Simplifies a multi-dimensional partitioning problem into a one-dimensional one (see the sketch after this list).
- SFCs have good locality and can be generated fast.
- Low memory usage.
- Favours distributed systems.
cons:
- The partitioning boundaries are decided by the trajectory of the space-filling curve.
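To illustrate the one-dimensional reduction, the following serial sketch (illustrative only; the names, weights, and greedy cut rule are assumptions, not the project's actual algorithm) cuts a sequence of SFC-ordered element weights into contiguous chunks of roughly equal total weight:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch only: cut a sequence of SFC-ordered element weights into
// `nparts` contiguous chunks of roughly equal total weight, returning the
// partition (rank) assigned to each element.
std::vector<int> partition_along_curve(const std::vector<double>& weights,
                                       int nparts) {
    double total = 0.0;
    for (double w : weights) total += w;
    const double target = total / nparts;   // ideal weight per partition

    std::vector<int> owner(weights.size(), 0);
    double running = 0.0;
    int rank = 0;
    for (std::size_t e = 0; e < weights.size(); ++e) {
        // Open a new partition once the current one has reached its target,
        // keeping the last rank available so every element gets an owner.
        if (running >= target && rank < nparts - 1) {
            running = 0.0;
            ++rank;
        }
        owner[e] = rank;
        running += weights[e];
    }
    return owner;
}
```

In a distributed setting, the same decision can be made from a prefix sum of the weights across ranks, which is one reason the SFC approach keeps memory usage low.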
The Hilbert curve is chosen for this work. The figure below shows the first three levels of the Hilbert curve.
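For reference, the Hilbert index of a cell can be computed with the classic rotate-and-flip bit manipulation; the sketch below is generic and illustrative, and its function name and interface are not the project's API.

```cpp
#include <cstdint>

// Illustrative sketch (not the project's API): map a cell (x, y) on a
// (2^order) x (2^order) grid to its index along the Hilbert curve.
std::uint64_t hilbert_index(int order, std::uint32_t x, std::uint32_t y) {
    const std::uint32_t n = 1u << order;   // grid side length
    std::uint64_t d = 0;
    for (std::uint32_t s = n / 2; s > 0; s /= 2) {
        const std::uint32_t rx = (x & s) ? 1u : 0u;
        const std::uint32_t ry = (y & s) ? 1u : 0u;
        d += std::uint64_t(s) * s * ((3u * rx) ^ ry);
        // Rotate/reflect the quadrant so the base pattern repeats correctly.
        if (ry == 0) {
            if (rx == 1) {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            const std::uint32_t t = x;   // swap x and y
            x = y;
            y = t;
        }
    }
    return d;
}
```

Sorting the elements by this index produces the one-dimensional ordering that the partitioning step above operates on.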
The tests were run on Cedar with processor counts ranging from 32 to 2048; we observed a maximum speedup of around 8.