Performance Analysis and Optimization Report

This document describes the performance analysis of the Jacobi test for the 2D Poisson equation solver, run with the default parameters specified in test.csv. It covers initial benchmarking, profiling with gprof, and the optimization strategies subsequently applied to improve performance.

As a reference point, the memory bandwidth of the hardware used for this analysis was first measured with the STREAM benchmark:

Function	Best Rate (MB/s)	Avg time (s)	Min time (s)	Max time (s)
Copy	23498.3	0.007372	0.006809	0.009033
Scale	23286.3	0.007183	0.006871	0.007961
Add	27881.5	0.009217	0.008608	0.011332
Triad	28027.4	0.008923	0.008563	0.009630
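
To make the rates above easier to interpret, the following sketch shows how a STREAM-style rate follows from the array size and the measured time. The array length of 10,000,000 elements is an assumption (it is STREAM's default and is not stated in this report), and `streamRateMBs` is a hypothetical helper, not part of STREAM or of the solver.

```cpp
#include <cassert>
#include <cstddef>

// Bytes moved per array element by each STREAM kernel, using 8-byte doubles:
// Copy and Scale touch two arrays, Add and Triad touch three.
constexpr double kBytesCopyScale = 2.0 * sizeof(double);
constexpr double kBytesAddTriad = 3.0 * sizeof(double);

// Rate in MB/s (1 MB = 1e6 bytes) for n elements processed in `seconds`.
double streamRateMBs(double bytesPerElement, std::size_t n, double seconds) {
    assert(seconds > 0.0);
    return bytesPerElement * static_cast<double>(n) / seconds / 1.0e6;
}
```

Under that assumption, `streamRateMBs(kBytesAddTriad, 10000000, 0.008563)` gives about 28,027.6 MB/s, which agrees with the Triad row up to the rounding of the reported minimum time.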

Profiling (no optimization)

Profiling the application with gprof highlighted several hotspots:

% Time	Cumulative Seconds	Self Seconds	Calls	Self s/call	Total s/call	Function
11.68%	0.35	0.35	374,740,052	0.00	0.00	`std::vector::at`
11.51%	0.70	0.35	549,685,071	0.00	0.00	`std::vector::_M_range_check`
9.21%	0.98	0.28	549,695,069	0.00	0.00	`std::vector::size`
8.72%	1.25	0.27	549,685,071	0.00	0.00	`std::vector::at`
6.25%	1.44	0.19	374,740,052	0.00	0.00	`std::vector::_M_range_check`
6.25%	1.63	0.19	1	0.19	3.01	`testJacobi`
5.10%	1.78	0.15	599,710,067	0.00	0.00	`std::vector::size`
4.28%	1.92	0.13	274,800,040	0.00	0.00	`Mesh::getNode`
3.95%	2.04	0.12	549,685,071	0.00	0.00	`std::vector::operator[]`
3.95%	2.15	0.12	1	0.12	0.86	`Boundaries::setInnerNodes`
3.62%	2.27	0.11	2	0.06	0.18	`std::vector::_M_fill_initialize`
3.29%	2.37	0.10	374,740,052	0.00	0.00	`std::vector::operator[]`
3.29%	2.46	0.10	1	0.10	1.15	`jacobi`
2.47%	2.54	0.07	1	0.07	0.35	`Mesh::swapGrids`
2.30%	2.61	0.07	10,000	0.00	0.00	`Node* std::__uninitialized_copy`
2.14%	2.67	0.07	174,945,017	0.00	0.00	`Mesh::numCols`

Total Runtime: 3.52s

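The profile points to three main problems:
- Frequent calls to the bounds-checked std::vector::at(unsigned long), together with the std::vector::_M_range_check() calls that accompany them.
- Frequent calls to Mesh::numCols(), which lead to excessive use of std::vector::size().
- A poorly implemented Mesh::swapGrids().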

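To profile the application, build it with gprof instrumentation enabled (the -pg flag when using GCC), run the test so that gmon.out is written, and then run gprof: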

./src/PDEsolver --test
gprof ./src/PDEsolver gmon.out > analysis.txt

The profiling output is available in analysis.txt within the build directory.

Optimization Strategies

Optimizing Frequent `std::vector::size()` Calls

Issue: Repeated calls to std::vector::size() within loops.
Solution: Cached the size of vectors before entering loops to avoid repeated function calls.

After optimization:
- Total Runtime: 2.24s
- Runtime reduction: 36% (from 3.52s to 2.24s)
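
The idea of caching sizes can be sketched as follows. This is a minimal illustration, not the solver's actual code: `Grid` and `sumGrid` are hypothetical names, and the real Mesh class is not shown in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the solver's grid storage.
using Grid = std::vector<std::vector<double>>;

// Sums all entries, reading each size once instead of on every loop test.
double sumGrid(const Grid& grid) {
    const std::size_t rows = grid.size();                // cached once
    const std::size_t cols = rows ? grid[0].size() : 0;  // cached once
    double total = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        assert(grid[i].size() == cols);  // a rectangular grid is assumed
        for (std::size_t j = 0; j < cols; ++j) {
            total += grid[i][j];
        }
    }
    return total;
}
```

In an unoptimized build, each `size()` call in a loop condition is a real function call, which is why hoisting it out of the loop shows up so clearly in the profile.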

Reducing `std::vector::at()` Calls

Issue: A significant amount of time was spent on bounds-checked access via std::vector::at().
Solution: Replaced std::vector::at() with operator[] for direct access, after ensuring index bounds are checked ahead of time.

After optimization:
- Total Runtime: 1.28s
- Runtime reduction: 27% of the original 3.52s (from 2.24s to 1.28s)
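
A minimal sketch of this pattern, assuming a simple one-dimensional neighbour average rather than the solver's real update: the sizes are validated once, and the loop then uses unchecked operator[]. The function `smoothInterior` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Averages the two neighbours of every interior entry of `in` into `out`.
// Sizes are validated once up front, so the loop can use unchecked operator[].
void smoothInterior(const std::vector<double>& in, std::vector<double>& out) {
    if (in.size() < 3 || out.size() != in.size()) {
        throw std::invalid_argument("smoothInterior: size mismatch or too short");
    }
    const std::size_t last = in.size() - 1;
    for (std::size_t i = 1; i < last; ++i) {
        assert(i + 1 < in.size());  // debug-only check, removed when NDEBUG is set
        out[i] = 0.5 * (in[i - 1] + in[i + 1]);
    }
}
```

The single up-front check replaces one range check per access, and the `assert` keeps a safety net in debug builds without costing anything in release builds.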

Optimizing `Mesh::swapGrids()`

Issue: The function swapped the grid entries one by one, which is an inefficient use of std::swap.
Solution: Used std::swap more efficiently, so that the grids are no longer swapped one entry at a time.

After optimization:
- Total Runtime: 1.12s
- Runtime reduction: 5% of the original 3.52s (from 1.28s to 1.28s minus 0.16s, i.e. 1.12s)
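
One common way to avoid an entry-by-entry swap is to swap the containers themselves. The sketch below contrasts the two approaches. It assumes the mesh holds its two grids as std::vector members, which this report does not confirm, and `TwoGridMesh` is a hypothetical type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Hypothetical two-buffer mesh; the real Mesh class is not shown.
struct TwoGridMesh {
    Grid current;
    Grid next;

    // Entry-by-entry version: touches every value, O(rows * cols).
    void swapGridsElementwise() {
        assert(current.size() == next.size());
        for (std::size_t i = 0; i < current.size(); ++i) {
            for (std::size_t j = 0; j < current[i].size(); ++j) {
                std::swap(current[i][j], next[i][j]);
            }
        }
    }

    // Whole-container version: exchanges the vectors' internal buffers in O(1).
    void swapGrids() { std::swap(current, next); }
};
```

Both versions leave the grids in the same state, but the whole-container swap does not copy or move any grid entries.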

Profiling after optimization

% Time	Cumulative Seconds	Self Seconds	Calls	Self s/call	Total s/call	Function
19.57%	0.23s	0.23s	274,800,040	0.00s	0.00s	`Mesh::getNode`
13.91%	0.39s	0.16s	324,780,050	0.00s	0.00s	`std::vector::operator[]`
11.74%	0.52s	0.14s	324,780,044	0.00s	0.00s	`std::vector::operator[]`
10.00%	0.64s	0.12s	50,000,000	0.00s	0.00s	`std::_Construct`
8.70%	0.73s	0.10s	1	0.10s	0.35s	`Boundaries::setInnerNodes`
6.09%	0.81s	0.07s	10,000	0.00s	0.00s	`std::__uninitialized_copy`
5.65%	0.87s	0.07s	2	0.03s	0.17s	`std::vector::vector`
4.78%	0.93s	0.06s	25,000,000	0.00s	0.00s	`Mesh::setNode`
4.35%	0.97s	0.05s	50,010,000	0.00s	0.00s	`__gnu_cxx::operator!=`
4.35%	1.02s	0.05s	1	0.05s	0.29s	`jacobi`
2.61%	1.05s	0.03s	1	0.03s	0.07s	`printToCSV`
2.61%	1.08s	0.03s	1	0.03s	0.38s	`Mesh::Mesh`
1.74%	1.10s	0.02s	100,020,004	0.00s	0.00s	`__gnu_cxx::__normal_iterator::base`
1.74%	1.12s	0.02s	50,000,000	0.00s	0.00s	`__gnu_cxx::__normal_iterator::operator++`

Total Runtime: 1.12s
Total Runtime Reduction: 68% (from 3.52s to 1.12s)
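
The percentages in this report are consistent with each step's saving being expressed as a share of the original 3.52s runtime. The small helper below (a hypothetical function, shown only to make the arithmetic explicit) reproduces them.

```cpp
#include <cassert>

// Share of the baseline runtime removed by going from `before` to `after`,
// in percent. All times are in seconds.
double reductionPercent(double baseline, double before, double after) {
    assert(baseline > 0.0);
    return 100.0 * (before - after) / baseline;
}
```

With a baseline of 3.52s, the three steps give roughly 36.4% (3.52s to 2.24s), 27.3% (2.24s to 1.28s), and 4.5% (1.28s to 1.12s), which add up to the overall reduction of roughly 68.2%.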

Compiler optimizations

Optimization Level: -O2
Impact on Runtime: Reduced runtime from 1.12 seconds to 1.02 seconds.
Runtime Reduction: 3% of the original 3.52s (about 9% relative to the 1.12s starting point)

The -O2 optimization level enables loop optimizations, function inlining, and other improvements. Although -O2 did reduce the runtime, the gain was smaller than expected. Further improvements would likely require a more cache-friendly way of reading data from the grid, or loops written so that the compiler can unroll them more easily.
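
One possible shape for a more cache-friendly grid is a single contiguous, row-major buffer traversed with the column index in the inner loop. The sketch below is an assumption about what such a layout could look like, not the solver's implementation: `FlatGrid` and `jacobiSweep` are hypothetical, and the source term of the Poisson equation is taken to be zero to keep the update short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat grid: one contiguous buffer in row-major order.
struct FlatGrid {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    FlatGrid(std::size_t r, std::size_t c, double value = 0.0)
        : rows(r), cols(c), data(r * c, value) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// One Jacobi sweep over the interior points, assuming a zero source term.
// The inner loop walks along a row, so consecutive accesses are adjacent in memory.
void jacobiSweep(const FlatGrid& in, FlatGrid& out) {
    assert(in.rows == out.rows && in.cols == out.cols);
    for (std::size_t i = 1; i + 1 < in.rows; ++i) {
        for (std::size_t j = 1; j + 1 < in.cols; ++j) {
            out(i, j) = 0.25 * (in(i - 1, j) + in(i + 1, j) +
                                in(i, j - 1) + in(i, j + 1));
        }
    }
}
```

Small inline accessors over a flat buffer are also easy for the compiler to inline at -O2, which may reduce the accessor overhead that still dominates the optimized profile.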
