Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace std::map by std::unordered_map #43

Merged
merged 1 commit into from
Jan 22, 2025

Conversation

ordinary-slim
Copy link
Contributor

I was profiling my application and saw that the DofMapRestriction constructor spent a considerable amount of time with std::map operations. I switched _restricted_to_unrestricted and _unrestricted_to_restricted into std::unordered_maps to speed-up the constructor. I'll put some numbers here tomorrow.

@francesco-ballarin
Copy link
Member

Thanks! Looking forward to seeing the numbers, but even before looking at the numbers it looks like you raise a good point.

I see that clang-format is failing: just have a look at the command which is run on CI and run it locally without the --dry-run option to have it automatically reformat the code. If all still looks good please squash into the current commit.

@ordinary-slim
Copy link
Contributor Author

ordinary-slim commented Jan 22, 2025

Using this code snippet

from dolfinx import fem, mesh
from mpi4py import MPI
import sys
sys.path.append("..")
import multiphenicsx.fem
from line_profiler import LineProfiler
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def main():
    num_els_dim = 60
    domain = mesh.create_unit_cube(comm, num_els_dim, num_els_dim, num_els_dim, mesh.CellType.tetrahedron)
    V = fem.functionspace(domain, ("Lagrange", 1))
    all_dofs = np.arange(V.dofmap.index_map.size_local)
    restriction = multiphenicsx.fem.DofMapRestriction(V.dofmap, all_dofs)

if __name__=="__main__":
    lp = LineProfiler()
    lp_wrapper = lp(main)
    lp_wrapper()
    with open(f"profiling_rank{rank}.txt", 'w') as pf:
        lp.print_stats(stream=pf)

and release builds with O3 flag, these are the results before and after:


    [s] Time % Time  Line Contents
  =================================
                      def main():
  0.000000904    0.0      num_els_dim = 60
  2.482508459   45.4      domain = mesh.create_unit_cube(comm, num_els_dim, num_els_dim, num_els_dim, mesh.CellType.tetrahedron)
  0.129820577    2.4      V = fem.functionspace(domain, ("Lagrange", 1))
  0.000237959    0.0      all_dofs = np.arange(V.dofmap.index_map.size_local)
  2.849659836   52.2      restriction = multiphenicsx.fem.DofMapRestriction(V.dofmap, all_dofs)
  

  ================================
                     def main():
  0.000001131    0.0      num_els_dim = 60
  2.521425897   79.8      domain = mesh.create_unit_cube(comm, num_els_dim, num_els_dim, num_els_dim, mesh.CellType.tetrahedron)
  0.129454192    4.1      V = fem.functionspace(domain, ("Lagrange", 1))
  0.000206605    0.0      all_dofs = np.arange(V.dofmap.index_map.size_local)
  0.507329679   16.1      restriction = multiphenicsx.fem.DofMapRestriction(V.dofmap, all_dofs)

so in this particular case it goes down from 2.8s to 0.5s.

Using a very similar example with only the left dofs this time:

def main():
    num_els_dim = 10
    domain = mesh.create_unit_cube(comm, num_els_dim, num_els_dim, num_els_dim, mesh.CellType.tetrahedron)
    V = fem.functionspace(domain, ("Lagrange", 1))
    left_dofs = fem.locate_dofs_geometrical(V, lambda x : x[0] <= 0.5)
    restriction = DofMapRestriction(V.dofmap, left_dofs)

and debug builds of both dolfinx and multiphenicsx, valgrind tells me that:

  • Before, it spent about 3% of the time in the constructor on create_sub_index_map and about 91-92% on _compute_cell_dofs. In _compute_cell_dofs, it spends 70% of its time on map operations. Notably, it spends about 35% on map[idx] or map.at(idx) and about 31-32% on map.count(idx).
  • After, it spends about 24-25% of the time in the constructor on create_sub_index_map and 71% on _compute_cell_dofs. In _compute_cell_dofs, it now spends about 50% of its time on unordered_map operations. unordered_map[idx] and unordered_map.at(idx) take up 25% and unordered_map.count(idx) takes up 15%.

I attach the valgrind outputs that you can open with kcachegrind.
BEFORE.out.txt
AFTER.out.txt

@francesco-ballarin francesco-ballarin merged commit 87c4386 into multiphenics:main Jan 22, 2025
3 checks passed
@francesco-ballarin
Copy link
Member

Thanks for the work on profiling this! This is a nice improvement, and I merged the commit to main.

@ordinary-slim ordinary-slim deleted the unordered_map branch January 29, 2025 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants