
Implementation of the OTF2 callbacks is at fault for uninitialized variable #186

Open
bertwesarg opened this issue Jan 15, 2025 · 1 comment

Comments

@bertwesarg

Hi,

I came across this problem by following spack/spack#2719, STEllAR-GROUP/hpx#5239, and finally #152. It was never reported upstream, where we could have helped you resolve it: the root cause is not in OTF2 but in the implementations of the collective callbacks that APEX passes to OTF2. In particular, the broadcast callback is a no-op, so the status is never distributed to all participating processes of the application. It should therefore be no surprise that the non-zero ranks access an uninitialized variable (btw, initializing an error variable to success should be frowned upon anyway).
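The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `sim_world`, `bcast_noop`, `bcast_copy`, and `ranks_agreeing` are hypothetical names invented for the illustration, the ranks are simulated as array slots inside one process, and none of this is the OTF2 callback interface or the APEX code. A sentinel value stands in for the uninitialized status on the non-root ranks.

```c
#include <assert.h>

enum { NUM_RANKS = 4, ROOT = 0, UNSET = -12345 };

/* Hypothetical model: one status slot per simulated rank. */
typedef struct {
    int status[NUM_RANKS];
} sim_world;

/* Broadcast that reports success but moves no data (the problematic case). */
static int bcast_noop(sim_world *w, int root)
{
    (void)w;
    (void)root;
    return 0;
}

/* Broadcast that copies the root's value into every rank's slot. */
static int bcast_copy(sim_world *w, int root)
{
    assert(root >= 0 && root < NUM_RANKS);
    for (int r = 0; r < NUM_RANKS; ++r) {
        w->status[r] = w->status[root];
    }
    return 0;
}

/* Only the root knows the status; after the broadcast, count how many
 * ranks hold the same value as the root. */
static int ranks_agreeing(int (*bcast)(sim_world *, int), int root_status)
{
    sim_world w;
    for (int r = 0; r < NUM_RANKS; ++r) {
        w.status[r] = UNSET; /* stands in for an uninitialized variable */
    }
    w.status[ROOT] = root_status;

    bcast(&w, ROOT);

    int agreeing = 0;
    for (int r = 0; r < NUM_RANKS; ++r) {
        if (w.status[r] == w.status[ROOT]) {
            ++agreeing;
        }
    }
    return agreeing;
}
```

With `bcast_noop`, only the root ends up with a meaningful status, while `bcast_copy` leaves all simulated ranks in agreement. Initializing the status to "success" on every rank would hide the difference between the two rather than fix it.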

So the proper fix is to implement the collective callbacks correctly. Luckily, if MPI is available, the problem is already resolved. But we encourage you to carry the correction of this misunderstanding over to the other projects, i.e., at least remove the patch in spack.

Thanks.

@khuck
Collaborator

khuck commented Jan 15, 2025

It took me a while to figure out which issue you were referring to, since this problem is 4 years old and the first link you included points to an unrelated spack issue rather than to the spack PR... :)

At any rate, you are correct - the collectives in APEX aren't fully implemented for non-MPI cases. Unfortunately, I don't have the time or funding to address that problem.

As for the patch in spack, I didn't add it but I can recommend removing it...
