I stumbled across these issues starting in spack/spack#2719, STEllAR-GROUP/hpx#5239 and finally #152. The issue was never reported upstream, although we could have helped you resolve it: the root cause is not in OTF2 but in the implementations of the collective callbacks that APEX passes to OTF2. In particular, the broadcast callback is a no-op, so the status is never distributed to all participating processes of the application. Hence it should be no surprise that the non-zero ranks access an uninitialized variable (btw, initializing an error variable to success should be frowned upon anyway).
So the proper fix is to implement the collective callbacks correctly. Luckily, if MPI is available, the problem is already resolved. But we encourage you to percolate the fallout of this misunderstanding to the other projects, i.e., at least remove the patch in spack.
Thanks.
It took me a while to figure out which issue you were referring to, since this problem is 4 years old and the first link you included points not to the spack PR but to an unrelated spack issue... :)
At any rate, you are correct - the collectives in APEX aren't fully implemented for non-MPI cases. Unfortunately, I don't have the time or funding to address that problem.
As for the patch in spack, I didn't add it but I can recommend removing it...