Segfault in certain HPX tests with 2.2.0 #126
Interesting, looks like a recursive set of destructors. The task_wrapper destructor is triggering the destruction of other task_wrappers… it’s possible there’s a cycle of pointers here that is causing this. I’ll take a look.
Thanks -
Kevin
On Aug 18, 2020, at 7:45 AM, Mikael Simberg wrote:
In trying to update to 2.2.0 on STEllAR-GROUP/hpx#4895, we have one remaining set of tests that still fails. The partitioned_vector_{ex,in}clusive scan tests all fail with a segfault on the main locality. The stack trace looks like this, with most of the middle cut out (I'm guessing that's a normal-ish length for an APEX stack trace? I can provide the full stack trace if that would be helpful):
#0 std::__1::vector<apex::profiler*, std::__1::allocator<apex::profiler*> >::__annotate_delete (this=0x2aaadcdc4d28) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/vector:873
#1 0x00002aaaaecacda9 in std::__1::vector<apex::profiler*, std::__1::allocator<apex::profiler*> >::~vector (this=0x2aaadcdc4d28) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/vector:552
#2 0x00002aaaaecacd73 in apex::task_wrapper::~task_wrapper (this=0x2aaadcdc4cf8) at ../../apex/src/apex/task_wrapper.hpp:29
#3 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadcdc4ce0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#4 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadcdc4ce0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#5 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadcdc4ce0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#6 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae08178d8) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#7 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaae08178b8) at ../../apex/src/apex/task_wrapper.hpp:29
#8 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaae08178a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#9 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaae08178a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#10 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaae08178a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#11 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad8118a88) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#12 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad8118a68) at ../../apex/src/apex/task_wrapper.hpp:29
#13 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad8118a50) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#14 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad8118a50) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#15 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad8118a50) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#16 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaadc02db68) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#17 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaadc02db48) at ../../apex/src/apex/task_wrapper.hpp:29
#18 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadc02db30) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#19 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadc02db30) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#20 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadc02db30) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#21 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad8154128) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#22 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad8154108) at ../../apex/src/apex/task_wrapper.hpp:29
#23 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad81540f0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#24 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad81540f0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#25 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad81540f0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#26 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad4853508) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#27 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad48534e8) at ../../apex/src/apex/task_wrapper.hpp:29
#28 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad48534d0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#29 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad48534d0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#30 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad48534d0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#31 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaadc83cb18) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#32 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaadc83caf8) at ../../apex/src/apex/task_wrapper.hpp:29
#33 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadc83cae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#34 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadc83cae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#35 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadc83cae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#36 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae0056e48) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#37 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaae0056e28) at ../../apex/src/apex/task_wrapper.hpp:29
#38 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaae0056e10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
[snip]
#3660 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaae1641780) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3661 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad4862d48) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3662 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad4862d28) at ../../apex/src/apex/task_wrapper.hpp:29
#3663 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad4862d10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#3664 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad4862d10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3665 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad4862d10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3666 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae005af08) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3667 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaae005aee8) at ../../apex/src/apex/task_wrapper.hpp:29
#3668 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaae005aed0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#3669 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaae005aed0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3670 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaae005aed0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3671 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaadcde5f98) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3672 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaadcde5f78) at ../../apex/src/apex/task_wrapper.hpp:29
#3673 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadcde5f60) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#3674 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadcde5f60) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3675 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadcde5f60) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3676 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae16616a8) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3677 0x00002aaaaecb4f79 in apex::profiler::~profiler (this=0x2aaae16616a0) at ../../apex/src/apex/profiler.hpp:170
#3678 0x00002aaaaecb4efb in std::__1::default_delete<apex::profiler>::operator() (this=0x2aaad81989a8, __ptr=0x2aaae16616a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:2339
#3679 0x00002aaaaecb4ca0 in std::__1::__shared_ptr_pointer<apex::profiler*, std::__1::default_delete<apex::profiler>, std::__1::allocator<apex::profiler> >::__on_zero_shared (this=0x2aaad8198990) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3640
#3680 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad8198990) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3681 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad8198990) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3682 0x00002aaaaec9af9c in std::__1::shared_ptr<apex::profiler>::~shared_ptr (this=0x2aaae916dae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3683 0x00002aaaaece5b53 in std::__1::shared_ptr<apex::profiler>::operator= (this=0x2aaae916dce8, __r=...) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4555
#3684 0x00002aaaaed0f2d8 in hpx::concurrency::ConcurrentQueue<std::__1::shared_ptr<apex::profiler>, hpx::concurrency::ConcurrentQueueDefaultTraits>::ImplicitProducer::dequeue<std::__1::shared_ptr<apex::profiler> > (this=0x2aaae000bbd0, element=...) at ../../libs/concurrency/include/hpx/concurrency/concurrentqueue.hpp:2537
#3685 0x00002aaaaed0ef5c in hpx::concurrency::ConcurrentQueue<std::__1::shared_ptr<apex::profiler>, hpx::concurrency::ConcurrentQueueDefaultTraits>::ProducerBase::dequeue<std::__1::shared_ptr<apex::profiler> > (this=0x2aaae000bbd0, element=...) at ../../libs/concurrency/include/hpx/concurrency/concurrentqueue.hpp:1667
#3686 0x00002aaaaecf9bd1 in hpx::concurrency::ConcurrentQueue<std::__1::shared_ptr<apex::profiler>, hpx::concurrency::ConcurrentQueueDefaultTraits>::try_dequeue<std::__1::shared_ptr<apex::profiler> > (this=0x2aaae0000e78, item=...) at ../../libs/concurrency/include/hpx/concurrency/concurrentqueue.hpp:1079
#3687 0x00002aaaaecf59a9 in apex::profiler_listener::process_profiles (this=0x2aaabc00a360) at ../../apex/src/apex/profiler_listener.cpp:1130
#3688 0x00002aaaaecf584c in apex::profiler_listener::process_profiles_wrapper () at ../../apex/src/apex/profiler_listener.cpp:1078
#3689 0x00002aaaaed05f67 in hpx::util::invoke<void (*&)()> ***@***.***: 0x2aaaaecf5810 <apex::profiler_listener::process_profiles_wrapper()>) at ../../libs/functional/include/hpx/functional/invoke.hpp:135
#3690 0x00002aaaaed05ef5 in hpx::util::detail::annotated_function<void (*)()>::operator()<>() (this=0x2aaad404f9f8) at ../../libs/threading_base/include/hpx/threading_base/annotated_function.hpp:142
#3691 0x00002aaaaed05e88 in hpx::threads::detail::thread_function_nullary<hpx::util::detail::annotated_function<void (*)()> >::operator() (this=0x2aaad404f9f8) at ../../libs/threading_base/include/hpx/threading_base/register_thread.hpp:78
#3692 0x00002aaaaed05e11 in hpx::util::detail::callable_vtable<std::__1::pair<hpx::threads::thread_state_enum, hpx::threads::thread_id> (hpx::threads::thread_state_ex_enum)>::_invoke<hpx::threads::detail::thread_function_nullary<hpx::util::detail::annotated_function<void (*)()> > >(void*, hpx::threads::thread_state_ex_enum&&) (f=0x2aaad404f9f8, ***@***.***: hpx::threads::wait_signaled) at ../../libs/functional/include/hpx/functional/detail/vtable/callable_vtable.hpp:93
#3693 0x00002aaaadf43ce0 in hpx::util::detail::basic_function<std::__1::pair<hpx::threads::thread_state_enum, hpx::threads::thread_id> (hpx::threads::thread_state_ex_enum), false, false>::operator()(hpx::threads::thread_state_ex_enum) const (this=0x2aaad404f9e8, vs=hpx::threads::wait_signaled) at ../../libs/functional/include/hpx/functional/detail/basic_function.hpp:228
#3694 hpx::threads::coroutines::detail::coroutine_impl::operator() (this=0x2aaad404f8b0) at ../../libs/coroutines/src/detail/coroutine_impl.cpp:74
#3695 0x00002aaaadf43825 in hpx::threads::coroutines::detail::lx::trampoline<hpx::threads::coroutines::detail::coroutine_impl> (fun=0x2aaad404f8b0) at ../../libs/coroutines/include/hpx/coroutines/detail/context_linux_x86.hpp:92
#3696 0x0000000000000000 in ?? ()
This is definitely a new failure caused by the new APEX version. Latest HPX master with 2.1.9 works fine.
@khuck does this ring any bells for you right away? Otherwise I can easily bisect APEX to see where this was introduced. Anything else that might help you?
This is with clang 8, C++17, Boost 1.69.0, hwloc 2.0.3, and PAPI 5.7.0.2 if that makes any difference.
CI logs: https://cdash.cscs.ch/buildSummary.php?buildid=123422.
--
Kevin Huck, PhD
Research Associate / Computer Scientist
OACISS - Oregon Advanced Computing Institute for Science and Society
University of Oregon
[email protected]
http://tau.uoregon.edu
http://oaciss.uoregon.edu
Note that I thought this was infinite recursion at first and that it ran out of stack at the end, but the first frame actually breaks the cycle. It could of course still be a cycle somewhere but it's not necessarily that.
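For readers following along, a small illustration of why a pointer cycle and a destructor cascade are different symptoms. This is a general C++ sketch, not APEX code: `item`, `destroyed`, and both functions are hypothetical names. Objects that own each other through `std::shared_ptr` in a cycle are never destroyed at all, so a cascade of destructors that actually runs is at least consistent with a long, non-cyclic ownership chain.

```cpp
#include <memory>

int destroyed = 0;  // counts destructor calls (illustration only)

// Hypothetical type; it is not apex::task_wrapper.
struct item {
    std::shared_ptr<item> strong;  // owning link
    std::weak_ptr<item> weak;      // non-owning link
    ~item() { ++destroyed; }
};

// Two objects owning each other: neither destructor runs when the
// local handles go away, because each keeps the other's count above zero.
int destroyed_with_strong_cycle() {
    destroyed = 0;
    std::weak_ptr<item> observer;
    {
        auto a = std::make_shared<item>();
        auto b = std::make_shared<item>();
        a->strong = b;
        b->strong = a;  // cycle: a -> b -> a
        observer = a;
    }
    int seen = destroyed;  // nothing has been destroyed at this point
    // Break the cycle by hand so this sketch does not leak.
    if (auto a = observer.lock()) {
        a->strong.reset();
    }
    return seen;
}

// Same shape, but the back edge is weak: both objects are destroyed
// as soon as the local handles go out of scope.
int destroyed_with_weak_back_edge() {
    destroyed = 0;
    {
        auto a = std::make_shared<item>();
        auto b = std::make_shared<item>();
        a->strong = b;
        b->weak = a;  // does not keep a alive
    }
    return destroyed;
}
```

Calling `destroyed_with_strong_cycle()` reports how many destructors ran while the cycle was still intact, and `destroyed_with_weak_back_edge()` reports how many ran once the back edge no longer owns its target.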
I am pretty sure I know what's happening - there is a bit of a rat's nest of dependencies between three classes, and I have known I need to clean them up. I discovered one of them was not getting set, and I added code to fix that. That is likely causing this cascade of destructors when all the pointers are chased.
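To make the cascade concrete: when each object holds a `std::shared_ptr` to the next one, releasing the head of a long chain destroys every element through nested destructor calls, several stack frames per element, which matches the repeating pattern in the trace above. One common way to keep the stack depth constant is to unlink the chain in a loop inside the destructor. The following is only a minimal single-threaded sketch of that technique, not the change that went into APEX; `node`, `live_nodes`, and `build_and_drop` are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

std::size_t live_nodes = 0;  // number of currently constructed nodes

// Hypothetical stand-in for a task record; it is not apex::task_wrapper.
struct node {
    std::shared_ptr<node> parent;  // owning link to the previous element

    node() { ++live_nodes; }

    ~node() {
        --live_nodes;
        // Walk the chain in a loop instead of letting each destructor
        // trigger the next one. Only elements we solely own are unlinked.
        // Note: use_count() is only a reliable test when no other thread
        // is touching these pointers (assumption of this sketch).
        std::shared_ptr<node> next = std::move(parent);
        while (next && next.use_count() == 1) {
            // Take the grandparent first, so the element released below
            // has an empty parent and its destructor returns immediately.
            std::shared_ptr<node> rest = std::move(next->parent);
            next = std::move(rest);  // destroys the old element, depth 1
        }
        // If the loop stopped at a shared element, dropping `next` here
        // just decrements its count.
    }
};

// Builds a chain of `depth` nodes, releases the head, and returns how
// many nodes are still alive afterwards.
std::size_t build_and_drop(std::size_t depth) {
    std::shared_ptr<node> head;
    for (std::size_t i = 0; i < depth; ++i) {
        auto n = std::make_shared<node>();
        n->parent = std::move(head);
        head = std::move(n);
    }
    head.reset();  // tears the whole chain down iteratively
    return live_nodes;
}
```

With the loop in place, a call such as `build_and_drop(200000)` releases the whole chain without the stack growing with the chain length. Without the loop, the same chain would be destroyed recursively, one nested set of frames per element.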
I can't reproduce it with my build environment (the tests all pass), so I am going to try clang 10 and the hwloc 2.2...
This happens in 2.1.9, too...
I pushed a change that appears to have fixed things on my end...
Thanks @khuck for looking into this! Which commit/branch/tag would that be? We can try that out first before you update 2.2.0 or create a new tag.
I forced an update to the tag. It’s also the head of master. Thanks!
Thanks a lot @khuck!
Does that mean it fixed the crash on your end?
Yes we merged the PR updating the tag ;) thanks!