
Segfault in certain HPX tests with 2.2.0 #126

Closed
msimberg opened this issue Aug 18, 2020 · 11 comments

Comments

@msimberg
Contributor

While trying to update to 2.2.0 on STEllAR-GROUP/hpx#4895, we have one remaining set of tests that still fail. The partitioned_vector_{ex,in}clusive scan tests all fail with a segfault on the main locality. The stack trace looks like this, with most of the middle cut out (I'm guessing that's a fairly normal length for an APEX stack trace? I can provide the full stack trace if that would be helpful):

#0  std::__1::vector<apex::profiler*, std::__1::allocator<apex::profiler*> >::__annotate_delete (this=0x2aaadcdc4d28) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/vector:873                            
#1  0x00002aaaaecacda9 in std::__1::vector<apex::profiler*, std::__1::allocator<apex::profiler*> >::~vector (this=0x2aaadcdc4d28) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/vector:552                
#2  0x00002aaaaecacd73 in apex::task_wrapper::~task_wrapper (this=0x2aaadcdc4cf8) at ../../apex/src/apex/task_wrapper.hpp:29                                                                                        
#3  0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadcdc4ce0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#4  0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadcdc4ce0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#5  0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadcdc4ce0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#6  0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae08178d8) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#7  0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaae08178b8) at ../../apex/src/apex/task_wrapper.hpp:29
#8  0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaae08178a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#9  0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaae08178a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#10 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaae08178a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#11 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad8118a88) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#12 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad8118a68) at ../../apex/src/apex/task_wrapper.hpp:29
#13 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad8118a50) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#14 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad8118a50) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#15 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad8118a50) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#16 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaadc02db68) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#17 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaadc02db48) at ../../apex/src/apex/task_wrapper.hpp:29
#18 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadc02db30) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#19 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadc02db30) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#20 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadc02db30) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#21 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad8154128) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#22 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad8154108) at ../../apex/src/apex/task_wrapper.hpp:29
#23 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad81540f0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#24 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad81540f0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#25 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad81540f0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#26 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad4853508) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#27 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad48534e8) at ../../apex/src/apex/task_wrapper.hpp:29
#28 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad48534d0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#29 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad48534d0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#30 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad48534d0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#31 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaadc83cb18) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#32 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaadc83caf8) at ../../apex/src/apex/task_wrapper.hpp:29
#33 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadc83cae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#34 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadc83cae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#35 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadc83cae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#36 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae0056e48) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#37 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaae0056e28) at ../../apex/src/apex/task_wrapper.hpp:29
#38 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaae0056e10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
[snip]
#3660 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaae1641780) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3661 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaad4862d48) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3662 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaad4862d28) at ../../apex/src/apex/task_wrapper.hpp:29
#3663 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaad4862d10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#3664 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad4862d10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3665 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad4862d10) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3666 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae005af08) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3667 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaae005aee8) at ../../apex/src/apex/task_wrapper.hpp:29
#3668 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaae005aed0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#3669 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaae005aed0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3670 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaae005aed0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3671 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaadcde5f98) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3672 0x00002aaaaecacd83 in apex::task_wrapper::~task_wrapper (this=0x2aaadcde5f78) at ../../apex/src/apex/task_wrapper.hpp:29
#3673 0x00002aaaaecaca41 in std::__1::__shared_ptr_emplace<apex::task_wrapper, std::__1::allocator<apex::task_wrapper> >::__on_zero_shared (this=0x2aaadcde5f60) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3710
#3674 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaadcde5f60) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3675 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaadcde5f60) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3676 0x00002aaaaec9947c in std::__1::shared_ptr<apex::task_wrapper>::~shared_ptr (this=0x2aaae16616a8) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3677 0x00002aaaaecb4f79 in apex::profiler::~profiler (this=0x2aaae16616a0) at ../../apex/src/apex/profiler.hpp:170
#3678 0x00002aaaaecb4efb in std::__1::default_delete<apex::profiler>::operator() (this=0x2aaad81989a8, __ptr=0x2aaae16616a0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:2339
#3679 0x00002aaaaecb4ca0 in std::__1::__shared_ptr_pointer<apex::profiler*, std::__1::default_delete<apex::profiler>, std::__1::allocator<apex::profiler> >::__on_zero_shared (this=0x2aaad8198990) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3640
#3680 0x00002aaaaec9fcea in std::__1::__shared_count::__release_shared (this=0x2aaad8198990) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3544
#3681 0x00002aaaaec9fc8f in std::__1::__shared_weak_count::__release_shared (this=0x2aaad8198990) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:3586
#3682 0x00002aaaaec9af9c in std::__1::shared_ptr<apex::profiler>::~shared_ptr (this=0x2aaae916dae0) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4522
#3683 0x00002aaaaece5b53 in std::__1::shared_ptr<apex::profiler>::operator= (this=0x2aaae916dce8, __r=...) at /apps/daint/SSL/HPX/packages/llvm-8.0/include/c++/v1/memory:4555
#3684 0x00002aaaaed0f2d8 in hpx::concurrency::ConcurrentQueue<std::__1::shared_ptr<apex::profiler>, hpx::concurrency::ConcurrentQueueDefaultTraits>::ImplicitProducer::dequeue<std::__1::shared_ptr<apex::profiler> > (this=0x2aaae000bbd0, element=...) at ../../libs/concurrency/include/hpx/concurrency/concurrentqueue.hpp:2537
#3685 0x00002aaaaed0ef5c in hpx::concurrency::ConcurrentQueue<std::__1::shared_ptr<apex::profiler>, hpx::concurrency::ConcurrentQueueDefaultTraits>::ProducerBase::dequeue<std::__1::shared_ptr<apex::profiler> > (this=0x2aaae000bbd0, element=...) at ../../libs/concurrency/include/hpx/concurrency/concurrentqueue.hpp:1667
#3686 0x00002aaaaecf9bd1 in hpx::concurrency::ConcurrentQueue<std::__1::shared_ptr<apex::profiler>, hpx::concurrency::ConcurrentQueueDefaultTraits>::try_dequeue<std::__1::shared_ptr<apex::profiler> > (this=0x2aaae0000e78, item=...) at ../../libs/concurrency/include/hpx/concurrency/concurrentqueue.hpp:1079
#3687 0x00002aaaaecf59a9 in apex::profiler_listener::process_profiles (this=0x2aaabc00a360) at ../../apex/src/apex/profiler_listener.cpp:1130
#3688 0x00002aaaaecf584c in apex::profiler_listener::process_profiles_wrapper () at ../../apex/src/apex/profiler_listener.cpp:1078
#3689 0x00002aaaaed05f67 in hpx::util::invoke<void (*&)()> (f=@0x2aaad404f9f8: 0x2aaaaecf5810 <apex::profiler_listener::process_profiles_wrapper()>) at ../../libs/functional/include/hpx/functional/invoke.hpp:135
#3690 0x00002aaaaed05ef5 in hpx::util::detail::annotated_function<void (*)()>::operator()<>() (this=0x2aaad404f9f8) at ../../libs/threading_base/include/hpx/threading_base/annotated_function.hpp:142
#3691 0x00002aaaaed05e88 in hpx::threads::detail::thread_function_nullary<hpx::util::detail::annotated_function<void (*)()> >::operator() (this=0x2aaad404f9f8) at ../../libs/threading_base/include/hpx/threading_base/register_thread.hpp:78
#3692 0x00002aaaaed05e11 in hpx::util::detail::callable_vtable<std::__1::pair<hpx::threads::thread_state_enum, hpx::threads::thread_id> (hpx::threads::thread_state_ex_enum)>::_invoke<hpx::threads::detail::thread_function_nullary<hpx::util::detail::annotated_function<void (*)()> > >(void*, hpx::threads::thread_state_ex_enum&&) (f=0x2aaad404f9f8, vs=@0x2aaae916dfa4: hpx::threads::wait_signaled) at ../../libs/functional/include/hpx/functional/detail/vtable/callable_vtable.hpp:93
#3693 0x00002aaaadf43ce0 in hpx::util::detail::basic_function<std::__1::pair<hpx::threads::thread_state_enum, hpx::threads::thread_id> (hpx::threads::thread_state_ex_enum), false, false>::operator()(hpx::threads::thread_state_ex_enum) const (this=0x2aaad404f9e8, vs=hpx::threads::wait_signaled) at ../../libs/functional/include/hpx/functional/detail/basic_function.hpp:228
#3694 hpx::threads::coroutines::detail::coroutine_impl::operator() (this=0x2aaad404f8b0) at ../../libs/coroutines/src/detail/coroutine_impl.cpp:74
#3695 0x00002aaaadf43825 in hpx::threads::coroutines::detail::lx::trampoline<hpx::threads::coroutines::detail::coroutine_impl> (fun=0x2aaad404f8b0) at ../../libs/coroutines/include/hpx/coroutines/detail/context_linux_x86.hpp:92
#3696 0x0000000000000000 in ?? ()

This is definitely a new failure caused by the new APEX version. Latest HPX master with 2.1.9 works fine.

@khuck does this ring any bells for you right away? Otherwise I can easily bisect APEX to see where this was introduced. Is there anything else that would help you?

This is with clang 8, C++17, Boost 1.69.0, hwloc 2.0.3, and PAPI 5.7.0.2 if that makes any difference.

CI logs: https://cdash.cscs.ch/buildSummary.php?buildid=123422.

@khuck
Collaborator

khuck commented Aug 18, 2020 via email

@msimberg
Contributor Author

Note that at first I thought this was infinite recursion that eventually ran out of stack, but frame #0 actually breaks the cycle. There could of course still be a cycle somewhere, but it's not necessarily one.
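
The repeating pattern in the trace (frames #2 to #6, five frames per link) is what a long but finite chain of `shared_ptr` links produces: releasing the last reference to one object runs its destructor, which releases the next object inside that destructor, and so on. At roughly five frames per link, the ~3,700 frames shown correspond to a chain of several hundred objects. A minimal sketch, assuming a hypothetical `Node` type whose only member is a `shared_ptr` to its parent (the real `apex::task_wrapper` members are not shown in this thread), counts how deeply the destructors nest:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Destructor nesting counters, used only to observe recursion depth.
std::size_t g_depth = 0;
std::size_t g_max_depth = 0;

// Hypothetical stand-in for a task record that keeps its parent alive
// through a shared_ptr (not the actual apex::task_wrapper layout).
struct Node {
    std::shared_ptr<Node> parent;

    ~Node() {
        ++g_depth;
        if (g_depth > g_max_depth) g_max_depth = g_depth;
        // Dropping the last reference to the parent runs the parent's
        // destructor inside this one: one extra nesting level per link.
        parent.reset();
        --g_depth;
    }
};

// Hypothetical helper: builds a chain of `length` nodes, each owning its
// parent, then releases the youngest node and returns the deepest
// destructor nesting observed.
std::size_t destroy_chain(std::size_t length) {
    g_depth = 0;
    g_max_depth = 0;
    std::shared_ptr<Node> tail;
    for (std::size_t i = 0; i < length; ++i) {
        auto node = std::make_shared<Node>();
        node->parent = std::move(tail);  // new node becomes the sole owner
        tail = std::move(node);
    }
    tail.reset();  // tears the whole chain down recursively
    return g_max_depth;
}
```

Here `destroy_chain(1000)` returns 1000: the nesting depth equals the chain length, so a long enough chain can exhaust the stack without any true cycle.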

@khuck
Collaborator

khuck commented Aug 18, 2020

I am pretty sure I know what's happening: there is a bit of a rat's nest of dependencies between three classes, and I have known for a while that I need to clean them up. I discovered that one of the pointers between them was not getting set, and I added code to fix that. That is likely what causes this cascade of destructors when all the pointers are chased.
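
One common C++ way to keep such a cascade from growing the stack is to unlink the chain iteratively inside the destructor. This is a general pattern, not necessarily what the APEX fix does; `Node` and `destroy_chain` are hypothetical names, and the `use_count()` check assumes no other thread can obtain a new reference to a node (for example through `weak_ptr::lock()`) while it is being torn down:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Destructor nesting counters, used only to observe recursion depth.
std::size_t g_depth = 0;
std::size_t g_max_depth = 0;

// Hypothetical parent-linked record with an iterative destructor.
struct Node {
    std::shared_ptr<Node> parent;

    ~Node() {
        ++g_depth;
        if (g_depth > g_max_depth) g_max_depth = g_depth;
        // Take over the parent link and walk up the chain in a loop.
        std::shared_ptr<Node> next = std::move(parent);
        // Only unlink ancestors we solely own; a shared ancestor is left
        // for its other owners. Assumes no concurrent weak_ptr::lock().
        while (next && next.use_count() == 1) {
            std::shared_ptr<Node> grandparent = std::move(next->parent);
            next.reset();  // its parent link is already empty: no nesting
            next = std::move(grandparent);
        }
        --g_depth;
    }
};

// Hypothetical helper: builds a chain of `length` nodes, each owning its
// parent, then releases the youngest node and returns the deepest
// destructor nesting observed.
std::size_t destroy_chain(std::size_t length) {
    g_depth = 0;
    g_max_depth = 0;
    std::shared_ptr<Node> tail;
    for (std::size_t i = 0; i < length; ++i) {
        auto node = std::make_shared<Node>();
        node->parent = std::move(tail);  // new node becomes the sole owner
        tail = std::move(node);
    }
    tail.reset();  // tears the chain down without deep recursion
    return g_max_depth;
}
```

With this destructor, `destroy_chain(100000)` returns 2 regardless of chain length: each ancestor is destroyed from the loop after its own parent link has been moved out, so its destructor has nothing left to chase.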

@khuck
Collaborator

khuck commented Aug 18, 2020

I can't reproduce it in my build environment (the tests all pass), so I am going to try clang 10 and hwloc 2.2...

@khuck
Collaborator

khuck commented Aug 18, 2020

This happens in 2.1.9, too...

@khuck
Collaborator

khuck commented Aug 19, 2020

I pushed a change that appears to have fixed things on my end...

@msimberg
Contributor Author

Thanks @khuck for looking into this! Which commit/branch/tag would that be? We can try that out first before you update 2.2.0 or create a new tag.

@khuck
Collaborator

khuck commented Aug 19, 2020

I forced an update to the tag. It’s also the head of master. Thanks!

@aurianer
Contributor

Thanks a lot @khuck!

@khuck
Collaborator

khuck commented Aug 20, 2020

Does that mean it fixed the crash on your end?

@aurianer
Contributor

Yes, we merged the PR updating the tag ;) Thanks!

@khuck khuck closed this as completed Sep 18, 2020

3 participants