get_module_functions: separate gc pushes #177
Creating a new Array<...> calls jl_alloc_array_1d, which can trigger garbage collection. Thus we need to protect each array before creating the next one, instead of using a single JL_GC_PUSH6 call. To do this we need some extra scope blocks.
The Windows failures seem unrelated, and I don't know how to properly test these changes. Locally, I have been rerunning Oscar precompilation many times (with rr and without); so far there have been no crashes after applying the second commit to 0.13.2, but I will keep the loop running.
Thanks for looking into this. It looks like the Windows failures are related to ranges, so they would be unrelated to this. I still have to look in detail at what these changes mean. One idea to make testing simpler could be to manually trigger the GC at the worst possible times, e.g. by adding a compilation flag that activates this for test builds.
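The compilation-flag idea could look roughly like the sketch below. It is only an illustration: the flag TOY_GC_STRESS, the macro GC_STRESS_POINT and the helper toy_force_collect are hypothetical names that do not exist in libcxxwrap-julia, and a counter stands in for the real collector so the snippet compiles without julia.h. In an actual test build the hook would presumably call jl_gc_collect(JL_GC_AUTO) instead.

```cpp
#include <cassert>

// Hypothetical flag: would normally be passed by the build system
// (e.g. -DTOY_GC_STRESS) only for test builds.
#define TOY_GC_STRESS 1

// Stand-in for the real collector call. Assumption: a real build would call
// jl_gc_collect(JL_GC_AUTO) here; the counter just makes the effect observable.
static int forced_collections = 0;
inline void toy_force_collect() { ++forced_collections; }

#ifdef TOY_GC_STRESS
  // Test builds: force a collection at every marked point.
  #define GC_STRESS_POINT() toy_force_collect()
#else
  // Regular builds: the marker compiles to nothing.
  #define GC_STRESS_POINT() ((void)0)
#endif

// Mimics a function that performs `n` allocations in a row, with a stress
// point placed directly before each one (the "worst possible time").
int allocate_n_objects(int n) {
  assert(n >= 0);
  int created = 0;
  for (int i = 0; i < n; ++i) {
    GC_STRESS_POINT();  // collector runs while earlier objects must already be rooted
    ++created;          // placeholder for the real allocation
  }
  return created;
}
```

With the flag defined, calling allocate_n_objects(3) runs three forced collections; without it, the markers vanish and the function behaves as in a normal build.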
A more detailed explanation:
|
So I ran some tests with extra GC calls:
diff --git a/src/c_interface.cpp b/src/c_interface.cpp
index c8b4c47..c1e8d0c 100644
--- a/src/c_interface.cpp
+++ b/src/c_interface.cpp
@@ -137,19 +137,23 @@ JLCXX_API jl_array_t* get_module_functions(jl_module_t* jlmod)
   const jlcxx::Module& module = registry().get_module(jlmod);
   module.for_each_function([&](FunctionWrapperBase& f)
   {
+    jl_gc_collect(JL_GC_AUTO);
     Array<jl_datatype_t*> arg_types_array;
     jl_value_t* boxed_f = nullptr;
     jl_value_t* boxed_thunk = nullptr;
     JL_GC_PUSH3(arg_types_array.gc_pointer(), &boxed_f, &boxed_thunk);
     {
+      jl_gc_collect(JL_GC_AUTO);
       Array<jl_value_t*> arg_names_array;
       JL_GC_PUSH1(arg_names_array.gc_pointer());
       {
+        jl_gc_collect(JL_GC_AUTO);
         Array<jl_value_t*> arg_default_values_array;
         jl_value_t* boxed_n_kwargs = nullptr;
         jl_value_t* cppfuncinfo = nullptr;
         JL_GC_PUSH3(arg_default_values_array.gc_pointer(), &boxed_n_kwargs, &cppfuncinfo);
+        jl_gc_collect(JL_GC_AUTO);
         fill_types_vec(arg_types_array, f.argument_types());
         boxed_f = jlcxx::box<void*>(f.pointer());
@@ -183,6 +187,7 @@ JLCXX_API jl_array_t* get_module_functions(jl_module_t* jlmod)
           arg_default_values_array.wrapped(),
           boxed_n_kwargs
         );
+        jl_gc_collect(JL_GC_AUTO);
         function_array.push_back(cppfuncinfo);
         JL_GC_POP();
This seems to work fine. I ran this for about 500 iterations, re-precompiling CxxWrap, Oscar and its dependencies. Adding similar explicit GC calls before the
Looks good to me. Great detective work and explanation, thanks @benlorenz
Sorry for the delay on this. I wanted to investigate the MSVC problem, which turned out to be due to a newer MSVC version on GitHub CI. It should build now.
The failure on Windows nightly is due to a problem in Julia; the one on Mac nightly is either another intermittent failure or due to a change in Julia between now and the last test run. I think we can merge this.
Thanks! The failure on macos was probably caused by references to |
Many thanks for this work. I had seen very rare crashes for a long time and always had a nagging feeling there was some error, without knowing what exactly. Hopefully this was the only one :)
https://github.com/JuliaInterop/CxxWrap.jl#stl-updates
Creating a new jlcxx::Array<...>() calls jl_alloc_array_1d, which can trigger garbage collection. Thus we need to protect each array before creating the next one, instead of doing one JL_GC_PUSH6 call here: libcxxwrap-julia/src/c_interface.cpp, lines 138 to 146 in d4192fe.
This might help with some rare crashes during initialization or precompilation we have seen in the Oscar CI for quite a while (see oscar-system/Oscar.jl#3296).
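To illustrate why rooting everything in one go after all allocations is not enough, here is a toy model of a collector with a root stack. It is a sketch under strong simplifications: ToyGC, ToyObject and the two helper functions are invented for this example and are not part of the Julia C API or libcxxwrap-julia. Objects are only marked dead rather than freed, so the result can be inspected safely.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a heap object; `alive` is cleared instead of freeing memory.
struct ToyObject { bool alive = true; };

class ToyGC {
public:
  // Every allocation may run a collection first, as a real allocating call can.
  ToyObject* alloc() {
    collect();
    heap_.push_back(std::make_unique<ToyObject>());
    return heap_.back().get();
  }

  // Register the address of a local pointer as a root (loosely like JL_GC_PUSH1).
  void push_root(ToyObject** slot) { roots_.push_back(slot); }

  // Unregister the most recent root (loosely like JL_GC_POP).
  void pop_root() {
    assert(!roots_.empty());
    roots_.pop_back();
  }

  // Mark every object that no registered root points to as dead.
  void collect() {
    for (auto& obj : heap_) {
      bool rooted = false;
      for (ToyObject** slot : roots_) {
        if (*slot == obj.get()) rooted = true;
      }
      if (!rooted) obj->alive = false;
    }
  }

private:
  std::vector<std::unique_ptr<ToyObject>> heap_;
  std::vector<ToyObject**> roots_;
};

// Old pattern: allocate both objects, then root them together.
bool first_survives_with_late_rooting() {
  ToyGC gc;
  ToyObject* a = gc.alloc();
  ToyObject* b = gc.alloc();  // collection runs here while `a` is still unrooted
  gc.push_root(&a);
  gc.push_root(&b);
  bool ok = a->alive;
  gc.pop_root();
  gc.pop_root();
  return ok;
}

// Fixed pattern: root each object before allocating the next one.
bool first_survives_with_early_rooting() {
  ToyGC gc;
  ToyObject* a = gc.alloc();
  gc.push_root(&a);           // `a` is protected before the next allocation
  ToyObject* b = gc.alloc();
  gc.push_root(&b);
  bool ok = a->alive && b->alive;
  gc.pop_root();
  gc.pop_root();
  return ok;
}
```

In this model the late-rooting variant loses its first object during the second allocation, while the early-rooting variant keeps both, which mirrors the nested push-then-allocate structure introduced by the extra scope blocks.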
The backtrace I got locally was:
Unfortunately this is slightly different from the backtraces in the Oscar CI, but it might still be related, and I could not reproduce the other backtrace yet.
I will keep it a draft for now as I dig into another backtrace I just got, but happy to receive some feedback. Maybe there is another good way to protect these Arrays?
cc: @fingolfin
Note: the diff is best viewed with ignore whitespace enabled: https://github.com/JuliaInterop/libcxxwrap-julia/pull/177/files?w=1
Edit: running against the stl-updates branch because otherwise the version doesn't match the main branch here. I did my debugging with libcxxwrap-julia 0.13.2 though.
Edit2:
The second crash (which is closer to the backtrace from the Oscar CI) appeared after fixing the first one and should be fixed with the second commit by protecting the CppFunctionInfo struct during the push_back (which grows the array and thus allocates).
Another option to fix this would be to add a special version of push_back for jl_value_t* which also adds the argument to the JL_GC_PUSH macro. This would help for other users of Array<jl_value_t*> where the argument might be unprotected.
backtrace for the second crash: