-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Handle dangling objects and rv_policy changes robustly #889
base: master
Are you sure you want to change the base?
Conversation
This is a potential answer to wjakob#888, also inspired by the discussion in wjakob#589. Seeking feedback on the approach before I work on docs and tests. When returning unique or shared ownership to Python when some Python instances already hold a non-owning reference to the same C++ object, we have a few choices: * Create a new owning instance in addition to the existing non-owning one. (This PR's approach.) * Upgrade the existing non-owning instance to owning. (More appealing in some ways, but hard to implement with shared ownership since nanobind doesn't have holders.) * Just return the non-owning instance and forget about the ownership. (nanobind's current approach, which sometimes causes problems.) This PR also handles the fact that a non-owning instance can dangle. That is, the referenced C++ object could be destroyed while the Python instance is still alive -- especially on PyPy, where it's hard to control when a Python instance dies. Dangling instances are always non-owning, so they are mostly handled by the same logic that handles rv_policy changes. The remaining piece of support for dangling instances is to acknowledge that, if the C++ referent is freed, a new instance could be allocated with its same address, even one with internal storage. (I have observed this in production.) So, there is some new logic in `inst_new_int` to remove the previous must-be-dangling instances from `inst_c2p`, rather than crashing because they exist. This also implies a new state that a nanobind instance can be in: if inst->offset == 0, the instance refers to no C++ object and is not stored in the inst_c2p map.
Hi @oremanj, after thinking a bit about this, I think that my position on this is as follows: if you create a Python object with a The PR you created is.. complicated. As far as I can tell, the main point is to enhance the capabilities of non-owning Python objects to do things that go beyond the spec, and to handle nontrivial consequences that this has on shared ownership. I am less clear about those parts, so it is possible that I misunderstood. There is one part of this PR that seems quite harmless, which is to append objects to a linked list in a different order. I had the impression that that might considerably reduce the fallout from "out of spec" usage (but not all of it?). In any case, I am much more open to a change of that magnitude. That all said, in #589, you mentioned the following:
I agree that this is a problem, and that should be fixed. I think it will be necessary to redesign the "relinquish" mechanism so that this failure mode does not exist. But this is more specific to unique pointers and not use of non-owning references in general. Altogether, I had the impression that the contents of the PR aren't particularly related to #888, which was about confusion between return value policy and whether or not to search the instance map. (But please let me know if I missed something) |
Thanks for your feedback. I'm going to try to make the case for why I think this is a problem that is worth spending some complexity budget on. Of course, as always the final outcome is up to you.
That is certainly a position that would simplify the constraints on nanobind. It doesn't seem very realistic to me; there are many C++ objects whose lifetimes cannot be tied to the lifetime of a Python object, which it is still useful to be able to inspect from Python. Setting rules for use of those objects in terms of visible operations in the Python program is feasible, though of course making it impossible to make lifetime errors is better where possible. Setting rules about when the Python object is destroyed -- something that Python goes out of its way to hide from the user, and that can be impacted in ways that many experts would have a hard time noticing, by commonplace actions like throwing an exception, attaching a debugger, or using PyPy -- seems like a standard that's almost impossible to actually meet. It's particularly pernicious because the existence of a dangling object creates no problems most of the time, but will sporadically fail in unclear ways when an address happens to get reused. I tend to think it's worth a fair bit of implementation complexity to avoid that sort of heisenbug. To demonstrate the difficulty of adhering to the position you've laid out: Any C++ pointer passed to a Python callable will use
I don't dispute this. I think it reflects the fact that the underlying problem is complicated. Lifetime issues, and specifically the mismatch in lifetime idioms between the different languages involved, are probably the hardest part of writing cross-language bindings. If the binding library handles reasonably predictable ways for lifetime issues to create surprises, then that has a force multiplier robustness impact on all authors of extension modules using the library, and all people who write Python scripts using those extension modules. On the other hand, if the binding library chooses to not acknowledge some of that underlying complexity, then the complexity will tend to show up in surprising and occasionally brittle ways for users who are much less experienced in understanding the nuances of these issues than we are. I'm sure there are likely ways to achieve the same goals as this PR (or at least many of them) with less implementation complexity. I am extremely open to making changes along those lines; I made it as simple as I could without leaving any cases unhandled (as far as I could tell), but I don't know which cases might deserve to remain unhandled. I am more skeptical of the idea that the problems this PR solves categorically don't matter, because I have personally spent quite a number of hours debugging crashes that this PR would've prevented.
I would say that the main point is to fix the fact that exposing an address to Python using the
If we don't admit the existence of dangling objects, then a decent fraction of the value of this PR could be obtained more simply by refusing to reuse an object that carries no ownership when returning an object that does carry ownership. (Or if you do, attach the ownership to the existing object somehow.) "Ownership" here would include things like
The relevance is mostly the example from #888 that says "For example, suppose that code returns an object twice -- once using rv_policy::reference, and later using rv_policy::take_ownership. The second ownership transfer will never occur." This PR fixes that. Another less-slam-dunk link: #888 says "It can also lead to weird/unexpected behavior. For example, [snippet] will not actually move the field when somebody else has previously created a reference to
That helps ensure that when we have created multiple Python instances for the same C++ type and address, we return the one that carries more ownership. Unfortunately it doesn't help at all under the nanobind status quo where we never create multiple Python instances for the same C++ type and address.
I mean, fundamentally if you pass a You could say that relinquished instances lose their C++ address identity and can never be revived, such that later passing the same C++ address back to Python always creates a new instance. I think that would be an unfortunate regression from the current status quo, but it is at least consistent and basically understandable. But if you want to say that relinquished objects maintain their C++ address identity, then I think you have to admit that this means they can dangle, and thus nanobind ought to handle dangling objects in some way other than considering their existence to be evidence of a bug. I guess you could alternatively implement de-conflicting logic along the lines of this PR while refusing to do it for anything but a relinquished object... but that seems a little silly to me :-) |
Sure, no rush! #879 is sort of the opposite problem: an instance returned using |
This is a potential answer to #888, also inspired by the discussion in #589. Seeking feedback on the approach before I work on docs and tests.
When returning unique or shared ownership to Python when some Python instances already hold a non-owning reference to the same C++ object, we have a few choices:
This PR also handles the fact that a non-owning instance can dangle. That is, the referenced C++ object could be destroyed while the Python instance is still alive -- especially on PyPy, where it's hard to control when a Python instance dies. Dangling instances are always non-owning, so they are mostly handled by the same logic that handles rv_policy changes. The remaining piece of support for dangling instances is to acknowledge that, if the C++ referent is freed, a new instance could be allocated with its same address, even one with internal storage. (I have observed this in production.) So, there is some new logic in
inst_new_int
to remove the previous must-be-dangling instances frominst_c2p
, rather than crashing because they exist. This also implies a new state that a nanobind instance can be in: if inst->offset == 0, the instance refers to no C++ object and is not stored in the inst_c2p map.