-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add otel_monitor #461
base: main
Are you sure you want to change the base?
Add otel_monitor #461
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #461 +/- ##
==========================================
- Coverage 73.57% 72.83% -0.75%
==========================================
Files 53 54 +1
Lines 1722 1745 +23
==========================================
+ Hits 1267 1271 +4
- Misses 455 474 +19
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Just had a thought, maybe this should be part of the sweeper? |
handle_info({'DOWN', Ref, process, _Pid, normal}, State) -> | ||
case ets:take(?TABLE, Ref) of | ||
[] -> nil; | ||
[{_Ref, SpanCtx}] -> otel_span:end_span(SpanCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be more than 1 span being monitored in the same process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but each span will get its own ref, no? I guess this is what I will find out when I write tests...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, my mistake. I was thinking it was like my implementation which monitors all spans in a process, so it looks up all spans for a pid, not by ref.
Which is a separate question I have, is it really worth doing it per-span, are there spans that a user wouldn't want ended when a process crashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are asking why I didn't write a PR that monitors every span's process, it's because bigger changes like that are less likely to be accepted. I am aiming with this PR for the incremental approach: first offer the api optionally, if it turns out that everyone uses it all the time, then contemplate making it mandatory.
But if you are comfortable going directly to monitoring every span's process, then I am happy coding that up.
I'm not sure whether it's cheaper to have a single monitor per pid and juggling spans associated with that pid in an ets table, or just creating a new monitor and ets row per span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm asking why not monitor every span in a single process the user requests.
If the user requests to monitor Span X in process P do they really not want Span Y that is also active in process P to be ended when the process dies? If they do they have to call monitor(X)
and monitor(Y)
. The API I had in mind was just monitor()
, and then every span in self()
is monitored. I suppose also a monitor(Pid)
makes sense to include if we go that route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if going the route of just monitoring the Pid and all its spans we can add a pid
to the span itself. In my PR for the monitor I add a pid to span so that the existing table can just be used to look up the spans to end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm asking in slack. It is hard to know what is best without people's input on how they'd use it, or if they are already using their own solution, like you have been.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 glad we are having this discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user requests to monitor Span X in process P do they really not want Span Y that is also active in process P to be ended when the process dies? If they do they have to call monitor(X) and monitor(Y).
I would want to have this applied to every span.
{noreply, State}. | ||
|
||
monitor(SpanCtx) -> | ||
ok = gen_server:call(?SERVER, {monitor, SpanCtx, self()}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User probably doesn't want to crash if the monitor fails. I'd say probably just want a cast
unless you think some users will want to be able to rely on this feature in such a way that it should return false
if the call fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that idea, returning true / false based for success / failure.
My first instinct would be to keep it self-contained and not include it in the sweeper. It's also a different use-case to the sweeper. The usage patterns of the sweeper are also quite different; it does a lot of work sometimes, while the monitor is doing a small amount of work all of the time, and it will also end up in the hot path of people's code, so it should be fast. Perhaps a cast is also better from the perspective of speed... |
@derekkraan What do you think of the alternative of monitoring every span in the process that is monitored? I also wonder about the api being an option passed to |
Re: monitoring every span, sounds good, and it appears this is what everyone wants, but I think then the exception event should only be added to the outermost one? Or the one that has been explicitly monitored? What do you think? I do think there should also be an option to |
Yes outermost seems fine. |
I don't think outermost is a good idea. It would be a bit of a pain to implement. And really I would think the inner most would make the most sense as its where the failure is more likely to have happened? I think it is better to just mark any active span in a process with the information. |
If the innermost is easier I think that's fine too, any trace viewer should handle that fine. |
My latest #602 |
Following up on this PR: open-telemetry/opentelemetry-erlang-contrib#109
This PR adds a new module,
otel_monitor
. We associate the span with a monitor ref in an ets table, and end the span when we detect that the process has died.Looking forward to feedback for this one.
(note: I haven't looked at tests yet)