Make tracing suitable for event-driven architectures #4349

wdonne · 2024-12-30T10:19:47Z

The current specification focuses on a single use case: a synchronous service call with a clear beginning and end, both known upfront. Moreover, the logic for producing a consistent trace is left entirely to the emitter of the telemetry. Backends are not required to be able to construct traces from bits and pieces.

For event-driven architectures this cannot work because:

There is no logical end to an event trace. The last event that occurred is the end, but you cannot know that when it occurs.
The beginning of an event trace is uncertain. It is the first occurrence of an event with a certain trace ID. That can come from anywhere. In all those places a root span could be generated, but without an end time.
An event has no duration. It just marks a moment. With post-processing, the time between an event and some reaction to it may be measured, but there isn't always a reaction.
Often events already carry something like a correlation ID, which is propagated. It should therefore be possible to set trace IDs that are derived from such information instead of having only generated IDs.
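
To illustrate the last point, a trace ID could be derived deterministically from an existing correlation ID. The following is a minimal sketch, not something the specification defines: the helper name `trace_id_from_correlation_id` and the choice of SHA-256 truncated to 16 bytes are assumptions made for this example. The only constraints taken from the W3C Trace Context format are that a trace ID is 16 bytes (32 lowercase hex characters) and must not be all zeros.

```python
import hashlib


def trace_id_from_correlation_id(correlation_id: str) -> str:
    """Derive a 32-hex-character trace ID from a correlation ID.

    Hypothetical helper: the hashing scheme is an assumption for illustration.
    """
    digest = hashlib.sha256(correlation_id.encode("utf-8")).digest()
    trace_id = digest[:16]  # keep the first 16 bytes (128 bits)
    if trace_id == bytes(16):
        # An all-zero trace ID is invalid, so substitute a fixed non-zero value.
        trace_id = bytes(15) + b"\x01"
    return trace_id.hex()
```

Because the derivation is deterministic, every component that sees the same correlation ID would arrive at the same trace ID without any coordination.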

To fix this, it should be possible to generate traces that are a collection of root spans, all with the same trace ID and no end time. From that, a backend can produce a consistent trace when it is requested or update it if it is stored.
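
A rough sketch of what such backend-side assembly could look like, assuming each event is reported as a root span that carries only a trace ID, a name, and a timestamp. The `EventSpan` type and the `assemble_traces` function are hypothetical names for this example and are not part of any OpenTelemetry API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSpan:
    """Hypothetical root span for one event: a moment in time, no end time."""
    trace_id: str
    name: str
    timestamp_ns: int


def assemble_traces(spans):
    """Group event spans by trace ID and order each group by timestamp."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return {
        trace_id: sorted(group, key=lambda s: s.timestamp_ns)
        for trace_id, group in traces.items()
    }


# Spans may arrive in any order and from any emitter.
received = [
    EventSpan("aaaa", "order-shipped", 300),
    EventSpan("aaaa", "order-created", 100),
    EventSpan("bbbb", "user-registered", 150),
    EventSpan("aaaa", "order-paid", 200),
]
traces = assemble_traces(received)
```

In this sketch the first span of each group is the beginning of the trace as far as the backend knows, and the last one is only the end so far. Running the assembly again after more spans arrive yields the updated trace.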

danielgblanco · 2025-01-06T10:34:25Z

Hi @wdonne, the tracing specification does contemplate asynchronous behaviour. We have a specific SIG that is working on semantic conventions for messaging. They meet on Thursdays at 8:00 PT and their Slack channel in CNCF is #otel-messaging.

We think some of the aspects you're proposing can be modelled via current functionality or can be discussed further in that group. Has this already been raised there?

wdonne · 2025-01-08T10:42:46Z

Hi @danielgblanco, event-driven systems are, indeed, asynchronous and often use a messaging system. The semantic conventions for messaging are only about attributes, as is the case for all semantic conventions. Therefore, I wonder if that is the right place to discuss the constraints in the general specification that inhibit tracing for event-driven architectures. But if you think that SIG is a better option, then I will go there.

wdonne added the spec:trace label (Related to the specification/trace directory) on Dec 30, 2024

danielgblanco added the triage:deciding:needs-info label (Not enough information. Left open to provide the author with time to add more details) on Jan 6, 2025
