Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend Pipeline Component Telemetry RFC to add a "rejected" outcome #11956

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 14 additions & 5 deletions docs/rfcs/component-universal-telemetry.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,11 +90,20 @@ The location of these measurements can be described in terms of whether the data
component to which the telemetry is attributed. Metrics which contain the term "produced" describe data which is emitted from the component,
while metrics which contain the term "consumed" describe data which is received by the component.

For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to
whether or not the corresponding function call returned an error. Specifically, consumed measurements will be recorded with `outcome` as
`failure` when a call from the previous component the `ConsumeX` function returns an error, and `success` otherwise. Likewise, produced
measurements will be recorded with `outcome` as `failure` when a call to the next consumer's `ConsumeX` function returns an error, and
`success` otherwise.
For both metrics, an `outcome` attribute with possible values `success`, `failure`, and `rejected` should be automatically recorded,
corresponding to whether or not the corresponding function call returned an error, and whether the error originates from the next
component(s) in the pipeline, or from one further downstream.
jade-guiton-dd marked this conversation as resolved.
Show resolved Hide resolved

Specifically, a call to `ConsumeX` is recorded with:
- `outcome = success` if the call returns `nil`;
- `outcome = failure` if the call returns a regular error;
- `outcome = rejected` if the call returns an error tagged as coming from downstream.
After inspecting the error, the instrumentation layer should tag it as coming from downstream before returning it to the parent component.

The upstream component which called `ConsumeX` will have this `outcome` attribute applied to its produced measurements, and the downstream
component that `ConsumeX` was called on will have the attribute applied to its consumed measurements.

Errors should be "tagged as coming from downstream" the same way permanent errors are currently handled: they can be wrapped in a `type downstreamError struct { err error }` wrapper error type, then checked with `errors.As`. Note that care may need to be taken when dealing with the `multiError`s returned by the `fanoutconsumer`. (If PR #11085 introducing a single generic `Error` type is merged, an additional `downstream bool` field can be added to it to serve the same purpose.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this may be a breaking change for some components, IF they are checking for types of errors using something other than errors.As. I think it's ok though and those components should update to use errors.As instead anyways. However, we should be aware of this when implementing changes in case it is a widespread problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right. I think those components would already be broken anyway, because of the permanentError and multiError wrappers we already use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will we need / want to update this once #11085 is merged?

Copy link
Contributor Author

@jade-guiton-dd jade-guiton-dd Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If #11085 gets merged before this PR, I'll update this paragraph to only include the parenthetical. If this PR gets merged first, I think presenting the two alternatives is probably good enough? But we could make a second amendment if we feel the need to.


```yaml
otelcol.receiver.produced.items:
Expand Down
Loading