-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for bound instruments to the metrics API #4126
Comments
I like the idea of bound instruments, but I would first be interested to understand how much of an improvement they would provide. You mentioned map lookup, but even in your concrete example you would either have some switch statement to know which counter to bump, or in more realistic situation you might have a map lookup in the user code anyway. For example, when Jaeger collector emits metrics about the volume of received spans, it tags them with Having said that, there are certainly situations where labels are contextually known, e.g. RPC metrics by the endpoint - each endpoint implementation can have a reference to a bound instrument with no additional lookup. What actually worries me is the implementation details below the surface. I was looking at Go SDK today and it goes to lengths to avoid memory allocations by allowing users to pass AttributeSet which has a precomputed hash code for efficient lookup. But then something else causes memory allocation which destroys the benchmark (jaegertracing/jaeger#5676). So what I would really like to understand is whether bound instruments can provide a path to allocation-free (and ideally lock free) counter bumps, and whether the same can be achieved with the existing API. |
OTel .NET's benchmarks show a totally diff. story. With 0 attributes, it takes 10-15ns to perform a Counter.Add. With 1 attribute, it moves to 50, and with 5 attributes its near 200ns and so on. Bound instrument would give the same performance irrespective of the attribute count, and that would be the same as that of the 0 attributes case (.NET special cases the 0 attribute case, giving us a hint into the possible gains with bound instrument!) In short, with bound instruments, I expect the bound_counter.Add call to take 10-15ns, irrespective of the attributes bound to it. Without bound, this could be several 100s, so the perf gains are much more. (Used numbers from OTel .NET, and I see similar from OTel Rust too). Also adding the slack conversation earlier this year: https://cloud-native.slack.com/archives/C01NP3BV26R/p1700165373479809?thread_ts=1700152641.765839&cid=C01NP3BV26R |
#4126 (comment) Shows expected improvement based on my experiments with OTel .NET, Rust. I don't think it'll be significantly diff. in any language. |
@cijothomas If the API forces you to pass attributes one at a time (whether via varargs or a map) then there will be a proportional growth in cost. My point was that Go SDK already implements so called AttributeSet that, if I understand it correctly, would make the performance the same regardless of the number of attributes used. So if all that we're getting with bound instrument is avoiding 11ns map lookup, yes it's still a 2x improvement, but compared to 200ns for several attributes it's still negligible, and it seems we can do much better without introducing a whole new concept to the spec. Of course one can argue that the concept of AttributeSet is already pretty similar to bound instruments. Did we have any API proposals of what bound instruments should look like in the API? |
I believe Erlang/Elixir would see a significant boost when it comes to use of integer counters here. Right now measurements have to go through a table lookup and this would remove that need and instead allow direct access to the |
The motivation for bounded instruments is to avoid a Hashmap lookup when recording a measurement. Hashmap lookup needs both the hash code computation and an equality comparison check for the keys. The more the number of attributes the more would both these operations cost (hash code computation + equality check). I'm not aware of the details of Go's AttributeSet approach but it sounds like they only solved half of the issues with Hashmap lookup. They might have avoided repeated computation of hash code, but they would still incur the cost for equality check. A bounded instrument would let us avoid these costs entirely by providing us a direct handle to the metric value that needs to be updated.
The map lookup cost would not be some constant value for all kinds of measurement recordings. It would be higher for recording higher number of attributes and lengthier key and value strings.
Allocation-free recording of measurements should be possible even with the existing API. The motivation for bounded instruments is to achieve significantly faster recording of measurements (for highly perf sensitive scenarios) and NOT allocation or lock-free operations. |
(Note: OTel Rust tried AttributeSet concept only to revert it before shipping : open-telemetry/opentelemetry-rust#1512) |
My expectation is that when a bound instrument is used, the map lookup step described above is bypassed. I expect there is an initial step to construct the attribute set, then to perform a map lookup, after which the bound instrument is instantiated, holding a reference to the thing that was looked up. Subsequent operations using the bound instrument should not perform any map lookup-- in the case of a bound counter instrument, I expect a direct reference to an atomic variable, expecting this operation to take O(10ns). |
Some comments from the TC 8/14/24 meeting:
We would like a reference implementation to show the benefits as a prerequisite to accepting this. |
True! Bound ones cannot be enriched, it just has the attributes it was bound with. (except the Meter attributes, Resource attributes that are implicitly associated...) |
I'd be preparing one in OTel Rust. (mid-late Sept), but would be also good to see if OTel Java/Go can resurrect their existing bound instrument implementations. |
@jack-berg #4126 (comment) has shown the potential gains with bound instruments.. Wondering if that is what TC was looking for? |
@cijothomas I apologize for the delay. I wanted to be sure that we evaluate all the issues connected with this topic. I definitely support adding bound instruments to the Metric API, and I see it as one of several optimizations that an SDK could offer users to improve the cost of metrics instrumentation. Instead of a narrow focus on bound instruments, I would like to propose a more aggressive rewrite of the specification that gives each SDK broad permission to extend the metric API with alternatives tailored to specific needs in response to user feedback. At a high level, what I'm proposing says that the exact mechanics of the metric API do not matter at the specification level, because they are naturally language-specific in the extreme; idioms and practices of each language should govern metric API design. Relax the Metrics API specificationThe most important aspect of the Metric API/SDK specification, in my opinion, is the terminology and the data model. As an overview, I would attempt to relax the Metric API specification as follows, roughly:
I understand why people look to bound instruments -- it's a straightforward, established optimization for Metrics APIs. However, to use bound instruments requires a tight integration between instrumented code and the Metrics API; users have to inject stateful handles into their code in order to benefit, and the benefits are limited in cases where there is dynamic context. To me, these detractors make bound instruments a secondary optimization. I came to the proposal above after considering the thread started here: https://x.com/juliusvolz/status/1805523949260095672 in which an (IMO) unfair comparision between bound-style and unbound-style interfaces was shared without much context. One reason the comparison is unfair is that the OTel-Go Metrics API has made concessions in the API with optimization potential similar to bound instruments; the OTel-Go This has been a particular pain point for my internal users of OTel-Go. We migrated to OTel from a Statsd-flavored Metrics API, where instruments are not compiled and attributes are presented as key-value lists, and there's too much code to update every call site. In the Statsd-flavored API, attribute sets arrive as a key value list, that is the code path that matters to me. We should be able to convince ourselves that bound instruments fundamentally can't help with this problem, without my implementing essentially another Metric SDK. In order for the Statsd-flavored API to benefit from bound instruments, it has to construct a map from a list-of-key-values to bound instrument handles, which is what Metric SDKs are supposed to do. We should not offer bound instruments so that users can build sophisticated caches of bound instruments, we should provide SDKs with built-in sophisticated caches of bound instruments. Note that the Prometheus Go client, the library used for the comparison above, implements a sophisticated map from list-of-key-values to bound instruments based on "fingerprinting" the list to avoid memory allocations. The same could be done in an OTel-Go SDK, the same is done in the Lighstep Metrics SDK (bypassing the OTel API). Still, for the Stats-flavored API to benefit from fingerprinting, the input has to be list-of-key-value, not To illustrate how I would apply the rules above, some hypothetical examples. Hypothetical Example: List-of-key-value alternative APIAs described above, there is a lot of existing code that either wasn't written with bound instruments in mind or where there is no way to store a bound instrument, without creating a sophisticated cache keyed by list-of-key-values. The specification should not prevent a Metrics API from offering multiple input formats through multiple interfaces. The API could be extended with an alternative list-of-key-value interface, and the SDK would add an optimnized code path based on fingerprinting. A fallback mechanism would be provided for SDKs that do not recognize the alternative API, which in this case would construct the AttributeSet and call the basic API. Hypothetical Example: HistogramWithCountThere has been a request to add an optional Count argument to the Histogram instrument. While I agree with the motivation for this request, the rules above would require a new instrument to be created instead of overloading the existing histogram. The API could be extended with an instrument called HistogramWithCount, where the single operation ("Record") takes a second argument ("count"). A fallback mechanism would be provided for SDKs that do not recognize the alternative API, which in this case would repeatedly call a basic Histogram instrument count number of times. Hypothetical Example: Struct-tag attributesThe Segment.io Go Stats library library uses struct tags to convey attribute sets efficiently and idiomatically without memory allocations. If the OTel-Go community demanded it, and the maintainers agreed, the API could be extended to do this and the SDK could follow an optimized code path for these inputs. A fallback mechanism would be provided for SDKs that do not recognize the alternative API, which in this case would convert struct-tag metric events into basic metric events. Hypothetical Example: Multi-variate metrics APIAt times, especially when there is dynamic context, an application may be interested in computing one AttributeSet and then re-using it across a number of metric instruments for a simultaneous observation of a number of values. This was a feature of the OpenCensus Metric API that was de-scoped from OpenTelemetry which has no way to record multi-variate metrics. While the potential for optimization is good and I would support this proposal, it is probably at odds with the proposed specification changes above. There is semantic meaning in a multi-variable metric event, and the Metric API has no existing terminology for the pair of Instrument and Value, what OpenCensus and early drafts of OpenTelemetry called "Measurement". Even though a fallback mechanism could be provided for the multi-variate metric event into many basic metric events, this strays too far from the original specification and would require specification work. |
Bound instruments is the idea of obtaining a reference "bound" to a particular set of attributes which is known ahead of time. By doing so, you can record measurements directly to the bound reference and avoid an internal lookup.
For example, suppose I know ahead of time that I need to record measurements to attribute sets
{"color": "red", "shape", "square"}
,{"color": "red", "shape", "circle"}
,{"color": "blue", "shape", "square"}
,{"color": "blue", "shape", "circle"}
.Without bound instruments, code to record these series might look like:
Each time a measurement is recorded, the internals of the SDK must lookup the aggregated state corresponding to the attributes in a map.
With bound instruments, you can avoid this map lookup and improve the performance of record operations:
We should consider extending the metrics API to support this concept. I recently wrote a blog post comparing otel java metrics to other metrics systems in the java ecosystem (prometheus and micrometer), and one notable negative of otel was the lack of support for bound instruments. For java, avoiding the map lookup saves ~11 ns per record call on my machine.
Its worth noting that bound instruments serve a rather niche use case where: 1. The attribute sets are known ahead of time. 2. The application is very performance sensitive. If either of these is not true, then bound instruments are of limited value. (I previously argued that we should punt on support for bound instruments in the initial metrics API release for this very reason.) For example, some of the most popular metrics in otel are
http.server.request.duration
andhttp.client.request.duration
, and neither of these would be able to benefit from bound instruments because the attribute values are computed at runtime using application context.The text was updated successfully, but these errors were encountered: