Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Summary metric type #67

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

palash25
Copy link

Not ready for review yet, just opening a draft for visibility

Closes: #40

@palash25
Copy link
Author

sorry for the radio silence on this one, I was moving to another city and completely forgot about it, will try to get back to this next week.

@palash25 palash25 force-pushed the summary-metric branch 3 times, most recently from c4a57ff to 6458eb3 Compare November 20, 2022 14:26
@palash25 palash25 marked this pull request as ready for review November 20, 2022 14:27
@palash25
Copy link
Author

sorry i completely forgot about this PR @mxinden can you please take a look

Signed-off-by: Palash Nigam <[email protected]>
@palash25
Copy link
Author

i will add some comments in the summary.rs file and some doctests shortly

@gmossessian
Copy link

@palash25 do you need a hand here? I'm also wanting this metric, I started implementing it myself before I saw your PR. Looks like you still need to implement changes to proto.rs, can I help with this?

@mxinden in your opinion is anything besides this needed here before releasing?

@palash25
Copy link
Author

palash25 commented Dec 6, 2022

oh you are right, i missed the proto.rs file thanks for pointing it out, i will try to make the remaining changes in the next 2 days, if I can't i will ping you for help.

@palash25 palash25 marked this pull request as draft December 6, 2022 06:45
@palash25
Copy link
Author

palash25 commented Dec 6, 2022

moving this back to draft till i make the missing changes

@mxinden
Copy link
Member

mxinden commented Dec 14, 2022

@palash25 #105 introduced a fairly large change. I am sorry for the conflict. Let me know in case you need any help updating.

@palash25
Copy link
Author

sorry for the late response @mxinden the notification got buried. I tried to make the changes according to the new encoding API but I am stuck here.

Right now my code errors with this

error[E0282]: type annotations needed
   --> src/encoding.rs:145:15
    |
145 |             e.encode_summary(sum, count, quantiles)
    |               ^^^^^^^^^^^^^^ cannot infer type of the type parameter `S` declared on the associated function `encode_summary`
    |
help: consider specifying the generic argument
    |
145 |             e.encode_summary::<S>(sum, count, quantiles)
    |                             +++++

if I do as the compiler says and add this annotation e.encode_summary::<S>(sum, count, quantiles) all the tests start passing but then my code looks a bit inconsistent in comparison to other encode_* methods in the file.

Do you know what I might be doing wrong here?

@palash25 palash25 marked this pull request as ready for review February 2, 2023 13:02
@palash25
Copy link
Author

palash25 commented Feb 9, 2023

Hi @mxinden could you take a look at the comment? ☝🏽

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay here. Implementation of encoding in text and protobuf looks good to me. Couple minor comments.

Before I dive deeper into the implementation of Summary, can you expand on why you chose CKMS and how it differs to the golang implementation?

src/metrics/summary.rs Outdated Show resolved Hide resolved
src/metrics/summary.rs Outdated Show resolved Hide resolved
src/metrics/summary.rs Outdated Show resolved Hide resolved
@mxinden
Copy link
Member

mxinden commented Feb 24, 2023

Before I dive deeper into the implementation of Summary, can you expand on why you chose CKMS and how it differs to the golang implementation?

@beorn7 given that you wrote most of the client-golang Summary implementation and given your involvement in prometheus/OpenMetrics#256 I would appreciate your input here. Do you have opinions on which quantile algorithm to use?

@palash25
Copy link
Author

palash25 commented Feb 24, 2023

so about using CKMS, i went through both go and java libraries, the underlying quantile library that go uses has some inacurracies mentioned here https://github.com/beorn7/perks/blob/master/quantile/stream.go#L174 so I looked at the java client implementation and it seemed to be using a CKMS library https://github.com/prometheus/client_java/blob/c28b901225e35e7c1df0eacae8b58fdfbb390162/simpleclient/src/main/java/io/prometheus/client/TimeWindowQuantiles.java#L14 at the same time I was able to find a rust crate implementing CKMS so I just went ahead with it.

Thats all I can remember since it was a long time ago

how it differs to the golang implementation?

Not sure I remember the differences but this is what I wrote in my notes at the time which seems to be a similarity between the go library and the rust crate: "Rust crate uses the LowBiased invariant defined in the golang one quantiles/store.rs at master · blt/quantiles · GitHub"

@beorn7
Copy link
Member

beorn7 commented Feb 26, 2023

tl;dr: Use whatever algorithm you find appropriate. It would require a lot of research to assess all the algorithms out there and pick the "best" one for client_rust.

The algorithm client_golang uses was picked even before my time. I don't have strong opinions about it. It's perfectly possible that there are "better" algorithms out there. It has been a hot topic of research for a while, so there are many algorithms available. It's of course also hard to decide about the "best" algorithm because that surely depends on your use case.

There was an attempt to add a now algorithm or replace the existing one, see prometheus/client_golang#859 . It was closed for lack of follow-up when we tried to discuss the aspects in which it would be a better algorithm.

I do believe that the algorithm in client_golang is implemented correctly (i.e. as it is described in the paper). client_golang uses my fork, which includes fixes. The Merge method @palash25 has mentioned above is still broken, and it cannot be fixed, because the used algorithm doesn't support merging. But that method is also not used in client_golang.

Signed-off-by: Palash Nigam <[email protected]>
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @palash25 and @beorn7 for the input 🙏

Overall this looks good to me. I have a couple of smaller comments, nothing major. Let me know in case you need help with the feature flagging.

Comment on lines +82 to +85
pub fn observe(&self, v: f64) {
self.rotate_buckets();

let mut inner = self.inner.write();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both rotate_buckets and later on observe itself grab the write lock. The lock will be unlocked in between. Thus another thread could potentially execute in between. Does that violate any consistency guarantees. In other words does observe depend on rotate_buckets to be called right before?

See similar race condition in #102.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we blocked by this issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cz-cube definitely needs some investigation.

@@ -24,6 +24,7 @@ parking_lot = "0.12"
prometheus-client-derive-encode = { version = "0.4.1", path = "derive-encode" }
prost = { version = "0.11.0", optional = true }
prost-types = { version = "0.11.0", optional = true }
quantiles = "0.7.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that most users of this library don't need the Summary type. What do you think of feature flagging the type as well as the quantiles dependency. The latter will help keeping ones dependency tree lean, thus compile times short.

use super::{MetricType, TypedMetric};
use parking_lot::RwLock;
use std::sync::Arc;
use std::time::{Duration, Instant};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::time::Instant is problematic with wasm. Consider using https://crates.io/crates/instant instead.

}

/// Retrieve the values of the summary metric.
pub fn get(&self) -> (f64, u64, Vec<(f64, f64)>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn get(&self) -> (f64, u64, Vec<(f64, f64)>) {
pub(crate) fn get(&self) -> (f64, u64, Vec<(f64, f64)>) {

Is this needed outside of the encode module?

let inner = self.inner.read();
let sum = inner.sum;
let count = inner.count;
let mut quantile_values: Vec<(f64, f64)> = Vec::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of allocating, you could consider returning an Iterator instead. Though likely premature optimization without any proof that this is an issue.

quantile_streams: Vec<CKMS<f64>>,
// head_stream is like a cursor which carries the index
// of the stream in the quantile_streams that we want to query.
head_stream_idx: u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
head_stream_idx: u64,
head_stream_idx: usize,

Why not use a usize here as it is an index into a datastructure? You compare it to max_age_buckets, but that could be a usize as well.

@cz-cube
Copy link

cz-cube commented May 9, 2024

We want to observe multiple sample at once. Please consider adding fn observe(value: f64, times: usize).

@palash25
Copy link
Author

I am sorry for the inactivity here. I have not been in the best of health for the past few months. Please feel free to take over this PR and make whatever changes are required. You can even close and let someone else have a crack at it if they want to start from scratch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Summary metric
5 participants