-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
profiles: update default value field in message Profile #608
Conversation
To indicate the default kind of the sample type, the pointer should point to the array of `sample_type`. This avoids issues where the string that is pointed to by `default_sample_type_strindex` does not match with the value types in `sample_type`. ping @open-telemetry/profiling-maintainers @jhalliday
@@ -269,6 +265,9 @@ message Profile { | |||
// The field is optional, however if it is present then equivalent converted data should be populated in other fields | |||
// of this message as far as is practicable. | |||
bytes original_payload = 21; | |||
|
|||
// Index into the sample_type array for the type of the default sample value. | |||
int32 default_sample_type_index = 23; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the discussion in #606 (comment) this field ID could also be 16. 23 is chosen here, as this is the next free field ID once #606 is merged.
@@ -269,6 +265,9 @@ message Profile { | |||
// The field is optional, however if it is present then equivalent converted data should be populated in other fields | |||
// of this message as far as is practicable. | |||
bytes original_payload = 21; | |||
|
|||
// Index into the sample_type array for the type of the default sample value. | |||
int32 default_sample_type_index = 23; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This avoids issues where the string that is pointed to by default_sample_type_strindex does not match with the value types in sample_type.
Did you see this issue in practice? It's not very obvious to me that this change is worth it:
- The distinction between _index and _strindex can be subtle to some producers and consumers, causing bugs potentially.
- The index can still be out of bound and needs care similar to the string. Arguably, bugs around not adjusting the index can be more subtle - e.g. if certain profile processing drops some sample types but does not adjust the default sample type, it may be easier to notice and debug that if the default type is a string.
Also note that this changes the semantics of unset default sample type compared to pprof: here the unset field will effectively mean first sample type, in pprof that means last:
// Index into the string table of the type of the preferred sample
// value. If unset, clients should default to the last sample value.
int64 default_sample_type = 14;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The named issues around _index
vs _strindex
and indexing issues of these fields can be argued about every _strindex
and _index
field in the protocol and therefore are not specific to this field.
Unfortunately there are implementations, where default_sample_type
pointed to microseconds
(in the string table) and the unit in sample_type
was in nanoseconds
. Which one has the precedence in such a case?
If default_sample_type_index
is an index into sample_type
then such issues will not happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that example it seems odd that the sample type name had the unit. I'd expect the name of the sample type to be "time" or something like that, and that's what the default sample type field would also contain. And microseconds or nanoseconds would be the unit field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I'm mixing something. The value in sample_type
was [["cpu","nanoseconds"]]
- just like the example in the documentation to sample_type
. While default_sample_type
pointed to a value in the string_table
that referenced microseconds
.
So to me the values in sample_type
and default_sample_type
are valid by themselves. But as the unit (nanoseconds
) in sample_type
conflicts with the value in default_sample_type
it is hard to interpret the profile in the correct way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_sample_type_strindex
is supposed to point to one of ValueType.type_strindex
values.
For example, for sample types of [{"cpu_time","nanoseconds"}, {"num_samples", "count"}]
, default_sample_type_strindex
can point to the string index for cpu_time
or num_samples
, but not to nanoseconds
or count
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your example :) I think it points out exactly the issue with this field. While I run into issues with nanoseconds
vs microseconds
your example set it even to something different.
If one looks at the data one can usually differentiate and interpret data correctly. But such differences make it hard to build tooling for automation and potential recommendation.
For that reason, default_sample_type_strindex
should become default_sample_type_index
and point to an element in sample_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the unit name is clearly not unique - two or more field can use the same unit (e.g. bytes). I'm not sure why this is ambiguous tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my motivation with this proposal is to avoid named issues in the future by pointing the index for the default value to the elements that already exist in sample_type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have opened #610 as an alternative and will close this proposal.
An alternative is proposed with #610 |
To indicate the default kind of the sample type, the pointer should point to the array of
sample_type
. This avoids issues where the string that is pointed to bydefault_sample_type_strindex
does not match with the value types insample_type
.ping @open-telemetry/profiling-maintainers @jhalliday