-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] Rename db|messaging|gen_ai.system
to *.provider.name
, rpc.system
to rpc.protocol.name
#1613
Open
lmolkova
wants to merge
3
commits into
open-telemetry:main
Choose a base branch
from
lmolkova:rename-system-to-provider-name
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
[DO NOT MERGE] Rename db|messaging|gen_ai.system
to *.provider.name
, rpc.system
to rpc.protocol.name
#1613
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
change_type: breaking | ||
component: db, messaging, gen_ai, rpc | ||
note: Rename `db|messaging|gen_ai.system` to `db|messaging|gen_ai.provider.name`, `rpc.system` to `rpc.protocol.name` | ||
issues: [1581, 1613] | ||
|
||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this comment here for lack of a better place and so it can be marked resolved when appropriate.
I think I prefer the word "system" over "provider".
When I read "db.provider.", "messaging.provider.", "gen_ai.provider.*", my first reaction is: provider of what? Provider of the db? No.. the concept we're trying capture is the "The database management system (DBMS) product as identified by the client instrumentation." Provider of the messaging? That doesn't seem right. Provider of the gen_ai? In all these cases we're trying to capture client's understanding of the type of system its communicating with.
"provider" is also strange when you consider expanding into the server side. In #1581 I proposed we may one day want to add an attribute representing the version of the db system, which would yield
db.provider.version
.db.system.version
is more intuitive to me.The word "system" seems just generic enough to be applicable to a broad set of domains. Its intuitive for a "system" to have name and version attributes describing it. Its not obvious to me what a "provider" is in the context of different domains, and what a provider's name and version attributes are describing. (I.e. why does a provider need a name and version in the first place?)
The use of "protocol" for the rpc domain is an interesting divergence. The "protocol" according to the client's POV is really the thing we're capturing in all these cases. I.e. a db client may report "postgres" even though the system behind the scenes it not postgres but has added postgres compatibility - "postgres" is just the protocol. This suggests we could go in a different direction and use a client-oriented naming convention, choosing a word that makes it abundantly clear that this is the name of the protocol as far as the client is concerned. But if we did this, we'd have invent new attributes to refer to the server side of things and correlation would suffer.
That's another reason I like the word "system". It seems to make enough intuitive sense from both a client and server perspective that we can get away with using the same word in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me system is better if we are trying to fit a name to multiple domains, however it is clear as mud here. It is used for a model family, and api and a platform. When we use "provider" it fits into "LLM provider" but then again this namespace is likely to be used for things besides pure LLM proxies.
I have seen "cloud provider" used, and it has potentially multiple apis, and that's the salient point of being not terrible for genai. Most services are web services even if they aren't always a cloud platform. platform could be a better word.
In any case, the concrete problem I've found that should be changed in genai isn't about system not having a .name suffix, rather system being used for unclear reasons. If it were broken down into what impacts things, it would be time better spent, vs justifying a global rename.
system ~= platform, cloud, etc, e.g. openai (the platform), xai, etc
api ~= the api used on that system which hints at the SDK, e.g. openai (used by many) sometimes others also re-use non-open apis. Even openai has a v2 now, and soon that's likely to be emulated! Also, some systems you could argue the invocation style matters. e.g. bedrock has a
converse
api which is more portable than its other ones. Long story short, the api has a lot more structured info needed, than the name of the system. an end user could convince me easily otherwise.model_family ~= if we really want to know the model family separate from the model requested and responded with, we could make a bucket for that, but conflating model family with system and api breaks down quickly.
I know I'll be told to make separate issues on these, etc, but I think those should happen before doing things for global namespaces as some of these change the motivation entirely. I also think these things should be led by the leads, I'm not even an approver, so take it for what it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah system (or whatever the word) is overloaded to refer to multiple things, and semantics vary based on whether we're talking about client or server's perspective. Tradeoff between distinct attributes for distinct concepts, or reusing attributes in different contexts with reduced conceptual purity.
The "provider" in "cloud provider" probably refers to the many services or apis provided by the cloud. Services being things like ec2, or rds, or gke. Applying "provider" to domains like messaging, database, etc doesn't seem right from this perspective. I don't mind platform - it has some of the same properties as system.
Whether or not we should differentiate between these is the tradeoff I was referring to above. Also, API might be the right word in genai, but seems like protocol is a better word in database, messaging, and rpc (if we do want to differentiate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think you are making my most important point, which is that we shouldn't make a change that couples db|messaging|gen_ai and maybe also rpc. We shouldn't have to defend one of these in the middle of a big change on other things. The pressure isn't helpful to making the model clear, even if the pressure is helpful for less PRs and more naming consistency regardless of domain it applies to.
TL;DR; These should be individually discussed and not assumed they should converge or be pressured to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems right, since its unlikely that we can find a single word which is the best across all domains. But I think what all these domains have in common is whether or not they should all follow this advice:
In all these domains, the current word "system" is being used as "{a property of the thing}" when the current semantics are actually more of a "{thing}".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the "system" point, when in a hole, stop digging. We should leave it alone until a conceptually sound alternative is made and in the case of genai it is likely multiple attributes vs multiple sub-attributes one being .name.
That's because breaking once to rename system to whatever.name, then later correct the real conceptual problem, so put it into a different X.name, that's two hard breaks.
Leaving system alone until the correct destination of the values are found is less breaks, less arduous work for folks who may not even see this PR, only find out later everything is broken and you have to re-do your indices, metrics, alerts etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should totally minimize churn. If we're going to make a breaking change, let's make sure we have the best possible shot at getting it right this time around.
To be fair, this PR is annotated with "do not merge" in the title and has language indicating it will be open for a while to encourage feedback from a wide array of people. I think this is a tactic to minimize churn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have multiple "problems" that we are trying to solve in a single PR/discussion and it will only get harder to arrive at any conclusion if we continue like this. To break it down, I see here these:
We want to allow extending
*.system.
later, so need to rename to*.system.name
We are not sure whether
*.system
here is the best terminology for all areas it is being used todayMy opinion is:
Ad 1: I'm fine with renaming and extending to add
.name
to whatever terminology we pick. Yes, it is a breaking change but we decided this long ago and have been doing it for other attributes. This is the time to break things.Ad 2: I tend to agree with folks that
provider
does not make much sense. I think*.system
for both messaging and database are much clearer than changing them toprovider
. I agree with @jack-berg provider of what? If I'm using a SaaS database service, and I see*.provider
on a span, I'd think there I would see something likeAzure
,AWS
and so on. That's the entity/company that is providing this service to me.Whereas
*.system
is the actual technology that a client application is communicating with. To me this is much clearer and can also be used just as well for server instrumentation.This opens then the question that if we need to make it consistent for all places. I undertand and would value also reaching a consistent conclusion, but maybe we can't and we don't actually need. I agree that maybe consistency for the sake of consistency could turn out to do more harm than good.
The
rpc
one is a clear example that it benefits from a different "nomenclature" and protocol makes much more sense there thansystem
. Perhaps then we should think on what to do withgen_ai
.