-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] Rename db|messaging|gen_ai.system
to *.provider.name
, rpc.system
to rpc.protocol.name
#1613
base: main
Are you sure you want to change the base?
[DO NOT MERGE] Rename db|messaging|gen_ai.system
to *.provider.name
, rpc.system
to rpc.protocol.name
#1613
Conversation
schema-next.yaml
Outdated
- rename_attributes: | ||
attribute_map: | ||
feature_flag.provider_name: feature_flag.system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to extract it to another PR that we can merge right now to reduce churn in feature-flag area
9da95a2
to
60daa8c
Compare
@@ -33,7 +33,6 @@ with the naming guidelines for RPC client spans. | |||
|
|||
| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | | |||
|---|---|---|---|---|---| | |||
| [`rpc.system`](/docs/attributes-registry/rpc.md) | string | The value `aws-api`. | `aws-api` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe it's used correctly here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is, AWS SDK is modeled as remote call. There is #1622 to add aws_api
to the well known enums.
Anyhow, removing rpc.system
but keep rpc.method
and rpc.service
doesn't seem right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a remote call though - it's a logical call that uses HTTP underneath. OTel RPC semconv represent actual remote calls using remote calling protocols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that's more or less the same for GRPC which uses HTTP/2 as transport.
Anyhow, likely a topic for a different issue.
But still I think you should either keep or remove all rpc attributes here.
db|messaging|gen_ai.system
to *.provider.name
, rpc.system
to rpc.protocol.name
, feature_flag.system
back to feature_flag.provider_name
db|messaging|gen_ai.system
to *.provider.name
, rpc.system
to rpc.protocol.name
I have some opinions on this, which I don't expect to be accepted, but would like on the record even if dismissed. As genai is typically accessed from web service apis which are commodity and easier than the typical case of database protocols, I would prefer changes to be discussed independently. For the case of genai, I do strongly prefer whatever word we use for api (e.g. openai which is often emulated) to be decoupled from the provider of that. However, I would like this to advise to use the same constant even if in some cases the client might get it wrong. For example, don't say something like "azure inference client" as the system or provider name, for when it is being used to access azure openai service with the openai api. Especially a proxy client (litellm also has a proxy client library), should attempt to do more than say its name. The field will be used for both sides, and it seems a step back to end up with client SDK names in either of these fields, vs a web service api, possibly private provider of that. The edge case of local llms, e.g. running llama.cpp embedded, we can discuss, too, but most of the time this is about web services. more below if you like I think there is ongoing tension treating genai like database, when in practice they are typically accessed via web services apis and so more like a normal cloud service than a database. This becomes even more the case as more and more are instrumented server side, or as proxies. For example, while there are indeed some L7 proxies for database protocols, it is a lot more common and easier to do on genai as most if not all well known services are web services. So, there are backends who instrument the http server side like a normal private cloud product, as well as proxies who instrument both ways today. So, I would like to decouple the topic of database from genai, especially this sense of provider being implied as a client identifier. I'm not sure "provider" or "system" for that matter intuitively mean "client perceived name" especially when the attribute isn't namespaced as client and also used for server data (in practice even if only in genai metrics are the only server thing so far). I cofounded a project back in 2009 called jclouds which has very much overlap on the problem of api vs provider of that. When the community got together, probably the single best feature we did was formalizing "api" independent of "provider" of that. So, while I do like the idea of using provider, I would like it to be more strong, so it at least suggests to use an authoritative name for the actual service as opposed to a client SDK name, especially in SDKs that target multiple services. |
@codefromthecrypt can you suggest what'd you like to be changed/added ? |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my suggestions, and most are limited to gen_ai, however one isn't.
Generically:
- use less dots: unless we expect provider.something_else just call it provider. Similar for protocol.name. e.g.
protocol
instead ofprotocol.name
.
In GenAI:
- Correct the text about provider as it has a very indirect relationship with models in many cases. For example, if it is about models, rename it to model_family or something. Concretely, many LLM providers host model families owned elsewhere.
- The "Provider" concept should be the, possibly off the shelf, platform in use, for example OpenAI (the platform) vs Azure OpenAI vs xAI vs ollama. In some cases, the client won't know this without looking at the hostname.
- As many are emulating apis, and the client may not be able to guess the "provider", or choose not to based on hostname etc. use "api" to establish that. For example, commonly for at least several systems it can be "openai" even if the provider is not that. Similarly, on a proxy, you may have an inbound "openai" compatible request despite the provider being that proxy.
If these changes aren't specific enough, @lmolkova lemme know and I'll try again!
@codefromthecrypt this PR is limited to renaming You're welcome to propose any additional clarifications, caveats and changes to |
@lmolkova renaming Hope this helps! |
specifically I think it will be the "api" that has a version associated with it, which would be something we can resolve elsewhere. The provider itself doesn't need to add dots in defense of adding a version to it. I would prefer it simple and us take the api/ api version stuff in another PR. Make sense? |
@codefromthecrypt thanks for the suggestion. I agree provider makes sense. Adding |
…ider.name, pc.system to pc.protocol.name, feature_flag.system back to feature_flag.provider_name
60daa8c
to
ad77c8f
Compare
@@ -0,0 +1,5 @@ | |||
change_type: breaking | |||
component: db, messaging, gen_ai, rpc | |||
note: Rename `db|messaging|gen_ai.system` to `db|messaging|gen_ai.provider.name`, `rpc.system` to `rpc.protocol.name` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this comment here for lack of a better place and so it can be marked resolved when appropriate.
I think I prefer the word "system" over "provider".
When I read "db.provider.", "messaging.provider.", "gen_ai.provider.*", my first reaction is: provider of what? Provider of the db? No.. the concept we're trying capture is the "The database management system (DBMS) product as identified by the client instrumentation." Provider of the messaging? That doesn't seem right. Provider of the gen_ai? In all these cases we're trying to capture client's understanding of the type of system its communicating with.
"provider" is also strange when you consider expanding into the server side. In #1581 I proposed we may one day want to add an attribute representing the version of the db system, which would yield db.provider.version
. db.system.version
is more intuitive to me.
The word "system" seems just generic enough to be applicable to a broad set of domains. Its intuitive for a "system" to have name and version attributes describing it. Its not obvious to me what a "provider" is in the context of different domains, and what a provider's name and version attributes are describing. (I.e. why does a provider need a name and version in the first place?)
The use of "protocol" for the rpc domain is an interesting divergence. The "protocol" according to the client's POV is really the thing we're capturing in all these cases. I.e. a db client may report "postgres" even though the system behind the scenes it not postgres but has added postgres compatibility - "postgres" is just the protocol. This suggests we could go in a different direction and use a client-oriented naming convention, choosing a word that makes it abundantly clear that this is the name of the protocol as far as the client is concerned. But if we did this, we'd have invent new attributes to refer to the server side of things and correlation would suffer.
That's another reason I like the word "system". It seems to make enough intuitive sense from both a client and server perspective that we can get away with using the same word in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me system is better if we are trying to fit a name to multiple domains, however it is clear as mud here. It is used for a model family, and api and a platform. When we use "provider" it fits into "LLM provider" but then again this namespace is likely to be used for things besides pure LLM proxies.
I have seen "cloud provider" used, and it has potentially multiple apis, and that's the salient point of being not terrible for genai. Most services are web services even if they aren't always a cloud platform. platform could be a better word.
In any case, the concrete problem I've found that should be changed in genai isn't about system not having a .name suffix, rather system being used for unclear reasons. If it were broken down into what impacts things, it would be time better spent, vs justifying a global rename.
system ~= platform, cloud, etc, e.g. openai (the platform), xai, etc
api ~= the api used on that system which hints at the SDK, e.g. openai (used by many) sometimes others also re-use non-open apis. Even openai has a v2 now, and soon that's likely to be emulated! Also, some systems you could argue the invocation style matters. e.g. bedrock has a converse
api which is more portable than its other ones. Long story short, the api has a lot more structured info needed, than the name of the system. an end user could convince me easily otherwise.
model_family ~= if we really want to know the model family separate from the model requested and responded with, we could make a bucket for that, but conflating model family with system and api breaks down quickly.
I know I'll be told to make separate issues on these, etc, but I think those should happen before doing things for global namespaces as some of these change the motivation entirely. I also think these things should be led by the leads, I'm not even an approver, so take it for what it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me system is better if we are trying to fit a name to multiple domains, however it is clear as mud here. It is used for a model family, and api and a platform
Yeah system (or whatever the word) is overloaded to refer to multiple things, and semantics vary based on whether we're talking about client or server's perspective. Tradeoff between distinct attributes for distinct concepts, or reusing attributes in different contexts with reduced conceptual purity.
I have seen "cloud provider" used, and it has potentially multiple apis, and that's the salient point of being not terrible for genai. Most services are web services even if they aren't always a cloud platform. platform could be a better word.
The "provider" in "cloud provider" probably refers to the many services or apis provided by the cloud. Services being things like ec2, or rds, or gke. Applying "provider" to domains like messaging, database, etc doesn't seem right from this perspective. I don't mind platform - it has some of the same properties as system.
system ~= ...
api ~= ...
Whether or not we should differentiate between these is the tradeoff I was referring to above. Also, API might be the right word in genai, but seems like protocol is a better word in database, messaging, and rpc (if we do want to differentiate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think you are making my most important point, which is that we shouldn't make a change that couples db|messaging|gen_ai and maybe also rpc. We shouldn't have to defend one of these in the middle of a big change on other things. The pressure isn't helpful to making the model clear, even if the pressure is helpful for less PRs and more naming consistency regardless of domain it applies to.
TL;DR; These should be individually discussed and not assumed they should converge or be pressured to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR; These should be individually discussed and not assumed they should converge or be pressured to.
That seems right, since its unlikely that we can find a single word which is the best across all domains. But I think what all these domains have in common is whether or not they should all follow this advice:
I have an arbitrary (and optional) rule of thumb for attribute names to follow {domain}.{thing}.{a property of the thing} pattern.
In all these domains, the current word "system" is being used as "{a property of the thing}" when the current semantics are actually more of a "{thing}".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the "system" point, when in a hole, stop digging. We should leave it alone until a conceptually sound alternative is made and in the case of genai it is likely multiple attributes vs multiple sub-attributes one being .name.
That's because breaking once to rename system to whatever.name, then later correct the real conceptual problem, so put it into a different X.name, that's two hard breaks.
Leaving system alone until the correct destination of the values are found is less breaks, less arduous work for folks who may not even see this PR, only find out later everything is broken and you have to re-do your indices, metrics, alerts etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should totally minimize churn. If we're going to make a breaking change, let's make sure we have the best possible shot at getting it right this time around.
To be fair, this PR is annotated with "do not merge" in the title and has language indicating it will be open for a while to encourage feedback from a wide array of people. I think this is a tactic to minimize churn.
Fixes #1581
Warning
THIS IS A VERY-VERY-VERY BREAKING CHANGE (even though it affects experimental attributes only).
Backends use presence of these attributes as an indication that corresponding spans follow certain conventions.
*.provider.name
*.provider.version
and similar in the future, especially when defining server-side conventionsThere used to bemoved to Renamefeature_flag.protocol_name
, so we can't change it tofeature_flag.protocol.name
yet due to const name collision Implement code-generation hints to drop/rename attributes in case of a collision #1462feature_flag.system
back tofeature_flag.provider_name
#1614rpc.protocol.name
is more precise for RPC where it captures different application-or-higher-level protocol that may work on top of another application protocol (grpc/thrift over http/2)Despite being problematic, we believe this change is necessary to accommodate future extension of semantic conventions for
*.system.*
and this is the last chance to make this attribute name right before we declare any of these attributes stable.Merge requirement checklist
[chore]
Given potentially high impact of this change and also upcoming holiday season, we'll need to keep it open for a while to collect the feedback