Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support optional query param "spanKind" when get operations for given service #1920

Open
2 of 5 tasks
guo0693 opened this issue Nov 12, 2019 · 13 comments
Open
2 of 5 tasks
Labels
help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@guo0693
Copy link
Contributor

guo0693 commented Nov 12, 2019

Requirement - what kind of business use case are you trying to solve?

We want to filter the operations of a service by span kind, those filtered operations can be used for features e.g. get dependencies graph for service:operation pair from server side (spanKind == "server")

Problem - what in Jaeger blocks you from solving the requirement?

Currently we only save operations indexed by service name and we can only query operations by service name. We will need to add extra info: spanKind when saving operations for a service.

Proposal - what do you suggest to solve the problem or improve the existing situation?

API changes:
Restful:

  • GET /api/services/{serviceName}/operations :
    We will keep this endpoint unchanged for backward compatibility, it will still return all operations as a list of string for a given service.

  • GET /api/operations?service={service}&spanKind={spanKind}
    We will update this endpoint to return list of operation metadata for a given service whose spanKind matched the query parameter. If spanKind is empty, then return all operations of a service.
    Sample response:

{
  "data": [
    {
      "name": "A",
      "spanKind": "client"
    },
    {
      "name": "B",
      "spanKind": "server"
    }
  ],
  "total": 2,
  "limit": 0,
  "offset": 0,
  "errors": null
}

GRPC:
rpc GetOperations(GetOperationsRequest) returns (GetOperationsResponse)
Add span_kind to the rpc request and response

message GetOperationsRequest {
  string service = 1;
  string span_kind = 2;
}

message Operation{
    string name = 1;
    string span_kind = 2;
}

message GetOperationsResponse {
  repeated OperationMeta operations = 1;
}

Interfaces:
spanstore:

...
// Get operations by service and spanKind, empty spanKind means get operations for all kinds of span
GetOperations(ctx context.Context, query *spanstore.OperationQueryParameters) ([]*storage_v1.OperationMeta, error)
..

Span reader will need to take extra parameter to query operations by service name and span kind. All the storage plugins needs to be updated so that:

  • write operation index with service name and span kind
  • query operation index by service name and span kind, return all kinds of operations when span kind is empty.

Task List

Any open questions to address

@burmanm
Copy link
Contributor

burmanm commented Nov 13, 2019

Couple of questions:

  • What happens to old data? It can't be queried this way if the information doesn't exist.
  • What defines the kind? The writing client? I don't see any links to the client changes, so is this something coming from for example OpenTelemetry?
  • Why not just a tag?

And of course tasteful thing, but "spanKind" inside Process doesn't really feel descriptive to me. Also, what is a server and what is a client? Is a request made by the first server to next server a client or a server?

@objectiser
Copy link
Contributor

@guo0693 Agree with @burmanm - why just restrict to spanKind, when can support a list of tags, of which span.kind could be one that is queried?

@guo0693
Copy link
Contributor Author

guo0693 commented Nov 13, 2019

Couple of questions:

  • What happens to old data? It can't be queried this way if the information doesn't exist.
  • What defines the kind? The writing client? I don't see any links to the client changes, so is this something coming from for example OpenTelemetry?
  • Why not just a tag?

And of course tasteful thing, but "spanKind" inside Process doesn't really feel descriptive to me. Also, what is a server and what is a client? Is a request made by the first server to next server a client or a server?

  • For existing data, spanKind will be empty and should be queried by "GET /api/operations?service={service}". If necessary, we can create migration script for specific storage plugin to manually add "" spanKind for each operation. We use cassandra at our company and when we update table schema to add a new column, no extra action needs to be taken.
  • span writer will get spanKind from "span.kind" tag, if the tag is missing, then just set to empty string. And yes, users who instrument their code using jaeger-client can define spanKind when create a span.
  • It's value comes from span.kind tag. Thinking from schema point of view:
service_name | operation_name | span_kind
--------------------------------------------
"serviceA" | "operationA"|"server"

Are you proposing we should have below schema?

service_name | operation_name | tag
--------------------------------------------
"serviceA" | "operationA"|"span.kind=server"

@objectiser currently the UI only need span.kind to get a list of server operations to enrich dropdown list in the UI. Not sure if you also suggest a tag column with possible value to be "{tagkey}={tagvalue}", and in the span writer side maybe define a list of indexableOperationTags, for each of those indexable tag in a span, insert a row in the table.
"serviceA" | "operationA"|"{tag1=val1}"
"serviceA" | "operationA"|"{tag2=val2}"

  • span.kind doesn't have a clearly defined restrictions. User can define their kind too, e.g. "local", "test" or whatever. But we have some instrumented frameworks e.g. grpc/tchannel/http which will generate spans for example: client1 --tchannel--> server1:operation1 --tchannel-> server2:oepration2
  • tchannel clientSpan from client1
  • tchannel serverSpan from server1
  • tchannel clientSpan from server1
  • tchannel serverSpan from server2

@objectiser
Copy link
Contributor

@guo0693 Yes I think it would be better (more general) if was a tag at least (maybe not list for now) - rather than just implementing something for this specific use case.

@yurishkuro
Copy link
Member

There are certain arguments in favor of doing this as a general extension with arbitrary tags, but there are also a lot of arguments against that:

  1. Do we know of any real use cases when tags other than spanKind would be useful? Remember that this specific API is primarily used to populate the list of operations for a given service. We don't have any plans to support any search capabilities over that data set
  2. The operationName corresponds to a class of spans (which is why we always insist on it being low cardinality). The span.kind tag belongs to the same category, but many other tags do not, instead they describe a specific instance of that class of spans (e.g. error=true tag).
  3. span.kind is prescribed by the OpenTracing specification to be always recorded, as a way of classifying the spans. Most other tags are not, or they are specific to certain types of spans, like http.* tags. OpenTelemetry goes even further and requires span kind as a span constructor argument (same as span name).
  4. Because span kind is so standardized, it can be included in the index automatically. For any other tag, we would need user-controlled configuration to tell us which tags to include (error=true would never make sense in that index).
  5. Implementing the storage of this data with a list of tags vs. a top-level spanKind field is a lot more complicated. For example, the writers have a dedicated cache used to avoid writing operation names all the time, since it's a fairly static data set. With a list of tags this cache gets more complicated.

Finally, the way the PR itself is shaping, we're extending the operation name in the index from a simple string to a struct. This will allow adding arbitrary tags functionality in the future if the compelling use case is found. My preference would be to wait for such use case to materialize before paying the cost of higher complexity, and only implement support for span kind now as a dedicated field, because of a special role it plays.

@burmanm
Copy link
Contributor

burmanm commented Nov 13, 2019

Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't error=true not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.

To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?

@guo0693
Copy link
Contributor Author

guo0693 commented Nov 13, 2019

Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't error=true not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.

To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?

To query traces that end up in failure is a valid query, which I believe currently searching traces by tags is already supported.

But this issue is searching operations based on service and spanKind. We have an ask from UI to only return service-operations from server spans so that when user select a service from the DDG page the operations dropdown list only shows server operations.

@burmanm
Copy link
Contributor

burmanm commented Nov 13, 2019 via email

@yurishkuro
Copy link
Member

yurishkuro commented Nov 13, 2019

@burmanm I cannot find this ticket now, but it's somewhere in this repo or the UI where people complained

"why do you I see all those weird span names in the Operations dropdown on the Search page? I only want to see the endpoints of my service, not all the internal / client span names".

That's what this feature is fixing - to be able to separate operations into "endpoints" vs. "the rest".

@objectiser
Copy link
Contributor

My main objection was related to API design/consistency and mapping onto the model - service, operation and tags are first class concepts, whereas span.kind is not (i.e. it is a specific type of tag).

However given the transition to OpenTelemetry, and that span kind is a top level span field now (instead of an attribute/tag), I am ok with the approach.

@yurishkuro
Copy link
Member

@guo0693 I don't think we addressed one design point, I thought I raised it somewhere in the comments.

Do we really need to support query by kind in the storage? Yes, storage needs to return the spanKind, but why can't it just return ALL operations for a service? If we did not have this requirement, then the gRPC API change could be 100% backwards compatible, the existing storage plugins would return a list of op names in field 1, and after they are upgraded they would start returning the list of Operation structs in a new field 2. And there would be a lot lest implementation changes in the storage engines, only to insert and retrieve an extra field, but not build any additional filtering. The filtering can be done by the QuerySvc.

@guo0693
Copy link
Contributor Author

guo0693 commented Dec 2, 2019

Yes, storage needs to return the spanKind, but why can't it just return ALL operations for a service? If we did no

As of now, cassandra implementation did the filtering in storage layer using cql . If badger & ES does not support filtering in db query, we can put the filtering logic in a util class and retrieve operations of all span.kind from db and apply the filtering

@yurishkuro
Copy link
Member

Please add a ticket & coordinate making the UI change to use the new endpoint on the Search screen and display the span kind. I recommend adding it as a prefix, e.g. (server) getUser, and ordering the entries in the dropdown so that server and consumer spans are shown first.

Also add a ticket to remove the old endpoint once the UI is upgraded.

@yurishkuro yurishkuro added help wanted Features that maintainers are willing to accept but do not have cycles to implement good first issue Good for beginners and removed good first issue Good for beginners labels Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
Development

No branches or pull requests

4 participants