Define what behavior is expected when X-B3-TraceId is present, but not X-B3-SpanId #3
X-B3-TraceId must be present with X-B3-SpanId to affect the trace identifiers. If they aren't specified together, whichever is present will be ignored and a new trace and span id will be provisioned.
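The rule above can be sketched as a small extraction function. This is a minimal Python sketch, not any tracer's real code. It assumes the headers arrive as a plain dict using exactly these header names (real HTTP header lookup is case-insensitive), that IDs are 64-bit lower-hex strings, and that the receiver reuses the propagated IDs as-is. `new_id` and `extract_strict` are hypothetical names.

```python
import secrets
from typing import Mapping, Optional, Tuple


def new_id() -> str:
    """Return a random 64-bit ID encoded as 16 lower-hex characters."""
    return secrets.token_hex(8)


def extract_strict(headers: Mapping[str, str]) -> Tuple[str, str, Optional[str]]:
    """Return (trace_id, span_id, parent_id) to use on the receiving side.

    X-B3-TraceId and X-B3-SpanId are honored only as a pair; if either is
    missing, whichever one is present is ignored and a new trace is started.
    """
    trace_id = headers.get("X-B3-TraceId")
    span_id = headers.get("X-B3-SpanId")
    if trace_id and span_id:
        # Both present: continue the caller's trace, reusing the propagated IDs.
        return trace_id, span_id, headers.get("X-B3-ParentSpanId")
    # Half a context is treated like no context: provision new trace and span IDs.
    return new_id(), new_id(), None
```

With this policy, `{"X-B3-TraceId": "463ac35c9f6413ad"}` alone yields a brand-new trace ID, so the caller's trace is not continued.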
PS: I know many folks aren't watching this repository, so I'll do a one-time spam for B3-compliant tracer authors. The aim is to capture B3 here, and ideally also an integration test for it (maybe crossdock). I hope some of you will choose to watch this repository (to make sure things make sense) @nicmunroe @felixbarny @yurishkuro @jcarres-mdsol @abesto @eirslett @kristofa @michaelsembwever @kevinoliver @mosesn @marcingrzejszczak @prat0318 @devinsba @basvanbeek @schlosna @ewhauser @klingerf
Related issue: #4
Apologies in advance if I'm missing something and the following is dumb. In the case of receiving a span ID but no trace ID, I can understand throwing stuff away and creating new IDs, as I don't see any benefit to honoring a span ID without knowing what trace it's attached to. But a trace ID received without a span ID is still potentially useful info: it's true that you wouldn't know which span ID to use as the parent, but just knowing that a span is part of a given request, even if it's "detached", is potentially useful information in my mind. This would only really come up in two cases:
I'm probably missing some important points about "why". Let me know what you think.
Honoring the trace when present makes sense to me, too, whether or not a span is present.
Prologue: a reverse-documentation effort smokes out concerns in the original implementation. So let me first mention that documenting what has been the case for years doesn't mean I agree with it, or that I decided it. For things like this in Zipkin, I usually take the stance of first documenting what exists, then, if it's important enough, helping push the many parties to change. The latter is very expensive even when people agree.

On requiring both, and not accepting degraded propagation: the coupling has been required since its original commit in 2011 (twitter/finagle@8939f31). Perhaps @johanoskarsson remembers why; I wouldn't. My 2p is that it's reasonable not to accept broken instrumentation, especially if you have no defined way to log "broken". We do have a way now, via the "error" annotation. The way I see it, people can choose to start a new span given broken propagation, but if we want to make this helpful, we should probably report that it is broken. One thing that has come up (for example, reported by @kristofa) is that misleading traces are worse than no trace. One way we could do this is to add either the existing "error" annotation or a newly defined broken-instrumentation annotation.

@nicmunroe on part 2: if the user can send a trace ID, they can also send a span ID, right? Let me know if #4 (comment) doesn't cover that.
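The "start a new span, but report that propagation is broken" option could look roughly like this. It is a minimal sketch that assumes a hypothetical `Span` type with a string tag map. The `error` key mirrors the "error" annotation mentioned above, but the message text and every name here are illustrative assumptions, not a defined Zipkin convention.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass
class Span:
    # Hypothetical minimal span model; not a real Zipkin or Brave type.
    trace_id: str
    span_id: str
    tags: Dict[str, str] = field(default_factory=dict)


def new_id() -> str:
    """Return a random 64-bit ID encoded as 16 lower-hex characters."""
    return secrets.token_hex(8)


def extract_and_report(headers: Mapping[str, str]) -> Span:
    trace_id = headers.get("X-B3-TraceId")
    span_id = headers.get("X-B3-SpanId")
    if trace_id and span_id:
        # Well-formed propagation: join the caller's trace.
        return Span(trace_id, span_id)
    span = Span(new_id(), new_id())
    if trace_id or span_id:
        # Exactly one header arrived: start over, but flag the broken
        # propagation so it can be found and fixed later.
        missing = "X-B3-SpanId" if trace_id else "X-B3-TraceId"
        span.tags["error"] = f"broken B3 propagation: missing {missing}"
    return span
```

A request with no B3 headers at all is an ordinary new trace and gets no tag; only half a context is reported.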
@yurishkuro I think you have done some work in analyzing tracers. Do you have a means to tag bad or out-of-date instrumentation? For example, in Finagle the version is logged, so if there's an issue with one version, you could analyze on that.
We do log the version of the tracing client library, but we do not bother processing malformed traces from the wire; we just start a new one. It's a bit different in our case because, aside from baggage, all of the span context is encoded as a single string in a single header, so the "have trace ID, don't have span ID" case is very unlikely to happen (unless the header itself got partially chopped off by some HTTP proxy, but then we just won't parse it at all).
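With a single-header encoding, the whole context either parses or it doesn't, which is why the half-context case rarely arises. The sketch below assumes a hypothetical colon-delimited value `trace:span:parent:flags` made of lower-hex fields. It is illustrative only and is not claimed to be the exact wire format of any particular tracer.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical field syntax: 1 to 32 lower-hex characters.
HEX_FIELD = re.compile(r"[0-9a-f]{1,32}")


class Context(NamedTuple):
    trace_id: str
    span_id: str
    parent_id: str
    flags: str


def parse_single_header(value: Optional[str]) -> Optional[Context]:
    """All-or-nothing parse: a missing, malformed or truncated value yields None."""
    if not value:
        return None
    parts = value.split(":")
    if len(parts) != 4:
        return None  # e.g. the header was chopped off by a proxy
    if not all(HEX_FIELD.fullmatch(p) for p in parts):
        return None
    return Context(*parts)
```

A caller that gets `None` simply starts a new trace, so there is no partially honored context to reason about.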
@adriancole it does cover responsible callers who do their research before integrating, but more than once a time-crunched team has assumed they know what's required and ended up sending only the trace ID. The problem with simply dropping that on the floor is that it's usually only discovered during an important error-debugging session, at which point it's too late. Since we use MDC-tagged logging to put trace IDs into every log message, and a log aggregator to search for all log messages related to a given trace ID across all microservices when investigating a request, switching trace IDs halfway through a request when we technically don't have to would be a big problem. I know that's a secondary use case for tracing and not supposed to be the primary consideration, but it has proven so incredibly useful that I don't think we can throw broken implementors under the bus. After all, the broken implementors are the ones most likely to need the extra debugging capabilities, yes? I don't think our use case should trump everything else and cause the spec to change; I'm just trying to provide our perspective.
@nicmunroe @jcarres-mdsol @shalako OK, well, we can't rewrite history, but we can affect this going forward. How about this?
Instrumentation who tolerate absent
When multiple calls propagate the same
@adriancole OK, a few things are clicking for me now, and I think I see where some of the disconnect on my part was:
Again, this is not to say that what wingtips is doing is better; I mainly want to make sure what we're doing isn't wrong or going to cause other nasty problems. Assuming what we're doing is OK, I'm happy with the documentation stating what happens historically, along with what other options are acceptable.
This sounds similar to gRPC. In gRPC, they propagate the parent (client) id

In B3 (today anyway), the caller propagates the span id for the RPC.

For single-host spans, the client span id is propagated, but used

Let's say the client has parentId 1 and spanId 2. The client propagates its

If the server wants to use shared spans, it uses exactly the same ids for

If the server wants to do separate spans for client and server, it uses the

If there is no span id, with the change we mentioned, it just provisions a

The catch is when a caller thinks the server is going to use the span id it

Well, there's no impact in Zipkin. Zipkin has no search by span id anyway.

There could be impact to log tools that assume all span ids end up in logs.

So... does this help clarify things about impact to wingtips (which is same

On 19 Sep 2016 01:29, "Nic Munroe" wrote:
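The two server-side options from the worked example (client parentId 1, spanId 2) can be sketched as follows. This assumes a hypothetical `Ids` tuple and `server_ids` helper. In the separate-span branch it generates a random 64-bit hex span ID and makes the client's span the parent, which is the wingtips behavior described in this thread.

```python
import secrets
from typing import NamedTuple, Optional


class Ids(NamedTuple):
    trace_id: str
    span_id: str
    parent_id: Optional[str]


def server_ids(incoming: Ids, shared: bool) -> Ids:
    """Choose the server span's IDs given the IDs the client propagated."""
    if shared:
        # Shared span: client and server report under exactly the same IDs.
        return incoming
    # Separate spans: new span ID for the server; the client's span is its parent.
    return Ids(incoming.trace_id, secrets.token_hex(8), incoming.span_id)


client = Ids(trace_id="t", span_id="2", parent_id="1")
print(server_ids(client, shared=True))  # same ids as the client: span 2, parent 1
print(server_ids(client, shared=False).parent_id)  # "2"
```

In the shared case the server never creates a span ID of its own, so it depends on the caller having sent one. The separate-span case always mints a fresh ID and only needs the caller's span ID to link the parent.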
Yeah, in wingtips the server always separates the spans by taking the client's span ID as the server's parent span and creating a new span ID for the server span. So it sounds like that's a reasonable thing to do and not unheard of. Thanks for taking the time for all these explanations!
That doesn't sound like a good practice. If a client app that only sends a trace ID makes a bunch of calls, each call will register as the same span ID.
I lean towards asking people to actually implement the specification, and
@adriancole While I am intimately familiar with the pain of bad instrumentation, I agree that it's better to fix that instrumentation than to document weird scenarios of what should happen if half the info is missing. @nicmunroe's argument that "you get burnt by bad instrumentation when you need it the most" is somewhat flawed, because if bad instrumentation is already in production, the tracing system is already processing the bad data and can identify bad players proactively, without waiting for an outage to happen.
I'd say just error. Both headers are necessary.
@yurishkuro I agree with you in principle that bad instrumentation can be detected proactively, but not every organization is mature enough, or has built the proper tools and alerting, to stay on top of that. In the wild, many teams don't have a solid grasp of distributed tracing and will do the wrong thing by not reading the spec, because of a bug, or what have you (and it will always be worse for orgs as they get started). I think it's fine for the spec to be strict and simply state that it's an error to send a trace ID without a span ID, and not document weird scenarios. But in the spirit of "be conservative in what you send, be liberal in what you accept" robustness, wingtips will likely continue to handle that case by continuing the trace with a new random span ID: the disconnected span still allows bad-instrumentation detection, and having all parts of the request tagged with the same trace ID still provides tangible benefit for callers as they work through their instrumentation issues.
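Here is a sketch of the lenient policy just described. It assumes headers arrive as a plain dict, IDs are 64-bit lower-hex strings, and the separate-span style is used when both headers are present. This is not wingtips' actual code, and `extract_lenient` is a hypothetical name.

```python
import secrets
from typing import Mapping, Optional, Tuple


def new_id() -> str:
    """Return a random 64-bit ID encoded as 16 lower-hex characters."""
    return secrets.token_hex(8)


def extract_lenient(headers: Mapping[str, str]) -> Tuple[str, str, Optional[str]]:
    """Return (trace_id, span_id, parent_id) for the server span.

    A trace ID without a span ID still continues the trace, as a
    disconnected span with no known parent.
    """
    trace_id = headers.get("X-B3-TraceId")
    span_id = headers.get("X-B3-SpanId")
    if trace_id and span_id:
        # Separate-span style: the caller's span becomes the parent.
        return trace_id, new_id(), span_id
    if trace_id:
        # Keep the trace ID so log correlation (e.g. via MDC) survives,
        # but a fresh random span ID is minted and no parent is recorded.
        return trace_id, new_id(), None
    # No usable trace ID (a span ID alone is ignored): start a new trace.
    return new_id(), new_id(), None
```

Because a new random span ID is minted on every request, repeated calls from a client that only sends a trace ID do not all collapse onto one span ID. The trade-off is that the tracing UI shows disconnected spans instead of a connected tree.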
PS: I thought about this recently, and one thing about this thread is that there seem to be multiple valid policies, with context deciding which is best. One way out is to document a few policies used in practice, and which is most used (if any one is), in something like an appendix. For example:
I'd concede that most of these choices are library-specific and possibly span-format-specific, but anyway, food for thought. This recently broke out of separate discussions with @basvanbeek and @pavolloffay on how to handle propagation nuance, as these sorts of things can be insidious in practice.
Related question from @shalako: spring-cloud/spring-cloud-sleuth#400 (comment)