Broken trace context propagation: OTel Trace IDs of DD agent spans converted by the OTel Col Datadog Receiver are wrong #36926

Open
cyrille-leclerc opened this issue Dec 23, 2024 · 1 comment
Labels
bug Something isn't working needs triage New item requiring triage receiver/datadog

Member

cyrille-leclerc commented Dec 23, 2024

Component(s)

receiver/datadog

What happened?

Description

When W3C Trace Context is propagated from an app instrumented with OTel to an app instrumented with the Datadog Java agent, and the Datadog agent sends its spans to the OTel Collector Datadog Receiver, the Trace ID reported by the receiver differs from the Trace ID of the parent span, which breaks the trace.

I can't confirm it, but I suspect this is caused by the logic the OTel Collector Datadog Receiver uses to produce the OTel Trace ID from the Datadog IDs.
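To make the suspicion concrete: in the debug log below, the Datadog span carries the upper 64 bits of the expected trace ID in the `_dd.p.tid` attribute (`37940834c74a2dfc`), while `datadog.trace.id` (1261954126867731507, i.e. `0x11835c979eca1433`) holds only the lower 64 bits. The following is a minimal Go sketch, assuming that layout, of how a 128-bit OTel Trace ID could be rebuilt from the two values. `otelTraceIDFromDatadog` is a hypothetical helper, not the receiver's actual code. Leaving `_dd.p.tid` out reproduces exactly the zero-padded ID that the receiver reported.

```go
package main

import (
	"fmt"
	"strconv"
)

// otelTraceIDFromDatadog is a hypothetical helper, not the Datadog receiver's code.
// Assumption: the upper 64 bits of the 128-bit trace ID arrive as 16 hex chars in the
// _dd.p.tid tag, and the lower 64 bits are the span's uint64 Datadog trace ID.
func otelTraceIDFromDatadog(ddTraceID uint64, ddPTid string) (string, error) {
	var upper uint64 // stays 0 when _dd.p.tid is absent (64-bit trace IDs)
	if ddPTid != "" {
		v, err := strconv.ParseUint(ddPTid, 16, 64)
		if err != nil {
			return "", fmt.Errorf("invalid _dd.p.tid %q: %w", ddPTid, err)
		}
		upper = v
	}
	// %016x zero-pads each half to 16 hex digits, giving a 32-digit OTel trace ID.
	return fmt.Sprintf("%016x%016x", upper, ddTraceID), nil
}

func main() {
	// Values from the Datadog span in the debug log (datadog.trace.id and _dd.p.tid).
	withTid, err := otelTraceIDFromDatadog(1261954126867731507, "37940834c74a2dfc")
	if err != nil {
		panic(err)
	}
	// Ignoring _dd.p.tid reproduces the zero-padded ID reported by the receiver.
	withoutTid, _ := otelTraceIDFromDatadog(1261954126867731507, "")
	fmt.Println(withTid)    // 37940834c74a2dfc11835c979eca1433
	fmt.Println(withoutTid) // 000000000000000011835c979eca1433
}
```

Keeping the halves as `uint64` values and formatting them with `%016x` avoids losing leading zeros in either half.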

Architecture:

                                                                   
 ┌──────────┐                               ┌───────────┐───┐      
 │OTel Java ┼──────┐                        │           │   │      
 └─────┬────┘      │                        │Receiver   │ O │      
       │           │                        │           │ T │      
       │           │                        │───────────│ e │      
       │           └───────────────────────►│ OTLP      │ l │      
       │                                    │           │   │      
       │traceparent: trace=xyz, parent=abc  │           │ C │      
       │                                    │───────────│ o │      
       │                                    │           │ l │      
 ┌─────▼──────┐ ┌──────────────────────────►┤Datadog    │   │      
 │Datadog Java┼─┘                           └───────────┘───┘      
 └────────────┘                             span:                  
                                              spanId=...           
                                              parent=abc <--CORRECT
                                              traceId=uvw <--WRONG 
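For reference, a W3C `traceparent` header carries the full 128-bit trace ID (32 hex chars) alongside the 64-bit parent span ID (16 hex chars). The following is a minimal Go sketch of splitting a version-00 header into those fields. `parseTraceparent` is a hypothetical helper, and the header value is assembled from the IDs in the debug log below with an assumed `01` (sampled) flags field.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// parseTraceparent is a hypothetical, minimal parser for a version-00 W3C traceparent
// header ("version-traceid-parentid-flags"). It skips parts of the spec's validation
// (lowercase-only hex, all-zero IDs, the flags field) and only shows what is carried.
func parseTraceparent(h string) (traceID, parentID string, err error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", fmt.Errorf("unsupported traceparent %q", h)
	}
	traceID, parentID = parts[1], parts[2]
	// 16-byte trace ID = 32 hex chars; 8-byte parent span ID = 16 hex chars.
	if len(traceID) != 32 || len(parentID) != 16 {
		return "", "", fmt.Errorf("unexpected field lengths in %q", h)
	}
	if _, decErr := hex.DecodeString(traceID + parentID); decErr != nil {
		return "", "", fmt.Errorf("non-hex ID in %q: %w", h, decErr)
	}
	return traceID, parentID, nil
}

func main() {
	// Built from the IDs in the debug log; the "01" (sampled) flags are an assumption.
	tid, pid, err := parseTraceparent("00-37940834c74a2dfc11835c979eca1433-bb4331d223d59950-01")
	if err != nil {
		panic(err)
	}
	fmt.Println("trace-id: ", tid)
	fmt.Println("parent-id:", pid)
}
```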

OTel Collector debug log.

  • First span: HTTP client call span emitted by a Java Spring Boot app instrumented by the OTel Java Agent v2.10.0 (OTel Java SDK v1.44.1) with
    • traceId=37940834c74a2dfc11835c979eca1433
    • spanId=bb4331d223d59950
  • Second span: HTTP server span emitted by a Spring Boot app instrumented by the Datadog Java Agent v1.44.1 with
    • parentId=bb4331d223d59950 as expected
    • traceId=000000000000000011835c979eca1433, which is NOT expected; we expect 37940834c74a2dfc11835c979eca1433
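The two trace IDs differ only in their upper 64 bits: the lower halves are identical and the converted ID has its upper half zeroed. The following is a small Go check of this using the two IDs above. `splitTraceID` is a hypothetical helper written for this comparison.

```go
package main

import (
	"fmt"
	"strconv"
)

// splitTraceID is a hypothetical helper that splits a 32-hex-char OTel trace ID
// into its upper and lower 64-bit halves.
func splitTraceID(id string) (upper, lower uint64, err error) {
	if len(id) != 32 {
		return 0, 0, fmt.Errorf("trace ID %q is not 32 hex chars", id)
	}
	if upper, err = strconv.ParseUint(id[:16], 16, 64); err != nil {
		return 0, 0, err
	}
	if lower, err = strconv.ParseUint(id[16:], 16, 64); err != nil {
		return 0, 0, err
	}
	return upper, lower, nil
}

func main() {
	expected := "37940834c74a2dfc11835c979eca1433" // OTel Java client span
	actual := "000000000000000011835c979eca1433"   // Datadog span after conversion
	eu, el, err := splitTraceID(expected)
	if err != nil {
		panic(err)
	}
	au, al, err := splitTraceID(actual)
	if err != nil {
		panic(err)
	}
	// Prints "lower halves equal: true", then the differing upper halves.
	fmt.Println("lower halves equal:", el == al)
	fmt.Printf("upper halves: expected %016x, actual %016x\n", eu, au)
}
```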
Resource SchemaURL: https://opentelemetry.io/schemas/1.24.0
Resource attributes:
     -> deployment.environment.name: Str(staging)
     -> host.arch: Str(aarch64)
     -> host.name: Str(cyrille-le-clerc-macbook.local)
     -> os.description: Str(Mac OS X 15.2)
     -> os.type: Str(darwin)
     -> process.command_args: Slice([...,"-jar","target/checkout-1.1-SNAPSHOT.jar"])
     -> process.executable.path: Str(.../bin/java)
     -> process.pid: Int(14768)
     -> process.runtime.description: Str(Homebrew OpenJDK 64-Bit Server VM 17.0.13+0)
     -> process.runtime.name: Str(OpenJDK Runtime Environment)
     -> process.runtime.version: Str(17.0.13+0)
     -> service.instance.id: Str(ccad3c44-aebc-4f8b-96b9-c4ed6a5433c4)
     -> service.name: Str(checkout)
     -> service.namespace: Str(shop)
     -> service.version: Str(1.1)
     -> telemetry.distro.name: Str(opentelemetry-java-instrumentation)
     -> telemetry.distro.version: Str(2.10.0)
     -> telemetry.sdk.language: Str(java)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.version: Str(1.44.1)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope io.opentelemetry.java-http-client 2.10.0-alpha
Span #0
    Trace ID       : 37940834c74a2dfc11835c979eca1433
    Parent ID      : 179ce2ee48649594
    ID             : bb4331d223d59950
    Name           : POST
    Kind           : Client
    Start time     : 2024-12-23 17:34:54.397226541 +0000 UTC
    End time       : 2024-12-23 17:34:54.63652375 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> server.address: Str(shipping.local)
     -> tenant_id: Str(tenant-1)
     -> http.request.method: Str(POST)
     -> network.protocol.version: Str(1.1)
     -> http.response.status_code: Int(200)
     -> thread.id: Int(160)
     -> server.port: Int(8088)
     -> thread.name: Str(grpc-default-executor-36)
     -> url.full: Str(http://shipping.local:8088/shipOrder)

ResourceSpans #1
Resource SchemaURL: https://opentelemetry.io/schemas/1.16.0
Resource attributes:
     -> telemetry.sdk.language: Str(java)
     -> process.runtime.version: Str(17.0.13)
     -> service.version: Str(1.1)
     -> telemetry.sdk.version: Str(Datadog-1.44.1~13a9a2d011)
     -> telemetry.sdk.name: Str(Datadog)
     -> service.name: Str(shipping)
     -> host.name: Str(localhost)
     -> os.type: Str(darwin)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope Datadog 1.44.1~13a9a2d011
Span #0
    Trace ID       : 000000000000000011835c979eca1433
    Parent ID      : bb4331d223d59950
    ID             : 6176a9d3ea94c1f7
    Name           : servlet.request
    Kind           : Server
    Start time     : 2024-12-23 17:34:54.453192875 +0000 UTC
    End time       : 2024-12-23 17:34:54.638774084 +0000 UTC
    Status code    : Ok
    Status message :
Attributes:
     -> dd.span.Resource: Str(POST /shipOrder)
     -> sampling.priority: Str(1.000000)
     -> datadog.span.id: Str(7022987396569678327)
     -> datadog.trace.id: Str(1261954126867731507)
     -> servlet.path: Str(/shipOrder)
     -> deployment.environment: Str(production)
     -> peer.ipv4: Str(127.0.0.1)
     -> thread.name: Str(http-nio-8088-exec-1)
     -> language: Str(jvm)
     -> service.version: Str(1.1)
     -> span.kind: Str(server)
     -> http.method: Str(POST)
     -> _dd.p.dm: Str(-0)
     -> http.status_code: Str(200)
     -> _dd.tracer_host: Str(cyrille-le-clerc-macbook.local)
     -> http.url: Str(http://shipping.local:8088/shipOrder)
     -> http.hostname: Str(shipping.local)
     -> _dd.p.tid: Str(37940834c74a2dfc)
     -> servlet.context: Str(/)
     -> http.route: Str(/shipOrder)
     -> runtime-id: Str(8265563a-4256-4741-ba0c-ebbb676d4473)
     -> http.useragent: Str(Java-http-client/17.0.13)
     -> component: Str(tomcat-server)
     -> thread.id: Double(37)
     -> process.pid: Double(73021)
     -> _dd.profiling.enabled: Double(0)
     -> peer.port: Double(64145)
     -> _dd.trace_span_attribute_schema: Double(0)
     -> _sampling_priority_v1: Double(1)
     -> _dd.measured: Double(1)
     -> _dd.top_level: Double(1)
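The decimal `datadog.trace.id` and `datadog.span.id` attributes are the same 64-bit values as the hex IDs in the span header: 1261954126867731507 is `11835c979eca1433` (the lower half of the trace ID) and 7022987396569678327 is `6176a9d3ea94c1f7` (the span ID). The upper half, `37940834c74a2dfc`, only appears in `_dd.p.tid`. The following Go sketch verifies the decimal-to-hex mapping. `ddIDToHex` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// ddIDToHex is a hypothetical helper: it parses a decimal uint64 Datadog ID (as found
// in the datadog.trace.id / datadog.span.id string attributes) and renders it as
// 16 zero-padded lowercase hex digits, the format used for 64-bit OTel span IDs.
func ddIDToHex(dec string) (string, error) {
	v, err := strconv.ParseUint(dec, 10, 64)
	if err != nil {
		return "", fmt.Errorf("invalid Datadog ID %q: %w", dec, err)
	}
	return fmt.Sprintf("%016x", v), nil
}

func main() {
	// String attributes copied from the Datadog span in the debug log.
	for _, kv := range [][2]string{
		{"datadog.trace.id", "1261954126867731507"},
		{"datadog.span.id", "7022987396569678327"},
	} {
		h, err := ddIDToHex(kv[1])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %s\n", kv[0], h)
	}
}
```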

Steps to Reproduce

  • Set up an OTel Collector with both the OTLP and Datadog receivers and the debug exporter
  • Create two Spring Boot apps, an "upstream_app" calling the "downstream_app" over HTTP
    • In the HTTP handler of the "downstream_app", dump the traceparent HTTP header to verify that the context is propagated
  • Instrument the "upstream_app" with the OTel Java auto-instrumentation agent v2.10.0
  • Instrument the "downstream_app" with dd-trace-java v1.44.1:
export DD_TRACE_AGENT_URL="http://localhost:8126"
# disable remote configuration to rule out unexpected behavior
export DD_REMOTE_CONFIGURATION_ENABLED=false

java \
     -javaagent:"$DATADOG_AGENT_JAR" \
     -Dserver.port=8088 \
     -jar target/shipping-1.1-SNAPSHOT.jar
  • Invoke the "upstream_app" to trigger an HTTP call to the "downstream_app"
  • Inspect the produced spans in the OTel collector logs

Expected Result

  • The spans in the OTel Collector logs show that the trace context is properly propagated: there is a single traceId, and the parentId of the HTTP handler span of the "downstream_app" matches the spanId of the HTTP call span of the "upstream_app".

Actual Result

The parentId is properly propagated, but the traceId is wrong: only its lower 64 bits (11835c979eca1433) are preserved and the upper 64 bits are zeroed.

Collector version

v0.116.0

Environment information

Environment

macOS 15.2

Demo app:

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:
  datadog:
    endpoint: localhost:8126
    read_timeout: 60s
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp, datadog]
      exporters: [debug]

Log output

See bug description

Additional context

No response

cyrille-leclerc added bug Something isn't working needs triage New item requiring triage labels Dec 23, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.
