Spike: Test OTel Standalone Library Instrumentation with NR OTel Span Prototype #2060

jasonjkeller · 2024-09-23T17:37:21Z

Test whether the OTel Span Prototype captures spans from OTel Standalone Library Instrumentation for a framework that the New Relic Java agent does not currently instrument (Armeria is the recommended candidate). Also determine what happens when a framework is instrumented by both OTel Standalone Library Instrumentation and NR Java agent instrumentation (a weave module or custom instrumentation).

https://opentelemetry.io/docs/concepts/instrumentation/libraries/
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md

workato-integration · 2024-09-23T17:37:26Z

https://new-relic.atlassian.net/browse/NR-316379

jasonjkeller · 2024-10-24T00:25:55Z

The following project contains a blog server and a client that makes simple CRUD requests to the server. Both the server and the client are built using Armeria and both are configured to use the Armeria standalone OTel instrumentation for emitting OTel spans/metrics.

https://github.com/jasonjkeller/armeria-client-server-example

When both the client and the server are instrumented with the NR OTel Span Prototype Java agent, the Armeria OTel spans are detected and either added to existing NR traces or used to start new ones.

The client must use the New Relic Java agent @Trace annotation to start a transaction, because the detected OTel SpanKind.CLIENT spans will not start a transaction on their own.

The server does not need Java agent instrumentation or APIs to start a transaction, because the detected OTel SpanKind.SERVER spans cause the agent to start one (if there is not already an existing transaction).
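The client/server difference described above can be written down as a small decision rule. The following is only a toy model in plain Java, not the agent's implementation: the class `SpanTransactionModel`, its `Kind` enum, and both method names are hypothetical and exist purely for illustration. It covers only the two span kinds discussed in this comment.

```java
// Toy model of the behavior observed in this spike. Everything here is
// hypothetical illustration code; none of it is New Relic agent or OTel API.
class SpanTransactionModel {

    // Only the two span kinds discussed in this comment are modeled.
    enum Kind { CLIENT, SERVER }

    // A detected SERVER span starts a transaction when none exists yet.
    // A detected CLIENT span never starts one on its own.
    static boolean startsTransaction(Kind kind, boolean transactionActive) {
        return kind == Kind.SERVER && !transactionActive;
    }

    // A span only shows up if it joins an existing transaction or starts one.
    // For CLIENT spans this means something else (for example a method
    // annotated with the agent's @Trace) must already have started it.
    static boolean isReported(Kind kind, boolean transactionActive) {
        return transactionActive || startsTransaction(kind, transactionActive);
    }
}
```

Under this model, `isReported(Kind.CLIENT, false)` is false, which matches the observation that client spans need a manually started transaction to appear at all.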

There are two cases to consider.

Case 1: NR Java agent instrumentation libraries disabled, custom instrumentation and OTel spans enabled

In the first case, all out-of-the-box instrumentation from the NR Java agent has been disabled. Only custom instrumentation and detected OTel spans contribute to traces. The screenshots below demonstrate some clear issues with this case.

Client

Server

Case 2: NR Java agent instrumentation libraries, custom instrumentation, and OTel spans enabled

In the second case, all out-of-the-box instrumentation from the NR Java agent is enabled and contributes to traces along with any custom instrumentation and detected OTel spans. There are some noticeable improvements when the New Relic Java agent instrumentation is working. Specifically, the client and server entities are now properly linked in the trace:

However, some noticeable problems appear when the two instrumentation sources are combined:

Client

In addition to the trace above showing the client calling the server, the Java agent's Netty instrumentation started the following transaction. It should be part of the trace between services, but it is a separate transaction because the async token was never successfully linked. Furthermore, the token timed out, resulting in the Truncated segment.

Server

In addition to the trace above showing the client calling the server, we see a separate transaction for the OTel server span representing the call to the server's blogs endpoint. This span should be included in the trace with the client calling the server, but it was not successfully linked to it.

In summary, things sort of work, but not in the correct or cohesive manner expected from APM agents, and this is for a very simple distributed trace.

When relying on just the OTel span data, the traces are not as complete as desired and we don't see proper entity linking. Also, the OTel client spans still require manually starting transactions via the NR API for them to show up at all.

When the OTel span data is combined with out-of-the-box NR instrumentation, what should be a single complete trace is instead broken up into multiple traces. This is due to missed token linking across threads between NR and OTel spans.
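To illustrate in general terms why a missed hand-off across threads splits a trace, here is a minimal plain-Java sketch. It is not the agent's token mechanism: `ThreadContextSketch`, `CURRENT_TRACE`, and both method names are hypothetical, and a `ThreadLocal` is used only as a stand-in for any thread-bound "current trace" state. Work submitted to another thread sees no context unless it is explicitly carried over.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; not New Relic agent or OTel API code.
class ThreadContextSketch {

    // Stand-in for thread-bound "current trace" state.
    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    // Runs the task on a fresh worker thread and returns what it observed.
    private static String runOnWorker(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // No hand-off: the worker thread has no trace, so any span created
    // there would begin a new, separate trace.
    static String traceSeenWithoutLinking(String traceId) {
        CURRENT_TRACE.set(traceId);
        try {
            Callable<String> task = () -> CURRENT_TRACE.get();
            return runOnWorker(task);
        } finally {
            CURRENT_TRACE.remove();
        }
    }

    // Explicit hand-off: the caller captures its context and the worker
    // restores it before doing any work, so both sides share one trace.
    static String traceSeenWithLinking(String traceId) {
        CURRENT_TRACE.set(traceId);
        try {
            final String captured = CURRENT_TRACE.get();
            Callable<String> task = () -> {
                CURRENT_TRACE.set(captured);
                try {
                    return CURRENT_TRACE.get();
                } finally {
                    CURRENT_TRACE.remove();
                }
            };
            return runOnWorker(task);
        } finally {
            CURRENT_TRACE.remove();
        }
    }
}
```

In the first method the worker observes no trace at all; in the second it observes the caller's trace. The hybrid setup described here needs the equivalent of the second path to work between NR and OTel spans.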

Further work is needed to investigate a solution for this.

jasonjkeller added this to the Java Agent - FY25Q3 - OTel Hybrid Agent PoC milestone Sep 23, 2024

github-project-automation bot added this to Java Engineering Board Sep 23, 2024

github-project-automation bot moved this to Triage in Java Engineering Board Sep 23, 2024

jasonjkeller added the oct-dec qtr label (Represents proposed work item for the Oct-Dec quarter) Sep 23, 2024

jasonjkeller self-assigned this Sep 23, 2024

jasonjkeller mentioned this issue Sep 23, 2024

Figure out an automated testing strategy for supported OTel instrumentation #2063

Open

jasonjkeller closed this as completed Oct 24, 2024

github-project-automation bot moved this from In Sprint to Code Complete/Done in Java Engineering Board Oct 24, 2024

jasonjkeller mentioned this issue Oct 24, 2024

Investigate bi-directional context propagation between NR and OTel #2098

Open
