Translate otel metrics to libbeat monitoring #15094

Open
wants to merge 39 commits into
base: main

kruskall
Member

@kruskall kruskall commented Jan 1, 2025

Motivation/summary

Use the OTel API to record metrics and export them to Beats monitoring.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

  • run apm-server
  • send data
  • go to index management and validate the monitoring index is there and monitoring data is inside it

Related issues

Related to #14488

@kruskall kruskall marked this pull request as ready for review January 1, 2025 00:50
@kruskall kruskall requested a review from a team as a code owner January 1, 2025 00:50
Contributor

mergify bot commented Jan 1, 2025

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.

Contributor

mergify bot commented Jan 1, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 1, 2025
internal/beater/server_test.go (outdated, resolved)
x-pack/apm-server/main_test.go (outdated, resolved)
x-pack/apm-server/main_test.go (outdated, resolved)
Member

@axw axw left a comment

There are still many TODOs in the code, so it's unclear if this is ready for final review. At any rate, I should probably not be the final reviewer since I was involved in the initial implementation. It would be great if we could do this in stages, since this PR is pretty massive.

	fmt.Sprintf("apm-server.processor.%s.transformations", eventType),
)
if err != nil {
	// TODO(axw) return err
Member

Needs fixing

Member Author

I don't think this can panic. WDYT about ignoring the err, like we did for the outer counters?

@@ -79,7 +54,17 @@ func RegisterGRPCServices(
		Semaphore:        semaphore,
		RemapOTelMetrics: true,
	})
	gRPCMonitoredConsumer.set(consumer)

	// FIXME we should add an otel counter metric directly in the
Member

I think this needs fixing before we merge anything?

Member Author

This shouldn't be needed now that we propagate the MeterProvider instead of reusing a global meter.

I also think adding metrics to apm-data is not as simple since we need to keep other systems in mind (that's a library).
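
For readers following the thread, here is a minimal sketch of the difference between passing a provider down explicitly and reading a package-level global. It assumes simplified stand-in interfaces: `Counter`, `MeterProvider`, `memProvider`, `consumer`, and the metric name `grpc.requests` are all hypothetical and are not the OTel or apm-server API.

```go
package main

import "fmt"

// Counter and MeterProvider are simplified, hypothetical stand-ins for the
// OTel metric interfaces. They are NOT the real API.
type Counter interface {
	Add(n int64)
}

type MeterProvider interface {
	Counter(name string) Counter
}

// memCounter keeps a running total in memory.
type memCounter struct{ total int64 }

func (c *memCounter) Add(n int64) { c.total += n }

// memProvider is an in-memory provider that hands out one counter per name.
type memProvider struct {
	values map[string]*memCounter
}

func newMemProvider() *memProvider {
	return &memProvider{values: make(map[string]*memCounter)}
}

func (p *memProvider) Counter(name string) Counter {
	c, ok := p.values[name]
	if !ok {
		c = &memCounter{}
		p.values[name] = c
	}
	return c
}

// consumer receives its provider as an argument instead of reading a
// package-level global, so each instance records into its own provider.
type consumer struct {
	requests Counter
}

func newConsumer(mp MeterProvider) *consumer {
	// "grpc.requests" is an illustrative metric name only.
	return &consumer{requests: mp.Counter("grpc.requests")}
}

func (c *consumer) handle() { c.requests.Add(1) }

func main() {
	a, b := newMemProvider(), newMemProvider()
	ca, cb := newConsumer(a), newConsumer(b)
	ca.handle()
	ca.handle()
	cb.handle()
	// Each provider only sees the events of the consumer it was given to.
	fmt.Println(a.values["grpc.requests"].total, b.values["grpc.requests"].total) // prints "2 1"
}
```

With injection, two server instances (or two tests) never share counters, which is presumably why a separate global wrapper is no longer needed in this sketch's terms.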

internal/beater/beater.go (outdated, resolved)
internal/beater/beater.go (outdated, resolved)
internal/beatcmd/reloader.go (outdated, resolved)
if value, ok := getScalarInt64(m.Data); ok {
	monitoring.ReportInt(v, "indexers.destroyed", value)
}
// TODO output.elasticsearch.indexers.active (created - destroyed?)
Member

TODO

Member Author

This is not produced by go-docappender; it's one of the legacy metrics. If we think it's useful, we should add it there. WDYT?
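
If the value were derived on the translation side, as the TODO's "created - destroyed?" hints, the arithmetic could look like the sketch below. This is an assumption for illustration only: `indexerCounts` and `activeIndexers` are hypothetical names, not part of apm-server or go-docappender, and the clamp at zero is a defensive choice of this sketch.

```go
package main

import "fmt"

// indexerCounts is a hypothetical snapshot of the two cumulative counters
// discussed in the thread. It is not a type from apm-server or go-docappender.
type indexerCounts struct {
	Created   int64
	Destroyed int64
}

// activeIndexers derives the number of live indexers as created minus
// destroyed. It clamps at zero in case the two counters were read at
// slightly different moments.
func activeIndexers(c indexerCounts) int64 {
	active := c.Created - c.Destroyed
	if active < 0 {
		return 0
	}
	return active
}

func main() {
	fmt.Println(activeIndexers(indexerCounts{Created: 5, Destroyed: 2})) // prints 3
}
```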

internal/beatcmd/beat.go (resolved)
internal/beatcmd/beat.go (outdated, resolved)
Contributor

@simitt simitt left a comment

Reviewed on a high level in an initial round and agree with the general direction.
However, this PR is huge, which makes it very hard to spot subtle bugs or potential issues. Therefore +1 on @axw's suggestion to split this PR into smaller chunks.

internal/beatcmd/beat.go (outdated, resolved)
internal/beatcmd/beat.go (outdated, resolved)
@kruskall
Member Author

I don't think it's possible to split this PR, given how dependent on each other the packages are. I'll try.

@kruskall kruskall requested a review from a team January 23, 2025 20:57
@kruskall
Member Author

Split into #15360. That's probably as small as it gets.

Labels
backport-8.x Automated backport to the 8.x branch with mergify

3 participants