Target Allocator does not operate outside of Kubernetes but is essential for scaling OTEL (Prometheus "receivers" are essential) #3317
Comments
I think it's doable, especially because the allocator uses the Prometheus libraries, so all the service discovery capabilities should be easy to use. The part I think requires the most changes is the collector discovery, which is fully tied to the Kubernetes API.
I will go even further, and point out that Target Allocator isn't even necessarily OTel-specific. You could easily adapt it to horizontally scaling plain Prometheus, and there have been some efforts in that direction. One of the reasons I'm bringing this up is that we're probably not going to accept any PRs adding non-K8s service discovery to the Target Allocator in the short term. The reason is simply that this is out of scope for the operator project: we have neither the domain expertise nor the ability to test a wide range of SD mechanisms. With that said, I will happily review your PR @vape-spryker and discuss the changes you've made to introduce non-K8s SD. I've also added this topic to the agenda for our SIG meeting at 6 PM CET on 19.12.2024. If you'd like to join the discussion, we'd love to have you!
I'm wondering if we could use plain Prometheus config for non-Kubernetes service discovery.
Also tagging @open-telemetry/operator-approvers @open-telemetry/operator-maintainers for visibility.
@swiatekm I've only introduced collector discovery outside of Kubernetes, based on AWS Cloud Map. It can be used in any topology in AWS, not specifically ECS. The service discovery is not touched and is entirely Prometheus-based. For ECS service discovery, instead of implementing it in the TA, I've just added a sidecar otel-collector with observer/ecs that presents the targets as files, which the TA consumes without problems because it uses the service discovery functionality from Prometheus.
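For illustration, the file-based handoff described above could look roughly like the sketch below. It is not the actual sidecar or observer code. It only shows a hypothetical helper, `write_file_sd`, that writes a list of discovered endpoints in the JSON layout that Prometheus `file_sd` reads (a list of groups with `targets` and `labels`). The file name, the label, and the addresses are made up for the example.

```python
import json
import os
import tempfile


def write_file_sd(path, targets, labels=None):
    """Write scrape targets to `path` in the Prometheus file_sd JSON layout.

    Hypothetical helper for illustration; not part of the TA or the collector.
    """
    # file_sd expects a list of target groups: {"targets": [...], "labels": {...}}.
    groups = [{"targets": sorted(targets), "labels": dict(labels or {})}]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first and then rename it, so that a reader
    # never observes a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp_path, path)
    return groups


if __name__ == "__main__":
    # Example addresses and label are assumptions, not real endpoints.
    write_file_sd(
        "ecs_targets.json",
        ["10.0.1.7:9090", "10.0.1.5:9090"],
        {"cluster": "demo"},
    )
```

Writing through a temporary file and renaming it is a design choice for the sketch: whatever watches the file only ever sees a complete target list.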
Why not just use static config and leverage Prometheus service discovery? @vape-spryker
@nicolastakashi That's exactly what I am doing. The part my change adds is discovering the collectors so that targets can be assigned to them effectively. Prometheus does not do this part. Apart from that, the service discovery is entirely based on Prometheus. It is static config, but the endpoints of the ECS cluster have to be discovered as they change, and Prometheus doesn't have native discovery for ECS; that's why I use a sidecar collector with an observer. If you can think of a native option, I would gladly use it.
@swiatekm I couldn't join the SIG meeting, and we are also still sorting out the CLA internally at the company. Did you manage to discuss it?
@vape-spryker Apologies, we discussed this at the SIG meeting and came to this conclusion (link to the meeting notes). We decided that we would accept the ability for the TA to discover collector targets outside of Kubernetes. However, to be as environment-agnostic as possible, we would like that interface to simply be a static file that the TA reads, combined with a reload endpoint that triggers a re-read of the file. This would allow anyone (not just ECS users) to take advantage of the TA through a separate environment-specific process.
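A minimal sketch of the "static file plus reload" interface proposed above, under stated assumptions: the file layout (`{"collectors": [...]}`), the class name `CollectorRegistry`, and the method names are all hypothetical and are not the format the TA ended up with. A reload endpoint would simply call `reload()`; the HTTP part is left out to keep the example short.

```python
import json
import threading


class CollectorRegistry:
    """Keeps the current list of collector IDs, loaded from a static JSON file.

    Hypothetical sketch; the file layout is an assumption:
    {"collectors": ["collector-0", "collector-1"]}
    """

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._collectors = []
        self.reload()

    def reload(self):
        """Re-read the file. A reload endpoint would call this on each request."""
        with open(self._path) as handle:
            data = json.load(handle)
        # De-duplicate and sort so the result does not depend on file order.
        collectors = sorted(set(data["collectors"]))
        with self._lock:
            self._collectors = collectors
        return collectors

    def collectors(self):
        """Return a copy of the collector IDs seen at the last reload."""
        with self._lock:
            return list(self._collectors)
```

With this shape, an environment-specific process (for ECS, Cloud Map, bare instances, and so on) only has to rewrite the file and then trigger the reload; the allocator itself stays unaware of where the collector list came from.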
@jaronoff97 if we use the same file_sd that Prometheus offers, we can get it working, right?
@nicolastakashi can we? Prometheus' service discovery discovers scrape targets, whereas here we just need a list of collector IDs. Sounds like a bit of overkill to me.
@swiatekm The two are not related. Introducing such a dependency can be dangerous: IP discovery can be a byproduct of file_sd, but if that mechanism or its format changes, collector discovery will break. Service discovery and collector discovery are separate concerns. Let's try to uphold the single responsibility principle.
Component(s)
target allocator
Is your feature request related to a problem? Please describe.
I am an ECS user, but what I am writing applies to most use cases outside of Kubernetes, such as ours or bare instances.
We are using OTEL, relying heavily on the Prometheus "receiver" (I put it in quotes as it's a scraper :) ). These days most of the cloud-native stack exposes a Prometheus-compliant metrics endpoint, hence this OTEL plugin becomes critical for metrics collection. We are facing a situation where we need to scale our collector for HA and potentially for capacity, and that's where the Prometheus "receiver" becomes a pain. This is currently only elegantly solved by the Target Allocator; any other solution, like randomly spreading the config and somehow feeding it in via separate configuration management, creates complicated dynamics.
But here comes the problem: the TA is written and designed for a Kubernetes environment and is currently tightly coupled to the otel-operator codebase, yet it solves a domain of issues that goes beyond the orchestrator.
Currently the implementation allows me to feed in a static list of scrape configurations, which is great, but this is not flexible enough for my use case. Discovery of collectors is still hardcoded to K8s. Technically I can try to add AWS CloudMap discovery for ECS, and maybe that will be enough to make it work, but I am not sure whether this contribution would be accepted in this project.
The use case of the TA lies outside the domain of the otel operator (the Single Responsibility principle), and it would be great if any OTEL citizen, not only K8s users, had access to it :)
Describe the solution you'd like
Extend collector discovery with AWS CloudMap, based on tags and names, so that the collectors can be discovered.
Add service discovery using AWS CloudMap so endpoints can be created automatically. This is not a big issue, as I can provide static scraping config.
AWS has a sample project, https://github.com/aws-samples/prometheus-for-ecs/ (https://github.com/aws-samples/prometheus-for-ecs/blob/main/pkg/aws/cloudmap.go), and I am aiming to use a similar approach to augment the TA.
At the very least, you should be able to provide a static config of the collectors, plus a scraping config to be chunked and distributed among them, which would democratize the TA to work in any environment.
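To make the "chunked and distributed" idea concrete, here is a rough sketch of splitting a static list of scrape targets across a static list of collectors. It uses plain hash-modulo assignment, which is a simplification chosen for the example and is not claimed to be how the TA allocates targets. The function name `assign_targets` and all collector and target names are hypothetical.

```python
import hashlib


def assign_targets(targets, collectors):
    """Map every scrape target to exactly one collector, deterministically.

    Hypothetical sketch using hash-modulo assignment.
    """
    if not collectors:
        raise ValueError("at least one collector is required")
    ordered = sorted(collectors)
    assignment = {name: [] for name in ordered}
    for target in sorted(targets):
        # Use a stable hash (not Python's randomized hash()) so that every
        # run, on any machine, produces the same assignment.
        digest = hashlib.sha256(target.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(ordered)
        assignment[ordered[index]].append(target)
    return assignment


if __name__ == "__main__":
    # Names and addresses below are made up for the example.
    plan = assign_targets(
        ["10.0.1.5:9090", "10.0.1.7:9090", "10.0.2.3:8080"],
        ["collector-0", "collector-1"],
    )
    for collector, assigned in plan.items():
        print(collector, assigned)
```

One known drawback of the modulo approach is that adding or removing a collector moves many targets at once; a consistent-hashing scheme would limit that churn.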
Describe alternatives you've considered
Manually randomize the configuration, write it to AWS SSM, and potentially let init containers for the collectors' Service in ECS determine which config belongs to which collector when they start. This is a flawed approach, but apart from the TA there is no other option.
Additional context
https://github.com/aws-samples/prometheus-for-ecs/