[receiver/azuremonitorreceiver] feat: Allow to not split result by dimension #36240
Hey @codeboten, @nslaughter, @jpkrohling. Any opinion on that? I feel it's a no-brainer, so if you can run the pipeline and have a look, that would be cool!
We have noticed that some metrics do not return any data if you include dimensions that have no data, or that are empty or nil (an assumption based on the results we have seen). For instance, Azure Firewall Network Rule Hits has 3 dimensions: Status, Reason and Protocol. When filtering on Status and Reason we get data back, but as soon as you add Protocol it returns no data at all. It also returns no data if you filter on Protocol alone. We have seen the same for other metrics that have "empty" dimension values. Would it maybe make sense to not only opt in / out, but also set which dimensions you would like to filter on? We would still want Reason and Status, but could drop the Protocol dimension.
That's interesting. Then for that particular metric, using the split_by_dimensions config field would allow you to receive data, but you would lose the Status and Reason granularity. I just checked and indeed we have the same result even in the Azure Portal UI. Selecting I believe this is a bug on the

Let me propose a config structure:

dimensions:
  # default to true to not introduce a breaking change. This would mean that all the available dimensions will be collected, except if an exclusion exists.
  enabled: true | false
  exclusions:
    "Microsoft.Network/azureFirewalls": # service name
      "Network rules hit count": # metric name
        - "Protocol"

We can also implement it the other way:

dimensions:
  enabled: true
  overrides:
    "Microsoft.Network/azureFirewalls":
      "Network rules hit count": [Reason, Status]

WDYT?
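For readers following along, here is a minimal Go sketch of how the second (overrides) format could be resolved for a given metric. The `DimensionsConfig` type, its fields, and `resolveDimensions` are hypothetical names chosen for illustration; they are an assumption, not the code in this PR.

```go
package main

// DimensionsConfig mirrors the "overrides" proposal above.
// Hypothetical type: the names are illustrative, not the receiver's API.
type DimensionsConfig struct {
	// Enabled turns dimension splitting on or off globally.
	Enabled bool
	// Overrides maps resource type -> metric name -> dimensions to keep.
	Overrides map[string]map[string][]string
}

// resolveDimensions returns the dimensions to request for one metric:
// nothing when splitting is disabled, the override list when one exists,
// and every available dimension otherwise.
func resolveDimensions(cfg DimensionsConfig, resourceType, metricName string, available []string) []string {
	if !cfg.Enabled {
		return nil
	}
	// Indexing a nil map is safe in Go and simply reports "not found".
	if dims, ok := cfg.Overrides[resourceType][metricName]; ok {
		return dims
	}
	return available
}
```

With the firewall override from the proposal, a call for "Network rules hit count" would return only Reason and Status, while any metric without an override would keep all of its available dimensions.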
I also think this is a bug in the API, and one that does not only affect this metric. We noticed it yesterday at 11:10, when suddenly all PostgreSQL flexible databases stopped reporting CPU, memory, storage, etc. Filtering on ServerName had been working before that, but suddenly it doesn't work anymore. Nevertheless, it seems like a good feature when you do not need the granularity of all dimensions. I think the last option then makes more sense: override and specify which dimensions you need.
Okay I will make an update.
I agree, the second proposed config format looks better to me.
ping @tesharp I finished and pushed, and I confirm that before these new overrides, the network rule hit metric was not even in the result! Now it works well 💪 Note that it's probably worth moving these dimensions functions out of the scraper.go file, because if I do scraper_batch right after, I will reuse them. Edit: moved to a dedicated dimensions.go file + added some unit tests.
Nice :) Looks good.
Please make sure you run the linters :)
Thanks @MovieStoreGuy, sorry for that. It should be fine now.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
…mension Signed-off-by: Célian Garcia <[email protected]>
Description
Currently there is a mechanism that splits the result by dimension, thanks to a filter param hack.
It is great that all the dimensions are collected as labels in the metrics, but for some resource types this can be unwanted, because of concerns about cardinality or about the continuity of queries from one version to another (e.g. with the prometheus exporter, if one does not do a "sum by ..." and the otelcol version is updated, the query can display different results).
To mitigate that, I propose adding an opt-out for this collection. If we want dimensions for some resource types and not for others, we can create two separate receivers, for example. Or we can opt out completely if we don't want additional labels.
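As a rough illustration of the filter param mentioned above, here is a minimal Go sketch. It assumes the split is obtained by passing a `$filter` of the form `Dim eq '*'` for each dimension to the Azure Monitor metrics API; the function name `dimensionsFilter` is hypothetical and the receiver's real implementation may differ.

```go
package main

import "strings"

// dimensionsFilter builds a filter expression asking for one time series
// per value of each listed dimension, e.g. "Status eq '*' and Reason eq '*'".
// Hypothetical helper for illustration; not the receiver's actual code.
func dimensionsFilter(dimensions []string) string {
	clauses := make([]string, 0, len(dimensions))
	for _, d := range dimensions {
		clauses = append(clauses, d+" eq '*'")
	}
	// An empty list yields an empty filter, i.e. no split by dimension.
	return strings.Join(clauses, " and ")
}
```

In this sketch, opting out simply means passing no dimensions, so no filter is sent and the result is not split.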
Edit:
After a first review, we agreed that it could do a bit more, such as allowing us to specify a list of dimensions for a particular metric.
E.g. from the added documentation:
Without this config you won't have the azure_networkrulehit_average_Count metric at all in your results.
But with the config we have it with reason and status labels:
# HELP azure_networkrulehit_average_Count
# TYPE azure_networkrulehit_average_Count gauge
azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Allow",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"} 21.875
azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Deny",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"} 10
Link to tracking issue
Fixes #36611
Testing
Documentation