What steps did you take and what happened:
I'm upgrading from v1.3.1 to v1.4.6. When I performed this upgrade in a development environment, all pods came up cleanly, but I noticed that some metrics, such as total_node_publish_error, disappeared from our monitoring tool. I found the changelog entry for v1.4.0 indicating that the metric was renamed, and updated our monitoring tool to expect node_publish_error_total instead of total_node_publish_error. However, even after handling the rename, the metric still isn't available. Other metrics, such as rotation_reconcile_duration_sec, are still reported across versions.
If I port-forward :8095 on the pod to localhost and view the /metrics page, none of the node_* metrics are shown. To troubleshoot, I tried reverting versions to see where these metrics were lost. The last version where I see these metrics is v1.3.4. It seems like something happened in v1.4.0 that dropped them. Is there a way to get them back?
What did you expect to happen:
I expected the metrics node_publish_total, node_unpublish_total, node_publish_error_total, node_unpublish_error_total, and sync_k8s_secret_total to be available on :8098/metrics.
Even if some of these metrics aren't created until there are values to report (i.e. they default to null rather than 0), I would expect to still have at least node_publish_error_total. When I revert to v1.3.4 in our dev environment, this metric is immediately published with value 1 (which is a different issue I need to investigate 😅).
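For anyone trying to reproduce this, the manual check of the /metrics page can be scripted. The following is only a rough sketch, not part of the driver: the helper names (`exposed_metric_names`, `missing_metrics`, `fetch_metrics`) are made up for illustration, the default URL assumes the port-forward to :8095 described above, and it assumes the endpoint serves the standard Prometheus text format with metric names exactly as listed (no extra prefix).

```python
import urllib.request

# Metric names expected after the v1.4.0 rename, taken from this report.
EXPECTED = [
    "node_publish_total",
    "node_unpublish_total",
    "node_publish_error_total",
    "node_unpublish_error_total",
    "sync_k8s_secret_total",
]


def exposed_metric_names(exposition_text):
    """Return the set of sample names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The sample name ends at the first "{" (labels) or whitespace (value).
        parts = line.split("{", 1)[0].split(None, 1)
        if parts:
            names.add(parts[0])
    return names


def missing_metrics(exposition_text, expected=EXPECTED):
    """Return the expected names that do not appear in the output, in order."""
    present = exposed_metric_names(exposition_text)
    return [name for name in expected if name not in present]


def fetch_metrics(url="http://localhost:8095/metrics"):
    """Fetch the raw /metrics body (assumes a port-forward is already running)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

With the port-forward running, `print(missing_metrics(fetch_metrics()))` lists whichever of the expected names are absent. Note that histogram metrics appear with `_bucket`, `_sum`, and `_count` suffixes, so this exact-name match is only suitable for the counters listed here.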
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Which provider are you using:
[e.g. Azure Key Vault, HashiCorp Vault, etc. Have you checked out the provider's repo for more help?]
GCP. I have checked https://github.com/GoogleCloudPlatform/secrets-store-csi-driver-provider-gcp/issues?q=is:issue+metrics and don't see any related issues.
Environment:
(kubectl version):