Missing Metrics starting in 1.4.0? #1686

Open
peakematt opened this issue Nov 11, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@peakematt
What steps did you take and what happened:

I'm working on an upgrade from v1.3.1 to v1.4.6. When I performed this upgrade in a development environment, all pods came up cleanly, but I noticed that some metrics, such as total_node_publish_error, disappeared from our monitoring tool. I found the changelog entry for v1.4.0 indicating the metric was renamed, and updated our monitoring tool to expect node_publish_error_total instead of total_node_publish_error. However, even after handling the rename, the metric still isn't available. Other metrics, such as rotation_reconcile_duration_sec, are still reported on every version I tried.

If I port-forward :8095 on the pod to localhost and view the /metrics page, none of the node_* metrics are shown. To troubleshoot, I tried reverting versions to see where these metrics were lost. The last version where I see these metrics is v1.3.4. It seems like something happened in v1.4.0 that dropped them. Is there a way to get them back?
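To make the check above repeatable across versions, here is a minimal sketch that scans a /metrics scrape for the metric names listed in this report. It assumes the port-forward to localhost:8095 is active and that the page uses the standard Prometheus text format. The helper names (`metric_names`, `missing_metrics`, `fetch`) are made up for illustration and are not part of the driver.

```python
import re
import urllib.request

# Metric names taken from this report.
EXPECTED = [
    "node_publish_total",
    "node_unpublish_total",
    "node_publish_error_total",
    "node_unpublish_error_total",
    "sync_k8s_secret_total",
]

_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition: str) -> set[str]:
    """Return the sample names found in Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(exposition: str, expected=EXPECTED) -> list[str]:
    """List the expected metric names that have no sample in the scrape."""
    found = metric_names(exposition)
    return [name for name in expected if name not in found]


def fetch(url: str = "http://localhost:8095/metrics") -> str:
    """Fetch the metrics page; the default URL assumes an active port-forward."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

With the port-forward running, `print(missing_metrics(fetch()))` prints the expected names that are absent from the scrape, which makes it easy to compare one image tag against another.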

What did you expect to happen:

I expected the node_publish_total, node_unpublish_total, node_publish_error_total, node_unpublish_error_total, and sync_k8s_secret_total metrics to be available on :8095/metrics.

Even if some of these metrics aren't created until there are values to report (i.e. they default to null rather than 0), I would expect to still have at least node_publish_error_total. When I revert to v1.3.4 in our dev environment, this metric is immediately published with value 1 (which is a different issue I need to investigate 😅 ).
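The distinction between "not exported at all" and "exported with value 0" can be checked directly on a saved scrape. This is a rough sketch under the same assumption of Prometheus text-format output. `classify` is a hypothetical helper, and it sums the value across all label sets of the metric.

```python
def classify(exposition: str, name: str) -> str:
    """Return 'absent', 'zero', or 'nonzero' for a metric in a text-format scrape."""
    total = 0.0
    seen = False
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Sample lines look like `name{labels} value` or `name value`.
        head = line.split("{", 1)[0].split()
        if not head or head[0] != name:
            continue
        # The value is the first token after the label block, if there is one.
        rest = line.rpartition("}")[2] if "{" in line else line[len(name):]
        total += float(rest.split()[0])
        seen = True
    if not seen:
        return "absent"
    return "zero" if total == 0 else "nonzero"
```

A result of "absent" for node_publish_error_total on v1.4.x, against "nonzero" on v1.3.4, would match the behavior described in this report.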

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Which provider are you using:
[e.g. Azure Key Vault, HashiCorp Vault, etc. Have you checked out the provider's repo for more help?]

GCP. I have checked https://github.com/GoogleCloudPlatform/secrets-store-csi-driver-provider-gcp/issues?q=is:issue+metrics and don't see any related issues.

Environment:

  • Secrets Store CSI Driver version: (use the image tag): v1.4.0
  • Kubernetes version: (use kubectl version):
      Client Version: v1.29.7
      Server Version: v1.29.9-gke.1496000
@peakematt peakematt added the kind/bug Categorizes issue or PR as related to a bug. label Nov 11, 2024