Race condition between Pods gets deployed and SPC creation/updates #1436
This issue is related to Helm's execution order, where custom resources are always deployed after Deployments. This causes pods to be deployed with an outdated SPC spec when they are created before the SPC changes are applied.
The order of installation can be found here: https://helm.sh/docs/intro/using_helm/ To reproduce this consistently.
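One workaround sometimes used for this ordering problem (a sketch only, not confirmed as the fix for this issue; the resource name and parameters below are hypothetical placeholders) is to annotate the SecretProviderClass as a Helm pre-install/pre-upgrade hook, so Helm applies it before the main manifests that include the Deployment:

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-app-secrets          # hypothetical name
  annotations:
    # Helm applies hook resources before the release's regular manifests,
    # so the SPC is created/updated before pods from the Deployment start.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
spec:
  provider: azure
  parameters: {}                # provider-specific parameters omitted
```

Note the trade-off: hook resources are not managed as part of the release, so they are not removed on `helm uninstall` unless a `helm.sh/hook-delete-policy` is also set.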
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle frozen
What steps did you take and what happened:
We are seeing sporadic failures for services installing the CSI driver for the first time, where pods fail with the following issue.
This issue also happens occasionally when we add a new secret to an existing SPC.
Pods with a MountVolume setup failure are usually recycled and recover. However, we have sometimes seen the application code run before the secrets are mounted, causing pods to crash and get stuck in CrashLoopBackOff even after a restart.
The workaround is to delete the failing pod, after which it comes back up normally.
We verified that the YAML manifests are correctly configured, and the same deployment also succeeds in most clusters.
So we suspect there is a race condition where the CSI driver attempts to set up the volume mount before the SPC is created.
Unfortunately, we can't reproduce this bug consistently, but it happens for many of our services.
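For the case where application code starts before the expected secret is present, one possible mitigation is an initContainer that blocks until the mounted secret file exists. This is a sketch only; the file path, image, and names are hypothetical placeholders, not part of the driver's documented behavior:

```yaml
# Sketch: block pod startup until the mounted secret file appears.
# Paths and names are hypothetical placeholders.
initContainers:
  - name: wait-for-secrets
    image: busybox:1.36
    command:
      - sh
      - -c
      - |
        until [ -f /mnt/secrets-store/my-secret ]; do
          echo "waiting for secret mount..."
          sleep 2
        done
    volumeMounts:
      - name: secrets-store     # the CSI volume also mounted by the app container
        mountPath: /mnt/secrets-store
        readOnly: true
```

This only guards against the app reading the volume too early; it does not fix the underlying ordering between SPC creation and pod scheduling.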
What did you expect to happen:
SecretProviderClass should be created before the CSI driver tries to set up the volume mount.
If the MountVolume setup fails, the pods should stay in the ContainerCreating state and retry the volume mount setup.
Anything else you would like to add:
We have a 20-minute timeout configured for the helm upgrade command, but the pods didn't become healthy before it expired.
Currently the temporary workaround is to delete a pod when it is stuck on a MountVolume setup failure or fails to find the secrets in the volume.
But having to manually go into the cluster to monitor pods is not ideal when we are deploying to many clusters.
Which provider are you using:
Azure KeyVault Provider
Environment:
AzurePublicCloud
kubectl version: v1.27.7