Vertical Pod Autoscaler is not recreating pods at runtime #6915
Comments
Faced the same issue as well. I set the Deployment with a CPU limit of 500m. My expectation was that if I bombarded the workload with requests to the point where it needed more CPU than the maximum allowed, the VPA would apply the new recommended value and recreate the pod. In my case the recommended CPU value was 564m. What actually happened is that the pods kept running with CPU usage higher than the set limit, but the pod was never recreated to apply the new values |
/area vertical-pod-autoscaler |
Have you looked at the logs for the updater to see if it mentions why it doesn't evict the pod? |
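A minimal sketch of how to check this, assuming the VPA was installed with the default manifests into the kube-system namespace (the deployment name and namespace may differ in your install):

# grep the updater logs for eviction-related messages
kubectl -n kube-system logs deployment/vpa-updater | grep -i evict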
I set up a test, and after a short while the eviction did happen:
|
I have exactly the same problem and also use the versions mentioned above, k8s v1.29 and VPA v1.1.2 (by the way, it works for me with k8s v1.26 and VPA v1.1.2). In addition, I used the provided "hamster" example deployment to check the functionality and found that the admission-controller does not generate any requests. Strangely enough, the hamster pod restarts, but always with the same specs |
For the initial resource requests it always updates. However, when I increase the load and the required CPU goes beyond the limit set in the deployment manifest, the pods do not get updated with the new recommended values. When I check resource utilization using the 'top pods' command, I can see that CPU utilization is beyond my set limit, but there is no event for an update of any new resource requests on the pod |
Can you provide an example VPA config that isn't working, specifically the targetRef? I know that there's a bug where, if the targetRef kind isn't capitalised correctly, some parts of the VPA don't work. For example, when the kind is written in lower case (see the sketch below).
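A minimal sketch of the relevant part of a VPA object, illustrating the capitalisation issue mentioned above (the workload name is a placeholder):

spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment   # must be capitalised; "deployment" breaks parts of the VPA
    name: my-app       # placeholder workload name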
|
Please provide logs from the admission-controller |
/triage needs-information |
Would be great to see the admission-controller and updater logs when the issue happens. Any chance someone could share those here? |
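For anyone following along, a sketch of how to collect those, again assuming the default install names in kube-system (the webhook configuration name vpa-webhook-config is the one shown later in this thread):

# logs from the admission-controller behind the webhook
kubectl -n kube-system logs deployment/vpa-admission-controller
# confirm the mutating webhook is registered
kubectl get mutatingwebhookconfiguration vpa-webhook-config -o yaml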
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
Hello @adrianmoisey @raywainman. Same issue as @okanIz.
These are the logs from the updater:
mutatingwebhookconfigurations.admissionregistration.k8s.io/vpa-webhook-config
|
Everything here looks good. I found this article which includes info about monitoring the webhook. Could you see if you can get logs from the control-plane? |
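On EKS (the environment reported here), API-server logs are not directly accessible with kubectl. One hedged way to get them, with placeholder cluster name and region, is to enable control-plane logging and read the api log stream in CloudWatch:

# enable API server logging for the cluster (placeholder names)
aws eks update-cluster-config --region eu-west-1 --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api"],"enabled":true}]}'
# logs then appear in the CloudWatch log group /aws/eks/my-cluster/cluster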
Based on this comment, I'm wondering if there's some allowed-proxy setting in EKS that doesn't contain the IP address for the |
I think this is not our case. We use Cilium, which adds an overlay network to our cluster, and this is the reason why the kube-apiserver can't make a connection to the pods. aws/containers-roadmap#2227 I'll try using hostNetwork again, fix the kube-api hanging, and come back with a solution. |
I found a solution. In EKS, when you use Cilium, you should use hostNetwork for the VPA admission-controller (see the sketch below). By the way, when your VPA webhook doesn't work, it adds a 30s delay to deploying EACH pod in the cluster (because the default VPA webhook intercepts all CREATE and UPDATE pod API calls) |
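For reference, a hedged sketch of that workaround, assuming the default deployment name vpa-admission-controller in kube-system; it makes the webhook listen on the node network so the managed control plane can reach it without going through the Cilium overlay (this is not official EKS or Cilium guidance):

# put the admission-controller on the host network
kubectl -n kube-system patch deployment vpa-admission-controller \
  --type merge \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'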
Do you have any EKS documentation that speaks about these problems and solutions? I'd like to see if we can link to it from the VPA documentation |
I know only cilium issue - cilium/cilium#21959 |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen I want to link the docs to relevant resources still |
@adrianmoisey: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/assign |
Ref:
- AKS: kubernetes#7392
- EKS: kubernetes#6915
Which component are you using?:
vertical-pod-autoscaler
What version of the component are you using?:
v1.1.2
What k8s version are you using (kubectl version)?:
Client Version: v1.29.3-eks-ae9a62a
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b
What environment is this in?:
AWS Elastic Kubernetes Service
What did you expect to happen?:
Whenever the load on the existing pods increases beyond the limit specified in the deployment spec, the VPA should recommend new values and recreate the pods.
What happened instead?:
Whenever the load on the existing pods increases beyond the limit specified in the deployment spec, the VPA recommends new values, but the pods are not recreated.
How to reproduce it (as minimally and precisely as possible):
Configured VPA with the below spec:
resourcePolicy:
containerPolicies:
- containerName: 'ecmweb'
minAllowed:
cpu: 200m
memory: 1024Mi
maxAllowed:
cpu: 1000m
memory: 3072Mi
controlledResources: ["cpu","memory"]
updatePolicy:
updateMode: "Recreate"
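For context, a hedged sketch of what a complete VPA manifest around this resourcePolicy might look like; the VPA name is hypothetical, and the Deployment name and namespace are inferred from the kubectl output further down, so adjust to your setup. Note the capitalised kind in targetRef, per the earlier comment about the capitalisation bug.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ecmweb-vpa            # hypothetical name
  namespace: newgen           # namespace inferred from the kubectl output below
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment          # capitalised, per the bug mentioned earlier in the thread
    name: ecmweb
  updatePolicy:
    updateMode: "Recreate"
  resourcePolicy:
    containerPolicies:
      - containerName: 'ecmweb'
        minAllowed:
          cpu: 200m
          memory: 1024Mi
        maxAllowed:
          cpu: 1000m
          memory: 3072Mi
        controlledResources: ["cpu", "memory"]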
Apply the VPA.yml; the recommended values were as below:
Recommendation:
Container Recommendations:
Container Name: ecmweb
Lower Bound:
Cpu: 200m
Memory: 3Gi
Target:
Cpu: 763m
Memory: 3Gi
Uncapped Target:
Cpu: 763m
Memory: 12965919539
Upper Bound:
Cpu: 1
Memory: 3Gi
Events:
Updated the deployment spec as below:
resources:
requests:
memory: 50Mi
cpu: 1m
limits:
memory: 4096Mi
cpu: 800m
Redeploy the containers; when describing the newly created pod, the resources allocated by the VPA were as below:
kubectl describe po ecmweb-64574fc46d-hgmz7 -n newgen | grep cpu
vpaUpdates: Pod resources updated by ecmweb: container 0: cpu request, memory request, cpu limit, memory limit
cpu: 610400m
cpu: 763m
kubectl describe po ecmweb-64574fc46d-hgmz7 -n newgen | grep memory
vpaUpdates: Pod resources updated by ecmweb: container 0: cpu request, memory request, cpu limit, memory limit
memory: 263882790666
memory: 3Gi
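Side note, not part of the original report: the very large limit values above are consistent with the VPA scaling limits proportionally, so that the request:limit ratio from the original deployment spec is preserved. A rough check with the numbers above:

cpu:    limit/request = 800m / 1m     = 800
        new request 763m  ->  new limit = 763m * 800 = 610400m
memory: limit/request = 4096Mi / 50Mi = 81.92
        new request 3Gi (3221225472 bytes)  ->  new limit = 3221225472 * 81.92 = ~263882790666 bytes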
After that, I generated load using the JMeter tool and observed the current resource utilization; at one point it was as below: