Vertical Pod Autoscaler is not recreating pods at runtime #6915

Open
VivekPandeyDevOps opened this issue Jun 11, 2024 · 25 comments · May be fixed by #7655
Assignees
Labels
area/vertical-pod-autoscaler kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@VivekPandeyDevOps

Which component are you using?:
vertical-pod-autoscaler

What version of the component are you using?:
v1.1.2

What k8s version are you using (kubectl version)?:
Client Version: v1.29.3-eks-ae9a62a
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b

What environment is this in?:
AWS Elastic Kubernetes Service

What did you expect to happen?:
Whenever the load on the existing pods increases beyond the limit specified in the deployment spec, VPA should recommend new values and recreate the pods.

What happened instead?:
Whenever the load on the existing pods increases beyond the limit specified in the deployment spec, VPA recommends new values, but the pods are not recreated.

How to reproduce it (as minimally and precisely as possible):
Configured VPA with the below spec:
resourcePolicy:
  containerPolicies:
    - containerName: 'ecmweb'
      minAllowed:
        cpu: 200m
        memory: 1024Mi
      maxAllowed:
        cpu: 1000m
        memory: 3072Mi
      controlledResources: ["cpu","memory"]
updatePolicy:
  updateMode: "Recreate"

Applied the VPA.yml; the recommended values were as below:

Recommendation:
  Container Recommendations:
    Container Name: ecmweb
    Lower Bound:
      Cpu: 200m
      Memory: 3Gi
    Target:
      Cpu: 763m
      Memory: 3Gi
    Uncapped Target:
      Cpu: 763m
      Memory: 12965919539
    Upper Bound:
      Cpu: 1
      Memory: 3Gi
Events:
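
The relationship between "Uncapped Target" and "Target" in this output can be checked with a little arithmetic. The following is a minimal sketch, assuming the target is simply the uncapped value clamped to the `minAllowed`/`maxAllowed` range from the VPA spec above. The helper names (`parse_cpu`, `parse_memory`, `cap`) are hypothetical and only handle the quantity formats that appear in this issue.

```python
# Hypothetical helpers: they cover only the quantity formats seen in this issue.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Parse a memory quantity such as '3072Mi' or '12965919539' into bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def parse_cpu(quantity: str) -> int:
    """Parse a CPU quantity such as '763m' or '1' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(quantity) * 1000


def cap(value: int, lowest: int, highest: int) -> int:
    """Clamp value into the inclusive range [lowest, highest]."""
    return max(lowest, min(value, highest))


# Values taken from the VPA spec and the recommendation shown in this issue.
cpu_target = cap(parse_cpu("763m"), parse_cpu("200m"), parse_cpu("1000m"))
mem_target = cap(
    parse_memory("12965919539"), parse_memory("1024Mi"), parse_memory("3072Mi")
)

print(cpu_target)                        # CPU target in millicores
print(mem_target == parse_memory("3Gi"))  # memory target equals the 3Gi cap
```

Under that assumption the CPU target stays at 763m because it is already inside the allowed range, while the memory target is held at 3072Mi (3Gi) even though the uncapped value is roughly 12 GiB.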

Updated the deployment spec as below:
resources:
  requests:
    memory: 50Mi
    cpu: 1m
  limits:
    memory: 4096Mi
    cpu: 800m

Redeployed the containers. Describing the newly created pod showed the following resources set by VPA:
kubectl describe po ecmweb-64574fc46d-hgmz7 -n newgen | grep cpu
vpaUpdates: Pod resources updated by ecmweb: container 0: cpu request, memory request, cpu limit, memory limit
cpu: 610400m
cpu: 763m

kubectl describe po ecmweb-64574fc46d-hgmz7 -n newgen | grep memory
vpaUpdates: Pod resources updated by ecmweb: container 0: cpu request, memory request, cpu limit, memory limit
memory: 263882790666
memory: 3Gi
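
The large limit values in this output (`cpu: 610400m`, `memory: 263882790666`) are consistent with VPA keeping the limit-to-request ratio from the pod template when it rewrites the requests. The sketch below reproduces the numbers from the deployment spec (requests 1m / 50Mi, limits 800m / 4096Mi) and the recommended targets (763m / 3Gi). The function name `scale_limit` is hypothetical, and rounding down is an assumption that happens to match the values reported here.

```python
MI = 1024**2
GI = 1024**3


def scale_limit(original_request: int, original_limit: int, new_request: int) -> int:
    """Scale a limit so that limit/request keeps the ratio of the original spec.

    Uses integer floor division; the exact rounding VPA applies is an assumption.
    """
    return new_request * original_limit // original_request


# CPU in millicores: request 1m, limit 800m, new request 763m.
cpu_limit = scale_limit(1, 800, 763)

# Memory in bytes: request 50Mi, limit 4096Mi, new request 3Gi.
mem_limit = scale_limit(50 * MI, 4096 * MI, 3 * GI)

print(cpu_limit)  # 610400
print(mem_limit)  # 263882790666
```

Read this way, the running pod has a CPU limit of 610400m rather than the 800m written in the deployment template, because the 1:800 ratio is applied to the new 763m request.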

After that I generated load using JMeter and observed the current resource utilization. At one point it was as below:

NAME     MODE       CPU    MEM   PROVIDED   AGE
ecmweb   Recreate   813m   3Gi   True       148m

ecmweb-64574fc46d-hgmz7                      1/1     Running   0          12m
ecmweb-64574fc46d-q4vsr                      1/1     Running   0          13m

ecmweb-64574fc46d-hgmz7                      1286m        10559Mi
ecmweb-64574fc46d-q4vsr                      1346m        10600Mi

Still, the pods were not recreated.

**Anything else we need to know?**:

@VivekPandeyDevOps VivekPandeyDevOps added the kind/bug Categorizes issue or PR as related to a bug. label Jun 11, 2024
@Akuku25

Akuku25 commented Jun 11, 2024

Faced the same issue as well. I set the Deployment with a CPU of 500m. My expectation was that if I bombarded the workloads with requests to the point where they required more CPU than the maximum allowed, the VPA would apply a new recommended value and recreate the pod. In my case the recommended value for CPU was 564m.

What happened is that the pods were never recreated. They kept running with CPU usage higher than the limit set, and the new values were never applied.

@adrianmoisey
Member

/area vertical-pod-autoscaler

@adrianmoisey
Member

Have you looked at the logs for the updater to see if it mentions why it doesn't evict the pod?

@adrianmoisey
Member

I set up a test, and after a short while the eviction did happen:

I0611 18:05:08.116565       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-mw7kq with priority 106.7
I0611 18:05:08.117329       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-67bvs with priority 106.7
I0611 18:05:08.117390       1 updater.go:220] evicting pod hamster-7b87ffb764-mw7kq
I0611 18:05:08.139078       1 event.go:298] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"hamster-7b87ffb764-mw7kq", UID:"c7192ca8-8d04-4fff-8733-4c5891452e18", APIVersion:"v1", ResourceVersion:"30413605", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.

@okanIz

okanIz commented Jun 12, 2024

I have exactly the same problem and also use the versions mentioned above: k8s v1.29, VPA v1.1.2 (btw, it works for me with k8s v1.26 and VPA v1.1.2). In addition, I use the provided "hamster" deployment to check the functionality and have found that the admission controller does not generate any requests. Strangely enough, the hamster pod restarts, but always with the same specs.

@Akuku25

Akuku25 commented Jun 12, 2024

> I set up a test, and after a short while the eviction did happen:
>
> I0611 18:05:08.116565       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-mw7kq with priority 106.7
> I0611 18:05:08.117329       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-67bvs with priority 106.7
> I0611 18:05:08.117390       1 updater.go:220] evicting pod hamster-7b87ffb764-mw7kq
> I0611 18:05:08.139078       1 event.go:298] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"hamster-7b87ffb764-mw7kq", UID:"c7192ca8-8d04-4fff-8733-4c5891452e18", APIVersion:"v1", ResourceVersion:"30413605", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.

For the initial resource requests it is always updating. However, when I increase the load and the required CPU goes beyond the limit set in the deployment manifest, the pods do not get updated with the new recommended values. When I check resource utilization using the 'top pods' command, I can see that CPU utilization is beyond my set limit, but there is no event for an update of the resource requests on the pod.

@adrianmoisey
Member

> I have exactly the same problem and also use the versions mentioned above: k8s v1.29, VPA v1.1.2 (btw, it works for me with k8s v1.26 and VPA v1.1.2). In addition, I use the provided "hamster" deployment to check the functionality and have found that the admission controller does not generate any requests. Strangely enough, the hamster pod restarts, but always with the same specs.

Can you provide an example VPA config that isn't working, specifically the targetRef part?

I know that there's a bug with the targetRef that if the kind isn't capitalised correctly, some parts of the VPA don't work.

For example, when I have kind: deployment, the admission-controller doesn't match the new Pod since it can't find the VPA:

I0612 17:44:00.005936       1 matcher.go:73] Let's choose from 1 configs for pod default/hamster-7b87ffb764-%
I0612 17:44:00.005981       1 handler.go:82] No matching VPA found for pod default/hamster-7b87ffb764-%
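
A quick pre-flight check can catch this kind of capitalisation slip before the VPA object is applied. The sketch below is illustrative only: `normalize_kind` is a hypothetical helper and not part of VPA, and the list of kinds is an assumption limited to the common built-in workload controllers.

```python
# Assumption: a non-exhaustive list of built-in workload kinds a targetRef may use.
KNOWN_KINDS = (
    "CronJob",
    "DaemonSet",
    "Deployment",
    "Job",
    "ReplicaSet",
    "ReplicationController",
    "StatefulSet",
)


def normalize_kind(kind: str) -> str:
    """Return the canonical spelling of a workload kind.

    Raises ValueError when the kind does not match any known kind,
    even when compared case-insensitively.
    """
    for known in KNOWN_KINDS:
        if known.lower() == kind.lower():
            return known
    raise ValueError(f"unknown targetRef kind: {kind!r}")


# Example: flag a targetRef whose kind differs from the canonical spelling.
target_ref = {"apiVersion": "apps/v1", "kind": "deployment", "name": "hamster"}
canonical = normalize_kind(target_ref["kind"])
if canonical != target_ref["kind"]:
    print(f"targetRef.kind should be {canonical!r}, got {target_ref['kind']!r}")
```

Running such a check in CI, or before `kubectl apply`, would surface the mismatch without having to read the admission-controller logs.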

@adrianmoisey
Member

> For the initial resource requests it is always updating. However, when I increase the load and the required CPU goes beyond the limit set in the deployment manifest, the pods do not get updated with the new recommended values. When I check resource utilization using the 'top pods' command, I can see that CPU utilization is beyond my set limit, but there is no event for an update of the resource requests on the pod.

Please provide logs from the admission-controller

@voelzmo
Contributor

voelzmo commented Jul 15, 2024

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jul 15, 2024
@raywainman
Contributor

Would be great to see the admission-controller and updater logs when the issue happens.

Any chance someone could share those here?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 13, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 12, 2024
@avdeevartem2

avdeevartem2 commented Nov 28, 2024

Hello @adrianmoisey @raywainman. Same issue as @okanIz.
Hamster pods are recreated, but without any change to requests and limits.
AWS EKS version: v1.29.10-eks-7f9249a
Admission controller and updater version: 1.2.1
These are the logs from the admission controller:

I1128 06:06:33.533192       1 flags.go:57] FLAG: --add-dir-header="false"
I1128 06:06:33.533372       1 flags.go:57] FLAG: --address=":8944"
I1128 06:06:33.533378       1 flags.go:57] FLAG: --alsologtostderr="false"
I1128 06:06:33.533383       1 flags.go:57] FLAG: --client-ca-file="/etc/tls-certs/caCert.pem"
I1128 06:06:33.533388       1 flags.go:57] FLAG: --ignored-vpa-object-namespaces=""
I1128 06:06:33.533392       1 flags.go:57] FLAG: --kube-api-burst="10"
I1128 06:06:33.533397       1 flags.go:57] FLAG: --kube-api-qps="5"
I1128 06:06:33.533401       1 flags.go:57] FLAG: --kubeconfig=""
I1128 06:06:33.533405       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I1128 06:06:33.533411       1 flags.go:57] FLAG: --log-dir=""
I1128 06:06:33.533415       1 flags.go:57] FLAG: --log-file=""
I1128 06:06:33.533420       1 flags.go:57] FLAG: --log-file-max-size="1800"
I1128 06:06:33.533424       1 flags.go:57] FLAG: --logtostderr="true"
I1128 06:06:33.533428       1 flags.go:57] FLAG: --min-tls-version="tls1_2"
I1128 06:06:33.533433       1 flags.go:57] FLAG: --one-output="false"
I1128 06:06:33.533437       1 flags.go:57] FLAG: --port="8000"
I1128 06:06:33.533441       1 flags.go:57] FLAG: --register-by-url="false"
I1128 06:06:33.533445       1 flags.go:57] FLAG: --register-webhook="true"
I1128 06:06:33.533448       1 flags.go:57] FLAG: --reload-cert="false"
I1128 06:06:33.533453       1 flags.go:57] FLAG: --skip-headers="false"
I1128 06:06:33.533458       1 flags.go:57] FLAG: --skip-log-headers="false"
I1128 06:06:33.533462       1 flags.go:57] FLAG: --stderrthreshold="2"
I1128 06:06:33.533466       1 flags.go:57] FLAG: --tls-cert-file="/etc/tls-certs/serverCert.pem"
I1128 06:06:33.617607       1 flags.go:57] FLAG: --tls-ciphers=""
I1128 06:06:33.617642       1 flags.go:57] FLAG: --tls-private-key="/etc/tls-certs/serverKey.pem"
I1128 06:06:33.617649       1 flags.go:57] FLAG: --v="4"
I1128 06:06:33.617654       1 flags.go:57] FLAG: --vmodule=""
I1128 06:06:33.617660       1 flags.go:57] FLAG: --vpa-object-namespace=""
I1128 06:06:33.617664       1 flags.go:57] FLAG: --webhook-address=""
I1128 06:06:33.617668       1 flags.go:57] FLAG: --webhook-port=""
I1128 06:06:33.617691       1 flags.go:57] FLAG: --webhook-service="vpa-webhook"
I1128 06:06:33.617696       1 flags.go:57] FLAG: --webhook-timeout-seconds="30"
I1128 06:06:33.617714       1 main.go:87] Vertical Pod Autoscaler 1.2.1 Admission Controller
I1128 06:06:33.619932       1 reflector.go:289] Starting reflector *v1.VerticalPodAutoscaler (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90
I1128 06:06:33.619971       1 reflector.go:325] Listing and watching *v1.VerticalPodAutoscaler from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90
I1128 06:06:33.720271       1 shared_informer.go:341] caches populated
I1128 06:06:33.720335       1 api.go:94] Initial VPA synced successfully
I1128 06:06:33.721189       1 reflector.go:289] Starting reflector *v1.CronJob (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:33.721242       1 reflector.go:325] Listing and watching *v1.CronJob from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:33.821335       1 shared_informer.go:341] caches populated
I1128 06:06:33.821379       1 fetcher.go:99] Initial sync of CronJob completed
I1128 06:06:33.822094       1 reflector.go:289] Starting reflector *v1.DaemonSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:33.822181       1 reflector.go:325] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:33.922516       1 shared_informer.go:341] caches populated
I1128 06:06:33.922602       1 fetcher.go:99] Initial sync of DaemonSet completed
I1128 06:06:33.922885       1 reflector.go:289] Starting reflector *v1.Deployment (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:33.922902       1 reflector.go:325] Listing and watching *v1.Deployment from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:36.123783       1 shared_informer.go:341] caches populated
I1128 06:06:36.123829       1 fetcher.go:99] Initial sync of Deployment completed
I1128 06:06:36.124074       1 reflector.go:289] Starting reflector *v1.ReplicaSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:36.124102       1 reflector.go:325] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:50.923663       1 trace.go:236] Trace[1908865649]: "Reflector ListAndWatch" name:k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94 (28-Nov-2024 06:06:36.124) (total time: 14799ms):
Trace[1908865649]: ---"Objects listed" error:<nil> 14705ms (06:06:50.829)
Trace[1908865649]: ---"Resource version extracted" 0ms (06:06:50.829)
Trace[1908865649]: ---"Objects extracted" 5ms (06:06:50.834)
Trace[1908865649]: ---"SyncWith done" 88ms (06:06:50.923)
Trace[1908865649]: ---"Resource version updated" 0ms (06:06:50.923)
Trace[1908865649]: [14.799424184s] [14.799424184s] END
I1128 06:06:51.130815       1 shared_informer.go:341] caches populated
I1128 06:06:51.130880       1 fetcher.go:99] Initial sync of ReplicaSet completed
I1128 06:06:51.131170       1 reflector.go:289] Starting reflector *v1.StatefulSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:51.131288       1 reflector.go:325] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:51.532061       1 shared_informer.go:341] caches populated
I1128 06:06:51.532114       1 fetcher.go:99] Initial sync of StatefulSet completed
I1128 06:06:51.532375       1 reflector.go:289] Starting reflector *v1.ReplicationController (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:51.532490       1 reflector.go:325] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:51.633106       1 shared_informer.go:341] caches populated
I1128 06:06:51.633154       1 fetcher.go:99] Initial sync of ReplicationController completed
I1128 06:06:51.633439       1 reflector.go:289] Starting reflector *v1.Job (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:51.633516       1 reflector.go:325] Listing and watching *v1.Job from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I1128 06:06:52.833345       1 shared_informer.go:341] caches populated
I1128 06:06:52.833424       1 fetcher.go:99] Initial sync of Job completed
I1128 06:06:52.834351       1 shared_informer.go:341] caches populated
I1128 06:06:52.834388       1 controller_fetcher.go:141] Initial sync of ReplicationController completed
I1128 06:06:52.834407       1 shared_informer.go:341] caches populated
I1128 06:06:52.835826       1 controller_fetcher.go:141] Initial sync of Job completed
I1128 06:06:52.836373       1 shared_informer.go:341] caches populated
I1128 06:06:52.836409       1 controller_fetcher.go:141] Initial sync of CronJob completed
I1128 06:06:52.836421       1 shared_informer.go:341] caches populated
I1128 06:06:52.836427       1 controller_fetcher.go:141] Initial sync of DaemonSet completed
I1128 06:06:52.836435       1 shared_informer.go:341] caches populated
I1128 06:06:52.836442       1 controller_fetcher.go:141] Initial sync of Deployment completed
I1128 06:06:52.837068       1 shared_informer.go:341] caches populated
I1128 06:06:52.837100       1 controller_fetcher.go:141] Initial sync of ReplicaSet completed
I1128 06:06:52.837124       1 shared_informer.go:341] caches populated
I1128 06:06:52.837131       1 controller_fetcher.go:141] Initial sync of StatefulSet completed
W1128 06:06:52.837222       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W1128 06:06:52.837318       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W1128 06:06:52.837660       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W1128 06:06:52.837951       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W1128 06:06:52.837981       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W1128 06:06:52.837991       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W1128 06:06:52.837999       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I1128 06:06:52.838560       1 reflector.go:289] Starting reflector *v1.LimitRange (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60
I1128 06:06:52.838598       1 reflector.go:325] Listing and watching *v1.LimitRange from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60
I1128 06:06:52.938351       1 shared_informer.go:341] caches populated
I1128 06:06:52.939657       1 certs.go:41] Successfully read 1168 bytes from /etc/tls-certs/caCert.pem
I1128 06:07:02.981960       1 config.go:174] Self registration as MutatingWebhook succeeded.
I1128 06:13:05.029850       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Deployment total 173 items received
I1128 06:13:15.539442       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicationController total 8 items received
I1128 06:13:23.822798       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Job total 27 items received
I1128 06:13:49.735145       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.CronJob total 17 items received
I1128 06:14:25.848897       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60: Watch close - *v1.LimitRange total 9 items received
I1128 06:14:49.846306       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 9 items received
I1128 06:14:58.633917       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90: Watch close - *v1.VerticalPodAutoscaler total 9 items received
I1128 06:15:55.529191       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.StatefulSet total 10 items received
I1128 06:16:38.926442       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicaSet total 233 items received
I1128 06:20:08.541381       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicationController total 8 items received
I1128 06:20:43.847634       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 6 items received
I1128 06:20:44.636378       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90: Watch close - *v1.VerticalPodAutoscaler total 6 items received
I1128 06:21:39.824856       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Job total 35 items received

These are the logs from the updater:

I1128 06:19:33.556206       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/eviction/pods_eviction_restriction.go:387: Watch close - *v1.ReplicaSet total 113 items received
I1128 06:19:51.585378       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 11 items received
I1128 06:19:56.923635       1 update_priority_calculator.go:146] pod accepted for update avdeev/hamster-c6967774f-klzsf with priority 7.77 - processed recommendations:
hamster: target: 262144k 477m; uncappedTarget: 262144k 477m;
I1128 06:19:56.923963       1 update_priority_calculator.go:146] pod accepted for update avdeev/hamster-c6967774f-qvjmh with priority 7.77 - processed recommendations:
hamster: target: 262144k 477m; uncappedTarget: 262144k 477m;
I1128 06:19:56.924001       1 updater.go:228] evicting pod avdeev/hamster-c6967774f-klzsf
I1128 06:19:56.946087       1 event.go:298] Event(v1.ObjectReference{Kind:"Pod", Namespace:"avdeev", Name:"hamster-c6967774f-klzsf", UID:"54dc0b73-0f9b-470d-a91f-030ebe8beee2", APIVersion:"v1", ResourceVersion:"1365941002", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.
I1128 06:20:13.761599       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/eviction/pods_eviction_restriction.go:387: Watch close - *v1.DaemonSet total 9 items received
I1128 06:20:52.049534       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Deployment total 85 items received
I1128 06:20:56.926458       1 update_priority_calculator.go:146] pod accepted for update avdeev/hamster-c6967774f-qvjmh with priority 7.77 - processed recommendations:
hamster: target: 262144k 477m; uncappedTarget: 262144k 477m;
I1128 06:20:56.926670       1 update_priority_calculator.go:146] pod accepted for update avdeev/hamster-c6967774f-5x26r with priority 7.77 - processed recommendations:
hamster: target: 262144k 477m; uncappedTarget: 262144k 477m;
I1128 06:20:56.926712       1 updater.go:228] evicting pod avdeev/hamster-c6967774f-qvjmh
I1128 06:20:57.005417       1 event.go:298] Event(v1.ObjectReference{Kind:"Pod", Namespace:"avdeev", Name:"hamster-c6967774f-qvjmh", UID:"b69a84e3-cb61-4187-b92f-0263d2d8dbb7", APIVersion:"v1", ResourceVersion:"1365943351", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.
I1128 06:21:42.449800       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Job total 36 items received
I1128 06:21:44.438531       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:302: Watch close - *v1.Pod total 950 items received
I1128 06:21:57.104293       1 update_priority_calculator.go:146] pod accepted for update avdeev/hamster-c6967774f-btzc7 with priority 7.77 - processed recommendations:
hamster: target: 262144k 477m; uncappedTarget: 262144k 477m;
I1128 06:21:57.104365       1 update_priority_calculator.go:146] pod accepted for update avdeev/hamster-c6967774f-5x26r with priority 7.77 - processed recommendations:
hamster: target: 262144k 477m; uncappedTarget: 262144k 477m;
I1128 06:21:57.104395       1 updater.go:228] evicting pod avdeev/hamster-c6967774f-btzc7
I1128 06:21:57.121245       1 event.go:298] Event(v1.ObjectReference{Kind:"Pod", Namespace:"avdeev", Name:"hamster-c6967774f-btzc7", UID:"3bc408ca-709f-4386-8b30-f679b788bcea", APIVersion:"v1", ResourceVersion:"1365948326", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.

mutatingwebhookconfigurations.admissionregistration.k8s.io/vpa-webhook-config

   Manager:         admission-controller
    Operation:       Update
    Time:            2024-11-28T06:07:02Z
  Resource Version:  1365911218
  UID:               531c38db-d60f-4d21-841d-59996782d70c
Webhooks:
  Admission Review Versions:
    v1
  Client Config:
    Ca Bundle:  Some CA
    Service:
      Name:        vpa-webhook
      Namespace:   vertical-pod-autoscaler
      Port:        443
  Failure Policy:  Ignore
  Match Policy:    Equivalent
  Name:            vpa.k8s.io
  Namespace Selector:
    Match Expressions:
      Key:       kubernetes.io/metadata.name
      Operator:  NotIn
      Values:

  Object Selector:
  Reinvocation Policy:  Never
  Rules:
    API Groups:

    API Versions:
      v1
    Operations:
      CREATE
    Resources:
      pods
    Scope:  *
    API Groups:
      autoscaling.k8s.io
    API Versions:
      *
    Operations:
      CREATE
      UPDATE
    Resources:
      verticalpodautoscalers
    Scope:          *
  Side Effects:     None
  Timeout Seconds:  30
Events:             <none>

@adrianmoisey
Member

Everything here looks good.
Since both of you are struggling on EKS, I wonder if it's something specific to EKS?

I found this article which includes info about monitoring the webhook. Could you see if you can get logs from the control-plane?

@avdeevartem2

This is the error from kube-api:

Failed calling webhook, failing open vpa.k8s.io: failed calling webhook "vpa.k8s.io": failed to call webhook: Post "https://vpa-webhook.vertical-pod-autoscaler.svc:443/?timeout=30s": Address is not allowed

I think I know how to solve this problem: we should use `hostNetwork: true` for our admission-controller deployment. But when I enabled it, my kube-api got a lot of errors and couldn't schedule pods in the cluster. Such strange behaviour.

@adrianmoisey
Member

Based on this comment, I'm wondering if there's some proxy allowed setting in EKS that doesn't contain the IP address for the vpa-webhook service

@avdeevartem2

avdeevartem2 commented Nov 28, 2024

> Based on this comment, I'm wondering if there's some proxy allowed setting in EKS that doesn't contain the IP address for the vpa-webhook service

I think this is not our case. We use Cilium, which adds an overlay network to our cluster, and this is the reason why kube-api can't connect to pods. aws/containers-roadmap#2227

I'll try using hostNetwork again and fix the kube-api hanging and come back with a solution.

@avdeevartem2

I found a solution. In EKS, when you use Cilium, you should use `hostNetwork: true` or an NLB for deployments with webhooks.
And if you have security groups on your nodes, you should allow the vpa-admission-controller port (8000 by default) for the k8s nodes subnet.
This happens because Cilium adds an overlay network and the control plane in EKS can't connect to a ClusterIP in the cluster.

btw, when your VPA webhook doesn't work, it adds a 30s delay to the deployment of EACH pod in the cluster (because the default VPA webhook intercepts all CREATE and UPDATE pod API calls).
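
For readers who want to try the `hostNetwork: true` route, the change amounts to one field in the Deployment's pod template. The following is a minimal sketch that builds a patch body for it. The function name `host_network_patch` is hypothetical, and the Deployment name and namespace in the usage note are placeholders to adapt to your own install.

```python
import json


def host_network_patch(enabled: bool = True) -> dict:
    """Build a patch body that sets hostNetwork on a Deployment's pod template."""
    return {"spec": {"template": {"spec": {"hostNetwork": enabled}}}}


patch = host_network_patch()

# Print compact JSON that can be passed to `kubectl patch ... --patch`.
print(json.dumps(patch))
```

The printed JSON could then be applied with something like `kubectl patch deployment <admission-controller-deployment> -n <namespace> --patch '<json>'`. As noted above, node security groups also need to allow the admission-controller port from the node subnet.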

@adrianmoisey
Member

Do you have any EKS documentation that speaks about these problems and solutions? I'd like to see if we can link to them from the VPA documentation.

@avdeevartem2

avdeevartem2 commented Dec 5, 2024

I only know of the Cilium issue cilium/cilium#21959
and this one: aws/containers-roadmap#2227

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 4, 2025
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

@adrianmoisey
Member

/reopen

I want to link the docs to relevant resources still

@k8s-ci-robot k8s-ci-robot reopened this Jan 4, 2025
@k8s-ci-robot
Contributor

@adrianmoisey: Reopened this issue.

@adrianmoisey
Member

/assign

adrianmoisey added a commit to adrianmoisey/autoscaler that referenced this issue Jan 5, 2025
@adrianmoisey adrianmoisey linked a pull request Jan 5, 2025 that will close this issue

9 participants