100% CPU usage of nginx worker process #10992
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I just found out that it could be related to this bug: https://trac.nginx.org/nginx/ticket/2548. As you can see in the list of annotations, I used the client-body-buffer-size annotation there. For now I have removed this annotation from that ingress resource and I will wait to see whether the CPU goes to 100% again or not.
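For reference, this is roughly how that annotation looks on an Ingress (a sketch only; the resource name, host, and backend are placeholders, and the value "0" is the one discussed in this thread and in the nginx ticket):

```yaml
# Sketch of the suspect annotation; all names below are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app                  # placeholder
  annotations:
    # "0" is the value associated with the busy loop in nginx ticket 2548
    nginx.ingress.kubernetes.io/client-body-buffer-size: "0"
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.internal       # placeholder
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-app        # placeholder
            port:
              number: 80
```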
/remove-kind bug
/triage needs-information
And yes, this link says the default value is 8K: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#client-body-buffer-size. So setting a value of 0 yourself and then reporting an issue here is very odd.
I would not say that it is very odd. If you have one big cluster where every developer can deploy their own ingresses with their own configuration, it is not good that a single wrong configuration can take down the whole cluster. Have you ever thought about validating annotation values with a webhook or something similar? This is a good candidate: validate that the value is either 8k or 16k and deny everything else, since other values could cause unexpected problems. Nevertheless, the CPU is still at a normal value, so removing the annotation seems to have helped.
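To illustrate that suggestion, here is a minimal sketch of such a check written as a CEL ValidatingAdmissionPolicy. This is not something ingress-nginx ships today: the policy name is made up, the v1 API shown requires Kubernetes 1.30+, a ValidatingAdmissionPolicyBinding (omitted) is also required, and on older clusters an external policy engine such as OPA Gatekeeper or Kyverno would play the same role.

```yaml
# Sketch only: reject Ingresses whose client-body-buffer-size annotation
# is set to anything other than 8k or 16k. A ValidatingAdmissionPolicyBinding
# (omitted here) is needed for the policy to take effect.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: restrict-client-body-buffer-size   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["networking.k8s.io"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["ingresses"]
  validations:
  - expression: >-
      !has(object.metadata.annotations) ||
      !('nginx.ingress.kubernetes.io/client-body-buffer-size' in object.metadata.annotations) ||
      object.metadata.annotations['nginx.ingress.kubernetes.io/client-body-buffer-size'] in ['8k', '16k']
    message: "nginx.ingress.kubernetes.io/client-body-buffer-size must be 8k or 16k"
```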
Can you close the issue if resolved? If you do not allow any bits/bytes in the client body buffer, then there will be an infinite loop until the connection is terminated. So while your interest in protecting the cluster seems valid, it seems like an impractical config to deploy. But this is just my guess, so please close the issue if resolved.
It doesn't seem to be proper nginx behaviour, since a local (non-Kubernetes) nginx treats 0 differently, as 'allow all'.
What happened:
We are using on-premise Kubernetes via TKGI from VMware. A few days after we upgraded from TKGI 1.16.3 (Kubernetes v1.25.12) to 1.18.1 (Kubernetes v1.27.8), ingress-nginx pods started randomly pegging all CPU cores at 100%. First it happened with one pod, then with another, and so on until ingress-nginx was using 100% of the entire cluster's CPU. The only temporary workaround is to kill these pods so that new ones are created with normal CPU usage.
What you expected to happen:
CPU usage of ingress-nginx remains normal, typically around 10-20m CPU in our environment, instead of 1000, 2000, or 3000m CPU.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
Kubernetes version (use kubectl version): v1.27.8
Environment:
Cloud provider or hardware configuration: VMware TKGI
OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
Kernel (e.g. uname -a): Linux d95608af-3eb5-4d69-ad64-5603722db030 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
Basic cluster related info:
kubectl version
kubectl get nodes -o wide
How was the ingress-nginx-controller installed:
The cluster has two instances of ingress-nginx installed via Helm using these values:
Values of first instance
Values of second instance
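The actual values files are not reproduced above. Purely as an illustration, values like the following are what usually keep two ingress-nginx Helm releases separate; every name below is a placeholder, not this cluster's real configuration:

```yaml
# Generic sketch only, not the values used in this cluster.
controller:
  ingressClass: nginx-internal                        # placeholder class name
  ingressClassResource:
    name: nginx-internal                              # placeholder
    controllerValue: k8s.io/ingress-nginx-internal    # placeholder controller value
  electionID: ingress-controller-leader-internal      # placeholder election ID
```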
Current State of the controller:
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Current state of ingress object, if applicable:
Others:
I enabled debug mode (--v=5) and found that when the CPU of an nginx worker process is at 100%, the logs of that pod show a particular debug message. From this log I found out that the debug message comes from the process with PID 712, which is the one using so much CPU; the process is probably stuck in some loop. I also found that this message is emitted from this line https://github.com/nginx/nginx/blob/97a111c0c0a40ecaa7771ecec66b8ed37b0350d5/src/http/ngx_http_request_body.c#L402, which is part of the function ngx_http_do_read_client_request_body. So maybe it is related to the fact that we have set proxy-body-size: "0" for ingress-nginx, or maybe some ingress annotation could cause this issue?
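For context, that controller-wide setting is normally expressed in the ingress-nginx ConfigMap roughly like this (a sketch; the ConfigMap name and namespace depend on the Helm release and are placeholders here):

```yaml
# Sketch: proxy-body-size "0" in the controller ConfigMap disables the
# request body size limit. Name and namespace are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # placeholder; depends on the Helm release
  namespace: ingress-nginx         # placeholder
data:
  proxy-body-size: "0"
```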
How to reproduce this issue:
It is hard to reproduce; even in our environment we could not trigger it on demand. It happens randomly.