100% CPU usage of nginx worker process #10992

Closed
jakuboskera opened this issue Feb 19, 2024 · 7 comments
Labels
needs-kind · needs-priority · needs-triage · triage/needs-information

Comments

@jakuboskera

What happened:

We are running on-premises Kubernetes via TKGI from VMware. After we upgraded from TKGI 1.16.3 (Kubernetes v1.25.12) to 1.18.1 (Kubernetes v1.27.8), within a few days ingress-nginx pods randomly started pegging CPU cores at 100%. It began with one pod, then another, and so on, until ingress-nginx was consuming 100% of the entire cluster's CPU. The only temporary workaround is to kill the affected pods so that replacement pods start with normal CPU usage.

What you expected to happen:

CPU usage of ingress-nginx remains normal – in our environment typically ~10-20m CPU instead of 1000, 2000, or 3000m CPU.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):

NGINX Ingress controller
  Release:       v1.9.5
  Build:         f503c4bb5fa7d857ad29e94970eb550c2bc00b7c
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

Kubernetes version (use kubectl version):

v1.27.8

Environment:

  • Cloud provider or hardware configuration: VMware TKGI

  • OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS

  • Kernel (e.g. uname -a): Linux d95608af-3eb5-4d69-ad64-5603722db030 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

    • kubeadm
  • Basic cluster related info:

    • kubectl version

      Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:48Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/amd64"}
      Kustomize Version: v5.0.1
      Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.8+vmware.1", GitCommit:"dba4dc9cc64d96ee4d003860d8fdb1722db28eb0", GitTreeState:"clean", BuildDate:"2023-11-17T05:55:19Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}
    • kubectl get nodes -o wide

      NAME                                   STATUS   ROLES    AGE     VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
      1dd0cedb-76cf-4203-8d76-a072b36cf6f5   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.16   X.X.X.16   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      2c483d04-f301-421c-b1c1-6478cb4c1fbb   Ready    <none>   5d19h   v1.27.8+vmware.1   X.X.X.6    X.X.X.6    Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      3b4008ad-2a25-4eb0-9f2d-a418381e3c99   Ready    <none>   5d19h   v1.27.8+vmware.1   X.X.X.15   X.X.X.15   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      4995305e-ac00-4e32-a89a-ab3c494024d1   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.7    X.X.X.7    Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      506bd7d1-eeb5-4cd3-8257-2f1787941879   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.9    X.X.X.9    Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      57cf796c-fd8f-4617-95e2-7b04600a59ea   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.19   X.X.X.19   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      67c4d572-70dd-4a35-806b-3bcfce80ca5e   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.17   X.X.X.17   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      73679894-5b0c-4854-b52b-ff74de0a2367   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.11   X.X.X.11   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      78ef86b5-652c-49be-8705-09e6c023ceff   Ready    <none>   5d19h   v1.27.8+vmware.1   X.X.X.13   X.X.X.13   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      9052dada-c08b-4730-a617-0019611d6e41   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.12   X.X.X.12   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      9aab7f93-c011-468d-8898-f44cf6a00151   Ready    <none>   5d19h   v1.27.8+vmware.1   X.X.X.5    X.X.X.5    Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      bd6af46b-e4ec-473c-ac94-15e7af91db93   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.8    X.X.X.8    Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      d01334bc-5365-479f-a3b7-94fc4a0abb0e   Ready    <none>   5d19h   v1.27.8+vmware.1   X.X.X.14   X.X.X.14   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
      e8a3ec30-7f99-4d33-ac1e-56d21e2bf338   Ready    <none>   5d18h   v1.27.8+vmware.1   X.X.X.18   X.X.X.18   Ubuntu 22.04.3 LTS   6.2.0-39-generic   containerd://1.6.24
  • How was the ingress-nginx-controller installed:

    The cluster has two instances of ingress-nginx installed via Helm using these values

    Values of first instance

    controller:
       autoscaling:
          enabled: true
          minReplicas: 14
          maxReplicas: 20
       extraArgs:
          enable-ssl-passthrough: ""
       ingressClass: foo
       ingressClassByName: true
       ingressClassResource:
          name: foo
          controllerValue: example.com/foo
       config:
          proxy-body-size: "0"
          enable-underscores-in-headers: "true"
          log-format-escape-json: "true"
          log-format-upstream: '{"time": "$time_iso8601", "remote_addr": "$proxy_protocol_addr", "x_forwarded_for": "$proxy_add_x_forwarded_for", "request_id": "$req_id", "remote_user": "$remote_user", "bytes_sent": $bytes_sent, "request_time": $request_time, "status": $status, "vhost": "$host", "request_proto": "$server_protocol", "path": "$uri", "request_query": "$args", "request_length": $request_length, "duration": $request_time,"method": "$request_method", "http_referrer": "$http_referer", "http_user_agent": "$http_user_agent", "kubernetes_namespace": "$namespace", "kubernetes_ingress_name": "$ingress_name", "kubernetes_service_name": "$service_name", "kubernetes_service_port": "$service_port"}'
       service:
          loadBalancerIP: X.X.X.X
       resources:
          requests:
             cpu: 100m
             memory: 300Mi
       metrics:
          enabled: true
          serviceMonitor:
             enabled: true

    Values of second instance (identical apart from the ingress class settings and the load balancer IP)

    controller:
       autoscaling:
          enabled: true
          minReplicas: 14
          maxReplicas: 20
       extraArgs:
          enable-ssl-passthrough: ""
       ingressClass: bar
       ingressClassByName: true
       ingressClassResource:
          name: bar
          controllerValue: example.com/bar
       config:
          proxy-body-size: "0"
          enable-underscores-in-headers: "true"
          log-format-escape-json: "true"
          log-format-upstream: '{"time": "$time_iso8601", "remote_addr": "$proxy_protocol_addr", "x_forwarded_for": "$proxy_add_x_forwarded_for", "request_id": "$req_id", "remote_user": "$remote_user", "bytes_sent": $bytes_sent, "request_time": $request_time, "status": $status, "vhost": "$host", "request_proto": "$server_protocol", "path": "$uri", "request_query": "$args", "request_length": $request_length, "duration": $request_time,"method": "$request_method", "http_referrer": "$http_referer", "http_user_agent": "$http_user_agent", "kubernetes_namespace": "$namespace", "kubernetes_ingress_name": "$ingress_name", "kubernetes_service_name": "$service_name", "kubernetes_service_port": "$service_port"}'
       service:
          loadBalancerIP: X.X.X.X
       resources:
          requests:
             cpu: 100m
             memory: 300Mi
       metrics:
          enabled: true
          serviceMonitor:
             enabled: true
  • Current State of the controller:

    • kubectl describe ingressclasses

        Name:         foo
        Labels:       app.kubernetes.io/component=controller
                      app.kubernetes.io/instance=foo
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=ingress-nginx
                      app.kubernetes.io/part-of=ingress-nginx
                      app.kubernetes.io/version=1.9.5
                      helm.sh/chart=ingress-nginx-4.9.0
        Annotations:  <none>
        Controller:   example.com/foo
        Events:       <none>
      
        Name:         bar
        Labels:       app.kubernetes.io/component=controller
                      app.kubernetes.io/instance=bar
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=ingress-nginx
                      app.kubernetes.io/part-of=ingress-nginx
                      app.kubernetes.io/version=1.9.5
                      helm.sh/chart=ingress-nginx-4.9.0
        Annotations:  <none>
        Controller:   example.com/bar
        Events:       <none>
    • kubectl -n <ingresscontrollernamespace> get all -A -o wide

    NAME                                                                  READY   STATUS    RESTARTS       AGE     IP             NODE                                   NOMINATED NODE   READINESS GATES
    pod/ingress-nginx-controller-84bd457d8c-45xsx                1/1     Running   0              13h     X.X.X.13   e8a3ec30-7f99-4d33-ac1e-56d21e2bf338   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-9rtmc                1/1     Running   0              68m     X.X.X.34   bd6af46b-e4ec-473c-ac94-15e7af91db93   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-bb7sm                1/1     Running   0              68m     X.X.X.36   78ef86b5-652c-49be-8705-09e6c023ceff   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-cr4mj                1/1     Running   0              13h     X.X.X.27   3b4008ad-2a25-4eb0-9f2d-a418381e3c99   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-ctggd                1/1     Running   0              68m     X.X.X.32   57cf796c-fd8f-4617-95e2-7b04600a59ea   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-fxt8m                1/1     Running   0              68m     X.X.X.33   506bd7d1-eeb5-4cd3-8257-2f1787941879   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-gqht2                1/1     Running   0              68m     X.X.X.23   67c4d572-70dd-4a35-806b-3bcfce80ca5e   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-hdv6c                1/1     Running   0              13h     X.X.X.11   506bd7d1-eeb5-4cd3-8257-2f1787941879   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-hgtlk                1/1     Running   1 (15h ago)    36h     X.X.X.20   9052dada-c08b-4730-a617-0019611d6e41   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-j5l8b                1/1     Running   0              68m     X.X.X.39   4995305e-ac00-4e32-a89a-ab3c494024d1   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-kr9gc                1/1     Running   0              68m     X.X.X.30   2c483d04-f301-421c-b1c1-6478cb4c1fbb   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-m4d7m                1/1     Running   0              68m     X.X.X.21   bd6af46b-e4ec-473c-ac94-15e7af91db93   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-nb7tq                1/1     Running   0              68m     X.X.X.28   4995305e-ac00-4e32-a89a-ab3c494024d1   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-nc95t                1/1     Running   0              36h     X.X.X.37   d01334bc-5365-479f-a3b7-94fc4a0abb0e   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-njb9s                1/1     Running   0              13h     X.X.X.12   e8a3ec30-7f99-4d33-ac1e-56d21e2bf338   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-pvmrt                1/1     Running   0              68m     X.X.X.19   78ef86b5-652c-49be-8705-09e6c023ceff   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-tfr75                1/1     Running   0              68m     X.X.X.35   d01334bc-5365-479f-a3b7-94fc4a0abb0e   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-v9vrn                1/1     Running   0              68m     X.X.X.38   9052dada-c08b-4730-a617-0019611d6e41   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-xpvxz                1/1     Running   0              68m     X.X.X.29   9aab7f93-c011-468d-8898-f44cf6a00151   <none>           <none>
    pod/ingress-nginx-controller-84bd457d8c-z5cvl                1/1     Running   0              36h     X.X.X.22   3b4008ad-2a25-4eb0-9f2d-a418381e3c99   <none>           <none>
    
    NAME                                                                   TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE   SELECTOR
    service/ingress-nginx-controller                              LoadBalancer   X.X.X.78    X.X.X.X   80:31139/TCP,443:31349/TCP   37d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx-foo
    service/ingress-nginx-controller-admission                    ClusterIP      X.X.X.219   <none>          443/TCP                      39d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx-foo
    service/ingress-nginx-controller-metrics                      ClusterIP      X.X.X.250   <none>          10254/TCP                    39d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx-foo
    
    NAME                                                                 READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                                                                                    SELECTOR
    deployment.apps/ingress-nginx-controller                    20/20   20           20          39d   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx
    
    NAME                                                                            DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES                                                                                                                    SELECTOR
    replicaset.apps/ingress-nginx-controller-55d8899cc6                    0         0         0       35d     controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=55d8899cc6
    replicaset.apps/ingress-nginx-controller-5c75bfbdb                     0         0         0       2d18h   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=5c75bfbdb
    replicaset.apps/ingress-nginx-controller-5f9d9c7984                    0         0         0       2d18h   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=5f9d9c7984
    replicaset.apps/ingress-nginx-controller-6464475c6c                    0         0         0       39d     controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=6464475c6c
    replicaset.apps/ingress-nginx-controller-6f47f679d7                    0         0         0       2d17h   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=6f47f679d7
    replicaset.apps/ingress-nginx-controller-6fc659fdff                    0         0         0       2d22h   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=6fc659fdff
    replicaset.apps/ingress-nginx-controller-6fc74c9d69                    0         0         0       3d23h   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=6fc74c9d69
    replicaset.apps/ingress-nginx-controller-75697fbc5c                    0         0         0       46h     controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=75697fbc5c
    replicaset.apps/ingress-nginx-controller-847fdc4c59                    0         0         0       2d10h   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=847fdc4c59
    replicaset.apps/ingress-nginx-controller-84bd457d8c                    20        20        20      36h     controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=84bd457d8c
    replicaset.apps/ingress-nginx-controller-8684bc8f5d                    0         0         0       2d17h   controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=foo,app.kubernetes.io/name=ingress-nginx,pod-template-hash=8684bc8f5d
    
    NAME                                                                                     REFERENCE                                                       TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
    horizontalpodautoscaler.autoscaling/ingress-nginx-controller                    Deployment/ingress-nginx-controller                    75%/50%, 51%/50%   14        20        20         39d
    • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
    Name:         ingress-nginx-controller-84bd457d8c-z5cvl
    Namespace:    ingress-nginx
    Priority:     0
    Node:         3b4008ad-2a25-4eb0-9f2d-a418381e3c99/10.141.0.15
    Start Time:   Sat, 17 Feb 2024 21:41:39 +0100
    Labels:       app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=foo
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.9.5
                  helm.sh/chart=ingress-nginx-4.9.0
                  pod-template-hash=84bd457d8c
    Annotations:  kubectl.kubernetes.io/restartedAt: 2024-02-17T20:41:39Z
    Status:       Running
    IP:           X.X.X.22
    IPs:
    IP:           X.X.X.22
    Controlled By:  ReplicaSet/ingress-nginx-controller-84bd457d8c
    Containers:
    controller:
       Container ID:  containerd://a2c0dea5995e6a26f46de6d510ea85d44c4d0068dc865753d3e4b64594842e08
       Image:         registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e
       Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e
       Ports:         80/TCP, 443/TCP, 10254/TCP, 8443/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
       Args:
          /nginx-ingress-controller
          --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
          --election-id=ingress-nginx-leader
          --controller-class=example.com/foo-access
          --ingress-class=foo
          --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
          --validating-webhook=:8443
          --validating-webhook-certificate=/usr/local/certificates/cert
          --validating-webhook-key=/usr/local/certificates/key
          --ingress-class-by-name=true
          --enable-ssl-passthrough
          --v=5
       State:          Running
          Started:      Sat, 17 Feb 2024 21:41:43 +0100
       Ready:          True
       Restart Count:  0
       Requests:
          cpu:      100m
          memory:   300Mi
       Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
       Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
       Environment:
          POD_NAME:       ingress-nginx-controller-84bd457d8c-z5cvl (v1:metadata.name)
          POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
          LD_PRELOAD:     /usr/local/lib/libmimalloc.so
       Mounts:
          /usr/local/certificates/ from webhook-cert (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qkr78 (ro)
    Conditions:
    Type              Status
    Initialized       True 
    Ready             True 
    ContainersReady   True 
    PodScheduled      True 
    Volumes:
    webhook-cert:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  ingress-nginx-admission
       Optional:    false
    kube-api-access-qkr78:
       Type:                    Projected (a volume that contains injected data from multiple sources)
       TokenExpirationSeconds:  3607
       ConfigMapName:           kube-root-ca.crt
       ConfigMapOptional:       <nil>
       DownwardAPI:             true
    QoS Class:                   Burstable
    Node-Selectors:              kubernetes.io/os=linux
    Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                               node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:                      <none>
    
    • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
    Name:                     ingress-nginx-controller
    Namespace:                ingress-nginx
    Labels:                   app.kubernetes.io/component=controller
                            app.kubernetes.io/instance=internal
                            app.kubernetes.io/managed-by=Helm
                            app.kubernetes.io/name=ingress-nginx
                            app.kubernetes.io/part-of=ingress-nginx
                            app.kubernetes.io/version=1.9.5
                            argocd.argoproj.io/instance=k8s-test-01-ingress-nginx
                            helm.sh/chart=ingress-nginx-4.9.0
    Annotations:              ncp/internal_ip_for_policy: X.X.X.X
    Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=internal,app.kubernetes.io/name=ingress-nginx
    Type:                     LoadBalancer
    IP Family Policy:         SingleStack
    IP Families:              IPv4
    IP:                       X.X.X.X
    IPs:                      X.X.X.X
    IP:                       X.X.X.X
    LoadBalancer Ingress:     X.X.X.X
    Port:                     http  80/TCP
    TargetPort:               http/TCP
    NodePort:                 http  31139/TCP
    Endpoints:                X.X.X.11:80,X.X.X.12:80,X.X.X.13:80 + 17 more...
    Port:                     https  443/TCP
    TargetPort:               https/TCP
    NodePort:                 https  31349/TCP
    Endpoints:                X.X.X.11:443,X.X.X.12:443,X.X.X.13:443 + 17 more...
    Session Affinity:         None
    External Traffic Policy:  Cluster
    Events:                   <none>
  • Current state of ingress object, if applicable:

    • Cannot share the Ingress manifests; however, below are all annotations used across all Ingresses (collected from every Ingress, so some keys appear several times with different values)
    nginx.ingress.kubernetes.io/add-base-url: true
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/client-body-buffer-size: 0
    nginx.ingress.kubernetes.io/configuration-snippet: more_set_headers "X-Frame-Options: DENY";
    nginx.ingress.kubernetes.io/custom-http-errors: 502,503
    nginx.ingress.kubernetes.io/default-backend: defaultpage
    nginx.ingress.kubernetes.io/enable-cors: true
    nginx.ingress.kubernetes.io/proxy-body-size: 1500m
    nginx.ingress.kubernetes.io/proxy-body-size: 50m
    nginx.ingress.kubernetes.io/proxy-body-size: 8m
    nginx.ingress.kubernetes.io/proxy-buffer-size: 128k
    nginx.ingress.kubernetes.io/proxy-buffer-size: 16k
    nginx.ingress.kubernetes.io/proxy-buffer-size: 256k
    nginx.ingress.kubernetes.io/proxy-buffer-size: 32k
    nginx.ingress.kubernetes.io/proxy-buffering: on
    nginx.ingress.kubernetes.io/proxy-buffers-number: 16
    nginx.ingress.kubernetes.io/proxy-buffers-number: 16 16k
    nginx.ingress.kubernetes.io/proxy-buffers-number: 4
    nginx.ingress.kubernetes.io/proxy-buffers-number: 4 32k
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 100s
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 180
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 2400
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 30
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 300
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 600
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 600s
    nginx.ingress.kubernetes.io/proxy-next-upstream-timeout: 600s
    nginx.ingress.kubernetes.io/proxy-read-timeout: 100s
    nginx.ingress.kubernetes.io/proxy-read-timeout: 180
    nginx.ingress.kubernetes.io/proxy-read-timeout: 1800
    nginx.ingress.kubernetes.io/proxy-read-timeout: 2400
    nginx.ingress.kubernetes.io/proxy-read-timeout: 300
    nginx.ingress.kubernetes.io/proxy-read-timeout: 330s
    nginx.ingress.kubernetes.io/proxy-read-timeout: 3600
    nginx.ingress.kubernetes.io/proxy-read-timeout: 600
    nginx.ingress.kubernetes.io/proxy-read-timeout: 600s
    nginx.ingress.kubernetes.io/proxy-send-timeout: 100s
    nginx.ingress.kubernetes.io/proxy-send-timeout: 180
    nginx.ingress.kubernetes.io/proxy-send-timeout: 2400
    nginx.ingress.kubernetes.io/proxy-send-timeout: 301
    nginx.ingress.kubernetes.io/proxy-send-timeout: 3600
    nginx.ingress.kubernetes.io/proxy-send-timeout: 600s
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/server-alias: example.com
    nginx.ingress.kubernetes.io/server-snippets: |
      http2_max_concurrent_streams 2000;
      error_log stderr info;
      keepalive_requests 1000;
    nginx.ingress.kubernetes.io/use-regex: true
  • Others:

    I enabled debug mode (--v=5) and found that whenever the CPU of an nginx worker process is at 100%, this pod's log repeatedly prints

    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
    2024/02/16 12:21:09 [debug] 712#712: *123040 http client request body rest 69180
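
    For reference, a minimal sketch of how this debug output can be enabled through the Helm values, assuming the standard chart's extraArgs mechanism (per the ingress-nginx docs, --v=5 also configures nginx in debug mode, which is what produces the "[debug]" lines above):

        controller:
          extraArgs:
            # --v=5 raises controller log verbosity and configures nginx
            # in debug mode, producing "[debug]" lines like those quoted above
            v: "5"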

    From this log I found that the debug message comes from the process with PID 712, which is the one using all the CPU. The process is probably stuck in a loop, since the reported body "rest" never decreases. The message is emitted from this line https://github.com/nginx/nginx/blob/97a111c0c0a40ecaa7771ecec66b8ed37b0350d5/src/http/ngx_http_request_body.c#L402, which is part of the function ngx_http_do_read_client_request_body. So maybe it is related to the proxy-body-size: "0" we set for ingress-nginx, or maybe some Ingress annotation could cause this issue?

    $ top
    Mem: 31687648K used, 1171580K free, 702900K shrd, 1188760K buff, 19580176K cached
    CPU0:  32% usr  29% sys   0% nic  35% idle   0% io   0% irq   2% sirq
    CPU1:  76% usr  20% sys   0% nic   2% idle   0% io   0% irq   0% sirq
    CPU2:  61% usr  32% sys   0% nic   5% idle   0% io   0% irq   0% sirq
    CPU3: 100% usr   0% sys   0% nic   0% idle   0% io   0% irq   0% sirq
    Load average: 6.45 7.11 7.18 16/2336 189
    PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    712   24 www-data S<    190m   1%   0   5% nginx: worker process
    31    24 www-data S<    189m   1%   0   5% nginx: worker process
    8     1 www-data S<   1242m   4%   0   0% /nginx-ingress-controller --publish-service=ingress-nginx/ingress-nginx-internal-controller --election-id=ingress-nginx-i
    179     0 www-data S<    258m   1%   2   0% bash
    189   179 www-data R<    257m   1%   2   0% top
    33    24 www-data S<    190m   1%   2   0% nginx: worker process
    32    24 www-data S<    190m   1%   1   0% nginx: worker process
    24     8 www-data S<    153m   0%   2   0% nginx: master process /usr/bin/nginx -c /etc/nginx/nginx.conf
    34    24 www-data S<    151m   0%   0   0% nginx: cache manager process
    1     0 www-data S<     224   0%   3   0% /usr/bin/dumb-init -- /nginx-ingress-controller --publish-service=ingress-nginx/ingress-nginx-internal-controller --elect

How to reproduce this issue:

Hard to reproduce; even in our own environment we could not trigger it deliberately. It happens randomly.

@jakuboskera jakuboskera added the kind/bug Categorizes issue or PR as related to a bug. label Feb 19, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 19, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jakuboskera
Author

jakuboskera commented Feb 19, 2024

Just found out that it could be related to this nginx bug: https://trac.nginx.org/nginx/ticket/2548. As you can see in the list of annotations above, one Ingress uses the annotation nginx.ingress.kubernetes.io/client-body-buffer-size: 0, which sets the nginx parameter client_body_buffer_size 0; and probably triggers the mentioned bug.

For now I have removed this annotation from that Ingress resource and will wait to see whether the CPU climbs to 100% again or not.
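
For illustration, a minimal sketch of the offending annotation and a safer replacement (the Ingress name, host, and backend are hypothetical):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-app        # hypothetical
      annotations:
        # This renders "client_body_buffer_size 0;" into nginx.conf and can
        # trigger the busy loop from https://trac.nginx.org/nginx/ticket/2548:
        #   nginx.ingress.kubernetes.io/client-body-buffer-size: "0"
        # Safer: keep the documented default (8k) or another small explicit value:
        nginx.ingress.kubernetes.io/client-body-buffer-size: "8k"
    spec:
      ingressClassName: foo
      rules:
        - host: app.example.com          # hypothetical
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: example-app    # hypothetical
                    port:
                      number: 80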

@longwuyuan
Contributor

/remove-kind bug

  • The information you have provided is not helpful
  • The information you have provided is not as per the template of a new bug-report
  • You say 2 instances of the controller, but you have not provided the output of helm ls -A as requested in the new bug-report template
  • You say 2 instances, but only 1 deployment is visible in the output, while there are 2 ingressclasses
  • Why do you need 14 replicas?
  • Can you please answer the questions that are asked in the new bug-report template?

/triage needs-information

@k8s-ci-robot k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Feb 20, 2024
@longwuyuan
Contributor

And yes, this link says the default value is 8k https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#client-body-buffer-size – so setting a value of 0 yourself and then reporting an issue here is very odd

@jakuboskera
Author

I would not say that it is very odd. If you have one big cluster where every developer can deploy their Ingresses with their own configuration, it is not good that a single wrong configuration can take down the whole cluster.

Have you ever thought about validating annotation values with a webhook or something similar? This is a good candidate for validation: the value would have to be something like 8k or 16k, and everything else would be denied, as it could cause unexpected problems.
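
For illustration, a rough sketch of such a guard using a ValidatingAdmissionPolicy with a CEL expression (this API is only available in Kubernetes releases newer than the one reported here; the policy name and allowed values are assumptions, and a ValidatingAdmissionPolicyBinding is additionally required to enforce it):

    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingAdmissionPolicy
    metadata:
      name: restrict-client-body-buffer-size   # hypothetical
    spec:
      failurePolicy: Fail
      matchConstraints:
        resourceRules:
          - apiGroups: ["networking.k8s.io"]
            apiVersions: ["v1"]
            operations: ["CREATE", "UPDATE"]
            resources: ["ingresses"]
      validations:
          # Allow Ingresses without the annotation; otherwise require 8k or 16k
        - expression: >-
            !has(object.metadata.annotations) ||
            !('nginx.ingress.kubernetes.io/client-body-buffer-size' in object.metadata.annotations) ||
            object.metadata.annotations['nginx.ingress.kubernetes.io/client-body-buffer-size'] in ['8k', '16k']
          message: "client-body-buffer-size must be 8k or 16k"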

Nevertheless, the CPU is still at a normal value, so removing the annotation nginx.ingress.kubernetes.io/client-body-buffer-size: 0 resolved this problem. I hope it helps someone else find the bug quicker than I did, although this is actually a bug directly in the core of nginx and not ingress-nginx's fault.

@longwuyuan
Contributor

Can you close the issue if resolved? If you do not allow any bits/bytes in the client body buffer, then there will be an infinite loop until the connection is terminated. So while your interest in protecting the cluster seems valid, it seems like an impractical config to deploy. But this is just my guess, so please close the issue if resolved.

@webzavoda

Can you close the issue if resolved? If you do not allow any bits/bytes in the client body buffer, then there will be an infinite loop until the connection is terminated. So while your interest in protecting the cluster seems valid, it seems like an impractical config to deploy. But this is just my guess, so please close the issue if resolved.

It doesn't seem to be proper nginx behaviour, since local (non-Kubernetes) nginx behaves differently, treating 0 as 'allow all'
