
All ingress controllers restart at the same time on configuration change #11083

Closed
DASPRiD opened this issue Mar 9, 2024 · 7 comments
Labels
kind/support — Categorizes issue or PR as a support question.
needs-priority
needs-triage — Indicates an issue or PR lacks a `triage/foo` label and requires one.
triage/needs-information — Indicates an issue needs more information in order to work on it.

Comments

@DASPRiD

DASPRiD commented Mar 9, 2024

What happened:

I have ingress-nginx installed in a single-node cluster as a Deployment with replicaCount set to 2 and a rolling-update strategy with maxUnavailable set to 1.

Whenever the configuration is updated (for instance, when TLS entries are added), all controller pods emit `NGINX reload triggered due to a change in configuration` at the same time, and all of them restart. This takes 5 to 10 seconds, during which no HTTP traffic is served.

What you expected to happen:

Pods should restart one by one, so that the service always has at least one pod to talk to.
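One way to express this expectation declaratively is a PodDisruptionBudget. (This is a sketch I'm adding for illustration, not something from the original report; note a PDB only guards voluntary disruptions such as drains and evictions, not crashes like OOM kills or in-place NGINX reloads.)

```yaml
# Sketch: keep at least one controller pod available during voluntary disruptions.
# The label selector matches the chart's standard labels; adjust for your release.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller
```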

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.9.6
  Build:         6a73aa3b05040a97ef8213675a16142a9c95952a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version): v1.29.1+k0s

How was the ingress-nginx-controller installed:

helm ls -A | grep -i ingress
ingress-nginx 	ingress-nginx 	14      	2024-02-12 14:34:21.298344591 +0100 CET	deployed	ingress-nginx-1.0.0

Current State of the controller:

kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.9.6
              argocd.argoproj.io/instance=ingress-nginx
              helm.sh/chart=ingress-nginx-4.9.1
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
kubectl -n ingress-nginx get all -o wide
NAME                                            READY   STATUS    RESTARTS     AGE   IP             NODE            NOMINATED NODE   READINESS GATES
pod/ingress-nginx-controller-5476887d79-4dmq5   1/1     Running   3 (8h ago)   9h    10.244.0.148   furvester.org   <none>           <none>
pod/ingress-nginx-controller-5476887d79-wqf9b   1/1     Running   3 (8h ago)   9h    10.244.0.149   furvester.org   <none>           <none>

NAME                                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
service/ingress-nginx-controller             NodePort    10.99.190.98   <none>        80:30080/TCP,443:30443/TCP   27d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission   ClusterIP   10.99.41.32    <none>        443/TCP                      27d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                                                                                    SELECTOR
deployment.apps/ingress-nginx-controller   2/2     2            2           9h    controller   registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                  DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                                                                                                    SELECTOR
replicaset.apps/ingress-nginx-controller-5476887d79   2         2         2       9h    controller   registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=5476887d79
kubectl -n ingress-nginx describe po
Name:             ingress-nginx-controller-5476887d79-4dmq5
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             furvester.org/78.46.174.251
Start Time:       Sat, 09 Mar 2024 17:23:08 +0100
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.9.6
                  helm.sh/chart=ingress-nginx-4.9.1
                  pod-template-hash=5476887d79
Annotations:      <none>
Status:           Running
IP:               10.244.0.148
IPs:
  IP:           10.244.0.148
Controlled By:  ReplicaSet/ingress-nginx-controller-5476887d79
Containers:
  controller:
    Container ID:    containerd://7e8a2c1d4927e4bbc19fd17ae7c19522f4b3c709c413a2ddab4cb436a2b4b49f
    Image:           registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Ports:           80/TCP, 443/TCP, 8443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Sat, 09 Mar 2024 18:42:58 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sat, 09 Mar 2024 17:58:48 +0100
      Finished:     Sat, 09 Mar 2024 18:42:57 +0100
    Ready:          True
    Restart Count:  3
    Limits:
      memory:  400Mi
    Requests:
      cpu:      200m
      memory:   150Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-5476887d79-4dmq5 (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ns77b (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-ns77b:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>


Name:             ingress-nginx-controller-5476887d79-wqf9b
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             furvester.org/78.46.174.251
Start Time:       Sat, 09 Mar 2024 17:23:08 +0100
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.9.6
                  helm.sh/chart=ingress-nginx-4.9.1
                  pod-template-hash=5476887d79
Annotations:      <none>
Status:           Running
IP:               10.244.0.149
IPs:
  IP:           10.244.0.149
Controlled By:  ReplicaSet/ingress-nginx-controller-5476887d79
Containers:
  controller:
    Container ID:    containerd://3de793a46fd90233c4c2786d8c255b16137a15f13c34690e89140f74156096aa
    Image:           registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Ports:           80/TCP, 443/TCP, 8443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Sat, 09 Mar 2024 18:42:58 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sat, 09 Mar 2024 17:58:48 +0100
      Finished:     Sat, 09 Mar 2024 18:42:57 +0100
    Ready:          True
    Restart Count:  3
    Limits:
      memory:  400Mi
    Requests:
      cpu:      200m
      memory:   150Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-5476887d79-wqf9b (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5ksjm (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-5ksjm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
kubectl -n ingress-nginx describe svc
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.9.6
                          argocd.argoproj.io/instance=ingress-nginx
                          helm.sh/chart=ingress-nginx-4.9.1
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.99.190.98
IPs:                      10.99.190.98
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30080/TCP
Endpoints:                10.244.0.148:80,10.244.0.149:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  30443/TCP
Endpoints:                10.244.0.148:443,10.244.0.149:443
Session Affinity:         None
External Traffic Policy:  Local
Events:                   <none>


Name:              ingress-nginx-controller-admission
Namespace:         ingress-nginx
Labels:            app.kubernetes.io/component=controller
                   app.kubernetes.io/instance=ingress-nginx
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=ingress-nginx
                   app.kubernetes.io/part-of=ingress-nginx
                   app.kubernetes.io/version=1.9.6
                   argocd.argoproj.io/instance=ingress-nginx
                   helm.sh/chart=ingress-nginx-4.9.1
Annotations:       meta.helm.sh/release-name: ingress-nginx
                   meta.helm.sh/release-namespace: ingress-nginx
Selector:          app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.99.41.32
IPs:               10.99.41.32
Port:              https-webhook  443/TCP
TargetPort:        webhook/TCP
Endpoints:         10.244.0.148:8443,10.244.0.149:8443
Session Affinity:  None
Events:            <none>

How to reproduce this issue:

Add a new Ingress resource with a TLS setting (I'm using cert-manager to automatically issue a certificate for it).
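A minimal sketch of such an Ingress (my illustration, not the reporter's actual manifest; the host, service, and issuer names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # hypothetical issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts: [example.org]
      secretName: example-tls   # cert-manager issues and stores the certificate here
  rules:
    - host: example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example
                port:
                  number: 80
```

Adding the `tls` section changes the controller configuration and triggers the NGINX reload described above.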

Anything else we need to know:

Helm values
ingress-nginx:
  controller:
    kind: Deployment
    replicaCount: 2

    updateStrategy:
      rollingUpdate:
        maxUnavailable: 1
      type: RollingUpdate

    resources:
      limits:
        memory: 400Mi
      requests:
        cpu: 200m
        memory: 150Mi

    config:
      allow-snippet-annotations: "true"
      use-proxy-protocol: "true"

    service:
      type: NodePort
      externalTrafficPolicy: Local
      nodePorts:
        http: 30080
        https: 30443
@DASPRiD DASPRiD added the kind/bug Categorizes issue or PR as related to a bug. label Mar 9, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 9, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@longwuyuan
Contributor

I think it will help if you post the outputs of the commands that show the real installed state. Also include logs. The new bug-report template asks specific questions, so it's best to look at a new bug-report template and then edit the issue description here to answer those questions.

@DASPRiD
Author

DASPRiD commented Mar 10, 2024

Thanks @longwuyuan, I've updated the issue description accordingly :)

@longwuyuan
Contributor

longwuyuan commented Mar 10, 2024

Thanks. I was hoping to see logs and `kubectl describe` output for the Ingress resources.

  • can you please add
    • kubectl -n ingress-nginx logs $ingress-controller-podname # for both pods
    • kubectl get events -A
    • kubectl get ing -A

@longwuyuan
Contributor

my config for controller

% k -n ingress-nginx describe deployments.apps ingress-nginx-controller | egrep -i "replicas|image:" -m 2
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
    Image:           registry.k8s.io/ingress-nginx/controller:v1.10.0@sha256:42b3f0e5d0846876b1791cd3afeb5f1cbbe4259d6f35651dcc1b5c980925379c
[~] 
% 

I created a deployment and then an ingress for it

% k create deploy test0 --image httpd:alpine
deployment.apps/test0 created
[~] 
% k expose deployment test0 --port 80
service/test0 exposed
[~] 
% k create ing test0 --class nginx --rule test0.dev.enjoydevops.com/"*"=test0:80,tls=wildcard.dev.enjoydevops.com
ingress.networking.k8s.io/test0 created
[~] 
% k get ing
NAME    CLASS   HOSTS                       ADDRESS           PORTS     AGE
test0   nginx   test0.dev.enjoydevops.com   192.168.122.193   80, 443   37s
[~] 
% k describe ing test0 
Name:             test0
Labels:           <none>
Namespace:        default
Address:          192.168.122.193
Ingress Class:    nginx
Default backend:  <default>
TLS:
  wildcard.dev.enjoydevops.com terminates test0.dev.enjoydevops.com
Rules:
  Host                       Path  Backends
  ----                       ----  --------
  test0.dev.enjoydevops.com  
                             /   test0:80 (10.244.0.27:80)
Annotations:                 <none>
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    49s (x2 over 58s)  nginx-ingress-controller  Scheduled for sync
  Normal  Sync    49s (x2 over 58s)  nginx-ingress-controller  Scheduled for sync
[~] 
% curl test0.dev.enjoydevops.com --resolve test0.dev.enjoydevops.com:80:`minikube ip` --resolve test0.dev.enjoydevops.com:443:`minikube ip`                                                                                        
<html>
<head><title>308 Permanent Redirect</title></head>
<body>
<center><h1>308 Permanent Redirect</h1></center>
<hr><center>nginx</center>
</body>
</html>
[~] 
% curl test0.dev.enjoydevops.com --resolve test0.dev.enjoydevops.com:80:`minikube ip` --resolve test0.dev.enjoydevops.com:443:`minikube ip` -L
<html><body><h1>It works!</h1></body></html>

I was watching events and I saw the reload after adding a new ingress. I was also watching the pods as well as tailing the controller pod logs, and unfortunately I did NOT see a restart.

[screenshot: watched events during the reload, omitted]

We also don't have many others reporting this, so we need to rule out whether something specific to your use case or environment is causing the pod restarts.

/remove-kind bug
/kind support

/triage needs-information

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Mar 10, 2024
@DASPRiD
Author

DASPRiD commented Mar 10, 2024

Thanks a lot, your last command actually helped me identify the issue! Watching the events, I noticed that the controller gets OOMKilled, which led me to read other tickets here that explain why that happens during a config change.

I resolved this by limiting the number of worker processes to 8 (the node has 8 CPU cores plus Hyper-Threading, so it defaulted to 16 worker processes) and raising the memory limit from 400Mi to 2Gi to cover these reload cases.
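A sketch of what that fix could look like in the Helm values (my reconstruction, not the exact values the reporter applied; `worker-processes` is a documented ingress-nginx ConfigMap option):

```yaml
ingress-nginx:
  controller:
    config:
      # Cap NGINX workers; the default "auto" spawns one per logical CPU
      # (16 on this node), and the old and new worker sets briefly coexist
      # during a configuration reload, doubling memory use.
      worker-processes: "8"
    resources:
      limits:
        memory: 2Gi   # headroom for the old+new worker overlap during reloads
      requests:
        cpu: 200m
        memory: 150Mi
```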

Again, thanks for being so helpful, and sorry for potentially wasting some of your time :)

@longwuyuan
Contributor

@DASPRiD sorry, not accepted :-) LOL (kidding)

Really, glad to get feedback and info. Without feedback it's hard to know reality. So thank you.

Glad the problem is solved. Seeing events often provides insight, as it did here. Have a great day!
