gRPC stream is closed after 60 seconds of idle even with timeout annotations set #12434

Open
0x113 opened this issue Nov 29, 2024 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

0x113 commented Nov 29, 2024

What happened:

The gRPC bi-directional stream is interrupted after 60 seconds of idle time even though the necessary annotations are set.
Annotations:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: GRPCS

I verified that these values are set correctly by execing into the pod and checking nginx.conf directly:

proxy_connect_timeout                   300s;
proxy_send_timeout                      300s;
proxy_read_timeout                      300s;
proxy_next_upstream                     error timeout;
proxy_next_upstream_timeout             0;
grpc_connect_timeout                    300s;
grpc_send_timeout                       300s;
grpc_read_timeout                       300s;

However, the bi-directional stream between the server and the agent is still closed after 60 seconds.
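The excerpt above only shows the proxy_* and grpc_* directives. To see whether any other timeout in the rendered file is still at 60s or lower, the whole nginx.conf can be scanned. This is a minimal sketch, not a tool from this project: the function names are made up for illustration, it only understands plain numbers and the `s`, `m` and `h` suffixes, and the `client_body_timeout 60s;` line in the sample is an assumed value added for demonstration, not taken from the reporter's config.

```python
import re

# Matches simple lines such as "grpc_read_timeout    300s;" in nginx.conf text.
TIMEOUT_RE = re.compile(r"^[ \t]*([a-z_]+_timeout)[ \t]+(\d+)([smh]?);", re.MULTILINE)
UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}

# Directives quoted in the issue, plus one ASSUMED line for illustration.
SAMPLE = """
proxy_connect_timeout                   300s;
proxy_send_timeout                      300s;
proxy_read_timeout                      300s;
proxy_next_upstream                     error timeout;
proxy_next_upstream_timeout             0;
grpc_connect_timeout                    300s;
grpc_send_timeout                       300s;
grpc_read_timeout                       300s;
client_body_timeout                     60s;
"""


def timeouts_in_seconds(conf_text):
    """Return (directive, seconds) pairs for simple numeric timeout directives."""
    return [
        (name, int(value) * UNIT_SECONDS[unit])
        for name, value, unit in TIMEOUT_RE.findall(conf_text)
    ]


def at_or_below(conf_text, limit=60):
    """Return directives whose value is non-zero and at most `limit` seconds."""
    return [(n, s) for n, s in timeouts_in_seconds(conf_text) if 0 < s <= limit]


if __name__ == "__main__":
    for name, seconds in at_or_below(SAMPLE):
        print(f"{name} is set to {seconds}s")
```

To run it against the real file, replace `SAMPLE` with the contents of /etc/nginx/nginx.conf copied out of the controller pod, for example via `kubectl exec`.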

What you expected to happen:
I expected the stream to be closed after 5 minutes.

I think the default value of 60s is used whenever annotation values are greater than 60s. If I set these 3 annotations to a value less than 60, then the timeout is applied properly. For instance, I set it to "10" and the stream was interrupted after 10 seconds of idle.
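Another reading of the same observation, offered as a hypothesis that this thread does not confirm: the annotations are applied as written, but a separate timer that is still at 60s fires first. The stream would then close at the smallest timeout on the path, which matches both the 300 and the 10 case. A small sketch of that arithmetic follows. The function name is hypothetical, and picking `client_body_timeout` (nginx default: 60s) as the other timer is an assumption.

```python
# ASSUMPTION: some other timer on the path is still at nginx's 60s default,
# for example client_body_timeout. This is not verified in the issue.
ASSUMED_OTHER_TIMEOUT = 60  # seconds


def effective_idle_timeout(annotation_timeout, other_timeouts=(ASSUMED_OTHER_TIMEOUT,)):
    """Return the smallest timeout, i.e. the one expected to fire first."""
    return min((annotation_timeout, *other_timeouts))


if __name__ == "__main__":
    for configured in (300, 10):
        print(f"annotations={configured}s -> stream closes after "
              f"{effective_idle_timeout(configured)}s")
```

With the annotations at 300 the model gives 60, and with the annotations at 10 it gives 10, which is the pattern described above.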

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):

ingress-nginx-controller-6df48c5677-cjpgv:/etc/nginx$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.11.3
  Build:         0106de65cfccb74405a6dfa7d9daffc6f0a6ef1a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.5

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version): v1.29.10

Environment:

  • Cloud provider or hardware configuration: Managed AKS

  • OS (e.g. from /etc/os-release):

  • Kernel (e.g. uname -a):

  • Install tools:

    • Helm
  • Basic cluster related info:

    • Managed AKS v1.29.10, Public Azure Cloud
  • How was the ingress-nginx-controller installed:

$ helm ls -A | grep -i ingress
ingress-nginx                   	ingress-nginx   	1       	2024-11-29 15:59:19.017422556 +0100 CET	deployed	ingress-nginx-4.11.3                                                     	1.11.3
$ helm -n ingress-nginx get values ingress-nginx
USER-SUPPLIED VALUES:
null
  • Current State of the controller:
$ kubectl describe ingressclasses
Name:         azure-application-gateway
Labels:       addonmanager.kubernetes.io/mode=Reconcile
              app=ingress-appgw
              app.kubernetes.io/component=controller
Annotations:  <none>
Controller:   azure/application-gateway
Events:       <none>

Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.11.3
              helm.sh/chart=ingress-nginx-4.11.3
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
$ kubectl -n ingress-nginx get all -o wide

NAME                                            READY   STATUS    RESTARTS   AGE    IP            NODE                                NOMINATED NODE   READINESS GATES
pod/ingress-nginx-controller-6df48c5677-cjpgv   1/1     Running   0          124m   10.244.2.16   aks-nodepool1-19682194-vmss000003   <none>           <none>

NAME                                         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                      AGE    SELECTOR
service/ingress-nginx-controller             LoadBalancer   10.0.217.178   <redacted>   80:32223/TCP,443:31568/TCP   124m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission   ClusterIP      10.0.24.224    <none>           443/TCP                      124m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE    CONTAINERS   IMAGES                                                                                                                     SELECTOR
deployment.apps/ingress-nginx-controller   1/1     1            1           124m   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                  DESIRED   CURRENT   READY   AGE    CONTAINERS   IMAGES                                                                                                                     SELECTOR
replicaset.apps/ingress-nginx-controller-6df48c5677   1         1         1       124m   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=6df48c5677
$ kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>

Name:             ingress-nginx-controller-6df48c5677-cjpgv
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             aks-nodepool1-19682194-vmss000003/10.224.0.6
Start Time:       Fri, 29 Nov 2024 15:59:47 +0100
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.11.3
                  helm.sh/chart=ingress-nginx-4.11.3
                  pod-template-hash=6df48c5677
Annotations:      <none>
Status:           Running
IP:               10.244.2.16
IPs:
  IP:           10.244.2.16
Controlled By:  ReplicaSet/ingress-nginx-controller-6df48c5677
Containers:
  controller:
    Container ID:    containerd://c9fec8b39fbde912c1f7daf9e151fb32bc9fa4ab754b26908b873476f1a6d6a2
    Image:           registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Ports:           80/TCP, 443/TCP, 8443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --enable-metrics=false
    State:          Running
      Started:      Fri, 29 Nov 2024 15:59:56 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-6df48c5677-cjpgv (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vtb8k (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-vtb8k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason  Age                 From                      Message
  ----    ------  ----                ----                      -------
  Normal  RELOAD  13m (x6 over 125m)  nginx-ingress-controller  NGINX reload triggered due to a change in configuration
$ kubectl -n ingress-nginx describe svc ingress-nginx-controller

Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.11.3
                          helm.sh/chart=ingress-nginx-4.11.3
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.0.217.178
IPs:                      10.0.217.178
LoadBalancer Ingress:     <redacted>
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  32223/TCP
Endpoints:                10.244.2.16:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  31568/TCP
Endpoints:                10.244.2.16:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason               Age                From                Message
  ----    ------               ----               ----                -------
  Normal  UpdatedLoadBalancer  33m (x3 over 51m)  service-controller  Updated load balancer with new hosts
  • Current state of ingress object, if applicable:
$ kubectl describe ing -n <ns> <ing-name>
Name:             <ing-name>
Namespace:         <ns>
Address:         <redacted>
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
                 envoy-grpcapi:443 (10.244.0.29:8080)
Annotations:  nginx.ingress.kubernetes.io/backend-protocol: GRPCS
              nginx.ingress.kubernetes.io/proxy-connect-timeout: 300
              nginx.ingress.kubernetes.io/proxy-read-timeout: 300
              nginx.ingress.kubernetes.io/proxy-send-timeout: 300
              nginx.ingress.kubernetes.io/ssl-redirect: true
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    15m (x7 over 55m)  nginx-ingress-controller  Scheduled for sync
  • Others:
    • Any other related information like ;
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:

Anything else we need to know:

@0x113 0x113 added the kind/bug Categorizes issue or PR as related to a bug. label Nov 29, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Nov 29, 2024
@longwuyuan
Contributor

/remove-kind bug

Can you please write a step-by-step guide that someone can copy/paste from to reproduce this on a kind cluster, including the gRPC application?

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 30, 2024
@0x113
Author

0x113 commented Nov 30, 2024

Sure. I will work on a sample app and share the details soon.

@Dunge

Dunge commented Dec 2, 2024

Try setting client-body-timeout

@strongjz
Member

strongjz commented Dec 4, 2024

/kind bug
/priority backlog
/triage needs-information

Let us know if setting the client body timeout works. I also see a client header timeout, which should be set as well.
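For anyone who wants to try this, here is a hedged sketch of what setting both values could look like. It builds a ConfigMap manifest using the `client-body-timeout` and `client-header-timeout` keys from the ingress-nginx ConfigMap options. The ConfigMap name and namespace are taken from the `--configmap=$(POD_NAMESPACE)/ingress-nginx-controller` argument shown in the issue. The function name and the value 300 are illustrative assumptions, and nothing in this thread confirms that the change resolves the problem.

```python
import json


def timeout_configmap(seconds, name="ingress-nginx-controller", namespace="ingress-nginx"):
    """Build a ConfigMap manifest that raises both client-side timeouts."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # ConfigMap data values must be strings.
        "data": {
            "client-body-timeout": str(seconds),
            "client-header-timeout": str(seconds),
        },
    }


if __name__ == "__main__":
    # JSON is accepted by kubectl wherever YAML is.
    print(json.dumps(timeout_configmap(300), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. Because the controller here was installed with Helm, setting the same keys under the chart's `controller.config` value is probably the cleaner route. Note that these are controller-wide settings and affect every Ingress served by this controller.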

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority labels Dec 4, 2024
@strongjz
Member

@0x113 are you still having issues? If not can you post the resolution and/or close the ticket?


This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or want to request prioritization, please reach out in #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 21, 2025

5 participants