Upstream Prematurely Closed Connection While Reading Response Header From Upstream #11244
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug
/triage needs-information
Thank you for the prompt reply.
Sorry for the confusion, I only added this information for people to better understand my cluster architecture. The issue arises in the nginx ingress controller.
You mention "provide information as asked", but I can't find which information was asked for. And where should I open a new issue, and with which template? Thank you in advance for your reply.
Thank you for the clarification, I will update it asap!
Done, I have provided as much information as possible related to the issue described above.
If I understand you correctly, you recommend switching to a service of type LoadBalancer, leveraging MetalLB? However, using a service of type NodePort seems to be supported as well, according to https://docs.nginx.com/nginx-ingress-controller/installation/installing-nic/installation-with-manifests/#option-1-create-a-nodeport-service. Is the problem that we use the F5 in combination with the NodePort service type?
I am not aware whether termination on the ingress controllers is possible. I have to check this internally with my team. Nonetheless, does it even matter what happens "before" the ingress controller, as the issue seems to arise when connections between the ingress controllers and the backends are established/reused?
Unfortunately, I am not able to provide any further information beyond what is stated above. However, the problem arises on a regular basis (0.003% of all requests), independent of the backend service. Therefore, with the setup described above, it should be reproducible anyway.
My point was that you terminate on the F5 and then establish a new connection from the F5 to ingress-nginx. This is not tested in the project, and the relevance here is that there is no official stance on how your F5 works for Layer 3/4 and Layer 7. Nobody in the project can or will test this for you or try to reproduce it for you. Other users of F5 in front of a NodePort service of the ingress-nginx controller may know better and comment here.
Got it, let's see if there is someone who can relate to the issue stated. |
Just to add on that:
Sorry if my comments are not clear. The way I understand this is:
Sounds legit and is totally understandable.
Let's see if there is anyone in the community using F5, a NodePort service and the Nginx ingress controller.
Got it, thank you for the clarification!
I already checked resource starvation and backend pod processing. I fully understand the notes you made about CI and reproduction, though. I guess the only thing we can do right now is to wait for someone who knows more about F5 + NodePort + Nginx ingress controller.
@ChristianBieri1995 there is no data here that points at a problem to be solved in the ingress-nginx controller. Additionally, I know that 502s are reported by others in other issues, and they come from timeouts, backend-pod design, load, etc. The controller is helpless if the upstream responds with a 502.
Also, I think you will get more eyes from F5 users if you message in the Kubernetes Slack, in channels like ingress-nginx-users or plain old kubernetes-users. We have 490-ish issues open, and not all are tracking a problem to be solved in the ingress-nginx controller per se. So I will close the issue for now, but feel free to re-open it when there is data posted here that indicates work to be done on the controller itself. It will help us manage issues better.
/close
@longwuyuan: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Dear @longwuyuan, we were finally able to solve the issue (we think so, at least). It arose because of a timeout mismatch (when reusing connections, HTTP/1.1) between the Nginx ingress controller and our OpenLiberty backend.
Within the Nginx ingress controller, the following (relevant) parameters related to connection reuse are set: keepalive_timeout 75s; keepalive_requests 1000;. In the OpenLiberty backend, the "persistTimeout" configuration was set to 60s. As a result, whenever a connection that had been idle for longer than "persistTimeout" was reused by the Nginx ingress controller, the backend was no longer aware of it (the connection had been prematurely closed).
Even though you indicated multiple times that this is not an issue of the Nginx ingress controller, I am not totally sure about that. Should the Nginx ingress controller not be informed when connections are closed by the OpenLiberty backend, so that it does not reuse them? As far as I know, we are not the only victims of this (in my view faulty) behaviour. I am really looking forward to your feedback.
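If the root cause is indeed this timeout mismatch, a common mitigation is to keep the controller's idle timeout for upstream keepalive connections below the backend's idle timeout, so nginx closes idle connections before the backend does. A minimal sketch via the ingress-nginx ConfigMap follows; the ConfigMap name and namespace depend on the installation, and the 50-second value is illustrative, assuming OpenLiberty's persistTimeout stays at 60s:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # name/namespace depend on the installation
  namespace: ingress-nginx
data:
  # Close idle upstream connections before OpenLiberty's persistTimeout (60s)
  # expires, so the controller never reuses a connection the backend has
  # already closed. The value is illustrative, not taken from the issue.
  upstream-keepalive-timeout: "50"
```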
I'm getting a very similar error: my Nginx ingress controller communicates (sending POST requests) with a Wordpress/Printful WooCommerce Integration/3.0.0 backend.
What happened:
To route traffic into my Kubernetes cluster, I use an F5 load balancer, nginx ingress controllers (scaled by a horizontal pod autoscaler) and k8s ingress objects. With this setup, I receive 502 HTTP status codes on a regular basis. Although low in percentage (0.003%), the errors must not be neglected, as millions of requests are handled on a daily basis.
After spending hours trying to detect the issue, I stumbled upon the following error: 2024/04/10 09:13:00 [error] 3980#3980: *29815576 upstream prematurely closed connection while reading response header from upstream
With that information at hand, I skimmed countless webpages to identify the root cause. After checking the most obvious culprits, I could not make any progress. As a next step, I activated error-log-level: debug to capture the corresponding logs (see the relevant parts below):
2024-03-26T16:21:04.076718973+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 recv: fd:46 0 of 1048576
2024-03-26T16:21:04.076726784+01:00 stderr F 2024/03/26 15:21:04 [error] 136#136: *943568 upstream prematurely closed connection while reading response header from upstream (here, I left out some irrelevant parts)
2024-03-26T16:21:04.076750704+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 http next upstream, 2
2024-03-26T16:21:04.076756887+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 free keepalive peer
2024-03-26T16:21:04.076762948+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 lua balancer free peer, tries: 2
2024-03-26T16:21:04.076768572+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 finalize http upstream request: 502
2024-03-26T16:21:04.076774231+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 finalize http proxy request
2024-03-26T16:21:04.07677987+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 close http upstream connection: 46
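For reference, the debug logging shown above can be enabled through the controller's ConfigMap; a minimal sketch (the ConfigMap name and namespace depend on how the controller was installed):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # name/namespace depend on the installation
  namespace: ingress-nginx
data:
  # Raises nginx log verbosity; expect a large volume of log output.
  error-log-level: "debug"
```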
Unfortunately, this did not help either, but it indicated that there may be an issue with network connections that are reused (HTTP/1.1). Therefore, I added nginx.ingress.kubernetes.io/proxy-http-version: "1.0" to the relevant k8s ingress object, and behold: no more 502 HTTP status codes. I could replicate this behaviour not only in my test environment but also on more relevant stages.
In my view, there seems to be an issue with reusing established connections under HTTP/1.1, probably related to my nginx.conf.
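For readers who want to try the same workaround, a minimal sketch of such an ingress object follows; the name, host, service and port are placeholders, and only the annotation is the setting described above:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-ingress                # placeholder name
  annotations:
    # Forces HTTP/1.0 towards the upstream, which disables connection reuse
    # between the controller and the backend pods.
    nginx.ingress.kubernetes.io/proxy-http-version: "1.0"
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com            # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-service  # placeholder service
                port:
                  number: 8080         # placeholder port
```

Note that disabling connection reuse trades the sporadic 502s for additional connection-setup overhead on every request, so this is a workaround rather than a fix.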
NGINX Ingress version: nginx version: nginx/1.21.6
Kubernetes version:
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5+rke2r1
Environment:
Cloud provider or hardware configuration: Rancher, rke2
OS : NAME="SLES", VERSION="15-SP5", VERSION_ID="15.5"
Kernel: Linux 5.14.21-150500.55.52-default #1 SMP PREEMPT_DYNAMIC Tue Mar 5 16:53:41 UTC 2024 (a62851f) x86_64
Install tools:
Basic cluster related info:
How was the ingress-nginx-controller installed:
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe pod <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
kubectl -n <appnamespace> describe ing <ingressname>
Could there be an issue with timeouts related to established connections?
How to reproduce this issue: Simply by accessing the backend via F5 and nginx ingress controller. To do so, I created a load tester.
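A load tester of the kind mentioned above could look like the following Kubernetes Job (a sketch only: name, image tag, request count and URL are placeholders, not taken from the issue). It fires a large number of requests through the ingress path and prints every non-200 response, so that sporadic 502s become visible:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-load-test                # placeholder name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: load-test
          image: curlimages/curl:8.7.1   # any recent curl image works
          command: ["/bin/sh", "-c"]
          args:
            - |
              # Send many requests and report every non-200 status code.
              for i in $(seq 1 100000); do
                code=$(curl -s -o /dev/null -w '%{http_code}' https://app.example.com/)
                [ "$code" != "200" ] && echo "request $i returned $code"
              done
              echo "load test finished"
```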