
Incorrect handling of long URLs [draft] #11243

Closed
phantom943 opened this issue Apr 10, 2024 · 17 comments
Labels
• kind/support: Categorizes issue or PR as a support question.
• needs-priority
• needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
• triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@phantom943

phantom943 commented Apr 10, 2024

What happened:
I am getting intermittent HTTP errors when querying services via ingress if the URL is longer than ~2000 characters.
What you expected to happen:
Idempotency (no intermittent errors when querying the same stateless endpoint)
NGINX Ingress controller version:
v1.1.1
Kubernetes version (use kubectl version):

Environment:

  • Cloud provider or hardware configuration: Baremetal

  • OS (e.g. from /etc/os-release):

  • Kernel (e.g. uname -a):

  • Install tools:

    • Please mention how/where the cluster was created, e.g. kubeadm/kops/minikube/kind, etc.
  • Basic cluster related info:

    • kubectl version
    • kubectl get nodes -o wide
  • How was the ingress-nginx-controller installed:

    • If helm was used then please show output of helm ls -A | grep -i ingress
    • If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
    • If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used
    • if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances
  • Current State of the controller:

    • kubectl describe ingressclasses
    • kubectl -n <ingresscontrollernamespace> get all -A -o wide
    • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
    • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
  • Current state of ingress object, if applicable:

    • kubectl -n <appnamespace> get all,ing -o wide
    • kubectl -n <appnamespace> describe ing <ingressname>
    • If applicable, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
  • Others:

    • Any other related information, like:
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:
Make the same curl GET request to a web service in k8s 30 times in a row. The URL has to be long (over 2000 characters). Observe the 200/error rate (in my case the error is 505, but I suspect it might differ by application).
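A minimal reproduction sketch of that loop (the hostname, path, and token value below are placeholders, not the real service):

```bash
# Build a ~2500-character query parameter and send the same GET 30 times,
# then count the returned status codes (URL is a placeholder).
LONG_PARAM=$(head -c 2500 /dev/zero | tr '\0' 'a')
URL="https://ingress.example.com/api/endpoint?token=${LONG_PARAM}"

for i in $(seq 1 30); do
  curl -s -o /dev/null -w '%{http_code}\n' "$URL"
done | sort | uniq -c
```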
Anything else we need to know:
When I do the same curl GET request to it 30 times in a row, ~10 times (30%) I get an HTTP 505 error, and the other 20 times I get 200 OK.
Some relevant info:

  • Web service logs are empty (even with the highest debug level)
  • nginx logs for both the 200 and 505 cases are identical (except for the response size). They both get routed to the same service.
  • If I do a kubectl port-forward of my service's port to my machine, all 30 out of 30 requests complete with HTTP 200 (a rough sketch of that comparison is after this list).
  • The URL I have is quite long (2500 characters) (I didn't design the service, so please don't judge). If I truncate it to 1000 characters, I get 30 out of 30 requests with HTTP 200. The actual cutoff where it starts breaking is about 1500-1700 characters (letters and numbers).
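Roughly how the port-forward comparison was done (namespace, service name, and ports are placeholders):

```bash
# Bypass the ingress entirely: forward the Service port to localhost and
# repeat the same 30 requests (names and ports are placeholders).
kubectl -n my-namespace port-forward svc/my-service 8080:80 &
PF_PID=$!
sleep 2   # give the port-forward a moment to establish

LONG_PARAM=$(head -c 2500 /dev/zero | tr '\0' 'a')
for i in $(seq 1 30); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    "http://127.0.0.1:8080/api/endpoint?token=${LONG_PARAM}"
done | sort | uniq -c

kill "$PF_PID"   # stop the background port-forward
```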
phantom943 added the kind/bug label (Categorizes issue or PR as related to a bug.) on Apr 10, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) and needs-priority labels on Apr 10, 2024
@longwuyuan
Contributor

/remove-kind bug

  • Why the word "draft"?
  • Did you mean URLs that are over 1600 chars in length?
  • How does it work if you use plain old vanilla Nginx v1.25 as a reverse proxy instead of ingress-nginx?

/kind support
/triage needs-information

k8s-ci-robot added the kind/support (Categorizes issue or PR as a support question.) and triage/needs-information (Indicates an issue needs more information in order to work on it.) labels and removed the kind/bug (Categorizes issue or PR as related to a bug.) label on Apr 10, 2024
@phantom943
Author

Hello @longwuyuan !

  • The word "draft" is just because I haven't finished filling in all the details (like how the controller was installed and whatnot). I am still collecting details from my colleagues.
  • Yep, correct. URLs over 1600 chars in length.
  • I am not sure if we can do that. I'll try, but that requires a lot of setup at this point.

@longwuyuan
Contributor

OK. Can you also describe why your URL is so long?

@phantom943
Author

@longwuyuan well, it's a very bad choice by the end app developers - they are passing an OpenID token via a URL parameter -.-
Can't do much about that though

@longwuyuan
Contributor

OK, thank you for the info. That explains the use case.

@longwuyuan
Contributor

I am checking the nginx specs and the HTTP specs. Maybe you can do the same. This project's code will not set that limit, for sure.

cc @tao12345666333 @rikatz if you already know the spec limit for the length of an HTTP URL

@longwuyuan
Contributor

If you already have the complete error message from the controller logs, please copy/paste it here

@longwuyuan
Contributor

Also, controller v1.1.1 is not supported anymore. Is that the real version of the controller in use?

@phantom943
Author

Hey @longwuyuan
thanks a lot for your suggestions!
We have indeed tried large-client-header-buffers, to no avail.
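For reference, roughly how we applied that setting; the namespace and ConfigMap name below are the Helm chart defaults and may differ from your install:

```bash
# Set large-client-header-buffers on the controller ConfigMap
# (namespace/ConfigMap name assumed to be the chart defaults).
kubectl -n ingress-nginx patch configmap ingress-nginx-controller \
  --type merge -p '{"data":{"large-client-header-buffers":"4 16k"}}'

# Confirm the rendered nginx.conf picked it up:
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  grep large_client_header_buffers /etc/nginx/nginx.conf
```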
We have also just updated to the latest version of the controller (v1.10.0), also with no effect; the bug is still there.
I don't believe it's a matter of a setting, because otherwise it would have reproduced 100% of the time, not 30%.
Another interesting thing to note is that this seems to be dependent on the application: I have another application where, if I put a long token into the URL and just return it as a response, no 505 errors occur. So it could be a combination of a bug in the application and a bug in the ingress controller itself.
Do you have any suggestions on how to debug this maybe?

@longwuyuan
Contributor

  • if you can run one request with 1800 chars in the URL and it does not fail, then I agree that this is not related to the large-client-header-buffers setting

  • I would next find the threshold at which the success rate begins to drop below 100%, e.g. use load-generation tools and send incremental volumes, but in batches (see the sketch after this list):

    • 10 requests with 1800 chars in URL
    • 100 ditto
    • 500 ditto
    • and so on
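Something along these lines (the URL is a placeholder; adjust the batch sizes to taste):

```bash
# For each batch size, count how many requests with a ~1800-character URL
# come back as HTTP 200 (URL below is a placeholder, not the real service).
URL="https://ingress.example.com/api/endpoint?token=$(head -c 1800 /dev/zero | tr '\0' 'a')"

for batch in 10 100 500; do
  ok=0
  for i in $(seq 1 "$batch"); do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$URL")
    [ "$code" = "200" ] && ok=$((ok + 1))
  done
  echo "batch=$batch: $ok/$batch returned 200"
done
```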

@longwuyuan
Contributor

Also, still waiting on the exact and real complete error message lines

@phantom943
Author

Hello!
So, we have managed to locate the root issue. It turns out the issue wasn't in the nginx controller after all.
The issue was packet fragmentation: the packet we were sending was bigger than the MTU of the machines en route.

So, sometimes the packet would arrive in full, and sometimes it would arrive in two fragments. The problem was with the server: when the packet was fragmented, it received the first fragment and (because of a bug they have in the code) started to interpret that half of the packet as the whole thing. The HTTP headers are actually located at the end of the packet, so they were in the second fragment. So the server failed to process that half of the packet correctly and just replied with 505 instead of waiting for the rest of it to arrive and processing it in full.
Why the packet got fragmented randomly about 1/3 of the time, we have no idea, but we managed to find a workaround.
We placed an additional nginx server in front of our target container on the same physical node, which just relays the request to the correct port on the server. But nginx CAN handle fragmented packets correctly and can reassemble them. So it reassembled the packet, and since after that stage the packet didn't traverse any other machines, it arrived in full at the target server.
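For anyone checking for the same thing, a quick way to probe whether packets of a given size cross the path unfragmented (the hostname and payload size below are placeholders, and ping -M do is Linux-specific):

```bash
# Send pings with the Don't Fragment bit set; 1472 bytes of payload plus
# 28 bytes of ICMP/IP headers corresponds to the common 1500-byte MTU.
ping -c 3 -M do -s 1472 ingress.example.com

# Compare against the interface MTUs on the nodes along the way:
ip -o link show | awk '{print $2, $4, $5}'
```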

@longwuyuan
Contributor

Thanks for the update.
If the apps in the pods have listening sockets created by dev servers like Jetty, Django, etc., it implies that folks did not put an nginx in front of them, inside the pod.

Your solution of putting an nginx webserver on the node sounds too much like an anti-pattern, but since I am on the outside, I would not know any better.

@phantom943
Author


Hey @longwuyuan
indeed you are right. They did not have any nginx or ASGI servers or anything, just a pure node.js server.
So we actually put the nginx into a sidecar container in the same pod as the application, so hopefully it's not an anti-pattern.
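Conceptually, the sidecar's relay config is just a local proxy_pass; a minimal sketch (the ports are placeholders, not our real values):

```bash
# Drop a relay server block into the sidecar nginx; the stock nginx image
# includes /etc/nginx/conf.d/*.conf by default (ports are placeholders).
cat > /etc/nginx/conf.d/relay.conf <<'EOF'
server {
    listen 8080;                          # port the Service targets
    location / {
        proxy_pass http://127.0.0.1:3000; # the node.js server in the same pod
    }
}
EOF
nginx -t   # validate the configuration
```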

@longwuyuan
Contributor

Thanks for updating. It helps future readers.
