Ingress nginx controller changed load balancer when updating managed nodegroups AWS EKS #11116

numberama · 2024-03-13T10:59:16Z

What happened:
Hi, firstly I'm not sure if this is a bug or not, but I would very much appreciate help in understanding the behaviour.

In the process of updating node images we:

- created a new managed nodegroup
- cordoned the old managed nodegroup's nodes
- drained the old managed nodegroup's nodes
- deleted the old managed nodegroup

When we deleted the old managed nodegroup, the target groups moved to a different load balancer. This caused an outage of our services until we diagnosed the DNS change.
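
For reference, the node rotation described above can be sketched as a dry-run command plan. This is only an illustration: the exact commands we ran are not recorded in this report, so the `eksctl` and `kubectl` invocations, the cluster and nodegroup names, and the use of the `eks.amazonaws.com/nodegroup` node label as a selector are all assumptions.

```python
from typing import List


def rotation_plan(cluster: str, old_group: str, new_group: str) -> List[List[str]]:
    """Return the commands for the four rotation steps, in order, without running them."""
    # Assumption: nodes in a managed nodegroup carry this label with the group name.
    selector = f"eks.amazonaws.com/nodegroup={old_group}"
    return [
        # 1. create the new managed nodegroup
        ["eksctl", "create", "nodegroup", "--cluster", cluster,
         "--name", new_group, "--managed"],
        # 2. cordon every node of the old nodegroup
        ["kubectl", "cordon", "--selector", selector],
        # 3. drain every node of the old nodegroup
        ["kubectl", "drain", "--selector", selector,
         "--ignore-daemonsets", "--delete-emptydir-data"],
        # 4. delete the old managed nodegroup
        ["eksctl", "delete", "nodegroup", "--cluster", cluster,
         "--name", old_group],
    ]


if __name__ == "__main__":
    # Hypothetical names; printing the plan keeps the sketch side-effect free.
    for argv in rotation_plan("my-cluster", "ng-old", "ng-new"):
        print(" ".join(argv))
```

Printing the plan instead of executing it makes it easy to review each step before anything touches the cluster.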

What you expected to happen:
We would not expect a change in nodes or node image version to trigger a change in the ingress service load balancer.
We have performed similar operations many times and have not experienced this behaviour.

We opened a ticket with AWS support and spoke to the EKS team, who confirmed that the EKS controller had switched the targets.
The best explanation they could offer was that we have 2 load balancers with the same name (but different full DNS names).

Is it possible that this could be the reason for the change?

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.9.4
  Build:         846d251814a09d8a5d8d28e2e604bfc7749bcb49
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):
Server Version: v1.27.9-eks-5e0fdde
Environment:

Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release):na
Kernel (e.g. uname -a):na
Install tools:eksctl
Basic cluster related info:
- kubectl version:1.27
- kubectl get nodes -o wide
How was the ingress-nginx-controller installed: helm
ingress-nginx ingress-nginx 2 2023-10-31 09:03:19.593721 +0000 UTC deployed ingress-nginx-4.8.3 1.9.4
USER-SUPPLIED VALUES: null
Current State of the controller:
- kubectl describe ingressclasses

Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.9.4
              helm.sh/chart=ingress-nginx-4.8.3
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>

Anything else we need to know:

As I said earlier, we seem to have 2 load balancers tagged for the ingress-nginx-controller service with the same name. I'm not sure of the mechanism, but I feel this is an important part of it.
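
To make the duplicate-tag situation concrete, here is a minimal sketch that groups a load balancer inventory by the Service it is tagged for and reports any Service with more than one load balancer. The tag key `kubernetes.io/service-name`, the record layout, and the sample DNS names are assumptions for illustration, not data from our account; in practice the inventory would come from the AWS API or console.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: load balancers created for a Service carry this tag,
# with a value of the form "<namespace>/<service-name>".
SERVICE_TAG = "kubernetes.io/service-name"


def group_by_service(load_balancers: List[dict]) -> Dict[str, List[str]]:
    """Map each tagged Service to the DNS names of its load balancers."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for lb in load_balancers:
        service = lb.get("tags", {}).get(SERVICE_TAG)
        if service:  # skip load balancers that are not tagged for a Service
            groups[service].append(lb["dns_name"])
    return dict(groups)


def duplicated_services(load_balancers: List[dict]) -> Dict[str, List[str]]:
    """Return only the Services that have more than one load balancer."""
    return {
        service: names
        for service, names in group_by_service(load_balancers).items()
        if len(names) > 1
    }


# Hypothetical inventory: two load balancers tagged for the same Service.
sample = [
    {"dns_name": "lb-a.example.invalid",
     "tags": {SERVICE_TAG: "ingress-nginx/ingress-nginx-controller"}},
    {"dns_name": "lb-b.example.invalid",
     "tags": {SERVICE_TAG: "ingress-nginx/ingress-nginx-controller"}},
    {"dns_name": "lb-c.example.invalid",
     "tags": {SERVICE_TAG: "default/other-service"}},
]

if __name__ == "__main__":
    print(duplicated_services(sample))
```

Comparing any duplicates found this way against the hostname in the Service's `status.loadBalancer.ingress` field would show which load balancer Kubernetes currently considers the active one.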


k8s-ci-robot · 2024-03-13T10:59:24Z

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

longwuyuan · 2024-03-14T00:00:45Z

/remove-kind bug

The template for a new bug report asks a number of questions.
You have skipped all or almost all of them.
That info is needed for readers to analyse the reported problem.
Please read the questions asked in the new issue template.
Please edit the issue description here and provide all the info asked for in the template.

longwuyuan · 2024-03-14T00:00:57Z

/triage needs-information

strongjz · 2024-03-14T15:20:50Z

/assign @strongjz

github-actions · 2024-04-14T02:18:40Z

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or requests to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

longwuyuan · 2024-04-29T02:20:37Z

/close

k8s-ci-robot · 2024-04-29T02:20:41Z

@longwuyuan: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

numberama added the kind/bug label Mar 13, 2024

k8s-ci-robot added the needs-triage and needs-priority labels Mar 13, 2024

strongjz added this to [SIG Network] Ingress NGINX Mar 13, 2024

k8s-ci-robot added the needs-kind label and removed the kind/bug label Mar 14, 2024

k8s-ci-robot added the triage/needs-information label Mar 14, 2024

k8s-ci-robot assigned strongjz Mar 14, 2024

github-actions bot added the lifecycle/frozen label Apr 14, 2024

k8s-ci-robot closed this as completed Apr 29, 2024

github-project-automation bot moved this to Done in [SIG Network] Ingress NGINX Apr 29, 2024
