Ingress nginx controller changed load balancer when updating managed nodegroups AWS EKS #11116

Closed
numberama opened this issue Mar 13, 2024 · 7 comments
Labels

  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
  • needs-priority
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@numberama

What happened:
Hi, firstly I'm not sure if this is a bug or not, but I would very much appreciate help in understanding the behaviour.

In the process of updating node images we:

  • created a new managed nodegroup
  • cordoned old managed nodegroup nodes
  • drained old managed nodegroup nodes
  • deleted old managed nodegroup
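
For readers following along, the four steps above might correspond to commands like the ones assembled below. This is only a sketch: the issue does not show the commands that were actually run, the cluster and nodegroup names are placeholders, and selecting nodes by the `eks.amazonaws.com/nodegroup` label is an assumption about how the old nodes were targeted. The function only builds the command lines and does not execute anything.

```python
import shlex

# Label that EKS managed nodegroups apply to their nodes (assumed to be the
# selection mechanism here; the issue does not say how nodes were selected).
NODEGROUP_LABEL = "eks.amazonaws.com/nodegroup"


def rotation_commands(cluster, old_group, new_group):
    """Return the argv lists for a create/cordon/drain/delete rotation.

    All names passed in are placeholders supplied by the caller.
    """
    selector = f"{NODEGROUP_LABEL}={old_group}"
    return [
        # 1. Create the new managed nodegroup.
        ["eksctl", "create", "nodegroup", "--cluster", cluster, "--name", new_group],
        # 2. Cordon every node in the old nodegroup.
        ["kubectl", "cordon", "--selector", selector],
        # 3. Drain the old nodes so pods reschedule onto the new nodegroup.
        ["kubectl", "drain", "--selector", selector,
         "--ignore-daemonsets", "--delete-emptydir-data"],
        # 4. Delete the old nodegroup.
        ["eksctl", "delete", "nodegroup", "--cluster", cluster, "--name", old_group],
    ]


if __name__ == "__main__":
    # Hypothetical names, printed for review rather than executed.
    for argv in rotation_commands("my-cluster", "ng-old", "ng-new"):
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try; each line can be reviewed and run by hand.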

When we deleted the old managed nodegroup, the target groups moved to a different load balancer. This caused an outage of our services until we diagnosed the DNS change.

What you expected to happen:
We would not expect a change of nodes or node image version to trigger a change of the ingress service's load balancer.
We have performed similar operations many times and not experienced this behaviour.

We opened a ticket with AWS support and spoke to the EKS team, who confirmed that the EKS controller had switched the targets.
The best explanation they could offer was that we have 2 load balancers with the same name (but different full DNS names).

Is it possible that this could be the reason for the change?

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.9.4
  Build:         846d251814a09d8a5d8d28e2e604bfc7749bcb49
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):
Server Version: v1.27.9-eks-5e0fdde
Environment:

  • Cloud provider or hardware configuration: AWS

  • OS (e.g. from /etc/os-release):na

  • Kernel (e.g. uname -a):na

  • Install tools:eksctl

  • Basic cluster related info:

    • kubectl version:1.27
    • kubectl get nodes -o wide
  • How was the ingress-nginx-controller installed: helm
    ingress-nginx ingress-nginx 2 2023-10-31 09:03:19.593721 +0000 UTC deployed ingress-nginx-4.8.3 1.9.4
    USER-SUPPLIED VALUES: null

  • Current State of the controller:

    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.9.4
              helm.sh/chart=ingress-nginx-4.8.3
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>

Anything else we need to know:

As I said earlier, we seem to have 2 load balancers tagged for the ingress-nginx-controller service with the same name. I'm not sure of the mechanism, but I feel this is an important part of it.
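
One way to confirm a situation like this is to group the account's load balancers by the service they are tagged for and flag any service with more than one. The sketch below assumes the load balancers and their tags have already been fetched into plain dictionaries (for example via the AWS CLI or an SDK), and that the tag key is `kubernetes.io/service-name` with a `namespace/name` value. That key is an assumption; the issue does not state which tag was involved. The `dns_name` and `tags` field names are hypothetical.

```python
from collections import defaultdict

# Assumed tag key identifying the owning Service; adjust to whatever tag
# the load balancers in question actually carry.
SERVICE_TAG = "kubernetes.io/service-name"


def duplicate_service_load_balancers(load_balancers):
    """Return {service: [dns names]} for services tagged on 2+ load balancers.

    Each item is expected to look like
    {"dns_name": "...", "tags": {"key": "value", ...}} (a hypothetical shape).
    """
    by_service = defaultdict(list)
    for lb in load_balancers:
        service = lb.get("tags", {}).get(SERVICE_TAG)
        if service is not None:
            by_service[service].append(lb["dns_name"])
    # Keep only services that more than one load balancer claims.
    return {
        service: sorted(names)
        for service, names in by_service.items()
        if len(names) > 1
    }
```

Any service that appears in the result has more than one load balancer claiming it, which is the condition AWS support pointed to.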

@numberama numberama added the kind/bug Categorizes issue or PR as related to a bug. label Mar 13, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Mar 13, 2024
@longwuyuan
Contributor

/remove-kind bug

  • There are questions asked in the template of a new bug-report
  • You have skipped all those questions or almost all of them
  • That info is needed for readers to analyse the reported problems
  • Please read the questions asked in the new issue template
  • Please edit the issue description here and provide all the info asked in the template of a new bug report

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Mar 14, 2024
@longwuyuan
Contributor

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Mar 14, 2024
@strongjz
Member

/assign @strongjz

@github-actions

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or a request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 14, 2024
@longwuyuan
Contributor

/close

@k8s-ci-robot
Contributor

@longwuyuan: Closing this issue.

In response to this:

/close
