Admission webhook endpoint is called from other ingressClass #10985

khauser · 2024-02-15T09:06:09Z

What happened:

I have some failing helm chart installations like:

+ helm --kubeconfig /data/k8s/kube-config.yaml install iste-run-tp221-ts4366 ./icm -f ./values.yaml
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /data/k8s/kube-config.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /data/k8s/kube-config.yaml
Error: INSTALLATION FAILED: 1 error occurred:
	* Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://iste-run-tp221-ts4368-ingress-nginx-controller-admission.iste-testrun.svc:443/networking/v1/ingresses?timeout=10s": no endpoints available for service "iste-run-tp221-ts4368-ingress-nginx-controller-admission"

So I try to install iste-run-tp221-ts4366 with ingressClass "ingress-class-iste-run-tp221-ts4366" and it fails due to another ingress with class "ingress-class-iste-run-tp221-ts4368".

Here is the configuration:

ingress-nginx:
  enabled: true
  controller:
    kind: Deployment
    ingressClassByName: "true"
    ingressClassResource:
      name: "ingress-class-${HELM_JOB_NAME}"
      controllerValue: "k8s.io/ingress-${HELM_JOB_NAME}"
    ingressClass: "ingress-class-${HELM_JOB_NAME}"
    nodeSelector:
      agentpool: agentpool2
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: release
              operator: In
              values:
              - ${HELM_JOB_NAME}-icm-as
          topologyKey: kubernetes.io/hostname
    service:
      external:
        enabled: "false"
      internal:
        enabled: "true"
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      externalTrafficPolicy: Local
    publishService:
      enabled: "true"
    resources:
      requests:
        cpu: 200m
        memory: 300Mi

  defaultBackend:
    enabled: false

HELM_JOB_NAME could be e.g. "iste-run-tp221-ts4366" or "iste-run-tp221-ts4368", ...

Our Ingress backend values.yaml:

  ingress:
    enabled: true
    className: "ingress-class-${HELM_JOB_NAME}"
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "600m"
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    hosts:
    - host: "${HELM_JOB_NAME}.${DNS_ZONE_NAME}"
      paths:
      - path: /
        pathType: Prefix
    tls:
    - secretName: <ours>
      hosts:
      - "${HELM_JOB_NAME}.${DNS_ZONE_NAME}"

What you expected to happen:

One ingress shouldn't call webhooks from the other if these are differentiated by an ingressClass

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

name: ingress-nginx
version: 4.9.0
repository: https://kubernetes.github.io/ingress-nginx
condition: ingress-nginx.enabled

Pod image: registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e

Kubernetes version (use kubectl version):
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.3

Environment:

Cloud provider or hardware configuration: Azure AKS
OS (e.g. from /etc/os-release):
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.18.5
PRETTY_NAME="Alpine Linux v3.18"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
Kernel (e.g. uname -a):
Linux -8567f45f9c-lnxjb 5.15.0-1049-azure Clarify port clash failure mode #56-Ubuntu SMP Wed Sep 20 12:34:34 UTC 2023 x86_64 Linux

The text was updated successfully, but these errors were encountered:

k8s-ci-robot · 2024-02-15T09:06:17Z

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

longwuyuan · 2024-02-15T09:11:23Z

/remove-kind bug

Uninstall and remove all current resources related to ingress-nginx
Perform a non-custom vanilla install as per the documentation https://kubernetes.github.io/ingress-nginx/deploy/
Update status here
Now customize as per your needs and provide info on the diff
Ensure that all the questions that are asked in the issue template are answered
Update status

/triage needs-information

khauser · 2024-02-15T12:46:37Z

/label bug

It's a non-custom vanilla installation as chart dependency and the bug is obvious

strongjz · 2024-02-15T16:37:47Z

/assign @Gacko

longwuyuan · 2024-02-15T16:52:25Z

Is the ingress-nginx controller chart a sub-chart of your chart release named as iste-run-tp221-ts4366 ?

khauser · 2024-02-15T17:25:45Z

Yes it is.

longwuyuan · 2024-02-15T17:47:37Z

thanks

just FYI, CI does not chart installation of the ingress-nginx chart as a subchart
you may want to add all the related info for readers to comprehend the complete use-case
all details will also be essential in case someone wants to reproduce
the error no endpoint available is a hint. you could help out with additional information by doing a test like a vanilla install as per docs https://kubernetes.github.io/ingress-nginx/deploy/ and update status here. This will let readers know if you get the same error, even on a vanilla install

longwuyuan · 2024-02-15T17:53:04Z

Also, highly significant in this kind of use-case, the questions that are asked in a new bug-report for this project cover several critical aspects that a reader needs to make useful comments based on data. Hence if you read those questions of a new bug-report template and then edit this issue-description with the answers to those questions, it will be very useful to readers

khauser · 2024-02-15T18:45:03Z

Thanks for guidance.

Our usecase is happening in our testenvironment. There our product is tested within an Azure AKS and a scaling agentpool which could scale from 0-40. On each of this nodes multiple tests could be executed in paralell. In front of our product we use an ingress to terminate SSL. extenaldns regognizes new ingress rules and creates private dns entries. With these a test could be started. The ingress talks http with our application.

I tried also to use just one Ingress but there the problem is that an ingress rule change forces a nginx restart. So one test runs and one of the next starting destroys the other. Creating all needed Ingress rules before one test starts doesn't feel good but would be possible. On the other hand each test should be isolated as possible so One-Test&One-IngressController should be better.

My workaround is now to disable the webhooks admission with controller.admissionWebhooks.enabled: false (ATTENTION: Without quotes) and it seems to work that way. But I assume that there are usecases where it is needed. For this you now have bug report.

During my test I had 4 of 8 Tests starting well in the beginning. Then some tests began to fail due to the wrong endpoint usage.

Multiple vanilla installations would be very tasteful but not different from having multiple subchart deployments.

Here is also our ingress rule template file because I missed to add it initially:

{{- if .Values.ingress.enabled -}}
{{- $fullName := include "icm-as.fullname" . -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ $fullName }}
  labels:
    app.kubernetes.io/name: {{ $fullName }}
    helm.sh/chart: {{ include "icm-as.chart" . }}
    app.kubernetes.io/instance: {{ $fullName }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
  annotations:
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/affinity-mode: persistent
    nginx.ingress.kubernetes.io/session-cookie-name: stid
    {{- range $key, $value := .Values.ingress.annotations }}
    {{ $key }}: {{ $value | quote }}
    {{- end }}
spec:
  {{- if .Values.ingress.className }}
  ingressClassName: {{ .Values.ingress.className }}
  {{- else }}
  ingressClassName: {{ $fullName }}-ingress
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            pathType: {{ .pathType }}
            backend:
              service:
                name: {{ $fullName }}
                port:
                  name: svc
          {{- end }}
    {{- end }}
{{- end }}

That's all I have for now. Please tell me if you need more.

longwuyuan · 2024-02-15T20:10:46Z

There is likely some more contributor who may comment so lets wait for that too
Personally I already likely know what is the problem but because that discussion is a deep dive and too complicated to type out, or even engage in at this stage, its best left for a later stage, when more real data is already posted in this thread. In short you should run k get po -A -w and k get ep -A -w and concurrently run k get events -A -w. This info when corelated with timestamps will tell you the precise timings that your parent chart expected the endpoint to be ready and the precise timings the endpoint actually became ready.
I repeat that we don't test subchart usecase in CI so you are pretty much on your own to design the ingress-nginx controller's use-case as a sub-chart. Maybe even add "wait" periods in your chart install steps for example.
When others see the error message here and the other data you have provided, there would be comments forthcoming so hoping your issue gets resolved sooner than later
You already said the bug is obvious, but I can't reproduce the "endpoints not available" error on a vanilla install i.e. "no subcharting" install. Also this is a core functionality (endpoints) we are talking about so tons of users should have reported it but that is not the case. Hence the explicit issue-title should include subcharting use-case

/retitle no endpoints available error on subchart use-case

longwuyuan · 2024-02-15T20:13:32Z

on a different note, you use of words other ingressClass implies you may have multiple instances of the ingress-nginx controller in one cluster. that is a yet another whole different ballgame. but because the questions asked in a new bug-report template have not been answered, we can not be certain (because one of the questions in the template ask for related information)

longwuyuan · 2024-02-16T13:08:51Z

@khauser, I see that you changed the title back to refer to ingressClass but in the details of the issue description there is no output of the command "kubectl get ingressclass"
even if ingressClass is implied y other information you have posted, please help and add ingressClass information as output of kubectl command as asked above, as well as other answers to questions from the new bug-report template

strongjz · 2024-02-16T14:26:09Z

Marco is going to look deeper into this; with that said, if I'm reading this correctly, it should validate the ingress class.

The Admission control code is simple, the HandleAdmission function should validate the ingress class that matches it in this function

The way it looks is that all ingress controllers would need to be available; that may be the bug.

Gacko · 2024-02-16T19:40:37Z

Hey!

I've already seen this issue in the past and I sadly have to say everything is working as designed here. It might not be perfect, but it is the way Kubernetes and especially webhooks work.

Our chart comes with a validating webhook configuration. This configuration tells the Kubernetes API server to reach out to a service whenever something happens to the resource types specified in the configuration.

One can add multiple webhook configurations dealing with the same resources. This is what happens here: You have installed the Ingress NGINX at least twice and each of the installations comes with a webhook configuration. So the API server is told to reach out to the both of them whenever something happens with Ingress resources.

The webhook of Ingress NGINX is handled by the controller itself, so each Ingress NGINX controller pod is an eligible endpoint for the webhook the API server is calling.

Unfortunately the Kubernetes webhook configuration API is not aware of resource specific information, so it can not be limited on something inside the .spec of your Ingress resources and therefore also not the ingressClassName.

The only way to limit if a webhook is called for a specific resource are the objectSelector and the namespaceSelector in the webhook configuration spec. They allow you to only call the webhook if the resource it should be called for matches those selectors, so either its labels or its namespace. Obviously the ingressClassName is none of them. And this is the reason why every Ingress related webhook is being called for every Ingress resource.

Luckily the Ingress NGINX is checking if the ingressClassName matches with one of the IngressClasses it is managing. If the handed Ingress resource is matching none of them, it exits early.

As you might tell now, this either requires running at least one endpoint per webhook service all the time or removing the webhook configuration if there are not endpoints available.

As this is nothing we can change or improve in the Ingress NGINX project and things work as designed, I'm closing this issue for now. I already came across this with one of the customers of the company I'm working for when I was actively supporting the OSS Ingress NGINX for that company and we sadly weren't able to find a better solution there.

Feel free to ask further questions!

Kind regards
Marco

Gacko · 2024-02-16T19:40:41Z

/close

k8s-ci-robot · 2024-02-16T19:40:48Z

@Gacko: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

khauser added the kind/bug Categorizes issue or PR as related to a bug. label Feb 15, 2024

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 15, 2024

k8s-ci-robot added the needs-priority label Feb 15, 2024

strongjz added this to [SIG Network] Ingress NGINX Feb 15, 2024

k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Feb 15, 2024

k8s-ci-robot assigned Gacko Feb 15, 2024

k8s-ci-robot changed the title ~~Admission webhook is called from other ingressClass~~ no endpoints available error on subchart use-case Feb 15, 2024

khauser changed the title ~~no endpoints available error on subchart use-case~~ Admission webhook endpoint is called from other ingressClass Feb 16, 2024

k8s-ci-robot closed this as completed Feb 16, 2024

github-project-automation bot moved this to Done in [SIG Network] Ingress NGINX Feb 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Admission webhook endpoint is called from other ingressClass #10985

Admission webhook endpoint is called from other ingressClass #10985

khauser commented Feb 15, 2024 •

edited

Loading

k8s-ci-robot commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

khauser commented Feb 15, 2024 •

edited

Loading

strongjz commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

khauser commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

khauser commented Feb 15, 2024 •

edited

Loading

longwuyuan commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

longwuyuan commented Feb 16, 2024

strongjz commented Feb 16, 2024

Gacko commented Feb 16, 2024

Gacko commented Feb 16, 2024

k8s-ci-robot commented Feb 16, 2024

Admission webhook endpoint is called from other ingressClass #10985

Admission webhook endpoint is called from other ingressClass #10985

Comments

khauser commented Feb 15, 2024 • edited Loading

k8s-ci-robot commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

khauser commented Feb 15, 2024 • edited Loading

strongjz commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

khauser commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

khauser commented Feb 15, 2024 • edited Loading

longwuyuan commented Feb 15, 2024

longwuyuan commented Feb 15, 2024

longwuyuan commented Feb 16, 2024

strongjz commented Feb 16, 2024

Gacko commented Feb 16, 2024

Gacko commented Feb 16, 2024

k8s-ci-robot commented Feb 16, 2024

khauser commented Feb 15, 2024 •

edited

Loading

khauser commented Feb 15, 2024 •

edited

Loading

khauser commented Feb 15, 2024 •

edited

Loading