Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admission webhook endpoint is called from other ingressClass #10985

Closed
khauser opened this issue Feb 15, 2024 · 16 comments
Closed

Admission webhook endpoint is called from other ingressClass #10985

khauser opened this issue Feb 15, 2024 · 16 comments
Assignees
Labels
needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@khauser
Copy link

khauser commented Feb 15, 2024

What happened:

I have some failing helm chart installations like:

+ helm --kubeconfig /data/k8s/kube-config.yaml install iste-run-tp221-ts4366 ./icm -f ./values.yaml
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /data/k8s/kube-config.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /data/k8s/kube-config.yaml
Error: INSTALLATION FAILED: 1 error occurred:
	* Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://iste-run-tp221-ts4368-ingress-nginx-controller-admission.iste-testrun.svc:443/networking/v1/ingresses?timeout=10s": no endpoints available for service "iste-run-tp221-ts4368-ingress-nginx-controller-admission"

So I try to install iste-run-tp221-ts4366 with ingressClass "ingress-class-iste-run-tp221-ts4366" and it fails due to another ingress with class "ingress-class-iste-run-tp221-ts4368".

Here is the configuration:

ingress-nginx:
  enabled: true
  controller:
    kind: Deployment
    ingressClassByName: "true"
    ingressClassResource:
      name: "ingress-class-${HELM_JOB_NAME}"
      controllerValue: "k8s.io/ingress-${HELM_JOB_NAME}"
    ingressClass: "ingress-class-${HELM_JOB_NAME}"
    nodeSelector:
      agentpool: agentpool2
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: release
              operator: In
              values:
              - ${HELM_JOB_NAME}-icm-as
          topologyKey: kubernetes.io/hostname
    service:
      external:
        enabled: "false"
      internal:
        enabled: "true"
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      externalTrafficPolicy: Local
    publishService:
      enabled: "true"
    resources:
      requests:
        cpu: 200m
        memory: 300Mi

  defaultBackend:
    enabled: false

HELM_JOB_NAME could be e.g. "iste-run-tp221-ts4366" or "iste-run-tp221-ts4368", ...

Our Ingress backend values.yaml:

  ingress:
    enabled: true
    className: "ingress-class-${HELM_JOB_NAME}"
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "600m"
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    hosts:
    - host: "${HELM_JOB_NAME}.${DNS_ZONE_NAME}"
      paths:
      - path: /
        pathType: Prefix
    tls:
    - secretName: <ours>
      hosts:
      - "${HELM_JOB_NAME}.${DNS_ZONE_NAME}"

What you expected to happen:

One ingress shouldn't call webhooks from the other if these are differentiated by an ingressClass

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

Pod image: registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e

Kubernetes version (use kubectl version):
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.3

Environment:

@khauser khauser added the kind/bug Categorizes issue or PR as related to a bug. label Feb 15, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 15, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@longwuyuan
Copy link
Contributor

/remove-kind bug

  • Uninstall and remove all current resources related to ingress-nginx
  • Perform a non-custom vanilla install as per the documentation https://kubernetes.github.io/ingress-nginx/deploy/
  • Update status here
  • Now customize as per your needs and provide info on the diff
  • Ensure that all the questions that are asked in the issue template are answered
  • Update status

/triage needs-information

@k8s-ci-robot k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Feb 15, 2024
@khauser
Copy link
Author

khauser commented Feb 15, 2024

/label bug

It's a non-custom vanilla installation as chart dependency and the bug is obvious

@strongjz
Copy link
Member

/assign @Gacko

@longwuyuan
Copy link
Contributor

Is the ingress-nginx controller chart a sub-chart of your chart release named as iste-run-tp221-ts4366 ?

@khauser
Copy link
Author

khauser commented Feb 15, 2024

Yes it is.

@longwuyuan
Copy link
Contributor

thanks

  • just FYI, CI does not chart installation of the ingress-nginx chart as a subchart
  • you may want to add all the related info for readers to comprehend the complete use-case
  • all details will also be essential in case someone wants to reproduce
  • the error no endpoint available is a hint. you could help out with additional information by doing a test like a vanilla install as per docs https://kubernetes.github.io/ingress-nginx/deploy/ and update status here. This will let readers know if you get the same error, even on a vanilla install

@longwuyuan
Copy link
Contributor

Also, highly significant in this kind of use-case, the questions that are asked in a new bug-report for this project cover several critical aspects that a reader needs to make useful comments based on data. Hence if you read those questions of a new bug-report template and then edit this issue-description with the answers to those questions, it will be very useful to readers

@khauser
Copy link
Author

khauser commented Feb 15, 2024

Thanks for guidance.

Our usecase is happening in our testenvironment. There our product is tested within an Azure AKS and a scaling agentpool which could scale from 0-40. On each of this nodes multiple tests could be executed in paralell. In front of our product we use an ingress to terminate SSL. extenaldns regognizes new ingress rules and creates private dns entries. With these a test could be started. The ingress talks http with our application.

I tried also to use just one Ingress but there the problem is that an ingress rule change forces a nginx restart. So one test runs and one of the next starting destroys the other. Creating all needed Ingress rules before one test starts doesn't feel good but would be possible. On the other hand each test should be isolated as possible so One-Test&One-IngressController should be better.

My workaround is now to disable the webhooks admission with controller.admissionWebhooks.enabled: false (ATTENTION: Without quotes) and it seems to work that way. But I assume that there are usecases where it is needed. For this you now have bug report.

During my test I had 4 of 8 Tests starting well in the beginning. Then some tests began to fail due to the wrong endpoint usage.

Multiple vanilla installations would be very tasteful but not different from having multiple subchart deployments.

Here is also our ingress rule template file because I missed to add it initially:

{{- if .Values.ingress.enabled -}}
{{- $fullName := include "icm-as.fullname" . -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ $fullName }}
  labels:
    app.kubernetes.io/name: {{ $fullName }}
    helm.sh/chart: {{ include "icm-as.chart" . }}
    app.kubernetes.io/instance: {{ $fullName }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
  annotations:
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/affinity-mode: persistent
    nginx.ingress.kubernetes.io/session-cookie-name: stid
    {{- range $key, $value := .Values.ingress.annotations }}
    {{ $key }}: {{ $value | quote }}
    {{- end }}
spec:
  {{- if .Values.ingress.className }}
  ingressClassName: {{ .Values.ingress.className }}
  {{- else }}
  ingressClassName: {{ $fullName }}-ingress
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            pathType: {{ .pathType }}
            backend:
              service:
                name: {{ $fullName }}
                port:
                  name: svc
          {{- end }}
    {{- end }}
{{- end }}

That's all I have for now. Please tell me if you need more.

@longwuyuan
Copy link
Contributor

  • There is likely some more contributor who may comment so lets wait for that too
  • Personally I already likely know what is the problem but because that discussion is a deep dive and too complicated to type out, or even engage in at this stage, its best left for a later stage, when more real data is already posted in this thread. In short you should run k get po -A -w and k get ep -A -w and concurrently run k get events -A -w. This info when corelated with timestamps will tell you the precise timings that your parent chart expected the endpoint to be ready and the precise timings the endpoint actually became ready.
  • I repeat that we don't test subchart usecase in CI so you are pretty much on your own to design the ingress-nginx controller's use-case as a sub-chart. Maybe even add "wait" periods in your chart install steps for example.
  • When others see the error message here and the other data you have provided, there would be comments forthcoming so hoping your issue gets resolved sooner than later
  • You already said the bug is obvious, but I can't reproduce the "endpoints not available" error on a vanilla install i.e. "no subcharting" install. Also this is a core functionality (endpoints) we are talking about so tons of users should have reported it but that is not the case. Hence the explicit issue-title should include subcharting use-case

/retitle no endpoints available error on subchart use-case

@k8s-ci-robot k8s-ci-robot changed the title Admission webhook is called from other ingressClass no endpoints available error on subchart use-case Feb 15, 2024
@longwuyuan
Copy link
Contributor

on a different note, you use of words other ingressClass implies you may have multiple instances of the ingress-nginx controller in one cluster. that is a yet another whole different ballgame. but because the questions asked in a new bug-report template have not been answered, we can not be certain (because one of the questions in the template ask for related information)

@khauser khauser changed the title no endpoints available error on subchart use-case Admission webhook endpoint is called from other ingressClass Feb 16, 2024
@longwuyuan
Copy link
Contributor

  • @khauser, I see that you changed the title back to refer to ingressClass but in the details of the issue description there is no output of the command "kubectl get ingressclass"
  • even if ingressClass is implied y other information you have posted, please help and add ingressClass information as output of kubectl command as asked above, as well as other answers to questions from the new bug-report template

@strongjz
Copy link
Member

Marco is going to look deeper into this; with that said, if I'm reading this correctly, it should validate the ingress class.

The Admission control code is simple, the HandleAdmission function should validate the ingress class that matches it in this function

The way it looks is that all ingress controllers would need to be available; that may be the bug.

@Gacko
Copy link
Member

Gacko commented Feb 16, 2024

Hey!

I've already seen this issue in the past and I sadly have to say everything is working as designed here. It might not be perfect, but it is the way Kubernetes and especially webhooks work.

Our chart comes with a validating webhook configuration. This configuration tells the Kubernetes API server to reach out to a service whenever something happens to the resource types specified in the configuration.

One can add multiple webhook configurations dealing with the same resources. This is what happens here: You have installed the Ingress NGINX at least twice and each of the installations comes with a webhook configuration. So the API server is told to reach out to the both of them whenever something happens with Ingress resources.

The webhook of Ingress NGINX is handled by the controller itself, so each Ingress NGINX controller pod is an eligible endpoint for the webhook the API server is calling.

Unfortunately the Kubernetes webhook configuration API is not aware of resource specific information, so it can not be limited on something inside the .spec of your Ingress resources and therefore also not the ingressClassName.

The only way to limit if a webhook is called for a specific resource are the objectSelector and the namespaceSelector in the webhook configuration spec. They allow you to only call the webhook if the resource it should be called for matches those selectors, so either its labels or its namespace. Obviously the ingressClassName is none of them. And this is the reason why every Ingress related webhook is being called for every Ingress resource.

Luckily the Ingress NGINX is checking if the ingressClassName matches with one of the IngressClasses it is managing. If the handed Ingress resource is matching none of them, it exits early.

As you might tell now, this either requires running at least one endpoint per webhook service all the time or removing the webhook configuration if there are not endpoints available.

As this is nothing we can change or improve in the Ingress NGINX project and things work as designed, I'm closing this issue for now. I already came across this with one of the customers of the company I'm working for when I was actively supporting the OSS Ingress NGINX for that company and we sadly weren't able to find a better solution there.

Feel free to ask further questions!

Kind regards
Marco

@Gacko
Copy link
Member

Gacko commented Feb 16, 2024

/close

@k8s-ci-robot
Copy link
Contributor

@Gacko: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
Development

No branches or pull requests

5 participants