Enabling Helm .Values.controller.opentelemetry.enabled prevents pods from being rescheduled on different nodes and cluster auto-scaling from working. #10911
When I add
scaling works again as expected but I think it should not be necessary to do this. |
Please provide additional configuration, e.g. the complete values.yaml file. I tested locally to see what difference there is in the manifest output when setting these values:

controller:
  opentelemetry:
    enabled: "false"
  config:
    enable-opentelemetry: "true"
  autoscaling:
    apiVersion: autoscaling/v2
    enabled: true
    minReplicas: 2

And the difference is as expected:

% diff enable-opentelemetry-*
256c256
< enable-opentelemetry: "false"
---
> enable-opentelemetry: "true"
598a599,600
> - mountPath: /modules_mount
>   name: modules
602a605,622
> initContainers:
> - command:
>   - /init_module
>   image: registry.k8s.io/ingress-nginx/opentelemetry:v20230721-3e2062ee5@sha256:13bee3f5223883d3ca62fee7309ad02d22ec00ff0d7033e3e9aca7a9f60fd472
>   name: opentelemetry
>   securityContext:
>     allowPrivilegeEscalation: false
>     capabilities:
>       drop:
>       - ALL
>     readOnlyRootFilesystem: true
>     runAsNonRoot: true
>     runAsUser: 65532
>     seccompProfile:
>       type: RuntimeDefault
>   volumeMounts:
>   - mountPath: /modules_mount
>     name: modules
614a635,636
> - emptyDir: {}
>   name: modules

I don't think this is an Nginx issue, but an issue with your configuration. Please provide more information, and look into what specifically cluster-autoscaler is giving as an error. |
Yes, that's the expected output. Notice the emptyDir, which is the source of the problem due to the cluster autoscaler's handling of emptyDir volumes... As I see it, this is an issue with the helm chart. If
See the cluster-autoscaler (not HorizontalPodAutoscaler) FAQ regarding how emptyDir is handled... This lets the cluster autoscaler know it's safe to evict/reschedule the ingress-nginx pods even though they have the emptyDir, so that nodes can be shut down. When I add the following to my values.yaml for ingress-nginx, it works properly because the annotation is added...
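The exact values.yaml snippet was not captured in this thread, but based on the annotation named later in the issue body, the workaround presumably looks something like this (a sketch, not necessarily the reporter's exact configuration):

```yaml
controller:
  podAnnotations:
    # Tells cluster-autoscaler it may evict/reschedule these pods
    # even though they mount an emptyDir volume.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```

With this in place, the chart renders the annotation onto the controller pods and the cluster autoscaler can drain the nodes they run on.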
This is where the fix needs to be added to automatically add the safe-to-evict annotation...
|
Seems you figured out a way to solve this with the configuration options already available in the helm chart. Or are there additional changes you believe must be implemented? If so, which? |
I think that annotation should automatically be added by the helm template since turning on opentelemetry shouldn't have the side effect of breaking cluster autoscaling. But yes, it's fixed for me using the podAnnotations (even before I opened this issue). |
I think the part of the helm template controller-daemonset.yaml where podAnnotations are used should be updated to something like this (untested!!)...
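The proposed template snippet was not captured above; one possible sketch of what the commenter may have meant (hypothetical, and only loosely mirroring the chart's actual annotation block) might be:

```yaml
annotations:
  {{- /* assumed addition: mark pods safe to evict despite the modules emptyDir */}}
  {{- if .Values.controller.opentelemetry.enabled }}
  cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
  {{- end }}
  {{- if .Values.controller.podAnnotations }}
  {{- toYaml .Values.controller.podAnnotations | nindent 8 }}
  {{- end }}
```

The indentation value passed to nindent is a guess here; the real chart's template context would determine the correct number.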
|
I'm not a maintainer, but this seems like a documentation enhancement rather than a helm change. I know they encourage PRs, so I would suggest that |
/remove-kind bug Yeah, please submit PR if you are inclined to. Thank you very much /help |
@longwuyuan:
Guidelines
Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/triage accepted |
This is stale, but we won't close it automatically, just bear in mind the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any question or request to prioritize this, please reach out. |
What happened:
Enabling .Values.controller.opentelemetry.enabled prevents pods from being rescheduled on different nodes and cluster auto-scaling from working.
What you expected to happen:
Expected cluster auto-scaling to continue to reschedule ingress-nginx pods on different nodes.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e
The helm chart adds emptyDir: {} here when the .Values.controller.opentelemetry.enabled is set to true:
ingress-nginx/charts/ingress-nginx/templates/controller-daemonset.yaml
Line 215 in 0c3d52b
I think it should also automatically add the following annotation in the podAnnotations area of the same file whenever .Values.controller.opentelemetry.enabled is set:

cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

Or at least emit a warning that you might want to consider adding it yourself, because cluster autoscaling won't be able to move your ingress-nginx pods otherwise.