
Ignore nodes from other providers #6358

Closed
alex-treebeard opened this issue Dec 7, 2023 · 10 comments
Labels
  • area/cluster-autoscaler
  • area/core-autoscaler: Denotes an issue that is related to the core autoscaler and is not specific to any provider.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@alex-treebeard

Which component are you using?:

Cluster Autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

When I have nodes with provider IDs that are incompatible with my cluster autoscaler's configured provider, the entire autoscaling loop fails.

Describe the solution you'd like.:
Ignore nodes with incompatible provider IDs

Describe any alternative solutions you've considered.:

Using the existing startup-taints flag (this doesn't seem to help).

I also tried to use --status-taint, but registry.k8s.io/provider-aws/cloud-controller-manager:v1.28.1 reported that there is no such CLI flag.

Additional context.:

I'm running a hybrid cluster (one AWS node, one Azure node) using k3s.
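
For context, spec.providerID on the Node objects is what the autoscaler matches on (see the FindForInstance refs in the logs below); the IDs can be listed with plain kubectl:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID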

I believe the Cluster Autoscaler won't work as long as I'm specifying providerIDs for the respective clouds -- see the logs from the Azure cluster autoscaler:

I1207 17:57:28.317543       1 static_autoscaler.go:289] Starting main loop
I1207 17:57:28.317914       1 azure_cache.go:329] FindForInstance: starts, ref: azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2
I1207 17:57:28.317946       1 azure_cache.go:331] FindForInstance: resourceID: azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2
I1207 17:57:28.317951       1 azure_cache.go:339] FindForInstance: Couldn't find NodeGroup of instance {"azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2"}
I1207 17:57:28.317964       1 azure_cache.go:329] FindForInstance: starts, ref: aws:///us-east-1a/i-0ffc55eae83dfe1e3
I1207 17:57:28.317972       1 azure_cache.go:331] FindForInstance: resourceID: 
E1207 17:57:28.317983       1 static_autoscaler.go:354] Failed to get node infos for groups: "aws:///us-east-1a/i-0ffc55eae83dfe1e3" isn't in Azure resource ID format
I1207 17:57:30.338994       1 reflector.go:788] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.Pod total 16 items received
I1207 17:57:37.716943       1 reflector.go:788] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.StorageClass total 0 items received
I1207 17:57:38.318168       1 static_autoscaler.go:289] Starting main loop
I1207 17:57:38.318374       1 azure_cache.go:329] FindForInstance: starts, ref: aws:///us-east-1a/i-0ffc55eae83dfe1e3
I1207 17:57:38.318390       1 azure_cache.go:331] FindForInstance: resourceID: 
E1207 17:57:38.318406       1 static_autoscaler.go:354] Failed to get node infos for groups: "aws:///us-east-1a/i-0ffc55eae83dfe1e3" isn't in Azure resource ID format
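
What I'm asking for, roughly, is that the core loop skip nodes whose providerID doesn't belong to the configured cloud provider instead of aborting the whole iteration. As a rough illustration only (this is not existing autoscaler code; the function name and wiring below are made up), the kind of filtering I have in mind:

package main

import (
	"fmt"
	"strings"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// filterForeignNodes keeps only nodes whose spec.providerID starts with the
// configured provider's prefix (e.g. "azure://"); nodes registered by another
// cloud are skipped instead of failing the whole autoscaling iteration.
// Hypothetical helper, not part of the autoscaler codebase.
func filterForeignNodes(nodes []*apiv1.Node, providerPrefix string) []*apiv1.Node {
	var own []*apiv1.Node
	for _, n := range nodes {
		if strings.HasPrefix(n.Spec.ProviderID, providerPrefix) {
			own = append(own, n)
		}
	}
	return own
}

func main() {
	// Two nodes mirroring the log output above: one Azure, one AWS.
	nodes := []*apiv1.Node{
		{ObjectMeta: metav1.ObjectMeta{Name: "k3s-vm_2"},
			Spec: apiv1.NodeSpec{ProviderID: "azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2"}},
		{ObjectMeta: metav1.ObjectMeta{Name: "aws-node"},
			Spec: apiv1.NodeSpec{ProviderID: "aws:///us-east-1a/i-0ffc55eae83dfe1e3"}},
	}
	for _, n := range filterForeignNodes(nodes, "azure://") {
		fmt.Println("kept:", n.Name) // only the Azure node survives the filter
	}
}

An opt-in flag for this behaviour would probably make sense, so that genuinely malformed providerIDs on single-cloud clusters still surface as errors.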
alex-treebeard added the kind/feature label on Dec 7, 2023
@cristianrat

So, did you manage to find a way around this?
For me it's an issue because I get "failed to check if server exists" for a node, but that node was manually added by me from a dedicated box... so it will never be in the API :)

@alex-treebeard
Author

Hi @cristianrat, I came to the conclusion that many Kubernetes components are architected to support only one cloud provider per cluster. There may be some hacks and workarounds for individual components, but it's an uphill battle.

@cristianrat

cristianrat commented Jan 25, 2024 via email

@mddamato

@cristianrat I am doing something similar: a mix of on-premises and AWS nodes. I'd like the CCM to just ignore these. Did you find a solution to this?


@cristianrat

@mddamato No, I haven't found a solution for this.
I'm OK without autoscaling for now.
This, however, should really be a feature (the CCM pod is throwing errors like crazy because it can't find one node via the API calls).

towca added the area/cluster-autoscaler and area/core-autoscaler labels on Mar 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jun 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:


/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot closed this as not planned on Aug 18, 2024