
Ignore nodes from other providers #6358

Closed
alex-treebeard opened this issue Dec 7, 2023 · 10 comments
Labels
  • area/cluster-autoscaler
  • area/core-autoscaler: Denotes an issue that is related to the core autoscaler and is not specific to any provider.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@alex-treebeard

Which component are you using?:

Cluster Autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

When I have nodes with provider IDs that are incompatible with my cluster autoscaler's configured provider, the entire autoscaling loop fails.

Describe the solution you'd like.:
Ignore nodes with incompatible provider IDs

Describe any alternative solutions you've considered.:

Using the existing startup-taints flag (this doesn't seem to help).

I also tried to use --status-taint, but registry.k8s.io/provider-aws/cloud-controller-manager:v1.28.1 reported that there is no such CLI flag.

Additional context.:

I'm running a hybrid cluster (one AWS node, one Azure node) using k3s.
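
For context, spec.providerID on the Node objects is what the autoscaler matches on (see the FindForInstance refs in the logs below); the IDs can be listed with plain kubectl:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID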

I believe the Cluster Autoscaler won't work as long as I'm specifying providerIDs for the respective clouds -- see the logs from the Azure cluster autoscaler:

I1207 17:57:28.317543       1 static_autoscaler.go:289] Starting main loop
I1207 17:57:28.317914       1 azure_cache.go:329] FindForInstance: starts, ref: azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2
I1207 17:57:28.317946       1 azure_cache.go:331] FindForInstance: resourceID: azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2
I1207 17:57:28.317951       1 azure_cache.go:339] FindForInstance: Couldn't find NodeGroup of instance {"azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2"}
I1207 17:57:28.317964       1 azure_cache.go:329] FindForInstance: starts, ref: aws:///us-east-1a/i-0ffc55eae83dfe1e3
I1207 17:57:28.317972       1 azure_cache.go:331] FindForInstance: resourceID: 
E1207 17:57:28.317983       1 static_autoscaler.go:354] Failed to get node infos for groups: "aws:///us-east-1a/i-0ffc55eae83dfe1e3" isn't in Azure resource ID format
I1207 17:57:30.338994       1 reflector.go:788] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.Pod total 16 items received
I1207 17:57:37.716943       1 reflector.go:788] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.StorageClass total 0 items received
I1207 17:57:38.318168       1 static_autoscaler.go:289] Starting main loop
I1207 17:57:38.318374       1 azure_cache.go:329] FindForInstance: starts, ref: aws:///us-east-1a/i-0ffc55eae83dfe1e3
I1207 17:57:38.318390       1 azure_cache.go:331] FindForInstance: resourceID: 
E1207 17:57:38.318406       1 static_autoscaler.go:354] Failed to get node infos for groups: "aws:///us-east-1a/i-0ffc55eae83dfe1e3" isn't in Azure resource ID format
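
What I'm asking for, roughly, is that the core loop skip nodes whose providerID doesn't belong to the configured cloud provider instead of aborting the whole iteration. As a rough illustration only (this is not existing autoscaler code; the function name and wiring below are made up), the kind of filtering I have in mind:

package main

import (
	"fmt"
	"strings"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// filterForeignNodes keeps only nodes whose spec.providerID starts with the
// configured provider's prefix (e.g. "azure://"); nodes registered by another
// cloud are skipped instead of failing the whole autoscaling iteration.
// Hypothetical helper, not part of the autoscaler codebase.
func filterForeignNodes(nodes []*apiv1.Node, providerPrefix string) []*apiv1.Node {
	var own []*apiv1.Node
	for _, n := range nodes {
		if strings.HasPrefix(n.Spec.ProviderID, providerPrefix) {
			own = append(own, n)
		}
	}
	return own
}

func main() {
	// Two nodes mirroring the log output above: one Azure, one AWS.
	nodes := []*apiv1.Node{
		{ObjectMeta: metav1.ObjectMeta{Name: "k3s-vm_2"},
			Spec: apiv1.NodeSpec{ProviderID: "azure:///subscriptions/51b6ee5b-482d-4175-9627-dd226cf03844/resourceGroups/k3s/providers/Microsoft.Compute/virtualMachines/k3s-vm_2"}},
		{ObjectMeta: metav1.ObjectMeta{Name: "aws-node"},
			Spec: apiv1.NodeSpec{ProviderID: "aws:///us-east-1a/i-0ffc55eae83dfe1e3"}},
	}
	for _, n := range filterForeignNodes(nodes, "azure://") {
		fmt.Println("kept:", n.Name) // only the Azure node survives the filter
	}
}

An opt-in flag for this behaviour would probably make sense, so that genuinely malformed providerIDs on single-cloud clusters still surface as errors.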
alex-treebeard added the kind/feature label on Dec 7, 2023
@cristianrat

So, did you manage to find a way around this?
For me it's an issue because I get "failed to check if server exists" for a node, but that node was manually added by me from a dedicated box... so it will never be in the API :)

@alex-treebeard
Author

Hi @cristianrat, I came to the conclusion that many Kubernetes components are architected to support only one cloud provider per cluster. There may be some hacks and workarounds for individual components, but it's an uphill battle.

@cristianrat

cristianrat commented Jan 25, 2024 via email

@mddamato

@cristianrat I am doing something similar: a mix of on-premises and AWS nodes. I'd like the CCM to just ignore these. Did you find a solution to this?


@cristianrat

@mddamato No, I haven't found a solution for this.
I'm OK without autoscaling for now.
This, however, should really be a feature (the CCM pod is throwing errors like crazy because it can't find one node via the API calls).

towca added the area/cluster-autoscaler and area/core-autoscaler labels on Mar 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jun 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:


/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot closed this as not planned on Aug 18, 2024