cloud-provider-vsphere should ignore nodes with providerID prefix != 'vsphere://' #677
Comments
Hi @tommasopozzetti, this is a common issue with the out-of-tree cloud provider interface, not just with CP vsphere. Let me check if it's possible to ignore the node — probably not, since the CPI will just delete those nodes unless we throw an error. I would bring this up in the sig-cloud-provider meeting. Thanks for mentioning this.
Hi @lubronzhan, thank you for taking a look and for the extra info! I still think your original proposal of an alpha feature would be reasonable. While I agree that this is a shared effort between the CPI and CP vsphere, the assumption is that the CPI will also be running in the other provider's cloud controller, so it is still the vsphere CP's responsibility to ignore nodes that are not specific to this CPI implementation, to avoid throwing errors for nodes that are not registered with this cloud provider. The issue that remains, and that would indeed be more on the CPI side, is node deletion (that is, what to do with a node that is not yet registered and that vsphere finds not to be an existing VM). In the grand scheme of things, implementing something like providerClasses would be the right way to go, but in the meantime even a simple timeout before deleting nodes (reasonable, and possibly configurable as another alpha feature disabled by default) would solve the issue by giving nodes managed by another provider time to initialize. Let me know what you think!
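For illustration, here is a minimal sketch of the grace-period idea floated above, in Go since the provider itself is written in Go. The helper name, the constant, and the wiring into the deletion path are all hypothetical — this is not the actual cloud-provider-vsphere code:

```go
// Hypothetical helper: before deleting a NotReady node whose VM cannot be
// found, skip nodes that were created recently so another cloud provider
// has time to initialize them.
package nodegrace

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// initGracePeriod would be configurable and disabled by default,
// matching the alpha-feature proposal above.
const initGracePeriod = 5 * time.Minute

// withinInitGracePeriod reports whether the node is still young enough
// that deletion should be deferred.
func withinInitGracePeriod(node *v1.Node) bool {
	return time.Since(node.CreationTimestamp.Time) < initGracePeriod
}
```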
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened?
When running with the in-tree cloud controller manager, each kubelet handles its own node, allowing different kubelets in a cluster to use different cloud provider implementations.
When moving to this external implementation, the controller assumes that every node in the cluster is a vSphere VM. When that is not the case, it deletes nodes whose VMs cannot be found as soon as those nodes become NotReady, and it continuously prints warnings that the VMs cannot be found while the nodes are Ready.
What did you expect to happen?
The controller should allow multiple cloud providers to coexist in the same cluster. For example, in the ClusterAPI case, a cluster might need both VMs and bare-metal nodes, with the bare-metal nodes managed by a different provider such as BYOH or Metal3.
Providers are already differentiated by setting a unique providerID prefix on nodes. It would therefore be enough for cloud-provider-vsphere to (a) give a new node some time to be initialized before assuming it needs to be deleted (in case it does not find a corresponding VM to initialize itself), and (b) simply ignore initialized nodes whose providerID prefix is not the vSphere one, since such a prefix signals that some other cloud controller manager is managing those nodes (see the sketch below).
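As a minimal sketch of the prefix check this issue asks for (written in Go like the provider itself; the helper name and constant are illustrative, not part of the actual codebase):

```go
// Hypothetical filter: a node already initialized with a foreign providerID
// belongs to another cloud provider and should be ignored rather than
// warned about or deleted.
package nodefilter

import (
	"strings"

	v1 "k8s.io/api/core/v1"
)

const vsphereProviderIDPrefix = "vsphere://"

// managedByVSphere reports whether this CPI instance should act on the node.
func managedByVSphere(node *v1.Node) bool {
	pid := node.Spec.ProviderID
	if pid == "" {
		// Not initialized yet: some provider (possibly this one)
		// still needs to claim it, so don't ignore it outright.
		return true
	}
	return strings.HasPrefix(pid, vsphereProviderIDPrefix)
}
```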
How can we reproduce it (as minimally and precisely as possible)?
Create a ClusterAPI cluster with nodes from the CAPV provider and install cloud-provider-vsphere. Add a node from a different provider (such as the BYOH provider).
Notice cloud-provider-vsphere emitting warning logs that the VM cannot be found, and deleting the node if it does not become Ready in time.
Anything else we need to know (please consider providing level 4 or above logs of CPI)?
A current workaround is to disable node deletion in cloud-provider-vsphere by setting the environment variable
SKIP_NODE_DELETION
as explained here. This allows the nodes to coexist by avoiding deletion when the bare-metal nodes are NotReady, but it still generates many warning logs from cloud-provider-vsphere about VMs not being found. Furthermore, this workaround removes the useful feature of deleting a node when its VM is removed from the provider (though similar behavior can be achieved, at least in the ClusterAPI case, through the CAPI primitives and the CAPV provider). One way to set the variable is sketched below.
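A hedged sketch of applying that workaround, assuming the CPI runs as a DaemonSet named vsphere-cloud-controller-manager in kube-system (the workload kind, name, and namespace are assumptions and may differ in your deployment); only the relevant fragment of the manifest is shown:

```yaml
# Illustrative fragment only: kind, name, and namespace are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vsphere-cloud-controller-manager
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: vsphere-cloud-controller-manager
          env:
            - name: SKIP_NODE_DELETION
              value: "true"
```

Kubernetes version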
1.23.5
Cloud provider or hardware configuration
vSphere 7.0.3.00700
OS version
No response
Kernel (e.g.
uname -a
)
No response
Install tools
No response
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response
Others
No response