Improve status_reason when cluster creation is stalled
If, during cluster creation or update, one or more nodes cannot be
created by Nova (because, for example, the project runs out of
quota, or Nova runs out of suitable hypervisors), the cluster stalls
with CREATE_IN_PROGRESS status. The status_reason in this case is
not helpful, as it only reports the last step that succeeded.

This patch checks the status of each individual machine, looking
for machines which are not ready, and reports any reason found.
This can result in useful status_reason messages such as
 'error creating Openstack instance ... Quota exceeded for instances'
stuartgrace-bbc authored and Jonathan Rosser committed Jan 8, 2025
1 parent 724a3fb commit 8fdb65e
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions magnum_cluster_api/driver.py
@@ -192,6 +192,25 @@ def update_cluster_status(
            cluster.save()
            return

        # Check reason if an individual machine is not ready
        machines = objects.Machine.objects(self.k8s_api).filter(
            namespace="magnum-system",
            selector={
                "cluster.x-k8s.io/cluster-name": cluster.stack_id,
            },
        )

        for machine in machines:
            for cond in machine.obj["status"]["conditions"]:
                if (
                    cond.get("type") == "InfrastructureReady"
                    and cond.get("status") == "False"
                ):
                    messagetext = cond.get("message")
                    if messagetext:
                        cluster.status_reason = messagetext
                        cluster.save()

        api_endpoint = capi_cluster.obj["spec"]["controlPlaneEndpoint"]
        cluster.api_address = (
            f"https://{api_endpoint['host']}:{api_endpoint['port']}"
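The condition scan added in this hunk can be sketched in isolation as a small standalone function, which may help when reasoning about its behaviour without a live cluster. This is only an illustration and not part of the commit: the function name `first_infrastructure_failure` is hypothetical, the machines are modelled as plain dictionaries shaped like the `machine.obj` values in the diff, and, unlike the patch (which keeps overwriting `status_reason`, so the last matching machine wins), this sketch returns the first message it finds. It also assumes that a machine may not yet have a `status` or `conditions` key, and tolerates that case.

```python
from typing import Any, Iterable, Mapping, Optional


def first_infrastructure_failure(
    machines: Iterable[Mapping[str, Any]],
) -> Optional[str]:
    """Return the message of the first failing InfrastructureReady condition.

    Hypothetical helper: each item is assumed to look like the ``machine.obj``
    dictionaries in the diff above. Returns None if no machine reports a
    failure with a non-empty message.
    """
    for machine in machines:
        # Assumption: "status" or "conditions" may be absent or None on a
        # freshly created Machine, so fall back to an empty list.
        conditions = (machine.get("status") or {}).get("conditions") or []
        for cond in conditions:
            if (
                cond.get("type") == "InfrastructureReady"
                and cond.get("status") == "False"
            ):
                message = cond.get("message")
                if message:
                    return message
    return None


if __name__ == "__main__":
    sample = [
        {"status": {"conditions": [{"type": "InfrastructureReady", "status": "True"}]}},
        {},  # machine with no status reported yet
        {
            "status": {
                "conditions": [
                    {
                        "type": "InfrastructureReady",
                        "status": "False",
                        "message": "Quota exceeded for instances",
                    }
                ]
            }
        },
    ]
    print(first_infrastructure_failure(sample))
```

Returning the first message rather than the last is a design choice of this sketch only; either way a single machine's reason ends up in `status_reason`.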