e2e: use helm to install out-of-tree cloud-provider-azure #2209
/test pull-cluster-api-provider-azure-e2e-optional
My concern with this is that we now have two ways to install cluster addons until the new proposal lands and moves forward. This adds some maintenance complexity and possible confusion. We are also suggesting using helm for most out-of-the-box clusters. Is that what we desire? I don't have any strong objections, just pointing it out.
We also probably need to add docs on how to do this post cluster creation.
@@ -450,6 +450,10 @@ var _ = Describe("Workload cluster creation", func() {
WaitForClusterIntervals: e2eConfig.GetIntervals(specName, "wait-cluster"),
WaitForControlPlaneIntervals: e2eConfig.GetIntervals(specName, "wait-control-plane"),
WaitForMachinePools: e2eConfig.GetIntervals(specName, "wait-machine-pool-nodes"),
ControlPlaneWaiters: clusterctl.ControlPlaneWaiters{
cool, didn't know about this!
fa8fd21 to 7313554
/test pull-cluster-api-provider-azure-e2e-optional
test/e2e/cloud-provider-azure.go
Outdated
if err != nil {
return err
}
if len(n.Items) == (int(*input.ConfigCluster.WorkerMachineCount) + int(*input.ConfigCluster.ControlPlaneMachineCount)) {
This test doesn't currently run for Windows, but if it did we would need to include WINDOWS_WORKER_MACHINE_COUNT here.
Actually this brings up a good question. On this test, for example, this WorkerMachineCount value seems to apply to each node pool (not the total number of expected worker nodes).
And indeed, this value seems to be used to populate the named env var that kustomize uses to configure a single value, which by convention is shared across all MachinePools/MachineDeployments:
https://github.com/kubernetes-sigs/cluster-api/blob/v1.1.3/cmd/clusterctl/client/config.go#L436
This check is really here to know that we're ready to reliably perform a helm install against the cluster. We can simply check for 1 Running node to do that.
This will work for a Windows-enabled cluster.
Do we want to just convert the reference "external-cloud-provider" template to have a Windows MachineDeployment as well? @CecileRobertMichon for thoughts as well
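To make the difference between the two readiness checks concrete, here is a minimal Go sketch. It is not the PR's code: the function names and the node counts are hypothetical, and it only models the comparison, not the Kubernetes API call that lists nodes. It assumes, as discussed above, that the worker machine count covers a single node pool and so does not include a separate Windows pool.

```go
package main

import "fmt"

// exactCountReady mirrors the shape of the original check: proceed only when
// the observed node count equals worker count plus control plane count.
func exactCountReady(observed int, workers, controlPlanes int64) bool {
	return observed == int(workers)+int(controlPlanes)
}

// atLeastOneNode is the relaxed check proposed in the thread: a single
// registered node is enough to attempt the helm install.
func atLeastOneNode(observed int) bool {
	return observed >= 1
}

func main() {
	// Hypothetical cluster: 1 control plane, 2 Linux workers, 2 Windows workers.
	// Only the Linux pool size (2) is passed as the worker machine count.
	observed := 1 + 2 + 2

	fmt.Println(exactCountReady(observed, 2, 1)) // false: 5 != 3, so the wait never completes
	fmt.Println(atLeastOneNode(observed))        // true
}
```

The relaxed predicate does not depend on how many node pools a template defines, which is why it keeps working for a Windows-enabled flavor.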
IMO we should not change the existing external cloud provider template in this PR. Instead, in a follow-up PR, we should change all the reference templates to use the external cloud provider, as we had started doing in #1889. That will include various Windows flavors.
Makes sense
ca7138b to 1fc368f
/test pull-cluster-api-provider-azure-e2e-optional
1fc368f to 60d5a04
test/e2e/cloud-provider-azure.go
Outdated
p := helmGetter.All(settings)
valueOpts := &helmVals.Options{}
valueOpts.Values = []string{fmt.Sprintf("infra.clusterName=%s", input.ClusterProxy.GetName())}
if imageRegistryFromEnv := os.Getenv("IMAGE_REGISTRY"); imageRegistryFromEnv != "" {
@CecileRobertMichon @jsturtevant the below is a proposed solution for ensuring these changes are back-compat with the upstream cloud-provider-azure test-infra jobs that rely upon capz to test out-of-tree. This is how the tests currently work:
- TEST_CCM=true is set when calling scripts/ci-entrypoint.sh:
- capz maintains a script to set the CI-built image data here:
source "${REPO_ROOT}/scripts/ci-build-azure-ccm.sh"
export IMAGE_REGISTRY=${REGISTRY}
cluster-api-provider-azure/scripts/ci-build-azure-ccm.sh
Lines 34 to 36 in 1352fa2
declare CCM_IMAGE_NAME=azure-cloud-controller-manager
# cloud node manager image
declare CNM_IMAGE_NAME=azure-cloud-node-manager
pushd "${AZURE_CLOUD_PROVIDER_ROOT}" && IMAGE_TAG=$(git rev-parse --short=7 HEAD) && export IMAGE_TAG && popd
These env vars aren't as specifically named as we'd ideally want, but re-using them as-is allows us to reduce the change surface area (we don't have to update capz CI scripts and/or test-infra job defs).
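As a rough illustration of how the existing env vars could feed the helm install, the Go sketch below builds "key=value" override strings from IMAGE_REGISTRY and IMAGE_TAG. Only `infra.clusterName` and the two env var names come from the thread; the `example.imageRepository` and `example.imageTag` keys are hypothetical placeholders, not confirmed chart values, and `buildHelmValues` is an invented helper name.

```go
package main

import (
	"fmt"
	"os"
)

// buildHelmValues returns "key=value" strings in the form used for helm value
// overrides. lookup is injected so the function can be exercised without
// touching the real process environment.
func buildHelmValues(clusterName string, lookup func(string) string) []string {
	values := []string{fmt.Sprintf("infra.clusterName=%s", clusterName)}

	registry := lookup("IMAGE_REGISTRY")
	tag := lookup("IMAGE_TAG")
	// Only override images when both CI-provided values are present;
	// otherwise fall through to the chart defaults.
	if registry != "" && tag != "" {
		// Hypothetical keys: the real chart's value names are not shown in this thread.
		values = append(values,
			fmt.Sprintf("example.imageRepository=%s", registry),
			fmt.Sprintf("example.imageTag=%s", tag),
		)
	}
	return values
}

func main() {
	for _, v := range buildHelmValues("my-cluster", os.Getenv) {
		fmt.Println(v)
	}
}
```

Leaving the env vars unset yields only the cluster name override, so a default run is unaffected by the CI-specific path.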
The out-of-tree cloud provider PR tests and conformance tests don't run CAPZ e2e, so they won't call InstallCloudProviderAzureHelmChart as far as I know. We may need to 1) modify ci-entrypoint.sh directly to do the helm install for out-of-tree cases, and 2) change conformance_test.go to use this custom control plane waiter too when testing out of tree.
/cc @lzhecheng
I don't think we have to worry about the conformance stuff yet (in this PR); it seems the only template flavors it currently uses are default and windows (not the external templates).
When we eventually get to the part of this effort where we're testing out-of-tree by default, then yeah, the conformance ControlPlaneWaiters will need to get updated as well.
/test pull-cluster-api-provider-azure-e2e-optional
60d5a04 to 20b2855
/test pull-cluster-api-provider-azure-e2e-optional
20b2855 to fb2b32e
@@ -152,6 +154,11 @@ create_cluster
# export the target cluster KUBECONFIG if not already set
export KUBECONFIG="${KUBECONFIG:-${PWD}/kubeconfig}"

# install cloud-provider-azure components, if using out-of-tree
@CecileRobertMichon @lzhecheng I incorporated the helm stuff into the ci-entrypoint.sh script itself. This will work basically the same way I implemented it in the Tiltfile (wait for the kubeconfig secret, then wait for a successful get nodes). Reference:
cluster-api-provider-azure/Makefile
Lines 283 to 287 in 1352fa2
# Wait for the kubeconfig to become available.
timeout --foreground 300 bash -c "while ! kubectl get secrets | grep $(CLUSTER_NAME)-kubeconfig; do sleep 1; done"
# Get kubeconfig and store it locally.
kubectl get secrets $(CLUSTER_NAME)-kubeconfig -o json | jq -r .data.value | base64 --decode > ./kubeconfig
timeout --foreground 600 bash -c "while ! kubectl --kubeconfig=./kubeconfig get nodes | grep control-plane; do sleep 1; done"
/test pull-cluster-api-provider-azure-e2e-optional
fb2b32e to bb81770
test/e2e/cloud-provider-azure.go
Outdated
Namespace: input.ConfigCluster.Namespace,
}
return client.Get(ctx, key, secret)
}, 20*time.Minute, 5*time.Second).Should(Succeed())
Should this use a config interval variable instead of hardcoding 20 minutes? Same question for the intervals below.
Reusing the WaitForControlPlaneIntervals var everywhere here.
Arguably you could tune these down per retryable operation, but it's debatable whether that's worth the effort.
90026ff to 2527091
/retest
/test pull-cluster-api-provider-azure-e2e-optional
/retest
/lgtm
/hold cancel
/assign @mboersma James is OOF this week
Just a few questions, but this looks good.
@@ -331,9 +332,13 @@ def deploy_worker_templates(template, substitutions):
yaml = yaml.replace("${" + substitution + "}", value)

yaml = yaml.replace('"', '\\"') # add escape character to double quotes in yaml
flavor_name = os.path.basename(flavor)
flavor_cmd = "RANDOM=$(bash -c 'echo $RANDOM'); CLUSTER_NAME=" + flavor.replace("windows", "win") + "-$RANDOM; make generate-flavors; echo \"" + yaml + "\" > ./.tiltbuild/" + flavor + "; cat ./.tiltbuild/" + flavor + " | " + envsubst_cmd + " | " + kubectl_cmd + " apply -f - && echo \"Cluster \'$CLUSTER_NAME\' created, don't forget to delete\""
@@ -236,6 +241,9 @@ verify-tiltfile: ## Verify Tiltfile format.

##@ Development:

.PHONY: install-tools # populate hack/tools/bin
install-tools: $(ENVSUBST) $(KUSTOMIZE) $(KUBECTL) $(HELM) $(GINKGO)
I'm also mildly opposed to this change if it adds any noticeable time to testing targets. Could we maybe have the new target install just the tools that are actually shared, or is this still a maintenance headache?
install-tools: $(ENVSUBST) $(KUSTOMIZE) $(KUBECTL)
test-e2e-run: generate-e2e-templates install-tools $(GINKGO)
7251cc7 to f8c166d
/test pull-cluster-api-provider-azure-e2e-optional
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: CecileRobertMichon
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR updates the "external" (or out-of-tree) cloud-provider-azure templates and test implementation so that we use the official helm chart (see this PR: kubernetes-sigs/cloud-provider-azure#1306).
This approach has the following advantages:
ClusterResourceSet CRD, which is alpha and may not be enabled on your cluster-api mgmt cluster
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: