
Make Cl2 tests to work in <100 node clusters #1667

Merged

Conversation

Contributor

@lyzs90 lyzs90 commented Jan 19, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:
Details: #887

Which issue(s) this PR fixes:

Fixes #887

Special notes for your reviewer:
The integer division was rounding fractions down to 0. Hence, when the .Nodes value passed in is less than NODES_PER_NAMESPACE (defaults to 100), namespaces = DivideInt .Nodes NODES_PER_NAMESPACE = 0. Thereafter, namespaces is used as the denominator in several computations, e.g. https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#L7
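
For illustration, a plain-Go sketch of the truncation and the guard (these helpers only loosely mirror the template functions; this is not the clusterloader2 code):

```go
// Illustrative only: plain Go mirroring the template arithmetic, not the real
// clusterloader2 helpers. DivideInt-style integer division truncates, so any
// cluster smaller than NODES_PER_NAMESPACE ends up with 0 namespaces; clamping
// with a MaxInt-style guard keeps the later divisions from hitting zero.
package main

import "fmt"

func divideInt(a, b int) int { return a / b } // truncating, like DivideInt
func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func main() {
	const nodesPerNamespace = 100
	for _, nodes := range []int{1, 10, 100, 500} {
		raw := divideInt(nodes, nodesPerNamespace) // 0 when nodes < 100
		clamped := maxInt(1, raw)                  // never 0, so no divide-by-zero later
		fmt.Printf("nodes=%d raw=%d clamped=%d\n", nodes, raw, clamped)
	}
}
```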

Does this PR introduce a user-facing change?:

Users can now run load tests with < 100 nodes

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the kind/bug (Categorizes issue or PR as related to a bug.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels on Jan 19, 2021
@k8s-ci-robot
Contributor

Hi @lyzs90. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) and size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.) labels on Jan 19, 2021
@lyzs90
Contributor Author

lyzs90 commented Jan 19, 2021

/cc @mm4tt

@k8s-ci-robot k8s-ci-robot requested a review from mm4tt January 19, 2021 16:01
@@ -39,7 +39,8 @@
{{$ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS false}}
{{$ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE true}}
#Variables
{{$namespaces := DivideInt .Nodes $NODES_PER_NAMESPACE}}
# set min namespaces to 1 to avoid divide by zero errors. See https://github.com/kubernetes/perf-tests/issues/887
{{$namespaces := DivideInt .Nodes $NODES_PER_NAMESPACE | MaxInt 1}}
{{$totalPods := MultiplyInt $namespaces $NODES_PER_NAMESPACE $PODS_PER_NODE}}
Contributor

We should also adjust this value. Otherwise, if I run this test on a 1 node cluster, the total pods will be calculated as if there were 100 nodes (and it will be equal to 3000 pods). Obviously, it won't work.

Contributor Author

@lyzs90 lyzs90 Jan 20, 2021

Ah, my bad, I didn't look beyond that line. The real culprit seems to be the NODES_PER_NAMESPACE constant. Is it OK to set NODES_PER_NAMESPACE to the minimum of the number of nodes and 100 (or whatever value was passed in)? Then we can do away with the rounding fix on namespaces.
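
A rough sketch of that alternative in plain Go (defaults of 100 nodes per namespace and 30 pods per node are assumed from the config quoted in this thread; this is not the actual template code):

```go
// Sketch of the proposed alternative, with assumed defaults (100 nodes per
// namespace, 30 pods per node). Clamping NODES_PER_NAMESPACE to the real node
// count avoids the zero-namespace problem and keeps totalPods proportional to
// the cluster size instead of pretending there are 100 nodes.
package main

import "fmt"

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	const podsPerNode = 30
	for _, nodes := range []int{1, 10, 100} {
		nodesPerNamespace := minInt(nodes, 100) // min of node count and the default 100
		namespaces := nodes / nodesPerNamespace // 1 for small clusters, no rounding fix needed
		totalPods := namespaces * nodesPerNamespace * podsPerNode
		fmt.Println(nodes, namespaces, totalPods) // 1 1 30; 10 1 300; 100 1 3000
	}
}
```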

Contributor

Yeah, that should work.

@mm4tt
Contributor

mm4tt commented Jan 20, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.) and size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) labels and removed the needs-ok-to-test and size/XS labels on Jan 20, 2021
@lyzs90
Contributor Author

lyzs90 commented Jan 28, 2021

@mm4tt Noticed a few other places where this fix might not work for small clusters, e.g. https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#L67 and https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/density/config.yaml#L17. These may have to be scaled down as well - is the goal for anyone to be able to run load / density tests on a 1 node cluster?

@lyzs90
Contributor Author

lyzs90 commented Jan 28, 2021

Instead of trying to patch the load and density tests so that they will work on small clusters, wdyt about defining a minimal test config that will run on a 1 node cluster?

I'm thinking of an examples folder with a few configs like minimal, with-modules, with-overrides, etc.

@mm4tt
Contributor

mm4tt commented Jan 29, 2021

I'd really like to make the load test work at any scale. All the hard-coded values we have there are technical debt and we should aim to get rid of them. A few notes:

  • Please disregard the density test, we're aiming to deprecate it soon
  • Is there anything else other than MIN_PODS_IN_SMALL_CLUSTERS that's blocking us?
  • For MIN_PODS_IN_SMALL_CLUSTERS, I believe we have some options:
    1. Ideally, we'd just fix the Re-implement scheduling-throughput measurement #1027, so we don't need this constant. Unfortunately, that's not that simple, so let's think of something else
    2. Instead of having the MIN_PODS_IN_SMALL_CLUSTERS as a constant, hard-coded in the config.yaml, it could be a test param that is set to 500 in the test-infra presets only for our 100+ node tests
    3. Given that this part of the test doesn't really work if we have fewer than 100 nodes (with fewer, we're not able to accurately measure the scheduling throughput), we should probably conditionally disable this part if # nodes < 100.

I believe 3 is the best option currently. Thoughts?

@lyzs90
Contributor Author

lyzs90 commented Jan 29, 2021

Yes I think approach 3 sounds good. Will also need to handle BIG_GROUP_SIZE and MEDIUM_GROUP_SIZE in the same manner I suppose?

@mm4tt
Contributor

mm4tt commented Jan 29, 2021

Will also need to handle BIG_GROUP_SIZE and MEDIUM_GROUP_SIZE in the same manner I suppose?

Hmm, wouldn't it work if we just round down everywhere? E.g. if we have just 10 nodes, the total number of pods will be 300 (30*10). Then we'll calculate how many small/medium/big deployments we can fit and e.g. for big we'll do: 300 / 4 / 250 = 0 (assuming we round down). For medium it'll be 300/4/30 = 2. For small it'll be 300/2/5 = 30. Does it make sense?
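
For reference, the same round-down arithmetic as a small Go check (group sizes 250/30/5 and the 1/4, 1/4, 1/2 pod split are assumed from the comment above):

```go
// Worked check of the round-down arithmetic for a 10-node cluster, assuming
// 30 pods per node, group sizes 250/30/5, and a 1/4-1/4-1/2 pod split.
package main

import "fmt"

func main() {
	nodes, podsPerNode := 10, 30
	bigGroup, mediumGroup, smallGroup := 250, 30, 5

	totalPods := nodes * podsPerNode      // 300
	big := totalPods / 4 / bigGroup       // 300/4/250 = 0 big deployments
	medium := totalPods / 4 / mediumGroup // 300/4/30  = 2 medium deployments
	small := totalPods / 2 / smallGroup   // 300/2/5   = 30 small deployments
	fmt.Println(totalPods, big, medium, small)
}
```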

@lyzs90
Contributor Author

lyzs90 commented Jan 29, 2021

Will also need to handle BIG_GROUP_SIZE and MEDIUM_GROUP_SIZE in the same manner I suppose?

Hmm, wouldn't it work if we just round down everywhere? E.g. if we have just 10 nodes, the total number of pods will be 300 (30*10). Then we'll calculate how many small/medium/big deployments we can fit and e.g. for big we'll do: 300 / 4 / 250 = 0 (assuming we round down). For medium it'll be 300/4/30 = 2. For small it'll be 300/2/5 = 30. Does it make sense?

But it's also used as the number of replicas: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#257. Seems like BIG_GROUP_SIZE and MEDIUM_GROUP_SIZE need to be factors of the number of nodes?

@mm4tt
Contributor

mm4tt commented Jan 29, 2021

It will be fine if we have 250 there even if there is no place for so many pods. This is because this object will never be created. The replicasPerNamespace will be 0 - https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#L244

@lyzs90
Contributor Author

lyzs90 commented Jan 29, 2021

Ah yes. I'm curious if that (medium + small) will all fit on 1 node 😅

@lyzs90 lyzs90 force-pushed the num-nodes-divide-by-zero-error branch from 91a2abe to 9c7c0d0 on January 29, 2021 13:03
@lyzs90
Contributor Author

lyzs90 commented Jan 30, 2021

Noting down some issues on a single node run:

@k8s-ci-robot k8s-ci-robot added the size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) label and removed the size/S label on Jan 30, 2021
@lyzs90
Contributor Author

lyzs90 commented Jan 30, 2021

@mm4tt Added defaults and disabled heavier workloads when running on clusters < 100 nodes (I tested with a single node cluster)

Contributor

@mm4tt mm4tt left a comment

Thanks, this is great! I know how painful it is to write in yaml and go-templates, so I really appreciate it!

@@ -90,7 +97,7 @@

name: load
namespace:
number: {{AddInt $namespaces $schedulerThroughputNamespaces}}
number: {{MinInt .Nodes (AddInt $namespaces $schedulerThroughputNamespaces)}}
Contributor

Not sure if I understand this line. Wouldn't it be better to make sure $namespaces and $schedulerThroughputNamespaces are just set properly in all cases?

Contributor Author

I added some calculations below so it's easier to make sense of the change.

Without MinInt:

# 100 nodes
$MIN_PODS_IN_SMALL_CLUSTERS := 500
$totalSchedulerThroughputPods := MaxInt (MultiplyInt 2 $MIN_PODS_IN_SMALL_CLUSTERS) (MultiplyInt 2 .Nodes) # 1000
$schedulerThroughputPodsPerDeployment := .Nodes # 100
$schedulerThroughputNamespaces := DivideInt $totalSchedulerThroughputPods $schedulerThroughputPodsPerDeployment # 10
number: AddInt $namespaces $schedulerThroughputNamespaces # 11
# 1 node
$MIN_PODS_IN_SMALL_CLUSTERS := 500
$totalSchedulerThroughputPods := MaxInt (MultiplyInt 2 $MIN_PODS_IN_SMALL_CLUSTERS) (MultiplyInt 2 .Nodes) # 1000
$schedulerThroughputPodsPerDeployment := .Nodes # 1
$schedulerThroughputNamespaces := DivideInt $totalSchedulerThroughputPods $schedulerThroughputPodsPerDeployment # 1000
number: AddInt $namespaces $schedulerThroughputNamespaces # 1001 <- !!!

With MinInt:

# 100 nodes
same as above
number: MinInt .Nodes (AddInt $namespaces $schedulerThroughputNamespaces) # 11
# 1 node
same as above
number: MinInt .Nodes (AddInt $namespaces $schedulerThroughputNamespaces) # 1
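
The same two scenarios condensed into a plain-Go check (assuming NODES_PER_NAMESPACE=100 and MIN_PODS_IN_SMALL_CLUSTERS=500 as in the quoted defaults; this is not the template code itself):

```go
// Reproduces the namespace counts above, assuming NODES_PER_NAMESPACE=100 and
// MIN_PODS_IN_SMALL_CLUSTERS=500. Returns the namespace number without and
// with the MinInt clamp.
package main

import "fmt"

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func namespaceNumber(nodes int) (without, with int) {
	namespaces := maxInt(1, nodes/100)                     // $namespaces
	totalSchedulerThroughputPods := maxInt(2*500, 2*nodes) // 1000 for both examples below
	schedulerThroughputNamespaces := totalSchedulerThroughputPods / nodes
	without = namespaces + schedulerThroughputNamespaces
	with = minInt(nodes, without)
	return without, with
}

func main() {
	fmt.Println(namespaceNumber(100)) // 11 11
	fmt.Println(namespaceNumber(1))   // 1001 1
}
```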

Contributor

Hmm, I'd rather leave this line as is and change $schedulerThroughputNamespaces to 0 if $IS_SMALL_CLUSTER. It makes the code easier to understand

@@ -229,6 +236,7 @@ steps:
objectTemplatePath: daemonset.yaml
templateFillMap:
Image: k8s.gcr.io/pause:3.0
{{if not $IS_SMALL_CLUSTER}}
Contributor

Is this if block needed here? I believe the $bigDeploymentsPerNamespace will be 0 if .Nodes < 100. In that case no object is going to be created

Contributor Author

Will try again, I may have mixed this up with the error coming from a negative $bigDeploymentsPerNamespace

Contributor Author

Big deployment objects are still being created even though bigDeploymentsPerNamespace is 0. Seems like ReplicasMin and ReplicasMax have the final say. So maybe targeting the BIG_GROUP_SIZE variable would be better, as it will affect the jobs created at the same time.


Contributor

Nah, we're missing something. If the replicasPerNamespace is set to 0 then no new object should be created -

for replicaCounter := instancesStates[0].CurrentReplicaCount; replicaCounter < phase.ReplicasPerNamespace; replicaCounter++ {

The only explanation is that the $bigDeploymentsPerNamespace is not set to 0. Isn't it because in line 67 we're doing:

{{$bigDeploymentsPerNamespace := MaxInt 1 (SubtractInt $bigDeploymentsPerNamespace 1)}}

?

Contributor Author

Yes! This is it, addressed in my latest commit 99c29b4

@@ -345,6 +354,7 @@ steps:
templateFillMap:
ReplicasMin: {{$MEDIUM_GROUP_SIZE}}
ReplicasMax: {{$MEDIUM_GROUP_SIZE}}
{{if not $IS_SMALL_CLUSTER}}
Contributor

Instead of having this if block, we could introduce a variable $bigJobsPerNamespace and make sure it is set to 0 if #Nodes < 100.


- name: Scaling and updating objects
phases:
{{if not $IS_SMALL_CLUSTER}}
Contributor

Same as above, I believe it's not needed

@@ -652,6 +671,7 @@ steps:

- name: Deleting objects
phases:
{{if not $IS_SMALL_CLUSTER}}
Contributor

Same here

@@ -627,6 +644,7 @@ steps:
templateFillMap:
ReplicasMin: {{MultiplyInt $MEDIUM_GROUP_SIZE 0.5}}
ReplicasMax: {{MultiplyInt $MEDIUM_GROUP_SIZE 1.5}}
{{if not $IS_SMALL_CLUSTER}}
Contributor

Same here, once we introduce a variable it shouldn't be needed anymore.

# Ensure non zero or negative after subtraction.
{{$smallDeploymentsPerNamespace := MaxInt 1 (SubtractInt $smallDeploymentsPerNamespace 1)}}
{{$mediumDeploymentsPerNamespace := MaxInt 1 (SubtractInt $mediumDeploymentsPerNamespace 1)}}
{{$bigDeploymentsPerNamespace := MaxInt 1 (SubtractInt $bigDeploymentsPerNamespace 1)}}
Contributor

We should have MaxInt 0 ... instead of MaxInt 1 ..., here and above. Then we won't need lines 69-71.

Contributor Author

I added lines 69-71 because there is nothing limiting $bigDeploymentsPerNamespace above anymore; it will be 250 * 4 - 1 = 999 on this line.

Contributor

Hmm, the bigDeploymentsPerNamespace is calculated as

{{$bigDeploymentsPerNamespace := DivideInt $podsPerNamespace (MultiplyInt 4 $BIG_GROUP_SIZE)}}

And if we have $namespaces=1 then $podsPerNamespace = 30 * #Nodes

So, e.g. in a 10 node cluster

bigDeploymentsPerNamespace := 30 * 10 / 4 / 250 = 0

I believe it should work, we don't need any custom logic there

Contributor Author

Ah, did I miss something? Let me try again later. I vaguely recall it wasn't working as expected when I set it to MaxInt 0.

Contributor Author

Ok it works! Made the change accordingly

Contributor

@mm4tt mm4tt left a comment

Thanks!

# Stateful sets are enabled. Reduce the number of small and medium deployments per namespace
# See https://github.com/kubernetes/perf-tests/issues/1036#issuecomment-607631768
# Ensure non zero or negative after subtraction.
{{$smallDeploymentsPerNamespace := MaxInt 1 (SubtractInt $smallDeploymentsPerNamespace $SMALL_STATEFUL_SETS_PER_NAMESPACE)}}
Contributor

Let's have MaxInt 0 everywhere. It's more correct. We should err on the side of downsizing so that the "~30 pods per node on avg" assumption holds.

@@ -112,7 +122,7 @@ tuningSets:
# The expected number of created/deleted pods is totalPods/4 when scaling,
# as each RS changes its size from X to a uniform random value in [X/2, 3X/2].
# To match 10 [pods/s] requirement, we need to divide saturationTime by 4.
timeLimit: {{DivideInt $saturationTime 4}}s
timeLimit: {{MaxInt 1 (DivideInt $saturationTime 4)}}s
Contributor

Let's revert this. It shouldn't really matter whether the time limit is 0s or 1s. Let's keep the code simple.

Contributor Author

The RNG panics if timeLimit is 0:

panic: invalid argument to Int63n

goroutine 321 [running]:
math/rand.(*Rand).Int63n(0xc0000a8420, 0x0, 0xc0008b67c0)
	/usr/local/go/src/math/rand/rand.go:111 +0x11d
math/rand.Int63n(...)
	/usr/local/go/src/math/rand/rand.go:327
k8s.io/perf-tests/clusterloader2/pkg/tuningset.(*randomizedTimeLimitedLoad).Execute.func1()
	/Users/shemleong/go/src/perf-tests/clusterloader2/pkg/tuningset/randomized_time_limited.go:43 +0x4a
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000b00420, 0xc000b7a750)
	/Users/shemleong/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:73 +0x51
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/Users/shemleong/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x65
exit status 2

Contributor

Oh, so let's fix that here -

time.Sleep(time.Duration(rand.Int63n(r.params.TimeLimit.ToTimeDuration().Nanoseconds())))

Contributor

Basically, call it only if timeLimit > 0
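
A minimal sketch of that guard (illustrative only, not the exact randomized_time_limited.go change): rand.Int63n panics on a non-positive argument, so the randomized sleep is simply skipped when the limit is zero.

```go
// Illustrative guard only, not the exact clusterloader2 code: rand.Int63n
// panics when its argument is not positive, so skip the randomized sleep
// entirely when the configured time limit is zero.
package tuningset

import (
	"math/rand"
	"time"
)

func sleepUpTo(limit time.Duration) {
	if ns := limit.Nanoseconds(); ns > 0 {
		time.Sleep(time.Duration(rand.Int63n(ns)))
	}
}
```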

Contributor Author

Cool, added the check there


@mm4tt mm4tt changed the title from "Set default number of namespaces to 1" to "Make Cl2 tests to work in <100 node clusters" on Feb 5, 2021
@lyzs90
Contributor Author

lyzs90 commented Feb 5, 2021

Other than the timeLimit one, I've addressed the other comments!

Contributor

@mm4tt mm4tt left a comment

LGTM with one comment

{{$PODS_PER_NODE := DefaultParam .PODS_PER_NODE 30}}
{{$LOAD_TEST_THROUGHPUT := DefaultParam .CL2_LOAD_TEST_THROUGHPUT 10}}
{{$DELETE_TEST_THROUGHPUT := DefaultParam .CL2_DELETE_TEST_THROUGHPUT $LOAD_TEST_THROUGHPUT}}
{{$BIG_GROUP_SIZE := DefaultParam .BIG_GROUP_SIZE 250}}
{{$MEDIUM_GROUP_SIZE := DefaultParam .MEDIUM_GROUP_SIZE 30}}
{{$MEDIUM_GROUP_SIZE := DefaultParam .MEDIUM_GROUP_SIZE 15}}
Contributor

Revert

Contributor Author

Oops my bad, forgot to undo the change after testing

@lyzs90 lyzs90 force-pushed the num-nodes-divide-by-zero-error branch from db6047b to 1dbc08f on February 5, 2021 15:45
@lyzs90
Contributor Author

lyzs90 commented Feb 5, 2021

LGTM with one comment

Cool :) will squash commits

@lyzs90 lyzs90 force-pushed the num-nodes-divide-by-zero-error branch from 1dbc08f to 411b1af on February 5, 2021 15:58
@mm4tt
Contributor

mm4tt commented Feb 5, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Feb 5, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lyzs90, mm4tt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label on Feb 5, 2021
@k8s-ci-robot k8s-ci-robot merged commit 11eae44 into kubernetes:master Feb 5, 2021
Successfully merging this pull request may close these issues: CL2 density and load test shouldn't require # nodes to be multiple of 100