Make Cl2 tests to work in <100 node clusters #1667
Conversation
Hi @lyzs90. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @mm4tt
```diff
@@ -39,7 +39,8 @@
 {{$ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS false}}
 {{$ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE true}}
 #Variables
-{{$namespaces := DivideInt .Nodes $NODES_PER_NAMESPACE}}
+# set min namespaces to 1 to avoid divide by zero errors. See https://github.com/kubernetes/perf-tests/issues/887
+{{$namespaces := DivideInt .Nodes $NODES_PER_NAMESPACE | MaxInt 1}}
 {{$totalPods := MultiplyInt $namespaces $NODES_PER_NAMESPACE $PODS_PER_NODE}}
```
We should also adjust this value. Otherwise, if I run this test on a 1-node cluster, the total pods will be calculated as if there were 100 nodes (and it will be equal to 3000 pods). Obviously, it won't work.
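A minimal Go sketch of that arithmetic (`divideInt` and `maxInt` here are hypothetical stand-ins for the template's `DivideInt`/`MaxInt` helpers, not code from the repo):

```go
package main

import "fmt"

// divideInt mirrors the template's DivideInt: integer division, rounds toward zero.
func divideInt(a, b int) int { return a / b }

// maxInt mirrors the template's MaxInt.
func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func main() {
	nodes := 1
	nodesPerNamespace := 100
	podsPerNode := 30

	// With the MaxInt 1 clamp, a 1-node cluster still gets 1 namespace...
	namespaces := maxInt(1, divideInt(nodes, nodesPerNamespace)) // 1

	// ...but totalPods is then computed as if the cluster had 100 nodes.
	totalPods := namespaces * nodesPerNamespace * podsPerNode
	fmt.Println(totalPods) // 3000
}
```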
ah my bad, i didn't look beyond that line. The real culprit seems to be the `NODES_PER_NAMESPACE` constant. Is it OK to set `NODES_PER_NAMESPACE` to the minimum of the number of nodes and 100 (or whatever value was passed in)? Then we can do away with the rounding fix on namespaces.
Yeah, that should work.
/ok-to-test
@mm4tt Noticed a few other places where this fix might not work for small clusters, e.g. https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#L67 and https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/density/config.yaml#L17. These may have to be scaled down also. Is the goal that anyone can run load/density tests on a 1-node cluster?
Instead of trying to patch the load and density tests so that they will work on small clusters, wdyt about defining a minimal test config that will run on a 1 node cluster? I'm thinking of an examples folder with a few configs like minimal, with-modules, with-overrides, etc.
I'd really like to make the load test work at any scale. All the hard-coded values we have there are actually technical debt and we should aim to get rid of them. A few notes:

I believe 3 is the best option currently. Thoughts?
Yes I think approach 3 sounds good. Will also need to handle
Hmm, wouldn't it work if we just round down everywhere? E.g. if we have just 10 nodes, the total number of pods will be 300 (30*10). Then we'll calculate how many small/medium/big deployments we can fit, and e.g. for big we'll do: 300 / 4 / 250 = 0 (assuming we round down). For medium it'll be 300/4/30 = 2. For small it'll be 300/2/5 = 30. Does it make sense?
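The round-down arithmetic can be checked in Go, where plain integer division behaves the same way as `DivideInt` (a sketch; the 10-node numbers are from the example above):

```go
package main

import "fmt"

func main() {
	nodes := 10
	totalPods := 30 * nodes // 300

	// Integer division rounds down, so deployment classes that don't fit drop to 0.
	big := totalPods / 4 / 250   // 0: no big deployments fit in 300 pods
	medium := totalPods / 4 / 30 // 2
	small := totalPods / 2 / 5   // 30
	fmt.Println(big, medium, small) // 0 2 30
}
```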
But it's also used in the number of replicas: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#257. Seems like
It will be fine if we have 250 there even if there is no place for so many pods. This is because this object will never be created. The `replicasPerNamespace` will be 0 - https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#L244
Ah yes. I'm curious if that (medium + small) will all fit on 1 node 😅 |
Noting down some issues on a single node run:
@mm4tt Added defaults and disabled heavier workloads when running on clusters < 100 nodes (I tested with a single node cluster)
Thanks, this is great! I know how painful it is to write YAML and go-templates, so I really appreciate it!
```diff
@@ -90,7 +97,7 @@
   name: load
   namespace:
-    number: {{AddInt $namespaces $schedulerThroughputNamespaces}}
+    number: {{MinInt .Nodes (AddInt $namespaces $schedulerThroughputNamespaces)}}
```
Not sure if I understand this line. Wouldn't it be better to make sure the `$namespaces` and `$schedulerThroughputNamespaces` are just set properly in all the cases?
I added some calculations below so it's easier to make sense of the change.
Without `MinInt`:

```
# 100 nodes
$MIN_PODS_IN_SMALL_CLUSTERS := 500
$totalSchedulerThroughputPods := MaxInt (MultiplyInt 2 $MIN_PODS_IN_SMALL_CLUSTERS) (MultiplyInt 2 .Nodes)       # 1000
$schedulerThroughputPodsPerDeployment := .Nodes                                                                  # 100
$schedulerThroughputNamespaces := DivideInt $totalSchedulerThroughputPods $schedulerThroughputPodsPerDeployment  # 10
number: AddInt $namespaces $schedulerThroughputNamespaces                                                        # 11

# 1 node
$MIN_PODS_IN_SMALL_CLUSTERS := 500
$totalSchedulerThroughputPods := MaxInt (MultiplyInt 2 $MIN_PODS_IN_SMALL_CLUSTERS) (MultiplyInt 2 .Nodes)       # 1000
$schedulerThroughputPodsPerDeployment := .Nodes                                                                  # 1
$schedulerThroughputNamespaces := DivideInt $totalSchedulerThroughputPods $schedulerThroughputPodsPerDeployment  # 1000
number: AddInt $namespaces $schedulerThroughputNamespaces                                                        # 1001 <- !!!
```

With `MinInt`:

```
# 100 nodes (same intermediate values as above)
number: MinInt .Nodes (AddInt $namespaces $schedulerThroughputNamespaces)  # 11

# 1 node (same intermediate values as above)
number: MinInt .Nodes (AddInt $namespaces $schedulerThroughputNamespaces)  # 1
```
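Both cases can be reproduced in Go (`namespaceCount` is a hypothetical helper mirroring the template expressions above, not code from the repo):

```go
package main

import "fmt"

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// namespaceCount returns the namespace number both without and with the MinInt clamp.
func namespaceCount(nodes, namespaces int) (without, with int) {
	minPodsInSmallClusters := 500
	totalSchedulerThroughputPods := maxInt(2*minPodsInSmallClusters, 2*nodes)
	schedulerThroughputPodsPerDeployment := nodes
	schedulerThroughputNamespaces := totalSchedulerThroughputPods / schedulerThroughputPodsPerDeployment
	without = namespaces + schedulerThroughputNamespaces
	with = minInt(nodes, without)
	return
}

func main() {
	fmt.Println(namespaceCount(100, 1)) // 11 11
	fmt.Println(namespaceCount(1, 1))   // 1001 1
}
```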
Hmm, I'd rather leave this line as is and change `$schedulerThroughputNamespaces` to 0 if `$IS_SMALL_CLUSTER`. It makes the code easier to understand.
```diff
@@ -229,6 +236,7 @@ steps:
       objectTemplatePath: daemonset.yaml
       templateFillMap:
         Image: k8s.gcr.io/pause:3.0
+{{if not $IS_SMALL_CLUSTER}}
```
Is this if block needed here? I believe the `$bigDeploymentsPerNamespace` will be 0 if `.Nodes` < 100. In that case no object is going to be created.
Will try again, I may have mixed this up with the error coming from a negative `$bigDeploymentsPerNamespace`.
Nah, we're missing something. If the `replicasPerNamespace` is set to 0 then no new object should be created:

```go
for replicaCounter := instancesStates[0].CurrentReplicaCount; replicaCounter < phase.ReplicasPerNamespace; replicaCounter++ {
```

The only explanation is that the `$bigDeploymentsPerNamespace` is not set to 0. Isn't it because in line 67 we're doing:

```
{{$bigDeploymentsPerNamespace := MaxInt 1 (SubtractInt $bigDeploymentsPerNamespace 1)}}
```

?
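A minimal Go sketch of this diagnosis (`maxInt` is a stand-in for the template's `MaxInt`; the subtract-then-clamp mirrors config.yaml line 67):

```go
package main

import "fmt"

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func main() {
	// On a small cluster the computed count is 0, but the subtract-then-clamp
	// with a floor of 1 forces it back up, so the object is still created.
	bigDeploymentsPerNamespace := 0
	fmt.Println(maxInt(1, bigDeploymentsPerNamespace-1)) // 1: object gets created
	fmt.Println(maxInt(0, bigDeploymentsPerNamespace-1)) // 0: object skipped
}
```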
Yes! This is it, addressed in my latest commit 99c29b4
```diff
@@ -345,6 +354,7 @@ steps:
       templateFillMap:
         ReplicasMin: {{$MEDIUM_GROUP_SIZE}}
         ReplicasMax: {{$MEDIUM_GROUP_SIZE}}
+{{if not $IS_SMALL_CLUSTER}}
```
Instead of having this if block, we could introduce a variable `$bigJobsPerNamespace` and make sure it will be set to 0 if #Nodes < 100.
```diff
 - name: Scaling and updating objects
   phases:
+{{if not $IS_SMALL_CLUSTER}}
```
Same as above, I believe it's not needed
```diff
@@ -652,6 +671,7 @@ steps:
 - name: Deleting objects
   phases:
+{{if not $IS_SMALL_CLUSTER}}
```
Same here
```diff
@@ -627,6 +644,7 @@ steps:
       templateFillMap:
         ReplicasMin: {{MultiplyInt $MEDIUM_GROUP_SIZE 0.5}}
         ReplicasMax: {{MultiplyInt $MEDIUM_GROUP_SIZE 1.5}}
+{{if not $IS_SMALL_CLUSTER}}
```
Same here, once we introduce a variable it shouldn't be needed anymore.
```yaml
# Ensure non zero or negative after subtraction.
{{$smallDeploymentsPerNamespace := MaxInt 1 (SubtractInt $smallDeploymentsPerNamespace 1)}}
{{$mediumDeploymentsPerNamespace := MaxInt 1 (SubtractInt $mediumDeploymentsPerNamespace 1)}}
{{$bigDeploymentsPerNamespace := MaxInt 1 (SubtractInt $bigDeploymentsPerNamespace 1)}}
```
We should have `MaxInt 0 ...` instead of `MaxInt 1 ...`, here and above. Then we won't need lines 69-71.
I added lines 69-71 because there is nothing limiting `$bigDeploymentsPerNamespace` above any more; it will be 250 * 4 - 1 = 999 on this line.
Hmm, the `bigDeploymentsPerNamespace` is calculated as

```
{{$bigDeploymentsPerNamespace := DivideInt $podsPerNamespace (MultiplyInt 4 $BIG_GROUP_SIZE)}}
```

And if we have `$namespaces = 1`, then `$podsPerNamespace = 30 * #Nodes`. So, e.g. in a 10-node cluster:

```
bigDeploymentsPerNamespace := 30 * 10 / 4 / 250 = 0
```

I believe it should work; we don't need any custom logic there.
Ah, did I miss something? Let me try again later. I vaguely recall it wasn't working as expected when I set it to `MaxInt 0`.
Ok it works! Made the change accordingly
Thanks!
```yaml
# Stateful sets are enabled. Reduce the number of small and medium deployments per namespace
# See https://github.com/kubernetes/perf-tests/issues/1036#issuecomment-607631768
# Ensure non zero or negative after subtraction.
{{$smallDeploymentsPerNamespace := MaxInt 1 (SubtractInt $smallDeploymentsPerNamespace $SMALL_STATEFUL_SETS_PER_NAMESPACE)}}
```
Let's have `MaxInt 0` everywhere. It's more correct. We should err on the side of downsizing so that the "~30 pods per node on avg" assumption holds.
```diff
@@ -112,7 +122,7 @@ tuningSets:
 # The expected number of created/deleted pods is totalPods/4 when scaling,
 # as each RS changes its size from X to a uniform random value in [X/2, 3X/2].
 # To match 10 [pods/s] requirement, we need to divide saturationTime by 4.
-timeLimit: {{DivideInt $saturationTime 4}}s
+timeLimit: {{MaxInt 1 (DivideInt $saturationTime 4)}}s
```
Let's revert this. It shouldn't really matter whether the time limit is 0s or 1s. Let's keep the code simple.
The RNG panics if `timeLimit` is 0:

```
panic: invalid argument to Int63n

goroutine 321 [running]:
math/rand.(*Rand).Int63n(0xc0000a8420, 0x0, 0xc0008b67c0)
	/usr/local/go/src/math/rand/rand.go:111 +0x11d
math/rand.Int63n(...)
	/usr/local/go/src/math/rand/rand.go:327
k8s.io/perf-tests/clusterloader2/pkg/tuningset.(*randomizedTimeLimitedLoad).Execute.func1()
	/Users/shemleong/go/src/perf-tests/clusterloader2/pkg/tuningset/randomized_time_limited.go:43 +0x4a
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000b00420, 0xc000b7a750)
	/Users/shemleong/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:73 +0x51
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/Users/shemleong/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x65
exit status 2
```
Oh, so let's fix that here:

```go
time.Sleep(time.Duration(rand.Int63n(r.params.TimeLimit.ToTimeDuration().Nanoseconds())))
```
Basically, call it only if timeLimit > 0
Cool, added the check there
Other than the timeLimit one, I've addressed the other comments!
LGTM with one comment
```diff
 {{$PODS_PER_NODE := DefaultParam .PODS_PER_NODE 30}}
 {{$LOAD_TEST_THROUGHPUT := DefaultParam .CL2_LOAD_TEST_THROUGHPUT 10}}
 {{$DELETE_TEST_THROUGHPUT := DefaultParam .CL2_DELETE_TEST_THROUGHPUT $LOAD_TEST_THROUGHPUT}}
 {{$BIG_GROUP_SIZE := DefaultParam .BIG_GROUP_SIZE 250}}
-{{$MEDIUM_GROUP_SIZE := DefaultParam .MEDIUM_GROUP_SIZE 30}}
+{{$MEDIUM_GROUP_SIZE := DefaultParam .MEDIUM_GROUP_SIZE 15}}
```
Revert
Oops my bad, forgot to undo the change after testing
Cool :) will squash commits
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: lyzs90, mm4tt. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Details: #887
Which issue(s) this PR fixes:
Fixes #887
Special notes for your reviewer:
The type conversion was rounding fractions to 0. Hence when the `.Nodes` passed in is less than `NODES_PER_NAMESPACE` (defaults to 100), `namespaces = DivideInt .Nodes NODES_PER_NAMESPACE = 0`. Thereafter `namespaces` is used as the denominator for several computations, e.g. https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml#L7

Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: