Jobsets having replicas = 0 for their replicatedJob are not handled in Kueue #2227
Comments
@shrinandj Which version of JobSet do you use?
Is that a different error where the
/assign
The source of the issue was correctly identified. Kueue has 2 possible solutions:
@alculquicondor @tenzen-y wdyt? @shrinandj it seems that option 2 is your expected behaviour. May I ask what is the rationale behind
I guess we could do option 2. However, it sounds surprising to me that JobSet allows 0 replicas. @danielvegamyhre @kannon92 is this expected?
I think most workloads allow 0 replicas. I know that you can downscale deployments to 0. When we implement elastic JobSets I think 0 should be a valid value.
This is set dynamically and therefore could be 0. The scenario where we ran into this is:
Option 2 works best IMO as well. Failing the workload would force users to factor that in and change their upstream logic (or their job submission process, if submitting manually), which just adds more complications.
Makes perfect sense, thanks!
Yes, I think we should allow 0, since we want to support elastic JobSets, which may scale replicas up/down (perhaps even to 0).
In that case, we should change the validation of Workload to allow count=0 (as opposed to not adding it to the list of podsets). And we need to make sure nothing else assumes the value is non-zero, like dividing by count (I think we have a few of these). /assign @trasc
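The two concerns in the comment above (accepting count=0 in validation, and not dividing by a zero count) can be sketched roughly as follows. This is only an illustration under stated assumptions: `PodSet`, `ValidateCount`, and `PerPodMilliCPU` are hypothetical names made up for this sketch and are not Kueue's actual types or functions.

```go
package main

import "fmt"

// PodSet is a hypothetical, simplified stand-in for a pod set in a Workload.
// It is NOT the real Kueue type.
type PodSet struct {
	Name  string
	Count int32
}

// ValidateCount accepts a zero count and rejects only negative counts,
// instead of requiring a minimum of 1.
func ValidateCount(ps PodSet) error {
	if ps.Count < 0 {
		return fmt.Errorf("podset %q: count must be >= 0, got %d", ps.Name, ps.Count)
	}
	return nil
}

// PerPodMilliCPU splits a total CPU request evenly across the pods of a
// pod set. For an empty pod set it returns 0 rather than dividing by zero.
func PerPodMilliCPU(totalMilliCPU int64, ps PodSet) int64 {
	if ps.Count <= 0 {
		return 0
	}
	return totalMilliCPU / int64(ps.Count)
}

func main() {
	empty := PodSet{Name: "workers", Count: 0}
	full := PodSet{Name: "workers", Count: 4}

	fmt.Println(ValidateCount(empty))        // zero replicas pass validation
	fmt.Println(PerPodMilliCPU(4000, empty)) // guarded, no division by zero
	fmt.Println(PerPodMilliCPU(4000, full))
}
```

Any real change would have to audit every place that divides by the pod set count; the guard above only shows the general shape of such a check.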
Thanks a lot for doing this! 🙏
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
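- Kubernetes version (use `kubectl version`): v1.28.8
- Kueue version (use `git describe --tags --dirty --always`): v0.6
- Cloud provider or hardware configuration: aws
- OS (e.g. `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):

** Details **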
The JobSet CRD shows that the default for replicas is 1, but it does not set a minimum value of 1, so 0 is acceptable.
Kueue's workload CRD shows that the podSet's count has a minimum value of 1:
Kueue's log shows the following error: