KEP-672: Implement the DependsOn API #740

Open
wants to merge 10 commits into
base: main
39 changes: 39 additions & 0 deletions api/jobset/v1alpha2/jobset_types.go
@@ -79,6 +79,8 @@ const (
)

// JobSetSpec defines the desired state of JobSet
// +kubebuilder:validation:XValidation:rule="!(has(self.startupPolicy) && self.startupPolicy.startupPolicyOrder == 'InOrder' && self.replicatedJobs.all(x, has(x.dependsOn)))",message="StartupPolicy and DependsOn APIs are mutually exclusive"
// +kubebuilder:validation:XValidation:rule="!(has(self.replicatedJobs[0].dependsOn))",message="DependsOn can't be set for the first ReplicatedJob"
type JobSetSpec struct {
	// ReplicatedJobs is the group of jobs that will form the set.
	// +listType=map
@@ -105,6 +107,7 @@ type JobSetSpec struct {
	FailurePolicy *FailurePolicy `json:"failurePolicy,omitempty"`

	// StartupPolicy, if set, configures in what order jobs must be started
	// Deprecated: StartupPolicy is deprecated, please use the DependsOn API.
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
	StartupPolicy *StartupPolicy `json:"startupPolicy,omitempty"`
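
The two XValidation rules added to JobSetSpec can be read as the following plain Go checks. This is only a sketch of the rules as written in this diff, not the controller's or the API server's implementation: the trimmed-down types, `allHaveDependsOn`, and `validateSpec` are hypothetical names, and `has(x.dependsOn)` is approximated by a non-empty slice.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the API types in this diff.
type DependsOn struct {
	Name   string
	Status string
}

type ReplicatedJob struct {
	Name      string
	DependsOn []DependsOn
}

type JobSetSpec struct {
	// StartupPolicyOrder is "" when startupPolicy is unset (a simplification).
	StartupPolicyOrder string
	ReplicatedJobs     []ReplicatedJob
}

// allHaveDependsOn mirrors self.replicatedJobs.all(x, has(x.dependsOn)).
// Like CEL's all(), it is vacuously true for an empty list.
// Assumption: has(x.dependsOn) is approximated by a non-empty slice.
func allHaveDependsOn(jobs []ReplicatedJob) bool {
	for _, rj := range jobs {
		if len(rj.DependsOn) == 0 {
			return false
		}
	}
	return true
}

// validateSpec returns the message of every rule that the spec violates.
func validateSpec(spec JobSetSpec) []string {
	var msgs []string
	// Rule 1: InOrder startup policy combined with dependsOn on all ReplicatedJobs.
	if spec.StartupPolicyOrder == "InOrder" && allHaveDependsOn(spec.ReplicatedJobs) {
		msgs = append(msgs, "StartupPolicy and DependsOn APIs are mutually exclusive")
	}
	// Rule 2: the first ReplicatedJob must not set dependsOn.
	if len(spec.ReplicatedJobs) > 0 && len(spec.ReplicatedJobs[0].DependsOn) > 0 {
		msgs = append(msgs, "DependsOn can't be set for the first ReplicatedJob")
	}
	return msgs
}

func main() {
	spec := JobSetSpec{
		ReplicatedJobs: []ReplicatedJob{
			{Name: "initializer", DependsOn: []DependsOn{{Name: "trainer", Status: "Ready"}}},
			{Name: "trainer"},
		},
	}
	for _, m := range validateSpec(spec) {
		fmt.Println(m)
	}
}
```

Because the first rule uses `all`, it only fires when every ReplicatedJob sets dependsOn; a spec that combines InOrder with dependsOn on a subset of ReplicatedJobs passes it. If that is not intended, an `exists`-style check would be the stricter alternative.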

@@ -230,8 +233,44 @@ type ReplicatedJob struct {
	// Jobs names will be in the format: <jobSet.name>-<spec.replicatedJob.name>-<job-index>
	// +kubebuilder:default=1
	Replicas int32 `json:"replicas,omitempty"`

	// DependsOn is an optional list that specifies the preceding ReplicatedJobs upon which
	// the current ReplicatedJob depends. If specified, the ReplicatedJob will be created
	// only after the referenced ReplicatedJobs reach their desired state.
	// The Order of ReplicatedJobs is defined by their enumeration in the slice.
	// Note, that the first ReplicatedJob in the slice cannot use the DependsOn API.
	// TODO (andreyvelich): Currently, only a single item is supported in the DependsOn list.
Contributor

I don't think we should have TODO in user facing APIs.

Contributor Author

Sure, I will remove it.

Contributor

Suggested change
- // TODO (andreyvelich): Currently, only a single item is supported in the DependsOn list.
+ // Currently, only a single item is supported in the DependsOn list.

	// This API is mutually exclusive with the StartupPolicy API.
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
	// +kubebuilder:validation:MaxItems=1
	// +optional
	// +listType=map
	// +listMapKey=name
	DependsOn []DependsOn `json:"dependsOn,omitempty"`
}

// DependsOn defines the dependency on the previous ReplicatedJob status.
type DependsOn struct {
	// Name of the previous ReplicatedJob.
	Name string `json:"name"`

	// Status defines the condition for the ReplicatedJob. Only Ready or Complete status can be set.
	// +kubebuilder:validation:Enum=Ready;Complete
	Status DependsOnStatus `json:"status"`
}

type DependsOnStatus string

const (
	// Ready status means the Ready counter equals the number of child Jobs.
	// .spec.replicatedJobs["name==<JOB_NAME>"].replicas == .status.replicatedJobsStatus.name["name==<JOB_NAME>"].ready
Contributor

I don't think the boolean expression is necessary.

Contributor Author

Why?

Should we modify this as follows:

.spec.replicatedJobs["name==<JOB_NAME>"].replicas == 
.status.replicatedJobsStatus.name["name==<JOB_NAME>"].ready + 
.status.replicatedJobsStatus.name["name==<JOB_NAME>"].succeeded + 
.status.replicatedJobsStatus.name["name==<JOB_NAME>"].failed
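
To make the difference between the two candidate definitions of Ready concrete, here is a minimal sketch. `ReplicatedJobStatus` is a hypothetical trimmed-down struct that carries only the counters named in the expressions, and `readyStrict` and `readyOrFinished` are illustrative helper names, not functions from the PR.

```go
package main

import "fmt"

// ReplicatedJobStatus is a hypothetical stand-in holding only the
// counters referenced in the expressions above.
type ReplicatedJobStatus struct {
	Name      string
	Ready     int32
	Succeeded int32
	Failed    int32
}

// readyStrict follows the expression in the code comment under review:
// replicas == ready.
func readyStrict(replicas int32, st ReplicatedJobStatus) bool {
	return replicas == st.Ready
}

// readyOrFinished follows the proposed alternative:
// replicas == ready + succeeded + failed.
func readyOrFinished(replicas int32, st ReplicatedJobStatus) bool {
	return replicas == st.Ready+st.Succeeded+st.Failed
}

func main() {
	// Three replicas: two child Jobs are ready, one has already succeeded.
	st := ReplicatedJobStatus{Name: "workers", Ready: 2, Succeeded: 1}
	fmt.Println(readyStrict(3, st), readyOrFinished(3, st)) // prints: false true
}
```

The two definitions diverge as soon as a child Job finishes before its siblings become ready, which is the case the proposed expression is meant to cover.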

	ReadyStatus DependsOnStatus = "Ready"
Contributor

@danielvegamyhre Jan 9, 2025

I think we should rename these status variables something like DependencyReady and DependencyComplete (or ReplicatedJobReady and ReplicatedJobComplete) since they are specific to a replicatedJob dependency and not the status of the JobSet itself.

With the current naming, in the other places we're importing the api as jobset and using these variables, they sound like they are referring to the status of the JobSet itself (e.g. jobset.ReadyStatus, jobset.CompletedStatus, etc), which is confusing/misleading.


	// Complete status means the Succeeded counter equals the number of child Jobs.
Contributor

I'm not sure if this is actually true. A ReplicatedJob can be marked as succeeded if the success policy passes, which doesn't necessarily mean that all the replicated jobs finished.

Contributor Author

A ReplicatedJob can be marked as succeeded if the success policy passes, which doesn't necessarily mean that all the replicated jobs finished.

Can you give me an example of this use case? From my understanding, ReplicatedJob completion is based on the ReplicatedJob's SuccessPolicy: https://kubernetes.io/docs/concepts/workloads/controllers/job/#success-policy.
This means a ReplicatedJob will be in Complete status when this policy is met.

SuccessPolicy at the JobSet level only determines whether we mark the JobSet as Completed when All or Any ReplicatedJobs reach Complete status.

Contributor

Ah I see. You are correct.

	// .spec.replicatedJobs["name==<JOB_NAME>"].replicas == .status.replicatedJobsStatus.name["name==<JOB_NAME>"].succeeded
Contributor

I don't think the boolean expression is necessary.

Contributor Author

Oh, you mean for user-facing docs. Sure, I think we can remove it.

	CompleteStatus DependsOnStatus = "Complete"
Contributor Author

What do we think about the Complete status here, given the discussion in #723?

Contributor

We probably want to match the Kubernetes Job and use Complete, so keep as is.

Contributor

Yea I think that is fine.

Contributor

+1 for consistency w/ Job API

)
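
The Ready and Complete conditions described in the comments above can be sketched as a lookup by ReplicatedJob name followed by a counter comparison. This is an illustration under stated assumptions rather than the controller code from this PR: `dependencyReached` is a hypothetical helper, and the structs are trimmed-down stand-ins that keep only the fields the two expressions mention.

```go
package main

import "fmt"

type DependsOnStatus string

const (
	ReadyStatus    DependsOnStatus = "Ready"
	CompleteStatus DependsOnStatus = "Complete"
)

// Hypothetical, trimmed-down stand-ins for the API types.
type DependsOn struct {
	Name   string
	Status DependsOnStatus
}

type ReplicatedJob struct {
	Name      string
	Replicas  int32
	DependsOn []DependsOn
}

type ReplicatedJobStatus struct {
	Name      string
	Ready     int32
	Succeeded int32
}

// dependencyReached reports whether the ReplicatedJob named in dep has
// reached the requested status, following the expressions in the comments:
// Ready means replicas == ready, Complete means replicas == succeeded.
func dependencyReached(dep DependsOn, jobs []ReplicatedJob, statuses []ReplicatedJobStatus) bool {
	var replicas int32
	found := false
	for _, rj := range jobs {
		if rj.Name == dep.Name {
			replicas, found = rj.Replicas, true
			break
		}
	}
	if !found {
		return false // unknown dependency name
	}
	for _, st := range statuses {
		if st.Name != dep.Name {
			continue
		}
		switch dep.Status {
		case ReadyStatus:
			return st.Ready == replicas
		case CompleteStatus:
			return st.Succeeded == replicas
		}
	}
	return false // no status reported yet, or unsupported status value
}

func main() {
	jobs := []ReplicatedJob{
		{Name: "initializer", Replicas: 1},
		{Name: "trainer", Replicas: 2, DependsOn: []DependsOn{{Name: "initializer", Status: CompleteStatus}}},
	}
	statuses := []ReplicatedJobStatus{{Name: "initializer", Succeeded: 1}}
	fmt.Println(dependencyReached(jobs[1].DependsOn[0], jobs, statuses))
}
```

In this sketch the `trainer` ReplicatedJob would be created only once `dependencyReached` returns true for its single DependsOn entry, matching the MaxItems=1 restriction in the API.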

type Network struct {
	// EnableDNSHostnames allows pods to be reached via their hostnames.
	// Pods will be reachable using the fully qualified pod hostname:
57 changes: 55 additions & 2 deletions api/jobset/v1alpha2/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 20 additions & 0 deletions api/jobset/v1alpha2/zz_generated.deepcopy.go

48 changes: 48 additions & 0 deletions client-go/applyconfiguration/jobset/v1alpha2/dependson.go

20 changes: 17 additions & 3 deletions client-go/applyconfiguration/jobset/v1alpha2/replicatedjob.go

2 changes: 2 additions & 0 deletions client-go/applyconfiguration/utils.go

40 changes: 40 additions & 0 deletions config/components/crd/bases/jobset.x-k8s.io_jobsets.yaml
@@ -200,6 +200,40 @@ spec:
    set.
  items:
    properties:
      dependsOn:
        description: |-
          DependsOn is an optional list that specifies the preceding ReplicatedJobs upon which
          the current ReplicatedJob depends. If specified, the ReplicatedJob will be created
          only after the referenced ReplicatedJobs reach their desired state.
          The Order of ReplicatedJobs is defined by their enumeration in the slice.
          Note, that the first ReplicatedJob in the slice cannot use the DependsOn API.
          This API is mutually exclusive with the StartupPolicy API.
        items:
          description: DependsOn defines the dependency on the previous
            ReplicatedJob status.
          properties:
            name:
              description: Name of the previous ReplicatedJob.
              type: string
            status:
              description: Status defines the condition for the ReplicatedJob.
                Only Ready or Complete status can be set.
              enum:
              - Ready
              - Complete
              type: string
          required:
          - name
          - status
          type: object
        maxItems: 1
        type: array
        x-kubernetes-list-map-keys:
        - name
        x-kubernetes-list-type: map
        x-kubernetes-validations:
        - message: Value is immutable
          rule: self == oldSelf
      name:
        description: |-
          Name is the name of the entry and will be used as a suffix
@@ -8928,6 +8962,12 @@ spec:
      minimum: 0
      type: integer
  type: object
  x-kubernetes-validations:
  - message: StartupPolicy and DependsOn APIs are mutually exclusive
    rule: '!(has(self.startupPolicy) && self.startupPolicy.startupPolicyOrder
      == ''InOrder'' && self.replicatedJobs.all(x, has(x.dependsOn)))'
  - message: DependsOn can't be set for the first ReplicatedJob
    rule: '!(has(self.replicatedJobs[0].dependsOn))'
status:
  description: JobSetStatus defines the observed state of JobSet
  properties:
34 changes: 33 additions & 1 deletion hack/python-sdk/swagger.json
@@ -31,6 +31,26 @@
      }
    }
  },
  "jobset.v1alpha2.DependsOn": {
    "description": "DependsOn defines the dependency on the previous ReplicatedJob status.",
    "type": "object",
    "required": [
      "name",
      "status"
    ],
    "properties": {
      "name": {
        "description": "Name of the previous ReplicatedJob.",
        "type": "string",
        "default": ""
      },
      "status": {
        "description": "Status defines the condition for the ReplicatedJob. Only Ready or Complete status can be set.",
        "type": "string",
        "default": ""
      }
    }
  },
  "jobset.v1alpha2.FailurePolicy": {
    "type": "object",
    "properties": {
@@ -177,7 +197,7 @@
      "x-kubernetes-list-type": "map"
    },
    "startupPolicy": {
-     "description": "StartupPolicy, if set, configures in what order jobs must be started",
+     "description": "StartupPolicy, if set, configures in what order jobs must be started Deprecated: StartupPolicy is deprecated, please use the DependsOn API.",
      "$ref": "#/definitions/jobset.v1alpha2.StartupPolicy"
    },
    "successPolicy": {
@@ -262,6 +282,18 @@
    "template"
  ],
  "properties": {
    "dependsOn": {
      "description": "DependsOn is an optional list that specifies the preceding ReplicatedJobs upon which the current ReplicatedJob depends. If specified, the ReplicatedJob will be created only after the referenced ReplicatedJobs reach their desired state. The Order of ReplicatedJobs is defined by their enumeration in the slice. Note, that the first ReplicatedJob in the slice cannot use the DependsOn API. This API is mutually exclusive with the StartupPolicy API.",
      "type": "array",
      "items": {
        "default": {},
        "$ref": "#/definitions/jobset.v1alpha2.DependsOn"
      },
      "x-kubernetes-list-map-keys": [
        "name"
      ],
      "x-kubernetes-list-type": "map"
    },
    "name": {
      "description": "Name is the name of the entry and will be used as a suffix for the Job name.",
      "type": "string",