Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Finalizers to Jobs so ReplicatedJobStatus can be updated before jobs are removed. #360

Open
kannon92 opened this issue Dec 21, 2023 · 5 comments
Assignees
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.

Comments

@kannon92
Copy link
Contributor

What would you like to be added:
Various policies in JobSet utilize ReplicatedJobStatus for tracking. If a Job is terminated on a successful run (ie via ttlAfterFinished is set and the update is quick), then we could lose the latest status updates.

One should make sure to drop the finalizers after the ReplicatedJobStatus has been updated.

Why is this needed:

#244 (comment)

Adding a finalizer to the jobs to confirm that an update of the replicatedJobStatus happens before the job is deleted would be useful.

@danielvegamyhre danielvegamyhre added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. labels Feb 16, 2024
@Bharadwajshivam28
Copy link
Member

Hey @kannon92 @danielvegamyhre I can take on this issue but I need some more details and steps to get started as I am a new contributor

@googs1025
Copy link
Member

Hi @kannon92 @danielvegamyhre , do we still need this issue? I would like to get involved in this issue recently, and once this PR #490 is merged, I can start working on resolving this issue.

@googs1025
Copy link
Member

/assign

@googs1025
Copy link
Member

Is this issue still needed? I am a little confused about something. If we create a Finalizer for a sub-job, it means that we cannot delete the job arbitrarily through kubectl.

@googs1025
Copy link
Member

I'm not sure if this is correct, but here's what I came up with:

func (r *JobSetReconciler) deleteJobs(ctx context.Context, jobsForDeletion []*batchv1.Job) error {
	...
	workqueue.ParallelizeUntil(ctx, constants.MaxParallelism, len(jobsForDeletion), func(i int) {
		targetJob := jobsForDeletion[i]
		// Skip deleting jobs with deletion timestamp already set.
		if targetJob.DeletionTimestamp != nil {
			return
		}
		// Remove finalizer before deleting the job.
		err := r.removeFinalizer(ctx, targetJob)
		if err != nil {
			...
		}
		// Delete job. This deletion event will trigger another reconciliation,
		// where the jobs are recreated.
		foregroundPolicy := metav1.DeletePropagationForeground
		if err := r.Delete(ctx, targetJob, &client.DeleteOptions{PropagationPolicy: &foregroundPolicy}); client.IgnoreNotFound(err) != nil {
			...
		}
		log.V(2).Info("successfully deleted job", "job", klog.KObj(targetJob), "restart attempt", targetJob.Labels[targetJob.Labels[constants.RestartsKey]])
	})
	return errors.Join(finalErrs...)
}

func (r *JobSetReconciler) removeFinalizer(ctx context.Context, job *batchv1.Job) error {
	// Remove finalizer
	if controllerutil.ContainsFinalizer(job, jobset.JobSetFinalizerName) {
		controllerutil.RemoveFinalizer(job, jobset.JobSetFinalizerName)
	}
	// Update Job object
	return r.Update(ctx, job)
}
func (r *JobSetReconciler) createJobs(ctx context.Context, js *jobset.JobSet, jobs []*batchv1.Job) error {
        ...
	workqueue.ParallelizeUntil(ctx, constants.MaxParallelism, len(jobs), func(i int) {
		job := jobs[i]
                 // Add Finalizer
		if !controllerutil.ContainsFinalizer(js, jobset.JobSetFinalizerName) {
			controllerutil.AddFinalizer(js, jobset.JobSetFinalizerName)
		}
		// Set jobset controller as owner of the job for garbage collection and reconcilation.
		...
		// Create the job.
		// TODO(#18): Deal with the case where the job exists but is not owned by the jobset.
		if err := r.Create(ctx, job); err != nil {
		     ...
		}
		...
	})
	allErrs := errors.Join(finalErrs...)
	return allErrs
}

I have another question, when I delete the Finalizer, I will update the job, but do I also need to update the jobset? Or when I update the job, the controller will automatically trigger.
Sorry, this may be a newbie question, but I want to confirm the problem first before working on it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
Projects
None yet
Development

No branches or pull requests

4 participants