state store: fix logic for evaluating job status #24974

Open
wants to merge 12 commits into
base: main

mismithhisler
Member

Description

The state store persists the correct job version for job summaries, but not for specific versions. This fix attempts to simplify the logic around evaluating a job's status, and then makes sure it is persisted.

Testing & Reproduction steps

See GH issue for replication steps

Links

Fixes GH #24957

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

@mismithhisler mismithhisler requested a review from jrasell January 29, 2025 14:58
@mismithhisler mismithhisler self-assigned this Jan 29, 2025
@mismithhisler mismithhisler removed the request for review from jrasell January 29, 2025 16:50
@@ -3851,11 +3852,6 @@ func (s *StateStore) EvalsByJob(ws memdb.WatchSet, namespace, jobID string) ([]*

e := raw.(*structs.Evaluation)

// Filter non-exact matches
if e.JobID != jobID {
Member Author

This isn't necessary anymore because periodic jobs do not get an evaluation.
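
As a rough illustration of what the removed check did, the sketch below assumes the lookup behaves like a prefix match on the job ID, so evaluations for other jobs whose IDs merely start with the same string could be returned. The `Eval` type and both helper functions are hypothetical stand-ins, not Nomad's actual types or state store API.

```go
package main

import "strings"

// Eval is a hypothetical, trimmed-down stand-in for structs.Evaluation.
type Eval struct {
	ID    string
	JobID string
}

// evalsByJobPrefix mimics a prefix-style lookup (an assumption about how
// the underlying index behaves): it returns every eval whose JobID starts
// with the requested jobID.
func evalsByJobPrefix(evals []Eval, jobID string) []Eval {
	var out []Eval
	for _, e := range evals {
		if strings.HasPrefix(e.JobID, jobID) {
			out = append(out, e)
		}
	}
	return out
}

// filterExact reproduces the removed guard: it drops any eval whose JobID
// is not an exact match for jobID.
func filterExact(evals []Eval, jobID string) []Eval {
	var out []Eval
	for _, e := range evals {
		// Filter non-exact matches
		if e.JobID != jobID {
			continue
		}
		out = append(out, e)
	}
	return out
}
```

If no evaluation can ever carry a JobID that only shares a prefix with the requested one, the second pass is a no-op, which appears to be the reasoning in the comment above.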

@@ -5532,6 +5527,12 @@ func (s *StateStore) setJobStatus(index uint64, txn *txn,
if err := s.setJobSummary(txn, updated, index, oldStatus, newStatus); err != nil {
return fmt.Errorf("job summary update failed %w", err)
}

// Update the job version details
if err := s.upsertJobVersion(index, updated, txn); err != nil {
Member Author

This feels odd to call here, but it's necessary to also update a job's version status anytime we are updating the job summary, or they can get out of sync.
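
The sync concern can be sketched with a toy in-memory store. Everything here (the `Job` type, the `store` struct, and its maps) is a hypothetical simplification rather than Nomad's memdb-backed state store. It only shows the idea that one status update writes both the current job record and the matching per-version record.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down job record.
type Job struct {
	ID      string
	Version uint64
	Status  string
}

// store is a toy stand-in for the state store: one table for the current
// job and one for per-version copies.
type store struct {
	jobs     map[string]Job
	versions map[string]map[uint64]Job
}

func newStore() *store {
	return &store{
		jobs:     make(map[string]Job),
		versions: make(map[string]map[uint64]Job),
	}
}

// upsertJobVersion writes the job into the per-version table, replacing
// any existing record for the same ID and version.
func (s *store) upsertJobVersion(job Job) error {
	if job.ID == "" {
		return fmt.Errorf("empty job ID")
	}
	byVersion, ok := s.versions[job.ID]
	if !ok {
		byVersion = make(map[uint64]Job)
		s.versions[job.ID] = byVersion
	}
	byVersion[job.Version] = job
	return nil
}

// setJobStatus updates the current job record and then the version record
// in the same call, so the two cannot drift apart.
func (s *store) setJobStatus(job Job, newStatus string) error {
	job.Status = newStatus
	s.jobs[job.ID] = job

	// Update the job version details alongside the main record.
	if err := s.upsertJobVersion(job); err != nil {
		return fmt.Errorf("job version update failed: %w", err)
	}
	return nil
}
```

Without the second write, a reader of the per-version table would keep seeing the status that was stored when that version was first registered.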

@@ -5084,6 +5087,7 @@ func TestStateStore_DeleteEval_Eval(t *testing.T) {
require.Equal(t, uint64(1002), evalsIndex)
}

// This tests the evalDelete boolean by deleting a Pending eval and Pending Alloc.
Member Author

This test covers the evalDelete boolean, but it's an odd test because it forcibly deletes a non-terminal eval and a non-terminal alloc, and then makes sure we set the job to "dead".
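
To see why the job ends up "dead" in that scenario, here is a deliberately simplified sketch of status derivation. It assumes status is computed only from the remaining allocations and evaluations, and it ignores cases such as a freshly registered job or periodic and parameterized jobs. The types and the `jobStatus` function are hypothetical, not the logic in this PR.

```go
package main

// allocState and evalState are hypothetical records that only track
// whether the object has reached a terminal state.
type allocState struct{ Terminal bool }
type evalState struct{ Terminal bool }

// jobStatus derives a status from what is left in the store:
// any non-terminal alloc means "running", otherwise any non-terminal
// eval means "pending", otherwise the job is considered "dead".
func jobStatus(allocs []allocState, evals []evalState) string {
	for _, a := range allocs {
		if !a.Terminal {
			return "running"
		}
	}
	for _, e := range evals {
		if !e.Terminal {
			return "pending"
		}
	}
	return "dead"
}
```

Under these assumptions, once the pending eval and pending alloc are both deleted, nothing non-terminal remains and the function falls through to "dead", matching what the test asserts.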

@mismithhisler mismithhisler requested a review from a team as a code owner February 4, 2025 19:13
Development

Successfully merging this pull request may close these issues.

Nomad job inspect showing incorrect job status
1 participant