
stateful deployments: use TaskGroupVolumeClaim table to associate volume requests with volume IDs #24993

Draft · wants to merge 12 commits into main

Conversation

pkazmierczak (Contributor):

Rough draft for now.

Resolved review threads (outdated): nomad/state/state_store.go, nomad/fsm.go
tgross (Member) left a comment:

This is looking great, @pkazmierczak!

@@ -223,11 +221,21 @@ func (h *HostVolumeChecker) hasVolumes(n *structs.Node) bool {
}

if req.Sticky {
if slices.Contains(h.hostVolumeIDs, vol.ID) || len(h.hostVolumeIDs) == 0 {
claim, err := h.ctx.State().GetTaskGroupVolumeClaim(nil, h.namespace, h.jobID, h.taskGroupName)
Member:

Seeing this, I'm realizing our schema isn't quite right... we can have multiple volume claims for a given task group because group.count could be >1, and multiple task groups could claim a given volume ID (for multi-reader volumes), so we can't just use the volume ID as the claim's ID either.

That implies the ID for a TaskGroupVolumeClaim should be (namespace + job ID + group + volume ID). Then we do a prefix get on the table for (namespace + job ID + group) to get all claims for the group. We return true if there are 0 claims or 1 claim with a matching volume ID. Otherwise false.

(Alternately, we could use a UUID for the ID after all and have a giant composite index on (namespace + job ID + group + volume ID), which might make deleting individual claims easier for the user?)

I think we also need to track which claims we're picking. Although the scheduler is going to try to spread allocs for a given job out (binpacking is between jobs), it's a soft preference. We need to make sure that for a given evaluation we don't accidentally give the same claim out multiple times.
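To make that concrete, here's a minimal sketch of the checker-side rule under the proposed composite key, assuming the claims for (namespace + job ID + group) have already been fetched with a prefix get; the struct and function names here are placeholders, not the final schema:

package sketch

// TaskGroupVolumeClaim is a stand-in for the real claim struct; its composite
// key would be (Namespace, JobID, TaskGroupName, VolumeID).
type TaskGroupVolumeClaim struct {
	Namespace     string
	JobID         string
	TaskGroupName string
	VolumeID      string
}

// volumeIsFeasible returns true if the group has no claims yet (first
// placement) or if one of its existing claims is for this volume ID;
// otherwise the volume on this node belongs to a different claim.
func volumeIsFeasible(claims []*TaskGroupVolumeClaim, volID string) bool {
	if len(claims) == 0 {
		return true
	}
	for _, c := range claims {
		if c.VolumeID == volID {
			return true
		}
	}
	return false
}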

pkazmierczak (Contributor, Author):

> That implies the ID for a TaskGroupVolumeClaim should be (namespace + job ID + group + volume ID). Then we do a prefix get on the table for (namespace + job ID + group) to get all claims for the group. We return true if there are 0 claims or 1 claim with a matching volume ID. Otherwise false.

You're right! I will adjust the schema.

> (Alternately, we could use a UUID for the ID after all and have a giant composite index on (namespace + job ID + group + volume ID), which might make deleting individual claims easier for the user?)

I think for the current functionality we don't need a UUID; it's only useful for users who want to list claims and delete them. In fact, I'd rather remove the parts of the code that pertain to this to get things moving, if you're okay with that.

> I think we also need to track which claims we're picking. Although the scheduler is going to try to spread allocs for a given job out (binpacking is between jobs), it's a soft preference. We need to make sure that for a given evaluation we don't accidentally give the same claim out multiple times.

This is a more complex problem, and I'm not sure I really understand. Let's chat about this.

pkazmierczak (Contributor, Author):

> That implies the ID for a TaskGroupVolumeClaim should be (namespace + job ID + group + volume ID). Then we do a prefix get on the table for (namespace + job ID + group) to get all claims for the group. We return true if there are 0 claims or 1 claim with a matching volume ID. Otherwise false.

In the feasibility checker that's true. But in upsertAllocsImpl, when we check whether a claim exists, can there be multiple volume IDs claimed for the same namespace, job, and group? I want to make sure I'm getting the logic right here.

Member:

Yes, but in the upsertAllocsImpl case you're looking for exactly one of those claims for a specific volume ID because you know which node the allocation is on.
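For illustration, a minimal sketch of that exact-match read, assuming the table's "id" index becomes a compound index over (namespace, job ID, group, volume ID) and that the state store's transaction is a go-memdb txn; the index layout and function name are assumptions, while TableTaskGroupVolumeClaim is the constant already used in this PR:

import memdb "github.com/hashicorp/go-memdb"

// claimExists does a single exact-match lookup: since the allocation's node
// is already decided, the volume ID (and so the full composite key) is known.
func claimExists(txn *memdb.Txn, ns, jobID, tgName, volID string) (bool, error) {
	raw, err := txn.First(TableTaskGroupVolumeClaim, "id", ns, jobID, tgName, volID)
	if err != nil {
		return false, err
	}
	return raw != nil, nil
}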

Resolved review threads (outdated): nomad/structs/structs.go, nomad/state/state_store.go

// Delete task group volume claims
for _, tg := range job.TaskGroups {
if _, err = txn.DeleteAll(TableTaskGroupVolumeClaim, indexID, namespace, jobID, tg.Name); err != nil {
Member:

This would become txn.DeletePrefix if we go with the schema change I recommend in feasibility.go.
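A minimal sketch of how the loop above could change, assuming go-memdb's compound-index key encoding (sub-keys joined by null terminators) and a registered prefix index on the claim table's id index; the exact key construction is an assumption about the revised schema:

// Delete all claims for each task group in one call, instead of per volume ID.
for _, tg := range job.TaskGroups {
	// the trailing null terminator delimits the group name, so a group whose
	// name merely starts with tg.Name does not match the prefix
	prefix := namespace + "\x00" + jobID + "\x00" + tg.Name + "\x00"
	if _, err := txn.DeletePrefix(TableTaskGroupVolumeClaim, indexID+"_prefix", prefix); err != nil {
		return fmt.Errorf("deleting task group volume claims failed: %v", err)
	}
}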
