
stateful deployments: use TaskGroupVolumeClaim table to associate volume requests with volume IDs #24993

Draft · wants to merge 12 commits into main

Conversation

pkazmierczak (Contributor):

Rough draft for now.

Resolved review threads (outdated): nomad/state/state_store.go, nomad/fsm.go
tgross (Member) left a comment:

This is looking great, @pkazmierczak!

@@ -223,11 +221,21 @@ func (h *HostVolumeChecker) hasVolumes(n *structs.Node) bool {
}

if req.Sticky {
if slices.Contains(h.hostVolumeIDs, vol.ID) || len(h.hostVolumeIDs) == 0 {
claim, err := h.ctx.State().GetTaskGroupVolumeClaim(nil, h.namespace, h.jobID, h.taskGroupName)
Member:

Seeing this, I'm realizing our schema isn't quite right... we can have multiple volume claims for a given task group because group.count could be >1, and multiple task groups could claim a given volume ID (for multi-reader volumes), so we can't just use the volume ID as the claim's ID either.

That implies the ID for a TaskGroupVolumeClaim should be (namespace + job ID + group + volume ID). Then we do a prefix get on the table for (namespace + job ID + group) to get all claims for the group. We return true if there are 0 claims or 1 claim with a matching volume ID. Otherwise false.

(Alternately, we could use a UUID for the ID after all and have a giant composite index on (namespace + job ID + group + volume ID), which might make deleting individual claims easier for the user?)

I think we also need to track which claims we're picking. Although the scheduler is going to try to spread allocs for a given job out (binpacking is between jobs), it's a soft preference. We need to make sure that for a given evaluation we don't accidentally give the same claim out multiple times.
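To make that concrete, here's a minimal sketch of the checker-side rule under the proposed composite key, assuming the claims for (namespace + job ID + group) have already been fetched with a prefix get; the struct and function names here are placeholders, not the final schema:

package sketch

// TaskGroupVolumeClaim is a stand-in for the real claim struct; its composite
// key would be (Namespace, JobID, TaskGroupName, VolumeID).
type TaskGroupVolumeClaim struct {
	Namespace     string
	JobID         string
	TaskGroupName string
	VolumeID      string
}

// volumeIsFeasible returns true if the group has no claims yet (first
// placement) or if one of its existing claims is for this volume ID;
// otherwise the volume on this node belongs to a different claim.
func volumeIsFeasible(claims []*TaskGroupVolumeClaim, volID string) bool {
	if len(claims) == 0 {
		return true
	}
	for _, c := range claims {
		if c.VolumeID == volID {
			return true
		}
	}
	return false
}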

pkazmierczak (Contributor, Author):

> That implies the ID for a TaskGroupVolumeClaim should be (namespace + job ID + group + volume ID). Then we do a prefix get on the table for (namespace + job ID + group) to get all claims for the group. We return true if there are 0 claims or 1 claim with a matching volume ID. Otherwise false.

You're right! I will adjust the schema.

> (Alternately, we could use a UUID for the ID after all and have a giant composite index on (namespace + job ID + group + volume ID), which might make deleting individual claims easier for the user?)

I think for the current functionality we don't need a UUID; it's only useful for users who want to list claims and delete them. In fact, I'd rather remove the parts of the code that pertain to this to get things moving, if you're okay with that.

> I think we also need to track which claims we're picking. Although the scheduler is going to try to spread allocs for a given job out (binpacking is between jobs), it's a soft preference. We need to make sure that for a given evaluation we don't accidentally give the same claim out multiple times.

This is a more complex problem, and I'm not sure I really understand. Let's chat about this.

pkazmierczak (Contributor, Author):

> That implies the ID for a TaskGroupVolumeClaim should be (namespace + job ID + group + volume ID). Then we do a prefix get on the table for (namespace + job ID + group) to get all claims for the group. We return true if there are 0 claims or 1 claim with a matching volume ID. Otherwise false.

In the feasibility checker that's true. But in upsertAllocsImpl, when we check whether a claim exists, can there be multiple volume IDs claimed for the same namespace, job, and group? I want to make sure I'm getting the logic right here.

Member:

Yes, but in the upsertAllocsImpl case you're looking for exactly one of those claims for a specific volume ID because you know which node the allocation is on.
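For illustration, a minimal sketch of that exact-match read, assuming the table's "id" index becomes a compound index over (namespace, job ID, group, volume ID) and that the state store's transaction is a go-memdb txn; the index layout and function name are assumptions, while TableTaskGroupVolumeClaim is the constant already used in this PR:

import memdb "github.com/hashicorp/go-memdb"

// claimExists does a single exact-match lookup: since the allocation's node
// is already decided, the volume ID (and so the full composite key) is known.
func claimExists(txn *memdb.Txn, ns, jobID, tgName, volID string) (bool, error) {
	raw, err := txn.First(TableTaskGroupVolumeClaim, "id", ns, jobID, tgName, volID)
	if err != nil {
		return false, err
	}
	return raw != nil, nil
}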

Resolved review threads (outdated): nomad/structs/structs.go, nomad/state/state_store.go

// Delete task group volume claims
for _, tg := range job.TaskGroups {
if _, err = txn.DeleteAll(TableTaskGroupVolumeClaim, indexID, namespace, jobID, tg.Name); err != nil {
Member:

This would become txn.DeletePrefix if we go with the schema change I recommend in feasibility.go.
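A minimal sketch of how the loop above could change, assuming go-memdb's compound-index key encoding (sub-keys joined by null terminators) and a registered prefix index on the claim table's id index; the exact key construction is an assumption about the revised schema:

// Delete all claims for each task group in one call, instead of per volume ID.
for _, tg := range job.TaskGroups {
	// the trailing null terminator delimits the group name, so a group whose
	// name merely starts with tg.Name does not match the prefix
	prefix := namespace + "\x00" + jobID + "\x00" + tg.Name + "\x00"
	if _, err := txn.DeletePrefix(TableTaskGroupVolumeClaim, indexID+"_prefix", prefix); err != nil {
		return fmt.Errorf("deleting task group volume claims failed: %v", err)
	}
}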
