GPU: Make Vulkan transfer buffers dedicated allocs #12256

Open · wants to merge 2 commits into main
Conversation

@thatcosmonaut (Collaborator) commented Feb 10, 2025

Our Vulkan memory allocation strategy is to make large heap allocations (64MB or larger) and then slice out sections of them for resource binding. This means we have to defragment regularly, or else holes will accumulate in the allocations.

The defrag process creates a new buffer on an available unfragmented allocation and enqueues a copy command to preserve the data. We do this copy for GPU buffers, but we skip it for transfer buffers, which causes persistently-used upload transfer data to be corrupted. An easy fix is to do what we already did for download transfer buffers and make upload transfer buffers dedicated allocations, meaning each one lives alone on its own heap allocation.
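
For illustration, a dedicated allocation in raw Vulkan looks roughly like the sketch below. This is not the actual SDL code; the helper name and `host_visible_type_index` are placeholders, and error handling is omitted.

```c
#include <vulkan/vulkan.h>

/* Sketch only: back one transfer buffer with its own VkDeviceMemory
 * ("dedicated allocation") instead of sub-allocating it from a shared
 * 64MB+ block, so defrag never has to move its contents. */
static VkBuffer create_dedicated_transfer_buffer(VkDevice device,
                                                 VkDeviceSize size,
                                                 uint32_t host_visible_type_index,
                                                 VkDeviceMemory *out_memory)
{
    VkBufferCreateInfo buffer_info = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
        .size = size,
        .usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
        .sharingMode = VK_SHARING_MODE_EXCLUSIVE
    };
    VkBuffer buffer;
    vkCreateBuffer(device, &buffer_info, NULL, &buffer);

    VkMemoryRequirements reqs;
    vkGetBufferMemoryRequirements(device, buffer, &reqs);

    /* Core in Vulkan 1.1: tells the driver this allocation backs exactly
     * one resource. */
    VkMemoryDedicatedAllocateInfo dedicated_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_DEDICATED_ALLOCATE_INFO,
        .buffer = buffer
    };
    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &dedicated_info,
        .allocationSize = reqs.size,
        .memoryTypeIndex = host_visible_type_index /* chosen elsewhere */
    };
    vkAllocateMemory(device, &alloc_info, NULL, out_memory); /* one kernel call per buffer */
    vkBindBufferMemory(device, buffer, *out_memory, 0);
    return buffer;
}
```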

The only drawback is the maxMemoryAllocationCount limit; the most common driver limit here (and the spec-required minimum) is 4096. Honestly, if you have that many live transfer buffers, you definitely need to redesign your renderer.
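
As a point of reference, the limit can be read from the device properties (a sketch; `physical_device` is assumed to exist):

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

/* Sketch: each dedicated transfer buffer consumes one slot against this
 * limit. The Vulkan spec guarantees at least 4096. */
static uint32_t query_max_allocations(VkPhysicalDevice physical_device)
{
    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(physical_device, &props);
    printf("maxMemoryAllocationCount: %u\n", props.limits.maxMemoryAllocationCount);
    return props.limits.maxMemoryAllocationCount;
}
```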

The alternative would be to preserve transfer contents in the defrag process. Since in the typical case transfer data does not need to be persistent, I think this would just be a waste of processing time.

Resolves #12242 (GPU transfer buffer contains data from another buffer after defrag)

@slime73 (Contributor) commented Feb 10, 2025

What sort of difference does this change make to the performance cost of creating or destroying a transfer buffer?

@thatcosmonaut (Collaborator, Author)

vkAllocateMemory is an expensive kernel call, but we already recommend designing your renderer to cache resources. Churning resources would be bad either way.

@slime73 (Contributor) commented Feb 10, 2025

Given that transfer data isn't usually persistent, would caching transfer buffers end up being basically a reimplementation of the sort of heap allocator that SDL_GPU already uses for other resource types?

I'm just imagining putting this directly into my own code, and it would probably end up requiring a bunch of extra work on my side, unless I'm missing something easy. Whereas with my native Vulkan code that uses VMA, I don't have to worry about that and can just make throwaway transfer buffers for uploading or downloading texture and buffer data to and from the GPU.

@thatcosmonaut (Collaborator, Author) commented Feb 10, 2025

If you're just using vmaCreateBuffer all that's doing under the hood is calling vkAllocateMemory. You would have to use a more complex allocation strategy to avoid that.

EDIT: After skimming some VMA code it looks like they do have a block system for allocations, much like we do. The main difference I guess is that VMA only defragments when you ask it to. Since we're targeting a Metal-esque level of complexity I don't think it's reasonable to force the user to reason about memory compaction.
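
For comparison, a throwaway staging buffer with VMA looks roughly like this (a sketch, not VMA internals; `allocator` is assumed to exist and error handling is omitted). By default VMA sub-allocates from its own memory blocks, much like our heap strategy; the flag noted in the comment opts a buffer into a dedicated allocation instead, which is what this PR does for transfer buffers.

```c
#include "vk_mem_alloc.h"

/* Sketch: VMA sub-allocates from its own memory blocks by default. */
static VkBuffer create_staging_buffer(VmaAllocator allocator, VkDeviceSize size,
                                      VmaAllocation *out_allocation)
{
    VkBufferCreateInfo buffer_info = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
        .size = size,
        .usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT
    };
    VmaAllocationCreateInfo alloc_info = {
        .flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT |
                 VMA_ALLOCATION_CREATE_MAPPED_BIT,
        /* add VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT above to force one
           vkAllocateMemory per buffer instead of pooling */
        .usage = VMA_MEMORY_USAGE_AUTO
    };
    VkBuffer buffer;
    vmaCreateBuffer(allocator, &buffer_info, &alloc_info, &buffer,
                    out_allocation, NULL);
    return buffer;
}
```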

@slime73 (Contributor) commented Feb 10, 2025

> Since we're targeting a Metal-esque level of complexity I don't think it's reasonable to force the user to reason about memory compaction.

Yeah, I completely agree. I just wonder if this change will create that need for users because they'd have to make their own memory allocation scheme to deal with the cost of transfer buffers.

@thatcosmonaut (Collaborator, Author) commented Feb 10, 2025

Let's consider the use cases.

  1. You have to do some uploads every frame, like in the case of a sprite batcher. In this case you should just hang on to a transfer buffer object that manages the uploads; when you map the transfer buffer, set the cycle parameter to true to avoid having to manually pool transfers (see the sketch after this list).

  2. You are loading content at the start of your game. In this case you would create a transfer buffer, perform your uploads, and tear it down. This is heavy anyway so the extra overhead of creating a transfer buffer doesn't really matter.

  3. You are reading back data every frame. In this case you would have one persistent transfer buffer per frame in flight.

  4. You are reading back data sporadically, like saving a screenshot. In this case you would hold on to a persistent transfer buffer that's big enough to contain the screenshot buffer. You could also just set up and tear down a transfer buffer on a thread since presumably you would want the screenshot saving process to be asynchronous.
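
Here's a minimal sketch of case 1 using the SDL_GPU API. Names like `sprite_buffer`, `sprite_data`, and the 64KB size are illustrative, and error handling is omitted:

```c
#include <SDL3/SDL.h>

#define SPRITE_DATA_SIZE (64 * 1024)

/* One transfer buffer created at startup and reused every frame; no
 * transfer buffers are created or destroyed at runtime. */
static SDL_GPUTransferBuffer *transfer;

void init_sprite_upload(SDL_GPUDevice *device)
{
    SDL_GPUTransferBufferCreateInfo info = {
        .usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD,
        .size = SPRITE_DATA_SIZE
    };
    transfer = SDL_CreateGPUTransferBuffer(device, &info);
}

void upload_sprites(SDL_GPUDevice *device, SDL_GPUBuffer *sprite_buffer,
                    const void *sprite_data)
{
    /* cycle = true hands back fresh memory if last frame's copy is still
     * in flight, so the same object can be mapped every frame. */
    void *mapped = SDL_MapGPUTransferBuffer(device, transfer, true);
    SDL_memcpy(mapped, sprite_data, SPRITE_DATA_SIZE);
    SDL_UnmapGPUTransferBuffer(device, transfer);

    SDL_GPUCommandBuffer *cmd = SDL_AcquireGPUCommandBuffer(device);
    SDL_GPUCopyPass *copy_pass = SDL_BeginGPUCopyPass(cmd);
    SDL_GPUTransferBufferLocation src = { .transfer_buffer = transfer, .offset = 0 };
    SDL_GPUBufferRegion dst = { .buffer = sprite_buffer, .offset = 0, .size = SPRITE_DATA_SIZE };
    SDL_UploadToGPUBuffer(copy_pass, &src, &dst, false);
    SDL_EndGPUCopyPass(copy_pass);
    SDL_SubmitGPUCommandBuffer(cmd);
}
```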

I'm not really seeing where it would be necessary or less troublesome to be churning transfer buffers constantly. It's good practice in general not to touch the driver anyway.

@slime73 (Contributor) commented Feb 11, 2025

So one of the things I have in my app is multiple texture atlases for font glyphs (one per font+size combination). Those atlases get populated with rasterized glyphs on demand at runtime, whenever a needed glyph is missing from the atlas. The cost of uploading on demand is expected to be fairly minimal, low enough that stutters aren't going to be a real problem.

So far I have not needed any higher-level transfer buffer memory pool for that (especially since I use VMA with my Vulkan backend). If the practical cost of making SDL_GPU transfer buffers use dedicated memory is low enough, then I still won't need one if I use SDL_GPU. This sort of thing is why I'm interested in real measurements rather than theoretical numbers.

I also have recent experience with small dedicated allocations in D3D12 on Windows being very slow, enough that the engine doing that had performance issues compared to D3D11 purely because of those allocations. It's always good to measure this sort of thing, in my opinion.

@flibitijibibo (Collaborator)

Yeah, I'll be running the FNA3D traces tomorrow... we cap our cycles but we also have a separate path for abnormally large transfers, so I could see it getting weird with a big enough test. Will try to confirm soon!
