
Increase VAE decode memory estimates #7674

Draft · wants to merge 2 commits into ryan/vae-decode-mem from ryan/increase-vae-working-mem
Conversation

@RyanJDick (Collaborator) commented Feb 24, 2025

Summary

This PR increases the working memory estimates for VAE decode operations. The previous values were derived from allocated memory and were aggressive about making full use of the available VRAM. The updated values are based on experimentally observed reserved memory (not just allocated memory), and they are deliberately conservative, with some buffer room included.
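For context, working-memory estimates of this kind typically scale with the output image size and dtype width. The sketch below is a hypothetical illustration of that shape, not the actual code or constants from this PR (the function name, scaling constant, and buffer multiplier are all assumptions):

```python
def estimate_vae_decode_working_memory_bytes(
    out_height: int,
    out_width: int,
    bytes_per_element: int,  # 2 for fp16, 4 for fp32
    mem_per_pixel_byte: float = 2200.0,  # hypothetical scaling constant
    buffer: float = 1.1,  # hypothetical safety margin over the raw estimate
) -> int:
    """Rough working-memory estimate for a VAE decode.

    Scales linearly with the number of output pixels and the element size,
    then pads the result so the estimate tracks observed *reserved* memory
    rather than just allocated memory.
    """
    raw = out_height * out_width * bytes_per_element * mem_per_pixel_byte
    return int(raw * buffer)
```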

This change is intended to address reports of slow VAE decoding.

The more conservative memory estimates could cause more model weights to be offloaded from the GPU during VAE decode, which in turn could slow subsequent runs while those models are reloaded. The net impact of this change is expected to be positive, but some users may notice a regression. We will want to keep a close eye on this when it is released in an RC.
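To make the offloading tradeoff concrete: before a decode runs, a model cache typically frees GPU-resident weights until the working-memory estimate fits, so a larger estimate can evict models that must then be reloaded on the next run. A generic sketch of that logic (not InvokeAI's actual cache code; all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class CachedModel:
    name: str
    size_bytes: int
    on_gpu: bool = True

def make_room_for(
    working_memory_bytes: int,
    models: list[CachedModel],
    free_vram_bytes: int,
) -> int:
    """Offload GPU-resident models (smallest first; illustrative policy only)
    until the requested working memory fits in free VRAM. Returns the new
    free VRAM."""
    for m in sorted((m for m in models if m.on_gpu), key=lambda m: m.size_bytes):
        if free_vram_bytes >= working_memory_bytes:
            break
        m.on_gpu = False  # offloaded to CPU: must be reloaded on the next run
        free_vram_bytes += m.size_bytes
    return free_vram_bytes
```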

Related Issues / Discussions

N/A

QA Instructions

I tested several common VAE decode scenarios with memory profiling enabled. All tests were run with `pytorch_cuda_alloc_conf: "backend:cudaMallocAsync"`. Results:

| Precision | Model | Resolution | Estimated (MB) | Reserved (MB) |
|-----------|-------|------------|----------------|---------------|
| fp32      | SD1   | 512x512    | 2450           | 1696          |
| fp16      | SDXL  | 1024x1024  | 4400           | 3232          |
| fp32      | SDXL  | 1024x1024  | 9050           | 5568          |
| fp16      | FLUX  | 1024x1024  | 4400           | 2944          |
| fp16      | SD3   | 1024x1024  | 4400           | 3136          |

Note: Even in fp32, we apply optimizations that run some layers in fp16. The estimate does not account for this, which explains the large buffer in the fp32 SDXL case.
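For reference, the allocated-versus-reserved distinction above can be measured directly with PyTorch's CUDA memory APIs; a minimal profiling snippet (the decode call itself is elided):

```python
import torch

# The allocator backend must be configured before CUDA is initialized, e.g.:
#   PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync python profile_decode.py

torch.cuda.reset_peak_memory_stats()

# ... run the VAE decode under test here ...

allocated = torch.cuda.max_memory_allocated()  # peak bytes handed out to tensors
reserved = torch.cuda.max_memory_reserved()    # peak bytes held by the allocator
print(f"peak allocated: {allocated / 2**20:.0f} MB, peak reserved: {reserved / 2**20:.0f} MB")
```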

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions bot added the python, invocations, and docs labels on Feb 24, 2025
@RyanJDick force-pushed the ryan/increase-vae-working-mem branch from fec3d0e to d7b5a6a on February 24, 2025