
Add pytorch_cuda_alloc_conf config to tune VRAM memory allocation #7673

Open · wants to merge 6 commits into base: ryan/tidy-entry

Conversation

@RyanJDick (Collaborator):

Summary

This PR adds a pytorch_cuda_alloc_conf config flag to control the behavior of the PyTorch CUDA memory allocator.

  • pytorch_cuda_alloc_conf defaults to None, preserving the current behavior.
  • The configuration options are explained here: https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf. Tuning this configuration can reduce peak reserved VRAM and improve performance.
  • Setting pytorch_cuda_alloc_conf: "backend:cudaMallocAsync" in invokeai.yaml is expected to work well on many systems, and is a good first step for anyone looking to tune this config. (We may make this the default in the future.) A sketch of how this setting takes effect is shown below the list.
  • The optimal configuration appears to depend on a number of factors such as device version, VRAM, CUDA version, etc. For now, users will have to experiment with this config to see whether it helps or hurts on their systems. In most cases, I expect it to help.
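
For readers curious how the option maps onto PyTorch: the sketch below assumes the config field is applied via the standard PYTORCH_CUDA_ALLOC_CONF environment variable (the mechanism described in the PyTorch docs linked above). The exact wiring inside InvokeAI may differ; this is illustrative only.

```python
import os

# Illustrative only: PyTorch reads the PYTORCH_CUDA_ALLOC_CONF environment variable,
# so setting `pytorch_cuda_alloc_conf: "backend:cudaMallocAsync"` in invokeai.yaml is
# assumed to be equivalent to exporting the variable before CUDA is initialized.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

import torch  # torch import / CUDA initialization must happen after the env var is set

if torch.cuda.is_available():
    # Reports which allocator backend is active ("native" or "cudaMallocAsync").
    print(torch.cuda.get_allocator_backend())
```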

Memory Tests

VAE decode memory usage comparison:

- SDXL, fp16, 1024x1024:
  - `cudaMallocAsync`: allocated=2593 MB, reserved=3200 MB
  - `native`:          allocated=2595 MB, reserved=4418 MB

- SDXL, fp32, 1024x1024:
  - `cudaMallocAsync`: allocated=3982 MB, reserved=5536 MB
  - `native`:          allocated=3982 MB, reserved=7276 MB

- SDXL, fp32, 1536x1536:
  - `cudaMallocAsync`: allocated=8643 MB, reserved=12032 MB
  - `native`:          allocated=8643 MB, reserved=15900 MB
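
The allocated/reserved figures above correspond to PyTorch's peak memory counters. A minimal sketch of how such numbers can be collected (not necessarily the exact harness used for these tests):

```python
import torch

# Clear the peak-memory counters before the workload under test.
torch.cuda.reset_peak_memory_stats()

# ... run the workload to be measured here (e.g. a VAE decode) ...

# Peak memory handed out to tensors vs. peak memory reserved by the allocator.
allocated_mb = torch.cuda.max_memory_allocated() / 2**20
reserved_mb = torch.cuda.max_memory_reserved() / 2**20
print(f"allocated={allocated_mb:.0f} MB, reserved={reserved_mb:.0f} MB")
```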

Related Issues / Discussions

N/A

QA Instructions

  • Performance tests with pytorch_cuda_alloc_conf unset.
  • Performance tests with pytorch_cuda_alloc_conf: "backend:cudaMallocAsync".

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions bot added the python (PRs that change python files), services (PRs that change app services), and python-tests (PRs that change python tests) labels on Feb 24, 2025
… config field that allows full customization of the CUDA allocator.
@github-actions bot added the docs (PRs that change docs) label on Feb 24, 2025
@RyanJDick marked this pull request as ready for review on February 24, 2025 at 20:57
@hipsterusername (Member):

As confirmation, I presume this does not play nicely on AMD?

@RyanJDick (Collaborator, Author):

> As confirmation, I presume this does not play nicely on AMD?

I haven't tested on AMD, but I would not expect the recommended config of backend:cudaMallocAsync to work there. That being said, the native allocator configs documented here might work with AMD (I don't have a way to test this, and couldn't find it clearly documented anywhere). We'd need someone to test whether they do and experiment to find a good recommendation.

Labels: DO NOT MERGE, docs (PRs that change docs), python (PRs that change python files), python-tests (PRs that change python tests), services (PRs that change app services)