
[bug]: force_tiled_decode: true apparently ignored #7650

Open
1 task done
mcondarelli opened this issue Feb 17, 2025 · 2 comments
Labels
bug Something isn't working

Comments

mcondarelli commented Feb 17, 2025

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

Radeon RX 7600 XT

GPU VRAM

16GB

Version number

v5.6.2

Browser

Firefox 135.0

Python dependencies

accelerate 1.0.1
compel 2.0.2
cuda Not Installed
diffusers 0.31.0
numpy 1.26.4
opencv 4.9.0.80
onnx 1.16.1
pillow 11.1.0
python 3.11.11
torch 2.7.0.dev20250216+rocm6.3
torchvision 0.22.0.dev20250216+rocm6.3
transformers 4.46.3
xformers Not Installed

What happened

I enabled force_tiled_decode: true to prevent the huge VRAM spike (and OOM error) occurring during "Latents to Image" VAE decoding.
Apparently this setting has no effect, as I get an identical OOM error with and without it; in both cases 9.0 GiB were requested.

What you expected to happen

I expected VRAM requirements to be lower, possibly avoiding OOM.

How to reproduce the problem

This happens on my setup with any SDXL model (I tested with Juggernaut XL v9 and Dreamshaper XL v2 Turbo).
My current invokeai.yaml is:

# This is an example file with default and example settings.
# You should not copy this whole file into your config.
# Only add the settings you need to change to your config file.

# Internal metadata - do not edit:
schema_version: 4.0.2

# Put user settings here - see https://invoke-ai.github.io/InvokeAI/configuration/:
host: 0.0.0.0
port: 9090
device: cuda
#precision: float32
precision: bfloat16
enable_partial_loading: true
device_working_mem_gb: 8
force_tiled_decode: true
#vae_tile_size: 512
remote_api_tokens:
- url_regex: civitai.com
  token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Additional context

Apparently the root problem is that the default tile_size is 0, which somehow seems to disable tiling.
I noticed this while using the explicit Text to Image - SDXL workflow.
Enabling Tiled in the Latents to Image node is not enough: I need to put some value (256 or even 512 work for me) in Tile Size.
I did not find a "legitimate" way to set that value in normal operation (i.e., without using explicit workflows).
After a bit of snooping around I found where the default is set.
The following (VERY ugly) change works for me:

diff --git a/invokeai/app/invocations/latents_to_image.py b/invokeai/app/invocations/latents_to_image.py
index 4942ca5da..3ff1835f3 100644
--- a/invokeai/app/invocations/latents_to_image.py
+++ b/invokeai/app/invocations/latents_to_image.py
@@ -50,7 +50,7 @@ class LatentsToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
     tiled: bool = InputField(default=False, description=FieldDescriptions.tiled)
     # NOTE: tile_size = 0 is a special value. We use this rather than `int | None`, because the workflow UI does not
     # offer a way to directly set None values.
-    tile_size: int = InputField(default=0, multiple_of=8, description=FieldDescriptions.vae_tile_size)
+    tile_size: int = InputField(default=512, multiple_of=8, description=FieldDescriptions.vae_tile_size)
     fp32: bool = InputField(default=False, description=FieldDescriptions.fp32)
 
     def _estimate_working_memory(

... but perhaps a better solution is in order.

Discord username

mcon

RyanJDick (Collaborator) commented

Setting force_tiled_decode: true forces tiling to be used, but the default tile size (when tile_size=0) is determined based on the model architecture. For an SDXL model, the default tile size is 1024x1024 in image space (128x128 in latent space). So, enabling tiling will only have an effect if you are generating images above the standard resolution of 1024x1024.
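
For reference, here is that relationship as a minimal Python sketch (illustrative only, not the actual InvokeAI code path; the 8x factor is the SDXL VAE's spatial downsampling):

SDXL_VAE_SCALE_FACTOR = 8  # the SDXL VAE downsamples each spatial dimension by 8x

def default_latent_tile_size(image_tile_size_px: int = 1024) -> int:
    # tile_size = 0 means "tile at the model's native resolution",
    # i.e. 1024 image px -> 128 latent px for SDXL.
    return image_tile_size_px // SDXL_VAE_SCALE_FACTOR

def forced_tiling_has_effect(width_px: int, height_px: int, tile_px: int = 1024) -> bool:
    # With the default tile, forcing tiled decode only changes anything
    # when at least one image dimension exceeds the tile size.
    return width_px > tile_px or height_px > tile_px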

It sounds like you want to globally set the VAE tile size to be smaller than the model's native resolution in order to reduce memory usage. We'll have to add a new config to support this.


Here's a proposal for how this could be achieved:

  • Add a force_vae_decode_tile_size: int | None config.
    • As its name suggests, this would override the tile_size parameter passed to LatentsToImageInvocation (see the sketch after this list).
    • Default to None, preserving the current behaviour.
  • Add VAE tiling support to SD3LatentsToImageInvocation and FluxVaeDecodeInvocation. This should be pretty easy, just hasn't been done yet.
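
A rough sketch of the resolution order this implies (the helper and its wiring are hypothetical, not existing InvokeAI code; only force_vae_decode_tile_size is the proposed config name):

def resolve_vae_decode_tile_size(
    node_tile_size: int,                     # tile_size field on LatentsToImageInvocation (0 = model default)
    force_vae_decode_tile_size: int | None,  # proposed new config value
    model_default_tile_size: int,            # e.g. 128 latent px for an SDXL VAE
) -> int:
    # The global config override wins, so users can shrink tiles without editing workflows.
    if force_vae_decode_tile_size is not None:
        return force_vae_decode_tile_size
    # An explicit value set on the node keeps working as it does today.
    if node_tile_size > 0:
        return node_tile_size
    # tile_size == 0 preserves the current behaviour: the model's native tile size.
    return model_default_tile_size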

Another option that we have previously discussed is to automatically fall back to a tile size that allows VAE decode to complete. But this moves further away from the spirit of deterministic nodes, since the tile size does significantly impact the output image.

@hipsterusername @psychedelicious What do you think?

mcondarelli (Author) commented

I indeed need to avoid OOM errors on my non-mainstream Radeon RX 7600 XT with 16 GB VRAM.
Part of the problem is that current ROCm does not support float16 on my hardware (they say support will come eventually, but AMD couldn't commit to any date).
I am now working with precision: bfloat16 in invokeai.yaml, but "Latents to Image" can't use bfloat16, so force_tiled_decode: true seems to be the only option I have.

That said, there could be a third option (I don't know how difficult it would be): add support for bfloat16 to the VAE decoder.
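
Roughly what I mean, as a standalone diffusers sketch (not the InvokeAI code path; the fp16-fix VAE checkpoint and the random latents are just placeholders for illustration):

import torch
from diffusers import AutoencoderKL

# Example only: decode SDXL-shaped latents with the VAE cast to bfloat16.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.bfloat16
)
latents = torch.randn(1, 4, 128, 128, dtype=torch.bfloat16)  # a 1024x1024 image in latent space
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample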

IMHO, adding support for explicitly setting tile_size would be the preferred solution, and it would be consistent with the "Latents to Image" node in workflows.
Any other solution would seem too expensive given the limited audience.


On the same subject: is 16 GB VRAM considered insufficient nowadays? I tried a couple of FLUX models, but they die with an OOM while trying to allocate "just" 2.27 GiB.
