
[bug]: force_tiled_decode: true apparently ignored #7650

Open
1 task done
mcondarelli opened this issue Feb 17, 2025 · 2 comments
Labels
bug Something isn't working

Comments

mcondarelli commented Feb 17, 2025

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

Radeon RX 7600 XT

GPU VRAM

16GB

Version number

v5.6.2

Browser

Firefox 135.0

Python dependencies

accelerate 1.0.1
compel 2.0.2
cuda Not Installed
diffusers 0.31.0
numpy 1.26.4
opencv 4.9.0.80
onnx 1.16.1
pillow 11.1.0
python 3.11.11
torch 2.7.0.dev20250216+rocm6.3
torchvision 0.22.0.dev20250216+rocm6.3
transformers 4.46.3
xformers Not Installed

What happened

I enabled force_tiled_decode: true to prevent the huge VRAM spike (and OOM error) occurring during "Latents to Image" VAE decoding.
Apparently this setting has no effect, as I get an identical OOM error with and without it; in both cases 9.0 GiB were requested.

What you expected to happen

I expected VRAM requirements to be lower, possibly avoiding OOM.

How to reproduce the problem

This happens on my setup with any SDXL model (I tested with Juggernaut XL v9 and Dreamshaper XL v2 Turbo).
My current invokeai.yaml is:

# This is an example file with default and example settings.
# You should not copy this whole file into your config.
# Only add the settings you need to change to your config file.

# Internal metadata - do not edit:
schema_version: 4.0.2

# Put user settings here - see https://invoke-ai.github.io/InvokeAI/configuration/:
host: 0.0.0.0
port: 9090
device: cuda
#precision: float32
precision: bfloat16
enable_partial_loading: true
device_working_mem_gb: 8
force_tiled_decode: true
#vae_tile_size: 512
remote_api_tokens:
- url_regex: civitai.com
  token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Additional context

Apparently the root problem is that the default tile_size is 0, which somehow seems to disable tiling.
I noticed this while using the explicit Text to Image - SDXL workflow.
Enabling Tiled in the Latents to Image node is not enough: I need to put some value (256 or even 512 work for me) in Tile Size.
I did not find a "legitimate" way to set that value in normal operation (i.e., without using explicit workflows).
After a bit of snooping around I found where the default is set.
The following (VERY ugly) change works for me:

diff --git a/invokeai/app/invocations/latents_to_image.py b/invokeai/app/invocations/latents_to_image.py
index 4942ca5da..3ff1835f3 100644
--- a/invokeai/app/invocations/latents_to_image.py
+++ b/invokeai/app/invocations/latents_to_image.py
@@ -50,7 +50,7 @@ class LatentsToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
     tiled: bool = InputField(default=False, description=FieldDescriptions.tiled)
     # NOTE: tile_size = 0 is a special value. We use this rather than `int | None`, because the workflow UI does not
     # offer a way to directly set None values.
-    tile_size: int = InputField(default=0, multiple_of=8, description=FieldDescriptions.vae_tile_size)
+    tile_size: int = InputField(default=512, multiple_of=8, description=FieldDescriptions.vae_tile_size)
     fp32: bool = InputField(default=False, description=FieldDescriptions.fp32)
 
     def _estimate_working_memory(

... but perhaps a better solution is in order.

Discord username

mcon

RyanJDick (Collaborator) commented

Setting force_tiled_decode: true forces tiling to be used, but the default tile size (when tile_size=0) is determined based on the model architecture. For an SDXL model, the default tile size is 1024x1024 in image space (128x128 in latent space). So, enabling tiling will only have an effect if you are generating images above the standard resolution of 1024x1024.
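
For reference, here is that relationship as a minimal Python sketch (illustrative only, not the actual InvokeAI code path; the 8x factor is the SDXL VAE's spatial downsampling):

SDXL_VAE_SCALE_FACTOR = 8  # the SDXL VAE downsamples each spatial dimension by 8x

def default_latent_tile_size(image_tile_size_px: int = 1024) -> int:
    # tile_size = 0 means "tile at the model's native resolution",
    # i.e. 1024 image px -> 128 latent px for SDXL.
    return image_tile_size_px // SDXL_VAE_SCALE_FACTOR

def forced_tiling_has_effect(width_px: int, height_px: int, tile_px: int = 1024) -> bool:
    # With the default tile, forcing tiled decode only changes anything
    # when at least one image dimension exceeds the tile size.
    return width_px > tile_px or height_px > tile_px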

It sounds like you want to globally set the VAE tile size to be smaller than the model's native resolution in order to reduce memory usage. We'll have to add a new config to support this.


Here's a proposal for how this could be achieved:

  • Add a force_vae_decode_tile_size: int | None config.
    • As its name suggests, this would override the tile_size parameter passed to LatentsToImageInvocation (see the sketch after this list).
    • Default to None, preserving the current behaviour.
  • Add VAE tiling support to SD3LatentsToImageInvocation and FluxVaeDecodeInvocation. This should be pretty easy, just hasn't been done yet.
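
A rough sketch of the resolution order this implies (the helper and its wiring are hypothetical, not existing InvokeAI code; only force_vae_decode_tile_size is the proposed config name):

def resolve_vae_decode_tile_size(
    node_tile_size: int,                     # tile_size field on LatentsToImageInvocation (0 = model default)
    force_vae_decode_tile_size: int | None,  # proposed new config value
    model_default_tile_size: int,            # e.g. 128 latent px for an SDXL VAE
) -> int:
    # The global config override wins, so users can shrink tiles without editing workflows.
    if force_vae_decode_tile_size is not None:
        return force_vae_decode_tile_size
    # An explicit value set on the node keeps working as it does today.
    if node_tile_size > 0:
        return node_tile_size
    # tile_size == 0 preserves the current behaviour: the model's native tile size.
    return model_default_tile_size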

Another option that we have previously discussed is to automatically fall back to a tile size that allows VAE decode to complete. But this moves further away from the spirit of deterministic nodes, since the tile size does significantly impact the output image.

@hipsterusername @psychedelicious What do you think?

mcondarelli (Author) commented

I indeed need to avoid OOM errors on my non-mainstream Radeon RX 7600 XT with 16 GB VRAM.
Part of the problem is that current ROCm does not support float16 on my hardware (they say support will come eventually, but AMD couldn't commit to any date).
I am now working with precision: bfloat16 in invokeai.yaml, but "Latents to Image" can't use bfloat16, so force_tiled_decode: true seems to be the only option I have.

That said, there could be a third option (I don't know how difficult it would be): add support for bfloat16 to the VAE decoder.
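
Roughly what I mean, as a standalone diffusers sketch (not the InvokeAI code path; the fp16-fix VAE checkpoint and the random latents are just placeholders for illustration):

import torch
from diffusers import AutoencoderKL

# Example only: decode SDXL-shaped latents with the VAE cast to bfloat16.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.bfloat16
)
latents = torch.randn(1, 4, 128, 128, dtype=torch.bfloat16)  # a 1024x1024 image in latent space
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample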

IMHO, adding support for explicitly setting tile_size would be the preferred solution, and it would be consistent with the "Latents to Image" node in workflows.
Any other solution would seem too expensive given the limited audience.


On the same subject: is 16 GB VRAM considered insufficient nowadays? I tried a couple of FLUX models, but they die with an OOM while trying to allocate "just" 2.27 GiB.
