Is there an existing issue for this problem?
Operating system
Linux
GPU vendor
AMD (ROCm)
GPU model
Radeon RX 7600 XT
GPU VRAM
16GB
Version number
v5.6.2
Browser
Firefox 135.0
Python dependencies
accelerate 1.0.1
compel 2.0.2
cuda Not Installed
diffusers 0.31.0
numpy 1.26.4
opencv 4.9.0.80
onnx 1.16.1
pillow 11.1.0
python 3.11.11
torch 2.7.0.dev20250216+rocm6.3
torchvision 0.22.0.dev20250216+rocm6.3
transformers 4.46.3
xformers Not Installed
What happened
I enabled force_tiled_decode: true to prevent the huge VRAM spike (and OOM error) occurring during "Latents to Image" VAE decoding.
Apparently this setting has no effect, as I get an identical OOM error with and without it; in both cases 9.0 GiB were requested.
What you expected to happen
I expected VRAM requirements to be lower, possibly avoiding OOM.
How to reproduce the problem
This happens on my setup with any SDXL model (I tested with Juggernaut XL v9 and Dreamshaper XL v2 Turbo).
My current invokeai.yaml is:
# This is an example file with default and example settings.
# You should not copy this whole file into your config.
# Only add the settings you need to change to your config file.
# Internal metadata - do not edit:
schema_version: 4.0.2
# Put user settings here - see https://invoke-ai.github.io/InvokeAI/configuration/:
host: 0.0.0.0
port: 9090
device: cuda
#precision: float32
precision: bfloat16
enable_partial_loading: true
device_working_mem_gb: 8
force_tiled_decode: true
#vae_tile_size: 512
remote_api_tokens:
- url_regex: civitai.com
token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Additional context
Apparently the root problem is that the default tile_size is 0, and this seems to effectively disable tiling.
I noticed this using the explicit Text to Image - SDXL workflow.
Enabling Tiled in Latents to Image is not enough: I also need to put some value in Tile Size (256 or even 512 works for me).
I did not find a "legitimate" way to set that value in normal operation (i.e. without using explicit workflows).
After a bit of snooping around I found where the default is set.
The following (VERY ugly) change works for me:
diff --git a/invokeai/app/invocations/latents_to_image.py b/invokeai/app/invocations/latents_to_image.py
index 4942ca5da..3ff1835f3 100644
--- a/invokeai/app/invocations/latents_to_image.py
+++ b/invokeai/app/invocations/latents_to_image.py
@@ -50,7 +50,7 @@ class LatentsToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
     tiled: bool = InputField(default=False, description=FieldDescriptions.tiled)
     # NOTE: tile_size = 0 is a special value. We use this rather than `int | None`, because the workflow UI does not
     # offer a way to directly set None values.
-    tile_size: int = InputField(default=0, multiple_of=8, description=FieldDescriptions.vae_tile_size)
+    tile_size: int = InputField(default=512, multiple_of=8, description=FieldDescriptions.vae_tile_size)
     fp32: bool = InputField(default=False, description=FieldDescriptions.fp32)

     def _estimate_working_memory(
... but perhaps a better solution is in order.
Discord username
mcon
Setting force_tiled_decode: true forces tiling to be used, but the default tile size (when tile_size=0) is determined based on the model architecture. For an SDXL model, the default tile size is 1024x1024 in image space (128x128 in latent space). So, enabling tiling will only have an effect if you are generating images above the standard resolution of 1024x1024.
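As a rough illustration of why that default makes tiling a no-op at the native resolution (the factor of 8 below is the standard SDXL VAE scale factor; this snippet only shows the scaling and is not actual InvokeAI code):

import math

VAE_SCALE_FACTOR = 8  # SDXL VAE downsamples each side by 8x (1024 px -> 128 latents)

def tiles_per_side(image_px: int, tile_px: int) -> int:
    """How many VAE tiles are needed per side for a square image."""
    latent_size = image_px // VAE_SCALE_FACTOR
    latent_tile = tile_px // VAE_SCALE_FACTOR
    return math.ceil(latent_size / latent_tile)

print(tiles_per_side(1024, 1024))  # 1 -> a single tile, i.e. tiling changes nothing
print(tiles_per_side(1024, 512))   # 2 per side -> 4 smaller tiles, lower peak VRAM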
It sounds like you want to globally set the VAE tile size to be smaller than the model's native resolution in order to reduce memory usage. We'll have to add a new config to support this.
Here's a proposal for how this could be achieved:
Add a force_vae_decode_tile_size: int | None config.
As its name suggests, this would override the tile_size parameter of the LatentsToImageInvocation (see the sketch after this list).
Default to None, preserving the current behaviour.
Add VAE tiling support to SD3LatentsToImageInvocation and FluxVaeDecodeInvocation. This should be pretty easy, just hasn't been done yet.
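A minimal sketch of what the override could look like, assuming a new invokeai.yaml field named force_vae_decode_tile_size as proposed above (the helper name and exact wiring are hypothetical, not the current InvokeAI API):

# Hypothetical sketch - not the current InvokeAI implementation.
def resolve_vae_tile_size(node_tile_size: int, force_vae_decode_tile_size: int | None) -> int:
    """Pick the effective tile size for LatentsToImageInvocation.

    node_tile_size == 0 keeps today's meaning of "use the model's default";
    a configured force_vae_decode_tile_size would take precedence, so users
    could shrink VAE decode tiles globally without building a workflow.
    """
    if force_vae_decode_tile_size is not None:
        return force_vae_decode_tile_size  # global override from invokeai.yaml
    return node_tile_size                  # unchanged current behaviour

# Example: node left at its default of 0, config set to 512 -> 512 px tiles.
assert resolve_vae_tile_size(0, 512) == 512
assert resolve_vae_tile_size(0, None) == 0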
Another option that we have previously discussed is to automatically fall back to a tile size that allows the VAE decode to complete. But this moves further away from the spirit of deterministic nodes, since the tile size does significantly impact the output image.
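For completeness, a rough sketch of that automatic fallback idea (purely illustrative; decode_fn is a hypothetical callable standing in for the tiled VAE decode):

# Hypothetical sketch of the automatic-fallback option - not existing InvokeAI code.
import torch

def decode_with_fallback(decode_fn, latents, tile_size: int = 0, min_tile_size: int = 128):
    """Retry a VAE decode with progressively smaller tiles on CUDA OOM.

    tile_size == 0 means "model default". Note the caveat above: the tile
    size affects the decoded image, so changing it silently breaks the
    determinism that nodes are supposed to provide.
    """
    while True:
        try:
            return decode_fn(latents, tile_size)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            tile_size = 512 if tile_size == 0 else tile_size // 2
            if tile_size < min_tile_size:
                raise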
I do indeed need to avoid OOM errors on my non-mainstream Radeon RX 7600 XT with 16GB VRAM.
Part of the problem is that the current ROCm does not support float16 on my hardware (AMD says support will come eventually, but couldn't commit to any date).
I am now working with precision: bfloat16 in invokeai.yaml, but "Latents to Image" can't use bfloat16, so force_tiled_decode: true seems to be the only option I have.
That said, there could be a third option (I don't know how difficult it would be): adding bfloat16 support to the VAE decoder.
IMHO, adding support for explicitly setting tile_size would be the preferred solution, and it would be consistent with the "Latents to Image" node in workflows.
Any other solution seems too expensive given the limited audience.
On the same subject: is 16GB of VRAM considered insufficient nowadays? I tried a couple of FLUX models, but they die with OOM while trying to allocate "just" 2.27 GiB.