Skip to content

Commit

Permalink
Bugfix: Offload of GGML-quantized model in torch.inference_mode() cm (
Browse files Browse the repository at this point in the history
#7525)

## Summary

This PR contains a bugfix for an edge case with model unloading (from
VRAM to RAM). Thanks to @JPPhoto for finding it.

The bug was triggered under the following conditions:
- A GGML-quantized model is loaded in VRAM
- We run a Spandrel image-to-image invocation (which is wrapped in a
`torch.inference_mode()` context manager.
- The model cache attempts to unload the GGML-quantized model from VRAM
to RAM.
- Doing this inside of the `torch.inference_mode()` cm results in the
following error:
```
 [2025-01-07 15:48:17,744]::[InvokeAI]::ERROR --> Error while invoking session 98a07259-0c03-4111-a8d8-107041cb86f9, invocation d8daa90b-7e4c-4fc4-807c-50ba9be1a4ed (spandrel_image_to_image): Cannot set version_counter for inference tensor
[2025-01-07 15:48:17,744]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/home/ryan/src/InvokeAI/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
    output = invocation.invoke_internal(context=context, services=self._services)
  File "/home/ryan/src/InvokeAI/invokeai/app/invocations/baseinvocation.py", line 300, in invoke_internal
    output = self.invoke(context)
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ryan/src/InvokeAI/invokeai/app/invocations/spandrel_image_to_image.py", line 167, in invoke
    with context.models.load(self.image_to_image_model) as spandrel_model:
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/load_base.py", line 60, in __enter__
    self._cache.lock(self._cache_record, None)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 224, in lock
    self._load_locked_model(cache_entry, working_mem_bytes)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 272, in _load_locked_model
    vram_bytes_freed = self._offload_unlocked_models(model_vram_needed, working_mem_bytes)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 458, in _offload_unlocked_models
    cache_entry_bytes_freed = self._move_model_to_ram(cache_entry, vram_bytes_to_free)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 330, in _move_model_to_ram
    return cache_entry.cached_model.partial_unload_from_vram(
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ryan/src/InvokeAI/invokeai/backend/model_manager/load/model_cache/cached_model/cached_model_with_partial_load.py", line 182, in partial_unload_from_vram
    cur_state_dict = self._model.state_dict()
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1939, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1936, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/ryan/.pyenv/versions/3.10.14/envs/InvokeAI_3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1843, in _save_to_state_dict
    destination[prefix + name] = param if keep_vars else param.detach()
RuntimeError: Cannot set version_counter for inference tensor
```

### Explanation

From the `torch.inference_mode()` docs:
> Code run under this mode gets better performance by disabling view
tracking and version counter bumps.

Disabling version counter bumps results in the aforementioned error when
saving `GGMLTensor`s to a state_dict.

This incompatibility between `GGMLTensors` and `torch.inference_mode()`
is likely caused by the custom tensor type implementation. There may
very well be a way to get these to cooperate, but for now it is much
simpler to remove the `torch.inference_mode()` contexts.

Note that there are several other uses of `torch.inference_mode()` in
the Invoke codebase, but they are all tight wrappers around the
inference forward pass and do not contain the model load/unload process.

## Related Issues / Discussions

Original discussion:
https://discord.com/channels/1020123559063990373/1149506274971631688/1326180753159094303

## QA Instructions

Find a sequence of operations that triggers the condition. For me, this
was:
- Reserve VRAM in a separate process so that there was ~12GB left.
- Fresh start of Invoke
- Run FLUX inference with a GGML 8K model
- Run Spandrel upscaling

Tests:
- [x] Confirmed that I can reproduce the error and that it is no longer
hit after the change
- [x] Confirm that there is no speed regression from switching from
`torch.inference_mode()` to `torch.no_grad()`.
    - Before: `50.354s`, After: `51.536s`


## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
  • Loading branch information
RyanJDick authored Jan 7, 2025
2 parents 67e948b + 85eb4f0 commit 6b18f27
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions invokeai/app/invocations/spandrel_image_to_image.py
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ def upscale_image(

return pil_image

@torch.inference_mode()
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ImageOutput:
# Images are converted to RGB, because most models don't support an alpha channel. In the future, we may want to
# revisit this.
Expand Down Expand Up @@ -197,7 +197,7 @@ class SpandrelImageToImageAutoscaleInvocation(SpandrelImageToImageInvocation):
description="If true, the output image will be resized to the nearest multiple of 8 in both dimensions.",
)

@torch.inference_mode()
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ImageOutput:
# Images are converted to RGB, because most models don't support an alpha channel. In the future, we may want to
# revisit this.
Expand Down

0 comments on commit 6b18f27

Please sign in to comment.