flux crashes with latest ggml #553

I'm using ggml @ c8bd0fee71dc8328d93be301bbee06bc10d30429 and sd @ dcf91f9, with the Vulkan backend, trying to run a flux model from the command line.

I get a divide-by-zero crash in ggml_row_size because GGML_TYPE_Q4_0_4_8 is no longer supported after ggerganov/llama.cpp#10446. Is there a way to repack this, or do I need to use a different model or something? I'm generally just trying to run anything using flux to look at the performance.
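For context on the failure mode: ggml_row_size divides by the type's block size, so a type id that no longer has a traits entry yields a zero divisor. A rough sketch of the computation, assuming ggml's usual public helpers (the literal internals vary between revisions):

```cpp
#include "ggml.h"

// Rough sketch of what ggml_row_size() does internally (not the literal
// source): for a removed or out-of-range type id, the traits table entry
// is empty, ggml_blck_size() returns 0, and the division traps.
size_t row_size_sketch(enum ggml_type type, int64_t ne) {
    return ggml_type_size(type) * ne / ggml_blck_size(type);
}
```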
Comments
@jeffbolznv According to the tensor list, q2k should not contain any GGML_TYPE_Q4_0_4_8 inside. Did you check if the enums for sd_type_t and ggml_type still match?
Ah, this was it. I'm not sure where I got Q4_0_4_8 from; running it in the debugger again, I see the invalid format was 36 (equal to SD_TYPE_COUNT). It would be nice to be more robust to these mismatches, or at least static_assert that the COUNT values are the same.
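A minimal sketch of that guard, assuming sd_type_t in stable-diffusion.h is meant to mirror ggml_type one-to-one (placement and wording are illustrative):

```cpp
#include "ggml.h"              // enum ggml_type, GGML_TYPE_COUNT
#include "stable-diffusion.h"  // enum sd_type_t, SD_TYPE_COUNT

// Breaks the build as soon as the two enums drift apart, instead of
// letting an out-of-range type value reach ggml_row_size() at runtime.
static_assert((int) SD_TYPE_COUNT == (int) GGML_TYPE_COUNT,
              "sd_type_t is out of sync with ggml_type");
```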
@LostRuins Good catch, that's exactly what's going on. I wrongly assumed that by "latest ggml" jeffbolznv meant commit 6fcbd60, without paying attention to the details of the message.
By the way @jeffbolznv, I know this is completely off topic, but do you know how we could make LoRAs work with quantized models on Vulkan? Right now it crashes with:
I'm surprised it crashes, since ggml_backend_vk_device_supports_op should return false for this. I thought ggml was supposed to fall back to the host in that case, but maybe you have to use ggml a certain way to allow for the fallback. I looked at cpy.cu; it seems like it should not be terribly hard to implement similar shaders in Vulkan. Can you file an issue to track this? I may be able to do it soon.
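For reference, the host fallback normally comes from scheduling the graph across Vulkan plus a CPU backend; running a graph on the Vulkan backend directly leaves no fallback path, which could explain the crash. A sketch under that assumption, using the ggml_backend_sched API as of that era (signatures have shifted between ggml revisions, so treat this as illustrative):

```cpp
#include "ggml-backend.h"

// Register the CPU backend after Vulkan so the scheduler can assign any
// op rejected by ggml_backend_vk_device_supports_op() to the CPU instead
// of crashing. vk_backend and cpu_backend are assumed to be initialized.
ggml_backend_t backends[2] = { vk_backend, cpu_backend };
ggml_backend_buffer_type_t bufts[2] = {
    ggml_backend_get_default_buffer_type(vk_backend),
    ggml_backend_get_default_buffer_type(cpu_backend),
};
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, bufts, /*n_backends=*/2, GGML_DEFAULT_GRAPH_SIZE, /*parallel=*/false);
```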
Here, or on the ggml repo?
Maybe on llama.cpp, since that's where most work is happening.