flux crashes with latest ggml #553

Open
jeffbolznv opened this issue Jan 6, 2025 · 8 comments

@jeffbolznv

I'm using ggml @ c8bd0fee71dc8328d93be301bbee06bc10d30429 and sd @ dcf91f9 with the vulkan backend. I'm trying to run a flux model with this command line:

sd --diffusion-model flux1-dev-Q2_K.gguf --vae ae.safetensors --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v

I get a divide-by-zero crash in ggml_row_size because GGML_TYPE_Q4_0_4_8 is no longer supported after ggerganov/llama.cpp#10446. Is there a way to repack this, or do I need to use a different model or something? I'm generally just trying to run anything using flux to look at the performance.
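
For reference, the crash path is that ggml_row_size divides by the per-type block size, and the traits entry looked up for a type id the library doesn't recognize presumably comes back with a block size of 0. A minimal self-contained sketch of that failure mode (using a stand-in traits table, not ggml's real one):

```cpp
#include <cstdint>
#include <cstdio>

// Stand-in for ggml's per-type traits table (not the real one): the entry for a
// removed or out-of-range type id is zeroed, so blck_size is 0.
struct demo_traits { int64_t blck_size; size_t type_size; };

static const demo_traits traits[2] = {
    /* supported block-quantized type */ { 32, 18 },
    /* removed / unknown type         */ {  0,  0 },
};

// Same shape as ggml_row_size(type, ne): bytes per row = type_size * ne / blck_size.
static size_t row_size_demo(int type, int64_t ne) {
    return traits[type].type_size * ne / traits[type].blck_size; // blck_size == 0 -> SIGFPE
}

int main() {
    std::printf("%zu\n", row_size_demo(0, 4096)); // ok: 2304 bytes
    std::printf("%zu\n", row_size_demo(1, 4096)); // integer divide-by-zero, like the crash above
}
```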

@stduhpf

This comment was marked as outdated.

@LostRuins
Contributor

@jeffbolznv According to the tensor list, a Q2_K model should not contain any GGML_TYPE_Q4_0_4_8 tensors.

Did you check that the enums for sd_type_t exactly match the enums for ggml_type? They should have the same number of entries, with TYPE_COUNT equal to 39 for both.

[two screenshots attached]

@jeffbolznv
Author

> Did you check that the enums for sd_type_t exactly match the enums for ggml_type? They should have the same number of entries, with TYPE_COUNT equal to 39 for both.

Ah, this was it. I'm not sure where I got Q4_0_4_8 from; running it in the debugger again, I see the invalid format was 36 (equal to SD_TYPE_COUNT). It would be nice to be more robust to these mismatches, or at least static_assert that the COUNT values are the same.
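
Something along these lines would catch it at compile time (a sketch; the header paths are assumptions, and SD_TYPE_COUNT / GGML_TYPE_COUNT are the count sentinels of the two enums):

```cpp
// Compile-time guard that the sd_type_t mirror stays in sync with ggml_type.
// Header paths are assumptions; adjust to wherever the two enums are declared.
#include "ggml.h"
#include "stable-diffusion.h"

static_assert((int) SD_TYPE_COUNT == (int) GGML_TYPE_COUNT,
              "sd_type_t must mirror ggml_type exactly (same entries, same order)");
```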

@stduhpf
Contributor

stduhpf commented Jan 7, 2025

@LostRuins Good catch, that's exactly what's going on. I wrongly assumed that by "latest ggml" jeffbolznv meant commit 6fcbd60, without paying attention to the details of the message.

@stduhpf
Contributor

stduhpf commented Jan 7, 2025

By the way @jeffbolznv, I know this is completely off topic, but do you know how we could make LoRAs work with quantized models on Vulkan? Right now it crashes with "Missing CPY op for types: f32 [quant_type]". I tried looking into it myself, but I can't grasp how the vulkan backend fundamentally works, so I had no luck.

@jeffbolznv
Author

> Right now it crashes with "Missing CPY op for types: f32 [quant_type]". I tried looking into it myself, but I can't grasp how the vulkan backend fundamentally works, so I had no luck.

I'm surprised it crashes, since ggml_backend_vk_device_supports_op should return false for this. I thought ggml was supposed to fall back to the host in that case, but maybe you have to use ggml a certain way to allow for the fallback.
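
The fallback I have in mind is the scheduler path: hand ggml_backend_sched both the Vulkan and CPU backends, and ops the device rejects via supports_op should get placed on the CPU instead. A rough sketch, assuming the scheduler API around this ggml revision (header names and the exact ggml_backend_sched_new signature may differ):

```cpp
#include "ggml.h"
#include "ggml-backend.h"
#include "ggml-vulkan.h"
#include "ggml-cpu.h"

// Schedule the graph across Vulkan + CPU so that ops the Vulkan backend rejects
// via supports_op (e.g. a missing CPY variant) run on the CPU backend instead.
static ggml_backend_sched_t make_sched_with_cpu_fallback() {
    static ggml_backend_t backends[2] = {
        ggml_backend_vk_init(0),  // primary device
        ggml_backend_cpu_init(),  // fallback; the CPU backend goes last
    };
    return ggml_backend_sched_new(backends, /*bufts=*/nullptr, /*n_backends=*/2,
                                  GGML_DEFAULT_GRAPH_SIZE, /*parallel=*/false);
}
// ...then compute with ggml_backend_sched_graph_compute(sched, graph) instead of
// running the graph on the Vulkan backend directly.
```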

I looked at cpy.cu; it seems like it should not be terribly hard to implement similar shaders in Vulkan. Can you file an issue to track this? I may be able to do it soon.

@stduhpf
Contributor

stduhpf commented Jan 7, 2025

> Can you file an issue to track this? I may be able to do it soon.

Here, or on the ggml repo?

@jeffbolznv
Author

Maybe on llama.cpp, since that's where most of the work is happening.
