flux crashes with latest ggml #553

I'm using ggml @ c8bd0fee71dc8328d93be301bbee06bc10d30429 and sd @ dcf91f9, with the Vulkan backend, trying to run a flux model from the command line.

I get a divide-by-zero crash in ggml_row_size because GGML_TYPE_Q4_0_4_8 is no longer supported after ggerganov/llama.cpp#10446. Is there a way to repack this, or do I need to use a different model or something? I'm generally just trying to run anything using flux to look at the performance.
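For context on the failure mode: ggml_row_size divides by the type's block size, so a type id that no longer has a traits entry yields a zero divisor. A rough sketch of the computation, assuming ggml's usual public helpers (the literal internals vary between revisions):

```cpp
#include "ggml.h"

// Rough sketch of what ggml_row_size() does internally (not the literal
// source): for a removed or out-of-range type id, the traits table entry
// is empty, ggml_blck_size() returns 0, and the division traps.
size_t row_size_sketch(enum ggml_type type, int64_t ne) {
    return ggml_type_size(type) * ne / ggml_blck_size(type);
}
```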
Comments
@jeffbolznv According to the tensor list, q2k should not contain any GGML_TYPE_Q4_0_4_8 inside. Did you check if the enums for sd_type_t and ggml_type still match?
Ah, this was it. I'm not sure where I got Q4_0_4_8 from; running it in the debugger again, I see the invalid format was 36 (equal to SD_TYPE_COUNT). It would be nice to be more robust to these mismatches, or at least static_assert that the COUNT values are the same.
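A minimal sketch of that guard, assuming sd_type_t in stable-diffusion.h is meant to mirror ggml_type one-to-one (placement and wording are illustrative):

```cpp
#include "ggml.h"              // enum ggml_type, GGML_TYPE_COUNT
#include "stable-diffusion.h"  // enum sd_type_t, SD_TYPE_COUNT

// Breaks the build as soon as the two enums drift apart, instead of
// letting an out-of-range type value reach ggml_row_size() at runtime.
static_assert((int) SD_TYPE_COUNT == (int) GGML_TYPE_COUNT,
              "sd_type_t is out of sync with ggml_type");
```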
@LostRuins Good catch, that's exactly what's going on. I wrongly assumed that by "latest ggml" jeffbolznv meant commit 6fcbd60, without paying attention to the details of the message.
By the way @jeffbolznv, I know this is completely off topic, but do you know how we could make LoRAs work with quantized models on Vulkan? Right now it crashes with:
I'm surprised it crashes, since ggml_backend_vk_device_supports_op should return false for this. I thought ggml was supposed to fall back to the host in that case, but maybe you have to use ggml a certain way to allow for the fallback. I looked at cpy.cu; it seems like it should not be terribly hard to implement similar shaders in Vulkan. Can you file an issue to track this? I may be able to do it soon.
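For reference, the host fallback normally comes from scheduling the graph across Vulkan plus a CPU backend; running a graph on the Vulkan backend directly leaves no fallback path, which could explain the crash. A sketch under that assumption, using the ggml_backend_sched API as of that era (signatures have shifted between ggml revisions, so treat this as illustrative):

```cpp
#include "ggml-backend.h"

// Register the CPU backend after Vulkan so the scheduler can assign any
// op rejected by ggml_backend_vk_device_supports_op() to the CPU instead
// of crashing. vk_backend and cpu_backend are assumed to be initialized.
ggml_backend_t backends[2] = { vk_backend, cpu_backend };
ggml_backend_buffer_type_t bufts[2] = {
    ggml_backend_get_default_buffer_type(vk_backend),
    ggml_backend_get_default_buffer_type(cpu_backend),
};
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, bufts, /*n_backends=*/2, GGML_DEFAULT_GRAPH_SIZE, /*parallel=*/false);
```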
Here, or on the ggml repo?
Maybe on llama.cpp, since that's where most work is happening.