While conversion with `convert-hf.py` seems to work, models in `q80` and `f16` formats cannot be loaded. Here are the combinations I tried with Llama-3.3-70B-Instruct:
| quant | buffer-float-type | error |
|---|---|---|
| `q80` | `q40` | Critical error: Unsupported op quant: F_32/F_UNK/F_Q40 |
| `q80` | `q80` | Critical error: Unsupported CPU op code: MATMUL, quant: Q80_Q80_F32, op name: block_matmul_q |
| `q80` | `f16` | Critical error: Unsupported op quant: F_32/F_UNK/F_16 |
| `q80` | `f32` | Critical error: Unsupported op quant: F_32/F_Q80/F_32 |
| `f16` | `q40` | Critical error: Unsupported op quant: F_32/F_UNK/F_Q40 |
| `f16` | `q80` | Critical error: Unsupported op quant: F_Q80/F_16/F_32 |
| `f16` | `f16` | Critical error: Unsupported op quant: F_32/F_UNK/F_16 |
| `f16` | `f32` | Critical error: Unsupported op quant: F_32/F_16/F_32 |
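
For reference, each row corresponds to an invocation roughly like the sketch below (assuming the repo's `convert-hf.py` converter and the `dllama inference` CLI; the model/tokenizer file names, positional arguments, and all flags other than `--buffer-float-type` are my assumptions from memory and may not match the current options exactly):

```sh
# Convert the HF checkpoint to q80 weights (output file name is an assumption).
python convert-hf.py path/to/Llama-3.3-70B-Instruct q80 llama3.3-70b-instruct

# Load the q80 model with an f16 synchronization buffer; this is the q80/f16 row,
# which fails with "Unsupported op quant: F_32/F_UNK/F_16".
./dllama inference \
  --model dllama_model_llama3.3-70b-instruct_q80.m \
  --tokenizer dllama_tokenizer_llama3.3-70b-instruct.t \
  --buffer-float-type f16 \
  --prompt "Hello" --steps 32 --nthreads 8
```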
I'm mostly interested in `q80` models with `f16` or higher precision for synchronization. With llama.cpp, 8-bit quantization usually yields very high performance (only slightly slower than 4-bit) without the model degradation that is sometimes obvious with 4-bit quantization.

Am I doing something wrong, or is support currently missing?