KVarN: Native vLLM backend for KV-cache quantization by Huawei
Posted by theanonymousone 5 days ago
Comments
Comment by throwa356262 5 days ago
Am I reading this right??
Comment by qeternity 5 days ago
Comment by sheepscreek 5 days ago
Comment by qeternity 3 days ago
But the point was that quality didn't magically increase.
Comment by electroglyph 5 days ago
Comment by 7e 5 days ago
Comment by thefox96 5 days ago
Comment by pbich 5 days ago
Comment by v3ss0n 5 days ago
Comment by esafak 5 days ago
Edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff against it; it's fairly straightforward.
Comment by jmalicki 5 days ago
Comment by woadwarrior01 5 days ago
Comment by electronsoup 5 days ago
Comment by thefox96 5 days ago
Comment by lukasc-ch 4 days ago
Comment by lukasc-ch 4 days ago
Comment by mikeayles 5 days ago
Comment by sspoisk 4 days ago
Comment by shockembopper 5 days ago
Comment by 0xjeffro 5 days ago