Kimi vendor verifier – verify accuracy of inference providers
Posted by Alifatisk 14 hours ago
Comments
Comment by jychang 41 minutes ago
https://github.com/MoonshotAI/K2-Vendor-Verifier
https://github.com/MoonshotAI/Kimi-Vendor-Verifier
Note: this is before K2.5 and K2.6 even launched.
Comment by bobbiechen 13 hours ago
For example, Sketchy Provider tells you they are running the latest and greatest, but is actually, knowingly, running some cheaper (and worse) model and pocketing the difference. These tests wouldn't help, since Sketchy Provider could detect when they're being tested and do the right thing only then (like the Volkswagen emissions scandal). Right?
Comment by nulltrace 11 hours ago
If someone actually goes out of their way to bypass the check, that's a pretty different situation legally compared to just quietly shipping a cheaper quant anyway.
Comment by KeplerBoy 29 minutes ago
Running different GPU kernels / inference engines also matters. It's easy to write an implementation that is faster and thus cheaper but numerically much noisier / less accurate.
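A minimal sketch of how such drift could be measured in practice: compare the token probability distributions two deployments report for the same prompt position, using KL divergence. The numbers below are made up for illustration, and this is not the actual K2-Vendor-Verifier methodology.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocab slice."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical top-5 token probabilities at the same prompt position,
# from a reference deployment vs. a third-party provider.
reference = [0.70, 0.15, 0.08, 0.05, 0.02]
provider  = [0.55, 0.20, 0.12, 0.09, 0.04]

drift = kl_divergence(reference, provider)
print(f"KL drift: {drift:.4f}")  # larger values suggest a different kernel/quant
```

Identical deployments would give a drift near zero; a noisier kernel or heavier quantization shifts the distribution and drives the divergence up.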
Comment by frogperson 9 hours ago
This is probably Kimi trying to protect their brand from bargain-basement providers that don't properly represent what the models are capable of.
Comment by latchkey 8 hours ago
I'm curious what exactly they mean by this...
"because we learned the hard way that open-sourcing a model is only half the battle."
Comment by gpm 12 hours ago
For a truly malicious actor, you're right. But it shifts it from "well we aren't obviously committing fraud by quantizing this model and not telling people" to "we're deliberately committing fraud by verifying our deployment with one model and then serving customer requests with another".
I suspect there are a lot of semi-malicious actors who are only too happy to do the former.
Comment by gertlabs 10 hours ago
Kimi K2.6, however, is the new open-source leader so far. Agentic evaluations are still in progress, but one-shot coding reasoning benchmarks are ready at https://gertlabs.com/?mode=oneshot_coding
Comment by kristianp 8 hours ago
Edit: Kimi K2 uses int4 during its training as well as inference [2]. I wonder whether quality suffers when different GGUF creators don't convert these weights correctly?
[1] https://openrouter.ai/docs/guides/routing/model-variants/exa...
[2] https://www.reddit.com/r/LocalLLaMA/comments/1pzfuqg/why_kim...
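To illustrate why conversion details matter, here is a toy round-trip through symmetric int4 quantization. The grouping and scale choice below are illustrative only, not Moonshot's actual quantization-aware training scheme; the point is that a converter that picks scales differently changes the restored weights and therefore the outputs.

```python
def quantize_int4(ws, scale):
    """Symmetric int4: clamp round(w/scale) to [-8, 7], then dequantize."""
    q = [max(-8, min(7, round(w / scale))) for w in ws]
    return [v * scale for v in q]

# Made-up weight group; one shared scale per group (illustrative).
weights = [0.031, -0.114, 0.257, -0.009, 0.198]
scale = max(abs(w) for w in weights) / 7
restored = quantize_int4(weights, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_err:.4f}")
```

For in-range values the round-trip error stays within half a quantization step, but a converter that assumed full-precision weights, or re-derived scales from a different grouping, would introduce errors the model was never trained to absorb.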
Comment by gertlabs 8 hours ago
Going to test it out, thanks!
Comment by punkpeye 4 hours ago
I run an AI gateway (Glama), and we had to delist all third-party providers because some of them are obviously lying about their quantization.
Being able to vet providers would be a major improvement to our ability to offer a more diverse set of providers.
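A vetting pass could look roughly like this: send a fixed prompt set to each provider and score the outputs against a reference deployment. This is a hypothetical sketch — `query` is a stand-in for a real API client, exact-match scoring only makes sense with deterministic decoding, and the real K2-Vendor-Verifier compares richer signals such as tool-call validity.

```python
def vet_provider(query, provider, reference_outputs, prompts, min_match=0.95):
    """Return True if the provider's outputs match the reference often enough."""
    matches = sum(
        query(provider, p) == ref
        for p, ref in zip(prompts, reference_outputs)
    )
    return matches / len(prompts) >= min_match

# Toy stand-in: a "provider" that answers from a fixed lookup table.
table = {"2+2=": "4", "capital of France?": "Paris"}
fake_query = lambda provider, prompt: table.get(prompt, "")
prompts = list(table)
refs = [table[p] for p in prompts]
print(vet_provider(fake_query, "sketchy-ai", refs, prompts))  # True
```

A gateway could run such a pass periodically and delist providers whose match rate drops, though a determined bad actor could still special-case the test prompts, as discussed upthread.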