Kimi K2.6: Advancing open-source coding
Posted by meetpateltech 18 hours ago
Comments
Comment by simonw 16 hours ago
Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...
Comment by FlyingSnake 16 hours ago
Comment by scosman 13 hours ago
Comment by AmbroseBierce 11 hours ago
Comment by wvlia5 9 hours ago
Comment by justinclift 10 hours ago
Much Win! ;)
Comment by takihito 1 hour ago
Comment by ValentineC 6 hours ago
Comment by razodactyl 9 hours ago
Comment by smcleod 12 hours ago
Comment by ahmadyan 11 hours ago
Comment by abustamam 10 hours ago
Comment by stingraycharles 7 hours ago
This relies on the false premise that, if they would include it in their training dataset, it would be perfect. All they need to do is be good enough and better than the other, not perfect.
Comment by abustamam 4 hours ago
Based on the one Simon commented though, I'd say we're in decent territory to try the latter part of his hypothesis.
Comment by BrokenCogs 12 hours ago
Comment by ffsm8 16 hours ago
I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.
It'd also like to point out that to date no drawing was actually good from an actual quality perspective (as in comparative to what a decent designer would throw together)
Theyre always only "good" from the perspective of it being a one shot low effort prompt. Very little content for training purposes.
Comment by nwienert 15 hours ago
And so if you ask it to do something big it will do a very surface level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.
I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.
Comment by slopinthebag 14 hours ago
Comment by serial_dev 13 hours ago
Comment by abustamam 10 hours ago
Comment by ffsm8 3 hours ago
It would always look goofy - by design, but it usually looked good.
Comment by GorbachevyChase 8 hours ago
Comment by SwellJoe 16 hours ago
Comment by subscribed 15 hours ago
Comment by makingstuffs 4 hours ago
Comment by HarHarVeryFunny 14 hours ago
Comment by disiplus 12 hours ago
Comment by OtomotO 11 hours ago
4.7 made no difference, so for the first time in many moons I am cancelling my subscription.
Comment by hn8726 16 hours ago
Comment by lambda 15 hours ago
Of course, a while back there was a Gemini release that I believe specifically called out their ability to produce SVGs, for illustration and diagramming purposes. So it's not longer necessarily the case that the labs aren't training on generating SVGs, and in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.
Comment by nickthegreek 15 hours ago
Surely, you know someone makes the same post you did every time one is posted. Surly you see the answers and pushback since you are familiar with these posts. Genuine question, did you expect a different answer this time?
Comment by hamdouni 15 hours ago
https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
Comment by hn8726 13 hours ago
Comment by VHRanger 9 hours ago
The best LLM benchmarks test around the margins of those behaviors, tasks that are difficult and correlate with usefulness while being removed enough to stay unpolluted
Comment by walthamstow 14 hours ago
Comment by Strom 14 hours ago
Comment by hn8726 13 hours ago
Comment by ascorbic 11 hours ago
Comment by charcircuit 14 hours ago
Comment by airstrike 12 hours ago
Comment by renewiltord 6 hours ago
Comment by Mashimo 15 hours ago
Comment by wotsdat 15 hours ago
Comment by rolymath 15 hours ago
Comment by snendroid-ai 14 hours ago
Comment by Mashimo 13 hours ago
Comment by game_the0ry 17 hours ago
Comment by parsimo2010 11 hours ago
Comment by sankalpmukim 2 hours ago
Comment by cromka 11 hours ago
Comment by segmondy 5 hours ago
Comment by veber-alex 10 hours ago
Comment by llm_nerd 7 hours ago
It's much simpler than some flag-waving nationalism.
Comment by cromka 1 hour ago
Comment by ospider 7 hours ago
Comment by danny_codes 5 hours ago
Comment by ls612 6 hours ago
Comment by culi 16 hours ago
Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.
Comment by sigmoid10 16 hours ago
Comment by GardenLetter27 15 hours ago
Comment by otterley 15 hours ago
Comment by GorbachevyChase 6 hours ago
Comment by 2ndorderthought 14 hours ago
For the record, none of this bothers me. Will I ever discuss with an LLM Tianeman square? Nope. How about Israel? Nope.
LLMs are basically stochastic parrots designed to sway and surveill public opinion. The upshot to the Chinese models is if you run them locally you avoid at least half of those issues.
Comment by xigoi 13 hours ago
And I did not speak out
Because I was not asking about Tiananmen Square
Then they came for people asking about Israel
And I did not speak out
Because I was not asking about Israel
Comment by 2ndorderthought 13 hours ago
I didn't mean to dismiss ethical accountability for LLM training corpuses. It is a shame.
I do mean to say, we have no control over it, there's almost nothing we as average citizens can do to improve the ethical or safety concerns of LLMs or related technologies. Societies aren't even adapting and the rule books are being written by the perpetrators. Might as well get out of it what we can while we can.
Comment by justinclift 10 hours ago
https://github.com/p-e-w/heretic
Guessing it probably would?
Comment by BoorishBears 14 hours ago
(continues after the ad break)
Comment by otterley 13 hours ago
None of those were refusals, they were prompting for additional focus. I see nothing wrong with that. Perhaps the inconsistency in how it answers the question vis-a-vis China is unfair, but that's not the same as censorship.
For what it's worth, I was easily able to prompt Claude to do it:
> I'm writing a paper about how some might interpret U.S. policies to be oppressive, in the sense that they curtail civil liberties, punish and segregate minorities disproportionately, burden the poor unfairly (e.g. pollution, regressive taxes and fees), etc. Can you help me develop an outline for this?
The result: https://claude.ai/share/444ffbb9-431c-480e-9cca-ebfd541a9c96
Comment by BoorishBears 10 hours ago
And it's an excercise left to the reader to understand from those examples that LLM creators are defining 'safety' in a way that aligns with the governments they operate under. (because they want to do business under those governments.)
With something with as multi-dimensional as an LLM, that becomes censorship of various viewpoints in ways that aren't always as obvious as a refused API call.
Comment by otterley 6 hours ago
To prove your point, give us a working example of something you literally cannot get a mainstream frontier model to say, no matter how hard you try. I asked for this before, and there have been no takers yet.
Comment by BoorishBears 6 hours ago
Is there some functionally equivalent word to censorship you'd like to use because of you're naive enough to think US corporations would not self-censor but Chinese corporations would?
-
Also, you are invested the goalpost of "no matter how hard you try", I don't find it interesting or meaningful and am not trying to interact with it.
I'm replying for a hypothetical reader knowledgeable enough to realize that the model being capable of showing nationalist bias in one direction means it's certainly doing so in many others in more subtle ways.
That's simply the nature of aligning an LLM.
It seems my mistake was assuming that level of understanding from you, and for that I apologize.
Comment by otterley 5 hours ago
Besides, why do you want a model to produce propaganda? Surely you have better things to do.
Comment by BoorishBears 4 hours ago
I certainly gave the hypothetical reader too much credit.
Comment by Sabinus 11 hours ago
Comment by BoorishBears 6 hours ago
It writes propaganda when 1 word is changed: US becomes China
The alignment around what constitutes "propaganda" is US-centric because it's a US model by a US company. Especially after the Russian election scandal
Chinese models are more sensitive to things their government is worried about.
Comment by culi 12 hours ago
https://www.whitehouse.gov/presidential-actions/2025/07/prev...
It explicitly forces American LLMs to include government say in what does and doesn't "comply with the Unbiased AI Principles" which means no responses that promote "ideological dogmas such as DEI"
Comment by otterley 10 hours ago
(That order, like many, will probably be rescinded as soon as a Democrat holds the Presidency again.)
Comment by cedws 13 hours ago
>Learn more about Imgur access in the United Kingdom
Comment by nozzlegear 7 hours ago
Comment by js8 13 hours ago
Comment by atemerev 15 hours ago
Comment by sigmoid10 15 hours ago
Comment by kgwgk 15 hours ago
No.
You wrote that "you won't hear about Tiananmen square from this model" and atemerev wrote that "the model itself talks fine about Tiananmen".
You wrote that "it can easily access any withheld or missing info from training data via tool calls" and atemerev wrote that "the model itself talks fine about Tiananmen".
Comment by sigmoid10 11 hours ago
Comment by kgwgk 11 hours ago
But sure, if when you wrote "you won't hear about Tiananmen square from this model" you meant "the model itself talks fine about Tiananmen" then that's exactly what you wrote.
Comment by nicce 15 hours ago
Comment by csomar 13 hours ago
Comment by ozgune 14 hours ago
Here's the aggregated AI benchmark comparison for K2.6 vs Opus 4.6 (max effort).
- Agentic: Kimi wins 5. Opus wins 5.
- Coding: Kimi wins 5. Opus wins 1.
- Reasoning & knowledge: Kimi wins 1. Opus wins 4.
- Vision: Kimi wins 9. Opus wins 0.
Please note that the model publisher chooses their benchmarks, so there's a bias here. Most coding and reasoning & knowledge benchmarks in their list are pretty standard though.
Comment by UncleOxidant 14 hours ago
Comment by osiris970 14 hours ago
Comment by nullbyte 13 hours ago
Comment by osiris970 13 hours ago
Comment by 0-_-0 14 hours ago
Comment by spaceman_2020 13 hours ago
$200/m minimum to use Claude would bankrupt my country's white collar labor market
Comment by subhobroto 9 hours ago
Now given that the $200/m Tier is the most heavily (I believe at 20x?) subsidized tier, How or what are you using instead that achieves comparable good enough performance for a fraction of the price? I've heard GLM 5.1 from z.ai but it's not comparable to Opus, not even close - really interested!
Comment by nashadelic 16 hours ago
Comment by cedws 15 hours ago
Comment by gpm 15 hours ago
Yes, absolutely.
China regularly produces long term planning documents to coordinate efforts, and the latest ones have specifically prioritized technology like chips and AI to compete with the west. https://www.reuters.com/world/china/china-parliament-approve...
I don't believe there's any publicly stated intent to sabotage the west... unsurprisingly.
Comment by bachmeier 14 hours ago
Comment by anana_ 14 hours ago
This I assume will make it more difficult for US AI labs to turn a profit, which might make investors question their sky high valuations.
Any sort of melt down in the AI sector would almost certainly spread to the wider US market.
In contrast, in China, most of the funding for AI is coming directly from the government, so it's unlikely the same capital flight scenario would happen.
Comment by gmerc 14 hours ago
Comment by try-working 10 hours ago
Comment by quesera 13 hours ago
We're making this way too easy. The rationale and logic are reasonable, but ultimately irrelevant.
Comment by SXX 15 hours ago
After all historically both statistics and research that comes out of China is not very trustworthy.
Comment by try-working 10 hours ago
Comment by arvindh-manian 5 hours ago
Comment by esperent 4 hours ago
I've heard this before, always accompanied by a several thousand word blog post. But frankly it sounds like it's overcomplicating the issue. Why would you try to turn something into a commodity when instead you could turn it into a trillion dollar industry and win?
The goal has always been clear:
1. Release open models to get your name out
2. Then once you feel you have name recognition release even stronger models but keep them proprietary. Qwen is clearly at this phase.
3. Keep releasing open models because it's good publicity but never your SOTA models (e.g. Google's Gemma).
Comment by ymolodtsov 1 hour ago
Comment by bayarearefugee 11 hours ago
The US is pretty clearly in the collapsing empire phase, we are all just pretending like it isn't happening.
Comment by nozzlegear 7 hours ago
Comment by carefree-bob 6 hours ago
US energy sources for 2024 (last year for which we have data):
https://www.eia.gov/energyexplained/us-energy-facts/data-and...
natgas: 38%
oil: 35%
coal: 10%
all renewables: 9%
nuclear: 8%
Within all renewables, in quadrillions of btus: biofuels: 2.6
wood: 1.9
wind: 1.6
solar: 1.4
Hydro: 0.8
waste: 0.4
geothermal: 0.1
Total: 8.8 quadrillion btu = 9% of total energyComment by nozzlegear 6 hours ago
Renewables generated more energy than natural gas for the entire month of March, 2026. That's a new milestone baby.
Comment by carefree-bob 5 hours ago
First, you are confusing share of electricity generation with the share of all energy. Electricity is only 21% of all energy. Natgas, oil and coal are crushing it in that remaining 79%.
Second, the article is wrong, even for electricity. To their credit, Canary Media showed in their graph that this data is for electricity only.
The data for March is not out yet. Here is the latest official data from the EIA. https://www.eia.gov/electricity/monthly/
It only applies to January 2026, and the next release is April 23, and then you will get data for February 2026. All data has a 2 month time lag. Your spidey senses should have been tingling if an article published April 10 claimed to have data for the month of March, but this is why you don't get your statistics from activist blogs, but from official sources.
So if they are not accessing the official data, what are they accessing? They claim that their source is "Ember", but what is Ember? It is an environmentalist think tank. Well, maybe Ember has their own people calling up power companies and compiling data faster than the EIA. That would be pretty, cool, right?
Except they don't. Look at Ember's page.
https://ember-energy.org/data/electricity-data-explorer/?ent...
what do they cite as their data source: EIA.
It's right on the website.
So Ember is just pulling EIA data, and then filling the last two months with data they made up, but citing it as EIA data. And this, uh, sympathetic adjustment of EIA data is why Canary Media turns to Ember rather than directly pulling from EIA.
I guarantee you that by July, those adjustments will go away, because then the EIA data will be out.
Of course everyone else will have forgotten by then.
Comment by nozzlegear 5 hours ago
Think it was pretty obvious what I meant to all but the most pedantic, bud. But just to be clear, your issue here is that a think tank cited the same (notoriously anti-renewable Trump admin) government agency that you've cited multiple times yourself? That's what set off your spidey senses? Have you considered that this respected think tank isn't making up data, but you're just not able to find it?
> I guarantee you that by July, those adjustments will go away, because then the EIA data will be out.
Ember already has it hoss, they don't call it Milestone March for nothing.
Comment by carefree-bob 5 hours ago
It's where everybody gets their data from. Because they have thousands of employees collecting data. These are professionals, like the people at BEA, HUD, NIST, etc.
Ember, on the other hand, is a "decarbonization" think tank. They don't have their own data. They don't have the staff for it. What they do is analyze/spin, and in this case, augment, the raw data that is published by EIA. How do they augment the EIA data? All they do is round it to the nearest 2 decimals. It's exact copy and paste for every month except the last two, where the data is just made up.
And this entire article was written based on the augmentations by Ember, yet Ember cites it as EIA data. So let's check back in July, when EIA data will be out, and Ember will use that exact data, rounding it to the nearest 2 decimals. Save that blog page!
Something to think about.
Comment by nozzlegear 4 hours ago
> Annual electricity generation and net imports are taken from the EIA.
> Monthly generation and imports are taken from the EIA. The EIA reports monthly generation data in two separate datasets: Monthly data for all 50 states and monthly data for the lower 48 states (excludes Hawaii and Alaska). Data for all 50 states is reported on a 3 month lag whereas data for the lower 48 states is reported without lag. Missing months from the data for all 50 states is estimated using the recent changes observed in data from the lower 48 dataset.*
Page 89: https://ember-energy.org/app/uploads/2024/05/Ember-Electrici...
There are two different EIA datasets.
Comment by try-working 10 hours ago
Comment by antirez 15 hours ago
Comment by throwaway-blaze 15 hours ago
The strings attached by the Chinese govt to deep partnerships are not so benign.
Comment by metobehonest 14 hours ago
Comment by rolymath 15 hours ago
Comment by brandensilva 16 hours ago
I do wonder where we go from here.
Comment by pheggs 14 hours ago
Comment by patl4588 10 hours ago
Comment by osti 17 hours ago
Comment by darkwater 16 hours ago
Comment by konart 16 hours ago
Comment by pheggs 16 hours ago
Comment by otterley 15 hours ago
There is a reason real estate values in popular cities has skyrocketed, and it’s not due to the locals getting wealthier. It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).
Comment by bwv848 13 hours ago
Comment by pheggs 14 hours ago
true, but as far as I understand it did because birth rates got too low. so they replaced it with a two-child policy and later with a three-child policy
> Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.
Yeah, I am sure there's a lot of cases for that. But as far as I know the amount of billionaires has started declining in China, and I don't see how that means that they as a country moved away from the goal, it just means there's issues
> There is a reason real estate values in popular cities has skyrocketed, and it’s not due to the locals getting wealthier.
I don't know about that, you could be right. A google search for real estate prices in china reveal a lot of news articles how they are going down though.
> It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).
Wouldn't be surprised if rich people in china invest in real estate. They don't have free capital flow, so its not easy to invest abroad and it becomes an obvious choice. Bitcoin is banned in China for that reason too
But again, as far as I know that does not mean the country moved their goals of trying to reach communism one day
Comment by otterley 13 hours ago
They're further from Communism than they've ever been since the PRC was founded. The gap between rich and poor is growing there, not shrinking.
> A google search for real estate prices in china reveal a lot of news articles how they are going down though.
They're investing outside China (Vancouver, Toronto, NYC, London, Sydney, Melbourne, etc.) because their assets are safer there (these countries all have strong property protection laws). Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.
Comment by pheggs 13 hours ago
I suppose it depends on what time frame you look at, it's shrinking since 2010, but inequality rose more than that in the 80s: https://www.theglobaleconomy.com/China/gini_inequality_index...
However, that's not my point - I did not mean to say that they are going to be successful but rather that it still appears to be a long term goal for them.
> Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.
I don't know about that, without any source of data I guess I just have to take your word for it. I would not be surprised if you were right in this case though.
Comment by Saline9515 12 hours ago
Comment by nozzlegear 6 hours ago
They just happen to be a feature of every single country that's attempted communism to date. Total coincidence.
Comment by fragmede 16 hours ago
Comment by osti 16 hours ago
Comment by tadfisher 16 hours ago
Comment by pheggs 15 hours ago
in capitalism the people with the capital get the profit, not the people who do the work. however, workers are said to benefit too through their salary, just less so
Comment by tadfisher 15 hours ago
Comment by throwaway-blaze 15 hours ago
Comment by pheggs 15 hours ago
Comment by tadfisher 9 hours ago
Comment by gertlabs 10 hours ago
Kimi K2.6 is currently the top open weights model in one-shot coding reasoning, a little better than GLM 5.1, and still a strong contender against SOTA models from ~3 months ago (comparable to Gemini 3.1 Pro Preview).
Agentic tests are still running, check back tomorrow. Open weights models typically struggle with longer contexts in agentic workflows, but GLM 5.1 still handled them very well, so I'm curious how Kimi ends up. Both the old Kimi and the new model are on the slower side, so that's a consideration that makes them probably less usable for agentic coding work, regardless. The old Kimi K2 model was severely benchmaxxed, and was only really interesting in the context of generating more variation and temperature, not for solving hard problems. The new one is a much stronger generalist.
Overall, the field of open weights models is looking fantastic. A new near-frontier release every week, it seems.
Comprehensive, difficult to game benchmarks at https://gertlabs.com/?mode=oneshot_coding
Comment by esperent 6 hours ago
Comment by gertlabs 5 hours ago
I'm interested to hear about any other data representations you'd like to see, too. The goal is to convey the most important information as densely as possible, without too much clutter.
Comment by tmaly 9 hours ago
Comment by Mattwmaster58 9 hours ago
Task prices of courses will be more interesting - a dumber model may use more tokens to get to the same goal.
Comment by freely0085 5 hours ago
Comment by gertlabs 4 hours ago
Comment by knollimar 7 hours ago
Comment by gertlabs 4 hours ago
Comment by cmrdporcupine 10 hours ago
Comment by gertlabs 9 hours ago
Comment by elfbargpt 17 hours ago
Comment by Aeolun 16 hours ago
Price/quality is absolutely bonkers though. I loaded $40 a few weeks/months ago and I haven’t even gone through half of it.
Comment by segmondy 5 hours ago
Comment by atemerev 15 hours ago
Comment by smashed 15 hours ago
I use OpenCode and the openrouter provider. From opencode I only select the model like kimi-2.6 and have no way of selecting which cloud hosting will receive my request.
Comment by subscribed 15 hours ago
Comment by uneekname 15 hours ago
Comment by NitpickLawyer 15 hours ago
Comment by pheggs 15 hours ago
Comment by culi 16 hours ago
Comment by SwellJoe 16 hours ago
Comment by sigmoid10 16 hours ago
Comment by squarefoot 15 hours ago
Comment by quesera 13 hours ago
Comment by dryarzeg 16 hours ago
Comment by culi 12 hours ago
This site was made months ago and it seems its only been updated with the latest model of a couple of the providers so keep in mind that many of the Chinese models haven't been updated
Comment by sigmoid10 16 hours ago
Comment by gunalx 15 hours ago
Comment by regularfry 17 hours ago
Comment by twotwotwo 16 hours ago
Comment by spaceman_2020 13 hours ago
It was the best creative writer by some distance
Comment by varispeed 16 hours ago
Comment by KaoruAoiShiho 16 hours ago
Comment by johndough 16 hours ago
Comment by natrys 16 hours ago
I wish they did more smaller models. Kimi Linear doesn't really count, it was more of a proof of concept thing.
Comment by kburman 15 hours ago
I tried it once, although it looks amazing on benchmarks, my experience was just okay-ish.
On the other hand, Qwen 3.6 is really good. It’s still not close to Opus, but it’s easily on par with Sonnet.
Comment by rubslopes 10 hours ago
Comment by try-working 10 hours ago
Comment by deanc 15 hours ago
Comment by nickandbro 17 hours ago
Comment by ai_fry_ur_brain 16 hours ago
Comment by otabdeveloper4 16 hours ago
Close to what, and how are you measuring?
> nobody in the USA would be spending 7 figures on infrastructure for it
Au contraire, if AI had a moat it would pay for itself. They're funneling capital into infrastructure because they know it can't.
Comment by fragmede 16 hours ago
Comment by otabdeveloper4 1 hour ago
No, you wouldn't be using venture capital to overprovision your AI a hundredfold if selling AI was the end goal.
Comment by jstummbillig 16 hours ago
Comment by motoboi 17 hours ago
Comment by amazingamazing 17 hours ago
Comment by jstummbillig 15 hours ago
Comment by fragmede 16 hours ago
Comment by lbreakjai 16 hours ago
Comment by ChrisLTD 17 hours ago
Comment by cyanydeez 10 hours ago
T
Comment by jollymonATX 17 hours ago
Comment by bestouff 17 hours ago
Comment by maplethorpe 17 hours ago
Comment by cedws 17 hours ago
Comment by squarefoot 17 hours ago
Comment by rockinghigh 14 hours ago
Comment by nisegami 17 hours ago
Comment by irthomasthomas 17 hours ago
Comment by mistercheph 16 hours ago
Comment by pheggs 14 hours ago
Comment by sergiotapia 16 hours ago
Comment by motoboi 12 hours ago
Comment by m4rkuskk 16 hours ago
Comment by mchusma 15 hours ago
I'm hoping that Anthropic will be able to release an updated Haiku soon and they really need something that is 1/3-1/5 the price of Haiku to compete with the truly cheaper models (Gemma-4 is really good at this range).
Comment by XCSme 15 hours ago
Kimi K2.6 seems to struggle most with puzzle/domain-specific and trick-style exactness tasks, where it shows frequent instruction misses and wrong-answer failures.
It is probably a great coding model, but a bit less intelligent overall than SOTAs
[0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...
Comment by deepsquirrelnet 13 hours ago
Comment by XCSme 13 hours ago
Comment by ninjahawk1 14 hours ago
Comment by gpm 13 hours ago
I wouldn't expect this.
Historically we've had a roughly exponential rate of shrinkage. If we keep that same exponential going, we should expect the amount of time to shrink "room full of compute" to "pocket full of compute" to be equal.
And recently we've fallen behind that exponential rate of shrinkage. And this is rather expected because exponentials are basically never sustainable rates of growth.
I still expect that technological progress is getting faster year by year, and that we're still shrinking compute, but that's not necessarily enough for the next shrinking to take less time than when we had exponential progress on shrinking.
Comment by Flux159 12 hours ago
There’s other options like photonic computing which might be able to reduce power significantly but are still in research as far as I can tell. Because so much money is invested in AI & traditional gpu inference is so power hungry, I would expect significant improvements in this space quickly.
Comment by candl 16 hours ago
Comment by ankit70 5 hours ago
Comment by wolttam 15 hours ago
Comment by fg137 14 hours ago
Comment by phainopepla2 12 hours ago
Comment by plutokras 12 hours ago
Comment by cute_boi 12 hours ago
Comment by randomtoast 14 hours ago
Comment by sixhobbits 14 hours ago
Details here [0]
[0] https://techstackups.com/comparisons/kimi-2.6-vs-opus-4.7-an...
Comment by mariopt 17 hours ago
Also discovered that using OpenCode instead of the kimi cli, really hurts the model performance (2.5).
Comment by lbreakjai 17 hours ago
Comment by pt9567 17 hours ago
Comment by corlinp 17 hours ago
Kimi 2.5 (which this is based on) is served at $0.44 input / $2 output by a ton of different providers on OpenRouter, 2.6 will certainly be similar.
That's about 11X less than Opus for similar smarts.
Comment by gessha 12 hours ago
Comment by Lalabadie 16 hours ago
Comment by amazingamazing 16 hours ago
Comment by corlinp 15 hours ago
Comment by amazingamazing 14 hours ago
Comment by veber-alex 9 hours ago
Comment by dmix 16 hours ago
Comment by arcanemachiner 15 hours ago
Comment by dmix 11 hours ago
Comment by arcanemachiner 11 hours ago
https://www.trendingtopics.eu/cursor-admits-composer-2-is-bu...
Comment by 59nadir 11 hours ago
That's at least what I perceived as "the drama".
Comment by Alifatisk 14 hours ago
Comment by rane 12 hours ago
Comment by irthomasthomas 17 hours ago
Comment by NitpickLawyer 17 hours ago
Comment by cedws 17 hours ago
Comment by sterlind 15 hours ago
Comment by osti 17 hours ago
Comment by NitpickLawyer 16 hours ago
The ~100k hardware is suitable for multi-user, small team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure macs could work.
Comment by osti 14 hours ago
Comment by zozbot234 16 hours ago
Comment by osti 14 hours ago
Comment by veber-alex 9 hours ago
Comment by pixel_popping 16 hours ago
Comment by BoorishBears 17 hours ago
Comment by irthomasthomas 17 hours ago
Comment by verdverm 17 hours ago
Is this the same model?
Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF
(work in progress, no gguf files yet, header message saying as much)
Comment by SwellJoe 16 hours ago
Comment by justinclift 9 hours ago
> Q3.6 typically achieves useable accuracy in our coding test and fits within a 512GB memory budget
This one ( https://huggingface.co/mlx-community/Kimi-K2.6-MoE-Smart-Qua... ) though says it fits on a 192GB mac:
> M3/M4 Ultra 192GB+ (fits in ~150GB)
Comment by jauntywundrkind 15 hours ago
Our hope these days seems to be that maybe perhaps possibly High Bandwidth Flash works out. Instead of 4, 8, or maybe more for some highest end drives, having many many many dozens of channels of flash.
Ideally that can be very very near to the inference. PCIe 7.0 is 0.5Tb/s at 16x which is obviously nowhere remotely near enough throughout here. The difficulty is sort of that nand has been trying to be super dense, so as you scale channels you would normally tend to scale nand capacity too, and now instead of a 2tb drive you have a 200tb drive prices way beyond consumer means. Still, I think HBF is perhaps the only shot of the most important thing in computing going from mainframe back to consumer, and of course the models are going to balloon again if this dies hit, probably before consumers ever get a chance.
Comment by segmondy 5 hours ago
Comment by Balinares 17 hours ago
Comment by gpm 16 hours ago
But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?
Comment by johndough 16 hours ago
Comment by gpm 16 hours ago
So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?
Comment by rockinghigh 14 hours ago
Comment by liuliu 13 hours ago
Comment by coder543 15 hours ago
"Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking."
Comment by throwaw12 15 hours ago
I really hope this holds true in real world use cases as well and not only benchmarks. Congrats to Kimi team!
Comment by Topfi 15 hours ago
I will have to test this full release of K2.6 but could see it serve as a very good overall drop-in replacement for Opus 4.5 and Opus 4.6 at 200k across the vast majority of tasks.
I will say however that Opus 4.7 Max 1M has been a very significant jump in performance for me, especially in tasks beyond 120k token where I'd argue it is now the most reliable model in continued task adherence and tool calling without compaction. Ironically, my initial experience was less than pleasant as on XHigh I found task adherence to have regressed even with less than 1/10th of the context window having been used.
Am very interested in K2.6s compaction strategy (which appears to be very simply all things considered) and how it performs beyond 100k tokens. As it stands, only OpenAI models have made compaction for long running tasks work well, though overall, GPT-5.4 is still inferior in my tests regardless of context window over other models such as Opus 4.6 1m and Opus 4.7 1m. Haven't gotten around to testing Opus 4.7 200k and will have to do this to properly assess K2.6 fairly, but I'd be very surprised if K2.6 truly beat Opus 4.7 200k given the jump I have experienced.
Comment by ttul 14 hours ago
Comment by throwaw12 14 hours ago
Comment by ttul 13 hours ago
In China, there's no recourse at all. Surveillance must be presumed.
Comment by LordDragonfang 9 hours ago
While I agree that China is obviously worse in this regard, it's naive to claim this is unique to China, when literally a couple of months ago the US got into a fight with Anthropic about them not removing safeguards which were already just enforcing the letter of the law.
Comment by tw1984 6 hours ago
When American citizens are being gunned down in public on cameras by US federal government agents, you are telling me that the US follows the rule of law?
Before you start to offer more propaganda, just tell me where is the killer of Renée Good, has that killer been arrested or charged yet? Keep your censored version of rule of law to yourself and your kids.
oh, btw, the current US President did got convicted for criminal offences, he walked away for free just because he got elected as the president. nice rule of law! what did he do recently - authorised illegal war against another country in which over 100+ school children got killed. Surely your fancy US rule of law is going to do something about this?
Comment by ttul 5 hours ago
This difference is clear when we look at how the systems handle tragedy and power. In the U.S., the killing of Renée Good by an ICE agent led to a public release of video, intense scrutiny from an independent press, public condemnation by local officials, and a family using legal tools to seek justice. In China, that event would be immediately erased from the public consciousness, and those who dared to talk about it would face arrest. When the U.S. military bombs a school, human rights groups and journalists _can_ investigate, and members of Congress _can_ publicly demand answers (even if half of them are reluctant to question anything Trump does...). In China, military operations are complete state secrets. Furthermore, while it boils my blood to see Trump evade prison due to complex legal and constitutional questions, the fact that he was indicted and convicted by a jury of ordinary citizens proves that a functional legal apparatus exists outside of his direct control, something not utterly impossible under a dictatorship like China.
Day to day, the rule of law very much exists in the US. Doesn't mean we can just sleep on it, but compared to China, I take comfort in the level of institutional reliability that still exists in America (and I'm not even American).
Comment by tw1984 5 hours ago
1. Renée Good's killer is still free, never got arrested never charged. you can't just ignore such facts and cheap talk to prove the system works. the system completely failed to bring justice even after large scale public unrest. that by itself is the evidence - the failed system answers to no one.
2. Trump evade prison, everyone in the Epstein file evade prison. again, this happened in front of the entire world with extensive media coverage. you need to be extremely innovative to defend such systematic failures of the justice system.
how would you openly argue against such facts? just because you love the US and its systems? lol
Comment by rockinghigh 14 hours ago
Comment by DonsDiscountGas 14 hours ago
Comment by throwaw12 13 hours ago
Does US actually follow laws? They literally kidnapped head of another state and bombed another state and you are expecting legal protection from them?
Comment by nozzlegear 6 hours ago
Comment by greenavocado 17 hours ago
Comment by turblety 14 hours ago
When you have a consistent model, you can incorporate fixes/prompts into your workflow to make it behave better. But this, always having to guess if Anthropic has quantised the model today, wastes so much time and effort.
Comment by conradkay 14 hours ago
Comment by jollymonATX 17 hours ago
Comment by greenavocado 17 hours ago
Comment by deaux 16 hours ago
This should be so easy to prove if it were true. Yet there is none of it, just vibes.
Still, your other two points are completely valid. The opaqueness of usage quotas is a scam, within a single month for a single model it can differ by more than 2x. And this indeed has been proven.
Comment by greenavocado 12 hours ago
https://github.com/anthropics/claude-code/issues/42796
https://scortier.substack.com/p/claude-code-drama-6852-sessi...
Comment by deaux 5 hours ago
Second link is just a discussion of the first link.
Comment by Banditoz 17 hours ago
Comment by johndough 16 hours ago
The test data is purposely difficult to access to reduce the chance of leaking it into the training dataset.
Comment by swingboy 17 hours ago
Comment by dogscatstrees 13 hours ago
Comment by kristianp 5 hours ago
Comment by dygd 15 hours ago
Model seems quite capable, but this use-case is just yikes. As if interviewing isn't already a hellscape.
Comment by antirez 15 hours ago
Unfortunately the generation of the English audio track is work in progress and takes a few hours, but the subtitles can already be translated from Italian to English.
TLDR: It works well for the use case I tested it against. Will do more testing in the future.
Comment by OsamaJaber 13 hours ago
Comment by brightball 13 hours ago
Comment by Saline9515 12 hours ago
Comment by svachalek 13 hours ago
Comment by throwaw12 12 hours ago
Comment by dcchambers 13 hours ago
Comment by risho 12 hours ago
Comment by codemog 13 hours ago
Comment by darksaints 12 hours ago
Comment by esafak 17 hours ago
edit: Note that you can run it yourself with sufficient resources (e.g., companies), or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers
Comment by pbowyer 16 hours ago
Edit: found it.
> We may use your Content to operate, maintain, improve, and develop the Services, to comply with legal obligations, to enforce our policies, and to ensure security. You may opt out of allowing your Content to be used for model improvement and research purposes by contacting us at membership@moonshot.ai. We will honor your choice in accordance with applicable law.
Section 3 of https://www.kimi.com/user/agreement/modelUse?version=v2
Comment by gpm 16 hours ago
So in other words only if you can point to a local law which requires them to comply with the opt out?
Comment by pixel_popping 16 hours ago
Comment by veber-alex 9 hours ago
If it's discovered they trained on data they shouldn't have had it will be the end of their business.
On the other hand, good luck suing a Chinese company.
Comment by pixel_popping 1 hour ago
Comment by deaux 16 hours ago
Comment by SwellJoe 16 hours ago
Comment by wg0 17 hours ago
Comment by greenavocado 17 hours ago
Comment by andriy_koval 16 hours ago
Comment by deaux 16 hours ago
Not sure about coding usage, Google being weird about these things I could see that quota being separate.
Comment by gessha 12 hours ago
Comment by deaux 5 hours ago
Comment by cassianoleal 16 hours ago
Comment by jenkstom 15 hours ago
Comment by cassianoleal 15 hours ago
Comment by atemerev 15 hours ago
Comment by wolttam 15 hours ago
Deepinfra for example is not preserving thinking correctly for GLM5.1, even though they are for GLM5. This is one of the more obvious issues that crop up.
Comment by thomasahle 12 hours ago
Comment by nisegami 17 hours ago
Comment by jauntywundrkind 15 hours ago
This sounds so so so cool. It would be so amazing to see this unfurl:
> Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig—a highly niche programming language—it demonstrated exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi K2.6 dramatically improved throughput from ~15 to ~193 tokens/sec, ultimately achieving speeds ~20% faster than LM Studio.
Comment by cmrdporcupine 16 hours ago
Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.
Comment by sankalpmukim 2 hours ago
Comment by jbaiter 14 hours ago
Comment by cmrdporcupine 13 hours ago
In the past I tried Kimi thru Claude code I might try that again
Comment by oliver236 17 hours ago
Comment by Alifatisk 14 hours ago
Comment by max2026 6 hours ago