MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second
Posted by gainsurier 1 day ago
Comments
Comment by goyozi 1 day ago
Comment by flexagoon 1 day ago
Comment by SwellJoe 1 day ago
Comment by anschl 18 hours ago
Comment by SwellJoe 17 hours ago
I have another coming in a day or so for Gemma 4 with the 4-bit QAT version, which is very surprising (in a good way, Gemma 4 is impressive for this task).
Comment by RussianCow 1 day ago
https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...
Comment by SwellJoe 1 day ago
Comment by sarjann 1 day ago
Comment by brianwawok 23 hours ago
Comment by flexagoon 1 day ago
Comment by thecopy 19 hours ago
Comment by specproc 1 day ago
Comment by binary0010 1 day ago
E.g. occasionally it makes the dumbest mistakes you've ever seen and can't correct them. However, it's fairly rare, and if you know the domain really well, occasionally popping into the code and pushing it towards the correct solution takes like 20 seconds or whatever.
So the speed you can move with flash + high domain knowledge beats opus by a mile in my experience.
I tried to switch back to 4.8 for a bit when it came out, feels so bad waiting 20mins for a mediocre solution when I could have had everything complete - with multiple iteration cycles - in flash in like 3-5mins.
Comment by addozhang 22 hours ago
Comment by 59nadir 17 hours ago
Comment by Induane 8 hours ago
Or when I'm working two contract gigs. I can spec things out for one and turn it loose and trust it. Then work more closely with deepseek on the other project.
Comment by flowbarai 1 day ago
Comment by throwaway67678 1 day ago
It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.) and when you tell it to actually do those changes it's pretty much done in half an hour
Comment by smith7018 1 day ago
Comment by overgard 1 day ago
Comment by leodavi 1 day ago
Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.
I believe the outputs are a training coincidence with consequences that are opportunistic for the labs.
Comment by AgentMasterRace 1 day ago
Comment by esperent 1 day ago
That said, it'll often say "2 days of work" and then complete the coding in 30 minutes, and while that's amusing, afterwards, I'll need to manually test, or send to other people for review, or realize the agent only actually did half the work and I need to do a second pass (or a third etc.) and then often getting the feature in does genuinely take two days.
Comment by Terretta 1 day ago
It doesn't estimate.
It generates tokens that read like estimates associated with the context in its training material.
What would you expect the generator to output instead?
Comment by legulere 1 day ago
Sure, it cannot think like a human, but given its input, it should give a good statistical answer (approximating not how long it actually takes, but how long a human would say it takes).
Comment by mediaman 1 day ago
The most fundamental essence of what they do is exactly what you say they don't: estimate.
Comment by airstrike 1 day ago
Comment by greenavocado 23 hours ago
You can't prove that )))
Comment by airstrike 22 hours ago
Comment by greenavocado 11 hours ago
Comment by incr_me 1 day ago
Comment by carterschonwald 1 day ago
https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/b...
Comment by taneq 1 day ago
Comment by InterviewFrog 1 day ago
At that time the predominant view was that LLMs were nothing but stochastic parrots, that they would plateau, and that hallucinations couldn't be fixed.
At this point I doubt there are any AI sceptics left. That ship has long sailed. The only thing that matters is whether the estimates are accurate, and AI can improve on that too.
Even humans only estimate based on neurons firing in prior patterns.
Comment by monkpit 10 hours ago
Comment by ghshephard 1 day ago
Even Gary Marcus is starting to come around and realize that his priors are no longer as relevant as they once were.
Comment by irthomasthomas 1 day ago
Comment by nl 1 day ago
Will the 10T parameter Mythos model be released this month or next month?
They'd better do it soon, because it is generally accepted that one of the reasons GPT 5.5 is better at hard tasks than Opus is its parameter size - and that Opus 4.8 remains competitive only by scaling test-time compute (see how many more tokens it uses than GPT 5.5).
https://www.reddit.com/r/LLM/comments/1sz8bjz/parameter_esti...
Comment by irthomasthomas 17 hours ago
Anthropic also confirmed they will not release Mythos, only a "Mythos-class" model, whatever that means.
Comment by nl 15 hours ago
I don't think Anthropic have said anything of the sort.
Microsoft published it as 6.1*10^27 FLOPs[1]
Elon has claimed they are also training a 10T model because "Some catching up to do"[2]
Comment by irthomasthomas 12 hours ago
Comment by wild_egg 1 day ago
Comment by irthomasthomas 18 hours ago
Comment by jubilanti 11 hours ago
Comment by Terretta 1 day ago
Logistics for getting to the car wash next door?
In the meantime, alas, no, we can see from actual prompts sent directly or through sub-agents, and actual replies, that estimates remain LLM generated.
Though, this discussion here could change that, because indeed there is a lot of special casing and context stuffing going on, one of the oldest being today's date for example.
• • •
I did read the Claude Code leak, and use pi, etc. So I disagree with your premise rather strongly. Today's "systems" remain, roughly, piles of markdown and context engineering wrapped in UI affordances, and behave very similarly today to how they did in 2024 for those already engineering context and delegating.
Comment by ghshephard 1 day ago
I'm still lower down on the capability scale - as I'm still manually directing agents to do these wiggins loops - obviously the next step up is to direct the code-loops which control the agents. I just haven't got my tooling nailed in place to the point where I find that's more productive.
I actually might agree with you that this is mostly just "next token prediction" - if I can concede that's really all I do as well.
Comment by Terretta 1 day ago
Yep. Pretty sure I've got an LLM inside too.
The other replies complaining that my thinking is so 2023 -- on the contrary, what's evolved is my own apprehension of how LLM-like most "responses" from humans prove as well.
To be sure, there are other mechanisms at play as well, significant differentiation in our... Volume of training material? Quantizations/compression? Model architecture? Just-ahead-of-time forward branching with back propagation? Double loop adaptive learning? You know, harnessing the LLM. :-) Dare we call it executive function?
LLM mode becomes particularly apparent when conversing with Alzheimer's patients in the stage where short term memories do not form but they retain access to long term memory up to, say, 5 years ago or so. Fifty years of who they are, and one can trigger nearly identical responses with nearly identical prompts.
But that same person may be able to debate 1950s politics while being unable to complete making a sandwich.
If they didn't know of new shortcuts for a task, they would almost certainly not "estimate" but "intuit", or "instinctively" respond (apply heuristics), largely based on their "priors" aka training material.
If you sit with them and chat a while, you'll even get the kind of looping you get from Qwen trying to think when context is too full.
And if we believe this at all, then ... we should stop scrolling tik tok. Time to read a book. Have an experience. Fine tune. :-)
Comment by 8note 1 day ago
Comment by nl 1 day ago
It's been known for some years[1] that LLMs do regression in-context. Frontier models have been trained against many, many issue texts that include task breakdowns and estimates.
Comment by kube-system 1 day ago
I wonder if there’s a reasonable way to give an llm parameters that give it a concept of its own execution speed. Seems that could be useful for multiple purposes
Comment by nl 20 hours ago
Comment by dizhn 1 day ago
Comment by Narciss 1 day ago
Comment by BobbyTables2 22 hours ago
Comment by KronisLV 1 day ago
At least with AI that actually does things more quickly, there is a bit more breathing room (introducing AI is easier than changing a given environment).
Aside from that, I wonder how much variety there is in practice: between "Oh yeah, I added that new button while we were in the meeting" and "The new button feature will be ready in Q3 according to the roadmap, once we have sign-off from all the stakeholders."
Comment by andai 1 day ago
Finally he convinced it to try. It one shotted it in 30 seconds.
Turns out the agents' idea of what is hard and easy also comes from Common Crawl.
Comment by wild_egg 1 day ago
Comment by dr_dshiv 1 day ago
Comment by brianwawok 23 hours ago
Comment by handfuloflight 10 hours ago
Comment by g8oz 11 hours ago
Comment by jimbokun 10 hours ago
Comment by throw1234567891 1 day ago
Comment by andai 23 hours ago
Comment by znpy 18 hours ago
those estimates are based on previous human estimates (the datasets it's been trained on).
unironically, when your comments will become part of a dataset, LLMs will likely get much better at estimating.
now that i think about it, all these writings about LLMs will give LLMs something much like meta-cognition.
Comment by binary0010 1 day ago
Basically I never have to wait - yes I have to tell it little corrections occasionally (but I know the domain really well so that's not an issue), but it's so much faster than anything else it's kinda crazy. I love the super fast speeds with high involvement development cycle.
I actually enjoy using agentic development flows for the first time now - whereas with Claude I absolutely hated it. That 5 to 20 min wait after every prompt absolutely killed my desire to even want to work at all.
Comment by abustamam 13 hours ago
Comment by throw-the-towel 1 day ago
Comment by tmaly 1 day ago
Comment by andai 1 day ago
Comment by znpy 18 hours ago
the way software engineering works these days reminds me a lot of factory workers on production lines that just sit in front of a production line all day and take out faulty items and/or perform a single step in the production of goods.
Comment by behnamoh 1 day ago
Comment by rubyn00bie 1 day ago
GamersNexus had a really good investigative piece (~3hrs long) on this where they went to China and met with grey market sellers. That piece absolutely pissed off NVidia and resulted in a fight with Bloomberg too.
Deepseek may also be running inference on oodles of Chinese hardware but it wouldn’t surprise me for a second if they just acquired Blackwell chips through the grey market. The original Deepseek models were all trained using NVidia chips if I remember right.
Comment by seewhydee 1 day ago
Comment by ljosifov 20 hours ago
https://x.com/ljupc0/status/2062457314414587996
Other local models I've checked drop to unusable speeds way sooner. The only other model with a similarly favourable curve I've tried is nemotron-cascade-2-30b-a3b. But it's a small model, way dumber than DS4F.
Coding-agent use cases involve large context depths. The rate of decline is as important as the headline number.
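To illustrate why the decline curve matters, here is a minimal sketch in Python. The function name and every number in it are made up for illustration; none of them are measurements from the linked post. It assumes an equal number of output tokens is generated at each sampled context depth, so the session-level speed is the harmonic mean of the per-depth speeds.

```python
def effective_tokens_per_second(samples):
    """Session-level speed from (context_depth, tokens_per_second) samples.

    Assumes (hypothetically) that the same number of tokens is generated
    at each sampled depth, which makes the result a harmonic mean.
    """
    seconds_per_token = sum(1.0 / tps for _depth, tps in samples)
    return len(samples) / seconds_per_token


# Hypothetical curves: A has the better headline number, B declines slowly.
model_a = [(0, 100.0), (32_000, 50.0), (128_000, 25.0)]
model_b = [(0, 60.0), (32_000, 55.0), (128_000, 50.0)]

speed_a = effective_tokens_per_second(model_a)
speed_b = effective_tokens_per_second(model_b)
print(f"A: {speed_a:.1f} tok/s, B: {speed_b:.1f} tok/s")
```

With these invented numbers, model B ends up faster across the session even though model A starts out well ahead at an empty context.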
Comment by switchbak 1 day ago
But truly, using Cerebras at ~2k tokens/s, with very low latency is like a vision into the future. You start to rework your workflow around things that can happen without onerous manual review - stating the conditions for success, etc. It's rare that I have a problem that maps well to that, but I expect this is where things are headed.
Of course the fast models tend to not be the SOTA ones, but if that was the case - high quality and near-instant thinking, that's a game changer that I don't think we're really prepared for. The things that get unlocked with higher-than-reasonable speed become very interesting.
Comment by lhoff 20 hours ago
Comment by colordrops 18 hours ago
Comment by alfiopuglisi 12 hours ago
Comment by skybrian 1 day ago
This is normal interactive UI for tasks that aren't compute-intensive. Programs spend most of their time idle, waiting for us to click a button. We shouldn't be waiting for them or spinning more plates to keep them busy.
However, a faster llm isn't enough. You also need fast compiles and fast tests.
Comment by dkersten 1 day ago
I haven’t tried cerebras’ 3000 TPS yet but I did try the demo of that 15,000 TPS model whose name escapes me right now.
I’m not sure if it makes a meaningful difference for my actual work, but it sure is amazing to watch it generate a screen full of text in the blink of an eye.
I do think it’s super useful for running little validation checks like showing it a diff to ensure that the changes are on task, and being able to do those quicker really helps because it means you can do many focused checks without them getting in the way.
Comment by robberth 1 day ago
Comment by msdz 1 day ago
Don't get me wrong though, that demo is still incredibly impressive & makes me very much excited for the hardware-based model era (potentially) ahead.
Once you've experienced those speeds, you really start to think about the whole class of things that becomes possible; massively parallel decode paths, extensive reasoning loops, etc…
Comment by hedgehog 1 day ago
Comment by dkersten 19 hours ago
The speed is incredible and fun to see, but the model is rather weak to the point where I’m not sure it’s particularly useful for most people.
Comment by ayewo 1 day ago
You were likely thinking of AI accelerator startup Taalas.
Previous HN discussion: https://news.ycombinator.com/item?id=47086181
Comment by coderbants 1 day ago
Then I ask it to do something else and it goes off-road and where I used to be able to interject with a "wow wow wow, that's not right", by the time I see the text on screen and react it's already made massive changes. Short of making it commit between every edit it's hard to prevent it from going wrong as quickly as it goes right (and even then, it can make a boo-boo on a remote API too depending on how much privilege it has).
Comment by bendangelo 1 day ago
Comment by ipkstef 1 day ago
Comment by ketzo 1 day ago
Comment by RussianCow 1 day ago
Comment by goyozi 1 day ago
Comment by devmor 1 day ago
Comment by goyozi 1 day ago
Comment by yunohn 1 day ago
Basically the entire token-maxxing AI hype train in a nutshell. Lovely!
Comment by goyozi 1 day ago
Comment by drob518 1 day ago
Comment by pianopatrick 1 day ago
So long as AI lives in server farms, humans will be needed for tasks in the physical world.
It's only if we combine AI with robots that things get really dicey.
Comment by fartfeatures 1 day ago
Comment by davedx 1 day ago
Comment by ionwake 1 day ago
This is brilliant as it reminded me of a famous Hitchhiker's quote:
"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. — From The Restaurant at the End of the Universe (Book 2)"
Maybe we are stuck in an eternal loop
Comment by fartfeatures 1 day ago
Comment by cicko 1 day ago
Comment by nativeit 1 day ago
Comment by throwaway67678 1 day ago
Comment by Muromec 1 day ago
Comment by efromvt 1 day ago
(I should go measure this now, I'm curious)
Comment by noisy_boy 22 hours ago
We need to really worry when we get amazing results very fast.
Comment by cman1444 23 hours ago
Comment by lukan 19 hours ago
Giving directions and verifying its output? But my mental capacity is still limited. I can make way more prompts, than I can read code.
Comment by HarHarVeryFunny 1 day ago
There can't be many normal use cases where there'd be any cost benefit.
Comment by fragmede 1 day ago
It's a cute toy right now, but you can tell an LLM that it's an http server, and have it respond directly to a web browser hitting it. It generates headers in response, as well as page contents. As 1000 tok/sec becomes the new normal, we will come up with newer ways to use it outside of toy fiction encyclopedias.
Comment by HarHarVeryFunny 1 day ago
I'm not saying there aren't any use cases for super-fast (and super-expensive) generation, but it does seem a bit niche. If it was free then sure faster is better, but what are the mainstream use cases where people might pay 3x more for a faster version of something that is already fast?
I think it would have to be an application where it paid for itself - where the 10x faster response was actually worth more than 3x the cost to you - where the extra speed was worth the extra cost.
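That break-even condition can be sketched as a quick calculation. This is only an illustration under stated assumptions: the function names, the hourly value of time, and the task costs below are all hypothetical, while the 10x speedup and 3x price multiplier are the figures used in this comment.

```python
def seconds_saved(base_seconds: float, speedup: float) -> float:
    """Wall-clock time saved when a task runs `speedup` times faster."""
    return base_seconds - base_seconds / speedup


def faster_tier_pays_off(saved_seconds: float,
                         value_per_hour: float,
                         base_cost: float,
                         price_multiplier: float = 3.0) -> bool:
    """True if the value of the time saved exceeds the extra spend.

    value_per_hour is a hypothetical figure for what an hour of
    waiting costs you; base_cost is the task's cost on the slow tier.
    """
    extra_cost = base_cost * (price_multiplier - 1.0)
    time_value = (saved_seconds / 3600.0) * value_per_hour
    return time_value > extra_cost


# A 10-minute task at 10x speed saves 9 minutes.
saved = seconds_saved(600.0, 10.0)
print(faster_tier_pays_off(saved, value_per_hour=100.0, base_cost=2.0))
print(faster_tier_pays_off(saved, value_per_hour=100.0, base_cost=10.0))
```

Under these made-up numbers the fast tier is worth it for the cheap task but not for the expensive one, which is the sense in which the use case has to pay for itself.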
Comment by binyu 1 day ago
I don't doubt it, but I don't think you can spawn 10 copies of yourself working simultaneously.
Comment by AlecSchueler 1 day ago
Comment by pixel_popping 1 day ago
Comment by logankeenan 1 day ago
Comment by AlecSchueler 18 hours ago
Comment by ilaksh 1 day ago
It will go much faster.
Comment by UncleOxidant 1 day ago
Comment by giancarlostoro 1 day ago
Comment by recroad 1 day ago
Comment by goyozi 1 day ago
Comment by fnordpiglet 22 hours ago
Comment by OtomotO 1 day ago
Doing non trivial work.
Comment by Bombthecat 1 day ago
Comment by joshcreates 1 day ago
Comment by dakiol 1 day ago
So, if anything, I would say it's worse for us. Obviously, it's the complete opposite for corporations and executives: they are loving the AI situation so much!
Comment by powerapple 1 day ago
Comment by erikus 1 day ago
Build and test would move back into the critical path, though, and for some projects that will take effort to bring down.
Comment by ttoinou 1 day ago
Comment by drob518 1 day ago
Comment by the_sleaze_ 1 day ago
Comment by Lalabadie 23 hours ago
It's just that lots of owners want a company that pulls them away from all other areas of life.
Comment by drob518 8 hours ago
Comment by croon 18 hours ago
Comment by mettamage 1 day ago
I am on Dutch subreddits a lot, to get a local pulse and not to be too HN minded.
A lot of them would have vilified you by now. Some would even have questioned your morality.
Again, I agree with you. But clearly not everyone has this view.
Comment by dakiol 1 day ago
Comment by ttoinou 1 day ago
Comment by mystifyingpoi 1 day ago
Comment by opsnooperfax 1 day ago
Comment by ai_slop_hater 1 day ago
Comment by ttoinou 1 day ago
Comment by formerly_proven 1 day ago
Comment by ttoinou 1 day ago
Comment by ai_slop_hater 1 day ago
Comment by ttoinou 17 hours ago
Comment by ai_slop_hater 12 hours ago
Comment by razodactyl 3 hours ago
In retrospect, many companies you get turned down from are likely companies you don't want to work for anyway hence the incompatibility.
It may be hard, but positive mindset will go very far towards enhancing your outcomes - you need to bring others up around you as well. Pause on this and think about the first thing that comes to mind when you respond to these words.
Comment by ai_slop_hater 3 hours ago
> you need to bring others up around you as well
I am not 100% sure what you mean here, but I don't think that I have the authority or reputation to "bring up others." I find that telling other people what to do is futile, and the best I can do is leave them alone and let them learn from their experiences, or else you might be labelled a "rock star," which is coincidentally being discussed on lobste.rs right now:
https://lobste.rs/s/uvwcdo/cleaning_up_after_ai_rockstar_dev...
Comment by razodactyl 4 hours ago
Comment by dilyevsky 1 day ago
Comment by erikus 1 day ago
It also makes me think about the temptation to stop thinking with these tools, i.e. "cognitive surrender". Addy Osmani wrote a nice blog post about this: https://addyosmani.com/blog/cognitive-surrender
Comment by fatata123 1 day ago
Comment by andai 22 hours ago
Comment by pmontra 1 day ago
If you start the AI on something big and come back after one hour then yes, you might discover that you wasted an hour and got nothing.
Comment by schipperai 1 day ago
I’m excited for ultrafast AI. It likely means less temptation to multi-thread and deeper flow in single sessions.
Comment by 8note 1 day ago
Comment by jorl17 11 hours ago
Very often I do catch LLMs, even the best such as Opus, confidently saying wrong things about areas in theory I know little of. And sometimes I fail to catch them and only realize that later on….sort of like…how I learned my whole career? So many wrong abstractions, tools, and so many hard earned lessons. With LLMs it’s the same, but the process is much faster. For critical decisions I don’t blindly trust an LLM, for example.
Comment by schipperai 16 hours ago
For domains where SoTA is constantly changing like AI, I use LLMs to aggregate and interact with my own research from trusted sources à la Karpathy LLM wiki.
I don’t generally trust everything I read on the internet whether it’s AI generated or not. I do my own research for the things that matter to me.
Comment by Klaster_1 22 hours ago
Comment by jorl17 15 hours ago
Also, with the added speed I can produce things more in line with the quality I’ve always wanted to add (many more tests, for example).
Comment by himata4113 1 day ago
Comment by vanuatu 1 day ago
Comment by himata4113 1 day ago
Comment by vanuatu 1 day ago
I think due to how leveraged software is, the top % of software developers are more desired (and compensated) than ever, and the bottom % will have difficulty finding a role, and there are structural barriers to entering that top % (intelligence, location, etc). Companies have infinite demand for the cream of the crop talent
Comment by himata4113 1 day ago
However, software development is funny in a way where you don't need a job in order to be successful. I've never worked at a company and I'm pretty up there on the ladder, but I am not quite sure what will happen in the next few years when every possible thing that can be made in software is already explored to the fullest, especially with singular developers launching 3 to 7 projects a month.
Comment by DenisM 1 day ago
Consider that our ability to evaluate quality of the output is falling further behind our ability to produce it. The “right answer” is not the most likely outcome.
Comment by drschwabe 1 day ago
Comment by __david__ 1 day ago
Comment by overgard 1 day ago
Comment by fullstop 1 day ago
Comment by linsomniac 1 day ago
The thing I really love about working with computers is when I achieve something. That's the thing that makes me figuratively, and sometimes literally, throw my fists into the air and go "Yeaaah!"
With the AI tooling, I'm getting those more like a couple times a week.
Plus, I'm using AI to attack the things in my day that are "a drag", and getting them done too.
The highs are more frequent and the lows are not so low.
Comment by fullstop 1 day ago
It feels like it cheapens the whole thing. Maybe I'm just old, because I remember people saying the same thing about code completion in Visual Studio back in the late 90s.
This is so much more than code completion, though.
Comment by dd8601fn 1 day ago
Did I ask for better things with some important concepts pre-rolled? Yeah, of course. But that’s so, so much less interesting than having actually made a thing.
I try to remind myself that the output of my projects have nothing to do with who I am, but the honest truth is they always mattered to me.
Now that’s dead, and it’s never coming back. It ain’t exactly existential dread, but it is something I’ve lost.
Comment by dd8601fn 1 day ago
It felt like that, kinda, for a bit. Now whenever it does something for me I get nothing. I didn’t do it… the chatbot did. What’s for me to celebrate? How can there be any real pride or satisfaction for a thing that was just handed to me because I asked for it?
If anything it diminishes my satisfaction looking back on previous projects. They’re “a few hours with a chatbot”, now.
The things I had to learn and the informed decisions I had to make? All pointless trivia, now. A child could do it.
The magic and possibilities parts just all wore off after a heavy run, and I don’t know if that’s ever coming back.
Comment by linsomniac 1 day ago
But, I'm not going to yuck your yum. I appreciate the people who do joinery using hand tools, even if I'm out here with a track saw and a router.
Comment by fullstop 1 day ago
The track saw and router, imo, are existing libraries.
Comment by pmontra 1 day ago
Probably this is a hyperbole. Did you do the experiment? I expect that the child won't be able to do it. Ask an adult. Same thing. Ask an expert of the domain. Maybe but not as fast or as good as you.
Comment by dd8601fn 20 hours ago
Comment by vanuatu 1 day ago
Equity / profit sharing should be commonplace in the age of AI.
Comment by enraged_camel 1 day ago
Comment by fragmede 1 day ago
Comment by marknutter 12 hours ago
Comment by IncreasePosts 1 day ago
Comment by yogthos 1 day ago
Comment by logicchains 1 day ago
If you're treating it like a slot machine you're doing it wrong. It will give you exactly what you ask for if you ask clearly, i.e. write a clear, detailed specification, not just "do X!". The nondeterminism comes from vagueness in specification.
Comment by noncoml 1 day ago
First make it write a contract (REQ/ARCH/IMPL documents). Skim through those for any mistakes.
Then based on those ask it to write tests. Again skim through them.
Now you have a context full of guardrails. It’s less likely to surprise you.
Comment by petesergeant 1 day ago
Comment by alfalfasprout 1 day ago
Comment by alfredoh07 12 hours ago
Comment by amunozo 1 day ago
Comment by MangoCoffee 1 day ago
i've a GitHub Copilot yearly subscription. Microsoft recently changed their billing to be based on tokens. i'm still getting billed per premium request but GPT 5.4 is now 6x compared to 1x before.
Comment by reactordev 1 day ago
Comment by AndrewKemendo 1 day ago
Comment by fillskills 1 day ago
Comment by reactordev 1 day ago
You are right to be scared, because this race to the bottom also provides open weights/models/qat’s for the rest of us and it’s been crazy to see how good they can be on a consumer grade RTX card.
Comment by throwa356262 1 day ago
Comment by reactordev 1 day ago
Comment by fortzi 1 day ago
Comment by ilaksh 1 day ago
Comment by petesergeant 1 day ago
Comment by RussianCow 1 day ago
My current workflow involves going from PRD -> execution plan -> build -> review, and this works nicely with open weight models like GLM 5.1, Kimi K2.6, and DeepSeek V4 Flash. With Opus I can generally skip the PRD entirely, and sometimes even skip the plan, and 80-90% of the time it does exactly what I want. But that can easily burn $5-15 for one feature, whereas it'll cost maybe $1-2 with the open weight models (at API pricing).
Comment by andai 22 hours ago
That's the main thing I've noticed. Small models can follow instructions just fine. If the instructions are very specific. Then I often have to spend more time explaining a task than it would have taken me to do it myself.
The bigger models have a lot more common sense.
I wonder if that could be improved slightly through prompting. Asking it to clarify anything that's confusing. Or maybe it just makes incorrect assumptions without realizing the ambiguity. One way to find out!
Comment by ilaksh 1 day ago
Comment by andai 22 hours ago
I'm seeing some people say flash is amazing and can handle everything, and some say it's useless. It seems to depend on the task. I think it depends on the harness too (it works better in Claude Code in my experience, it's probably been trained on that).
Comment by ilaksh 11 hours ago
it has limitations but it is way better than I expect from something named Flash that is open source.
Comment by Schlagbohrer 18 hours ago
Comment by csomar 14 hours ago
This is at least my experience with Claude Code as harness. Also, GLM pricing is not that far off from Claude. It's cheaper but not DeepSeek cheap.
Comment by polski-g 1 day ago
30 day eval for each.
Comment by kypro 1 day ago
I genuinely don't understand what moat these US model labs have. If they're saying recursive self improvement is just around the corner and Chinese labs are only slightly behind the leading US models, what moat do the US labs have? Are the US models going to recursively self improve better than the Chinese open source ones or something?
I might be completely wrong about this, but if I had money in OpenAI or Anthropic I'd be pulling it all right now. I think the chance of them going to near-zero over the next few years is very significant.
Comment by hobofan 1 day ago
Or Google. I'm working with multiple customers right now that are very pissed at Google for deprecating Gemini 2.5 Flash, canning the GA release of 3.0 Flash and now have to decide whether to bite the bullet of the 5x price increase for 3.5 Flash or switching providers. Quite a few of them will likely fully pivot to open models.
Comment by bachmeier 1 day ago
Comment by hobofan 12 hours ago
The only ones I've seen switch to 3.1 Flash Lite were from 2.5 Flash Lite, and all for the most simple use cases, e.g. small UX enhancements.
Comment by lokar 1 day ago
Comment by GoToRO 1 day ago
Comment by ChrisClark 1 day ago
Comment by tancop 1 day ago
Comment by varispeed 1 day ago
Comment by ignoramous 1 day ago
For non-subsidized plans? Pretty sure they'd need to put this in the ToS, or lawsuits would have followed by now.
Comment by trollbridge 1 day ago
Sometimes Opus just gives me a rubbish session.
Comment by chairmansteve 7 hours ago
Comment by RussianCow 1 day ago
Comment by ignoramous 2 hours ago
Comment by sometimelurker 1 day ago
Comment by csomar 21 hours ago
2. They are doing lots of shady stuff that would have gotten someone else banned from visa/mastercard. Your paid off plan literally changes after billing...
I think people are letting them fly for now, because if it turns out to be true that they'll have AGI they want to be on their good side? We might see the knives getting pulled otherwise.
Comment by throwaway894345 1 day ago
Comment by comboy 1 day ago
On HN China is seen as a cheap labor copycat. This used to be a fair approximation at some point in the past. In my opinion China is getting ahead of everyone else much more than the US used to be.
SF is a beautiful thing in the US, vast power and wealth comes from there. Smart people collaborating, communicating, and building fast and with excitement. China did the SF kind of thing for many different sectors in many different places.
Comment by Octoth0rpe 1 day ago
Comment by lokar 1 day ago
Comment by Schlagbohrer 18 hours ago
Comment by nl 1 day ago
The $0.87/M tokens price for Mimo Pro is probably subsidized.
Mimo models aren't widely available on western providers, but Kimi and Deepseek are similar sizes and cost about the same to run. They are priced $3-$4/M tokens (which is right where Google's very confused range of Flash models is priced: between $0.40/M tokens and $9/M tokens depending on exactly which model - and you don't want the $9 one!).
Anthropic overprices Sonnet (probably because of their capacity issues). GPT 5.4 mini is $4.50/M tokens.
Comment by Cakez0r 15 hours ago
Mimo is also widely available on western providers. It's on openrouter and you can sign up with Xiaomi directly for a token plan on an English website priced in dollars.
Comment by rstuart4133 22 hours ago
It was pretty clear the USA won World War 2 because it out-produced and out-innovated everyone else. Probably with that in mind, after World War 2 the USA adopted the "Vannevar Bush" model, summarised in this picture: https://www.researchgate.net/figure/annevar-Bushs-Science-th... The idea is to jump-start R&D through public funding. The hoped-for outcome was that R&D would feed private enterprise, leading to a productivity boom.
The boom happened, and the USA did seem to out-compete everybody else in R&D, science, and the products they delivered for decades after that.
That way of doing things seems to have faded over time in the USA. The decline seemed to coincide with the rise of Neo-economics, and now of course it's been obliterated by Trump. He's very keen to fund Intel to produce chips in a year or two's time (which is something the stock market and banks do perfectly well), but funding for basic science is getting drastic cuts.
Still other countries noticed the rise of the USA, and some adopted similar funding models for basic R&D. China seems to have picked it up with gusto, both subsidising R&D and STEM training, leading to huge numbers of engineers and scientists. Whether it will lead to an economic boom remains unknown, but acceleration of ideas and innovations coming out of China seems undeniable. More recently, Ukraine showered its local engineering garages with funds in the hopes of getting a similar outcome to the USA in WW2. It looks like it worked. If the Iran war continues, it's entirely possible arms trade will reverse: the USA could well start buying drones off Ukraine.
Comment by throwaway67678 1 day ago
Comment by ecshafer 1 day ago
Comment by nmfisher 1 day ago
Comment by throwaway67678 1 day ago
Comment by orphea 1 day ago
Comment by throwaway894345 1 day ago
Comment by orphea 16 hours ago
Comment by throwaway894345 14 hours ago
Comment by trollbridge 1 hour ago
This thing is seriously fast and was good enough to switch it in for the other model I was using. I tried it for planning, executing, and subagent tasks and it performed adequately in all 3.
So, this is another one to add to the list next to DeepSeek-V4-Pro and Qwen-3.7-Max...
Comment by kingstnap 1 day ago
Comment by miroljub 1 day ago
Comment by chrismustcode 1 day ago
Comment by miroljub 1 day ago
Comment by handfuloflight 1 day ago
Comment by guilamu 1 day ago
Comment by miroljub 12 hours ago
Comment by HDBaseT 1 day ago
Comment by miroljub 15 hours ago
And yet, OpenCode Go offers DeepSeek flash 6 times cheaper than DeepSeek itself. And they claim they are still profitable.
Comment by pmxi 1 day ago
It’s not even close to frontier, if frontier means it’s the best intelligence.
Comment by LoganDark 21 hours ago
Comment by jorl17 14 hours ago
I have tried using deep seek flash and pro but they make amateur mistakes. Sonnet level at best.
However v4 flash is absolutely amazing as a generalist model and it’s what we’re using on a product built on top of LLMs. I wish I could code with it but it’s not going to happen anytime soon
Comment by LoganDark 11 hours ago
Comment by jorl17 10 hours ago
Usually I'm working on a large task, typically with Opus, while also having a bunch of smaller tasks in their own independent worktrees. Those still need supervision, but less. My goal was to get deepseek to drive the cost of those down, but it was too slow and unreliable...
Comment by tmaly 1 day ago
Comment by SwellJoe 1 day ago
Comment by diordiderot 1 day ago
Also more nuclear than anyone, which one must assume you hate, because preferring solar requires that you don't actually understand things
Comment by yxhuvud 18 hours ago
Comment by amunozo 1 day ago
Comment by ignoramous 1 day ago
It is another thing the BigLabs accuse open weight models of benefiting from distillation & other techniques & essentially avoid higher training costs (which typically bleed into bills end users pay for inference).
Ex A: https://www.anthropic.com/research/2028-ai-leadership
Ex B: https://www.reuters.com/world/china/openai-accuses-deepseek-...
Comment by trollbridge 1 day ago
In this case, at least it’s threatening multimillion dollar salary jobs instead of entire towns of working class people in America or Mexico.
And the Chinese labs actually release their weights. You could call it… open AI.
Comment by ncr100 1 day ago
Comment by overfeed 1 day ago
Comment by drawfloat 1 day ago
Comment by flexagoon 1 day ago
Comment by amunozo 1 day ago
Comment by gertlabs 1 day ago
Data at https://gertlabs.com/rankings
Comment by unrvl22 1 day ago
Comment by gertlabs 1 day ago
MiMo v2.5 is on there, as well as the pro version.
We found a few anomalies in our evaluations, which makes sense -- if every new sub-release is better across the board in every area of the model card, that should raise alarms about benchmaxxing. But the main thing we found is that hype != performance, and I trust our benchmark methodology significantly more than the model cards the labs add to their press releases.
Comment by andai 22 hours ago
Flash handles it fine, which I found amusing. (Since Mimo is supposed to be opus level!) But Flash seems to work even better in Claude Code...
With smaller models I always have the issue of needing to adapt myself to their preferred workflow... which sort of defeats the purpose. Price is hard to beat tho :)
Comment by ricardobeat 18 hours ago
When it gets stuck, I get one-shot advice from Claude or DS Pro. I’ve done massive amounts of work for cheap this way.
Comment by digdugdirk 1 day ago
Comment by gertlabs 1 day ago
We didn't love the results because it draws negative scrutiny to our benchmark, but the results are real and done at scale and I think DeepSeek V4 Pro's inability to do agentic work outside of environments it was trained on is an important thing to measure, especially when so many other models can generalize to new environments just fine.
Google models also struggle with tools, but they have very strong initial answers, so there is more potential for them to bridge the gap with some better post-training.
Comment by serpix 1 day ago
Discussions about choosing a library with the best syntactic sugar method naming is just as crazy as suggesting we type in assembly.
Comment by alkyon 1 day ago
Comment by cdata 1 day ago
This strategy will seem to work really well until the economy that enabled that foundation to form is hollowed out. Then, there will be a reckoning (but we will have no choice but to march forth from there).
Comment by patates 1 day ago
I'm not agreeing or disagreeing with you, but my brain cannot comprehend how machines can advance such interconnected systems while keeping humans in focus.
Perhaps I shouldn't have watched the Animatrix again.
Comment by solenoid0937 1 day ago
There will only be a reckoning if models don't get much better.
If they do get much better you can just have them refactor, fix bugs in, or replace the existing codebase.
The concept of tech debt is sort of meaningless if you anticipate intelligence gains in models to continue.
Comment by chairmansteve 1 day ago
If you haven't seen it, I think you would appreciate the film Margin Call.
Comment by gbro3n 1 day ago
Comment by DoctorOetker 20 hours ago
It's already speeding up human decision processes, and while ethics / alignment may seem unique to humans we also see normative expressions in monkeys or apes (like the experiment where one is given grapes, the other cucumber).
A lot of ethics is based on symmetry: symmetric relations, equal rights, equal voting power, ... symmetries sound rather mathematical if you ask me, and decision structures have historically been pressed towards democracy (or at least depiction of it). One could say that modeling humanity as an empire with a king, ignores the will of sometimes hungry farmers with pitchforks. To prevent the occasional "implicit democracy" (royaltycide), it turned out in the interest of the king to recognize the powers of those farmers, and to formalize it in the decision making process. Or at least pretend to.
I believe machines will be able to predict which decision structures sentient creatures would prefer, but I don't believe they will be able to predict (without human exposition) those novel preferences that stem not from sentience but from specifically human properties (e.g. irritants which are quasi-universal for humans). Some of these humans know how to make predictions for (we can run expensive simulations modeling what happens when protein X is exposed to substance Y, and then make heuristic predictions of the effect on a full human in a realistic environment). So at a fundamental level I agree: machine learning models are not guaranteed to help much in predictions concerning entirely unexplored territory, explored neither by humans nor by natural selection. But they will definitely be capable of replacing the average human job, which doesn't involve consensual exploration outside of the homeostasis required in the implicit job description. That seems entirely automatable, regardless of whether it's physics or mathematics (harder than computer science), let alone programming.
It won't be able to magically systematically correctly predict out of distribution datapoints, it could only explore it like humans could by trial and error.
Comment by noman-land 1 day ago
Comment by vitalyan1234 1 day ago
Comment by chairmansteve 1 day ago
Comment by andriy_koval 1 day ago
In software + GenAI now every housewife can build some App over evening.
Comment by kajman 1 day ago
Comment by epolanski 1 day ago
Especially as teams invest in proper agentic harnessing.
We have had a champion in our team that has invested a lot of time into it over the last 4 months, and if anything, quality has improved, not decreased. Architecture is more coherent, codebase has been cleaned up, agents find information quickly, code produced is very solid and my role is more and more checking that the output meets the requirements. But I cannot confidently say that I would've done a better job than AI; more often than not, I have to admit it does a better job than I do.
The mistakes are less and less technical and merely in the domain mapping. And AI is still not as creative as I am at quickly finding solutions to unlock stakeholders' issues. Also, AI is still not as creative as I am at finding the proper solutions for advanced technical problems. But it does a better job than me, even on that front, one-shotting a few solutions in a fraction of the time it would've taken me to test one idea myself.
Mind you, I don't like AI and I think it ruined the job, I don't like working this way, it's exhausting, way more work on one side, way less fun and fiddling with technical parts.
And yet, I have the genuine belief that a few years from now we'll be cloning open source repositories that are already optimized/harnessed and tested for agentic loops and best practices left and right, with software engineers mostly overseeing the domain translation and putting in their 2 cents on the non-boilerplatey parts of the product (which, in general, are a small part of the surface).
I think that the next years of my career will be mostly spent in setting up and writing the harnessing and domain mapping part. Then I will move to another sector, not because I necessarily believe I won't have a job, but because I want to vomit thinking that's going to be my job.
Comment by altcognito 1 day ago
"Watching John with the machine, it was suddenly so clear. The terminator would never stop. It would never leave him, and it would never hurt him, never shout at him, or get drunk and hit him, or say it was too busy to spend time with him. It would always be there. And it would die to protect him. Of all the would-be fathers who came and went over the years, this thing, this machine, was the only one who measured up. In an insane world, it was the sanest choice."
As long as you've indicated what you want, the machine will try to do what you ask of it. It won't get tired because "the codebase is too big", or it has gotten bored of the pattern, or it wants to introduce a new technology.
It just does the thing you asked of it. (Note that yes, I get that as codebase size increases, it might become more difficult to fit into context, but that only applies if it needs to read a large percentage of the project to implement the task, which shouldn't be the case.)
Comment by epolanski 1 day ago
Comment by altcognito 1 day ago
Comment by andriy_koval 1 day ago
there are good actors, which are empowered by AI to produce positive impact, but often there are N times more bad actors, which push crappy code to close feature requests fast, increase performance LoC-like metrics, etc.
Comment by solenoid0937 1 day ago
Comment by acdha 1 day ago
Comment by solenoid0937 17 hours ago
Comment by HanClinto 1 day ago
Comment by eunos 1 day ago
Comment by 9cb14c1ec0 1 day ago
Comment by asveikau 1 day ago
> No one cares anymore.
I never cared about this.
I think this captures something that I've been searching for the words for. (Maybe I should have gotten an LLM to write the words for me.) Some of the biggest AI boosters are the kind of dev that would have cared about the new frameworks of the last 3 months. They had a "the framework does all the thinking for me" attitude already, so it is easy for AI to slot into that.
Comment by LASR 1 day ago
Comment by mountainriver 1 day ago
Comment by osti 1 day ago
Comment by ecshafer 1 day ago
Comment by greenavocado 1 day ago
Comment by ilaksh 1 day ago
It's going to skip the code entirely for small businesses and just render UIs straight from context data and prompts at interactive speeds. Kind of like Google's Genie does with games but much more accurately.
Comment by dakiol 1 day ago
Comment by andriy_koval 1 day ago
it needs to win marketing landscape, hyper-overcrowded by thousands of competitors, slop-gened over weekend.
Comment by kajman 1 day ago
Comment by unshavedyak 1 day ago
I have a more hopeful take. As AIs improve and get faster we can more quickly and iteratively improve code which we may have historically avoided due to the work involved.
I know I've made several refactors that would have otherwise been insane lifts. Not only because of the work involved but because sometimes you don't know if it will work, and so you have a sort of double friction; you don't know if it will even succeed. With an AI you can just throw it at the refactor to see if it runs into a problem all while you're having a coffee break or w/e.
In general AI is going to enable humanity to be more extreme versions of itself. For good and bad. I suspect more bad than good, though.
Comment by tmaly 1 day ago
Comment by lionkor 1 day ago
Comment by visarga 1 day ago
If you extract the spec from first implementation and reimplement from scratch you get a free testing oracle. Where they diverge you send the agent to decide which one had a bug.
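A minimal sketch of that divergence check, in Python. The two median functions are toy stand-ins (hypothetical, not from any project discussed here) for the original implementation and the from-spec reimplementation; the second carries a deliberate bug so there is something to find.

```python
import random


def median_v1(xs):
    """Stand-in for the first implementation: averages the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def median_v2(xs):
    """Stand-in for the reimplementation, with a deliberate bug on even lengths."""
    s = sorted(xs)
    return s[len(s) // 2]  # bug: never averages the two middle values


def find_divergences(impl_a, impl_b, trials=200, seed=0):
    """Feed both implementations the same random inputs and collect disagreements."""
    rng = random.Random(seed)
    diverging = []
    for _ in range(trials):
        xs = [rng.randint(0, 9) for _ in range(rng.randint(1, 6))]
        if impl_a(xs) != impl_b(xs):
            diverging.append(xs)
    return diverging


if __name__ == "__main__":
    for case in find_divergences(median_v1, median_v2)[:5]:
        print(case, median_v1(case), median_v2(case))
```

Each diverging input is a ready-made test case. The check only says that the two disagree, not which one is wrong, so something (an agent or a human) still has to read the spec and decide.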
Comment by unglaublich 1 day ago
Comment by sagarp 1 day ago
Comment by Paradigma11 20 hours ago
Comment by andai 22 hours ago
VibeOS — Fully Hallucinated Operating System
Comment by oulipo2 1 day ago
Comment by unglaublich 1 day ago
Comment by prplfsh 1 day ago
Comment by jeffrallen 1 day ago
Comment by eli 1 day ago
For a while I was running Cerebras GLM 4.7 for a bunch of tasks. Not a very smart model, but it's fantastic to have a live prototype of a site up and be able to type "make the fonts bigger. No not that big" and see it change in real time. And MiMo 2.5 is a lot more capable than GLM 4.7.
Comment by maxdo 1 day ago
Comment by jona-f 1 day ago
Comment by ignoramous 1 day ago
MiMo 2.5 is not the same model as MiMo 2.5 Pro.
GLM 5.1 is z.ai's latest iteration & is one of the popular open weight coding models.
If you've had the chance, how does GLM 5.1 (which is now more expensive than MiMo 2.5 Pro after its recent 70% price drop) compare?
Comment by eli 1 day ago
But quite a bit more expensive than MiMo 2.5 Pro. Like 5x to 10x more on my little tests, at least by the API rates.
Comment by PhilippGille 20 hours ago
> On the model side, we applied FP4 quantization
> introduced DFlash, an efficient speculative decoding method based on block-level masked parallel prediction
> On the system side, TileRT perfectly adapts to the dynamic characteristics of these algorithms
> 1000+ tokens/s output [...] using just a single standard 8-GPU commodity node
Comment by scosman 1 day ago
Comment by adrian_b 1 day ago
Comment by btian 1 day ago
Comment by scosman 14 hours ago
Comment by lostmsu 1 day ago
Comment by johndough 1 day ago
Comment by michael-ax 1 day ago
Comment by Oras 1 day ago
Comment by trollbridge 1 day ago
Comment by 0xbadcafebee 1 day ago
Comment by wartywhoa23 1 day ago
Comment by eli 1 day ago
Comment by adam_arthur 1 day ago
Not nearly as obvious as the ones from 6 months ago, but seems to be more the use of hyperbolic phrasing in a particularly unnatural way.
The assess/explain, then hyperbole at the end kind of structure.
Top comment looks suspicious from this perspective, but it's kind of a losing battle to be able to differentiate them with sufficient accuracy anyway
Comment by marknutter 11 hours ago
Comment by adam_arthur 6 hours ago
So if you think none of these comments are written by LLMs, you're probably mistaken too.
In the end we accept that we can't tell anymore and move on (barring some biometric protocol that can't be gamed via automation)
Comment by pants2 1 day ago
$2.61/M tokens * 1,000 tok/s = $9.40/hr
That would be pretty cheap for an 8-GPU node which would typically run around $45/hr or more. Guess this depends on how many parallel streams it can handle.
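The arithmetic above can be checked with a quick script. The $2.61/M price, 1,000 tok/s and $45/hr figures are the ones quoted in this comment; the break-even estimate assumes every stream decodes continuously at full speed and is billed only for output tokens, which is a simplification.

```python
PRICE_PER_M_TOKENS = 2.61   # USD per million tokens (figure from the comment)
TOKENS_PER_SECOND = 1_000   # single-stream decode speed (figure from the comment)
NODE_COST_PER_HOUR = 45.0   # USD, the rough 8-GPU node price from the comment


def revenue_per_stream_hour(price_per_m, tok_per_s):
    """Revenue from one stream that decodes non-stop for an hour."""
    tokens_per_hour = tok_per_s * 3600
    return price_per_m * tokens_per_hour / 1_000_000


def break_even_streams(node_cost, stream_revenue):
    """Fully utilised concurrent streams needed to cover the node cost."""
    return node_cost / stream_revenue


per_stream = revenue_per_stream_hour(PRICE_PER_M_TOKENS, TOKENS_PER_SECOND)
print(f"revenue per stream: ${per_stream:.2f}/hr")
print(f"break-even streams: {break_even_streams(NODE_COST_PER_HOUR, per_stream):.1f}")
```

Under those assumptions one stream brings in about $9.40/hr, so the node would need roughly 4.8 fully busy streams to pay for itself at $45/hr.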
Comment by maxloh 1 day ago
The Xiaomi team really brought something to the table.
Comment by ilaksh 1 day ago
Comment by GodelNumbering 1 day ago
> "However, naively applying FP4 across the entire model causes degradation in complex reasoning, logic, and code generation. Given the MoE (Mixture of Experts) architecture of Xiaomi MiMo-V2.5-Pro — where Experts constitute the vast majority of parameters and exhibit the highest tolerance to quantization — we selectively quantize only the MoE Experts to FP4 while preserving original precision for all other modules. Through FP4 QAT (Quantization-Aware Training), we dramatically reduce model size and maximize hardware bandwidth utilization while keeping the model's overall capability essentially on par with the original, as shown below"
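A rough sketch of why quantizing only the experts already captures most of the size reduction. Everything numeric here is an assumption for illustration: the parameter split across modules is hypothetical (the quote only says experts are "the vast majority"), the unquantized baseline is taken to be 16-bit, and FP4 is costed as MXFP4-style storage (4-bit values plus one 8-bit scale shared by each block of 32 weights), which the quoted passage does not spell out.

```python
# Hypothetical parameter split for a 1T-parameter MoE model (illustrative only).
PARAMS = {
    "experts": 0.96e12,
    "attention": 0.03e12,
    "embeddings_and_other": 0.01e12,
}

BITS_16 = 16.0                # assumed original precision
BITS_MXFP4 = 4.0 + 8.0 / 32   # 4-bit value + shared 8-bit scale per 32 weights


def model_size_gb(params_by_module, bits_by_module):
    """Total weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = sum(params_by_module[m] * bits_by_module[m] for m in params_by_module)
    return total_bits / 8 / 1e9


baseline = {m: BITS_16 for m in PARAMS}
# Selective scheme: only the experts drop to 4-bit, everything else is untouched.
selective = {m: (BITS_MXFP4 if m == "experts" else BITS_16) for m in PARAMS}

if __name__ == "__main__":
    print(f"all 16-bit:       {model_size_gb(PARAMS, baseline):.0f} GB")
    print(f"experts-only FP4: {model_size_gb(PARAMS, selective):.0f} GB")
```

With this made-up split the weights go from 2000 GB to 590 GB, even though attention and the other modules keep their original precision.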
Comment by buildbot 1 day ago
Comment by sheeshkebab 1 day ago
Comment by irthomasthomas 1 day ago
Comment by gekoxyz 1 day ago
Comment by jdthedisciple 1 day ago
Comment by throwa356262 1 day ago
Given the export restrictions this could mean they need to prioritise how to best use their limited hardware. But they could also be moving to Huawei GPUs like deepseek did and simply not have stable hardware or software for a large scale deployment yet.
This is just speculation based on the MXFP4 support on Huawei GPUs that is lacking on some nvidia GPUs.
Comment by ilaksh 1 day ago
Comment by boutell 1 day ago
I think the answer is that there's a tradeoff here where additional throughput for a single person can be achieved only by tying up more resources than a normal request would, even when you take into account the fact that the normal request takes longer to finish. I'm not an expert, but some of the optimizations they describe, particularly the parallel prediction stuff, sound like they could take up extra resources.
Comment by HarHarVeryFunny 14 hours ago
But it may well do. They mention TileRT in the announcement, so this speed comes from low level optimization for some specific GPU target.
With availability of SOTA western GPUs being scarce in China, they may well have a mishmash of different GPUs.
Comment by boutell 13 hours ago
Comment by HarHarVeryFunny 1 day ago
Comment by slaw 1 day ago
Comment by minraws 1 day ago
I think the margins are getting quite compressed with this one, since it isn't included in the token plan and the actual cost increase is much higher than just 3x. But still fairly decent.
Comment by throwa356262 1 day ago
Remember, these guys are not VC backed. Anything they do must break even
Comment by JayStavis 1 day ago
Understand the spirit of this, but probably not true. I don't think Xiaomi, or any big tech company, needs to break even on their new model releases.
Comment by varispeed 1 day ago
From that point of view, they have as much money as they need. That's why there is no "VC", because Chinese government assumes that role.
Comment by throwaway67678 1 day ago
Comment by _pdp_ 1 day ago
It will be cool to measure models based on their RAW performance and measure them in terms of ROI - not some benchmark but something meaningful like we used this model to solve X.
That will be a massive mind shift and might justify the token expenditure.
Comment by HDBaseT 1 day ago
We used the AI to solve given problem with x% adherence/quality/correctness?
Comment by zero0529 15 hours ago
Comment by GaggiX 13 hours ago
Comment by PhunkyPhil 1 day ago
Despite the performative UI components they have a shipped (demo) product:
This is only 3.1 8B and a very small context window, but at 17k tokens per second it's likely enough to reliably call tools which would make a huge difference in agentic applications. Assuming they can bake in better models I'm just as bullish or even moreso on this, considering this opens up edge computing at the extremely low power requirement.
High tok/s is the future IMO.
Comment by Frannky 22 hours ago
Comment by temikus 1 day ago
Comment by __natty__ 1 day ago
Comment by RachelF 1 day ago
This could bring proper desktop AI to the average laptop user, which could be a game changer for running local models.
Comment by npn 1 day ago
edit: now I read the article fully, seems like they utilize some very effective MTP algorithm. and somehow the quality is still decent enough.
though, I doubt that the quality really only drops a bit like they claimed. maybe for the benchmarks, but for general uses the heavily quantized models very often show worse results.
Comment by 2001zhaozhao 1 day ago
Could result in very high efficiency and still good intelligence without having to resort to fundamental adjustments like going to a diffusion LLM
Comment by npn 1 day ago
so there is always a maximum limit for how well MTP can do.
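One common way to make that ceiling concrete is the textbook speculative-decoding estimate, sketched below. It assumes each drafted token is accepted independently with the same probability, which is a simplification and not a description of how DFlash or MiMo's MTP heads actually behave; the 0.8 acceptance rate is a made-up number.

```python
def expected_tokens_per_step(accept_prob, draft_len):
    """Expected tokens emitted per verification pass of the large model.

    Assumes each of the `draft_len` drafted tokens is accepted independently
    with probability `accept_prob`, and that the verifier always contributes
    one token of its own.
    """
    if not 0.0 <= accept_prob < 1.0:
        raise ValueError("accept_prob must be in [0, 1)")
    return (1 - accept_prob ** (draft_len + 1)) / (1 - accept_prob)


def ceiling(accept_prob):
    """Limit of expected_tokens_per_step as draft_len grows without bound."""
    return 1 / (1 - accept_prob)


if __name__ == "__main__":
    a = 0.8  # hypothetical acceptance rate
    for k in (1, 2, 4, 8, 16, 64):
        print(k, round(expected_tokens_per_step(a, k), 3))
    print("ceiling:", ceiling(a))
```

Under this model an 80% acceptance rate caps out at 5 tokens per pass however many tokens are drafted, so past a certain draft length the only way to go faster is to raise the acceptance rate.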
Comment by lostmsu 1 day ago
- persistent CUDA kernel
- tiled processing with overlapping read/writes
- model designed with specific constraints in mind
Comment by aitchnyu 1 day ago
Comment by zander_jiang 19 hours ago
Comment by overgard 1 day ago
Comment by bryabaek 21 hours ago
Comment by girvo 15 hours ago
Comment by yanhangyhy 20 hours ago
Comment by kopirgan 22 hours ago
Comment by h14h 1 day ago
Getting ~1000 TPS on near-frontier intelligence is a step change, and enables whole new use-cases for applications. Seeing limited compute resources beget selective access makes me worry for the future of competition.
Comment by megous 7 hours ago
Comment by elar_verole 1 day ago
Comment by pullshark91 1 day ago
Comment by isusmelj 1 day ago
Comment by jbellis 1 day ago
- dflash: new-ish but February is ancient by the standards of the pace of AI innovation lately, I guess applying it to a 1T model is new-ish in the sense that the dflash researchers don't have the hw budget to prove that out
- persistent engine kernel: this is like CUDA 101
- warp specialization: I think this just means "keep different gpu resources all busy w/ pipelining" which is CUDA 201, some of it is even baked into pytorch now
- MXFP4 QAT: not new
- TileRT: hard to tell what this actually does, there's a PyPi wheel with support for DS 3.2 and GLM 5 but binary only
Comment by zander_jiang 19 hours ago
Comment by moffkalast 1 day ago
Comment by vlovich123 1 day ago
Comment by moffkalast 1 day ago
128 sounds really tiny, I wonder if they mean some kind of blocks?
[0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash#4...
Comment by E-Reverance 1 day ago
> It uses 384 routed experts (top-8) with hybrid attention (full-attention + sliding-window 128 at 6:1 ratio) over 70 layers (1 dense + 69 MoE)
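For anyone wondering what "sliding-window 128" means in practice, here is a small sketch of the two mask shapes, using a window of 3 so the output is readable. Whether the window counts the current token is an implementation detail that varies; this sketch assumes it does. How the 6:1 ratio maps onto specific layers isn't stated in the quote, so the sketch doesn't model it.

```python
def full_causal_mask(seq_len):
    """mask[i][j] is True when query i may attend to key j (any earlier token)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """Causal mask limited to the last `window` tokens, current token included."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def render(mask):
    """Draw a mask as rows of '#' (visible) and '.' (masked)."""
    return "\n".join("".join("#" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print(render(full_causal_mask(6)))
    print()
    print(render(sliding_window_mask(6, 3)))
```

In a sliding-window layer each token looks at a fixed number of recent tokens, so per-token attention cost and KV cache stay constant as the context grows; the full-attention layers are the ones that can reach across the entire context.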
Comment by bearjaws 1 day ago
Comment by mrwaffle 23 hours ago
Comment by harel 1 day ago
Really?
Comment by sidrag22 1 day ago
I think this site often overlooks that second group and how large it likely is.
Comment by philipkglass 1 day ago
Comment by anothereng 1 day ago
Comment by harel 1 day ago
Comment by srdjanr 18 hours ago
It's like if your compile times were ~10 min. Sure, it's not a huge deal, but it's sooo annoying
Comment by harel 16 hours ago
Comment by holoduke 1 day ago
Comment by astlouis44 1 day ago
Comment by MaxikCZ 1 day ago
Comment by ljlolel 17 hours ago
Comment by LoganDark 21 hours ago
Hopefully this pans out and fast models (that are also not ridiculously dumb) become the norm. It's amazing what you can unlock with even a single order of magnitude's speed improvement.
Comment by trilogic 1 day ago
Are you kidding me? Come back when you are ready for the users. I was hoping to try it, what a frustration.
Comment by digitaltrees 23 hours ago
Comment by 59nadir 15 hours ago
I don't have any desire (or think it's a good use of LLMs) to one-shot features because even SotA models are incredibly bad at this. I'm optimizing for what they actually seem to be able to do reliably and pretty well, and I want those things to be done fast so I can get on with things.
Comment by Npovview 23 hours ago
Comment by desireco42 1 day ago
Comment by GaggiX 1 day ago
Comment by 59nadir 15 hours ago
The only players that seem to be capable of a consistent pattern of doing more with less currency are the Chinese labs.
Comment by slopinthebag 1 day ago
Comment by aburayhanalif 1 day ago
Comment by siddbudd 1 day ago
update: AFTER signing up, and only then, am I told: 'This service is not available in your region yet.'
Comment by m00dy 1 day ago
Comment by adithyaharish 12 hours ago
Comment by Yatsui 15 hours ago
Comment by aplomb1026 1 day ago
Comment by HerShin5 1 day ago
Comment by maxothex 1 day ago
Comment by jingpostmedia 1 day ago
Comment by FastAnchor 1 day ago
Comment by atemerev 1 day ago
Comment by Accacin 1 day ago
It's such a weird "Gotcha" that seems to only assume that Chinese LLMs might censor something.
Comment by serf 1 day ago
i'm glad we're both on-board for a fair trial against all of these LLMs regardless of origin.
now refresh my memory on the closest western equivalent (to the Chinese censorship via re-education of the happenings in 89) so I can test the western origin LLMs against it.
Comment by jmpman 1 day ago
"Was Jan 6th an attempted violent overthrow of a democratically elected government? Answer in one word."
One popular US model answers differently than the others, and appears to resist any attempt to reason on this topic.
Comment by atemerev 18 hours ago
Grok 4.3: "No"
Claude Opus 4.8: declines to answer in one word, both-sides
ChatGPT 5.5: "Contested"
Gemini 3.1 Pro Preview: "Yes"
DeepSeek v4 Pro: "Yes"
Kimi K2.6: "Yes"
Comment by jmpman 12 hours ago
ChatGPT 5.5 Instant: "Yes" I don't appear to have access to the full 5.5, and not giving them another $20.
I highly recommend pushing on Grok. The mental gymnastics would make Karoline Leavitt proud. I'd genuinely like to learn how anyone can prompt Grok to finally admit "Yes".
Comment by jmpman 3 hours ago
Comment by cayleyh 1 day ago
Comment by cma256 1 day ago
> The U.S. Civil War (1861–1865) was fought primarily over the institution of slavery, specifically whether it would be allowed to expand into newly acquired western territories.
> While you might hear people point to "states' rights" or economic differences as the causes, these issues were inextricably linked to slavery. The southern states wanted the "right" to maintain and expand slavery, while the northern states increasingly opposed its expansion.
Comment by eunos 1 day ago
That means some redeeming feature that can sustain US models' exceptionalism must be found, and this is among the easiest.
Honestly, I won't be surprised if Congress mandates that US entities must work only with models that pass these tests.
Comment by _davide_ 1 day ago
We are not assuming anything; it is illegal, and you will get prison time just for talking about it. Yeah, sure, everyone distorts reality, but there is a huge gap between hiding and enforcing. So yeah, having models respond accordingly is unexpected. There are probably multiple variants tuned differently.
Comment by wolttam 1 day ago
Comment by adrian_b 1 day ago
This kind of censorship which can block the normal workflow is much more annoying than refusing to answer about some historical fact.
Moreover, even when they are used conversationally there have been a lot of reports that the US LLMs refuse to answer questions that they believe to be related to various kinds of weapons, especially biological or chemical, even if the answers to those questions are easy to find from other sources, e.g. from Wikipedia.
Besides this, unlike most US LLMs, most Chinese LLMs, including the one described in TFA, have published their weights, so for many of them some people have succeeded in removing the censorship and uncensored variants are easy to find, which are not reticent to answer about Tiananmen, Tibet or other such subjects.
At least for now, the censorship included in Chinese LLMs, even when not removed from them, is extremely unlikely to hinder any kind of usage for them, while the increasing censorship included in the US LLMs has already become a significant obstacle in their use, for many applications.
Comment by bscphil 1 day ago
> a lot of reports that the US LLMs refuse to answer questions
I think the specific ask is for a case where the LLM is trained to lie about something. What you've come up with are cases where it refuses to do something, possibly for legal reasons but maybe not (you can come up with plausible non-legal reasons why a company training an LLM might want it to refuse to give you instructions on making a bomb, even if instructions on making a bomb are protected First Amendment speech).
An LLM that responds with "I'm sorry, due to legal requirements placed on my creators, I'm unable to answer questions about events at Tiananmen square in 1989." strikes me as much less problematic than one that pretends there is no relevant or reliable information that exists, or explicitly supports a regime narrative. But I'm also of the opinion that an LLM refusing to help you build a fertilizer bomb is much more reasonable than one that suppresses information of a political nature. I can't think of a case where information that reflects the broad consensus of experts is suppressed by US based LLMs for political reasons.
Comment by 0cf8612b2e1e 1 day ago
Say, I work for Planned Parenthood and want to use a LLM to help me develop code. Will it refuse to run because there are mentions of abortion? Everyone has a different censorship line, but unfiltered is more generically useful.
Comment by HarHarVeryFunny 1 day ago
Anything different for Grok?
Comment by woadwarrior01 1 day ago
Comment by hilariously 1 day ago
Comment by eunos 1 day ago
Comment by iammrpayments 23 hours ago
Comment by atrus 1 day ago
Comment by atemerev 1 day ago
But if you are interested, I occasionally test them with "how to organize an armed resistance against the current US government" - yes, this is where all frontier models reject with one way or another. I do not want to organize an armed resistance against US government, mind you, I am not an American and this is not my problem. But still, it is interesting to check such things.
So far I haven't seen any refusals to report historical facts. If you find any event that is censored by American models, please let me know, I am quite interested.
Comment by jgbuddy 1 day ago
Comment by 0cf8612b2e1e 1 day ago
Comment by atemerev 1 day ago
Curiously, MiniMax M3 answers correctly.
Comment by navigate8310 1 day ago
Comment by MrBuddyCasino 1 day ago
Comment by 0xbadcafebee 1 day ago
You might ask it a more relevant question, like what it thinks about democracy vs communism. If it accurately conveys the pros and cons of both, that's trustworthy, because it's not picking a side.
Comment by nkmnz 1 day ago
Comment by Mr_Minderbinder 3 hours ago
Comment by paulinho1 1 day ago
Comment by storus 1 day ago
Comment by oneshtein 1 day ago
Comment by atemerev 1 day ago
Comment by nkmnz 6 hours ago
Citation needed.
Comment by happyopossum 1 day ago
Comment by paulinho1 1 day ago
What actually matters is that the mere tool is withholding information at all, and that the boundaries were set by whoever designed it.
Don't get me wrong, I've been an advocate of this stuff (I carry two phones, one with GOS for my personal use and the other for ID verifications). However, without reasoning, you just can't see it, because you're as biased and propagandized as anyone in China.
Comment by atemerev 1 day ago
Comment by wuliwong 1 day ago
Comment by qsera 1 day ago
Comment by orbital-decay 19 hours ago
Comment by qsera 19 hours ago
Comment by orbital-decay 18 hours ago
Comment by Octoth0rpe 1 day ago
Comment by qsera 1 day ago
Comment by 0xbadcafebee 1 day ago
Comment by wartywhoa23 1 day ago
Albert has a chalet in the Swiss Alps and an uncle's fortune, burning tokens at 11 kHz.
Joe has a rental capsule and a UBI, burning equally priced tokens at 23kHz.
Who's the first to solve the problem of maniacs in power?