Claude Opus 4.7 Model Card
Posted by adocomplete 1 day ago
Comments
Comment by bachittle 23 hours ago
Comment by film42 22 hours ago
Comment by tomaskafka 18 hours ago
Comment by freedomben 22 hours ago
Comment by RobinL 22 hours ago
Comment by Someone1234 21 hours ago
Comment by RobinL 18 hours ago
Comment by timvb 17 hours ago
Comment by jzig 22 hours ago
Comment by daemonologist 22 hours ago
The longer the context, the worse the performance; there isn't really a qualitative step change in capability (if there is, imo it happens at around 8k-16k tokens, much sooner than is relevant for multi-turn coding tasks; see e.g. this old benchmark https://github.com/adobe-research/NoLiMa ).
Comment by the13 20 hours ago
Comment by enraged_camel 20 hours ago
Comment by teaearlgraycold 21 hours ago
Comment by vessenes 20 hours ago
I surmise that someone at the top put the Mythos release on hold, and the product team was told "ship this other interim step model instead. quickly."
I wonder if 4.7 will be seen as a net step-up in quality; there are some regressions noted in the document, and it's clearly substantially worse than Mythos, at least according to its own model card. Should be an interesting few months -- if I were at oAI I'd be rushing to get something out that's clearly better, and pressing on the weaknesses here.
Comment by the13 20 hours ago
Comment by vessenes 20 hours ago
Comment by barneybooroo 18 hours ago
Comment by kube-system 21 hours ago
That's an interesting choice of benchmark for measuring the risk of "Chemical and biological weapons"
Comment by Aboutplants 21 hours ago
Comment by koehr 23 hours ago
Comment by Uehreka 21 hours ago
I guess maybe, but do those documents then lose value as technical documents? Not necessarily, so I don’t see the point. How else are you supposed to describe a useful technical thing to users?
Comment by parsimo2010 20 hours ago
For context, the word "Mythos" appears 331 times in a 221-page document. "Opus 4.6" appears 240 times, so a model that nobody has really used is referenced more often than the last-generation model.
Comment by ModernMech 22 hours ago
Comment by Symmetry 23 hours ago
>_>
Comment by 100ms 23 hours ago
$ pbpaste | wc -w
62508
$ pbpaste | grep -oi mythos|wc -w
331
$ pbpaste | grep -oi opus|wc -w
809
Comment by aliljet 23 hours ago
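An aside on the counting method: `wc -w` counts words, which works for single-word patterns like "mythos" but would miscount a multi-word phrase like "Opus 4.6". Since `grep -o` prints one match per line, piping to `wc -l` counts matches directly. A minimal sketch, using a throwaway sample file in place of the macOS-only `pbpaste`; the file path and sample text are assumptions:

```shell
# Sample stand-in for the clipboard contents (hypothetical text).
printf 'Mythos beats Opus 4.6. mythos again. Opus 4.6 twice.\n' > /tmp/card.txt

# grep -o emits each match on its own line, so wc -l counts matches.
grep -oi 'mythos' /tmp/card.txt | wc -l
# Escape the dot so it matches literally; wc -w would count this phrase twice per match.
grep -oi 'opus 4\.6' /tmp/card.txt | wc -l
```

Note that `grep -c` would not work here either: it counts matching lines, not matches, so repeated occurrences on one line collapse to 1.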
Comment by computomatic 23 hours ago
They have also repeatedly communicated that the base unit (Pro allotment) is subject to change and does change often.
As far as I can tell, that implies there is no guarantee that those subscriptions get some specific number of tokens per unit of time. It’s not a claim they make.
Comment by msikora 15 hours ago
Comment by DonsDiscountGas 22 hours ago
Comment by ModernMech 21 hours ago
Comment by joeumn 23 hours ago
Comment by bicepjai 1 day ago
Comment by STRiDEX 23 hours ago
Would there not already be websites that contain that information? How is an LLM different, I guess, from some sort of anarchist-cookbook thing?
Comment by Philpax 23 hours ago
The bigger issue is that they are potentially capable of producing novel formulations that cause harm, and of guiding someone through the process. That is, consider a world in which someone with malicious desires has access to a model as capable at chemistry / biology as Mythos is at offensive cybersecurity.
This is obviously limited by the fact that the models don't operate in the physical world, but there's plenty of written material out there.
Comment by rogerrogerr 23 hours ago
1. Smart people have economic opportunities that align them away from being evil
2. People who are evil tend not to be smart.
We're breaking both of these assumptions.
Comment by chrisweekly 22 hours ago
For some definition of evil, some of the time, ok. But as economic opportunities compound (looking at the behavior of the ultra-rich), it seems there's at least strong correlation in the other direction, if not full-on "root of all evil" causation.
Comment by rogerrogerr 22 hours ago
So much infrastructure is very soft because the evil people aren’t smart enough to conceive of or conduct an attack.
Comment by fwip 21 hours ago
Comment by Jensson 14 hours ago
Comment by fwip 11 hours ago
Comment by ben_w 4 hours ago
Is Russia currently capitalist, or non-capitalist? Which is Myanmar?
Anyway, personally I think it's the wrong axis; while capitalism and democracy and free press are often correlated, I think that the latter two are the important ones for actually choosing the lesser evils, though capitalism does generate more options to choose between.
Comment by JohnMakin 16 hours ago
for now
Comment by Der_Einzige 23 hours ago
Comment by malcolmgreaves 22 hours ago
Comment by mikek 21 hours ago
Comment by hxugufjfjf 21 hours ago
Comment by orneryostrich 21 hours ago
Comment by dcre 22 hours ago
On top of LLMs reducing the cost/difficulty, the other reason biological and chemical weapons are such a worry is their asymmetric character — they are much much easier and cheaper to produce and deploy than they are to defend against.
Comment by Aboutplants 21 hours ago
Comment by somesortofthing 20 hours ago
Comment by Nicook 19 hours ago
Comment by rgbrenner 23 hours ago
Comment by CodingJeebus 23 hours ago
Comment by jmward01 1 day ago
Comment by blixt 23 hours ago
Comment by jmward01 23 hours ago
Comment by mvkel 23 hours ago
Comment by deaux 22 hours ago
It isn't. Gemini has gotten more expensive with each release. Anthropic has stayed pretty similar over time, no? When is the last time OpenAI dropped API prices? OpenAI started very high because they were first, so there was a ton of low-hanging fruit and plenty of room to drop.
Comment by mvkel 19 hours ago
It's well known that GPT-4 is much more expensive to operate than the GPT-5 family.
Of course they won't drop the prices; it's pure profit if they make models more efficient.
Comment by qingcharles 19 hours ago
Comment by dkhenry 23 hours ago
Comment by make3 23 hours ago
Comment by dkhenry 22 hours ago
This comparison shows them neck and neck https://benchlm.ai/compare/claude-sonnet-4-5-vs-gemma-4-31b
As Does this one https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-ge...
And the pelican benchmark even shows them pretty close https://simonwillison.net/2026/Apr/2/gemma-4/ https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/
Also this isn't a fringe statement, you can see most people who have done an evaluation agree with me
Comment by jmward01 21 hours ago
Comment by make3 17 hours ago
Comment by lostmsu 23 hours ago
Comment by il-b 23 hours ago
Comment by msla 21 hours ago
Comment by marginalia_nu 18 hours ago
Comment by Rekindle8090 19 hours ago
What is the justification for the .4, .5, .6, .7, .8, .9 point releases when the difference isn't measurable, and it destroys productivity because they test the next increment on users of the previous one without customer consent?
Comment by nothinkjustai 22 hours ago
Comment by NickNaraghi 22 hours ago
Comment by nullc 20 hours ago
I've been getting a small but steady stream of harassment from mentally ill people who get spun up on crazy conspiracy theories, and Claude is all too willing to tell them they are ABSOLUTELY RIGHT, encourage them to TAKE ACTION, and tell them that people who disagree are IN ON IT.
The other major LLM services will either deflect to something less crazy or shut the conversation down entirely, but it seems Claude doesn't. Anthropic is probably the worst about prattling on about safety, but their concern seems mostly centered on insane movie-plot threats and less on things with more potential for real harm.
I've complained to Anthropic with no response.
Comment by pukaworks 21 hours ago
Comment by gignico 19 hours ago
Comment by deflator 22 hours ago
Comment by hgoel 15 hours ago
With the weights being mostly opaque, these kinds of evaluations are an important piece of reducing the harm an AI model can cause.
Comment by deflator 1 hour ago