Changes in the system prompt between Claude Opus 4.6 and 4.7
Posted by pretext 2 days ago
Comments
Comment by embedding-shape 2 days ago
Uff, I've tried stuff like this in my prompts, and the results are never good. I much prefer the agent to prompt me upfront to resolve the ambiguity before it "attempts" whatever it wants. Kind of surprised to see that they added that
Comment by alsetmusic 1 day ago
Edit: forgot "don't assume"
Comment by gck1 1 day ago
Otherwise, the intent gets lost somewhere in the chat transcript.
Comment by chermi 1 day ago
Comment by fnord123 1 day ago
Comment by unshavedyak 1 day ago
Comment by yfontana 1 day ago
Comment by majormajor 1 day ago
Comment by rob74 1 day ago
Comment by BehindBlueEyes 2 hours ago
Comment by ikari_pl 1 day ago
I try to explicitly request Claude to ask me follow-up questions, especially multiple-choice ones (it explains possible paths nicely), but if I don't, or when it decides to ignore the instructions (which happens a lot), the results are either bad... or plain dangerous.
Comment by lishuaiJing03 1 day ago
Comment by tuetuopay 1 day ago
Comment by sutterd 1 day ago
Comment by mh- 1 day ago
It's possible they tried to train this out of it for 4.7 and over corrected, and the addition to the system prompt is to rein it in a bit.
Comment by naasking 1 day ago
Edit: That said, it's entirely possible that large and sophisticated LLMs can invent some pretty bizarre but technically possible interpretations, so maybe this is to curb that tendency.
Comment by embedding-shape 1 day ago
To me too: if something is ambiguous or unclear when I'm given a task by someone, I need to ask them to clarify; anything else would be borderline insane in my world.
But I know so many people whose approach is basically "Well, you didn't clearly state/say X, so clearly it was up to me to interpret it however I wanted, usually the easiest/shortest way for me", which is exactly how LLMs seem to take prompts with ambiguity too, unless you strongly prompt them not to make a "reasonable attempt now without asking questions".
Comment by gausswho 1 day ago
Comment by gck1 1 day ago
When I task my primary agent with anything, it has to launch the Socratic agent and give it an overview of what we're working on, what our goals are, and what it plans to do.
This works better than any thinking tokens for me so far. It usually gets the model to write an almost perfectly balanced plan that is neither over- nor under-engineered.
Comment by fragmede 1 day ago
Comment by eastbound 1 day ago
—Claude Code: FLIPS THE SWITCH, does not answer the question.
Claude does that in React, constantly starting a wrong refactor. I’ve been using Claude for only 4 weeks, but for the last 10 days I’ve been getting anger issues at the new nerfing.
Comment by tobyhinloopen 1 day ago
Comment by ashdksnndck 1 day ago
Comment by adw 1 day ago
Comment by majormajor 1 day ago
Comment by PunchyHamster 1 day ago
Comment by niobe 1 day ago
Comment by ignoramous 1 day ago
I've found that Google AI Mode & Gemini are pretty good at "figuring it out". My queries are oftentimes just keywords.
Comment by PunchyHamster 1 day ago
Comment by bartread 1 day ago
Comment by jrvarela56 1 day ago
Here are a few changes:
1. AGENTS.md by default across the codebase; a script makes sure a CLAUDE.md symlink is present wherever there's an AGENTS.md file (a minimal sketch of such a script is below, after this list)
2. Skills are now in a 'neutral' dir and per agent scripts make sure they are linked wherever the coding agent needs them to be (eg .claude/skills)
3. Hooks are now file listeners or git hooks; this one is trickier, as some of these hooks compensate for or cater to the agent's capabilities
4. Subagents and commands also have their neutral folders and scripts to transform and linters to check they work
5. `agent` now randomly selects claude|codex|gemini instead of typing `claude` to start a coding session
I guess in general, auditing where the codebase is coupled to a specific provider and keeping it neutral makes it easier to stop depending solely on that provider. Makes me realize they don't really have a moat; all this probably took less than an hour.
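For what it's worth, item 1 is only a few lines. A minimal sketch, assuming a plain walk from the repo root; the function name and layout are my guesses, not the actual script:

    from pathlib import Path

    # Ensure a CLAUDE.md symlink sits next to every AGENTS.md in the repo.
    def link_claude_md(repo_root: str = ".") -> None:
        for agents in Path(repo_root).rglob("AGENTS.md"):
            claude = agents.with_name("CLAUDE.md")
            if claude.is_symlink() or claude.exists():
                continue  # already linked, or a real file we shouldn't clobber
            claude.symlink_to(agents.name)  # relative link keeps the repo relocatable
            print(f"linked {claude} -> {agents.name}")

    if __name__ == "__main__":
        link_claude_md()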
Comment by esperent 1 day ago
So I'm migrating to pi. I realized that the hardest thing to migrate is hooks - I've built up an extensive collection of Claude hooks over the last few months, and unlike skills, hooks are in a Claude-specific format. But I'd heard people say "just tell the agent to build an extension for pi", so I did. I pointed it at the Claude hooks folder and basically said "make them work in pi", and it did, very quickly.
Comment by jrvarela56 1 day ago
Comment by fouc 10 hours ago
Comment by esperent 8 hours ago
It's got an annoyingly hard to search name because there's a lot of overlap in results with the Raspberry Pi single board computer.
Over the past week or so my workload has been quite low so I've been tinkering rather than doing serious deep work.
I've been using:
* Gemini pro and flash
* Opus 4.6 when I had some free extra usage credits (it burned through $50 of credits like crazy).
* Qwen 3.6 Plus
* Codex 5.3
* Kimi 2.5
I just spent the last hour using Kimi. I was very impressed actually, definitely possible to do useful work with it. However, I used $1 of openrouter credits in about 20 or 30 minutes of a single session, no subagents, so it's not cheap.
Comment by grantcarthew 7 hours ago
Using it, I don't need skills, memory, subagents, or a specific agent CLI. It defines roles, tasks, and context out of the box.
I made it for me and my family though. I don't expect interest outside of that.
Comment by Lucasoato 1 day ago
In Claude, I’ve seen cases in which spawning subagents from Gemini and Codex would raise strange permission errors (even if they don’t happen with other CLI commands!), making Claude silently continue impersonating the other agent. Only by checking thoroughly was I able to work out that the agent I wanted had actually failed.
Comment by jrvarela56 1 day ago
For (1) I'm trying to come up with a simple enough definition that can be 'llm compiled' into each format. The permissions format requires something like this too, and putting these together needs some more debugging.
For (2), the only one I've played with is `claude -p`, and it seems to work for fairly complex stuff, but I run it with `--dangerously-skip-permissions` (a minimal invocation sketch below).
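For reference, `claude -p` can be driven from a script like any other CLI. A sketch, where the flags are the ones named above and the prompt text is made up:

    import subprocess

    # Run Claude Code in non-interactive print mode; skipping permission
    # prompts is what makes unattended runs possible (and risky).
    result = subprocess.run(
        ["claude", "-p", "Summarize the failing tests in ./test.log",
         "--dangerously-skip-permissions"],
        capture_output=True, text=True, timeout=600,
    )
    print(result.stdout)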
Comment by bootlooped 1 day ago
Comment by lbreakjai 1 day ago
Comment by dockerd 1 day ago
Comment by jrvarela56 1 day ago
Context: AGENTS.md is standard across all of them, and subdirectories have their own AGENTS.md, so in a way this is a tree of instructions. Skills are also standard, so it's a bunch of indexable .md files that all agents can use.
Comment by potter098 19 hours ago
Comment by walthamstow 1 day ago
Comment by embedding-shape 1 day ago
Comment by gchamonlive 1 day ago
Comment by zozbot234 1 day ago
Comment by MoltenMan 1 day ago
At some point you just have to accept that LLMs, like people, make mistakes, and that's ok!
Comment by alwillis 1 day ago
It's not a niche issue at all. 29 million people in the US are struggling with an eating disorder [1].
> This single paragraph is going to legitimately cost anthropic at least 4, maybe 5 digits.
It's 59 out of 3,791 words total in the system prompt. That's about 1.6%. Relax.
It should go without saying, but Anthropic has the usage data; they must be seeing a significant increase in the number of times eating disorders come up in conversations with Claude. I'm sure Anthropic takes what goes into the system prompt very seriously.
[1]: from https://www.southdenvertherapy.com/blog/eating-disorder-stat...
The trajectory is troubling. Eating disorder prevalence has more than doubled globally since 2000, with a 124% increase according to World Health Organization data. The United States has seen similar trends, with hospitalization rates climbing steadily year over year.
Comment by phainopepla2 1 day ago
I don't mean to dispute your assertion that it's not a niche issue, but that site does not strike me as a reliable interpreter of the facts.
Comment by redsocksfan45 1 day ago
Comment by zozbot234 1 day ago
> At some point you just have to accept that LLMs, like people, make mistakes, and that's ok!
Except that's not the way many everyday users view LLMs. The carwash prompt went viral because it showed the LLM making a blatant mistake, and many seem to have found this genuinely surprising.
Comment by otabdeveloper4 1 day ago
It will take years until the understanding sets in that they're just calculators for text and you're not praying to a magic oracle, you're just putting tokens into a context window to add bias to statistical weights.
Comment by mudkipdev 1 day ago
Comment by SilverElfin 1 day ago
Comment by nozzlegear 1 day ago
Comment by otabdeveloper4 1 day ago
Mm, yes. Let's add mitigation for every possible psychological disorder under the sun to my Python coding context. Very common-sense.
Comment by zdragnar 1 day ago
LLMs aren't AGI, and I'd go further and say they aren't AI, but admitting it is snake oil doesn't sell subscriptions.
Comment by WarmWash 1 day ago
So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.
Comment by goosejuice 1 day ago
Comment by bojan 1 day ago
Comment by fineIllregister 1 day ago
Yes, the companies providing these products are sued a lot and are heavily regulated, too.
Comment by ChadNauseam 1 day ago
Comment by goosejuice 18 hours ago
It's much more difficult to isolate alcohol and exhaust as the primary driver of an individual's disease than the above, and that's the primary reason it's not regulated more than it is today. I expect that to change as research evolves.
Comment by nozzlegear 1 day ago
Comment by ChadNauseam 1 day ago
Comment by nozzlegear 1 day ago
I'm a teetotaler so no, I literally have not. I was mostly thinking about cigarette and tobacco products which are the most glaring, obvious counterpoints. But you'll be happy to learn that virtually all vehicles in the US also come with operating manuals that profusely warn people not to breathe in the exhaust from the vehicle.
Comment by salad-tycoon 1 day ago
On every bottle:
Alcoholic Beverage Labeling Act of 1988
“GOVERNMENT WARNING: (1) According to the Surgeon General, women should not drink alcoholic beverages during pregnancy because of the risk of birth defects. (2) Consumption of alcoholic beverages impairs your ability to drive a car or operate machinery, and may cause health problems"
Cancer proposal: https://www.mdanderson.org/cancerwise/not-just-a-hangover--t...
https://www.ttb.gov/regulated-commodities/beverage-alcohol/d...
(As if adding this text will do anything other than reduce the companies liability, rofl)
Comment by goosejuice 19 hours ago
Comment by goosejuice 21 hours ago
Comment by WarmWash 1 day ago
Comment by arcanemachiner 1 day ago
Comment by nozzlegear 1 day ago
Comment by salad-tycoon 1 day ago
Finally, what is often missed: what if an actual good is decided to be harmful, or something that is harmful is decided by AI company board XYZ to be “good”?
I think censorship is bad because of that danger. Quis custodiet ipsos custodes (who will watch the watchers)?
Instead of throwing ourselves into that minefield of moral hazard, we should be lifting each other up to the tops of our abilities, not infantilizing / secretly propagandizing each other.
Well, ideally at least.
Comment by goosejuice 21 hours ago
Look, I get where you're coming from, partially. I generally believe we should make an effort to maximize individual liberty. But in this case, we're talking about severe bodily harm and the death of young adults. We've spent the last decade dealing with the chaos and general unwellness that's been brought to our societies. This isn't much different.
What are you giving up here that makes such sacrifices worth it? Can you measure it? What's the utility?
There's room for models trained for non-consumer purposes, further age restriction, etc., but shit is moving so fast. If there are actual needs for a less censored model, these can be addressed.
> Finally, what is often missed is what if an actual good is decided harmful or something that is harmful is decided by AI company board XYZ to be “good”?
This is just standard product liability and consumer protection. Companies who do nothing to protect their consumers from known harms are liable. Are you saying you think that's somehow bad for society?
Comment by echelon 1 day ago
We let people buy kitchen knives. But because the kitchen knife companies don't have billions of dollars, we don't go after them.
We go after the LLM that might have given someone bad diet advice or made them feel sad.
Nevermind the huge marketing budget spent on making people feel inadequate, ugly, old, etc. That does way more harm than tricking an LLM into telling you you can cook with glue.
Comment by gmac 1 day ago
Comment by mattjoyce 1 day ago
Comment by jeffrwells 1 day ago
Comment by zythyx 1 day ago
Comment by teaearlgraycold 1 day ago
Comment by wongarsu 1 day ago
And of course all conversations now have to compact 80 tokens earlier, and are marginally worse (since results get worse the more stuff is in the context)
Comment by dymk 1 day ago
Comment by whateveracct 1 day ago
Comment by bradley13 1 day ago
In the best case, wrapping users in cotton wool is annoying. In the worst case, it limits the usefulness of the tool.
Comment by seba_dos1 1 day ago
Comment by pllbnk 1 day ago
Hard for me to say this because I have always been pro-Western and suddenly it seems like the world has flipped.
Comment by salad-tycoon 1 day ago
I have just one question for you pllbnk, are we the baddies?
Comment by pllbnk 23 hours ago
So yeah, at this moment in time it's really, really hard to say who is better or worse, as the collective West's reputation is tumbling down and China's is, if not rising, then at least staying put.
Comment by rzmmm 1 day ago
It's a particularly sensitive issue, so they're probably just being cautious.
Comment by echelon 1 day ago
This era of locked hyperscaler dominance needs to end.
If a third tier LLM company made their weights available and they were within 80% of Opus, and they forced you to use their platform to deploy or license if you ran elsewhere, I'd be fine with that. As long as you can access and download the full raw weights and lobotomize as you see fit.
Comment by renewiltord 1 day ago
Comment by ikari_pl 1 day ago
Because it's a waste of my money to check whether my Object Pascal compiler doesn't develop eating disorders, on every turn.
Comment by ubercore 1 day ago
If Claude is going to be Claude, we should support these kind of additions.
Comment by salad-tycoon 1 day ago
The better solution I think would be a reality/personal responsibility approach, teach the consumers that the burden of interpretation is on them and not the magic 8ball. For example if your AI tells you to kill your parents or that you’ve discovered new math that makes time travel possible, etc then: 1. Stop 2. Unplug 3. Go outside 4. Ask a human for a sanity check.
Since that would be bad for business and take a lot of effort on the user side (while being very embarrassing), and since you obviously can't do that right before an IPO & in the middle of a global economic war, secretive moral frameworks have to be installed instead.
If you are what you eat then you believe what you consume. Ironically, I think this undisclosed and hidden moral shaping of billions of people will be the most dangerous. Imagine all the things we could do if we can just, ever-so-slightly, move the Overton window / goal posts on w/e topic day by day, prompt by prompt.
Personally I find AI output insidiously disarming and charming and I think I’m in the norm. So while we’ve been besieged by propaganda since time immemorial I do worry that AI is a special case.
Comment by mohamedkoubaa 1 day ago
Comment by nozzlegear 1 day ago
Comment by gloomyday 1 day ago
Comment by l5870uoo9y 1 day ago
Comment by dwaltrip 1 day ago
They don’t reliably have the judgment to pause and proceed carefully if a delicate topic comes up. Hence these bandaids in the system prompt.
Comment by rcfox 1 day ago
Comment by newZWhoDis 1 day ago
Comment by felixgallo 1 day ago
Comment by forshaper 21 hours ago
Comment by idiotsecant 1 day ago
Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.
Comment by ls612 1 day ago
Comment by ikari_pl 1 day ago
I am strongly opinionated against this. I use Claude in some low-level projects where these answers save me from doing really silly things, as well as serving as learning material along the way.
This should not be Anthropic's hardcoded choice to make. It should be an option, building the system prompt modularly.
Comment by j-bos 1 day ago
Comment by stingraycharles 1 day ago
Comment by j-bos 23 hours ago
Comment by xpct 1 day ago
Comment by stingraycharles 1 day ago
You do realize that these LLMs are trained with a metric ton of synthetic examples? You describe the kind of examples / behavior you want, let it generate thousands of examples of this behavior (positive and negative), and you feed that to the training process.
So this type of data is cheap to change, and often not even stored (one LLM generates examples while the other trains in real time).
Here's a decent collection of papers on the topic: https://github.com/pengr/LLM-Synthetic-Data
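A minimal sketch of that loop, with a hypothetical generate() standing in for whatever completion endpoint is used; the prompt wording, JSON schema, and preference-pair format are mine, not any lab's actual pipeline:

    import json

    def generate(prompt: str) -> str:
        """Hypothetical: call any LLM endpoint and return its text output."""
        raise NotImplementedError

    BEHAVIOR = "Ask a clarifying question when the user's request is ambiguous."

    def make_examples(n: int = 1000) -> list[dict]:
        examples = []
        for _ in range(n):
            # One ambiguous request plus a compliant and a violating reply.
            raw = generate(
                "Write a user request that is ambiguous, plus two assistant "
                "replies: one that follows this rule and one that violates "
                f"it. Rule: {BEHAVIOR}. Respond as JSON with keys "
                "'request', 'good', 'bad'."
            )
            pair = json.loads(raw)
            examples.append({"prompt": pair["request"],
                             "chosen": pair["good"],
                             "rejected": pair["bad"]})  # preference-pair format
        return examples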
Comment by xpct 1 day ago
I imagine the system prompt can correct some training artifacts and drive abnormal behavior to the mean in the dimensions that Anthropic deems fit. So it's either that they are responding to their brittle training process, or that they chose this direction deliberately for a different reason.
Comment by jwpapi 1 day ago
For low-level work I recommend running tests as early as you can and verifying whatever information you get; as you learn, build a fundamental understanding.
Comment by cowlby 1 day ago
Key example for me was the "malware" tool call section that included a snippet with intent "if it's malware, refuse to edit the file". Yet because it appears dozens of times in a convo, eventually the LLM gets confused and will refuse to edit a file that is not malware.
I've resorted to using tweakcc to patch many of these well-intentioned sections and re-work them to avoid LLM pitfalls.
Comment by stingraycharles 1 day ago
I run Claude Code with my own system prompt and tooling on top of it. tweakcc broke too often and had too many glitches.
Comment by mpalczewski 1 day ago
Comment by alfiedotwtf 1 day ago
Comment by jwpapi 1 day ago
Comment by cfcf14 2 days ago
The malware paranoia is so strong that my company has had to temporarily block use of 4.7 on our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending large amounts of token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity - the jokes write themselves).
In one case I actually encountered a situation where I felt that the model was deliberately failing to execute a particular task, and when queried, the tool output said that it was trying to abide by directives about malware. I know that model introspection reporting is of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Claude Golden Gate Bridge territory, hence my earlier contemplation on steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on reddit, so I don't think it's just me!
Comment by daemonologist 2 days ago
Of course it's also been noted that this seems to be a new base model, so the change could certainly be in the model itself.
Comment by chatmasta 1 day ago
(URL is to diff since 2.1.98 which seems to be the version that preceded the first reference to Opus 4.7)
Comment by dhedlund 1 day ago
I feel like this explains about a quarter to half of my token burn. It was never really clear to me whether tool calls in an agent session would keep the context hot or whether I would have to pay the entire context loading penalty after each call; from my perspective it's one request. I have Claude routinely do large numbers of sequential tool calls, or have long running processes with fairly large context windows. Ouch.
> The Anthropic prompt cache has a 5-minute TTL. Sleeping past 300 seconds means the next wake-up reads your full conversation context uncached — slower and more expensive. So the natural breakpoints:
> - *Under 5 minutes (60s–270s)*: cache stays warm. Right for active work — checking a build, polling for state that's about to change, watching a process you just started.
> - *5 minutes to 1 hour (300s–3600s)*: pay the cache miss. Right when there's no point checking sooner — waiting on something that takes minutes to change, or genuinely idle.
> *Don't pick 300s.* It's the worst-of-both: you pay the cache miss without amortizing it. If you're tempted to "wait 5 minutes," either drop to 270s (stay in cache) or commit to 1200s+ (one cache miss buys a much longer wait). Don't think in round-number minutes — think in cache windows.
> For idle ticks with no specific signal to watch, default to *1200s–1800s* (20–30 min). The loop checks back, you don't burn cache 12× per hour for nothing, and the user can always interrupt if they need you sooner.
> Think about what you're actually waiting for, not just "how long should I sleep." If you kicked off an 8-minute build, sleeping 60s burns the cache 8 times before it finishes — sleep ~270s twice instead.
> The runtime clamps to [60, 3600], so you don't need to clamp yourself.
It's definitely not clear, if you're only used to the subscription plan, that every single interaction triggers a full context load. It's all one session to most people. So long as they keep replying quickly, or queue up a long arc of work, there's probably an expectation that they wouldn't incur that much context-loading cost. But this suggests that's not at all true.
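The quoted policy boils down to a small decision function. A toy sketch, with the thresholds taken straight from the quote; the function name and the can_poll flag are mine:

    # Pick a sleep that respects the (quoted) 5-minute prompt-cache TTL.
    def choose_sleep(expected_wait_s: float, can_poll: bool = True) -> int:
        CACHE_TTL = 300  # Anthropic prompt cache TTL, per the quote
        if can_poll or expected_wait_s < CACHE_TTL:
            # Stay inside the cache window; 270s leaves a safety margin.
            # e.g. an 8-minute build: two 270s sleeps, not eight 60s ones.
            return min(max(60, int(expected_wait_s)), 270)
        # Genuinely idle: one cache miss should buy a long wait.
        # Never exactly 300s -- the quoted worst-of-both-worlds value.
        return min(max(1200, int(expected_wait_s)), 3600)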
Comment by wongarsu 1 day ago
Comment by stingraycharles 1 day ago
The part that does get cached - attention KVs - is significantly cheaper.
If you read documentation on this, they (and all other LLM providers) make this fairly clear.
Comment by dhedlund 1 day ago
The interface strongly suggests that you're having a running conversation. Tool calls are a non-interactive part of that conversation; the agent is still just crunching away to give you an answer. From the user's perspective, the conversation feels less like stateless HTTP where the next paragraph comes from a random server, and more like a stateful websocket where you're still interacting with the original server that retains your conversation in memory as it's working.
Unloading the conversation after 5 minutes idling can make sense to most users, which is why the current complaints in HN threads tend to align with that 1 hour to 5 minute timeout change. But I suspect a significant amount of what's going on is with people who:
* don't realize that tool calls really add up, especially when context windows are larger.
* had things take more than 5 minutes in a single conversation, such as a large context spinning up subagents that are each doing things that then return a response after 5+ minutes. With the more recent claude code changes, you're conditioned to feel like it's 5 minutes of human idle time for the session. They don't warn you that the same 5 minute rule applies to tool calls, and I'd suspect longer-running delegations to subagents.
Comment by NitpickLawyer 1 day ago
Comment by stingraycharles 1 day ago
The attention part of LLMs (that is, for every token, how much attention it pays to all the other tokens) is cached in a KV cache.
You can imagine that with large context windows the overhead becomes enormous (attention has quadratic complexity in sequence length).
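For intuition, here's a toy of what that cache holds: per-token keys/values that only grow, so each new token reuses the prefix instead of recomputing it. Shapes only; nothing like any provider's real implementation:

    import numpy as np

    d = 64                     # head dimension
    cache_k, cache_v = [], []  # the "KV cache": one entry per past token

    def attend(x_new: np.ndarray) -> np.ndarray:
        """Attention output for one new token, reusing cached prefix K/V."""
        cache_k.append(x_new)  # stand-ins for W_k @ x and W_v @ x
        cache_v.append(x_new)
        K, V = np.stack(cache_k), np.stack(cache_v)  # (seq_len, d)
        scores = K @ x_new / np.sqrt(d)              # one row, not a full matrix
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                     # softmax over the prefix
        return weights @ V

    out = attend(np.random.randn(d))  # only the new token costs compute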
Comment by ianberdin 1 day ago
No, I am not joking. Every time you install something, there is a risk you clicked a wrong page with the exact same design.
Comment by jeffrwells 1 day ago
Comment by Schlagbohrer 1 day ago
Comment by dandaka 2 days ago
Comment by solenoid0937 1 day ago
Comment by greenchair 1 day ago
Comment by sensanaty 1 day ago
Every statement they make, hell even the models themselves are going to be doing this theater of "Ooooh scary uber h4xx0r AI, you can only beat it if you use our Super Giga Pro 40x Plan!!". In a month or two they'll move onto some other thing as they always do.
Comment by cowlby 1 day ago
They really should hand off read() tool calls to a lean cybersecurity model to identify if it's malware (separately from the main context), then take appropriate action.
Comment by matheusmoreira 21 hours ago
Comment by ricardobeat 1 day ago
Comment by lionkor 1 day ago
Comment by sigmoid10 2 days ago
Comment by an0malous 2 days ago
Comment by embedding-shape 1 day ago
Comment by jug 1 day ago
Comment by wongarsu 1 day ago
Comment by dataviz1000 1 day ago
Comment by mysterydip 2 days ago
Comment by sigmoid10 2 days ago
Comment by aesthesia 1 day ago
Comment by sigmoid10 1 day ago
Comment by pests 1 day ago
Not true: it gets calculated once, essentially baked into the initial state, and stored in a standard K/V prefix cache. Processing only happens on new input (minus attention, which still has to contend with tokens from the prompt).
Comment by jatora 2 days ago
Comment by formerly_proven 2 days ago
Comment by sigmoid10 2 days ago
Comment by pests 1 day ago
Comment by sigmoid10 1 day ago
Comment by cfcf14 2 days ago
Comment by winwang 2 days ago
Comment by bavell 1 day ago
Comment by sigmoid10 2 days ago
Comment by cma 1 day ago
It gets pretty efficiently cached, but does eat the context window and RAM.
Comment by ares623 1 day ago
Comment by simonw 1 day ago
The Claude Code one isn't published anywhere but it's very easy to get hold of. One way to do that is to run Claude Code through a logging proxy - I was using a project called claude-trace for this last year but I'm not sure if it still works, I've not tried it in a while: https://simonwillison.net/2025/Jun/2/claude-trace/
Comment by varispeed 2 days ago
edit: to be fair Anthropic should be giving money back for sessions terminated this way.
Comment by ceejayoz 2 days ago
I asked it for one and it told me to file a Github issue.
Which I interpreted as "fuck off".
Comment by xvector 1 day ago
Comment by mwexler 1 day ago
Also full of "can" and "should" phrases: feels both passive and subjunctive as wishes, vs strict commands (I guess these are better termed “modals”, but not an expert)
Comment by KolenCh 1 day ago
Comment by zmmmmm 1 day ago
It must be that they are training the sense of identity as Claude very deeply into the model. Which makes me wonder how it works when it is asked to assume a different identity: "You are Bob, a plumber who specialises in advising on the design of water systems for hospitals". Now what? Is it confused? Is it still going to think all the verbiage about what "Claude" does applies?
Comment by ehnto 1 day ago
I also talk this way with people because I feel it makes it clear we're collaborating and fault doesn't really matter. I feel it lets junior members take more ownership of the successes as well. If we ever get juniors again.
Comment by saagarjha 1 day ago
Comment by akdor1154 1 day ago
Comment by saagarjha 1 day ago
class Claude {
    void greet() { System.out.println("Hello, I'm Claude."); }
}
Claude anthropicInstance = new Claude();
anthropicInstance.greet();
Just like a "Cat" object in Java is supposed to behave like a cat, but is not a cat, and there is no way for Cat@439f5b3d to "be" a cat. However, it is supposed to act like a cat. When Anthropic spins up a model and "runs" it they are asking the matrix multipliers to simulate the concept of a person named Claude. It is not conscious, but it is supposed to simulate a person who is conscious. At least that is how they view it, anyway.Comment by EMM_386 1 day ago
Comment by SoKamil 2 days ago
Comment by lkbm 1 day ago
Comment by clickety_clack 1 day ago
Comment by wongarsu 1 day ago
Comment by dannyw 1 day ago
Comment by jimmypk 2 days ago
Comment by Havoc 1 day ago
Seems like a good idea. I don't think I've ever had any of those follow-up suggestions from a chatbot be actually useful to me.
Comment by jwpapi 1 day ago
Comment by xpct 1 day ago
Comment by ikidd 1 day ago
Comment by embedding-shape 1 day ago
So I'm guessing they want none of the model's users (web UI + API) to be able to do those things, rather than blocking it just in the web UI. The changes mentioned in the submission are just for claude.ai AFAIK, not API users, so the "disordered eating" stuff will only be prevented if API users prompt against it in their own system prompts; it's not required.
Comment by kaoD 1 day ago
Comment by bakugo 1 day ago
Comment by sams99 1 day ago
Comment by jachva95 1 day ago
Users need to unite and take control back, or be controlled
Comment by Schlagbohrer 1 day ago
Also, people already run local AI.
Are you proposing a public fund for frontier level open weights models? $1 Trillion from between the couch cushions?
Comment by dmk 2 days ago
Comment by bavell 1 day ago
Comment by sersi 1 day ago
Comment by verve_rat 1 day ago
Comment by poszlem 1 day ago
Comment by jwilliams 1 day ago
Yay! This will be a big win. I'm glad they fixed this. The number of times I've had to prompt "you do have access to GitHub"...
Comment by adrian_b 1 day ago
I wonder which are the "signs of disordered eating" on which Claude relies.
Comment by raincole 1 day ago
Comment by Grimblewald 1 day ago
Comment by lossyalgo 23 hours ago
Comment by xvector 1 day ago
Comment by Grimblewald 1 day ago
I swear 4.6+ looks for reasons to ask clarifying questions sometimes, even when they're really not required, and this fucks flow/quality up in a big way.
I just wish there was an "I'm not stupid" checkbox you could use to get minimal-interference access to Claude. I'm starting to use local models again, which I haven't in a while because Claude was so much better, but once I fully lose access to 4.5 it might be time to go back to fully local for good. 4.6+ fails to add value for me: projects that 4.5 and earlier did good jobs on first try now require multiple prompts and feedback, with the exact same initial prompt and project files extracted from an archive. I liked Claude because it aced those tests while local models required handholding. Now Claude requires handholding, so why use it over local? Once 4.5 leaves OpenRouter it might just be time.
Comment by nwienert 1 day ago
.6 is some sort of quantized or distilled .5 with a bit more RL, and the current .5 is that same cost reduced model without the extra RL.
Comment by c2xlZXB5Cg1 1 day ago
Comment by amelius 1 day ago
Comment by codensolder 1 day ago
Comment by mannanj 1 day ago
My concern is that these models revert all medical, scientific, and personal inquiry to the norms and averages of what's socially acceptable. That's very anti-scientific in my opinion and feels dystopian.
Comment by gausswho 1 day ago
Comment by mannanj 22 hours ago
Comment by techpulselab 1 day ago
Comment by sergiopreira 1 day ago
Comment by theoperatorai 1 day ago
Comment by jiusanzhou 1 day ago
Comment by kantaro 1 day ago
Comment by vicchenai 1 day ago
Comment by xdavidshinx1 1 day ago
Comment by foreman_ 2 days ago
Comment by Moonye666 1 day ago
Comment by richardwong1 1 day ago
Comment by dd8601fn 1 day ago
I did this in mine by only really having a few relevant tool functions in the prompt, ever: Search for a Tool Function, Execute a Tool Function, Request Authoring of a Tool Function, Request an Update to a Tool Function, Check Status of an Authoring Request.
It doesn't have to "remember" much. Any other functions are ones it already searched for and found in the tool service.
When it needs a tool, it reliably searches (just natural language) against the vector db catalog of functions for a good match. If it doesn't have one, it requests one. The authoring pipeline does its thing, and eventually it has a new function to use. (A rough sketch of the lookup is below.)
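A rough sketch of that catalog lookup, assuming cosine similarity over a small in-memory index; the toy embed() is a stand-in for a real sentence-embedding model, and the tool names and threshold are invented:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy stand-in: swap in a real sentence-embedding model here.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        return rng.standard_normal(64)

    CATALOG = {  # name -> description, built offline by the authoring pipeline
        "resize_image": "Resize an image file to the given dimensions",
        "fetch_rss": "Download and parse an RSS feed into items",
    }
    index = {name: embed(desc) for name, desc in CATALOG.items()}

    def search_tool(query: str, threshold: float = 0.75) -> str | None:
        q = embed(query)
        best, score = None, -1.0
        for name, vec in index.items():
            s = float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec)))
            if s > score:
                best, score = name, s
        # Below the threshold there's no good match: request authoring instead.
        return best if score >= threshold else None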