Ask HN: How are thinking efforts implemented?
Posted by simianwords 2 days ago
Claude and ChatGPT have thinking efforts where you can tune the amount of thinking allowed.
Like low, medium, high, xhigh and so on.
But are they different models underneath? Or the same model with a different parameter?
I ask because if I change the effort param mid-conversation in Claude Code, I get a warning suggesting I’m breaking the cache.
I don’t think this happens in Codex because when I change the effort, the responses are still quick.
Comments
Comment by pyentropy 2 days ago
Note that inference libs also have parsers that put hard limits on reasoning tokens with separate counters (similar to how you can cap token generation per completion instead of waiting for an <eos>). For that, take a look at the vLLM reasoning docs.
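To make the "separate counters" idea concrete, here is a minimal sketch. It is not vLLM's actual implementation: the function name `cap_reasoning`, the literal `<think>` / `</think>` tags, and the one-string-per-token stream are all assumptions for illustration. A real engine would inject the close tag during decoding so the model continues from it; this filter over an already generated stream only shows the counting.

```python
from typing import Iterable, Iterator

THINK_OPEN = "<think>"    # assumed tag names; real models use their own special tokens
THINK_CLOSE = "</think>"


def cap_reasoning(tokens: Iterable[str], max_reasoning: int, max_total: int) -> Iterator[str]:
    """Yield tokens, force-closing the think block once its own budget is spent."""
    total = 0          # counter for the whole completion
    reasoning = 0      # separate counter for tokens inside the think block
    in_think = False
    skipping = False   # True while discarding over-budget reasoning tokens

    for tok in tokens:
        if total >= max_total:
            break  # overall completion limit, independent of the reasoning limit
        if skipping:
            # Drop the rest of the reasoning until the model closes the block itself.
            if tok == THINK_CLOSE:
                skipping = False
            continue
        if tok == THINK_OPEN:
            in_think = True
        elif tok == THINK_CLOSE:
            in_think = False
        elif in_think:
            if reasoning >= max_reasoning:
                # Reasoning budget spent: emit an early close and discard what follows.
                yield THINK_CLOSE
                total += 1
                in_think = False
                skipping = True
                continue
            reasoning += 1
        yield tok
        total += 1


stream = ["<think>", "a", "b", "c", "</think>", "answer"]
print(list(cap_reasoning(stream, max_reasoning=2, max_total=100)))
```

With a reasoning budget of 2, the third reasoning token is replaced by an early `</think>` and the answer token still comes through, while `max_total` stays a separate limit on the whole completion.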
Comment by pyentropy 2 days ago
https://docs.vllm.ai/en/latest/features/reasoning_outputs/#a...
Comment by simianwords 2 days ago
Maybe something like: add a secret suffix to the latest message in the conversation telling the model to think more, like
conversation....
Hey please help
[think more]
Comment by pyentropy 2 days ago
I might be very, very wrong though. LLMs disagree with me: they insist that the cache is preserved and the system message doesn't have to change when the effort level changes across turns (even though it often contains the effort level in context), and that all you have to do is tell the inference lib that parses think tags to early-close think blocks that run too long.
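A toy illustration of why the placement of the effort level matters for a prefix cache: only the leading tokens that are identical between two requests can be reused. The prompt layout, the `render` helper, and the `effort:` marker are made up for this sketch. Whether any provider actually puts the effort level in the system message is not confirmed here.

```python
from typing import List


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading tokens two prompts have in common (reusable from a prefix cache)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def render(effort: str, turns: List[str], effort_in_system: bool) -> List[str]:
    """Toy prompt layout, one 'token' per list element (the layout is an assumption)."""
    system = ["<sys>", "You", "are", "helpful."]
    if effort_in_system:
        system += ["effort:", effort]
    prompt = system + ["</sys>"] + turns
    if not effort_in_system:
        # Alternative layout: effort marker appended after the conversation.
        prompt += ["effort:", effort]
    return prompt


turns = ["hi", "hello", "help me"]

# Effort inside the system message: the prompts diverge before the conversation starts.
in_system = shared_prefix_len(render("low", turns, True), render("high", turns, True))

# Effort appended at the end: the whole conversation stays a shared prefix.
at_end = shared_prefix_len(render("low", turns, False), render("high", turns, False))

print(in_system, at_end)  # 5 9
```

In this toy layout, changing the effort inside the system message leaves only the first 5 tokens reusable, so every conversation turn has to be recomputed. Appending it at the end keeps all 9 preceding tokens cacheable.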
Comment by simianwords 1 day ago
Comment by aabdi 2 days ago
Usually it’s done in post-training to enforce behavior based on the prompt, i.e. a system prompt with thinking:max or low or whatever.
Enforcement then goes via constrained decoding: checking for the think start and end tokens with max lengths, or other variations.
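A rough sketch of what the constrained-decoding side could look like, under stated assumptions: the `EFFORT_BUDGET` table, the tag names, and the dict-of-logits representation are all hypothetical, and only a single think block is handled. Once the think block has used its budget, every token except the close token is masked to negative infinity, so the sampler can only pick `</think>`.

```python
import math
from typing import Dict, List

# Hypothetical effort -> max thinking tokens table; real budgets are not public.
EFFORT_BUDGET: Dict[str, int] = {"low": 2, "medium": 4, "high": 8}


def mask_logits(logits: Dict[str, float], generated: List[str], effort: str,
                open_tok: str = "<think>", close_tok: str = "</think>") -> Dict[str, float]:
    """Allow only the close token once the think block has used up its budget."""
    if open_tok not in generated or close_tok in generated:
        return logits  # not currently inside a think block
    used = len(generated) - generated.index(open_tok) - 1  # tokens after the open tag
    if used < EFFORT_BUDGET[effort]:
        return logits  # still within budget, leave the distribution alone
    # Budget exhausted: every token except the close token gets -inf.
    return {tok: (score if tok == close_tok else -math.inf) for tok, score in logits.items()}


logits = {"foo": 1.0, "</think>": -3.0, "bar": 0.5}
so_far = ["<think>", "a", "b"]

low = mask_logits(logits, so_far, "low")
high = mask_logits(logits, so_far, "high")
print(max(low, key=low.get), max(high, key=high.get))  # </think> foo
```

The same model and weights are used in both calls; only the decoding constraint differs. That fits the "same model, different parameter" reading, though the post-training half of the comment is not modeled here.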
Comment by bjourne 2 days ago
Comment by simianwords 2 days ago
Comment by bjourne 2 days ago
Comment by pyentropy 2 days ago
See https://developers.openai.com/cookbook/articles/openai-harmo... and src/openai/types/shared/reasoning_effort.py
Comment by bjourne 1 day ago
Comment by simianwords 1 day ago
Comment by bjourne 14 hours ago
Comment by __patchbit__ 2 days ago
Comment by sometimelurker 2 days ago
Comment by pyentropy 2 days ago
Comment by sometimelurker 1 day ago
Comment by Yahyaaa 1 day ago
Comment by shanewei 2 days ago