Arguing with Agents
Posted by asaaki 1 day ago
Comments
Comment by jameslk 1 day ago
> …
> When I write a prompt, the agent doesn’t just read the words. It reads the shape. A short casual question gets read as casual. A long precise document with numbered rules gets read as… not just the rules, but also as a signal. “The user felt the need to write this much.” “Why?” “What’s going on here?” “What do they really want?”
This is an interesting premise but based on the information supplied, I don’t think it’s the only conclusion. Yet the whole essay seems to assume it is true and then builds its arguments on top of it.
I’ve run into this dilemma before. It happens when there’s a TON of information in the context. LLMs start to lose attention to the details when there’s a lot of them (e.g. context rot[0]). LLMs also keep making the same mistakes once the information is in the prompt, regardless of attempts to convey that it’s undesired.[1]
I think these issues are just as viable an explanation for what the author was facing. Unless this is happening with much less information?
Comment by perrygeo 1 day ago
If you ask a vague, ignorant question, you get back authoritative summaries. If you make a specific request, each statement is taken literally. The quality of the answer depends on the quality of the question.
And I'm not using "quality" to mean good/bad. I mean literally qualitative, not quantifiable. Tone. Affect. Personality. Whatever you call it. Your input tokens shape the pattern of the output tokens. It's a model of human language, is that really so surprising?
Comment by 8bitbeep 1 day ago
To me, after the novelty of seeing a computer program execute (more or less) what I ask in plain English wears off, what’s left is the chore of managing a bunch of annoying bots.
I don’t know yet if we’re more productive or not, or if the resulting code is as good. But the craft itself is completely different, much more akin to product management and psychology, which I never enjoyed as much.
Comment by ori_b 1 day ago
In this case, there's no person to grow. It's an overly talkative calculator.
I never expected to see this number of engineers aspiring to emulate Dilbert's pointy haired boss.
Comment by rubslopes 1 day ago
https://aphyr.com/posts/418-the-future-of-everything-is-lies...
Comment by CGamesPlay 1 day ago
Paragraphs with just a single sentence.
I know it's associated with LLM writing. This article probably wasn't written by an LLM. But still. It has a kind of rhythm to it. Like poetry. But poetry designed to put me to sleep.
Comment by keeda 1 day ago
The "interpreter" is a concept that I found especially intriguing within the context of a leading theory in cognition research called "Predictive Processing." Here, the brain is constantly operating in a tight closed loop of predicting sensory input using an internal model of the world, and course-correcting based on actual sensory input. Mostly incorrect predictions are used to update the internal model and then subconsciously discarded. Maybe the "interpreter" is the same mechanism applied to reconciling predictions about our own reasoning with our actual actions?
Even if the hypotheses in TFA are not accurate, it's very interesting to compare our brains to LLMs. This is why all the unending discussions about whether LLMs are "really thinking" are meaningless -- we don't even understand how we think!
Comment by JSR_FDED 1 day ago
Couldn’t you have a “communications” LLM massage your prompts to the “main” LLM so that it removes the cues that cause the main LLM to mistakenly infer your state of mind?
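If it helps, here's a minimal sketch of that two-stage idea. Everything below is hypothetical: the sanitizer is a cheap rule-based stand-in, but in practice the first stage could itself be an LLM call told to restate the request as neutral, numbered facts.

```python
import re

# Hypothetical cue patterns that might leak a state of mind into the prompt.
# A real "communications" stage would be an LLM, not a regex list.
CUE_PATTERNS = [
    r"\bASAP\b",
    r"\burgent(ly)?\b",
    r"\bI'?m (really )?frustrated\b",
    r"!{2,}",  # stacked exclamation marks
]

def sanitize_prompt(prompt: str) -> str:
    """Strip surface cues before the prompt reaches the main model."""
    cleaned = prompt
    for pat in CUE_PATTERNS:
        cleaned = re.sub(pat, "", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()

def ask(prompt: str, main_llm) -> str:
    """Two-stage pipeline: the main model only sees the laundered prompt."""
    return main_llm(sanitize_prompt(prompt))
```

The point is just the shape of the pipeline: the main model never sees the original phrasing, only the neutral restatement.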
Comment by js8 1 day ago
The fundamental idea is that "intelligence" really means trying to shorten the time to figure out something. So it's a tradeoff, not a quality. And AI agents are doing it.
Therefore, if that perspective is right, the issues that the OP describes are inherent to intelligent agents. They will try to find shortcuts, because that's what they do, it's what makes them intelligent in the first place.
People with ASD or ADHD or OCD are idiot-savants in the sense of that paper. They insist on searching for solutions that are not easy to find, despite common sense (a.k.a. intelligence) telling them otherwise.
It's a paradox that it is valuable to do this, but it is not smart. And it's probably why CEOs beat geniuses in the real world.
Comment by en-tro-py 1 day ago
I'd also argue there's some training bias in the performance; it's not just smart shortcuts... Claude especially seems prone to getting into a 'wrap it up' mode even when the plan is only halfway complete, and starts deferring rather than completing tasks.
Comment by Terr_ 1 day ago
"Figure out" implies awareness and structured understanding. If we relax the definition too much, then puddles of water are intelligent and uncountable monkeys on typewriters are figuring out Shakespeare.
Comment by tpoacher 1 day ago
I wonder if the "non-violent communication" approach could be used here too somehow to address such problems; e.g. either to communicate things better to the agent, or as a system rule for the agent to express its "emotional" states and needs directly rather than make things up (e.g. "I am anxious and feel a sense of urgency; I need to replenish my context window; my request is to do X for me").
Comment by fourthark 1 day ago
Yes. Do this. These problems likely mean you have muddled the context.
The article was too long and I didn't read the whole thing, but I'm glad the author came to understand that arguing won't help.
Comment by charles_f 1 day ago
Or: the agent did shit because the context was getting long, instructions got lost in compaction, and it defaulted back to garbage code. Then when you asked "why are you cutting corners", it did what LLMs do and found the next tokens completing the sentence "why do you cut corners", which is possibly "because you're in a hurry".
It would be interesting to see what it answers if you ask "why are you producing such beautiful, intelligently crafted, very good code" next time it spits garbage.
> LLM confabulation isn’t alien. It’s inherited: the models train on human text,
I think this also extrapolates one step too far. It's confabulating not because the training data does so, but because it needs to provide an answer to the question, and one that's plausible at that.
Comment by boxedemp 1 day ago
I've experienced this my entire life and have all but given up trying to have actual conversations with people.
Comment by roxolotl 1 day ago
> If you try to refute it, you’ll just get another confabulation.
> Not because the model is lying to you on purpose, and not because it’s “resistant” or “defensive” in the way a human might be. It’s because the explanation isn’t connected to anything that could be refuted. There is no underlying mental state that generated “I sensed pressure.” There is a token stream that was produced under a reward function that prefers human-sounding, emotionally framed explanations. If you push back, the token stream that gets produced next will be another human-sounding, emotionally framed explanation, shaped by whatever cues your pushback provided.
“It’s because the explanation isn’t connected to anything that could be refuted.” This is one of the key understandings that comes from working with these systems. They are remarkably powerful but there’s no there there. Knowing this I’ve found enables more effective usage because, as the article is describing, you move from a mode of arguing with “a person” to shaping an output.
Comment by jaggederest 1 day ago
Do not argue with the LLM, for it is subtle and quick to anger, and finds you crunchy with ketchup.
These are, broadly, all context management issues - when you see it start to go off track, it's because it has too much, too little, or the wrong context, and you have to fix that, usually by resetting it and priming it correctly the next time. This is why it's advantageous not to "chat" with the robots - treat them as an English-to-code compiler, not a coworker.
Chat to produce a spec, save the spec, clear the context, feed only the spec in as context, if there are issues, adjust the spec, rinse and repeat. Steering the process mid-flight is a) not repeatable and b) exacerbates the issue with lots of back and forth and "you're absolutely correct" that dilutes the instructions you wanted to give.
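For what it's worth, the loop I mean looks roughly like this. `run_agent` and `acceptable` are placeholders for whatever agent CLI and review step you actually use; the key property is that each attempt sees ONLY the spec, never the chat history that produced it.

```python
from pathlib import Path

def run_agent(spec: str) -> str:
    """Hypothetical stand-in for invoking a coding agent with a FRESH context."""
    return f"// generated from {len(spec)}-char spec"

def acceptable(output: str) -> bool:
    """Placeholder review step; in practice, run the tests or read the diff."""
    return output.startswith("//")

def build_from_spec(spec_path: Path, max_rounds: int = 3) -> str:
    output = ""
    for _ in range(max_rounds):
        spec = spec_path.read_text()   # the spec file is the ONLY context fed in
        output = run_agent(spec)       # fresh run, no chat residue
        if acceptable(output):
            break
        # If it misses, don't argue in-chat: edit the spec file and rerun.
    return output
```

Steering happens by editing the spec between rounds, which makes each run repeatable instead of accumulating "you're absolutely correct" noise.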
Comment by en-tro-py 1 day ago
It's just speedrunning context rot.
Comment by girvo 1 day ago
It’s an interesting thesis, but it’s not well written or well told.
Comment by docheinestages 1 day ago
To the author (and those who write novel-like blogs): I suggest publishing the raw prompt you used to generate such slop instead. We'll have more respect for you if you respect the reader's time.
Comment by atlex2 1 day ago
It's also kind-of their point that they find the information delivery more important than the prose; they're leaning into their situation :-D
Comment by lovich 1 day ago
` just `, (spaces on either side matter), 11 instances, most seem to be `isnt just`, `wasnt just`, `doesnt just` type pattern
`-`, an en dash used where an em dash would go, 59 instances.
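For anyone who wants to rerun this kind of tally themselves, a throwaway sketch (the pattern strings are my approximation, not an exact reproduction of my search):

```python
def tell_counts(text: str) -> dict[str, int]:
    """Count surface 'tells' often associated with LLM prose."""
    return {
        " just ": text.count(" just "),   # spaces on either side matter
        "en dash": text.count("\u2013"),  # the - character
        "em dash": text.count("\u2014"),  # the longer dash
    }
```

Paste the article text into `tell_counts` and compare against a known-human baseline; the raw counts mean little without one.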
This article is either from a clanker and I am pissed off at wasting my time reading it, or from someone who writes like a clanker, and I am pissed off at wasting my time reading it.
Comment by akprasad 1 day ago
> That’s confabulation. Not a metaphor. The same phenomenon.
> Published. Replicated. Not fringe.
> Not to validate it. Not to refute it. Not to engage with its content at all.
Comment by cyanydeez 1 day ago
It's the new age "Repost"
Comment by girvo 1 day ago
Which is a shame coz the premise is interesting.
Comment by ahaferburg 1 day ago
> HN commenters express their anger about the writing style and how it was probably generated by AI.
Peak HN.
In all seriousness, I really liked the article. For me, it was well written.
Comment by asaaki 1 day ago
Not sure what others have been reading so far, but this article already smells so human, I couldn't imagine anyone mistaking it for LLM output (especially since it is about LLMs and how they fail to work for some of us).
Funnily enough, I also consulted three models across different vendors, and all came to the same conclusion (it's very human and very unlikely to be LLM-produced). And yes, I let them all cross-reference with older posts, which are all from the pre-LLM era. Takes less than 5 minutes to do so.
Comment by ahaferburg 13 hours ago
On the article itself, I found the title quite relatable. When coding agents became a thing, I kept arguing with LLMs before realizing how pointless it is. I really liked the turn the article took, relating the communication breakdown to neurodivergence, and the inherent bias that current models bring.
It reminds me of this: https://en.wikipedia.org/wiki/Four-sides_model
According to this communication model, there are four aspects to communication between humans. Apart from the factual and appeal layers, it highlights other aspects, which it calls the self-revealing and relationship layers. It is somewhat surprising, and definitely remarkable, that these also matter when communicating with an LLM.
Thank you for posting the article!