I prompted ChatGPT, Claude, Perplexity, and Gemini and watched my Nginx logs
Posted by startages 20 hours ago
Comments
Comment by lambda 19 hours ago
The writing style is so unclear, it's hard to figure out one of the key points: it mentions that Gemini doesn't use a distinct user-agent for its grounding. It doesn't mention whether it actually hit the endpoint during the test, though it kind of implies that with "Silence from Google is not evidence of no fetch." Uh, if there are no requests coming in live, that means no fetch, it's using a cache of your site.
It makes a difference whether it fetches a page live, or whether it's using a cached copy from a previous crawl; that tells you something about how up-to-date answers are going to be for people asking questions about your website via Gemini. But I guess the LLM writing this article just wanted to make things sound punchy and impressive, not actually communicate useful information.
Anyhow, LLM marketing spam from an LLM marketing spam company. Bleh.
Comment by stronglikedan 17 hours ago
Comment by anygivnthursday 18 hours ago
Comment by startages 18 hours ago
Anyway, in my test I saw zero requests from any Google UA after multiple Gemini and AI mode prompts that should have triggered grounding, so the working interpretation is that Gemini served from its own index/cache rather than doing a live provider-side fetch. The original phrasing was fuzzier than it should have been.
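For anyone who wants to reproduce the check, here's roughly what the grep amounted to, as a Python sketch. It assumes the default Nginx "combined" log format (UA is the last quoted field), and the UA list is my own guess at what a Google-side fetch might send; the sample lines are made up, not from my actual logs.

```python
import re

# Assumed list of Google-controlled UAs; Google does not document a
# distinct UA for Gemini grounding, which is the whole point of the test.
GOOGLE_UAS = ("Googlebot", "Google-Extended", "GoogleOther")

# In the combined log format, the user agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"$')

def google_hits(log_lines):
    """Return the log lines whose user-agent field mentions a known Google UA."""
    hits = []
    for line in log_lines:
        m = UA_RE.search(line.strip())
        if m and any(ua in m.group(1) for ua in GOOGLE_UAS):
            hits.append(line)
    return hits

# Made-up sample lines in combined format (TEST-NET addresses):
sample = [
    '203.0.113.7 - - [01/Jan/2025:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.9 - - [01/Jan/2025:12:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"ChatGPT-User/1.0"',
]

print(google_hits(sample))
```

Running this over the window of the test prompts and getting an empty list is what "zero requests from any Google UA" means here.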
Comment by zenoprax 16 hours ago
> attributing hits was a grep, not a guess

> values below are copied from the probe's log file, not paraphrased

> a User-agent: Claude-User disallow is the live control

> Only Claude-User is the user-initiated retrieval signal
I could go on and on but I won't. Phrasing aside, the text is too structured with many sections and subsections when the intent was clearly more narrative. "I was curious about X and did Y and I am going to tell you about it."
Signals that suggest a human who cares would be: use of the first-person; demonstrated curiosity, humility, and uncertainty; inline hyperlinks; and any kind of personality or opinion.
"Idiolect" is both subtle and distinct: the choice of vocabulary, grammar, phrasing and colloquial metaphors will vary in kind and frequency for everyone like an intellectual signature. You can sometimes tell if someone has been reading too much of a particular author recently just because of the way the author's choice of vocabulary bleeds into their own speech patterns. Sometimes it's a permanent influence.
I wonder if reading so much LLM output lately has affected my idiolect, and whether I now write (or worse, think) in a more machine-like way than before...
Comment by ffsm8 59 minutes ago
Totally off topic ofc, but I always get triggered by the claim that llms are "machine-like". I'm aware it's a total pet peeve and a lil irrational, but "machine-like" would imply to me that it's thinking like a machine, which in turn implies machine intelligence - which in turn implies they're doing something which they aren't.
I'm not trying to undersell their capabilities. Used well, they're able to do a lot of things. But the way they achieve it is by mimicking human dialogue and rhetorical patterns. That's in my opinion anything but machine intelligence. I struggle to find an applicable word for it, though.
Comment by realo 18 hours ago
Don't worry.
Comment by bigyabai 18 hours ago
Comment by worik 16 hours ago
Comment by nryoo 19 hours ago
Comment by bombcar 18 hours ago
Comment by ctime 19 hours ago
The IPs listed in the output are from reserved ranges as well, like they were intentionally obfuscated (but this was not shared with the reader).
It’s the kind of obfuscation that AI would do (using esoteric bogon ranges as well)
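Easy to check, at least: Python's stdlib `ipaddress` module knows the reserved/documentation ranges, so a quick sketch like this flags them (the sample addresses below are a TEST-NET address and a well-known public resolver, not the article's actual values):

```python
import ipaddress

def looks_bogon(ip):
    """True if the address falls in a reserved, private, or documentation
    range, i.e. it cannot belong to a real public crawler."""
    return not ipaddress.ip_address(ip).is_global

print(looks_bogon("192.0.2.55"))  # TEST-NET-1 (documentation range) -> True
print(looks_bogon("8.8.8.8"))     # ordinary public address -> False
```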
Comment by startages 18 hours ago
Comment by reincoder 15 hours ago
https://community.ipinfo.io/t/can-we-detect-ai-agents-we-can...
Most AI crawlers self-identify with a UA. However, Grok uses residential proxies and sends a high volume of simultaneous requests. Even though we can detect residential proxies, it is not possible to map these proxy IPs to Grok.
I still could not figure out why I saw legitimate Googlebot IPs when I asked Perplexity to review the website. I verified those Googlebot IPs using both the UA and the IP address ranges published by Google.
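For anyone wanting to do the same verification, Google's documented method is reverse DNS plus a forward-confirming lookup, roughly like this sketch (the suffix check is split out as a pure function; the full check needs live DNS):

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def plausible_googlebot_host(host):
    """Pure check: does the PTR name end in a Google-owned domain?
    Note endswith guards against e.g. 'googlebot.com.evil.net'."""
    return host.endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip):
    """Google's documented check: reverse-resolve the IP, require a
    googlebot.com/google.com host, then forward-resolve that host and
    confirm it maps back to the same IP. Requires working DNS."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        return plausible_googlebot_host(host) and ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

Cross-checking against Google's published IP range files catches the same thing without DNS round-trips.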
Comment by Auburn_AI 17 hours ago
Comment by hajimuz 19 hours ago
Comment by startages 18 hours ago
ChatGPT-User/1.0 — Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9

Claude-User/1.0 — Accept: */*

Perplexity-User/1.0 — no Accept header

PerplexityBot/1.0 — no Accept header

ChatGPT sends a Chrome-style Accept string. Claude sends a wildcard. Perplexity sends nothing at all. Gemini didn't fetch in my test.
Also worth noting: Claude-User hit /robots.txt before the page.
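That ordering is checkable from the log alone; a minimal sketch, again assuming the combined log format (the sample lines below are illustrative, not my real logs):

```python
import re

# Pull the request path (group 1) and user agent (group 2) out of a
# combined-format log line.
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*".*"([^"]*)"$')

def robots_first(log_lines, ua_substring):
    """True if this agent's first logged request was /robots.txt."""
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and ua_substring in m.group(2):
            return m.group(1) == "/robots.txt"
    return False

# Made-up sample: Claude checks robots.txt first, ChatGPT goes straight in.
sample = [
    '203.0.113.4 - - [t] "GET /robots.txt HTTP/1.1" 200 10 "-" "Claude-User/1.0"',
    '203.0.113.4 - - [t] "GET /post HTTP/1.1" 200 10 "-" "Claude-User/1.0"',
    '198.51.100.8 - - [t] "GET /post HTTP/1.1" 200 10 "-" "ChatGPT-User/1.0"',
]
```

Which is also why the article's robots.txt disallow for Claude-User works as a live control: an agent that never reads robots.txt can't honor it.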
Comment by cruffle_duffle 19 hours ago
There are multiple ways these tools access your site, and only one of them is "using it for training". Others are web fetches from chat sessions, "deep research" agents, etc. And those will have different traffic patterns. They aren't crawlers; they are clumsy, ham-handed AI agents doing their humans' bidding.
Both can give a site the hug of death. Both can be badly coded. But the intent behind the two is very different, and I feel it is important to acknowledge the difference.
Comment by dalton_zk 19 hours ago
Comment by startages 17 hours ago
Comment by realaccfromPL 19 hours ago
Comment by dawolf- 19 hours ago
Comment by worik 16 hours ago
Microsoft pushing up the Linux Desktop count.
I doubt that is corporate policy!
Comment by shermantanktop 19 hours ago
The content is interesting, but it's delivered in an article that smells like slop.
Comment by KaiShips 16 hours ago