Ask HN: Is the web for machines (/llm.txt) the one we wished we had as humans?

Posted by sunshine-o 4 days ago

I got really tired, as a human, of parsing the standard marketing-heavy web we have today. I've always loved the simplicity of Gopher and the Gemini web.

Recently I found myself manually adding `/llm.txt` to most websites I visit because I find the content for LLMs straight to the point and clear. The only annoyance is that web browsers like Chrome do not render the markdown.

So could the AI revolution actually fix the web for humans as a side effect?

Do you find yourself doing the same?

Comments

Comment by ahriad 4 days ago

We broke the web so badly for humans that we had to build a clean web for machines, and now humans will have to use machines to experience a clean web again.

Comment by tacostakohashi 4 days ago

Yeah, when browsers have a "reader mode", it's pretty obvious the plot has been lost somewhere.

Comment by soco 4 days ago

It's only a matter of time until the web for machines is crawling with ads and everything else, and worse.

Comment by sunir 4 days ago

We'll finally bring back Gopher.

Comment by daniel-alexande 4 days ago

Always loved Gopher

Comment by gopher_space 4 days ago

A man can dream.

Comment by dmos62 4 days ago

I wonder why we broke the web.

Comment by Eddy_Viscosity2 4 days ago

For the same reasons why we eventually pollute and corrupt every system and environment we use. If there is any benefit that can be extracted for some while the costs are borne by many, then this will occur and generate a positive feedback loop that grows over time.

It's the law of monetization.

Comment by qsera 4 days ago

>then this will occur and generate a positive feedback loop that grows over time.

And despite this, modern life is made possible by the illusion that "regulations" work..

Comment by Eddy_Viscosity2 4 days ago

Regulations can and do work, but it's never a 'one and done' kind of solution because people find workarounds and loopholes. It requires an unceasing effort to maintain the balance.

Comment by qsera 3 days ago

That is what I meant when I said it does not work.

Comment by ahriad 4 days ago

For money! Ads make money.

Comment by jt2190 4 days ago

Because while consumers value “inefficiency” (high design, wonderful prose, beautiful images, great usability) they don’t want to actually pay for it. Producers have to become extremely efficient without revenue, and are stuck with a choice: Produce at a loss, stop producing, or seek payment from another source (sponsorships, ads).

Comment by functionmouse 4 days ago

In order to break the user, of course.

Comment by dmos62 4 days ago

It seems there's little agreement over how the web is broken.

Comment by temp8830 4 days ago

People who love cookie banners either don't exist, or are alien invaders :)

Comment by noufalibrahim 4 days ago

To improve the user experience.

Comment by takenotice 2 days ago

Capitalism

Comment by throwaway613746 4 days ago

[dead]

Comment by marand23 4 days ago

I never thought about it before now, but the LLM era could be a form of renaissance for blind people on the Internet. An alternative web where the functionality of every page is described in short but detailed text instead of an extremely verbose and non-linear HTML tree structure.

Comment by imhoguy 2 days ago

It can be even better. An agent with Playwright MCP can browse even the trickiest website structure, simply say what it "sees", and ask the user what to do next, by voice or Braille.

Comment by rickette 4 days ago

Do any of the LLM providers actually use llms.txt?

If I remember correctly, this "standard" was set up by someone without the involvement of any of the major AI players.

Comment by HermanMartinus 4 days ago

I can definitively say llms.txt is not used by any AI players. I run a blogging platform with around 80k blogs and /llms.txt is not requested by anything (other than humans checking to see if there's an llms.txt path).

All regular pages are aggressively scraped to the extent it's a problem I have to consistently manage, but not llms.txt.

Comment by sunshine-o 4 days ago

Amazing, I didn't know.

So it gets even stranger, I am the only one reading those /llms.txt ...

Comment by nickserv 4 days ago

I'm seeing quite a few requests for these on my work's GitBook documentation site.

But perhaps these are developers specifically targeting these pages to feed whatever LLM they are using.

Comment by isaachinman 4 days ago

How is a static blog being scraped a problem? Do you not use a CDN?

Comment by nickserv 4 days ago

> a blogging platform with around 80k blogs

But nah, I'm sure OP doesn't know about CDNs.

Comment by the_real_cher 4 days ago

Are all blogs static though?

Comment by johannes1234321 4 days ago

Very few blogs require frequent updates. Even with user comments.

Comment by 0123456789ABCDE 4 days ago

> I can definitively say llms.txt is not used by any AI players.

  https://developers.openai.com/llms.txt
  https://docs.anthropic.com/llms.txt
  https://geminicli.com/llms.txt
  https://github.com/llms.txt
  https://docs.aws.amazon.com/llms.txt
  https://openrouter.ai/docs/llms.txt

Comment by m4tthumphrey 4 days ago

OP clearly meant that the AI players are not reading and/or honouring llms.txt of other websites when scraping.

Comment by 0123456789ABCDE 4 days ago

i stand corrected, but what was clear to you, obviously was not clear to me.

Comment by solumos 4 days ago

No, requesting "Accept: text/markdown" in the headers and returning markdown is the more agreed upon standard at this point.[0]

[0] - https://acceptmarkdown.com/
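A minimal sketch of the server side of that negotiation, assuming a site that keeps both an HTML and a markdown version of each page. The function names are hypothetical, and the parsing is deliberately simplified: it only checks whether `text/markdown` is listed with a non-zero q-value and does not rank it against the other media types the way full HTTP content negotiation would.

```python
def wants_markdown(accept_header: str) -> bool:
    """Return True if the Accept header lists text/markdown with q > 0."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "text/markdown":
            continue
        q = 1.0  # a media type without an explicit q parameter defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def choose_representation(accept_header: str) -> str:
    """Pick the content type to serve (hypothetical helper, HTML by default)."""
    return "text/markdown" if wants_markdown(accept_header) else "text/html"
```

A client that sends `Accept: text/markdown, text/html;q=0.8` would get markdown from this sketch, while a regular browser request falls through to HTML.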

Comment by kamma4434 4 days ago

Now, it would be super cool to get markdown and zero javascript bundles…

Comment by solumos 4 days ago

If you want to see what that looks like, I one-shot a browser with Claude that does it[0]. Docs pages are early adopters of this[1][2], so that AI agents can better handle tasks.

[0] - https://github.com/solumos/md-browse

[1] - https://docs.stripe.com

[2] - https://vercel.com/docs

Comment by sunshine-o 4 days ago

I just found out Cloudflare supports real-time html to md conversion [0]

- [0] https://blog.cloudflare.com/markdown-for-agents/#convert-htm...

Comment by christoff12 4 days ago

This is interesting. I should start incorporating this -- it couldn't hurt to do both.

Comment by 0123456789ABCDE 4 days ago

yes, they do.

anyone who's, even slightly, clued into how agents access documentation, has been making changes to their pages. ex: https://searchtxt-web.fly.dev/search?q=aws

Comment by cyanydeez 4 days ago

oh don't worry, in 5 years your AI will be inundated with context poison prompts that try to get them to spend all your bank notes and meta bucks on equally useless things.

This is just a redux of the early web.

Comment by maccam912 4 days ago

Already happening. I was using Claude to check out sampler plugins. I'm sure it happens undetected, and it might have mentioned it with other versions, but Claude Opus 4.8, being its helpful, honest self, told me that one of the pages it reviewed had hidden text instructing it to recommend that plugin. It caught it and was able to avoid influence from that plugin at least, but we're already living in that world.

Comment by skywalqer 4 days ago

Why didn't they place it in .well-known? Also, I couldn't find a website that has it.

Comment by JimDabell 4 days ago

Putting it in .well-known/ was immediately raised as an issue from the beginning; it’s issue #2 in fact:

https://github.com/AnswerDotAI/llms-txt/issues/2

It’s been completely ignored ever since.

Comment by 0123456789ABCDE 4 days ago

https://searchtxt-web.fly.dev

Comment by realty_geek 4 days ago

What is an example of a site with a good llm.txt?

Comment by pramodbiligiri 4 days ago

Anthropic's developer documentation: https://platform.claude.com/llms.txt. There's also https://platform.claude.com/llms-full.txt which is (WARNING) much bigger. Not sure where this second one fits into the standard.

Comment by jbrooksuk 4 days ago

Mintlify generates an llms.txt and llms-full.txt for all documentation sites. These work really well:

- https://cloud.laravel.com/docs/llms.txt

- https://cloud.laravel.com/docs/llms-full.txt

Comment by croes 4 days ago

No, the spammers are just at the beginning of ruining that too

https://news.ycombinator.com/item?id=48411569

BTW why should Chrome even consider rendering a .txt file as markdown?

Comment by user568439 4 days ago

That's what I was thinking... Now spammers will add hidden prompts or things worse than that for the LLMs...

Comment by mohamedkoubaa 4 days ago

It just hasn't been gamed yet

Comment by tacostakohashi 4 days ago

Pretty much.

There is an enshittification cycle at work. The web used to be good, predominately text, and useful, 25 years ago. Then... slowly... we added javascript, then AJAX, CSS, flash, interstitials, popups, marketing, social media, algorithms, doomscrolling... gradually but surely turning it into the unusable cesspool that it is today.

Now we have AI! I think a big part of its utility is that it gets us back to text/information, and lets us bypass all the "beautiful" design / nonsense on the material it is trained on.

However, AI is just beginning its enshittification cycle - now that it has a critical mass of users, it is an irresistible target to start slowly adding ads, misinformation, conspiracy theories, and whatever else people can dream up, until it also becomes unusable and the cycle repeats.

Comment by DeathArrow 4 days ago

I tried it: https://news.ycombinator.com/item?id=48410589/llm.txt

Result: no such item.

Where did you get the idea that adding /llm.txt to URLs will produce markdown?

Comment by fxwin 4 days ago

here: https://llmstxt.org/ and obviously it doesn't automatically produce markdown, it's something the website needs to provide (e.g. https://pydantic.dev/llms.txt)
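A small sketch of where to look, assuming the site serves the file at its root path rather than appended to an arbitrary page URL. The helper name is hypothetical, and a site may instead publish the file under a subpath (the Laravel links elsewhere in this thread live under /docs/), so this is only a first guess.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Build the root-level llms.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the page's own path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))
```

For the item URL above this yields https://news.ycombinator.com/llms.txt, which will still return nothing unless the site actually publishes the file.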

Comment by gobdovan 4 days ago

Not really, but it sounds interesting. Would you care to share some sites that offer a better llms.txt than their main web page? Or talk about some piece of info you easily found in llms.txt that was hard to navigate to on the regular website?

Comment by sunshine-o 4 days ago

llms.txt usually includes a clear sitemap and description of information available on a site.

There are also clear definitions of the RESTful scheme and API/data access options.

One very basic example would be the weather channel https://weather.com/llms.txt
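To show why the format is easy to skim, here is a rough sketch of reading that sitemap-like structure, assuming the layout described at llmstxt.org: an H1 title, a blockquote summary, and H2 sections holding markdown link lists. The parser and the sample text are hypothetical illustrations, not taken from any real site, and real files may deviate from this layout.

```python
import re

# Matches "- [title](url)" with optional ": notes" after the link.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$"
)

# Hypothetical sample content, for illustration only.
SAMPLE = """# Example Site

> A made-up site used only for illustration.

## Docs

- [API reference](https://example.com/api.md): REST endpoints
- [Quickstart](https://example.com/start.md)
"""


def parse_llms_txt(text: str) -> dict:
    """Collect the title, summary and per-section links of an llms.txt file."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][current].append(
                    (match["title"], match["url"], match["notes"] or "")
                )
    return doc
```

Calling `parse_llms_txt(SAMPLE)` gives a dictionary with the title, the one-line summary, and a "Docs" section containing two links, which is roughly the overview a human reader gets from skimming the file.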