I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours
Posted by pbowyer 1 day ago
Comments
Comment by simonw 1 day ago
The big unlock here is https://github.com/html5lib/html5lib-tests - a collection of 9,000+ HTML5 parser tests that are their own independent file format, e.g. this one: https://github.com/html5lib/html5lib-tests/blob/master/tree-...
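For a flavour of the format, a tree-construction test looks roughly like this (an illustrative hand-written case in the suite's style, not a verbatim test from the repo):

    #data
    <p>One<p>Two
    #errors
    (1,3): expected-doctype-but-got-start-tag
    #document
    | <html>
    |   <head>
    |   <body>
    |     <p>
    |       "One"
    |     <p>
    |       "Two"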
The Servo html5ever Rust codebase uses them. Emil's JustHTML Python library used them too. Now my JavaScript version gets to tap into the same collection.
This meant that I could set a coding agent loose to crunch away on porting that Python code to JavaScript and have it keep going until that enormous existing test suite passed.
Sadly conformance test suites like html5lib-tests aren't that common... but they do exist elsewhere. I think it would be interesting to collect as many of those as possible.
Comment by avsm 22 hours ago
This run has (just in the last hour) combined the html5lib expect tests with https://github.com/validator/validator/tree/main/tests (which are a complex mix of RELAX NG schemas and Java code) in order to build a low-dependency pure OCaml HTML5 validator with types and modules.
This feels like formal verification in reverse: we're starting from a scattered set of facts (the expect tests) and iterating towards more structured specifications, using functional languages like OCaml/Haskell as convenient executable pitstops while driving towards proof reconstruction in something like Lean.
Comment by leafmeal 15 hours ago
Comment by Havoc 20 hours ago
Turns out they're quite good at that sort of pattern matching cross languages. Makes sense from a latent space perspective I guess
Comment by gwking 1 day ago
Comment by skissane 1 day ago
Comment by gaigalas 1 day ago
That is significantly harder to do than writing an implementation from tests, especially for codebases that previously didn't have any testing infrastructure.
Comment by skissane 1 day ago
Comment by joshstrange 15 hours ago
If you’ve actually tried this, and actually read the results, you’d know this does not work well. It might write a few decent tests, but get ready for an impressive number of tests and cases with no real coverage.
I did this literally 2 days ago and it churned for a while and spit out hundreds of tests! Great news right? Well, no, they did stupid things like “Create an instance of the class (new MyClass), now make sure it’s the right class type”. It also created multiple tests that created maps then asserted the values existed and matched… matched the maps it created in the test… without ever touching the underlying code it was supposed to be testing.
I’ve tested this on new codebases, old codebases, and vibe-coded codebases; the results vary slightly. You absolutely can use LLMs to help with writing tests, no doubt, but “just throw an agent at it” does not work.
Comment by gaigalas 1 day ago
Comment by pbowyer 1 day ago
Having a standard test input/output format would let test definitions be shared between libraries.
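A format like html5lib-tests' .dat layout (#data / #errors / #document sections) needs only a trivial reader in any language. A minimal sketch in Python - illustrative only, since real runners also handle fragment cases and script-on/script-off variants:

    # Parse a .dat file into a list of {"data": ..., "errors": ..., "document": ...} dicts.
    def read_dat_tests(path):
        tests, current, section = [], {}, None
        with open(path, encoding="utf-8") as f:
            for raw in f:
                line = raw.rstrip("\n")
                if line.startswith("#"):
                    if line == "#data" and current:
                        tests.append(current)  # previous test is complete
                        current = {}
                    section = line[1:]
                    current[section] = []
                elif section is not None:
                    current[section].append(line)
        if current:
            tests.append(current)
        return [{k: "\n".join(v) for k, v in t.items()} for t in tests]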
Comment by sfjailbird 22 hours ago
Comment by sciurus 20 hours ago
Comment by k__ 22 hours ago
Comment by cr125rider 1 day ago
Comment by pplonski86 22 hours ago
Comment by bzmrgonz 22 hours ago
Comment by exclipy 20 hours ago
Comment by heavyset_go 1 day ago
Comment by simonw 1 day ago
Comment by heavyset_go 1 day ago
Comment by cortesoft 1 day ago
Why are you making your stuff open source in the first place if you don't want other people to build off of it?
Comment by heavyset_go 1 day ago
Because I enjoy the craft. I will enjoy it less if I know I'm being ripped off, likely for profit, hence my deliberate choices of licenses, what gets released and what gets siloed.
I'm happy if someone builds off of my work, as long as it's on my own terms.
Comment by nicoburns 20 hours ago
There are strong parallels to the image generation models that generate images in the style of Studio Ghibli films. Does that benefit Studio Ghibli? I'd argue not. And if we're not careful, it will undermine the business model that produced the artwork in the first place (which the AI is not currently capable of doing).
Comment by bgwalter 1 day ago
1) Ensuring that there is no malicious code and enabling you to build it yourself.
2) Making modifications for yourself (Stallman's printer is the famous example).
3) Using other people's code in your own projects.
Item 3) is wildly over-propagandized as the sole reason for open source. Hard forks have traditionally led to massive flame wars.
We are now being told by corporations and their "AI" shills that we should diligently publish everything for free so the IP thieves can profit more easily. There is no reason to oblige them. Hiding test suites in order to make translations more difficult is a great first step.
Comment by inejge 1 day ago
Provided that the project is popular and has a community, especially a contributor community (the two don't have to go together.) Most projects aren't that prominent.
Comment by visarga 1 day ago
The rest is enshittified web, focused on attention grabbing, retention dark patterns and misinformation. They all exist to make a profit off our backs.
A pattern I see is that we moved on from passive consumption and now want interactivity, sociality and reuse. We like to create together.
Comment by tracnar 1 day ago
It doesn't work for everything, of course, but it's a nice way to do bug-for-bug-compatible rewrites.
Comment by aadishv 1 day ago
Comment by montroser 1 day ago
Comment by simonw 1 day ago
Coding agents are fantastic at these kinds of loops.
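The loop itself is simple enough to sketch in a few lines of Python. Here send_failures_to_agent is a hypothetical stand-in for the model round-trip that edits the code before the next attempt - a conceptual sketch, not any particular tool's API:

    import subprocess

    # Run the conformance suite until it goes green, feeding failures back each time.
    def port_until_green(max_iterations=100):
        for _ in range(max_iterations):
            result = subprocess.run(["npm", "test"], capture_output=True, text=True)
            if result.returncode == 0:
                return True  # every test passes
            send_failures_to_agent(result.stdout + result.stderr)
        return False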
Comment by cies 1 day ago
Also: it may be interesting to port it to other languages too and see how they do.
JS and Py are both runtime-typed and very well "spoken" by LLMs. Other languages may require a lot more "work" (data types, etc.) to get the port done.
Comment by cxr 1 day ago
This blog post isn't really about HTML parsers, however. The JustHTML port described in this blog post was a worthwhile exercise as a demonstration on its own.
Even so, I suspect that for this particular application, it would have been more productive/valuable to port the Java codebase to TypeScript rather than using the already vibe coded JustHTML as a starting point. Most of the value of what is demonstrated by JustHTML's existence in either form comes from Stenström's initial work.
Comment by simonw 1 day ago
Here's the relevant folder:
https://github.com/mozilla-firefox/firefox/tree/main/parser/...
make translate # perform the Java-to-C++ translation from the remote
# sources
And active commits to that javasrc folder - the last was in November: https://github.com/mozilla-firefox/firefox/commits/main/pars...
Comment by cxr 1 day ago
The more interesting setup would make the TypeScript sources canonical, with tooling to:
(a) permit a fully mechanical, on-the-fly rederivation of the canonical TypeScript sources into Java, for Java consumers that need it (a lot like the ts->js step that happens for execution on JS engines), and
(b) compiler support that can go straight from the TypeScript subset used in the parser to a binary that's as performant as the current native implementation, without requiring any intermediate C++ form to be emitted or reviewed/vetted/maintained by hand
(Sidenote: Hejlsberg is being weird/not entirely forthcoming about the overall goals wrt last year's announcement about porting the TypeScript compiler to Go. We're due for an announcement that they've done something like lifting the Go compiler's backend out of the golang.org toolchain and strapping the legacy tsc frontend on top, allowing the TypeScript compiler to continue to be developed and maintained in TypeScript while executing with the performance previously seen mostly in tools written in Go, versus those making do with running on V8.)
I agree with the overall conclusion of the post that what is demonstrated there is a good use case for LLMs. It might even be the best use for them, albeit something to be undertaken/maintained as part of the original project. It wouldn't be hugely surprising if that turned out to be the dominant use of LLM-powered coding assistants when everything shakes out (all the other promises that have been made for and about them notwithstanding).
No real reason that they couldn't play a significant role in the project I outlined above.
Comment by simonw 1 day ago
... and then when I checked the henri-sivonen tag https://simonwillison.net/tags/henri-sivonen/ found out I'd previously written about the exact same thing 16 years earlier!
Comment by po 1 day ago
Comment by simonw 1 day ago
I picked JustHTML as a base because I really liked the API Emil had designed, and I also thought it would be darkly amusing to take his painstakingly constructed library (1,000+ commits, 2+ months of work) and see if I could port it directly to JavaScript in an evening, taking advantage of everything he had already figured out.
Comment by QuantumNomad_ 1 day ago
The MIT family of licenses states that the copyright notice and terms shall be included in all copies of the software.
Porting code to a different language is in my opinion not much different from forking a project and making changes to it, small or big.
I therefore think the right thing to do is to keep the original copyright notice and license file, and to add your own copyright line to it.
So for example if the original project had an MIT license file that said
Copyright 2019 Suchandsuch
Permission is hereby granted and so on
You should keep all of that, and add your own copyright year and name on the line after the original author line(s) from the repo you took the code from.
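So the combined notice ends up looking something like this (names and years are placeholders):

    Copyright 2019 Suchandsuch
    Copyright 2025 Yournamehere

    Permission is hereby granted, free of charge, ... [rest of the original MIT text, unchanged]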
Comment by simonw 1 day ago
I'm not certain I should add the html5ever copyright holders, since I don't have a strong understanding of how much of their IP ended up in Emil's work - see https://news.ycombinator.com/item?id=46264195#46267059
Comment by EmilStenstrom 1 day ago
Comment by fergie 21 hours ago
Comment by aster0id 1 day ago
I personally think that even before LLMs, the cost of code wasn't necessarily the cost of typing out the characters in the right order, but having a human actually understand it to the extent that changes can be made. This continues to be true for the most part. You can vibe code your way into a lot of working code, but you'll inevitably hit a hairy bug or a real world context dependency that the LLM just cannot solve, and that is when you need a human to actually understand everything inside out and step in to fix the problem.
Comment by monkpit 1 day ago
Comment by killingtime74 1 day ago
Comment by doganugurlu 1 day ago
Doesn’t matter how quick it is to write from scratch, if you want varying inputs handled by the same piece of code, you need maintainability.
In a way, software development is all about adding new constraints to a system and making sure the old constraints are still satisfied.
Comment by skydhash 1 day ago
Comment by f311a 1 day ago
> Verified Compliance: Passes all 9k+ tests in the official html5lib-tests suite (used by browser vendors).
Yes, browsers do use it. But they handle a lot of stuff differently.
> selectolax | 68% | No | Very Fast | CSS selectors | C-based (Lexbor). Very fast but less compliant.
The original author compares selectolax against html5lib-tests, but the reality is that when you compare selectolax to Chrome output, you get 90%+.
One of the tests:
INPUT: <svg><foreignObject></foreignObject><title></svg>foo
It fails for selectolax.
Expected:
| <html>
|   <head>
|   <body>
|     <svg svg>
|       <svg foreignObject>
|       <svg title>
|     "foo"
Actual:
| <html>
|   <head>
|   <body>
|     <svg>
|       <foreignObject>
|       <title>
|     "foo"
But you get this in Chrome and selectolax:
<html><head></head><body><svg><foreignObject></foreignObject><title></title></svg>foo
</body></html>
Comment by EmilStenstrom 1 day ago
You are also looking at the test format's serialization of the tree; when serialized to HTML, the svg prefixes will disappear.
Comment by seinecle 1 day ago
https://martinalderson.com/posts/has-the-cost-of-software-ju...
This last post was largely dismissed in the comments here on HN. Simon's experiment gives the argument fresh grounding.
Comment by akie 1 day ago
These two preconditions don't generally apply to software projects. Most of the time there are vague, underspecified, frequently changing requirements, no test suite, and no API design.
If all projects came with 9,000 pre-existing tests and a fleshed-out API, then sure, the article you linked to could be correct. But that's not really the case.
Comment by jillesvangurp 1 day ago
Once you have that, you port over the tests to a new language and generate an implementation that passes all those tests. You might want to do some reviews of the tests but it's a good approach. It will likely result in bug for bug compatible software.
Where it gets interesting is figuring out what to do with all the bugs you might find along the way.
Comment by baq 23 hours ago
if there exists a language specific test harness, you can ask the LLMs to port it before porting the project itself.
if it doesn't, you can ask the LLM to build one first, for the original project, according to specs.
if there are no specs, you can ask the LLM to write the specs according to the available docs.
if there are no docs, you can ask the LLM to write them.
if all the above sounds ridiculous, I agree. it's also effective - go try it.
(if there is no source, you can attempt to decompile the binaries. this is hard, but LLMs can use ghidra, too. this is probably unreasonable and ineffective today, though.)
Comment by philipwhiuk 20 hours ago
And you have no idea if that is necessary and sufficient at this point.
You are building on sand.
Comment by minimaxir 1 day ago
> Does this library represent a legal violation of copyright of either the Rust library or the Python one? Even if this is legal, is it ethical to build a library in this way?
Currently, I am experimenting with two projects in Claude Code: a Rust/Python port of a Python repo which necessitates a full rewrite to get the desired performance/feature improvements, and a Rust/Python port of a JavaScript repo mostly because I refuse to install Node (the speed improvement is nice though).
In both of those cases, the source repos are permissively licensed (MIT), which I interpret as the developer's intent as to how their code should be used. It is in the spirit of open source to produce better code by iterating on existing code, as that's how the software ecosystem grows. That would be the case whether a human wrote the porting code or not. If Claude 4.5 Opus can produce better/faster code which has the same functionality and passes all the tests, that's a win for the ecosystem.
As courtesy and transparency, I will still link and reference the original project in addition to disclosing the Agent use, although those things aren't likely required and others may not do the same. That said, I'm definitely not using an agent to port any GPL-licensed code.
Comment by throwup238 1 day ago
IANAL but regardless of the license, you have to respect their copyright and it’s hard to argue that an LLM ported library is anything but a derivative work. You would still have to include the original copyright notices and retain the license (again IANAL).
Comment by minimaxir 1 day ago
Comment by throwup238 1 day ago
It’s a lot easier to argue that it’s a derivative work when you feed the copyrighted code directly into the context and ask it to port it to another language. If the copyrighted code is literally an input to the inference request, that would not escape any judge’s notice. The law may not have any precedent for this technology but judges aren’t automatons beholden to trivially buggy code that can’t adapt.
Comment by simonw 1 day ago
Comment by EmilStenstrom 1 day ago
Comment by simonw 1 day ago
Comment by ZeroGravitas 1 day ago
Comment by jackfranklyn 16 hours ago
For solo devs this changes the calculus entirely. Supporting multiple languages used to mean maintaining multiple codebases - now you can treat the original as canonical and regenerate ports as needed. The test suite becomes the actual artifact you maintain.
Comment by solvedd 15 hours ago
Comment by mirthturtle 1 day ago
Comment by simonw 1 day ago
I'm ready to take a risk to my own reputation in order to demonstrate that this kind of thing is possible. I think it's useful to help people understand that this kind of thing isn't just feasible now, it's somewhat terrifyingly easy.
Comment by ethanpil 1 day ago
> It took two initial prompts and a few tiny follow-ups. GPT-5.2 running in Codex CLI ran uninterrupted for several hours, burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens and ended up producing 9,000 lines of fully tested JavaScript across 43 commits.
Using a random LLM cost calculator, this amounts to $28.31... pretty reasonable for functional output.
I am now confident that within 5-10 years (most/all?) junior & mid and many senior dev positions are going to disappear in enormous numbers.
Source: https://www.llm-prices.com/#it=1464295&cit=97123000&ot=62556...
Comment by elcritch 1 day ago
However this changes the economics for languages with smaller ecosystems!
Comment by afro88 1 day ago
Comment by almostgotcaught 1 day ago
yes because this is what we do all day every day (port existing libraries from one language to another)....
like do y'all hear yourselves or what?
Comment by hatefulheart 1 day ago
The commenter you’re replying to, in their heart of hearts, truly believes that in 5 years an LLM will be writing the majority of the code for a project like, say, Postgres or Linux.
Worth bearing in mind the boosters said this 5 years ago, and will say this in 5 years time.
Comment by igouy 14 hours ago
> (most/all?) junior & mid and many senior dev positions
Comment by hatefulheart 11 minutes ago
Everyone working in programming is writing code for a project more like Postgres or Linux than for a project like making a wood cabinet or a life drawing.
Comment by yobbo 21 hours ago
Comment by cjlm 1 day ago
Comment by zamadatix 1 day ago
It'd be really interesting if Simon took a crack at the above and wrote about his findings in doing so. Or at least, I'd find it interesting :).
Comment by mNovak 1 day ago
I'm curious if this will implicitly drive a shift in the usage of packages / libraries broadly, and if others think this is a good or bad thing. Maybe it cuts down the surface of upstream supply-chain attacks?
Comment by MangoToupe 1 day ago
The package import thing seems like a red herring
Comment by Retr0id 1 day ago
Comment by MangoToupe 16 hours ago
how do you distinguish this from injecting a vulnerable dependency into a dependency list?
Comment by Retr0id 14 hours ago
Comment by MangoToupe 9 hours ago
Comment by Retr0id 8 hours ago
Comment by MangoToupe 6 hours ago
Comment by Retr0id 3 hours ago
Comment by orange_puff 1 day ago
It is enormously useful for the author to know that the code works, but my intuition is that if you asked an agent to port files slowly, forming its own plan and making a commit for every feature, it would still get reasonably close, if not all the way there.
Basically, I am guessing that this impressive output could have been achieved based on how good models are these days with large amounts of input tokens, without running the code against tests.
Comment by EmilStenstrom 1 day ago
Comment by simonw 21 hours ago
I think that represents the bulk of the human work that went into JustHTML - it's really nice, and lifting that directly is the thing that let me build my library almost hands-off and end up with a good result.
Without that I would have had to think a whole lot more about what I was doing here!
Comment by orange_puff 10 hours ago
Comment by simonw 4 hours ago
See also the demo app I vibe-coded against their library here: https://tools.simonwillison.net/justhtml - that's what initially convinced me that the API design was good.
I particularly liked the design of JustHTML's core DOM node: https://github.com/EmilStenstrom/justhtml/blob/main/docs/api... - and the design of the streaming API: https://github.com/EmilStenstrom/justhtml/blob/main/docs/api...
Comment by xarope 1 day ago
I'm a bit sad about this; I'd rather have "had fun" doing the coding, and get AI to create the test cases, than vice versa.
Comment by EmilStenstrom 1 day ago
Comment by vessenes 1 day ago
As is mentioned in the comments, I think the real story here is twofold: one, we're getting longer uninterrupted productive work out of frontier models - yay - and two, a formal test suite has just gotten vastly more useful in the last few months. I'd love to see more of these made.
Comment by sgc 14 hours ago
Comment by leroman 1 day ago
This specific case worked well, I suspect, because LLMs have a LOT of prior knowledge of HTML, and saw multiple implementations of HTML parsing during training.
Thus I suspect that real-world attempts at similar projects, in any domain that isn't as well covered, will fail miserably.
Comment by adastra22 1 day ago
No, seriously. If you break your task into bite sized chunks, do you really need more than that at a time? I rarely do.
Comment by leroman 1 day ago
To your question: I make a huge effort to keep my prompts as small as possible (to get the best-quality output). I go as far as removing imports from source files, writing interfaces and types to use in context instead of fat implementation code, and writing task-specific project/feature documentation (I automate some of this with a library I use to generate prompts from code and other files - think templating language with extra flags). And still, for some tasks my prompt size reaches 10k tokens, where I find the output quality not good enough.
Comment by swyx 1 day ago
i think the fun conclusion would be: ideally no better, and no worse. that is the state you arrive at IFF you have complete tests and specs (including, probably, for performance). now a human team handcrafting would undoubtedly make important choices not clarified in specs, thereby extending the spec. i would argue that human chain of thought from deep involvement in building and using the thing is basically 100% of the value of human handcrafting, because otherwise yeah, go nuts giving it to an agent.
Comment by p0w3n3d 13 hours ago
> burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens
How much did it cost?
Comment by dimava 10 hours ago
> I was running this against my $20/month ChatGPT Plus account
Comment by tantalor 1 day ago
No, because it's a derivative work of the base library.
Comment by simonw 1 day ago
Comment by tantalor 1 day ago
I think you can claim the prompt itself. But you didn't create the new code. I'd argue copyright belongs to the original author.
Comment by simonw 1 day ago
This project is the absolute extreme: I handed over exactly 8 prompts, and several of those were just a few words. I count the files on disk as part of the prompts, but those were authored by other people.
The US copyright office say "the resulting work is copyrightable only if it contains sufficient human-authored expressive elements" - https://perkinscoie.com/insights/update/copyright-office-sol... - but what does that actually mean?
Emil's JustHTML project involved several months of work and 1,000+ commits - almost all of the code was written by agents, but there was an enormous amount of what I'd consider "human-authored expressive elements" guiding that work.
Many of my smaller AI-assisted projects use prompts like this one:
> Fetch https://observablehq.com/@simonw/openai-clip-in-a-browser and analyze it, then build a tool called is-it-a-bird.html which accepts a photo (selected or drag dropped or pasted) and instantly loads and runs CLIP and reports back on similarity to the word “bird” - pick a threshold and show a green background if the photo is likely a bird
Result: https://tools.simonwillison.net/is-it-a-bird
It was a short prompt, but the Observable notebook it references was authored by me several years ago. The agent also looked at a bunch of other files in my tools repo as part of figuring out what to build.
I think that counts as a great deal of "human-authored expressive elements" by me.
So yeah, this whole thing is really complicated!
Comment by tantalor 1 day ago
Laying claim to anything generated is very likely to fail.
Comment by simonw 1 day ago
Comment by brailsafe 1 day ago
Hmm, it is interesting to think about that situation. Intuitively it would seem to me like there's some nuance between whether work would need to be "thrown out" or whether it just can't be sold as their own creation, marking some kind of divide between code produced and used privately for commercial purposes vs code that is produced and sold/provided publicly as a commercial product. The risk in doing the latter, or entirely throwing out the code, seems like it would be a relatively cheap risk that those companies do anyway all the time.
However, if I as a small business owner made a tool to help other businesses based on LLM code that used some of my own prior work for context, then sold the code itself as a product or sold a product with it as a dependency, it would be a much greater liability for me if it turned out to include copyrighted && unlicensed work that was produced by an LLM that further can't be claimed as my own.
Privately, on servers or in internal tooling not sold commercially, it would perhaps be next to impossible to either identify or enforce those limits. Without explicit attribution to an agent, I have no idea (with certainty anyway) which code anyone on my team has produced with an LLM, and it's not available publicly—aside from pure frontend web stuff—so I wonder in what capacity it would even be possible to throw specific chunks out if it was hypothetically enforceable.
Comment by tantalor 11 hours ago
Comment by leprechaun1066 1 day ago
Comment by simonw 1 day ago
They also frequently offer "liability shields" where their legal teams will go to bat for you if you get sued for copyright infringement based on your usage of their tools.
https://help.openai.com/en/articles/5008634-will-openai-clai...
https://www.anthropic.com/news/expanded-legal-protections-ap...
Comment by visarga 1 day ago
Comment by WhyOhWhyQ 1 day ago
^Claude still thinks it's 2024. This happens to me consistently.
Comment by febed 1 day ago
Comment by simonw 1 day ago
> We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. [...]
And later:
> Configure GitHub Actions test.yml to run that on every commit, then commit and push
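The workflow that prompt asks for is tiny. A hedged sketch of what such a test.yml might look like (the actual file in the repo may differ):

    name: Test
    on: [push, pull_request]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
          - run: npm install
          - run: npm test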
Good coding models don't need much of a push to get heavily into automated testing.
I used Codex for a few reasons:
1. Claude was down on Sunday when I kicked off this project
2. Claude Code is my daily driver and I didn't want to burn through my token allowance on an experiment
3. I wanted to see how well the new GPT-5.2 could handle a long running project
Comment by EmilStenstrom 1 day ago
Comment by rcaught 20 hours ago
Comment by ulrischa 15 hours ago
Comment by fithisux 22 hours ago
There are many OSes out there suffering from the same problem: lack of drivers.
AI can change it.
Comment by RobertoG 1 day ago
Comment by bambax 23 hours ago
Comment by Mystery-Machine 21 hours ago
It's an interesting assumption that an expert team would build a better library. I'd change this question to: would an expert team build this library better?
Comment by deanc 14 hours ago
Comment by simonw 13 hours ago
https://developers.openai.com/codex/pricing#what-are-the-usa...
ChatGPT Plus with Codex CLI provides "45-225 local messages per 5 hour period".
The https://chatgpt.com/codex/settings/usage is pretty useless right now - it shows that I used "100%" on December 14th - the day I ran this experiment - which presumably matches that Codex stopped working at 6:30pm but then started again when the 5 hour window reset at 7:14pm.
Running this command:
npx @ccusage/codex@latest
Reports these numbers for December 14th along with a pricing estimate:
│ Date         │ Models    │ Input     │ Output    │ Reasoning │ Cache Read  │ Total Tokens │ Cost (USD) │
│ Dec 14, 2025 │ - gpt-5.2 │ 2,988,774 │ 1,271,970 │ 908,526   │ 194,963,328 │ 199,224,072  │ $57.16     │
You can spend a lot of tokens on that $20/month plan!
It's possible OpenAI are being generous right now because they see Claude Code as critical competition.
Comment by EmilStenstrom 1 day ago
Comment by pietz 1 day ago
Comment by bgwalter 1 day ago
Comment by EmilStenstrom 1 day ago
The license of html5ever is MIT, meaning the original authors are OK with people doing whatever they want with it. I've retained that license and given them an acknowledgement (not required by the license) in the README. Simon has done the same: kept the license and given an acknowledgement (not required) to me.
We're all good to go.
Comment by teppic 1 day ago
Comment by StarterPro 1 day ago
Comment by simonw 1 day ago
Comment by kjgkjhfkjf 1 day ago
Most projects don't have a detailed spec at the outset. Decades of experience have shown that trying to build a detailed spec upfront does not work out well for a vast class of projects. And many projects don't even have a comprehensive test suite when they go into production!
Comment by simonw 1 day ago
Comment by kjgkjhfkjf 1 day ago
Comment by visarga 1 day ago