Show HN: Libretto – Making AI browser automations deterministic
Posted by muchael 2 days ago
Libretto (https://libretto.sh) is a Skill+CLI that makes it easy for your coding agent to generate deterministic browser automations and debug existing ones. The key shift is going from “give an agent a prompt at runtime and hope it figures things out” to “use coding agents to generate real scripts you can inspect, run, and debug”.
Here’s a demo: https://www.youtube.com/watch?v=0cDpIntmHAM. Docs start at https://libretto.sh/docs/get-started/introduction.
We spent a year building and maintaining browser automations for EHR and payer portal integrations at our healthcare startup. Building these automations and debugging failed ones was incredibly time-consuming.
There are lots of tools that use runtime AI, like Browseruse and Stagehand, which we tried, but (1) they’re reliant on custom DOM parsing that's unreliable on older and complicated websites (including all of healthcare). Using a website’s internal network calls is faster and more reliable when possible. (2) They can be expensive since they rely on lots of AI calls, and for workflows with complicated logic you can’t always rely on caching actions to make sure they will work. (3) They act at runtime, so you can’t inspect ahead of time what the agent is going to do. You kind of hope you prompted it correctly to do the right thing, but legacy workflows are often unintuitive and inconsistent across sites, so you can’t trust an agent to just figure it out at runtime. (4) They don’t really help you generate new automations or debug automation failures.
We wanted a way to reliably generate and maintain browser automations in messy, high-stakes environments, without relying on fragile runtime agents.
Libretto is different because instead of runtime agents it uses “development-time AI”: scripts are generated ahead of time as actual code you can read and control, not opaque agent behavior at runtime. Instead of a black box, you own the code and can inspect, modify, version, and debug everything.
Rather than relying on runtime DOM parsing, Libretto takes a hybrid approach combining Playwright UI automation with direct network/API requests within the browser session for better reliability and bot detection evasion.
It records manual user actions to help agents generate and update scripts, supports step-through debugging, has an optional read-only mode to prevent agents from accidentally submitting or modifying data, and generates code that follows all the abstractions and conventions you have already in your coding repo.
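To make the read-only idea concrete, here is a minimal sketch of one way such a guard could work. The post doesn't describe how Libretto implements it, so this is an assumption: "read-only" is approximated by allowing only safe HTTP methods, and the names `guardRequest`, `OutgoingRequest`, and `Decision` are hypothetical.

```typescript
// Hypothetical sketch, not Libretto's implementation.
// Assumption: a request is "read-only" if it uses a safe HTTP method.
const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

type OutgoingRequest = { method: string; url: string };
type Decision = { allow: boolean; reason: string };

function guardRequest(req: OutgoingRequest, readOnly: boolean): Decision {
  if (!readOnly) {
    return { allow: true, reason: "read-only mode is off" };
  }
  const method = req.method.toUpperCase();
  if (SAFE_METHODS.has(method)) {
    return { allow: true, reason: `${method} is a safe method` };
  }
  // Anything that could submit or modify data is rejected.
  return { allow: false, reason: `${method} blocked in read-only mode` };
}
```

With Playwright, a check like this could be called from a `page.route("**/*", handler)` callback, using `route.request().method()` to read the method and `route.abort()` or `route.continue()` to act on the decision. Filtering by method is coarse: some sites use POST for searches that don't change anything, so a real guard would likely need per-site exceptions.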
Would love to hear how others are building and maintaining browser automations in practice, and any feedback on the approach we’ve taken here.
Comments
Comment by skapadia 1 day ago
2. playwright code generation based on 1, which captures a repeatable workflow
3. agent skills - these can be playwright based, but in some cases if I can just rely on built-in tools like Web Search and Web Fetch, I will.
playwright is one of the unsung heroes of agentic workflows. I heavily rely on it. In addition to the obvious DOM inspection capabilities, the fact that the console and network can be inspected is a game changer for debugging. watching an agent get rapid feedback or do live TDD is one of the most satisfying things ever.
Browser automation and being able to record the graphics buffer as video, during a run, open up many possibilities.
Comment by miohtama 1 day ago
"Claude, reverse engineer the APIs of this website and build a client. Use Dev Tools."
I have succeeded on 8/8 websites with this.
Sites like Booking.com and Hotels.com try to identify real humans with their AWS solution and Cloudflare, but you can just solve the captcha yourself and log in, and the session is indistinguishable from a human's. Playwright is detected and often blocked.
Comment by muchael 23 hours ago
The playwright codegen tool exists, but the script it generates is super simple and it can't handle loops or data extraction.
So for libretto we often use a mix of instructions + recording my actions for the agent. Makes the process faster than just relying on a description and waiting for the agent to figure out the whole flow
Comment by freedomben 1 day ago
Comment by anthuswilliams 1 day ago
I'm also using Playwright, to automate a platform that has a maze of iframes, referer links, etc. Hopefully I can replace the internals with a script I get from this project.
Comment by muchael 1 day ago
Comment by potter098 1 day ago
Comment by muchael 1 hour ago
Our healthcare workflows are sensitive so if there happens to be some website refactor or something, we want to be kept in the loop and not have AI just try and figure it out.
At the end of the day this all lives as code though, so if you needed to use the DOM and wanted a workflow that would naturally handle any DOM change without escalating to you, then you could just throw a computer use agent at it.
Comment by drob518 1 day ago
Comment by Guillaume86 1 day ago
Comment by muchael 22 hours ago
1. Noticed that the API was a couple seconds faster than spinning up the coding agent
2. When you spin up a separate agent you can't guarantee its behavior, and we wanted to enforce that only a single LLM call was run to read the snapshot and analyze the selector. You can guarantee this with an API call but not with a local coding agent
Comment by Guillaume86 21 hours ago
In that case, the protocol has a feature called "sampling" that allows the MCP server (Libretto) to send completion requests to the MCP client (the main agent/harness the user interacts with). That means Libretto would not need its own LLM API keys to work; it would piggyback on the LLMs configured in the main harness (sampling supports "picking" the style of model you prefer too: smart vs fast etc).
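For readers unfamiliar with sampling, here is a rough sketch of the JSON-RPC request an MCP server could send to its client. The method name `sampling/createMessage` and the `modelPreferences` fields follow the MCP specification as I understand it; the helper name, prompt text, and priority values are hypothetical, and the sketch sends text only even though the snapshot described in this thread also includes a screenshot.

```typescript
// Sketch of an MCP sampling request. Prompt wording and numbers are
// illustrative assumptions, not anything Libretto ships.
interface SamplingRequest {
  jsonrpc: "2.0";
  id: number;
  method: "sampling/createMessage";
  params: {
    messages: { role: "user" | "assistant"; content: { type: "text"; text: string } }[];
    systemPrompt?: string;
    maxTokens: number;
    modelPreferences?: {
      hints?: { name: string }[];
      speedPriority?: number;        // 0 to 1, higher = prefer faster models
      intelligencePriority?: number; // 0 to 1, higher = prefer smarter models
      costPriority?: number;         // 0 to 1, higher = prefer cheaper models
    };
  };
}

function buildSelectorSamplingRequest(id: number, condensedDom: string): SamplingRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      messages: [
        {
          role: "user",
          content: { type: "text", text: `Pick robust selectors for this page:\n${condensedDom}` },
        },
      ],
      systemPrompt: "Prefer data-testid, data-test, aria-label, name, id, role.",
      maxTokens: 500,
      // Ask the client for a fast model; the client makes the final choice.
      modelPreferences: { speedPriority: 0.8, intelligencePriority: 0.4 },
    },
  };
}
```

The client remains in control: it decides which model actually serves the request and may ask the user to approve it, which is why the server needs no API key of its own.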
Comment by jimmypk 1 day ago
@muchael: does Libretto constrain the model to prefer accessible-name-based selectors during generation, or does the determinism come primarily from the execution-verification loop (run → fail → self-correct)? The two approaches have meaningfully different failure modes—the first makes the initial code robust, the second only catches brittleness at runtime.
Comment by muchael 23 hours ago
Right now we kind of have a mixture of the 2 approaches, but there's a lot of room for improvement.
- When libretto performs the code generation it initially inspects the page and sends the network calls/playwright actions using `snapshot` and `exec` tools to test them individually. After it's tested all of the individual selectors and thinks it's finished, it creates a script and then runs the script from scratch. Oftentimes the generated script will fail, and that will trigger libretto to identify the failure, update the code, and repeat this process until the script works. That iteration process helps make the scripts much more reliable.
- The way our `snapshot` command works is that we send a screenshot + DOM (which may be condensed, depending on size) to a separate LLM and ask it to figure out the relevant selectors. We do this to avoid polluting the main agent's context with the DOM + lots of screenshots. As part of that analyzer's prompt we tell it to prefer selectors using: data-testid, data-test, aria-label, name, id, role. This just lives in the analyzer prompt and is not deterministic though. It'd be interesting to see if we can improve script quality if we add a hard constraint on the selectors or with different prompting.
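One way to picture a "hard constraint" on selectors is a small deterministic function applied after the analyzer responds. The sketch below is an assumption about what that could look like, not something Libretto does today; the attribute order comes from the list above, while `pickSelector` and `ATTRIBUTE_PRIORITY` are hypothetical names.

```typescript
// Attribute order taken from the comment above; earlier entries are preferred.
const ATTRIBUTE_PRIORITY = ["data-testid", "data-test", "aria-label", "name", "id", "role"];

type Attributes = Record<string, string>;

// Returns a CSS attribute selector built from the highest-priority attribute
// present on the element, or null so the caller can reject the element
// instead of falling back to class names or positional selectors.
function pickSelector(tag: string, attrs: Attributes): string | null {
  for (const attr of ATTRIBUTE_PRIORITY) {
    const value = attrs[attr];
    if (value !== undefined && value !== "") {
      // JSON.stringify gives a double-quoted string; adequate for simple
      // attribute values, not a full CSS escaping routine.
      return `${tag}[${attr}=${JSON.stringify(value)}]`;
    }
  }
  return null;
}

// pickSelector("button", { class: "btn", "aria-label": "Save", id: "x1" })
// returns 'button[aria-label="Save"]'
```

Returning `null` rather than guessing is the design choice that makes this a constraint instead of a preference: the generator is forced to surface elements that have no stable attribute.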
I'm also curious if you have any guidance for prompt improvements we can give the snapshot analyzer LLM to help it pick more robust selectors right off the bat.
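The run, fail, repair cycle described in this comment can be sketched as a bounded loop. This is an illustration under assumptions: `run` stands in for executing the generated script from scratch and `repair` for asking the coding agent to fix it; both are hypothetical callbacks, as is the `maxAttempts` cap.

```typescript
type RunResult = { status: "passed" } | { status: "failed"; error: string };

interface RepairLoopDeps {
  run: (script: string) => Promise<RunResult>;                 // execute the script from scratch
  repair: (script: string, error: string) => Promise<string>;  // return an updated script
}

async function repairUntilPasses(
  initialScript: string,
  deps: RepairLoopDeps,
  maxAttempts = 5,
): Promise<{ script: string; attempts: number }> {
  let script = initialScript;
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await deps.run(script);
    if (result.status === "passed") {
      return { script, attempts: attempt };
    }
    lastError = result.error;
    // Skip the repair call after the final failed run; nothing would run it.
    if (attempt < maxAttempts) {
      script = await deps.repair(script, lastError);
    }
  }
  throw new Error(`script still failing after ${maxAttempts} attempts: ${lastError}`);
}
```

The cap matters in practice: without it, a site change that can't be fixed from the error alone would loop indefinitely and burn LLM calls.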
Comment by z3ugma 1 day ago
Comment by muchael 1 day ago
Comment by tanishqkanc 1 day ago
Comment by terabytest 1 day ago
EDIT: To clarify, I realize there are skill files that can be used with Claude directly, but the snapshot analysis model seems to require a key. Any way to route that effort through Claude Code itself, such as for example exporting the raw snapshot to a file and instructing Claude Code to use a built-in subagent instead?
Comment by muchael 22 hours ago
We can update the config though to allow you to set up snapshot through the CLI instead of going through the API!
Comment by terabytest 10 hours ago
I haven’t used libretto myself yet but I’m excited about having this kind of tool at my disposal as it’s been a need in the past.
Comment by coderw 1 day ago
Comment by muchael 22 hours ago
- We use runtime agents in very specific places. For example, Availity frequently shows popups right after you log in, so if there's a failure right after signup we spin up an agent to close the popup and then resume the flow, basically with a try/catch
- We wait for it to fail and then tell the agent to look at the error logs and use `libretto run` command to rerun the workflow and fix the error
We're thinking of extending libretto to handle these better though. Some of our ideas:
- Adding global/custom fallback steps to every page action. This way we could, for example, add our popup handler error recovery to all page actions or some subset of them
- Having a hosted version which flags errors and has a coding agent examine the issue and open a PR with the fix
Curious if you have any other ideas!
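The try/catch recovery and the "fallback steps for every page action" idea from this comment could be expressed as a small wrapper. This is only a sketch: `withRecovery` is a hypothetical helper, and `recover` stands in for whatever handles the problem (for example, an agent that closes an unexpected popup).

```typescript
type Step<T> = () => Promise<T>;
// Resolves to true if the handler believes it fixed the problem.
type Recovery = (error: unknown) => Promise<boolean>;

// Runs a step; on failure, gives the recovery handler one chance to fix
// the page, then retries the step exactly once.
async function withRecovery<T>(step: Step<T>, recover: Recovery): Promise<T> {
  try {
    return await step();
  } catch (error) {
    const recovered = await recover(error);
    if (!recovered) {
      throw error; // surface the original failure unchanged
    }
    return await step();
  }
}
```

Wrapping individual page actions this way keeps the main flow deterministic: the runtime agent only runs on the failure path, and a second failure still propagates so a human stays in the loop.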
Comment by cowartc 1 day ago
We used to deal with RPA stuff at work. Always fragile. Good to see evolution in the space.
Comment by admiralrohan 1 day ago
Comment by muchael 22 hours ago
Comment by boriskurikhin 1 day ago
Comment by muchael 1 day ago
- Libretto prefers network requests over DOM interaction when possible, so this will circumvent a lot of complex JS rendering issues
- When you do need the DOM, playwright can handle a lot of the complexity out of the box: playwright will re-query the live DOM at action time and automatically wait for elements to populate. Libretto is also set up to pick selectors like data-testid, aria-label, role, id over class names or positional stuff that's likely to be dynamic.
- At the end of the day the files still live as code so you could always just throw a browser agent at it to handle a part of a workflow if nothing else works
Comment by heyitsaamir 1 day ago
Comment by muchael 1 day ago
Comment by etwigg 1 day ago
Comment by canarias_mate 1 day ago
Comment by messh 2 days ago
Comment by muchael 1 day ago
The implementation is also pretty different:
- libretto gives your agent a single exec tool (instead of different tools for each action) so it can write arbitrary playwright/javascript and is more context efficient
- Also we gave libretto instructions on bot detection avoidance so that it will prefer using network requests for automation (something that other tools don’t support), but will fall back to playwright if it identifies network requests as too risky
Comment by tanishqkanc 1 day ago
libretto gives a similar ability for agents for building scripts but:
- agents automatically run, debug, and test the integrations they write
- they have a much better understanding of the semantics of the actions you take (vs. playwright auto-assuming based on where you clicked)
- they can parse network requests and use those to make direct API calls instead
there's fundamentally a mismatch where playwright-cli is for building e2e test scripts for your own app but libretto is for building robust web automations
Comment by yehia2amer 1 day ago
Comment by tanishqkanc 1 day ago
  // Let AI click
  await stagehand.act("click on the comments link for the top story");
The issue with this is that there's now runtime non-determinism. We move the AI work to dev-time: AI explores and crawls the website first, and generates a deterministic, legible script.
Tangentially, Stagehand's model may have worked 2 years ago when humans still wrote the code, but it's no longer the case. We want to empower agents to do the heavy lifting of building a browser automation for us but reap the benefits of running deterministic, fast, cheap, straightforward code.
Comment by arizen 1 day ago
Comment by seagull 2 days ago
Comment by tanishqkanc 1 day ago
Comment by voidUpdate 1 day ago
Comment by afro88 1 day ago
Comment by daveguy 1 day ago
Edit: never mind. I see from the website it is MIT. Probably should add a COPYING.md or LICENSE.md to the repository itself.
Comment by tanishqkanc 1 day ago
Comment by gbibas 1 day ago
Comment by tanishqkanc 1 day ago
Comment by devstatic 2 days ago
Comment by tanishqkanc 1 day ago
Comment by arpadav 1 day ago
Comment by tanishqkanc 1 day ago
Comment by cafecito_dev 23 hours ago
Comment by kantaro 1 day ago
Comment by KaiShips 1 day ago
Comment by Unsponsoredio 1 day ago
Comment by WhoffAgents 1 day ago
Comment by danelliot 1 day ago
Comment by raffaeleg 1 day ago
Comment by maxbeech 1 day ago
Comment by huflungdung 1 day ago
Comment by secureotter 1 day ago
Comment by surgical_fire 1 day ago
Comment by alexbike 1 day ago
Comment by dang 1 day ago
Comment by muchael 1 day ago
For more complex cases where libretto can't validate that the network approach would produce the right data (like sites that rely on WebSockets or heavy client-side logic) it falls back to using the DOM with playwright
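The validate-then-fall-back decision described here might look like the following at generation time. This is a sketch under assumptions, not Libretto's logic: `chooseStrategy`, `networkProbe`, and `validate` are hypothetical names, and the probe stands in for replaying a captured network call inside the browser session.

```typescript
type StrategyName = "network" | "dom";

// Tries the network approach once and keeps it only if the data validates.
// Any error or invalid payload means the generated script should use the DOM.
async function chooseStrategy<T>(
  networkProbe: () => Promise<T>,
  validate: (data: T) => boolean,
): Promise<StrategyName> {
  try {
    const data = await networkProbe();
    return validate(data) ? "network" : "dom";
  } catch {
    return "dom";
  }
}
```

Because the choice is made once while the script is being generated, the emitted script contains only one code path, which keeps runtime behavior deterministic.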