A2UI: A Protocol for Agent-Driven Interfaces

Posted by makeramen 10 hours ago

Comments

Comment by codethief 8 hours ago

> A2UI lets agents send declarative component descriptions that clients render using their own native widgets. It's like having agents speak a universal UI language.

(emphasis mine)

Sounds like agents are suddenly able to do what developers have failed at for decades: Writing platform-independent UIs. Maybe this works for simple use cases but beyond that I'm skeptical.

Comment by observationist 36 minutes ago

Nope, it's just a repackaging of the same problem, except in this case, the problem is solved with APIs and CLI and not jumping through hoops in order to get the AI to do what humans do.

It's about accomplishing a task, not making a bot accomplish a task using the same tools and embodiment context as a human - there's no upside, unless the bot is actually using a humanoid embodiment, and even then, using a CLI and service API is going to be preferable to doing things with UI in nearly every possible case, except where you want to limit to human-ish capabilities, like with gaming, or you want to deceive any monitors into thinking that a human is operating.

It's going to be infinitely easier to wrap a json get/push wrapper around existing APIs or automation interfaces than to universalize some sort of GUI interactions, because LLM's don't have the realtime memory you need to adapt to all the edge cases on the fly. It's incredibly difficult for humans, and hundreds of billions of dollars have been spent trying to make software universally accessible and dumbed down for users, and still ends up being either stupidly limited, or fractally complex in the tail, and no developer can ever account for all the possible ways in which users interact with a feature for any moderately complex piece of software.

Just use existing automation patterns. This is one case where if an AI picks up this capability alongside other advances, then awesome, but any sort of middleware is going to be a huge hack that immediately gets obsoleted by frontier models as a matter of course.

Comment by giancarlostoro 1 hour ago

I've thought about how to write a platform independent UI framework that doesn't care what language you write it in, and every time I find myself reinventing X.org or at least my gut tells me I'm just reinventing a cross-platform X server implementation.

Comment by rockwotj 8 hours ago

this isn’t the right way to look at it. It’s really server side rendering where the LLM is doing the markup language generation instead of a template. The custom UI is usually higher level. Airbnb has been doing this for years: https://medium.com/airbnb-engineering/a-deep-dive-into-airbn...

Comment by hurturue 3 hours ago

platform independent UIs exist - HTML and Electron

Comment by kridsdale3 2 hours ago

Sure. HTML is a Markup-Language (it's in the acronym). Markdown is also a Markup Language. LLMs are super good at Markdown and just about every chatbot frontend now has a renderer built in.

A2UI is a superset, expanding in to more element types. If we're going to have the origin of all our data streams be string-output-generators, this seems like an ok way to go.

I've joined an effort inside Google to work in this exact space, though what we're doing has no plan to become open source, other groups are working on stuff like A2UI and we collaborate with them.

My career previous to this was nearly 20 years of native platform UI programming and things like Flutter, React Native, etc have always really annoyed me. But I've come around this year to accept that as long as LLMs on servers are going to be where the applications of the future live, we need a client-OS agnostic framework like this.

Comment by mentalgear 7 hours ago

It still needs language-specific libraries [1] (and no sveltekit even announced yet :( ).

[1] https://a2ui.org/renderers/

Comment by ddrdrck_ 4 hours ago

Well it is open source and they expect the community to add more renderers. So if you are a sveltekit specialist this could actually be an opportunity.

Comment by epec254 4 hours ago

Plus 1! We’d love community contributions here!

Comment by awei 3 hours ago

I see how useful a universal UI language working across platforms is, but when I look at some examples from this protocol, I have the feeling it will eventually converge to what we already have, html. Instead of making all platforms support this new universal markup language, why not make them support html, which some already do, and which llms are already trained on.

Some examples from the documentation: { "id": "settings-tabs", "component": { "Tabs": { "tabItems": [ {"title": {"literalString": "General"}, "child": "general-settings"}, {"title": {"literalString": "Privacy"}, "child": "privacy-settings"}, {"title": {"literalString": "Advanced"}, "child": "advanced-settings"} ] } } }

{ "id": "email-input", "component": { "TextField": { "label": {"literalString": "Email Address"}, "text": {"path": "/user/email"}, "textFieldType": "shortText" } } }

Comment by epec254 3 hours ago

A key challenge with HTML is client side trust. How do I enable an agent platform (say Gemini, Claude, OpenAI) to render UI from an untrusted 3p agent that’s integrated with the platform? This is a common scenario in the enterprise version of these apps - eg I want to use the agent from (insert saas vendor) alongside my company’s home grown agents and data.

Most HTML is actually HTML+CSS+JS - IMO, accepting this is a code injection attack waiting to happen. By abstracting to JSON, a client can safely render UI without this concern.

Comment by lunar_mycroft 2 hours ago

If the JSON protocol in question supports arbitrary behaviors and styles, then you still have an injection problem even over JSON. If it doesn't support them you don't need to support those in an HTML protocol either, and you can solve the injection problem the way we already do: sanitizing the HTML to remove all/some (depending on your specific requirements) script tags, event listeners, etc.

Comment by epicurean 2 hours ago

Perhaps the protocol, is then html/css/js in a strict sandbox. Component has no access to anything outside of component bounds (no network, no dom/object access, no draw access, etc).

Comment by awei 1 hour ago

I think you can do that with an iframe, but it always makes me nervous

Comment by awei 3 hours ago

Right this makes sense, I wonder if it would then be a good idea to abstract html to JSON, making it impossible to include css and js into it

Comment by epec254 3 hours ago

Curious to learn more what you are thinking?

One challenge is you do likely want JS to process/capture the data - for example, taking the data from a form and turning it into json to send back to the agent

Comment by oooyay 3 hours ago

If you play with A2UIs generator that's effectively what it does, just layer of abstraction or two above what you're describing.

Comment by awei 2 hours ago

That's what I thought too skimming through the documentation, my thinking is that since it does that, which makes sense to avoid script injection, why not do it with "jsonized" html.

Comment by mbossie 8 hours ago

So there's MCP-UI, OpenAI's ChatKit widgets and now Google's A2UI, that I know of. And probably some more...

How many more variants are we introducing to solve the same problem. Sounds like a lot of wasted manhours to me.

Comment by MrOrelliOReilly 8 hours ago

I agree that it's annoying to have competing standards, but when dealing with a lot of unknowns it's better to allow divergence and exploration. It's a worse use of time to quibble over the best way to do things when we have no meaningful data yet to justify any decision. Companies need freedom to experiment on the best approach for all these new AI use cases. We'll then learn what is great/terrible in each approach. Over time, we should expect and encourage consolidation around a single set of standards.

Comment by pscanf 8 hours ago

> when dealing with a lot of unknowns it's better to allow divergence and exploration

I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.

For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).

¹ https://fly.io/blog/everyone-write-an-agent/

Comment by kridsdale3 2 hours ago

I recently heard that when automobiles were new the USA quickly ended up in a state with 80 competing manufacturing brands. In a couple decades, the market figured out what customers actually want and what styles and features mattered, and the competition ecosystem consolidated to 5 brands.

The same happened with GPUs in the 90s. When Jensen formed Nvidia there were 70 other companies selling Graphics Cards that you could put in a PCI slot. Now there are 2.

Comment by shireboy 4 hours ago

AGUI sounds similar: https://github.com/ag-ui-protocol/ag-ui

Comment by meander_water 58 minutes ago

This provides a bit more detail on how they relate to each other

https://www.copilotkit.ai/ag-ui-and-a2ui

Comment by epec254 4 hours ago

Same team! AGUI uses a2UI as the protocol under the hood.

Comment by mystifyingpoi 7 hours ago

> Sounds like a lot of wasted manhours to me

Sounds like a lot of people got paid because of it. That's a win for them. It wasn't their decision, it was company decision to take part in the race. Most likely there will be more than 1 winner anyway.

Comment by kridsdale3 2 hours ago

I'm one of these people. We have to start working on the problem many months before the competition announces that they exist. So we are all just doing parallel evolution here. Everyone agrees that to sit and wait for a standard means you wouldn't waste energy, but you'd also have no influence.

Like you mentioned, its a good time to be employed.

Comment by hobofan 6 hours ago

MCP-UI and OpenAI Apps are converging into the MCP Apps extension specification: https://blog.modelcontextprotocol.io/posts/2025-11-21-mcp-ap...

Comment by 8 hours ago

Comment by p_v_doom 5 hours ago

We should make one new standard for everyone to use ...

Comment by askl 7 hours ago

Obligatory https://xkcd.com/927/

Comment by pedrozieg 7 hours ago

We’ve had variations of “JSON describes the screen, clients render it” for years; the hard parts weren’t the wire format, they were versioning components, debugging state when something breaks on a specific client, and not painting yourself into a corner with a too-clever layout DSL.

The genuinely interesting bit here is the security boundary: agents can only speak in terms of a vetted component catalog, and the client owns execution. If you get that right, you can swap the agent for a rules engine or a human operator and keep the same protocol. My guess is the spec that wins won’t be the one with the coolest demos, but the one boring enough that a product team can live with it for 5-10 years.

Comment by wongarsu 8 hours ago

I wouldn't want this anywhere near production, but for rapid prototyping this seems great. People famously can't articulate what they want until they get to play around with it. This lets you skip right to the part where you realize they want something completely different from what was first described without having to build the first iteration by hand

Comment by turnsout 3 hours ago

Honestly the point of this is not to help app developers—it's to replace the need for apps altogether.

The vision here is that you can chat with Gemini, and it can generate an app on the fly to solve your problem. For the visualized landscaping app, it could just connect to landscapers via their Google Business Profile.

As an app developer, I'm actually not even against this. The amount of human effort that goes into creating and maintaining thousands of duplicative apps is wasteful.

Comment by verdverm 35 minutes ago

This sounds like they creators think that even more duplicative apps that no one knows how it works or what the code even looks like... is a better idea?

How many times are users going to spin GPUs to create the same app?

Comment by 2 hours ago

Comment by jadelcastillo 1 hour ago

I think this is a good and pragmatic way to approach the use of LLM systems. By translating to an intermediate language, and then processing further symbolically. But probably you can be prompt injected also if you expose sensible "tools" to the LLM.

Comment by verdverm 25 minutes ago

Am I reading (7) of the data flow correctly?

1. Establish SSE connection

... user event

7. send updates over origin SSE connection

So the client is required to maintain an SSE capable connection for the entire chat session? What if my network drops or I switch to another agent?

Seems an onerous requirement to maintain a connection for the life-time of a session, which can span days (as some people have told us they have done with agents)

Comment by 5 hours ago

Comment by tasoeur 9 hours ago

In an ideal world, people would be implementing UI/UX accessibility in the first place, and a lot of those problems would be solved in the first place. But one can also hope that having the motivation to get agents running on those things could actually bring a lot of accessibility features to newer apps.

Comment by uptownhr 4 hours ago

My approach/prototype using XState with websockets from an MCP server https://github.com/uptownhr/mcp-agentic-ui

Comment by oddrationale 3 hours ago

Seems similar to [Adaptive Cards](https://adaptivecards.io/). Both have a JSON-based UI builder system.

Comment by ceuk 5 hours ago

A few days ago I was predicting to some colleagues a revival of ideas around "server-driven UI" (which never really seemed to catch on) in order to facilitate agentic UIs.

Feels good to have been on the money, but I'm also glad I didn't start a project only to be harpooned by Google straight away

Comment by kridsdale3 1 hour ago

Server Driven UI has absolutely caught on. Not including all the Electron apps out there, things like Instagram's native mobile apps have about half of their screens being SDUI at this point because Meta needs to be able to change them instantly, not with a 3 week release cycle.

Comment by qsort 9 hours ago

This is very interesting if used judiciously, I can see many use cases where I'd want interfaces to be drawn dynamically (e.g. charts for business intelligence.)

What scares me is that even without arbitrary code generation, there's the potential for hallucinations and prompt injection to hit hard if a solution like this isn't sandboxed properly. An automatically generated "confirm purchase" button like in the shown example is... probably something I'd not make entirely unsupervised just yet.

Comment by barbazoo 4 hours ago

This sounds like a way to have the LLM client render dynamic UI. Is this for use during the chat session or yet another way to build actual applications?

Comment by epec254 4 hours ago

Google PM here. Right now, it’s designed for rendering UI widgets inline with a chat conversation - it’s an extension to a2a that lets you stream JSON defining UI components in addition to chat messages.

Comment by kridsdale3 1 hour ago

Google SWE working in this space here. Look up my username (minus the digit) on Moma, let's talk. I can't ID you from your HN handle.

Comment by iristenteije 6 hours ago

I think ultimately GenUI can be integrated into apps more seamlessly, but even if today it's more in context of chat interfaces with prompts, I think it's clear that a wall of text isn't always the best UX/output and it's already a win.

Comment by jy14898 9 hours ago

I never want to unknowingly use an app that's driven this way.

However, I'm happy it's happening because you don't need an LLM to use the protocol.

Comment by 8 hours ago

Comment by zwarag 4 hours ago

Could this be the link that allows designers to design a UI in Figma and let an agent build it via A2UI?

Comment by _pdp_ 7 hours ago

I am fan of using markdown to describe the UI.

It is simple, effective and feels more native to me than some rigid data structure designed for very specific use-cases that may not fit well into your own problem.

Honestly, we should think of Emacs when working with LLMs and kind of try to apply the same philosophy. I am not a fan of Emacs per-se but the parallels are there. Everything is a file and everything is a text in a buffer. The text can be rendered in various ways depending on the consumer.

This is also the philosophy that we use in our own product and it works remarkably well for diverse set of customers. I have not encountered anything that cannot be modelled in this way. It is simple, effective and it allows for a great degree of flexibility when things are not going as well as planned. It works well with streaming too (streaming parsers are not so difficult to do with simple text structures and we have been doing this for ages) and LLMs are trained very well how to produce this type of output - vs anything custom that has not been seen or adopted yet by anyone.

Besides, given that LLMs are getting good at coding and the browser can render iframes in seamless mode, a better and more flexible approach would be to use HTML, CSS and JavaScript instead of what Slack has been doing for ages with their block kit API which we know is very rigid and frustrating to work with. I get why you might want to have a data structures for UI in order to cover CLI tools as well but at the end of the day browsers and clis are completely different things and I don not believe you can meaningfully make it work for both of them unless you are also prepared to dumb it down and target only the lowest common dominator.

Comment by evalstate 8 hours ago

I quite like the look of this one - seems to fit somewhere between the rigid structure of MCP Elicitations and the freeform nature of MCP-UI/Skybridge.

Comment by raybb 8 hours ago

Is there a standard protocol for the way things like Cline sometimes give you multiple choice buttons to click on? Or how does that compare to something like this?

Comment by ChrisArchitect 2 hours ago

Blog post: https://developers.googleblog.com/introducing-a2ui-an-open-p...

Comment by mentalgear 7 hours ago

The way to do this would be to come together and design a common W3C-like standard.

Comment by lowsong 8 hours ago

> A2UI lets agents send declarative component descriptions that clients render using their own native widgets. It's like having agents speak a universal UI language.

Why the hell would anyone want this? Why on earth would you trust an LLM to output a UI? You're just asking for security bugs, UI impersonation attacks, terrible usability, and more. This is a nightmare.

Comment by vidarh 7 hours ago

If done in chat, it's just an alternative to talking to you freeform. Consider Claude Code's multiple-choice questions, which you can trigger by asking it to invoke the right tool, for example.

Comment by DannyBee 6 hours ago

None of the issues go away just because it's in chat?

Freeform looks and acts like text, except for a set of things that someone vetted and made work.

If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.

Now, in this case, it's not arbitrary UI, but if you believe that the parsing/validation/rendering/two way data binding/incremental composition (the spec requires that you be able to build up UI incrementally) of these components: https://a2ui.org/specification/v0.9-a2ui/#standard-component...

as transported/renderered/etc by NxM combinations of implementations (there are 4 renderers and a bunch of transports right now), is not going to have security issues, i've got a bridge to sell you.

Here, i'll sell it to you in gemini, just click a few times on the "totally safe text box" for me before you sign your name.

My friend once called something a babydoggle - something you know will be a boondoggle, but is still in its small formative stages.

This feels like a babydoggle to me.

Comment by vidarh 4 hours ago

> None of the issues go away just because it's in chat?

There is a wast difference in risk between me clicking a button provided by Claude in my Claude chat, on the basis of conversations I have had with Claude, and clicking a random button on a random website. Both can contain a malicious. One is substantially higher risk. Separately, linking a UI constructed this way up to an agent and let third parties interact with it, is much riskier to you than to them.

> If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.

In that scenario, the UI elements are irrelevant barring a buggy implementation (yes, I've read the rest, see below), as you can achieve the same things as you can do that way with just presenting the user with a basic link and telling them to press it.

> as transported/renderered/etc by NxM combinations of implementations (there are 4 renderers and a bunch of transports right now), is not going to have security issues, i've got a bridge to sell you.

I very much doubt we'll see many implementations that won't just use a web view for this, and I very much doubt these issues will even fall in the top 10 security issues people will run into with AI tooling. Sure, there will be bugs. You can use this argument against anything that requires changes to client software.

But if you're concerned about the security of clients, mcp and hooks is a far bigger rats nest of things that are inherently risky due to the way they are designed.

Comment by empath75 5 hours ago

I couldn't get this to work with the default model because it's overloaded, but I tried flash-lite, which at least gave me a response, but it only presents an actual UI 1/3rd of the time that I tried the suggested questions in the demo, and otherwise it attempts to ask me a question which doesn't present a ui at all or even do anything in the app -- i had to look at the logs to see what it was trying to do.

Comment by nsonha 7 hours ago

What's agent/AI specific about this? Seems just backend-driven UI

Comment by mannanj 2 hours ago

I want instead of being told “here’s what I think you want to see, now look at it”, “what do you want to see?” And be shown that.

Yes yes we claim the user doesn’t know what they want. I think that’s largely used as an excuse to avoid rethinking how things should meet the users needs and keep status quo where people are made to rely on systems and walled gardens. The goal of this article is UIs should work better for the user. What better way then to let them imagine (or even nudge them with example actions, buttons, text to click to render specific views) in the UI! I’ve been wanting to build something where I just ask in English from options I know I have or otherwise play and hit edges to discover what’s possible and not.

Anyone else thinking along this direction or think I’m missing something obvious here?

Comment by alexgotoi 2 hours ago

So we're reinventing SOAP but for AI agents. Not saying that's bad - sometimes you need to remake old mistakes before you figure out what actually works.

The real question: do UIs even make sense for agents? Like the whole point of a UI is to expose functionality to humans with constraints (screens, mice, attention). Agents don't have those constraints. They can read JSON, call APIs directly, parse docs. Why are we building them middleware to click buttons?

I think this makes sense as a transition layer while we figure out what agent-native architecture looks like. But long-term it's probably training wheels.

Will include this in my https://hackernewsai.com/ newsletter.

Comment by kridsdale3 2 hours ago

The need here is at some point an agent has to produce an output that is consumed by a human with eyes. A pixel grid on a screen is far more high bandwidth to send information to a human than a linear string of text.