> A2UI lets agents send declarative component descriptions that clients render using their own native widgets. It's like having agents speak a universal UI language.
(emphasis mine)
Sounds like agents are suddenly able to do what developers have failed at for decades: writing platform-independent UIs. Maybe this works for simple use cases, but beyond that I'm skeptical.
This isn't the right way to look at it. It's really server-side rendering where the LLM is doing the markup-language generation instead of a template. The custom UI is usually higher level. Airbnb has been doing this for years: https://medium.com/airbnb-engineering/a-deep-dive-into-airbn...
Nope, it's just a repackaging of the same problem, except in this case the problem is solved with APIs and CLIs rather than jumping through hoops to get the AI to do what humans do.
It's about accomplishing a task, not making a bot accomplish a task using the same tools and embodiment context as a human. There's no upside unless the bot is actually using a humanoid embodiment, and even then, using a CLI and service API is going to be preferable to doing things with a UI in nearly every possible case, except where you want to limit it to human-ish capabilities (as with gaming) or you want to deceive any monitors into thinking that a human is operating.
It's going to be infinitely easier to wrap a JSON get/push wrapper around existing APIs or automation interfaces than to universalize some sort of GUI interaction, because LLMs don't have the realtime memory you need to adapt to all the edge cases on the fly. It's incredibly difficult for humans: hundreds of billions of dollars have been spent trying to make software universally accessible and dumbed down for users, and it still ends up being either stupidly limited or fractally complex in the tail, and no developer can ever account for all the possible ways users interact with a feature in any moderately complex piece of software.
Just use existing automation patterns. If an AI picks up this capability alongside other advances, then awesome, but any sort of middleware is going to be a huge hack that frontier models immediately obsolete as a matter of course.
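For what it's worth, here is a rough sketch of that "JSON wrapper around an existing API, exposed as a tool" route (TypeScript; the endpoint, tool name, and schema are invented for illustration, nothing here is from A2UI):

    // Hypothetical example: exposing an existing REST endpoint as an
    // agent-callable tool instead of teaching the agent to drive a GUI.
    // Endpoint, tool name, and schema are invented for illustration.
    const createTicketTool = {
      name: "create_ticket",
      description: "Create a support ticket in the existing ticketing system",
      parameters: {
        type: "object",
        properties: {
          title: { type: "string" },
          priority: { type: "string", enum: ["low", "normal", "high"] },
        },
        required: ["title"],
      },
    };

    async function createTicket(args: { title: string; priority?: string }) {
      // A plain JSON push to the API the product team already maintains.
      const res = await fetch("https://example.internal/api/tickets", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(args),
      });
      return res.json();
    }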
Sure. HTML is a markup language (it's in the acronym). Markdown is also a markup language. LLMs are super good at Markdown, and just about every chatbot frontend now has a renderer built in.
A2UI is a superset, expanding into more element types. If we're going to have the origin of all our data streams be string-output generators, this seems like an OK way to go.
I've joined an effort inside Google to work in this exact space. What we're doing has no plans to become open source, but other groups are working on things like A2UI and we collaborate with them.
My career before this was nearly 20 years of native platform UI programming, and things like Flutter, React Native, etc. have always really annoyed me. But I've come around this year to accepting that, as long as LLMs on servers are where the applications of the future live, we need a client-OS-agnostic framework like this.
I've thought about how to write a platform-independent UI framework that doesn't care what language you write it in, and every time I find myself reinventing X.org, or at least my gut tells me I'm just reinventing a cross-platform X server implementation.
Well, it is open source, and they expect the community to add more renderers. So if you are a SvelteKit specialist, this could actually be an opportunity.
I see how useful a universal UI language working across platforms would be, but when I look at some examples from this protocol, I have the feeling it will eventually converge to what we already have: HTML. Instead of making all platforms support this new universal markup language, why not make them support HTML, which some already do, and which LLMs are already trained on?
A key challenge with HTML is client-side trust. How do I enable an agent platform (say Gemini, Claude, OpenAI) to render UI from an untrusted third-party agent that's integrated with the platform? This is a common scenario in the enterprise versions of these apps, e.g. I want to use the agent from (insert SaaS vendor) alongside my company's homegrown agents and data.
Most HTML is actually HTML+CSS+JS, and IMO accepting that is a code injection attack waiting to happen. By abstracting to JSON, a client can safely render the UI without this concern.
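As a sketch of what that buys you (a simplified shape, not the actual A2UI message format): the client maps a small vetted set of component types to widgets it constructs itself, so the agent's output is only ever treated as data:

    // Sketch only: a client-side renderer for a tiny vetted component set
    // (simplified shape, not the real A2UI format). Agent output stays data;
    // the client decides what actually executes.
    type Component =
      | { type: "Text"; text: string }
      | { type: "Button"; label: string; action: string };

    function render(c: Component): HTMLElement {
      if (c.type === "Text") {
        const p = document.createElement("p");
        p.textContent = c.text; // textContent, never innerHTML
        return p;
      }
      const b = document.createElement("button");
      b.textContent = c.label;
      // The client owns the handler; the agent only names an action.
      b.addEventListener("click", () => console.log("action:", c.action));
      return b;
    }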
If the JSON protocol in question supports arbitrary behaviors and styles, then you still have an injection problem even over JSON. If it doesn't support them, you don't need to support those in an HTML protocol either, and you can solve the injection problem the way we already do: by sanitizing the HTML to remove all or some (depending on your specific requirements) script tags, event listeners, etc.
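A minimal version of that sanitization, using only browser APIs (a real client would reach for a vetted sanitizer library rather than rolling its own):

    // Minimal sanitization sketch using only browser APIs. A production
    // client would use a vetted sanitizer library rather than rolling its own.
    function sanitize(html: string): string {
      const doc = new DOMParser().parseFromString(html, "text/html");
      // Drop elements that can execute or embed code outright.
      doc.querySelectorAll("script, iframe, object, embed").forEach((el) => el.remove());
      // Strip inline event handlers and javascript: URLs from what remains.
      doc.querySelectorAll("*").forEach((el) => {
        for (const attr of Array.from(el.attributes)) {
          const value = attr.value.trim().toLowerCase();
          if (attr.name.toLowerCase().startsWith("on") || value.startsWith("javascript:")) {
            el.removeAttribute(attr.name);
          }
        }
      });
      return doc.body.innerHTML;
    }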
Perhaps the protocol is then HTML/CSS/JS in a strict sandbox: the component has no access to anything outside its bounds (no network, no DOM/object access, no draw access, etc.).
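Something along these lines, perhaps (a sketch, not anything from the A2UI spec): the component's HTML runs in an iframe whose sandbox attribute keeps it off the parent's origin and whose CSP blocks network access:

    // Sketch: render untrusted component HTML inside a tightly sandboxed iframe.
    // "allow-scripts" without "allow-same-origin" keeps the frame off the
    // parent's origin; the CSP meta tag blocks network requests inside it.
    function renderSandboxed(componentHtml: string): HTMLIFrameElement {
      const frame = document.createElement("iframe");
      frame.sandbox.add("allow-scripts");
      frame.srcdoc = `
        <meta http-equiv="Content-Security-Policy"
              content="default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'">
        ${componentHtml}`;
      return frame;
    }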
We’ve had variations of “JSON describes the screen, clients render it” for years. The hard parts weren’t the wire format; they were versioning components, debugging state when something breaks on a specific client, and not painting yourself into a corner with a too-clever layout DSL.
The genuinely interesting bit here is the security boundary: agents can only speak in terms of a vetted component catalog, and the client owns execution. If you get that right, you can swap the agent for a rules engine or a human operator and keep the same protocol. My guess is the spec that wins won’t be the one with the coolest demos, but the one boring enough that a product team can live with it for 5-10 years.
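A sketch of that boundary (field names roughly mirror A2UI's id/component examples; the catalog contents are invented): the client validates every message against its own catalog before rendering, and it doesn't care whether an LLM, a rules engine, or a human produced it:

    // Sketch: the client accepts a UI message only if its component type is
    // in the client's vetted catalog, regardless of who produced the message.
    const CATALOG = new Set(["Text", "Button", "TextField", "Tabs"]);

    interface UiMessage {
      id: string;
      component: Record<string, unknown>; // e.g. { "TextField": { ... } }
    }

    function accept(msg: UiMessage): boolean {
      const types = Object.keys(msg.component);
      return types.length === 1 && CATALOG.has(types[0]);
    }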
I agree that it's annoying to have competing standards, but when dealing with a lot of unknowns it's better to allow divergence and exploration. It's a worse use of time to quibble over the best way to do things when we have no meaningful data yet to justify any decision. Companies need freedom to experiment on the best approach for all these new AI use cases. We'll then learn what is great/terrible in each approach. Over time, we should expect and encourage consolidation around a single set of standards.
> when dealing with a lot of unknowns it's better to allow divergence and exploration
I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In six months' time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.
For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).
Sounds like a lot of people got paid because of it. That's a win for them. It wasn't their decision; it was the company's decision to take part in the race. Most likely there will be more than one winner anyway.
I'm one of these people. We have to start working on the problem many months before the competition announces that they exist, so we are all just doing parallel evolution here. Everyone agrees that sitting and waiting for a standard means you wouldn't waste energy, but you'd also have no influence.
Like you mentioned, it's a good time to be employed.
Unlike many of those approaches, which concern themselves with delivering human-designed static UI, this seems to be a tool designed to support generative UIs. I personally think that's a non-starter and much prefer the more incremental "let the agent call a tool that renders a specific pre-made UI" approach of MCP UI/Apps, the OpenAI Apps SDK, etc. for now.
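Roughly the shape of that pre-made-UI approach: the tool result points at a UI the developer built and shipped ahead of time, and the host renders it from a vetted bundle. Field names here are illustrative, not the exact MCP UI or Apps SDK schema:

    // Roughly the shape of the "pre-made UI" approach: the tool result refers
    // to a UI the developer built ahead of time, and the host renders it from
    // a vetted bundle. Field names are illustrative, not the exact schema.
    const toolResult = {
      content: [
        {
          type: "resource",
          resource: {
            uri: "ui://booking/flight-picker", // pre-built, versioned UI
            mimeType: "text/html",
            text: "<!-- host renders this from a shipped, reviewed bundle -->",
          },
        },
      ],
    };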
Oh yes. I send this all the time. And I also see the irony.
I can justify A2UI as doing something not otherwise accomplishable in the market today, but you saw how long the blog post was trying to explain that. :shrug:
I wouldn't want this anywhere near production, but for rapid prototyping this seems great. People famously can't articulate what they want until they get to play around with it. This lets you skip right to the part where you realize they want something completely different from what was first described, without having to build the first iteration by hand.
Honestly the point of this is not to help app developers—it's to replace the need for apps altogether.
The vision here is that you can chat with Gemini, and it can generate an app on the fly to solve your problem. For the visualized landscaping app, it could just connect to landscapers via their Google Business Profile.
As an app developer, I'm actually not even against this. The amount of human effort that goes into creating and maintaining thousands of duplicative apps is wasteful.
This sounds like the creators think that even more duplicative apps, where no one knows how they work or what the code even looks like... are a better idea?
How many times are users going to spin GPUs to create the same app?
In an ideal world, people would have implemented UI/UX accessibility in the first place, and a lot of those problems would already be solved.
But one can also hope that having the motivation to get agents running on those things could actually bring a lot of accessibility features to newer apps.
[1] https://a2ui.org/renderers/
Some examples from the documentation:

    {
      "id": "settings-tabs",
      "component": {
        "Tabs": {
          "tabItems": [
            {"title": {"literalString": "General"}, "child": "general-settings"},
            {"title": {"literalString": "Privacy"}, "child": "privacy-settings"},
            {"title": {"literalString": "Advanced"}, "child": "advanced-settings"}
          ]
        }
      }
    }

    {
      "id": "email-input",
      "component": {
        "TextField": {
          "label": {"literalString": "Email Address"},
          "text": {"path": "/user/email"},
          "textFieldType": "shortText"
        }
      }
    }
How many more variants are we going to introduce to solve the same problem? Sounds like a lot of wasted man-hours to me.
¹ https://fly.io/blog/everyone-write-an-agent/
https://www.copilotkit.ai/ag-ui-and-a2ui
Making an agent call a tool to manipulate a UI does feel like normal application development and an event-driven interaction... I get that.
What else drives your preference?
However, I'm happy it's happening because you don't need an LLM to use the protocol.