Readit News
dang · 7 days ago
Related ongoing thread:

Google Workspace CLI - https://news.ycombinator.com/item?id=47255881 - March 2026 (136 comments)

sheept · 7 days ago
This feels completely speculative: there's no measure of whether this approach is actually effective.

Personally, I'm skeptical:

- Having the agent look up the JSON schemas and skills to use the CLI still dumps a lot of tokens into its context.

- Designing for AI agents over humans doesn't seem very future proof. Much of the world is still designed for humans, so the developers of agents are incentivized to make agents increasingly tolerate human design.

- This design is novel and may be fairly unfamiliar in the LLM's training data, so I'd imagine the agent would spend more tokens figuring this CLI out compared to a more traditional, human-centered CLI.

gck1 · 7 days ago
Yeah, people seem to forget one of the L's in LLM stands for Language, and human language is likely the largest chunk in training data.

A CLI that is well designed for humans is well designed for agents too. The only difference is that you shouldn't dump pages of content that can pollute context needlessly. But then again, you probably shouldn't be dumping pages of content for humans either.

Smaug123 · 7 days ago
It's not obvious that human language is or should be the largest share of training data. It's much easier to generate training data from computers than from humans, and having more training data is very valuable. In particular, for example, one could imagine creating a vast number of debugging problems, with logs and associated command outputs, and training on them.
rkagerer · 7 days ago
I also feel like it's just a matter of time until someone cracks the nut of making agents better understand GUIs and become more adept at using them.

Is there progress happening in that trajectory?

magospietato · 7 days ago
Surely the skill for a cli tool is a couple of lines describing common usage, and a description of the help system?
sheept · 7 days ago
Sure, but the post itself brags,

> gws ships 100+ SKILL.md files

Which must altogether be hundreds of lines of YAML frontmatter polluting your context.
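For scale, the frontmatter in question is usually tiny per file; a hypothetical SKILL.md for a generic tool (these names are illustrative, not taken from gws) might look like:

```markdown
---
name: my-cli-search
description: Search project data via `my-cli search`. Use when the user asks to find records.
---

Run `my-cli search "<terms>"` and read the numbered results.
Run `my-cli docs` to list the bundled documentation if more detail is needed.
```

A hundred-plus of even these small files is still a lot of frontmatter for an agent to index, which is the commenter's point.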

Deleted Comment

mellosouls · 7 days ago
John Carmack made this observation (cli-centred dev for agents) a year ago:

> LLM assistants are going to be a good forcing function to make sure all app features are accessible from a textual interface as well as a gui. Yes, a strong enough AI can drive a gui, but it makes so much more sense to just make the gui a wrapper around a command line interface that an LLM can talk to directly.

https://x.com/ID_AA_Carmack/status/1874124927130886501

https://xcancel.com/ID_AA_Carmack/status/1874124927130886501

Andrej Karpathy reiterated it a couple of weeks ago:

> CLIs are super exciting precisely because they are a "legacy" technology, which means AI agents can natively and easily use them, combine them, interact with them via the entire terminal toolkit.

https://x.com/karpathy/status/2026360908398862478

https://xcancel.com/karpathy/status/2026360908398862478

lidn12 · 7 days ago
Thanks for sharing these; they're very interesting. I've found "making all app features accessible from a textual interface..." actually quite challenging in certain domains, such as graphics editing tools. Many editing functions can be exposed properly as a CLI, but the content being edited is very hard to convert into text without losing its geometric meaning. Maybe this is where we truly need multimodal models, or where training on specialized data is needed.
Terretta · 4 days ago
> the content being edited is very hard to be converted into texts

For decades now, pro design print shops have required text files describing the design to print from.

And as every Danish pelican cyclist knows, graphics are at their most scalable as text vectors.

Inkscape does fine with these.

utopiah · 7 days ago
That's how artificial this "intelligence" is: LLMs can't even use text-based tools, full of text-based documentation formatted coherently, without those very tools being adapted for them.
bonoboTP · 7 days ago
It looks like an AI-generated fluff article without any evidence. People also did this for image generators, as if you needed these arcane templates to prompt them, but actually the latest models are great at figuring out what you want from messy human input. Similarly, LLMs can use a regular CLI just fine. But how do you write a hype/FOMO article about the fact that you actually don't need to do anything...
jsunderland323 · 7 days ago
I'm working on a CLI now.

The pattern I used was this:

1) made a docs command that printed out the path of the available docs

$ my-cli docs

- README.md

- DOC1.md

- dir2/DOC2.md

2) added a --path flag to print out a specific doc (tried to keep each doc less than 400 lines).

$ my-cli docs --path dir2/DOC2.md

# Contents of DOC2.md

3) added embeddings so I could do semantic search

$ my-cli search "how do I install x?"

[1] DOC1.md

"You can install x by ..."

[2] dir2/DOC2.md

"after you install..."

You then just need a simple skill to tell the agent about the docs and search command.

I actually love this as a pattern, it works really well. I got it to work with i18n too.
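Steps 1 and 2 above can be sketched in a few lines (illustrative Python, not the actual implementation; step 3's embedding search is omitted):

```python
import argparse
import pathlib
import sys

# Assumed layout: a docs/ directory shipped next to the tool,
# e.g. docs/README.md, docs/dir2/DOC2.md
DOCS_DIR = pathlib.Path("docs")

def cmd_docs(args):
    if args.path:
        target = DOCS_DIR / args.path
        if not target.is_file():
            sys.exit(f"error: no such doc: {args.path}")
        print(target.read_text())
    else:
        # list every markdown doc relative to the docs root
        for p in sorted(DOCS_DIR.rglob("*.md")):
            print(p.relative_to(DOCS_DIR))

def main(argv=None):
    parser = argparse.ArgumentParser(prog="my-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    docs = sub.add_parser("docs", help="list or print bundled docs")
    docs.add_argument("--path", help="print a specific doc instead of listing")
    docs.set_defaults(func=cmd_docs)
    args = parser.parse_args(argv)
    args.func(args)

if __name__ == "__main__":
    main()
```

The appeal is that the agent pays only for the docs it actually asks for, instead of having everything front-loaded into context.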

danw1979 · 7 days ago
I really like this - especially the embedded search. What do the embeddings and model cost you in terms of binary size ?
jsunderland323 · 7 days ago
Umm, it's not as bad as I thought it would be: ~16 MB per locale, with ~28k words in the English docs.

Not shilling, just easier to show you the repo since it's open source. https://github.com/coast-guard/coasts

jeppeb · 7 days ago
The article states that agents work better with JSON than documented flags - that seems counterintuitive. How is this assumption validated?
justinwp · 7 days ago
Try building a CLI with a complex JSON as flags approach. :)
peddling-brink · 7 days ago
I just did the opposite and am seeing better results.

Claude was struggling to use the ‘gh’ command to reliably read and respond to line-level code review comments because it had to go through the API. I had it write a few simple command line tools and a skill for invoking them, and got instantly better results.

YMMV
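The kind of "simple tool" meant here can be very small: for instance, something that reshapes the JSON that `gh api repos/OWNER/REPO/pulls/N/comments` already returns into one grep-able line per review comment. A sketch (field names are from the GitHub pull-request review-comments payload; the wrapper script name is made up):

```python
import json
import sys

def format_review_comments(raw: str) -> str:
    """Collapse GitHub review-comments JSON into one compact line per comment."""
    lines = []
    for c in json.loads(raw):
        # `line` can be null for outdated comments, hence the fallback
        lines.append(f"{c['path']}:{c.get('line') or '?'} [{c['user']['login']}] {c['body']}")
    return "\n".join(lines)

if __name__ == "__main__":
    # assumed usage (requires gh installed and authenticated):
    #   gh api repos/OWNER/REPO/pulls/42/comments | python review_comments.py
    print(format_review_comments(sys.stdin.read()))
```

The point is that the agent now runs one short pipeline instead of reasoning about raw API output every time.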

ptak_dev · 2 days ago
This is a great point and something we've been dealing with firsthand. The biggest shift is that AI agents need structured, predictable output — not pretty human-formatted tables. We found that adding a simple --json flag and making error messages machine-parseable made our tools 10x more useful to agents. The other thing worth considering is idempotency — when an agent retries a command because it wasn't sure if it succeeded, you want the second run to be safe. That changed how we think about every CLI we build now.
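A minimal sketch of those three points, the `--json` flag, machine-parseable errors, and idempotent retries (all names here are illustrative, not from any real tool):

```python
import json

def emit(result: dict, as_json: bool) -> str:
    """Render a command result for an agent (--json) or a human (default)."""
    if as_json:
        return json.dumps(result, sort_keys=True)  # stable shape, stable key order
    return f"{result['name']}\t{result['status']}"  # human-friendly row

def emit_error(code: str, message: str, as_json: bool) -> str:
    """Errors get the same treatment: a fixed JSON shape an agent can branch on."""
    if as_json:
        return json.dumps({"error": {"code": code, "message": message}})
    return f"error: {message}"

def create_resource(store: dict, name: str) -> dict:
    """Idempotent create: retrying after an ambiguous failure is safe."""
    if name in store:
        return {"name": name, "status": "exists"}  # second run is a no-op, not an error
    store[name] = {}
    return {"name": name, "status": "created"}
```

The idempotency piece matters most: an agent that isn't sure whether a command succeeded will simply run it again.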
smy20011 · 7 days ago
Are we reinventing RPC again? Calling a CLI program with JSON sounds like an RPC call. The schema feels like something LSP could provide for such functions.

Maybe asking agent to write/execute code that wraps CLI is a better solution.
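The "write code that wraps the CLI" idea is essentially an RPC stub: argv in, parsed JSON out, exception on failure. A minimal sketch (the example below uses `python -c` as a stand-in for a real JSON-emitting CLI):

```python
import json
import subprocess

def call_cli(argv: list) -> dict:
    """RPC-style stub around a CLI: run it, parse its JSON stdout, raise on non-zero exit."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} failed: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

For example, `call_cli([sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"])` returns `{'ok': True}`; the agent (or the code it writes) gets structured data back instead of scraping text.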

CamperBob2 · 7 days ago
Confused? You won't be, after this week's episode of https://en.wikipedia.org/wiki/SOAP.

Everything old is new again...

tayo42 · 7 days ago
Doesn't PowerShell have structured input and output?