I do really like the Unix approach Claude Code takes, because it makes it really easy to create other Unix-like tools and have Claude use them with basically no integration overhead. Just give it the man page for your tool and it'll use it adeptly with no MCP or custom tool definition nonsense. I built a tool that lets Claude use the browser and Claude never has an issue using it.
Definitely searched apt on Debian before I installed the pip pkg. On a somewhat related note, I also thought something broke when `uv tool install mansnip` didn't work.
How does Claude Code use the browser in your script/tool? I've always wanted to control my existing Safari session windows rather than Chrome or a separate/new Chrome instance.
Most browsers these days expose a control API (like the Chrome DevTools Protocol MCP [1]) that opens a socket and accepts JSON instructions for bidirectional communication. Chrome is the gold standard here, but both Safari and Firefox have their own drivers.

For your existing browser session, you'd have to launch the browser with that socket enabled, since it's off by default; once you do, the server should be able to find the open local socket, connect to it, and execute controls.
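As a rough sketch of what that bidirectional JSON traffic looks like — the launch flag is Chrome's documented `--remote-debugging-port`; the helper function and message values below are just illustrative:

```python
import json

# Chrome must be launched with the debugging socket enabled, e.g.:
#   chrome --remote-debugging-port=9222
# A client then connects to the advertised WebSocket and exchanges JSON frames.

def cdp_command(msg_id, method, **params):
    """Build a Chrome DevTools Protocol command frame (client -> browser)."""
    return json.dumps({"id": msg_id, "method": method, "params": params})

# e.g. navigate the tab; the browser replies with a result frame carrying the same id
msg = cdp_command(1, "Page.navigate", url="https://example.com")
```

Safari (safaridriver) and Firefox (geckodriver/WebDriver BiDi) speak different but similarly socket-based protocols.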
Worth noting that this "control the browser" hype is quite deceiving, and it doesn't really work well, IMO: LLMs still suck at understanding the DOM, so you need various tricks to optimize for that. I would take OP's claims with a giant bag of salt.

Also, these automations are really easy to identify and block, as they are not organic inputs, so the actual usefulness is very limited.
The light switch moment for me was when I realized I can tell Claude to use linters instead of telling it to look for problems itself. The latter generally works, but having it call tools is way more efficient. I didn't even tell it which linters to use; I asked it for suggestions and it gave me about a dozen. I installed them and it started using them without further instruction.
I had tried coding with ChatGPT a year or so ago, and the effort needed to get anything useful out of it greatly exceeded any benefit, so I went into CC with low expectations, but I have been blown away.
As an extension of this idea: for some tasks, rather than asking Claude Code to do a thing, you can often get better results from asking Claude Code to write and run a script to do the thing.
Example: read this log file and extract XYZ from it and show me a table of the results. Instead of having the agent read in the whole log file into the context and try to process it with raw LLM attention, you can get it to read in a sample and then write a script to process the whole thing. This works particularly well when you want to do something with math, like compute a mean or a median. LLMs are bad at doing math on their own, and good at writing scripts to do math for them.
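A minimal sketch of the kind of script the agent might write for that — the log format here is hypothetical:

```python
import re
import statistics

# Hypothetical log format, e.g.: "2024-01-01T00:00:00 GET /api/users 120ms"
LINE = re.compile(r"(\d+)ms$")

def latency_stats(lines):
    """Extract per-request latencies and compute the stats the LLM shouldn't do by hand."""
    ms = [int(m.group(1)) for line in lines if (m := LINE.search(line.strip()))]
    return {"count": len(ms), "mean": statistics.mean(ms), "median": statistics.median(ms)}

stats = latency_stats([
    "2024-01-01T00:00:00 GET /api/users 120ms",
    "2024-01-01T00:00:01 GET /api/users 80ms",
    "2024-01-01T00:00:02 GET /api/orders 400ms",
])
# → count 3, mean 200, median 120
```

The agent only needs to read a handful of sample lines to write this; the script then processes the whole file deterministically.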
A lot of interesting techniques become possible when you have an agent that can write quick scripts or CLI tools for you, on the fly, and run them as well.
The lightbulb moment for me was to have it make me a smoke test and tell it to run the test and fix issues (with the code it generated) until it passes, then iterate over all the features in the Todo.md (that I asked it to make). Claude Code will go off and do stuff for, I dunno, hours, while I work on something else.
I have a Just task that runs linters (ruff and pyright, in my case), formatter, tests and pre-commit hooks, and have Claude run it every time it thinks it's done with a change. It's good enough that when the checks pass, it's usually complete.
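A hypothetical justfile along those lines — the task layout and tool invocations here are my guesses, not the commenter's actual setup:

```just
# Run every check; the agent is told to run `just check` after each change.
check: lint typecheck test hooks

lint:
    ruff check .
    ruff format --check .

typecheck:
    pyright

test:
    pytest -q

hooks:
    pre-commit run --all-files
```

A single entry point like this keeps the instruction to the agent short, and a passing run gives a reasonably strong completion signal.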
My mind was blown when Claude randomly called adb/logcat on my device, connected via USB and running my Android app, ingesting the real-time log streams to debug the application in real time. A mind-boggling moment for me. All because it can call "simple" tools/CLI applications and use their outputs. This has motivated me to adjust some of my own CLI applications and tools to have better inputs, outputs, and documentation, so that Claude can figure them out and call them when needed. It will unlock so many interesting workflows, chaining things together (but in a clever way).
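For instance, making a homegrown CLI more agent-friendly can be as simple as writing real help text, since `--help` output is exactly what the model reads to learn the tool. Everything below is a hypothetical tool, sketched with argparse:

```python
import argparse

# Hypothetical tool: the descriptive help text is what an agent reads
# to learn the interface, so spell out defaults, formats, and an example.
parser = argparse.ArgumentParser(
    prog="logpull",
    description="Fetch device logs and print them as plain text, one event per line.",
    epilog="Example: logpull --level error --since 10m",
)
parser.add_argument("--level", choices=["debug", "info", "warn", "error"],
                    default="info", help="minimum severity to include")
parser.add_argument("--since", default="5m",
                    help="how far back to read, e.g. 30s, 10m, 2h")

args = parser.parse_args(["--level", "error"])
```

Plain-text output (one event per line) matters just as much as the help text: it makes the tool's results trivially greppable and pipeable by the agent.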
I have some repair shop experience, and in my experience, a massive bottleneck in repairing truly complex devices is diagnostics. Often, things are "repaired" by swapping large components until the issue goes away, because diagnosing issues in any more detail is more of an arcane art than something you can teach an average technician to do.
And I can't help but think: what would a cutting edge "CLI ninja" LLM like Claude be able to do if given access to a diagnostic interface that exposes all the logs and sensor readings, a list of known common issues and faults, and a full technical reference manual?
So try it. Ask Claude to call the tool that tails the diagnostics/logs. On some platforms, like Android or C#, simply running the application generates a ton of logs, never mind at the OS level, which has more low-level stuff. Claude reads through it really well and can find bugs for you. You can tell it what you are looking for, or give it a correct/expected set of data, so it can compare that to what it finds in the logs. It solved an issue for me in 2 minutes that I hadn't been able to solve in a couple of months. Basically, anything you can run and see output for in the terminal, Claude can run and analyse the same way.
This is also a fantastic way for someone to learn the principle of least privilege by setting up a very strict IAM profile for the agent to use without the risk of nuking the system.
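For example, a deliberately narrow AWS IAM policy for an agent — a sketch with hypothetical resource names — granting read-only access to one bucket and nothing else:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::agent-scratch-bucket",
        "arn:aws:s3:::agent-scratch-bucket/*"
      ]
    }
  ]
}
```

Since everything not explicitly allowed is denied by default, the blast radius of a misbehaving agent stays small — which is the principle of least privilege in one config file.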
All GUI apps are different, each being unhappy in its own way. Moated fiefdoms they are, scattered within the boundaries of their operating system. CLI is a common ground, an integration plaza where the peers meet, streams flow and signals are exchanged. No commitment needs to be made to enter this information bazaar. The closest analog in the GUI world is Smalltalk, but again - you need to pledge your allegiance before entering one.
Just because it says compostable on the container doesn't mean it will actually break down in a reasonable amount of time on your home compost heap, or that they don't leach some environmentally harmful chemicals in the process.
I'd love something like the Emacs approach: multiple UIs. Graphical, but with an M-x (or anything else) command-line prompt that makes UI tasks scriptable, from within the application or from the outside.
Just because a popular new tool runs in the terminal doesn't make it a shining example of the "Unix philosophy" lol.
the comparison makes no sense if you think about it for more than 5 seconds and is hacker news clickbait you and i fell for :(
1. Small programs that do a single thing and are easy to comprehend.
2. Those programs integrate with one another to achieve more complex tasks.
3. Text streams are the universal interface and state is represented as text files on disk.
Sounds like the UNIX philosophy is a great match for LLMs that use text streams as their interface. It's just so normalized that we don't even "see" it anymore. The fact that all your tools work on files, are trivially callable by other programs with a single text-based interface of exec(), and output text makes them usable and consumable by an LLM with nothing else needed. This didn't have to be how we built software.
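That universal interface shows in how little glue it takes to call a tool programmatically — assuming a POSIX system with `wc` installed:

```python
import subprocess

# Any program is callable through the same interface:
# text in (argv/stdin), text out (stdout). Here POSIX `wc -w` counts words.
result = subprocess.run(["wc", "-w"], input="the quick brown fox",
                        capture_output=True, text=True)
word_count = int(result.stdout.strip())
# → 4
```

An LLM needs nothing more than this — exec a command, read the text back — to use any of the thousands of existing CLI tools.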
Right, and Claude Code is a large proprietary monolith. There’s nothing particularly UNIXy about it except that it can fork/execve to call ripgrep (or whatever), and that its CLI can use argv or stdin to receive inputs. That’s nowhere near enough to make it “UNIX way”.
The Unix philosophy here is less about it being a terminal app (it's a very rich terminal app, lots of redrawing the whole screen etc) and more about the fact that giving a modern LLM the ability to run shell commands unlocks an incredibly useful array of new capabilities.
An LLM can do effectively anything that a human can do by typing commands into a shell now.
I don't remember any advanced computer user, including developers saying that the CLI is dead.
The CLI has been dead for end-users since computers became powerful enough for GUIs, but the CLI has always been there behind the scenes. The closest we have been to the "CLI is dead" mentality was maybe in the late 90s, with pre-OSX MacOS and Windows, but then OSX gave us a proper Unix shell, Windows gave us PowerShell, and Linux and its shell came to dominate the server market.
There was a period in the early-to-mid 2000s where CLIs were considered passé and an emblem of the past. Some developers relied solely on graphical IDEs on GUI-oriented operating systems, and the transition to Linux everywhere broke that trend. Some people didn't take Linux seriously because it was CLI-oriented.
> I don't remember any advanced computer user, including developers saying that the CLI is dead.
Obviously not around during the 90's when the GUI was blowing up thanks to Windows displacing costly commercial Unix machines (Sun, SGI, HP, etc.) By 2000 people were saying Unix was dead and the GUI was the superior interface to a computer. Visual Basic was magic to a lot of people and so many programs were GUI things even if they didn't need to be. Then the web happened and the tables turned.
I think it might loop back around pretty quick. I've been using it to write custom GUI interfaces to streamline how I use the computer; I'm working piecemeal toward an entire desktop environment custom-made to my own quirky preferences. In the past, a big part of the reason I used the terminal so often for basic things was general frustration and discomfort with the mainstream GUI tools, but that's rapidly changing for me.
My main problem with GUI tooling is that keyboard use is an afterthought in too many of them.
With CLI and TUI tools it's keyboard first and the mouse might work if it wasn't too much of a hassle for the dev.
And another issue with GUI tooling is the lack of composability. With a CLI, I can feed files to one program, grab the output, and give it to another, and another, with ease.
With GUI tools I need to have three of them open at the same time and manually open each one. Or find a single tool that does all three things properly.
I implore people who are willing and able to send the contents and indices of their private notes repository to cloud-based services to rethink their life decisions.
Not around privacy, mind you. If your notes contain nothing that you wouldn’t mind being subpoenaed or read warrantlessly by the DHS/FBI, then you are wasting your one and only life.
My experience has been the opposite — a shell prompt is too many degrees of freedom for an LLM, and it consistently misses important information.
I’ve had much better luck with constrained, structured tools that give me control over exactly how the tools behave and what context is visible to the LLM.
It seems to be all about making the correct thing easy, the hard things possible, and the wrong things very difficult.
I've done exactly this with MCP
{
  "name": "unshare_exec",
  "description": "Run a binary in isolated Linux namespaces using unshare",
  "inputSchema": {
    "type": "object",
    "properties": {
      "binary": {"type": "string"},
      "args": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["binary"],
    "additionalProperties": false
  }
}
It started as unshare and ended up being a bit of a yak-shaving endeavor to make things work, but I was able to get some surprisingly good results using gemma3 locally and giving it access to run arbitrary Debian-based utilities.
https://github.com/day50-dev/Mansnip
Wrapping this in a stdio MCP server is probably a smart move.
I should just api-ify the code and include the server in the pip. How hard could this possibly be...
[1] https://github.com/ChromeDevTools/chrome-devtools-mcp/
Really, GUIs can be formed of a public API with graphics slapped on top. They usually aren't, but they can be.
Yet they're highly preferred over CLI applications by the common end user.
CLI-only would have stunted the growth of computing.
Now, due to tools like claude code, CLI is actually clearly the superior interface.
(At least for now)
It's not supposed to be an us vs them flamewar, of course. But it's fun to see a reversal like this from time to time!
> OSX gave us a proper Unix shell

BSD/Mach gave us that; OSX just included it in their operating system.
> surprisingly good results using gemma3 locally

I'm curious to see what you've come up with. My local LLM experience has been... sub-par in most cases.