This whole Copilot vs HUD debate instantly brought to mind a classic Japanese anime from 1991 called Future GPX Cyber Formula (https://en.wikipedia.org/wiki/Future_GPX_Cyber_Formula). Yeah, it’s a racing anime set in the then-distant future of 2015, where cars come with full-on intelligent AIs.
The main character’s car, Asurada, is basically a "Copilot" in every sense. It was designed by his dad to be more than just a tool, more like a partner that learns, adapts, and grows with the driver. Think emotional support plus tactical analysis with a synthetic voice.
Later in the series, his rival shows up driving a car that feels very much like a HUD concept. It's all about cold data, raw feedback, and zero bonding. Total opposite philosophy.
What’s wild is how accurately it captures the trade-offs we’re still talking about in 2025. If you’re into human-AI interaction or just want to see some shockingly ahead-of-its-time design thinking wrapped in early '90s cyber aesthetics, it’s absolutely worth a watch.
The anime Yukikaze (2002-2005) has some similar themes. It's about a fighter jet pilot using a new AI-supported jet to fight against aliens. It asserts that the combination of human intuition and artificial intelligence trumps either of the two on its own. If I remember correctly, the jet can fly itself, but when things get dangerous the human pilot takes over and uses only the AI's hints rather than letting it run on autopilot.
Yukikaze is a very interesting novel - I still have to sit down and read the novels instead of just watching the anime, but an important plot element is not only the interactions between humans and their AIs (which are not just "human in a computer" as usual) but also a different take on popular views of which way an AI will decide in a conflict :)
I wish Kambayashi were more widely known. Or Japanese sci-fi and light novels in general. There have been a couple of legitimate "oh, that's now reality" moments for me in real-world developments of AI.
I'm very curious if a toggle would be useful that would display a heatmap of a source file showing how surprising each token is to the model. Red tokens are more likely to be errors, bad names, or wrong comments.
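For what it's worth, the scoring side of this is simple; here is a minimal sketch assuming a Hugging Face causal LM (the model name, thresholds, and file path are placeholders, not recommendations) that computes per-token surprisal and bins it into rough heat levels:

    # Rough sketch: per-token surprisal for a source file using a causal LM.
    # Model name, thresholds, and file path are placeholder assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "bigcode/starcoder2-3b"  # any code-capable causal LM would do
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

    def token_surprisal(source: str):
        ids = tok(source, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Surprisal of token t = -log p(t | preceding tokens), in nats.
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        scores = -logprobs[torch.arange(targets.numel()), targets]
        return zip(tok.convert_ids_to_tokens(targets.tolist()), scores.tolist())

    def heat(score: float) -> str:
        # Crude binning; a real editor plugin would map this to colors.
        return "red" if score > 8 else "yellow" if score > 4 else "green"

    for token, score in token_surprisal(open("example.py").read()):
        print(f"{heat(score):6} {score:6.2f} {token!r}")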
Turns out this kind of UI is not only useful to spot bugs, but also allows users to discover implementation choices and design decisions that are obscured by traditional assistant interfaces.
Very exciting research direction!
Very exciting indeed. I will definitely do a deep dive into this paper, as my current work is exploring layers of affordances such as these in workflows beyond coding.
This! That's what I wanted since LLMs learned how to code.
And in fact, I think I saw a paper / blog post that showed exactly this, and then... nothing. For the last few years, the tech world went crazy over code generation, with forks of VSCode hooked up to LLMs worth billions of dollars and all that. But AI-based code analysis is remarkably poor. The only thing I have seen resembling this is bug report generators, which I believe is one of the worst approaches.
The idea you have, which I also had and I am sure many thousands of other people have had, seems so obvious; why is no one talking about it? Is there something wrong with it?
The thing is, using such a feature requires a brain between the keyboard and the chair. A "surprising" token can mean many things: a bug, but also a unique feature; in any case, something you should pay attention to. Too much "green" should also be seen as a signal. Maybe you reinvented the wheel and should use a library instead, or maybe you failed to take into account a use case specific to your application.
Maybe such tools don't make good marketing. You need to be a competent programmer to use them. It won't help you write more lines faster. It doesn't fit the fantasy of making anyone into a programmer with no effort (hint: learning a programming language is not the hard part). It doesn't generate the busywork of AI 1 introducing bugs for AI 2 to create tickets for.
> Is there something wrong with it?
> Maybe such tools don't make good marketing.
You had the answer the entire time :)
Features that require a brain between the AI and key-presses just don't sell. Don't expect to see them for sale. (But we can still get them for free.)
> The idea you have, which I also had and I am sure many thousands of other people have had, seems so obvious; why is no one talking about it? Is there something wrong with it?
I expect it definitely requires some iteration; I don't think you can just map logits to heat, since you get a lot of noise that way.
The perplexity calculation isn't difficult; just need to incorporate it into the editor interface.
Interestingly, frequency of "surprising" sentences is one of the ways quality of AI novels is judged: https://arxiv.org/abs/2411.02316
Honestly I just never really thought about it. But now it seems obvious that AI should be continuously working in the background to analyze code (and the codebase) and could even tie into the theme of this thread by providing some type of programming HUD.
Even if something is surprising just because it's a novel algorithm, it warrants better documentation - but commenting the code explaining how it works will make the code itself less surprising!
In short, it's probably possible (and maybe it's good engineering practice) to structure the source such that no specific part is really surprising.
It reminds me how LLMs finally made people care about having good documentation - if not for other people, then for the AIs to read and understand the system.
I often find myself leaving review comments on pull requests where I was surprised. I'll state as much: This surprised me - I was expecting XYZ at this point. Or I wasn't expecting X to be in charge of Y.
> It reminds me how LLMs finally made people care about having good documentation - if not for other people, then for the AIs to read and understand the system.
Honestly I've mostly seen the opposite - impenetrable code translated to English by AI
Interesting! I've often felt that we aren't fully utilizing the "low hanging fruit" from the early days of the LLM craze. This seems like one of those ideas.
I'd like to see more contextually meaningful refactoring tools. Like "Remove this dependency" or "Externalize this code with a callback".
And refactoring shouldn't be done by generatively rewriting the code, but as a series of guaranteed equivalent transformations of the AST, each of which should be committed separately.
The AI should be used to analyse the value of the transformation and filter out asinine suggestions, not to write the code itself.
If it can't guarantee exact equivalence, it should at least rerun the relevant tests in the background and make sure they still pass before making suggestions.
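A minimal sketch of that gate in Python - the equivalence check here only catches formatting- and comment-level changes (a real tool would verify richer transformations), and the pytest invocation is just an assumption about the project's test runner:

    # Sketch: accept a proposed refactor only if the AST is literally unchanged
    # (formatting/comment-only edits); otherwise fall back to rerunning tests.
    import ast
    import subprocess

    def normalized(source: str) -> str:
        # ast.dump ignores whitespace, comments, and formatting details.
        return ast.dump(ast.parse(source))

    def accept_refactor(before_src: str, after_src: str) -> bool:
        if normalized(before_src) == normalized(after_src):
            return True  # provably the same tree; safe to commit as its own step
        # Not provably equivalent: require the relevant tests to still pass.
        result = subprocess.run(["pytest", "-q"], capture_output=True)
        return result.returncode == 0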
That's actually something I implemented for a university project a few weeks ago.
My professor also did some research into how this can be used for more advanced UIs.
I'm sure it's a very common idea.
Do you have a link to the code? I'm curious how you implemented it. I'd also be really intrigued to see that research - does your professor have any published papers or something for those UIs?
It would depend on how surprising the name is, right? The declaration of `int total` should be relatively unsurprising at the top of a function named `sum`, but probably more surprising in `reverseString`.
Love the idea & spitballing ways to generalize to coding..
Thought experiment: as you write code, an LLM generates tests for it & the IDE runs those tests as you type, showing which ones are passing & failing, updating in real time. Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.
The tests could appear in a separated panel next to your code, and pass/fail status in the gutter of that panel. As simple as red and green dots for tests that passed or failed in the last run.
The presence or absence and content of certain tests, plus their pass/fail state, tells you what the code you’re writing does from an outside perspective. Not seeing the LLM write a test you think you’ll need? Either your test generator prompt is wrong, or the code you’re writing doesn’t do the things you think it does!
Making it realtime helps you shape the code.
Or if you want to do traditional TDD, the tooling could be reversed so you write the tests and the LLM makes them pass as soon as you stop typing by writing the code.
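A crude sketch of that loop, in either direction - generate_tests() is a placeholder for whatever LLM call you'd use, and pytest, the file names, and the 0.5s poll interval are arbitrary assumptions:

    # Sketch: watch a source file, regenerate its tests on change, run them,
    # and print a compact pass/fail status.
    import pathlib
    import subprocess
    import time

    SRC = pathlib.Path("mymodule.py")             # hypothetical file being edited
    TESTS = pathlib.Path("test_mymodule_llm.py")  # tests the LLM maintains

    def generate_tests(source: str) -> str:
        raise NotImplementedError("call your LLM of choice here")

    last_mtime = None
    while True:
        mtime = SRC.stat().st_mtime
        if mtime != last_mtime:
            last_mtime = mtime
            TESTS.write_text(generate_tests(SRC.read_text()))
            run = subprocess.run(["pytest", "-q", str(TESTS)], capture_output=True)
            print("green" if run.returncode == 0 else "red")
        time.sleep(0.5)  # debounce: only re-check a couple of times per second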
Humans writing the test first and LLM writing the code is much better than the reverse. And that is because tests are simply the “truth” and “intention” of the code as a contract.
When you give up the work of deciding what the expected inputs and outputs of the code/program are, you are no longer in the driver's seat.
Yes this is fundamental to actually designing software. Still, it would be perfectly reasonable to ask "please write a test which gives y output for x input".
You don’t need to write tests for that, you need to write acceptance criteria.
Isn't that logic programming/Prolog?
You basically write the sequence of conditions (i.e. tests in our lingo) that have to be true, and the compiler (now AI) generates code for you.
Perhaps there has to be a relook at how logic programming can be done in the modern era to make this more seamless.
There's no way this would work for any serious C++ codebase. Compile times alone make this impossible
I'm also not sure how LLM could guess what the tests should be without having written all of the code, e.g. imagine writing code for a new data structure
> There's no way this would work for any serious C++ codebase. Compile times alone make this impossible
There's nothing in C++ that prevents this. If build times are your bogeyman, you'd be pleased to know that all mainstream build systems support incremental builds.
Then do you need tests to validate that your tests are correct? Otherwise the LLM might just generate passing code even if the test is bad, or write code that games the system because it's easier to hardcode an output value than to do the actual work.
There probably is a setup where this works well, but the LLM and humans need to be able to move across the respective boundaries fluidly...
Writing clear requirements and letting the AI take care of the bulk of both sides seems more streamlined and productive.
The harder part is “test invalidation”. For instance if a feature no longer makes sense, the human / test validator must painstakingly go through and delete obsolete specs. An idea I’d like to try is to “separate” the concerns; only QA agents can delete specs, engineer agents must conform to the suite, and make a strong case to the qa agent for deletion.
> Thought experiment: as you write code, an LLM generates tests for it & the IDE runs those tests as you type, showing which ones are passing & failing, updating in real time. Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.
I think this is a bad approach. Tests enforce invariants, and they are exactly the type of code we don't want LLMs to touch willy-nilly.
You want your tests to only change if you explicitly want them to, and even then only the tests should change.
Once you adopt that constraint, you'll quickly realize every single detail of your thought experiment is already a mundane workflow in any developer's day-to-day activities.
Consider the fact that watch mode is a staple of any JavaScript testing framework, and those even found their way into .NET a couple of years ago.
So, your thought experiment is something professional software developers have been doing for what? A decade now?
I think tests should be rewritten as much as needed. But to counter the invariant part, maybe let the user zoom back and forth through past revisions and pull in whatever they want to the current version, in case something important is deleted? And then allow “pinning” of some stuff so it can’t be changed? Would that solve for your concerns?
> Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.
Even if this were possible, this seems like an absolutely colossal waste of energy - both the computer's, and my own. Why would I want incomplete tests generated after every keystroke? Why would I test an incomplete if statement or some such?
> Imagine 10-100 tests that take <1ms to run, being rerun with every keystroke, and the result being shown in a non-intrusive way.
Doesn’t seem like high ROI to run full suite of tests on each keystroke. Most keystrokes yield an incomplete program, so you want to be smarter about when you run the tests to get a reasonably good trade off.
You could prune this drastically by just tokenizing the file with a lexer suitable for the language, turn them into a canonical state (e.g. replace the contents of any comment tokens with identical text), and check if the token state has changed. If you have a restartable lexer, you can even re-tokenize only from the current line until the state converges again or you encounter a syntax error.
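In Python, for example, the standard tokenize module is enough to sketch this; what counts as "ignorable" is a judgment call, and here only comment text and non-logical newlines are canonicalized away:

    # Sketch: treat an edit as "meaningful" only if the canonical token stream
    # changed, so comment- or blank-line-only edits don't trigger a rerun.
    import io
    import tokenize

    def canonical_tokens(source: str):
        out = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NL:
                continue  # non-logical newlines (blank lines) are cosmetic
            if tok.type == tokenize.COMMENT:
                out.append((tok.type, "#"))  # keep that a comment exists, not its text
            else:
                out.append((tok.type, tok.string))
        return out

    def meaningful_change(before: str, after: str) -> bool:
        try:
            return canonical_tokens(before) != canonical_tokens(after)
        except (tokenize.TokenError, IndentationError, SyntaxError):
            return True  # mid-edit syntax errors: be conservative and rerun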
That's already part of most IDEs, and they know which tests to re-run because of coverage, so it's really fast.
It also updates the coverage on the fly, you don't even have to look at the test output to know that you've broken something since the tests are not reaching your lines.
https://gavindraper.com/2020/05/27/VS-Code-Continious-Testin...
https://wallabyjs.com/
Yes, the reverse makes much more sense to me: AI helps to spec out the software & then the code has an accepted definition of correctness. People focus on this way less than they should, I think.
Besides generating the tests, automatically running tests on edit and showing the results inline is already a thing. I think it'd be better to do it the other way around, start with the tests and let the LLM implement it until all tests are green. Test driven development.
Absolutely agree, and spellchecker is a great analogy.
I've recently been snoozing co-pilot for hours at a time in VS Code because it’s adding a ton of latency to my keystrokes. Instead, it turns out that `rust_analyzer` is actually all that I need. Go-to definition and hover-over give me exactly what the article describes: extra senses.
Rust is straightforward, but the tricky part may be figuring out what additional “senses” are helpful in each domain. In that way, it seems like adding value with AI comes full circle to being a software design problem.
ChatGPT and Claude are great as assistants for strategizing about problems, but even the typeahead value seems negligible to me in a large enough project. My experience with them as "coding agents" is generally that they fail miserably or regurgitate some existing code base for a well-known problem. But they are great at helping configure things and as teachers (in the Socratic sense) to help you get up to speed with some technical issue.
The heads-up display is the thesis for Tritium[1], going back to its founding. Lawyers' time and attention (like fighter pilots') is critical but they're still required in the cockpit. And there's some argument they always will be.
[1] https://news.ycombinator.com/item?id=44256765 ("an all-in-one drafting cockpit")
On the topic of Rust IDE plugins that give you more senses, take a look at Flowistry: https://github.com/willcrichton/flowistry . It's not AI, it's using information flow analysis.
AI building complex visualisations for you on-the-fly seems like a great use-case.
For example, if you are debugging memory leaks in a specific code path, you could get AI to write a visualisation of all the memory allocations and frees under that code path to help you identify the problem. This opens up an interesting new direction where building visualisations to debug specific problems is probably becoming viable.
This idea reminds me of Jonathan Blow's recent talk at LambdaConf. In it, he shows a tool he made to visualise his programs in different ways to help with identifying potential problems. I could imagine AI being good at building these. The talk: https://youtu.be/IdpD5QIVOKQ?si=roTcCcHHMqCPzqSh&t=1108
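For Python code at least, the raw data for the memory-leak visualisation described above is cheap to get; a sketch using the stdlib tracemalloc (suspect_code_path() and the top-10 cutoff are placeholders):

    # Sketch: record allocations made while exercising one code path, then
    # dump the top allocation sites. A generated tool could render these
    # stats as a flame graph or timeline instead of printing them.
    import tracemalloc

    def suspect_code_path():
        # placeholder for the code path being investigated
        return [bytes(1024) for _ in range(1000)]

    tracemalloc.start(25)                  # keep up to 25 stack frames per allocation
    before = tracemalloc.take_snapshot()
    result = suspect_code_path()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    for stat in after.compare_to(before, "lineno")[:10]:
        print(stat)                        # file:line, size delta, allocation count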
Anecdotally, the low time to prototype has had me throwing usable tool projects up on my local network. I wrote a meal planner and review tool for our weekly vegetable share, and a local board game / book library that guests can connect to and reference.
It scratches the itch to build and ship with the benefit of a growing library of low scope, high performance, highly customized web tools I can write over a few hours in an evening instead of devoting weekends to it. It feels like switching from hand to power tools
Same. It's very handy to be able to ask for one-off tools for things that I _could_ try to figure out, but probably wouldn't be worth the time. My favorite example so far was a Python script to help debug communication between a microcontroller and a motor driver. I was able to dump the entire datasheet PDF for the driver into Gemini and ask it for a Python CLI to decode communication traffic, and 30 seconds later it was done. Fantastic bang/buck ratio!
I've experienced the case where asking for a quick python script was faster and more powerful than learning how to use a cli to interact with an API.
There’s a lot of ideation for coding HUDs in the comments, but ironically I think the core feature of most coding copilots is already best described as a HUD: tab completion.
And interestingly, that is indeed the feature I find most compelling from Cursor. I particularly love when I’m doing a small refactor, like changing a naming convention for a few variables, and after I make the first edit manually Cursor will jump in with tab suggestions for the rest.
To me, that fully encapsulates the definition of a HUD. It’s a delightful experience, and it’s also why I think anyone who pushes the exclusively-copilot oriented Claude Code as a superior replacement is just wrong.
I've spent the last few months using Claude Code and Cursor - experimenting with both. For simple tasks, both are pretty good (like identifying a bug given console output) - but when it comes to making a big change, like adding a brand new feature to existing code that requires changes to lots of files, writing tests, etc - it often will make at least a few mistakes I catch on review, and then prompting the model to fix those mistakes often causes it to fix things in strange ways.
A few days ago, I had a bug I just couldn't figure out. I prompted Claude to diagnose and fix the issue - but after 5 minutes or so of it trying out different ideas, rerunning the test, and getting stuck just like I did - it just turned off the test and called it complete. If I wasn't watching what it was doing, I could have missed that it did that and deployed bad code.
The last week or so, I've totally switched from relying on prompting to just writing the code myself and using tab complete to autocomplete like 80% of it. It is slower, but I have more control and honestly, it's much more enjoyable of an experience.
Drop in a lint rule to fail on skipped tests. Ive added these at a previous job after finding that tests skipped during dev sometimes slipped through review and got merged.
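In a pytest project the equivalent of that rule can live in conftest.py; a sketch under stated assumptions (the CI environment variable is an assumption about your setup; pytest_collection_modifyitems and get_closest_marker are standard pytest hooks/APIs):

    # conftest.py sketch: refuse to run if any test carries a skip marker while
    # CI=true, so tests skipped "temporarily" during dev can't slip into main.
    import os
    import pytest

    def pytest_collection_modifyitems(config, items):
        if os.environ.get("CI") != "true":
            return  # allow skips locally
        skipped = [item.nodeid for item in items if item.get_closest_marker("skip")]
        if skipped:
            raise pytest.UsageError(
                "skipped tests are not allowed on CI:\n  " + "\n  ".join(skipped)
            )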
I'd love to have something that operates more at the codebase level.
Autocomplete is very local.
(Maybe "tab completion" when setting up a new package in a monorepo? Or make architectural patterns consistent across a whole project? Highlight areas in the codebase where the tests are weak? Or collect on the fly a full view of a path from FE to BE to DB?)
Doesn't it all come down to "what is the ideal interface for humans to deal with digital information"?
We're getting more and more information thrown at us each day, and the AIs are adding to that, not reducing it. The ability to summarise dense and specialist information (I'm thinking error logs, but could be anything really) just means more ways for people to access and view that information who previously wouldn't.
How do we, as individuals, best deal with all this information efficiently? Currently we have a variety of interfaces: websites, dashboards, emails, chat. Are all these necessary anymore? They might be now, but what about the next 10 years? Do I even need to visit a company's website if I can get the same information from some single chat interface?
The fact we have AIs building us websites, apps, web UI's just seems so...redundant.
Websites were a way to get authoritative information about a company, from that company (or another trusted source like Wikipedia). That trust is powerful, which is why we collectively spent so much time trying to educate users about the "line of death" in browsers, drawing padlock icons, chasing down impersonator sites, mitigating homoglyph attacks, etc. This all rested on the assumption that certain sites were authoritative sources of information worth seeking out.
I'm not really sure what trust means in a world where everyone relies uncritically on LLM output. Even if the information from the LLM is usually accurate, can I rely on that in some particularly important instance?
You raise a good point, and one I rarely see discussed.
I still believe it fundamentally comes down to an interface issue, but how trust gets decoupled from the interface (as you said, the padlock shown in the browser and certs to validate a website's source), that's an interesting one to think about :-)
I imagine there will be the same problems as with Facebook and other large websites that used their power to promote genocide. If you're in the mood for some horror stories:
https://erinkissane.com/meta-in-myanmar-full-series
When LLMs are suddenly everywhere, who's making sure that they are not causing harm? I got the above link from Dan Luu (https://danluu.com/diseconomies-scale/) and if his text there is anything to go by, the large companies producing LLMs will have very little interest in making sure their products are not causing harm.
In some cases like the Air Canada one where the courts made them uphold a deal offered by their chatbot it'll be "accurate" information whether the company wants it to be or not!
Not everything an LLM tells you is going to be worth going to court over if it's wrong, though.
The designers of 6th gen fighter jets are confronting the same challenge. The cockpit, which is an interface between the pilot and the airframe, will be optionally manned. If the cockpit is manned, the pilot will take on a reduced set of roles focused on higher-level decision making.
By the 7th generation it's hard to see how humans will still be value-add, unless it's for international law reasons to keep a human in the loop before executing the kill chain, or to reduce Skynet-like tail risks in line with Paul Christiano's arms race doom scenario.
Perhaps interfaces in every domain will evolve this way. The interface will shrink in complexity, until it's only humans describing what they want to the system, at higher and higher levels of abstraction. That doesn't necessarily have to be an English-language interface if precision in specification is required.
> keep a human in the loop before executing the kill chain, or to reduce Skynet-like tail risks in line with Paul Christiano's arms race doom scenario.
It is a little-known secret that plenty of defense systems are already set up to dispense with the human-in-the-loop protocol before a fire action. For defense primarily, but also for attack once a target has been designated. I worked on protocols in the '90s, and this decision was already accepted.
It happens to be so effective that the military won't budge on this.
Also, it is not much worse to have a decision system act autonomously for a kill system, if you consider that the alternative is a dumb system such as a landmine.
Btw: while there always is a "stop button" in these systems, don't be fooled. Those are meant to provide a semblance of comfort and compliance to the designers of those systems, but are hardly effective in practice.
Computers / the web are a fast route to information, but they are also a storehouse of information; a ledger. This is the old "information wants to be free, and also very expensive." I don't want all the info on my PC, or the bank database, to be 'alive', I want it to be frozen in kryptonite, so it's right where I left it when I come back.
I think we're slowly allowing AI access to the interface layer, but not to the information layer, and hopefully we'll figure out how to keep it that way.
I think one key reason HUDs haven’t taken off more broadly is the fundamental limitation of our current display medium - computer screens and mobile devices are terrible at providing ambient, peripheral information without being intrusive.

When I launch an AI agent to fix a bug or handle a complex task, there’s this awkward wait time where it takes too long for me to sit there staring at the screen waiting for output, but it’s too short for me to disengage and do something else meaningful. A HUD approach would give me a much shorter feedback loop. I could see what the AI is doing in my peripheral vision and decide moment-to-moment whether to jump in and take over the coding myself, or let the agent continue while I work on something else. Instead of being locked into either “full attention on the agent” or “completely disengaged,” I’d have that ambient awareness that lets me dynamically choose my level of involvement.

This makes me think VR/AR could be the killer application for AI HUDs. Spatial computing gives us the display paradigm where AI assistance can be truly ambient rather than demanding your full visual attention on a 2D screen. I picture that this would be especially helpful for help on more physical tasks, such as cooking, or fixing a bike.
You just described what I do with my ultrawide monitor and laptop screen.
I can be fully immersed in a game or anything and keep Claude in a corner of a tmux window next to a browser on the other monitor and jump in whenever I see it get to the next step or whatever.
It’s a similar idea, but imagine you could fire off a task, and go for a run, or do the dishes. Then be notified when it completes, and have the option to review the changes, or see a summary of tests that are failing, without having to be at your workstation.
The only real life usage of any kind of HUD I can imagine at the moment is navigation, and I have only ever used that (or other car related things) as something I selectively look at, never felt like it's something I need to have in sight at all times.
That said, the best GUI is the one you don't notice, so uh... I can't actually name anything else, it's probably deeply engrained in my computer usage.
One part "90 degree right in 200m" and one part "OMG, sheep, dodge left".