Still, I love Matrix and hope that these issues will be resolved in time.
I think the point was that install.md is a good way to generate an install.sh.
> validate that, and put it into the repo
The problem being discussed is that the user of the script needs to validate it. It's great if it's validated by the author, but that's already the situation we're in.
The user is free to use an LLM to 'validate' the `install.sh` file, just by asking it whether the script does anything 'bad'. That should be about as successful as having the LLM generate the script from a description. Maybe even more successful.
That way we can have entire projects with nothing but Markdown files. And we can run apps with just `claude run app.md`. Who needs silly code anyway?
Wouldn't that be nice?
Install scripts are a simple example that current generation LLMs are more than capable of executing correctly with a reasonably descriptive prompt.
More generally, though, there's something fascinating about the idea that the way you describe a program can _be_ the program, which tbh I haven't fully wrapped my head around. Still, it's not crazy to think that in time more and more software will be exchanged by passing prompts around rather than compiled code.
It is much easier to use LLMs to generate code, validate that code as a developer, fix it if necessary, and check it into the repo, than if every user has to send prompts to LLMs in order to get code they can actually execute.
While hoping it doesn't break their system and does what they wanted from it.
Also... that just doesn't scale. How much power would we need if everyday computing starts with a BIOS sending prompts to LLMs in order to generate an operating system it can use?
Even if it is just about installing stuff... We have CI runners that constantly install software, often on every build. How would they scale if they need LLMs to generate install instructions every time?
I used minimax M2 (for context, it's very unreliable) for installation and it didn't work and my documents folder is missing, help
How do you even debug this? Imagine some path or behaviour changes in a new OS release and the model thinks it knows better. If anything goes wrong, who is responsible?
Pretty brilliant in a way.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
What Naur meant by "theory" was the mental model of the original programmers who understood why they wrote it that way. He argued the real program was the theory, not the code. The translation of the theory into code is lossy: you can't reconstruct the former from the latter. Naur said that this explains why software teams don't do as well when they lose access to the original programmers, because they were the only ones with the theory.
If we take "a great description" to mean a writeup of the thinking behind the program, i.e. the theory, then your comment is in keeping with Naur: you can go one way (theory to code) but not the other (code to theory).
The big question is whether/how LLMs might change this equation.
Natural languages are open to interpretation, and a lot of context will remain unmentioned, while programming languages, together with their tested environment, contain the whole context.
Instrumenting LLMs will also mean doing a lot of prompt engineering, which on one hand might make the instructions clearer (for the human reader as well), but on the other will likely not transfer as much of the theory behind why each decision was made. Instead, it will likely focus on copy&pasta guides that don't require much understanding of why something is done.
Instead of asking the agent to execute it for you, you ask the agent to write an install.sh based on the install.md?
Then you can both audit whatever you want before running or not.
Good idea. That seems sensible.
Bonus: LLM is only used once, not every time anyone wants to install some software. With some risks of having to regenerate, because the output was nonsensical.
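To make the "generate once, audit, check in" idea concrete, here is a minimal sketch, not anyone's actual tooling. It assumes a made-up convention where `install.sh` records the SHA-256 of the `install.md` it was generated from, so the script is only regenerated when the prose changes. The `generate` callable is a hypothetical stand-in for whatever turns prose into shell (an LLM call, a human); the names `HASH_PREFIX`, `needs_regeneration` and `regenerate` are all invented for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable

# Hypothetical header convention: the script records which spec it came from.
HASH_PREFIX = "# generated-from-sha256: "


def spec_digest(spec_text: str) -> str:
    """Hex SHA-256 of the install.md contents."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def needs_regeneration(spec: Path, script: Path) -> bool:
    """True if the script is missing or was generated from a different spec."""
    if not script.exists():
        return True
    expected = HASH_PREFIX + spec_digest(spec.read_text(encoding="utf-8"))
    return expected not in script.read_text(encoding="utf-8").splitlines()


def regenerate(spec: Path, script: Path, generate: Callable[[str], str]) -> bool:
    """Write a fresh script if needed; return True when a new one was written.

    `generate` is a placeholder for the prose-to-shell step. Nothing is
    executed here: the output is meant to be reviewed and committed.
    """
    if not needs_regeneration(spec, script):
        return False
    spec_text = spec.read_text(encoding="utf-8")
    body = generate(spec_text)
    script.write_text(
        "#!/bin/sh\n"
        + HASH_PREFIX + spec_digest(spec_text) + "\n"
        + body.rstrip("\n") + "\n",
        encoding="utf-8",
    )
    return True
```

The point of the design is that the expensive, non-deterministic step runs only when the description changes, and what users and CI runners execute is a plain file that anyone can read and diff.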
That is why we have programming languages: coupled with a specific interpreter/compiler, they are pretty clear on what they do. If someone misunderstands some specific code segment, they can just test their assumptions easily.
You cannot do that with just written prose; you would need to ask the writer of that prose to clarify.
And with programming languages, the context is contained, and clearly stated, otherwise it couldn't be executed. Even undefined behavior is part of that, if you use the same interpreter/compiler.
Also, humans often just read something wrong, or skip important parts. That is why we have computers.
Now, I wouldn't trust an LLM to execute prose any better than I would trust a random human to read some how-to guide and follow it.
The whole idea that we now add more documentation to our source code projects, so that dumb AI can make sense of it, is interesting... Maybe generally useful for humans as well... But I would instead target humans, not LLMs. If LLMs find it useful as well, great. But I wouldn't try to 'optimize' my instructions so that every LLM doesn't just fall flat on its face. That seems like a futile effort.
Zulip has client-server encryption, which is fine if you control the server.