Readit News
didericis commented on 50 years of proof assistants   lawrencecpaulson.github.i... · Posted by u/baruchel
PaulHoule · 2 days ago
I can see "no progress in 50 years" in fundamental physics where the experimental frontier seems to be running away from us (though recent gamma astronomy results suggest a next generation accelerator really could see the dark matter particle)

In biology or chemistry it's absurd to say that -- look at metal-organic frameworks, all kinds of new synthetic chemistry, ionic liquids, metagenomics, RNA structure prediction, and the unraveling of how gene regulation works in the "dark genome".

Progress in the 'symbolic AI' field that includes proof assistants is a really interesting story. When I was a kid I saw an ad for Feigenbaum's 3-volume "Handbook of AI" and got a used copy years later -- you would have thought production rules (e.g. "expert systems" or "business rules") were on track to become a dominant paradigm, but my understanding is that people were losing interest even before RETE engines became mainstream. Even the expert system shells of the early 1980s didn't use the kind of indexing structures that are mainstream today, so whereas people were saying 10,000-rule rule bases were unruly in the 1980s, 10,000,000 well-structured rules are no problem now. Some of it is hardware but a lot of it is improvements in software.
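For readers who haven't seen one, here is a toy sketch of what a production rule system does; the facts, the rule, and the naive forward-chaining loop are all invented for illustration (a real engine indexes conditions RETE-style instead of rescanning everything):

```python
# Toy forward chaining: each production rule matches conditions against
# working memory and derives new facts; the engine loops until nothing
# new can be added. Real engines index the conditions so huge rule
# bases stay fast.

facts = {
    ("order", "o1", "total", 1200),
    ("customer", "c1", "status", "gold"),
}

def rule_flag_big_orders(wm):
    # IF an order's total exceeds 1000 THEN flag it for review.
    return {("flag", order, "reason", "review")
            for (kind, order, attr, val) in wm
            if kind == "order" and attr == "total" and val > 1000}

rules = [rule_flag_big_orders]

changed = True
while changed:
    derived = set().union(*(rule(facts) for rule in rules)) - facts
    facts |= derived
    changed = bool(derived)

print(facts)  # now includes ("flag", "o1", "reason", "review")
```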

SAT/SMT solvers (which are components of many proof assistants) have shown steady progress in the last 50 years, though not as much as neural networks because they are less parallelizable. There is dramatically more industrial use of provers, though business rules engines, complex event processing, and related technologies are still marginal in the industry for reasons I don't completely understand.
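To make "industrial use of provers" concrete, here is a minimal sketch using Z3's Python bindings (`pip install z3-solver`); the constraints themselves are invented, but this ask-and-check pattern is how verification and scheduling tools typically drive an SMT solver:

```python
from z3 import Ints, Solver, sat

x, y = Ints("x y")

s = Solver()
# Invented scheduling-style constraints; the solver either returns a
# satisfying assignment or proves that none exists.
s.add(x + y == 10, x > 0, y > 2 * x)

if s.check() == sat:
    print(s.model())  # e.g. [y = 9, x = 1]
else:
    print("unsatisfiable")
```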

didericis · a day ago
> business rules engines, complex event processing, and related technologies are still marginal in the industry for reasons I don't completely understand

Translating between the complex implicit intentions of colloquial language and the formal languages used in software and proof assistants is usually very time consuming and difficult.

By the time you’ve formalized the rules, the context in which the rules made sense will have changed and a lot of the work will be outdated. Plus, time and money spent on formalizing rules is time and money not spent on core business needs.
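As a hypothetical illustration of that cost: formalizing even a one-line colloquial rule like "loyal customers get a discount" forces decisions the original statement never made (every name and threshold below is invented):

```python
from datetime import date, timedelta

def discount_rate(signup_date: date, orders_last_year: int,
                  has_open_complaint: bool, today: date) -> float:
    # "Loyal" had to become: a customer for at least a year AND at
    # least four orders in the last year. Neither cutoff was in the
    # colloquial rule, and both will need revisiting as the business
    # changes.
    loyal = (today - signup_date >= timedelta(days=365)
             and orders_last_year >= 4)
    # Does an open complaint void the discount? Someone had to decide.
    if not loyal or has_open_complaint:
        return 0.0
    return 0.05  # And which discount, exactly?
```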

didericis commented on Show HN: Why write code if the LLM can just do the thing? (web app experiment)   github.com/samrolken/noko... · Posted by u/samrolken
simianwords · a month ago
This is a weak argument... I use normal taxis and ask the driver to take me to a place in natural language - a process which is certainly non-deterministic.
didericis · a month ago
> a process which is certainly non-deterministic

The specific events that follow when asking a taxi driver where to go may not be exactly repeatable, but reality enforces a physical determinism that is not explicitly understood by probabilistic token predictors. If you drive into a wall you will obey deterministic laws of momentum. If you drive off a cliff you will obey deterministic laws of gravity. These are certainties, not high probabilities. A physical taxi cannot have a catastrophic instant change in implementation and have its wheels or engine disappear when it stops to pick you up. A human taxi driver cannot instantly swap their physical taxi for a submarine, they cannot swap New York with Paris, they cannot pass through buildings… the real world has a physically determined option-space that probabilistic token predictors don’t understand yet.

And the reason humans are good at interpreting human intent correctly is not just that we’ve had billions of years of training with direct access to physical reality, but because we all share the same basic structure of inbuilt assumptions and “training history”. When interacting with a machine, so many of those basic unstated shared assumptions are absent, which is why it takes more effort to explicitly articulate what it is exactly that you want.

We’re getting much better at getting machines to infer intent from plain English, but even if we created a machine which could perfectly interpret our intentions, that still doesn’t solve the issue of needing to explain what you want in enough detail to actually get it for most tasks. Moving from point A to point B is a pretty simple task to describe. Many tasks aren’t like that, and the complexity comes as much from explaining what it is you want as it does from the implementation.

didericis commented on Show HN: Why write code if the LLM can just do the thing? (web app experiment)   github.com/samrolken/noko... · Posted by u/samrolken
mikodin · a month ago
Well the need is to arrive where you are going.

If we were in an imagined world and you were headed to work:

You either walk out your door and there is a self-driving car, or you walk out of your door and there is a train waiting for you, or you walk out of your door and there is a helicopter, or you walk out of your door and there is a literal wormhole.

Let's say they all take the same amount of time, are equally safe, cost the same, have the same amenities inside, and "feel the same" - would you care if it were different every day?

I don't think I would.

Maybe the wormhole causes slight nausea ;)

didericis · a month ago
> Well the need is to arrive where you are going.

In order to get to your destination, you need to explain where you want to go. Whatever you call that “imperative language”, in order to actually get the thing you want, you have to explain it. That’s an unavoidable aspect of interacting with anything that responds to commands, computer or not.

If the AI misunderstands those instructions and takes you to a slightly different place than you want to go, that’s a huge problem. But it’s bound to happen if you’re writing machine instructions in a natural language like English and in an environment where the same instructions aren’t consistently or deterministically interpreted. It’s even more likely if the destination or task is particularly difficult/complex to explain at the desired level of detail.

There’s a certain irreducible level of complexity involved in directing and translating a user’s intent into machine output simply and reliably that people keep trying to “solve”, but the issue keeps reasserting itself generation after generation. COBOL was “plain English”, and people assumed it would make interacting with computers like giving instructions to another employee over half a century ago.

The primary difficulty is not the language used to articulate intent, the primary difficulty is articulating intent.

didericis commented on We need a clearer framework for AI-assisted contributions to open source   samsaffron.com/archive/20... · Posted by u/keybits
pards · 2 months ago
> code is getting less and less important in our team, so we don't need many engineers.

That's a bit reductive. Programmers write code; engineers build systems.

I'd argue that you still need engineers for architecture, system design, protocol design, API design, tech stack evaluation & selection, rollout strategies, etc., and most of this has to be unambiguously documented in a format LLMs can understand.

While I agree that the value of code has decreased now that we can generate and regenerate code from specs, we still need a substantial number of experienced engineers to curate all the specs and inputs that we feed into LLMs.

didericis · 2 months ago
> we can generate and regenerate code from specs

We can (unreliably) write more code in natural English now. At its core it’s the same thing: detailed instructions telling the computer what it should do.

didericis commented on "Vibe code hell" has replaced "tutorial hell" in coding education   blog.boot.dev/education/v... · Posted by u/wagslane
crazygringo · 2 months ago
> If you're checking all the details for correct behavior, the effort involved converges to roughly the same thing.

Except it doesn't. It's much less work to verify the tests.

> That's a valid argument, if you're ok tolerating a certain level of uncertainty about behavior that you haven't meticulously checked or tested.

I'm a realist, and know that I, like all other programmers, am fallible. Nobody writes perfect code. So yes, I'm ok tolerating a certain level of uncertainty about everybody's code, because there's no other choice.

I can get the same level of uncertainty in far less time with an LLM. That's what makes it great.

didericis · 2 months ago
> Except it doesn't. It's much less work to verify the tests.

This is only true when there is less information in those tests. You can argue that the extra information you see in the implementation doesn't matter as long as it does what the tests say, but the amount of uncertainty depends on the amount of information omitted in the tests. There's a threshold over which the effort of avoiding uncertainty becomes the same as the effort involved in just writing the code. Whether or not you think that's important depends on the problem you're working on and your tolerance for error and uncertainty, and there's no hard and fast rule for that.

But if you want to approach 100% correctness, you need to attempt to specify your intentions 100% precisely. The fact that humans make mistakes and miscommunicate their intentions does not change the basic fact that a human needs to communicate their intention for a machine to fulfill that intention. The more precise the communication, the more work that's involved, regardless of whether you're verifying that precision after something generates it or generating it yourself.

> I can get the same level of uncertainty in far less time with an LLM. That's what makes it great.

I have a low tolerance for uncertainty in software, so I usually can't reach a level I find acceptable with an LLM. Fallible people who understand the intentions and current function of a codebase have a capacity that a statistical amalgamation of tokens trained on fallible people's output simply does not have. People may not use their capacity to verify alignment between intention and execution well, but they have it.

Again, I'm not denying that there are plenty of problems where the level of uncertainty involved in AI-generated code is acceptable. I just think it's fundamentally true that extra precision requires extra work; there's simply no way to avoid that.

didericis commented on "Vibe code hell" has replaced "tutorial hell" in coding education   blog.boot.dev/education/v... · Posted by u/wagslane
crazygringo · 2 months ago
> if you care about correctness they all basically converge to the same thing and the same amount of work.

That's the part I'd push back on. They're not the same amount of work.

When I'm writing the code myself, it's basically a ton of "plumbing" of loops and ifs and keeping track of counters and making sure I'm not making off-by-one errors and not making punctuation mistakes and all the rest. It actually takes quite a lot of brain energy and time to get that all perfect.

It saves a lot of time to write the function definition in plain English, have the LLM generate a bunch of tests that you verify are the correct definition... and then let the LLM take care of all the loops and indexing and punctuation and plumbing.

I regularly cut what used to be an entire afternoon or day's worth of work down to 30 minutes. I spend 10 minutes writing the design for what will be 500-1,000 lines of code, 5 minutes answering the LLM's questions about it, 5 minutes skimming the code to make sure it all looks vaguely plausible (no obvious red flags), 5 minutes ensuring the unit tests cover everything I can think of (almost always, the LLM has thought of a bunch of edge cases I never would have bothered to test), and another 5 minutes telling it to fix things, like when its unit tests make me suddenly realize there's an edge case that should be defined differently.

The idea that it's the "same amount of work" is crazy to me. It's so much more efficient. And in all honesty, the code is more reliable too because it tests things that I usually wouldn't bother with, because writing all the tests is so boring.

didericis · 2 months ago
> When I'm writing the code myself, it's basically a ton of "plumbing" of loops and ifs and keeping track of counters and making sure I'm not making off-by-one errors and not making punctuation mistakes and all the rest. It actually takes quite a lot of brain energy and time to get that all perfect.

All of that "plumbing" affects behavior. My argument is that all of the brain energy used when checking that behavior is necessary in order to check that behavior. Do you have a test for an off by one error? Do you have a test to make sure your counter behaves correctly when there are multiple components on the same page? Do you have a test to make sure errors don't cause the component to crash? Do you have a test to ensure non utf-8 text or binary data in a text input throws a validation error? Etc etc. If you're checking all the details for correct behavior, the effort involved converges to roughly the same thing.

If you're not checking all of that plumbing, you don't know whether or not the behavior is correct. And the level of abstraction used when working with agents and LLMs is not the same as when working with a higher level language, because LLMs make no guarantees about the correspondence between input and output. Compilers and programming languages are meticulously designed to ensure that output is exactly what is specified. There are bugs and edge cases in compilers and quirks based on different hardware, so it's not always 100% perfect, but it's 99.9999% perfect.

When you use an LLM, you have no guarantees about what it's doing, and in a way that's categorically different from not knowing what a compiler does. Very few people know all of the steps that break down `console.log("hello world")` into the electrical signals that get sent to the pixels on a screen on a modern OS using modern hardware given the complexity of the stack, but they do know with as close as is humanly possible to 100% certainty that a correctly configured environment will result in that statement outputting the text "hello world" to a console. They do not need to know the implementation because the contract is deterministic and well defined. Prompts are neither deterministic nor well defined, so if you want to verify an LLM is doing what you want it to do, you have to check what it's doing in detail.

Your basic argument here is that you can save a lot of time by trusting the LLM will faithfully wire the code as you want, and that you can write tests to sanity check behavior and verify that. That's a valid argument, if you're ok tolerating a certain level of uncertainty about behavior that you haven't meticulously checked or tested. The more you want to meticulously check behavior, the more effort it takes, and the more it converges to the effort involved in just writing the code normally.

didericis commented on "Vibe code hell" has replaced "tutorial hell" in coding education   blog.boot.dev/education/v... · Posted by u/wagslane
rmonvfer · 2 months ago
The only way this might work (IMO) is writing the tests yourself (but of course, this requires you to plan and design very meticulously in advance) and doing some kind of “blind TDD” where the LLM is not able to see the tests, only run them and act on the results. Even then, I’ve had Claude (Opus 4.1) bypass tests by hardcoding conditions as it found them, so I’d say reliability for this method is not 100%.

Having the LLM write the tests is… well, a recipe for destruction unless you babysit it and give it extremely specific restrictions (again, I’ve done this in mid to large sized projects with fairly comprehensive documentation on testing conventions and results have been mixed: sometimes the LLM does an okay job but tests obvious things, sometimes it ignores the instructions, sometimes it hardcodes or disables conditions…)
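A minimal sketch of the “blind TDD” gate described above, assuming the tests live in a directory the agent is denied read access to and that the agent only ever sees pytest's summary line (the directory name and wrapper are invented):

```python
import subprocess

def run_hidden_tests(test_dir: str = "hidden_tests") -> str:
    """Run pytest on tests the agent cannot read; return only the
    pass/fail summary so the model can act on results without seeing
    (or gaming) the test code itself."""
    proc = subprocess.run(
        ["pytest", test_dir, "-q", "--tb=no"],
        capture_output=True,
        text=True,
    )
    lines = [ln for ln in proc.stdout.strip().splitlines() if ln]
    return lines[-1] if lines else "pytest produced no output"

# The agent's tool call returns e.g. "3 passed, 1 failed in 0.12s".
```

The read restriction itself has to be enforced outside the script (file permissions or a sandbox), and as noted, even that doesn't stop a model from hardcoding around failures it can infer.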

didericis · 2 months ago
I’ve been saying this for years now: you can’t avoid communicating what you want a computer to do. The specific requirements have to be spelled out somewhere.

Inferring intent from plain English prompts and context is a powerful way for computers to guess what you want from underspecified requirements, but the problem of defining what you want specifically always requires you to convey some irreducible amount of information. Whether it’s code, highly specific plain English, or detailed tests, if you care about correctness they all basically converge to the same thing and the same amount of work.

didericis commented on Suspicionless ChatControl must be taboo in a state governed by the rule of law   digitalcourage.social/@ec... · Posted by u/nabla9
jopsen · 2 months ago
Just my personal, not fully conceived opinion:

ChatControl cannot exist without criminalizing cryptography (crypto with backdoors is not crypto).

When the act of uttering sufficiently complicated mathematics is a crime, we are entering the territory of absurdity.

Such laws cannot be enforced. Enforcement can only be arbitrary.

didericis · 2 months ago
> Such laws cannot be enforced. Enforcement can only be arbitrary.

I am against criminalizing cryptography, and I largely agree that a ban is infeasible given the extent of proliferation and the ease of replicating it, so I'm playing devil's advocate:

Laws banning math related to manufacturing nuclear weapons can be and have been enforced. It's important to take legal threats like ChatControl seriously and not just dismiss them as absurd/unenforceable overreach, even if that's likely true.

didericis commented on Typst: A Possible LaTeX Replacement   lwn.net/Articles/1037577/... · Posted by u/pykello
koito17 · 3 months ago
> I don't understand why some academic flavor of markdown isn't the standard

I would argue that is exactly what LaTeX is... I studied mathematics in university, and from what I recall, every major publisher provided a LaTeX template for articles and textbooks. Likewise, pretty much every mathematics presentation uses Beamer slides, and most mathematicians are able to "compile" subsets of LaTeX in their head. Websites like MSE and MO use MathJax precisely so that people can write notation as they would on assignments, notes, papers, etc.

Note: I am not saying people particularly like LaTeX as a tool. However, the vast majority of the complaints about LaTeX do seem to be from computer science people. Many mathematics students just create an Overleaf (formerly ShareLaTeX) account and call it a day. Of course, nobody enjoys notes taking 10 seconds to compile, or the first 100 lines of their thesis being a preamble, but the important part is the ability to express notation across a variety of mediums, and the publisher support (as GP mentioned).

didericis · 3 months ago
I agree the standard for mathematical notation is LaTeX, but it’s only needed for fairly limited parts of a document. It makes more sense to me as something you’d use in snippets like `$\sum_{n=1}^{10}n$` than as something that should control the whole document.

Markdown and MathJax are imo way more web-friendly than a full LaTeX document, and they avoid distracting/unnecessary aspects of LaTeX.
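For example, a document in that style needs nothing beyond markdown plus math delimiters; a minimal sketch (whether `$...$` works for inline math depends on the MathJax configuration):

```markdown
## Partial sums

The first ten terms give $\sum_{n=1}^{10} n = 55$, and in general

$$\sum_{n=1}^{N} n = \frac{N(N+1)}{2}$$
```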

As for publisher support, that’s what frustrates me most: HTML was specifically designed for academics at CERN to publish and link documents. Instead of using an HTML-friendly format like markdown, publishers demand a format designed for printing content on a physical piece of paper. So now HTML is used for almost everything except academic papers, and academic papers are all PDFs submitted to journal or preprint servers with no links.

didericis commented on Xeres: Uncensorable Peer-to-Peer Communications   xeres.io/... · Posted by u/thunderbong
biomcgary · 3 months ago
I didn't see a public source repository for Keet, just compiled releases. Why would you trust closed source for a privacy app?
didericis · 3 months ago
I’ve talked to their devs and met them in person, and I trust them. Most of their stack is public and all the primitives they use are available and well documented (see https://github.com/holepunchto and https://docs.pears.com/). I’ve used that stack and verified it does what is advertised, and I believe they’re planning a full open-source release of the parts that aren’t already public.

u/didericis

Karma: 1246 · Cake day: March 15, 2018
About
eric@dideric.is