I would say that there is very little danger of a proof in Lean being incorrect.
There is a serious danger, which has nothing to do with bugs in Lean and which is a known problem for software verification that also applies in math: one must read the conclusions carefully to make sure that the right thing is actually proved.
I read Wilshaw's final conclusions carefully, and she did indeed prove what needed to be proved.
Every definition and theorem in mathlib and this project have been checked by Lean’s trusted kernel, which computationally verifies that the proofs we have constructed are indeed correct. However, Lean cannot check that the statements of the definitions and theorems match their intended English equivalents, so when drawing conclusions from the code in this project, translation to and from English must be done with care.
The problem I express relates to the issues people mention about libraries: if a defined concept is used, one has to be sure the definition is correct (i.e., that the right thing has been proved).
Wilshaw's formalization is not vulnerable to this objection, though libraries were used. What is proved is that a certain defined concept satisfies a certain suite of formulas of first order logic. If there is a predicate satisfying that suite of formulas, NF is consistent.
and it very much IS an essential part of my confidence in this proof that conversations between me and Sky Wilshaw reveal that she understands my argument [and was able to point out errors and omissions in paper versions!]
human interaction helps create confidence. But the software is extremely reliable: a philosophical challenge based on bugs in theorem proving software just is not going to hold water.
I am sure you know this, but for the audience: the danger can be mitigated somewhat with a "test suite" of theorems and examples about the definitions. These examples can be very simple ("this particular object, with this particular operation, is a group; this other object is not") or much more sweeping and general (e.g. fundamental theorems like "all objects with this property are isomorphic" or "all objects with this property embed canonically in this other construction"). It doesn't prove that your definitions are correctly capturing what your natural-language proof talks about, but it can help you be more confident.
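As a toy illustration of that idea (the definition and checks below are made up for illustration, not taken from mathlib or the con-nf project), such a "test suite" in Lean might look like:

    -- Pin down what a new definition means with simple positive and negative
    -- checks, plus one more sweeping property.  Purely illustrative.
    abbrev IsEven (n : Nat) : Prop := n % 2 = 0

    example : IsEven 4 := by decide        -- this particular object has the property
    example : ¬ IsEven 3 := by decide      -- and this one does not
    example : ∀ n, IsEven n ∨ IsEven (n + 1) := by  -- a more sweeping sanity check
      intro n
      show n % 2 = 0 ∨ (n + 1) % 2 = 0
      omega

If IsEven had accidentally been written with the wrong remainder, the first two examples would simply fail to compile, which is exactly the kind of mismatch between intent and formal definition such checks are meant to surface.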
Another danger is some sort of bug in Lean itself. This isn't unprecedented in theorem provers [1][2]. These might be hard to hit by accident... but there are larger and larger collaborations where arbitrary people fill in steps (like [3]). Someone trolling one of these efforts by filling in a step using a bug they found might become worth worrying about.
[1]: https://inutile.club/estatis/falso/
[2]: https://www.youtube.com/watch?v=sv97pXplxf0
[3]: https://terrytao.wordpress.com/2023/11/18/formalizing-the-pr...
It's kind of a root of trust problem, isn't it? I think the algorithm for checking proofs is relatively simple. All those fancy tactics boil down to a sequence of rewrites of an expression tree, using a small handful of axiomatic rewrite rules.
The trusted codebase becomes that checking algorithm, along with the "compiler" that translates the high-level language to term rewriting syntax. Formally verifying that codebase is a rather circular proposition (so to speak), but you could probably bootstrap your way to it from equations on a chalkboard.
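As a small, concrete illustration of that division of labour (plain Lean 4, nothing specific to this project): the tactic layer is only a front end, and what the kernel actually checks is the elaborated proof term, which you can inspect yourself.

    -- A proof written with tactics...
    theorem demo (p q : Prop) (hp : p) (hq : q) : q ∧ p := by
      constructor
      exact hq
      exact hp

    -- ...but the kernel never sees the tactic script; it re-checks the low-level
    -- term the tactics produced.  #print displays that term.
    #print demo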
> Every definition and theorem in mathlib and this project have been checked by Lean’s trusted kernel, which computationally verifies that the proofs we have constructed are indeed correct.
From a foundational perspective, it is also important to note that this proof is one of equiconsistency between NF and the Lean kernel, which itself is hand-checked. Mechanized theorem provers preserve the level of correctness imputed to them via outside injection, from humans or other out-of-band systems.
They do exactly what you tell them to.
It certainly isn't a proof of equiconsistency between NF and the Lean kernel. The theory implemented in the Lean kernel is considerably stronger than NF.
Congratulations on the verification of your proof! It must be great to have your research life's crowning work formally confirmed! Also a great victory for Quine's New Foundations.
I have shown the consistency of New Foundations. My aim is not actually to promote it as a working set theory. NFU, which admits Choice, is probably better for that. But if there are people who want to use NF as the foundation, it is now seen to be as secure as a rather small fragment of ZFC.
If I’m not mistaken, this is the first use of a proof assistant to settle the status of a difficult proof that had been sitting in limbo for years. There were some previous projects (e.g. the Four Color Theorem in Coq) that validated existing proofs with a large computational element handled by untrusted software, but I think this is the first one where the epistemological status of the result was uncertain in the larger mathematical community.
https://www.nature.com/articles/d41586-021-01627-2
https://leanprover-community.github.io/blog/posts/lte-final/
Mochizuki claimed a proof of the abc conjecture in 2012, and the 400+ page paper is online. But I don't think many accept the proof.
There is a major difference though: Holmes's proof was broadly comprehensible, just extremely complicated and easy to get lost in. In particular, Holmes really tries to make the reader understand, with a fairly gentle, apologetic introduction.
But Mochizuki's "proof" is completely impenetrable even to experts on, like, page 3, and makes no effort to explain what is happening.
Another key difference is that New Foundations is a niche field, so there simply was not a huge amount of human effort spent reading Holmes's work. That's not the case with Mochizuki's proof. There's a big difference between "a small number of mathematicians didn't understand the proof but suspect it's correct" and "a large number of mathematicians didn't understand the proof and concluded it was incorrect."
And, most of all: Holmes's formulation of tangled type theory made the proof a natural candidate for dependently-typed formal verification. Mochizuki's proof is not type-theoretic and does not seem like a great candidate for the calculus of constructions - maybe it is! I actually have no idea. I suspect Mochizuki is the only person in the world who can answer that. But it's critical that Holmes did so much background work to simplify his proof and make this Lean program possible. Mochizuki should do the same.
AFAICT, both in terms of the "sociology of mathematics" and the amount of work required to even attempt a Lean formalization, trying to verify Mochizuki's proof is a waste of time.
https://www.reddit.com/r/math/comments/1bhiz0s/construction_...
It would be nice if this one were formalized in Lean (or Coq or HOL) though.
But a Lean-assistive LLM trained on Mochizuki's work? Ahhh! What an intriguing possibility!
I have studied IUT since 2012, and it is indeed totally baroque. However, Mochizuki's theory is totally rebased and admits results of much interest. I will write more on this matter if my claim is of mutual interest.
IRT the topic and digression here: LLMs and Lean 4 are not of much use for IUT.
IUT theory is much easier understood by a hacker than by a math guy. Eventually, tho, monitors and TVs did kinda become the same thing, but there are some minor differences.
Can someone give a rough description of what is special, or novel, in the "New Foundations" set theory formulation, relative to other formulations? Or at least, link to a description readable by, say, a math undergrad student or an engineering professional?
I think the main thing is the existence of the universal set. For my use case, the type system of a programming language, such a universal set is very useful. There are various hacks used in existing systems like cumulative universes or type-in-type which are not as satisfying - instead, I can just check that the type signature is stratified and then forget about types having numerical levels.
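For concreteness, here is a minimal sketch of what such a stratification check could look like (hypothetical Atom/stratified names and a deliberately tiny constraint language; this is not code from Lean, mathlib, or the con-nf project). The idea: assign an integer level to each variable so that "x ∈ y" forces level(y) = level(x) + 1 and "x = y" forces equal levels, and report failure if the constraints clash.

    -- Atomic constraints extracted from a formula or type signature.
    inductive Atom where
      | mem (x y : String)   -- "x ∈ y": level y must be level x + 1
      | eq  (x y : String)   -- "x = y": levels must be equal

    -- Try to extend a partial level assignment with one constraint.
    def step (levels : List (String × Int)) : Atom → Option (List (String × Int))
      | .mem x y =>
        match levels.lookup x, levels.lookup y with
        | some lx, some ly => if ly = lx + 1 then some levels else none
        | some lx, none    => some ((y, lx + 1) :: levels)
        | none,    some ly => some ((x, ly - 1) :: levels)
        | none,    none    => some levels   -- picked up once a seed reaches this component
      | .eq x y =>
        match levels.lookup x, levels.lookup y with
        | some lx, some ly => if lx = ly then some levels else none
        | some lx, none    => some ((y, lx) :: levels)
        | none,    some ly => some ((x, ly) :: levels)
        | none,    none    => some levels

    -- Propagate constraints until nothing new is assigned (or a clash is found).
    partial def saturate (atoms : List Atom) (levels : List (String × Int)) :
        Option (List (String × Int)) := do
      let next ← atoms.foldlM step levels
      if next.length = levels.length then pure next else saturate atoms next

    def varsOf : Atom → List String
      | .mem x y => [x, y]
      | .eq x y  => [x, y]

    -- Seed each still-unassigned variable with level 0 and propagate.
    def solve (atoms : List Atom) : List String → List (String × Int) → Bool
      | [], _ => true
      | v :: rest, levels =>
        if (levels.lookup v).isSome then
          solve atoms rest levels
        else
          match saturate atoms ((v, 0) :: levels) with
          | some levels' => solve atoms rest levels'
          | none         => false

    def stratified (atoms : List Atom) : Bool :=
      solve atoms (atoms.foldr (fun a acc => varsOf a ++ acc) []) []

    #eval stratified [.mem "x" "x"]                  -- false: Russell's predicate
    #eval stratified [.mem "x" "y", .mem "y" "z"]    -- true:  levels 0, 1, 2 work
    #eval stratified [.mem "x" "y", .mem "y" "x"]    -- false: no consistent levels

In NF itself this bookkeeping is applied to the variables of a comprehension formula; the sketch only shows the constraint-solving part.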
I agree, a "cool" thing about NF is the universal set.
Another way to be introduced to New Foundations, along with how one can use it, is the Metamath database for New Foundations: https://us.metamath.org/nfeuni/mmnf.html
Metamath is a proof system, but unlike most alternative systems like Lean, it doesn't have a built-in set of axioms - you need to declare your axioms, and then you can prove theorems (using only previous axioms and proven theorems). So you can declare the axioms of New Foundations, and then use them to prove other things.
One thing you immediately discover when you try to use New Foundations is that "the usual definition of the ordered pair, first proposed by Kuratowski in 1921 and used in the regular Metamath Proof Explorer, has a serious drawback for NF and related theories that use stratification. The Kuratowski ordered pair is defined as << x , y >> = { { x } , { x , y } }. This leads to the ordered pair having a type two greater than its arguments. For example, z in << << x , y >> , z >> would have a different type than x and y, which makes multi-argument functions extremely awkward to work with. Operations such as "1st" and complex "+" would not form sets in NF with the Kuratowski ordered pairs. In contrast, the Quine definition of ordered pairs, defined in definition df-op, is type level. That is, <. x , y >. has the same type as x and y, which means that the same holds of <. <. x , y >. , z >. This means that "1st" is a set with the Quine definition, as is complex "+". The Kuratowski ordered pair is defined (as df-opk), because it is a simple definition that can be used by the set construction axioms that follow it, but for typical uses the Quine definition of ordered pairs df-op is used instead."
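To make the type arithmetic in that quote explicit (just the stratification bookkeeping, nothing beyond what the quote already says):

    \begin{align*}
      x,\ y &: \text{type } n \\
      \{x\},\ \{x,y\} &: \text{type } n+1 \\
      \langle x,y\rangle_K = \{\{x\},\{x,y\}\} &: \text{type } n+2 \quad \text{(Kuratowski: two levels above its arguments)} \\
      \langle x,y\rangle_Q &: \text{type } n \quad \text{(Quine: type-level by construction)}
    \end{align*}

So a relation built from Kuratowski pairs relates objects two types apart, which is why operations such as "1st" fail to form sets in NF, while the Quine pair avoids the problem.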
One eye-popping result is that the Axiom of Choice can be disproven in NF. See that site (or other pages about NF) for details.
The thing I find really aesthetically pleasing about NF is how it adjusts the axiom of subset selection so that "the set of all sets" doesn't cause Russell's paradox. Basically it requires that the predicates you use to select subsets must obey a certain very lightweight type system. "x is not a member of itself" is not a well-typed question under any reasonable type system, and in particular it doesn't satisfy NF's "stratifiability" requirement, so you can't use it to construct the Russell's-paradox set of all sets which do not contain themselves.
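Concretely, the stratification constraint generated by the Russell predicate is unsatisfiable on its face:

    x \in x \;\Longrightarrow\; \operatorname{type}(x) = \operatorname{type}(x) + 1

No integer assignment satisfies that, whereas a predicate like "x ∈ y and y ∈ z" is stratified by giving x, y, z the types 0, 1, 2.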
Like having a programming language sold as free of any awful metaprogramming constructions (and the headaches they induce) through the guarantee that no metaprogramming at all can be done in it.
A language that forbids talking about its own grammatical structures is doomed to see its users abandon or extend it when they inevitably hit a situation where discussing the communication tool itself is required to deal appropriately with its possibilities, expected behaviors, and limits.
You just move the sorts around, you don't get rid of having to have sorts. I guess a rug with dirt swept under it is more aesthetically pleasing than a rug with dirt on top of it?
One "nice" thing about NF is that it has only two axioms/axiom schemes: 1) sets with the same members are the same, and 2) any stratifiable property corresponds to a set of all the things which have that property. And the definition of "stratifiable" isn't very complicated.
ZF, by contrast, has eight rather ad hoc axioms/axiom schemes.
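For reference, written out in first-order notation (standard formulations, paraphrased rather than quoted from any particular source), the two NF axiom schemes are:

    % Extensionality: sets with the same members are equal.
    \forall x\, \forall y\, \bigl( \forall z\, (z \in x \leftrightarrow z \in y) \rightarrow x = y \bigr)

    % Stratified comprehension: for every stratified formula phi in which A is not free,
    \exists A\, \forall x\, \bigl( x \in A \leftrightarrow \varphi(x) \bigr)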
I was wondering what were the fundamental differences between Coq and Lean and if they operated on the same kind of logic and found this [1].
I barely understand anything from that discussion and don't practice any of them. Feel free to chime in if you have something about this, or a comparison with other proof assistants.
[1] https://proofassistants.stackexchange.com/questions/153/what...
> Randall Holmes has claimed to have a proof of its consistency. In this repository, we use the interactive theorem prover Lean to verify the difficult part of his proof, thus proving that New Foundations is indeed consistent.
I think proponents of Lean are a little bit too strong in their use of language. Lean is not a superior method of proof, as is often implied. It is an alternative way of proof. If one looks into getting into Lean, one quickly realizes this situation. It is a programming language and system, with its own bugs, and you are highly reliant upon various stacks of libraries that other humans have written. These libraries have choices made in them and potentially gaps or even bugs.
So I think I take issue with the language here that it was Lean that said the proof was good. I think the more accurate and honest language is that the written proof was verified by human mathematicians and that the proof was translated into Lean by humans and verified there as well. I don't think it's necessarily accurate, or I haven't seen explanations that say otherwise, that it's Lean that provides the sole, golden verification, but that is often implied. I think the subtitle is the most accurate language here: "A digitisation of Randall Holmes' proof".
I do think that machine-verified proofs in strong systems like Lean are far superior to proofs merely verified by humans. Humans are amazing, but they also get bored and miss details.
This isn't just a theoretical claim. People read Euclid's Elements for over two thousand years before noticing a missing axiom. That's the sort of basic mistake that any properly-working machine proof verification system would immediately reveal. Published math proofs are often later revealed to be wrong. The increasing sophistication of math means it's getting harder and harder for humans to properly verify every step. Machines are still not as good at creating proofs, but they're unequalled at checking them.
There are "competing" systems to Lean, so I wouldn't say that Lean is the "one true way". I like Metamath, for example. But the "competition" between these systems needs to be in quotes, because each has different trade-offs, and many people like or use or commit to multiples of them. All of them can verify theorems to a rigor that is impractical for humans.
https://math.stackexchange.com/questions/1901133/euclids-ele...
While there indeed may be bugs, it's my understanding that any trust needs to be placed only on the kernel.
> various stacks of libraries that other humans have written
If you're referring to mathlib here, I'm not sure this is correct. As again all mathlib code compiles down to code that is processed by the kernel.
Indeed this is reinforced by the draft paper on the website [0]:
> Lean is a large project, but one need only trust its kernel to ensure that accepted proofs are correct. If a tactic were to output an incorrect proof term, then the kernel would have the opportunity to find this mistake before the proof were to be accepted.
[0]: https://zeramorphic.github.io/con-nf-paper/main.l.pdf
I think that's an overly formalistic view by the Lean team.
When I looked into Lean last (sometime last year), it was a hodgepodge of libraries and constructions. The maturity of a library depended on whether someone in that area of math had spent a lot of time building one up. There were even competing libraries and approaches. To me, that implies that there's "choice" there, just as there is in mathematics. And a choice could be "buggy". What's to say that the construction of a mathematical structure in Lean is the same as the mathematical structure in "normal" mathematics?
So yes, if that statement by the Lean team is accurate, you can build correct things on top of the kernel, but the use of "correct" is a very formal one. It's formally correct in terms of the kernel, but it doesn't mean that it's necessarily mathematically correct. This is to my understanding of course, garnered from trying to get into Lean.
I think their point is it’s turtles all the way down. At some point you’re relying on software, kernel or otherwise, which will always have the possibility for bugs.
It seems like Lean is an interesting tool that maybe could help with proving, but ultimately will never be authoritative itself as a statement of proof.
Of course, this is ultimately a philosophical debate, which hinges on the fact that while Lean may be demonstrably correct in any number of known applications, there are no guarantees that it is and will remain correct. Such is the nature of software.
The difference is that in Lean you only need to trust the kernel. The rest is constructed on top, and if the kernel is sound, everything else is as well. This is in stark contrast with a standard programming language, where at any time a bug can be introduced. It's also in stark contrast with math, where any lemma may contain an error.
> It is a programming language and system, with its own bugs
The cool thing about theorem provers is that an invalid proof won’t even compile (as long as the kernel is correct). There’s no such thing as a traditional software bug that happens only at runtime in these systems when it comes to proofs, because there’s no runtime at all.
You can also use Lean as a “regular” programming language, with the corresponding risk of runtime bugs, but that’s not happening here.
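A two-line illustration of that point (plain Lean, nothing specific to this project):

    -- Uncommented, the next line is rejected at elaboration/kernel-checking time;
    -- there is no runtime at which it could "fail".
    -- example : 1 = 2 := rfl

    -- Whereas this is accepted, and being accepted is the verification:
    example : 2 + 2 = 4 := rfl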
You misunderstand theorem provers. This isn't "all abstractions leak" land -- you do not need to trust the libraries, you only need to trust the kernel.
Trusting the kernel is not trivial either, but this is still a major leap over informal proofs. With informal proofs you do in fact need to trust the "libraries" (culture, the knowledge of other people), because there is no practical way to boil down to axioms.
As an amateur mathematician whose use of sets is mostly as a lingua franca for describing other things, it's not clear to me what implications this has for the wider mathematical field, especially if the usefulness of NF is comparable to the established ZFC and its variants. Is it expected to become as popular as ZFC in machine proofs?
I do find the existence of a universal set more intuitive, so if nothing else this proof has rekindled my own interest in formalisation.
From my naive and amateur view, the relative consistency result makes NF at least as useful as ZFC, since every model of ZFC can be extended into a model of NF. But it seems it won't make NF all that useful unless:
1. We prove NF is inconsistent. Then ZFC is also inconsistent (and the stars start winking out in the night sky ;)
2. We prove ZFC is inconsistent. Then there's still a chance NF is consistent (fingers crossed!)
I'm probably ignoring more practical "quality of life" benefits of NF, like being able to talk about proper classes, and side stepping Russell's paradox with stratified formulas.
It is no part of my agenda to promote the use of NF as an independent foundational system. It is a very odd one. But if someone wants to promote this, the consistency result says, it will work, at least in the sense that you are in no more danger of arriving at a contradiction than you are in ZFC.
I wonder whether this will eventually lead to collaborative proofs and ‘bug fixes’, essentially turning maths into a process similar to code on GitHub.
https://adam.math.hhu.de/#/g/leanprover-community/NNG4
Yea I’ve run through that a couple of years ago - was brilliant, had a lot of fun. But I mean to stay up to date and somehow contribute from the sidelines