It's even more subtle than that. They do coincide in a sense, which is proven by Gödel's completeness theorem (well, at least in first-order logic). That one says that a sentence is provable from some axioms if and only if it's true in every interpretation that satisfies the axioms.
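In symbols (just restating the above), with T the set of axioms and φ a sentence: T ⊢ φ (φ is provable from T) if and only if T ⊨ φ (φ holds in every model of T).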
So one thing that Gödel's first incompleteness theorem shows is that it's impossible to uniquely characterise even a simple structure such as the natural numbers by some "reasonable"[0] set of axioms - precisely because there will always be sentences that are true in some interpretations satisfying the axioms but not in others.
Unless you use second-order logic - in which case the whole enterprise breaks down for a different reason (because the completeness theorem doesn't hold for second-order logic).
[0] reasonable basically means that it must be possible to algorithmically check whether a given sentence is an axiom or not; otherwise you could just declare that "every true sentence is an axiom"
There, the obstacle is in some sense of the simplest nature: by the Löwenheim–Skolem theorems, if your set of axioms admits a countably infinite model, then it admits models of every infinite cardinality. In other words, it shows that there is something fundamentally impossible about trying to pin down an infinite structure (like the natural numbers) by finite means (e.g. a recursively axiomatizable theory).
As for this particular issue I think this wikipedia page is relevant: https://en.wikipedia.org/wiki/Impredicativity
An alternative approach is purely syntactic and sees a logical system as a collection of valid transformation rules that can be applied to the axioms. In this view, a sentence is true if it can be obtained from the axioms by applying a sequence of such rules. This purely syntactic notion of truth is known as “provability”.
Then the key question is whether the two notions coincide: one way to state Gödel's first incompleteness theorem is that they do not.
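Concretely: for any consistent, effectively axiomatized theory T that can express enough arithmetic, Gödel constructs a sentence G (informally, "G is not provable in T") which T cannot prove; yet G is true in the standard natural numbers precisely because it is unprovable. So "provable from T" and "true in the intended interpretation" come apart.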
I hope this isn't too far off topic, but can someone clarify exactly how this problem indicts axioms? As an uninformed and naive musing, it occurs to me that an issue with the statement "this statement is false" is this: the whole of the statement, that is, the thing having truth or falsehood, cannot be addressed by one of its components.
This being said, your intuition is absolutely correct: the crux of the issue is with ‘this’. What mathematicians realized is that if you are not careful with your choice of axioms, the resulting logical system becomes too “powerful”, in the sense that it becomes self-referential: you can construct sentences that refer to themselves in a self-defeating manner.
As others have mentioned, this is the idea underlying Gödel's incompleteness theorem but also, to some extent, Russell's paradox, which came before and is what the article is referring to. In Russell's paradox, the contradiction comes from constructing the set of all sets that do not contain themselves.
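Written out: let R = {x : x ∉ x}; then R ∈ R if and only if R ∉ R, and that biconditional is the contradiction.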
edit: Also, their point about the curriculum being similar misses the fact that the standard for doing well differs. Barely learning some material is different from understanding it well. I've taken various college math classes at a variety of institutions, and there is a very large difference in the effort I had to put in to do well at a top school versus a middling school. This is of course related to my first point about the quality of the students mattering: you simply cannot hold the same bar for such different student populations.
> Really, it’s harder than this picture suggests, because many experiences are based on other students. If I want you as my project partner but you want to forget I exist, then something has to give.
Perhaps the most famous example is provided by the Lp spaces [1] consisting of functions whose pth power is absolutely integrable. For p≥1, these spaces are Banach spaces (complete normed spaces) but it is only when p=2 that the norm is associated with an inner product.
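To spell the norms out (a quick aside, not from the article): ‖f‖_p = (∫ |f|^p)^(1/p), and for p = 2 this comes from the inner product ⟨f, g⟩ = ∫ f·g (real case) via ‖f‖_2 = ⟨f, f⟩^(1/2). One way to see that the other values of p don't work is that their norms fail the parallelogram law ‖f+g‖² + ‖f−g‖² = 2‖f‖² + 2‖g‖², which every norm induced by an inner product must satisfy.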
The significance of this paradigm shift is that it allowed mathematicians to apply some of the geometric intuition developed from the study of finite-dimensional spaces (such as the 3D Euclidean space) to difficult questions involving functions, such as the existence of solutions to certain differential equations.
The history of this change of perspective is absolutely fascinating and can be traced back to the end of the 19th century and beginning of the 20th century. At the time, work on axiomatic foundations of mathematics was driving a systematization of the study of mathematical objects by capturing their structure with a concise list of axioms. This is for example how the concept of an abstract vector space was born, encompassing not only Euclidean spaces but also infinite-dimensional spaces of functions.
An early reference already demonstrating this change of perspective, albeit in a primitive form, is a memoir by Vito Volterra from 1889 [1]. The PhD thesis of Maurice Fréchet from 1906 [2] is arguably the work that was most influential in crystallizing the new paradigm and presenting it in a modern form that served as a key reference for the first half of the 20th century. Of course, these are only two among a multitude of works from around that time. Looking at later developments in the 20th century, it is hard not to also mention the book by Stefan Banach from 1932 [3].
[1] https://projecteuclid.org/journals/acta-mathematica/volume-1...
[2] https://zenodo.org/record/1428464/files/article.pdf
[3] http://kielich.amu.edu.pl/Stefan_Banach/pdf/teoria-operacji-...
1. The animated graphic illustrating simulated annealing infuriates me. It is described (without calling it that) as solving a shortest Hamiltonian path problem. If you look at it, it is actually a shortest Hamiltonian cycle problem, a.k.a. the travelling salesman problem (TSP). TSP is the canonical example of a problem that simulated annealing and other metaheuristics are terrible at solving [1]. Proper mathematically-justified algorithms like Lin-Kernighan-Helsgaun [2] give better (often optimal!) results orders of magnitude faster. You can even solve TSP (an NP-hard problem) with optimality guarantees with Concorde, at sizes that beggar belief [3].
2. Saying that stochastic gradient descent is kind of the same as simulated annealing is quite a stretch. Gradient descent attempts to give local optima, full stop. Quite the opposite of simulated annealing. Now, there is an art in ML in choosing the step size (learning rate) and starting point. But the "stochastic" part is necessary to make it work on the huge problems that DNNs require, where computing a full gradient would be impossible. The claim that we use SGD to get better local optima is new to me. (For contrast, I've included a minimal sketch of the generic annealing loop after the footnote below.)
3. The mention of SAT/SMT is making the analogy do a ton of work here. The article admits it, but still, I struggle to understand how backtracking, a recursive, deterministic (full-search-space) enumerative algorithm, has anything in common with simulated annealing, a randomized iterative heuristic.
[1] http://www.math.uwaterloo.ca/tsp/usa50/index.html
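For reference, here is roughly what the generic annealing loop looks like (a minimal sketch; the names simulated_annealing, neighbor and cost, and the geometric schedule, are mine, not the article's). The point of contrast with gradient descent is the Metropolis acceptance step, which occasionally accepts uphill moves so the search can leave local optima:

    import math
    import random

    def simulated_annealing(initial, neighbor, cost, t0=1.0, cooling=0.9995, steps=100_000):
        # Generic simulated annealing over any state space: `neighbor` proposes
        # a random local move, `cost` is the objective to minimise.
        current, current_cost = initial, cost(initial)
        best, best_cost = current, current_cost
        t = t0
        for _ in range(steps):
            candidate = neighbor(current)
            delta = cost(candidate) - current_cost
            # Metropolis criterion: always accept improvements, and accept
            # worse candidates with probability exp(-delta / t).
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
            t *= cooling  # geometric cooling schedule
        return best, best_cost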
The theory of finite fields is based on the theory of prime numbers, because the finite fields are sets of residues modulo a prime number or modulo a power of a prime number.
The theory of finite fields is involved in the design of many other block ciphers and secure hash functions, and also in the design of the most important message-authentication methods, like GCM, which is used to authenticate this HTML page on the HN site.
So prime numbers are important in most cryptographic applications, not only in asymmetric cryptography, like Diffie-Hellman or RSA. Prime numbers are used in one way or another for the transmission of any HTTPS data packet, not only in the key establishment phase of a TLS connection.
It is not quite correct that the finite field of order p^k is the set of residues modulo p^k when k > 1. Instead, this field is obtained as an extension of the field of order p (which is the set of residues mod p), e.g. as the splitting field of x^(p^k) − x over it, or concretely as polynomials over it reduced modulo an irreducible polynomial of degree k.
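To make that concrete with a field block ciphers actually use: GF(2^8), as in AES, is F_2[x] modulo the irreducible polynomial x^8 + x^4 + x^3 + x + 1, not the integers mod 256. A minimal sketch of multiplication in it (the function name gf256_mul is mine):

    def gf256_mul(a: int, b: int) -> int:
        # Multiply two elements of GF(2^8) = F_2[x]/(x^8 + x^4 + x^3 + x + 1),
        # the field used by AES. Bytes encode polynomial coefficients, so
        # addition is XOR and this is not integer multiplication mod 256.
        product = 0
        for _ in range(8):
            if b & 1:
                product ^= a          # add the current multiple of a
            carry = a & 0x80
            a = (a << 1) & 0xFF       # multiply a by x
            if carry:
                a ^= 0x1B             # reduce x^8 to x^4 + x^3 + x + 1
            b >>= 1
        return product

For example, gf256_mul(0x02, 0x80) == 0x1B, because x · x^7 = x^8 reduces to x^4 + x^3 + x + 1.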