> Most "definitely harder than NP" problems require a nontrivial background in theoretical computer science or mathematics to understand.
One of the simplest examples of this is automatic program synthesis.
Searching for the optimal (e.g., shortest) program that satisfies a given set of input/output pairs (i.e., a function) is an undecidable problem, worse than exponential time and closely related to the halting problem. The search space already grows exponentially with program length (with the size of the instruction set as the base), and on top of that we don't know how many cycles each candidate needs to run (i.e., is it even practical to run?).
In applications like genetic programming, we deal with the halting problem by specifying a maximum number of cycles that the interpreter will run for each candidate. The ability of the synthesized programs to halt within a bounded time becomes part of the selection pressure: candidates that return poor or no results within the constraints are quickly eliminated. This can be taken further by making the candidates evolve their own resource limits as part of their genome.
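For concreteness, here's a minimal Python sketch of that cycle-capped evaluation loop (just the evaluation/selection side, no crossover or mutation shown). The toy stack language, the op names, and `MAX_CYCLES` are made up for illustration, not taken from any particular GP system:

```python
import random

MAX_CYCLES = 1_000  # hard cap: every candidate is cut off after this many steps

def run_with_budget(program, x, max_cycles=MAX_CYCLES):
    """Interpret a tiny toy stack language under a hard cycle cap."""
    stack, pc, cycles = [x], 0, 0
    while pc < len(program):
        cycles += 1
        if cycles > max_cycles:
            return None, False            # treated as non-halting
        op = program[pc]
        if op == "dup" and stack:
            stack.append(stack[-1])
        elif op == "inc" and stack:
            stack.append(stack.pop() + 1)
        elif op == "dec" and stack:
            stack.append(stack.pop() - 1)
        elif op == "add" and len(stack) >= 2:
            stack.append(stack.pop() + stack.pop())
        elif op == "jnz" and stack and stack[-1] != 0:
            pc = 0                        # crude backward jump: loops (and thus
            continue                      # non-halting programs) are possible
        pc += 1
    return (stack[-1] if stack else None), True

def fitness(program, examples):
    """Candidates that answer wrongly -- or fail to halt in budget -- score low."""
    score = 0
    for x, want in examples:
        got, halted = run_with_budget(program, x)
        if halted and got == want:
            score += 1
    return score

# Toy target: f(x) = 2x + 1, specified only by input/output pairs.
examples = [(1, 3), (2, 5), (5, 11)]
ops = ["dup", "inc", "dec", "add", "jnz"]
population = [[random.choice(ops) for _ in range(6)] for _ in range(50)]
best = max(population, key=lambda p: fitness(p, examples))
print(best, fitness(best, examples))
```

Note how the budget turns "does it halt?" into the decidable "does it halt within MAX_CYCLES?", which is exactly what makes the fitness function computable.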
Put differently, we can approximately solve the halting problem for some smaller problems with clever search techniques. I don't think talking about these things in absolute terms is very useful. Everything depends on the problem and the customer's expectations.
I don't know if I would call that "approximately solving the halting problem", and the use of the halting problem is already shorthand for much more general and less binary results. In the 1965 paper by Hartmanis and Stearns that gave birth to computational complexity theory [1], they generalise the halting theorem to say, in effect, that it's generally impossible to tell what a program will do (specifically, whether it halts) on its Nth step without simulating it for N steps. The halting theorem is just the special case where N is any N.
What you're doing isn't approximating or getting around anything. You're simply following the "generalised halting theorem": you're interested in what the program does in N steps, and you're finding that out by simulating it for N steps. You're not making any approximation that shortcuts any computational complexity results. (Such approximations exist in some cases, but that's not what you're doing here)
[1]: On the computational complexity of algorithms: https://www.ams.org/journals/tran/1965-117-00/S0002-9947-196...
> You're not making any approximation that shortcuts any computational complexity results.
I completely agree in the default/initial case.
> (Such approximations exist in some cases, but that's not what you're doing here)
However, the entire point of genetic programming is that once you find something that is "on the map" at all, you can begin to exploit the region around that program with reasonably accurate expectations of how certain behaviors will unfold in the local space. The larger the population, the more stable this tends to become on average.
So, for pure exploration in GP, you do indeed suffer the full weight of the halting theorem. As you transition into the exploitation regime, this becomes more of a grey area.
> I don't think talking about these things in absolute terms is very useful. Everything depends on the problem and the customer's expectations.
Agreed. In the real world, approximate solutions to hard problems are often good enough.
For instance, the TSP may be a hard problem, but solving it exactly is often pointless: the distances between nodes may be subject to measurement uncertainty anyway, and actual travel times aren't fixed; they follow a distribution.
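To make that concrete, here's a rough Python sketch of the classic nearest-neighbour heuristic (my own illustration, not anything proposed in the thread). It carries no optimality guarantee, but its error is often comparable to the noise in the distance data anyway:

```python
import math
import random

def nearest_neighbour_tour(points):
    """Greedy TSP heuristic: always visit the closest unvisited city next."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(points, tour):
    """Total length of the closed tour (returns to the start city)."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(100)]
tour = nearest_neighbour_tour(cities)
print(f"heuristic tour length: {tour_length(cities, tour):.3f}")
```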
Having said that, it’s still worth studying the theoretical aspects of these problems.
> There is one problem, though, that I find easily explainable. Place a token at the bottom left corner of a grid that extends infinitely up and right, call that point (0, 0). You're given a list of valid displacement moves for the token, like (+1, +0), (-20, +13), (-5, -6), etc, and a target point like (700, 1). You may make any sequence of moves in any order, as long as no move ever puts the token off the grid. Does any sequence of moves bring you to the target?
If someone gives you such a sequence, it seems trivial to verify it in linear time. Even for arbitrary dimensions, such a witness can be verified in linear time.
Roughly put, that is the certificate definition of being in NP.
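And indeed the linear-time check really is a few lines; a sketch (the function and argument names are mine):

```python
def verify_witness(start, moves, target, witness):
    """Check a proposed move sequence in O(len(witness) * dimension) time."""
    pos = list(start)
    for step in witness:                  # each step must be an allowed move
        if step not in moves:
            return False
        pos = [p + d for p, d in zip(pos, step)]
        if any(c < 0 for c in pos):       # token must never leave the grid
            return False
    return pos == list(target)

moves = [(1, 0), (-20, 13), (-5, -6)]
print(verify_witness((0, 0), moves, (2, 0), [(1, 0), (1, 0)]))  # True
```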
To be in NP, the witness must be verifiable in polynomial time with respect to the size of the original input. In this problem (VAS reachability), the shortest solution can be `2^2^2^...^K` steps long. Even if verification is linear in the witness, the witness itself is not polynomial in the size of the set of moves + goal.
Hmm... I'd love to see a more formal statement of this, because it feels unintuitive.
Notably, the question "given a number as input, output that many 1's" takes time exponential in the input size (the number is written in binary, but the output is unary). Is this problem therefore also strictly NP-hard?
Linear time in the length of the sequence, yes. But is the sequence length linear in the dimension size, or in the number of moves given? That's what is interesting.
But I guess it can be proven that the shortest possible sequence grows faster than any polynomial.
I agree with the issue ("What's bigger than a dog? THE MOON"), but I don't agree with the need to provide an example that is provably distinct from NP. We're fine providing NP-complete problems without proving that they're distinct from P.
There are lots of accessible problems that belong to spaces that probably aren't in NP. One that should be familiar to most people studying CS theory is "do these two regular expressions match the same set of strings?".
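(The general equivalence problem is PSPACE-complete, and EXPSPACE-complete if you add a squaring operator, but a bounded sanity check is trivial to write. A brute-force Python sketch comparing two patterns on every string up to length k -- a finite test, not a decision procedure:)

```python
import re
from itertools import product

def distinguish_up_to(pattern_a, pattern_b, alphabet="ab", k=8):
    """Compare full-match behaviour of two regexes on all strings of length <= k."""
    ra, rb = re.compile(pattern_a), re.compile(pattern_b)
    for n in range(k + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(ra.fullmatch(s)) != bool(rb.fullmatch(s)):
                return s                  # a distinguishing string
    return None                           # agreement so far proves nothing in general

print(repr(distinguish_up_to(r"(a|b)*", r"(a*b*)*")))  # None: these are equivalent
print(repr(distinguish_up_to(r"a*", r"a+")))           # '': the empty string differs
```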
---
For the Diophantine vector reachability problem... I don't really like the Quanta presentation. ("An easy-sounding problem yields numbers too big for our universe.")
First, the problem, if you describe it accurately, doesn't sound easy at all. Diophantine problems are never easy. That's why we have real numbers.
Second, the article suggests that the problem should be introduced to children by casting it in terms of several rules of exchange ("For Alice, the transition vector (−1, −1, 1, 0) would represent the exchange of an apple and a banana for a cantaloupe."). But that would make the problem trivial: you start at the origin; without a rule of "exchange" in which the other party gives you as much as you want of something for free, you're never going to leave it.
And third, many easy problems generate numbers too big for our universe. That's not unusual at all. Compare the question "how many digits of pi do you need before the potential error in calculating the volume of a sphere the size of the visible universe is less than the volume of one hydrogen atom?". Can you imagine using more digits than that? You just involved a number too big for the universe.
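The arithmetic there is short, if you grant some rough physical constants (the universe radius and Bohr radius below are assumed round values):

```python
from mpmath import mpf, pi, log10, ceil

R = mpf("4.4e26")    # assumed radius of the observable universe, metres
r = mpf("5.3e-11")   # assumed hydrogen (Bohr) radius, metres

# V = (4/3) * pi * R^3, so an absolute error eps in pi gives a volume error
# of (4/3) * eps * R^3. Requiring that to be below one atom's volume
# (4/3) * pi * r^3 gives eps < pi * r^3 / R^3.
eps = pi * r**3 / R**3
print(int(ceil(-log10(eps))))   # -> 111: about 111 decimal digits of pi suffice
```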
The article passes over this point itself:
> It’s physically impossible to write down all the digits of 2↑↑6 — a liability of living in such a small universe.
> Second, the article suggests that the problem should be introduced to children by casting it in terms of several rules of exchange ("For Alice, the transition vector (−1, −1, 1, 0) would represent the exchange of an apple and a banana for a cantaloupe."). But that would make the problem trivial: you start at the origin; without a rule of "exchange" in which the other party gives you as much as you want of something for free, you're never going to leave it.
I'm not sure what point you're trying to make here? That specific transition vector would be one such vector among several in a given ruleset. You're correct that for any ruleset, if I don't have at least one vector that can get me away from the origin, I'm stuck there, but then I also have to consider that I still might not be able to get to where I want to go. If I have the ruleset [(2, 0), (-4, 2)] I still can't get to (3, 5).
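A small breadth-first search makes that concrete. One caveat: BFS over a bounded window can confirm reachability, but it can't by itself prove unreachability on the infinite grid; for this particular ruleset a parity argument does that (both moves keep x and y even):

```python
from collections import deque

def reachable(moves, target, bound=50):
    """BFS from the origin; never leave the quadrant, never exceed `bound`."""
    seen, queue = {(0, 0)}, deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        if (x, y) == target:
            return True
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            if 0 <= nx <= bound and 0 <= ny <= bound and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return False  # not found within the bound (not a proof in general)

print(reachable([(2, 0), (-4, 2)], (3, 5)))   # False: both coordinates stay even
```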
Something I find fascinating is that we know that P != EXPTIME, and that P <= NP <= EXPTIME, but have managed to prove neither P != NP nor NP != EXPTIME. NP has to be somewhere between them but we have no idea where.
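Spelled out: since P != EXPTIME, at least one of the two inclusions must be strict; we just can't say which:

```latex
\mathsf{P} \subseteq \mathsf{NP} \subseteq \mathsf{EXPTIME}, \quad
\mathsf{P} \neq \mathsf{EXPTIME}
\;\Longrightarrow\;
\mathsf{P} \neq \mathsf{NP} \ \text{ or }\ \mathsf{NP} \neq \mathsf{EXPTIME}.
```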
Technically correct is the best kind of correct, and the Halting Problem is a fun head-scratcher for a green computer scientist. I'm glad it was part of my introduction to NP-hardness.
That said, you do make a great point OP, and I'll think about it every time the Halting Problem comes up.
This seems weird to me. I would think the classic way to introduce these topics is to first talk about decidability, introduce the halting problem through that, then introduce complexity. That's certainly how I was introduced to this material.
I wouldn't think that you'd learn about the halting problem as only an example of something harder than NP hard.
I agree. NP, NP-complete, and NP-hard are all talking about how long things take. If something is uncomputable, then how long it takes doesn't seem very relevant or interesting.
"If Jimmy earns 5 cents a day and a watermelon costs $10, how long will it take for Jimmy to earn enough money to buy the color blue?"
Maybe the connection is that it feels like the halting problem could be solved if you had an infinite amount of time?
The million dollar question here is... a literal million dollar question.
The natural way to go up in complexity from NP is along the polynomial hierarchy, but since it could be the case that P=NP, those problems aren't provably harder. For all we know, it could even be the case that P=PSPACE.
You could invoke the nondeterministic variant of the time-hierarchy theorem.
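Right. In its usual form (stated from memory, for time-constructible g), the nondeterministic time hierarchy theorem of Seiferas-Fischer-Meyer / Žák gives unconditional separations:

```latex
f(n+1) = o(g(n)) \;\Longrightarrow\; \mathsf{NTIME}(f(n)) \subsetneq \mathsf{NTIME}(g(n))
```

So, for example, NP ⊊ NEXPTIME, even though we can't separate NP from P.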
Is the problem mentioned in the article equivalent to deciding whether there exists a solution to a system of linear equations in the positive integers?
I think no, because the vector additions are considered in sequence, and at no point are you allowed to leave the positive quadrant. For example, with the moves (+1, −1) and (−1, +2), the linear system 3·(+1, −1) + 2·(−1, +2) = (1, 1) has a nonnegative integer solution, yet neither move is applicable at the origin, so (1, 1) is unreachable.
[Edit] Yeah, it's more than just solving positive integer linear systems: https://en.m.wikipedia.org/wiki/Vector_addition_system