So I don't buy the engineering angle, and I also don't think LLMs will scale up to AGI as imagined by Asimov or any of the usual sci-fi tropes. Something more fundamental is missing, as in missing science, not missing engineering.
Sorry, this is some bull. Either it works or it doesn’t.
Old lithium batteries are ticking time bombs. Swelling, leaking, and ready to ignite at the slightest spark or contact with moisture.
As their chemicals break down, one short-circuit can unleash a chain reaction of fire and toxic smoke. Even sitting forgotten in a drawer, they can suddenly swell, rupture, or explode without warning.
In construction, grading a site for building is a whole process involving surveying. If you dropped a person on a random patch of earth that hasn't previously been levelled and gave them no tools, it would be a significant challenge for that person to level the ground correctly.
What I'm saying is, your intuition that "I can look around me and find the minimum of anything" is almost certainly wrong, unless you have a superpower that no other person has.
When a computer has a surface with 2 dimensions in and 1 dimension out, you can actually just do the same thing. Check about 100 values along each of the x and y directions and you only have to evaluate 100 × 100 = 10000 points. A computer can do that easy peasy.
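To make the 2D brute-force case concrete, here is a minimal sketch of a grid search. The function names and the example surface `bowl` are made up for illustration; the only thing taken from the comment above is the 100-values-per-axis grid.

```python
def grid_search_2d(f, x_range, y_range, steps=100):
    """Evaluate f on a steps x steps grid; return (best_value, best_x, best_y, evaluations)."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    best = None
    evaluations = 0
    for i in range(steps):
        x = x_lo + (x_hi - x_lo) * i / (steps - 1)
        for j in range(steps):
            y = y_lo + (y_hi - y_lo) * j / (steps - 1)
            value = f(x, y)
            evaluations += 1
            # Keep the lowest value seen so far and where it occurred.
            if best is None or value < best[0]:
                best = (value, x, y)
    return best[0], best[1], best[2], evaluations


def bowl(x, y):
    # Hypothetical example surface with its true minimum at (1, -2).
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


best_value, best_x, best_y, evaluations = grid_search_2d(bowl, (-5.0, 5.0), (-5.0, 5.0))
print(evaluations, best_x, best_y)
```

The answer is only as precise as the grid spacing, so the reported minimum lands on the grid point nearest (1, -2), not exactly on it.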
When a computer does ML with a deep neural network, you don't have 2 dimensions in and 1 dimension out. You have thousands to millions of dimensions in and thousands to millions of dimensions out. If you have 100000 inputs, and you check 1000 values for each input, the total number of combinations is 1000^100000. Then remember that you also have 100000 outputs. You ain't doin' that much math. You ain't.
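For a sense of scale, here is a rough back-of-the-envelope sketch using the numbers above. It works with base-10 logarithms because the count itself is far too large to be worth printing. The evaluation rate of 10^18 per second is an assumption picked purely for illustration.

```python
import math

inputs = 100_000          # input dimensions, from the comment above
values_per_input = 1_000  # grid values checked per input, from the comment above

# log10 of the number of grid points: log10(1000 ** 100000) = 100000 * log10(1000)
log10_grid_points = inputs * math.log10(values_per_input)

# Assumed (hypothetical) machine that evaluates 10**18 points per second.
log10_evals_per_second = 18
log10_seconds = log10_grid_points - log10_evals_per_second

# For comparison, the 2D grid with 100 values per axis.
grid_points_2d = 100 ** 2

print(f"2D grid: {grid_points_2d} points")
print(f"high-dimensional grid: about 10^{log10_grid_points:.0f} points")
print(f"time at the assumed rate: about 10^{log10_seconds:.0f} seconds")
```

Even under that generous assumed rate, the exponent barely moves, which is why brute force is off the table here.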
So we need fancy stuff like Jacobians and backtracking.
It's just that when the function is implemented on the computer, evaluating so many points takes a long time, and using a more sophisticated optimization algorithm that exploits information like the gradient is almost always faster. In physical reality all the points already exist, so if they can be observed cheaply the brute force approach works well.
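As a rough illustration of exploiting the gradient, here is a minimal sketch of plain gradient descent on a toy 1000-dimensional bowl. The function, step size, and iteration count are all assumptions chosen for the example, and real neural-network losses are nowhere near this well behaved.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient; return the final point."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point


def bowl_grad(point):
    # Gradient of the hypothetical surface sum((p - 1)^2), whose minimum is at all ones.
    return [2.0 * (p - 1.0) for p in point]


dimensions = 1000
result = gradient_descent(bowl_grad, [0.0] * dimensions)
worst_error = max(abs(p - 1.0) for p in result)
print(worst_error)
```

This uses 100 gradient evaluations in 1000 dimensions, whereas a grid with 100 values per dimension would need 100^1000 function evaluations.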
Edit: Your question was good. Asking superficially-naive questions like that is often a fruitful starting point for coming up with new tricks to solve seemingly-intractable problems.
It does feel to me that we do some sort of sampling; it definitely is not a naive grid search.
Also, I find it easier to find the minima in specific directions (up, down, left, right) than in, say, a 42-degree one. So some sort of priors are probably used to improve sample efficiency.
We all can do it in 2-3D. But our algorithms don’t do it. Even in 2D.
Sure, if I were blindfolded, feeling the surface and looking for the direction of descent would be the way to go. But when I can see, I don’t have to.
What are we missing?
For all the pro-WFH/fully remote developers on HN who live in North America, you're going to be in for a surprise when your company decides to replace you with someone living in another country. Why hire you when the company can hire someone who costs 1/5 of you and is willing to work harder without complaining? Both of you are remote anyway. So what if the new hire works at night and sleeps during the day?
For all pro-WFH/fully remote developers living in North America, you should be cheering for return to office mandates. It'll probably save your career long-term.
Still, the approach is to put code and data in a folder and call it a day. Slap a "_FINAL" on the folder name and you are golden.