> AlphaGo can play very well on a 19x19 board but actually has to be retrained to play on a rectangular board.
This right here is the soft underbelly of the entire “machine learning as step towards AGI” hype machine, fueled in no small part by DeepMind and its flashy but misleading demos.
Once a human learns chess, you can give her a 10x10 board and she will perform at nearly the same skill level with zero retraining.
Give the same challenge to DeepMind’s “superhuman” game-playing machine and it will be an absolute patzer.
This is an obvious indicator that the state of the art in so-called “machine learning” doesn’t involve any actual learning in the sense that word carries when applied to intelligent systems like humans or animals.
I am continually amazed by the failure of otherwise exceedingly intelligent tech people to grasp this problem.
Try learning to ride a bike with inverted steering, try navigating the world with your vision flipped over, or use your non-dominant hand to do things that you normally do. Or try to write on an AZERTY keyboard if you are a QWERTY native (really, fk AZERTY :P).
Humans are also not a general intelligence.
In a certain sense Deep Reinforcement Learning is actually more general than human intelligence. For example, when playing games you can remove certain visual cues. That makes the game almost impossible for humans to play, while Deep RL scores will not even budge. It means that Deep RL is more general, because it does not rely on certain priors, but it also makes it more stupid in the narrow domain of human expertise. Try this game to see for yourself: https://high-level-4.herokuapp.com/experiment
Here is a bike with reversed steering: https://www.youtube.com/watch?v=MFzDaBzBlL0 Here is the flipped-vision experiment: https://www.youtube.com/watch?v=MHMvEMy7B9k
Human brains are amazing, but they also require a certain amount of time to retrain when inputs/outputs are fundamentally changed.
PS. I haven't heard of anyone testing different board sizes with AlphaZero-esque computer players. But I have seen Leela Zero beating very strong humans when the rules of the game were modified so that the human player could play 2 additional moves: https://www.youtube.com/watch?v=UFOyzU506pY
This is true too: we are adapted to our environment, in particular in the things that we do automatically (System 1).
Playing chess well is a combination of both conscious and unconscious skills. However when deep learning systems play, it is all the unconscious, automatic application of statistical rules. They are playing a very different game from the human chess game.
Because there is no abstract reasoning involved here, these systems cannot apply the lessons learned from chess to another board game, or to something completely different in life, which humans can and do. So even though they are much stronger than human players, they aren't strong in the same way.
>Once a human learns chess, you can give it a 10x10 board and she will perform at nearly the same skill level with zero retraining.
Interesting. Has this actually been shown? I would assume a lot of the strategies a human is familiar with would fall apart as well. I'm no chess or go player but I would have to learn new strategies in a tic-tac-toe game scaled to 10x10. I would certainly not be as proficient although I would still consider myself to have intelligence.
Almost all of the human strategies and concepts would still apply: center control, square control, development, initiative, king safety, the opposition, etc. The only exceptions would be fringe concepts like opening theory (already moot in Chess960) and endgame edge cases.
If you’re still not convinced, I’ll prove that skills transfer by playing bullet against anyone who can make a 10x10 variant playable online.
Why stick to chess? Magic: the Gathering is a game that is played with cards with printed rules text that describes how a card should be played. The set of cards is constantly updated with a few hundred new cards introduced at least twice a year (although games are often played with only a subset of all cards).
Despite the constant change of the card pool, and also the wording of the rules text on the cards, and the rules themselves, human players are perfectly capable of "picking up a card they've never seen before and playing it" correctly.
https://en.wikipedia.org/wiki/Magic:_The_Gathering
Perhaps a better example than 10x10 chess would be bughouse chess [1]. That's a chess variant played between two teams of two players using two sets and two clocks. It's a common break activity between rounds at amateur chess tournaments. Human chess players of all levels pick it up pretty fast after they play a handful of games.
[1] Detail on bughouse in this comment from an earlier discussion: https://news.ycombinator.com/item?id=20831586
I think a lot of AI research now is very narrow, but this ignores that there's also a lot of research in RL/etc that's working to solve the problem of generalization.
Meta learning is for solving similar problems from a distribution (like different sized boards in your chess example) and has taken off recently (only baby steps so far though). Modular learning is also becoming big, where concepts that are repeatedly used are stored/generalized.
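To make the meta-learning point concrete, here is a minimal Reptile-style sketch on a toy task family (1-D regression with a per-task slope); the model, task distribution, and step sizes are all invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_task():
        # Each "task" is a 1-D regression y = a * x with its own slope a.
        a = rng.uniform(-2.0, 2.0)
        x = rng.uniform(-1.0, 1.0, size=20)
        return x, a * x

    def adapt(w, x, y, lr=0.1, steps=5):
        # A few SGD steps on one task, starting from the shared initialization w.
        for _ in range(steps):
            grad = 2 * np.mean((w * x - y) * x)
            w = w - lr * grad
        return w

    w_init = 0.0   # the meta-learned initialization
    meta_lr = 0.1
    for _ in range(1000):
        x, y = sample_task()
        w_task = adapt(w_init, x, y)
        # Reptile update: nudge the shared initialization toward the adapted weights.
        w_init += meta_lr * (w_task - w_init)

    # A brand-new task from the same distribution now needs only a few steps to fit.
    x_new, y_new = sample_task()
    print(w_init, adapt(w_init, x_new, y_new))

The thing being learned is an initialization that adapts quickly across a distribution of related problems, not a policy for one fixed problem.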
Of course, if you mess with a function's inputs in ways it has never seen, it's going to "not understand" what's going on. This is an agent which only knows 8x8 space.
Train it on variable spaces, and you'll get an agent that can play on variable spaces. In fact, you can probably speed things up drastically by using transfer learning from a model which already learned 8x8 space and modifying the inputs and outputs to match the new state and action space.
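As a sketch of what that transfer might look like in PyTorch (the tiny network and shapes are made up; a real AlphaZero-style net is a deep residual tower, and none of this is DeepMind's actual code):

    import torch
    import torch.nn as nn

    class BoardNet(nn.Module):
        # The convolutional trunk is size-agnostic; only the head cares about board size.
        def __init__(self, channels=32):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
            self.policy_head = nn.Conv2d(channels, 1, 1)   # one move logit per square

        def forward(self, x):
            return self.policy_head(self.trunk(x)).flatten(1)

    old = BoardNet()                     # pretend this was trained on 8x8 positions
    new = BoardNet()                     # to be fine-tuned on the 10x10 variant
    new.trunk.load_state_dict(old.trunk.state_dict())   # transfer the trunk

    for p in new.trunk.parameters():     # optionally freeze it at first
        p.requires_grad = False

    print(new(torch.randn(1, 3, 10, 10)).shape)   # torch.Size([1, 100])

Because the head here is fully convolutional, even the output layer carries over to the larger board; the real work is the retraining against the new rules.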
What part of this do you think "exceedingly intelligent tech people" aren't grasping? Something qualitative? Do you think people in machine learning think of "learning" as literally meaning the same thing as the colloquial usage? What, precisely, are you attacking here? All the harsh anti-machine-learning viewpoints with no clarity are becoming exhausting.
The GP described the "hype machine" and the implication that deep learning is a step toward AGI. As far as I can tell, the "hype machine" is real in the sense that popular articles describe current methods as steps towards our broad concept of intelligence.
Certainly, someone close enough to the technical process of deep learning will admit that it is essentially an extension of logistic regression without any "larger" implications - at least some deep learning researchers are always clear to distinguish the activity from "human intelligence" (and even if a given researcher never parrots the hype train's mantra, they know it's there and inherently play some part in it).
But a more minimal assertion about deep learning is that it "generalizes well". And what does "well" mean in this context? In the few situations where data can be generated by the process, like AlphaGo, it can make a good average approximation of a function, but in most applications of deep learning it means "generalizes like a human" - especially in image recognition.
This comes together in the process of training AIs. Researchers take data that they hope represents a pattern of inputs and outputs in a human decision-making process and assume they can construct a good approximation of a function that underlies this data. A variety of things can go wrong: the input data can be selective in ways the researchers don't understand (there was a discussion about a large database of images from the net being biased just by the tendency of photographers to center their main subject), there can be no unambiguous "function" (a loan/parole AI that's inherently biased because it associated data that isn't legitimate, objective criteria for the decision sought), and so forth. Some tech people are aware of the problems here too, but this stuff is going out the door and being used in decisions affecting people's lives. Merely noting possible problems isn't enough here. These "exceedingly smart people" are still handing off their creations to other people who are taking them as something akin to miraculous decision makers.
The question we should be asking is how much retraining had to occur to accomplish the new task? If it's significantly less than what was needed to accomplish the original task, the algorithm has transferred its latent knowledge from the original task to the new one, which is significant.
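A rough way to put a number on that, as a sketch (the step counts below are invented):

    def transfer_score(steps_from_scratch, steps_fine_tuned):
        # Fraction of the original training effort saved by starting from the old model.
        return 1.0 - steps_fine_tuned / steps_from_scratch

    # e.g. if the 10x10 variant needs 50k steps after transfer vs. 1M from scratch:
    print(transfer_score(1_000_000, 50_000))   # 0.95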
Humans have orders of magnitude more neurons, more complicated neurons, more intricate neural structures, and their training data is larger and more varied.
In contrast to “machine learning”, which is merely a fancy way to say “data processing with massive compute”.
Agreed, except for the claim that training data is larger. Training data is often far smaller for humans. You probably saw a few cats before generalizing and understanding what a cat looks like. A neural net might require hundreds of thousands of samples, if not more, to be a robust classifier for cats. AlphaGo et al look at tens of millions of games; humans look at a small fraction of that.
This doesn't have much to do with the algorithms, and is more to do with the engineering decisions that went into AlphaGo and AlphaZero. They are designed to play one combinatorial game really well. With a bit of additional effort and a lot of additional compute, you could expand the model to account for multiple rule / scale variations, maybe even different combinatorial games.
I think the GP was noting the problem that AI can easily encounter situations beyond what it was designed for and simply fail, while human intelligence involves a more robust combination of behaviors, and thus humans can generalize in a much wider variety of situations.
Maybe, but they’re certainly not described that way by whoever is in charge of publishing DeepMind’s research:
“A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play”
https://deepmind.com/research/publications/general-reinforce...
If the system designer has to know the parameters of the challenge the system is up against, it should be obvious you can always add another parameter that the designer didn't know about and get a situation where the system will fail. This is much more of a problem in "real world situations" which no designer can fully describe.
I'm coming to suspect that even our data isn't enough for useful AI. Imagine you had a truly general sci-fi AI at your office. It still couldn't just look at your database and answer a simple question like "What was the difference in client churn rates between mobile and desktop last month?" or "What was the effect of experiment 1234 on per-client revenue?" Hell, a human couldn't do it. As far as the human or AI would know, you just presented it with a bunch of random tables. This matters because it's incredibly helpful to know which pieces are randomized. Which rows are repeated measurements as opposed to independent measurements. Which pieces are upstream of which others. There's so much domain knowledge baked into data, while we just expect an algorithm to learn from a simple table of floats.
The human state of the art solution seems to be going on slack and asking questions about the data provenance, which will decidedly not work for an automated approach.
A primary reason I can do a better job than a generic algorithm is because you told me where the data came from (or I designed the schema and ETL myself), while the algo can't make any useful assumptions because all that info is hidden.
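To make that concrete, here is a toy pandas version of the churn question; every table name, column, and the very definition of "churn" below is an assumption I had to supply, which is exactly the domain knowledge being discussed:

    import pandas as pd

    # Hypothetical schema: one row per client per month, with an activity flag.
    activity = pd.DataFrame({
        "client_id": [1, 1, 2, 2, 3, 3, 4, 4],
        "month":     ["2019-08", "2019-09"] * 4,
        "platform":  ["mobile"] * 4 + ["desktop"] * 4,
        "active":    [True, False, True, True, True, True, True, False],
    })

    # Assumed definition of churn: active in August but not in September.
    aug = activity[activity.month == "2019-08"].set_index("client_id")
    sep = activity[activity.month == "2019-09"].set_index("client_id")
    churned = aug["active"] & ~sep["active"]

    churn_rate = churned.groupby(aug["platform"]).mean()
    print(churn_rate)                                  # churn rate per platform
    print(churn_rate["mobile"] - churn_rate["desktop"])

Nothing in the raw table says which column identifies a client, what "active" means, or that months are comparable periods; all of that came from outside the data.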
I'm beginning to suspect that it could be relevant soon. If you wait for AI advanced enough to understand your documentation, you're making it harder than it has to be, kinda like Marcus is saying in the OP. You could solve this with just tons of raw data, but that seems unnecessarily hard. For a firm with the usual small dataset, maybe even unrealistically hard.
I'm not aware of any work or even sci-fi that addresses AGI with regard to this question, and would be curious if there's stuff out there?
Anyways, with some naive googling I found these references which seem interesting with regards to lineage and causality (for the query "lineage causal database schema"):
[0] Causality and Explanations in Databases. The "Related topics in Databases" section seems interesting. http://www.vldb.org/pvldb/vol7/p1715-meliou.pdf
[1] Duke's 'Understanding Data: Theory and Applications, Lecture 16: Causality in Databases' (by one of [0]'s authors). https://www2.cs.duke.edu/courses/fall15/compsci590.6/Lecture...
[2] Quantifying Causal Effects on Query Answering in Databases. This has some interesting definitions. https://www.usenix.org/sites/default/files/conference/protec...
[3] Causality in Databases. Seems like a more in-depth version of [0]. https://www.cs.cornell.edu/home/halpern/papers/DE_Bulletin20...
[4] A whole course on "Provenance and lineage". https://cse.buffalo.edu/~chomicki/cse703-s18.html
[5] Causality and the Semantics of Provenance. Defines "provenance graphs" and some properties. https://arxiv.org/pdf/1006.1429.pdf
If you don't know the sampling regime that generated the data, you might as well give up. The only solutions to this seem to be collecting and providing all the data to ML analysis, or using provenance to inform the training procedure, which requires human expertise.
However, with a more general AI, we would be able to tell it "this is where and how the data were collected" and it could make the necessary inferences. Fully general AI would also be able to ask the right questions and make reasonable guesses on its own. Everything you do now, and nothing like anything that's been developed.
To me the weak point of this article isn't in the thesis (of course everyone can agree that general intelligence is more useful than narrow, all else being equal) but that there was nothing said about how to get there. The only reason ML is currently resurgent is because we've figured out how to do something that works, while general intelligence has proven beyond our reach for 60+ years.
Edit: actually it seems to be "classical AI" and hybrid approaches and I guess for more details one would need to read the book.
I guess articles like this are worthwhile to temper expectations of what's possible with the current crop of technologies for those not in the field, which could help prevent another winter due to overinflated expectations.
I totally agree with your statement about AGI, but wouldn't be as pessimistic about neural networks in general. Of course our data is enough for useful deep neural models! Many problems can be solved without them, but in areas of computer vision and speech recognition they seem to be the best (currently known) choice.
Your point about AGI which needs to ask questions about data provenance is super interesting. Are you aware of the line of inquiry into active learning? It's fascinating and has a long history:
https://papers.nips.cc/paper/1011-active-learning-with-stati...
https://en.wikipedia.org/wiki/Active_learning_(machine_learn...
The point is that data provenance isn't encoded in the data. Like the difference between a schema where column X has type `int` versus X having type `do(int)` or something cleverer. If the way you get your causal model is to ask the person who ran the experiment, then it's very much an uphill battle for an algorithm to get a causal model. We want to enable automated causal inference, so we should better record our causal models (data lineage).
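A minimal sketch of what recording the causal/provenance information next to the schema could look like; the annotation scheme and all the names here are entirely hypothetical:

    from dataclasses import dataclass

    @dataclass
    class Column:
        name: str
        dtype: str
        randomized: bool = False   # was this value assigned by an experiment (a do())?
        derived_from: tuple = ()   # upstream columns, i.e. coarse lineage

    schema = [
        Column("experiment_arm", "int", randomized=True),
        Column("platform", "str"),                        # observational
        Column("revenue", "float", derived_from=("experiment_arm", "platform")),
    ]

    def safe_to_estimate_effect(treatment: str, outcome: str) -> bool:
        # Only treat a comparison as causal if the treatment column was randomized
        # and the outcome is recorded as downstream of it.
        cols = {c.name: c for c in schema}
        return cols[treatment].randomized and treatment in cols[outcome].derived_from

    print(safe_to_estimate_effect("experiment_arm", "revenue"))  # True
    print(safe_to_estimate_effect("platform", "revenue"))        # False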
I've been waiting for the Symbolic/NN pendulum to start swinging back the other way and settle in the center. NN/DL is great for the interface between the outer world and the inner world of the mind (pattern recognition and re-construction), while symbolic AI more straightforwardly represents "language of the mind" tasks and easily handles issues like explanation and other meta-behaviors that are difficult for DL due to its black-box nature. DL's reliance on extension/training vs. intention/rules can develop ad-hoc emergent theories, which is its strength but also its weakness, as these theories may not be correct or complete. Each can be brittle in its own way - so it'll be interesting to see more cross-pollination.
https://www.cyc.com/
Lenat's Cyc? I'm quite familiar. I hadn't looked at it in about 25 years until I just happened to come across the ATT-CYC docs a few days ago (I seem to be missing Part 2) and a printout of a PPT that he gave a group of us at Microsoft in 1994/5 or so.
SAT solvers are really fast now. Some sort of "neural SAT problem definition" followed by solving it seems to be an interesting direction, but I'm relatively naive on it all. Not sure how training would work since there's no backprop through Boolean logic.
I'm not an expert but it seems like whatever symbolic reasoning humans have is pretty rudimentary anyways compared to what we're doing on computers already, so I could see the union being very powerful.
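The solving half of that pipeline is easy to play with today; here is a deliberately tiny DPLL-style solver (clauses as lists of signed integers, the usual DIMACS-like convention) just to show what a symbolic back end would consume. It's a toy, not how production solvers work:

    def dpll(clauses, assignment=()):
        # clauses: list of clauses, each a list of non-zero ints (negative = negated variable).
        if any(len(c) == 0 for c in clauses):
            return None                     # an empty clause means a conflict
        if not clauses:
            return assignment               # every clause satisfied
        lit = clauses[0][0]                 # branch on the first literal we see
        for choice in (lit, -lit):
            reduced = []
            for c in clauses:
                if choice in c:
                    continue                # clause satisfied, drop it
                reduced.append([l for l in c if l != -choice])
            result = dpll(reduced, assignment + (choice,))
            if result is not None:
                return result
        return None

    # (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
    print(dpll([[1, 2], [-1, 3], [-2, -3]]))   # e.g. (1, 3, -2)

The open question raised above — how to train the "neural problem definition" end when the solver itself isn't differentiable — is exactly the part this sketch doesn't touch.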
I started reading Rebooting AI last night. I think that Marcus and Davis (so far in the book) take a reasonable approach by wanting to design robust AI. Robust AI requires general real world intelligence that is not provided by deep learning.
I have earned over 90% of my income over the last five or six years as a deep learning practitioner. I am a fan of DL based on great results for perception tasks as well as solid NLP results like using BERT like models for things like anaphora resolution.
But, I am in agreement with Marcus and Davis that our long term research priorities are wrong.
As much as I'm hoping there'll be a breakthrough in AGI, maybe the right approach is the one AlphaGo was using: DL not as the top-level decision-making, but plugged into a traditional decision-making algorithm in specific places.
Yes. Though arguably this is just the same thing as the traditional approach to deep learning, which is to select features to input into a model. You don't always have to train on raw information: pre-processing the input data, calculating specific features that we know, as humans, will be important for the final result, and feeding those in addition to the raw data is a common approach. I don't see much difference between this and taking the output of a network and running it through a few decision trees. Most publishable projects applying AI typically have this type of human interaction on both sides of the model.
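A toy version of that pattern — a frozen learned representation plus hand-picked features feeding a small, inspectable model on top. The random projection below is just a stand-in for a trained network's embedding, and the data is synthetic:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # Pretend "raw" inputs and a frozen feature extractor (stand-in for a trained net).
    X_raw = rng.normal(size=(200, 16))
    W_frozen = rng.normal(size=(16, 4))
    def extract_features(x):
        return np.tanh(x @ W_frozen)        # a learned embedding would go here

    # Hand-engineered features a human knows matter, concatenated with the embedding.
    hand_features = X_raw[:, :2] ** 2
    X = np.hstack([extract_features(X_raw), hand_features])
    y = (X_raw[:, 0] + X_raw[:, 1] > 0).astype(int)

    # A small, inspectable decision model sits on top of the learned representation.
    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(clf.score(X, y))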
> As much I'm hoping there'll be a breakthrough in AGI
I think it probably won't be one breakthrough, but several, over decades. Personally, I'm pretty happy that AGI is taking a long time to materialize. We likely won't see a "fast takeoff scenario" (the computer is learning at a geometric rate !!1). It will likely happen gradually over years (progressively more intelligent, more aware computer systems), and we may have a chance to adapt in response.
I think that’s fair; deep learning today has an issue with learning guide rails, and obviously it is only as good as the data you feed it. I think it’s fair that our models need more
Making 90% of your income off of this tech over the last n years is different than that tech being successful. I work at a very large company that is trying to use ML and AI in all kinds of places. The trend I am seeing is that most of that effort is falling flat, really flat, in fact. They have success in places where regular algorithms would also succeed, but just having regular developers design matching systems and do pretty basic statistics isn't sexy in terms of marketing, so they hush it all up and pretend that ML and neural nets and things are the only way forwards. I don't think it is. Our problem is that we have too many ETL robots who aren't very intelligent people and not very forward-thinking themselves!
The cutting edge NLP stuff just showcased at my company was pretty lame, too. I barely saw any statistically significant results at all, and yet they rather unscientifically proclaim success because they got any effect at all. Some of what we do in our field doesn't matter because it comes down to whether a customer got a second call back and got converted to some minor sale or added to a program. It's throwaway work that creates goodwill at conferences and talks. We make a big deal out of it.
We are spending hundreds of millions on projects, trying to save money on generating leads, reducing interactions with customers and vendors through staffed phone banks, and so on. My company has hired all kinds of academics and research type people and has given them titles of "Distinguished this" and "Principal that" and honestly there's not that much to show for it, maybe zero direct outcomes so far. What galls me the most is in all the conferences and demos they are showing off things like High School robotics vehicles and AI parlor tricks and astonishingly little has translated into the business we do. Meanwhile, there are people in the company who do know how to reduce costs and get more done and have outstanding outcomes, but their techniques are not sexy and thus unimportant to the PT Barnum MBAs running our company. I'm sure that's true most everywhere, of course.
These Principals and Distinguisheds all keep proclaiming success while cashing fat paychecks. Meanwhile this year, our stock has had a tough go of it, so I'm curious whether these attempts will continue. The market takes no prisoners. Sure we get a lot of mileage out of looking cool for the recent grad crowd purposes of recruiting--kids want sexy, cool tech projects to work on and words like "insurance" turn them off, so there's that, I guess.
My take on that is that it won't be long before all those new recruits will figure out they got bait and switched pretty bad and that they aren't going to get to work on any of this sexy ML and AI stuff anymore than I am in my role. I got lured in by Data Science (because PhD), which just shows how gullible I am, but at least some of that traditional statistical modeling is having an impact here and there. The problem again is that even that is overblown by a couple of orders of magnitude! In my project, we're simply trying to get more real-time data out to people who need it without having to call in to get it and that is ridiculously difficult because of all the systems we try to knit together and how overall terrible our data quality is. And now my boss wants to build out an "analytics engine" to capture some of this sexy ML and AI stuff. It leads me to believe that the people involved are most interested in getting promoted and not much more.
Anyways, it is cool tech, but American taxpayers and people who are forced to buy our products are paying for it and I rather think they would prefer to spend their money in some better fashion.
Thanks for your response. Sure, the level of hype is rather high for DL.
That said, I also lived through and worked through the level of hype around expert systems. I think the high level of hype around expert systems in the 1980s was much more extreme and unwarranted than the DL hype levels. I base this on selling expert system tools for both Xerox Lisp Machines and for the Macintosh when it was released in 1984. Some of my customers did cool and useful things, but nothing earth-shaking.
At least DL provides very strong engineering results for some types of problems.
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
There are some surprisingly weak arguments in the text. It's correct to not treat computational resources as constant, but to treat them as unimportant or negligible is awful as well.
Already computational resources are becoming prohibitive with only a few institutions producing state of the art models at high financial cost. If the goal is AGI this might get exponentially worse. Intelligence needs to take resource consumption into account. The models we produce aren't even close to high level reasoning and we're already consuming significantly more energy than humans or animals, something is wrong.
The scale argument isn't great either, because deep learning is running into the inverse issue of classical AI. Now instead of having to program all logic explicitly, we have to formulate every individual problem as training data. This doesn't scale either. If an AI gets attacked by a wild animal, the solution can't be to first produce 10k pictures of mauled victims; intelligence includes reasoning about things in the absence of data. We can't have autonomous cars constantly running into things until we provide huge amounts of data for every problem; this does not scale either.
> Already computational resources are becoming prohibitive with only a few institutions producing state of the art models at high financial cost.
That is something of an illusion.
Obviously there will be some sort of uneven distribution of computing power; some institutions will have more, some less. The institutions with more power will create models at the limit of what they can do, because that is the best use of their power.
So if the thesis of more power = more results holds, then truly cutting-edge results will always be produced by people with resources that are practically unattainable by everyone else. Google's AlphaGo wasn't a particularly clever model, for example. It just had a lot of horsepower behind it to train it and the various ranging-shot attempts DeepMind would have gone through. Someone else would have figured it out, albeit more slowly, in a few years as computing power became available.
Computational power is still getting exponentially more affordable [0]. Costs aren't really rising, so much as the people who have spent more money get a few years ahead of everyone else and can preview what is about to become cheap.
[0] https://aiimpacts.org/recent-trend-in-the-cost-of-computing/
This argument completely ignores anything statistical, which is another limiting factor. It reminds me of the difference between bandit research and full RL research. RL researchers are fine throwing a thousand years of experience at their algorithm, because it can be simulated. Meanwhile, people using bandits in the real world care about statistical efficiency (learning a lot with little data), and it's reflected in the research. Most decisions aren't made with a huge abundance of data (most of us aren't Google or Facebook).
> General AI also ought to be able to work just as comfortably reasoning about politics as reasoning about medicine. It’s the analogue of what people have; any reasonably bright person can do many, many different things.
The average human has extreme difficulty reasoning about politics, while usually being reasonable on medicine (anti-vax being one of many exceptions). And it seems strange to expect a skilled pianist to also be a skilled neuroscientist or a skilled construction worker. On the other hand these people all use similar neural architectures (brains). So he seems pretty off-track when he criticizes "narrow AI" in favor of "general AI", as if there's some magic AI that will do everything perfectly, and even more off track when he criticizes researchers for using "one-size-fits-all" technologies, when indeed that is exactly what humans have been doing for millennia for their cognitive needs.
And sure, ML models in publications so far are typically one-off things that react poorly to modified inputs or unexpected situations. But it's not clear this has any relevance to commercial use. Tesla is still selling self-driving cars despite the accidents.
Total straw man. He actually uses an intern as an example in the very next sentence after what you quoted, as you would expect them to be able to read and get up to speed on a new area regardless of what it was. Meanwhile SOTA in NLP is a system that can be built to answer a single kind of question but can't explain why it did so or do anything useful if given an explanation of why its answer was wrong.
There are deep models like BERT that do pre-training and then need minimal training to do multiple tasks such as question answering, entailment, sentiment analysis, etc. I don't know about "explaining" an answer but there are debuggers that find errors in data sets: https://arxiv.org/pdf/1603.07292.pdf.
But as I said, I don't see why an artist would suddenly get up to speed as a construction worker. He seems to overestimate the capacity of interns as well.
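For reference, the pre-train/fine-tune pattern mentioned above looks roughly like this with the Hugging Face transformers library; a sketch only — the two-sentence "dataset" is a placeholder, and the pretrained weights are downloaded on first use:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Pretrained BERT body plus a fresh classification head for a new downstream task.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    texts = ["the movie was great", "the movie was terrible"]   # placeholder data
    labels = torch.tensor([1, 0])
    batch = tok(texts, padding=True, return_tensors="pt")

    # A few steps of fine-tuning is the "minimal training" mentioned above.
    optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):
        out = model(**batch, labels=labels)
        out.loss.backward()
        optim.step()
        optim.zero_grad()

    model.eval()
    print(model(**batch).logits)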
In cognitive science we talk about having cognitive models of things. So I’m sitting in a hotel room, and I understand that there’s a closet, there’s a bed, there’s the television that’s mounted in an unusual way. I know that there are all these things here, and I don’t just identify them. I also understand how they relate to one another. I have these ideas about how the outside world works. They’re not perfect. They’re fallible, but they’re pretty good. And I make a lot of inferences around them to guide my everyday actions.
The opposite extreme is something like the Atari game system that DeepMind made, where it memorized what it needed to do as it saw pixels in particular places on the screen. If you get enough data, it can look like you’ve got understanding, but it’s actually a very shallow understanding. The proof is if you shift things by three pixels, it plays much more poorly. It breaks with the change. That’s the opposite of deep understanding.
Of course. There are infinitely many ways to interpret perceptions and a finite subset of possible valid ones.
It's from among those possible interpretations that the AI will end up being a concrete implementation of an ideology.
And selecting which one is always done by humans.
This article doesn’t have any substance. It’s full of anecdata like shifting by 3 pixels to mess up a video game AI or some vague nonsense about “a model of this chair or this tv mounted to the wall.” It’s all casual hypotheticals.
There’s plenty of research on Bayesian neural networks for causal inference. But even more, a lot of causal inference problems are “small data” problems where choosing a strongly informative prior to pair with simple models is needed to prevent overfitting and poor generalization and to account for domain expertise.
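As a concrete (if simplistic) illustration of the "strongly informative prior plus simple model" point, here is conjugate Bayesian linear regression in closed form; the prior strength, noise level, and data are arbitrary numbers for the sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny "small data" problem: 8 observations, 3 features.
    X = rng.normal(size=(8, 3))
    true_w = np.array([0.5, -1.0, 0.0])
    y = X @ true_w + 0.1 * rng.normal(size=8)

    # Gaussian prior w ~ N(prior_mean, (1/alpha) I) encodes domain expertise;
    # larger alpha means stronger shrinkage toward the prior mean.
    alpha, noise_var = 10.0, 0.1 ** 2
    prior_mean = np.zeros(3)

    # Conjugate posterior: precision A = alpha*I + X^T X / sigma^2,
    # mean = A^{-1} (alpha*prior_mean + X^T y / sigma^2).
    A = alpha * np.eye(3) + X.T @ X / noise_var
    posterior_cov = np.linalg.inv(A)
    posterior_mean = posterior_cov @ (alpha * prior_mean + X.T @ y / noise_var)

    print("posterior mean:", posterior_mean)
    print("posterior std :", np.sqrt(np.diag(posterior_cov)))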
Deep learning practitioners generally know plenty about this stuff and fully understand that deep neural networks are just one tool in the tool box, not applicable to all problems and certainly not approaching any kind of general AI solution that supersedes causal inference, feature engineering, etc.
This article is just a sensationalist hit job trying to capitalize on public anxieties about AI to raise the profile of this academic and try to sell more copies of his book.
I’d say, let’s not waste time on this crap. There are engineering problems that deep learning allows us to safely & reliably solve where other methods never could. We absolutely can trust these models for specific use cases. Let’s just get on with doing the work.