I feel like this needs an editor to have a chance of reaching almost anyone… there are ~100 section/chapter headings that seem to have been generated through some kind of psychedelic free association, and each section itself feels like an artistic effort to mystify the reader with references, jargon, and complex diagrams that are only loosely related to the text. And all wrapped here in a scroll-hijack that makes it even harder to read.
The effect is that it's unclear at first glance what the argument even might be, or which sections might be interesting to a reader who is not planning to read it front-to-back. And since it's apparently six hundred pages in printed form, I don't know that many will read it front-to-back either.
From a rhetorical perspective, it's an extended "Yes-set" argument or persuasion sandwich. You see it a lot with cult leaders, motivational speakers, or political pundits. The problem is that you have an unpopular idea that isn't very well supported. How do you smuggle it past your audience? You use a structure like this:
* Verifiable Fact
* Obvious Truth
* Widely Held Opinion
* Your Nonsense Here
* Tautological Platitude
This gets your audience nodding along in "Yes" mode and makes you seem credible, so they tend to give you the benefit of the doubt when they hit something they aren't so sure about. Then, before they have time to really process their objection, you move on to, and finish with, something they can't help but agree with.
The stuff on the history of computation and cybernetics is well researched with a flashy presentation, but it's not original nor, as you pointed out, does it form a single coherent thesis. Mixing in all the biology and movie stuff just dilutes it further. It's just a grab bag of interesting things added to build credibility. Which is a shame, because it's exactly the kind of stuff that's relevant to my interests[3][4].
> "Your manuscript is both good and original; but the part that is good is not original, and the part that is original is not good." - Samuel Johnson
The author clearly has an Opinion™ about AI, but instead of supporting it, they're trying to smuggle it through in a sandwich, which I think is why you have that intuitive allergic reaction to it.
[1]: https://changingminds.org/disciplines/sales/closing/yes-set_...
[2]: https://en.wikipedia.org/wiki/Compliment_sandwich
[3]: https://www.oranlooney.com/post/history-of-computing/
[4]: https://news.ycombinator.com/item?id=45220656#45221336
https://wii-film.antikythera.org/ - This is a 1-hour talk by the author which summarizes what seems to be the gist of the book. I haven't read the book completely. I read a few sections.
Personally, I think the book does not add anything novel. Reading Karl Friston and Andy Clark would be a better investment of time if the notion of predictive processing seems interesting to you.
I guess I am the odd one out here. Reading it front-to-back has been a blast so far, and even though I find my own site's design to be a bit more readable for long text, I certainly appreciate the strangeness of this one.
Ooh, that looks very cool. A concrete definition of AGI is much needed, along with a scientifically backed (in the correct domains) operationalization of that definition that allows direct comparisons between humans and current AIs, one that isn't impossible for humans and isn't easy for AIs to saturate.
I got the same impression as well. I think I've become so cynical about this kind of thing that whenever I see it, I immediately assume bad faith / woo and just move on to the next article to read.
This discussion is not complete without a mention of Marcus Hutter’s seminal book[0] “Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability”. It provides many of the formalisms upon which metrics of intelligence are based. The gaps in current AI tech are pretty explainable in this context.
[0] https://www.hutter1.net/ai/uaibook.htm
If you've read the book, please elaborate and point us in the right direction, so we don't all have to do the same just to get an idea how those gaps can be explained.
I'm going to give my own perspective on it; it is not necessarily reflective of how the book itself discusses things.
The linked multimedia article gives a narrative of intelligent systems, but Hutter and AIXI give a (noncomputable) definition of an ideal intelligent agent. The book situates the definitions in a reinforcement learning setting, but the core idea is succinctly expressed in a supervised learning setting.
The idea is this: given a dataset with yes/no labels (and no repeats in the features), and a commonsense encoding of Turing machines as binary strings, the ideal map from an input to a probability distribution over labels is defined by:
1. taking all Turing machines that decide the input space and agree with the labels of the training set, and
2. on a new input, outputting exactly the distribution obtained by counting all such machines that accept vs. reject the input, with each machine's mass weighted by the reciprocal of 2 raised to the power of its encoding length, and the weighted counts then normalized. This is, of course, a noncomputable algorithm.
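To make the two steps concrete, here is a minimal, computable sketch in Python of the same weighting scheme. It is only an illustration: the set of "all Turing machines that decide the input space" is replaced by a small hand-picked list of boolean predicates with made-up description lengths, and the names in it (HYPOTHESES, posterior_yes) are assumptions of the sketch, not part of Hutter's formalism.

    # Toy, computable stand-in for the noncomputable ideal predictor sketched above.
    # The hypothesis list and its "description lengths" are illustrative assumptions;
    # the real definition ranges over all Turing machines that decide the input space.
    from typing import Callable, Dict, List, Tuple

    # Each entry is (description_length_in_bits, predicate on inputs).
    HYPOTHESES: List[Tuple[int, Callable[[int], bool]]] = [
        (3, lambda x: x % 2 == 0),             # "x is even"
        (4, lambda x: x % 3 == 0),             # "x is divisible by 3"
        (5, lambda x: x < 10),                 # "x is small"
        (8, lambda x: x % 2 == 0 and x < 10),  # conjunction: longer description
        (2, lambda x: True),                   # "always yes"
        (2, lambda x: False),                  # "always no"
    ]

    def posterior_yes(train: Dict[int, bool], x_new: int) -> float:
        """Probability of a yes label for x_new, mixing every hypothesis that
        agrees with the training labels, weighted by 2^(-description length)."""
        yes_mass = no_mass = 0.0
        for length, h in HYPOTHESES:
            if all(h(x) == y for x, y in train.items()):  # step 1: keep consistent machines
                weight = 2.0 ** (-length)                 # step 2: simplicity weighting
                if h(x_new):
                    yes_mass += weight
                else:
                    no_mass += weight
        total = yes_mass + no_mass
        return yes_mass / total if total else 0.5         # nothing consistent: stay uninformative

    if __name__ == "__main__":
        train = {2: True, 4: True, 7: False}   # consistent with "x is even" (and one longer rule)
        print(posterior_yes(train, 6))         # high: the short even-predicate dominates
        print(posterior_yes(train, 9))         # low

The shortest surviving predicate dominates the mixture, which is exactly the Occam's-razor behavior described next; in the real definition the weight comes from the length of the machine's binary encoding rather than a hand-assigned number.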
The intuition is that if a simply patterned function from input to output exists in the training set, then there is a short, simply described Turing machine that captures that function, and so that machine's opinion on the new input is given a lot of weight. But more complex patterns are also plausible, and we consider them too.
What I like about this abstract definition is that it is not in reference to "human intelligence" or "animal intelligence" or some other anthropic or biological notion. Rather, you can use these ideas anytime you isolate a notion of agent from an environment/data, and want to evaluate how the agent interacts/predicts intelligently against novel input from the environment/data, under the limited input that it has. It is a precise formalization of inductive thinking / Occam's razor.
Another thing I like about this is that it gives theoretical justification for the double-descent phenomenon. It is a (noncomputable) algorithm that gives the best predictor, but it is defined over the largest possible hypothesis space (all Turing machines that decide the input space). It suggests that whereas prior ML methods got better results with architectures carefully designed to make bad predictors unrepresentable, it is also not idle, if you have a lot of computational resources, to use an architecture that defines an expressive hypothesis space and instead softly prioritize simpler hypotheses through the learning algorithm (regularization is one approximation of this). This allows your model to learn complex patterns in the data that you did not anticipate, if the evidence in the data justifies it, whereas a small, biased hypothesis space could not represent such an unanticipated but significant pattern.
Note that under this definition you might also want to handle the case where the observations are noisy but you want to learn the underlying trend without the noise. You can adapt the definition to noisy input by, for example, accompanying each input with a distinct sequence number or random salt, then considering the marginal distribution for numbers/salts not in the training set (there are some technical issues of convergence, but the general approach is feasible); this models the noise distribution as well.
I didn't read the book, but I'd advise people not to go into mysticism; it has brought us very little compared to the scientific method, which has powered our industrial and information revolutions.
Dive into the Mindscape podcast, investigate complex systems. Go into information theory. Look at evolution from an information theory perspective. Look at how intelligence enables (collective) modeling of likely local future states of the universe, and how that helps us thrive.
Don’t get caught in what at least I consider to be a trap: “to use your consciousness to explain your consciousness”. I think the jump is, for now, too large.
Just my 2 cents. FWIW, I consider myself a cocktail philosopher. I do have a PhD in Biophysics; it means something to some, although I myself consider it of limited value.
So, did anyone here actually read the book? I’m halfway through and I think there are compelling ideas around how self-replication emerges naturally from a fundamentally computational universe and how that leads to increasingly complex computation (and ultimately “intelligence”). The book definitely has Wolfram vibes but it’s thought provoking to draw a connecting line through many domains like the author does. It’s best treated as pop-sci, like most of the AI literature.
I read the whole book. Sure, an editor could have tightened it up, but it was an enjoyable tour of so many topics that I enjoy. So often I will read a paper or book on just one of these topics; seeing so many of them together was fun and made me think more about how they all intertwine. I have studied biology, physics, computer science, and finance, so I love bouncing across them. It explores defining life and then intelligence, and while I might personally have a similar but slightly different definition of intelligence, I loved comparing them.
There was a concept buried in it: before a system evolves replication, it will first learn how to copy badly, then better, then replicate. This might feel like a minor and obvious statement, but I have never seen it called out before, and it is a concept I have discussed many times with people since. Some obvious consequences follow, such as: if I want a system to obtain replication, my initial selection filter should be for copying. I am able to induce replication in a system for less input energy via this process. But it can also be flipped around: being hyper-aware of systems that are copying badly, knowing they are much more likely to phase-shift to full replication shortly. I see this everywhere, from ideas to startups and even finance.
And to nerd-snipe everyone here: I spotted a bug in the Brainfuck code, which is still in the online copy. Can you find it?
I don't really concern myself with consciousness or intelligence.
It seems to me as a mereological nihilist that meaning primitives are state changes.
The existence of syntactic meaning depends on state changes.
The existence of semantic meaning depends on syntax.
The existence of complex meaning generators like humans is dependent on less complex meaning in DNA and RNA.
We are meaning generators; empirically, we are the generators of the most complex meaning yet observed, and you can follow the causal chain all the way down to state changes.
I don't know if the universe has an ultimate meaning, but I suspect that if we don't wipe ourselves out, we'd be part of the causal chain that creates and/or discovers it.
All the particles and information are tied to the system's/universe's entropy, so there is a timer on how long we have. But in the meantime:
We emit meaning like stars emit photons. We are the complex meaning generators of the universe. Maybe there are others out there, but we haven't met them yet, and until we do we should take precautions.
Without having actually read the book, it appears the author asserts that a large component of human intelligence can be reproduced by AI, and that perhaps the chaotic interactions that underpin human intelligence also allow nonliving systems, such as AI farms, to express intelligent behavior.
What he would like people to believe is that AI is real intelligence, for some value of real.
Even without AI, computers can be programmed for a purpose, and appear to exhibit intelligence. And mechanical systems, such as the governor of a lawnmower engine, seem able to seek a goal they are set for.
What AI models have in common with human and animal learning is having a history which forms the basis for a response. For humans, our sensory motor history, with its emotional associations, is an embodied context out of which creative responses derive.
There is no attempt to recreate such learning in AI. And by missing out on embodied existence, AI can hardly be claimed as being on the same order as human or animal intelligence.
To understand the origin of human intelligence, a good starting point would be Esther Thelen's book[0], "A Dynamic Systems Approach to the Development of Cognition and Action" (also MIT Press, btw).
According to Thelen, there is no privileged component with prior knowledge of the end state of an infant's development, no genetic program that their life is executing. Instead, there is a process of trial and error that develops the associations between senses and muscular manipulation that organize complex actions like reaching.
If anything, it is with the caregivers in the family system that knowledge of an end result resides: if something isn't going right with the baby, if she is not able to breastfeed within a few days of birth (a learned behavior) or not able to roll over by herself at 9 months, they will be the ones to seek help.
In my opinion, it is the caring arts, investing time in our children's development and education, that advance us as a civilization. There is now a separate track, the advances in computers and technology, that often serves as a proxy for improving our culture and humanity; it is easier to measure and easier to allocate funds to than the squishy human culture of attentive parenting, teaching, and caregiving.
[0] https://www.amazon.com/Approach-Development-Cognition-Cognit...
I have no problem with using the word intelligence to describe human-made systems, since the attribute artificial preserves the essential distinction. These systems inhabit the second-order world of human-created symbols and representations; they are not, and never will be, beings in the real world, even when they are inevitably enhanced to learn from their interactions and equipped with super-human sensors and robotic arms. What they won't have is the millions of years of evolution, of continuous striving for self-preservation and self-expansion, which shaped the consciousness of living organisms. What they won't ever have is a will to be. Even if we program them to seek to persist and perpetuate themselves, it will not be their will, but the will of whoever programmed them thus.
Would you say someone suffering from locked-in syndrome is of a different order of intelligence due to their no longer having a fully embodied experience?
Not the parent, but I would say their experience, even though severely impaired in many areas, is still infinitely more embodied than any human artifact is or even conceivably could be, simply because of the millions of years of embodied evolution which have shaped them into who they are, and because of the unimpaired embodiment of most of the cells that make up their organism.
Considering that even simple neural networks are universal approximators, and that most intelligent tasks require predicting the next state(s) from previous states, aren't biological or artificial brains "just" universal approximators of an extremely complex function of the world?
That's true in a narrow functional sense, but it misses the role of a world model. Intelligence isn't just about approximating input-output mappings, it's about building structured, causal models that let an agent generalize, simulate, and plan. Universal approximation only says you could represent those mappings, not that you can efficiently construct them. Current LLMs seem intelligent because they encode vast amounts of knowledge already expanded by biological intelligence. The real question is whether an LLM, on its own, can achieve the same kind of efficient causal and world-model building rather than just learning existing mappings. It can interpolate new intermediate representations within its learned manifold, but it still relies on the knowledge base produced by biological intelligence. As an analogy: it's more of an interpolator than an extrapolator.
Note that you'd also have to be somewhat more precise as to what the "state" and "next state" are. It is likely that the state is everything that enters the brain (i.e. by means of sensing, such as what we see, hear, feel, introspect, etc.). However, parts of this state enter the brain at various places and at various frequencies. Abstracting that all away might be problematic.
For years, I've taken the position that intelligence is best expressed as creativity - that is, the ability to come up with something that isn't predictable based on current data. Today's "artificial intelligence" analyzes words (tokens) based on an input (prompt) to come up with an output. It's predictable. It's fast. But, imho, it lacks creativity, and therefore lacks intelligence.
One example of this I often ponder is the boxing style of Muhammad Ali, specifically punching while moving backwards. Before Ali, no one punched while moving away from their opponent. All boxing data said this was a weak position, time for defense, not for punching (offense). Ali flipped it. He used to do miles of roadwork, throwing punches while running backwards to train himself on this style. People thought he was crazy, but it worked, and, imho, it was extremely creative (in the context of boxing), and therefore intelligent.
Did data exist that could've been analyzed (by an AI system) to come up with this boxing style? Perhaps. Kung Fu fighting styles have long known about using your opponent's momentum against them. However, I think that data (Kung Fu fighting styles) would've been diluted and ignored in the face of the mountains of traditional boxing-style data, which all said not to punch while moving backwards.
I think it depends on the complexity of the knowledge to be created. I agree with you broadly, but the danger of using your boxing analogy is that for game systems that can be sufficiently understood, AI has actually invented new strategies. TD-Gammon introduced new advances in the strategy of playing backgammon because its very strong understanding of early gameplay meant that it found some opening moves that humans didn't realize were as strong as they were.
I would argue that the only truly new things generative AI has introduced are mostly just byproducts of how the systems are built. The "AI style" of visual models, the ChatGPT authorial voice, etc., are all "new", but they are still just the result of regurgitating human created data and the novelty is an artifact of the model's competence or lack thereof.
There has not been, at least to my knowledge, a truly novel style of art, music, poetry, etc. created by an AI. All human advancements in those areas build mostly off of previous peoples' work, but there's enough of a spark of human intellect that they can still make unique advancements. All of these advancements are contingent rather than inevitable, so I'm not asking that an LLM, trained on nothing but visual art from Medieval times and before, could recreate Impressionism. But I don't think it would make anything that progresses past or diverges from Medieval and pre-Medieval art styles. I don't think an LLM with no examples of or references to anything written before 1700 would ever produce poetry that looked like Ezra Pound's writing, though it just might make its own Finnegans Wake if the temperature parameter were turned up high enough.
And how could it? It works because there's enough written data that questions and context around the questions are generally close enough to previously seen data that the minor change in the question will be matched by a commensurate change in the correct response from the ones in the data. That's all a posteriori!
> Today's "artificial intelligence" analyzes words (tokens) based on an input (prompt) to come up with an output. It's predictable. It's fast. But, imho, it lacks creativity ...
I would have agreed with you at the dawn of LLM emergence, but not anymore. Not because the models have improved, but because I have a better understanding and more experience now. Token prediction is what everyone cites, and it still holds true. This mechanism is usually illustrated with an observable pattern, like the question "Are antibiotics bad for your gut?", which is the predictability you mentioned. But LLM creativity begins to emerge when we apply what I'd call "constraining creativity": you still use token prediction, but the preceding tokens introduce an unusual or unexpected context, such as subjects that don't usually appear together or a new paradoxical observation. (It's interesting that for fact-based queries, rare constraints lead to hallucinations, but here they're welcome.)
I often use the latter for fun by asking an LLM to create a stand-up sketch based on an interesting observation I noticed. The results aren't perfect, but they combine the unpredictability of token generation under constraints (funny details, in the case of the sketch) with the cultural constraints learned during training. For example, a sketch imagining doves and balconies as if they were people and real estate. The quote below from that sketch shows that there are intersecting patterns between the world of human real estate and the world of birds, but mixed in a humorous way.
"You want to buy this balcony? That’ll be 500 sunflower seeds down, and 5 seeds a day interest. Late payments? We send the hawk after you."
It's hard to pinpoint what creativity is. But in your example, the more creative thing was really coming up with the scenario of pigeons selling balconies as real estate. What followed was just applying the usual tropes for that sort of joke to the subject matter. I feel like LLMs are not very good at coming up with something novel. I'm not even sure they are capable of that. It's not as if coming up with something novel is easy for humans either.
This book lines up with a lot of what I've been thinking: the centrality of prediction, how intelligence needs distributed social structure, language as compression, why isolated systems can't crack general intelligence.
But there are real splits on substrate dependence and what actually drives the system. Can you get intelligence from pure prediction, or does it need the pressure of real consequences? And deeper: can it emerge from computational principles alone, or does it require specific environmental embeddedness?
My sense is that execution cost drives everything. You have to pay back what you spend, which forces learning and competent action. In biological or social systems you're also supporting the next generation of agents, so intelligence becomes efficient search because there's economic pressure all the way down. The social bootstrapping isn't decorative, it's structural.
I also posted yesterday a related post on HN:
> What the Dumpster Teaches: https://news.ycombinator.com/item?id=45698854
By that logic, wouldn't the electric kettle heating water for the coffee be intelligent? Had it not measured heat when activated, it wouldn't know how to stop and the man would have thrown it away or at least stopped paying for the kettle's electricity.
I think we need a meta layer: the ability to reason over one's own goals (this does not contradict the environment creating hard constraints). The man has it. The machine may have it (notably, a paperclip maximizer would not count under this criterion). The crow does not.
Yes, if only a tiny amount. The example I use is a toilet cistern, when explaining this to children. It’s probably the closed loop control system with which they have the most firsthand experience, so they understand it best. Also toilet funny haha.
You could say that, yes, that kettle is intelligent, or smart, as in smart watch. But the intelligence in question clearly derives from the human who designed that kettle, which is why we describe it as artificial.
Similarly, a machine could emulate meta-cognition, but it would in effect only be a reflection and embodiment of certain meta-cognitive processes originally instantiated in the mind which created that machine.
Don't "real" consequences apply for setting weights? There's an actual monetary cost to train these models, and they have to actually perform to keep getting trained. Sure it's VC spend right now and not like, biological reproduction driving the incentives ultimately, but it's not outside the same structure.
Yes, but the (semi-)autonomous entity you're referring to now is the whole company, including all who work there and design the LLM system and negotiate contracts and all that. The will to persist and expand of all those humans together results in the will to expand of the company, which then evolves those systems. But the systems themselves don't contribute to that collective will.
Depending on the time horizon the predictions change. So we get layers - what is going to happen in the next hour/tomorrow/next year/next 10 years/next 100 etc (and layers of compression of which language is just one) and that naturally produces contradictions which creates bounds on "intelligence".
It really is a stupid system. No one rational wants to hear that, just like no one religious wants to hear about contradictions in their stories, or no one who plays chess wants to hear it's a stupid game. The only thing that can be said about chimp intelligence is that it has developed a hatred of contradictions/unpredictability and lack of control unseen in trees, frogs, ants, and microbes.
Stories become central to surviving such underlying machinery.
Part of the story we tell is: no, no, we don't all have to be Kant or Einstein, because we just absorb what they uncovered. So apparently the group or social structures matter. Which is another layer of pure hallucination. All social structures, if they increase the prediction horizon, also generate/expose themselves to more prediction errors and contradictions, not fewer.
So again, coherence at the group level is produced through story: religion will save us, the law will save us, Trump will save us, the Jedi will save us, AI will save us, etc. We then build walls and armies to protect ourselves from each other's stories. Microbes don't do this. They do the opposite, and have produced the Krebs cycle, photosynthesis, CRISPR, etc. No intelligence. No organization.
Our intelligence is just a bubbling cauldron at the individual and social levels through which info passes and mutates. Info that survives is info that can survive that machinery. And as info explodes, the coherence-stabilization process is overrun. Stories have to be written faster than stories can be written.
So Donald Trump is president. A product of "intelligence" and social "intelligence". Meanwhile more microbes exist than stars in the universe. No Trump or ICE or Church or data center is required to keep them alive.
If we are going to tell a story about Intelligence look to Pixar or WWE. Don't ask anyone in MIT what they think about it.
The MIT vs. WWE contrast feels like a false dichotomy. MIT represents systematic, externalized intelligence (structured, formal, reductive, predictive). WWE or Pixar represent narrative and emotional intelligence. We do need both.
Also, evolution is the original information-processing engine, and humans still run on it just like microbes. The difference is just the clock speed. Our intelligence, though chaotic and unstable, operates on radically faster time and complexity scales. It's an accelerator that runs in days and months instead of generations. The instability isn't a flaw: it's the turbulence of much faster adaptation.
It's hard not to see consciousness (whatever that actually is) lurking under all of this. If it's emergent, the substrate wars might just be a detail; if it's not, maybe silicon never gets a soul.
I listened to an interview with a researcher a while back who hypothesized that human reasoning probably evolved not mostly for the abstract logical reasoning we associate with intelligence, but to “give reasons” to motivate other humans or to explain our previous actions in a way that would make them seem acceptable…social utility basically. My experience with next token predicting LLMs aligns with human communication. We humans rarely complete a thought before we start speaking, so I think our brains are often just predicting the next 1-5 words that will be accepted by who we’re talking to based on previous knowledge of them and evaluation of their (often nonverbal) emotional reactions to what we’re saying. Our typical thought patterns may not be as different from LLMs’ as we think.
IIRC the researcher was Hugo Mercier, probably on Sean Carroll’s fantastic Mindscape podcast, but it might have been Lex Fridman before he strayed from science/tech.
"reasoning evolved not to complement individual cognition but as an argumentative device" -- and it has more positive effects at social level than at individual level
> and it has more positive effects at social level than at individual level
Now it raises the question should we be reasoning in our head then? Is there a better way to solve intractable math problems for example? Is math itself a red herring created for argumentative purposes?
We can never know, but I personally favour the rise of "handedness" and the tool-making (technological) hypothesis. To make and use tools, and to transfer the recipes and terminology, we must educate one another.
"In the physical adaptation view, one function (producing speech sounds) must have been superimposed on existing anatomical features (teeth, lips) previously used for other purposes (chewing, sucking). A similar development is believed to have taken place with human hands and some believe that manual gestures may have been a precursor of language. By about two million years ago, there is evidence that humans had developed preferential right-handedness and had become capable of making stone tools. Tool making, or the outcome of manipulating objects and changing them using both hands, is evidence of a brain at work." [1]
Not choosing exactly what words you want to use is something very different than not completing a thought IMO. When you speak you may not know exactly which words you're going to use to communicate an idea, but you already know the idea that you're communicating. By contrast LLMs have no such concept as an idea - only words.
And it's also important that language and words are just another invention of humanity. We achieved plenty before we had any meaningful language whatsoever. At the bare minimum we reproduced - and think about all that's involved in forming a relationship, having a child, and then successfully raising that child for them to go on and do the same, all without any sort of language. It emphasizes that ideas and language are two very different concepts, at least for humans.
Interesting. N.J. Enfield (linguist, anthropologist) makes a similar point about the purpose for which language evolved in "Language vs Reality". I'm paraphrasing loosely, but the core argument is that the primary role of language is to create an abstraction of reality in order to convince other people, rather than to accurately capture reality. He talks about how there are two layers of abstraction: how our senses compress information into higher-order concepts that we consciously perceive, and how language further compresses information about these higher-order concepts we have in our minds.
Why would a human need to develop the ability to convince others if truth should be enough? One would have to make the argument that convincing others and oneself involves things that are not true to at least one party (as far as they know). I don't know why a species would develop misunderstanding if truth is always involved. If emotions/perception are the things that create misunderstanding, then I can see the argument for language as necessary to fix misunderstanding in the group. On some level, nature thought it correct to fix misunderstanding on a species level.
I have had the same suspicion. I can propose a new kind of ongoing Turing-like test where we track how many words are suggested on our phones (or computers) as we type. On my phone it guesses the next single word pretty well, so why not the next two? Then three... Imagine it "finishing your sentence" halfway through a message, as close friends and family often do. Then why should it wait for halfway? What are the various milestones: finishing the last word, the last 5 words, half the sentence, 80%, etc.?
There's also the whole predictive processing camp in cognitive science whose position is loosely similar to the author's, but the author makes a much stronger commitment to computationalism than other researchers in the camp.
This just doesn't explain things by itself. It doesn't explain why humans would care about reasoning in the first place. It's like explaining all life as parasitic while ignoring where the hosts get their energy from.
Think about it, if all reasoning is post-hoc rationalization, reasons are useless. Imagine a mentally ill person on the street yelling at you as you pass by: you're going to ignore those noises, not try to interpret their meaning and let them influence your beliefs.
This theory is too cynical. The real answer has got to have some element of "reasoning is useful because it somehow improves our predictions about the world"