nikolayasdf123 · a year ago
> This layering forms the dominant narrative of how intelligence may work and is the basis for deep neural nets. The idea is, stimulus is piped into the "top" layer and filters down to the bottom layer, with each layer picking up on more and more abstract concepts.

Popular deep artificial neural networks (LSTMs, LLMs, etc.) are highly recurrent: they simulate not deep networks but shallow networks that process information in loops many times.
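
To make that concrete, here is a minimal numpy sketch of the "shallow network in a loop" view; the layer sizes, inputs, and tanh cell are arbitrary choices for illustration:

```python
import numpy as np

# One small weight matrix (a shallow network) is reused at every time
# step, so effective depth comes from iteration, not stacked layers.
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(16, 16))  # recurrent weights (made-up sizes)
W_x = rng.normal(scale=0.1, size=(16, 8))   # input weights

h = np.zeros(16)                            # hidden state carried around the loop
for x_t in rng.normal(size=(10, 8)):        # ten input steps
    h = np.tanh(W_h @ h + W_x @ x_t)        # the same shallow cell, applied in a loop
```

Unrolled over the ten steps this is equivalent to a ten-layer network with tied weights, which is the sense in which a shallow recurrent cell simulates a deep one.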

> columns.. and that's about it.

I'd recommend not oversimplifying the structure here. What you're describing is only the high-level structure of a single part of the brain (the neocortex).

1. The brain has many other structures inside it: basal ganglia, cerebellum, midbrain, etc., each with different characteristic micro-circuits.

2. Brain networks are highly interconnected over long ranges. Neurons project (as in, send signals) to very distant parts of the brain, and likewise receive projections from other distant parts.

3. The temporal dimension is important. Your article is very ML-like, focusing on information processing devoid of a temporal dimension. If you want to draw parallels to real neurons in the brain, you need to explain how it fits into temporal dynamics (oscillations in neurons and circuits).

4. Is this competition in the realm of abeyant representations (what you can think in principle) or current ones (what you are thinking now)? What are the timescales and the neurological basis for this?

Overall, my take is that this is a bit ML-like. If it's meant to describe real neurological networks, it needs a closer and stronger neurological footing.

Here is some good material if you want to dive into neuroscience: "Principles of Neurobiology" (Liqun Luo, 2020) and "Fundamental Neuroscience" (McGraw-Hill).

More resources can be found here:

http://neuroscience-landscape.com/

calepayson · a year ago
> Popular deep artificial neural networks (LSTMs, LLMs, etc.) are highly recurrent: they simulate not deep networks but shallow networks that process information in loops many times.

Thanks for the info. Is there anything you would recommend to dive deeper into this? Books/papers/courses/etc.

> I'd recommend not oversimplifying the structure here. What you're describing is only the high-level structure of a single part of the brain (the neocortex).

Nice suggestion. I added a bit to make it clear that I'm talking about the neocortex.

> 1 & 2

Totally. I don't think AI is as simple as building a Darwin Machine, much like it's not as simple as building a neural net. But I think the concept of a Darwin Machine is an interesting, and possibly important, component.

My goal with this post was to introduce folks who hadn't heard of this concept and, hopefully, get in contact with folks who had. I left out the other structures so I could focus on what matters.

> The temporal dimension is important. Your article is very ML-like, focusing on information processing devoid of a temporal dimension. If you want to draw parallels to real neurons in the brain, you need to explain how it fits into temporal dynamics (oscillations in neurons and circuits).

Correct me if I misunderstand, but I believe I did. The spatio-temporal firing patterns of minicolumns contain the temporal dimension. I touched on the song analogy but we can go deeper here.

Let's imagine the firing pattern of a minicolumn as a melody that fits within the period of some internal clock (I doubt there's actually a clock but I think it's a useful analogy). Each minicolumn starts "singing" its melody over and over, in time with the clock. Each clock cycle, every minicolumn is influenced by its neighbors within the network and they begin to sync up. Eventually they're all harmonizing to the same melody.

A network might propagate a bunch of different melodies at once. When they meet, the melodies "compete". Each tries to propagate to a new minicolumn and fitness is judged by other inputs to that minicolumn (think sensory) and the tendencies of that minicolumn (think memory).
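
To push on the analogy, here is a toy sketch of that competition: a grid of "minicolumns", each singing one of a few melodies, where on each clock cycle every minicolumn adopts whichever neighboring melody best fits its inputs. The grid size, neighborhood, voting rule, and fitness bias are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N_PATTERNS = 4
# Each cell is a minicolumn; its value is the "melody" it currently sings.
grid = rng.integers(0, N_PATTERNS, size=(20, 20))
# Per-pattern fitness, standing in for sensory input / memory tendencies.
bias = rng.random(N_PATTERNS)

def step(grid):
    new = grid.copy()
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            # Neighbors "vote" for their melody, weighted by how fit it is.
            nbrs = [grid[x, y]
                    for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= x < grid.shape[0] and 0 <= y < grid.shape[1]]
            votes = np.bincount(nbrs, minlength=N_PATTERNS) * bias
            new[i, j] = int(votes.argmax())
    return new

for _ in range(50):  # melodies spread, collide at borders, and compete
    grid = step(grid)
```

Run long enough, one pattern usually takes over most of the grid, which is the "winner controls the greatest surface area" intuition.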

I think part of the reason evolution is such an incredible algorithm is that it relies as much as it does on time.

> Is this competition in the realm of abeyant representations (what you can think in principle) or current ones (what you are thinking now)? What are the timescales and the neurological basis for this?

I'm not familiar with these ideas but let me give it a shot. Feel free to jump in with more questions to help clarify.

Neural Darwinism points to structures - minicolumns, cortical columns, and interesting features of their connections - and describes one possibility for how those structures might lead to thought. In your words, I think the structures are the realm of abeyant representations while the theory describes current representations.

The neurological basis for this, the description of the abeyant representations (hope I'm getting that right), is Calvin's observations of the structure of the brain. Observations based on his and others' research.

To a large extent, neuroscience doesn't have a great through-line story of how the brain works. For example, the idea of regions of the brain responsible for specific functions - like the hippocampus for memory - doesn't exactly play nice with Karl Lashley's experimental work on memory.

What I liked most about this book is how Calvin tried to relate his theory to both structure and experimental results.

> Overall, my take is that this is a bit ML-like. If it's meant to describe real neurological networks, it needs a closer and stronger neurological footing.

If, by ML-like, you mean a bit woo-woo and hand-wavy: yeah, I agree. Ideally I'd be a better writer. But I'm not, so I highly recommend the book.

It's written by an incredible neuroscientist and, so far, none of the neuroscience researchers I've given it to have expressed anything other than excitement about it. And I explicitly told them to keep an eye out for places they might disagree. One of them is currently reading it a second time with the goal of verifying everything. If it all checks out, he plans on presenting the ideas to his lab. I'll update the post if he, or anyone in his lab, finds something that doesn't check out.

> Here is some good material if you want to dive into neuroscience: "Principles of Neurobiology" (Liqun Luo, 2020) and "Fundamental Neuroscience" (McGraw-Hill).

Why these two textbooks? I got my B.S. in neuroscience so I feel good about the foundations. Happy to check these out if you believe they add something that many other textbooks are missing.

cs702 · a year ago
Big-picture, the idea is that different modalities of sensory data (visual, olfactory, etc.) are processed by different minicolumns in the brain, i.e., different subnetworks, each outputting a different firing pattern. These firing patterns propagate across the surface area of the brain, competing with conflicting messages. And then, to quote the OP, "after some period of time a winner is chosen, likely the message that controls the greatest surface area, the greatest number of minicolumns. When this happens, the winning minicolumns are rewarded, likely prompting them to encode a tendency for that firing pattern into their structure." And this happens in multiple layers of the brain.

In other words, there's some kind of iterative mechanism for higher-level layers to find which lower-level subnetworks are most in agreement about the input data, inducing learning.

Capsule-routing algorithms, proposed by Hinton and others, seek to implement precisely this idea, typically with some kind of expectation-maximization (EM) process.
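
For a sense of the mechanics, here is a bare-bones sketch of routing-by-agreement in numpy, heavily simplified from the dynamic-routing procedure in Sabour et al. (2017): the squashing nonlinearity is omitted and the shapes are assumptions.

```python
import numpy as np

def route(votes, n_iters=3):
    """votes: (n_lower, n_upper, dim) predictions from lower capsules.
    Returns each upper capsule's pose as an agreement-weighted mean."""
    n_lower, n_upper, _ = votes.shape
    logits = np.zeros((n_lower, n_upper))           # routing logits b_ij
    for _ in range(n_iters):
        c = np.exp(logits)
        c /= c.sum(axis=1, keepdims=True)           # coupling coefficients (softmax)
        upper = (c[..., None] * votes).sum(axis=0)  # consensus pose per upper capsule
        # Lower capsules whose votes agree with the consensus get more weight.
        logits += np.einsum('lud,ud->lu', votes, upper)
    return upper

poses = route(np.random.randn(32, 10, 16))  # 32 lower capsules, 10 upper, 16-dim
```

The "winner" here is implicit: lower-level subnetworks that agree with the emerging consensus come to dominate the routing weights.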

There are quite a few implementations available on github:

https://github.com/topics/capsules

https://github.com/topics/em-routing

https://github.com/topics/routing-algorithm

abeppu · a year ago
I haven't heard anyone talk about Hinton's capsule network concepts for some time. In 2017-18 they seemed exciting, both because of Hinton and because the pose/transformation story sounded pretty reasonable. I don't know what would count as "explanation", but I'd be curious to hear any thoughts about why it seems they didn't really pan out. (Are there any tasks for which capsule methods are the best?)
MAXPOOL · a year ago
If you take a bird's-eye view, fundamental breakthroughs don't happen that often. The "Attention Is All You Need" paper also came out in 2017. It has now been 7 years without a breakthrough at the same level as transformers. Breakthrough ideas can take decades before they are ready. There are many false starts and dead ends.

Money and popularity are orthogonal to pathfinding that leads to breakthroughs.

cs702 · a year ago
The short answer as to why capsule networks have "fallen out of fashion" is... Transformers.

Transformers came out at roughly the same time[a] and have proven to be great at... pretty much everything. They just work. Since then, most AI research money, effort, and compute has been invested to study and improve Transformers and related models, at the expense of almost everything else.

Many promising ideas, including routing, won't be seriously re-explored until and unless progress towards AGI seems to stall.

---

[a] https://arxiv.org/abs/1706.03762

calepayson · a year ago
Great summary. Thanks for the links. These are awesome.
jaimie · a year ago
The domain of Artificial Life is highly related and has an ongoing conference series and journal; it might be worth mining for more inspiration:

https://en.wikipedia.org/wiki/Artificial_life

https://direct.mit.edu/artl

https://alife.org

mprime1 · a year ago
FYI Evolutionary Algorithms have been an active area of research for decades.[1]

Among the many uses, they have been applied to ‘evolving’ neural networks.

Famously a guy whose name I can’t remember used to generate programs and mutations of programs.

My recommendation if you want to get into AI: avoid anything written in the last 10 years and explore some classics from the '70s.

[1] https://en.m.wikipedia.org/wiki/Evolutionary_algorithm

EvanAnderson · a year ago
I'm sure it's not who you're thinking of, but I can't miss an opportunity to mention Tom Ray and Tierra: https://tomray.me/tierra/whatis.html
PinkMilkshake · a year ago
In the Creatures artificial life / virtual pet series, the creatures have about 900 or so neurons (maybe more in later versions). Each neuron is a little virtual machine designed in such a way that programs remain valid even under random mutation.
blixt · a year ago
A friend of mine made this in-browser neural network engine that could run millions of multi-layer NNs in a simulated world at hundreds of updates per second and each network could reproduce and evolve. It worked in the sense that the networks exhibited useful and varied behaviors. However, it was clear that larger networks were needed for more complex behaviors and evolution just starts to take a lot longer.

https://youtu.be/-1s3Re49jfE?si=_G8pEVFoSb2J4vgS

JoeDaDude · a year ago
There is the case of Blondie24, an evolutionary neural net (or genetic algorithm) which was able to develop a very strong checkers-playing capability through self-play with no human instruction. It was later extended to play other games.

https://en.wikipedia.org/wiki/Blondie24

northernman · a year ago
I read this book in 1972; it was published in 1966:

https://books.google.ca/books/about/Artificial_Intelligence_Through_Simulate.html?id=QMLaAAAAMAAJ

mandibeet · a year ago
Your recommendation to explore the classics is a good one. You can gain a deeper appreciation by studying these foundational works.
petargyurov · a year ago
> avoid anything written in the last 10 years

Why?

exe34 · a year ago
Presumably because it's saturated with a monoculture, and the hope (rightly or wrongly) is that some of the other roads might lead to an alternative breakthrough.


nirvael · a year ago
I think this is over-simplified and possibly misunderstood. I haven't read the book this article references but if I am understanding the main proposal correctly then it can be summarised as "cortical activity produces spatial patterns which somehow 'compete' and the 'winner' is chosen which is then reinforced through a 'reward'".

'Compete', 'winner', and 'reward' are all left undefined in the article. Even given that, the theory is not new information and seems incredibly analogous to Hebbian learning, which is a long-standing theory in neuroscience. Additionally, the metaphor of evolution within the brain does not seem apt. Essentially what is said is that given a sensory input, we will see patterns emerge that correspond to a behaviour deemed successful. Other brain patterns may arise but are ignored or not reinforced by a reward. This is almost tautological, and the 'evolutionary process' (input -> brain activity -> behaviour -> reward) lacks explanatory power. This is exactly what we would expect to see. If we observe a behaviour that has been reinforced in some way, it would obviously correlate with the brain producing a specific activity pattern. I don't see any evidence that the brain will always produce several candidate activity patterns before judging a winner based on consensus. The tangent of cortical columns ignores key deep brain structures and is also almost irrelevant: the brain could use the proposed 'evolutionary' process with any architecture.

mandibeet · a year ago
While it does build on established concepts like Hebbian learning, I think the theory offers a potentially insightful way of thinking about brain function.
calepayson · a year ago
> I think this is over-simplified and possibly misunderstood.

I'm with you here. I wrote this because I wanted to drive people towards the book. It's incredible and I did it little justice.

> "cortical activity produces spatial patterns which somehow 'compete' and the 'winner' is chosen which is then reinforced through a 'reward'"

A slight modification: spatio-temporal patterns*. Otherwise you're dead on.

> 'Compete', 'winner', and 'reward' are all left undefined in the article.

You're right. I left these undefined because I don't believe I have a firm understanding of how they work. Here's some speculation that might help clarify.

Compete - The field of minicolumns is an environment. A spatio-temporal pattern "survives" when a minicolumn is firing in that pattern. It's "fit" if it's able to effectively spread to other minicolumns. Eventually, as different firing patterns spread across the surface area of the neocortex, a border will form between two distinct firing patterns. They "compete" insofar as each firing pattern tries to "convert" minicolumns to fire in its specific pattern instead of another's.

Winner - This has two levels. First, an individual firing pattern could "win" the competition by spreading to a new minicolumn. Second, amalgamations of firing patterns, the overall firing pattern of a cortical column, could match reality better than others. This is a very hand-wavy answer, because I have no intuition for how this might happen. At a high level, the winning thought is likely the one that best matches perception. How this works seems like a bit of a paradox as these thoughts are perception. I suspect this is done through prediction. E.g. "If that person is my grandmother, she'll probably smile and call my name". Again, super hand-wavy, questions like this are why I posted this hoping to get in touch with people who have spent more time studying this.

Reward - I'm an interested amateur when it comes to ML, and folks have been great about pointing out areas where I should go deeper. I have only a basic understanding of how reward functions work. I imagine the minicolumns as small neural networks and alluded to "reward" in the same sense. I have no idea what that reward algorithm is or if NNs are even a good analogy. Again, I really recommend the book if you're interested in a deeper explanation of this.
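
For what it's worth, the vaguest of the three ("reward") could be prototyped as something as small as nudging a converted minicolumn's stored pattern toward the pattern that just won there. This is pure speculation on my part; the flat-vector representation and the update rule below are invented:

```python
import numpy as np

def encode_tendency(stored, winner, eta=0.1):
    # Nudge a minicolumn's stored spatio-temporal pattern (here just a
    # flat vector of firing strengths over one clock period) toward the
    # pattern that won the local competition. eta is a made-up rate.
    return stored + eta * (winner - stored)
```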

> the theory is not new information and seems incredibly analogous to Hebbian learning, which is a long-standing theory in neuroscience.

I disagree with you here. Hebbian learning is very much a component of this theory, but not the whole. The last two constraints were inspired by it and, in hindsight, I should have been more explicit about that. But Hebbian learning describes a tendency to average: "cells that fire together wire together". Please feel free to push back here, but the concept of Darwin Machines fits the constraints of Hebbian learning while still offering a seemingly valid description of how creative thought might occur. Something that, if I'm not misunderstanding, is undoubtedly new information.
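
For contrast, the textbook Hebbian rule is just a correlation update, with nothing in it that generates or selects competing variants (a generic sketch, not anything from Calvin's book):

```python
import numpy as np

def hebbian_update(W, pre, post, eta=0.01):
    # "Cells that fire together wire together": the weight between two
    # cells grows wherever pre- and post-synaptic activity coincide.
    return W + eta * np.outer(post, pre)
```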

> I don't see any evidence that the brain will always produce several candidate activity patterns before judging a winner based on consensus.

That's probably my fault in the retelling; check out the book: http://williamcalvin.com/bk9/index.htm

I think if you read Chapters 1-4 (about 60 pages and with plenty of awesome diagrams) you'd have a sense for why Calvin believes this (whether you agree or not would be a fun conversation).

> The tangent of cortical columns ignores key deep brain structures and is also almost irrelevant: the brain could use the proposed 'evolutionary' process with any architecture.

I disagree here. A common mistake I think we tend to make is assuming evolution and natural selection are equivalent. Some examples of natural selection: a diversified portfolio, or a beach with large grains of sand due to some intricacy of the currents. Dawkinsian evolution is much, much rarer. I can only think of three examples of architectures that have pulled it off. Genes, and their architecture, are one. Memes (imitated behavior) are another. Many animals imitate, but only one species has been able to build the architecture that allows those behaviors to undergo an evolutionary process: humans. And finally, if this theory is right, spatio-temporal patterns and the columnar architecture of the brain are the third.

Ignoring Darwin Machines, there are only two architectures that have led to an evolutionary process. Saying we could use "any architecture" seems a bit optimistic.

I appreciate the thoughtful response.

nirvael · a year ago
Thanks for the considered reply.


visarga · a year ago
I don't think it matters so much how the brain is made; what matters is the training data. And we obtain data by searching. Search is a great concept: it covers evolution, intelligence, and creativity, and it's also social. Search is discrete, recursive, combinatorial, and based on some kind of language (DNA, or words, or just math/code).

Searching the environment provides the data the brain is trained on. I don't believe we can understand the brain in isolation, without its data engine and the problem space where it develops.

Neural nets showed that given a dataset, you can obtain similar results with very different architectures, like transformer and diffusion models, or transformer vs Mamba. The essential ingredient is data, architecture only needs to pass some minimal bar for learning.

Studying just the brain misses the essential: we are search processes, our whole life is a search for optimal actions, and evolution itself is a search for environmental fitness. These search processes made us what we are.

advael · a year ago
What in the world

Most "diffusion models" use similar VAE to transformer backbone architectures. Diffusion isn't an architecture, it's a problem framing

As for the rest of this, I'm torn between liking the poetry of it and pointing out that this is kind of that thing where you present something as a mind-blowing insight when it's well-known and pretty obvious. Most people familiar with learning theory already understand learning algorithms of any kind as a subset of probabilistic search algorithms with properties that make them responsive to data. The idea that the structure of the information processing system doesn't matter, and that there's just some general factor of learning capacity a thing has, is... not well supported by the way research has progressed in the entire period when this has been relevant to most people. Sure, in theory any neural network is a general function approximator and could in principle learn any function it's complex enough to represent. Also, we can arrive at the solution to any computable problem by representing it as a number and guessing random numbers until we can verify a solution. Learning algorithms can almost be defined as attempts to do better search via structured empiricism than can be done under the assumption that structure doesn't matter. Like, sometimes multiple things work, sure. That doesn't mean it's arbitrary.

TL;DR: Of course learning is a kind of search, but discovering structures that are good at learning is the whole game

Xcelerate · a year ago
Yeah, I really don’t understand this recently popular viewpoint that the algorithm doesn’t matter, just how much data you throw at it. It doesn’t seem to be based on anything more than wishful thinking.

One can apply Hutter search to solve just about any problem conceivable given the data and guess what—you’ll approach the optimal solution! The only downside is that this process will take more time than available in our physical universe.

I think people forget the time factor and how the entire field of computational complexity theory arose because the meta problem is not that we can’t solve the problem—it’s that we can’t solve it quickly enough on a timescale that matters to humans.

Current NN architectures are missing something very fundamental related to the efficiency of problem solving, and I really don’t see how throwing more data at them is going to magically convert an EXPTIME algorithm into a PTIME one. (I’m not saying NNs are EXPTIME; I’m saying that they are incapable of solving entire classes of problems that have both PTIME and EXPTIME solutions, as the NN architecture is not able to “discover” PTIME solutions, thus rendering them incapable of solving those classes of problems in any practical sense).

visarga · a year ago
> Of course learning is a kind of search, but discovering structures that are good at learning is the whole game

No, you missed the essential point. I mentioned search in the context of discovery, or in other words, expanding knowledge.

Training neural nets is also a search for the best parameters that fit the data, but it's secondary. Many architectures work: there have been a thousand variations on the transformer architecture, and plenty of RNN-like approaches since 2017 when the transformer was invented, and none of them is better than the current one, or significantly worse.

Also, across the human population, the number of neurons in the brain, the synapses, and the wiring are very different at the micro level from person to person, yet we all learn. The difference between the top 5% and bottom 5% of humans is small compared with other species, for example. What makes a big difference between people is education - in other words, experiences, or training data.

To return to the original idea - AI that simply learns to imitate human text is capable only of remixing ideas. But an AI that actively explores can discover novel ideas, like AlphaZero and AlphaTensor. In both these cases search played a major role.

So I was generalizing the concept of "search" across many levels of optimization, from protein folding to DNA and human intelligence. Search is essential for progress across the stack. Even network architecture evolves by search - with human researchers.

calepayson · a year ago
> I don't think it matters so much how the brain is made; what matters is the training data.

I agree that training data is hugely important, but I think it does matter how the brain is made. Structures in the brain are remarkably well preserved between species, despite the fact that evolution loves to try different methods when it can get away with it.

> Searching the environment provides the data the brain is trained on. I don't believe we can understand the brain in isolation, without its data engine and the problem space where it develops.

I completely agree and suspect we might be on the same page. What I find most compelling about the idea of Darwin Machines is the fact that it relies on evolution. In my opinion, true Dawkinsian evolution is the most efficient search algorithm.

I'd love to hear you go deeper on what you mean by data engine and problem space. To (possibly) abuse those terms, I think evolution is the data engine. The problem space is fun and I love David Eagleman's description of the brain as sitting in a warm bath in a dark room trying to figure out what to do with all these electric shocks.

> Neural nets showed that given a dataset, you can obtain similar results with very different architectures, like transformer and diffusion models, or transformer vs Mamba. The essential ingredient is data, architecture only needs to pass some minimal bar for learning.

My understanding of neural nets, and please correct me if I'm wrong, is that they solve system-one thinking, intuition. As of yet, they haven't been able to do much more than produce an average of their training data (which is incredible). With a brute force approach they can innovate in constrained environments, e.g. move 37 (or so I'm told, I haven't played go :)). I haven't seen evidence that they might be able to innovate in open-ended environments. In other words, there's no suggestion they can do system-two thinking where time spent on a problem correlates with the quality of the answer.

> Studying just the brain misses the essential: we are search processes, our whole life is a search for optimal actions, and evolution itself is a search for environmental fitness.

I completely agree. I even suspect that, in a few years, we'll see "life" and "intelligence" as synonymous concepts, just implemented in different mediums. At the same time, studying those mediums can be a blast.

jekude · a year ago
I’ve been noodling on how to combine neural networks with evolution for a while. I’ve always thought that to do this, you need some sort of evolvable genetic/functional units, and have had trouble fitting traditional artificial neurons w backprop into that picture.

My current rabbit hole is using Combinatory Logic as the genetic material, and I have been trying to evolve combinators, etc. (there is some active research in this area).

Only slightly related to the author’s idea, its cool that others are interested in this space as well.

Matumio · a year ago
Then you probably know about NEAT (the genetic algorithm) by now. I'm not sure what has been tried in directly using combinatory logic instead of NNs (do Hopfield networks count?). Any references?

I've tried to learn simple look-up tables (like, 9 bits of input) using the Cross-Entropy Method (CEM); this worked well. But it was a very small search space (way too large to just try all solutions, but still, a tiny model). I haven't seen the CEM used on larger problems. Though there is a cool paper about learning Tetris using the cross-entropy method, with a bit of feature engineering.
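
For anyone curious, a CEM loop on a boolean lookup table fits in a few lines. This sketch assumes direct supervision against a known target table, which is a simplification, and the population size, elite fraction, and smoothing are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=512)  # a 9-bit -> 1-bit lookup table (2**9 entries)

p = np.full(512, 0.5)                  # one Bernoulli parameter per table entry
for _ in range(100):
    samples = (rng.random((200, 512)) < p).astype(int)  # draw candidate tables
    scores = (samples == target).sum(axis=1)            # fitness = matching entries
    elite = samples[np.argsort(scores)[-20:]]           # keep the top 10%
    p = 0.9 * p + 0.1 * elite.mean(axis=0)              # shift distribution toward elites
```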

daveguy · a year ago
I am familiar with NEAT; it was very exciting when it came out. But NEAT does not use backpropagation or single-network training at all. The genetic algorithm combines static neural networks in an ingenious way.

Several years prior, in undergrad, I talked to a professor about evolving network architectures with GA. He scoffed that squishing two "mediocre" techniques together wouldn't make a better algorithm. I still think he was wrong. Should have sent him that paper.

IIRC NEAT wasn't SOTA when it came out, but it is still a fascinating and effective way to evolve NN architecture using genetic algorithms.

If OP (or anyone in ML) hasn't studied it, they should.

https://en.m.wikipedia.org/wiki/Neuroevolution_of_augmenting... (and check the bibliography for the papers)

Edit: looking at the continuation of NEAT, it looks like they focused on control systems, which makes sense. The evolved network structures are relatively simple.
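
For readers who want the flavor without the papers: NEAT itself evolves topologies, with speciation and historical markings, but even a bare weight-evolution GA on a fixed tiny network shows the backprop-free idea, here on XOR (the classic NEAT demo task). All hyperparameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0.0, 1.0, 1.0, 0.0])  # XOR targets

def forward(w, x):
    # Fixed 2-4-1 topology; the genome w is just the flattened weights.
    W1, b1 = w[:8].reshape(4, 2), w[8:12]
    W2, b2 = w[12:16], w[16]
    return np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)

def fitness(w):
    return -sum((forward(w, x) - y) ** 2 for x, y in zip(X, Y))

pop = rng.normal(size=(50, 17))  # population of weight genomes
for _ in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]    # select the fittest genomes
    children = parents.repeat(4, axis=0) + 0.1 * rng.normal(size=(40, 17))
    pop = np.concatenate([parents, children])  # elitism + mutated offspring
```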

peheje · a year ago
Maybe a key innovation would be to apply backpropagation to optimize the crossover process itself. Instead of random crossover, compute the gradient of the crossover operation.

For each potential combination, "learn" (via normal backprop) how different ways of doing crossover impact overall network performance. Then use this to guide the selection of optimal crossover points and methods.

This "gradient-optimized crossover" would be a search process in itself, aiming to find the best way to combine specific parts of networks to maximize improvement of the whole. It could make "leaps", instead of small incremental steps, due to the exploratory genetic algorithm.

Has anything like this been tried?

pyinstallwoes · a year ago
Thermodynamic annealing over a density parameter space
sdwr · a year ago
Fantastic speculation here, explains a lot, and has testable hypotheses.

For example, there should be a relationship between rate of learning and the physical subcolumns - we should be able to identify when a single column starts up / is fully trained / is overused

Or use AI to try to mirror the learning process, creating an external replica that makes the same decisions as the person

Marvin Minsky was spot on about the general idea 50 years ago, seeing the brain as a collection of 1000s of atomic operators (society of mind?)

calepayson · a year ago
> Fantastic speculation here, explains a lot, and has testable hypotheses.

Calvin is the man.

> For example, there should be a relationship between rate of learning and the physical subcolumns - we should be able to identify when a single column starts up / is fully trained / is overused

This sounds super interesting. Could you break down what you're thinking here?

> Marvin Minsky was spot on about the general idea 50 years ago, seeing the brain as a collection of 1000s of atomic operators (society of mind?)

I'm very much an amateur in this field and was under the impression that Minsky was also trying to break it up, but then trying to specify each of those operations. What I find so enticing about Neural Darwinism is the lack of specification needed. Ideally, once you get the underlying process right, there's a cascade of emergent properties.

Using the example of a murmuration of starlings: I picture Minsky trying to describe phase transitions between every possible murmuration state. On the other hand, I see Neural Darwinism as an attempt to describe the behavior of a single starling, which can then be scaled to thousands.

Let me know if that's super wrong. I've only read second hand descriptions of Minsky's ideas, so feel free to send some homework my way.

breck · a year ago
> I've only read second hand descriptions of Minsky's ideas, so feel free to send some homework my way.

Here you go: https://breckyunits.com/marvin-minsky.html

I think you are right in that Minsky was missing some important details in the branches of the tree, particularly around cortical columns, but he was old when Hawkins and Numenta released their stuff.

In terms of the root idea of the mind being a huge number of concurrent agents, I think he was close to the bullseye and it very much aligns with what you wrote.

jcynix · a year ago
Regarding Minsky: the most interesting thoughts I've read about theories of the mind are in his books, namely The Society of Mind and The Emotion Machine, which should be more widely known.

More of Minsky's ideas on “Matter, Mind, and Models” are mentioned here: https://www.newyorker.com/magazine/1981/12/14/a-i

And let's not forget Daniel Dennett: In “Consciousness Explained,” a 1991 best-seller, he described consciousness as something like the product of multiple, layered computer programs running on the hardware of the brain. [...]

Quoted from https://www.newyorker.com/magazine/2017/03/27/daniel-dennett...