AIorNot · a year ago
What’s wild to me is that Donald Hoffman is also proposing a similar foundation for his metaphysical theory of consciousness, i.e. that consciousness is a fundamental property, that it exists outside of spacetime, and that it gives rise, via a Markov chain of conscious agents (in a network like the one described above), to what we perceive.

I.e., everything that exists may be the result of some kind of uber-network existing outside of space and time.

It’s a wild theory, but the fact that these networks keep popping up and recurring, level upon level, wherever agency and intelligence are needed is crazy.

https://youtu.be/yqOVu263OSk?si=SH_LvAZSMwhWqp5Q

optimalsolver · a year ago
>it exists outside of spacetime

So I guess this theory won't be subject to empirical testing any time soon?

downboots · a year ago
A universe where everything can be empirically tested would be conveniently and suspiciously human-centered
jungturk · a year ago
Perhaps "outside" means "resident on the boundary of", a la holograms and Maldacena's AdS/CFT work, and so still within reach of experiment?
mistermann · a year ago
I don't think one even needs "supernatural" explanations.

1. Consider the base hardware of each agent:

http://neuropathologyblog.blogspot.com/2017/06/shannon-curra...

2. Consider that (according to science, anyway) there is no central broadcaster of reality (this is at least plausible)

3. Consider that each agent (often/usually) "knows" all of reality, or at least any point you query them about. For sure, all agents claim to know the unknowable, regularly; I have yet to encounter one who can stop a "powerful" invocation of #3, or even try (the option seems literally unavailable), though minor ones can be overridden fairly trivially. (I can think of two contrasting, interesting lines of consideration based on this detail, one of them extremely optimistic and trivially plausible.)

Simplified: what is known to be, is (locally).

4. Consider the possibility (or assume as a premise of a thought experiment) that reality and the universe are not exactly the very same thing ("it exists outside of spacetime"), though it may appear that they are (see #3)

Is it not fairly straightforward what is going on?

A big part of the problem is that #3 is ~inevitably[1] invoked whenever such things are analyzed, which corrupts the analysis and thus renders the theory necessarily "false" (it "is" false... though this will typically not be asserted explicitly, and direct questions will be ignored or dodged).

[1] Which is... weird (the inevitable part). It is as if consciousness is ~hardwired to disallow certain inspection (highly predictable evasive actions are invoked in response), something that can easily be tested/demonstrated.

AIorNot · a year ago
Can you explain #3 and #4 more clearly?

In #2 you are claiming there is no objective reality, or no 'broadcaster' of reality.

We must assume some things are objective, such as a rational universe, in order to make any claims at all.

If you are saying in #3 that humans, as conscious agents, make subjective claims about reality, but that those claims are in fact 'the reality' for that agent or person, then that is itself a subjective claim. (I'm not saying that subjective reality isn't true for that person.)

Also, Hoffman doesn't make a 'supernatural' claim per se. His claim is simply that reality as 'we all see it' is NOT the whole story, and that it is in fact only the projection of a vast, infinitely complex network of conscious agents that creates what we perceive as the material universe and time. He starts with the idea that consciousness is fundamental as a property, existing outside of space and time, and that if you apply reasoning and mathematics under that assumption, networks of agents acting as UANs in a sense project the material universe into being; i.e., the model extrapolates to our entire universe.

I'm not sure I (or anyone, for that matter) am really qualified to answer that claim... it's so big that it verges on mysticism. That's why I said it's such a wild idea, but I found the article above another interesting piece of evidence for Hoffman, because it talks about a general theory underlying such networks:

"...the repeated and recursive evolution of Universal Activation Networks (UANs). These networks consist of nodes (Universal Activators) that integrate weighted inputs from other units or environmental interactions and activate at a threshold, resulting in an action or an intentional broadcast"

I.e., this is very similar to Hoffman's system of conscious agents, which is an extreme theory of such networks, as I described above.

https://evolutionnews.org/2023/10/eccentric-theories-of-cons...

cscurmudgeon · a year ago
Why is #3 obvious? How can agents know all of reality? Maybe a subset?
humansareok1 · a year ago
We've already invalidated hidden variable theories in Physics so I find it hard to believe consciousness has some separate class of hidden effects still undiscovered and allowable in our universe.
codethief · a year ago
> We've already invalidated hidden variable theories in Physics

Not quite, see e.g. https://en.wikipedia.org/wiki/De_Broglie%E2%80%93Bohm_theory

naasking · a year ago
> We've already invalidated hidden variable theories in Physics

No we haven't.

AIorNot · a year ago
I think Hoffman's path to the idea of consciousness being fundamental comes from a few conceptual leaps; let me go through each one at a high level:

1. Current physics shows via quantum mechanics that spacetime has a definite limit in measurement (the Planck scale)

2. Relativity also imposes a similar limit on our ability to measure time and space (infinite energy/black holes)

3. The latest work in high-energy physics has led to some interesting new findings (in the last 10 years or so) regarding an approach to calculating particle scattering amplitudes in supercolliders. That is: when you apply nonlocal assumptions and certain mathematical simplifications, the new approach simplifies the scattering amplitude calculations, and it also happens to map onto a new conceptual framework where you think “outside of space and time.” You then arrive at a static geometric “structure” of immense complexity (call one conception of that geometry the ‘amplituhedron’) which encodes the universe itself; this polytope encodes the scattering amplitudes.

For more on this, see: https://youtu.be/6TYKM4a9ZAU?si=alGV5ThrCdBKcyfJ (an hour-long lecture by physicist Nima Arkani-Hamed)

Short version: https://www.ias.edu/ideas/nima-arkani-hamed-amplituhedron

4. Given the hard problem of consciousness, i.e. “only awareness is aware” (we cannot break down the qualia of awareness)... Now this is where Hoffman goes wild. Hoffman says: “OK, well, given that spacetime as we know it is doomed (not fundamental; again, see point 3 above),” let’s propose that consciousness is defined to be FUNDAMENTAL and that it exists as a ‘network of conscious agents’:

I.e., he proposes a “formal model of consciousness based on a mathematical structure called conscious agents.” He then proposes how time and space emerge from the interactions of conscious agents via the structure mentioned in point 3 above.

Hoffman then claims his math for these models implies that we are in a universe that emerged out of fundamental consciousness, and that he is working on a mathematical model he hopes can be tied to the new physics that emerges out of the amplituhedron through networks of these agents.

5. Finally, it was my observation that the general theory of neural networks in the article has some interesting similarities with all of this (i.e., maybe Nature uses such networks at all scales to represent intelligence).

Feel free to be skeptical (I am), but I get all sorts of weird feelings that he’s onto something here…

rdlecler1 · a year ago
I don’t know if this exists outside of spacetime, but I have a suspicion that UANs didn’t begin with gene regulatory networks, but are a more fundamental part of a computational-universe hypothesis.
kovezd · a year ago
Category theory is the mathematical formulation/foundation of this "Uber Network".

Graphs are the most basic unit of meaning.

raidicy · a year ago
I am a hobby student of category theory. Is there any breadcrumbs to your comment?
rdlecler1 · a year ago
No, graphs are too inclusive.
pyinstallwoes · a year ago
So the gnostics were right? Demiurge spatial-temporal firewall of reality nodes
quetzthecoatl · a year ago
Weren't Sophia/gnosis, emanations, and aeons from Greek philosophy? Also any philosophy/hot takes that stress duality (what's seen here vs. what's out there causing what's seen here), such as Manichaeism, Advaita, etc.
winter_blue · a year ago
This is a pretty cool theory that resonates well with me. What are some good places I can read more about this (and related theories)?
rdlecler1 · a year ago
This sits in a larger field of complexity theory and complex adaptive systems. There was also some interesting work on “Artificial Life,” although that research program seems to have fallen out of favor. My introduction in 1995 was the book Chaos, and then Stuart Kauffman’s At Home in the Universe. Wolfram’s A New Kind of Science was also interesting.
CuriouslyC · a year ago
This is just Berkeley's idealism with a bunch of pseudoscientific hand-waving.

Consciousness isn't outside of space and time, it creates it.

lumost · a year ago
The existence of a universal function approximator or function representation is not particularly unique to neural networks. Fourier transforms can represent any function as a (potentially) infinite vector on an orthonormal basis.

What would be particularly interesting is if there were a proof that some universal approximators are more parameter-efficient than others. The simplicity of the neural representation would suggest that it may be a particularly useful, if inscrutable, approximator.
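To make the Fourier point above concrete, here is a minimal sketch (my own illustration, not from the article) of a truncated Fourier series acting as a universal approximator: projecting a target function onto an orthonormal sine/cosine basis and checking that more basis terms give a better fit. The target function and term counts are arbitrary choices.

```python
import numpy as np

# Sketch: a truncated Fourier series as a universal approximator on [-pi, pi].
# Coefficients are projections onto the cosine/sine basis, computed here by
# a simple Riemann sum over a fine grid.
def fourier_approx(f, x, n_terms):
    grid = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
    fx = f(grid)
    approx = np.full_like(x, fx.mean())  # constant (a0) term
    for k in range(1, n_terms + 1):
        ak = 2 * (fx * np.cos(k * grid)).mean()
        bk = 2 * (fx * np.sin(k * grid)).mean()
        approx += ak * np.cos(k * x) + bk * np.sin(k * x)
    return approx

f = np.abs  # arbitrary target function
x = np.linspace(-np.pi, np.pi, 1000)
err5 = np.max(np.abs(fourier_approx(f, x, 5) - f(x)))
err50 = np.max(np.abs(fourier_approx(f, x, 50) - f(x)))
assert err50 < err5  # more basis terms -> better approximation
```

The question in the comment then becomes: for a given error budget, how many Fourier terms versus how many neural-network parameters do you need?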

rdlecler1 · a year ago
I'm arguing that this approximator is necessary (not sufficient) for this class of networks. I've proposed some conjectures on what we might expect to see, but there are certainly other salient ingredients and common principles that we haven't discovered, and I think it's important to hunt for them.
lumost · a year ago
Oh absolutely, the article gave me quite a bit to think about. It wasn't until I sat down and tried swapping a Fourier transform/representation into the conjectures that I was able to think critically on the topic.

I suspect that the pruning operation is useful to consider mathematically. A Fourier transform is a universal approximator, but it only has useful approximation power when the basis vectors have eigenvalues which are significant for the problem at hand (PCA). If NNs replace that condition with a topological sense of utility, then that is a major win (if formalized).

LarsDu88 · a year ago
There are a whole lot more activation functions used nowadays in NNs

https://dublog.net/blog/all-the-activations/

The author is extrapolating way too much. The simplest model of X is similar to the simplest model of Y, therefore the common element is deep and insightful, rather than mathematical modelers simply being rationally parsimonious.

cventus · a year ago
Nice list and history of common activation units used today.

Small note though: the Heaviside function used in the perceptron is non-linear (it can tell you which side of a plane the input point lies on), and a multi-layer perceptron could classify the red and blue dots in your example. But it cannot be used with back-propagation because its derivative is zero everywhere except at f(0), where it's non-differentiable.
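As a quick illustration of the point above (my own sketch, with hand-picked weights rather than trained ones): a two-layer network of Heaviside threshold units can compute XOR, the classic example a single linear threshold unit cannot separate.

```python
import numpy as np

# Sketch: a two-layer perceptron with Heaviside (step) activations classifying
# XOR. Weights are hand-picked, since the zero-almost-everywhere derivative of
# the step function gives backprop nothing to work with.
step = lambda z: np.heaviside(z, 1.0)  # step(0) defined as 1 here

def xor_mlp(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are on
    return step(h_or - h_and - 0.5)  # OR and not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert xor_mlp(a, b) == (a ^ b)
```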

LarsDu88 · a year ago
I think I should clarify... A multilayer perceptron can classify the red and blue dots if it uses a non-linear activation function for some or most of its layers, correct?

If it's perceptrons all the way down, it will fundamentally reduce to a linear function or a single linear layer and will not be able to classify the dots.

So there are two downsides: not being able to linearly separate certain datasets, and the inability to adjust weights or thresholds by the difference between expected and observed data (e.g., using backpropagation).
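The collapse mentioned above is easy to demonstrate (a minimal sketch of my own, with made-up dimensions): composing two purely linear layers is exactly equivalent to a single linear layer whose weight matrix is the product of the two.

```python
import numpy as np

# Sketch: stacking purely linear layers collapses to one linear layer,
# so depth adds no expressive power without a non-linear activation.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # layer 1: 3 inputs -> 4 hidden
W2 = rng.standard_normal((2, 4))  # layer 2: 4 hidden -> 2 outputs
x = rng.standard_normal(3)

deep = W2 @ (W1 @ x)       # two "layers" applied in sequence
collapsed = (W2 @ W1) @ x  # one equivalent linear layer
assert np.allclose(deep, collapsed)
```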

LarsDu88 · a year ago
Thanks for the clarification. I'll update the post!
rdlecler1 · a year ago
Activation functions are implementation details. See appendix for the general formula.
LarsDu88 · a year ago
Ok, I get what you mean now. You can build a model by plugging in any activation function into the two slots in the equation at the bottom.

There's a typo in the activation function next to "otherwise" in the "Ant Pheromone Signaling" row.

AndrewKemendo · a year ago
This is another example of Markov chains in the wild; that's what he's seeing.

The general NN is a discrete implementation of that.

https://en.m.wikipedia.org/wiki/Markov_chain

rdlecler1 · a year ago
No, too inclusive.
rdlecler1 · a year ago
Despite vast implementation constraints spanning diverse biological systems, a clear pattern emerges: the repeated and recursive evolution of Universal Activation Networks (UANs). These networks consist of nodes (Universal Activators) that integrate weighted inputs from other units or environmental interactions and activate at a threshold, resulting in an action or an intentional broadcast. Minimally, Universal Activator Networks include gene regulatory networks, cell networks, neural networks, cooperative social networks, and sufficiently advanced artificial neural networks.

Evolvability and generative open-endedness define Universal Activation Networks, setting them apart from other dynamic networks, complex systems, or replicators. Evolvability implies robustness and plasticity in both structure and function, differentiable performance, inheritable replication, and selective mechanisms. They evolve, they learn, they adapt, they get better, and their open-endedness lies in their capacity to form higher-order networks subject to a new level of selection.
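As I read the definition above, the node model can be sketched in a few lines (my illustration; the weights, inputs, and threshold here are made up): a Universal Activator integrates weighted inputs and fires all-or-nothing when the sum crosses a threshold.

```python
import numpy as np

# Sketch of a single "Universal Activator" node as described above:
# integrate weighted inputs, activate at a threshold. All values are
# illustrative, not from the article.
def universal_activator(inputs, weights, threshold):
    drive = float(np.dot(weights, inputs))  # integrate weighted inputs
    return 1 if drive >= threshold else 0   # all-or-nothing activation

# Two active inputs push the node over threshold; one does not.
assert universal_activator([1, 1, 0], [0.6, 0.6, 0.6], 1.0) == 1
assert universal_activator([1, 0, 0], [0.6, 0.6, 0.6], 1.0) == 0
```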

RaftPeople · a year ago
Thoughts:

> 2-UANs operate according to either computational principles or magic.

Given that quantum effects do exist, does this mean that the result of quantum activity is still just another physical input into the UAN and does not change the analysis of what the UAN computes? It seems difficult to think that what a UAN computes is not impacted by those lower level details (meaning specifically quantum effects, I'm not thinking of just alternate implementations).

> 4-A UANs critical topology, and its implied gating logic, dictate its function, not the implementation details.

Dynamic/short term networks in brain:

Neurons in the brain are dynamically inhibited and excited by various factors, including brain waves, which suggests they are dynamically shifting between different networks on the fly. I assume that when you say topology, you're not really thinking in terms of the static physical topology, but more of the current logical topology that may be layered on top of the physical?

Accounting for Analog:

A neuron's function is heavily influenced by its current analog state; how is that accounted for in the formula for the UAN?

For example, activation at the same synapse can trigger either an excitatory postsynaptic potential or an inhibitory postsynaptic potential, depending on the concentration of permeant ions inside and outside the cell at that moment.

I'm assuming a couple possible responses might be:

1-Even though our brain has analog activity that influences the operation of cells, there is still an equivalent UAN that does not make use of analog.

or

2-Analog activity is just a lower level UAN (e.g. atom/molecule level)

I don't think either of those are strong responses. The first triggers the question: "How do you know and how do you find that UAN?". The second one seems to push the problem down to just needing to simulate physics within +/- some error.

kaibee · a year ago
> Given that quantum effects do exist, does this mean that the result of quantum activity is still just another physical input into the UAN

Yeah, it could be a spurious input, though. My understanding is that quantum mechanics doesn't really matter at biological scales, and that kind of makes sense, right? Like, if this whole claim about biology being reducible to the topology of the components of the network is true, then the first thing you'd expect is for evolution to produce components that are robust to quantum noise, or that leverage it for some result (i.e., one can imagine some binding site constructed in such a way that it requires a rare event that nonetheless has a very specific probability of occurring).

> and does not change the analysis of what the UAN computes? It seems difficult to think that what a UAN computes is not impacted by those lower level details (meaning specifically quantum effects, I'm not thinking of just alternate implementations).

What the UAN computes is impacted by those lower level details, but it is abstractable given enough simulation data.

I.e., imagine you had a perfect molecular scan of a modern CPU that detailed the position of every atom. While it would be neat to simulate it physically, for the purpose of analysis you'd likely want to abstract it at least to the transistor level. The 'critical topology' is, I guess, the highest possible level of abstraction at which a CPU tester still can't tell your simulation from an atom-level simulation.

Now for CPUs, we designed that model first and then built the CPU. In biology, it evolved on the physical level, but still maps to a 'critical topology'.

t_serpico · a year ago
"Topology is all that matters" --> bold statement, especially when you read the paper. The original authors were much more reserved in terms of their conclusions.
griffzhowl · a year ago
Yes, on its face it looks like he's saying that you can throw out the weights of any network and still expect the same or similar behaviour, which is obviously false. It's also contradicted in that very section, where he reports from the cited paper that randomized parameters reproduced the desired behaviour in about 1 in 200 cases. All these cases have the same network topology, so while that might be a higher-than-expected probability of retaining function with randomized parameters (over 2-3 orders of magnitude), it's also a clear demonstration that more than topology is significant.
rdlecler1 · a year ago
The topology needs to be information-bearing. Weights of 0.0001 are likely spurious, and if other weights are relatively large enough, they can effectively make the remaining fan-in weights spurious as well.
rdlecler1 · a year ago
The original papers were published in scientific journals. More assertive claims aren’t kosher.
sixo · a year ago
God this grandiose prose style is insufferable. Calm down.

Anyway, this doesn't even try to make the case that that equation is universal, only that "learning" is a general phenomenon of living systems, which can probably be modeled in many different ways.

cfgauss2718 · a year ago
Agreed, I can’t help but feel there is some overcompensation driving the style of writing. It was difficult to finish.
rdlecler1 · a year ago
You’re right. Writing is hard—especially when you’re cutting across disciplines. I wasn’t happy with the writing, but I stand by the claims.
sharp11 · a year ago
Personally, I find the writing to be just fine. It is clear and cogent. I don’t have enough background to follow all the details, but I certainly hope you are not discouraged from pursuing big ideas by negative comments on style!
proof_by_vibes · a year ago
The excitement of new horizons is necessary for innovation, and a substack article is a safe way to express that excitement. It's clearly understood by the choice of medium that this is meant to be speculation, so there aren't any significant risks in engaging with the text on its own terms.
ai4ever · a year ago
architecture astronauts let loose on unified field theories.. talking warm and fuzzy - big bold ideas.

let them, i say, until the tide shifts to something else tomorrow, and a new generation of big-picture thought leaders takes over, dumping their insufferable text on the populace.

grape_surgeon · a year ago
Yeah my bs meter went off in seconds. So much fluff
downboots · a year ago
Can you share the source code? (Half joking)
Imnimo · a year ago
How does the attention operator in transformers, in which input data is multiplied by input data (as opposed other neural network operations in which input data is multiplied by model weights) fit into the notion of a universal activator?
rdlecler1 · a year ago
This is a great question, and I don't yet have an answer. I'm going to butcher this description, so please be charitable, but functionally, the attention mechanism reduces the dimensions and uses the coincidence between the Q and K linear layers to narrow down to a subset of the input, and then the softmax amplifies the signal.

One unsatisfying argument might be that this might fall into implementation details for this particular class. Another prediction might be that an attention mechanism is an essential element of these networks that appears in other networks of this class. Another is that this is a decent approximation, but has limitations, and we'll figure out how the brain does it and replace it with that.
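For readers following this subthread, here is a minimal sketch of the scaled dot-product attention being discussed (my reading of the standard transformer mechanism, not the article's formulation): scores come from multiplying input-derived Q and K, which is data-times-data rather than data-times-weights, and softmax amplifies the strongest matches before mixing the values.

```python
import numpy as np

# Minimal sketch of scaled dot-product attention. Dimensions and the
# projection matrices are arbitrary illustrative choices.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # data multiplied by data
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 8))  # 5 tokens, 8 dims
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = attention(X @ Wq, X @ Wk, X @ Wv)  # Q, K, V all derived from input X
assert out.shape == (5, 8)
```

The Q/K/V projections themselves are still ordinary weight multiplications; it is only the Q·K score step that multiplies input-derived quantities together, which is what makes it hard to place in the universal-activator template.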