I’m not a fan of Extropic, but I’m seeing a lot of misconceptions here.
They’re not building “a better rng”; they’re building a way to bake probabilistic models into hardware and then run inference on them using random fluctuations. Theoretically this means much faster inference for things like probabilistic graphical models (PGMs).
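To make that concrete, here's a rough sketch of the kind of sampling loop such hardware would effectively do in physics: Gibbs sampling over a toy Ising-style model, where every update is "draw a random bit conditioned on your neighbours". This is plain NumPy with made-up couplings, not anything from Extropic's actual stack:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy Ising-style model: 8 binary spins, random symmetric couplings and biases.
    n = 8
    J = rng.normal(scale=0.5, size=(n, n))
    J = (J + J.T) / 2
    np.fill_diagonal(J, 0.0)
    h = rng.normal(scale=0.1, size=n)

    def gibbs_sample(steps=5000):
        """Draw samples ~ exp(-E(s)) by resampling one spin at a time."""
        s = rng.choice([-1, 1], size=n)
        samples = []
        for _ in range(steps):
            for i in range(n):
                field = J[i] @ s + h[i]                    # local field felt by spin i
                p_up = 1.0 / (1.0 + np.exp(-2.0 * field))  # Boltzmann flip probability
                s[i] = 1 if rng.random() < p_up else -1
            samples.append(s.copy())
        return np.array(samples)

    print("mean magnetisation:", gibbs_sample().mean(axis=0))

The pitch, as I understand it, is that the inner step (generate a conditioned random value) is exactly what a noisy physical device gives you for free, instead of burning digital cycles on it.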
Skimmed the litepaper. Has the flavor of: you can do "simulated" annealing by literally annealing. I like the idea of using raw physics as a "hardware" accelerator, i.e. analog computing. fwiw, quantum computing can be seen as a form of analog computing.
I do think that a "better rng" can be interesting and useful in and of itself.
Thanks for the Normal Computing post, it felt more substantial.
We experimented with doing ML training with it, but it's not clear that it trains any better than a non-broken PRNG. It might be fun to feed the output into stable diffusion and see how cool the pictures are, though.
It did make me curious, however: if we dropped the requirement that operations return correct values in favor of probably-correct values, would we see any material computing gains in hardware? Large neural models are intrinsically error-correcting and stochastic.
I’m unfortunately not familiar enough with hardware to weigh in.
The trouble is if you use actual randomness then you lose repeatability which is an incredibly useful property of computers. Have fun debugging that!
What you want is low precision with stochastic rounding. Graphcore's IPUs have that and it's a really great feature. It lets you use really low precision number formats but effectively "dithers" the error. Same thing as dithering images or noise shaping audio.
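For anyone unfamiliar: stochastic rounding rounds up or down with probability proportional to how close the value is to each grid point, so the error is zero in expectation instead of a systematic bias. A minimal sketch of the idea in NumPy (not Graphcore's actual implementation, obviously):

    import numpy as np

    rng = np.random.default_rng(42)

    def stochastic_round(x, step=1/8):
        """Round x onto a grid of spacing `step`, randomly up or down so that
        the expected result equals x (the error becomes noise, not bias)."""
        scaled = x / step
        lower = np.floor(scaled)
        frac = scaled - lower                # distance above the lower grid point
        return (lower + (rng.random(x.shape) < frac)) * step

    x = np.full(100_000, 0.3)                # 0.3 is not representable on this grid
    print(stochastic_round(x).mean())        # ~0.3; round-to-nearest gives 0.25 every time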
I wouldn't want to write this off because you get the feeling these guys are on to something that could be hugely important (ignoring quantum this thermodynamic that) - but surely it feels like they need to get to the point a lot faster e.g.
"We're taking a new approach to building chips for AI because transistors can't get any smaller."
I really don't know what they gain by convoluting the point and it's pretty hard to follow what the CEO is talking about half the time.
Quantum computing people have been selling this exact spiel (including the convoluted talking points) for decades and it keeps working at getting funded. It has not produced any results for the rest of us, though.
One difference is that baking mathematical models into electronic analogs is older than the integrated circuit. We moved away from that model because re-programmable, general-purpose digital computers were far more economical than expensive, temperamental, single-purpose analog machines; the unit economics basically killed analog computing. What Extropic (and others) have identified is that, in the case of machine learning, the pendulum might have to swing back, because we do have a large-scale need for bespoke hardware. We'll see if they're right.
Quantum computing has been exploring an entirely new model of computation for which it's hard to even articulate the problems it can solve. Whereas using analog computers in place of digital is already well defined.
The tech could be really cool if e.g. classifiers could be represented within the probability space modeled on their hardware. However, their shaman-speak isn't confidence-inspiring.
Your summary seems to miss a later quote from the article:
> Extropic is also building semiconductor devices that operate at room temperature to extend our reach to a larger market. These devices trade the Josephson junction for the transistor. Doing so sacrifices some energy efficiency compared to superconducting devices. In exchange, it allows one to build them using standard manufacturing processes and supply chains, unlocking massive scale.
So, their mass-market device is going to be based on transistors.
The actual article read like a weird mix of techno-babble and startup evangelism to me. I can't judge whether what they are suggesting is vaporware or hyperbole. This is one of those cases where they are either way ahead of my own thinking or they are trying to bamboozle me with jargon.
I personally find it hard to sort a lot of AI hype into "worth actually looking into" vs. "total waste of time". The best I can do in this case is suspend judgement; if they come back with something more substantive than a rambling post, I can always readjust.
Am I the only one who thought the article was clear, lucid, and reasonably concise?
The company's success or failure will depend on execution, but the value proposition is quite sound. Maybe I've just spent too much time in the intersection between information theory, thermodynamics, and signal processing...
"Don't splurge on high SNR ('digital') hardware just to re-introduce noise later." == "Don't dig a hole and fill it in again. You waste energy twice!"
> Doing so sacrifices some energy efficiency compared to superconducting devices.
In most applications superconductivity does not actually yield better energy efficiency at the system level, since it turns out that cooling things to a few hundred degrees below zero is quite energy-demanding.
I don't disagree. I just come away from the article feeling more confused as opposed to enlightened and excited about what they're building.
It even makes me think that they don't understand what they're talking about, and that's why they're using complicated terminology to mask it. But I'm hopeful I'm wrong and this is an engineering innovation that benefits everyone.
Since they're building a special-purpose accelerator for a certain class of models, what I'd like to see is some evidence that those models can achieve competitive performance (once the hardware is mature). Namely, simulate these models on conventional hardware to determine how effective they are, then estimate what the cost would be to run the same model on Extropic's future hardware.
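What I mean, roughly: pick a representative model from their class (here a small RBM-style block Gibbs sampler as a stand-in), measure what it costs on conventional hardware, and then see what efficiency gain the new hardware would actually have to deliver. The wattage and speedup below are hypothetical placeholders, not anything Extropic has claimed:

    import time
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in workload: block Gibbs sweeps over a small RBM-like bipartite model.
    nv, nh = 256, 256
    W = rng.normal(scale=0.05, size=(nv, nh))
    v = (rng.random(nv) < 0.5).astype(float)

    def sweep(v):
        ph = 1 / (1 + np.exp(-(v @ W)))          # hidden units given visible
        h = (rng.random(nh) < ph).astype(float)
        pv = 1 / (1 + np.exp(-(W @ h)))          # visible units given hidden
        return (rng.random(nv) < pv).astype(float)

    t0 = time.perf_counter()
    for _ in range(1000):
        v = sweep(v)
    secs = time.perf_counter() - t0

    CPU_WATTS = 65        # hypothetical CPU package power
    SPEEDUP = 1e4         # hypothetical efficiency gain, placeholder only

    print(f"CPU: {secs:.3f} s, ~{secs * CPU_WATTS:.1f} J for 1000 sweeps")
    print(f"Implied cost if the hypothetical gain holds: ~{secs * CPU_WATTS / SPEEDUP:.2e} J")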
Much, much better. The first minute or so explains what they are trying to do and why, in a way that I can understand.
This interview makes me much more excited and less skeptical than Verdon's usual mumbo-jumbo jargon. He should try using simpler, more humble language more often.
This interview makes their product seem like BS. First, they cannot explain the problem or the solution simply. Regardless, their pitch is that they're building a more power-efficient probability-distribution sampler, and no one in AI research thinks that's a bottleneck.
Edit: btw, the bottleneck in AI algos is matrix multiplication and memory bandwidth.
My take on the Garry Tan interview (which seems pretty clear, regardless of whether this is snake oil or not) is that Extropic is building low-power analog chips because we're hitting the limits of Moore's Law (the physical limits on shrinking transistors), and at the same time the power consumption of LLM/AI training and inference is starting to get out of hand.
So their solution is to embrace the stochastic operation of smaller chip geometries, where transistors become unreliable, and double down on it by running the chips at low power, where the stochasticity is even worse. They are using an analog chip design/architecture of some sort (presumably some sort of matmul equivalent?) and a "full-stack" design whereby they have custom software to run neural nets on their chips, taking advantage of the fact that neural nets can tolerate, and utilize, randomness.
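The "neural nets can tolerate randomness" part is easy to sanity-check on ordinary hardware: perturb the weights with noise, the way unreliable analog circuits would, and see how little the output changes. A toy illustration (random weights standing in for a trained network, so purely qualitative):

    import numpy as np

    rng = np.random.default_rng(1)

    # Tiny MLP with random weights, standing in for a trained network.
    W1 = rng.normal(size=(128, 64)) / np.sqrt(64)
    W2 = rng.normal(size=(10, 128)) / np.sqrt(128)

    def forward(x, noise=0.0):
        """Forward pass with optional multiplicative weight noise,
        mimicking unreliable low-power analog hardware."""
        w1 = W1 * (1 + noise * rng.normal(size=W1.shape))
        w2 = W2 * (1 + noise * rng.normal(size=W2.shape))
        h = np.maximum(0, w1 @ x)        # ReLU hidden layer
        return np.argmax(w2 @ h)         # predicted class

    x = rng.normal(size=64)
    clean = forward(x)
    agree = np.mean([forward(x, noise=0.02) == clean for _ in range(200)])
    print(f"agreement with the noiseless output at 2% weight noise: {agree:.0%}")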
Computationally, yes, those are the bottlenecks. But I would also add supervised training data: we can never get enough of it, and it is one of the few things that increases in compute cannot solve (you could argue that scaling unsupervised training further would let us do away with it, but I'm not yet convinced).
People need to read Hamming’s old papers in which he very clearly explains why analog circuits are not viable at scale. This is also why the brain uses spikes rather than continuous signals. The issue is noise, interference, and attenuation. There’s no way to get around this. If they have invented a way, I’d like to see it. But until it’s demonstrated, I’d take such things with a large grain of salt.
You can re-quantize analog signals into a finite number of levels to prevent noise accumulation. That's how TLC (8 levels) and QLC (16 levels) flash memory cells work. The cells store an analog value, but it's forced to a value close to one of N discrete values. The same approach is used in modems.
Deep learning doesn't seem to need that much numerical precision. People started with 32-bit floats, then 16-bit floats, now sometimes 8-bit floats, and recently people are talking up ternary weights (about 1.6 bits of information each). The number of levels needed may not be too much for analog. If you have a regenerator once in a while to slot values back onto the allowed discrete levels, you can clean up the noise. That's an analog-to-digital-to-analog conversion, of course.
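A minimal sketch of that regenerator idea: push a value through a chain of noisy "analog" stages, snapping it back to the nearest of N allowed levels after each stage, and the error stops accumulating. (The 16 levels and the noise figure here are made up for illustration.)

    import numpy as np

    rng = np.random.default_rng(7)

    LEVELS = np.linspace(-1, 1, 16)        # 16 allowed levels, like a QLC cell

    def regenerate(x):
        """Snap each value to the nearest allowed level (the ADC->DAC round trip)."""
        return LEVELS[np.abs(x[:, None] - LEVELS).argmin(axis=1)]

    signal = rng.choice(LEVELS, size=10_000)
    noisy, regen = signal.copy(), signal.copy()
    for _ in range(20):                    # 20 noisy analog stages in a row
        noise = rng.normal(scale=0.02, size=signal.shape)
        noisy = noisy + noise              # noise accumulates freely
        regen = regenerate(regen + noise)  # noise is cleaned up at every stage

    print("drift without regeneration:", np.abs(noisy - signal).mean())
    print("drift with regeneration:   ", np.abs(regen - signal).mean())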
That's not what these guys are talking about, as far as I can tell.
Not at the moment, but I do recall he has a chapter on this in his book “The Art of Doing Science and Engineering”, which I also recommend. He uses very long transmission lines to explain this, but the same thing applies at the nano scale, and perhaps to an even greater extent due to the much noisier environment and higher frequencies.
I really hope this was an experiment in using gen AI:
“Create a website for a new company that is building the next generation of computing hardware to power AI software. Make sure it sounds science-y but don’t be too specific.”
HN has always been a tense standoff between a few cliques, the first two being the ostensibly intended audience:
* competent and curious engineers
* entrepreneurs, who live on a continuum where one end is...
* ...hucksters and snake-oil purveyors, of which there are plenty, and
* (because this is the Internet) conspiracy theorists and other such loons
and recently
* political provocateurs
You can make a thread work (for that group of people) if it self-selects who reads it. Unfortunately, AI is catnip to all five of these groups, so the average thread quality is exceptionally low – it serves all five groups badly.
Whether some of these people _should_ be served well is a separate question.
The use of “full-stack” was the first thing I noticed. Everyone, please stop using that term; I’m pretty sure you don’t know what it means. If you do, there’s a merit badge waiting for you. And can we please stop using “hallucinations” to describe the output? Yes, it may look like your tool dropped acid, but that’s not what it is.
I now think of the "stack" of a modern business as starting with physics and ending with making someone happy (unless you are Oracle). Full-stack engineers should then know how to connect physics to peoples' happiness.
At a high level it is the right answer to the data-center electricity demand problem, which is that we need to make AI hardware more efficient.
Pragmatically, it doesn't make much sense, given that it would take years for this approach to have any real-world use cases even in a best-case scenario. It seems far more likely that efficiency gains in digital chips will happen first, making these chips less economically valuable.
For the longest time I thought the person behind the account was just some random guy who was probably very into crypto and decided to dabble in AI because of the parallels between e/acc and the whole "to the moon" messaging you find in crypto communities.
Never would have guessed the guy was an actual physicist.
I have a hard time believing this is legit, given how much time the CEO spends goofing around on social media. If it were possible to short startups, this would be a top candidate.
Honestly, it's too early to say. Considering the people who invested in this startup, it's better to assume the CEO is capable. If he isn't able to deliver on a reasonable timeline, then we're all free to blame him for posting things on social media. Actually, many people know of his company precisely because he's goofing around on social media, especially the e/acc stuff.
See here for similar things: https://arxiv.org/abs/2108.09836
There’s a company called Normal Computing that did something similar: https://blog.normalcomputing.ai/posts/2023-11-09-thermodynam...
Is there any evidence that such a probabilistic model can perform better than a state-of-the-art model?
Or, alternatively, what would it take to convert an existing model (let's say an easy one like llama2-7b) into an Extropic model?
No, but they got $15M in seed funding anyway.
"We're taking a new approach to building chips for AI because transistors can't get any smaller."
I really don't know what they gain by convoluting the point and it's pretty hard to follow what the CEO is talking about half the time.
Quantum computing has been exploring an entirely new model of computation for which it's hard to even articulate the problems it can solve. Whereas using analog computers in place of digital is already well defined.
> Extropic is also building semiconductor devices that operate at room temperature to extend our reach to a larger market. These devices trade the Josephson junction for the transistor. Doing so sacrifices some energy efficiency compared to superconducting devices. In exchange, it allows one to build them using standard manufacturing processes and supply chains, unlocking massive scale.
So, their mass-market device is going to be based on transistors.
The actual article read like a weird mesh of techno-babble and startup-evangelism to me. I can't judge if what they are suggesting is vaporware or hyperbole. This is one of those cases where they are either way ahead of my own thinking or they are trying to bamboozle me with jargon.
I personally find it hard to categorize a lot of AI hype into "worth actually looking into" vs. "total waste of time". The best I can do in this case is suspend my judgement and if they come up again with something more substantive than a rambling post then I can always readjust.
Am I the only one who thought the article was clear, lucid, and reasonably concise?
The company's success or failure will depend on execution, but the value proposition is quite sound. Maybe I've just spent too much time in the intersection between information theory, thermodynamics, and signal processing...
"Don't splurge on high SNR ('digital') hardware just to re-introduce noise later." == "Don't dig a hole and fill it in again. You waste energy twice!"
In most applications superconductivity does not actually yield better energy efficiency at system level, since it turns out cooling stuff to negative several hundred degrees is quite energy demanding.
It even makes me think that they don't understand what they're talking about which is why they're using complicated terminology to mask it but I'm hopeful I'm wrong and this is an engineering innovation that benefits everyone.
https://twitter.com/Extropic_AI/status/1767203839818781085
https://m.youtube.com/watch?v=8fEEbKJoNbU&pp=ygUVbGV4IGZyaWR...
If it is a fraud, how do people like this get funded?? (And how can I be creepier so that my real ideas get funded)
Now it's hard to tell who put in any effort at all, reading or writing.
Would you consider your own response to be optimistic or high-effort?
Right. A better word is confabulation.
I.e., pseudomemories: replacing a gap in information with false information that is not recognized as such.
So much so that I wonder what the hell they're doing with this company. Is he a prolific poster and an engineering genius, or is he just another poster?
This whole pitch sounds like the usual quantum computing babble.