The Tesla Dojo Chip Is Impressive, but There Are Some Major Technical Issues

jvanderbot · 5 years ago

As Hamming suggested in "Art of doing science and engineering", when you want to make something autonomous, you usually have to build a completely different device that solves the same problem, rather than automating the same device.

I wonder. For all the money thrown into self-driving cars research, could we have had an autonomous rail system by now? The technology for mostly-autonomous rail is well understood. Most of the financial cost is in infrastructure to support the system. Seems to me self-driving cars try to short-circuit that infrastructure build-up. They try to "automate the device" rather than "producing an automated system that solves the problem of moving people and goods".

Specifically, I wonder whether, for the cost and time spent on CPU-and-engineer-driven research and development of autonomous cars, we could have had nationwide autonomous rail rolled out by now.

dragontamer · 5 years ago

> could we have had an autonomous rail system by now?

We already have autonomous rail systems. It's called positive train control and was fully implemented like a year or two ago (mandated in 2008, but you know how government works, lol) https://en.wikipedia.org/wiki/Positive_train_control

The train conductor has become more-and-more automated to remove the chance of human error. It works with a system of very reliable sensors that indicate where every train engine is on the rails.

Given the huge amount of cargo any particular train has, I don't think there's any intent to cut the last two humans (the conductor + engineer) out of their jobs. Their salary costs are minuscule compared to the safety value they deliver, even if the job of driving a train has been almost entirely automated away by now.

llsf · 5 years ago

Paris got its first autonomous metro line in 1998. It was developed in Ada. http://archive.adaic.com/projects/atwork/paris.html

Using the B method: https://en.wikipedia.org/wiki/B-Method

https://link.springer.com/content/pdf/10.1007%252F3-540-4811...

https://arxiv.org/pdf/2005.07190.pdf

All this was developed in the 80's and 90's. It would be interesting to see how this evolved. Obviously with a ML/AI approach it would be different now. Although there might be some ways to specify some constraints or boundaries to an AI system, for safety, comfort, physics, etc.

jvanderbot · 5 years ago

I wonder whether, for the cost spent on CPU-and-engineer-driven research and development of autonomous cars, we could have had nationwide autonomous rail rolled out.

samstave · 5 years ago

Remember that derailment/near derailment that happened when someone cut the electrical circuit connector cable on some rails, which then prevented the system from knowing where the train was?

zdragnar · 5 years ago

We could possibly have automated some existing rail, but I am not altogether certain that doing so would have led to any significant improvements in cost or efficiency.

Actually laying the infrastructure for mass transit via rail is an entirely different league of cost from what has been dumped into self driving cars.

We have a hard enough time agreeing on how to do light rail transit in places that want it, and then actually getting it done.

jvanderbot · 5 years ago

I'm not altogether convinced that the money and effort spent on self-driving cars has led or will lead to any significant improvements in cost or efficiency either.

Even if it does succeed, it seems to be about convenience anyway.

jeffbee · 5 years ago

A self-driving streetcar, that only has 1 degree of freedom, might be a practical problem to solve and maybe if it existed we would not have had to go through the 90-day outage of VTA light rail service that the Bay Area just experienced, an experience which in all likelihood killed VTA light rail forever.

deeviant · 5 years ago

Rail is already highly autonomous, with external monitoring systems integrated with vehicle control systems, but I'll just go on as if the distinction between what it is now and what we are calling "autonomous" is significant.

What problem does autonomous rail solve? The single driver is already a rounding error in total costs. Also, rail is already a controlled environment where collisions are much less likely to happen than road, so the fruit is much higher up the tree on that aspect too.

It seems to me that bringing autonomy to rail would have little effect on its bottom line.

vishnugupta · 5 years ago

> build a completely different device that solves the same problem

I realised this while discussing self-driving cars with my friends.

I used the example of Uber Eats. The problem statement is "I don't want to cook" and a reasonably acceptable solution IMO is cloud kitchens + delivery, as opposed to building a cooking robot.

Cloud kitchens could automate 80% of repeatable stuff because it makes sense to solve that problem at scale.

stcredzero · 5 years ago

Why not optimize Blue Apron and HelloFresh style meal kits so that they involve even less labor, then distribute the production to local commercial kitchens? There are already businesses that package this kind of prep work for restaurants.

whatshisface · 5 years ago

The labor saving advantages of carrying 100 people on the same vehicle are so enormous that there is little motivation if any to quit paying conductors and engineers.

samstave · 5 years ago

The profit motivation of having 100 people being required to buy gas is the anti-motivation in capitalist oligarchies

samstave · 5 years ago

I've always wondered if TCP and networking design could be applied to autonomous traffic... basically think of every car/train as a packet and ensure no collisions...

Which networking protocol best maps to this?

And what if we had smart traffic lights that were aware of every car in the surrounding area of an intersection...

I mean FFS certain tech companies track all vehicles that drive by/near their corporate campuses and report that back to the city...

And that's almost a decade old now...

So apply the same but report the data back to the traffic management system which is also trained on all the traffic patterns for a given intersection to best optimize for their patterns...

hamiltonkibbe · 5 years ago

For the most part the MAC just looks for another signal on the wire (another train on the same section of rail) and when it looks clear, starts transmitting (driving). As you can imagine, there will be cases when 2 MACs start talking at the same time, at which point they detect the collision, wait a random delay and try again. I wouldn’t want to be on that train; I’d prefer plain old serial with hardware flow control.
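To make the listen, transmit, collide, back off cycle described above concrete, here is a toy slotted simulation. It is only a sketch of the general CSMA/CD idea, not the real Ethernet MAC: the function name `simulate_csma_cd`, the slot model, and the cap of 10 on the backoff exponent are assumptions made for illustration.

```python
import random

def simulate_csma_cd(num_stations, max_slots=100_000, seed=0):
    """Toy slotted model of CSMA/CD-style contention (illustrative only).

    Every station has one frame (one "train") to send over a shared
    medium (one section of track). Returns the (slot, station) pairs in
    the order frames got through, plus the number of collisions.
    """
    rng = random.Random(seed)
    backoff = {s: 0 for s in range(num_stations)}   # slots left to wait
    attempts = {s: 0 for s in range(num_stations)}  # collisions seen so far
    delivered = []
    collisions = 0

    for slot in range(max_slots):
        if not backoff:
            break  # everyone has sent their frame
        # Stations whose wait has expired see an idle medium and transmit.
        ready = [s for s, wait in backoff.items() if wait == 0]
        # Everyone else counts down one slot of their backoff timer.
        for s in backoff:
            if backoff[s] > 0:
                backoff[s] -= 1
        if len(ready) == 1:
            # Exactly one sender: the frame gets through.
            delivered.append((slot, ready[0]))
            del backoff[ready[0]]
        elif len(ready) > 1:
            # Two or more senders: collision, each waits a random delay.
            collisions += 1
            for s in ready:
                attempts[s] += 1
                k = min(attempts[s], 10)  # assumed cap on the exponent
                backoff[s] = rng.randrange(2 ** k)
    return delivered, collisions

if __name__ == "__main__":
    order, crashes = simulate_csma_cd(4, seed=1)
    print("delivery order:", order)
    print("collisions before everyone got through:", crashes)
```

With packets a collision just costs a retry, which is why the scheme works on a wire and why nobody would want it on a track.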

rini17 · 5 years ago

A large part of TCP is dealing with dropped packets. Presumably you don't want to carry that over to rail/road traffic :)

fanf2 · 5 years ago

The London Docklands Light Railway (DLR) has had fully automatic running since 1987 https://en.wikipedia.org/wiki/Docklands_Light_Railway

HPsquared · 5 years ago

We could go the other way and have humanoid robot drivers that can get in and drive any car. Now that'd be difficult!

dragontamer · 5 years ago

Somehow, I'm reminded of the Tsar tank from WW1. The Russians knew that a new weapon of war, an armored car, was necessary to break the stalemate of trench warfare.

This hypothetical armored car needed many features: the most important was that it must be able to move across the muddy no man's land reliably.

Tests have shown that regular sized wheels would get stuck in the mud. A bigger wheel has more surface area and greater contact area. So the Russians built an armored car with the largest wheels possible. Russian tests were outstanding, the Tsar tank rolled over a tree !!!!

https://en.m.wikipedia.org/wiki/Tsar_Tank

The French design was to use caterpillar tracks. We know what works now since we have a century of hindsight.

--------

Spending the most money to make the biggest wheel isn't necessarily the path to victory. I think it's more likely that the tech (aka, caterpillar track equivalent) hasn't been invented yet for robotaxis. Hitting the problem with bigger and more expensive neural network computers doesn't seem to be the right way to solve the problem.

zaptrem · 5 years ago

I agree with your points on the robotaxi front, but there are many other problems that will totally benefit from a bigger training computer.

dragontamer · 5 years ago

But Tesla isn't a cloud-provider company, nor is it a hardware company. None of the technical specs, assembly language, API, SDKs or whatnot have been released for Dojo.

The model of "someone will find this training computer useful" is... fine. Google TPUs, NVidia DGX, Intel Xe-HPC, AMD MI100, Cerebras wafer scale AI. These are computers that nominally are aiming for the market of selling computers / APIs / SDKs that will make training easier.

It's a pretty crowded field. Someone probably has struck gold (NVidia has a lead but... it's still anyone's game IMO)

-------

If Tesla's goal is to compete against everyone else (or make a chip that's cost-competitive with everyone else), Tesla needs more volume than (allegedly) 3000 chips (quoted from the article: I dunno where they got this figure but... there's no way in hell 3k chips is cost-effective).

That's the name of the game: volume. NVidia leads because NVidia sells the most GPUs right now, which means their R&D costs are spread across the broadest base, and the engineering costs of the companies using them (aka: CUDA training) are spread across the largest number of programmers, leading to a self-reinforcing cycle of better hardware, lower costs, and a larger community of programmers to learn from.
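The volume argument is just amortization arithmetic: per-chip cost is the one-time fixed cost divided by volume, plus the marginal cost of each chip. A minimal sketch follows. The 3,000-chip volume is the figure alleged in the article; the dollar amounts (`FIXED`, `MARGINAL`) are made-up placeholders chosen only to show the shape of the curve, not estimates of anyone's real costs.

```python
def cost_per_chip(fixed_cost, marginal_cost, volume):
    """Per-unit cost when a one-time fixed cost is spread over `volume` units."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return fixed_cost / volume + marginal_cost

# Hypothetical figures, for illustration only.
FIXED = 300_000_000   # assumed one-time design + software cost, USD
MARGINAL = 2_000      # assumed manufacturing cost per chip, USD

if __name__ == "__main__":
    for volume in (3_000, 100_000, 1_000_000):
        each = cost_per_chip(FIXED, MARGINAL, volume)
        print(f"{volume:>9,} chips -> ${each:,.0f} each")
```

Under these assumed numbers the fixed cost dominates completely at 3,000 units and nearly disappears at a million, which is the self-reinforcing advantage being described.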

baybal2 · 5 years ago

> many other problems that will totally benefit from a bigger training computer.

I don't really think it's that many.

The industry collectively sank untold billions into the blind belief that neural algorithms will somehow turn into "AI."

10 years later, no "AI," and not even a single money making niche use.

Right now the industry is deep in the sunk cost fallacy, and people who promised this and that to investors are now desperate, doubling their bets in hopes that "at least something will come out of it...," a casino mode basically.

justapassenger · 5 years ago

> Of this competition, only Google and Nvidia have supercomputers that stand toe to toe with the Tesla’s

Even assuming that it's true (which I very much doubt: anyone who's willing to spend enough money with Nvidia can have a powerful supercomputer fairly quickly), it's a very dishonest statement. It's comparing a deployed system with a lab prototype of a single component of a potential supercomputer that may be fully operational in a few years (software is a really, really, really big deal here).

solidasparagus · 5 years ago

I assumed they're talking about Tesla's A100 cluster, which is huge - https://blogs.nvidia.com/blog/2021/06/22/tesla-av-training-s...

Tesla's compute-to-researcher ratio is definitely rare

jeffbee · 5 years ago

It is really unreasonable to compare Tesla's photoshop mocks with hardware already deployed in the field today. Google already has a TPUv4 cluster that can train ResNet-50 in 13 seconds, which is ridiculous. Until Tesla publishes actual MLPerf benchmarks, you can assume that their ASIC game is at least as far behind Google's as their self-driving game is behind Waymo's: 5 years at a minimum.

https://github.com/mlcommons/training_results_v1.0/tree/mast...

thesausageking · 5 years ago

The Q&A section on their compiler and software that the author links to is very interesting:

https://www.youtube.com/watch?v=j0z4FweCy4M&t=8047s

It sounds like they're going to have to write a ton of custom software in order to use this hardware at scale. And, based on the team being speechless when asked a follow-up question, it doesn't sound like they know (yet) how they're going to solve this.

Nvidia gets a lot of credit for their hardware advances, but what really made their chips work so well for deep learning was the huge software stack they created around CUDA.

Underestimating the software investment required has plagued a lot of AI chip startups. It doesn't sound like Tesla is immune to this.

ggoo · 5 years ago

Tesla's claim to delivery ratio is abysmal. I'm not sure why anybody even bothers deconstructing these presentations anymore, they're just fluff.

snorrah · 5 years ago

I would argue it’s always useful to see their tech deconstructed and explained. If nothing else, so we get an idea what the reality is to counter possible outlandish claims from overly-enthusiastic followers of the company (and its CEO)

stcredzero · 5 years ago

> Tesla's claim to delivery ratio is abysmal.

Can you substantiate this concretely? How about a list, with direct sources? (Not opinion pieces.)

gooseus · 5 years ago

> The full system is scheduled for some time in 2022. Knowing Tesla’s timing on Model 3, Model Y, Cyber Truck, Semi, Roadster, and Full Self Driving, we should automatically assume we can pad this timing here.

That's just from the article; off the top of my head:

* NYC to LA fully autonomous drive by 2017.

* 1M Robotaxis on the road by 2021.

* Hyperloop.

* Solar roof tiles.

* All superchargers will be solar-powered.

* Tesla Semi.

Sure, some of these things may be "just around the corner" or "ramping up now", but some of these are claims going back almost 5 - 10 years where Elon says "2 weeks", "next year", "2 years", really whatever it takes to be just believable enough to get enough people to buy into a future where Tesla is worth 10x what it is today.

ggoo · 5 years ago

https://en.wikipedia.org/wiki/Criticism_of_Tesla,_Inc.

michelpp · 5 years ago

Clearly a shot across the bow for Cerebras and another excellent target for the GraphBLAS.

Dense numeric processing for image recognition is a key foundation for what Tesla is trying to do, but tagging the object is just the beginning of the process. What is the object going to do? What are its trajectories? What is the degree of belief that an unleashed dog vs. a stationary baby carriage is going to jump out?

We are just beginning to scratch the surface of counterfactual and other belief propagation models which are hypersparse graph problems at their core. This kind of chip, and what Cerebras are working on, are the future platforms for the possibility of true machine reasoning.

2bitencryption · 5 years ago

from the article:

> but the short of it is that their unique system on wafer packaging and chip design choices potentially allow an order magnitude advantage over competing AI hardware in training of massive multi-trillion parameter networks.

I kind of wonder if Tesla is building the Juicero of self-driving. [0]

Beautifully designed. An absolute marvel of engineering. The result of brilliant people with tons of money using every ounce of their knowledge to create something wonderful.

Except... you could just squeeze the bag. You could just use LIDAR. You could just use your hands to squish the fruit and get something just as good. You could just (etc etc).

No doubt future Teslas will be supercomputers on wheels. But what if all those trillions of parameters spent trying to compose 3D worlds out of 2D images are pointless if you can just get a scanner that operates in 3D space to begin with??

[0] https://www.theguardian.com/technology/2017/sep/01/juicero-s...

arnaudsm · 5 years ago

The Juicero comparison doesn't hold up. LIDAR is 10x more expensive than RGB, but neither reaches lvl5 at the moment. I'm glad multiple companies try multiple paths; it's the best way to avoid a research dead-end.

KaiserPro · 5 years ago

> LIDAR is 10x more expensive than RGB

but pure RGB needs $millions to make a reliable realtime depth sensor, plus custom silicon and a massive annotated dataset.

It might just be that one company can do it, but it's a hefty gamble.

nightski · 5 years ago

Everyone acts like LIDAR is the holy grail but then why isn't there someone destroying Tesla with that tech? Waymo is not much farther along than Tesla, maybe even behind as far as miles driven.

If that was all that was needed then it would be done.

modeless · 5 years ago

I'm glad people are exploring the design space. To some extent the training techniques and neural net architectures need to be tailored to the hardware. Nvidia isn't on top just because they're good at chip design, but because people have chosen to focus research effort on techniques that work well on Nvidia hardware. New hardware may allow new techniques to shine.

New hardware architectures can't really be used to their full potential without years of research into techniques that are suited for them. The more people who have access to the hardware, the faster we can discover those techniques. If Tesla is serious about their hardware project, they need to offer it to the public as some kind of cloud training system. They don't have enough people internally to develop everything themselves in a short enough time to remain competitive with the rest of the industry.