Animats · 9 years ago
I'm still not happy with self-driving on vision alone, or vision augmented with radar. There are too many hard cases for vision. Everybody who has good self-driving right now - Google, Otto, Volvo, GM - uses LIDAR.

Self-driving is coming to the first end users in 2017, in Volvo's test of 100 vehicles. Volvo has multiple LIDARs, multiple radars, multiple cameras, redundant computers, and redundant actuators. They're being cautious. Yet they're getting there first.

With the new hardware, Tesla ought to be able to field smart cruise control that doesn't ram into stopped vehicles partially blocking a lane. They've rammed stopped vehicles at speed three times now. At least with the new hardware, things should get better. Do they still have the radar blind spot at windshield height?

ChuckMcM · 9 years ago
The reasoning I've heard goes like this:

People drive reasonably well using vision primarily and with imperfect visibility of their environment.

Computer learning networks can classify imagery at least as accurately as humans and sometimes more so.

A computer using imagery that is well classified from an array of visual sensors with near perfect visibility should be able to drive as well, or better, than a human driver.

The execution strategy appears to be to run classification and command prediction all the time, and to treat the periods when the human is in control as supervised learning.
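
Sketched out, that amounts to plain behavior cloning: the human's steering command becomes the label for the current camera frame. A rough PyTorch sketch - the names `camera_frames` and `human_steering` are hypothetical stand-ins for logged data, and the actual Tesla/NVIDIA pipeline isn't public:

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for logged data: camera frames plus the steering the human applied.
    camera_frames = torch.randn(32, 3, 66, 200)    # batch of dashcam frames
    human_steering = torch.randn(32, 1)            # recorded steering angles (the "labels")

    model = nn.Sequential(                         # toy frame -> steering regressor
        nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
        nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(36, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    pred = model(camera_frames)                            # network's predicted steering
    loss = nn.functional.mse_loss(pred, human_steering)    # human command is the supervision
    opt.zero_grad(); loss.backward(); opt.step()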

The argument against LIDAR is just this in reverse: humans don't need LIDAR to drive, so why should computers?

LIDAR is an engineering solution to the problem of creating a representation of the 3D space around the vehicle. It is a stand-in for the less well understood human ability to do the same just by looking around. As a result, if the "looking around" solution being proposed by NVIDIA and Tesla meets the engineering requirement, I don't see any reason the car needs LIDAR.

argonaut · 9 years ago
Driving has almost nothing to do with image classification. *

Humans implicitly perform SLAM (simultaneous localization and mapping). What do I mean? Look around your room. Close your eyes. Visualize the room. As a human, you've built a rough 3D model of the room. And if you keep your eyes open and walk through the room, that map is pretty fine-grained/detailed too, and humans can keep track of where they are in it.

The state of the art in visual SLAM (visual SLAM = SLAM from just images, nothing else) is not deep learning. It's still traditional linear-algebra/geometric/keyframe-based computer vision (including variants that incorporate GPS/accelerometer info). There are all sorts of limitations, but the biggest is that the current algorithms don't work when the environment is moving (!!!).
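
To make "geometric/keyframe based" concrete, here's a minimal two-frame visual-odometry step with OpenCV. The frames and the intrinsic matrix K are made-up placeholders; a real system adds keyframe selection, mapping, bundle adjustment, and loop closure - and this kind of epipolar geometry is exactly what breaks down when the scene itself is moving:

    import numpy as np
    import cv2

    # Placeholder inputs: two consecutive grayscale frames and an assumed intrinsic matrix K.
    frame0 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
    K = np.array([[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]])

    # Detect and match ORB features between the frames.
    orb = cv2.ORB_create(2000)
    kp0, des0 = orb.detectAndCompute(frame0, None)
    kp1, des1 = orb.detectAndCompute(frame1, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des0, des1)
    pts0 = np.float32([kp0[m.queryIdx].pt for m in matches])
    pts1 = np.float32([kp1[m.trainIdx].pt for m in matches])

    # Recover the relative camera motion (rotation R, unit-scale translation t).
    E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
    print(R, t)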

SLAM from LIDAR is solved. That's why people use LIDAR.

You might argue that perfect SLAM is overkill for driving. And I agree. Humans rely on being able to do lots of things that are "theoretically overkill" for any given task - and maybe that's exactly why, so far, humans can drive and computers can't.

* It bears noting that even in domains like image segmentation, humans still do better than neural nets. (Segmentation means grouping the pixels in an image into different categories - still a caricature of "vision," but far more representative of real vision than simply giving a global label to an image.)

Cybiote · 9 years ago
> People drive reasonably well using vision primarily

This is not accurate, however. Other important senses in use include proprioception, hearing, and tactile feedback through the wheel. Beyond vision itself and the superior dynamic range of the eye, there is the important fact that human vision integrates a world model into its expectations. Human vision also models time and motion, which helps manage where to focus attention. Humans can additionally predict other agents and other things about the world using intuitive physics. This is why they can get by without the huge sensor array and cars cannot: humans make up for the lack of sensors by using poor-quality data more effectively.

To put this in perspective, an estimated 8.75 megabits per second passes through the human retina, but only on the order of 100 bits is estimated to reach conscious attention.

> Computer learning networks can classify imagery at least as accurately as humans and sometimes more so.

This is true, but only in a limited sense. For example, when I feed the image on the right (a car in a swimming pool) from http://icml.cc/2015/invited/LeonBottouICML2015.pdf#page=58 (which you should read - and find the talk of, if you can) into ResNet, I get as top results:

    0.2947  screen, CRT screen
    golfcart, golf cart
    boathouse
    amphibian, amphibious vehicle

For LeNet it's:

    0.5422  amphibian, amphibious vehicle
    jeep, landrover
    wreck
    speedboat
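
(If you want to reproduce this kind of check yourself, here is a rough sketch using a pretrained torchvision ResNet; the image filename is a placeholder for the slide's picture:)

    import torch
    from PIL import Image
    from torchvision import models

    # Assumes torchvision >= 0.13; "car_in_pool.jpg" is a placeholder filename.
    weights = models.ResNet50_Weights.DEFAULT
    model = models.resnet50(weights=weights).eval()
    preprocess = weights.transforms()

    img = preprocess(Image.open("car_in_pool.jpg")).unsqueeze(0)
    with torch.no_grad():
        probs = model(img).softmax(dim=1)[0]
    top = probs.topk(5)
    for p, idx in zip(top.values, top.indices):
        print(f"{float(p):.4f}  {weights.meta['categories'][int(idx)]}")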

The key difference is that learning in animals occurs by breaking things down into modular concepts, so even when something isn't recognized, it can be labeled as a composition of smaller, nearby concepts. Machines cannot yet do this well at all, and certainly not as flexibly. Things such as lighting and shading do not move animals around the concept space nearly as much.

> The execution strategy appears to be to run classification and command prediction all the time, and while the human is in control consider it supervised learning.

This strategy will not learn much from accidents, because the supervisory signal there will usually be far from optimal.

mattnewton · 9 years ago
Why not shoot for better-than-human performance and "cheat" any way possible along the way? To paraphrase a quote whose source I can't remember: do we care whether a submarine "swims"?

Besides, even with LIDAR, the problem is hard enough.

tedunangst · 9 years ago
ML seems pretty bad at classifying things it hasn't seen before though. There are quite a few examples where an input outside the training data resulted in misclassification.

Humans may not always see a white truck in a snowstorm, but is computer vision going to see it either? Or will it pattern match the few visible parts as something else entirely? Or dismiss the truck entirely as noise?

guiambros · 9 years ago
> ... humans don't need LIDAR to drive, why should computers?

That being the case, wouldn't we be limiting self-driving technology to roughly the same traffic-related death rates as humans? Maybe 10-20% better, but still fundamentally close.

For self-driving cars to be truly successful, the death rates will need to be an order of magnitude better. An incremental improvement won't convince governments and the public at large to trust their lives to an algorithm running inside a black box.

To be an order of magnitude better, you'll likely need to go well beyond simply processing pixels, including LIDAR and other sensors.

Shivetya · 9 years ago
LIDAR is just another form of seeing - not the kind we're used to as people - but combined with cameras, the two would complement each other. Relying on only one is a fool's gambit.

LIDAR won't go blind from white trucks on sunny days. LIDAR won't suffer snow blindness or fail to track in conditions where humans don't see well, like heavy rain at night. You add visual acquisition to fine-tune what you're detecting when necessary: to read signs, to tell what color a traffic light is, maybe even to see brake lights; to know a floating bag is just that and not a solid object, or to see that a road is washed out.

qq66 · 9 years ago
The problem is that humans do primarily (not solely) use vision to drive, but they also have mental models of other drivers. I remember once I was at a red light, and when it turned green I looked at the oncoming driver (still far away) and thought, "that guy is too into his music," and didn't accelerate. Sure enough, he goes right through the red and slams on his brakes halfway through the intersection.
philjohn · 9 years ago
One of the central tenets of autonomous vehicles should be that they are BETTER than any human driver could be. Relying on vision because, well, it works OK for humans doesn't cut it in my book.
sangnoir · 9 years ago
> The argument against LIDAR is just this in reverse, humans don't need LIDAR to drive, why should computers?

That is a terrible argument: birds don't need ailerons either.

revelation · 9 years ago
Unless self driving vehicles drive better than humans, and LIDAR or 360° vision seems a requirement for that, they will not succeed.
k_lander · 9 years ago
article from yesterday:

http://spectrum.ieee.org/cars-that-think/transportation/sens...

hopefully this will make LIDAR more economically practical

pakl · 9 years ago
> There are too many hard cases for vision.

Even worse, most machine-learning vision approaches make the vision problem harder on themselves: they do not treat the visual world as arising from the dynamic, physically interacting processes that give rise to it.

A disadvantage of treating vision as static frames of pixels is that deep feedforward networks have to "memorize" all the physical, dynamical effects of, e.g., shadows on textures.

Such systems cannot generalize well. A more promising approach is hierarchical systems with ubiquitous recurrent connectivity.
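
As a toy illustration of that direction (not any production system, and with recurrence only at the top rather than throughout the hierarchy as I'd prefer): a small CNN encoder feeding an LSTM over frames, so the model carries temporal state instead of treating each frame independently.

    import torch
    import torch.nn as nn

    class RecurrentVisionNet(nn.Module):
        """Toy sketch: per-frame CNN features fed to an LSTM over time."""
        def __init__(self, hidden=128, out_dim=1):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.rnn = nn.LSTM(32, hidden, batch_first=True)
            self.head = nn.Linear(hidden, out_dim)

        def forward(self, clip):                   # clip: (batch, time, 3, H, W)
            b, t = clip.shape[:2]
            feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
            out, _ = self.rnn(feats)               # temporal state carried across frames
            return self.head(out[:, -1])

    print(RecurrentVisionNet()(torch.randn(2, 8, 3, 64, 64)).shape)  # torch.Size([2, 1])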

bluthru · 9 years ago
What's a hard case for vision? That's how our eyes work. LiDAR has problems with precipitation too.
kchoudhu · 9 years ago
Depth-map extraction from vision in real time depends on accurately merging past frames, color gradients, and extracted motion vectors into a 3D map of what's around the vehicle. Contrast this with LIDAR, which can produce a depth map in real time by sending out an array of light pulses and timing how long each takes to come back to the sensor.

Which method is more likely to have implementation errors?
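
To see why the LIDAR side is so simple: the per-point math is literally just scaling the round-trip time by the speed of light (a toy sketch; the hard parts live in the hardware and in registering the resulting point cloud):

    # Time-of-flight ranging: each return time maps directly to a distance.
    C = 299_792_458.0  # speed of light, m/s

    def range_from_return_time(t_seconds):
        """Round-trip pulse time -> one-way distance in meters."""
        return C * t_seconds / 2.0

    print(range_from_return_time(200e-9))  # pulse back after 200 ns -> ~30 m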

piemonkey · 9 years ago
There are a number of interesting examples in the computer vision and psychology literature which document difficult cases for visual perception; see for example slides 14-19 in [0]. Ultimately, vision is a holistic process with edge cases that require knowledge of physics, human psychology, and so-called "common-sense reasoning" in order to resolve. Depending on how one phrases the objective of what a computer vision system should do, the problem can go from a tractable subset of automated reasoning to an intractable general AI task, often with very subtle changes to the problem statement.

[0] http://www.cs.toronto.edu/~urtasun/courses/CV/lecture01.pdf

sliken · 9 years ago
Tesla is going a different way with radar + cameras. LIDAR today is too expensive for normally priced vehicles: Google's solution is very expensive, and Volvo seems to be targeting large trucks, which are less price-sensitive. The price of the future Volvo passenger cars hasn't been announced, has it?

Society seems hypersensitive to new risks, even when they're lower than existing ones. Thus a Tesla fire is big news, even if the rate is lower than for ordinary gas-car fires. Likewise, weaknesses of LIDAR (in fog, say) could cause problems even if the system is safer than existing cars. This is complicated by the humans driving around them. Imagine heavy fog on the highway: humans decide 45 mph is safe, and the Tesla (with a camera+radar system) decides on something similar. A LIDAR car might well decide the safe speed is lower, and get rear-ended more.

Reference for the 3 rammed cars "at speed"? In the one I saw, the car in front slowed, then accelerated, and merged right before a stopped car. The Tesla slowed, then accelerated, decided it wasn't safe to merge, and braked hard. It did hit the car, but not very fast. Not sure I would have done better myself.

Animats · 9 years ago
> Reference for the 3 rammed cars "at speed"?

Sideswipe of car stopped at inner edge of roadway in China: https://www.youtube.com/watch?v=rJ7vqAUJdbE

Rammed stopped or slow moving street sweeper at inner edge of roadway in China. Driver killed: https://www.youtube.com/watch?v=xoSNw_n1Xgk

Rammed stopped van at inner edge of roadway in Germany: https://www.youtube.com/watch?v=qQkx-4pFjus

Those are all the same design flaw - a big solid obstacle partly blocking the lane was hit.

These all have dashcam video on Youtube. One wonders how many more times this has happened without a dashcam.

This list doesn't include ramming the semitrailer in Florida, another fatal event. (Ref NTSB investigation HWY16FH018.)

(There's some denial from Tesla fans, and Musk, about this.)

raverbashing · 9 years ago
I suspect Volvo started first as well
frik · 9 years ago
I agree; everyone but Tesla is using LIDAR in the sensor mix. Cameras simply don't work in certain weather conditions, like direct low-angle sunlight in the evening.

Ford, Volvo and others are using two (or more) smaller, cheaper (~$6k) LIDARs instead of the single big $70k unit that everyone remembers from the Google cars, and smaller, cheaper LIDARs are around the corner. It seems Tesla isn't going the full self-driving route at the moment, but instead offers a package that doesn't fully deliver and is a risk on the road if the driver uses it in conditions outside its limited, designated highway-style roads - in inner cities or on country roads. I'm surprised Google doesn't ship something, or Mercedes, who have been doing this research since the 1980s; if those companies take that long, Volvo and other Chinese-owned car manufacturers will take the lead in the next few years.

randomdrake · 9 years ago
Super cool to see NVIDIA releasing hardware specifically meant to be a platform for building self-driving systems[1]:

"NVIDIA DRIVE™ PX 2 is the open AI car computing platform that enables automakers and their tier 1 suppliers to accelerate production of automated and autonomous vehicles. It scales from a palm-sized, energy efficient module for AutoCruise capabilities, to a powerful AI supercomputer capable of autonomous driving."

I hadn't heard of this before and with their purported pivot to an AI company, I can't wait to see what other platforms they develop in a similar capacity.

[1] - http://www.nvidia.com/object/drive-px.html

dogma1138 · 9 years ago
NVIDIA has been trying to pivot into a "services" company for a while - NVIDIA GRID/gaming cloud, computing, etc. "AI", or at least fused-sensor automation, seems like a good place for them, since they already have both the hardware and the software expertise for it.

NVIDIA actually gave up on even attempting to do console graphics again, since they didn't want their pipeline to be suffocated by those draconian single-customer contracts. What keeps AMD's graphics department alive these days is exactly what would have prevented NVIDIA from pushing their business forward.

moyta · 9 years ago
Well, that, and AMD has x86-64 & ARM licensing and experience sitting right there in-house, ready for Microsoft, Sony, Nintendo, etc. to say what IP blocks they want, with a 3-month lag until that custom chip is ready.

Nvidia did make what could have been an x86-64 chip, but had to can it since they couldn't get the relevant licensing from AMD & Intel to allow its production and sale - hence why everything they sell is ARM.

erk__ · 9 years ago
The new Nintendo console/handheld has Nvidia graphics.
matheweis · 9 years ago
I thought the reason Tesla switched away from Mobileye was that Mobileye and Tesla couldn't come to an agreement on price and data licensing?

https://electrek.co/2016/09/15/tesla-vision-mobileye-tesla-a...

... and because Mobileye wasn't comfortable with Tesla using their system for Level 4 & 5 driving:

https://electrek.co/2016/09/16/mobileye-responds-to-tesla-ag...

bcantrill · 9 years ago
Could someone who understands this space weigh in on how technically interesting this is? (Or isn't?) In particular, their research paper on "End to End Learning for Self-Driving Cars"[1] seems to yield a system that requires an unacceptable amount of manual intervention: in their test drive, they achieve autonomous driving only 98% of the time. But I have no real expertise in this space; perhaps this result is impressive because it was end-to-end or because of the relatively little training? Is such a system going to be sufficiently safe to be used in fully autonomous systems? Or is NVIDIA's PX 2 interesting but not at all for the way it was used in their demonstration system?

[1] http://images.nvidia.com/content/tegra/automotive/images/201...

ilaksh · 9 years ago
It's incredibly freaking amazing if they are driving via deep learning, mainly from cameras, 98 percent of the time. No one else can do that. 98 percent is obviously a lot.
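
Back-of-envelope, just for perspective (the 6-second average takeover length here is a made-up assumption, not anything from the paper):

    # 98% autonomy over an hour of driving: how much human control is that?
    autonomy = 0.98
    manual_seconds_per_hour = (1 - autonomy) * 3600      # 72 s of manual driving per hour
    assumed_takeover_s = 6.0                             # hypothetical average takeover length
    print(manual_seconds_per_hour / assumed_takeover_s)  # ~12 interventions per hour
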
bcantrill · 9 years ago
Thanks -- that answers the question! So fair to say that it's impressive because of the absence of LIDAR and/or other sensors -- and that by adding LIDAR to such a system one could presumably get towards 0% manual intervention?
varelse · 9 years ago
NVIDIA's Drive PX 2 has too high power consumption and too low perf/W for the moment. They're winning this space more because they showed up than because they're the best possible solution.

And they may continue to win, because successful execution of an 80% product is worth far more than a 90+% PowerPoint processor (cough Tensilica et al. cough), or because this is such a huge potential market that it might actually go to a successful competitor. 2018 and beyond will be very interesting.

For while it's really desirable to have the deep learning equivalent of x86 assembly language (CUDA) across a full stack from training to inference, in the end, IMO, cost will be king. I'm not a big fan of $150K high-end servers filled with $5000 GPUs that can be bested with clever code on a $25K server filled with $1200 consumer GPUs. But I am a huge fan of charging what you can while you are unopposed. It's just that I think that state is temporary.

dogma1138 · 9 years ago
>I'm not a big fan of $150K high-end servers filled with $5000 GPUs that can be bested with clever code on a $25K server filled with $1200 consumer GPUs. But I am a huge fan of charging what you can while you are unopposed. It's just that I think that state is temporary.

There is virtually not a single "enterprise"-grade product that can't be made at least 50% cheaper (sometimes 10x cheaper...) with off-the-shelf, consumer-grade, hacked hardware.

Enterprise products always carry a steep markup, but what you lose with those $1200 GPUs is both features (e.g. virtualization, thin provisioning, DMA/CUDA Direct, etc.) and support. When you buy a $5000 CPU over a $500 one with the same performance, what you pay for is reliability and support. If you don't care about that, fine - but when you need to launch a $100M service on top of that platform, you won't really care about the price tag; it's all in the cost of doing business.

varelse · 9 years ago
Virtualization? Don't care, in fact, virtualization is what disabled P2P copies and created craptastic upload/download perf on AWS until the P2 instance.

DMA/CUDA Direct? Say Hello to P2P and staged MPI transfers, faster, cheaper (and usually better). Know your PCIE tree FTW.

Support? As someone who has been playing with GPUs for over a decade: bugs get fixed in the next CUDA release, Tesla or GeForce alike - if they get fixed at all.

$100M service? Yep I'm with you. But I prefer a world without a huge barrier to entry to building that service, especially a barrier built 99% on marketecture. I want to build on commodity hardware and deploy in the datacenter.

Unfortunately, sales types seem to hate that outlook.

mattnewton · 9 years ago
I don't understand your argument, so maybe this is off base, but if you are saying people in industry aren't replacing their supercomputers with commodity GPUs, you're wrong; both Apple and Google have massive purchase orders for commodity NVIDIA GPUs, because they aren't just cheaper, they're better for this application. And I imagine other companies do as well.

Edit: "replace" is probably not the right word, this is work that the old systems don't do well, but they aren't throwing out x86 racks for gpus of course. It's just instead of buying more of the same for machine learning applications.

revelation · 9 years ago
Wait, who cares about the power consumption in a car?

perf/W is a useful number if you are trying to stack a data center full of these things and electricity (including cooling) is essentially your only cost. But in a car that uses 300 Wh/mile - or, with an ICE, generates 100 kW of excess heat - it's an entirely pointless metric.

(Just for clarification: no one is using the Drive PX2 in a data center. Just look at these connectors:

http://images.anandtech.com/doci/9903/NVIDIA_DRIVE-PX-2.jpg )

varelse · 9 years ago
Informal anecdotal knowledge: GM wants 10-20 watts for the entire self-driving system, sensors and all.
maxerickson · 9 years ago
Is 10 watts the correct figure?

The powertrain uses 300+ Watts, so it seems like a manageable number if it is correct.

LeifCarrotson · 9 years ago
That's a big "+" on your figure - it's about 300 Watt-hours per mile, which works out to roughly 18kW continuous at 60 mph. For comparison, one horsepower is about 750 watts - a Tesla is a very low-drag car.
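
The arithmetic, for anyone checking:

    # Average drive power from consumption and speed: Wh/mile * miles/hour = W
    wh_per_mile = 300
    mph = 60
    print(wh_per_mile * mph)   # 18000 W, i.e. roughly 24 hp at highway speed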

Anandtech reported [1] that the TDP of the whole board is around 250W. The Tegra SoCs are probably around 10W, but that doesn't get you much in the way of GPU horsepower.

1: http://www.anandtech.com/show/9903/nvidia-announces-drive-px...

varelse · 9 years ago
It does seem manageable, and in fact the A/C will eat a kilowatt or more, but...

The components of this system need to generate next to zero heat and they need to be placed in all sorts of inconvenient locations. That's causing automakers to desire extremely low-power dedicated circuitry over GPUs.

Consider the C7 Corvette as an example: despite its enormous blind spot, it still doesn't have blind-spot indicators, because they can't find any way to cram existing sensors into the thing. There are more examples; that's just the one I know best, because I balked at purchasing one over this despite an insanely good deal at the time.

throwaway123dse · 9 years ago
One thing I haven't seen discussed:

Many times I'm driving and have the police wave me through a traffic light.

Would a self-driving car realize what's happening?

What about a driver waving me to get ahead?

drcross · 9 years ago
It's not a solved problem but it's something that people are working on.

https://youtube.com/watch?v=8aEWHdduPwc This is the Mercedes car that communicates to pedestrians about the vehicle's intent.

https://en.wikipedia.org/wiki/Vehicular_communication_system... Something like this might feature prominently as well.

pimlottc · 9 years ago
The headline reads confusingly to me. It sort of sounds like it's saying the system is five years in the future, but what he meant is that Tesla itself is five years ahead of the competition in this area. (Off-the-cuff verbal speech is often a bit hard to follow when written down word-for-word.)
mrfusion · 9 years ago
How well would a modern car do in the darpa grand challenge? I'm curious how far we've come.
frik · 9 years ago
I doubt a self-driving car without LIDAR would make it - on the same "test road" as in 2006, that is. But it shouldn't be a problem for the others, like Google, Volvo, Ford, etc.

We need an independent review of self-driving cars in a few years. It will be quite interesting to see how good they really are in different driving situations - different road and weather conditions. Say goodbye to camera+radar-only cars on a snowy road with bright, low winter sun, or in heavy rain on a dark, foggy night.