"A single neural net policy operating directly from a camera image, trained in simulation with large scale RL, can overcome imprecise sensing and actuation to output highly precise control behavior end-to-end."
So that's what it takes. That's so much simpler than the way Boston Dynamics does it, working out all the dynamics in simulation first. It's amazing to see this done from vision to actuators in one net. It's now much less of a mystery how animals managed to evolve the ability to do that.
(I had a go at this problem years ago. I was trying to work out the dynamics from first principles, knowing it would be somewhat off. Then use some self-tuning (a predecessor to machine learning) to fine-tune the thing. Got as far as running up and downhill in 2D in the early 1990s. About one hour of compute for one second of motion.)
Of course, the details of how to actually implement something like this are way more complex than "just throw everything into a big neural net with images as the inputs and actuators as the output". You need to provide the right kind of guidance in order to learn a usable policy in any reasonable amount of time.
A very recent development (which this work builds on) is the idea of "online adaptation". It essentially involves doing the training in two stages:
1. You add a variety of dynamically-varying environmental effects to your simulator, by randomly altering parameters such as ground friction, payload weight distribution, motor effectiveness, and so on. You give the motion controller perfect knowledge of these parameters at all times, and let it learn how to move in response to them.
2. Then, you remove the oracle that tells the controller about the current environmental parameters, and replace it with another neural network that is trained to estimate (a latent representation of) those parameters, based on a very short window of data about the robot's own motor commands and the actual motion that resulted from it.
All of this can be done in simulation, many times faster than real-time. But when you transfer the system to a real robot, it adapts to its environment using the estimated parameters, without any of the networks needing to be re-trained. This ends up making it pretty robust to difficult terrain and perturbations. It also has the benefit of papering over subtle differences that arise between the simulated and real-world dynamics.
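The two stages above can be sketched in a few lines of numpy. Everything here is a hypothetical stand-in (random weights instead of trained networks, made-up dimensions); the point is only the interface change between training and deployment: the policy always consumes a latent code, but where that code comes from switches from the simulator's ground truth to an estimate from recent motion history.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- stand-ins, not the actual architecture.
STATE_DIM, PARAM_DIM, LATENT_DIM, HISTORY = 8, 4, 4, 5

# Random weights stand in for networks trained with RL (stage 1)
# and supervised regression to the oracle latent (stage 2).
W_enc = rng.normal(size=(LATENT_DIM, PARAM_DIM))
W_adapt = rng.normal(size=(LATENT_DIM, HISTORY * (STATE_DIM + 1)))
W_pi = rng.normal(size=(STATE_DIM, STATE_DIM + LATENT_DIM))

def encode_params(env_params):
    """Stage 1 'oracle': a latent code computed from the true simulator
    parameters (friction, payload, motor effectiveness, ...)."""
    return np.tanh(W_enc @ env_params)

def estimate_latent(history):
    """Stage 2 replacement: estimate the same latent from a short window
    of (motor command, resulting motion) pairs -- no oracle needed."""
    return np.tanh(W_adapt @ history.ravel())

def policy(state, latent):
    """The motion controller conditions on state + latent code."""
    return np.tanh(W_pi @ np.concatenate([state, latent]))

state = rng.normal(size=STATE_DIM)
env_params = rng.normal(size=PARAM_DIM)              # known only in sim
history = rng.normal(size=(HISTORY, STATE_DIM + 1))  # observed on the robot

a_stage1 = policy(state, encode_params(env_params))  # training, with oracle
a_deploy = policy(state, estimate_latent(history))   # deployment, no oracle
```

Because the policy network itself is identical in both calls, nothing needs retraining at deployment; only the source of the latent code is swapped.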
This paper adds a lot of additional refinements to the same basic idea. In the first stage, the system is given perfect knowledge of its surrounding terrain and the locations of some preselected waypoints, and learns to follow them. The second stage replaces those inputs with estimates derived from an RGB+depth camera.
Yes, but the key is that the RL simulation can be simple and unrealistic, because it's only used for offline training, not for live planning and control. A control-systems approach like Boston Dynamics' requires excellent physics modeling, because that's the only way it can plan or adjust for errors while controlling the actual robot in real time. The NN, by contrast, is trained to cover many possible physics during development, so at runtime on an actual robot it just shrugs and adapts to whatever crazy scenario it finds itself in.
It’s pretty incredible how animalistic its behaviors are becoming (hesitating for a moment at the edge of a lip, coiling its back legs for increased actuation, and, when it almost misses the high jump, moving its back legs really fast multiple times to give itself a tiny boost each time until it recovers). Is that because it’s trained on animal behaviors? Or is this emergent behavior, and animals just also do it more or less optimally already?
I'm sure this is impressive for an autonomous robot. However, as a fan of real parkour, it's kind of annoying to see some modest jumps and walking on a slope labelled "extreme parkour". What the robot demonstrates, I'd expect any healthy 10-year-old to be able to equal.
The metrics they're using (2x its height for climbing a wall, 2x its length for crossing a gap) are weird and don't really relate to the same achievements for a traceur. 2x its height is really more like slightly over 1x its usable body for that maneuver (0.4 m length, 0.51 m height of the climb). I agree, not extreme, but still pretty impressive for a robot. We're not going to see them doing cat leaps any time soon ;)
Relative size of leaps doesn't seem to be a particularly useful metric for assessing the NN performance anyway, since in the absence of the human constraint of fear it's really limited only by the mechanics of its legs and its weight.
More impressive would be adaptation to obstacles without clearly delineated edges, sticking landings on uneven/moving landing sites, and especially avoidance of landing sites which appear incapable of supporting its weight properly, particularly if it could do it well enough to generalise to novel courses.
The video hints the model may be able to do this to some extent (the high jump does show some apparently necessary compensating movement to avoid slipping off), but doesn't really demonstrate it.
For sure, but the paper is disingenuous in its description of the challenge they are supposedly tackling compared to what they actually achieve. Here's an excerpt from the introduction of the paper:
"Parkour is a popular athletic sport that involves humans traversing obstacles in a highly dynamic manner like running on walls and ramps, long coordinated jumps, and high jumps across obstacles. This involves remarkable eye-muscle coordination since missing a step can be fatal. Further, because of the large torques exerted, human muscles tend to operate at the limits of their ability and limbs must be positioned in such a way as to maximize mechanical advantage. Hence, margins for error are razor thin, and to execute a successful maneuver, the athlete needs to make all the right moves. Understandably, this is a much more challenging task than walking or running and requires years of practice to master. Replicating this ability in robotics poses a massive software as well as hardware challenge as the robot would need to operate at the limits of hardware for extreme parkour."
The difference being each human has to spend thousands of hours learning that level of control over their bodies, and work hard to maintain that level of physical fitness. And if they fall off the roof, millions of dollars of potential earnings and GDP die with them. Whereas these robots are only $70k, and once one of them can do this, they all can do it. Just like with Chess and Go. It’s not impressive at first, then a couple years later, it’s better than humans could ever be, and it can be cheaply replicated ad infinitum.
Yes, that's very true. Success for one robot means success for a whole bunch of robots. However, success for one Olympic athlete does not mean everyone can achieve the same level. That's the main difference.
I got the opposite conclusion from watching this video.
I feel the way the robots do it is similar to humans. Considering these are all emergent behaviours (not trained on animals' or humans' motion data), I believe that robots will exceed humans by far in a few years.
Out of curiosity, is everyone of university age good at clickbait now?
Like the whole point of saying “extreme” parkour is to boost engagement from pedantic analytic people like us talking about the hyperbolic title choice
The mainstream, state broadcaster in my country publishes youtube videos of news topics with SENSATIONAL TITLES and thumbnails with close-up REACTION FACES. On their home page, along with the written articles.
I hate what the attention economy has become. Old man yells at emoji...
It is a bit disappointing. This video did not show anything more "extreme" than various Boston Dynamics videos from years ago. And to be even more pedantic, this is hardly parkour at all. Jumping over a gap and climbing on a box bear little resemblance to what we've come to expect when we hear this term.
I scanned the paper and, if I got it right, during training the robot gets "external" information about its world position and speed relative to pre-set waypoints, something that an animal or person wouldn't have access to. What I couldn't determine is whether this information is also available to the robot while performing the "parkour": I'm pretty sure that it perceives the obstacles only through its depth camera, but how does it determine where it is and where it should go next? Is this still done through waypoints and global position knowledge? A joystick is mentioned for control; is that being used to set waypoints, or to feed the robot a relative direction and speed?
When the policy is deployed in the real world, only the depth camera is used; there are no waypoints. Scandots and a target heading are used in Phase 1 of training to pretrain a policy in simulation. In Phase 2, a policy is trained end-to-end using the pretrained actor network: "First, exteroceptive information is only available in the form of depth images from a front-facing camera instead of scandots. Second, there is no expert to specify waypoints and target directions, these must be inferred from the visible terrain geometry." For policy training in Phase 2 they use DAgger, which is based on behavior cloning, with the Phase 1 policy as the expert; they also use some tricks to make sure no actions too different from the expert's are executed during training. In Phase 2 the network learns to extract environment information from the depth camera instead of from the scandots. It reuses the pretrained actor network from Phase 1, but the depth embedding must be learned from scratch. This is how I understand it.
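For concreteness, here is a toy DAgger loop in numpy. It is a sketch under strong simplifying assumptions, not the paper's setup: the "expert" and "student" are linear maps, and random observations stand in for states visited by rolling out the student in the simulator. The structural point is DAgger's: collect states under the student, label them with the expert's actions, aggregate, and refit.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, STEPS, ITERS = 16, 64, 5

# Hypothetical Phase-1 expert: a fixed linear map standing in for the
# pretrained actor that sees scandots + target headings.
w_expert = rng.normal(size=OBS_DIM)

w_student = np.zeros(OBS_DIM)   # Phase-2 policy, fit from scratch
obs_all, act_all = [], []

for _ in range(ITERS):
    # Real DAgger rolls out the *student* and records the states it
    # actually visits; random observations stand in for that here.
    obs = rng.normal(size=(STEPS, OBS_DIM))
    obs_all.append(obs)
    act_all.append(obs @ w_expert)      # label each state with the expert's action
    X = np.concatenate(obs_all)         # aggregate the full dataset so far
    y = np.concatenate(act_all)
    # Least-squares fit stands in for a behavior-cloning gradient step.
    w_student, *_ = np.linalg.lstsq(X, y, rcond=None)

# How closely the student now matches the expert on the latest batch.
max_err = np.abs(obs @ w_expert - obs @ w_student).max()
```

Training on states the student itself visits (rather than only the expert's trajectories) is what keeps the cloned policy from compounding errors once it drifts off the expert's path.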
Thank you for your comment. What I don't understand is this: when the robot is in a new environment, how does it know where it's supposed to go? My understanding is that the training teaches the robot how to get to a position, but I didn't see anything about how to choose where to go (in "old AI" parlance, what could have been defined as planning).
> So that's what it takes. That's so much simpler than the way Boston Dynamics does it, working out all the dynamics in simulation first.
I mean, it sounds like they have had to work out all the dynamics, and simulate them.
Admittedly they don't have to do it on-robot in real time any more.
Here's an example of real parkour: https://www.youtube.com/watch?v=5lp1oS0vXg0
They aren't doing parkour, let alone "extreme" parkour.
https://m.youtube.com/watch?v=QHqAVaQqQWQ&t=106s
The one pictured appears to be $14k
https://m.unitree.com/en/a1/
https://unitreerobotics.net/