Work from 2023 [1] showed that general-purpose models did better when they were able to incorporate error feedback. Humans incorporate error feedback, yet none of the SOTA models on miniF2F seem to.
This is distinct from the approach of the previous SOTA open-weights model (Kimina Prover), which generated at the full-proof level.
While it was very impressive to see Kimina generate medium-length proofs (think AIME-level problems) without sub-goals or feedback at intermediate steps, it's likely that at least subgoal decomposition will be required for longer proofs (think IMO-level problems).
I certainly agree that where and how error/proof-state feedback is best incorporated (training-data synthesis / reward function / CoT during inference / etc.) is a fascinating area of research. (It's rumored that GDM's AlphaProof already uses proof-state / Lean feedback.)
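To make concrete what "proof state / Lean feedback" means here, a toy example (mine, not drawn from any of the systems mentioned): at each tactic step Lean reports the current goal, and a failed tactic produces an error message; either signal could, in principle, be fed back to a prover model.

```lean
-- Toy Lean 4 example of the feedback a prover could condition on.
example (a b : Nat) : a + b = b + a := by
  -- Proof state shown by Lean at this point:  ⊢ a + b = b + a
  -- A wrong step (e.g. `rfl`) would fail here with an error
  -- message -- exactly the kind of signal a model could use
  -- to repair its next attempt.
  rw [Nat.add_comm]
```

The open question the comment raises is where this signal is most useful: filtering synthetic training data, shaping a reward, or interleaved with chain-of-thought at inference time.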
* Waymo vehicle creeping into the pedestrian crosswalk (while the pedestrians had the right of way to cross), forcing someone to walk around the car into the intersection ahead of the Waymo.
* Waymo vehicle entering a dedicated bike lane and practically tailgating the bicyclist that was ahead of it.
These might be safer than human drivers in aggregate, normalized by kilometer driven, but they drive like humans — greedily and non-defensively. I wouldn't want one of these anywhere near a high-pedestrian-traffic area, ever, and I feel the same about human-driven cars, too.
In California, Vehicle Code § 21209(a)(3) expressly permits a motor vehicle to enter a bicycle lane "to prepare for a turn within a distance of 200 feet from the intersection," among other cases. (The vehicle must yield to cyclists in the lane.)