The current state of having multiple editors open, or having to switch between JetBrains tools and Cursor, is really a bit of an annoying transition period (I hope).
In terms of performance, their agents differ. The base model their agents use is the same, but how they look at your codebase, how they decide to farm tasks out to lesser models, and how they connect to tools all differ.
Would be nice to see how inference speed stacks up against, say, llama.cpp.
This is a crippling disadvantage. Consider what it takes to evaluate a single software release for a robotaxi.
If you have a simulator, you can take long tail distribution events and just resimulate your software to see if there are regressions against those events. (Waymo, Zoox)
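That resimulation loop is conceptually simple. Here's a minimal sketch of what "replay logged long-tail events through a candidate build and flag regressions" might look like; all the names, fields, and thresholds are invented for illustration, and the `simulate` stub just looks up canned results instead of running a real simulator:

```python
# Hypothetical log-replay regression check: compare a candidate software
# build against the shipped baseline on a library of recorded scary events.

from dataclasses import dataclass

@dataclass
class Outcome:
    collision: bool
    min_clearance_m: float  # closest approach to any other agent

def simulate(build: str, event: dict) -> Outcome:
    # Stand-in for a real resimulator; here we just look up canned results.
    return Outcome(**event["results"][build])

def regressions(events: list[dict], baseline: str, candidate: str) -> list[str]:
    failed = []
    for ev in events:
        old, new = simulate(baseline, ev), simulate(candidate, ev)
        # Flag a regression if the candidate collides where the baseline
        # didn't, or cuts its safety margin noticeably (threshold is made up).
        if (new.collision and not old.collision) or \
           new.min_clearance_m < 0.8 * old.min_clearance_m:
            failed.append(ev["id"])
    return failed

events = [
    {"id": "jaywalker-at-dusk",
     "results": {"v1": {"collision": False, "min_clearance_m": 1.2},
                 "v2": {"collision": False, "min_clearance_m": 0.5}}},
    {"id": "double-parked-truck",
     "results": {"v1": {"collision": False, "min_clearance_m": 2.0},
                 "v2": {"collision": False, "min_clearance_m": 1.9}}},
]
print(regressions(events, "v1", "v2"))  # ['jaywalker-at-dusk']
```

The point is that the event library is fixed, so every candidate build gets graded against the exact same rare situations, cheaply and repeatably.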
If you don't, or your simulator has too much error, you have to deploy your software in cars in "ghost mode" and hope that sufficient miles see rare and scary situations recur. You then need to find those specific situations and check if your software did a good job (vs just getting lucky). But what if you need to A/B test a change? What if you need to A/B test 100 changes made by different engineers? How do you ensure you're testing the right thing? (Tesla)
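To see why ghost-mode validation scales so badly, here's a back-of-envelope calculation. Every number is an illustrative assumption, not a real Tesla or Waymo figure:

```python
# Rough fleet-mile math for validating changes via ghost-mode driving,
# where you must wait for rare events to recur naturally on the road.

miles_per_event = 1_000_000     # assume one scary long-tail event per million miles
observations_needed = 100       # assumed occurrences per variant to judge behavior
variants = 100                  # independent engineer changes to A/B test

miles_per_variant = observations_needed * miles_per_event  # 100,000,000
total_fleet_miles = miles_per_variant * variants           # 10,000,000,000

print(f"{miles_per_variant:,} miles per variant")
print(f"{total_fleet_miles:,} fleet miles in total")
```

Under these assumptions a single change needs a hundred million ghost-mode miles, and a backlog of a hundred changes needs ten billion, which is why a trustworthy resimulator is such a structural advantage.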
And if you have a simulator that _sucks_ because it doesn't have a physics-grounded understanding of distances (i.e. it's based on distance estimates from cameras), then you can easily trick yourself into thinking your software is doing the right thing, right up until you start killing people.
Another way to look at it is: most driving data is actually very low in signal. You want all the hard driving miles, and in high resolution, so that you can basically generate the world's best unit testing suite for the software driver. You can just throw the rest of the driving data away -- and you must, because nobody has that much storage and unit economics still matter.
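The "keep only the hard miles" idea can be sketched as a filter: score each logged segment by proxies for difficulty and retain just the interesting slice as test scenarios. The field names and thresholds below are invented for illustration:

```python
# Hypothetical data triage: discard boring cruising, keep segments that
# actually stress the driving policy.

segments = [
    {"id": "s1", "max_decel_g": 0.1, "near_miss": False, "disengaged": False},
    {"id": "s2", "max_decel_g": 0.6, "near_miss": True,  "disengaged": False},
    {"id": "s3", "max_decel_g": 0.2, "near_miss": False, "disengaged": True},
    {"id": "s4", "max_decel_g": 0.1, "near_miss": False, "disengaged": False},
]

def is_high_signal(seg: dict) -> bool:
    # Hard braking, a near miss, or a human takeover all suggest the
    # segment is worth keeping; uneventful highway miles are not.
    return seg["max_decel_g"] > 0.4 or seg["near_miss"] or seg["disengaged"]

keep = [s["id"] for s in segments if is_high_signal(s)]
print(keep)  # ['s2', 's3'] -- the rest can be thrown away
```

In practice the scoring would be far richer, but the economics are the same: storage goes to the rare segments that become the regression suite, not to the millions of uneventful miles.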
This is to say nothing of the fact that differences between hardware matter too. Tesla has a bunch of car models out there, and software working well on one model may not actually work well on another.
Of course it does. Pre-Uber, we had both standard yellow cabs and black car services at different levels. (The main reason you see relative homogeneity within yellow cabs is that the government forces it by setting prices, not because of anything intrinsic about a fleet. Black cars are excluded from these rules.)
In shipping we can pay for different speeds and types of handling. On planes and trains we have different class tickets. In the rental car market, we have Hertz and we have rent-a-wreck. And even within Hertz, there are different car quality levels, which somewhat decreases flexibility (since you need to have more cars on hand than you would with a homogeneous fleet), but it's worth the upkeep to charge the wealthy customers more. Etc.
> Then please explain how cars would get dirtier as the service scales up? If today is already seeing the cars at full utilization, barring a cost-cutting measure that determined that cleaning less frequently would be a significant cost savings (which is a big assumption on your part), then we should be seeing roughly how clean the cars will be into perpetuity.
1. Tech prices come down, so the average customer's willingness to pay for cleanliness comes down.
2. Services often launch with non-scalable attention to detail to control the initial public impression (eating the cost), and then relax over time.
3. Segmentation that's not feasible at the current scale but will be in the future.
So you _do_ agree that willingness to pay is only helpful if there is segmentation.
> Pre-Uber, we had both standard yellow cabs and black car services at different levels
There is more to the gap between yellow cab and black car than cleanliness. Stuff like service / helping you with bags, ETAs, partitions between yourself and the driver, niceness of the car itself, etc.
I'm sure we'll see segmentation along the lines of vehicle size and capability, but I expect cleanliness to be the same across those segments.
> Services often launch with non-scalable attention to detail to control the initial public impression (eating the cost), and then relax over time.
I don't think cleaning is the burden you're making it out to be. These cars return to a depot when their battery is low. If you're going to clean them at all, you should clean them when they return for charging, and then to your set standard. It's not a big knob for controlling costs.
Incorrect.
> unless they explicitly make an expensive "oft-cleaned" tier and a less expensive "less-oft-cleaned" tier,
That's exactly what I'm saying would happen. We already have it with Uber Black.
> You're assuming that the cars today don't get maximum utilization, and that with more utilization you'd see dirtier cars.
No, I'm not assuming that.
> As another note, I just don't see how cleaning-based market segmentation would make good operational sense
Again, Uber Black.
But I'm saying we _don't_ have that for Waymo, and it's very unlikely to happen, for many reasons. A big reason is simply that managing a fleet in a heterogeneous fashion as you're describing (different cleaning schedules for different cars) doesn't really make sense IRL. It's a purely imagined scenario on your part.
> Incorrect.
Pray tell how I can pay for a cleaner car when there's only one option, car or no car?
> No, I'm not assuming that.
Then please explain how cars would get dirtier as the service scales up? If today is already seeing the cars at full utilization, barring a cost-cutting measure that determined that cleaning less frequently would be a significant cost savings (which is a big assumption on your part), then we should be seeing roughly how clean the cars will be into perpetuity.
> Again, Uber Black.
Uber Black achieves higher cleaning standards by farming that out to the drivers renting out their personal vehicles. The drivers are incentivized to clean their cars more (than UberX drivers do) to get the more expensive fares.
But again, fleet management companies already do this for _all_ their cars. So for Waymo this is moot.
This was for classifying sentiment on the Yelp review polarity dataset.