Can you explain more about why you think this? As I see it, cars in general remain a technology with very high negative externalities, and self-driving doesn't change the picture much.
Regular ICE cars:
- air pollution from tire wear and brake dust
- air pollution from exhaust
- embodied carbon
- noise
- endangering other road users
- traffic congestion
- land use (sprawl)
- long term health impacts (encouraging sedentary lifestyle)
Switch to all-electric and you lose a bit of noise and all the tailpipe emissions, but you gain whatever emissions generated the power (sometimes solar, great; sometimes lignite, boo), whatever environmental damage came from the battery materials, and probably a marginal worsening of safety, since the same range requires a heavier vehicle.
Switch to self-driving and you may increase safety (feels like Waymo basically yes, Tesla probably no, based on their track record of stat manipulation), but you also vastly increase use, worsening congestion and land use. The other externalities stay the same.
So I don't understand why you're saying the externality ratio is good. From my perspective self-driving cars don't really move the needle.
I guess I place a premium on safety over land use and congestion.
I also suspect any future congestion and land-use problems will get better after an initial dip. Urban living becomes more desirable as city parking lots disappear.
If roads were used exclusively by self-driving cars, traffic flow would probably improve. Robot cars could multiply current highway and city-street capacity through coordination: they can smooth out flow by avoiding the hard braking that causes phantom jams, and they can safely drive much closer to each other.
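To put a rough number on the coordination gain, here is a toy capacity model (a sketch; the reaction times and car length are my assumptions):

```python
# Rough single-lane capacity model (illustrative): throughput = speed / spacing,
# where spacing = car length + reaction_time * speed (the following gap).

def lane_capacity(speed_mps, reaction_s, car_len_m=4.5):
    spacing_m = car_len_m + reaction_s * speed_mps   # meters between car fronts
    return 3600 * speed_mps / spacing_m              # vehicles per hour

human = lane_capacity(30.0, reaction_s=1.5)  # ~1.5 s human headway -> ~2,200 veh/h
robot = lane_capacity(30.0, reaction_s=0.3)  # coordinated platooning -> ~8,000 veh/h
print(f"{human:.0f} vs {robot:.0f} veh/h ({robot / human:.1f}x)")
```

The human figure roughly matches observed per-lane highway throughput (~2,200 vehicles/hour), so a 3-4x multiplier from coordination is at least the right order of magnitude.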
ChatGPT seems like a huge distraction for OpenAI if their goal is transformative AI.
IMO: the largest value creation from AGI won’t come from building a better shopping or travel assistant. The real pot of gold is in workflow / labor automation, but obviously they can’t admit that openly.
It's broader; it's about users' data. For example, you can store my address so you can send me the item I ordered. You can't, without permission, use that address to send me marketing stuff.
Kinda genius to scale exoskeleton data collection with UMI grippers when most labs are chasing "general" VLMs / VLAs by training on human demonstration videos.
Imo the latter will be very useful for semantic planning and reasoning, but only after manipulation is solved.
A ballpark cost estimate -
- $10 to $20 hourly wages for the data collectors
- $100,000 to $200,000 per day for 10,000 hours of data
- ~1,500 to 2,500 data collectors doing 4 to 6 hours daily
- $750K to $1.25M on hardware costs at $500 per gripper
Fully loaded cost between $4M and $8M for 270,000 hours of data (27 days at 10,000 hours/day); sanity-checked in the sketch below.
Not bad considering the alternatives.
For example, teleoperation is way less efficient: it's 5x-6x slower than human demos and 2x-3x more expensive per hour of operator time, so roughly 10x-18x the cost per hour of usable data. But it could become feasible once low-level and mid-level manipulation and task planning are solved.
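To make the arithmetic explicit, a minimal sketch of the cost model (the 20% overhead multiplier is my assumption, added so labor plus hardware lands in the fully loaded $4M-$8M range):

```python
# Ballpark cost model for gripper-based data collection, using the figures
# above. The overhead multiplier is an assumption (logistics, QA, management).

def collection_cost(total_hours, wage, shift_hours,
                    daily_hours=10_000, gripper_cost=500, overhead=1.2):
    collectors = daily_hours / shift_hours   # fleet size needed for throughput
    labor = total_hours * wage               # wages across all data hours
    hardware = collectors * gripper_cost     # one gripper per collector
    return (labor + hardware) * overhead

low = collection_cost(270_000, wage=10, shift_hours=6)   # fewer, longer shifts
high = collection_cost(270_000, wage=20, shift_hours=4)  # more, shorter shifts
print(f"${low / 1e6:.1f}M - ${high / 1e6:.1f}M")         # ~$4.2M - $8.0M

# Teleoperation for comparison: 5x-6x slower and 2x-3x pricier per operator
# hour compounds to roughly 10x-18x the labor cost per hour of usable data.
```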
Not teleoperating has its own disadvantages, though, due to mismatches between how humans move and how robots move. See here: https://evjang.com/2024/08/31/motors.html
Intuitively, yes. But is it really true in practice?
Thinking about it, I'm reminded of various "additive training" tricks. Teach an AI to do A, and then to do B, and it might just generalize that to doing A+B with no extra training. Works often enough on things like LLMs.
In this case, we use non-robot data to teach an AI how to do diverse tasks, and robot-specific data (real or sim) to teach an AI how to operate a robot body. Which might generalize well enough to "doing diverse tasks through a robot body".
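As a minimal sketch of that recipe (the model, loaders, and `loss` method are hypothetical; it assumes a single policy trained with one objective over both data sources):

```python
import random

def forever(loader):
    # Cycle a finite data loader indefinitely.
    while True:
        yield from loader

def cotrain(model, optimizer, web_loader, robot_loader,
            steps=10_000, robot_fraction=0.3):
    """Mix non-robot batches ("what to do") with robot batches ("how to move")."""
    web_iter, robot_iter = forever(web_loader), forever(robot_loader)
    for _ in range(steps):
        source = robot_iter if random.random() < robot_fraction else web_iter
        loss = model.loss(next(source))  # one shared objective on both sources
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The hope is that the shared representation lets task knowledge from the non-robot stream transfer to the robot embodiment without robot data for every task.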
The exoskeletons are instrumented to match the kinematics and sensor suite of the actual robot gripper. You can trivially train a model on human-collected gripper data and replay it on the robot.
You mentioned UMI, which to my knowledge runs VSLAM on camera+IMU data to estimate the gripper pose and no exoskeletons are involved. See here: https://umi-gripper.github.io/
Calling UMI an "exoskeleton" might be a stretch, but the principle is the same: humans use a kinematically matched, instrumented end effector to collect data that can be trivially replayed on the robot.
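For illustration, "trivially replayed" might look roughly like this sketch (`inverse_kinematics`, `move_to`, and `set_gripper` are hypothetical robot-API names; a real pipeline would also need trajectory retiming and safety checks):

```python
# Hypothetical replay of a human-collected trajectory on the robot.
# Because the handheld gripper matches the robot's end-effector kinematics,
# each recorded sample maps almost directly to a robot command.

def replay(trajectory, robot):
    """trajectory: list of (pose, width) pairs, where pose is a 4x4
    end-effector pose in the robot's base frame and width is the
    gripper opening in meters."""
    for pose, width in trajectory:
        joints = robot.inverse_kinematics(pose)  # SE(3) pose -> joint angles
        robot.move_to(joints)
        robot.set_gripper(width)
```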
There is probably an equivalent of Amdahl's law for GDP - overall productivity will be bottlenecked by the least productive sectors.
Until AI becomes physically embodied, that would mean all high-mix, low-volume physical labor is likely going to become a lot more valuable in the mid-term.
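To make the analogy concrete, a toy version (the 60% automatable fraction and the speedups are purely illustrative assumptions):

```python
# Amdahl's law transplanted to aggregate productivity: if AI makes a
# fraction p of economic output s times more productive, overall gains
# are bounded by the untouched fraction.

def overall_speedup(p, s):
    """p: automatable fraction of output; s: productivity multiplier on it."""
    return 1 / ((1 - p) + p / s)

print(overall_speedup(0.6, 10))   # ~2.2x overall from a 10x speedup on 60%
print(overall_speedup(0.6, 1e9))  # caps at 2.5x: the remaining 40% (physical,
                                  # high-mix labor) becomes the bottleneck
```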
Ben’s original take about 1% of users being creators might end up being right eventually.
Consider the Studio Ghibli phenomenon: it was fun to create and share photos of your loved ones in Ghibli aesthetics, until the novelty wore off.
Video models arguably have a lot more novelty to juice, but they too will eventually get boring once you have explored the effectively finite latent space of interesting content.