How AI robots learn just like babies—but a million times faster w/ NVIDIA's Rev Lebaredian (Transcript)
The TED AI Show
How AI robots learn like babies—but a million times faster w/ NVIDIA's Rev Lebaredian
December 3, 2024
Please note the following transcript may not exactly match the final audio, as minor edits or adjustments could be made during production.
[00:00:00] Bilawal Sidhu: Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying The TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.
The world of AI is advancing at an incredible pace, and it's no secret that in many areas computers have long outperformed humans. But there's been one area that's been tough for robots to master: physical intelligence. We've talked a lot on this podcast about text and image generation, technologies that took years of research, immense computational power and vast datasets to develop.
But when compared to mapping 3D spaces and predicting the chaotic randomness of the real world, that's all child's play. So what gives humans the edge here, at least for now? It's simple: we've had a lot of practice. Imagine you're a pro baseball player in the outfield watching a fly ball come your way. In an instant, your brain calculates the ball's speed, spin, and trajectory to predict where it will land. To you, it feels automatic, but it's the result of years of practice and learned experiences, not just from baseball, but from a lifetime of physical interactions. From childhood, moments of trial and error in the physical world have trained your brain to understand how objects move and react. And for humans, mastering these skills takes time, because a real world practice can't be rushed.
But fortunately for robots, it can be rushed. And NVIDIA, the AI giant historically known for its graphics cards, has developed incredibly powerful simulated environments where robots can practice and learn at a supercharged pace. Tens of millions of repetitions, which might take humans years, can be compressed into minutes.
We're already seeing this in self-driving cars, but the potential goes far beyond that. By building AI that understands the physical world, NVIDIA is setting the stage for machines that could revolutionize industries, assist in complex surgeries, and even help around the house. So what does it mean for robots to develop a kind of physical intuition?
And what challenges and opportunities lie ahead as we continue to push the boundaries of robotics? I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.
Our guest today, Rev Lebaredian, began his career in Hollywood, where he worked on visual effects for films like Mighty Joe Young and Stuart Little. His experience in creating detailed, dynamic 3D worlds laid the foundation for his role today as VP of Omniverse and Simulation Technology at NVIDIA. There, he's using that expertise to push the boundaries of robotics by applying simulation technology to teach robots physical intelligence.
In other words, how to understand and interact with the real world. In our conversation, we explore how NVIDIA, known for its role in gaming technology, became a key player in the development of generative AI, what a robot even is, and Rev's vision for a future where robots enhance our lives. So Rev, welcome to the show.
[00:03:43] Rev Lebaredian: Thank you for having me, Bilawal.
[00:03:45] Bilawal Sidhu: So in the first part of your career, you worked in entertainment, helping audiences become immersed in fantasy worlds, and now your work involves helping robots become immersed in simulations of the real world. Can you explain to our listeners what your role is at NVIDIA?
[00:04:01] Rev Lebaredian: Technically, my role is, the title is, Vice President of Omniverse and Simulation Technology. It's kind of a weird title. I don't think there's many others like it out there. And it's strange because it's a new concept, relatively speaking. I started my career, as you mentioned, in entertainment, media entertainment, doing visual effects and computer graphics for that purpose.
I joined NVIDIA 23 years ago with the hope of taking what I was doing in movies, creating this imagery of high fidelity, high quality fantasy worlds, and doing it in real time, doing it really fast, using our GPUs to power that computation so that what's a linear experience in movies could become an interactive one, like in a video game or in an immersive experience like XR.
It took a while for us to get there, though.
[00:04:55] Bilawal Sidhu: Speaking of that, you've had a very unique vantage point over the years watching NVIDIA almost evolve from basically a gaming hardware company to a leader in AI and simulation. Could you share a little bit about your journey in NVIDIA and how NVIDIA's mission has transformed over the years?
[00:05:12] Rev Lebaredian: That's a really, really great question and I think, uh, a lot of people don't really understand how NVIDIA, this quote-unquote gaming company, or this chip company that made chips for gaming PCs, is now the most valuable company in the world and at the center of all of this AI stuff. But if you go back to what the idea behind the, the creation of the company was, all the way at the beginning, it actually makes a lot of sense.
The founding principle of the company was this idea that general purpose computers, ones built around CPUs, the same architecture that we've built all computers around since the 1960s, starting from the IBM System/360, are really great, but there are certain computing problems that they just aren't fast enough to solve.
Now, at the time, we had this law called Moore's Law. It's not a law like a law of physics. It was more like an observation of how semiconductors were essentially providing double the compute for the same price or the same amount of power every year and a half or two. At its height, Moore's Law made it so that we could get a hundred times speed increases for the same price or the same power over a 10 year period. But we looked at Moore's Law and said, well, if we wait for Moore's Law to give us enough computing power to do certain things, like rendering for computer graphics for video games, we would have to wait decades or maybe even hundreds of years before the computers would be fast enough to do some of the things we wanted to do.
So NVIDIA set about creating this new form of computing that doesn't do everything, but it can do many things that would otherwise be impossible with this generic kind of computer. And, uh, we call that accelerated computing. We invented the idea of a GPU and the first problem we chose to tackle was the problem of 3D rendering for producing these images in video games.
At the time when NVIDIA was formed in 1993, there was no market for this. There were actually no 3D video games. They were just starting. There was Doom and Wolfenstein, like the first ones that just showed up.
[00:07:36] Bilawal Sidhu: And Duke Nukem.
[00:07:37] Rev Lebaredian: Yeah. That, that came a little bit later I think. Uh, it was not '93, maybe '95, I think.
And so we imagined that if we could help solve this problem, a market would form around it, and then we could expand into other markets with the same accelerated computing architecture. And that's essentially what happened. Fast forward a few more years. In the early two thousands, we added a critical feature to our GPUs.
It's called programmable shading, which is simulating how light interacts with the materials inside a 3D world. That's what makes plastic look like plastic, aluminum look like aluminum, wood look like wood. Up until that point in time, the kinds of shaders we could have, the kinds of materials, were very limited, and they made the video games look very, uh, simple or cartoony.
Not quite realistic. In the movie world, we weren't limited by time and how much time you have to render. We could spend hours and hours rendering. So there was this big disconnect between the quality of a computer generated image in a movie and what you could see in a video game. We introduced programmable shading, and that feature of making it programmable unlocked the possibility of us using the same GPUs for more than computer graphics and rendering.
And very quickly we saw researchers and other people who weren't doing computer graphics take advantage of all the computing capabilities that were in our GPUs. They would take their problems, other sorts of physics problems like molecular dynamics and fluid dynamics, and phrase them like they're a computer graphics problem.
[00:09:25] Bilawal Sidhu: Hmm.
[00:09:26] Rev Lebaredian: And when we realized that that was happening, that people were willing to contort themselves into using graphics APIs to do this other stuff, we said, let's make it easier for them. And we introduced CUDA, which was a more natural way of programming general purpose things that weren't graphics on our GPUs.
And we essentially waited for six, seven years to see what the killer app would be. We imagined some developer somewhere, probably a grad student, is gonna go figure out something amazing to do with these computing capabilities. And uh, it took a while. We introduced CUDA in 2006. At the end of 2012, almost seven years later, we finally had that moment.
And what happened was two research students at the University of Toronto, Ilya Sutskever and Alex Krizhevsky, and their professor Geoffrey Hinton, who just won the Nobel Prize. They beat all of the benchmarks in image classification with a deep learning neural network called AlexNet at the end of 2012, when they published that. And that essentially changed everything.
[00:10:38] Bilawal Sidhu: And this is insane because up until that point, basically every other approach on the ImageNet benchmark wasn't using deep learning at all. This was the first time deep learning kind of blew everyone's mind in the realm of computer vision. And it's kind of wild to imagine it started off with programmable shaders and trying to make, like, cinematic visuals from Hollywood run in real time on your computer.
But that same capability, like you said, as you made it easier for developers, unlocked this whole new world in computer vision and certainly caught the whole world's attention, particularly y'all's. Probably sooner than everyone else, I assume.
[00:11:15] Rev Lebaredian: Mm-Hmm, that's exactly right. It seems counterintuitive that this thing built to create images is somehow the same thing that you need to build intelligence, but really it all just comes down to computing.
The form of computing we had to build for computer graphics processes a lot of pixels, a lot of triangles, a lot of light rays bouncing around in a scene. That same form of computation is the same thing you need to do all of the tensor math, all of the matrix math. The problem of image classification, that's been a long standing one that we've all known would be great if we could solve.
People have been trying to solve it since the 1950s. It's a really, really useful thing to do, to be able to distinguish what's inside an image that you provide the computer automatically. And uh, up until that point, we would take a really smart person, a computer scientist, that person would imagine an algorithm that can do image classification, and then transcode what's in their brain into the computer and produce a program.
What changed here was, for the first time, we were able to create an algorithm to solve something that no human could actually imagine. The way we solved it was by taking a large computer, effectively a super computer, we gave it millions of examples of images and said, when you see an image that looks like this, that's a cat.
And when you look at an image that looks like this, it's a dog. When you look at this image, it's an airplane. And we did that enough times that it wrote the software, it wrote the algorithm that could do that image classification, and it did it better than any algorithm that a human could imagine.
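To make that concrete, here's a minimal sketch of the kind of example-driven training Rev is describing, in PyTorch, using a tiny made-up dataset and model in place of the millions of real photographs and the actual AlexNet architecture:

```python
# A toy version of "show it enough labeled examples and it writes the algorithm":
# supervised training of a small image classifier. The data here is random
# tensors standing in for real, human-labeled photographs.
import torch
import torch.nn as nn

classes = ["cat", "dog", "airplane"]              # the labels from the conversation
images = torch.randn(512, 3, 32, 32)              # pretend these are photographs
labels = torch.randint(0, len(classes), (512,))   # pretend human-provided labels

model = nn.Sequential(                            # far smaller than AlexNet
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, len(classes)),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                            # "we did that enough times..."
    for i in range(0, len(images), 64):           # mini-batches of examples
        x, y = images[i:i + 64], labels[i:i + 64]
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)               # how wrong was the guess?
        loss.backward()                           # nudge the weights to be less wrong
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```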
[00:13:05] Bilawal Sidhu: And that's wild, right? You're talking about this, like, era where humans have written software. Now software is writing software.
[00:13:12] Rev Lebaredian: That's right. There's two basic ingredients: a supercomputer, lots of computation, and you give it a whole bunch of data or examples of what you would like it to do, and it figures out the algorithm for you based on the examples you give it.
The first one, building large computers, that's our happy place, right? That's what NVIDIA knows how to do. We love building powerful computers and, and scaling them up. And so that's what we set about doing over a decade ago. And the recent explosive growth of NVIDIA is essentially because of the bet we placed over a decade ago, that these big computers were gonna be useful.
That's what everybody is clamoring for right now. They're setting up these AI supercomputers.
[00:13:55] Bilawal Sidhu: Yeah. And every country and company wants more of your GPUs. And of course the recent demand has really been driven by large language models and diffusion models, which we've talked about a bunch on the podcast. But it's interesting, like as cool as ChatGPT is, and as cool as it is to be able to type a prompt and get an image out, this stuff isn't the holy grail. These systems have their limitations, right? Could you talk a little bit about that as we transition this conversation towards physical AI?
[00:14:26] Rev Lebaredian: Yes, that's, uh, that's exactly right. So, at that moment when we realized how profound this change was, that we could now produce algorithms that we never imagined we would have in our lifetimes through this new technique of deep learning and AI, the next question we asked ourselves was, now that we have this possibility of creating these amazing new things, which ones should we go create? What are gonna be the most valuable and impactful ones? Now, if you just take a step back and think about the computing industry, the IT industry, it's somewhere between two and $5 trillion a year globally.
Which is a huge number, right? That's a really big industry. However, all of the rest of the industries out there, the industries that are about our physical world, the world of atoms, that's a hundred trillion dollars. That includes markets like transportation, transporting humans, transporting goods. It includes manufacturing, which is reassembling atoms into products. It includes drug discovery and design, reassembling atoms into medicines. So on and so forth. All these things about our physical world, at least the way humans value them through markets, are of much greater value than information. Now, information is the easiest thing for us to digitize.
So it kind of makes sense that the first algorithms that we develop using this new machine learning, deep learning AI technique, it's gonna use all the data that we have readily available to us, which is essentially what's on the internet. But if we could somehow take this new superpower and apply it to the realm of atoms, we unlock that hundred trillion dollar market.
[00:16:22] Bilawal Sidhu: Hmm.
[00:16:23] Rev Lebaredian: Take, uh, manufacturing, for example. We've applied computing to markets like manufacturing, but if you go into a factory, it's not that different from a factory 50 years ago. They've been largely untouched by computing. The reason why we haven't been able to do that is because we haven't really had a bridge between the physical world and the computing world.
[00:16:51] Bilawal Sidhu: Connecting bits and atoms, baby. Let's go.
[00:16:53] Rev Lebaredian: Yes. And if you think a little bit more about that, bridge is essentially robotics.
[00:16:58] Bilawal Sidhu: Hmm, totally.
[00:16:59] Rev Lebaredian: And so we thought about this and we said, this is now maybe possible. Robotics has been a dream for a long time, but what we've been missing are the fundamental algorithms we need to build a truly useful robotic brain so that we could apply computing to the real world. And so, what's a robot? A robot is essentially an agent out here in our real world that does three things, and does these three things in a loop.

A robot perceives the world around us, the physical world. It inputs the world through sensors. They can be cameras and lidars and radars, all kinds of sensors, whatever the sensing mechanism is. And it makes some kind of sense out of what's coming in. It understands what's coming in. Essentially that first neural network, AlexNet, was doing that, right? It's getting some information from the real world, an image, a photograph, and making sense of what's inside it. The next thing it does, a robot agent inside the physical world, it takes this information, what it perceived, and makes some decisions. It makes a decision about how it should act. It plans and decides how it's going to affect the world. And the third thing is actuation. It actually does something inside the world. So once it's made the decision, it does something that actually moves or affects the physical world. And once that happens, then it's a loop. You perceive your changes to the world, update your decisions and your plan, and go actuate.

By this definition, many things are robots, not just the things we normally think of as a robot, like a C-3PO or R2-D2. A self-driving car is definitely a robot. It has to perceive the world around it. Where are the other cars? The stop signs? Pedestrians, bicyclists? How fast are they all moving? What's the state of the world around me, around the car? It makes some decisions on how it's going to get to the final destination, and actuates: steers, brakes or accelerates. And this thing runs in a loop. Lots of things are robots if you define them this way.
The building I'm in right now, which is our Endeavor building, our headquarters, every day when I enter it, in the reception area, we have turnstiles. There are sensors there, there's some cameras. They know when I walk up to the turnstile. It senses that I've approached and then decides who I am based on an image classification algorithm not dissimilar from that original AlexNet. And once it determines that I'm Rev, it can look me up in a database, see that I have access, and then it actuates in the world. It opens the turnstile so I can pass through and updates some count somewhere that now I'm in the main area. So this building is essentially a robot.
And so if you think about robots in this way, and you think about robotic systems as essentially the bridge between computing and the hundred trillion dollars worth of industries out there that deal with the physical world, you start to get pretty excited. You're like, wow, we now potentially have the opportunity to go make a big impact in many of these other industries.
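As a rough illustration of that perceive, decide, actuate loop, here is a minimal Python sketch; the sensor reading, decision rule, and actuator below are hypothetical placeholders, not any real robot's (or NVIDIA's) interfaces:

```python
# The three-step robot loop from the conversation: perceive -> decide -> actuate,
# repeated forever. Everything below is a hypothetical stand-in.
import random
import time


def perceive():
    """Read sensors (cameras, lidar, radar...) and return an observation."""
    return {"obstacle_distance_m": random.uniform(0.2, 5.0)}


def decide(observation):
    """Turn the observation into a plan: here, a single speed command."""
    return 0.0 if observation["obstacle_distance_m"] < 0.5 else 1.0


def actuate(speed_command):
    """Send the command to the motors; here we just print it."""
    print(f"setting speed to {speed_command:.1f} m/s")


if __name__ == "__main__":
    for _ in range(10):          # a real robot would loop until shut down
        obs = perceive()         # 1. sense the world
        cmd = decide(obs)        # 2. plan / decide
        actuate(cmd)             # 3. act on the world, then loop again
        time.sleep(0.1)
```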
[00:20:19] Bilawal Sidhu: So on that note, I mean, it's interesting, right? You are talking about how factories haven't changed in decades. And you're right, there's like enterprise resource planning software to keep track of the inventory of stuff and how it's moving around. But the world of atoms hasn't seen as much progress as the world of bits. And to unlock that massive opportunity in these physically based industries, what's the missing piece?
What do we not have today, and what are y'all building to make that happen?
[00:20:48] Rev Lebaredian: So this is where simulation comes in. If we go back to, you know, what were the key differences between how we used to write software and this new form of AI, one is supercomputing. The other is you need that data, or the set of examples to give it so it could go write the function well.
Where were we gonna get that data to learn the physics of the world around us? How do you gather that data? It doesn't just exist on the internet. The stuff we have on the internet is largely the things that were easy to digitize, which is not stuff in the physical world. And so our thesis is that the only way we're gonna get all the data that we need is by essentially taking the physical world and all the laws of the physical world and putting it in a computer, making a simulation of the physical world.
Once you have that, you can produce all of the data you need, essentially the training grounds for these AIs to learn about the physical world. You're no longer bound by all of the constraints that we have out here in the real world. We can train faster than real time, faster than the real world time out here.

By just adding more compute, for every real world second, we can do millions of seconds in the simulated world.
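A back-of-the-envelope sketch of that time-dilation claim: simulated seconds per wall-clock second scale with how many environments you run in parallel and how fast each one steps. The numbers below are illustrative guesses, not NVIDIA benchmarks:

```python
# Rough arithmetic behind "for every real-world second, many simulated seconds":
# throughput = parallel environments x physics steps per second, converted back
# into simulated time. All figures here are made-up illustrations.
def simulated_seconds_per_wallclock_second(num_envs, steps_per_sec_per_env, dt):
    """dt is the simulated time advanced by each physics step, in seconds."""
    return num_envs * steps_per_sec_per_env * dt


# e.g. 4096 environments on one machine, each stepping 1000x per second at dt=1/60 s
speedup = simulated_seconds_per_wallclock_second(4096, 1000, 1 / 60)
print(f"~{speedup:,.0f} simulated seconds per real second")   # roughly 68,000x

# doubling the compute (twice the environments) doubles the simulated hours;
# spreading across many machines pushes this into the millions
print(f"~{simulated_seconds_per_wallclock_second(8192, 1000, 1 / 60):,.0f} per real second")
```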
[00:22:01] Bilawal Sidhu: Wow. Yeah.
[00:22:01] Rev Lebaredian: Uh, and collecting data from the real world is really expensive. Let's take one kind of robot: self-driving cars, autonomous vehicles. If, if you want to train a network to perceive a child running across the street in any condition, any lighting condition, any city. Uh, and...
[00:22:24] Bilawal Sidhu: Different times of years, a different weather.
[00:22:26] Rev Lebaredian: Yeah, different weather conditions. You're gonna have to actually go out there in the real world and have a child run across the street as your car's barreling down the road and capture it.
I mean, first of all, obviously this is unethical to do and we shouldn't do that. But then just the tediousness of capturing it in every possible long tail scenario, it's just untenable. You can't do that. It's too expensive and it's just impossible. You know, there are some really rare weather conditions. You might want to have that same scenario with, uh, volcanic ash falling. That might happen in Hawaii.
How can you even construct that scenario, right? But in simulation, we can create it all. In addition, when you grab data from the real world, you only have kind of half the data you need. We also need to know what's inside this information, this unstructured information.
[00:23:24] Bilawal Sidhu: The labels.
[00:23:24] Rev Lebaredian: The labels, exactly. So with AlexNet, when they trained it, they had not only the image, but they had the label that said that image is a cat or a dog. When we simulate a world, we can produce the labels perfectly and automatically.
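A toy sketch of why labels come for free in simulation: the simulator already knows every object's class and 3D position, so ground-truth 2D boxes and depths are just a projection. The camera intrinsics and scene contents below are made up, and there's no real renderer here:

```python
# When the scene is simulated, ground truth comes with the image: we already know
# every object's class and 3D position, so 2D labels are a simple pinhole projection.
import numpy as np

FOCAL, CX, CY = 800.0, 640.0, 360.0             # hypothetical camera intrinsics

scene = [                                        # the simulator's own scene state
    {"cls": "child",     "center_xyz": np.array([ 1.0, 0.5, 12.0]), "radius_m": 0.5},
    {"cls": "car",       "center_xyz": np.array([-2.0, 0.7, 20.0]), "radius_m": 1.0},
    {"cls": "stop_sign", "center_xyz": np.array([ 3.0, 2.0, 15.0]), "radius_m": 0.4},
]

labels = []
for obj in scene:
    x, y, z = obj["center_xyz"]                  # known exactly -- no human labeler
    u, v = FOCAL * x / z + CX, FOCAL * y / z + CY
    r_px = FOCAL * obj["radius_m"] / z           # crude screen-space radius
    labels.append({
        "class": obj["cls"],
        "bbox_xyxy": [u - r_px, v - r_px, u + r_px, v + r_px],
        "depth_m": float(z),                     # per-object depth, also free
    })

for lab in labels:
    print(lab)
```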
[00:23:40] Bilawal Sidhu: You get it for free, pretty much.
[00:23:41] Rev Lebaredian: Yeah, but when you do it in the real world, you have to have an army of humans or some other mechanism of adding the labels, and they're gonna be inaccurate.
And before you deploy it out into the real world, you probably want to make sure it's gonna work. You know, we don't wanna put a robot brain in a self-driving car and just hope that it's gonna work when that child runs across the street. And the best place to go test that is in a virtual world, in a simulation.
And, uh, that was a really long-winded way of getting to what I've been working on in recent years. Here at NVIDIA, we saw the need for this many years ago. So we started building what we call Omniverse. Omniverse is kind of a quote unquote operating system that we collect all of our simulation and virtual world technologies into.
And the goal of Omniverse is specifically about doing simulations that are as physically accurate as possible. That's the key thing. It has to match the real world, because otherwise our robots would be learning the laws of physics from something that's just wrong. This is distinctly different from what I did before in my work in movies, doing simulations to produce the amazing imagery that we see in visual effects, in CGI movies or in video games.
That's all about creating really cool looking images that are fun, of fantasy worlds, of fake worlds. There's all kinds of stuff where we're cheating. We add extra lights and makeup, and we're breaking the laws of physics in order to make the movie fun and cool or exciting.
[00:25:29] Bilawal Sidhu: There is something really poetic about that though. It basically goes back to the start of your career. All these capabilities y'all built to emulate the laws of physics, let's say for light transport, and just get the material properties right, so the glint, the veneer, the reflections and refraction all look really good. That's exactly what you need, obviously tuned in a fashion that's physically accurate, as you said, so these robots have kind of a believable digital twin or copy or replica of the real world where they're free to make mistakes. But also the time dilation aspect that you mentioned, where you can scale up and have these models go do things in the digital realm that would take forever to do, uh, in the physical world.
And it feels like there's another piece of this too: you create these digital replicas of the world that become the training data, because as you said, you don't have the internet to go and pull all this text or image data from, but then you have the robots try things. And there's this, like, domain gap, this chasm that you need to cross between simulation and the real world.
What are some of the other capabilities y'all are building to make that happen?
[00:26:38] Rev Lebaredian: Yeah, I kind of oversimplified how we build these AIs, to just, you feed data into the supercomputer and out comes this amazing robot brain. That's some of how we do it, but there's many different forms of learning, and I think the one you're touching upon is what's called reinforcement learning.
It turns out that one of the best ways for these robots to learn is sort of how humans and creatures learn. When a human baby is born into the world, it still doesn't understand the physics of the world around it. A baby can't see depth. They can't really see color yet. They have to learn how to see color.
And over time, over weeks, they start learning those things. They start learning how to classify. They classify mom and dad and, and siblings and, and, uh...
[00:27:30] Bilawal Sidhu: Apple, ball.
[00:27:31] Rev Lebaredian: Apple, all of those things around. They learn it just through, through experience. They also learn about the laws of physics through a lot of experimentation.
So when you first start giving your baby food and putting food in front of 'em, one of the first things they do is drop it or throw it. Breaking things, throwing things, making a mess. Those are essentially science experiments. They're all little scientists that are trying things until they learn it, and once they understand how that physics works, they move on.
Robots learn in the same way, through this method called reinforcement learning, where we throw them into a virtual world, or it could actually be the real world, but it's too slow to do in the real world. Generally we do it in the virtual world. We give this robot the ability to perceive and actuate inside that world, but it doesn't actually know anything.
But we give it a goal. We'll say, stand up. And we have them try millions and millions of iterations of standing up. And so what you were alluding to, this Isaac Sim, that's our robotic simulator that we've built on top of our Omniverse platform, on this quote unquote operating system, and it allows you to do many of the things you need in order to build robot brains.
One of those things is reinforcement learning.
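For a flavor of that reinforcement-learning loop, here is a deliberately tiny stand-in: random search over a linear policy on Gymnasium's CartPole, where balancing the pole plays the role of "stand up." This is not Isaac Sim or the GPU-parallel training NVIDIA actually runs, just the goal, score, retry pattern in miniature:

```python
# "Give it a goal and let it try many iterations": the crudest possible
# reinforcement-learning-style loop. Balancing CartPole stands in for "stand up".
# Requires `pip install gymnasium`; real humanoid training is far larger in scale.
import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v1")


def run_episode(weights):
    """Score one trial: how long the pole stays up under this linear policy."""
    obs, _ = env.reset()
    total_reward = 0.0
    for _ in range(500):
        action = int(np.dot(weights, obs) > 0)          # tiny "policy"
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


best_w, best_score = None, -np.inf
for trial in range(2000):                               # many cheap simulated tries
    w = np.random.uniform(-1, 1, size=4)                # propose a random policy
    score = run_episode(w)
    if score > best_score:                              # keep whatever does better
        best_w, best_score = w, score
    if best_score >= 500:                               # goal reached: it "stood up"
        print(f"solved after {trial + 1} trials")
        break
print("best score:", best_score)
```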
[00:28:56] Bilawal Sidhu: It's almost like a training simulator built on top of Omniverse where it's free to make mistakes. And like you said, I love the notion of wall clock time and speeding that up. You're compressing all these, like, epochs of learning and evolution down into something that is manageable, and then you plop that into a real world robot and it still works.
[00:29:18] Rev Lebaredian: That's exactly right. Simulated time is not bound to wall clock time. If I double the amount of compute, double the size of my computer, that's twice the amount of simulation I can do. That's twice the number of simulation hours, and so the scaling laws apply here in a profound way.
[00:29:37] Bilawal Sidhu: That's pretty magical.
Let's talk a little bit about the applications of physical AI, like obviously applies to so many different fields. We talked about autonomous vehicles. There's like robotic assisted surgery. You alluded to automated warehousing. Could you share some examples of how physical AI is currently impacting these areas and what it's unlocking for these industries that have sort of been stuck in the past?
[00:30:01] Rev Lebaredian: I think the very first place that it's impacting the most, the first area, is autonomous vehicles, the first robots. Once we discovered this deep learning, machine learning thing, immediately you saw all of these efforts from different companies to go build autonomous vehicles, whether they're robo taxis or assistants inside commercial cars.
And it's actually become a reality now. Like, I don't know if you've been to San Francisco or Phoenix, or...
[00:30:32] Bilawal Sidhu: We got Waymo in Austin here, too.
[00:30:34] Rev Lebaredian: Yeah, Waymo. I didn't realize they were in Austin as well. It's pretty awesome. I was in Phoenix a month or so ago at the airport and I was waiting for my Uber, and five Waymos picked up these people standing next to me, and it was super mundane.
[00:30:52] Bilawal Sidhu: Just another day.
[00:30:53] Rev Lebaredian: Just another day staring at their phones, and they got into the car like it was nothing. This was unimaginable 10 years ago, and now it's become mundane. And all of that is powered by these AI algorithms. Now, I don't know exactly what's inside Waymo or any of the other ones, but there's this trend that's happening where we're moving from the kind of earlier generations of AI, more specific AI like AlexNet, where we trained these models on very specific data sets and then kind of strung these different models together to form a whole system.
[00:31:30] Bilawal Sidhu: Like kind of like task specific models that you kludge together.
[00:31:33] Rev Lebaredian: Yeah, that you put together. We're moving to these more general purpose unified models that are built on the transformer architecture, the same thing that powers LLMs.
And so we're starting to see these robotics models that are more general purpose. And that's what we're talking about with physical AI being the next wave: essentially having these kind of foundation models with a general purpose understanding of the physical world around us that you use as the basis, as the foundation, to then fine tune for your specific purpose. Just like we have, uh, Llama and GPT and the Anthropic models.
And then from there you go, fine tune those for specific kinds of tasks. We're gonna start seeing a lot of new physical AI models that just understand the general laws of physics. And then we'll go take those and fine tune them to specialize for different kinds of robotic tasks.
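The foundation-model-then-fine-tune pattern Rev describes looks roughly like this in code, here illustrated with a pretrained image backbone standing in for a physical AI foundation model; the task head, class count, and data are hypothetical placeholders:

```python
# "Start from a general-purpose model, then fine-tune for your specific task":
# a pretrained vision backbone stands in for a physical-AI foundation model.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():            # keep the general-purpose knowledge frozen
    p.requires_grad = False

NUM_TASK_CLASSES = 4                       # e.g. made-up grasp categories for one robot
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TASK_CLASSES)  # new task head

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical task-specific examples, collected or simulated for fine-tuning.
x = torch.randn(32, 3, 224, 224)
y = torch.randint(0, NUM_TASK_CLASSES, (32,))

for step in range(20):                     # only the small head gets trained
    optimizer.zero_grad()
    loss = loss_fn(backbone(x), y)
    loss.backward()
    optimizer.step()
print("fine-tuned head, final loss:", float(loss))
```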
[00:32:34] Bilawal Sidhu: And so those robotic tasks, it's like a, you know, the Roomba in your fricking house versus like, you know, of course, uh, a warehouse robot or even an autonomous vehicle.
[00:32:43] Rev Lebaredian: That's right. Yeah. It could be a pick and place robot in a warehouse. It could be an AMR, an autonomous mobile robot. They're basically little driving platforms that zip around in these warehouses and factories.
They can be drones that are flying around inside factories or outside.
[00:33:01] Bilawal Sidhu: I mean, that's what I want by the way, is I want like a hot latte delivered like on my balcony by a drone. Not having to navigate traffic. It's like it's actually hot and gets to you.
[00:33:11] Rev Lebaredian: Yeah. I'm not sure I'm with you on that one. Uh, like, I don't know if I wanna have thousands of drones zipping around my, my neighborhood, just dropping off lattes everywhere.
That's one of, uh, the, the few things that I do by hand and handcraft at home myself.
[00:33:28] Bilawal Sidhu: You like your latte art?
[00:33:29] Rev Lebaredian: I, I make, I make one every morning for my wife. That's like the first thing I do every day, and it kind of grounds me into the, the world. So I don't need a, a drone doing that.
[00:33:39] Bilawal Sidhu: Fair enough. Fair enough.
How do you think about where we are in terms of, like, physical AI capabilities today? I don't know if the GPT 1, 2, 3, 4 nomenclature is the right way to think about it, but I'm curious, as you think about where we are now and where we're headed, what stage are we at in terms of the maturity of physical AI capabilities, especially this more general approach to agents that understand and can take action in the physical world?
[00:34:04] Rev Lebaredian: I think we're right at the beginning. I don't know how to relate it exactly to GPT 1, 2, 3, 4. I'm not sure if that works, but we're at the very beginning of this. That being said, we're also building on the GPT 1, 2, 3, 4, on the LLMs themselves. The information and data that's fed into these text based or LLM models is actually still relevant to the physical AI models as well.
Inside these descriptions, in the text that was used to train them, is information about the physical world. We talk about things like the color red and putting a book on a shelf and an object falling. The abstract ideas are still relevant. It's just insufficient. If a human had never seen any of those things, never touched or experienced them, and only had the words describing the color red, they're not really gonna understand it.
[00:34:59] Bilawal Sidhu: It's not grounded in the physical world, as you said previously.
[00:35:02] Rev Lebaredian: Right. And so they're going to take all of these different modes of information and fuse them together to get a more complete understanding of the physical world around us.
[00:35:12] Bilawal Sidhu: Is a good analogy, like, different parts of our brains? Like it's, it seems like these LLMs are really good at reasoning about sort of this, like, symbolic textual world and there's all this debate over how far the video models can go and, like, reproduce the physics of the world.
But it sounds like you just create another primitive that kind of works in concert with these other pieces that is actually grounded in the real world and has seen examples of the physical world and all the edge cases that you talked about. And then that system as a whole is far more capable.
[00:35:41] Rev Lebaredian: Exactly. I think, you know, there is debate over how far you can go with these video models when it comes to the physics of the world. Now even the current, more limited video models we have, they're not trained with just video. They're multimodal. There's lots of information coming from non-video sources. There's text and captions and other things that are in there. And so we can bring in more modes of information, like the state of the world that you have inside a simulator. Inside a simulator, we know the position of every object in 3D space. We know the distance of every pixel. We don't just see things in the world. We can touch it, we can smell it, we can taste it. We have multiple sensory experiences that fuse together to give us a more complete understanding of the world around us. Like right now, I'm sitting in this chair. I can't see behind my head, but I'm pretty sure if I put my hand behind me here, uh, I'm gonna be able to touch the back of the chair.
Uh, that's proprioception.
[00:36:45] Bilawal Sidhu: Totally.
[00:36:46] Rev Lebaredian: I know that 'cause I have a model of what the world is around me, because I've been able to synthesize that through all of my senses, and there's some memory there. We're essentially replicating the same kind of process, the same basic idea, with how we train AIs. The missing piece was this transformer model, this idea that we just throw all kinds of unstructured data at this thing, and it creates this general purpose function that can do all kinds of different things through the understanding of complex patterns.
So we had that, and we need all of the right data to pump into it. And so our belief is that a lot, if not most, of this data is gonna come from simulation, not from what happens to be on the internet.
[00:37:35] Bilawal Sidhu: So interesting. Your point about, you have the state of the world, like you have the, to use nerd speak, the 3D scene graph, and as you mentioned, yeah, like the vectors of all the various objects.
All this stuff that you take for granted in video games could then be thrown into a transformer along with other image data. Maybe decimate it to look like a real sensor, and then suddenly it'll build an understanding, or build, I've heard it described as, like, a universal function approximator, to figure out how to, yeah, emulate all these other senses like proprioception and all these other things. I think there's like 30 or 40. I was kind of surprised to hear that we have so many. And maybe robots could, I mean, they're not even limited by our senses. You alluded to lidar and lasers earlier, right? Or infrared. And so it's like at some point these robots will be, going back to the start of our conversation, superhuman.
[00:38:27] Rev Lebaredian: Yeah. I mean, we, we have animals that are superhuman in this way too, right? Bats can see with sound.
[00:38:33] Bilawal Sidhu: Yeah, eagles have got, like, very focal vision. They can kind of zoom in.
[00:38:38] Rev Lebaredian: Sure. Why won't they be superhuman in certain dimensions of sensing the world and acting within the world? Of course. Uh, they already are in many respects. We have image classifiers that can classify animals, every breed of dog, and plants better than any human can. So.
[00:38:55] Bilawal Sidhu: So true.
[00:38:55] Rev Lebaredian: So we'll certainly do that, at least in certain dimensions.
[00:39:13] Bilawal Sidhu: So let's talk about looking towards the future a little bit here. You talked about how physical AI is transforming factories and warehouses. What's your take on the potential in our everyday lives? Right? Like, how do you see these technologies evolving to bring robots into our homes or personal spaces in really meaningful ways?
This is like as intimate as it possibly can get, right? It's not really a controlled environment either.
[00:39:36] Rev Lebaredian: Mm-Hmm. If you've been watching any of Jensen's keynotes this past year, within the last 10, 12 months or so, there's been a lot of talk of humanoid robots.
[00:39:46] Bilawal Sidhu: Absolutely, yeah.
[00:39:48] Rev Lebaredian: And that's kind of all the rage. You're seeing them everywhere. I imagine for many people, when they see this, they could just kind of roll their eyes like, oh yeah, yeah. Humanoid robots. We've been talking about these forever. Why does it have to look humanoid? Doesn't it make more sense to build specialized robots that are really good at specific tasks in the end?
We've had robots in our most advanced factories for a long time, and they're not humanoids. They're like these large arms in automotive factories. So why are we talking about humanoid robots? The reason why this is coming up now is because, if you take a step back and think about it, if you're gonna build a general purpose robot that can do many different things, the most useful one today is gonna be one that's roughly shaped and behaves and acts like a human, because we built all of these spaces around us-
[00:40:39] Bilawal Sidhu: For humans.
[00:40:39] Rev Lebaredian: For humans. So we built our factories, our warehouses, our hospitals, our kitchens, our doors, retail spaces. There's stairs and ramps and shelves. And so, if we can build a general purpose robot brain, then the most natural kind of physical robot to build, to put that brain in, for it to be useful, would be something that's human-like, because we could then take that robot and plop it into many different environments where it could be productive and do productive things.
And so many companies have realized this and they're going all in on that. We're bullish on it. I think, even within this space, though, there are specializations. Not every humanoid robot is gonna be perfect for every task that a human can do. Actually, not all humans are good at every task. Some humans are better at playing baseball, and some are better at chopping onions.
You know, the, there's-
[00:41:43] Bilawal Sidhu: Astronauts have a certain criteria, right?
[00:41:45] Rev Lebaredian: That's right. So we're gonna have many companies building more specialized kinds of humanoids, or different kinds of robots. The ones that we're immediately focused on are the ones in industry. We think this is where they're gonna be adopted the most, the quickest, and where it's gonna make the most impact.
Everywhere we look globally, including here in the US, there's labor shortages in factories, warehouses, transportation, retail. We don't have enough people to stock shelves, and the demographics are such that that's just gonna get worse and worse. So there's a huge demand for humanoid robots that could go work in some of these spaces.
I think as far as our personal space goes, a robot that can work side by side with a human in a factory or a warehouse should also be able to work inside your kitchen, in your home. How quickly those kinds of humanoid robots are gonna be accepted, whether there'll be a market for it, I think is gonna depend on which country we're talking about, because there's a very cultural element.
Bringing a robot into your home, another entity, some other thing that's human-like into your home, that's very personal and-
[00:42:58] Bilawal Sidhu: God forbid it makes your latte for you.
[00:43:01] Rev Lebaredian: Exactly. I don't wanna do that in my kitchen. I don't even want other humans in there in the morning. But there's cultural elements here. In the US and the West in general, we're probably a bit more cautious or careful about robots. In the East, especially countries like Japan-
[00:43:21] Bilawal Sidhu: Totally. That's where my head is going.
[00:43:21] Rev Lebaredian: They love 'em, right? They want it. But industry everywhere needs it now.
[00:43:28] Bilawal Sidhu: Right, yeah.
[00:43:29] Rev Lebaredian: Uh, and so for industrial applications, I think it makes sense to start there, and then we can take those technologies into the consumer space. The markets will explore where they fit best at first, but eventually we'll have them everywhere.
[00:43:44] Bilawal Sidhu: It's so fascinating to think about how many technologies they're early adopters of, including virtual avatars and things like that. But sort of bridging the virtual and the physical, the technologies you all are building aren't just limited to robots, right? As this tech improves spatial understanding, it could enhance our personal devices, sort of virtual assistants.
How close do you think we are to that sort of, you know, in real life, Jarvis experience? A virtual assistant that can seamlessly understand and interact with our physical environment, even if it's not embodied as a robot.
[00:44:19] Rev Lebaredian: So this gets back to what, what I was saying earlier about the definition of a robot.
[00:44:23] Bilawal Sidhu: Yeah.
[00:44:23] Rev Lebaredian: What is a robot?
[00:44:24] Bilawal Sidhu: Totally.
[00:44:24] Rev Lebaredian: The way you just talked about that, like, to me, Jarvis is actually a robot. It does those three things. It perceives the world around us.
[00:44:32] Bilawal Sidhu: Yep.
[00:44:33] Rev Lebaredian: Through many different sensors, it makes some decisions, and it can even act upon the world. Like Jarvis, inside the Avengers movies?
[00:44:40] Bilawal Sidhu: Yep.
[00:44:40] Rev Lebaredian: It can actually go activate the Ironman suit.
[00:44:43] Bilawal Sidhu: Right, yeah.
[00:44:43] Rev Lebaredian: And do things there, right? Like so, so what is the difference between that and a C3PO?
[00:44:49] Bilawal Sidhu: Totally.
[00:44:49] Rev Lebaredian: Fundamentally.
[00:44:50] Bilawal Sidhu: You're kind of inside a robot, sort of as you alluded to the NVIDIA building too.
[00:44:53] Rev Lebaredian: Yeah, and if you think about some of these XR devices that immerse us into the world, they're half a robot.
There's the perception part of it, there's the sensors along with some intelligence to, to do the perception, but then it's fed into a human brain, and then the human makes some decisions, and then it acts upon the world.
[00:45:14] Bilawal Sidhu: Right.
[00:45:15] Rev Lebaredian: And when we act upon the world, there's maybe some more software, maybe even some AI, doing things inside the simulation of that world, or some combination.
So, um, it's not black or white, what's a robot and what's a human or a human intelligence. There's kind of a spectrum between these things. We can augment humans with artificial intelligence. We're already doing it. Every time you use your phone to ask a question, you go to Google or Perplexity or something, or you ask ChatGPT a question, you're augmenting yourself with AI. And it's that blend of AI with a Jarvis-like experience that's immersive with XR. It's just making that loop faster, that augmentation loop.
[00:46:01] Bilawal Sidhu: You beautifully set up my last question, which is, as AI is becoming infused in not just the digital world, but the physical world, I have to ask you: what can go wrong and what can go right?
[00:46:13] Rev Lebaredian: Well, uh, with any powerful technology, there's always going to be ways things can go wrong, and this is potentially the most powerful of technologies that we have ever seen. So we have to be, I think, very careful and deliberate about how we deploy these technologies to ensure that they're safe. In terms of deploying AIs into the physical world, I think one of the most important things we have to do is ensure that there's always some human in the loop somewhere in the process. That we have the ability to turn it off, that nothing happens without our explicit knowledge of it happening and without our permission. We have, um, a system here. We have sensors all around our building. We can kind of see where people are, which areas get trafficked the most at night.
We have robotic cleaners. They're like huge Roombas that go clean our floors. And we direct them to the areas that people have actually been in, and they don't bother with the areas that haven't been trafficked at all, to optimize them. We're gonna have lots of systems like that. That's a robotic system, essentially a robot controlling other robots.
But we need to make sure that there's humans inside that loop somewhere, deploying that, watching it, and ensuring that we can stop it and pause it and do whatever's necessary. And so the other part of the question was, you know, what are the good things that are gonna come outta this? We touched on a bunch of those things there, but ultimately, being able to apply all of this computing technology and intelligence to things around us in the physical world, I can't even begin to imagine the potential for the increase in productivity.
Just look at something like agriculture. If you have effectively unlimited workers who can do extremely tedious things, like pull out one weed at a time in thousands of acres of fields, go through and just identify where there's a weed or a pest and take 'em out one by one, then maybe we don't need to blanket these areas with pesticides, with all these other techniques that harm the environment around us, that harm humans.
We can... essentially, the primary driver for economic productivity anywhere is the number of people we have in a country. I mean, we measure productivity with GDP, gross domestic product, and we look at GDP per head. That's the measure of efficiency, right? But it always correlates with the number of people; countries that have more people have more GDP. And so when we take physical AIs and apply them to the physical world around us, it's almost like we're adding more to the population, and the productivity growth can increase. And it's even more so because the things that we can have them do are things that humans can't or won't do.
They're just too tedious and boring and awful. You find plenty of examples of this in manufacturing, in warehouses, in agriculture and transportation. Look, we keep talking about transportation being this huge issue right now. Truck drivers, we don't have enough of them out there. This is essentially a bottleneck on productivity for a whole economy.
Soon, we're effectively gonna have an unlimited number of workers who can do those things, and then we can deploy our humans to go do all the things that are fun for us that we like doing.
[00:50:04] Bilawal Sidhu: I love that. It's like we're finally gonna have technology that's fungible and general enough that we can reimagine all these industries and, yeah, let humans do the things that are enriching and fulfilling.
And perhaps even have a world of radical abundance. I know that's a little trendy thing to say, but it feels like when you talk about that, it sounds like a world of radical abundance. You feel that way?
[00:50:26] Rev Lebaredian: I do. I do. I mean, I mean, if you just think about everything I said from first principles, why won't that happen?
If we can manufacture intelligence, and this intelligence can go drive, be embodied in the physical world, and do things inside the physical world for us, why won't we have radical abundance? I mean, that's basically it.
[00:50:51] Bilawal Sidhu: I love it. Thank you so much for joining us, Rev.
[00:50:54] Rev Lebaredian: Thank you for having me. It's always fun talking to you.
[00:50:59] Bilawal Sidhu: Okay. As I wrap up my conversation with Rev, there are a few things that come to mind. Oh my God, NVIDIA has been playing the long game all along. They found just the right wedge, computer gaming, to de-risk a bunch of this fundamental technology that has now come full circle. Companies and even governments all over the world are buying NVIDIA GPUs so they can train their own AI models, creating bigger and bigger computing clusters, effectively turning the CEO, Jensen Huang, into a bit of a kingmaker. But what's particularly poetic is how all the technologies they've invested in are the means by which they're going to have robots roaming the world. We are creating a digital twin of reality, a mirror world, if you will, and it goes far beyond predicting an aspect of reality like the weather.
It's really about creating a full fidelity approximation of reality, where robots can be free to make mistakes and be free from the shackles of wall clock time. I'm also really excited about this because creating this type of synthetic training data has so many benefits for us as the consumer. For instance, training robots in the home.
Do we really want a bunch of data being collected in our most intimate locations inside our houses? Synthetic data provides a very interesting route to train these AI models in a privacy preserving fashion. Of course, I'm left wondering if that gap between simulation and reality can truly be overcome, but what it seems is that gap is gonna continually close further.
Who knew? Everyone was throwing shade on the metaverse when it first hit public consciousness. Like, who really wants this 3D successor to the internet? Now I'm thinking maybe the killer use case for the metaverse isn't for humans at all; really, it's for robots.
The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Gerard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker, and our engineer is Aja Pilar Simpson. Our researcher and fact-checker is Krystian Aparta. Our technical director is Jacob Winik, and our executive producer is Eliza Smith.
And I'm Bilawal Sidhu. Don't forget to rate and comment, and I'll see you in the next one.