How Pokémon Go and augmented reality are transforming how we’ll navigate the world w/ Niantic's Brian McClendon (Transcript)

The TED AI Show
How Pokémon Go and augmented reality are transforming how we’ll navigate the world w/ Niantic's Brian McClendon
January 21, 2025

Please note the following transcript may not exactly match the final audio, as minor edits or adjustments could be made during production.


[00:00:00] Bilawal Sidhu: Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying The TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved, and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.

Remember printing MapQuest directions? Those paper maps seem ancient now that we all have GPS in our pockets, but even today's digital maps have a major limitation: they can't truly understand the three-dimensional world the way humans do. That's where AI comes in. What if we could teach AI to see and understand spaces and places just like we do?

And the solution isn't coming from a self-driving car or satellite imagery company. It's coming from millions of people playing a beloved video game on their smartphones. Very soon, the way we navigate won't just be through our phones, but through a world where digital information is perfectly mapped onto every building, street corner, and landmark that we see, reshaping not just how we navigate, but how we interact with the physical world.

I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.

I've spent years at Google developing next-generation 3D mapping technologies, including Google Maps' Immersive View and the ARCore Geospatial API, which turned the world into a 3D canvas for augmented reality. And these innovations were built on the foundations laid by today's guest, who saw the future coming decades ago.

Brian McClendon co-founded Keyhole, which became Google Earth, and went on to lead the teams that created Google Maps and Street View, tools that transformed how billions of people navigate the world. Now at Niantic, the company behind Pokémon Go and other games that blend digital experiences with the physical world, he's building something even more mind-blowing than that.

And yes, it involves millions of Pokémon Go players. Brian's been consistently ahead of the curve in predicting and building the future of how we interact with our world. So today we're gonna take a deep dive into his vision for the future of maps. Get ready to explore a world where maps are no longer just about getting from point A to point B, but gateways to entirely new realities that connect us more deeply to the world around us.

So the apartment you grew up in, in Lawrence, Kansas, is now the default location for Google Earth. What got you into geospatial technologies and mapping in the first place?

[00:02:56] Brian McClendon: When I was in Lawrence, I started out with the Atari 400 computer, you know, long before video consoles were really a thing. And I programmed that and got excited about CG, and of course video games were very popular in the, uh, you know, early eighties when I started.

So I got my degree focusing on computers and 3D graphics and spent the next 10 years building 3D graphics for Intergraph workstations and then Silicon Graphics.

[00:03:23] Bilawal Sidhu: Let's skip ahead a little bit. It's 2004 and Google is buying the company you co-founded, Keyhole Inc. Talk to me about what excited Google about the tech you were creating there, and what did it turn into?

[00:03:35] Brian McClendon: So when they were looking at us, we had been, uh, out in the public for about three years and we had this Earth Viewer application that ran on PCs. We combined satellite with maps and terrain data to create a new way of visualizing the world. And, you know, Google's mission is to organize the world's information and make it, uh, universally accessible and useful.

And this… product that we had built completely aligned with their mission and their vision. And so when they saw it and sat down and, and started using it, they got extremely excited.

[00:04:07] Bilawal Sidhu: And I've heard the first thing anyone does when they first get their hands on Earth is basically key in their home address and watch the camera zoom down into it.

[00:04:15] Brian McClendon: That's exactly right. And it's really a test of whether the product works, because the way you test something is you go to a place you know, and if it reflects what you know correctly, then you start to explore the rest of the world, because now you're excited that it matches your view of the world.

But if we didn't have high-res imagery of their country, or their suburban or rural town, they were disappointed. And so our goal, you know, at Keyhole was to get as much, uh, imagery as we could afford. But with Google buying us, you know, one of the big arguments for doing so is that they were willing to spend the money to get as much satellite imagery as we could handle.

[00:04:50] Bilawal Sidhu: And of course, that technology turned into Google Earth, and a bunch of it went into Google Maps as well. Can you talk briefly about your time at Google and the engineering efforts that you led there in the Geo team?

[00:05:02] Brian McClendon: So, when we joined Google in 2004, we had the Keyhole product, but we were also sitting next to a, another small acquisition that was working on a maps-based product.

And initially that maps-based product was also PC based, but very quickly they, they redirected to start working on a JavaScript web-based map viewer. And they built Google Maps and worked with us. Um, as part of that, you know, they built this very, uh, fast JavaScript Ajax engine. The first sort of like, you know, client-side JavaScript situation.

And then they pre-rendered every single map tile out to a server. And what that means is, from a speed perspective, Google Maps was faster on day one than anything anybody had seen before, 'cause MapQuest would take 20 seconds to render a tiny little map tile, and Google Maps was able to basically pull it up at the speed of your network, and you could pan, you could zoom, you know, that was very exciting.

Uh, but we then did something even bigger as we added the satellite imagery that we had from Keyhole onto Google Maps very soon after. And immediately the Google Maps users had the same experience that we'd seen at Keyhole and that we would see with Google Earth, which is, can they see their house? For many people, this was their very first introduction to satellite imagery.
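To make the pre-rendered tile idea concrete: the now-standard "slippy map" scheme that Google Maps popularized cuts the Web Mercator projection into a pyramid of 256×256 tiles, so a client only has to work out which tile indices cover the current view and fetch them. A minimal sketch of that index math in Python (the coordinates are just an example, not tied to any particular provider's API):

```python
import math

def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """WGS84 lat/lon -> slippy-map (Web Mercator) tile column and row."""
    n = 2 ** zoom                                   # tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)          # longitude maps linearly to x
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)  # Mercator y
    return x, y

# Roughly city-level zoom over Lawrence, Kansas:
print(latlon_to_tile(38.97, -95.24, 12))            # -> a (column, row) pair like (964, 1565)
```

Because every tile is rendered ahead of time, serving a map view is just a handful of static-file fetches, which is why it felt as fast as the network.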

[00:06:17] Bilawal Sidhu: Yeah, it's like the best of like an abstracted map to get you from point A to point B, but also the best photorealistic rendition of the real world. And I can't like overstate how easy it is for people to take for granted that this exists now, but back then, like you said, people were literally printing out directions from MapQuest to go get around from point A to point B.

Along comes Google Maps and Earth and of course next thing you know, this thing is on the iPhone as well.

[00:06:43] Brian McClendon: Yeah, the mobile phones, um, you know, right when we were acquired, uh, did not have the power. Like, we talked about it, but the, you know, screen real estate and CPU and network were not good enough.

But iPhone came out in 2007, Android in 2008, and suddenly the screen space was there and we finally had network bandwidth and, you know, there was a good enough graphics chip in there that we were able to get, uh, Google Maps running on an iPhone in 2007. Yeah, I think on, on launch, 2007. And, uh, were then able, uh, later to get Google Earth running on both iPhone and Android.

[00:07:20] Bilawal Sidhu: That’s wild. I gotta ask you, what were some of the biggest changes in the world that you've seen as a result of Google Earth and Maps?

[00:07:27] Brian McClendon: I think there has, uh, been a huge change in how people think about visiting a place and exploring a place. In the past you'd read the guides about a place and you would, um, you know, talk to people and get, uh, recommendations.

In many cases, now you can actually go to the place, you can go to street view, you can look at the location, you can see where your hotel's gonna be and see where the beach is and see how far the walk is to the beach. And, uh, I think that preview of being there before you go there has made it easier, uh, for people to travel and to explore the world.

And that's really, I think, you know, one of the goals: we wanna make the world easier and more accessible, both virtually on the computer screen, but also then opening it up so that people actually go out and experience it as well.

[00:08:17] Bilawal Sidhu: I love it. That's the perfect segue into something very close to my heart, having worked at Google Maps, building a next Gen 3D map on the foundation that you created.

So I wanna talk about the shift in how maps are made, for the uninitiated. The way Google or Apple makes maps, and you alluded to this with the expensive satellite imagery, is through these captures of the real world using satellite, aerial, and ground-level sensors. They're these super structured and semi-frequent captures of the world, but now at Niantic, you're trying to build a different kind of map in a different way.

Tell us more about that.

[00:08:49] Brian McClendon: Well to, to bring it, you know, all the way back into history. Before Google, the way that people built maps was that they would literally drive around in a van and take notes and draw pictures, and they would only visit the most popular or important urban locations. And so the map data that Google licensed in 2005 was the best you could get, but it wasn't great.

You know, it was based on government data plus as much work as the companies had put into it, like Navteq and Tele Atlas. But at Google, we realized that the maps were not good enough, you know, and we realized it honestly when we put Street View in, because the first thing people do is, you know, they look at the Street View pictures, they see a photo of an intersection, and then they look at our map data and it's wrong.

And they say, how can this be? The pictures that you gave us are clearly correct. That project, uh, that we started was called Ground Truth, uh, for exactly that reason. We built our own maps. We started with government data, but we had the power, the superpower, of Street View and satellite imagery, and a lot of elbow grease to start building maps.

And we launched the US, Mexico, and Canada in 2009, and continued on for five more years until we had basically mapped all of the larger countries in the world, or the larger-GDP countries in the world, and used, uh, user-generated content to map the rest of them. And that's to a degree why Google Maps is better than many of the other providers, because we were able to use this data to make a better map.

Now the problem with any map is that things change. And so even if you were perfect at a snapshot in time, it immediately starts to fall out of date. You know, my rule of thumb has always sort of been that roads change at 1 to 2% a year, and that local businesses change at 10 to 20% a year, because there's a lot of turnover in local businesses.

And these are, you know, the most important things people search for: where am I going? What restaurant do I go to? Where do I get my, uh, dry cleaning, or whatever it is. You know, this data changes. And so keeping your map up to date is a really big challenge. You need people on the ground, you need new data, and you also need a signal that, you know, things are good or not.

And Google, I think, has done a good job, uh, for the basic data, but you know, at the level that Niantic is approaching it, it's, it's much different. We are very much on the ground collecting imagery at a level of detail finer than even Google has collected. And the challenge there is the closer you look at the world, the more change there is.

So it's even harder to maintain an accurate map if you're trying to get detailed down at the, you know, bench and park and chair scale of the world.
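As a back-of-the-envelope illustration of that staleness, compounding the rule-of-thumb rates Brian quotes shows how quickly even a perfect snapshot drifts; the percentages below are just his rough figures, not measured data:

```python
# Illustrative decay of a "perfect" map snapshot, using the rule-of-thumb rates
# quoted above: roads ~2%/year, local businesses ~15%/year (midpoint of 10-20%).
for years in (1, 3, 5):
    roads_ok = 0.98 ** years
    businesses_ok = 0.85 ** years
    print(f"after {years} year(s): ~{roads_ok:.0%} of roads, "
          f"~{businesses_ok:.0%} of businesses still match the snapshot")
# After 5 years, only about 44% of local businesses would still be correct.
```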

[00:11:33] Bilawal Sidhu: Why do we need that kind of map, and why approach building it in this kind of, you know, crowdsourced manner that you are?

[00:11:40] Brian McClendon: One of the things that, uh, Niantic realized is that to build an accurate localization system, to know where somebody is, you need data far beyond a street map.

You know, if you wanna know exactly where you're standing relative to a statue or a park or even a sidewalk, you need a level of detail that just isn't available in Google Maps today. And so building this much higher-precision localization system, which we called a visual positioning system, required this high-resolution data.

Pokémon Go was launched in 2016. It was the first AR game; augmented reality was enabled with Pokémon Go. You could take pictures of Pokémon in locations, and we got Pokémon Go players and Ingress players to actually actively choose to scan our PokéStops to start building this map for us.

And that data has been put together, and we've created a VPS system with it. And now, when you point your phone somewhere, we know exactly where you're standing.

[00:12:44] Bilawal Sidhu: In case you need a refresher, Pokémon Go is an augmented reality game for your smartphone that took the world by storm in 2016. Overnight, it felt like everyone was wandering around, staring through their smartphone cameras, hunting for virtual Pokémon to catch. Parks, streets, and even parking lots became hotspots for adventure.

Before Pokémon Go, Niantic created Ingress, a game with a more sci-fi edge. Instead of catching creatures, players split into two factions and battled for control of real-world locations, linking them together to claim territory. It is also interesting that you're talking about this as a complementary map to what Google, Apple, and other companies are doing.

It tends to be sort of like the drivable areas of the world and maybe some of the trackable, you know, walkable areas. But there are so many parts of the map, especially, as you mentioned, parks and other places where people do congregate, that are never mapped at that level of detail, and you're able to do exactly that.

So it's almost like the inverse of what the mainstream mapping providers are doing, so you can enable this kind of world-anchored AR experience. And that itself is cool tech, 'cause you're totally right, you talked about the visual positioning system, you know, GPS just is not good enough. Like, if you've got five meters of positional and like 30 degrees of rotational accuracy, the thing that you placed in the virtual world will rarely line up with the thing that needs to actually be there.

But you can do far higher precision with the VPS maps that y'all are building. Is that right?
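To put numbers on that: with GPS-plus-compass alone, the placement error of a world-anchored object is roughly the position error plus the arc swept out by the heading error. A small illustrative model in Python (the 5 m / 30° figures are just the ones mentioned above):

```python
import math

def anchor_offset(distance_m: float, pos_err_m: float, heading_err_deg: float) -> float:
    """Worst-case displacement of a virtual object anchored `distance_m` away,
    given a GPS position error and a compass heading error (toy model)."""
    # Rotating a point at distance d by angle theta moves it by the chord 2*d*sin(theta/2).
    heading_offset = 2.0 * distance_m * math.sin(math.radians(heading_err_deg) / 2.0)
    return pos_err_m + heading_offset

print(round(anchor_offset(10.0, 5.0, 30.0), 1))  # ~10.2 m: an object placed 10 m away lands a street-width off
```

A visual positioning fix, by contrast, can be accurate to well under a metre in position and a few degrees in orientation, which is why the anchored content actually stays put.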

[00:14:15] Brian McClendon: Exactly. And the way we think about it is that the prior methods that you and I both worked on at Google, you know, build the map from the top down, right? We start with satellite imagery, and that inspired us to then use Street View and so forth.

Uh, Niantic is building the map from the bottom up, you know, from the locations where people spend time. And we have this advantage that, you know, we have a pretty curated list now, eight years into Pokémon Go, and actually 10 or 11 years in if you count Ingress, of 20 million PokéStops, 20 million Wayspots as we call them, that are sort of congregation points for people, landmarks for walking, and, you know, are in those areas that you talk about. They're in parks and they move around, but they're not part of the official business or street-sign, uh, locale.

And so those points, you know, play a central role in the game, but also give us the basis for creating this map, where these are the little islands that we will then build out from.

[00:15:14] Bilawal Sidhu: I love that. Yeah. That has to be some very interesting data of just, like, what are the points of interest that are compelling to users on a neighborhood level. Even in my neighborhood, the landmarks here in, like, Sunset Valley, Texas that people care about would be very different than perhaps, you know, the rendition or interpretation from a traditional mapping provider.

But there's something interesting you said, which is, you're talking about a visual positioning system, which I kind of think of as this machine-readable map. A map that a machine can look at: it'll compare your photo with a prior map that exists and figure out, ah, you're exactly located here on the globe.

But we're also seeing a boost in human-readable maps. So not just how maps are captured, but what we can do with them. Can you explain why this shift has been significant to our listeners, you know, who might not be computer graphics nerds like you and I? I'm specifically talking about Gaussian splatting and radiance fields here.

Here's how it works: first, you take a bunch of regular photos of a place from different angles. The system then creates what's essentially a cloud of special 3D points called gaussians. Think of them as these sophisticated light-carrying bubbles. Each bubble knows not just its color value, but also how that color changes when you look at it from different directions.

Much like how a car's paint might shift in the sunlight as you walk around it. What makes this special is that it runs super fast. You can zoom around at like a hundred frames per second, just like a video game, while still looking incredibly realistic. This is especially exciting because it means we're getting closer to easily capturing and sharing perfect 3D replicas of real places that anyone can explore on their phone, computer, or even a VR headset.
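For the graphics-inclined, here is a minimal sketch of what one of those "bubbles" carries, loosely following the parameterization of the original 3D Gaussian splatting paper (position, anisotropic scale, orientation, opacity, and low-order spherical-harmonic color); exact layouts vary by implementation, and a real renderer projects and blends millions of these:

```python
import numpy as np
from dataclasses import dataclass

SH_C0 = 0.28209479   # degree-0 real spherical-harmonic constant
SH_C1 = 0.48860251   # degree-1 constant

@dataclass
class Gaussian:
    mean: np.ndarray       # (3,) centre of the bubble in world space
    scale: np.ndarray      # (3,) per-axis extent of the ellipsoid
    rotation: np.ndarray   # (4,) unit quaternion orienting it
    opacity: float         # how strongly it occludes what is behind it
    sh: np.ndarray         # (4, 3) degree-1 SH coefficients per RGB channel

    def color(self, view_dir: np.ndarray) -> np.ndarray:
        """View-dependent RGB, like car paint shifting as you walk around it."""
        x, y, z = view_dir / np.linalg.norm(view_dir)
        basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
        return np.clip(basis @ self.sh + 0.5, 0.0, 1.0)

# Rendering sorts the gaussians by depth and alpha-blends their 2D projections,
# which is why it reaches video-game frame rates.
```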

[00:17:00] Brian McClendon: You know, for, for a long time, uh, building a visual map of the world, you know, with satellite imagery was a top-down sort of 2D array of pixels kind of situation. But Google and others started collecting aerial imagery with oblique data and started making 3D buildings out of it. 3D reconstruction, you know, allows for a pretty good 3D model of the world.

But if you wanna see the pain, you know, all you have to do is just go look at trees. Trees are this super hard visualization and reconstruction problem. And anytime you zoom in on any of this data that you see from Google or Apple, um, trees are the worst part of it.

[00:17:38] Bilawal Sidhu: Broccoli trees.

[00:17:39] Brian McClendon: Broccoli trees, there's many good reasons for it.

Um, uh, for one, they move between every picture, so they're not the same thing twice. Uh, they grow, uh, the leaves drop, and they have a huge amount of detail 'cause they're effectively fractals themselves. And so reproducing trees is really hard. And what we've discovered, you know, with this paper that came out at SIGGRAPH last year about Gaussian splats, it's a new way of both visualizing and reconstructing 3D data.

And what it does is it retains not just the specific point locations of things, but also the lighting conditions from every angle. So it achieves a visualization realism that is, uh, far beyond what polygonal reconstruction was able to do. And in particular, uh, with the transparency possibilities of Gaussian splats, trees come out really, really well.

If you look at them, they look realistic. You can see through them and they're stable. And the added realism sort of gets us over the uncanny valley that I think many prior 3D reconstructions have had, where they aren't really believable. It doesn't look right. It's kind of

[00:18:49] Bilawal Sidhu: like going from, I don't know, GTA 2 or 3 graphics to suddenly, like, GTA 7. That's the leap that we've had.

[00:18:56] Brian McClendon: Exactly. And you know, urban canyons are an interesting problem 'cause a lot of urban canyons have planted trees down at the street level. Right. And those trees actually block the storefronts and make it hard to, you know, give you a real visual cue about what the place looks like on the ground if you can't reproduce them well.

And so even urban canyons benefit significantly from this new reconstruction.

[00:19:20] Bilawal Sidhu: Totally, and I mean, I remember playing with the previous instantiation of, of Radiance Fields, uh, Neural Radiance Fields a couple years ago, and I was like, I needed this like beefy GPU or I needed to go beg somebody at Google for some TPUs to go process these data sets.

And it took hours. And then we had a chance to meet at Niantic HQ earlier this year, and I was blown away with what y'all are doing with your app Scaniverse, which is basically 3D Gaussian splatting in real time on the phone in your pockets. And now you can bring those things on the map. So tell me a little bit about Scaniverse and your vision there.

[00:19:53] Brian McClendon: Yeah, so, uh, we acquired Scaniverse in its original form in 2021. And you know, it, it is the preeminent 3D reconstruction for, you know, sort of old school photogrammetry and produces very nice models that people use in many different applications. But early in 2024, we added Gaussian splats to the output of Scaniverse.

And to use it, you can use an iPhone or an Android, you don't need LiDAR, and you just move the phone around an object and get it from angles high and low and it's able to reconstruct the object or the scene or the room very, very quickly. And in particular on an iPhone, it can build your Gaussian splat in about a minute.

And there are several reasons why this is good. One, you get quick feedback. So, like a Polaroid, if you don't like it, you can shoot another picture, you know, within a minute, while you're still at the location. So that's a big advantage. Another is privacy. You know, this data doesn't leave your phone unless you choose to send it out.

So you can build your model, you can decide if you like it, you can decide who you share it with, and literally it stays on your device until you upload it. We've added the capability very recently for you to add it to our map, and so now the Scaniverse map allows you to walk through and see all of the other scans that other people have uploaded, including ones that we've built from the, uh, Pokémon Go and Ingress scans as well.

And so this map is our beginning of the next generation of 3D reconstructed maps.

[00:21:22] Bilawal Sidhu: Ah, the seed map for, for this next gen map. Are you excited about the fact that there's sort of like a standalone way for people to capture these type of things now that we have outputs like 3D Gaussian Splatting versus say having it be a part of the Pokémon Go or Ingress experience?

How do you think about that?

[00:21:39] Brian McClendon: Well, I think when we were just building the VPS, I think we struggled with something you mentioned earlier called the invisible map, right? Mm-hmm. It is an invisible map, and it's very important and it provides great value, but there's no way for people to sort of understand, will it work here?

Does it work? You know, how does it work? And the advantage of Gaussian splats is that the same data that we're using to build VPS can also be used to build Gaussian splats, and now we can visualize it: we have data here, you can see it. And yes, you can also localize yourself at this particular location. So you can have either an AR experience or a VR experience from the same location, depending on which way you want to go.

And we've built products that allow you to develop both of those experiences, um, on web or uh, on uh, any device you want.

[00:22:26] Bilawal Sidhu: That's really exciting because yeah, you're totally right. If you're, you know, um, you know, an altruistic user of Pokémon Go and you absolutely love the game and you wanna unlock certain experiences in the part of your city, it's one thing to be sitting there and kind of scanning to create this invisible map as you call it.

It's another thing entirely to walk away with this artifact that's kind of useful in itself. It's like, it's like literally a 3D copy of that place, right? It's like, I've been describing it sort of as like memory capture. You capture a space or place once and then you can reframe it infinitely in post and like you said, even unlock VR experiences.

So it's kind of cool that we've got both halves of the coin now, uh, with this technology in our hands.

[00:23:07] Brian McClendon: Yeah, and we've had many different forms, you know: paintings, then photos, then stereo, then video. This is a new form of quick 3D capture that, uh, I think retains a better and more complete feel of a place than any single photo can alone, because, you know, if you drop into it in a headset, or you look at it and navigate around on your screen, whether you're on desktop or your mobile, you get a much better sense of what the location's like. And so that anybody can collect these and share these and publish these, I think, is a superpower.

[00:23:41] Bilawal Sidhu: And that brings us to a new announcement that y'all made recently, large geospatial models or LGMs. Before we get into that, can you just explain to our listeners like what is spatial understanding and why do computers or even AI systems struggle with this stuff today?

[00:23:57] Brian McClendon: Spatial understanding is, in its most simplistic form, how, uh, objects in three dimensions sit relative to each other.

If you're in a room, how does your chair sit next to your desk? Where you know who's in front, who's in back? These are generic problems that you know, many people in many offices around the world have. You can imagine training a model that understands almost every office configuration and has a pretty good idea of what that means.

Once you go outdoors, you discover that the world is much more complicated and much more variable across geographies. And the way I like to think of it is to use an example, which is, uh, for those of your listeners who have, uh, tried the GeoGuessr game. This is a case where you look at a picture in Street View and you guess where on the map this picture came from.

And what's fascinating is just how different these pictures can be, and how much information about where you really are is contained in a single picture. But the differences are either very obvious or very subtle. And there's a player called GeoRainbolt, or Rainbolt, yes, who's just amazing at this.

And I like to think of him as building a neural net in his brain from studying hundreds of thousands or millions of these pictures, so that he now knows, either consciously or subconsciously, the signals that each of these pictures produces. And what I think we are talking about with a large geospatial model is to reproduce that neural net by feeding it not a million photos, but hundreds of billions of photos.

If we can do that, then maybe this, uh, geospatial model will have enough understanding to localize you, you know, visually position you, 3D reconstruct the parts of the scene that you can't see, because it's seen enough of the front of a church to predict what the back of a church looks like. And so the opportunity is big, but the data set required, and the understanding that a model would have to have, is very, very large.

And so that is in fact what we're working on.
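One published way to picture the "Rainbolt as a neural net" idea is to treat photo geolocation as classification over a grid of cells covering the globe, as Google's PlaNet work did: an image encoder is trained on geotagged photos to predict which cell the shot came from. The sketch below is that generic recipe, not a description of Niantic's actual large geospatial model; the cell count and hyperparameters are made up for illustration:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CELLS = 26_000   # hypothetical number of geographic cells tiling the globe

# Off-the-shelf image encoder whose head answers "which cell was this photo taken in?"
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CELLS)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, cell_ids: torch.Tensor) -> float:
    """One step on a batch of geotagged photos: images [B,3,224,224], cell_ids [B]."""
    logits = model(images)
    loss = loss_fn(logits, cell_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Scaling the same idea from "which city" down to "which street corner, and where exactly am I standing on it" is, roughly, the jump from a geolocation classifier to the kind of model Brian is describing.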

[00:26:00] Bilawal Sidhu: I love that. And it sounds like you alluded to having 10 million scans, you know, uh, sort of these seed locations worldwide. And basically the shift is we have these islands of spatial understanding, these individual maps where you can figure out, you know, exactly where the user is, but now you're working on kind of fusing that together.

I think you gave a really great example of how that fusion works. Can we go a little bit more into, like, why is this the better way to build this type of map than perhaps what other folks are doing?

[00:26:30] Brian McClendon: Well, with a systematic coverage of the world, one of the challenges is that to keep it up to date, you have to revisit all of it all the time.

Or have a really smart model about visiting that which changes all the time. And if, to visit it, you have to send a sensor, a Street View car or, you know, a Waymo vehicle, to go collect the data, or, uh, fly a plane overhead, these are all heavyweight things that are not easily repeatable or don't provide high coverage.

And if we can get to the point where a single photo can give enough information about whether the world has changed, and if so, how, then you now have the opportunity to update and maintain a map of the world from sort of very small inputs, you know, single pictures every now and then. And the rest of the system can detect that, yes, this is different than what was before.

And we can then deduce the other changes around that photo. And so I think this is an opportunity to build a much better, more frequently updated and more accurate map.

[00:27:39] Bilawal Sidhu: It's like you're building a map that is not only resilient to change, because obviously, as you said, the world changes at different clips, but it doesn't actually require you to map every inch of it with these really, really expensive, uh, sensor systems.

I'm kind of curious, like how do you see the product experience for games like Pokémon Go and other stuff that Niantic is working on, evolving as you create this new type of map?

[00:28:03] Brian McClendon: One of the things that we launched just, you know, in the last week, is what we call Pokémon Playgrounds, which is this ability to put Pokémon on the map at a PokéStop or Wayspot location and leave them there precisely, so that the next user can see them or add their own Pokémon into the collection.

And so you actually build up a little collection of Pokémon, allowing for a sort of shared virtual experience. And one of the challenges with, uh, augmenting reality in general is the believability factor. If everybody sees a different world, you can't talk about it like it's actually a world. It's just your vision, and you're hallucinating.

But if we see the same thing at the same place at the same time, then it's a shared experience and you really are augmenting the world. You're not augmenting yourself.

[00:28:54] Bilawal Sidhu: It's like this next level from, like, dropping a pin and sharing it with somebody. You're kind of, like, annotating this 3D map of the world, and then anyone who comes to that location, whether they're there then or, I guess in this case, you know, even if they come after the fact, they can see that exact same annotation the way you left it.

So there's a lot of hard engineering that goes into that. You keep talking about this term localization. Let's unpack that and kind of talk about the old way of localizing and the new way of localizing that y'all are investing in.

[00:29:21] Brian McClendon: Sounds good. So the, uh, we call it Visual Positioning System, and that's, you know, actually a very clear name.

It's visual. What you see is what you get, and how you do that, in the old school, is you collect a bunch of data. You try to build a point cloud of features, of things that are easily visually distinguished, and their position in the world is fixed. And so when you see them, kind of like, uh, a star field, they're all in a particular position that helps you find where you are, because you see these single features in a particular orientation.

So that's the way it's always worked in the past. Um, and what that is, is a point-cloud map of the world. With our map-free ACE Zero implementation, though, we did something different. Uh, for each scene we train a neural net model with the video scans that our Pokémon Go and Ingress players have provided, and we build up a neural net that has the same capability, but it encodes the space into this network.

And now when we send a picture into this network, much like you would upload a photo into a large language model right now, it goes through and it can tell you exactly where you're standing. And it can do so more accurately than our prior algorithms that just use this visual point cloud. And so we call it ACE Zero, and it's taught us a lot about how to take those video scans and turn them into a reasonably sized, uh, neural net that encodes all of this information about a location.
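A rough sketch of the contrast Brian is drawing. The classical route extracts features from the query photo, matches them against a stored point cloud, and hands the resulting 2D-3D correspondences to a PnP-with-RANSAC solver. In the scene-coordinate-regression style that work like ACE Zero belongs to, a small per-location network predicts a 3D scene coordinate for every image patch directly, and the same robust PnP step turns those predictions into a camera pose. The tiny network below is a stand-in for illustration, not the real model:

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

class SceneCoordNet(nn.Module):
    """Toy stand-in for a per-location network: image -> dense 3D scene coordinates."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=8, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 3, 1),   # 3 output channels = (x, y, z) in the scene frame
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)      # [B, 3, H/8, W/8]

def localize(image: torch.Tensor, net: SceneCoordNet, K: np.ndarray):
    """Estimate a 6-DoF camera pose for one [3,H,W] image against a learned scene."""
    with torch.no_grad():
        coords = net(image[None])[0].permute(1, 2, 0).numpy()     # [h, w, 3]
    h, w, _ = coords.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Pixel centres corresponding to each prediction (stride 8 in this toy network).
    pix = np.stack([(xs + 0.5) * 8, (ys + 0.5) * 8], -1).reshape(-1, 2).astype(np.float64)
    obj = coords.reshape(-1, 3).astype(np.float64)
    # RANSAC-PnP discards bad predictions (fallen leaves, moved benches, passers-by).
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, pix, K, None, reprojectionError=8.0)
    return ok, rvec, tvec   # rotation (Rodrigues vector) and translation of the camera
```

The appeal is that the "map" becomes the network's weights rather than a large explicit point cloud, which is also what makes it practical to ship per-location.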

[00:30:56] Bilawal Sidhu: So that's kind of wild. So instead of having an image come in, extracting some features, trying to match that against this 3D model that you've, you know, created offline, you kind of just provide this image to a neural network and you get back, like, yo, this is where you're located. And, like, going back to the GeoGuessr example, maybe the neural net will be far more resilient at sort of figuring out how best to localize the user, or figure out where they are in 3D space, without just relying on these static features that don't change over time.

[00:31:25] Brian McClendon: Exactly. It seems to be more stable. You know, as we said, the world changes all the time. I'll say that, you know, going back to our tree problem with localization, leaves falling from trees and sitting on the cement become, you know, visual features that actually make our point cloud solution not work as well.

But the, uh, neural net is more robust to this. That's an example of how to overcome change and find the core static true, solid ground to localize against. And I think that's helped with our accuracy over time.

[00:31:58] Bilawal Sidhu: That's a great point. Yeah. Trees, if you extract features from trees, they definitely change with the seasons and they may not be the most resilient anchors, if you will, to, to localize against.

But you have this idea of sort of an AI figuring out how to do what the best GeoGuessr players in the world do, 'cause I've seen some of those videos too, and you're totally right. It's almost like this person's a human VPS, like it's, it's kind of wild. And I can only imagine what'll happen when you start taking these large data sets of scans and start putting them together to see if you could create something that is better than the best GeoGuessr players.

[00:32:31] Brian McClendon: I mean, that is certainly a bar we would like to achieve at some point. And it's very good that we have, you know, somebody, and actually a whole game and a whole set of competitors, who are all in the space, and they all use different techniques. You know, each of their neural nets is tuned a bit differently, and some are good at some things and others are good at others. But I think it's educational to watch how they play and how they think about it, 'cause, you know, if you've watched Rainbolt, he'll talk through, you know, some of the signals that he sees and why he makes some of the decisions that he does, but some he can't explain. Yeah. You know, his brain just goes there, and that's because, you know, his neural net is baked in pretty good too.

[00:33:12] Bilawal Sidhu: You know, related to that, one question I know a lot of people have in their minds is, like, when y'all were building Ingress and Pokémon Go, how much of the gaming and product experience was designed with the idea in mind that it's, like, equally fun to play, but also conducive to kind of building this sort of map of the world?

[00:33:30] Brian McClendon: I would say the gameplay and game design was focused almost entirely on getting people to explore the world together. That's Niantic's mission statement. And so I think the focus on location was really about how to get people outside, how to exercise, and how to, uh, play games together. And I would say that the games themselves were not designed to build this map.

The map became a follow-on side effect of making the games better. And once we started to want to know exactly where somebody was, to decide whether they could spin a PokéStop or not, we realized that, you know, figuring out where you are in a GPS-denied environment or a GPS urban canyon is really hard. And so if there was another, better way to solve it, you know, could we create that?

And that's probably the genesis of the VPS at Niantic.

[00:34:23] Bilawal Sidhu: It's almost like a means to an end versus an end in itself. And yeah, you just alluded to GPS-denied environments. That's another great example where you need visual positioning, because, yeah, your GPS signal bounces around when you're, like, you know, surrounded by tall metallic buildings that are, you know, reflecting that GPS signal as it gets to you.

And I think everyone can relate to sort of walking down the street in one direction and realizing they're actually going in the opposite direction that they intended. Obviously that problem doesn't exist when you're using visual positioning.

[00:34:54] Brian McClendon: Exactly.

[00:34:55] Bilawal Sidhu: So in the interest of it sort of being a means to an end and not the end in itself, let's talk about why is this such a game changer?

Like what are the sorts of possibilities this opens up for, let's say, augmented and virtual reality. It feels like we're already seeing the instantiation of the next computing platform and these devices are getting more and more real. What can we do once machines have mastered spatial understanding?

[00:35:18] Brian McClendon: I mean, I think if you look at the, uh, focus of large language models right now, a lot of them are around providing assistance: you explain your problem to them and they give you advice. And in some companies' views of the world, the goal would be that you can ask them a question and they'll use the context of the question, but they'll also use all the other context that they know about you and any other context that they can get their hands on.

And I think one of the important bits of context that, you know, even a camera by itself doesn't have is: where am I, exactly, and what is around me? Now a camera can see what I can see here, where it's pointed, but it doesn't know the rest of the story. It doesn't know what's behind me, it doesn't know what's behind that wall.

And there's a lot more context that can be provided to an assistant, who can, uh, include that information in their advice. And so I think contextual advice is one of the big applications: building a view into the places you can't see, for short-term navigation, for answering questions about facilities or safety.

These things are derivable or knowable from a larger model that can recognize sort of systemic examples of the problem, 'cause across all of humanity, you know, streets, street corners, and sidewalks are similar in many places in the world. And there's a pretty good guess that if there's a sidewalk here, the sidewalk will continue.

What can you do with that information? How can you visualize it? How can you tell users about it? And I think these models will be able to answer these questions without having continuous input, without, you know, being forced to have video on all the time.

[00:37:04] Bilawal Sidhu: What you're saying is really interesting, right?

Because you're totally right. When these large language models operate, they kind of have, they're pulling on, as you said, world knowledge that they've, you know, seen on at least public content on the internet. Alright, so this is pretty cool. Basically, this technology gives you the ability to search what you see, but also what you can't see.

You can start asking all sorts of amazing questions like, hey, which of these hotel rooms is gonna have an ocean view or a city view? Hey, how much sunlight is this room gonna get? Or pull up the reviews for this restaurant that you're looking at. And of course, given Niantic's own focus, you can literally re-skin the world for gaming applications.

The sky is truly the limit. But what's also cool is how these models can work in concert with large language models. You are trying to do the same thing for the real world. Right. And I'm kind of curious, how do you think about these large geospatial models working in concert with these large language models?

You mentioned, for example, not having the camera on the whole time. I mean, this gets into sort of, like, all the privacy questions people have about glasses, right? Like, do I really want a camera on my face? Is it like LiDAR, so you can't see exactly what it is, but you can see the structure? But it seems like there could be this sort of holy matrimony between what you're building

And what other companies are building, especially given that they can understand visual inputs and even audio inputs now.

[00:38:24] Brian McClendon: Yeah, I think one of the things that we're going to see: you know, large language models, these large geospatial models, the image generation we talk about, all of these at the moment tend to be cloud-based, right?

They tend to be big models living in the cloud. They work really well. OpenAI is a fine company to provide you with, you know, a service, but it means you're sending your data to them. And, you know, I, I do think that there is a privacy issue that is going to get resolved by having these models get small enough that they run on your device and most of what goes into them stays on the device.

A highly trained model tuned to, let's say, your language, your geography, your place, could be much smaller, because we know that you're in Kansas City, or we know that you speak English, or we know that the visuals in question are gonna be sports-based because you walked into a football stadium and you queried the, you know, football-digest version of the model.

So, you know, I think there's gonna be a sort of tricky balance between what is on device and what is in the cloud, but piecing together these small models so that you can bring them onto your phone means you can answer questions without having those answers go back up to the cloud.

[00:39:38] Bilawal Sidhu: That intuitively makes sense, right?

Like if you think about, I don't know, like a taxi driver in New York kind of has a map of New York in their head. They're not constantly needing to reference Google Maps. Or one of the other analogies I use for visual positioning is like, like Shazam, like so Android's version of Shazam can figure out what song is playing without needing to send that audio up to the cloud.

Your device just knows the signature of all these various songs and you could just do that locally and then, like you said, for certain experiences, when you need to send that up, you can, or other insights that are derived. Um, it really does feel like we're seeing right now kind of a lot of the magic happening in the cloud, I guess.

'Cause it's, like, easy to manage, build, and serve. But yeah, as these things get out into the wild, why do I need to send, you know, a photo up or a video feed up just to figure out where I am? Um, I'm excited about that. How far out do you think that is?

[00:40:29] Brian McClendon: I mean, for specific problems, it already exists.

We've seen, um, like in the large language model world, we've seen 70B models train 3B models, which train a 1B model, yeah, that does exactly this one task really well. And a 1B model can sit on your phone and be performant and not even burn a lot of power. And I think the same will be true of, um, these other models over time.

And, you know, they can be sliced and diced in many different ways. Like, they can be task-customized, they can be geographically customized, uh, like I said, they could be language-customized. So, uh, once you know the subset of the problem that you're really trying to solve, the model can be downloaded.

And after that, everything is on device. And that, you know, from a privacy perspective, is something I'm very excited about.
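For readers who want to see what "a 70B model training a 1B model" looks like mechanically, the usual recipe is knowledge distillation: the small student is trained to match the big teacher's softened output distribution as well as the ground-truth labels. A minimal, generic sketch (temperature and mixing weight are arbitrary choices, and this is not any particular vendor's pipeline):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """Blend soft targets from the (frozen) teacher with the ordinary hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so its gradient magnitude matches the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# In a training loop: run the frozen teacher and the small student on the same batch,
# minimise distillation_loss for the student, then ship only the student on-device.
```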

[00:41:16] Bilawal Sidhu: That's magical, right? Yeah. You've got your, like, uh, I've been playing around with the new Apple Intelligence. It's got a bunch of these like rewriting models that are running all on device.

Maybe you have some future instantiation of like a distilled, large geospatial model that like knows its way around the city. And so I just point my camera and get an answer about like, what I'm looking at. And then to your earlier point about like kind of x-ray vision, like what's even like behind the buildings, all without the data leaving the device.

That's freaking cool.

Now before we go, talking about the future, I did wanna get your take on some other approaches to crowdsourced mapping, right? Like, uh, the two companies that come to mind for me are Hivemapper and even Meta's Mapillary, which was an acquisition, and they're kind of focusing more on, like, sort of dash cams that you pop on,

uh, you know, in your vehicle as you're driving around. You've got a bunch of ride-sharing vehicles, fleet telematics companies, companies like FedEx, delivery drivers that have got all these cameras. Um, what do you think about some of these other approaches to crowdsourced mapping?

[00:42:32] Brian McClendon: I think that, I mean, obviously they're collecting a lot of great photos of the world.

The struggle with Mapillary and with Hivemapper is really that the pose on these photos is not good enough to do Gaussian splats, for example. They're just not. And I think, you know, we're much more interested in a reasonable-frame-rate video, uh, you know, with the camera changing orientation, and being able to track the IMU of the camera at the time.

[00:43:01] Bilawal Sidhu: Yeah, I think that makes sense, right? Like, uh, Mapillary, maybe that one photo every 10 meters or whatever is enough to figure out, oh, the speed sign changed, it's now 35 miles an hour instead of whatever it was before. But it isn't enough to create this, like, 3D rendition of the world. Um, you just don't have enough views of it.

What are the incentives for creating this map, right? Like, do you have any guesses on like how many people it would take to sort of map the world in this sort of like, decentralized crowdsource fashion and why? What's in it for them?

[00:43:30] Brian McClendon: In the, uh, in the early days, it's the inverse of the Google Earth problem.

So with Google Earth, you would zoom in and you would discover whether your house was in high-res or not, and you'd either be happy or sad. Our answer is you can put your location, your neighborhood onto the map, you can solve this problem yourself. And what we found is people are really proud of their neighborhood, of their, of their city, of their landmarks.

And so being able to have a high-quality, uh, representation of their neighborhood is, I think, a strong motivator. Not for everybody, but for, uh, you know, enough people that I think we can, uh, put a pretty good dent in this problem.

[00:44:13] Bilawal Sidhu: Yeah, it makes sense, like where the users are, they can kind of create this map.

And I think that brings us nicely to the fact that like you're using this new kind of 3D map of the world, not just for, you know, your own first party experiences like Pokémon Go, but it's, it's a platform that other developers can then build on, right? And so if they want to unlock this type of augmented reality experience, wherever they may be, they've got means to sort of put things on the map and then start building those experiences without needing to go through, you know, like the, the mainstream mapping companies to have them go map those places for them.

[00:44:46] Brian McClendon: That's exactly right. I mean, one of the things we provide with our data is APIs. We have a Unity development kit called ARDK that allows you to bring this kind of data, for VPS, um, into Unity. Uh, but we also have this new Niantic Studio, which is a low-code, no-code way to author experiences, initially for the web, but in augmented reality and virtual reality, where you get to pick all of the locations from the million locations we've already mapped.

But if we don't have that location, you can take Scaniverse out and go map your location or your 10 locations for your game or your experience, and build up a great experience around it. And I, I am really proud of the Niantic Studio experience and, and how easy it is to use.

[00:45:32] Bilawal Sidhu: That's really exciting, right?

'Cause you're right, like there's suddenly a barrier the moment you bring up Unity or whatever game engine you're using; you suddenly need development experience. But to kind of, like, literally be able to capture the world and then turn it into a canvas for creativity, and do it in this, like, no-code fashion.

That's really, really exciting. Um, but there are also, like, non-entertainment use cases here, right? And of course y'all are focused on the Niantic Spatial Platform, and I saw a bunch of use cases there, like spatial planning, warehouse logistics, audience engagement, remote collaboration. What are you most excited about?

[00:46:06] Brian McClendon: I think the thing I really am impressed by is this idea of a real-time shared AR/VR experience. So let's say you send an operator out to a site that's got a device that's got a problem. That operator can scan that device and build a 3D visual map of it and upload it to the cloud and immediately show it to somebody who's back at their desk or back on a VR headset.

And that user can then see the VR user, and the VR user can see the AR user, and they can talk about exactly the same thing. One is there virtually through VR, and the other is onsite. You know, one of the things that I've seen just in general is that the level of knowledge that, you know, is needed to solve some problems is very high, but the ability to get out into the field is also a lot of work.

And if you have to do both, your coverage is gonna be much less. And so this idea of having multiple data collectors and fixers with the manuals open and everything, I think, is gonna change how a lot of repairs are done and a lot of products are built.

[00:47:14] Bilawal Sidhu: I love that. It's like, I find myself, uh, always going back to this, like, Peter Thiel quote, that the world of bits is easier than the world of atoms.

And you kind of have this technology that, in a sense, is connecting bits and atoms. Like, a field service expert going to, I don't know, frigging fix a power issue or a 5G tower or something, and needing to get that one expert that's sitting in some part of the world to weigh in on it, like as if they're actually there.

Sounds really exciting and it's, it's cool 'cause you can immediately see why that same capability to fricking leave that like Pokémon, you know, on that sidewalk for somebody else to discover is exactly the same technology that enables this far more utilitarian and useful use case.

[00:47:56] Brian McClendon: That's right. And, and we're, we're very excited.

We've got a, uh, several partner customers building experiences to solve the field service style problem right now. Um, but you're right, it's, it's this ability to take consumer capabilities and, you know, build out enterprise products. Something that we learned way back in the day at Keyhole. Right. You know, when we started Keyhole in 2001, that was the end of the dot com era.

And, you know, we thought we'd get millions of users and monetize later. Well, uh, the year 2002 came around and we had to go to enterprise. And so what could we do with this satellite imagery product? Well, we built enterprise services and, you know, targeted certain verticals that really wanted this capability and were willing to pay for it.

And so that was how Keyhole survived sort of the dark days of 2002 and 2003. Um, but by 2004 we were doing pretty well when Google bought us.

[00:48:50] Bilawal Sidhu: And now it does feel like there's a bit of this chasm with, like, these hardware devices, right? Like, I was at the Snap Summit playing with the Snap glasses. Of course y'all have a partnership there.

Y'all pretty much have a partnership with all the AR glasses creators, and it's like, yeah, these are still kind of dev kits. They're still not quite there yet; the main experience is on the phone. But we can see it's just a couple years out. And I can imagine an enterprise, you know, shelling out for, like, a $1,500 pair of glasses; the ROI is, like, immediately

[00:49:19] Brian McClendon: clear.

Exactly. And I think, you know, you're gonna see there will be interesting applications for mixed reality devices like the AVP and Quest, because they can do the equivalent of AR, except it's MR. Um, but many of the same applications work. And now, obviously the headset on the AVP is pretty big, but it's beautiful, and for a subset of problems it might be useful today. Uh, the Quest 3 is a lot cheaper and a bit easier to wear, and allows for many of those same MR experiences with the same Cogo localization that we were just talking about.

[00:49:54] Bilawal Sidhu: I love that. Yeah. It's like, uh, I've got the AVP, the Apple Vision Pro, myself, and it is sort of this glimpse into what amazing AR glasses will enable. I guess in a sense it's like, you know, without waiting for, I don't know, the likes of Meta to take that $10,000 pair of Orion glasses that they showed and make that a mass-market product.

You can kind of just put cameras at the front of a VR headset and pass through reality, in a sense, and still build out these experiences that'll transfer over beautifully. But at the same time, we also have, like, the Meta Ray-Ban glasses and these lighter-weight form factors. Are you excited about bringing sort of, like, geospatial intelligence, especially when you combine it with, you know, we talked about large language models, to these lighter-weight form factors that are more like a microphone and a camera on your head and maybe a really small display, but sometimes not even that?

[00:50:45] Brian McClendon: There's a set of capabilities you can add around localization, like, where are you. You know, these Ray-Bans are going to be an interface that Meta is obviously hooking up to their AI and gen AI interfaces. They need context for input, and the camera can provide some of that context. And the ability to turn a camera photo into additional context will help with the assistant model the Meta Ray-Bans are working on.

But I think that, you know, once you start adding a display, it gets better. But I do agree, the Snap Spectacles are actually really impressive in the sense that, you know, Evan (Spiegel) has a vision that he will build these things. And I think, more than almost anyone else, he's really focused on the consumer use case for AR.

And so I think we were surprised at how good this version of Spectacles is. It's still not a consumer device, it is a dev kit plus-plus, but it points at a good future. The Orions seem awesome, so we're very happy that Meta's in the game too, and obviously they're investing a lot in this. But I think the interim step of MR is more interesting in the enterprise, because consumers are never gonna wear a Quest headset outside.

But the enterprise user may well feel that wearing one of these makes them a better operator, better technologist, whatever it is, and they're willing to do it to do their job better. And so I think we will see MR use cases in the enterprise before, you know, AR takes over on the consumer side.

[00:52:16] Bilawal Sidhu: That makes total sense. Yeah. Even despite Apple's best efforts to sort of make the Apple Vision Pro cool and T-Pain walking around wearing those things like yeah. You know, like I stopped seeing those at malls very quickly. It became like a bit of a trope. But yeah, you're right. Like it is the closest thing we have to that North Star experience and it'll be very, very exciting.

Fast-forwarding a little bit there: how do you see all these advancements affecting, you know, even how cities are designed and how public spaces are used in the future?

[00:52:48] Brian McClendon: I think that, you know, one of the things I've always wondered about is like signage. Hmm. And, you know, signage is, is both good and bad.

I mean, it's good in the sense that it makes it easier to understand where you are. But if it's not in your language, that's a problem; you don't know exactly what the sign means. Signage is one of those things where, in a world where everybody had AR glasses, you wouldn't need to label anything, because the labels would all be augmentations that everybody gets to see in their own language, at a density that is relevant to them, and that's pretty exciting.

But that's definitely only going to happen when everybody has the glasses, because at the end of the day, if you're deviceless, you still need to figure out where you're going. And I've looked at cities, I regularly look at cities when I go to visit, and it's very interesting how dense signage is in some cities versus others.

Tokyo being one that is unbelievably dense. And the problem there is that at least half of it I don't understand. Thankfully half of it's in English, which is very useful, but the important stuff is often only in Japanese. And I will learn kanji and katakana, but not very quickly.
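As a concrete illustration of the per-viewer signage Brian describes, here is a minimal sketch in Python, assuming labels are anchored in a shared 3D map and carry a source language plus a relevance score. The AnchoredLabel class, the translate() placeholder, and the relevance values are all hypothetical, not any real Niantic or Meta API.

    from dataclasses import dataclass

    @dataclass
    class AnchoredLabel:
        text: str
        lang: str
        position: tuple   # world-anchored 3D position from a shared map (hypothetical)
        relevance: float  # higher = more relevant to this viewer (hypothetical score)

    def translate(text, source_lang, target_lang):
        # Placeholder: a real system would call an on-device or cloud translator.
        return text if source_lang == target_lang else "[%s] %s" % (target_lang, text)

    def labels_to_render(labels, viewer_lang, max_labels=5):
        # Keep only the most relevant labels, then localize them for this viewer.
        top = sorted(labels, key=lambda l: l.relevance, reverse=True)[:max_labels]
        return [AnchoredLabel(translate(l.text, l.lang, viewer_lang),
                              viewer_lang, l.position, l.relevance) for l in top]

    signs = [
        AnchoredLabel("出口", "ja", (1.0, 2.0, 0.0), 0.9),        # "Exit"
        AnchoredLabel("Ramen Shop", "en", (3.0, 0.5, 0.0), 0.4),
    ]
    for label in labels_to_render(signs, viewer_lang="en"):
        print(label.text, "@", label.position)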

[00:54:02] Bilawal Sidhu: Yeah. I mean, Japan also seems to be the final boss of 3D mapping. There's just so much stacking and nesting going on in 3D, even 4D, that building a model to encapsulate some of the denser Japanese cities feels like it's going to be the last thing, and maybe the highest standard, for 3D mapping.

[00:54:23] Brian McClendon: Well, it definitely was for Google, because I think it was one of the last countries that Google actually launched their own map data for. The existing in-country supplier, Zenrin, was really good at what they did, but they did it by employing hundreds of thousands of people to go collect that data.

Because Japan took its maps very seriously, it was very hard to keep up with that. But eventually Google got its data good enough that it went beyond that capability.

[00:54:51] Bilawal Sidhu: One other thing I want to hit on: you talk about the central mission statement of Niantic, and also about this utilitarian version of an AR and VR experience, where you use VR to preview a place you want to go, like Immersive View in Maps, or the experiences you mentioned where you can remotely see what a field service expert is doing.

But it seems you and Niantic as a company are far more bullish on AR, and that's something y'all share with Snap. Evan has been very, shall we say, blunt about his take on VR. I'm curious, why are you so much more bullish on AR? Do you think there's a world in which VR will be just as compelling?

Because people have consoles, people have desktops that we use, and those aren't necessarily experiences that are always anchored in the real world.

[00:55:40] Brian McClendon: I think VR is a fine experience. My PC and, as you say, game consoles are all great examples of how VR will continue to be consumed, because even without the headset, virtual reality through my desktop window here is pretty good, especially with Microsoft Flight Simulator 2024 coming out.

The reason AR is more interesting is that we've already proven we have 3 to 5 billion of these smartphone devices, and the number of minutes we spend on our phones, not in a VR experience, but in a data experience, in a video experience, in a text experience, is huge.

And AR glasses are a better screen for that experience to happen on. They're more convenient, they free up your hands at least to some degree, and they still allow you to look up at the rest of the world. And I believe these AR glasses are going to replace phone screens.

And so that's why we're bullish.

[00:56:47] Bilawal Sidhu: That makes a ton of sense. Yeah, my phone is definitely my primary computing device, and I think it is for everyone else too. There are still contexts where you want more: we're probably both staring at multi-monitor setups right now, the setups we've got to lock down and get some work done.

But when you're out and about, it's the phone, and it's so weird that we have this slab of glass we have to keep looking at. I just can't wait for that to change. You go to any of these concerts. I'm in Austin, Texas, and there was this before-and-after of ACL, Austin City Limits. The skyline certainly changed, but the other thing that changed in the content is that back then not everyone had their fricking phones up; people were completely lost in the experience. That is something we could use: technology that connects us more to the world around us.

[00:57:32] Brian McClendon: Exactly. I think phones get in the way of that. They cause you to look down and not at the world. They cause you to hold the phone up and take a picture, both of which glasses could replace. And those are both not healthy experiences for us.

[00:57:46] Bilawal Sidhu: So as we wrap up here, when you think about a 3D map that is ubiquitous and that we need for all of these devices, whether that's glasses or phones or food delivery robots on sidewalks delivering stuff.

It feels like we're moving to a more connected world, yet we're going to have these overlapping maps of reality, kind of like we do today, where there's a handful of maps of the world. Is that the future ahead for this new kind of 3D map, or do you foresee some sort of consolidation?

Because when I think about GPS, there are a couple of different satellite constellations you use in different parts of the world, but it's largely this public good that anyone can use, right? Maybe the public sector subsidized it. How do you see it playing out for this next generation 3D map?

[00:58:32] Brian McClendon: I think in the short term there'll be fragmentation, because there'll be subsets of the problem solved by different maps for different reasons. Waymo has an excellent map of Phoenix, San Francisco, and Austin, and they use that very specifically so they can drive safely around the city without a driver.

So their map is their solution. I think the maps we're talking about will be applied in certain areas, for localization with VPS like we're doing right now, but more generally to provide this context. For a while there will be fragmentation as the market finds itself, and then there will be a race to quality and completeness: who's the most accurate, the most up to date, and who provides it the most effectively.

And I think, like Google in 2008 and 2009, there became a winner when one became better than the rest and stayed that way for several years. I know that frustrated Apple quite a bit, but I think that will eventually happen here too. In the interim, there will be several providers trying to solve this problem, and they'll each solve it in slightly different ways.

They'll share as much data as possible. We already have mapping data consortiums like Overture that are now close to open sourcing important parts of the mapping data. But I think this next set is not going to quickly go through the open source world. It's a harder problem, because it isn't trivially solvable, and it's not easily copyable either, given the amount of data involved.
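For readers wondering what the "localization with VPS" Brian mentioned looks like mechanically, here is a minimal sketch of the general technique, not Niantic's actual pipeline. It assumes you already have matches between features detected in a camera frame and 3D points stored in a prebuilt map (the matching step is omitted and all the numbers below are placeholders); with those correspondences, the device's pose can be recovered with a standard PnP solver such as OpenCV's solvePnPRansac.

    import numpy as np
    import cv2

    # Placeholder correspondences: 3D points from a prebuilt map matched to
    # 2D detections in the current camera frame. A real VPS would obtain these
    # from feature extraction and matching against the stored map.
    map_points_3d = np.random.rand(50, 3).astype(np.float32)             # meters, map frame
    image_points_2d = (np.random.rand(50, 2) * 1000).astype(np.float32)  # pixels

    # Camera intrinsics (placeholder focal length and principal point).
    K = np.array([[1000.0,    0.0, 640.0],
                  [   0.0, 1000.0, 360.0],
                  [   0.0,    0.0,   1.0]])
    dist = np.zeros(5)  # assume an already-undistorted image

    # Robustly estimate the camera's 6-DoF pose relative to the map.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points_3d, image_points_2d, K, dist, reprojectionError=3.0)

    if ok:
        R, _ = cv2.Rodrigues(rvec)      # rotation from map frame to camera frame
        camera_position = -R.T @ tvec   # camera center expressed in the map frame
        print("Device position in the map:", camera_position.ravel())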

[01:00:20] Bilawal Sidhu: Yeah, this is where what you're building, and what all these other companies including Google are building, is different from the large language model problem, because everyone's kind of just scraping the open web, and that's a thing you can do easily. Turns out scraping the physical world is a lot harder.

Again, back to the world of atoms being way harder than the world of bits. Exactly. All right, so last question. At Google, one of the things I know is that you predicted DSLR cameras would be used to create 3D models of the world. While smartphones ended up taking the lead, you seemed to know what was going to happen before most did.

What are some of your current predictions for the next big technological shift?

[01:01:01] Brian McClendon: I think a lot of capabilities are going to go on device. Phone memory, phone processing power, and battery are capable of solving a subset of what we currently use the cloud to solve. And I do think privacy is going to be a big issue that will cause that to happen.

I think one of the challenges we have with human knowledge is being highlighted by gen AI and large language models right now, which is that the best language models we have are built on the best data they could scrape or collect or combine. It isn't perfect, and a model sometimes hallucinates because there are either holes in the data or it gets confused and gets its wires crossed.

So how do we get to a point where, instead of oscillating around the inaccuracies of the data, we start to focus our results and our answers on the correct, self-checked answer? I think building systems that can do that cross-checking effectively, that can fact check misinformation, whether it's geographic, visual, or text on the web.

That's going to be very critical. I think the language models at the moment are suffering from both sides of it, but there could be a path to applying these models to detect and flag incorrect information. If they can do that enough, we can start to build up a better data set. I think Wikipedia is the embodiment of this in some sense: people are editing this model of the world, and there's enough editing and enough process that, for the most part, most of the time Wikipedia is correct.

If somebody had said this 20 years ago, we wouldn't have believed it. But humans have self-corrected Wikipedia into, in a sense, the best source of truth we have. And I think we're going to need to do that far beyond the level of Wikipedia, whether it's scientific information or political information or geographic information.

I think we're going to need to build tools to self-correct the mistakes in the world.

[01:03:14] Bilawal Sidhu: I love that. Certainly you have an advantage on the geospatial side, where in most cases ground truth is actually easier to find than in a bunch of these other conversations.

[01:03:26] Brian McClendon: This is true.

[01:03:27] Bilawal Sidhu: Brian, thank you so much for joining us.

[01:03:28] Brian McClendon: It's been great to be here, Bilawal. I'm very happy with our shared experience; between the two of us we've had such a long history with the creation of map data at Google, and I really appreciate the conversations we have together.

[01:03:43] Bilawal Sidhu: All right, so let me tell you, that conversation with Brian really got my gears turning.

It's wild to see how Niantic flipped the script on mapping. Google, Apple, Microsoft, they all build maps from the top down: satellites, planes, cars, you know the drill. Niantic? They're like, hold my Pokéball. They're building from the bottom up, turning millions of Pokémon Go players into a global mapping party.

Talk about harnessing the power of games and motivated communities for real world impact. This isn't just changing how maps are made, it's changing what maps can become. We're moving from static snapshots that are updated yearly towards this dynamic, near real-time understanding of our world. And these aren't just maps for humans to navigate to their coffee shop.

They're maps for machines to understand where they are in 3D space, whether that's your phone, AR glasses, or even autonomous vehicles. What's particularly mind blowing is how Niantic is bringing cutting edge tech like Gaussian splatting to the phone in your pocket. Suddenly, anyone can create photorealistic 3D captures of any space or object they care about.
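For the curious, here is a toy sketch of what a Gaussian splat is and how a renderer composites overlapping splats for a single pixel. Real systems store full anisotropic covariances and view-dependent color and rasterize millions of splats on the GPU, so treat this purely as an illustration of the idea, with made-up values throughout.

    import numpy as np

    # Each splat carries (at minimum) a 3D position, a size, an opacity, and a
    # color. Real scenes use full 3D covariances and spherical-harmonic colors;
    # this toy keeps a single scale and flat RGB per splat.
    splats = np.array([
        #  x,   y,   z, scale, opacity,   r,   g,   b
        [0.0, 0.0, 2.0, 0.10,    0.8,   1.0, 0.2, 0.2],
        [0.1, 0.0, 2.5, 0.15,    0.6,   0.2, 1.0, 0.2],
        [0.0, 0.1, 3.0, 0.20,    0.9,   0.2, 0.2, 1.0],
    ])

    def composite_pixel(alphas, colors):
        # Front-to-back alpha compositing: nearer splats occlude farther ones.
        color = np.zeros(3)
        transmittance = 1.0
        for alpha, rgb in zip(alphas, colors):
            color += transmittance * alpha * rgb
            transmittance *= 1.0 - alpha
        return color

    # Sort by depth (z), nearest first, as a tile-based rasterizer would.
    order = np.argsort(splats[:, 2])
    pixel = composite_pixel(splats[order, 4], splats[order, 5:8])
    print("Composited pixel color:", pixel)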

It's literally like having a memory capture device in your hand. And while much of the world is focused on large language models, Niantic's focus on large geospatial models is incredibly intriguing. They're taking all these islands of spatial understanding their community has already created, all these PokéStops that have been scanned, and fusing them together, giving AI the same intuitive understanding of a place as the best GeoGuessr players.

This to me is the foundational substrate connecting our digital and physical worlds, dare I say, the bedrock of the metaverse. While I have no doubt that AR will eventually replace our phones, I'm incredibly excited about the future of on-device AI. It's amazing to think that the neural radiance field technology I talked about in my 2023 TED Talk is now doable.

Not in a massive data center, but right there on your phone, all without your data ever leaving the device. That's a huge win for privacy and user control. And Brian's emphasis on shared experiences really resonated with me. We're already so lost in our digital bubbles, but AR powered by these incredible 3D maps can help us reconnect with the physical world, with the people and places that matter the most to us.

And let's not forget the massive potential for enterprise applications: AI-powered tools that let us annotate places, collaborate remotely, and virtually teleport to any location. Even with the current AR and VR headsets, the possibilities are transformative. When I take a step back, it seems the future of maps isn't just about better technology, it's about better connections.

As we push the boundaries of spatial computing, Niantic is showing us that the real power lies not in building perfect 3D models or precise positioning, but in creating tools that bring us closer together, tools that help us rediscover the magic in our physical world. Now that's the kind of future we should be excited to help build.

All right, folks. This is the last episode of season one of the TED AI Show. Over the past 25 episodes, we've embarked on an amazing journey exploring a world where AI is changing everything, from deepfakes challenging our sense of reality to the dramatic OpenAI board saga unfolding in real time. We've truly witnessed AI permeate every aspect of our lives.

We ventured into territories where AI is becoming deeply personal from AI NPCs as companions to therapy bots and mind reading interfaces. We've examined AI's growing influence on global systems from predicting the weather to transforming education from UN governance frameworks to national security considerations.

And perhaps most fascinatingly, we've explored AI's relationship with human creativity and consciousness from Hollywood's embrace of AI to the vibrant open source AI art communities. From philosophical discussions about consciousness to the emergence of digital doppelgangers, these conversations have transformed how I think about our technological future, and they've inspired me to push even further as we close out this chapter of the TED AI Show.

I'm excited to share that my journey with TED is evolving. I'll be moving into a guest curator role, bringing cutting edge voices in technology and AI to TED's global stage. For those curious about what's next and wanting to continue exploring these frontiers together, you can find me sharing my insights on X and LinkedIn under my name, Bilawal Sidhu.

Thank you for being a part of these conversations. They've been foundational for what comes next.

The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Gerard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker, and our engineer is Aja Pilar Simpson. Our researcher and fact checker is Krystian Aparta. Our technical director is Jacob Winik, and our executive producer is Eliza Smith.

But don't worry, this isn't goodbye. I'll see y'all in the next one, this time not as the host of the show, but as the guest.