How AI digital doppelgängers could change the way we communicate w/ Synthesia CEO Victor Riparbelli (Transcript)
The TED AI Show
December 17, 2024
Please note the following transcript may not exactly match the final audio, as minor edits or adjustments could be made during production.
[00:00:00] Bilawal Sidhu: Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying the TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.
If you could jump online and be able to chat with your favorite musician anytime you'd like for as long as you'd like, what would that be worth to you? What if you could connect with a personal dating coach as often as you wished to help sharpen up your online dating skills? Would that be appealing? Or what if you could make a digital copy of yourself and release your doppelganger to the web to take care of some of your online identity work for you?
Much of this is actually within reach. Companies are learning to pair AI tech with video, audio, and animation tools to effectively mimic real people and real-ish interactions all at the same time. Musician FKA Twigs, for instance, built a digital clone of herself and uses it to let fans interact with a version of her.
The founder of Bumble, the dating app, talked about how the future of dating might begin with digital avatars pre-interviewing each other, and that sort of flips the AI argument on its head a little bit, doesn't it? We've talked a lot about the potential and risks of AI becoming too human-like, but this is the reverse story.
This is about human beings becoming more digital-like. To become, in a sense, digital humans. If that's something you'd find useful, there's a handful of companies ready to help you create the digital version of you. One of those is called Synthesia. Using a short five-minute video you can record with your phone or webcam, you can build a reasonable facsimile of a human being. You can then choose a voice, give it a script, get it translated to dozens of languages, add a few design flourishes, and now you can push relatively pro-looking video content to your followers, your employees, whoever. No sets, no actors, no sweat. Many of Synthesia's clients aren't individual people.
They're massive global companies like Heineken, Zoom, and Xerox. Synthesia says more than 50,000 customers have built digital avatars into their comms strategies.
[00:02:24] Synthesia Avatar: In today's demanding market, we as team leaders need to be more than just experts at our jobs. This means that we need to be a leader, a coach, and a trainer, and we also need to embody the values, mission, and vision of our company.
[00:02:37] Bilawal Sidhu: That probably sounds to you like a generic, typical computer generated voice, and sure, it is, but it's also the voice of a Synthesia avatar that Electrolux, the global appliance company, uses to distribute video modules to help train its workforce.
[00:02:53] Synthesia Avatar: Be open and listen, be transparent and available. Is that difficult?
[00:02:57] Bilawal Sidhu: The tech is impressive enough that last summer, investors lifted Synthesia's valuation to unicorn status, hitting that vaunted $1 billion mark. It seems like a lot of people are very interested and now very invested in seeing digital humans take off and take over how we communicate with each other now and into the future.
But in this quest to build lifelike, useful digital avatars of ourselves, are we rewriting our understanding of what communicating human to human looks like? Who are we in a world that could soon be dominated by digital doppelgangers? I'm Bilawal Sidhu, and this is the TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.
What does it mean to be human in a world of digital doppelgangers? Big, juicy, philosophical question, I know. But Victor Riparbelli is one of those real humans who thinks about this a lot. He's the co-founder of Synthesia.
Hey Victor, welcome to the show.
[00:04:08] Victor Riparbelli: Thanks, man. Glad to be here.
[00:04:10] Bilawal Sidhu: Just to level set this conversation first, we already have so many tools for communication, and you've talked about how text, it was the original data compression for human communication, but now we have video calls, messaging, social media, podcasts, newsletters, emails.
The list goes on. Why are digital avatars necessary?
[00:04:29] Victor Riparbelli: I think at its very core, almost any technology we've invented for communication kind of abstracts something away, right? Like text being the most obvious example, where if you take the experience of me talking to you in real life and delivering some kind of a message versus, um, the way you kind of perceive that message, interpret that message, would be very different than if I sent you exactly the same words written in, in a text message.
[00:04:53] Bilawal Sidhu: Totally.
[00:04:54] Victor Riparbelli: I mean, even kind of pre text, right, we had cave paintings, we had all sorts of other technologies that essentially helped us kind of like store information and deliver it to someone else, uh, kind of in a different time and, and, and different space. And what we've been doing since then is really just trying as much as we can to make these technologies appear to be as close to the experience we have in real life as possible.
And I think we have lots of ways we've sort of gone around that, but obviously, you know, the ultimate way of doing this is that it can replicate the actual human experience of speaking to someone. And digital humans and digital avatars, of course is, is an important part in that.
[00:05:31] Bilawal Sidhu: And on that note, I've heard you refer to your avatars as digital humans.
What's the difference in your mind?
[00:05:36] Victor Riparbelli: I think there's a lot of, there's a lot of different words that kind of go around: AI clones, AI avatars, AI humans. I think ultimately they all represent roughly the same thing. If you say it's an avatar or a face or character, that kind of implies that it's a non-human entity.
Where I think if you use the word human, that, that does imply something kind of different about it. And with the era we're living through right now, with computational intelligence improving very, very rapidly, I think the reason people are talking about digital humans now is because it actually feels tangible that we can create something that very closely resembles human life, right? Both in the real world, but I think before that, in the digital world. Like, all of us have interacted with ChatGPT and LLMs, we've seen firsthand how powerful they are, how much they can actually, uh, pretend to be a human. And if we can give them the kind of visual expression and the, uh, the audio expression of that as well, digital humans, it actually does feel like we're gonna get pretty close to being able to create something that feels like a digital human. Um, not just because we, we use the word, but 'cause it actually feels like it, we'll interact with it, right? So next year we'll launch a real-time avatar you can actually talk to. And I think, I think there's probably something there where that's when we begin to think of it more as a human than we think, uh, of it as just a technology.
And I think maybe a good way to anchor that is when you think about like a chat bot and ChatGPT, one thing that's very interesting is that, I do this myself, and I think most people do, is that when you're interacting with these systems, people are actually quite polite.
[00:07:06] Bilawal Sidhu: Yeah, definitely.
[00:07:08] Victor Riparbelli: You talk to ChatGPT like it's actually like a coworker.
You say please, and it's, it's kind of weird, right? Because you're interacting with a computer that, that, that has no feelings as far as we are aware. But because the technology is now so powerful, it's very hard for us, despite consciously knowing that we're interacting with a large language model, to, to feel that way, right? And I think that is, our relationship with machines, that's, that's about to change quite dramatically, and digital humans are gonna be the most obvious expression of that.
[00:07:34] Bilawal Sidhu: There are two ways that a person can use Synthesia to create a digital human. They can pick from these off the shelf avatars that you own and build, uh, or they can get custom avatars made of themselves.
I'm curious, which is the more popular route?
[00:07:48] Victor Riparbelli: It's actually, like, roughly 50-50. In the beginning we were kinda like, which one is the most important? Right? And um, I think as time has gone on, it's very clear that there's no answer to that. They both serve different purposes. One of the things we learned very early on when we started the company was that one of the big reasons people loved the product so much was that they didn't have to be on camera themselves.
They don't like how they sound. They don't like their accent. And so a big part of the value proposition around Synthesia was actually the people could make video without it having to be themselves, right? Um, and, and that was a pretty big unlock. But then it's also very obvious that there's also a bunch of use cases where you want it to be yourself, right?
So if you're a CEO creating a video about your company's strategy for next year, that's kind of weird coming from an anonymous avatar. If you're a salesperson sending out videos to your prospects or to your existing customers to update them on something that's happening in the product, it makes a lot of sense that it's you, and so on and so forth.
So I think it's just, there will be many different types of use cases. Um, and I think we'll see a mix of people's own avatars. We'll see entirely generated, um, avatars that are specific to, to companies and our customers, right? So you can build your own kind of, like, IP, if you will. And for existing real celebrities, there's gonna be a big unlock in terms of how they can work with brands in a much more scalable way than they could before.
[00:09:08] Bilawal Sidhu: Look, even for myself, I would love to have my digital avatar, digital human, be delegated a bunch of this stuff, especially since this setup process of recording a video, I think, is painful. But I'm curious about the demographic that you talked about that is super excited about not having to go through that pain, or perhaps didn't grow up with selfie culture, in this world with cameras all around them.
When those folks first encounter their digital avatars, what kind of reactions do you typically see?
[00:09:35] Victor Riparbelli: A lot of people are very self-conscious, um, like they would be if they recorded, you know, just a screen recording of themselves, like a selfie video. But people like it when they like the result. And I think one interesting anecdote here is, um, you know, in the early days of Instagram, for example, the big growth hack that Instagram employed was actually filters on images and on videos, right?
It's actually very simple. It's like you take a picture and you make it, uh, you know, slightly more saturated. You make it, like, black and white or whatever, but that makes that picture appear to look much, much better than before, right? Whereas every single image people were taking before on their phone cameras would look fairly crappy without having someone actually edit it, which was, like, out of bounds for most people.
And so I think a lot of what we see is the same with avatars. People want to kind of, like, touch themselves up. They wanna make sure that they're, like, you know, being shot in a nice environment with nice lighting, that they're, like, wearing their best clothes. They want to be the best representation of themselves.
But I think in general people love it, right? People, uh, especially people who don't want to be on video, once they're happy with their avatar, it unlocks so much for them. Like, executives who otherwise are asked to record videos several days a week, they now don't have to do that. They can work with their team to just create the content automatically.
And then I think also people have this sort of, um, on a personal level, right, it's kind of, it's pretty odd the first time you see your avatar. It's pretty odd the first time you hear yourself speaking a language that you don't actually speak and it's clearly your voice. It sounds like you, and I think that's a very interesting glimpse for, for people into kinda like the future, right? What I love about gen AI, uh, as kind of a cultural movement and technology movement, is that it's so accessible that all of us actually get to feel firsthand what these technologies mean, right? What can they do? How powerful are they? And this is just such a visceral experience, I think, of some of the things that AI can do.
And I think also everyone feels like, well, this is only gonna get better and better and better, right? Even though of course it's, uh, we, we've made a lot of progress, there's still so much more to go.
[00:11:29] Bilawal Sidhu: I mean, these avatars are really cool. And I will say, I mean, especially coming from a VFX and CG background, you can, at this stage, tell that they're still an avatar.
There's that whole uncanny valley question. And I'm curious on the consumption end of this, what are the reactions like and, and does the context matter there? Like if people are reacting to a video in like a sales inbound email versus, you know, encountering it on a banking website versus a virtual CEO address, how do people react to these digital humans in these various contexts?
[00:12:00] Victor Riparbelli: So I, I think you, you nailed it there, right? It's very much about the context. I'm pretty sure that if I use my avatar to record a love letter to my, my girlfriend, um...
[00:12:10] Bilawal Sidhu: It's like, okay, you outsourced this?
[00:12:11] Victor Riparbelli: She'd probably be, uh, a bit disappointed that I sent my avatar to do that and not my, my real self. Um, but if you're a user trying to understand, like, your mortgage application on a banking website, and you're presented with, like, 10 pages of text with very complex information, almost everyone prefers to watch a video that just simplifies it for them, right? So what we generally see a lot of our customers, if not almost all of our customers, do is that they introduce the avatars, like, Hey, this is your virtual facilitator.
This is, uh, not a real person. This is an avatar and they're gonna help you through the buying process, they're gonna help onboard you to your company, whatever. And, um, what we see overwhelmingly is just that people really love interacting with these videos, especially if the alternative is text, right? We just did a big study with UCL here in London because we wanted to investigate how people react to these videos.
There's a few kind of interesting stats. One of them is that, um, people actually completed the videos with avatars faster than the one with humans. That's because when they watched the videos of humans, humans are more imperfect, right? Like, you know, we, we kind of use a few too many words or we say something a little bit clunky or whatever.
Um, and so people kind of scroll back in the video to watch a section again. But with the avatars, because it's kind of like perfect in the sense that it's a script that's been written from the get-go, um, the information is actually more concise. And the study also very overwhelmingly shows that people by far prefer to learn by watching AI videos rather than just reading text.
[00:13:41] Bilawal Sidhu: The study that you just, the stats that you mentioned make total sense to me, right? It's like you're distilling down the information and just like communicating it in a far crisper fashion than say, you know, a long meandering conversation from a human. Though, you know, like some humans are more concise than others.
When it comes to that CEO example though, how important is photorealism to you? Maybe to level set: if I had to ask you to grade the photorealism of your avatars right now, on a scale of 1 to 10, where would you put it?
[00:14:11] Victor Riparbelli: I think if you, I think you have to dissect it a little bit. If you take the photorealism as in, like, how real does it kind of look?
I think it's very close to 10. As in, like, if you took a still frame of the video, right? I think it's very difficult to tell that it's an avatar, which, uh, is in very large part due to AI being very good at rendering. I think where avatars still have a bit of a way to go, right, is, um, is the body language matching what you say?
There's a, there's a beat to what we say. So when I, when I speak to you now, right? Like my eyebrows move in a specific way. My hands move in a specific way. We have this whole language with our bodies, and we don't notice that in the real world because all of us do this, but we notice it when we see a video of a digital avatar whose body language is kinda like out of sync.
So what most avatar products in the market today, um, not ours, but most avatar kind of companies usually do, is that you take a real video of someone and then you loop it in perpetuity, you just change the lips.
[00:15:08] Bilawal Sidhu: Right.
[00:15:08] Victor Riparbelli: This, this illusion sort of works pretty well in shorter bursts, but you begin to get this kind of weird sense where the head movement is out of tune with what they're saying.
The hands don't match what's being said, and that kind of throws you off quite a bit, right? And I think there's a little bit to go. Our new model that we're launching soon has kind of full body language, including hands. That makes a big difference. And then I think in the voice, there's still a little bit of imperfection, but I do think that the visual quality is more or less there.
It's more about like the last percentage of like the body language and the kind of emotional expressiveness, um, in these avatars, right?
[00:15:40] Bilawal Sidhu: What you're saying makes sense to me. So it's almost like the visual fidelity, if you just look at it that way, is, is pretty cool, it's kind of crossed the uncanny valley. But on the other hand, yeah, you're totally right.
Like that emotive quality and the body language, like, in motion, um, that, yeah, still needs a little bit of work there.
[00:15:57] Victor Riparbelli: And that part is, like, again, AI will solve. I think the models we have in-house have more or less solved that. But basically I think what we've seen is that no matter how many human animators you throw at, like, animating a digital human, we cannot animate it to perfection.
And as humans, we are so, so, so sensitive to even the slightest inconsistencies, right? And what's amazing about AI and generative AI is that the old school way of doing this, right, is that you sit down as a human being and we try to make a list of instructions of exactly how, how someone should move.
[00:16:23] Bilawal Sidhu: Mm.
[00:16:24] Victor Riparbelli: And of course, with AI, what we're doing is kinda like the opposite way around. We're saying, we're not gonna tell you what to do. We're just gonna show you a bunch of examples of how people actually move. And you can yourself learn what that means, right? So we don't tell the computer, Hey, uh, this, there's like, six, seven facial, um, bones and muscles and, you know, all those kind of abstractions in some sense that we as humans have built to animate, uh, digital humans. We can kind of throw those out the window and say to the machine, you know, you figure out your own taxonomy of how the body works and how people move.
And that can be like a 5 billion, uh, parameter model that a human being would never be able to sit down and comprehend, but if the computer understands it, who cares, right? It computes an output that actually looks and feels very realistic, and I think that's what we've seen in every modality, right?
It's just that AI is extremely good at this because it can think in way more abstract terms, um, and in way more kind of parameters and dimensions than human beings ever could, right?
[00:17:32] Bilawal Sidhu: I love this because this is certainly what you're describing as like a huge difference to the way Hollywood has traditionally done it, where it's like, you know, a crazy light stage scan where you're essentially in this dome with a bunch of lights pointed at you, or you know, a Medusa scan where you have to do these explicit expressions.
So that really makes me curious. You know, for a lot of these, um, off the shelf avatars you offer, um, you do capture a ton of your own training data when generating those. And of course, there's a process for folks to make their own digital twin, their own replica as well. You know, what does that process look like now and what is it gonna look like in the future?
[00:18:06] Victor Riparbelli: So right now we need around three to four minutes of footage of someone, and that's just, I mean, that can be recorded with your webcam. You can record with your phone. You can go into a studio. Today, basically, the input is the output, as we generally say. So if you record with your webcam, you're gonna get a video back.
Your avatar is gonna be you sitting recording yourself on a webcam. If you go into a studio, it's gonna be you in the studio, and so forth. The big thing we're launching very soon is being able to essentially create an avatar of you once and then create new variations of your avatar in different environments.
So let's say you've recorded one, where you're sitting at home in your podcast studio, but now you actually wanna record a video where you're on top of a mountain or you're flying a plane, or you're skydiving and you're doing like a million different other things. We can then create that avatar for you by you just essentially using text to prompt yourself into new scenarios.
[00:18:54] Bilawal Sidhu: Cool.
[00:18:55] Victor Riparbelli: This is gonna be a big, big, big unlock. So the way it works is that we still need some video of you. And the reason we need some video of you is because if we, if we started from just an image of you, which is basically the modality you want this to work in, right? You take a single image and generate a scene of you. Then we don't know anything about how you look, how you move, uh, how your head kind of, uh, goes around, right?
[00:19:17] Bilawal Sidhu: So even my teeth, you know, like you gotta...
[00:19:20] Victor Riparbelli: Even your teeth, the way you, the way you talk, we can never infer this from just a single image, right? Because the information is just not there. But what we want to be able to do is we want to build a model that says, this is exactly, you know, like how you move and how you speak, and how your hands kind of work in conjunction with what you're saying.
And then once we have that model, then we can much more easily just say, okay, here's a picture of you standing on top of a mountain. Here's you in the supermarket, here's you behind a, a bar or whatever. And then we can begin to create these kind of new scenes. And I, I, I think, you know, this is gonna be one of those advancements that's gonna have, like, a huge impact in terms of what people use the product for and how much fun you can have with it.
[00:19:55] Bilawal Sidhu: I love that it's kind of replacing the whole green screen visual effects workflow, right? If you just go capture it in reasonably diffused, decent lighting, suddenly you can, you know, choose a bunch of different backgrounds. That's, like, virtual production democratized. But before I get carried away and get too excited about that, I do have a question.
Like, so if someone creates this avatar, let's say I made it, who owns it? And can I license my digital doppelganger?
[00:20:21] Victor Riparbelli: So you own it, a hundred percent, and if you want us to delete it, we'll of course fully delete it. No, no questions asked. And, and that'll always be the case. We are thinking about what to do with kind of likenesses and whether we should create a marketplace where people can rent out their likeness, um, to work with, like, brands or creators.
It's not a functionality we have yet. What's exciting about it is that it opens up, like, so many new ways of using your likeness, right? So let's say that you're a celebrity, for example. The traditional way a celebrity would engage with, with a brand is you say, okay, miss big celebrity, we've gotta go into this warehouse.
We're gonna shoot, uh, an advertisement with you and we're gonna take a bunch of still photos. And this is then sort of material for, um, all of our campaigns, uh, moving forward, right? And maybe they'll record some social media clips as well. And then you're kind of done, you've recorded all the content, and now the brand can use that.
What this unlocks is, what if you have an e-commerce store, and every time someone buys a product, you want to send a thank-you message from a well-known celebrity? All of a sudden, the celebrity doesn't necessarily need to do much else than just say, yes, I'm fine with this, I'll license my likeness. And maybe instead of that being kind of like a big upfront payment to the celebrity, the celebrity is just paid $1 every time someone buys a product in that store, right? And the store can quickly switch out the celebrity with someone else if they wanna try someone else. Or maybe they think that for one, uh, segment of their customers, Celebrity A is the best choice, and for another group of, of, uh, customers, Celebrity B is the right choice. And because everything here is generated with code, right, you can actually begin to, to do these kind of things. And so what I think we'll see is actually a democratization of, like, working with celebrities in some sense.
[00:21:50] Bilawal Sidhu: Mm.
[00:21:51] Victor Riparbelli: Where today, you need to have like millions of dollars and big budgets and whatever to work with a big celebrity.
In this way, the celebrity could actually pick who they wanna work with, right? Maybe, maybe a celebrity would prefer to work with 500 small artisanal shops all over the US that each pay them, you know, much less, but in aggregate pay the same as, like, one big Coca-Cola campaign. I think that's actually pretty interesting because I, I, my guess would be if you asked a lot of celebrities, um, who they would prefer to work with, they probably would prefer to work with small artisanal shops with products that they actually love rather than some makeup brand who just throws millions at them, right? So I think we'll see a lot of new business models kind of emerge and, and I personally think that's, that's pretty exciting.
[00:22:28] Bilawal Sidhu: That is exciting indeed. And it brings me back to sort of the B2B focus for your company, given that most of your customers are businesses, you know, what are the types of things that they're using it for?
And you know, in the past you've described this sort of as like, you know, it was a vitamin for like the entertainment industry, but it's really a painkiller for businesses. Why is that?
[00:22:48] Victor Riparbelli: So, when we started the company, as, as you said, right, we set out to actually build tooling for video professionals to be more efficient.
And the first thing we did was build these, like, AI dubbing functionalities. You kind of take a real video. We did a very famous one, David Beckham speaking, obviously, in English. And then we could take that advertisement and recreate it in 10 different languages, so it looks like he's speaking in a different language, and it's definitely a very cool product.
And there was a lot of interest in it, and it, it, it did, like, okay in the marketplace, but we just had this kind of feeling that if we disappeared tomorrow, people would find another way of solving the problem, right? And it was kind of like a cool thing, but it wasn't really a painkiller, right? It was a nice thing to have, and it's very difficult to build a big company around something that's nice to have.
You wanna sell something that people really, really need to have. And so as we kind of went through the motions of taking that product to market and, and really just trying to build understanding of video from first principles, we suddenly had this feeling that there's a lot of people in the world who are not making video today, and they're desperate to make video. And when we spoke to those people, they obviously did not work in the video industry, right? They work in big companies. They're like a marketing manager, training instructor, sales professional, something like that. And they're all telling us that they're desperate to make video.
They have a lot of great content, a lot of great knowledge that they wanna share with their customers and with their employees. But nobody reads anymore, right? They send out these emails that just end up in the archive. So they wanted to make videos. They tried to make videos. The thing is, if you work in a, in a big company, often there's a lot of content to produce, which means the quantity of videos you have to make is very high.
They often need to translate them, and they need to update them after they've shot them because something changes in the business, and that's just impossible to do with a real video. And so, for these people, if we can give them a way to make video which is a thousand times easier and a thousand times more affordable than shooting it with a camera, they would probably be okay with the quality of those videos being lower than what the video industry would accept, because for these people, the alternative is not a real video from a camera.
The alternative is text, right?
[00:24:49] Bilawal Sidhu: Yeah.
[00:24:49] Victor Riparbelli: And so it's like, do we compare this to a real video, or do we compare it to text? It's not like people are saying, you know, all this content we used to shoot with a camera, we'll now make in Synthesia instead. It's people saying, well, all this text that we have and all these slide decks and all this kind of static information, we can now turn that into video content.
And that became the, the kind of inflection point for us once we kind of figured that out. And I love what you said before, because we have the same kind of feeling, right? It's like, how weird is it that potentially the biggest market for visual effects is actually gonna be corporate communication, um, in a couple of years, not Hollywood?
[00:25:22] Bilawal Sidhu: Right.
[00:25:23] Victor Riparbelli: That's very contradictory. Like no one would've thought that would ever happen. But in many ways, I think the biggest ideas, the most impactful ideas, always feel very weird and very contradictory, right? Like Airbnb, I think, is like, what if people just invite strangers to sleep in their home for a bit of money?
Like everyone would be like, you're absolutely crazy, right? But I think that's what technology does. It challenges a lot of these inherent assumptions. And I think in our little world, this is a pretty good example of that, because ultimately what we do, to your point, is special effects, right?
It's visual effects. We call it AI because we use AI, but at its core, it's not too different from what Hollywood has been trying to do for many years.
[00:26:00] Bilawal Sidhu: It definitely is the art and science of visual effects. And I'm kind of curious, right? Like on the consumer side, there's this short-form video fatigue and just video fatigue.
Everyone's doing video all the time, but on the enterprise side, as you mentioned, there's a bunch of this content that just would never have been converted into video form. If you take that to the limit, do you think there's a similar risk where we just end up polluting our feeds with a bunch of throwaway content?
It is just going to be like an onslaught of enterprise B2B video content.
[00:26:29] Victor Riparbelli: But I think what's gonna happen is that video's gonna become the table stakes. So today, email is table stakes, right? You don't operate a company without sending out emails.
[00:26:38] Bilawal Sidhu: Mm.
[00:26:38] Victor Riparbelli: At one point, if you're sending me emails with lots of text in them, I'm just not gonna open them, right?
Your inbox in the future is gonna look more like your TikTok feed, where you just kind of quickly scroll through what's interesting. And as always, just like it is with email today, just because something gets easier to produce, you still have to be a great storyteller. You still have to figure out what's the right hook to get my attention, to make me watch your video all the way through and get in contact, or whatever it is that you want me to do.
I think all those things around storytelling and building a good product and being good at communicating, none of that goes away. So what's true now, and what is gonna be true in the future, is that it's about duration and standing out.
[00:27:14] Bilawal Sidhu: So we are seeing an explosion of content and of course every time tools like the ones that you're creating come out, people use it for misinformation and disinformation, right? And so there have been instances in the past where Synthesia avatars were used to spread misinformation. How much of those incidents pushed you to sort of lock down or put rails on the abilities of these avatars?
[00:27:35] Victor Riparbelli: So the safety aspect has always been very important to us. And, you know, since we founded the company in 2017, we did so on an ethical framework called the Three Cs: Consent, Control, and Collaboration. Consent is about, we never create avatars of anyone without explicit consent. And that's kind of like a hard stop, which means we lose out on some virality, because we don't make funny satire videos of celebrities or whatever, right? But that's a choice we decided to make. The second one is control, right? So that's basically content moderation, which is that we take a very strong view on what you can use the platform for and what you can't use it for.
We're a B2B product. We work with enterprise. And so we're probably a bit overly strict in some senses. You know, there are legal categories of content that we are very restrictive around. And we put a lot of effort, both with machines and with humans, into making sure that people don't use our platform for things they shouldn't.
I think with these incidents that happened in the past, we'll always get judged by the one video that makes it through, and we learn something from that every single time. In many ways, right, when you do content moderation, a lot of people disagree with you no matter what direction you go in.
[00:28:39] Bilawal Sidhu: Yeah. You're not gonna make everyone happy.
[00:28:41] Victor Riparbelli: Exactly. And especially, of course, when it comes to things like news and politics, religion, et cetera, this gets very, very hairy. And no matter what you do, there'll be people who don't like it, right? There was specifically one of these instances, which we discussed a lot internally, where someone made a video and, um, I'll leave out the details of it, but essentially a video about a pretty hairy topic, right? A topic that'll divide people into either you're very pro or you're very against. And the video was actually entirely factual, but it was perceived at this one big newspaper as being a piece of propaganda.
[00:29:14] Bilawal Sidhu: Hmm.
[00:29:14] Victor Riparbelli: And that was a very interesting one for us, because we fact-checked it, and there was nothing in there that wasn't factual. You could argue that talking about it in a specific way was meant to make people believe something specific. But I mean, all communication has those properties.
And so what we've decided to do is just to be, again, kind of overly restrictive, so we don't allow news and current events content unless you're an enterprise customer, for example. That's actually a shame, because we had a lot of NGOs, citizen journalists, and those kinds of folks making great content on the platform, but it was just too difficult to manage eventually.
And so we decided to make that rule. It's something we always work on. As I said, you know, we're not claiming we're perfect, but I think we have very, very good systems in place today that keep bad people off the platform.
[00:30:01] Bilawal Sidhu: I gotta say, the stance you're taking is indeed more restrictive.
I hear most platform creators sort of punting this to the point of distribution where they're like, well, the creation tool shouldn't be responsible for this. The distribution platforms should be the ones, you know, bringing the hammer down.
[00:30:15] Victor Riparbelli: Look, I think that these questions are so difficult, right?
And there are so many different ways we think about them. If you think about them philosophically, it's a question of freedom of speech.
[00:30:24] Bilawal Sidhu: Yeah.
[00:30:24] Victor Riparbelli: From a very practical perspective, it's, you know, just about keeping out the bad people that we all agree are bad people. And there's an economic question:
You know, am I hindering my growth as a company because I'm overly restrictive, leaving the door open for competitors who are less strict? Like, there are so many angles. It's not an easy question, right? And what we have talked a lot about is that there is a shift happening right now, specifically in AI, where gradually a lot of companies are moving the point of moderation to the point of creation, right? Where, of course, with the big language models, we see this all the time. There's a bunch of things they just won't talk about, and they'll definitely not help you with the recipe for a bomb or something like that.
But even on more vanilla topics, politics being the obvious one, they'll be kind of tiptoeing around those things. In our case, it's sort of the same thing, where we actually limit you from creating the content in the first place. And I always explain that this is actually very new, right?
Imagine that when you're using PowerPoint or Microsoft Word, it would stop you from making a slide about-
[00:31:24] Bilawal Sidhu: Right.
[00:31:25] Victor Riparbelli: -how to do something horrible, right? That's a very weird thought for most people. But in many ways, that's actually what we are doing and what we're building, right? And no one has ever held Microsoft responsible for the fact that a school shooter can write their manifesto in Microsoft Word, right? Or, I'm sure there have been PowerPoints made about how to do evil, horrible things in wars and so on. But we've never seen that as being Microsoft's responsibility. We've always seen that as being, you know, the distribution platform's responsibility once that content actually gets uploaded somewhere.
But I do think that as a society, it's probably good that we're extra careful when we roll out these things in the beginning. And then, you know, maybe in 10, 15 years, we'll have a different view on how these technologies should be used and governed. But as a starting point, I mean, my own moral inclination, and the rest of the company's, is that it's good to be a little bit on the back foot and be a little bit more restrictive than what some people will feel comfortable with.
[00:32:18] Bilawal Sidhu: Building off the discussion and looking towards the future, you talked about how next year you're gonna have these avatars that you can talk to in real time. There's an interesting thing that we came across when we did this episode on ChatGPT Advanced Voice Mode, where the guardrails and restrictions that are put on it almost prevent the avatar from being fully human-like, you know. Like if it's too much in a box, you can kind of see those seams, and that pops the illusion. How do you think about that tension, especially as you're moving towards these more expressive product experiences?
[00:32:51] Victor Riparbelli: I totally agree with you, and I think it's so deeply fascinating to me how as humans, we're so good at detecting something that's non-human.
Like when you talk to the voice mode in ChatGPT, right? Like you understand, okay, this will help you answer kind of practical, factual questions. And every time you ask it for an opinion or to be a little bit human, it'll just default back to the kind of robot speech, to some extent. At some point, you know, I think these restrictions will be lifted.
There's a big market and there's a big appetite for interacting with computers in ways that feel very, very lifelike, right? So I think we'll see that boundary disappear over time. As for us, again, you know, we've made a decision to be a B2B company, and so we're not gonna be offering virtual boyfriends and girlfriends anytime in the near future.
Um, but I think a lot of those properties are also very interesting in a business context, right? For example, if you're a salesperson and you do sales training, if you can role-play with a prospect that can be programmed and prompted to act in a specific way, you could probably ramp a lot faster than if you have to read a document about how to come back from different objections.
And I think there are a lot of other, potentially also more controversial, applications of this. Think about psychology, therapists, doctors. I think we'll see a lot of those pop up in the next couple of years. And I think ultimately, for a lot of these use cases to really work, it has to feel very lifelike.
You know, I think if you're interacting with a sales simulator which looks like a computer game from the nineties, you're just gonna disconnect from it. It's not gonna work, right? And I think right now we're very, very close to passing through that uncanny valley, where it actually will feel very, very close to having a Zoom call with a real human being.
[00:34:40] Bilawal Sidhu: It's interesting, even with your B2B focus, you just outlined a bunch of these scenarios where the box is large enough where you can have a very meaningful, interactive experience. Um, so I have to ask you, how far away are we where we can have these AI avatars that can feel indistinguishable from a human conversation?
[00:34:58] Victor Riparbelli: I don't think we're very far, to be honest. I think in 12 months' time you could probably simulate Zoom calls at a pretty good fidelity. I think the voice component of this is kind of getting to full maturity. There are a lot of great technologies out there. And the video part of it depends a bit on what you're trying to simulate, but,
if you look at the videos that we're watching each other on right now, right? And, you know, that's a compressed Zoom feed.
[00:35:24] Bilawal Sidhu: Yeah.
[00:35:24] Victor Riparbelli: Then that's not the most challenging thing to replicate, and you're already gonna expect a whole bunch of artifacts and compression and all these sorts of things, right? So if that's kind of like the goal, then I think you're not very far from it.
[00:35:36] Bilawal Sidhu: Let me, let me ask it in a slightly different way on this, especially on the visual fidelity end, to use your example from earlier, how long before you can send that digital love letter to your girlfriend and she believes it was actually from you?
[00:35:48] Victor Riparbelli: Um, I think next year. Like I really don't think it's far away. I think, looking at what we're building right now, we have the components. We've taught a system how to predict the correct body language, facial expressions, and gestures that go with what you're saying. We can generate the voice in high enough quality that it sounds deep and emotional.
So I really don't think that it's more than 12 months away. And it'll be very interesting. Internally, we usually talk about this as the ChatGPT moment for video.
[00:36:18] Bilawal Sidhu: Mm-hmm.
[00:36:18] Victor Riparbelli: I think what's so powerful about ChatGPT is that it truly broke through the uncanny valley, right?
The first time you use ChatGPT, it's so human that you begin talking to it like a human, subconsciously, without even thinking about it. For audio, I think text-to-speech kind of got there. And for video, I think this is getting very close. So internally, we think of it like this: when you can generate a video of a vlogger on YouTube, like, you know, the traditional style of sitting in my bedroom, kind of talking at you, when you can generate that in a high enough quality, a high enough fidelity, that you would come home after work one day, put on an avatar video, and just sit down and watch an avatar talk for 18 minutes, like a lot of people do with vloggers.
That's where the total market for these technologies explodes by a thousand. I think when that happens, Pandora's box is open. There's gonna be lots of ethical questions, lots of cultural questions, lots of art questions about what it means. And I think it will be a pretty meaningful and powerful moment.
[00:37:15] Bilawal Sidhu: So let's get into those ethical questions. I mean, it's fascinating, right? Let's say you have these photorealistic avatars that you can talk to in real time. You know, could this tech eventually replace humans completely in, let's say, customer service roles? And how do you think about that tension, right?
It's like how do you ensure this tech enhances rather than replaces human interactions? Because the thing that keeps popping into my head is like pulling up to a hotel at like 11:00 PM and instead of a human there, there's like a fricking iPad. You know? It's multimodal. It can see me, it'll check me in, it'll do everything.
It's perfect. It can work around the clock, but there's not a human. And you're already seeing some hotels try this, where they've got, you know, essentially a remote worker playing that role right now. But eventually it'll be autonomous. And that's just one example. So how do you think about that Pandora's Box opening?
[00:38:04] Victor Riparbelli: I think there are ultimately two types of use cases. If you're calling in to customer support, for example, you don't really care about who the customer support agent is, right? You just care about solving your problem the fastest way you possibly can. And I think if we replace that with an agent, or a bot, I think no one will care about that.
And I think that'll definitely happen. It's a matter of when the technologies are good enough. If you take the example of a salesperson, or maybe a hotel receptionist: I think some hotels will want to sell the cheapest room. They'll want to have the fastest experience of getting you the key card and getting you into your room.
Other hotels will put a lot of emphasis on meeting and greeting you at the door, taking your luggage for you, explaining what's happening in the city this weekend, and so on and so forth, right? That's a product that's pretty heavily service-dependent. And I think for those kinds of things, we'll really value the human connection.
I think it's a bit the same thing with a salesperson. A lot of people wanna talk to a salesperson because it's a relationship that you build with someone else, right? And I don't think we can replace that. And I think that the human touch, the human element, will become much more important in the future.
AI is gonna be much faster at replacing people typing in Excel spreadsheets all day than a waiter giving you a great experience at the local restaurant.
[00:39:19] Bilawal Sidhu: I think that's well said. But I wanna ask you, do you foresee a world where having a digital avatar is as common as having a social media profile? Like, Meta recently announced digital avatar tools for creators on their platforms, for instance.
[00:39:35] Victor Riparbelli: Absolutely. I think it's just an evolution of the profiles we all have today, right? In some sense, your profile on a social media network is also a clone of you. It's maybe not as visceral as an avatar of yourself, but that is what it is, right? It's a digital representation of who you are.
And if I go back to my childhood, when I was on forums, right, we had a username. And then on the next generation of forums, you'd have a username and a profile picture, and then you'd have a profile picture with a profile page where you could write something about yourself and your interests or whatever.
And then we all graduated to social media. And now we have not just one picture of ourselves, we have a whole gallery of pictures that talks about us. And on TikTok we have a whole library of videos that explains something about ourselves and who we are and our place in the world and so on and so forth.
So I think in many ways it's just a natural evolution of that: we will have kind of digital personas that represent us in the digital space.
[00:40:25] Bilawal Sidhu: So are you imagining this tech evolves to a level where, let's say my digital self not only represents me in the virtual world, but in a sense kind of lives my virtual life for me?
[00:40:38] Victor Riparbelli: Um, I don't think it's off the table, you know. I think, again, I don't think that I will enjoy interacting with my friend's bot as much as I'll enjoy interacting with my friend in the flesh, knowing that it's actually him. I think it'll again be more practical, and maybe we'll have agents that say, hey, you haven't seen Simon for six months.
Why don't we arrange something? And I'll say, yeah, that's actually a good idea, right? Then my AI will go to Simon's AI and say, hey, these guys haven't met up for a while. Why don't we set up something for them in a couple of months' time? Right? We know that they both love listening to techno music, so let's find a concert, you know, a rave somewhere close by, and set that up for you.
Right? So I think, again, it's more utilitarian. I don't think it's gonna be our AIs catching up on behalf of us and then giving each of the humans the lowdown on what was discussed and how Simon's life is going. I hope that's not gonna be the case. But those kinds of things, I think we will see a lot more of, right? And for one, as someone who has a pretty busy life, I think that'd be pretty awesome, actually. You know? I think from a very philosophical perspective, you can argue that basically everything online is already not real, right? Like your Instagram profile is not a real representation of you. We present ourselves in the best light possible, and I think our avatars and all the digital content we'll create around ourselves will probably just be an extension of that.
I think what we'll have to learn, and what I actually say to the younger generation to some extent, is that this is fiction grounded in reality, right? And I usually use the example of when you go to a dinner party, or when your parents went to a dinner party. Also when you do, for that matter, but in a different time and age, right?
You sit down at the dinner table, you ask everyone how it's going, and people do exactly the same thing in real life as they do on Instagram, right? Very few people sit down at the table and say, actually, you know what? I'm really tired of my wife. I want a divorce. I hate my job. Most people say, yeah, it's going pretty well.
Like, we project a version of ourselves to the world. And so I think this idea of projecting yourself is not something that Instagram has created. It's always been the case. It makes...
[00:42:29] Bilawal Sidhu: It's amplified, perhaps.
[00:42:30] Victor Riparbelli: It amplifies it and makes it more concrete in many ways, but I think most human behavior has been the same for thousands and thousands of years, right? We just express it in a different way.
[00:42:40] Bilawal Sidhu: So in this future where these digital humans are photorealistic, where they've crossed the uncanny valley, what does that mean for individuality? Like, will we be confused by the fact that I can't even tell whether this is Victor that I'm interviewing, or whether you delegated your deepfake to come and do the interview and it's indiscernible to me? Like, what is gonna happen to transparency and individuality in that context?
[00:43:01] Victor Riparbelli: I think that if you look at text, you have been able to produce text and share it with anyone online for many years now. And I think by now most of us have some sort of critical sense that just because something exists as text on the internet somewhere does not make it true.
If you see a tweet from some random account saying, you know, World War IV just kicked off or whatever, your first instinct is gonna be, that's probably not true, right? You've gotta triangulate that information with, you know, a news source, or however you go about it, right? And I think what's gonna happen now is that we're gonna have to move away from a world in which, in general, if someone has been recorded with a microphone or with a camera, most people assume that the very fact that it exists means that it's true. That's not gonna be the case anymore, right? And so it'll be even more important that all of us learn how to be literate with media. We need to look at things from different angles. Who created this piece of content? When was it created? Is this from a reputable source? And I think this technology is developing very fast. I think it's gonna bring us into a world where we just, per definition, believe nothing of what we see online.
[00:44:11] Bilawal Sidhu: Yeah.
[00:44:11] Victor Riparbelli: We presume that everything is fiction. Everything is a Hollywood film, right? And I think also that we basically go back to saying we can only trust things if they happened in front of us, if we saw them in real life. That doesn't mean we can't trust anything we read or see. We're just gonna have to be more critical about presuming that just because something exists, that does not actually make it true, right? And I think that's actually gonna be a good thing, that we just per definition think that almost everything is fake, and we work backwards from that. And there are a couple of ways we can work backwards from that. We are working with Adobe and some other tech companies on a thing called C2PA, which is the idea that you fingerprint and watermark content, essentially. I think we'll move into a world where content is verified by default. So when you take a picture with your phone, when you make a video in Synthesia, when you create an image in Photoshop, you choose to register that piece of content in the global database of all the world's content.
I hate the word, but I actually think a blockchain can be a good solution there, because it's immutable. When you then upload it to YouTube or whatever your social media platform is, it will look at the content, identify it in the database of all the world's content, and say, this video was created by Victor originally in 2019.
It was made with Photoshop and with Synthesia, whatever. Here's some information around it. We know where this came from originally. And that moves us into an internet, I think, where most content will be verified, and that'll help you evaluate every single piece of content, essentially. And then, in a world in which most content is verified, content that isn't will stick out like a sore thumb.
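[Editor's note: the provenance flow Victor describes — fingerprint a piece of content at creation, register it, and look it up at upload time — can be sketched in a few lines. This is a toy illustration of the general idea, not the actual C2PA mechanism, which uses cryptographically signed manifests embedded in the file; all names and the in-memory "registry" here are hypothetical.]

```python
import hashlib
from typing import Optional

# A toy in-memory registry standing in for the "global database
# of all the world's content" mentioned in the conversation.
registry: dict[str, dict] = {}

def fingerprint(content: bytes) -> str:
    """A content hash used as the lookup key, standing in for a
    C2PA-style fingerprint of the media file."""
    return hashlib.sha256(content).hexdigest()

def register(content: bytes, creator: str, year: int, tools: list) -> str:
    """Register provenance metadata at creation time; returns the key."""
    key = fingerprint(content)
    registry[key] = {"creator": creator, "year": year, "tools": tools}
    return key

def verify(content: bytes) -> Optional[dict]:
    """What a platform would do at upload time: look the content up.
    Returns the provenance record if registered, else None."""
    return registry.get(fingerprint(content))

video = b"fake video bytes"
register(video, creator="Victor", year=2019, tools=["Photoshop", "Synthesia"])
print(verify(video))        # registered: provenance record comes back
print(verify(b"tampered"))  # unregistered (or altered) content: no record
```

Note that any edit to the bytes changes the hash, so a modified file no longer matches its record — which is why real provenance schemes pair fingerprints with signed, embedded metadata rather than relying on exact-match lookups alone.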
[00:45:45] Bilawal Sidhu: I think you're right. We are going into a world where authenticating content will be the default, and we'll have provenance for most pieces of content that are created. Leaving aside the concerns about the technology, what is it about the potential of digital avatars that excites you most about humans wanting to interact, live, work, and play in this future?
What can go right if you execute your mission correctly?
[00:46:09] Victor Riparbelli: I think the beautiful thing about technology is that it enables everyone to essentially have a voice, to be able to bring their ideas to life, share their knowledge with the world. And the two main vectors there are, of course, distribution, which is that we can share the content once you've created it.
And the other one is creation, right? And I think we've seen in many modalities how powerful it is when you allow more people to create. If you look at a more recent example, just from my own life, you know, I love music, and I've seen firsthand how the fact that we've been able to produce digital instruments and sample things has led to new genres like electronic music, house and techno, for example, right?
That wouldn't have been possible with real instruments. And when you see, just more recently, camera technology becoming very accessible, like YouTube, and, I mean, podcasts like the one we're doing right now, those are essentially formats that didn't exist before we invented technologies that massively democratized them.
And so for me, the promise of all this is like, well, what if everyone could be a Spielberg? Right? What if any film student can go out and say, I have a great idea, and all I need to realize it is a lot of time and a good idea, right? There'll be a whole bunch of content, as we discussed, that's never gonna be watched by anyone, that's gonna be crappy content. But there will also be a film student from some, you know, small country in the world who manages to produce amazing art despite not being connected to Hollywood. And I think that's really the thing that excites me the most. You know, it's like freeing creativity. Culture and art are such an important part of moving humanity forward, of creating peace in the world, bridging all the gaps that we have between us.
And I think that's gonna be a massively positive thing for the world. We've already seen it play out in many other types of media. And getting video there as well is, I think, gonna be transformational for the world.
[00:47:54] Bilawal Sidhu: Love it. Victor, thank you so much for joining us.
[00:47:57] Victor Riparbelli: Thank you.
[00:48:01] Bilawal Sidhu: Victor Riparbelli is the co-founder and CEO of Synthesia, and yes, I'm quite sure I spoke with the real Victor, not his digital twin. Though, in a year or two, even that certainty might be up for debate. What fascinates me is how we've inadvertently paved the way for digital humans through our everyday tech compromises.
I mean, think about it. We've grown completely comfortable with grainy video calls, audio glitches, and awkward Zoom delays. These imperfections have actually created the perfect landing pad for digital avatars. We're already operating in a world where "good enough" video quality is, well, you know, good enough.
But what Synthesia shows us is that this isn't just about making believable digital humans. It's about transforming how we create and share ideas at scale. When I started making videos, it meant countless hours of shooting, reshooting, and painstaking editing just to get a simple message across. Now we're approaching a world where anyone with an idea can spin up a video presentation in minutes, in any language, with any number of perfectly delivered takes, and that power to create is incredible.
But it also means we're racing towards a fascinating cultural crossroads. Soon, everything we see online might come with its own digital birth certificate, a verified chain of creation that tells us exactly where it came from and how it was made. It's like we're building a new trust architecture for the digital age.
In a world where anyone can create any video featuring any person saying anything, maybe what becomes most valuable isn't the tech that makes it all possible, but the story underneath it all.
The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Gerard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker. And our engineer is Aja Pilar Simpson. Our researcher and fact-checker is Krystian Aparta.
Our technical director is Jacob Winik, and our executive producer is Eliza Smith. And I'm Bilawal Sidhu. Don't forget to rate and comment, and I'll see you in the next one.