Nixon’s unheard moon-disaster speech is now a warning about the deepfake future

By Andrada Fiscutean | October 21, 2021 | Topic: Innovation


The entertainment industry has yet to regulate the use of deepfakes and voice cloning.

Image: photoworldwide / Getty Images

On September 29, the Emmy for interactive documentary went to ‘In Event of Moon Disaster’, a film that uses artificial intelligence (AI) to create a fake video featuring former US President Richard Nixon. The film shows him delivering a speech that was prepared in case the Apollo 11 mission failed, leaving astronauts Neil Armstrong and Buzz Aldrin to die on the moon. 

The multimedia project was created by the Massachusetts Institute of Technology’s Center for Advanced Virtuality, with a bit of help from a Ukrainian voice-cloning startup, Respeecher, which worked on Nixon’s voice.

Using a deep neural net, Respeecher’s engineers combined an actor’s reading of the speech with archival recordings of Nixon, layering Nixon’s vocal timbre on top of the actor’s performance to create a deepfake audio recording. To anyone listening, the synthetic voice sounds natural, and it’s indistinguishable from the original.
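Respeecher hasn’t published the internals of its model, but the general shape of speech-to-speech voice conversion can be sketched: one network encodes what the source speaker says and how they say it, another distils the target speaker’s timbre into an embedding, and a decoder synthesizes audio that combines the two. The Python below is a toy illustration with placeholder functions standing in for those neural networks; none of the names, sizes or numbers come from Respeecher.

```python
# Illustrative sketch of speech-to-speech voice conversion -- NOT
# Respeecher's pipeline. The idea: keep the content and performance of
# the source speaker (the actor) and swap in the vocal timbre of the
# target speaker (Nixon). Every component here is a toy placeholder
# for a trained deep neural network.
import numpy as np

def encode_content(source_audio: np.ndarray) -> np.ndarray:
    """Placeholder content encoder: a real one would extract phonetic and
    prosodic features (what is said, and how) while discarding identity."""
    return source_audio.reshape(-1, 160).mean(axis=1)           # toy framing

def encode_speaker(target_audio: np.ndarray) -> np.ndarray:
    """Placeholder speaker encoder: a real one would distil the target's
    timbre into a fixed-size embedding from roughly an hour of audio."""
    return np.array([target_audio.mean(), target_audio.std()])  # toy stats

def decode(content: np.ndarray, speaker: np.ndarray) -> np.ndarray:
    """Placeholder decoder/vocoder: a real one would synthesize a waveform
    that follows the actor's delivery but sounds like the target voice."""
    return np.tanh(content[:, None] * speaker[None, :]).ravel()

# Random arrays standing in for real 16 kHz recordings.
actor_reading = np.random.randn(16_000 * 4)   # actor performing the speech
nixon_archive = np.random.randn(16_000 * 4)   # archival Nixon recordings

converted = decode(encode_content(actor_reading), encode_speaker(nixon_archive))
print(converted.shape)
```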


To achieve this level of quality, the team led by Respeecher co-founder Alex Serdiuk needed several hours of recordings from both Nixon and the actor. The team has since improved its technology, and the process is now more straightforward.

“We usually ask for about 60 minutes of speech recordings for target and source voices,” he says. “In many projects, we had less data or worse data, so we know how to work with all data.”

Unlike text-to-speech synthesis, which often sounds artificial, Respeecher’s speech-to-speech technology preserves the emotion of the original performance. “Our goal was to make the quality on that level where it would be satisfactory for high-demanded sound professionals in Hollywood,” says Serdiuk.

Respeecher currently employs about 20 experts and has high-profile clients such as Lucasfilm on its books. The startup has worked on several cutting-edge projects in the past few years. For example, it recreated Michael York’s voice, allowing him to talk about his rare disease, amyloidosis.

“It was a very cool project in terms of using the technology for someone whose voice is gone, who cannot use this voice anymore,” says Serdiuk. His team brought back another iconic voice, that of late American football coach Vince Lombardi, who delivered an encouraging message to those struggling with the pandemic during the Super Bowl. Respeecher also synthesized the voice of the young Luke Skywalker for the final episode of the second season of The Mandalorian.

Serdiuk is optimistic, saying that his small Kyiv-based studio will continue to contribute to blockbusters: “It takes time to build credibility and reputation in Hollywood. But now, we are in a position where some cool projects are coming to us from word of mouth because some people in Hollywood use our technology, and they share this experience with their friends and coworkers.”

Speech-to-speech conversion can be useful in a wide range of projects, from video games and films to audiobooks and call-center assistants. Respeecher can perform male-to-female and female-to-male conversions, and in the future the technology might even be used for dubbing into foreign languages.

Ethical questions

Voice cloning raises a number of ethical questions, and some find the technology disturbing. The documentary ‘Roadrunner: A Film About Anthony Bourdain’, which appeared in cinemas over the summer, faced criticism after it was revealed that a segment of the late chef’s voice was created using voice-cloning technology. Bourdain did write those sentences, but there was no recording of him reading them.

The use of AI was not disclosed to the audience; it only came to light when director Morgan Neville mentioned it in an interview. It’s also unclear whether the filmmakers obtained permission from Bourdain’s family to recreate his voice synthetically.

Serdiuk says he and the other two co-founders created a set of rules that both they and their clients should follow. Respeecher does not provide a public API, and whenever it clones a voice, it adds an audio watermark so the clone can be detected by specialized software. And when a client wants to clone someone’s voice, they need written consent from that person or their family.
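Respeecher hasn’t disclosed how its watermark works, but one common family of techniques is spread-spectrum watermarking: a pseudorandom signature derived from a secret key is mixed into the audio at a very low level, and a detector that knows the key checks for correlation with that signature. The sketch below is a toy version of that general idea, not Respeecher’s scheme; the key, strength and threshold are made-up values.

```python
# Toy spread-spectrum audio watermark (illustrative only).
import numpy as np

KEY, STRENGTH = 1234, 0.01      # hypothetical secret key and embed strength

def signature(length: int, key: int = KEY) -> np.ndarray:
    """Key-derived pseudorandom +/-1 chip sequence."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=length)

def embed(audio: np.ndarray, key: int = KEY) -> np.ndarray:
    """Mix the signature into the audio at low amplitude."""
    return audio + STRENGTH * signature(len(audio), key)

def detect(audio: np.ndarray, key: int = KEY, threshold: float = 0.5) -> bool:
    """Correlate against the keyed signature; ~1.0 if watermarked, ~0 if not."""
    sig = signature(len(audio), key)
    score = np.dot(audio, sig) / (STRENGTH * len(audio))
    return bool(score > threshold)

clip = np.random.randn(16_000 * 3) * 0.1   # stand-in for a cloned voice clip
print(detect(embed(clip)), detect(clip))    # expected output: True False
```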

“In my opinion, there is nothing new about this technology that our society has never seen before,” says Serdiuk. “It’s not different from Photoshop, right?”

The entertainment industry has yet to regulate deepfakes, but Serdiuk believes the set of rules his team developed should be mandatory, given that online misinformation might become more prevalent. The recent Emmy his team contributed to might be a small step in raising awareness of the dangers of deepfakes.

“We do spend a lot of time educating, telling about what’s possible, showing what’s possible,” he says. “And this MIT project with President Nixon is a good example for that.”
