Living in the Future: Star Trek Voice Device
Looking back at this Star Trek episode, it’s hilarious how we are literally living in the future, just not on a spaceship... yet.
In the scene (Star Trek: The Next Generation, s01e03, “The Naked Now”), we see Wes and Geordi talking about a piece of tech that Wes developed to make it feel like he was on the Bridge with the senior crew. The device plays a clip of Captain Picard giving an order, and Wes reveals that he’s actually holding a sound editing device. He has used it to stitch together clips of Captain Picard talking over the intercom, producing an edited version that says whatever he wants the Captain to say.
Wes’s motive for creating this device does suggest a loneliness and isolation that I could get into, but won’t. More to the point, the tech displayed here is pretty rudimentary by today’s standards: simple sound editing, something that has been possible for years with software like Pro Tools, open-source audio editing darling Audacity, or countless other tools.
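To give a sense of how little is involved, Wes’s trick boils down to something like the Python sketch below. It is only an illustration under assumptions: the sine tones stand in for recorded words, and the names `tone`, `splice`, and `fade_samples` are made up for this example rather than taken from any real editor.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def tone(freq_hz, n_samples, rate=SAMPLE_RATE):
    """Hypothetical stand-in for one recorded word: a plain sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]


def splice(clips, fade_samples=80):
    """Join clips end to end, crossfading each seam so it does not click."""
    out = list(clips[0])
    for clip in clips[1:]:
        # Never fade over more samples than either side actually has.
        fade = min(fade_samples, len(out), len(clip))
        start = len(out) - fade
        for i in range(fade):
            w = (i + 1) / (fade + 1)  # weight of the incoming clip, rising toward 1
            out[start + i] = out[start + i] * (1 - w) + clip[i] * w
        out.extend(clip[fade:])
    return out


# Three fake "words" of 800 samples each, joined in a new order.
words = [tone(220, 800), tone(330, 800), tone(440, 800)]
sentence = splice([words[2], words[0], words[1]])
print(len(sentence), "samples")
```

Swap the tones for real recordings of single words and you have, more or less, what Wes is holding. The short crossfade is the only part that takes any care, since it hides the seams between clips.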
Living in 2019 (as seen in the publishing date), however, we now have tools that not only let us edit sound clips together, but re-create entire human voices. As an example, Dessa, a Toronto-based machine learning company, had a little fun with marketing. They took the literal thousands of hours of recorded voice from famous podcaster, comedian, and entertainer Joe Rogan’s massive podcast, “The Joe Rogan Experience”, and used it to train a specialised text-to-speech program to sound exactly like Joe, in order to showcase current machine learning (ML) techniques.
The voice is powered by a machine-learning model that copies the sound of Rogan’s voice and applies it to a text script. source
In fact, it sounds so convincing that they even made a little game out of it at this website: http://fakejoerogan.com/. You get 8 different clips, some of which are the real Joe and some of which are the vocal DeepFake.
The reason this particular example works is that Joe is a literal gold mine of vocal data, thanks to the hours and hours and hours of interviews on his show. Training a neural network to speak typed text in his specific voice is therefore a great project for showing what is possible with a large amount of data.
However, Adobe showed off something a bit more terrifying that goes beyond what the team at Dessa accomplished:
The reason this is even more frightening, if you haven’t put it together yet, is that it makes DeepFake vocal technology extremely accessible. With a tool like this, Wes could have easily had Picard give literal orders to his crew that sounded straight from the source.
In the real world, within the past few years, scammers began capturing people’s spoken “Yes” on the phone so they could then turn around and manipulate their finances, steal identities, or run scams on others. This was with a one-word clip taken over the phone! source
Joe is a great counterexample to the Adobe product, as the only reason he works for machine learning is the extreme amount of historical vocal data. In this demonstration from Adobe, however, they seemingly did it with one clip from the subject. The small amount of data needed to create false statements is astounding.
I also want to point out that the Adobe clip is from 2016. That’s 3 years ago, a scarily long amount of time for innovation to occur! Back then, there were some points in the video where you could clearly tell something wasn’t right, the word “wife” for example, when the presenter played it back. I can’t imagine they haven’t smoothed that out by now.
This sort of easily accessible sound editing tool, combined with DeepFake video technology, could create some very problematic content for the world. We currently rely on audio and video recordings as admissible evidence in US courts of law. Can we trust them moving forward?
On a positive note: this technology was never officially released by Adobe, and they haven’t made any other mention of it since the 2016 ideas demonstration. My guess is they realised how much damage a tool like that could do and decided not to unleash it upon the public.
Circling back to Wes, his simple sound editing device was thought of as very futuristic, a thing that existed in space! We now have tech that is far beyond Wes’s tricks, and it opens up much larger ethical questions than that episode of Star Trek ever imagined in this throwaway scene.
The future sure is strange.
Like my writing? I hope to write more frequently about business, the future, and other nerdy things that interest me.
You can find me @cannolinatoli on Instagram and Twitter, where I spout wisdom, post the most handsome pictures of myself, and share my favorite bassy music.