New Tools Allow Voice Patterns To Be Cloned To Produce Realistic But Fake Sounds Of Anyone Saying Anything

from the shopped-images-are-so-yesterday dept

Fake images, often produced using sophisticated software like Photoshop or the GIMP, were around long before so-called "fake news" became an issue. They are part and parcel of the Internet's fast-moving creative culture, and a trap for anyone who passes on striking images without checking their provenance or plausibility. Until now, this kind of artful manipulation has been limited to the visual sphere. But a new generation of tools will soon allow entire voice patterns to be cloned from relatively small samples, with fidelity increasing to the point where the fakes can be hard to spot. For example, in November last year, the Verge wrote about Adobe's Project VoCo:

"When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative," reads an official Adobe statement. "We have developed a technology called Project VoCo in which you can simply type in the word or words that you would like to change or insert into the voiceover. The algorithm does the rest and makes it sound like the original speaker said those words."

Since then, things have moved on apace. Last week, the Economist wrote about the French company CandyVoice:

Utter 160 or so French or English phrases into a phone app developed by CandyVoice, a new Parisian company, and the app's software will reassemble tiny slices of those sounds to enunciate, in a plausible simulacrum of your own dulcet tones, whatever typed words it is subsequently fed. In effect, the app has cloned your voice.

The Montreal company Lyrebird has a page full of fascinating demos of its own voice cloning technology, which requires even less in the way of samples:

Lyrebird will offer an API to copy the voice of anyone. It will need as little as one minute of audio recording of a speaker to compute a unique key defining her/his voice. This key will then allow to generate anything from its corresponding voice. The API will be robust enough to learn from noisy recordings. The following sample illustrates this feature, the samples are not cherry-picked.

Please note that those are artificial voices and they do not convey the opinions of Donald Trump, Barack Obama and Hillary Clinton.

As Techdirt readers will have spotted, this technical development raises big ethical questions, articulated here by Lyrebird:

Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else.

The Economist quantifies the problem. According to its article, voice-biometrics software similar to the kind deployed by many banks to block unauthorized access to accounts was fooled 80% of the time in tests using the new technology. Humans didn't do much better, only spotting that a voice had been cloned 50% of the time. And remember, these figures are for today's technologies. As algorithms improve, and Moore's Law kicks in, it's not unreasonable to think that it will become almost impossible to tell by ear whether the voice you hear is the real thing, or a version generated using the latest cloning technology.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+


Reader Comments


  • Anonymous Coward, 2 May 2017 @ 6:23pm

    Reminds me of that episode of "The Clone Wars" where Obi-Wan had to infiltrate a group of bounty hunters by posing as one named Rako Hardeen.

    In that episode he had to ingest a robot that had copied Hardeen's voice so he could sound like him.

    Makes me wonder if Star Wars tech isn't as science fantasy as people thought.

    • DannyB (profile), 3 May 2017 @ 6:08am

      Re:

      Reminds me of the Star Trek TOS episode A Taste of Armageddon.

      While Kirk is held on the planet below, the Enterprise receives a message in Kirk's voice. They don't believe it, with good reason. Their first theory, which they exclude for technical reasons, is that it was not done by a "voice synthesizer".

  • Anonymous Coward, 2 May 2017 @ 6:40pm

    Pairs nicely with video manipulation.

    https://www.youtube.com/watch?v=ohmajJTcpNk

    • Anonymous Coward, 2 May 2017 @ 7:04pm

      Response to: Anonymous Coward on May 2nd, 2017 @ 6:40pm

      Great. Now everything can really be fake news. It was fabricated from (nearly) whole cloth.

      • DannyB (profile), 3 May 2017 @ 6:09am

        Re: Response to: Anonymous Coward on May 2nd, 2017 @ 6:40pm

        Which will be more bigly? A market for fake news or fake pr0n?

      • DannyB (profile), 3 May 2017 @ 6:12am

        Re: Response to: Anonymous Coward on May 2nd, 2017 @ 6:40pm

        Another thought: plausible deniability of video or audio evidence.

        No, I did not have sex with that woman / man / goat / etc.

        He was resisting arrest. I did not use excessive force. That is not him screaming. I did not beat him mercilessly.

  • discordian_eris (profile), 2 May 2017 @ 6:41pm

    Say goodbye to wiretaps and banking by phone. Etc...

    And really Josh, you shouldn't call your mother that! We have you on her voicemail.

  • Unanimous Cow Herd, 2 May 2017 @ 6:47pm

    The GIMP?

    GIMP is an image manipulation tool. "The GIMP" is a patent leather clad chap locked in a box in Zed's basement.

  • Anonymous Coward, 2 May 2017 @ 6:50pm

    While I personally am really excited about the possibilities, I can't help but think cops are, too. Coupled with so-called exonerating phrases like "stop resisting," they may just try to make a situation look like they were justified in deploying even an RPG (perhaps by making it sound as if an arrestee hurled abusive/menacing words at the cops?). After all, they've never really shied away from deceit and manipulation.

    • nerd bert (profile), 3 May 2017 @ 8:30am

      Re:

      Some folks view this with doom and gloom.

      I, on the other hand, welcome this new technology!

      "Honey, I never said THAT! You aren't remembering our conversation correctly. Here, let me play the recording I made of it."

    • Bergman (profile), 3 May 2017 @ 10:37am

      Re:

      Why actually interrogate people, when they can simply create an audio (or even video) recording of them confessing?

      Just TRY to get a court to accept your claim that you never said anything of the sort.

      • R.H. (profile), 3 May 2017 @ 4:41pm

        Re: Re:

        Just a few of the linked articles entered into evidence and maybe an expert witness with an audio "recording" of the prosecutor admitting to having had sexual relations with a goat (or something equally implausible) should convince a jury that the tape is unreliable evidence on its own (in addition to giving them a good laugh).

  • Anonymous Coward, 2 May 2017 @ 6:59pm

    As Techdirt readers will have spotted, this technical development raises big ethical questions

    There are probably some ethical questions to ask here, but that statement doesn't talk about any of them. It, at most, introduces a question of the efficacy of police/criminal justice best practices. That's not ethical, it's procedural. The only vaguely ethical conundrum it alludes to is whether or not we should execute anyone who displays talent in any scientific or engineering field to prevent technology driven change.

  • Peter, 2 May 2017 @ 7:14pm

    Maybe someday...

    Let's be honest, the current state of the art isn't fooling anyone. Just listen to the Lyrebird samples: while you can tell who the famous person is supposed to be, it's still got the drunken Swedish robot quality that has plagued text-to-speech engines forever.

    The very best TTS these days is very, very good, but it still requires carefully collecting phonemes and a lot of work to make it sound realistic. Even then, it's generally distinguishable from a real voice within a sentence or two.

    Color me skeptical, but I think the nightmare scenario of creating forensically-realistic fake audio from just a few minutes of voice sample is a long way away. The old-fashioned way of splicing together words and phrases is still better.

    • PaulT (profile), 3 May 2017 @ 12:42am

      Re: Maybe someday...

      Three thoughts come to mind:

      1. So what if it's a long way away, should the implications of the tech be ignored until someone perfects it?

      2. These things do tend to improve exponentially, so it could be a lot sooner than you think.

      "Even then, it's generally distinguishable from a real voice within a sentence or two."

      3. Given the tendency for political debate to be driven by soundbites and for people to jump to conclusions based on a couple of seconds of video, that might be all that's needed.

    • Anonymous Coward, 3 May 2017 @ 5:02am

      Re: Maybe someday...

      As someone who's done graduate-level research in this area, let me comment on that. The algorithms in use here are being driven by a limited number of voice samples; clearly, if the size of the training set increases, so will the accuracy of the output. We've already seen similar rapid progress in image and video manipulation, so there's no reason not to expect the same here.

      The reason that you can -- currently -- readily detect that the output isn't real is that you're a human being who's evolved an extraordinary auditory sense over millennia. Of all our senses, it's arguably the most highly developed -- which is why, for example, we can detect a musical note that's only a tiny fraction off or recognize each other with a sample size of one word. In other words, our ability to detect ersatz speech is much better than our ability to detect ersatz pictures.

      But this technology, or one like it, will eventually confound that too. Whether it takes a year or twenty, it's coming. So just as "pictures don't lie" is now obsolete, we'll have to change our standards for evidence to cope.

      • Anonymous Coward, 3 May 2017 @ 9:40am

        Re: Re: Maybe someday...

        One thing that I have observed is that humans are very good at detecting changes in background noises. This has been a way of detecting edits to sound for a very long time, and it is part of the reason that films and videos use so much background music: it masks the glitches in background noise over edits.

        Conversely, if you want to protect a recording from alteration, play some songs at low level in the background, as that will make changing the recording hugely more difficult, in both separating your words from the background, and in syncing up the replacement background.

    • JoeCool (profile), 3 May 2017 @ 8:54am

      Re: Maybe someday...

      The current voice tech is meant to change a word here and there, not speak whole sentences. That might not sound dangerous, but could be if you change the right word the right way (or should that be the wrong way?).

  • Anonymous Coward, 2 May 2017 @ 7:57pm

    "I like big butts and I cannot lie" - Hillary Clinton

  • Lawrence D’Oliveiro, 2 May 2017 @ 8:26pm

    In The Future ...

    ... everyone will sound like Doctor Bot from Space Station 76.

  • John, 2 May 2017 @ 9:09pm

    This could make future voice recordings invalid as evidence

    With this tech it makes you wonder how future wire taps and similar voice recordings could be used as evidence in court. It would be "easy" to fabricate voice recordings or have the audio play in an environment in which the voice would be recorded, which could result in charges being brought against a person.

    • Se Habla Espol, 2 May 2017 @ 10:22pm

      Re: This could make future voice recordings invalid as evidence

      I think you'll find that chain-of-evidence rules require that any such evidence be attested by a human, under oath, claiming that he performed the recording being offered. Other possibilities exist, but they amount to swearing as to knowledge of the authenticity of the offered evidence.

      • discordian_eris (profile), 3 May 2017 @ 1:25am

        Re: Re: This could make future voice recordings invalid as evidence

        Um, hate to break this to you, but you cannot necessarily trust such attestations. Hell, HENRY LEE was caught tampering with evidence!

    • Ninja (profile), 3 May 2017 @ 7:24am

      Re: This could make future voice recordings invalid as evidence

      That's one of the most important aspects of this issue. I'd go even further though. Countries like China could use the technology to rewrite history as they please. Technologies that allow the production of full videos, voice and all, that are very hard to distinguish from reality can have very real and devastating consequences. One more reason to doubt everything unless there's a way to trace the 'supply chain' of the thing. Maybe we are entering an era of zero trust. Which may be a good thing since people will try to develop systems that don't rely on trust to operate and produce reliable, trustworthy results (CAs came to mind instantly because they are already living that trust crisis).

      • Thad, 3 May 2017 @ 11:26am

        Re: Re: This could make future voice recordings invalid as evidence

        Countries like China could use the technology to rewrite history as they please.

        Nah, not China, not anymore. Maybe North Korea. In China, and even countries like Iran, despite the government's best efforts the public can still get access to the open internet.

        That's where Orwell was wrong: he lived in an era where the government could control the public's access to mass communications media, and he assumed that would still be the case in the future. It's not, except in nations with crippling poverty like NK.

        China still does just fine with its disinformation campaigns, of course. And anyone, even Alex Jones, can make outlandish claims and convince some people that they're true. But China doesn't have the propaganda stranglehold on its public that it used to, and the way I see it, improvements to technology will benefit the public's ability to see through bullshit more than the governments' ability to create it.

        (Whether or not people actually see through the bullshit is, to my mind, a separate issue. There are plenty of people who will believe what they want to believe regardless of evidence; more realistic fakes will color that issue but I don't think they'll fundamentally change it.)

  • Pixelation, 2 May 2017 @ 9:16pm

    Let's hear Trump say...

    "I have a dream..."

    • Wendy Cockcroft, 3 May 2017 @ 7:15am

      Re: Let's hear Trump say...

      I have a dream that my five children will one day live in a nation where they will not be judged by the content of their bank accounts but by the content of their character. Wait...

  • Châu, 2 May 2017 @ 10:38pm

    Open source version

    Have open source version yet?

  • Kal Zekdor (profile), 2 May 2017 @ 11:06pm

    I am the system administrator.

    My voice is my passport. Verify me.

  • orbitalinsertion (profile), 3 May 2017 @ 12:05am

    At some point, the general expectation that the voice one is hearing does not in fact belong to the purported owner of the voice will outpace the technology anyway. Interesting times.

    • PaulT (profile), 3 May 2017 @ 12:45am

      Re:

      That raises an interesting idea in my mind. If people are conditioned to ignore audio & video evidence because it's often faked, how much are people going to get away with because people distrust the evidence? You can literally film someone red handed, and they just have to raise the idea that footage has been tampered with to introduce reasonable doubt and get away with it.

      In fact, how would news reporting work, given that nobody trusts first hand accounts any more even when accurate audio & video evidence is gathered?

      • Michael, 3 May 2017 @ 6:26am

        Re: Re:

        "how would news reporting work, given that nobody trusts first hand accounts any more even when accurate audio & video evidence is gathered"

        The same way it works today, no?

  • That Anonymous Coward (profile), 3 May 2017 @ 1:35am

    The FBI today dismissed the terrorism case, to avoid revealing how they obtained the alleged wiretaps.
    Questions came to light after the defendant submitted hospital records showing he was in an ICU in a coma when these alleged recordings were made.

  • Anonymous Coward, 3 May 2017 @ 1:49am

    Yep it's pretty scary in terms of potential abuse by government, LEO, etc.
    But I can't help think this would work great in games:
    Being able to generate dynamic NPC dialog without having to record hundreds of hours.
    Calling the player by their actual chosen full name.
    Imagine something like DA:O only fully voiced this time. Yes, even your character's lines.

    • Anonymous Coward, 3 May 2017 @ 7:29am

      Re:

      You could even have the player's own voice for his or her character - during character creation, put a blurb of text on the screen and ask the player to read it out loud.

      And the modding scene would take off. New quests would only need text typed into a database if the modder is happy with the existing library of in-game voices.

      • Anonymous Coward, 3 May 2017 @ 10:51pm

        Re: Re:

        This could potentially destroy the market for voice actors in video games and animated movies/videos. I wonder how they will protect the use of their voice in games or other media, especially as the samples to seed the algorithm could likely be taken from someone who sounds like the actor.

        • Anonymous Coward, 4 May 2017 @ 3:17am

          Re: Re: Re:

          Actors (or their agencies) could license their own sample libraries. Or they could just refuse to record & license samples.

          Western gamers don't generally buy games based on voice actor casting. The studio's name matters more than who's voicing.

          For example Lara Croft was voiced by at least 5 voice actors. And Cole MacGrath was voiced by 2 actors.

          Anyway the VA market is a lot bigger in East Asia (S. Korea, Japan and maybe China) than it is here.

  • hij (profile), 3 May 2017 @ 4:12am

    Dawn of a new age

    Obligatory Edward Tufte Tweet demonstrating that the more things change the more they stay the same: Edward Tufte on photoshop

  • Alasdair Fox (profile), 3 May 2017 @ 4:42am

    Voice Rights

    So, how long until we see the first voice copyright / vocal rights lawsuit?

  • Jigsy, 3 May 2017 @ 6:05am

    Honestly, a better title would be something like...

    "New tools allow voice patterns to be cloned to produce 'totally legitimate confessions of crimes,' say Police"

  • Anonymous Coward, 3 May 2017 @ 6:25am

    Actually kind of excited for this technology

    This tech could take procedural generation of video game NPC dialogue to the next level.

  • D.C. Pathogen (profile), 3 May 2017 @ 6:47am

    Ahhh....Classic leverage

    You will watch your neighbors and report back to us or we release the fabricated tape of you admitting to something that will really embarrass you and get you fired, possibly jailed.

  • Joe P, 3 May 2017 @ 8:37am

    scary possibilities

    When terrorists use fake video with voice to convince the masses that their legitimate leaders are corrupt (e.g. Pope saying kill all the ..) then we are really screwed. A war could be started or just one lone wolf converted to the cause.

    We need to teach everyone to be skeptical, inquisitive, and knowledgeable of the many ways people can be manipulated.

  • Stephen, 3 May 2017 @ 8:43am

    With all due respect to Lyrebird, the examples of Obama, Trump, and Hillary talking on the demo page referenced in the article are all too obviously fake. The stilted, machine-like monotone gives them away as computer-generated voices.

    Real people don't talk like that.

    If Lyrebird want authenticity they need to try harder to get rid of those qualities.

  • Anonymous Coward, 3 May 2017 @ 8:50am

    could this be the savior of Reality TV?

    The so-called "reality" TV shows will love this because it will be much quicker and easier for them to create fake dialog than their current method of painstakingly splicing a person's spoken words together. And presumably much less fake sounding than the often sloppy splicing of multi-toned speech snippets.

    The next logical innovation for "reality" shows may well be the ability to "photoshop" these synthesized words into people's mouths so the camera won't be forced to cut away whenever they "speak" spliced words.

    ... but on the other hand, wouldn't it be so much easier to just give these "reality" actors an actual script instead of creating dialog in the editing room?

  • Anonymous Coward, 3 May 2017 @ 8:50am

    Fake voice, ID Theft and Wall Street/401K

    Kiss your 401K retirement money goodbye. If Wall Street itself doesn't steal your retirement money then the ID thieves (who get the tool and a voice sample) will!

    Remember, the scum on Wall Street use your recorded voice (on the phone) and the personal identity information (routinely exposed by Wall Street's own butt kissing firms) to "establish" your identity. You/We are truly screwed.

    The next time Al-Qaeda or ISIS or the Mob or the Drug Lords attack Wall Street (and Washington), I think I'm not likely to care too much. Wall Street/Washington is looting us so viciously, that their enemies attacking them is very low on my list of concerns!

  • McGyver (profile), 3 May 2017 @ 10:06am

    It's never a question of "if we should make something", just implement it as soon as possible and let others deal with the consequences later.
    But then again one of these companies is Adobe and they pretty much never turn down any idea they think will make money, regardless of ethics or fairness.

  • Bergman (profile), 3 May 2017 @ 10:41am

    We're not supposed to say "yes" to robo-callers...

    But now they don't need us to say yes at all to get a recording of us agreeing to something we never agreed to.

    This could really pump up FedEx stock, since oral contracts just became completely unreliable.

  • Rekrul, 3 May 2017 @ 12:21pm

    If I didn't know it was fake, the Trump speech samples might have fooled me, but Obama doesn't sound right and Hillary sounds just like one of those robotic, female text-to-speech apps.

    The problem isn't so much matching the pitch and such of a specific voice, it's writing the software to properly pronounce words. People have been putting up videos on YouTube with artificial voices for years. Many of them are very good and sound almost perfect, but then they mispronounce a word and you realize that it's a machine.

  • Uriel-238 (profile), 3 May 2017 @ 2:26pm

    Does this mean...

    I can actually get my voice-activated digital assistant to sound like GlaDOS?

  • Griffdog (profile), 4 May 2017 @ 12:32pm

    Benefits for communications

    In these days of broadband communications, it’s hard to remember that there are still some very low data rate channels in use. Meteor burst, VLF, and others offer some unique propagation benefits, but at speeds that were already eclipsed by 1980’s era telephone modems. So, imagine that your communications set already has the voice parameters of the people you’re most likely to talk with. Now, by simply exchanging text at a low data rate, your comm gear can convert the words into realistic voices that actually sound like the people with whom you’re talking. Real-time conversations on channels that are running 75 bits per second, or less. Just add some encryption and authentication protocols, and Bob's your uncle.

    Hats off to science fiction author David Drake and his Hammer’s Slammers series, where hovercraft tank commanders use this approach to hold voice conversations via radio waves bounced off of the ionized trails left by the small meteors that constantly burn up in the atmosphere; a very robust but low data rate communications channel.

  • Anonymous Coward, 4 May 2017 @ 1:38pm

    But now we can get Robert Redford to narrate EVERY movie.
    Or Samuel L Jackson, or or or

    Are you not entertained?
