To really do this you want to break the text-to-speech process into two pieces: use English to turn the text into phonemes, and then use the other language to turn the phonemes into audio.
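A rough sketch of that two-stage idea, using espeak-ng as a stand-in (it can dump its phoneme mnemonics with -x and read phonemes back when the input is wrapped in [[...]]); this is only an illustration of the pipeline, not how any OS engine is actually wired up:

    import subprocess

    TEXT = "Hello, how are you?"

    # Stage 1: English letter-to-sound rules turn the text into espeak's
    # phoneme mnemonics (-q = no audio, -x = print phonemes to stdout).
    phonemes = subprocess.run(
        ["espeak-ng", "-v", "en-us", "-q", "-x", TEXT],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    # Stage 2: a Spanish voice renders those phonemes, substituting its own
    # nearest sounds for any it doesn't have. Input in [[...]] is treated
    # as phonemes rather than text.
    subprocess.run(
        ["espeak-ng", "-v", "es", "[[" + " ".join(phonemes) + "]]"],
        check=True,
    )

How good the result sounds depends entirely on how gracefully the second voice handles phonemes outside its own set, which is exactly the encoding problem raised below.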
This only works if your phonemes are encoded pretty generically, though. For example, /f/ in English is labiodental while it's bilabial in Spanish, so if you want your accent-changing to work right you'll need to either represent both as /f/ or have a reasonable model for picking the closest sound a speaker of a given language is likely to be able to reproduce for any given input.
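One way to picture that "closest sound" model: describe each phoneme by articulatory features and map an input phoneme to whichever member of the target language's inventory shares the most features with it. The feature sets and inventories below are toy illustrations, not real phonologies:

    # Map a phoneme to the nearest sound in a target language's inventory
    # by counting shared articulatory features. Purely illustrative data.
    FEATURES = {
        "f": {"voiceless", "labial", "labiodental", "fricative"},
        "v": {"voiced", "labial", "labiodental", "fricative"},
        "ɸ": {"voiceless", "labial", "bilabial", "fricative"},
        "θ": {"voiceless", "coronal", "dental", "fricative"},
        "s": {"voiceless", "coronal", "alveolar", "fricative"},
        "t": {"voiceless", "coronal", "alveolar", "plosive"},
    }

    INVENTORIES = {
        "es": ["f", "θ", "s", "t"],   # toy Spanish-ish inventory
        "nl": ["f", "v", "s", "t"],   # toy Dutch-ish inventory
    }

    def closest(phoneme: str, language: str) -> str:
        """Return the inventory phoneme sharing the most features with `phoneme`."""
        target = FEATURES[phoneme]
        return max(INVENTORIES[language],
                   key=lambda p: len(FEATURES[p] & target))

    print(closest("ɸ", "es"))  # -> 'f': nearest match for a bilabial fricative
    print(closest("θ", "nl"))  # -> 's': a voiceless coronal fricative stands in for 'th'

A real system would want a better distance than raw feature overlap, but the shape of the problem is the same.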
IMHO one of the reasons for the author's surprise is the colloquial use of the word accent, whereby one usually means a mix of pronunciation [1] and intonation [2].
I think that the surprise disappears once we look at these two factors individually. As per jefftk's comment, it is to be expected that TTS in a certain language will be limited to the phones (and thus the pronunciation) of its language. On the other hand, intonation is always bound to sound "foreign" seeing as this TTS software cannot get even the original intonation right (try listening to the sample text with the US voice to see what I mean), let alone that of a different language.
The surprising thing to some of us is how much it sounds like a human native speaker of that language speaking English. Not that it doesn't sound like 'native' English intonation, nobody would expect that; it's still surprising that after being set up to speak language A, it sounds like a human language-A speaker's accent when reading English too, even though that wasn't the intent of the setup. Perhaps it's not surprising to you that it would go like this, because you understand the technology better and so expected it!
And then there are other people in this thread who disagree and don't think most of them sound very much like a human speaker of a non-English language speaking English! So maybe it's not obvious after all...
I tried a couple, and most of them don't sound super accurate as foreign accents. The Dutch one the author highlighted is pretty far off from what I'm used to. It sounds more like a Dutch person trying to pronounce English as if it were Dutch, rather than an actual Dutch accent.
Trivially, pico2wave has two English voices, "en-US" and "en-GB", with "American" and "English" accents, respectively. Incidentally, the "en-GB" one is quite a bit better than the "en-US" one to my ear.
pico2wave also has:
German (de-DE)
English, US (en-US)
English, GB (en-GB)
Spanish (es-ES)
French (fr-FR)
Italian (it-IT)
I think pico2wave's accents induced by cramming English text through the "wrong" language sound a bit better than the few I tried on the Mozilla web speech API, and it works offline, but I don't know that they sound good enough (similar enough to a real person's accent) to be really very useful for that.
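For anyone who wants to reproduce the trick locally, a small sketch of the "wrong language" experiment with pico2wave (from libttspico-utils) and aplay on Linux; -l selects the voice, -w the output wav file, and both tools are assumed to be installed:

    import os, subprocess, tempfile

    TEXT = "Good morning, how are you today?"

    with tempfile.TemporaryDirectory() as tmp:
        wav = os.path.join(tmp, "out.wav")  # pico2wave expects a .wav filename
        # A Spanish voice reading English text gives the "accent" effect.
        subprocess.run(["pico2wave", "-l", "es-ES", "-w", wav, TEXT], check=True)
        subprocess.run(["aplay", wav], check=True)

Swapping es-ES for de-DE, fr-FR, or it-IT gives the other accents from the voice list above.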
Fascinating. I tried Hindi, and at best it's pretty similar to how Hollywood portrays a native Hindi speaker talking in English. Unlike the 1000+ Hindi speakers I know.
Author here: I discovered this when I was building a multiplication table practice app for my 7 y/o son. You can play around with that here (try quiz mode): https://hugo-tafels.waleson.com/ . Note that the compliments and encouragements are a bit .. weird .. as I just took them from a random 'compliments to kids' website.
I typed "Buongiorno, quanto fa venti per dieci?" and made the English voices read it. They sound like Stan Laurel and Oliver Hardy: they subbed themselves in Italian without knowing the language much. It surely added to their performance. You can check their accent at https://youtu.be/057aVSbqWiU
I guess this is today's lucky 10k[1] thing: speech synthesis engines in most OSes are not deep int[]-to-sound mappings; they are decades-old, hand-built, language-specific algorithms that parse sentences and synthesize audio by patching together library sounds or generating it from trigonometric functions, in whatever way their designers thought would make sense.
Some engines ignore foreign words, some pronounce them as if TEE-AYCH-EE-EYE were initialisms, and some are built to be multilingual or otherwise as flexible and accommodating as possible. OS-included engines are the flexible kind, because users will make them say "Your Soufflé au Chocolat is arriving" et cetera.
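To make "generating from trigonometric functions" concrete, here's a toy illustration of formant-style synthesis: summing sine waves at rough formant frequencies and modulating them at a pitch rate to get a vowel-ish buzz. Real engines are vastly more sophisticated; this only shows the flavour of the approach:

    import math, struct, wave

    SAMPLE_RATE = 16000
    DURATION = 0.5          # seconds
    PITCH = 120             # fundamental frequency in Hz
    FORMANTS = [(730, 1.0), (1090, 0.5), (2440, 0.25)]  # rough F1-F3 for an "ah"-like vowel

    frames = bytearray()
    for n in range(int(SAMPLE_RATE * DURATION)):
        t = n / SAMPLE_RATE
        # Sum sine waves at the formant frequencies...
        sample = sum(a * math.sin(2 * math.pi * f * t) for f, a in FORMANTS)
        # ...and modulate them at the pitch rate so it sounds voiced, not like a chord.
        sample *= 0.5 + 0.5 * math.sin(2 * math.pi * PITCH * t)
        frames += struct.pack("<h", int(sample / len(FORMANTS) * 0.8 * 32767))

    # Write 16-bit mono PCM so you can listen to the result.
    with wave.open("vowel.wav", "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)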
This is a well-understood phenomenon. I believe something similar happens with vocaloids (which are basically text-to-speech programs designed to sing songs). I had no idea this was a thing until I happened to meet a person who is a vocaloid connoisseur.
It's using OS-level TTS. You don't have the required language packages installed, so it falls back to the installed system default, which in your case seems to be English.