Movie critic Roger Ebert–who is recovering from a serious bout with thyroid cancer that left him unable to speak–has been using TTS to communicate. Recently, however, Ebert’s TTS got a new voice–his own. The new voice, built by CereProc, is assembled from audio clips and snippets collected from the movie critic’s many television appearances and DVD commentaries.
Today, The New York Times reported that the much-vaunted text-to-speech (TTS) capabilities of Amazon’s Kindle 2, provided by Nuance Communications, came up short when trying to pronounce President Barack Obama’s name. The device uttered something closer to Baah-raah-k O-baah-maah (closer to the sounds in “black” and “Alabama,” the Times said). The paper adds that the problem has since been corrected: Obama’s name has been added to the Kindle’s TTS dictionary and will be included in the next wireless update.
The Kindle TTS misfire came to prominence as many news organizations began openly speculating on whether subsequent versions of the Kindle could create a viable non-paper-based means of distribution. Wired, for instance, was running the headline “How the Next Kindle Could Save the Newspaper Business” in stories about partnerships that The New York Times and the Washington Post were looking to hatch, while mediabistro.com pondered, “Can The Kindle Save Newspapers?” Whether or not any of that’s true, the failure of the Kindle’s TTS to pronounce things like the President’s name correctly may put at least a temporary crimp in any role speech might play in a Kindle paper-saving venture.
When it comes to that, though, don’t blame Nuance.
In our continuing examination of speech technology in popular culture, I bring to you one of the weirder things we’ve stumbled across.
You’re all doubtlessly familiar with the Loquendo TTS work my brother, Adam B., does every week for our news items on the mother site, but apparently its popular use runs far deeper. Our investigations show that there is an entire community of Spanish-speaking YouTube users who are using Loquendo’s TTS to make video críticas, or “criticisms.” The críticas are rants, delivered by Loquendo TTS products, that are chock-full of curses and insults leveled at their subjects. Those subjects range from Dragonball Z to emo kids, a much-maligned subculture of droopy-haired teeners who patronize a genre of existentially sentimental rock music also known as “emo.”
The videos are pretty similar to one another and vary only in target. All, as best we can tell, use the Castilian Spanish male voice font, “Jorge,” and many seem to take advantage of the free demo that Loquendo offers on its website. You can tell because they carry the creepy music that Loquendo puts in the background of its demo files–a chorus of synthetic TTS sirens singing Loquendo! over and over again. You’re going to want to check that out for yourself.
For the most part, críticas level their attacks at popular television institutions like House, Pokemon, and the Disney Channel (there seems to be an entire subgenre of just Disney Channel críticas–I found heaps of them on YouTube), but there are also some vaguely offensive works like “How to Seduce a Woman” (in which the narrator explains the importance of body language) and other such lessons.
While these things are easy to write off as just a bunch of immature adolescents or, at worst, adults slagging around, críticas still offer some insights about the future of speech.
We mostly think of Loquendo’s TTS offerings as being business oriented, allowing companies to generate a spoken interface on the fly in IVR phone systems or whatever. In these instances, the software essentially allows a big company, an abstract collective, to give itself a single voice, separate from any real living entity in the world. The críticas, however, cut right back in the other direction: they allow individuals to deliver spoken messages anonymously, assuming a synthetic and collectively created voice. Individuals can hide their gender, their age, their nationality, or any number of other things that might be inadvertently revealed in the expression of their biological voice.
The anonymity in críticas pretty much authorizes users to curse left and right and traffic in the most backward kind of homophobia. That is, it lets them spout all the words they wouldn’t normally say in public–like a vocalized internet flame–so you get a lot of puta madre this and that. In point of fact, just about every other word in these videos is puta, a Spanish insult for a female sex-worker. The TTS writers get pretty creative with it, transmogrifying the word into just about every noun, adjective, and verb form imaginable. Putanizada, putilla, and putón are just some of the flourishes that they luxuriate in.
Also interesting, the otherwise coarse language is couched in fairly complex grammatical structures. The work of one 2Alfredo2, in particular, makes heavy use of interjected clauses. When combined with colorful grammatical plays on common Spanish insult words like jilipolla, the overall effect is both formal and pruriently vulgar. It kind of sounds like a high school English essay gone wrong.
Granted, this work is pretty limited in scope, to say the least. There are only so many kicks you can get out of listening to a machine tell off Hannah Montana with every curse word out of the Real Academia Diccionario de Palabrotas. But TTS is an artistic medium in its infancy. There is real potential for the anonymity afforded to users to do good.
TTS might allow human rights activists under repressive regimes, and other marginalized voices, to express their deepest feelings without compromising themselves. It also gives them access to auditory-dependent media infrastructures like podcasting. Likewise, the cadences in TTS are, for all Loquendo’s immense advancements in recent years, still sometimes jerky, and, in their strangeness, embody a certain set of aesthetic values that can probably be capitalized on by artists willing to engage with them.
There are also potentially harmful effects. The technology could be used to anonymize all sorts of ill intent and to disseminate any hateful message one might care to pass along. TTS is just a tool, like any other, but as speech starts making its way out of the rarefied corridors of business, it is likely to be plied for all sorts of artistic and political ends.
If you have your own TTS art you’d like to share, please leave us a comment!
In yet another twist and turn of the ongoing flap over Amazon’s Kindle 2–you know the one, the whole TTS vs. Copyright Law Controversy–nine disability groups have written to US publishers urging them not to opt out of the TTS function on their e-books for the new Kindle.
The disability groups, which include the National Federation of the Blind and the International Dyslexia Association, said in a letter to Simon & Schuster:
“For a terribly long time those with print disabilities have been consigned to alternative formats with limited choices on expensive special purpose machines. Now that the opportunity for mainstream access to books on equal terms is possible, this community will not allow publishers and authors to deny them the right to read.”
Letters also went to Random House, Penguin, HarperCollins, Macmillan, and Hachette Book Group.
Over the weekend, my Speech Brother Eric B. and I were searching the Web for Hot Speech Sites–picture the two of us huddled over a computer in a trashed hotel room, “Painkiller” by Judas Priest blasting from the stereo, champagne flowing like water, groupies everywhere, chaos–when we happened upon Gizmoz.
For those of you who don’t know, Gizmoz is a site that lets users upload pictures, create animated avatars, and record corresponding messages. And like any good site, Gizmoz makes use of TTS!
Naturally, my Speech Brother Eric B. and I created a message for all you Speech Heads out there: