Speech Technology Magazine SpeechTEK Conference
 
Eric B.   —   March 31, 2009 @ 6:18 pm

Yesterday, the Gerson Lehrman Group (GLG) provided an analysis of a joint study by Harvard University and Warwick University. The results, it suggests, put a damper on the unspoken implications of a 2008 Nuance study that found using speech recognition was safer than using tactile controls.

The Harvard/Warwick study, which got a quick rundown in Wired magazine last December, found that “The worst results came from the subjects tasked with listening to a list of words and then speaking new words that began with the same letters as each word on the list. Those ‘drivers’ had a 480 millisecond delay, which at 60 miles per hour would mean 42.3 additional feet traveled before applying the brakes.”
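For the curious, the braking arithmetic in that quote is easy to sanity-check. Here is a minimal Python sketch; the function name and the constant-speed assumption are ours, not the study's. A car doing 60 miles per hour covers 88 feet every second, so a 480 millisecond delay works out to a little over 42 feet, in line with the quoted figure.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def extra_feet_traveled(delay_ms, speed_mph):
    """Distance covered during a reaction delay, assuming constant speed."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second * (delay_ms / 1000)


# A 480 ms delay at 60 mph, rounded to two decimal places.
print(round(extra_feet_traveled(480, 60), 2))
```

The same function can be used to try other speeds or delays; it ignores braking distance itself and only covers the ground lost before the driver reacts.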

GLG extrapolates this to mean that voice command-and-control will have similar results.

“This task is similar to using an in-vehicle system for command and control purposes.  The driver is speaking to the system and then waiting for [its] response and possibly speaking again,” it writes.

It’s quick to add, however, that speech interactive systems often offer shortcuts and reduce the amount of time required to engage with them, possibly mitigating some of the risk.

It should also be noted that these results seem to agree with a AAA study we reported on last month on the main site, which concluded that the danger to drivers in using wireless devices was not primarily the use of their hands, but the use of their cognitive attention. Where strict safety is concerned, drivers really shouldn’t even be listening to music, much less doing anything more complicated.

The conclusion that GLG comes to is that voice command-and-control systems, while safer, are not safe. It suggests that Nuance’s report has some limitations. This isn’t the first time it has questioned the 2008 report. In July of 2008, GLG questioned the significance of the sample size, thirty participants, and how accurate a study in an artificial simulated scenario would be in the real world.

Perhaps somewhat derisively, it wrote, “Nuance recently released the results of a study that claims to ‘prove’ that speech recognition used in-vehicle while driving increases driving safety. I’m sure that the results of the study are right, to the extent that Nuance is releasing any data and conclusions.”

Responding to the concerns raised by GLG in yesterday’s analysis, Michael Thompson, senior vice president and general manager of Nuance Mobile, says, “The results of last year’s study demonstrated that speech-powered systems in vehicles help reduce driver distractions posed by manually entering information into navigation systems, entering music selections via mp3 players, making and receiving phone calls, and so on.  Clearly, the safest option is for drivers to simply refrain from using these devices and applications, but for those who insist on using them, the study showed that a hands-free, eyes-free option provided by speech is the next best alternative.”

Perhaps Thompson is right. Who, for instance, is going to forgo listening to music in the car? On the other hand, one might argue that it isn’t enough for any manufacturer, developer, or even person to take a morally neutral stand, conceding that people oughtn’t do it but that we may as well make it safer. That’s perhaps too easy an answer. But then, what can you do? If Nuance doesn’t do it, some might say, someone else will, and then Nuance will have ceded important business ground, really the existential foundation of its entire venture into automotive work. If there is a demand, are companies responsible first to some arguably tentative moral stand (after all, who is authorized to make decisions for people unilaterally?) or to the market?

And there is a market. My brother Adam B., for instance, will never stop using speech in the car. He moonlights as a NYC cabdriver–one of the 5% of cabbies in the City without a driver’s license, I may add. His cab is so speech-enabled that it won’t even start unless he politely says “Good morning, Mackie”–Mackie’s the cab’s name.

For dangerous speech-enabled drivers like him, there’s just no reformin’.

Adam B.   —   March 30, 2009 @ 1:21 pm

tweet tweet

Hey Speech-Heads,

So I was talking to my Speech Brother Eric B. this morning about all the usual things–zombies, the crushing flow of time, hosted speech solutions, what to eat for lunch–when it dawned on me: Speech Tech is on Twitter and many of you Social Media Savvy Speech-Heads may be sadly unaware.

So, if you Twitter, follow SpeechTech for One-Stop Speech-Enabled Tweets. We will provide you with all the latest in the world of Speech Technology and updates about SpeechTEK 2009.

And, we will follow you back, I promise.

Adam B.   —   March 26, 2009 @ 11:26 am

Yesterday I interviewed Steve Kowarsky, executive vice president and co-founder of CosmoCom, for a News Feature on the company’s launch of CosmoCall Universe Version 6. If you haven’t seen the story yet, check it out here to learn about all the really cool innovations happening over at CosmoCom.

Following up on that story, we at SpeechTech Blog today bring you a couple of related videos from CosmoCom.

Check out a very funny video about CosmoGo here and a video about CosmoDashboard here.

Eric B.   —   March 25, 2009 @ 10:42 am

In our continuing examination of speech technology in popular culture, I bring to you one of the weirder things we’ve stumbled across.

You’re all doubtless familiar with the Loquendo TTS work my brother, Adam B., does every week for our news items on the mother site, but apparently its popular use runs far deeper. Our investigations show that there is an entire community of Spanish-speaking YouTube users who are using Loquendo’s TTS to make video críticas, or “criticisms.” The críticas are rants, delivered by Loquendo TTS products, chock-full of curses and insults leveled at their subjects, which range from Dragonball Z to emo kids, a subculture of much-maligned, droopy-haired teeners who patronize a genre of existentially sentimental rock music also known as “emo.”

The videos are pretty similar to one another and vary only in target. All, as best we can tell, use the Castilian Spanish male voice font, “Jorge,” and many seem to take advantage of the free demo that Loquendo offers on its website. You can tell because they have the creepy music that Loquendo puts in the background of its demo files–a chorus of synthetic TTS sirens singing Loquendo! over and over again. You’re going to want to check that out for yourself.

For the most part, críticas level their attacks at popular television institutions like House, Pokemon, and the Disney Channel (there seems to be an entire subgenre of just Disney Channel críticas; I found heaps of them on YouTube), but there are also some vaguely offensive works like “How to Seduce a Woman” (in which the narrator explains the importance of body language) and other such lessons.

Here’s a typical specimen:

[youtube]http://www.youtube.com/watch?v=ERueevnv-YU&feature=related[/youtube]

While these things are easy to write off as just a bunch of immature adolescents or, at worst, adults slagging around, críticas still offer some insights about the future of speech.

We mostly think of Loquendo’s TTS offerings as being business-oriented, allowing companies to generate a spoken interface on-the-fly in IVR phone systems or whatever. In these instances, the software essentially allows a big company, an abstract collective, to give a single voice, separate from any real living entity in the world, to itself. This, however, cuts right back in the other direction, and allows individuals to deliver spoken messages anonymously, assuming a synthetic and collectively created voice. Individuals can hide their gender, their age, their nationality–any number of things which might be inadvertently revealed in the expression of their biological voice.

The anonymity in críticas pretty much authorizes users to curse left and right and traffic in the most backward kind of homophobia. That is, it lets them spout all the words they wouldn’t normally say in public–like a vocalized internet flame–so you get a lot of puta madre this and that. In point of fact, just about every other word in these videos is puta, a Spanish insult for a female sex-worker. The TTS writers get pretty creative with it, transmogrifying the word into just about every noun, adjective, and verb form imaginable. Putanizada, putilla, and putón are just some of the flourishes that they luxuriate in.

Also interesting, the otherwise coarse language is couched in fairly complex grammatical structures. The work of one 2Alfredo2, in particular, makes heavy use of interjected clauses. When combined with colorful grammatical plays on common Spanish insult words like jilipolla, the overall effect is both formal and pruriently vulgar. It kind of sounds like a high school English essay gone wrong.

Granted, this work is pretty limited in scope to say the least. There’s only so many kicks you can get out of listening to a machine tell off Hannah Montana with every curse word out of the Real Academia Diccionario de Palabrotas. But TTS is an artistic medium in its infancy. There is real potential for the anonymity afforded to users to do good.

TTS might allow human rights activists under repressive regimes, and other marginalized voices, to express their deepest feelings without compromising themselves. It, moreover, gives them access to auditory-dependent media infrastructures like podcasting. Likewise, the cadences in TTS are, for all Loquendo’s immense advancements in recent years, still sometimes jerky, and, in their strangeness, embody a certain set of aesthetic values that can probably be capitalized on by artists willing to engage with them.

There are also potentially harmful effects. The technology could be used to anonymize all sorts of ill-intent and to disseminate any hateful message one might care to pass along. TTS is just a tool, like any other, but as speech starts making its way out of the rarified corridors of business, it is likely to begin to be plied for all sorts of artistic and political ends.

If you have your own TTS art you’d like to share, please leave us a comment!

Adam B.   —   March 24, 2009 @ 1:49 pm

In yet another twist and turn of the ongoing flap over Amazon’s Kindle 2–you know the one, the whole TTS vs. Copyright Law Controversy–nine disability groups have written to US publishers urging them not to opt out of the TTS function on their e-books for the new Kindle.

The disability groups, which include the National Federation of the Blind and the International Dyslexia Association, said in a letter to Simon & Schuster:

“For a terribly long time those with print disabilities have been consigned to alternative formats with limited choices on expensive special purpose machines. Now that the opportunity for mainstream access to books on equal terms is possible, this community will not allow publishers and authors to deny them the right to read.”

Letters also went to Random House, Penguin, HarperCollins, Macmillan, and Hachette Book Group.

For the full story, check out this link to TechFlash.

For a glimpse at my Speech Brother Eric B. patiently waiting for a fax, check out this link.

Eric B.   —   March 23, 2009 @ 5:01 pm

First of all, I know what you’re probably thinking out there in Speechlandia. Where’s this much talked, much huffed about Dragon 10 review that the Brothers B. have been promising? Well, we’ve had to keep mum about this because of an embargo, but we’re finally unfettered. Shortly after we began our review process, we got a phone call from Nuance HQ.

Hold the presses!

They told us that they were going to be releasing 10.1 and did we want to review it? Did we want to review it? Did we want to review it? Sheeyeah we wanted to review it. We were promised some copies and are now just patiently waiting to begin the process anew with the latest version.

Apparently, the biggest update is that 10.1 is compatible with the 64-bit version of Windows Vista. It also has a couple of fixes.

But, wait! That’s not all!

(more…)

Eric B.   —   March 23, 2009 @ 2:54 pm

Apparently this past weekend was the “in” time to be blogging about IVR-building best practices. It was like some kind of IVR High Holiday. TMCnet opened services with a mammoth “Ten Tips for Improving IVR Functionality” just in time for Friday Shabbat, while Ivrsworld followed with a riveting Sunday mass, “Top Five Tips for Effective Use of IVRs in Call Centers.”

There was a lot of overlap between the two. Both suggested that you should never hide live-agent options or make callers repeat any of their information, and that you should keep prompts short and to the point–sound advice for a would-be designer.

Oh, boy, we really wish we could have been there to deliver our own IVR Saturday call to prayer, but alas, we failed you, Speech Heads. While Adam B. and I were selfishly enjoying our own weekend holiday from speech, we missed this trend. But better late than never, right?

Thusly, I would like to announce Speech Tech Blog’s very own:

Top Three Monday Tips for IVR-Building

(All of which are real original stuff, and cannot be found in any other top anything IVR list.)

(more…)

Adam B.   —   March 18, 2009 @ 12:24 pm

oh my

If there are two things that my Speech Brother Eric B. loves, they are Speech and Fashion.

So naturally, he was frolicking and doing victory laps and slapping high fives and pounding out Terrorist Fist Bumps when we came across the following Speech Gem.

For Japan Fashion Week in Tokyo, the National Institute of Advanced Industrial Science and Technology (AIST) developed a Creepy Talking Robot (CTR) that can walk the catwalk and interact with the audience via speech recognition.

Check out this link for more information about the creators of this CTR–code named HRP-4C.  Or, if you are feeling particularly brave, check out this slide show.

As for Eric B. and me, you know where we can be found: On The Speech-Enabled Cat Walk.


Eric B.   —   March 17, 2009 @ 4:19 pm

Speech Heads, the world of smart phones is a hectic wild west of speech-enabled announcements and developments these days. We’ve got the new BlackBerry Storm that has some users chucking their iPhones like yesterday’s borscht, a new Treo coming out hoping to do the same to the Storm, and the iPhone looking to borschterize the whole lot of them with a software upgrade being unveiled, TODAY!

That’s right! The new iPhone OS 3.0 is being unveiled today! Engadget has coverage here!

Anticipated among the features is a heavy dose of new speech-enabled command-and-control features. The move would remedy what some have seen as wasted potential.

Speech-enabled dialing has ranked on nearly every top 10 list of sorely missed features for the iPhone. While there has been a spate of such dialers available for download on the App Store, it seems rather silly for Apple not to have its own and to cede an obvious feature to a third party.

No word yet as to whether this will be included in the new version of the iPhone OS, but as Apple looks to release and include some speech, I thought I’d help them along by writing up a little list of my:

Top Three Missed Speech Features for iPhone.

1.) Speech-Enabled Conversation Agent

I don’t know how many times you’ve gotten a phone call from someone you didn’t want to talk to, but had to. Probably every day, if you’re anything like me.

Perchance to dream no more!

With the iPhone’s Speech-Enabled Conversation Agent, all those calls can be handled by a real-time TTS agent. Conversation Agent’s recognition engine processes the voice on the other end of the line and provides lifelike, accurate responses. You can choose from a suite of 24 voice-fonts in 4 different languages.

American English users can choose from thickly southern regional Emmett, street-smart Sammy, fast-stuttering Stenny, Brooklyn-brogue Haroldine, and smoker’s-cough Walter.

Use it on bill collectors, relatives, for product briefings, on office conference calls, or on needy significant others who call incessantly to be told they are not an abject failure. The sky’s the limit!

Here’s a sample:

Bill Collector: May I please speak with Mr. Adam B?
Smoker’s-Cough Walt: Who’s askin’?
Bill Collector: My name is Bill Collector. I’m calling on behalf of the Collector Collection Agency.
Smoker’s-Cough Walt: Ain’t you heard? He’s been dead for years…

2.) Speech-Enabled Free Phone Calls

Cell phone bills these days are outrageous, aren’t they? It’s like 400cc’s of human blood a minute to make a call during “peak hours.” This new iPhone feature, however, takes care of that with free telephone calls. Pay absolutely nothing. Local, long-distance, international. Free all the time, everywhere. Oh, also, it’s speech-enabled or something.

3.) Voice Search Room Searcher

“A search engine for your keys.”

Tired of losing things all the time? Tired of desperately scrambling through your hotel room, making sure you didn’t leave anything behind before your noon checkout? Tired of not knowing where grandpa hid the afikoman?

With Room Searcher, let the iPhone do the work. Simply say Seek X and the iPhone will find it. The recognizer engine will process your request and then use the onboard infrared, heat-sensitive, and x-ray capabilities to find whatever you were looking for in a given room.

For a fee, you may also upgrade to Voice Search BountyHunter, which extends Seek capabilities beyond just a given room to the entire Tri-State Area. BountyHunter also adds Seek&Destroy, a new capability that lets users track down any living quarry and remotely destroy it using the iPhone’s onboard HazardArray of projectile missiles.

Adam B.   —   March 16, 2009 @ 12:11 pm

Just last week, British Speech-To-Text provider SpinVox issued a press release listing its “Top Ten Worst Moments In Public Speaking.”

The list–which includes everyone from George W. Bush to Kate Winslet–features links to news stories about and YouTube clips of the Terrible Moments in Public Speaking.

Also in the release are public speaking tips and a list of the Top 5 Worst Public Speakers: 1. Gordon Brown; 2. David Beckham; 3. Kate Winslet; 4. Chris Moyles; 5. Prince Charles.

Well, Speech-Heads, you, like my Speech Brother Eric B., may be wondering what this release has to do with Speech Technology, STT, or SpinVox. All I can say is this: I have no idea.

So, without further ado, here is SpinVox’s Top 10 Worst Moments In Public Speaking:

1. George W. Bush; “Fool me once”; 2002.

2. Delia Smith; “Let’s be ‘avin’ you!”; 2005.

3. Kate Winslet; “Oh, God, who was the other one again?”; 2009 at Golden Globes.

4. Judy Finnigan; “An unfortunate wardrobe malfunction”; 2000 at National TV Awards.

5. Gwyneth Paltrow; “Sobs”; 2005 at Academy Awards.

6. Halle Berry; “Tears and screams”; 2002 at Academy Awards.

7. Boris Johnson; “Olympic handover speech”; 2008.

8. Gerald Ratner; “Total crap speech”; 2001.

9. Kevin Keegan; “I will love it if we beat them”; 1996.

10. Donald Rumsfeld; “Known unknowns”; 2002.

© 2008 - 2010 Speech Technology Media, a division of Information Today, Inc.