Hey Speech-Heads,
My Speech-Brother Eric B. and I were enjoying a leisurely brunch here at The Home Office–reading the newspaper, sipping mimosas and eating our traditional Speech Tech Breakfast of poached eggs, kippers, rashers, fried bread and Jell-O–when we came across a story on the cover of The New York Times about speech technology.
Needless to say, Eric B. spilled a cup of hot tea on his trousers and shrieked like a small child. But there it was: a New York Times story about Google Translate!
Check out the above link to read all the news that’s fit to print about Google Translate. We haven’t seen anything this high-profile since The New Yorker episode of ’08.
Lo, Speech Heads! From high atop Mount Apple, the sentence was handed down. Today, Eric Schmidt, Google’s CEO, will have to find a seat outside Apple’s boardroom. Schmidt had served on the board for three years. In a statement from Apple, Steve Jobs noted Schmidt’s contributions and said that the departure was mutual.
“Unfortunately, as Google enters more of Apple’s core businesses, with Android and now Chrome OS, Eric’s effectiveness as an Apple Board member will be significantly diminished, since he will have to recuse himself from even larger portions of our meetings due to potential conflicts of interest,” Jobs said.
There may be more to the story than just that, Speech Heads. Schmidt’s departure follows a murmur of rumors that the FTC was investigating whether his position on the Apple board would constitute a violation of antitrust laws. It also follows Apple’s rejection of Google Voice from the App Store (meow!), which, according to TechCrunch, is being looked at by the FCC. Apparently, Apple, Google, and AT&T were all sent letters of inquiry on Friday asking about the decision. Citing pending proceedings regarding wireless access and handset exclusivity, the FCC wants to know what role AT&T plays in deciding what makes it into the App Store.
No clue what the answer is, but my brother Adam B. says it’s bound to be juicy!
Now back in the loving embrace of our New York offices, I thought I’d take a look back at the Voice Search conference and give you Speech Heads out there some final thoughts.
As at any trade show, there was of course a fair amount of wheeling and dealing, with companies cozying up to each other to see if they could cobble together some kind of symbiotic relationship that would produce a killer solution capable of reaping mega profits. Sort of like the Power Rangers’ Megazord, that giant fighting robot the Rangers assembled out of their smaller robots.
In all that hubbub, it was pretty clear that there were three companies everyone was trying to integrate their offerings with: Google, Yahoo, and Microsoft.
There was hardly a minute between sessions when Michael Cohen from Google or the gaggle of Microsoft folks weren’t surrounded by eager speech impresarios. Marc Davis from Yahoo, who was only in town for a couple of hours to boost oneSearch at his keynote, was literally deluged by a crush of people wanting to exchange business cards (full disclosure: me too!) before he had to jet back to San Francisco.
The prevailing feeling at the conference, as I described in my last dispatch, was that mobile voice search is where it’s at, and that it is where we will see real and massive growth for speech in the coming years. All heads were turned to giants like Google and Microsoft to lead the way, too. They, many felt, could provide the shake-up that speech has really needed.
The field has been fairly limited in scope for the last few years. Until lately, it hadn’t really expanded far beyond the places it’s traditionally been found: call centers, command-and-control functionality, and dictation. Without new territory, speech has plugged along without ever seeing explosive growth. With Google, Microsoft, and Yahoo entering voice search, the speech community seems excited by the possibilities and, though people might be reluctant to say it on the record, by some of the potential changes in players.
It’s no state secret that Nuance has been dominating speech, acquiring assets like IBM’s patents, Philips’ speech technology, and a slew of others. In the process, as in any aggressive climb to the top, it has stepped on quite a few toes and has no shortage of malcontents. You don’t have to push too hard to get people griping about Nuance in San Diego.
“In a market where there hasn’t been a big brother, [Nuance] rolled up into one,” Joseph Bentzel, chief marketing officer for SpeechCycle and, it should be noted, a competitor, told me. “But in a market where there are bigger brothers doing it for free and virally…” he added before trailing off with half a smile and letting his pause sketch out the possibilities.
Nuance has cast a large shadow over speech, acquiring its way to the top and building a strong speech provider out of ScanSoft, a company that originally handled only OCR scanning software. Even so, Mr. Bentzel thinks it has reached the end of the line as the undisputed king of speech. By his account, voice search will grow the market and create a space outside of Nuance’s purview.
“Nuance will not exist as a leader in 24 months unless Paul Ricci [Nuance's CEO] reads this article and hires me,” Mr. Bentzel jokes.
Part of Nuance’s problem, as he sees it, is that it has tried to become the one-stop solution for all speech needs. It has tried to control the process from the ground up, acquiring technologies and integrating them under its own banner. This has had the effect of freezing other companies out and, in some cases, making them hostile.
“This is the Rebel Alliance,” Mr. Bentzel says of Voice Search. “This is the Luke Skywalker Show. We’re on the ice planet and they’ve ignored us.”
While he seems totally at ease comparing Nuance to the Empire from Star Wars, Mr. Bentzel is also quick to say that everyone in speech ought to “thank Paul Ricci for putting speech on the map.”
“I’m not one of these Nuance haters,” he insists. He says he’s more or less agnostic and only sees problems where market growth is impeded, so forget about thinking he views Ricci as some kind of Darth Vader force-choking everyone at the table.
In fact, he suggests that there wouldn’t be much speech out there without Nuance’s drive to make it a big business.
Mr. Bentzel’s position (and others like it) represents an attitudinal shift in how the field has come to view itself. If I, or anyone else for that matter, made the mistake of saying “speech industry,” there was a group of people on hand just ready to pounce, saying, “Speech isn’t an industry, it’s a tool.” Speech is starting to see itself as a modality subordinate to larger functionality, not an end in and of itself the way it was viewed in its more academic roots.
If you don’t believe me, just try saying “speech industry” yourself at SpeechTek in August. When you walk into that trap, they’ll whip out that little tool mantra like it were a brand-new gun they’d been itching to use and you were the hapless mugger who made the mistake of trying something today.
It’s a crazy mixed up world out there, Speech Heads. Even without the recession, everything is in flux and it seems like everyone is trying something today. Carry a speech-gun and watch your back is my advice.
***SPECIAL NOTE: Due to an oversight entirely on my part, we erroneously reported that Nuance didn’t have much of a presence at Voice Search. In fact, it did. Brad Bargan, Nuance’s VP of product development, participated in several events. My most humble apologies to them and to our readers.***
Now that Google has released Voice Search for its Google Mobile App for iPhone, users are reporting problems with the speech recognition.
Evidently, Voice Search works really well in North America but is posing some challenges for users in Britain.
For example, one user spoke a search for “fish,” only to retrieve the search results for “sex.”
Of course, we at Speech Tech Blog can relate. Every time I do any type of web search, I only get search results about sex. But maybe that’s just me.
Below are a number of stories about similar problems with Voice Search:
Link 1
Link 2
Link 3
Link 4
As promised, here is the Speech Technology News Feature about Google’s new voice search for iPhone.
A few days ago, we brought you this post about Google’s new voice search for the iPhone.
And as expected, Google is being typically reticent about this development.
But fear not, gentle reader: We at Speech Tech Blog are not done yet.
In fact, as soon as I am done with this blog post, I have a strongly worded email to send to a certain massively-powerful-and-ridiculously-secretive-company. A company that has access to all my personal information; a company that has a record of every Internet search I have ever made; a company that will, for obvious reasons, remain nameless.
But in the meantime, check out the following links for more information on Google’s voice search for iPhone.
Official Google Blog
Official Google Mobile Blog
Google Press Center

According to a story in today’s New York Times, Google researchers have added voice recognition to their search software for the iPhone.
According to The Times, the new application will be free and available via the iTunes Store. Basically, users will be able to simply speak searches into their iPhone. The results will then be displayed on their phone and will include local information, when applicable, via iPhone features that determine the user’s location.
We expect an announcement about this from Google, so keep your eyes peeled for a Speech Technology News Feature in the near future.
In the meantime, John Markoff’s New York Times story can be found here.
So, I came across this strange and oddly worded press release today, which announces the creation of a toolbar for Internet Explorer that enables speech recognition for Google’s search engine.
According to the release, when the program–created by developers from Ukraine–is installed, users may enter a search by voice and then either click the search button or say “search!” This redirects users to www.voicesearchbar.com, where they will see the results of their search.
From the release: “Authors of the program recommend launching the training mode to enhance the quality of speech recognition. After 3-4 launches of a training mode program recognizes speech pretty good.”
Hmmm.
Apparently, the VoiceSearchBar toolbar is available for download at www.voicesearchbar.com.
I visited the site–it features an attractive blond woman (who evidently likes speech enabled internet searches) and comes with a special dedication: “Dedicated to Bill Gates: ‘In five years, Microsoft expects more Internet searches to be done through speech than through typing on a keyboard.’”
Hmmm.
The site also features an About page that allows users to make donations.
Hmmm.
I really don’t know what to make of this. Part of me feels like if I were to download VoiceSearchBar, my bank account would be instantaneously drained, a vast array of pornographic images would be sent to everyone in my email address book and my computer would explode.
In search of a second opinion and in an effort to not malign what could be a perfectly legitimate product, I contacted a friend in the Speech Tech World. He seemed to think that my instincts were correct. He also thought it odd that VoiceSearchBar was being released for Internet Explorer, which is not an open source browser.
I don’t know: Maybe VoiceSearchBar is totally legitimate. Maybe not. Does anyone out there have any information about this?
Last week, Google implemented optional audio previews for YouTube comments that allow users to listen to comments before posting them.
What made this move somewhat buzzworthy was that it came shortly after Randall Munroe’s popular webcomic XKCD suggested that Google add audio previews for all comments–so that people might realize how inane the majority of their comments really are.
And really, you can’t argue with that. A random visit to YouTube yields the following:
The YouTube “Spotlight” was “Project’s Slidy Interlude.” I clicked on it and watched “Interlude: Slidy“–which was actually pretty cool: a beatboxing flutist with string instrumentation. But let’s look at the comments. Here are the first three:
- sweetcandy719: “anyway, Lo and behold, I found my man cheating on me with some whorish babe named Julia…”
- mystikcateyez: “Pretty cool music. Stop by my channel and check it out. Thanks alot for making youtube worth watching.”
- godbil4: “awsome”
Truth be told, it could be a lot worse. I have to say that sweetcandy719 probably could have benefited from hearing her comment read back to her. But, that is neither here nor there, I suppose.
My next move was to contact Google for some comment on Randall Munroe and this new speech feature. My first email was met with stony silence. The following email–sent a day later–yielded a response: I was told that my message would be passed along to someone at YouTube who would get back to me.
Someone at YouTube never got back to me. Luckily a lifetime of rejection has given me thick skin and I only wept for a few hours.
But back to the audio preview. I decided to test it out with a random comment for “Interlude: Slidy.”
When I had it speak my comment (“And they have no disregard for human life.” ~ George W. Bush, on the brutality of Afghan fighters, Washington, D.C., July 15, 2008), the speech technology worked pretty well. It didn’t read the quotation marks and had a little bit of trouble with the president’s middle initial, saying “west” instead of “W.” But, all in all, not too shabby.
So, whom do we have to thank for this new feature? Google? YouTube? Randall Munroe?
The world may never know …
As Speech Technology reported here a few weeks ago, Google launched a new audio search indexing experiment that allows users to find spoken words inside videos.
Google Audio Indexing (GAUDI) was developed by Google Labs and lets users search for spoken words inside video clips and jump to portions of a video where the searched words are spoken. For now, the GAUDI tool is available only for election videos, but Google plans to expand its use to other videos.
More importantly, GAUDI works well and is pretty fun to use. So, if you want to locate particularly rousing sections of John McCain’s convention speech based on key words or relive Sarah Palin’s sage remarks about lipstick and pit bulls, GAUDI may be the tool for you!