By now, everybody knows about Google expanding its robo-transcription function to include every video on YouTube. In fact, you may have read about it in a recent story in Speech Technology.
Well, that’s all fine and good. We here at SpeechTech Blog love transcription as much as the next guy. But every video? I mean, there are some pretty insane/marginal/terrible videos on YouTube. Take, for example, Mannequin Dance Party. Do we really need this video transcribed? Do we really want it transcribed? Well, maybe. Who am I to judge? And I suppose I shouldn’t be so negative. This next video, titled Talking Carrots, definitely deserves transcription.
But my Speech-Brother Eric B. and I still have doubts about Google’s ability to transcribe every video. So, that is why we are issuing the following Transcription Challenge to YouTube and Google.
My Speech-Brother Eric B. and I were enjoying a leisurely brunch here at The Home Office–reading the newspaper, sipping mimosas and eating our traditional Speech Tech Breakfast of poached eggs, kippers, rashers, fried bread and Jell-O–when we came across a story on the cover of The New York Times about speech technology.
Check out the above link to read all the news that’s fit to print about Google Translate. We haven’t seen anything this high profile since The New Yorker Episode of ’08.
Lo’ Speech Heads! From high atop Mount Apple, the sentence was handed down. Today, Eric Schmidt, Google’s CEO, will have to find a seat outside Apple’s boardroom. Schmidt had served on the board for three years. In a statement from Apple, Steve Jobs noted Schmidt’s contributions and said that the departure was mutual.
“Unfortunately, as Google enters more of Apple’s core businesses, with Android and now Chrome OS, Eric’s effectiveness as an Apple Board member will be significantly diminished, since he will have to recuse himself from even larger portions of our meetings due to potential conflicts of interest,” Jobs said.
There may be more to the story than just that, Speech Heads. Schmidt’s departure follows a murmur of rumors that the FTC was investigating whether his position on the Apple board would constitute a violation of anti-trust laws. It also follows Apple’s rejection of Google Voice from the App Store–meow!–which, according to TechCrunch, is being looked at by the FCC. Apparently, Apple, Google, and AT&T were all given letters of inquiry on Friday asking about the decision. Citing pending proceedings regarding wireless access and handset exclusivity, the FCC wants to know what role AT&T plays in deciding what makes it into the App Store.
No clue what the answer is, but my brother Adam B. says it’s bound to be juicy!
Now back in the loving embrace of our New York offices, I thought I’d take a look back at Voice Search and give you Speech Heads out there some final views.
Like all trade shows, there was of course a fair amount of wheeling and dealing–companies ponying up to each other, seeing if they could hew together some kind of symbiotic relationship that would produce some killer solution capable of reaping mega profits. Sort of like a Power Rangers Megazord, one of those giant fighting robots the Rangers had that were made up of various other smaller robots.
In all that hubbub, it was pretty clear that there were three companies that everyone was looking to try and integrate their offerings into: Google, Yahoo, and Microsoft.
There was hardly a minute between sessions when I didn’t see Michael Cohen from Google or the gaggle of Microsoft folks surrounded by eager speech impresarios. Marc Davis from Yahoo, who was only in town for a couple of hours to boost oneSearch at his keynote, was literally deluged by a crush of people wanting to exchange business cards (full disclosure: me too!) before he had to jet back to San Francisco.
The prevailing feeling at the conference, as I described in my last dispatch, was that mobile voice search is where it was at; that there we would see real and massive growth for speech in the coming years. All heads were turned to giants like Google and Microsoft to lead the way, too. They, many feel, could provide the shake-up that speech has really needed.
The field has been kind of limited in scope for the last few years. Until lately, it hadn’t really expanded too far beyond the places it’s traditionally been found: call centers, command-and-control functionality, and dictation. Without new territory, speech has plugged along without ever seeing explosive growth. With the entrance of Google, Microsoft, and Yahoo into voice search, the speech community seems to be excited by the possibilities, and, though they might be reluctant to say it on the record, by some of the potential changes in players.
It’s no state secret that Nuance has been dominating speech, acquiring technologies like IBM’s patents, or Philips’ speech, and a slew of others. In the process, as you might find in any aggressive climb to the top, it’s stepped on quite a few toes getting there and has no shortage of discontents. You don’t have to push too hard to get people griping about Nuance in San Diego.
“In a market where there hasn’t been a big brother, [Nuance] rolled up into one,” Joseph Bentzel, chief marketing officer for SpeechCycle and, it should be noted, a competitor, told me. “But in a market where there are bigger brothers doing it for free and virally…” he added before trailing off with half a smile and letting his pause sketch out the possibilities.
While Nuance has cast a large shadow over speech, acquiring its way to the top, building a strong speech provider out of a company that originally just handled OCR scanner software, ScanSoft, Mr. Bentzel thinks it’s reached the end of the line as far as being the undisputed king of speech. By his account, voice search will grow the market and create a space outside of Nuance’s purview.
“Nuance will not exist as a leader in 24 months unless Paul Ricci [Nuance's CEO] reads this article and hires me,” Mr. Bentzel jokes.
Part of Nuance’s problem, as he sees it, is that they’ve tried to become the one-stop solution for all speech needs. They’ve tried to control the process from the ground up, acquiring and integrating technologies into their own banner. This has had the effect of freezing other companies out, and, in some cases, making them hostile.
“This is the Rebel Alliance,” Mr. Bentzel says of Voice Search. “This is the Luke Skywalker Show. We’re on the ice planet and they’ve ignored us.”
While he seems totally at ease comparing Nuance to the Empire from Star Wars, Mr. Bentzel is also quick to say that everyone in speech ought to “thank Paul Ricci for putting speech on the map.”
“I’m not one of these Nuance haters,” he insists. He says he’s more or less agnostic and only sees problems where market growth is impeded, so forget about thinking he views Ricci as some kind of Darth Vader force-choking everyone at the table.
In fact, he suggests that there wouldn’t be much speech out there without Nuance’s drive to make it a big business.
Mr. Bentzel’s position (and others like his) represents an attitudinal shift in how the field has come to view itself. If I, or anyone else for that matter, made the mistake of saying “speech industry,” there was a group of people on hand, just ready to pounce, saying, “Speech isn’t an industry, it’s a tool.” Speech is starting to see itself as a subordinate modality to larger functionality, not an end in and of itself the way it has been viewed in its more academic roots.
If you don’t believe me, just try saying “speech industry” for yourself at SpeechTek in August. When you walk into that trap, they’ll whip out that little tool mantra like it were a brand new gun they’d just been itching to use and you were the hapless mugger who made the mistake of trying something today.
It’s a crazy mixed up world out there, Speech Heads. Even without the recession, everything is in flux and it seems like everyone is trying something today. Carry a speech-gun and watch your back is my advice.
***SPECIAL NOTE: Due to an oversight entirely on my part, we had erroneously reported that Nuance didn’t have much of a presence at Voice Search. In fact, they did. Brad Bargan, Nuance’s VP of product development, participated in several events. My most humble apologies to them and to our readers.***
A few days ago, we brought you this post about Google’s new voice search for the iPhone.
And as expected, Google is being typically reticent about this development.
But fear not, gentle reader: We at Speech Tech Blog are not done yet.
In fact, as soon as I am done with this blog post, I have a strongly worded email to send to a certain massively-powerful-and-ridiculously-secretive-company. A company that has access to all my personal information; a company that has a record of every Internet search I have ever made; a company that will, for obvious reasons, remain nameless.
But in the meantime, check out the following links for more information on Google’s voice search for iPhone.
According to a story in today’s New York Times, Google researchers have added voice recognition to their search software for the iPhone.
According to The Times, the new application will be free and available via the iTunes store. Basically, users will be able to simply speak searches into their iPhone. The results will then be displayed on their phone and include local information when applicable via iPhone features that determine user location.
We expect an announcement about this from Google, so keep your eyes peeled for a Speech Technology News Feature in the near future.
In the meantime, John Markoff’s New York Times story can be found here.
So, I came across this strange and oddly worded press release today, which announces the creation of a toolbar for Internet Explorer that enables speech recognition for Google’s search engine.
According to the release, when the program–created by developers from the Ukraine–is installed, users may enter a search by voice and then either click the search button or say “search!” This redirects users to www.voicesearchbar.com where they will see the results of their search.
From the release: “Authors of the program recommend launching the training mode to enhance the quality of speech recognition. After 3-4 launches of a training mode program recognizes speech pretty good.”
Hmmm.
Apparently, the VoiceSearchBar toolbar is available for download at www.voicesearchbar.com.
I visited the site–it features an attractive blond woman (who evidently likes speech enabled internet searches) and comes with a special dedication: “Dedicated to Bill Gates: ‘In five years, Microsoft expects more Internet searches to be done through speech than through typing on a keyboard.’”
Hmmm.
The site also features an About page that allows users to make donations.
Hmmm.
I really don’t know what to make of this. Part of me feels like if I were to download VoiceSearchBar, my bank account would be instantaneously drained, a vast array of pornographic images would be sent to everyone in my email address book and my computer would explode.
In search of a second opinion and in an effort to not malign what could be a perfectly legitimate product, I contacted a friend in the Speech Tech World. He seemed to think that my instincts were correct. He also thought it odd that VoiceSearchBar was being released for Internet Explorer, which is not an open source browser.
I don’t know: Maybe VoiceSearchBar is totally legitimate. Maybe not. Does anyone out there have any information about this?
Last week, Google implemented optional audio previews for YouTube comments that allow users to listen to comments before posting them.
What made this move somewhat buzzworthy was that it came shortly after Randall Munroe’s popular webcomic XKCD suggested that Google add audio preview for all comments–so that people might realize how inane the majority of their comments really are.
And really, you can’t argue with that. A random visit to YouTube yields the following:
The YouTube “Spotlight” was “Project’s Slidy Interlude.” I clicked on it and watched “Interlude: Slidy”–which was actually pretty cool: a beatboxing flutist with string instrumentation. But let’s look at the comments. Here are the first three:
sweetcandy719: “anyway, Lo and behold, I found my man cheating on me with some whorish babe named Julia…”
mystikcateyez: “Pretty cool music. Stop by my channel and check it out. Thanks alot for making youtube worth watching.”
godbil4: “awsome”
Truth be told, it could be a lot worse. I have to say that sweetcandy719 probably could have benefited from hearing her comment read back to her. But, that is neither here nor there, I suppose.
My next move was to contact Google for some comment on Randall Munroe and this new speech feature. My first email was met with stony silence. The following email–sent a day later–yielded a response: I was told that my message would be passed along to someone at YouTube who would get back to me.
Someone at YouTube never got back to me. Luckily a lifetime of rejection has given me thick skin and I only wept for a few hours.
But back to the audio preview. I decided to test it out with a random comment for “Interlude: Slidy.”
When asked to speak my comment–“And they have no disregard for human life.” ~ George W. Bush, on the brutality of Afghan fighters, Washington, D.C., July 15, 2008–the speech technology worked pretty well. It didn’t read the quotation marks and had a little bit of trouble with the president’s middle initial, saying “west” instead of “W.” But, all in all, not too shabby.
So, whom do we have to thank for this new feature? Google? YouTube? Randall Munroe?