In the past, I have been unsuccessful in my attempts to learn another language. From age 10, I was taught Spanish. However, my knowledge of Spanish today basically allows me to identify myself, wish people happy Christmases and birthdays, and ask someone’s name or the time. I can also say “I’m sorry,” which is very helpful when you’re constantly massacring someone else’s language. Then there was Latin (yes, of the dead language variety). What made me think learning Latin was a good idea? I was fresh out of high school and probably dazed from watching Dead Poets Society too many times. After two years of taking Latin, all I really remember is carpe diem (it helped that this was the slogan of my undergrad alma mater, Montclair State University). Anyhow, I decided recently that I’d like to try to learn Spanish again, at least more functionally, and I think that when one learns languages, it’s so important to be able to speak and practice with others.
I read about the new language-learning Web site LiveMocha in (guess where) the NYT. I know. I love The Times. Not only is this site free, but it also lets you connect with other users so you can help them learn your native language, which earns you points to access more lessons for yourself. At first, I balked at this possibility. “But I really, REALLY can’t speak Spanish,” I said. The site, however, neatly sets up small lessons for you to learn and then gives you exercises you can write and record so other users can tell you whether you speak, write and even spell words correctly. Users can also tell you how not to sound all textbookish and overly formal. For instance, I told a user trying to learn English the other day that most people don’t say they’re buying a “little car” unless that car is a toy. Instead, we say “small car.” Also, I love the ability to record, and hear recordings of, the language. While you learn the words and phrases, you hear a voice saying them. You can even turn off the ability to see the translation, and I imagine this feature has come from other language programs (like Rosetta Stone, which I’ve only heard about and never tried). Anyhow, these programs really teach you to learn contextually, and the speech component here really shows itself to be a crucial element, not just a fun extra (though I have no problem with fun extras!).
My only issues so far have been some navigational and aesthetic (or perhaps it is more accurate to say philosophical) ones. I find that it’s sometimes hard to tell where you left off, and the site would improve dramatically if it were structured a bit more intuitively. Also, I was annoyed that the gorda mujer (fat woman) had a weird expression and looked sad while the delgada mujer (skinny woman) was all sexified. Why not sexy fat and skinny ladies? We’re teaching language, not making judgments, LiveMocha, come on! I admit too that it was funny (but unrealistic) that someone would look at an old person and say “You are old.” Also, I felt the images didn’t always correspond easily to the words. For instance, an image that’s supposed to be matched with “you are tall” is of a short boy and a man standing back to back. Sure, one is taller than the other, but one is also shorter. As a short person, I always see the short part first, I think. Anyhow, most of these issues are somewhat minor, and I hope that I will be able to improve my Spanish enough to hold an actual conversation with someone. I’ll follow with more updates if and when I get to more advanced levels. Adios for now, cabezas de discurso.
(P.S. Translation from Google. You can let me know in the comments if that’s totally wrong or not.)
Speech technologies took another hit recently when an article in PCWorld, titled “Google Voice Failures: Lost in Transcription,” pointed to several pretty bad inaccuracies in Google’s voicemail-to-text translation service that is being offered as part of Google Voice.
In a FierceVoIP blog about this particular subject, writer Mike Dolan suggests that “probably the major reason other UC offerings do not employ this wiz-bang technology is that the software still leaves much to be desired.” That statement is completely off-base for a number of reasons. For starters, many other companies, including Nuance, Yap, YouMail, and GotVoice, are offering voicemail-to-text services. A simple Google search for “voicemail-to-text” can uncover dozens of them, including most of the major telcos. In fact, the technology had so much promise that Nuance shelled out $102 million to acquire SpinVox, one of the leading providers of the service, late last year.
Dolan and the PCWorld writer both also omit the fact (and it’s one that Google has freely fessed up to) that the service gets better with use.
Finally, blogger Dolan concludes with the following statement: “The day when all our communications can be siphoned right into our Gmail are still out on the horizon somewhere.”
So, post-Speech Tek 2010, I’m contemplating the future of speech. In particular, I’m mulling over the IVR, and I think I wouldn’t get an argument from anyone, speech-head or not, that the IVR suffers the worst reputation as far as speech technology goes. In her talk, “Your Call is (not that) important to us,” Emily Yellin called for more human-like IVRs. She also argued–and I agree–that companies should use language that is less technical and make their IVRs easier to use in a general sense. When I talked to designers and testers of IVRs for my upcoming feature, I found out that the reason so many of these systems are flawed (some more tragically than others) is that a lot of companies don’t do sufficient (or any) usability testing. The good news is that this is apparently changing. More testing is being done, most likely because companies are seeing that when IVRs are bad, people just don’t use them.
But I’m wondering about a new trend I’m seeing in some new commercials. First, there was the Chase Sapphire commercial called “The Bet.” In this particular commercial, we meet a couple who are meant to embody luxury, style and attractiveness, as per usual, since in credit card commercials the only people who apparently use credit cards are rich and attractive, not former grad students trying to pay down credit card debt accrued while under-employed during a recession (AHEM). Oh, and full disclosure: I use Chase, but not Sapphire, though I remain firmly ambivalent about which credit card might be better than another. They remain, at best, a necessary evil for me.
Anyhow, said couple is on a ski lift, and the man is calling Chase to find out how many points they have. His wife (or partner or girlfriend or mistress) doesn’t believe that he’s going to get directly to a customer representative. He bets her a massage; she loses; silly puns ensue; blah, blah. Anyhow, what I’m getting at is the slogan: “You call. We answer. No waiting.” This commercial is clearly an IVR backlash. In a way, it’s interesting; the company seems to recognize that most customers want to talk to a person. In fact, as Emily Yellin mentioned in her talk this week, there are many websites that tell you how to opt out of an IVR. But Chase Sapphire is taking it to another level: it is saying you won’t even have to worry about that.
What’s especially interesting to me is that this is a luxury card. As Tara Siegel Bernard of the NY Times points out in her review, “Chase also promises cardholders direct access to real customer service representatives based in the United States, without getting lost in call center purgatory, which is truly a luxury nowadays.”
Now that is some bad news: if one company brands the lack of an IVR as a luxury, will other companies follow? Recently, esurance came out with a commercial introducing a slogan I hadn’t heard from them before: “People when you want them. Technology when you don’t.” Now that’s a hopeful twist, right? I’m not so sure. In the commercial, one esurance worker (a youngish man) says he’s just made it easier for customers to access their accounts online. The older co-worker (“Babs”), a woman perhaps in her sixties, says that customers can also reach a representative 24/7. The response to Babs is that all the other workers in the office start imitating robots and saying “does not compute.”
What’s strange about this commercial is that, despite making fun of robots and actually parodying an IVR, it identifies itself as being pro-technology. I’m guessing, however, that the technology cited in the slogan refers to the internet, which is a much more likable solution. The commercial itself seems to be appealing to younger users while also letting older people, who might not be as internet savvy (I’m looking at you, Dad, even though you’ll never get online to read this), know that they can use the phone if they want to. Even on this closer look, it is still troubling for the speech world, though I certainly wondered about esurance’s business model; I’m betting that it might be cheaper and easier for the company if more customers went online.
That being said, these commercials do seem to represent the growing ‘go human’ trend. This then is indeed a pivotal moment for speech technology. Considering all the talk at the conference about changing IVR for the better and how to fight for those changes if you’re a designer, I think the change is on the way. Here’s hoping.
Apparently, we here at Speech Tech are not the only ones obsessed with robots; the New York Times has been doing a series of articles focused on technology that has included a lot of robots: dancing robots, talking robot heads, teaching robots and more. One thing at a time though, since today we’re going to focus on teaching robots. Thanks, NYT, we knew we weren’t alone.
According to the article, robots are used to teach everything from basic social behavior to languages. Most of these robots fall into the less creepy-looking category, i.e., not the rubberized, uncanny-valley humanoid kind. In fact, it does seem that more often now robots are falling into a kind of WALL-E category: robots that are designed to be clearly recognized as robots, but still have some kind of “face” and/or the ability to present an expression. In other words, they’re kind of cute.
My favorite of the new robots the NYT reported on was the Engkey, which is currently used to teach South Korean kids English. Engkey is a squat robot that looks a little like a futuristic Russian nesting doll.
Part of the reason Engkey has come into use is that there is a dearth of native English-speaking teachers available. As a former teacher (who was actually recruited by Korea to teach English but couldn’t relocate), I was especially interested in the idea of robots being used to teach students, especially because it’s so hard sometimes to get students’ attention, and I imagine a robot would get most people’s attention, at least at first. It certainly beats the lame math “games” I played in the eighties, which were presented on “toy” robots or other tech systems meant to make math fun. Yeah right.
Anyhow, the Engkey is speech-enabled, so it can respond to students, telling them if their accent needs work or if they spoke a phrase correctly or not. It can also spin around and give students a high five while showing images of stars on its screen.
Of course people are worried that robots won’t be able to teach students as well as real teachers, though robots certainly won’t run out of patience, form unions or demand to be paid more. At least, not yet.
I have to admit, speechheads, I was a bit of a skeptic when I saw Paro, the robot baby seal, on a TV show, but the article and video in the NY Times may have changed my mind. While in my last post I was being a bit facetious about cuddling with the robot baby (you can’t really cuddle with it, though it is supposed to inspire cuddly feelings in you), Paro is really designed to be cuddly, and unlike my cat Oscar, who likes to express his complaints with plaintive meowing or jumping on my head while I’m sleeping, Paro doesn’t need anything from his owner (well, except about six grand to get him in the first place). Either way, according to the Times, Paro is catching on at nursing homes as a form of ‘robot therapy.’ The aspect of Paro that does seem to make him different from just another stuffed animal (according to the article, stuffed animals were tried but didn’t achieve the same calming effect) is that Paro responds to your speech, can learn to respond to names and will try to ‘adapt’ his behavior. For example, if he is petted a certain way, he will try to move in the same way to be petted again. Alternatively, if you bat poor Paro away, he will try to avoid moving that way again.
And for your further amusement, dearest speechheads, here is a comparison list of some things you could do with my cat versus Paro. Which would you get?
Going to a Sushi Restaurant
Oscar: would beg for sushi, be denied, ultimately try to “hunt” whatever fish are in the decorative tank and knock it over, costing you thousands of dollars in repairs and getting you banned for life from the restaurant.
Paro: as seen in a YouTube video, Paro sits on the counter watching you eat and makes other customers interested in you. You make new friends, even if they might think you’re a little odd.
Watching a Movie
Oscar: demands you play fetch, meaning, you throw–a ball, a tiny stuffed mouse, etc.–he runs to get it and then meows for you to retrieve it and throw it again. In other words, you fetch.
Paro: Does not know what “fetch” is.
Sleeping in the Wee Hours of the Morning
Oscar: wants to let you know there are birds outside. He does this by meowing directly into your ear.
Paro: “sleeps” peacefully beside you and remains motionless.
Playing in the Backyard
Oscar: will defend his territory by howling and poofing out his tail.
Paro: will scare away other animals simply by mewing strangely.
But, as far as cuteness goes, well, I think Oscar might be our winner hands down. But, then again, I’m biased.
My first direct encounter with a GPS system was in the back of my partner’s parents’ (I guess you can call them my sort-of in-laws) SUV. A female voice calmly instructed my sort-of-father-in-law to turn onto Route 95. “But that’s not the right way!” He was very exasperated. His wife said something in response that probably amounted to “Honey, every time you go against the GPS, we get hopelessly lost.” But she probably didn’t use the word “hopelessly.” It was, however, what I was thinking at the time. And oh, did we get lost. After about 700 hours in the car, we did eventually arrive at our destination. In this situation, it did seem that the GPS didn’t really help calm my sort-of-father-in-law. If anything, she seemed to inspire further irritation. At some point I do think there was a “Shut up!” and an eventual silencing of the GPS.
However, my experience is something of a challenge to the argument made in a New York Times article last week in which Bruce Feiler confessed to falling for his female GPS voice (he is kind of joking, but not really). He writes that many people–men at least–have been having sexy thoughts about their GPS voices. Feiler uses the work of Dr. Clifford Nass, a communications professor at Stanford and a consultant for many car companies, to discuss the ideas that surround these disembodied female voices, which are often the voices of actual women. In the article, Nass is quoted as saying that female voices are more likable, but male voices are seen as more competent. He also states that either way both men and women prefer female voices because they are seen as less threatening.
So maybe my experience isn’t as much of a challenge as I thought–perhaps our female GPS voice was seen as incompetent. That being said, my sort-of-father-in-law also thinks MapQuest and Google Maps are incompetent, and, as someone who has been totally and utterly screwed by MapQuest–I was led onto some weird army reserve in New Jersey, which was certainly not where I was headed–I have to somewhat agree with that.
Anyhow, I think this article glosses over what seems to me to be a sticky mire of gender issues. That we long to hear submissive female voices that desire only to cater to our every need strikes me as troubling. And the flip side of submission is incompetence? Yikes.
I also think that if we’re going to do sexy GPS voices, not being equal opportunity is a missed opportunity. Does that fix the problem I mentioned? Not really. However, if people are responding to voices this way, I’m sure there are a number of straight women and gay men who might want a sexy Johnny Depp or Jake Gyllenhaal type voice to tell them how to get where they’re going.
I can only hope that some customers do want spunkier, more confident female voices to tell us where to go. Personally, I wouldn’t mind a little variation from the monotony of a GPS voice, one that could even be customized. Like a voice that sounds like my best friend saying, “Hey, lady, don’t miss the next exit.” And I have to say, she has a very confident tone without harboring any judgement at all.
post script:
You can watch this totally bizarre video, linked from the Times article, of a woman who does the voice-over for some airlines. What’s funny to me is that she sounds like she’s trying to sound more automated even though she’s a real person. What is up with that, speech-heads?
Thinking about having a baby? There are a number of ways to prepare, from taking a parenting class to learning how to change a diaper. However, now, dear speech-heads, you can practice by interacting with Yotaro, an eerie luminescent baby robot created by students at the University of Tsukuba.
Yotaro does more than just glow: he cries and pours fake baby snot from a tiny implanted tube, responds to touch and can be calmed by a rattle attached to sensors. The creators fused an odd mix of somewhat realistic and non-realistic features to create him: Yotaro’s face is a two-dimensional projection from a computer program, but his “skin” is silicone warmed by water. (Insert “Yikes” here.)
His movements are generated by motors beneath a baby blanket. You can see more of this bizarre spectacle in the video.
While I admit encountering fake baby robot snot is not on my list of priorities, as far as robots go, this blog has seen a lot weirder.
My favorite voice avatar might be “Whispery,” which is accompanied by an image of feathers floating in what is probably supposed to be wind, but looks more like smoke. The voice is, as you might guess, a whispery, creepy voice that sounds straight out of a B-movie. But then again there’s “Evil Genius.” It’s so hard to choose!
“Whispery” and “Evil Genius” are just two of Voice Forge’s voice avatars, which allow you or your company to choose a particular voice or voices based on your needs. Each avatar has an image to help you understand what the voice might sound like. You can test out all the voices on the website and make them say whatever you want. For example, I can make the voice “Evil Genius” say “Speech brother Adam B. likes Hannah Montana–he must pay,” or “I have an ingrown hair in my evil genius goatee.”
Voice Forge’s goal is to keep automated voices from sounding “corporate and boring.” You know, just in case your corporate image involves being a “jerkface.” Also, if you were wondering, Jerkface is a pissed off white guy with a hoodie making a blurred angry gesture at you. Who else?
“Wiseguy” is a fellow whose eyes are shadowed by a straw hat (straw hat? I might be behind in my mobsterology, but really?). To me, some of the voices sound just like the other TTS voices I’ve heard. But I have to give bonus points to the voice avatar simply labeled “Dog.” It features a dog that might be a beagle of some kind. No matter what you type in the TTS space, you just hear barking.
So, speech-heads, World Cup season has descended upon us in all its sweaty, rowdy glory. I should disclose now: I remain mostly ambivalent about sports in general, but I know some of you are excited and can’t stop talking and blogging about the World Cup (not to mention those of you who are painting your faces. For you, I have no comment).
Anyhow, this all brings me to the recent Xbox 360 addition, Kinect (which will be available in November, so maybe in time for the other “football”). Kinect’s selling point is that you control it with your (possibly wild) gestures and voice. You will be able to talk live with friends about the game, so you can heckle and argue in real time. Basically, now it’s not only your family (and neighbors) who might hear what you think of the Yankees.* You can have your own network of hecklers.
Microsoft also states in its release that the camera will pan the room to follow you (useful, but creepy) and that the system will be intuitive.
Also, what’s of more interest to me and speech-brother Adam B. is the music component, which will have seven million songs that you can play using voice commands or gestures. The catch: you’ll have to pay for a subscription from Zune. But with that price come possibilities for spying on, er, observing what your friends are playing and, as usual, you will be able to get suggestions on what music you might like. Though I can already tell you, I’m not always into what Adam B.’s listening to.
Oh, and of course there are video games. To control these games, you move or make gestures (the press release says you’ll be able to kick a ball simply by kicking your foot). There are also movies, etc. For all the details, go to the website.
*For the record, don’t come after me. I remain firmly on the fence with this whole Yankees versus whomever conversation. Leave me out of it. I’ll be drinking iced coffee and watching Iron Chef. Now, Flay versus Morimoto, that I’ll have an opinion about.
My grandmother was obsessed with Jeopardy. Everyone knew: you don’t call Katie between the hours of seven and seven-thirty EST because she would be planted on her balding easy chair, watching The Show. She will not be taking your call.
Unfortunately, my grandmother was losing her sight as well as her hearing, which probably meant the next-door neighbors were listening to Jeopardy as well.
My grandma Katie, rest her soul, probably would have liked the talking television from Ocean Blue Software, a company based in the UK and Korea. The telly will be able to use text-to-speech to read programming schedules so viewers (or TV listeners) with failing sight (or no sight at all) can hear what’s going to be on. There is also an option that allows viewers to change fonts, so a person could potentially enlarge the fonts used in the guide.
Soon, the company says, it will also allow a person to change the channels or volume with a simple voice command such as “channel up,” “channel down,” or even “volume up.”
Unfortunately, at the moment, the talking television is only available in the UK, but who knows, dear speechheads, who knows.