VoiceSearch 08 : AVIOS Student Application Contest Winners
AVIOS had a wine-and-cheese fest in which they announced the winners of their 2007 Application Contest. I’m sure they’ll post detailed information on the site in the coming days.
There were three winners:
Michael Turner studied with Jim Larson at Portland State University. He designed a collection of five children’s games.
Michael Laskowski is also from Portland. I’m not sure if it was Portland State University, because my pen slipped into my wine glass as I was writing, so that information is lost forever. He designed a maze game for children with speech impediments. Children maneuvered their avatars through the maze using repeated words. The game reinforced memorization as well.
Also, it delivered a stunning electric shock if a child said the wrong thing.
Kidding.
Finally, Jamey White studied under Juan Gilbert at Auburn University. He designed a training application for pilots.
I conducted two quick interviews with Jamey White and Michael Turner.
Interview with Jamey White (Auburn University)
RJ: Tell me about your application.
JW: It actually is now titled VFR, which stands for Visual Flight Rules. It’s an aircraft term. It’s a radio communications simulator. Its intent is to provide pilots with a way to efficiently simulate communications with air traffic control without actually requiring air time.
So it’s a training application to get pilots familiar with the vocabulary?
Correct. There are some applications out there that come close to this, but they don’t offer the recognition aspect of it. One that I actually toyed around with while trying to learn the language presents scenarios with recordings of towers saying their part. You as the pilot say what you’re supposed to say, but it only records it. There’s no processing of what you said, so you could still say it incorrectly and the system would never know it. At the end of the scenario, you can play back your recording, but nothing checks it for you. It trains you on what you might hear as opposed to what you should say. This application processes what you should say and gives you helpful feedback.
And they progress from level to level based on whether they say the correct thing or not?
That’s right. It will recognize your voice. It’s a multimodal application. The idea was to make the audio interface as realistic as possible. Let’s say you leave a piece of information out. The program should respond the way a control tower will respond. For instance, as a pilot making initial contact, some pieces of information you’re required to say would be the tower you’re addressing, your plane model, and your tail number. You also have to give your altitude, your heading, your intent.
Say, for instance, a pilot does not address the tower or give their tail number. The control tower needs that information and will respond: “November Six-Seven-Niner-Zero, What are you? Identify yourself?” As far as the audio interface goes, it tries to mimic that response. Visually, you’ll get a text blurb explaining what you left out and how to restate. That adds the learning aspect to it.
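The completeness check White describes can be sketched in a few lines of Python. This is only an illustration, not his implementation: the field names, the `parsed_call` dictionary (standing in for whatever the speech recognizer hands back), and the wording of the feedback are all assumptions.

```python
# Hypothetical sketch: flag required pieces missing from an initial-contact call.
# The field list follows the items named in the interview; the names are made up.
REQUIRED_FIELDS = [
    "tower",
    "plane_model",
    "tail_number",
    "altitude",
    "heading",
    "intent",
]


def missing_fields(parsed_call):
    """Return the required fields that are absent or empty, in a fixed order."""
    return [name for name in REQUIRED_FIELDS
            if parsed_call.get(name) in (None, "")]


def feedback(parsed_call):
    """Build the text blurb shown to the trainee pilot."""
    missing = missing_fields(parsed_call)
    if not missing:
        return "Transmission complete."
    readable = ", ".join(name.replace("_", " ") for name in missing)
    return "You left out: " + readable + ". Please restate with that information."
```

Given a call with every field except the tail number, `feedback` names only the tail number, which is roughly the behavior of the text blurb described above.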
I didn’t know communications were that tightly regulated.
There’s actually a strict protocol defined by the FAA. They have instructions for each event, how to progress through the communication. However, as communication goes, a lot of pilots inject slang, dialect, variations on these protocols, which a learning pilot may pick up. We’ll refer to it as improper dialogue. That may be fine for smaller airports, but at your larger, busier ones, you need to follow protocol.
For instance, giving too much information. Say a pilot is contacting the initial approach tower. They may add information: “Foxtrot, flying at an altitude of five thousand feet, heading one-zero-one.” The problem there is they injected a lot of extra information. In reality, the tower wants it as fast as possible. You don’t need to give excess information. The extra “ums” and “uhs” and “this is” and “I’m flying.”
It reduces the rate of transmission errors when you’re not giving excess information, when the tower knows exactly what to listen for. There’s a greater chance they’ll understand exactly what you said.
Do you have family members that fly?
I have two cousins. One is commercial, going into Delta. The other is just an amateur hobby pilot. I’ve always wanted to be a pilot. Not quite there yet, but this project is pushing me towards that.
Where are you right now?
Step one. Finding a ground school and taking the courses. I actually haven’t begun the training to be a pilot yet. Hopefully it’ll be in the near future.
A friend of mine got his pilot’s license. He got one of these build-it-yourself kits. It looked like origami.
Not sure if I’d trust a plane I built myself.
Are you trying to fly professionally or will it be a hobby?
Probably a hobby. I’d like to professionally pursue more VUI development.
Anything in particular?
Probably mobile applications. I’m getting some exposure to that area and the potential is there. I’d like to pursue some technology on that side.
What was the nature of your studies with Dr. Gilbert?
It started with databases. My first two courses in the graduate curriculum focussed on data mining and information retrieval. He offered additional courses. I’d never had any exposure to voice user interfaces, and I took the first course, which focussed predominantly on voice alone. I loved it, and the follow-up course got me into multimodal development. I’m leaning more towards that industry now. It’s definitely a key element in mobile speech recognition development. I’d like to be a part of that future.
Interview with Michael Turner (Portland State University)
RJ: So what does your application do?
MT: It’s a menu with five different options, five different stories. One of the stories is about a little boy named Freddy who’s having these difficulties. And it asks, “How do you think Freddy feels?” One of the things, for example, is that Freddy’s brother comes into Freddy’s room, takes his toy and breaks it. How does Freddy feel: Mad, Happy, or Sad? The answer to that is Mad.
Then it says “Good Job.” If you messed up, it says “Good try.” And it’ll repeat the prompt.
Another story is you’re helping out at your grandmother’s orchard. She sends you out to get three apples, then sends you out again to get one more. How many apples do you have? Three plus one is four. So it’s kind of a learning tool.
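The prompt, check, and re-prompt pattern Turner describes for these stories might look something like the Python below. It is a rough sketch, not his code: `run_question`, `get_reply`, and `say` are hypothetical names, and a real version would sit on top of a speech recognizer and recorded prompts rather than plain strings.

```python
# Hypothetical sketch of the prompt / check / re-prompt loop from the stories.
def run_question(prompt, choices, answer, get_reply, say=print):
    """Ask until the correct choice is given; return the number of attempts.

    get_reply is any zero-argument function returning the child's answer as
    text (a stand-in for the speech recognizer). say outputs a prompt.
    """
    attempts = 0
    while True:
        say(prompt + " " + ", ".join(choices) + "?")
        attempts += 1
        reply = get_reply().strip().lower()
        if reply == answer.lower():
            say("Good job.")
            return attempts
        # Wrong answer: encourage, then loop around and repeat the prompt.
        say("Good try.")
```

For the Freddy story the call would be roughly `run_question("How does Freddy feel:", ["Mad", "Happy", "Sad"], "Mad", input)`, with `input` standing in for the recognizer.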
I used my wife’s voice (in the application). I recorded her and put her in the final version.
Why?
I didn’t really like the standard speech algorithm that was being used. I wanted to use something that sounded more human-like.
It sounded too much like The Terminator?
Yes. I know you can get ones that are more human, but you have to pay for them.
What age range are you targeting?
I tried it out on my niece who is four. She was able to do it, but got bored after a while. Of course, she’s four. So maybe at the five to seven range, I’ll add more games to it, more interactive things. One of the options they can do is choose their age. And from there, it can (provide the appropriate game).
So when they hit fifteen, the game just turns into Grand Theft Auto?
Yeah, it’ll say “Go buy another game.”
Kids don’t always enunciate well. Was that a factor with the speech recognition?
It wasn’t initially because I wasn’t thinking about that side of it. But ultimately if I were to pursue this, it would be an issue that I’d have to work out to make sure what they say is accepted by the grammar.
Will you continue with this project?
Not at this moment. It was a school project. I might pick it up at some point. I thought the idea is a good one. Maybe you could put it in a child’s toy, interact with the toy, like a stuffed animal, and have it talk back to you.
There’s a new Elmo doll that does that.
That would be interesting to see if I could do something like that.
What are you looking into?
Many things that have to do with programming. Not necessarily speech-related. I’ve been a C Sharp Dot Net programmer for two and a half years. I’ve done networking, web design, I’ve had a lot of experience. I like speech and I’m doing that in a project I’m currently working on, and we’ll see if I pursue that or another field in IT.
So this project was your first foray into speech?
This is my first foray into speech. I’ve taken the class at Portland (State University) and started liking it. Currently I’m working on a large-scale business solution for the cabinetry industry. The program that I’m working on started as a CAD tool. It’s moved on to a business solution where not only do you have CAD, but you also have contact managers, support capabilities, document management, campaign management. It links up to your phone. You can send text messages to your phone as alerts.
The speech aspect is a recent addition. We’ll add more things but right now it just makes the lives of the people using it a lot easier. Our whole thing is making it lean, letting people get in and get out. For example, you can have it read the equipment listing. It will query the schedule, retrieve those, and read them to you.
What do you want it to do?
I want to be able to put in wild card matching so you could really give it anything. In a schedule, for example, instead of saying just “today” or “tomorrow,” you could say “read the appointments for Monday the 5th.” You’re not as restrained. Perhaps linking it to a login script so it recognizes your voice, so it’s locked into you. That would be a more advanced feature.
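To make the looser date matching concrete, here is a small Python sketch of one way to turn a spoken phrase such as "today", "tomorrow", or "Monday the 5" into a calendar date. It is an assumption-laden illustration, not anything from Turner's product: `resolve_day` is a hypothetical helper, the accepted phrasings are invented, and an ambiguous phrase is resolved to the next matching date on or after today.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Accepts "today", "tomorrow", or "<weekday> the <day>" with an optional suffix.
PATTERN = re.compile(
    r"^(today|tomorrow|(" + "|".join(WEEKDAYS) + r") the (\d{1,2})(?:st|nd|rd|th)?)$"
)


def resolve_day(phrase, today):
    """Return the date a phrase refers to, or None if it is not understood."""
    match = PATTERN.match(phrase.strip().lower())
    if match is None:
        return None
    if match.group(1) == "today":
        return today
    if match.group(1) == "tomorrow":
        return today + timedelta(days=1)
    weekday = WEEKDAYS.index(match.group(2))  # Monday is 0, as in date.weekday()
    day_of_month = int(match.group(3))
    # Walk forward day by day until weekday and day of month both agree.
    for offset in range(366 * 12):
        candidate = today + timedelta(days=offset)
        if candidate.weekday() == weekday and candidate.day == day_of_month:
            return candidate
    return None  # for example "monday the 32"
```

A scheduler could then query appointments for the returned date. Whether "Monday the 5" should mean the next such date or one in the current month is a design choice this sketch does not settle.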
Realistically will that happen soon?
Not at this point.
You took one of Jim Larson’s courses in developing speech applications at Portland State University?
I’m not with him right now. I took his class in the summer, but didn’t know that much about it at the time.
So why’d you take it?
It fulfilled my elective. Honestly.
That worked out pretty well.
It did. And it fulfilled my elective. It was a good class, I had fun, and I like what I’m seeing in this industry.

Ryan…where do you find these videos? I bet our readers would love to know why our snarkiest writer has knowledge of the newest Elmo doll.
Comment by Lauren
— March 11, 2008 @ 11:41 am
Ryan,
Thanks for the interview… I had a great time at the conference.
Comment by Mike Turner
— March 11, 2008 @ 10:36 pm
Ryan,
I believe Mr. White, from Auburn University spells his name Jamey.
Comment by Peg
— March 18, 2008 @ 12:38 pm
Thanks. I checked the directory and he does. Fixed.
Comment by Ryan
— March 18, 2008 @ 12:45 pm