Speech Technology Magazine SpeechTEK Conference
 

And Now, A Message From Jim Larson, Co-chair of W3C Voice Browser Working Group

STM Blog @ 10:16 am

SCXML, or the “State Chart extensible Markup Language,” provides a generic state-machine-based execution environment based on CCXML and Harel State Tables. The fourth Working Draft of SCXML has been published at:

http://www.w3.org/TR/scxml/

The main differences from the previous draft are:
1. the modularization of the language,
2. the introduction of profiles and
3. a revision of the algorithm for document interpretation.

The document as a whole has changed significantly, and the W3C Voice Browser Working Group welcomes review. Please send comments to www-voice@w3.org.

Jim Larson
Co-chair, W3C Voice Browser Working Group

Q) Dammit, Jim, what is this?

A) State charts are used by software developers to specify the “flow of control.” State charts can be thought of as a type of flow chart that describes the order and sequence of things that happen during the execution of an application. Basically, a state chart consists of “states” and “transitions.” Think of a “state” as something that the application does, and a “transition” as the rule for moving from one state to another.
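As a rough illustration of the “states” and “transitions” idea, here is a minimal sketch in Python. It is not SCXML and does not follow the Working Draft's interpretation algorithm; the `StateMachine` class and the state and event names are all hypothetical, chosen to resemble a simple speech dialog.

```python
class StateMachine:
    """A toy flat state machine: named states plus event-driven transitions."""

    def __init__(self, initial, transitions):
        # transitions maps (current_state, event) -> next_state
        self.state = initial
        self.transitions = transitions

    def fire(self, event):
        """Apply one event; stay in the current state if no rule matches."""
        key = (self.state, event)
        if key in self.transitions:
            self.state = self.transitions[key]
        return self.state


# Hypothetical dialog flow: prompt the caller, listen, then confirm or retry.
DIALOG_TRANSITIONS = {
    ("prompt", "prompt_done"): "listen",
    ("listen", "recognized"): "confirm",
    ("listen", "no_match"): "prompt",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "prompt",
}


def run(events):
    """Feed a sequence of events through the dialog and return the visited states."""
    machine = StateMachine("prompt", DIALOG_TRANSITIONS)
    visited = [machine.state]
    for event in events:
        visited.append(machine.fire(event))
    return visited
```

Calling `run(["prompt_done", "recognized", "yes"])` walks the dialog from `prompt` through `listen` and `confirm` to `done`. A real state chart adds nested and parallel states, which this flat lookup table does not attempt to model.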

State charts were originally developed by the mathematician David Harel and are included in the Unified Modeling Language (UML). State charts offer clean, well-thought-out semantics for sophisticated flow control constructs such as sequential, conditional, and parallel flow control. State charts have been defined as a graphical specification language, however, and hence do not have an XML representation. The goal of this document is to combine Harel semantics with an XML syntax.

We expect that the SCXML notation specified in this document will be used to specify the flow control of multimodal and speech-only applications. The notation may also be used to specify flow control for other types of applications.

VoiceSearch 08 : Final-Final Thoughts From Jim Larson

STM Blog @ 12:19 pm

James Larson, Ph.D., is co-program chair for the SpeechTEK 2008 Conference, co-chair of the World Wide Web Consortium’s Voice Browser Working Group, and author of the home-study guide The VoiceXML Guide. He can be reached at jim@larson-tech.com. He was kind enough to submit some thoughts on the recent Voice Search Conference in San Diego.

1. Voice search can be defined as (a) using voice to search text information, and (b) using voice to search voice information. There was little discussion of the second type of voice search. There were many talks about the first type, especially for directory assistance, customer info lookup, and music “jukebox” applications.

2. While much of the conference dealt with voice search, several sessions addressed other speech technology topics. For example, the folks from Spoken talked about Secret Agents. A secret agent is a human who monitors several ongoing IVR dialogs. The agent is notified when the speech recognition engine fails to understand what the user said. The user’s utterance is replayed to the agent, who either selects the appropriate word from the grammar or causes the dialog to transfer to a regular human agent. The overall effect for the user is that the dialog works better.

I note that AT&T did this some time ago for directory assistance calls.

The goal of secret agents is to contain the user within the automated IVR system. As we saw from Paul English and the gethuman.com web site, users hate containment, especially if they have a difficult request that they feel can only be handled by a live agent. I wonder how these users would feel if they knew that a secret agent is listening to them but is not allowed to speak directly with them.

3. Mike Phillips of Vlingo has a nice demonstration for accessing textual data by voice. Vlingo has done a lot of usability testing, and it shows when you use the UI, which I think is very good. Check out the UI by going to http://www.vlingo.com/ and clicking “watch the demo.”

4. Three hot topics of discussion were:

(1) multimodal user interfaces

(2) analytics

(3) video and voice dialog.

Most conversations dealt with how cool these new technologies were and how to make money using them.

5. I had a chat with David Thomson, who gave a talk about how phones can be used in social web sites. We see opportunities for speech technology in social web sites:

(1) Provide simple authoring tools so teens can create speech dialogs for their portrayed personas.

(2) Viewers could call a phone number and leave messages, which could be converted to text by general purpose dictation recognition.

(3) The virtual equivalent of an answering machine that could accept VoIP calls, filter them, and route them according to instructions by the social web site owner.

I think there are many opportunities for speech technologies in social web sites.

© 2008 Speech Technology Media, a division of Information Today, Inc.