Speech Standards Improve Service Quality, Customer Experience and Reduce 

3/31/2004
By Donna Fluss
CRMXchange

Rarely does a technical standard directly benefit end users, but in the world of speech technologies it does. Standards foster innovation and reduce the total cost of ownership (TCO) of speech applications, but they have been slow to reach the market. Standards allow programmers to create platform-independent (and ideally vendor-independent) speech applications. Before standards emerged and gained acceptance, developers had to use each speech technology provider’s proprietary development environment to build a new speech application. Because there were few specialists for each proprietary environment, developers with the right expertise and experience were expensive and hard to find. Speech vendors were also tied to specific platforms, which limited the market’s ability to create open, packaged speech applications that could run on any platform. These technical constraints raised the cost of entry and confined investment in speech applications to enterprises with deep pockets. Many companies that wanted to use speech to give their customers a friendlier, more satisfying service experience could not afford to do so.

The New Standards

On March 16, 2004, the World Wide Web Consortium (W3C) released two speech “recommendations” as part of its overall Speech Interface Framework. (The W3C is the accepted standards body for the web. It was founded in 1994 to develop common protocols that ensure interoperability, and today it has over 400 members.) A “recommendation,” in the vocabulary of the W3C, is a fully tested and accepted standard that is ready for market adoption.

The first new recommendation is the Voice Extensible Markup Language (VoiceXML) Version 2.0. “VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations.” (Source: W3C Recommendation, Voice Extensible Markup Language (VoiceXML) Version 2.0, March 16, 2004.) Its purpose is to bring web-based development and content delivery to interactive voice response (IVR) applications. In the words of the specification, its “main goal is to bring the full power of Web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management.” (Source: W3C Recommendation, Voice Extensible Markup Language (VoiceXML) Version 2.0, March 16, 2004.)
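To make the description above more concrete, here is a minimal sketch of what a VoiceXML 2.0 dialog looks like. Because the article itself contains no code, the sketch uses Python’s standard `xml.etree.ElementTree` module to generate the markup. The function name `build_menu_dialog`, the field name, the prompt, and the choices are hypothetical illustration values. The generated document uses a `<form>`, a `<field>` with a `<prompt>`, an inline SRGS `<grammar>`, and a `<filled>` block, but it is an assumption-laden example, not taken from any vendor’s application and not tested on a voice platform.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu_dialog(field_name: str, prompt_text: str, choices: list[str]) -> str:
    """Return a minimal VoiceXML 2.0 document as a string.

    The dialog asks one question, accepts one of `choices` by voice, and
    echoes the caller's answer back. All names and values passed in are
    hypothetical illustration values.
    """
    # Root element. xmlns and xml:lang are set as plain attributes so the
    # serialized document carries them without ElementTree's ns0: prefixes.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS, "xml:lang": "en-US"})
    form = ET.SubElement(vxml, "form", {"id": "main"})
    field = ET.SubElement(form, "field", {"name": field_name})

    # Text the platform's speech synthesizer reads to the caller.
    ET.SubElement(field, "prompt").text = prompt_text

    # Inline SRGS grammar: the recognizer listens only for these words.
    grammar = ET.SubElement(field, "grammar", {
        "type": "application/srgs+xml", "version": "1.0",
        "root": "choice", "mode": "voice"})
    rule = ET.SubElement(grammar, "rule", {"id": "choice", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for word in choices:
        ET.SubElement(one_of, "item").text = word

    # <filled> runs once the field holds a recognized value.
    filled = ET.SubElement(field, "filled")
    reply = ET.SubElement(filled, "prompt")
    reply.text = "You said "
    ET.SubElement(reply, "value", {"expr": field_name})

    return ET.tostring(vxml, encoding="unicode")
```

For example, `print(build_menu_dialog("dept", "Say sales or support.", ["sales", "support"]))` prints a single-line document rooted at `<vxml version="2.0" ...>`. A VoiceXML browser would read the prompt aloud, listen only for the two listed words, and then repeat the caller’s choice, with no low-level audio or resource handling in the application code.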

The first version of VoiceXML was released in 2000 by the VoiceXML Forum, a group of vendors including AT&T, IBM, Lucent and Motorola. After releasing the standard, the Forum submitted it to the W3C, which has managed it since. Although the W3C is slow to release new recommendations, there has been a great deal of innovation since 2000, and an estimated 80 to 100 speech vendors currently use the speech standards.

The second new standard is the Speech Recognition Grammar Specification (SRGS) Version 1.0. This “recommendation” addresses “the syntax for grammar representation. The grammars are intended for use by speech recognizers and other grammar processors so that developers can specify the words and patterns of words to be listened for by a speech recognizer.” A grammar written to this standard describes the “words that may be spoken, patterns in which those words may occur, [and the] spoken language of each word” that are presented to speech recognizers (speech recognition engines). (Source: W3C Recommendation, Speech Recognition Grammar Specification Version 1.0, March 16, 2004.)
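The three ingredients quoted above (words, the patterns in which they occur, and the spoken language) can be illustrated with a small grammar in SRGS’s XML form. The Python sketch below embeds a hypothetical grammar for ordering a drink and adds a toy expander that lists every phrase the grammar accepts. The grammar, its vocabulary, and the helper names `expand`, `phrases`, and `accepts` are assumptions for illustration only: the expander understands plain text, `<item>` with `repeat="0-1"`, and `<one-of>`, and ignores features such as `<ruleref>`, weights, and semantic tags. Real recognizers compile grammars into search structures rather than enumerating phrases.

```python
import xml.etree.ElementTree as ET

# A small standalone SRGS 1.0 grammar in XML form. The rule name "order"
# and the vocabulary are hypothetical examples chosen for illustration.
ORDER_GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="order">
  <rule id="order" scope="public">
    <item repeat="0-1">I would like</item>
    <one-of>
      <item>coffee</item>
      <item>tea</item>
    </one-of>
  </rule>
</grammar>"""


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def _words(text) -> tuple:
    """Split element text (possibly None) into a tuple of words."""
    return tuple((text or "").split())


def expand(elem) -> list:
    """Return every word sequence an SRGS element accepts (toy subset only)."""
    if _local(elem.tag) == "one-of":
        # Alternatives: the caller may say any one of the child items.
        return [seq for child in elem for seq in expand(child)]
    # <rule> or <item>: text and children are spoken in document order.
    seqs = [_words(elem.text)]
    for child in elem:
        seqs = [a + b + _words(child.tail) for a in seqs for b in expand(child)]
    if _local(elem.tag) == "item" and elem.get("repeat") == "0-1":
        seqs = [()] + seqs  # optional item: saying nothing is also valid
    return seqs


def phrases(grammar_xml: str) -> set:
    """All lower-cased phrases accepted by the grammar's root rule."""
    root = ET.fromstring(grammar_xml)
    rule = next(r for r in root
                if _local(r.tag) == "rule" and r.get("id") == root.get("root"))
    return {" ".join(seq).lower() for seq in expand(rule)}


def accepts(grammar_xml: str, utterance: str) -> bool:
    """True if the (normalized) utterance matches the grammar exactly."""
    return " ".join(utterance.lower().split()) in phrases(grammar_xml)
```

Here `phrases(ORDER_GRAMMAR)` yields “coffee”, “tea”, “i would like coffee”, and “i would like tea”, so `accepts(ORDER_GRAMMAR, "I would like tea")` is `True` while “I would like juice” is rejected. That restriction is the point of a grammar: it tells the recognizer exactly which words and word patterns to listen for.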

The W3C is also working on several other speech standards that will help simplify the development and implementation of speech applications. One is the Call Control Extensible Markup Language (CCXML). “CCXML is designed to provide telephony call control support for VoiceXML or other dialog systems.” (Source: W3C Working Draft, Voice Browser Call Control: CCXML Version 1.0, June 12, 2003.) CCXML is generally used in conjunction with VoiceXML, because VoiceXML itself offers only basic call control, such as transferring or disconnecting a call.

New SALT Standard Challenges the Status Quo

The Speech Application Language Tags (SALT) Forum was founded in 2001, and in August 2002 it submitted the SALT specification to the W3C. “The Speech Application Language Tags (SALT) 1.0 specification enables multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). The Speech Application Language Tags extend existing mark-up languages such as HTML, XHTML, and XML. Multimodal access will enable users to interact with an application in a variety of ways: they will be able to input data using speech, a keyboard, keypad, mouse and/or stylus, and produce data as synthesized speech, audio, plain text, motion video, and/or graphics. Each of these modes will be able to be used independently or concurrently.” (Source: The SALT Forum.) VoiceXML and SALT differed from the start, and their programming models still differ today. SALT was designed to handle multimodal devices (PCs, phones, wireless PDAs), whereas VoiceXML was initially accessible only by phone. However, with X+V (XHTML + Voice), which has been submitted to the W3C for consideration, VoiceXML-based applications can also address multimodal devices. A second difference concerns royalty charges: SALT has been royalty-free from the start, which was not always the case for VoiceXML. Today, VoiceXML can be obtained free of charge, just like SALT.

Final Thoughts

VoiceXML and its related standards are mature enough for most organizations to use for everything from basic applications, such as directory assistance, to advanced “natural language-like” applications for customer service and sales. During the past 18 months, these standards have facilitated the delivery of relatively sophisticated packaged applications, with many more on the way. The standards are bringing down the startup costs and TCO of speech applications, enabling companies large and small to invest in and benefit from these technologies.
