Speech Recognition: Ripe for the Picking 

Speech recognition is proven technology, yet few companies are taking advantage of this low-hanging fruit.

6/1/2002
By Donna Fluss
Customer Interface Magazine

Customer service is going from bad to worse as the weak economy pressures enterprises to reduce costs. Conflicting with this reality, or maybe in response to it, are recent studies showing that customer service is the most important initiative for more than 65 percent of enterprises in 2002. But investments are not keeping up with stated priorities; enterprises are squeezing customer service budgets and the quality of service is declining. The service market is demanding technology that simultaneously reduces cost and improves quality, and speech recognition has what it takes. Speech recognition technology is very compelling, particularly in a challenged economy that purports to emphasize service quality, so it’s counterintuitive that vendors in the speech market continue to struggle and that most have not yet achieved profitability.

Worldwide investments in speech recognition technology are $650 million today and are expected to grow to $5.6 billion in 2006, according to Datamonitor. The market is ripe for speech recognition and the value proposition and economic benefits are clear, yet these products are not realizing their sales potential. Speech recognition should be as common as Interactive Voice Response (IVR) systems and in some companies should actually replace IVR applications, but it’s just not happening on a large scale.
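
To put the Datamonitor forecast in perspective, the short Python sketch below works out the compound annual growth rate implied by the two figures. It assumes that “today” means 2002, the year this article was published, so the forecast spans four years; the function name is illustrative only.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Datamonitor figures quoted above, in millions of dollars.
# Assumption: the $650 million figure is for 2002, so the span is 2006 - 2002 = 4 years.
rate = implied_annual_growth(650, 5600, 2006 - 2002)
print(f"Implied growth: {rate:.0%} per year")
```

Under that assumption, the forecast implies that the market would have to grow by roughly 70 percent a year, every year, through 2006.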

The adoption rate of speech recognition in Customer Relationship Management (CRM) and call center environments, its primary market for the last 20-plus years, has been painfully slow. The acceptance of speech and voice technologies in non-CRM areas during the last two years, however, is exceeding its rate of implementation in call centers and will drive its success in call centers and CRM during the next five years.

A Brief History Lesson

The development of speech recognition technology is a great story. In 1997, after approximately 20 years of development, the accuracy level of speaker-independent directed speech recognition was greater than 95 percent, giving enterprises the confidence to use it in call centers. (At a 95 percent accuracy level, speech recognition is as accurate as many call center representatives and often more polite!) Unfortunately for speech recognition vendors, this was also the time the Internet took off and too many enterprises decided to invest in Web initiatives instead of speech recognition, even though the potential return from speech recognition was higher. We all know what happened with the Web investments once reality hit, but the economic slowdown and the recession that followed continue to put great pressure on enterprises to minimize new technology investments. Speech recognition is again a victim when it should be a must-have.

Although speech recognition technology is viable, there are just a few primary providers due to the high start-up investments. The leading vendors in speech and voice technologies in the United States are Nuance, SpeechWorks and IBM. Outside the United States, particularly in Europe, Philips is a contender, but it has not been successful in the United States.

Speech recognition technology is ready for prime time. It has confronted and addressed many technology challenges in the last five years. System enhancements include:

  • Achieving an accuracy threshold of 96 to 98 percent on speaker-independent directed speech recognition,
  • Reaching 99 percent recognition accuracy when voice verification is used in conjunction with speech recognition,
  • Adding “barge in” capability to improve system usability,
  • Developing natural-language-like interfaces that improve ease of use,
  • Reducing the computing needs of speech applications,
  • Supporting vocabularies in excess of 1 million words, and
  • Introducing vendor-independent VoiceXML development standards.

Market Penetration Obstacles

There are three major obstacles slowing the penetration of speech recognition systems in service environments: proprietary coding languages, the difficulty in developing effective customer interfaces and the cost and time required to build speech applications.

The first challenge, proprietary coding languages, is being addressed with VoiceXML. VoiceXML, introduced in late 1998, is a non-proprietary, open development standard for voice applications, and it is changing the perception of how difficult it is to code voice applications. All four leading speech vendors (Nuance, SpeechWorks, IBM and Philips) support VoiceXML standards and participate in industry committees to enhance this language. VoiceXML is still maturing, so it’s three to five years away from being flexible and complete enough to address the more sophisticated speech applications. Developers still need to use proprietary development code for advanced speech applications, but VoiceXML is fine for basic functionality.

The second major obstacle is the difficulty in designing effective scripts and user interfaces – how the speech recognition system communicates and interacts with system users makes the difference between customer acceptance and rejection of a speech application. The lack of true speech recognition scripting and customer interface domain experts (speech scientists) is exacerbating this problem, as are claims of expertise by companies that clearly lack it. Many speech recognition resellers and consulting firms purport to have the resources to design user-friendly speech recognition applications but experience keeps pushing prospects and users back to the primary vendors whose experienced speech scientists have designed more than five speech systems. When hiring external scripting and customer interface experts, avoid a false start by verifying that the company you are working with has experienced application coders, scriptwriters and speech scientists. Look for people who have built five or more applications and are familiar with your industry.

The third impediment is the perception that it’s very expensive to build speech applications (see Figure 1) and that investments in other technologies and applications have higher and quicker returns for companies. Managers are having a difficult time justifying investments in speech applications even though the payback from speech applications in a service environment can be as short as six to nine months, even for a complicated application. It’s possible that e-mail response management applications that are implemented with the proper procedures and processes could yield a higher ROI, and it’s likely that once CRM analytics software delivers on its promise to increase revenue, it will have a higher ROI. But this isn’t likely to happen anytime soon, so for the foreseeable future, speech recognition appears to be the clear winner.

If speech recognition tracks in a similar pattern to the IVR market, common acceptance in corporations won’t happen until enterprises are able to own their development and interface resources. (As these applications change frequently, sometimes even daily, it’s important to be close to the resources.) VoiceXML addresses the coding issue but not the script and interface challenge. True speech recognition scripting and customer interface experts are rare; many have PhDs in speech and related areas, and they are very expensive. At the current pace it will be three to five years before there is a market of qualified speech scientists.

Driving Acceptance and Adoption

Two factors may speed up the acceptance of speech applications in service environments: the increasing use of voice and speech technologies for many functions, from hands-free cellular phones to talking cars, and the emergence of voice Application Service Providers (ASPs) and packaged application providers.

Voice ASPs and Packaged Application Providers: Voice ASPs and packaged application providers (see Figure 2) are trying to address speech recognition’s current limitations with out-of-box, verticalized systems that do not require a great deal of customization. Unfortunately, many of the enterprises looking for short cuts to entry in the speech market are not willing to compromise their service differentiators; they don’t want to use the same application as their competitors.

The Voice ASPs and packaged solution providers are on the right track by delivering verticalized applications but need to modify their business models. Out-of-box applications are a good starting point but Voice ASPs need to provide application development and customer interface expertise at reasonable prices.

Growing Demand for Speech Applications Outside Call Centers: Until 2000, the predominant use of speech recognition was in call centers in conjunction with IVR applications. As speech remains the most ubiquitous form of communication and is unlikely to be replaced by the Web in the next five years, there’s been a growing demand for speech-empowered applications from many industries hoping to reduce their costs while improving the quality of service. Uses include:

Content – stock quotes, sports, news, weather, and horoscopes;

Retail – placing orders, checking prices and availability of stock items, and locating stores and addresses;

Telcos – voice-activated dialing, information portals, phone-based e-mail readers;

Directory Assistance – speech-enabled directory assistance to replace overworked and often unpleasant phone operators who shouldn’t be serving the public;

Government – applying for loans, checking the status of filed documents, locating the closest post office, directions;

Energy – meter tracking, payment due dates, making payments, reporting gas leaks;

Transportation – providing train and airplane schedules, booking travel reservations, changing or canceling reservations, checking hotel rates and availability, checking status of loyalty programs;

Field Service – checking the status of parts, scheduling service visits, placing orders for repairs and parts;

Entertainment – identifying location of movies or shows, dinner reservations, ticket purchases, directions;

Credit Card – authenticating the caller, obtaining account balances and available credit, making payments, transferring funds, requesting copy of statement, activating account, reporting lost or stolen cards, new marketing promotions;

Automotive – basic car instructions;

Parcel Services – requesting a pick-up, package status, office locations and directions, rate calculations;

Call Center – product and service information, making payments, placing orders, reviewing order status, updating personal information, applying for credit, applying for jobs, locating stores, paying bills, pricing information, marketing promotions.

Future Applications

The uses and industry applications of speech recognition are growing, as are its opportunities within the CRM market. Speaker-independent Speech-to-Text (STT) technologies are developing and are expected to be viable within the next three to five years. Speaker-independent STT applications have huge potential within the CRM and call center markets as they open customer conversations to analysis and analytics, which will allow enterprises to mine these conversations for new revenue opportunities (privacy issues aside). Marketing organizations are eager to capture and leverage the information freely shared by customers in call centers, but do not yet have automated tools to fully understand customers’ intents. Speaker-independent STT gives them these tools.

Figure 3 lists just a few of the potential future applications of speech recognition, displaying the true flexibility of this technology. Directed speech recognition is very flexible and allows enterprises to automate many activities currently being handled by expensive service agents. Automating basic functions will reduce operating expenses and improve quality as studies show that service representatives often get bored handling mundane activities and do a better job dealing with more complex issues.

Compelling Cases

The ROI for speech is very compelling, as the following case studies illustrate. United Airlines and United Parcel Service (UPS) started with small speech recognition applications and added new functionality when the initial applications proved overwhelmingly successful. Both companies realized quick payback, reduced operating expenses and improvements in quality and customer satisfaction.

In 1998, United Airlines, the second largest air carrier in the world, wanted to reduce the cost of processing 1.5 million employee calls to its reservation representatives requesting complimentary travel. Working with SpeechWorks, United Airlines implemented a speech-empowered employee reservation system within four months. United’s employees were very pleased with the self-service speech system since it was personalized and confidential. The employee reservation system paid for itself quickly and freed United’s reservation specialists to handle revenue-generating customers.

In August 1999, United rolled out the industry’s first delayed baggage speech recognition module. Using a speech recognition system to handle a sensitive issue like delayed baggage was challenging, but the automated service has been well received by customers and employees.

Encouraged by the success of the employee reservation system and delayed baggage application, United quickly proceeded to build a flight information speech recognition system for its customers in five months. This module answers questions about United’s 1,700 daily flights, handling 40,000 to 60,000 calls daily and 200,000 calls at peak. Offloading basic informational inquiries to the speech recognition system frees United service agents to handle more valuable revenue-generating calls.

United is continuing to bring to market innovative speech recognition applications that benefit its customers and shareholders and is considering new opportunities to use speech recognition within its call centers. United knows that speech recognition isn’t for everyone, so it allows its customers to opt in or out of the system. But customers who do use the speech-empowered self-service options find them convenient and satisfying.

United does not publicly share its ROI data but its continuous investments in speech recognition applications reflect its huge success.

United Parcel Service (UPS), the largest express carrier and package delivery company with 2001 annual revenues of $30.6 billion, delivers more than 13 million packages and documents per day for 1.8 million shipping customers and 6.1 million recipients worldwide. In 1997, UPS was faced with the challenge of distributing more than 265 million packages in a single month. Dedicated to providing outstanding customer service with minimal or no hold time, UPS knew it couldn’t staff up its already large workforce of 6,000 operators quickly enough to handle the anticipated call volume without a decline in service quality. Instead, it decided to implement a speech recognition application to handle inquiries about package status. Working with Nuance, UPS implemented the system within four months, in time to handle peak holiday call volume. The system paid for itself within three months. The tracking application typically handles 120,000 calls a day but can handle a peak volume of 936 simultaneous callers and 240,000 daily calls.

Since 1997, UPS has continued to introduce innovative and helpful speech recognition applications for its customers. In 1998, UPS deployed a package pick-up module that has shaved two minutes off pick-up requests and has also automated the delivery notice follow-up process to enable customers to locate packages that could not otherwise be delivered. In 2001, UPS deployed a cost calculator module that allows customers to determine the price of package deliveries. As valuable as these innovations have been for saving money and speeding up service delivery, UPS’s customers are also “comfortable with these friendly speech applications because they are in control,” states Joan Madden, Project Manager. “They know our operators are available to assist them 24/7 but are increasingly using our speech applications because they meet their needs.”

A Final Word

Within five years, speech recognition technology will become so pervasive in our daily lives that service environments lacking this technology will be considered inferior. Speech recognition technology is now viable for all companies, but the remaining challenge is the limited availability of experienced user interface and scripting specialists. It’s expected to take another three years before speech recognition scripting and user interface experts are readily available, but enterprises cannot afford to wait. It’s time to invest in speech recognition, even if your script isn’t perfect at first. Just as the market did with IVR applications in the 1980s, carefully monitor new speech recognition applications and be prepared to make frequent scripting changes. As scripts and interfaces are enhanced, your company will develop the necessary scripting and user interface expertise.


Figure 1

Figure 2

Figure 3

