The Ideal Speech Analytics Solution 

1/3/2014
By Donna Fluss


Speech analytics applications use a variety of mathematical algorithms, other analytic techniques, and contextual and other call/customer-related metadata (increasingly including screen, desktop and text analytics) to structure unstructured conversations. These techniques can be applied to live calls or conversations or to recorded audio files.

The analysis is multi-phased. It starts with a speech engine, either phonetic or large vocabulary continuous speech recognition (LVCSR, also known as speech-to-text), which performs the initial task of converting a conversation into system-readable data that can be analyzed further. Next, each speech analytics solution applies its own technology and methodology to create output that can be analyzed for meaning and searched by the customer. This step, which enriches the output file to improve its usefulness, is critically important and an area of great differentiation among the solutions. Finally, the output is indexed before it is made available to end users for analysis.
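To make the final indexing phase concrete, here is a minimal sketch of an inverted index over enriched call output. It assumes each call has already been reduced to a list of terms. The call IDs, the terms, and the function names `build_index` and `search` are hypothetical illustrations, not any vendor's actual design.

```python
from collections import defaultdict

def build_index(calls):
    """Map each term to the set of call IDs whose output contains it."""
    index = defaultdict(set)
    for call_id, terms in calls.items():
        for term in terms:
            index[term.lower()].add(call_id)
    return index

def search(index, *terms):
    """Return the sorted call IDs that contain every requested term."""
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches)) if matches else []

# Hypothetical enriched output: call ID -> terms extracted from the call.
calls = {
    "call-001": ["billing", "refund"],
    "call-002": ["refund", "cancel"],
    "call-003": ["billing"],
}
index = build_index(calls)
print(search(index, "refund"))
print(search(index, "billing", "refund"))
```

Because the index is built once, end users can run repeated searches without rescanning every call.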

All speech analytics solutions use an underlying speech engine to perform the initial analysis of a conversation. The two primary types of speech engines are LVCSR and phonetic.

LVCSR engines depend on a language model that includes a vocabulary/dictionary to do a speech-to-text conversion of audio files. The text file is then searched for target words, phrases and concepts. Users can add words, names and phrases to the language model, but the accuracy of the recognition depends on the contents of the language model.
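The search step that follows speech-to-text conversion can be sketched roughly as below. This is an illustrative assumption about how a transcript might be scanned for target phrases, not a description of any particular LVCSR product. The function name `find_phrases` and the sample transcript are made up for the example.

```python
import re

WORD = re.compile(r"[a-z0-9']+")

def find_phrases(transcript, phrases):
    """Return {phrase: [word offsets]} for each target phrase found."""
    # Lowercase word tokens so matching ignores case and punctuation.
    words = WORD.findall(transcript.lower())
    hits = {}
    for phrase in phrases:
        target = WORD.findall(phrase.lower())
        if not target:
            continue
        positions = [
            i for i in range(len(words) - len(target) + 1)
            if words[i:i + len(target)] == target
        ]
        if positions:
            hits[phrase] = positions
    return hits

# Hypothetical transcript produced by a speech-to-text engine.
transcript = "I want to cancel my account. Can you cancel my account today?"
print(find_phrases(transcript, ["cancel my account", "refund"]))
```

A phrase can only be found if the engine transcribed its words correctly, which is why recognition accuracy depends so heavily on the contents of the language model.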

Phonetics-based applications separate conversations into phonemes, the smallest components of spoken language. They then find segments within the long file of phonemes that match a phonetic index file representation of target words, phrases and concepts. Phonetic engines do not depend upon a language model or dictionary; instead they rely on the sounds of any spoken word or utterance. Notably, the LVCSR process also breaks words down into phonemes, which is one reason why many LVCSR vendors now claim to perform phonetic analysis.
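Phonetic matching can be pictured as sliding a target phoneme sequence along the phoneme stream. The sketch below is a simplified assumption about how that might work. The ARPAbet-style symbols, the sample stream, and the `max_mismatches` tolerance are illustrative choices, and production engines use far more sophisticated acoustic scoring.

```python
def phonetic_search(stream, target, max_mismatches=0):
    """Return start offsets where target matches the phoneme stream.

    Up to max_mismatches substituted phonemes are tolerated per match.
    """
    if not target:
        return []
    hits = []
    n = len(target)
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        mismatches = sum(1 for a, b in zip(window, target) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits

# Hypothetical phoneme stream for the utterance "please cancel it".
stream = ["P", "L", "IY", "Z", "K", "AE", "N", "S", "AH", "L", "IH", "T"]
# Hypothetical phonetic representation of the target word "cancel".
target = ["K", "AE", "N", "S", "AH", "L"]
print(phonetic_search(stream, target))
```

No dictionary lookup is involved, so a new product name or term can be searched for as soon as its phoneme sequence is known.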

Both phonetic and LVCSR-based engines have limitations and challenges. Phonetic engines remain easier to deploy because they do not depend upon a language model or require users to pre-define all of the words, phrases and terms used by an organization. However, while phonetics-based solutions can process large volumes of data quickly, LVCSR-based solutions have proven to be more accurate in discerning the detailed reasons why customers call.

For this reason, DMG Consulting maintains that the ideal speech analytics solution should include both LVCSR and phonetic recognition engines. This allows an organization to optimize the use of the application and increase the accuracy of its output, as each engine can be applied to the aspects of the analysis it handles best. (See the figure below.)

It is also advisable for the solution to be able to perform a post-call (historical) analysis of the interaction in addition to having the ability to analyze conversations in real time. This is because there are very different uses for post-call and real-time speech analytics.

Real-time speech analytics solutions can identify issues that need to be resolved while the caller is still on the phone. These solutions allow companies to alter the outcome of conversations as they are happening.

Post-call (historical) speech analytics applications allow companies to identify and analyze trends. This data can be used to identify all types of operational and performance issues as well as sales opportunities.

The value of both of these applications increases when they are used together, as they give a company an opportunity to identify an issue in real time and then evaluate its impact on the organization over a longer period.

[Figure: The Ideal Speech Analytics Platform]
