Daniel Ziv, VP, Verint * Heath Ahrens, CEO, iSpeech
Amy Neustein, Editor, Springer Series in Speech Tech
The success of the iPhone's SIRI interface has taken a lot of voice recognition experts by surprise. Nevertheless, it is expected to open a lot of doors - and it may have created its own monster.
Speech recognition is a technology that has been brewing slowly since the 1980s, but the combination of cloud computing, powerful smartphones and huge amounts of statistical voice data has been magical. Speech can be rapidly sent to a server where something like 3 GB of RAM is allocated per speech processing session. The server then accesses massive amounts of statistical data to find just the right context for the question so that it can make better sense of the speech. The key word there is disambiguation - the better the system understands the context, the more accurately it understands your speech.
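To make the idea of disambiguation concrete, here is a minimal sketch of context-based rescoring. It is not how SIRI or any vendor's server actually works: the word-pair counts are invented for illustration, and the names BIGRAM_COUNTS, context_score and pick_transcription are hypothetical. The recognizer is assumed to have produced two candidate transcriptions that sound alike, and the context statistics decide between them.

```python
# Toy word-pair counts, invented for illustration (not real statistics).
BIGRAM_COUNTS = {
    ("what", "is"): 60,
    ("is", "the"): 80,
    ("the", "weather"): 40,
    ("weather", "today"): 25,
    ("the", "whether"): 1,
}


def context_score(words):
    """Sum the counts of each adjacent word pair; unseen pairs count as 0."""
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))


def pick_transcription(candidates):
    """Return the candidate whose word pairs are best supported by context."""
    return max(candidates, key=lambda c: context_score(c.lower().split()))


if __name__ == "__main__":
    candidates = [
        "what is the whether today",
        "what is the weather today",
    ]
    print(pick_transcription(candidates))
```

Both candidates sound the same, but only "weather" is well supported by the words around it, so that reading is chosen. A production system would draw on far larger statistical models than a handful of counts.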
Because the technology had been developing quietly across the board, few insiders expected Apple to do so well with the iPhone 4S and SIRI. Now that it has, SIRI has become a major marketing vehicle for Apple. Apple's supplier, Nuance, has meanwhile been in an acquisition mode that is either exciting the industry or terrorizing it with patent suits.
The bigger issue is what happens when a technology - long considered a sideshow - moves to center stage. Who remembers that Symantec, the antivirus company, started out with a database that had a natural language front end? The critics loved the product, but the public quickly became bored.
Companies like iSpeech, founded by Heath Ahrens, provide a software development kit (SDK) that lets any mobile or web developer add a highly effective voice interaction component. Daniel Ziv of Verint, the company quietly behind the "this is being recorded for quality assurance" message you hear on customer assistance calls, showed what happens to these recordings: the interactions are mined for clues about what customers really think. Customer center calls tend to be the canary in the coal mine - they are where customers first show up with their real feelings.
So, having a voice interaction is not only a CRM advantage; it is also an invaluable source for mining true customer sentiment. It is so powerful that, as Amy Neustein, the editor of the Springer Verlag Voice Tech series, pointed out, you can determine a caller's mood and modify your message accordingly. The same capability could also be used in a number of security-related applications.
In Verint's case, they can show how customer sentiment is reflected in key words whose significance is not always obvious, but once you know them - watch out. For example, when customers start calling themselves a "customer" or, even worse, a "valued customer", there is a very high chance they are looking to fire your company. When they use the words "you people" - depending on the context - you just might want to call 911.
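As a rough illustration of this kind of key-word spotting, the sketch below scans a call transcript for the phrases mentioned above. It is an assumption-laden toy, not Verint's method: the risk labels, the WARNING_PHRASES table and the flag_transcript function are hypothetical, and the transcript is assumed to contain only the customer's side of the call.

```python
import re

# Phrases taken from the examples in the text; the risk labels are
# illustrative assumptions, not published thresholds.
WARNING_PHRASES = {
    "valued customer": "high",
    "you people": "high",
    "customer": "elevated",
}

# Ordering used to pick the most severe label among the matches.
_RANK = {"none": 0, "elevated": 1, "high": 2}


def flag_transcript(transcript):
    """Return the warning phrases found in the transcript and the worst risk label."""
    text = transcript.lower()
    hits = [
        phrase
        for phrase in WARNING_PHRASES
        if re.search(r"\b" + re.escape(phrase) + r"\b", text)
    ]
    risk = max((WARNING_PHRASES[p] for p in hits), key=_RANK.get, default="none")
    return {"phrases": hits, "risk": risk}


if __name__ == "__main__":
    print(flag_transcript("As a valued customer, I expected better from you people."))
    print(flag_transcript("Thanks, that fixed it."))
```

Simple phrase matching ignores the "depending on the context" caveat; a real system would weigh the surrounding words and the tone of voice as well.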
Bottom line: voice is here, it comes with a ton of psychology and it may be the gateway to a host of new product interfaces - each one with its own special set of rules and linguistic contexts. And if Nuance doesn't take over the world, you could still do what Apple did with surprisingly few resources and very little money.