It seems like voice-to-text technology is everywhere nowadays. Long gone are the days when Dragon was the only transcription game in town and you had to train it for hours to give the overmatched processors of the day any chance of getting the job done. Now Dragon works out of the box, Siri usually understands and answers your questions the first time, and Interactive Voice Response (IVR) has become the norm in most contact centers. And here’s the thing: we all just take it for granted and expect it to work. If I have to reply to an email or write a long text message on my phone, I do it via voice and save my fat thumbs the hassle – and I’m surprised when I have to do anything beyond very minor editing.
According to Roger Northrop, CTO at Mutare Software, we are experiencing the long-anticipated hockey-stick jump in voice-to-text effectiveness that is bringing the tech mainstream. “The speed of the solutions has become incredibly fast. Not only has the software become more advanced, but the processing power is now up to the task,” explained Northrop. He added that in most instances the heavy lifting of the transcription is done in the cloud, in data centers, and as a result the return of usable text is seemingly instantaneous. “The combination of cloud-based processing and specialized speech-to-text engines geared to effectively accomplish certain tasks has made transcription ubiquitous and practically invisible.”
Types of Voice-to-Text Transcription Engines and How They Benefit Businesses
There are two reasons the improvement in voice-to-text is so important. First, it is faster to speak than it is to type or provide data in any other manner. Second, you can’t search or leverage voice until it is turned into text. To make this capability as effective as possible for various business use cases, there are several types of voice-to-text engines available.
Personal transcription is the type most of us now take advantage of on a daily basis. Dragon Systems (now Nuance) was the real pioneer in this space and is still considered the leader. You can still buy a version of Dragon for your PC, and the software has advanced to the point where it runs very effectively on a modern processor with little or no training. A small percentage of business users – especially those who struggle with typing – use Dragon for everyday business activities like emailing and creating documents. If you spend time getting to know the system, the voice commands are pretty sweet.
If you use the dictation capability on an iPhone or iPad, you are also using the Nuance product, which explains the effectiveness. Google has its own very impressive personal transcription engine, which was initially code-named Majel after the woman who voiced the computer in the original Star Trek series. Both Apple and Google perform the heavy lifting of processing the transcription in the cloud.
Contact Centers – Interactive Voice Response
I remember the first time I called into a contact center and was asked to speak my answers, probably back in 1996 or so, and it blew me away. I was walking back to my office from lunch at the time and it was the talk of the tech team for the next hour. How’d they pull it off so quickly? The answer is with a voice-to-text engine that was customized to recognize answers to specific questions: name, address, customer number, phone number, yes/no, choose an option, etc. This was a lot easier than trying to transcribe every word someone might speak. It also allowed contact centers to take that newly available data and integrate it into their processes by providing agents with all the customer details before they even get on a call. These engines have continued to progress and become more capable, and are now industry standard. Of course, I wish more companies would complete the integration so I didn’t have to answer all the questions with the IVR and then repeat them once I finally speak to an agent.
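To see why a narrow question is so much easier than open transcription, here is a minimal Python sketch of the idea. It is not how any particular IVR product works: real engines constrain the recognition itself, while this toy version only interprets text that has already been recognized. The word lists and the function names `parse_yes_no` and `parse_digits` are made up for illustration.

```python
import re

# Hypothetical answer sets for two kinds of IVR prompt (illustrative only).
YES_WORDS = {"yes", "yeah", "yep", "correct", "right"}
NO_WORDS = {"no", "nope", "incorrect", "wrong"}
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def parse_yes_no(transcript):
    """Return True for yes, False for no, None if unclear."""
    words = re.findall(r"[a-z]+", transcript.lower())
    has_yes = any(word in YES_WORDS for word in words)
    has_no = any(word in NO_WORDS for word in words)
    if has_yes == has_no:
        # Either nothing matched or both did, so the prompt should be repeated.
        return None
    return has_yes


def parse_digits(transcript, expected_length):
    """Return the spoken digits as a string, or None if the count is wrong."""
    tokens = re.findall(r"[a-z]+|\d", transcript.lower())
    digits = []
    for token in tokens:
        if token.isdigit():
            digits.append(token)
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        # Any other word is treated as filler and ignored.
    number = "".join(digits)
    return number if len(number) == expected_length else None


print(parse_yes_no("Yeah, that's right"))
print(parse_digits("four oh seven, 2 1", expected_length=5))
```

Because each prompt only has to tell apart a handful of answers, anything outside the expected set can simply trigger a repeat of the question instead of a bad guess.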
Voicemail transcription is another area where voice-to-text shines. As we discussed in an earlier post:
Tapping into digitally recorded messages and bouncing them against a transcription platform allows the recipient to receive an email or text message within moments containing a voice-to-text translation of the message. This provides the employee with the gist (and often, if the caller doesn’t mumble, the exact transcription) of the voicemail, allowing them to gauge its importance, determine whether they should listen to the entire message for more nuanced insights and prioritize the response time.
According to Northrop, this is another example of a specialized engine. “There are certain cadences to most voicemails, with salutations, names, phone numbers and action items generally included. So there has been a lot of tweaking of the engine to not only perform general transcription, but also to be able to take advantage of the patterns inherent in voicemail.” Transcription of voicemails and immediate notification for users is keeping voicemail relevant in a unified communications world.
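As a rough illustration of the patterns Northrop describes, the Python sketch below pulls a caller name and callback number out of a voicemail that has already been transcribed. It is not Mutare’s engine or anyone else’s. The patterns, the sample message and the function name `summarize_voicemail` are assumptions for the example, and it expects ten-digit US-style numbers that the transcript already renders as digits.

```python
import re

# Assumed patterns for illustration: a "this is <Name>" salutation and a
# ten-digit US-style phone number written with digits.
CALLER_PATTERN = re.compile(r"\b[Tt]his is ([A-Z][a-z]+)")
PHONE_PATTERN = re.compile(
    r"(?<!\d)\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)


def summarize_voicemail(transcript):
    """Pick out the caller's first name and any callback numbers."""
    caller_match = CALLER_PATTERN.search(transcript)
    numbers = [
        "-".join(match.groups())
        for match in PHONE_PATTERN.finditer(transcript)
    ]
    return {
        "caller": caller_match.group(1) if caller_match else None,
        "callback_numbers": numbers,
    }


message = (
    "Hi, this is Dana from Acme. "
    "Please call me back at 555-867-5309 before Friday."
)
print(summarize_voicemail(message))
```

A notification email could lead with those two fields, which are often all the recipient needs to decide whether to listen to the full message.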
“We had a law firm approach us and say, ‘We have 5,000 hours of audio from depositions and we need to transcribe it, review it for keywords and get it back in two days.’ This is where the new Real-Time Transcription engines come into play.”
These are the serious workhorses of voice-to-text. Not only can they transcribe extremely accurately at a tremendously fast clip – up to 45 hours of audio in one hour, according to Northrop – but they can also assess voice biometrics to determine who is speaking during a conversation. This can come in very handy when transcribing a deposition or closed captioning a newscast, for example. “These engines can also do keyword spotting and highlight them in the text with bold or certain colors, and can do the same thing when speakers are speaking with strong emotions, highlighting them in red for anger or green for complimentary,” says Northrop.
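To put the law firm’s request in perspective, here is a back-of-the-envelope calculation in Python using the two figures quoted above: 5,000 hours of audio and up to 45 hours of audio processed per hour. The rest is assumption: that “two days” means 48 hours of round-the-clock processing, that the work splits evenly across engines, and that keyword review adds no extra time. The function name `engines_needed` is made up for the example.

```python
import math


def engines_needed(audio_hours, speedup, deadline_hours):
    """Smallest number of parallel engines that finish before the deadline."""
    processing_hours = audio_hours / speedup
    return math.ceil(processing_hours / deadline_hours)


# Figures from the article: 5,000 hours of audio at up to 45x real time.
# Assumption: "two days" is treated as 48 hours of continuous processing.
processing_hours = 5000 / 45
print(f"One engine: about {processing_hours:.0f} hours of processing")
print(f"Engines for a 48-hour turnaround: {engines_needed(5000, 45, 48)}")
```

Under those assumptions a single engine would need roughly 111 hours, so about three running in parallel would meet the deadline. A human transcriber working in real time would need 5,000 hours for the same job.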
Turning the spoken word into the written word opens up the opportunity to improve business processes, customer service and communications. With the effectiveness of engines continuing to increase and the processing capacity no longer the issue it once was, expect to see businesses look for creative ways to take advantage of voice-to-text transcription.