Introducing General Availability for Automatic Speech Recognition

Speech recognition has become increasingly popular in recent years. For organizations and individuals, especially businesses that provide a service, this technology can improve the customer experience and reduce operating costs.

We’re thrilled to announce the general availability of Automatic Speech Recognition (ASR) with GetInput XML, an easy way to configure engaging, voice-driven user experiences on Plivo.

Plivo’s ASR technology takes away the heavy lifting that’s often associated with building intelligent, AI-driven voice interactions. It helps you build responsive applications that act on partial recognition results as your customer speaks, and it makes voice transcriptions available to your application in real time!

Here are some ways ASR can greatly enhance your end user experience, as well as help your customer service agents work more efficiently.

  • Conversational IVRs: Upgrade a manual, traditional IVR menu to a speech-driven experience that gets callers the answers they seek faster. Conversational IVRs can do more than just say, “Press 1.”
  • Voice Search: Build virtual assistants that intelligently provide relevant information based on the user’s query.
  • Surveys and Form Fills: Prompt users with questions and automatically capture and transcribe their answers to fill out forms and surveys.

How does it work?

Plivo routes user responses based on a speech or a digit-selection prompt. When collecting a user’s speech as the input, Plivo transcribes and relays the spoken phrases to the specified action URL in real time. When collecting input through digit press, the digits entered by the user are relayed to the specified action URL. For more information, check out our detailed product documentation.
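To make the flow above concrete, here is a minimal sketch of how an application might generate a GetInput XML response using only Python's standard library. The GetInput element and the action URL come from the description above; the attribute names (action, method, inputType), the nested Speak element, and the example URL are assumptions for illustration, so check them against the product documentation before relying on them.

```python
import xml.etree.ElementTree as ET


def build_get_input_xml(action_url: str, prompt: str) -> str:
    """Return an XML document that prompts the caller and collects input.

    Attribute and element names other than GetInput are assumptions made
    for this sketch, not a confirmed schema.
    """
    response = ET.Element("Response")
    get_input = ET.SubElement(
        response,
        "GetInput",
        {
            "action": action_url,   # where the collected input is relayed
            "method": "POST",       # assumed HTTP method for the relay
            "inputType": "speech",  # assumed value for collecting speech
        },
    )
    # Assumed: the prompt is played to the caller from a nested Speak element.
    speak = ET.SubElement(get_input, "Speak")
    speak.text = prompt
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical action URL, used only for illustration.
    print(build_get_input_xml("https://example.com/asr-result",
                              "How can we help you today?"))
```

Building the XML with ElementTree rather than string concatenation means the prompt text and URL are escaped correctly. The Plivo Server SDKs ship their own helpers for GetInput XML, which would normally replace this hand-rolled version.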

How much does it cost?

The amount you’re charged is based on the duration of speech that is analyzed. Charges are calculated at USD $0.02 per 15-second pulse, with the duration rounded up to the next full pulse. For example, if speech was recognized for 35 seconds, the account would be billed for 45 seconds (3 pulses of 15 seconds) of speech, or USD $0.06.
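The pulse-based rounding can be sketched in a few lines of Python. The rate and pulse length are taken from the pricing above; the function names are hypothetical, and treating zero or negative durations as free is an assumption of this sketch rather than a stated billing rule.

```python
import math

PULSE_SECONDS = 15       # billing increment from the pricing above
PRICE_PER_PULSE_USD = 0.02  # USD per pulse from the pricing above


def billed_pulses(speech_seconds: float) -> int:
    """Number of 15-second pulses billed, rounding partial pulses up."""
    if speech_seconds <= 0:
        return 0  # assumption: no analyzed speech means no charge
    return math.ceil(speech_seconds / PULSE_SECONDS)


def billed_seconds(speech_seconds: float) -> int:
    """Billable duration in seconds after rounding up to whole pulses."""
    return billed_pulses(speech_seconds) * PULSE_SECONDS


def charge_usd(speech_seconds: float) -> float:
    """Estimated charge in USD, rounded to cents."""
    return round(billed_pulses(speech_seconds) * PRICE_PER_PULSE_USD, 2)


if __name__ == "__main__":
    # 35 seconds of recognized speech is billed as 3 pulses (45 seconds).
    print(billed_pulses(35), billed_seconds(35), charge_usd(35))
```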

What are some key features of Plivo’s ASR functionality?

Extensive Language Support: Speech recognition support for 27 major languages and their regional variants. Click here to see which languages are supported.

Speech Adaptation With Hints: Improve speech recognition accuracy by providing a set of hint words and phrases expected from the speaker. This feature can greatly improve transcription accuracy for proper nouns, homophones (e.g., “one” and “won”), and domain-specific words rarely used in everyday conversation.

Prebuilt Models: Reduce the amount of time spent configuring an IVR system and select from a range of pre-built models, depending on your use case.

Profanity Filter: Keep your transcriptions clean, and identify and monitor the use of profanity. The profanity filter masks specific words in the transcriptions programmatically forwarded to your application.

Simul-Input Detection: Augment your existing IVRs with speech by enabling the simultaneous detection of DTMF and speech inputs. For example: “Press 1 or say ‘yes’ to accept.” This keeps the existing phone tree intact while letting callers answer by voice.
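As a rough illustration of the “Press 1 or say ‘yes’” prompt, the sketch below shows how an action URL handler might treat a key press and a spoken reply as the same answer. The request parameter names (Digits and Speech) and the function name are assumptions for this example; the product documentation lists the parameters actually sent to your action URL.

```python
ACCEPT_PHRASES = {"yes", "yeah", "yep"}  # illustrative set of spoken confirmations


def interpret_reply(params: dict) -> str:
    """Map a DTMF or speech reply to 'accepted', 'not_accepted', or 'no_input'.

    `params` stands in for the form fields posted to the action URL.
    The keys "Digits" and "Speech" are assumed names, not a confirmed schema.
    """
    digits = params.get("Digits", "").strip()
    # Normalize the transcription: trim, lowercase, drop trailing punctuation.
    speech = params.get("Speech", "").strip().lower().rstrip(".!?")

    if digits == "1" or speech in ACCEPT_PHRASES:
        return "accepted"
    if digits or speech:
        return "not_accepted"
    return "no_input"


if __name__ == "__main__":
    print(interpret_reply({"Digits": "1"}))
    print(interpret_reply({"Speech": "Yes."}))
    print(interpret_reply({"Speech": "no thanks"}))
```

Keeping the decision in one pure function means the digit and speech paths cannot drift apart, and it can be tested without a live call.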

Advanced End-Of-Speech Detection: Automatically detect the end of the user’s speech. Advanced timeout controls help configure the end of speech detection behavior.

Interim Transcription Results: Reduce response times by receiving transcription results in real time with each new word spoken by the caller.

Getting Started

Getting started with Speech Recognition is easy. Head on over to our product guide for detailed references and code samples. All Plivo Server SDKs come with helper functions to work with GetInput XML. Click here for language-specific guides on getting started with Plivo Voice APIs.

Not using Plivo yet? Getting started takes just five minutes. Sign up today!
