Interactive Voice Response
For users with a straightforward enquiry, make it quick for them by offering an Interactive Voice Response (IVR) service. This tutorial will walk you through building an application to do exactly that with text-to-speech (TTS) prompts and keypad input.
The scenario is this: a customer phones a delivery company to find out the status of their order. They will be prompted to enter their order number and then hear a spoken response letting them know the (randomly generated by our example code) status of their order.
This tutorial is based on the IVR use case. All the code is available on GitHub.
In this tutorial
- Get set up - Create an application, configure it to point to your code, and set up the numbers you will use for this tutorial.
- Make the call - Call your application and walk through the prompts to hear the spoken information.
- Code review: Handle an inbound call - How to make the first response to the incoming call.
- Code review: Send a text-to-speech greeting - Greet the user with text-to-speech upon answer.
- Code review: Request user input via IVR - Create a text-to-speech prompt then request user input.
- Code review: Respond to user input - Handle the user order number input and play back status via text-to-speech.
- Tips for better text-to-speech experiences - Inspect the helper methods we use to give better spoken responses.
- Next steps - Some further reading for your enjoyment.
Setting Up for IVR
In order to work through this tutorial you need:
- A Vonage account.
- The Nexmo CLI installed and set up.
- A publicly accessible PHP web server so that Vonage can make webhook requests to your app, or for local development we recommend ngrok.
- The tutorial code: either clone the repository or download and extract the zip file to your machine.
- Learn how to use ngrok
Create a Voice Application
A Vonage application contains the security and configuration information you need to connect to Vonage endpoints and use our products. You make calls to a Vonage product using the security information in the application. When the call connects, Vonage communicates with your webhook endpoints so you can manage your call.
You can use the Nexmo CLI to create an application for the Voice API with the following command, replacing the YOUR_URL segments with the URL of your own application:
nexmo app:create phone-menu YOUR_URL/answer YOUR_URL/event
Application created: 5555f9df-05bb-4a99-9427-6e43c83849b8
This uses the app:create command to create a new app. The parameters are:
- phone-menu - the name you give to this application
- YOUR_URL/answer - when you receive an inbound call to your Vonage number, Vonage makes a GET request and retrieves the NCCO that controls the call flow from this webhook endpoint
- YOUR_URL/event - as the call status changes, Vonage sends status updates to this webhook endpoint
The command returns the UUID (Universally Unique Identifier) that identifies your application - you might like to copy and paste this as we will need it later.
Buy a phone number
To handle inbound calls to your application, you need a number from Vonage. If you already have a number to use, jump to the next section to associate the existing number with your application.
You can use the Nexmo CLI to buy the phone number:
nexmo number:buy --country_code GB --confirm
Number purchased: 441632960960
The number:buy command allows you to specify which country the number should be in, using ISO 3166-1 alpha-2 format. Because we also specify --confirm, we don't need to confirm the choice of number; the first available one is purchased.
Now we can set up the phone number to point to the application you created earlier.
Link phone numbers to the Vonage Application
Next, link the phone number to the phone-menu application you created. When any event occurs relating to a number associated with an application, Vonage sends a web request to your webhook endpoints with information about the event. To link the number, use the link:app command in the Nexmo CLI:
nexmo link:app 441632960960 5555f9df-05bb-4a99-9427-6e43c83849b8
The parameters are the phone number you want to use and the UUID returned when you created a voice application earlier.
Try it yourself!
There's a detailed walkthrough of the code sample but for the impatient, let's try the application before we dive in too deeply. You should have your number and application created and linked from the above instructions; now we'll grab and run the code.
Start by cloning the repository if you haven't already.
In the project directory, install the dependencies with Composer:
composer install
Copy config.php.dist to config.php and edit it to add your base URL (the same URL that you used when setting up the application above).
If you're using ngrok, it randomly generates a tunnel URL. It can be helpful to start ngrok before doing the other configuration so that you know what URL your endpoints will be on (paid ngrok users can reserve tunnel names). There is also a nexmo app:update command if you need to update the URLs at any time.
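As a rough guide, a config.php for this tutorial might look like the sketch below. The base_path key is the one the Menu class reads when it builds callback URLs; the URL is a placeholder. It is an assumption here that the file returns an array which bootstrap.php loads into $config, so check config.php.dist in the repository for the actual shape.

```php
<?php
// config.php (sketch only; the real config.php.dist may differ or hold other keys)

return [
    // Public base URL of your application, without a trailing slash.
    // Placeholder value: replace it with your own server or ngrok URL.
    'base_path' => 'https://example.ngrok.io',
];
```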
All set? Then start up the PHP webserver:
php -S 0:8080 ./public/index.php
Once it's running, call your Vonage number and follow the instructions! The code receives webhooks to /event as the call is started, ringing, etc. When the system answers the call, a webhook comes in to /answer and the code responds with some text-to-speech and then waits for user input. The user's input then arrives by webhook to /search and again the code responds with some text-to-speech.
Now you've seen it in action, you may be curious to know how the various elements work. Read on for a full walkthrough of our PHP code and how it manages the flow of the call...
Handle an Inbound Call
When Vonage receives an inbound call to your Vonage number, it makes a request to the answer webhook endpoint you set when you created the Voice application. A webhook is also sent each time DTMF input is collected from the user.
This tutorial code uses a small router to handle these inbound webhooks. The router determines the requested URI path and uses it to map the caller's navigation through the phone menu, much as URLs map to pages in a web application.
Data from the webhook body is captured and passed in the request information to the Menu:
<?php
// public/index.php
require_once __DIR__ . '/../bootstrap.php';
$uri = ltrim(strtok($_SERVER["REQUEST_URI"],'?'), '/');
$data = file_get_contents('php://input');
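To make the path handling concrete, here is a small sketch of the same strtok() and ltrim() combination wrapped in a function. The name routeFromUri() is hypothetical and is not part of the tutorial code.

```php
<?php
// Sketch: reduce a raw request URI to the bare route name used by the router.
// routeFromUri() is a hypothetical helper, not part of the tutorial code.
function routeFromUri(string $requestUri): string
{
    // strtok() returns everything before the first '?', dropping the query string.
    $path = strtok($requestUri, '?');

    // strtok() returns false for an empty string, so fall back to ''.
    if ($path === false) {
        return '';
    }

    // Remove the leading slash so '/answer' becomes 'answer'.
    return ltrim($path, '/');
}

echo routeFromUri('/search?from=447700900000'), PHP_EOL; // search
```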
Vonage sends a webhook for every change in call status, for example when the phone is ringing, the call has been answered, or the call is complete. The application uses a switch() statement to log the data received by the /event endpoint for debugging purposes. Every other request goes to the code that handles the user input. Here is the code:
<?php
// public/index.php
switch($uri) {
    case 'event':
        error_log($data);
        break;
    default:
        $ivr = new \NexmoDemo\Menu($config);
        $method = strtolower($uri) . 'Action';
        $ivr->$method(json_decode($data, true));
        header('Content-Type: application/json');
        echo json_encode($ivr->getStack());
}
Any request that is not for /event is mapped to an Action method on the Menu object. Incoming request data is passed to that method. The router retrieves the NCCO (Nexmo Call Control Object) and sends it in the response as a JSON body with the correct Content-Type header.
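Because the method name is derived from the request path, your own application may want to restrict which paths can be dispatched. The following is a minimal sketch of one way to do that, assuming an explicit allow-list; DemoMenu and dispatch() are hypothetical stand-ins, not the tutorial's Menu class or router.

```php
<?php
// Sketch: only dispatch to actions that are explicitly allowed.
// DemoMenu and dispatch() are hypothetical stand-ins for the tutorial's Menu and router.
class DemoMenu
{
    private $ncco = [];

    public function answerAction($request = null)
    {
        $this->ncco[] = ['action' => 'talk', 'text' => 'Hello'];
    }

    public function getStack()
    {
        return $this->ncco;
    }
}

function dispatch(DemoMenu $menu, string $uri, string $body): ?string
{
    $allowed = ['answer', 'search'];   // routes the phone menu exposes
    $route = strtolower($uri);
    $method = $route . 'Action';

    if (!in_array($route, $allowed, true) || !method_exists($menu, $method)) {
        return null;                   // the caller can respond with a 404
    }

    // An empty body (for example on a GET request) decodes to null.
    $menu->$method(json_decode($body, true));

    return json_encode($menu->getStack());
}

var_dump(dispatch(new DemoMenu(), 'answer', ''));   // JSON string containing the talk action
var_dump(dispatch(new DemoMenu(), 'getStack', '')); // NULL
```

With this shape, a request for a path such as /getStack is rejected instead of being turned into a method call.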
The $config array is passed to the Menu object, as it needs to know the base URL for the application when generating NCCOs that could include callback URLs:
<?php
// src/Menu.php
public function __construct($config)
{
    $this->config = $config;
}
Generate NCCOs
A Nexmo Call Control Object (NCCO) is a JSON array that is used to control the flow of a Voice API call. Vonage expects your answer webhook to return an NCCO to control the various stages of the call.
To manage NCCOs, this example application uses array manipulation and a few methods.
The router handles encoding to JSON, and the Menu object provides access to the NCCO stack with its getStack() method:
<?php
// src/Menu.php
public function getStack()
{
    return $this->ncco;
}
There are also some helper methods to provide the foundation for managing the NCCO stack. You may find these useful in your own applications:
<?php
// src/Menu.php
protected function append($ncco)
{
    array_push($this->ncco, $ncco);
}

protected function prepend($ncco)
{
    array_unshift($this->ncco, $ncco);
}
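To see how the two helpers affect playback order, here is a self-contained sketch of the same idea. NccoStack is a hypothetical class name used only for this illustration.

```php
<?php
// Sketch: a minimal NCCO stack with the same append/prepend behaviour.
// NccoStack is a hypothetical class name used only for this illustration.
class NccoStack
{
    private $ncco = [];

    public function append(array $action): void
    {
        array_push($this->ncco, $action);    // add to the end of the call flow
    }

    public function prepend(array $action): void
    {
        array_unshift($this->ncco, $action); // add to the start of the call flow
    }

    public function getStack(): array
    {
        return $this->ncco;
    }
}

$stack = new NccoStack();
$stack->append(['action' => 'talk', 'text' => 'Second']);
$stack->prepend(['action' => 'talk', 'text' => 'First']);

// Prints the texts in playback order: First, then Second.
foreach ($stack->getStack() as $action) {
    echo $action['text'], PHP_EOL;
}
```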
Send text-to-speech greeting
Vonage sends a webhook to the /answer endpoint of the application when the call is answered. The routing code sends this to the answerAction() method of the Menu object, which begins by adding an NCCO containing a greeting.
<?php
// src/Menu.php
public function answerAction()
{
    $this->append([
        'action' => 'talk',
        'text' => 'Thanks for calling our order status hotline.'
    ]);
    $this->promptSearch();
}
This is all it takes to return a text-to-speech message: a talk action with the text to speak.
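For reference, this sketch shows roughly what the greeting looks like once the router has encoded it as JSON. It covers only the greeting, not the prompt that promptSearch() adds afterwards.

```php
<?php
// Sketch: the greeting as it would appear in the JSON response body.
$ncco = [
    [
        'action' => 'talk',
        'text' => 'Thanks for calling our order status hotline.',
    ],
];

// JSON_PRETTY_PRINT is used here for readability only.
echo json_encode($ncco, JSON_PRETTY_PRINT), PHP_EOL;
```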
Request user input via IVR (Interactive Voice Response)
For our example application, the user needs to supply their order ID. The prompt lives in its own method, separate from the greeting: if the greeting were included, the user would hear it every time we asked for an order number. The method first adds another "talk" NCCO for the prompt. The next NCCO is where the user's input is received:
<?php
// src/Menu.php
protected function promptSearch()
{
    $this->append([
        'action' => 'talk',
        'text' => 'Using the numbers on your phone, enter your order number followed by the pound sign'
    ]);
    $this->append([
        'action' => 'input',
        'eventUrl' => [$this->config['base_path'] . '/search'],
        'timeOut' => '10',
        'submitOnHash' => true
    ]);
}
The eventUrl option in your NCCO is used to specify where to send the webhook when the user has entered their data. This is essentially the same thing you do with the action property of an HTML <form>. This is where the $config array and the base URL are used.
A few other input-specific properties are used. timeOut gives the user more time to enter the order number, and submitOnHash lets them avoid waiting by ending their order ID with the pound sign (that's a hash symbol '#' for all you British English speakers).
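As an illustration, the input action could be built by a small function that takes the base URL as a parameter. This is a sketch only: buildInputAction() is a hypothetical helper, and the trailing-slash handling is an addition that the tutorial code does not include. The option names are the ones used in the tutorial.

```php
<?php
// Sketch: build the input action from the base URL.
// buildInputAction() is a hypothetical helper; the option names come from the tutorial code.
function buildInputAction(string $basePath, int $timeOutSeconds = 10): array
{
    return [
        'action' => 'input',
        // rtrim() avoids a double slash if the configured base URL ends with '/'.
        'eventUrl' => [rtrim($basePath, '/') . '/search'],
        // The tutorial passes the timeout as a string, so this sketch does too.
        'timeOut' => (string) $timeOutSeconds,
        'submitOnHash' => true,
    ];
}

print_r(buildInputAction('https://example.com/'));
```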
Respond to user input
After the user has provided input, Vonage sends a webhook to the eventUrl defined in the input. Since we set the eventUrl to /search, our code routes the request to searchAction. The request includes a dtmf field which contains the numbers input by the user. We use this input data and randomly generate example data to return to the user. Your applications will probably do something much more useful, such as fetching information from a database. Here's the action:
<?php
// src/Menu.php
public function searchAction($request)
{
    if(isset($request['dtmf'])) {
        $dates = [new \DateTime('yesterday'), new \DateTime('today'), new \DateTime('last week')];
        $status = ['shipped', 'backordered', 'pending'];
        $this->append([
            'action' => 'talk',
            // The ' ' keeps the status phrase from running into the last digit of the order number.
            'text' => 'Your order ' . $this->talkCharacters($request['dtmf'])
                . ' ' . $this->talkStatus($status[array_rand($status)])
                . ' as of ' . $this->talkDate($dates[array_rand($dates)])
        ]);
    }
    $this->append([
        'action' => 'talk',
        'text' => 'If you are done, hangup at any time. If you would like to search again'
    ]);
    $this->promptSearch();
}
As you can see from the search action, the sample application sends some rather silly data back to the user! There is an NCCO that includes the order number from the incoming dtmf data field, a random order status and a random date (today, yesterday or a week ago) as a spoken "update". In your own application, there would probably be some more logical, err, logic.
Once the order information is passed to the user, they are told that they can hang up at any time. The method that adds the order prompt NCCO is reused. That way the user can search for another order, but does not hear the welcome prompt every time.
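In a real application you would probably validate the keypad input before looking anything up. The sketch below shows one possible check. extractOrderNumber() is a hypothetical helper, and it assumes that dtmf arrives as a plain string of key presses, as it does in the tutorial code.

```php
<?php
// Sketch: pull a usable order number out of the webhook payload.
// extractOrderNumber() is hypothetical; it assumes 'dtmf' arrives as a plain
// string of key presses, as in the tutorial code.
function extractOrderNumber($request): ?string
{
    if (!is_array($request) || !isset($request['dtmf']) || !is_string($request['dtmf'])) {
        return null;
    }

    // Accept digits only; an empty string (no input) is rejected as well.
    return ctype_digit($request['dtmf']) ? $request['dtmf'] : null;
}

var_dump(extractOrderNumber(['dtmf' => '12345'])); // string(5) "12345"
var_dump(extractOrderNumber(['dtmf' => '']));      // NULL
var_dump(extractOrderNumber(null));                // NULL
```

When the function returns null, the action could skip the status message and go straight back to the prompt.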
Tips for better Text-To-Speech experiences
The Menu class has a few more methods that improve the process of turning application data into spoken prompts. The data spoken in this application includes:
- the date when the status was last reported
- the order number
- the order status
We have methods to assist us in communicating these values to the user. First: the talkDate method returns a string with a date format that works well for spoken words.
<?php
// src/Menu.php
protected function talkDate(\DateTime $date)
{
    return $date->format('l F jS');
}
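To see what that format string produces, here is a short sketch that applies it to a fixed date chosen only for illustration.

```php
<?php
// Sketch: the same format string applied to a fixed, arbitrary date.
$date = new \DateTime('2021-03-04');

// l = full weekday name, F = full month name,
// j = day of the month without a leading zero, S = English ordinal suffix.
echo $date->format('l F jS'), PHP_EOL; // Thursday March 4th
```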
The talkCharacters method puts a space between each character in a string, so they are read individually. We use this when reporting the order number:
<?php
// src/Menu.php
protected function talkCharacters($string)
{
    return implode(' ', str_split($string));
}
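As a quick illustration, the same technique works as a standalone function. The name spellOut() is hypothetical and the order number is made up.

```php
<?php
// Sketch: space out characters so a text-to-speech engine reads them one by one.
// spellOut() is a hypothetical standalone version of talkCharacters().
function spellOut(string $value): string
{
    // str_split() breaks the string into single characters;
    // implode() joins them back together with spaces.
    return implode(' ', str_split($value));
}

echo spellOut('90210'), PHP_EOL; // 9 0 2 1 0
```

Without the spaces, a text-to-speech engine may read the value as one large number instead of individual digits.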
The talkStatus method converts a very terse constant into a more conversational phrase using a lookup:
<?php
// src/Menu.php
protected function talkStatus($status)
{
    switch($status){
        case 'shipped':
            return 'has been shipped';
        case 'pending':
            return 'is still pending';
        case 'backordered':
            return 'is backordered';
        default:
            return 'can not be located at this time';
    }
}
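The same lookup could also be written with an array and the null coalescing operator, which some readers find easier to extend. This is an alternative sketch, not the tutorial's implementation, and statusPhrase() is a hypothetical name.

```php
<?php
// Sketch: array-based alternative to the switch statement.
// statusPhrase() is a hypothetical name; the phrases are those used in the tutorial.
function statusPhrase(string $status): string
{
    $phrases = [
        'shipped' => 'has been shipped',
        'pending' => 'is still pending',
        'backordered' => 'is backordered',
    ];

    // Fall back to the default phrase for any unknown status.
    return $phrases[$status] ?? 'can not be located at this time';
}

echo 'Your order ', statusPhrase('shipped'), PHP_EOL; // Your order has been shipped
```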
Conclusion
Now you've built an interactive phone menu that both collects input from the user and responds with (albeit fake) information. Instead of using the talk NCCO to inform the user, a connect NCCO could forward the call to a particular department, or a record NCCO could capture a voicemail from the user.
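As a rough sketch of those two alternatives, the arrays below show what such NCCO actions might look like in the same style as the tutorial code. The phone numbers, the /recording path and the variable names are placeholders, and only a few options are shown, so check the NCCO reference for the full set before relying on them.

```php
<?php
// Sketch: alternative NCCO actions. All numbers and URLs are placeholders.

// Forward the caller to another phone number.
$forwardToDepartment = [
    'action' => 'connect',
    'from' => '441632960960',                        // placeholder: one of your Vonage numbers
    'endpoint' => [
        ['type' => 'phone', 'number' => '447700900000'],
    ],
];

// Record a message from the caller.
$takeVoicemail = [
    'action' => 'record',
    'eventUrl' => ['https://example.com/recording'], // hypothetical route in your application
    'beepStart' => true,                             // play a beep when recording starts
];

echo json_encode([$forwardToDepartment], JSON_UNESCAPED_SLASHES), PHP_EOL;
echo json_encode([$takeVoicemail], JSON_UNESCAPED_SLASHES), PHP_EOL;
```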
Next Steps
Here are a few more resources that could be useful for building this type of application:
- Text-To-Speech Guide - includes the different voices on offer and information about SSML (Speech Synthesis Markup Language) for better control of the spoken output.
- Twitter IVR - another fairly silly example but a great example app written in Python.
- Text To Speech With Prompt Calls Using Python on AWS Lambda - a similar application but this time using AWS Lambda (a serverless platform) and Python.
- Code Samples for handling DTMF - examples in various programming languages for handling the user keypad input as used in this tutorial.