NCCO reference

A Nexmo Call Control Object (NCCO) is a JSON array that you use to control the flow of a Voice API call. For your NCCO to execute correctly, the JSON objects must be valid.

While developing and testing NCCOs, you can use the Voice Playground to try out NCCOs interactively. You can read more about it in the Voice API Overview or go directly to the Voice Playground in the Dashboard.

NCCO actions

The order of actions in the NCCO controls the flow of the Call. Actions that have to complete before the next action can be executed are synchronous. Other actions are asynchronous: they continue over the following actions until a condition is met. For example, a record action terminates when the endOnSilence option is met. When all the actions in the NCCO are complete, the Call ends.

The NCCO actions and the options and types for each action are:

| Action | Description | Synchronous |
| --- | --- | --- |
| record | Record all or part of a Call. | No |
| conversation | Create or join an existing Conversation. | Yes |
| connect | Connect to a connectable endpoint such as a phone number or VBC extension. | Yes |
| talk | Send synthesized speech to a Conversation. | Yes, unless bargeIn=true |
| stream | Send audio files to a Conversation. | Yes, unless bargeIn=true |
| input | Collect digits or capture speech input from the person you are calling. | Yes |
| notify | Send a request to your application to track progress through an NCCO. | Yes |
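Because an NCCO must be valid JSON, it can help to check a document before serving it. The following is a minimal sketch, not part of the Voice API: the helper name validate_ncco is hypothetical, and it only checks the basic shape (a JSON array of objects whose action is one of the names listed above), not the options of each action.

```python
import json

# Action names taken from the table of NCCO actions.
KNOWN_ACTIONS = {"record", "conversation", "connect", "talk", "stream", "input", "notify"}


def validate_ncco(raw: str) -> list:
    """Parse raw JSON text and check that it has the basic shape of an NCCO."""
    ncco = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(ncco, list):
        raise ValueError("an NCCO must be a JSON array")
    for index, step in enumerate(ncco):
        if not isinstance(step, dict):
            raise ValueError(f"item {index} is not a JSON object")
        if step.get("action") not in KNOWN_ACTIONS:
            raise ValueError(f"item {index} has an unknown action: {step.get('action')!r}")
    return ncco


ncco = validate_ncco('[{"action": "talk", "text": "Hello"}]')
print(len(ncco))  # 1
```

A trailing comma or a missing comma between keys makes json.loads fail, which is the kind of error this check is meant to catch early.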

Note: the "Connect an inbound call" guide provides an example of how to serve your NCCOs to Vonage after a Call or Conference is initiated.

Record

Use the record action to record a Call or part of a Call:

[
  {
    "action": "record",
    "eventUrl": ["https://example.com/recordings"]
  },
  {
    "action": "connect",
    "eventUrl": ["https://example.com/events"],
    "from":"447700900000",
    "endpoint": [
      {
        "type": "phone",
        "number": "447700900001"
      }
    ]
  }
]

The record action is asynchronous. Recording starts when the record action is executed in the NCCO and finishes when the synchronous condition in the action is met: endOnSilence, timeOut or endOnKey. If you do not set a synchronous condition, the Voice API immediately executes the next action in the NCCO without recording.

For information about the workflow to follow, see Recording.

You can use the following options to control a record action:

| Option | Description | Required |
| --- | --- | --- |
| format | Record the Call in a specific format. Options are mp3, wav and ogg. The default value is mp3, or wav when recording more than 2 channels. | No |
| split | Record the sent and received audio in separate channels of a stereo recording. Set to conversation to enable this. | No |
| channels | The number of channels to record (maximum 32). If the number of participants exceeds channels, any additional participants are added to the last channel in the file. split must also be set to conversation. | No |
| endOnSilence | Stop recording after n seconds of silence. Once the recording is stopped, the recording data is sent to eventUrl. The range of possible values is 3 <= endOnSilence <= 10. | No |
| endOnKey | Stop recording when a digit is pressed on the handset. Possible values are *, # or any single digit, for example 9. | No |
| timeOut | The maximum length of a recording in seconds. Once the recording is stopped, the recording data is sent to eventUrl. The range of possible values is between 3 seconds and 7200 seconds (2 hours). | No |
| beepStart | Set to true to play a beep when a recording starts. | No |
| eventUrl | The URL to the webhook endpoint that is called asynchronously when a recording is finished. If the message recording is hosted by Vonage, this webhook contains the URL you need to download the recording and other metadata. | No |
| eventMethod | The HTTP method used to make the request to eventUrl. The default value is POST. | No |
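The ranges in the options above can be enforced before the NCCO is served. The following is a minimal sketch under stated assumptions: build_record_action is a hypothetical helper, not a Vonage SDK function, and it covers only the options whose limits are stated in this section.

```python
def build_record_action(event_url, end_on_silence=None, time_out=None,
                        end_on_key=None, split=None, channels=None):
    """Build a record action, rejecting values outside the documented ranges."""
    action = {"action": "record", "eventUrl": [event_url]}
    if end_on_silence is not None:
        if not 3 <= end_on_silence <= 10:
            raise ValueError("endOnSilence must be between 3 and 10 seconds")
        action["endOnSilence"] = end_on_silence
    if time_out is not None:
        if not 3 <= time_out <= 7200:
            raise ValueError("timeOut must be between 3 and 7200 seconds")
        action["timeOut"] = time_out
    if end_on_key is not None:
        # A single character: *, # or one digit.
        if end_on_key not in set("0123456789*#"):
            raise ValueError("endOnKey must be *, # or a single digit")
        action["endOnKey"] = end_on_key
    if split is not None:
        if split != "conversation":
            raise ValueError("split only accepts the value 'conversation'")
        action["split"] = split
    if channels is not None:
        if split != "conversation":
            raise ValueError("channels requires split to be 'conversation'")
        if not 1 <= channels <= 32:
            raise ValueError("channels must be between 1 and 32")
        action["channels"] = channels
    return action


print(build_record_action("https://example.com/recordings", end_on_silence=3, end_on_key="#"))
```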

The following example shows the return parameters sent to eventUrl:

{
  "start_time": "2020-01-01T12:00:00Z",
  "recording_url": "https://api.nexmo.com/v1/files/aaaaaaaa-bbbb-cccc-dddd-0123456789ab",
  "size": 12345,
  "recording_uuid": "aaaaaaaa-bbbb-cccc-dddd-0123456789ab",
  "end_time": "2020-01-01T12:01:00Z",
  "conversation_uuid": "bbbbbbbb-cccc-dddd-eeee-0123456789ab",
  "timestamp": "2020-01-01T14:00:00.000Z"
}

Possible return parameters are:

| Name | Description |
| --- | --- |
| recording_uuid | The unique ID for the Call. Note: recording_uuid is not the same as the file uuid in recording_url. |
| recording_url | The URL to the file containing the Call recording. |
| start_time | The time the recording started in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ. For example, 2020-01-01T12:00:00Z. |
| end_time | The time the recording finished in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ. For example, 2020-01-01T12:01:00Z. |
| size | The size of the recording at recording_url in bytes. For example, 603423. |
| conversation_uuid | The unique ID for this Call. |
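As an illustration of working with these parameters, the sketch below computes the length of a recording from start_time and end_time. The function name recording_duration_seconds is hypothetical, and the code assumes both timestamps always arrive in the YYYY-MM-DDTHH:MM:SSZ form shown above.

```python
from datetime import datetime

# Format of start_time and end_time as documented above.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def recording_duration_seconds(payload: dict) -> float:
    """Return the recording length in seconds from a recording webhook payload."""
    start = datetime.strptime(payload["start_time"], ISO_FORMAT)
    end = datetime.strptime(payload["end_time"], ISO_FORMAT)
    return (end - start).total_seconds()


payload = {
    "start_time": "2020-01-01T12:00:00Z",
    "end_time": "2020-01-01T12:01:00Z",
    "size": 12345,
}
print(recording_duration_seconds(payload))  # 60.0
```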

Conversation

You can use the conversation action to create standard or moderated conferences, while preserving the communication context. Using conversation with the same name reuses the same persisted Conversation. The first person to call the virtual number assigned to the conversation creates it. This action is synchronous.

Note: you can invite up to 50 people to your Conversation.

The following NCCO examples show how to configure different types of Conversation. You can use the answer_url webhook GET request parameters to ensure you deliver one NCCO to participants and another to the moderator.

Standard conversation:
[
  {
    "action": "conversation",
    "name": "nexmo-conference-standard",
    "record": true
  }
]
Selective audio controls:
// As the customer is the first person to join, there is no canHear/canSpeak entry
// The customer's leg ID is 6a4d6af0-55a6-4667-be90-8614e4c8e83c
[
  {
    "action": "conversation",
    "name": "selective-audio-demo",
    "startOnEnter": false,
    "musicOnHoldUrl": ["https://nexmo-community.github.io/ncco-examples/assets/voice_api_audio_streaming.mp3"]
  }
]

// The agent joins and can both hear, and speak to the customer
// The agent's leg ID is 533c0874-f43d-446c-a153-f35bf30783fa
[
  {
    "action": "conversation",
    "name": "selective-audio-demo",
    "startOnEnter": true,
    "record": true,
    "canHear": ["6a4d6af0-55a6-4667-be90-8614e4c8e83c"], // Customer leg ID
    "canSpeak": ["6a4d6af0-55a6-4667-be90-8614e4c8e83c"] // Customer leg ID
  }
]

// Finally, the supervisor joins the conversation. They can hear both the customer
// and the agent, but only speak to the agent
// The supervisor's leg ID is e2833e43-db39-4c1a-b689-d17ad2cf3529
[
  {
    "action": "conversation",
    "name": "selective-audio-demo",
    "startOnEnter": true,
    "record": true,
    "canHear": ["6a4d6af0-55a6-4667-be90-8614e4c8e83c", "533c0874-f43d-446c-a153-f35bf30783fa"], // Customer leg ID, Agent leg ID
    "canSpeak": ["533c0874-f43d-446c-a153-f35bf30783fa"] // Agent leg ID
  }
]
Moderated conversation (moderator):
[
  {
    "action": "conversation",
    "name": "nexmo-conference-moderated",
    "record": true,
    "startOnEnter": true
  }
]
Moderated conversation (attendee):
[
  {
    "action": "talk",
    "text": "Welcome to a Vonage moderated conference. We will connect you when an agent is available",
    "voiceName": "Amy"
  },
  {
    "action": "conversation",
    "name": "nexmo-conference-moderated",
    "startOnEnter": false,
    "musicOnHoldUrl": ["https://nexmo-community.github.io/ncco-examples/assets/voice_api_audio_streaming.mp3"]
  }
]

You can use the following options to control a conversation action:

| Option | Description | Required |
| --- | --- | --- |
| name | The name of the Conversation room. Names are namespaced to the application level. | Yes |
| musicOnHoldUrl | A URL to the mp3 file to stream to participants until the conversation starts. By default the conversation starts when the first person calls the virtual number associated with your Voice app. To stream this mp3 before the moderator joins the conversation, set startOnEnter to false for all users other than the moderator. | No |
| startOnEnter | The default value of true ensures that the conversation starts when this caller joins conversation name. Set to false for attendees in a moderated conversation. | No |
| endOnExit | Specifies whether a moderated conversation ends when the moderator hangs up. This is set to false by default, which means that the conversation only ends when the last remaining participant hangs up, regardless of whether the moderator is still on the call. Set endOnExit to true to terminate the conversation when the moderator hangs up. | No |
| record | Set to true to record this conversation. For standard conversations, recordings start when one or more attendees connect to the conversation. For moderated conversations, recordings start when the moderator joins, that is, when an NCCO is executed for the named conversation where startOnEnter is set to true. When the recording is terminated, the URL you download the recording from is sent to the event URL. You can override the default recording event URL and default HTTP method by providing custom eventUrl and eventMethod options in the conversation action definition. By default audio is recorded in MP3 format. See the recording guide for more details. | No |
| canSpeak | A list of leg UUIDs that this participant can be heard by. If not provided, the participant can be heard by everyone. If an empty list is provided, the participant will not be heard by anyone. | No |
| canHear | A list of leg UUIDs that this participant can hear. If not provided, the participant can hear everyone. If an empty list is provided, the participant will not hear any other participants. | No |
| mute | Set to true to mute the participant. The audio from the participant will not be played to the conversation and will not be recorded. When using canSpeak, the mute parameter is not supported. | No |
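The moderator and attendee NCCOs shown earlier can be chosen in the answer_url webhook handler. The following is a minimal sketch, assuming the caller's number is available to your handler (for example from a from query parameter of the answer_url request, which is an assumption here). The function name conversation_ncco and the moderator number are hypothetical.

```python
def conversation_ncco(caller: str, moderator: str, name: str, hold_music_url: str) -> list:
    """Return the moderator NCCO for the moderator's number, the attendee NCCO otherwise."""
    if caller == moderator:
        # The moderator starts the conversation and ends it when hanging up.
        return [{
            "action": "conversation",
            "name": name,
            "startOnEnter": True,
            "endOnExit": True,
        }]
    # Attendees hear hold music until the moderator joins.
    return [{
        "action": "conversation",
        "name": name,
        "startOnEnter": False,
        "musicOnHoldUrl": [hold_music_url],
    }]


MODERATOR = "447700900000"  # hypothetical moderator number
HOLD_MUSIC = "https://nexmo-community.github.io/ncco-examples/assets/voice_api_audio_streaming.mp3"

print(conversation_ncco("447700900001", MODERATOR, "nexmo-conference-moderated", HOLD_MUSIC))
```

Serialize the returned list with json.dumps before sending it as the webhook response.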

Connect

You can use the connect action to connect a call to endpoints such as phone numbers or a VBC extension.

This action is synchronous: after a connect action ends, the next action in the NCCO stack is processed. A connect action ends when the endpoint you are calling is busy or unavailable. You ring endpoints sequentially by nesting connect actions.

The following NCCO examples show how to configure different types of connections.

Phone endpoint:
[
  {
    "action": "talk",
    "text": "Please wait while we connect you"
  },
  {
    "action": "connect",
    "eventUrl": ["https://example.com/events"],
    "timeout": 45,
    "from": "447700900000",
    "endpoint": [
      {
        "type": "phone",
        "number": "447700900001",
        "dtmfAnswer": "2p02p"
      }
    ]
  }
]
WebSocket endpoint:
[
  {
    "action": "talk",
    "text": "Please wait while we connect you"
  },
  {
    "action": "connect",
    "eventType": "synchronous",
    "eventUrl": [
      "https://example.com/events"
    ],
    "from": "447700900000",
    "endpoint": [
      {
        "type": "websocket",
        "uri": "ws://example.com/socket",
        "content-type": "audio/l16;rate=16000",
        "headers": {
            "name": "J Doe",
            "age": 40,
            "address": {
                "line_1": "Apartment 14",
                "line_2": "123 Example Street",
                "city": "New York City"
            },
            "system_roles": [183493, 1038492, 22],
            "enable_auditing": false
        }
      }
    ]
  }
]
App endpoint:
[
  {
    "action": "talk",
    "text": "Please wait while we connect you"
  },
  {
    "action": "connect",
    "eventUrl": [
      "https://example.com/events"
    ],
    "from": "447700900000",
    "endpoint": [
      {
        "type": "app",
        "user": "jamie"
      }
    ]
  }
]
SIP endpoint:
[
  {
    "action": "talk",
    "text": "Please wait while we connect you"
  },
  {
    "action": "connect",
    "eventUrl": [
      "https://example.com/events"
    ],
    "from": "447700900000",
    "endpoint": [
      {
        "type": "sip",
        "uri": "sip:rebekka@sip.mcrussell.com",
        "headers": { "location": "New York City", "occupation": "developer" }
      }
    ]
  }
]

You can provide a fallback for Calls that do not connect. To do this, set the eventType to synchronous and return an NCCO from the eventUrl if the Call enters any of the following states:

  • timeout - your user did not answer your call within ringing_timer seconds
  • failed - the call failed to complete
  • rejected - the call was rejected
  • unanswered - the call was not answered
  • busy - the person being called was on another call
Connect with a fallback NCCO:
[
  {
    "action": "connect",
    "from": "447700900000",
    "timeout": 5,
    "eventType": "synchronous",
    "eventUrl": [
      "https://example.com/event-fallback"
    ],
    "endpoint": [
      {
        "type": "phone",
        "number": "447700900001"
      }
    ]
  }
]
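On the receiving side, the eventUrl handler decides whether to return a fallback NCCO. The sketch below is one possible shape for that decision. It assumes the event payload carries the call state in a status field, and the function name fallback_ncco and the spoken message are hypothetical.

```python
# Call states that trigger the fallback, as listed above.
FALLBACK_STATES = {"timeout", "failed", "rejected", "unanswered", "busy"}


def fallback_ncco(event: dict):
    """Return a fallback NCCO for calls that did not connect, or None otherwise."""
    # Assumption: the call state arrives in a field named "status".
    if event.get("status") in FALLBACK_STATES:
        return [{
            "action": "talk",
            "text": "Sorry, we could not connect your call. Please try again later.",
        }]
    return None  # respond with an empty body so the current NCCO is not replaced


print(fallback_ncco({"status": "busy"}))
```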
Record a connected call:
[
  {
    "action": "record",
    "eventUrl": ["https://example.com/recordings"]
  },
  {
    "action": "connect",
    "eventUrl": ["https://example.com/events"],
    "from": "447700900000",
    "endpoint": [
      {
        "type": "phone",
        "number": "447700900001"
      }
    ]
  }
]
VBC extension:
[
  {
    "action": "talk",
    "voiceName": "Russell",
    "text": "Thank you for calling. Connecting you to extension."
  },
  {
    "action": "connect",
    "endpoint": [
      {
        "type": "vbc",
        "extension": "111"
      }
    ]
  }
]

You can use the following options to control a connect action:

| Option | Description | Required |
| --- | --- | --- |
| endpoint | Array of endpoint objects to connect to. Currently supports a maximum of one endpoint object. See Endpoint Types and Values for the available endpoint types. | Yes |
| from | A number in E.164 format that identifies the caller. This must be one of your Vonage virtual numbers; any other value results in the caller ID being unknown. If the caller is an app user, this option should be omitted. | No |
| eventType | Set to synchronous to make the connect action synchronous and to enable eventUrl to return an NCCO that overrides the current NCCO when a call moves to specific states. | No |
| timeout | If the call is unanswered, the number of seconds before Vonage stops ringing endpoint. The default value is 60. | No |
| limit | Maximum length of the call in seconds. The default and maximum value is 7200 seconds (2 hours). | No |
| machineDetection | Configure the behavior when Vonage detects that a destination is an answerphone. Set to either continue (Vonage sends an HTTP request to eventUrl with the Call event machine) or hangup (end the Call). | No |
| eventUrl | Set the webhook endpoint that Vonage calls asynchronously on each of the possible Call States. If eventType is set to synchronous, the eventUrl can return an NCCO that overrides the current NCCO when a timeout occurs. | No |
| eventMethod | The HTTP method Vonage uses to make the request to eventUrl. The default value is POST. | No |
| ringbackTone | A URL value that points to a ringbackTone to be played back on repeat to the caller, so they don't hear silence. The ringbackTone automatically stops playing when the call is fully connected. It's not recommended to use this parameter when connecting to a phone endpoint, as the carrier will supply their own ringbackTone. Example: "ringbackTone": "http://example.com/ringbackTone.wav". | No |

Endpoint Types and Values

Phone (PSTN) - phone numbers in E.164 format

| Value | Description |
| --- | --- |
| type | The endpoint type: phone for a PSTN endpoint. |
| number | The phone number to connect to in E.164 format. |
| dtmfAnswer | Set the digits that are sent to the user as soon as the Call is answered. The * and # digits are respected. You create pauses using p. Each pause is 500ms. |
| onAnswer | A JSON object containing a required url key. The URL serves an NCCO to execute on the number being connected to, before that call is joined to your existing conversation. Optionally, the ringbackTone key can be specified with a URL value that points to a ringbackTone to be played back on repeat to the caller, so they do not hear just silence. The ringbackTone automatically stops playing when the call is fully connected. Example: {"url":"https://example.com/answer", "ringbackTone":"http://example.com/ringbackTone.wav" }. Note that the key ringback is still supported. |
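To make the dtmfAnswer pause rule concrete, the sketch below checks a dtmfAnswer string and adds up the pauses it contains. The helper name dtmf_answer_pause_ms is hypothetical, and the allowed character set (digits, *, # and p) is inferred from the description above.

```python
import re

PAUSE_MS = 500  # each "p" is a 500 ms pause


def dtmf_answer_pause_ms(dtmf_answer: str) -> int:
    """Validate a dtmfAnswer string and return the total pause time in milliseconds."""
    if not re.fullmatch(r"[0-9*#p]+", dtmf_answer):
        raise ValueError("dtmfAnswer may only contain digits, *, # and p")
    return dtmf_answer.count("p") * PAUSE_MS


print(dtmf_answer_pause_ms("2p02p"))  # 1000
```

For the value used in the phone endpoint example, 2p02p, the two pauses add up to one second.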

App - Connect the call to a RTC capable application

| Value | Description |
| --- | --- |
| type | The endpoint type: app for an application. |
| user | The username of the user to connect to. This username must have been added as a user. |

WebSocket - the WebSocket to connect to

| Value | Description |
| --- | --- |
| type | The endpoint type: websocket for a WebSocket. |
| uri | The URI to the websocket you are streaming to. |
| content-type | The internet media type for the audio you are streaming. Possible values are audio/l16;rate=16000 or audio/l16;rate=8000. |
| headers | A JSON object containing any metadata you want. See connecting to a websocket for example headers. |

SIP - the SIP endpoint to connect to

| Value | Description |
| --- | --- |
| type | The endpoint type: sip for SIP. |
| uri | The SIP URI to the endpoint you are connecting to, in the format sip:rebekka@sip.example.com. |
| headers | Key-value string pairs containing any metadata you need, for example { "location": "New York City", "occupation": "developer" }. |

VBC - the Vonage Business Cloud (VBC) extension to connect to

| Value | Description |
| --- | --- |
| type | The endpoint type: vbc for a VBC extension. |
| extension | The VBC extension to connect the call to. |

Talk

The talk action sends synthesized speech to a Conversation.

The text provided in the talk action can either be plain, or formatted using SSML. SSML tags provide further instructions to the text-to-speech synthesizer which allow you to set pitch and pronunciation, and to combine text in multiple languages. SSML tags are XML-based and sent inline in the JSON string.

By default, the talk action is synchronous. However, if you set bargeIn to true you must set an input action later in the NCCO stack. The following NCCO examples show how to send a synthesized speech message to a Conversation or Call:

Plain text:
[
  {
    "action": "talk",
    "text": "You are listening to a Call made with Voice API"
  }
]
With bargeIn:
[
  {
    "action": "talk",
    "text": "Welcome to a Voice API I V R. ",
    "language": "en-GB",
    "bargeIn": false
  },
  {
    "action": "talk",
    "text": "Press 1 for maybe and 2 for not sure followed by the hash key",
    "language": "en-GB",
    "bargeIn": true
  },
  {
    "action": "input",
    "type": ["dtmf"],
    "dtmf": {
      "submitOnHash": true
    },
    "eventUrl": ["https://example.com/ivr"]
  }
]
With SSML:
[
  {
    "action": "talk",
    "text": "<speak><prosody rate='fast'>I can speak fast.</prosody></speak>"
  }
]

You can use the following options to control a talk action:

| Option | Description | Required |
| --- | --- | --- |
| text | A string of up to 1,500 characters (excluding SSML tags) containing the message to be synthesized in the Call or Conversation. A single comma in text adds a short pause to the synthesized speech. To add a longer pause, use a break tag in SSML. To use SSML tags, you must enclose the text in a speak element. | Yes |
| bargeIn | Set to true so this action is terminated when the user presses a button on the keypad. Use this feature to enable users to choose an option without having to listen to the whole message in your Interactive Voice Response (IVR). If you set bargeIn to true, the next non-talk action in the NCCO stack must be an input action. The default value is false. Once bargeIn is set to true it stays true (even if bargeIn: false is set in a following action) until an input action is encountered. | No |
| loop | The number of times text is repeated before the Call is closed. The default value is 1. Set to 0 to loop infinitely. | No |
| level | The volume level that the speech is played at. This can be any value between -1 and 1, with 0 being the default. | No |
| language | The language (BCP-47 format) for the message you are sending. Default: en-US. Possible values are listed in the Text-To-Speech guide. | No |
| style | The vocal style (vocal range, tessitura and timbre). Default: 0. Possible values are listed in the Text-To-Speech guide. | No |
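The limits on text and level can be checked when the action is built. The following is a minimal sketch: build_talk_action is a hypothetical helper, and stripping SSML tags with a regular expression is a rough approximation used only to count the spoken characters, not a full SSML parser.

```python
import re

MAX_TEXT_LENGTH = 1500  # characters, excluding SSML tags


def build_talk_action(text: str, level: float = 0, loop: int = 1, barge_in: bool = False) -> dict:
    """Build a talk action, rejecting values outside the documented limits."""
    spoken = re.sub(r"<[^>]+>", "", text)  # approximate removal of SSML tags
    if len(spoken) > MAX_TEXT_LENGTH:
        raise ValueError("text is longer than 1,500 characters")
    if not -1 <= level <= 1:
        raise ValueError("level must be between -1 and 1")
    if loop < 0:
        raise ValueError("loop must be 0 (infinite) or a positive number")
    return {"action": "talk", "text": text, "level": level, "loop": loop, "bargeIn": barge_in}


print(build_talk_action("<speak><prosody rate='fast'>I can speak fast.</prosody></speak>"))
```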

Stream

The stream action allows you to send an audio stream to a Conversation.

By default, the stream action is synchronous. However, if you set bargeIn to true you must set an input action later in the NCCO stack.

The following NCCO examples show how to send an audio stream to a Conversation or Call:

Basic stream:
[
  {
    "action": "stream",
    "streamUrl": ["https://acme.com/streams/music.mp3"]
  }
]
With bargeIn:
[
  {
    "action": "stream",
    "streamUrl": ["https://acme.com/streams/announcement.mp3"],
    "bargeIn": true
  },
  {
    "action": "input",
    "type": ["dtmf"],
    "dtmf": {
      "submitOnHash": true
    },
    "eventUrl": ["https://example.com/ivr"]
  }
]

You can use the following options to control a stream action:

| Option | Description | Required |
| --- | --- | --- |
| streamUrl | An array containing a single URL to an mp3 or wav (16-bit) audio file to stream to the Call or Conversation. | Yes |
| level | Set the audio level of the stream in the range -1 <= level <= 1 with a precision of 0.1. The default value is 0. | No |
| bargeIn | Set to true so this action is terminated when the user presses a button on the keypad. Use this feature to enable users to choose an option without having to listen to the whole message in your Interactive Voice Response (IVR). If you set bargeIn to true on one or more stream actions, the next non-stream action in the NCCO stack must be an input action. The default value is false. Once bargeIn is set to true it stays true (even if bargeIn: false is set in a following action) until an input action is encountered. | No |
| loop | The number of times audio is repeated before the Call is closed. The default value is 1. Set to 0 to loop infinitely. | No |

The audio stream referred to should be a file in MP3 or WAV format. If you have issues with the file playing, please encode it to the technical specification described in "What kind of prerecorded audio files can I use?".

Input

You can use the input action to collect digits or speech input from the person you are calling. This action is synchronous: Vonage processes the input and forwards it in the parameters sent to the eventUrl webhook endpoint you configure in your request. Your webhook endpoint should return another NCCO that replaces the existing NCCO and controls the Call based on the user input. You could use this functionality to create an Interactive Voice Response (IVR). For example, if your user presses 4 or says "Sales", you return a connect NCCO that forwards the call to your sales department.

The following NCCO example shows how to configure an IVR endpoint:

[
  {
    "action": "talk",
    "text": "Please enter a digit or say something"
  },
  {
    "action": "input",
    "eventUrl": [
      "https://example.com/ivr"
    ],
    "type": [ "dtmf", "speech" ],
    "dtmf": {
      "maxDigits": 1
    },
    "speech": {
      "context": [ "sales", "support" ]
    }
  }
]

The following NCCO example shows how to use bargeIn to allow a user to interrupt a talk action. Note that an input action must follow any action that has a bargeIn property (e.g. talk or stream).

[
  {
    "action": "talk",
    "text": "Please enter a digit or say something",
    "bargeIn": true
  },
  {
    "action": "input",
    "eventUrl": [
      "https://example.com/ivr"
    ],
    "type": [ "dtmf", "speech" ],   
    "dtmf": {
      "maxDigits": 1
    },
    "speech": {
      "context": [ "sales", "support" ]
    }   
  }
]
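The bargeIn rule can be checked mechanically before an NCCO is served. The sketch below is an interpretation of the rule rather than an official validator: it treats talk and stream as prompt actions that may be chained, and requires that the first other action after a bargeIn prompt is an input action. The function name is hypothetical.

```python
PROMPT_ACTIONS = {"talk", "stream"}


def barge_in_is_followed_by_input(ncco: list) -> bool:
    """Return True if every bargeIn prompt is eventually followed by an input action."""
    pending = False  # True once a prompt has set bargeIn; cleared by an input action
    for step in ncco:
        action = step.get("action")
        if action in PROMPT_ACTIONS:
            if step.get("bargeIn") is True:
                pending = True
        elif pending:
            if action != "input":
                return False
            pending = False
    return not pending  # a bargeIn prompt at the end with no input is invalid


ncco = [
    {"action": "talk", "text": "Please enter a digit", "bargeIn": True},
    {"action": "input", "type": ["dtmf"], "eventUrl": ["https://example.com/ivr"]},
]
print(barge_in_is_followed_by_input(ncco))  # True
```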

The following options can be used to control an input action:

| Option | Description | Required |
| --- | --- | --- |
| type | Acceptable input type. Set to [ "dtmf" ] for DTMF input only, [ "speech" ] for ASR only, or [ "dtmf", "speech" ] for both. | Yes |
| dtmf | DTMF settings. | No |
| speech | Speech recognition settings. | No |
| eventUrl | Vonage sends the input from the callee to this URL. For DTMF input, this happens after a timeOut pause in activity or when # is pressed. For speech input, it happens after the user stops speaking or after 30 seconds of speech. | No |
| eventMethod | The HTTP method used to send event information to eventUrl. The default value is POST. | No |

DTMF Input Settings

| Option | Description | Required |
| --- | --- | --- |
| timeOut | The result of the callee's activity is sent to the eventUrl webhook endpoint timeOut seconds after the last action. The default value is 3. The maximum is 10. | No |
| maxDigits | The number of digits the user can press. The maximum value is 20, the default is 4 digits. | No |
| submitOnHash | Set to true so the callee's activity is sent to your webhook endpoint at eventUrl after they press #. If # is not pressed, the result is submitted after timeOut seconds. The default value is false, that is, the result is sent to your webhook endpoint after timeOut seconds. | No |

Speech Recognition Settings

| Option | Description | Required |
| --- | --- | --- |
| uuid | The unique ID of the Call leg for the user to capture the speech of, defined as an array with a single element. The first joined leg of the call by default. | No |
| endOnSilence | Controls how long the system waits after the user stops speaking to decide the input is completed. The default value is 2 (seconds). The range of possible values is between 1 second and 10 seconds. | No |
| language | Expected language of the user's speech. Format: BCP-47. Default: en-US. See the list of supported languages. | No |
| context | Array of hints (strings) to improve recognition quality if certain words are expected from the user. | No |
| startTimeout | Controls how long the system waits for the user to start speaking. The range of possible values is between 1 second and 10 seconds. | No |
| maxDuration | Controls maximum speech duration (from the moment the user starts speaking). The default value is 60 (seconds). The range of possible values is between 1 and 60 seconds. | No |

The following example shows the parameters sent to the eventUrl webhook for DTMF input:

{
  "speech": { "results": [ ] },
  "dtmf": {
    "digits": "1234",
    "timed_out": true
  },
  "from": "15551234567",
  "to": "15557654321",
  "uuid": "aaaaaaaa-bbbb-cccc-dddd-0123456789ab",
  "conversation_uuid": "bbbbbbbb-cccc-dddd-eeee-0123456789ab",
  "timestamp": "2020-01-01T14:00:00.000Z"
}

The following example shows the parameters sent back to the eventUrl webhook for speech input:

{
  "speech": {
    "timeout_reason": "end_on_silence_timeout",
    "results": [
      {
        "confidence": "0.9405097",
        "text": "sales"
      },
      {
        "confidence": "0.70543784",
        "text": "sails"
      },
      {
        "confidence": "0.5949854",
        "text": "sale"
      }
    ]
  },
  "dtmf": {
    "digits": null,
    "timed_out": false
  },
  "from": "15551234567",
  "to": "15557654321",  
  "uuid": "aaaaaaaa-bbbb-cccc-dddd-0123456789ab",
  "conversation_uuid": "bbbbbbbb-cccc-dddd-eeee-0123456789ab",
  "timestamp": "2020-01-01T14:00:00.000Z"
}
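A webhook handler can turn either payload into the next NCCO. The following sketch follows the sales example from the start of this section: it connects the call when the user pressed 4 or when the most confident speech result is "sales". The function name route_input, the sales number and the retry message are hypothetical. Note that confidence arrives as a string in the payload above, so it is converted before comparing.

```python
SALES_NUMBER = "447700900001"  # hypothetical number for the sales department


def route_input(event: dict) -> list:
    """Choose the next NCCO from an input webhook payload."""
    digits = (event.get("dtmf") or {}).get("digits")
    results = (event.get("speech") or {}).get("results") or []
    # confidence is a string in the payload, so convert it before comparing.
    best = max(results, key=lambda r: float(r["confidence"]), default=None)
    said_sales = best is not None and best["text"].lower() == "sales"
    if digits == "4" or said_sales:
        return [{
            "action": "connect",
            "from": "447700900000",
            "endpoint": [{"type": "phone", "number": SALES_NUMBER}],
        }]
    return [{"action": "talk", "text": "Sorry, I did not understand that."}]


speech_event = {
    "speech": {"results": [
        {"confidence": "0.70543784", "text": "sails"},
        {"confidence": "0.9405097", "text": "sales"},
    ]},
    "dtmf": {"digits": None, "timed_out": False},
}
print(route_input(speech_event)[0]["action"])  # connect
```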

Input Return Parameters

See Webhook Reference for input parameters which are returned to the eventUrl.

Notify

Use the notify action to send a custom payload to your event URL. Your webhook endpoint can return another NCCO that replaces the existing NCCO or return an empty payload meaning the existing NCCO will continue to execute.

[
  {
    "action": "notify",
    "payload": {
      "foo": "bar"
    },
    "eventUrl": [
      "https://example.com/webhooks/event"
    ],
    "eventMethod": "POST"
  }
]

You can use the following options to control a notify action:

| Option | Description | Required |
| --- | --- | --- |
| payload | The JSON body to send to your event URL. | Yes |
| eventUrl | The URL to send events to. If you return an NCCO when you receive a notification, it replaces the current NCCO. | Yes |
| eventMethod | The HTTP method to use when sending payload to your eventUrl. | No |