How to Tailor Voice and Text with the Voiceflow Locale Variable

Level: Intermediate

This tutorial introduces the new Voiceflow locale variable, and a technique for using it to control which voice and text are used. The locale variable makes it possible to detect a user’s language and region, and use this for tailoring the interaction.

This tutorial assumes you are familiar with the basics of creating Alexa skills using Voiceflow. If not, review the prerequisites and try some of the introductory tutorials noted in the next section.

Prerequisites for Beginners

If you haven’t developed an Amazon Alexa skill before, or haven’t used Voiceflow, the following are recommended as prerequisites:

1. Set up an Amazon Developer account. It is free:

Amazon Alexa Developer Home Page

2. Set up a Voiceflow Account. They offer a generous free tier from which you can design, develop and publish Amazon Alexa skills:

Voiceflow Home Page

3. Try some of the Voiceflow University tutorials and practice making a skill or two. There is an introductory video series, along with extensive reference documentation.

Voiceflow University

Voiceflow Basics Tutorial Series

4. Join the Voiceflow User’s Group on Facebook. The community is very supportive should you have questions or issues when building skills, and it is a good source of tips, techniques and camaraderie.

Voiceflow Facebook Community

Introduction

The locale property in an Amazon Alexa skill identifies the region and language preference of a customer.

For example, a customer may be in the U.S. and prefer to use English when talking with Alexa. Likewise, a customer may be in the U.K. and also use English when talking with Alexa. The difference, however, is that the accent, the voice used and the word choices can differ between the two.

In Voiceflow, the locale property is now available in the form of a global variable when building skills. This makes it possible to refine how a skill interacts with the user based on their region and language preferences.

Here are some example aspects of your skill that you can control using the locale variable:

  • Which sub flows are run.
  • Which APIs are called when retrieving data.
  • Which Amazon Polly voice is used.
  • Which SSML tags are used.
  • How variables are populated.
  • Which text is spoken.
  • Which units of measure are used.
  • Which currency type is used.
  • Which audio clips are played.

This tutorial illustrates how to use the Voiceflow locale variable for two regional alternatives, namely voice options and several alternate word options for English – US users versus English – UK users.

Locale Property, Valid Values

The locale property is an individual user’s geographical region and language preference. It is associated with a customer’s account, and is included in the request sent by Alexa when accessing your Alexa skills.

The following quote from the Amazon reference documentation illustrates the relationship between users, their devices, and your Alexa skills:

“Users who interact with Alexa in a particular language can use skills that support that language. For example, a user might set their Alexa device to use German. That user can enable and use any published skills available in their region that support German.”

Develop Skills in Multiple Languages

The valid values are reproduced below so you can copy and paste them when creating IF conditions with the locale variable. The first part of the code is the language, and the second part is the region. For example, the value “en-GB” represents the English language and the UK region.

Code       Language (Region)
en-AU      // English (AU)
en-CA      // English (CA)
en-GB      // English (UK)
en-US      // English (US)
en-IN      // English (IN)
fr-CA      // French (CA)
fr-FR      // French (FR)
de-DE      // German (DE)
it-IT      // Italian (IT)
ja-JP      // Japanese (JP)
es-ES      // Spanish (ES)
es-MX      // Spanish (MX)
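
To make the language/region split concrete, here is a minimal Python sketch. It is not part of Voiceflow; the names VALID_LOCALES and split_locale are made up for illustration, and the set simply mirrors the codes listed in this section (with Spanish for Mexico written as es-MX).

```python
from typing import Tuple

# Locale codes listed in this tutorial (hypothetical constant name).
VALID_LOCALES = {
    "en-AU", "en-CA", "en-GB", "en-US", "en-IN",
    "fr-CA", "fr-FR", "de-DE", "it-IT", "ja-JP",
    "es-ES", "es-MX",
}


def split_locale(locale: str) -> Tuple[str, str]:
    """Split a code such as "en-GB" into (language, region)."""
    if locale not in VALID_LOCALES:
        raise ValueError(f"unsupported locale: {locale!r}")
    language, region = locale.split("-")
    return language, region


language, region = split_locale("en-GB")
print(language, region)
```

The comparison is case-sensitive, which matches how the values need to be typed into an IF condition: "en-GB", not "en-gb".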

Voiceflow Locale Variable

Variables are located in the left-hand pane of the Voiceflow project canvas. The locale variable is a global variable. It is automatically populated with the user’s locale when the skill runs.

The Voiceflow Locale variable

Demonstration Skill

This demo skill provides an example of how to use the locale variable. Here is how our demo skill will look on the Voiceflow project canvas:

Demo Skill for Locale Variable

The demo skill will first check the locale variable in the IF block.  If the locale is en-US, the skill will flow out of node 1 to a SET block.  This SET block will then set several variables that are specific for that locale.  

Likewise, if the locale is en-GB, the skill will flow out of node 2 to the SET block that will set the same variables which are specific for en-GB.

Afterwards, either SET block will flow to the SPEAK block, and during testing we will show how Alexa speaks differently, based on the locale.

If you wish, you can either drag all of the blocks onto the canvas right away, or add them one at a time and connect them as we proceed through the tutorial.  TIP:  Just drag out one SET block.  After it is configured for the en-US values, we can copy it and edit it to make the en-GB SET block, which will save a bit of time.

Task Summary

Part 1 – Configure the demo skill

  • Initiate a new project for US and UK
  • Configure the Variables
  • Configure the IF Block
  • Configure the SET Block for en-US
  • Configure the SET Block for en-GB
  • Configure the SPEAK Block

Part 2 – Test on the Alexa Developer Console

  • Test the en-US Configuration
  • Test the en-GB Configuration

Part 1 – Configure the demo skill

Initiate a new project for US and UK

If you are going to follow along and reproduce the example skill, start a new Voiceflow project, and use whatever project / invocation name you prefer. Let’s imagine this project will be distributed to English (UK) and English (US) regions. Click the checkboxes as shown below.

Demo skill project initiation in Voiceflow

Configure Variables

We are going to add five variables for our tutorial. The variables could be either global or local, depending on your particular skill’s needs. However, for this demo we can add them as local variables. The first two store the SSML tags:

voiceTags     // This will store the opening SSML tags for Polly voices.
endTags       // This will store the closing SSML tags for the voices.

Using a matched pair of tags as shown above makes it easier if you decide to add prosody or other SSML tags for the voice. If a change is needed, it can be made in one location for each voice, such as the SET blocks, rather than in every SPEAK block throughout a large skill.

Also add the following three variables.  These will store word alternatives for car parts.

wTrunk     // "trunk" versus "boot"
wHood      // "hood" versus "bonnet"
wTire      // "tire" versus "tyre"

The small “w” in front of the variable names is just to help indicate the variables are storing words. Use whatever nomenclature works best for you when naming and organizing your variables.

Variables which will store en-US vs. en-GB values

Configure the IF Block

Add two IF statements as shown in the image below. To summarize, if the locale is en-US, the flow will exit via node 1. Likewise, if the locale is en-GB, the flow will exit via node 2.

NOTE: The IF block also has an Else exit node. An alternate approach would be to assume that if the value is not en-US, it must then be en-GB, and assign the Else node to en-GB. For clarity in this tutorial, we will use two distinct nodes, one for each region, and route the Else exit node in the same manner as node 1.
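
For readers who think in code, the routing described above can be sketched in Python. This is only an analogy for the IF block, not something Voiceflow generates; the function name choose_set_block is hypothetical.

```python
def choose_set_block(locale: str) -> str:
    """Return which locale's SET block the flow should reach."""
    if locale == "en-US":   # node 1
        return "en-US"
    if locale == "en-GB":   # node 2
        return "en-GB"
    # Else exit node: routed in the same manner as node 1.
    return "en-US"


print(choose_set_block("en-GB"))
print(choose_set_block("de-DE"))
```

Routing the Else node to a known configuration means a user with an unexpected locale still reaches a SET block, so the variables are never left empty.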

IF Block

Configure the SET Block for en-US

Add the following five variable assignments to the SET block. This SET block will be reached if the user’s locale is en-US.

First, assign the Amazon Polly voice SSML tag for Joanna to the voiceTags variable. Joanna is an en-US voice.

<voice name="Joanna">
SET Block for en-US.SET voiceTags to Joanna

Second, click “+ Add Variable Set” and assign the endTags variable the closing voice SSML tag.

</voice>
SET Block for en-US.Set endTags to the close tag for voice

PRO TIP: The benefit of using a variable for the closing voice SSML tag, rather than putting the tag directly in the SPEAK blocks, is that it is easier to add prosody, emphasis, or other wrapping tags later. The voiceTags variable would store the opening tags, and the endTags variable would store the matching closing tags. A change made in the SET block then updates the voice throughout the skill without having to edit each individual SPEAK block.
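
As a rough illustration of the matched-pair idea, the Python sketch below wraps a sentence in a voice tag plus a prosody tag. The prosody rate of "slow" is an arbitrary choice for the example and the helper name wrap_speech is hypothetical; the point is only that the closing tags must mirror the opening tags in reverse order.

```python
# Opening tags: voice first, then prosody (example refinement).
voice_tags = '<voice name="Joanna"><prosody rate="slow">'
# Closing tags: reverse order of the opening tags.
end_tags = "</prosody></voice>"


def wrap_speech(text: str) -> str:
    """Surround the spoken text with the matched tag pair."""
    return voice_tags + text + end_tags


wrapped = wrap_speech("Check the oil.")
print(wrapped)
```

Changing the voice or the speaking rate now means editing two strings in one place, which is what the SET block achieves for the whole skill.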

Assign the word “hood” to the wHood variable.

SET Block for en-US. Assign “hood” to the wHood variable.

Assign the words “trunk” and “tire” to their respective variables.

SET Block for en-US. Assign “trunk” and “tire” to their respective variables.

You should now have five statements in your SET block for en-US, as shown above. Next, you can collapse it, rename it to “SET en-US”, and connect it to node 1 (and the Else exit node) of the IF block.

SET Block for en-US. Project Canvas

Configure the SET Block for en-GB

Rather than dragging out a second SET block from the left panel, it is easier to make a copy of the previous SET block and then edit it. To begin, make a copy of the previous SET block.

SET Block for en-GB. Copy SET Block

Second, rename the block to “SET en-GB”.

SET Block for en-GB. Rename copied SET Block to en-GB

Third, expand the block and update the values to those which correspond with the en-GB region. “Brian” is an en-GB voice with a British accent.

<voice name="Brian">

Replace “hood”, “trunk” and “tire” with “bonnet”, “boot” and “tyre.”

SET Block for en-GB. Second SET Block for en-GB is updated

Finally, collapse the block and connect it to node 2 of the IF block.

SET Block for en-GB. Project canvas. Connect IF block to SET Block.
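
The copy-then-edit approach has a side benefit: both SET blocks are guaranteed to assign the same five variables. A small Python sketch of that idea follows, using dictionaries as a stand-in for the two SET blocks (the names SET_EN_US and SET_EN_GB are made up for this example).

```python
# Values assigned by the en-US SET block.
SET_EN_US = {
    "voiceTags": '<voice name="Joanna">',
    "endTags": "</voice>",
    "wHood": "hood",
    "wTrunk": "trunk",
    "wTire": "tire",
}

# Copy the en-US block, then overwrite only what differs for en-GB.
SET_EN_GB = dict(
    SET_EN_US,
    voiceTags='<voice name="Brian">',
    wHood="bonnet",
    wTrunk="boot",
    wTire="tyre",
)

# Both blocks should define exactly the same variable names.
assert SET_EN_US.keys() == SET_EN_GB.keys()
print(SET_EN_GB)
```

Note that endTags is the one value that stays the same, since the closing voice tag does not depend on which voice was opened.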

Configure the SPEAK Block

Position a SPEAK block on the canvas and connect to the two SET blocks as shown below.

Position and connect SPEAK Block

Expand the SPEAK block and add the following “raw” text.

When preparing for a long trip, be sure to open the hood and check the oil, and open the trunk and make sure your spare tire is properly inflated.

Speak Block with “raw” text

In the next step we are going to add the variables for the voice tags, and substitute in variables for several words which are unique for each region.

More specifically, we are going to add the variables in the various positions as shown by the red circles in the image below. The voiceTags and endTags will be at the very beginning and end of the text.  Likewise, we will substitute the variables wHood, wTrunk and wTire for their respective words.

SPEAK Block with red markers which indicate variable placements

Add the variables as shown below.

SPEAK Block updated with variables.
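
To see what the finished SPEAK block produces, here is a hedged Python imitation of the substitution. Voiceflow performs this step itself at run time; the curly-brace placeholders below are Python's str.format syntax, used only to stand in for the variables placed in the SPEAK block, and the names SPEAK_TEMPLATE and render_speak are hypothetical.

```python
# The "raw" text with a placeholder at each variable position.
SPEAK_TEMPLATE = (
    "{voiceTags}When preparing for a long trip, be sure to open the "
    "{wHood} and check the oil, and open the {wTrunk} and make sure "
    "your spare {wTire} is properly inflated.{endTags}"
)

# Values as set by the en-GB SET block.
EN_GB_VALUES = {
    "voiceTags": '<voice name="Brian">',
    "endTags": "</voice>",
    "wHood": "bonnet",
    "wTrunk": "boot",
    "wTire": "tyre",
}


def render_speak(values: dict) -> str:
    """Fill the template with one locale's variable values."""
    return SPEAK_TEMPLATE.format(**values)


spoken = render_speak(EN_GB_VALUES)
print(spoken)
```

Swapping in the en-US values changes the voice and the three words while the template itself stays untouched.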

Once the SPEAK block is configured, collapse it. The canvas should now look something like this:

Project Canvas. Demo skill is ready for testing

Congratulations, the demo skill is now ready for testing!

Part 2 – Test on the Alexa Developer Console

Test the en-US Configuration

Press Upload to Alexa to prepare the demo skill for testing, and then access the ADC (Alexa Developer Console) test tab.

Be sure the region selection is ENGLISH (US)

Run the test, using the invocation name you provided. The output should be similar to the following:

Test en-US Configuration

While listening, you should hear Joanna’s voice, using an en-US accent, and the US versions of the words for trunk, hood and tire. Audio sample:

Joanna, en-US

If you view the JSON output in the testing pane, you can visually confirm that Joanna’s voice was used.

Joanna, JSON Output, en-US

Testing is successful.

Test the en-GB Configuration

For this next test, change the region selection to ENGLISH (UK).

Run the test again. The output should be similar to the following.  In the lower blue box, the alternative words appear in the spoken text.

Test en-GB configuration

While listening, you should hear Brian’s voice, using a British accent, and the words boot, bonnet and tyre. Audio sample:

Brian, en-GB

If you view the JSON output in the testing pane, you can visually confirm that Brian’s voice was used.
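
If you prefer to check the JSON programmatically rather than by eye, the following Python sketch pulls the voice name out of the SSML. The sample structure is an assumption modelled on a typical Alexa response (SSML under body, response, outputSpeech); it is abbreviated and hand-written, not copied from the console, so adjust the keys to match what your testing pane shows. The function name voice_names is hypothetical.

```python
import json
import re

# Hand-written, abbreviated stand-in for the console's JSON output.
sample_output = json.dumps({
    "body": {
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": '<speak><voice name="Brian">open the bonnet'
                        "</voice></speak>",
            }
        }
    }
})


def voice_names(json_text: str) -> list:
    """Return every voice name found in the response's SSML."""
    data = json.loads(json_text)
    ssml = data["body"]["response"]["outputSpeech"]["ssml"]
    return re.findall(r'<voice name="([^"]+)">', ssml)


voices = voice_names(sample_output)
print(voices)
```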

Testing is successful. Taken together, the two tests show that the locale variable successfully triggers the use of different voices and spoken text.

Conclusion

This tutorial introduced you to the new locale variable which is available in Voiceflow. With this variable, it is possible to design and develop multi-region skills which take into consideration a user’s region and language preferences.

The benefit of using the locale variable is being able to provide tailored interactions for users among different regions. This will provide more engaging and enriched experiences when using your skill.

Thank you for reading, and happy skill building!

Credits

Header photo by freestocks.org on Unsplash