Amazon Alexa Skills Development Tutorial: How to Enhance Alexa with SSML using the Voice and Tone Simulator


Updates:

January 30th, 2019: Updated to reflect use with the Voiceflow skill building tool.

Additional updates coming soon regarding the URL changes for the Amazon Sound Library, and to discuss the Alexa Polly voice SSML tags.


Level: Beginner

This tutorial is designed for Alexa Voice Skill Designers and Developers, and no coding experience is required.


Introduction

The Amazon Alexa toolset provides a wide variety of SSML (Speech Synthesis Markup Language) tags for enhancing speech and adding audio. This tutorial illustrates a way to quickly apply and test SSML tags for multi-character conversations.

Applying SSML to skill content can improve the user’s experience, and in general, it is an iterative process. Amazon’s Voice & Tone simulator is an excellent tool for quickly enhancing and testing content with SSML.

The Voice & Tone simulator looks like this:

Voice & Tone simulator

What is SSML?

Think of SSML as being similar to HTML. As you probably know, HTML tags are used to “mark up” or change the appearance of text on web pages.

Likewise, SSML tags are used to change how voices sound. They can change how a word, phrase or an entire conversation sounds by tagging scripted text. They are also used to help add audio content such as short, pre-recorded music clips and sound effects.
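
As a minimal sketch of what tagged text looks like, the snippet below wraps two sentences in a speak element, adds a short pause with a break tag, and stresses one word with an emphasis tag. The wording is made up for illustration; the tags themselves are described in the Amazon SSML reference covered later in this tutorial.

```xml
<!-- Annotation (omit when pasting): illustrative wording only -->
<speak>
    Welcome back.
    <break time="500ms"/>
    I am <emphasis level="strong">really</emphasis> glad you are here.
</speak>
```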


What is the Voice & Tone simulator?

The Voice & Tone simulator, which is a part of the Amazon Developer Console Test Simulator, makes it easy to experiment with, edit, and test content with SSML tags.

You can either write text directly into the tool’s window, or copy and paste from drafted scripts.

After editing, you can then copy and paste the updated content back into your project or an updated script document.


Benefits

This tool lets you quickly experiment with and test different SSML tag configurations, test and edit longer blocks of text, and save time when editing and testing text in larger projects. For a Voice Professional, this saves time and, as you may find, can even enhance the creative process through experimentation.


Prerequisites

This tutorial assumes little or no experience developing Alexa skills. However, the following are needed:

  • An Amazon Developer account. This is needed in order to access the Voice & Tone Simulator.
  • An existing skill you can work with. This is needed in order to access the Voice & Tone simulator screen within the Amazon Developer Console.

If you don’t have these yet, they are covered below.


Task Summary

This tutorial is organized into two parts. The first is a series of tasks for finding and using the Voice & Tone simulator. The second part is to practice experimenting with SSML tags and build a mini conversational scene.

PART 1 – How to find and use the Voice & Tone simulator

  • TASK 1: Log into your Amazon Developer Account and Access the Alexa Skills Console.
  • TASK 2: Create Your First Skill using Voiceflow (if needed).
  • TASK 3: Select a Skill.
  • TASK 4: Go to the Test tab.
  • TASK 5: Go to the Voice & Tone tab.
  • TASK 6: Optional: Change the Region and Language option.
  • TASK 7: Open and review the Amazon SSML Reference document.
  • TASK 8: Optional: Open and review an SSML Copy and Paste Worksheet.

PART 2 – Practice Experimenting with SSML Tags

  • TASK 9: Add Prosody SSML and adjust the pitch.
  • TASK 10: Add a line break and re-test.
  • TASK 11: Add a second voice, and test them side by side.
  • TASK 12: Add a third character, add conversational text, and add SSML for breaks (pauses).
  • TASK 13: Add a Speechcon using a Say-As SSML Interjection tag.
  • TASK 14: Add a sound from the Sound Library.
  • TASK 15: Add some more sounds.
  • TASK 16: Add a Phoneme IPA SSML tag and adjust the pronunciation of a word.

TASK 1: Log into your Amazon Developer Account and Access the Alexa Skills Console.

Log into your account. If you do not yet have an account, you can follow the link below and quickly set one up.

Amazon Developer Services and Technologies

Once you are in your account, navigate to the “Alexa Skills Console.” It should look something like this:

Alexa Skills Console

A skill, any skill, is needed to access the Voice & Tone simulator. As you can see, I have a dummy skill named “sound check” in the grid section.

Important note: The Voice & Tone simulator works independently of the skill. Any skill can be used to help access the simulator, and any content can be independently tested.

If you have just set up your Developer account for the first time, and have not started your first skill yet, proceed to the next task to set up an account on Voiceflow and develop a basic skill. If you already have a skill, you can proceed to TASK 3.


TASK 2: Create your first skill using Voiceflow (if needed).

Voiceflow is a prototyping and development platform which makes it easy to create Amazon Alexa skills with a visual drag-and-drop interface. Voiceflow is also a certified Amazon Business Partner for creating Alexa skills.

Voiceflow Homepage

Review some of the documentation and tutorials.

Voiceflow University

Voiceflow Basics Tutorial Series

To proceed with this tutorial, you will need to initiate a project, but you will NOT need to publish your new skill to the general public.


TASK 3: Select a Skill, any Skill.

In the Amazon Alexa Skills Console, select a skill. You can either click on your skill name or the “Edit” link in the Actions column at the far right of the screen.

Alexa Skills Console

Any skill can be selected, because the Voice & Tone simulator can be used independently from the skill.  Upon selecting the skill, you will arrive in the Build screen.

Alexa Build Screen

TASK 4: Go to the Test tab.

In the upper left corner of the Build screen, you will see a tab link that says “Test.”

Alexa Skills Console – Build Screen

Click the “Test” link. This will bring you to the Alexa Test Simulator screen.

Alexa Test Simulator

NOTE: If the Alexa Test Simulator is not enabled, click the slider button in the upper left corner to enable testing for the skill.


TASK 5: Go to the Voice & Tone tab.

Currently the “Alexa Simulator” tab is selected in the upper portion of the left pane.

Alexa Test Simulator

Click on the “Voice & Tone” link to open the Voice and Tone simulator window.

Voice & Tone simulator – default view

As you can see, there is some default text already in the window.  Click on the blue Play button at the bottom of the pane to hear what the default text sounds like. 
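
The exact default text may vary, so the following is only a sketch of the kind of content you may see there: a plain sentence followed by a sentence wrapped in the amazon:effect tag with the whispered effect, which TASK 9 refers back to. The sentences are placeholders, not a copy of the default text.

```xml
<!-- Annotation (omit when pasting): placeholder sentences; the actual default text may differ -->
<speak>
    This sentence is in my normal voice.
    <amazon:effect name="whispered">This sentence is whispered.</amazon:effect>
</speak>
```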


TASK 6: Optional: Change the Region and Language Option.

Notice the region and language selection in the lower left part of the pane. You can change the Region and Language if you desire, and press Play again, to hear what some of the different regional Alexa voices sound like.


TASK 7: Open and review the Amazon SSML Reference document.

Here is a link to the Amazon SSML Reference document, for reviewing what is available, reviewing parameter options, and copying and pasting the SSML tag syntax.

Speech Synthesis Markup Language (SSML) Reference

Amazon SSML Reference for Alexa Skills

There is detailed information regarding the parameters for each type of SSML tag. The screenshot below shows some of the parameters and syntax for the Prosody SSML speech tags for pitch, volume and rate. This page also has links at the bottom for the Speechcons and Audio Library pages.

Amazon SSML Reference for Alexa Skills – Prosody

TASK 8: Optional: Open and review an SSML Copy and Paste Worksheet.

The following web page has a handy worksheet that will help facilitate quick copying and pasting of SSML tags. The web page also has a .txt file you can download.

Amazon Alexa Skills Development: SSML Copy and Paste Worksheet 1


PART 2: Practice Experimenting with SSML Tags

The next set of tasks provides a way for new Alexa Voice Skill Developers to experiment with a variety of SSML tags and hear what is possible.

IMPORTANT:

When copying and pasting content from the Voice & Tone simulator, some platforms will not need the <speak></speak> tags. Voiceflow, for example, will not need these. Check your reference documentation for the platform you are using to determine whether these are needed or not.

Likewise, when copying and pasting existing content from a document or a platform such as Voiceflow, be sure to paste the text in between the existing speak tags in the Voice & Tone simulator.
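
To illustrate the difference, here is a hedged sketch of the same line in both forms. The first form, with the outer speak tags, is what the Voice & Tone simulator expects. The second form, without them, is what a platform that supplies the speak tags itself (such as Voiceflow, per the note above) would take. The sentence is a made-up example.

```xml
<!-- Form 1: for the Voice & Tone simulator, content sits between the speak tags -->
<speak>
    <prosody pitch="low">Who goes there?</prosody>
</speak>

<!-- Form 2: for a platform that adds the speak tags for you, paste only the inner content -->
<prosody pitch="low">Who goes there?</prosody>
```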

Finally, be very careful with word processors that use smart quotes. Quote marks and apostrophes come in “curly” and “straight” forms, and you want to be using the straight ones. Often, smart quotes will fail when read by Alexa.

A good practice is to make sure smart quotes are turned off in your word processor, or to use a more basic text editor such as Notepad or TextEdit.
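
A quick way to spot the problem is to compare the quote marks around attribute values. In the sketch below (the example wording is an assumption), the tag uses straight quotes, and the first comment shows the curly form that a word processor might substitute and that should be avoided.

```xml
<!-- Avoid (curly quotes from a word processor): <break time=“300ms”/> -->
<!-- Use straight quotes and a straight apostrophe, as below -->
<speak>
    Let's wait a moment. <break time="300ms"/> Okay, let's go.
</speak>
```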


TASK 9: Add Prosody SSML and adjust the pitch.

In TASK 5, you heard what Alexa’s natural, un-modified voice sounds like, along with the “whispered” SSML effect. In this exercise we will add a Prosody-pitch SSML tag and adjust her voice.

In the Amazon SSML Reference document we opened in TASK 7, scroll down to the Prosody section.

Amazon SSML Reference for Alexa Skills – Prosody

In this section, you can read about the options for Prosody-pitch. The pitch can be adjusted using parameters such as x-low, low, medium, etc., as well as by percentage.

Syntax: Practice copying the sample prosody tags from the black box portion of the Amazon reference document for the pitch, and then pasting in between the speak tags in the Voice & Tone simulator.

Text Sample with Prosody – pitch SSML
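
Since the screenshot is not reproduced here, the following is a rough sketch of what the edited text might look like at this point: one sentence in Alexa's un-modified voice, followed by one sentence wrapped in a prosody tag with a pitch value. The sentences and the x-high value are illustrative assumptions, not the exact text from the screenshot.

```xml
<!-- Annotation (omit when pasting): illustrative sentences; pitch also accepts x-low, low, medium, high -->
<speak>
    This is my normal voice.
    <prosody pitch="x-high">This is my voice with a higher pitch.</prosody>
</speak>
```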

Press the blue Play button at the bottom of the screen to hear what Alexa’s un-modified voice sounds like in comparison to her voice with the pitch adjustment.

As you develop your skill, you can now start considering whether adjusting the pitch, and by how much, will sound better. These two different-sounding voices could even be two separate characters.

Also try adjusting the pitch using some of the other options and hear what kind of range you can achieve.
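
For instance, a pitch can also be given as a relative percentage. The sketch below assumes the ranges listed in the Amazon SSML reference (roughly up to +50% and down to about -33%); check the reference for the current limits. The sentences are placeholders.

```xml
<!-- Annotation (omit when pasting): placeholder sentences; percentage limits are an assumption -->
<speak>
    <prosody pitch="+25%">This line is raised by twenty-five percent.</prosody>
    <prosody pitch="-20%">This line is lowered by twenty percent.</prosody>
</speak>
```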


TASK 10: Add a line break and re-test.

You will notice that the Voice & Tone pane is a bit narrow, and not easy to widen.

One workaround is to carefully insert a temporary line break in the text so that it wraps. Give this a try and then re-test to confirm nothing was broken.

Text Sample with Prosody – pitch SSML – with text wrapped
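
As a sketch of the idea (the screenshot is not reproduced here, so the wording is illustrative), the line break below falls between words inside the prosody tag, not inside the tag's angle brackets or attribute values.

```xml
<!-- Annotation (omit when pasting): illustrative wording; the break sits between words, not inside a tag -->
<speak>
    This is my normal voice.
    <prosody pitch="x-high">This is my voice with a higher pitch,
    wrapped onto a second line.</prosody>
</speak>
```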

Note that you might have problems putting in a line break to wrap text for an SSML tag with audio, as this can cause the Voice & Tone simulator to not process it correctly.  Also, it is recommended that the line breaks be removed after testing to avoid issues downstream.


TASK 11: Add a second voice, and test them side by side.


The Voice & Tone simulator is handy for testing multiple voices, as if they are characters in an interactive story or game, and making adjustments so they sound good together.

For our example, we will pretend this is a bit of conversation in an Escape Room style adventure game. Perhaps Alexa’s un-modified voice is the narrator, but in this section of script we want to have two distinct characters with two different additional voices.

Using the SSML reference document to copy and paste the SSML tags, modify the text as shown below and then press Play to test.

Example “Screen Test” with character voices
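
The screenshot is not reproduced here, so the following is a sketch of what such a screen test might look like, assuming the narrator uses Alexa's un-modified voice and the two characters use the x-high and x-low pitch values that appear in the later tasks. The dialogue lines are placeholders.

```xml
<!-- Annotation (omit when pasting): placeholder dialogue; characters differ only by pitch -->
<speak>
    You are locked in a room with two doors.
    <prosody pitch="x-high">I am the first character.</prosody>
    <prosody pitch="x-low">And I am the second character.</prosody>
</speak>
```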

Think of this particular task as being an easy way to do “screen tests,” so to speak, of different characters, and how they sound when they interact with each other. Feel free to experiment with some of the other SSML tags for Prosody (rate, volume), and Emphasis.
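
As a starting point for that experimentation, the sketch below applies a rate, a volume, and an emphasis level to placeholder lines. The attribute values shown (slow, loud, strong) are among those listed in the Amazon SSML reference; the sentences are made up for illustration.

```xml
<!-- Annotation (omit when pasting): placeholder lines showing rate, volume, and emphasis -->
<speak>
    <prosody rate="slow">I am speaking slowly.</prosody>
    <prosody volume="loud">I am speaking loudly.</prosody>
    I <emphasis level="strong">really</emphasis> mean it.
</speak>
```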


TASK 12: Add a third character, add conversational text, and add SSML for breaks (pauses).

For this next task, let’s make the text more like a conversation. Edit the text in the Voice & Tone simulator in the manner as shown below, or however you like.

Also we want to add some breaks, or pauses, in between their lines. For these, scroll to the Break section in the SSML Reference document to copy and paste the Break SSML tags, and edit the parameters.

Press Play and listen to how the conversation is flowing.

Three characters having a conversation
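
The screenshot is not reproduced here. As a sketch, the opening lines of the conversation used in the later tasks show the pattern: the narrator in the un-modified voice, two characters distinguished by pitch, and a 300 millisecond break after each line. The outer speak tags are added here on the assumption that the text replaces everything in the simulator window.

```xml
<!-- Annotation (omit when pasting): opening lines of this tutorial's example conversation, speak tags added -->
<speak>
    How do we escape from this room?  Are we
    going to die? <break time="300ms"/>
    <prosody pitch="x-high"> No, we are not going
    to die. Let's open that door with the skull
    on it.</prosody><break time="300ms"/>
    <prosody pitch="x-low"> Are you nuts?  No,
    let's open the other door, the one with the
    cute little bird looking at us through the
    little window. </prosody><break time="300ms"/>
</speak>
```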

TASK 13: Add a Speechcon using a Say-As SSML Interjection tag.

Amazon provides a variety of Speechcons that can be easily applied as SSML tags. These are commonly used words and short phrases that have an extra amount of emphasis when used. Alexa can recite these phrases more expressively with the Speechcon SSML tags.

They are available in multiple languages, and the Amazon reference documentation also includes sound snippets to try them out before adding them to your skill.

Here is a link to the English (US) Speechcons:

Speechcon Reference (Interjections): English (US)

Additional links are provided at the bottom of the referenced document for other regions and languages.

According to the documentation, this is what the SSML syntax looks like:

Speechcon SSML Syntax
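
The screenshot is not reproduced here. As a sketch of the general pattern, a Speechcon is written by wrapping the phrase in a say-as tag whose interpret-as attribute is set to interjection. The surrounding sentence is a placeholder, and the phrase “abracadabra” is assumed to be in the English (US) Speechcon list; verify it against the reference page.

```xml
<!-- Annotation (omit when pasting): placeholder sentence; confirm the phrase in the Speechcon reference -->
<speak>
    Here is an example of a Speechcon.
    <say-as interpret-as="interjection">abracadabra</say-as>
</speak>
```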

For our example, let’s add the “good grief” Speechcon using the SSML say-as interpret-as = interjection tag. First, find it in the Amazon reference document for Speechcons. This is what it looks like:

English (US) Speechcon for “good grief”

Next, let’s add the Speechcon to our example conversation in the Voice & Tone window. Carefully copy and paste the syntax model from the top of the reference page, and then modify the tag to read “good grief”.

It has been added to line 16 in the image below. Also a temporary line break was added, wrapping some of the text to line 17 to make it more visible.

Example conversation with Speechcon added

Press Play and listen to how everything sounds together. Feel free to experiment with other Speechcons. Perhaps there is a better one for this particular scene? Review some of the other available Speechcons and try some out. 


TASK 14: Add a sound from the Sound Library.

Adding sound using SSML tags for short audio clips can really punch up a scene. The Amazon Alexa reference documentation provides both the SSML methodology, plus a library of audio clips of sound effects which you can use.

Here is a link to the main page of the Amazon Alexa Sound Library. As you will see, there are many categories.

Alexa Skills Kit Sound Library

For this task, open up this document, and then scroll down and open up the page for “Foley Sounds.”

Alexa Sound Library – Foley Sounds Page

Scroll down and find the sound for “wooden door (1)”. Here, you can click the player button to hear what it sounds like prior to adding it to your content in the Voice & Tone simulator.

Alexa Sound Library – Foley Sounds – wooden door (1)
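
For reference, an Audio SSML tag for this sound takes roughly the following shape. The URL is the one used for the wooden door clip in the example conversation later in this tutorial; as noted in the Updates section, Sound Library URLs have been changing, so copy the current tag from the Sound Library page rather than relying on this sketch. The surrounding sentence is a placeholder.

```xml
<!-- Annotation (omit when pasting): placeholder sentence; URL as used in this tutorial, may be outdated -->
<speak>
    Listen, something is at the door.
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_wooden_door_01.mp3'/>
</speak>
```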

Next, carefully copy and paste the Audio SSML tag and add it to our example conversation in line 17 in the Voice and Tone simulator window (Note: I took out the word wrap in our previous step for readability). The example conversation should look something like this:

Voice & Tone simulator conversation with a sound clip added

Press Play to hear how the conversation and the audio clip sound. Feel free to try substituting other sound clips.

NOTE: Audio clips should not exceed 240 seconds, nor should there be more than five audio clips in between user interactions. If you have more than five, Voiceflow provides a COMBINE block which can be used to bypass this limit. If you are working with audio clips longer than 240 seconds, you will need to implement a STREAM Block.


TASK 15: Add some more sounds.

In the example, more sounds are added as shown below, including wings flapping and several crow caws from the animals sound library. This is what the text looks like:

How do we escape from this room?  Are we
    going to die? <break time="300ms"/>
    <prosody pitch="x-high"> No, we are not going
    to die. Let's open that door with the skull
    on it.</prosody><break time="300ms"/>
    <prosody pitch="x-low"> Are you nuts?  No,
    let's open the other door, the one with the
    cute little bird looking at us through the
    little window. </prosody><break time="300ms"/>
    <prosody pitch="x-high"> No way are we opening
    that door with that bird.  That is a
    trap, for sure.</prosody><break time="300ms"/>
    I think both doors are deadly traps.  Maybe
    we should just wait here for awhile.
    <say-as interpret-as="interjection">good grief</say-as>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_wooden_door_01.mp3'/>
    <prosody pitch="x-low">Hey, that bird is 
    opening the door!</prosody><break time="300ms"/>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_wings_flap_fast_01.mp3'/>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/animals/amzn_sfx_crow_caw_1x_01.mp3'/>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/animals/amzn_sfx_crow_caw_1x_01.mp3'/>
    That bird is attacking us! We are going to
    die!
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/animals/amzn_sfx_crow_caw_1x_02.mp3'/>
    <prosody pitch="x-high">Run through the door!
    </prosody><break time="300ms"/>

For this exercise, either find the additional sounds in the sound library, or copy and paste the SSML tags from above into the Voice & Tone simulator. Press Play, and listen to how it sounds.

Voice & Tone simulator conversation with additional sound clips

TASK 16: Add a Phoneme IPA SSML tag and adjust the pronunciation of a word.

Another powerful tool is the Phoneme SSML tag. This leverages the International Phonetic Alphabet (IPA) to adjust the pronunciation of individual words.

Phonemes can be used for a variety of reasons. One is to correct the pronunciation. For example, Alexa might pronounce “bass” as it is said in the context of music and a bass guitar, when the intended context is bass fishing.

Another reason might be to adjust for regional or even cultural differences, depending on the context of your skill. I say “pecan;” you say “pecan.”

The Phoneme for IPA can be used to precisely spell out uncommon words, such as the names of people, regional cuisines and places.
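
As a sketch of the first case, the two lines below spell out both pronunciations of “bass” in IPA: one rhyming with “pass” for the fish and one rhyming with “face” for the instrument. The IPA spellings and sentences here are illustrative assumptions; verify them by ear in the simulator.

```xml
<!-- Annotation (omit when pasting): illustrative IPA spellings; test by ear before using -->
<speak>
    I caught a <phoneme alphabet="ipa" ph="bæs">bass</phoneme> at the lake.
    Then I played my <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar.
</speak>
```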

Jeff Blankenburg’s excellent article provides more insight on the use of phonemes.

Alexa Blogs: How to Use Phonemes to Change Alexa’s Pronunciation, by Jeff Blankenburg, March 6, 2018

The Amazon SSML documentation provides the syntax for using the Phoneme SSML tag:

Speech Synthesis Markup Language (SSML) Reference – Phoneme

Amazon Alexa SSML Reference – Phoneme Syntax
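
The screenshot is not reproduced here. As a sketch, the two IPA spellings of pecan, taken from the phoneme tags used in the lines added in this task (without the trailing “s”), look like this. The surrounding words are placeholders.

```xml
<!-- Annotation (omit when pasting): IPA spellings from this tutorial's example; surrounding words are placeholders -->
<speak>
    You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
    I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak>
```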

The syntax is in the black box, and illustrates two different IPA spellings of pecan. Let’s add them both to our example conversation. We will have one character say pecan in a different way than the other character.

Let’s add the following lines, and also add an ‘s’ at the end of pecan, to form the word “pecans”.

<prosody pitch="x-high"> I know!  Give those <phoneme alphabet="ipa" ph="pɪˈkɑːns">pecans</phoneme> to the bird!</prosody><break time="300ms"/>
<prosody pitch="x-low"> Here you go, pretty bird, have some yummy <phoneme alphabet="ipa" ph="ˈpi.kæns">pecans</phoneme><break time="300ms"/>Oh cool, the bird is eating them! It worked!  Now what?</prosody><break time="300ms"/>

These are added to lines 25 and 26 of our example conversation. At this point, our example looks like this when viewed in the Voice & Tone simulator:

Voice & Tone simulator conversation with phoneme SSML added

Here is the full text, which you can copy and paste into the simulator:

How do we escape from this room?  Are we
    going to die? <break time="300ms"/>
    <prosody pitch="x-high"> No, we are not going
    to die. Let's open that door with the skull
    on it.</prosody><break time="300ms"/>
    <prosody pitch="x-low"> Are you nuts?  No,
    let's open the other door, the one with the
    cute little bird looking at us through the
    little window. </prosody><break time="300ms"/>
    <prosody pitch="x-high"> No way are we opening
    that door with that bird.  That is a
    trap, for sure.</prosody><break time="300ms"/>
    I think both doors are deadly traps.  Maybe
    we should just wait here for awhile.
    <say-as interpret-as="interjection">good grief</say-as>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_wooden_door_01.mp3'/>
    <prosody pitch="x-low">Hey, that bird is 
    opening the door!</prosody><break time="300ms"/>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_wings_flap_fast_01.mp3'/>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/animals/amzn_sfx_crow_caw_1x_01.mp3'/>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/animals/amzn_sfx_crow_caw_1x_01.mp3'/>
    That bird is attacking us! We are going to die!
    <prosody pitch="x-high"> I know!  
Give those <phoneme alphabet="ipa" ph="pɪˈkɑːns">pecans</phoneme> to the bird!
</prosody><break time="300ms"/>
    <prosody pitch="x-low"> Here you go, pretty bird, have some yummy 
<phoneme alphabet="ipa" ph="ˈpi.kæns">pecans</phoneme><break time="300ms"/>
Oh cool, the bird is eating them! It worked!  
Now what?</prosody><break time="300ms"/>
    <audio src='https://s3.amazonaws.com/ask-soundlibrary/animals/amzn_sfx_crow_caw_1x_02.mp3'/>
    <prosody pitch="x-high">Run through the door!
    </prosody><break time="300ms"/>

Press play, and listen to how it sounds.


Conclusion

During this tutorial we found and began using the Voice & Tone simulator, and experimented with a variety of SSML tags which can be used to enhance an Alexa skill’s content.

The next step would be to copy and paste your enhanced content back into your project, upload your project to Alexa, and then test further using the standard Test Simulator, with an Alexa-enabled device, or with an Alexa app.

You might also proceed to formal beta testing with a group of people (highly recommended).

As you test and listen to how your skill sounds, and receive feedback from beta testers or users, you can return to the Voice & Tone simulator to make additional edits using SSML and continue to enhance your Alexa skill.

Thank you for reading!


Image source: Pexels

