Amazon Alexa Skills Development: SSML Copy and Paste Worksheet 9: Twelve Genderless Polly Voices

Blog Post Header Image

This article provides a worksheet with twelve Speech Synthesis Markup Language (SSML) “formulas” for genderless (gender-neutral) synthetic voices. 

The first four can be used directly in Alexa skills for Text-to-Speech (TTS) using the Alexa Skills Kit (ASK) and the various skill building tools, while the other eight can be used to make audio recordings using Amazon Polly.

Audio samples and copy/paste SSML formulas which you can edit are included.

Voices Preview

1. Jordan

Jordan’s voice

2. Ash

Ash’s voice

3. Charlie

Charlie’s voice

4. Jesse

Jesse’s voice

5. Finley

Finley’s voice

6. Justice

Justice’s voice

7. Salem

Salem’s voice

8. Campbell

Campbell’s voice

9. Honor

Honor’s voice

10. Robin

Robin’s voice

11. Frankie

Frankie’s voice

12. Sidney

Sidney’s voice

Twelve SSML Formulas

The first four SSML formulas can be used in your custom Alexa skills for Text-to-Speech (TTS) conversion when using the Alexa Skills Kit (ASK) or skill building tools such as Voiceflow Creator. These formulas take advantage of the Alexa Polly Voice SSML tags.

The SSML for voices five through twelve is for use with AWS Amazon Polly. You can create MP3 files and use them in your skills. Voices one through four can be replicated with Amazon Polly as well.

Amazon Polly formulas are included because the service provides several additional SSML effects which are not available in ASK. The most notable is timbre control (vocal tract length), which provides substantial control when approximating genderless-sounding voices.

How genderless voices can be included in your skills

There are several ways genderless voices can be used in custom skills:

1. If a skill just has one narrator, a genderless voice can be an alternative to either Alexa’s default voice or the Polly unmodified voices.

2. If a skill is an interactive story, a game, or another multi-character skill, one or more characters can be designed to be genderless. This tutorial provides a technique for managing character voices with variables using Voiceflow:

How to Efficiently Manage Character Voices. Part 1: Variables

3. Allow users to choose a voice during their first session. For example, offer male and female voice options along with one or more genderless voices. This tutorial shows how to configure user preferences using Voiceflow:

How to Efficiently Manage Character Voices. Part 2: User Preferences

Persona Design

Something to bear in mind is that these genderless voices are just part of a character’s persona.

Other aspects, such as a character’s choice of words when interacting with users, how they interact with other characters (including pronouns), and how they behave are design considerations as well.

What is SSML?

If you are not familiar with Speech Synthesis Markup Language (SSML), it is a set of tags which can be applied to text being converted to speech.

As you probably know, HTML can alter the appearance of text on a page. For example, HTML can be used to specify a font. HTML can make text appear bold, italicised, have different colors, and display in different sizes.

Likewise, SSML can alter how a voice sounds when text is recited. The SSML syntax consists of tags, which are added to text in a manner similar to HTML. Both the Alexa Skills Kit (ASK) and Amazon Polly support a variety of SSML tag options.

A starting point is picking a “base” voice. The base voices are nicknamed “Polly voices.” In Alexa skills, you can use SSML voice tags to utilize the various Amazon Polly voices.

Once a Polly voice is selected, a common SSML alteration for voice is Prosody. Prosody adjustments generally include refining a voice’s pitch, rate of speech, and relative loudness. There are additional SSML tags that can enable a variety of effects as well.

SSML Reference Documentation

These Amazon references provide detailed explanations of how the SSML tags work, additional SSML tag options, syntax and the parameter ranges.

Speech Synthesis Markup Language (SSML) Reference: Alexa Skills Kit (ASK)

SSML Tags Supported by Amazon Polly

WORKSHEET FOR GENDERLESS VOICE SSML FORMULAS

This article has the copy/paste SSML formulas in the next sections. However, for your convenience, the SSML formulas are also in this downloadable .txt document (Right Click / Save Link As…).

Alexa SSML Copy And Paste Genderless Voices Worksheet V01

PRO TIP: Be careful when working with single and double quotes when copying and pasting between your word processor and your skill user interface. Quote marks should be straight vertical characters. It is best to turn off smart quote options in your word processor, and be aware that curly and angled quote marks may fail.

PRO TIP: Be careful when copying and pasting SSML tags into a word processor if they are at the beginning of a sentence. Some word processors will auto-capitalize the first letter. For example, your word processor might change <prosody> to <Prosody> and the latter might fail. If possible, turn off auto-capitalization in your word processor.
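If you cannot turn these word processor options off, one option is to clean the text before pasting it into your skill. Here is a minimal Python sketch of that idea. The function name clean_ssml and the list of quote characters are my own assumptions, and the sketch only touches text inside the angle brackets of a tag, so apostrophes in the spoken text are left alone.

```python
import re

# Typographic ("smart") quotes mapped to straight ASCII quotes.
# This list is an assumption; extend it if your word processor uses others.
QUOTE_TABLE = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

# Anything between "<" and ">" is treated as a tag.
TAG_PATTERN = re.compile(r"<[^<>]+>")
# The tag name directly after "<" or "</", e.g. "Prosody" or "amazon:effect".
TAG_NAME_PATTERN = re.compile(r"^(</?)([A-Za-z][\w:-]*)")


def _clean_tag(match):
    """Straighten the quotes inside one tag and lowercase its name."""
    tag = match.group(0).translate(QUOTE_TABLE)
    return TAG_NAME_PATTERN.sub(
        lambda m: m.group(1) + m.group(2).lower(), tag
    )


def clean_ssml(ssml):
    """Repair smart quotes and auto-capitalized tag names inside SSML tags."""
    return TAG_PATTERN.sub(_clean_tag, ssml)


if __name__ == "__main__":
    damaged = "<Prosody pitch=\u201c-27%\u201d>Call me Ishmael.</Prosody>"
    print(clean_ssml(damaged))
```

Attribute values such as en-US are not changed, because only the tag name is lowercased.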

Sample Text

The sample text is from Herman Melville’s Moby-Dick. A few breaks have been added to help with pacing.

Call me Ishmael. <break time="300ms"/> Some years ago 
<break time="300ms"/> never mind how long precisely 
<break time="300ms"/> having little or no money in my purse, and 
nothing particular to interest me on shore, I thought I would sail 
about a little <break time="100ms"/> and see the watery part of 
the world.

Genderless Voice Names

Nicknames have been provided for each genderless voice to help with distinguishing them from the names of their base Polly voices.

Four Alexa Voices – ASK/CLI and Skill Building Platforms

The SSML tags used are Voice, Lang, and Prosody (Pitch, Rate and Volume).

Three Prosody tags are combined into one larger tag to minimize nesting of SSML tags. Example opening and closing tags:

<prosody pitch="-27%" rate="95%" volume="+0dB">

</prosody>

A parameter for volume is included as you may need to adjust a voice’s volume relative to other voices and sound effects.
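If you generate skill responses in code, the combined Prosody tag can be built from its three parameters. This is a small Python sketch; the function name with_prosody and its default values are hypothetical and are not part of ASK.

```python
def with_prosody(text, pitch="+0%", rate="100%", volume="+0dB"):
    """Wrap text in a single prosody tag carrying pitch, rate and volume.

    One combined tag is used instead of three nested prosody tags.
    The text is inserted as-is, so any SSML inside it (such as break
    tags) is preserved.
    """
    opening = f'<prosody pitch="{pitch}" rate="{rate}" volume="{volume}">'
    return f"{opening}{text}</prosody>"


if __name__ == "__main__":
    # The prosody values used in the example tag above.
    print(with_prosody("Call me Ishmael.", pitch="-27%", rate="95%"))
```

Keeping volume as a parameter makes it easy to balance one voice against other voices and sound effects later.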

SSML tags which are not included are Emphasis and Whispering. However, these are worth working with when refining or designing new voices.

1. Jordan

Jordan’s voice

Base Voice: English, US: Joanna, Female

Opening and closing SSML Tags:

<voice name="Joanna"><lang xml:lang="en-US"><prosody pitch="-27%" 
rate="95%" volume="+0dB">

</prosody></lang></voice>

If you are working with ASK/CLI, you may need opening and closing Speak tags to wrap around everything. Skill builders such as Voiceflow usually don’t require developers to include them, as they are added behind the scenes by the UI. If you need them, the Speak tags are as follows:

<speak>
</speak>
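Since some tools add the Speak tags for you and others do not, a small helper can add them only when they are missing. This is a sketch in Python with a hypothetical function name, ensure_speak. It assumes the Speak tag has no attributes.

```python
def ensure_speak(ssml):
    """Return the SSML wrapped in speak tags, without wrapping it twice."""
    stripped = ssml.strip()
    if stripped.startswith("<speak>") and stripped.endswith("</speak>"):
        # Already wrapped, for example when copied from another tool.
        return stripped
    return f"<speak>{stripped}</speak>"


if __name__ == "__main__":
    print(ensure_speak('<voice name="Joanna">Call me Ishmael.</voice>'))
```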

When testing voices, the Alexa Developer Console (ADC) Voice & Tone Simulator works pretty well. If you are not familiar with it, here is a tutorial:

How to Enhance Alexa with SSML using the Voice and Tone Simulator

This is what the SSML tags for Jordan look like in the Voice & Tone Simulator:

Alexa Developer Console Jordan Configuration Screen 1

The red bubbles highlight that we are in the ADC Test section, with Voice & Tone selected. The SSML tags and text we are working with are in the window, and towards the bottom we can see we are working with the English (US) Language/region.

Let’s take a closer look at Jordan’s configuration:

Alexa Developer Console Jordan Configuration Screen 2

Lines 1-3 are the opening SSML tags, and lines 11 and 12 are the closing tags. The content in between these tags is recited as speech. Please note that the window doesn’t auto-expand, and so I added some line breaks for visibility.

This is a handy tool when working with your own text. You can also add refinements such as SSML for phonemes, breaks and other effects to improve how things sound.

The same SSML formula can be used if you wish to generate audio files, such as MP3s, using AWS Amazon Polly. This is what it looks like in the Amazon Polly console:

Amazon Polly Console Jordan Configuration

The SSML tab should be selected. Let’s take a closer look at Jordan’s voice configuration:

Amazon Polly Console Jordan Configuration

My experience has been that the voices sound identical for both ASK and Amazon Polly if the SSML tags are identical. This is something to keep in mind if you are working with passages of static text alongside text that changes frequently.
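If you prefer to generate the MP3 files from a script rather than the console, the AWS SDK for Python (boto3) provides a synthesize_speech call for Amazon Polly. The sketch below rests on several assumptions of mine: the function names polly_request and save_mp3 are hypothetical, the base voice is selected with the VoiceId parameter so the voice tag is left out of the SSML, and the standard engine is requested because, as far as I know, pitch adjustments are not available with neural voices. It also needs boto3 installed and AWS credentials configured.

```python
def polly_request(ssml, voice_id):
    """Build the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": ssml,
        "TextType": "ssml",     # tell Polly the text contains SSML tags
        "VoiceId": voice_id,    # base Polly voice, e.g. "Joanna"
        "OutputFormat": "mp3",
        "Engine": "standard",   # assumption: see the note above
    }


def save_mp3(ssml, voice_id, path):
    """Synthesize the SSML with Amazon Polly and write the audio to path."""
    import boto3  # third-party AWS SDK; imported here so the helper above works without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**polly_request(ssml, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    # Jordan's prosody settings applied to the Joanna base voice.
    jordan = (
        '<speak><prosody pitch="-27%" rate="95%" volume="+0dB">'
        "Call me Ishmael."
        "</prosody></speak>"
    )
    print(polly_request(jordan, "Joanna"))
    # save_mp3(jordan, "Joanna", "jordan.mp3")  # uncomment to call AWS
```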

2. Ash

Ash’s voice

Base Voice: English, US: Salli, Female

Opening and closing SSML Tags:

<voice name="Salli"><lang xml:lang="en-US"><prosody pitch="-33%" 
rate="95%" volume="+0dB">

</prosody></lang></voice>

3. Charlie

Charlie’s voice

Base Voice: English, US: Justin, Male

Opening and closing SSML Tags:

<voice name="Justin"><lang xml:lang="en-US"><prosody pitch="+25%" 
rate="105%" volume="+0dB">

</prosody></lang></voice>

4. Jesse

Jesse’s voice

Base Voice: English, GB: Amy, Female

Opening and closing SSML Tags:

<voice name="Amy"><lang xml:lang="en-GB"><prosody pitch="-33%" 
rate="90%" volume="+0dB">

</prosody></lang></voice>
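If you build skills in code, the four formulas above can be kept in one lookup table so that a character’s nickname is all you need to pick a voice. Here is a Python sketch of that approach. The names AlexaVoice, ALEXA_VOICES and speak_as are hypothetical; the values are copied from the four formulas above.

```python
from typing import NamedTuple


class AlexaVoice(NamedTuple):
    base_voice: str       # Polly voice used in the voice tag
    lang: str             # language/region used in the lang tag
    pitch: str
    rate: str
    volume: str = "+0dB"


# Nickname -> settings, copied from formulas 1 to 4.
ALEXA_VOICES = {
    "Jordan": AlexaVoice("Joanna", "en-US", "-27%", "95%"),
    "Ash": AlexaVoice("Salli", "en-US", "-33%", "95%"),
    "Charlie": AlexaVoice("Justin", "en-US", "+25%", "105%"),
    "Jesse": AlexaVoice("Amy", "en-GB", "-33%", "90%"),
}


def speak_as(nickname, text):
    """Wrap text in the opening and closing tags for the named voice.

    The text is inserted as-is, so break tags inside it are preserved.
    Speak tags are not added here.
    """
    v = ALEXA_VOICES[nickname]
    return (
        f'<voice name="{v.base_voice}"><lang xml:lang="{v.lang}">'
        f'<prosody pitch="{v.pitch}" rate="{v.rate}" volume="{v.volume}">'
        f"{text}</prosody></lang></voice>"
    )


if __name__ == "__main__":
    print(speak_as("Jordan", 'Call me Ishmael. <break time="300ms"/>'))
```

Adding a new voice is then a one-line change to the table rather than another copy of the tags.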

Eight Amazon Polly Voices – Recorded Text-to-Speech

Amazon Polly provides additional SSML tag options in comparison to the available Alexa Polly tags. For genderless voice configuration, the following effects are particularly useful:

A. Dynamic Range Compression:

<amazon:effect name="drc">

This effect increases the midrange volume. Aside from the volume difference, the change is subtle. However, it is worth experimenting with.

B. Speaking Softly:

<amazon:effect phonation="soft">

This is a nice effect for dampening harshness. Again, this is something you can experiment with.

C. Controlling Timbre:

<amazon:effect vocal-tract-length="+15%">

This effect makes the most dramatic difference when working with genderless voices.

The above are included in the SSML formulas. All three effects are combined into one larger tag to minimize nesting of SSML tags. Example opening and closing tags:

<amazon:effect vocal-tract-length="+15%" phonation="soft" name="drc">

</amazon:effect>

SSML tag options which are not included are Emphasis, Automatic Breathing, and Whispering. However, these are also worth considering when refining or designing new voices.

5. Finley

Finley’s voice

Base Voice: English, US: Joanna, Female

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="+15%" phonation="soft" 
name="drc"><prosody pitch="-27%" rate="95%" volume="+0dB">

</prosody></amazon:effect></speak>

Let’s take a closer look at how this voice looks in the Amazon Polly console. As you can see, the SSML tag for amazon:effect is included.

Amazon Polly Console Finley Configuration

6. Justice

Justice’s voice

Base Voice: English, US: Salli, Female

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="+25%" phonation="soft" 
name="drc"><prosody pitch="-33%" rate="105%" volume="+3dB">

</prosody></amazon:effect></speak>

7. Salem

Salem’s voice

Base Voice: English, US: Justin, Male

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="135%" phonation="soft" 
name="drc"><prosody pitch="+5%" rate="100%" volume="+0dB">

</prosody></amazon:effect></speak>

8. Campbell

Campbell’s voice

Base Voice: English, GB: Amy, Female

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="+10%" phonation="soft" 
name="drc"><prosody pitch="-33%" rate="103%" volume="+0dB">

</prosody></amazon:effect></speak>

9. Honor

Honor’s voice

Base Voice: English, US: Kendra, Female

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="+10%" phonation="soft" 
name="drc"><prosody pitch="-20%" rate="105%" volume="+3dB">

</prosody></amazon:effect></speak>

10. Robin

Robin’s voice

Base Voice: English, US: Kimberly, Female

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="+15%" phonation="soft" 
name="drc"><prosody pitch="-30%" rate="105%" volume="+3dB">

</prosody></amazon:effect></speak>

11. Frankie

Frankie’s voice

Base Voice: English, US: Joey, Male

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="80%" phonation="soft" 
name="drc"><prosody pitch="x-high" rate="100%" volume="+0dB">

</prosody></amazon:effect></speak>

12. Sidney

Sidney’s voice

Base Voice: English, AU: Nicole, Female

Opening and closing SSML Tags:

<speak><amazon:effect vocal-tract-length="120%" phonation="soft" 
name="drc"><prosody pitch="x-low" rate="100%" volume="+0dB">

</prosody></amazon:effect></speak>
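If you plan to record many passages, it may help to keep the eight Amazon Polly formulas in a table and generate the SSML from it. The following Python sketch shows one way to do that. The names PollyVoice, POLLY_VOICES and polly_ssml are hypothetical; the values are copied from formulas 5 to 12, and every voice uses the same phonation="soft" and name="drc" settings as above.

```python
from typing import NamedTuple


class PollyVoice(NamedTuple):
    base_voice: str          # voice to select in the Amazon Polly console
    vocal_tract_length: str  # timbre setting
    pitch: str
    rate: str
    volume: str


# Nickname -> settings, copied from formulas 5 to 12.
POLLY_VOICES = {
    "Finley": PollyVoice("Joanna", "+15%", "-27%", "95%", "+0dB"),
    "Justice": PollyVoice("Salli", "+25%", "-33%", "105%", "+3dB"),
    "Salem": PollyVoice("Justin", "135%", "+5%", "100%", "+0dB"),
    "Campbell": PollyVoice("Amy", "+10%", "-33%", "103%", "+0dB"),
    "Honor": PollyVoice("Kendra", "+10%", "-20%", "105%", "+3dB"),
    "Robin": PollyVoice("Kimberly", "+15%", "-30%", "105%", "+3dB"),
    "Frankie": PollyVoice("Joey", "80%", "x-high", "100%", "+0dB"),
    "Sidney": PollyVoice("Nicole", "120%", "x-low", "100%", "+0dB"),
}


def polly_ssml(nickname, text):
    """Return the complete SSML for the named voice, including speak tags.

    The base voice is not part of the SSML; read it from
    POLLY_VOICES[nickname].base_voice and select it in Amazon Polly.
    """
    v = POLLY_VOICES[nickname]
    return (
        f'<speak><amazon:effect vocal-tract-length="{v.vocal_tract_length}" '
        f'phonation="soft" name="drc">'
        f'<prosody pitch="{v.pitch}" rate="{v.rate}" volume="{v.volume}">'
        f"{text}</prosody></amazon:effect></speak>"
    )


if __name__ == "__main__":
    print(POLLY_VOICES["Sidney"].base_voice)
    print(polly_ssml("Sidney", "Call me Ishmael."))
```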

Conclusion

The twelve SSML formulas for genderless-sounding voices are just starting points. You can modify them to suit your needs. You can also use the general approach for creating additional voices using the many other Amazon Polly voices.

These voices, along with the many other aspects that comprise a persona, can result in characters that are engaging and delightful to interact with in your Alexa skills.

Thank you for reading, and happy skill building.

Credits

Header photo by Brooke Cagle on Unsplash