This article provides a worksheet with twelve Speech Synthesis Markup Language (SSML) “formulas” for genderless (gender-neutral) synthetic voices.
The first four can be used directly in Alexa skills for Text-to-Speech (TTS) using the Alexa Skills Kit (ASK) and the various skill building tools, while the other eight can be used to make audio recordings using Amazon Polly.
Audio samples and editable copy/paste SSML formulas are included.
Voices Preview
1. Jordan
2. Ash
3. Charlie
4. Jesse
5. Finley
6. Justice
7. Salem
8. Campbell
9. Honor
10. Robin
11. Frankie
12. Sidney
Twelve SSML Formulas
The first four SSML formulas can be used in your custom Alexa skills for Text-to-Speech (TTS) conversion when using the Alexa Skills Kit (ASK) or skill building tools such as Voiceflow Creator. These formulas take advantage of the Alexa Polly Voice SSML tags.
The SSML for voices five through twelve is for use with AWS Amazon Polly. You can create MP3 files and use them in your skills. Voices one through four can be replicated with Amazon Polly as well.
The Amazon Polly formulas are included because the service provides several SSML effects which are not available in ASK. The most notable is control of timbre (vocal tract length), which gives substantial control when approximating genderless-sounding voices.
How genderless voices can be included in your skills
There are several ways genderless voices can be used in custom skills:
1. If a skill just has one narrator, a genderless voice can be an alternative to either Alexa’s default voice or the Polly unmodified voices.
2. If a skill is an interactive story, a game, or another multi-character skill, one or more characters can be designed to be genderless. This tutorial provides a technique for managing character voices with variables using Voiceflow:
How to Efficiently Manage Character Voices. Part 1: Variables
3. Allow users to choose a voice during their first session. For example, offer male and female voice options along with one or more genderless voices. This tutorial shows how to configure user preferences using Voiceflow:
How to Efficiently Manage Character Voices. Part 2: User Preferences
Persona Design
Something to bear in mind is that these genderless voices are just part of a character’s persona.
Other aspects, such as a character’s choice of words when interacting with users, how they interact with other characters (including pronouns), and how they behave are design considerations as well.
What is SSML?
If you are not familiar with Speech Synthesis Markup Language (SSML), it is a set of tags which can be applied to text being converted to speech.
As you probably know, HTML can alter the appearance of text on a page. For example, HTML can be used to specify a font. HTML can make text appear bold, italicised, have different colors, and display in different sizes.
Likewise, SSML can alter how a voice sounds when text is recited. The SSML syntax consists of tags, which are added to text in a manner similar to HTML. Both the Alexa Skills Kit (ASK) and Amazon Polly support a variety of SSML tag options.
A starting point is picking a “base” voice from the Amazon Polly voices, nicknamed “Polly voices.” In Alexa skills, you can use SSML voice tags to select them.
Once a Polly voice is selected, a common SSML alteration for voice is Prosody. Prosody adjustments generally include refining a voice’s pitch, rate of speech, and relative loudness. There are additional SSML tags that can enable a variety of effects as well.
SSML Reference Documentation
These Amazon references provide detailed explanations of how the SSML tags work, additional SSML tag options, syntax and the parameter ranges.
Speech Synthesis Markup Language (SSML) Reference: Alexa Skills Kit (ASK)
SSML Tags Supported by Amazon Polly
WORKSHEET FOR GENDERLESS VOICE SSML FORMULAS
The copy/paste SSML formulas are in the next sections. For your convenience, they are also in this downloadable .txt document (Right Click / Save Link As…).
Alexa SSML Copy And Paste Genderless Voices Worksheet V01
PRO TIP: Be careful when working with single and double quotes when copying and pasting between your word processor and your skill user interface. Quote marks should be straight vertical characters. It is best to turn off smart quote options in your word processor, and be aware that curly and angled quote marks may fail.
PRO TIP: Be careful when copying and pasting SSML tags into a word processor if they are at the beginning of a sentence. Some word processors will auto-capitalize the first letter. For example, your word processor might change <prosody> to <Prosody> and the latter might fail. If possible, turn off auto-capitalization in your word processor.
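Both pro tips can also be handled with a small clean-up script. The Python sketch below is one possible approach and is not part of the worksheet: normalize_ssml is a made-up helper name, and it assumes the spoken text itself contains no stray < or > characters. It straightens quotes and lower-cases tag names inside tags only, so apostrophes in the spoken text are left alone.

```python
import re

# Curly and angled quote characters that word processors tend to substitute.
_QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u00ab": '"', "\u00bb": '"',
    "\u2018": "'", "\u2019": "'",
})

_TAG = re.compile(r"<[^<>]+>")                    # one whole tag, e.g. <prosody ...>
_TAG_NAME = re.compile(r"^(</?)([A-Za-z][\w:.-]*)")  # the name right after < or </


def _clean_tag(match):
    """Straighten quotes in one tag and lower-case its tag name."""
    tag = match.group(0).translate(_QUOTE_MAP)
    return _TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), tag)


def normalize_ssml(text):
    """Hypothetical helper: repair pasted SSML tags, leaving spoken text untouched."""
    return _TAG.sub(_clean_tag, text)


pasted = "<Prosody pitch=\u201c-27%\u201d>Don\u2019t panic</Prosody>"
print(normalize_ssml(pasted))
```

Attribute names and values keep their original case; only the tag name and the quote characters inside the tag are changed.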
Sample Text
The sample text is from Herman Melville’s Moby-Dick. A few breaks have been added to help with pacing.
Call me Ishmael. <break time="300ms"/> Some years ago
<break time="300ms"/> never mind how long precisely
<break time="300ms"/> having little or no money in my purse, and
nothing particular to interest me on shore, I thought I would sail
about a little <break time="100ms"/> and see the watery part of
the world.
Genderless Voice Names
Nicknames have been provided for each genderless voice to help with distinguishing them from the names of their base Polly voices.
Four Alexa Voices – ASK/CLI and Skill Building Platforms
The SSML tags used are Voice, Lang, and Prosody (Pitch, Rate and Volume).
Three Prosody tags are combined into one larger tag to minimize nesting of SSML tags. Example opening and closing tags:
<prosody pitch="-27%" rate="95%" volume="+0dB">
</prosody>
A parameter for volume is included as you may need to adjust a voice’s volume relative to other voices and sound effects.
The Emphasis and Whispering SSML tags are not included, but they are worth working with when refining or designing new voices.
1. Jordan
Base Voice: English, US: Joanna, Female
Opening and closing SSML Tags:
<voice name="Joanna"><lang xml:lang="en-US"><prosody pitch="-27%"
rate="95%" volume="+0dB">
</prosody></lang></voice>
If you are working with ASK/CLI, you may need opening and closing speak tags to wrap around everything. Skill builders such as Voiceflow usually don’t require developers to include them, as they are added behind the scenes by the UI. If you need them, the speak tags are as follows:
<speak>
</speak>
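To show how the pieces fit together, here is a minimal Python sketch that assembles Jordan’s formula around a passage of text. The function name alexa_voice and its parameters are illustrative only; the tag values are the ones listed for Jordan. The speak flag covers the case where the outer speak tags have to be supplied by hand.

```python
def alexa_voice(text, name, lang, pitch, rate, volume="+0dB", speak=False):
    """Wrap text in voice, lang and prosody tags (illustrative helper)."""
    body = (
        f'<voice name="{name}"><lang xml:lang="{lang}">'
        f'<prosody pitch="{pitch}" rate="{rate}" volume="{volume}">'
        f"{text}"
        "</prosody></lang></voice>"
    )
    # Skill builders such as Voiceflow usually add <speak> themselves.
    return f"<speak>{body}</speak>" if speak else body


jordan = alexa_voice(
    'Call me Ishmael. <break time="300ms"/> Some years ago',
    name="Joanna", lang="en-US", pitch="-27%", rate="95%", speak=True,
)
print(jordan)
```

Changing name, lang, pitch and rate to the values listed for Ash, Charlie or Jesse should produce the other three Alexa formulas in the same way.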
When testing voices, the Alexa Developer Console (ADC) Voice & Tone Simulator works pretty well. If you are not familiar with it, here is a tutorial:
How to Enhance Alexa with SSML using the Voice and Tone Simulator
This is what the SSML tags for Jordan look like in the Voice & Tone Simulator:
The red bubbles highlight that we are in the ADC Test section, with Voice & Tone selected. The SSML tags and text we are working with are in the window, and towards the bottom we can see we are working with the English (US) Language/region.
Let’s take a closer look at Jordan’s configuration:
Lines 1-3 are the opening SSML tags, and lines 11 and 12 are the closing tags. The content in between these tags is recited as speech. Please note that the window doesn’t auto-expand, and so I added some line breaks for visibility.
This is a handy tool when working with your own text. You can also add refinements such as SSML for phonemes, breaks and other effects to improve how things sound.
The same SSML formula can be used if you wish to generate audio files, such as MP3s, using AWS Amazon Polly. This is what it looks like in the Amazon Polly console:
The SSML tab should be selected. Let’s take a closer look at Jordan’s voice configuration:
My experience has been that the voices sound identical for both ASK and Amazon Polly if the SSML tags are identical. This is useful if a skill mixes passages of static text, which can be recorded as MP3s with Amazon Polly, with text that changes frequently and is converted to speech by the skill: the two should sound alike.
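The Amazon Polly console steps can also be scripted. The following is a rough Python sketch using the synthesize_speech call from the boto3 library; it assumes boto3 is installed and AWS credentials are configured, and the helper names polly_request and save_mp3 are made up for this example. The voice is chosen with VoiceId rather than a voice tag, and Engine is set to "standard" on the assumption that the pitch changes in these formulas depend on the standard engine.

```python
def polly_request(ssml_body, voice_id):
    """Build keyword arguments for Polly's synthesize_speech (illustrative helper)."""
    return {
        "Text": f"<speak>{ssml_body}</speak>",
        "TextType": "ssml",      # tell Polly the text contains SSML tags
        "VoiceId": voice_id,     # the base Polly voice, e.g. "Joanna"
        "OutputFormat": "mp3",
        "Engine": "standard",    # assumption: pitch changes need the standard engine
    }


def save_mp3(ssml_body, voice_id, path):
    """Send the request to Amazon Polly and write the audio to an MP3 file."""
    import boto3  # imported here so building the request needs no AWS setup

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**polly_request(ssml_body, voice_id))
    with open(path, "wb") as out:
        out.write(response["AudioStream"].read())


jordan_body = '<prosody pitch="-27%" rate="95%" volume="+0dB">Call me Ishmael.</prosody>'
request = polly_request(jordan_body, "Joanna")
print(request["Text"])
# save_mp3(jordan_body, "Joanna", "jordan.mp3")  # uncomment once AWS access is set up
```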
2. Ash
Base Voice: English, US: Salli, Female
Opening and closing SSML Tags:
<voice name="Salli"><lang xml:lang="en-US"><prosody pitch="-33%"
rate="95%" volume="+0dB">
</prosody></lang></voice>
3. Charlie
Base Voice: English, US: Justin, Male
Opening and closing SSML Tags:
<voice name="Justin"><lang xml:lang="en-US"><prosody pitch="+25%"
rate="105%" volume="+0dB">
</prosody></lang></voice>
4. Jesse
Base Voice: English, GB: Amy, Female
Opening and closing SSML Tags:
<voice name="Amy"><lang xml:lang="en-GB"><prosody pitch="-33%"
rate="90%" volume="+0dB">
</prosody></lang></voice>
Eight Amazon Polly Voices – Recorded Text-to-Speech
Amazon Polly provides additional SSML tag options in comparison to the available Alexa Polly tags. For genderless voice configuration, the following effects are particularly useful:
A. Dynamic Range Compression:
<amazon:effect name="drc">
This effect increases the midrange volume. Aside from the volume difference, the change is subtle, but it is worth experimenting with.
B. Speaking Softly:
<amazon:effect phonation="soft">
This is a nice effect for dampening harshness. Again, this is something you can experiment with.
C. Controlling Timbre:
<amazon:effect vocal-tract-length="+15%">
This effect makes the most dramatic difference when working with genderless voices.
The above are included in the SSML formulas. All three effects are combined into one larger tag to minimize nesting of SSML tags. Example opening and closing tags:
<amazon:effect vocal-tract-length="+15%" phonation="soft" name="drc">
</amazon:effect>
The Emphasis, Automatic Breathing, and Whispering SSML tag options are not included, but they are also worth considering when refining or designing new voices.
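As a sketch of how the combined amazon:effect tag wraps around a prosody tag, the Python function below assembles a complete Polly formula from four values. The name polly_formula and its parameters are assumptions made for this example; the soft phonation and drc settings are fixed because every formula in this worksheet uses them.

```python
def polly_formula(text, tract, pitch, rate, volume="+0dB"):
    """Wrap text in the combined amazon:effect tag plus prosody (illustrative helper)."""
    return (
        f'<speak><amazon:effect vocal-tract-length="{tract}" '
        'phonation="soft" name="drc">'
        f'<prosody pitch="{pitch}" rate="{rate}" volume="{volume}">'
        f"{text}"
        "</prosody></amazon:effect></speak>"
    )


example = polly_formula("Call me Ishmael.", tract="+15%", pitch="-27%", rate="95%")
print(example)
```

The base voice is not part of the SSML here; it is selected separately in the Amazon Polly console.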
5. Finley
Base Voice: English, US: Joanna, Female
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="+15%" phonation="soft"
name="drc"><prosody pitch="-27%" rate="95%" volume="+0dB">
</prosody></amazon:effect></speak>
Let’s take a closer look at how this voice looks in the Amazon Polly console. As you can see, the SSML tag for amazon:effect is included.
6. Justice
Base Voice: English, US: Salli, Female
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="+25%" phonation="soft"
name="drc"><prosody pitch="-33%" rate="105%" volume="+3dB">
</prosody></amazon:effect></speak>
7. Salem
Base Voice: English, US: Justin, Male
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="135%" phonation="soft"
name="drc"><prosody pitch="+5%" rate="100%" volume="+0dB">
</prosody></amazon:effect></speak>
8. Campbell
Base Voice: English, GB: Amy, Female
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="+10%" phonation="soft"
name="drc"><prosody pitch="-33%" rate="103%" volume="+0dB">
</prosody></amazon:effect></speak>
9. Honor
Base Voice: English, US: Kendra, Female
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="+10%" phonation="soft"
name="drc"><prosody pitch="-20%" rate="105%" volume="+3dB">
</prosody></amazon:effect></speak>
10. Robin
Base Voice: English, US: Kimberly, Female
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="+15%" phonation="soft"
name="drc"><prosody pitch="-30%" rate="105%" volume="+3dB">
</prosody></amazon:effect></speak>
11. Frankie
Base Voice: English, US: Joey, Male
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="80%" phonation="soft"
name="drc"><prosody pitch="x-high" rate="100%" volume="+0dB">
</prosody></amazon:effect></speak>
12. Sidney
Base Voice: English, AU: Nicole, Female
Opening and closing SSML Tags:
<speak><amazon:effect vocal-tract-length="120%" phonation="soft"
name="drc"><prosody pitch="x-low" rate="100%" volume="+0dB">
</prosody></amazon:effect></speak>
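A note on notation: Salem, Frankie and Sidney write vocal-tract-length without a sign (135%, 80%, 120%), while the other Polly voices use signed values such as +15%. As far as the Amazon Polly documentation describes it, an unsigned value is an absolute percentage of the base voice’s tract length, so 110% corresponds to +10%. The Python sketch below converts between the two forms under that assumption, which can help when comparing formulas; to_relative is a made-up name.

```python
def to_relative(value):
    """Convert an absolute vocal-tract-length such as "135%" to a signed change."""
    value = value.strip()
    if value.startswith(("+", "-")):
        return value  # already a relative change
    percent = float(value.rstrip("%"))
    # Assumption: 100% is the unmodified base voice.
    return f"{percent - 100:+g}%"


for name, tract in [("Salem", "135%"), ("Frankie", "80%"), ("Sidney", "120%")]:
    print(name, tract, "is treated as", to_relative(tract))
```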
Conclusion
The twelve SSML formulas for genderless-sounding voices are just starting points. You can modify them to suit your needs. You can also use the general approach for creating additional voices using the many other Amazon Polly voices.
These voices, along with the many other aspects that comprise a persona, can result in characters that are engaging and delightful to interact with in your Alexa skills.
Thank you for reading, and happy skill building.
Credits
Header photo by Brooke Cagle on Unsplash