Amazon Alexa Skills Development: SSML Copy and Paste Worksheet 1 – UPDATED!


Update: December 6, 2020

Additional SSML support was announced by Amazon on November 24th, 2020:

Alexa Speaking Styles and Emotions Now Available in Additional Languages

The worksheets and incompatible tags chart have been updated. Summary of changes:

  • Music speaking style: Canadian English and British English are now supported.
  • Conversational speaking style: Japanese and Italian are now supported.
  • Emotions: British English and Japanese are now supported.

INTRODUCTION

This is a handy web page and .txt document for copying and pasting Speech Synthesis Markup Language (SSML) tags. Included are tags pre-configured with commonly used parameters.

Separate SSML Copy and Paste web pages are available for the Alexa Polly Voice and Language tags and the English Speechcons:

SSML Copy and Paste Worksheet 2 for Alexa Polly Voice and Language Tags

English Regional Speechcons SSML Cross Reference Worksheet

AMAZON REFERENCE DOCUMENTATION

Speech Synthesis Markup Language (SSML) Reference

INCOMPATIBLE TAGS

Not all tags can be applied to the same chunk of text:

Action | Compatible SSML tags and voices | Incompatible SSML tags | Available only in the following Regions
Alexa default voice | | | all
Alexa Polly voices - Matthew and Joanna | Compatible with conversational and news styles | long-form and music styles, emotion, emphasis | all, but may require pairing with lang tag to improve pronunciation.
Alexa Polly voice - Lupe | Compatible with news style | Conversational, long-form and music styles, emotion, emphasis | all, but may require pairing with lang tag to improve pronunciation.
Alexa Polly voices - all others | | styles, emotion, speechcons, emphasis | all, but may require pairing with lang tag to improve pronunciation.
conversational style | Alexa default voice. Matthew and Joanna (requires pairing with a voice tag) | emotion, speechcons, emphasis, prosody with pitch attribute | en-US, ja-JP, it-IT
news style | Alexa default voice. Matthew, Joanna and Lupe (requires pairing with a voice tag) | emotion, speechcons, emphasis, prosody with pitch attribute | en-US and en-AU
long-form style | Alexa default voice | voice, emotion, speechcons, emphasis, prosody with pitch attribute | en-US
music style | Alexa default voice | voice, emotion, speechcons, emphasis, prosody with pitch attribute | en-US, en-CA, en-GB
emotion | Alexa default voice | voice, styles, speechcons, emphasis, prosody with pitch attribute | en-US, en-GB, ja-JP
emphasis | Alexa default voice | voice, styles, emotion, speechcons, prosody with pitch attribute | all
prosody with pitch attribute | all | styles, emotion, speechcons, emphasis | all
speechcons | Alexa default voice | voice, styles, emotion, emphasis, prosody with pitch attribute | regionalized listings

Refer to the Amazon Reference Documentation for details and best practices.
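If you generate SSML in code, the chart can be turned into a quick lookup. Here is a minimal Python sketch. The locale sets are transcribed from the chart as of the December 2020 update. The names FEATURE_LOCALES, EXCLUSIVE_FEATURES, style_available and conflicting are my own, not part of any Amazon library.

The chart also reveals a useful pattern. Styles, emotion, speechcons, emphasis and prosody with pitch each list the other four as incompatible, so at most one of them can apply to a given chunk of text.

```python
# Snapshot of the locale columns in the incompatible tags chart
# (December 6, 2020 update). Amazon adds locales over time, so check the
# SSML reference before relying on this table.
FEATURE_LOCALES: dict[str, frozenset[str]] = {
    "conversational": frozenset({"en-US", "ja-JP", "it-IT"}),
    "news": frozenset({"en-US", "en-AU"}),
    "long-form": frozenset({"en-US"}),
    "music": frozenset({"en-US", "en-CA", "en-GB"}),
    "emotion": frozenset({"en-US", "en-GB", "ja-JP"}),
}

# Each of these lists the other four as incompatible in the chart,
# so any two of them applied to the same text is a conflict.
EXCLUSIVE_FEATURES = frozenset(
    {"style", "emotion", "speechcons", "emphasis", "prosody-pitch"}
)


def style_available(feature: str, locale: str) -> bool:
    """Return True if the chart lists `locale` for this speaking style or emotion."""
    return locale in FEATURE_LOCALES.get(feature, frozenset())


def conflicting(features: set[str]) -> set[str]:
    """Return the mutually exclusive features that appear together, if any."""
    found = features & EXCLUSIVE_FEATURES
    return set(found) if len(found) > 1 else set()
```

For example, conflicting({"style", "emphasis"}) flags both features, while a voice tag plus emotion is outside this check. Voice compatibility varies by voice, so it is left to the chart.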

VOICEFLOW USERS

These worksheets are designed to be easily used with Voiceflow, a no-code skill-building tool.  Be aware that Voiceflow does not require the use of the <speak></speak> tags, as these are included automatically behind the scenes.

Since this article was originally published, Voiceflow has also added the ability to select speech effects in the Speak block and other blocks that speak text. If you copy and paste SSML tags or manage them using variables, DO NOT also select effects from the drop-downs in the Voiceflow text boxes.

INSTRUCTIONS

For your convenience, many of the copy and paste formulas come in several formats. Most have a single-line version and a version with the opening and closing tags on separate lines. The latter is useful for complex tags or for configuring them into variables. An asterisk (*) marks where your text or value goes.

Copy and paste them wherever you draft scripts and prompts, directly into your skill-building tool, or into your code.
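If you paste templates into code, keep in mind that SSML is XML. Characters such as & and < in your script text must be escaped (for example, & becomes &amp;), or the markup will not parse.

Below is a minimal Python sketch that treats each asterisk in a worksheet template as a placeholder and fills them left to right. It escapes each value with the standard library's xml.sax.saxutils.escape. The function name fill is hypothetical.

```python
from xml.sax.saxutils import escape


def fill(template: str, *values: str) -> str:
    """Replace each * placeholder in a worksheet template, left to right.

    Values are XML-escaped (including double quotes) so they are safe in
    both element text and attribute values.
    """
    parts = template.split("*")
    placeholders = len(parts) - 1
    if placeholders != len(values):
        raise ValueError(f"template has {placeholders} placeholders, got {len(values)} values")
    out = [parts[0]]
    for value, rest in zip(values, parts[1:]):
        out.append(escape(value, {'"': "&quot;"}))
        out.append(rest)
    return "".join(out)
```

For example, filling the three-attribute prosody template with "-27%", "95%", "+0dB" and "hello" reproduces the example in section 18.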

DOWNLOADABLE WORKSHEETS

The SSML tags are also available as downloadable .txt documents for working offline.

SSML Tag Categories

The SSML types in yellow highlighting are the ones I use most frequently.

1. amazon:domain – conversational

2. amazon:domain – long-form

3. amazon:domain – music

4. amazon:domain – news

5. amazon:effect

6. amazon:emotion

7. audio (several variations depending on URL source)

8. audio from the Amazon Sound Library (instructions)

9. break

10. emphasis

11. lang (link to other worksheet)

12. p (paragraph)

13. phoneme

14. prosody – rate

15. prosody – pitch

16. prosody – volume

17. prosody valid parameter values

18. prosody – three prosody attributes (pitch, rate, volume)

19. prosody – two attributes (pitch, rate)

20. prosody – two attributes (pitch, volume)

21. prosody – two attributes (rate, volume)

22. nested polly voice, lang and prosody combinations (link to other worksheet)

23. s (sentence)

24. say-as characters (spell-out)

25. say-as cardinal (number)

26. say-as ordinal (number)

27. say-as digits (number)

28. say-as fraction

29. say-as unit

30. say-as date

31. say-as time

32. say-as telephone (number)

33. say-as address

34. say-as interjection (Speechcons) (link to other worksheet)

35. say-as expletive (bleep)

36. speak

37. sub (substitute)

38. voice (link to other worksheet)

39. w amazon:VB – pronounce word as a present simple verb

40. w amazon:VBD – pronounce word as a verb, past participle

41. w amazon:NN – pronounce word as a noun

42. w amazon:SENSE_1 – non-default word pronunciation


SSML TAG COPY AND PASTE WORKSHEET

1. amazon:domain – conversational

// en-US, it-IT and ja-JP. Native Alexa voice

<amazon:domain name="conversational">
</amazon:domain>
// en-US. Matthew and Joanna

<voice name="Joanna"><amazon:domain name="conversational">
<voice name="Matthew"><amazon:domain name="conversational">
</amazon:domain></voice>
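When building responses in code, it is easy to get the nesting order above wrong (voice outside, amazon:domain inside). Here is a minimal Python sketch. The compatible-voice lists are transcribed from the incompatible tags chart, and speaking_style and STYLE_VOICES are illustrative names of my own.

```python
from typing import Optional

# Voices the incompatible tags chart allows with each speaking style.
# An empty set means the style works only with the Alexa default voice.
STYLE_VOICES: dict[str, set[str]] = {
    "conversational": {"Matthew", "Joanna"},
    "news": {"Matthew", "Joanna", "Lupe"},
    "long-form": set(),
    "music": set(),
}


def speaking_style(text: str, style: str, voice: Optional[str] = None) -> str:
    """Wrap text in an amazon:domain tag, nested inside a voice tag if given."""
    if style not in STYLE_VOICES:
        raise ValueError(f"unknown style: {style}")
    inner = f'<amazon:domain name="{style}">{text}</amazon:domain>'
    if voice is None:
        return inner  # Alexa default voice
    if voice not in STYLE_VOICES[style]:
        raise ValueError(f"{voice} is not listed as compatible with {style}")
    # The voice tag goes on the outside, matching the worksheet templates.
    return f'<voice name="{voice}">{inner}</voice>'
```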

2. amazon:domain – long-form

// en-US only. Native Alexa voice only

<amazon:domain name="long-form">
</amazon:domain>

3. amazon:domain – music

// en-US, en-CA and en-GB. Native Alexa voice only

<amazon:domain name="music">
</amazon:domain>

4. amazon:domain – news

// en-US and en-AU. Native Alexa voice

<amazon:domain name="news">
</amazon:domain>

// en-US. Matthew and Joanna voices. 

<voice name="Joanna"><amazon:domain name="news">
<voice name="Matthew"><amazon:domain name="news">
</amazon:domain></voice>

// es-US (US Spanish). Lupe voice

<voice name="Lupe"><amazon:domain name="news">
</amazon:domain></voice>

5. amazon:effect

<amazon:effect name="whispered">*</amazon:effect>

<amazon:effect name="whispered">
</amazon:effect>

6. amazon:emotion

// en-US, en-GB and ja-JP. Native Alexa voice only.

<amazon:emotion name="excited" intensity="low">
<amazon:emotion name="excited" intensity="medium">
<amazon:emotion name="excited" intensity="high">
</amazon:emotion>

<amazon:emotion name="disappointed" intensity="low">
<amazon:emotion name="disappointed" intensity="medium">
<amazon:emotion name="disappointed" intensity="high">
</amazon:emotion>
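A small helper can catch typos in emotion names or intensities before they reach Alexa. Here is a minimal Python sketch that accepts only the names and intensities used on this worksheet. The function and constant names are hypothetical, and Amazon may support values not listed here.

```python
# Emotion names and intensities used in the worksheet above.
EMOTIONS = {"excited", "disappointed"}
INTENSITIES = {"low", "medium", "high"}


def emotion(text: str, name: str, intensity: str) -> str:
    """Wrap text in an amazon:emotion tag after checking the attribute values."""
    if name not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {name}")
    if intensity not in INTENSITIES:
        raise ValueError(f"unsupported intensity: {intensity}")
    return (
        f'<amazon:emotion name="{name}" intensity="{intensity}">'
        f"{text}</amazon:emotion>"
    )
```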

7. audio (several variations depending on URL source)

// Basic tag

<audio src="*" />


<audio src="
*
" />

// Tag with MP3 extension

<audio src="https://*.mp3" />


<audio src="https://
*
.mp3" />


// Tag for a sound file on Amazon AWS S3

<audio src="https://s3.amazonaws.com/*.mp3" />


<audio src="https://s3.amazonaws.com/
*
.mp3" />

// Tag for a sound file on Amazon AWS Cloudfront

<audio src="https://*.cloudfront.net/*.mp3" />


<audio src="https://
*
.cloudfront.net/
*
.mp3" />

8. audio from the Amazon Sound Library.

Instructions:

a. Open the Amazon Sound Library Reference Page:

Alexa Skills Kit Sound Library

b. Search for an available sound.
c. When you find a sound, click its row to open the source code.
d. Copy and paste the source code, which is already formatted as an SSML tag.

Example:

Category: Animals/Bear
Name: Bear Groan Roar (1)

<audio src="soundbank://soundlibrary/animals/amzn_sfx_bear_groan_roar_01"/>

9. break

Default value with no specified attribute is the same as “medium.”

<break/>

<break strength="none"/>
<break strength="x-weak"/>
<break strength="weak"/>
<break strength="medium"/>
<break strength="strong"/>
<break strength="x-strong"/>

<break time="100ms"/>
<break time="300ms"/>
<break time="500ms"/>
<break time="700ms"/>
<break time="1s"/>
<break time="2s"/>
<break time="3s"/>
<break time="5s"/>
<break time="10s"/>
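In code, pauses are often computed rather than pasted. Here is a minimal Python sketch that builds a break tag from either a strength keyword or a time in milliseconds. The 10-second cap is the worksheet's longest value, which I believe matches Amazon's documented maximum; confirm it in the SSML reference. Accepting only one of the two attributes is a simplification of this sketch, and break_tag is a hypothetical name.

```python
from typing import Optional

STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")
MAX_BREAK_MS = 10_000  # the worksheet's longest value, 10s (assumed maximum)


def break_tag(strength: Optional[str] = None, ms: Optional[int] = None) -> str:
    """Return a break tag using either a strength keyword or a time in milliseconds."""
    if strength is not None and ms is not None:
        raise ValueError("pass strength or ms, not both")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength}")
        return f'<break strength="{strength}"/>'
    if ms is not None:
        if not 0 <= ms <= MAX_BREAK_MS:
            raise ValueError(f"break time must be 0 to {MAX_BREAK_MS} ms")
        return f'<break time="{ms}ms"/>'
    # No attribute: equivalent to strength="medium", as noted above.
    return "<break/>"
```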

10. emphasis

Default value with no specified attribute is the same as “moderate.”

<emphasis>*</emphasis>
<emphasis level="strong">*</emphasis>
<emphasis level="moderate">*</emphasis>
<emphasis level="reduced">*</emphasis>

<emphasis level="strong">
<emphasis level="moderate">
<emphasis level="reduced">
</emphasis>

11. lang

Separate worksheet:
https://voices.app/?p=2054

12. p (paragraph)

Equivalent to a break with strength="x-strong" before and after the tag.

<p>*</p>

<p>
</p>

13. phoneme

This SSML tag changes the pronunciation of a word using a phonetic alphabet: either the International Phonetic Alphabet (IPA) or X-SAMPA. Lists of supported symbols, by language, are in the Amazon SSML Reference.

IPA

<phoneme alphabet="ipa" ph="*">*</phoneme>

<phoneme alphabet="ipa" ph="
*
">
*
</phoneme>

Example:
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 
I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.

X-SAMPA

<phoneme alphabet="x-sampa" ph="*">*</phoneme>

<phoneme alphabet="x-sampa" ph="
*
">
*
</phoneme>

Example:

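<phoneme alphabet="x-sampa" ph='"bA.t@l'>bottle</phoneme>

// X-SAMPA marks primary stress with a double quote ("), so when the ph value
// contains one, enclose the attribute in single quotes as shown.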

14. prosody – rate

<prosody rate="x-slow">*</prosody>
<prosody rate="slow">*</prosody>
<prosody rate="medium">*</prosody>
<prosody rate="fast">*</prosody>
<prosody rate="x-fast">*</prosody>

<prosody rate="x-slow">
<prosody rate="slow">
<prosody rate="medium">
<prosody rate="fast">
<prosody rate="x-fast">
</prosody>


<prosody rate="n%">*</prosody>

<prosody rate="20%">*</prosody>
<prosody rate="50%">*</prosody>
<prosody rate="60%">*</prosody>
<prosody rate="70%">*</prosody>
<prosody rate="80%">*</prosody>
<prosody rate="90%">*</prosody>
<prosody rate="93%">*</prosody>
<prosody rate="95%">*</prosody>
<prosody rate="97%">*</prosody>

<prosody rate="103%">*</prosody>
<prosody rate="105%">*</prosody>
<prosody rate="107%">*</prosody>
<prosody rate="110%">*</prosody>
<prosody rate="112%">*</prosody>
<prosody rate="115%">*</prosody>
<prosody rate="118%">*</prosody>
<prosody rate="120%">*</prosody>

<prosody rate="n%">
</prosody>

<prosody rate="20%">
<prosody rate="50%">
<prosody rate="60%">
<prosody rate="70%">
<prosody rate="80%">
<prosody rate="90%">
<prosody rate="93%">
<prosody rate="95%">
<prosody rate="97%">
</prosody>

<prosody rate="103%">
<prosody rate="105%">
<prosody rate="107%">
<prosody rate="110%">
<prosody rate="112%">
<prosody rate="115%">
<prosody rate="118%">
<prosody rate="120%">
</prosody>

15. prosody – pitch

<prosody pitch="x-low">*</prosody>
<prosody pitch="low">*</prosody>
<prosody pitch="medium">*</prosody>
<prosody pitch="high">*</prosody>
<prosody pitch="x-high">*</prosody>

<prosody pitch="x-low">
<prosody pitch="low">
<prosody pitch="medium">
<prosody pitch="high">
<prosody pitch="x-high">
</prosody>

<prosody pitch="-n%">*</prosody>
<prosody pitch="+n%">*</prosody>

<prosody pitch="-33.3%">*</prosody>
<prosody pitch="-20%">*</prosody>
<prosody pitch="-10%">*</prosody>
<prosody pitch="-5%">*</prosody>

<prosody pitch="+5%">*</prosody>
<prosody pitch="+10%">*</prosody>
<prosody pitch="+20%">*</prosody>
<prosody pitch="+30%">*</prosody>
<prosody pitch="+50%">*</prosody>

<prosody pitch="-n%">
<prosody pitch="+n%">
</prosody>

<prosody pitch="-33.3%">
<prosody pitch="-20%">
<prosody pitch="-10%">
<prosody pitch="-5%">
</prosody>

<prosody pitch="+5%">
<prosody pitch="+10%">
<prosody pitch="+20%">
<prosody pitch="+30%">
<prosody pitch="+50%">
</prosody>

16. prosody – volume

<prosody volume="x-soft">*</prosody>
<prosody volume="soft">*</prosody>
<prosody volume="medium">*</prosody>
<prosody volume="loud">*</prosody>
<prosody volume="x-loud">*</prosody>

<prosody volume="x-soft">
<prosody volume="soft">
<prosody volume="medium">
<prosody volume="loud">
<prosody volume="x-loud">
</prosody>

<prosody volume="-ndB">*</prosody>
<prosody volume="+ndB">*</prosody>

<prosody volume="-6dB">*</prosody>
<prosody volume="-5dB">*</prosody>
<prosody volume="-4dB">*</prosody>
<prosody volume="-3dB">*</prosody>
<prosody volume="-2dB">*</prosody>
<prosody volume="-1dB">*</prosody>
<prosody volume="-0.5dB">*</prosody>

<prosody volume="+0.5dB">*</prosody>
<prosody volume="+1dB">*</prosody>
<prosody volume="+2dB">*</prosody>
<prosody volume="+3dB">*</prosody>
<prosody volume="+4dB">*</prosody>

<prosody volume="-ndB">
<prosody volume="+ndB">
</prosody>

<prosody volume="-6dB">
<prosody volume="-5dB">
<prosody volume="-4dB">
<prosody volume="-3dB">
<prosody volume="-2dB">
<prosody volume="-1dB">
<prosody volume="-0.5dB">
</prosody>

<prosody volume="+0.5dB">
<prosody volume="+1dB">
<prosody volume="+2dB">
<prosody volume="+3dB">
<prosody volume="+4dB">
</prosody>

17. prosody valid parameter values

Multiple prosody attributes can be combined in a single tag. This is more compact and needs only one closing tag.

Numerical values are possible for all three prosody attributes. However, combining attributes in the sections below produces many possible combinations, so only the word-based values are listed here. If you need finer control, check the SSML reference for valid numeric values, or see the examples in the sections above.

Copy and paste the desired values into the formulas in the following sections as you configure your voices. You can test and adjust the values using the ADC (Alexa Developer Console) Voice & Tone Simulator.

For Pitch, check for incompatible tags in the reference documentation.

A. Pitch

x-low
low
medium
high
x-high

B. Rate

x-slow
slow
medium
fast
x-fast

C. Volume

silent
x-soft
soft
medium
loud
x-loud
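Before pasting values into the formulas, a quick check can catch mix-ups such as using a pitch word for volume. Here is a minimal Python sketch:

- Word values come from the lists above.
- Numeric limits are the extremes listed in sections 14 and 15: rate no lower than 20%, and pitch between -33.3% and +50%.
- Volume is checked only for the +ndB/-ndB format.

Treat those limits as assumptions and confirm them in the SSML reference. valid_prosody is a hypothetical name.

```python
import re

WORD_VALUES = {
    "pitch": {"x-low", "low", "medium", "high", "x-high"},
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud"},
}
NUMBER = r"(\d+(?:\.\d+)?)"  # an unsigned integer or decimal


def valid_prosody(attribute: str, value: str) -> bool:
    """Check a prosody value against the worksheet's words and numeric ranges."""
    if value in WORD_VALUES[attribute]:  # KeyError for an unknown attribute
        return True
    if attribute == "rate":
        match = re.fullmatch(NUMBER + "%", value)
        return match is not None and float(match.group(1)) >= 20
    if attribute == "pitch":
        match = re.fullmatch(r"([+-])" + NUMBER + "%", value)
        if match is None:
            return False
        percent = float(match.group(2))
        if match.group(1) == "-":
            percent = -percent
        return -33.3 <= percent <= 50
    # volume: a signed decibel offset such as +2dB or -0.5dB
    return re.fullmatch(r"[+-]" + NUMBER + "dB", value) is not None
```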

18. prosody – three prosody attributes (pitch, rate, volume)

<prosody pitch="*" rate="*" volume="*">*</prosody>

<prosody pitch="*" rate="*" volume="*">
</prosody>

<prosody pitch="
*
" rate="
*
" volume="
*
">
</prosody>

// Example:
<prosody pitch="-27%" rate="95%" volume="+0dB">hello</prosody>

19. prosody – two attributes (pitch, rate)

<prosody pitch="*" rate="*">*</prosody>

<prosody pitch="*" rate="*">
</prosody>

<prosody pitch="
*
" rate="
*
">
</prosody>

// Example:
<prosody pitch="high" rate="slow">hello</prosody>

20. prosody – two attributes (pitch, volume)

<prosody pitch="*" volume="*">*</prosody>

<prosody pitch="*" volume="*">
</prosody>

<prosody pitch="
*
" volume="
*
">
</prosody>

// Example:
<prosody pitch="high" volume="soft">hello</prosody>

21. prosody – two attributes (rate, volume)

<prosody rate="*" volume="*">*</prosody>

<prosody rate="*" volume="*">
</prosody>

<prosody rate="
*
" volume="
*
">
</prosody>

// Example:
<prosody rate="slow" volume="soft">hello</prosody>
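Sections 18 to 21 all follow one pattern: include only the attributes you need in a single tag. Here is a minimal Python sketch that builds that combined tag, always in pitch, rate, volume order. The function name prosody is hypothetical, and it does not validate the values themselves.

```python
from typing import Optional


def prosody(text: str, pitch: Optional[str] = None, rate: Optional[str] = None,
            volume: Optional[str] = None) -> str:
    """Combine the given prosody attributes into one tag (pitch, rate, volume order)."""
    attrs = [
        f'{name}="{value}"'
        for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume))
        if value is not None
    ]
    if not attrs:
        return text  # nothing to change, so no tag is needed
    return f"<prosody {' '.join(attrs)}>{text}</prosody>"
```

For example, prosody("hello", rate="slow", volume="soft") produces the same markup as the section 21 example.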

22. nested polly voice, lang and prosody combinations

Separate worksheet:
https://voices.app/?p=2054

23. s (sentence)

Equivalent to a break with strength="strong" before and after the tag, or to ending the sentence with a period.

<s>*</s>

<s>
</s>

24. say-as characters (spell-out)

<say-as interpret-as="characters">*</say-as>

<say-as interpret-as="characters">
</say-as>

// Recite each individual letter
// Example: "h-e-l-l-o"

<say-as interpret-as="characters">hello</say-as>

25. say-as cardinal (number)

<say-as interpret-as="cardinal">*</say-as>

<say-as interpret-as="cardinal">
</say-as>

// Recite the value as a cardinal number
// Example: "Twelve thousand three hundred and forty five"

<say-as interpret-as="cardinal">12345</say-as>

26. say-as ordinal (number)

<say-as interpret-as="ordinal">*</say-as>

<say-as interpret-as="ordinal">
</say-as>

// Recite the value as an ordinal number. 
// An ordinal number is a position in a series.  For example, first, 
// second, third, etc.

// Example: "You are now third in line"

You are now <say-as interpret-as="ordinal">3</say-as> in line

// Example: "Twelve thousand three hundred and forty fifth"

<say-as interpret-as="ordinal">12345</say-as>

27. say-as digits (number)

<say-as interpret-as="digits">
</say-as>

<say-as interpret-as="digits">*</say-as>

// Recite each digit of a number individually.
// Example: "one two three four five"

<say-as interpret-as="digits">12345</say-as>

28. say-as fraction

<say-as interpret-as="fraction">*</say-as>

<say-as interpret-as="fraction">
</say-as>

// Recite numerical value as a fraction.  Alexa can recite both common 
// fractions and mixed fractions.

// Common fractions. Examples: 
// "1/2" will be recited as "half"
// "2/3" will be recited as "two thirds" 
// "11/16" will be recited as "eleven sixteenths."

<say-as interpret-as="fraction">1/2</say-as>

// Mixed fractions. Examples: 
// "2+1/2" will be recited as "two and a half"

<say-as interpret-as="fraction">2+1/2</say-as>

// "2+2/3" will be recited as "Two and two thirds."

<say-as interpret-as="fraction">2+2/3</say-as>
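If your skill computes fractions, they need to be written in the notation shown above: "11/16" for common fractions and "2+1/2" for mixed fractions. Here is a minimal Python sketch using the standard library's Fraction type, which reduces values automatically. Returning whole numbers without a tag is my own choice, and fraction_ssml is a hypothetical name.

```python
from fractions import Fraction


def fraction_ssml(value: Fraction) -> str:
    """Render a positive fraction in the worksheet's notation: 11/16 or 2+1/2."""
    if value <= 0:
        raise ValueError("expected a positive fraction")
    whole, remainder = divmod(value.numerator, value.denominator)
    if remainder == 0:
        return str(whole)  # a whole number needs no fraction tag
    common = f"{remainder}/{value.denominator}"
    body = common if whole == 0 else f"{whole}+{common}"
    return f'<say-as interpret-as="fraction">{body}</say-as>'
```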

29. say-as unit

<say-as interpret-as="unit">*</say-as>

<say-as interpret-as="unit">
</say-as>

// Recite full name of an abbreviated unit value. Examples:
// "lb" and "lb." are recited as "pound"
// "lbs" and "lbs." are recited as "pounds."

<say-as interpret-as="unit">lb</say-as>

30. say-as date

This tag controls how dates are recited. The format attribute tells Alexa which parts of the date are present and in what order (m = month, d = day, y = year).

<say-as interpret-as="date">*</say-as>
<say-as interpret-as="date" format="mdy">*</say-as>
<say-as interpret-as="date" format="dmy">*</say-as>
<say-as interpret-as="date" format="ymd">*</say-as>
<say-as interpret-as="date" format="md">*</say-as>
<say-as interpret-as="date" format="dm">*</say-as>
<say-as interpret-as="date" format="ym">*</say-as>
<say-as interpret-as="date" format="my">*</say-as>
<say-as interpret-as="date" format="d">*</say-as>
<say-as interpret-as="date" format="m">*</say-as>
<say-as interpret-as="date" format="y">*</say-as>

<say-as interpret-as="date">
<say-as interpret-as="date" format="mdy">
<say-as interpret-as="date" format="dmy">
<say-as interpret-as="date" format="ymd">
<say-as interpret-as="date" format="md">
<say-as interpret-as="date" format="dm">
<say-as interpret-as="date" format="ym">
<say-as interpret-as="date" format="my">
<say-as interpret-as="date" format="d">
<say-as interpret-as="date" format="m">
<say-as interpret-as="date" format="y">
</say-as>

// Examples:
<say-as interpret-as="date">20191209</say-as>
<say-as interpret-as="date" format="mdy">12/9/19</say-as>
<say-as interpret-as="date" format="mdy">12/9/2019</say-as>
<say-as interpret-as="date" format="dmy">9/12/19</say-as>
<say-as interpret-as="date" format="ymd">19/12/9</say-as>
// Interpreted as: "December ninth twenty nineteen."

<say-as interpret-as="date" format="md">12/9</say-as>
<say-as interpret-as="date" format="md">12/09</say-as>
<say-as interpret-as="date" format="dm">9/12</say-as>
<say-as interpret-as="date" format="dm">09/12</say-as>
// Interpreted as: "December ninth."

<say-as interpret-as="date" format="ym">19/12</say-as>
<say-as interpret-as="date" format="my">12/19</say-as>
// Interpreted as: "December twenty nineteen."

<say-as interpret-as="date" format="d">9</say-as>
// Interpreted as: "ninth."

<say-as interpret-as="date" format="m">12</say-as>
// Interpreted as: "December."

<say-as interpret-as="date" format="y">2019</say-as>
<say-as interpret-as="date" format="y">19</say-as>
// Interpreted as: "twenty nineteen."

// NOTE: Format YYYMMDD is not supported.
// Replacing digits with question marks skips reciting that part of the date:

<say-as interpret-as="date">????0922</say-as>

// Interpreted as: "September twenty second."
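When dates come from your code, formatting them in the default YYYYMMDD form avoids day/month ambiguity entirely. Here is a minimal Python sketch. It uses the question-mark form shown above to leave out the year, and the function name date_ssml is hypothetical.

```python
from datetime import date


def date_ssml(day: date, include_year: bool = True) -> str:
    """Format a date as YYYYMMDD, or ????MMDD to skip reciting the year."""
    digits = day.strftime("%Y%m%d")
    if not include_year:
        digits = "????" + digits[4:]
    return f'<say-as interpret-as="date">{digits}</say-as>'
```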

31. say-as time

<say-as interpret-as="time">*</say-as>

<say-as interpret-as="time">
</say-as>

// Recite duration in minutes and seconds. Example: 

<say-as interpret-as="time">1'21"</say-as>

// This will be recited as: "One minute and twenty one seconds."
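To recite a duration stored as a number of seconds, convert it to the minutes'seconds" form shown above. Here is a minimal Python sketch. It handles minutes and seconds only, matching the worksheet example, and the function name duration_ssml is hypothetical.

```python
def duration_ssml(total_seconds: int) -> str:
    """Express a duration as minutes'seconds" (for example 1'21") for say-as time."""
    if total_seconds < 0:
        raise ValueError("duration cannot be negative")
    minutes, seconds = divmod(total_seconds, 60)
    return f'<say-as interpret-as="time">{minutes}\'{seconds}"</say-as>'
```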

32. say-as telephone (number)

<say-as interpret-as="telephone">*</say-as>

<say-as interpret-as="telephone">
</say-as>

// Examples:
<say-as interpret-as="telephone">5551212</say-as>
<say-as interpret-as="telephone">555-1212</say-as>
<say-as interpret-as="telephone">2025551212</say-as>
<say-as interpret-as="telephone">2025551212x345</say-as>

// The last one will be recited as "two oh two, five five five, one two one two,
// extension three four five."

// NOTE: The say-as tag is not needed if the phone number is formatted with dashes. 
// For example, "202-555-1212" will be recited with a pause after each dash.  
// However if the text format is "2025551212" the say-as tag is needed in order to 
// recite it as a telephone number.

33. say-as address

<say-as interpret-as="address">*</say-as>

<say-as interpret-as="address">
</say-as>

// Example: 

<say-as interpret-as="address">410 Terry Ave. N, Seattle WA, 98109</say-as>

// This will be recited as: "Four ten Terry Avenue North, Seattle Washington, 
// nine eight zero one nine."

34. say-as interjection (Speechcons)

Separate worksheet:
https://voices.app/?p=856

35. say-as expletive (bleep)

<say-as interpret-as="expletive">*</say-as>

<say-as interpret-as="expletive">
</say-as>

// Recite a "bleep"
// Example: "bad word" is recited as "bleep"

<say-as interpret-as="expletive">bad word</say-as>

36. speak

Root element of SSML documents. Not required for Voiceflow.

<speak>*</speak>

<speak>
</speak>
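Copy-and-paste errors usually show up as unclosed or mis-nested tags. Here is a minimal Python sketch with two helpers. One adds the root speak tag (skipped for tools such as Voiceflow that add it for you). The other checks that the markup is well-formed XML.

Alexa SSML uses the amazon: prefix without declaring it. For this local check only, the sketch binds the prefix to a placeholder URI, urn:example:amazon, which is not an Amazon value. The check catches structural mistakes, not unsupported tag combinations. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace so the parser accepts amazon: tags during this check.
_CHECK_TEMPLATE = '<check xmlns:amazon="urn:example:amazon">{}</check>'


def wrap_speak(body: str, for_voiceflow: bool = False) -> str:
    """Add the root <speak> tag unless the tool (such as Voiceflow) adds it for you."""
    return body if for_voiceflow else f"<speak>{body}</speak>"


def is_well_formed(ssml: str) -> bool:
    """Return True if every tag is properly opened, closed and nested."""
    try:
        ET.fromstring(_CHECK_TEMPLATE.format(ssml))
    except ET.ParseError:
        return False
    return True
```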

37. sub (substitute)

<sub alias="*">*</sub>

<sub alias="*">
</sub>

<sub alias="
*
">
*
</sub>

// Example: <sub alias="aluminum">Al</sub>

38. voice

Separate worksheet:
https://voices.app/?p=2054

39. w amazon:VB – pronounce word as a present simple verb

<w role="amazon:VB">*</w>

<w role="amazon:VB">
</w>

// Examples:

<w role="amazon:VB">read</w> 

// As in: "I am going to read a book" and not "I have read the book."


<w role="amazon:VB">object</w>

// As in: "I object, your honor!" and not "The object is an apple."

40. w amazon:VBD – pronounce word as a verb, past participle

<w role="amazon:VBD">*</w>

<w role="amazon:VBD">
</w>

// Example:

<w role="amazon:VBD">read</w> // verb, past participle

// As in: "I have read the book" and not "I am going to read the book."

41. w amazon:NN – pronounce word as a noun

<w role="amazon:NN">*</w>

<w role="amazon:NN">
</w>

// Example:

<w role="amazon:NN">object</w>

// As in: "The object is an apple" and not "I object, your honor!"

42. w amazon:SENSE_1 – non-default word pronunciation

<w role="amazon:SENSE_1">*</w>

<w role="amazon:SENSE_1">
</w>

// Example:

<w role="amazon:SENSE_1">bass</w>

// The default pronunciation of "bass" is the one in "I play the bass guitar."
// This SSML tag switches to the secondary pronunciation of "bass,"
// as in "Let's go bass fishing."

CONCLUSION

Hopefully this worksheet will help you save a bit of time while marking up your scripts and prompts, and editing how they sound during testing.

Additional SSML Copy/Paste worksheets are available in the Tools section of the voices.app website. 

voices.app Tools

A variety of tutorials, including how to effectively manage character voices using variables, are available on the website as well.

Thank you for reading, and happy skill building!

CREDITS

Header image by Burst, Pexels