Exploring SSML: Speech Synthesis Markup Language

Speech Synthesis Markup Language (SSML) is an XML-based markup language that gives you fine-grained control over text-to-speech output, including pitch, pronunciation, speaking rate, volume, and more.
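
As a rough illustration of that XML structure, the Python sketch below wraps a sentence in a speak root element and a prosody element that raises the pitch and slows the rate. It follows the W3C SSML 1.0 element and attribute names; the helper name prosody_ssml is hypothetical, and a given engine may accept only a subset of the prosody values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def prosody_ssml(text: str, pitch: str = "medium", rate: str = "medium",
                 volume: str = "medium", lang: str = "en-US") -> str:
    """Wrap text in <speak> and <prosody>, escaping XML special characters.

    prosody_ssml is a hypothetical helper, not part of any library.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody></speak>"
    )


doc = prosody_ssml("Welcome & thanks for listening.", pitch="high", rate="slow")
print(doc)

# Parse the result back to confirm it is well-formed XML.
root = ET.fromstring(doc)
prosody = root.find(f"{{{SSML_NS}}}prosody")
print(prosody.get("pitch"), prosody.get("rate"), prosody.text)
# → high slow Welcome & thanks for listening.
```

Escaping the text matters: a raw & or < would make the document malformed XML.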

SSML lets you structure the input text into paragraphs and sentences and insert breaks, pauses, or silence, giving you flexibility in shaping how the synthesized speech sounds.
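
A minimal sketch of that structure, using Python's standard xml.etree.ElementTree: each paragraph becomes a p element, each sentence an s element, and a break element inserts a pause between paragraphs. The function name structured_ssml and the 500 ms default are illustrative choices, not part of any API.

```python
import xml.etree.ElementTree as ET


def structured_ssml(paragraphs: list[list[str]], pause: str = "500ms") -> str:
    """Build SSML with one <p> per paragraph and one <s> per sentence,
    separated by a <break> of length `pause` (hypothetical helper)."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    for i, sentences in enumerate(paragraphs):
        if i > 0:
            # Explicit silence between consecutive paragraphs.
            ET.SubElement(speak, "break", {"time": pause})
        p = ET.SubElement(speak, "p")
        for sentence in sentences:
            ET.SubElement(p, "s").text = sentence
    # ElementTree escapes &, <, and > in text content automatically.
    return ET.tostring(speak, encoding="unicode")


ssml = structured_ssml([
    ["Hello.", "This is the first paragraph."],
    ["After a short pause, the second paragraph begins."],
])
print(ssml)
```

Building the tree with ElementTree rather than string concatenation guarantees the output is well-formed XML.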

You can also choose voices, languages, speaking styles, and roles, and adjust emphasis, speaking rate, pitch, and volume. SSML even supports inserting prerecorded audio, such as sound effects or musical notes. In addition, SSML supports event tags such as bookmarks and visemes; visemes give a visual description of phonemes, the individual speech sounds in spoken language.
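
The following hypothetical sketch combines several of these features in one document: a voice element selects a voice, an audio element plays a prerecorded clip, emphasis stresses a word, and a bookmark marks a point in the text that can raise an event during synthesis. The voice name en-US-JennyNeural and the clip URL are placeholders; bookmark is the Azure Speech element (standard W3C SSML uses mark instead), and emphasis support varies by voice.

```python
import xml.etree.ElementTree as ET


def voice_ssml(voice_name: str, clip_url: str, lang: str = "en-US") -> str:
    """Build SSML mixing a voice, a prerecorded clip, emphasis, and a bookmark.

    voice_ssml is a hypothetical helper; voice_name and clip_url are placeholders.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # Mixed content: text after a child element goes in that child's .tail.
    audio = ET.SubElement(voice, "audio", {"src": clip_url})
    audio.tail = "Your order is "
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = "ready"
    emphasis.tail = "."
    # Azure-specific event tag; W3C SSML would use <mark name="..."/>.
    bookmark = ET.SubElement(voice, "bookmark", {"mark": "order_ready"})
    bookmark.tail = " Thank you for waiting."
    return ET.tostring(speak, encoding="unicode")


ssml = voice_ssml("en-US-JennyNeural", "https://example.com/chime.wav")
print(ssml)
```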

SSML also lets you control the pronunciation of the output audio. By using phonemes and custom lexicons, you can improve pronunciation and define how specific words or mathematical expressions are spoken.
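
Here is a sketch of these pronunciation tools, again assuming Python and ElementTree: phoneme supplies an explicit IPA transcription, say-as asks the engine to read 1024 as a cardinal number, and lexicon points to a custom lexicon file. The voice name and lexicon URL are placeholders, the supported interpret-as values depend on the engine, and placing lexicon first inside voice follows Azure's examples (plain W3C SSML places it under speak), so check your engine's rules.

```python
import xml.etree.ElementTree as ET


def pronunciation_ssml(voice_name: str, lexicon_url: str) -> str:
    """Build SSML using a custom lexicon, an IPA phoneme, and say-as.

    pronunciation_ssml is a hypothetical helper; the URL is a placeholder.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # The custom lexicon reference comes before any spoken content.
    lexicon = ET.SubElement(voice, "lexicon", {"uri": lexicon_url})
    lexicon.tail = "I say "
    # Explicit IPA transcription for one word.
    phoneme = ET.SubElement(voice, "phoneme", {"alphabet": "ipa", "ph": "təˈmeɪtoʊ"})
    phoneme.text = "tomato"
    phoneme.tail = " and the total is "
    # Ask the engine to read the digits as a number, not digit by digit.
    say_as = ET.SubElement(voice, "say-as", {"interpret-as": "cardinal"})
    say_as.text = "1024"
    say_as.tail = "."
    return ET.tostring(speak, encoding="unicode")


ssml = pronunciation_ssml("en-US-JennyNeural", "https://example.com/lexicon.xml")
print(ssml)
```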

Working with SSML

To use SSML, you have several options:

1. Speech Studio: Use the audio content creation tool in Speech Studio to author both plain text and SSML.
2. Batch synthesis API: The batch synthesis API accepts SSML through the inputs property.
3. Speech CLI: Pass SSML to the Speech CLI with the spx synthesize --ssml SSML command-line argument.
4. Speech SDK: In each supported programming language, the Speech SDK accepts SSML through its "speak SSML" method.
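
For the Speech SDK route, a minimal Python sketch might look like the following. It assumes the azure-cognitiveservices-speech package, whose "speak SSML" method in Python is speak_ssml_async, and reads the key and region from SPEECH_KEY and SPEECH_REGION environment variables (names chosen here for illustration). The hypothetical wrap_text helper escapes characters such as & and < so that plain text can be embedded safely; the default voice name is a placeholder.

```python
import os
from xml.sax.saxutils import escape, quoteattr


def wrap_text(text: str, voice_name: str = "en-US-JennyNeural",
              lang: str = "en-US") -> str:
    """Turn plain text into a minimal SSML document, escaping XML specials."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice></speak>"
    )


def synthesize(ssml: str) -> bytes:
    """Send SSML to the Speech service and return the audio bytes.

    Assumes `pip install azure-cognitiveservices-speech` and the
    SPEECH_KEY / SPEECH_REGION environment variables (our own names).
    """
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    config = speechsdk.SpeechConfig(subscription=os.environ["SPEECH_KEY"],
                                    region=os.environ["SPEECH_REGION"])
    # audio_config=None keeps the audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config,
                                              audio_config=None)
    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis failed: {result.reason}")
    return result.audio_data


if __name__ == "__main__":
    ssml = wrap_text("Fish & chips < $10")
    print(ssml)
    if "SPEECH_KEY" in os.environ:  # only call the service when configured
        print(f"{len(synthesize(ssml))} bytes of audio")
```

Without escaping, the raw & and < in that sample text would make the document malformed, and the service would likely reject it.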
