Where This Fits in the Exam
- Exam area: Describe features of Natural Language Processing (NLP) workloads on Azure (15–20%)
- Sub-area: Identify features of common NLP workload scenarios
- Key focus: Understanding what speech recognition and synthesis do, when to use them, and which Azure services support them
This topic is highly scenario-driven on the exam.
Overview: Speech in NLP Workloads
Speech-related NLP workloads allow AI systems to:
- Understand spoken language (speech recognition)
- Generate spoken language (speech synthesis)
Together, these capabilities enable voice-based interactions such as virtual assistants, voice bots, dictation tools, and accessibility solutions.
Speech Recognition
What Is Speech Recognition?
Speech recognition (also called speech-to-text) is the process of converting spoken audio into written text.
The AI system analyzes:
- Audio signals
- Phonemes and pronunciation
- Language patterns
- Context
It then produces text that represents what was spoken.
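To make "language patterns" and "context" concrete, here is a deliberately tiny illustration. It is not how Azure AI Speech works internally. Two candidate transcriptions that sound alike are ranked by how often their adjacent word pairs (bigrams) occur in a small, made-up text corpus. Real recognizers combine an acoustic model with a far larger language model. The corpus and all function names here are hypothetical.

```python
from collections import Counter

# Hypothetical "training text" standing in for the language patterns
# a real recognizer learns from very large amounts of data.
CORPUS = (
    "we use software to recognize speech . "
    "it can recognize speech in real time . "
    "the app listens and it can transcribe meetings ."
)


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs (bigrams) in the corpus."""
    words = text.split()
    return Counter(zip(words, words[1:]))


def score(candidate: str, counts: Counter) -> int:
    """Sum how often each adjacent word pair in the candidate appears in the corpus."""
    words = candidate.split()
    return sum(counts[pair] for pair in zip(words, words[1:]))


def pick_transcript(candidates: list[str], counts: Counter) -> str:
    """Return the candidate that best fits the learned language patterns."""
    return max(candidates, key=lambda c: score(c, counts))


counts = bigram_counts(CORPUS)
# Both phrases sound alike; the language patterns favor the first one.
print(pick_transcript(["it can recognize speech", "it can wreck a nice beach"], counts))
# prints: it can recognize speech
```

Raw counts are used only to keep the toy short; a real language model uses probabilities so that longer candidates are not unfairly favored.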
Key Features of Speech Recognition
Speech recognition solutions can:
- Convert live or recorded audio into text
- Support real-time transcription
- Handle multiple languages and accents
- Apply noise reduction
- Recognize custom vocabulary (e.g., medical or technical terms)
- Provide timestamps for spoken words or phrases
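Timestamps are what make a transcript searchable and easy to align with the original audio. Azure AI Speech reports offsets and durations in ticks, where one tick is 100 nanoseconds. The sketch below assumes word timings have already been obtained. The `RecognizedWord` class is a hypothetical container, not the SDK's result type, and the timing values are made up for illustration.

```python
from dataclasses import dataclass

# One tick = 100 nanoseconds, so 10,000 ticks = 1 millisecond.
TICKS_PER_MS = 10_000


@dataclass
class RecognizedWord:
    """Hypothetical container for one recognized word and its timing."""
    text: str
    offset_ticks: int    # start time, measured from the beginning of the audio
    duration_ticks: int  # how long the word lasts


def format_ticks(ticks: int) -> str:
    """Render a tick count as mm:ss.mmm."""
    total_ms = ticks // TICKS_PER_MS
    minutes, rem_ms = divmod(total_ms, 60_000)
    seconds, ms = divmod(rem_ms, 1_000)
    return f"{minutes:02d}:{seconds:02d}.{ms:03d}"


def timestamped_lines(words: list[RecognizedWord]) -> list[str]:
    """Produce one '[start-end] word' line per recognized word."""
    return [
        f"[{format_ticks(w.offset_ticks)}-"
        f"{format_ticks(w.offset_ticks + w.duration_ticks)}] {w.text}"
        for w in words
    ]


# Made-up timings for the voice command "turn on the lights".
words = [
    RecognizedWord("turn", 5_000_000, 3_000_000),
    RecognizedWord("on", 8_000_000, 2_000_000),
    RecognizedWord("the", 10_000_000, 1_500_000),
    RecognizedWord("lights", 11_500_000, 4_000_000),
]
for line in timestamped_lines(words):
    print(line)
```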
Common Uses of Speech Recognition
Speech recognition is used when users speak instead of typing.
Common scenarios include:
- Voice commands (e.g., “Turn on the lights”)
- Call center transcription
- Meeting and lecture transcription
- Voice-controlled applications
- Accessibility tools for users with limited mobility
- Voice input for chatbots and virtual assistants
Azure Services for Speech Recognition
In Azure, speech recognition is provided by:
Azure AI Speech (Speech service)
Capabilities include:
- Speech-to-text
- Real-time and batch transcription
- Language detection
- Custom speech models
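For readers who want to see what calling the service looks like, here is a minimal sketch of single-shot speech-to-text with the Azure AI Speech SDK for Python. It assumes the `azure-cognitiveservices-speech` package is installed and that `key` and `region` belong to your own Speech resource (both are placeholders). The function name `transcribe_wav_file` is hypothetical. The SDK is imported inside the function so the module still loads where the package is absent.

```python
from typing import Optional


def transcribe_wav_file(path: str, key: str, region: str,
                        language: str = "en-US") -> Optional[str]:
    """Transcribe one utterance from a WAV file with Azure AI Speech (sketch)."""
    # Imported here so this module still loads where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    # key and region are placeholders for your own Speech resource.
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Single-shot recognition: returns after the first recognized utterance.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    # NoMatch (speech not recognized) or Canceled (for example, a wrong key or region).
    return None
```

A call would look like `transcribe_wav_file("call.wav", "<your-key>", "eastus")`. Single-shot recognition suits short voice commands; longer recordings such as meetings are usually handled with the recognizer's continuous recognition mode or with batch transcription.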
Speech Synthesis
What Is Speech Synthesis?
Speech synthesis (also called text-to-speech) is the process of converting written text into spoken audio.
The goal is to produce natural, human-like speech that sounds fluent and expressive.
Key Features of Speech Synthesis
Speech synthesis solutions can:
- Convert text into spoken audio
- Use natural-sounding neural voices
- Support multiple languages and accents
- Adjust:
  - Pitch
  - Speed
  - Tone
- Apply SSML (Speech Synthesis Markup Language) for fine control
- Generate speech for audio files or real-time playback
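SSML is plain XML, so the "fine control" mentioned above amounts to wrapping text in elements such as `<voice>` (which voice speaks) and `<prosody>` (speaking rate and pitch). The sketch below builds such a document as a string. The function name `build_ssml` is hypothetical, and the default voice name `en-US-JennyNeural` is an assumed example; check the current voice list for your region.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               rate: str = "medium", pitch: str = "medium",
               lang: str = "en-US") -> str:
    """Wrap text in SSML that selects a voice and adjusts speed (rate) and pitch."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, <, > so the text cannot break the XML
        "</prosody></voice></speak>"
    )


# Slightly slower and slightly higher than the voice's default.
print(build_ssml("Your order has shipped.", rate="-10%", pitch="+5%"))
```

Relative values such as `-10%` change speed or pitch in proportion to the voice's default, which tends to sound more natural than absolute settings.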
Common Uses of Speech Synthesis
Speech synthesis is used when systems need to speak to users.
Common scenarios include:
- Virtual assistants and chatbots
- Navigation and GPS systems
- Accessibility tools for visually impaired users
- Audiobooks and e-learning content
- Automated announcements
- Customer service voice bots
Azure Services for Speech Synthesis
In Azure, speech synthesis is also provided by:
Azure AI Speech (Speech service)
Capabilities include:
- Text-to-speech
- Neural voices
- Voice customization
- Multilingual speech output
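The text-to-speech side of the same SDK follows a similar pattern. The minimal sketch below assumes the `azure-cognitiveservices-speech` package is installed, uses placeholder `key` and `region` values, and picks `en-US-JennyNeural` as an example neural voice. The function name `synthesize_to_file` is hypothetical.

```python
def synthesize_to_file(text: str, out_path: str, key: str, region: str,
                       voice: str = "en-US-JennyNeural") -> bool:
    """Speak text with a neural voice and save the audio; True on success (sketch)."""
    # Imported here so this module still loads where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    # key and region are placeholders for your own Speech resource.
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = voice
    # Write audio to a file instead of the default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # speak_text_async returns a future; .get() waits for synthesis to finish.
    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

A call would look like `synthesize_to_file("Welcome back!", "reply.wav", "<your-key>", "eastus")`. When fine control is needed, the synthesizer's `speak_ssml_async` method accepts an SSML string in place of plain text.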
Speech Recognition vs Speech Synthesis
| Capability | Speech Recognition | Speech Synthesis |
|---|---|---|
| Direction | Speech → Text | Text → Speech |
| Input | Audio | Text |
| Output | Text | Audio |
| Common Name | Speech-to-text | Text-to-speech |
| Example | Transcribing a call | Reading text aloud |
Combined Speech Workloads
Many real-world solutions use both capabilities together.
Example:
- User speaks a question (speech recognition)
- System processes the text using NLP or AI logic
- System responds verbally (speech synthesis)
This is the foundation of:
- Voice assistants
- Conversational AI
- Interactive voice response (IVR) systems
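The three-step flow above can be sketched as a single function that takes each stage as a parameter. The stand-in functions `fake_stt`, `simple_bot`, and `fake_tts` are hypothetical: they treat "audio" as UTF-8 text so the flow runs without any cloud service. In a real solution, the first and last stages would call a speech service such as Azure AI Speech.

```python
from typing import Callable


def voice_turn(audio: bytes,
               speech_to_text: Callable[[bytes], str],
               respond: Callable[[str], str],
               text_to_speech: Callable[[str], bytes]) -> bytes:
    """One conversational turn: recognize, reason, then speak."""
    question = speech_to_text(audio)   # 1. speech recognition
    answer = respond(question)         # 2. NLP / AI logic
    return text_to_speech(answer)      # 3. speech synthesis


# Hypothetical stand-ins: "audio" is just UTF-8 text, and "speech" is UTF-8 bytes.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def simple_bot(question: str) -> str:
    if "open" in question.lower():
        return "We open at 9 AM."
    return "Sorry, I didn't catch that."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


print(voice_turn(b"When do you open?", fake_stt, simple_bot, fake_tts))
```

Passing each stage in as a function keeps the conversation logic independent of any particular speech service, so the stand-ins can later be swapped for real recognition and synthesis calls.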
Exam-Focused Clues to Watch For 👀
On the AI-900 exam, speech workloads are usually described using phrases like:
- “Convert spoken audio into text” → Speech recognition
- “Generate spoken responses from text” → Speech synthesis
- “Voice-enabled application” → Azure AI Speech
- “Real-time transcription” → Speech recognition
- “Reads text aloud” → Speech synthesis
Key Takeaways for AI-900
- Speech recognition converts speech to text
- Speech synthesis converts text to speech
- Both are part of NLP workloads
- Azure AI Speech is the primary Azure service for both
- Common exam scenarios involve:
  - Voice assistants
  - Transcription
  - Accessibility
  - Customer service automation
