Where This Fits in the Exam
- Exam: AI-900 – Microsoft Azure AI Fundamentals
- Domain: Describe features of Natural Language Processing (NLP) workloads on Azure (15–20%)
- Sub-area: Identify Azure tools and services for NLP workloads
For AI-900, Microsoft expects you to understand what the Azure AI Speech service does, when to use it, and how it differs from other Azure AI services, not how to code it.
What Is the Azure AI Speech Service?
The Azure AI Speech service is a cloud-based service that enables applications to process spoken language. It allows systems to:
- Convert speech into text
- Convert text into natural-sounding speech
- Translate spoken language
- Recognize and verify speakers by their voice
It is part of Azure AI Services and focuses on audio and voice-based NLP workloads.
Core Capabilities of Azure AI Speech
1. Speech to Text
Speech to Text converts spoken audio into written text.
Key features:
- Real-time transcription
- Batch transcription of audio files
- Support for multiple languages
- Automatic punctuation and formatting
Common use cases:
- Transcribing meetings or calls
- Voice-controlled applications
- Call center analytics
- Accessibility tools (captions and subtitles)
📌 AI-900 exam tip:
If the question mentions converting spoken words into text, the answer is Azure AI Speech (Speech to Text).
2. Text to Speech
Text to Speech converts written text into natural-sounding spoken audio.
Key features:
- Neural voices that sound human-like
- Multiple languages and accents
- Adjustable pitch, speed, and tone
- Support for voice styles (e.g., cheerful, calm)
Common use cases:
- Voice assistants
- Read-aloud applications
- Accessibility for visually impaired users
- Automated announcements
📌 AI-900 exam tip:
If the scenario describes reading text out loud, think Text to Speech.
3. Speech Translation
Speech Translation converts spoken language into another language, either as text or synthesized speech.
Key features:
- Real-time speech translation
- Multi-language support
- Can output translated speech or text
Common use cases:
- Multilingual meetings
- Travel and tourism apps
- International customer support
📌 AI-900 exam tip:
Speech translation handles spoken language, while Azure Translator handles written text.
4. Speaker Recognition
Speaker Recognition identifies or verifies who is speaking based on their voice.
Capabilities include:
- Speaker verification: confirming that a voice matches a claimed identity (a 1:1 match)
- Speaker identification: determining who is speaking from a group of enrolled speakers (a 1:N match)
Common use cases:
- Secure voice authentication
- Call center speaker tracking
- Personalized voice experiences
📌 AI-900 note:
You only need to understand what it does, not how voice models are trained.
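The verification-vs-identification distinction can be illustrated with a toy sketch. The "voiceprint" vectors, names, and threshold below are invented for illustration only; the real service derives voice signatures from enrollment audio rather than using hand-written numbers.

```python
import math

# Toy "voiceprints": invented numbers standing in for the voice signatures
# the real service builds from enrollment audio.
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.2, 0.8, 0.5],
}

def cosine(a, b):
    # Cosine similarity between two vectors (1.0 = identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(claimed_speaker, sample, threshold=0.9):
    """Verification is 1:1 - does this sample match the claimed identity?"""
    return cosine(ENROLLED[claimed_speaker], sample) >= threshold

def identify(sample):
    """Identification is 1:N - which enrolled speaker matches best?"""
    return max(ENROLLED, key=lambda name: cosine(ENROLLED[name], sample))

sample = [0.88, 0.12, 0.31]       # close to alice's toy voiceprint
print(verify("alice", sample))    # True
print(identify(sample))           # alice
```

The key point for the exam is the difference in question shape: verification answers "is this really Alice?", identification answers "which enrolled speaker is this?".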
5. Speech-to-Speech Scenarios
By combining Speech to Text, Translation, and Text to Speech, Azure AI Speech supports end-to-end voice experiences, such as:
- Speaking in one language and hearing a response in another
- Voice-based chatbots
- Smart devices and assistants
How Azure AI Speech Differs from Other Azure AI Services
| Service | Primary Purpose |
|---|---|
| Azure AI Speech | Spoken language (audio) |
| Azure AI Language | Written text analysis |
| Azure Translator | Text translation |
| Azure AI Vision | Images and video |
📌 Exam pattern to watch for:
Microsoft often tests whether you can choose the right service based on the input type (audio vs text vs image).
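The input-type rule from the table can be captured as a small decision helper. This is purely a study aid for the exam pattern, not an official API; the function name and parameters are invented.

```python
def choose_service(input_type, task=""):
    """Pick the Azure AI service by input type, mirroring the table above.
    A study aid only - not an official API."""
    if input_type == "audio":
        return "Azure AI Speech"
    if input_type == "image":
        return "Azure AI Vision"
    if input_type == "text":
        # Written text: Translator for translation, Language for analysis.
        return "Azure Translator" if task == "translation" else "Azure AI Language"
    raise ValueError(f"unknown input type: {input_type}")

print(choose_service("audio"))                # Azure AI Speech
print(choose_service("text", "translation"))  # Azure Translator
```

Notice that the first question is always "what is the input?" — only then does the specific task (translation vs analysis) matter, which is exactly how the exam scenarios are framed.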
Typical AI-900 Scenarios Involving Azure AI Speech
You should choose Azure AI Speech when a scenario involves:
- Audio recordings
- Live speech
- Voice input or output
- Real-time transcription
- Spoken translation
Key Takeaways for the AI-900 Exam
- Azure AI Speech focuses on spoken language, not written text
- Core capabilities:
- Speech to Text
- Text to Speech
- Speech Translation
- Speaker Recognition
- Exam questions are scenario-based, not technical
- If the question mentions audio, voice, or speech, Azure AI Speech is usually the answer
Go to the Practice Exam Questions for this topic.
Go to the AI-900 Exam Prep Hub main page.
