Welcome to the AI-901: Azure AI Fundamentals Exam Prep Hub!
Welcome to the one-stop hub with information for preparing for the AI-901: Azure AI Fundamentals certification exam. The content for this exam helps you to demonstrate that “you have conceptual knowledge of AI solutions in Azure and the foundational technical skills to work with them”. You will also need “knowledge of Python coding syntax and programming techniques, and you should be familiar with Azure resources”. Upon successful completion of the exam, you earn the Microsoft Certified: Azure AI Fundamentals certification.
This hub provides information directly here (topic-by-topic as outlined in the official study guide), links to a number of external resources, tips for preparing for the exam, practice tests, and section questions to help you prepare. Bookmark this page and use it as a guide to ensure that you are fully covering all relevant topics for the AI-901 exam and making use of as many of the resources available as possible.
Audience profile (from Microsoft’s site)
As a candidate for this Microsoft Certification, you’re at the beginning of your career in AI solution development. These Microsoft certifications offer opportunities to demonstrate your understanding of machine learning, AI concepts, and Azure services, whether you are starting your career or advancing your skills in AI solution development. Both certifications are designed for candidates from technical and non-technical backgrounds—prior experience in data science or software engineering is not required, though familiarity with basic cloud concepts and client-server applications will be helpful.
For the AI-901, you should have foundational knowledge of AI workloads and understand the basic principles of AI and machine learning. And also, you should have foundational technical skills for working with AI solutions in Azure, conceptual knowledge of Azure-based AI solutions, and familiarity with Python coding syntax and programming techniques, as well as Azure resources.
This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. This topic falls under these sections: Implement AI solutions by using Microsoft Foundry (55–60%) --> Implement AI solutions for information extraction by using Foundry --> Build a lightweight application with Information Extraction capabilities by using Content Understanding
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Modern organizations often need applications that can automatically extract information from documents, images, audio, and video. Azure AI services and Microsoft Foundry tools make it possible to create lightweight applications that use AI-powered content understanding without requiring advanced machine learning expertise.
For the AI-901 certification exam, candidates should understand the foundational concepts involved in building lightweight applications with information extraction capabilities by using Azure Content Understanding and Microsoft Foundry.
This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.
What Is Information Extraction?
Information extraction is the process of automatically identifying and retrieving useful data from content.
AI systems can extract information from:
Documents
Images
Audio
Video
Text
Examples include:
Names
Dates
Invoice totals
Keywords
Objects
Spoken words
What Is Azure Content Understanding?
Azure Content Understanding enables AI-powered analysis of different types of content.
Capabilities include:
OCR (Optical Character Recognition)
Speech recognition
Entity extraction
Image analysis
Video analysis
Classification
Caption generation
What Is a Lightweight Application?
A lightweight application is a simple application that performs focused tasks using cloud-based AI services.
Characteristics include:
Minimal infrastructure
API-based communication
Rapid development
Simple user interface
Cloud-hosted AI processing
For AI-901, candidates should understand concepts and workflows rather than advanced coding details.
Azure AI Foundry
Azure AI Foundry provides tools for building and testing AI applications.
Developers can:
Access AI models
Configure services
Test prompts
Analyze content
Build AI-powered workflows
Common Information Extraction Capabilities
OCR (Optical Character Recognition)
OCR extracts text from images and scanned documents.
Example
Input
Photo of a receipt
Output
Store name
Total amount
Purchase date
Entity Extraction
AI systems can identify important entities within content.
Examples of Entities
Names
Locations
Organizations
Phone numbers
Dates
Speech Recognition
Speech recognition converts spoken language into text.
Example
Input
Customer support call recording
Output
Searchable transcript
Object Detection
Object detection identifies objects within images or video.
Example
A warehouse-monitoring application may detect:
Boxes
Forklifts
Employees
Sentiment Analysis
Sentiment analysis determines emotional tone.
Example
Customer feedback classified as:
Positive
Neutral
Negative
Typical Lightweight Application Workflow
A lightweight information-extraction application often follows these steps:
User uploads content
Application sends content to Azure AI service
AI analyzes content
Structured results are returned
Application displays extracted information
Example Workflow
User uploads:
Image
PDF
Audio file
Video file
AI extracts:
Text
Keywords
Objects
Entities
Captions
APIs and Endpoints
Applications communicate with Azure AI services through:
APIs
Endpoints
The application sends content to the AI service and receives structured results.
Authentication
Applications must authenticate securely before using Azure AI services.
Common authentication methods include:
API keys
Azure credentials
Managed identities
Example High-Level Pseudocode
content = upload_file()
results = analyze_content(content)
display_results(results)
For AI-901, understanding the workflow is more important than memorizing exact syntax.
Structured Outputs
AI systems often return structured data formats such as:
JSON
Tables
Lists
Metadata
Structured outputs make integration easier.
Example JSON-Like Output
{
"invoiceNumber": "INV-1001",
"date": "2026-05-15",
"total": "$245.99"
}
Common Real-World Scenarios
Scenario 1: Invoice Processing
Goal
Automatically extract invoice data.
Extracted Information
Vendor name
Invoice number
Total amount
Due date
Scenario 2: Customer Service Analytics
Goal
Analyze customer interactions.
Extracted Information
Topics
Sentiment
Keywords
Transcripts
Scenario 3: Healthcare Document Analysis
Goal
Extract information from medical documents.
Extracted Information
Patient names
Dates
Medical terms
Scenario 4: Media Monitoring
Goal
Analyze audio and video content.
Extracted Information
Captions
Objects
Speakers
Keywords
Responsible AI Considerations
Information-extraction applications should follow Responsible AI principles.
Key considerations include:
Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security
Privacy Concerns
Content may contain:
Personal information
Financial records
Medical data
Private conversations
Organizations should secure sensitive data appropriately.
Fairness and Bias
AI systems may perform differently across:
Languages
Accents
Demographics
Image quality
Environmental conditions
Testing and evaluation are important.
Transparency
Users should understand:
AI is analyzing their content
AI-generated outputs may contain errors
Human review may still be needed
Accuracy Limitations
Information-extraction systems may struggle with:
Blurry images
Poor audio quality
Handwritten text
Background noise
Low-resolution files
Hallucinations and Errors
AI systems may occasionally:
Extract incorrect information
Misidentify objects
Misinterpret speech
Generate inaccurate summaries
Applications should validate important outputs.
Error Handling
Applications should handle:
Unsupported file formats
Corrupted files
Authentication failures
Network interruptions
Rate limits
Advantages of Lightweight AI Applications
Benefits include:
Rapid deployment
Reduced development complexity
Scalability
Automation
Faster information processing
Limitations of Lightweight AI Applications
Challenges include:
Dependence on cloud services
Accuracy limitations
Privacy concerns
Potential bias
Environmental variability
Multimodal AI
Modern AI systems can combine:
Text
Speech
Vision
Generative AI
These systems can process multiple content types together.
High-Level Architecture
A simplified architecture often includes:
User uploads content
Application sends content to Azure AI service
AI analyzes content
Structured results are returned
Application displays extracted information
Important AI-901 Exam Tips
For the exam, remember these key points:
Information extraction retrieves useful data from content.
OCR extracts text from images and documents.
Speech recognition converts speech into text.
Object detection identifies objects within images or video.
APIs and endpoints connect applications to Azure AI services.
Authentication secures access to AI resources.
Structured outputs often use JSON-like formats.
Responsible AI principles apply to information extraction systems.
Poor-quality content can reduce accuracy.
Hallucinations are inaccurate AI-generated outputs.
Azure AI Foundry supports AI application development.
Quick Knowledge Check
Question 1
What does OCR do?
Answer
Extracts text from images and scanned documents.
Question 2
What does speech recognition do?
Answer
Converts spoken language into text.
Question 3
Why is authentication important?
Answer
It secures access to Azure AI services.
Question 4
What can reduce information-extraction accuracy?
Answer
Poor-quality images, background noise, and blurry documents.
Practice Exam Questions
Exam: AI-901
Topic: Build a Lightweight Application with Information Extraction Capabilities by Using Content Understanding
Question 1
What is the PRIMARY purpose of information extraction in AI applications?
A. To automatically retrieve useful data from content B. To increase internet speed C. To replace operating systems D. To improve monitor resolution
Correct Answer
A. To automatically retrieve useful data from content
Explanation
Information extraction uses AI to identify and retrieve meaningful data from documents, images, audio, video, and text.
Why the Other Answers Are Incorrect
B. To increase internet speed
Information extraction does not improve networking performance.
C. To replace operating systems
AI extraction tools do not replace operating systems.
D. To improve monitor resolution
This is unrelated to AI information extraction.
Question 2
What does OCR stand for?
A. Optical Character Recognition B. Open Cloud Routing C. Operational Content Reporting D. Object Classification Retrieval
Correct Answer
A. Optical Character Recognition
Explanation
OCR extracts machine-readable text from images and scanned documents.
Why the Other Answers Are Incorrect
B. Open Cloud Routing
This is not an OCR term.
C. Operational Content Reporting
This is unrelated to text extraction.
D. Object Classification Retrieval
This is not the meaning of OCR.
Question 3
Which AI capability converts spoken language into text?
A. Speech recognition B. Image classification C. Speech synthesis D. Object detection
Correct Answer
A. Speech recognition
Explanation
Speech recognition transcribes spoken words into text.
Why the Other Answers Are Incorrect
B. Image classification
This categorizes images.
C. Speech synthesis
This converts text into spoken audio.
D. Object detection
This identifies objects within images or video.
Question 4
What is a lightweight AI application?
A. A simple application that uses cloud AI services for focused tasks B. A hardware-only system C. A networking device D. A spreadsheet management tool
Correct Answer
A. A simple application that uses cloud AI services for focused tasks
Explanation
Lightweight applications typically use APIs and cloud services to provide AI capabilities without requiring complex infrastructure.
Why the Other Answers Are Incorrect
B. A hardware-only system
Lightweight AI apps commonly use cloud services.
C. A networking device
Networking devices are unrelated.
D. A spreadsheet management tool
This is unrelated to AI application design.
Question 5
How do lightweight AI applications commonly communicate with Azure AI services?
A. Through APIs and endpoints B. Through printer drivers C. Through monitor settings D. Through USB-only connections
Correct Answer
A. Through APIs and endpoints
Explanation
Applications use APIs and endpoints to send content to Azure AI services and receive analysis results.
Why the Other Answers Are Incorrect
B. Through printer drivers
Printers are unrelated to Azure AI communication.
C. Through monitor settings
This is unrelated to cloud AI services.
D. Through USB-only connections
Cloud AI services use network communication.
Question 6
Why is authentication important in Azure AI applications?
A. To secure access to AI resources B. To improve image brightness C. To increase network speed D. To improve speaker volume
Correct Answer
A. To secure access to AI resources
Explanation
Authentication ensures that only authorized users and applications can access Azure AI services.
Why the Other Answers Are Incorrect
B. To improve image brightness
Authentication does not affect image quality.
C. To increase network speed
Authentication does not improve networking.
D. To improve speaker volume
Authentication does not affect audio playback.
Question 7
Which format is commonly used for structured AI output data?
A. JSON B. JPEG C. MP3 D. ZIP
Correct Answer
A. JSON
Explanation
AI systems often return structured data in JSON-like formats for easy application integration.
Why the Other Answers Are Incorrect
B. JPEG
JPEG is an image format.
C. MP3
MP3 is an audio format.
D. ZIP
ZIP is a compressed archive format.
Question 8
Which factor can reduce information-extraction accuracy?
A. Poor-quality input content B. Spreadsheet formatting C. Keyboard layout changes D. Screen brightness settings
Correct Answer
A. Poor-quality input content
Explanation
Blurry images, poor audio quality, and noisy environments can negatively affect AI extraction accuracy.
Why the Other Answers Are Incorrect
B. Spreadsheet formatting
This does not affect AI extraction services.
C. Keyboard layout changes
This is unrelated to AI analysis.
D. Screen brightness settings
This does not affect AI processing accuracy.
Question 9
Which Responsible AI concern is especially important for information extraction applications?
A. Protecting sensitive personal data B. Increasing printer performance C. Improving spreadsheet formulas D. Reducing monitor power usage
Correct Answer
A. Protecting sensitive personal data
Explanation
Extracted content may contain financial, medical, or personal information that must be protected securely.
Why the Other Answers Are Incorrect
B. Increasing printer performance
This is unrelated to Responsible AI.
C. Improving spreadsheet formulas
This is unrelated to information extraction.
D. Reducing monitor power usage
This is unrelated to AI ethics.
Question 10
What are hallucinations in AI information-extraction systems?
A. Incorrect or fabricated AI-generated outputs B. Hardware installation failures C. Network outages D. Operating system crashes
Correct Answer
A. Incorrect or fabricated AI-generated outputs
Explanation
Hallucinations occur when AI systems generate inaccurate extracted information, captions, summaries, or identifications.
Why the Other Answers Are Incorrect
B. Hardware installation failures
This is unrelated to AI-generated outputs.
C. Network outages
This is a connectivity issue.
D. Operating system crashes
This is unrelated to AI hallucinations.
Final Thoughts
Building lightweight applications with information extraction capabilities is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, speech recognition, APIs, authentication, structured outputs, Responsible AI principles, and lightweight AI workflows.
Azure AI services and Azure AI Foundry provide powerful tools for creating scalable applications capable of extracting valuable information from text, images, audio, video, and documents.
This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. This topic falls under these sections: Implement AI solutions by using Microsoft Foundry (55–60%) --> Implement AI solutions for information extraction by using Foundry --> Extract information from audio and video by using Content Understanding
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Organizations increasingly rely on AI systems to analyze audio and video content for automation, accessibility, security, analytics, and customer experiences. AI-powered content understanding solutions can extract valuable information from spoken language, sounds, images, and moving video streams.
For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from audio and video by using Azure Content Understanding and Microsoft Foundry tools.
This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.
What Is Content Understanding?
Content understanding refers to AI systems analyzing and interpreting different forms of content, including:
Audio
Video
Images
Documents
Text
AI systems can identify patterns, extract information, and generate useful insights.
Azure Content Understanding
Azure Content Understanding enables AI-powered analysis of multimedia content.
Capabilities include:
Speech recognition
Video analysis
Speaker identification
Caption generation
Object detection
Keyword extraction
Azure AI Foundry
Azure AI Foundry provides tools for building, testing, and managing AI applications.
Developers can:
Deploy AI services
Process multimedia content
Build lightweight applications
Test AI workflows
Audio Information Extraction
AI systems can analyze audio files to extract useful information.
Examples include:
Spoken words
Speaker identity
Keywords
Emotions
Language detection
Speech Recognition
Speech recognition converts spoken language into text.
Example
Input
Audio recording of a meeting
Output
Meeting transcript
Speaker Identification
AI systems can distinguish between different speakers.
Example
A meeting transcription may identify:
Speaker 1
Speaker 2
Speaker 3
Language Detection
AI systems can identify the spoken language within audio content.
Example
An AI system determines whether audio is:
English
Spanish
French
Japanese
Keyword Extraction
AI systems can identify important terms within conversations.
Example
A customer support call may extract:
Product names
Complaint topics
Order numbers
Sentiment Analysis
AI systems can analyze emotional tone in speech.
Example
A customer call may be classified as:
Positive
Neutral
Negative
Video Information Extraction
Video analysis combines:
Audio analysis
Image analysis
Motion analysis
Common Video Analysis Capabilities
AI systems may perform:
Object detection
Facial analysis
Activity recognition
Scene description
Text extraction
Caption generation
Object Detection in Video
AI systems can identify objects appearing in video frames.
Example
A traffic-monitoring system may detect:
Cars
Trucks
Pedestrians
Traffic lights
Scene Detection
AI systems can identify scene changes within videos.
Example
A sports video may identify:
Game start
Replay segments
Commercial breaks
Video Captioning
AI systems can generate descriptions or subtitles for videos.
Example
A training video may automatically generate captions for accessibility.
Optical Character Recognition (OCR) in Video
AI systems can extract text appearing in video frames.
Example
A video may contain:
Street signs
License plates
Product labels
APIs and Endpoints
Applications communicate with Azure AI services using:
APIs
Endpoints
Audio and video content is submitted programmatically for analysis.
Authentication
Applications must securely authenticate before accessing Azure AI services.
Common authentication methods include:
API keys
Azure credentials
Managed identities
Lightweight Application Workflow
A typical workflow includes:
User uploads audio or video
Application sends content to AI service
AI analyzes multimedia content
Results are returned
Application displays extracted information
Example High-Level Pseudocode
media = upload_media()
results = analyze_media(media)
display_results(results)
For AI-901, understanding the workflow is more important than memorizing exact syntax.
Common Real-World Scenarios
Scenario 1: Meeting Transcription
Goal
Convert meeting audio into searchable text.
Features
Speech recognition
Speaker identification
Keyword extraction
Scenario 2: Call Center Analytics
Goal
Analyze customer service calls.
Features
Sentiment analysis
Topic extraction
Call summarization
Scenario 3: Security Monitoring
Goal
Analyze surveillance video.
Features
Object detection
Activity recognition
Facial analysis
Scenario 4: Video Accessibility
Goal
Improve accessibility for multimedia content.
Features
Caption generation
Speech transcription
Scene descriptions
Responsible AI Considerations
Audio and video AI systems should follow Responsible AI principles.
Key considerations include:
Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security
Privacy Concerns
Audio and video may contain:
Personal conversations
Faces
Biometric data
Sensitive information
Organizations should protect multimedia data appropriately.
Fairness and Bias
Speech and video systems may perform differently across:
Languages
Accents
Dialects
Lighting conditions
Demographics
Testing and evaluation are important.
Transparency
Users should understand:
AI is analyzing multimedia content
AI-generated outputs may contain errors
Human review may still be needed
Accuracy Limitations
Audio and video analysis systems may struggle with:
Background noise
Poor audio quality
Low-resolution video
Obstructed visuals
Multiple overlapping speakers
Hallucinations and Errors
AI systems may occasionally:
Misidentify speakers
Generate inaccurate captions
Misinterpret speech
Detect nonexistent objects
Applications should validate important outputs.
Error Handling
Applications should handle:
Unsupported file formats
Corrupted media files
Authentication failures
Network interruptions
Rate limits
Advantages of Multimedia Information Extraction
Benefits include:
Automation
Faster analysis
Improved accessibility
Searchable content
Scalable processing
Limitations of Multimedia Information Extraction
Challenges include:
Privacy concerns
Accuracy limitations
Bias
Environmental variability
Ethical considerations
Multimodal AI
Modern AI systems may combine:
Speech
Vision
Text
Generative AI
These systems can:
Analyze multimedia content
Answer questions
Generate summaries
Create captions and descriptions
High-Level Architecture
A simplified architecture often includes:
User uploads audio/video
Application sends media to Azure AI service
AI processes multimedia content
Structured results are returned
Application displays extracted information
Important AI-901 Exam Tips
For the exam, remember these key points:
Speech recognition converts speech to text.
Speaker identification distinguishes speakers.
Sentiment analysis detects emotional tone.
OCR can extract text from video frames.
Object detection identifies objects in video.
APIs and endpoints connect applications to AI services.
Authentication secures AI resources.
Responsible AI principles apply to multimedia AI systems.
Poor audio or video quality can reduce accuracy.
Hallucinations are inaccurate AI-generated outputs.
Azure AI Foundry supports multimedia AI application development.
Quick Knowledge Check
Question 1
What does speech recognition do?
Answer
Converts spoken language into text.
Question 2
What is speaker identification?
Answer
Distinguishing between different speakers in audio content.
Question 3
Why is authentication important?
Answer
It secures access to Azure AI services.
Question 4
What can reduce multimedia-analysis accuracy?
Answer
Background noise, low-quality audio, and poor video quality.
Practice Exam Questions
Exam: AI-901
Topic: Extract Information from Audio and Video by Using Content Understanding
Question 1
What is the PRIMARY purpose of content understanding in AI systems?
A. To analyze and interpret multimedia content such as audio and video B. To increase internet bandwidth C. To replace operating systems D. To improve keyboard performance
Correct Answer
A. To analyze and interpret multimedia content such as audio and video
Explanation
Content understanding enables AI systems to analyze audio, video, images, and other forms of content to extract useful information.
Why the Other Answers Are Incorrect
B. To increase internet bandwidth
Content understanding does not improve networking speed.
C. To replace operating systems
AI multimedia analysis does not replace operating systems.
D. To improve keyboard performance
This is unrelated to AI content understanding.
Question 2
What does speech recognition do?
A. Converts spoken language into text B. Converts images into audio C. Encrypts media files D. Repairs damaged videos
Correct Answer
A. Converts spoken language into text
Explanation
Speech recognition transcribes spoken words into machine-readable text.
Why the Other Answers Are Incorrect
B. Converts images into audio
This is unrelated to speech recognition.
C. Encrypts media files
Encryption is unrelated to speech transcription.
D. Repairs damaged videos
Speech recognition does not repair media files.
Question 3
Which AI capability identifies different speakers in an audio recording?
A. Speaker identification B. OCR C. Image classification D. Object compression
Correct Answer
A. Speaker identification
Explanation
Speaker identification distinguishes between different speakers within audio content.
Why the Other Answers Are Incorrect
B. OCR
OCR extracts text from images.
C. Image classification
This categorizes images.
D. Object compression
This is not a multimedia AI capability.
Question 4
What is sentiment analysis used for in audio processing?
A. Detecting emotional tone in speech B. Increasing audio volume C. Compressing audio files D. Repairing broken microphones
Correct Answer
A. Detecting emotional tone in speech
Explanation
Sentiment analysis identifies whether speech content is positive, negative, or neutral.
Why the Other Answers Are Incorrect
B. Increasing audio volume
This is unrelated to AI analysis.
C. Compressing audio files
Compression is unrelated to sentiment detection.
D. Repairing broken microphones
This is a hardware issue.
Question 5
Which AI capability can extract text from video frames?
A. OCR B. Speech synthesis C. Audio normalization D. File compression
Correct Answer
A. OCR
Explanation
OCR can identify and extract text that appears visually within video frames.
Why the Other Answers Are Incorrect
B. Speech synthesis
This converts text into speech.
C. Audio normalization
This adjusts sound levels.
D. File compression
This reduces file size.
Question 6
How do lightweight multimedia-analysis applications typically communicate with Azure AI services?
A. Through APIs and endpoints B. Through printer drivers C. Through monitor settings D. Through USB-only connections
Correct Answer
A. Through APIs and endpoints
Explanation
Applications use APIs and endpoints to send audio and video content to Azure AI services for analysis.
Why the Other Answers Are Incorrect
B. Through printer drivers
Printers are unrelated to multimedia AI communication.
C. Through monitor settings
This is unrelated to cloud AI services.
D. Through USB-only connections
Cloud AI services use network communication.
Question 7
Why is authentication important when using Azure AI multimedia services?
A. To secure access to AI resources B. To improve speaker volume C. To increase internet speed D. To improve video resolution
Correct Answer
A. To secure access to AI resources
Explanation
Authentication ensures that only authorized users and applications can access Azure AI services.
Why the Other Answers Are Incorrect
B. To improve speaker volume
Authentication does not affect sound levels.
C. To increase internet speed
Authentication does not improve networking.
D. To improve video resolution
Authentication does not affect video quality.
Question 8
Which factor can reduce speech-recognition accuracy?
A. Background noise B. Spreadsheet formatting C. Keyboard layout changes D. Monitor brightness
Correct Answer
A. Background noise
Explanation
Noise and poor audio quality can make it difficult for AI systems to correctly recognize speech.
Why the Other Answers Are Incorrect
B. Spreadsheet formatting
This does not affect audio AI systems.
C. Keyboard layout changes
This is unrelated to speech recognition.
D. Monitor brightness
This does not affect audio analysis.
Question 9
Which Responsible AI concern is especially important for audio and video analysis systems?
A. Protecting sensitive personal information B. Increasing printer speed C. Improving spreadsheet formulas D. Reducing file storage costs
Correct Answer
A. Protecting sensitive personal information
Explanation
Audio and video files may contain faces, voices, and personal conversations that require privacy protection.
Why the Other Answers Are Incorrect
B. Increasing printer speed
This is unrelated to Responsible AI.
C. Improving spreadsheet formulas
This is unrelated to multimedia analysis.
D. Reducing file storage costs
This is not a Responsible AI principle.
Question 10
What are hallucinations in multimedia AI systems?
A. Incorrect or fabricated AI-generated outputs B. Hardware installation failures C. Network outages D. Speaker hardware malfunctions
Correct Answer
A. Incorrect or fabricated AI-generated outputs
Explanation
Hallucinations occur when AI systems produce inaccurate captions, object detections, speaker identifications, or transcriptions.
Why the Other Answers Are Incorrect
B. Hardware installation failures
This is unrelated to AI-generated outputs.
C. Network outages
This is a connectivity issue.
D. Speaker hardware malfunctions
This is a hardware problem, not an AI hallucination.
Final Thoughts
Extracting information from audio and video by using Content Understanding is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as speech recognition, video analysis, OCR, APIs, authentication, Responsible AI principles, and lightweight multimedia-analysis workflows.
Azure AI services and Azure AI Foundry provide powerful tools for building intelligent multimedia applications capable of understanding spoken language, video content, and visual information at scale.
This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. This topic falls under these sections: Implement AI solutions by using Microsoft Foundry (55–60%) --> Implement AI solutions for information extraction by using Foundry --> Extract information from documents and forms by using Azure Content Understanding in Foundry Tools
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Organizations process enormous amounts of documents every day, including invoices, receipts, forms, contracts, and identification documents. AI-powered information extraction solutions help automate the process of reading, understanding, and organizing document data.
For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from documents and forms by using Azure Content Understanding and Microsoft Foundry tools.
This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.
What Is Information Extraction?
Information extraction is the process of identifying and retrieving useful data from documents, images, forms, audio, or other content.
Examples include extracting:
Names
Dates
Invoice totals
Addresses
Phone numbers
Product information
What Is Azure Content Understanding?
Azure Content Understanding helps AI systems analyze and interpret structured and unstructured documents.
Capabilities include:
Text extraction
Form recognition
Document analysis
Information classification
Key-value pair extraction
Azure AI Foundry
Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.
Developers can:
Configure AI services
Process documents
Test extraction workflows
Build lightweight AI applications
Structured vs. Unstructured Documents
Structured Documents
Structured documents follow a consistent layout.
Examples include:
Tax forms
Invoices
Receipts
Application forms
Unstructured Documents
Unstructured documents have less predictable layouts.
Examples include:
Emails
Letters
Articles
Contracts
Optical Character Recognition (OCR)
OCR converts text within images or scanned documents into machine-readable text.
Example
Input
Scanned receipt image
OCR Output
Store name
Date
Total amount
Form Recognition
Form recognition identifies fields and values within forms.
Example
Form
Insurance application
Extracted Data
Customer name
Policy number
Address
Claim amount
Key-Value Pair Extraction
AI systems can identify relationships between labels and values.
Example
Key
Value
Invoice Number
INV-1045
Total
$250.00
Due Date
05/30/2026
Table Extraction
AI can identify and extract tables from documents.
Example
A receipt table may contain:
Item names
Quantities
Prices
Classification
Document classification identifies the type of document being processed.
Example
The system determines whether a file is:
Invoice
Contract
Receipt
Resume
Named Entity Recognition (NER)
NER identifies important entities within text.
Entities may include:
People
Organizations
Locations
Dates
Example
Text
“John Smith works for Contoso in Seattle.”
Extracted Entities
John Smith (Person)
Contoso (Organization)
Seattle (Location)
APIs and Endpoints
Applications communicate with Azure AI services through:
APIs
Endpoints
Documents are submitted for analysis programmatically.
Authentication
Applications must securely authenticate before accessing Azure AI services.
Common authentication methods include:
API keys
Azure credentials
Managed identities
Lightweight Application Workflow
A typical workflow includes:
User uploads document
Application sends file to AI service
AI extracts information
Results are returned
Application displays or stores extracted data
Example Workflow
Input
Scanned invoice
AI Processing
OCR
Key-value extraction
Table analysis
Output
Structured invoice data
Example High-Level Pseudocode
document = upload_document()
results = analyze_document(document)
display_results(results)
For AI-901, understanding the workflow is more important than memorizing exact syntax.
Common Real-World Scenarios
Scenario 1: Invoice Processing
Goal
Automate invoice data extraction.
Features
OCR
Table extraction
Total amount detection
Scenario 2: Receipt Scanning
Goal
Extract purchase information from receipts.
Features
Text extraction
Merchant identification
Expense categorization
Scenario 3: Resume Processing
Goal
Extract candidate information from resumes.
Features
Name extraction
Skill identification
Contact information detection
Scenario 4: Healthcare Forms
Goal
Digitize patient records.
Features
Form recognition
Key-value extraction
Classification
Responsible AI Considerations
Document-processing applications should follow Responsible AI principles.
Key considerations include:
Privacy
Security
Fairness
Transparency
Accountability
Inclusiveness
Privacy Concerns
Documents may contain:
Personal information
Financial data
Medical information
Legal records
Organizations should protect sensitive data appropriately.
Security Considerations
Applications should secure:
Uploaded files
Stored documents
API credentials
Extracted data
Transparency
Users should understand:
AI is analyzing documents
Extracted data may contain errors
Human review may still be needed
Accuracy Limitations
AI extraction systems may struggle with:
Poor scan quality
Handwritten text
Complex layouts
Damaged documents
Hallucinations and Errors
AI systems may occasionally:
Extract incorrect values
Miss fields
Misclassify documents
Applications should validate important information.
APIs and endpoints connect applications to Azure AI services.
Authentication secures access to AI resources.
Responsible AI principles apply to document-processing systems.
Poor document quality can reduce extraction accuracy.
AI-generated outputs may still require validation.
Quick Knowledge Check
Question 1
What does OCR do?
Answer
Extracts machine-readable text from images or scanned documents.
Question 2
What is form recognition?
Answer
Identifying and extracting fields and values from forms.
Question 3
Why is authentication important?
Answer
It secures access to Azure AI services and protects resources.
Question 4
What can reduce extraction accuracy?
Answer
Poor scan quality, handwriting, and inconsistent document layouts.
Practice Exam Questions
Exam: AI-901
Topic: Extract Information from Documents and Forms by Using Azure Content Understanding in Foundry Tools
Question 1
What is the PRIMARY purpose of information extraction AI solutions?
A. To retrieve useful data from documents and content B. To increase internet bandwidth C. To replace operating systems D. To improve monitor resolution
Correct Answer
A. To retrieve useful data from documents and content
Explanation
Information extraction AI systems identify and retrieve meaningful information such as names, dates, totals, and addresses from documents and forms.
Why the Other Answers Are Incorrect
B. To increase internet bandwidth
Information extraction does not affect network speed.
C. To replace operating systems
AI document processing does not replace operating systems.
D. To improve monitor resolution
This is unrelated to AI information extraction.
Question 2
What does OCR stand for?
A. Optical Character Recognition B. Open Content Retrieval C. Object Classification Routing D. Operational Compute Reporting
Correct Answer
A. Optical Character Recognition
Explanation
OCR converts printed or handwritten text within images and scanned documents into machine-readable text.
Why the Other Answers Are Incorrect
B. Open Content Retrieval
This is not the meaning of OCR.
C. Object Classification Routing
This is unrelated to document analysis.
D. Operational Compute Reporting
This is not an OCR term.
Question 3
Which AI capability identifies fields and values within forms?
A. Form recognition B. Speech synthesis C. Image compression D. Network monitoring
Correct Answer
A. Form recognition
Explanation
Form recognition extracts structured information such as names, dates, totals, and addresses from forms and documents.
Why the Other Answers Are Incorrect
B. Speech synthesis
This converts text into speech.
C. Image compression
This reduces file size and is unrelated to field extraction.
D. Network monitoring
This is unrelated to document AI.
Question 4
Which Azure platform provides tools for building and managing AI-powered applications?
A. Azure AI Foundry B. Microsoft Paint C. Windows Task Manager D. Azure DNS
Correct Answer
A. Azure AI Foundry
Explanation
Azure AI Foundry provides tools for deploying, testing, and managing AI applications and services.
Why the Other Answers Are Incorrect
B. Microsoft Paint
Paint is a graphics editor.
C. Windows Task Manager
This is a system monitoring tool.
D. Azure DNS
This is a networking service.
Question 5
What is key-value pair extraction?
A. Identifying labels and their associated values in documents B. Encrypting document files C. Compressing image sizes D. Converting audio into text
Correct Answer
A. Identifying labels and their associated values in documents
Explanation
Key-value extraction identifies relationships such as:
Invoice Number → INV-1045
Total → $250.00
Why the Other Answers Are Incorrect
B. Encrypting document files
Encryption is unrelated to data extraction.
C. Compressing image sizes
Compression is unrelated to document intelligence.
D. Converting audio into text
This is speech recognition.
Question 6
What is the purpose of document classification?
A. To identify the type of document being processed B. To increase network performance C. To generate music files D. To repair damaged documents physically
Correct Answer
A. To identify the type of document being processed
Explanation
Document classification determines whether a file is an invoice, contract, receipt, resume, or another document type.
Why the Other Answers Are Incorrect
B. To increase network performance
Classification does not improve networking.
C. To generate music files
This is unrelated to document AI.
D. To repair damaged documents physically
AI classification does not physically repair documents.
Question 7
How do lightweight document-processing applications typically communicate with Azure AI services?
A. Through APIs and endpoints B. Through USB-only connections C. Through monitor calibration tools D. Through printer drivers
Correct Answer
A. Through APIs and endpoints
Explanation
Applications send documents to Azure AI services using APIs and endpoints and receive structured analysis results.
Why the Other Answers Are Incorrect
B. Through USB-only connections
Cloud services use network communication.
C. Through monitor calibration tools
This is unrelated to AI services.
D. Through printer drivers
Printers are unrelated to cloud AI communication.
Question 8
Which factor can reduce the accuracy of document extraction systems?
A. Poor document quality B. Spreadsheet color themes C. Keyboard layout changes D. Audio playback speed
Correct Answer
A. Poor document quality
Explanation
Blurry scans, damaged pages, handwriting, and poor lighting can negatively affect extraction accuracy.
Why the Other Answers Are Incorrect
B. Spreadsheet color themes
This does not affect document extraction AI.
C. Keyboard layout changes
This is unrelated to AI document analysis.
D. Audio playback speed
This is unrelated to document processing.
Question 9
Why is authentication important when using Azure AI services?
A. To secure access to AI resources B. To improve image resolution C. To increase internet speed D. To compress document files
Correct Answer
A. To secure access to AI resources
Explanation
Authentication ensures that only authorized users and applications can access AI services.
Why the Other Answers Are Incorrect
B. To improve image resolution
Authentication does not affect image quality.
C. To increase internet speed
Authentication does not improve networking.
D. To compress document files
Authentication is unrelated to file compression.
Question 10
Which Responsible AI concern is especially important when processing documents?
A. Protecting sensitive personal information B. Increasing monitor brightness C. Improving printer speed D. Reducing spreadsheet file size
Correct Answer
A. Protecting sensitive personal information
Explanation
Documents may contain financial, medical, legal, or personal information that must be protected appropriately.
Why the Other Answers Are Incorrect
B. Increasing monitor brightness
This is unrelated to Responsible AI.
C. Improving printer speed
This is unrelated to document intelligence.
D. Reducing spreadsheet file size
This is unrelated to AI ethics or privacy.
Final Thoughts
Extracting information from documents and forms using Azure Content Understanding and Foundry tools is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, form recognition, document analysis, APIs, authentication, Responsible AI principles, and lightweight document-processing workflows.
Azure AI services and Azure AI Foundry provide powerful tools for automating information extraction and improving efficiency across business, healthcare, finance, and administrative scenarios.
This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. This topic falls under these sections: Implement AI solutions by using Microsoft Foundry (55–60%) --> Implement AI solutions for text and speech by using Foundry --> Build a lightweight application by using Azure Speech in Foundry Tools
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Speech-enabled AI applications are becoming increasingly common in customer service, accessibility, virtual assistants, and productivity solutions. Microsoft Azure provides speech services that allow developers to add speech recognition and speech synthesis capabilities to lightweight AI applications.
For the AI-901 certification exam, candidates should understand the foundational concepts behind building lightweight speech-enabled applications using Azure Speech and Microsoft Foundry tools.
This topic falls under the “Implement AI solutions for text and speech by using Foundry” section of the AI-901 exam objectives.
What Is Azure AI Speech?
Azure AI Speech is a cloud-based AI service that enables speech-related functionality in applications.
Azure AI Speech supports:
Speech recognition
Speech synthesis
Speech translation
Voice generation
What Is a Lightweight Application?
A lightweight application is a simple application designed to perform focused tasks with minimal complexity.
Characteristics include:
Simple user interface
Fast deployment
Lower resource usage
Easy maintenance
Examples of Lightweight Speech Applications
Examples include:
Voice-enabled chatbots
Simple voice assistants
Speech-to-text applications
Text-to-speech readers
Voice-controlled support tools
Azure AI Foundry
Azure AI Foundry provides tools for building, deploying, and testing AI-powered applications.
Developers can:
Access AI services
Configure models
Test applications
Manage deployments
Speech Recognition
Speech recognition converts spoken language into text.
This process is commonly called:
Speech-to-text (STT)
Automatic speech recognition (ASR)
Example
Spoken Input
“Schedule a meeting tomorrow.”
Recognized Text
“Schedule a meeting tomorrow.”
Speech Synthesis
Speech synthesis converts written text into spoken audio.
This process is commonly called:
Text-to-speech (TTS)
Example
Text
“Your appointment is confirmed.”
Spoken Output
The application reads the text aloud.
Speech Translation
Speech translation converts spoken language from one language into another.
Example
Spoken English
“Good morning.”
Translated Spanish Audio
“Buenos días.”
Voice Generation
AI systems can generate natural-sounding voices for:
Virtual assistants
Narration
Accessibility
Customer service systems
Basic Workflow of a Speech Application
A lightweight speech application commonly follows this workflow:
Applications communicate with Azure Speech services using:
APIs
Endpoints
These allow applications to send requests and receive responses programmatically.
Authentication
Applications must securely authenticate before using Azure Speech services.
Common methods include:
API keys
Azure credentials
Managed identities
Common User Interface Components
A lightweight speech application often includes:
Microphone input button
Text display area
Playback controls
Response output area
Real-Time Processing
Many speech applications process audio in real time.
This allows conversational experiences with minimal delay.
Streaming Audio
Streaming audio enables continuous processing of speech as users speak.
Benefits include:
Faster responses
More natural interactions
Reduced waiting time
Conversation Context
Some applications preserve context across interactions.
This allows more natural conversations.
Example
User
“Who founded Microsoft?”
User Later
“When was it created?”
The system understands “it” refers to Microsoft.
System Prompts
System prompts guide AI behavior and responses.
They help define:
Tone
Personality
Response style
Safety boundaries
Example System Prompt
“You are a friendly virtual assistant.”
Responsible AI Considerations
Speech-enabled applications should follow Responsible AI principles.
Key considerations include:
Privacy
Security
Inclusiveness
Transparency
Fairness
Accountability
Privacy Concerns
Speech systems may process sensitive spoken information.
Organizations should:
Secure recordings
Protect user conversations
Minimize unnecessary data retention
Inclusiveness
Speech applications should support:
Different accents
Multiple languages
Diverse speech patterns
Accessibility needs
Transparency
Users should know:
AI is processing speech
Audio may be analyzed
AI-generated responses may contain errors
Hallucinations
Generative AI systems may occasionally generate inaccurate responses.
These inaccuracies are called hallucinations.
Applications should not assume responses are always correct.
Error Handling
Applications should handle:
Background noise
Recognition errors
Authentication failures
Network interruptions
Rate limits
Background Noise Challenges
Speech recognition accuracy may decrease in:
Loud environments
Crowded spaces
Poor microphone conditions
Rate Limits
Azure AI services may limit request frequency.
Applications should handle throttling gracefully.
Latency
Latency refers to delays between:
User speech
AI processing
Spoken responses
Low latency improves user experience.
Advantages of Speech-Enabled Applications
Benefits include:
Natural interaction
Hands-free usage
Accessibility improvements
Faster communication
Improved engagement
Limitations of Speech Applications
Challenges include:
Accent variability
Background noise
Recognition inaccuracies
Privacy concerns
Network dependency
Common Real-World Scenarios
Scenario 1: Voice Assistant
Goal
Allow users to ask spoken questions.
Features
Speech recognition
Spoken responses
Conversational interaction
Scenario 2: Accessibility Tool
Goal
Assist visually impaired users.
Features
Text-to-speech
Voice commands
Audio navigation
Scenario 3: Customer Support Bot
Goal
Provide voice-based support.
Features
Real-time speech recognition
AI-generated responses
Multilingual support
High-Level Application Workflow
A simplified workflow includes:
Capture speech
Convert speech to text
Process request
Generate response
Convert response to speech
Play audio response
Example High-Level Pseudocode
audio = capture_audio()
text = speech_to_text(audio)
response = process_request(text)
speak(response)
For AI-901, understanding the workflow is more important than memorizing exact syntax.
Important AI-901 Exam Tips
For the exam, remember these key points:
Azure AI Speech provides speech-related AI services.
Speech recognition converts speech to text.
Speech synthesis converts text to speech.
Azure AI Foundry supports AI application development.
APIs and endpoints connect applications to cloud AI services.
Authentication secures access to Azure services.
Streaming audio supports real-time interaction.
Responsible AI principles apply to speech-enabled applications.
Inclusiveness is important for diverse speech patterns and accents.
Hallucinations are inaccurate AI-generated outputs.
Quick Knowledge Check
Question 1
What does speech recognition do?
Answer
Converts spoken language into text.
Question 2
What does speech synthesis do?
Answer
Converts text into spoken audio.
Question 3
Why is authentication important?
Answer
It secures access to Azure AI services.
Question 4
Why is inclusiveness important in speech applications?
Answer
To support users with different accents, languages, and accessibility needs.
Practice Exam Questions
Question 1
What is the PRIMARY purpose of Azure AI Speech?
A. To manage virtual machines B. To provide speech-related AI capabilities such as speech recognition and speech synthesis C. To monitor network hardware D. To create relational databases
Correct Answer
B. To provide speech-related AI capabilities such as speech recognition and speech synthesis
Explanation
Azure AI Speech provides cloud-based speech services including speech-to-text and text-to-speech capabilities.
Why the Other Answers Are Incorrect
A. To manage virtual machines
Virtual machine management is unrelated to speech AI.
C. To monitor network hardware
Azure AI Speech does not monitor infrastructure devices.
D. To create relational databases
Database creation is unrelated to speech services.
Question 2
What does speech recognition do?
A. Converts speech into text B. Converts images into speech C. Detects objects in video D. Compresses audio files
Correct Answer
A. Converts speech into text
Explanation
Speech recognition, also called speech-to-text, converts spoken language into written text.
Why the Other Answers Are Incorrect
B. Converts images into speech
This is unrelated to speech recognition.
C. Detects objects in video
This is a computer vision task.
D. Compresses audio files
Speech recognition does not perform compression.
Question 3
What does speech synthesis perform?
A. Converts text into spoken audio B. Detects entities in text C. Creates spreadsheets automatically D. Increases internet bandwidth
Correct Answer
A. Converts text into spoken audio
Explanation
Speech synthesis, also called text-to-speech, generates spoken audio from written text.
Why the Other Answers Are Incorrect
B. Detects entities in text
This is a text analysis task.
C. Creates spreadsheets automatically
This is unrelated to speech services.
D. Increases internet bandwidth
Speech synthesis does not affect networking.
Question 4
Which Microsoft platform provides tools for building and managing AI applications?
A. Azure AI Foundry B. Microsoft Paint C. Windows Media Player D. Microsoft Calculator
Correct Answer
A. Azure AI Foundry
Explanation
Azure AI Foundry provides tools for building, testing, deploying, and managing AI solutions.
Why the Other Answers Are Incorrect
B. Microsoft Paint
Paint is a graphics editor.
C. Windows Media Player
This is a media playback application.
D. Microsoft Calculator
This is a utility application.
Question 5
How do lightweight applications typically communicate with Azure AI Speech services?
A. Through APIs and endpoints B. Through printer drivers only C. Through USB flash drives D. Through monitor calibration settings
Correct Answer
A. Through APIs and endpoints
Explanation
Applications use APIs and cloud endpoints to send requests and receive AI-generated responses.
Why the Other Answers Are Incorrect
B. Through printer drivers only
Printer drivers are unrelated to AI services.
C. Through USB flash drives
Cloud AI services use network communication.
D. Through monitor calibration settings
This is unrelated to APIs.
Question 6
Why is authentication important when using Azure AI Speech?
A. To secure access to AI services B. To improve microphone volume C. To increase response creativity D. To remove network latency
Correct Answer
A. To secure access to AI services
Explanation
Authentication helps ensure only authorized users and applications can access Azure AI resources.
Why the Other Answers Are Incorrect
B. To improve microphone volume
Authentication does not affect hardware settings.
C. To increase response creativity
Creativity is controlled through model parameters.
D. To remove network latency
Authentication does not control connection speed.
Question 7
What is a benefit of streaming audio in speech-enabled applications?
A. Faster and more natural interactions B. Permanent elimination of all speech errors C. Automatic hardware upgrades D. Unlimited cloud storage
Correct Answer
A. Faster and more natural interactions
Explanation
Streaming audio enables real-time processing, improving responsiveness and conversational flow.
Why the Other Answers Are Incorrect
B. Permanent elimination of all speech errors
Speech systems can still make mistakes.
C. Automatic hardware upgrades
Streaming does not upgrade hardware.
D. Unlimited cloud storage
Streaming does not affect storage capacity.
Question 8
Which Responsible AI consideration is especially important for speech-enabled applications?
A. Protecting sensitive spoken information B. Increasing screen brightness C. Improving printer speed D. Accelerating video rendering
Correct Answer
A. Protecting sensitive spoken information
Explanation
Speech applications may process personal or confidential audio, making privacy and security important concerns.
Why the Other Answers Are Incorrect
B. Increasing screen brightness
This is unrelated to Responsible AI.
C. Improving printer speed
Printers are unrelated to speech AI.
D. Accelerating video rendering
This is unrelated to speech processing.
Question 9
What challenge can negatively affect speech recognition accuracy?
A. Background noise B. Spreadsheet formatting C. Screen resolution D. Video playback speed
Correct Answer
A. Background noise
Explanation
Loud environments and poor audio quality can reduce speech recognition accuracy.
Why the Other Answers Are Incorrect
B. Spreadsheet formatting
This does not affect speech recognition.
C. Screen resolution
Speech recognition does not depend on display quality.
D. Video playback speed
This is unrelated to speech input processing.
Question 10
What is one advantage of speech-enabled AI applications?
A. Hands-free interaction B. Guaranteed perfect accuracy C. Elimination of all privacy concerns D. Removal of internet requirements
Correct Answer
A. Hands-free interaction
Explanation
Speech-enabled applications allow users to interact naturally without typing.
Why the Other Answers Are Incorrect
B. Guaranteed perfect accuracy
Speech systems can still make errors.
C. Elimination of all privacy concerns
Privacy protections are still necessary.
D. Removal of internet requirements
Cloud-based speech services generally require internet connectivity.
Final Thoughts
Building lightweight applications using Azure Speech in Foundry tools is an important AI-901 exam topic. Microsoft expects candidates to understand how speech-enabled AI applications work, including speech recognition, speech synthesis, APIs, authentication, Responsible AI considerations, and real-time conversational workflows.
Azure AI Speech and Azure AI Foundry provide powerful cloud-based tools that make it easier to create modern voice-enabled AI applications for business, accessibility, and productivity scenarios.
This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. This topic falls under these sections: Identify AI concepts and capabilities (40–45%) --> Identify AI workloads --> Identify techniques to extract information from text, images, audio, and videos
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Information extraction is one of the most valuable uses of AI and an important topic for the AI-901 certification exam. Organizations generate enormous amounts of unstructured data every day, including documents, emails, images, audio recordings, and videos. AI systems help convert this unstructured data into structured, usable information.
Microsoft expects AI-901 candidates to understand common techniques used to extract information from text, images, audio, and video content.
This topic falls under the “Identify AI workloads” section of the AI-901 exam objectives.
What Is Information Extraction?
Information extraction is the process of identifying and retrieving useful structured information from unstructured or semi-structured data.
AI systems analyze content and extract meaningful data automatically.
Examples of Information Extraction
Source
Extracted Information
Documents
Names, dates, invoice totals
Emails
Customer requests, keywords
Images
Objects, faces, text
Audio
Spoken words, speaker identity
Video
Activities, objects, movement
Structured vs. Unstructured Data
Understanding structured and unstructured data is important for this topic.
Structured Data
Unstructured Data
Tables
Emails
Databases
Images
Spreadsheets
Audio
Defined formats
Videos
Organized fields
Documents
AI techniques help transform unstructured data into structured information.
Information Extraction from Text
AI systems commonly use Natural Language Processing (NLP) to extract information from text.
Common Text Extraction Techniques
For the AI-901 exam, important techniques include:
Keyword extraction
Named Entity Recognition (NER)
Sentiment analysis
Summarization
Language detection
Text classification
Keyword Extraction
Keyword extraction identifies important words or phrases within text.
Example
Extracting phrases like:
“shipping delay”
“billing issue”
“customer satisfaction”
from support tickets.
Named Entity Recognition (NER)
NER identifies entities such as:
People
Organizations
Locations
Dates
Phone numbers
Products
Example
Input
“Microsoft will host an event in Seattle on June 15.”
Extracted Entities
Microsoft → Organization
Seattle → Location
June 15 → Date
Sentiment Analysis
Sentiment analysis identifies emotional tone within text.
Possible Results
Positive
Negative
Neutral
Example
Analyzing customer reviews to determine satisfaction levels.
Summarization
Summarization creates shorter versions of long text.
Example
Generating meeting summaries from lengthy transcripts.
Text Classification
Text classification assigns categories to text.
Example
Automatically labeling emails as:
Support
Sales
Billing
Information Extraction from Images
Computer vision techniques extract information from images.
Common Image Extraction Techniques
Important techniques include:
OCR
Image classification
Object detection
Facial recognition
Image tagging
Optical Character Recognition (OCR)
OCR extracts text from images and scanned documents.
OCR Example
Input
Scanned invoice image.
Extracted Information
Invoice number
Total amount
Vendor name
Dates
Common OCR Use Cases
Receipt scanning
Invoice processing
Document digitization
Form extraction
Image Classification
Image classification identifies the overall category of an image.
Example
Identifying whether an image contains:
A dog
A car
A building
Object Detection
Object detection identifies and locates multiple objects within images.
Example
Detecting:
Cars
Pedestrians
Traffic lights
in a street image.
Facial Recognition
Facial recognition identifies or verifies people based on facial features.
Speech AI technologies extract information from spoken audio.
Common Audio Extraction Techniques
Important techniques include:
Speech recognition
Speaker recognition
Sentiment analysis in speech
Speech translation
Speech Recognition
Speech recognition converts spoken language into text.
Also called:
Speech-to-text
Automatic Speech Recognition (ASR)
Example
Audio Input
A recorded meeting.
Extracted Information
A written transcript.
Speaker Recognition
Speaker recognition identifies or verifies speakers based on voice characteristics.
Example
Voice authentication systems.
Speech Sentiment Analysis
Some AI systems analyze vocal tone and emotion.
Example
Detecting frustration during customer service calls.
Speech Translation
Speech translation converts spoken language into another language.
Example
Real-time multilingual meeting translation.
Information Extraction from Video
Video analysis combines computer vision and audio processing techniques.
Common Video Extraction Techniques
Important techniques include:
Motion detection
Object tracking
Activity recognition
Scene analysis
Video transcription
Motion Detection
Motion detection identifies movement within video footage.
Example
Security surveillance systems detecting activity.
Object Tracking
Object tracking follows identified objects across video frames.
Example
Tracking vehicles in traffic monitoring systems.
Activity Recognition
Activity recognition identifies actions occurring in video.
Example
Detecting:
Running
Falling
Fighting
Driving
Scene Analysis
Scene analysis identifies environments or contexts in video.
Example
Recognizing:
Office scenes
Outdoor settings
Crowded areas
Video Transcription
Video transcription converts spoken content in videos into text.
Example
Generating subtitles for videos automatically.
Multimodal AI
Some AI systems combine multiple data types together.
This is called multimodal AI.
Example of Multimodal AI
A meeting assistant may process:
Audio
Video
Text chat
Shared documents
simultaneously.
Real-World Information Extraction Scenarios
Scenario 1: Invoice Processing System
Goal
Extract invoice information automatically.
Techniques Used
OCR
Entity extraction
Scenario 2: Customer Support Analysis
Goal
Analyze customer complaints.
Techniques Used
Sentiment analysis
Keyword extraction
Scenario 3: Smart Security Camera
Goal
Detect suspicious activity.
Techniques Used
Object detection
Motion detection
Facial recognition
Scenario 4: Meeting Intelligence Platform
Goal
Generate searchable meeting notes.
Techniques Used
Speech recognition
Summarization
Speaker recognition
Scenario 5: Video Streaming Platform
Goal
Generate subtitles automatically.
Techniques Used
Speech recognition
Video transcription
Azure AI Services for Information Extraction
Azure AI Services provide tools for extracting information from multiple data types.
Common services include:
Azure AI Language
Azure AI Speech
Azure AI Vision
Azure AI Document Intelligence
These services allow organizations to build AI solutions without training models from scratch.
Responsible AI Considerations
Information extraction systems should follow Responsible AI principles.
Important considerations include:
Privacy
Consent
Data security
Transparency
Bias reduction
Compliance
Sensitive personal information may be present in extracted data.
Challenges in Information Extraction
AI systems may face challenges such as:
Poor image quality
Background noise
Ambiguous language
Multiple speakers
Handwritten text
Video quality issues
Performance depends heavily on data quality.
Important AI-901 Exam Tips
For the exam, remember these key points:
NLP extracts information from text.
OCR extracts text from images.
Speech recognition converts speech into text.
Object detection identifies and locates objects in images or video.
Video analysis can detect activities and movement.
Information extraction converts unstructured data into structured information.
Multimodal AI combines multiple data types.
Azure AI services provide prebuilt information extraction capabilities.
Quick Knowledge Check
Question 1
Which technique extracts text from scanned documents?
Answer
OCR.
Question 2
What does speech recognition do?
Answer
Converts spoken language into text.
Question 3
Which technique identifies objects within images?
Answer
Object detection.
Question 4
What is multimodal AI?
Answer
AI systems that process multiple types of data together, such as text, audio, and images.
Practice Exam Questions
Question 1
Which AI technique is used to extract text from scanned documents or images?
A. Sentiment analysis B. Optical Character Recognition (OCR) C. Object detection D. Speech synthesis
Correct Answer
B. Optical Character Recognition (OCR)
Explanation
OCR extracts machine-readable text from images, scanned documents, and photographs.
Why the Other Answers Are Incorrect
A. Sentiment analysis
Sentiment analysis identifies emotional tone in text.
C. Object detection
Object detection identifies objects within images.
D. Speech synthesis
Speech synthesis converts text into spoken audio.
Question 2
A company wants to convert recorded customer support calls into written transcripts.
Which AI capability should be used?
A. Speech recognition B. Facial recognition C. Image classification D. Regression
Correct Answer
A. Speech recognition
Explanation
Speech recognition converts spoken language into written text.
Why the Other Answers Are Incorrect
B. Facial recognition
Facial recognition analyzes faces in images.
C. Image classification
Image classification categorizes images.
D. Regression
Regression predicts numeric values.
Question 3
Which AI technique identifies and locates multiple objects within an image?
A. OCR B. Object detection C. Summarization D. Clustering
Correct Answer
B. Object detection
Explanation
Object detection identifies objects and their positions within images or video frames.
Why the Other Answers Are Incorrect
A. OCR
OCR extracts text from images.
C. Summarization
Summarization condenses text.
D. Clustering
Clustering groups similar data points.
Question 4
A business wants to automatically determine whether customer reviews are positive or negative.
Which AI technique is MOST appropriate?
A. Sentiment analysis B. OCR C. Facial recognition D. Image tagging
Correct Answer
A. Sentiment analysis
Explanation
Sentiment analysis evaluates emotional tone and opinions in text.
Why the Other Answers Are Incorrect
B. OCR
OCR extracts text from images.
C. Facial recognition
Facial recognition identifies people from images.
D. Image tagging
Image tagging labels image content.
Question 5
Which AI capability is commonly used to identify names, locations, and organizations within text?
A. Named Entity Recognition (NER) B. Speech synthesis C. Object tracking D. Regression analysis
Correct Answer
A. Named Entity Recognition (NER)
Explanation
NER extracts entities such as people, organizations, dates, and locations from text.
Why the Other Answers Are Incorrect
B. Speech synthesis
Speech synthesis generates spoken audio.
C. Object tracking
Object tracking follows objects in video.
D. Regression analysis
Regression predicts numeric values.
Question 6
A smart security camera tracks moving vehicles across multiple video frames.
Which AI technique is being used?
A. Text classification B. Object tracking C. Summarization D. Speech translation
Correct Answer
B. Object tracking
Explanation
Object tracking follows identified objects as they move through video footage.
Why the Other Answers Are Incorrect
A. Text classification
Text classification categorizes written text.
C. Summarization
Summarization condenses text.
D. Speech translation
Speech translation converts spoken language between languages.
Question 7
Which term describes AI systems that process multiple data types such as text, images, and audio together?
A. Regression AI B. Multimodal AI C. Clustering AI D. Rule-based AI
Correct Answer
B. Multimodal AI
Explanation
Multimodal AI combines and processes multiple forms of data simultaneously.
Why the Other Answers Are Incorrect
A. Regression AI
Regression predicts numeric values.
C. Clustering AI
Clustering groups similar items.
D. Rule-based AI
Rule-based systems follow predefined logic rules.
Question 8
Which AI capability would MOST likely be used to generate automatic subtitles for videos?
A. Speech recognition B. Image classification C. Facial recognition D. Recommendation systems
Correct Answer
A. Speech recognition
Explanation
Speech recognition converts spoken words in videos into text subtitles.
Why the Other Answers Are Incorrect
B. Image classification
Image classification categorizes images.
C. Facial recognition
Facial recognition identifies people in images.
D. Recommendation systems
Recommendation systems suggest content or products.
Question 9
A retailer wants AI to automatically identify products such as shoes, shirts, and electronics in uploaded images.
Which AI capability should be used?
A. Object detection B. Sentiment analysis C. Speech synthesis D. Language translation
Correct Answer
A. Object detection
Explanation
Object detection identifies multiple objects within images and can locate them visually.
Why the Other Answers Are Incorrect
B. Sentiment analysis
Sentiment analysis evaluates text emotion.
C. Speech synthesis
Speech synthesis converts text into speech.
D. Language translation
Language translation converts text or speech between languages.
Question 10
What is the PRIMARY goal of information extraction AI systems?
A. Creating video games B. Converting unstructured data into useful structured information C. Compressing database files D. Replacing all human decision-making
Correct Answer
B. Converting unstructured data into useful structured information
Explanation
Information extraction systems analyze unstructured content such as text, images, audio, and video to retrieve meaningful structured data.
Why the Other Answers Are Incorrect
A. Creating video games
This is unrelated to information extraction.
C. Compressing database files
This is a storage task, not AI extraction.
D. Replacing all human decision-making
AI systems are designed to assist and augment human processes, not completely replace all decision-making.
Final Thoughts
Information extraction is one of the most practical and widely used AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems extract useful insights from text, images, audio, and videos using NLP, speech AI, computer vision, and multimodal AI technologies.
These capabilities help organizations automate workflows, analyze large volumes of data, and build intelligent applications using Azure AI services.
This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. This topic falls under these sections: Describe an analytics workload (25–30%) --> Describe considerations for real-time data analytics --> Describe the difference between Batch and Streaming data
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Understanding the difference between batch data and streaming data is fundamental for designing modern analytics solutions. These two approaches define how data is ingested, processed, and analyzed.
What Is Batch Data?
Batch data refers to data that is:
Collected over a period of time
Processed in large chunks (batches)
Handled at scheduled intervals
Key Characteristics of Batch Data
High latency (minutes, hours, or days)
Processes large volumes at once
Typically scheduled (e.g., nightly jobs)
Efficient and cost-effective
Common Use Cases
Daily sales reports
Monthly financial summaries
Historical data analysis
Data warehousing workloads
Azure Services for Batch Processing
Azure Data Factory → batch ingestion and orchestration
Azure Synapse Analytics → batch processing and analytics
What Is Streaming Data?
Streaming data refers to data that is:
Generated continuously
Processed in real time (or near real time)
Handled as individual events or small micro-batches
Key Characteristics of Streaming Data
Low latency (seconds or milliseconds)
Continuous data flow
Enables real-time insights
Often requires more complex processing
Common Use Cases
IoT sensor monitoring
Fraud detection
Live dashboards
Website activity tracking
Azure Services for Streaming
Azure Event Hubs → event ingestion
Azure Stream Analytics → real-time processing
Batch vs Streaming — Key Differences
Feature
Batch Processing
Streaming Processing
Data Flow
Periodic
Continuous
Latency
High
Low
Data Size
Large chunks
Small events
Complexity
Simpler
More complex
Cost
Lower
Higher
Use Case
Historical analysis
Real-time insights
When to Use Batch Processing
Choose batch when:
Real-time data is not required
You are working with large historical datasets
Cost efficiency is important
Processing can occur on a schedule
When to Use Streaming Processing
Choose streaming when:
You need real-time or near real-time insights
Data is generated continuously
Immediate action is required
Hybrid Approaches (Lambda / Modern Architectures)
Many modern systems use both:
Batch layer → historical analysis
Streaming layer → real-time insights
✔ Example:
Real-time dashboard + nightly aggregated reports
Why This Matters for DP-900
On the exam, you may be asked to:
Distinguish between batch and streaming scenarios
Choose the appropriate processing method
Identify Azure services for each approach
Understand trade-offs (latency, cost, complexity)
Summary — Exam-Relevant Takeaways
✔ Batch processing
Processes data in chunks
Higher latency
Lower cost
Best for historical analysis
✔ Streaming processing
Processes data continuously
Low latency
Enables real-time insights
More complex
✔ Azure services:
Batch → Azure Data Factory, Azure Synapse Analytics
What is the primary characteristic of batch data processing?
A. Continuous data flow B. Real-time processing C. Processing data in scheduled chunks D. Immediate event handling
✅ Answer: C
Explanation: Batch processing handles data in groups at scheduled intervals, not continuously.
Question 2
Which type of processing is BEST suited for real-time analytics?
A. Batch processing B. Stream processing C. Periodic processing D. Manual processing
✅ Answer: B
Explanation: Stream processing enables real-time or near real-time insights.
Question 3
Which Azure service is commonly used for streaming data ingestion?
A. Azure Data Factory B. Azure Event Hubs C. Azure Synapse Analytics D. Azure SQL Database
✅ Answer: B
Explanation: Azure Event Hubs is designed for high-throughput, real-time data ingestion.
Question 4
Which scenario is BEST suited for batch processing?
A. Monitoring live stock prices B. Detecting fraud in real time C. Generating a monthly financial report D. Tracking website clicks instantly
✅ Answer: C
Explanation: Batch processing is ideal for scheduled, periodic workloads like reports.
Question 5
What is the typical latency for streaming data processing?
A. Hours B. Days C. Seconds or milliseconds D. Weeks
✅ Answer: C
Explanation: Streaming processing provides low-latency, near real-time results.
Question 6
Which Azure service is used to process streaming data in real time?
A. Azure Blob Storage B. Azure Stream Analytics C. Azure Files D. Azure Virtual Machines
✅ Answer: B
Explanation: Azure Stream Analytics processes streaming data in real time.
Question 7
Which statement about batch processing is TRUE?
A. It processes data continuously B. It always requires real-time data sources C. It is typically more cost-effective than streaming D. It has lower latency than streaming
✅ Answer: C
Explanation: Batch processing is generally more cost-efficient than continuous streaming.
Question 8
Which scenario requires streaming processing?
A. Archiving old data B. Processing annual tax records C. Monitoring IoT sensor data in real time D. Generating quarterly reports
✅ Answer: C
Explanation: Streaming is needed for continuous, real-time data flows like IoT.
Question 9
What is a key difference between batch and streaming processing?
A. Batch uses structured data, streaming does not B. Streaming has higher latency than batch C. Batch processes data in chunks, streaming processes data continuously D. Streaming is always cheaper than batch
This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. This topic falls under these sections: Describe an analytics workload (25–30%) --> Describe common elements of large-scale analytics --> Describe options for analytical data stores
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Analytical data stores are designed to support reporting, business intelligence, and large-scale data analysis. For the DP-900 exam, you should understand the different types of analytical stores, their characteristics, and when to use each.
What Is an Analytical Data Store?
An analytical data store is optimized for:
Querying large volumes of data
Aggregations and reporting
Historical analysis
✔ Unlike transactional systems, analytical stores focus on read-heavy workloads rather than frequent updates.
Key Characteristics
Optimized for complex queries and aggregations
Stores historical data
Handles large datasets (TBs to PBs)
Typically uses denormalized schemas
Designed for high-performance reads
Main Types of Analytical Data Stores
1. Data Warehouse
Definition
A structured repository designed for relational analytical queries.
Key Features
Uses structured data
Schema-based (often star or snowflake schema)
Supports SQL queries
Azure Example
Azure Synapse Analytics
Use Cases
Business intelligence reporting
Financial analysis
Enterprise dashboards
✔ Best for: Structured data and SQL-based analytics
2. Data Lake
Definition
A storage repository for raw data in its native format.
Key Features
Supports structured, semi-structured, and unstructured data
Schema-on-read (schema applied when querying)
Highly scalable and cost-effective
Azure Example
Azure Data Lake Storage
Use Cases
Big data analytics
Machine learning
Storing raw ingestion data
✔ Best for: Flexible, large-scale data storage
3. Data Lakehouse (Conceptual)
Definition
A hybrid approach combining features of data lakes and data warehouses.
Key Features
Stores raw data like a data lake
Supports structured queries like a warehouse
Often uses open formats (e.g., Parquet, Delta)
Azure Context
Often implemented using:
Azure Data Lake Storage
Azure Synapse Analytics
✔ Best for: Unified analytics platform
4. Analytical Databases / Big Data Processing Systems
Definition
Systems designed for distributed processing of large datasets.
Azure Example
Azure Synapse Analytics
Key Features
Parallel processing
Handles massive datasets
Supports batch and interactive queries
✔ Best for: Large-scale analytics workloads
Comparison of Analytical Data Stores
Feature
Data Warehouse
Data Lake
Lakehouse
Data Type
Structured
All types
All types
Schema
Schema-on-write
Schema-on-read
Hybrid
Cost
Higher
Lower
Moderate
Flexibility
Low
High
High
Query Performance
High
Variable
High
Key Design Considerations
1. Data Structure
Structured → Data warehouse
Mixed or raw → Data lake
2. Query Requirements
Complex SQL queries → Data warehouse
Exploratory analytics → Data lake
3. Cost
Data lakes are generally more cost-effective
Warehouses provide optimized performance at higher cost
4. Scalability
All Azure analytical stores scale
Data lakes excel in massive data storage
5. Performance Needs
Warehouses → optimized for speed
Lakes → optimized for storage and flexibility
Typical Analytics Architecture
Data Ingestion
Batch or streaming
Storage
Data lake or data warehouse
Processing
Transformations and aggregations
Visualization
BI tools (e.g., Power BI)
Why This Matters for DP-900
On the exam, you may be asked to:
Identify the correct analytical store for a scenario
Compare data lakes vs data warehouses
Understand schema-on-read vs schema-on-write
Recognize Azure services used for analytics
Summary — Exam-Relevant Takeaways
✔ Analytical data stores are used for:
Reporting
Analytics
Historical data analysis
✔ Main types:
Data Warehouse → structured, high-performance queries
Data Lake → raw, flexible storage
Lakehouse → hybrid approach
✔ Key concepts:
Schema-on-write (warehouse)
Schema-on-read (lake)
✔ Azure services to know:
Azure Synapse Analytics → data warehouse & analytics
Azure Data Lake Storage → scalable data lake
✔ Exam tip: 👉 Structured + SQL analytics → Data Warehouse 👉 Raw + flexible + big data → Data Lake
What is the primary purpose of an analytical data store?
A. To process high-volume transactions B. To store temporary application data C. To support reporting and data analysis D. To manage user authentication
✅ Answer: C
Explanation: Analytical data stores are optimized for reporting, querying, and analysis, not transactions.
Question 2
Which type of data store is BEST suited for structured data and complex SQL queries?
A. Data lake B. Data warehouse C. File storage D. Key-value store
✅ Answer: B
Explanation: Data warehouses are designed for structured data and high-performance SQL queries.
Question 3
Which Azure service is commonly used as a data warehouse?
A. Azure Data Lake Storage B. Azure Synapse Analytics C. Azure Files D. Azure Table Storage
✅ Answer: B
Explanation: Azure Synapse Analytics provides data warehousing and large-scale analytics capabilities.
Question 4
What is a key characteristic of a data lake?
A. Requires predefined schema before loading data B. Stores only structured data C. Stores data in its raw format D. Optimized for transactional workloads
✅ Answer: C
Explanation: Data lakes store raw data in native formats, supporting schema-on-read.
Question 5
Which concept describes applying schema when data is read rather than when it is written?
A. Schema-on-write B. Schema-on-read C. Data normalization D. Data partitioning
✅ Answer: B
Explanation: Schema-on-read is used in data lakes, allowing flexible analysis.
Question 6
Which scenario is BEST suited for a data lake?
A. Financial reporting with strict schema B. Running complex SQL joins on structured data C. Storing raw IoT and log data for later analysis D. Processing online transactions
✅ Answer: C
Explanation: Data lakes are ideal for large volumes of raw, diverse data.
Question 7
Which analytical data store typically uses schema-on-write?
A. Data lake B. Data warehouse C. Object storage D. Key-value store
✅ Answer: B
Explanation: Data warehouses require a defined schema before data is loaded.
Question 8
Which of the following best describes a data lakehouse?
A. A transactional database system B. A file storage system only C. A hybrid of data lake and data warehouse D. A key-value storage solution
✅ Answer: C
Explanation: A lakehouse combines flexibility of data lakes with performance of warehouses.
Question 9
Which factor is MOST important when choosing between a data lake and a data warehouse?
A. Screen resolution B. Data structure and query requirements C. Programming language D. User interface design
✅ Answer: B
Explanation: The choice depends on data type (structured vs raw) and query needs.
Question 10
Which Azure service is BEST suited for storing large volumes of raw, unstructured data?
A. Azure SQL Database B. Azure Data Lake Storage C. Azure Synapse Analytics D. Azure Table Storage
✅ Answer: B
Explanation: Azure Data Lake Storage is optimized for large-scale raw data storage.
✅ Quick Exam Takeaways
✔ Analytical data stores support:
Reporting
Business intelligence
Large-scale analytics
✔ Main types:
Data Warehouse → structured, SQL, high performance
Data Lake → raw, flexible, scalable
Lakehouse → hybrid approach
✔ Key concepts:
Schema-on-write → warehouse
Schema-on-read → lake
✔ Azure services:
Azure Synapse Analytics → data warehouse / analytics
Azure Data Lake Storage → data lake
✔ Exam tip: 👉 Structured + SQL → Data Warehouse 👉 Raw + flexible → Data Lake