This post is part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub.
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
--> Identify AI workloads
--> Identify techniques to extract information from text, images, audio, and videos
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Information extraction is one of the most valuable uses of AI and an important topic for the AI-901 certification exam. Organizations generate enormous amounts of unstructured data every day, including documents, emails, images, audio recordings, and videos. AI systems help convert this unstructured data into structured, usable information.
Microsoft expects AI-901 candidates to understand common techniques used to extract information from text, images, audio, and video content.
What Is Information Extraction?
Information extraction is the process of identifying and retrieving useful structured information from unstructured or semi-structured data.
AI systems automate this process: they analyze the content and return the meaningful data it contains, such as names, dates, or amounts.
Examples of Information Extraction
| Source | Extracted Information |
|---|---|
| Documents | Names, dates, invoice totals |
| Emails | Customer requests, keywords |
| Images | Objects, faces, text |
| Audio | Spoken words, speaker identity |
| Video | Activities, objects, movement |
Structured vs. Unstructured Data
Understanding structured and unstructured data is important for this topic.
| Structured Data | Unstructured Data |
|---|---|
| Tables | Emails |
| Databases | Images |
| Spreadsheets | Audio |
| Defined formats | Videos |
| Organized fields | Documents |
AI techniques help transform unstructured data into structured information.
Information Extraction from Text
AI systems commonly use Natural Language Processing (NLP) to extract information from text.
Common Text Extraction Techniques
For the AI-901 exam, important techniques include:
- Keyword extraction
- Named Entity Recognition (NER)
- Sentiment analysis
- Summarization
- Language detection
- Text classification
Keyword Extraction
Keyword extraction identifies important words or phrases within text.
Example
Extracting phrases from support tickets, such as:
- “shipping delay”
- “billing issue”
- “customer satisfaction”
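A toy sketch of the idea, assuming keyword extraction is approximated by counting two-word phrases that contain no common filler words. The `top_phrases` helper, the stopword list, and the sample tickets are hypothetical; production services rely on trained language models rather than raw counts.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {
    "the", "a", "an", "and", "is", "my", "on", "with",
    "there", "another", "again",
}

def top_phrases(tickets, n=3):
    """Return the n most frequent two-word phrases that contain no stopwords."""
    counts = Counter()
    for ticket in tickets:
        words = re.findall(r"[a-z]+", ticket.lower())
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [phrase for phrase, _ in counts.most_common(n)]

# Hypothetical support tickets.
TICKETS = [
    "There is a shipping delay on my order",
    "Another shipping delay and a billing issue",
    "Billing issue with my invoice again",
]

print(top_phrases(TICKETS, n=2))
```

For these sample tickets, the two phrases that surface are "shipping delay" and "billing issue", each appearing twice.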
Named Entity Recognition (NER)
NER identifies entities such as:
- People
- Organizations
- Locations
- Dates
- Phone numbers
- Products
Example
Input
“Microsoft will host an event in Seattle on June 15.”
Extracted Entities
- Microsoft → Organization
- Seattle → Location
- June 15 → Date
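The example can be mimicked with a deliberately simple rule-based sketch. The lookup sets, the date pattern, and the `recognize_entities` function are hypothetical stand-ins for a trained NER model, which would generalize to names it has never seen.

```python
import re

# Tiny hypothetical lookup tables standing in for a trained model.
ORGANIZATIONS = {"Microsoft"}
LOCATIONS = {"Seattle"}
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches dates written as "<Month> <day>", for example "June 15".
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}\b")

def recognize_entities(text):
    """Return (entity text, category) pairs found in the text."""
    entities = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word in ORGANIZATIONS:
            entities.append((word, "Organization"))
        elif word in LOCATIONS:
            entities.append((word, "Location"))
    for match in DATE_PATTERN.finditer(text):
        entities.append((match.group(0), "Date"))
    return entities

sentence = "Microsoft will host an event in Seattle on June 15."
for entity, category in recognize_entities(sentence):
    print(f"{entity} -> {category}")
```

Run on the input sentence, the sketch returns the same three entities listed in the example.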
Sentiment Analysis
Sentiment analysis identifies emotional tone within text.
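Possible Results
- Positive
- Negative
- Neutral
- Mixed (some services, including Azure AI Language, report this when a document contains both positive and negative sentiment)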
Example
Analyzing customer reviews to determine satisfaction levels.
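As a rough illustration, sentiment can be approximated by counting words from positive and negative word lists. The lexicons and the `sentiment` function below are hypothetical; real sentiment models are trained on labeled text and handle negation, sarcasm, and context far better than this.

```python
import re

# Hypothetical mini-lexicons; production systems learn these signals from data.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "terrible", "rude"}

def sentiment(review: str) -> str:
    """Label a review Positive, Negative, or Neutral by counting lexicon hits."""
    words = re.findall(r"[a-z]+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

# Hypothetical customer reviews.
REVIEWS = [
    "Great product, fast delivery",
    "Terrible support and slow refunds",
    "It arrived on Tuesday",
]
for review in REVIEWS:
    print(review, "->", sentiment(review))
```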
Summarization
Summarization creates shorter versions of long text.
Example
Generating meeting summaries from lengthy transcripts.
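One simple family of approaches is extractive summarization, which keeps the most representative sentences instead of writing new ones. The sketch below assumes a crude word-frequency score; the `summarize` function and the sample transcript are hypothetical and are not how any particular service works.

```python
import re
from collections import Counter

def summarize(text: str, max_sentences: int = 1) -> str:
    """Keep the sentences whose words occur most often in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence: str) -> float:
        # Average frequency of the sentence's words across the full text.
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)

# Hypothetical meeting transcript.
TRANSCRIPT = (
    "The budget was approved. "
    "The budget covers the new budget tool. "
    "Lunch is at noon."
)
print(summarize(TRANSCRIPT))
```

Abstractive summarization, which generates new sentences, requires a language model and is not shown here.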
Text Classification
Text classification assigns categories to text.
Example
Automatically labeling emails as:
- Support
- Sales
- Billing
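A minimal sketch of email labeling, assuming each category is defined by a handful of keywords. The keyword sets, the `classify_email` function, and the "Uncategorized" fallback are hypothetical; a real classifier would be trained on labeled emails.

```python
import re

# Hypothetical keyword lists per category; a trained classifier would learn these.
CATEGORY_KEYWORDS = {
    "Support": {"error", "crash", "help", "broken", "login"},
    "Sales": {"pricing", "quote", "demo", "purchase", "upgrade"},
    "Billing": {"invoice", "refund", "charge", "charged", "payment"},
}

def classify_email(text: str) -> str:
    """Pick the category whose keywords overlap most with the email text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a hypothetical catch-all label when nothing matches.
    return best if scores[best] > 0 else "Uncategorized"

print(classify_email("I was charged twice, please refund my payment"))
print(classify_email("Can I get a quote and a demo?"))
```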
Information Extraction from Images
Computer vision techniques extract information from images.
Common Image Extraction Techniques
Important techniques include:
- Optical Character Recognition (OCR)
- Image classification
- Object detection
- Facial recognition
- Image tagging
Optical Character Recognition (OCR)
OCR extracts text from images and scanned documents.
OCR Example
Input
Scanned invoice image.
Extracted Information
- Invoice number
- Total amount
- Vendor name
- Dates
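OCR itself returns raw text; turning that text into fields such as those above is a separate step. The sketch below covers only that second step and assumes the OCR engine has already produced the text. The sample text, the field labels it contains, and the `parse_invoice` function are hypothetical.

```python
import re

# Hypothetical text as it might come back from an OCR engine for a scanned invoice.
OCR_TEXT = """\
Contoso Supplies Ltd.
Invoice No: INV-1042
Date: 2024-05-17
Total: $1,250.00
"""

def parse_invoice(ocr_text: str) -> dict:
    """Turn recognized invoice text into a dictionary of structured fields."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    number = re.search(r"Invoice No:\s*(\S+)", ocr_text)
    date = re.search(r"Date:\s*(\d{4}-\d{2}-\d{2})", ocr_text)
    total = re.search(r"Total:\s*\$([\d,]+\.\d{2})", ocr_text)
    return {
        # Assumption: the vendor name is the first non-empty line.
        "vendor_name": lines[0] if lines else None,
        "invoice_number": number.group(1) if number else None,
        "date": date.group(1) if date else None,
        "total_amount": float(total.group(1).replace(",", "")) if total else None,
    }

print(parse_invoice(OCR_TEXT))
```

Hand-written patterns like these break as soon as the layout changes, which is one reason prebuilt document models are attractive for invoices and receipts.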
Common OCR Use Cases
- Receipt scanning
- Invoice processing
- Document digitization
- Form extraction
Image Classification
Image classification identifies the overall category of an image.
Example
Identifying whether an image contains:
- A dog
- A car
- A building
Object Detection
Object detection identifies and locates multiple objects within images.
Example
Detecting the following in a street image:
- Cars
- Pedestrians
- Traffic lights
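Unlike image classification, which yields one label for the whole image, object detection yields a list of labeled regions. The sketch below shows what such output might look like and how low-confidence results are commonly filtered out. The `Detection` class, the sample values, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float  # model's confidence, from 0.0 to 1.0
    box: tuple         # (x, y, width, height) in pixels

# Hypothetical detector output for one street image.
DETECTIONS = [
    Detection("car", 0.94, (40, 210, 180, 90)),
    Detection("pedestrian", 0.88, (300, 180, 45, 120)),
    Detection("traffic light", 0.35, (510, 20, 25, 60)),
    Detection("car", 0.71, (420, 220, 160, 85)),
]

def confident_detections(detections, threshold=0.5):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= threshold]

for d in confident_detections(DETECTIONS):
    print(f"{d.label} at {d.box} ({d.confidence:.0%})")
```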
Facial Recognition
Facial recognition identifies or verifies people based on facial features.
Example
Smartphone face unlock systems.
Image Tagging
Image tagging automatically generates descriptive labels.
Example Tags
- Beach
- Sunset
- Ocean
- Person
Information Extraction from Audio
Speech AI technologies extract information from spoken audio.
Common Audio Extraction Techniques
Important techniques include:
- Speech recognition
- Speaker recognition
- Speech sentiment analysis
- Speech translation
Speech Recognition
Speech recognition converts spoken language into text.
Also called:
- Speech-to-text
- Automatic Speech Recognition (ASR)
Example
Audio Input
A recorded meeting.
Extracted Information
A written transcript.
Speaker Recognition
Speaker recognition identifies or verifies speakers based on voice characteristics.
Example
Voice authentication systems.
Speech Sentiment Analysis
Some AI systems analyze vocal tone and emotion.
Example
Detecting frustration during customer service calls.
Speech Translation
Speech translation converts speech in one language into text or speech in another language.
Example
Real-time multilingual meeting translation.
Information Extraction from Video
Video analysis combines computer vision and audio processing techniques.
Common Video Extraction Techniques
Important techniques include:
- Motion detection
- Object tracking
- Activity recognition
- Scene analysis
- Video transcription
Motion Detection
Motion detection identifies movement within video footage.
Example
Security surveillance systems detecting activity.
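One classic way to detect motion is frame differencing: compare each pixel with the same pixel in the previous frame and flag motion when enough pixels change. The sketch below assumes grayscale frames stored as nested lists; the `motion_detected` function, its thresholds, and the tiny sample frames are hypothetical.

```python
def motion_detected(previous, current, pixel_threshold=30, changed_fraction=0.05):
    """Compare two same-sized grayscale frames (lists of rows of 0-255 values)."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(previous, current):
        for prev_px, curr_px in zip(prev_row, curr_row):
            total += 1
            # Count the pixel as changed if its brightness moved noticeably.
            if abs(curr_px - prev_px) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total >= changed_fraction

# Tiny hypothetical 4x4 frames: a bright object appears in the second one.
frame_a = [[10] * 4 for _ in range(4)]
frame_b = [row[:] for row in frame_a]
frame_b[1][1] = frame_b[1][2] = 200

print(motion_detected(frame_a, frame_a))
print(motion_detected(frame_a, frame_b))
```

The two thresholds trade sensitivity against false alarms from sensor noise or lighting changes.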
Object Tracking
Object tracking follows identified objects across video frames.
Example
Tracking vehicles in traffic monitoring systems.
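A simple way to follow objects between frames is to match each known object to the new bounding box that overlaps it most, measured by intersection over union (IoU). The sketch below uses a greedy match; the `iou` and `match_tracks` functions, the track IDs, and the 0.3 threshold are hypothetical, and real trackers also handle occlusion and objects entering or leaving the scene.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_tracks(tracks, detections, min_iou=0.3):
    """Greedily assign each tracked ID to the best-overlapping new box."""
    assignments = {}
    unused = list(detections)
    for track_id, box in tracks.items():
        if not unused:
            break
        best = max(unused, key=lambda det: iou(box, det))
        if iou(box, best) >= min_iou:
            assignments[track_id] = best
            unused.remove(best)
    return assignments

# Hypothetical boxes: two vehicles each move 2 pixels to the right.
tracks = {"vehicle-1": (0, 0, 10, 10), "vehicle-2": (50, 50, 60, 60)}
next_frame = [(52, 50, 62, 60), (2, 0, 12, 10)]
print(match_tracks(tracks, next_frame))
```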
Activity Recognition
Activity recognition identifies actions occurring in video.
Example
Detecting:
- Running
- Falling
- Fighting
- Driving
Scene Analysis
Scene analysis identifies environments or contexts in video.
Example
Recognizing:
- Office scenes
- Outdoor settings
- Crowded areas
Video Transcription
Video transcription converts spoken content in videos into text.
Example
Generating subtitles for videos automatically.
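Once speech recognition has produced timed text segments, writing subtitles is mostly a formatting task. The sketch below converts segments into the widely used SubRip (SRT) subtitle format. The segment data and the `srt_timestamp` and `to_srt` helpers are hypothetical, and the speech-to-text step itself is not shown.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

# Hypothetical speech-to-text output: (start seconds, end seconds, text).
SEGMENTS = [
    (0.0, 2.5, "Welcome to the meeting."),
    (2.5, 61.25, "Let's review the agenda."),
]
print(to_srt(SEGMENTS))
```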
Multimodal AI
Some AI systems process several data types, such as text, images, and audio, in combination. This is called multimodal AI.
Example of Multimodal AI
A meeting assistant may process all of the following simultaneously:
- Audio
- Video
- Text chat
- Shared documents
Real-World Information Extraction Scenarios
Scenario 1: Invoice Processing System
Goal
Extract invoice information automatically.
Techniques Used
- OCR
- Named Entity Recognition (NER)
Scenario 2: Customer Support Analysis
Goal
Analyze customer complaints.
Techniques Used
- Sentiment analysis
- Keyword extraction
Scenario 3: Smart Security Camera
Goal
Detect suspicious activity.
Techniques Used
- Object detection
- Motion detection
- Facial recognition
Scenario 4: Meeting Intelligence Platform
Goal
Generate searchable meeting notes.
Techniques Used
- Speech recognition
- Summarization
- Speaker recognition
Scenario 5: Video Streaming Platform
Goal
Generate subtitles automatically.
Techniques Used
- Speech recognition
- Video transcription
Azure AI Services for Information Extraction
Azure AI Services provide tools for extracting information from multiple data types.
Common services include:
- Azure AI Language
- Azure AI Speech
- Azure AI Vision
- Azure AI Document Intelligence
These services allow organizations to build AI solutions without training models from scratch.
Responsible AI Considerations
Information extraction systems should follow Responsible AI principles.
Important considerations include:
- Privacy
- Consent
- Data security
- Transparency
- Bias reduction
- Compliance
Sensitive personal information may be present in extracted data.
Challenges in Information Extraction
AI systems may face challenges such as:
- Poor image quality
- Background noise
- Ambiguous language
- Multiple speakers
- Handwritten text
- Video quality issues
Performance depends heavily on data quality.
Important AI-901 Exam Tips
For the exam, remember these key points:
- NLP extracts information from text.
- OCR extracts text from images.
- Speech recognition converts speech into text.
- Object detection identifies and locates objects in images or video.
- Video analysis can detect activities and movement.
- Information extraction converts unstructured data into structured information.
- Multimodal AI combines multiple data types.
- Azure AI services provide prebuilt information extraction capabilities.
Quick Knowledge Check
Question 1
Which technique extracts text from scanned documents?
Answer
OCR.
Question 2
What does speech recognition do?
Answer
Converts spoken language into text.
Question 3
Which technique identifies and locates objects within images?
Answer
Object detection.
Question 4
What is multimodal AI?
Answer
AI systems that process multiple types of data together, such as text, audio, and images.
Practice Exam Questions
Question 1
Which AI technique is used to extract text from scanned documents or images?
A. Sentiment analysis
B. Optical Character Recognition (OCR)
C. Object detection
D. Speech synthesis
Correct Answer
B. Optical Character Recognition (OCR)
Explanation
OCR extracts machine-readable text from images, scanned documents, and photographs.
Why the Other Answers Are Incorrect
A. Sentiment analysis
Sentiment analysis identifies emotional tone in text.
C. Object detection
Object detection identifies objects within images.
D. Speech synthesis
Speech synthesis converts text into spoken audio.
Question 2
A company wants to convert recorded customer support calls into written transcripts.
Which AI capability should be used?
A. Speech recognition
B. Facial recognition
C. Image classification
D. Regression
Correct Answer
A. Speech recognition
Explanation
Speech recognition converts spoken language into written text.
Why the Other Answers Are Incorrect
B. Facial recognition
Facial recognition analyzes faces in images.
C. Image classification
Image classification categorizes images.
D. Regression
Regression predicts numeric values.
Question 3
Which AI technique identifies and locates multiple objects within an image?
A. OCR
B. Object detection
C. Summarization
D. Clustering
Correct Answer
B. Object detection
Explanation
Object detection identifies objects and their positions within images or video frames.
Why the Other Answers Are Incorrect
A. OCR
OCR extracts text from images.
C. Summarization
Summarization condenses text.
D. Clustering
Clustering groups similar data points.
Question 4
A business wants to automatically determine whether customer reviews are positive or negative.
Which AI technique is MOST appropriate?
A. Sentiment analysis
B. OCR
C. Facial recognition
D. Image tagging
Correct Answer
A. Sentiment analysis
Explanation
Sentiment analysis evaluates emotional tone and opinions in text.
Why the Other Answers Are Incorrect
B. OCR
OCR extracts text from images.
C. Facial recognition
Facial recognition identifies people from images.
D. Image tagging
Image tagging labels image content.
Question 5
Which AI capability is commonly used to identify names, locations, and organizations within text?
A. Named Entity Recognition (NER)
B. Speech synthesis
C. Object tracking
D. Regression analysis
Correct Answer
A. Named Entity Recognition (NER)
Explanation
NER extracts entities such as people, organizations, dates, and locations from text.
Why the Other Answers Are Incorrect
B. Speech synthesis
Speech synthesis generates spoken audio.
C. Object tracking
Object tracking follows objects in video.
D. Regression analysis
Regression predicts numeric values.
Question 6
A smart security camera tracks moving vehicles across multiple video frames.
Which AI technique is being used?
A. Text classification
B. Object tracking
C. Summarization
D. Speech translation
Correct Answer
B. Object tracking
Explanation
Object tracking follows identified objects as they move through video footage.
Why the Other Answers Are Incorrect
A. Text classification
Text classification categorizes written text.
C. Summarization
Summarization condenses text.
D. Speech translation
Speech translation converts spoken language between languages.
Question 7
Which term describes AI systems that process multiple data types such as text, images, and audio together?
A. Regression AI
B. Multimodal AI
C. Clustering AI
D. Rule-based AI
Correct Answer
B. Multimodal AI
Explanation
Multimodal AI combines and processes multiple forms of data simultaneously.
Why the Other Answers Are Incorrect
A. Regression AI
Regression predicts numeric values.
C. Clustering AI
Clustering groups similar items.
D. Rule-based AI
Rule-based systems follow predefined logic rules.
Question 8
Which AI capability would MOST likely be used to generate automatic subtitles for videos?
A. Speech recognition
B. Image classification
C. Facial recognition
D. Recommendation systems
Correct Answer
A. Speech recognition
Explanation
Speech recognition converts spoken words in videos into text subtitles.
Why the Other Answers Are Incorrect
B. Image classification
Image classification categorizes images.
C. Facial recognition
Facial recognition identifies people in images.
D. Recommendation systems
Recommendation systems suggest content or products.
Question 9
A retailer wants AI to automatically identify products such as shoes, shirts, and electronics in uploaded images.
Which AI capability should be used?
A. Object detection
B. Sentiment analysis
C. Speech synthesis
D. Language translation
Correct Answer
A. Object detection
Explanation
Object detection identifies multiple objects within images and can locate them visually.
Why the Other Answers Are Incorrect
B. Sentiment analysis
Sentiment analysis evaluates text emotion.
C. Speech synthesis
Speech synthesis converts text into speech.
D. Language translation
Language translation converts text or speech between languages.
Question 10
What is the PRIMARY goal of information extraction AI systems?
A. Creating video games
B. Converting unstructured data into useful structured information
C. Compressing database files
D. Replacing all human decision-making
Correct Answer
B. Converting unstructured data into useful structured information
Explanation
Information extraction systems analyze unstructured content such as text, images, audio, and video to retrieve meaningful structured data.
Why the Other Answers Are Incorrect
A. Creating video games
This is unrelated to information extraction.
C. Compressing database files
This is a storage task, not AI extraction.
D. Replacing all human decision-making
AI systems are designed to assist and augment human processes, not completely replace all decision-making.
Final Thoughts
Information extraction is one of the most practical and widely used AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems extract useful insights from text, images, audio, and videos using NLP, speech AI, computer vision, and multimodal AI technologies.
These capabilities help organizations automate workflows, analyze large volumes of data, and build intelligent applications using Azure AI services.