This post is part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub.
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
   --> Identify AI workloads
      --> Identify techniques to extract information from text, images, audio, and videos

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Information extraction is one of the most valuable uses of AI and an important topic for the AI-901 certification exam. Organizations generate enormous amounts of unstructured data every day, including documents, emails, images, audio recordings, and videos. AI systems help convert this unstructured data into structured, usable information.

Microsoft expects AI-901 candidates to understand common techniques used to extract information from text, images, audio, and video content.

This topic falls under the “Identify AI workloads” section of the AI-901 exam objectives.

What Is Information Extraction?

Information extraction is the process of identifying and retrieving useful structured information from unstructured or semi-structured data.

AI systems analyze content and extract meaningful data automatically.

Examples of Information Extraction

Source	Extracted Information
Documents	Names, dates, invoice totals
Emails	Customer requests, keywords
Images	Objects, faces, text
Audio	Spoken words, speaker identity
Video	Activities, objects, movement

Structured vs. Unstructured Data

Understanding structured and unstructured data is important for this topic.

Structured Data	Unstructured Data
Tables	Emails
Databases	Images
Spreadsheets	Audio
Defined formats	Videos
Organized fields	Documents

AI techniques help transform unstructured data into structured information.
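
As a rough illustration of that transformation, the sketch below pulls a few fields out of free-form invoice text using regular expressions. The sample text, the field names, and the `extract_invoice_fields` helper are made up for this example; production services rely on trained models rather than hand-written patterns.

```python
import re

# Hypothetical unstructured input: free-form text with no fixed fields.
SAMPLE_TEXT = (
    "Invoice INV-1042 was issued to Contoso Ltd on 2024-06-15. "
    "The total amount due is $1,250.00."
)


def extract_invoice_fields(text: str) -> dict:
    """Build a structured record from unstructured text."""
    invoice = re.search(r"\bINV-\d+\b", text)
    date = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    total = re.search(r"\$([\d,]+\.\d{2})", text)
    return {
        "invoice_number": invoice.group(0) if invoice else None,
        "date": date.group(0) if date else None,
        # Strip thousands separators so the total becomes a number.
        "total": float(total.group(1).replace(",", "")) if total else None,
    }


record = extract_invoice_fields(SAMPLE_TEXT)
print(record)
```

The result is a dictionary with named fields, which is the kind of structured output that can be stored in a database table or spreadsheet.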

Information Extraction from Text

AI systems commonly use Natural Language Processing (NLP) to extract information from text.

Common Text Extraction Techniques

For the AI-901 exam, important techniques include:

Keyword extraction
Named Entity Recognition (NER)
Sentiment analysis
Summarization
Language detection
Text classification

Keyword Extraction

Keyword extraction identifies the most important words or phrases within text. Azure AI Language calls this capability key phrase extraction.

Example

Extracting phrases like:

“shipping delay”
“billing issue”
“customer satisfaction”

from support tickets.
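
To make the idea concrete, here is a minimal sketch that counts two-word phrases across a handful of support tickets and returns the most frequent ones. The tickets, the stopword list, and the `extract_key_phrases` function are assumptions for illustration only; real keyword extraction services use far more sophisticated language models.

```python
import re
from collections import Counter

# Small, hand-picked stopword list for this example only.
STOPWORDS = {
    "a", "an", "the", "my", "is", "was", "to", "of", "and", "i", "with",
    "for", "on", "in", "have", "has", "another", "again",
}

# Hypothetical support tickets.
TICKETS = [
    "My order has a shipping delay again.",
    "I have a billing issue with my last invoice.",
    "Another shipping delay on my order.",
    "Billing issue: I was charged twice.",
]


def extract_key_phrases(texts, top_n=2):
    """Return the top_n most frequent two-word phrases without stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        # Count adjacent word pairs in which neither word is a stopword.
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [phrase for phrase, _ in counts.most_common(top_n)]


print(extract_key_phrases(TICKETS))
```

With these sample tickets, "shipping delay" and "billing issue" each appear twice, so they surface as the key phrases.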

Named Entity Recognition (NER)

NER identifies entities such as:

People
Organizations
Locations
Dates
Phone numbers
Products

Example

Input

“Microsoft will host an event in Seattle on June 15.”

Extracted Entities

Microsoft → Organization
Seattle → Location
June 15 → Date
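
The example above can be reproduced with a very simple rule-based sketch. Note that this is a stand-in, not how modern NER works: the lookup tables and the `recognize_entities` function are hypothetical, and real NER models learn to recognize entities from training data instead of fixed lists.

```python
import re

# Hypothetical lookup tables for this sketch; a real NER model can
# recognize entities that were never listed anywhere.
ORGANIZATIONS = {"Microsoft", "Contoso"}
LOCATIONS = {"Seattle", "London"}
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}\b")
WORD_PATTERN = re.compile(r"\b[A-Z][a-z]+\b")


def recognize_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    found = []  # (start offset, entity text, category)
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(0), "Date"))
    for match in WORD_PATTERN.finditer(text):
        word = match.group(0)
        if word in ORGANIZATIONS:
            found.append((match.start(), word, "Organization"))
        elif word in LOCATIONS:
            found.append((match.start(), word, "Location"))
    found.sort()  # order by position in the text
    return [(entity, category) for _, entity, category in found]


print(recognize_entities("Microsoft will host an event in Seattle on June 15."))
# [('Microsoft', 'Organization'), ('Seattle', 'Location'), ('June 15', 'Date')]
```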

Sentiment Analysis

Sentiment analysis identifies emotional tone within text.

Possible Results

Positive
Negative
Neutral

Example

Analyzing customer reviews to determine satisfaction levels.
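
A toy version of this idea scores a review by counting positive and negative words. The word lists and the `classify_sentiment` function below are assumptions made for this sketch; real sentiment models also handle negation, sarcasm, and context, which this example does not.

```python
import re

# Tiny hand-made word lists, for illustration only.
POSITIVE_WORDS = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE_WORDS = {"bad", "slow", "broken", "terrible", "late"}


def classify_sentiment(text):
    """Label text as Positive, Negative, or Neutral by counting words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "Positive"
    if negatives > positives:
        return "Negative"
    return "Neutral"


for review in [
    "Great product, and the support team was helpful.",
    "The delivery was late and the box arrived broken.",
    "The package arrived on Tuesday.",
]:
    print(classify_sentiment(review), "-", review)
```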

Summarization

Summarization condenses long text into a shorter version that keeps the key points.

Example

Generating meeting summaries from lengthy transcripts.
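
One simple way to picture summarization is the extractive approach: keep the sentences that contain the most frequently used words. The sketch below assumes that approach; the stopword list, the sample notes, and the `summarize` function are hypothetical, and this scoring tends to favor longer sentences. Some services can also write new wording rather than reuse existing sentences.

```python
import re
from collections import Counter

# Small stopword list for this example only.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "was", "is", "because", "for", "in", "on"}


def summarize(text, max_sentences=1):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    frequency = Counter(words)

    def score(sentence):
        # Stopwords are absent from the counter, so they contribute zero.
        return sum(frequency[w] for w in re.findall(r"[a-z]+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


NOTES = (
    "The team reviewed the launch plan. "
    "The launch date moved to March because testing found bugs. "
    "Lunch was pizza."
)
print(summarize(NOTES))
```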

Text Classification

Text classification assigns categories to text.

Example

Automatically labeling emails as:

Support
Sales
Billing
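
The email-labeling example can be sketched with simple keyword matching. The keyword sets and the `classify_email` function are assumptions for illustration; a real text classifier is trained on labeled examples rather than fixed keyword lists.

```python
import re

# Hypothetical keywords for each category.
CATEGORY_KEYWORDS = {
    "Support": {"error", "crash", "help", "broken", "login"},
    "Sales": {"pricing", "quote", "demo", "purchase", "upgrade"},
    "Billing": {"invoice", "refund", "charge", "charged", "payment"},
}


def classify_email(text):
    """Return the category with the most keyword matches, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: len(words & keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_email("I was charged twice, please issue a refund."))
print(classify_email("Can I get a quote and a demo for 50 seats?"))
print(classify_email("The app shows an error after login."))
```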

Information Extraction from Images

Computer vision techniques extract information from images.

Common Image Extraction Techniques

Important techniques include:

OCR
Image classification
Object detection
Facial recognition
Image tagging

Optical Character Recognition (OCR)

OCR extracts text from images and scanned documents.

OCR Example

Input

Scanned invoice image.

Extracted Information

Invoice number
Total amount
Vendor name
Dates

Common OCR Use Cases

Receipt scanning
Invoice processing
Document digitization
Form extraction

Image Classification

Image classification identifies the overall category of an image.

Example

Identifying whether an image contains:

A dog
A car
A building

Object Detection

Object detection identifies multiple objects within an image and locates each one, typically by returning a bounding box around it.

Example

Detecting:

Cars
Pedestrians
Traffic lights

in a street image.
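
Running a detector requires a trained model, but the shape of its output is easy to picture. The sketch below uses made-up results for a street image (a label, a confidence score, and a box for each object) and counts the objects that pass a confidence threshold. The `Detection` class, the sample values, and `count_objects` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float  # 0.0 to 1.0
    box: tuple  # (x, y, width, height) in pixels


# Hypothetical detector output for one street image.
DETECTIONS = [
    Detection("car", 0.94, (40, 120, 200, 90)),
    Detection("pedestrian", 0.88, (300, 100, 40, 110)),
    Detection("traffic light", 0.35, (500, 20, 20, 60)),
    Detection("car", 0.81, (260, 130, 180, 85)),
]


def count_objects(detections, min_confidence=0.5):
    """Count detected objects per label, ignoring low-confidence results."""
    counts = {}
    for detection in detections:
        if detection.confidence >= min_confidence:
            counts[detection.label] = counts.get(detection.label, 0) + 1
    return counts


print(count_objects(DETECTIONS))
# {'car': 2, 'pedestrian': 1}
```

Unlike image classification, which returns one label for the whole image, each detection here describes a single object and where it appears.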

Facial Recognition

Facial recognition identifies or verifies people based on facial features.

Example

Smartphone face unlock systems.

Image Tagging

Image tagging automatically generates descriptive labels.

Example Tags

Beach
Sunset
Ocean
Person

Information Extraction from Audio

Speech AI technologies extract information from spoken audio.

Common Audio Extraction Techniques

Important techniques include:

Speech recognition
Speaker recognition
Sentiment analysis in speech
Speech translation

Speech Recognition

Speech recognition converts spoken language into text.

Also called:

Speech-to-text
Automatic Speech Recognition (ASR)

Example

Audio Input

A recorded meeting.

Extracted Information

A written transcript.

Speaker Recognition

Speaker recognition identifies or verifies speakers based on voice characteristics.

Example

Voice authentication systems.

Speech Sentiment Analysis

Some AI systems analyze vocal characteristics, such as tone and pitch, to infer a speaker's emotion.

Example

Detecting frustration during customer service calls.

Speech Translation

Speech translation converts spoken language into another language.

Example

Real-time multilingual meeting translation.

Information Extraction from Video

Video analysis combines computer vision and audio processing techniques.

Common Video Extraction Techniques

Important techniques include:

Motion detection
Object tracking
Activity recognition
Scene analysis
Video transcription

Motion Detection

Motion detection identifies movement within video footage.

Example

Security surveillance systems detecting activity.
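
One classic way to detect motion is frame differencing: compare each pixel in the current frame with the same pixel in the previous frame and check how many changed. The sketch below assumes tiny grayscale frames stored as lists of brightness values; the frames, thresholds, and `motion_detected` function are made up for illustration.

```python
def motion_detected(previous_frame, current_frame,
                    pixel_threshold=30, min_changed_pixels=2):
    """Return True when enough pixels changed between two frames."""
    changed = 0
    for previous_row, current_row in zip(previous_frame, current_frame):
        for previous_pixel, current_pixel in zip(previous_row, current_row):
            if abs(previous_pixel - current_pixel) > pixel_threshold:
                changed += 1
    return changed >= min_changed_pixels


# Hypothetical 3x4 grayscale frames (0 = black, 255 = white).
EMPTY_ROOM = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
SOMETHING_MOVED = [[10, 10, 10, 10], [10, 200, 200, 10], [10, 10, 10, 10]]
SENSOR_NOISE = [[12, 9, 10, 11], [10, 10, 11, 10], [10, 10, 10, 10]]

print(motion_detected(EMPTY_ROOM, SOMETHING_MOVED))  # True
print(motion_detected(EMPTY_ROOM, SENSOR_NOISE))     # False
```

The pixel threshold keeps small brightness changes, such as sensor noise, from being reported as movement.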

Object Tracking

Object tracking follows identified objects across video frames.

Example

Tracking vehicles in traffic monitoring systems.
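
A simplified way to think about tracking is to match each detected object to the nearest object from the previous frame. The sketch below assumes the detector has already produced center points for each frame; the `track_objects` function, the distance limit, and the sample coordinates are hypothetical, and real trackers use more robust matching than this greedy approach.

```python
import math


def track_objects(frames, max_distance=50.0):
    """Give each object a stable ID by matching nearest center points.

    frames is a list of frames; each frame is a list of (x, y) centers.
    Returns one {track_id: (x, y)} dictionary per frame.
    """
    next_id = 0
    tracks = {}  # track id -> last known center
    history = []
    for centers in frames:
        assigned = {}
        unmatched = dict(tracks)
        for point in centers:
            best_id, best_distance = None, max_distance
            for track_id, last_point in unmatched.items():
                distance = math.dist(point, last_point)
                if distance <= best_distance:
                    best_id, best_distance = track_id, distance
            if best_id is None:
                # Nothing close enough: treat it as a new object.
                best_id = next_id
                next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = point
        tracks = assigned  # objects not seen in this frame are dropped
        history.append(assigned)
    return history


# Hypothetical vehicle centers in three consecutive frames.
FRAMES = [
    [(10, 10), (200, 50)],
    [(18, 12), (190, 55)],
    [(27, 15), (181, 61)],
]
for frame_number, positions in enumerate(track_objects(FRAMES)):
    print(frame_number, positions)
```

Each vehicle keeps the same ID from frame to frame, which is what allows a traffic system to follow its path.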

Activity Recognition

Activity recognition identifies actions occurring in video.

Example

Detecting:

Running
Falling
Fighting
Driving

Scene Analysis

Scene analysis identifies environments or contexts in video.

Example

Recognizing:

Office scenes
Outdoor settings
Crowded areas

Video Transcription

Video transcription converts spoken content in videos into text.

Example

Generating subtitles for videos automatically.

Multimodal AI

Some AI systems process several types of data, such as text, images, and audio, within a single solution.

This is called multimodal AI.

Example of Multimodal AI

A meeting assistant may process:

Audio
Video
Text chat
Shared documents

simultaneously.

Real-World Information Extraction Scenarios

Scenario 1: Invoice Processing System

Goal

Extract invoice information automatically.

Techniques Used

OCR
Entity extraction

Scenario 2: Customer Support Analysis

Goal

Analyze customer complaints.

Techniques Used

Sentiment analysis
Keyword extraction

Scenario 3: Smart Security Camera

Goal

Detect suspicious activity.

Techniques Used

Object detection
Motion detection
Facial recognition

Scenario 4: Meeting Intelligence Platform

Goal

Generate searchable meeting notes.

Techniques Used

Speech recognition
Summarization
Speaker recognition
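
To show how these pieces could fit together, the sketch below starts from a made-up transcript (the kind of output speech recognition and speaker recognition might produce) and builds a small word index so the notes become searchable. The transcript, the field names, and the `build_index` and `search` functions are assumptions for this example.

```python
import re
from collections import defaultdict

# Hypothetical output of speech recognition plus speaker recognition.
TRANSCRIPT = [
    {"speaker": "Ana", "start_seconds": 0, "text": "Let's review the budget for the launch."},
    {"speaker": "Ben", "start_seconds": 12, "text": "The launch is planned for March."},
    {"speaker": "Ana", "start_seconds": 25, "text": "I will send the budget report tomorrow."},
]


def build_index(segments):
    """Map each word to the positions of the segments that contain it."""
    index = defaultdict(list)
    for position, segment in enumerate(segments):
        for word in set(re.findall(r"[a-z']+", segment["text"].lower())):
            index[word].append(position)
    return index


def search(index, segments, term):
    """Return (speaker, start time) for every segment mentioning term."""
    return [
        (segments[i]["speaker"], segments[i]["start_seconds"])
        for i in index.get(term.lower(), [])
    ]


INDEX = build_index(TRANSCRIPT)
print(search(INDEX, TRANSCRIPT, "budget"))
# [('Ana', 0), ('Ana', 25)]
```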

Scenario 5: Video Streaming Platform

Goal

Generate subtitles automatically.

Techniques Used

Speech recognition
Video transcription

Azure AI Services for Information Extraction

Azure AI Services provide tools for extracting information from multiple data types.

Common services include:

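Azure AI Language (text analysis, such as sentiment analysis, key phrase extraction, and NER)
Azure AI Speech (speech recognition, speech translation, and speech synthesis)
Azure AI Vision (image analysis, such as OCR, tagging, and object detection)
Azure AI Document Intelligence (field extraction from forms, invoices, and receipts)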

These services allow organizations to build AI solutions without training models from scratch.

Responsible AI Considerations

Information extraction systems should follow Responsible AI principles.

Important considerations include:

Privacy
Consent
Data security
Transparency
Bias reduction
Compliance

Sensitive personal information may be present in extracted data.
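
One common safeguard is to mask personal details before extracted text is stored or shared. The sketch below is a minimal example that covers only email addresses and one US-style phone number format; the patterns and the `redact_pii` function are assumptions for illustration and would miss many real-world cases.

```python
import re

# Simplified patterns for this example only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text):
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)


print(redact_pii("Contact jane.doe@example.com or 555-123-4567."))
# Contact [EMAIL] or [PHONE].
```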

Challenges in Information Extraction

AI systems may face challenges such as:

Poor image quality
Background noise
Ambiguous language
Multiple speakers
Handwritten text
Video quality issues

Extraction accuracy depends heavily on the quality of the input data.

Important AI-901 Exam Tips

For the exam, remember these key points:

NLP extracts information from text.
OCR extracts text from images.
Speech recognition converts speech into text.
Object detection identifies and locates objects in images or video.
Video analysis can detect activities and movement.
Information extraction converts unstructured data into structured information.
Multimodal AI combines multiple data types.
Azure AI services provide prebuilt information extraction capabilities.

Quick Knowledge Check

Question 1

Which technique extracts text from scanned documents?

Answer

OCR.

Question 2

What does speech recognition do?

Answer

Converts spoken language into text.

Question 3

Which technique identifies and locates objects within images?

Answer

Object detection.

Question 4

What is multimodal AI?

Answer

AI systems that process multiple types of data together, such as text, audio, and images.

Practice Exam Questions

Question 1

Which AI technique is used to extract text from scanned documents or images?

A. Sentiment analysis
B. Optical Character Recognition (OCR)
C. Object detection
D. Speech synthesis

Correct Answer

B. Optical Character Recognition (OCR)

Explanation

OCR extracts machine-readable text from images, scanned documents, and photographs.

Why the Other Answers Are Incorrect

A. Sentiment analysis

Sentiment analysis identifies emotional tone in text.

C. Object detection

Object detection identifies objects within images.

D. Speech synthesis

Speech synthesis converts text into spoken audio.

Question 2

A company wants to convert recorded customer support calls into written transcripts.

Which AI capability should be used?

A. Speech recognition
B. Facial recognition
C. Image classification
D. Regression

Correct Answer

A. Speech recognition

Explanation

Speech recognition converts spoken language into written text.

Why the Other Answers Are Incorrect

B. Facial recognition

Facial recognition analyzes faces in images.

C. Image classification

Image classification categorizes images.

D. Regression

Regression predicts numeric values.

Question 3

Which AI technique identifies and locates multiple objects within an image?

A. OCR
B. Object detection
C. Summarization
D. Clustering

Correct Answer

B. Object detection

Explanation

Object detection identifies objects and their positions within images or video frames.

Why the Other Answers Are Incorrect

A. OCR

OCR extracts text from images.

C. Summarization

Summarization condenses text.

D. Clustering

Clustering groups similar data points.

Question 4

A business wants to automatically determine whether customer reviews are positive or negative.

Which AI technique is MOST appropriate?

A. Sentiment analysis
B. OCR
C. Facial recognition
D. Image tagging

Correct Answer

A. Sentiment analysis

Explanation

Sentiment analysis evaluates emotional tone and opinions in text.

Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Facial recognition

Facial recognition identifies people from images.

D. Image tagging

Image tagging labels image content.

Question 5

Which AI capability is commonly used to identify names, locations, and organizations within text?

A. Named Entity Recognition (NER)
B. Speech synthesis
C. Object tracking
D. Regression analysis

Correct Answer

A. Named Entity Recognition (NER)

Explanation

NER extracts entities such as people, organizations, dates, and locations from text.

Why the Other Answers Are Incorrect

B. Speech synthesis

Speech synthesis generates spoken audio.

C. Object tracking

Object tracking follows objects in video.

D. Regression analysis

Regression predicts numeric values.

Question 6

A smart security camera tracks moving vehicles across multiple video frames.

Which AI technique is being used?

A. Text classification
B. Object tracking
C. Summarization
D. Speech translation

Correct Answer

B. Object tracking

Explanation

Object tracking follows identified objects as they move through video footage.

Why the Other Answers Are Incorrect

A. Text classification

Text classification categorizes written text.

C. Summarization

Summarization condenses text.

D. Speech translation

Speech translation converts spoken language between languages.

Question 7

Which term describes AI systems that process multiple data types such as text, images, and audio together?

A. Regression AI
B. Multimodal AI
C. Clustering AI
D. Rule-based AI

Correct Answer

B. Multimodal AI

Explanation

Multimodal AI combines and processes multiple forms of data simultaneously.

Why the Other Answers Are Incorrect

A. Regression AI

Regression predicts numeric values.

C. Clustering AI

Clustering groups similar items.

D. Rule-based AI

Rule-based systems follow predefined logic rules.

Question 8

Which AI capability would MOST likely be used to generate automatic subtitles for videos?

A. Speech recognition
B. Image classification
C. Facial recognition
D. Recommendation systems

Correct Answer

A. Speech recognition

Explanation

Speech recognition converts spoken words in videos into text subtitles.

Why the Other Answers Are Incorrect

B. Image classification

Image classification categorizes images.

C. Facial recognition

Facial recognition identifies people in images.

D. Recommendation systems

Recommendation systems suggest content or products.

Question 9

A retailer wants AI to automatically identify products such as shoes, shirts, and electronics in uploaded images.

Which AI capability should be used?

A. Object detection
B. Sentiment analysis
C. Speech synthesis
D. Language translation

Correct Answer

A. Object detection

Explanation

Object detection identifies multiple objects within an image and reports where each one is located.

Why the Other Answers Are Incorrect

B. Sentiment analysis

Sentiment analysis evaluates text emotion.

C. Speech synthesis

Speech synthesis converts text into speech.

D. Language translation

Language translation converts text or speech between languages.

Question 10

What is the PRIMARY goal of information extraction AI systems?

A. Creating video games
B. Converting unstructured data into useful structured information
C. Compressing database files
D. Replacing all human decision-making

Correct Answer

B. Converting unstructured data into useful structured information

Explanation

Information extraction systems analyze unstructured content such as text, images, audio, and video to retrieve meaningful structured data.

Why the Other Answers Are Incorrect

A. Creating video games

This is unrelated to information extraction.

C. Compressing database files

This is a storage task, not AI extraction.

D. Replacing all human decision-making

AI systems are designed to assist and augment human processes, not completely replace all decision-making.

Final Thoughts

Information extraction is one of the most practical and widely used AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems extract useful insights from text, images, audio, and videos using NLP, speech AI, computer vision, and multimodal AI technologies.

These capabilities help organizations automate workflows, analyze large volumes of data, and build intelligent applications using Azure AI services.
