Identify techniques to extract information from text, images, audio, and videos (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
--> Identify AI workloads
--> Identify techniques to extract information from text, images, audio, and videos


Each section includes 10 practice questions, with answers and explanations, to help you solidify your knowledge of the material. The hub also offers two practice tests of 60 questions each, listed below the exam topics section.

Information extraction is one of the most valuable uses of AI and an important topic for the AI-901 certification exam. Organizations generate enormous amounts of unstructured data every day, including documents, emails, images, audio recordings, and videos. AI systems help convert this unstructured data into structured, usable information.

Microsoft expects AI-901 candidates to understand common techniques used to extract information from text, images, audio, and video content.

This topic falls under the “Identify AI workloads” section of the AI-901 exam objectives.


What Is Information Extraction?

Information extraction is the process of identifying and retrieving useful structured information from unstructured or semi-structured data.

AI systems analyze content and extract meaningful data automatically.


Examples of Information Extraction

Source | Extracted Information
Documents | Names, dates, invoice totals
Emails | Customer requests, keywords
Images | Objects, faces, text
Audio | Spoken words, speaker identity
Video | Activities, objects, movement

Structured vs. Unstructured Data

Understanding structured and unstructured data is important for this topic.

Structured Data | Unstructured Data
Tables | Emails
Databases | Images
Spreadsheets | Audio
Defined formats | Videos
Organized fields | Documents

AI techniques help transform unstructured data into structured information.


Information Extraction from Text

AI systems commonly use Natural Language Processing (NLP) to extract information from text.


Common Text Extraction Techniques

For the AI-901 exam, important techniques include:

  • Keyword extraction
  • Named Entity Recognition (NER)
  • Sentiment analysis
  • Summarization
  • Language detection
  • Text classification

Keyword Extraction

Keyword extraction identifies important words or phrases within text.

Example

Extracting phrases like:

  • “shipping delay”
  • “billing issue”
  • “customer satisfaction”

from support tickets.
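
The idea can be sketched in a few lines of Python. This is a toy illustration only: it counts two-word phrases that contain no stop words, which is far simpler than what a service such as Azure AI Language does. The stop-word list, the sample tickets, and the function name extract_key_phrases are all assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real systems use much larger ones.
STOP_WORDS = {
    "a", "an", "the", "is", "was", "my", "i", "to", "of", "and", "for",
    "in", "on", "with", "this", "that", "it", "there", "another",
}

def extract_key_phrases(tickets, top_n=3):
    """Return the most frequent two-word phrases that contain no stop words."""
    counts = Counter()
    for ticket in tickets:
        words = re.findall(r"[a-z]+", ticket.lower())
        # Slide over neighbouring word pairs and keep the meaningful ones.
        for first, second in zip(words, words[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[f"{first} {second}"] += 1
    return [phrase for phrase, _ in counts.most_common(top_n)]

# Hypothetical support tickets.
tickets = [
    "There was a shipping delay on my order.",
    "I have a billing issue with my invoice.",
    "Another shipping delay, and the billing issue is still open.",
]
print(extract_key_phrases(tickets, top_n=2))
```

With these sample tickets, the two phrases that occur twice ("shipping delay" and "billing issue") rank above phrases that occur once.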


Named Entity Recognition (NER)

NER identifies entities such as:

  • People
  • Organizations
  • Locations
  • Dates
  • Phone numbers
  • Products

Example

Input

“Microsoft will host an event in Seattle on June 15.”

Extracted Entities

  • Microsoft → Organization
  • Seattle → Location
  • June 15 → Date
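
To make the input and output concrete, here is a minimal sketch that reproduces the example above with a lookup table and a date pattern. It is not how a trained NER model works internally; real models learn to recognize entities they have never seen. The KNOWN_ENTITIES table and the extract_entities function are hypothetical names used only for illustration.

```python
import re

# Hypothetical lookup table; a trained NER model learns such patterns from data.
KNOWN_ENTITIES = {
    "Microsoft": "Organization",
    "Seattle": "Location",
}
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches simple dates such as "June 15".
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}\b")

def extract_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for name, category in KNOWN_ENTITIES.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.append((match.start(), name, category))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "Date"))
    # Sort by position so the output follows the sentence order.
    return [(entity, category) for _, entity, category in sorted(found)]

for entity, category in extract_entities(
    "Microsoft will host an event in Seattle on June 15."
):
    print(f"{entity} -> {category}")
```

The key point for the exam is the shape of the result: each entity comes back as a piece of text paired with a category.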

Sentiment Analysis

Sentiment analysis identifies emotional tone within text.

Possible Results

  • Positive
  • Negative
  • Neutral

Example

Analyzing customer reviews to determine satisfaction levels.
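
A very simple way to picture this is a word-list approach: count positive and negative words and compare. The sketch below assumes two tiny, made-up word lists and a hypothetical classify_sentiment function. Production services use trained models and typically return confidence scores rather than a bare label.

```python
import re

# Tiny hypothetical word lists, for illustration only.
POSITIVE_WORDS = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE_WORDS = {"bad", "slow", "broken", "terrible", "rude"}

def classify_sentiment(review):
    """Label a review Positive, Negative, or Neutral by counting known words."""
    words = re.findall(r"[a-z']+", review.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    score = positive_hits - negative_hits
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(classify_sentiment("Great product, fast delivery!"))
print(classify_sentiment("The support agent was rude and the app is broken."))
print(classify_sentiment("The package arrived on Tuesday."))
```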


Summarization

Summarization condenses long text into a shorter version that preserves the key points.

Example

Generating meeting summaries from lengthy transcripts.
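
One simple family of techniques is extractive summarization, which selects the most representative sentences instead of writing new ones. The sketch below is a naive illustration under that assumption: it scores each sentence by how frequent its words are in the whole text. The summarize function and the sample transcript are hypothetical, and real services use far more capable models.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(word_counts[w] for w in words) / max(len(words), 1)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Return the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)

# Hypothetical meeting transcript.
transcript = (
    "The team reviewed the budget. "
    "The budget needs approval before the launch. "
    "Lunch was late."
)
print(summarize(transcript, max_sentences=2))
```

Because no stop words are removed here, common words such as "the" influence the scores. That is acceptable for a sketch but is one reason real summarizers are more sophisticated.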


Text Classification

Text classification assigns categories to text.

Example

Automatically labeling emails as:

  • Support
  • Sales
  • Billing
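
As a rough sketch of the labeling step, the example below assigns the category whose keywords appear most often in an email. The keyword sets and the classify_email function are assumptions for illustration. A real classifier is normally trained on labeled examples instead of hand-written keyword rules.

```python
import re

# Hypothetical keyword sets for each category.
CATEGORY_KEYWORDS = {
    "Support": {"error", "crash", "help", "login"},
    "Sales": {"pricing", "quote", "purchase", "demo"},
    "Billing": {"invoice", "refund", "charge", "payment"},
}

def classify_email(body):
    """Return the category with the most keyword matches, or 'Uncategorized'."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    scores = {
        category: len(words & keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Uncategorized"

print(classify_email("I was charged twice, please refund the payment."))
print(classify_email("Can I get a quote and a demo?"))
```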

Information Extraction from Images

Computer vision techniques extract information from images.


Common Image Extraction Techniques

Important techniques include:

  • OCR
  • Image classification
  • Object detection
  • Facial recognition
  • Image tagging

Optical Character Recognition (OCR)

OCR extracts text from images and scanned documents.


OCR Example

Input

Scanned invoice image.

Extracted Information

  • Invoice number
  • Total amount
  • Vendor name
  • Dates
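
OCR itself only returns text. Turning that text into named fields is a second step. The sketch below assumes the OCR engine has already produced the text shown in OCR_TEXT, which is made-up sample data, and uses regular expressions to pull out the fields listed above. The parse_invoice function and its patterns are hypothetical. Services such as Azure AI Document Intelligence use trained models for this step rather than fixed patterns.

```python
import re

# Hypothetical text as an OCR engine might return it for a scanned invoice.
OCR_TEXT = """Contoso Supplies Ltd.
Invoice Number: INV-1042
Date: 2024-03-15
Total: $1,250.00
"""

FIELD_PATTERNS = {
    "invoice_number": r"Invoice Number:\s*(\S+)",
    "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total:\s*\$([\d,]+\.\d{2})",
}

def parse_invoice(ocr_text):
    """Turn raw OCR text into a dictionary of structured invoice fields."""
    lines = ocr_text.strip().splitlines()
    # Assumption: the vendor name is the first line of the document.
    fields = {"vendor": lines[0] if lines else None}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, ocr_text)
        fields[name] = match.group(1) if match else None
    if fields["total"] is not None:
        # Convert "1,250.00" into a number that can be summed or compared.
        fields["total"] = float(fields["total"].replace(",", ""))
    return fields

print(parse_invoice(OCR_TEXT))
```

The result is a structured record that could be stored in a database row, which is the unstructured-to-structured conversion this topic is about.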

Common OCR Use Cases

  • Receipt scanning
  • Invoice processing
  • Document digitization
  • Form extraction

Image Classification

Image classification identifies the overall category of an image.

Example

Identifying whether an image contains:

  • A dog
  • A car
  • A building

Object Detection

Object detection identifies and locates multiple objects within images.

Example

Detecting:

  • Cars
  • Pedestrians
  • Traffic lights

in a street image.
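
What distinguishes object detection from image classification is the shape of its output: a list of objects, each with a label, a confidence score, and a bounding box. The sketch below does not run a vision model. It only shows how such results might be represented and filtered. The Detection class, the sample values, and the 0.5 threshold are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float  # model confidence, from 0.0 to 1.0
    box: tuple         # (x, y, width, height) in pixels

def confident_detections(detections, min_confidence=0.5):
    """Discard detections the model is not sure about."""
    return [d for d in detections if d.confidence >= min_confidence]

def count_labels(detections):
    """Count how many objects of each type were found."""
    return Counter(d.label for d in detections)

# Hypothetical detector output for one street image.
street_image = [
    Detection("car", 0.94, (40, 210, 180, 90)),
    Detection("car", 0.88, (300, 220, 160, 85)),
    Detection("pedestrian", 0.76, (520, 180, 40, 110)),
    Detection("traffic light", 0.31, (610, 20, 25, 60)),
]

kept = confident_detections(street_image)
print(count_labels(kept))
```

An image classifier, by contrast, would return a single label for the whole picture and no locations.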


Facial Recognition

Facial recognition identifies or verifies people based on facial features.

Example

Smartphone face unlock systems.


Image Tagging

Image tagging automatically generates descriptive labels.

Example Tags

  • Beach
  • Sunset
  • Ocean
  • Person

Information Extraction from Audio

Speech AI technologies extract information from spoken audio.


Common Audio Extraction Techniques

Important techniques include:

  • Speech recognition
  • Speaker recognition
  • Speech sentiment analysis
  • Speech translation

Speech Recognition

Speech recognition converts spoken language into text.

Also called:

  • Speech-to-text
  • Automatic Speech Recognition (ASR)

Example

Audio Input

A recorded meeting.

Extracted Information

A written transcript.


Speaker Recognition

Speaker recognition identifies or verifies speakers based on voice characteristics.

Example

Voice authentication systems.


Speech Sentiment Analysis

Some AI systems analyze vocal tone, the transcribed words, or both to infer a speaker's emotional state.

Example

Detecting frustration during customer service calls.


Speech Translation

Speech translation converts speech in one language into text or speech in another language.

Example

Real-time multilingual meeting translation.


Information Extraction from Video

Video analysis combines computer vision and audio processing techniques.


Common Video Extraction Techniques

Important techniques include:

  • Motion detection
  • Object tracking
  • Activity recognition
  • Scene analysis
  • Video transcription

Motion Detection

Motion detection identifies movement within video footage.

Example

Security surveillance systems detecting activity.
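
One classic way to detect movement is frame differencing: compare two consecutive frames and count how many pixels changed noticeably. The sketch below assumes frames are small grids of grayscale values (0 to 255). The function name and both thresholds are illustrative choices, not values from any particular product.

```python
def motion_detected(prev_frame, next_frame, pixel_threshold=30, min_changed_pixels=2):
    """Report motion when enough pixels differ noticeably between two frames."""
    changed = sum(
        abs(before - after) > pixel_threshold
        for row_before, row_after in zip(prev_frame, next_frame)
        for before, after in zip(row_before, row_after)
    )
    return changed >= min_changed_pixels

# Hypothetical 3x3 grayscale frames.
empty_room = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
slight_noise = [[12, 9, 10], [10, 11, 10], [10, 10, 13]]
person_enters = [[10, 10, 10], [10, 200, 200], [10, 10, 10]]

print(motion_detected(empty_room, slight_noise))   # small flicker is ignored
print(motion_detected(empty_room, person_enters))  # large change counts as motion
```

The pixel threshold filters out sensor noise and small lighting changes, which is why real systems tune it to the camera and environment.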


Object Tracking

Object tracking follows identified objects across video frames.

Example

Tracking vehicles in traffic monitoring systems.
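
A simple way to picture tracking is nearest-centroid matching: each object seen in the previous frame is linked to the closest detection in the new frame, and unmatched detections receive new IDs. The sketch below assumes an object detector has already produced center points for each frame. The track_objects function and the max_distance value are hypothetical, and real trackers handle occlusion and crossing paths far more carefully.

```python
import math

def track_objects(previous, detections, max_distance=50.0):
    """Match new detections to known objects by nearest center point.

    previous:   dict mapping object ID -> (x, y) from the last frame
    detections: list of (x, y) center points in the current frame
    """
    updated = {}
    unmatched = list(detections)
    for object_id, last_position in previous.items():
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda point: math.dist(point, last_position))
        if math.dist(nearest, last_position) <= max_distance:
            updated[object_id] = nearest
            unmatched.remove(nearest)
    # Anything left over is treated as a newly appeared object.
    next_id = max(previous, default=-1) + 1
    for position in unmatched:
        updated[next_id] = position
        next_id += 1
    return updated

# Hypothetical vehicle positions in two consecutive frames.
frame_1 = {0: (10, 10), 1: (100, 100)}
frame_2_detections = [(104, 102), (14, 12), (300, 300)]
print(track_objects(frame_1, frame_2_detections))
```

Vehicles 0 and 1 keep their IDs because they moved only a few pixels, while the detection at (300, 300) is too far from either and is registered as a new vehicle.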


Activity Recognition

Activity recognition identifies actions occurring in video.

Example

Detecting:

  • Running
  • Falling
  • Fighting
  • Driving

Scene Analysis

Scene analysis identifies environments or contexts in video.

Example

Recognizing:

  • Office scenes
  • Outdoor settings
  • Crowded areas

Video Transcription

Video transcription converts spoken content in videos into text.

Example

Generating subtitles for videos automatically.
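
Once speech recognition has produced timed text segments, turning them into subtitles is mostly a formatting task. The sketch below assumes the segments are already available as (start seconds, end seconds, text) tuples, which is made-up sample data, and writes them in the widely used SubRip (.srt) layout. The function names are illustrative.

```python
def format_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp style used in .srt files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"

def to_srt(segments):
    """Build SubRip subtitle text from (start, end, text) segments."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"

# Hypothetical output of a speech recognition step.
segments = [
    (0.0, 2.5, "Welcome to the meeting."),
    (2.5, 5.0, "Let's begin."),
]
print(to_srt(segments))
```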


Multimodal AI

Some AI systems process several data types, such as text, images, and audio, within a single solution.

This is called multimodal AI.


Example of Multimodal AI

A meeting assistant may process:

  • Audio
  • Video
  • Text chat
  • Shared documents

simultaneously.


Real-World Information Extraction Scenarios


Scenario 1: Invoice Processing System

Goal

Extract invoice information automatically.

Techniques Used

  • OCR
  • Entity extraction

Scenario 2: Customer Support Analysis

Goal

Analyze customer complaints.

Techniques Used

  • Sentiment analysis
  • Keyword extraction

Scenario 3: Smart Security Camera

Goal

Detect suspicious activity.

Techniques Used

  • Object detection
  • Motion detection
  • Facial recognition

Scenario 4: Meeting Intelligence Platform

Goal

Generate searchable meeting notes.

Techniques Used

  • Speech recognition
  • Summarization
  • Speaker recognition
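
To show what "searchable" might mean in practice, the sketch below assumes speech recognition and speaker recognition have already produced a list of (speaker, text) segments. It then builds a simple word index over them. The sample segments and the build_index and search functions are hypothetical, and a production platform would normally rely on a dedicated search service.

```python
import re
from collections import defaultdict

def build_index(segments):
    """Map each word to the positions of the transcript segments that contain it."""
    index = defaultdict(set)
    for position, (_speaker, text) in enumerate(segments):
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(position)
    return index

def search(segments, index, word):
    """Return the (speaker, text) segments that mention the word."""
    return [segments[i] for i in sorted(index.get(word.lower(), ()))]

# Hypothetical transcript with speaker labels.
meeting = [
    ("Ana", "The budget is approved."),
    ("Ben", "Launch is planned for May."),
    ("Ana", "I will share the budget report."),
]
meeting_index = build_index(meeting)
print(search(meeting, meeting_index, "budget"))
```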

Scenario 5: Video Streaming Platform

Goal

Generate subtitles automatically.

Techniques Used

  • Speech recognition
  • Video transcription

Azure AI Services for Information Extraction

Azure AI Services provide tools for extracting information from multiple data types.

Common services include:

  • Azure AI Language
  • Azure AI Speech
  • Azure AI Vision
  • Azure AI Document Intelligence

These services allow organizations to build AI solutions without training models from scratch.


Responsible AI Considerations

Information extraction systems should follow Responsible AI principles.

Important considerations include:

  • Privacy
  • Consent
  • Data security
  • Transparency
  • Bias reduction
  • Compliance

Extracted data may contain sensitive personal information, such as names, faces, or voices, so it must be stored and shared with appropriate safeguards.


Challenges in Information Extraction

AI systems may face challenges such as:

  • Poor image quality
  • Background noise
  • Ambiguous language
  • Multiple speakers
  • Handwritten text
  • Video quality issues

Performance depends heavily on data quality.


Important AI-901 Exam Tips

For the exam, remember these key points:

  • NLP extracts information from text.
  • OCR extracts text from images.
  • Speech recognition converts speech into text.
  • Object detection identifies and locates objects in images or video.
  • Video analysis can detect activities and movement.
  • Information extraction converts unstructured data into structured information.
  • Multimodal AI combines multiple data types.
  • Azure AI services provide prebuilt information extraction capabilities.

Quick Knowledge Check

Question 1

Which technique extracts text from scanned documents?

Answer

OCR.


Question 2

What does speech recognition do?

Answer

Converts spoken language into text.


Question 3

Which technique identifies objects within images?

Answer

Object detection.


Question 4

What is multimodal AI?

Answer

AI systems that process multiple types of data together, such as text, audio, and images.


Practice Exam Questions

Question 1

Which AI technique is used to extract text from scanned documents or images?

A. Sentiment analysis
B. Optical Character Recognition (OCR)
C. Object detection
D. Speech synthesis


Correct Answer

B. Optical Character Recognition (OCR)


Explanation

OCR extracts machine-readable text from images, scanned documents, and photographs.


Why the Other Answers Are Incorrect

A. Sentiment analysis

Sentiment analysis identifies emotional tone in text.

C. Object detection

Object detection identifies objects within images.

D. Speech synthesis

Speech synthesis converts text into spoken audio.


Question 2

A company wants to convert recorded customer support calls into written transcripts.

Which AI capability should be used?

A. Speech recognition
B. Facial recognition
C. Image classification
D. Regression


Correct Answer

A. Speech recognition


Explanation

Speech recognition converts spoken language into written text.


Why the Other Answers Are Incorrect

B. Facial recognition

Facial recognition analyzes faces in images.

C. Image classification

Image classification categorizes images.

D. Regression

Regression predicts numeric values.


Question 3

Which AI technique identifies and locates multiple objects within an image?

A. OCR
B. Object detection
C. Summarization
D. Clustering


Correct Answer

B. Object detection


Explanation

Object detection identifies objects and their positions within images or video frames.


Why the Other Answers Are Incorrect

A. OCR

OCR extracts text from images.

C. Summarization

Summarization condenses text.

D. Clustering

Clustering groups similar data points.


Question 4

A business wants to automatically determine whether customer reviews are positive or negative.

Which AI technique is MOST appropriate?

A. Sentiment analysis
B. OCR
C. Facial recognition
D. Image tagging


Correct Answer

A. Sentiment analysis


Explanation

Sentiment analysis evaluates emotional tone and opinions in text.


Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Facial recognition

Facial recognition identifies people from images.

D. Image tagging

Image tagging labels image content.


Question 5

Which AI capability is commonly used to identify names, locations, and organizations within text?

A. Named Entity Recognition (NER)
B. Speech synthesis
C. Object tracking
D. Regression analysis


Correct Answer

A. Named Entity Recognition (NER)


Explanation

NER extracts entities such as people, organizations, dates, and locations from text.


Why the Other Answers Are Incorrect

B. Speech synthesis

Speech synthesis generates spoken audio.

C. Object tracking

Object tracking follows objects in video.

D. Regression analysis

Regression predicts numeric values.


Question 6

A smart security camera tracks moving vehicles across multiple video frames.

Which AI technique is being used?

A. Text classification
B. Object tracking
C. Summarization
D. Speech translation


Correct Answer

B. Object tracking


Explanation

Object tracking follows identified objects as they move through video footage.


Why the Other Answers Are Incorrect

A. Text classification

Text classification categorizes written text.

C. Summarization

Summarization condenses text.

D. Speech translation

Speech translation converts spoken language between languages.


Question 7

Which term describes AI systems that process multiple data types such as text, images, and audio together?

A. Regression AI
B. Multimodal AI
C. Clustering AI
D. Rule-based AI


Correct Answer

B. Multimodal AI


Explanation

Multimodal AI combines and processes multiple forms of data simultaneously.


Why the Other Answers Are Incorrect

A. Regression AI

Regression predicts numeric values.

C. Clustering AI

Clustering groups similar items.

D. Rule-based AI

Rule-based systems follow predefined logic rules.


Question 8

Which AI capability would MOST likely be used to generate automatic subtitles for videos?

A. Speech recognition
B. Image classification
C. Facial recognition
D. Recommendation systems


Correct Answer

A. Speech recognition


Explanation

Speech recognition converts spoken words in videos into text subtitles.


Why the Other Answers Are Incorrect

B. Image classification

Image classification categorizes images.

C. Facial recognition

Facial recognition identifies people in images.

D. Recommendation systems

Recommendation systems suggest content or products.


Question 9

A retailer wants AI to automatically identify products such as shoes, shirts, and electronics in uploaded images.

Which AI capability should be used?

A. Object detection
B. Sentiment analysis
C. Speech synthesis
D. Language translation


Correct Answer

A. Object detection


Explanation

Object detection identifies multiple objects within images and can locate them visually.


Why the Other Answers Are Incorrect

B. Sentiment analysis

Sentiment analysis evaluates the emotional tone of text.

C. Speech synthesis

Speech synthesis converts text into speech.

D. Language translation

Language translation converts text or speech between languages.


Question 10

What is the PRIMARY goal of information extraction AI systems?

A. Creating video games
B. Converting unstructured data into useful structured information
C. Compressing database files
D. Replacing all human decision-making


Correct Answer

B. Converting unstructured data into useful structured information


Explanation

Information extraction systems analyze unstructured content such as text, images, audio, and video to retrieve meaningful structured data.


Why the Other Answers Are Incorrect

A. Creating video games

This is unrelated to information extraction.

C. Compressing database files

This is a storage task, not AI extraction.

D. Replacing all human decision-making

AI systems are designed to assist and augment human processes, not completely replace all decision-making.


Final Thoughts

Information extraction is one of the most practical and widely used AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems extract useful insights from text, images, audio, and videos using NLP, speech AI, computer vision, and multimodal AI technologies.

These capabilities help organizations automate workflows, analyze large volumes of data, and build intelligent applications using Azure AI services.


Go to the AI-901 Exam Prep Hub main page
