Tag: Information Extraction

AI, AI-103 May 25, 2026

Extract information by using multimodal pipelines that combine OCR, layout analysis, and field extraction (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Extract content from documents
      --> Extract information by using multimodal pipelines that combine OCR, layout analysis, and field extraction

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, an important topic within the "Extract content from documents" objective is understanding how to build multimodal document-processing pipelines that combine:

OCR
Layout analysis
Field extraction
AI enrichment
Structured document understanding

Modern enterprise AI systems must process far more than plain text documents. Organizations often work with:

Scanned PDFs
Invoices
Contracts
Receipts
Forms
Medical records
Insurance claims
Multi-column reports
Handwritten documents

These files contain a mixture of:

Text
Images
Tables
Structured fields
Visual layouts
Signatures
Handwriting

Simple text extraction is often insufficient. Multimodal pipelines combine several AI capabilities to understand both the textual and visual structure of documents.

This is a major AI-103 exam topic.

What Is a Multimodal Pipeline?

A multimodal pipeline processes several forms of information from the same document, both textual and visual, rather than text alone.

Examples of modalities:

Printed text
Handwriting
Images
Layout structure
Tables
Form fields
Visual relationships

The pipeline combines multiple AI capabilities to create structured, searchable, machine-readable outputs.

Why Multimodal Extraction Matters

Enterprise documents are rarely simple text files.

Examples:

Document Type	Challenges
Invoice	Tables, totals, vendor fields
Contract	Sections, signatures, clauses
Medical Form	Handwriting, structured fields
Receipt	Irregular layouts
Bank Statement	Multi-column formatting

Without multimodal extraction:

Context may be lost
Tables become scrambled
Relationships disappear
Important fields are missed

Core Azure Services Used

Several Azure services commonly appear in multimodal extraction architectures.

Service	Purpose
Azure AI Document Intelligence	Layout analysis and field extraction
Azure AI Vision	OCR and image analysis
Azure AI Search	Search and indexing
Azure OpenAI Service	Embeddings and AI reasoning
Azure Blob Storage	Document storage
Azure Functions	Custom processing logic

Understanding OCR

What Is OCR?

OCR stands for Optical Character Recognition.

OCR extracts machine-readable text from:

Scanned documents
Images
Photos
PDFs
Screenshots
Handwritten forms

OCR is one of the foundational technologies in document AI.

OCR Workflow

			
Scanned Document
       ↓
OCR Engine
       ↓
Extracted Text

		

OCR converts visual text into searchable digital text.

OCR Capabilities

Modern OCR systems can:

Detect printed text
Detect handwriting
Identify text coordinates
Support multiple languages
Preserve reading order

Outputs may include:

Words
Lines
Bounding boxes
Confidence scores
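
To make these outputs concrete, here is a minimal sketch of how OCR words with positions and confidence scores might be grouped into lines. The `OcrWord` class, the `group_into_lines` helper, and the 5-unit tolerance are illustrative assumptions, not the output schema of any Azure service.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str
    x: float           # left edge of the bounding box
    y: float           # top edge of the bounding box
    confidence: float  # 0.0 to 1.0

def group_into_lines(words, y_tolerance=5.0):
    """Group words whose top edges are close together, then read each line left to right."""
    lines = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # Compare against the first word of the current line to decide if this word belongs to it.
        if lines and abs(lines[-1][0].y - word.y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]

# Hypothetical OCR output for the top of an invoice.
words = [
    OcrWord("Contoso", 10, 40, 0.99),
    OcrWord("Invoice", 10, 10, 0.98),
    OcrWord("INV-1023", 80, 12, 0.95),
]
print(group_into_lines(words))
```

The point of the sketch is that OCR returns positioned words, and even basic line grouping relies on the bounding boxes rather than on the text alone.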

OCR Limitations

OCR alone has limitations.

OCR may extract:

			
Invoice
Contoso
$1250

But OCR alone does not understand:

Which value is the invoice total
Which text is the vendor name
Table relationships
Document structure

This is why layout analysis and field extraction are needed.

Layout Analysis

What Is Layout Analysis?

Layout analysis identifies the structural organization of a document.

It detects:

Headers
Footers
Paragraphs
Tables
Columns
Sections
Reading order
Form structures

This helps preserve document meaning.

Why Layout Analysis Matters

Consider a multi-column report.

Without layout analysis:

Text from separate columns may become mixed together.

With layout analysis:

Columns remain separate
Reading order is preserved
Structure is maintained

This improves:

Search quality
AI reasoning
Data extraction accuracy
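
The multi-column problem can be sketched in a few lines. The example below assumes each text block is a hypothetical `(x, y, text)` tuple and that the page has two equal-width columns; real layout models infer columns instead of being told about them.

```python
def reading_order(blocks, page_width, columns=2):
    """Order (x, y, text) blocks column by column, top to bottom within each column."""
    column_width = page_width / columns

    def key(block):
        x, y, _ = block
        column = min(int(x // column_width), columns - 1)
        return (column, y)

    return [text for _, _, text in sorted(blocks, key=key)]

blocks = [
    (320, 100, "Right column, first paragraph"),
    (20, 100, "Left column, first paragraph"),
    (20, 300, "Left column, second paragraph"),
    (320, 300, "Right column, second paragraph"),
]

# Naive top-to-bottom ordering interleaves the two columns.
naive = [text for _, _, text in sorted(blocks, key=lambda b: (b[1], b[0]))]
# Column-aware ordering keeps each column together.
ordered = reading_order(blocks, page_width=600)

print(naive)
print(ordered)
```

Comparing `naive` with `ordered` shows the mixing that layout analysis is meant to prevent.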

Layout Extraction Example

Example invoice structure:

			
Invoice
 ├── Vendor Name
 ├── Invoice Number
 ├── Line Item Table
 └── Total Amount

		

Layout-aware systems preserve these relationships.

Table Extraction

Tables are common in enterprise documents.

Examples:

Financial reports
Invoices
Receipts
Medical records

Without layout analysis:

Rows and columns may become scrambled

With layout-aware extraction:

Rows remain intact
Columns remain aligned
Relationships are preserved

This is heavily tested in AI-103 scenarios.
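
As a rough illustration of why row and column positions matter, the sketch below rebuilds a grid from cells that arrive in arbitrary order. The `(row, column, text)` tuple shape and the `build_table` helper are assumptions made for this example, and the line items are made-up sample values.

```python
def build_table(cells):
    """Rebuild a row-major grid from (row, column, text) cells."""
    row_count = max(r for r, _, _ in cells) + 1
    column_count = max(c for _, c, _ in cells) + 1
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for r, c, text in cells:
        grid[r][c] = text
    return grid

# Cells deliberately out of order, as they might arrive from extraction.
cells = [
    (1, 1, "$1200"), (0, 0, "Item"), (2, 0, "Mouse"),
    (0, 1, "Price"), (1, 0, "Laptop"), (2, 1, "$50"),
]
table = build_table(cells)
for row in table:
    print(row)
```

Because every cell carries its own indices, the price stays attached to the correct item no matter what order the text was read in.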

Field Extraction

What Is Field Extraction?

Field extraction identifies specific business values within documents.

Examples:

Document	Extracted Fields
Invoice	Invoice number, total
Receipt	Merchant, purchase amount
Contract	Effective date
ID Document	Name, DOB

Structured Field Extraction

Field extraction converts unstructured documents into structured data.

Example:

			
{
  "vendor": "Contoso",
  "invoiceNumber": "INV-1023",
  "total": "$1250"
}

		

This enables:

Automation
Analytics
Workflow integration
Search indexing
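
The idea of turning labeled text into structured data can be sketched with simple patterns. This is only a stand-in: trained extraction models do not work this way, and the `FIELD_PATTERNS` table, the label wording, and the `extract_fields` helper are assumptions for illustration.

```python
import re

# Hypothetical label patterns for a simple, consistently formatted invoice.
FIELD_PATTERNS = {
    "vendor": re.compile(r"^Vendor:\s*(.+)$", re.IGNORECASE),
    "invoiceNumber": re.compile(r"^Invoice\s*(?:No\.?|Number):\s*(\S+)$", re.IGNORECASE),
    "total": re.compile(r"^Total:\s*(\$?[\d,]+(?:\.\d{2})?)$", re.IGNORECASE),
}

def extract_fields(lines):
    """Return the first match for each known field across the given text lines."""
    fields = {}
    for line in lines:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.match(line.strip())
            if match and name not in fields:
                fields[name] = match.group(1)
    return fields

lines = ["Vendor: Contoso", "Invoice Number: INV-1023", "Total: $1250"]
print(extract_fields(lines))
```

Hand-written patterns like these break as soon as layouts vary, which is one reason prebuilt and custom models are preferred for real documents.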

Azure AI Document Intelligence

Azure AI Document Intelligence is a core Azure service for:

OCR
Layout analysis
Table extraction
Field extraction
Form understanding

This service is central to the AI-103 information extraction objectives.

Prebuilt Models

Document Intelligence includes prebuilt models for common document types.

Examples:

Model	Purpose
Invoice Model	Extract invoice fields
Receipt Model	Extract receipt data
ID Document Model	Extract identity fields
Business Card Model	Extract contact information (deprecated in the v4.0 API)

Example Invoice Extraction

Input:

Invoice PDF

Output:

			
{
  "VendorName": "Contoso",
  "InvoiceDate": "2026-05-10",
  "TotalAmount": "$1250"
}

		

Custom Models

Organizations often require extraction for specialized documents.

Examples:

Insurance claims
Healthcare forms
Legal documents
Internal business forms

Custom models can be trained using labeled examples.

Multimodal Pipeline Architecture

Typical architecture:

			
Document Upload
       ↓
OCR Processing
       ↓
Layout Analysis
       ↓
Field Extraction
       ↓
AI Enrichment
       ↓
Indexing / Workflow
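
The staged architecture above can be sketched as a chain of functions, each taking the document state and returning an enriched copy. Every stage body here is a trivial placeholder invented for this example; in a real solution each stage would call a service or model.

```python
from functools import reduce

def ocr(doc):
    # Placeholder: pretend the raw payload is already text.
    return {**doc, "text": doc["raw"].strip()}

def layout(doc):
    # Placeholder: treat each text line as one layout element.
    return {**doc, "lines": doc["text"].splitlines()}

def fields(doc):
    # Placeholder: read "Label: value" pairs from the lines.
    pairs = (line.split(":", 1) for line in doc["lines"] if ":" in line)
    return {**doc, "fields": {k.strip(): v.strip() for k, v in pairs}}

def enrich(doc):
    # Placeholder classification rule, not a real classifier.
    doc_type = "Invoice" if "Total" in doc["fields"] else "Unknown"
    return {**doc, "documentType": doc_type}

PIPELINE = [ocr, layout, fields, enrich]

def run_pipeline(doc, stages=PIPELINE):
    """Pass the document through each stage in order."""
    return reduce(lambda state, stage: stage(state), stages, doc)

result = run_pipeline({"raw": "Vendor: Contoso\nTotal: $1250\n"})
print(result["documentType"], result["fields"])
```

Keeping each stage as a separate function makes it easy to swap one out or to insert a validation step between two stages.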

		

AI Enrichment After Extraction

Once structured data is extracted, additional enrichment may occur:

Entity recognition
Classification
Summarization
Embedding generation
Metadata tagging

These enrichments support:

Search
RAG
AI agents
Analytics

Combining OCR with Search Pipelines

Extracted content is commonly indexed into Azure AI Search.

This enables:

Semantic search
Hybrid search
Vector retrieval
Grounded AI responses

Embeddings and RAG

Multimodal extraction often feeds Retrieval-Augmented Generation systems.

Workflow:

			
Document
    ↓
OCR + Layout + Fields
    ↓
Chunking
    ↓
Embeddings
    ↓
Vector Index
    ↓
Grounded AI Retrieval

		

Confidence Scores

Extraction systems commonly produce confidence scores.

Example:

			
Invoice Total:
$1250
Confidence: 98%

Confidence scores help:

Validate automation
Trigger human review
Improve quality control

Human-in-the-Loop Validation

Some workflows include manual review when:

Confidence is low
Documents are ambiguous
Fields are missing
Handwriting is unclear

This is common in:

Financial systems
Healthcare
Insurance
Compliance workflows
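
A simple way to picture confidence-based review is a threshold check per field. The sketch below assumes each field arrives as a `(value, confidence)` pair and uses an arbitrary 0.80 threshold; both the shape and the threshold are illustrative choices, and real thresholds should be tuned per field and per workload.

```python
def route_fields(fields, threshold=0.80):
    """Split extracted fields into auto-accepted values and ones needing human review."""
    accepted, review = {}, {}
    for name, (value, confidence) in fields.items():
        if value is None or confidence < threshold:
            review[name] = value
        else:
            accepted[name] = value
    return accepted, review

# Hypothetical extraction result: one confident field, one uncertain, one missing.
extracted = {
    "InvoiceTotal": ("$1250", 0.98),
    "VendorName": ("Contoso", 0.62),
    "InvoiceDate": (None, 0.0),
}
accepted, review = route_fields(extracted)
print("accepted:", accepted)
print("needs review:", review)
```

Only the fields in `review` need to reach a person, which keeps manual effort focused on the uncertain or missing values.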

Security Considerations

Document pipelines may process sensitive data:

Financial records
PII
Healthcare data
Legal documents

Security measures include:

RBAC
Encryption
Managed identities
Secure storage
Access controls

Important AI-103 concept:

Extracted data must remain secure throughout the pipeline.

Performance Optimization

Optimization techniques include:

Batch processing
Incremental ingestion
Selective OCR
Parallel document processing
Caching enrichment outputs

Common AI-103 Scenarios

Scenario 1

You need to extract invoice totals and vendor names.

Solution:

Document Intelligence invoice model

Scenario 2

You need searchable scanned PDFs.

Solution:

OCR
Azure AI Search indexing

Scenario 3

You need to preserve table structures.

Solution:

Layout analysis

Scenario 4

You need extraction from specialized business forms.

Solution:

Custom Document Intelligence model

Important AI-103 Exam Tips

Know These Core Concepts

Concept	Purpose
OCR	Extract text from images
Layout Analysis	Preserve document structure
Field Extraction	Identify business values
Table Extraction	Preserve row/column relationships
Prebuilt Models	Common document extraction
Custom Models	Specialized extraction scenarios

Frequently Tested Knowledge Areas

Expect questions involving:

OCR workflows
Layout-aware extraction
Table extraction
Invoice processing
Document Intelligence models
Confidence scores
Custom extraction models
Multimodal document pipelines
RAG ingestion integration

Final Thoughts

Multimodal document pipelines are foundational to modern enterprise AI systems.

For AI-103, focus heavily on:

OCR
Layout analysis
Field extraction
Table preservation
Azure AI Document Intelligence
Prebuilt models
Custom extraction models
Search integration
RAG workflows

These technologies enable intelligent document processing, enterprise search, grounded AI, and workflow automation solutions on Azure.

Practice Exam Questions

Question 1

What is the primary purpose of OCR in a document-processing pipeline?

A. Encrypt documents
B. Convert visual text into machine-readable text
C. Generate embeddings
D. Compress PDFs

Answer

B. Convert visual text into machine-readable text

Question 2

Which Azure service is primarily used for layout analysis and field extraction?

A. Azure Monitor
B. Azure Firewall
C. Azure DNS
D. Azure AI Document Intelligence

Answer

D. Azure AI Document Intelligence

Question 3

Why is layout analysis important in document extraction?

A. It reduces storage costs
B. It preserves document structure and relationships
C. It encrypts extracted fields
D. It eliminates OCR requirements

Answer

B. It preserves document structure and relationships

Question 4

Which capability extracts specific business values such as invoice totals or dates?

A. OCR
B. Sentiment analysis
C. Field extraction
D. Vector search

Answer

C. Field extraction

Question 5

What is a major advantage of table extraction?

A. It preserves row and column relationships
B. It compresses document size
C. It replaces embeddings
D. It removes metadata

Answer

A. It preserves row and column relationships

Question 6

Which model would best extract fields from a receipt?

A. Sentiment model
B. Translation model
C. Receipt prebuilt model
D. OCR-only model

Answer

C. Receipt prebuilt model

Question 7

What is a common use case for custom extraction models?

A. Hosting virtual machines
B. Processing specialized business forms
C. Managing Azure subscriptions
D. Configuring networking

Answer

B. Processing specialized business forms

Question 8

What do confidence scores represent in document extraction systems?

A. Encryption strength
B. Estimated reliability of extracted data
C. Search ranking scores
D. Vector dimensions

Answer

B. Estimated reliability of extracted data

Question 9

Which Azure service commonly stores searchable extracted content?

A. Azure Load Balancer
B. Azure Backup
C. Azure Policy
D. Azure AI Search

Answer

D. Azure AI Search

Question 10

What is the benefit of combining OCR, layout analysis, and field extraction?

A. It eliminates the need for indexing
B. It enables richer and more accurate document understanding
C. It replaces vector search entirely
D. It only works for structured databases

Answer

B. It enables richer and more accurate document understanding

AI, AI-103, Microsoft Certification May 25, 2026

Produce clean, grounded representations to use with agents and RAG by using Content Understanding (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Extract content from documents
      --> Produce clean, grounded representations to use with agents and RAG by using Content Understanding

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, an important topic within the "Extract content from documents" objective is understanding how to create clean, grounded representations of enterprise content for use with:

AI agents
Retrieval-Augmented Generation (RAG)
Enterprise search
Knowledge mining
Intelligent copilots

Modern AI systems require more than simple text extraction. Raw document data is often:

Noisy
Unstructured
Incomplete
Difficult for LLMs to interpret
Poorly suited for retrieval pipelines

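Content Understanding focuses on transforming raw enterprise content into structured, meaningful, semantically rich representations that AI systems can reliably retrieve and reason over. In the exam objective, the name also refers to the Azure AI Content Understanding service, which performs this kind of transformation through analyzers.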

This is a foundational concept for enterprise AI architectures on Azure.

What Is Content Understanding?

Content Understanding refers to the process of:

Extracting
Structuring
Enriching
Normalizing
Organizing

information from documents and multimodal content so it can be effectively used by AI systems.

The goal is to produce:

Clean data
Structured representations
Semantic meaning
Grounded retrieval content

This improves:

AI accuracy
Retrieval quality
Grounding reliability
Agent reasoning

Why Content Understanding Matters

Large Language Models (LLMs) are powerful, but raw enterprise data is often problematic.

Examples of issues:

OCR noise
Poor formatting
Mixed layouts
Duplicate text
Unstructured fields
Broken tables
Missing metadata

Without content understanding:

Retrieval quality suffers
AI hallucinations increase
Agents misinterpret data
Search relevance decreases

Goal of Content Understanding

The objective is to transform raw content like this:

			
INV 1032
CNTSO LTD
T0TAL 1,250

into structured, grounded representations like this:

			
{
  "documentType": "Invoice",
  "vendor": "Contoso Ltd",
  "invoiceNumber": "1032",
  "totalAmount": "$1250"
}

		

This structured representation is much more useful for:

RAG
AI agents
Search
Workflow automation

Core Azure Services Used

Several Azure services commonly appear in content understanding pipelines.

Service	Purpose
Azure AI Document Intelligence	OCR, layout analysis, field extraction
Azure AI Search	Search indexing and retrieval
Azure OpenAI Service	Embeddings and grounded generation
Azure AI Vision	OCR and image understanding
Azure AI Language	Entity extraction and NLP enrichment
Azure Blob Storage	Source content storage
Azure AI Foundry	AI orchestration and agent development

Content Understanding Pipeline

A typical pipeline looks like this:

			
Raw Documents
      ↓
OCR Extraction
      ↓
Layout Analysis
      ↓
Field Extraction
      ↓
Normalization
      ↓
Metadata Enrichment
      ↓
Chunking
      ↓
Embeddings
      ↓
Search Index / RAG

		

Step 1: OCR Extraction

What Is OCR?

OCR (Optical Character Recognition) converts visual text into machine-readable text.

Common document sources:

Scanned PDFs
Images
Receipts
Contracts
Forms
Screenshots

OCR is foundational for content understanding.

OCR Challenges

OCR output is not always clean.

Problems may include:

Misspelled words
Broken formatting
Incorrect characters
Missing spacing
Reading-order issues

Example:

TOTAI:

instead of:

TOTAL:

Content understanding pipelines help correct and normalize these issues.
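
One simple correction technique is fuzzy matching against a list of labels you expect to see. The sketch below uses Python's standard `difflib` module; the `KNOWN_LABELS` list and the 0.75 cutoff are assumptions chosen for this example, and this is only one of several possible cleanup approaches.

```python
import difflib

# Hypothetical vocabulary of labels expected on an invoice.
KNOWN_LABELS = ["TOTAL", "INVOICE", "VENDOR", "DATE"]

def correct_label(token, cutoff=0.75):
    """Replace a token with the closest known label, or return it unchanged."""
    matches = difflib.get_close_matches(token.upper(), KNOWN_LABELS, n=1, cutoff=cutoff)
    return matches[0] if matches else token

for raw in ["TOTAI", "T0TAL", "Contoso"]:
    print(raw, "->", correct_label(raw))
```

Values that are not close to any known label, such as a vendor name, pass through unchanged, which keeps the correction from overwriting real data.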

Step 2: Layout Analysis

Why Layout Matters

Documents contain visual structure:

Headers
Sections
Tables
Columns
Forms
Labels

Simple text extraction often destroys this structure.

Layout-Aware Processing

Layout analysis preserves:

Reading order
Relationships
Table alignment
Section hierarchy

Example:

			
Invoice
 ├── Vendor
 ├── Date
 ├── Line Items
 └── Total

		

This structural understanding improves downstream AI reasoning.

Step 3: Field Extraction

Field extraction identifies business-relevant information.

Examples:

Document Type	Fields
Invoice	Invoice number, total
Receipt	Merchant, amount
Contract	Effective date
Insurance Form	Policy number

Structured field extraction is heavily tested in AI-103.

Prebuilt Models

Azure AI Document Intelligence provides prebuilt models for:

Invoices
Receipts
IDs
Business cards
Contracts

These models simplify extraction workflows.

Step 4: Normalization

What Is Normalization?

Normalization standardizes extracted data.

Examples:

Raw Value	Normalized Value
5/10/26	2026-05-10
USD 1,250	1250.00
Contso	Contoso

Normalization improves:

Search consistency
Analytics
Retrieval quality
Agent reliability
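
The three rows in the table above can be reproduced with a small sketch. It assumes dates are in US month/day/year order, that amounts need no currency conversion, and that vendor corrections come from a hypothetical alias table; all three are assumptions for illustration.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical alias table mapping known misspellings to canonical vendor names.
VENDOR_ALIASES = {"contso": "Contoso"}

def normalize_date(raw, fmt="%m/%d/%y"):
    """Convert a date string to ISO 8601, assuming US month/day/year order."""
    return datetime.strptime(raw, fmt).date().isoformat()

def normalize_amount(raw):
    """Strip currency codes and thousands separators, and keep two decimal places."""
    digits = "".join(ch for ch in raw if ch.isdigit() or ch == ".")
    return f"{Decimal(digits):.2f}"

def normalize_vendor(raw):
    """Map a known misspelling to its canonical name, otherwise keep the input."""
    return VENDOR_ALIASES.get(raw.strip().lower(), raw.strip())

print(normalize_date("5/10/26"))
print(normalize_amount("USD 1,250"))
print(normalize_vendor("Contso"))
```

Note that a value like 5/10/26 is ambiguous without knowing the source locale, so the date format should be an explicit pipeline setting rather than a guess.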

Step 5: Metadata Enrichment

Metadata adds semantic meaning to extracted content.

Examples:

Document type
Department
Region
Classification
Language
Entities
Topics

Example:

			
{
  "department": "Finance",
  "documentType": "Invoice",
  "region": "US"
}

		

Metadata improves:

Filtering
Security trimming
Semantic retrieval
Agent routing
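
Filtering is the easiest of these benefits to illustrate. The sketch below assumes each document carries a `metadata` dictionary like the one shown above; the `filter_by_metadata` helper and the sample records are hypothetical.

```python
def filter_by_metadata(documents, **criteria):
    """Keep only documents whose metadata matches every given key/value pair."""
    return [
        doc for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in criteria.items())
    ]

documents = [
    {"id": "doc-1", "metadata": {"department": "Finance", "documentType": "Invoice", "region": "US"}},
    {"id": "doc-2", "metadata": {"department": "Legal", "documentType": "Contract", "region": "US"}},
    {"id": "doc-3", "metadata": {"department": "Finance", "documentType": "Invoice", "region": "EU"}},
]
matches = filter_by_metadata(documents, department="Finance", region="US")
print([doc["id"] for doc in matches])
```

Narrowing the candidate set with metadata before semantic retrieval tends to reduce irrelevant results, because the ranking step only sees documents that already fit the request.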

Step 6: Chunking

Why Chunking Matters

Large documents often exceed the context window (token limit) of an LLM or embedding model.

Chunking splits documents into manageable pieces.

Good chunking:

Preserves context
Improves embeddings
Enhances retrieval precision

Chunking Strategies

Fixed-Length Chunking

Example:

500-token chunks

Semantic Chunking

Split by:

Headings
Sections
Topics

Overlapping Chunks

Preserve context continuity.
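
Fixed-length chunking with overlap can be sketched as a sliding window. The example uses simple string tokens and small sizes so the overlap is easy to see; a real pipeline would count tokens with the tokenizer of its embedding model, and the `chunk_tokens` helper is an illustrative assumption.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks

tokens = [f"t{i}" for i in range(12)]
chunks = chunk_tokens(tokens, size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last two tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.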

Step 7: Embeddings

What Are Embeddings?

Embeddings are numerical vector representations of content.

Embeddings allow:

Semantic similarity search
Vector retrieval
Grounded RAG retrieval

Embeddings are commonly generated with embedding models available through:

Azure OpenAI Service
Azure AI Foundry models

Vector Retrieval

After embeddings are generated:

Vectors are stored in indexes
User queries are vectorized
Similar content is retrieved

This supports:

RAG
AI agents
Semantic search
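
The retrieval step can be sketched with cosine similarity over a tiny in-memory index. The three-dimensional vectors below are hand-made for illustration and do not come from any embedding model; a vector index in a search service performs the same comparison at scale with approximate nearest-neighbor search.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vector, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = sorted(index.items(), key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical index: document id -> made-up embedding.
index = {
    "invoice-1023": [0.9, 0.1, 0.0],
    "contract-7": [0.1, 0.9, 0.2],
    "receipt-55": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index))
```

The query vector is produced by the same embedding model as the stored vectors; comparing vectors from different models gives meaningless scores.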

Grounded Representations

What Does “Grounded” Mean?

Grounded representations are:

Accurate
Structured
Relevant
Contextual
Linked to trusted sources

Grounding reduces hallucinations by ensuring the AI uses verified enterprise content.

Content Understanding for Agents

AI agents rely heavily on:

Structured retrieval
Metadata
Semantic context
Actionable content

Poor-quality extracted data causes:

Incorrect reasoning
Failed workflows
Hallucinated responses

Content understanding improves agent reliability.

Example Agent Workflow

			
User Request
      ↓
Retrieve Structured Knowledge
      ↓
Ground Prompt
      ↓
Agent Reasoning
      ↓
Workflow Execution

		

Content Understanding and RAG

Content understanding dramatically improves Retrieval-Augmented Generation systems.

Without content understanding:

Retrieval becomes noisy
Context quality suffers
Irrelevant chunks appear

With content understanding:

Retrieval precision improves
Prompts become cleaner
Responses become more accurate

Semantic Enrichment

Additional enrichment may include:

Entity recognition
Key phrase extraction
Classification
Sentiment analysis
Summarization

These enrichments create richer representations for retrieval systems.

Search Integration

Processed content is often indexed into Azure AI Search.

This enables:

Semantic search
Hybrid search
Vector search
Metadata filtering

Security Considerations

Enterprise content pipelines often process:

Financial records
Healthcare information
Legal documents
Sensitive business data

Security measures include:

RBAC
Encryption
Managed identities
Document-level permissions

Important exam concept:

Retrieval systems should return only authorized content.
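
Security trimming can be pictured as a filter applied to retrieval results before they reach the model. The sketch assumes each indexed item stores a hypothetical `allowed_groups` list and that the caller's group memberships are already known; in practice this filter belongs in the search query itself rather than in application code after retrieval.

```python
def trim_results(results, user_groups):
    """Keep only results that share at least one group with the user."""
    allowed = set(user_groups)
    return [r for r in results if allowed & set(r["allowed_groups"])]

results = [
    {"id": "invoice-1023", "allowed_groups": ["finance"]},
    {"id": "contract-7", "allowed_groups": ["legal"]},
    {"id": "handbook", "allowed_groups": ["finance", "legal", "all-staff"]},
]
visible = trim_results(results, user_groups=["finance"])
print([r["id"] for r in visible])
```

Trimming before prompt construction matters because anything placed in the prompt can surface in the generated answer.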

Human-in-the-Loop Validation

Some workflows include manual review when:

OCR confidence is low
Fields are ambiguous
Documents are poorly scanned
Compliance validation is required

This is common in:

Finance
Insurance
Healthcare
Legal systems

Common AI-103 Scenarios

Scenario 1

You need AI agents to answer questions from invoices.

Solution:

OCR
Layout extraction
Field extraction
Structured grounding

Scenario 2

You need better RAG retrieval quality.

Solution:

Semantic chunking
Metadata enrichment
Clean representations

Scenario 3

You need enterprise search over scanned documents.

Solution:

OCR
Azure AI Search
Embeddings

Scenario 4

You need structured extraction from forms.

Solution:

Azure AI Document Intelligence
Prebuilt or custom models

Important AI-103 Exam Tips

Know These Core Concepts

Concept	Purpose
OCR	Extract text from images
Layout Analysis	Preserve document structure
Field Extraction	Extract business values
Normalization	Standardize extracted data
Embeddings	Semantic vector representations
Grounding	Provide trusted AI context
Metadata Enrichment	Add semantic meaning

Frequently Tested Knowledge Areas

Expect questions involving:

OCR workflows
Layout-aware extraction
Document Intelligence models
Metadata enrichment
Chunking strategies
Embedding generation
Vector retrieval
RAG grounding
AI agent retrieval pipelines

Final Thoughts

Content Understanding is foundational for enterprise AI systems built on Azure.

For AI-103, focus heavily on:

OCR
Layout analysis
Field extraction
Metadata enrichment
Normalization
Chunking
Embeddings
Grounded retrieval
RAG architectures
Agent-ready structured representations

These capabilities enable intelligent search, reliable AI agents, and grounded generative AI applications.

Practice Exam Questions

Question 1

What is the primary purpose of Content Understanding in AI pipelines?

A. Encrypt documents
B. Create structured, meaningful representations from raw content
C. Replace embeddings entirely
D. Eliminate OCR requirements

Answer

B. Create structured, meaningful representations from raw content

Question 2

Which Azure service is primarily used for layout analysis and field extraction?

A. Azure Monitor
B. Azure DNS
C. Azure AI Document Intelligence
D. Azure Firewall

Answer

C. Azure AI Document Intelligence

Question 3

Why is normalization important in document pipelines?

A. It increases storage consumption
B. It removes vector embeddings
C. It replaces OCR processing
D. It standardizes extracted values for consistency

Answer

D. It standardizes extracted values for consistency

Question 4

What is the purpose of embeddings in RAG systems?

A. Compress images
B. Encrypt metadata
C. Represent content numerically for semantic retrieval
D. Replace search indexes

Answer

C. Represent content numerically for semantic retrieval

Question 5

Which capability preserves document structure such as tables and reading order?

A. Sentiment analysis
B. Layout analysis
C. Tokenization
D. Compression

Answer

B. Layout analysis

Question 6

What is grounding in a generative AI solution?

A. Providing trusted contextual information to the AI model
B. Removing duplicate documents
C. Encrypting vector indexes
D. Reducing token counts

Answer

A. Providing trusted contextual information to the AI model

Question 7

Which Azure service commonly stores searchable vector indexes?

A. Azure AI Search
B. Azure Backup
C. Azure Policy
D. Azure DevTest Labs

Answer

A. Azure AI Search

Question 8

Why is chunking important in RAG pipelines?

A. It reduces OCR quality
B. It splits documents into manageable retrieval units
C. It encrypts document metadata
D. It removes structured fields

Answer

B. It splits documents into manageable retrieval units

Question 9

Which process identifies business values such as invoice totals or policy numbers?

A. OCR
B. Translation
C. Semantic ranking
D. Field extraction

Answer

D. Field extraction

Question 10

What is a major benefit of clean, grounded representations for AI agents?

A. Reduced storage costs only
B. Improved reasoning and retrieval accuracy
C. Elimination of embeddings
D. Removal of metadata requirements

Answer

B. Improved reasoning and retrieval accuracy

AI, AI-103, Microsoft Certification May 25, 2026

Implement analyzers for generating structured or markdown outputs for downstream reasoning by using Content Understanding (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Extract content from documents
      --> Implement analyzers for generating structured or markdown outputs for downstream reasoning by using Content Understanding

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, an important topic within the "Extract content from documents" objective is understanding how to implement analyzers that generate:

Structured outputs
Markdown outputs
Semantically organized representations

for use in:

AI agents
Retrieval-Augmented Generation (RAG)
Search systems
Downstream reasoning pipelines
Enterprise copilots

Modern AI systems require more than raw OCR text. Enterprise content must be transformed into representations that:

Preserve meaning
Retain structure
Improve retrieval quality
Support reasoning by LLMs
Enable grounded AI responses

This is where Content Understanding analyzers become critical.

What Is Content Understanding?

Content Understanding refers to transforming raw enterprise content into:

Structured
Semantically meaningful
AI-friendly representations

This process often includes:

OCR
Layout analysis
Field extraction
Metadata enrichment
Content normalization
Output formatting

The goal is to prepare information for:

Retrieval
Search
Grounding
Agent reasoning

Why Output Formatting Matters

Raw extracted text is often messy and difficult for AI systems to reason over.

Example raw OCR output:

Invoice 1023 contoso ltd total 1250 due june 1

This lacks:

Structure
Readability
Semantic organization
Field relationships

Structured or Markdown outputs improve downstream AI performance significantly.

What Are Analyzers?

In a Content Understanding pipeline, analyzers are the configurable processing components that:

Interpret extracted content
Organize information
Generate structured representations
Produce AI-friendly outputs

Analyzers help transform content into:

JSON
Markdown
Structured objects
Semantic chunks
Hierarchical content

Why Structured Outputs Matter

Structured outputs improve:

Retrieval precision
Prompt grounding
Agent reasoning
Workflow automation
Search quality

Example structured output:

			
{
  "documentType": "Invoice",
  "vendor": "Contoso Ltd",
  "invoiceNumber": "1023",
  "totalAmount": "$1250"
}

		

Structured data is easier for:

AI agents
APIs
Search indexes
Automation systems

Why Markdown Outputs Matter

Markdown preserves:

Hierarchy
Headings
Lists
Tables
Readability
Contextual structure

Markdown is especially useful for:

RAG pipelines
LLM prompting
Semantic chunking
Knowledge retrieval

Example Markdown Output

			
# Invoice
## Vendor
Contoso Ltd
## Invoice Number
1023
## Total Amount
$1250

		

Compared to raw OCR text, Markdown provides:

Better semantic structure
Improved chunking
Enhanced reasoning quality
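
Producing the Markdown shown above from extracted fields takes very little code. The `fields_to_markdown` helper below is a hypothetical example of an output formatter, not part of any Azure SDK.

```python
def fields_to_markdown(title, fields):
    """Render a title and its fields as a Markdown document with one heading per field."""
    lines = [f"# {title}"]
    for heading, value in fields.items():
        lines.append(f"## {heading}")
        lines.append(str(value))
    return "\n".join(lines)

markdown = fields_to_markdown(
    "Invoice",
    {"Vendor": "Contoso Ltd", "Invoice Number": "1023", "Total Amount": "$1250"},
)
print(markdown)
```

Because each field sits under its own heading, a heading-based chunker can later split the document without separating a value from its label.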

Core Azure Services Used

Several Azure services commonly appear in these architectures.

Service	Purpose
Azure AI Document Intelligence	OCR, layout analysis, field extraction
Azure AI Search	Search indexing and retrieval
Azure OpenAI Service	Embeddings and reasoning
Azure AI Vision	OCR and image analysis
Azure AI Language	NLP enrichment
Azure Functions	Custom analyzers and transformations
Azure Blob Storage	Document storage

Content Understanding Pipeline

Typical pipeline:

			
Raw Document
      ↓
OCR
      ↓
Layout Analysis
      ↓
Field Extraction
      ↓
Analyzer Processing
      ↓
Structured / Markdown Output
      ↓
Chunking + Embeddings
      ↓
RAG / Agent Retrieval

		

OCR and Text Extraction

What Is OCR?

OCR (Optical Character Recognition) converts visual text into machine-readable text.

OCR is foundational for:

Scanned PDFs
Receipts
Images
Forms
Contracts

However, OCR alone is not sufficient for downstream reasoning.

OCR Challenges

Raw OCR may contain:

Noise
Incorrect spacing
Mixed reading order
Formatting issues

Example:

T0TAL

instead of:

TOTAL

Analyzers help normalize and organize extracted content.

Layout Analysis

Why Layout Matters

Documents contain structural relationships:

Headings
Sections
Tables
Columns
Labels

Layout analysis preserves these relationships.

Without layout analysis:

Content becomes flattened
Context may be lost
Tables may break

Table Preservation

Example table:

Item	Price
Laptop	$1200
Mouse	$50

Without layout-aware extraction:

Laptop 1200 Mouse 50

With structured formatting:

			
| Item | Price |
|---|---|
| Laptop | $1200 |
| Mouse | $50 |

Markdown tables preserve meaning for downstream reasoning.
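
A table that has been extracted as rows can be serialized to Markdown with a short helper. The `to_markdown_table` function is an illustrative assumption; it also escapes pipe characters inside cells so they do not break the table.

```python
def to_markdown_table(header, rows):
    """Serialize a header and rows into a Markdown table."""
    def fmt(cells):
        # Escape pipes so cell text cannot be mistaken for a column separator.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    separator = "|" + "|".join("---" for _ in header) + "|"
    return "\n".join([fmt(header), separator, *(fmt(row) for row in rows)])

table_md = to_markdown_table(["Item", "Price"], [["Laptop", "$1200"], ["Mouse", "$50"]])
print(table_md)
```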

Field Extraction

Field extraction identifies business-critical values.

Examples:

Invoice totals
Dates
Vendor names
Policy numbers
Customer IDs

Analyzers often convert these fields into:

JSON objects
Structured metadata
Searchable entities

Structured JSON Outputs

JSON is useful for:

APIs
Workflow automation
Agent tools
Databases

Example:

			
{
  "vendor": "Contoso",
  "invoiceDate": "2026-05-10",
  "total": 1250
}

		

Benefits:

Machine-readable
Consistent schema
Easy filtering
Strong validation
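
The validation benefit can be sketched with a small schema check. The `SCHEMA` mapping and `validate` helper are assumptions made for this example; production systems more often rely on a schema library or typed models.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
SCHEMA = {"vendor": str, "invoiceDate": str, "total": (int, float)}

def validate(payload: str):
    """Parse a JSON document and report missing or wrongly typed fields."""
    record = json.loads(payload)
    errors = []
    for name, expected in SCHEMA.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}")
    return record, errors

good = '{"vendor": "Contoso", "invoiceDate": "2026-05-10", "total": 1250}'
bad = '{"vendor": "Contoso", "total": "$1250"}'
print(validate(good)[1])
print(validate(bad)[1])
```

Rejecting a string such as "$1250" where a number is expected catches normalization gaps before they reach an agent tool or a database.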

Markdown Outputs for RAG

Markdown is especially useful for LLM-based systems because it:

Preserves hierarchy
Improves chunk boundaries
Enhances readability
Supports semantic structure

Example:

			
# Security Policy
## Password Requirements
- Minimum 12 characters
- MFA required

This structure improves retrieval quality significantly.

Semantic Chunking

Analyzers often support semantic chunking.

Instead of arbitrary token splits:

Chunks follow sections
Headings are preserved
Context remains intact

Benefits:

Better embeddings
Higher retrieval precision
Improved grounding
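
Heading-based chunking of Markdown can be sketched as follows. The `split_by_headings` helper is an illustrative assumption, and the "Device Policy" section in the sample text is invented to show a second chunk. Each chunk keeps its heading path so that context survives the split.

```python
def split_by_headings(markdown):
    """Split Markdown into chunks at headings, recording the heading path of each chunk."""
    chunks, path, body = [], [], []

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headings": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            del path[level - 1:]  # drop headings at the same or deeper level
            path.append(line.lstrip("#").strip())
        else:
            body.append(line)
    flush()
    return chunks

document = """# Security Policy
## Password Requirements
- Minimum 12 characters
- MFA required
## Device Policy
- Disk encryption required"""

for chunk in split_by_headings(document):
    print(chunk["headings"], "->", chunk["text"].replace("\n", " / "))
```

Storing the heading path alongside the text lets the embedding or the prompt include "Security Policy > Password Requirements", which a bare list of bullet points would lack.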

Metadata Enrichment

Analyzers often attach metadata such as:

Document type
Department
Security classification
Topic
Language

Example:

			
{
  "documentType": "Contract",
  "department": "Legal",
  "classification": "Confidential"
}

		

Metadata improves:

Filtering
Security trimming
Agent routing
Search precision

Downstream Reasoning

What Is Downstream Reasoning?

Downstream reasoning refers to how AI systems use extracted content after ingestion.

Examples:

RAG prompting
Agent planning
Workflow decisions
Semantic retrieval
Summarization

Cleaner representations improve reasoning quality.

Why AI Agents Need Structured Content

Agents frequently:

Retrieve knowledge
Call tools
Execute workflows
Make decisions

Poorly structured content can cause:

Hallucinations
Incorrect actions
Failed workflows
Poor retrieval

Structured and Markdown outputs improve agent reliability.

RAG Integration

Structured outputs commonly feed Retrieval-Augmented Generation pipelines.

Workflow:

			
Document
    ↓
Analyzer
    ↓
Markdown / JSON
    ↓
Embeddings
    ↓
Vector Search
    ↓
Grounded LLM Prompt
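
The last step of this workflow, building a grounded prompt from retrieved chunks, can be sketched like this. The prompt wording, the chunk shape with `source` and `text` keys, and the `build_grounded_prompt` helper are all assumptions for illustration; no model is called.

```python
def build_grounded_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one grounded prompt."""
    sources = "\n\n".join(
        f"[{i}] ({chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below. "
        "If the answer is not in the sources, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieval result.
retrieved = [
    {"source": "invoice-1023.pdf", "text": "# Invoice\n## Total Amount\n$1250"},
]
prompt = build_grounded_prompt("What is the total on invoice 1023?", retrieved)
print(prompt)
```

Numbering the sources and naming their origin gives the model something to cite, which makes answers easier to verify against the original documents.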

		

Embeddings and Semantic Retrieval

Generated outputs are often:

Chunked
Embedded
Indexed into vector stores

A common choice for the index is Azure AI Search.

This enables:

Semantic search
Hybrid search
Grounded retrieval

Content Understanding and AI Search

Structured outputs improve search quality because:

Metadata is cleaner
Sections are preserved
Semantic meaning is retained

This improves:

Relevance ranking
Hybrid retrieval
AI grounding
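
Hybrid retrieval has to merge a keyword ranking with a vector ranking. Azure AI Search documents Reciprocal Rank Fusion (RRF) for this purpose; the sketch below is a generic RRF implementation rather than the service's own code, and the constant `k = 60` and the document IDs are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document scores 1 / (k + rank) per list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-b", "doc-d", "doc-a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists accumulate score from each, so they tend to rise above documents found by only one retrieval method.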

Human-in-the-Loop Validation

Some systems include human review when:

Confidence scores are low
OCR quality is poor
Structured extraction fails
Compliance is required

This is common in:

Healthcare
Finance
Insurance
Legal systems
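
A simple way to picture confidence-based review is a threshold check over extracted fields. The sketch below is illustrative only: the `ExtractedField` class, the 0.8 threshold, and the routing labels are assumptions, and real thresholds would be tuned per field and per document type.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


def needs_review(fields: list[ExtractedField], threshold: float = 0.8) -> list[str]:
    """Return the names of fields that are empty or below the confidence threshold."""
    return [
        field.name
        for field in fields
        if not field.value.strip() or field.confidence < threshold
    ]


fields = [
    ExtractedField("invoiceNumber", "INV-1001", 0.98),
    ExtractedField("total", "$245.99", 0.61),
    ExtractedField("dueDate", "", 0.95),
]

flagged = needs_review(fields)
route = "human_review" if flagged else "auto_approve"
print(route, flagged)
```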

Security Considerations

Enterprise document systems often contain:

PII
Financial data
Legal records
Sensitive business information

Security measures include:

RBAC
Managed identities
Encryption
Access filtering
Secure indexing

Important exam concept:

AI retrieval systems should enforce document-level security.
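
Document-level security is often described as security trimming: results the caller is not entitled to see are removed before they reach the model. The sketch below shows the logic only, with hypothetical names (`IndexedChunk`, `security_trim`) and group labels. In practice the filter would usually be applied inside the search query so that restricted content never leaves the index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    text: str
    allowed_groups: frozenset[str]


def security_trim(results: list[IndexedChunk], user_groups: set[str]) -> list[IndexedChunk]:
    """Keep only results that share at least one group with the caller."""
    return [result for result in results if result.allowed_groups & user_groups]


results = [
    IndexedChunk("Contract renewal terms", frozenset({"legal"})),
    IndexedChunk("Cafeteria menu", frozenset({"legal", "all-employees"})),
]

visible = security_trim(results, {"all-employees"})
print([result.text for result in visible])
```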

Common AI-103 Scenarios

Scenario 1

You need AI-friendly representations of contracts.

Solution:

Layout analysis
Markdown output
Semantic chunking

Scenario 2

You need workflow automation from invoices.

Solution:

Structured JSON extraction
Field extraction
Custom analyzers

Scenario 3

You need improved RAG retrieval quality.

Solution:

Markdown formatting
Structured metadata
Semantic chunking

Scenario 4

You need searchable scanned PDFs.

Solution:

OCR
Azure AI Search
Content Understanding pipeline

Important AI-103 Exam Tips

Know These Core Concepts

Concept	Purpose
OCR	Extract text from images
Layout Analysis	Preserve document structure
Structured Output	Machine-readable representation
Markdown Output	AI-friendly semantic formatting
Semantic Chunking	Preserve contextual boundaries
Metadata Enrichment	Improve retrieval and filtering
Grounding	Provide trusted AI context

Frequently Tested Knowledge Areas

Expect questions involving:

OCR workflows
Markdown generation
Structured extraction
JSON outputs
Semantic chunking
Metadata enrichment
AI Search integration
RAG pipelines
Agent-ready document representations

Final Thoughts

Implementing analyzers that generate structured and Markdown outputs is a foundational capability for modern enterprise AI systems.

For AI-103, focus heavily on:

OCR
Layout analysis
Field extraction
Structured outputs
Markdown formatting
Semantic chunking
Metadata enrichment
Grounded retrieval
RAG architectures
Agent-ready content pipelines

These technologies dramatically improve the quality, reliability, and reasoning capabilities of AI agents and enterprise generative AI applications.

Practice Exam Questions

Question 1

What is the primary purpose of generating structured outputs from documents?

A. Reduce network bandwidth
B. Create machine-readable representations for downstream processing
C. Eliminate OCR requirements
D. Replace vector search

Answer

B. Create machine-readable representations for downstream processing

Question 2

Why are Markdown outputs useful for RAG systems?

A. They encrypt content automatically
B. They eliminate chunking requirements
C. They preserve semantic structure and readability
D. They reduce vector dimensions

Answer

C. They preserve semantic structure and readability

Question 3

Which Azure service is commonly used for OCR and layout analysis?

A. Azure AI Document Intelligence
B. Azure Monitor
C. Azure DNS
D. Azure Backup

Answer

A. Azure AI Document Intelligence

Question 4

What is semantic chunking?

A. Encrypting document sections
B. Splitting content based on logical meaning and structure
C. Removing metadata
D. Compressing embeddings

Answer

B. Splitting content based on logical meaning and structure

Question 5

Which output format is especially useful for APIs and workflow automation?

A. Markdown
B. PDF
C. JPEG
D. JSON

Answer

D. JSON

Question 6

Why is layout analysis important in Content Understanding pipelines?

A. It reduces storage costs
B. It preserves document structure and relationships
C. It replaces OCR processing
D. It removes metadata fields

Answer

B. It preserves document structure and relationships

Question 7

Which Azure service commonly stores searchable vector indexes?

A. Azure AI Search
B. Azure Firewall
C. Azure Policy
D. Azure Backup

Answer

A. Azure AI Search

Question 8

What is the purpose of metadata enrichment?

A. Increase OCR noise
B. Eliminate search indexes
C. Replace embeddings
D. Add semantic meaning and filtering information

Answer

D. Add semantic meaning and filtering information

Question 9

Why do AI agents benefit from structured and Markdown outputs?

A. They reduce storage usage only
B. They improve reasoning and retrieval quality
C. They eliminate the need for embeddings
D. They replace semantic search entirely

Answer

B. They improve reasoning and retrieval quality

Question 10

What is grounding in a generative AI system?

A. Compressing vector databases
B. Removing document metadata
C. Reducing OCR confidence scores
D. Providing trusted contextual information to the model

Answer

D. Providing trusted contextual information to the model

Go to the AI-103 Exam Prep Hub main page

AI, AI-901, azure, Microsoft Certification May 18, 2026

Build a lightweight application with Information Extraction capabilities by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Build a lightweight application with Information Extraction capabilities by using Content Understanding

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern organizations often need applications that can automatically extract information from documents, images, audio, and video. Azure AI services and Microsoft Foundry tools make it possible to create lightweight applications that use AI-powered content understanding without requiring advanced machine learning expertise.

For the AI-901 certification exam, candidates should understand the foundational concepts involved in building lightweight applications with information extraction capabilities by using Azure Content Understanding and Microsoft Foundry.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Information Extraction?

Information extraction is the process of automatically identifying and retrieving useful data from content.

AI systems can extract information from:

Documents
Images
Audio
Video
Text

Examples include:

Names
Dates
Invoice totals
Keywords
Objects
Spoken words

What Is Azure Content Understanding?

Azure Content Understanding enables AI-powered analysis of different types of content.

Capabilities include:

OCR (Optical Character Recognition)
Speech recognition
Entity extraction
Image analysis
Video analysis
Classification
Caption generation

What Is a Lightweight Application?

A lightweight application is a simple application that performs focused tasks using cloud-based AI services.

Characteristics include:

Minimal infrastructure
API-based communication
Rapid development
Simple user interface
Cloud-hosted AI processing

For AI-901, candidates should understand concepts and workflows rather than advanced coding details.

Microsoft Foundry

Microsoft Foundry (formerly Azure AI Foundry) provides tools for building and testing AI applications.

Developers can:

Access AI models
Configure services
Test prompts
Analyze content
Build AI-powered workflows

Common Information Extraction Capabilities

OCR (Optical Character Recognition)

OCR extracts text from images and scanned documents.

Example

Input

Photo of a receipt

Output

Store name
Total amount
Purchase date

Entity Extraction

AI systems can identify important entities within content.

Examples of Entities

Names
Locations
Organizations
Phone numbers
Dates

Speech Recognition

Speech recognition converts spoken language into text.

Example

Input

Customer support call recording

Output

Searchable transcript

Object Detection

Object detection identifies objects within images or video.

Example

A warehouse-monitoring application may detect:

Boxes
Forklifts
Employees

Sentiment Analysis

Sentiment analysis determines emotional tone.

Example

Customer feedback classified as:

Positive
Neutral
Negative

Typical Lightweight Application Workflow

A lightweight information-extraction application often follows these steps:

User uploads content
Application sends content to Azure AI service
AI analyzes content
Structured results are returned
Application displays extracted information

Example Workflow

User uploads:

Image
PDF
Audio file
Video file

AI extracts:

Text
Keywords
Objects
Entities
Captions

APIs and Endpoints

Applications communicate with Azure AI services through:

APIs
Endpoints

The application sends content to the AI service and receives structured results.

Authentication

Applications must authenticate securely before using Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities
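
The difference between key-based and token-based authentication shows up in the request headers. The sketch below assumes the `Ocp-Apim-Subscription-Key` header commonly used for Azure AI service keys and a standard bearer token for Microsoft Entra ID or managed identity; the `build_auth_headers` helper is hypothetical, and acquiring the token itself (for example, with the Azure Identity library) is outside this example.

```python
from typing import Optional


def build_auth_headers(
    api_key: Optional[str] = None,
    bearer_token: Optional[str] = None,
) -> dict[str, str]:
    """Prefer a token (Entra ID or managed identity); fall back to an API key."""
    if bearer_token:
        return {"Authorization": f"Bearer {bearer_token}"}
    if api_key:
        return {"Ocp-Apim-Subscription-Key": api_key}
    raise ValueError("No credential supplied")


headers = build_auth_headers(api_key="example-key")
print(headers)
```

Preferring the token path reflects a common recommendation: managed identities avoid storing long-lived secrets in application configuration.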

Example High-Level Pseudocode

content = upload_file()
results = analyze_content(content)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.
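
For readers who want to see the same workflow as runnable code, here is a minimal sketch. It does not call any Azure service: `analyze_content` is a local stand-in, and `fake_invoice_analyzer` returns hard-coded values purely to show the shape of the flow.

```python
from typing import Callable


def analyze_content(data: bytes) -> dict:
    """Stand-in for the AI service call; a real app would send `data` to an endpoint."""
    return {"byteCount": len(data)}


def run_workflow(content: bytes, analyzer: Callable[[bytes], dict] = analyze_content) -> str:
    """Send content to an analyzer and format the structured results for display."""
    results = analyzer(content)
    return "\n".join(f"{key}: {value}" for key, value in results.items())


def fake_invoice_analyzer(data: bytes) -> dict:
    """Hypothetical analyzer that always returns the same invoice fields."""
    return {"invoiceNumber": "INV-1001", "total": "$245.99"}


report = run_workflow(b"sample invoice bytes", fake_invoice_analyzer)
print(report)
```

Passing the analyzer in as a parameter keeps the workflow testable without network access.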

Structured Outputs

AI systems often return structured data formats such as:

JSON
Tables
Lists
Metadata

Structured outputs make integration easier.

Example JSON-Like Output

{
  "invoiceNumber": "INV-1001",
  "date": "2026-05-15",
  "total": "$245.99"
}
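
Output like this is easy to consume in application code. The sketch below parses the sample into typed values; the `Invoice` class and `parse_invoice` function are illustrative names, and the currency handling assumes a dollar-formatted string as in the sample.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Invoice:
    number: str
    issued: date
    total: Decimal


def parse_invoice(payload: str) -> Invoice:
    """Convert the JSON text into typed fields the application can work with."""
    raw = json.loads(payload)
    return Invoice(
        number=raw["invoiceNumber"],
        issued=date.fromisoformat(raw["date"]),
        # Strip the currency symbol and thousands separators before converting.
        total=Decimal(raw["total"].replace("$", "").replace(",", "")),
    )


invoice = parse_invoice(
    '{"invoiceNumber": "INV-1001", "date": "2026-05-15", "total": "$245.99"}'
)
print(invoice)
```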

		

Common Real-World Scenarios

Scenario 1: Invoice Processing

Goal

Automatically extract invoice data.

Extracted Information

Vendor name
Invoice number
Total amount
Due date

Scenario 2: Customer Service Analytics

Goal

Analyze customer interactions.

Extracted Information

Topics
Sentiment
Keywords
Transcripts

Scenario 3: Healthcare Document Analysis

Goal

Extract information from medical documents.

Extracted Information

Patient names
Dates
Medical terms

Scenario 4: Media Monitoring

Goal

Analyze audio and video content.

Extracted Information

Captions
Objects
Speakers
Keywords

Responsible AI Considerations

Information-extraction applications should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Content may contain:

Personal information
Financial records
Medical data
Private conversations

Organizations should secure sensitive data appropriately.

Fairness and Bias

AI systems may perform differently across:

Languages
Accents
Demographics
Image quality
Environmental conditions

Testing and evaluation are important.

Transparency

Users should understand:

AI is analyzing their content
AI-generated outputs may contain errors
Human review may still be needed

Accuracy Limitations

Information-extraction systems may struggle with:

Blurry images
Poor audio quality
Handwritten text
Background noise
Low-resolution files

Hallucinations and Errors

AI systems may occasionally:

Extract incorrect information
Misidentify objects
Misinterpret speech
Generate inaccurate summaries

Applications should validate important outputs.

Error Handling

Applications should handle:

Unsupported file formats
Corrupted files
Authentication failures
Network interruptions
Rate limits
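
Rate limits and network interruptions are usually transient, so a common pattern is to retry with exponential backoff. The sketch below is a generic version of that pattern: `RateLimitError` is a hypothetical exception standing in for an HTTP 429 response, and the delay values are arbitrary.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the service responds with HTTP 429."""


def call_with_retry(operation, max_attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Retry transient failures, doubling the wait after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (RateLimitError, ConnectionError):
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2 ** (attempt - 1))


attempts = {"count": 0}


def flaky_analyze():
    """Simulated service call that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"status": "succeeded"}


delays = []  # Record the waits instead of actually sleeping.
result = call_with_retry(flaky_analyze, sleep=delays.append)
print(result, delays)
```

Errors such as unsupported formats or authentication failures are not retried here, because repeating the same request would not fix them.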

Advantages of Lightweight AI Applications

Benefits include:

Rapid deployment
Reduced development complexity
Scalability
Automation
Faster information processing

Limitations of Lightweight AI Applications

Challenges include:

Dependence on cloud services
Accuracy limitations
Privacy concerns
Potential bias
Environmental variability

Multimodal AI

Modern AI systems can combine:

Text
Speech
Vision
Generative AI

These systems can process multiple content types together.

High-Level Architecture

A simplified architecture often includes:

User uploads content
Application sends content to Azure AI service
AI analyzes content
Structured results are returned
Application displays extracted information

Important AI-901 Exam Tips

For the exam, remember these key points:

Information extraction retrieves useful data from content.
OCR extracts text from images and documents.
Speech recognition converts speech into text.
Object detection identifies objects within images or video.
APIs and endpoints connect applications to Azure AI services.
Authentication secures access to AI resources.
Structured outputs often use JSON-like formats.
Responsible AI principles apply to information extraction systems.
Poor-quality content can reduce accuracy.
Hallucinations are inaccurate AI-generated outputs.
Microsoft Foundry (formerly Azure AI Foundry) supports AI application development.

Quick Knowledge Check

Question 1

What does OCR do?

Answer

Extracts text from images and scanned documents.

Question 2

What does speech recognition do?

Answer

Converts spoken language into text.

Question 3

Why is authentication important?

Answer

It secures access to Azure AI services.

Question 4

What can reduce information-extraction accuracy?

Answer

Poor-quality images, background noise, and blurry documents.

Practice Exam Questions

Exam: AI-901

Topic: Build a Lightweight Application with Information Extraction Capabilities by Using Content Understanding

Question 1

What is the PRIMARY purpose of information extraction in AI applications?

A. To automatically retrieve useful data from content
B. To increase internet speed
C. To replace operating systems
D. To improve monitor resolution

Correct Answer

A. To automatically retrieve useful data from content

Explanation

Information extraction uses AI to identify and retrieve meaningful data from documents, images, audio, video, and text.

Why the Other Answers Are Incorrect

B. To increase internet speed

Information extraction does not improve networking performance.

C. To replace operating systems

AI extraction tools do not replace operating systems.

D. To improve monitor resolution

This is unrelated to AI information extraction.

Question 2

What does OCR stand for?

A. Optical Character Recognition
B. Open Cloud Routing
C. Operational Content Reporting
D. Object Classification Retrieval

Correct Answer

A. Optical Character Recognition

Explanation

OCR extracts machine-readable text from images and scanned documents.

Why the Other Answers Are Incorrect

B. Open Cloud Routing

This is not an OCR term.

C. Operational Content Reporting

This is unrelated to text extraction.

D. Object Classification Retrieval

This is not the meaning of OCR.

Question 3

Which AI capability converts spoken language into text?

A. Speech recognition
B. Image classification
C. Speech synthesis
D. Object detection

Correct Answer

A. Speech recognition

Explanation

Speech recognition transcribes spoken words into text.

Why the Other Answers Are Incorrect

B. Image classification

This categorizes images.

C. Speech synthesis

This converts text into spoken audio.

D. Object detection

This identifies objects within images or video.

Question 4

What is a lightweight AI application?

A. A simple application that uses cloud AI services for focused tasks
B. A hardware-only system
C. A networking device
D. A spreadsheet management tool

Correct Answer

A. A simple application that uses cloud AI services for focused tasks

Explanation

Lightweight applications typically use APIs and cloud services to provide AI capabilities without requiring complex infrastructure.

Why the Other Answers Are Incorrect

B. A hardware-only system

Lightweight AI apps commonly use cloud services.

C. A networking device

Networking devices are unrelated.

D. A spreadsheet management tool

This is unrelated to AI application design.

Question 5

How do lightweight AI applications commonly communicate with Azure AI services?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor settings
D. Through USB-only connections

Correct Answer

A. Through APIs and endpoints

Explanation

Applications use APIs and endpoints to send content to Azure AI services and receive analysis results.

Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to Azure AI communication.

C. Through monitor settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud AI services use network communication.

Question 6

Why is authentication important in Azure AI applications?

A. To secure access to AI resources
B. To improve image brightness
C. To increase network speed
D. To improve speaker volume

Correct Answer

A. To secure access to AI resources

Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.

Why the Other Answers Are Incorrect

B. To improve image brightness

Authentication does not affect image quality.

C. To increase network speed

Authentication does not improve networking.

D. To improve speaker volume

Authentication does not affect audio playback.

Question 7

Which format is commonly used for structured AI output data?

A. JSON
B. JPEG
C. MP3
D. ZIP

Correct Answer

A. JSON

Explanation

AI systems often return structured data in JSON-like formats for easy application integration.

Why the Other Answers Are Incorrect

B. JPEG

JPEG is an image format.

C. MP3

MP3 is an audio format.

D. ZIP

ZIP is a compressed archive format.

Question 8

Which factor can reduce information-extraction accuracy?

A. Poor-quality input content
B. Spreadsheet formatting
C. Keyboard layout changes
D. Screen brightness settings

Correct Answer

A. Poor-quality input content

Explanation

Blurry images, poor audio quality, and noisy environments can negatively affect AI extraction accuracy.

Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect AI extraction services.

C. Keyboard layout changes

This is unrelated to AI analysis.

D. Screen brightness settings

This does not affect AI processing accuracy.

Question 9

Which Responsible AI concern is especially important for information extraction applications?

A. Protecting sensitive personal data
B. Increasing printer performance
C. Improving spreadsheet formulas
D. Reducing monitor power usage

Correct Answer

A. Protecting sensitive personal data

Explanation

Extracted content may contain financial, medical, or personal information that must be protected securely.

Why the Other Answers Are Incorrect

B. Increasing printer performance

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to information extraction.

D. Reducing monitor power usage

This is unrelated to AI ethics.

Question 10

What are hallucinations in AI information-extraction systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Network outages
D. Operating system crashes

Correct Answer

A. Incorrect or fabricated AI-generated outputs

Explanation

Hallucinations occur when AI systems generate inaccurate extracted information, captions, summaries, or identifications.

Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated outputs.

C. Network outages

This is a connectivity issue.

D. Operating system crashes

This is unrelated to AI hallucinations.

Final Thoughts

Building lightweight applications with information extraction capabilities is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, speech recognition, APIs, authentication, structured outputs, Responsible AI principles, and lightweight AI workflows.

Azure AI services and Microsoft Foundry provide powerful tools for creating scalable applications capable of extracting valuable information from text, images, audio, video, and documents.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, azure, Microsoft Certification May 18, 2026

Extract information from audio and video by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Extract information from audio and video by using Content Understanding

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Organizations increasingly rely on AI systems to analyze audio and video content for automation, accessibility, security, analytics, and customer experiences. AI-powered content understanding solutions can extract valuable information from spoken language, sounds, images, and moving video streams.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from audio and video by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Content Understanding?

Content understanding refers to AI systems analyzing and interpreting different forms of content, including:

Audio
Video
Images
Documents
Text

AI systems can identify patterns, extract information, and generate useful insights.

Azure Content Understanding

Azure Content Understanding enables AI-powered analysis of multimedia content.

Capabilities include:

Speech recognition
Video analysis
Speaker identification
Caption generation
Object detection
Keyword extraction

Microsoft Foundry

Microsoft Foundry (formerly Azure AI Foundry) provides tools for building, testing, and managing AI applications.

Developers can:

Deploy AI services
Process multimedia content
Build lightweight applications
Test AI workflows

Audio Information Extraction

AI systems can analyze audio files to extract useful information.

Examples include:

Spoken words
Speaker identity
Keywords
Emotions
Language detection

Speech Recognition

Speech recognition converts spoken language into text.

Example

Input

Audio recording of a meeting

Output

Meeting transcript

Speaker Identification

AI systems can distinguish between different speakers in a recording, a task often called speaker diarization.

Example

A meeting transcription may identify:

Speaker 1
Speaker 2
Speaker 3
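
Once segments carry speaker labels, turning them into a readable transcript is straightforward. The sketch below assumes a hypothetical `Segment` shape (speaker label, start time, text); actual service output will differ in field names and detail.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start_seconds: float
    text: str


def format_transcript(segments: list[Segment]) -> str:
    """Order segments by time and merge consecutive ones from the same speaker."""
    turns: list[list[str]] = []  # each entry is [speaker, joined text]
    for segment in sorted(segments, key=lambda s: s.start_seconds):
        if turns and turns[-1][0] == segment.speaker:
            turns[-1][1] += " " + segment.text
        else:
            turns.append([segment.speaker, segment.text])
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


segments = [
    Segment("Speaker 1", 0.0, "Welcome everyone."),
    Segment("Speaker 2", 4.2, "Thanks for having me."),
    Segment("Speaker 1", 2.1, "Let's get started."),
]

transcript = format_transcript(segments)
print(transcript)
```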

Language Detection

AI systems can identify the spoken language within audio content.

Example

An AI system determines whether audio is:

English
Spanish
French
Japanese

Keyword Extraction

AI systems can identify important terms within conversations.

Example

A customer support call may extract:

Product names
Complaint topics
Order numbers
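
To make the idea concrete, the sketch below pulls order numbers and product names out of a transcript with simple rules. It is a rule-based stand-in for what an AI service would do with trained models: the `ORD-` number format, the product names, and the `extract_keywords` function are all assumptions for this example.

```python
import re

ORDER_NUMBER = re.compile(r"\bORD-\d{4,}\b")  # hypothetical order-number format


def extract_keywords(transcript: str, product_names: list[str]) -> dict[str, list[str]]:
    """Find order numbers by pattern and product names by case-insensitive lookup."""
    lowered = transcript.lower()
    return {
        "orderNumbers": ORDER_NUMBER.findall(transcript),
        "products": [name for name in product_names if name.lower() in lowered],
    }


transcript = "Hi, my Contoso Router stopped working. The order was ORD-48213."
found = extract_keywords(transcript, ["Contoso Router", "Contoso Switch"])
print(found)
```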

Sentiment Analysis

AI systems can analyze emotional tone in speech.

Example

A customer call may be classified as:

Positive
Neutral
Negative

Video Information Extraction

Video analysis combines:

Audio analysis
Image analysis
Motion analysis

Common Video Analysis Capabilities

AI systems may perform:

Object detection
Facial analysis
Activity recognition
Scene description
Text extraction
Caption generation

Object Detection in Video

AI systems can identify objects appearing in video frames.

Example

A traffic-monitoring system may detect:

Cars
Trucks
Pedestrians
Traffic lights

Scene Detection

AI systems can identify scene changes within videos.

Example

A sports video may identify:

Game start
Replay segments
Commercial breaks

Video Captioning

AI systems can generate descriptions or subtitles for videos.

Example

A training video may automatically generate captions for accessibility.

Optical Character Recognition (OCR) in Video

AI systems can extract text appearing in video frames.

Example

A video may contain:

Street signs
License plates
Product labels

APIs and Endpoints

Applications communicate with Azure AI services using:

APIs
Endpoints

Audio and video content is submitted programmatically for analysis.

Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

Lightweight Application Workflow

A typical workflow includes:

User uploads audio or video
Application sends content to AI service
AI analyzes multimedia content
Results are returned
Application displays extracted information

Example High-Level Pseudocode

media = upload_media()
results = analyze_media(media)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.
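
One practical detail the pseudocode hides is deciding whether an upload is audio, video, or something unsupported. The sketch below uses Python's standard `mimetypes` module to guess the type from the file name; the analyzer labels returned by `route_media` are hypothetical, and a production app would also inspect the file contents rather than trust the extension.

```python
import mimetypes


def route_media(filename: str) -> str:
    """Pick an analyzer based on the MIME type guessed from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        raise ValueError(f"Unsupported or unknown file type: {filename}")
    family = mime_type.split("/")[0]
    if family == "audio":
        return "audio_analyzer"
    if family == "video":
        return "video_analyzer"
    raise ValueError(f"Unsupported media type: {mime_type}")


for name in ["meeting.mp3", "demo.mp4"]:
    print(name, "->", route_media(name))
```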

Common Real-World Scenarios

Scenario 1: Meeting Transcription

Goal

Convert meeting audio into searchable text.

Features

Speech recognition
Speaker identification
Keyword extraction

Scenario 2: Call Center Analytics

Goal

Analyze customer service calls.

Features

Sentiment analysis
Topic extraction
Call summarization

Scenario 3: Security Monitoring

Goal

Analyze surveillance video.

Features

Object detection
Activity recognition
Facial analysis

Scenario 4: Video Accessibility

Goal

Improve accessibility for multimedia content.

Features

Caption generation
Speech transcription
Scene descriptions

Responsible AI Considerations

Audio and video AI systems should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Audio and video may contain:

Personal conversations
Faces
Biometric data
Sensitive information

Organizations should protect multimedia data appropriately.

Fairness and Bias

Speech and video systems may perform differently across:

Languages
Accents
Dialects
Lighting conditions
Demographics

Testing and evaluation are important.

Transparency

Users should understand:

AI is analyzing multimedia content
AI-generated outputs may contain errors
Human review may still be needed

Accuracy Limitations

Audio and video analysis systems may struggle with:

Background noise
Poor audio quality
Low-resolution video
Obstructed visuals
Multiple overlapping speakers

Hallucinations and Errors

AI systems may occasionally:

Misidentify speakers
Generate inaccurate captions
Misinterpret speech
Detect nonexistent objects

Applications should validate important outputs.

Error Handling

Applications should handle:

Unsupported file formats
Corrupted media files
Authentication failures
Network interruptions
Rate limits

Advantages of Multimedia Information Extraction

Benefits include:

Automation
Faster analysis
Improved accessibility
Searchable content
Scalable processing

Limitations of Multimedia Information Extraction

Challenges include:

Privacy concerns
Accuracy limitations
Bias
Environmental variability
Ethical considerations

Multimodal AI

Modern AI systems may combine:

Speech
Vision
Text
Generative AI

These systems can:

Analyze multimedia content
Answer questions
Generate summaries
Create captions and descriptions

High-Level Architecture

A simplified architecture often includes:

User uploads audio/video
Application sends media to Azure AI service
AI processes multimedia content
Structured results are returned
Application displays extracted information

Important AI-901 Exam Tips

For the exam, remember these key points:

Speech recognition converts speech to text.
Speaker identification distinguishes speakers.
Sentiment analysis detects emotional tone.
OCR can extract text from video frames.
Object detection identifies objects in video.
APIs and endpoints connect applications to AI services.
Authentication secures AI resources.
Responsible AI principles apply to multimedia AI systems.
Poor audio or video quality can reduce accuracy.
Hallucinations are inaccurate AI-generated outputs.
Microsoft Foundry (formerly Azure AI Foundry) supports multimedia AI application development.

Quick Knowledge Check

Question 1

What does speech recognition do?

Answer

Converts spoken language into text.

Question 2

What is speaker identification?

Answer

Distinguishing between different speakers in audio content.

Question 3

Why is authentication important?

Answer

It secures access to Azure AI services.

Question 4

What can reduce multimedia-analysis accuracy?

Answer

Background noise, low-quality audio, and poor video quality.

Practice Exam Questions

Exam: AI-901

Topic: Extract Information from Audio and Video by Using Content Understanding

Question 1

What is the PRIMARY purpose of content understanding in AI systems?

A. To analyze and interpret multimedia content such as audio and video
B. To increase internet bandwidth
C. To replace operating systems
D. To improve keyboard performance

Correct Answer

A. To analyze and interpret multimedia content such as audio and video

Explanation

Content understanding enables AI systems to analyze audio, video, images, and other forms of content to extract useful information.

Why the Other Answers Are Incorrect

B. To increase internet bandwidth

Content understanding does not improve networking speed.

C. To replace operating systems

AI multimedia analysis does not replace operating systems.

D. To improve keyboard performance

This is unrelated to AI content understanding.

Question 2

What does speech recognition do?

A. Converts spoken language into text
B. Converts images into audio
C. Encrypts media files
D. Repairs damaged videos

Correct Answer

A. Converts spoken language into text

Explanation

Speech recognition transcribes spoken words into machine-readable text.

Why the Other Answers Are Incorrect

B. Converts images into audio

This is unrelated to speech recognition.

C. Encrypts media files

Encryption is unrelated to speech transcription.

D. Repairs damaged videos

Speech recognition does not repair media files.

Question 3

Which AI capability identifies different speakers in an audio recording?

A. Speaker identification
B. OCR
C. Image classification
D. Object compression

Correct Answer

A. Speaker identification

Explanation

Speaker identification distinguishes between different speakers within audio content.

Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Image classification

This categorizes images.

D. Object compression

This is not a multimedia AI capability.

Question 4

What is sentiment analysis used for in audio processing?

A. Detecting emotional tone in speech
B. Increasing audio volume
C. Compressing audio files
D. Repairing broken microphones

Correct Answer

A. Detecting emotional tone in speech

Explanation

Sentiment analysis identifies whether speech content is positive, negative, or neutral.

Why the Other Answers Are Incorrect

B. Increasing audio volume

This is unrelated to AI analysis.

C. Compressing audio files

Compression is unrelated to sentiment detection.

D. Repairing broken microphones

This is a hardware issue.

Question 5

Which AI capability can extract text from video frames?

A. OCR
B. Speech synthesis
C. Audio normalization
D. File compression

Correct Answer

A. OCR

Explanation

OCR can identify and extract text that appears visually within video frames.

Why the Other Answers Are Incorrect

B. Speech synthesis

This converts text into speech.

C. Audio normalization

This adjusts sound levels.

D. File compression

This reduces file size.

Question 6

How do lightweight multimedia-analysis applications typically communicate with Azure AI services?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor settings
D. Through USB-only connections

Correct Answer

A. Through APIs and endpoints

Explanation

Applications use APIs and endpoints to send audio and video content to Azure AI services for analysis.

Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to multimedia AI communication.

C. Through monitor settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud AI services use network communication.

Question 7

Why is authentication important when using Azure AI multimedia services?

A. To secure access to AI resources
B. To improve speaker volume
C. To increase internet speed
D. To improve video resolution

Correct Answer

A. To secure access to AI resources

Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.

Why the Other Answers Are Incorrect

B. To improve speaker volume

Authentication does not affect sound levels.

C. To increase internet speed

Authentication does not improve networking.

D. To improve video resolution

Authentication does not affect video quality.

Question 8

Which factor can reduce speech-recognition accuracy?

A. Background noise
B. Spreadsheet formatting
C. Keyboard layout changes
D. Monitor brightness

Correct Answer

A. Background noise

Explanation

Noise and poor audio quality can make it difficult for AI systems to correctly recognize speech.

Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect audio AI systems.

C. Keyboard layout changes

This is unrelated to speech recognition.

D. Monitor brightness

This does not affect audio analysis.

Question 9

Which Responsible AI concern is especially important for audio and video analysis systems?

A. Protecting sensitive personal information
B. Increasing printer speed
C. Improving spreadsheet formulas
D. Reducing file storage costs

Correct Answer

A. Protecting sensitive personal information

Explanation

Audio and video files may contain faces, voices, and personal conversations that require privacy protection.

Why the Other Answers Are Incorrect

B. Increasing printer speed

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to multimedia analysis.

D. Reducing file storage costs

This is not a Responsible AI principle.

Question 10

What are hallucinations in multimedia AI systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Network outages
D. Speaker hardware malfunctions

Correct Answer

A. Incorrect or fabricated AI-generated outputs

Explanation

Hallucinations occur when AI systems produce inaccurate captions, object detections, speaker identifications, or transcriptions.

Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated outputs.

C. Network outages

This is a connectivity issue.

D. Speaker hardware malfunctions

This is a hardware problem, not an AI hallucination.

Final Thoughts

Extracting information from audio and video by using Content Understanding is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as speech recognition, video analysis, OCR, APIs, authentication, Responsible AI principles, and lightweight multimedia-analysis workflows.

Azure AI services and Azure AI Foundry provide powerful tools for building intelligent multimedia applications capable of understanding spoken language, video content, and visual information at scale.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Artificial Intelligence (AI), Microsoft Certification May 18, 2026

Extract information from images by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Extract information from images by using Content Understanding

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern AI systems can analyze images and extract meaningful information automatically. Organizations use image analysis solutions for automation, accessibility, security, healthcare, retail, and business intelligence.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from images by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Image Information Extraction?

Image information extraction is the process of analyzing images to identify and retrieve useful information.

AI systems can detect:

Text
Objects
Faces
Colors
Products
Landmarks
Visual patterns

What Is Azure Content Understanding?

Azure Content Understanding enables AI systems to interpret and analyze content such as:

Images
Documents
Audio
Video

Capabilities include:

OCR
Object detection
Classification
Caption generation
Metadata extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

Access AI models
Analyze images
Build lightweight applications
Test AI workflows

Common Image Extraction Techniques

Optical Character Recognition (OCR)

OCR extracts text from images.

Example

Image

Photo of a street sign

OCR Output

“Main Street”

Object Detection

Object detection identifies objects and their locations within images.

Example

Detected Objects

Car
Bicycle
Traffic light
Person
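An object detection result is usually a list of labels, each with a location and a confidence score. The sketch below models that shape in plain Python. The Detection class, its field names, and the 0.5 threshold are illustrative assumptions, not the response schema of any particular Azure service.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical shape, for illustration only)."""
    label: str
    confidence: float  # 0.0 to 1.0
    box: tuple  # (left, top, width, height) in pixels


def filter_detections(detections, min_confidence=0.5):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= min_confidence]


# Made-up sample values that mirror the example objects above.
sample = [
    Detection("car", 0.92, (40, 120, 300, 160)),
    Detection("bicycle", 0.48, (400, 200, 90, 110)),
    Detection("traffic light", 0.81, (520, 10, 30, 80)),
    Detection("person", 0.77, (350, 90, 60, 180)),
]

for detection in filter_detections(sample):
    print(f"{detection.label}: {detection.confidence:.2f} at {detection.box}")
```

The bounding box is what separates object detection from image classification: classification returns only a label for the whole image.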

Image Classification

Image classification determines the overall category of an image.

Example

Image

Photo of a cat

Classification

“Cat”

Facial Analysis

AI systems can analyze facial characteristics.

Capabilities may include:

Face detection
Emotion analysis
Age estimation

Responsible AI considerations are especially important for facial-analysis systems because faces are sensitive personal data.

Image Captioning

Image captioning generates natural-language descriptions of images.

Example

Image

A dog running on a beach

Caption

“A brown dog running along a sandy beach.”

Metadata Extraction

AI systems can extract metadata and contextual information from images.

Examples include:

Time
Location
Camera details
Image dimensions

Barcode and QR Code Detection

AI systems can identify and decode:

Barcodes
QR codes

Example

Retail applications may scan product barcodes for inventory management.

APIs and Endpoints

Applications communicate with Azure AI services using:

APIs
Endpoints

Images are submitted programmatically for analysis.

Authentication

Applications must securely authenticate before accessing AI services.

Common methods include:

API keys
Azure credentials
Managed identities
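As a minimal sketch of how an application might attach a credential to a request, the function below builds HTTP headers for either key-based or token-based access. Ocp-Apim-Subscription-Key is the header Azure AI services commonly use for API keys, but check the documentation of the specific service you call. The environment variable name AZURE_AI_KEY and the function name are assumptions made for this example.

```python
import os
from typing import Optional


def build_auth_headers(api_key: Optional[str] = None,
                       bearer_token: Optional[str] = None) -> dict:
    """Build request headers for key-based or token-based authentication."""
    if bearer_token:
        # A token would typically come from Microsoft Entra ID,
        # for example through a managed identity.
        return {"Authorization": f"Bearer {bearer_token}"}
    # AZURE_AI_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("AZURE_AI_KEY")
    if not key:
        raise ValueError("No credential supplied: pass a key or a token.")
    return {"Ocp-Apim-Subscription-Key": key}
```

Reading the key from the environment keeps it out of source code, which is the main point to remember about API keys.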

Lightweight Application Workflow

A typical workflow includes:

User uploads image
Application sends image to AI service
AI analyzes image
Results are returned
Application displays extracted information

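Example High-Level Pseudocode

image = upload_image()          # user selects an image
results = analyze_image(image)  # application sends it to the AI service for analysis
display_results(results)        # application shows the extracted information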

For AI-901, understanding the workflow is more important than memorizing exact syntax.
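For readers who want to run the workflow end to end, here is one possible Python version of the pseudocode. The analyze_image function is a stand-in that returns canned values. A real application would send the bytes to a service endpoint instead, and the result keys shown here are assumptions.

```python
from pathlib import Path


def upload_image(path: str) -> bytes:
    """Read the bytes of the image the user selected."""
    return Path(path).read_bytes()


def analyze_image(image: bytes) -> dict:
    """Stand-in for the AI service call; returns canned results."""
    return {"source": "stub", "size_bytes": len(image)}


def display_results(results: dict) -> None:
    """Print each extracted value on its own line."""
    for key, value in results.items():
        print(f"{key}: {value}")


def run_workflow(path: str) -> dict:
    """Upload, analyze, and display, in the order described above."""
    results = analyze_image(upload_image(path))
    display_results(results)
    return results
```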

Common Real-World Scenarios

Scenario 1: Receipt Scanner

Goal

Extract purchase details from receipt images.

Features

OCR
Table extraction
Total amount detection

Scenario 2: Accessibility Assistant

Goal

Describe images for visually impaired users.

Features

Image captioning
OCR
Object detection

Scenario 3: Retail Inventory

Goal

Identify products from shelf images.

Features

Barcode scanning
Object detection
Classification

Scenario 4: Traffic Monitoring

Goal

Analyze roadway images.

Features

Vehicle detection
Traffic analysis
License plate reading

Responsible AI Considerations

Image-analysis applications should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Images may contain:

Faces
Personal information
License plates
Sensitive documents

Organizations should protect image data appropriately.

Fairness and Bias

Vision systems may perform differently across:

Lighting conditions
Skin tones
Environmental conditions
Camera quality

Test and evaluate the system across these conditions so that performance gaps are found before deployment.

Transparency

Users should understand:

AI is analyzing images
AI-generated outputs may contain errors
Images may be processed in the cloud

Accuracy Limitations

Image extraction systems may struggle with:

Blurry images
Poor lighting
Obstructed objects
Low-resolution images

Hallucinations and Errors

AI systems may occasionally:

Misidentify objects
Generate incorrect captions
Extract inaccurate text

Applications should validate important outputs.
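One common way to validate outputs is to send low-confidence results to a human reviewer. The sketch below assumes each result carries a confidence value between 0 and 1. The field name, the 0.8 threshold, and the sample values are illustrative assumptions.

```python
def needs_review(result: dict, threshold: float = 0.8) -> bool:
    """Return True when a result should go to a human reviewer."""
    confidence = result.get("confidence")
    # Treat a missing score as low confidence rather than trusting the output.
    return confidence is None or confidence < threshold


# Made-up results for illustration.
results = [
    {"caption": "A brown dog running along a sandy beach.", "confidence": 0.93},
    {"caption": "A cat sitting on a sofa.", "confidence": 0.41},
    {"text": "Main Street"},
]

flagged = [r for r in results if needs_review(r)]
print(f"{len(flagged)} of {len(results)} results need review")
```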

Error Handling

Applications should handle:

Unsupported image formats
Corrupted files
Authentication failures
Network interruptions
Rate limits
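Rate limits and network interruptions are usually temporary, so a common pattern is to retry with an increasing delay. The sketch below shows the idea. RateLimitError is a hypothetical exception that stands in for whatever error a client raises on an HTTP 429 response.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the service answers HTTP 429."""


def call_with_retry(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff on transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (RateLimitError, ConnectionError):
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            # Wait base_delay, then twice that, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
```

Errors that will not fix themselves, such as an unsupported format or a failed authentication, should be reported to the user rather than retried.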

Advantages of Image Extraction AI

Benefits include:

Faster processing
Automation
Scalability
Accessibility improvements
Reduced manual work

Limitations of Image Extraction AI

Challenges include:

Accuracy limitations
Bias
Privacy concerns
Environmental variability
Ethical considerations

Multimodal AI

Some modern AI systems combine:

Vision
Text
Speech
Generative AI

These systems can:

Analyze images
Answer visual questions
Generate descriptions
Create new content

High-Level Architecture

A simplified architecture often includes:

User uploads image
Application sends image to Azure AI service
AI processes image
Structured results are returned
Application displays information

Important AI-901 Exam Tips

For the exam, remember these key points:

OCR extracts text from images.
Object detection identifies objects and locations.
Image classification categorizes images.
Image captioning generates natural-language descriptions.
APIs and endpoints connect applications to AI services.
Authentication secures access to AI resources.
Responsible AI principles apply to image-analysis systems.
Poor image quality can reduce accuracy.
Hallucinations are incorrect or fabricated AI-generated outputs.
Azure AI Foundry supports AI application development.

Quick Knowledge Check

Question 1

What does OCR do?

Answer

Extracts machine-readable text from images.

Question 2

What is object detection?

Answer

Identifying and locating objects within an image.

Question 3

Why is authentication important?

Answer

It secures access to Azure AI services.

Question 4

What can reduce image-analysis accuracy?

Answer

Poor lighting, blur, and low-resolution images.

Practice Exam Questions

Exam: AI-901

Topic: Extract Information from Images by Using Content Understanding

Question 1

What is the PRIMARY purpose of image information extraction?

A. To analyze images and retrieve useful information
B. To increase internet bandwidth
C. To manage operating systems
D. To improve printer performance

Correct Answer

A. To analyze images and retrieve useful information

Explanation

Image information extraction uses AI to identify and retrieve meaningful data from images, such as text, objects, and visual patterns.

Why the Other Answers Are Incorrect

B. To increase internet bandwidth

Image analysis does not affect networking speed.

C. To manage operating systems

This is unrelated to computer vision.

D. To improve printer performance

Printers are unrelated to AI image extraction.

Question 2

What does OCR stand for?

A. Optical Character Recognition
B. Open Content Routing
C. Object Classification Reporting
D. Operational Cloud Rendering

Correct Answer

A. Optical Character Recognition

Explanation

OCR extracts machine-readable text from images and scanned documents.

Why the Other Answers Are Incorrect

B. Open Content Routing

This is not the meaning of OCR.

C. Object Classification Reporting

This is unrelated to text extraction.

D. Operational Cloud Rendering

This is not an OCR term.

Question 3

Which computer vision capability identifies multiple objects and their locations within an image?

A. Object detection
B. Speech synthesis
C. Text summarization
D. Audio transcription

Correct Answer

A. Object detection

Explanation

Object detection identifies objects and determines where they appear within an image.

Why the Other Answers Are Incorrect

B. Speech synthesis

This converts text into speech.

C. Text summarization

This is a text-analysis task.

D. Audio transcription

This converts speech into text.

Question 4

What is image classification?

A. Categorizing an image based on its contents
B. Compressing image file sizes
C. Encrypting image data
D. Converting images into spreadsheets

Correct Answer

A. Categorizing an image based on its contents

Explanation

Image classification determines the overall category or subject represented in an image.

Why the Other Answers Are Incorrect

B. Compressing image file sizes

Compression is unrelated to classification.

C. Encrypting image data

Encryption is unrelated to image categorization.

D. Converting images into spreadsheets

This is unrelated to computer vision.

Question 5

What does image captioning do?

A. Generates natural-language descriptions of images
B. Repairs corrupted image files
C. Converts speech into text
D. Improves internet speeds

Correct Answer

A. Generates natural-language descriptions of images

Explanation

Image captioning creates descriptive text that explains the contents of an image.

Why the Other Answers Are Incorrect

B. Repairs corrupted image files

This is unrelated to caption generation.

C. Converts speech into text

This is speech recognition.

D. Improves internet speeds

This is unrelated to AI image analysis.

Question 6

How do lightweight image-analysis applications typically communicate with Azure AI services?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor settings
D. Through USB-only connections

Correct Answer

A. Through APIs and endpoints

Explanation

Applications send images to cloud AI services through APIs and service endpoints.

Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to AI communication.

C. Through monitor settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud services use network communication.

Question 7

Why is authentication important when using Azure AI services?

A. To secure access to AI resources
B. To improve image brightness
C. To reduce image resolution
D. To increase network speed

Correct Answer

A. To secure access to AI resources

Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.

Why the Other Answers Are Incorrect

B. To improve image brightness

Authentication does not affect image quality.

C. To reduce image resolution

Authentication is unrelated to image resolution.

D. To increase network speed

Authentication does not improve internet performance.

Question 8

Which Responsible AI concern is especially important for image-analysis systems?

A. Protecting personal and sensitive visual information
B. Increasing printer speed
C. Improving spreadsheet formulas
D. Reducing monitor power usage

Correct Answer

A. Protecting personal and sensitive visual information

Explanation

Images may contain sensitive information such as faces, license plates, and documents that must be protected.

Why the Other Answers Are Incorrect

B. Increasing printer speed

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to image analysis.

D. Reducing monitor power usage

This is unrelated to AI ethics.

Question 9

Which factor can reduce image-analysis accuracy?

A. Poor image quality
B. Spreadsheet formatting
C. Keyboard layout changes
D. Audio playback speed

Correct Answer

A. Poor image quality

Explanation

Blur, poor lighting, and low-resolution images can negatively affect AI analysis accuracy.

Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect image AI systems.

C. Keyboard layout changes

This is unrelated to computer vision.

D. Audio playback speed

This is unrelated to image processing.

Question 10

What are hallucinations in AI image-analysis systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Network outages
D. Audio recording problems

Correct Answer

A. Incorrect or fabricated AI-generated outputs

Explanation

Hallucinations occur when AI systems generate inaccurate captions, object identifications, or extracted information.

Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated outputs.

C. Network outages

This is a connectivity issue.

D. Audio recording problems

This is unrelated to image-analysis systems.

Final Thoughts

Extracting information from images by using Content Understanding is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, object detection, image classification, APIs, authentication, Responsible AI principles, and lightweight image-analysis workflows.

Azure AI services and Azure AI Foundry provide powerful tools for building scalable AI applications capable of understanding and extracting valuable information from visual content.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Artificial Intelligence (AI), azure, Microsoft Certification May 18, 2026

Extract information from documents and forms by using Azure Content Understanding in Foundry Tools (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Extract information from documents and forms by using Azure Content Understanding in Foundry Tools

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Organizations process enormous numbers of documents every day, including invoices, receipts, forms, contracts, and identification documents. AI-powered information extraction solutions help automate reading, understanding, and organizing document data.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from documents and forms by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Information Extraction?

Information extraction is the process of identifying and retrieving useful data from documents, images, forms, audio, or other content.

Examples include extracting:

Names
Dates
Invoice totals
Addresses
Phone numbers
Product information

What Is Azure Content Understanding?

Azure Content Understanding helps AI systems analyze and interpret structured and unstructured documents.

Capabilities include:

Text extraction
Form recognition
Document analysis
Information classification
Key-value pair extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

Configure AI services
Process documents
Test extraction workflows
Build lightweight AI applications

Structured vs. Unstructured Documents

Structured Documents

Structured documents follow a consistent layout.

Examples include:

Tax forms
Invoices
Receipts
Application forms

Unstructured Documents

Unstructured documents have less predictable layouts.

Examples include:

Emails
Letters
Articles
Contracts

Optical Character Recognition (OCR)

OCR converts text within images or scanned documents into machine-readable text.

Example

Input

Scanned receipt image

OCR Output

Store name
Date
Total amount

Form Recognition

Form recognition identifies fields and values within forms.

Example

Form

Insurance application

Extracted Data

Customer name
Policy number
Address
Claim amount

Key-Value Pair Extraction

AI systems can identify relationships between labels and values.

Example

Key	Value
Invoice Number	INV-1045
Total	$250.00
Due Date	05/30/2026
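To make the label-value idea concrete, the sketch below pulls pairs out of text in which each line reads "Label: value". This is a deliberate simplification. Document AI services use layout and trained models rather than a single pattern, and the function name and sample text are assumptions for illustration.

```python
import re

# Matches lines of the form "Label: value".
PAIR_PATTERN = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.+?)\s*$", re.MULTILINE)


def extract_key_values(text: str) -> dict:
    """Return a dictionary of label-value pairs found in the text."""
    return {key: value for key, value in PAIR_PATTERN.findall(text)}


# Sample OCR text that mirrors the table above.
ocr_text = """Invoice Number: INV-1045
Total: $250.00
Due Date: 05/30/2026"""

print(extract_key_values(ocr_text))
```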

Table Extraction

AI can identify and extract tables from documents.

Example

A receipt table may contain:

Item names
Quantities
Prices
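Table extraction results are often delivered as a flat list of cells, each tagged with its row and column position. The sketch below rebuilds a grid from such a list. The key names row, col, and text and the sample receipt values are assumptions chosen for this example.

```python
def cells_to_rows(cells: list) -> list:
    """Rebuild a row-major grid from a flat list of table cells."""
    row_count = max(cell["row"] for cell in cells) + 1
    col_count = max(cell["col"] for cell in cells) + 1
    grid = [["" for _ in range(col_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell["row"]][cell["col"]] = cell["text"]
    return grid


# Made-up receipt table: a header row followed by two items.
cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Quantity"},
    {"row": 0, "col": 2, "text": "Price"},
    {"row": 1, "col": 0, "text": "Coffee"},
    {"row": 1, "col": 1, "text": "2"},
    {"row": 1, "col": 2, "text": "$7.00"},
    {"row": 2, "col": 0, "text": "Bagel"},
    {"row": 2, "col": 1, "text": "1"},
    {"row": 2, "col": 2, "text": "$3.50"},
]

for row in cells_to_rows(cells):
    print(" | ".join(row))
```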

Classification

Document classification identifies the type of document being processed.

Example

The system determines whether a file is:

Invoice
Contract
Receipt
Resume
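The toy classifier below illustrates what document classification produces: one label per document. It scores a document by counting keywords, which is far simpler than the trained models real services use. The keyword lists are assumptions made up for this example.

```python
# Hypothetical keyword lists, one per document type.
KEYWORDS = {
    "Invoice": ("invoice number", "amount due", "bill to"),
    "Contract": ("agreement", "party", "hereby"),
    "Receipt": ("receipt", "change due", "cashier"),
    "Resume": ("experience", "education", "skills"),
}


def classify_document(text: str) -> str:
    """Return the document type whose keywords appear most often."""
    lowered = text.lower()
    scores = {
        label: sum(keyword in lowered for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no keyword hits at all, report the type as unknown.
    return best if scores[best] > 0 else "Unknown"
```

Classification usually runs first in a pipeline so that each document can be routed to the extraction logic that suits its type.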

Named Entity Recognition (NER)

NER identifies important entities within text.

Entities may include:

People
Organizations
Locations
Dates

Example

Text

“John Smith works for Contoso in Seattle.”

Extracted Entities

John Smith (Person)
Contoso (Organization)
Seattle (Location)
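NER output is typically a list of entities, each with its text, category, and position in the source. The sketch below produces that shape with a simple lookup table. Real NER relies on trained language models rather than a fixed list, and the dictionary and field names here are illustrative assumptions.

```python
# Hypothetical lookup table; a trained model would not need one.
KNOWN_ENTITIES = {
    "John Smith": "Person",
    "Contoso": "Organization",
    "Seattle": "Location",
}


def find_entities(text: str) -> list:
    """Return known entities found in the text, in order of appearance."""
    found = []
    for name, category in KNOWN_ENTITIES.items():
        offset = text.find(name)
        if offset != -1:
            found.append({"text": name, "category": category, "offset": offset})
    return sorted(found, key=lambda entity: entity["offset"])


for entity in find_entities("John Smith works for Contoso in Seattle."):
    print(f"{entity['text']} ({entity['category']})")
```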

APIs and Endpoints

Applications communicate with Azure AI services through:

APIs
Endpoints

Documents are submitted for analysis programmatically.

Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

Lightweight Application Workflow

A typical workflow includes:

User uploads document
Application sends file to AI service
AI extracts information
Results are returned
Application displays or stores extracted data

Example Workflow

Input

Scanned invoice

AI Processing

OCR
Key-value extraction
Table analysis

Output

Structured invoice data

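Example High-Level Pseudocode

document = upload_document()          # user selects a document
results = analyze_document(document)  # application sends it to the AI service for analysis
display_results(results)              # application shows or stores the extracted data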

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Common Real-World Scenarios

Scenario 1: Invoice Processing

Goal

Automate invoice data extraction.

Features

OCR
Table extraction
Total amount detection

Scenario 2: Receipt Scanning

Goal

Extract purchase information from receipts.

Features

Text extraction
Merchant identification
Expense categorization

Scenario 3: Resume Processing

Goal

Extract candidate information from resumes.

Features

Name extraction
Skill identification
Contact information detection

Scenario 4: Healthcare Forms

Goal

Digitize patient records.

Features

Form recognition
Key-value extraction
Classification

Responsible AI Considerations

Document-processing applications should follow Responsible AI principles.

Key considerations include:

Privacy
Security
Fairness
Transparency
Accountability
Inclusiveness

Privacy Concerns

Documents may contain:

Personal information
Financial data
Medical information
Legal records

Organizations should protect sensitive data appropriately.

Security Considerations

Applications should secure:

Uploaded files
Stored documents
API credentials
Extracted data

Transparency

Users should understand:

AI is analyzing documents
Extracted data may contain errors
Human review may still be needed

Accuracy Limitations

AI extraction systems may struggle with:

Poor scan quality
Handwritten text
Complex layouts
Damaged documents

Hallucinations and Errors

AI systems may occasionally:

Extract incorrect values
Miss fields
Misclassify documents

Applications should validate important information.
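Validation can be as simple as checking that extracted values are present and well formed before they reach a business system. The sketch below checks a few invoice fields. The field names, the US-style date format, and the currency pattern are assumptions based on the invoice example earlier in this post.

```python
import re
from datetime import datetime

REQUIRED_FIELDS = ("Invoice Number", "Total", "Due Date")
TOTAL_PATTERN = re.compile(r"\$\d+(?:,\d{3})*\.\d{2}")


def validate_invoice(fields: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not fields.get(name)]
    total = fields.get("Total", "")
    if total and not TOTAL_PATTERN.fullmatch(total):
        problems.append(f"invalid total: {total}")
    due_date = fields.get("Due Date", "")
    if due_date:
        try:
            datetime.strptime(due_date, "%m/%d/%Y")
        except ValueError:
            problems.append(f"invalid date: {due_date}")
    return problems
```

A misread character, such as the letter S in place of the digit 5, fails the total check and can be sent for human review.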

Error Handling

Applications should handle:

Unsupported file formats
Corrupted documents
Authentication failures
Network interruptions
Rate limits
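Some of these problems can be caught before a file is sent to the service. The sketch below rejects unsupported extensions, empty files, and PDF files that lack the standard %PDF- marker at the start. The list of supported extensions is an assumption, so consult the service documentation for the formats it actually accepts.

```python
from pathlib import Path
from typing import Optional

# Assumed list for illustration; real services publish their own.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}


def check_upload(filename: str, data: bytes) -> Optional[str]:
    """Return a problem description, or None when the file looks usable."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return f"unsupported file format: {extension or 'no extension'}"
    if not data:
        return "file is empty"
    # Every PDF file begins with the marker %PDF-.
    if extension == ".pdf" and not data.startswith(b"%PDF-"):
        return "file does not look like a valid PDF"
    return None
```

Checking locally gives the user a clear message right away and avoids spending a service call on a file that cannot be processed.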

Advantages of Information Extraction AI

Benefits include:

Faster document processing
Reduced manual entry
Improved scalability
Increased automation
Better searchability

Limitations of Information Extraction AI

Challenges include:

Variable document quality
Handwriting recognition difficulties
Inconsistent layouts
Privacy concerns
Extraction inaccuracies

Generative AI and Information Extraction

Some modern systems combine:

OCR
Document intelligence
Generative AI

This enables:

Summarization
Question answering
Conversational document analysis

High-Level Architecture

A simplified architecture often includes:

User uploads document
Application sends document to Azure AI service
AI analyzes content
Structured data is returned
Application displays or stores results

Important AI-901 Exam Tips

For the exam, remember these key points:

OCR extracts text from documents and images.
Form recognition identifies fields and values.
Key-value extraction identifies label-value relationships.
Table extraction retrieves structured table data.
Classification identifies document types.
APIs and endpoints connect applications to Azure AI services.
Authentication secures access to AI resources.
Responsible AI principles apply to document-processing systems.
Poor document quality can reduce extraction accuracy.
AI-generated outputs may still require validation.

Quick Knowledge Check

Question 1

What does OCR do?

Answer

Extracts machine-readable text from images or scanned documents.

Question 2

What is form recognition?

Answer

Identifying and extracting fields and values from forms.

Question 3

Why is authentication important?

Answer

It secures access to Azure AI services and protects resources.

Question 4

What can reduce extraction accuracy?

Answer

Poor scan quality, handwriting, and inconsistent document layouts.

Practice Exam Questions

Exam: AI-901

Topic: Extract Information from Documents and Forms by Using Azure Content Understanding in Foundry Tools

Question 1

What is the PRIMARY purpose of information extraction AI solutions?

A. To retrieve useful data from documents and content
B. To increase internet bandwidth
C. To replace operating systems
D. To improve monitor resolution

Correct Answer

A. To retrieve useful data from documents and content

Explanation

Information extraction AI systems identify and retrieve meaningful information such as names, dates, totals, and addresses from documents and forms.

Why the Other Answers Are Incorrect

B. To increase internet bandwidth

Information extraction does not affect network speed.

C. To replace operating systems

AI document processing does not replace operating systems.

D. To improve monitor resolution

This is unrelated to AI information extraction.

Question 2

What does OCR stand for?

A. Optical Character Recognition
B. Open Content Retrieval
C. Object Classification Routing
D. Operational Compute Reporting

Correct Answer

A. Optical Character Recognition

Explanation

OCR converts printed or handwritten text within images and scanned documents into machine-readable text.

Why the Other Answers Are Incorrect

B. Open Content Retrieval

This is not the meaning of OCR.

C. Object Classification Routing

This is unrelated to document analysis.

D. Operational Compute Reporting

This is not an OCR term.

Question 3

Which AI capability identifies fields and values within forms?

A. Form recognition
B. Speech synthesis
C. Image compression
D. Network monitoring

Correct Answer

A. Form recognition

Explanation

Form recognition extracts structured information such as names, dates, totals, and addresses from forms and documents.

Why the Other Answers Are Incorrect

B. Speech synthesis

This converts text into speech.

C. Image compression

This reduces file size and is unrelated to field extraction.

D. Network monitoring

This is unrelated to document AI.

Question 4

Which Azure platform provides tools for building and managing AI-powered applications?

A. Azure AI Foundry
B. Microsoft Paint
C. Windows Task Manager
D. Azure DNS

Correct Answer

A. Azure AI Foundry

Explanation

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications and services.

Why the Other Answers Are Incorrect

B. Microsoft Paint

Paint is a graphics editor.

C. Windows Task Manager

This is a system monitoring tool.

D. Azure DNS

This is a networking service.

Question 5

What is key-value pair extraction?

A. Identifying labels and their associated values in documents
B. Encrypting document files
C. Compressing image sizes
D. Converting audio into text

Correct Answer

A. Identifying labels and their associated values in documents

Explanation

Key-value extraction identifies relationships such as:

Invoice Number → INV-1045
Total → $250.00

Why the Other Answers Are Incorrect

B. Encrypting document files

Encryption is unrelated to data extraction.

C. Compressing image sizes

Compression is unrelated to document intelligence.

D. Converting audio into text

This is speech recognition.

Question 6

What is the purpose of document classification?

A. To identify the type of document being processed
B. To increase network performance
C. To generate music files
D. To repair damaged documents physically

Correct Answer

A. To identify the type of document being processed

Explanation

Document classification determines whether a file is an invoice, contract, receipt, resume, or another document type.

Why the Other Answers Are Incorrect

B. To increase network performance

Classification does not improve networking.

C. To generate music files

This is unrelated to document AI.

D. To repair damaged documents physically

AI classification does not physically repair documents.

Question 7

How do lightweight document-processing applications typically communicate with Azure AI services?

A. Through APIs and endpoints
B. Through USB-only connections
C. Through monitor calibration tools
D. Through printer drivers

Correct Answer

A. Through APIs and endpoints

Explanation

Applications send documents to Azure AI services using APIs and endpoints and receive structured analysis results.

Why the Other Answers Are Incorrect

B. Through USB-only connections

Cloud services use network communication.

C. Through monitor calibration tools

This is unrelated to AI services.

D. Through printer drivers

Printers are unrelated to cloud AI communication.

Question 8

Which factor can reduce the accuracy of document extraction systems?

A. Poor document quality
B. Spreadsheet color themes
C. Keyboard layout changes
D. Audio playback speed

Correct Answer

A. Poor document quality

Explanation

Blurry scans, damaged pages, handwriting, and poor lighting can negatively affect extraction accuracy.

Why the Other Answers Are Incorrect

B. Spreadsheet color themes

This does not affect document extraction AI.

C. Keyboard layout changes

This is unrelated to AI document analysis.

D. Audio playback speed

This is unrelated to document processing.

Question 9

Why is authentication important when using Azure AI services?

A. To secure access to AI resources
B. To improve image resolution
C. To increase internet speed
D. To compress document files

Correct Answer

A. To secure access to AI resources

Explanation

Authentication ensures that only authorized users and applications can access AI services.

Why the Other Answers Are Incorrect

B. To improve image resolution

Authentication does not affect image quality.

C. To increase internet speed

Authentication does not improve networking.

D. To compress document files

Authentication is unrelated to file compression.

Question 10

Which Responsible AI concern is especially important when processing documents?

A. Protecting sensitive personal information
B. Increasing monitor brightness
C. Improving printer speed
D. Reducing spreadsheet file size

Correct Answer

A. Protecting sensitive personal information

Explanation

Documents may contain financial, medical, legal, or personal information that must be protected appropriately.

Why the Other Answers Are Incorrect

B. Increasing monitor brightness

This is unrelated to Responsible AI.

C. Improving printer speed

This is unrelated to document intelligence.

D. Reducing spreadsheet file size

This is unrelated to AI ethics or privacy.

Final Thoughts

Extracting information from documents and forms using Azure Content Understanding and Foundry tools is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, form recognition, document analysis, APIs, authentication, Responsible AI principles, and lightweight document-processing workflows.

Azure AI services and Azure AI Foundry provide powerful tools for automating information extraction and improving efficiency across business, healthcare, finance, and administrative scenarios.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Artificial Intelligence (AI), azure, Microsoft Certification May 18, 2026

Identify techniques to extract information from text, images, audio, and videos (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
   --> Identify AI workloads
      --> Identify techniques to extract information from text, images, audio, and videos

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Information extraction is one of the most valuable uses of AI and an important topic for the AI-901 certification exam. Organizations generate enormous amounts of unstructured data every day, including documents, emails, images, audio recordings, and videos. AI systems help convert this unstructured data into structured, usable information.

Microsoft expects AI-901 candidates to understand common techniques used to extract information from text, images, audio, and video content.

This topic falls under the “Identify AI workloads” section of the AI-901 exam objectives.

What Is Information Extraction?

Information extraction is the process of identifying and retrieving useful structured information from unstructured or semi-structured data.

AI systems analyze content and extract meaningful data automatically.

Examples of Information Extraction

Source	Extracted Information
Documents	Names, dates, invoice totals
Emails	Customer requests, keywords
Images	Objects, faces, text
Audio	Spoken words, speaker identity
Video	Activities, objects, movement

Structured vs. Unstructured Data

Understanding structured and unstructured data is important for this topic.

Structured Data	Unstructured Data
Tables	Emails
Databases	Images
Spreadsheets	Audio
Defined formats	Videos
Organized fields	Documents

AI techniques help transform unstructured data into structured information.

Information Extraction from Text

AI systems commonly use Natural Language Processing (NLP) to extract information from text.

Common Text Extraction Techniques

For the AI-901 exam, important techniques include:

Keyword extraction
Named Entity Recognition (NER)
Sentiment analysis
Summarization
Language detection
Text classification

Keyword Extraction

Keyword extraction identifies important words or phrases within text.

Example

Extracting phrases like:

“shipping delay”
“billing issue”
“customer satisfaction”

from support tickets.
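
As a rough illustration of the idea, the sketch below counts two-word phrases across a few support tickets and keeps the most frequent ones. The function name top_phrases, the stop-word list, and the sample tickets are all made up for this example; real key phrase extraction services rely on trained language models rather than simple counting.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "my", "is", "was", "with", "and",
              "i", "have", "to", "of", "another", "for"}

def top_phrases(tickets, n=2):
    """Return the n most frequent two-word phrases that contain no stop words."""
    counts = Counter()
    for ticket in tickets:
        words = re.findall(r"[a-z]+", ticket.lower())
        for first, second in zip(words, words[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[f"{first} {second}"] += 1
    return [phrase for phrase, _ in counts.most_common(n)]

# Invented sample tickets.
tickets = [
    "I have a shipping delay with my order",
    "Another shipping delay and a billing issue",
    "The billing issue was resolved",
]
print(top_phrases(tickets))
```

With these sample tickets, "shipping delay" and "billing issue" each appear twice, so they surface as the key phrases.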

Named Entity Recognition (NER)

NER identifies entities such as:

People
Organizations
Locations
Dates
Phone numbers
Products

Example

Input

“Microsoft will host an event in Seattle on June 15.”

Extracted Entities

Microsoft → Organization
Seattle → Location
June 15 → Date
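
The example above can be reproduced with a toy sketch that uses a small lookup table and a date pattern. The KNOWN_ENTITIES table and the extract_entities function are assumptions made for illustration; real NER models learn to recognize entities from data and do not depend on fixed lists.

```python
import re

# Hypothetical lookup table; a real NER model would not need one.
KNOWN_ENTITIES = {
    "Microsoft": "Organization",
    "Seattle": "Location",
}
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}\b")

def extract_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for name, category in KNOWN_ENTITIES.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.append((match.start(), name, category))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "Date"))
    # Sort by position so results follow the order of the sentence.
    return [(entity, category) for _, entity, category in sorted(found)]

sentence = "Microsoft will host an event in Seattle on June 15."
for entity, category in extract_entities(sentence):
    print(f"{entity} -> {category}")
```

Running it prints the same three entity and category pairs listed above.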

Sentiment Analysis

Sentiment analysis identifies emotional tone within text.

Possible Results

Positive
Negative
Neutral

Example

Analyzing customer reviews to determine satisfaction levels.
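
A minimal sketch of the three possible results, assuming a tiny hand-written word list. The POSITIVE and NEGATIVE sets and the classify_sentiment function are hypothetical; production sentiment models learn tone from labeled data and handle context, negation, and sarcasm far better than word counting.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "terrible", "rude"}

def classify_sentiment(review):
    """Label a review by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z]+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

# Invented sample reviews.
for review in [
    "Great product and fast delivery",
    "The app is slow and support was rude",
    "The package arrived on Tuesday",
]:
    print(f"{classify_sentiment(review)}: {review}")
```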

Summarization

Summarization creates shorter versions of long text.

Example

Generating meeting summaries from lengthy transcripts.
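
One simple way to picture summarization is extractive summarization, which keeps the most representative sentences from the original text. The sketch below scores each sentence by how often its words appear in the whole text. The summarize function and the sample transcript are assumptions for illustration, and this scoring favors longer sentences; generative models can also write new, shorter wording (abstractive summarization), which this sketch does not attempt.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Keep the sentences whose words occur most often across the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z]+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Preserve the original order of the selected sentences.
    return " ".join(s for s in sentences if s in ranked)

# Invented transcript.
transcript = (
    "The team reviewed the budget. "
    "The budget for the launch was approved by the team. "
    "Lunch was good."
)
print(summarize(transcript))
```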

Text Classification

Text classification assigns categories to text.

Example

Automatically labeling emails as:

Support
Sales
Billing
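
The email example can be sketched with a keyword-based classifier. The CATEGORY_KEYWORDS table, the classify_email function, and the "Unclassified" fallback are hypothetical choices for this example; real text classifiers are trained on labeled examples instead of fixed keyword sets.

```python
import re

# Hypothetical keyword sets for each category.
CATEGORY_KEYWORDS = {
    "Support": {"error", "crash", "help", "login"},
    "Sales": {"pricing", "quote", "demo", "purchase"},
    "Billing": {"invoice", "refund", "charge", "payment"},
}

def classify_email(text):
    """Pick the category whose keywords overlap most with the email text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unclassified"

print(classify_email("Please send a refund for the duplicate charge"))
print(classify_email("Can I get a quote and a demo?"))
```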

Information Extraction from Images

Computer vision techniques extract information from images.

Common Image Extraction Techniques

Important techniques include:

OCR
Image classification
Object detection
Facial recognition
Image tagging

Optical Character Recognition (OCR)

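OCR extracts text from images and scanned documents. Identifying which piece of that text is an invoice number or a total is a separate field-extraction step that builds on the OCR output.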

OCR Example

Input

Scanned invoice image.

Extracted Information

Invoice number
Total amount
Vendor name
Dates
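
The sketch below assumes OCR has already turned a scanned invoice into raw text and shows one simple way to locate the listed fields in it with regular expressions. The sample text, the field labels, and the extract_fields function are invented for illustration; services such as Azure AI Document Intelligence use trained models rather than hand-written patterns.

```python
import re

# Hypothetical OCR output; in practice this text would come from an OCR service.
ocr_text = """Contoso Ltd.
Invoice Number: INV-1042
Date: 2026-05-18
Total: $1,250.00"""

FIELD_PATTERNS = {
    "invoice_number": r"Invoice Number:\s*(\S+)",
    "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total:\s*\$([\d,]+\.\d{2})",
}

def extract_fields(text):
    """Turn raw OCR text into a dictionary of named fields."""
    # Assumption: the vendor name is printed on the first line.
    fields = {"vendor": text.splitlines()[0].strip()}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

print(extract_fields(ocr_text))
```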

Common OCR Use Cases

Receipt scanning
Invoice processing
Document digitization
Form extraction

Image Classification

Image classification identifies the overall category of an image.

Example

Identifying whether an image contains:

A dog
A car
A building

Object Detection

Object detection identifies and locates multiple objects within images.

Example

Detecting:

Cars
Pedestrians
Traffic lights

in a street image.
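
Where image classification returns one label for the whole image, an object detector returns one result per object, with a location for each. The sketch below shows what that output might look like and how an application could use it. The Detection class, the sample values, and the 0.5 confidence threshold are assumptions for illustration, not the output format of any particular service.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float  # 0.0 to 1.0
    box: tuple         # (x, y, width, height) in pixels

# Hypothetical detector output for one street image.
detections = [
    Detection("car", 0.94, (40, 120, 200, 90)),
    Detection("car", 0.41, (300, 130, 180, 85)),
    Detection("pedestrian", 0.88, (520, 100, 40, 110)),
    Detection("traffic light", 0.79, (610, 20, 25, 60)),
]

def count_objects(results, min_confidence=0.5):
    """Count detected objects per label, ignoring low-confidence results."""
    return Counter(d.label for d in results if d.confidence >= min_confidence)

print(count_objects(detections))
```

The second car is dropped because its confidence falls below the threshold, which is a common way to reduce false positives.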

Facial Recognition

Facial recognition identifies or verifies people based on facial features.

Example

Smartphone face unlock systems.

Image Tagging

Image tagging automatically generates descriptive labels.

Example Tags

Beach
Sunset
Ocean
Person

Information Extraction from Audio

Speech AI technologies extract information from spoken audio.

Common Audio Extraction Techniques

Important techniques include:

Speech recognition
Speaker recognition
Sentiment analysis in speech
Speech translation

Speech Recognition

Speech recognition converts spoken language into text.

Also called:

Speech-to-text
Automatic Speech Recognition (ASR)

Example

Audio Input

A recorded meeting.

Extracted Information

A written transcript.

Speaker Recognition

Speaker recognition identifies or verifies speakers based on voice characteristics.

Example

Voice authentication systems.

Speech Sentiment Analysis

Some AI systems analyze vocal tone and emotion.

Example

Detecting frustration during customer service calls.

Speech Translation

Speech translation converts spoken language into another language.

Example

Real-time multilingual meeting translation.

Information Extraction from Video

Video analysis combines computer vision and audio processing techniques.

Common Video Extraction Techniques

Important techniques include:

Motion detection
Object tracking
Activity recognition
Scene analysis
Video transcription

Motion Detection

Motion detection identifies movement within video footage.

Example

Security surveillance systems detecting activity.
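
A basic way to detect motion is frame differencing: compare two consecutive frames and check how many pixels changed noticeably. The sketch below uses tiny 3x3 grayscale frames stored as plain lists. The motion_detected function and both thresholds are assumptions chosen for this example; real systems work on full-size frames and use more robust techniques.

```python
def motion_detected(previous, current, pixel_threshold=30, min_changed=2):
    """Compare two grayscale frames (lists of rows of 0-255 integers).

    Returns True when at least min_changed pixels differ by more than
    pixel_threshold between the two frames.
    """
    changed = sum(
        abs(p - c) > pixel_threshold
        for prev_row, cur_row in zip(previous, current)
        for p, c in zip(prev_row, cur_row)
    )
    return changed >= min_changed

# Invented frames: two pixels brighten between frame_a and frame_b.
frame_a = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame_b = [[10, 10, 10], [10, 200, 210], [10, 10, 10]]

print(motion_detected(frame_a, frame_b))
print(motion_detected(frame_a, frame_a))
```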

Object Tracking

Object tracking follows identified objects across video frames.

Example

Tracking vehicles in traffic monitoring systems.
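
One simple approach to tracking is to match each object in a new frame to the nearest object from the previous frame and reuse its ID. The sketch below applies that idea to object center points. The track function, the max_distance value, and the sample coordinates are assumptions for illustration; this greedy matching can confuse objects that cross paths, which production trackers handle with more advanced methods.

```python
import math

def track(frames, max_distance=50.0):
    """Assign stable IDs to object center points across frames.

    frames: list of frames, each a list of (x, y) center points.
    Returns one {track_id: (x, y)} dictionary per frame.
    """
    next_id = 0
    previous = {}
    history = []
    for points in frames:
        current = {}
        unmatched = dict(previous)
        for point in points:
            # Find the closest not-yet-matched object from the previous frame.
            best_id = min(
                unmatched,
                key=lambda i: math.dist(unmatched[i], point),
                default=None,
            )
            if best_id is not None and math.dist(unmatched[best_id], point) <= max_distance:
                current[best_id] = point
                del unmatched[best_id]
            else:
                # Nothing close enough: treat it as a new object.
                current[next_id] = point
                next_id += 1
        history.append(current)
        previous = current
    return history

# Invented data: two vehicles move slightly; a third appears in the last frame.
frames = [
    [(10, 10), (200, 50)],
    [(25, 12), (190, 55)],
    [(40, 15), (180, 60), (400, 300)],
]
for frame_number, tracks in enumerate(track(frames)):
    print(frame_number, tracks)
```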

Activity Recognition

Activity recognition identifies actions occurring in video.

Example

Detecting:

Running
Falling
Fighting
Driving

Scene Analysis

Scene analysis identifies environments or contexts in video.

Example

Recognizing:

Office scenes
Outdoor settings
Crowded areas

Video Transcription

Video transcription converts spoken content in videos into text.

Example

Generating subtitles for videos automatically.
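
Once speech in a video has been transcribed with timing information, producing subtitles is mostly a formatting step. The sketch below writes timed text segments in the widely used SubRip (.srt) subtitle layout. The segment data and the function names to_timestamp and to_srt are invented for this example, and the transcription itself is assumed to have already been done by a speech recognition service.

```python
def to_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"

def to_srt(segments):
    """segments: list of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"

# Hypothetical transcription output with timings.
segments = [
    (0.0, 2.5, "Welcome to the meeting."),
    (2.5, 5.0, "Let's review the agenda."),
]
print(to_srt(segments))
```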

Multimodal AI

Some AI systems process several data types together.

This is called multimodal AI.

Example of Multimodal AI

A meeting assistant may process:

Audio
Video
Text chat
Shared documents

simultaneously.

Real-World Information Extraction Scenarios

Scenario 1: Invoice Processing System

Goal

Extract invoice information automatically.

Techniques Used

OCR
Entity extraction

Scenario 2: Customer Support Analysis

Goal

Analyze customer complaints.

Techniques Used

Sentiment analysis
Keyword extraction

Scenario 3: Smart Security Camera

Goal

Detect suspicious activity.

Techniques Used

Object detection
Motion detection
Facial recognition

Scenario 4: Meeting Intelligence Platform

Goal

Generate searchable meeting notes.

Techniques Used

Speech recognition
Summarization
Speaker recognition

Scenario 5: Video Streaming Platform

Goal

Generate subtitles automatically.

Techniques Used

Speech recognition
Video transcription

Azure AI Services for Information Extraction

Azure AI Services provide tools for extracting information from multiple data types.

Common services include:

Azure AI Language
Azure AI Speech
Azure AI Vision
Azure AI Document Intelligence

These services allow organizations to build AI solutions without training models from scratch.

Responsible AI Considerations

Information extraction systems should follow Responsible AI principles.

Important considerations include:

Privacy
Consent
Data security
Transparency
Bias reduction
Compliance

Sensitive personal information may be present in extracted data.

Challenges in Information Extraction

AI systems may face challenges such as:

Poor image quality
Background noise
Ambiguous language
Multiple speakers
Handwritten text
Video quality issues

Performance depends heavily on data quality.

Important AI-901 Exam Tips

For the exam, remember these key points:

NLP extracts information from text.
OCR extracts text from images.
Speech recognition converts speech into text.
Object detection identifies and locates objects in images or video.
Video analysis can detect activities and movement.
Information extraction converts unstructured data into structured information.
Multimodal AI combines multiple data types.
Azure AI services provide prebuilt information extraction capabilities.

Quick Knowledge Check

Question 1

Which technique extracts text from scanned documents?

Answer

OCR.

Question 2

What does speech recognition do?

Answer

Converts spoken language into text.

Question 3

Which technique identifies objects within images?

Answer

Object detection.

Question 4

What is multimodal AI?

Answer

AI systems that process multiple types of data together, such as text, audio, and images.

Practice Exam Questions

Question 1

Which AI technique is used to extract text from scanned documents or images?

A. Sentiment analysis
B. Optical Character Recognition (OCR)
C. Object detection
D. Speech synthesis

Correct Answer

B. Optical Character Recognition (OCR)

Explanation

OCR extracts machine-readable text from images, scanned documents, and photographs.

Why the Other Answers Are Incorrect

A. Sentiment analysis

Sentiment analysis identifies emotional tone in text.

C. Object detection

Object detection identifies objects within images.

D. Speech synthesis

Speech synthesis converts text into spoken audio.

Question 2

A company wants to convert recorded customer support calls into written transcripts.

Which AI capability should be used?

A. Speech recognition
B. Facial recognition
C. Image classification
D. Regression

Correct Answer

A. Speech recognition

Explanation

Speech recognition converts spoken language into written text.

Why the Other Answers Are Incorrect

B. Facial recognition

Facial recognition analyzes faces in images.

C. Image classification

Image classification categorizes images.

D. Regression

Regression predicts numeric values.

Question 3

Which AI technique identifies and locates multiple objects within an image?

A. OCR
B. Object detection
C. Summarization
D. Clustering

Correct Answer

B. Object detection

Explanation

Object detection identifies objects and their positions within images or video frames.

Why the Other Answers Are Incorrect

A. OCR

OCR extracts text from images.

C. Summarization

Summarization condenses text.

D. Clustering

Clustering groups similar data points.

Question 4

A business wants to automatically determine whether customer reviews are positive or negative.

Which AI technique is MOST appropriate?

A. Sentiment analysis
B. OCR
C. Facial recognition
D. Image tagging

Correct Answer

A. Sentiment analysis

Explanation

Sentiment analysis evaluates emotional tone and opinions in text.

Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Facial recognition

Facial recognition identifies people from images.

D. Image tagging

Image tagging labels image content.

Question 5

Which AI capability is commonly used to identify names, locations, and organizations within text?

A. Named Entity Recognition (NER)
B. Speech synthesis
C. Object tracking
D. Regression analysis

Correct Answer

A. Named Entity Recognition (NER)

Explanation

NER extracts entities such as people, organizations, dates, and locations from text.

Why the Other Answers Are Incorrect

B. Speech synthesis

Speech synthesis generates spoken audio.

C. Object tracking

Object tracking follows objects in video.

D. Regression analysis

Regression predicts numeric values.

Question 6

A smart security camera tracks moving vehicles across multiple video frames.

Which AI technique is being used?

A. Text classification
B. Object tracking
C. Summarization
D. Speech translation

Correct Answer

B. Object tracking

Explanation

Object tracking follows identified objects as they move through video footage.

Why the Other Answers Are Incorrect

A. Text classification

Text classification categorizes written text.

C. Summarization

Summarization condenses text.

D. Speech translation

Speech translation converts spoken language between languages.

Question 7

Which term describes AI systems that process multiple data types such as text, images, and audio together?

A. Regression AI
B. Multimodal AI
C. Clustering AI
D. Rule-based AI

Correct Answer

B. Multimodal AI

Explanation

Multimodal AI combines and processes multiple forms of data simultaneously.

Why the Other Answers Are Incorrect

A. Regression AI

Regression predicts numeric values.

C. Clustering AI

Clustering groups similar items.

D. Rule-based AI

Rule-based systems follow predefined logic rules.

Question 8

Which AI capability would MOST likely be used to generate automatic subtitles for videos?

A. Speech recognition
B. Image classification
C. Facial recognition
D. Recommendation systems

Correct Answer

A. Speech recognition

Explanation

Speech recognition converts spoken words in videos into text subtitles.

Why the Other Answers Are Incorrect

B. Image classification

Image classification categorizes images.

C. Facial recognition

Facial recognition identifies people in images.

D. Recommendation systems

Recommendation systems suggest content or products.

Question 9

A retailer wants AI to automatically identify products such as shoes, shirts, and electronics in uploaded images.

Which AI capability should be used?

A. Object detection
B. Sentiment analysis
C. Speech synthesis
D. Language translation

Correct Answer

A. Object detection

Explanation

Object detection identifies multiple objects within an image and locates each one, typically with a bounding box.

Why the Other Answers Are Incorrect

B. Sentiment analysis

Sentiment analysis evaluates text emotion.

C. Speech synthesis

Speech synthesis converts text into speech.

D. Language translation

Language translation converts text or speech between languages.

Question 10

What is the PRIMARY goal of information extraction AI systems?

A. Creating video games
B. Converting unstructured data into useful structured information
C. Compressing database files
D. Replacing all human decision-making

Correct Answer

B. Converting unstructured data into useful structured information

Explanation

Information extraction systems analyze unstructured content such as text, images, audio, and video to retrieve meaningful structured data.

Why the Other Answers Are Incorrect

A. Creating video games

This is unrelated to information extraction.

C. Compressing database files

This is a storage task, not AI extraction.

D. Replacing all human decision-making

AI systems are designed to assist and augment human processes, not completely replace all decision-making.

Final Thoughts

Information extraction is one of the most practical and widely used AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems extract useful insights from text, images, audio, and videos using NLP, speech AI, computer vision, and multimodal AI technologies.

These capabilities help organizations automate workflows, analyze large volumes of data, and build intelligent applications using Azure AI services.

Go to the AI-901 Exam Prep Hub main page