AI, AI-901, Microsoft Certification May 18, 2026

AI-901: Microsoft Azure AI Fundamentals – Practice Exam #2 (30 Questions)

Question 1

Which machine learning technique is BEST suited for predicting house prices?

A. Clustering
B. Regression
C. Object detection
D. Translation

Correct Answer

B. Regression

Explanation

Regression predicts continuous numeric values such as prices, temperatures, or sales forecasts.

Question 2

A company wants to automatically detect fraudulent credit-card transactions.

Which type of AI workload is MOST appropriate?

A. Classification
B. OCR
C. Image generation
D. Speech synthesis

HOTSPOT / MATCHING

Match each AI capability with its correct output.

Capability	Output
Speech synthesis	?
OCR	?
Sentiment analysis	?

Options:

Emotional tone
Spoken audio
Extracted text

Correct Answers

Capability	Output
Speech synthesis	Spoken audio
OCR	Extracted text
Sentiment analysis	Emotional tone

Question 6

Which type of AI model can generate entirely new images from text prompts?

A. Generative AI model
B. Regression model
C. Clustering model
D. Time-series model

Correct Answer

A. Generative AI model

Question 7

You need an AI solution that converts spoken customer calls into searchable transcripts.

Which capability should you use?

A. Speech recognition
B. Speech synthesis
C. OCR
D. Object detection

Correct Answer

A. Speech recognition

Question 8

MULTIPLE ANSWER

Which are common capabilities of computer vision solutions?

Select ALL that apply.

A. Object detection
B. Image classification
C. OCR
D. Language translation
E. Facial analysis

Correct Answers

A. Object detection
B. Image classification
C. OCR
E. Facial analysis

Question 9

What does an Azure AI endpoint provide?

A. A network-accessible location for interacting with an AI service
B. A physical monitor connection
C. A database backup
D. A printer configuration

Correct Answer

A. A network-accessible location for interacting with an AI service

Question 10

Question 16

Which Responsible AI principle focuses on ensuring AI systems work consistently and safely?

A. Reliability and safety
B. Transparency
C. Inclusiveness
D. Fairness

Correct Answer

A. Reliability and safety

Question 17

You deploy a model in Azure AI Foundry.

What is commonly required for applications to securely access the model?

A. Authentication credentials
B. A USB cable
C. A local printer
D. Spreadsheet macros

Correct Answer

A. Authentication credentials

Question 18

HOTSPOT / MATCHING

Match the workload to the correct scenario.

Scenario	Workload
Predicting future sales revenue	?
Detecting emotions in reviews	?
Identifying products in store images	?

Options:

Sentiment analysis
Regression
Object detection

Correct Answers

Scenario	Workload
Predicting future sales revenue	Regression
Detecting emotions in reviews	Sentiment analysis
Identifying products in store images	Object detection

Question 19

Which AI capability generates written descriptions of images?

A. Image captioning
B. OCR
C. Regression
D. Translation

Correct Answer

A. Image captioning

Question 20

Which statement about hallucinations in generative AI is TRUE?

A. Hallucinations are always intentional
B. Hallucinations are fabricated or inaccurate outputs
C. Hallucinations improve model accuracy
D. Hallucinations only occur in image models

Correct Answer

B. Hallucinations are fabricated or inaccurate outputs

Question 21

A retailer wants to group shoppers based on purchasing patterns without predefined categories.

Which machine learning technique should be used?

A. Clustering
B. Classification
C. OCR
D. Regression

Correct Answer

A. Clustering

Question 22

MULTIPLE ANSWER

Which tasks are examples of information extraction?

Select ALL that apply.

A. Extracting names from documents
B. Reading text from images
C. Detecting keywords in audio
D. Predicting stock prices
E. Identifying invoice totals

Correct Answers

A. Extracting names from documents
B. Reading text from images
C. Detecting keywords in audio
E. Identifying invoice totals

Question 23

Which Responsible AI principle emphasizes that humans remain responsible for AI outcomes?

A. Accountability
B. Fairness
C. Inclusiveness
D. Reliability

Correct Answer

A. Accountability

Question 24

FILL IN THE BLANK

__________ converts written text into spoken audio.

Correct Answer

Speech synthesis

Question 25

Which AI capability would BEST help visually impaired users understand photos?

A. Image captioning
B. Regression
C. Clustering
D. Forecasting

Correct Answer

A. Image captioning

Question 26

A customer-service solution automatically identifies whether callers are angry or satisfied.

Which AI capability is being used?

A. Sentiment analysis
B. OCR
C. Image classification
D. Forecasting

Correct Answer

A. Sentiment analysis

Question 27

MULTIPLE ANSWER

Which are advantages of using cloud-based Azure AI services?

Select ALL that apply.

A. Scalability
B. Reduced infrastructure management
C. Access to pretrained models
D. Elimination of all AI errors
E. Faster deployment

Correct Answers

A. Scalability
B. Reduced infrastructure management
C. Access to pretrained models
E. Faster deployment

Question 28

You need an AI solution that can analyze both spoken words and visual content from videos.

Which type of AI system is MOST appropriate?

A. Multimodal AI
B. Regression-only AI
C. Clustering-only AI
D. Spreadsheet automation AI

Correct Answer

A. Multimodal AI

Question 29

Which statement about APIs in Azure AI solutions is TRUE?

A. APIs allow applications to communicate with AI services
B. APIs physically store images
C. APIs replace authentication
D. APIs only work offline

Correct Answer

A. APIs allow applications to communicate with AI services

Question 30

SCENARIO-BASED QUESTION

A healthcare organization wants an AI application that:

Extracts text from medical forms
Converts doctor dictation into text
Identifies medical equipment in images
Summarizes patient notes

Which AI capabilities are required?

A. OCR, speech recognition, object detection, and text summarization
B. Forecasting and clustering only
C. Regression and translation only
D. Speech synthesis only

Correct Answer

A. OCR, speech recognition, object detection, and text summarization

Explanation

The scenario requires multiple AI workloads:

OCR for extracting text from forms
Speech recognition for doctor dictation
Object detection for medical equipment images
Text summarization for patient notes

Go to the AI-901 Exam Prep Hub main page

AI-901, Microsoft Certification May 18, 2026

AI-901: Microsoft Azure AI Fundamentals – Practice Exam #1 (30 Questions)

Question 1

Which type of AI workload is primarily used to predict future numeric values?

A. Computer vision
B. Regression
C. Classification
D. Natural language processing

Correct Answer

B. Regression

Explanation

Regression predicts continuous numeric values such as sales forecasts, temperatures, or stock prices.

Why the Other Answers Are Incorrect

A. Computer vision analyzes images and video.
C. Classification predicts categories rather than numeric values.
D. Natural language processing focuses on text and language.

Question 2

You need to determine whether customer feedback is positive, negative, or neutral.

Which AI capability should you use?

A. OCR
B. Object detection
C. Sentiment analysis
D. Speech synthesis

Correct Answer

C. Sentiment analysis

Explanation

Sentiment analysis evaluates emotional tone in text.

Question 3

Which Responsible AI principle focuses on ensuring AI systems treat people equitably?

A. Transparency
B. Fairness
C. Accountability
D. Reliability

Correct Answer

B. Fairness

Question 4

You are building a chatbot that answers customer questions.

Which type of AI workload is MOST appropriate?

A. Generative AI
B. Regression
C. Clustering
D. Forecasting

Correct Answer

A. Generative AI

Explanation

Generative AI models can generate human-like conversational responses.

Question 5

HOTSPOT / MATCHING

Match the AI capability to the correct scenario.

Scenario	Capability
Detecting handwritten text in scanned forms	?
Identifying objects in an image	?
Converting speech into text	?

Options:

OCR
Speech recognition
Object detection

Correct Answers

Scenario	Capability
Detecting handwritten text in scanned forms	OCR
Identifying objects in an image	Object detection
Converting speech into text	Speech recognition

Question 6

Which Azure AI capability generates spoken audio from text?

A. Speech recognition
B. Speech synthesis
C. OCR
D. Translation

Correct Answer

B. Speech synthesis

Question 7

You want to create an AI application that analyzes invoices and extracts totals and dates.

Which capability should you use?

A. Object detection
B. OCR and entity extraction
C. Speech synthesis
D. Classification only

Correct Answer

B. OCR and entity extraction

Explanation

Invoices contain text and structured information that can be extracted using OCR and entity extraction.

Question 8

MULTIPLE ANSWER

Which are common Responsible AI principles promoted by Microsoft?

Select ALL that apply.

A. Fairness
B. Transparency
C. Accountability
D. Exclusiveness
E. Reliability and safety

Correct Answers

A. Fairness
B. Transparency
C. Accountability
E. Reliability and safety

Explanation

Microsoft’s Responsible AI principles include:

Fairness
Reliability and safety
Privacy and security
Inclusiveness
Transparency
Accountability

Question 9

What is the PRIMARY purpose of a system prompt in generative AI?

A. To define the behavior and rules for the AI model
B. To increase internet speed
C. To encrypt databases
D. To replace APIs

Correct Answer

A. To define the behavior and rules for the AI model

Question 10

You need to identify cars, bicycles, and pedestrians in traffic-camera footage.

You are designing an AI solution for visually impaired users that describes images aloud.

Which capability is MOST appropriate?

A. Image captioning
B. Forecasting
C. Regression
D. Clustering

Correct Answer

A. Image captioning

Question 14

Which authentication method helps secure access to Azure AI services?

A. API keys
B. Printer drivers
C. HDMI cables
D. Browser bookmarks

Correct Answer

A. API keys

Question 15

MULTIPLE ANSWER

Which tasks are examples of natural language processing (NLP)?

Select ALL that apply.

A. Language translation
B. Sentiment analysis
C. Image classification
D. Text summarization
E. Entity extraction

Correct Answers

A. Language translation
B. Sentiment analysis
D. Text summarization
E. Entity extraction

Question 16

Which AI workload predicts categories such as “approved” or “denied”?

A. Regression
B. Classification
C. Clustering
D. Computer vision

Correct Answer

B. Classification

Question 17

You are using Azure AI Foundry to deploy a generative AI model.

What must happen before applications can interact with the model?

A. The model must be deployed to an endpoint
B. The model must be printed
C. The operating system must be replaced
D. The database must be deleted

Correct Answer

A. The model must be deployed to an endpoint

Question 18

HOTSPOT / MATCHING

Match each workload with the correct example.

Workload	Example
Speech AI	?
Computer Vision	?
Generative AI	?

Options:

Detecting objects in images
Generating marketing text
Transcribing audio recordings

Correct Answers

Workload	Example
Speech AI	Transcribing audio recordings
Computer Vision	Detecting objects in images
Generative AI	Generating marketing text

Question 19

What is a hallucination in generative AI?

A. A hardware failure
B. A networking issue
C. An incorrect or fabricated AI-generated response
D. A database backup

Correct Answer

C. An incorrect or fabricated AI-generated response

Question 20

Which factor can reduce speech-recognition accuracy?

A. Background noise
B. High-quality microphones
C. Clear pronunciation
D. Stable internet connections

Correct Answer

A. Background noise

Question 21

You need to group customers into segments based on purchasing behavior without predefined labels.

Which machine learning technique should you use?

A. Classification
B. Regression
C. Clustering
D. OCR

Correct Answer

C. Clustering

Question 22

MULTIPLE ANSWER

Which capabilities are associated with Azure AI Speech services?

Select ALL that apply.

A. Speech recognition
B. Speech synthesis
C. Translation
D. Object detection
E. Speaker identification

Correct Answers

A. Speech recognition
B. Speech synthesis
C. Translation
E. Speaker identification

Question 23

Which Responsible AI principle emphasizes explaining how AI systems make decisions?

A. Transparency
B. Privacy
C. Inclusiveness
D. Reliability

Correct Answer

A. Transparency

Question 24

FILL IN THE BLANK

__________ extracts machine-readable text from images and scanned documents.

Correct Answer

OCR

Optical Character Recognition

Question 25

A company wants to automatically summarize long customer-support conversations.

Which AI capability should they use?

A. Text summarization
B. Object detection
C. Forecasting
D. Regression

Correct Answer

A. Text summarization

Question 26

You need an AI system that can understand both images and text prompts.

Which type of model should you use?

A. Multimodal model
B. Regression model
C. Clustering model
D. Time-series model

Correct Answer

A. Multimodal model

Question 27

MULTIPLE ANSWER

Which are benefits of cloud-based AI services?

Select ALL that apply.

A. Scalability
B. Reduced infrastructure management
C. Automatic access to pretrained models
D. Elimination of all security concerns
E. Faster deployment

Correct Answers

A. Scalability
B. Reduced infrastructure management
C. Automatic access to pretrained models
E. Faster deployment

Question 28

You are creating a lightweight application that sends images to Azure AI services for analysis.

How does the application typically communicate with the service?

A. Through APIs and endpoints
B. Through printer drivers
C. Through USB storage devices
D. Through monitor settings

Correct Answer

A. Through APIs and endpoints

Question 29

Which AI capability is MOST useful for detecting the emotional tone of customer reviews?

A. OCR
B. Sentiment analysis
C. Image classification
D. Speech synthesis

Correct Answer

B. Sentiment analysis

Question 30

SCENARIO-BASED QUESTION

A retail company wants an AI solution that:

Extracts text from receipts
Detects products in shelf images
Analyzes customer-service calls
Generates chatbot responses

Which AI workloads are required?

A. OCR, object detection, speech AI, and generative AI
B. Regression only
C. Classification only
D. Forecasting and clustering only

Correct Answer

A. OCR, object detection, speech AI, and generative AI

Explanation

The scenario requires multiple AI capabilities:

OCR for receipt text extraction
Object detection for shelf-image analysis
Speech AI for customer-call analysis
Generative AI for chatbot responses

AI, AI-901, azure, Microsoft Certification May 18, 2026

Build a lightweight application with Information Extraction capabilities by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Build a lightweight application with Information Extraction capabilities by using Content Understanding

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern organizations often need applications that can automatically extract information from documents, images, audio, and video. Azure AI services and Microsoft Foundry tools make it possible to create lightweight applications that use AI-powered content understanding without requiring advanced machine learning expertise.

For the AI-901 certification exam, candidates should understand the foundational concepts involved in building lightweight applications with information extraction capabilities by using Azure Content Understanding and Microsoft Foundry.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Information Extraction?

Information extraction is the process of automatically identifying and retrieving useful data from content.

AI systems can extract information from:

Documents
Images
Audio
Video
Text

Examples include:

Names
Dates
Invoice totals
Keywords
Objects
Spoken words

What Is Azure Content Understanding?

Azure Content Understanding enables AI-powered analysis of different types of content.

Capabilities include:

OCR (Optical Character Recognition)
Speech recognition
Entity extraction
Image analysis
Video analysis
Classification
Caption generation

What Is a Lightweight Application?

A lightweight application is a simple application that performs focused tasks using cloud-based AI services.

Characteristics include:

Minimal infrastructure
API-based communication
Rapid development
Simple user interface
Cloud-hosted AI processing

For AI-901, candidates should understand concepts and workflows rather than advanced coding details.

Azure AI Foundry

Azure AI Foundry provides tools for building and testing AI applications.

Developers can:

Access AI models
Configure services
Test prompts
Analyze content
Build AI-powered workflows

Common Information Extraction Capabilities

OCR (Optical Character Recognition)

OCR extracts text from images and scanned documents.

Example

Input

Photo of a receipt

Output

Store name
Total amount
Purchase date

Entity Extraction

AI systems can identify important entities within content.

Examples of Entities

Names
Locations
Organizations
Phone numbers
Dates

Speech Recognition

Speech recognition converts spoken language into text.

Example

Input

Customer support call recording

Output

Searchable transcript

Object Detection

Object detection identifies objects within images or video.

Example

A warehouse-monitoring application may detect:

Boxes
Forklifts
Employees

Sentiment Analysis

Sentiment analysis determines emotional tone.

Example

Customer feedback classified as:

Positive
Neutral
Negative

Typical Lightweight Application Workflow

A lightweight information-extraction application often follows these steps:

User uploads content
Application sends content to Azure AI service
AI analyzes content
Structured results are returned
Application displays extracted information

Example Workflow

User uploads:

Image
PDF
Audio file
Video file

AI extracts:

Text
Keywords
Objects
Entities
Captions

APIs and Endpoints

Applications communicate with Azure AI services through:

APIs
Endpoints

The application sends content to the AI service and receives structured results.

Authentication

Applications must authenticate securely before using Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

Example High-Level Pseudocode

			
content = upload_file()
results = analyze_content(content)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Structured Outputs

AI systems often return structured data formats such as:

JSON
Tables
Lists
Metadata

Structured outputs make integration easier.

Example JSON-Like Output

			
{
  "invoiceNumber": "INV-1001",
  "date": "2026-05-15",
  "total": "$245.99"
}

		

Common Real-World Scenarios

Scenario 1: Invoice Processing

Goal

Automatically extract invoice data.

Extracted Information

Vendor name
Invoice number
Total amount
Due date

Scenario 2: Customer Service Analytics

Goal

Analyze customer interactions.

Extracted Information

Topics
Sentiment
Keywords
Transcripts

Scenario 3: Healthcare Document Analysis

Goal

Extract information from medical documents.

Extracted Information

Patient names
Dates
Medical terms

Scenario 4: Media Monitoring

Goal

Analyze audio and video content.

Extracted Information

Captions
Objects
Speakers
Keywords

Responsible AI Considerations

Information-extraction applications should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Content may contain:

Personal information
Financial records
Medical data
Private conversations

Organizations should secure sensitive data appropriately.

Fairness and Bias

AI systems may perform differently across:

Languages
Accents
Demographics
Image quality
Environmental conditions

Testing and evaluation are important.

Transparency

Users should understand:

AI is analyzing their content
AI-generated outputs may contain errors
Human review may still be needed

Accuracy Limitations

Information-extraction systems may struggle with:

Blurry images
Poor audio quality
Handwritten text
Background noise
Low-resolution files

Hallucinations and Errors

AI systems may occasionally:

Extract incorrect information
Misidentify objects
Misinterpret speech
Generate inaccurate summaries

Applications should validate important outputs.

Error Handling

Applications should handle:

Unsupported file formats
Corrupted files
Authentication failures
Network interruptions
Rate limits

Advantages of Lightweight AI Applications

Benefits include:

Rapid deployment
Reduced development complexity
Scalability
Automation
Faster information processing

Limitations of Lightweight AI Applications

Challenges include:

Dependence on cloud services
Accuracy limitations
Privacy concerns
Potential bias
Environmental variability

Multimodal AI

Modern AI systems can combine:

Text
Speech
Vision
Generative AI

These systems can process multiple content types together.

High-Level Architecture

A simplified architecture often includes:

User uploads content
Application sends content to Azure AI service
AI analyzes content
Structured results are returned
Application displays extracted information

Important AI-901 Exam Tips

For the exam, remember these key points:

Information extraction retrieves useful data from content.
OCR extracts text from images and documents.
Speech recognition converts speech into text.
Object detection identifies objects within images or video.
APIs and endpoints connect applications to Azure AI services.
Authentication secures access to AI resources.
Structured outputs often use JSON-like formats.
Responsible AI principles apply to information extraction systems.
Poor-quality content can reduce accuracy.
Hallucinations are inaccurate AI-generated outputs.
Azure AI Foundry supports AI application development.

Quick Knowledge Check

Practice Exam Questions

D. Operating system crashes

This is unrelated to AI hallucinations.

Final Thoughts

Building lightweight applications with information extraction capabilities is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, speech recognition, APIs, authentication, structured outputs, Responsible AI principles, and lightweight AI workflows.

Azure AI services and Azure AI Foundry provide powerful tools for creating scalable applications capable of extracting valuable information from text, images, audio, video, and documents.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, azure, Microsoft Certification May 18, 2026

Extract information from audio and video by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Extract information from audio and video by using Content Understanding

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Organizations increasingly rely on AI systems to analyze audio and video content for automation, accessibility, security, analytics, and customer experiences. AI-powered content understanding solutions can extract valuable information from spoken language, sounds, images, and moving video streams.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from audio and video by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Content Understanding?

Content understanding refers to AI systems analyzing and interpreting different forms of content, including:

Audio
Video
Images
Documents
Text

AI systems can identify patterns, extract information, and generate useful insights.

Azure Content Understanding

Azure Content Understanding enables AI-powered analysis of multimedia content.

Capabilities include:

Speech recognition
Video analysis
Speaker identification
Caption generation
Object detection
Keyword extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI applications.

Developers can:

Deploy AI services
Process multimedia content
Build lightweight applications
Test AI workflows

Audio Information Extraction

AI systems can analyze audio files to extract useful information.

Examples include:

Spoken words
Speaker identity
Keywords
Emotions
Language detection

Speech Recognition

Speech recognition converts spoken language into text.

Example

Input

Audio recording of a meeting

Output

Positive
Neutral
Negative

Video Information Extraction

Video analysis combines:

Audio analysis
Image analysis
Motion analysis

Common Video Analysis Capabilities

AI systems may perform:

Object detection
Facial analysis
Activity recognition
Scene description
Text extraction
Caption generation

Object Detection in Video

AI systems can identify objects appearing in video frames.

Example

A traffic-monitoring system may detect:

Cars
Trucks
Pedestrians
Traffic lights

Scene Detection

AI systems can identify scene changes within videos.

Example

A sports video may identify:

Game start
Replay segments
Commercial breaks

Video Captioning

AI systems can generate descriptions or subtitles for videos.

Example

A training video may automatically generate captions for accessibility.

Optical Character Recognition (OCR) in Video

AI systems can extract text appearing in video frames.

Example

A video may contain:

Street signs
License plates
Product labels

APIs and Endpoints

Applications communicate with Azure AI services using:

APIs
Endpoints

Audio and video content is submitted programmatically for analysis.

Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

Lightweight Application Workflow

A typical workflow includes:

User uploads audio or video
Application sends content to AI service
AI analyzes multimedia content
Results are returned
Application displays extracted information

Example High-Level Pseudocode

			
media = upload_media()
results = analyze_media(media)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Common Real-World Scenarios

Scenario 1: Meeting Transcription

Goal

Convert meeting audio into searchable text.

Features

Speech recognition
Speaker identification
Keyword extraction

Scenario 2: Call Center Analytics

Goal

Analyze customer service calls.

Features

Sentiment analysis
Topic extraction
Call summarization

Scenario 3: Security Monitoring

Goal

Analyze surveillance video.

Features

Object detection
Activity recognition
Facial analysis

Scenario 4: Video Accessibility

Goal

Improve accessibility for multimedia content.

Features

Caption generation
Speech transcription
Scene descriptions

Responsible AI Considerations

Audio and video AI systems should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Audio and video may contain:

Personal conversations
Faces
Biometric data
Sensitive information

Organizations should protect multimedia data appropriately.

Fairness and Bias

Speech and video systems may perform differently across:

Languages
Accents
Dialects
Lighting conditions
Demographics

Testing and evaluation are important.

Transparency

Users should understand:

AI is analyzing multimedia content
AI-generated outputs may contain errors
Human review may still be needed

Accuracy Limitations

Audio and video analysis systems may struggle with:

Background noise
Poor audio quality
Low-resolution video
Obstructed visuals
Multiple overlapping speakers

Hallucinations and Errors

AI systems may occasionally:

Misidentify speakers
Generate inaccurate captions
Misinterpret speech
Detect nonexistent objects

Applications should validate important outputs.

Error Handling

Applications should handle:

Unsupported file formats
Corrupted media files
Authentication failures
Network interruptions
Rate limits

Advantages of Multimedia Information Extraction

Benefits include:

Automation
Faster analysis
Improved accessibility
Searchable content
Scalable processing

Limitations of Multimedia Information Extraction

Challenges include:

Privacy concerns
Accuracy limitations
Bias
Environmental variability
Ethical considerations

Multimodal AI

Modern AI systems may combine:

Speech
Vision
Text
Generative AI

These systems can:

Analyze multimedia content
Answer questions
Generate summaries
Create captions and descriptions

High-Level Architecture

A simplified architecture often includes:

User uploads audio/video
Application sends media to Azure AI service
AI processes multimedia content
Structured results are returned
Application displays extracted information

Important AI-901 Exam Tips

For the exam, remember these key points:

Speech recognition converts speech to text.
Speaker identification distinguishes speakers.
Sentiment analysis detects emotional tone.
OCR can extract text from video frames.
Object detection identifies objects in video.
APIs and endpoints connect applications to AI services.
Authentication secures AI resources.
Responsible AI principles apply to multimedia AI systems.
Poor audio or video quality can reduce accuracy.
Hallucinations are inaccurate AI-generated outputs.
Azure AI Foundry supports multimedia AI application development.

Quick Knowledge Check

Practice Exam Questions

D. Speaker hardware malfunctions

This is a hardware problem, not an AI hallucination.

Final Thoughts

Extracting information from audio and video by using Content Understanding is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as speech recognition, video analysis, OCR, APIs, authentication, Responsible AI principles, and lightweight multimedia-analysis workflows.

Azure AI services and Azure AI Foundry provide powerful tools for building intelligent multimedia applications capable of understanding spoken language, video content, and visual information at scale.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Artificial Intelligence (AI), Microsoft Certification May 18, 2026May 18, 2026

Extract information from images by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Extract information from images by using Content Understanding

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern AI systems can analyze images and extract meaningful information automatically. Organizations use image analysis solutions for automation, accessibility, security, healthcare, retail, and business intelligence.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from images by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Image Information Extraction?

Image information extraction is the process of analyzing images to identify and retrieve useful information.

AI systems can detect:

Text
Objects
Faces
Colors
Products
Landmarks
Visual patterns

What Is Azure Content Understanding?

Azure Content Understanding enables AI systems to interpret and analyze content such as:

Images
Documents
Audio
Video

Capabilities include:

OCR
Object detection
Classification
Caption generation
Metadata extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

Access AI models
Analyze images
Build lightweight applications
Test AI workflows

Common Image Extraction Techniques

Optical Character Recognition (OCR)

OCR extracts text from images.

Example

Image

Photo of a street sign

OCR Output

“Main Street”

Object Detection

Object detection identifies objects and their locations within images.

Example

Detected Objects

Car
Bicycle
Traffic light
Person

Image Classification

Image classification determines the overall category of an image.

Example

Image

Photo of a cat

Classification

“Cat”

Facial Analysis

AI systems can analyze facial characteristics.

Capabilities may include:

Face detection
Emotion analysis
Age estimation

Responsible AI considerations are especially important for facial-analysis systems.

Image Captioning

Image captioning generates natural-language descriptions of images.

Example

Image

A dog running on a beach

Caption

“A brown dog running along a sandy beach.”

Metadata Extraction

AI systems can extract metadata and contextual information from images.

Examples include:

Time
Location
Camera details
Image dimensions

Barcode and QR Code Detection

AI systems can identify and decode:

Barcodes
QR codes

Example

Retail applications may scan product barcodes for inventory management.

APIs and Endpoints

Applications communicate with Azure AI services using:

APIs
Endpoints

Images are submitted programmatically for analysis.

Authentication

Applications must securely authenticate before accessing AI services.

Common methods include:

API keys
Azure credentials
Managed identities

Lightweight Application Workflow

A typical workflow includes:

User uploads image
Application sends image to AI service
AI analyzes image
Results are returned
Application displays extracted information

Example High-Level Pseudocode

			
image = upload_image()
results = analyze_image(image)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Common Real-World Scenarios

Scenario 1: Receipt Scanner

Goal

Extract purchase details from receipt images.

Features

OCR
Table extraction
Total amount detection

Scenario 2: Accessibility Assistant

Goal

Describe images for visually impaired users.

Features

Image captioning
OCR
Object detection

Scenario 3: Retail Inventory

Goal

Identify products from shelf images.

Features

Barcode scanning
Object detection
Classification

Scenario 4: Traffic Monitoring

Goal

Analyze roadway images.

Features

Vehicle detection
Traffic analysis
License plate reading

Responsible AI Considerations

Image-analysis applications should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Images may contain:

Faces
Personal information
License plates
Sensitive documents

Organizations should protect image data appropriately.

Fairness and Bias

Vision systems may perform differently across:

Lighting conditions
Skin tones
Environmental conditions
Camera quality

Testing and evaluation are important.

Transparency

Users should understand:

AI is analyzing images
AI-generated outputs may contain errors
Images may be processed in the cloud

Accuracy Limitations

Image extraction systems may struggle with:

Blurry images
Poor lighting
Obstructed objects
Low-resolution images

Hallucinations and Errors

AI systems may occasionally:

Misidentify objects
Generate incorrect captions
Extract inaccurate text

Applications should validate important outputs.

Error Handling

Applications should handle:

Unsupported image formats
Corrupted files
Authentication failures
Network interruptions
Rate limits

Advantages of Image Extraction AI

Benefits include:

Faster processing
Automation
Scalability
Accessibility improvements
Reduced manual work

Limitations of Image Extraction AI

Challenges include:

Accuracy limitations
Bias
Privacy concerns
Environmental variability
Ethical considerations

Multimodal AI

Some modern AI systems combine:

Vision
Text
Speech
Generative AI

These systems can:

Analyze images
Answer visual questions
Generate descriptions
Create new content

High-Level Architecture

A simplified architecture often includes:

User uploads image
Application sends image to Azure AI service
AI processes image
Structured results are returned
Application displays information

Important AI-901 Exam Tips

For the exam, remember these key points:

OCR extracts text from images.
Object detection identifies objects and locations.
Image classification categorizes images.
Image captioning generates natural-language descriptions.
APIs and endpoints connect applications to AI services.
Authentication secures access to AI resources.
Responsible AI principles apply to image-analysis systems.
Poor image quality can reduce accuracy.
Hallucinations are inaccurate AI-generated outputs.
Azure AI Foundry supports AI application development.

Quick Knowledge Check

Practice Exam Questions

D. Audio recording problems

This is unrelated to image-analysis systems.

Final Thoughts

Extracting information from images by using Content Understanding is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, object detection, image classification, APIs, authentication, Responsible AI principles, and lightweight image-analysis workflows.

Azure AI services and Azure AI Foundry provide powerful tools for building scalable AI applications capable of understanding and extracting valuable information from visual content.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Computer Vision, Microsoft Certification May 18, 2026

Build a lightweight application that includes vision capabilities (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
      --> Build a lightweight application that includes vision capabilities

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Computer vision enables AI systems to interpret and analyze visual information such as images and videos. Organizations use computer vision solutions for automation, accessibility, security, analytics, and customer experiences.

For the AI-901 certification exam, candidates should understand the foundational concepts behind building lightweight applications that include vision capabilities by using Microsoft Azure AI services and Azure AI Foundry.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.

What Is Computer Vision?

Computer vision is a field of AI that enables systems to analyze and understand visual information.

Visual data may include:

Images
Videos
Scanned documents
Camera feeds

Common Computer Vision Tasks

Computer vision systems commonly perform:

Image classification
Object detection
Optical character recognition (OCR)
Facial analysis
Image captioning
Content moderation

Azure AI Vision

Azure AI Vision provides computer vision capabilities through cloud-based AI services.

Features include:

Image analysis
OCR
Object detection
Image captioning
Facial attribute analysis

What Is a Lightweight Application?

A lightweight application is a simple application designed to perform focused tasks with minimal complexity and infrastructure.

Characteristics include:

Simple user interface
Fast deployment
Minimal resource usage
Easy maintenance

Examples of Lightweight Vision Applications

Examples include:

Image analysis tools
Receipt scanning apps
Accessibility assistants
Product recognition apps
Photo-tagging systems

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

Access AI models
Deploy services
Test prompts
Build AI workflows

Facial Analysis

Computer vision systems can analyze facial features.

Possible capabilities include:

Face detection
Emotion analysis
Age estimation

For Responsible AI reasons, facial recognition and identification systems require careful consideration.

APIs and Endpoints

Applications communicate with Azure AI services using:

APIs
Endpoints

These allow images to be analyzed programmatically.

Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

User Interface Components

A lightweight vision application may include:

Image upload area
Camera capture button
Results display
Image preview

Real-Time Image Processing

Some applications process images in near real time.

Examples include:

Security monitoring
Live object detection
Accessibility tools

Example Workflow

A common workflow includes:

User uploads image
Application sends image to Azure AI Vision
AI service analyzes image
Results are returned
Application displays findings

Example High-Level Pseudocode

			
image = upload_image()
results = analyze_image(image)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Common Real-World Scenarios

Scenario 1: Receipt Scanner

Goal

Extract purchase information from receipts.

Features

OCR
Text extraction
Data organization

Scenario 2: Accessibility Assistant

Goal

Describe images for visually impaired users.

Features

Image captioning
OCR
Spoken descriptions

Scenario 3: Product Recognition

Goal

Identify products from photos.

Features

Object detection
Classification
Product lookup

Scenario 4: Content Moderation

Goal

Identify harmful or inappropriate images.

Features

Image analysis
Safety detection
Automated filtering

Responsible AI Considerations

Vision-enabled applications should follow Responsible AI principles.

Key considerations include:

Fairness
Privacy
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Images may contain:

Personal data
Faces
Sensitive documents
Location information

Organizations should protect visual data appropriately.

Bias and Fairness

Computer vision systems may perform unevenly across:

Skin tones
Lighting conditions
Demographics
Environmental conditions

Testing and evaluation are important for fairness.

Transparency

Users should understand:

AI is analyzing images
AI-generated results may contain errors
Images may be processed in the cloud

Hallucinations and Errors

Vision systems may occasionally generate:

Incorrect captions
False detections
Inaccurate classifications

These incorrect outputs are sometimes called hallucinations.

Error Handling

Applications should handle:

Invalid image formats
Poor image quality
Authentication failures
Network interruptions
Rate limits

Image Quality Challenges

Computer vision accuracy can decrease with:

Blurry images
Poor lighting
Low resolution
Obstructed objects

Advantages of Vision Applications

Benefits include:

Automation
Faster analysis
Accessibility improvements
Improved customer experiences
Scalable image processing

Limitations of Vision Applications

Challenges include:

Recognition inaccuracies
Bias
Privacy concerns
Variable image quality
Ethical considerations

High-Level Architecture

A simplified architecture often includes:

User interface
Image upload/capture
Azure AI Vision service
AI analysis
Results display

Generative Vision Capabilities

Some modern systems combine:

Computer vision
Generative AI

These multimodal systems can:

Analyze images
Generate descriptions
Answer visual questions
Create new images

Important AI-901 Exam Tips

For the exam, remember these key points:

Computer vision analyzes visual information.
Azure AI Vision provides computer vision capabilities.
OCR extracts text from images.
Object detection identifies multiple objects in images.
Image captioning generates natural-language image descriptions.
APIs and endpoints connect applications to Azure AI services.
Authentication secures service access.
Responsible AI principles apply to computer vision systems.
Image quality affects AI accuracy.
Hallucinations are inaccurate AI-generated outputs.

Quick Knowledge Check

This is unrelated to AI vision systems.

Final Thoughts

Building lightweight applications with vision capabilities is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand the foundational concepts behind computer vision applications, including image classification, object detection, OCR, APIs, authentication, Responsible AI principles, and real-world implementation workflows.

Azure AI Vision and Azure AI Foundry provide powerful cloud-based tools that make it easier to build intelligent applications capable of analyzing and understanding visual information.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Artificial Intelligence (AI), azure, Microsoft Certification May 18, 2026May 18, 2026

Extract information from documents and forms by using Azure Content Understanding in Foundry Tools (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for information extraction by using Foundry
      --> Extract information from documents and forms by using Azure Content Understanding in Foundry Tools

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Organizations process enormous amounts of documents every day, including invoices, receipts, forms, contracts, and identification documents. AI-powered information extraction solutions help automate the process of reading, understanding, and organizing document data.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from documents and forms by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.

What Is Information Extraction?

Information extraction is the process of identifying and retrieving useful data from documents, images, forms, audio, or other content.

Examples include extracting:

Names
Dates
Invoice totals
Addresses
Phone numbers
Product information

What Is Azure Content Understanding?

Azure Content Understanding helps AI systems analyze and interpret structured and unstructured documents.

Capabilities include:

Text extraction
Form recognition
Document analysis
Information classification
Key-value pair extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

Configure AI services
Process documents
Test extraction workflows
Build lightweight AI applications

Structured vs. Unstructured Documents

Structured Documents

Structured documents follow a consistent layout.

Examples include:

Tax forms
Invoices
Receipts
Application forms

Unstructured Documents

Unstructured documents have less predictable layouts.

Examples include:

Emails
Letters
Articles
Contracts

Optical Character Recognition (OCR)

OCR converts text within images or scanned documents into machine-readable text.

Example

Input

Scanned receipt image

OCR Output

Store name
Date
Total amount

Form Recognition

Form recognition identifies fields and values within forms.

Example

Form

Insurance application

Extracted Data

Customer name
Policy number
Address
Claim amount

Key-Value Pair Extraction

AI systems can identify relationships between labels and values.

Example

Key	Value
Invoice Number	INV-1045
Total	$250.00
Due Date	05/30/2026

Table Extraction

AI can identify and extract tables from documents.

Example

A receipt table may contain:

Item names
Quantities
Prices

Classification

Document classification identifies the type of document being processed.

Example

The system determines whether a file is:

Invoice
Contract
Receipt
Resume

Named Entity Recognition (NER)

NER identifies important entities within text.

Entities may include:

People
Organizations
Locations
Dates

Example

Text

“John Smith works for Contoso in Seattle.”

Extracted Entities

John Smith (Person)
Contoso (Organization)
Seattle (Location)

APIs and Endpoints

Applications communicate with Azure AI services through:

APIs
Endpoints

Documents are submitted for analysis programmatically.

Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

Lightweight Application Workflow

A typical workflow includes:

User uploads document
Application sends file to AI service
AI extracts information
Results are returned
Application displays or stores extracted data

Example Workflow

Input

Scanned invoice

AI Processing

OCR
Key-value extraction
Table analysis

Output

Structured invoice data

Example High-Level Pseudocode

			
document = upload_document()
results = analyze_document(document)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Common Real-World Scenarios

Scenario 1: Invoice Processing

Goal

Automate invoice data extraction.

Features

OCR
Table extraction
Total amount detection

Scenario 2: Receipt Scanning

Goal

Extract purchase information from receipts.

Features

Text extraction
Merchant identification
Expense categorization

Scenario 3: Resume Processing

Goal

Extract candidate information from resumes.

Features

Name extraction
Skill identification
Contact information detection

Scenario 4: Healthcare Forms

Goal

Digitize patient records.

Features

Form recognition
Key-value extraction
Classification

Responsible AI Considerations

Document-processing applications should follow Responsible AI principles.

Key considerations include:

Privacy
Security
Fairness
Transparency
Accountability
Inclusiveness

Privacy Concerns

Documents may contain:

Personal information
Financial data
Medical information
Legal records

Organizations should protect sensitive data appropriately.

Security Considerations

Applications should secure:

Uploaded files
Stored documents
API credentials
Extracted data

Transparency

Users should understand:

AI is analyzing documents
Extracted data may contain errors
Human review may still be needed

Accuracy Limitations

AI extraction systems may struggle with:

Poor scan quality
Handwritten text
Complex layouts
Damaged documents

Hallucinations and Errors

AI systems may occasionally:

Extract incorrect values
Miss fields
Misclassify documents

Applications should validate important information.

Error Handling

Applications should handle:

Unsupported file formats
Corrupted documents
Authentication failures
Network interruptions
Rate limits

Advantages of Information Extraction AI

Benefits include:

Faster document processing
Reduced manual entry
Improved scalability
Increased automation
Better searchability

Limitations of Information Extraction AI

Challenges include:

Variable document quality
Handwriting recognition difficulties
Inconsistent layouts
Privacy concerns
Extraction inaccuracies

Generative AI and Information Extraction

Some modern systems combine:

OCR
Document intelligence
Generative AI

This enables:

Summarization
Question answering
Conversational document analysis

High-Level Architecture

A simplified architecture often includes:

User uploads document
Application sends document to Azure AI service
AI analyzes content
Structured data is returned
Application displays or stores results

Important AI-901 Exam Tips

For the exam, remember these key points:

OCR extracts text from documents and images.
Form recognition identifies fields and values.
Key-value extraction identifies label-value relationships.
Table extraction retrieves structured table data.
Classification identifies document types.
APIs and endpoints connect applications to Azure AI services.
Authentication secures access to AI resources.
Responsible AI principles apply to document-processing systems.
Poor document quality can reduce extraction accuracy.
AI-generated outputs may still require validation.

Quick Knowledge Check

Practice Exam Questions

D. Azure DNS

This is a networking service.

Question 5

What is key-value pair extraction?

A. Identifying labels and their associated values in documents
B. Encrypting document files
C. Compressing image sizes
D. Converting audio into text

Correct Answer

A. Identifying labels and their associated values in documents

Explanation

Key-value extraction identifies relationships such as:

Invoice Number → INV-1045
Total → $250.00

Why the Other Answers Are Incorrect

B. Increasing monitor brightness

This is unrelated to Responsible AI.

C. Improving printer speed

This is unrelated to document intelligence.

D. Reducing spreadsheet file size

This is unrelated to AI ethics or privacy.

Final Thoughts

Extracting information from documents and forms using Azure Content Understanding and Foundry tools is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, form recognition, document analysis, APIs, authentication, Responsible AI principles, and lightweight document-processing workflows.

Azure AI services and Azure AI Foundry provide powerful tools for automating information extraction and improving efficiency across business, healthcare, finance, and administrative scenarios.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Generative AI, Microsoft Certification May 18, 2026

Create new visual outputs by using generative models (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
      --> Create new visual outputs by using generative models

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Generative AI models are capable of creating entirely new content based on patterns learned during training. One important category of generative AI focuses on producing visual outputs such as images, artwork, diagrams, and design concepts.

For the AI-901 certification exam, candidates should understand the foundational concepts behind creating new visual outputs by using generative AI models through Microsoft Azure AI Foundry and related Azure AI services.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.

What Is Generative AI?

Generative AI refers to AI systems capable of creating new content rather than simply analyzing existing data.

Generative AI can produce:

Text
Images
Audio
Video
Code

What Are Generative Image Models?

Generative image models create new visual content from prompts or instructions.

These models can generate:

Artwork
Illustrations
Photorealistic images
Concept designs
Marketing graphics

Example Prompt

“Create an image of a futuristic city at sunset.”

The model generates a new image based on the description.

Azure AI Foundry

Azure AI Foundry provides tools for building and deploying AI-powered applications, including generative AI solutions.

Developers can:

Access generative models
Test prompts
Deploy models
Build AI applications

Image Generation Workflow

A common image-generation workflow includes:

User enters prompt
Application sends prompt to model
Generative model creates image
Application displays generated output

Text-to-Image Generation

Text-to-image models generate images from natural-language prompts.

Example

Prompt

“A golden retriever wearing sunglasses on a beach.”

Result

A newly generated image matching the description.

Image Editing

Some generative models can modify existing images.

Capabilities may include:

Removing objects
Replacing backgrounds
Extending images
Applying artistic styles

Example

Original Image

Photo of a park

Prompt

“Add snow to the scene.”

The model generates an updated version of the image.

Style Transfer

Style transfer applies artistic styles to images.

Example

Prompt

“Make this image look like a watercolor painting.”

The AI transforms the image style.

Inpainting

Inpainting fills missing or selected portions of images.

Example

A damaged image has missing areas that the AI reconstructs.

Outpainting

Outpainting expands images beyond their original boundaries.

Example

A cropped landscape image is extended to show more scenery.

Prompt Engineering

Prompt engineering involves crafting prompts that improve AI-generated results.

Good prompts are:

Clear
Detailed
Specific

Weak Prompt Example

“Create a dog.”

Better Prompt Example

“Create a realistic golden retriever sitting beside a lake during sunset.”

System Prompts

System prompts guide the overall behavior of the AI model.

They may define:

Safety rules
Content restrictions
Tone
Style preferences

Model Parameters

Generative AI models may use parameters that influence output behavior.

Common concepts include:

Creativity/randomness
Response length
Style guidance

For AI-901, conceptual understanding is more important than memorizing exact parameter names.

APIs and Endpoints

Applications communicate with deployed generative models using:

APIs
Endpoints

These allow prompts and images to be processed programmatically.

Authentication

Applications must securely authenticate before using Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

User Interface Components

A lightweight image-generation application may include:

Prompt text box
Image upload option
Generate button
Image display area

Real-Time Generation

Some applications generate images interactively in near real time.

This improves user experience and experimentation.

Common Real-World Scenarios

Scenario 1: Marketing Content Creation

Goal

Generate promotional graphics.

Features

Text-to-image generation
Brand-aligned designs
Rapid content creation

Scenario 2: Product Concept Design

Goal

Visualize product ideas.

Features

Prototype generation
Style experimentation
Rapid iteration

Scenario 3: Educational Content

Goal

Generate learning visuals and illustrations.

Features

Diagram generation
Visual storytelling
Accessibility support

Scenario 4: Entertainment and Gaming

Goal

Create concept art and environments.

Features

Character design
Landscape generation
Artistic experimentation

Responsible AI Considerations

Generative image applications should follow Responsible AI principles.

Key considerations include:

Fairness
Privacy
Transparency
Inclusiveness
Accountability
Security

Copyright and Intellectual Property

Organizations should consider:

Ownership rights
Licensing concerns
Use of copyrighted material

Generated content may still raise legal and ethical questions.

Harmful Content Risks

Generative AI systems may create:

Offensive content
Misleading images
Unsafe material

Content filtering and moderation are important safeguards.

Deepfakes

AI-generated images or videos designed to imitate real people are called deepfakes.

Deepfakes can create ethical and security concerns.

Hallucinations

Generative models may produce inaccurate or unrealistic outputs.

These incorrect outputs are called hallucinations.

Bias and Fairness

Generated images may unintentionally reflect societal biases.

Examples include:

Stereotypical portrayals
Uneven representation
Cultural bias

Transparency

Users should understand:

AI generated the image
Outputs may contain inaccuracies
Images may be synthetic rather than real

Error Handling

Applications should handle:

Invalid prompts
Unsupported file types
Network interruptions
Authentication failures
Rate limits

Advantages of Generative Image Models

Benefits include:

Faster content creation
Creative assistance
Rapid prototyping
Automation
Enhanced user engagement

Limitations of Generative Models

Challenges include:

Hallucinations
Bias
Ethical concerns
Copyright uncertainty
Variable output quality

High-Level Workflow

A simplified workflow includes:

User enters prompt
Application sends request
Model generates image
Application displays output

Example High-Level Pseudocode

			
prompt = get_prompt()
image = generate_image(prompt)
display_image(image)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Important AI-901 Exam Tips

For the exam, remember these key points:

Generative AI creates new content.
Text-to-image models generate images from prompts.
Azure AI Foundry supports generative AI development.
Prompt engineering improves output quality.
APIs and endpoints connect applications to AI services.
Authentication secures access to Azure AI resources.
Deepfakes are synthetic media designed to imitate real people.
Hallucinations are inaccurate AI-generated outputs.
Responsible AI principles apply to generative image systems.
Transparency is important when presenting AI-generated content.

Quick Knowledge Check

This is unrelated to generative image models.

Final Thoughts

Creating new visual outputs by using generative models is an important AI-901 certification topic. Microsoft expects candidates to understand the foundational concepts behind generative image AI, including text-to-image generation, prompt engineering, APIs, deployment, Responsible AI principles, hallucinations, and ethical considerations.

Azure AI Foundry provides powerful tools for building intelligent applications capable of generating creative visual content for business, education, accessibility, and entertainment scenarios.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Artificial Intelligence (AI), Computer Vision, Microsoft Certification May 18, 2026

Interpret visual input in prompts by using a deployed multimodal model (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
      --> Interpret visual input in prompts by using a deployed multimodal model

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern AI systems are increasingly capable of understanding not only text and speech, but also visual information such as images and videos. Multimodal AI models combine multiple forms of input to generate intelligent responses and insights.

For the AI-901 certification exam, candidates should understand the foundational concepts behind interpreting visual input in prompts by using deployed multimodal models through Microsoft Azure AI Foundry and related Azure AI services.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.

What Is a Multimodal Model?

A multimodal model is an AI model capable of processing multiple types of input and output.

These modalities may include:

Text
Images
Speech/audio
Video

Multimodal models can combine information across different input types to generate responses.

What Is Visual Input?

Visual input refers to image or video data provided to an AI system.

Examples include:

Photographs
Screenshots
Documents
Charts
Diagrams
Videos

Example Visual Prompt

A user uploads a photo and asks:

“What objects are visible in this image?”

The AI analyzes the visual content and generates a response.

Computer Vision

Computer vision is the field of AI focused on enabling systems to interpret and understand visual information.

Computer vision tasks include:

Image classification
Object detection
Facial analysis
Optical character recognition (OCR)
Image captioning

Azure AI Vision

Azure AI Vision provides computer vision capabilities in Azure.

Features include:

Image analysis
OCR
Object detection
Image captioning
Face-related analysis

Azure AI Foundry

Azure AI Foundry provides tools for building and managing multimodal AI applications.

Developers can:

Deploy AI models
Test prompts
Analyze images
Build AI-powered apps

Deployed Models

A deployed model is an AI model made available for real-time use through a cloud endpoint.

Applications communicate with deployed models using APIs.

Visual Prompt Workflow

A common workflow includes:

User uploads image
Application sends image to multimodal model
Model analyzes visual content
Model generates response
Application displays results

Example Workflow

User Uploads Image

A photo of a dog playing in a park

User Prompt

“Describe this image.”

AI Response

“A brown dog is running through a grassy park.”

Visual Question Answering

Some multimodal models can answer questions about images.

Example

Prompt

“How many people are in the image?”

The model analyzes the image and generates an answer.

Combining Text and Images

Multimodal systems often combine:

Text prompts
Visual input

This improves contextual understanding.

Example

Image

A restaurant menu

Prompt

“Which item appears to be vegetarian?”

The AI analyzes both the image and the prompt together.

APIs and Endpoints

Applications communicate with deployed multimodal models through:

APIs
Endpoints

These allow images and prompts to be submitted programmatically.

Authentication

Applications must securely authenticate before accessing Azure AI services.

Common methods include:

API keys
Azure credentials
Managed identities

User Interface Components

A lightweight visual AI application may include:

Image upload area
Prompt input box
Results display
Image preview

Real-Time Processing

Many multimodal applications support near real-time image analysis.

This enables interactive user experiences.

Common Real-World Scenarios

Scenario 1: Accessibility Assistant

Goal

Describe visual content for visually impaired users.

Features

Image captioning
OCR
Voice output

Scenario 2: Retail Product Recognition

Goal

Identify products from images.

Features

Object detection
Classification
Product lookup

Scenario 3: Document Processing

Goal

Extract information from scanned forms.

Features

OCR
Text extraction
Data analysis

Scenario 4: Content Moderation

Goal

Identify harmful or unsafe visual content.

Features

Image analysis
Safety filtering
Automated moderation

Responsible AI Considerations

Visual AI applications should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Images may contain:

Personal information
Faces
Sensitive documents

Organizations should protect user data appropriately.

Bias and Fairness

Computer vision systems may perform unevenly across:

Skin tones
Age groups
Lighting conditions
Demographics

Organizations should evaluate models carefully for fairness.

Transparency

Users should understand:

AI is analyzing images
AI-generated descriptions may contain errors
Images may be stored or processed in the cloud

Hallucinations

Multimodal AI systems may generate inaccurate visual descriptions.

These incorrect outputs are called hallucinations.

Applications should not assume all AI-generated outputs are accurate.

Error Handling

Applications should handle:

Unsupported image formats
Low-quality images
Network failures
Authentication errors
Rate limits

Image Quality Challenges

Poor image quality can reduce accuracy.

Examples include:

Blurry images
Poor lighting
Occluded objects
Low resolution

Advantages of Visual AI Applications

Benefits include:

Automation
Faster analysis
Accessibility improvements
Improved user experiences
Scalable image processing

Limitations of Visual AI Applications

Challenges include:

Recognition inaccuracies
Bias
Privacy concerns
Hallucinations
Sensitivity to image quality

High-Level Workflow

A simplified workflow includes:

Upload image
Send image and prompt to model
Analyze visual content
Generate response
Display results

Example High-Level Pseudocode

			
image = upload_image()
prompt = get_prompt()
response = analyze_image(image, prompt)
display_response(response)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Important AI-901 Exam Tips

For the exam, remember these key points:

Multimodal models process multiple data types.
Visual input includes images and video.
Azure AI Vision supports computer vision workloads.
OCR extracts text from images.
Image captioning generates descriptions of images.
Object detection identifies multiple objects in images.
APIs and endpoints connect applications to AI services.
Authentication secures AI access.
Responsible AI principles apply to computer vision systems.
Hallucinations are inaccurate AI-generated outputs.

Quick Knowledge Check

Keyboard settings are unrelated to computer vision.

Final Thoughts

Interpreting visual input using deployed multimodal models is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand the foundational concepts behind computer vision and multimodal AI applications, including image analysis, OCR, object detection, image captioning, APIs, authentication, and Responsible AI principles.

Azure AI Vision and Azure AI Foundry provide powerful tools for building intelligent applications capable of understanding and responding to visual information in real-world scenarios.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, azure, Microsoft Certification May 18, 2026

Build a lightweight application by using Azure Speech in Foundry Tools (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions for text and speech by using Foundry
      --> Build a lightweight application by using Azure Speech in Foundry Tools

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Speech-enabled AI applications are becoming increasingly common in customer service, accessibility, virtual assistants, and productivity solutions. Microsoft Azure provides speech services that allow developers to add speech recognition and speech synthesis capabilities to lightweight AI applications.

For the AI-901 certification exam, candidates should understand the foundational concepts behind building lightweight speech-enabled applications using Azure Speech and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for text and speech by using Foundry” section of the AI-901 exam objectives.

What Is Azure AI Speech?

Azure AI Speech is a cloud-based AI service that enables speech-related functionality in applications.

Azure AI Speech supports:

Speech recognition
Speech synthesis
Speech translation
Voice generation

What Is a Lightweight Application?

A lightweight application is a simple application designed to perform focused tasks with minimal complexity.

Characteristics include:

Simple user interface
Fast deployment
Lower resource usage
Easy maintenance

Examples of Lightweight Speech Applications

Examples include:

Voice-enabled chatbots
Simple voice assistants
Speech-to-text applications
Text-to-speech readers
Voice-controlled support tools

Azure AI Foundry

Azure AI Foundry provides tools for building, deploying, and testing AI-powered applications.

Developers can:

Access AI services
Configure models
Test applications
Manage deployments

Speech Recognition

Speech recognition converts spoken language into text.

This process is commonly called:

Speech-to-text (STT)
Automatic speech recognition (ASR)

Example

Spoken Input

“Schedule a meeting tomorrow.”

Recognized Text

“Schedule a meeting tomorrow.”

Speech Synthesis

Speech synthesis converts written text into spoken audio.

This process is commonly called:

Text-to-speech (TTS)

Example

Text

“Your appointment is confirmed.”

Spoken Output

The application reads the text aloud.

Speech Translation

Speech translation converts spoken language from one language into another.

Example

Spoken English

“Good morning.”

Translated Spanish Audio

“Buenos días.”

Voice Generation

AI systems can generate natural-sounding voices for:

Virtual assistants
Narration
Accessibility
Customer service systems

Basic Workflow of a Speech Application

A lightweight speech application commonly follows this workflow:

User speaks into microphone
Application captures audio
Azure Speech processes audio
Speech is converted to text
Application processes text
Optional speech synthesis generates spoken response

Example End-to-End Scenario

User Speaks

“What are today’s weather conditions?”

Speech Service

Converts speech to text

AI Processing

Generates response

Text-to-Speech

Reads response aloud

APIs and Endpoints

Applications communicate with Azure Speech services using:

APIs
Endpoints

These allow applications to send requests and receive responses programmatically.

Authentication

Applications must securely authenticate before using Azure Speech services.

Common methods include:

API keys
Azure credentials
Managed identities

Common User Interface Components

A lightweight speech application often includes:

Microphone input button
Text display area
Playback controls
Response output area

Real-Time Processing

Many speech applications process audio in real time.

This allows conversational experiences with minimal delay.

Streaming Audio

Streaming audio enables continuous processing of speech as users speak.

Benefits include:

Faster responses
More natural interactions
Reduced waiting time

Conversation Context

Some applications preserve context across interactions.

This allows more natural conversations.

Example

User

“Who founded Microsoft?”

User Later

“When was it created?”

The system understands “it” refers to Microsoft.

System Prompts

System prompts guide AI behavior and responses.

They help define:

Tone
Personality
Response style
Safety boundaries

Example System Prompt

“You are a friendly virtual assistant.”

Responsible AI Considerations

Speech-enabled applications should follow Responsible AI principles.

Key considerations include:

Privacy
Security
Inclusiveness
Transparency
Fairness
Accountability

Privacy Concerns

Speech systems may process sensitive spoken information.

Organizations should:

Secure recordings
Protect user conversations
Minimize unnecessary data retention

Inclusiveness

Speech applications should support:

Different accents
Multiple languages
Diverse speech patterns
Accessibility needs

Transparency

Users should know:

AI is processing speech
Audio may be analyzed
AI-generated responses may contain errors

Hallucinations

Generative AI systems may occasionally generate inaccurate responses.

These inaccuracies are called hallucinations.

Applications should not assume responses are always correct.

Error Handling

Applications should handle:

Background noise
Recognition errors
Authentication failures
Network interruptions
Rate limits

Background Noise Challenges

Speech recognition accuracy may decrease in:

Loud environments
Crowded spaces
Poor microphone conditions

Rate Limits

Azure AI services may limit request frequency.

Applications should handle throttling gracefully.

Latency

Latency refers to delays between:

User speech
AI processing
Spoken responses

Low latency improves user experience.

Advantages of Speech-Enabled Applications

Benefits include:

Natural interaction
Hands-free usage
Accessibility improvements
Faster communication
Improved engagement

Limitations of Speech Applications

Challenges include:

Accent variability
Background noise
Recognition inaccuracies
Privacy concerns
Network dependency

Common Real-World Scenarios

Scenario 1: Voice Assistant

Goal

Allow users to ask spoken questions.

Features

Speech recognition
Spoken responses
Conversational interaction

Scenario 2: Accessibility Tool

Goal

Assist visually impaired users.

Features

Text-to-speech
Voice commands
Audio navigation

Scenario 3: Customer Support Bot

Goal

Provide voice-based support.

Features

Real-time speech recognition
AI-generated responses
Multilingual support

High-Level Application Workflow

A simplified workflow includes:

Capture speech
Convert speech to text
Process request
Generate response
Convert response to speech
Play audio response

Example High-Level Pseudocode

			
audio = capture_audio()
text = speech_to_text(audio)
response = process_request(text)
speak(response)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Important AI-901 Exam Tips

For the exam, remember these key points:

Azure AI Speech provides speech-related AI services.
Speech recognition converts speech to text.
Speech synthesis converts text to speech.
Azure AI Foundry supports AI application development.
APIs and endpoints connect applications to cloud AI services.
Authentication secures access to Azure services.
Streaming audio supports real-time interaction.
Responsible AI principles apply to speech-enabled applications.
Inclusiveness is important for diverse speech patterns and accents.
Hallucinations are inaccurate AI-generated outputs.

Quick Knowledge Check

Cloud-based speech services generally require internet connectivity.

Final Thoughts

Building lightweight applications using Azure Speech in Foundry tools is an important AI-901 exam topic. Microsoft expects candidates to understand how speech-enabled AI applications work, including speech recognition, speech synthesis, APIs, authentication, Responsible AI considerations, and real-time conversational workflows.

Azure AI Speech and Azure AI Foundry provide powerful cloud-based tools that make it easier to create modern voice-enabled AI applications for business, accessibility, and productivity scenarios.

Go to the AI-901 Exam Prep Hub main page

Exam Prep Hubs available on The Data Community

Exam Prep Hub for AI-900: Microsoft Azure AI Fundamentals

Exam Prep Hub for PL-300: Microsoft Power BI Data Analyst

Exam Prep Hub for DP-600: Implementing Analytics Solutions Using Microsoft Fabric

Question 1

Correct Answer

Explanation

Question 2

Correct Answer

Explanation

Question 3

Correct Answer

Question 4

Correct Answer

Question 5

Correct Answers

Question 6

Correct Answer

Question 7

Correct Answer

Question 8

Correct Answers

Question 9

Correct Answer

Question 10

Correct Answer

Question 11

Correct Answer

Question 12

Correct Answer

Question 13

Correct Answer

Question 14

Correct Answer

Question 15

Correct Answers

Question 16

Correct Answer

Question 17

Correct Answer

Question 18

Correct Answers

Question 19

Correct Answer

Question 20

Correct Answer

Question 21

Correct Answer

Question 22

Correct Answers

Question 23

Correct Answer

Question 24

Correct Answer

Question 25

Correct Answer

Question 26

Correct Answer

Question 27

Correct Answers

Question 28

Correct Answer

Question 29

Correct Answer

Question 30

Correct Answer

Explanation

Question 1

Correct Answer

Explanation

Why the Other Answers Are Incorrect

Question 2

Correct Answer

Explanation

Question 3

Correct Answer

Question 4

Correct Answer

Explanation

Question 5