Tag: Image Generation

AI, AI-103, Computer Vision, Generative AI, Microsoft Certification May 25, 2026

Select and apply appropriate generation and editing controls provided by the platform (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement computer vision solutions (10–15%)
   --> Design and implement image- and video-generation solutions
      --> Select and apply appropriate generation and editing controls provided by the platform

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Modern generative AI platforms provide many controls that influence how images and videos are generated or edited. These controls help developers:

Improve output quality
Maintain consistency
Control creativity
Optimize performance
Enforce safety policies
Reduce operational costs

For the AI-103 certification exam, you should understand how to select and apply the appropriate controls for:

Image generation
Video generation
Image editing
Video editing
Multi-modal workflows

You should also understand:

Prompt controls
Resolution settings
Style and creativity controls
Safety filtering
Masking and editing parameters
Rendering settings
Model selection
Performance optimization

This topic falls under:

“Design and implement image- and video-generation solutions”

What Are Generation and Editing Controls?

Generation and editing controls are configurable parameters that influence how AI models produce or modify content.

Controls may affect:

Creativity
Style
Resolution
Consistency
Motion
Safety
Latency
Cost

These settings help tailor outputs to business and technical requirements.

Categories of Generation and Editing Controls

Common control categories include:

Prompt controls
Style controls
Resolution controls
Variation controls
Safety controls
Masking controls
Temporal controls
Rendering controls
Performance controls

Prompt Controls

What Are Prompt Controls?

Prompt controls influence how the model interprets user instructions.

Prompts can define:

Subject matter
Artistic style
Lighting
Camera perspective
Motion
Environment
Mood

Positive Prompts

Positive prompts specify desired characteristics.

Example:

			
A cinematic aerial view of a tropical island during sunset, ultra realistic, high detail

Negative Prompts

Negative prompts specify unwanted characteristics.

Example:

blurry, distorted, low quality, extra limbs

Negative prompts help improve output quality.

Prompt Weighting

What Is Prompt Weighting?

Prompt weighting emphasizes certain prompt elements more strongly.

Example:

sunset::2 tropical beach::1

This increases emphasis on:

sunset

relative to:

tropical beach

Style Controls

Purpose of Style Controls

Style controls influence artistic appearance.

Examples:

Photorealistic
Anime
Watercolor
Oil painting
Cyberpunk
Sketch

Style Reference Inputs

Platforms may allow reference images that guide:

Artistic appearance
Color palettes
Composition
Brand identity

Consistency Controls

Consistency controls help maintain:

Character appearance
Object structure
Scene continuity
Brand alignment

These are especially important in:

Video generation
Multi-image campaigns
Character-based storytelling

Resolution Controls

What Are Resolution Controls?

Resolution controls determine image or video dimensions.

Examples:

512 × 512
1024 × 1024
4K video

Higher Resolution Tradeoffs

Higher resolutions improve:

Detail
Print quality
Visual realism

However, they also increase:

Rendering time
GPU usage
Storage requirements
Cost

Aspect Ratio Controls

Aspect ratio defines image shape.

Examples:

Aspect Ratio	Common Usage
1:1	Social media posts
16:9	Videos and widescreen
9:16	Mobile vertical video
4:3	Traditional displays

Variation Controls

What Are Variation Controls?

Variation settings determine how different outputs are from one another.

Low variation:

Produces consistent outputs

High variation:

Produces more creative diversity

Seed Controls

What Is a Seed?

A seed is a numeric value used to initialize generation randomness.

Using the same:

Prompt
Model
Parameters
Seed

typically produces similar outputs.

Why Seeds Matter

Seeds help with:

Reproducibility
Testing
Version control
Collaborative workflows

Creativity Controls

Some platforms provide controls that influence:

Creativity
Randomness
Prompt adherence

High Creativity Settings

High creativity may produce:

Artistic outputs
Unexpected compositions
Diverse variations

Low Creativity Settings

Low creativity may produce:

Predictable outputs
Strong prompt adherence
Stable business imagery

Sampling Controls

Sampling controls influence how models select outputs during generation.

These settings affect:

Diversity
Determinism
Coherence

Temperature

Temperature controls randomness.

Low Temperature

Produces:

More predictable outputs
Stable results

High Temperature

Produces:

More diverse outputs
More creativity

Guidance Scale

What Is Guidance Scale?

Guidance scale controls how closely the model follows the prompt.

High Guidance Scale

Produces:

Strong prompt adherence
Less deviation

Low Guidance Scale

Produces:

More creativity
More variation

Editing Controls

Editing workflows often include specialized controls.

Mask Controls

Masks define editable regions.

Controls may include:

Edge softness
Mask opacity
Region expansion
Feathering

Inpainting Strength

What Is Inpainting Strength?

Inpainting strength determines how aggressively the model modifies masked regions.

Low Inpainting Strength

Preserves more of the original image.

High Inpainting Strength

Allows more dramatic modifications.

Blend Controls

Blend settings control how generated edits merge with original content.

This affects:

Realism
Transition smoothness
Artifact reduction

Temporal Controls for Video

Video workflows require additional controls for:

Motion consistency
Frame continuity
Camera movement

Frame Rate Controls

Frame rate determines:

Motion smoothness
Rendering complexity

Examples:

24 FPS
30 FPS
60 FPS

Motion Strength Controls

Motion controls influence:

Animation intensity
Camera movement
Object motion

Temporal Consistency Controls

These controls reduce:

Flickering
Object distortion
Scene instability

Especially important in:

Video editing
AI animation
Multi-scene workflows

Rendering Controls

Rendering settings affect:

Compression
Encoding
File size
Playback quality

Output Format Controls

Common formats include:

PNG
JPEG
MP4
MOV
WebM

Compression Settings

Higher compression:

Smaller files
Lower quality

Lower compression:

Better quality
Larger files

Safety Controls

Why Safety Controls Matter

Generative AI platforms include safety controls to reduce:

Harmful content
Unsafe imagery
Policy violations
Deepfake misuse

Azure AI Content Safety

Microsoft provides:
Azure AI Content Safety

to help detect:

Unsafe prompts
Harmful outputs
Policy violations

Moderation Controls

Moderation settings may:

Block unsafe generations
Flag outputs for review
Require human approval

Watermarking and Provenance Controls

Some platforms support:

Watermarking
Metadata tagging
Provenance tracking

These help identify AI-generated content.

Performance Controls

Why Performance Controls Matter

Performance settings help balance:

Quality
Latency
GPU usage
Operational cost

Batch Size Controls

Batch generation creates multiple outputs simultaneously.

Advantages:

Increased throughput

Tradeoffs:

Higher GPU usage

Draft vs Final Rendering

Some workflows generate:

Low-quality preview drafts
High-quality final renders

This improves responsiveness.

GPU and Hardware Selection

Platforms may allow selection of:

GPU tiers
Compute capacity
Rendering priority

Higher-end hardware improves:

Speed
Resolution capability
Throughput

Workflow Orchestration Controls

Enterprise systems often orchestrate:

Multiple generation stages
Human review
Safety validation
Asset storage
Automated rendering

Example Workflow

User submits prompt
Safety validation runs
Generation parameters selected
AI model generates outputs
Variations produced
Human review occurs
Final assets stored

Azure Services Used in Generative Media Workflows

Azure OpenAI Service

Supports:

Multi-modal AI workflows
Prompt-driven generation
AI editing capabilities

Azure AI Foundry

Supports:

Workflow orchestration
Prompt flows
Evaluation pipelines
AI experimentation

Azure AI Vision

Can support:

Segmentation
Object tracking
Scene analysis
Visual understanding

Azure Blob Storage

Frequently used for:

Media storage
Generated asset management
Workflow integration

Azure Functions

Often used for:

Trigger-based workflows
Rendering orchestration
Automated pipelines

Observability and Monitoring

Production systems should monitor:

Rendering latency
Failed generations
GPU utilization
Safety violations
Storage consumption
Operational cost

Best Practices for Applying Controls

Match Controls to Business Goals

Balance realism, creativity, and consistency.

Use Safety Controls Consistently

Validate prompts and outputs.

Optimize Resolution Carefully

Higher quality increases compute cost.

Use Seeds for Reproducibility

Helpful for testing and collaboration.

Tune Creativity Settings

Choose stable or artistic outputs depending on requirements.

Apply Human Review for Sensitive Content

Especially important in regulated environments.

Monitor Performance and Cost

Generative workflows can become expensive.

Real-World Example

An advertising company may implement a workflow that:

Generates multiple campaign images
Applies:
- 16:9 aspect ratio
- High guidance scale
- Moderate creativity
- Consistent style reference
Runs content safety checks
Produces multiple output variations
Stores approved assets in Blob Storage

This demonstrates:

Prompt controls
Style consistency
Resolution management
Safety enforcement
Workflow orchestration

Exam Tips for AI-103

For the AI-103 exam, remember these important concepts:

Prompt controls influence generation quality and style.
Negative prompts reduce undesirable characteristics.
Resolution and aspect ratio affect quality and performance.
Seeds support reproducibility.
Temperature and guidance scale influence creativity and prompt adherence.
Masks define editable regions.
Inpainting strength controls edit intensity.
Temporal consistency controls are critical for video workflows.
Safety controls help reduce harmful outputs.
Azure AI Content Safety supports moderation workflows.
GPU selection and rendering settings affect cost and latency.

Practice Exam Questions

Question 1

What is the purpose of a negative prompt in image generation?

A. Increasing GPU memory
B. Specifying unwanted characteristics in generated outputs
C. Compressing images automatically
D. Encrypting generated assets

Answer

B. Specifying unwanted characteristics in generated outputs

Explanation

Negative prompts help prevent undesirable features from appearing in generated media.

Question 2

What does a guidance scale primarily control?

A. Video compression ratio
B. How closely the model follows the prompt
C. Database indexing speed
D. Network bandwidth usage

Answer

B. How closely the model follows the prompt

Explanation

Higher guidance scales increase adherence to the prompt instructions.

Question 3

What is the primary benefit of using seeds in generative workflows?

A. Encrypting prompts
B. Improving reproducibility of outputs
C. Increasing storage capacity
D. Eliminating latency

Answer

B. Improving reproducibility of outputs

Explanation

Using the same seed and settings helps reproduce similar outputs.

Question 4

Which control directly affects output dimensions?

A. Temperature
B. Aspect ratio
C. Resolution settings
D. Sampling frequency

Answer

C. Resolution settings

Explanation

Resolution controls determine image or video dimensions.

Question 5

What is the purpose of temporal consistency controls in video workflows?

A. Compressing video metadata
B. Reducing flickering and unstable motion
C. Encrypting rendered frames
D. Eliminating frame rendering

Answer

B. Reducing flickering and unstable motion

Explanation

Temporal consistency helps maintain stable edits across frames.

Question 6

What does low temperature generally produce?

A. More predictable outputs
B. More artistic randomness
C. Higher network latency
D. Larger file sizes

Answer

A. More predictable outputs

Explanation

Lower temperature settings reduce randomness and increase consistency.

Question 7

Which Azure service helps moderate unsafe generated content?

A. Azure CDN
B. Azure AI Content Safety
C. Azure DNS
D. Azure Firewall

Answer

B. Azure AI Content Safety

Explanation

Azure AI Content Safety evaluates prompts and outputs for harmful content.

Question 8

What is the purpose of mask controls in editing workflows?

A. Defining editable image or video regions
B. Encrypting generated assets
C. Reducing GPU temperatures
D. Compressing output videos

Answer

A. Defining editable image or video regions

Explanation

Masks specify which regions may be modified during editing.

Question 9

Why might an organization generate low-resolution drafts before final rendering?

A. To improve responsiveness and reduce rendering cost
B. To remove prompts automatically
C. To eliminate all GPU usage
D. To encrypt media files

Answer

A. To improve responsiveness and reduce rendering cost

Explanation

Draft rendering allows faster previews before expensive high-quality rendering.

Question 10

What is a key tradeoff of higher-resolution generation?

A. Reduced image quality
B. Increased rendering cost and latency
C. Elimination of safety concerns
D. Lower GPU utilization

Answer

B. Increased rendering cost and latency

Explanation

Higher resolutions require more computational resources and rendering time.

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Computer Vision, Generative AI, Microsoft Certification May 25, 2026

Implement workflows to edit generated videos (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement computer vision solutions (10–15%)
   --> Design and implement image- and video-generation solutions
      --> Implement workflows to edit generated videos

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Generative AI systems are rapidly transforming how organizations create and edit video content. Beyond generating videos from prompts, modern AI systems can also:

Modify generated videos
Edit scenes and objects
Replace backgrounds
Apply stylistic changes
Enhance quality
Generate alternate video versions
Automate post-production workflows

For the AI-103 certification exam, you should understand how to implement workflows that edit generated videos using:

Prompt-driven modifications
Mask-based editing
Inpainting
Video-to-video transformation
Multi-modal AI workflows
Automated orchestration pipelines

You should also understand:

Temporal consistency
Video rendering workflows
Responsible AI considerations
Content safety
Storage and orchestration
Performance optimization
Azure services used in video-editing solutions

This topic falls under:

“Design and implement image- and video-generation solutions”

What Is AI Video Editing?

AI video editing uses generative AI and computer vision techniques to modify existing or AI-generated videos.

Unlike traditional manual editing, AI systems can:

Understand scene context
Interpret natural language instructions
Modify video elements automatically
Maintain frame consistency across time

Common AI Video Editing Use Cases

Marketing and Advertising

Edit:

Promotional videos
Product showcases
Seasonal campaigns

Entertainment and Media

Create:

Visual effects
Scene modifications
Cinematic enhancements
Animation edits

E-Commerce

Generate:

Product video variations
Personalized ads
Localized marketing clips

Education and Training

Modify:

Tutorial videos
Simulations
Instructional content

Enterprise Applications

Support:

Automated media workflows
AI-assisted post-production
Content localization

Core Components of AI Video Editing Workflows

Video-editing workflows commonly include:

Source video
Editing prompts
Masks or segmentation
Video generation model
Safety validation
Rendering pipeline
Storage system

Prompt-Driven Video Editing

What Is Prompt-Driven Video Editing?

Prompt-driven editing uses natural language instructions to modify video content.

Example:

Convert this daytime city scene into a rainy nighttime scene with neon lighting

The AI system interprets:

Lighting changes
Environmental conditions
Color adjustments
Scene mood

and applies them consistently across video frames.

Common Prompt-Driven Modifications

Style Transformation

Convert videos into:

Anime style
Watercolor style
Cinematic style
Retro film appearance

Environmental Changes

Modify:

Weather
Time of day
Background scenery

Object Addition or Removal

Add or remove:

Vehicles
People
Furniture
Branding elements

Scene Enhancements

Improve:

Lighting
Sharpness
Atmosphere
Visual effects

Video Inpainting

What Is Video Inpainting?

Video inpainting modifies selected regions across multiple video frames while preserving the rest of the video.

The workflow typically includes:

Original video
Mask identifying editable regions
Prompt describing desired changes
AI model generating replacement content
Temporal consistency validation

Example Video Inpainting Workflow

Original video:

Street scene with parked cars

Mask:

Covers one vehicle

Prompt:

Replace the parked sedan with a red sports car

Result:

The vehicle changes consistently across all frames.

Why Temporal Consistency Matters

Temporal Consistency

Temporal consistency ensures:

Objects remain stable
Motion appears natural
Lighting stays coherent
Edits do not flicker between frames

Without temporal consistency:

Objects may distort
Colors may shift unexpectedly
Motion may appear unnatural

Mask-Based Video Editing

What Is a Video Mask?

A video mask identifies editable regions across frames.

Masks may:

Track moving objects
Define static regions
Follow characters or subjects

Types of Video Masks

Manual Masks

Editors manually define editable regions.

Advantages:

High precision
Fine-grained control

Automated Masks

AI models automatically track and segment objects.

Advantages:

Faster workflows
Reduced manual effort

Object Tracking in Video Editing

Why Object Tracking Matters

Objects often move across frames.

Tracking systems help:

Maintain mask alignment
Preserve edit consistency
Improve realism

Example Object Tracking Workflow

Detect object in frame 1
Track object movement
Update mask positions automatically
Apply edits consistently

Video-to-Video Transformation

What Is Video-to-Video Transformation?

Video-to-video systems transform an existing video into a modified version while preserving motion structure.

Examples:

Cartoon conversion
Cinematic grading
Artistic style transfer
Environment changes

Style Transfer for Video

What Is Style Transfer?

Style transfer applies visual characteristics from one style to another.

Examples:

Oil painting style
Anime appearance
Sketch rendering
Vintage film effects

Scene Expansion and Outpainting

What Is Video Outpainting?

Video outpainting expands scenes beyond original frame boundaries.

Examples:

Widening landscapes
Expanding backgrounds
Creating cinematic widescreen effects

Frame Interpolation

What Is Frame Interpolation?

Frame interpolation generates intermediate frames between existing frames.

Benefits:

Smoother motion
Higher frame rates
Improved visual quality

Upscaling and Video Enhancement

AI systems can improve:

Resolution
Sharpness
Noise reduction
Compression artifacts

Multi-Step Video Editing Workflows

Enterprise solutions often combine several AI editing stages.

Example Enterprise Workflow

Upload generated video
Segment editable objects
Generate masks
Apply prompt-driven modifications
Run temporal consistency checks
Enhance resolution
Apply safety validation
Render final output
Store edited video

Workflow Automation

AI video-editing workflows are commonly automated using:

APIs
Event-driven pipelines
Serverless orchestration
AI workflow engines

Example Automated Workflow

User uploads video
Azure Function triggers workflow
AI service performs segmentation
Prompt-based edits applied
Safety validation runs
Final video rendered
Output stored in Blob Storage

Rendering Pipelines

What Is Video Rendering?

Rendering combines generated frames and effects into a final playable video.

Rendering tasks may include:

Frame generation
Compression
Encoding
Transitions
Audio synchronization

Video Encoding Formats

Common formats include:

MP4
MOV
WebM

Responsible AI Considerations

AI-powered video editing introduces significant Responsible AI concerns.

Deepfake Risks

AI editing may alter:

Faces
Voices
Identities
Expressions

Potential misuse includes:

Fraud
Misinformation
Impersonation

Harmful Content

Edited videos may unintentionally include:

Violence
Hate content
Explicit material

Copyright Concerns

Generated edits may resemble copyrighted:

Characters
Styles
Media assets

Bias and Fairness

AI systems may unintentionally reinforce:

Cultural stereotypes
Representation imbalance
Demographic bias

Azure AI Content Safety

Microsoft provides:
Azure AI Content Safety

to help evaluate:

Unsafe prompts
Harmful outputs
Policy violations

Moderation Workflows

Enterprise systems may:

Block unsafe edits
Require human review
Escalate suspicious outputs

Watermarking and Provenance

AI-generated or edited videos may include:

Watermarks
Metadata
Provenance tracking

These help identify synthetic content.

Performance Considerations

Video editing is computationally intensive.

Factors affecting performance include:

Video resolution
Frame count
Rendering complexity
Model size
GPU availability

GPU Acceleration

Video editing workflows commonly rely on GPUs because of:

Parallel frame processing
Rendering efficiency
Matrix computation acceleration

Latency Challenges

Video editing typically requires:

Significant compute time
Large storage bandwidth
High rendering throughput

Optimization Techniques

Lower Resolution Drafts

Generate previews before final rendering.

Progressive Rendering

Return low-quality previews first.

Parallel Frame Processing

Render independent frames simultaneously.

Frame Interpolation

Reduce rendering requirements while maintaining smooth motion.

Azure Services for Video Editing Workflows

Azure OpenAI Service

Supports:

Multi-modal AI workflows
Prompt-driven generation
AI-powered editing pipelines

Azure AI Foundry

Supports:

Workflow orchestration
Prompt flows
Multi-modal AI pipelines
Evaluation systems

Azure AI Vision

Can support:

Segmentation
Object tracking
Scene analysis
Video understanding

Azure Blob Storage

Frequently used for:

Source video storage
Rendered output storage
Media asset management

Azure Functions

Often used for:

Trigger-based orchestration
Automated workflows
Rendering pipelines

Observability for Video Editing Systems

Production systems should monitor:

Rendering latency
GPU utilization
Failed processing jobs
Safety violations
Storage usage
Operational costs

Human-in-the-Loop Review

Organizations often require human approval for:

Public-facing content
Brand-sensitive media
Regulated industries
High-risk synthetic content

Best Practices for Video Editing Workflows

Use Precise Masks

Improves editing consistency.

Maintain Temporal Consistency

Prevent flickering and unstable edits.

Write Detailed Prompts

Improves modification accuracy.

Implement Content Safety

Validate prompts and outputs.

Monitor Cost and Performance

Video rendering can be expensive.

Use Human Review for Sensitive Content

Especially important in regulated environments.

Maintain Audit Logs

Track prompts, edits, approvals, and outputs.

Real-World Example

A marketing company may implement a workflow that:

Generates a product video
Applies prompt:

Convert the commercial into a nighttime neon cyberpunk theme

Automatically segments products and people
Applies scene-wide edits
Validates content safety
Renders multiple versions
Stores approved outputs in Blob Storage

This demonstrates:

Prompt-driven editing
Video-to-video transformation
Automated orchestration
Temporal consistency management

Exam Tips for AI-103

For the AI-103 exam, remember these important concepts:

Prompt-driven video editing uses natural language instructions to modify videos.
Video inpainting edits selected regions across multiple frames.
Temporal consistency is critical for realistic video editing.
Masks define editable regions across video frames.
Object tracking helps maintain consistent edits.
Video-to-video transformation preserves motion structure while changing appearance.
Azure AI Content Safety helps moderate unsafe edits.
Azure Blob Storage commonly stores source and edited videos.
GPU acceleration is critical for rendering performance.
Human review may be required for sensitive or public-facing content.

Practice Exam Questions

Question 1

What is the primary purpose of video inpainting?

A. Compressing video files
B. Editing selected regions across video frames
C. Encrypting video metadata
D. Detecting malware

Answer

B. Editing selected regions across video frames

Explanation

Video inpainting modifies targeted areas consistently across multiple frames.

Question 2

Why is temporal consistency important in video editing workflows?

A. It reduces storage costs
B. It ensures stable and coherent edits across frames
C. It eliminates all latency
D. It encrypts rendered videos

Answer

B. It ensures stable and coherent edits across frames

Explanation

Temporal consistency prevents flickering and unrealistic motion artifacts.

Question 3

What is the purpose of a video mask?

A. Encrypting video content
B. Defining editable regions across frames
C. Increasing internet speed
D. Compressing rendered outputs

Answer

B. Defining editable regions across frames

Explanation

Masks specify which parts of a video may be modified.

Question 4

What does video-to-video transformation primarily do?

A. Convert videos into spreadsheets
B. Transform an existing video while preserving motion structure
C. Remove all frames from a video
D. Encrypt video storage

Answer

B. Transform an existing video while preserving motion structure

Explanation

Video-to-video workflows alter appearance while retaining motion continuity.

Question 5

Why is object tracking important in AI video editing?

A. It reduces database size
B. It maintains mask alignment and consistent edits
C. It removes prompts automatically
D. It compresses video metadata

Answer

B. It maintains mask alignment and consistent edits

Explanation

Tracking ensures edits follow moving objects accurately across frames.

Question 6

What is frame interpolation?

A. Deleting intermediate frames
B. Generating intermediate frames for smoother motion
C. Encrypting rendered videos
D. Compressing audio tracks

Answer

B. Generating intermediate frames for smoother motion

Explanation

Frame interpolation improves motion smoothness and frame rates.

Question 7

Which Azure service helps moderate harmful edited video content?

A. Azure DNS
B. Azure AI Content Safety
C. Azure CDN
D. Azure Virtual WAN

Answer

B. Azure AI Content Safety

Explanation

Azure AI Content Safety evaluates prompts and outputs for unsafe content.

Question 8

Why are GPUs commonly used in AI video editing workflows?

A. GPUs eliminate the need for prompts
B. GPUs accelerate parallel rendering and frame processing
C. GPUs automatically moderate unsafe content
D. GPUs reduce internet bandwidth

Answer

B. GPUs accelerate parallel rendering and frame processing

Explanation

Video editing workloads require intensive parallel computations.

Question 9

Which Azure storage service is commonly used for storing rendered videos?

A. Azure Queue Storage
B. Azure Blob Storage
C. Azure DNS
D. Azure Firewall

Answer

B. Azure Blob Storage

Explanation

Azure Blob Storage is commonly used for large media assets.

Question 10

What is a major Responsible AI concern in AI-powered video editing?

A. Deepfake misuse
B. Reduced GPU temperature
C. Faster SQL performance
D. Lower storage capacity

Answer

A. Deepfake misuse

Explanation

AI video editing can potentially be misused for impersonation or misinformation.

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Azure AI, Computer Vision, Generative AI, Microsoft Certification May 25, 2026

Configure image-editing workflows, including inpainting, mask-based edits, and prompt-driven modifications (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement computer vision solutions (10–15%)
   --> Design and implement image- and video-generation solutions
      --> Configure image-editing workflows, including inpainting, mask-based edits, and prompt-driven modifications

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Modern generative AI systems are capable of much more than simply generating images from scratch. Organizations increasingly use AI-powered image editing workflows to:

Modify existing images
Replace objects
Edit backgrounds
Improve image quality
Apply artistic styles
Perform targeted visual changes

For the AI-103 certification exam, you should understand how to configure and implement image-editing workflows using:

Inpainting
Mask-based editing
Prompt-driven modifications
Reference images
Multi-modal editing pipelines

You should also understand:

Workflow orchestration
Prompt engineering
Responsible AI considerations
Content safety
Storage and processing workflows
Azure services commonly used in image editing systems

This topic falls under:

“Design and implement image- and video-generation solutions”

What Is AI Image Editing?

AI image editing uses generative AI models to modify existing images based on:

Text prompts
Masks
Reference media
Style instructions

Unlike text-to-image generation, image editing starts with an existing image and selectively changes portions of it.

Common Image Editing Use Cases

Marketing and Advertising

Modify:

Product backgrounds
Seasonal themes
Promotional imagery

E-Commerce

Generate:

Product variations
Lifestyle scenes
Background replacements

Photography

Enhance:

Lighting
Resolution
Object cleanup
Scene composition

Entertainment and Media

Create:

Visual effects
Character edits
Stylized artwork

Enterprise Applications

Support:

Brand-compliant imagery
AI-assisted design workflows
Automated content generation

Core Components of AI Image Editing

AI image-editing workflows commonly include:

Source image
Editing instructions
Masks
Generative model
Safety validation
Output rendering

What Is Inpainting?

Definition

Inpainting is an AI editing technique that modifies selected portions of an image while preserving the rest of the image.

The system uses:

An original image
A mask identifying editable regions
A text prompt describing desired changes

How Inpainting Works

The workflow typically includes:

Upload original image
Define editable region using a mask
Provide prompt instructions
AI model generates replacement content
Blend generated content into original image

Example Inpainting Scenario

Original image:

Person standing in a park

Mask:

Covers the person’s jacket

Prompt:

Replace the jacket with a red leather jacket

Result:

Only the jacket changes
Background and other elements remain intact

Common Inpainting Use Cases

Object Removal

Remove:

Watermarks
Background clutter
Unwanted objects

Object Replacement

Replace:

Clothing
Furniture
Products
Signs

Background Editing

Modify scenery while preserving foreground subjects.

Image Restoration

Repair:

Damaged photographs
Missing sections
Visual defects

What Is a Mask?

A mask defines which parts of an image may be modified.

Mask-Based Editing

Purpose of Masks

Masks allow precise control over edits.

White or highlighted regions typically indicate:

Editable areas

Unmasked regions remain unchanged.

Types of Masks

Binary Masks

Simple editable/non-editable regions.

Soft Masks

Allow gradual blending between edited and preserved areas.

Semantic Masks

Generated automatically using object detection or segmentation.

Examples:

Person segmentation
Background segmentation
Sky detection

Manual vs Automated Mask Creation

Manual Masks

Users draw editable areas manually.

Advantages:

Precise control
Flexible editing

Automated Masks

AI identifies objects automatically.

Advantages:

Faster workflows
Reduced manual effort

Prompt-Driven Modifications

What Are Prompt-Driven Modifications?

Prompt-driven editing uses natural language instructions to guide image modifications.

The prompt describes:

Desired changes
Style
Color
Objects
Mood
Lighting

Example Prompt-Driven Edits

Style Modification

Transform this image into a watercolor painting

Background Replacement

Replace the background with a snowy mountain landscape

Object Addition

Add a golden retriever sitting beside the person

Lighting Adjustments

Convert the scene to nighttime with neon lighting

Prompt Engineering for Image Editing

Why Prompt Engineering Matters

Clear prompts improve:

Editing accuracy
Consistency
Style control
Realism

Effective Prompt Components

Component	Example
Object	“A wooden table”
Style	“minimalist design”
Environment	“modern office”
Lighting	“soft warm lighting”
Quality	“highly detailed”

Negative Prompts

Negative prompts specify unwanted characteristics.

Example:

blurry, distorted, extra limbs, low quality

These help improve output quality.

Multi-Step Editing Workflows

Enterprise systems often use multiple editing stages.

Example Workflow

Upload image
Detect editable objects
Generate masks
Apply prompt-driven edits
Run safety validation
Generate variations
Store approved outputs

Image Segmentation in Editing Workflows

What Is Image Segmentation?

Segmentation identifies objects or regions within images.

Segmentation helps:

Create masks automatically
Improve editing precision
Enable object-aware workflows

Types of Segmentation

Semantic Segmentation

Groups pixels by category.

Example:

Sky
Road
Person

Instance Segmentation

Separates individual objects.

Example:

Person 1
Person 2
Car 1

Style Transfer

What Is Style Transfer?

Style transfer applies the artistic style of one image to another.

Examples:

Oil painting style
Anime style
Sketch style
Watercolor style

Image Variations

Generative editing systems can produce:

Multiple alternate edits
Different styles
Different lighting conditions
Multiple compositions

This helps users compare outputs.

Outpainting

What Is Outpainting?

Outpainting extends an image beyond its original boundaries.

Use cases:

Expanding landscapes
Creating panoramic scenes
Extending backgrounds

Workflow Automation

Image-editing pipelines are commonly automated using:

APIs
Serverless workflows
Event-driven orchestration

Example Automated Workflow

User uploads product image
Azure Function triggers workflow
AI model removes background
New background generated
Safety checks run
Final image stored

Responsible AI Considerations

Image editing introduces several Responsible AI concerns.

Deepfake Risks

Image editing can alter:

Faces
Identities
Appearances

Improper use may create misleading content.

Harmful Content Generation

Edits may unintentionally create:

Violent imagery
Hate content
Explicit material

Copyright Concerns

Generated edits may resemble copyrighted works.

Organizations should ensure proper licensing.

Bias and Fairness

Editing systems may unintentionally reinforce:

Stereotypes
Representation imbalance
Cultural bias

Azure AI Content Safety

Microsoft provides:
Azure AI Content Safety

to help detect:

Harmful prompts
Unsafe outputs
Policy violations

Moderation Workflows

Enterprise systems may:

Block unsafe edits
Flag outputs for review
Require human approval

Human-in-the-Loop Validation

Organizations often require manual review for:

Brand-sensitive content
Regulated industries
Public-facing media

Performance Considerations

Image editing can require substantial compute resources.

Factors affecting performance include:

Image resolution
Mask complexity
Model size
Number of variations
GPU availability

GPU Acceleration

Generative image editing heavily relies on GPUs because of:

Parallel computation
Matrix operations
Rendering efficiency

Optimization Techniques

Lower Resolution Drafts

Preview edits before full rendering.

Progressive Upscaling

Generate smaller images first, then upscale.

Cached Assets

Reuse commonly edited assets.

Parallel Variation Generation

Create multiple outputs simultaneously.

Azure Services for Image Editing Workflows

Azure OpenAI Service

Supports:

Multi-modal AI workflows
Prompt-driven editing
Image generation pipelines

Azure AI Foundry

Used for:

Prompt orchestration
Workflow development
Model evaluation
AI pipeline management

Azure AI Vision

Can support:

Segmentation
Object detection
Image analysis
Automated mask generation

Azure Blob Storage

Frequently used for:

Storing source images
Managing edited outputs
Workflow integration

Azure Functions

Often used for:

Workflow orchestration
Trigger-based processing
Automation pipelines

Observability for Image Editing Systems

Production systems should monitor:

Editing latency
Failed requests
GPU utilization
Safety violations
Prompt trends
Storage usage
Operational costs

Best Practices for Image Editing Solutions

Use Precise Masks

Improves editing accuracy.

Write Detailed Prompts

Clear prompts produce better results.

Validate Inputs and Outputs

Apply safety filtering consistently.

Maintain Audit Logs

Track prompts, edits, and approvals.

Use Human Review for Sensitive Content

Especially important for regulated industries.

Optimize for Cost and Latency

Balance rendering quality with operational efficiency.

Protect User Privacy

Secure uploaded images appropriately.

Real-World Example

An e-commerce retailer may implement an image-editing workflow that:

Accepts a clothing product image
Automatically segments the background
Uses prompt:

Replace the background with a luxury fashion studio setting

Generates multiple styled variations
Runs safety validation
Stores approved outputs in Blob Storage

This demonstrates:

Mask-based editing
Prompt-driven modification
Automated workflows
Safety enforcement

Exam Tips for AI-103

For the AI-103 exam, remember these important concepts:

Inpainting edits selected portions of an image.
Masks define editable regions.
Prompt-driven editing uses natural language instructions.
Segmentation can automate mask generation.
Negative prompts help avoid undesirable outputs.
Outpainting expands image boundaries.
Style transfer changes artistic appearance.
Azure AI Content Safety helps moderate unsafe content.
Azure Blob Storage commonly stores source and edited images.
GPU acceleration is important for performance.
Human review may be required for sensitive content.

Practice Exam Questions

Question 1

What is the primary purpose of inpainting?

A. Compressing image files
B. Editing selected portions of an image
C. Detecting malware in images
D. Encrypting image metadata

Answer

B. Editing selected portions of an image

Explanation

Inpainting modifies specific image regions while preserving the remainder of the image.

Question 2

What does a mask define in an image-editing workflow?

A. GPU allocation settings
B. Editable image regions
C. Storage locations
D. Encryption keys

Answer

B. Editable image regions

Explanation

Masks specify which parts of an image may be modified.

Question 3

What is the purpose of prompt-driven modifications?

A. Increasing network speed
B. Guiding edits using natural language instructions
C. Compressing images automatically
D. Removing metadata

Answer

B. Guiding edits using natural language instructions

Explanation

Prompt-driven editing uses text instructions to direct AI modifications.

Question 4

Which technique extends an image beyond its original borders?

A. Segmentation
B. Inpainting
C. Outpainting
D. Compression

Answer

C. Outpainting

Explanation

Outpainting expands the visible image area.

Question 5

What is a common use case for image segmentation in editing workflows?

A. Encrypting image files
B. Automatically generating masks
C. Reducing internet bandwidth
D. Removing prompts

Answer

B. Automatically generating masks

Explanation

Segmentation helps identify editable regions automatically.

Question 6

What is the purpose of a negative prompt?

A. Preventing unwanted visual characteristics
B. Increasing GPU temperature
C. Encrypting prompts
D. Expanding image resolution

Answer

A. Preventing unwanted visual characteristics

Explanation

Negative prompts specify undesired features in generated outputs.

Question 7

Which Azure service helps moderate unsafe image edits?

A. Azure CDN
B. Azure AI Content Safety
C. Azure Virtual WAN
D. Azure DNS

Answer

B. Azure AI Content Safety

Explanation

Azure AI Content Safety evaluates prompts and outputs for harmful content.

Question 8

Why are GPUs commonly used in AI image editing?

A. GPUs reduce storage requirements
B. GPUs improve parallel processing performance
C. GPUs eliminate the need for prompts
D. GPUs automatically create masks

Answer

B. GPUs improve parallel processing performance

Explanation

Image editing requires intensive parallel computations that GPUs handle efficiently.

Question 9

Which Azure service is commonly used to store edited image outputs?

A. Azure Queue Storage
B. Azure Blob Storage
C. Azure DNS
D. Azure Firewall

Answer

B. Azure Blob Storage

Explanation

Azure Blob Storage is commonly used for storing media assets.

Question 10

What is a key Responsible AI concern in AI-powered image editing?

A. Deepfake misuse
B. Reduced storage capacity
C. Faster SQL queries
D. Lower network utilization

Answer

A. Deepfake misuse

Explanation

AI image editing can potentially be used to create misleading or impersonated content.

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Computer Vision, Generative AI, Microsoft Certification May 25, 2026

Implement a solution that generates images from text prompts and reference media (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement computer vision solutions (10–15%)
   --> Design and implement image- and video-generation solutions
      --> Implement a solution that generates images from text prompts and reference media

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the rapidly growing areas of generative AI is AI-powered image generation. Modern AI systems can create realistic or artistic images using:

Natural language prompts
Existing reference images
Style examples
Sketches
Masks
Multi-modal inputs

For the AI-103 exam, you should understand how to design and implement solutions that generate images from:

Text prompts
Reference media
Multi-modal instructions

You should also understand:

Prompt engineering for image generation
Image editing workflows
Responsible AI considerations
Model selection
Content safety
Image generation architectures
Azure AI services involved in image generation solutions

This topic falls under:

“Design and implement image- and video-generation solutions”

What Is AI Image Generation?

AI image generation uses generative AI models to create images based on input instructions.

Inputs may include:

Text prompts
Existing images
Style references
Sketches
Masks
Layout guides

Outputs may include:

Photorealistic images
Illustrations
Concept art
Product mockups
Marketing graphics
Variations of existing images

Text-to-Image Generation

What Is Text-to-Image Generation?

Text-to-image generation converts natural language descriptions into images.

Example prompt:

A futuristic city skyline at sunset with flying cars and neon lights

The model interprets:

Objects
Style
Lighting
Composition
Mood
Color
Context

and generates a matching image.

Common Use Cases

Marketing and Advertising

Generate:

Social media graphics
Product campaigns
Brand concepts

Entertainment and Gaming

Create:

Concept art
Characters
Environments
Storyboards

E-Commerce

Generate:

Product mockups
Lifestyle imagery
Variations of products

Education and Training

Create:

Diagrams
Simulations
Visual explanations

Design Prototyping

Generate:

UI concepts
Architecture ideas
Interior design concepts

Image Generation Models

Image generation solutions commonly use diffusion-based generative models.

These models learn patterns from massive image datasets and generate new images from learned representations.

Diffusion Models

What Is a Diffusion Model?

A diffusion model works by:

Starting with random noise
Iteratively refining the image
Aligning the image with the prompt

The model gradually transforms noise into meaningful visuals.

Prompt Interpretation

Image generation models interpret prompts using:

Natural language processing
Cross-modal embeddings
Attention mechanisms

Prompt wording strongly influences the final image.

Prompt Engineering for Image Generation

Why Prompt Engineering Matters

The quality of generated images depends heavily on prompt design.

Good prompts improve:

Accuracy
Style consistency
Composition
Realism
Artistic control

Effective Prompt Components

A strong prompt often includes:

Component	Example
Subject	“A golden retriever”
Environment	“on a tropical beach”
Style	“watercolor painting”
Lighting	“soft sunset lighting”
Camera angle	“wide-angle shot”
Quality modifiers	“highly detailed”

Example Prompt

			
A highly detailed watercolor painting of a golden retriever sitting on a tropical beach during sunset, cinematic lighting, ultra realistic

Negative Prompts

Negative prompts specify what should NOT appear.

Example:

blurry, distorted, low quality, extra limbs

Negative prompts improve output quality.

Image-to-Image Generation

What Is Image-to-Image Generation?

Image-to-image generation uses an existing image as a reference or starting point.

The model modifies or transforms the image while preserving certain characteristics.

Common Image-to-Image Tasks

Style Transfer

Convert images into:

Oil paintings
Anime
Sketches
Watercolors

Image Variations

Generate alternate versions of an image.

Background Replacement

Modify image backgrounds while preserving subjects.

Image Enhancement

Improve:

Resolution
Sharpness
Lighting

Object Replacement

Replace objects while maintaining scene consistency.

Reference Media in Image Generation

Reference media provides guidance to the model.

Examples include:

Existing photos
Character references
Brand assets
Style examples
Sketches

Benefits of Reference Media

Reference media helps maintain:

Visual consistency
Brand identity
Character appearance
Artistic style
Composition structure

Multi-Modal Image Generation

Modern systems often combine:

Text
Images
Layout instructions
Style guidance

This is called multi-modal generation.

Example Multi-Modal Workflow

Inputs:

Product image
Brand style guide
Text prompt

Output:

Marketing-ready advertisement image

Inpainting

What Is Inpainting?

Inpainting edits selected regions of an image.

A mask identifies which portion to modify.

Inpainting Use Cases

Object Removal

Remove unwanted items from photos.

Background Editing

Replace scenery or environments.

Image Repair

Restore damaged images.

Content Replacement

Modify clothing, objects, or text.

Outpainting

What Is Outpainting?

Outpainting expands an image beyond its original borders.

Example:

Extending landscapes
Expanding backgrounds
Creating panoramic views

Image Generation Workflow

A typical workflow includes:

User submits prompt
System validates request
Prompt preprocessing occurs
Model generates image
Safety checks run
Output returned or stored

Safety and Responsible AI

Image generation introduces important Responsible AI concerns.

Common Risks

Harmful Content

Generated images may contain:

Violence
Hate symbols
Explicit content

Deepfakes

AI-generated media may impersonate real people.

Copyright Concerns

Generated images may resemble copyrighted material.

Bias and Representation Issues

Models may unintentionally reinforce stereotypes.

Azure AI Content Safety

Microsoft provides:
Azure AI Content Safety

to help detect:

Harmful prompts
Unsafe outputs
Policy violations

Content Filtering

Content filtering may:

Block prompts
Reject unsafe generations
Flag suspicious content
Require moderation review

Watermarking and Provenance

Some AI systems include:

Watermarking
Metadata tagging
Content provenance tracking

These help identify AI-generated images.

Latency and Performance Considerations

Image generation can be computationally expensive.

Performance depends on:

Model size
Image resolution
Prompt complexity
Hardware acceleration
Batch size

GPU Acceleration

Image generation commonly relies on GPUs because of:

Parallel processing
Matrix computation efficiency

Optimization Techniques

Lower Resolution Generation

Generate smaller images faster.

Progressive Upscaling

Generate low-resolution images first, then upscale.

Caching

Reuse repeated assets or prompts.

Batch Processing

Generate multiple images simultaneously.

Azure Services for Image Generation Solutions

Azure OpenAI Service

Supports:

Image generation models
Multi-modal AI capabilities
Prompt-based image workflows

Azure AI Foundry

Used for:

Model management
Prompt orchestration
AI workflow development
Evaluation pipelines

Azure AI Vision

Can support:

Image analysis
Captioning
Object detection
Visual processing workflows

Azure Blob Storage

Frequently used for:

Storing generated images
Media asset management
Workflow integration

Integrating Image Generation into Applications

Applications may integrate image generation into:

Chatbots
Design tools
Marketing platforms
CMS systems
Mobile apps
AI agents

Example Architecture

A marketing image generation solution may include:

Front-end web application
Azure OpenAI image model
Azure AI Content Safety validation
Blob Storage for generated images
Azure Functions for orchestration
Monitoring and logging systems

Observability for Image Generation

Production image systems should monitor:

Request volume
Generation latency
Failed requests
Safety violations
GPU utilization
Cost metrics

Prompt Versioning

Prompt versioning tracks changes to prompts over time.

Benefits:

Reproducibility
Experimentation
Rollback capability
Quality comparisons

Human-in-the-Loop Validation

Some enterprise systems require manual review for:

Brand-sensitive images
Public-facing content
Regulated industries

Best Practices for Image Generation Solutions

Use Clear Prompts

Detailed prompts improve output quality.

Validate Inputs

Screen prompts for unsafe or prohibited content.

Use Reference Images Carefully

Ensure proper licensing and compliance.

Implement Content Safety

Apply filtering to both prompts and outputs.

Monitor Costs

Image generation can be resource-intensive.

Optimize for Latency

Balance quality with performance requirements.

Maintain Audit Logs

Track prompts, outputs, and moderation decisions.

Use Human Review for High-Risk Content

Particularly important in regulated industries.

Real-World Example

An e-commerce retailer may implement an AI image generation solution that:

Accepts a product image
Accepts a text prompt:

			
Create a luxury holiday advertisement featuring this watch in a snowy mountain setting

Generates multiple variations
Applies content safety checks
Stores approved images in Azure Blob Storage

This demonstrates:

Text-to-image generation
Reference image usage
Workflow orchestration
Safety validation

Exam Tips for AI-103

For the AI-103 exam, remember these important concepts:

Text-to-image generation creates images from natural language prompts.
Image-to-image generation modifies or transforms existing images.
Reference media helps maintain consistency and style.
Diffusion models are commonly used for image generation.
Prompt engineering strongly affects image quality.
Inpainting edits selected portions of images.
Outpainting expands image boundaries.
Responsible AI and content safety are critical.
Azure AI Content Safety helps filter unsafe prompts and outputs.
Generated images are often stored using Azure Blob Storage.
GPU acceleration is important for performance.

Practice Exam Questions

Question 1

What is the primary purpose of text-to-image generation?

A. Compressing images
B. Generating images from natural language descriptions
C. Encrypting image files
D. Detecting malware

Answer

B. Generating images from natural language descriptions

Explanation

Text-to-image generation creates visuals based on natural language prompts.

Question 2

Which type of model is commonly used for AI image generation?

A. Relational models
B. Diffusion models
C. Decision trees
D. DNS models

Answer

B. Diffusion models

Explanation

Diffusion models generate images by refining random noise iteratively.

Question 3

What is the purpose of a negative prompt?

A. Increasing storage space
B. Specifying undesirable image characteristics
C. Encrypting generated images
D. Reducing image resolution

Answer

B. Specifying undesirable image characteristics

Explanation

Negative prompts help prevent unwanted features from appearing in outputs.

Question 4

What does image-to-image generation primarily use as input?

A. Only audio data
B. Only tabular data
C. Existing images as references
D. SQL databases

Answer

C. Existing images as references

Explanation

Image-to-image workflows transform or modify existing images.

Question 5

What is inpainting?

A. Compressing image files
B. Expanding image borders
C. Editing selected image regions using masks
D. Detecting objects in video streams

Answer

C. Editing selected image regions using masks

Explanation

Inpainting modifies specific portions of an image.

Question 6

What is outpainting?

A. Detecting image corruption
B. Expanding an image beyond its original boundaries
C. Removing metadata from images
D. Converting images to grayscale

Answer

B. Expanding an image beyond its original boundaries

Explanation

Outpainting extends the visible image area.

Question 7

Which Azure service helps detect harmful AI-generated content?

A. Azure AI Content Safety
B. Azure CDN
C. Azure DNS
D. Azure Firewall

Answer

A. Azure AI Content Safety

Explanation

Azure AI Content Safety evaluates prompts and outputs for policy violations.

Question 8

Why is GPU acceleration commonly used in image generation?

A. GPUs reduce internet bandwidth usage
B. GPUs improve parallel computation performance
C. GPUs eliminate all latency
D. GPUs remove the need for prompts

Answer

B. GPUs improve parallel computation performance

Explanation

Image generation requires intensive matrix computations that GPUs handle efficiently.

Question 9

What is a key benefit of using reference media?

A. Eliminating all hallucinations
B. Maintaining visual consistency and style
C. Encrypting prompts automatically
D. Reducing storage costs

Answer

B. Maintaining visual consistency and style

Explanation

Reference images help preserve branding, character appearance, and artistic style.

Question 10

Which Azure storage service is commonly used for storing generated images?

A. Azure Queue Storage
B. Azure Blob Storage
C. Azure Table Storage
D. Azure DNS

Answer

B. Azure Blob Storage

Explanation

Azure Blob Storage is commonly used for storing media assets and generated images.

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Generative AI, Microsoft Certification May 25, 2026May 25, 2026

Implement a solution that generates videos from text prompts and reference media (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement computer vision solutions (10–15%)
   --> Design and implement image- and video-generation solutions
      --> Implement a solution that generates videos from text prompts and reference media

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Generative AI is rapidly expanding beyond text and images into video generation. Modern AI systems can now create short videos, animations, cinematic scenes, marketing clips, and visual simulations using:

Natural language prompts
Existing videos
Reference images
Style examples
Storyboards
Multi-modal inputs

For the AI-103 certification exam, you should understand how to design and implement solutions that generate videos from:

Text prompts
Reference media
Multi-modal instructions

You should also understand:

Video generation workflows
Multi-modal AI concepts
Prompt engineering for video
Video editing and transformation
Responsible AI considerations
Performance and scalability
Azure AI services used in video generation pipelines

This topic falls under:

“Design and implement image- and video-generation solutions”

What Is AI Video Generation?

AI video generation uses generative AI models to create or modify videos based on user instructions.

Inputs may include:

Text prompts
Images
Existing videos
Style references
Scene descriptions
Character references
Motion instructions

Outputs may include:

Animated clips
Cinematic scenes
Marketing videos
Product demonstrations
Simulated environments
AI-enhanced video edits

Text-to-Video Generation

What Is Text-to-Video Generation?

Text-to-video generation converts natural language descriptions into video sequences.

Example prompt:

			
A drone flying through a futuristic city at night with neon lights reflecting on wet streets

The model interprets:

Objects
Movement
Lighting
Scene transitions
Camera motion
Temporal consistency

and generates a video sequence.

How Video Generation Differs from Image Generation

Video generation is more complex because models must maintain:

Motion consistency
Temporal continuity
Object persistence
Lighting stability
Camera coherence

Instead of generating a single frame, the model generates a sequence of connected frames.

Temporal Consistency

What Is Temporal Consistency?

Temporal consistency ensures that:

Objects remain stable across frames
Characters retain appearance
Motion looks natural
Lighting stays coherent

Without temporal consistency:

Objects may flicker
Faces may distort
Backgrounds may shift unpredictably

Common Video Generation Use Cases

Marketing and Advertising

Generate:

Promotional videos
Social media content
Product showcases

Entertainment and Media

Create:

Animations
Storyboards
Visual effects
Cinematic previews

Education and Training

Generate:

Simulations
Tutorials
Visual explanations

Gaming

Create:

Cutscenes
Environmental animations
NPC interactions

Enterprise Applications

Generate:

Training videos
Virtual demonstrations
AI-powered presentations

Video Generation Models

Modern AI video systems commonly use:

Diffusion models
Transformer architectures
Multi-modal generative models

These models learn relationships between:

Text
Images
Motion
Time sequences

Diffusion Models for Video

Video diffusion models operate similarly to image diffusion models but add temporal processing.

The model:

Starts with noisy frames
Gradually refines them
Maintains frame-to-frame consistency

Multi-Modal Video Generation

Video generation often combines:

Text prompts
Images
Motion guidance
Audio
Style references

This is called multi-modal generation.

Example Multi-Modal Workflow

Inputs:

Character image
Text prompt
Style reference

Output:

Animated video clip matching the character and style

Prompt Engineering for Video Generation

Why Prompt Engineering Matters

Prompt design strongly affects:

Scene quality
Motion realism
Camera movement
Style consistency
Subject accuracy

Effective Video Prompt Components

Strong prompts often include:

Component	Example
Subject	“A red sports car”
Action	“driving through mountain roads”
Environment	“during sunrise”
Camera movement	“cinematic tracking shot”
Style	“photorealistic”
Mood	“dramatic atmosphere”

Example Prompt

			
A photorealistic cinematic tracking shot of a red sports car driving through mountain roads during sunrise, dramatic atmosphere, ultra detailed

Camera and Motion Instructions

Prompts can specify:

Zoom
Pan
Tilt
Tracking shots
Slow motion
Time-lapse

Example:

Slow-motion close-up shot of ocean waves crashing against rocks

Reference Media in Video Generation

Reference media guides the model using:

Existing videos
Images
Character designs
Motion examples
Style references

Benefits of Reference Media

Reference media helps maintain:

Character consistency
Brand identity
Visual continuity
Artistic style
Scene structure

Image-to-Video Generation

What Is Image-to-Video Generation?

Image-to-video generation animates a static image.

The system adds:

Motion
Camera movement
Environmental effects
Character animation

Example

Input:

Portrait image

Prompt:

The person smiles gently while wind moves through their hair

Output:

Animated portrait video

Video-to-Video Transformation

What Is Video-to-Video Transformation?

Video-to-video systems modify existing videos while preserving motion structure.

Examples:

Style conversion
Cartoon transformation
Lighting changes
Scene modifications

Storyboard-Based Generation

Some systems generate videos from storyboard sequences.

Inputs may include:

Scene descriptions
Frame sketches
Timing instructions

The orchestration system generates connected scenes.

Video Editing with AI

Generative AI can also:

Remove objects
Replace backgrounds
Extend scenes
Improve quality
Add effects
Upscale video resolution

Inpainting for Video

Video inpainting edits selected regions across multiple frames.

Use cases:

Removing unwanted objects
Editing environments
Replacing logos
Correcting defects

Outpainting for Video

Video outpainting expands scenes beyond original frame boundaries.

Examples:

Widening landscapes
Expanding cinematic shots
Creating panoramic sequences

Responsible AI Considerations

Video generation introduces major Responsible AI concerns.

Deepfake Risks

AI-generated videos can impersonate real people.

Potential misuse includes:

Misinformation
Fraud
Identity impersonation

Harmful Content

Generated videos may contain:

Violence
Hate content
Explicit material
Unsafe scenarios

Copyright and Ownership

Generated videos may resemble:

Copyrighted characters
Artistic styles
Existing content

Organizations must ensure legal compliance.

Bias and Fairness

Generative systems may unintentionally reinforce:

Stereotypes
Representation bias
Cultural inaccuracies

Azure AI Content Safety

Microsoft provides:
Azure AI Content Safety

to help evaluate:

Unsafe prompts
Harmful generated outputs
Policy violations

Watermarking and Provenance

AI-generated videos may include:

Watermarks
Metadata
Provenance tracking

These help identify synthetic media.

Video Generation Workflow

A typical workflow may include:

User submits prompt
Input validation occurs
Reference media processed
Prompt enhancement
Video model generates frames
Temporal consistency checks occur
Safety filtering runs
Final rendering occurs
Video stored or streamed

Performance Considerations

Video generation is computationally expensive.

Factors affecting performance include:

Video length
Resolution
Frame rate
Model complexity
Hardware acceleration

GPU Acceleration

Video generation heavily relies on GPUs for:

Parallel frame generation
Matrix operations
Rendering acceleration

Latency Challenges

Video generation typically requires more time than image generation because:

Many frames must be generated
Temporal relationships must be preserved
Rendering workloads are larger

Optimization Techniques

Generate Lower Resolution Drafts

Preview before full rendering.

Frame Interpolation

Generate fewer frames and interpolate intermediate motion.

Batch Rendering

Process multiple frames simultaneously.

Progressive Rendering

Return low-quality previews while high-quality rendering continues.

Azure Services for Video Generation Solutions

Azure OpenAI Service

Supports:

Multi-modal AI workflows
Prompt-based generation
Integration with generative AI applications

Azure AI Foundry

Supports:

AI workflow orchestration
Prompt flows
Model evaluation
Multi-modal pipelines

Azure AI Vision

Can support:

Scene analysis
Object recognition
Video understanding workflows

Azure Blob Storage

Frequently used for:

Storing generated videos
Media asset management
Content delivery integration

Azure Functions

Often used for:

Video processing workflows
Trigger-based orchestration
Rendering automation

Integrating Video Generation into Applications

Applications may integrate AI video generation into:

Marketing platforms
Creative tools
Mobile apps
Enterprise copilots
Learning systems
Media production workflows

Example Enterprise Architecture

An enterprise training platform might:

Accept a text lesson
Generate storyboard prompts
Create AI-generated training videos
Apply narration and subtitles
Run safety validation
Store final videos in Blob Storage

Observability for Video Generation

Production systems should monitor:

Rendering latency
GPU utilization
Failed generations
Storage usage
Safety violations
Cost metrics

Human-in-the-Loop Review

Organizations often require manual review for:

Public-facing media
Brand-sensitive content
Regulated industries
High-risk synthetic media

Best Practices for Video Generation Solutions

Use Detailed Prompts

Detailed instructions improve video quality.

Use Reference Media Carefully

Ensure proper licensing and compliance.

Implement Content Safety

Validate prompts and generated outputs.

Monitor Computational Costs

Video generation can be expensive.

Optimize for Performance

Balance quality with rendering time.

Track Provenance

Identify synthetic content appropriately.

Use Human Review for Sensitive Content

Particularly important for public or regulated use cases.

Real-World Example

A travel company may implement a video generation solution that:

Accepts destination photos
Accepts prompt:

			
Create a cinematic tropical vacation advertisement with drone footage, sunset lighting, and relaxing atmosphere

Generates short promotional videos
Applies safety and brand validation
Stores approved videos in Azure Blob Storage

This demonstrates:

Text-to-video generation
Reference media usage
Workflow orchestration
Responsible AI controls

Exam Tips for AI-103

For the AI-103 exam, remember these important concepts:

Text-to-video generation creates videos from natural language prompts.
Video generation requires temporal consistency across frames.
Reference media helps preserve style and continuity.
Multi-modal generation combines text, images, and motion guidance.
Prompt engineering strongly affects video quality.
Image-to-video generation animates static images.
Video-to-video transformation modifies existing videos.
Responsible AI concerns include deepfakes and harmful content.
Azure AI Content Safety helps moderate unsafe content.
GPU acceleration is critical for video generation performance.
Azure Blob Storage is commonly used for storing generated media.

Practice Exam Questions

Question 1

What is the primary purpose of text-to-video generation?

A. Compressing video files
B. Creating videos from natural language prompts
C. Encrypting media assets
D. Detecting malware in video streams

Answer

B. Creating videos from natural language prompts

Explanation

Text-to-video systems generate video sequences from prompt-based instructions.

Question 2

Why is temporal consistency important in AI video generation?

A. It reduces storage costs
B. It encrypts generated videos
C. It removes all latency
D. It ensures stable and coherent motion across frames

Answer

D. It ensures stable and coherent motion across frames

Explanation

Temporal consistency prevents flickering and maintains object continuity.

Question 3

What is image-to-video generation?

A. Converting videos into audio
B. Compressing images into ZIP files
C. Animating a static image into a video sequence
D. Translating subtitles automatically

Answer

C. Animating a static image into a video sequence

Explanation

Image-to-video generation adds movement and animation to still images.

Question 4

What is a common use of reference media in video generation?

A. Reducing network bandwidth
B. Maintaining visual consistency and style
C. Encrypting prompts
D. Eliminating GPU requirements

Answer

B. Maintaining visual consistency and style

Explanation

Reference media helps preserve branding, character appearance, and artistic direction.

Question 5

Which type of model is commonly used in AI video generation?

A. Diffusion models
B. Spreadsheet models
C. DNS models
D. Relational models

Answer

A. Diffusion models

Explanation

Diffusion-based architectures are widely used for generative media tasks.

Question 6

What is video inpainting?

A. Increasing frame rates automatically
B. Editing selected regions across video frames
C. Compressing video metadata
D. Removing subtitles

Answer

B. Editing selected regions across video frames

Explanation

Video inpainting modifies targeted portions of videos across multiple frames.

Question 7

Which Azure service helps detect harmful generated content?

A. Azure CDN
B. Azure Virtual WAN
C. Azure DNS
D. Azure AI Content Safety

Answer

D. Azure AI Content Safety

Explanation

Azure AI Content Safety evaluates prompts and outputs for unsafe or policy-violating content.

Question 8

Why are GPUs commonly used in video generation?

A. GPUs eliminate the need for prompts
B. GPUs improve parallel processing for rendering and generation
C. GPUs automatically moderate unsafe content
D. GPUs reduce internet latency

Answer

B. GPUs improve parallel processing for rendering and generation

Explanation

Video generation requires intensive computation that GPUs handle efficiently.

Question 9

Which Azure storage service is commonly used for storing generated videos?

A. Azure Blob Storage
B. Azure Queue Storage
C. Azure DNS
D. Azure Firewall

Answer

A. Azure Blob Storage

Explanation

Azure Blob Storage is commonly used for storing large media files.

Question 10

What is a major Responsible AI concern associated with AI-generated videos?

A. Deepfake misuse
B. Reduced CPU temperatures
C. Faster SQL queries
D. Lower image resolution

Answer

A. Deepfake misuse

Explanation

AI-generated videos can potentially be used for impersonation or misinformation.

Go to the AI-103 Exam Prep Hub main page

AI, AI-901, Generative AI, Microsoft Certification May 18, 2026

Create new visual outputs by using generative models (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
      --> Create new visual outputs by using generative models

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Generative AI models are capable of creating entirely new content based on patterns learned during training. One important category of generative AI focuses on producing visual outputs such as images, artwork, diagrams, and design concepts.

For the AI-901 certification exam, candidates should understand the foundational concepts behind creating new visual outputs by using generative AI models through Microsoft Azure AI Foundry and related Azure AI services.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.

What Is Generative AI?

Generative AI refers to AI systems capable of creating new content rather than simply analyzing existing data.

Generative AI can produce:

Text
Images
Audio
Video
Code

What Are Generative Image Models?

Generative image models create new visual content from prompts or instructions.

These models can generate:

Artwork
Illustrations
Photorealistic images
Concept designs
Marketing graphics

Example Prompt

“Create an image of a futuristic city at sunset.”

The model generates a new image based on the description.

Azure AI Foundry

Azure AI Foundry provides tools for building and deploying AI-powered applications, including generative AI solutions.

Developers can:

Access generative models
Test prompts
Deploy models
Build AI applications

Image Generation Workflow

A common image-generation workflow includes:

User enters prompt
Application sends prompt to model
Generative model creates image
Application displays generated output

Text-to-Image Generation

Text-to-image models generate images from natural-language prompts.

Example

Prompt

“A golden retriever wearing sunglasses on a beach.”

Result

A newly generated image matching the description.

Image Editing

Some generative models can modify existing images.

Capabilities may include:

Removing objects
Replacing backgrounds
Extending images
Applying artistic styles

Example

Original Image

Photo of a park

Prompt

“Add snow to the scene.”

The model generates an updated version of the image.

Style Transfer

Style transfer applies artistic styles to images.

Example

Prompt

“Make this image look like a watercolor painting.”

The AI transforms the image style.

Inpainting

Inpainting fills missing or selected portions of images.

Example

A damaged image has missing areas that the AI reconstructs.

Outpainting

Outpainting expands images beyond their original boundaries.

Example

A cropped landscape image is extended to show more scenery.

Prompt Engineering

Prompt engineering involves crafting prompts that improve AI-generated results.

Good prompts are:

Clear
Detailed
Specific

Weak Prompt Example

“Create a dog.”

Better Prompt Example

“Create a realistic golden retriever sitting beside a lake during sunset.”

System Prompts

System prompts guide the overall behavior of the AI model.

They may define:

Safety rules
Content restrictions
Tone
Style preferences

Model Parameters

Generative AI models may use parameters that influence output behavior.

Common concepts include:

Creativity/randomness
Response length
Style guidance

For AI-901, conceptual understanding is more important than memorizing exact parameter names.

APIs and Endpoints

Applications communicate with deployed generative models using:

APIs
Endpoints

These allow prompts and images to be processed programmatically.

Authentication

Applications must securely authenticate before using Azure AI services.

Common authentication methods include:

API keys
Azure credentials
Managed identities

User Interface Components

A lightweight image-generation application may include:

Prompt text box
Image upload option
Generate button
Image display area

Real-Time Generation

Some applications generate images interactively in near real time.

This improves user experience and experimentation.

Common Real-World Scenarios

Scenario 1: Marketing Content Creation

Goal

Generate promotional graphics.

Features

Text-to-image generation
Brand-aligned designs
Rapid content creation

Scenario 2: Product Concept Design

Goal

Visualize product ideas.

Features

Prototype generation
Style experimentation
Rapid iteration

Scenario 3: Educational Content

Goal

Generate learning visuals and illustrations.

Features

Diagram generation
Visual storytelling
Accessibility support

Scenario 4: Entertainment and Gaming

Goal

Create concept art and environments.

Features

Character design
Landscape generation
Artistic experimentation

Responsible AI Considerations

Generative image applications should follow Responsible AI principles.

Key considerations include:

Fairness
Privacy
Transparency
Inclusiveness
Accountability
Security

Copyright and Intellectual Property

Organizations should consider:

Ownership rights
Licensing concerns
Use of copyrighted material

Generated content may still raise legal and ethical questions.

Harmful Content Risks

Generative AI systems may create:

Offensive content
Misleading images
Unsafe material

Content filtering and moderation are important safeguards.

Deepfakes

AI-generated images or videos designed to imitate real people are called deepfakes.

Deepfakes can create ethical and security concerns.

Hallucinations

Generative models may produce inaccurate or unrealistic outputs.

These incorrect outputs are called hallucinations.

Bias and Fairness

Generated images may unintentionally reflect societal biases.

Examples include:

Stereotypical portrayals
Uneven representation
Cultural bias

Transparency

Users should understand:

AI generated the image
Outputs may contain inaccuracies
Images may be synthetic rather than real

Error Handling

Applications should handle:

Invalid prompts
Unsupported file types
Network interruptions
Authentication failures
Rate limits

Advantages of Generative Image Models

Benefits include:

Faster content creation
Creative assistance
Rapid prototyping
Automation
Enhanced user engagement

Limitations of Generative Models

Challenges include:

Hallucinations
Bias
Ethical concerns
Copyright uncertainty
Variable output quality

High-Level Workflow

A simplified workflow includes:

User enters prompt
Application sends request
Model generates image
Application displays output

Example High-Level Pseudocode

			
prompt = get_prompt()
image = generate_image(prompt)
display_image(image)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Important AI-901 Exam Tips

For the exam, remember these key points:

Generative AI creates new content.
Text-to-image models generate images from prompts.
Azure AI Foundry supports generative AI development.
Prompt engineering improves output quality.
APIs and endpoints connect applications to AI services.
Authentication secures access to Azure AI resources.
Deepfakes are synthetic media designed to imitate real people.
Hallucinations are inaccurate AI-generated outputs.
Responsible AI principles apply to generative image systems.
Transparency is important when presenting AI-generated content.

Quick Knowledge Check

Question 1

What does a text-to-image model do?

Answer

Generates images from natural-language prompts.

Question 2

What is prompt engineering?

Answer

Designing prompts to improve AI-generated results.

Question 3

What are deepfakes?

Answer

AI-generated media designed to imitate real people.

Question 4

Why is transparency important in generative AI?

Answer

Users should understand that AI generated the content and that inaccuracies may exist.

Practice Exam Questions

Question 1

What is the PRIMARY purpose of a generative AI model?

A. To create new content based on learned patterns
B. To replace computer hardware
C. To increase internet bandwidth
D. To manage operating systems

Correct Answer

A. To create new content based on learned patterns

Explanation

Generative AI models create new outputs such as images, text, audio, or video using patterns learned during training.

Why the Other Answers Are Incorrect

B. To replace computer hardware

Generative AI is software-based and does not replace hardware.

C. To increase internet bandwidth

AI models do not improve network speeds.

D. To manage operating systems

Operating system management is unrelated to generative AI.

Question 2

What does a text-to-image model do?

A. Generates images from text prompts
B. Converts images into spreadsheets
C. Detects malware in files
D. Compresses image files automatically

Correct Answer

A. Generates images from text prompts

Explanation

Text-to-image models create images based on natural-language descriptions provided by users.

Why the Other Answers Are Incorrect

B. Converts images into spreadsheets

This is unrelated to generative AI.

C. Detects malware in files

This is a cybersecurity task.

D. Compresses image files automatically

Compression is unrelated to image generation.

Question 3

Which Microsoft platform provides tools for building and deploying generative AI applications?

A. Azure AI Foundry
B. Microsoft Paint
C. Windows File Explorer
D. Microsoft Notepad

Correct Answer

A. Azure AI Foundry

Explanation

Azure AI Foundry provides tools for deploying, testing, and managing AI-powered applications.

Why the Other Answers Are Incorrect

B. Microsoft Paint

Paint is a graphics editor, not an AI platform.

C. Windows File Explorer

This is a file management tool.

D. Microsoft Notepad

Notepad is a text editor.

Question 4

What is prompt engineering?

A. Designing prompts to improve AI-generated results
B. Repairing damaged computer hardware
C. Compressing images into smaller files
D. Monitoring internet traffic

Correct Answer

A. Designing prompts to improve AI-generated results

Explanation

Prompt engineering involves creating clear and specific prompts to guide AI systems toward better outputs.

Why the Other Answers Are Incorrect

B. Repairing damaged computer hardware

This is unrelated to AI prompting.

C. Compressing images into smaller files

Compression is unrelated to prompts.

D. Monitoring internet traffic

This is a networking task.

Question 5

Which prompt is MOST likely to generate a detailed image?

A. “Create a dog.”
B. “Generate.”
C. “Create a realistic golden retriever sitting beside a lake during sunset.”
D. “Image.”

Correct Answer

C. “Create a realistic golden retriever sitting beside a lake during sunset.”

Explanation

Detailed prompts generally produce more accurate and useful AI-generated images.

Why the Other Answers Are Incorrect

A. “Create a dog.”

This prompt is too vague.

B. “Generate.”

This provides almost no guidance.

D. “Image.”

This prompt is incomplete and unclear.

Question 6

What is inpainting?

A. Filling or reconstructing parts of an image
B. Converting speech into text
C. Detecting objects in video streams
D. Encrypting image files

Correct Answer

A. Filling or reconstructing parts of an image

Explanation

Inpainting allows AI to fill in missing or selected regions within an image.

Why the Other Answers Are Incorrect

B. Converting speech into text

This is speech recognition.

C. Detecting objects in video streams

This is a computer vision task.

D. Encrypting image files

Encryption is unrelated to inpainting.

Question 7

What are deepfakes?

A. AI-generated media designed to imitate real people
B. Hardware failures in AI systems
C. Encrypted image storage systems
D. High-speed networking protocols

Correct Answer

A. AI-generated media designed to imitate real people

Explanation

Deepfakes use generative AI to create realistic but synthetic media that imitates real individuals.

Why the Other Answers Are Incorrect

B. Hardware failures in AI systems

This is unrelated to generated media.

C. Encrypted image storage systems

This is unrelated to deepfakes.

D. High-speed networking protocols

Networking is unrelated to deepfake technology.

Question 8

How do applications typically communicate with deployed generative AI models?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor calibration settings
D. Through USB-only connections

Correct Answer

A. Through APIs and endpoints

Explanation

Applications use APIs and endpoints to send prompts and receive generated outputs from AI services.

Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to AI communication.

C. Through monitor calibration settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud AI services use network communication.

Question 9

Which Responsible AI concern is especially important for generative image models?

A. Preventing harmful or misleading content generation
B. Increasing keyboard typing speed
C. Improving spreadsheet formulas
D. Reducing monitor power consumption

Correct Answer

A. Preventing harmful or misleading content generation

Explanation

Generative AI systems can potentially create unsafe, offensive, or misleading content, making moderation and safeguards important.

Why the Other Answers Are Incorrect

B. Increasing keyboard typing speed

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to image generation.

D. Reducing monitor power consumption

This is unrelated to AI ethics.

Question 10

What are hallucinations in generative AI systems?

A. Inaccurate or fabricated AI-generated outputs
B. Hardware installation errors
C. Network outages
D. Audio playback failures

Correct Answer

A. Inaccurate or fabricated AI-generated outputs

Explanation

Hallucinations occur when generative AI produces incorrect, unrealistic, or invented outputs.

Why the Other Answers Are Incorrect

B. Hardware installation errors

This is unrelated to AI-generated content.

C. Network outages

This is a connectivity issue.

D. Audio playback failures

This is unrelated to generative image models.

Final Thoughts

Creating new visual outputs by using generative models is an important AI-901 certification topic. Microsoft expects candidates to understand the foundational concepts behind generative image AI, including text-to-image generation, prompt engineering, APIs, deployment, Responsible AI principles, hallucinations, and ethical considerations.

Azure AI Foundry provides powerful tools for building intelligent applications capable of generating creative visual content for business, education, accessibility, and entertainment scenarios.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Artificial Intelligence (AI), Computer Vision, Microsoft Certification May 18, 2026

Interpret visual input in prompts by using a deployed multimodal model (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
   --> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
      --> Interpret visual input in prompts by using a deployed multimodal model

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern AI systems are increasingly capable of understanding not only text and speech, but also visual information such as images and videos. Multimodal AI models combine multiple forms of input to generate intelligent responses and insights.

For the AI-901 certification exam, candidates should understand the foundational concepts behind interpreting visual input in prompts by using deployed multimodal models through Microsoft Azure AI Foundry and related Azure AI services.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.

What Is a Multimodal Model?

A multimodal model is an AI model capable of processing multiple types of input and output.

These modalities may include:

Text
Images
Speech/audio
Video

Multimodal models can combine information across different input types to generate responses.

What Is Visual Input?

Visual input refers to image or video data provided to an AI system.

Examples include:

Photographs
Screenshots
Documents
Charts
Diagrams
Videos

Example Visual Prompt

A user uploads a photo and asks:

“What objects are visible in this image?”

The AI analyzes the visual content and generates a response.

Computer Vision

Computer vision is the field of AI focused on enabling systems to interpret and understand visual information.

Computer vision tasks include:

Image classification
Object detection
Facial analysis
Optical character recognition (OCR)
Image captioning

Azure AI Vision

Azure AI Vision provides computer vision capabilities in Azure.

Features include:

Image analysis
OCR
Object detection
Image captioning
Face-related analysis

Azure AI Foundry

Azure AI Foundry provides tools for building and managing multimodal AI applications.

Developers can:

Deploy AI models
Test prompts
Analyze images
Build AI-powered apps

Deployed Models

A deployed model is an AI model made available for real-time use through a cloud endpoint.

Applications communicate with deployed models using APIs.

Visual Prompt Workflow

A common workflow includes:

User uploads image
Application sends image to multimodal model
Model analyzes visual content
Model generates response
Application displays results

Example Workflow

User Uploads Image

A photo of a dog playing in a park

User Prompt

“Describe this image.”

AI Response

“A brown dog is running through a grassy park.”

Image Classification

Image classification identifies the primary category of an image.

Example

Image

Picture of a cat

Classification

“Cat”

Object Detection

Object detection identifies and locates multiple objects within an image.

Example

Image

Street scene

Detected Objects

Car
Bicycle
Traffic light
Pedestrian

Optical Character Recognition (OCR)

OCR extracts text from images or scanned documents.

Example

Image

Photo of a receipt

Extracted Text

Store name
Total amount
Date

Image Captioning

Image captioning generates natural-language descriptions of images.

Example

Image

A child flying a kite

Caption

“A child flying a colorful kite in a field.”

Visual Question Answering

Some multimodal models can answer questions about images.

Example

Prompt

“How many people are in the image?”

The model analyzes the image and generates an answer.

Combining Text and Images

Multimodal systems often combine:

Text prompts
Visual input

This improves contextual understanding.

Example

Image

A restaurant menu

Prompt

“Which item appears to be vegetarian?”

The AI analyzes both the image and the prompt together.

APIs and Endpoints

Applications communicate with deployed multimodal models through:

APIs
Endpoints

These allow images and prompts to be submitted programmatically.

Authentication

Applications must securely authenticate before accessing Azure AI services.

Common methods include:

API keys
Azure credentials
Managed identities

User Interface Components

A lightweight visual AI application may include:

Image upload area
Prompt input box
Results display
Image preview

Real-Time Processing

Many multimodal applications support near real-time image analysis.

This enables interactive user experiences.

Common Real-World Scenarios

Scenario 1: Accessibility Assistant

Goal

Describe visual content for visually impaired users.

Features

Image captioning
OCR
Voice output

Scenario 2: Retail Product Recognition

Goal

Identify products from images.

Features

Object detection
Classification
Product lookup

Scenario 3: Document Processing

Goal

Extract information from scanned forms.

Features

OCR
Text extraction
Data analysis

Scenario 4: Content Moderation

Goal

Identify harmful or unsafe visual content.

Features

Image analysis
Safety filtering
Automated moderation

Responsible AI Considerations

Visual AI applications should follow Responsible AI principles.

Key considerations include:

Privacy
Fairness
Transparency
Inclusiveness
Accountability
Security

Privacy Concerns

Images may contain:

Personal information
Faces
Sensitive documents

Organizations should protect user data appropriately.

Bias and Fairness

Computer vision systems may perform unevenly across:

Skin tones
Age groups
Lighting conditions
Demographics

Organizations should evaluate models carefully for fairness.

Transparency

Users should understand:

AI is analyzing images
AI-generated descriptions may contain errors
Images may be stored or processed in the cloud

Hallucinations

Multimodal AI systems may generate inaccurate visual descriptions.

These incorrect outputs are called hallucinations.

Applications should not assume all AI-generated outputs are accurate.

Error Handling

Applications should handle:

Unsupported image formats
Low-quality images
Network failures
Authentication errors
Rate limits

Image Quality Challenges

Poor image quality can reduce accuracy.

Examples include:

Blurry images
Poor lighting
Occluded objects
Low resolution

Advantages of Visual AI Applications

Benefits include:

Automation
Faster analysis
Accessibility improvements
Improved user experiences
Scalable image processing

Limitations of Visual AI Applications

Challenges include:

Recognition inaccuracies
Bias
Privacy concerns
Hallucinations
Sensitivity to image quality

High-Level Workflow

A simplified workflow includes:

Upload image
Send image and prompt to model
Analyze visual content
Generate response
Display results

Example High-Level Pseudocode

			
image = upload_image()
prompt = get_prompt()
response = analyze_image(image, prompt)
display_response(response)

For AI-901, understanding the workflow is more important than memorizing exact syntax.

Important AI-901 Exam Tips

For the exam, remember these key points:

Multimodal models process multiple data types.
Visual input includes images and video.
Azure AI Vision supports computer vision workloads.
OCR extracts text from images.
Image captioning generates descriptions of images.
Object detection identifies multiple objects in images.
APIs and endpoints connect applications to AI services.
Authentication secures AI access.
Responsible AI principles apply to computer vision systems.
Hallucinations are inaccurate AI-generated outputs.

Quick Knowledge Check

Question 1

What is OCR used for?

Answer

Extracting text from images or scanned documents.

Question 2

What does image captioning do?

Answer

Generates natural-language descriptions of images.

Question 3

Why are multimodal models useful?

Answer

They can process multiple types of input such as text and images together.

Question 4

Why is fairness important in computer vision?

Answer

To reduce biased or uneven performance across different groups of people.

Practice Exam Questions

Question 1

What is a multimodal AI model?

A. A model that processes only text
B. A model capable of processing multiple types of input such as text and images
C. A model used only for networking
D. A model designed exclusively for spreadsheets

Correct Answer

B. A model capable of processing multiple types of input such as text and images

Explanation

Multimodal models can process and combine different forms of input, including text, images, audio, and video.

Why the Other Answers Are Incorrect

A. A model that processes only text

That describes a text-only model.

C. A model used only for networking

Networking is unrelated to multimodal AI.

D. A model designed exclusively for spreadsheets

This is unrelated to AI modalities.

Question 2

Which Azure service provides computer vision capabilities such as image analysis and OCR?

A. Azure AI Vision
B. Azure Backup
C. Azure Virtual Desktop
D. Azure Monitor

Correct Answer

A. Azure AI Vision

Explanation

Azure AI Vision provides computer vision features including OCR, object detection, and image captioning.

Why the Other Answers Are Incorrect

B. Azure Backup

This is a backup service.

C. Azure Virtual Desktop

This provides desktop virtualization.

D. Azure Monitor

This is used for monitoring and diagnostics.

Question 3

What does OCR stand for?

A. Optical Character Recognition
B. Operational Cloud Routing
C. Object Classification Registry
D. Open Compute Rendering

Correct Answer

A. Optical Character Recognition

Explanation

OCR extracts text from images or scanned documents.

Why the Other Answers Are Incorrect

B. Operational Cloud Routing

This is not an AI vision term.

C. Object Classification Registry

This is not the meaning of OCR.

D. Open Compute Rendering

This is unrelated to text extraction.

Question 4

What is the PRIMARY purpose of object detection?

A. To identify and locate objects within an image
B. To translate speech into text
C. To summarize long documents
D. To improve internet speed

Correct Answer

A. To identify and locate objects within an image

Explanation

Object detection identifies multiple objects and their positions within an image.

Why the Other Answers Are Incorrect

B. To translate speech into text

This is a speech recognition task.

C. To summarize long documents

This is a text analysis task.

D. To improve internet speed

Object detection does not affect networking.

Question 5

What does image captioning do?

A. Generates natural-language descriptions of images
B. Converts text into audio
C. Detects malware in files
D. Compresses images automatically

Correct Answer

A. Generates natural-language descriptions of images

Explanation

Image captioning uses AI to describe visual content in natural language.

Why the Other Answers Are Incorrect

B. Converts text into audio

This is speech synthesis.

C. Detects malware in files

This is unrelated to computer vision.

D. Compresses images automatically

Captioning does not perform compression.

Question 6

How do applications typically communicate with deployed multimodal models?

A. Through APIs and endpoints
B. Through USB-only connections
C. Through monitor drivers
D. Through spreadsheet templates

Correct Answer

A. Through APIs and endpoints

Explanation

Applications use APIs and endpoints to send prompts and images to AI services.

Why the Other Answers Are Incorrect

B. Through USB-only connections

Cloud AI services use network communication.

C. Through monitor drivers

These are unrelated to AI communication.

D. Through spreadsheet templates

This is unrelated to AI integration.

Question 7

Why is authentication important when accessing Azure AI services?

A. To secure access to AI resources
B. To increase image resolution
C. To improve keyboard performance
D. To reduce monitor brightness

Correct Answer

A. To secure access to AI resources

Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.

Why the Other Answers Are Incorrect

B. To increase image resolution

Authentication does not affect image quality.

C. To improve keyboard performance

This is unrelated to AI services.

D. To reduce monitor brightness

Authentication does not control display settings.

Question 8

Which Responsible AI concern is especially important when analyzing images?

A. Protecting personal and sensitive visual information
B. Increasing video frame rates
C. Improving printer output quality
D. Accelerating spreadsheet calculations

Correct Answer

A. Protecting personal and sensitive visual information

Explanation

Images may contain faces, documents, or other sensitive information that must be protected.

Why the Other Answers Are Incorrect

B. Increasing video frame rates

This is unrelated to Responsible AI.

C. Improving printer output quality

Printers are unrelated to computer vision ethics.

D. Accelerating spreadsheet calculations

This is unrelated to image analysis.

Question 9

What are hallucinations in multimodal AI systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Internet connectivity issues
D. Audio recording problems

Correct Answer

A. Incorrect or fabricated AI-generated outputs

Explanation

Hallucinations occur when AI generates inaccurate or invented descriptions or answers.

Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated content.

C. Internet connectivity issues

This is a networking problem.

D. Audio recording problems

This relates to audio hardware or software.

Question 10

Which factor can negatively affect computer vision accuracy?

A. Poor image quality
B. Spreadsheet formatting
C. Screen brightness settings
D. Keyboard layout

Correct Answer

A. Poor image quality

Explanation

Blurry images, poor lighting, and low resolution can reduce computer vision accuracy.

Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect image analysis.

C. Screen brightness settings

This does not directly affect AI image processing.

D. Keyboard layout

Keyboard settings are unrelated to computer vision.

Final Thoughts

Interpreting visual input using deployed multimodal models is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand the foundational concepts behind computer vision and multimodal AI applications, including image analysis, OCR, object detection, image captioning, APIs, authentication, and Responsible AI principles.

Azure AI Vision and Azure AI Foundry provide powerful tools for building intelligent applications capable of understanding and responding to visual information in real-world scenarios.

Go to the AI-901 Exam Prep Hub main page

AI, AI-901, Computer Vision, Generative AI, Microsoft Certification May 18, 2026

Identify features and capabilities of Computer Vision and Image-Generation models (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
   --> Identify AI workloads
      --> Identify features and capabilities of Computer Vision and Image-Generation models

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Computer vision and image-generation AI models are important AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new images using machine learning and deep learning technologies.

These AI capabilities are widely used in healthcare, manufacturing, security, retail, entertainment, accessibility, and many other industries.

This topic falls under the “Identify AI workloads” section of the AI-901 exam objectives.

What Is Computer Vision?

Computer vision is an AI workload that enables computers to analyze and interpret images and video.

Computer vision systems attempt to simulate human visual understanding.

These systems can:

Identify objects
Detect faces
Read text
Analyze scenes
Track movement
Recognize patterns

How Computer Vision Works

Computer vision models are typically trained using large collections of labeled images.

The models learn patterns such as:

Shapes
Colors
Textures
Edges
Spatial relationships

Modern computer vision systems commonly use:

Deep learning
Neural networks
Convolutional Neural Networks (CNNs)

Common Computer Vision Capabilities

For the AI-901 exam, important computer vision capabilities include:

Image classification
Object detection
Facial recognition
Optical Character Recognition (OCR)
Image analysis
Image tagging

Image Classification

Image classification identifies the primary subject or category of an image.

The model assigns labels to entire images.

Image Classification Example

Input

An image of a dog.

Output

“Dog”

Common Use Cases for Image Classification

Medical Imaging

Classifying medical scans.

Retail

Categorizing products automatically.

Agriculture

Identifying plant diseases.

Wildlife Monitoring

Recognizing animal species.

Object Detection

Object detection identifies and locates multiple objects within an image.

Unlike image classification, object detection can identify several objects and their positions.

Object Detection Example

Input

Street traffic image.

Output

Car
Pedestrian
Traffic light

with location boundaries around each object.

Common Use Cases for Object Detection

Autonomous Vehicles

Detecting vehicles and pedestrians.

Manufacturing

Identifying defective products.

Security Systems

Detecting unauthorized activity.

Retail Analytics

Monitoring customer movement in stores.

Facial Recognition

Facial recognition identifies or verifies individuals using facial features.

Common Facial Recognition Capabilities

Face Detection

Determines whether faces exist in an image.

Face Verification

Confirms whether two faces belong to the same person.

Face Identification

Identifies a person from a database of known individuals.

Common Use Cases for Facial Recognition

Smartphone Authentication

Unlocking phones using facial recognition.

Building Security

Controlling physical access.

Attendance Systems

Tracking employee attendance.

Airport Security

Identity verification systems.

Optical Character Recognition (OCR)

OCR extracts text from images, scanned documents, or photographs.

OCR converts visual text into machine-readable text.

OCR Example

Input

A scanned invoice image.

Output

Extracted text including:

Invoice number
Dates
Totals

Common OCR Use Cases

Invoice Processing

Automating financial workflows.

Document Digitization

Converting paper documents into searchable digital text.

Receipt Scanning

Extracting purchase information.

Accessibility

Reading text aloud for visually impaired users.

Image Tagging and Image Analysis

Image analysis systems can automatically generate descriptions or tags for images.

Example Tags

An image may receive tags such as:

Beach
Ocean
Sunset
Person

Common Use Cases

Photo Organization

Automatically categorizing photos.

Content Moderation

Identifying inappropriate images.

Search Optimization

Improving image search systems.

Video Analysis

Computer vision can also process video streams.

Common Video Analysis Tasks

Motion detection
Activity recognition
Traffic monitoring
Surveillance analysis

What Are Image-Generation Models?

Image-generation models create new images using AI.

These models learn visual patterns from training data and generate entirely new content.

Image-generation AI is part of generative AI.

How Image-Generation Models Work

Image-generation systems are trained on large image datasets.

The models learn relationships between:

Objects
Colors
Styles
Shapes
Text descriptions

Many systems use deep learning architectures such as:

Diffusion models
Generative Adversarial Networks (GANs)

Text-to-Image Generation

Text-to-image models generate images from written prompts.

Example

Prompt

“A futuristic city at sunset”

Output

An AI-generated image matching the description.

Common Use Cases for Image Generation

Marketing and Advertising

Creating promotional graphics.

Entertainment and Gaming

Generating concept art.

Design Assistance

Creating mockups or creative inspiration.

Education

Generating visual learning content.

Accessibility

Creating visual representations from text descriptions.

Image Editing and Enhancement

Some AI models can edit or enhance existing images.

Common Capabilities

Background removal
Image restoration
Colorization
Resolution enhancement
Style transfer

Deepfakes and Synthetic Media

AI-generated images and videos can create highly realistic synthetic content.

This technology can be useful but also creates ethical concerns.

Responsible AI Considerations

Computer vision and image-generation systems raise important Responsible AI considerations.

Organizations should consider:

Privacy
Consent
Bias
Security
Transparency
Misuse prevention

Bias in Vision Models

Computer vision systems may perform differently across demographic groups if training data is unbalanced.

Example risks include:

Facial recognition inaccuracies
Biased image classification
Unequal detection accuracy

Ethical Concerns with Image Generation

Potential concerns include:

Deepfakes
Misinformation
Copyright concerns
Identity misuse
Harmful content generation

Organizations should implement safeguards and moderation systems.

Azure AI Vision Services

Azure AI Vision Services provide prebuilt computer vision capabilities including:

Image analysis
OCR
Face detection
Object detection
Video analysis

Azure OpenAI and Image Generation

Azure OpenAI Service supports generative AI capabilities, including image-generation models.

These services help organizations build AI-powered creative applications.

Computer Vision vs. Image Generation

Capability	Purpose
Computer Vision	Analyze and understand images
Image Generation	Create new images

Real-World Examples

Scenario 1: Self-Driving Car

Goal

Detect vehicles and pedestrians.

Capability Used

Object detection

Scenario 2: Receipt Scanning App

Goal

Extract text from receipts.

Capability Used

OCR

Scenario 3: Social Media Photo Organization

Goal

Automatically tag uploaded photos.

Capability Used

Image analysis and tagging

Scenario 4: AI Art Generator

Goal

Create artwork from text prompts.

Capability Used

Image generation

Scenario 5: Smartphone Face Unlock

Goal

Verify user identity.

Capability Used

Facial recognition

Important AI-901 Exam Tips

For the exam, remember these key points:

Computer vision analyzes images and video.
Image classification labels entire images.
Object detection identifies and locates objects.
OCR extracts text from images.
Facial recognition identifies or verifies individuals.
Image-generation models create new images.
Text-to-image systems generate visuals from prompts.
Computer vision and generative AI are different workloads.
Responsible AI principles are important in vision systems.

Quick Knowledge Check

Question 1

What is the purpose of OCR?

Answer

To extract text from images or scanned documents.

Question 2

What is the difference between image classification and object detection?

Answer

Image classification labels an entire image, while object detection identifies and locates multiple objects within an image.

Question 3

What do image-generation models do?

Answer

They create new images using AI.

Question 4

Which AI capability is commonly used for smartphone face unlock?

Answer

Facial recognition.

Practice Exam Questions

Question 1

What is the PRIMARY purpose of computer vision?

A. Converting speech into text
B. Analyzing and understanding images and video
C. Predicting stock prices
D. Generating database queries

Correct Answer

B. Analyzing and understanding images and video

Explanation

Computer vision enables AI systems to interpret and analyze visual content such as images and video.

Why the Other Answers Are Incorrect

A. Converting speech into text

This is speech recognition.

C. Predicting stock prices

This is typically a regression task.

D. Generating database queries

This is unrelated to computer vision.

Question 2

Which computer vision capability identifies the main subject or category of an image?

A. OCR
B. Image classification
C. Speech synthesis
D. Clustering

Correct Answer

B. Image classification

Explanation

Image classification assigns labels or categories to entire images.

Why the Other Answers Are Incorrect

A. OCR

OCR extracts text from images.

C. Speech synthesis

Speech synthesis converts text into spoken audio.

D. Clustering

Clustering groups similar data.

Question 3

A self-driving car needs to identify pedestrians, traffic signs, and vehicles in real time.

Which AI capability is MOST appropriate?

A. Sentiment analysis
B. Object detection
C. Keyword extraction
D. Language detection

Correct Answer

B. Object detection

Explanation

Object detection identifies and locates multiple objects within images or video streams.

Why the Other Answers Are Incorrect

A. Sentiment analysis

Sentiment analysis evaluates emotional tone in text.

C. Keyword extraction

Keyword extraction identifies important phrases in text.

D. Language detection

Language detection identifies written languages.

Question 4

What is the PRIMARY purpose of Optical Character Recognition (OCR)?

A. Translating speech between languages
B. Extracting text from images or scanned documents
C. Detecting faces in photographs
D. Generating new artwork

Correct Answer

B. Extracting text from images or scanned documents

Explanation

OCR converts text within images into machine-readable text.

Why the Other Answers Are Incorrect

A. Translating speech between languages

This is speech translation.

C. Detecting faces in photographs

This is facial recognition or face detection.

D. Generating new artwork

This is an image-generation capability.

Question 5

Which AI capability is commonly used for smartphone face unlock features?

A. Facial recognition
B. Speech recognition
C. Regression
D. Text summarization

Correct Answer

A. Facial recognition

Explanation

Facial recognition systems identify or verify users using facial features.

Why the Other Answers Are Incorrect

B. Speech recognition

Speech recognition processes spoken language.

C. Regression

Regression predicts numeric values.

D. Text summarization

Summarization condenses text.

Question 6

What is the PRIMARY function of image-generation models?

A. Extracting text from images
B. Creating new images using AI
C. Detecting network intrusions
D. Translating written languages

Correct Answer

B. Creating new images using AI

Explanation

Image-generation models produce new visual content based on learned patterns and prompts.

Why the Other Answers Are Incorrect

A. Extracting text from images

This is OCR.

C. Detecting network intrusions

This is unrelated to image generation.

D. Translating written languages

This is an NLP capability.

Question 7

Which example BEST represents a text-to-image generation system?

A. A chatbot answering questions
B. An AI model creating artwork from a written prompt
C. A speech recognition application
D. A recommendation engine

Correct Answer

B. An AI model creating artwork from a written prompt

Explanation

Text-to-image systems generate images based on textual descriptions.

Why the Other Answers Are Incorrect

A. A chatbot answering questions

This is generative text AI.

C. A speech recognition application

Speech recognition converts speech into text.

D. A recommendation engine

Recommendation systems suggest products or content.

Question 8

What is the key difference between image classification and object detection?

A. Image classification processes audio while object detection processes video
B. Image classification labels an entire image, while object detection identifies and locates multiple objects
C. Object detection only works with text
D. There is no difference

Correct Answer

B. Image classification labels an entire image, while object detection identifies and locates multiple objects

Explanation

Image classification provides a label for an entire image, while object detection identifies multiple objects and their locations.

Why the Other Answers Are Incorrect

A. Image classification processes audio while object detection processes video

Both work with visual data.

C. Object detection only works with text

Object detection works with images and video.

D. There is no difference

These are distinct computer vision tasks.

Question 9

Which Responsible AI concern is MOST associated with image-generation systems?

A. Deepfakes and synthetic media misuse
B. Spreadsheet formatting errors
C. SQL indexing problems
D. Network bandwidth allocation

Correct Answer

A. Deepfakes and synthetic media misuse

Explanation

Image-generation AI can create highly realistic synthetic content, raising concerns about misinformation and misuse.

Why the Other Answers Are Incorrect

B. Spreadsheet formatting errors

This is unrelated to AI image generation.

C. SQL indexing problems

This is a database issue.

D. Network bandwidth allocation

This is unrelated to Responsible AI concerns.

Question 10

A retailer wants to automatically categorize product photos into categories such as shoes, shirts, and electronics.

Which AI capability is MOST appropriate?

A. Image classification
B. OCR
C. Speech synthesis
D. Sentiment analysis

Correct Answer

A. Image classification

Explanation

Image classification assigns category labels to images based on visual content.

Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Speech synthesis

Speech synthesis generates spoken audio.

D. Sentiment analysis

Sentiment analysis evaluates emotional tone in text.

Final Thoughts

Computer vision and image-generation AI models are essential components of modern AI systems and important topics for the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new content, along with common business scenarios where these technologies are applied.

These capabilities help organizations build intelligent visual applications using Azure AI services and generative AI technologies.

Go to the AI-901 Exam Prep Hub main page