Tag: Image Generation

Create new visual outputs by using generative models (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
--> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
--> Create new visual outputs by using generative models


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Generative AI models are capable of creating entirely new content based on patterns learned during training. One important category of generative AI focuses on producing visual outputs such as images, artwork, diagrams, and design concepts.

For the AI-901 certification exam, candidates should understand the foundational concepts behind creating new visual outputs by using generative AI models through Microsoft Azure AI Foundry and related Azure AI services.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.


What Is Generative AI?

Generative AI refers to AI systems capable of creating new content rather than simply analyzing existing data.

Generative AI can produce:

  • Text
  • Images
  • Audio
  • Video
  • Code

What Are Generative Image Models?

Generative image models create new visual content from prompts or instructions.

These models can generate:

  • Artwork
  • Illustrations
  • Photorealistic images
  • Concept designs
  • Marketing graphics

Example Prompt

“Create an image of a futuristic city at sunset.”

The model generates a new image based on the description.


Azure AI Foundry

Azure AI Foundry provides tools for building and deploying AI-powered applications, including generative AI solutions.

Developers can:

  • Access generative models
  • Test prompts
  • Deploy models
  • Build AI applications

Image Generation Workflow

A common image-generation workflow includes:

  1. User enters prompt
  2. Application sends prompt to model
  3. Generative model creates image
  4. Application displays generated output

Text-to-Image Generation

Text-to-image models generate images from natural-language prompts.


Example

Prompt

“A golden retriever wearing sunglasses on a beach.”

Result

A newly generated image matching the description.


Image Editing

Some generative models can modify existing images.

Capabilities may include:

  • Removing objects
  • Replacing backgrounds
  • Extending images
  • Applying artistic styles

Example

Original Image

Photo of a park

Prompt

“Add snow to the scene.”

The model generates an updated version of the image.


Style Transfer

Style transfer applies artistic styles to images.


Example

Prompt

“Make this image look like a watercolor painting.”

The AI transforms the image style.


Inpainting

Inpainting fills missing or selected portions of images.


Example

A damaged image has missing areas that the AI reconstructs.


Outpainting

Outpainting expands images beyond their original boundaries.


Example

A cropped landscape image is extended to show more scenery.


Prompt Engineering

Prompt engineering involves crafting prompts that improve AI-generated results.

Good prompts are:

  • Clear
  • Detailed
  • Specific

Weak Prompt Example

“Create a dog.”


Better Prompt Example

“Create a realistic golden retriever sitting beside a lake during sunset.”


System Prompts

System prompts guide the overall behavior of the AI model.

They may define:

  • Safety rules
  • Content restrictions
  • Tone
  • Style preferences

Model Parameters

Generative AI models may use parameters that influence output behavior.

Common concepts include:

  • Creativity/randomness
  • Response length
  • Style guidance

For AI-901, conceptual understanding is more important than memorizing exact parameter names.


APIs and Endpoints

Applications communicate with deployed generative models using:

  • APIs
  • Endpoints

These allow prompts and images to be processed programmatically.


Authentication

Applications must securely authenticate before using Azure AI services.

Common authentication methods include:

  • API keys
  • Azure credentials
  • Managed identities

User Interface Components

A lightweight image-generation application may include:

  • Prompt text box
  • Image upload option
  • Generate button
  • Image display area

Real-Time Generation

Some applications generate images interactively in near real time.

This improves user experience and experimentation.


Common Real-World Scenarios


Scenario 1: Marketing Content Creation

Goal

Generate promotional graphics.

Features

  • Text-to-image generation
  • Brand-aligned designs
  • Rapid content creation

Scenario 2: Product Concept Design

Goal

Visualize product ideas.

Features

  • Prototype generation
  • Style experimentation
  • Rapid iteration

Scenario 3: Educational Content

Goal

Generate learning visuals and illustrations.

Features

  • Diagram generation
  • Visual storytelling
  • Accessibility support

Scenario 4: Entertainment and Gaming

Goal

Create concept art and environments.

Features

  • Character design
  • Landscape generation
  • Artistic experimentation

Responsible AI Considerations

Generative image applications should follow Responsible AI principles.

Key considerations include:

  • Fairness
  • Privacy
  • Transparency
  • Inclusiveness
  • Accountability
  • Security

Copyright and Intellectual Property

Organizations should consider:

  • Ownership rights
  • Licensing concerns
  • Use of copyrighted material

Generated content may still raise legal and ethical questions.


Harmful Content Risks

Generative AI systems may create:

  • Offensive content
  • Misleading images
  • Unsafe material

Content filtering and moderation are important safeguards.


Deepfakes

AI-generated images or videos designed to imitate real people are called deepfakes.

Deepfakes can create ethical and security concerns.


Hallucinations

Generative models may produce inaccurate or unrealistic outputs.

These incorrect outputs are called hallucinations.


Bias and Fairness

Generated images may unintentionally reflect societal biases.

Examples include:

  • Stereotypical portrayals
  • Uneven representation
  • Cultural bias

Transparency

Users should understand:

  • AI generated the image
  • Outputs may contain inaccuracies
  • Images may be synthetic rather than real

Error Handling

Applications should handle:

  • Invalid prompts
  • Unsupported file types
  • Network interruptions
  • Authentication failures
  • Rate limits

Advantages of Generative Image Models

Benefits include:

  • Faster content creation
  • Creative assistance
  • Rapid prototyping
  • Automation
  • Enhanced user engagement

Limitations of Generative Models

Challenges include:

  • Hallucinations
  • Bias
  • Ethical concerns
  • Copyright uncertainty
  • Variable output quality

High-Level Workflow

A simplified workflow includes:

  1. User enters prompt
  2. Application sends request
  3. Model generates image
  4. Application displays output

Example High-Level Pseudocode

prompt = get_prompt()
image = generate_image(prompt)
display_image(image)

For AI-901, understanding the workflow is more important than memorizing exact syntax.


Important AI-901 Exam Tips

For the exam, remember these key points:

  • Generative AI creates new content.
  • Text-to-image models generate images from prompts.
  • Azure AI Foundry supports generative AI development.
  • Prompt engineering improves output quality.
  • APIs and endpoints connect applications to AI services.
  • Authentication secures access to Azure AI resources.
  • Deepfakes are synthetic media designed to imitate real people.
  • Hallucinations are inaccurate AI-generated outputs.
  • Responsible AI principles apply to generative image systems.
  • Transparency is important when presenting AI-generated content.

Quick Knowledge Check

Question 1

What does a text-to-image model do?

Answer

Generates images from natural-language prompts.


Question 2

What is prompt engineering?

Answer

Designing prompts to improve AI-generated results.


Question 3

What are deepfakes?

Answer

AI-generated media designed to imitate real people.


Question 4

Why is transparency important in generative AI?

Answer

Users should understand that AI generated the content and that inaccuracies may exist.


Practice Exam Questions

Question 1

What is the PRIMARY purpose of a generative AI model?

A. To create new content based on learned patterns
B. To replace computer hardware
C. To increase internet bandwidth
D. To manage operating systems


Correct Answer

A. To create new content based on learned patterns


Explanation

Generative AI models create new outputs such as images, text, audio, or video using patterns learned during training.


Why the Other Answers Are Incorrect

B. To replace computer hardware

Generative AI is software-based and does not replace hardware.

C. To increase internet bandwidth

AI models do not improve network speeds.

D. To manage operating systems

Operating system management is unrelated to generative AI.


Question 2

What does a text-to-image model do?

A. Generates images from text prompts
B. Converts images into spreadsheets
C. Detects malware in files
D. Compresses image files automatically


Correct Answer

A. Generates images from text prompts


Explanation

Text-to-image models create images based on natural-language descriptions provided by users.


Why the Other Answers Are Incorrect

B. Converts images into spreadsheets

This is unrelated to generative AI.

C. Detects malware in files

This is a cybersecurity task.

D. Compresses image files automatically

Compression is unrelated to image generation.


Question 3

Which Microsoft platform provides tools for building and deploying generative AI applications?

A. Azure AI Foundry
B. Microsoft Paint
C. Windows File Explorer
D. Microsoft Notepad


Correct Answer

A. Azure AI Foundry


Explanation

Azure AI Foundry provides tools for deploying, testing, and managing AI-powered applications.


Why the Other Answers Are Incorrect

B. Microsoft Paint

Paint is a graphics editor, not an AI platform.

C. Windows File Explorer

This is a file management tool.

D. Microsoft Notepad

Notepad is a text editor.


Question 4

What is prompt engineering?

A. Designing prompts to improve AI-generated results
B. Repairing damaged computer hardware
C. Compressing images into smaller files
D. Monitoring internet traffic


Correct Answer

A. Designing prompts to improve AI-generated results


Explanation

Prompt engineering involves creating clear and specific prompts to guide AI systems toward better outputs.


Why the Other Answers Are Incorrect

B. Repairing damaged computer hardware

This is unrelated to AI prompting.

C. Compressing images into smaller files

Compression is unrelated to prompts.

D. Monitoring internet traffic

This is a networking task.


Question 5

Which prompt is MOST likely to generate a detailed image?

A. “Create a dog.”
B. “Generate.”
C. “Create a realistic golden retriever sitting beside a lake during sunset.”
D. “Image.”


Correct Answer

C. “Create a realistic golden retriever sitting beside a lake during sunset.”


Explanation

Detailed prompts generally produce more accurate and useful AI-generated images.


Why the Other Answers Are Incorrect

A. “Create a dog.”

This prompt is too vague.

B. “Generate.”

This provides almost no guidance.

D. “Image.”

This prompt is incomplete and unclear.


Question 6

What is inpainting?

A. Filling or reconstructing parts of an image
B. Converting speech into text
C. Detecting objects in video streams
D. Encrypting image files


Correct Answer

A. Filling or reconstructing parts of an image


Explanation

Inpainting allows AI to fill in missing or selected regions within an image.


Why the Other Answers Are Incorrect

B. Converting speech into text

This is speech recognition.

C. Detecting objects in video streams

This is a computer vision task.

D. Encrypting image files

Encryption is unrelated to inpainting.


Question 7

What are deepfakes?

A. AI-generated media designed to imitate real people
B. Hardware failures in AI systems
C. Encrypted image storage systems
D. High-speed networking protocols


Correct Answer

A. AI-generated media designed to imitate real people


Explanation

Deepfakes use generative AI to create realistic but synthetic media that imitates real individuals.


Why the Other Answers Are Incorrect

B. Hardware failures in AI systems

This is unrelated to generated media.

C. Encrypted image storage systems

This is unrelated to deepfakes.

D. High-speed networking protocols

Networking is unrelated to deepfake technology.


Question 8

How do applications typically communicate with deployed generative AI models?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor calibration settings
D. Through USB-only connections


Correct Answer

A. Through APIs and endpoints


Explanation

Applications use APIs and endpoints to send prompts and receive generated outputs from AI services.


Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to AI communication.

C. Through monitor calibration settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud AI services use network communication.


Question 9

Which Responsible AI concern is especially important for generative image models?

A. Preventing harmful or misleading content generation
B. Increasing keyboard typing speed
C. Improving spreadsheet formulas
D. Reducing monitor power consumption


Correct Answer

A. Preventing harmful or misleading content generation


Explanation

Generative AI systems can potentially create unsafe, offensive, or misleading content, making moderation and safeguards important.


Why the Other Answers Are Incorrect

B. Increasing keyboard typing speed

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to image generation.

D. Reducing monitor power consumption

This is unrelated to AI ethics.


Question 10

What are hallucinations in generative AI systems?

A. Inaccurate or fabricated AI-generated outputs
B. Hardware installation errors
C. Network outages
D. Audio playback failures


Correct Answer

A. Inaccurate or fabricated AI-generated outputs


Explanation

Hallucinations occur when generative AI produces incorrect, unrealistic, or invented outputs.


Why the Other Answers Are Incorrect

B. Hardware installation errors

This is unrelated to AI-generated content.

C. Network outages

This is a connectivity issue.

D. Audio playback failures

This is unrelated to generative image models.


Final Thoughts

Creating new visual outputs by using generative models is an important AI-901 certification topic. Microsoft expects candidates to understand the foundational concepts behind generative image AI, including text-to-image generation, prompt engineering, APIs, deployment, Responsible AI principles, hallucinations, and ethical considerations.

Azure AI Foundry provides powerful tools for building intelligent applications capable of generating creative visual content for business, education, accessibility, and entertainment scenarios.


Go to the AI-901 Exam Prep Hub main page

Interpret visual input in prompts by using a deployed multimodal model (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
--> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
--> Interpret visual input in prompts by using a deployed multimodal model


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern AI systems are increasingly capable of understanding not only text and speech, but also visual information such as images and videos. Multimodal AI models combine multiple forms of input to generate intelligent responses and insights.

For the AI-901 certification exam, candidates should understand the foundational concepts behind interpreting visual input in prompts by using deployed multimodal models through Microsoft Azure AI Foundry and related Azure AI services.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.


What Is a Multimodal Model?

A multimodal model is an AI model capable of processing multiple types of input and output.

These modalities may include:

  • Text
  • Images
  • Speech/audio
  • Video

Multimodal models can combine information across different input types to generate responses.


What Is Visual Input?

Visual input refers to image or video data provided to an AI system.

Examples include:

  • Photographs
  • Screenshots
  • Documents
  • Charts
  • Diagrams
  • Videos

Example Visual Prompt

A user uploads a photo and asks:

“What objects are visible in this image?”

The AI analyzes the visual content and generates a response.


Computer Vision

Computer vision is the field of AI focused on enabling systems to interpret and understand visual information.

Computer vision tasks include:

  • Image classification
  • Object detection
  • Facial analysis
  • Optical character recognition (OCR)
  • Image captioning

Azure AI Vision

Azure AI Vision provides computer vision capabilities in Azure.

Features include:

  • Image analysis
  • OCR
  • Object detection
  • Image captioning
  • Face-related analysis

Azure AI Foundry

Azure AI Foundry provides tools for building and managing multimodal AI applications.

Developers can:

  • Deploy AI models
  • Test prompts
  • Analyze images
  • Build AI-powered apps

Deployed Models

A deployed model is an AI model made available for real-time use through a cloud endpoint.

Applications communicate with deployed models using APIs.


Visual Prompt Workflow

A common workflow includes:

  1. User uploads image
  2. Application sends image to multimodal model
  3. Model analyzes visual content
  4. Model generates response
  5. Application displays results

Example Workflow

User Uploads Image

A photo of a dog playing in a park

User Prompt

“Describe this image.”

AI Response

“A brown dog is running through a grassy park.”


Image Classification

Image classification identifies the primary category of an image.


Example

Image

Picture of a cat

Classification

“Cat”


Object Detection

Object detection identifies and locates multiple objects within an image.


Example

Image

Street scene

Detected Objects

  • Car
  • Bicycle
  • Traffic light
  • Pedestrian

Optical Character Recognition (OCR)

OCR extracts text from images or scanned documents.


Example

Image

Photo of a receipt

Extracted Text

  • Store name
  • Total amount
  • Date

Image Captioning

Image captioning generates natural-language descriptions of images.


Example

Image

A child flying a kite

Caption

“A child flying a colorful kite in a field.”


Visual Question Answering

Some multimodal models can answer questions about images.


Example

Prompt

“How many people are in the image?”

The model analyzes the image and generates an answer.


Combining Text and Images

Multimodal systems often combine:

  • Text prompts
  • Visual input

This improves contextual understanding.


Example

Image

A restaurant menu

Prompt

“Which item appears to be vegetarian?”

The AI analyzes both the image and the prompt together.


APIs and Endpoints

Applications communicate with deployed multimodal models through:

  • APIs
  • Endpoints

These allow images and prompts to be submitted programmatically.


Authentication

Applications must securely authenticate before accessing Azure AI services.

Common methods include:

  • API keys
  • Azure credentials
  • Managed identities

User Interface Components

A lightweight visual AI application may include:

  • Image upload area
  • Prompt input box
  • Results display
  • Image preview

Real-Time Processing

Many multimodal applications support near real-time image analysis.

This enables interactive user experiences.


Common Real-World Scenarios


Scenario 1: Accessibility Assistant

Goal

Describe visual content for visually impaired users.

Features

  • Image captioning
  • OCR
  • Voice output

Scenario 2: Retail Product Recognition

Goal

Identify products from images.

Features

  • Object detection
  • Classification
  • Product lookup

Scenario 3: Document Processing

Goal

Extract information from scanned forms.

Features

  • OCR
  • Text extraction
  • Data analysis

Scenario 4: Content Moderation

Goal

Identify harmful or unsafe visual content.

Features

  • Image analysis
  • Safety filtering
  • Automated moderation

Responsible AI Considerations

Visual AI applications should follow Responsible AI principles.

Key considerations include:

  • Privacy
  • Fairness
  • Transparency
  • Inclusiveness
  • Accountability
  • Security

Privacy Concerns

Images may contain:

  • Personal information
  • Faces
  • Sensitive documents

Organizations should protect user data appropriately.


Bias and Fairness

Computer vision systems may perform unevenly across:

  • Skin tones
  • Age groups
  • Lighting conditions
  • Demographics

Organizations should evaluate models carefully for fairness.


Transparency

Users should understand:

  • AI is analyzing images
  • AI-generated descriptions may contain errors
  • Images may be stored or processed in the cloud

Hallucinations

Multimodal AI systems may generate inaccurate visual descriptions.

These incorrect outputs are called hallucinations.

Applications should not assume all AI-generated outputs are accurate.


Error Handling

Applications should handle:

  • Unsupported image formats
  • Low-quality images
  • Network failures
  • Authentication errors
  • Rate limits

Image Quality Challenges

Poor image quality can reduce accuracy.

Examples include:

  • Blurry images
  • Poor lighting
  • Occluded objects
  • Low resolution

Advantages of Visual AI Applications

Benefits include:

  • Automation
  • Faster analysis
  • Accessibility improvements
  • Improved user experiences
  • Scalable image processing

Limitations of Visual AI Applications

Challenges include:

  • Recognition inaccuracies
  • Bias
  • Privacy concerns
  • Hallucinations
  • Sensitivity to image quality

High-Level Workflow

A simplified workflow includes:

  1. Upload image
  2. Send image and prompt to model
  3. Analyze visual content
  4. Generate response
  5. Display results

Example High-Level Pseudocode

image = upload_image()
prompt = get_prompt()
response = analyze_image(image, prompt)
display_response(response)

For AI-901, understanding the workflow is more important than memorizing exact syntax.


Important AI-901 Exam Tips

For the exam, remember these key points:

  • Multimodal models process multiple data types.
  • Visual input includes images and video.
  • Azure AI Vision supports computer vision workloads.
  • OCR extracts text from images.
  • Image captioning generates descriptions of images.
  • Object detection identifies multiple objects in images.
  • APIs and endpoints connect applications to AI services.
  • Authentication secures AI access.
  • Responsible AI principles apply to computer vision systems.
  • Hallucinations are inaccurate AI-generated outputs.

Quick Knowledge Check

Question 1

What is OCR used for?

Answer

Extracting text from images or scanned documents.


Question 2

What does image captioning do?

Answer

Generates natural-language descriptions of images.


Question 3

Why are multimodal models useful?

Answer

They can process multiple types of input such as text and images together.


Question 4

Why is fairness important in computer vision?

Answer

To reduce biased or uneven performance across different groups of people.


Practice Exam Questions

Question 1

What is a multimodal AI model?

A. A model that processes only text
B. A model capable of processing multiple types of input such as text and images
C. A model used only for networking
D. A model designed exclusively for spreadsheets


Correct Answer

B. A model capable of processing multiple types of input such as text and images


Explanation

Multimodal models can process and combine different forms of input, including text, images, audio, and video.


Why the Other Answers Are Incorrect

A. A model that processes only text

That describes a text-only model.

C. A model used only for networking

Networking is unrelated to multimodal AI.

D. A model designed exclusively for spreadsheets

This is unrelated to AI modalities.


Question 2

Which Azure service provides computer vision capabilities such as image analysis and OCR?

A. Azure AI Vision
B. Azure Backup
C. Azure Virtual Desktop
D. Azure Monitor


Correct Answer

A. Azure AI Vision


Explanation

Azure AI Vision provides computer vision features including OCR, object detection, and image captioning.


Why the Other Answers Are Incorrect

B. Azure Backup

This is a backup service.

C. Azure Virtual Desktop

This provides desktop virtualization.

D. Azure Monitor

This is used for monitoring and diagnostics.


Question 3

What does OCR stand for?

A. Optical Character Recognition
B. Operational Cloud Routing
C. Object Classification Registry
D. Open Compute Rendering


Correct Answer

A. Optical Character Recognition


Explanation

OCR extracts text from images or scanned documents.


Why the Other Answers Are Incorrect

B. Operational Cloud Routing

This is not an AI vision term.

C. Object Classification Registry

This is not the meaning of OCR.

D. Open Compute Rendering

This is unrelated to text extraction.


Question 4

What is the PRIMARY purpose of object detection?

A. To identify and locate objects within an image
B. To translate speech into text
C. To summarize long documents
D. To improve internet speed


Correct Answer

A. To identify and locate objects within an image


Explanation

Object detection identifies multiple objects and their positions within an image.


Why the Other Answers Are Incorrect

B. To translate speech into text

This is a speech recognition task.

C. To summarize long documents

This is a text analysis task.

D. To improve internet speed

Object detection does not affect networking.


Question 5

What does image captioning do?

A. Generates natural-language descriptions of images
B. Converts text into audio
C. Detects malware in files
D. Compresses images automatically


Correct Answer

A. Generates natural-language descriptions of images


Explanation

Image captioning uses AI to describe visual content in natural language.


Why the Other Answers Are Incorrect

B. Converts text into audio

This is speech synthesis.

C. Detects malware in files

This is unrelated to computer vision.

D. Compresses images automatically

Captioning does not perform compression.


Question 6

How do applications typically communicate with deployed multimodal models?

A. Through APIs and endpoints
B. Through USB-only connections
C. Through monitor drivers
D. Through spreadsheet templates


Correct Answer

A. Through APIs and endpoints


Explanation

Applications use APIs and endpoints to send prompts and images to AI services.


Why the Other Answers Are Incorrect

B. Through USB-only connections

Cloud AI services use network communication.

C. Through monitor drivers

These are unrelated to AI communication.

D. Through spreadsheet templates

This is unrelated to AI integration.


Question 7

Why is authentication important when accessing Azure AI services?

A. To secure access to AI resources
B. To increase image resolution
C. To improve keyboard performance
D. To reduce monitor brightness


Correct Answer

A. To secure access to AI resources


Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.


Why the Other Answers Are Incorrect

B. To increase image resolution

Authentication does not affect image quality.

C. To improve keyboard performance

This is unrelated to AI services.

D. To reduce monitor brightness

Authentication does not control display settings.


Question 8

Which Responsible AI concern is especially important when analyzing images?

A. Protecting personal and sensitive visual information
B. Increasing video frame rates
C. Improving printer output quality
D. Accelerating spreadsheet calculations


Correct Answer

A. Protecting personal and sensitive visual information


Explanation

Images may contain faces, documents, or other sensitive information that must be protected.


Why the Other Answers Are Incorrect

B. Increasing video frame rates

This is unrelated to Responsible AI.

C. Improving printer output quality

Printers are unrelated to computer vision ethics.

D. Accelerating spreadsheet calculations

This is unrelated to image analysis.


Question 9

What are hallucinations in multimodal AI systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Internet connectivity issues
D. Audio recording problems


Correct Answer

A. Incorrect or fabricated AI-generated outputs


Explanation

Hallucinations occur when AI generates inaccurate or invented descriptions or answers.


Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated content.

C. Internet connectivity issues

This is a networking problem.

D. Audio recording problems

This relates to audio hardware or software.


Question 10

Which factor can negatively affect computer vision accuracy?

A. Poor image quality
B. Spreadsheet formatting
C. Screen brightness settings
D. Keyboard layout


Correct Answer

A. Poor image quality


Explanation

Blurry images, poor lighting, and low resolution can reduce computer vision accuracy.


Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect image analysis.

C. Screen brightness settings

This does not directly affect AI image processing.

D. Keyboard layout

Keyboard settings are unrelated to computer vision.


Final Thoughts

Interpreting visual input using deployed multimodal models is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand the foundational concepts behind computer vision and multimodal AI applications, including image analysis, OCR, object detection, image captioning, APIs, authentication, and Responsible AI principles.

Azure AI Vision and Azure AI Foundry provide powerful tools for building intelligent applications capable of understanding and responding to visual information in real-world scenarios.


Go to the AI-901 Exam Prep Hub main page

Identify features and capabilities of Computer Vision and Image-Generation models (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
--> Identify AI workloads
--> Identify features and capabilities of Computer Vision and Image-Generation models


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Computer vision and image-generation AI models are important AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new images using machine learning and deep learning technologies.

These AI capabilities are widely used in healthcare, manufacturing, security, retail, entertainment, accessibility, and many other industries.

This topic falls under the “Identify AI workloads” section of the AI-901 exam objectives.


What Is Computer Vision?

Computer vision is an AI workload that enables computers to analyze and interpret images and video.

Computer vision systems attempt to simulate human visual understanding.

These systems can:

  • Identify objects
  • Detect faces
  • Read text
  • Analyze scenes
  • Track movement
  • Recognize patterns

How Computer Vision Works

Computer vision models are typically trained using large collections of labeled images.

The models learn patterns such as:

  • Shapes
  • Colors
  • Textures
  • Edges
  • Spatial relationships

Modern computer vision systems commonly use:

  • Deep learning
  • Neural networks
  • Convolutional Neural Networks (CNNs)

Common Computer Vision Capabilities

For the AI-901 exam, important computer vision capabilities include:

  • Image classification
  • Object detection
  • Facial recognition
  • Optical Character Recognition (OCR)
  • Image analysis
  • Image tagging

Image Classification

Image classification identifies the primary subject or category of an image.

The model assigns labels to entire images.


Image Classification Example

Input

An image of a dog.

Output

“Dog”


Common Use Cases for Image Classification

Medical Imaging

Classifying medical scans.

Retail

Categorizing products automatically.

Agriculture

Identifying plant diseases.

Wildlife Monitoring

Recognizing animal species.


Object Detection

Object detection identifies and locates multiple objects within an image.

Unlike image classification, object detection can identify several objects and their positions.


Object Detection Example

Input

Street traffic image.

Output

  • Car
  • Pedestrian
  • Traffic light

with location boundaries around each object.


Common Use Cases for Object Detection

Autonomous Vehicles

Detecting vehicles and pedestrians.

Manufacturing

Identifying defective products.

Security Systems

Detecting unauthorized activity.

Retail Analytics

Monitoring customer movement in stores.


Facial Recognition

Facial recognition identifies or verifies individuals using facial features.


Common Facial Recognition Capabilities

Face Detection

Determines whether faces exist in an image.

Face Verification

Confirms whether two faces belong to the same person.

Face Identification

Identifies a person from a database of known individuals.


Common Use Cases for Facial Recognition

Smartphone Authentication

Unlocking phones using facial recognition.

Building Security

Controlling physical access.

Attendance Systems

Tracking employee attendance.

Airport Security

Identity verification systems.


Optical Character Recognition (OCR)

OCR extracts text from images, scanned documents, or photographs.

OCR converts visual text into machine-readable text.


OCR Example

Input

A scanned invoice image.

Output

Extracted text including:

  • Invoice number
  • Dates
  • Totals

Common OCR Use Cases

Invoice Processing

Automating financial workflows.

Document Digitization

Converting paper documents into searchable digital text.

Receipt Scanning

Extracting purchase information.

Accessibility

Reading text aloud for visually impaired users.


Image Tagging and Image Analysis

Image analysis systems can automatically generate descriptions or tags for images.


Example Tags

An image may receive tags such as:

  • Beach
  • Ocean
  • Sunset
  • Person

Common Use Cases

Photo Organization

Automatically categorizing photos.

Content Moderation

Identifying inappropriate images.

Search Optimization

Improving image search systems.


Video Analysis

Computer vision can also process video streams.

Common Video Analysis Tasks

  • Motion detection
  • Activity recognition
  • Traffic monitoring
  • Surveillance analysis

What Are Image-Generation Models?

Image-generation models create new images using AI.

These models learn visual patterns from training data and generate entirely new content.

Image-generation AI is part of generative AI.


How Image-Generation Models Work

Image-generation systems are trained on large image datasets.

The models learn relationships between:

  • Objects
  • Colors
  • Styles
  • Shapes
  • Text descriptions

Many systems use deep learning architectures such as:

  • Diffusion models
  • Generative Adversarial Networks (GANs)

Text-to-Image Generation

Text-to-image models generate images from written prompts.


Example

Prompt

“A futuristic city at sunset”

Output

An AI-generated image matching the description.


Common Use Cases for Image Generation

Marketing and Advertising

Creating promotional graphics.

Entertainment and Gaming

Generating concept art.

Design Assistance

Creating mockups or creative inspiration.

Education

Generating visual learning content.

Accessibility

Creating visual representations from text descriptions.


Image Editing and Enhancement

Some AI models can edit or enhance existing images.


Common Capabilities

  • Background removal
  • Image restoration
  • Colorization
  • Resolution enhancement
  • Style transfer

Deepfakes and Synthetic Media

AI-generated images and videos can create highly realistic synthetic content.

This technology can be useful but also creates ethical concerns.


Responsible AI Considerations

Computer vision and image-generation systems raise important Responsible AI considerations.

Organizations should consider:

  • Privacy
  • Consent
  • Bias
  • Security
  • Transparency
  • Misuse prevention

Bias in Vision Models

Computer vision systems may perform differently across demographic groups if training data is unbalanced.

Example risks include:

  • Facial recognition inaccuracies
  • Biased image classification
  • Unequal detection accuracy

Ethical Concerns with Image Generation

Potential concerns include:

  • Deepfakes
  • Misinformation
  • Copyright concerns
  • Identity misuse
  • Harmful content generation

Organizations should implement safeguards and moderation systems.


Azure AI Vision Services

Azure AI Vision Services provide prebuilt computer vision capabilities including:

  • Image analysis
  • OCR
  • Face detection
  • Object detection
  • Video analysis

Azure OpenAI and Image Generation

Azure OpenAI Service supports generative AI capabilities, including image-generation models.

These services help organizations build AI-powered creative applications.


Computer Vision vs. Image Generation

CapabilityPurpose
Computer VisionAnalyze and understand images
Image GenerationCreate new images

Real-World Examples


Scenario 1: Self-Driving Car

Goal

Detect vehicles and pedestrians.

Capability Used

Object detection


Scenario 2: Receipt Scanning App

Goal

Extract text from receipts.

Capability Used

OCR


Scenario 3: Social Media Photo Organization

Goal

Automatically tag uploaded photos.

Capability Used

Image analysis and tagging


Scenario 4: AI Art Generator

Goal

Create artwork from text prompts.

Capability Used

Image generation


Scenario 5: Smartphone Face Unlock

Goal

Verify user identity.

Capability Used

Facial recognition


Important AI-901 Exam Tips

For the exam, remember these key points:

  • Computer vision analyzes images and video.
  • Image classification labels entire images.
  • Object detection identifies and locates objects.
  • OCR extracts text from images.
  • Facial recognition identifies or verifies individuals.
  • Image-generation models create new images.
  • Text-to-image systems generate visuals from prompts.
  • Computer vision and generative AI are different workloads.
  • Responsible AI principles are important in vision systems.

Quick Knowledge Check

Question 1

What is the purpose of OCR?

Answer

To extract text from images or scanned documents.


Question 2

What is the difference between image classification and object detection?

Answer

Image classification labels an entire image, while object detection identifies and locates multiple objects within an image.


Question 3

What do image-generation models do?

Answer

They create new images using AI.


Question 4

Which AI capability is commonly used for smartphone face unlock?

Answer

Facial recognition.


Practice Exam Questions

Question 1

What is the PRIMARY purpose of computer vision?

A. Converting speech into text
B. Analyzing and understanding images and video
C. Predicting stock prices
D. Generating database queries


Correct Answer

B. Analyzing and understanding images and video


Explanation

Computer vision enables AI systems to interpret and analyze visual content such as images and video.


Why the Other Answers Are Incorrect

A. Converting speech into text

This is speech recognition.

C. Predicting stock prices

This is typically a regression task.

D. Generating database queries

This is unrelated to computer vision.


Question 2

Which computer vision capability identifies the main subject or category of an image?

A. OCR
B. Image classification
C. Speech synthesis
D. Clustering


Correct Answer

B. Image classification


Explanation

Image classification assigns labels or categories to entire images.


Why the Other Answers Are Incorrect

A. OCR

OCR extracts text from images.

C. Speech synthesis

Speech synthesis converts text into spoken audio.

D. Clustering

Clustering groups similar data.


Question 3

A self-driving car needs to identify pedestrians, traffic signs, and vehicles in real time.

Which AI capability is MOST appropriate?

A. Sentiment analysis
B. Object detection
C. Keyword extraction
D. Language detection


Correct Answer

B. Object detection


Explanation

Object detection identifies and locates multiple objects within images or video streams.


Why the Other Answers Are Incorrect

A. Sentiment analysis

Sentiment analysis evaluates emotional tone in text.

C. Keyword extraction

Keyword extraction identifies important phrases in text.

D. Language detection

Language detection identifies written languages.


Question 4

What is the PRIMARY purpose of Optical Character Recognition (OCR)?

A. Translating speech between languages
B. Extracting text from images or scanned documents
C. Detecting faces in photographs
D. Generating new artwork


Correct Answer

B. Extracting text from images or scanned documents


Explanation

OCR converts text within images into machine-readable text.


Why the Other Answers Are Incorrect

A. Translating speech between languages

This is speech translation.

C. Detecting faces in photographs

This is facial recognition or face detection.

D. Generating new artwork

This is an image-generation capability.


Question 5

Which AI capability is commonly used for smartphone face unlock features?

A. Facial recognition
B. Speech recognition
C. Regression
D. Text summarization


Correct Answer

A. Facial recognition


Explanation

Facial recognition systems identify or verify users using facial features.


Why the Other Answers Are Incorrect

B. Speech recognition

Speech recognition processes spoken language.

C. Regression

Regression predicts numeric values.

D. Text summarization

Summarization condenses text.


Question 6

What is the PRIMARY function of image-generation models?

A. Extracting text from images
B. Creating new images using AI
C. Detecting network intrusions
D. Translating written languages


Correct Answer

B. Creating new images using AI


Explanation

Image-generation models produce new visual content based on learned patterns and prompts.


Why the Other Answers Are Incorrect

A. Extracting text from images

This is OCR.

C. Detecting network intrusions

This is unrelated to image generation.

D. Translating written languages

This is an NLP capability.


Question 7

Which example BEST represents a text-to-image generation system?

A. A chatbot answering questions
B. An AI model creating artwork from a written prompt
C. A speech recognition application
D. A recommendation engine


Correct Answer

B. An AI model creating artwork from a written prompt


Explanation

Text-to-image systems generate images based on textual descriptions.


Why the Other Answers Are Incorrect

A. A chatbot answering questions

This is generative text AI.

C. A speech recognition application

Speech recognition converts speech into text.

D. A recommendation engine

Recommendation systems suggest products or content.


Question 8

What is the key difference between image classification and object detection?

A. Image classification processes audio while object detection processes video
B. Image classification labels an entire image, while object detection identifies and locates multiple objects
C. Object detection only works with text
D. There is no difference


Correct Answer

B. Image classification labels an entire image, while object detection identifies and locates multiple objects


Explanation

Image classification provides a label for an entire image, while object detection identifies multiple objects and their locations.


Why the Other Answers Are Incorrect

A. Image classification processes audio while object detection processes video

Both work with visual data.

C. Object detection only works with text

Object detection works with images and video.

D. There is no difference

These are distinct computer vision tasks.


Question 9

Which Responsible AI concern is MOST associated with image-generation systems?

A. Deepfakes and synthetic media misuse
B. Spreadsheet formatting errors
C. SQL indexing problems
D. Network bandwidth allocation


Correct Answer

A. Deepfakes and synthetic media misuse


Explanation

Image-generation AI can create highly realistic synthetic content, raising concerns about misinformation and misuse.


Why the Other Answers Are Incorrect

B. Spreadsheet formatting errors

This is unrelated to AI image generation.

C. SQL indexing problems

This is a database issue.

D. Network bandwidth allocation

This is unrelated to Responsible AI concerns.


Question 10

A retailer wants to automatically categorize product photos into categories such as shoes, shirts, and electronics.

Which AI capability is MOST appropriate?

A. Image classification
B. OCR
C. Speech synthesis
D. Sentiment analysis


Correct Answer

A. Image classification


Explanation

Image classification assigns category labels to images based on visual content.


Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Speech synthesis

Speech synthesis generates spoken audio.

D. Sentiment analysis

Sentiment analysis evaluates emotional tone in text.


Final Thoughts

Computer vision and image-generation AI models are essential components of modern AI systems and important topics for the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new content, along with common business scenarios where these technologies are applied.

These capabilities help organizations build intelligent visual applications using Azure AI services and generative AI technologies.


Go to the AI-901 Exam Prep Hub main page