This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
   --> Identify AI workloads
      --> Identify features and capabilities of Computer Vision and Image-Generation models

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Computer vision and image-generation AI models are important AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new images using machine learning and deep learning technologies.

These AI capabilities are widely used in healthcare, manufacturing, security, retail, entertainment, accessibility, and many other industries.

What Is Computer Vision?

Computer vision is an AI workload that enables computers to analyze and interpret images and video.

Computer vision systems attempt to simulate human visual understanding.

These systems can:

Identify objects
Detect faces
Read text
Analyze scenes
Track movement
Recognize patterns

How Computer Vision Works

Computer vision models are typically trained using large collections of labeled images.

The models learn patterns such as:

Shapes
Colors
Textures
Edges
Spatial relationships

Modern computer vision systems commonly use:

Deep learning
Neural networks
Convolutional Neural Networks (CNNs)
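
To make the idea of learned patterns such as edges more concrete, here is a minimal Python sketch of the sliding-filter operation at the heart of a CNN layer. The tiny image and the hand-written vertical-edge kernel are illustrative assumptions; a real CNN learns its kernel values from labeled training images rather than having them typed in.

```python
# Toy 2D convolution (strictly, cross-correlation, which is what most CNN
# libraries compute). The image and kernel are made-up illustrative values.

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# 4x4 grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written kernel that responds where brightness increases from left to right.
vertical_edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge_kernel)
for row in feature_map:
    print(row)
```

The feature map is large only in the middle column, where the dark region meets the bright region. That is the sense in which a filter "detects" an edge.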

Common Computer Vision Capabilities

For the AI-901 exam, important computer vision capabilities include:

Image classification
Object detection
Facial recognition
Optical Character Recognition (OCR)
Image analysis
Image tagging

Image Classification

Image classification identifies the primary subject or category of an image.

The model assigns labels to entire images.

Image Classification Example

Input

An image of a dog.

Output

“Dog”
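
Behind a single label like this, a classifier typically produces one score per category and reports the most likely one. The sketch below illustrates that last step, assuming a hypothetical model has already produced raw scores for three categories; the scores and category names are invented for illustration.

```python
import math

def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def classify(raw_scores):
    """Return the single most likely label and its probability for the whole image."""
    labels = list(raw_scores)
    probabilities = softmax([raw_scores[label] for label in labels])
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]

# Hypothetical raw scores a trained model might produce for a photo of a dog.
raw_scores = {"dog": 4.0, "cat": 1.5, "horse": 0.5}

label, confidence = classify(raw_scores)
print(f"{label} ({confidence:.2f})")
```

Note that the result is one label for the entire image, with no information about where the dog appears.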

Common Use Cases for Image Classification

Medical Imaging

Classifying medical scans.

Retail

Categorizing products automatically.

Agriculture

Identifying plant diseases.

Wildlife Monitoring

Recognizing animal species.

Object Detection

Object detection identifies and locates multiple objects within an image.

Unlike image classification, object detection can identify several objects and their positions.

Object Detection Example

Input

Street traffic image.

Output

Car
Pedestrian
Traffic light

Each detected object is returned with a bounding box that marks its location in the image.
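
As a rough illustration of what such output can look like, the following Python sketch models each detection as a label, a confidence score, and a bounding box. The `Detection` class, the pixel coordinates, and the confidence threshold are all hypothetical and do not correspond to any specific service's response format.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float
    # Bounding box in pixels: top-left corner plus width and height.
    x: int
    y: int
    width: int
    height: int

    def area(self) -> int:
        return self.width * self.height

# Hypothetical detections for a street traffic image.
detections = [
    Detection("car", 0.94, x=40, y=220, width=300, height=160),
    Detection("pedestrian", 0.88, x=420, y=180, width=60, height=170),
    Detection("traffic light", 0.81, x=510, y=20, width=30, height=80),
    Detection("bicycle", 0.32, x=5, y=300, width=50, height=40),
]

def confident_detections(items, threshold=0.5):
    """Keep only detections the model is reasonably sure about."""
    return [d for d in items if d.confidence >= threshold]

for d in confident_detections(detections):
    print(f"{d.label}: box=({d.x}, {d.y}, {d.width}, {d.height}) confidence={d.confidence:.2f}")
```

Compared with image classification, the key difference is that one image yields several results, each with its own position.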

Common Use Cases for Object Detection

Autonomous Vehicles

Detecting vehicles and pedestrians.

Manufacturing

Identifying defective products.

Security Systems

Detecting unauthorized activity.

Retail Analytics

Monitoring customer movement in stores.

Facial Recognition

Facial recognition identifies or verifies individuals using facial features.

Common Facial Recognition Capabilities

Face Detection

Determines whether faces are present in an image and where they are located.

Face Verification

Confirms whether two faces belong to the same person.

Face Identification

Identifies a person from a database of known individuals.
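
The difference between verification (one-to-one) and identification (one-to-many) can be sketched in a few lines of Python. This example assumes that a face model has already converted each face into a list of numbers (an embedding); the three-number embeddings, the names, and the 0.9 similarity threshold are invented for illustration and are far simpler than what real systems use.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embeddings: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(face_a, face_b, threshold=0.9):
    """Face verification (one-to-one): do these two faces belong to the same person?"""
    return cosine_similarity(face_a, face_b) >= threshold

def identify(face, known_people, threshold=0.9):
    """Face identification (one-to-many): which known person, if any, is this?"""
    best_name, best_score = None, threshold
    for name, reference in known_people.items():
        score = cosine_similarity(face, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Made-up embeddings; real systems use much longer vectors from a trained model.
known_people = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}
new_photo = [0.88, 0.12, 0.31]

print(verify(new_photo, known_people["alice"]))   # compare against one claimed identity
print(identify(new_photo, known_people))          # search the whole database
print(identify([0.0, 0.0, 1.0], known_people))    # an unknown face matches nobody
```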

Common Use Cases for Facial Recognition

Smartphone Authentication

Unlocking phones using facial recognition.

Building Security

Controlling physical access.

Attendance Systems

Tracking employee attendance.

Airport Security

Verifying traveler identities.

Optical Character Recognition (OCR)

OCR extracts text from images, scanned documents, or photographs.

OCR converts visual text into machine-readable text.

OCR Example

Input

A scanned invoice image.

Output

Extracted text including:

Invoice number
Dates
Totals
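
Once OCR has produced machine-readable text, ordinary text processing can pull out the fields of interest. The sketch below shows only that follow-up step: the `ocr_text` string is a hand-written stand-in for what an OCR service might return, and the field names and patterns are assumptions chosen for this example.

```python
import re

# Stand-in for the text an OCR service might return for a scanned invoice.
# This string is invented for illustration; no OCR is performed here.
ocr_text = """
ACME Supplies Ltd.
Invoice Number: INV-10482
Invoice Date: 2024-03-15
Due Date: 2024-04-14
Total: $1,249.50
"""

def extract_invoice_fields(text):
    """Pull a few fields out of OCR text using simple patterns."""
    patterns = {
        "invoice_number": r"Invoice Number:\s*(\S+)",
        "invoice_date": r"Invoice Date:\s*(\d{4}-\d{2}-\d{2})",
        "total": r"Total:\s*\$([\d,]+\.\d{2})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

fields = extract_invoice_fields(ocr_text)
# Convert the total to a number so it can be used in calculations.
total_amount = float(fields["total"].replace(",", ""))
print(fields)
print(total_amount)
```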

Common OCR Use Cases

Invoice Processing

Automating financial workflows.

Document Digitization

Converting paper documents into searchable digital text.

Receipt Scanning

Extracting purchase information.

Accessibility

Reading text aloud for visually impaired users.

Image Tagging and Image Analysis

Image analysis systems can automatically generate descriptions or tags for images.

Example Tags

An image may receive tags such as:

Beach
Ocean
Sunset
Person
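
Tags are usually accompanied by a confidence score, and applications commonly keep only the tags above a chosen threshold. The following sketch assumes a hypothetical list of (tag, confidence) pairs for a beach photo and shows how those tags could drive a very simple image search; the scores and the 0.5 threshold are illustrative.

```python
# Hypothetical tag results for a beach photo: (tag, confidence) pairs.
tag_results = [
    ("beach", 0.97),
    ("ocean", 0.95),
    ("sunset", 0.83),
    ("person", 0.61),
    ("dog", 0.12),
]

def select_tags(results, min_confidence=0.5):
    """Return tag names at or above the confidence threshold, most confident first."""
    kept = [(tag, score) for tag, score in results if score >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in kept]

def matches_search(results, query, min_confidence=0.5):
    """Simple image search: does the query word appear among the confident tags?"""
    return query.lower() in select_tags(results, min_confidence)

print(select_tags(tag_results))
print(matches_search(tag_results, "Sunset"))
print(matches_search(tag_results, "dog"))
```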

Common Use Cases

Photo Organization

Automatically categorizing photos.

Content Moderation

Identifying inappropriate images.

Search Optimization

Improving image search systems.

Video Analysis

Computer vision can also process video streams.

Common Video Analysis Tasks

Motion detection
Activity recognition
Traffic monitoring
Surveillance analysis
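
One of the simplest forms of motion detection compares consecutive frames and checks how many pixels changed. The sketch below illustrates that idea on tiny hand-made grayscale frames; the frames and both thresholds are assumptions, and production systems use far more robust techniques.

```python
def motion_score(previous_frame, current_frame):
    """Fraction of pixels whose brightness changed noticeably between two frames."""
    changed = 0
    total = 0
    for prev_row, curr_row in zip(previous_frame, current_frame):
        for prev_pixel, curr_pixel in zip(prev_row, curr_row):
            total += 1
            if abs(curr_pixel - prev_pixel) > 30:  # per-pixel change threshold (assumed)
                changed += 1
    return changed / total

def motion_detected(previous_frame, current_frame, min_fraction=0.1):
    """Report motion when enough of the frame has changed."""
    return motion_score(previous_frame, current_frame) >= min_fraction

# Three tiny 3x3 grayscale "frames" (0 = black, 255 = white), invented for illustration.
frame_1 = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame_2 = [[10, 12, 10], [10, 10, 11], [10, 10, 10]]      # only small sensor noise
frame_3 = [[10, 200, 200], [10, 200, 200], [10, 10, 10]]  # a bright object has entered

print(motion_detected(frame_1, frame_2))
print(motion_detected(frame_2, frame_3))
```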

What Are Image-Generation Models?

Image-generation models create new images using AI.

These models learn visual patterns from training data and generate entirely new content.

Image-generation AI is part of generative AI.

How Image-Generation Models Work

Image-generation systems are trained on large image datasets.

The models learn relationships between:

Objects
Colors
Styles
Shapes
Text descriptions

Many systems use deep learning architectures such as:

Diffusion models
Generative Adversarial Networks (GANs)
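
For intuition about diffusion models: they are trained by gradually adding noise to training images and learning to reverse that process, so that at generation time they can start from noise and work back toward an image. The sketch below shows only the noise-adding half, on a four-value "image". The pixel values and the noise schedule are assumptions, and the neural network that learns to remove the noise is deliberately left out.

```python
import math
import random

def add_noise(pixels, signal_fraction, rng):
    """Blend an image with Gaussian noise.

    signal_fraction is how much of the original survives
    (1.0 = untouched, 0.0 = pure noise). This follows the common formulation:
        noisy = sqrt(signal_fraction) * image + sqrt(1 - signal_fraction) * noise
    """
    keep = math.sqrt(signal_fraction)
    spread = math.sqrt(1.0 - signal_fraction)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)  # fixed seed so the run is repeatable

# A tiny "image" with pixel values scaled to the range -1..1 (illustrative values).
image = [1.0, 1.0, -1.0, -1.0]

# Assumed noise schedule: less of the original image survives at each step.
for step, signal_fraction in enumerate([1.0, 0.9, 0.5, 0.1, 0.0]):
    noisy = add_noise(image, signal_fraction, rng)
    print(step, [round(value, 2) for value in noisy])
```

At the first step the image is unchanged, and by the last step nothing of it remains. A trained diffusion model learns to run this process in the opposite direction.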

Text-to-Image Generation

Text-to-image models generate images from written prompts.

Example

Prompt

“A futuristic city at sunset”

Output

An AI-generated image matching the description.

Common Use Cases for Image Generation

Marketing and Advertising

Creating promotional graphics.

Entertainment and Gaming

Generating concept art.

Design Assistance

Creating mockups or creative inspiration.

Education

Generating visual learning content.

Accessibility

Creating visual representations from text descriptions.

Image Editing and Enhancement

Some AI models can edit or enhance existing images.

Common Capabilities

Background removal
Image restoration
Colorization
Resolution enhancement
Style transfer

Deepfakes and Synthetic Media

AI-generated images and videos can create highly realistic synthetic content.

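This technology has legitimate uses, such as visual effects and training simulations, but it also raises ethical concerns when it is used to impersonate people or spread misinformation.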

Responsible AI Considerations

Computer vision and image-generation systems raise important Responsible AI considerations.

Organizations should consider:

Privacy
Consent
Bias
Security
Transparency
Misuse prevention

Bias in Vision Models

Computer vision systems may perform differently across demographic groups if training data is unbalanced.

Example risks include:

Facial recognition inaccuracies
Biased image classification
Unequal detection accuracy
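
A common first step in checking for this kind of bias is to measure accuracy separately for each group rather than only overall. The sketch below does that for a hypothetical face-detection evaluation; the group names and records are invented and carry no real-world meaning.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return {group: accuracy} for a list of (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Invented evaluation results for a face-detection model.
records = [
    ("group_a", "face", "face"),
    ("group_a", "face", "face"),
    ("group_a", "face", "face"),
    ("group_a", "no_face", "face"),
    ("group_b", "face", "face"),
    ("group_b", "no_face", "face"),
    ("group_b", "no_face", "face"),
    ("group_b", "face", "face"),
]

results = accuracy_by_group(records)
gap = max(results.values()) - min(results.values())
print(results)
print(f"accuracy gap: {gap:.2f}")
```

A large gap between groups is a signal to review the balance of the training data before the system is deployed.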

Ethical Concerns with Image Generation

Potential concerns include:

Deepfakes
Misinformation
Copyright concerns
Identity misuse
Harmful content generation

Organizations should implement safeguards and moderation systems.
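
As a deliberately simplistic illustration of a safeguard, the sketch below screens image-generation prompts against a list of blocked phrases before they would reach a model. The phrase list and the `review_prompt` function are hypothetical; real moderation systems typically rely on trained classifiers and human review rather than keyword matching alone.

```python
# Invented examples of phrases an organization might choose to block.
BLOCKED_PHRASES = ("deepfake of", "fake id", "forged signature")

def review_prompt(prompt):
    """Return (allowed, reason) for an image-generation prompt."""
    lowered = prompt.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            return False, f"blocked phrase: {phrase}"
    return True, "ok"

for prompt in ["A futuristic city at sunset", "Create a deepfake of a politician"]:
    allowed, reason = review_prompt(prompt)
    print(f"{prompt!r}: allowed={allowed} ({reason})")
```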

Azure AI Vision Services

Azure AI Vision Services provide prebuilt computer vision capabilities including:

Image analysis
OCR
Face detection
Object detection
Video analysis

Azure OpenAI and Image Generation

Azure OpenAI Service supports generative AI capabilities, including image-generation models.

These services help organizations build AI-powered creative applications.

Computer Vision vs. Image Generation

Capability	Purpose
Computer Vision	Analyze and understand images
Image Generation	Create new images

Real-World Examples

Scenario 1: Self-Driving Car

Goal

Detect vehicles and pedestrians.

Capability Used

Object detection

Scenario 2: Receipt Scanning App

Goal

Extract text from receipts.

Capability Used

OCR

Scenario 3: Social Media Photo Organization

Goal

Automatically tag uploaded photos.

Capability Used

Image analysis and tagging

Scenario 4: AI Art Generator

Goal

Create artwork from text prompts.

Capability Used

Image generation

Scenario 5: Smartphone Face Unlock

Goal

Verify user identity.

Capability Used

Facial recognition

Important AI-901 Exam Tips

For the exam, remember these key points:

Computer vision analyzes images and video.
Image classification labels entire images.
Object detection identifies and locates objects.
OCR extracts text from images.
Facial recognition identifies or verifies individuals.
Image-generation models create new images.
Text-to-image systems generate visuals from prompts.
Computer vision and generative AI are different workloads.
Responsible AI principles are important in vision systems.

Quick Knowledge Check

Question 1

What is the purpose of OCR?

Answer

To extract text from images or scanned documents.

Question 2

What is the difference between image classification and object detection?

Answer

Image classification labels an entire image, while object detection identifies and locates multiple objects within an image.

Question 3

What do image-generation models do?

Answer

They create new images using AI.

Question 4

Which AI capability is commonly used for smartphone face unlock?

Answer

Facial recognition.

Practice Exam Questions

Question 1

What is the PRIMARY purpose of computer vision?

A. Converting speech into text
B. Analyzing and understanding images and video
C. Predicting stock prices
D. Generating database queries

Correct Answer

B. Analyzing and understanding images and video

Explanation

Computer vision enables AI systems to interpret and analyze visual content such as images and video.

Why the Other Answers Are Incorrect

A. Converting speech into text

This is speech recognition.

C. Predicting stock prices

This is typically a regression task.

D. Generating database queries

This is unrelated to computer vision.

Question 2

Which computer vision capability identifies the main subject or category of an image?

A. OCR
B. Image classification
C. Speech synthesis
D. Clustering

Correct Answer

B. Image classification

Explanation

Image classification assigns labels or categories to entire images.

Why the Other Answers Are Incorrect

A. OCR

OCR extracts text from images.

C. Speech synthesis

Speech synthesis converts text into spoken audio.

D. Clustering

Clustering groups similar data.

Question 3

A self-driving car needs to identify pedestrians, traffic signs, and vehicles in real time.

Which AI capability is MOST appropriate?

A. Sentiment analysis
B. Object detection
C. Keyword extraction
D. Language detection

Correct Answer

B. Object detection

Explanation

Object detection identifies and locates multiple objects within images or video streams.

Why the Other Answers Are Incorrect

A. Sentiment analysis

Sentiment analysis evaluates emotional tone in text.

C. Keyword extraction

Keyword extraction identifies important phrases in text.

D. Language detection

Language detection identifies written languages.

Question 4

What is the PRIMARY purpose of Optical Character Recognition (OCR)?

A. Translating speech between languages
B. Extracting text from images or scanned documents
C. Detecting faces in photographs
D. Generating new artwork

Correct Answer

B. Extracting text from images or scanned documents

Explanation

OCR converts text within images into machine-readable text.

Why the Other Answers Are Incorrect

A. Translating speech between languages

This is speech translation.

C. Detecting faces in photographs

This is face detection, which is a facial recognition capability.

D. Generating new artwork

This is an image-generation capability.

Question 5

Which AI capability is commonly used for smartphone face unlock features?

A. Facial recognition
B. Speech recognition
C. Regression
D. Text summarization

Correct Answer

A. Facial recognition

Explanation

Facial recognition systems identify or verify users using facial features.

Why the Other Answers Are Incorrect

B. Speech recognition

Speech recognition processes spoken language.

C. Regression

Regression predicts numeric values.

D. Text summarization

Summarization condenses text.

Question 6

What is the PRIMARY function of image-generation models?

A. Extracting text from images
B. Creating new images using AI
C. Detecting network intrusions
D. Translating written languages

Correct Answer

B. Creating new images using AI

Explanation

Image-generation models produce new visual content based on learned patterns and prompts.

Why the Other Answers Are Incorrect

A. Extracting text from images

This is OCR.

C. Detecting network intrusions

This is unrelated to image generation.

D. Translating written languages

This is an NLP capability.

Question 7

Which example BEST represents a text-to-image generation system?

A. A chatbot answering questions
B. An AI model creating artwork from a written prompt
C. A speech recognition application
D. A recommendation engine

Correct Answer

B. An AI model creating artwork from a written prompt

Explanation

Text-to-image systems generate images based on textual descriptions.

Why the Other Answers Are Incorrect

A. A chatbot answering questions

This is text generation, which produces text rather than images.

C. A speech recognition application

Speech recognition converts speech into text.

D. A recommendation engine

Recommendation systems suggest products or content.

Question 8

What is the key difference between image classification and object detection?

A. Image classification processes audio while object detection processes video
B. Image classification labels an entire image, while object detection identifies and locates multiple objects
C. Object detection only works with text
D. There is no difference

Correct Answer

B. Image classification labels an entire image, while object detection identifies and locates multiple objects

Explanation

Image classification provides a label for an entire image, while object detection identifies multiple objects and their locations.

Why the Other Answers Are Incorrect

A. Image classification processes audio while object detection processes video

Both work with visual data.

C. Object detection only works with text

Object detection works with images and video.

D. There is no difference

These are distinct computer vision tasks.

Question 9

Which Responsible AI concern is MOST associated with image-generation systems?

A. Deepfakes and synthetic media misuse
B. Spreadsheet formatting errors
C. SQL indexing problems
D. Network bandwidth allocation

Correct Answer

A. Deepfakes and synthetic media misuse

Explanation

Image-generation AI can create highly realistic synthetic content, raising concerns about misinformation and misuse.

Why the Other Answers Are Incorrect

B. Spreadsheet formatting errors

This is unrelated to AI image generation.

C. SQL indexing problems

This is a database issue.

D. Network bandwidth allocation

This is unrelated to Responsible AI concerns.

Question 10

A retailer wants to automatically categorize product photos into categories such as shoes, shirts, and electronics.

Which AI capability is MOST appropriate?

A. Image classification
B. OCR
C. Speech synthesis
D. Sentiment analysis

Correct Answer

A. Image classification

Explanation

Image classification assigns category labels to images based on visual content.

Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Speech synthesis

Speech synthesis generates spoken audio.

D. Sentiment analysis

Sentiment analysis evaluates emotional tone in text.

Final Thoughts

Computer vision and image-generation AI models are essential components of modern AI systems and important topics for the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new content, along with common business scenarios where these technologies are applied.

These capabilities help organizations build intelligent visual applications using Azure AI services and generative AI technologies.
