This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub.
This topic falls under these sections:
Identify AI concepts and capabilities (40–45%)
--> Identify AI workloads
--> Identify features and capabilities of Computer Vision and Image-Generation models
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Computer vision and image generation are important AI workloads covered in the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new images using machine learning and deep learning technologies.
These AI capabilities are widely used in healthcare, manufacturing, security, retail, entertainment, accessibility, and many other industries.
What Is Computer Vision?
Computer vision is an AI workload that enables computers to analyze and interpret images and video.
Computer vision systems attempt to simulate human visual understanding.
These systems can:
- Identify objects
- Detect faces
- Read text
- Analyze scenes
- Track movement
- Recognize patterns
How Computer Vision Works
Computer vision models are typically trained using large collections of labeled images.
The models learn patterns such as:
- Shapes
- Colors
- Textures
- Edges
- Spatial relationships
Modern computer vision systems commonly use:
- Deep learning
- Neural networks
- Convolutional Neural Networks (CNNs)
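To make the idea of learning edges more concrete, here is a minimal sketch of the sliding-window operation at the heart of a CNN layer. It is a toy illustration, not how any Azure service is implemented: the function name `convolve2d`, the tiny 4x4 image, and the hand-picked Sobel kernel are all assumptions made for the demo. A real CNN learns its kernel values from training data instead of using fixed ones.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over a 2D grayscale image (valid positions only).

    This is the unflipped (cross-correlation) form that CNN layers use.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hand-picked kernel that responds to vertical edges (assumption for the demo).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x4 image: dark left half (0), bright right half (9).
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]

edges = convolve2d(image, SOBEL_X)
print(edges)  # large values mark where brightness changes from left to right
```

A flat image with no brightness change would produce zeros everywhere, which is why this kind of filter is useful for finding edges.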
Common Computer Vision Capabilities
For the AI-901 exam, important computer vision capabilities include:
- Image classification
- Object detection
- Facial recognition
- Optical Character Recognition (OCR)
- Image analysis
- Image tagging
Image Classification
Image classification identifies the primary subject or category of an image.
The model assigns labels to entire images.
Image Classification Example
Input
An image of a dog.
Output
“Dog”
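The step from model output to a single label can be sketched as follows. This is a simplified illustration under stated assumptions: the label set, the raw scores, and the helper names `softmax` and `classify` are hypothetical, and a real model would produce the scores from the image pixels.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the single most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]


LABELS = ["cat", "dog", "horse"]  # hypothetical label set
raw_scores = [1.2, 3.4, 0.3]      # hypothetical model output for one image

label, confidence = classify(raw_scores, LABELS)
print(label, round(confidence, 2))
```

Note that the result is one label for the whole image, which is the defining trait of image classification.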
Common Use Cases for Image Classification
Medical Imaging
Classifying medical scans.
Retail
Categorizing products automatically.
Agriculture
Identifying plant diseases.
Wildlife Monitoring
Recognizing animal species.
Object Detection
Object detection identifies and locates multiple objects within an image.
Unlike image classification, object detection can identify several objects and their positions.
Object Detection Example
Input
Street traffic image.
Output
- Car
- Pedestrian
- Traffic light
Each object is returned with a bounding box that marks its location in the image.
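The shape of an object detection result can be sketched as a list of labeled boxes. The `Detection` class, the `(x, y, width, height)` box layout, the confidence values, and the 0.5 threshold below are all assumptions for illustration; actual services define their own response formats.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x, y, width, height) in pixels; this layout is an assumption


def keep_confident(detections, threshold=0.5):
    """Drop detections the model is not reasonably sure about."""
    return [d for d in detections if d.confidence >= threshold]


# Hypothetical output for a street traffic image.
detections = [
    Detection("car", 0.94, (40, 210, 180, 95)),
    Detection("pedestrian", 0.88, (300, 190, 45, 120)),
    Detection("traffic light", 0.71, (415, 30, 22, 60)),
    Detection("bicycle", 0.18, (10, 10, 30, 30)),
]

for d in keep_confident(detections):
    print(f"{d.label}: {d.confidence:.0%} at {d.box}")
```

Compared with image classification, the output here is several labels per image, each tied to a position.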
Common Use Cases for Object Detection
Autonomous Vehicles
Detecting vehicles and pedestrians.
Manufacturing
Identifying defective products.
Security Systems
Detecting unauthorized activity.
Retail Analytics
Monitoring customer movement in stores.
Facial Recognition
Facial recognition identifies or verifies individuals using facial features.
Common Facial Recognition Capabilities
Face Detection
Determines whether an image contains faces and where they are located, without identifying who they are.
Face Verification
Confirms whether two faces belong to the same person.
Face Identification
Identifies a person from a database of known individuals.
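The difference between verification (one-to-one) and identification (one-to-many) can be sketched with similarity scores. This is a toy example under stated assumptions: the three-number "face vectors", the names, and the 0.9 threshold are hypothetical. Real systems use a trained model to turn each face into a much longer numeric vector before comparing.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: values near 1.0 mean they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(face_a, face_b, threshold=0.9):
    """One-to-one check: do these two faces belong to the same person?"""
    return cosine_similarity(face_a, face_b) >= threshold


def identify(face, known_faces, threshold=0.9):
    """One-to-many search: return the best-matching known name, or None."""
    best_name, best_score = None, threshold
    for name, reference in known_faces.items():
        score = cosine_similarity(face, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical face vectors for two enrolled people and one new photo.
KNOWN = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
probe = [0.88, 0.12, 0.33]

print(verify(probe, KNOWN["alice"]))  # compares against one claimed identity
print(identify(probe, KNOWN))         # searches the whole database
```

Smartphone face unlock is a verification scenario, while finding a person in a database of known individuals is an identification scenario.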
Common Use Cases for Facial Recognition
Smartphone Authentication
Unlocking phones using facial recognition.
Building Security
Controlling physical access.
Attendance Systems
Tracking employee attendance.
Airport Security
Identity verification systems.
Optical Character Recognition (OCR)
OCR extracts text from images, scanned documents, or photographs.
OCR converts visual text into machine-readable text.
OCR Example
Input
A scanned invoice image.
Output
Extracted text including:
- Invoice number
- Dates
- Totals
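OCR itself only returns the text. Picking out fields such as the invoice number is a separate step, sketched below. The sample text, the field labels, and the function name `extract_invoice_fields` are hypothetical, and real invoices vary widely in layout, so treat this as an illustration of the idea and not a general parser.

```python
import re

# Hypothetical text as it might come back from an OCR step.
ocr_text = """ACME Supplies
Invoice Number: INV-10427
Invoice Date: 2024-03-15
Total: $1,284.50"""


def extract_invoice_fields(text):
    """Pull a few labeled values out of OCR text using regular expressions."""
    patterns = {
        "invoice_number": r"Invoice Number:\s*(\S+)",
        "date": r"Invoice Date:\s*(\d{4}-\d{2}-\d{2})",
        "total": r"Total:\s*\$([\d,]+\.\d{2})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None  # None if not found
    return fields


fields = extract_invoice_fields(ocr_text)
print(fields)
```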
Common OCR Use Cases
Invoice Processing
Automating financial workflows.
Document Digitization
Converting paper documents into searchable digital text.
Receipt Scanning
Extracting purchase information.
Accessibility
Reading text aloud for visually impaired users.
Image Tagging and Image Analysis
Image analysis systems can automatically generate descriptions or tags for images.
Example Tags
An image may receive tags such as:
- Beach
- Ocean
- Sunset
- Person
Common Use Cases
Photo Organization
Automatically categorizing photos.
Content Moderation
Identifying inappropriate images.
Search Optimization
Improving image search systems.
Video Analysis
Computer vision can also process video streams.
Common Video Analysis Tasks
- Motion detection
- Activity recognition
- Traffic monitoring
- Surveillance analysis
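Motion detection, the simplest task in this list, can be sketched by comparing two consecutive frames pixel by pixel. The tiny grayscale frames, the function names, and both thresholds are assumptions chosen for the demo; production systems use more robust techniques.

```python
def changed_fraction(prev_frame, next_frame, pixel_threshold=25):
    """Fraction of pixels whose brightness changed by more than the threshold."""
    changed = 0
    total = 0
    for prev_row, next_row in zip(prev_frame, next_frame):
        for before, after in zip(prev_row, next_row):
            total += 1
            if abs(after - before) > pixel_threshold:
                changed += 1
    return changed / total


def motion_detected(prev_frame, next_frame, area_threshold=0.1):
    """Report motion when enough of the frame has changed."""
    return changed_fraction(prev_frame, next_frame) >= area_threshold


# Two hypothetical 3x4 grayscale frames; a bright object appears in the second.
frame_a = [[10, 10, 10, 10],
           [10, 10, 10, 10],
           [10, 10, 10, 10]]
frame_b = [[10, 10, 10, 10],
           [10, 200, 200, 10],
           [10, 10, 10, 10]]

print(motion_detected(frame_a, frame_b))
print(motion_detected(frame_a, frame_a))
```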
What Are Image-Generation Models?
Image-generation models create new images using AI.
These models learn visual patterns from training data and generate entirely new content.
Image-generation AI is part of generative AI.
How Image-Generation Models Work
Image-generation systems are trained on large image datasets.
The models learn relationships between:
- Objects
- Colors
- Styles
- Shapes
- Text descriptions
Many systems use deep learning architectures such as:
- Diffusion models
- Generative Adversarial Networks (GANs)
Text-to-Image Generation
Text-to-image models generate images from written prompts.
Example
Prompt
“A futuristic city at sunset”
Output
An AI-generated image matching the description.
Common Use Cases for Image Generation
Marketing and Advertising
Creating promotional graphics.
Entertainment and Gaming
Generating concept art.
Design Assistance
Creating mockups or creative inspiration.
Education
Generating visual learning content.
Accessibility
Creating visual representations from text descriptions.
Image Editing and Enhancement
Some AI models can edit or enhance existing images.
Common Capabilities
- Background removal
- Image restoration
- Colorization
- Resolution enhancement
- Style transfer
Deepfakes and Synthetic Media
AI-generated images and videos can create highly realistic synthetic content.
This technology has legitimate creative uses, but it also raises ethical concerns because it can be used to impersonate people or spread misinformation.
Responsible AI Considerations
Computer vision and image-generation systems raise important Responsible AI considerations.
Organizations should consider:
- Privacy
- Consent
- Bias
- Security
- Transparency
- Misuse prevention
Bias in Vision Models
Computer vision systems may perform differently across demographic groups if training data is unbalanced.
Example risks include:
- Facial recognition inaccuracies
- Biased image classification
- Unequal detection accuracy
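One common way to surface this kind of risk is to measure accuracy separately for each group and compare the results. The sketch below assumes a hypothetical evaluation set where every record has a group label, a predicted value, and the correct value; the group names and the data are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each group in the evaluation data."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation records: (group, predicted, actual).
records = [
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "match"),
    ("group_a", "match", "match"),
    ("group_b", "match", "no_match"),
    ("group_b", "match", "match"),
    ("group_b", "no_match", "match"),
    ("group_b", "no_match", "no_match"),
]

results = accuracy_by_group(records)
gap = max(results.values()) - min(results.values())
print(results, gap)
```

A large gap between groups suggests the model, or the data it was trained on, needs attention before deployment.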
Ethical Concerns with Image Generation
Potential concerns include:
- Deepfakes
- Misinformation
- Copyright concerns
- Identity misuse
- Harmful content generation
Organizations should implement safeguards and moderation systems.
Azure AI Vision Services
Azure AI Vision Services provide prebuilt computer vision capabilities including:
- Image analysis
- OCR
- Face detection
- Object detection
- Video analysis
Azure OpenAI and Image Generation
Azure OpenAI Service supports generative AI capabilities, including image-generation models.
This service helps organizations build AI-powered creative applications.
Computer Vision vs. Image Generation
| Capability | Purpose |
|---|---|
| Computer Vision | Analyze and understand images |
| Image Generation | Create new images |
Real-World Examples
Scenario 1: Self-Driving Car
Goal
Detect vehicles and pedestrians.
Capability Used
Object detection
Scenario 2: Receipt Scanning App
Goal
Extract text from receipts.
Capability Used
OCR
Scenario 3: Social Media Photo Organization
Goal
Automatically tag uploaded photos.
Capability Used
Image analysis and tagging
Scenario 4: AI Art Generator
Goal
Create artwork from text prompts.
Capability Used
Image generation
Scenario 5: Smartphone Face Unlock
Goal
Verify user identity.
Capability Used
Facial recognition
Important AI-901 Exam Tips
For the exam, remember these key points:
- Computer vision analyzes images and video.
- Image classification labels entire images.
- Object detection identifies and locates objects.
- OCR extracts text from images.
- Facial recognition identifies or verifies individuals.
- Image-generation models create new images.
- Text-to-image systems generate visuals from prompts.
- Computer vision analyzes existing images, while image generation creates new ones; they are different workloads.
- Responsible AI principles are important in vision systems.
Quick Knowledge Check
Question 1
What is the purpose of OCR?
Answer
To extract text from images or scanned documents.
Question 2
What is the difference between image classification and object detection?
Answer
Image classification labels an entire image, while object detection identifies and locates multiple objects within an image.
Question 3
What do image-generation models do?
Answer
They create new images using AI.
Question 4
Which AI capability is commonly used for smartphone face unlock?
Answer
Facial recognition.
Practice Exam Questions
Question 1
What is the PRIMARY purpose of computer vision?
A. Converting speech into text
B. Analyzing and understanding images and video
C. Predicting stock prices
D. Generating database queries
Correct Answer
B. Analyzing and understanding images and video
Explanation
Computer vision enables AI systems to interpret and analyze visual content such as images and video.
Why the Other Answers Are Incorrect
A. Converting speech into text
This is speech recognition.
C. Predicting stock prices
This is typically a regression task.
D. Generating database queries
This is unrelated to computer vision.
Question 2
Which computer vision capability identifies the main subject or category of an image?
A. OCR
B. Image classification
C. Speech synthesis
D. Clustering
Correct Answer
B. Image classification
Explanation
Image classification assigns labels or categories to entire images.
Why the Other Answers Are Incorrect
A. OCR
OCR extracts text from images.
C. Speech synthesis
Speech synthesis converts text into spoken audio.
D. Clustering
Clustering groups similar data.
Question 3
A self-driving car needs to identify pedestrians, traffic signs, and vehicles in real time.
Which AI capability is MOST appropriate?
A. Sentiment analysis
B. Object detection
C. Keyword extraction
D. Language detection
Correct Answer
B. Object detection
Explanation
Object detection identifies and locates multiple objects within images or video streams.
Why the Other Answers Are Incorrect
A. Sentiment analysis
Sentiment analysis evaluates emotional tone in text.
C. Keyword extraction
Keyword extraction identifies important phrases in text.
D. Language detection
Language detection identifies written languages.
Question 4
What is the PRIMARY purpose of Optical Character Recognition (OCR)?
A. Translating speech between languages
B. Extracting text from images or scanned documents
C. Detecting faces in photographs
D. Generating new artwork
Correct Answer
B. Extracting text from images or scanned documents
Explanation
OCR converts text within images into machine-readable text.
Why the Other Answers Are Incorrect
A. Translating speech between languages
This is speech translation.
C. Detecting faces in photographs
This is face detection.
D. Generating new artwork
This is an image-generation capability.
Question 5
Which AI capability is commonly used for smartphone face unlock features?
A. Facial recognition
B. Speech recognition
C. Regression
D. Text summarization
Correct Answer
A. Facial recognition
Explanation
Facial recognition systems identify or verify users using facial features.
Why the Other Answers Are Incorrect
B. Speech recognition
Speech recognition processes spoken language.
C. Regression
Regression predicts numeric values.
D. Text summarization
Summarization condenses text.
Question 6
What is the PRIMARY function of image-generation models?
A. Extracting text from images
B. Creating new images using AI
C. Detecting network intrusions
D. Translating written languages
Correct Answer
B. Creating new images using AI
Explanation
Image-generation models produce new visual content based on learned patterns and prompts.
Why the Other Answers Are Incorrect
A. Extracting text from images
This is OCR.
C. Detecting network intrusions
This is unrelated to image generation.
D. Translating written languages
This is an NLP capability.
Question 7
Which example BEST represents a text-to-image generation system?
A. A chatbot answering questions
B. An AI model creating artwork from a written prompt
C. A speech recognition application
D. A recommendation engine
Correct Answer
B. An AI model creating artwork from a written prompt
Explanation
Text-to-image systems generate images based on textual descriptions.
Why the Other Answers Are Incorrect
A. A chatbot answering questions
This is text generation, not image generation.
C. A speech recognition application
Speech recognition converts speech into text.
D. A recommendation engine
Recommendation systems suggest products or content.
Question 8
What is the key difference between image classification and object detection?
A. Image classification processes audio while object detection processes video
B. Image classification labels an entire image, while object detection identifies and locates multiple objects
C. Object detection only works with text
D. There is no difference
Correct Answer
B. Image classification labels an entire image, while object detection identifies and locates multiple objects
Explanation
Image classification provides a label for an entire image, while object detection identifies multiple objects and their locations.
Why the Other Answers Are Incorrect
A. Image classification processes audio while object detection processes video
Both work with visual data.
C. Object detection only works with text
Object detection works with images and video.
D. There is no difference
These are distinct computer vision tasks.
Question 9
Which Responsible AI concern is MOST associated with image-generation systems?
A. Deepfakes and synthetic media misuse
B. Spreadsheet formatting errors
C. SQL indexing problems
D. Network bandwidth allocation
Correct Answer
A. Deepfakes and synthetic media misuse
Explanation
Image-generation AI can create highly realistic synthetic content, raising concerns about misinformation and misuse.
Why the Other Answers Are Incorrect
B. Spreadsheet formatting errors
This is unrelated to AI image generation.
C. SQL indexing problems
This is a database issue.
D. Network bandwidth allocation
This is unrelated to Responsible AI concerns.
Question 10
A retailer wants to automatically categorize product photos into categories such as shoes, shirts, and electronics.
Which AI capability is MOST appropriate?
A. Image classification
B. OCR
C. Speech synthesis
D. Sentiment analysis
Correct Answer
A. Image classification
Explanation
Image classification assigns category labels to images based on visual content.
Why the Other Answers Are Incorrect
B. OCR
OCR extracts text from images.
C. Speech synthesis
Speech synthesis generates spoken audio.
D. Sentiment analysis
Sentiment analysis evaluates emotional tone in text.
Final Thoughts
Computer vision and image-generation AI models are essential components of modern AI systems and important topics for the AI-901 certification exam. Microsoft expects candidates to understand how AI systems analyze visual information and generate new content, along with common business scenarios where these technologies are applied.
These capabilities help organizations build intelligent visual applications using Azure AI services and generative AI technologies.