This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement computer vision solutions (10–15%)
   --> Design and implement multimodal understanding workflows
      --> Configure generation of alt-text and extended image descriptions aligned to accessibility guidelines

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Accessibility is a critical requirement in modern AI applications. Multimodal AI systems can automatically generate:

Alt-text
Image captions
Extended image descriptions
Contextual accessibility summaries

These capabilities improve usability for individuals who rely on:

Screen readers
Assistive technologies
Audio narration
Alternative interfaces

For the AI-103 certification exam, you should understand how to configure systems that generate accessible image descriptions aligned with accessibility standards and Responsible AI principles.

This includes:

Alt-text generation
Extended descriptions
Accessibility-focused prompting
Multimodal understanding workflows
Caption quality validation
Accessibility compliance
Responsible AI considerations

You should also understand:

WCAG accessibility concepts
Concise vs detailed descriptions
OCR-enhanced accessibility workflows
Human review processes
Azure services used for accessibility-focused AI solutions

This topic falls under:

“Design and implement multimodal understanding workflows”

What Is Alt-Text?

Definition

Alt-text (alternative text) is a textual description of an image used by assistive technologies such as screen readers.

Alt-text helps users who cannot see images understand visual content.

Example of Alt-Text

Image:

A woman reading a book in a park

Alt-text:

A woman sitting on a park bench reading a book beneath a large tree

Purpose of Alt-Text

Alt-text improves:

Accessibility
Inclusion
Search indexing
Content usability

It is especially important for:

Websites
Mobile apps
Educational platforms
E-commerce systems

What Are Extended Image Descriptions?

Definition

Extended image descriptions provide more detailed explanations than standard alt-text.

These are useful for:

Complex charts
Infographics
Educational diagrams
Scientific imagery
Data visualizations

Example of Extended Description

Image:

Sales dashboard

Extended description:

			
A dashboard displaying quarterly sales trends from January through December. Sales rise steadily from Q1 to Q3 before declining slightly in Q4. The highest-performing category is electronics.

Concise vs Extended Descriptions

Concise Alt-Text

Short and focused.

Example:

A red sports car parked beside a city street

Best for:

Simple images
Fast accessibility reading

Extended Descriptions

Detailed and contextual.

Example:

			
A red convertible sports car is parked beside a busy downtown street lined with office buildings and pedestrians during the evening rush hour

Best for:

Complex scenes
Educational content
Accessibility enhancement

Accessibility Standards

WCAG Overview

Accessibility systems often align with:
World Wide Web Consortium
Web Content Accessibility Guidelines (WCAG).

WCAG focuses on:

Perceivable content
Operable interfaces
Understandable information
Robust accessibility support

Importance of Accessibility Compliance

Organizations may need accessibility compliance for:

Legal requirements
Public sector systems
Educational platforms
Enterprise accessibility policies

Characteristics of Good Alt-Text

Effective alt-text should:

Be concise
Be meaningful
Focus on important content
Avoid unnecessary details
Reflect image purpose

Weak Alt-Text Example

Image of a thing

Problems:

Too vague
Provides little value

Strong Alt-Text Example

A firefighter carrying a child away from a smoke-filled building

Advantages:

Clear
Specific
Contextual

When to Use Extended Descriptions

Extended descriptions are useful when images contain:

Charts
Tables
Infographics
Scientific diagrams
Dense visual information

Decorative Images

Decorative images may require:

Empty alt-text
No narration

This prevents unnecessary screen reader noise.

Multimodal Models for Accessibility

Modern multimodal AI systems can:

Analyze images
Detect objects
Identify relationships
Extract visible text
Generate natural-language descriptions

Accessibility-Focused Captioning

Accessibility captioning differs from general captioning because it prioritizes:

Clarity
Inclusiveness
Contextual usefulness
Screen-reader compatibility

OCR-Enhanced Accessibility

OCR (Optical Character Recognition) improves accessibility by extracting visible text from:

Signs
Labels
Screenshots
Infographics
Documents

Example OCR Workflow

Image:

Conference slide

OCR extracts:

Quarterly Revenue Growth

The system incorporates this text into the description.

Prompt Engineering for Accessibility

Accessibility-Focused Prompts

Prompt engineering helps guide multimodal models to produce accessibility-friendly descriptions.

Example Prompt

Generate concise alt-text suitable for a screen reader

Extended Description Prompt

			
Generate a detailed accessibility description including visible text, relationships, and environmental context

Prompt Engineering Best Practices

Focus on Important Information

Describe:

Key actions
Important objects
Meaningful context

Avoid:

Irrelevant background details

Match Description Length to Use Case

Use:

Concise descriptions for simple images
Extended descriptions for complex visuals

Avoid Assumptions

Do not infer:

Emotions
Intentions
Identities
unless visually clear.

Structured Accessibility Outputs

Applications may request:

JSON output
Categorized descriptions
Metadata tags

Example:

Return alt-text and extracted text as JSON

Multi-Image Accessibility Workflows

Applications may generate:

Individual alt-text
Album summaries
Comparative descriptions

Example Multi-Image Summary

			
A family vacation featuring beach activities, hiking trails, and outdoor dining experiences

Accessibility for Charts and Diagrams

Complex visuals require:

Trend descriptions
Key data insights
Structural explanations

Example Chart Description

			
The chart shows revenue increasing steadily from January through September before declining slightly in October and November

Responsible AI Considerations

Accessibility systems introduce important Responsible AI concerns.

Bias and Fairness

Models may:

Misidentify individuals
Reinforce stereotypes
Produce biased descriptions

Privacy Concerns

Images may contain:

Faces
Sensitive documents
Personal information

Organizations must protect user privacy.

Hallucinations

What Are Hallucinations?

Hallucinations occur when models describe nonexistent content.

Example:

Mentioning a laptop that does not appear in the image

Reducing Hallucinations

Strategies include:

Grounded prompting
OCR validation
Confidence scoring
Human review

Human-in-the-Loop Review

Manual review is often required for:

Public-facing systems
Educational materials
Government applications
Sensitive accessibility content

Azure AI Content Safety

Microsoft provides:
Azure AI Content Safety

to help detect:

Harmful content
Unsafe imagery
Policy violations

Performance Considerations

Accessibility workflows may process:

Large image libraries
High-resolution assets
Batch uploads

Factors affecting performance include:

Model complexity
OCR processing
Batch size
GPU availability

Optimization Techniques

Image Resizing

Reduce unnecessary resolution.

Batch Processing

Process multiple images simultaneously.

Asynchronous Workflows

Improve application responsiveness.

Caching

Reuse existing image descriptions when appropriate.

Azure Services for Accessibility Workflows

Azure OpenAI Service

Supports:

Multimodal reasoning
Accessibility-focused prompting
Natural-language description generation

Azure AI Vision

Supports:

Image analysis
OCR
Caption generation
Object detection

Azure AI Document Intelligence

Supports:

Layout understanding
OCR extraction
Document accessibility workflows

Azure AI Foundry

Supports:

Workflow orchestration
Prompt flows
AI evaluation pipelines

Azure Blob Storage

Frequently used for:

Image storage
Accessibility metadata storage
Workflow integration

Azure Functions

Often used for:

Event-driven workflows
Accessibility processing pipelines
Batch orchestration

Observability and Monitoring

Production accessibility systems should monitor:

Caption latency
OCR accuracy
Hallucination frequency
Accessibility quality metrics
Failed requests
Safety violations
Operational costs

Best Practices for Accessibility-Focused AI

Prioritize Clarity

Descriptions should be understandable and useful.

Match Description Depth to Content Complexity

Use concise or extended descriptions appropriately.

Include Visible Text When Relevant

OCR improves accessibility quality.

Avoid Biased Language

Use neutral, factual descriptions.

Validate Outputs

Check for hallucinations and inaccuracies.

Support Human Review

Especially important for high-impact content.

Maintain Accessibility Compliance

Align with WCAG principles and organizational policies.

Real-World Example

An educational platform may:

Upload classroom diagrams
Use OCR to extract visible labels
Generate concise alt-text for thumbnails
Generate extended descriptions for complex diagrams
Validate outputs with accessibility reviewers
Store descriptions for screen-reader access

This demonstrates:

Accessibility-focused prompting
OCR integration
Extended descriptions
Human-in-the-loop review

Exam Tips for AI-103

For the AI-103 exam, remember these important concepts:

Alt-text provides accessible image descriptions for screen readers.
Extended descriptions support complex visuals such as charts and diagrams.
Accessibility workflows often align with WCAG principles.
OCR improves accessibility by extracting visible text.
Concise descriptions are best for simple visuals.
Extended descriptions are best for complex content.
Hallucinations occur when models describe nonexistent content.
Accessibility-focused prompting improves output quality.
Azure AI Vision supports OCR and image analysis.
Azure AI Content Safety helps moderate unsafe imagery.
Human review may be required for sensitive or public-facing systems.

Practice Exam Questions

Question 1

What is the primary purpose of alt-text?

A. Compressing image files
B. Providing accessible image descriptions for assistive technologies
C. Encrypting image metadata
D. Accelerating GPU rendering

Answer

B. Providing accessible image descriptions for assistive technologies

Explanation

Alt-text enables screen readers to describe images to visually impaired users.

Question 2

When are extended image descriptions most useful?

A. For decorative images only
B. For complex visuals such as charts and diagrams
C. For reducing GPU utilization
D. For encrypting media assets

Answer

B. For complex visuals such as charts and diagrams

Explanation

Extended descriptions provide detailed explanations for visually dense content.

Question 3

What is a characteristic of good alt-text?

A. Excessive technical jargon
B. Clear and meaningful descriptions
C. Random artistic interpretation
D. Extremely long paragraphs for every image

Answer

B. Clear and meaningful descriptions

Explanation

Good alt-text should concisely communicate important image content.

Question 4

What does OCR contribute to accessibility workflows?

A. Automatic image compression
B. Extraction of visible text from images and documents
C. Elimination of GPU usage
D. Encryption of screen-reader output

Answer

B. Extraction of visible text from images and documents

Explanation

OCR improves accessibility by incorporating visible text into descriptions.

Question 5

What is a hallucination in an accessibility-focused AI system?

A. Generating unsupported or nonexistent details
B. Compressing images automatically
C. Encrypting image metadata
D. Scaling GPU clusters

Answer

A. Generating unsupported or nonexistent details

Explanation

Hallucinations occur when the model describes content not actually present.

Question 6

Which Azure service supports OCR and image analysis?

A. Azure AI Vision
B. Azure DNS
C. Azure Firewall
D. Azure Virtual WAN

Answer

A. Azure AI Vision

Explanation

Azure AI Vision supports OCR, captioning, and image understanding.

Question 7

Why should accessibility-focused prompts be specific?

A. To reduce storage requirements
B. To improve relevance and clarity of generated descriptions
C. To disable OCR functionality
D. To eliminate all hallucinations automatically

Answer

B. To improve relevance and clarity of generated descriptions

Explanation

Specific prompts guide multimodal models toward better accessibility outputs.

Question 8

What is a best practice for accessibility-focused image descriptions?

A. Avoid describing important context
B. Match description detail to image complexity
C. Always generate the longest possible description
D. Ignore visible text in diagrams

Answer

B. Match description detail to image complexity

Explanation

Simple images may need concise descriptions, while complex visuals require more detail.

Question 9

Which organization publishes WCAG accessibility guidelines?

A. World Wide Web Consortium (W3C)
B. Linux Foundation
C. IEEE
D. Apache Software Foundation

Answer

A. World Wide Web Consortium (W3C)

Explanation

The W3C publishes the Web Content Accessibility Guidelines (WCAG).

Question 10

Why might human review be required in accessibility workflows?

A. To validate accuracy and inclusiveness of generated descriptions
B. To reduce internet bandwidth usage
C. To disable multimodal prompting
D. To eliminate OCR processing