Overview
Optical Character Recognition (OCR) is a core computer vision workload tested on the AI-900 exam. OCR solutions are designed to extract printed or handwritten text from images and documents and convert it into machine-readable text.
On the AI-900 exam, you are expected to:
- Recognize OCR use cases
- Understand what OCR does and does not do
- Identify Azure services that provide OCR capabilities
What Is Optical Character Recognition (OCR)?
OCR is a computer vision technique that:
- Detects text within images
- Extracts characters, words, and lines
- Converts visual text into digital text
It answers the question:
“What text appears in this image or document?”
Key Characteristics of OCR Solutions
1. Text Extraction
OCR solutions can extract:
- Printed text
- Handwritten text (depending on the service)
- Numbers, symbols, and punctuation
The output is searchable and editable text.
2. Language Support
OCR solutions typically:
- Support multiple languages
- Automatically detect language in many cases
This is important for global document processing scenarios.
3. Layout and Structure Awareness
Advanced OCR solutions can identify:
- Lines and paragraphs
- Tables
- Forms
- Key-value pairs
This enables downstream document processing and automation.
4. Bounding Boxes for Text
OCR can return:
- Extracted text
- Bounding boxes showing where text appears
This allows applications to highlight or validate text locations.
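As a rough illustration of how an application might use those locations, the sketch below models an OCR result as text plus a rectangle and looks up which line sits under a given pixel (for example, where a user clicked). The `BoundingBox`, `OcrLine`, and `find_text_at` names are hypothetical, and the sample data is made up. Real services often return a four-point polygon rather than a simple rectangle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundingBox:
    """Axis-aligned rectangle in pixel coordinates (hypothetical shape)."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        """Return True if the pixel (x, y) falls inside this box."""
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


@dataclass
class OcrLine:
    """One line of recognized text plus where it was found."""
    text: str
    box: BoundingBox


def find_text_at(lines: list[OcrLine], x: int, y: int) -> Optional[str]:
    """Return the text of the first line whose box contains (x, y)."""
    for line in lines:
        if line.box.contains(x, y):
            return line.text
    return None


# Illustrative data, not real service output.
sample_lines = [
    OcrLine("INVOICE", BoundingBox(left=40, top=20, width=200, height=30)),
    OcrLine("Total: 42.00", BoundingBox(left=40, top=300, width=180, height=24)),
]

print(find_text_at(sample_lines, 60, 310))  # prints "Total: 42.00"
```

The same lookup could drive highlighting: draw each box over the source image and the reader can check the extracted text against what is actually printed there.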
5. Image and Document Input
OCR works with:
- Images (JPG, PNG)
- Scanned documents
- PDFs
- Photos taken by mobile devices
Common OCR Scenarios
OCR is the correct solution when text extraction is the primary goal.
Typical Use Cases
- Invoice and receipt processing
- Digitizing scanned documents
- License plate recognition
- Form processing
- Reading text from signs or labels
OCR vs Other Computer Vision Workloads
Distinguishing OCR from the other computer vision workloads is critical for AI-900.
| Task | Primary Purpose |
|---|---|
| Image classification | Categorize entire images |
| Object detection | Locate and identify objects |
| OCR | Extract text from images |
| Image segmentation | Classify each pixel in an image |
Exam Tip:
If a question uses words such as “read,” “extract,” or “recognize text,” or describes digitizing documents, OCR is most likely the correct answer.
Azure Services for OCR
Azure AI Vision (OCR Capabilities)
- Provides prebuilt OCR models
- Extracts printed and handwritten text
- Supports multiple languages
- No training required
- Accessible via REST APIs
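For a sense of what calling OCR over REST can look like, here is a minimal sketch using only the Python standard library. It is not needed for the exam. The request path, the `api-version` value, the `features=read` parameter, and the `readResult` response layout are assumptions based on the Image Analysis 4.0 REST API and should be checked against the current Azure documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: verify against the current Azure AI Vision REST reference.
API_PATH = "/computervision/imageanalysis:analyze"
API_VERSION = "2023-10-01"


def build_read_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the service to read text."""
    query = urllib.parse.urlencode({"api-version": API_VERSION, "features": "read"})
    url = f"{endpoint.rstrip('/')}{API_PATH}?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # resource key from the Azure portal
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_lines(result: dict) -> list[str]:
    """Collect line text from a response shaped like readResult -> blocks -> lines."""
    lines = []
    for block in result.get("readResult", {}).get("blocks", []):
        for line in block.get("lines", []):
            lines.append(line["text"])
    return lines


def read_text(endpoint: str, key: str, image_url: str) -> list[str]:
    """Send the request and return the recognized lines (needs network access)."""
    request = build_read_request(endpoint, key, image_url)
    with urllib.request.urlopen(request) as response:
        return extract_lines(json.load(response))
```

Calling `read_text` requires a real endpoint and key from an Azure AI Vision resource. No model training step appears anywhere in the flow, which is the point of a prebuilt OCR model.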
Azure AI Document Intelligence (formerly Form Recognizer)
- Builds on OCR to:
  - Extract structured data
  - Analyze forms and documents
- Commonly used for:
  - Invoices
  - Receipts
  - Business documents
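To make the difference between raw OCR output and structured data concrete, the toy sketch below turns a list of recognized lines into named fields with a simple pattern. This is not how Azure AI Document Intelligence works internally; the service relies on trained models rather than a regular expression. The `to_key_values` function and the sample lines are hypothetical.

```python
import re

# Matches "Label: value". A deliberately naive pattern, for illustration only.
KEY_VALUE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def to_key_values(ocr_lines: list[str]) -> dict[str, str]:
    """Turn flat OCR lines into a dictionary of field names and values."""
    fields = {}
    for line in ocr_lines:
        match = KEY_VALUE.match(line)
        if match:  # lines without a "Label: value" shape are skipped
            fields[match.group(1)] = match.group(2)
    return fields


# Illustrative OCR output: plain text lines with no structure.
raw_lines = ["ACME Supplies", "Invoice No: 1047", "Date: 2024-03-01", "Total: 42.00"]

print(to_key_values(raw_lines))
# prints {'Invoice No': '1047', 'Date': '2024-03-01', 'Total': '42.00'}
```

Plain OCR stops at `raw_lines`. A document-analysis service aims to deliver something closer to the dictionary, which is what makes it suitable for invoices, receipts, and forms.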
Features of OCR Solutions on Azure
Prebuilt Models
- Ready to use
- No custom training needed
- Ideal for common document scenarios
Scalable Cloud Processing
- Runs in Azure
- Handles large document volumes
- Integrates with automation workflows
Integration with Other Services
OCR outputs are often used with:
- Search services
- Databases
- Business process automation
- AI-powered document workflows
When to Use OCR
Use OCR when:
- Text needs to be extracted from images or documents
- Manual data entry must be reduced
- Documents need to be searchable
When Not to Use OCR
Do not use OCR when:
- The goal is to identify objects rather than text
- Images need to be categorized without extracting text
- Pixel-level image analysis is required
Responsible AI Considerations
At a fundamentals level, AI-900 expects awareness of:
- Privacy when processing documents with personal data
- Security of stored text and documents
- Accuracy limitations, especially with handwritten or low-quality images
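One common way to handle the accuracy limitation is to send uncertain results to a person instead of trusting them automatically. The sketch below assumes the OCR service returns a confidence score per word, as many do. The `OcrWord` type, the `split_by_confidence` function, and the 0.8 threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """A recognized word with a confidence score between 0.0 and 1.0."""
    text: str
    confidence: float


def split_by_confidence(words, threshold=0.8):
    """Separate words to accept automatically from words a person should check."""
    accepted = [w.text for w in words if w.confidence >= threshold]
    needs_review = [w.text for w in words if w.confidence < threshold]
    return accepted, needs_review


# Illustrative scores, not real service output.
words = [OcrWord("Total", 0.99), OcrWord("42.00", 0.95), OcrWord("Srnith", 0.41)]

accepted, needs_review = split_by_confidence(words)
print("accepted:", accepted)          # prints accepted: ['Total', '42.00']
print("needs review:", needs_review)  # prints needs review: ['Srnith']
```

The right threshold depends on the scenario: a misread digit on an invoice usually costs more than a misread word in a searchable archive.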
Key Exam Takeaways
- OCR extracts text from images
- OCR converts text in visual content into machine-readable text
- Supports multiple languages
- Azure AI Vision provides OCR capabilities
- Azure AI Document Intelligence extends OCR for forms
- Watch for keywords: read, extract, recognize text, scan, digitize
