This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub.
This topic falls under these sections:
Implement information extraction solutions (10–15%)
--> Build retrieval and grounding pipelines
--> Implement enrichment by using custom or built-in skills for text, images, and layout
Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
For the AI-103: Develop AI Apps and Agents on Azure certification exam, one of the key objectives within Build retrieval and grounding pipelines is understanding how to enrich content during ingestion and indexing.
AI enrichment is critical for modern:
- Retrieval-Augmented Generation (RAG) systems
- Enterprise search solutions
- AI agents
- Knowledge mining applications
- Intelligent document processing systems
Azure AI solutions often ingest raw content such as:
- PDFs
- Images
- Scanned forms
- Emails
- Audio transcripts
- Web pages
- Office documents
However, raw content alone is often not enough.
AI enrichment adds:
- Meaning
- Metadata
- Structure
- Searchability
- Semantic understanding
This enrichment process enables AI systems to retrieve more accurate and contextually relevant information.
What Is AI Enrichment?
AI enrichment is the process of enhancing raw content with AI-generated insights before indexing it into a search system.
Enrichment can:
- Extract text
- Detect entities
- Identify key phrases
- Analyze sentiment
- Detect language
- Recognize objects in images
- Understand document layout
- Generate metadata
These enrichments improve:
- Search relevance
- Semantic retrieval
- Grounding quality
- AI agent accuracy
Core Azure Services Used
Several Azure services commonly appear in enrichment pipelines.
| Service | Purpose |
|---|---|
| Azure AI Search | Indexing and enrichment orchestration |
| Azure AI Document Intelligence | Layout extraction and document analysis |
| Azure AI Vision | OCR and image analysis |
| Azure AI Language | Text analysis and NLP |
| Azure OpenAI Service | Embeddings and generative AI |
| Azure Blob Storage | Source content storage |
| Azure Functions | Custom enrichment logic |
Understanding Skillsets
What Is a Skillset?
In Azure AI Search, a skillset is a collection of enrichment steps that process content during indexing.
A skillset may:
- Extract text
- Analyze images
- Detect entities
- Generate embeddings
- Enrich metadata
Think of a skillset as an AI pipeline.
Skillset Workflow
Typical enrichment pipeline:
Raw Content
↓
Indexer
↓
Skillset
↓
Enriched Content
↓
Search Index
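The workflow above maps onto a skillset definition. The following is a minimal sketch of the JSON body one might send when creating a skillset, built here as a Python dictionary. The skillset name, the target names, and the helper `build_skillset` are illustrative assumptions. The `@odata.type` values are the identifiers of the built-in Language Detection and Key Phrase Extraction skills.

```python
import json


def build_skillset(name: str) -> dict:
    """Return a skillset body with two chained built-in skills.

    The skillset name and target names are placeholders for illustration.
    """
    language_skill = {
        "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
        "context": "/document",
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "languageCode", "targetName": "language"}],
    }
    key_phrase_skill = {
        "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
        "context": "/document",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            # Consumes the node written by the language detection skill.
            {"name": "languageCode", "source": "/document/language"},
        ],
        "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
    }
    return {
        "name": name,
        "description": "Detect language, then extract key phrases",
        "skills": [language_skill, key_phrase_skill],
    }


skillset = build_skillset("demo-skillset")
print(json.dumps(skillset, indent=2))
```

Each skill reads from and writes to paths in an enrichment tree rooted at `/document`, which is how the output of one skill becomes the input of the next.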
Built-In Skills
Azure AI Search includes many prebuilt cognitive skills.
These skills require minimal custom development.
Built-in skills are commonly tested on AI-103.
Categories of Built-In Skills
| Category | Examples |
|---|---|
| Text Skills | Entity extraction, sentiment |
| Vision Skills | OCR, image tagging |
| Layout Skills | Document structure extraction |
| Utility Skills | Shaping and merging data |
Text Enrichment Skills
Text enrichment skills analyze textual content.
Common use cases:
- Knowledge mining
- Semantic search
- RAG pipelines
- AI assistants
Language Detection Skill
Purpose
Detects the language of text.
Example:
Input: "Bonjour tout le monde"
Output: French
Use cases:
- Multilingual indexing
- Translation pipelines
- Language-specific routing
Entity Recognition Skill
Purpose
Extracts named entities such as:
- People
- Organizations
- Locations
- Dates
Example:
Input: "Microsoft opened a new office in London."
Output:
- Microsoft (Organization)
- London (Location)
This enrichment improves:
- Search filters
- Metadata tagging
- Semantic retrieval
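To make extracted entities usable as search filters, the skill outputs have to be mapped into filterable index fields. The sketch below shows one way to wire that up. The target names (`people`, `organizations`, and so on) and the helper `mappings_and_fields` are assumptions for illustration, and the index fields assume each output is a collection of strings.

```python
from typing import List, Tuple

entity_skill = {
    "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
    "context": "/document",
    "categories": ["Person", "Organization", "Location", "DateTime"],
    "defaultLanguageCode": "en",
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [
        {"name": "persons", "targetName": "people"},
        {"name": "organizations", "targetName": "organizations"},
        {"name": "locations", "targetName": "locations"},
        {"name": "dateTimes", "targetName": "dates"},
    ],
}


def mappings_and_fields(skill: dict) -> Tuple[List[dict], List[dict]]:
    """Derive indexer output field mappings and index fields from a skill."""
    mappings, fields = [], []
    for output in skill["outputs"]:
        target = output["targetName"]
        mappings.append(
            {
                "sourceFieldName": f"{skill['context']}/{target}",
                "targetFieldName": target,
            }
        )
        fields.append(
            {
                "name": target,
                "type": "Collection(Edm.String)",
                "searchable": True,
                "filterable": True,  # enables filter expressions on the field
                "facetable": True,   # enables facet counts in results
            }
        )
    return mappings, fields


output_field_mappings, index_fields = mappings_and_fields(entity_skill)
print(output_field_mappings[1])
```

The mappings belong in the indexer definition (`outputFieldMappings`), while the field definitions belong in the index schema.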
Key Phrase Extraction Skill
Purpose
Extracts important phrases from content.
Example:
Document: "This policy describes annual cybersecurity compliance procedures."
Extracted phrases:
- cybersecurity compliance
- annual procedures
Useful for:
- Search optimization
- Summaries
- Topic identification
Sentiment Analysis Skill
Purpose
Classifies the overall sentiment of a piece of text and returns a confidence score for each label.
Possible outputs:
- Positive
- Neutral
- Negative
Common use cases:
- Customer feedback analysis
- Support ticket analysis
- Call center insights
Text Translation Skill
Purpose
Translates content into another language.
Example:
Spanish → English
Useful in:
- Global enterprise systems
- Multilingual search
- Cross-language retrieval
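As a rough sketch, a Text Translation skill definition for a multilingual index might look like the following. The function name `translation_skill` and the target names are assumptions made for this example. Omitting a source language lets the service detect it.

```python
def translation_skill(to_language: str = "en") -> dict:
    """Build a translation skill that normalizes content to one language."""
    return {
        "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
        "context": "/document",
        "defaultToLanguageCode": to_language,
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [
            # Translated text, stored under a placeholder node name.
            {"name": "translatedText", "targetName": "translated_text"},
            # Language the service detected for the original text.
            {
                "name": "translatedFromLanguageCode",
                "targetName": "detected_language",
            },
        ],
    }


skill = translation_skill("en")
print(skill["defaultToLanguageCode"])
```

Indexing both the original and the translated text lets users search in either language.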
Image Enrichment Skills
Image enrichment is critical for scanned documents and multimedia content.
Images often contain:
- Text
- Objects
- Logos
- Handwriting
- Charts
- Diagrams
OCR Skill
What Is OCR?
OCR (Optical Character Recognition) extracts text from images.
Common AI-103 scenario:
Make scanned PDFs searchable.
OCR enables indexing of:
- Scanned forms
- Photos
- Screenshots
- Whiteboards
- Image-based PDFs
OCR Workflow
Scanned PDF
↓
OCR Skill
↓
Extracted Text
↓
Search Index
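The OCR workflow above depends on the indexer extracting images before the skill runs. The following sketch pairs an indexer configuration with an OCR skill and a Merge skill that folds the recognized text back into the document content. All resource names (`scanned-docs-indexer`, `ocr-skillset`, and so on) and the `merged_text` target name are placeholders.

```python
# Indexer settings that extract embedded images so the OCR skill has input.
indexer = {
    "name": "scanned-docs-indexer",           # placeholder name
    "dataSourceName": "scanned-docs-blob",    # placeholder name
    "targetIndexName": "scanned-docs-index",  # placeholder name
    "skillsetName": "ocr-skillset",           # placeholder name
    "parameters": {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "imageAction": "generateNormalizedImages",
        }
    },
    "outputFieldMappings": [
        {"sourceFieldName": "/document/merged_text", "targetFieldName": "content"}
    ],
}

# Runs once per extracted image.
ocr_skill = {
    "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
    "context": "/document/normalized_images/*",
    "defaultLanguageCode": "en",
    "detectOrientation": True,
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [{"name": "text", "targetName": "text"}],
}

# Inserts each image's OCR text at the position where the image appeared.
merge_skill = {
    "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
    "context": "/document",
    "insertPreTag": " ",
    "insertPostTag": " ",
    "inputs": [
        {"name": "text", "source": "/document/content"},
        {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
        {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
    ],
    "outputs": [{"name": "mergedText", "targetName": "merged_text"}],
}

skillset = {"name": "ocr-skillset", "skills": [ocr_skill, merge_skill]}
print([s["@odata.type"] for s in skillset["skills"]])
```

Without `imageAction` set to generate normalized images, the `/document/normalized_images` node is not created and the OCR skill has nothing to process.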
Image Analysis Skill
Purpose
Analyzes visual content.
Can detect:
- Objects
- Captions
- Categories
- Tags
- Landmarks
- Brands
Example:
Image: Beach sunset
Detected:
- beach
- sunset
- ocean
These tags become searchable metadata.
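One way to turn image tags into searchable metadata is sketched below. The skill definition requests tags and a description, and the helper keeps only tag names above a confidence threshold. The helper `tag_names`, the threshold, the target names, and the sample data are all assumptions for illustration. The sample mimics the name-and-confidence shape of tag output rather than reproducing a real response.

```python
image_analysis_skill = {
    "@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
    "context": "/document/normalized_images/*",
    "defaultLanguageCode": "en",
    "visualFeatures": ["tags", "description"],
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [
        {"name": "tags", "targetName": "imageTags"},
        {"name": "description", "targetName": "imageCaption"},
    ],
}


def tag_names(tags: list, min_confidence: float = 0.5) -> list:
    """Keep the names of tags whose confidence meets the threshold."""
    return [t["name"] for t in tags if t["confidence"] >= min_confidence]


# Hypothetical tag output for a beach sunset photo.
sample_tags = [
    {"name": "beach", "confidence": 0.98},
    {"name": "sunset", "confidence": 0.95},
    {"name": "ocean", "confidence": 0.91},
    {"name": "boat", "confidence": 0.20},
]

searchable_tags = tag_names(sample_tags)
print(searchable_tags)  # ['beach', 'sunset', 'ocean']
```

Filtering out low-confidence tags keeps noisy labels from polluting search results.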
Layout Enrichment
Layout enrichment is increasingly important in enterprise AI systems.
Many documents contain:
- Tables
- Headers
- Footers
- Sections
- Forms
- Multi-column layouts
Simple text extraction may lose this structure.
Azure AI Document Intelligence
Azure AI Document Intelligence, which an Azure AI Search skillset can call through the built-in Document Layout skill, helps preserve:
- Document structure
- Layout relationships
- Tables
- Form fields
This is essential for:
- Financial documents
- Invoices
- Contracts
- Healthcare forms
- Reports
Layout Extraction Example
Example document structure:
Invoice
├── Vendor Name
├── Invoice Number
├── Table of Items
└── Total Amount
Layout-aware enrichment preserves relationships between fields.
Table Extraction
A major advantage of layout analysis is table extraction.
Without layout enrichment:
Rows and columns may become scrambled text.
With layout enrichment:
- Rows remain structured
- Columns are preserved
- Relationships remain intact
This significantly improves retrieval quality.
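The difference is easy to see in code. The sketch below rebuilds rows from table cells that carry row and column positions, then renders them as a markdown table that can be chunked and indexed with its structure intact. The cell shape (`rowIndex`, `columnIndex`, `content`) is modeled on layout analysis output, and the sample invoice data and both helper functions are made up for this example.

```python
from collections import defaultdict


def cells_to_rows(cells: list) -> list:
    """Group cells by row and order them by column."""
    grid = defaultdict(dict)
    for cell in cells:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return [
        [grid[row][col] for col in sorted(grid[row])]
        for row in sorted(grid)
    ]


def rows_to_markdown(rows: list) -> str:
    """Render rows as a markdown table, treating the first row as the header."""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Hypothetical cells, deliberately out of order as flat extraction might be.
cells = [
    {"rowIndex": 1, "columnIndex": 2, "content": "10.00"},
    {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
    {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
    {"rowIndex": 0, "columnIndex": 2, "content": "Price"},
    {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
    {"rowIndex": 1, "columnIndex": 1, "content": "2"},
]

rows = cells_to_rows(cells)
table_markdown = rows_to_markdown(rows)
print(table_markdown)
```

Because every value stays attached to its row and column header, a retriever can answer a question such as "what is the price of the Widget" from a single chunk.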
Custom Skills
What Are Custom Skills?
Built-in skills do not cover every business scenario.
Custom skills allow developers to add:
- Proprietary logic
- Specialized AI models
- External APIs
- Custom transformations
Custom skills are commonly implemented using:
- Azure Functions
- Web APIs
- Containerized services
Common Custom Skill Scenarios
Examples:
- Industry-specific entity extraction
- Internal taxonomy classification
- Medical terminology analysis
- Product categorization
- Compliance scoring
- Fraud detection enrichment
Custom Skill Workflow
Indexer
↓
Custom Skill API
↓
Enriched Metadata
↓
Search Index
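A custom skill API receives a batch of records and must return one result per record with the same `recordId`. The function below is a minimal sketch of that request and response contract, written as plain Python so it could sit behind an Azure Function or any web API. The keyword taxonomy, the `categories` output name, and the function names are hypothetical.

```python
# Hypothetical internal taxonomy, used only for this illustration.
TAXONOMY = {
    "invoice": "Finance",
    "contract": "Legal",
    "diagnosis": "Healthcare",
}


def classify(text: str) -> list:
    """Return the sorted taxonomy labels whose keyword appears in the text."""
    lowered = text.lower()
    return sorted({label for keyword, label in TAXONOMY.items() if keyword in lowered})


def run_custom_skill(request_body: dict) -> dict:
    """Process a custom skill request and build the matching response."""
    results = []
    for record in request_body.get("values", []):
        record_id = record.get("recordId")
        text = record.get("data", {}).get("text")
        if not isinstance(text, str):
            # Report the problem for this record without failing the batch.
            results.append(
                {
                    "recordId": record_id,
                    "data": {},
                    "errors": [{"message": "Missing 'text' input"}],
                    "warnings": [],
                }
            )
            continue
        results.append(
            {
                "recordId": record_id,
                "data": {"categories": classify(text)},
                "errors": [],
                "warnings": [],
            }
        )
    return {"values": results}


request = {
    "values": [
        {"recordId": "1", "data": {"text": "This invoice accompanies the signed contract."}},
        {"recordId": "2", "data": {}},
    ]
}
response = run_custom_skill(request)
print(response["values"][0]["data"])  # {'categories': ['Finance', 'Legal']}
```

In the skillset, such an endpoint would be referenced by a skill of type `#Microsoft.Skills.Custom.WebApiSkill` whose `uri` points at the deployed function.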
When to Use Built-In vs Custom Skills
| Built-In Skills | Custom Skills |
|---|---|
| Quick setup | Flexible |
| Microsoft-managed | Developer-managed |
| Common scenarios | Specialized scenarios |
| Minimal coding | Requires development |
Knowledge Stores
Enriched data can also be projected into a knowledge store.
A knowledge store supports:
- Analytics
- Visualization
- Reporting
- Downstream processing
Outputs may include:
- Tables
- JSON objects
- Enriched documents
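A knowledge store is declared inside the skillset as a set of projections. The sketch below assumes a Shaper skill first gathers enriched values into a single `/document/projection` node, which is then projected into tables and JSON objects. The table names, container name, connection string placeholder, and the helper `projection_sources` are illustrative assumptions.

```python
# Shaper skill that gathers enriched values into one node to project.
shaper_skill = {
    "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
    "context": "/document",
    "inputs": [
        {"name": "fileName", "source": "/document/metadata_storage_name"},
        {
            "name": "keyPhrases",
            "sourceContext": "/document/keyPhrases/*",
            "inputs": [{"name": "keyPhrase", "source": "/document/keyPhrases/*"}],
        },
    ],
    "outputs": [{"name": "output", "targetName": "projection"}],
}

knowledge_store = {
    "storageConnectionString": "<storage-connection-string>",  # placeholder
    "projections": [
        {
            "tables": [
                {
                    "tableName": "Documents",
                    "generatedKeyName": "DocumentId",
                    "source": "/document/projection",
                },
                {
                    "tableName": "KeyPhrases",
                    "generatedKeyName": "KeyPhraseId",
                    "source": "/document/projection/keyPhrases/*",
                },
            ],
            "objects": [
                {"storageContainer": "enriched-docs", "source": "/document/projection"}
            ],
            "files": [],
        }
    ],
}


def projection_sources(store: dict) -> list:
    """List every enrichment-tree path the knowledge store reads from."""
    sources = []
    for group in store["projections"]:
        for kind in ("tables", "objects", "files"):
            sources.extend(item["source"] for item in group.get(kind, []))
    return sources


print(projection_sources(knowledge_store))
```

Table projections suit reporting tools, while object projections keep each enriched document as a JSON blob for downstream processing.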
Enrichment and RAG
Enrichment improves Retrieval-Augmented Generation systems by giving the retriever more signals to match, filter, and rank on.
Benefits include:
- Better retrieval relevance
- Improved grounding
- Richer metadata
- Enhanced semantic understanding
Example:
Raw document: "Contoso released Project Falcon."
Enriched:
- Organization: Contoso
- Project: Falcon
- Release event detected
A retriever can then filter or boost on these fields instead of relying on raw text matching alone.
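Enriched fields such as the organization in the example can be used as filters at query time. The sketch below builds a search request body that restricts results to documents tagged with a given organization. It assumes an index with a filterable string collection field named `organizations`. The field name and both helper functions are hypothetical.

```python
def entity_filter(field: str, value: str) -> str:
    """Build an OData filter that matches a value in a string collection."""
    escaped = value.replace("'", "''")  # OData escapes a quote by doubling it
    return f"{field}/any(v: v eq '{escaped}')"


def build_search_request(query: str, organization: str, top: int = 5) -> dict:
    """Combine a full-text query with an entity-based filter."""
    return {
        "search": query,
        "filter": entity_filter("organizations", organization),
        "top": top,
    }


request_body = build_search_request("Project Falcon release", "Contoso")
print(request_body["filter"])  # organizations/any(v: v eq 'Contoso')
```

Narrowing the candidate set with a metadata filter before ranking reduces the chance that an unrelated document is passed to the model as grounding.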
Embeddings and Enrichment
Modern pipelines often combine enrichment with:
- Chunking
- Embedding generation
- Vector indexing
Workflow:
Document
↓
OCR / Layout Extraction
↓
Entity Extraction
↓
Chunking
↓
Embeddings
↓
Vector Index
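The chunking step in this workflow can be illustrated with a small function. In an Azure AI Search skillset this role is typically filled by the built-in Text Split skill, followed by an embedding skill. The pure-Python version below is only a sketch of the idea, using fixed-size character windows with overlap. The function name and default sizes are assumptions.

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 40) -> list:
    """Split text into fixed-size character chunks that overlap.

    Overlap repeats the tail of one chunk at the head of the next so a
    sentence that straddles a boundary is still retrievable in one piece.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")

    chunks = []
    step = max_chars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


chunks = chunk_text("abcdefghij", max_chars=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk would then be sent to an embedding model and stored as a vector alongside the enriched metadata of its parent document.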
Performance Considerations
AI enrichment can increase:
- Processing time
- Compute cost
- Indexing complexity
Optimization strategies:
- Select only needed skills
- Use incremental indexing
- Limit enrichment scope
- Cache reusable outputs
Security Considerations
Enrichment pipelines should support:
- RBAC
- Managed identities
- Secure storage access
- Data encryption
- Compliance requirements
Important exam concept:
Enriched content may contain sensitive information.
Common AI-103 Scenarios
Scenario 1
You need searchable scanned documents.
Solution:
- OCR Skill
- Azure AI Search
Scenario 2
You need to preserve invoice tables.
Solution:
- Azure AI Document Intelligence
- Layout extraction
Scenario 3
You need industry-specific classification.
Solution:
- Custom skill
Scenario 4
You need multilingual search.
Solution:
- Language detection
- Translation skill
Important AI-103 Exam Tips
Know These Key Concepts
| Concept | Purpose |
|---|---|
| Skillset | AI enrichment pipeline |
| OCR | Extract text from images |
| Entity Recognition | Detect named entities |
| Layout Extraction | Preserve document structure |
| Custom Skill | Specialized enrichment logic |
| Knowledge Store | Store enriched outputs |
Frequently Tested Areas
Expect questions involving:
- Skillsets
- OCR workflows
- Layout-aware extraction
- Custom enrichment APIs
- Built-in cognitive skills
- AI enrichment pipelines
- Azure AI Search integration
- Document Intelligence usage
Final Thoughts
AI enrichment is a foundational capability in modern Azure AI architectures.
For AI-103, focus heavily on:
- Skillsets
- Built-in cognitive skills
- OCR pipelines
- Layout extraction
- Document Intelligence
- Custom skills
- Metadata enrichment
- Search optimization
These concepts are essential for building high-quality enterprise AI systems, retrieval pipelines, and grounded AI applications.
Practice Exam Questions
Question 1
What is the primary purpose of a skillset in Azure AI Search?
A. Store vector embeddings
B. Manage RBAC permissions
C. Apply AI enrichment during indexing
D. Train foundation models
Answer
C. Apply AI enrichment during indexing
Question 2
Which built-in skill extracts text from images?
A. Entity Recognition Skill
B. OCR Skill
C. Sentiment Skill
D. Translation Skill
Answer
B. OCR Skill
Question 3
Which Azure service is commonly used for layout-aware document extraction?
A. Azure Monitor
B. Azure Backup
C. Azure Virtual Network
D. Azure AI Document Intelligence
Answer
D. Azure AI Document Intelligence
Question 4
What is a common use case for custom skills?
A. Hosting virtual machines
B. Industry-specific enrichment logic
C. Managing Azure subscriptions
D. Database replication
Answer
B. Industry-specific enrichment logic
Question 5
Which skill identifies people, organizations, and locations in text?
A. OCR Skill
B. Image Analysis Skill
C. Entity Recognition Skill
D. Translation Skill
Answer
C. Entity Recognition Skill
Question 6
Why is layout extraction important?
A. It preserves document structure and relationships
B. It encrypts documents
C. It reduces storage size
D. It removes duplicate records
Answer
A. It preserves document structure and relationships
Question 7
Which Azure service commonly hosts custom enrichment APIs?
A. Azure Functions
B. Azure Firewall
C. Azure Kubernetes Service only
D. Azure Monitor
Answer
A. Azure Functions
Question 8
What is the purpose of key phrase extraction?
A. Compress documents
B. Identify important concepts in content
C. Encrypt text
D. Generate embeddings
Answer
B. Identify important concepts in content
Question 9
Which enrichment capability is most useful for scanned PDF documents?
A. Semantic ranking
B. Vector similarity
C. OCR
D. Metadata filtering
Answer
C. OCR
Question 10
What is a knowledge store used for in Azure AI Search?
A. Hosting foundation models
B. Storing enriched outputs for downstream use
C. Managing virtual networks
D. Encrypting embeddings
Answer
B. Storing enriched outputs for downstream use
