Tag: Retrieval and Grounding Pipelines

AI-103, Azure AI, Microsoft Certification May 25, 2026

Ingest and index content, such as documents, images, audio, and video (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Build retrieval and grounding pipelines
      --> Ingest and index content, such as documents, images, audio, and video

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, one of the important objectives within Implement information extraction solutions is understanding how to ingest, process, enrich, and index content so that AI applications and agents can retrieve and ground responses accurately.

This topic is especially important for:

Retrieval-Augmented Generation (RAG)
Knowledge mining
Enterprise search
AI agents
Multimodal AI applications
Semantic search solutions

Modern AI applications rarely rely only on model training data. Instead, they ingest organizational content such as:

PDFs
Word documents
Images
Scanned forms
Audio recordings
Videos
Web pages
Databases
Emails
Knowledge base articles

Azure provides several services that work together to support these ingestion and indexing pipelines.

Why Content Ingestion and Indexing Matter

Large Language Models (LLMs) are powerful, but they:

Can become outdated
Cannot access private enterprise data by default
May hallucinate information
Need grounding with trusted data sources

A retrieval and grounding pipeline solves this problem by:

Ingesting data
Extracting useful content
Enriching the data with AI
Creating searchable indexes
Retrieving relevant chunks during prompting

This architecture is foundational to:

Azure AI Search + RAG
AI agents
Enterprise copilots
Knowledge mining systems

Core Azure Services Used

Several Azure services commonly appear in AI-103 scenarios.

Service	Purpose
Microsoft Azure AI Search	Indexing, vector search, semantic search
Azure AI Document Intelligence	Extract text, forms, layout, tables
Azure AI Vision	OCR, image analysis
Azure AI Speech	Speech-to-text transcription
Azure OpenAI Service	Embeddings and generative AI
Azure Blob Storage	Store raw content
Azure Functions	Automation and ingestion orchestration
Azure Logic Apps	Workflow orchestration
Azure AI Foundry	AI orchestration and agent development

High-Level Retrieval and Grounding Pipeline

A typical ingestion pipeline looks like this:

			
Content Sources
    ↓
Ingestion
    ↓
AI Enrichment
    ↓
Chunking
    ↓
Embeddings Generation
    ↓
Indexing
    ↓
Retrieval
    ↓
Grounded LLM Response

		

Step 1: Content Ingestion

What Is Content Ingestion?

Content ingestion is the process of importing data into the AI pipeline from various sources.

Common sources include:

SharePoint
Azure Blob Storage
SQL databases
Websites
PDFs
Images
Audio recordings
Video files
Emails
Internal documentation

Ingesting Documents

Documents are among the most common enterprise data sources.

Typical file types:

PDF
DOCX
TXT
HTML
CSV
PowerPoint
Excel

Common Workflow

Upload documents to Azure Blob Storage
Use Azure AI Search indexers
Extract text and metadata
Apply enrichment skills
Store indexed content

Important Exam Concept: Indexers

An indexer in Azure AI Search:

Connects to a data source
Crawls content
Extracts text
Applies AI enrichment
Pushes results into a search index

Supported data sources include:

Azure Blob Storage
Azure SQL
Cosmos DB
SharePoint (via connectors)

Ingesting Images

Images may contain:

Text
Objects
Faces
Product labels
Handwriting
Diagrams

OCR (Optical Character Recognition)

Azure AI Vision can extract text from:

Photos
Scanned documents
Screenshots
Whiteboards

Common exam scenario:

Extract text from scanned PDFs and make it searchable.

The solution usually involves:

Azure AI Vision OCR
Azure AI Search skillsets
Search indexes

Image Metadata Extraction

AI enrichment can also detect:

Captions
Tags
Objects
Brands
Categories

Example:

			
Image: beach_photo.jpg
Extracted metadata:
- beach
- ocean
- sunset
- palm tree

		

This metadata becomes searchable within the index.

Ingesting Audio Content

Audio ingestion commonly involves:

Meeting recordings
Call center conversations
Podcasts
Voice memos

Speech-to-Text

Azure AI Speech converts spoken language into text transcripts.

Workflow:

Upload audio
Transcribe speech
Store transcript
Index transcript in Azure AI Search

Important exam point:

Audio itself is usually not directly indexed — the transcript is indexed.

Additional Enrichment

You may also extract:

Speaker identification
Sentiment
Keywords
Language detection

Ingesting Video Content

Video ingestion is increasingly important in enterprise AI.

Video contains:

Audio
Visual frames
Text overlays
Metadata

Typical Video Processing Pipeline

Upload video
Extract audio track
Transcribe speech
Analyze frames
Generate metadata
Index searchable content

Services commonly used:

Azure AI Speech
Azure AI Vision
Azure Media Services (historically)
Azure AI Search

AI Enrichment Pipelines

What Is AI Enrichment?

AI enrichment enhances raw data before indexing.

Examples:

OCR
Key phrase extraction
Entity recognition
Language detection
Sentiment analysis
Image tagging
Translation

In Azure AI Search, enrichment is configured using:

Skillsets
Cognitive skills
Custom skills

Skillsets in Azure AI Search

A skillset is a pipeline of AI enrichment steps.

Example skillset:

			
PDF
 ↓
OCR Skill
 ↓
Language Detection Skill
 ↓
Key Phrase Extraction Skill
 ↓
Embedding Generation
 ↓
Index

		

Built-In Cognitive Skills

Common built-in skills include:

Skill	Purpose
OCR Skill	Extract text from images
Entity Recognition Skill	Detect people, places, organizations
Key Phrase Extraction Skill	Identify important phrases
Language Detection Skill	Detect language
Sentiment Skill	Analyze sentiment
Image Analysis Skill	Describe image content

Chunking Content

Why Chunking Matters

LLMs have token limits.

Large documents must be split into smaller sections called chunks.

Chunking improves:

Retrieval precision
Embedding quality
Grounding accuracy
Search relevance

Chunking Strategies

Fixed-Size Chunking

Example:

500 tokens per chunk

Semantic Chunking

Split by:

Headings
Paragraphs
Sections

Overlapping Chunks

Helps preserve context.

Example:

			
Chunk 1: Tokens 1–500
Chunk 2: Tokens 450–950

Embeddings Generation

What Are Embeddings?

Embeddings are numerical vector representations of text or content.

Embeddings allow:

Semantic similarity search
Vector search
RAG retrieval

Example concept:

"car" and "automobile"

Traditional keyword search may treat them differently.

Embeddings place them close together in vector space.

Vector Indexing

Vector Search in Azure AI Search

Azure AI Search supports:

Vector indexes
Hybrid search
Semantic ranking

Workflow:

Generate embeddings
Store vectors in index
Query with vector embeddings
Retrieve semantically similar content

This is a major AI-103 topic.

Hybrid Search

Hybrid search combines:

Keyword search
Semantic search
Vector search

Benefits:

Better relevance
Improved grounding
More accurate AI responses

This is commonly recommended for enterprise RAG systems.

Semantic Search

Semantic search improves ranking using language understanding.

Instead of exact keyword matching:

"How do I reset my password?"

Semantic search may also retrieve:

"Steps to change account credentials"

Metadata and Filtering

Indexes commonly store metadata such as:

File name
Author
Upload date
Department
Language
Content type

Metadata supports:

Filtering
Security trimming
Access control
Faceted search

Example:

			
department = HR
language = English
documentType = Policy

Incremental Indexing

Enterprise systems often ingest changing content.

Incremental indexing:

Detects changed documents
Updates only modified content
Improves efficiency

Important concept:

Avoid rebuilding the entire index unnecessarily.

Security Considerations

AI-103 may test secure ingestion patterns.

Key considerations:

Managed identities
RBAC
Private endpoints
Data encryption
Secure storage access
Role-based document access

Common scenario:

Ensure users only retrieve documents they are authorized to access.

Common AI-103 Architecture Scenario

A very common exam architecture looks like this:

			
Documents in Blob Storage
        ↓
Azure AI Search Indexer
        ↓
Skillset Enrichment
        ↓
Chunking + Embeddings
        ↓
Vector Index
        ↓
Azure OpenAI RAG Application

		

Understand this flow thoroughly for the exam.

Important Exam Tips

Know the Difference Between:

Concept	Purpose
Data source	Where content originates
Indexer	Pulls and processes content
Skillset	AI enrichment pipeline
Index	Searchable storage structure
Embeddings	Vector representations
Vector search	Semantic similarity retrieval

Common Exam Scenarios

Scenario 1

You need to search scanned PDFs.

Solution:

OCR
Skillsets
Azure AI Search

Scenario 2

You need semantic retrieval for a chatbot.

Solution:

Embeddings
Vector indexes
Hybrid search
Azure OpenAI

Scenario 3

You need searchable meeting recordings.

Solution:

Speech-to-text transcription
Index transcripts

Scenario 4

You need image-based metadata search.

Solution:

Image Analysis Skill
AI enrichment pipeline

Final Thoughts

Understanding ingestion and indexing pipelines is critical for modern Azure AI solutions.

For the AI-103 exam, focus especially on:

Azure AI Search architecture
Skillsets and enrichment
OCR workflows
Vector indexing
Embeddings
Chunking strategies
Hybrid search
RAG grounding pipelines

These concepts appear repeatedly throughout generative AI, agentic AI, and enterprise search solutions.

Practice Exam Questions

Question 1

Which Azure service is primarily responsible for creating and managing searchable indexes in a RAG solution?

A. Azure AI Vision
B. Azure AI Speech
C. Azure AI Search
D. Azure Functions

Answer

C. Azure AI Search

Question 2

What is the primary purpose of chunking documents before generating embeddings?

A. Reduce storage costs
B. Encrypt content
C. Convert files to JSON
D. Improve retrieval and fit token limits

Answer

D. Improve retrieval and fit token limits

Question 3

Which Azure capability extracts text from scanned images and PDFs?

A. OCR
B. Sentiment Analysis
C. Vectorization
D. Language Detection

Answer

A. OCR

Question 4

What is typically indexed from audio recordings?

A. Raw waveform data
B. Video frames
C. Speech transcripts
D. Encryption metadata

Answer

C. Speech transcripts

Question 5

Which component in Azure AI Search orchestrates AI enrichment steps?

A. Index
B. Skillset
C. Embedding model
D. Semantic ranker

Answer

B. Skillset

Question 6

What is the purpose of embeddings in a retrieval pipeline?

A. Compress documents
B. Enable semantic similarity search
C. Encrypt vector data
D. Improve OCR quality

Answer

B. Enable semantic similarity search

Question 7

Which search approach combines keyword and vector search?

A. OCR search
B. Lexical indexing
C. Hybrid search
D. Boolean search

Answer

C. Hybrid search

Question 8

Which Azure service commonly converts speech into searchable text?

A. Azure AI Vision
B. Azure AI Search
C. Azure AI Speech
D. Azure Monitor

Answer

C. Azure AI Speech

Question 9

What is an indexer in Azure AI Search responsible for?

A. Training machine learning models
B. Managing RBAC permissions
C. Hosting APIs
D. Crawling and importing data into indexes

Answer

D. Crawling and importing data into indexes

Question 10

Which statement best describes semantic search?

A. It only matches exact keywords
B. It retrieves results based on meaning and context
C. It replaces vector search entirely
D. It only works with structured databases

Answer

B. It retrieves results based on meaning and context

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Azure AI, Microsoft Certification May 25, 2026

Configure semantic search, hybrid search, and vector search for Grounding (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Build retrieval and grounding pipelines
      --> Configure semantic search, hybrid search, and vector search for Grounding

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, one of the most important modern AI concepts is understanding how to configure and use:

Semantic search
Vector search
Hybrid search

These technologies are foundational to:

Retrieval-Augmented Generation (RAG)
AI agents
Enterprise copilots
Knowledge mining systems
Grounded AI applications

In modern Azure AI architectures, these search methods help Large Language Models (LLMs) retrieve relevant enterprise content so responses are accurate, current, and grounded in trusted data.

Why Grounding Matters

LLMs such as those used through Azure OpenAI Service are powerful, but they have limitations:

They may hallucinate
Their training data may be outdated
They do not automatically know private organizational data
They cannot inherently access enterprise documents

Grounding solves this problem.

What Is Grounding?

Grounding means providing an AI model with relevant external data during inference.

Example:

			
User Question:
"What is our company travel reimbursement policy?"
AI Workflow:
1. Retrieve policy document chunks
2. Provide chunks to LLM
3. Generate grounded answer

		

Without grounding, the model might invent an answer.

With grounding, the response is based on actual company documentation.

Core Azure Services Used

Several Azure services commonly appear in grounding architectures.

Service	Purpose
Azure AI Search	Search indexes, vector search, semantic ranking
Azure OpenAI Service	Embeddings generation and LLM responses
Azure Blob Storage	Store source documents
Azure AI Document Intelligence	Extract document content
Azure AI Foundry	Build AI agents and orchestration workflows

Understanding Search Types

There are three major search approaches you must understand for AI-103:

Search Type	Main Purpose
Keyword Search	Exact text matching
Semantic Search	Meaning-based ranking
Vector Search	Embedding similarity
Hybrid Search	Combines keyword + semantic + vector

Traditional Keyword Search

Traditional search relies on:

Exact matches
Tokens
Lexical analysis

Example:

			
Search Query:
"reset password"

Documents containing:

"reset password"

will rank highly.

However, keyword search struggles with:

Synonyms
Context
Natural language intent

Example:

"change account credentials"

may not match well.

Semantic Search

What Is Semantic Search?

Semantic search improves retrieval by understanding:

Context
Meaning
Intent
Relationships between words

Instead of only exact keywords, semantic search uses language understanding to improve ranking quality.

How Semantic Search Works

Semantic search:

Interprets user intent
Understands relationships between phrases
Re-ranks search results
Produces more relevant answers

Example:

			
User Query:
"How do I update my login information?"

Semantic search may retrieve:

"Instructions for changing account credentials"

even without exact keyword matches.

Semantic Ranking

In Azure AI Search, semantic ranking:

Reorders results based on relevance
Uses deep language models
Improves natural language search experiences

Important AI-103 point:

Semantic search enhances ranking, but it does not replace vector search.

Semantic Captions and Answers

Azure AI Search semantic search can generate:

Semantic captions
Semantic answers

Semantic Captions

Short highlighted summaries from documents.

Semantic Answers

Direct answers extracted from indexed content.

Example:

			
Question:
"What is the vacation accrual policy?"
Semantic answer:
"Employees accrue 10 vacation days annually."

Vector Search

What Is Vector Search?

Vector search uses embeddings to retrieve semantically similar content.

Instead of matching keywords, vector search compares numerical vectors.

What Are Embeddings?

Embeddings are numerical representations of content.

Words or concepts with similar meanings are placed near each other in vector space.

Example:

			
"car"
"automobile"
"vehicle"

These concepts become mathematically similar vectors.

Embedding Generation

Embeddings are commonly generated using models in:

Azure OpenAI Service
Azure AI Foundry models

Typical embedding workflow:

Chunk documents
Generate embeddings
Store vectors in search index
Generate embedding for user query
Retrieve nearest vectors

Vector Search Workflow

			
Document Chunk
      ↓
Embedding Model
      ↓
Vector Embedding
      ↓
Stored in Search Index

		

Query workflow:

			
User Query
     ↓
Embedding Model
     ↓
Query Vector
     ↓
Nearest Neighbor Search

		

Nearest Neighbor Search

Vector databases use similarity calculations such as:

Cosine similarity
Euclidean distance

The system retrieves content with the closest vectors.

Important exam concept:

Vector similarity measures semantic closeness.

Configuring Vector Search in Azure AI Search

To configure vector search, you typically:

Create vector-enabled fields
Generate embeddings
Store embeddings in index
Configure vector search profiles
Execute vector queries

Example Vector Index Structure

Example fields:

Field	Type
id	String
content	String
contentVector	Collection(Float)
title	String

The vector field stores embeddings.

Vector Dimensions

Embedding models produce vectors with fixed dimensions.

Example:

1536 dimensions

Important:

The vector field dimension must match the embedding model output.

Hybrid Search

What Is Hybrid Search?

Hybrid search combines:

Keyword search
Semantic ranking
Vector similarity

This is one of the most important AI-103 topics.

Why Hybrid Search Matters

Each search method has strengths and weaknesses.

Method	Strength
Keyword search	Exact matching
Semantic search	Better ranking/context
Vector search	Conceptual similarity

Hybrid search combines all three for optimal retrieval quality.

Hybrid Search Architecture

			
User Query
   ↓
Keyword Search
   +
Vector Search
   ↓
Combined Results
   ↓
Semantic Re-ranking
   ↓
Top Grounding Results

		

This architecture is extremely common in enterprise RAG systems.

Why Hybrid Search Is Recommended

Hybrid search improves:

Recall
Precision
Relevance
Context matching
Grounding quality

This reduces hallucinations and improves AI responses.

Retrieval-Augmented Generation (RAG)

What Is RAG?

RAG combines:

Retrieval systems
External knowledge
Generative AI

Workflow:

			
User Query
   ↓
Search Retrieval
   ↓
Relevant Chunks
   ↓
LLM Prompt
   ↓
Grounded Response

		

Grounding Pipeline Example

			
Documents in Blob Storage
        ↓
Azure AI Search Indexer
        ↓
Chunking
        ↓
Embedding Generation
        ↓
Vector Index
        ↓
Hybrid Search Retrieval
        ↓
Azure OpenAI Prompt
        ↓
Grounded Response

		

This pipeline appears frequently in AI-103 scenarios.

Chunking and Retrieval Quality

Chunking directly affects search quality.

Good chunks:

Preserve meaning
Fit token limits
Improve embedding relevance

Poor chunking causes:

Incomplete answers
Lost context
Lower retrieval accuracy

Semantic vs Vector Search

Semantic Search	Vector Search
Improves ranking	Retrieves by embedding similarity
Language understanding	Numerical vector comparison
Works with textual relevance	Works with semantic proximity
Re-ranking layer	Retrieval mechanism

Important:

These technologies complement each other.

Filtering in Grounding Pipelines

Metadata filtering improves retrieval quality.

Common filters:

Department
Security level
Document type
Date
Language

Example:

department = Finance

This limits retrieval scope.

Security Trimming

Enterprise grounding systems often require:

RBAC
Document-level security
Identity-aware retrieval

Important exam concept:

Users should retrieve only authorized content.

Performance Optimization

Key optimization techniques:

Proper chunk sizes
Embedding caching
Hybrid search
Metadata filtering
Incremental indexing
Semantic ranking

Common AI-103 Scenarios

Scenario 1

You need a chatbot that answers using internal PDFs.

Solution:

Azure AI Search
Embeddings
Vector search
Hybrid search
Azure OpenAI

Scenario 2

You need better ranking for natural language queries.

Solution:

Semantic search
Semantic ranking

Scenario 3

You need concept-based retrieval rather than keyword matching.

Solution:

Vector search

Scenario 4

You need maximum retrieval accuracy.

Solution:

Hybrid search

Important AI-103 Exam Tips

Know These Core Concepts

Concept	Key Purpose
Embeddings	Vector representation
Vector search	Semantic retrieval
Semantic ranking	Better result ordering
Hybrid search	Combined retrieval
Grounding	Providing trusted context
Chunking	Breaking documents into manageable pieces

Frequently Tested Knowledge Areas

Expect questions involving:

RAG architectures
Embedding generation
Vector-enabled indexes
Hybrid retrieval
Semantic ranking
Grounding pipelines
Azure AI Search configuration
Chunking strategies

Final Thoughts

Semantic search, vector search, and hybrid search are foundational technologies for modern AI systems on Azure.

For AI-103, focus heavily on:

How embeddings work
When to use vector search
Why hybrid search is recommended
How semantic ranking improves results
How grounding reduces hallucinations
How Azure AI Search integrates with Azure OpenAI

These concepts are central to enterprise AI agents, copilots, and generative AI applications.

Practice Exam Questions

Question 1

What is the primary purpose of grounding in a generative AI solution?

A. Reduce storage costs
B. Train foundation models
C. Provide trusted external context to the LLM
D. Encrypt embeddings

Answer

C. Provide trusted external context to the LLM

Question 2

Which Azure service commonly provides vector search capabilities?

A. Azure Monitor
B. Azure AI Search
C. Azure Virtual Machines
D. Azure Backup

Answer

B. Azure AI Search

Question 3

What are embeddings used for in vector search?

A. Encryption
B. Data compression
C. Numerical semantic representations
D. OCR processing

Answer

C. Numerical semantic representations

Question 4

Which search type is best at retrieving semantically similar concepts even when keywords differ?

A. Boolean search
B. Lexical search
C. Metadata search
D. Vector search

Answer

D. Vector search

Question 5

What does hybrid search combine?

A. OCR and translation
B. Keyword and vector search
C. SQL and NoSQL databases
D. Blob storage and Cosmos DB

Answer

B. Keyword and vector search

Question 6

What is the role of semantic ranking in Azure AI Search?

A. Improve relevance ordering of results
B. Encrypt search indexes
C. Generate embeddings
D. Compress vectors

Answer

A. Improve relevance ordering of results

Question 7

Which process converts text into numerical vectors?

A. OCR
B. Tokenization
C. Embedding generation
D. Semantic ranking

Answer

C. Embedding generation

Question 8

Why is chunking important in grounding pipelines?

A. It removes duplicate users
B. It reduces RBAC complexity
C. It improves retrieval relevance and token management
D. It encrypts documents

Answer

C. It improves retrieval relevance and token management

Question 9

Which search approach generally provides the best retrieval quality for enterprise RAG applications?

A. Keyword search only
B. Vector search only
C. SQL full-text search
D. Hybrid search

Answer

D. Hybrid search

Question 10

Which statement best describes semantic search?

A. It only retrieves exact keyword matches
B. It uses language understanding to improve relevance
C. It replaces embeddings entirely
D. It only works on structured databases

Answer

B. It uses language understanding to improve relevance

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Microsoft Certification May 25, 2026

Implement enrichment by using custom or built-in skills for text, images, and layout (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Build retrieval and grounding pipelines
      --> Implement enrichment by using custom or built-in skills for text, images, and layout

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, one of the key objectives within Build retrieval and grounding pipelines is understanding how to enrich content during ingestion and indexing.

AI enrichment is critical for modern:

Retrieval-Augmented Generation (RAG) systems
Enterprise search solutions
AI agents
Knowledge mining applications
Intelligent document processing systems

Azure AI solutions often ingest raw content such as:

PDFs
Images
Scanned forms
Emails
Audio transcripts
Web pages
Office documents

However, raw content alone is often not enough.

AI enrichment adds:

Meaning
Metadata
Structure
Searchability
Semantic understanding

This enrichment process enables AI systems to retrieve more accurate and contextually relevant information.

What Is AI Enrichment?

AI enrichment is the process of enhancing raw content with AI-generated insights before indexing it into a search system.

Enrichment can:

Extract text
Detect entities
Identify key phrases
Analyze sentiment
Detect language
Recognize objects in images
Understand document layout
Generate metadata

These enrichments improve:

Search relevance
Semantic retrieval
Grounding quality
AI agent accuracy

Core Azure Services Used

Several Azure services commonly appear in enrichment pipelines.

Service	Purpose
Azure AI Search	Indexing and enrichment orchestration
Azure AI Document Intelligence	Layout extraction and document analysis
Azure AI Vision	OCR and image analysis
Azure AI Language	Text analysis and NLP
Azure OpenAI Service	Embeddings and generative AI
Azure Blob Storage	Source content storage
Azure Functions	Custom enrichment logic

Understanding Skillsets

What Is a Skillset?

In Azure AI Search, a skillset is a collection of enrichment steps that process content during indexing.

A skillset may:

Extract text
Analyze images
Detect entities
Generate embeddings
Enrich metadata

Think of a skillset as an AI pipeline.

Skillset Workflow

Typical enrichment pipeline:

			
Raw Content
     ↓
Indexer
     ↓
Skillset
     ↓
Enriched Content
     ↓
Search Index

		

Built-In Skills

Azure AI Search includes many prebuilt cognitive skills.

These skills require minimal custom development.

Built-in skills are commonly tested on AI-103.

Categories of Built-In Skills

Category	Examples
Text Skills	Entity extraction, sentiment
Vision Skills	OCR, image tagging
Layout Skills	Document structure extraction
Utility Skills	Shaping and merging data

Text Enrichment Skills

Text enrichment skills analyze textual content.

Common use cases:

Knowledge mining
Semantic search
RAG pipelines
AI assistants

Language Detection Skill

Purpose

Detects the language of text.

Example:

			
Input:
"Bonjour tout le monde"
Output:
French

Use cases:

Multilingual indexing
Translation pipelines
Language-specific routing

Entity Recognition Skill

Purpose

Extracts named entities such as:

People
Organizations
Locations
Dates

Example:

			
Input:
"Microsoft opened a new office in London."
Output:
- Microsoft (Organization)
- London (Location)

		

This enrichment improves:

Search filters
Metadata tagging
Semantic retrieval

Key Phrase Extraction Skill

Purpose

Extracts important phrases from content.

Example:

			
Document:
"This policy describes annual cybersecurity compliance procedures."
Extracted phrases:
- cybersecurity compliance
- annual procedures

		

Useful for:

Search optimization
Summaries
Topic identification

Sentiment Analysis Skill

Purpose

Determines emotional tone.

Possible outputs:

Positive
Neutral
Negative

Common use cases:

Customer feedback analysis
Support ticket analysis
Call center insights

Text Translation Skill

Purpose

Translates content into another language.

Example:

Spanish → English

Useful in:

Global enterprise systems
Multilingual search
Cross-language retrieval

Image Enrichment Skills

Image enrichment is critical for scanned documents and multimedia content.

Images often contain:

Text
Objects
Logos
Handwriting
Charts
Diagrams

OCR Skill

What Is OCR?

OCR (Optical Character Recognition) extracts text from images.

Common AI-103 scenario:

Make scanned PDFs searchable.

OCR enables indexing of:

Scanned forms
Photos
Screenshots
Whiteboards
Image-based PDFs

OCR Workflow

			
Scanned PDF
      ↓
OCR Skill
      ↓
Extracted Text
      ↓
Search Index

		

Image Analysis Skill

Purpose

Analyzes visual content.

Can detect:

Objects
Captions
Categories
Tags
Landmarks
Brands

Example:

			
Image:
Beach sunset
Detected:
- beach
- sunset
- ocean

		

These tags become searchable metadata.

Layout Enrichment

Layout enrichment is increasingly important in enterprise AI systems.

Many documents contain:

Tables
Headers
Footers
Sections
Forms
Multi-column layouts

Simple text extraction may lose this structure.

Azure AI Document Intelligence

Azure AI Document Intelligence helps preserve:

Document structure
Layout relationships
Tables
Form fields

This is essential for:

Financial documents
Invoices
Contracts
Healthcare forms
Reports

Layout Extraction Example

Example document structure:

			
Invoice
 ├── Vendor Name
 ├── Invoice Number
 ├── Table of Items
 └── Total Amount

		

Layout-aware enrichment preserves relationships between fields.

Table Extraction

A major advantage of layout analysis is table extraction.

Without layout enrichment:

Rows and columns may become scrambled text.

With layout enrichment:

Rows remain structured
Columns are preserved
Relationships remain intact

This significantly improves retrieval quality.

Custom Skills

What Are Custom Skills?

Built-in skills do not cover every business scenario.

Custom skills allow developers to add:

Proprietary logic
Specialized AI models
External APIs
Custom transformations

Custom skills are commonly implemented using:

Azure Functions
Web APIs
Containerized services

Common Custom Skill Scenarios

Examples:

Industry-specific entity extraction
Internal taxonomy classification
Medical terminology analysis
Product categorization
Compliance scoring
Fraud detection enrichment

Custom Skill Workflow

			
Indexer
   ↓
Custom Skill API
   ↓
Enriched Metadata
   ↓
Search Index

		

When to Use Built-In vs Custom Skills

Built-In Skills	Custom Skills
Quick setup	Flexible
Microsoft-managed	Developer-managed
Common scenarios	Specialized scenarios
Minimal coding	Requires development

Knowledge Stores

Enriched data can also be projected into a knowledge store.

A knowledge store supports:

Analytics
Visualization
Reporting
Downstream processing

Outputs may include:

Tables
JSON objects
Enriched documents

Enrichment and RAG

Enrichment dramatically improves Retrieval-Augmented Generation systems.

Benefits include:

Better retrieval relevance
Improved grounding
Richer metadata
Enhanced semantic understanding

Example:

			
Raw document:
"Contoso released Project Falcon."
Enriched:
- Organization: Contoso
- Project: Falcon
- Release event detected

		

This creates more intelligent retrieval behavior.

Embeddings and Enrichment

Modern pipelines often combine enrichment with:

Chunking
Embedding generation
Vector indexing

Workflow:

			
Document
   ↓
OCR / Layout Extraction
   ↓
Entity Extraction
   ↓
Chunking
   ↓
Embeddings
   ↓
Vector Index

		

Performance Considerations

AI enrichment can increase:

Processing time
Compute cost
Indexing complexity

Optimization strategies:

Select only needed skills
Use incremental indexing
Limit enrichment scope
Cache reusable outputs

Security Considerations

Enrichment pipelines should support:

RBAC
Managed identities
Secure storage access
Data encryption
Compliance requirements

Important exam concept:

Enriched content may contain sensitive information.

Common AI-103 Scenarios

Scenario 1

You need searchable scanned documents.

Solution:

OCR Skill
Azure AI Search

Scenario 2

You need to preserve invoice tables.

Solution:

Azure AI Document Intelligence
Layout extraction

Scenario 3

You need industry-specific classification.

Solution:

Custom skill

Scenario 4

You need multilingual search.

Solution:

Language detection
Translation skill

Important AI-103 Exam Tips

Know These Key Concepts

Concept	Purpose
Skillset	AI enrichment pipeline
OCR	Extract text from images
Entity Recognition	Detect named entities
Layout Extraction	Preserve document structure
Custom Skill	Specialized enrichment logic
Knowledge Store	Store enriched outputs

Frequently Tested Areas

Expect questions involving:

Skillsets
OCR workflows
Layout-aware extraction
Custom enrichment APIs
Built-in cognitive skills
AI enrichment pipelines
Azure AI Search integration
Document Intelligence usage

Final Thoughts

AI enrichment is a foundational capability in modern Azure AI architectures.

For AI-103, focus heavily on:

Skillsets
Built-in cognitive skills
OCR pipelines
Layout extraction
Document Intelligence
Custom skills
Metadata enrichment
Search optimization

These concepts are essential for building high-quality enterprise AI systems, retrieval pipelines, and grounded AI applications.

Practice Exam Questions

Question 1

What is the primary purpose of a skillset in Azure AI Search?

A. Store vector embeddings
B. Manage RBAC permissions
C. Apply AI enrichment during indexing
D. Train foundation models

Answer

C. Apply AI enrichment during indexing

Question 2

Which built-in skill extracts text from images?

A. Entity Recognition Skill
B. OCR Skill
C. Sentiment Skill
D. Translation Skill

Answer

B. OCR Skill

Question 3

Which Azure service is commonly used for layout-aware document extraction?

A. Azure Monitor
B. Azure Backup
C. Azure Virtual Network
D. Azure AI Document Intelligence

Answer

D. Azure AI Document Intelligence

Question 4

What is a common use case for custom skills?

A. Hosting virtual machines
B. Industry-specific enrichment logic
C. Managing Azure subscriptions
D. Database replication

Answer

B. Industry-specific enrichment logic

Question 5

Which skill identifies people, organizations, and locations in text?

A. OCR Skill
B. Image Analysis Skill
C. Entity Recognition Skill
D. Translation Skill

Answer

C. Entity Recognition Skill

Question 6

Why is layout extraction important?

A. It preserves document structure and relationships
B. It encrypts documents
C. It reduces storage size
D. It removes duplicate records

Answer

A. It preserves document structure and relationships

Question 7

Which Azure service commonly hosts custom enrichment APIs?

A. Azure Functions
B. Azure Firewall
C. Azure Kubernetes Service only
D. Azure Monitor

Answer

A. Azure Functions

Question 8

What is the purpose of key phrase extraction?

A. Compress documents
B. Identify important concepts in content
C. Encrypt text
D. Generate embeddings

Answer

B. Identify important concepts in content

Question 9

Which enrichment capability is most useful for scanned PDF documents?

A. Semantic ranking
B. Vector similarity
C. OCR
D. Metadata filtering

Answer

C. OCR

Question 10

What is a knowledge store used for in Azure AI Search?

A. Hosting foundation models
B. Storing enriched outputs for downstream use
C. Managing virtual networks
D. Encrypting embeddings

Answer

B. Storing enriched outputs for downstream use

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Azure AI, Microsoft Certification May 25, 2026

Configure RAG ingestion flow, including documents and using OCR (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Build retrieval and grounding pipelines
      --> Configure RAG ingestion flow, including documents and using OCR

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, one of the critical topics within Build retrieval and grounding pipelines is understanding how to configure a Retrieval-Augmented Generation (RAG) ingestion flow.

Modern AI applications and agents depend heavily on RAG architectures to:

Retrieve enterprise data
Ground AI responses
Reduce hallucinations
Provide current and trusted information

A major part of this process involves:

Ingesting documents
Extracting content
Applying OCR
Enriching data
Creating searchable indexes
Supporting semantic and vector retrieval

Understanding how these components work together is essential for the AI-103 exam.

What Is Retrieval-Augmented Generation (RAG)?

RAG combines:

Information retrieval
External knowledge sources
Large Language Models (LLMs)

Instead of relying solely on model training data, a RAG system retrieves relevant enterprise content during inference.

Why RAG Matters

Without RAG:

AI models may hallucinate
Responses may be outdated
Enterprise knowledge is inaccessible
Answers may lack grounding

With RAG:

Responses are grounded in real documents
AI can use private organizational data
Retrieval improves factual accuracy
Answers become more trustworthy

High-Level RAG Architecture

A common RAG architecture looks like this:

			
Enterprise Documents
        ↓
Ingestion Pipeline
        ↓
OCR / Enrichment
        ↓
Chunking
        ↓
Embeddings Generation
        ↓
Vector Index
        ↓
Retrieval
        ↓
LLM Prompt
        ↓
Grounded Response

		

This workflow appears frequently in AI-103 scenarios.

Core Azure Services Used

Several Azure services commonly appear in RAG ingestion architectures.

Service	Purpose
Azure AI Search	Indexing, retrieval, vector search
Azure OpenAI Service	Embeddings and generative AI
Azure AI Vision	OCR and image analysis
Azure AI Document Intelligence	Layout extraction and document processing
Azure Blob Storage	Document storage
Azure Functions	Workflow automation and custom processing
Azure AI Foundry	AI orchestration and agent workflows

Understanding the RAG Ingestion Flow

The ingestion flow prepares enterprise data for retrieval and grounding.

Core stages include:

Document ingestion
Content extraction
OCR processing
AI enrichment
Chunking
Embedding generation
Indexing

Step 1: Document Ingestion

What Is Document Ingestion?

Document ingestion imports content into the retrieval pipeline.

Common sources:

PDFs
Word documents
PowerPoint files
HTML pages
Scanned images
Emails
Knowledge base articles
SharePoint repositories

Common Storage Locations

Many Azure architectures store documents in:

Azure Blob Storage
Azure Data Lake Storage
SharePoint
SQL databases

Blob Storage is especially common in AI-103 examples.

Step 2: Extracting Content

Documents may contain:

Plain text
Tables
Images
Scanned pages
Handwriting
Multi-column layouts

The extraction process converts raw files into machine-readable content.

Structured vs Unstructured Documents

Structured	Unstructured
Databases	PDFs
CSV files	Emails
Tables	Scanned forms
JSON	Images

RAG pipelines often focus on unstructured data.

Step 3: OCR Processing

What Is OCR?

OCR stands for Optical Character Recognition.

OCR extracts text from:

Scanned PDFs
Photos
Screenshots
Whiteboards
Forms
Image-based documents

This is one of the most heavily tested concepts in AI-103 information extraction topics.

Why OCR Is Important in RAG

Many enterprise documents are scanned images rather than machine-readable text.

Without OCR:

The content cannot be searched
Embeddings cannot be generated
Retrieval becomes impossible

OCR converts images into searchable text.

OCR Workflow

			
Scanned PDF
      ↓
OCR Processing
      ↓
Extracted Text
      ↓
Chunking
      ↓
Embeddings
      ↓
Search Index

		

Azure AI Vision OCR

Azure AI Vision provides OCR capabilities that can:

Detect printed text
Detect handwritten text
Support multiple languages
Extract text coordinates

Common outputs:

Lines
Words
Bounding boxes
Confidence scores

OCR in Azure AI Search Skillsets

OCR is commonly integrated directly into:

Azure AI Search indexers
Skillsets

Typical flow:

			
Blob Storage
     ↓
Indexer
     ↓
OCR Skill
     ↓
Search Index

		

Step 4: AI Enrichment

After OCR or extraction, AI enrichment improves the content.

Common enrichment steps:

Language detection
Entity recognition
Key phrase extraction
Sentiment analysis
Image tagging
Translation

These enrichments improve:

Retrieval quality
Metadata
Semantic search
Grounding accuracy

Skillsets in Azure AI Search

A skillset is a pipeline of AI enrichment operations.

Example:

			
OCR Skill
   ↓
Entity Recognition
   ↓
Key Phrase Extraction
   ↓
Embeddings Generation

		

Skillsets are a core AI-103 topic.

Step 5: Chunking Documents

Why Chunking Is Necessary

Large documents exceed LLM token limits.

Chunking divides documents into smaller pieces.

Benefits:

Better retrieval precision
Improved embedding quality
More accurate grounding
Reduced token usage

Chunking Strategies

Fixed-Size Chunking

Example:

500-token chunks

Semantic Chunking

Split by:

Sections
Headings
Paragraphs

Overlapping Chunks

Preserves context across chunks.

Example:

			
Chunk 1: Tokens 1–500
Chunk 2: Tokens 450–950

Step 6: Generate Embeddings

What Are Embeddings?

Embeddings are numerical vector representations of content.

Embeddings enable:

Semantic search
Vector search
Similarity matching

Generated using:

Azure OpenAI Service
Azure AI Foundry models

Embedding Workflow

			
Document Chunk
      ↓
Embedding Model
      ↓
Vector Embedding

		

The vectors are stored in a vector-enabled index.

Step 7: Indexing Content

Azure AI Search Indexes

Indexes store:

Document content
Metadata
Embeddings
Enrichment outputs

Example fields:

Field	Purpose
id	Unique identifier
content	Extracted text
title	Document title
contentVector	Embedding vector
language	Metadata

Vector Indexing

Vector indexes support:

Semantic similarity retrieval
Nearest-neighbor search
Hybrid search

Important exam concept:

Vector search is foundational to RAG retrieval.

Hybrid Search

What Is Hybrid Search?

Hybrid search combines:

Keyword search
Semantic ranking
Vector search

Benefits:

Better relevance
Higher recall
Improved grounding

Hybrid search is strongly recommended for enterprise AI applications.

Retrieval Stage

When a user submits a question:

Query embedding is generated
Search retrieves relevant chunks
Retrieved chunks are inserted into the prompt
LLM generates grounded response

Example RAG Query Flow

			
User Question
      ↓
Embedding Generation
      ↓
Vector + Hybrid Search
      ↓
Relevant Chunks Retrieved
      ↓
Prompt Construction
      ↓
Grounded AI Response

		

Document Intelligence and Layout Extraction

Many documents contain:

Tables
Forms
Multi-column layouts
Headers and footers

Simple OCR may lose structure.

Azure AI Document Intelligence preserves layout relationships.

Layout-Aware Retrieval

Example:

			
Invoice
 ├── Vendor
 ├── Invoice Number
 ├── Table of Charges
 └── Total

		

Layout extraction preserves:

Table rows
Field relationships
Reading order

This improves:

Search quality
Grounding accuracy
Structured retrieval

Security Considerations

Enterprise RAG systems often require:

RBAC
Managed identities
Private endpoints
Data encryption
Access-controlled retrieval

Important exam point:

Retrieval systems should return only authorized content.

Performance Optimization

Common optimization techniques:

Incremental indexing
Hybrid search
Proper chunk sizing
Metadata filtering
Caching embeddings
Selective OCR processing

Common AI-103 Scenarios

Scenario 1

You need searchable scanned PDFs.

Solution:

OCR Skill
Azure AI Search
Blob Storage

Scenario 2

You need semantic retrieval for an AI chatbot.

Solution:

Embeddings
Vector search
Hybrid search

Scenario 3

You need invoice field extraction.

Solution:

Azure AI Document Intelligence
Layout extraction

Scenario 4

You need enterprise grounding with internal documents.

Solution:

RAG architecture
Azure AI Search
Azure OpenAI

Important AI-103 Exam Tips

Know These Key Concepts

Concept	Purpose
OCR	Extract text from images
Skillset	AI enrichment pipeline
Chunking	Split documents for retrieval
Embeddings	Vector representations
Vector search	Semantic retrieval
Hybrid search	Combined retrieval approach
Grounding	Provide trusted context to LLM

Frequently Tested Knowledge Areas

Expect questions involving:

OCR pipelines
RAG architectures
Azure AI Search indexers
Skillsets
Embedding generation
Chunking strategies
Hybrid search
Layout-aware extraction
Document Intelligence integration

Final Thoughts

Configuring RAG ingestion flows is one of the most important modern Azure AI skills.

For AI-103, focus heavily on:

OCR workflows
Document ingestion
AI enrichment
Chunking
Embeddings
Vector indexing
Hybrid retrieval
Grounding pipelines

These concepts are foundational to enterprise AI agents, copilots, and intelligent search applications.

Practice Exam Questions

Question 1

What is the primary purpose of OCR in a RAG ingestion pipeline?

A. Encrypt documents
B. Generate embeddings directly
C. Compress PDF files
D. Convert images and scanned documents into searchable text

Answer

D. Convert images and scanned documents into searchable text

Question 2

Which Azure service commonly provides OCR capabilities?

A. Azure Backup
B. Azure AI Vision
C. Azure DNS
D. Azure Firewall

Answer

B. Azure AI Vision

Question 3

What is the purpose of chunking documents in a RAG pipeline?

A. Reduce network latency only
B. Encrypt sensitive data
C. Improve retrieval and fit token limits
D. Remove metadata

Answer

C. Improve retrieval and fit token limits

Question 4

Which Azure service commonly stores searchable vector indexes?

A. Azure AI Search
B. Azure Virtual Machines
C. Azure Monitor
D. Azure Policy

Answer

A. Azure AI Search

Question 5

What is the role of embeddings in a RAG system?

A. Compress images
B. Store RBAC permissions
C. Represent content as numerical vectors for similarity search
D. Replace OCR processing

Answer

C. Represent content as numerical vectors for similarity search

Question 6

Which component commonly orchestrates AI enrichment during indexing?

A. Load balancer
B. Skillset
C. Resource group
D. Network security group

Answer

B. Skillset

Question 7

Why is hybrid search commonly recommended in enterprise RAG systems?

A. It reduces storage costs only
B. It replaces OCR processing
C. It eliminates embeddings entirely
D. It combines multiple retrieval techniques for better relevance

Answer

D. It combines multiple retrieval techniques for better relevance

Question 8

Which Azure service is best for preserving document layout and table structures?

A. Azure AI Document Intelligence
B. Azure Monitor
C. Azure Kubernetes Service
D. Azure Logic Apps

Answer

A. Azure AI Document Intelligence

Question 9

What is grounding in a generative AI solution?

A. Deleting unused indexes
B. Training foundation models from scratch
C. Providing trusted external context to the LLM
D. Compressing vector databases

Answer

C. Providing trusted external context to the LLM

Question 10

Which statement best describes a RAG architecture?

A. It relies only on model training data
B. It combines retrieval systems with generative AI models
C. It eliminates the need for search indexes
D. It only works with structured databases

Answer

B. It combines retrieval systems with generative AI models

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Microsoft Certification May 25, 2026

Connect retrieval pipelines directly to workflows and agent tools (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement information extraction solutions (10–15%)
   --> Build retrieval and grounding pipelines
      --> Connect retrieval pipelines directly to workflows and agent tools

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

For the AI-103: Develop AI Apps and Agents on Azure certification exam, an important topic within Build retrieval and grounding pipelines is understanding how retrieval systems integrate directly with:

AI workflows
AI agents
Tools and plugins
Business processes
Enterprise automation systems

Modern AI applications no longer operate as isolated chatbots. Instead, they function as intelligent agents capable of:

Retrieving enterprise knowledge
Using external tools
Executing workflows
Calling APIs
Automating business operations
Making context-aware decisions

This topic focuses on how Retrieval-Augmented Generation (RAG) pipelines connect to these broader AI systems.

Why Retrieval Pipelines Matter in AI Agents

Large Language Models (LLMs) alone have limitations:

No inherent access to enterprise data
Static training knowledge
Potential hallucinations
No direct business system integration

Retrieval pipelines solve the knowledge problem by providing grounded enterprise data.

Agent tools and workflows solve the action problem by enabling AI systems to:

Retrieve information
Take actions
Automate processes
Interact with external systems

Together, retrieval + tools form the foundation of modern AI agents.

What Is a Retrieval Pipeline?

A retrieval pipeline:

Accepts a user query
Searches enterprise data
Retrieves relevant content
Supplies grounded context to the model

Typical pipeline stages:

			
User Query
    ↓
Embedding Generation
    ↓
Vector / Hybrid Search
    ↓
Relevant Document Chunks
    ↓
Prompt Construction
    ↓
LLM Response

		

What Are Agent Tools?

Agent tools are capabilities that AI agents can invoke dynamically.

Examples:

Search indexes
Databases
APIs
CRM systems
Ticketing systems
Email services
Scheduling systems
ERP platforms

Instead of only answering questions, the agent can:

Retrieve data
Execute operations
Update records
Trigger workflows

Azure Services Commonly Used

Several Azure services commonly appear in these architectures.

Service	Purpose
Azure AI Search	Retrieval and vector search
Azure OpenAI Service	LLMs and embeddings
Azure AI Foundry	Agent orchestration and tool integration
Azure Functions	Tool endpoints and automation
Azure Logic Apps	Workflow orchestration
Azure API Management	Secure API exposure
Azure Blob Storage	Source document storage

Retrieval-Augmented Generation (RAG)

What Is RAG?

RAG combines:

Retrieval systems
External knowledge
Generative AI

Workflow:

			
Question
   ↓
Retrieve Relevant Content
   ↓
Ground the Prompt
   ↓
Generate Response

		

This improves:

Accuracy
Freshness
Enterprise knowledge access
Hallucination reduction

Connecting Retrieval to Agent Workflows

Modern agents often follow this sequence:

			
User Request
     ↓
Agent Planning
     ↓
Tool Selection
     ↓
Retrieval Pipeline
     ↓
Context Gathering
     ↓
Workflow Execution
     ↓
Grounded Response

		

The retrieval system becomes one tool among many available to the agent.

Example Enterprise Agent Scenario

User asks:

"What is the status of customer ticket 4821?"

Agent workflow:

Retrieve ticket documentation
Query ticketing API
Retrieve knowledge articles
Generate grounded response
Offer next actions

This combines:

Retrieval
API tools
Workflow logic
Grounded AI generation

Agent Tool Invocation

What Is Tool Invocation?

Tool invocation allows an LLM or agent to call external functionality.

Examples:

Database query
REST API call
Search query
Workflow trigger

The model determines:

Which tool to use
When to use it
What parameters to send

Retrieval as a Tool

In modern architectures, retrieval itself is often exposed as a callable tool.

Example:

search_company_policies(query)

The agent can dynamically retrieve relevant information during conversations.

Function Calling and Tools

Many Azure AI architectures use:

Function calling
Tool calling
API orchestration

The LLM generates structured requests that invoke external systems.

Example:

			
{
  "tool": "search_documents",
  "query": "vacation policy"
}

Azure AI Search in Agent Architectures

Azure AI Search commonly serves as:

The enterprise retrieval layer
A vector search engine
A semantic search platform
A grounding source

The agent retrieves:

Relevant chunks
Metadata
Semantic matches
Knowledge articles

Hybrid Retrieval for Agents

Why Hybrid Search Matters

Hybrid search combines:

Keyword search
Semantic search
Vector search

Benefits:

Better retrieval quality
Improved grounding
Higher accuracy

Hybrid retrieval is especially important for agents because:

User requests vary widely
Natural language can be ambiguous
Exact keywords are not always present

Workflow Automation

Retrieval pipelines often connect directly to workflow systems.

Examples:

Ticket escalation
HR approvals
Inventory updates
Order processing
Document routing

Azure Logic Apps Integration

Azure Logic Apps enables:

Low-code orchestration
API integrations
Business process automation

Example workflow:

			
User Request
    ↓
Retrieve Policy
    ↓
Validate Eligibility
    ↓
Submit Approval Workflow
    ↓
Notify User

		

Azure Functions as Agent Tools

Azure Functions commonly provides:

Lightweight APIs
Custom tool endpoints
Retrieval wrappers
Data transformation services

Example:

			
Agent
   ↓
Azure Function
   ↓
Search Index Query
   ↓
Grounded Results

		

Multi-Step Agent Reasoning

Modern agents may perform:

Retrieval
Analysis
Tool invocation
Validation
Workflow execution
Final response generation

This is sometimes called:

Agent orchestration
Agentic workflows
Multi-step reasoning

Retrieval and Memory

Agents often maintain:

Conversation memory
Session context
Long-term retrieval memory

Retrieval systems may supplement memory with:

Enterprise knowledge
Historical records
Prior interactions

Metadata Filtering in Agent Retrieval

Metadata filtering improves retrieval precision.

Examples:

			
department = Finance
region = US
classification = Internal

This supports:

Security trimming
Contextual retrieval
Personalized responses

Security Considerations

Enterprise retrieval workflows require:

RBAC
Managed identities
API authentication
Secure connectors
Document-level permissions

Important AI-103 concept:

Agents should retrieve only authorized content.

Prompt Grounding

Retrieved content is inserted into prompts before inference.

Example:

			
System Prompt:
Use only the provided company policy documents when answering.

Grounded prompts improve:

Accuracy
Trustworthiness
Compliance

Agent Planning

Advanced agents may:

Decide whether retrieval is necessary
Select the best tool
Choose retrieval strategy
Determine workflow actions

Example:

			
Question:
"What is our PTO policy?"
Agent decision:
1. Use retrieval tool
2. Search HR documents
3. Generate grounded answer

		

Retrieval Pipelines and Multimodal Systems

Retrieval systems increasingly support:

Text
Images
Audio
Video

Examples:

OCR extraction
Image captions
Speech transcripts
Video metadata

These enrichments improve agent grounding.

Real-World Enterprise Use Cases

Customer Support Agents

Retrieve knowledge articles
Update tickets
Escalate issues

HR Agents

Retrieve policies
Trigger onboarding workflows
Validate eligibility rules

Finance Agents

Retrieve invoices
Query ERP systems
Initiate approvals

IT Support Agents

Retrieve troubleshooting documents
Reset passwords
Open incidents

Common AI-103 Scenarios

Scenario 1

You need an AI agent that answers questions using internal documents.

Solution:

Azure AI Search
Vector search
RAG grounding

Scenario 2

You need the agent to retrieve data and trigger workflows.

Solution:

Retrieval pipeline
Azure Logic Apps
Azure Functions

Scenario 3

You need secure enterprise retrieval.

Solution:

RBAC
Metadata filtering
Managed identities

Scenario 4

You need the AI system to call APIs dynamically.

Solution:

Tool calling
Function calling
Agent orchestration

Important AI-103 Exam Tips

Know These Core Concepts

Concept	Purpose
RAG	Retrieval + generation
Grounding	Supplying trusted context
Tool calling	Dynamic external function execution
Agent orchestration	Multi-step reasoning workflows
Hybrid search	Combined retrieval approach
Metadata filtering	Scoped retrieval
Workflow automation	Business process execution

Frequently Tested Areas

Expect questions involving:

RAG architectures
Tool invocation
Azure AI Search integration
Function calling
Workflow orchestration
Agent tool design
Hybrid retrieval
Security trimming
Grounded prompts

Final Thoughts

Connecting retrieval pipelines directly to workflows and agent tools is a foundational concept for modern enterprise AI systems.

For AI-103, focus heavily on:

RAG architectures
Retrieval integration
Agent orchestration
Tool calling
Workflow automation
Hybrid search
Grounding techniques
Secure enterprise retrieval

These concepts are central to intelligent copilots, enterprise AI assistants, and autonomous AI agents built on Azure.

Practice Exam Questions

Question 1

What is the primary purpose of a retrieval pipeline in a RAG system?

A. Train foundation models
B. Retrieve relevant external information for grounding
C. Encrypt enterprise documents
D. Replace embeddings entirely

Answer

B. Retrieve relevant external information for grounding

Question 2

Which Azure service commonly provides enterprise vector and hybrid search capabilities?

A. Azure Firewall
B. Azure AI Search
C. Azure DNS
D. Azure Policy

Answer

B. Azure AI Search

Question 3

What is grounding in an AI agent architecture?

A. Compressing embeddings
B. Restricting token counts
C. Training models on-premises
D. Providing trusted contextual data to the model

Answer

D. Providing trusted contextual data to the model

Question 4

What is tool invocation in an AI agent?

A. Rebuilding search indexes
B. Encrypting prompts
C. Calling external functionality dynamically
D. Reducing vector dimensions

Answer

C. Calling external functionality dynamically

Question 5

Which Azure service is commonly used for workflow orchestration?

A. Azure Logic Apps
B. Azure Firewall
C. Azure Monitor
D. Azure Kubernetes Service

Answer

A. Azure Logic Apps

Question 6

Why is hybrid search commonly recommended for AI agents?

A. It removes the need for embeddings
B. It combines multiple retrieval methods for improved relevance
C. It eliminates OCR requirements
D. It only supports structured data

Answer

B. It combines multiple retrieval methods for improved relevance

Question 7

Which Azure service commonly hosts lightweight APIs and custom agent tools?

A. Azure Backup
B. Azure DevTest Labs
C. Azure ExpressRoute
D. Azure Functions

Answer

D. Azure Functions

Question 8

What is the role of metadata filtering in retrieval pipelines?

A. Reduce storage costs only
B. Improve retrieval precision and security scoping
C. Replace vector search
D. Generate embeddings

Answer

B. Improve retrieval precision and security scoping

Question 9

What is a common responsibility of an AI agent orchestrator?

A. Managing virtual machine scaling
B. Encrypting OCR outputs
C. Coordinating retrieval, reasoning, and tool usage
D. Compressing vector databases

Answer

C. Coordinating retrieval, reasoning, and tool usage

Question 10

Which statement best describes Retrieval-Augmented Generation (RAG)?

A. It uses only model training data
B. It only works with SQL databases
C. It replaces semantic search completely
D. It combines retrieval systems with generative AI models