This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub.
This topic falls under these sections:
Implement generative AI and agentic solutions (30–35%)
--> Build generative applications by using Foundry
--> Implement Retrieval-Augmented Generation (RAG) in an application
Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
Large language models (LLMs) are powerful, but they have limitations.
LLMs may:
- Hallucinate information
- Generate outdated responses
- Lack organization-specific knowledge
- Produce unverifiable answers
Retrieval-Augmented Generation (RAG) addresses these issues by combining:
- Information retrieval
- Vector search
- Enterprise knowledge grounding
- Generative AI
The AI-103: Develop AI Apps and Agents on Azure certification exam tests your understanding of how to implement RAG-based applications.
For the AI-103 exam, you should understand:
- RAG architecture
- Vector search
- Embeddings
- Chunking strategies
- Indexing
- Semantic search
- Grounding techniques
- Prompt augmentation
- Retrieval pipelines
- RAG optimization
- Monitoring and evaluation
- Security considerations
What Is Retrieval-Augmented Generation (RAG)?
RAG is an AI architecture that combines:
- Information retrieval
- Context augmentation
- Generative AI
Instead of relying only on model training data, RAG retrieves relevant information from external sources and injects it into prompts.
Why RAG Matters
RAG improves:
- Accuracy
- Grounding
- Freshness of information
- Enterprise knowledge integration
- Explainability
Common RAG Use Cases
Typical RAG applications include:
- Enterprise chatbots
- Knowledge assistants
- Internal documentation search
- Customer support systems
- Research assistants
- AI copilots
Core Components of a RAG System
A RAG solution typically includes:
- Data sources
- Chunking pipeline
- Embedding model
- Vector database or search index
- Retrieval engine
- Large language model
- Prompt orchestration layer
RAG Workflow Overview
The general workflow is:
- Ingest data
- Split data into chunks
- Generate embeddings
- Store embeddings in an index
- Receive user query
- Convert query to embeddings
- Retrieve relevant chunks
- Add retrieved context to prompt
- Generate grounded response
What Are Embeddings?
Embeddings are numerical vector representations of data.
Embeddings capture:
- Semantic meaning
- Contextual similarity
- Relationships between concepts
Embedding Models
Embedding models convert:
- Text
- Documents
- Queries
Into vectors for similarity comparison.
Vector Similarity Search
Vector search identifies content that is semantically similar.
Unlike keyword search, vector search understands:
- Meaning
- Intent
- Context
What Is Chunking?
Chunking divides documents into smaller sections.
Chunking is essential because:
- Models have token limits
- Smaller chunks improve retrieval precision
- Large documents are difficult to process efficiently
Chunking Strategies
Common chunking methods include:
- Fixed-size chunking
- Sliding window chunking
- Semantic chunking
- Paragraph-based chunking
Fixed-Size Chunking
Documents are split into equal-sized chunks.
Advantages:
- Simple
- Predictable
Disadvantages:
- May break context unexpectedly
Sliding Window Chunking
Chunks overlap partially.
Benefits include:
- Better context preservation
- Improved retrieval continuity
Semantic Chunking
Semantic chunking groups logically related content.
Advantages:
- Better contextual integrity
- Higher retrieval quality
Metadata in RAG Systems
Metadata may include:
- Document title
- Author
- Date
- Category
- Security labels
Metadata improves filtering and retrieval.
Indexing in RAG Systems
Indexes store:
- Embeddings
- Metadata
- Searchable content
Indexes enable efficient retrieval.
Vector Databases and Search Indexes
RAG systems commonly use:
- Azure AI Search
- Vector indexes
- Hybrid search systems
Semantic Search
Semantic search improves relevance using:
- Meaning
- Intent
- Natural language understanding
Hybrid Search
Hybrid search combines:
- Keyword search
- Semantic ranking
- Vector similarity search
This often improves retrieval quality.
Retrieval Pipelines
Retrieval pipelines:
- Process user queries
- Retrieve relevant information
- Rank search results
- Filter irrelevant content
Query Embeddings
User queries are converted into embeddings.
The query vector is compared against stored vectors.
Similarity Metrics
Common similarity calculations include:
- Cosine similarity
- Euclidean distance
- Dot product similarity
Top-K Retrieval
Top-K retrieval returns the most relevant results.
Choosing the right K value is important:
- Too few results may miss context
- Too many results may add noise
Prompt Augmentation
Retrieved content is inserted into prompts.
This process is called:
- Prompt grounding
- Context injection
- Prompt augmentation
Grounded Responses
Grounded responses:
- Reference trusted data
- Reduce hallucinations
- Improve reliability
System Prompts in RAG
System prompts may instruct the model to:
- Use only retrieved sources
- Cite references
- Avoid unsupported claims
Citation Generation
Many RAG applications provide:
- Source references
- Citations
- Linked documents
This improves transparency.
Hallucination Reduction
RAG reduces hallucinations by:
- Providing factual context
- Using enterprise knowledge
- Restricting unsupported generation
RAG Architecture Patterns
Common patterns include:
- Basic RAG
- Hybrid RAG
- Multi-stage retrieval
- Agentic RAG
Basic RAG
Basic RAG:
- Retrieves documents
- Injects them into prompts
- Generates responses
Hybrid RAG
Hybrid RAG combines:
- Vector search
- Keyword search
- Semantic ranking
Multi-Stage Retrieval
Multi-stage retrieval uses:
- Initial retrieval
- Re-ranking
- Filtering
- Secondary refinement
Agentic RAG
Agentic RAG systems may:
- Choose retrieval tools dynamically
- Perform iterative searches
- Validate retrieved data
- Orchestrate workflows
Azure AI Search in RAG
Azure AI Search commonly provides:
- Vector search
- Semantic ranking
- Hybrid search
- Index management
Data Ingestion Pipelines
RAG ingestion pipelines may process:
- PDFs
- Web pages
- Databases
- Office documents
- Structured data
Data Freshness
Organizations should ensure indexes remain current.
Strategies include:
- Scheduled reindexing
- Incremental ingestion
- Event-driven updates
Access Control in RAG
Enterprise RAG systems should enforce:
- Role-based access
- Document-level security
- Identity-aware retrieval
Security Considerations
Organizations should secure:
- Data ingestion pipelines
- Search indexes
- Embedding endpoints
- Model endpoints
Monitoring RAG Systems
Organizations should monitor:
- Retrieval quality
- Grounding quality
- Latency
- Hallucinations
- Search relevance
Evaluating RAG Performance
Key evaluation metrics include:
- Precision
- Recall
- Relevance
- Groundedness
- Citation accuracy
Groundedness Evaluation
Groundedness measures whether responses are supported by retrieved content.
Retrieval Quality Evaluation
Organizations should evaluate:
- Search result relevance
- Ranking effectiveness
- Missing context
Latency Optimization
RAG pipelines can introduce additional latency.
Optimization strategies include:
- Caching
- Smaller embeddings
- Efficient indexing
- Query optimization
Cost Optimization
Cost reduction strategies include:
- Limiting retrieved chunks
- Smaller embedding models
- Efficient indexing
- Intelligent caching
Responsible AI Considerations
Developers should:
- Validate sources
- Prevent data leakage
- Monitor hallucinations
- Enforce safety policies
Common AI-103 RAG Scenarios
Scenario 1: Enterprise Knowledge Chatbot
Requirements:
- Internal document access
- Accurate answers
- Source citations
Recommended Solution:
- RAG with Azure AI Search
Scenario 2: Legal Document Assistant
Requirements:
- High factual accuracy
- Traceability
- Large document support
Recommended Solution:
- Semantic chunking
- Hybrid search
- Citation generation
Scenario 3: Customer Support Copilot
Requirements:
- Fast retrieval
- Grounded answers
- Updated knowledge
Recommended Solution:
- Incremental indexing
- Real-time retrieval
Scenario 4: Agentic AI Workflow
Requirements:
- Dynamic retrieval
- Multi-step reasoning
- Tool orchestration
Recommended Solution:
- Agentic RAG architecture
Common AI-103 Exam Tips
Understand the RAG Workflow
Know all stages:
- Ingestion
- Chunking
- Embeddings
- Indexing
- Retrieval
- Prompt augmentation
- Generation
Learn Embedding Concepts
Understand:
- Semantic vectors
- Similarity search
- Embedding models
Understand Search Types
Know the differences between:
- Keyword search
- Vector search
- Semantic search
- Hybrid search
Understand Grounding
Know how grounding:
- Reduces hallucinations
- Improves factual accuracy
- Supports explainability
Summary
Retrieval-Augmented Generation (RAG) is one of the most important generative AI architectures.
For the AI-103 exam, you should understand:
- RAG architecture
- Embeddings
- Chunking
- Indexing
- Vector search
- Semantic search
- Hybrid search
- Prompt grounding
- Retrieval pipelines
- Groundedness evaluation
- Security considerations
- Monitoring and optimization
RAG enables organizations to build:
- Accurate
- Explainable
- Grounded
- Enterprise-aware AI applications
These concepts are foundational for modern AI systems on Azure.
Practice Exam Questions
Question 1
What is the primary goal of Retrieval-Augmented Generation (RAG)?
A. Reduce storage replication
B. Improve factual grounding using retrieved data
C. Eliminate vector search
D. Replace all language models
Answer
B. Improve factual grounding using retrieved data
Explanation
RAG improves accuracy by injecting retrieved information into prompts.
Question 2
What are embeddings?
A. GPU drivers
B. Numerical vector representations of data
C. Network security policies
D. Storage replication methods
Answer
B. Numerical vector representations of data
Explanation
Embeddings represent semantic meaning as vectors.
Question 3
Why is chunking important in RAG systems?
A. To increase network latency
B. To divide documents into manageable sections
C. To disable semantic search
D. To eliminate embeddings
Answer
B. To divide documents into manageable sections
Explanation
Chunking improves retrieval efficiency and contextual relevance.
Question 4
Which search method understands semantic meaning instead of exact keywords?
A. Static indexing
B. Vector search
C. Archive retrieval
D. Compression balancing
Answer
B. Vector search
Explanation
Vector search retrieves semantically similar content.
Question 5
What does hybrid search combine?
A. GPU clusters and storage accounts
B. Keyword search and vector search
C. Virtual machines and containers
D. Authentication and authorization
Answer
B. Keyword search and vector search
Explanation
Hybrid search combines lexical and semantic retrieval methods.
Question 6
What is prompt augmentation?
A. Increasing storage capacity
B. Adding retrieved context to prompts
C. Compressing vectors
D. Removing metadata
Answer
B. Adding retrieved context to prompts
Explanation
Prompt augmentation injects retrieved content into model prompts.
Question 7
What is groundedness?
A. GPU allocation efficiency
B. Whether responses are supported by retrieved sources
C. Network bandwidth usage
D. Storage replication speed
Answer
B. Whether responses are supported by retrieved sources
Explanation
Groundedness measures factual support from retrieved content.
Question 8
Which Azure service is commonly used for vector and semantic search in RAG systems?
A. Azure AI Search
B. Azure DNS
C. Azure Backup
D. Azure Batch
Answer
A. Azure AI Search
Explanation
Azure AI Search supports vector, semantic, and hybrid search.
Question 9
What is a major advantage of semantic chunking?
A. It eliminates embeddings
B. It preserves contextual meaning better
C. It disables retrieval
D. It reduces authentication requirements
Answer
B. It preserves contextual meaning better
Explanation
Semantic chunking groups logically related content.
Question 10
Which metric evaluates whether retrieved results are relevant?
A. Groundedness
B. Retrieval quality
C. GPU utilization
D. Storage redundancy
Answer
B. Retrieval quality
Explanation
Retrieval quality measures the relevance of retrieved documents.
Go to the AI-103 Exam Prep Hub main page
