This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement generative AI and agentic solutions (30–35%)
   --> Build agents by using Foundry
      --> Build agents that integrate retrieval, function-calling, and conversation memory

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Modern AI agents are far more capable than traditional chatbots.

Today’s enterprise AI agents can:

Retrieve enterprise knowledge
Call APIs and tools
Maintain memory across conversations
Perform multistep workflows
Coordinate reasoning and actions

Azure AI Foundry provides the infrastructure and orchestration capabilities needed to build these advanced agentic systems.

For the AI-103: Develop AI Apps and Agents on Azure certification exam, understanding how to build agents that integrate:

Retrieval
Function-calling
Conversation memory

is extremely important.

These capabilities are foundational to enterprise generative AI systems.

What Is an AI Agent?

An AI agent is an AI-powered system capable of:

Understanding goals
Maintaining context
Using tools
Retrieving information
Performing actions
Adapting to new inputs

Agents extend beyond simple prompt-response interactions.

Core Components of Modern Agents

Modern agents commonly include:

Large language models (LLMs)
Retrieval systems
Tool integrations
Function-calling frameworks
Memory systems
Workflow orchestration
Safety controls

Retrieval in Agent Systems

Retrieval allows agents to:

Access external knowledge
Ground responses in enterprise data
Improve factual accuracy
Reduce hallucinations

Why Retrieval Matters

LLMs are trained on static datasets.

Without retrieval:

Models may lack current information
Enterprise-specific knowledge may be unavailable
Hallucinations become more likely

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) combines:

Search and retrieval systems
LLM reasoning and generation

RAG allows agents to generate responses using retrieved content.

Typical RAG Workflow

A common RAG workflow includes:

User submits a query
Query is converted to embeddings
Search retrieves relevant documents
Documents are added to prompts
LLM generates grounded responses

Knowledge Sources for Retrieval

Agents may retrieve data from:

Azure AI Search
Vector databases
SQL databases
Document repositories
SharePoint
Blob storage
Knowledge bases

Vector Search

Vector search enables semantic retrieval.

Instead of keyword matching only, vector search finds:

Meaning
Similarity
Contextual relationships

Embeddings

Embeddings are numerical vector representations of text or data.

Embeddings help systems:

Measure semantic similarity
Perform vector search
Improve retrieval relevance

Chunking Strategies

Documents are often split into smaller chunks before indexing.

Chunking improves:

Retrieval precision
Context quality
Token efficiency

Retrieval Pipelines

Retrieval pipelines commonly include:

Data ingestion
Chunking
Embedding generation
Indexing
Query retrieval
Reranking

Hybrid Search

Hybrid search combines:

Keyword search
Vector search

This improves search quality.

Grounding Responses

Grounding means generating responses using retrieved evidence.

Grounded systems are:

More accurate
More explainable
More reliable

Citation and Source Attribution

Agents may include:

Source links
Document citations
Retrieved evidence

This improves transparency.

Function-Calling in Agent Systems

Function-calling allows models to invoke:

APIs
Services
Workflows
Databases
External tools

Why Function-Calling Matters

LLMs alone cannot:

Access live systems
Execute actions
Retrieve dynamic business data

Function-calling bridges this gap.

Examples of Functions

Common functions include:

Get weather data
Retrieve customer records
Create support tickets
Query inventory systems
Send emails
Schedule meetings

Tool Schemas

Function-calling relies on structured tool schemas.

Schemas define:

Tool names
Parameters
Data types
Required fields
Expected outputs

Example Function Schema

Example:

Function: GetOrderStatus

Inputs:

OrderID
CustomerID

Outputs:

Shipping status
Estimated delivery date

Structured Tool Invocation

Structured tool invocation improves:

Reliability
Validation
Automation
Error handling

Function Selection Logic

Agents may decide:

Whether tools are needed
Which tools to invoke
When to call functions
How to sequence operations

Multi-Tool Workflows

Advanced agents may orchestrate:

Multiple tools
Sequential workflows
Conditional logic
Parallel execution

Example Multi-Tool Workflow

Example:

Retrieve customer data
Query billing system
Generate summary
Create support ticket
Send notification

Tool Safety Controls

Organizations should control:

Which tools agents can access
Which users may trigger actions
Which workflows require approval

Human-in-the-Loop Approvals

High-risk operations may require:

Human review
Approval checkpoints
Escalation workflows

Conversation Memory

Conversation memory allows agents to:

Maintain context
Track interactions
Remember prior information
Continue workflows

Why Memory Matters

Without memory:

Conversations become disconnected
Users repeat information
Workflow continuity breaks

Types of Memory

Common memory types include:

Short-term memory
Long-term memory
Episodic memory
Semantic memory

Short-Term Memory

Short-term memory stores:

Recent prompts
Recent responses
Current task state

Long-Term Memory

Long-term memory stores:

User preferences
Historical interactions
Persistent context

Stateful vs Stateless Agents

Stateless Agents

Do not retain memory between sessions.

Benefits:

Simpler architecture
Lower storage requirements

Stateful Agents

Maintain context and conversation history.

Benefits:

Better user experiences
Improved multistep reasoning

Context Window Limitations

LLMs have limited context windows.

Applications must manage:

Token usage
Conversation length
Historical context

Memory Management Strategies

Common strategies include:

Rolling conversation windows
Summarized history
Vector memory retrieval
Persistent storage systems

Vector Memory

Conversation history may be stored as embeddings.

This enables:

Semantic memory retrieval
Long-term contextual recall
Personalized interactions

Retrieval-Based Memory

Agents may retrieve:

Prior conversations
Historical workflow data
Previous decisions

Persistent Memory Storage

Persistent memory may use:

Databases
Search indexes
Vector stores
Cloud storage

Agent Orchestration

Orchestration coordinates:

Retrieval systems
Function-calling
Memory systems
Workflow execution

Agent Reasoning Loops

Agents may perform iterative reasoning:

Analyze request
Retrieve information
Call tools
Evaluate outputs
Continue reasoning
Generate response

Workflow State Management

Agents may track:

Active tasks
Tool outputs
Pending actions
Workflow progress

Azure AI Foundry and Agent Development

Azure AI Foundry supports:

Model deployment
Retrieval integration
Agent orchestration
Prompt flows
Evaluation pipelines
Monitoring and governance

Azure AI Search in Agent Systems

Azure AI Search commonly provides:

Vector indexing
Semantic ranking
Hybrid search
Enterprise retrieval

Prompt Engineering for Agents

Effective prompts define:

Agent role
Behavioral expectations
Tool usage rules
Safety constraints

Grounded Prompt Construction

Grounded prompts may include:

Retrieved documents
Citations
Tool outputs
Prior conversation context

Monitoring Agent Systems

Organizations should monitor:

Retrieval relevance
Tool-call accuracy
Memory quality
Latency
Hallucinations
Safety events

Evaluating RAG Systems

RAG systems should be evaluated for:

Retrieval quality
Relevance
Faithfulness
Grounding accuracy
Citation quality

Evaluating Function-Calling

Organizations should validate:

Correct tool selection
Parameter accuracy
Workflow reliability
Error recovery

Evaluating Conversation Memory

Memory systems should be evaluated for:

Context retention
Consistency
Recall accuracy
Session continuity

Security Considerations

Secure agent systems should implement:

Authentication
Authorization
Managed identities
RBAC
Private networking
Audit logging

Responsible AI Considerations

Organizations should apply:

Safety filters
Guardrails
Human oversight
Content moderation
Usage monitoring

Real-World Scenario

Scenario: Enterprise HR Assistant

Requirements:

Retrieve HR policies
Answer employee questions
Access scheduling systems
Remember user preferences
Escalate sensitive requests

Recommended Design:

RAG using Azure AI Search
Function-calling for HR systems
Stateful conversation memory
Approval workflows for sensitive actions
Grounded response generation

Common AI-103 Exam Tips

Understand Retrieval Concepts

Know:

RAG
Embeddings
Vector search
Hybrid search
Grounding

Learn Function-Calling Concepts

Understand:

Tool schemas
Structured invocation
Tool orchestration
Workflow execution

Understand Memory Systems

Know:

Stateful vs stateless agents
Short-term vs long-term memory
Context management
Vector memory

Understand Agent Orchestration

Know how agents combine:

Retrieval
Tool usage
Memory
Reasoning

Summary

Modern enterprise agents combine:

Retrieval systems
Function-calling
Conversation memory
Workflow orchestration

For the AI-103 exam, you should understand:

RAG architectures
Vector search
Embeddings
Grounding
Function-calling
Tool schemas
Tool orchestration
Stateful memory
Context management
Agent reasoning loops
Monitoring and governance

These concepts are foundational to building scalable and intelligent AI agents with Azure AI Foundry.

Practice Exam Questions

Question 1

What is the primary purpose of Retrieval-Augmented Generation (RAG)?

A. Reduce GPU temperatures
B. Combine retrieval systems with LLM generation
C. Eliminate vector search
D. Replace APIs completely

Answer

B. Combine retrieval systems with LLM generation

Explanation

RAG combines retrieval and generation to improve grounded responses.

Question 2

Why are embeddings important in retrieval systems?

A. They increase firewall security
B. They enable semantic similarity comparisons
C. They replace orchestration engines
D. They remove token limits

Answer

B. They enable semantic similarity comparisons

Explanation

Embeddings support semantic vector search.

Question 3

What is a key advantage of hybrid search?

A. It disables semantic ranking
B. It combines keyword and vector search
C. It removes indexing requirements
D. It eliminates embeddings

Answer

B. It combines keyword and vector search

Explanation

Hybrid search improves retrieval quality by combining approaches.

Question 4

What is the purpose of function-calling in agent systems?

A. Reduce network traffic only
B. Allow models to invoke external tools and services
C. Eliminate APIs
D. Disable workflows

Answer

B. Allow models to invoke external tools and services

Explanation

Function-calling enables interaction with external systems.

Question 5

What information is typically included in a tool schema?

A. GPU temperature metrics
B. Parameters, data types, and outputs
C. Only firewall settings
D. Only vector dimensions

Answer

B. Parameters, data types, and outputs

Explanation

Schemas define structured tool interfaces.

Question 6

Why is conversation memory important?

A. It reduces all storage costs
B. It maintains continuity and context across interactions
C. It removes orchestration needs
D. It disables tool invocation

Answer

B. It maintains continuity and context across interactions

Explanation

Memory improves user experiences and multistep workflows.

Question 7

What is a characteristic of stateful agents?

A. They never store context
B. They maintain conversation history and state
C. They disable retrieval systems
D. They remove prompt engineering

Answer

B. They maintain conversation history and state

Explanation

Stateful agents retain memory across interactions.

Question 8

What is a common challenge when using LLM conversation memory?

A. Unlimited context windows
B. Context window limitations and token constraints
C. Elimination of embeddings
D. Removal of grounding

Answer

B. Context window limitations and token constraints

Explanation

LLMs can process only limited amounts of context.

Question 9

Which Azure service is commonly used for enterprise retrieval in RAG architectures?

A. Azure DevOps
B. Azure AI Search
C. Azure Virtual Desktop
D. Azure Batch

Answer

B. Azure AI Search

Explanation

Azure AI Search supports vector and hybrid search for RAG systems.

Question 10

What should organizations monitor in agent systems?

A. Only GPU fan speeds
B. Retrieval quality, tool usage, memory accuracy, and safety
C. Only prompt lengths
D. Only authentication failures

Answer

B. Retrieval quality, tool usage, memory accuracy, and safety

Explanation

Comprehensive monitoring improves reliability, governance, and user trust.

Go to the AI-103 Exam Prep Hub main page