This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Plan and manage an Azure AI solution (25–30%)
   --> Set up AI solutions in Foundry
      --> Configure model and agent deployments

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important responsibilities for Azure AI developers is configuring and managing model and agent deployments.

Modern AI applications depend on properly configured:

Large Language Models (LLMs)
Embedding models
Multimodal models
AI agents
Retrieval systems
Tool integrations
Orchestration workflows

The AI-103: Develop AI Apps and Agents on Azure certification exam tests your ability to configure AI solutions in Azure AI Foundry and related Azure services.

For the AI-103 exam, you should understand:

Azure OpenAI model deployments
Deployment types
Provisioned throughput
Model versioning
Deployment scaling
Agent configuration
Tool and function integration
Retrieval integration
Security configuration
Monitoring and evaluation
Deployment lifecycle management

What Is a Model Deployment?

A model deployment is a configured instance of an AI model that applications can access through APIs.

Deployments allow developers to:

Choose models
Configure capacity
Control scaling
Manage versions
Apply security controls
Monitor usage

A deployment acts as the operational endpoint for AI inference.

Azure AI Foundry

Azure AI Foundry provides tools and services for:

Deploying AI models
Configuring AI agents
Managing workflows
Evaluating AI systems
Monitoring AI applications

It integrates with:

Azure OpenAI
Azure AI Search
Prompt Flow
Azure AI Content Safety
Azure Functions

Types of Models in Azure AI

Common model types include:

Large Language Models (LLMs)
Small Language Models (SLMs)
Embedding models
Multimodal models
Vision models
Speech models

Large Language Models (LLMs)

LLMs are used for:

Chatbots
AI copilots
Summarization
Reasoning
Tool calling
Content generation

Examples include GPT-based models.

Embedding Models

Embedding models convert content into vector representations.

Used for:

Vector search
Semantic retrieval
Similarity matching
RAG systems

Multimodal Models

Multimodal models process multiple input types such as:

Text
Images
Audio
Documents

Used for:

Image analysis
Visual reasoning
OCR workflows
Multimodal agents

Azure OpenAI Deployments

Azure OpenAI deployments expose models through API endpoints.

Deployment configuration includes:

Model selection
Deployment name
Capacity allocation
Version selection
Region selection
Content filtering settings

Deployment Names

Each deployment has a unique deployment name.

Applications use the deployment name when making API requests.

Example:

gpt4-copilot-prod
embeddings-search-dev

Model Versioning

Models evolve over time.

Versioning helps:

Maintain stability
Test upgrades
Support rollback strategies
Compare model behavior

Why Model Versioning Matters

Different versions may:

Behave differently
Produce different outputs
Affect latency
Affect costs
Impact prompt performance

Deployment Types

Azure AI commonly supports:

Standard deployments
Provisioned throughput deployments

Standard Deployments

Standard deployments use shared infrastructure.

Advantages:

Simpler setup
Lower upfront costs
Flexible usage

Limitations:

Shared capacity
Variable latency under heavy load

Provisioned Throughput Deployments

Provisioned throughput reserves dedicated model capacity.

Advantages:

Predictable performance
Consistent latency
Enterprise-grade scaling

Limitations:

Higher cost
Capacity planning required

When to Use Standard Deployments

Use standard deployments when:

Workloads are moderate
Usage is variable
Cost optimization matters
Development/testing environments are used

When to Use Provisioned Throughput

Use provisioned throughput when:

High traffic is expected
Predictable latency is required
Enterprise SLAs exist
Production copilots are deployed

Scaling Model Deployments

AI deployments must support varying workloads.

Autoscaling

Autoscaling adjusts resources dynamically based on demand.

Benefits:

Improved performance
Better cost efficiency
Reduced manual intervention

Horizontal Scaling

Horizontal scaling adds additional instances or capacity.

Useful for:

High concurrency
Enterprise AI systems
Large-scale chatbots

Latency Considerations

Latency refers to response time.

Factors affecting latency:

Model size
Throughput load
Geographic distance
Retrieval pipelines
Tool execution

Choosing the Correct Model

Choosing the correct model is critical.

Use Larger Models When:

Advanced reasoning is required
Complex workflows exist
High-quality generation matters

Use Smaller Models When:

Cost efficiency matters
Low latency is important
Simpler tasks are performed

Agent Deployments

AI agents combine:

Models
Memory
Retrieval
Tool calling
Workflow orchestration

Agent deployment involves configuring all these components together.

Agent Configuration Components

Common agent configuration elements include:

System prompts
Tool definitions
Function calling
Knowledge sources
Retrieval settings
Memory configuration
Safety settings

System Prompts

System prompts define:

Agent behavior
Role instructions
Response style
Operational constraints

Well-designed system prompts improve:

Reliability
Consistency
Safety

Tool and Function Integration

Agents may use tools such as:

APIs
Databases
Search services
External systems

Function calling enables agents to invoke these tools dynamically.

Retrieval Integration

Many AI agents use Retrieval-Augmented Generation (RAG).

RAG systems commonly integrate:

Azure AI Search
Embedding models
Vector search
Knowledge indexes

Knowledge Sources

Agents may connect to:

Enterprise documents
Databases
APIs
SharePoint
Blob Storage
Internal knowledge bases

Memory Configuration

Agents may use:

Short-term memory
Long-term memory
Semantic memory

Common storage systems include:

Azure Cosmos DB
Azure SQL Database
Azure AI Search

Security Configuration

Security is a major AI-103 exam topic.

Microsoft Entra ID

Microsoft Entra ID supports:

Authentication
Authorization
RBAC
Identity management

Azure Key Vault

Azure Key Vault securely stores:

API keys
Secrets
Certificates
Connection strings

Content Safety Configuration

Azure AI Content Safety helps:

Detect harmful content
Filter unsafe outputs
Apply safety policies

Network Security

Enterprise AI deployments may use:

VNets
Private Endpoints
Firewalls
API gateways

Monitoring Deployments

AI deployments require operational monitoring.

Azure Monitor

Azure Monitor provides:

Metrics
Logging
Alerts
Diagnostics

Application Insights

Application Insights supports:

Telemetry
Request tracing
Error diagnostics
Performance monitoring

Metrics to Monitor

Common metrics include:

Latency
Token usage
Error rates
Throughput
Tool call failures
Retrieval quality

Evaluating AI Deployments

AI systems should be evaluated for:

Accuracy
Groundedness
Safety
Relevance
Reliability

Prompt Flow

Prompt Flow supports:

Workflow orchestration
Prompt chaining
Tool integration
Evaluation pipelines

Prompt Flow is an important AI-103 topic.

CI/CD for AI Deployments

AI deployment pipelines should support:

Automated testing
Version control
Safe releases
Rollbacks

Blue-Green Deployments

Blue-green deployments:

Reduce downtime
Support safer releases
Simplify rollback

Canary Deployments

Canary deployments:

Roll out changes gradually
Reduce deployment risk
Support controlled testing

Common AI-103 Deployment Scenarios

Scenario 1: Enterprise AI Copilot

Requirements:

High concurrency
Secure retrieval
Enterprise search
Low latency

Recommended Configuration:

Provisioned throughput
Azure AI Search
Entra ID
Autoscaling

Scenario 2: Development Chatbot

Requirements:

Low cost
Rapid experimentation
Flexible scaling

Recommended Configuration:

Standard deployment
App Service
Basic monitoring

Scenario 3: AI Agent with Tool Calling

Requirements:

API integrations
Workflow execution
Multi-step reasoning

Recommended Configuration:

Azure OpenAI
Azure Functions
Prompt Flow
Tool definitions

Scenario 4: Enterprise Knowledge Assistant

Requirements:

Grounded responses
Semantic retrieval
Document search

Recommended Configuration:

Embedding models
Azure AI Search
Hybrid search
RAG pipelines

Cost Optimization Considerations

AI deployments can become expensive.

Common Cost Drivers

Token usage
Provisioned throughput
Search indexing
Embedding generation
Large models
High concurrency

Cost Optimization Strategies

Use Smaller Models When Possible

Smaller models reduce:

Latency
Compute costs
Token usage

Optimize Retrieval

Efficient retrieval reduces:

Prompt size
Token costs
Latency

Use Autoscaling

Autoscaling prevents overprovisioning.

Common AI-103 Exam Tips

Understand Deployment Types

Know the differences between:

Standard deployments
Provisioned throughput deployments

Learn Agent Configuration Components

Understand:

System prompts
Tool integration
Retrieval settings
Memory configuration

Know Security Best Practices

Use:

Entra ID
RBAC
Key Vault
Private networking

Understand Monitoring Concepts

Know how to monitor:

Latency
Token usage
Throughput
Errors
AI quality

Summary

Configuring model and agent deployments is a critical skill for Azure AI developers.

For the AI-103 exam, you should understand:

Azure OpenAI deployment configuration
Model versioning
Deployment scaling
Agent architecture
Tool integration
Retrieval integration
Memory configuration
Security controls
Monitoring and evaluation
Deployment lifecycle management

Well-configured deployments improve:

Reliability
Performance
Scalability
Security
Cost efficiency
User experience

These concepts are foundational for building enterprise-grade AI applications and agent-based systems on Azure.

Prompt Flow orchestrates prompts, tools, and AI workflows.

Go to the AI-103 Exam Prep Hub main page