This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub.
This topic falls under these sections:
Plan and manage an Azure AI solution (25–30%)
--> Set up AI solutions in Foundry
--> Configure model and agent deployments
Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
One of the most important responsibilities for Azure AI developers is configuring and managing model and agent deployments.
Modern AI applications depend on properly configured:
- Large Language Models (LLMs)
- Embedding models
- Multimodal models
- AI agents
- Retrieval systems
- Tool integrations
- Orchestration workflows
The AI-103: Develop AI Apps and Agents on Azure certification exam tests your ability to configure AI solutions in Azure AI Foundry and related Azure services.
For the AI-103 exam, you should understand:
- Azure OpenAI model deployments
- Deployment types
- Provisioned throughput
- Model versioning
- Deployment scaling
- Agent configuration
- Tool and function integration
- Retrieval integration
- Security configuration
- Monitoring and evaluation
- Deployment lifecycle management
What Is a Model Deployment?
A model deployment is a configured instance of an AI model that applications can access through APIs.
Deployments allow developers to:
- Choose models
- Configure capacity
- Control scaling
- Manage versions
- Apply security controls
- Monitor usage
A deployment acts as the operational endpoint for AI inference.
Azure AI Foundry
Azure AI Foundry provides tools and services for:
- Deploying AI models
- Configuring AI agents
- Managing workflows
- Evaluating AI systems
- Monitoring AI applications
It integrates with:
- Azure OpenAI
- Azure AI Search
- Prompt Flow
- Azure AI Content Safety
- Azure Functions
Types of Models in Azure AI
Common model types include:
- Large Language Models (LLMs)
- Small Language Models (SLMs)
- Embedding models
- Multimodal models
- Vision models
- Speech models
Large Language Models (LLMs)
LLMs are used for:
- Chatbots
- AI copilots
- Summarization
- Reasoning
- Tool calling
- Content generation
Examples include GPT-based models.
Embedding Models
Embedding models convert content into vector representations.
Used for:
- Vector search
- Semantic retrieval
- Similarity matching
- RAG systems
Multimodal Models
Multimodal models process multiple input types such as:
- Text
- Images
- Audio
- Documents
Used for:
- Image analysis
- Visual reasoning
- OCR workflows
- Multimodal agents
Azure OpenAI Deployments
Azure OpenAI deployments expose models through API endpoints.
Deployment configuration includes:
- Model selection
- Deployment name
- Capacity allocation
- Version selection
- Region selection
- Content filtering settings
Deployment Names
Each deployment has a unique deployment name.
Applications use the deployment name when making API requests.
Example:
- gpt4-copilot-prod
- embeddings-search-dev
Model Versioning
Models evolve over time.
Versioning helps:
- Maintain stability
- Test upgrades
- Support rollback strategies
- Compare model behavior
Why Model Versioning Matters
Different versions may:
- Behave differently
- Produce different outputs
- Affect latency
- Affect costs
- Impact prompt performance
Deployment Types
Azure AI commonly supports:
- Standard deployments
- Provisioned throughput deployments
Standard Deployments
Standard deployments use shared infrastructure.
Advantages:
- Simpler setup
- Lower upfront costs
- Flexible usage
Limitations:
- Shared capacity
- Variable latency under heavy load
Provisioned Throughput Deployments
Provisioned throughput reserves dedicated model capacity.
Advantages:
- Predictable performance
- Consistent latency
- Enterprise-grade scaling
Limitations:
- Higher cost
- Capacity planning required
When to Use Standard Deployments
Use standard deployments when:
- Workloads are moderate
- Usage is variable
- Cost optimization matters
- Development/testing environments are used
When to Use Provisioned Throughput
Use provisioned throughput when:
- High traffic is expected
- Predictable latency is required
- Enterprise SLAs exist
- Production copilots are deployed
Scaling Model Deployments
AI deployments must support varying workloads.
Autoscaling
Autoscaling adjusts resources dynamically based on demand.
Benefits:
- Improved performance
- Better cost efficiency
- Reduced manual intervention
Horizontal Scaling
Horizontal scaling adds additional instances or capacity.
Useful for:
- High concurrency
- Enterprise AI systems
- Large-scale chatbots
Latency Considerations
Latency refers to response time.
Factors affecting latency:
- Model size
- Throughput load
- Geographic distance
- Retrieval pipelines
- Tool execution
Choosing the Correct Model
Choosing the correct model is critical.
Use Larger Models When:
- Advanced reasoning is required
- Complex workflows exist
- High-quality generation matters
Use Smaller Models When:
- Cost efficiency matters
- Low latency is important
- Simpler tasks are performed
Agent Deployments
AI agents combine:
- Models
- Memory
- Retrieval
- Tool calling
- Workflow orchestration
Agent deployment involves configuring all these components together.
Agent Configuration Components
Common agent configuration elements include:
- System prompts
- Tool definitions
- Function calling
- Knowledge sources
- Retrieval settings
- Memory configuration
- Safety settings
System Prompts
System prompts define:
- Agent behavior
- Role instructions
- Response style
- Operational constraints
Well-designed system prompts improve:
- Reliability
- Consistency
- Safety
Tool and Function Integration
Agents may use tools such as:
- APIs
- Databases
- Search services
- External systems
Function calling enables agents to invoke these tools dynamically.
Retrieval Integration
Many AI agents use Retrieval-Augmented Generation (RAG).
RAG systems commonly integrate:
- Azure AI Search
- Embedding models
- Vector search
- Knowledge indexes
Knowledge Sources
Agents may connect to:
- Enterprise documents
- Databases
- APIs
- SharePoint
- Blob Storage
- Internal knowledge bases
Memory Configuration
Agents may use:
- Short-term memory
- Long-term memory
- Semantic memory
Common storage systems include:
- Azure Cosmos DB
- Azure SQL Database
- Azure AI Search
Security Configuration
Security is a major AI-103 exam topic.
Microsoft Entra ID
Microsoft Entra ID supports:
- Authentication
- Authorization
- RBAC
- Identity management
Azure Key Vault
Azure Key Vault securely stores:
- API keys
- Secrets
- Certificates
- Connection strings
Content Safety Configuration
Azure AI Content Safety helps:
- Detect harmful content
- Filter unsafe outputs
- Apply safety policies
Network Security
Enterprise AI deployments may use:
- VNets
- Private Endpoints
- Firewalls
- API gateways
Monitoring Deployments
AI deployments require operational monitoring.
Azure Monitor
Azure Monitor provides:
- Metrics
- Logging
- Alerts
- Diagnostics
Application Insights
Application Insights supports:
- Telemetry
- Request tracing
- Error diagnostics
- Performance monitoring
Metrics to Monitor
Common metrics include:
- Latency
- Token usage
- Error rates
- Throughput
- Tool call failures
- Retrieval quality
Evaluating AI Deployments
AI systems should be evaluated for:
- Accuracy
- Groundedness
- Safety
- Relevance
- Reliability
Prompt Flow
Prompt Flow supports:
- Workflow orchestration
- Prompt chaining
- Tool integration
- Evaluation pipelines
Prompt Flow is an important AI-103 topic.
CI/CD for AI Deployments
AI deployment pipelines should support:
- Automated testing
- Version control
- Safe releases
- Rollbacks
Blue-Green Deployments
Blue-green deployments:
- Reduce downtime
- Support safer releases
- Simplify rollback
Canary Deployments
Canary deployments:
- Roll out changes gradually
- Reduce deployment risk
- Support controlled testing
Common AI-103 Deployment Scenarios
Scenario 1: Enterprise AI Copilot
Requirements:
- High concurrency
- Secure retrieval
- Enterprise search
- Low latency
Recommended Configuration:
- Provisioned throughput
- Azure AI Search
- Entra ID
- Autoscaling
Scenario 2: Development Chatbot
Requirements:
- Low cost
- Rapid experimentation
- Flexible scaling
Recommended Configuration:
- Standard deployment
- App Service
- Basic monitoring
Scenario 3: AI Agent with Tool Calling
Requirements:
- API integrations
- Workflow execution
- Multi-step reasoning
Recommended Configuration:
- Azure OpenAI
- Azure Functions
- Prompt Flow
- Tool definitions
Scenario 4: Enterprise Knowledge Assistant
Requirements:
- Grounded responses
- Semantic retrieval
- Document search
Recommended Configuration:
- Embedding models
- Azure AI Search
- Hybrid search
- RAG pipelines
Cost Optimization Considerations
AI deployments can become expensive.
Common Cost Drivers
- Token usage
- Provisioned throughput
- Search indexing
- Embedding generation
- Large models
- High concurrency
Cost Optimization Strategies
Use Smaller Models When Possible
Smaller models reduce:
- Latency
- Compute costs
- Token usage
Optimize Retrieval
Efficient retrieval reduces:
- Prompt size
- Token costs
- Latency
Use Autoscaling
Autoscaling prevents overprovisioning.
Common AI-103 Exam Tips
Understand Deployment Types
Know the differences between:
- Standard deployments
- Provisioned throughput deployments
Learn Agent Configuration Components
Understand:
- System prompts
- Tool integration
- Retrieval settings
- Memory configuration
Know Security Best Practices
Use:
- Entra ID
- RBAC
- Key Vault
- Private networking
Understand Monitoring Concepts
Know how to monitor:
- Latency
- Token usage
- Throughput
- Errors
- AI quality
Summary
Configuring model and agent deployments is a critical skill for Azure AI developers.
For the AI-103 exam, you should understand:
- Azure OpenAI deployment configuration
- Model versioning
- Deployment scaling
- Agent architecture
- Tool integration
- Retrieval integration
- Memory configuration
- Security controls
- Monitoring and evaluation
- Deployment lifecycle management
Well-configured deployments improve:
- Reliability
- Performance
- Scalability
- Security
- Cost efficiency
- User experience
These concepts are foundational for building enterprise-grade AI applications and agent-based systems on Azure.
Practice Exam Questions
Question 1
Which deployment type provides dedicated capacity for Azure OpenAI workloads?
A. Shared deployment
B. Provisioned throughput deployment
C. Batch deployment
D. Basic deployment
Answer
B. Provisioned throughput deployment
Explanation
Provisioned throughput reserves dedicated processing capacity.
Question 2
What is the primary purpose of model versioning?
A. Increase storage size
B. Manage model updates and rollback strategies
C. Reduce API authentication
D. Eliminate monitoring
Answer
B. Manage model updates and rollback strategies
Explanation
Versioning helps maintain stability and supports rollback.
Question 3
Which Azure service is MOST commonly used for semantic retrieval in RAG systems?
A. Azure AI Search
B. Azure Backup
C. Azure CDN
D. Azure DNS
Answer
A. Azure AI Search
Explanation
Azure AI Search supports vector and semantic retrieval.
Question 4
What is the purpose of a system prompt in an AI agent?
A. Encrypt embeddings
B. Define agent behavior and instructions
C. Replace APIs
D. Configure storage replication
Answer
B. Define agent behavior and instructions
Explanation
System prompts guide the agent’s role, constraints, and response style.
Question 5
Which Azure service securely stores API keys and secrets?
A. Azure Key Vault
B. Azure Monitor
C. Azure Backup
D. Azure CDN
Answer
A. Azure Key Vault
Explanation
Azure Key Vault securely stores sensitive credentials.
Question 6
Which deployment strategy gradually rolls out updates to a small percentage of users first?
A. Full deployment
B. Canary deployment
C. Offline deployment
D. Batch deployment
Answer
B. Canary deployment
Explanation
Canary deployments reduce deployment risk through gradual rollout.
Question 7
Which type of model is specifically designed for vector generation and semantic similarity?
A. Vision model
B. Embedding model
C. Speech model
D. OCR model
Answer
B. Embedding model
Explanation
Embedding models generate vector representations for semantic retrieval.
Question 8
Which Azure service provides telemetry and request tracing for AI applications?
A. Application Insights
B. Azure DNS
C. Azure Files
D. Azure Firewall
Answer
A. Application Insights
Explanation
Application Insights provides application telemetry and diagnostics.
Question 9
Which feature dynamically adjusts resources based on workload demand?
A. Static allocation
B. Autoscaling
C. Encryption scaling
D. Semantic routing
Answer
B. Autoscaling
Explanation
Autoscaling automatically adjusts capacity based on traffic.
Question 10
Which Azure service is commonly used for workflow orchestration and prompt chaining in AI solutions?
A. Prompt Flow
B. Azure CDN
C. Azure Backup
D. Azure Front Door
Answer
A. Prompt Flow
Explanation
Prompt Flow orchestrates prompts, tools, and AI workflows.
Go to the AI-103 Exam Prep Hub main page





