Design Azure infrastructure for AI Apps and agent-based solutions (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Plan and manage an Azure AI solution (25–30%)
--> Set up AI solutions in Foundry
--> Design Azure infrastructure for AI Apps and agent-based solutions


Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Designing infrastructure for AI applications and agent-based systems is one of the most important responsibilities for Azure AI developers.

Modern AI solutions are not simply standalone models. They are distributed cloud systems that combine:

  • AI services
  • APIs
  • Databases
  • Search systems
  • Storage
  • Networking
  • Security controls
  • Monitoring systems
  • Agent orchestration components

The AI-103: Develop AI Apps and Agents on Azure certification exam tests your ability to design Azure infrastructure that supports:

  • Generative AI applications
  • AI agents
  • Retrieval-Augmented Generation (RAG)
  • Vector search
  • Multimodal AI systems
  • Scalable AI architectures
  • Secure enterprise AI deployments

For the AI-103 exam, you should understand:

  • Core Azure infrastructure services
  • AI architecture patterns
  • Scalability and performance design
  • Networking and security
  • Identity and access management
  • Storage and databases
  • Monitoring and observability
  • Cost optimization
  • High availability and disaster recovery
  • Infrastructure choices for AI agents

Core Components of AI Infrastructure

AI applications commonly require multiple infrastructure layers.

Typical components include:

  1. AI model services
  2. Compute resources
  3. Storage systems
  4. Search and retrieval systems
  5. Networking components
  6. Security services
  7. Monitoring systems
  8. Workflow orchestration
  9. API management
  10. Identity management

Azure AI Services Layer

Azure OpenAI

Azure OpenAI provides:

  • Large Language Models (LLMs)
  • Embedding models
  • Multimodal models
  • Conversational AI capabilities

Azure OpenAI is commonly used for:

  • AI copilots
  • Chatbots
  • AI agents
  • Summarization
  • Content generation
  • Tool calling

Azure AI Search

Azure AI Search supports:

  • Vector search
  • Semantic search
  • Hybrid search
  • Enterprise retrieval
  • RAG architectures

It is commonly used for:

  • Knowledge grounding
  • Enterprise search
  • AI assistant retrieval
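
Hybrid search has to merge a keyword ranking and a vector ranking into a single result list. Azure AI Search documents Reciprocal Rank Fusion (RRF) for this step. The sketch below is a simplified illustration of the idea, not the service's implementation; the function name, the document IDs, and the constant k = 60 are assumptions made for the example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks start at 1; documents near the top contribute more.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword query and a vector query.
keyword_results = ["a", "b", "c"]
vector_results = ["c", "a", "d"]
fused = reciprocal_rank_fusion([keyword_results, vector_results])
```

Documents that appear in both lists ("a" and "c" here) rise above documents that appear in only one, which is the behavior that makes hybrid retrieval useful for grounding.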

Azure AI Vision

Azure AI Vision provides:

  • OCR
  • Image analysis
  • Object detection
  • Caption generation
  • Visual understanding

Azure AI Document Intelligence

Azure AI Document Intelligence supports:

  • Invoice extraction
  • Form processing
  • Layout analysis
  • OCR workflows
  • Structured document extraction

Compute Infrastructure for AI Applications

Azure App Service

Azure App Service is commonly used to host:

  • Web applications
  • AI front ends
  • APIs
  • Lightweight AI services

Advantages:

  • Managed platform
  • Easy scaling
  • Simplified deployment

Azure Kubernetes Service (AKS)

AKS provides container orchestration for:

  • Large-scale AI applications
  • Microservices
  • Agent orchestration systems
  • Distributed AI workloads

Advantages:

  • High scalability
  • Container management
  • Advanced orchestration
  • Enterprise-grade deployments

When to Use AKS

Use AKS when:

  • Complex orchestration is required
  • Multiple services interact
  • High scalability is needed
  • Microservice architectures are used

Azure Functions

Azure Functions provides serverless compute.

Common AI use cases:

  • Tool execution
  • Event-driven workflows
  • API integrations
  • Lightweight processing
  • Agent tool calling

Advantages:

  • Pay-per-use pricing
  • Automatic scaling
  • Fast development

Azure Container Apps

Azure Container Apps provides serverless container hosting without requiring you to manage a Kubernetes cluster.

Useful for:

  • API services
  • AI middleware
  • Lightweight agent services
  • Event-driven AI components

Choosing the Correct Compute Service

Use Azure App Service When:

  • Hosting simple AI web apps
  • Managing APIs
  • Rapid deployment is needed

Use AKS When:

  • Large-scale orchestration is required
  • Complex microservices exist
  • Advanced scalability is necessary

Use Azure Functions When:

  • Event-driven execution is needed
  • Tool calling is required
  • Lightweight compute is sufficient

Use Azure Container Apps When:

  • Container simplicity is preferred
  • Serverless containers are desired
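
The guidance above can be read as a small decision procedure. The following is a minimal sketch of that reading, assuming a hypothetical `Workload` description and a fixed order of checks; real designs weigh more factors (team skills, cost, existing platforms) than these four flags.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload, used only for this sketch."""
    event_driven: bool = False
    containerized: bool = False
    complex_microservices: bool = False
    needs_large_scale_orchestration: bool = False


def choose_compute(w: Workload) -> str:
    """Map workload traits to a compute service using the guidance above."""
    # Large-scale orchestration or complex microservices point to AKS.
    if w.needs_large_scale_orchestration or w.complex_microservices:
        return "Azure Kubernetes Service (AKS)"
    # Containers without heavy orchestration needs fit Container Apps.
    if w.containerized:
        return "Azure Container Apps"
    # Short, event-driven work such as agent tool calls fits Functions.
    if w.event_driven:
        return "Azure Functions"
    # Default: a simple web app or API that needs rapid deployment.
    return "Azure App Service"
```

For example, `choose_compute(Workload(event_driven=True))` returns "Azure Functions", while a containerized workload with complex microservices is routed to AKS because that check runs first.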

Storage Infrastructure

AI systems often require multiple storage solutions.


Azure Blob Storage

Azure Blob Storage supports:

  • Document storage
  • Training data
  • Images
  • Videos
  • Logs
  • AI datasets

Common AI uses:

  • RAG document storage
  • Knowledge repositories
  • Media storage

Azure Cosmos DB

Azure Cosmos DB provides:

  • Globally distributed NoSQL storage
  • Low-latency access
  • High scalability

Common AI uses:

  • Agent memory
  • Session storage
  • User profiles
  • Conversation history

Azure SQL Database

Azure SQL Database supports:

  • Structured enterprise data
  • Relational workloads
  • Transactional systems

Common AI uses:

  • Enterprise integration
  • Business systems
  • Structured metadata

Vector Storage

Vector-enabled storage supports:

  • Embedding storage
  • Similarity search
  • Semantic retrieval

Common services include:

  • Azure AI Search
  • Azure Cosmos DB
  • Azure SQL Database
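
To make similarity search concrete, here is a minimal sketch of what a vector store does conceptually: compare a query embedding with stored embeddings and return the closest matches. The three-dimensional vectors and document IDs are invented for illustration; real embeddings have hundreds or thousands of dimensions, and the services listed above use approximate indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Hypothetical embeddings for three documents.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "returns": [0.8, 0.2, 0.1],
}
matches = top_k([1.0, 0.0, 0.0], index, k=2)
```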

Networking Infrastructure

AI solutions require secure and scalable networking.


Virtual Networks (VNets)

VNets provide:

  • Network isolation
  • Secure communication
  • Private connectivity

Use VNets when:

  • Enterprise security is required
  • Private networking is necessary
  • Sensitive data is involved

Private Endpoints

A Private Endpoint gives an Azure service a private IP address inside your VNet, so traffic to that service does not have to cross the public internet.

Benefits:

  • Improved security
  • Reduced public exposure
  • Enterprise compliance support

API Management

Azure API Management helps:

  • Secure APIs
  • Throttle requests
  • Monitor API usage
  • Apply policies
  • Manage agent APIs

This is important for:

  • AI agents
  • Tool integrations
  • Enterprise API governance
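
Throttling is easiest to understand as a counter per caller that resets after a fixed period. The sketch below illustrates that idea in plain Python; it is not how API Management is implemented or configured (the service applies throttling through policies), and the class name, parameter names, and caller keys are assumptions for the example.

```python
class FixedWindowRateLimiter:
    """Allow at most `calls` requests per caller in each `renewal_period` seconds."""

    def __init__(self, calls: int, renewal_period: float) -> None:
        self.calls = calls
        self.renewal_period = renewal_period
        # Maps caller key -> (window start time, requests seen in that window).
        self._windows: dict[str, tuple[float, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        """Return True if the request is admitted, False if it is throttled."""
        window_start, count = self._windows.get(key, (now, 0))
        # Start a fresh window once the renewal period has elapsed.
        if now - window_start >= self.renewal_period:
            window_start, count = now, 0
        if count >= self.calls:
            self._windows[key] = (window_start, count)
            return False
        self._windows[key] = (window_start, count + 1)
        return True
```

Passing the current time in as `now` keeps the sketch deterministic. With `FixedWindowRateLimiter(calls=2, renewal_period=60)`, a third request from the same agent inside one minute is rejected, while a different agent is unaffected.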

Load Balancing

Azure Load Balancer (layer 4) and Application Gateway (layer 7, HTTP/HTTPS) help:

  • Distribute traffic
  • Improve availability
  • Scale AI applications

Identity and Security

Security is a major AI-103 exam topic.


Microsoft Entra ID

Microsoft Entra ID provides:

  • Authentication
  • Authorization
  • Identity management
  • Role-based access control (RBAC)

AI applications use Entra ID for:

  • User authentication
  • API access control
  • Secure enterprise integration

Role-Based Access Control (RBAC)

RBAC grants users and services access only to the resources that their assigned roles allow.

Examples:

  • Restricting AI model access
  • Controlling storage access
  • Securing search indexes

Azure Key Vault

Azure Key Vault stores:

  • Secrets
  • API keys
  • Certificates
  • Connection strings

Never hardcode secrets in AI applications.
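
One way to keep secrets out of source code is to have the application read them from its environment at startup and fail fast if one is missing. This is a minimal sketch, assuming the hosting platform has already surfaced the Key Vault secret to the process as an environment variable (for example through an app setting that references Key Vault); the function name and variable names are hypothetical.

```python
import os


def get_required_secret(name: str) -> str:
    """Read a secret from the environment instead of embedding it in code."""
    value = os.environ.get(name)
    if not value:
        # Failing at startup is safer than running with a missing credential.
        raise RuntimeError(f"Missing required secret: {name}")
    return value
```

The code never contains the secret value itself, so rotating the secret in Key Vault does not require a code change.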


Azure AI Content Safety

Azure AI Content Safety helps:

  • Detect harmful content
  • Filter unsafe outputs
  • Support responsible AI practices

Monitoring and Observability

AI systems require monitoring for:

  • Reliability
  • Performance
  • Cost
  • Failures
  • Hallucinations
  • API latency

Azure Monitor

Azure Monitor collects:

  • Metrics
  • Logs
  • Alerts
  • Performance data

Application Insights

Application Insights supports:

  • Application telemetry
  • Request tracing
  • Error tracking
  • Dependency monitoring

Useful for:

  • AI apps
  • APIs
  • Agent workflows

Logging AI Systems

AI systems should log:

  • Prompts
  • Responses
  • Errors
  • Tool calls
  • Latency
  • Retrieval quality

Logging helps:

  • Troubleshooting
  • Auditing
  • Evaluation
  • Compliance
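
The fields listed above fit naturally into one structured record per model interaction. The following sketch builds such a record as a JSON line; the field names and the use of retrieval scores as a stand-in for retrieval quality are assumptions, and a real system would send the record to its telemetry pipeline rather than just return it.

```python
import json
import time
import uuid
from typing import Optional


def build_log_record(
    prompt: str,
    response: str,
    tool_calls: list[str],
    latency_ms: float,
    retrieval_scores: list[float],
    error: Optional[str] = None,
) -> str:
    """Return one JSON line describing a single model interaction."""
    record = {
        "id": str(uuid.uuid4()),  # correlation ID for tracing
        "timestamp": time.time(),  # seconds since the epoch
        "prompt": prompt,
        "response": response,
        "tool_calls": tool_calls,
        "latency_ms": latency_ms,
        "retrieval_scores": retrieval_scores,  # rough proxy for retrieval quality
        "error": error,
    }
    return json.dumps(record)
```

Emitting one JSON object per interaction keeps the logs queryable, which supports the troubleshooting, auditing, and evaluation uses listed above.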

Scalability Design

AI applications may experience:

  • High traffic
  • Large token volumes
  • Heavy retrieval workloads
  • Concurrent agent operations

Infrastructure must scale effectively.


Horizontal Scaling

Horizontal scaling adds more instances.

Examples:

  • Additional API servers
  • More containers
  • More worker nodes

Vertical Scaling

Vertical scaling increases the capacity of an existing instance.

Examples:

  • More CPU
  • More memory
  • Larger VM sizes

Autoscaling

Autoscaling dynamically adjusts resources based on demand.

Common services supporting autoscaling:

  • AKS
  • Azure Functions
  • App Service
  • Container Apps
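
Conceptually, an autoscaler compares observed load with a target and adjusts the instance count within configured limits. The sketch below uses a proportional rule similar in shape to the one documented for the Kubernetes Horizontal Pod Autoscaler; it is an illustration only, since each service above has its own scaling rules, and the function and parameter names are assumptions.

```python
import math


def desired_instances(
    current: int,
    observed_load: float,
    target_load: float,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Scale the instance count in proportion to observed vs. target load."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Round up so the fleet is never undersized for the observed load.
    raw = math.ceil(current * observed_load / target_load)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, raw))
```

With 4 instances running at 90% CPU against a 60% target, the rule asks for 6 instances; when load drops to 15%, it scales in to 1.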

High Availability and Disaster Recovery

Enterprise AI systems require resilience.


Availability Zones

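Availability Zones are physically separate datacenter locations within an Azure region. Spreading resources across zones improves fault tolerance.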

Benefits:

  • Redundancy
  • Improved uptime
  • Reduced outage risk

Geo-Redundancy

Geo-redundancy replicates data across regions.

Useful for:

  • Disaster recovery
  • Business continuity
  • Global applications

Backup and Recovery

AI systems should back up:

  • Knowledge indexes
  • Databases
  • Configuration data
  • Logs
  • Agent memory

Infrastructure for AI Agents

AI agents often require additional infrastructure components.


Agent Orchestration

AI agents may require orchestration services such as:

  • Prompt Flow
  • Azure Functions
  • Logic Apps
  • AKS workflows

Retrieval Infrastructure

Agent systems commonly use:

  • Azure AI Search
  • Embeddings
  • Vector indexes
  • RAG pipelines

Persistent Memory Infrastructure

Persistent memory may use:

  • Azure Cosmos DB
  • Azure SQL Database
  • Blob Storage

Tool Integration Infrastructure

Agents often integrate with:

  • REST APIs
  • Databases
  • External SaaS systems
  • Enterprise workflows

Common AI-103 Architecture Scenarios

Scenario 1: Enterprise AI Copilot

Requirements:

  • Conversational AI
  • Enterprise search
  • Secure authentication
  • Document retrieval

Recommended Infrastructure:

  • Azure OpenAI
  • Azure AI Search
  • Entra ID
  • Blob Storage
  • App Service

Scenario 2: Large-Scale Multi-Agent System

Requirements:

  • Multiple AI agents
  • High scalability
  • Distributed orchestration

Recommended Infrastructure:

  • AKS
  • Azure Functions
  • Prompt Flow
  • Cosmos DB

Scenario 3: AI Invoice Processing Solution

Requirements:

  • OCR
  • Document extraction
  • Workflow automation

Recommended Infrastructure:

  • Azure AI Document Intelligence
  • Blob Storage
  • Logic Apps
  • Azure Functions

Scenario 4: Global AI Chat Platform

Requirements:

  • Global availability
  • High concurrency
  • Disaster recovery

Recommended Infrastructure:

  • Geo-redundant storage
  • Availability Zones
  • Load balancing
  • Autoscaling

Cost Optimization Considerations

AI infrastructure can become expensive.


Common Cost Drivers

  • Token usage
  • Vector storage
  • GPU workloads
  • Data transfer
  • Search indexing
  • High-scale orchestration

Cost Optimization Strategies

Use Smaller Models When Appropriate

Smaller models reduce:

  • Compute usage
  • Token costs
  • Latency
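
A quick estimate shows how model choice affects token costs. This sketch assumes per-1,000-token pricing with separate input and output rates; all prices and traffic figures in it are hypothetical placeholders, not actual Azure OpenAI prices.

```python
def monthly_token_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend from average tokens per request."""
    per_request = (
        input_tokens / 1000 * input_price_per_1k
        + output_tokens / 1000 * output_price_per_1k
    )
    return requests_per_day * days * per_request


# Hypothetical prices for a larger and a smaller model.
large = monthly_token_cost(1000, 1000, 500, input_price_per_1k=0.01, output_price_per_1k=0.03)
small = monthly_token_cost(1000, 1000, 500, input_price_per_1k=0.001, output_price_per_1k=0.002)
```

Under these assumed prices the same traffic costs far less on the smaller model, which is why routing simple tasks to a smaller model is a common optimization.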

Use Autoscaling

Autoscaling reduces idle resource costs.


Optimize Retrieval Pipelines

Efficient chunking and indexing reduce:

  • Search costs
  • Storage requirements
  • Retrieval latency
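
Chunking can be sketched as splitting text into fixed-size windows that overlap slightly, so context at a boundary is not lost. This minimal example splits by characters to stay dependency-free; production pipelines often split by tokens or by document structure, and the function name and default sizes are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks for indexing."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Larger chunks and larger overlaps both increase the amount of text that is embedded and stored, so these two parameters directly affect index size and retrieval cost.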

Common AI-103 Exam Tips

Understand Infrastructure Tradeoffs

Know when to use:

  • AKS vs App Service
  • Functions vs Containers
  • Cosmos DB vs SQL Database

Learn Security Best Practices

Know how to use:

  • Entra ID
  • RBAC
  • Key Vault
  • Private Endpoints

Understand RAG Infrastructure

RAG commonly uses:

  • Azure OpenAI
  • Azure AI Search
  • Embeddings
  • Storage systems

Know Agent Infrastructure Patterns

AI agents commonly require:

  • Workflow orchestration
  • Tool integration
  • Persistent memory
  • Retrieval systems

Summary

Designing Azure infrastructure for AI applications requires balancing:

  • Scalability
  • Security
  • Performance
  • Cost
  • Reliability
  • Maintainability

For the AI-103 exam, you should understand:

  • Azure AI service architecture
  • Compute options
  • Storage design
  • Networking and security
  • Monitoring and observability
  • High availability
  • Agent infrastructure patterns
  • RAG infrastructure
  • Infrastructure scaling strategies

Strong infrastructure design skills are essential for deploying production-grade AI apps and agent-based systems on Azure.


Practice Exam Questions

Question 1

Which Azure service is MOST appropriate for enterprise vector search and RAG retrieval?

A. Azure AI Search
B. Azure Backup
C. Azure CDN
D. Azure DNS

Answer

A. Azure AI Search

Explanation

Azure AI Search supports vector search, semantic search, and retrieval for RAG systems.


Question 2

Which Azure compute service is BEST suited for large-scale containerized AI microservices?

A. Azure App Service
B. Azure Kubernetes Service (AKS)
C. Azure Files
D. Azure CDN

Answer

B. Azure Kubernetes Service (AKS)

Explanation

AKS provides advanced container orchestration and scalability.


Question 3

Which Azure service is MOST appropriate for storing API keys and secrets securely?

A. Azure Key Vault
B. Azure Monitor
C. Azure DNS
D. Azure Load Balancer

Answer

A. Azure Key Vault

Explanation

Azure Key Vault securely stores secrets, certificates, and keys.


Question 4

Which Azure service provides serverless execution for lightweight AI workflows and tool calling?

A. Azure Functions
B. Azure Backup
C. Azure CDN
D. Azure Firewall

Answer

A. Azure Functions

Explanation

Azure Functions supports event-driven serverless compute.


Question 5

What is the primary purpose of Availability Zones?

A. Reduce token usage
B. Improve fault tolerance and uptime
C. Replace backups
D. Encrypt embeddings

Answer

B. Improve fault tolerance and uptime

Explanation

Availability Zones provide redundancy across physically separate datacenter locations within an Azure region.


Question 6

Which Azure service is MOST commonly used for globally distributed NoSQL storage in AI applications?

A. Azure Cosmos DB
B. Azure DNS
C. Azure Files
D. Azure CDN

Answer

A. Azure Cosmos DB

Explanation

Azure Cosmos DB provides scalable globally distributed NoSQL storage.


Question 7

Which Azure networking feature enables private access to Azure services from a VNet?

A. Private Endpoint
B. Public IP
C. Load Balancer
D. Traffic Manager

Answer

A. Private Endpoint

Explanation

Private Endpoints provide secure private connectivity.


Question 8

Which Azure monitoring service provides application telemetry and request tracing?

A. Application Insights
B. Azure CDN
C. Azure Policy
D. Azure ExpressRoute

Answer

A. Application Insights

Explanation

Application Insights provides telemetry and diagnostics for applications.


Question 9

Which Azure identity service provides authentication and RBAC support for AI applications?

A. Microsoft Entra ID
B. Azure CDN
C. Azure Firewall
D. Azure Front Door

Answer

A. Microsoft Entra ID

Explanation

Microsoft Entra ID provides identity and access management.


Question 10

Which scaling strategy adds additional instances to support increased AI workload demand?

A. Vertical scaling
B. Horizontal scaling
C. Encryption scaling
D. Semantic scaling

Answer

B. Horizontal scaling

Explanation

Horizontal scaling adds more instances to distribute workloads.


