This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub.
This topic falls under these sections:
Implement generative AI and agentic solutions (30–35%)
--> Build generative applications by using Foundry
--> Deploy and consume LLMs, small models, code models, and multimodal models
Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
Modern AI applications rely on a wide variety of AI models.
Different models are optimized for different workloads, including:
- Conversational AI
- Code generation
- Text summarization
- Image understanding
- Audio processing
- Reasoning tasks
- Agentic workflows
The AI-103: Develop AI Apps and Agents on Azure certification exam tests your understanding of how to deploy and consume AI models in Azure AI Foundry.
For the AI-103 exam, you should understand:
- Large language models (LLMs)
- Small language models (SLMs)
- Code models
- Multimodal models
- Model deployment concepts
- Model consumption patterns
- API-based model access
- Endpoint configuration
- Performance and cost tradeoffs
- Model selection strategies
- Responsible AI considerations
What Are Large Language Models (LLMs)?
Large language models are advanced AI systems trained on massive datasets.
LLMs can:
- Generate text
- Summarize documents
- Answer questions
- Translate languages
- Reason across prompts
- Support conversational AI
Common LLM Use Cases
Typical use cases include:
- AI assistants
- Enterprise chatbots
- Content generation
- Knowledge retrieval
- Agent orchestration
- Workflow automation
Characteristics of LLMs
LLMs typically provide:
- Strong reasoning
- Broad general knowledge
- Advanced conversational abilities
- Complex instruction following
However, they also:
- Require more compute
- Cost more to run
- May introduce higher latency
What Are Small Language Models (SLMs)?
Small language models are lightweight models optimized for:
- Faster inference
- Lower cost
- Lower latency
- Edge deployment
- Specialized tasks
Common SLM Use Cases
SLMs are often used for:
- Classification
- Simple chatbots
- Mobile applications
- Embedded AI
- Lightweight assistants
Benefits of Small Models
Advantages include:
- Reduced infrastructure cost
- Faster response times
- Lower resource requirements
- Easier deployment at scale
LLM vs SLM Tradeoffs
LLMs
Best for:
- Complex reasoning
- Broad knowledge
- Multi-step tasks
Tradeoffs:
- Higher cost
- Higher latency
- Larger infrastructure requirements
SLMs
Best for:
- Lightweight inference
- Narrow tasks
- Cost-sensitive workloads
Tradeoffs:
- Reduced reasoning capability
- Smaller context windows
- Less flexibility
What Are Code Models?
Code models are specialized AI models trained for software development tasks.
These models can:
- Generate code
- Explain code
- Complete functions
- Debug issues
- Convert between languages
Common Code Model Use Cases
Typical scenarios include:
- Developer copilots
- Code generation
- Documentation generation
- Test generation
- Refactoring assistance
Code Model Capabilities
Code models often support:
- Multiple programming languages
- Natural language prompts
- Code reasoning
- Syntax understanding
What Are Multimodal Models?
Multimodal models process multiple types of input.
Examples include:
- Text and images
- Text and audio
- Video and text
Multimodal AI Capabilities
Multimodal models may support:
- Image understanding
- OCR
- Visual question answering
- Audio transcription
- Speech interaction
- Video analysis
Common Multimodal Use Cases
Examples include:
- AI vision assistants
- Document understanding
- Medical imaging analysis
- Voice assistants
- Image captioning
Model Deployment in Azure AI Foundry
Azure AI Foundry enables developers to:
- Discover models
- Deploy models
- Test models
- Monitor deployments
- Consume models through APIs
Model Catalogs
Azure AI Foundry provides access to:
- Foundation models
- Open-source models
- Specialized models
- Multimodal models
Deployment Concepts
A deployment makes a model available through:
- APIs
- Endpoints
- Applications
- Agent workflows
Deployment Types
Common deployment options include:
- Managed online deployments
- Serverless deployments
- Real-time inference endpoints
- Batch inference deployments
Real-Time Inference
Real-time inference is used for:
- Interactive chat
- AI assistants
- Live applications
- Agent workflows
Batch Inference
Batch inference is used for:
- Large-scale document processing
- Offline analysis
- Scheduled workloads
- Bulk content generation
Endpoint Configuration
Deployments expose endpoints for application access.
Endpoints may include:
- Authentication
- Rate limits
- Scaling policies
- Monitoring settings
Authentication and Authorization
Applications may access models using:
- API keys
- Managed identities
- Microsoft Entra ID
- Role-based access control (RBAC)
Consuming Models Through APIs
Applications consume deployed models using:
- REST APIs
- SDKs
- Client libraries
Prompt-Based Interactions
Generative AI applications commonly interact with models through prompts.
Prompts may include:
- Instructions
- Context
- Examples
- Retrieved documents
System Prompts
System prompts define:
- AI behavior
- Tone
- Constraints
- Safety policies
Model Parameters
Common inference parameters include:
- Temperature
- Top-p
- Max tokens
- Frequency penalty
- Presence penalty
Temperature
Temperature controls output randomness.
Lower temperature:
- More deterministic
- More predictable
Higher temperature:
- More creative
- More variable
Context Windows
Context windows determine how much information a model can process in a request.
Larger context windows support:
- Long conversations
- Large documents
- Multi-document grounding
Streaming Responses
Streaming enables applications to receive responses incrementally.
Benefits include:
- Improved user experience
- Faster perceived response times
Grounding Models
Grounding improves factual accuracy by providing trusted data.
Grounded applications commonly use:
- Vector search
- Retrieval-Augmented Generation (RAG)
- Enterprise knowledge sources
Model Selection Considerations
Developers should evaluate:
- Accuracy
- Cost
- Latency
- Context size
- Reasoning ability
- Multimodal support
- Scalability
Choosing Between Models
Use LLMs When:
- Complex reasoning is required
- Broad knowledge is needed
- Multi-step workflows are involved
Use SLMs When:
- Low latency matters
- Cost optimization is critical
- Tasks are narrow or repetitive
Use Code Models When:
- Building developer tools
- Generating code
- Supporting programming workflows
Use Multimodal Models When:
- Images or audio are required
- Visual understanding is needed
- Mixed media inputs are processed
Scaling Model Deployments
Scaling strategies may include:
- Autoscaling
- Regional deployments
- Load balancing
- Rate limiting
Monitoring Deployments
Organizations should monitor:
- Latency
- Throughput
- Token usage
- Errors
- Safety events
- Cost
Cost Optimization
Cost optimization strategies include:
- Choosing smaller models
- Limiting token usage
- Caching responses
- Using batch processing
Responsible AI Considerations
Developers should implement:
- Safety filters
- Guardrails
- Content moderation
- Monitoring
- Human oversight
Multimodal Safety Concerns
Multimodal systems may require:
- Image moderation
- OCR filtering
- Audio moderation
- Content safety evaluation
Agentic AI and Model Consumption
AI agents may use:
- LLMs for reasoning
- SLMs for lightweight tasks
- Code models for automation
- Multimodal models for perception
Common AI-103 Deployment Scenarios
Scenario 1: Enterprise Chatbot
Requirements:
- Strong reasoning
- Long conversations
- Grounded responses
Recommended Model:
- LLM with RAG
Scenario 2: Mobile AI Assistant
Requirements:
- Fast responses
- Low cost
- Lightweight inference
Recommended Model:
- Small language model
Scenario 3: Developer Copilot
Requirements:
- Code generation
- Programming assistance
- Syntax awareness
Recommended Model:
- Code model
Scenario 4: Image-Aware AI Assistant
Requirements:
- Image analysis
- OCR
- Text generation
Recommended Model:
- Multimodal model
Common AI-103 Exam Tips
Understand Model Categories
Know the differences between:
- LLMs
- SLMs
- Code models
- Multimodal models
Learn Deployment Concepts
Understand:
- Endpoints
- Real-time inference
- Batch inference
- Scaling
Learn Consumption Patterns
Know:
- REST APIs
- SDKs
- Prompt engineering
- System prompts
Understand Cost and Performance Tradeoffs
Know how:
- Model size affects cost
- Context size affects latency
- Scaling impacts performance
Summary
Azure AI Foundry enables developers to deploy and consume a wide range of AI models.
For the AI-103 exam, you should understand:
- LLMs
- Small language models
- Code models
- Multimodal models
- Deployment options
- Model consumption patterns
- Prompt engineering
- Scaling strategies
- Cost optimization
- Responsible AI controls
Choosing the right model and deployment strategy is essential for building:
- Scalable
- Reliable
- Efficient
- Responsible AI solutions
These concepts are foundational for generative AI and agentic systems on Azure.
Practice Exam Questions
Question 1
What is a primary strength of large language models (LLMs)?
A. Minimal compute usage
B. Complex reasoning and broad knowledge
C. Guaranteed factual accuracy
D. Extremely low latency
Answer
B. Complex reasoning and broad knowledge
Explanation
LLMs excel at reasoning, conversation, and broad knowledge tasks.
Question 2
Which model type is best suited for lightweight, low-cost inference?
A. Large language model
B. Small language model
C. Multimodal model
D. Vision transformer only
Answer
B. Small language model
Explanation
SLMs are optimized for lower latency and reduced cost.
Question 3
Which model type is specifically optimized for programming tasks?
A. Vision model
B. Code model
C. Embedding model
D. Speech model
Answer
B. Code model
Explanation
Code models are trained for software development workflows.
Question 4
What is a defining feature of multimodal models?
A. They only process text
B. They process multiple input types
C. They eliminate inference costs
D. They require no prompting
Answer
B. They process multiple input types
Explanation
Multimodal models handle text, images, audio, and other media.
Question 5
Which deployment type is best for interactive AI chat applications?
A. Batch inference
B. Real-time inference
C. Archive deployment
D. Offline storage deployment
Answer
B. Real-time inference
Explanation
Interactive applications require low-latency real-time inference.
Question 6
What does the temperature parameter control?
A. Network throughput
B. Output randomness and creativity
C. Storage replication
D. GPU memory allocation
Answer
B. Output randomness and creativity
Explanation
Temperature affects how deterministic or creative outputs become.
Question 7
Which technique improves factual accuracy by using trusted data sources?
A. GPU scaling
B. Retrieval-Augmented Generation (RAG)
C. Semantic caching
D. Compression indexing
Answer
B. Retrieval-Augmented Generation (RAG)
Explanation
RAG grounds model outputs using retrieved enterprise data.
Question 8
What is a major benefit of streaming responses?
A. Reduced storage costs
B. Faster perceived response times
C. Elimination of monitoring
D. Improved vector indexing
Answer
B. Faster perceived response times
Explanation
Streaming improves user experience during response generation.
Question 9
Which authentication method supports passwordless access to Azure AI services?
A. Static credentials only
B. Managed identities
C. Anonymous access
D. Embedded API secrets in code
Answer
B. Managed identities
Explanation
Managed identities support secure, keyless authentication.
Question 10
Which model type is most appropriate for image understanding and OCR tasks?
A. Small language model
B. Multimodal model
C. Traditional relational database
D. Static rules engine
Answer
B. Multimodal model
Explanation
Multimodal models process images and text together.
Go to the AI-103 Exam Prep Hub main page
