This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub.
This topic falls under these sections:
Plan and manage an Azure AI solution (25–30%)
--> Choose the appropriate Foundry services for generative AI and agents
--> Choose an appropriate model for each task, including large language models (LLMs), small language models, multimodal models, and Foundry Tools
Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
One of the most important skills for the AI-103: Develop AI Apps and Agents on Azure certification exam is understanding how to choose the correct AI model and supporting Azure AI Foundry tools for a given business or technical scenario.
Modern AI development is no longer about simply selecting “an AI model.” Instead, developers must evaluate:
- The type of task being performed
- Cost constraints
- Latency requirements
- Accuracy expectations
- Reasoning complexity
- Context window needs
- Multimodal capabilities
- Deployment environment
- Security and governance requirements
- Agent orchestration requirements
Azure AI Foundry provides access to multiple categories of models and tools that help developers build generative AI applications and AI agents efficiently.
For the AI-103 exam, you should understand:
- When to use Large Language Models (LLMs)
- When Small Language Models (SLMs) are preferable
- When multimodal models are required
- How Azure AI Foundry tools support model selection and orchestration
- Tradeoffs between performance, cost, speed, and capability
- Common real-world scenarios for each model category
Azure AI Foundry Overview
Azure AI Foundry is Microsoft’s unified platform for building, evaluating, deploying, and managing AI applications and agents.
Azure AI Foundry provides:
- Access to foundation models
- Agent development capabilities
- Prompt engineering tools
- Evaluation tools
- Safety and content filtering
- Retrieval-augmented generation (RAG) support
- Fine-tuning capabilities
- Monitoring and observability
- Integration with Azure AI services
Azure AI Foundry enables developers to:
- Compare multiple models
- Test prompts
- Evaluate outputs
- Build AI agents
- Connect enterprise data
- Deploy scalable AI applications
For the AI-103 exam, understanding the relationship between model capabilities and Azure AI Foundry tools is extremely important.
Understanding Model Categories
The exam focuses heavily on selecting the correct model type for specific tasks.
The major categories include:
- Large Language Models (LLMs)
- Small Language Models (SLMs)
- Multimodal Models
- Embedding Models
- Specialized Models
Each category serves different purposes.
Large Language Models (LLMs)
What Are Large Language Models?
Large Language Models are advanced AI models trained on massive datasets containing text, code, and other information.
LLMs are designed for:
- Natural language understanding
- Natural language generation
- Complex reasoning
- Summarization
- Coding assistance
- Question answering
- Conversational AI
- Agent workflows
- Content creation
Examples include:
- GPT-4 family models
- GPT-4o models
- GPT-4 Turbo
- Larger Phi models (although Microsoft positions the Phi family mainly as small language models)
- Other frontier foundation models available in Azure AI Foundry
Characteristics of LLMs
Strengths
LLMs are excellent at:
Complex Reasoning
Examples:
- Multi-step problem solving
- Data interpretation
- Logical analysis
- Decision support
Advanced Content Generation
Examples:
- Marketing content
- Technical documentation
- Email drafting
- Knowledge-base generation
Conversational Experiences
Examples:
- AI chatbots
- AI copilots
- Virtual assistants
- Interactive tutoring systems
Agentic Workflows
LLMs are commonly used as the “reasoning engine” behind AI agents.
They can:
- Plan tasks
- Determine next actions
- Call tools
- Use memory
- Chain workflows
- Interact with APIs
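The plan, act, and remember loop described above can be sketched in a few lines. This is a minimal illustration, not how Azure AI Foundry implements agents: `plan_next_action` is a hypothetical stand-in for the LLM, and both tools are toy functions invented for this example. No real model or API is called.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools; a real agent would wrap actual APIs here.
def get_order_status(order_id: str) -> str:
    return f"Order {order_id} has shipped"

def send_email(body: str) -> str:
    return f"Email sent: {body}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "get_order_status": get_order_status,
    "send_email": send_email,
}

def plan_next_action(goal: str, memory: List[str]) -> Optional[Tuple[str, str]]:
    """Stand-in for the LLM "reasoning engine" (an assumption, not a real model).

    Returns (tool_name, argument), or None when the goal is complete.
    """
    if not memory:
        return ("get_order_status", "A123")   # first, look up the order
    if len(memory) == 1:
        return ("send_email", memory[0])      # then, report the result
    return None                               # nothing left to do

def run_agent(goal: str, max_steps: int = 5) -> List[str]:
    memory: List[str] = []                    # results of earlier steps
    for _ in range(max_steps):                # cap steps to avoid endless loops
        action = plan_next_action(goal, memory)
        if action is None:
            break
        tool_name, argument = action
        memory.append(TOOLS[tool_name](argument))  # call the chosen tool
    return memory

history = run_agent("Tell the customer where order A123 is")
```

In a real agent, the planner would be an LLM call that sees the goal, the tool descriptions, and the memory so far. The `max_steps` cap is a design choice of this sketch to keep a misbehaving planner from looping forever.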
Limitations of LLMs
Although powerful, LLMs have tradeoffs.
Higher Cost
LLMs generally:
- Require more compute
- Cost more per token
- Increase infrastructure expenses
Increased Latency
Larger models may:
- Respond more slowly
- Increase application response times
- Affect real-time user experiences
Resource Requirements
LLMs require:
- More GPU resources
- More memory
- Larger deployments
Overkill for Simple Tasks
Using GPT-4-level reasoning for basic classification or short summarization tasks may be unnecessary and expensive.
When to Use LLMs
Choose an LLM when tasks require:
- Advanced reasoning
- Long-context understanding
- High-quality content generation
- Complex conversational behavior
- Tool calling and agent orchestration
- Coding assistance
- Sophisticated summarization
- Enterprise copilots
Example LLM Scenarios
Scenario 1: Enterprise AI Copilot
A company wants an AI assistant that:
- Reads internal documentation
- Answers employee questions
- Generates summaries
- Explains policies
- Uses tools and APIs
Best choice:
- Large Language Model with RAG integration
Reason:
- Requires reasoning and conversational understanding.
Scenario 2: AI Coding Assistant
A development team needs:
- Code generation
- Debugging suggestions
- Refactoring support
- Documentation generation
Best choice:
- Advanced LLM
Reason:
- Coding tasks require complex contextual reasoning.
Small Language Models (SLMs)
What Are Small Language Models?
Small Language Models are lightweight AI models optimized for:
- Faster responses
- Lower costs
- Lower resource consumption
- Edge deployments
- Narrower tasks
Examples include:
- Smaller Phi models
- Compact transformer-based models
- Task-specific lightweight models
Characteristics of SLMs
Strengths
Lower Cost
SLMs:
- Consume fewer resources
- Cost less to run
- Reduce token usage costs
Faster Inference
SLMs typically:
- Respond more quickly
- Improve responsiveness
- Support near real-time interactions
Edge and Mobile Suitability
SLMs may run:
- On edge devices
- On mobile hardware
- In constrained environments
Efficient for Narrow Tasks
SLMs work well for:
- Classification
- Basic summarization
- Intent detection
- Simple chat interactions
- Lightweight automation
Limitations of SLMs
Reduced Reasoning Ability
Compared to LLMs, SLMs may struggle with:
- Complex logic
- Long context handling
- Multi-step reasoning
- Sophisticated conversations
Lower Output Quality
Outputs may:
- Be less nuanced
- Contain reduced detail
- Provide weaker contextual understanding
When to Use SLMs
Choose an SLM when:
- Speed is critical
- Cost optimization matters
- Tasks are relatively simple
- Edge deployment is needed
- High throughput is required
- Lightweight AI experiences are sufficient
Example SLM Scenarios
Scenario 1: Customer Intent Classification
An application classifies support tickets into categories such as:
- Billing
- Technical support
- Returns
- Sales
Best choice:
- Small Language Model
Reason:
- Classification is relatively simple and does not require advanced reasoning.
Scenario 2: Edge Device Assistant
A manufacturing company deploys an AI assistant on factory equipment with limited compute.
Best choice:
- Small Language Model
Reason:
- Edge environments benefit from lightweight models.
Multimodal Models
What Are Multimodal Models?
Multimodal models can process multiple data types simultaneously.
These data types include:
- Text
- Images
- Audio
- Video
- Documents
These models combine information across modalities to produce richer outputs.
Capabilities of Multimodal Models
Multimodal models can:
- Analyze images and answer questions about them
- Generate captions from images
- Extract information from documents
- Process speech and text together
- Understand charts and diagrams
- Support visual reasoning
Common Multimodal Tasks
Image Understanding
Examples:
- Object detection
- Scene analysis
- Image captioning
- Visual question answering
Document Intelligence
Examples:
- Invoice extraction
- Receipt processing
- Form analysis
- OCR workflows
Audio + Text Experiences
Examples:
- Voice assistants
- Meeting summarization
- Speech transcription
- Audio analysis
When to Use Multimodal Models
Choose multimodal models when applications involve:
- Images and text together
- Document processing
- Speech interactions
- Visual understanding
- Cross-modal reasoning
Example Multimodal Scenarios
Scenario 1: Invoice Processing
A company needs to:
- Read invoices
- Extract totals
- Identify vendors
- Validate line items
Best choice:
- Multimodal document processing model
Reason:
- The solution must interpret both layout and text.
Scenario 2: Retail Image Assistant
Users upload photos of products and ask questions about them.
Best choice:
- Multimodal model
Reason:
- Requires simultaneous image and text understanding.
Embedding Models
What Are Embedding Models?
Embedding models convert text or other content into vector representations.
These vectors capture semantic meaning.
Embedding models are essential for:
- Semantic search
- Retrieval-Augmented Generation (RAG)
- Similarity matching
- Recommendation systems
- Knowledge retrieval
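To make "vectors capture semantic meaning" concrete, here is a small sketch of similarity matching with cosine similarity. The three-dimensional vectors are hypothetical values made up for illustration. A real embedding model returns much longer vectors.

```python
import math
from typing import Dict, List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (assumed values, not output of a real model).
vectors: Dict[str, List[float]] = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.2, 0.1],
    "gpu pricing": [0.0, 0.1, 0.9],
}

query = vectors["refund policy"]
# Rank the other phrases by similarity to the query, most similar first.
ranked = sorted(
    (name for name in vectors if name != "refund policy"),
    key=lambda name: cosine_similarity(query, vectors[name]),
    reverse=True,
)
```

Here "return an item" ranks above "gpu pricing" because its vector points in nearly the same direction as the query vector.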
Retrieval-Augmented Generation (RAG)
RAG combines:
- Embedding models
- Vector databases
- LLMs
Workflow:
- Convert documents into embeddings
- Store embeddings in a vector index
- Convert the user query into an embedding
- Retrieve the most similar content from the index
- Send the retrieved content, along with the query, to the LLM to generate a grounded answer
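RAG improves:
- Accuracy
- Freshness of information
- Grounding in enterprise content
RAG also reduces hallucinations, because the model answers from retrieved content rather than from its training data alone.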
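The RAG workflow above can be sketched end to end with a toy pipeline. Everything here is an assumption for illustration: `embed` is a simple word-count function rather than a real embedding model, the "vector index" is a Python list, and `build_prompt` only prepares the text that would be sent to an LLM.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy "embedding": lowercase word counts (real models return dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # a Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

documents = [
    "Employees receive 20 vacation days per year.",
    "Expense reports must be submitted within 30 days.",
    "The office is closed on public holidays.",
]

# First two workflow steps: embed the documents and store them in an index.
index: List[Tuple[str, Counter]] = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, top_k: int = 1) -> List[str]:
    # Embed the query, then return the top_k most similar documents.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(query: str) -> str:
    # Final step: combine retrieved content with the question for the LLM.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many vacation days do employees get?")
```

The prompt ends up containing the vacation policy sentence, so the model can answer from enterprise content instead of guessing.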
Specialized Models
Some tasks are better handled by specialized AI models instead of general-purpose LLMs.
Examples:
- Translation models
- Speech models
- OCR models
- Vision models
- Classification models
Why Specialized Models Matter
Specialized models may provide:
- Better accuracy
- Lower cost
- Faster performance
- Simpler deployment
Example:
Using a dedicated OCR service is often more efficient than asking an LLM to read text from images.
Model Selection Factors
The AI-103 exam heavily tests your ability to select the correct model based on requirements.
Factor 1: Task Complexity
Use LLMs For:
- Advanced reasoning
- Multi-step workflows
- Complex conversations
Use SLMs For:
- Simple classification
- Lightweight interactions
- Fast automation
Factor 2: Cost
LLMs
- Higher operational cost
- More expensive inference
SLMs
- Lower operational cost
- Better for high-volume workloads
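A quick back-of-the-envelope calculation shows why cost matters for high-volume workloads. The prices below are hypothetical placeholders chosen for easy arithmetic, not real Azure pricing. Always check the current price list for the models you are comparing.

```python
def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_1k_tokens: float, days: int = 30) -> float:
    """Estimated monthly spend in USD for a token-priced model."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1000 * price_per_1k_tokens

# Hypothetical prices (assumptions for illustration only).
LLM_PRICE_PER_1K = 0.01     # USD per 1,000 tokens
SLM_PRICE_PER_1K = 0.0005   # USD per 1,000 tokens

# Same workload on both models: 100,000 requests/day, 500 tokens each.
llm_cost = monthly_cost(100_000, 500, LLM_PRICE_PER_1K)
slm_cost = monthly_cost(100_000, 500, SLM_PRICE_PER_1K)

print(f"LLM: ${llm_cost:,.0f} per month, SLM: ${slm_cost:,.0f} per month")
```

With these assumed prices the same workload costs roughly 20 times more on the larger model, which is why simple, high-volume tasks are usually a better fit for an SLM.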
Factor 3: Latency
Low-Latency Requirements
Prefer:
- SLMs
- Lightweight models
Complex Processing
Prefer:
- LLMs, even if response time increases
Factor 4: Context Window
Some tasks require processing:
- Long documents
- Large conversations
- Extensive histories
Choose models with larger context windows for:
- Legal analysis
- Knowledge assistants
- Long-form summarization
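One way to reason about context windows is to estimate whether a document fits before choosing a model. The sketch below assumes a rough rule of thumb of about four characters per token for English text. It is a heuristic, not a real tokenizer, and the window sizes used in the example are hypothetical.

```python
import math
from typing import List

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (heuristic; a real tokenizer will differ)."""
    return math.ceil(len(text) / chars_per_token)

def fits_context(text: str, context_window: int,
                 reserved_for_output: int = 1000) -> bool:
    """True if the input plus room for the answer fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window

def split_into_chunks(text: str, max_tokens: int,
                      chars_per_token: float = 4.0) -> List[str]:
    """Fallback when the text does not fit: cut it into fixed-size pieces."""
    max_chars = int(max_tokens * chars_per_token)
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

document = "x" * 40_000                         # stands in for a long contract

fits_small = fits_context(document, 8_000)      # hypothetical small window
fits_large = fits_context(document, 128_000)    # hypothetical large window
chunks = split_into_chunks(document, max_tokens=4_000)
```

If the document does not fit, the options are a model with a larger context window or a chunking approach such as the one above, often combined with retrieval.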
Factor 5: Multimodal Requirements
If the application involves:
- Images
- Audio
- Video
- Documents
Choose multimodal-capable models.
Factor 6: Deployment Environment
Cloud-Hosted Applications
May use:
- Large frontier models
- GPU-intensive deployments
Edge or Mobile Deployments
Prefer:
- Small models
- Quantized models
- Lightweight inference
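Quantized models suit edge and mobile hardware because they store each weight in fewer bytes. The sketch below estimates memory for the weights only. It ignores activations, caches, and runtime overhead, and the parameter counts are hypothetical round numbers.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

small_fp16 = weight_memory_gb(4e9, "fp16")    # hypothetical 4B-parameter model
small_int4 = weight_memory_gb(4e9, "int4")    # same model, 4-bit quantized
large_fp16 = weight_memory_gb(70e9, "fp16")   # hypothetical 70B-parameter model

print(f"4B fp16: {small_fp16:.1f} GB, 4B int4: {small_int4:.1f} GB, "
      f"70B fp16: {large_fp16:.1f} GB")
```

Under these assumptions, a 4-bit quantized small model needs about 2 GB for weights, which is plausible on a phone or edge device, while the large model needs about 140 GB and belongs on cloud GPUs.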
Azure AI Foundry Tools
Azure AI Foundry includes numerous tools that support model selection and AI application development.
Model Catalog
The Model Catalog allows developers to:
- Browse available models
- Compare capabilities
- Review benchmarks
- Deploy models
- Evaluate pricing
The catalog includes:
- Microsoft-hosted models
- Open-source models
- Partner models
- Frontier models
Prompt Flow
Prompt Flow helps developers:
- Build AI workflows
- Chain prompts together
- Integrate tools
- Evaluate prompts
- Test model behavior
Prompt Flow is useful for:
- Agent orchestration
- RAG pipelines
- Multi-step AI workflows
AI Agent Development Tools
Azure AI Foundry supports AI agents that can:
- Use tools
- Access data
- Maintain memory
- Perform actions
- Execute workflows
Agent frameworks may include:
- Tool calling
- Function calling
- Retrieval integration
- Multi-agent orchestration
Evaluation Tools
Evaluation tools help developers assess:
- Accuracy
- Groundedness
- Safety
- Relevance
- Latency
- Cost
Evaluation is critical because model quality varies by task.
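As a rough illustration of what an evaluation run measures, the sketch below scores two candidate "models" on accuracy and average latency over a small labeled dataset. The models are hypothetical stand-in functions and the dataset is invented for this example. It does not use the Azure AI Foundry evaluation tooling.

```python
import time
from typing import Callable, Dict, List, Tuple

def evaluate(model: Callable[[str], str],
             dataset: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return accuracy and average latency (seconds) over the dataset."""
    correct = 0
    start = time.perf_counter()
    for prompt, expected in dataset:
        if model(prompt).strip().lower() == expected.lower():
            correct += 1
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / len(dataset),
        "avg_latency_s": elapsed / len(dataset),
    }

# Hypothetical stand-ins for two deployed models.
def keyword_model(prompt: str) -> str:
    return "billing" if "invoice" in prompt.lower() else "technical support"

def always_billing(prompt: str) -> str:
    return "billing"

dataset = [
    ("My invoice is wrong", "billing"),
    ("The app crashes on login", "technical support"),
    ("I was charged twice on my invoice", "billing"),
    ("Cannot reset password", "technical support"),
]

results = {
    "keyword_model": evaluate(keyword_model, dataset),
    "always_billing": evaluate(always_billing, dataset),
}
```

Running the same dataset against every candidate makes the comparison fair. Metrics such as groundedness and safety need more than exact-match scoring, which is where dedicated evaluation tools come in.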
Content Safety Tools
Azure AI Foundry includes safety features such as:
- Content filtering
- Harm detection
- Prompt injection detection
- Responsible AI controls
These tools help ensure safe AI deployments.
Fine-Tuning Tools
Fine-tuning allows developers to customize models using:
- Domain-specific data
- Proprietary terminology
- Specialized workflows
Fine-tuning may improve:
- Accuracy
- Consistency
- Industry-specific responses
However, fine-tuning also:
- Increases cost
- Requires data preparation
- Adds operational complexity
Choosing Between Prompt Engineering, RAG, and Fine-Tuning
This is a very important AI-103 exam topic.
Prompt Engineering
Use when:
- You need quick customization
- Tasks are general-purpose
- No private data integration is needed
Advantages:
- Fast
- Cheap
- Easy to maintain
RAG
Use when:
- You need current or proprietary data
- You want grounding in enterprise content
- You need dynamic knowledge retrieval
Advantages:
- Reduces hallucinations
- Keeps knowledge current
- Avoids retraining
Fine-Tuning
Use when:
- Consistent specialized outputs are required
- Domain language is highly unique
- Behavioral customization is necessary
Advantages:
- Tailored responses
- Better domain alignment
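The guidance above can be condensed into a small decision helper. This is a simplified sketch: the two flags and the order of the checks are assumptions made for illustration, and it returns only one technique per call.

```python
def choose_customization(needs_current_or_private_data: bool,
                         needs_specialized_behavior: bool) -> str:
    """Pick a customization approach from two yes/no requirements.

    The check order (data needs first) is an assumption of this sketch.
    """
    if needs_current_or_private_data:
        return "RAG"                 # ground answers in retrieved content
    if needs_specialized_behavior:
        return "Fine-tuning"         # consistent, domain-specific outputs
    return "Prompt engineering"      # quick, cheap, general-purpose

# Hypothetical scenarios.
policy_bot = choose_customization(True, False)        # uses internal policies
medical_coder = choose_customization(False, True)     # strict domain format
email_drafter = choose_customization(False, False)    # general writing help
```

For exam questions, the key signal is usually the data requirement: if the scenario mentions current or proprietary information without retraining, the answer is RAG.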
Real-World Model Selection Examples
Example 1: FAQ Chatbot
Requirements:
- Low cost
- Fast responses
- Basic conversational support
Best Choice:
- Small Language Model + RAG
Example 2: Legal Document Assistant
Requirements:
- Long-context understanding
- Detailed summarization
- Advanced reasoning
Best Choice:
- Large Language Model with large context window
Example 3: Mobile AI App
Requirements:
- Offline capability
- Fast performance
- Low resource usage
Best Choice:
- Small Language Model
Example 4: Image-Based Customer Support
Requirements:
- Analyze uploaded photos
- Understand text and images
- Generate responses
Best Choice:
- Multimodal model
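The four examples above follow a pattern that can be written as a simple rule-based helper. This is a study aid under stated assumptions, not an official selection algorithm: the `Requirements` fields and the order of the checks are hypothetical, and the FAQ example assumes the chatbot must be grounded in the company's own FAQ content.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    needs_images_or_audio: bool = False
    needs_offline_or_edge: bool = False
    needs_advanced_reasoning: bool = False
    needs_long_context: bool = False
    needs_grounding_in_own_content: bool = False

def recommend(req: Requirements) -> str:
    """Map requirements to a model category (check order is an assumption)."""
    if req.needs_images_or_audio:
        choice = "Multimodal model"
    elif req.needs_offline_or_edge:
        choice = "Small Language Model"
    elif req.needs_advanced_reasoning or req.needs_long_context:
        choice = "Large Language Model"
    else:
        choice = "Small Language Model"   # default to the cheaper, faster option
    if req.needs_grounding_in_own_content:
        choice += " + RAG"
    return choice

faq_chatbot = recommend(Requirements(needs_grounding_in_own_content=True))
legal_assistant = recommend(
    Requirements(needs_advanced_reasoning=True, needs_long_context=True)
)
mobile_app = recommend(Requirements(needs_offline_or_edge=True))
image_support = recommend(Requirements(needs_images_or_audio=True))
```

Defaulting to the small model reflects the tradeoff discussed earlier: start with the cheapest option that meets the requirements and move up only when the task demands it.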
Key AI-103 Exam Tips
Understand Tradeoffs
You should know:
- Bigger models are not always better
- Simpler tasks may not require advanced LLMs
- Cost and latency matter
- Specialized models may outperform general models
Know Common Pairings
LLM + RAG
Used for:
- Enterprise chatbots
- Knowledge assistants
- AI copilots
Embeddings + Vector Search
Used for:
- Semantic search
- Knowledge retrieval
- Similarity matching
Multimodal Models
Used for:
- Vision AI
- Document processing
- Audio interactions
Learn the Azure AI Foundry Ecosystem
Know the purpose of:
- Model Catalog
- Prompt Flow
- Evaluation tools
- Agent tools
- Safety systems
- Fine-tuning workflows
Summary
Selecting the correct AI model is one of the most important responsibilities for an Azure AI developer.
For the AI-103 exam, you should understand:
- The differences between LLMs and SLMs
- When multimodal models are required
- How embedding models support RAG
- When specialized models outperform general-purpose models
- The tradeoffs between cost, speed, and reasoning capability
- How Azure AI Foundry tools support AI development and orchestration
In real-world AI systems, choosing the correct model can dramatically improve:
- Performance
- User experience
- Scalability
- Operational cost
- Reliability
- Maintainability
A strong understanding of model selection is essential for designing effective Azure AI applications and AI agents.
Practice Exam Questions
Question 1
A company is building an enterprise AI assistant that must answer complex employee questions using internal documentation and perform multi-step reasoning. Which model type is MOST appropriate?
A. Small Language Model (SLM)
B. Embedding model only
C. Large Language Model (LLM)
D. OCR model
Answer
C. Large Language Model (LLM)
Explanation
Complex reasoning and conversational understanding are best handled by LLMs.
Question 2
Which model type is generally BEST for low-cost, low-latency classification tasks?
A. Large multimodal model
B. Small Language Model (SLM)
C. GPT-4-class reasoning model
D. Vision foundation model
Answer
B. Small Language Model (SLM)
Explanation
SLMs are optimized for lightweight and cost-efficient tasks.
Question 3
A solution must process uploaded invoices and extract totals, vendor names, and line items. Which model type is MOST appropriate?
A. Embedding model
B. Small Language Model
C. Multimodal model
D. Translation model
Answer
C. Multimodal model
Explanation
Invoice extraction requires understanding both layout and text.
Question 4
What is the primary purpose of embedding models?
A. Image generation
B. Semantic vector representation
C. Audio transcription
D. Tool orchestration
Answer
B. Semantic vector representation
Explanation
Embedding models convert content into vectors for semantic search and retrieval.
Question 5
Which Azure AI Foundry tool helps developers chain prompts, integrate tools, and build AI workflows?
A. Azure Monitor
B. Prompt Flow
C. Azure Policy
D. Azure Functions
Answer
B. Prompt Flow
Explanation
Prompt Flow is designed for workflow orchestration and prompt pipelines.
Question 6
A mobile AI application must operate with minimal compute resources and very fast response times. Which model type is MOST appropriate?
A. Large Language Model
B. Small Language Model
C. Large multimodal model
D. High-context reasoning model
Answer
B. Small Language Model
Explanation
SLMs are optimized for lightweight and edge deployments.
Question 7
Which approach is BEST when an AI chatbot must use current enterprise data without retraining the model?
A. Fine-tuning only
B. Prompt engineering only
C. Retrieval-Augmented Generation (RAG)
D. Quantization
Answer
C. Retrieval-Augmented Generation (RAG)
Explanation
RAG retrieves current information dynamically without retraining.
Question 8
Which factor MOST strongly indicates that a multimodal model is required?
A. Need for vector embeddings
B. Need for faster response times
C. Need to process images and text together
D. Need for lower cost
Answer
C. Need to process images and text together
Explanation
Multimodal models handle multiple input modalities simultaneously.
Question 9
What is a major tradeoff of using larger language models?
A. Reduced reasoning capability
B. Lower context windows
C. Increased operational cost
D. Inability to support agents
Answer
C. Increased operational cost
Explanation
Larger models typically require more compute resources and cost more.
Question 10
Which Azure AI Foundry capability helps evaluate model quality, safety, and groundedness?
A. Azure Load Balancer
B. Evaluation tools
C. Azure Backup
D. Traffic Manager
Answer
B. Evaluation tools
Explanation
Evaluation tools assess output quality, safety, and performance metrics.
