Tag: SLMs

AI, AI-103, Azure AI, Large Language Models (LLMs), Microsoft Certification May 25, 2026

Translate speech into other languages by using Language Models and Foundry Tools (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement text analysis solutions (10–15%)
   --> Implement speech solutions
      --> Translate speech into other languages by using Language Models and Foundry Tools

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Speech translation is one of the most impactful capabilities in modern AI systems. Organizations increasingly require applications that can:

Understand spoken language
Translate speech into other languages
Generate spoken responses
Support multilingual conversations in real time

For the AI-103 certification exam, you should understand how to build speech translation workflows using:

Azure AI Speech
Azure AI Translator
Azure OpenAI Service
Azure AI Foundry
Multimodal language models
Real-time streaming pipelines

This topic falls under:

“Implement speech solutions”

What Is Speech Translation?

Speech translation is the process of:

Receiving spoken audio
Converting speech to text
Translating the text into another language
Optionally converting translated text back into speech

This allows users speaking different languages to communicate naturally.

Common Speech Translation Scenarios

Organizations use speech translation for:

Real-time multilingual meetings
Customer support
Voice assistants
Call centers
Live event translation
Healthcare communication
Travel applications
Educational platforms

Core Azure Services

Azure AI Speech

provides:

Speech-to-text (STT)
Text-to-speech (TTS)
Speech translation
Speaker recognition
Real-time transcription

Azure AI Translator

supports:

Text translation
Multilingual translation
Language detection
Custom translation models

Azure OpenAI Service

supports:

LLM-powered translation flows
Context-aware translation
Conversational reasoning
Multimodal AI

Azure AI Foundry

supports:

Workflow orchestration
Prompt flows
Agentic pipelines
Multimodal AI applications

Basic Speech Translation Workflow

A standard speech translation pipeline includes:

Audio input
Speech recognition
Language detection
Translation
Optional speech synthesis

Example Workflow

User speaks:

"Where is the nearest train station?"

Speech-to-text output:

Where is the nearest train station?

Translated text:

¿Dónde está la estación de tren más cercana?

Optional spoken response generated in Spanish.

Real-Time Translation

Streaming Translation Pipelines

Real-time translation systems:

Stream audio continuously
Process speech incrementally
Generate translations with low latency

This is essential for:

Live conversations
AI voice agents
Meetings
Customer service systems

Components of a Real-Time Pipeline

Typical components include:

Audio capture
Streaming transcription
Translation engine
Context-aware LLM reasoning
Speech synthesis

Language Detection

Speech translation systems often detect:

Spoken language automatically
Mixed-language conversations
Regional dialects

Example

User speaks French.

The system:

Detects French automatically
Converts speech to text
Translates to English
Returns spoken English response

Text Translation vs LLM Translation

Traditional Translation

Traditional translation engines:

Focus on linguistic accuracy
Translate sentence-by-sentence
Work well for standard phrases

LLM-Powered Translation

LLM translation can:

Preserve conversational context
Maintain tone
Adapt domain terminology
Handle ambiguous phrasing
Improve naturalness

Example

Literal translation:

The product crashed.

LLM-aware translation may interpret:

The software application failed unexpectedly.

based on technical context.

Domain-Aware Translation

Enterprise systems often require:

Industry terminology
Compliance wording
Medical vocabulary
Legal phrasing
Financial language

Example

Healthcare systems may require accurate translation of:

Diagnoses
Prescriptions
Procedures
Emergency instructions

Foundry Tools and Prompt Flows

Azure AI Foundry enables developers to:

Build translation pipelines
Chain speech and LLM components
Create multilingual agents
Orchestrate AI workflows

Example Prompt Flow

Pipeline:

Speech recognition
Translation
Sentiment analysis
RAG retrieval
Response generation
Text-to-speech

Multilingual AI Agents

Voice-enabled AI agents may:

Detect user language automatically
Respond in the same language
Switch languages dynamically
Maintain conversational context

Example

Customer speaks Japanese.

The AI agent:

Detects Japanese
Translates request internally
Queries enterprise systems
Generates response
Speaks Japanese response

Retrieval-Augmented Generation (RAG)

Translation systems may use:

Enterprise knowledge bases
Vector search
Document retrieval

to generate grounded multilingual responses.

Example RAG Translation Workflow

User asks question in Spanish
Speech converted to text
Question translated to English
RAG retrieves company documents
LLM generates grounded answer
Response translated back to Spanish
Spoken output returned

Speech Synthesis

Text-to-speech (TTS) enables systems to:

Speak translated content
Generate natural responses
Support conversational agents

Neural Voices

Modern TTS systems use:

Neural speech synthesis
Human-like prosody
Natural pacing
Emotional tone modeling

Custom Speech Models

Organizations may train models for:

Industry vocabulary
Brand terminology
Regional accents
Specialized pronunciation

Multimodal Reasoning

Advanced AI systems combine:

Speech
Text
Images
Contextual memory
External tools

to improve translation quality.

Example

A multilingual support agent:

Hears customer speech
Reads uploaded screenshots
Retrieves support documents
Generates translated instructions

Latency Considerations

Speech translation systems must minimize:

Recognition delay
Translation delay
Model inference time
Audio playback lag

Reducing Latency

Strategies include:

Streaming APIs
Smaller models
Incremental processing
Parallel workflows
Cached prompts

Cost Optimization

Translation workflows may become expensive at scale.

Optimization methods include:

Shorter prompts
Efficient chunking
Streaming responses
Model routing
Hybrid architectures

Responsible AI Considerations

Speech translation systems introduce important risks.

Translation Accuracy Risks

Potential issues include:

Misinterpretation
Cultural misunderstanding
Incorrect terminology
Hallucinated content

Bias and Fairness

Speech systems may perform differently across:

Accents
Dialects
Languages
Speaking styles

Organizations should evaluate:

Accuracy consistency
Fairness metrics
Language coverage

Privacy and Security

Speech data may contain:

Personal information
Financial data
Medical information
Confidential conversations

Security measures should include:

Encryption
Access control
Retention policies
Secure logging

Human-in-the-Loop Validation

High-risk scenarios may require:

Human translators
Escalation workflows
Confidence scoring
Manual review

Monitoring and Observability

Production systems should monitor:

Translation quality
Recognition accuracy
Latency
Failure rates
Token usage
Language detection accuracy

Real-World Example

A multinational company deploys an AI meeting assistant.

Workflow:

Employees speak different languages
Audio streamed into Azure AI Speech
Speech converted to text
Azure AI Translator translates content
Azure OpenAI summarizes meeting outcomes
TTS generates multilingual playback
Notes stored in enterprise systems

This demonstrates:

Real-time speech translation
LLM orchestration
Multilingual AI agents
Foundry workflow integration
Multimodal reasoning

Best Practices for AI-103

Use Streaming Pipelines

Enable real-time interactions.

Combine STT, Translation, and TTS

Create end-to-end multilingual workflows.

Ground LLM Responses

Use RAG to reduce hallucinations.

Evaluate Across Languages

Test performance for fairness and consistency.

Protect Sensitive Audio Data

Secure transcripts and recordings.

Use Human Review for Critical Scenarios

Especially in healthcare and legal domains.

Monitor Latency

Real-time conversations require fast responses.

Exam Tips for AI-103

For the AI-103 exam, remember these key concepts:

Speech translation includes STT, translation, and optional TTS.
Azure AI Speech supports speech translation workflows.
Azure AI Translator handles multilingual text translation.
Azure OpenAI Service enables context-aware LLM translation.
Azure AI Foundry orchestrates AI pipelines.
Streaming workflows reduce latency.
RAG improves grounded multilingual responses.
Neural TTS creates natural voice responses.
Responsible AI is critical for multilingual systems.
Translation systems must be evaluated for fairness and accuracy.

Practice Exam Questions

Question 1

What is the first step in a speech translation workflow?

A. Text summarization
B. Speech-to-text conversion
C. Vector indexing
D. OCR extraction

Answer

B. Speech-to-text conversion

Explanation

Speech translation workflows typically begin by converting spoken audio into text.

Question 2

Which Azure service provides speech recognition capabilities?

A. Azure Firewall
B. Azure VPN Gateway
C. Azure CDN
D. Azure AI Speech

Answer

D. Azure AI Speech

Explanation

Azure AI Speech supports speech recognition and speech translation features.

Question 3

Which service specializes in multilingual text translation?

A. Azure AI Translator
B. Azure Blob Storage
C. Azure Monitor
D. Azure Front Door

Answer

A. Azure AI Translator

Explanation

Azure AI Translator provides translation and language detection services.

Question 4

What is a benefit of LLM-powered translation compared to traditional translation?

A. Removal of speech recognition requirements
B. Elimination of all translation errors
C. Better contextual understanding
D. Lower storage costs only

Answer

C. Better contextual understanding

Explanation

LLMs can preserve conversational tone and domain context.

Question 5

Why are streaming workflows important for speech translation?

A. They reduce latency for real-time interactions
B. They disable multilingual support
C. They eliminate audio capture
D. They remove the need for translation models

Answer

A. They reduce latency for real-time interactions

Explanation

Streaming enables responsive multilingual conversations.

Question 6

What is Retrieval-Augmented Generation (RAG)?

A. Removing speaker identification
B. Compressing speech files
C. Encrypting translations automatically
D. Combining retrieval systems with LLM reasoning

Answer

D. Combining retrieval systems with LLM reasoning

Explanation

RAG retrieves trusted information before generating responses.

Question 7

What capability does text-to-speech (TTS) provide?

A. Video segmentation
B. Image classification
C. Spoken audio generation from text
D. OCR extraction

Answer

C. Spoken audio generation from text

Explanation

TTS converts text into synthesized speech.

Question 8

What is an important responsible AI concern for speech translation systems?

A. Accent bias and mistranslations
B. GPU fan speed
C. Storage redundancy
D. DNS routing policies

Answer

A. Accent bias and mistranslations

Explanation

Speech systems may perform differently across accents and languages.

Question 9

Which platform helps orchestrate AI translation pipelines and prompt flows?

A. Azure AI Foundry
B. Azure Virtual WAN
C. Azure DNS
D. Azure Files

Answer

A. Azure AI Foundry

Explanation

Azure AI Foundry supports orchestration of AI workflows and multimodal pipelines.

Question 10

Why might organizations use custom speech models?

A. To remove multilingual capabilities
B. To improve domain-specific vocabulary recognition
C. To disable TTS
D. To reduce cloud networking costs

Answer

B. To improve domain-specific vocabulary recognition

Explanation

Custom speech models improve recognition accuracy for specialized terminology.

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Artificial Intelligence (AI), Generative AI, Microsoft Certification May 25, 2026

Deploy and consume LLMs, small models, code models, and multimodal models (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement generative AI and agentic solutions (30–35%)
   --> Build generative applications by using Foundry
      --> Deploy and consume LLMs, small models, code models, and multimodal models

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Modern AI applications rely on a wide variety of AI models.

Different models are optimized for different workloads, including:

Conversational AI
Code generation
Text summarization
Image understanding
Audio processing
Reasoning tasks
Agentic workflows

The AI-103: Develop AI Apps and Agents on Azure certification exam tests your understanding of how to deploy and consume AI models in Azure AI Foundry.

For the AI-103 exam, you should understand:

Large language models (LLMs)
Small language models (SLMs)
Code models
Multimodal models
Model deployment concepts
Model consumption patterns
API-based model access
Endpoint configuration
Performance and cost tradeoffs
Model selection strategies
Responsible AI considerations

What Are Large Language Models (LLMs)?

Large language models are advanced AI systems trained on massive datasets.

LLMs can:

Generate text
Summarize documents
Answer questions
Translate languages
Reason across prompts
Support conversational AI

Common LLM Use Cases

Typical use cases include:

AI assistants
Enterprise chatbots
Content generation
Knowledge retrieval
Agent orchestration
Workflow automation

Characteristics of LLMs

LLMs typically provide:

Strong reasoning
Broad general knowledge
Advanced conversational abilities
Complex instruction following

However, they also:

Require more compute
Cost more to run
May introduce higher latency

What Are Small Language Models (SLMs)?

Small language models are lightweight models optimized for:

Faster inference
Lower cost
Lower latency
Edge deployment
Specialized tasks

Common SLM Use Cases

SLMs are often used for:

Classification
Simple chatbots
Mobile applications
Embedded AI
Lightweight assistants

Benefits of Small Models

Advantages include:

Reduced infrastructure cost
Faster response times
Lower resource requirements
Easier deployment at scale

LLM vs SLM Tradeoffs

LLMs

Best for:

Complex reasoning
Broad knowledge
Multi-step tasks

Tradeoffs:

Higher cost
Higher latency
Larger infrastructure requirements

SLMs

Best for:

Lightweight inference
Narrow tasks
Cost-sensitive workloads

Tradeoffs:

Reduced reasoning capability
Smaller context windows
Less flexibility

What Are Code Models?

Code models are specialized AI models trained for software development tasks.

These models can:

Generate code
Explain code
Complete functions
Debug issues
Convert between languages

Common Code Model Use Cases

Typical scenarios include:

Developer copilots
Code generation
Documentation generation
Test generation
Refactoring assistance

Code Model Capabilities

Code models often support:

Multiple programming languages
Natural language prompts
Code reasoning
Syntax understanding

What Are Multimodal Models?

Multimodal models process multiple types of input.

Examples include:

Text and images
Text and audio
Video and text

Multimodal AI Capabilities

Multimodal models may support:

Image understanding
OCR
Visual question answering
Audio transcription
Speech interaction
Video analysis

Common Multimodal Use Cases

Examples include:

AI vision assistants
Document understanding
Medical imaging analysis
Voice assistants
Image captioning

Model Deployment in Azure AI Foundry

Azure AI Foundry enables developers to:

Discover models
Deploy models
Test models
Monitor deployments
Consume models through APIs

Model Catalogs

Azure AI Foundry provides access to:

Foundation models
Open-source models
Specialized models
Multimodal models

Deployment Concepts

A deployment makes a model available through:

APIs
Endpoints
Applications
Agent workflows

Deployment Types

Common deployment options include:

Managed online deployments
Serverless deployments
Real-time inference endpoints
Batch inference deployments

Real-Time Inference

Real-time inference is used for:

Interactive chat
AI assistants
Live applications
Agent workflows

Batch Inference

Batch inference is used for:

Large-scale document processing
Offline analysis
Scheduled workloads
Bulk content generation

Endpoint Configuration

Deployments expose endpoints for application access.

Endpoints may include:

Authentication
Rate limits
Scaling policies
Monitoring settings

Authentication and Authorization

Applications may access models using:

API keys
Managed identities
Microsoft Entra ID
Role-based access control (RBAC)

Consuming Models Through APIs

Applications consume deployed models using:

REST APIs
SDKs
Client libraries

Prompt-Based Interactions

Generative AI applications commonly interact with models through prompts.

Prompts may include:

Instructions
Context
Examples
Retrieved documents

System Prompts

System prompts define:

AI behavior
Tone
Constraints
Safety policies

Model Parameters

Common inference parameters include:

Temperature
Top-p
Max tokens
Frequency penalty
Presence penalty

Temperature

Temperature controls output randomness.

Lower temperature:

More deterministic
More predictable

Higher temperature:

More creative
More variable

Context Windows

Context windows determine how much information a model can process in a request.

Larger context windows support:

Long conversations
Large documents
Multi-document grounding

Streaming Responses

Streaming enables applications to receive responses incrementally.

Benefits include:

Improved user experience
Faster perceived response times

Grounding Models

Grounding improves factual accuracy by providing trusted data.

Grounded applications commonly use:

Vector search
Retrieval-Augmented Generation (RAG)
Enterprise knowledge sources

Model Selection Considerations

Developers should evaluate:

Accuracy
Cost
Latency
Context size
Reasoning ability
Multimodal support
Scalability

Choosing Between Models

Use LLMs When:

Complex reasoning is required
Broad knowledge is needed
Multi-step workflows are involved

Use SLMs When:

Low latency matters
Cost optimization is critical
Tasks are narrow or repetitive

Use Code Models When:

Building developer tools
Generating code
Supporting programming workflows

Use Multimodal Models When:

Images or audio are required
Visual understanding is needed
Mixed media inputs are processed

Scaling Model Deployments

Scaling strategies may include:

Autoscaling
Regional deployments
Load balancing
Rate limiting

Monitoring Deployments

Organizations should monitor:

Latency
Throughput
Token usage
Errors
Safety events
Cost

Cost Optimization

Cost optimization strategies include:

Choosing smaller models
Limiting token usage
Caching responses
Using batch processing

Responsible AI Considerations

Developers should implement:

Safety filters
Guardrails
Content moderation
Monitoring
Human oversight

Multimodal Safety Concerns

Multimodal systems may require:

Image moderation
OCR filtering
Audio moderation
Content safety evaluation

Agentic AI and Model Consumption

AI agents may use:

LLMs for reasoning
SLMs for lightweight tasks
Code models for automation
Multimodal models for perception

Common AI-103 Deployment Scenarios

Scenario 1: Enterprise Chatbot

Requirements:

Strong reasoning
Long conversations
Grounded responses

Recommended Model:

LLM with RAG

Scenario 2: Mobile AI Assistant

Requirements:

Fast responses
Low cost
Lightweight inference

Recommended Model:

Small language model

Scenario 3: Developer Copilot

Requirements:

Code generation
Programming assistance
Syntax awareness

Recommended Model:

Code model

Scenario 4: Image-Aware AI Assistant

Requirements:

Image analysis
OCR
Text generation

Recommended Model:

Multimodal model

Common AI-103 Exam Tips

Understand Model Categories

Know the differences between:

LLMs
SLMs
Code models
Multimodal models

Learn Deployment Concepts

Understand:

Endpoints
Real-time inference
Batch inference
Scaling

Learn Consumption Patterns

Know:

REST APIs
SDKs
Prompt engineering
System prompts

Understand Cost and Performance Tradeoffs

Know how:

Model size affects cost
Context size affects latency
Scaling impacts performance

Summary

Azure AI Foundry enables developers to deploy and consume a wide range of AI models.

For the AI-103 exam, you should understand:

LLMs
Small language models
Code models
Multimodal models
Deployment options
Model consumption patterns
Prompt engineering
Scaling strategies
Cost optimization
Responsible AI controls

Choosing the right model and deployment strategy is essential for building:

Scalable
Reliable
Efficient
Responsible AI solutions

These concepts are foundational for generative AI and agentic systems on Azure.

Practice Exam Questions

Question 1

What is a primary strength of large language models (LLMs)?

A. Minimal compute usage
B. Complex reasoning and broad knowledge
C. Guaranteed factual accuracy
D. Extremely low latency

Answer

B. Complex reasoning and broad knowledge

Explanation

LLMs excel at reasoning, conversation, and broad knowledge tasks.

Question 2

Which model type is best suited for lightweight, low-cost inference?

A. Large language model
B. Small language model
C. Multimodal model
D. Vision transformer only

Answer

B. Small language model

Explanation

SLMs are optimized for lower latency and reduced cost.

Question 3

Which model type is specifically optimized for programming tasks?

A. Vision model
B. Code model
C. Embedding model
D. Speech model

Answer

B. Code model

Explanation

Code models are trained for software development workflows.

Question 4

What is a defining feature of multimodal models?

A. They only process text
B. They process multiple input types
C. They eliminate inference costs
D. They require no prompting

Answer

B. They process multiple input types

Explanation

Multimodal models handle text, images, audio, and other media.

Question 5

Which deployment type is best for interactive AI chat applications?

A. Batch inference
B. Real-time inference
C. Archive deployment
D. Offline storage deployment

Answer

B. Real-time inference

Explanation

Interactive applications require low-latency real-time inference.

Question 6

What does the temperature parameter control?

A. Network throughput
B. Output randomness and creativity
C. Storage replication
D. GPU memory allocation

Answer

B. Output randomness and creativity

Explanation

Temperature affects how deterministic or creative outputs become.

Question 7

Which technique improves factual accuracy by using trusted data sources?

A. GPU scaling
B. Retrieval-Augmented Generation (RAG)
C. Semantic caching
D. Compression indexing

Answer

B. Retrieval-Augmented Generation (RAG)

Explanation

RAG grounds model outputs using retrieved enterprise data.

Question 8

What is a major benefit of streaming responses?

A. Reduced storage costs
B. Faster perceived response times
C. Elimination of monitoring
D. Improved vector indexing

Answer

B. Faster perceived response times

Explanation

Streaming improves user experience during response generation.

Question 9

Which authentication method supports passwordless access to Azure AI services?

A. Static credentials only
B. Managed identities
C. Anonymous access
D. Embedded API secrets in code

Answer

B. Managed identities

Explanation

Managed identities support secure, keyless authentication.

Question 10

Which model type is most appropriate for image understanding and OCR tasks?

A. Small language model
B. Multimodal model
C. Traditional relational database
D. Static rules engine

Answer

B. Multimodal model

Explanation

Multimodal models process images and text together.

Go to the AI-103 Exam Prep Hub main page

AI, AI-103, Artificial Intelligence (AI), Microsoft Certification May 25, 2026

Choose an appropriate model for each task, including large language models (LLMs), small language models, multimodal models, and Foundry Tools (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Plan and manage an Azure AI solution (25–30%)
   --> Choose the appropriate Foundry services for generative AI and agents
      --> Choose an appropriate model for each task, including large language models (LLMs), small language models, multimodal models, and Foundry Tools

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important skills for the AI-103: Develop AI Apps and Agents on Azure certification exam is understanding how to choose the correct AI model and supporting Azure AI Foundry tools for a given business or technical scenario.

Modern AI development is no longer about simply selecting “an AI model.” Instead, developers must evaluate:

The type of task being performed
Cost constraints
Latency requirements
Accuracy expectations
Reasoning complexity
Context window needs
Multimodal capabilities
Deployment environment
Security and governance requirements
Agent orchestration requirements

Azure AI Foundry provides access to multiple categories of models and tools that help developers build generative AI applications and AI agents efficiently.

For the AI-103 exam, you should understand:

When to use Large Language Models (LLMs)
When Small Language Models (SLMs) are preferable
When multimodal models are required
How Azure AI Foundry tools support model selection and orchestration
Tradeoffs between performance, cost, speed, and capability
Common real-world scenarios for each model category

Azure AI Foundry Overview

Azure AI Foundry is Microsoft’s unified platform for building, evaluating, deploying, and managing AI applications and agents.

Azure AI Foundry provides:

Access to foundation models
Agent development capabilities
Prompt engineering tools
Evaluation tools
Safety and content filtering
Retrieval-augmented generation (RAG) support
Fine-tuning capabilities
Monitoring and observability
Integration with Azure AI services

Azure AI Foundry enables developers to:

Compare multiple models
Test prompts
Evaluate outputs
Build AI agents
Connect enterprise data
Deploy scalable AI applications

For the AI-103 exam, understanding the relationship between model capabilities and Azure AI Foundry tools is extremely important.

Understanding Model Categories

The exam focuses heavily on selecting the correct model type for specific tasks.

The major categories include:

Large Language Models (LLMs)
Small Language Models (SLMs)
Multimodal Models
Embedding Models
Specialized Models

Each category serves different purposes.

Large Language Models (LLMs)

What Are Large Language Models?

Large Language Models are advanced AI models trained on massive datasets containing text, code, and other information.

LLMs are designed for:

Natural language understanding
Natural language generation
Complex reasoning
Summarization
Coding assistance
Question answering
Conversational AI
Agent workflows
Content creation

Examples include:

GPT-4 family models
GPT-4o models
GPT-4 Turbo
Phi large models
Other frontier foundation models available in Azure AI Foundry

Characteristics of LLMs

Strengths

LLMs are excellent at:

Complex Reasoning

Examples:

Multi-step problem solving
Data interpretation
Logical analysis
Decision support

Advanced Content Generation

Examples:

Marketing content
Technical documentation
Email drafting
Knowledge-base generation

Conversational Experiences

Examples:

AI chatbots
AI copilots
Virtual assistants
Interactive tutoring systems

Agentic Workflows

LLMs are commonly used as the “reasoning engine” behind AI agents.

They can:

Plan tasks
Determine next actions
Call tools
Use memory
Chain workflows
Interact with APIs

Limitations of LLMs

Although powerful, LLMs have tradeoffs.

Higher Cost

LLMs generally:

Require more compute
Cost more per token
Increase infrastructure expenses

Increased Latency

Larger models may:

Respond more slowly
Increase application response times
Affect real-time user experiences

Resource Requirements

LLMs require:

More GPU resources
More memory
Larger deployments

Overkill for Simple Tasks

Using GPT-4-level reasoning for basic classification or short summarization tasks may be unnecessary and expensive.

When to Use LLMs

Choose an LLM when tasks require:

Advanced reasoning
Long-context understanding
High-quality content generation
Complex conversational behavior
Tool calling and agent orchestration
Coding assistance
Sophisticated summarization
Enterprise copilots

Example LLM Scenarios

Scenario 1: Enterprise AI Copilot

A company wants an AI assistant that:

Reads internal documentation
Answers employee questions
Generates summaries
Explains policies
Uses tools and APIs

Best choice:

Large Language Model with RAG integration

Reason:

Requires reasoning and conversational understanding.

Scenario 2: AI Coding Assistant

A development team needs:

Code generation
Debugging suggestions
Refactoring support
Documentation generation

Best choice:

Advanced LLM

Reason:

Coding tasks require complex contextual reasoning.

Small Language Models (SLMs)

What Are Small Language Models?

Small Language Models are more lightweight AI models optimized for:

Faster responses
Lower costs
Lower resource consumption
Edge deployments
Narrower tasks

Examples include:

Smaller Phi models
Compact transformer-based models
Task-specific lightweight models

Characteristics of SLMs

Strengths

Lower Cost

SLMs:

Consume fewer resources
Cost less to run
Reduce token usage costs

Faster Inference

SLMs typically:

Respond more quickly
Improve responsiveness
Support near real-time interactions

Edge and Mobile Suitability

SLMs may run:

On edge devices
On mobile hardware
In constrained environments

Efficient for Narrow Tasks

SLMs work well for:

Classification
Basic summarization
Intent detection
Simple chat interactions
Lightweight automation

Limitations of SLMs

Reduced Reasoning Ability

Compared to LLMs, SLMs may struggle with:

Complex logic
Long context handling
Multi-step reasoning
Sophisticated conversations

Lower Output Quality

Outputs may:

Be less nuanced
Contain reduced detail
Provide weaker contextual understanding

When to Use SLMs

Choose an SLM when:

Speed is critical
Cost optimization matters
Tasks are relatively simple
Edge deployment is needed
High throughput is required
Lightweight AI experiences are sufficient

Example SLM Scenarios

Scenario 1: Customer Intent Classification

An application classifies support tickets into categories such as:

Billing
Technical support
Returns
Sales

Best choice:

Small Language Model

Reason:

Classification is relatively simple and does not require advanced reasoning.

Scenario 2: Edge Device Assistant

A manufacturing company deploys an AI assistant on factory equipment with limited compute.

Best choice:

Small Language Model

Reason:

Edge environments benefit from lightweight models.

Multimodal Models

What Are Multimodal Models?

Multimodal models can process multiple data types simultaneously.

Examples include:

Text
Images
Audio
Video
Documents

These models combine information across modalities to produce richer outputs.

Capabilities of Multimodal Models

Multimodal models can:

Analyze images and answer questions about them
Generate captions from images
Extract information from documents
Process speech and text together
Understand charts and diagrams
Support visual reasoning

Common Multimodal Tasks

Image Understanding

Examples:

Object detection
Scene analysis
Image captioning
Visual question answering

Document Intelligence

Examples:

Invoice extraction
Receipt processing
Form analysis
OCR workflows

Audio + Text Experiences

Examples:

Voice assistants
Meeting summarization
Speech transcription
Audio analysis

When to Use Multimodal Models

Choose multimodal models when applications involve:

Images and text together
Document processing
Speech interactions
Visual understanding
Cross-modal reasoning

Example Multimodal Scenarios

Scenario 1: Invoice Processing

A company needs to:

Read invoices
Extract totals
Identify vendors
Validate line items

Best choice:

Multimodal document processing model

Reason:

The solution must interpret both layout and text.

Scenario 2: Retail Image Assistant

Users upload photos of products and ask questions about them.

Best choice:

Multimodal model

Reason:

Requires simultaneous image and text understanding.

Embedding Models

What Are Embedding Models?

Embedding models convert text or other content into vector representations.

These vectors capture semantic meaning.

Embedding models are essential for:

Semantic search
Retrieval-Augmented Generation (RAG)
Similarity matching
Recommendation systems
Knowledge retrieval

Retrieval-Augmented Generation (RAG)

RAG combines:

Embedding models
Vector databases
LLMs

Workflow:

Convert documents into embeddings
Store embeddings in a vector index
Convert user query into embeddings
Retrieve relevant content
Send retrieved data to the LLM

RAG improves:

Accuracy
Freshness of information
Enterprise grounding
Hallucination reduction

Specialized Models

Some tasks are better handled by specialized AI models instead of general-purpose LLMs.

Examples:

Translation models
Speech models
OCR models
Vision models
Classification models

Why Specialized Models Matter

Specialized models may provide:

Better accuracy
Lower cost
Faster performance
Simpler deployment

Example:

Using a dedicated OCR service is often more efficient than asking an LLM to read text from images.

Model Selection Factors

The AI-103 exam heavily tests your ability to select the correct model based on requirements.

Factor 1: Task Complexity

Use LLMs For:

Advanced reasoning
Multi-step workflows
Complex conversations

Use SLMs For:

Simple classification
Lightweight interactions
Fast automation

Factor 2: Cost

LLMs

Higher operational cost
More expensive inference

SLMs

Lower operational cost
Better for high-volume workloads

Factor 3: Latency

Low-Latency Requirements

Prefer:

SLMs
Lightweight models

Complex Processing

Prefer:

LLMs

Even if response time increases.

Factor 4: Context Window

Some tasks require processing:

Long documents
Large conversations
Extensive histories

Choose models with larger context windows for:

Legal analysis
Knowledge assistants
Long-form summarization

Factor 5: Multimodal Requirements

If the application involves:

Images
Audio
Video
Documents

Choose multimodal-capable models.

Factor 6: Deployment Environment

Cloud-Hosted Applications

May use:

Large frontier models
GPU-intensive deployments

Edge or Mobile Deployments

Prefer:

Small models
Quantized models
Lightweight inference

Azure AI Foundry Tools

Azure AI Foundry includes numerous tools that support model selection and AI application development.

Model Catalog

The Model Catalog allows developers to:

Browse available models
Compare capabilities
Review benchmarks
Deploy models
Evaluate pricing

The catalog includes:

Microsoft-hosted models
Open-source models
Partner models
Frontier models

Prompt Flow

Prompt Flow helps developers:

Build AI workflows
Chain prompts together
Integrate tools
Evaluate prompts
Test model behavior

Prompt Flow is useful for:

Agent orchestration
RAG pipelines
Multi-step AI workflows

AI Agent Development Tools

Azure AI Foundry supports AI agents that can:

Use tools
Access data
Maintain memory
Perform actions
Execute workflows

Agent frameworks may include:

Tool calling
Function calling
Retrieval integration
Multi-agent orchestration

Evaluation Tools

Evaluation tools help developers assess:

Accuracy
Groundedness
Safety
Relevance
Latency
Cost

Evaluation is critical because model quality varies by task.

Content Safety Tools

Azure AI Foundry includes safety features such as:

Content filtering
Harm detection
Prompt injection detection
Responsible AI controls

These tools help ensure safe AI deployments.

Fine-Tuning Tools

Fine-tuning allows developers to customize models using:

Domain-specific data
Proprietary terminology
Specialized workflows

Fine-tuning may improve:

Accuracy
Consistency
Industry-specific responses

However, fine-tuning also:

Increases cost
Requires data preparation
Adds operational complexity

Choosing Between Prompt Engineering, RAG, and Fine-Tuning

This is a very important AI-103 exam topic.

Prompt Engineering

Use when:

You need quick customization
Tasks are general-purpose
No private data integration is needed

Advantages:

Fast
Cheap
Easy to maintain

RAG

Use when:

You need current or proprietary data
You want grounding in enterprise content
You need dynamic knowledge retrieval

Advantages:

Reduces hallucinations
Keeps knowledge current
Avoids retraining

Fine-Tuning

Use when:

Consistent specialized outputs are required
Domain language is highly unique
Behavioral customization is necessary

Advantages:

Tailored responses
Better domain alignment

Real-World Model Selection Examples

Example 1: FAQ Chatbot

Requirements:

Low cost
Fast responses
Basic conversational support

Best Choice:

Small Language Model + RAG

Example 2: Legal Document Assistant

Requirements:

Long-context understanding
Detailed summarization
Advanced reasoning

Best Choice:

Large Language Model with large context window

Example 3: Mobile AI App

Requirements:

Offline capability
Fast performance
Low resource usage

Best Choice:

Small Language Model

Example 4: Image-Based Customer Support

Requirements:

Analyze uploaded photos
Understand text and images
Generate responses

Best Choice:

Multimodal model

Key AI-103 Exam Tips

Understand Tradeoffs

You should know:

Bigger models are not always better
Simpler tasks may not require advanced LLMs
Cost and latency matter
Specialized models may outperform general models

Know Common Pairings

LLM + RAG

Used for:

Enterprise chatbots
Knowledge assistants
AI copilots

Embeddings + Vector Search

Used for:

Semantic search
Knowledge retrieval
Similarity matching

Multimodal Models

Used for:

Vision AI
Document processing
Audio interactions

Learn the Azure AI Foundry Ecosystem

Know the purpose of:

Model Catalog
Prompt Flow
Evaluation tools
Agent tools
Safety systems
Fine-tuning workflows

Summary

Selecting the correct AI model is one of the most important responsibilities for an Azure AI developer.

For the AI-103 exam, you should understand:

The differences between LLMs and SLMs
When multimodal models are required
How embedding models support RAG
When specialized models outperform general-purpose models
The tradeoffs between cost, speed, and reasoning capability
How Azure AI Foundry tools support AI development and orchestration

In real-world AI systems, choosing the correct model can dramatically improve:

Performance
User experience
Scalability
Operational cost
Reliability
Maintainability

A strong understanding of model selection is essential for designing effective Azure AI applications and AI agents.

Practice Exam Questions

Question 1

A company is building an enterprise AI assistant that must answer complex employee questions using internal documentation and perform multi-step reasoning. Which model type is MOST appropriate?

A. Small Language Model (SLM)
B. Embedding model only
C. Large Language Model (LLM)
D. OCR model

Answer

C. Large Language Model (LLM)

Explanation

Complex reasoning and conversational understanding are best handled by LLMs.

Question 2

Which model type is generally BEST for low-cost, low-latency classification tasks?

A. Large multimodal model
B. Small Language Model (SLM)
C. GPT-4-class reasoning model
D. Vision foundation model

Answer

B. Small Language Model (SLM)

Explanation

SLMs are optimized for lightweight and cost-efficient tasks.

Question 3

A solution must process uploaded invoices and extract totals, vendor names, and line items. Which model type is MOST appropriate?

A. Embedding model
B. Small Language Model
C. Multimodal model
D. Translation model

Answer

C. Multimodal model

Explanation

Invoice extraction requires understanding both layout and text.

Question 4

What is the primary purpose of embedding models?

A. Image generation
B. Semantic vector representation
C. Audio transcription
D. Tool orchestration

Answer

B. Semantic vector representation

Explanation

Embedding models convert content into vectors for semantic search and retrieval.

Question 5

Which Azure AI Foundry tool helps developers chain prompts, integrate tools, and build AI workflows?

A. Azure Monitor
B. Prompt Flow
C. Azure Policy
D. Azure Functions

Answer

B. Prompt Flow

Explanation

Prompt Flow is designed for workflow orchestration and prompt pipelines.

Question 6

A mobile AI application must operate with minimal compute resources and very fast response times. Which model type is MOST appropriate?

A. Large Language Model
B. Small Language Model
C. Large multimodal model
D. High-context reasoning model

Answer

B. Small Language Model

Explanation

SLMs are optimized for lightweight and edge deployments.

Question 7

Which approach is BEST when an AI chatbot must use current enterprise data without retraining the model?

A. Fine-tuning only
B. Prompt engineering only
C. Retrieval-Augmented Generation (RAG)
D. Quantization

Answer

C. Retrieval-Augmented Generation (RAG)

Explanation

RAG retrieves current information dynamically without retraining.

Question 8

Which factor MOST strongly indicates that a multimodal model is required?

A. Need for vector embeddings
B. Need for faster response times
C. Need to process images and text together
D. Need for lower cost

Answer

C. Need to process images and text together

Explanation

Multimodal models handle multiple input modalities simultaneously.

Question 9

What is a major tradeoff of using larger language models?

A. Reduced reasoning capability
B. Lower context windows
C. Increased operational cost
D. Inability to support agents

Answer

C. Increased operational cost

Explanation

Larger models typically require more compute resources and cost more.

Question 10

Which Azure AI Foundry capability helps evaluate model quality, safety, and groundedness?

A. Azure Load Balancer
B. Evaluation tools
C. Azure Backup
D. Traffic Manager

Answer

B. Evaluation tools

Explanation

Evaluation tools assess output quality, safety, and performance metrics.

Go to the AI-103 Exam Prep Hub main page