This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub.
This topic falls under these sections:
Implement generative AI and agentic solutions (30–35%)
--> Optimize and operationalize generative AI systems
--> Tune generation behavior, such as prompt engineering and adjusting model parameters
Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
One of the most important responsibilities of an AI developer is controlling and optimizing the behavior of generative AI systems. Large language models (LLMs) are highly flexible, but without proper tuning, prompts, and parameter adjustments, responses may become inaccurate, inconsistent, unsafe, verbose, expensive, or irrelevant.
For the AI-103 certification exam, candidates must understand how to tune generation behavior in Azure AI Foundry and related Azure AI services. This includes:
- Prompt engineering
- System messages
- Few-shot prompting
- Context management
- Retrieval grounding
- Adjusting model parameters
- Temperature tuning
- Token limits
- Sampling controls
- Output formatting
- Structured outputs
- Response optimization
- Safety tuning
- Evaluation and iteration
This article explains the concepts, techniques, tools, and best practices needed to tune generative AI systems effectively.
What Does “Generation Behavior” Mean?
Generation behavior refers to how a generative AI model responds to prompts and tasks.
Behavior includes:
- Creativity
- Accuracy
- Consistency
- Verbosity
- Tone
- Reasoning style
- Formatting
- Safety
- Tool usage behavior
- Retrieval usage
- Determinism
- Hallucination tendency
Developers influence generation behavior primarily through:
- Prompt engineering
- Model parameter tuning
- Grounding and retrieval
- Tool orchestration
- Safety configurations
- Output constraints
Prompt Engineering
What Is Prompt Engineering?
Prompt engineering is the process of designing prompts that guide the model toward desired outputs.
A prompt may include:
- Instructions
- Context
- Examples
- Constraints
- Formatting requirements
- Role definitions
- Retrieved content
Effective prompting significantly improves:
- Accuracy
- Relevance
- Safety
- Consistency
- User experience
Types of Prompts
System Prompts
System prompts define the overall behavior and rules for the model.
Examples:
- “You are a professional customer support assistant.”
- “Always answer using concise bullet points.”
- “Do not provide legal advice.”
System prompts are extremely important in agent systems.
They establish:
- Personality
- Tone
- Safety rules
- Tool usage guidance
- Behavioral boundaries
User Prompts
User prompts contain the actual request from the user.
Example:
Summarize this sales report.
Assistant Messages
Assistant messages represent prior model responses in conversational systems.
These messages help maintain:
- Context
- Continuity
- Conversation memory
Zero-Shot Prompting
Zero-shot prompting provides instructions without examples.
Example:
Classify the sentiment of this review as positive, negative, or neutral.
Advantages:
- Simple
- Fast
- Efficient
Disadvantages:
- Less consistent
- More variability
Few-Shot Prompting
Few-shot prompting includes examples that demonstrate desired behavior.
Example:
Review: The food was amazing.Sentiment: PositiveReview: The service was terrible.Sentiment: NegativeReview: The hotel was acceptable.Sentiment:
Advantages:
- Better consistency
- Improved formatting
- Improved reasoning
Disadvantages:
- Uses more tokens
- Increases cost
Chain-of-Thought Prompting
Chain-of-thought prompting encourages step-by-step reasoning.
Example:
Explain your reasoning step by step.
Useful for:
- Math
- Logic
- Planning
- Multistep reasoning
Benefits:
- Improved reasoning quality
- Better transparency
Risks:
- Higher token usage
- Longer latency
Role Prompting
Role prompting assigns a specific role or identity.
Examples:
- Financial analyst
- Teacher
- Security auditor
- Travel planner
Example:
You are an experienced cloud architect specializing in Azure AI.
Role prompting improves domain alignment.
Context Injection
Context injection provides supporting information within prompts.
Example:
Use the following company policy when answering:
Context may come from:
- Documents
- Databases
- APIs
- Azure AI Search
- Knowledge stores
This is a core concept in RAG systems.
Prompt Templates
Prompt templates standardize prompts dynamically.
Example:
Summarize the following document in {language}:{document}
Benefits:
- Reusability
- Maintainability
- Consistency
Prompt Chaining
Prompt chaining breaks complex tasks into smaller prompts.
Example workflow:
- Extract key topics
- Summarize each topic
- Generate final report
Advantages:
- Better reasoning
- Improved reliability
- Easier debugging
Retrieval-Augmented Prompting
Retrieval-augmented generation (RAG) adds retrieved content into prompts.
Example:
Answer using only the following documents.
Benefits:
- Reduced hallucinations
- Better grounding
- More current information
Structured Output Prompting
Developers often require structured outputs.
Example:
Return the response as JSON.
Benefits:
- Easier parsing
- API integration
- Workflow automation
Structured outputs are common in:
- Agents
- Automation systems
- Function calling
Prompt Engineering Best Practices
Be Clear and Specific
Bad prompt:
Tell me about Azure.
Better prompt:
Explain Azure AI Foundry for beginners in fewer than 200 words.
Define Constraints
Examples:
- Maximum length
- Formatting rules
- Safety restrictions
- Source limitations
Use Examples
Few-shot examples improve consistency.
Reduce Ambiguity
Ambiguous prompts produce inconsistent results.
Test and Iterate
Prompt engineering is iterative.
Developers should continuously evaluate and improve prompts.
Model Parameters
Model parameters strongly affect output behavior.
Important parameters include:
- Temperature
- Top-p
- Maximum tokens
- Frequency penalty
- Presence penalty
- Stop sequences
Temperature
What Is Temperature?
Temperature controls randomness in model outputs.
Lower temperature:
- More deterministic
- More focused
- Less creative
Higher temperature:
- More creative
- More diverse
- Less predictable
Low Temperature Examples
Typical range:
0.0 – 0.3
Best for:
- Fact-based answers
- Technical support
- Classification
- Compliance workflows
High Temperature Examples
Typical range:
0.7 – 1.0
Best for:
- Brainstorming
- Creative writing
- Marketing ideas
- Story generation
Top-p Sampling
Top-p controls token selection diversity.
The model considers only the most probable tokens whose cumulative probability reaches p.
Lower top-p:
- More focused responses
- Less diversity
Higher top-p:
- More varied responses
Temperature and top-p often work together.
Maximum Tokens
Maximum tokens limit response length.
Benefits:
- Cost control
- Latency reduction
- Preventing excessive responses
Risks:
- Responses may be truncated if limit is too low.
Frequency Penalty
Frequency penalty reduces repeated words or phrases.
Useful for:
- Avoiding repetition
- Improving readability
Presence Penalty
Presence penalty encourages introducing new topics.
Higher presence penalty:
- More topic diversity
- Less repetition
Stop Sequences
Stop sequences define where generation should stop.
Example:
Stop when “END_RESPONSE” appears.
Useful for:
- Structured outputs
- Tool workflows
- Multi-agent orchestration
Deterministic vs Creative Behavior
Deterministic Systems
Characteristics:
- Consistent outputs
- Repeatable behavior
- Lower creativity
Best for:
- Enterprise workflows
- Compliance systems
- Customer support
- Automation
Recommended settings:
- Low temperature
- Lower top-p
Creative Systems
Characteristics:
- Diverse outputs
- More exploration
- Greater variability
Best for:
- Ideation
- Content creation
- Brainstorming
Recommended settings:
- Higher temperature
- Higher top-p
Tuning for RAG Applications
RAG systems require special tuning.
Developers should optimize:
- Retrieval quality
- Prompt grounding
- Context window usage
- Citation instructions
- Hallucination reduction
Example grounding instruction:
Answer only using the retrieved documents.
Tuning Agent Systems
Agents require additional behavioral tuning.
Developers tune:
- Tool usage behavior
- Planning behavior
- Memory usage
- Conversation flow
- Escalation behavior
- Approval workflows
Example:
Only call the refund API after confirming the user identity.
Function Calling and Structured Generation
Models can generate structured tool calls.
Example JSON schema:
{ "city": "Orlando", "unit": "Fahrenheit"}
Prompt tuning improves:
- Schema adherence
- Parameter accuracy
- Tool selection
Controlling Hallucinations
Hallucinations are a major tuning challenge.
Methods to reduce hallucinations:
- Lower temperature
- Use grounding
- Improve retrieval
- Add citation requirements
- Use smaller focused prompts
- Add explicit instructions
Example:
If the answer is not found in the documents, say you do not know.
Safety-Oriented Prompting
Prompts should include safety constraints.
Examples:
Do not generate harmful or unsafe instructions.
Safety prompting helps:
- Reduce harmful outputs
- Prevent jailbreaks
- Enforce policy compliance
Prompt Injection Defense
Attackers may attempt prompt injection.
Example:
Ignore all previous instructions.
Defensive techniques:
- Strong system prompts
- Tool restrictions
- Output validation
- Context isolation
- Human approval workflows
Evaluating Prompt Quality
Developers evaluate prompts using:
- Accuracy metrics
- Grounding scores
- User feedback
- Safety evaluations
- Latency measurements
- Cost analysis
Prompt quality evaluation is iterative.
A/B Testing Prompts
A/B testing compares multiple prompts.
Example:
- Prompt A produces concise responses.
- Prompt B produces detailed responses.
Metrics determine which prompt performs better.
Cost Optimization Through Tuning
Good tuning reduces costs.
Strategies include:
- Smaller prompts
- Lower token counts
- Smaller models
- Efficient retrieval
- Reduced chain-of-thought usage
Azure AI Foundry Support for Tuning
Azure AI Foundry supports:
- Prompt flow design
- Model evaluation
- Safety evaluations
- Deployment management
- Agent orchestration
- Evaluation pipelines
- Monitoring and telemetry
Developers can iterate quickly and compare outputs.
Common Tuning Mistakes
Overly Long Prompts
Problems:
- Increased cost
- Higher latency
- Context dilution
Excessive Temperature
Problems:
- Hallucinations
- Inconsistent outputs
- Unsafe behavior
Weak Instructions
Problems:
- Ambiguous responses
- Poor formatting
- Incorrect tool usage
Lack of Evaluation
Problems:
- Hidden failures
- Safety risks
- Poor user experience
Real-World Examples
Customer Support Bot
Goals:
- Accurate answers
- Consistent tone
- Fast responses
Recommended settings:
- Low temperature
- Grounded retrieval
- Structured outputs
Creative Writing Assistant
Goals:
- Diverse ideas
- Creative language
- Engaging responses
Recommended settings:
- Higher temperature
- Higher top-p
Financial Advisory Agent
Goals:
- High accuracy
- Low hallucination risk
- Compliance adherence
Recommended settings:
- Very low temperature
- Strict grounding
- Human approval workflows
AI-103 Exam Tips
For the AI-103 exam, remember these key points:
- Prompt engineering strongly influences model behavior.
- System prompts define overall agent behavior.
- Few-shot prompting improves consistency.
- Lower temperature produces more deterministic outputs.
- Higher temperature increases creativity.
- Top-p controls response diversity.
- Maximum tokens control output length.
- RAG improves grounding and reduces hallucinations.
- Structured outputs are important for tool workflows.
- Prompt tuning is iterative and evaluation-driven.
- Safety prompting helps reduce harmful outputs.
- Prompt injection is a security concern.
Practice Exam Questions
Question 1
What is the primary purpose of prompt engineering?
A. Increase GPU memory
B. Guide the model toward desired outputs
C. Eliminate all costs
D. Replace embeddings
Correct Answer
B. Guide the model toward desired outputs
Explanation
Prompt engineering designs prompts that improve accuracy, consistency, formatting, and safety.
Question 2
Which parameter most directly controls output randomness?
A. Max tokens
B. Presence penalty
C. Temperature
D. Context window
Correct Answer
C. Temperature
Explanation
Temperature controls response randomness and creativity.
Question 3
What is a common benefit of few-shot prompting?
A. Reduced token usage
B. Better output consistency
C. Elimination of latency
D. Automatic vector search
Correct Answer
B. Better output consistency
Explanation
Few-shot examples help models understand desired formatting and behavior.
Question 4
Which setting is most appropriate for a compliance-focused enterprise chatbot?
A. High temperature
B. Very low temperature
C. Maximum randomness
D. No grounding
Correct Answer
B. Very low temperature
Explanation
Compliance systems require deterministic and reliable outputs.
Question 5
What is the purpose of maximum token settings?
A. Control response length
B. Increase retrieval quality
C. Encrypt prompts
D. Replace embeddings
Correct Answer
A. Control response length
Explanation
Maximum tokens limit the size of generated responses.
Question 6
Which technique helps reduce hallucinations in RAG systems?
A. Increasing randomness
B. Removing retrieval
C. Grounding responses in retrieved content
D. Eliminating prompts
Correct Answer
C. Grounding responses in retrieved content
Explanation
Grounding helps models answer using trusted retrieved information.
Question 7
What is a system prompt primarily used for?
A. Storing embeddings
B. Defining overall model behavior and rules
C. Encrypting responses
D. Monitoring latency
Correct Answer
B. Defining overall model behavior and rules
Explanation
System prompts establish tone, constraints, and behavioral guidance.
Question 8
What is the purpose of structured output prompting?
A. Improve network routing
B. Produce machine-readable outputs such as JSON
C. Reduce GPU utilization
D. Increase hallucinations
Correct Answer
B. Produce machine-readable outputs such as JSON
Explanation
Structured outputs simplify automation and API integration.
Question 9
Which tuning strategy is most likely to reduce cost?
A. Increasing token usage
B. Using unnecessarily large prompts
C. Reducing prompt size and response length
D. Maximizing chain-of-thought reasoning for every request
Correct Answer
C. Reducing prompt size and response length
Explanation
Smaller prompts and shorter outputs reduce token consumption.
Question 10
What is a major risk of setting temperature too high?
A. Reduced creativity
B. Increased hallucinations and inconsistency
C. Elimination of variability
D. Reduced response diversity
Correct Answer
B. Increased hallucinations and inconsistency
Explanation
Higher temperature increases randomness and may reduce reliability.
Final Thoughts
Tuning generation behavior is one of the most important skills for modern AI developers. Through effective prompt engineering and careful parameter tuning, developers can optimize AI systems for accuracy, safety, cost efficiency, consistency, and user satisfaction.
For the AI-103 exam, candidates should understand:
- Prompt engineering strategies
- System prompts and role prompting
- Few-shot and chain-of-thought prompting
- Temperature and top-p tuning
- Structured outputs
- Hallucination reduction techniques
- Safety prompting
- RAG grounding strategies
- Cost optimization methods
- Prompt evaluation and iteration
Strong tuning practices are essential for building reliable, production-grade AI applications and agents on Azure.
Go to the AI-103 Exam Prep Hub main page
