This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement generative AI and agentic solutions (30–35%)
   --> Optimize and operationalize generative AI systems
      --> Tune generation behavior, such as prompt engineering and adjusting model parameters

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important responsibilities of an AI developer is controlling and optimizing the behavior of generative AI systems. Large language models (LLMs) are highly flexible, but without proper tuning, prompts, and parameter adjustments, responses may become inaccurate, inconsistent, unsafe, verbose, expensive, or irrelevant.

For the AI-103 certification exam, candidates must understand how to tune generation behavior in Azure AI Foundry and related Azure AI services. This includes:

Prompt engineering
System messages
Few-shot prompting
Context management
Retrieval grounding
Adjusting model parameters
Temperature tuning
Token limits
Sampling controls
Output formatting
Structured outputs
Response optimization
Safety tuning
Evaluation and iteration

This article explains the concepts, techniques, tools, and best practices needed to tune generative AI systems effectively.

What Does “Generation Behavior” Mean?

Generation behavior refers to how a generative AI model responds to prompts and tasks.

Behavior includes:

Creativity
Accuracy
Consistency
Verbosity
Tone
Reasoning style
Formatting
Safety
Tool usage behavior
Retrieval usage
Determinism
Hallucination tendency

Developers influence generation behavior primarily through:

Prompt engineering
Model parameter tuning
Grounding and retrieval
Tool orchestration
Safety configurations
Output constraints

Prompt Engineering

What Is Prompt Engineering?

Prompt engineering is the process of designing prompts that guide the model toward desired outputs.

A prompt may include:

Instructions
Context
Examples
Constraints
Formatting requirements
Role definitions
Retrieved content

Effective prompting significantly improves:

Accuracy
Relevance
Safety
Consistency
User experience

Types of Prompts

System Prompts

System prompts define the overall behavior and rules for the model.

Examples:

“You are a professional customer support assistant.”
“Always answer using concise bullet points.”
“Do not provide legal advice.”

System prompts are extremely important in agent systems.

They establish:

Personality
Tone
Safety rules
Tool usage guidance
Behavioral boundaries

User Prompts

User prompts contain the actual request from the user.

Example:

Summarize this sales report.

Assistant Messages

Assistant messages represent prior model responses in conversational systems.

These messages help maintain:

Context
Continuity
Conversation memory

Zero-Shot Prompting

Zero-shot prompting provides instructions without examples.

Example:

Classify the sentiment of this review as positive, negative, or neutral.

Advantages:

Simple
Fast
Efficient

Disadvantages:

Less consistent
More variability

Few-Shot Prompting

Few-shot prompting includes examples that demonstrate desired behavior.

Example:

			
Review: The food was amazing.
Sentiment: Positive
Review: The service was terrible.
Sentiment: Negative
Review: The hotel was acceptable.
Sentiment:

		

Advantages:

Better consistency
Improved formatting
Improved reasoning

Disadvantages:

Uses more tokens
Increases cost

Chain-of-Thought Prompting

Chain-of-thought prompting encourages step-by-step reasoning.

Example:

Explain your reasoning step by step.

Useful for:

Math
Logic
Planning
Multistep reasoning

Benefits:

Improved reasoning quality
Better transparency

Risks:

Higher token usage
Longer latency

Role Prompting

Role prompting assigns a specific role or identity.

Examples:

Financial analyst
Teacher
Security auditor
Travel planner

Example:

You are an experienced cloud architect specializing in Azure AI.

Role prompting improves domain alignment.

Context Injection

Context injection provides supporting information within prompts.

Example:

Use the following company policy when answering:

Context may come from:

Documents
Databases
APIs
Azure AI Search
Knowledge stores

This is a core concept in RAG systems.

Prompt Templates

Prompt templates standardize prompts dynamically.

Example:

			
Summarize the following document in {language}:
{document}

Benefits:

Reusability
Maintainability
Consistency

Prompt Chaining

Prompt chaining breaks complex tasks into smaller prompts.

Example workflow:

Extract key topics
Summarize each topic
Generate final report

Advantages:

Better reasoning
Improved reliability
Easier debugging

Retrieval-Augmented Prompting

Retrieval-augmented generation (RAG) adds retrieved content into prompts.

Example:

Answer using only the following documents.

Benefits:

Reduced hallucinations
Better grounding
More current information

Structured Output Prompting

Developers often require structured outputs.

Example:

Return the response as JSON.

Benefits:

Easier parsing
API integration
Workflow automation

Structured outputs are common in:

Agents
Automation systems
Function calling

Prompt Engineering Best Practices

Be Clear and Specific

Bad prompt:

Tell me about Azure.

Better prompt:

Explain Azure AI Foundry for beginners in fewer than 200 words.

Define Constraints

Examples:

Maximum length
Formatting rules
Safety restrictions
Source limitations

Use Examples

Few-shot examples improve consistency.

Reduce Ambiguity

Ambiguous prompts produce inconsistent results.

Test and Iterate

Prompt engineering is iterative.

Developers should continuously evaluate and improve prompts.

Model Parameters

Model parameters strongly affect output behavior.

Important parameters include:

Temperature
Top-p
Maximum tokens
Frequency penalty
Presence penalty
Stop sequences

Temperature

What Is Temperature?

Temperature controls randomness in model outputs.

Lower temperature:

More deterministic
More focused
Less creative

Higher temperature:

More creative
More diverse
Less predictable

Low Temperature Examples

Typical range:

0.0 – 0.3

Best for:

Fact-based answers
Technical support
Classification
Compliance workflows

High Temperature Examples

Typical range:

0.7 – 1.0

Best for:

Brainstorming
Creative writing
Marketing ideas
Story generation

Top-p Sampling

Top-p controls token selection diversity.

The model considers only the most probable tokens whose cumulative probability reaches p.

Lower top-p:

More focused responses
Less diversity

Higher top-p:

More varied responses

Temperature and top-p often work together.

Maximum Tokens

Maximum tokens limit response length.

Benefits:

Cost control
Latency reduction
Preventing excessive responses

Risks:

Responses may be truncated if limit is too low.

Frequency Penalty

Frequency penalty reduces repeated words or phrases.

Useful for:

Avoiding repetition
Improving readability

Presence Penalty

Presence penalty encourages introducing new topics.

Higher presence penalty:

More topic diversity
Less repetition

Stop Sequences

Stop sequences define where generation should stop.

Example:

Stop when “END_RESPONSE” appears.

Useful for:

Structured outputs
Tool workflows
Multi-agent orchestration

Deterministic vs Creative Behavior

Deterministic Systems

Characteristics:

Consistent outputs
Repeatable behavior
Lower creativity

Best for:

Enterprise workflows
Compliance systems
Customer support
Automation

Recommended settings:

Low temperature
Lower top-p

Creative Systems

Characteristics:

Diverse outputs
More exploration
Greater variability

Best for:

Ideation
Content creation
Brainstorming

Recommended settings:

Higher temperature
Higher top-p

Tuning for RAG Applications

RAG systems require special tuning.

Developers should optimize:

Retrieval quality
Prompt grounding
Context window usage
Citation instructions
Hallucination reduction

Example grounding instruction:

Answer only using the retrieved documents.

Tuning Agent Systems

Agents require additional behavioral tuning.

Developers tune:

Tool usage behavior
Planning behavior
Memory usage
Conversation flow
Escalation behavior
Approval workflows

Example:

Only call the refund API after confirming the user identity.

Function Calling and Structured Generation

Models can generate structured tool calls.

Example JSON schema:

			
{
  "city": "Orlando",
  "unit": "Fahrenheit"
}

Prompt tuning improves:

Schema adherence
Parameter accuracy
Tool selection

Controlling Hallucinations

Hallucinations are a major tuning challenge.

Methods to reduce hallucinations:

Lower temperature
Use grounding
Improve retrieval
Add citation requirements
Use smaller focused prompts
Add explicit instructions

Example:

If the answer is not found in the documents, say you do not know.

Safety-Oriented Prompting

Prompts should include safety constraints.

Examples:

Do not generate harmful or unsafe instructions.

Safety prompting helps:

Reduce harmful outputs
Prevent jailbreaks
Enforce policy compliance

Prompt Injection Defense

Attackers may attempt prompt injection.

Example:

Ignore all previous instructions.

Defensive techniques:

Strong system prompts
Tool restrictions
Output validation
Context isolation
Human approval workflows

Evaluating Prompt Quality

Developers evaluate prompts using:

Accuracy metrics
Grounding scores
User feedback
Safety evaluations
Latency measurements
Cost analysis

Prompt quality evaluation is iterative.

A/B Testing Prompts

A/B testing compares multiple prompts.

Example:

Prompt A produces concise responses.
Prompt B produces detailed responses.

Metrics determine which prompt performs better.

Cost Optimization Through Tuning

Good tuning reduces costs.

Strategies include:

Smaller prompts
Lower token counts
Smaller models
Efficient retrieval
Reduced chain-of-thought usage

Azure AI Foundry Support for Tuning

Azure AI Foundry supports:

Prompt flow design
Model evaluation
Safety evaluations
Deployment management
Agent orchestration
Evaluation pipelines
Monitoring and telemetry

Developers can iterate quickly and compare outputs.

Common Tuning Mistakes

Overly Long Prompts

Problems:

Increased cost
Higher latency
Context dilution

Excessive Temperature

Problems:

Hallucinations
Inconsistent outputs
Unsafe behavior

Weak Instructions

Problems:

Ambiguous responses
Poor formatting
Incorrect tool usage

Lack of Evaluation

Problems:

Hidden failures
Safety risks
Poor user experience

Real-World Examples

Customer Support Bot

Goals:

Accurate answers
Consistent tone
Fast responses

Recommended settings:

Low temperature
Grounded retrieval
Structured outputs

Creative Writing Assistant

Goals:

Diverse ideas
Creative language
Engaging responses

Recommended settings:

Higher temperature
Higher top-p

Financial Advisory Agent

Goals:

High accuracy
Low hallucination risk
Compliance adherence

Recommended settings:

Very low temperature
Strict grounding
Human approval workflows

AI-103 Exam Tips

For the AI-103 exam, remember these key points:

Prompt engineering strongly influences model behavior.
System prompts define overall agent behavior.
Few-shot prompting improves consistency.
Lower temperature produces more deterministic outputs.
Higher temperature increases creativity.
Top-p controls response diversity.
Maximum tokens control output length.
RAG improves grounding and reduces hallucinations.
Structured outputs are important for tool workflows.
Prompt tuning is iterative and evaluation-driven.
Safety prompting helps reduce harmful outputs.
Prompt injection is a security concern.

Higher temperature increases randomness and may reduce reliability.

Final Thoughts

Tuning generation behavior is one of the most important skills for modern AI developers. Through effective prompt engineering and careful parameter tuning, developers can optimize AI systems for accuracy, safety, cost efficiency, consistency, and user satisfaction.

For the AI-103 exam, candidates should understand: