Tune generation behavior, such as prompt engineering and adjusting model parameters (AI-103 Exam Prep)

This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Implement generative AI and agentic solutions (30–35%)
--> Optimize and operationalize generative AI systems
--> Tune generation behavior, such as prompt engineering and adjusting model parameters


Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important responsibilities of an AI developer is controlling and optimizing the behavior of generative AI systems. Large language models (LLMs) are highly flexible, but without proper tuning, prompts, and parameter adjustments, responses may become inaccurate, inconsistent, unsafe, verbose, expensive, or irrelevant.

For the AI-103 certification exam, candidates must understand how to tune generation behavior in Azure AI Foundry and related Azure AI services. This includes:

  • Prompt engineering
  • System messages
  • Few-shot prompting
  • Context management
  • Retrieval grounding
  • Adjusting model parameters
  • Temperature tuning
  • Token limits
  • Sampling controls
  • Output formatting
  • Structured outputs
  • Response optimization
  • Safety tuning
  • Evaluation and iteration

This article explains the concepts, techniques, tools, and best practices needed to tune generative AI systems effectively.


What Does “Generation Behavior” Mean?

Generation behavior refers to how a generative AI model responds to prompts and tasks.

Behavior includes:

  • Creativity
  • Accuracy
  • Consistency
  • Verbosity
  • Tone
  • Reasoning style
  • Formatting
  • Safety
  • Tool usage behavior
  • Retrieval usage
  • Determinism
  • Hallucination tendency

Developers influence generation behavior primarily through:

  1. Prompt engineering
  2. Model parameter tuning
  3. Grounding and retrieval
  4. Tool orchestration
  5. Safety configurations
  6. Output constraints

Prompt Engineering

What Is Prompt Engineering?

Prompt engineering is the process of designing prompts that guide the model toward desired outputs.

A prompt may include:

  • Instructions
  • Context
  • Examples
  • Constraints
  • Formatting requirements
  • Role definitions
  • Retrieved content

Effective prompting significantly improves:

  • Accuracy
  • Relevance
  • Safety
  • Consistency
  • User experience

Types of Prompts

System Prompts

System prompts define the overall behavior and rules for the model.

Examples:

  • “You are a professional customer support assistant.”
  • “Always answer using concise bullet points.”
  • “Do not provide legal advice.”

System prompts are extremely important in agent systems.

They establish:

  • Personality
  • Tone
  • Safety rules
  • Tool usage guidance
  • Behavioral boundaries

User Prompts

User prompts contain the actual request from the user.

Example:

Summarize this sales report.

Assistant Messages

Assistant messages represent prior model responses in conversational systems.

These messages help maintain:

  • Context
  • Continuity
  • Conversation memory

Zero-Shot Prompting

Zero-shot prompting provides instructions without examples.

Example:

Classify the sentiment of this review as positive, negative, or neutral.

Advantages:

  • Simple
  • Fast
  • Efficient

Disadvantages:

  • Less consistent
  • More variability

Few-Shot Prompting

Few-shot prompting includes examples that demonstrate desired behavior.

Example:

Review: The food was amazing.
Sentiment: Positive
Review: The service was terrible.
Sentiment: Negative
Review: The hotel was acceptable.
Sentiment:

Advantages:

  • Better consistency
  • Improved formatting
  • Improved reasoning

Disadvantages:

  • Uses more tokens
  • Increases cost

Chain-of-Thought Prompting

Chain-of-thought prompting encourages step-by-step reasoning.

Example:

Explain your reasoning step by step.

Useful for:

  • Math
  • Logic
  • Planning
  • Multistep reasoning

Benefits:

  • Improved reasoning quality
  • Better transparency

Risks:

  • Higher token usage
  • Longer latency

Role Prompting

Role prompting assigns a specific role or identity.

Examples:

  • Financial analyst
  • Teacher
  • Security auditor
  • Travel planner

Example:

You are an experienced cloud architect specializing in Azure AI.

Role prompting improves domain alignment.


Context Injection

Context injection provides supporting information within prompts.

Example:

Use the following company policy when answering:

Context may come from:

  • Documents
  • Databases
  • APIs
  • Azure AI Search
  • Knowledge stores

This is a core concept in RAG systems.


Prompt Templates

Prompt templates standardize prompts dynamically.

Example:

Summarize the following document in {language}:
{document}

Benefits:

  • Reusability
  • Maintainability
  • Consistency

Prompt Chaining

Prompt chaining breaks complex tasks into smaller prompts.

Example workflow:

  1. Extract key topics
  2. Summarize each topic
  3. Generate final report

Advantages:

  • Better reasoning
  • Improved reliability
  • Easier debugging

Retrieval-Augmented Prompting

Retrieval-augmented generation (RAG) adds retrieved content into prompts.

Example:

Answer using only the following documents.

Benefits:

  • Reduced hallucinations
  • Better grounding
  • More current information

Structured Output Prompting

Developers often require structured outputs.

Example:

Return the response as JSON.

Benefits:

  • Easier parsing
  • API integration
  • Workflow automation

Structured outputs are common in:

  • Agents
  • Automation systems
  • Function calling

Prompt Engineering Best Practices

Be Clear and Specific

Bad prompt:

Tell me about Azure.

Better prompt:

Explain Azure AI Foundry for beginners in fewer than 200 words.

Define Constraints

Examples:

  • Maximum length
  • Formatting rules
  • Safety restrictions
  • Source limitations

Use Examples

Few-shot examples improve consistency.


Reduce Ambiguity

Ambiguous prompts produce inconsistent results.


Test and Iterate

Prompt engineering is iterative.

Developers should continuously evaluate and improve prompts.


Model Parameters

Model parameters strongly affect output behavior.

Important parameters include:

  • Temperature
  • Top-p
  • Maximum tokens
  • Frequency penalty
  • Presence penalty
  • Stop sequences

Temperature

What Is Temperature?

Temperature controls randomness in model outputs.

Lower temperature:

  • More deterministic
  • More focused
  • Less creative

Higher temperature:

  • More creative
  • More diverse
  • Less predictable

Low Temperature Examples

Typical range:

0.0 – 0.3

Best for:

  • Fact-based answers
  • Technical support
  • Classification
  • Compliance workflows

High Temperature Examples

Typical range:

0.7 – 1.0

Best for:

  • Brainstorming
  • Creative writing
  • Marketing ideas
  • Story generation

Top-p Sampling

Top-p controls token selection diversity.

The model considers only the most probable tokens whose cumulative probability reaches p.

Lower top-p:

  • More focused responses
  • Less diversity

Higher top-p:

  • More varied responses

Temperature and top-p often work together.


Maximum Tokens

Maximum tokens limit response length.

Benefits:

  • Cost control
  • Latency reduction
  • Preventing excessive responses

Risks:

  • Responses may be truncated if limit is too low.

Frequency Penalty

Frequency penalty reduces repeated words or phrases.

Useful for:

  • Avoiding repetition
  • Improving readability

Presence Penalty

Presence penalty encourages introducing new topics.

Higher presence penalty:

  • More topic diversity
  • Less repetition

Stop Sequences

Stop sequences define where generation should stop.

Example:

Stop when “END_RESPONSE” appears.

Useful for:

  • Structured outputs
  • Tool workflows
  • Multi-agent orchestration

Deterministic vs Creative Behavior

Deterministic Systems

Characteristics:

  • Consistent outputs
  • Repeatable behavior
  • Lower creativity

Best for:

  • Enterprise workflows
  • Compliance systems
  • Customer support
  • Automation

Recommended settings:

  • Low temperature
  • Lower top-p

Creative Systems

Characteristics:

  • Diverse outputs
  • More exploration
  • Greater variability

Best for:

  • Ideation
  • Content creation
  • Brainstorming

Recommended settings:

  • Higher temperature
  • Higher top-p

Tuning for RAG Applications

RAG systems require special tuning.

Developers should optimize:

  • Retrieval quality
  • Prompt grounding
  • Context window usage
  • Citation instructions
  • Hallucination reduction

Example grounding instruction:

Answer only using the retrieved documents.

Tuning Agent Systems

Agents require additional behavioral tuning.

Developers tune:

  • Tool usage behavior
  • Planning behavior
  • Memory usage
  • Conversation flow
  • Escalation behavior
  • Approval workflows

Example:

Only call the refund API after confirming the user identity.

Function Calling and Structured Generation

Models can generate structured tool calls.

Example JSON schema:

{
"city": "Orlando",
"unit": "Fahrenheit"
}

Prompt tuning improves:

  • Schema adherence
  • Parameter accuracy
  • Tool selection

Controlling Hallucinations

Hallucinations are a major tuning challenge.

Methods to reduce hallucinations:

  • Lower temperature
  • Use grounding
  • Improve retrieval
  • Add citation requirements
  • Use smaller focused prompts
  • Add explicit instructions

Example:

If the answer is not found in the documents, say you do not know.

Safety-Oriented Prompting

Prompts should include safety constraints.

Examples:

Do not generate harmful or unsafe instructions.

Safety prompting helps:

  • Reduce harmful outputs
  • Prevent jailbreaks
  • Enforce policy compliance

Prompt Injection Defense

Attackers may attempt prompt injection.

Example:

Ignore all previous instructions.

Defensive techniques:

  • Strong system prompts
  • Tool restrictions
  • Output validation
  • Context isolation
  • Human approval workflows

Evaluating Prompt Quality

Developers evaluate prompts using:

  • Accuracy metrics
  • Grounding scores
  • User feedback
  • Safety evaluations
  • Latency measurements
  • Cost analysis

Prompt quality evaluation is iterative.


A/B Testing Prompts

A/B testing compares multiple prompts.

Example:

  • Prompt A produces concise responses.
  • Prompt B produces detailed responses.

Metrics determine which prompt performs better.


Cost Optimization Through Tuning

Good tuning reduces costs.

Strategies include:

  • Smaller prompts
  • Lower token counts
  • Smaller models
  • Efficient retrieval
  • Reduced chain-of-thought usage

Azure AI Foundry Support for Tuning

Azure AI Foundry supports:

  • Prompt flow design
  • Model evaluation
  • Safety evaluations
  • Deployment management
  • Agent orchestration
  • Evaluation pipelines
  • Monitoring and telemetry

Developers can iterate quickly and compare outputs.


Common Tuning Mistakes

Overly Long Prompts

Problems:

  • Increased cost
  • Higher latency
  • Context dilution

Excessive Temperature

Problems:

  • Hallucinations
  • Inconsistent outputs
  • Unsafe behavior

Weak Instructions

Problems:

  • Ambiguous responses
  • Poor formatting
  • Incorrect tool usage

Lack of Evaluation

Problems:

  • Hidden failures
  • Safety risks
  • Poor user experience

Real-World Examples

Customer Support Bot

Goals:

  • Accurate answers
  • Consistent tone
  • Fast responses

Recommended settings:

  • Low temperature
  • Grounded retrieval
  • Structured outputs

Creative Writing Assistant

Goals:

  • Diverse ideas
  • Creative language
  • Engaging responses

Recommended settings:

  • Higher temperature
  • Higher top-p

Financial Advisory Agent

Goals:

  • High accuracy
  • Low hallucination risk
  • Compliance adherence

Recommended settings:

  • Very low temperature
  • Strict grounding
  • Human approval workflows

AI-103 Exam Tips

For the AI-103 exam, remember these key points:

  • Prompt engineering strongly influences model behavior.
  • System prompts define overall agent behavior.
  • Few-shot prompting improves consistency.
  • Lower temperature produces more deterministic outputs.
  • Higher temperature increases creativity.
  • Top-p controls response diversity.
  • Maximum tokens control output length.
  • RAG improves grounding and reduces hallucinations.
  • Structured outputs are important for tool workflows.
  • Prompt tuning is iterative and evaluation-driven.
  • Safety prompting helps reduce harmful outputs.
  • Prompt injection is a security concern.

Practice Exam Questions

Question 1

What is the primary purpose of prompt engineering?

A. Increase GPU memory
B. Guide the model toward desired outputs
C. Eliminate all costs
D. Replace embeddings

Correct Answer

B. Guide the model toward desired outputs

Explanation

Prompt engineering designs prompts that improve accuracy, consistency, formatting, and safety.


Question 2

Which parameter most directly controls output randomness?

A. Max tokens
B. Presence penalty
C. Temperature
D. Context window

Correct Answer

C. Temperature

Explanation

Temperature controls response randomness and creativity.


Question 3

What is a common benefit of few-shot prompting?

A. Reduced token usage
B. Better output consistency
C. Elimination of latency
D. Automatic vector search

Correct Answer

B. Better output consistency

Explanation

Few-shot examples help models understand desired formatting and behavior.


Question 4

Which setting is most appropriate for a compliance-focused enterprise chatbot?

A. High temperature
B. Very low temperature
C. Maximum randomness
D. No grounding

Correct Answer

B. Very low temperature

Explanation

Compliance systems require deterministic and reliable outputs.


Question 5

What is the purpose of maximum token settings?

A. Control response length
B. Increase retrieval quality
C. Encrypt prompts
D. Replace embeddings

Correct Answer

A. Control response length

Explanation

Maximum tokens limit the size of generated responses.


Question 6

Which technique helps reduce hallucinations in RAG systems?

A. Increasing randomness
B. Removing retrieval
C. Grounding responses in retrieved content
D. Eliminating prompts

Correct Answer

C. Grounding responses in retrieved content

Explanation

Grounding helps models answer using trusted retrieved information.


Question 7

What is a system prompt primarily used for?

A. Storing embeddings
B. Defining overall model behavior and rules
C. Encrypting responses
D. Monitoring latency

Correct Answer

B. Defining overall model behavior and rules

Explanation

System prompts establish tone, constraints, and behavioral guidance.


Question 8

What is the purpose of structured output prompting?

A. Improve network routing
B. Produce machine-readable outputs such as JSON
C. Reduce GPU utilization
D. Increase hallucinations

Correct Answer

B. Produce machine-readable outputs such as JSON

Explanation

Structured outputs simplify automation and API integration.


Question 9

Which tuning strategy is most likely to reduce cost?

A. Increasing token usage
B. Using unnecessarily large prompts
C. Reducing prompt size and response length
D. Maximizing chain-of-thought reasoning for every request

Correct Answer

C. Reducing prompt size and response length

Explanation

Smaller prompts and shorter outputs reduce token consumption.


Question 10

What is a major risk of setting temperature too high?

A. Reduced creativity
B. Increased hallucinations and inconsistency
C. Elimination of variability
D. Reduced response diversity

Correct Answer

B. Increased hallucinations and inconsistency

Explanation

Higher temperature increases randomness and may reduce reliability.


Final Thoughts

Tuning generation behavior is one of the most important skills for modern AI developers. Through effective prompt engineering and careful parameter tuning, developers can optimize AI systems for accuracy, safety, cost efficiency, consistency, and user satisfaction.

For the AI-103 exam, candidates should understand:

  • Prompt engineering strategies
  • System prompts and role prompting
  • Few-shot and chain-of-thought prompting
  • Temperature and top-p tuning
  • Structured outputs
  • Hallucination reduction techniques
  • Safety prompting
  • RAG grounding strategies
  • Cost optimization methods
  • Prompt evaluation and iteration

Strong tuning practices are essential for building reliable, production-grade AI applications and agents on Azure.


Go to the AI-103 Exam Prep Hub main page

Leave a comment