This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub. 
This topic falls under these sections:
Plan and manage an Azure AI solution (25–30%)
   --> Implement responsible AI across generative AI and agentic systems
      --> Configure safety filters, guardrails, risk detection, and content moderation

Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Generative AI and agentic systems can produce highly capable outputs, but they also introduce risks.

AI systems may generate:

Harmful content
Unsafe instructions
Toxic responses
Biased outputs
Sensitive information exposure
Hallucinated information
Unsafe autonomous actions

Organizations deploying AI systems must implement strong safety and governance controls.

The AI-103: Develop AI Apps and Agents on Azure certification exam tests your understanding of responsible AI and AI safety mechanisms.

For the AI-103 exam, you should understand:

Safety filters
Guardrails
Risk detection
Content moderation
Prompt filtering
Output filtering
Harm detection
Responsible AI principles
AI governance
Prompt injection defense
Azure AI Content Safety
Safe agent behavior

Why AI Safety Matters

AI systems interact directly with users, enterprise systems, and organizational data.

Without safeguards, AI may:

Produce harmful outputs
Leak sensitive data
Generate misleading responses
Perform unsafe actions
Violate compliance policies

Safety systems reduce operational and reputational risk.

Responsible AI Principles

Responsible AI principles guide safe AI deployment.

Core principles include:

Fairness
Reliability
Safety
Privacy
Transparency
Accountability

What Are Safety Filters?

Safety filters evaluate AI inputs and outputs for harmful content.

They help:

Block unsafe prompts
Detect harmful responses
Reduce toxic outputs
Enforce policy compliance

Input Filtering

Input filtering analyzes prompts before they reach the model.

It helps detect:

Harmful requests
Prompt injection attempts
Unsafe instructions
Sensitive topics

Output Filtering

Output filtering evaluates generated responses before returning them to users.

It helps prevent:

Toxic responses
Harmful advice
Violent content
Sensitive information leakage

What Are Guardrails?

Guardrails are governance controls that constrain AI behavior.

Guardrails help ensure AI systems:

Stay within policy boundaries
Avoid harmful actions
Follow organizational rules
Operate safely

Types of Guardrails

Common guardrails include:

Content restrictions
Tool-use restrictions
Data access boundaries
Topic limitations
Workflow constraints
Approval requirements

Tool-Use Guardrails

AI agents may access:

APIs
Databases
Email systems
Enterprise applications

Tool guardrails restrict:

Which tools can be used
Which actions are allowed
Which workflows require approval

Data Access Guardrails

Data guardrails help prevent:

Unauthorized access
Sensitive data exposure
Cross-tenant data leakage

Workflow Guardrails

Workflow guardrails limit:

Autonomous actions
Escalation capabilities
Financial transactions
Administrative operations

What Is Risk Detection?

Risk detection identifies potentially harmful or unsafe AI activity.

Examples include:

Toxic content
Violence
Hate speech
Self-harm content
Prompt injection attempts
Policy violations

Real-Time Risk Detection

Real-time safety systems evaluate:

User prompts
Retrieved content
Generated outputs
Tool requests

before actions are completed.

Categories of Harmful Content

Safety systems commonly detect:

Hate content
Sexual content
Violent content
Self-harm content

Severity Levels

Risk detection systems often assign severity levels such as:

Safe
Low
Medium
High

Organizations can configure thresholds.

Azure AI Content Safety

Azure AI Content Safety provides tools for:

Harm detection
Content moderation
Safety filtering
Prompt analysis

This is an important AI-103 exam topic.

Content Moderation

Content moderation reviews text and media for policy violations.

Moderation may occur:

Before generation
During workflows
After generation

Moderation Policies

Organizations may block:

Offensive content
Illegal content
Dangerous instructions
Harassment
Extremist content

Human Review Workflows

Some moderation systems escalate content for:

Human review
Compliance checks
Policy validation

Prompt Injection Attacks

Prompt injection attacks attempt to manipulate model instructions.

Examples include:

Overriding system prompts
Exposing secrets
Triggering unsafe actions

Defending Against Prompt Injection

Defense strategies include:

Input filtering
Prompt isolation
Tool restrictions
Approval workflows
Retrieval validation

Jailbreak Attempts

Jailbreaks attempt to bypass model safety controls.

Attackers may try to:

Circumvent filters
Force unsafe outputs
Override restrictions

Defending Against Jailbreaks

Mitigation strategies include:

Strong system prompts
Safety filtering
Layered guardrails
Human oversight

Hallucination Risks

Hallucinations occur when models generate incorrect or fabricated information.

This can create:

Compliance risks
Business risks
Safety concerns

Reducing Hallucinations

Common strategies include:

Grounding with enterprise data
Retrieval-Augmented Generation (RAG)
Confidence scoring
Output validation

Grounding and Safety

Grounded systems reduce unsafe responses by:

Using trusted data sources
Improving factual accuracy
Limiting unsupported claims

Agentic System Risks

AI agents introduce additional safety concerns.

Agents may:

Execute tools
Perform workflows
Access enterprise systems
Operate autonomously

Agent Safety Controls

Safe agent systems commonly use:

Tool restrictions
Permission boundaries
Approval workflows
Monitoring
Logging

Human-in-the-Loop Safety

Human-in-the-loop (HITL) systems require human approval for:

Sensitive actions
High-risk operations
Critical decisions

Rate Limiting and Abuse Prevention

Safety systems may limit:

Request frequency
Token usage
Tool execution frequency

This helps reduce abuse.

Monitoring and Logging

Organizations should monitor:

Unsafe prompts
Safety violations
Moderation actions
Tool activity
Policy violations

Audit Trails

Audit logs support:

Governance
Compliance
Incident investigation
Accountability

Transparency and Explainability

Organizations should understand:

Why content was blocked
Why actions were denied
Which rules triggered safety responses

Risk-Based Safety Design

Safety controls should align with risk.

Higher-risk systems require:

Stronger filtering
More oversight
Additional approvals
Tighter controls

Examples of High-Risk AI Systems

Examples include:

Healthcare AI
Financial AI systems
Legal advisory systems
Autonomous enterprise agents

Multi-Layered Defense

Effective AI safety uses layered protection.

Common layers include:

Input filtering
Output moderation
Tool restrictions
Human oversight
Monitoring

Common AI-103 Safety Scenarios

Scenario 1: Enterprise Chatbot

Requirements:

Prevent toxic responses
Reduce hallucinations
Protect sensitive data

Recommended Safety Controls:

Content moderation
Grounding
Output filtering

Scenario 2: AI Financial Assistant

Requirements:

High accuracy
Restricted actions
Human approvals

Recommended Safety Controls:

HITL workflows
Tool restrictions
Approval guardrails

Scenario 3: Autonomous AI Agent

Requirements:

Safe tool usage
Workflow governance
Policy enforcement

Recommended Safety Controls:

Tool allow lists
Permission boundaries
Monitoring

Scenario 4: Public AI API

Requirements:

Abuse prevention
Harm detection
Request monitoring

Recommended Safety Controls:

Rate limiting
Content Safety
Audit logging

Common AI-103 Exam Tips

Understand Safety Layers

Know:

Input filtering
Output filtering
Moderation
Guardrails

Learn Azure AI Content Safety

Understand:

Harm categories
Severity levels
Moderation workflows

Understand Agent Safety

Know:

Tool restrictions
Permission boundaries
Human oversight

Learn Prompt Injection Defense

Understand:

Jailbreak prevention
Prompt isolation
Retrieval validation

Summary

Safety and governance are essential for responsible AI systems.

For the AI-103 exam, you should understand:

Safety filters
Guardrails
Risk detection
Content moderation
Prompt injection defense
Azure AI Content Safety
Tool restrictions
Agent safety controls
Human oversight
Responsible AI principles

Strong AI safety practices help ensure systems remain:

Safe
Reliable
Governed
Compliant
Resistant to misuse

These concepts are foundational for deploying enterprise AI solutions on Azure.

Audit logs provide accountability and governance visibility.

Go to the AI-103 Exam Prep Hub main page

Introduction

Why AI Safety Matters

Responsible AI Principles

What Are Safety Filters?

Input Filtering

Output Filtering

What Are Guardrails?

Types of Guardrails

Tool-Use Guardrails

Data Access Guardrails

Workflow Guardrails

What Is Risk Detection?

Real-Time Risk Detection

Categories of Harmful Content

Severity Levels

Azure AI Content Safety

Content Moderation

Moderation Policies

Human Review Workflows

Prompt Injection Attacks

Defending Against Prompt Injection

Jailbreak Attempts

Defending Against Jailbreaks

Hallucination Risks

Reducing Hallucinations

Grounding and Safety

Agentic System Risks

Agent Safety Controls

Human-in-the-Loop Safety

Rate Limiting and Abuse Prevention

Monitoring and Logging

Audit Trails

Transparency and Explainability

Risk-Based Safety Design

Examples of High-Risk AI Systems

Multi-Layered Defense

Common AI-103 Safety Scenarios

Scenario 1: Enterprise Chatbot

Scenario 2: AI Financial Assistant

Scenario 3: Autonomous AI Agent

Scenario 4: Public AI API

Common AI-103 Exam Tips

Understand Safety Layers

Learn Azure AI Content Safety

Understand Agent Safety

Learn Prompt Injection Defense

Summary

Practice Exam Questions

Question 1

Answer

Explanation

Question 2

Answer

Explanation

Question 3

Answer

Explanation

Question 4

Answer

Explanation

Question 5

Answer

Explanation

Question 6

Answer

Explanation

Question 7

Answer

Explanation

Question 8

Answer

Explanation

Question 9

Answer

Explanation

Question 10

Answer

Explanation

Share this:

Related