This post is a part of the AI-103: Develop AI Apps and Agents on Azure Exam Prep Hub.
This topic falls under these sections:
Implement text analysis solutions (10–15%)
--> Apply language model text analysis
--> Implement solutions to extract entities, topics, summaries, and structured JSON outputs by using generative prompting and Foundry Tools
Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
Modern AI applications increasingly rely on language models to transform unstructured text into structured, actionable information. Organizations use generative AI systems to:
- Extract entities
- Detect topics
- Generate summaries
- Produce structured JSON outputs
- Automate workflows
- Enrich search and analytics systems
For the AI-103 certification exam, you should understand how to implement text analysis workflows using:
- Generative prompting
- Multimodal and language models
- Structured outputs
- Azure AI Foundry tools
- Prompt orchestration
- Responsible AI practices
This topic falls under:
“Apply language model text analysis”
What Is Text Analysis?
Definition
Text analysis is the process of extracting meaningful information from unstructured text.
Examples include:
- Entity extraction
- Topic classification
- Sentiment analysis
- Summarization
- Categorization
- Structured data generation
Why Generative AI Improves Text Analysis
Traditional NLP systems often relied on:
- Rule-based processing
- Fixed schemas
- Pretrained classifiers
Generative AI systems provide:
- Flexible extraction
- Contextual understanding
- Natural language reasoning
- Dynamic schema generation
- Few-shot adaptability
Common Text Analysis Tasks
Entity Extraction
Identifying important entities within text.
Examples:
- Names
- Organizations
- Dates
- Locations
- Products
- Financial values
Example Entity Extraction
Input:
Contoso signed a contract with Fabrikam on March 5, 2026.
Extracted entities:
{ "organizations": [ "Contoso", "Fabrikam" ], "date": "March 5, 2026"}
Topic Extraction
What Is Topic Extraction?
Topic extraction identifies the primary themes discussed within text.
Example Topics
Document:
The company discussed quarterly cloud migration costs and AI infrastructure scaling.
Detected topics:
- Cloud computing
- AI infrastructure
- Financial operations
Summarization
What Is Summarization?
Summarization condenses large amounts of text into shorter, meaningful summaries.
Types of Summaries
Extractive Summarization
Selects important text directly from the source.
Abstractive Summarization
Generates new language-based summaries.
Generative AI commonly uses abstractive summarization.
Example Summary Prompt
Summarize this customer support conversation in three sentences.
Structured JSON Outputs
Why Structured Outputs Matter
Structured outputs improve:
- Automation
- API integration
- Data pipelines
- Analytics
- Workflow orchestration
Example Structured Output
{ "customer_sentiment": "negative", "issue_type": "billing", "priority": "high"}
Prompt Engineering for Text Analysis
Why Prompt Engineering Matters
Prompts strongly influence:
- Extraction quality
- Consistency
- Formatting
- Hallucination frequency
Example Entity Prompt
Extract all people, organizations, and dates from the following text.
Example JSON Prompt
Return the output strictly as valid JSON.
Example Topic Classification Prompt
Identify the top three business topics discussed in this document.
Few-Shot Prompting
What Is Few-Shot Prompting?
Few-shot prompting provides examples within prompts.
Example
Input: "Invoice overdue for 45 days"Output:{ "category": "accounts receivable"}
Few-shot prompting improves consistency and accuracy.
Chain-of-Thought Reasoning
Some workflows encourage reasoning before output generation.
Example:
Analyze the text step-by-step before generating the final JSON output.
Structured Output Validation
Generated JSON should be validated to ensure:
- Proper formatting
- Required fields
- Valid schema structure
Example Validation Concerns
Potential issues:
- Missing fields
- Invalid JSON syntax
- Hallucinated values
- Unexpected schema changes
Hallucinations in Text Analysis
What Are Hallucinations?
Hallucinations occur when models:
- Invent entities
- Create unsupported summaries
- Generate incorrect classifications
Example Hallucination
Input:
Meeting scheduled for Tuesday.
Incorrect output:
{ "location": "New York"}
The location was never mentioned.
Reducing Hallucinations
Strategies include:
- Grounded prompts
- Retrieval augmentation
- Schema validation
- Confidence scoring
- Human review
- Explicit formatting instructions
Retrieval-Augmented Generation (RAG)
What Is RAG?
RAG combines:
- Retrieval systems
- Vector search
- Generative models
to improve grounding and reduce hallucinations.
Example RAG Workflow
- User submits question
- Relevant documents retrieved
- LLM analyzes retrieved content
- Structured output generated
Azure AI Foundry
Microsoft provides:
Azure AI Foundry
to help build and orchestrate AI workflows.
Foundry Capabilities
Azure AI Foundry supports:
- Prompt flows
- Model orchestration
- Evaluations
- Safety testing
- Workflow automation
- AI experimentation
Prompt Flows
What Are Prompt Flows?
Prompt flows visually orchestrate:
- Inputs
- LLM calls
- Validation steps
- Tool integrations
- Output processing
Example Prompt Flow
- Receive document
- Extract entities
- Classify topics
- Generate summary
- Return JSON response
Multi-Step Text Analysis Pipelines
Organizations commonly chain multiple operations:
- OCR
- Summarization
- Classification
- Translation
- Entity extraction
Example Enterprise Workflow
- Upload support ticket
- Detect language
- Extract entities
- Summarize issue
- Generate structured JSON
- Route to support queue
Azure OpenAI Service
Azure OpenAI Service
supports:
- Generative prompting
- Structured outputs
- Summarization
- Topic extraction
- Entity extraction
Azure AI Language
Azure AI Language
supports:
- Named entity recognition
- Classification
- Summarization
- Sentiment analysis
Azure AI Search
Azure AI Search
supports:
- Vector search
- Hybrid search
- Retrieval workflows
- RAG architectures
Azure Functions
Azure Functions
commonly orchestrates:
- Text pipelines
- Event triggers
- Automated workflows
Security and Responsible AI
Text analysis systems must handle:
- Sensitive data
- PII
- Confidential information
- Harmful prompts
Responsible AI Considerations
Organizations should:
- Validate outputs
- Monitor hallucinations
- Protect privacy
- Audit workflows
- Apply content filtering
Privacy Considerations
Text may contain:
- Personal information
- Financial data
- Medical information
- Corporate secrets
Organizations should:
- Encrypt data
- Restrict access
- Mask sensitive fields
Human-in-the-Loop Review
Human review may be necessary for:
- Legal workflows
- Healthcare systems
- Financial reporting
- High-risk classifications
Observability and Monitoring
Production systems should monitor:
- Latency
- Token usage
- Hallucination frequency
- JSON validation failures
- Prompt injection attempts
- Cost
- Throughput
Cost Optimization
Generative AI pipelines can become expensive.
Optimization strategies include:
- Shorter prompts
- Chunking large documents
- Smaller models where appropriate
- Caching results
- Batch processing
Example Structured Extraction Workflow
A legal firm may:
- Upload contracts
- Extract entities
- Detect clauses
- Generate summaries
- Produce structured JSON metadata
- Store searchable outputs
This demonstrates:
- Entity extraction
- Summarization
- Structured outputs
- Workflow orchestration
Best Practices for Text Analysis Workflows
Use Explicit Prompt Instructions
Improve consistency and formatting.
Validate JSON Outputs
Prevent downstream parsing failures.
Ground Responses in Source Data
Reduce hallucinations.
Use Multi-Step Pipelines
Separate extraction, classification, and summarization stages.
Monitor Hallucinations
Track unsupported outputs.
Protect Sensitive Data
Apply privacy and security controls.
Support Human Review
Especially for high-risk workflows.
Exam Tips for AI-103
For the AI-103 exam, remember these important concepts:
- Entity extraction identifies structured information within text.
- Topic extraction identifies major themes.
- Summarization condenses large text into concise outputs.
- Structured JSON outputs improve automation and integrations.
- Prompt engineering strongly affects extraction quality.
- Few-shot prompting improves consistency.
- Hallucinations generate unsupported or incorrect outputs.
- RAG improves grounding using retrieved documents.
- Azure AI Foundry supports prompt flows and orchestration.
- Azure OpenAI Service supports generative text analysis workflows.
- JSON validation is important for reliable downstream processing.
Practice Exam Questions
Question 1
What is the purpose of entity extraction?
A. Compressing text files
B. Identifying structured information such as names and dates
C. Encrypting JSON outputs
D. Scaling databases dynamically
Answer
B. Identifying structured information such as names and dates
Explanation
Entity extraction identifies meaningful structured information within text.
Question 2
What is topic extraction?
A. Compressing prompts
B. Removing hallucinations automatically
C. Encrypting documents
D. Identifying major themes discussed within text
Answer
D. Identifying major themes discussed within text
Explanation
Topic extraction identifies the primary subjects or themes in content.
Question 3
Why are structured JSON outputs useful?
A. They simplify automation and system integration
B. They eliminate OCR workflows
C. They reduce internet bandwidth usage
D. They disable hallucinations
Answer
A. They simplify automation and system integration
Explanation
Structured outputs are easier for applications and APIs to process programmatically.
Question 4
What is a hallucination in generative AI?
A. A valid JSON schema
B. Unsupported or invented model output
C. A GPU optimization technique
D. An OCR extraction method
Answer
B. Unsupported or invented model output
Explanation
Hallucinations occur when models generate incorrect or fabricated information.
Question 5
What is few-shot prompting?
A. Disabling prompts entirely
B. Compressing token usage automatically
C. Providing examples within prompts to guide model behavior
D. Encrypting prompt flows
Answer
C. Providing examples within prompts to guide model behavior
Explanation
Few-shot prompting improves output quality by demonstrating desired behavior.
Question 6
Which Azure service supports prompt flow orchestration?
A. Azure AI Foundry
B. Azure DNS
C. Azure Firewall
D. Azure CDN
Answer
A. Azure AI Foundry
Explanation
Azure AI Foundry supports prompt flows, orchestration, and AI workflow management.
Question 7
What is Retrieval-Augmented Generation (RAG)?
A. Combining retrieval systems with generative AI for grounded responses
B. Compressing OCR results
C. Encrypting vector embeddings
D. Removing JSON outputs
Answer
A. Combining retrieval systems with generative AI for grounded responses
Explanation
RAG retrieves relevant information before generating responses.
Question 8
Why should generated JSON outputs be validated?
A. To disable summarization
B. To reduce OCR latency
C. To ensure schema correctness and prevent parsing failures
D. To eliminate vector search
Answer
C. To ensure schema correctness and prevent parsing failures
Explanation
Validation ensures outputs are properly structured and usable downstream.
Question 9
Which Azure service supports generative summarization and entity extraction?
A. Azure Virtual WAN
B. Azure ExpressRoute
C. Azure Firewall
D. Azure OpenAI Service
Answer
D. Azure OpenAI Service
Explanation
Azure OpenAI Service supports generative AI-based text analysis workflows.
Question 10
What is a best practice for reducing hallucinations?
A. Disable monitoring systems
B. Automatically trust all outputs
C. Use grounded prompts and validation workflows
D. Avoid structured outputs
Answer
C. Use grounded prompts and validation workflows
Explanation
Grounding and validation help reduce unsupported or fabricated outputs.
Go to the AI-103 Exam Prep Hub main page
