Tag: AI-901: Azure AI Fundamentals

Exam Prep Hub for AI-901: Azure AI Fundamentals

Welcome to the AI-901: Azure AI Fundamentals Exam Prep Hub!

Welcome to the one-stop hub with information for preparing for the AI-901: Azure AI Fundamentals certification exam. The content for this exam helps you to demonstrate that “you have conceptual knowledge of AI solutions in Azure and the foundational technical skills to work with them”. You will also need “knowledge of Python coding syntax and programming techniques, and you should be familiar with Azure resources”.
Upon successful completion of the exam, you earn the Microsoft Certified: Azure AI Fundamentals certification.

This hub provides information directly here (topic-by-topic as outlined in the official study guide), links to a number of external resources, tips for preparing for the exam, practice tests, and section questions to help you prepare. Bookmark this page and use it as a guide to ensure that you are fully covering all relevant topics for the AI-901 exam and making use of as many of the resources available as possible.


Audience profile (from Microsoft’s site)



As a candidate for this Microsoft Certification, you’re at the beginning of your career in AI solution development. These Microsoft certifications offer opportunities to demonstrate your understanding of machine learning, AI concepts, and Azure services, whether you are starting your career or advancing your skills in AI solution development. Both certifications are designed for candidates from technical and non-technical backgrounds—prior experience in data science or software engineering is not required, though familiarity with basic cloud concepts and client-server applications will be helpful.
For the AI-901, you should have foundational knowledge of AI workloads and understand the basic principles of AI and machine learning. And also, you should have foundational technical skills for working with AI solutions in Azure, conceptual knowledge of Azure-based AI solutions, and familiarity with Python coding syntax and programming techniques, as well as Azure resources.
You may be eligible for ACE college credit if you pass this certification. See ACE college credit for certification exams for details.


Skills at a glance (as specified in the official study guide)

  • Identify AI concepts and responsibilities (40–45%)
  • Implement AI solutions by using Microsoft Foundry (55–60%)

Topic-by-Topic Exam Content

[click a topic link to access the content and practice questions for that topic]

Identify AI concepts and capabilities (40–45%)

Describe principles of responsible AI

Identify AI model components and configurations

Identify AI workloads

Implement AI solutions by using Microsoft Foundry (55–60%)

Implement generative AI apps and agents by using Foundry

Implement AI solutions for text and speech by using Foundry

Implement AI solutions with computer vision and image-generation capabilities by using Foundry

Implement AI solutions for information extraction by using Foundry


AI-901 Practice Exams


Important AI-901 Resources


Good luck to you on your data journey!

AI-901: Microsoft Azure AI Fundamentals – Practice Exam #4 (30 Questions)


Question 1

Which AI workload is BEST suited for forecasting future inventory demand?

A. Regression
B. OCR
C. Object detection
D. Speech synthesis


Correct Answer

A. Regression


Explanation

Regression predicts continuous numeric values, such as future inventory levels or sales demand.


Question 2

A company wants to automatically categorize support tickets into departments such as Billing, Technical Support, and Sales.

Which machine learning technique should be used?

A. Classification
B. Clustering
C. Regression
D. Forecasting


Correct Answer

A. Classification


Question 3

Which Responsible AI principle emphasizes designing systems that can be used by people with disabilities?

A. Inclusiveness
B. Reliability
C. Accountability
D. Transparency


Correct Answer

A. Inclusiveness


Question 4

What is tokenization in generative AI?

A. Breaking text into smaller pieces for processing
B. Encrypting storage devices
C. Compressing images
D. Improving network bandwidth


Correct Answer

A. Breaking text into smaller pieces for processing


Question 5

HOTSPOT / MATCHING

Match each AI capability with the appropriate use case.

Use CaseCapability
Creating realistic AI-generated artwork?
Identifying the language of a sentence?
Extracting text from a passport image?

Options:

  • OCR
  • Language detection
  • Image generation

Correct Answers

Use CaseCapability
Creating realistic AI-generated artworkImage generation
Identifying the language of a sentenceLanguage detection
Extracting text from a passport imageOCR

Question 6

Which Azure AI capability allows applications to answer questions conversationally?

A. Generative AI
B. Regression
C. Clustering
D. Forecasting


Correct Answer

A. Generative AI


Question 7

A retailer wants to detect when shelves in stores are empty by analyzing camera images.

Which AI capability should be used?

A. Object detection
B. Speech recognition
C. Translation
D. Regression


Correct Answer

A. Object detection


Question 8

MULTIPLE ANSWER

Which are common examples of generative AI applications?

Select ALL that apply.

A. Chatbots
B. AI image creation
C. Automated code generation
D. Spreadsheet printing
E. Content summarization


Correct Answers

A. Chatbots
B. AI image creation
C. Automated code generation
E. Content summarization


Question 9

What is the purpose of an API key in Azure AI services?

A. To authenticate access to the service
B. To increase monitor brightness
C. To compress image files
D. To replace endpoints


Correct Answer

A. To authenticate access to the service


Question 10

Which AI capability identifies names, locations, or dates in text?

A. Entity extraction
B. Object detection
C. OCR
D. Speech synthesis


Correct Answer

A. Entity extraction


Question 11

FILL IN THE BLANK

__________ AI can generate new content such as text, images, and code.


Correct Answer

Generative

or

Generative AI


Question 12

Which statement about cloud-based AI services is TRUE?

A. They reduce the need to manage physical infrastructure
B. They eliminate all cybersecurity risks
C. They only work offline
D. They cannot scale automatically


Correct Answer

A. They reduce the need to manage physical infrastructure


Question 13

You need an AI solution that can convert customer emails into concise summaries.

Which capability is MOST appropriate?

A. Text summarization
B. Regression
C. Object detection
D. Forecasting


Correct Answer

A. Text summarization


Question 14

Which machine learning technique is used when training data already contains labeled outcomes?

A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. OCR


Correct Answer

A. Supervised learning


Question 15

MULTIPLE ANSWER

Which capabilities are commonly associated with Azure AI Vision services?

Select ALL that apply.

A. OCR
B. Image captioning
C. Object detection
D. Sentiment analysis
E. Facial analysis


Correct Answers

A. OCR
B. Image captioning
C. Object detection
E. Facial analysis


Question 16

Which Responsible AI principle focuses on ensuring AI systems operate dependably?

A. Reliability and safety
B. Transparency
C. Inclusiveness
D. Fairness


Correct Answer

A. Reliability and safety


Question 17

You want an AI assistant to always respond in a formal tone.

What is the BEST way to accomplish this?

A. Use a system prompt
B. Use OCR
C. Increase screen resolution
D. Disable APIs


Correct Answer

A. Use a system prompt


Question 18

HOTSPOT / MATCHING

Match the AI workload to the corresponding example.

WorkloadExample
Classification?
Speech synthesis?
Clustering?

Options:

  • Grouping similar customers
  • Converting text into spoken audio
  • Determining whether a loan is approved

Correct Answers

WorkloadExample
ClassificationDetermining whether a loan is approved
Speech synthesisConverting text into spoken audio
ClusteringGrouping similar customers

Question 19

Which capability allows AI systems to create captions that describe images?

A. Image captioning
B. OCR
C. Translation
D. Regression


Correct Answer

A. Image captioning


Question 20

Which statement about Responsible AI is TRUE?

A. AI systems should be monitored and governed responsibly
B. AI systems never require human oversight
C. Responsible AI eliminates all risks
D. Responsible AI only applies to generative AI


Correct Answer

A. AI systems should be monitored and governed responsibly


Question 21

A transportation company wants to analyze dashcam footage to detect traffic signs and pedestrians.

Which workload is MOST appropriate?

A. Computer vision
B. Regression
C. Forecasting
D. Translation


Correct Answer

A. Computer vision


Question 22

MULTIPLE ANSWER

Which are common capabilities of natural language processing solutions?

Select ALL that apply.

A. Translation
B. Sentiment analysis
C. Entity recognition
D. Object detection
E. Summarization


Correct Answers

A. Translation
B. Sentiment analysis
C. Entity recognition
E. Summarization


Question 23

Which Responsible AI principle emphasizes understanding and explaining AI decisions?

A. Transparency
B. Reliability
C. Privacy
D. Inclusiveness


Correct Answer

A. Transparency


Question 24

FILL IN THE BLANK

__________ recognition converts spoken words into text.


Correct Answer

Speech

or

Speech recognition


Question 25

A business wants AI-generated voice responses for a customer-service phone system.

Which capability should they use?

A. Speech synthesis
B. OCR
C. Classification
D. Clustering


Correct Answer

A. Speech synthesis


Question 26

Which AI capability would BEST help identify duplicate customer records?

A. Entity matching
B. Object detection
C. Speech synthesis
D. Forecasting


Correct Answer

A. Entity matching


Question 27

MULTIPLE ANSWER

Which are benefits of using pretrained generative AI models?

Select ALL that apply.

A. Faster implementation
B. Reduced training costs
C. Access to advanced capabilities
D. Guaranteed perfect outputs
E. Simplified development


Correct Answers

A. Faster implementation
B. Reduced training costs
C. Access to advanced capabilities
E. Simplified development


Question 28

You are developing a lightweight application that sends prompts to an Azure AI model and receives responses.

What is the MOST likely communication method?

A. REST API calls
B. HDMI connections
C. USB transfers
D. Printer drivers


Correct Answer

A. REST API calls


Question 29

Which AI capability can identify the main topics discussed in a document?

A. Keyword extraction
B. Speech synthesis
C. Object detection
D. Regression


Correct Answer

A. Keyword extraction


Question 30

SCENARIO-BASED QUESTION

A media company wants an AI solution that:

  • Generates promotional images
  • Converts podcast audio into transcripts
  • Detects company logos in uploaded photos
  • Summarizes interview articles

Which AI capabilities are required?

A. Image generation, speech recognition, object detection, and text summarization
B. Regression and clustering only
C. Forecasting and classification only
D. OCR only


Correct Answer

A. Image generation, speech recognition, object detection, and text summarization


Explanation

The scenario requires multiple AI capabilities:

  • Image generation for promotional graphics
  • Speech recognition for podcast transcription
  • Object detection for logo identification
  • Text summarization for interview summaries

AI-901: Microsoft Azure AI Fundamentals – Practice Exam #3 (30 Questions)

Question 1

Which machine learning technique is commonly used to predict whether an email is spam?

A. Regression
B. Classification
C. Clustering
D. OCR


Correct Answer

B. Classification


Explanation

Classification predicts categories or labels such as spam/not spam, approved/denied, or fraud/not fraud.


Question 2

A company wants an AI system that can automatically generate product descriptions for an online store.

Which AI capability should be used?

A. Generative AI
B. Regression
C. OCR
D. Clustering


Correct Answer

A. Generative AI


Question 3

Which Responsible AI principle focuses on making AI systems accessible to people with varying abilities?

A. Fairness
B. Inclusiveness
C. Accountability
D. Transparency


Correct Answer

B. Inclusiveness


Question 4

Which Azure AI capability is MOST useful for converting scanned paper documents into searchable text?

A. OCR
B. Object detection
C. Speech synthesis
D. Regression


Correct Answer

A. OCR


Question 5

HOTSPOT / MATCHING

Match the AI scenario with the correct capability.

ScenarioCapability
Predicting next month’s sales?
Grouping similar customers?
Determining if a review is positive or negative?

Options:

  • Clustering
  • Regression
  • Sentiment analysis

Correct Answers

ScenarioCapability
Predicting next month’s salesRegression
Grouping similar customersClustering
Determining if a review is positive or negativeSentiment analysis

Question 6

What is the PRIMARY benefit of pretrained AI models?

A. They eliminate all errors
B. They reduce the need to build models from scratch
C. They require no internet connectivity
D. They replace APIs


Correct Answer

B. They reduce the need to build models from scratch


Question 7

Which AI capability identifies people, products, or objects within images?

A. Object detection
B. Sentiment analysis
C. Speech recognition
D. Translation


Correct Answer

A. Object detection


Question 8

MULTIPLE ANSWER

Which are examples of generative AI outputs?

Select ALL that apply.

A. AI-generated images
B. AI-written summaries
C. AI-created code
D. Printed paper forms
E. AI-generated marketing emails


Correct Answers

A. AI-generated images
B. AI-written summaries
C. AI-created code
E. AI-generated marketing emails


Question 9

What is temperature commonly used for in generative AI models?

A. To control response randomness and creativity
B. To measure server heat
C. To encrypt AI outputs
D. To improve monitor resolution


Correct Answer

A. To control response randomness and creativity


Question 10

A business wants an AI system that can translate customer support chats into multiple languages.

Which workload should they use?

A. Natural language processing
B. Object detection
C. Regression
D. Clustering


Correct Answer

A. Natural language processing


Question 11

FILL IN THE BLANK

__________ AI models can process multiple input types such as text, images, and audio.


Correct Answer

Multimodal


Question 12

Which statement about Azure AI Foundry is TRUE?

A. It can be used to deploy and manage AI models
B. It only supports spreadsheet applications
C. It replaces operating systems
D. It only works for computer vision models


Correct Answer

A. It can be used to deploy and manage AI models


Question 13

You need an AI application that can identify important phrases within documents.

Which capability should you use?

A. Keyword extraction
B. Regression
C. Clustering
D. Object detection


Correct Answer

A. Keyword extraction


Question 14

Which type of machine learning groups data based on similarities without predefined labels?

A. Regression
B. Classification
C. Clustering
D. OCR


Correct Answer

C. Clustering


Question 15

MULTIPLE ANSWER

Which are common uses of speech AI services?

Select ALL that apply.

A. Speech-to-text conversion
B. Text-to-speech conversion
C. Real-time translation
D. Predicting stock prices
E. Speaker recognition


Correct Answers

A. Speech-to-text conversion
B. Text-to-speech conversion
C. Real-time translation
E. Speaker recognition


Question 16

Which Responsible AI principle emphasizes understanding how AI systems operate?

A. Transparency
B. Reliability
C. Inclusiveness
D. Privacy


Correct Answer

A. Transparency


Question 17

You are creating a chatbot that should answer politely and professionally.

Which prompt type is BEST for defining the chatbot’s behavior?

A. System prompt
B. User prompt
C. Regression prompt
D. OCR prompt


Correct Answer

A. System prompt


Question 18

HOTSPOT / MATCHING

Match the capability to the appropriate scenario.

CapabilityScenario
OCR?
Speech synthesis?
Image classification?

Options:

  • Categorizing animal photos
  • Reading text from receipts
  • Generating spoken responses

Correct Answers

CapabilityScenario
OCRReading text from receipts
Speech synthesisGenerating spoken responses
Image classificationCategorizing animal photos

Question 19

Which AI workload would BEST identify the language used in a sentence?

A. Natural language processing
B. Regression
C. Forecasting
D. Clustering


Correct Answer

A. Natural language processing


Question 20

What is one limitation of generative AI models?

A. They may produce inaccurate information
B. They cannot process text
C. They only work offline
D. They cannot generate images


Correct Answer

A. They may produce inaccurate information


Question 21

A company wants to monitor customer opinions on social media posts.

Which AI capability should they use?

A. Sentiment analysis
B. Object detection
C. OCR
D. Regression


Correct Answer

A. Sentiment analysis


Question 22

MULTIPLE ANSWER

Which tasks are examples of computer vision workloads?

Select ALL that apply.

A. Detecting objects in images
B. Recognizing handwritten text
C. Identifying image categories
D. Translating speech
E. Facial analysis


Correct Answers

A. Detecting objects in images
B. Recognizing handwritten text
C. Identifying image categories
E. Facial analysis


Question 23

Which Responsible AI principle ensures organizations remain responsible for AI decisions and outcomes?

A. Accountability
B. Transparency
C. Inclusiveness
D. Fairness


Correct Answer

A. Accountability


Question 24

FILL IN THE BLANK

__________ analysis evaluates whether text expresses positive, negative, or neutral emotions.


Correct Answer

Sentiment

or

Sentiment analysis


Question 25

Which AI capability would BEST help summarize a recorded business meeting?

A. Speech recognition combined with text summarization
B. Object detection only
C. Regression only
D. Clustering only


Correct Answer

A. Speech recognition combined with text summarization


Question 26

A retailer wants an AI model that automatically creates advertising images from written prompts.

Which capability is MOST appropriate?

A. Image-generation models
B. Forecasting models
C. Regression models
D. OCR models


Correct Answer

A. Image-generation models


Question 27

MULTIPLE ANSWER

Which are advantages of lightweight AI applications?

Select ALL that apply.

A. Faster development
B. Reduced infrastructure complexity
C. Easier cloud integration
D. Guaranteed perfect AI accuracy
E. Simplified deployment


Correct Answers

A. Faster development
B. Reduced infrastructure complexity
C. Easier cloud integration
E. Simplified deployment


Question 28

You want an AI application to analyze uploaded images and answer questions about them.

Which type of model is MOST appropriate?

A. Multimodal model
B. Regression model
C. Clustering model
D. Forecasting model


Correct Answer

A. Multimodal model


Question 29

Which statement about APIs in AI solutions is TRUE?

A. APIs allow applications to exchange data with AI services
B. APIs eliminate the need for authentication
C. APIs physically store microphones
D. APIs replace cloud services


Correct Answer

A. APIs allow applications to exchange data with AI services


Question 30

SCENARIO-BASED QUESTION

A financial company wants an AI solution that:

  • Reads text from loan applications
  • Detects signatures in uploaded forms
  • Converts recorded customer calls into text
  • Generates automated email responses

Which AI capabilities are required?

A. OCR, object detection, speech recognition, and generative AI
B. Regression and clustering only
C. Forecasting and translation only
D. Speech synthesis only


Correct Answer

A. OCR, object detection, speech recognition, and generative AI


Explanation

The scenario requires multiple AI capabilities:

  • OCR for reading application text
  • Object detection for identifying signatures
  • Speech recognition for transcribing calls
  • Generative AI for automated email creation

AI-901: Microsoft Azure AI Fundamentals – Practice Exam #2 (30 Questions)


Question 1

Which machine learning technique is BEST suited for predicting house prices?

A. Clustering
B. Regression
C. Object detection
D. Translation


Correct Answer

B. Regression


Explanation

Regression predicts continuous numeric values such as prices, temperatures, or sales forecasts.


Question 2

A company wants to automatically detect fraudulent credit-card transactions.

Which type of AI workload is MOST appropriate?

A. Classification
B. OCR
C. Image generation
D. Speech synthesis


Correct Answer

A. Classification


Explanation

Fraud detection commonly uses classification models to determine whether transactions are fraudulent or legitimate.


Question 3

Which Responsible AI principle focuses on protecting sensitive user data?

A. Transparency
B. Fairness
C. Privacy and security
D. Inclusiveness


Correct Answer

C. Privacy and security


Question 4

What is the PRIMARY purpose of a user prompt in generative AI?

A. To provide instructions or requests to the model
B. To replace APIs
C. To install operating systems
D. To secure databases


Correct Answer

A. To provide instructions or requests to the model


Question 5

HOTSPOT / MATCHING

Match each AI capability with its correct output.

CapabilityOutput
Speech synthesis?
OCR?
Sentiment analysis?

Options:

  • Emotional tone
  • Spoken audio
  • Extracted text

Correct Answers

CapabilityOutput
Speech synthesisSpoken audio
OCRExtracted text
Sentiment analysisEmotional tone

Question 6

Which type of AI model can generate entirely new images from text prompts?

A. Generative AI model
B. Regression model
C. Clustering model
D. Time-series model


Correct Answer

A. Generative AI model


Question 7

You need an AI solution that converts spoken customer calls into searchable transcripts.

Which capability should you use?

A. Speech recognition
B. Speech synthesis
C. OCR
D. Object detection


Correct Answer

A. Speech recognition


Question 8

MULTIPLE ANSWER

Which are common capabilities of computer vision solutions?

Select ALL that apply.

A. Object detection
B. Image classification
C. OCR
D. Language translation
E. Facial analysis


Correct Answers

A. Object detection
B. Image classification
C. OCR
E. Facial analysis


Question 9

What does an Azure AI endpoint provide?

A. A network-accessible location for interacting with an AI service
B. A physical monitor connection
C. A database backup
D. A printer configuration


Correct Answer

A. A network-accessible location for interacting with an AI service


Question 10

Which AI workload is MOST associated with language translation?

A. Natural language processing
B. Regression
C. Forecasting
D. Clustering


Correct Answer

A. Natural language processing


Question 11

FILL IN THE BLANK

__________ identifies and locates objects within an image or video.


Correct Answer

Object detection


Question 12

A company wants an AI solution that can generate summaries of long documents.

Which AI capability should they use?

A. Text summarization
B. OCR
C. Regression
D. Forecasting


Correct Answer

A. Text summarization


Question 13

Which statement about multimodal AI models is TRUE?

A. They can process multiple content types such as text and images
B. They only process spreadsheets
C. They cannot analyze images
D. They only work with speech input


Correct Answer

A. They can process multiple content types such as text and images


Question 14

You are building an AI solution that extracts invoice numbers and due dates from scanned invoices.

Which technologies are MOST useful?

A. OCR and entity extraction
B. Forecasting and regression
C. Clustering and translation
D. Speech synthesis and object detection


Correct Answer

A. OCR and entity extraction


Question 15

MULTIPLE ANSWER

Which factors can reduce the accuracy of AI vision systems?

Select ALL that apply.

A. Poor lighting
B. Low-resolution images
C. Blurry images
D. Clear high-quality images
E. Obstructed objects


Correct Answers

A. Poor lighting
B. Low-resolution images
C. Blurry images
E. Obstructed objects


Question 16

Which Responsible AI principle focuses on ensuring AI systems work consistently and safely?

A. Reliability and safety
B. Transparency
C. Inclusiveness
D. Fairness


Correct Answer

A. Reliability and safety


Question 17

You deploy a model in Azure AI Foundry.

What is commonly required for applications to securely access the model?

A. Authentication credentials
B. A USB cable
C. A local printer
D. Spreadsheet macros


Correct Answer

A. Authentication credentials


Question 18

HOTSPOT / MATCHING

Match the workload to the correct scenario.

ScenarioWorkload
Predicting future sales revenue?
Detecting emotions in reviews?
Identifying products in store images?

Options:

  • Sentiment analysis
  • Regression
  • Object detection

Correct Answers

ScenarioWorkload
Predicting future sales revenueRegression
Detecting emotions in reviewsSentiment analysis
Identifying products in store imagesObject detection

Question 19

Which AI capability generates written descriptions of images?

A. Image captioning
B. OCR
C. Regression
D. Translation


Correct Answer

A. Image captioning


Question 20

Which statement about hallucinations in generative AI is TRUE?

A. Hallucinations are always intentional
B. Hallucinations are fabricated or inaccurate outputs
C. Hallucinations improve model accuracy
D. Hallucinations only occur in image models


Correct Answer

B. Hallucinations are fabricated or inaccurate outputs


Question 21

A retailer wants to group shoppers based on purchasing patterns without predefined categories.

Which machine learning technique should be used?

A. Clustering
B. Classification
C. OCR
D. Regression


Correct Answer

A. Clustering


Question 22

MULTIPLE ANSWER

Which tasks are examples of information extraction?

Select ALL that apply.

A. Extracting names from documents
B. Reading text from images
C. Detecting keywords in audio
D. Predicting stock prices
E. Identifying invoice totals


Correct Answers

A. Extracting names from documents
B. Reading text from images
C. Detecting keywords in audio
E. Identifying invoice totals


Question 23

Which Responsible AI principle emphasizes that humans remain responsible for AI outcomes?

A. Accountability
B. Fairness
C. Inclusiveness
D. Reliability


Correct Answer

A. Accountability


Question 24

FILL IN THE BLANK

__________ converts written text into spoken audio.


Correct Answer

Speech synthesis


Question 25

Which AI capability would BEST help visually impaired users understand photos?

A. Image captioning
B. Regression
C. Clustering
D. Forecasting


Correct Answer

A. Image captioning


Question 26

A customer-service solution automatically identifies whether callers are angry or satisfied.

Which AI capability is being used?

A. Sentiment analysis
B. OCR
C. Image classification
D. Forecasting


Correct Answer

A. Sentiment analysis


Question 27

MULTIPLE ANSWER

Which are advantages of using cloud-based Azure AI services?

Select ALL that apply.

A. Scalability
B. Reduced infrastructure management
C. Access to pretrained models
D. Elimination of all AI errors
E. Faster deployment


Correct Answers

A. Scalability
B. Reduced infrastructure management
C. Access to pretrained models
E. Faster deployment


Question 28

You need an AI solution that can analyze both spoken words and visual content from videos.

Which type of AI system is MOST appropriate?

A. Multimodal AI
B. Regression-only AI
C. Clustering-only AI
D. Spreadsheet automation AI


Correct Answer

A. Multimodal AI


Question 29

Which statement about APIs in Azure AI solutions is TRUE?

A. APIs allow applications to communicate with AI services
B. APIs physically store images
C. APIs replace authentication
D. APIs only work offline


Correct Answer

A. APIs allow applications to communicate with AI services


Question 30

SCENARIO-BASED QUESTION

A healthcare organization wants an AI application that:

  • Extracts text from medical forms
  • Converts doctor dictation into text
  • Identifies medical equipment in images
  • Summarizes patient notes

Which AI capabilities are required?

A. OCR, speech recognition, object detection, and text summarization
B. Forecasting and clustering only
C. Regression and translation only
D. Speech synthesis only


Correct Answer

A. OCR, speech recognition, object detection, and text summarization


Explanation

The scenario requires multiple AI workloads:

  • OCR for extracting text from forms
  • Speech recognition for doctor dictation
  • Object detection for medical equipment images
  • Text summarization for patient notes

Go to the AI-901 Exam Prep Hub main page

AI-901: Microsoft Azure AI Fundamentals – Practice Exam #1 (30 Questions)


Question 1

Which type of AI workload is primarily used to predict future numeric values?

A. Computer vision
B. Regression
C. Classification
D. Natural language processing


Correct Answer

B. Regression


Explanation

Regression predicts continuous numeric values such as sales forecasts, temperatures, or stock prices.

Why the Other Answers Are Incorrect

  • A. Computer vision analyzes images and video.
  • C. Classification predicts categories rather than numeric values.
  • D. Natural language processing focuses on text and language.

Question 2

You need to determine whether customer feedback is positive, negative, or neutral.

Which AI capability should you use?

A. OCR
B. Object detection
C. Sentiment analysis
D. Speech synthesis


Correct Answer

C. Sentiment analysis


Explanation

Sentiment analysis evaluates emotional tone in text.


Question 3

Which Responsible AI principle focuses on ensuring AI systems treat people equitably?

A. Transparency
B. Fairness
C. Accountability
D. Reliability


Correct Answer

B. Fairness


Question 4

You are building a chatbot that answers customer questions.

Which type of AI workload is MOST appropriate?

A. Generative AI
B. Regression
C. Clustering
D. Forecasting


Correct Answer

A. Generative AI


Explanation

Generative AI models can generate human-like conversational responses.


Question 5

HOTSPOT / MATCHING

Match the AI capability to the correct scenario.

ScenarioCapability
Detecting handwritten text in scanned forms?
Identifying objects in an image?
Converting speech into text?

Options:

  • OCR
  • Speech recognition
  • Object detection

Correct Answers

ScenarioCapability
Detecting handwritten text in scanned formsOCR
Identifying objects in an imageObject detection
Converting speech into textSpeech recognition

Question 6

Which Azure AI capability generates spoken audio from text?

A. Speech recognition
B. Speech synthesis
C. OCR
D. Translation


Correct Answer

B. Speech synthesis


Question 7

You want to create an AI application that analyzes invoices and extracts totals and dates.

Which capability should you use?

A. Object detection
B. OCR and entity extraction
C. Speech synthesis
D. Classification only


Correct Answer

B. OCR and entity extraction


Explanation

Invoices contain text and structured information that can be extracted using OCR and entity extraction.


Question 8

MULTIPLE ANSWER

Which are common Responsible AI principles promoted by Microsoft?

Select ALL that apply.

A. Fairness
B. Transparency
C. Accountability
D. Exclusiveness
E. Reliability and safety


Correct Answers

A. Fairness
B. Transparency
C. Accountability
E. Reliability and safety


Explanation

Microsoft’s Responsible AI principles include:

  • Fairness
  • Reliability and safety
  • Privacy and security
  • Inclusiveness
  • Transparency
  • Accountability

Question 9

What is the PRIMARY purpose of a system prompt in generative AI?

A. To define the behavior and rules for the AI model
B. To increase internet speed
C. To encrypt databases
D. To replace APIs


Correct Answer

A. To define the behavior and rules for the AI model


Question 10

You need to identify cars, bicycles, and pedestrians in traffic-camera footage.

Which AI capability should you use?

A. OCR
B. Object detection
C. Sentiment analysis
D. Translation


Correct Answer

B. Object detection


Question 11

FILL IN THE BLANK

__________ converts spoken language into machine-readable text.


Correct Answer

Speech recognition


Question 12

Which statement about generative AI models is TRUE?

A. They only analyze spreadsheets
B. They can generate new content such as text and images
C. They cannot process natural language
D. They only work offline


Correct Answer

B. They can generate new content such as text and images


Question 13

You are designing an AI solution for visually impaired users that describes images aloud.

Which capability is MOST appropriate?

A. Image captioning
B. Forecasting
C. Regression
D. Clustering


Correct Answer

A. Image captioning


Question 14

Which authentication method helps secure access to Azure AI services?

A. API keys
B. Printer drivers
C. HDMI cables
D. Browser bookmarks


Correct Answer

A. API keys


Question 15

MULTIPLE ANSWER

Which tasks are examples of natural language processing (NLP)?

Select ALL that apply.

A. Language translation
B. Sentiment analysis
C. Image classification
D. Text summarization
E. Entity extraction


Correct Answers

A. Language translation
B. Sentiment analysis
D. Text summarization
E. Entity extraction


Question 16

Which AI workload predicts categories such as “approved” or “denied”?

A. Regression
B. Classification
C. Clustering
D. Computer vision


Correct Answer

B. Classification


Question 17

You are using Azure AI Foundry to deploy a generative AI model.

What must happen before applications can interact with the model?

A. The model must be deployed to an endpoint
B. The model must be printed
C. The operating system must be replaced
D. The database must be deleted


Correct Answer

A. The model must be deployed to an endpoint


Question 18

HOTSPOT / MATCHING

Match each workload with the correct example.

WorkloadExample
Speech AI?
Computer Vision?
Generative AI?

Options:

  • Detecting objects in images
  • Generating marketing text
  • Transcribing audio recordings

Correct Answers

WorkloadExample
Speech AITranscribing audio recordings
Computer VisionDetecting objects in images
Generative AIGenerating marketing text

Question 19

What is a hallucination in generative AI?

A. A hardware failure
B. A networking issue
C. An incorrect or fabricated AI-generated response
D. A database backup


Correct Answer

C. An incorrect or fabricated AI-generated response


Question 20

Which factor can reduce speech-recognition accuracy?

A. Background noise
B. High-quality microphones
C. Clear pronunciation
D. Stable internet connections


Correct Answer

A. Background noise


Question 21

You need to group customers into segments based on purchasing behavior without predefined labels.

Which machine learning technique should you use?

A. Classification
B. Regression
C. Clustering
D. OCR


Correct Answer

C. Clustering


Question 22

MULTIPLE ANSWER

Which capabilities are associated with Azure AI Speech services?

Select ALL that apply.

A. Speech recognition
B. Speech synthesis
C. Translation
D. Object detection
E. Speaker identification


Correct Answers

A. Speech recognition
B. Speech synthesis
C. Translation
E. Speaker identification


Question 23

Which Responsible AI principle emphasizes explaining how AI systems make decisions?

A. Transparency
B. Privacy
C. Inclusiveness
D. Reliability


Correct Answer

A. Transparency


Question 24

FILL IN THE BLANK

__________ extracts machine-readable text from images and scanned documents.


Correct Answer

OCR

or

Optical Character Recognition


Question 25

A company wants to automatically summarize long customer-support conversations.

Which AI capability should they use?

A. Text summarization
B. Object detection
C. Forecasting
D. Regression


Correct Answer

A. Text summarization


Question 26

You need an AI system that can understand both images and text prompts.

Which type of model should you use?

A. Multimodal model
B. Regression model
C. Clustering model
D. Time-series model


Correct Answer

A. Multimodal model


Question 27

MULTIPLE ANSWER

Which are benefits of cloud-based AI services?

Select ALL that apply.

A. Scalability
B. Reduced infrastructure management
C. Automatic access to pretrained models
D. Elimination of all security concerns
E. Faster deployment


Correct Answers

A. Scalability
B. Reduced infrastructure management
C. Automatic access to pretrained models
E. Faster deployment


Question 28

You are creating a lightweight application that sends images to Azure AI services for analysis.

How does the application typically communicate with the service?

A. Through APIs and endpoints
B. Through printer drivers
C. Through USB storage devices
D. Through monitor settings


Correct Answer

A. Through APIs and endpoints


Question 29

Which AI capability is MOST useful for detecting the emotional tone of customer reviews?

A. OCR
B. Sentiment analysis
C. Image classification
D. Speech synthesis


Correct Answer

B. Sentiment analysis


Question 30

SCENARIO-BASED QUESTION

A retail company wants an AI solution that:

  • Extracts text from receipts
  • Detects products in shelf images
  • Analyzes customer-service calls
  • Generates chatbot responses

Which AI workloads are required?

A. OCR, object detection, speech AI, and generative AI
B. Regression only
C. Classification only
D. Forecasting and clustering only


Correct Answer

A. OCR, object detection, speech AI, and generative AI


Explanation

The scenario requires multiple AI capabilities:

  • OCR for receipt text extraction
  • Object detection for shelf-image analysis
  • Speech AI for customer-call analysis
  • Generative AI for chatbot responses

Build a lightweight application with Information Extraction capabilities by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
--> Implement AI solutions for information extraction by using Foundry
--> Build a lightweight application with Information Extraction capabilities by using Content Understanding


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern organizations often need applications that can automatically extract information from documents, images, audio, and video. Azure AI services and Microsoft Foundry tools make it possible to create lightweight applications that use AI-powered content understanding without requiring advanced machine learning expertise.

For the AI-901 certification exam, candidates should understand the foundational concepts involved in building lightweight applications with information extraction capabilities by using Azure Content Understanding and Microsoft Foundry.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.


What Is Information Extraction?

Information extraction is the process of automatically identifying and retrieving useful data from content.

AI systems can extract information from:

  • Documents
  • Images
  • Audio
  • Video
  • Text

Examples include:

  • Names
  • Dates
  • Invoice totals
  • Keywords
  • Objects
  • Spoken words

What Is Azure Content Understanding?

Azure Content Understanding enables AI-powered analysis of different types of content.

Capabilities include:

  • OCR (Optical Character Recognition)
  • Speech recognition
  • Entity extraction
  • Image analysis
  • Video analysis
  • Classification
  • Caption generation

What Is a Lightweight Application?

A lightweight application is a simple application that performs focused tasks using cloud-based AI services.

Characteristics include:

  • Minimal infrastructure
  • API-based communication
  • Rapid development
  • Simple user interface
  • Cloud-hosted AI processing

For AI-901, candidates should understand concepts and workflows rather than advanced coding details.


Azure AI Foundry

Azure AI Foundry provides tools for building and testing AI applications.

Developers can:

  • Access AI models
  • Configure services
  • Test prompts
  • Analyze content
  • Build AI-powered workflows

Common Information Extraction Capabilities


OCR (Optical Character Recognition)

OCR extracts text from images and scanned documents.


Example

Input

Photo of a receipt

Output

  • Store name
  • Total amount
  • Purchase date

Entity Extraction

AI systems can identify important entities within content.


Examples of Entities

  • Names
  • Locations
  • Organizations
  • Phone numbers
  • Dates

Speech Recognition

Speech recognition converts spoken language into text.


Example

Input

Customer support call recording

Output

Searchable transcript


Object Detection

Object detection identifies objects within images or video.


Example

A warehouse-monitoring application may detect:

  • Boxes
  • Forklifts
  • Employees

Sentiment Analysis

Sentiment analysis determines emotional tone.


Example

Customer feedback classified as:

  • Positive
  • Neutral
  • Negative

Typical Lightweight Application Workflow

A lightweight information-extraction application often follows these steps:

  1. User uploads content
  2. Application sends content to Azure AI service
  3. AI analyzes content
  4. Structured results are returned
  5. Application displays extracted information

Example Workflow

User uploads:

  • Image
  • PDF
  • Audio file
  • Video file

AI extracts:

  • Text
  • Keywords
  • Objects
  • Entities
  • Captions

APIs and Endpoints

Applications communicate with Azure AI services through:

  • APIs
  • Endpoints

The application sends content to the AI service and receives structured results.


Authentication

Applications must authenticate securely before using Azure AI services.

Common authentication methods include:

  • API keys
  • Azure credentials
  • Managed identities

Example High-Level Pseudocode

content = upload_file()
results = analyze_content(content)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.


Structured Outputs

AI systems often return structured data formats such as:

  • JSON
  • Tables
  • Lists
  • Metadata

Structured outputs make integration easier.


Example JSON-Like Output

{
"invoiceNumber": "INV-1001",
"date": "2026-05-15",
"total": "$245.99"
}

Common Real-World Scenarios


Scenario 1: Invoice Processing

Goal

Automatically extract invoice data.

Extracted Information

  • Vendor name
  • Invoice number
  • Total amount
  • Due date

Scenario 2: Customer Service Analytics

Goal

Analyze customer interactions.

Extracted Information

  • Topics
  • Sentiment
  • Keywords
  • Transcripts

Scenario 3: Healthcare Document Analysis

Goal

Extract information from medical documents.

Extracted Information

  • Patient names
  • Dates
  • Medical terms

Scenario 4: Media Monitoring

Goal

Analyze audio and video content.

Extracted Information

  • Captions
  • Objects
  • Speakers
  • Keywords

Responsible AI Considerations

Information-extraction applications should follow Responsible AI principles.

Key considerations include:

  • Privacy
  • Fairness
  • Transparency
  • Inclusiveness
  • Accountability
  • Security

Privacy Concerns

Content may contain:

  • Personal information
  • Financial records
  • Medical data
  • Private conversations

Organizations should secure sensitive data appropriately.


Fairness and Bias

AI systems may perform differently across:

  • Languages
  • Accents
  • Demographics
  • Image quality
  • Environmental conditions

Testing and evaluation are important.


Transparency

Users should understand:

  • AI is analyzing their content
  • AI-generated outputs may contain errors
  • Human review may still be needed

Accuracy Limitations

Information-extraction systems may struggle with:

  • Blurry images
  • Poor audio quality
  • Handwritten text
  • Background noise
  • Low-resolution files

Hallucinations and Errors

AI systems may occasionally:

  • Extract incorrect information
  • Misidentify objects
  • Misinterpret speech
  • Generate inaccurate summaries

Applications should validate important outputs.


Error Handling

Applications should handle:

  • Unsupported file formats
  • Corrupted files
  • Authentication failures
  • Network interruptions
  • Rate limits

Advantages of Lightweight AI Applications

Benefits include:

  • Rapid deployment
  • Reduced development complexity
  • Scalability
  • Automation
  • Faster information processing

Limitations of Lightweight AI Applications

Challenges include:

  • Dependence on cloud services
  • Accuracy limitations
  • Privacy concerns
  • Potential bias
  • Environmental variability

Multimodal AI

Modern AI systems can combine:

  • Text
  • Speech
  • Vision
  • Generative AI

These systems can process multiple content types together.


High-Level Architecture

A simplified architecture often includes:

  1. User uploads content
  2. Application sends content to Azure AI service
  3. AI analyzes content
  4. Structured results are returned
  5. Application displays extracted information

Important AI-901 Exam Tips

For the exam, remember these key points:

  • Information extraction retrieves useful data from content.
  • OCR extracts text from images and documents.
  • Speech recognition converts speech into text.
  • Object detection identifies objects within images or video.
  • APIs and endpoints connect applications to Azure AI services.
  • Authentication secures access to AI resources.
  • Structured outputs often use JSON-like formats.
  • Responsible AI principles apply to information extraction systems.
  • Poor-quality content can reduce accuracy.
  • Hallucinations are inaccurate AI-generated outputs.
  • Azure AI Foundry supports AI application development.

Quick Knowledge Check

Question 1

What does OCR do?

Answer

Extracts text from images and scanned documents.


Question 2

What does speech recognition do?

Answer

Converts spoken language into text.


Question 3

Why is authentication important?

Answer

It secures access to Azure AI services.


Question 4

What can reduce information-extraction accuracy?

Answer

Poor-quality images, background noise, and blurry documents.


Practice Exam Questions

Exam: AI-901

Topic: Build a Lightweight Application with Information Extraction Capabilities by Using Content Understanding


Question 1

What is the PRIMARY purpose of information extraction in AI applications?

A. To automatically retrieve useful data from content
B. To increase internet speed
C. To replace operating systems
D. To improve monitor resolution


Correct Answer

A. To automatically retrieve useful data from content


Explanation

Information extraction uses AI to identify and retrieve meaningful data from documents, images, audio, video, and text.


Why the Other Answers Are Incorrect

B. To increase internet speed

Information extraction does not improve networking performance.

C. To replace operating systems

AI extraction tools do not replace operating systems.

D. To improve monitor resolution

This is unrelated to AI information extraction.


Question 2

What does OCR stand for?

A. Optical Character Recognition
B. Open Cloud Routing
C. Operational Content Reporting
D. Object Classification Retrieval


Correct Answer

A. Optical Character Recognition


Explanation

OCR extracts machine-readable text from images and scanned documents.


Why the Other Answers Are Incorrect

B. Open Cloud Routing

This is not an OCR term.

C. Operational Content Reporting

This is unrelated to text extraction.

D. Object Classification Retrieval

This is not the meaning of OCR.


Question 3

Which AI capability converts spoken language into text?

A. Speech recognition
B. Image classification
C. Speech synthesis
D. Object detection


Correct Answer

A. Speech recognition


Explanation

Speech recognition transcribes spoken words into text.


Why the Other Answers Are Incorrect

B. Image classification

This categorizes images.

C. Speech synthesis

This converts text into spoken audio.

D. Object detection

This identifies objects within images or video.


Question 4

What is a lightweight AI application?

A. A simple application that uses cloud AI services for focused tasks
B. A hardware-only system
C. A networking device
D. A spreadsheet management tool


Correct Answer

A. A simple application that uses cloud AI services for focused tasks


Explanation

Lightweight applications typically use APIs and cloud services to provide AI capabilities without requiring complex infrastructure.


Why the Other Answers Are Incorrect

B. A hardware-only system

Lightweight AI apps commonly use cloud services.

C. A networking device

Networking devices are unrelated.

D. A spreadsheet management tool

This is unrelated to AI application design.


Question 5

How do lightweight AI applications commonly communicate with Azure AI services?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor settings
D. Through USB-only connections


Correct Answer

A. Through APIs and endpoints


Explanation

Applications use APIs and endpoints to send content to Azure AI services and receive analysis results.


Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to Azure AI communication.

C. Through monitor settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud AI services use network communication.


Question 6

Why is authentication important in Azure AI applications?

A. To secure access to AI resources
B. To improve image brightness
C. To increase network speed
D. To improve speaker volume


Correct Answer

A. To secure access to AI resources


Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.


Why the Other Answers Are Incorrect

B. To improve image brightness

Authentication does not affect image quality.

C. To increase network speed

Authentication does not improve networking.

D. To improve speaker volume

Authentication does not affect audio playback.


Question 7

Which format is commonly used for structured AI output data?

A. JSON
B. JPEG
C. MP3
D. ZIP


Correct Answer

A. JSON


Explanation

AI systems often return structured data in JSON-like formats for easy application integration.


Why the Other Answers Are Incorrect

B. JPEG

JPEG is an image format.

C. MP3

MP3 is an audio format.

D. ZIP

ZIP is a compressed archive format.


Question 8

Which factor can reduce information-extraction accuracy?

A. Poor-quality input content
B. Spreadsheet formatting
C. Keyboard layout changes
D. Screen brightness settings


Correct Answer

A. Poor-quality input content


Explanation

Blurry images, poor audio quality, and noisy environments can negatively affect AI extraction accuracy.


Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect AI extraction services.

C. Keyboard layout changes

This is unrelated to AI analysis.

D. Screen brightness settings

This does not affect AI processing accuracy.


Question 9

Which Responsible AI concern is especially important for information extraction applications?

A. Protecting sensitive personal data
B. Increasing printer performance
C. Improving spreadsheet formulas
D. Reducing monitor power usage


Correct Answer

A. Protecting sensitive personal data


Explanation

Extracted content may contain financial, medical, or personal information that must be protected securely.


Why the Other Answers Are Incorrect

B. Increasing printer performance

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to information extraction.

D. Reducing monitor power usage

This is unrelated to AI ethics.


Question 10

What are hallucinations in AI information-extraction systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Network outages
D. Operating system crashes


Correct Answer

A. Incorrect or fabricated AI-generated outputs


Explanation

Hallucinations occur when AI systems generate inaccurate extracted information, captions, summaries, or identifications.


Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated outputs.

C. Network outages

This is a connectivity issue.

D. Operating system crashes

This is unrelated to AI hallucinations.


Final Thoughts

Building lightweight applications with information extraction capabilities is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, speech recognition, APIs, authentication, structured outputs, Responsible AI principles, and lightweight AI workflows.

Azure AI services and Azure AI Foundry provide powerful tools for creating scalable applications capable of extracting valuable information from text, images, audio, video, and documents.


Go to the AI-901 Exam Prep Hub main page

Extract information from audio and video by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
--> Implement AI solutions for information extraction by using Foundry
--> Extract information from audio and video by using Content Understanding


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Organizations increasingly rely on AI systems to analyze audio and video content for automation, accessibility, security, analytics, and customer experiences. AI-powered content understanding solutions can extract valuable information from spoken language, sounds, images, and moving video streams.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from audio and video by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.


What Is Content Understanding?

Content understanding refers to AI systems analyzing and interpreting different forms of content, including:

  • Audio
  • Video
  • Images
  • Documents
  • Text

AI systems can identify patterns, extract information, and generate useful insights.


Azure Content Understanding

Azure Content Understanding enables AI-powered analysis of multimedia content.

Capabilities include:

  • Speech recognition
  • Video analysis
  • Speaker identification
  • Caption generation
  • Object detection
  • Keyword extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI applications.

Developers can:

  • Deploy AI services
  • Process multimedia content
  • Build lightweight applications
  • Test AI workflows

Audio Information Extraction

AI systems can analyze audio files to extract useful information.

Examples include:

  • Spoken words
  • Speaker identity
  • Keywords
  • Emotions
  • Language detection

Speech Recognition

Speech recognition converts spoken language into text.


Example

Input

Audio recording of a meeting

Output

Meeting transcript


Speaker Identification

AI systems can distinguish between different speakers.


Example

A meeting transcription may identify:

  • Speaker 1
  • Speaker 2
  • Speaker 3

Language Detection

AI systems can identify the spoken language within audio content.


Example

An AI system determines whether audio is:

  • English
  • Spanish
  • French
  • Japanese

Keyword Extraction

AI systems can identify important terms within conversations.


Example

A customer support call may extract:

  • Product names
  • Complaint topics
  • Order numbers

Sentiment Analysis

AI systems can analyze emotional tone in speech.


Example

A customer call may be classified as:

  • Positive
  • Neutral
  • Negative

Video Information Extraction

Video analysis combines:

  • Audio analysis
  • Image analysis
  • Motion analysis

Common Video Analysis Capabilities

AI systems may perform:

  • Object detection
  • Facial analysis
  • Activity recognition
  • Scene description
  • Text extraction
  • Caption generation

Object Detection in Video

AI systems can identify objects appearing in video frames.


Example

A traffic-monitoring system may detect:

  • Cars
  • Trucks
  • Pedestrians
  • Traffic lights

Scene Detection

AI systems can identify scene changes within videos.


Example

A sports video may identify:

  • Game start
  • Replay segments
  • Commercial breaks

Video Captioning

AI systems can generate descriptions or subtitles for videos.


Example

A training video may automatically generate captions for accessibility.


Optical Character Recognition (OCR) in Video

AI systems can extract text appearing in video frames.


Example

A video may contain:

  • Street signs
  • License plates
  • Product labels

APIs and Endpoints

Applications communicate with Azure AI services using:

  • APIs
  • Endpoints

Audio and video content is submitted programmatically for analysis.


Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

  • API keys
  • Azure credentials
  • Managed identities

Lightweight Application Workflow

A typical workflow includes:

  1. User uploads audio or video
  2. Application sends content to AI service
  3. AI analyzes multimedia content
  4. Results are returned
  5. Application displays extracted information

Example High-Level Pseudocode

media = upload_media()
results = analyze_media(media)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.


Common Real-World Scenarios


Scenario 1: Meeting Transcription

Goal

Convert meeting audio into searchable text.

Features

  • Speech recognition
  • Speaker identification
  • Keyword extraction

Scenario 2: Call Center Analytics

Goal

Analyze customer service calls.

Features

  • Sentiment analysis
  • Topic extraction
  • Call summarization

Scenario 3: Security Monitoring

Goal

Analyze surveillance video.

Features

  • Object detection
  • Activity recognition
  • Facial analysis

Scenario 4: Video Accessibility

Goal

Improve accessibility for multimedia content.

Features

  • Caption generation
  • Speech transcription
  • Scene descriptions

Responsible AI Considerations

Audio and video AI systems should follow Responsible AI principles.

Key considerations include:

  • Privacy
  • Fairness
  • Transparency
  • Inclusiveness
  • Accountability
  • Security

Privacy Concerns

Audio and video may contain:

  • Personal conversations
  • Faces
  • Biometric data
  • Sensitive information

Organizations should protect multimedia data appropriately.


Fairness and Bias

Speech and video systems may perform differently across:

  • Languages
  • Accents
  • Dialects
  • Lighting conditions
  • Demographics

Testing and evaluation are important.


Transparency

Users should understand:

  • AI is analyzing multimedia content
  • AI-generated outputs may contain errors
  • Human review may still be needed

Accuracy Limitations

Audio and video analysis systems may struggle with:

  • Background noise
  • Poor audio quality
  • Low-resolution video
  • Obstructed visuals
  • Multiple overlapping speakers

Hallucinations and Errors

AI systems may occasionally:

  • Misidentify speakers
  • Generate inaccurate captions
  • Misinterpret speech
  • Detect nonexistent objects

Applications should validate important outputs.


Error Handling

Applications should handle:

  • Unsupported file formats
  • Corrupted media files
  • Authentication failures
  • Network interruptions
  • Rate limits

Advantages of Multimedia Information Extraction

Benefits include:

  • Automation
  • Faster analysis
  • Improved accessibility
  • Searchable content
  • Scalable processing

Limitations of Multimedia Information Extraction

Challenges include:

  • Privacy concerns
  • Accuracy limitations
  • Bias
  • Environmental variability
  • Ethical considerations

Multimodal AI

Modern AI systems may combine:

  • Speech
  • Vision
  • Text
  • Generative AI

These systems can:

  • Analyze multimedia content
  • Answer questions
  • Generate summaries
  • Create captions and descriptions

High-Level Architecture

A simplified architecture often includes:

  1. User uploads audio/video
  2. Application sends media to Azure AI service
  3. AI processes multimedia content
  4. Structured results are returned
  5. Application displays extracted information

Important AI-901 Exam Tips

For the exam, remember these key points:

  • Speech recognition converts speech to text.
  • Speaker identification distinguishes speakers.
  • Sentiment analysis detects emotional tone.
  • OCR can extract text from video frames.
  • Object detection identifies objects in video.
  • APIs and endpoints connect applications to AI services.
  • Authentication secures AI resources.
  • Responsible AI principles apply to multimedia AI systems.
  • Poor audio or video quality can reduce accuracy.
  • Hallucinations are inaccurate AI-generated outputs.
  • Azure AI Foundry supports multimedia AI application development.

Quick Knowledge Check

Question 1

What does speech recognition do?

Answer

Converts spoken language into text.


Question 2

What is speaker identification?

Answer

Distinguishing between different speakers in audio content.


Question 3

Why is authentication important?

Answer

It secures access to Azure AI services.


Question 4

What can reduce multimedia-analysis accuracy?

Answer

Background noise, low-quality audio, and poor video quality.


Practice Exam Questions

Exam: AI-901

Topic: Extract Information from Audio and Video by Using Content Understanding


Question 1

What is the PRIMARY purpose of content understanding in AI systems?

A. To analyze and interpret multimedia content such as audio and video
B. To increase internet bandwidth
C. To replace operating systems
D. To improve keyboard performance


Correct Answer

A. To analyze and interpret multimedia content such as audio and video


Explanation

Content understanding enables AI systems to analyze audio, video, images, and other forms of content to extract useful information.


Why the Other Answers Are Incorrect

B. To increase internet bandwidth

Content understanding does not improve networking speed.

C. To replace operating systems

AI multimedia analysis does not replace operating systems.

D. To improve keyboard performance

This is unrelated to AI content understanding.


Question 2

What does speech recognition do?

A. Converts spoken language into text
B. Converts images into audio
C. Encrypts media files
D. Repairs damaged videos


Correct Answer

A. Converts spoken language into text


Explanation

Speech recognition transcribes spoken words into machine-readable text.


Why the Other Answers Are Incorrect

B. Converts images into audio

This is unrelated to speech recognition.

C. Encrypts media files

Encryption is unrelated to speech transcription.

D. Repairs damaged videos

Speech recognition does not repair media files.


Question 3

Which AI capability identifies different speakers in an audio recording?

A. Speaker identification
B. OCR
C. Image classification
D. Object compression


Correct Answer

A. Speaker identification


Explanation

Speaker identification distinguishes between different speakers within audio content.


Why the Other Answers Are Incorrect

B. OCR

OCR extracts text from images.

C. Image classification

This categorizes images.

D. Object compression

This is not a multimedia AI capability.


Question 4

What is sentiment analysis used for in audio processing?

A. Detecting emotional tone in speech
B. Increasing audio volume
C. Compressing audio files
D. Repairing broken microphones


Correct Answer

A. Detecting emotional tone in speech


Explanation

Sentiment analysis identifies whether speech content is positive, negative, or neutral.


Why the Other Answers Are Incorrect

B. Increasing audio volume

This is unrelated to AI analysis.

C. Compressing audio files

Compression is unrelated to sentiment detection.

D. Repairing broken microphones

This is a hardware issue.


Question 5

Which AI capability can extract text from video frames?

A. OCR
B. Speech synthesis
C. Audio normalization
D. File compression


Correct Answer

A. OCR


Explanation

OCR can identify and extract text that appears visually within video frames.


Why the Other Answers Are Incorrect

B. Speech synthesis

This converts text into speech.

C. Audio normalization

This adjusts sound levels.

D. File compression

This reduces file size.


Question 6

How do lightweight multimedia-analysis applications typically communicate with Azure AI services?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor settings
D. Through USB-only connections


Correct Answer

A. Through APIs and endpoints


Explanation

Applications use APIs and endpoints to send audio and video content to Azure AI services for analysis.


Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to multimedia AI communication.

C. Through monitor settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud AI services use network communication.


Question 7

Why is authentication important when using Azure AI multimedia services?

A. To secure access to AI resources
B. To improve speaker volume
C. To increase internet speed
D. To improve video resolution


Correct Answer

A. To secure access to AI resources


Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.


Why the Other Answers Are Incorrect

B. To improve speaker volume

Authentication does not affect sound levels.

C. To increase internet speed

Authentication does not improve networking.

D. To improve video resolution

Authentication does not affect video quality.


Question 8

Which factor can reduce speech-recognition accuracy?

A. Background noise
B. Spreadsheet formatting
C. Keyboard layout changes
D. Monitor brightness


Correct Answer

A. Background noise


Explanation

Noise and poor audio quality can make it difficult for AI systems to correctly recognize speech.


Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect audio AI systems.

C. Keyboard layout changes

This is unrelated to speech recognition.

D. Monitor brightness

This does not affect audio analysis.


Question 9

Which Responsible AI concern is especially important for audio and video analysis systems?

A. Protecting sensitive personal information
B. Increasing printer speed
C. Improving spreadsheet formulas
D. Reducing file storage costs


Correct Answer

A. Protecting sensitive personal information


Explanation

Audio and video files may contain faces, voices, and personal conversations that require privacy protection.


Why the Other Answers Are Incorrect

B. Increasing printer speed

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to multimedia analysis.

D. Reducing file storage costs

This is not a Responsible AI principle.


Question 10

What are hallucinations in multimedia AI systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Network outages
D. Speaker hardware malfunctions


Correct Answer

A. Incorrect or fabricated AI-generated outputs


Explanation

Hallucinations occur when AI systems produce inaccurate captions, object detections, speaker identifications, or transcriptions.


Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated outputs.

C. Network outages

This is a connectivity issue.

D. Speaker hardware malfunctions

This is a hardware problem, not an AI hallucination.


Final Thoughts

Extracting information from audio and video by using Content Understanding is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as speech recognition, video analysis, OCR, APIs, authentication, Responsible AI principles, and lightweight multimedia-analysis workflows.

Azure AI services and Azure AI Foundry provide powerful tools for building intelligent multimedia applications capable of understanding spoken language, video content, and visual information at scale.


Go to the AI-901 Exam Prep Hub main page

Extract information from images by using Content Understanding (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
--> Implement AI solutions for information extraction by using Foundry
--> Extract information from images by using Content Understanding


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Modern AI systems can analyze images and extract meaningful information automatically. Organizations use image analysis solutions for automation, accessibility, security, healthcare, retail, and business intelligence.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from images by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.


What Is Image Information Extraction?

Image information extraction is the process of analyzing images to identify and retrieve useful information.

AI systems can detect:

  • Text
  • Objects
  • Faces
  • Colors
  • Products
  • Landmarks
  • Visual patterns

What Is Azure Content Understanding?

Azure Content Understanding enables AI systems to interpret and analyze content such as:

  • Images
  • Documents
  • Audio
  • Video

Capabilities include:

  • OCR
  • Object detection
  • Classification
  • Caption generation
  • Metadata extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

  • Access AI models
  • Analyze images
  • Build lightweight applications
  • Test AI workflows

Common Image Extraction Techniques


Optical Character Recognition (OCR)

OCR extracts text from images.


Example

Image

Photo of a street sign

OCR Output

“Main Street”


Object Detection

Object detection identifies objects and their locations within images.


Example

Detected Objects

  • Car
  • Bicycle
  • Traffic light
  • Person

Image Classification

Image classification determines the overall category of an image.


Example

Image

Photo of a cat

Classification

“Cat”


Facial Analysis

AI systems can analyze facial characteristics.

Capabilities may include:

  • Face detection
  • Emotion analysis
  • Age estimation

Responsible AI considerations are especially important for facial-analysis systems.


Image Captioning

Image captioning generates natural-language descriptions of images.


Example

Image

A dog running on a beach

Caption

“A brown dog running along a sandy beach.”


Metadata Extraction

AI systems can extract metadata and contextual information from images.

Examples include:

  • Time
  • Location
  • Camera details
  • Image dimensions

Barcode and QR Code Detection

AI systems can identify and decode:

  • Barcodes
  • QR codes

Example

Retail applications may scan product barcodes for inventory management.


APIs and Endpoints

Applications communicate with Azure AI services using:

  • APIs
  • Endpoints

Images are submitted programmatically for analysis.


Authentication

Applications must securely authenticate before accessing AI services.

Common methods include:

  • API keys
  • Azure credentials
  • Managed identities

Lightweight Application Workflow

A typical workflow includes:

  1. User uploads image
  2. Application sends image to AI service
  3. AI analyzes image
  4. Results are returned
  5. Application displays extracted information

Example High-Level Pseudocode

image = upload_image()
results = analyze_image(image)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.


Common Real-World Scenarios


Scenario 1: Receipt Scanner

Goal

Extract purchase details from receipt images.

Features

  • OCR
  • Table extraction
  • Total amount detection

Scenario 2: Accessibility Assistant

Goal

Describe images for visually impaired users.

Features

  • Image captioning
  • OCR
  • Object detection

Scenario 3: Retail Inventory

Goal

Identify products from shelf images.

Features

  • Barcode scanning
  • Object detection
  • Classification

Scenario 4: Traffic Monitoring

Goal

Analyze roadway images.

Features

  • Vehicle detection
  • Traffic analysis
  • License plate reading

Responsible AI Considerations

Image-analysis applications should follow Responsible AI principles.

Key considerations include:

  • Privacy
  • Fairness
  • Transparency
  • Inclusiveness
  • Accountability
  • Security

Privacy Concerns

Images may contain:

  • Faces
  • Personal information
  • License plates
  • Sensitive documents

Organizations should protect image data appropriately.


Fairness and Bias

Vision systems may perform differently across:

  • Lighting conditions
  • Skin tones
  • Environmental conditions
  • Camera quality

Testing and evaluation are important.


Transparency

Users should understand:

  • AI is analyzing images
  • AI-generated outputs may contain errors
  • Images may be processed in the cloud

Accuracy Limitations

Image extraction systems may struggle with:

  • Blurry images
  • Poor lighting
  • Obstructed objects
  • Low-resolution images

Hallucinations and Errors

AI systems may occasionally:

  • Misidentify objects
  • Generate incorrect captions
  • Extract inaccurate text

Applications should validate important outputs.


Error Handling

Applications should handle:

  • Unsupported image formats
  • Corrupted files
  • Authentication failures
  • Network interruptions
  • Rate limits

Advantages of Image Extraction AI

Benefits include:

  • Faster processing
  • Automation
  • Scalability
  • Accessibility improvements
  • Reduced manual work

Limitations of Image Extraction AI

Challenges include:

  • Accuracy limitations
  • Bias
  • Privacy concerns
  • Environmental variability
  • Ethical considerations

Multimodal AI

Some modern AI systems combine:

  • Vision
  • Text
  • Speech
  • Generative AI

These systems can:

  • Analyze images
  • Answer visual questions
  • Generate descriptions
  • Create new content

High-Level Architecture

A simplified architecture often includes:

  1. User uploads image
  2. Application sends image to Azure AI service
  3. AI processes image
  4. Structured results are returned
  5. Application displays information

Important AI-901 Exam Tips

For the exam, remember these key points:

  • OCR extracts text from images.
  • Object detection identifies objects and locations.
  • Image classification categorizes images.
  • Image captioning generates natural-language descriptions.
  • APIs and endpoints connect applications to AI services.
  • Authentication secures access to AI resources.
  • Responsible AI principles apply to image-analysis systems.
  • Poor image quality can reduce accuracy.
  • Hallucinations are inaccurate AI-generated outputs.
  • Azure AI Foundry supports AI application development.

Quick Knowledge Check

Question 1

What does OCR do?

Answer

Extracts machine-readable text from images.


Question 2

What is object detection?

Answer

Identifying and locating objects within an image.


Question 3

Why is authentication important?

Answer

It secures access to Azure AI services.


Question 4

What can reduce image-analysis accuracy?

Answer

Poor lighting, blur, and low-resolution images.


Practice Exam Questions

Exam: AI-901

Topic: Extract Information from Images by Using Content Understanding


Question 1

What is the PRIMARY purpose of image information extraction?

A. To analyze images and retrieve useful information
B. To increase internet bandwidth
C. To manage operating systems
D. To improve printer performance


Correct Answer

A. To analyze images and retrieve useful information


Explanation

Image information extraction uses AI to identify and retrieve meaningful data from images, such as text, objects, and visual patterns.


Why the Other Answers Are Incorrect

B. To increase internet bandwidth

Image analysis does not affect networking speed.

C. To manage operating systems

This is unrelated to computer vision.

D. To improve printer performance

Printers are unrelated to AI image extraction.


Question 2

What does OCR stand for?

A. Optical Character Recognition
B. Open Content Routing
C. Object Classification Reporting
D. Operational Cloud Rendering


Correct Answer

A. Optical Character Recognition


Explanation

OCR extracts machine-readable text from images and scanned documents.


Why the Other Answers Are Incorrect

B. Open Content Routing

This is not the meaning of OCR.

C. Object Classification Reporting

This is unrelated to text extraction.

D. Operational Cloud Rendering

This is not an OCR term.


Question 3

Which computer vision capability identifies multiple objects and their locations within an image?

A. Object detection
B. Speech synthesis
C. Text summarization
D. Audio transcription


Correct Answer

A. Object detection


Explanation

Object detection identifies objects and determines where they appear within an image.


Why the Other Answers Are Incorrect

B. Speech synthesis

This converts text into speech.

C. Text summarization

This is a text-analysis task.

D. Audio transcription

This converts speech into text.


Question 4

What is image classification?

A. Categorizing an image based on its contents
B. Compressing image file sizes
C. Encrypting image data
D. Converting images into spreadsheets


Correct Answer

A. Categorizing an image based on its contents


Explanation

Image classification determines the overall category or subject represented in an image.


Why the Other Answers Are Incorrect

B. Compressing image file sizes

Compression is unrelated to classification.

C. Encrypting image data

Encryption is unrelated to image categorization.

D. Converting images into spreadsheets

This is unrelated to computer vision.


Question 5

What does image captioning do?

A. Generates natural-language descriptions of images
B. Repairs corrupted image files
C. Converts speech into text
D. Improves internet speeds


Correct Answer

A. Generates natural-language descriptions of images


Explanation

Image captioning creates descriptive text that explains the contents of an image.


Why the Other Answers Are Incorrect

B. Repairs corrupted image files

This is unrelated to caption generation.

C. Converts speech into text

This is speech recognition.

D. Improves internet speeds

This is unrelated to AI image analysis.


Question 6

How do lightweight image-analysis applications typically communicate with Azure AI services?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor settings
D. Through USB-only connections


Correct Answer

A. Through APIs and endpoints


Explanation

Applications send images to cloud AI services through APIs and service endpoints.


Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to AI communication.

C. Through monitor settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud services use network communication.


Question 7

Why is authentication important when using Azure AI services?

A. To secure access to AI resources
B. To improve image brightness
C. To reduce image resolution
D. To increase network speed


Correct Answer

A. To secure access to AI resources


Explanation

Authentication ensures that only authorized users and applications can access Azure AI services.


Why the Other Answers Are Incorrect

B. To improve image brightness

Authentication does not affect image quality.

C. To reduce image resolution

Authentication is unrelated to image resolution.

D. To increase network speed

Authentication does not improve internet performance.


Question 8

Which Responsible AI concern is especially important for image-analysis systems?

A. Protecting personal and sensitive visual information
B. Increasing printer speed
C. Improving spreadsheet formulas
D. Reducing monitor power usage


Correct Answer

A. Protecting personal and sensitive visual information


Explanation

Images may contain sensitive information such as faces, license plates, and documents that must be protected.


Why the Other Answers Are Incorrect

B. Increasing printer speed

This is unrelated to Responsible AI.

C. Improving spreadsheet formulas

This is unrelated to image analysis.

D. Reducing monitor power usage

This is unrelated to AI ethics.


Question 9

Which factor can reduce image-analysis accuracy?

A. Poor image quality
B. Spreadsheet formatting
C. Keyboard layout changes
D. Audio playback speed


Correct Answer

A. Poor image quality


Explanation

Blur, poor lighting, and low-resolution images can negatively affect AI analysis accuracy.


Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect image AI systems.

C. Keyboard layout changes

This is unrelated to computer vision.

D. Audio playback speed

This is unrelated to image processing.


Question 10

What are hallucinations in AI image-analysis systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Network outages
D. Audio recording problems


Correct Answer

A. Incorrect or fabricated AI-generated outputs


Explanation

Hallucinations occur when AI systems generate inaccurate captions, object identifications, or extracted information.


Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated outputs.

C. Network outages

This is a connectivity issue.

D. Audio recording problems

This is unrelated to image-analysis systems.


Final Thoughts

Extracting information from images by using Content Understanding is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, object detection, image classification, APIs, authentication, Responsible AI principles, and lightweight image-analysis workflows.

Azure AI services and Azure AI Foundry provide powerful tools for building scalable AI applications capable of understanding and extracting valuable information from visual content.


Go to the AI-901 Exam Prep Hub main page

Build a lightweight application that includes vision capabilities (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
--> Implement AI solutions with computer vision and image-generation capabilities by using Foundry
--> Build a lightweight application that includes vision capabilities


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Computer vision enables AI systems to interpret and analyze visual information such as images and videos. Organizations use computer vision solutions for automation, accessibility, security, analytics, and customer experiences.

For the AI-901 certification exam, candidates should understand the foundational concepts behind building lightweight applications that include vision capabilities by using Microsoft Azure AI services and Azure AI Foundry.

This topic falls under the “Implement AI solutions with computer vision and image-generation capabilities by using Foundry” section of the AI-901 exam objectives.


What Is Computer Vision?

Computer vision is a field of AI that enables systems to analyze and understand visual information.

Visual data may include:

  • Images
  • Videos
  • Scanned documents
  • Camera feeds

Common Computer Vision Tasks

Computer vision systems commonly perform:

  • Image classification
  • Object detection
  • Optical character recognition (OCR)
  • Facial analysis
  • Image captioning
  • Content moderation

Azure AI Vision

Azure AI Vision provides computer vision capabilities through cloud-based AI services.

Features include:

  • Image analysis
  • OCR
  • Object detection
  • Image captioning
  • Facial attribute analysis

What Is a Lightweight Application?

A lightweight application is a simple application designed to perform focused tasks with minimal complexity and infrastructure.

Characteristics include:

  • Simple user interface
  • Fast deployment
  • Minimal resource usage
  • Easy maintenance

Examples of Lightweight Vision Applications

Examples include:

  • Image analysis tools
  • Receipt scanning apps
  • Accessibility assistants
  • Product recognition apps
  • Photo-tagging systems

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

  • Access AI models
  • Deploy services
  • Test prompts
  • Build AI workflows

Image Classification

Image classification identifies the main category or subject of an image.


Example

Image

Photo of a bicycle

Classification

“Bicycle”


Object Detection

Object detection identifies multiple objects and their locations within an image.


Example

Image

Street scene

Detected Objects

  • Car
  • Traffic light
  • Pedestrian
  • Bicycle

Optical Character Recognition (OCR)

OCR extracts text from images and scanned documents.


Example

Image

Photo of a restaurant menu

Extracted Text

Menu items and prices


Image Captioning

Image captioning generates natural-language descriptions of images.


Example

Image

A dog playing in a park

Caption

“A brown dog running through a grassy park.”


Facial Analysis

Computer vision systems can analyze facial features.

Possible capabilities include:

  • Face detection
  • Emotion analysis
  • Age estimation

For Responsible AI reasons, facial recognition and identification systems require careful consideration.


APIs and Endpoints

Applications communicate with Azure AI services using:

  • APIs
  • Endpoints

These allow images to be analyzed programmatically.


Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

  • API keys
  • Azure credentials
  • Managed identities

User Interface Components

A lightweight vision application may include:

  • Image upload area
  • Camera capture button
  • Results display
  • Image preview

Real-Time Image Processing

Some applications process images in near real time.

Examples include:

  • Security monitoring
  • Live object detection
  • Accessibility tools

Example Workflow

A common workflow includes:

  1. User uploads image
  2. Application sends image to Azure AI Vision
  3. AI service analyzes image
  4. Results are returned
  5. Application displays findings

Example High-Level Pseudocode

image = upload_image()
results = analyze_image(image)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.


Common Real-World Scenarios


Scenario 1: Receipt Scanner

Goal

Extract purchase information from receipts.

Features

  • OCR
  • Text extraction
  • Data organization

Scenario 2: Accessibility Assistant

Goal

Describe images for visually impaired users.

Features

  • Image captioning
  • OCR
  • Spoken descriptions

Scenario 3: Product Recognition

Goal

Identify products from photos.

Features

  • Object detection
  • Classification
  • Product lookup

Scenario 4: Content Moderation

Goal

Identify harmful or inappropriate images.

Features

  • Image analysis
  • Safety detection
  • Automated filtering

Responsible AI Considerations

Vision-enabled applications should follow Responsible AI principles.

Key considerations include:

  • Fairness
  • Privacy
  • Transparency
  • Inclusiveness
  • Accountability
  • Security

Privacy Concerns

Images may contain:

  • Personal data
  • Faces
  • Sensitive documents
  • Location information

Organizations should protect visual data appropriately.


Bias and Fairness

Computer vision systems may perform unevenly across:

  • Skin tones
  • Lighting conditions
  • Demographics
  • Environmental conditions

Testing and evaluation are important for fairness.


Transparency

Users should understand:

  • AI is analyzing images
  • AI-generated results may contain errors
  • Images may be processed in the cloud

Hallucinations and Errors

Vision systems may occasionally generate:

  • Incorrect captions
  • False detections
  • Inaccurate classifications

These incorrect outputs are sometimes called hallucinations.


Error Handling

Applications should handle:

  • Invalid image formats
  • Poor image quality
  • Authentication failures
  • Network interruptions
  • Rate limits

Image Quality Challenges

Computer vision accuracy can decrease with:

  • Blurry images
  • Poor lighting
  • Low resolution
  • Obstructed objects

Advantages of Vision Applications

Benefits include:

  • Automation
  • Faster analysis
  • Accessibility improvements
  • Improved customer experiences
  • Scalable image processing

Limitations of Vision Applications

Challenges include:

  • Recognition inaccuracies
  • Bias
  • Privacy concerns
  • Variable image quality
  • Ethical considerations

High-Level Architecture

A simplified architecture often includes:

  1. User interface
  2. Image upload/capture
  3. Azure AI Vision service
  4. AI analysis
  5. Results display

Generative Vision Capabilities

Some modern systems combine:

  • Computer vision
  • Generative AI

These multimodal systems can:

  • Analyze images
  • Generate descriptions
  • Answer visual questions
  • Create new images

Important AI-901 Exam Tips

For the exam, remember these key points:

  • Computer vision analyzes visual information.
  • Azure AI Vision provides computer vision capabilities.
  • OCR extracts text from images.
  • Object detection identifies multiple objects in images.
  • Image captioning generates natural-language image descriptions.
  • APIs and endpoints connect applications to Azure AI services.
  • Authentication secures service access.
  • Responsible AI principles apply to computer vision systems.
  • Image quality affects AI accuracy.
  • Hallucinations are inaccurate AI-generated outputs.

Quick Knowledge Check

Question 1

What does OCR do?

Answer

Extracts text from images and scanned documents.


Question 2

What is object detection?

Answer

Identifying and locating objects within an image.


Question 3

Why is authentication important?

Answer

It secures access to Azure AI services.


Question 4

What can reduce computer vision accuracy?

Answer

Poor image quality such as blur or low lighting.


Practice Exam Questions

Question 1

What is the PRIMARY purpose of computer vision?

A. To enable AI systems to analyze and understand visual information
B. To increase internet bandwidth
C. To manage database backups
D. To improve keyboard performance


Correct Answer

A. To enable AI systems to analyze and understand visual information


Explanation

Computer vision allows AI systems to process and interpret images, videos, and other visual data.


Why the Other Answers Are Incorrect

B. To increase internet bandwidth

Computer vision does not affect networking speed.

C. To manage database backups

This is unrelated to computer vision.

D. To improve keyboard performance

This is unrelated to AI vision systems.


Question 2

Which Azure service provides computer vision capabilities such as OCR and image analysis?

A. Azure AI Vision
B. Azure Backup
C. Azure Virtual Machines
D. Azure DNS


Correct Answer

A. Azure AI Vision


Explanation

Azure AI Vision provides cloud-based computer vision capabilities including OCR, object detection, and image captioning.


Why the Other Answers Are Incorrect

B. Azure Backup

This is a backup service.

C. Azure Virtual Machines

This provides compute infrastructure.

D. Azure DNS

This is a networking service.


Question 3

What does OCR stand for?

A. Optical Character Recognition
B. Open Cloud Rendering
C. Object Classification Registry
D. Operational Compute Routing


Correct Answer

A. Optical Character Recognition


Explanation

OCR extracts text from images or scanned documents.


Why the Other Answers Are Incorrect

B. Open Cloud Rendering

This is not the meaning of OCR.

C. Object Classification Registry

This is unrelated to OCR.

D. Operational Compute Routing

This is not a computer vision term.


Question 4

What is the PRIMARY purpose of object detection?

A. To identify and locate objects within an image
B. To translate spoken language
C. To summarize long documents
D. To compress image files


Correct Answer

A. To identify and locate objects within an image


Explanation

Object detection identifies multiple objects and their locations inside an image.


Why the Other Answers Are Incorrect

B. To translate spoken language

This is a speech AI task.

C. To summarize long documents

This is a text analysis task.

D. To compress image files

Object detection does not compress files.


Question 5

What does image captioning do?

A. Generates natural-language descriptions of images
B. Converts speech into text
C. Encrypts image files
D. Creates database tables


Correct Answer

A. Generates natural-language descriptions of images


Explanation

Image captioning creates human-readable descriptions of visual content.


Why the Other Answers Are Incorrect

B. Converts speech into text

This is speech recognition.

C. Encrypts image files

Encryption is unrelated to captioning.

D. Creates database tables

This is unrelated to computer vision.


Question 6

How do lightweight vision applications typically communicate with Azure AI services?

A. Through APIs and endpoints
B. Through printer drivers
C. Through monitor settings
D. Through USB-only connections


Correct Answer

A. Through APIs and endpoints


Explanation

Applications use APIs and cloud endpoints to send images and receive AI-generated analysis results.


Why the Other Answers Are Incorrect

B. Through printer drivers

Printers are unrelated to AI communication.

C. Through monitor settings

This is unrelated to cloud AI services.

D. Through USB-only connections

Cloud services use network communication.


Question 7

Why is authentication important when accessing Azure AI Vision services?

A. To secure access to AI resources
B. To increase image brightness
C. To improve keyboard response time
D. To accelerate internet speeds


Correct Answer

A. To secure access to AI resources


Explanation

Authentication helps ensure that only authorized users and applications can access Azure AI services.


Why the Other Answers Are Incorrect

B. To increase image brightness

Authentication does not affect image quality.

C. To improve keyboard response time

This is unrelated to authentication.

D. To accelerate internet speeds

Authentication does not improve network performance.


Question 8

Which Responsible AI concern is especially important in computer vision systems?

A. Protecting personal and sensitive visual information
B. Increasing monitor resolution
C. Improving printer speed
D. Reducing spreadsheet file sizes


Correct Answer

A. Protecting personal and sensitive visual information


Explanation

Images may contain faces, documents, or other sensitive information that must be protected.


Why the Other Answers Are Incorrect

B. Increasing monitor resolution

This is unrelated to Responsible AI.

C. Improving printer speed

Printers are unrelated to computer vision ethics.

D. Reducing spreadsheet file sizes

This is unrelated to image analysis.


Question 9

What challenge can reduce computer vision accuracy?

A. Poor image quality
B. Spreadsheet formatting
C. Keyboard layout changes
D. Audio playback speed


Correct Answer

A. Poor image quality


Explanation

Blur, low lighting, and low resolution can negatively affect image analysis accuracy.


Why the Other Answers Are Incorrect

B. Spreadsheet formatting

This does not affect vision systems.

C. Keyboard layout changes

This is unrelated to image processing.

D. Audio playback speed

This is unrelated to computer vision.


Question 10

What are hallucinations in AI vision systems?

A. Incorrect or fabricated AI-generated outputs
B. Hardware installation failures
C. Network outages
D. Printer connection problems


Correct Answer

A. Incorrect or fabricated AI-generated outputs


Explanation

Hallucinations occur when AI systems generate inaccurate descriptions or detections.


Why the Other Answers Are Incorrect

B. Hardware installation failures

This is unrelated to AI-generated outputs.

C. Network outages

This is a connectivity issue.

D. Printer connection problems

This is unrelated to AI vision systems.


Final Thoughts

Building lightweight applications with vision capabilities is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand the foundational concepts behind computer vision applications, including image classification, object detection, OCR, APIs, authentication, Responsible AI principles, and real-world implementation workflows.

Azure AI Vision and Azure AI Foundry provide powerful cloud-based tools that make it easier to build intelligent applications capable of analyzing and understanding visual information.


Go to the AI-901 Exam Prep Hub main page

Extract information from documents and forms by using Azure Content Understanding in Foundry Tools (AI-901 Exam Prep)

This post is a part of the AI-901: Microsoft Azure AI Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Implement AI solutions by using Microsoft Foundry (55–60%)
--> Implement AI solutions for information extraction by using Foundry
--> Extract information from documents and forms by using Azure Content Understanding in Foundry Tools


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Organizations process enormous amounts of documents every day, including invoices, receipts, forms, contracts, and identification documents. AI-powered information extraction solutions help automate the process of reading, understanding, and organizing document data.

For the AI-901 certification exam, candidates should understand the foundational concepts behind extracting information from documents and forms by using Azure Content Understanding and Microsoft Foundry tools.

This topic falls under the “Implement AI solutions for information extraction by using Foundry” section of the AI-901 exam objectives.


What Is Information Extraction?

Information extraction is the process of identifying and retrieving useful data from documents, images, forms, audio, or other content.

Examples include extracting:

  • Names
  • Dates
  • Invoice totals
  • Addresses
  • Phone numbers
  • Product information

What Is Azure Content Understanding?

Azure Content Understanding helps AI systems analyze and interpret structured and unstructured documents.

Capabilities include:

  • Text extraction
  • Form recognition
  • Document analysis
  • Information classification
  • Key-value pair extraction

Azure AI Foundry

Azure AI Foundry provides tools for building, testing, and managing AI-powered applications.

Developers can:

  • Configure AI services
  • Process documents
  • Test extraction workflows
  • Build lightweight AI applications

Structured vs. Unstructured Documents


Structured Documents

Structured documents follow a consistent layout.

Examples include:

  • Tax forms
  • Invoices
  • Receipts
  • Application forms

Unstructured Documents

Unstructured documents have less predictable layouts.

Examples include:

  • Emails
  • Letters
  • Articles
  • Contracts

Optical Character Recognition (OCR)

OCR converts text within images or scanned documents into machine-readable text.


Example

Input

Scanned receipt image

OCR Output

  • Store name
  • Date
  • Total amount

Form Recognition

Form recognition identifies fields and values within forms.


Example

Form

Insurance application

Extracted Data

  • Customer name
  • Policy number
  • Address
  • Claim amount

Key-Value Pair Extraction

AI systems can identify relationships between labels and values.


Example

KeyValue
Invoice NumberINV-1045
Total$250.00
Due Date05/30/2026

Table Extraction

AI can identify and extract tables from documents.


Example

A receipt table may contain:

  • Item names
  • Quantities
  • Prices

Classification

Document classification identifies the type of document being processed.


Example

The system determines whether a file is:

  • Invoice
  • Contract
  • Receipt
  • Resume

Named Entity Recognition (NER)

NER identifies important entities within text.

Entities may include:

  • People
  • Organizations
  • Locations
  • Dates

Example

Text

“John Smith works for Contoso in Seattle.”

Extracted Entities

  • John Smith (Person)
  • Contoso (Organization)
  • Seattle (Location)

APIs and Endpoints

Applications communicate with Azure AI services through:

  • APIs
  • Endpoints

Documents are submitted for analysis programmatically.


Authentication

Applications must securely authenticate before accessing Azure AI services.

Common authentication methods include:

  • API keys
  • Azure credentials
  • Managed identities

Lightweight Application Workflow

A typical workflow includes:

  1. User uploads document
  2. Application sends file to AI service
  3. AI extracts information
  4. Results are returned
  5. Application displays or stores extracted data

Example Workflow

Input

Scanned invoice

AI Processing

  • OCR
  • Key-value extraction
  • Table analysis

Output

Structured invoice data


Example High-Level Pseudocode

document = upload_document()
results = analyze_document(document)
display_results(results)

For AI-901, understanding the workflow is more important than memorizing exact syntax.


Common Real-World Scenarios


Scenario 1: Invoice Processing

Goal

Automate invoice data extraction.

Features

  • OCR
  • Table extraction
  • Total amount detection

Scenario 2: Receipt Scanning

Goal

Extract purchase information from receipts.

Features

  • Text extraction
  • Merchant identification
  • Expense categorization

Scenario 3: Resume Processing

Goal

Extract candidate information from resumes.

Features

  • Name extraction
  • Skill identification
  • Contact information detection

Scenario 4: Healthcare Forms

Goal

Digitize patient records.

Features

  • Form recognition
  • Key-value extraction
  • Classification

Responsible AI Considerations

Document-processing applications should follow Responsible AI principles.

Key considerations include:

  • Privacy
  • Security
  • Fairness
  • Transparency
  • Accountability
  • Inclusiveness

Privacy Concerns

Documents may contain:

  • Personal information
  • Financial data
  • Medical information
  • Legal records

Organizations should protect sensitive data appropriately.


Security Considerations

Applications should secure:

  • Uploaded files
  • Stored documents
  • API credentials
  • Extracted data

Transparency

Users should understand:

  • AI is analyzing documents
  • Extracted data may contain errors
  • Human review may still be needed

Accuracy Limitations

AI extraction systems may struggle with:

  • Poor scan quality
  • Handwritten text
  • Complex layouts
  • Damaged documents

Hallucinations and Errors

AI systems may occasionally:

  • Extract incorrect values
  • Miss fields
  • Misclassify documents

Applications should validate important information.


Error Handling

Applications should handle:

  • Unsupported file formats
  • Corrupted documents
  • Authentication failures
  • Network interruptions
  • Rate limits

Advantages of Information Extraction AI

Benefits include:

  • Faster document processing
  • Reduced manual entry
  • Improved scalability
  • Increased automation
  • Better searchability

Limitations of Information Extraction AI

Challenges include:

  • Variable document quality
  • Handwriting recognition difficulties
  • Inconsistent layouts
  • Privacy concerns
  • Extraction inaccuracies

Generative AI and Information Extraction

Some modern systems combine:

  • OCR
  • Document intelligence
  • Generative AI

This enables:

  • Summarization
  • Question answering
  • Conversational document analysis

High-Level Architecture

A simplified architecture often includes:

  1. User uploads document
  2. Application sends document to Azure AI service
  3. AI analyzes content
  4. Structured data is returned
  5. Application displays or stores results

Important AI-901 Exam Tips

For the exam, remember these key points:

  • OCR extracts text from documents and images.
  • Form recognition identifies fields and values.
  • Key-value extraction identifies label-value relationships.
  • Table extraction retrieves structured table data.
  • Classification identifies document types.
  • APIs and endpoints connect applications to Azure AI services.
  • Authentication secures access to AI resources.
  • Responsible AI principles apply to document-processing systems.
  • Poor document quality can reduce extraction accuracy.
  • AI-generated outputs may still require validation.

Quick Knowledge Check

Question 1

What does OCR do?

Answer

Extracts machine-readable text from images or scanned documents.


Question 2

What is form recognition?

Answer

Identifying and extracting fields and values from forms.


Question 3

Why is authentication important?

Answer

It secures access to Azure AI services and protects resources.


Question 4

What can reduce extraction accuracy?

Answer

Poor scan quality, handwriting, and inconsistent document layouts.


Practice Exam Questions

Exam: AI-901

Topic: Extract Information from Documents and Forms by Using Azure Content Understanding in Foundry Tools


Question 1

What is the PRIMARY purpose of information extraction AI solutions?

A. To retrieve useful data from documents and content
B. To increase internet bandwidth
C. To replace operating systems
D. To improve monitor resolution


Correct Answer

A. To retrieve useful data from documents and content


Explanation

Information extraction AI systems identify and retrieve meaningful information such as names, dates, totals, and addresses from documents and forms.


Why the Other Answers Are Incorrect

B. To increase internet bandwidth

Information extraction does not affect network speed.

C. To replace operating systems

AI document processing does not replace operating systems.

D. To improve monitor resolution

This is unrelated to AI information extraction.


Question 2

What does OCR stand for?

A. Optical Character Recognition
B. Open Content Retrieval
C. Object Classification Routing
D. Operational Compute Reporting


Correct Answer

A. Optical Character Recognition


Explanation

OCR converts printed or handwritten text within images and scanned documents into machine-readable text.


Why the Other Answers Are Incorrect

B. Open Content Retrieval

This is not the meaning of OCR.

C. Object Classification Routing

This is unrelated to document analysis.

D. Operational Compute Reporting

This is not an OCR term.


Question 3

Which AI capability identifies fields and values within forms?

A. Form recognition
B. Speech synthesis
C. Image compression
D. Network monitoring


Correct Answer

A. Form recognition


Explanation

Form recognition extracts structured information such as names, dates, totals, and addresses from forms and documents.


Why the Other Answers Are Incorrect

B. Speech synthesis

This converts text into speech.

C. Image compression

This reduces file size and is unrelated to field extraction.

D. Network monitoring

This is unrelated to document AI.


Question 4

Which Azure platform provides tools for building and managing AI-powered applications?

A. Azure AI Foundry
B. Microsoft Paint
C. Windows Task Manager
D. Azure DNS


Correct Answer

A. Azure AI Foundry


Explanation

Azure AI Foundry provides tools for deploying, testing, and managing AI applications and services.


Why the Other Answers Are Incorrect

B. Microsoft Paint

Paint is a graphics editor.

C. Windows Task Manager

This is a system monitoring tool.

D. Azure DNS

This is a networking service.


Question 5

What is key-value pair extraction?

A. Identifying labels and their associated values in documents
B. Encrypting document files
C. Compressing image sizes
D. Converting audio into text


Correct Answer

A. Identifying labels and their associated values in documents


Explanation

Key-value extraction identifies relationships such as:

  • Invoice Number → INV-1045
  • Total → $250.00

Why the Other Answers Are Incorrect

B. Encrypting document files

Encryption is unrelated to data extraction.

C. Compressing image sizes

Compression is unrelated to document intelligence.

D. Converting audio into text

This is speech recognition.


Question 6

What is the purpose of document classification?

A. To identify the type of document being processed
B. To increase network performance
C. To generate music files
D. To repair damaged documents physically


Correct Answer

A. To identify the type of document being processed


Explanation

Document classification determines whether a file is an invoice, contract, receipt, resume, or another document type.


Why the Other Answers Are Incorrect

B. To increase network performance

Classification does not improve networking.

C. To generate music files

This is unrelated to document AI.

D. To repair damaged documents physically

AI classification does not physically repair documents.


Question 7

How do lightweight document-processing applications typically communicate with Azure AI services?

A. Through APIs and endpoints
B. Through USB-only connections
C. Through monitor calibration tools
D. Through printer drivers


Correct Answer

A. Through APIs and endpoints


Explanation

Applications send documents to Azure AI services using APIs and endpoints and receive structured analysis results.


Why the Other Answers Are Incorrect

B. Through USB-only connections

Cloud services use network communication.

C. Through monitor calibration tools

This is unrelated to AI services.

D. Through printer drivers

Printers are unrelated to cloud AI communication.


Question 8

Which factor can reduce the accuracy of document extraction systems?

A. Poor document quality
B. Spreadsheet color themes
C. Keyboard layout changes
D. Audio playback speed


Correct Answer

A. Poor document quality


Explanation

Blurry scans, damaged pages, handwriting, and poor lighting can negatively affect extraction accuracy.


Why the Other Answers Are Incorrect

B. Spreadsheet color themes

This does not affect document extraction AI.

C. Keyboard layout changes

This is unrelated to AI document analysis.

D. Audio playback speed

This is unrelated to document processing.


Question 9

Why is authentication important when using Azure AI services?

A. To secure access to AI resources
B. To improve image resolution
C. To increase internet speed
D. To compress document files


Correct Answer

A. To secure access to AI resources


Explanation

Authentication ensures that only authorized users and applications can access AI services.


Why the Other Answers Are Incorrect

B. To improve image resolution

Authentication does not affect image quality.

C. To increase internet speed

Authentication does not improve networking.

D. To compress document files

Authentication is unrelated to file compression.


Question 10

Which Responsible AI concern is especially important when processing documents?

A. Protecting sensitive personal information
B. Increasing monitor brightness
C. Improving printer speed
D. Reducing spreadsheet file size


Correct Answer

A. Protecting sensitive personal information


Explanation

Documents may contain financial, medical, legal, or personal information that must be protected appropriately.


Why the Other Answers Are Incorrect

B. Increasing monitor brightness

This is unrelated to Responsible AI.

C. Improving printer speed

This is unrelated to document intelligence.

D. Reducing spreadsheet file size

This is unrelated to AI ethics or privacy.


Final Thoughts

Extracting information from documents and forms using Azure Content Understanding and Foundry tools is an important topic for the AI-901 certification exam. Microsoft expects candidates to understand foundational concepts such as OCR, form recognition, document analysis, APIs, authentication, Responsible AI principles, and lightweight document-processing workflows.

Azure AI services and Azure AI Foundry provide powerful tools for automating information extraction and improving efficiency across business, healthcare, finance, and administrative scenarios.


Go to the AI-901 Exam Prep Hub main page