
DP-900, Microsoft Certification May 10, 2026

Describe Features of Analytical Workloads (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
   --> Describe common data workloads
      --> Describe features of analytical workloads

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Analytical workloads are essential for deriving insights from data. Unlike transactional workloads, which support day-to-day operations, analytical workloads focus on querying, aggregating, summarizing, and analyzing large volumes of data to support reporting, decision-making, and trend analysis.

What Is an Analytical Workload?

An analytical workload refers to data processing that is oriented toward analysis, rather than operational updates. It is optimized for:

Complex queries
Aggregations across large datasets
Historical analysis and reporting
Business intelligence (BI)

Analytical workloads are often associated with OLAP (Online Analytical Processing) systems.

Key Features of Analytical Workloads

1. Large Volumes of Data

Analytical systems often operate on datasets that are:

Much larger than transactional tables
Historical — spanning months or years of records
Combined from multiple sources (e.g., transactional systems, logs, external data)

These datasets can be stored in data warehouses, data lakes, or big data systems.

2. Complex, Read-Heavy Queries

Analytical workloads are dominated by complex SELECT queries, often involving:

Aggregations (SUM, AVG, COUNT)
Grouping by categories
Filtering on multiple dimensions
Joining large tables

These queries can be computationally intensive and are often used for reporting and dashboards.
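As a minimal illustration, the sketch below runs this kind of query with Python's built-in sqlite3 module against a small in-memory table. The `sales` table, its columns, and the rows are hypothetical, and SQLite is only a stand-in. A real data warehouse would scan far more data, but the query shape (filter, group, aggregate) is the same.

```python
import sqlite3

# Hypothetical sales fact table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("West", "Bikes", 2025, 1200.0),
        ("West", "Helmets", 2025, 300.0),
        ("East", "Bikes", 2025, 900.0),
        ("East", "Bikes", 2024, 700.0),
        ("West", "Bikes", 2024, 1000.0),
    ],
)

# Read-heavy analytical query: filter on one dimension (year), group by
# another (region), and compute several aggregates over the matching rows.
query = """
    SELECT region,
           COUNT(*)    AS orders,
           SUM(amount) AS total_sales,
           AVG(amount) AS avg_sale
    FROM sales
    WHERE year = 2025
    GROUP BY region
    ORDER BY total_sales DESC
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
# ('West', 2, 1500.0, 750.0)
# ('East', 1, 900.0, 900.0)
```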

3. Denormalized or Columnar Storage

Unlike transactional systems that use normalized schemas, analytical workloads often use:

Denormalized, dimensional schemas (e.g., a star schema; a snowflake schema is a variant that partially normalizes the dimension tables)
Columnar storage formats (e.g., Parquet, ORC)

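These designs improve query performance: dimensional schemas need fewer joins, and columnar files let a query read only the columns it uses, which minimizes I/O and enables efficient aggregation.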
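To see why columnar storage helps, here is a toy Python sketch that holds the same three hypothetical orders in a row layout and in a column layout. It is only a conceptual model, not how Parquet or ORC actually arrange bytes on disk.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "West", "amount": 120.0},
    {"order_id": 2, "region": "East", "amount": 80.0},
    {"order_id": 3, "region": "West", "amount": 50.0},
]

# Column-oriented layout: each field is kept as its own contiguous list.
columns = {
    "order_id": [r["order_id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Row layout: summing one field still walks every full record.
total_from_rows = sum(r["amount"] for r in rows)

# Columnar layout: the aggregate touches only the "amount" column.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 250.0 250.0
```

Both layouts give the same answer, but in the columnar version the sum needs only the `amount` values. On disk, that is what lets a columnar file skip the columns a query never touches.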

4. Longer Query Response Times (But High Throughput)

Queries in analytical systems are not always expected to return results in milliseconds, as they:

Scan large amounts of data
Compute aggregates and summaries
May be optimized for throughput rather than low latency

This contrasts with transactional systems where fast, small transactions are critical.

5. Batch or Bulk Processing

Analytical workloads often rely on:

Batch ingestion of data (e.g., nightly ETL jobs)
Data transformation pipelines (cleaning, aggregating, enriching)
Tools like Azure Data Factory, Databricks, or Synapse pipelines

These pipelines prepare data for analytics and reporting.
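Real pipelines are built with tools like Azure Data Factory or Databricks, but the extract-transform-load pattern they implement can be sketched in plain Python. The raw export, the cleaning rules, and the `extract`/`transform`/`load` function names are hypothetical, and `load` simply builds an in-memory summary as a stand-in for writing to a warehouse table.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export from a transactional system (note the messy values).
RAW_CSV = """order_id,region,amount
1,west,120.50
2,East,80
3,WEST,
4,east,19.50
"""

def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop rows with missing amounts, normalize case, cast types."""
    cleaned = []
    for rec in records:
        if not rec["amount"]:
            continue  # skip incomplete records
        cleaned.append({"region": rec["region"].strip().title(),
                        "amount": float(rec["amount"])})
    return cleaned

def load(records):
    """Load: aggregate into a summary (stand-in for a warehouse table)."""
    summary = defaultdict(float)
    for rec in records:
        summary[rec["region"]] += rec["amount"]
    return dict(summary)

summary = load(transform(extract(RAW_CSV)))
print(summary)  # {'West': 120.5, 'East': 99.5}
```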

6. Support for BI and Reporting Tools

Analytical workloads integrate with business intelligence tools, such as:

Power BI
Excel
Synapse Studio (the web workspace for Azure Synapse Analytics)

These tools connect directly to analytical stores to produce dashboards, charts, and insights.

Analytical vs Transactional Workloads — Quick Comparison

Feature	Transactional	Analytical
Primary Purpose	Operational processing (OLTP)	Decision support & reporting (OLAP)
Data Size	Small to moderate	Large or very large
Workload Type	Frequent inserts/updates/deletes	Complex queries/aggregations
Schema	Normalized	Often denormalized
Query Focus	Single record operations	Scanning many records
Typical Tools	Relational OLTP databases	Data warehouses, big data systems

Where Analytical Workloads Run in Azure

Azure offers several services optimized for analytical workloads:

Azure Synapse Analytics

A unified analytics service that enables:

Data warehousing
Big data processing
Integration with Spark and SQL
High-performance analytics

It is ideal for large-scale reporting and BI scenarios.

Azure Data Lake Storage + Analytics

Azure Data Lake Storage Gen2 works with:

Apache Spark
Azure Databricks
Synapse Analytics

This combination supports big data analytics, machine learning, and data science workloads.

Azure SQL Data Warehouse (Synapse Dedicated SQL Pools)

This is the former SQL DW offering (now part of Synapse) optimized for:

Massively parallel processing (MPP)
Distributed query execution
High-volume analytical queries
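The core MPP idea (split the data, aggregate each slice in parallel, then combine the partial results) can be illustrated with a small Python sketch. This is a conceptual model under simple assumptions, not how Synapse is implemented: the "nodes" are just threads in one process, and the data and the `NUM_NODES` value are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows: (customer_id, amount).
rows = [(cid, float(cid % 7)) for cid in range(1, 1001)]

NUM_NODES = 4  # illustrative only; not Synapse's real distribution count

def distribute(data, n):
    """Hash-distribute rows across n buckets by customer_id."""
    buckets = [[] for _ in range(n)]
    for cid, amount in data:
        buckets[hash(cid) % n].append((cid, amount))
    return buckets

def partial_sum(bucket):
    """Each 'compute node' aggregates only its own slice of the data."""
    return sum(amount for _, amount in bucket)

buckets = distribute(rows, NUM_NODES)
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    partials = list(pool.map(partial_sum, buckets))

# A final step combines the partial results into the overall answer.
total = sum(partials)
print(total == sum(amount for _, amount in rows))  # True
```

Real engines run each slice on separate machines with their own storage, which is where the speedup comes from; the threads here only show the divide-and-combine pattern.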

Why Analytical Workloads Matter for DP-900

For DP-900, you should be able to:

Define analytical workloads and distinguish them from transactional workloads
Recognize use cases where analytical workloads are appropriate
Identify Azure services designed for analytical processing
Understand schema design and storage options that support analytics

Being able to describe these features shows your understanding of how modern data ecosystems support business intelligence and analytics.

Summary — Exam-Relevant Takeaways

✔ Analytical workloads focus on complex queries and analysis across large datasets
✔ They use denormalized schemas and columnar storage to boost performance
✔ They are optimized for throughput and summarization, not real-time transactions
✔ They typically support reports, dashboards, and insights
✔ Azure services like Azure Synapse Analytics, Azure Data Lake Storage, and Azure Databricks support these workloads

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification May 10, 2026

Practice Questions: Describe Features of Transactional Workloads (DP-900 Exam Prep)

Practice Questions

Question 1

Which scenario best represents a transactional workload?

A. Generating monthly sales reports
B. Training a machine learning model
C. Recording a customer purchase in real time
D. Visualizing historical trends

✅ Answer: C

Explanation:
Transactional workloads capture operational business events as they occur.

Question 2

Which characteristic is most closely associated with transactional workloads?

A. Large batch queries
B. Complex aggregations
C. Frequent small read/write operations
D. Historical trend analysis

✅ Answer: C

Explanation:
Transactional systems perform many small, fast inserts, updates, and deletes.

Question 3

Which ACID property ensures that completed transactions are permanently saved?

A. Atomicity
B. Consistency
C. Isolation
D. Durability

✅ Answer: D

Explanation:
Durability guarantees that once a transaction commits, it remains stored even after failures.

Question 4

A banking system transfers money between accounts. If either debit or credit fails, both must roll back.

Which ACID property does this demonstrate?

A. Consistency
B. Isolation
C. Atomicity
D. Durability

✅ Answer: C

Explanation:
Atomicity ensures that a transaction is all-or-nothing.
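Atomicity can be demonstrated with Python's built-in sqlite3 module. In this minimal sketch (the `accounts` table and the `transfer` function are hypothetical), the credit runs first and the debit then violates a CHECK constraint, so the whole transaction rolls back and neither balance changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the debit broke the CHECK; the credit was rolled back too

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

failed = transfer(conn, "A", "B", 500.0)   # A cannot go negative
after_failure = balances(conn)
print(failed, after_failure)               # False {'A': 100.0, 'B': 50.0}

succeeded = transfer(conn, "A", "B", 30.0)
after_success = balances(conn)
print(succeeded, after_success)            # True {'A': 70.0, 'B': 80.0}
```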

Question 5

Transactional workloads typically use which type of schema design?

A. Denormalized
B. Star schema
C. Snowflake schema
D. Normalized

✅ Answer: D

Explanation:
Transactional systems usually use normalized schemas to reduce redundancy and enforce integrity.

Question 6

Which Azure service is MOST appropriate for a traditional OLTP application?

A. Azure Synapse Analytics
B. Azure SQL Database
C. Azure Data Lake Storage
D. Azure Blob Storage

✅ Answer: B

Explanation:
Azure SQL Database is optimized for transactional (OLTP) workloads with ACID support.

Question 7

Which requirement is most critical for transactional workloads?

A. High throughput for batch queries
B. Schema flexibility
C. Low latency and strong consistency
D. Historical data retention

✅ Answer: C

Explanation:
Transactional workloads prioritize fast response times and data consistency.

Question 8

Which workload is LEAST likely to be transactional?

A. Updating inventory levels
B. Processing credit card payments
C. Inserting new customer records
D. Running yearly financial summaries

✅ Answer: D

Explanation:
Yearly summaries are analytical, not transactional.

Question 9

Which statement about transactional workloads is TRUE?

A. They primarily analyze historical data
B. They usually involve complex joins across millions of rows
C. They support operational business processes
D. They are optimized for reporting

✅ Answer: C

Explanation:
Transactional workloads support daily operations such as orders, payments, and updates.

Question 10

An e-commerce application must confirm orders instantly and ensure inventory counts are always correct.

Which workload type does this describe?

A. Analytical
B. Batch
C. Streaming
D. Transactional

✅ Answer: D

Explanation:
Real-time order processing with consistency requirements is transactional.

✅ Exam Tips for Transactional Workloads

For DP-900, remember:

✔ Focus on real-time operational processing
✔ Think OLTP
✔ Many small reads/writes
✔ ACID compliance
✔ Low latency + strong consistency
✔ Typically normalized schemas
✔ Azure SQL Database is the classic example

Go to the DP-900 Exam Prep Hub main page.

Databases, DP-900, Microsoft Certification May 10, 2026

Practice Questions: Describe Types of Databases (DP-900 Exam Prep)

Practice Questions

Question 1

You need to store customer orders in tables with fixed columns and enforce relationships between customers and orders.

Which type of database should you use?

A. Graph
B. Document
C. Relational
D. Key-value

✅ Answer: C

Explanation:
Relational databases store structured data in tables with defined schemas and support relationships via keys.

Question 2

Which characteristic best describes a relational database?

A. Schema-less storage
B. Data stored as JSON documents
C. Tables with rows and columns
D. Nodes and edges

✅ Answer: C

Explanation:
Relational databases organize data into tables (rows and columns) and use SQL for querying.

Question 3

An application must store user profiles in flexible JSON documents where each user may have different attributes.

Which database type is most appropriate?

A. Column-family
B. Document
C. Relational
D. Graph

✅ Answer: B

Explanation:
Document databases store data as JSON-like documents and allow flexible schemas — ideal for user profiles.

Question 4

Which Azure service supports multiple NoSQL data models through APIs such as the Core (SQL) API (now called API for NoSQL), Table API, Cassandra API, and Gremlin API?

A. Azure SQL Database
B. Azure Table Storage
C. Azure Cosmos DB
D. Azure Database for PostgreSQL

✅ Answer: C

Explanation:
Azure Cosmos DB is a globally distributed, multi-model NoSQL database service.

Question 5

You are designing a recommendation engine that analyzes relationships between users and products.

Which database type is best suited?

A. Relational
B. Key-value
C. Graph
D. Column-family

✅ Answer: C

Explanation:
Graph databases specialize in relationship-heavy data using nodes and edges.
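A graph database would store users and products as nodes and purchases as edges, then run a traversal query. The same idea can be sketched with a plain Python adjacency map; the users, products, and scoring rule below are hypothetical, and this is not Gremlin or any Cosmos DB API.

```python
from collections import Counter

# Hypothetical purchase graph: each user is linked to the products they bought.
purchases = {
    "alice": {"bike", "helmet"},
    "bob": {"bike", "lock"},
    "carol": {"helmet", "gloves"},
    "dave": {"bike", "helmet", "lock"},
}

def recommend(user):
    """Walk user -> shared product -> other user -> new product, scoring by overlap."""
    mine = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        shared = items & mine
        if other == user or not shared:
            continue  # only follow users connected through a shared product
        for item in items - mine:
            scores[item] += len(shared)
    return [item for item, _ in scores.most_common()]

print(recommend("alice"))  # ['lock', 'gloves']
```

Relationship-heavy queries like this are what graph databases optimize: the traversal follows edges directly instead of joining large tables.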

Question 6

Which statement about NoSQL databases is TRUE?

A. They always require fixed schemas
B. They primarily use SQL
C. They are optimized for horizontal scaling
D. They cannot store structured data

✅ Answer: C

Explanation:
NoSQL databases are designed for horizontal scaling and flexible schemas.

Question 7

You need extremely fast lookups using a unique identifier, and the data structure is simple.

Which NoSQL model should you choose?

A. Document
B. Graph
C. Column-family
D. Key-value

✅ Answer: D

Explanation:
Key-value databases store data as key/value pairs and provide very fast retrieval.

Question 8

Which Azure service is best suited for structured transactional workloads using SQL?

A. Azure Blob Storage
B. Azure Cosmos DB
C. Azure SQL Database
D. Azure Data Lake Storage

✅ Answer: C

Explanation:
Azure SQL Database is a managed relational database service optimized for structured transactional data.

Question 9

Which feature is typically associated with relational databases but not guaranteed in NoSQL systems?

A. Global distribution
B. Flexible schemas
C. ACID transactions
D. Horizontal scaling

✅ Answer: C

Explanation:
Relational databases traditionally provide full ACID transaction support.

Question 10

A company collects massive volumes of time-series telemetry data where columns may vary across rows.

Which database type fits this scenario best?

A. Relational
B. Document
C. Column-family
D. Graph

✅ Answer: C

Explanation:
Column-family (wide-column) databases are well suited for large, sparse datasets such as time-series data.
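The wide-column idea can be modeled in Python as nested dictionaries: row key, then column family, then only the columns a row actually has. This is a hypothetical sketch of the data model, not the API of any particular wide-column database.

```python
# Hypothetical wide-column layout: row key -> column family -> {column: value}.
# Each row stores only the columns it actually has, so rows can differ.
telemetry = {
    "device-01|2026-05-10T10:00": {"readings": {"temp": 21.5, "humidity": 40}},
    "device-01|2026-05-10T10:05": {"readings": {"temp": 21.7}},
    "device-02|2026-05-10T10:00": {"readings": {"temp": 19.0, "co2": 410}},
}

def get_column(row_key, family, column, default=None):
    """Look up one cell; a column a row never stored simply returns the default."""
    return telemetry.get(row_key, {}).get(family, {}).get(column, default)

# Emulate a row-key prefix scan for one device's time series
# (many wide-column stores keep rows ordered by key to make this cheap).
device_01_rows = sorted(k for k in telemetry if k.startswith("device-01|"))
print(device_01_rows)
print(get_column("device-01|2026-05-10T10:05", "readings", "humidity"))  # None
```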

✅ Key Exam Reminders

For DP-900, make sure you can confidently:

Distinguish relational vs non-relational
Recognize NoSQL models (key-value, document, column-family, graph)
Match Azure services to database types (especially Azure SQL vs Azure Cosmos DB)
Choose the right database type for a scenario

Go to the DP-900 Exam Prep Hub main page.

Data Engineering, Data Integration, DP-900, Microsoft Certification May 10, 2026

Describe Common Formats for Data Files (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
   --> Identify options for data storage
      --> Describe common formats for data files

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

In DP-900, Microsoft expects you to understand common data file formats, what type of data they typically store (structured, semi-structured, or unstructured), and why certain formats are used in analytics and Azure storage scenarios.

This topic connects directly to Azure Blob Storage, Azure Data Lake Storage, and analytics pipelines.

Why Data File Formats Matter

Data file formats define:

How data is organized inside a file
Whether the data is human-readable or binary
How efficiently it can be stored and queried
Which tools and services can process it

Choosing the right format impacts:

Performance
Storage cost
Analytics capabilities
Interoperability between systems

For DP-900, focus on understanding what each format is used for, not deep implementation details.

Common Data File Formats You Should Know

1. CSV (Comma-Separated Values)

CSV is one of the simplest and most widely used formats for structured data.

Key Characteristics

Plain text
Each row represents a record
Columns separated by commas (or other delimiters)
No embedded schema
Human readable

Example:

CustomerID,Name,City
1,John,Seattle
2,Maria,Austin

Typical Use Cases

Data exports and imports
Simple datasets
Spreadsheet interoperability

Exam Notes

Represents structured data
Lightweight and easy to move between systems
No built-in support for nested structures or data types (every value is stored as text)
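Reading the example file with Python's standard csv module shows both sides of these notes: the header row names the columns, but every value comes back as a string, because CSV carries no type information.

```python
import csv
import io

# The example file from above, held in a string instead of on disk.
CSV_TEXT = """CustomerID,Name,City
1,John,Seattle
2,Maria,Austin
"""

customers = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# The header supplies column names, but values are always text,
# so the reader has to convert types explicitly.
first = customers[0]
print(first)                      # {'CustomerID': '1', 'Name': 'John', 'City': 'Seattle'}
print(type(first["CustomerID"]))  # <class 'str'>
customer_ids = [int(row["CustomerID"]) for row in customers]
```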

2. JSON (JavaScript Object Notation)

JSON is the most common format for semi-structured data, especially in modern applications and APIs.

Key Characteristics

Key–value pairs
Supports nested objects and arrays
Self-describing
Human readable
Schema-on-read

Example:

{
  "CustomerID": 1,
  "Name": "John",
  "Orders": [
    { "OrderID": 100, "Amount": 50 }
  ]
}


Typical Use Cases

Web APIs
Application data
Azure Cosmos DB documents
Logs and telemetry

Exam Notes

Represents semi-structured data
Flexible schema
Commonly used with Azure Cosmos DB and Azure Data Lake
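Parsing the example document with Python's standard json module shows the difference from CSV: numbers stay numbers, and the nested `Orders` array can be traversed directly. The `Email` lookup uses a hypothetical optional field, only to show how a missing key is handled.

```python
import json

# The example document from above, as it might arrive from a web API.
doc_text = """
{
  "CustomerID": 1,
  "Name": "John",
  "Orders": [
    { "OrderID": 100, "Amount": 50 }
  ]
}
"""

customer = json.loads(doc_text)

# JSON keeps basic types and nesting: "Amount" is already a number,
# and "Orders" is a list of nested objects.
total = sum(order["Amount"] for order in customer["Orders"])
print(customer["Name"], total)  # John 50

# Fields are read by name, so optional fields can be handled at read time.
email = customer.get("Email", "not provided")
```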

3. XML (Extensible Markup Language)

XML is another semi-structured format that uses tags to describe data.

Key Characteristics

Tag-based hierarchy
Supports nested structures
Human readable but verbose
Self-describing

Example:

<Customer>
  <CustomerID>1</CustomerID>
  <Name>John</Name>
</Customer>

Typical Use Cases

Legacy systems
Configuration files
Enterprise data exchange

Exam Notes

Semi-structured
Less common than JSON in modern Azure solutions
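Python's standard xml.etree.ElementTree module can read the example above. As the sketch shows, the tags identify each value, but element text is always a string, so numeric fields need explicit conversion.

```python
import xml.etree.ElementTree as ET

# The example XML document from above.
xml_text = """
<Customer>
  <CustomerID>1</CustomerID>
  <Name>John</Name>
</Customer>
"""

root = ET.fromstring(xml_text.strip())

# Tags describe each value (self-describing), but the text is untyped.
customer_id = int(root.findtext("CustomerID"))
name = root.findtext("Name")
print(root.tag, customer_id, name)  # Customer 1 John
```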

4. Parquet

Parquet is a columnar, binary file format optimized for analytics workloads.

Key Characteristics

Column-based storage
Highly compressed
Not human readable
Very fast for analytical queries

Typical Use Cases

Big data analytics
Azure Synapse Analytics
Azure Data Lake Storage

Exam Notes

Used for large analytical datasets
Optimized for performance and storage efficiency
Common in modern data engineering pipelines

5. Avro

Avro is a binary format designed for data serialization and streaming.

Key Characteristics

Compact binary format
Includes schema with the data
Efficient for data movement
Not human readable

Typical Use Cases

Data pipelines
Event streaming
Big data ingestion

Exam Notes

Often used behind the scenes in analytics platforms
Supports schema evolution

6. Plain Text Files

Simple text files may also be used to store unstructured or loosely structured data.

Examples

Log files
Notes
Raw exports

Exam Notes

Usually treated as unstructured data
Stored in Azure Blob Storage or Data Lake

How These Formats Map to Data Types

This mapping is important for DP-900 questions:

Format	Data Type
CSV	Structured
JSON	Semi-structured
XML	Semi-structured
Parquet	Structured / Analytics
Avro	Semi-structured
TXT	Unstructured

Where These Formats Are Stored in Azure

You’ll commonly see these formats stored in:

Azure Blob Storage

Primary storage for files
Supports all formats (CSV, JSON, Parquet, images, etc.)
Used for unstructured and semi-structured data

Azure Data Lake Storage Gen2

Built on Blob Storage
Optimized for analytics
Common for Parquet and Avro files
Used with Azure Synapse and Azure Data Factory

Why This Matters for DP-900

On the exam, file formats typically appear in scenarios like:

Choosing storage for CSV or JSON files
Identifying formats used in analytics pipelines
Recognizing Parquet in big data workloads
Distinguishing structured vs semi-structured file types

You’re expected to understand purpose and characteristics, not internal file mechanics.

Summary — Exam-Relevant Takeaways

For DP-900, remember:

✔ CSV → structured, simple, text-based
✔ JSON / XML → semi-structured, flexible, self-describing
✔ Parquet → columnar, compressed, analytics-optimized
✔ Avro → binary, schema included, streaming-friendly
✔ TXT → unstructured

And:

These formats are commonly stored in Azure Blob Storage or Azure Data Lake Storage
Analytics formats (Parquet/Avro) are used with Azure Synapse and big data workloads

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification May 10, 2026

Describe Features of Semi-Structured Data (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
   --> Describe ways to represent data
      --> Describe features of semi-structured data

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Introduction

Semi-structured data sits between structured and unstructured data. For the DP-900 exam, you’re expected to understand what it is, how it’s organized, and why Azure provides specialized services to store and query it.

What Is Semi-Structured Data?

Semi-structured data is data that does not follow a rigid, tabular schema like relational data, but still contains organizational markers or tags that make it partially structured and machine readable.

Unlike structured data (rows and columns), semi-structured data:

Does not require a predefined schema
Can vary in shape from record to record
Still contains self-describing elements such as key–value pairs or hierarchical structures

In other words, semi-structured data has some structure — just not fixed tables.

Common examples include:

JSON documents
XML files
YAML
Avro / Parquet (used in analytics pipelines)

Key Features of Semi-Structured Data

1. Schema-on-Read (Not Schema-on-Write)

One of the most important characteristics of semi-structured data is schema-on-read.

This means:

Data is stored without enforcing a strict schema
Structure is interpreted when the data is queried or analyzed

This contrasts with structured data, which uses schema-on-write, where structure must be defined before data is inserted.

For DP-900, remember:

Semi-structured data is flexible at ingestion time and structured at query time.
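A minimal Python sketch of schema-on-read: the hypothetical telemetry records below are stored exactly as they arrive, with different fields, and structure is imposed only by the function that reads them. `read_temperatures` is an illustrative name, not an Azure API.

```python
import json

# Records are stored as-is: no schema is enforced when they are written.
raw_lines = [
    '{"DeviceID": "d1", "Temperature": 72}',
    '{"DeviceID": "d2", "Temperature": 68, "Humidity": 40}',
    '{"DeviceID": "d3"}',
]
stored = [json.loads(line) for line in raw_lines]

def read_temperatures(records):
    """Schema-on-read: the query picks its fields and handles missing ones."""
    return [(r["DeviceID"], r.get("Temperature")) for r in records]

print(read_temperatures(stored))
# [('d1', 72), ('d2', 68), ('d3', None)]
```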

2. Flexible and Evolving Structure

Each record in a semi-structured dataset can contain:

Different fields
Nested objects
Optional attributes

Example (JSON):

{
  "CustomerID": 123,
  "Name": "Sarah",
  "Orders": [
    { "OrderID": 1, "Amount": 50 },
    { "OrderID": 2, "Amount": 75 }
  ]
}


Another record in the same dataset might include extra fields like Email or omit Orders entirely.

This flexibility makes semi-structured data ideal for:

Application telemetry
IoT data
User activity logs
Rapidly changing systems

3. Hierarchical or Nested Organization

Semi-structured data often uses hierarchies rather than flat tables.

For example:

JSON objects inside objects
XML elements within elements

This nested design allows complex relationships to exist inside a single document — something that would require multiple tables in relational systems.
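To illustrate, the sketch below flattens the nested customer document from the earlier example into two relational-style tables, the kind of split a normalized design would use. The table names and tuple layouts are hypothetical.

```python
# The nested customer document from the earlier example.
document = {
    "CustomerID": 123,
    "Name": "Sarah",
    "Orders": [
        {"OrderID": 1, "Amount": 50},
        {"OrderID": 2, "Amount": 75},
    ],
}

# A relational design splits the same information into two tables,
# linked by CustomerID acting as a foreign key in the orders table.
customers_table = [(document["CustomerID"], document["Name"])]
orders_table = [
    (order["OrderID"], document["CustomerID"], order["Amount"])
    for order in document.get("Orders", [])  # a document may omit Orders
]

print(customers_table)  # [(123, 'Sarah')]
print(orders_table)     # [(1, 123, 50), (2, 123, 75)]
```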

4. Self-Describing Format

Semi-structured data embeds its own metadata using:

Keys
Tags
Field names

This makes the data self-describing, meaning applications can understand what each value represents without relying on an external schema definition.

Example:

"Temperature": 72

The key itself describes the value.

5. Easily Transported Across Systems

Semi-structured formats such as JSON and XML are:

Human readable
Platform independent
Widely supported across APIs and applications

This is why most modern web services exchange data using JSON.

Common Formats of Semi-Structured Data

You should recognize these for DP-900:

Format	Description
JSON	Most common format for APIs and applications
XML	Tag-based hierarchical format
YAML	Human-friendly configuration format
Avro / Parquet	Binary formats used in analytics pipelines (Parquet is columnar; Avro is row-based)

Where Semi-Structured Data Is Used in Azure

Microsoft Azure provides specialized services designed to handle semi-structured data:

Azure Cosmos DB

Stores JSON documents
Supports schema-less designs
Designed for globally distributed applications
Optimized for flexible data models

Azure Data Lake Storage

Stores large volumes of semi-structured files
Used in analytics pipelines
Often paired with Azure Synapse or Azure Data Factory

These services are built specifically for workloads where structure changes frequently or cannot be fully defined in advance.

Why Semi-Structured Data Matters for DP-900

Understanding semi-structured data helps you:

Distinguish it from relational (structured) data
Identify appropriate Azure services (especially Cosmos DB)
Understand modern application and analytics architectures

On the exam, you’ll typically see semi-structured data appear in scenarios involving:

JSON documents
Application telemetry
IoT data
Log files

Structured vs Semi-Structured (Quick Comparison)

Structured	Semi-Structured
Fixed schema	Flexible schema
Rows and columns	Documents / nested objects
Schema-on-write	Schema-on-read
SQL databases	Document databases
Uniform shape across records	Shape varies by record

Summary — Exam-Relevant Takeaways

For DP-900, remember:

✔ Semi-structured data has no fixed schema
✔ Uses schema-on-read
✔ Supports nested and hierarchical structures
✔ Common formats: JSON, XML
✔ Often stored in Azure Cosmos DB or Data Lake
✔ Ideal for rapidly changing or document-based data

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification May 10, 2026

Practice Questions: Describe Common Formats for Data Files (DP-900 Exam Prep)

Practice Questions

Question 1

Which file format is most commonly used to store simple structured data in a plain-text, tabular form?

A. JSON
B. Parquet
C. CSV
D. Avro

✅ Answer: C

Explanation:
CSV (Comma-Separated Values) stores structured data as rows and columns in plain text and is widely used for data exchange.

Question 2

Which format is most associated with semi-structured data and commonly used by web APIs?

A. CSV
B. JSON
C. TXT
D. JPEG

✅ Answer: B

Explanation:
JSON uses key–value pairs and nested objects, making it ideal for semi-structured application data and APIs.

Question 3

A data engineering team needs a highly compressed, column-based file format optimized for analytics queries in Azure Synapse. Which format should they use?

A. XML
B. CSV
C. Parquet
D. TXT

✅ Answer: C

Explanation:
Parquet is a columnar, binary format designed for high-performance analytics and efficient storage.

Question 4

Which file format is tag-based, verbose, and commonly seen in legacy systems?

A. JSON
B. XML
C. Avro
D. CSV

✅ Answer: B

Explanation:
XML is a semi-structured, tag-based format often used in older enterprise systems and integrations.

Question 5

Which format is binary, includes schema information, and is commonly used in streaming or ingestion pipelines?

A. CSV
B. JSON
C. Avro
D. TXT

✅ Answer: C

Explanation:
Avro is a compact binary format that embeds schema and supports schema evolution, making it suitable for pipelines and streaming.

Question 6

A company stores application logs as JSON files in Azure Data Lake Storage. What type of data is this?

A. Structured
B. Semi-structured
C. Unstructured
D. Relational

✅ Answer: B

Explanation:
JSON represents semi-structured data because it uses keys and nested structures but does not enforce a fixed schema.

Question 7

Which format is most appropriate for exchanging small datasets between systems and opening directly in Excel?

A. Parquet
B. Avro
C. CSV
D. XML

✅ Answer: C

Explanation:
CSV is lightweight, human readable, and easily opened in spreadsheet tools like Excel.

Question 8

Which Azure service is most commonly used to store files such as CSV, JSON, Parquet, images, and videos?

A. Azure SQL Database
B. Azure Cosmos DB
C. Azure Blob Storage
D. Azure Table Storage

✅ Answer: C

Explanation:
Azure Blob Storage is Azure’s primary service for storing files of all formats, including structured, semi-structured, and unstructured data.

Question 9

Which format is not human readable and primarily optimized for analytics workloads?

A. CSV
B. JSON
C. Parquet
D. XML

✅ Answer: C

Explanation:
Parquet is a binary format optimized for performance and compression, not human readability.

Question 10

Which pairing of file format and data type is correct?

A. CSV → Unstructured
B. JSON → Structured
C. TXT → Semi-structured
D. Parquet → Structured / Analytics

✅ Answer: D

Explanation:
Parquet is commonly used for structured analytical datasets in big data and Azure analytics workloads.

✅ Quick Exam Takeaways

For DP-900, remember:

CSV → Structured, plain text
JSON / XML → Semi-structured
Parquet → Columnar, analytics-optimized
Avro → Binary, schema included, pipeline-friendly
TXT → Usually unstructured

And:

These formats typically live in Azure Blob Storage or Azure Data Lake Storage
Parquet and Avro are common in analytics and data engineering pipelines

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification May 10, 2026

Practice Questions: Describe Features of Unstructured Data (DP-900 Exam Prep)

Practice Questions

Question 1

Which statement best describes unstructured data?

A. Data organized in rows and columns
B. Data with flexible key–value pairs
C. Data without a predefined schema or consistent structure
D. Data stored only in relational databases

✅ Answer: C

Explanation:
Unstructured data has no predefined schema and does not naturally fit into tables.

Question 2

Which of the following is an example of unstructured data?

A. A customer table in Azure SQL Database
B. A JSON document
C. A PDF document
D. A CSV file

✅ Answer: C

Explanation:
PDF documents are classic unstructured data. JSON is semi-structured, and CSV is structured.

Question 3

Which Azure service is primarily used to store unstructured data such as images and videos?

A. Azure SQL Database
B. Azure Cosmos DB
C. Azure Blob Storage
D. Azure Table Storage

✅ Answer: C

Explanation:
Azure Blob Storage is Azure’s primary service for storing unstructured data like media files and documents.

Question 4

Why can’t unstructured data typically be queried directly using SQL?

A. SQL is deprecated
B. Unstructured data lacks a schema
C. SQL only works on cloud platforms
D. Unstructured data is encrypted

✅ Answer: B

Explanation:
SQL relies on schemas and tables. Unstructured data has no inherent structure, so it requires additional processing before analysis.

Question 5

Which workload most commonly generates unstructured data?

A. Financial transaction systems
B. Inventory databases
C. Media content platforms
D. Payroll systems

✅ Answer: C

Explanation:
Media platforms generate images, videos, and audio — all unstructured data.

Question 6

How is unstructured data typically stored?

A. As relational records
B. As nested documents
C. As files or binary objects
D. As key–value pairs

✅ Answer: C

Explanation:
Unstructured data is stored as files or blobs, not rows or documents.

Question 7

Which capability is commonly required to extract meaning from unstructured text data?

A. SQL joins
B. Index clustering
C. Natural language processing
D. Primary keys

✅ Answer: C

Explanation:
Unstructured text requires NLP or AI techniques to derive insights.

Question 8

Which statement correctly compares unstructured and semi-structured data?

A. Both require fixed schemas
B. Semi-structured data has no internal organization
C. Unstructured data contains embedded keys
D. Semi-structured data is machine readable, unstructured typically is not

✅ Answer: D

Explanation:
Semi-structured data (like JSON) contains keys/tags, while unstructured data does not.

Question 9

A company stores call recordings and scanned documents for compliance. What type of data is this?

A. Structured
B. Semi-structured
C. Unstructured
D. Relational

✅ Answer: C

Explanation:
Audio files and scanned documents are unstructured data.

Question 10

Which is a key characteristic of unstructured data?

A. Strong data typing
B. Fixed schema
C. Hierarchical documents
D. Requires AI or analytics tools for interpretation

✅ Answer: D

Explanation:
Unstructured data typically needs AI, machine learning, or analytics tools (such as computer vision or text analytics) to extract meaning.

✅ Quick Exam Takeaways

For DP-900, remember:

Unstructured data has no schema
Stored as files/blobs
Not directly queryable with SQL
Requires AI or analytics for insight
Common Azure service: Azure Blob Storage
Examples: images, videos, PDFs, audio, free-form text

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification May 10, 2026

Practice Questions: Describe Features of Semi-Structured Data (DP-900 Exam Prep)

Practice Questions

Question 1

Which statement best describes semi-structured data?

A. Data stored strictly in rows and columns
B. Data with no identifiable organization
C. Data that uses a flexible structure with self-describing elements
D. Data that can only be stored in relational databases

✅ Answer: C

Explanation:
Semi-structured data does not use rigid tables but contains self-describing elements (such as key–value pairs or tags) that provide partial structure.

Question 2

Which of the following is a common format for semi-structured data?

A. CSV
B. JSON
C. JPEG
D. MP4

✅ Answer: B

Explanation:
JSON is one of the most common semi-structured formats used in APIs, applications, and document databases.

Question 3

Semi-structured data typically uses which schema approach?

A. Schema-on-write
B. Schema-on-delete
C. Schema-on-read
D. Fixed schema

✅ Answer: C

Explanation:
Semi-structured data uses schema-on-read, meaning structure is applied when the data is queried, not when it is stored.

Question 4

Which Azure service is commonly used to store JSON-based semi-structured data?

A. Azure SQL Database
B. Azure Blob Storage only
C. Azure Cosmos DB
D. Azure Files

✅ Answer: C

Explanation:
Azure Cosmos DB is a globally distributed NoSQL service designed to store semi-structured JSON documents.

Question 5

Which characteristic differentiates semi-structured data from structured data?

A. It cannot be queried
B. It requires primary keys
C. It allows records with different fields
D. It must be stored in spreadsheets

✅ Answer: C

Explanation:
In semi-structured data, individual records can have different attributes, unlike structured data which enforces uniform columns.

Question 6

What does it mean when semi-structured data is described as self-describing?

A. It automatically documents itself
B. It contains embedded field names or tags
C. It always includes metadata files
D. It uses SQL syntax

✅ Answer: B

Explanation:
Semi-structured data includes keys or tags (like JSON property names) that describe the values they contain.

Question 7

Which scenario best represents semi-structured data?

A. A customer table with fixed columns
B. A collection of images
C. Application logs stored as JSON documents
D. Audio recordings

✅ Answer: C

Explanation:
JSON-based application logs are classic examples of semi-structured data.

Question 8

Why is semi-structured data well suited for rapidly changing applications?

A. It enforces strict schemas
B. It supports schema-on-read and flexible structures
C. It requires fewer storage resources
D. It must be normalized

✅ Answer: B

Explanation:
Semi-structured data allows flexible schemas, making it ideal when data models evolve frequently.

Question 9

Which feature allows nested objects in semi-structured data?

A. Tabular organization
B. Hierarchical structure
C. Index clustering
D. Column constraints

✅ Answer: B

Explanation:
Semi-structured data supports hierarchical and nested structures, such as JSON objects inside other objects.

Question 10

Which workload most commonly produces semi-structured data?

A. Financial ledger systems
B. Payroll databases
C. Web APIs and application telemetry
D. Spreadsheet reporting

✅ Answer: C

Explanation:
Web services, application telemetry, and IoT systems frequently generate JSON or similar semi-structured formats.

✅ Quick Exam Takeaways

For DP-900, remember:

Semi-structured data uses schema-on-read
Records can have different fields
Supports nested / hierarchical structures
Common formats: JSON, XML
Common Azure service: Azure Cosmos DB
Ideal for applications, telemetry, logs, and IoT

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification May 10, 2026

Describe Features of Structured Data (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
   --> Describe ways to represent data
      --> Describe features of structured data

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Introduction

In the DP-900 exam, you’ll need to understand what structured data is, how it’s organized, and why its characteristics matter in the context of data storage, querying, and analytics.

What Is Structured Data?

Structured data refers to information that is organized in a well-defined format and schema, typically into tables with rows and columns. Each row represents a record (an instance of an entity), and each column represents an attribute of that entity — with a specific data type (like integer, string, date). Because the structure is defined up-front, systems know exactly how to store, validate, and query the data.

In practical terms, structured data is the type of data you find in:

Relational databases (e.g., Azure SQL Database)
Spreadsheets (e.g., Excel)
Data warehouses

In a database table or spreadsheet, for example, each row is a record (e.g., a customer, an employee, a sale), and each column is a specific attribute (e.g., customer name, hire date, transaction amount). The schema tells the system exactly how the data is arranged, and this predictable organization makes structured data highly efficient to store, retrieve, and analyze.

Key Features of Structured Data

1. Predefined Schema

Structured data has a fixed schema — a blueprint that defines how data is organized before it is stored or queried.

Every table has a set number of columns.
Each column has a defined name and data type (such as integer, decimal, date/time, text).
Attempts to insert data that does not conform to the schema are typically rejected.

The database enforces these rules so that each column accepts (and therefore contains) only compatible values. For example, a HireDate column defined with a date data type would reject an employee's job title, which is a text (varchar) value.
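A minimal sketch of schema-on-write uses Python's built-in sqlite3 module. An in-memory SQLite database serves as a lightweight stand-in for a relational engine such as Azure SQL Database. SQLite is more permissive about column types than Azure SQL Database because it applies type affinity rather than strict typing. This sketch therefore adds an explicit CHECK constraint to reproduce the "HireDate only accepts dates" rule. The Employees table, its columns, the try_insert helper, and the sample rows are all illustrative assumptions.

```python
import sqlite3

# In-memory SQLite database, standing in for a relational engine such as Azure SQL Database
conn = sqlite3.connect(":memory:")

# The schema is defined before any data is stored (schema-on-write).
# SQLite only applies type affinity, so the CHECK constraint is what rejects non-dates:
# date() returns NULL for text it cannot parse as a date.
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        JobTitle   TEXT NOT NULL,
        HireDate   TEXT NOT NULL CHECK (date(HireDate) IS NOT NULL)
    )
""")

def try_insert(row):
    """Insert one employee row (hypothetical helper); return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Employees (EmployeeID, Name, JobTitle, HireDate) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError as err:
        print(f"Rejected {row}: {err}")
        return False

valid_ok = try_insert((1, "Avery", "Analyst", "2024-03-15"))
# A job title placed in the HireDate column breaks the column rule and is rejected
invalid_ok = try_insert((2, "Blake", "Sales Manager", "Sales Manager"))
print(valid_ok, invalid_ok)  # True False
```

Only the first row is stored. The second row fails the column rule before it ever reaches the table.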

2. Tabular Organization

Structured data is most often stored in tables, each made up of rows and columns:

Rows represent individual records (e.g., a customer, a sale, an order).
Columns represent data attributes (e.g., customer name, transaction amount, order date).

This row/column model is familiar from relational databases and spreadsheets. The tabular layout makes the data predictable and easy to view, understand, and ingest into analytical tools.

3. Strong Data Typing

Each column is assigned a specific data type, optionally with additional validation rules. Common data types include:

Integer
String or text
Date/time
Decimal or numeric
Boolean

This data typing, which prevents invalid values from being stored, helps maintain data integrity, reduce errors, and ensure consistent interpretation of values. For example:

A “DateOfBirth” column only accepts dates.
A “Price” column only accepts numeric values.

Strong typing also allows database engines to optimize storage and querying. For example:

Numbers are stored compactly and compared numerically (9 sorts before 10), which keeps indexes and lookups efficient
Dates can be compared and sorted chronologically
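A small illustration in plain Python shows why typing matters for comparison and sorting. The same hypothetical prices and hire dates sort differently when kept as text than when stored as real numbers and dates. A typed database column behaves like the typed lists here.

```python
from datetime import date

# Hypothetical prices and hire dates kept as untyped text...
prices_as_text = ["9", "100", "10"]
hire_dates_as_text = ["12/31/2023", "01/15/2024", "06/01/2023"]  # MM/DD/YYYY strings

# ...and the same values stored with proper types
prices_as_numbers = [9, 100, 10]
hire_dates_as_dates = [date(2023, 12, 31), date(2024, 1, 15), date(2023, 6, 1)]

print(sorted(prices_as_text))       # ['10', '100', '9']  character-by-character order
print(sorted(prices_as_numbers))    # [9, 10, 100]  numeric order
print(sorted(hire_dates_as_text))   # ['01/15/2024', '06/01/2023', '12/31/2023']  not chronological
print(sorted(hire_dates_as_dates))  # chronological: June 2023, Dec 2023, Jan 2024

# Typed dates also support natural comparisons
print(date(2024, 1, 15) > date(2023, 12, 31))  # True
```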

4. Easy Querying and Analysis

Because structured data follows a fixed, known schema, it can be easily accessed and analyzed with query languages like SQL (Structured Query Language). SQL supports operations such as filtering, sorting, aggregating, and joining data across tables, and it can generate reports from the data quickly and consistently.

This is why structured data is ideal for business reporting, dashboards, and operational systems.

Database systems like Azure SQL Database use SQL to let users retrieve specific records and perform analytics efficiently.
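A minimal sketch can combine filtering, joining, grouping, aggregating, and sorting in a single SQL statement. It uses Python's sqlite3 module with an in-memory database as a local stand-in for Azure SQL Database. The Departments and Employees tables and their sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (
        DepartmentID   INTEGER PRIMARY KEY,
        DepartmentName TEXT NOT NULL
    );
    CREATE TABLE Employees (
        EmployeeID   INTEGER PRIMARY KEY,
        Name         TEXT NOT NULL,
        DepartmentID INTEGER NOT NULL REFERENCES Departments (DepartmentID),
        Salary       REAL NOT NULL
    );
    INSERT INTO Departments VALUES (1, 'Finance'), (2, 'Sales');
    INSERT INTO Employees VALUES
        (1, 'Avery', 1, 72000), (2, 'Blake', 2, 64000), (3, 'Casey', 2, 50000),
        (4, 'Drew', 1, 80000), (5, 'Emery', 2, 62000), (6, 'Finley', 2, 66000);
""")

# Join, filter, group, aggregate, and sort in one declarative statement
report = conn.execute("""
    SELECT d.DepartmentName, COUNT(*) AS Headcount, AVG(e.Salary) AS AvgSalary
    FROM Employees AS e
    JOIN Departments AS d ON d.DepartmentID = e.DepartmentID
    WHERE e.Salary >= 60000          -- filtering: Casey is excluded
    GROUP BY d.DepartmentName        -- grouping
    ORDER BY Headcount DESC          -- sorting
""").fetchall()

for row in report:
    print(row)
# ('Sales', 3, 64000.0)
# ('Finance', 2, 76000.0)
```

The query states *what* result is wanted, not *how* to compute it. The fixed schema is what lets the engine plan the join and aggregation.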

As an example, the query below retrieves the name and hire date of employees hired after January 1, 2024. It is simple and efficient to run against structured data:

SELECT Name, HireDate 
FROM Employees
WHERE HireDate > '2024-01-01';
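To run the query above locally, the sketch below loads a few hypothetical rows into an in-memory SQLite database through Python's sqlite3 module. SQLite has no dedicated DATE type, so HireDate is stored here as ISO-8601 text ('YYYY-MM-DD'), which compares in chronological order. In Azure SQL Database the column would normally use the date type. An ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ISO-8601 text ('YYYY-MM-DD') sorts and compares chronologically, even as plain text
conn.execute("CREATE TABLE Employees (Name TEXT NOT NULL, HireDate TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Employees (Name, HireDate) VALUES (?, ?)",
    [("Avery", "2023-11-20"), ("Blake", "2024-02-05"),
     ("Casey", "2024-01-01"), ("Drew", "2024-07-18")],
)

# The query from the text, plus ORDER BY for a stable result order
rows = conn.execute(
    "SELECT Name, HireDate FROM Employees WHERE HireDate > '2024-01-01' ORDER BY HireDate"
).fetchall()
print(rows)  # [('Blake', '2024-02-05'), ('Drew', '2024-07-18')]
```

Casey was hired on 2024-01-01 itself and is excluded, because the filter uses > rather than >=.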

5. Enforced Data Integrity and Rules

Relational databases that store structured data use rules and constraints to preserve data integrity, such as:

Primary keys to uniquely identify records
Foreign keys to express and enforce relationships between tables
Constraints such as NOT NULL, UNIQUE, and CHECK to prevent invalid or unwanted data

These rules, along with the data typing, ensure data remains accurate, consistent, and meaningful across the entire dataset. Every row follows the same structure. This makes the data predictable, reliable and trustworthy for business reporting and analytics.

Because of this consistency:

Data validation is easier.
Automated processes function reliably.
Analytical and reporting tools deliver accurate results.
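The next sketch enforces the keys and constraints described in this section, again through Python's sqlite3 module with an in-memory database standing in for Azure SQL Database. The Customers and Orders tables and their rows are hypothetical. One SQLite-specific detail applies: foreign keys are enforced only after running PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,                 -- uniquely identifies each customer
        Email      TEXT NOT NULL UNIQUE                 -- required, and no duplicates allowed
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL
                   REFERENCES Customers (CustomerID),   -- must match an existing customer
        Amount     REAL NOT NULL CHECK (Amount > 0)     -- business rule: positive amounts only
    );
    INSERT INTO Customers VALUES (1, 'avery@example.com');
""")

# Each statement breaks exactly one rule and should be rejected
bad_statements = [
    "INSERT INTO Customers VALUES (1, 'blake@example.com')",  # PRIMARY KEY: duplicate ID
    "INSERT INTO Customers VALUES (2, 'avery@example.com')",  # UNIQUE: duplicate email
    "INSERT INTO Customers VALUES (3, NULL)",                 # NOT NULL: missing email
    "INSERT INTO Orders VALUES (10, 99, 25.00)",              # FOREIGN KEY: no customer 99
    "INSERT INTO Orders VALUES (11, 1, -5.00)",               # CHECK: negative amount
]

rejected = 0
for sql in bad_statements:
    try:
        with conn:  # roll back the failed statement
            conn.execute(sql)
    except sqlite3.IntegrityError as err:
        rejected += 1
        print(f"Rejected: {err}")

# A row that satisfies every rule is accepted
with conn:
    conn.execute("INSERT INTO Orders VALUES (12, 1, 49.99)")

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, order_count)  # 5 1
```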

6. Indexing and Optimization

Although indexing is not a defining feature of structured data, systems that store structured data commonly support it. An index is an additional, ordered structure that gives the engine an optimized path to specific values, so it can find matching rows without scanning the entire table. This makes search and retrieval faster and is often essential for acceptable performance on large datasets.
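A sketch can show an index changing how a query is executed, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. Azure SQL Database exposes comparable information through its execution plans. The Orders table, the generated rows, and the index name idx_orders_customer are hypothetical. The exact plan wording differs between SQLite versions, so the comments show typical output only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, "
    "CustomerID INTEGER NOT NULL, Amount REAL NOT NULL)"
)
# 10,000 generated orders spread across 1,000 customers
conn.executemany(
    "INSERT INTO Orders (CustomerID, Amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

query = "SELECT OrderID, Amount FROM Orders WHERE CustomerID = 42"

def plan(sql):
    """Return the query plan text; the detail text is the last column of each plan row."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # full table scan, e.g. 'SCAN Orders'
conn.execute("CREATE INDEX idx_orders_customer ON Orders (CustomerID)")
after = plan(query)   # index lookup, e.g. 'SEARCH Orders USING INDEX idx_orders_customer (CustomerID=?)'
print(before)
print(after)
```

Without the index, every row must be read to find customer 42. With it, the engine jumps straight to the matching entries.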

Where Structured Data Is Used

Structured data is foundational in many classic applications, including:

Relational databases such as Azure SQL Database
- Many business applications use SQL databases (from a variety of vendors, including Microsoft, Oracle, and others) to store data.
Data warehouses that aggregate business data for analytics
Spreadsheet systems like Microsoft Excel

All of these use fixed schemas and are typically queried with SQL or BI tools. In the Azure ecosystem, services like Azure SQL Database and Azure Synapse Analytics are designed to handle structured data workloads — enabling high-performance querying, transaction processing, and analytics.

Why Structured Data Matters for DP-900

Structured data forms the foundation of many business applications and analytical systems.

Understanding structured data is essential because:

It’s the foundation of relational data concepts on Azure.
It’s the baseline for SQL and transactional processing.
Many Azure services prioritize structured workloads for performance and reliability.

Understanding its features helps you:

✔ Know when to use relational databases versus non-relational stores
✔ Understand how schema affects querying and data integrity
✔ Recognize the strengths and limitations of structured formats in Azure environments

Being clear on how structured data is defined, stored, and queried will help you distinguish it from semi-structured and unstructured data — a key skill in the DP-900 exam.

Benefits of Structured Data

Because of its organization and predictability, structured data offers several advantages:

✔ Easy querying and reporting — supported directly by SQL.
✔ High data integrity — enforced through schemas and validation rules.
✔ Efficient storage and processing — optimized for performance.
✔ Readily usable by analytics tools — ideal for dashboards and BI.

These benefits make structured data ideal for many enterprise workloads where accuracy, speed, and reliability are essential.

Structured Data vs. Other Data Types

To further your understanding of structured data, it helps to contrast it with the other two ways of representing data:

Semi-structured data has some organization but lacks a strict schema (e.g., JSON).
Unstructured data has no inherent structure (e.g., text documents, images).

Structured data lives on the most rigid end of this spectrum, which is why it’s easy to manage with traditional databases and analytics tools.
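A pure-Python sketch illustrates the contrast. The hypothetical to_structured_row helper plays the role of a table with a fixed column list (schema-on-write). The JSON documents, by contrast, are self-describing, can differ in shape and nesting, and are given structure only when they are read (schema-on-read). The column names and sample documents are assumptions for illustration.

```python
import json

# Structured: every record must supply exactly the predefined columns (schema-on-write)
EMPLOYEE_COLUMNS = ("EmployeeID", "Name", "HireDate")

def to_structured_row(record):
    """Hypothetical helper: validate a record against the fixed column list before 'storing' it."""
    missing = [c for c in EMPLOYEE_COLUMNS if c not in record]
    extra = [k for k in record if k not in EMPLOYEE_COLUMNS]
    if missing or extra:
        raise ValueError(f"schema violation: missing={missing}, extra={extra}")
    return tuple(record[c] for c in EMPLOYEE_COLUMNS)

# Semi-structured: JSON documents carry their own field names and may differ in shape
documents = json.loads("""[
    {"EmployeeID": 1, "Name": "Avery", "HireDate": "2024-03-15"},
    {"EmployeeID": 2, "Name": "Blake", "Skills": ["SQL", "Python"],
     "Address": {"City": "Seattle"}}
]""")

# Schema-on-read: structure (including the nested Address) is applied at query time
cities = [doc.get("Address", {}).get("City") for doc in documents]
print(cities)  # [None, 'Seattle']

print(to_structured_row(documents[0]))  # (1, 'Avery', '2024-03-15')
try:
    to_structured_row(documents[1])
except ValueError as err:
    print(err)  # schema violation: missing=['HireDate'], extra=['Skills', 'Address']
```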

Summary: Exam-Relevant Takeaways

Structured data sits in rows and columns with a predefined schema.
- ✔ Structured Data = predefined schema + tables + columns
Each column has a defined data type and validation rules.
- ✔ Strong typing and consistency
Structured data can be queried efficiently with SQL.
- ✔ Efficient querying with SQL
Keys and constraints enforce integrity, which supports consistent, reliable analytics.
- ✔ Enforced integrity via constraints and keys

Understanding these features will help you recognize when structured data is the right representation and how it compares to other data forms in Azure scenarios.

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification May 10, 2026

Practice Questions: Describe features of structured data (DP-900 Exam Prep)

Practice Questions

Question 1

Which statement best describes structured data?

A. Data stored as images and videos
B. Data organized in key-value pairs without a schema
C. Data organized in rows and columns with a predefined schema
D. Data that can only be stored in files

✅ Answer: C

Explanation:
Structured data uses a fixed schema and is typically organized into tables with rows and columns, making it easy to query and analyze.

Question 2

Which of the following is a defining characteristic of structured data?

A. Schema-on-read
B. Schema-on-write
C. No enforced data types
D. Free-form text storage

✅ Answer: B

Explanation:
Structured data uses schema-on-write, meaning the structure (tables, columns, data types) must be defined before data is stored.

Question 3

You have a table with columns CustomerID, Name, and JoinDate. Each column has a defined data type. What feature of structured data does this demonstrate?

A. Indexing
B. Semi-structured storage
C. Strong data typing
D. Unstructured formatting

✅ Answer: C

Explanation:
Structured data enforces strong data typing, ensuring each column only accepts valid values (e.g., dates in date columns).

Question 4

Which language is most commonly used to query structured data?

A. Python
B. JSON
C. SQL
D. XML

✅ Answer: C

Explanation:
Relational databases, which store structured data, are designed to be queried with SQL (Structured Query Language).

Question 5

Which Azure service is primarily designed to store structured relational data?

A. Azure Blob Storage
B. Azure Data Lake Storage
C. Azure SQL Database
D. Azure File Storage

✅ Answer: C

Explanation:
Azure SQL Database is a managed relational database service optimized for structured data.

Question 6

What does a row represent in structured data?

A. A column definition
B. A schema
C. A single record or entity instance
D. A data type

✅ Answer: C

Explanation:
Each row represents one complete record (for example, one customer or one order).

Question 7

Which feature helps ensure that every record in a table can be uniquely identified?

A. Foreign key
B. Primary key
C. Index
D. View

✅ Answer: B

Explanation:
A primary key uniquely identifies each row and is a core integrity feature of structured data systems.

Question 8

Why is structured data well suited for reporting and dashboards?

A. It allows free-form documents
B. It does not require validation
C. It supports predictable schemas and efficient queries
D. It stores multimedia content

✅ Answer: C

Explanation:
Fixed schemas and SQL support make structured data ideal for analytics, reporting, and BI workloads.

Question 9

Which of the following best illustrates structured data?

A. A collection of photos
B. JSON log files
C. A spreadsheet with defined columns
D. Audio recordings

✅ Answer: C

Explanation:
Spreadsheets with consistent columns and rows are classic examples of structured data.

Question 10

What is a major benefit of enforcing constraints such as NOT NULL and UNIQUE?

A. Faster internet connections
B. Reduced storage costs
C. Improved data integrity
D. Automatic encryption

✅ Answer: C

Explanation:
Constraints help maintain accuracy and consistency, which is a key strength of structured data systems.

✅ Quick Exam Takeaway

For DP-900, remember:

Structured data uses tables (rows + columns)
Requires a predefined schema
Enforces data types and constraints
Is queried with SQL
Commonly lives in relational databases like Azure SQL Database

Go to the DP-900 Exam Prep Hub main page.