Describe Features of Semi-Structured Data (DP-900 Exam Prep)

This post is part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub.
This topic falls under these sections:
Describe core data concepts (25–30%)
   --> Describe ways to represent data
      --> Describe features of semi-structured data

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Introduction

Semi-structured data sits between structured and unstructured data. For the DP-900 exam, you’re expected to understand what it is, how it’s organized, and why Azure provides specialized services to store and query it.

What Is Semi-Structured Data?

Semi-structured data is data that does not follow a rigid, tabular schema like relational data, but still contains organizational markers or tags that make it partially structured and machine readable.

Unlike structured data (rows and columns), semi-structured data:

Does not require a predefined schema
Can vary in shape from record to record
Still contains self-describing elements such as key–value pairs or hierarchical structures

In other words, semi-structured data has some structure — just not fixed tables.

Common examples include:

JSON documents
XML files
YAML
Avro / Parquet (used in analytics pipelines)

Key Features of Semi-Structured Data

1. Schema-on-Read (Not Schema-on-Write)

One of the most important characteristics of semi-structured data is schema-on-read.

This means:

Data is stored without enforcing a strict schema
Structure is interpreted when the data is queried or analyzed

This contrasts with structured data, which uses schema-on-write, where structure must be defined before data is inserted.

For DP-900, remember:

Semi-structured data is flexible at ingestion time and structured at query time.
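
To make schema-on-read concrete, here is a minimal Python sketch. It assumes a small in-memory list standing in for a data store; the device names, the Humidity field, and the read_with_schema helper are hypothetical and used only for illustration. Nothing is validated when the records are stored, and a structure is applied only when they are read.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (one JSON document
# per entry). No schema is checked at write time.
raw_store = [
    '{"DeviceID": "sensor-1", "Temperature": 72}',
    '{"DeviceID": "sensor-2", "Temperature": 68, "Humidity": 40}',
    '{"DeviceID": "sensor-3"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep the requested fields, default to None."""
    for line in lines:
        doc = json.loads(line)
        yield {name: doc.get(name) for name in fields}


# The "schema" is chosen by the reader, not by the store.
rows = list(read_with_schema(raw_store, ["DeviceID", "Temperature"]))
for row in rows:
    print(row)
```

A different query could read the same stored records with a different field list, which is the flexibility the exam expects you to associate with schema-on-read.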

2. Flexible and Evolving Structure

Each record in a semi-structured dataset can contain:

Different fields
Nested objects
Optional attributes

Example (JSON):

{
  "CustomerID": 123,
  "Name": "Sarah",
  "Orders": [
    { "OrderID": 1, "Amount": 50 },
    { "OrderID": 2, "Amount": 75 }
  ]
}

Another record in the same dataset might include extra fields like Email or omit Orders entirely.

This flexibility makes semi-structured data ideal for:

Application telemetry
IoT data
User activity logs
Rapidly changing systems
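
As a rough illustration of records that vary in shape, the Python sketch below processes the customer record from the example alongside a second, hypothetical record that adds Email and omits Orders. The second record and the order_total helper are assumptions made for this example.

```python
import json

# Two records from the same dataset with different shapes.
# The second record is hypothetical: it adds "Email" and has no "Orders".
records = json.loads("""
[
  {"CustomerID": 123, "Name": "Sarah",
   "Orders": [{"OrderID": 1, "Amount": 50}, {"OrderID": 2, "Amount": 75}]},
  {"CustomerID": 124, "Name": "Omar", "Email": "omar@example.com"}
]
""")


def order_total(record):
    # "Orders" is optional, so fall back to an empty list when it is absent.
    return sum(order["Amount"] for order in record.get("Orders", []))


totals = {record["CustomerID"]: order_total(record) for record in records}
print(totals)  # {123: 125, 124: 0}
```

The reading code, not the storage layer, decides how to treat a missing field.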

3. Hierarchical or Nested Organization

Semi-structured data often uses hierarchies rather than flat tables.

For example:

JSON objects inside objects
XML elements within elements

This nested design allows complex relationships to exist inside a single document — something that would require multiple tables in relational systems.
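
The sketch below shows, under simple assumptions, how one nested customer document maps to the two sets of rows a relational design would typically use. The to_relational helper and the tuple layouts are hypothetical choices for illustration, not a prescribed mapping.

```python
# One nested document: the customer and its orders live together.
customer_doc = {
    "CustomerID": 123,
    "Name": "Sarah",
    "Orders": [
        {"OrderID": 1, "Amount": 50},
        {"OrderID": 2, "Amount": 75},
    ],
}


def to_relational(doc):
    """Split one nested document into a customer row and its order rows."""
    customer_row = (doc["CustomerID"], doc["Name"])
    # Each order row carries CustomerID so it can be joined back to the customer.
    order_rows = [
        (order["OrderID"], doc["CustomerID"], order["Amount"])
        for order in doc.get("Orders", [])
    ]
    return customer_row, order_rows


customer_row, order_rows = to_relational(customer_doc)
print(customer_row)  # (123, 'Sarah')
print(order_rows)    # [(1, 123, 50), (2, 123, 75)]
```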

4. Self-Describing Format

Semi-structured data embeds its own metadata using:

Keys
Tags
Field names

This makes the data self-describing, meaning applications can understand what each value represents without relying on an external schema definition.

Example:

"Temperature": 72

The key itself describes the value.
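
A short Python sketch of the same idea, using only the standard library: the field name travels with the value in both JSON and XML. The Reading wrapper element in the XML snippet is a hypothetical name added so the fragment is well-formed.

```python
import json
import xml.etree.ElementTree as ET

json_text = '{"Temperature": 72}'
# "Reading" is a hypothetical root element used only for this example.
xml_text = "<Reading><Temperature>72</Temperature></Reading>"

# JSON: keys are stored alongside their values.
json_fields = json.loads(json_text)

# XML: tag names play the same role as JSON keys.
root = ET.fromstring(xml_text)
xml_fields = {child.tag: child.text for child in root}

print(json_fields)  # {'Temperature': 72}
print(xml_fields)   # {'Temperature': '72'}
```

Note that XML element content is plain text, so the value comes back as the string '72' unless the reader converts it.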

5. Easily Transported Across Systems

Semi-structured formats such as JSON and XML are:

Human readable
Platform independent
Widely supported across APIs and applications

This is a large part of why JSON has become the most common format for exchanging data between modern web services.
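
As a minimal sketch of that portability, the Python snippet below serializes a record to JSON text, encodes it as bytes as if sending it over a network, and rebuilds it on the other side. The record contents are hypothetical, and no real network call is made.

```python
import json

# Hypothetical record to send to another system.
reading = {"DeviceID": "sensor-1", "Temperature": 72, "Tags": ["lab", "floor-2"]}

payload = json.dumps(reading)          # serialize to plain text
wire_bytes = payload.encode("utf-8")   # what would travel between systems

# The receiver needs only a JSON parser, not the sender's code or schema.
received = json.loads(wire_bytes.decode("utf-8"))
print(received == reading)  # True
```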

Common Formats of Semi-Structured Data

You should recognize these for DP-900:

Format	Description
JSON	Most common format for APIs and applications
XML	Tag-based hierarchical format
YAML	Human-friendly configuration format
Avro / Parquet	Binary formats used in analytics pipelines (Avro is row-based, Parquet is columnar)

Where Semi-Structured Data Is Used in Azure

Microsoft Azure provides specialized services designed to handle semi-structured data:

Azure Cosmos DB

Stores JSON documents
Supports schema-less designs
Designed for globally distributed applications
Optimized for flexible data models

Azure Data Lake Storage

Stores large volumes of semi-structured files
Used in analytics pipelines
Often paired with Azure Synapse Analytics or Azure Data Factory

These services are well suited to workloads where structure changes frequently or cannot be fully defined in advance.

Why Semi-Structured Data Matters for DP-900

Understanding semi-structured data helps you:

Distinguish it from relational (structured) data
Identify appropriate Azure services (especially Cosmos DB)
Understand modern application and analytics architectures

On the exam, you’ll typically see semi-structured data appear in scenarios involving:

JSON documents
Application telemetry
IoT data
Log files

Structured vs Semi-Structured (Quick Comparison)

Structured	Semi-Structured
Fixed schema	Flexible schema
Rows and columns	Documents / nested objects
Schema-on-write	Schema-on-read
SQL databases	Document databases
Highly consistent	Shape varies by record
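
The contrast in the table can be sketched in Python with the standard library. This example assumes an in-memory SQLite database as a stand-in for a SQL database and a plain list of JSON strings as a stand-in for a document store; the table name, the second customer, and the Email field are hypothetical.

```python
import json
import sqlite3

# Schema-on-write: the table structure must exist before any insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Name TEXT)")
conn.execute("INSERT INTO Customers VALUES (?, ?)", (123, "Sarah"))

try:
    # "Email" is not part of the fixed schema, so this insert is rejected.
    conn.execute(
        "INSERT INTO Customers (CustomerID, Name, Email) VALUES (?, ?, ?)",
        (124, "Omar", "omar@example.com"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Schema-on-read: documents with different shapes are stored side by side.
documents = [
    json.dumps({"CustomerID": 123, "Name": "Sarah"}),
    json.dumps({"CustomerID": 124, "Name": "Omar", "Email": "omar@example.com"}),
]
# The reader decides how to treat a field that some documents lack.
emails = [json.loads(doc).get("Email") for doc in documents]

print(rejected)  # True
print(emails)    # [None, 'omar@example.com']
conn.close()
```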

Summary — Exam-Relevant Takeaways

For DP-900, remember:

✔ Semi-structured data has no fixed schema
✔ Uses schema-on-read
✔ Supports nested and hierarchical structures
✔ Common formats: JSON, XML
✔ Often stored in Azure Cosmos DB or Data Lake
✔ Ideal for rapidly changing or document-based data

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.
