Describe Features of Unstructured Data (DP-900 Exam Prep)

This post is part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub.
This topic falls under these sections:
Describe core data concepts (25–30%)
   --> Describe ways to represent data
      --> Describe features of unstructured data

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Introduction

For the DP-900 exam, unstructured data represents the opposite end of the data spectrum from structured data. You’re expected to understand what unstructured data is, its defining characteristics, and how Azure typically stores and works with it.

What Is Unstructured Data?

Unstructured data is data that does not follow a predefined data model or schema and does not naturally fit into rows and columns.

Unlike structured or semi-structured data:

There is no inherent schema
There are no consistent fields or attributes
The meaning of the data is not directly machine-readable without additional processing

Common examples include:

Text documents (Word, PDF, emails)
Images
Audio files
Video files
Social media posts
Free-form text

In short:

Unstructured data is raw content without built-in organization.

Key Features of Unstructured Data

1. No Predefined Schema

Unstructured data has no fixed structure at all.

There are:

No columns
No rows
No data types
No enforced fields

Each file or object stands alone, and systems do not inherently understand its internal meaning.

For DP-900, remember:

Unstructured data has no schema enforced when it is written (schema-on-write) and, by default, none to apply when it is read (schema-on-read).

Any structure must be created later using analytics or AI tools.
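
As a rough illustration of structure being imposed after the fact, the sketch below pulls a few fields out of a free-form message with regular expressions. The sample text, the field names, and the `extract_fields` helper are all hypothetical; a real pipeline would more likely rely on text analytics or AI services than on hand-written patterns.

```python
import re

# Hypothetical free-form text: nothing in the content declares fields or types.
EMAIL_BODY = """Hi team,
Order 48213 arrived damaged on 2024-03-18. Please call me at 555-0142.
Thanks, Dana"""


def extract_fields(text: str) -> dict:
    """Impose an ad hoc structure on raw text; any field may be missing."""
    patterns = {
        "order_id": r"Order (\d+)",
        "date": r"(\d{4}-\d{2}-\d{2})",
        "phone": r"(\d{3}-\d{4})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        # A missing match is recorded as None because the text guarantees nothing.
        fields[name] = match.group(1) if match else None
    return fields


print(extract_fields(EMAIL_BODY))
```

The structure here lives entirely in the processing code, not in the data, which is why a differently worded message could come back with every field set to `None`.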

2. Human-Readable, Not Machine-Optimized

Unstructured data is usually created for human consumption, not database processing.

Examples:

A photo is meant to be viewed
A video is meant to be watched
A document is meant to be read

Computers cannot easily extract meaning from this data without:

AI
Machine learning
Text analytics
Computer vision

3. Stored as Files or Binary Objects

Unstructured data is typically stored as files or blobs, rather than database records.

Each item is treated as a complete object, such as:

image.jpg
recording.mp3
report.pdf

There is no inherent relationship between files unless you explicitly create one.
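
To make the "complete object" idea concrete, here is a minimal sketch of what a storage layer can know about such files without interpreting them. The byte strings are placeholders rather than real media, and the `describe` helper and `related` mapping are hypothetical names used only for illustration.

```python
import hashlib

# Hypothetical objects: to the storage layer each one is just a name plus bytes.
objects = {
    "image.jpg": b"\xff\xd8\xff\xe0 fake image bytes",
    "recording.mp3": b"ID3 fake audio bytes",
    "report.pdf": b"%PDF-1.7 fake document bytes",
}


def describe(name: str, data: bytes) -> dict:
    """Return the only facts available without interpreting the content."""
    return {
        "name": name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


catalog = [describe(name, data) for name, data in objects.items()]

# Relationships exist only if someone records them, here as explicit metadata.
related = {"report.pdf": ["image.jpg", "recording.mp3"]}

for entry in catalog:
    print(entry["name"], entry["size_bytes"], related.get(entry["name"], []))
```

Name, size, and a checksum are available for free; anything about what the photo shows or what the recording says has to come from separate processing.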

4. Requires Specialized Processing

To analyze unstructured data, you generally need advanced tools such as:

Natural language processing (for text)
Image recognition
Speech-to-text
AI models

This is very different from structured data, where SQL alone is often sufficient.
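
The contrast can be sketched in a few lines. In this example the `sales` table, its rows, and the review text are invented for illustration, and the word count is only a crude stand-in for real natural language processing.

```python
import sqlite3
from collections import Counter

# Structured data: a declared schema lets SQL answer the question directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0)],
)
east_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'East'"
).fetchone()[0]
conn.close()

# Unstructured data: the text must be interpreted before it can be analyzed.
review = "The delivery was late and the box was damaged, but support was helpful."
words = [w.strip(".,").lower() for w in review.split()]
word_counts = Counter(words)

print(east_total)
print(word_counts.most_common(2))
```

The SQL query works because the columns and types were declared up front. The review has no such declaration, so even a simple question like "is this customer unhappy?" needs a processing step before any query can run.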

5. Extremely Large Volume

Unstructured data is commonly estimated to make up the majority of enterprise data.

Examples include:

Video archives
Image repositories
Document libraries
Application-generated media

This makes scalability and low-cost storage especially important.

Where Unstructured Data Is Stored in Azure

Azure provides services specifically designed for unstructured data:

Azure Blob Storage

Primary Azure service for unstructured data
Stores images, videos, documents, backups, etc.
Highly scalable and cost-effective
Treats data as binary large objects (blobs)

Azure Data Lake Storage Gen2

Built on Blob Storage
Optimized for analytics workloads
Commonly used when unstructured data feeds big data or AI pipelines

For DP-900 purposes:

Azure Blob Storage = core unstructured storage
Azure Data Lake Storage = analytics-oriented unstructured storage
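
As a hedged sketch of what "treats data as blobs" means in practice, the example below stores raw bytes under a name and nothing more. `InMemoryContainer` and `upload_document` are hypothetical helpers written for this post so the code runs locally; the commented lines at the end show how the same function might be pointed at the real `azure-storage-blob` SDK, assuming you supply your own connection string and container name.

```python
class InMemoryContainer:
    """Hypothetical stand-in that mimics ContainerClient.upload_blob for local runs."""

    def __init__(self):
        self.blobs = {}

    def upload_blob(self, name, data, overwrite=False):
        if name in self.blobs and not overwrite:
            raise ValueError(f"blob {name!r} already exists")
        self.blobs[name] = bytes(data)  # stored as opaque bytes, no schema applied


def upload_document(container_client, blob_name: str, content: bytes) -> str:
    """Store raw bytes under a blob name; the content itself is never inspected."""
    container_client.upload_blob(name=blob_name, data=content, overwrite=True)
    return blob_name


container = InMemoryContainer()
upload_document(container, "report.pdf", b"%PDF-1.7 placeholder bytes")
upload_document(container, "image.jpg", b"\xff\xd8\xff placeholder bytes")
print(sorted(container.blobs))

# With the real SDK (assumes `pip install azure-storage-blob` and your own values):
#   from azure.storage.blob import ContainerClient
#   client = ContainerClient.from_connection_string("<connection-string>", "<container-name>")
#   upload_document(client, "report.pdf", open("report.pdf", "rb").read())
```

Notice that the upload call takes only a name and bytes. There is no table definition, column list, or data type anywhere in the exchange.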

Common Use Cases for Unstructured Data

You’ll typically see unstructured data in scenarios involving:

Media content (photos, videos)
Document management systems
Call recordings
Social media data
Machine learning datasets

These workloads focus on storage and later interpretation, rather than immediate querying.

How Unstructured Differs from Semi-Structured

It’s important not to confuse these two:

Semi-Structured	Unstructured
Has tags or keys (JSON/XML)	No internal structure
Schema-on-read	No schema
Machine readable	Human readable
Cosmos DB / Data Lake	Blob Storage / Data Lake
Nested fields	Raw files

JSON logs = semi-structured
PDF documents = unstructured

This distinction shows up frequently in DP-900 questions.
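
A small sketch can make the difference tangible. The log line and the PDF bytes below are made-up placeholders; the point is only that a generic parser can locate fields in one and not in the other.

```python
import json

# Semi-structured: keys travel with the data, so a parser can find fields.
log_line = '{"level": "error", "service": "checkout", "details": {"code": 502}}'
record = json.loads(log_line)
print(record["details"]["code"])

# Unstructured: the same parser finds nothing to work with in raw file bytes.
pdf_bytes = b"%PDF-1.7 ...binary content..."  # hypothetical stand-in for a PDF
try:
    json.loads(pdf_bytes)
    parsed = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    parsed = False
print(parsed)
```

The JSON record describes itself through its keys and nesting, while the PDF bytes carry no field names at all, which is the line DP-900 questions tend to draw between the two categories.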

Why Unstructured Data Matters for DP-900

Understanding unstructured data helps you:

Identify appropriate Azure storage services
Recognize when SQL is not suitable
Understand modern data pipelines involving AI and analytics

On the exam, unstructured data usually appears in questions involving:

Images
Videos
Documents
Blob Storage

Summary — Exam-Relevant Takeaways

For DP-900, remember:

✔ Unstructured data has no predefined schema
✔ Stored as files or blobs, not tables
✔ Not directly queryable with SQL
✔ Requires AI or analytics tools for insight
✔ Common Azure services: Azure Blob Storage, Azure Data Lake Storage
✔ Examples: images, videos, PDFs, audio, free-form text
