This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub.
This topic falls under these sections:
Describe core data concepts (25–30%)
--> Describe ways to represent data
--> Describe features of unstructured data
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Introduction
For the DP-900 exam, unstructured data represents the opposite end of the data spectrum from structured data. You’re expected to understand what unstructured data is, its defining characteristics, and how Azure typically stores and works with it.
What Is Unstructured Data?
Unstructured data is data that does not follow a predefined data model or schema and does not naturally fit into rows and columns.
Unlike structured or semi-structured data:
- There is no inherent schema
- There are no consistent fields or attributes
- The meaning of the data is not directly machine-readable without additional processing
Common examples include:
- Text documents (Word, PDF, emails)
- Images
- Audio files
- Video files
- Social media posts
- Free-form text
In short:
Unstructured data is raw content without built-in organization.
Key Features of Unstructured Data
1. No Predefined Schema
Unstructured data has no fixed, database-style structure.
There are:
- No columns
- No rows
- No data types
- No enforced fields
Each file or object stands alone, and systems do not inherently understand its internal meaning.
For DP-900, remember:
Unstructured data has no schema enforced when it is written, and no built-in schema to apply when it is read.
Any structure must be created later using analytics or AI tools.
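To illustrate the point, here is a minimal Python sketch of what a storage system actually sees: a sequence of bytes. The function name `sniff_format` and the sample payloads are hypothetical and used only for illustration; the signatures are the well-known leading bytes of PNG, JPEG, and PDF files.

```python
# Well-known leading bytes ("magic numbers") for a few file formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF-": "pdf",
}


def sniff_format(payload: bytes) -> str:
    """Guess a file format from its first bytes; return 'unknown' otherwise."""
    for signature, name in SIGNATURES.items():
        if payload.startswith(signature):
            return name
    return "unknown"


# Hypothetical payloads: only the leading bytes matter for this sketch.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
fake_pdf = b"%PDF-1.7\n..."
free_text = "Meeting moved to Friday.".encode("utf-8")

print(sniff_format(fake_png))
print(sniff_format(fake_pdf))
print(sniff_format(free_text))
```

Even when the format can be guessed, nothing about the content itself (who is in the photo, what the report concludes) is known. That meaning has to be derived later with analytics or AI tools.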
2. Human-Readable, Not Machine-Optimized
Unstructured data is usually created for human consumption, not database processing.
Examples:
- A photo is meant to be viewed
- A video is meant to be watched
- A document is meant to be read
Computers cannot easily extract meaning from this data without:
- AI
- Machine learning
- Text analytics
- Computer vision
3. Stored as Files or Binary Objects
Unstructured data is typically stored as files or blobs, rather than database records.
Each item is treated as a complete object, such as:
- image.jpg
- recording.mp3
- report.pdf
There is no inherent relationship between files unless you explicitly create one.
4. Requires Specialized Processing
To analyze unstructured data, you generally need advanced tools such as:
- Natural language processing (for text)
- Image recognition
- Speech-to-text
- AI models
This is very different from structured data, where SQL alone is often sufficient.
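As a rough illustration of that extra processing step, the sketch below uses only the Python standard library to pull the most frequent terms out of free-form text. It is a deliberately simple stand-in for real natural language processing; the function name `top_terms`, the stop-word list, and the sample review are assumptions made for this example.

```python
import re
from collections import Counter


def top_terms(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent words, ignoring a small stop-word list."""
    # Tokenize: the text has no fields, so words must be extracted first.
    words = re.findall(r"[a-z']+", text.lower())
    stop_words = {"the", "a", "is", "was", "and", "to", "it", "but"}
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


# Hypothetical customer review stored as free-form text.
review = (
    "The delivery was late. Late delivery again, "
    "and the support was slow to respond."
)
print(top_terms(review, 2))
```

With structured data, a question like "how many late deliveries?" could be a single SQL query against a status column. Here, the text must be tokenized and interpreted before any counting is possible.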
5. Extremely Large Volume
Unstructured data is commonly estimated to make up the majority of enterprise data.
Examples include:
- Video archives
- Image repositories
- Document libraries
- Application-generated media
This makes scalability and low-cost storage especially important.
Where Unstructured Data Is Stored in Azure
Azure provides services specifically designed for unstructured data:
Azure Blob Storage
- Primary Azure service for unstructured data
- Stores images, videos, documents, backups, etc.
- Highly scalable and cost-effective
- Treats data as binary large objects (blobs)
Azure Data Lake Storage Gen2
- Built on Blob Storage
- Optimized for analytics workloads
- Commonly used when unstructured data feeds big data or AI pipelines
For DP-900 purposes:
- Azure Blob Storage = core unstructured storage
- Azure Data Lake Storage = analytics-oriented unstructured storage
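As a hedged illustration of how the two services relate, the sketch below builds the addresses commonly used to reach the same stored object: an HTTPS URL on the Blob endpoint, and an `abfss://` URI on the Data Lake (`dfs`) endpoint that analytics engines typically use. The account name `examplemedia`, the container `images`, and the helper function names are hypothetical; the endpoint patterns are those of the public Azure cloud.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_path: str) -> str:
    """Build the HTTPS URL of a blob on the Blob Storage endpoint."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_path)}"


def abfss_uri(account: str, container: str, path: str) -> str:
    """Build the abfss:// URI used to address Data Lake Storage Gen2 paths."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{quote(path)}"


# Hypothetical storage account, container, and file path.
print(blob_url("examplemedia", "images", "2024/photo 1.jpg"))
print(abfss_uri("examplemedia", "images", "2024/photo 1.jpg"))
```

Neither address says anything about what the file contains. The service stores and returns the object as-is.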
Common Use Cases for Unstructured Data
You’ll typically see unstructured data in scenarios involving:
- Media content (photos, videos)
- Document management systems
- Call recordings
- Social media data
- Machine learning datasets
These workloads focus on storage and later interpretation, rather than immediate querying.
How Unstructured Differs from Semi-Structured
It’s important not to confuse these two:
| Semi-Structured | Unstructured |
|---|---|
| Has tags or keys (JSON/XML) | No internal structure |
| Schema-on-read | No schema |
| Machine-readable | Meant for human consumption |
| Typically stored in Cosmos DB / Data Lake | Typically stored in Blob Storage / Data Lake |
| Nested fields | Raw files |
JSON logs = semi-structured
PDF documents = unstructured
This distinction shows up frequently in DP-900 questions.
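A short Python sketch can make the difference tangible. The sample log line, the sentence, and the variable names are invented for this example, and the regular expression is only a crude stand-in for real text analytics.

```python
import json
import re

# Semi-structured: keys travel with the data, so a parser can find fields.
json_log = '{"user": "avery", "event": "login", "status": "failed"}'
record = json.loads(json_log)
print(record["status"])

# Unstructured: the same fact is buried in free-form text with no keys.
free_text = "Avery tried to log in this morning but it failed again."
match = re.search(r"\b(failed|succeeded)\b", free_text)
status_from_text = match.group(1) if match else None
print(status_from_text)
```

The JSON value is addressed directly by its key. The free-text version only works because the pattern happens to anticipate the wording, which is why unstructured data generally needs AI or analytics tools rather than simple lookups.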
Why Unstructured Data Matters for DP-900
Understanding unstructured data helps you:
- Identify appropriate Azure storage services
- Recognize when SQL is not suitable
- Understand modern data pipelines involving AI and analytics
On the exam, unstructured data usually appears in questions involving:
- Images
- Videos
- Documents
- Blob Storage
Summary — Exam-Relevant Takeaways
For DP-900, remember:
✔ Unstructured data has no predefined schema
✔ Stored as files or blobs, not tables
✔ Not directly queryable with SQL
✔ Requires AI or analytics tools for insight
✔ Common Azure services: Azure Blob Storage, Azure Data Lake Storage
✔ Examples: images, videos, PDFs, audio, free-form text
