This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub.
This topic falls under these sections:
Describe an analytics workload (25–30%)
--> Describe common elements of large-scale analytics
--> Describe options for analytical data stores
Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.
Analytical data stores are designed to support reporting, business intelligence, and large-scale data analysis. For the DP-900 exam, you should understand the different types of analytical stores, their characteristics, and when to use each.
What Is an Analytical Data Store?
An analytical data store is optimized for:
- Querying large volumes of data
- Aggregations and reporting
- Historical analysis
✔ Unlike transactional systems, analytical stores focus on read-heavy workloads rather than frequent updates.
Key Characteristics
- Optimized for complex queries and aggregations
- Stores historical data
- Handles large datasets (TBs to PBs)
- Typically uses denormalized schemas
- Designed for high-performance reads
Main Types of Analytical Data Stores
1. Data Warehouse
Definition
A structured repository designed for relational analytical queries.
Key Features
- Uses structured data
- Schema-based (often star or snowflake schema)
- Supports SQL queries
Azure Example
Azure Synapse Analytics
Use Cases
- Business intelligence reporting
- Financial analysis
- Enterprise dashboards
✔ Best for: Structured data and SQL-based analytics
2. Data Lake
Definition
A storage repository for raw data in its native format.
Key Features
- Supports structured, semi-structured, and unstructured data
- Schema-on-read (schema applied when querying)
- Highly scalable and cost-effective
Azure Example
Azure Data Lake Storage
Use Cases
- Big data analytics
- Machine learning
- Storing raw ingestion data
✔ Best for: Flexible, large-scale data storage
3. Data Lakehouse (Conceptual)
Definition
A hybrid approach combining features of data lakes and data warehouses.
Key Features
- Stores raw data like a data lake
- Supports structured queries like a warehouse
- Often uses open formats (e.g., Parquet, Delta)
Azure Context
- Often implemented using:
- Azure Data Lake Storage
- Azure Synapse Analytics
✔ Best for: Unified analytics platform
4. Analytical Databases / Big Data Processing Systems
Definition
Systems designed for distributed processing of large datasets.
Azure Example
Azure Synapse Analytics
Key Features
- Parallel processing
- Handles massive datasets
- Supports batch and interactive queries
✔ Best for: Large-scale analytics workloads
Comparison of Analytical Data Stores
| Feature | Data Warehouse | Data Lake | Lakehouse |
|---|---|---|---|
| Data Type | Structured | All types | All types |
| Schema | Schema-on-write | Schema-on-read | Hybrid |
| Cost | Higher | Lower | Moderate |
| Flexibility | Low | High | High |
| Query Performance | High | Variable | High |
Key Design Considerations
1. Data Structure
- Structured → Data warehouse
- Mixed or raw → Data lake
2. Query Requirements
- Complex SQL queries → Data warehouse
- Exploratory analytics → Data lake
3. Cost
- Data lakes are generally more cost-effective
- Warehouses provide optimized performance at higher cost
4. Scalability
- All Azure analytical stores scale
- Data lakes excel in massive data storage
5. Performance Needs
- Warehouses → optimized for speed
- Lakes → optimized for storage and flexibility
Typical Analytics Architecture
- Data Ingestion
- Batch or streaming
- Storage
- Data lake or data warehouse
- Processing
- Transformations and aggregations
- Visualization
- BI tools (e.g., Power BI)
Why This Matters for DP-900
On the exam, you may be asked to:
- Identify the correct analytical store for a scenario
- Compare data lakes vs data warehouses
- Understand schema-on-read vs schema-on-write
- Recognize Azure services used for analytics
Summary — Exam-Relevant Takeaways
✔ Analytical data stores are used for:
- Reporting
- Analytics
- Historical data analysis
✔ Main types:
- Data Warehouse → structured, high-performance queries
- Data Lake → raw, flexible storage
- Lakehouse → hybrid approach
✔ Key concepts:
- Schema-on-write (warehouse)
- Schema-on-read (lake)
✔ Azure services to know:
- Azure Synapse Analytics → data warehouse & analytics
- Azure Data Lake Storage → scalable data lake
✔ Exam tip:
👉 Structured + SQL analytics → Data Warehouse
👉 Raw + flexible + big data → Data Lake
Go to the Practice Exam Questions for this topic.
Go to the DP-900 Exam Prep Hub main page.
