This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
--> Ingest and transform batch data
--> Choose an appropriate data store
Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
One of the most important responsibilities of a Microsoft Fabric Data Engineer is selecting the appropriate data store for a given workload. The choice of data store directly affects performance, scalability, maintainability, security, data modeling approaches, and overall solution cost.
Microsoft Fabric provides multiple storage and analytics options, each optimized for specific use cases. The DP-700 exam expects candidates to understand when and why to use each storage option, especially in batch ingestion and transformation scenarios.
A common exam theme is evaluating business requirements and selecting the most appropriate storage technology among:
- OneLake
- Lakehouse
- Data Warehouse
- KQL Database
- Eventhouse
- Mirrored Databases
- Delta Tables
- Shortcuts
- External Storage Sources
Understanding the strengths and limitations of each option is critical for success on the exam.
Understanding Data Stores in Microsoft Fabric
A data store is a repository where data is persisted and made available for processing, analytics, reporting, or machine learning.
Selecting a data store depends on factors such as:
- Data structure
- Data volume
- Query patterns
- Processing requirements
- Latency requirements
- User skill sets
- Governance needs
A poor storage choice can result in:
- Slow query performance
- Increased complexity
- Higher costs
- Difficult maintenance
OneLake: The Foundation of Fabric
What is OneLake?
OneLake is the unified logical data lake for Microsoft Fabric.
It serves as:
- A single storage layer
- A centralized data repository
- A common storage foundation for all Fabric workloads
Key characteristics:
- Tenant-wide storage
- Based on Azure Data Lake Storage Gen2 architecture
- Supports open formats
- Eliminates data silos
Benefits
- Single copy of data
- Unified governance
- Centralized security
- Reduced duplication
Exam Tip
OneLake itself is not typically selected as the analytics store. Instead, workloads such as Lakehouses and Warehouses are built on top of OneLake.
Lakehouse
What is a Lakehouse?
A Lakehouse combines features of:
- Data lakes
- Data warehouses
Lakehouses support:
- Structured data
- Semi-structured data
- Unstructured data
while maintaining:
- Open file formats
- Delta Lake support
- Spark processing
When to Choose a Lakehouse
Choose a Lakehouse when:
- Data engineering workloads dominate
- Spark processing is required
- Data science workloads are expected
- Large volumes of raw data must be stored
- Data arrives in multiple formats
Examples:
- IoT data
- Clickstream data
- Log files
- Data science projects
Lakehouse Advantages
Open Format Storage
Uses Delta Parquet files.
Spark Integration
Optimized for notebooks and Spark jobs.
Data Science Friendly
Supports machine learning workflows.
Multi-format Data
Handles structured and unstructured data.
Lakehouse Limitations
- Less familiar to traditional SQL developers
- Some reporting scenarios may perform better in a warehouse
Data Warehouse
What is a Fabric Warehouse?
A Fabric Warehouse is a fully managed relational analytics platform optimized for SQL workloads.
Characteristics:
- Relational tables
- T-SQL support
- ACID transactions
- Optimized query engine
When to Choose a Warehouse
Choose a Warehouse when:
- Business users primarily use SQL
- Dimensional models are required
- Reporting workloads dominate
- Data is highly structured
Examples:
- Enterprise reporting
- Financial reporting
- Sales analytics
- Executive dashboards
Warehouse Advantages
Familiar SQL Experience
Supports T-SQL.
Strong BI Integration
Works seamlessly with reporting tools.
Relational Modeling
Supports star and snowflake schemas.
High Query Performance
Optimized for analytical queries.
Warehouse Limitations
- Less flexible for unstructured data
- Not ideal for large-scale machine learning workloads
Lakehouse vs Warehouse
| Feature | Lakehouse | Warehouse |
|---|---|---|
| Storage Format | Delta Files | Relational Tables |
| Primary Engine | Spark | SQL |
| Structured Data | Yes | Yes |
| Semi-Structured Data | Yes | Limited |
| Unstructured Data | Yes | No |
| Machine Learning | Excellent | Limited |
| BI Reporting | Good | Excellent |
| Data Science | Excellent | Limited |
Delta Tables
What Are Delta Tables?
Delta tables are transactional storage structures built on Delta Lake technology.
Benefits include:
- ACID compliance
- Schema enforcement
- Schema evolution
- Time travel
- Reliable batch processing
When to Use Delta Tables
Use Delta tables when:
- Reliable batch ingestion is required
- Incremental loads are needed
- Historical versions must be maintained
- Data quality controls are important
Exam Tip
Most Lakehouse production data should be stored as Delta tables.
KQL Database
What is a KQL Database?
KQL databases are optimized for:
- Log analytics
- Telemetry
- Time-series analysis
- Streaming workloads
They use:
- Kusto Query Language (KQL)
instead of SQL.
When to Choose KQL Databases
Choose KQL databases for:
- Application logs
- Monitoring data
- Telemetry
- Operational analytics
Examples:
- Server monitoring
- Manufacturing sensors
- Website activity tracking
Advantages
- Extremely fast ingestion
- High-performance time-series analysis
- Optimized aggregations
Limitations
- Not designed for dimensional modeling
- Less suitable for traditional reporting
Eventhouse
What is Eventhouse?
Eventhouse is a Real-Time Intelligence component designed for large-scale event analytics.
Eventhouse supports:
- Streaming ingestion
- KQL analytics
- High-volume telemetry
When to Use Eventhouse
Choose Eventhouse when:
- Massive event volumes exist
- Real-time analytics are required
- Streaming data dominates
Mirrored Databases
What Are Mirrored Databases?
Mirroring replicates operational database data into Fabric.
Supported sources include:
- Azure SQL Database
- Azure SQL Managed Instance
- SQL Server (supported scenarios)
- Other supported databases
When to Choose Mirroring
Choose mirroring when:
- Near real-time synchronization is required
- Minimal ETL is desired
- Operational systems must remain the source of truth
Advantages
- Reduced data movement
- Simplified architecture
- Near real-time availability
Shortcuts
What Are Shortcuts?
Shortcuts provide virtual access to data stored elsewhere.
The data remains in its original location.
Examples:
- Azure Data Lake Storage Gen2
- Amazon S3
- Other OneLake locations
When to Use Shortcuts
Choose shortcuts when:
- Data duplication must be avoided
- Multiple teams share datasets
- Existing storage investments should be preserved
Benefits
- No copying
- Reduced storage costs
- Single source of truth
Choosing Based on Workload Type
Data Engineering Workloads
Recommended Store:
✔ Lakehouse
Reason:
- Spark-native
- Flexible storage
- Delta support
Business Intelligence Workloads
Recommended Store:
✔ Warehouse
Reason:
- SQL optimized
- Dimensional modeling
- Reporting performance
Data Science Workloads
Recommended Store:
✔ Lakehouse
Reason:
- ML integration
- Notebook support
- Large-scale data processing
Streaming Analytics
Recommended Store:
✔ KQL Database
✔ Eventhouse
Reason:
- Low latency
- High ingestion rates
Operational Data Replication
Recommended Store:
✔ Mirrored Database
Reason:
- Simplified synchronization
Common DP-700 Scenario Questions
Scenario 1
A company wants to store structured and unstructured files while supporting Spark notebooks.
Best choice:
Lakehouse
Scenario 2
A finance department needs highly optimized SQL reporting.
Best choice:
Warehouse
Scenario 3
An operations team analyzes billions of telemetry records daily.
Best choice:
KQL Database or Eventhouse
Scenario 4
A company wants to use data already stored in ADLS Gen2 without copying it.
Best choice:
Shortcut
Scenario 5
A data science team requires Delta tables and machine learning workflows.
Best choice:
Lakehouse
Decision Framework
Ask the following questions:
Is the workload primarily SQL analytics?
→ Warehouse
Is Spark required?
→ Lakehouse
Is machine learning required?
→ Lakehouse
Is the data mostly telemetry or logs?
→ KQL Database
Is the data streaming continuously?
→ Eventhouse or KQL Database
Must external data remain in place?
→ Shortcut
Must operational databases replicate into Fabric?
→ Mirrored Database
Best Practices
Store Curated Data as Delta Tables
Provides reliability and transactional consistency.
Avoid Unnecessary Copies
Use shortcuts when appropriate.
Match Store to Workload
Do not force all workloads into one store.
Separate Raw and Curated Layers
Use Medallion Architecture:
- Bronze
- Silver
- Gold
Use Warehouses for Reporting
Leverage optimized SQL engines.
Use Lakehouses for Engineering
Take advantage of Spark and Delta Lake.
DP-700 Exam Focus Areas
You should understand:
✓ OneLake architecture
✓ Lakehouse capabilities
✓ Warehouse capabilities
✓ Delta tables
✓ KQL databases
✓ Eventhouse
✓ Mirrored databases
✓ Shortcuts
✓ SQL vs Spark workloads
✓ Reporting vs data science workloads
✓ Structured vs semi-structured data
✓ Batch vs streaming storage choices
✓ Common architecture decision scenarios
Practice Exam Questions
Question 1
A company needs to store structured, semi-structured, and unstructured data while enabling Spark notebook processing. Which data store should be selected?
A. KQL Database
B. Warehouse
C. Lakehouse
D. Eventhouse
Answer: C
Explanation
Lakehouses support multiple data formats and integrate natively with Spark, making them ideal for data engineering and data science workloads.
Question 2
Which Fabric data store is optimized for traditional SQL-based analytical reporting?
A. Eventhouse
B. Shortcut
C. Lakehouse
D. Warehouse
Answer: D
Explanation
Warehouses are designed for structured relational analytics and provide strong T-SQL support.
Question 3
A company collects billions of sensor readings each day and requires fast time-series analysis.
Which data store is most appropriate?
A. Warehouse
B. Shortcut
C. KQL Database
D. Mirrored Database
Answer: C
Explanation
KQL databases are optimized for telemetry, logs, and time-series analytics.
Question 4
You need to provide access to data stored in Azure Data Lake Storage Gen2 without copying the files into Fabric.
What should you use?
A. Warehouse
B. Mirrored Database
C. Delta Table
D. Shortcut
Answer: D
Explanation
Shortcuts provide virtual access to external data sources while avoiding duplication.
Question 5
Which feature is provided by Delta tables?
A. Automatic dashboard creation
B. ACID transaction support
C. Real-time alerting
D. Event routing
Answer: B
Explanation
Delta tables provide ACID transactions, schema enforcement, and reliable data processing.
Question 6
A business intelligence team primarily writes T-SQL queries and builds dimensional models.
Which storage option should they use?
A. Eventhouse
B. KQL Database
C. Warehouse
D. Lakehouse
Answer: C
Explanation
Warehouses are optimized for SQL analytics and dimensional modeling.
Question 7
A company wants near real-time replication of operational Azure SQL Database data into Fabric.
Which solution should be selected?
A. Mirrored Database
B. Eventhouse
C. Warehouse
D. Delta Table
Answer: A
Explanation
Mirrored databases synchronize source database changes into Fabric with minimal ETL effort.
Question 8
Which Fabric storage option is generally the best choice for machine learning workloads?
A. Warehouse
B. Lakehouse
C. KQL Database
D. Eventhouse
Answer: B
Explanation
Lakehouses support Spark, notebooks, Delta tables, and machine learning workflows.
Question 9
What is the primary purpose of OneLake?
A. Replacing all Lakehouses
B. Serving as Fabric’s unified storage foundation
C. Providing only streaming analytics
D. Managing deployment pipelines
Answer: B
Explanation
OneLake is the centralized storage layer that underpins all Fabric workloads.
Question 10
A solution requires high-volume real-time event analytics with streaming ingestion.
Which Fabric component is most appropriate?
A. Warehouse
B. Mirrored Database
C. Delta Table
D. Eventhouse
Answer: D
Explanation
Eventhouse is designed specifically for large-scale streaming analytics and real-time intelligence scenarios.
Exam Tip
For DP-700, many questions can be answered by remembering this mapping:
| Requirement | Best Choice |
|---|---|
| Spark, ML, flexible data formats | Lakehouse |
| SQL analytics and reporting | Warehouse |
| Telemetry and time-series data | KQL Database |
| Massive real-time event analytics | Eventhouse |
| External data without copying | Shortcut |
| Near real-time operational replication | Mirrored Database |
| Unified Fabric storage layer | OneLake |
A common exam pattern is to present multiple technically possible options. Focus on selecting the option that is optimized for the workload, not merely one that could work.
Go to the DP-700 Exam Prep Hub main page.
