This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform batch data
      --> Choose an appropriate data store

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important responsibilities of a Microsoft Fabric Data Engineer is selecting the appropriate data store for a given workload. The choice of data store directly affects performance, scalability, maintainability, security, data modeling approaches, and overall solution cost.

Microsoft Fabric provides multiple storage and analytics options, each optimized for specific use cases. The DP-700 exam expects candidates to understand when and why to use each storage option, especially in batch ingestion and transformation scenarios.

A common exam theme is evaluating business requirements and selecting the most appropriate storage technology among:

OneLake
Lakehouse
Data Warehouse
KQL Database
Eventhouse
Mirrored Databases
Delta Tables
Shortcuts
External Storage Sources

Understanding the strengths and limitations of each option is critical for success on the exam.

Understanding Data Stores in Microsoft Fabric

A data store is a repository where data is persisted and made available for processing, analytics, reporting, or machine learning.

Selecting a data store depends on factors such as:

Data structure
Data volume
Query patterns
Processing requirements
Latency requirements
User skill sets
Governance needs

A poor storage choice can result in:

Slow query performance
Increased complexity
Higher costs
Difficult maintenance

OneLake: The Foundation of Fabric

What is OneLake?

OneLake is the unified logical data lake for Microsoft Fabric.

It serves as:

A single storage layer
A centralized data repository
A common storage foundation for all Fabric workloads

Key characteristics:

Tenant-wide storage
Based on Azure Data Lake Storage Gen2 architecture
Supports open formats
Eliminates data silos

Benefits

Single copy of data
Unified governance
Centralized security
Reduced duplication

Exam Tip

OneLake itself is not typically selected as the analytics store. Instead, workloads such as Lakehouses and Warehouses are built on top of OneLake.

Lakehouse

What is a Lakehouse?

A Lakehouse combines features of:

Data lakes
Data warehouses

Lakehouses support:

Structured data
Semi-structured data
Unstructured data

while maintaining:

Open file formats
Delta Lake support
Spark processing

When to Choose a Lakehouse

Choose a Lakehouse when:

Data engineering workloads dominate
Spark processing is required
Data science workloads are expected
Large volumes of raw data must be stored
Data arrives in multiple formats

Examples:

IoT data
Clickstream data
Log files
Data science projects

Lakehouse Advantages

Open Format Storage

Uses Delta Parquet files.

Spark Integration

Optimized for notebooks and Spark jobs.

Data Science Friendly

Supports machine learning workflows.

Multi-format Data

Handles structured and unstructured data.

Lakehouse Limitations

Less familiar to traditional SQL developers
Some reporting scenarios may perform better in a warehouse

Data Warehouse

What is a Fabric Warehouse?

A Fabric Warehouse is a fully managed relational analytics platform optimized for SQL workloads.

Characteristics:

Relational tables
T-SQL support
ACID transactions
Optimized query engine

When to Choose a Warehouse

Choose a Warehouse when:

Business users primarily use SQL
Dimensional models are required
Reporting workloads dominate
Data is highly structured

Examples:

Enterprise reporting
Financial reporting
Sales analytics
Executive dashboards

Warehouse Advantages

Familiar SQL Experience

Supports T-SQL.

Strong BI Integration

Works seamlessly with reporting tools.

Relational Modeling

Supports star and snowflake schemas.

High Query Performance

Optimized for analytical queries.

Warehouse Limitations

Less flexible for unstructured data
Not ideal for large-scale machine learning workloads

Lakehouse vs Warehouse

Feature	Lakehouse	Warehouse
Storage Format	Delta Files	Relational Tables
Primary Engine	Spark	SQL
Structured Data	Yes	Yes
Semi-Structured Data	Yes	Limited
Unstructured Data	Yes	No
Machine Learning	Excellent	Limited
BI Reporting	Good	Excellent
Data Science	Excellent	Limited

Delta Tables

What Are Delta Tables?

Delta tables are transactional storage structures built on Delta Lake technology.

Benefits include:

ACID compliance
Schema enforcement
Schema evolution
Time travel
Reliable batch processing

When to Use Delta Tables

Use Delta tables when:

Reliable batch ingestion is required
Incremental loads are needed
Historical versions must be maintained
Data quality controls are important

Exam Tip

Most Lakehouse production data should be stored as Delta tables.

KQL Database

What is a KQL Database?

KQL databases are optimized for:

Log analytics
Telemetry
Time-series analysis
Streaming workloads

They use:

Kusto Query Language (KQL)

instead of SQL.

When to Choose KQL Databases

Choose KQL databases for:

Application logs
Monitoring data
Telemetry
Operational analytics

Examples:

Server monitoring
Manufacturing sensors
Website activity tracking

Advantages

Extremely fast ingestion
High-performance time-series analysis
Optimized aggregations

Limitations

Not designed for dimensional modeling
Less suitable for traditional reporting

Eventhouse

What is Eventhouse?

Eventhouse is a Real-Time Intelligence component designed for large-scale event analytics.

Eventhouse supports:

Streaming ingestion
KQL analytics
High-volume telemetry

When to Use Eventhouse

Choose Eventhouse when:

Massive event volumes exist
Real-time analytics are required
Streaming data dominates

Mirrored Databases

What Are Mirrored Databases?

Mirroring replicates operational database data into Fabric.

Supported sources include:

Azure SQL Database
Azure SQL Managed Instance
SQL Server (supported scenarios)
Other supported databases

When to Choose Mirroring

Choose mirroring when:

Near real-time synchronization is required
Minimal ETL is desired
Operational systems must remain the source of truth

Advantages

Reduced data movement
Simplified architecture
Near real-time availability

Shortcuts

What Are Shortcuts?

Shortcuts provide virtual access to data stored elsewhere.

The data remains in its original location.

Examples:

Azure Data Lake Storage Gen2
Amazon S3
Other OneLake locations

When to Use Shortcuts

Choose shortcuts when:

Data duplication must be avoided
Multiple teams share datasets
Existing storage investments should be preserved

Benefits

No copying
Reduced storage costs
Single source of truth

Choosing Based on Workload Type

Data Engineering Workloads

Recommended Store:

✔ Lakehouse

Reason:

Spark-native
Flexible storage
Delta support

Business Intelligence Workloads

Recommended Store:

✔ Warehouse

Reason:

SQL optimized
Dimensional modeling
Reporting performance

Data Science Workloads

Recommended Store:

✔ Lakehouse

Reason:

ML integration
Notebook support
Large-scale data processing

Streaming Analytics

Recommended Store:

✔ KQL Database
✔ Eventhouse

Reason:

Low latency
High ingestion rates

Operational Data Replication

Recommended Store:

✔ Mirrored Database

Reason:

Simplified synchronization

Common DP-700 Scenario Questions

Scenario 1

A company wants to store structured and unstructured files while supporting Spark notebooks.

Best choice:

Lakehouse

Scenario 2

A finance department needs highly optimized SQL reporting.

Best choice:

Warehouse

Scenario 3

An operations team analyzes billions of telemetry records daily.

Best choice:

KQL Database or Eventhouse

Scenario 4

A company wants to use data already stored in ADLS Gen2 without copying it.

Best choice:

Shortcut

Scenario 5

A data science team requires Delta tables and machine learning workflows.

Best choice:

Lakehouse

Decision Framework

Ask the following questions:

Is the workload primarily SQL analytics?

→ Warehouse

Is Spark required?

→ Lakehouse

Is machine learning required?

→ Lakehouse

Is the data mostly telemetry or logs?

→ KQL Database

Is the data streaming continuously?

→ Eventhouse or KQL Database

Must external data remain in place?

→ Shortcut

Must operational databases replicate into Fabric?

→ Mirrored Database

Best Practices

Store Curated Data as Delta Tables

Provides reliability and transactional consistency.

Avoid Unnecessary Copies

Use shortcuts when appropriate.

Match Store to Workload

Do not force all workloads into one store.

Separate Raw and Curated Layers

Use Medallion Architecture:

Bronze
Silver
Gold

Use Warehouses for Reporting

Leverage optimized SQL engines.

Use Lakehouses for Engineering

Take advantage of Spark and Delta Lake.

DP-700 Exam Focus Areas

You should understand:

✓ OneLake architecture

✓ Lakehouse capabilities

✓ Warehouse capabilities

✓ Delta tables

✓ KQL databases

✓ Eventhouse

✓ Mirrored databases

✓ Shortcuts

✓ SQL vs Spark workloads

✓ Reporting vs data science workloads

✓ Structured vs semi-structured data

✓ Batch vs streaming storage choices

✓ Common architecture decision scenarios

Practice Exam Questions

Question 1

A company needs to store structured, semi-structured, and unstructured data while enabling Spark notebook processing. Which data store should be selected?

A. KQL Database

B. Warehouse

C. Lakehouse

D. Eventhouse

Answer: C

Explanation

Lakehouses support multiple data formats and integrate natively with Spark, making them ideal for data engineering and data science workloads.

Question 2

Which Fabric data store is optimized for traditional SQL-based analytical reporting?

A. Eventhouse

B. Shortcut

C. Lakehouse

D. Warehouse

Answer: D

Explanation

Warehouses are designed for structured relational analytics and provide strong T-SQL support.

Question 3

A company collects billions of sensor readings each day and requires fast time-series analysis.

Which data store is most appropriate?

A. Warehouse

B. Shortcut

C. KQL Database

D. Mirrored Database

Answer: C

Explanation

KQL databases are optimized for telemetry, logs, and time-series analytics.

Question 4

You need to provide access to data stored in Azure Data Lake Storage Gen2 without copying the files into Fabric.

What should you use?

A. Warehouse

B. Mirrored Database

C. Delta Table

D. Shortcut

Answer: D

Explanation

Shortcuts provide virtual access to external data sources while avoiding duplication.

Question 5

Which feature is provided by Delta tables?

A. Automatic dashboard creation

B. ACID transaction support

C. Real-time alerting

D. Event routing

Answer: B

Explanation

Delta tables provide ACID transactions, schema enforcement, and reliable data processing.

Question 6

A business intelligence team primarily writes T-SQL queries and builds dimensional models.

Which storage option should they use?

A. Eventhouse

B. KQL Database

C. Warehouse

D. Lakehouse

Answer: C

Explanation

Warehouses are optimized for SQL analytics and dimensional modeling.

Question 7

A company wants near real-time replication of operational Azure SQL Database data into Fabric.

Which solution should be selected?

A. Mirrored Database

B. Eventhouse

C. Warehouse

D. Delta Table

Answer: A

Explanation

Mirrored databases synchronize source database changes into Fabric with minimal ETL effort.

Question 8

Which Fabric storage option is generally the best choice for machine learning workloads?

A. Warehouse

B. Lakehouse

C. KQL Database

D. Eventhouse

Answer: B

Explanation

Lakehouses support Spark, notebooks, Delta tables, and machine learning workflows.

Question 9

What is the primary purpose of OneLake?

A. Replacing all Lakehouses

B. Serving as Fabric’s unified storage foundation

C. Providing only streaming analytics

D. Managing deployment pipelines

Answer: B

Explanation

OneLake is the centralized storage layer that underpins all Fabric workloads.

Question 10

A solution requires high-volume real-time event analytics with streaming ingestion.

Which Fabric component is most appropriate?

A. Warehouse

B. Mirrored Database

C. Delta Table

D. Eventhouse

Answer: D

Explanation

Eventhouse is designed specifically for large-scale streaming analytics and real-time intelligence scenarios.

Exam Tip

For DP-700, many questions can be answered by remembering this mapping:

Requirement	Best Choice
Spark, ML, flexible data formats	Lakehouse
SQL analytics and reporting	Warehouse
Telemetry and time-series data	KQL Database
Massive real-time event analytics	Eventhouse
External data without copying	Shortcut
Near real-time operational replication	Mirrored Database
Unified Fabric storage layer	OneLake

A common exam pattern is to present multiple technically possible options. Focus on selecting the option that is optimized for the workload, not merely one that could work.

Go to the DP-700 Exam Prep Hub main page.