Choose an appropriate data store (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform batch data
      --> Choose an appropriate data store


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important responsibilities of a Microsoft Fabric Data Engineer is selecting the appropriate data store for a given workload. The choice of data store directly affects performance, scalability, maintainability, security, data modeling approaches, and overall solution cost.

Microsoft Fabric provides multiple storage and analytics options, each optimized for specific use cases. The DP-700 exam expects candidates to understand when and why to use each storage option, especially in batch ingestion and transformation scenarios.

A common exam theme is evaluating business requirements and selecting the most appropriate storage technology among:

  • OneLake
  • Lakehouse
  • Data Warehouse
  • KQL Database
  • Eventhouse
  • Mirrored Databases
  • Delta Tables
  • Shortcuts
  • External Storage Sources

Understanding the strengths and limitations of each option is critical for success on the exam.


Understanding Data Stores in Microsoft Fabric

A data store is a repository where data is persisted and made available for processing, analytics, reporting, or machine learning.

Selecting a data store depends on factors such as:

  • Data structure
  • Data volume
  • Query patterns
  • Processing requirements
  • Latency requirements
  • User skill sets
  • Governance needs

A poor storage choice can result in:

  • Slow query performance
  • Increased complexity
  • Higher costs
  • Difficult maintenance

OneLake: The Foundation of Fabric

What is OneLake?

OneLake is the unified logical data lake for Microsoft Fabric.

It serves as:

  • A single storage layer
  • A centralized data repository
  • A common storage foundation for all Fabric workloads

Key characteristics:

  • Tenant-wide storage
  • Based on Azure Data Lake Storage Gen2 architecture
  • Supports open formats
  • Eliminates data silos

Benefits

  • Single copy of data
  • Unified governance
  • Centralized security
  • Reduced duplication

Exam Tip

OneLake itself is not typically selected as the analytics store. Instead, workloads such as Lakehouses and Warehouses are built on top of OneLake.


Lakehouse

What is a Lakehouse?

A Lakehouse combines features of:

  • Data lakes
  • Data warehouses

Lakehouses support:

  • Structured data
  • Semi-structured data
  • Unstructured data

while maintaining:

  • Open file formats
  • Delta Lake support
  • Spark processing

When to Choose a Lakehouse

Choose a Lakehouse when:

  • Data engineering workloads dominate
  • Spark processing is required
  • Data science workloads are expected
  • Large volumes of raw data must be stored
  • Data arrives in multiple formats

Examples:

  • IoT data
  • Clickstream data
  • Log files
  • Data science projects

Lakehouse Advantages

Open Format Storage

Uses Delta Parquet files.

Spark Integration

Optimized for notebooks and Spark jobs.

Data Science Friendly

Supports machine learning workflows.

Multi-format Data

Handles structured and unstructured data.


Lakehouse Limitations

  • Less familiar to traditional SQL developers
  • Some reporting scenarios may perform better in a warehouse

Data Warehouse

What is a Fabric Warehouse?

A Fabric Warehouse is a fully managed relational analytics platform optimized for SQL workloads.

Characteristics:

  • Relational tables
  • T-SQL support
  • ACID transactions
  • Optimized query engine

When to Choose a Warehouse

Choose a Warehouse when:

  • Business users primarily use SQL
  • Dimensional models are required
  • Reporting workloads dominate
  • Data is highly structured

Examples:

  • Enterprise reporting
  • Financial reporting
  • Sales analytics
  • Executive dashboards

Warehouse Advantages

Familiar SQL Experience

Supports T-SQL.

Strong BI Integration

Works seamlessly with reporting tools.

Relational Modeling

Supports star and snowflake schemas.

High Query Performance

Optimized for analytical queries.


Warehouse Limitations

  • Less flexible for unstructured data
  • Not ideal for large-scale machine learning workloads

Lakehouse vs Warehouse

FeatureLakehouseWarehouse
Storage FormatDelta FilesRelational Tables
Primary EngineSparkSQL
Structured DataYesYes
Semi-Structured DataYesLimited
Unstructured DataYesNo
Machine LearningExcellentLimited
BI ReportingGoodExcellent
Data ScienceExcellentLimited

Delta Tables

What Are Delta Tables?

Delta tables are transactional storage structures built on Delta Lake technology.

Benefits include:

  • ACID compliance
  • Schema enforcement
  • Schema evolution
  • Time travel
  • Reliable batch processing

When to Use Delta Tables

Use Delta tables when:

  • Reliable batch ingestion is required
  • Incremental loads are needed
  • Historical versions must be maintained
  • Data quality controls are important

Exam Tip

Most Lakehouse production data should be stored as Delta tables.


KQL Database

What is a KQL Database?

KQL databases are optimized for:

  • Log analytics
  • Telemetry
  • Time-series analysis
  • Streaming workloads

They use:

  • Kusto Query Language (KQL)

instead of SQL.


When to Choose KQL Databases

Choose KQL databases for:

  • Application logs
  • Monitoring data
  • Telemetry
  • Operational analytics

Examples:

  • Server monitoring
  • Manufacturing sensors
  • Website activity tracking

Advantages

  • Extremely fast ingestion
  • High-performance time-series analysis
  • Optimized aggregations

Limitations

  • Not designed for dimensional modeling
  • Less suitable for traditional reporting

Eventhouse

What is Eventhouse?

Eventhouse is a Real-Time Intelligence component designed for large-scale event analytics.

Eventhouse supports:

  • Streaming ingestion
  • KQL analytics
  • High-volume telemetry

When to Use Eventhouse

Choose Eventhouse when:

  • Massive event volumes exist
  • Real-time analytics are required
  • Streaming data dominates

Mirrored Databases

What Are Mirrored Databases?

Mirroring replicates operational database data into Fabric.

Supported sources include:

  • Azure SQL Database
  • Azure SQL Managed Instance
  • SQL Server (supported scenarios)
  • Other supported databases

When to Choose Mirroring

Choose mirroring when:

  • Near real-time synchronization is required
  • Minimal ETL is desired
  • Operational systems must remain the source of truth

Advantages

  • Reduced data movement
  • Simplified architecture
  • Near real-time availability

Shortcuts

What Are Shortcuts?

Shortcuts provide virtual access to data stored elsewhere.

The data remains in its original location.

Examples:

  • Azure Data Lake Storage Gen2
  • Amazon S3
  • Other OneLake locations

When to Use Shortcuts

Choose shortcuts when:

  • Data duplication must be avoided
  • Multiple teams share datasets
  • Existing storage investments should be preserved

Benefits

  • No copying
  • Reduced storage costs
  • Single source of truth

Choosing Based on Workload Type

Data Engineering Workloads

Recommended Store:

✔ Lakehouse

Reason:

  • Spark-native
  • Flexible storage
  • Delta support

Business Intelligence Workloads

Recommended Store:

✔ Warehouse

Reason:

  • SQL optimized
  • Dimensional modeling
  • Reporting performance

Data Science Workloads

Recommended Store:

✔ Lakehouse

Reason:

  • ML integration
  • Notebook support
  • Large-scale data processing

Streaming Analytics

Recommended Store:

✔ KQL Database
✔ Eventhouse

Reason:

  • Low latency
  • High ingestion rates

Operational Data Replication

Recommended Store:

✔ Mirrored Database

Reason:

  • Simplified synchronization

Common DP-700 Scenario Questions

Scenario 1

A company wants to store structured and unstructured files while supporting Spark notebooks.

Best choice:

Lakehouse


Scenario 2

A finance department needs highly optimized SQL reporting.

Best choice:

Warehouse


Scenario 3

An operations team analyzes billions of telemetry records daily.

Best choice:

KQL Database or Eventhouse


Scenario 4

A company wants to use data already stored in ADLS Gen2 without copying it.

Best choice:

Shortcut


Scenario 5

A data science team requires Delta tables and machine learning workflows.

Best choice:

Lakehouse


Decision Framework

Ask the following questions:

Is the workload primarily SQL analytics?

→ Warehouse

Is Spark required?

→ Lakehouse

Is machine learning required?

→ Lakehouse

Is the data mostly telemetry or logs?

→ KQL Database

Is the data streaming continuously?

→ Eventhouse or KQL Database

Must external data remain in place?

→ Shortcut

Must operational databases replicate into Fabric?

→ Mirrored Database


Best Practices

Store Curated Data as Delta Tables

Provides reliability and transactional consistency.


Avoid Unnecessary Copies

Use shortcuts when appropriate.


Match Store to Workload

Do not force all workloads into one store.


Separate Raw and Curated Layers

Use Medallion Architecture:

  • Bronze
  • Silver
  • Gold

Use Warehouses for Reporting

Leverage optimized SQL engines.


Use Lakehouses for Engineering

Take advantage of Spark and Delta Lake.


DP-700 Exam Focus Areas

You should understand:

✓ OneLake architecture

✓ Lakehouse capabilities

✓ Warehouse capabilities

✓ Delta tables

✓ KQL databases

✓ Eventhouse

✓ Mirrored databases

✓ Shortcuts

✓ SQL vs Spark workloads

✓ Reporting vs data science workloads

✓ Structured vs semi-structured data

✓ Batch vs streaming storage choices

✓ Common architecture decision scenarios


Practice Exam Questions

Question 1

A company needs to store structured, semi-structured, and unstructured data while enabling Spark notebook processing. Which data store should be selected?

A. KQL Database

B. Warehouse

C. Lakehouse

D. Eventhouse

Answer: C

Explanation

Lakehouses support multiple data formats and integrate natively with Spark, making them ideal for data engineering and data science workloads.


Question 2

Which Fabric data store is optimized for traditional SQL-based analytical reporting?

A. Eventhouse

B. Shortcut

C. Lakehouse

D. Warehouse

Answer: D

Explanation

Warehouses are designed for structured relational analytics and provide strong T-SQL support.


Question 3

A company collects billions of sensor readings each day and requires fast time-series analysis.

Which data store is most appropriate?

A. Warehouse

B. Shortcut

C. KQL Database

D. Mirrored Database

Answer: C

Explanation

KQL databases are optimized for telemetry, logs, and time-series analytics.


Question 4

You need to provide access to data stored in Azure Data Lake Storage Gen2 without copying the files into Fabric.

What should you use?

A. Warehouse

B. Mirrored Database

C. Delta Table

D. Shortcut

Answer: D

Explanation

Shortcuts provide virtual access to external data sources while avoiding duplication.


Question 5

Which feature is provided by Delta tables?

A. Automatic dashboard creation

B. ACID transaction support

C. Real-time alerting

D. Event routing

Answer: B

Explanation

Delta tables provide ACID transactions, schema enforcement, and reliable data processing.


Question 6

A business intelligence team primarily writes T-SQL queries and builds dimensional models.

Which storage option should they use?

A. Eventhouse

B. KQL Database

C. Warehouse

D. Lakehouse

Answer: C

Explanation

Warehouses are optimized for SQL analytics and dimensional modeling.


Question 7

A company wants near real-time replication of operational Azure SQL Database data into Fabric.

Which solution should be selected?

A. Mirrored Database

B. Eventhouse

C. Warehouse

D. Delta Table

Answer: A

Explanation

Mirrored databases synchronize source database changes into Fabric with minimal ETL effort.


Question 8

Which Fabric storage option is generally the best choice for machine learning workloads?

A. Warehouse

B. Lakehouse

C. KQL Database

D. Eventhouse

Answer: B

Explanation

Lakehouses support Spark, notebooks, Delta tables, and machine learning workflows.


Question 9

What is the primary purpose of OneLake?

A. Replacing all Lakehouses

B. Serving as Fabric’s unified storage foundation

C. Providing only streaming analytics

D. Managing deployment pipelines

Answer: B

Explanation

OneLake is the centralized storage layer that underpins all Fabric workloads.


Question 10

A solution requires high-volume real-time event analytics with streaming ingestion.

Which Fabric component is most appropriate?

A. Warehouse

B. Mirrored Database

C. Delta Table

D. Eventhouse

Answer: D

Explanation

Eventhouse is designed specifically for large-scale streaming analytics and real-time intelligence scenarios.


Exam Tip

For DP-700, many questions can be answered by remembering this mapping:

RequirementBest Choice
Spark, ML, flexible data formatsLakehouse
SQL analytics and reportingWarehouse
Telemetry and time-series dataKQL Database
Massive real-time event analyticsEventhouse
External data without copyingShortcut
Near real-time operational replicationMirrored Database
Unified Fabric storage layerOneLake

A common exam pattern is to present multiple technically possible options. Focus on selecting the option that is optimized for the workload, not merely one that could work.


Go to the DP-700 Exam Prep Hub main page.

Leave a comment