This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Monitor and optimize an analytics solution (30–35%)
   --> Optimize performance
      --> Optimize a Lakehouse table

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Optimizing Lakehouse tables is a critical skill for the DP-700 certification exam and for real-world Microsoft Fabric data engineering solutions. As data volumes grow, poorly optimized Lakehouse tables can lead to slow query performance, increased compute consumption, longer notebook execution times, delayed report refreshes, and higher operational costs.

Microsoft Fabric Lakehouses use the Delta Lake format as their storage foundation. Delta Lake provides ACID transactions, schema enforcement, versioning, and performance optimization features that enable scalable analytics workloads. However, data engineers must actively manage and optimize Lakehouse tables to maintain high performance.

For the DP-700 exam, you should understand:

How Delta tables are stored
Causes of poor Lakehouse performance
File management and compaction
Table optimization techniques
Partitioning strategies
Data skipping
Z-Ordering concepts
VACUUM operations
Query optimization best practices
Monitoring table performance

Understanding Lakehouse Tables

A Lakehouse table in Microsoft Fabric is typically stored as a Delta table within OneLake.

A Delta table consists of:

Data files (typically Parquet)
Delta transaction logs
Metadata
Version history

This architecture provides:

ACID transactions
Time travel
Reliable updates and deletes
Scalable analytics

Although Delta Lake automatically handles many storage operations, performance can degrade over time if tables are not maintained properly.

Why Lakehouse Tables Require Optimization

Over time, data ingestion processes create:

Large numbers of files
Small files
Fragmented storage
Uneven data distribution

Common symptoms include:

Slow SQL queries
Long Spark job runtimes
Delayed report refreshes
Increased resource consumption
Poor filtering performance

Optimization activities help maintain efficient storage and query execution.

The Small File Problem

One of the most common performance issues is excessive small files.

Consider a streaming ingestion process that writes:

Thousands of files per hour
Each file only a few kilobytes

Eventually, the table may contain millions of small files.

Why Small Files Hurt Performance

Every query must:

Read file metadata
Open file handles
Scan numerous files

The overhead often becomes greater than the actual data processing.

Example:

Scenario	File Count
Optimized table	100 files
Fragmented table	50,000 files

The optimized table will generally perform significantly better.

Table Compaction

Compaction combines many small files into fewer larger files.

Benefits include:

Faster query execution
Reduced metadata overhead
Improved scan efficiency
Better Spark performance

Compaction is one of the most important optimization tasks for Delta tables.

Example

Before compaction:

10,000 files × 5 MB

After compaction:

100 files × 500 MB

The total data size remains similar, but query performance often improves substantially.

Using OPTIMIZE

The OPTIMIZE command is commonly used to compact Delta files.

Example:

OPTIMIZE Sales

The command:

Consolidates small files
Improves storage efficiency
Enhances query performance

For the DP-700 exam, understand that OPTIMIZE primarily addresses file fragmentation and small file issues.

Data Skipping

Delta Lake stores statistics about data files.

These statistics help Fabric eliminate unnecessary file scans.

This capability is known as data skipping.

Example:

A query requests:

WHERE OrderDate >= '2026-01-01'

If a file only contains data from 2024, Fabric can skip reading that file entirely.

Benefits include:

Reduced I/O
Faster query performance
Lower compute consumption

Z-Ordering

Z-Ordering improves data locality by physically organizing related values together.

This is particularly useful when queries repeatedly filter on specific columns.

Example:

			
OPTIMIZE Sales
ZORDER BY (CustomerID)

Benefits:

Better file pruning
Faster filtering
Improved query performance

Good Candidates for Z-Ordering

Columns frequently used in:

WHERE clauses
JOIN operations
Report filters
Dashboard slicers

Examples:

CustomerID
ProductID
OrderDate
Region

Poor Candidates

Columns with:

Extremely high cardinality and random access patterns
Rarely used filters
Constantly changing query patterns

Partitioning Strategies

Partitioning physically separates data into directories.

Example:

			
Sales
 ├── Year=2024
 ├── Year=2025
 └── Year=2026

Queries targeting a specific year can read only the relevant partition.

Benefits:

Reduced data scanning
Faster query execution
Improved scalability

Choosing Partition Columns

Good partition columns typically:

Appear frequently in filters
Have moderate cardinality
Create balanced partitions

Examples:

Year
Month
Region
BusinessUnit

Over-Partitioning Risks

Too many partitions can create performance problems.

Poor example:

Partition by CustomerID

If there are millions of customers:

Millions of folders
Small files
Metadata overhead

This often performs worse than a non-partitioned table.

Rule of Thumb

Partition only when:

Query patterns justify it
Data volumes are large
Cardinality is manageable

VACUUM Operations

Delta tables retain historical files to support:

Transactions
Rollbacks
Time travel

Over time, these files consume storage.

VACUUM removes obsolete files.

Example:

VACUUM Sales

Benefits:

Reduces storage consumption
Removes unneeded files
Improves storage efficiency

Important Exam Point

VACUUM does not improve query performance directly.

Its primary purpose is storage cleanup.

Optimizing Data Types

Using appropriate data types improves efficiency.

Examples:

Better Choice	Avoid
INT	STRING for numeric values
DATE	STRING dates
SMALLINT	Oversized numeric types

Benefits:

Smaller storage footprint
Faster filtering
Improved joins
Better compression

Query Optimization Techniques

Sometimes the table is not the problem—the query is.

Use Predicate Filtering

Good:

			
SELECT *
FROM Sales
WHERE Year = 2026

Avoid:

			
SELECT *
FROM Sales

Filtering reduces scanned data.

Select Required Columns

Good:

SELECT CustomerID, SalesAmount

Avoid:

SELECT *

Reading fewer columns improves performance.

Reduce Unnecessary Joins

Complex joins increase execution time.

Use only required tables and columns.

Monitoring Lakehouse Performance

Several Fabric tools help identify optimization opportunities.

SQL Query Monitoring

Review:

Query duration
Resource usage
Execution plans

Notebook Monitoring

Identify:

Long-running Spark jobs
Excessive shuffles
Skewed workloads

Capacity Metrics

Monitor:

CPU utilization
Memory usage
Workload concurrency

Workspace Monitoring

Look for:

Refresh delays
Pipeline bottlenecks
Query slowdowns

Common Optimization Workflow

A typical optimization process might include:

Step 1

Identify slow queries.

Step 2

Determine whether excessive file counts exist.

Step 3

Run OPTIMIZE.

Step 4

Evaluate partitioning strategy.

Step 5

Consider Z-Ordering.

Step 6

Review query design.

Step 7

Run VACUUM when appropriate.

Best Practices

Use OPTIMIZE Regularly

Especially after:

Large batch loads
Frequent incremental loads
Streaming ingestion

Avoid Excessive Small Files

Batch writes when possible.

Partition Carefully

Avoid high-cardinality partition columns.

Use Z-Ordering Selectively

Apply to heavily filtered columns.

Monitor Query Performance

Optimization should be driven by workload patterns.

Schedule Maintenance

Automate optimization processes where possible.

DP-700 Exam Tips

Remember these key points:

Delta tables are the foundation of Fabric Lakehouse storage.
Small files are a major cause of poor performance.
OPTIMIZE primarily addresses file compaction.
Z-Ordering improves filtering performance.
Partitioning reduces scanned data but must be used carefully.
Over-partitioning can degrade performance.
VACUUM removes obsolete files and reduces storage consumption.
Data skipping helps eliminate unnecessary file reads.
Query optimization and table optimization work together.
Monitoring tools help identify performance bottlenecks.

Practice Exam Questions

Question 1

A Lakehouse table contains hundreds of thousands of very small Delta files after months of incremental loads. Which action should you take first?

A. Run OPTIMIZE on the table
B. Run VACUUM on the table
C. Create a new semantic model
D. Increase workspace permissions

Correct Answer: A

Explanation:
OPTIMIZE compacts small files into larger files, reducing metadata overhead and improving query performance. VACUUM removes obsolete files but does not address file fragmentation.

Question 2

What is the primary purpose of the VACUUM command?

A. Improve filtering performance
B. Create partitions automatically
C. Remove obsolete files no longer needed by Delta Lake
D. Rebuild semantic models

Correct Answer: C

Explanation:
VACUUM removes old files that are no longer required for Delta transaction history and time travel, helping reduce storage consumption.

Question 3

Which column is generally the best candidate for partitioning a large sales table?

A. OrderID with millions of unique values
B. TransactionGUID with millions of unique values
C. ProductDescription
D. SalesYear

Correct Answer: D

Explanation:
SalesYear is commonly used in filtering and has manageable cardinality, making it an effective partition column.

Question 4

What problem does data skipping help solve?

A. Excessive security permissions
B. Reading files that cannot possibly contain matching data
C. Semantic model refresh failures
D. Notebook authentication errors

Correct Answer: B

Explanation:
Data skipping uses file statistics to eliminate unnecessary file reads during query execution.

Question 5

A table is frequently filtered using CustomerID. Which optimization technique is most likely to improve performance?

A. Z-Ordering on CustomerID
B. Deleting transaction logs
C. Removing partitions entirely
D. Disabling Delta Lake features

Correct Answer: A

Explanation:
Z-Ordering organizes data based on frequently filtered columns, improving file pruning and query performance.

Question 6

What is a common risk of over-partitioning?

A. Increased data skipping efficiency
B. Reduced storage consumption
C. Excessive numbers of small partitions and files
D. Automatic query acceleration

Correct Answer: C

Explanation:
Over-partitioning can create many small directories and files, leading to metadata overhead and degraded performance.

Question 7

Which query pattern is generally most efficient?

A. SELECT * FROM Sales
B. SELECT CustomerID, SalesAmount FROM Sales WHERE Year = 2026
C. SELECT * FROM Sales CROSS JOIN Products
D. SELECT DISTINCT * FROM Sales

Correct Answer: B

Explanation:
Filtering rows and selecting only required columns minimizes data scanning and improves query efficiency.

Question 8

Which statement about OPTIMIZE is correct?

A. It removes all Delta transaction logs
B. It creates semantic model aggregations
C. It converts Delta tables to Parquet-only tables
D. It compacts many small files into fewer larger files

Correct Answer: D

Explanation:
OPTIMIZE primarily improves performance through file compaction and reduction of small-file fragmentation.

Question 9

A data engineer partitions a table by CustomerID containing 20 million unique customers. What is the most likely result?

A. Improved performance in all scenarios
B. Automatic Z-Ordering
C. Poor performance due to excessive partition cardinality
D. Elimination of Delta logs

Correct Answer: C

Explanation:
Partitioning by extremely high-cardinality columns creates excessive partitions and often harms performance.

Question 10

Which statement best describes Z-Ordering?

A. It removes deleted records permanently
B. It physically organizes related values together to improve query filtering
C. It automatically creates partitions for every column
D. It converts Delta tables into warehouse tables

Correct Answer: B

Explanation:
Z-Ordering improves data locality, helping Fabric skip more files and accelerate queries that filter on selected columns.

Go to the DP-700 Exam Prep Hub main page.

Introduction

Understanding Lakehouse Tables

Why Lakehouse Tables Require Optimization

The Small File Problem

Why Small Files Hurt Performance

Table Compaction

Example

Using OPTIMIZE

Data Skipping

Z-Ordering

Good Candidates for Z-Ordering

Poor Candidates

Partitioning Strategies

Choosing Partition Columns

Over-Partitioning Risks

Rule of Thumb

VACUUM Operations

Important Exam Point

Optimizing Data Types

Query Optimization Techniques

Use Predicate Filtering

Select Required Columns

Reduce Unnecessary Joins

Monitoring Lakehouse Performance

SQL Query Monitoring

Notebook Monitoring

Capacity Metrics

Workspace Monitoring

Common Optimization Workflow

Step 1

Step 2

Step 3

Step 4

Step 5

Step 6

Step 7

Best Practices

Use OPTIMIZE Regularly

Avoid Excessive Small Files

Partition Carefully

Use Z-Ordering Selectively

Monitor Query Performance

Schedule Maintenance

DP-700 Exam Tips

Practice Exam Questions

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

Question 7

Question 8

Question 9

Question 10

Share this:

Related

Leave a comment Cancel reply

Information and resources for the data professionals' community