Prepare data for loading into a dimensional model (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Design and implement loading patterns
      --> Prepare data for loading into a dimensional model


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the primary goals of data engineering is to transform raw operational data into a structure that supports efficient reporting, analytics, and business intelligence. In Microsoft Fabric, this often involves preparing data for loading into a dimensional model.

Dimensional modeling is a foundational concept in data warehousing and analytics. It organizes data into fact tables and dimension tables, enabling fast query performance, simplified reporting, and intuitive business analysis.

For the DP-700 exam, you should understand:

  • Dimensional modeling concepts
  • Fact and dimension tables
  • Star and snowflake schemas
  • Data preparation requirements
  • Surrogate keys
  • Slowly Changing Dimensions (SCDs)
  • Data cleansing and conformance
  • Loading sequence considerations
  • Fabric implementation patterns using Lakehouses, Warehouses, Notebooks, Dataflows Gen2, and Pipelines

Many DP-700 scenario questions focus on selecting the correct transformations and loading patterns to support dimensional models.


What Is a Dimensional Model?

A dimensional model organizes data into structures optimized for analytics rather than transaction processing.

The model consists primarily of:

  • Fact tables
  • Dimension tables

Example:

                Product Dimension
                        |
Customer Dimension --- Fact Sales --- Date Dimension
                        |
                 Store Dimension


This structure allows users to analyze business measures from multiple perspectives.


Why Use a Dimensional Model?

Dimensional models provide:

Improved Query Performance

Analytics queries often execute faster than on normalized transactional systems.

Easier Reporting

Business users can understand facts and dimensions more easily than complex normalized schemas.

Better Scalability

Supports large-scale reporting and analytical workloads.

Simplified BI Development

Tools such as Power BI work exceptionally well with dimensional models.


Fact Tables

Fact tables contain measurable business events.

Examples:

Fact TableBusiness Event
FactSalesSales transactions
FactOrdersCustomer orders
FactInventoryInventory balances
FactClaimsInsurance claims

Fact tables typically contain:

  • Numeric measures
  • Foreign keys to dimensions

Example:

SalesKeyDateKeyProductKeyCustomerKeySalesAmount
100120260101201501250.00

Dimension Tables

Dimension tables provide descriptive attributes used for filtering and grouping.

Examples:

DimensionExample Attributes
CustomerName, Region, Age
ProductCategory, Brand
DateYear, Month, Quarter
StoreLocation, Territory

Example:

CustomerKeyCustomerNameRegion
501Smith CorpEast

Understanding the Star Schema

The most common dimensional design is the star schema.

         Product
            |
Customer -- Fact Sales -- Date
            |
          Store


Characteristics:

  • Central fact table
  • Multiple dimensions
  • Simple joins
  • Excellent reporting performance

For DP-700, the star schema is typically the preferred analytical design.


Understanding the Snowflake Schema

A snowflake schema normalizes dimension tables.

Example:

Product
|
Category
|
Department

Advantages:

  • Reduced redundancy

Disadvantages:

  • More joins
  • Increased complexity

Most Fabric analytics workloads favor star schemas over snowflake schemas.


Data Preparation Before Loading

Raw source data rarely fits directly into a dimensional model.

Preparation typically includes:

  • Data cleansing
  • Standardization
  • Deduplication
  • Business rule application
  • Surrogate key generation
  • Data quality validation

Data Cleansing

Before loading dimensions and facts, incorrect data must be corrected.

Examples:

Inconsistent Values

FL
Florida
Fla

Standardized to:

Florida

Invalid Dates

01/45/2026

Must be corrected or rejected.


Handling Missing Values

Example:

CustomerIDEmail
101NULL

Possible approaches:

  • Default values
  • Unknown members
  • Data quality workflows

A common dimensional modeling practice is using “Unknown” dimension records.


Deduplication

Source systems often contain duplicate records.

Example:

CustomerIDName
100Smith
100Smith

Duplicates should be removed before loading.


Conformed Dimensions

A conformed dimension is shared across multiple fact tables.

Example:

Fact Sales
|
Customer Dimension
|
Fact Orders

Benefits:

  • Consistent reporting
  • Unified business definitions
  • Simplified analytics

DP-700 questions often reference conformed dimensions.


Surrogate Keys

Dimension tables typically use surrogate keys instead of business keys.

Example:

Source System:

CustomerID
CUST100

Dimension:

CustomerKeyCustomerID
501CUST100

Why Use Surrogate Keys?

Advantages include:

Independence from Source Systems

Source keys can change.

Improved Performance

Integer keys are more efficient than text values.

Support for Slowly Changing Dimensions

Surrogate keys help track historical changes.


Dimension Loading Sequence

Dimension tables are usually loaded before fact tables.

Why?

Fact tables require dimension keys.

Typical workflow:

Load Dimensions
Generate Surrogate Keys
Load Fact Tables

Key Lookup Process

During fact loading:

  1. Source business key identified
  2. Matching dimension record located
  3. Surrogate key retrieved
  4. Fact record loaded

Example:

CustomerID = CUST100
CustomerKey = 501
FactSales loaded

Slowly Changing Dimensions (SCD)

Dimensions often change over time.

Example:

Customer moves from:

Florida

to

Texas

The organization must decide how historical records should be handled.


SCD Type 1

Type 1 overwrites existing values.

Example:

Before:

CustomerState
SmithFlorida

After:

CustomerState
SmithTexas

History is lost.


SCD Type 2

Type 2 preserves history.

Example:

CustomerStateCurrent
SmithFloridaNo
SmithTexasYes

Benefits:

  • Historical reporting
  • Auditability
  • Trend analysis

Type 2 SCD is heavily tested in data engineering certifications.


Date Dimensions

Date dimensions are one of the most important dimensions.

Typical attributes:

DateKeyYearQuarterMonth
202601012026Q1January

Benefits:

  • Consistent date calculations
  • Faster reporting
  • Simplified filtering

Fact Table Preparation

Before loading facts:

Validate Measures

Example:

SalesAmount >= 0

Verify Foreign Keys

Ensure referenced dimensions exist.

Remove Invalid Records

Reject records with missing required fields.

Apply Business Rules

Example:

NetSales =
SalesAmount - DiscountAmount

Fact Table Granularity

Granularity defines the level of detail stored.

Examples:

Transaction-Level

One row per sale.

Daily Summary

One row per day.

Monthly Summary

One row per month.

The chosen grain should be clearly defined before loading.


Preparing Data in Microsoft Fabric

Several Fabric tools support dimensional modeling.


Dataflows Gen2

Useful for:

  • Data cleansing
  • Standardization
  • Deduplication
  • Business rule transformations

Best for low-code scenarios.


Notebooks

Useful for:

  • Complex transformations
  • Spark processing
  • SCD implementation
  • Large-scale dimension preparation

Common languages:

  • PySpark
  • Spark SQL

Data Pipelines

Useful for:

  • Orchestration
  • Scheduling
  • Parameterized execution
  • End-to-end ETL workflows

Fabric Warehouses

Useful for:

  • Dimensional storage
  • SQL-based transformations
  • Star schema implementation

Typical Fabric Dimensional Loading Pattern

Source Systems
Bronze Layer
(Raw Data)
Silver Layer
(Cleansed Data)
Dimension Processing
Fact Processing
Gold Layer
(Analytics Model)

This Medallion Architecture pattern is commonly used in Fabric implementations.


Common DP-700 Exam Scenarios

Scenario 1

A reporting system requires historical customer address tracking.

Best solution:

SCD Type 2


Scenario 2

A sales fact table references customer and product dimensions.

Best practice:

Load dimensions before facts.


Scenario 3

A product code changes in the source system.

Best solution:

Use surrogate keys.


Scenario 4

Multiple fact tables require consistent customer reporting.

Best solution:

Conformed dimensions.


Best Practices

Define Granularity Early

Determine the fact table grain before development.


Use Surrogate Keys

Avoid using business keys directly in fact tables.


Load Dimensions First

Fact loads depend on dimension keys.


Implement Data Quality Checks

Prevent invalid data from entering the warehouse.


Use Conformed Dimensions

Promote consistency across analytical models.


Preserve History When Needed

Use SCD Type 2 for historical reporting requirements.


DP-700 Exam Focus Areas

You should understand:

✓ Fact tables

✓ Dimension tables

✓ Star schemas

✓ Snowflake schemas

✓ Conformed dimensions

✓ Surrogate keys

✓ Business keys

✓ Fact table granularity

✓ Dimension loading strategies

✓ Fact loading strategies

✓ SCD Type 1

✓ SCD Type 2

✓ Data cleansing

✓ Data standardization

✓ Medallion Architecture support for dimensional models


Practice Exam Questions

Question 1

Which type of table stores measurable business events in a dimensional model?

A. Staging table

B. Dimension table

C. Lookup table

D. Fact table

Answer: D

Explanation

Fact tables store measurable events such as sales, orders, inventory quantities, and revenue.


Question 2

A data engineer needs to store customer attributes such as customer name, city, and region.

Which table type should be used?

A. Fact table

B. Bridge table

C. Aggregate table

D. Dimension table

Answer: D

Explanation

Dimension tables contain descriptive attributes used for filtering, grouping, and reporting.


Question 3

What is the primary advantage of a star schema?

A. More normalization

B. Simplified queries and better reporting performance

C. Reduced storage requirements

D. Elimination of dimensions

Answer: B

Explanation

Star schemas reduce join complexity and are optimized for analytical workloads.


Question 4

Which key type is typically used as the primary key in a dimension table?

A. Natural key

B. Foreign key

C. Composite key

D. Surrogate key

Answer: D

Explanation

Surrogate keys are system-generated identifiers that improve performance and support Slowly Changing Dimensions.


Question 5

A customer changes states from Florida to Texas, and historical reporting must be preserved.

Which Slowly Changing Dimension type should be used?

A. Type 0

B. Type 1

C. Type 2

D. Type 3

Answer: C

Explanation

Type 2 creates a new dimension record and preserves historical values.


Question 6

What should generally be loaded first during dimensional processing?

A. Dimension tables

B. Aggregate tables

C. Materialized views

D. Fact tables

Answer: A

Explanation

Fact tables require dimension keys, so dimensions are loaded first.


Question 7

Which activity is most commonly performed during data preparation for dimensional modeling?

A. Encrypting storage accounts

B. Data cleansing and standardization

C. Creating dashboards

D. Configuring network firewalls

Answer: B

Explanation

Data cleansing and standardization improve data quality before loading into dimensions and facts.


Question 8

A company wants multiple fact tables to use the same customer dimension.

What type of dimension should be implemented?

A. Slowly Changing Dimension

B. Role-playing Dimension

C. Junk Dimension

D. Conformed Dimension

Answer: D

Explanation

Conformed dimensions provide consistent business definitions across multiple fact tables.


Question 9

What is the primary purpose of a Date dimension?

A. Store transaction details

B. Manage security permissions

C. Provide standardized calendar attributes for reporting

D. Store surrogate key mappings

Answer: C

Explanation

Date dimensions simplify filtering, aggregation, and time-based reporting.


Question 10

A data engineer must ensure that every sales transaction is stored individually.

What fact table grain should be selected?

A. Monthly summary

B. Quarterly summary

C. Daily summary

D. Transaction-level detail

Answer: D

Explanation

Transaction-level grain stores one row per business event, providing the highest level of detail and analytical flexibility.


Exam Tip

For DP-700, remember this fundamental sequence:

Cleanse Data
Build Dimensions
Generate Surrogate Keys
Load Fact Tables
Publish Analytics Model

When an exam question discusses historical tracking, think SCD Type 2. When it discusses reporting performance and simplicity, think Star Schema. When it discusses multiple fact tables sharing the same business entity, think Conformed Dimension. These concepts appear frequently in real-world Fabric data warehouse implementations and certification exam scenarios.


Go to the DP-700 Exam Prep Hub main page.

Leave a comment