Category: Power Query

Implement Relationships, Such as Bridge Tables and Many-to-Many Relationships

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Design and build semantic models
--> Implement Relationships, Such as Bridge Tables and Many-to-Many Relationships

Why Relationships Matter in Semantic Models

In Microsoft Fabric and Power BI semantic models, relationships define how tables interact and how filters propagate across data. Well-designed relationships are critical for:

  • Accurate aggregations
  • Predictable filtering behavior
  • Correct DAX calculations
  • Optimal query performance

While one-to-many relationships are preferred, real-world data often requires handling many-to-many relationships using techniques such as bridge tables.


Common Relationship Types in Semantic Models

1. One-to-Many (Preferred)

  • One dimension row relates to many fact rows
  • Most common and performant relationship
  • Typical in star schemas

Example:

  • DimCustomer → FactSales

2. Many-to-Many

  • Multiple rows in one table relate to multiple rows in another
  • More complex filtering behavior
  • Can negatively impact performance if not modeled correctly

Example:

  • Customers associated with multiple regions
  • Products assigned to multiple categories

Understanding Many-to-Many Relationships

Native Many-to-Many Relationships

Power BI supports direct many-to-many relationships, but these should be used carefully.

Characteristics:

  • Cardinality: Many-to-many
  • Filters propagate ambiguously
  • DAX becomes harder to reason about

Exam Tip:
Direct many-to-many relationships are supported but not always recommended for complex models.


Bridge Tables (Best Practice)

A bridge table (also called a factless fact table) resolves many-to-many relationships by introducing an intermediate table.

What Is a Bridge Table?

A table that:

  • Contains keys from two related entities
  • Has no numeric measures
  • Enables controlled filtering paths

Example Scenario

Business case:
Products can belong to multiple categories.

Tables:

  • DimProduct (ProductID, Name)
  • DimCategory (CategoryID, CategoryName)
  • BridgeProductCategory (ProductID, CategoryID)

Relationships:

  • DimProduct → BridgeProductCategory (one-to-many)
  • DimCategory → BridgeProductCategory (one-to-many)

This converts a many-to-many relationship into two one-to-many relationships.
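
To make this concrete, here is a minimal DAX sketch of measures that work through the bridge. It assumes a FactSales table related to DimProduct on ProductID (FactSales and its SalesAmount column are not defined in this post); the second measure uses the common pattern of passing the bridge table to CALCULATE so that a category selection reaches the fact table even with single-direction relationships.

Products per Category =
DISTINCTCOUNT ( BridgeProductCategory[ProductID] )

Category Sales =
CALCULATE (
    SUM ( FactSales[SalesAmount] ),   -- FactSales is an assumed fact table
    BridgeProductCategory             -- applies the visible bridge rows as a filter
)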


Benefits of Using Bridge Tables

Benefit | Description
Predictable filtering | Clear filter paths
Better DAX control | Easier to write and debug measures
Improved performance | Avoids ambiguous joins
Scalability | Handles complex relationships cleanly

Filter Direction Considerations

Single vs Bidirectional Filters

  • Single direction (recommended):
    Filters flow from dimension → bridge → fact
  • Bidirectional:
    Can simplify some scenarios but increases ambiguity

Exam Guidance:

  • Use single-direction filters by default
  • Enable bidirectional filtering only when required and understood (CROSSFILTER can enable it per measure, as sketched below)
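
Instead of setting the relationship itself to bidirectional, cross-filtering can be enabled only inside the measures that need it. A minimal sketch, reusing the bridge tables from the earlier example and an assumed FactSales table:

Sales for Selected Categories =
CALCULATE (
    SUM ( FactSales[SalesAmount] ),   -- FactSales is an assumed fact table
    CROSSFILTER ( BridgeProductCategory[ProductID], DimProduct[ProductID], BOTH )
)

This keeps the model relationship single-direction while still letting a DimCategory selection flow through the bridge to DimProduct for this one calculation.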

Many-to-Many and DAX Implications

When working with many-to-many relationships:

  • Measures may return unexpected results
  • DISTINCTCOUNT is commonly required
  • Explicit filtering using DAX functions may be necessary

Common DAX patterns (a short sketch follows this list):

  • CALCULATE
  • TREATAS
  • CROSSFILTER (advanced)
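
As a rough sketch of the TREATAS pattern, the following measure takes the product keys currently visible in the bridge table and applies them as a filter on an assumed FactSales table, which is the typical approach when no physical relationship carries the filter:

Bridged Sales =
CALCULATE (
    SUM ( FactSales[SalesAmount] ),   -- FactSales is an assumed fact table
    TREATAS ( VALUES ( BridgeProductCategory[ProductID] ), FactSales[ProductID] )
)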

Relationship Best Practices for DP-600

  • Favor star schemas with one-to-many relationships
  • Use bridge tables instead of direct many-to-many when possible
  • Avoid unnecessary bidirectional filters
  • Validate relationship cardinality and direction
  • Test measures under different filtering scenarios

Common Exam Scenarios

You may see questions like:

  • “How do you model a relationship where products belong to multiple categories?”
  • “What is the purpose of a bridge table?”
  • “What are the risks of many-to-many relationships?”

Correct answers typically emphasize:

  • Bridge tables
  • Controlled filter propagation
  • Avoiding ambiguous relationships

Star Schema vs Many-to-Many Models

Feature | Star Schema | Many-to-Many
Complexity | Low | Higher
Performance | Better | Lower
DAX simplicity | High | Lower
Use cases | Most analytics | Specialized scenarios

Summary

Implementing relationships correctly is foundational to building reliable semantic models in Microsoft Fabric:

  • One-to-many relationships are preferred
  • Many-to-many relationships should be handled carefully
  • Bridge tables provide a scalable, exam-recommended solution
  • Clear relationships lead to accurate analytics and simpler DAX

Exam Tip

If a question involves multiple entities relating to each other, or many-to-many relationships, the most likely answer usually includes using a “bridge table”.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for keywords in exam questions and understand the usage scenario they signal to guide your answer
  • Expect scenario-based questions rather than direct definitions

1. Which relationship type is generally preferred in Power BI semantic models?

A. Many-to-many
B. One-to-one
C. One-to-many
D. Bidirectional many-to-many

Correct Answer: C

Explanation:
One-to-many relationships provide predictable filter propagation, better performance, and simpler DAX calculations.


2. What is the primary purpose of a bridge table?

A. Store aggregated metrics
B. Normalize dimension attributes
C. Resolve many-to-many relationships
D. Improve data refresh performance

Correct Answer: C

Explanation:
Bridge tables convert many-to-many relationships into two one-to-many relationships, improving model clarity and control.


3. Which characteristic best describes a bridge table?

A. Contains numeric measures
B. Stores transactional data
C. Contains keys from related tables only
D. Is always filtered bidirectionally

Correct Answer: C

Explanation:
Bridge tables typically contain only keys (foreign keys) and no measures, enabling relationship resolution.


4. What is a common risk of using native many-to-many relationships directly?

A. They cannot be refreshed
B. They cause data duplication
C. They create ambiguous filter propagation
D. They are unsupported in Fabric

Correct Answer: C

Explanation:
Native many-to-many relationships can result in ambiguous filtering and unpredictable aggregation results.


5. In a bridge table scenario, how are relationships typically defined?

A. Many-to-many on both sides
B. One-to-one from both dimensions
C. One-to-many from each dimension to the bridge
D. Bidirectional many-to-one

Correct Answer: C

Explanation:
Each dimension connects to the bridge table using a one-to-many relationship.


6. When should bidirectional filtering be enabled?

A. Always, for simplicity
B. Only when necessary and well-understood
C. Only on fact tables
D. Never in semantic models

Correct Answer: B

Explanation:
Bidirectional filters can be useful but introduce complexity and ambiguity if misused.


7. Which scenario is best handled using a bridge table?

A. A customer has one address
B. A sale belongs to one product
C. A product belongs to multiple categories
D. A date table relates to a fact table

Correct Answer: C

Explanation:
Products belonging to multiple categories is a classic many-to-many scenario requiring a bridge table.


8. How does a properly designed bridge table affect DAX measures?

A. Makes measures harder to write
B. Requires custom SQL logic
C. Enables predictable filter behavior
D. Eliminates the need for CALCULATE

Correct Answer: C

Explanation:
Bridge tables create clear filter paths, making DAX behavior more predictable and reliable.


9. Which DAX function is commonly used to handle complex many-to-many filtering scenarios?

A. SUMX
B. RELATED
C. TREATAS
D. LOOKUPVALUE

Correct Answer: C

Explanation:
TREATAS is often used to apply filters across tables that are not directly related.


10. For DP-600 exam questions involving many-to-many relationships, which solution is typically preferred?

A. Direct many-to-many relationships
B. Denormalized fact tables
C. Bridge tables with one-to-many relationships
D. Duplicate dimension tables

Correct Answer: C

Explanation:
The exam emphasizes scalable, maintainable modeling practices — bridge tables are the recommended solution.


Implement a Star Schema for a Semantic Model

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models
--> Design and build semantic models
--> Implement a Star Schema for a Semantic Model

What Is a Star Schema?

A star schema is a logical data modeling pattern optimized for analytics and reporting. It organizes data into:

  • Fact tables: Contain numeric measurements (metrics) of business processes
  • Dimension tables: Contain descriptive attributes used for slicing, grouping, and filtering

The schema resembles a star: a central fact table with multiple dimensions radiating outward.


Why Use a Star Schema for Semantic Models?

Star schemas are widely used in Power BI semantic models (Tabular models) because they:

  • Improve query performance: Simplified joins and clear relationships enable efficient engine processing
  • Simplify reporting: Easy for report authors to understand and navigate
  • Support fast aggregations: Summary measures are computed more efficiently
  • Integrate with DAX naturally: Reduces complexity of measures

In DP-600 scenarios where performance and reusability matter, star schemas are often the best design choice.


Semantic Models and Star Schema

Semantic models define business logic that sits on top of data. Star schemas support semantic models by:

  • Providing clean dimensional context (e.g., Product, Region, Time)
  • Ensuring facts are centrally located for aggregations
  • Reducing the number of relationships and cycles
  • Enabling measures to be defined once and reused across visuals

In Power BI, semantic models typically consume star schema tables through Import, Direct Lake, or DirectQuery storage modes.


Elements of a Star Schema

Fact Tables

A fact table stores measurable, numeric data about business events.

Examples:

  • Sales
  • Orders
  • Transactions
  • Inventory movements

Characteristics:

  • Contains foreign keys referring to dimensions
  • Contains numeric measures (e.g., quantity, revenue)

Dimension Tables

Dimension tables store contextual attributes that describe facts.

Examples:

  • Customer (name, segment, region)
  • Product (category, brand)
  • Date (calendar attributes)
  • Store or location

Characteristics:

  • Typically smaller than fact tables
  • Used to filter and group measures

Building a Star Schema for a Semantic Model

1. Identify the Grain of the Fact Table

The grain defines the level of detail in the fact table — for example:

  • One row per sales transaction per customer per day

Understand the grain before building dimensions.


2. Design Dimension Tables

Dimensions should be:

  • Descriptive
  • De-duplicated
  • Hierarchical where relevant (e.g., Country > State > City)

Example:

DimProduct | DimCustomer | DimDate
ProductID | CustomerID | DateKey
Name | Name | Year
Category | Segment | Quarter
Brand | Region | Month

3. Define Relationships

Semantic models should have clear relationships:

  • Fact → Dimension: one-to-many
  • No ambiguous cycles
  • Avoid overly complex circular relationships

In a star schema:

  • Fact table joins to each dimension
  • Dimensions do not join to each other directly

4. Import into Semantic Model

In Power BI Desktop or Fabric:

  • Load fact and dimension tables
  • Validate relationships
  • Ensure correct cardinality
  • Mark the Date dimension as a Date table if appropriate (a minimal date table sketch follows this list)
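
If the model has no physical date dimension, one option is a calculated DAX date table (keeping in mind that calculated tables require Import mode). A minimal sketch, assuming the range should cover an OrderDate column in FactSales (both names are illustrative):

DimDate =
ADDCOLUMNS (
    CALENDAR ( MIN ( FactSales[OrderDate] ), MAX ( FactSales[OrderDate] ) ),  -- date range taken from the assumed fact table
    "Year", YEAR ( [Date] ),
    "Quarter", "Q" & QUARTER ( [Date] ),
    "Month", FORMAT ( [Date], "MMM" )
)

After creating it, mark the table as a Date table using the Date column so time intelligence behaves as expected.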

Benefits in Semantic Modeling

Benefit | Description
Performance | Simplified relationships yield faster queries
Usability | Model is intuitive for report authors
Maintenance | Easier to document and manage
DAX simplicity | Measures use clear filter paths

DAX and Star Schema

Star schemas make DAX measures more predictable:

Example measure:

Total Sales = SUM(FactSales[SalesAmount])

With a proper star schema:

  • Filtering by dimension (e.g., DimCustomer[Region] = "West") automatically propagates to the fact table (see the sketch below)
  • DAX measure logic is clean and consistent
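
For example, building on the Total Sales measure above, a region-specific measure stays short because the star schema handles the filter propagation:

West Sales =
CALCULATE ( [Total Sales], DimCustomer[Region] = "West" )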

Star Schema vs Snowflake Schema

Feature | Star Schema | Snowflake Schema
Complexity | Simple | More complex
Query performance | Typically better | Slightly slower
Modeling effort | Lower | Higher
Normalization | Low | High

For analytical workloads (like in Fabric and Power BI), star schemas are generally preferred.


When to Apply a Star Schema

Use star schema design when:

  • You are building semantic models for BI/reporting
  • Data is sourced from multiple systems
  • You need to support slicing and dicing by multiple dimensions
  • Performance and maintainability are priorities

Semantic models built on star schemas work well with:

  • Import mode
  • Direct Lake with dimensional context
  • Composite models

Common Exam Scenarios

You might encounter questions like:

  • “Which table should be the fact in this model?”
  • “Why should dimensions be separated from fact tables?”
  • “How does a star schema improve performance in a semantic model?”

Key answers will focus on:

  • Simplified relationships
  • Better DAX performance
  • Intuitive filtering and slicing

Best Practices for Semantic Star Schemas

  • Explicitly define date tables and mark them as such
  • Avoid many-to-many relationships where possible
  • Keep dimensions denormalized (flattened)
  • Ensure fact tables have surrogate keys linking to dimensions
  • Validate cardinality and relationship directions

Exam Tip

If a question emphasizes performance, simplicity, clear filtering behavior, and ease of reporting, a star schema is likely the correct design choice / optimal answer.


Summary

Implementing a star schema for a semantic model is a proven best practice in analytics:

  • Central fact table
  • Descriptive dimensions
  • One-to-many relationships
  • Optimized for DAX and interactive reporting

This approach supports Fabric’s goal of providing fast, flexible, and scalable analytics.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for keywords in exam questions and understand the usage scenario they signal to guide your answer
  • Expect scenario-based questions rather than direct definitions

1. What is the primary purpose of a star schema in a semantic model?

A. To normalize data to reduce storage
B. To optimize transactional workloads
C. To simplify analytics and improve query performance
D. To enforce row-level security

Correct Answer: C

Explanation:
Star schemas are designed specifically for analytics. They simplify relationships and improve query performance by organizing data into fact and dimension tables.


2. In a star schema, what type of data is typically stored in a fact table?

A. Descriptive attributes such as names and categories
B. Hierarchical lookup values
C. Numeric measures related to business processes
D. User-defined calculated columns

Correct Answer: C

Explanation:
Fact tables store measurable, numeric values such as revenue, quantity, or counts, which are analyzed across dimensions.


3. Which relationship type is most common between fact and dimension tables in a star schema?

A. One-to-one
B. One-to-many
C. Many-to-many
D. Bidirectional many-to-many

Correct Answer: B

Explanation:
Each dimension record (e.g., a customer) can relate to many fact records (e.g., multiple sales), making one-to-many relationships standard.


4. Why are star schemas preferred over snowflake schemas in Power BI semantic models?

A. Snowflake schemas require more storage
B. Star schemas improve DAX performance and model usability
C. Snowflake schemas are not supported in Fabric
D. Star schemas eliminate the need for relationships

Correct Answer: B

Explanation:
Star schemas reduce relationship complexity, making DAX calculations simpler and improving query performance.


5. Which table should typically contain a DateKey column in a star schema?

A. Dimension tables only
B. Fact tables only
C. Both fact and dimension tables
D. Neither table type

Correct Answer: C

Explanation:
The fact table uses DateKey as a foreign key, while the Date dimension uses it as a primary key.


6. What is the “grain” of a fact table?

A. The number of rows in the table
B. The level of detail represented by each row
C. The number of dimensions connected
D. The data type of numeric columns

Correct Answer: B

Explanation:
Grain defines what a single row represents (e.g., one sale per customer per day).


7. Which modeling practice helps ensure optimal performance in a semantic model?

A. Creating relationships between dimension tables
B. Using many-to-many relationships by default
C. Keeping dimensions denormalized
D. Storing text attributes in the fact table

Correct Answer: C

Explanation:
Denormalized (flattened) dimension tables reduce joins and improve query performance in analytic models.


8. What happens when a dimension is used to filter a report in a properly designed star schema?

A. The filter applies only to the dimension table
B. The filter automatically propagates to the fact table
C. The filter is ignored by measures
D. The filter causes a many-to-many relationship

Correct Answer: B

Explanation:
Filters flow from dimension tables to the fact table through one-to-many relationships.


9. Which scenario is best suited for a star schema in a semantic model?

A. Real-time transactional processing
B. Log ingestion with high write frequency
C. Interactive reporting with slicing and aggregation
D. Application-level CRUD operations

Correct Answer: C

Explanation:
Star schemas are optimized for analytical queries involving aggregation, filtering, and slicing.


10. What is a common modeling mistake when implementing a star schema?

A. Using surrogate keys
B. Creating direct relationships between dimension tables
C. Marking a date table as a date table
D. Defining one-to-many relationships

Correct Answer: B

Explanation:
Dimensions should not typically relate to each other directly in a star schema, as this introduces unnecessary complexity.


Choose a storage mode – additional information

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models
--> Design and build semantic models
--> Choose a storage mode

This is supplemental information to what is included in the "Choose a storage mode" post.

DP-600 Cheat Sheet: Choosing a Storage Mode in Microsoft Fabric

Storage Mode Decision Matrix

Requirement / Scenario | Import | DirectQuery | Direct Lake | Composite
Best query performance | ✅ Excellent | ❌ Depends on source | ✅ Excellent | ✅ Very good
Near real-time data | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes
Large datasets (TB-scale) | ❌ Limited | ✅ Yes | ✅ Yes | ✅ Yes
Minimal refresh overhead | ❌ Requires refresh | ✅ No refresh | ✅ No refresh | ⚠ Partial
Uses OneLake Delta tables | ❌ Not required | ❌ Not required | ✅ Required | ✅ Optional
Full DAX & modeling features | ✅ Full support | ⚠ Limited | ⚠ Limited | ✅ Full
Calculated tables supported | ✅ Yes | ❌ No | ❌ No | ✅ Yes (Import tables only)
Lowest data duplication | ❌ High | ✅ None | ✅ None | ⚠ Mixed
Simple to manage | ✅ Yes | ⚠ Depends on source | ⚠ Fabric-specific | ❌ More complex

When to Choose Each Storage Mode

✅ Import Mode — Choose when:

  • Dataset fits comfortably in memory
  • You need complex DAX, calculated tables, or calculated columns
  • Performance is the top priority
  • Data freshness can be managed via scheduled refresh

Exam clue words: fastest, complex calculations, small to medium data


✅ DirectQuery — Choose when:

  • Data must always be current
  • Source system is highly optimized (SQL, Synapse, etc.)
  • Data volume is very large
  • You want zero data duplication

Exam clue words: real-time, source system, no refresh


✅ Direct Lake — Choose when:

  • Data is stored as Delta tables in OneLake
  • Dataset is large and frequently updated
  • You want Import-like performance without refresh
  • You’re working fully within Fabric

Exam clue words: OneLake, Delta, no refresh, Fabric-optimized


✅ Composite Model — Choose when:

  • You need flexibility across different tables
  • Fact tables are large and live (Direct Lake / DirectQuery)
  • Dimension tables are small and stable (Import)
  • You want performance and modeling flexibility

Exam clue words: hybrid, mix storage modes, dimension vs fact


Fast Exam Inclusion/Elimination Tips

  • Calculated tables required? → Import or Composite
  • OneLake + Delta tables? → Direct Lake
  • Real-time + external source? → DirectQuery
  • Best balance of flexibility and scale? → Composite

One-Sentence Exam Rule

If it’s in OneLake and too big to refresh, Direct Lake is usually the right answer.

Choose a Storage Mode

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Design and build semantic models
--> Choose a storage mode

What Is Storage Mode?

In Microsoft Fabric, storage mode determines how a semantic model accesses and processes data. It affects performance, freshness, compute behavior, and model capabilities. Choosing the right storage mode is critical when designing semantic models for analytics and reporting.

A semantic model (Power BI dataset) can use different storage modes for its tables — and when multiple modes coexist, the model is called a composite model.


Common Storage Modes

There are three primary storage modes you should know for the exam:

1. Import Mode

  • Stores data inside the semantic model in memory (VertiPaq) after a refresh.
  • Offers fast query performance since data is cached locally.
  • Requires scheduled or manual refresh to update data from the source.
  • Supports the full range of modeling features (e.g., calculated tables, complex DAX).

When to use Import Mode:

  • Data fits in memory and doesn’t need real-time freshness.
  • You need complex calculations or modeling features requiring data in memory.
  • You want high performance for interactive analytics.

Pros:

  • Very fast interactive queries
  • Full DAX and modeling capabilities

Cons:

  • Must schedule refreshes
  • Data freshness depends on refresh cadence

2. DirectQuery Mode

  • Semantic model does not store data locally — queries are sent to the underlying source (SQL, warehouse, etc.) at query time.
  • Ensures real-time or near-real-time data because no import refresh is needed.

When to use DirectQuery:

  • Source data changes frequently and must always show the latest results.
  • Data volumes are too large to import fully.

Pros:

  • Real-time access to source data
  • No refresh cycles required

Cons:

  • Performance depends heavily on source system
  • Some modeling features may be limited compared with Import

3. Direct Lake Mode

A newer, Fabric-specific storage mode designed to combine performance and freshness:

  • Reads Delta tables directly from OneLake and loads necessary column data into memory.
  • Avoids full data copy, eliminating the long import refresh cycle.
  • Uses the VertiPaq engine for fast aggregations and interactions (similar to import).
  • Offers low-latency sync with source changes without heavy refresh workloads.
  • Supports real-time insights while minimizing data movement.

When to use Direct Lake:

  • Working with extremely large datasets that would be costly or impractical to import entirely.
  • Needing relatively fresh data without long refresh cycles typical of Import mode.
  • Integrating tightly with delta-based assets such as Fabric lakehouses and warehouses.

Pros:

  • Fast querying with fresher data than import
  • No heavy refresh cycles
  • Leverages OneLake integration and existing delta tables

Cons:

  • Some modeling features (like calculated tables) are limited or not supported in Direct Lake tables (those tables must be switched to Import if needed).
  • May fall back to DirectQuery in certain conditions (e.g., tables requiring SQL endpoint security).

Composite Models

A semantic model may include a mix of storage modes — for example, some tables in Direct Lake and others in Import. This is called a composite model.

Typical use cases for composite models:

  • Import frequently used dimension tables (to support calculated tables)
  • Use Direct Lake for large fact tables stored in OneLake
  • Balance performance with modeling flexibility

Choosing the Right Storage Mode — Key Factors

When deciding on a storage mode for your semantic model, consider:

1. Data Freshness Requirements

  • Real-time data? → DirectQuery or Direct Lake
  • Static or periodic data? → Import

2. Dataset Size

  • Large volumes (multi-TB) without capacity for full import? → Direct Lake
  • Manageable size within memory? → Import

3. Modeling Features Needed

  • Complex measures, calculated tables, custom hierarchies? → Import (or mix)

4. Performance Needs

  • High interactive performance with good freshness? → Direct Lake
  • Ultimate speed with full caching? → Import

5. Source Capabilities

  • Some sources may not support DirectQuery efficiently — understand source performance.

Practical Examples

  • Import Mode: Small/medium enterprise data warehouse reporting that runs daily refreshes.
  • DirectQuery: Regulatory reporting where every query must reflect the latest operational data in a SQL system.
  • Direct Lake: Analytics on massive delta datasets stored in OneLake, where import is impractical but freshness and performance are both essential.

Exam Tips

  • Know what each mode does (Import vs DirectQuery vs Direct Lake).
  • Understand trade-offs between performance, freshness, and modeling capability.
  • Recognize Direct Lake as a Fabric-optimized hybrid mode ideal for delta lake data.
  • Be prepared to choose the mode based on scenario requirements like latency, size, and features.

Summary

Storage Mode | Data Location | Refresh | Performance | Best Use Case
Import | In model memory | Scheduled | Very fast | Smaller datasets needing complex logic
DirectQuery | Source | Real-time | Source-dependent | Real-time needs
Direct Lake | OneLake delta files | Near real-time | Fast, scalable | Large datasets in OneLake

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for keywords in exam questions and understand the usage scenario they signal to guide your answer
  • Expect scenario-based questions rather than direct definitions

1. Which storage mode stores data fully in memory within the semantic model?

A. DirectQuery
B. Direct Lake
C. Import
D. Composite

Correct Answer: C. Import

Explanation:
Import mode loads data into the VertiPaq in-memory engine inside the semantic model, providing the fastest query performance but requiring refreshes.


2. Which storage mode provides real-time access to data by querying the source system at query time?

A. Import
B. DirectQuery
C. Direct Lake
D. Cached

Correct Answer: B. DirectQuery

Explanation:
DirectQuery does not store data locally. Each query is sent directly to the source system, ensuring real-time or near-real-time results.


3. What is a key advantage of Direct Lake compared to Import mode?

A. Supports more DAX functions
B. Requires no OneLake integration
C. Avoids full data refresh while maintaining high performance
D. Works only with SQL Server

Correct Answer: C. Avoids full data refresh while maintaining high performance

Explanation:
Direct Lake reads Delta tables directly from OneLake, avoiding large import refreshes while still using the VertiPaq engine for fast analytics.


4. Which scenario is best suited for Import mode?

A. A dataset requiring real-time updates every second
B. A small to medium dataset with complex DAX calculations
C. A multi-terabyte lakehouse fact table
D. Streaming event data

Correct Answer: B. A small to medium dataset with complex DAX calculations

Explanation:
Import mode supports the full range of modeling features and offers excellent performance for datasets that fit comfortably in memory.


5. Which storage mode is specifically optimized for Delta tables stored in OneLake?

A. Import
B. DirectQuery
C. Direct Lake
D. Hybrid

Correct Answer: C. Direct Lake

Explanation:
Direct Lake is a Fabric-optimized storage mode designed to work directly with Delta tables in OneLake.


6. A semantic model includes some tables in Import mode and others in Direct Lake mode. What is this called?

A. Hybrid model
B. Incremental model
C. Composite model
D. Federated model

Correct Answer: C. Composite model

Explanation:
A composite model uses multiple storage modes within the same semantic model, allowing flexibility between performance and freshness.


7. Which limitation applies to Direct Lake tables?

A. They cannot be refreshed
B. They do not support relationships
C. Calculated tables are not supported directly
D. They cannot be queried using DAX

Correct Answer: C. Calculated tables are not supported directly

Explanation:
Calculated tables require Import mode. Direct Lake tables must be switched to Import if calculated tables are needed.


8. What primarily determines query performance when using DirectQuery mode?

A. The VertiPaq engine
B. The refresh schedule
C. The source system’s performance
D. OneLake caching

Correct Answer: C. The source system’s performance

Explanation:
In DirectQuery mode, queries are executed against the source system, so performance depends on source optimization and capacity.


9. Which storage mode minimizes data duplication while still offering high query performance?

A. Import
B. DirectQuery
C. Direct Lake
D. Cached Import

Correct Answer: C. Direct Lake

Explanation:
Direct Lake avoids copying data into the model while still leveraging in-memory query acceleration, minimizing duplication and refresh overhead.


10. You need near real-time analytics on a very large dataset stored in OneLake without long refresh times. Which storage mode should you choose?

A. Import
B. DirectQuery
C. Direct Lake
D. Snapshot

Correct Answer: C. Direct Lake

Explanation:
Direct Lake is ideal for large OneLake datasets where full import refreshes are impractical but fast, fresh analytics are required.

Filter Data

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Filter data

Filtering data is one of the most fundamental transformation operations used when preparing analytics data. It ensures that only relevant, valid, and accurate records are included in curated tables or models. Filtering improves performance, reduces unnecessary processing overhead, and helps enforce business logic early in the analytics pipeline.

In Microsoft Fabric, filtering occurs at multiple transformation layers — from ingestion tools to interactive modeling. For the DP-600 exam, you should understand where, why, and how to filter data effectively using various tools and technologies within Fabric.


Why Filter Data?

Filtering data serves several key purposes in analytics:

1. Improve Query and Report Performance

  • Reduces the amount of data scanned and processed
  • Enables faster refresh and retrieval

2. Enforce Business Logic

  • Excludes irrelevant segments (e.g., test data, canceled transactions)
  • Supports clean analytical results

3. Prepare Analytics-Ready Data

  • Limits datasets to required time periods or categories
  • Produces smaller, focused outputs for reporting

4. Reduce Cost

  • Smaller processing needs reduce compute and storage overhead

Where Filtering Happens in Microsoft Fabric

Filtering can be implemented at multiple stages:

Layer | How You Filter
Power Query (Dataflows Gen2 / Lakehouse) | UI filters or M code
SQL (Warehouse & Lakehouse SQL analytics) | WHERE clauses
Spark (Lakehouse Notebooks) | DataFrame filter() / where()
Pipelines (Data Movement) | Source filters or query-based extraction
Semantic Models (Power BI / DAX) | Query filters, slicers, and row-level security

Filtering early, as close to the data source as possible, ensures better performance downstream.


Tools and Techniques

1. Power Query (Low-Code)

Power Query provides a user-friendly interface to filter rows:

  • Text filters: Equals, Begins With, Contains, etc.
  • Number filters: Greater than, Between, Top N, etc.
  • Date filters: Before, After, This Month, Last 12 Months, etc.
  • Remove blank or null values

These filters are recorded as transformation steps and can be reused or versioned.


2. SQL (Warehouses & Lakehouses)

SQL filtering uses the WHERE clause:

SELECT *
FROM Sales
WHERE OrderDate >= '2025-01-01'
  AND Country = 'USA';

SQL filtering is efficient and pushed down to the engine, reducing row counts early.


3. Spark (Notebooks)

Filtering in Spark (PySpark example):

filtered_df = df.filter(df["SalesAmount"] > 1000)

Or with SQL in Spark:

SELECT *
FROM sales
WHERE SalesAmount > 1000;

Spark filtering is optimized for distributed processing across big datasets.


4. Pipelines (Data Movement)

During ingestion or ETL, you can apply filters in:

  • Copy activity query filters
  • Source queries
  • Pre-processing steps

This ensures only needed rows land in the target store.


5. Semantic Model Filters

In Power BI and semantic models, filtering can happen as:

  • Report filters
  • Slicers and visuals
  • Row-Level Security (RLS) — security-driven filtering

These filters control what users see rather than what data is stored.
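
As a small illustration of security-driven filtering, row-level security is defined as a DAX table filter expression on a role rather than as a transformation step. A hedged sketch, assuming a hypothetical DimRegion table with a column holding each region manager's sign-in name:

-- RLS filter expression applied to DimRegion in a role (table and column names are illustrative)
DimRegion[RegionManagerEmail] = USERPRINCIPALNAME()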


Business and Data Quality Scenarios

Filtering is often tied to business needs such as:

  • Excluding invalid, test, or archived records
  • Restricting to active customers only
  • Selecting a specific date range (e.g., last fiscal year)
  • Filtering data for regional or product segments

Filtering vs Security

It’s important to distinguish filtering for transformation from security filters:

Filtering | Security
Removes unwanted rows during transformation | Controls what users are allowed to see
Improves performance | Enforces access control
Happens before modeling | Happens during query evaluation

Best Practices

When filtering data in Microsoft Fabric:

  • Filter early in the pipeline to reduce volume
  • Use pushdown filters in SQL when querying large sources
  • Document filtering logic for audit and governance
  • Combine filters logically (AND/OR) to match business rules
  • Avoid filtering in the semantic model when it can be done upstream

Common Exam Scenarios

You may be asked to:

  • Choose the correct tool and stage for filtering
  • Translate business rules into filter logic
  • Recognize when filtering improves performance
  • Identify risks of filtering too late or in the wrong layer

Example exam prompt:
A dataset should exclude test transactions and include only the last 12 months of sales. Which transformation step should be applied and where?
The correct answer will involve filtering early with SQL or Power Query before modeling.


Key Takeaways

  • Filtering data is a core part of preparing analytics-ready datasets.
  • Multiple Fabric components support filtering (Power Query, SQL, Spark, pipelines).
  • Filtering early improves performance and reduces unnecessary workload.
  • Understand filtering in context — transformation vs. security.

Final Exam Tips

  • When a question asks about reducing dataset size, improving performance, or enforcing business logic before loading into a model, filtering is often the correct action — and it usually belongs upstream.
  • Filter early and upstream whenever possible
  • Use SQL or Power Query for transformation-level filtering
  • Avoid relying solely on report-level filters for large datasets
  • Distinguish filtering for performance from security filtering

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for keywords in exam questions and understand the usage scenario they signal to guide your answer
  • Expect scenario-based questions rather than direct definitions

Question 1

What is the primary purpose of filtering data during the transformation phase?

A. To enforce user-level security
B. To reduce data volume and improve performance
C. To encrypt sensitive columns
D. To normalize data structures

Correct Answer: B

Explanation:
Filtering removes unnecessary rows early in the pipeline, reducing data volume, improving performance, and lowering compute costs. Security and normalization are separate concerns.


Question 2

Which Fabric component allows low-code, UI-driven row filtering during data preparation?

A. Spark notebooks
B. SQL warehouse
C. Power Query (Dataflows Gen2)
D. Semantic models

Correct Answer: C

Explanation:
Power Query provides a graphical interface for filtering rows using text, numeric, and date-based filters, making it ideal for low-code transformations.


Question 3

Which SQL clause is used to filter rows in a lakehouse or warehouse?

A. GROUP BY
B. HAVING
C. WHERE
D. ORDER BY

Correct Answer: C

Explanation:
The WHERE clause filters rows before aggregation or sorting, making it the primary SQL mechanism for data filtering.


Question 4

Which filtering approach is most efficient for very large datasets?

A. Filtering in Power BI visuals
B. Filtering after loading data into a semantic model
C. Filtering at the source using SQL or ingestion queries
D. Filtering using calculated columns

Correct Answer: C

Explanation:
Filtering as close to the source as possible minimizes data movement and processing, making it the most efficient approach for large datasets.


Question 5

In a Spark notebook, which method is commonly used to filter a DataFrame?

A. select()
B. filter() or where()
C. join()
D. distinct()

Correct Answer: B

Explanation:
Spark DataFrames use filter() or where() to remove rows based on conditions.


Question 6

Which scenario is an example of business-rule filtering?

A. Removing duplicate rows
B. Converting text to numeric data types
C. Excluding canceled orders from sales analysis
D. Creating a star schema

Correct Answer: C

Explanation:
Business-rule filtering enforces organizational logic, such as excluding canceled or test transactions from analytics.


Question 7

What is the key difference between data filtering and row-level security (RLS)?

A. Filtering improves query speed; RLS does not
B. Filtering removes data; RLS restricts visibility
C. Filtering is applied only in SQL; RLS is applied only in Power BI
D. Filtering is mandatory; RLS is optional

Correct Answer: B

Explanation:
Filtering removes rows from the dataset, while RLS controls which rows users can see without removing the data itself.


Question 8

Which filtering method is typically applied after data has already been loaded?

A. Source query filters
B. Pipeline copy activity filters
C. Semantic model report filters
D. Power Query transformations

Correct Answer: C

Explanation:
Report and visual filters in semantic models are applied at query time and do not reduce stored data volume.


Question 9

Why is filtering data early in the pipeline considered a best practice?

A. It increases data redundancy
B. It simplifies semantic model design
C. It reduces processing and storage costs
D. It improves data encryption

Correct Answer: C

Explanation:
Early filtering minimizes unnecessary data processing and storage, improving efficiency across the entire analytics solution.


Question 10

A dataset should include only the last 12 months of data. Where should this filter ideally be applied?

A. In Power BI slicers
B. In the semantic model
C. During data ingestion or transformation
D. In calculated measures

Correct Answer: C

Explanation:
Applying time-based filters during ingestion or transformation ensures only relevant data is processed and stored, improving performance and consistency.


Convert Column Data Types

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Convert column data types

Converting data types is a fundamental transformation task in data preparation. It helps ensure data consistency, accurate calculations, filter behavior, sorting, joins, and overall query correctness. In Microsoft Fabric, data type conversion can happen in Power Query, SQL, or Spark depending on the workload and where you are in your data pipeline.

This article explains why, where, and how you convert data types in Fabric, with an emphasis on real-world scenarios and exam relevance.

Why Convert Data Types?

Data type mismatches can lead to:

  • Erroneous joins (e.g., joining text to numeric)
  • Incorrect aggregations (e.g., sums treating numbers as text)
  • Filtering issues (e.g., date strings not filtering as dates)
  • Unexpected sort order (e.g., text sorts differently from numbers)

In analytics, getting data types right is critical for both the correctness of results and query performance.

Common Data Types in Analytics

Here are some common data types you’ll work with:

Category | Examples
Numeric | INT, BIGINT, DECIMAL, FLOAT
Text | STRING, VARCHAR
Date/Time | DATE, TIME, DATETIME, TIMESTAMP
Boolean | TRUE / FALSE

Where Data Type Conversion Occurs in Fabric

Depending on workload and tool, you may convert data types in:

Power Query (Dataflows Gen2 & Lakehouses)

  • Visual change type steps (Menu → Transform → Data Type)
  • Applied steps stored in the query
  • Useful for low-code transformation

SQL (Warehouse & Lakehouse SQL Analytics)

  • CAST, CONVERT, or TRY_CAST in SQL
  • Applies at query time or when persisting transformed data

Spark (Lakehouse Notebooks)

  • Explicit schema definitions
  • Transformation commands like withColumn() with type conversion functions

Each environment has trade-offs. For example, Power Query is user-friendly but may not scale like SQL or Spark for very large datasets.

How to Convert Data Types

In Power Query

  1. Select the column
  2. Go to Transform → Data Type
  3. Choose the correct type (e.g., Whole Number, Decimal Number, Date)

Power Query generates a Change Type step that applies at refresh.

In SQL

SELECT
    CAST(order_amount AS DECIMAL(18,2)) AS order_amount,
    CONVERT(DATE, order_date) AS order_date
FROM Sales;

  • CAST() and CONVERT() are standard.
  • Some engines support TRY_CAST() to avoid errors on incompatible values.

In Spark (PySpark or SQL)

PySpark example:

df = df.withColumn("order_date", df["order_date"].cast("date"))

SQL example in Spark:

SELECT CAST(order_amount AS DOUBLE) AS order_amount
FROM sales;

When to Convert Data Types

You should convert data types:

  • Before joins (to ensure matching keys)
  • Before aggregations (to ensure correct math operations)
  • Before loading into semantic models
    (to ensure correct behavior in Power BI)
  • When cleaning source data
    (e.g., text fields that actually represent numbers or dates)

Common Conversion Scenarios

1. Text to Numeric

Often needed when source systems export numbers as text:

Source | Target
"1000" | 1000 (INT/DECIMAL)

2. Text to Date/Time

Date fields often arrive as text:

Source | Target
"2025-08-01" | 2025-08-01 (DATE)

3. Numeric to Text

Sometimes required when composing keys:

CONCAT(customer_id, order_id)

4. Boolean Conversion

Often used in logical flags:

Source | Target
"Yes" / "No" | TRUE / FALSE

Handling Conversion Errors

Not all values convert cleanly. Options include:

  • TRY_CAST / TRY_CONVERT
    • Returns NULL instead of error
  • Error handling in Power Query
    • Replacing errors or invalid values
  • Filtering out problematic rows
    • Before casting

Example:

SELECT TRY_CAST(order_amount AS DECIMAL(18,2)) AS order_amount
FROM sales;

Performance and Governance Considerations

  • Convert as early as possible to support accurate joins/filters
  • Document transformations for transparency
  • Use consistent type conventions across the organization
  • Apply sensitivity labels appropriately — type conversion doesn’t affect security labels

Impact on Semantic Models

When creating semantic models (Power BI datasets):

  • Data types determine field behavior (e.g., date hierarchies)
  • Incorrect types can cause:
    • Incorrect aggregations
    • Misleading visuals
    • DAX errors

Always validate types before importing data into the model.
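
As a quick illustration of why this matters, time intelligence only works when the date column is truly a Date type. A hedged sketch, assuming a DimDate table marked as a date table and an existing Total Sales measure (both illustrative names):

Sales LY =
CALCULATE ( [Total Sales], DATEADD ( DimDate[Date], -1, YEAR ) )

If DimDate[Date] were stored as text, DATEADD would raise an error instead of shifting the filter back one year.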

Best Practices

  • Always validate data values before conversion
  • Use schema enforcement where possible (e.g., Spark schema)
  • Avoid implicit type conversions during joins
  • Keep logs or steps of transformations for reproducibility

Key Takeaways for the DP-600 Exam

  • Know why data type conversion matters for analytics
  • Be able to choose the right tool (Power Query / SQL / Spark) for the context
  • Understand common conversions (text→numeric, text→date, boolean conversion)
  • Recognize when conversion must occur in the pipeline for correctness and performance

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Expect scenario-based questions rather than direct definitions
  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for keywords in exam questions and understand the usage scenario they signal to guide your answer
  • Keep in mind that if a question mentions unexpected calculations, broken joins, or filtering issues, always consider data type mismatches as a possible root cause.

Question 1

Why is converting column data types important in an analytics solution?

A. It reduces storage costs
B. It ensures accurate calculations, joins, and filtering
C. It improves report visuals automatically
D. It encrypts sensitive data

Correct Answer: B

Explanation:
Correct data types ensure accurate aggregations, proper join behavior, correct filtering, and predictable sorting.

Question 2

Which Fabric tool provides a visual, low-code interface for changing column data types?

A. SQL Analytics endpoint
B. Spark notebooks
C. Power Query
D. Eventhouse

Correct Answer: C

Explanation:
Power Query allows users to change data types through a graphical interface and automatically records the steps.

Question 3

What is a common risk when converting text values to numeric data types?

A. Increased storage usage
B. Duplicate rows
C. Conversion errors or null values
D. Slower report rendering

Correct Answer: C

Explanation:
Text values that are not valid numbers may cause conversion failures or be converted to nulls, depending on the method used.

Question 4

Which SQL function safely attempts to convert a value and returns NULL if conversion fails?

A. CAST
B. CONVERT
C. TRY_CAST
D. FORMAT

Correct Answer: C

Explanation:
TRY_CAST avoids query failures by returning NULL when a value cannot be converted.

Question 5

When should data types ideally be converted in a Fabric analytics pipeline?

A. At report query time
B. After publishing reports
C. Early in the transformation process
D. Only in the semantic model

Correct Answer: C

Explanation:
Converting data types early prevents downstream issues in joins, aggregations, and semantic models.

Question 6

Which data type is most appropriate for calendar-based filtering and time intelligence?

A. Text
B. Integer
C. Date or DateTime
D. Boolean

Correct Answer: C

Explanation:
Date and DateTime types enable proper time-based filtering, hierarchies, and time intelligence calculations.

Question 7

Which Spark operation converts a column’s data type?

A. changeType()
B. convert()
C. cast()
D. toType()

Correct Answer: C

Explanation:
The cast() method is used in Spark to convert a column’s data type.

Question 8

Why can implicit data type conversion during joins be problematic?

A. It improves performance
B. It hides data lineage
C. It may cause incorrect matches or slow performance
D. It automatically removes duplicates

Correct Answer: C

Explanation:
Implicit conversions can prevent index usage and lead to incorrect or inefficient joins.

Question 9

A numeric column is stored as text and sorts incorrectly (e.g., 1, 10, 2). What is the cause?

A. Incorrect aggregation
B. Missing values
C. Wrong data type
D. Duplicate rows

Correct Answer: C

Explanation:
Text sorting is lexicographical, not numeric, leading to incorrect ordering.

Question 10

What is the impact of incorrect data types in a Power BI semantic model?

A. Only visuals are affected
B. Aggregations, filters, and DAX behavior may be incorrect
C. Reports fail to load
D. Sensitivity labels are removed

Correct Answer: B

Explanation:
Data types influence how fields behave in calculations, visuals, and DAX expressions.

Identify and Resolve Duplicate Data, Missing Data, or Null Values

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Identify and resolve duplicate data, missing data, or null values

Ensuring data quality is foundational for reliable analytics. Duplicate records, missing values, and nulls can lead to inaccurate aggregations, misleading insights, and broken joins. Microsoft Fabric provides multiple tools and techniques to identify, investigate, and resolve these issues during data preparation.

Why Data Quality Matters

Poor data quality can cause:

  • Incorrect business metrics (e.g., inflated counts)
  • Failed joins or mismatches
  • Incorrect aggregates or KPIs
  • Discrepancies across reports

The DP-600 exam expects you to know how to detect and fix these issues using Fabric’s transformation tools — without degrading performance or losing important data.

Key Data Quality Issues

1. Duplicate Data

Duplicates occur when the same record appears multiple times.
Common causes:

  • Repeated ingestion jobs
  • Incorrect joins
  • Source system errors

Impact of duplicates:

  • Inflated metrics
  • Misleading counts
  • Distorted analytics

2. Missing Data

Missing data refers to the complete absence of expected rows for certain categories or time periods.

Examples:

  • No sales records for a specific store in a date range
  • Missing customer segments

Impact:

  • Bias in analysis
  • Understated performance

3. Null Values

Nulls represent unknown or undefined values in a dataset.

Common cases:

  • Missing customer name
  • Missing numeric values
  • Unpopulated fields in incomplete records

Consequences:

  • SQL functions may ignore nulls
  • Aggregations may be skewed
  • Joins may fail or produce incorrect results

Tools and Techniques in Microsoft Fabric

1. Power Query (Dataflows Gen2 / Lakehouse)

Power Query provides a visual and programmatic interface to clean data:

  • Remove duplicates:
    Home → Remove Rows → Remove Duplicates
  • Replace or fill nulls:
    Transform → Replace Values
    Or use Fill Up / Fill Down
  • Filter nulls:
    Filter rows where column is null or not null

Benefits:

  • No-code/low-code
  • Reusable transformation steps
  • Easy preview and validation

2. SQL (Warehouses / Lakehouse SQL Analytics)

Using SQL, you can identify and fix issues:

Detect duplicates:

SELECT Col1, Col2, COUNT(*) AS Cnt
FROM table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1;

Remove duplicates (example pattern):

WITH RankedRows AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY keycol ORDER BY keycol) AS rn  -- order by a tie-breaker column (e.g., a load timestamp) to control which duplicate is kept
  FROM table
)
SELECT * FROM RankedRows WHERE rn = 1;

Replace nulls:

SELECT COALESCE(column, 0) AS column_fixed
FROM table;

3. Spark (Lakehouses via Notebooks)

Identify nulls:

df.filter(df["column"].isNull()).show()

Drop duplicates:

df.dropDuplicates(["keycol"])

Fill nulls:

df.na.fill({"column": "Unknown"})

Best Practices for Resolution

Addressing Duplicates

  • Use business keys (unique identifiers) to define duplicates
  • Validate whether duplicates are true duplicates or legitimate repeats
  • Document deduplication logic

Handling Nulls

  • Use domain knowledge to decide substitute values
    • Zero for numeric
    • “Unknown” or “Not Provided” for text
  • Preserve nulls when they carry meaning (e.g., missing responses)

Handling Missing Data

  • Understand the business meaning
    • Is absence valid?
    • Should data be imputed?
    • Or should missing rows be generated via reference tables?

Data Profiling

  • Use profiling to understand distributions and quality:
    • Column completeness
    • Unique value distribution
    • Null frequency

Data profiling helps you decide which cleaning steps are required.

When to Clean Data in Fabric

Data quality transformations should be performed:

  • Early in the pipeline (at the ingestion or transformation layer)
  • Before building semantic models
  • Before aggregations or joins
  • Before publishing curated datasets

Early cleaning prevents issues from propagating into semantic models and reports.

Exam Scenarios

In DP-600 exam questions, you might see scenarios like:

  • Metrics appear inflated due to duplicate records
  • Reports show missing date ranges
  • Joins fail due to null key values
  • Aggregations ignore null values

Your job is to choose the correct transformation action — e.g., filtering nulls, deduplicating, replacing values, or imputing missing data — and the best tool (Power Query vs SQL vs Spark).

Key Takeaways

  • Duplicate rows inflate counts and distort analytics.
  • Missing rows can bias time-series or segment analysis.
  • Null values can break joins and cause incorrect aggregation results.
  • Effective resolution relies on understanding business context and using the right Fabric tools.
  • Clean data early for better downstream performance and governance.

Final Exam Tip
If a metric doesn’t look right, think data quality first — duplicates, missing rows, and null values are among the most common real-world issues covered in DP-600 scenarios.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for keywords in exam questions and understand the usage scenario they signal to guide your answer
  • Expect scenario-based questions rather than direct definitions

General Exam Tips for this section
If something looks wrong in a report:

  • Too high? → Check for duplicates
  • Blank or missing? → Check for nulls or missing rows
  • Not joining correctly? → Check nulls and key integrity

Question 1

Which issue is most likely to cause inflated totals in aggregated metrics?

A. Null values in numeric columns
B. Missing rows for a time period
C. Duplicate records
D. Incorrect column data types

Correct Answer: C

Explanation:
Duplicate records result in the same data being counted more than once, which inflates sums, counts, and averages.

Question 2

In Power Query, which action is used to remove duplicate rows?

A. Filter Rows
B. Group By
C. Remove Duplicates
D. Replace Values

Correct Answer: C

Explanation:
The Remove Duplicates operation removes repeated rows based on selected columns.

Question 3

Which SQL function is commonly used to replace null values with a default value?

A. NULLIF
B. ISNULL or COALESCE
C. COUNT
D. CAST

Correct Answer: B

Explanation:
ISNULL() and COALESCE() return a specified value when a column contains NULL.

Question 4

Why can null values cause problems in joins?

A. Nulls increase query runtime
B. Nulls are treated as zero
C. Nulls never match other values
D. Nulls are automatically filtered

Correct Answer: C

Explanation:
NULL values do not match any value (including other NULLs), which can cause rows to be excluded from join results.

Question 5

Which scenario best justifies keeping null values rather than replacing them?

A. The column is used in joins
B. The null indicates “unknown” or “not applicable”
C. The column is numeric
D. The column has duplicates

Correct Answer: B

Explanation:
Nulls may carry important business meaning and should be preserved when they accurately represent missing or unknown information.

Question 6

Which Fabric tool is most appropriate for visual data profiling to identify missing and null values?

A. Power BI visuals
B. Power Query
C. Semantic models
D. Eventhouse

Correct Answer: B

Explanation:
Power Query provides built-in data profiling features such as column distribution, column quality, and column profile.

Question 7

What is the purpose of using an anti join when checking data quality?

A. To merge tables
B. To append data
C. To identify unmatched records
D. To replace null values

Correct Answer: C

Explanation:
Anti joins return rows that do not have a match in another table, making them ideal for identifying missing or orphaned records.
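T-SQL has no ANTI JOIN keyword; the pattern is usually written as a LEFT JOIN filtered to unmatched rows, or with NOT EXISTS. A sketch using hypothetical Orders and DimCustomer tables:

-- Orders whose CustomerID has no match in DimCustomer (orphaned rows)
SELECT o.*
FROM Orders AS o
LEFT JOIN DimCustomer AS c
  ON o.CustomerID = c.CustomerID
WHERE c.CustomerID IS NULL;

-- Equivalent pattern using NOT EXISTS
SELECT o.*
FROM Orders AS o
WHERE NOT EXISTS (
  SELECT 1
  FROM DimCustomer AS c
  WHERE c.CustomerID = o.CustomerID
);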

Question 8

Which approach is considered a best practice for handling data quality issues?

A. Fix issues only in reports
B. Clean data as late as possible
C. Resolve issues early in the pipeline
D. Ignore null values

Correct Answer: C

Explanation:
Resolving data quality issues early prevents them from propagating into semantic models and reports.

Question 9

Which Spark operation removes duplicate rows from a DataFrame?

A. filter()
B. groupBy()
C. dropDuplicates()
D. distinctColumns()

Correct Answer: C

Explanation:
dropDuplicates() removes duplicate rows based on one or more columns.

Question 10

A report is missing values for several dates. What is the most likely cause?

A. Duplicate rows
B. Incorrect aggregation logic
C. Missing source data
D. Incorrect data type conversion

Correct Answer: C

Explanation:
Missing dates usually indicate that source records are absent rather than null or duplicated.

Merge or Join Data

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Merge or join data

Merging or joining data is a fundamental transformation task in Microsoft Fabric. It enables you to combine related data from multiple tables or sources into a single dataset for analysis, modeling, or reporting. This skill is essential for preparing clean, well-structured data in lakehouses, warehouses, dataflows, and Power BI semantic models.

For the DP-600 exam, you are expected to understand when, where, and how to merge or join data using the appropriate Fabric tools, as well as the implications for performance, data quality, and modeling.

Merge vs. Join: Key Distinction

Although often used interchangeably, the terms have slightly different meanings depending on the tool:

  • Merge
    • Commonly used in Power Query
    • Combines tables by matching rows based on one or more key columns
    • Produces a new column that can be expanded
  • Join
    • Commonly used in SQL and Spark
    • Combines tables using explicit join logic (JOIN clauses)
    • Output schema is defined directly in the query

Where Merging and Joining Occur in Fabric

Fabric experience and how it’s done:

  • Power Query (Dataflows Gen2, Lakehouse): Merge Queries UI
  • Warehouse: SQL JOIN statements
  • Lakehouse (Spark notebooks): DataFrame joins
  • Power BI Desktop: Power Query merges

Common Join Types (Exam-Critical)

Join types are heavily tested, so know each one well (a minimal SQL sketch contrasting them follows the list below):

  • Inner Join
    • Returns only matching rows from both tables
  • Left Outer Join
    • Returns all rows from the left table and matching rows from the right
  • Right Outer Join
    • Returns all rows from the right table and matching rows from the left
  • Full Outer Join
    • Returns all rows from both tables
  • Left Anti / Right Anti Join
    • Returns rows with no match in the other table

👉 Exam tip: Anti joins are commonly used to identify missing or unmatched data.
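Here is the promised sketch, contrasting the main join types on the same hypothetical FactSales and DimProduct tables:

-- Inner join: only sales rows whose ProductID exists in DimProduct
SELECT s.SalesID, p.ProductName
FROM FactSales AS s
INNER JOIN DimProduct AS p ON s.ProductID = p.ProductID;

-- Left outer join: every sales row, with NULLs where no product matches
SELECT s.SalesID, p.ProductName
FROM FactSales AS s
LEFT JOIN DimProduct AS p ON s.ProductID = p.ProductID;

-- Left anti join pattern: sales rows with no matching product
SELECT s.SalesID
FROM FactSales AS s
LEFT JOIN DimProduct AS p ON s.ProductID = p.ProductID
WHERE p.ProductID IS NULL;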

Join Keys and Data Quality Considerations

Before merging or joining data, it’s critical to ensure:

  • Join columns:
    • Have matching data types
    • Are cleaned and standardized
    • Represent the same business entity
  • Duplicate values in join keys can:
    • Create unexpected row multiplication (a quick duplicate-key check is sketched after this list)
    • Impact aggregations and performance

Performance and Design Considerations

  • Prefer SQL joins or Spark joins for large datasets rather than Power Query
  • Filter and clean data before joining to reduce data volume
  • In dimensional modeling:
    • Fact tables typically join to dimension tables using left joins
  • Avoid unnecessary joins in the semantic layer when they can be handled upstream

Common Use Cases

  • Combining fact data with descriptive attributes
  • Enriching transactional data with reference or lookup tables
  • Building dimension tables for star schema models
  • Validating data completeness using anti joins

Exam Tips and Pitfalls

  • Don’t confuse merge vs. append (append stacks rows vertically)
  • Know which tool to use based on:
    • Data size
    • Refresh frequency
    • Complexity
  • Expect scenario questions asking:
    • Which join type to use
    • Where the join should occur in the architecture

Key Takeaways

  • Merging and joining data is essential for data preparation in Fabric
  • Different Fabric experiences offer different ways to join data
  • Correct join type and clean join keys are critical for accuracy
  • Performance and modeling best practices matter for the DP-600 exam

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Watch for keywords in exam questions and use the scenario they describe to guide your answer
  • Expect scenario-based questions rather than direct definitions

Question 1

What is the primary purpose of merging or joining data in Microsoft Fabric?

A. To reduce storage costs
B. To vertically stack tables
C. To combine related data based on a common key
D. To encrypt sensitive columns

Correct Answer: C

Explanation:
Merging or joining data combines related datasets horizontally using shared key columns so that related attributes appear in a single dataset.

Question 2

In Power Query, what is the result of a Merge Queries operation?

A. Rows from both tables are appended
B. A new table is automatically created
C. A new column containing related table data is added
D. A relationship is created in the semantic model

Correct Answer: C

Explanation:
Power Query merges add a column that contains matching rows from the second table, which can then be expanded.

Question 3

Which join type returns only rows that exist in both tables?

A. Left outer join
B. Right outer join
C. Full outer join
D. Inner join

Correct Answer: D

Explanation:
An inner join returns only rows with matching keys in both tables.

Question 4

You want to keep all rows from a fact table and bring in matching dimension attributes. Which join type should you use?

A. Inner join
B. Left outer join
C. Right outer join
D. Full outer join

Correct Answer: B

Explanation:
A left outer join preserves all rows from the left (fact) table while bringing in matching rows from the dimension table.

Question 5

Which join type is most useful for identifying records that do not have a match in another table?

A. Inner join
B. Full outer join
C. Left anti join
D. Right outer join

Correct Answer: C

Explanation:
A left anti join returns rows from the left table that do not have matching rows in the right table, making it ideal for data quality checks.

Question 6

What issue can occur when joining tables that contain duplicate values in the join key?

A. Data type conversion errors
B. Row multiplication
C. Data loss
D. Query failure

Correct Answer: B

Explanation:
Duplicate keys can cause one-to-many or many-to-many matches, resulting in more rows than expected after the join.

Question 7

Which Fabric experience is best suited for performing joins on very large datasets?

A. Power BI Desktop
B. Power Query
C. Warehouse using SQL
D. Excel

Correct Answer: C

Explanation:
SQL joins in a warehouse are optimized for large-scale data processing and typically outperform Power Query for large datasets.

Question 8

Which operation should not be confused with merging or joining data?

A. Append
B. Inner join
C. Left join
D. Anti join

Correct Answer: A

Explanation:
Append stacks tables vertically (row-wise), while merges and joins combine tables horizontally (column-wise).
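In SQL terms, the contrast looks roughly like this (Sales2023, Sales2024, and DimCustomer are hypothetical tables):

-- Append: stack rows vertically (both tables must share the same schema)
SELECT * FROM Sales2023
UNION ALL
SELECT * FROM Sales2024;

-- Merge/join: add columns horizontally by matching on a key
SELECT s.SalesID, c.CustomerName
FROM Sales2024 AS s
LEFT JOIN DimCustomer AS c
  ON s.CustomerID = c.CustomerID;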

Question 9

What should you verify before merging two tables?

A. That both tables have the same number of rows
B. That join columns use compatible data types
C. That all columns are indexed
D. That the tables are in the same workspace

Correct Answer: B

Explanation:
Join columns must have compatible data types and clean values; otherwise, matches may fail or produce incorrect results.

Question 10

From a modeling best-practice perspective, where should complex joins ideally be performed?

A. In Power BI visuals
B. In DAX measures
C. Upstream in lakehouse or warehouse transformations
D. At query time in reports

Correct Answer: C

Explanation:
Performing joins upstream simplifies semantic models, improves performance, and ensures consistency across reports.

Aggregate Data

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Aggregate data

Aggregating data is a foundational data transformation technique used to compute summaries and roll-ups, such as totals, averages, counts, and other statistical measures. In analytics solutions, including those built in Microsoft Fabric, aggregation enables faster performance, simplified reporting, and clearer insights.

In the context of DP-600, you should understand why and when to aggregate data, how aggregation affects downstream analytics, and where it is implemented in Fabric workloads.

What Is Data Aggregation?

Aggregation refers to the process of summarizing detailed records into higher-level metrics. Common aggregation operations include:

  • SUM – total of a numeric field
  • COUNT / COUNT DISTINCT – number of records or unique values
  • AVG – average
  • MIN / MAX – lowest or highest value
  • GROUP BY – group records before applying aggregate functions

Aggregation turns row-level data into summary tables that are ideal for dashboards, KPIs, and trend analysis.

Why Aggregate Data?

Performance

Large detailed tables can be slow to query. Pre-aggregated data:

  • Reduces data scanned at query time
  • Improves report responsiveness

Simplicity

Aggregated data simplifies reporting logic for end users by providing ready-to-use summary metrics.

Consistency

When aggregations are standardized at the data layer, multiple reports can reuse the same durable summaries, ensuring consistent results.

When to Aggregate

Consider aggregating when:

  • Working with large detail tables (e.g., web logs, transaction history)
  • Reports require summary metrics (e.g., monthly totals, regional averages)
  • Users frequently query the same roll-ups
  • You want to offload compute from the semantic model or report layer

Where to Aggregate in Microsoft Fabric

Lakehouse

  • Use Spark SQL or SQL analytics endpoints
  • Good for large-scale transformations on big data
  • Ideal for creating summarized tables

Warehouse

  • Use T-SQL for aggregations
  • Supports highly optimized analytical queries
  • Can store aggregated tables for BI performance

Dataflows Gen2

  • Use Power Query transformations to aggregate and produce curated tables
  • Fits well in ETL/ELT pipelines

Notebooks

  • Use Spark (PySpark or SQL) for advanced or complex aggregations

Semantic Models (DAX)

  • Create aggregated measures
  • Useful for scenarios when aggregation logic must be defined at analysis time

Common Aggregation Patterns

Rollups by Time

Aggregating by day, week, month, quarter, or year:

SELECT
  YEAR(OrderDate) AS Year,
  MONTH(OrderDate) AS Month,
  SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY
  YEAR(OrderDate),
  MONTH(OrderDate);

Aggregations with Dimensions

Combining filters and groupings:

SELECT
  Region,
  ProductCategory,
  SUM(SalesAmount) AS TotalSales,
  COUNT(*) AS OrderCount
FROM Sales
GROUP BY
  Region,
  ProductCategory;

Aggregations vs. Detailed Tables

Comparing detailed tables and aggregated tables by aspect:

  • Query flexibility: detailed = high; aggregated = lower (fixed aggregates)
  • Performance: detailed = lower; aggregated = higher
  • Storage: detailed = moderate; aggregated = lower
  • BI simplicity: detailed = moderate; aggregated = high

Best practice: store both detail and aggregated tables when storage and refresh times permit.

Aggregation and Semantic Models

Semantic models often benefit from pre-aggregated tables:

  • Improves report performance
  • Reduces row scans on large datasets
  • Can support composite models that combine aggregated tables with detail tables

Within semantic models:

  • Calculated measures define aggregation rules
  • Aggregated physical tables can be imported for performance

Governance and Refresh Considerations

  • Aggregated tables must be refreshed on a schedule that matches business needs.
  • Use pipelines or automation to update aggregated data regularly.
  • Ensure consistency between fact detail and aggregated summaries.
  • Document and version aggregation logic for maintainability.

Example Use Cases

Sales KPI Dashboard

  • Monthly total sales
  • Year-to-date sales
  • Average order value

Operational Reporting

  • Daily website visits by category
  • Hourly orders processed per store

Executive Scorecards

  • Quarter-to-date profits
  • Customer acquisition counts by region

Best Practices for DP-600

  • Aggregate as close to the data source as practical to improve performance
  • Use Dataflows Gen2, Lakehouse SQL, or Warehouse SQL for durable aggregated tables
  • Avoid over-aggregation that removes necessary detail for other reports
  • Use semantic model measures for dynamic aggregation needs

Key Takeaway
In DP-600 scenarios, aggregating data is about preparing analytics-ready datasets that improve performance and simplify reporting. Understand how to choose the right place and method for aggregation—whether in a lakehouse, warehouse, dataflow, or semantic model—and how that choice impacts downstream analytics.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Watch for keywords in exam questions and use the scenario they describe to guide your answer
  • Expect scenario-based questions rather than direct definitions

Question 1

What is the primary purpose of aggregating data in analytics solutions?

A. To increase data granularity
B. To reduce data quality issues
C. To summarize detailed data into meaningful metrics
D. To enforce security rules

Correct Answer: C

Explanation:
Aggregation summarizes detailed records (for example, transactions) into higher-level metrics such as totals, averages, or counts, making data easier to analyze and faster to query.

Question 2

Which SQL clause is required when using aggregate functions like SUM() or COUNT() with non-aggregated columns?

A. ORDER BY
B. GROUP BY
C. WHERE
D. HAVING

Correct Answer: B

Explanation:
GROUP BY defines how rows are grouped before aggregate functions are applied. Any non-aggregated column in the SELECT clause must appear in the GROUP BY clause.
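For example, with a hypothetical Sales table, Region is not aggregated and so must appear in the GROUP BY clause:

SELECT Region, SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY Region;  -- omitting GROUP BY here would cause an error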

Question 3

Which scenario is the best candidate for creating a pre-aggregated table in Microsoft Fabric?

A. Ad-hoc exploratory analysis
B. Frequently queried KPIs used across multiple reports
C. Data with unpredictable schema changes
D. Small lookup tables

Correct Answer: B

Explanation:
Pre-aggregated tables are ideal for commonly used KPIs because they improve performance and ensure consistent results across reports.

Question 4

Where can durable aggregated tables be created in Microsoft Fabric?

A. Only in semantic models
B. Only in notebooks
C. Lakehouses and warehouses
D. Power BI reports

Correct Answer: C

Explanation:
Both Lakehouses (via Spark SQL or SQL analytics endpoints) and Warehouses (via T-SQL) support persistent aggregated tables.

Question 5

Which aggregation function returns the number of unique values in a column?

A. COUNT
B. SUM
C. AVG
D. COUNT DISTINCT

Correct Answer: D

Explanation:
COUNT DISTINCT counts only unique values, which is commonly used for metrics like unique customers or unique orders.
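For example (column names are illustrative), counting total orders alongside unique customers:

SELECT
  COUNT(*)                   AS OrderCount,
  COUNT(DISTINCT CustomerID) AS UniqueCustomers
FROM Sales;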

Question 6

What is a key benefit of aggregating data before loading it into a semantic model?

A. Increased storage usage
B. Improved query performance
C. More complex DAX expressions
D. Higher data latency

Correct Answer: B

Explanation:
Pre-aggregated data reduces the number of rows scanned at query time, resulting in faster report and dashboard performance.

Question 7

Which Fabric component is best suited for performing aggregation as part of an ETL or ELT process using Power Query?

A. Notebooks
B. Dataflows Gen2
C. Eventhouses
D. Semantic models

Correct Answer: B

Explanation:
Dataflows Gen2 use Power Query and are designed for repeatable data transformations, including grouping and aggregating data.

Question 8

What is a common tradeoff when using aggregated tables instead of detailed fact tables?

A. Higher storage costs
B. Reduced data security
C. Loss of granular detail
D. Slower refresh times

Correct Answer: C

Explanation:
Aggregated tables improve performance but reduce flexibility because detailed, row-level data is no longer available.

Question 9

Which aggregation pattern is commonly used for time-based analysis?

A. GROUP BY product category
B. GROUP BY customer ID
C. GROUP BY date, month, or year
D. GROUP BY transaction ID

Correct Answer: C

Explanation:
Time-based aggregations (daily, monthly, yearly) are fundamental for trend analysis and KPI reporting.

Question 10

Which approach is considered a best practice when designing aggregated datasets for analytics?

A. Aggregate all data at the highest level only
B. Store only aggregated tables and discard detail data
C. Maintain both detailed and aggregated tables when possible
D. Avoid aggregations until the reporting layer

Correct Answer: C

Explanation:
Keeping both detail-level and aggregated tables provides flexibility while still achieving strong performance for common analytical queries.

Denormalize Data

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Denormalize Data

Data denormalization is a transformation strategy that restructures data to improve query performance and simplify analytics—especially in analytical workloads such as reporting, dashboarding, and BI. In Microsoft Fabric, denormalization plays a key role when preparing data for efficient consumption in lakehouses, warehouses, and semantic models.

This article explains what denormalization means, why it’s important for analytics, how to implement it in Fabric, and when to use it versus normalized structures.

What Is Denormalization?

Denormalization is the process of combining data from multiple tables or sources into a single, flattened structure. The goal is to reduce the number of joins and simplify querying at the expense of some redundancy.

In contrast:

  • Normalized data avoids redundancy by splitting data into many related tables.
  • Denormalized data often duplicates data intentionally to speed up analytical queries.

Why Denormalize Data for Analytics?

Denormalization is widely used in analytics because it:

  • Improves query performance: Fewer joins mean faster queries—especially for BI tools like Power BI.
  • Simplifies report logic: Flattened tables make it easier for report authors to understand and use data.
  • Reduces semantic model complexity: Fewer relationships and tables can improve both model performance and maintainability.
  • Optimizes storage access: Pre-computed joins and aggregated structures reduce run-time computation.

Beneficial Scenarios for Denormalization

Denormalization is especially helpful when:

  • Building star schemas or analytical data marts.
  • Preparing data for semantic models that are consumed by BI tools.
  • Performance is critical for dashboards and reports.
  • Data rarely changes (or changes can be managed with refresh logic).
  • Users require self-service analytics with minimal SQL complexity.

Where to Denormalize in Microsoft Fabric

Denormalization can be implemented in different Fabric components depending on workload and transformation needs:

1. Dataflows Gen2

  • Use Power Query to merge tables and create flattened structures
  • Ideal for low-code scenarios targeting OneLake
  • Great for building reusable tables

2. Lakehouses

  • Use Spark SQL to perform joins and build denormalized tables (the lakehouse SQL analytics endpoint then serves read-only T-SQL queries over the results)
  • Useful for large-scale ELT transformations

3. Warehouse

  • Use SQL to create flattened analytic tables optimized for BI
  • Supports indexing and performance tuning

4. Notebooks

  • Use PySpark or Spark SQL for complex or iterative denormalization logic

How to Denormalize Data

Typical Techniques

  • Merge or Join tables: Combine fact and dimension tables into a single analytic table
  • Pre-compute derived values: Compute metrics or concatenated fields ahead of time
  • Flatten hierarchies: Add attributes from parent tables directly into child records
  • Pivot or unpivot: Adjust layout to match analytics needs

Example (Conceptual Join)

Instead of querying these tables:

SELECT
    s.SalesID,
    d.CustomerName,
    p.ProductName
FROM FactSales s
JOIN DimCustomer d ON s.CustomerID = d.CustomerID
JOIN DimProduct p ON s.ProductID = p.ProductID;

Create a denormalized sales analytics table (here named DenormSalesAnalytics):

SELECT
    s.SalesID,
    s.SalesDate,
    d.CustomerName,
    p.ProductName,
    s.SalesAmount
INTO DenormSalesAnalytics
FROM FactSales s
JOIN DimCustomer d ON s.CustomerID = d.CustomerID
JOIN DimProduct p ON s.ProductID = p.ProductID;

This single table can then be queried directly by BI tools without joins.
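For instance, a report-level summary can now be computed straight from the flattened table, with no joins:

-- Aggregate directly over the denormalized table created above
SELECT CustomerName, SUM(SalesAmount) AS TotalSales
FROM DenormSalesAnalytics
GROUP BY CustomerName;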

Trade-Offs of Denormalization

While denormalization improves performance and simplicity, it also introduces trade-offs:

Pros

  • Faster, simpler queries
  • Better analytics experience
  • Easier semantic model design

Cons

  • Data redundancy
  • Larger storage footprint
  • More complex refresh and update logic
  • Higher maintenance if source schemas change

Integrating Denormalization with Semantic Models

Denormalized tables are often used as sources for Power BI semantic models to:

  • Reduce row-level relationships
  • Improve report refresh times
  • Simplify model structure
  • Support consistent business metrics

Because semantic models work best with wide tables and straightforward relationships, denormalized sources are ideal.

Best Practices for Denormalization

  • Denormalize only where it delivers clear performance or usability benefits
  • Document transformation logic for future maintainability
  • Use pipelines or Dataflows Gen2 for repeatable and auditable ELT flows
  • Monitor refresh performance and adjust partitions or indexes

When Not to Denormalize

Avoid denormalization when:

  • Data integrity rules are strict and must avoid redundancy
  • Source systems change frequently
  • You are performing OLTP-style operations (transactional systems)
  • Storage and refresh cost outweigh performance gains

What to Know for the DP-600 Exam

You should be comfortable with:

  • The definition and purpose of denormalization
  • Recognizing when it’s appropriate in analytics workloads
  • How to implement denormalization in Fabric components
  • The trade-offs involved in denormalizing data
  • How denormalized structures optimize semantic models and BI

Final Exam Tip
If a question emphasizes reducing joins, improving query performance, and simplifying reporting, you’re likely dealing with denormalization.
If it emphasizes transactional integrity and normalized structures, that’s not the scenario for denormalization.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Watch for keywords in exam questions and use the scenario they describe to guide your answer
  • Expect scenario-based questions rather than direct definitions

1. What is the primary purpose of denormalizing data for analytics workloads?

A. Reduce data duplication
B. Improve transactional integrity
C. Improve query performance and simplify analytics
D. Enforce strict normalization rules

Correct Answer: C

Explanation:
Denormalization intentionally introduces redundancy to reduce joins, simplify queries, and improve performance—key requirements for analytics and BI workloads.

2. Which type of workload benefits most from denormalized data?

A. OLTP transaction processing
B. Real-time device telemetry ingestion
C. BI reporting and dashboarding
D. Application logging

Correct Answer: C

Explanation:
BI reporting and analytics benefit from flattened, denormalized structures because they reduce query complexity and improve performance.

3. What is a common technique used to denormalize data?

A. Normalizing dimension tables
B. Splitting wide tables into smaller ones
C. Merging multiple related tables into one
D. Removing foreign keys

Correct Answer: C

Explanation:
Denormalization commonly involves merging fact and dimension data into a single table to reduce the need for joins during querying.

4. Which Microsoft Fabric component is best suited for low-code denormalization?

A. Notebooks
B. SQL analytics endpoint
C. Dataflows Gen2
D. Eventhouse

Correct Answer: C

Explanation:
Dataflows Gen2 use Power Query to perform low-code transformations such as merging tables and creating flattened datasets.

5. What is a key trade-off introduced by denormalization?

A. Reduced query performance
B. Increased data redundancy
C. Reduced storage reliability
D. Loss of query flexibility

Correct Answer: B

Explanation:
Denormalization duplicates data across rows or tables, which increases redundancy and can complicate updates and refresh processes.

6. Why is denormalized data often used as a source for Power BI semantic models?

A. Power BI cannot handle relationships
B. Denormalized tables simplify models and improve performance
C. Semantic models require flattened data only
D. Denormalized data reduces licensing costs

Correct Answer: B

Explanation:
Flattened tables reduce the number of relationships and joins, improving performance and making semantic models easier to design and maintain.

7. In which scenario should denormalization generally be avoided?

A. Preparing a reporting data mart
B. Building a self-service analytics dataset
C. Supporting frequent transactional updates
D. Optimizing dashboard query speed

Correct Answer: C

Explanation:
Denormalization is not ideal for transactional systems where frequent updates and strict data integrity are required.

8. Where is denormalization commonly implemented in Microsoft Fabric?

A. User interface settings
B. Workspace-level permissions
C. Lakehouses, warehouses, and Dataflows Gen2
D. Real-Time hub only

Correct Answer: C

Explanation:
Denormalization is a data transformation task typically implemented in Fabric lakehouses, warehouses, notebooks, or Dataflows Gen2.

9. What is a common benefit of denormalizing data earlier in the data pipeline?

A. Reduced need for data validation
B. Improved consistency across analytics assets
C. Automatic enforcement of row-level security
D. Lower data ingestion costs

Correct Answer: B

Explanation:
Denormalizing upstream ensures that all downstream analytics assets consume the same enriched and flattened datasets, improving consistency.

10. Which phrase best indicates that denormalization is an appropriate solution?

A. “Strict transactional consistency is required”
B. “Data must be updated in real time per record”
C. “Queries require many joins and are slow”
D. “Source systems change frequently”

Correct Answer: C

Explanation:
Denormalization is commonly applied when complex joins cause performance issues and simplified querying is required.