Category: Data Integration (ETL)

Exam Prep Hub for DP-600: Implementing Analytics Solutions Using Microsoft Fabric

This is your one-stop hub for preparing for the DP-600: Implementing Analytics Solutions Using Microsoft Fabric certification exam. Upon successful completion of the exam, you earn the Fabric Analytics Engineer Associate certification.

This hub provides information directly, links to external resources, tips for preparing for the exam, practice tests, and section questions to help you prepare. Bookmark this page and use it as a guide to ensure that you fully cover all relevant topics for the exam and take advantage of as many of the available resources as possible. We hope you find it convenient and helpful.

Why take the DP-600: Implementing Analytics Solutions Using Microsoft Fabric exam and earn the Fabric Analytics Engineer Associate certification?

Most likely, you already know why you want to earn this certification, but in case you are weighing its benefits, here are a few:
(1) career advancement, because Microsoft Fabric is a leading data platform used by companies of all sizes around the world and is likely to become even more popular;
(2) greater job opportunities, thanks to the edge the certification gives you;
(3) higher earning potential;
(4) expanded knowledge of the Fabric platform, because preparing takes you beyond what you would normally do on the job;
(5) immediate credibility for your knowledge; and
(6) greater confidence in your knowledge and skills.


Important DP-600 resources:


DP-600: Skills measured as of October 31, 2025:

Here you can study in a structured manner by going through the exam topics one by one to ensure full coverage; click each hyperlinked topic below for more information:

Skills at a glance

  • Maintain a data analytics solution (25%-30%)
  • Prepare data (45%-50%)
  • Implement and manage semantic models (25%-30%)

Maintain a data analytics solution (25%-30%)

Implement security and governance

Maintain the analytics development lifecycle

Prepare data (45%-50%)

Get Data

Transform Data

Query and analyze data

Implement and manage semantic models (25%-30%)

Design and build semantic models

Optimize enterprise-scale semantic models


Practice Exams:

We have provided two practice exams with answer keys to help you prepare.

DP-600 Practice Exam 1 (60 questions with answer key)

DP-600 Practice Exam 2 (60 questions with answer key)


Good luck passing the DP-600: Implementing Analytics Solutions Using Microsoft Fabric certification exam and earning the Fabric Analytics Engineer Associate certification!

Implement Performance Improvements in Queries and Report Visuals (DP-600 Exam Prep)

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Optimize enterprise-scale semantic models
--> Implement performance improvements in queries and report visuals

Performance optimization is a critical skill for the Fabric Analytics Engineer. In enterprise-scale semantic models, poor query design, inefficient DAX, or overly complex visuals can significantly degrade report responsiveness and user experience. This exam section focuses on identifying performance bottlenecks and applying best practices to improve query execution, model efficiency, and report rendering.


1. Understand Where Performance Issues Occur

Performance problems typically fall into three layers:

a. Data & Storage Layer

  • Storage mode (Import, DirectQuery, Direct Lake, Composite)
  • Data source latency
  • Table size and cardinality
  • Partitioning and refresh strategies

b. Semantic Model & Query Layer

  • DAX calculation complexity
  • Relationships and filter propagation
  • Aggregation design
  • Use of calculation groups and measures

c. Report & Visual Layer

  • Number and type of visuals
  • Cross-filtering behavior
  • Visual-level queries
  • Use of slicers and filters

DP-600 questions often test your ability to identify the correct layer where optimization is needed.


2. Optimize Queries and Semantic Model Performance

a. Choose the Appropriate Storage Mode

  • Use Import for small-to-medium datasets requiring fast interactivity
  • Use Direct Lake for large OneLake Delta tables with high concurrency
  • Use Composite models to balance performance and real-time access
  • Avoid unnecessary DirectQuery when Import or Direct Lake is feasible

b. Reduce Data Volume

  • Remove unused columns and tables
  • Reduce column cardinality (e.g., avoid high-cardinality text columns)
  • Prefer surrogate keys over natural keys
  • Disable Auto Date/Time when not needed

c. Optimize Relationships

  • Use single-direction relationships by default
  • Avoid unnecessary bidirectional filters
  • Ensure relationships follow a star schema
  • Avoid many-to-many relationships unless required

d. Use Aggregations

  • Create aggregation tables to pre-summarize large fact tables
  • Enable query hits against aggregation tables before scanning detailed data
  • Especially valuable in composite models

3. Improve DAX Query Performance

a. Write Efficient DAX

  • Prefer measures over calculated columns
  • Use variables (VAR) to avoid repeated calculations
  • Minimize row context where possible
  • Avoid excessive iterators (SUMX, FILTER) over large tables
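
For example, a variable lets you compute an expression once and reuse it instead of repeating the calculation inside the measure. The sketch below assumes a FactSales table with SalesAmount and Cost columns (hypothetical names):

// VAR evaluates each expression once; RETURN reuses the stored results
Profit Margin % =
VAR TotalSalesAmount = SUM ( FactSales[SalesAmount] )
VAR TotalCost = SUM ( FactSales[Cost] )
RETURN
    DIVIDE ( TotalSalesAmount - TotalCost, TotalSalesAmount )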

b. Use Filter Context Efficiently

  • Prefer CALCULATE with simple filters
  • Avoid complex nested FILTER expressions
  • Use KEEPFILTERS and REMOVEFILTERS intentionally
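
As a small illustration (the [Total Sales] measure and the DimProduct table are assumed), a simple column filter inside CALCULATE is usually more efficient than wrapping the whole table in FILTER, and KEEPFILTERS keeps the filter additive rather than replacing the existing context:

// Simple Boolean column filter: efficient and easy to read
Bike Sales =
CALCULATE ( [Total Sales], DimProduct[Category] = "Bikes" )

// KEEPFILTERS intersects with, rather than replaces, any existing Category filter
Bike Sales (Keep Filters) =
CALCULATE ( [Total Sales], KEEPFILTERS ( DimProduct[Category] = "Bikes" ) )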

c. Avoid Expensive Patterns

  • Avoid EARLIER in favor of variables
  • Avoid dynamic table generation inside visuals
  • Minimize use of ALL when ALLSELECTED or scoped filters suffice
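
For instance, in a calculated column a variable can capture the current row's value and replace the older EARLIER pattern. This column is only a sketch and assumes a DimCustomer table with a TotalSales column:

// Calculated column on DimCustomer: rank customers by TotalSales without EARLIER
Sales Rank =
VAR CurrentSales = DimCustomer[TotalSales]
RETURN
    COUNTROWS ( FILTER ( DimCustomer, DimCustomer[TotalSales] > CurrentSales ) ) + 1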

4. Optimize Report Visual Performance

a. Reduce Visual Complexity

  • Limit the number of visuals per page
  • Avoid visuals that generate multiple queries (e.g., complex custom visuals)
  • Use summary visuals instead of detailed tables where possible

b. Control Interactions

  • Disable unnecessary visual interactions
  • Avoid excessive cross-highlighting
  • Use report-level filters instead of visual-level filters when possible

c. Optimize Slicers

  • Avoid slicers on high-cardinality columns
  • Use dropdown slicers instead of list slicers
  • Limit the number of slicers on a page

d. Prefer Measures Over Visual Calculations

  • Avoid implicit measures created by dragging numeric columns
  • Define explicit measures in the semantic model
  • Reuse measures across visuals to improve cache efficiency

5. Use Performance Analysis Tools

a. Performance Analyzer

  • Identify slow visuals
  • Measure DAX query duration
  • Distinguish between query time and visual rendering time

b. Query Diagnostics (Power BI Desktop)

  • Analyze backend query behavior
  • Identify expensive DirectQuery or Direct Lake operations

c. DAX Studio (Advanced)

  • Analyze query plans
  • Measure storage engine vs formula engine time
  • Identify inefficient DAX patterns

(You won’t be tested on tool UI details, but knowing when and why to use them is exam-relevant.)


6. Common DP-600 Exam Scenarios

You may be asked to:

  • Identify why a report is slow and choose the best optimization
  • Identify the bottleneck layer (model, query, or visual)
  • Select the most appropriate storage mode for performance
  • Choose the least disruptive, most effective optimization
  • Improve a slow DAX measure
  • Reduce visual rendering time without changing the data source
  • Optimize performance for enterprise-scale models
  • Apply enterprise-scale best practices, not just quick fixes

Key Exam Takeaways

  • Always optimize the model first, visuals second
  • Star schema + clean relationships = better performance
  • Efficient DAX matters more than clever DAX
  • Fewer visuals and interactions = faster reports
  • Aggregations and Direct Lake are key enterprise-scale tools

Practice Questions:

Go to the Practice Exam Questions for this topic.

Implement Relationships, Such as Bridge Tables and Many-to-Many Relationships

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Design and build semantic models
--> Implement relationships, such as bridge tables and many-to-many relationships

Why Relationships Matter in Semantic Models

In Microsoft Fabric and Power BI semantic models, relationships define how tables interact and how filters propagate across data. Well-designed relationships are critical for:

  • Accurate aggregations
  • Predictable filtering behavior
  • Correct DAX calculations
  • Optimal query performance

While one-to-many relationships are preferred, real-world data often requires handling many-to-many relationships using techniques such as bridge tables.


Common Relationship Types in Semantic Models

1. One-to-Many (Preferred)

  • One dimension row relates to many fact rows
  • Most common and performant relationship
  • Typical in star schemas

Example:

  • DimCustomer → FactSales

2. Many-to-Many

  • Multiple rows in one table relate to multiple rows in another
  • More complex filtering behavior
  • Can negatively impact performance if not modeled correctly

Example:

  • Customers associated with multiple regions
  • Products assigned to multiple categories

Understanding Many-to-Many Relationships

Native Many-to-Many Relationships

Power BI supports direct many-to-many relationships, but these should be used carefully.

Characteristics:

  • Cardinality: Many-to-many
  • Filters propagate ambiguously
  • DAX becomes harder to reason about

Exam Tip:
Direct many-to-many relationships are supported but not always recommended for complex models.


Bridge Tables (Best Practice)

A bridge table (also called a factless fact table) resolves many-to-many relationships by introducing an intermediate table.

What Is a Bridge Table?

A table that:

  • Contains keys from two related entities
  • Has no numeric measures
  • Enables controlled filtering paths

Example Scenario

Business case:
Products can belong to multiple categories.

Tables:

  • DimProduct (ProductID, Name)
  • DimCategory (CategoryID, CategoryName)
  • BridgeProductCategory (ProductID, CategoryID)

Relationships:

  • DimProduct → BridgeProductCategory (one-to-many)
  • DimCategory → BridgeProductCategory (one-to-many)

This converts a many-to-many relationship into two one-to-many relationships.
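
A minimal DAX sketch using the tables above: because each dimension filters the bridge through its one-to-many relationship, selecting a category filters the bridge rows, and the measure counts the related products.

// Distinct products for the categories in the current filter context,
// resolved through the bridge table
Products in Selected Categories =
DISTINCTCOUNT ( BridgeProductCategory[ProductID] )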


Benefits of Using Bridge Tables

Benefit | Description
Predictable filtering | Clear filter paths
Better DAX control | Easier to write and debug measures
Improved performance | Avoids ambiguous joins
Scalability | Handles complex relationships cleanly

Filter Direction Considerations

Single vs Bidirectional Filters

  • Single direction (recommended):
    Filters flow from dimension → bridge → fact
  • Bidirectional:
    Can simplify some scenarios but increases ambiguity

Exam Guidance:

  • Use single-direction filters by default
  • Enable bidirectional filtering only when required and understood

Many-to-Many and DAX Implications

When working with many-to-many relationships:

  • Measures may return unexpected results
  • DISTINCTCOUNT is commonly required
  • Explicit filtering using DAX functions may be necessary

Common DAX patterns:

  • CALCULATE
  • TREATAS
  • CROSSFILTER (advanced)
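
As a hedged illustration of the TREATAS pattern (every table and column name here is hypothetical), values from a disconnected selection table can be applied as a filter on a column they are not physically related to:

// Treat the regions in a disconnected SelectedRegions table
// as if they filtered DimCustomer[Region]
Sales for Selected Regions =
CALCULATE (
    SUM ( FactSales[SalesAmount] ),
    TREATAS ( VALUES ( SelectedRegions[Region] ), DimCustomer[Region] )
)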

Relationship Best Practices for DP-600

  • Favor star schemas with one-to-many relationships
  • Use bridge tables instead of direct many-to-many when possible
  • Avoid unnecessary bidirectional filters
  • Validate relationship cardinality and direction
  • Test measures under different filtering scenarios

Common Exam Scenarios

You may see questions like:

  • “How do you model a relationship where products belong to multiple categories?”
  • “What is the purpose of a bridge table?”
  • “What are the risks of many-to-many relationships?”

Correct answers typically emphasize:

  • Bridge tables
  • Controlled filter propagation
  • Avoiding ambiguous relationships

Star Schema vs Many-to-Many Models

Feature | Star Schema | Many-to-Many
Complexity | Low | Higher
Performance | Better | Lower
DAX simplicity | High | Lower
Use cases | Most analytics | Specialized scenarios

Summary

Implementing relationships correctly is foundational to building reliable semantic models in Microsoft Fabric:

  • One-to-many relationships are preferred
  • Many-to-many relationships should be handled carefully
  • Bridge tables provide a scalable, exam-recommended solution
  • Clear relationships lead to accurate analytics and simpler DAX

Exam Tip

If a question involves multiple entities relating to each other, or many-to-many relationships, the answer most likely involves using a “bridge table”.

Practice Questions:

Here are 10 questions to test and solidify your knowledge. As you review these and other questions in your preparation, make sure to:

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for keywords in exam questions and understand how they signal the intended scenario
  • Expect scenario-based questions rather than direct definitions

1. Which relationship type is generally preferred in Power BI semantic models?

A. Many-to-many
B. One-to-one
C. One-to-many
D. Bidirectional many-to-many

Correct Answer: C

Explanation:
One-to-many relationships provide predictable filter propagation, better performance, and simpler DAX calculations.


2. What is the primary purpose of a bridge table?

A. Store aggregated metrics
B. Normalize dimension attributes
C. Resolve many-to-many relationships
D. Improve data refresh performance

Correct Answer: C

Explanation:
Bridge tables convert many-to-many relationships into two one-to-many relationships, improving model clarity and control.


3. Which characteristic best describes a bridge table?

A. Contains numeric measures
B. Stores transactional data
C. Contains keys from related tables only
D. Is always filtered bidirectionally

Correct Answer: C

Explanation:
Bridge tables typically contain only keys (foreign keys) and no measures, enabling relationship resolution.


4. What is a common risk of using native many-to-many relationships directly?

A. They cannot be refreshed
B. They cause data duplication
C. They create ambiguous filter propagation
D. They are unsupported in Fabric

Correct Answer: C

Explanation:
Native many-to-many relationships can result in ambiguous filtering and unpredictable aggregation results.


5. In a bridge table scenario, how are relationships typically defined?

A. Many-to-many on both sides
B. One-to-one from both dimensions
C. One-to-many from each dimension to the bridge
D. Bidirectional many-to-one

Correct Answer: C

Explanation:
Each dimension connects to the bridge table using a one-to-many relationship.


6. When should bidirectional filtering be enabled?

A. Always, for simplicity
B. Only when necessary and well-understood
C. Only on fact tables
D. Never in semantic models

Correct Answer: B

Explanation:
Bidirectional filters can be useful but introduce complexity and ambiguity if misused.


7. Which scenario is best handled using a bridge table?

A. A customer has one address
B. A sale belongs to one product
C. A product belongs to multiple categories
D. A date table relates to a fact table

Correct Answer: C

Explanation:
Products belonging to multiple categories is a classic many-to-many scenario requiring a bridge table.


8. How does a properly designed bridge table affect DAX measures?

A. Makes measures harder to write
B. Requires custom SQL logic
C. Enables predictable filter behavior
D. Eliminates the need for CALCULATE

Correct Answer: C

Explanation:
Bridge tables create clear filter paths, making DAX behavior more predictable and reliable.


9. Which DAX function is commonly used to handle complex many-to-many filtering scenarios?

A. SUMX
B. RELATED
C. TREATAS
D. LOOKUPVALUE

Correct Answer: C

Explanation:
TREATAS is often used to apply filters across tables that are not directly related.


10. For DP-600 exam questions involving many-to-many relationships, which solution is typically preferred?

A. Direct many-to-many relationships
B. Denormalized fact tables
C. Bridge tables with one-to-many relationships
D. Duplicate dimension tables

Correct Answer: C

Explanation:
The exam emphasizes scalable, maintainable modeling practices — bridge tables are the recommended solution.


Implement a Star Schema for a Semantic Model

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models
--> Design and build semantic models
--> Implement a Star Schema for a Semantic Model

What Is a Star Schema?

A star schema is a logical data modeling pattern optimized for analytics and reporting. It organizes data into:

  • Fact tables: Contain numeric measurements (metrics) of business processes
  • Dimension tables: Contain descriptive attributes used for slicing, grouping, and filtering

The schema resembles a star: a central fact table with multiple dimensions radiating outward.


Why Use a Star Schema for Semantic Models?

Star schemas are widely used in Power BI semantic models (Tabular models) because they:

  • Improve query performance: Simplified joins and clear relationships enable efficient engine processing
  • Simplify reporting: Easy for report authors to understand and navigate
  • Support fast aggregations: Summary measures are computed more efficiently
  • Integrate with DAX naturally: Reduces complexity of measures

In DP-600 scenarios where performance and reusability matter, star schemas are often the best design choice.


Semantic Models and Star Schema

Semantic models define business logic that sits on top of data. Star schemas support semantic models by:

  • Providing clean dimensional context (e.g., Product, Region, Time)
  • Ensuring facts are centrally located for aggregations
  • Reducing the number of relationships and cycles
  • Enabling measures to be defined once and reused across visuals

Semantic models typically import star schema tables into Power BI, Direct Lake, or DirectQuery contexts.


Elements of a Star Schema

Fact Tables

A fact table stores measurable, numeric data about business events.

Examples:

  • Sales
  • Orders
  • Transactions
  • Inventory movements

Characteristics:

  • Contains foreign keys referring to dimensions
  • Contains numeric measures (e.g., quantity, revenue)

Dimension Tables

Dimension tables store contextual attributes that describe facts.

Examples:

  • Customer (name, segment, region)
  • Product (category, brand)
  • Date (calendar attributes)
  • Store or location

Characteristics:

  • Typically smaller than fact tables
  • Used to filter and group measures

Building a Star Schema for a Semantic Model

1. Identify the Grain of the Fact Table

The grain defines the level of detail in the fact table — for example:

  • One row per sales transaction per customer per day

Understand the grain before building dimensions.


2. Design Dimension Tables

Dimensions should be:

  • Descriptive
  • De-duplicated
  • Hierarchical where relevant (e.g., Country > State > City)

Example:

DimProduct | DimCustomer | DimDate
ProductID | CustomerID | DateKey
Name | Name | Year
Category | Segment | Quarter
Brand | Region | Month

3. Define Relationships

Semantic models should have clear relationships:

  • Fact → Dimension: one-to-many
  • No ambiguous cycles
  • Avoid overly complex circular relationships

In a star schema:

  • Fact table joins to each dimension
  • Dimensions do not join to each other directly

4. Import into Semantic Model

In Power BI Desktop or Fabric:

  • Load fact and dimension tables
  • Validate relationships
  • Ensure correct cardinality
  • Mark the Date dimension as a Date table if appropriate

Benefits in Semantic Modeling

Benefit | Description
Performance | Simplified relationships yield faster queries
Usability | Model is intuitive for report authors
Maintenance | Easier to document and manage
DAX Simplicity | Measures use clear filter paths

DAX and Star Schema

Star schemas make DAX measures more predictable:

Example measure:

Total Sales = SUM(FactSales[SalesAmount])

With a proper star schema:

  • Filtering by dimension (e.g., DimCustomer[Region] = “West”) automatically propagates to the fact table
  • DAX measure logic is clean and consistent
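
Building on the measure above (and assuming the DimCustomer dimension shown earlier), a region-specific variation relies on the same one-to-many filter propagation:

// The filter on the dimension flows to FactSales through the relationship
West Sales =
CALCULATE ( [Total Sales], DimCustomer[Region] = "West" )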

Star Schema vs Snowflake Schema

Feature | Star Schema | Snowflake Schema
Complexity | Simple | More complex
Query performance | Typically better | Slightly slower
Modeling effort | Lower | Higher
Normalization | Low | High

For analytical workloads (like in Fabric and Power BI), star schemas are generally preferred.


When to Apply a Star Schema

Use star schema design when:

  • You are building semantic models for BI/reporting
  • Data is sourced from multiple systems
  • You need to support slicing and dicing by multiple dimensions
  • Performance and maintainability are priorities

Semantic models built on star schemas work well with:

  • Import mode
  • Direct Lake with dimensional context
  • Composite models

Common Exam Scenarios

You might encounter questions like:

  • “Which table should be the fact in this model?”
  • “Why should dimensions be separated from fact tables?”
  • “How does a star schema improve performance in a semantic model?”

Key answers will focus on:

  • Simplified relationships
  • Better DAX performance
  • Intuitive filtering and slicing

Best Practices for Semantic Star Schemas

  • Explicitly define date tables and mark them as such
  • Avoid many-to-many relationships where possible
  • Keep dimensions denormalized (flattened)
  • Ensure fact tables have surrogate keys linking to dimensions
  • Validate cardinality and relationship directions

Exam Tip

If a question emphasizes performance, simplicity, clear filtering behavior, and ease of reporting, a star schema is likely the correct design choice and the best answer.


Summary

Implementing a star schema for a semantic model is a proven best practice in analytics:

  • Central fact table
  • Descriptive dimensions
  • One-to-many relationships
  • Optimized for DAX and interactive reporting

This approach supports Fabric’s goal of providing fast, flexible, and scalable analytics.

Practice Questions:

Here are 10 questions to test and solidify your knowledge. As you review these and other questions in your preparation, make sure to:

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for keywords in exam questions and understand how they signal the intended scenario
  • Expect scenario-based questions rather than direct definitions

1. What is the primary purpose of a star schema in a semantic model?

A. To normalize data to reduce storage
B. To optimize transactional workloads
C. To simplify analytics and improve query performance
D. To enforce row-level security

Correct Answer: C

Explanation:
Star schemas are designed specifically for analytics. They simplify relationships and improve query performance by organizing data into fact and dimension tables.


2. In a star schema, what type of data is typically stored in a fact table?

A. Descriptive attributes such as names and categories
B. Hierarchical lookup values
C. Numeric measures related to business processes
D. User-defined calculated columns

Correct Answer: C

Explanation:
Fact tables store measurable, numeric values such as revenue, quantity, or counts, which are analyzed across dimensions.


3. Which relationship type is most common between fact and dimension tables in a star schema?

A. One-to-one
B. One-to-many
C. Many-to-many
D. Bidirectional many-to-many

Correct Answer: B

Explanation:
Each dimension record (e.g., a customer) can relate to many fact records (e.g., multiple sales), making one-to-many relationships standard.


4. Why are star schemas preferred over snowflake schemas in Power BI semantic models?

A. Snowflake schemas require more storage
B. Star schemas improve DAX performance and model usability
C. Snowflake schemas are not supported in Fabric
D. Star schemas eliminate the need for relationships

Correct Answer: B

Explanation:
Star schemas reduce relationship complexity, making DAX calculations simpler and improving query performance.


5. Which table should typically contain a DateKey column in a star schema?

A. Dimension tables only
B. Fact tables only
C. Both fact and dimension tables
D. Neither table type

Correct Answer: C

Explanation:
The fact table uses DateKey as a foreign key, while the Date dimension uses it as a primary key.


6. What is the “grain” of a fact table?

A. The number of rows in the table
B. The level of detail represented by each row
C. The number of dimensions connected
D. The data type of numeric columns

Correct Answer: B

Explanation:
Grain defines what a single row represents (e.g., one sale per customer per day).


7. Which modeling practice helps ensure optimal performance in a semantic model?

A. Creating relationships between dimension tables
B. Using many-to-many relationships by default
C. Keeping dimensions denormalized
D. Storing text attributes in the fact table

Correct Answer: C

Explanation:
Denormalized (flattened) dimension tables reduce joins and improve query performance in analytic models.


8. What happens when a dimension is used to filter a report in a properly designed star schema?

A. The filter applies only to the dimension table
B. The filter automatically propagates to the fact table
C. The filter is ignored by measures
D. The filter causes a many-to-many relationship

Correct Answer: B

Explanation:
Filters flow from dimension tables to the fact table through one-to-many relationships.


9. Which scenario is best suited for a star schema in a semantic model?

A. Real-time transactional processing
B. Log ingestion with high write frequency
C. Interactive reporting with slicing and aggregation
D. Application-level CRUD operations

Correct Answer: C

Explanation:
Star schemas are optimized for analytical queries involving aggregation, filtering, and slicing.


10. What is a common modeling mistake when implementing a star schema?

A. Using surrogate keys
B. Creating direct relationships between dimension tables
C. Marking a date table as a date table
D. Defining one-to-many relationships

Correct Answer: B

Explanation:
Dimensions should not typically relate to each other directly in a star schema, as this introduces unnecessary complexity.


Choose a storage mode – additional information

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models
--> Design and build semantic models
--> Choose a storage mode

This is supplemental information to what is included in the "Choose a storage mode" post.

DP-600 Cheat Sheet: Choosing a Storage Mode in Microsoft Fabric

Storage Mode Decision Matrix

Requirement / Scenario | Import | DirectQuery | Direct Lake | Composite
Best query performance | ✅ Excellent | ❌ Depends on source | ✅ Excellent | ✅ Very good
Near real-time data | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes
Large datasets (TB-scale) | ❌ Limited | ✅ Yes | ✅ Yes | ✅ Yes
Minimal refresh overhead | ❌ Requires refresh | ✅ No refresh | ✅ No refresh | ⚠ Partial
Uses OneLake Delta tables | ❌ Not required | ❌ Not required | ✅ Required | ✅ Optional
Full DAX & modeling features | ✅ Full support | ⚠ Limited | ⚠ Limited | ✅ Full
Calculated tables supported | ✅ Yes | ❌ No | ❌ No | ✅ Yes (Import tables only)
Lowest data duplication | ❌ High | ✅ None | ✅ None | ⚠ Mixed
Simple to manage | ✅ Yes | ⚠ Depends on source | ⚠ Fabric-specific | ❌ More complex

When to Choose Each Storage Mode

✅ Import Mode — Choose when:

  • Dataset fits comfortably in memory
  • You need complex DAX, calculated tables, or calculated columns
  • Performance is the top priority
  • Data freshness can be managed via scheduled refresh

Exam clue words: fastest, complex calculations, small to medium data


✅ DirectQuery — Choose when:

  • Data must always be current
  • Source system is highly optimized (SQL, Synapse, etc.)
  • Data volume is very large
  • You want zero data duplication

Exam clue words: real-time, source system, no refresh


✅ Direct Lake — Choose when:

  • Data is stored as Delta tables in OneLake
  • Dataset is large and frequently updated
  • You want Import-like performance without refresh
  • You’re working fully within Fabric

Exam clue words: OneLake, Delta, no refresh, Fabric-optimized


✅ Composite Model — Choose when:

  • You need flexibility across different tables
  • Fact tables are large and live (Direct Lake / DirectQuery)
  • Dimension tables are small and stable (Import)
  • You want performance and modeling flexibility

Exam clue words: hybrid, mix storage modes, dimension vs fact


Fast Exam Inclusion/Elimination Tips

  • Calculated tables required? → Import or Composite
  • OneLake + Delta tables? → Direct Lake
  • Real-time + external source? → DirectQuery
  • Best balance of flexibility and scale? → Composite

One-Sentence Exam Rule

If it’s in OneLake and too big to refresh, Direct Lake is usually the right answer.

Choose a Storage Mode

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Design and build semantic models
--> Choose a storage mode

What Is Storage Mode?

In Microsoft Fabric, storage mode determines how a semantic model accesses and processes data. It affects performance, freshness, compute behavior, and model capabilities. Choosing the right storage mode is critical when designing semantic models for analytics and reporting.

A semantic model (Power BI dataset) can use different storage modes for its tables — and when multiple modes coexist, the model is called a composite model.


Common Storage Modes

There are three primary storage modes you should know for the exam:

1. Import Mode

  • Stores data inside the semantic model in memory (VertiPaq) after a refresh.
  • Offers fast query performance since data is cached locally.
  • Requires scheduled or manual refresh to update data from the source.
  • Supports the full range of modeling features (e.g., calculated tables, complex DAX).

When to use Import Mode:

  • Data fits in memory and doesn’t need real-time freshness.
  • You need complex calculations or modeling features requiring data in memory.
  • You want high performance for interactive analytics.

Pros:

  • Very fast interactive queries
  • Full DAX and modeling capabilities

Cons:

  • Must schedule refreshes
  • Data freshness depends on refresh cadence

2. DirectQuery Mode

  • The semantic model does not store data locally — queries are sent to the underlying source (SQL, warehouse, etc.) at query time.
  • Ensures real-time or near-real-time data because no import refresh is needed.

When to use DirectQuery:

  • Source data changes frequently and must always show the latest results.
  • Data volumes are too large to import fully.

Pros:

  • Real-time access to source data
  • No refresh cycles required

Cons:

  • Performance depends heavily on source system
  • Some modeling features may be limited compared with Import

3. Direct Lake Mode

A newer, Fabric-specific storage mode designed to combine performance and freshness:

  • Reads Delta tables directly from OneLake and loads necessary column data into memory.
  • Avoids full data copy, eliminating the long import refresh cycle.
  • Uses the VertiPaq engine for fast aggregations and interactions (similar to Import).
  • Offers low-latency sync with source changes without heavy refresh workloads.
  • Supports real-time insights while minimizing data movement.

When to use Direct Lake:

  • Working with extremely large datasets that would be costly or impractical to import entirely.
  • Needing relatively fresh data without long refresh cycles typical of Import mode.
  • Integrating tightly with delta-based assets such as Fabric lakehouses and warehouses.

Pros:

  • Fast querying with fresher data than import
  • No heavy refresh cycles
  • Leverages OneLake integration and existing delta tables

Cons:

  • Some modeling features (like calculated tables) are limited or not supported in Direct Lake tables (those tables must be switched to Import if needed).
  • May fall back to DirectQuery in certain conditions (e.g., tables requiring SQL endpoint security).

Composite Models

A semantic model may include a mix of storage modes — for example, some tables in Direct Lake and others in Import. This is called a composite model.

Typical use cases for composite models:

  • Import frequently used dimension tables (to support calculated tables)
  • Use Direct Lake for large fact tables stored in OneLake
  • Balance performance with modeling flexibility

Choosing the Right Storage Mode — Key Factors

When deciding on a storage mode for your semantic model, consider:

1. Data Freshness Requirements

  • Real-time data? → DirectQuery or Direct Lake
  • Static or periodic data? → Import

2. Dataset Size

  • Large volumes (multi-TB) without capacity for full import? → Direct Lake
  • Manageable size within memory? → Import

3. Modeling Features Needed

  • Complex measures, calculated tables, custom hierarchies? → Import (or mix)

4. Performance Needs

  • High interactive performance with good freshness? → Direct Lake
  • Ultimate speed with full caching? → Import

5. Source Capabilities

  • Some sources may not support DirectQuery efficiently — understand source performance.

Practical Examples

  • Import Mode: Small/medium enterprise data warehouse reporting that runs daily refreshes.
  • DirectQuery: Regulatory reporting where every query must reflect the latest operational data in a SQL system.
  • Direct Lake: Analytics on massive delta datasets stored in OneLake, where import is impractical but freshness and performance are both essential.

Exam Tips

  • Know what each mode does (Import vs DirectQuery vs Direct Lake).
  • Understand trade-offs between performance, freshness, and modeling capability.
  • Recognize Direct Lake as a Fabric-optimized hybrid mode ideal for delta lake data.
  • Be prepared to choose the mode based on scenario requirements like latency, size, and features.

Summary

Storage Mode | Data Location | Refresh | Performance | Best Use Case
Import | In model memory | Scheduled | Very fast | Smaller datasets needing complex logic
DirectQuery | Source | Real-time | Source-dependent | Real-time needs
Direct Lake | OneLake delta files | Near real-time | Fast, scalable | Large datasets in OneLake

Practice Questions:

Here are 10 questions to test and solidify your knowledge. As you review these and other questions in your preparation, make sure to:

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for keywords in exam questions and understand how they signal the intended scenario
  • Expect scenario-based questions rather than direct definitions

1. Which storage mode stores data fully in memory within the semantic model?

A. DirectQuery
B. Direct Lake
C. Import
D. Composite

Correct Answer: C. Import

Explanation:
Import mode loads data into the VertiPaq in-memory engine inside the semantic model, providing the fastest query performance but requiring refreshes.


2. Which storage mode provides real-time access to data by querying the source system at query time?

A. Import
B. DirectQuery
C. Direct Lake
D. Cached

Correct Answer: B. DirectQuery

Explanation:
DirectQuery does not store data locally. Each query is sent directly to the source system, ensuring real-time or near-real-time results.


3. What is a key advantage of Direct Lake compared to Import mode?

A. Supports more DAX functions
B. Requires no OneLake integration
C. Avoids full data refresh while maintaining high performance
D. Works only with SQL Server

Correct Answer: C. Avoids full data refresh while maintaining high performance

Explanation:
Direct Lake reads Delta tables directly from OneLake, avoiding large import refreshes while still using the VertiPaq engine for fast analytics.


4. Which scenario is best suited for Import mode?

A. A dataset requiring real-time updates every second
B. A small to medium dataset with complex DAX calculations
C. A multi-terabyte lakehouse fact table
D. Streaming event data

Correct Answer: B. A small to medium dataset with complex DAX calculations

Explanation:
Import mode supports the full range of modeling features and offers excellent performance for datasets that fit comfortably in memory.


5. Which storage mode is specifically optimized for Delta tables stored in OneLake?

A. Import
B. DirectQuery
C. Direct Lake
D. Hybrid

Correct Answer: C. Direct Lake

Explanation:
Direct Lake is a Fabric-optimized storage mode designed to work directly with Delta tables in OneLake.


6. A semantic model includes some tables in Import mode and others in Direct Lake mode. What is this called?

A. Hybrid model
B. Incremental model
C. Composite model
D. Federated model

Correct Answer: C. Composite model

Explanation:
A composite model uses multiple storage modes within the same semantic model, allowing flexibility between performance and freshness.


7. Which limitation applies to Direct Lake tables?

A. They cannot be refreshed
B. They do not support relationships
C. Calculated tables are not supported directly
D. They cannot be queried using DAX

Correct Answer: C. Calculated tables are not supported directly

Explanation:
Calculated tables require Import mode. Direct Lake tables must be switched to Import if calculated tables are needed.


8. What primarily determines query performance when using DirectQuery mode?

A. The VertiPaq engine
B. The refresh schedule
C. The source system’s performance
D. OneLake caching

Correct Answer: C. The source system’s performance

Explanation:
In DirectQuery mode, queries are executed against the source system, so performance depends on source optimization and capacity.


9. Which storage mode minimizes data duplication while still offering high query performance?

A. Import
B. DirectQuery
C. Direct Lake
D. Cached Import

Correct Answer: C. Direct Lake

Explanation:
Direct Lake avoids copying data into the model while still leveraging in-memory query acceleration, minimizing duplication and refresh overhead.


10. You need near real-time analytics on a very large dataset stored in OneLake without long refresh times. Which storage mode should you choose?

A. Import
B. DirectQuery
C. Direct Lake
D. Snapshot

Correct Answer: C. Direct Lake

Explanation:
Direct Lake is ideal for large OneLake datasets where full import refreshes are impractical but fast, fresh analytics are required.

Select, Filter, and Aggregate Data by Using KQL

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Query and analyze data
--> Select, filter, and aggregate data by using KQL

The Kusto Query Language (KQL) is a read-only request language used for querying large, distributed, event-driven datasets — especially within Eventhouse and Azure Data Explorer–backed workloads in Microsoft Fabric. KQL enables you to select, filter, and aggregate data efficiently in scenarios involving high-velocity data like telemetry, logs, and streaming events.

For the DP-600 exam, you should understand KQL basics and how it supports data exploration and analytical summarization in a real-time analytics context.


KQL Basics

KQL is designed to be expressive and performant for time-series or log-like data. Queries are built as a pipeline of operations, where each operator transforms the data and passes it to the next.


Selecting Data

In KQL, the project operator performs the equivalent of selecting columns:

EventHouseTable
| project Timestamp, Country, EventType, Value

  • project lets you choose which fields to include
  • You can rename fields inline: | project Time=Timestamp, Sales=Value

Exam Tip:
Use project early to limit data to relevant columns and reduce processing downstream.


Filtering Data

Filtering in KQL is done using the where operator:

EventHouseTable
| where Country == "USA"

Multiple conditions can be combined with and/or:

| where Value > 100 and EventType == "Purchase"

Filtering early in the pipeline improves performance by reducing the dataset before subsequent transformations.


Aggregating Data

KQL uses the summarize operator to perform aggregations such as counts, sums, averages, min, max, etc.

Example – Aggregate Total Sales:

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value)

Example – Grouped Aggregation:

EventHouseTable
| where EventType == "Purchase"
| summarize CountEvents = count(), TotalSales = sum(Value) by Country

Time-Bucketed Aggregation

KQL supports time binning using bin():

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value) by Country, bin(Timestamp, 1h)

This groups results into hourly buckets, which is ideal for time-series analytics and dashboards.


Common KQL Aggregation Functions

Function | Description
count() | Total number of records
sum(column) | Sum of numeric values
avg(column) | Average value
min(column) / max(column) | Minimum / maximum value
percentile(column, p) | Percentile calculation
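
A short sketch combining several of these functions in a single summarize, using the same illustrative EventHouseTable as the earlier examples:

EventHouseTable
| where EventType == "SensorReading"
// one output row per country with several summary statistics
| summarize AvgValue = avg(Value), MinValue = min(Value), MaxValue = max(Value), P95 = percentile(Value, 95) by Country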

Combining Operators

KQL queries are often a combination of select, filter, and aggregation:

EventHouseTable
| where EventType == "Purchase" and Timestamp >= ago(7d)
| project Country, Value, Timestamp
| summarize TotalSales = sum(Value), CountPurchases = count() by Country
| order by TotalSales desc

This pipeline:

  1. Filters for purchases in the last 7 days
  2. Projects relevant fields
  3. Aggregates totals and counts
  4. Orders the result by highest total sales

KQL vs SQL: What’s Different?

Feature | SQL | KQL
Syntax | Declarative | Pipeline-based
Joins | Extensive support | Limited pivot semantics
Use cases | Relational data | Time-series, events, logs
Aggregation | GROUP BY | summarize

KQL shines when querying streaming or event data at scale — exactly the kinds of scenarios Eventhouse targets.


Performance Considerations in KQL

  • Apply where as early as possible.
  • Use project to keep only necessary fields.
  • Time-range filters (e.g., last 24h) drastically reduce scan size.
  • KQL runs distributed and is optimized for large event streams.

Practical Use Cases

Example – Top Countries by Event Count:

EventHouseTable
| summarize EventCount = count() by Country
| top 10 by EventCount

Example – Average Value of Events per Day:

EventHouseTable
| where EventType == "SensorReading"
| summarize AvgValue = avg(Value) by bin(Timestamp, 1d)


Exam Relevance

In DP-600 exam scenarios involving event or near-real-time analytics (such as with Eventhouse or KQL-backed lakehouse sources), you may be asked to:

  • Write or interpret KQL that:
    • projects specific fields
    • filters records based on conditions
    • aggregates and groups results
  • Choose the correct operator (where, project, summarize) for a task
  • Understand how KQL can be optimized with time-based filtering

Key Takeaways

  • project selects specific fields.
  • where filters rows based on conditions.
  • summarize performs aggregations.
  • Time-series queries often use bin() for bucketing.
  • The KQL pipeline enables modular, readable, and optimized queries for large datasets.

Final Exam Tips

If a question involves event streams, telemetry, metrics over time, or real-time analytics, and asks about summarizing values after filtering, think KQL with where, project, and summarize.

  • project → select columns
  • where → filter rows
  • summarize → aggregate and group
  • bin() → time-based grouping
  • KQL is pipeline-based, not declarative like SQL
  • Used heavily in Eventhouse / real-time analytics

Practice Questions:

Here are 10 questions to test and solidify your knowledge. As you review these and other questions in your preparation, make sure to:

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for keywords in exam questions and understand how they signal the intended scenario
  • Expect scenario-based questions rather than direct definitions

1. Which KQL operator is used to select specific columns from a dataset?

A. select
B. where
C. project
D. summarize

Correct Answer: C

Explanation:
project is the KQL operator used to select and optionally rename columns. KQL does not use SELECT like SQL.


2. Which operator is used to filter rows in a KQL query?

A. filter
B. where
C. having
D. restrict

Correct Answer: B

Explanation:
The where operator filters rows based on conditions and is typically placed early in the query pipeline for performance.


3. How do you count the number of records in a table using KQL?

A. count(*)
B. count()
C. summarize count(*)
D. summarize count()

Correct Answer: D

Explanation:
In KQL, aggregation functions are used inside summarize. count() counts rows; count(*) is SQL syntax.


4. Which KQL operator performs aggregations similar to SQL’s GROUP BY?

A. group
B. aggregate
C. summarize
D. partition

Correct Answer: C

Explanation:
summarize is the KQL operator used for aggregation and grouping.


5. Which query returns total sales grouped by country?

A.

| group by Country sum(Value)

B.

| summarize sum(Value) Country

C.

| summarize TotalSales = sum(Value) by Country

D.

| aggregate Value by Country

Correct Answer: C

Explanation:
KQL requires explicit naming of aggregates and grouping using summarize … by.


6. What is the purpose of the bin() function in KQL?

A. To sort data
B. To group numeric values
C. To bucket values into time intervals
D. To remove null values

Correct Answer: C

Explanation:
bin() groups values—commonly timestamps—into fixed-size intervals (for example, hourly or daily buckets).


7. Which query correctly summarizes event counts per hour?

A.

| summarize count() by Timestamp

B.

| summarize count() by hour(Timestamp)

C.

| summarize count() by bin(Timestamp, 1h)

D.

| count() by Timestamp

Correct Answer: C

Explanation:
Time-based grouping in KQL requires bin() to define the interval size.


8. Which operator should be placed as early as possible in a KQL query for performance reasons?

A. summarize
B. project
C. order by
D. where

Correct Answer: D

Explanation:
Applying where early reduces the dataset size before further processing, improving performance.


9. Which KQL query returns the top 5 countries by event count?

A.

| top 5 Country by count()

B.

| summarize count() by Country | top 5 by count_

C.

| summarize EventCount = count() by Country | top 5 by EventCount

D.

| order by Country limit 5

Correct Answer: C

Explanation:
You must first aggregate using summarize, then use top based on the aggregated column.


10. In Microsoft Fabric, KQL is primarily used with which workload?

A. Warehouse
B. Lakehouse SQL endpoint
C. Eventhouse
D. Semantic model

Correct Answer: C

Explanation:
KQL is the primary query language for Eventhouse and real-time analytics scenarios in Microsoft Fabric.


Select, Filter, and Aggregate Data Using SQL

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Query and analyze data
--> Select, Filter, and Aggregate Data Using SQL

Working with SQL to select, filter, and aggregate data is a core skill for analytics engineers using Microsoft Fabric. Whether querying data in a warehouse, lakehouse SQL analytics endpoint, or semantic model via DirectQuery, SQL enables precise data retrieval and summarization for reporting, dashboards, and analytics solutions.

For DP-600, you should understand how to construct SQL queries that perform:

  • Selecting specific data columns
  • Filtering rows based on conditions
  • Aggregating values with grouping and summary functions

SQL Data Selection

Selecting data refers to using the SELECT clause to choose which columns or expressions to return.

Example:

SELECT
    CustomerID,
    OrderDate,
    SalesAmount
FROM Sales;

  • Use * to return all columns:
    SELECT * FROM Sales;
  • Use expressions to compute derived values:
    SELECT OrderDate, SalesAmount, SalesAmount * 1.1 AS AdjustedRevenue FROM Sales;

Exam Tip: Be purposeful in selecting only needed columns to improve performance.


SQL Data Filtering

Filtering data determines which rows are returned based on conditions using the WHERE clause.

Basic Filtering:

SELECT *
FROM Sales
WHERE OrderDate >= '2025-01-01';

Combined Conditions:

  • AND: WHERE Country = 'USA' AND SalesAmount > 1000
  • OR: WHERE Region = 'East' OR Region = 'West'

Null and Missing Value Filters:

WHERE SalesAmount IS NOT NULL

Exam Tip: Understand how WHERE filters reduce dataset size before aggregation.


SQL Aggregation

Aggregation summarizes grouped rows using functions like SUM, COUNT, AVG, MIN, and MAX.

Basic Aggregation:

SELECT
    SUM(SalesAmount) AS TotalSales
FROM Sales;

Grouped Aggregation:

SELECT
    Country,
    SUM(SalesAmount) AS TotalSales,
    COUNT(*) AS OrderCount
FROM Sales
GROUP BY Country;

Filtering After Aggregation:

Use HAVING instead of WHERE to filter aggregated results:

SELECT
    Country,
    SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY Country
HAVING SUM(SalesAmount) > 100000;

Exam Tip:

  • Use WHERE for row-level filters before grouping.
  • Use HAVING to filter group-level aggregates.

Combining Select, Filter, and Aggregate

A complete SQL query often blends all three:

SELECT
    ProductCategory,
    COUNT(*) AS Orders,
    SUM(SalesAmount) AS TotalSales,
    AVG(SalesAmount) AS AvgSale
FROM Sales
WHERE OrderDate BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY ProductCategory
ORDER BY TotalSales DESC;

This example:

  • Selects specific columns and expressions
  • Filters by date range
  • Aggregates by product category
  • Orders results by summary metric

SQL in Different Fabric Workloads

Workload | SQL Usage
Warehouse | Standard T-SQL for BI queries
Lakehouse SQL Analytics | SQL against Delta tables
Semantic Models via DirectQuery | SQL pushed to source where supported
Dataflows/Power Query | SQL-like operations through M (not direct SQL)

Performance and Pushdown

When using SQL in Fabric:

  • Engines push filters and aggregations down to the data source for performance.
  • Select only needed columns early to limit data movement.
  • Avoid SELECT * in production queries unless necessary.

Key SQL Concepts for the Exam

Concept | Why It Matters
SELECT | Defines what data to retrieve
WHERE | Filters data before aggregation
GROUP BY | Organizes rows into groups
HAVING | Filters after aggregation
Aggregate functions | Summarize numeric data

Understanding how these work together is essential for creating analytics-ready datasets.


Common Exam Scenarios

You may be asked to:

  • Write SQL to filter data based on conditions
  • Summarize data across groups
  • Decide whether to use WHERE or HAVING
  • Identify the correct SQL pattern for a reporting requirement

Example exam prompt:

“Which SQL query correctly returns the total sales per region, only for regions with more than 1,000 orders?”

Understanding aggregate filters (HAVING) and groupings will be key.


Final Exam Tips

If a question mentions:

  • “Return summary metrics”
  • “Only include rows that meet conditions”
  • “Group results by category”

…you’re looking at combining SELECT, WHERE, and GROUP BY in SQL.

  • WHERE filters rows before aggregation
  • HAVING filters after aggregation
  • GROUP BY is required for per-group metrics
  • Use aggregate functions intentionally
  • Performance matters — avoid unnecessary columns

Practice Questions:

Here are 10 questions to test and solidify your knowledge. As you review these and other questions in your preparation, make sure to:

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for keywords in exam questions and understand how they signal the intended scenario
  • Expect scenario-based questions rather than direct definitions

1. Which SQL clause is used to filter rows before aggregation occurs?

A. HAVING
B. GROUP BY
C. WHERE
D. ORDER BY

Correct Answer: C

Explanation:
The WHERE clause filters individual rows before any aggregation or grouping takes place. HAVING filters results after aggregation.


2. You need to calculate total sales per product category. Which clause is required?

A. WHERE
B. GROUP BY
C. ORDER BY
D. HAVING

Correct Answer: B

Explanation:
GROUP BY groups rows so aggregate functions (such as SUM) can be calculated per category.


3. Which function returns the number of rows in each group?

A. SUM()
B. COUNT()
C. AVG()
D. MAX()

Correct Answer: B

Explanation:
COUNT() counts the number of rows in a group. It is commonly used to count records or transactions.


4. Which query correctly filters aggregated results?

A.

WHERE SUM(SalesAmount) > 10000

B.

HAVING SUM(SalesAmount) > 10000

C.

GROUP BY SUM(SalesAmount) > 10000

D.

ORDER BY SUM(SalesAmount) > 10000

Correct Answer: B

Explanation:
HAVING is used to filter aggregated values. WHERE cannot reference aggregate functions.


5. Which SQL statement returns the total number of orders?

A.

SELECT COUNT(*) FROM Orders;

B.

SELECT SUM(*) FROM Orders;

C.

SELECT TOTAL(Orders) FROM Orders;

D.

SELECT COUNT(Orders) FROM Orders;

Correct Answer: A

Explanation:
COUNT(*) counts all rows in a table, making it the correct way to return total order count.


6. Which clause is used to sort aggregated query results?

A. GROUP BY
B. WHERE
C. ORDER BY
D. HAVING

Correct Answer: C

Explanation:
ORDER BY sorts the final result set, including aggregated columns.


7. What happens if a column in the SELECT statement is not included in the GROUP BY clause or an aggregate function?

A. The query runs but returns incorrect results
B. SQL automatically groups it
C. The query fails
D. The column is ignored

Correct Answer: C

Explanation:
In SQL, any column in SELECT must either be aggregated or included in GROUP BY.


8. Which query returns average sales amount per country?

A.

SELECT Country, AVG(SalesAmount)
FROM Sales;

B.

SELECT Country, AVG(SalesAmount)
FROM Sales
GROUP BY Country;

C.

SELECT Country, SUM(SalesAmount)
GROUP BY Country;

D.

SELECT AVG(SalesAmount)
FROM Sales
GROUP BY Country;

Correct Answer: B

Explanation:
Grouping by Country allows AVG(SalesAmount) to be calculated per country.


9. Which filter removes rows with NULL values in a column?

A.

WHERE SalesAmount = NULL

B.

WHERE SalesAmount <> NULL

C.

WHERE SalesAmount IS NOT NULL

D.

WHERE NOT NULL SalesAmount

Correct Answer: C

Explanation:
SQL uses IS NULL and IS NOT NULL to check for null values.


10. Which SQL pattern is most efficient for analytics queries in Microsoft Fabric?

A. Selecting all columns and filtering later
B. Using SELECT * for simplicity
C. Filtering early and selecting only needed columns
D. Aggregating without grouping

Correct Answer: C

Explanation:
Filtering early and selecting only required columns improves performance by reducing data movement—an important Fabric best practice.


Select, Filter, and Aggregate Data by Using the Visual Query Editor

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Query and analyze data
--> Select, Filter, and Aggregate Data by Using the Visual Query Editor

In Microsoft Fabric, the Visual Query Editor (VQE) provides a low-code, graphical experience for querying data across lakehouses, warehouses, and semantic models. It allows analytics engineers to explore, shape, and summarize data without writing SQL or KQL, while still generating optimized queries behind the scenes.

For the DP-600 exam, you should understand what the Visual Query Editor is, where it’s used, and how to perform common data analysis tasks such as selecting columns, filtering rows, and aggregating data.


What Is the Visual Query Editor?

The Visual Query Editor is a graphical query-building interface available in multiple Fabric experiences, including:

  • Lakehouse SQL analytics endpoint
  • Warehouse
  • Power BI (Direct Lake and DirectQuery scenarios)
  • Data exploration within Fabric items

Instead of writing queries manually, you interact with:

  • Tables and columns
  • Drag-and-drop operations
  • Menus for filters, grouping, and aggregations

Fabric then translates these actions into optimized SQL or engine-specific queries.
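
As an illustration only (the statement Fabric actually generates depends on the operations and may differ), grouping a hypothetical Sales table by Country and summing SalesAmount in the editor corresponds roughly to SQL like this:

SELECT Country, SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY Country;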


Selecting Data

Selecting data in the Visual Query Editor focuses on choosing the right columns and datasets for analysis.

Key Capabilities

  • Select or deselect columns from one or more tables
  • Rename columns for readability
  • Reorder columns for analysis or reporting
  • Combine columns from related tables (via existing relationships)

Exam Tips

  • Selecting fewer columns improves performance and reduces data transfer.
  • Column renaming in VQE affects the query result, not the underlying table schema.
  • The Visual Query Editor respects relationships defined in semantic models and warehouses.

Filtering Data

Filtering allows you to limit rows based on conditions, ensuring only relevant data is included.

Common Filter Types

  • Equality filters (e.g., Status = "Active")
  • Range filters (e.g., dates, numeric thresholds)
  • Text filters (contains, starts with, ends with)
  • Null / non-null filters
  • Relative date filters (last 7 days, current month)

Where Filtering Is Applied

  • At the query level, not permanently in the data source
  • Before aggregation (important for correct results)

Exam Tips

  • Filters applied in the Visual Query Editor are executed at the data source when possible (query folding).
  • Filtering early improves performance and reduces memory usage.
  • Be aware of how filters interact with aggregations.

Aggregating Data

Aggregation summarizes data by grouping rows and applying calculations.

Common Aggregations

  • Sum
  • Count / Count Distinct
  • Average
  • Min / Max

Grouping Data

  • Select one or more columns as group-by fields
  • Apply aggregations to numeric or date columns
  • Results return one row per group

Examples

  • Total sales by product category
  • Count of orders per customer
  • Average response time by day

Exam Tips

  • Aggregations in the Visual Query Editor are conceptually similar to GROUP BY in SQL (see the sketch after this list).
  • Aggregated queries reduce dataset size and improve performance.
  • Understand the difference between row-level data and aggregated results.
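
A minimal sketch of that GROUP BY mapping, using the "count of orders per customer" example above (hypothetical Orders table with a CustomerID column):

SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID;   -- one summarized row per customer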

Behind the Scenes: Generated Queries

Although the Visual Query Editor is low-code, Fabric generates:

  • SQL queries for warehouses and lakehouse SQL endpoints
  • Optimized engine-specific queries for semantic models

This ensures:

  • Efficient execution
  • Compatibility with Direct Lake and DirectQuery
  • Consistent results across Fabric experiences

Exam Tip

You are not required to read or write the generated SQL, but you should understand that the Visual Query Editor does not bypass query optimization.


When to Use the Visual Query Editor

Use the Visual Query Editor when:

  • Quickly exploring unfamiliar datasets
  • Building queries without writing code
  • Creating reusable query logic for reports
  • Teaching or collaborating with less SQL-focused users

Avoid it when:

  • Complex transformations are required (use SQL, Spark, or Dataflows)
  • Highly customized logic is needed beyond supported operations

Key Exam Takeaways

For the DP-600 exam, remember:

  • The Visual Query Editor is a graphical query-building tool in Microsoft Fabric.
  • It supports selecting columns, filtering rows, and aggregating data.
  • Operations are translated into optimized queries executed at the data source.
  • Filtering occurs before aggregation, affecting results and performance.
  • It is commonly used with lakehouses, warehouses, and semantic models.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions
  • Know the purpose and scope of the Visual Query Editor
  • Know how to select, filter, and aggregate data
  • Understand execution order and performance implications
  • Know when to use (and not use) the Visual Query Editor

Question 1

What is the primary purpose of the Visual Query Editor in Microsoft Fabric?

A. To permanently modify table schemas
B. To build queries visually without writing SQL
C. To replace semantic models
D. To manage workspace permissions

Correct Answer: B

Explanation:
The Visual Query Editor provides a low-code, graphical interface for building queries. It does not modify schemas, replace models, or manage security.


Question 2

When you deselect a column in the Visual Query Editor, what happens?

A. The column is deleted from the source table
B. The column is hidden permanently for all users
C. The column is excluded only from the query results
D. The column data type is changed

Correct Answer: C

Explanation:
Column selection affects only the query output, not the underlying data or schema.


Question 3

Why is it considered a best practice to select only required columns in a query?

A. It enforces data security
B. It reduces query complexity and improves performance
C. It enables Direct Lake mode
D. It prevents duplicate rows

Correct Answer: B

Explanation:
Selecting fewer columns reduces data movement and memory usage, leading to better performance.


Question 4

Which type of filter is commonly used to restrict data to a recent time period?

A. Equality filter
B. Text filter
C. Relative date filter
D. Aggregate filter

Correct Answer: C

Explanation:
Relative date filters (e.g., “Last 30 days”) dynamically adjust based on the current date and are commonly used in analytics.


Question 5

At what stage of query execution are filters applied in the Visual Query Editor?

A. After aggregation
B. After the query result is returned
C. Before aggregation
D. Only in the Power BI report layer

Correct Answer: C

Explanation:
Filters are applied before aggregation, ensuring accurate summary results and better performance.


Question 6

Which operation is required for aggregations such as SUM or COUNT to produce per-group results?

A. SUM
B. COUNT
C. GROUP BY
D. MIN

Correct Answer: C

Explanation:
Grouping defines how rows are summarized. Aggregations like SUM or COUNT rely on GROUP BY logic to produce per-group results.


Question 7

You want to see total sales by product category. Which Visual Query Editor actions are required?

A. Filter Product Category and sort by Sales
B. Group by Product Category and apply SUM to Sales
C. Count Product Category and filter Sales
D. Rename Product Category and aggregate rows

Correct Answer: B

Explanation:
This scenario requires grouping on Product Category and applying a SUM aggregation to the Sales column.


Question 8

What happens behind the scenes when you build a query using the Visual Query Editor?

A. Fabric stores a cached dataset only
B. Fabric generates optimized SQL or engine-specific queries
C. Fabric converts the query into DAX
D. Fabric disables query folding

Correct Answer: B

Explanation:
The Visual Query Editor translates visual actions into optimized queries (such as SQL) that execute at the data source.


Question 9

Which Fabric items commonly support querying through the Visual Query Editor?

A. Pipelines and notebooks only
B. Dashboards only
C. Lakehouses, warehouses, and semantic models
D. Eventhouses only

Correct Answer: C

Explanation:
The Visual Query Editor is widely used across lakehouses, warehouses, and semantic models in Fabric.


Question 10

When should you avoid using the Visual Query Editor?

A. When exploring new datasets
B. When building quick aggregations
C. When complex transformation logic is required
D. When filtering data

Correct Answer: C

Explanation:
For advanced or complex transformations, SQL, Spark, or Dataflows are more appropriate than the Visual Query Editor.


Filter Data

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Filter data

Filtering data is one of the most fundamental transformation operations used when preparing analytics data. It ensures that only relevant, valid, and accurate records are included in curated tables or models. Filtering improves performance, reduces unnecessary processing overhead, and helps enforce business logic early in the analytics pipeline.

In Microsoft Fabric, filtering occurs at multiple transformation layers — from ingestion tools to interactive modeling. For the DP-600 exam, you should understand where, why, and how to filter data effectively using various tools and technologies within Fabric.


Why Filter Data?

Filtering data serves several key purposes in analytics:

1. Improve Query and Report Performance

  • Reduces the amount of data scanned and processed
  • Enables faster refresh and retrieval

2. Enforce Business Logic

  • Excludes irrelevant segments (e.g., test data, canceled transactions)
  • Supports clean analytical results

3. Prepare Analytics-Ready Data

  • Limits datasets to required time periods or categories
  • Produces smaller, focused outputs for reporting

4. Reduce Cost

  • Smaller processing needs reduce compute and storage overhead

Where Filtering Happens in Microsoft Fabric

Filtering can be implemented at multiple stages:

Layer and how you filter at that layer:

  • Power Query (Dataflows Gen2 / Lakehouse): UI filters or M code
  • SQL (Warehouse & Lakehouse SQL analytics): WHERE clauses
  • Spark (Lakehouse Notebooks): DataFrame filter() / where()
  • Pipelines (Data Movement): Source filters or query-based extraction
  • Semantic Models (Power BI / DAX): Query filters, slicers, and row-level security

Filtering early, as close to the data source as possible, ensures better performance downstream.


Tools and Techniques

1. Power Query (Low-Code)

Power Query provides a user-friendly interface to filter rows:

  • Text filters: Equals, Begins With, Contains, etc.
  • Number filters: Greater than, Between, Top N, etc.
  • Date filters: Before, After, This Month, Last 12 Months, etc.
  • Remove blank or null values

These filters are recorded as transformation steps and can be reused or versioned.


2. SQL (Warehouses & Lakehouses)

SQL filtering uses the WHERE clause:

SELECT *
FROM Sales
WHERE OrderDate >= '2025-01-01'
  AND Country = 'USA';

SQL filtering is efficient and pushed down to the engine, reducing row counts early.


3. Spark (Notebooks)

Filtering in Spark (PySpark example):

filtered_df = df.filter(df["SalesAmount"] > 1000)

Or with SQL in Spark:

SELECT *
FROM sales
WHERE SalesAmount > 1000;

Spark filtering is optimized for distributed processing across big datasets.


4. Pipelines (Data Movement)

During ingestion or ETL, you can apply filters in:

  • Copy activity query filters
  • Source queries
  • Pre-processing steps

This ensures only needed rows land in the target store.


5. Semantic Model Filters

In Power BI and semantic models, filtering can happen as:

  • Report filters
  • Slicers and visuals
  • Row-Level Security (RLS) — security-driven filtering

These filters control what users see rather than what data is stored.


Business and Data Quality Scenarios

Filtering is often tied to business needs such as:

  • Excluding invalid, test, or archived records
  • Restricting to active customers only
  • Selecting a specific date range (e.g., last fiscal year)
  • Filtering data for regional or product segments

Filtering vs Security

It’s important to distinguish filtering for transformation from security filters:

Filtering:
  • Removes unwanted rows during transformation
  • Improves performance
  • Happens before modeling

Security:
  • Controls what users are allowed to see
  • Enforces access control
  • Happens during query evaluation

Best Practices

When filtering data in Microsoft Fabric:

  • Filter early in the pipeline to reduce volume
  • Use pushdown filters in SQL when querying large sources
  • Document filtering logic for audit and governance
  • Combine filters logically (AND/OR) to match business rules
  • Avoid filtering in the semantic model when it can be done upstream

Common Exam Scenarios

You may be asked to:

  • Choose the correct tool and stage for filtering
  • Translate business rules into filter logic
  • Recognize when filtering improves performance
  • Identify risks of filtering too late or in the wrong layer

Example exam prompt:
A dataset should exclude test transactions and include only the last 12 months of sales. Which transformation step should be applied and where?
The correct answer will involve filtering early with SQL or Power Query before modeling.
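
For instance, a hedged sketch of that upstream filter in SQL (hypothetical Sales table with an IsTest flag; the 12-month boundary uses T-SQL date functions):

SELECT OrderDate, CustomerID, SalesAmount
FROM Sales
WHERE IsTest = 0                                      -- exclude test transactions
  AND OrderDate >= DATEADD(month, -12, GETDATE());    -- keep only the last 12 months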


Key Takeaways

  • Filtering data is a core part of preparing analytics-ready datasets.
  • Multiple Fabric components support filtering (Power Query, SQL, Spark, pipelines).
  • Filtering early improves performance and reduces unnecessary workload.
  • Understand filtering in context — transformation vs. security.

Final Exam Tips

  • When a question asks about reducing dataset size, improving performance, or enforcing business logic before loading into a model, filtering is often the correct action — and it usually belongs upstream.
  • Filter early and upstream whenever possible
  • Use SQL or Power Query for transformation-level filtering
  • Avoid relying solely on report-level filters for large datasets
  • Distinguish filtering for performance from security filtering

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions

Question 1

What is the primary purpose of filtering data during the transformation phase?

A. To enforce user-level security
B. To reduce data volume and improve performance
C. To encrypt sensitive columns
D. To normalize data structures

Correct Answer: B

Explanation:
Filtering removes unnecessary rows early in the pipeline, reducing data volume, improving performance, and lowering compute costs. Security and normalization are separate concerns.


Question 2

Which Fabric component allows low-code, UI-driven row filtering during data preparation?

A. Spark notebooks
B. SQL warehouse
C. Power Query (Dataflows Gen2)
D. Semantic models

Correct Answer: C

Explanation:
Power Query provides a graphical interface for filtering rows using text, numeric, and date-based filters, making it ideal for low-code transformations.


Question 3

Which SQL clause is used to filter rows in a lakehouse or warehouse?

A. GROUP BY
B. HAVING
C. WHERE
D. ORDER BY

Correct Answer: C

Explanation:
The WHERE clause filters rows before aggregation or sorting, making it the primary SQL mechanism for data filtering.


Question 4

Which filtering approach is most efficient for very large datasets?

A. Filtering in Power BI visuals
B. Filtering after loading data into a semantic model
C. Filtering at the source using SQL or ingestion queries
D. Filtering using calculated columns

Correct Answer: C

Explanation:
Filtering as close to the source as possible minimizes data movement and processing, making it the most efficient approach for large datasets.


Question 5

In a Spark notebook, which method is commonly used to filter a DataFrame?

A. select()
B. filter() or where()
C. join()
D. distinct()

Correct Answer: B

Explanation:
Spark DataFrames use filter() or where() to remove rows based on conditions.


Question 6

Which scenario is an example of business-rule filtering?

A. Removing duplicate rows
B. Converting text to numeric data types
C. Excluding canceled orders from sales analysis
D. Creating a star schema

Correct Answer: C

Explanation:
Business-rule filtering enforces organizational logic, such as excluding canceled or test transactions from analytics.


Question 7

What is the key difference between data filtering and row-level security (RLS)?

A. Filtering improves query speed; RLS does not
B. Filtering removes data; RLS restricts visibility
C. Filtering is applied only in SQL; RLS is applied only in Power BI
D. Filtering is mandatory; RLS is optional

Correct Answer: B

Explanation:
Filtering removes rows from the dataset, while RLS controls which rows users can see without removing the data itself.


Question 8

Which filtering method is typically applied after data has already been loaded?

A. Source query filters
B. Pipeline copy activity filters
C. Semantic model report filters
D. Power Query transformations

Correct Answer: C

Explanation:
Report and visual filters in semantic models are applied at query time and do not reduce stored data volume.


Question 9

Why is filtering data early in the pipeline considered a best practice?

A. It increases data redundancy
B. It simplifies semantic model design
C. It reduces processing and storage costs
D. It improves data encryption

Correct Answer: C

Explanation:
Early filtering minimizes unnecessary data processing and storage, improving efficiency across the entire analytics solution.


Question 10

A dataset should include only the last 12 months of data. Where should this filter ideally be applied?

A. In Power BI slicers
B. In the semantic model
C. During data ingestion or transformation
D. In calculated measures

Correct Answer: C

Explanation:
Applying time-based filters during ingestion or transformation ensures only relevant data is processed and stored, improving performance and consistency.