In Microsoft Fabric, storage mode determines how a semantic model accesses and processes data. It affects performance, freshness, compute behavior, and model capabilities. Choosing the right storage mode is critical when designing semantic models for analytics and reporting.
A semantic model (Power BI dataset) can use different storage modes for its tables — and when multiple modes coexist, the model is called a composite model.
Common Storage Modes
There are three primary storage modes you should know for the exam:
1. Import Mode
Stores data inside the semantic model in memory (VertiPaq) after a refresh.
Offers fast query performance since data is cached locally.
Requires scheduled or manual refresh to update data from the source.
Supports the full range of modeling features (e.g., calculated tables, complex DAX).
When to use Import Mode:
Data fits in memory and doesn’t need real-time freshness.
You need complex calculations or modeling features requiring data in memory.
You want high performance for interactive analytics.
Pros:
Very fast interactive queries
Full DAX and modeling capabilities
Cons:
Must schedule refreshes
Data freshness depends on refresh cadence
2. DirectQuery Mode
Semantic model does not store data locally — queries are sent to the underlying source (SQL, warehouse, etc.) at query time.
Ensures real-time or near-real-time data because no import refresh is needed.
When to use DirectQuery:
Source data changes frequently and must always show the latest results.
Data volumes are too large to import fully.
Pros:
Real-time access to source data
No refresh cycles required
Cons:
Performance depends heavily on source system
Some modeling features may be limited compared with Import
3. Direct Lake Mode
A newer, Fabric-specific storage mode designed to combine performance and freshness:
Reads Delta tables directly from OneLake and loads necessary column data into memory.
Avoids full data copy, eliminating the long import refresh cycle.
Uses the VertiPaq engine for fast aggregations and interactions (similar to import).
Offers low-latency sync with source changes without heavy refresh workloads.
Supports real-time insights while minimizing data movement.
When to use Direct Lake:
Working with extremely large datasets that would be costly or impractical to import entirely.
Needing relatively fresh data without long refresh cycles typical of Import mode.
Integrating tightly with delta-based assets such as Fabric lakehouses and warehouses.
Pros:
Fast querying with fresher data than import
No heavy refresh cycles
Leverages OneLake integration and existing delta tables
Cons:
Some modeling features (like calculated tables) are limited or not supported in Direct Lake tables (those tables must be switched to Import if needed).
May fall back to DirectQuery in certain conditions (e.g., tables requiring SQL endpoint security).
Composite Models
A semantic model may include a mix of storage modes — for example, some tables in Direct Lake and others in Import. This is called a composite model.
Typical use cases for composite models:
Import frequently used dimension tables (to support calculated tables)
Use Direct Lake for large fact tables stored in OneLake
Balance performance with modeling flexibility
Choosing the Right Storage Mode — Key Factors
When deciding on a storage mode for your semantic model, consider:
1. Data Freshness Requirements
Real-time data? → DirectQuery or Direct Lake
Static or periodic data? → Import
2. Dataset Size and Performance
Large volumes (multi-TB) without capacity for full import? → Direct Lake
High interactive performance with good freshness? → Direct Lake
Ultimate speed with full caching? → Import
3. Source Capabilities
Some sources may not support DirectQuery efficiently — understand source performance.
Practical Examples
Import Mode: Small/medium enterprise data warehouse reporting that runs daily refreshes.
DirectQuery: Regulatory reporting where every query must reflect the latest operational data in a SQL system.
Direct Lake: Analytics on massive delta datasets stored in OneLake, where import is impractical but freshness and performance are both essential.
Exam Tips
Know what each mode does (Import vs DirectQuery vs Direct Lake).
Understand trade-offs between performance, freshness, and modeling capability.
Recognize Direct Lake as a Fabric-optimized hybrid mode ideal for delta lake data.
Be prepared to choose the mode based on scenario requirements like latency, size, and features.
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Expect scenario-based questions rather than direct definitions
1. Which storage mode stores data fully in memory within the semantic model?
A. DirectQuery B. Direct Lake C. Import D. Composite
Correct Answer: C. Import
Explanation: Import mode loads data into the VertiPaq in-memory engine inside the semantic model, providing the fastest query performance but requiring refreshes.
2. Which storage mode provides real-time access to data by querying the source system at query time?
A. Import B. DirectQuery C. Direct Lake D. Cached
Correct Answer: B. DirectQuery
Explanation: DirectQuery does not store data locally. Each query is sent directly to the source system, ensuring real-time or near-real-time results.
3. What is a key advantage of Direct Lake compared to Import mode?
A. Supports more DAX functions B. Requires no OneLake integration C. Avoids full data refresh while maintaining high performance D. Works only with SQL Server
Correct Answer: C. Avoids full data refresh while maintaining high performance
Explanation: Direct Lake reads Delta tables directly from OneLake, avoiding large import refreshes while still using the VertiPaq engine for fast analytics.
4. Which scenario is best suited for Import mode?
A. A dataset requiring real-time updates every second B. A small to medium dataset with complex DAX calculations C. A multi-terabyte lakehouse fact table D. Streaming event data
Correct Answer: B. A small to medium dataset with complex DAX calculations
Explanation: Import mode supports the full range of modeling features and offers excellent performance for datasets that fit comfortably in memory.
5. Which storage mode is specifically optimized for Delta tables stored in OneLake?
A. Import B. DirectQuery C. Direct Lake D. Hybrid
Correct Answer: C. Direct Lake
Explanation: Direct Lake is a Fabric-optimized storage mode designed to work directly with Delta tables in OneLake.
6. A semantic model includes some tables in Import mode and others in Direct Lake mode. What is this called?
A. Hybrid model B. Incremental model C. Composite model D. Federated model
Correct Answer: C. Composite model
Explanation: A composite model uses multiple storage modes within the same semantic model, allowing flexibility between performance and freshness.
7. Which limitation applies to Direct Lake tables?
A. They cannot be refreshed B. They do not support relationships C. Calculated tables are not supported directly D. They cannot be queried using DAX
Correct Answer: C. Calculated tables are not supported directly
Explanation: Calculated tables require Import mode. Direct Lake tables must be switched to Import if calculated tables are needed.
8. What primarily determines query performance when using DirectQuery mode?
A. The VertiPaq engine B. The refresh schedule C. The source system’s performance D. OneLake caching
Correct Answer: C. The source system’s performance
Explanation: In DirectQuery mode, queries are executed against the source system, so performance depends on source optimization and capacity.
9. Which storage mode minimizes data duplication while still offering high query performance?
A. Import B. DirectQuery C. Direct Lake D. Cached Import
Correct Answer: C. Direct Lake
Explanation: Direct Lake avoids copying data into the model while still leveraging in-memory query acceleration, minimizing duplication and refresh overhead.
10. You need near real-time analytics on a very large dataset stored in OneLake without long refresh times. Which storage mode should you choose?
A. Import B. DirectQuery C. Direct Lake D. Snapshot
Correct Answer: C. Direct Lake
Explanation: Direct Lake is ideal for large OneLake datasets where full import refreshes are impractical but fast, fresh analytics are required.
Data Analysis Expressions (DAX) is a formula language used to create dynamic calculations in Power BI semantic models. Unlike SQL or KQL, DAX works within the analytical model and is designed for filter context–aware calculations, interactive reporting, and business logic. For DP-600, you should understand how to use DAX to select, filter, and aggregate data within a semantic model for analytics and reporting.
What Is DAX?
DAX is similar to Excel formulas but optimized for relational, in-memory analytics. It is used in measures, calculated columns, and calculated tables within semantic models.
DAX operates based on filter context and evaluates measures dynamically.
There is no explicit SELECT statement — rather, measures compute values based on current context.
Use CALCULATE to change filter context.
Aggregation functions (e.g., SUM, COUNT, AVERAGE) are fundamental to summarizing data.
Filtering functions like FILTER and time intelligence functions enhance analytical flexibility.
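A minimal sketch of these ideas as measures, assuming a hypothetical Sales table with a SalesAmount column and a related Date table marked as a date table:
Total Sales = SUM ( Sales[SalesAmount] )
-- CALCULATE changes the filter context: only transactions over $1,000 are summed
Large Sales = CALCULATE ( [Total Sales], FILTER ( Sales, Sales[SalesAmount] > 1000 ) )
-- Time intelligence: shifts the date context back one year for year-over-year comparison
Sales LY = CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( 'Date'[Date] ) )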
Final Exam Tips
If a question mentions interactive reports, dynamic filters, slicers, or time-based comparisons, DAX is likely the right language to use for the solution.
If the question mentions slicers, visuals, or dynamic results, think DAX measure.
Time intelligence functions are high-value topics.
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Expect scenario-based questions rather than direct definitions
1. Which DAX function is primarily used to modify the filter context of a calculation?
A. FILTER B. SUMX C. CALCULATE D. ALL
Correct answer: ✅ C Explanation: CALCULATE changes the filter context under which an expression is evaluated.
2. A Power BI report contains slicers for Year and Product. A measure returns different results as slicers change. What concept explains this behavior?
A. Row context B. Filter context C. Evaluation context D. Query context
Correct answer: ✅ B Explanation: Filter context is affected by slicers, filters, and visual interactions.
3. Which DAX function iterates row by row over a table to perform a calculation?
A. SUM B. COUNT C. AVERAGE D. SUMX
Correct answer: ✅ D Explanation: SUMX evaluates an expression for each row and then aggregates the results.
4. You want to calculate total sales only for transactions greater than $1,000. Which approach is correct?
8. Which DAX function returns a table instead of a scalar value?
A. SUM B. AVERAGE C. FILTER D. COUNT
Correct answer: ✅ C Explanation: FILTER returns a table that can be consumed by other functions like CALCULATE.
9. Which scenario is the best use case for DAX instead of SQL or KQL?
A. Cleaning raw data before ingestion B. Transforming streaming event data C. Creating interactive report-level calculations D. Querying flat files in a lakehouse
Correct answer: ✅ C Explanation: DAX excels at dynamic, interactive calculations in semantic models.
10. What is the primary purpose of the SAMEPERIODLASTYEAR function?
A. Aggregate values by fiscal year B. Remove filters from a date column C. Compare values to the previous year D. Calculate rolling averages
Correct answer: ✅ C Explanation: It shifts the date context back one year for year-over-year analysis.
The Kusto Query Language (KQL) is a read-only request language used for querying large, distributed, event-driven datasets — especially within Eventhouse and Azure Data Explorer–backed workloads in Microsoft Fabric. KQL enables you to select, filter, and aggregate data efficiently in scenarios involving high-velocity data like telemetry, logs, and streaming events.
For the DP-600 exam, you should understand KQL basics and how it supports data exploration and analytical summarization in a real-time analytics context.
KQL Basics
KQL is designed to be expressive and performant for time-series or log-like data. Queries are built as a pipeline of operations, where each operator transforms the data and passes it to the next.
Selecting Data
In KQL, the project operator performs the equivalent of selecting columns:
EventHouseTable
| project Timestamp, Country, EventType, Value
project lets you choose which fields to include
You can rename fields inline: | project Time=Timestamp, Sales=Value
Exam Tip: Use project early to limit data to relevant columns and reduce processing downstream.
Filtering Data
Filtering in KQL is done using the where operator:
EventHouseTable
| where Country == "USA"
Multiple conditions can be combined with and/or:
| where Value > 100 and EventType == "Purchase"
Filtering early in the pipeline improves performance by reducing the dataset before subsequent transformations.
Aggregating Data
KQL uses the summarize operator to perform aggregations such as counts, sums, averages, min, max, etc.
EventHouseTable
| where EventType == "Purchase"
| summarize CountEvents = count(), TotalSales = sum(Value) by Country
Time-Bucketed Aggregation
KQL supports time binning using bin():
EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value) by Country, bin(Timestamp, 1h)
This groups results into hourly buckets, which is ideal for time-series analytics and dashboards.
Common KQL Aggregation Functions
count(): total number of records
sum(column): sum of numeric values
avg(column): average value
min(column) / max(column): minimum / maximum value
percentile(column, p): percentile calculation
Combining Operators
KQL queries are often a combination of select, filter, and aggregation:
EventHouseTable
| where EventType == "Purchase" and Timestamp >= ago(7d)
| project Country, Value, Timestamp
| summarize TotalSales = sum(Value), CountPurchases = count() by Country
| order by TotalSales desc
This pipeline:
Filters for purchases in the last 7 days
Projects relevant fields
Aggregates totals and counts
Orders the result by highest total sales
KQL vs SQL: What’s Different?
Syntax: SQL is declarative; KQL is pipeline-based
Joins: SQL offers extensive join support; KQL joins are more limited
Use cases: SQL targets relational data; KQL targets time-series, event, and log data
Aggregation: SQL uses GROUP BY; KQL uses summarize
KQL shines when querying streaming or event data at scale — exactly the kinds of scenarios Eventhouse targets.
Performance Considerations in KQL
Apply where as early as possible.
Use project to keep only necessary fields.
Time-range filters (e.g., last 24h) drastically reduce scan size.
KQL runs distributed and is optimized for large event streams.
Practical Use Cases
Example – Top Countries by Event Count:
EventHouseTable
| summarize EventCount = count() by Country
| top 10 by EventCount
Example – Average Value of Events per Day:
EventHouseTable
| where EventType == "SensorReading"
| summarize AvgValue = avg(Value) by bin(Timestamp, 1d)
Exam Relevance
In DP-600 exam scenarios involving event or near-real-time analytics (such as with Eventhouse or KQL-backed lakehouse sources), you may be asked to:
Write or interpret KQL that:
projects specific fields
filters records based on conditions
aggregates and groups results
Choose the correct operator (where, project, summarize) for a task
Understand how KQL can be optimized with time-based filtering
Key Takeaways
project selects specific fields.
where filters rows based on conditions.
summarize performs aggregations.
Time-series queries often use bin() for bucketing.
The KQL pipeline enables modular, readable, and optimized queries for large datasets.
Final Exam Tips
If a question involves event streams, telemetry, metrics over time, or real-time analytics, and asks about summarizing values after filtering, think KQL with where, project, and summarize.
project → select columns
where → filter rows
summarize → aggregate and group
bin() → time-based grouping
KQL is pipeline-based, not declarative like SQL
Used heavily in Eventhouse / real-time analytics
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Expect scenario-based questions rather than direct definitions
1. Which KQL operator is used to select specific columns from a dataset?
A. select B. where C. project D. summarize
✅ Correct Answer: C
Explanation: project is the KQL operator used to select and optionally rename columns. KQL does not use SELECT like SQL.
2. Which operator is used to filter rows in a KQL query?
A. filter B. where C. having D. restrict
✅ Correct Answer: B
Explanation: The where operator filters rows based on conditions and is typically placed early in the query pipeline for performance.
3. How do you count the number of records in a table using KQL?
A. count(*) B. count() C. summarize count(*) D. summarize count()
✅ Correct Answer: D
Explanation: In KQL, aggregation functions are used inside summarize. count() counts rows; count(*) is SQL syntax.
4. Which KQL operator performs aggregations similar to SQL’s GROUP BY?
A. group B. aggregate C. summarize D. partition
✅ Correct Answer: C
Explanation: summarize is the KQL operator used for aggregation and grouping.
5. Which query returns total sales grouped by country?
A.
| group by Country sum(Value)
B.
| summarize sum(Value) Country
C.
| summarize TotalSales = sum(Value) by Country
D.
| aggregate Value by Country
✅ Correct Answer: C
Explanation: KQL requires explicit naming of aggregates and grouping using summarize … by.
6. What is the purpose of the bin() function in KQL?
A. To sort data B. To group numeric values C. To bucket values into time intervals D. To remove null values
✅ Correct Answer: C
Explanation: bin() groups values—commonly timestamps—into fixed-size intervals (for example, hourly or daily buckets).
7. Which query correctly summarizes event counts per hour?
A.
| summarize count() by Timestamp
B.
| summarize count() by hour(Timestamp)
C.
| summarize count() by bin(Timestamp, 1h)
D.
| count() by Timestamp
✅ Correct Answer: C
Explanation: Time-based grouping in KQL requires bin() to define the interval size.
8. Which operator should be placed as early as possible in a KQL query for performance reasons?
A. summarize B. project C. order by D. where
✅ Correct Answer: D
Explanation: Applying where early reduces the dataset size before further processing, improving performance.
9. Which KQL query returns the top 5 countries by event count?
A.
| top 5 Country by count()
B.
| summarize count() by Country | top 5 by count_
C.
| summarize EventCount = count() by Country | top 5 by EventCount
D.
| order by Country limit 5
✅ Correct Answer: C
Explanation: You must first aggregate using summarize, then use top based on the aggregated column.
10. In Microsoft Fabric, KQL is primarily used with which workload?
A. Warehouse B. Lakehouse SQL endpoint C. Eventhouse D. Semantic model
✅ Correct Answer: C
Explanation: KQL is the primary query language for Eventhouse and real-time analytics scenarios in Microsoft Fabric.
Working with SQL to select, filter, and aggregate data is a core skill for analytics engineers using Microsoft Fabric. Whether querying data in a warehouse, lakehouse SQL analytics endpoint, or semantic model via DirectQuery, SQL enables precise data retrieval and summarization for reporting, dashboards, and analytics solutions.
For DP-600, you should understand how to construct SQL queries that perform:
Selecting specific data columns
Filtering rows based on conditions
Aggregating values with grouping and summary functions
SQL Data Selection
Selecting data refers to using the SELECT clause to choose which columns or expressions to return.
Example:
SELECT
CustomerID,
OrderDate,
SalesAmount
FROM Sales;
Use * to return all columns: SELECT * FROM Sales;
Use expressions to compute derived values: SELECT OrderDate, SalesAmount, SalesAmount * 1.1 AS AdjustedRevenue FROM Sales;
Exam Tip: Be purposeful in selecting only needed columns to improve performance.
SQL Data Filtering
Filtering data determines which rows are returned based on conditions using the WHERE clause.
Basic Filtering:
SELECT *
FROM Sales
WHERE OrderDate >= '2025-01-01';
Combined Conditions:
AND: WHERE Country = 'USA' AND SalesAmount > 1000
OR: WHERE Region = 'East' OR Region = 'West'
Null and Missing Value Filters:
WHERE SalesAmount IS NOT NULL
Exam Tip: Understand how WHERE filters reduce dataset size before aggregation.
SQL Aggregation
Aggregation summarizes grouped rows using functions like SUM, COUNT, AVG, MIN, and MAX.
Basic Aggregation:
SELECT
SUM(SalesAmount) AS TotalSales
FROM Sales;
Grouped Aggregation:
SELECT
Country,
SUM(SalesAmount) AS TotalSales,
COUNT(*) AS OrderCount
FROM Sales
GROUP BY Country;
Filtering After Aggregation:
Use HAVING instead of WHERE to filter aggregated results:
SELECT
Country,
SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY Country
HAVING SUM(SalesAmount) > 100000;
Exam Tip:
Use WHERE for row-level filters before grouping.
Use HAVING to filter group-level aggregates.
Combining Select, Filter, and Aggregate
A complete SQL query often blends all three:
SELECT
ProductCategory,
COUNT(*) AS Orders,
SUM(SalesAmount) AS TotalSales,
AVG(SalesAmount) AS AvgSale
FROM Sales
WHERE OrderDate BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY ProductCategory
ORDER BY TotalSales DESC;
This example:
Selects specific columns and expressions
Filters by date range
Aggregates by product category
Orders results by summary metric
SQL in Different Fabric Workloads
Warehouse: standard T-SQL for BI queries
Lakehouse SQL analytics endpoint: SQL against Delta tables
Semantic models via DirectQuery: SQL pushed to the source where supported
Dataflows / Power Query: SQL-like operations through M (not direct SQL)
Performance and Pushdown
When using SQL in Fabric:
Engines push filters and aggregations down to the data source for performance.
Select only needed columns early to limit data movement.
Avoid SELECT * in production queries unless necessary.
Key SQL Concepts for the Exam
SELECT: defines what data to retrieve
WHERE: filters data before aggregation
GROUP BY: organizes rows into groups
HAVING: filters after aggregation
Aggregate functions: summarize numeric data
Understanding how these work together is essential for creating analytics-ready datasets.
Common Exam Scenarios
You may be asked to:
Write SQL to filter data based on conditions
Summarize data across groups
Decide whether to use WHERE or HAVING
Identify the correct SQL pattern for a reporting requirement
Example exam prompt:
“Which SQL query correctly returns the total sales per region, only for regions with more than 1,000 orders?”
Understanding aggregate filters (HAVING) and groupings will be key.
Final Exam Tips
If a question mentions:
“Return summary metrics”
“Only include rows that meet conditions”
“Group results by category”
…you’re looking at combining SELECT, WHERE, and GROUP BY in SQL.
WHERE filters rows before aggregation
HAVING filters after aggregation
GROUP BY is required for per-group metrics
Use aggregate functions intentionally
Performance matters — avoid unnecessary columns
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Expect scenario-based questions rather than direct definitions
1. Which SQL clause is used to filter rows before aggregation occurs?
A. HAVING B. GROUP BY C. WHERE D. ORDER BY
✅ Correct Answer: C
Explanation: The WHERE clause filters individual rows before any aggregation or grouping takes place. HAVING filters results after aggregation.
2. You need to calculate total sales per product category. Which clause is required?
A. WHERE B. GROUP BY C. ORDER BY D. HAVING
✅ Correct Answer: B
Explanation: GROUP BY groups rows so aggregate functions (such as SUM) can be calculated per category.
3. Which function returns the number of rows in each group?
A. SUM() B. COUNT() C. AVG() D. MAX()
✅ Correct Answer: B
Explanation: COUNT() counts the number of rows in a group. It is commonly used to count records or transactions.
4. Which query correctly filters aggregated results?
A.
WHERE SUM(SalesAmount) > 10000
B.
HAVING SUM(SalesAmount) > 10000
C.
GROUP BY SUM(SalesAmount) > 10000
D.
ORDER BY SUM(SalesAmount) > 10000
✅ Correct Answer: B
Explanation: HAVING is used to filter aggregated values. WHERE cannot reference aggregate functions.
5. Which SQL statement returns the total number of orders?
A.
SELECT COUNT(*) FROM Orders;
B.
SELECT SUM(*) FROM Orders;
C.
SELECT TOTAL(Orders) FROM Orders;
D.
SELECT COUNT(Orders) FROM Orders;
✅ Correct Answer: A
Explanation: COUNT(*) counts all rows in a table, making it the correct way to return total order count.
6. Which clause is used to sort aggregated query results?
A. GROUP BY B. WHERE C. ORDER BY D. HAVING
✅ Correct Answer: C
Explanation: ORDER BY sorts the final result set, including aggregated columns.
7. What happens if a column in the SELECT statement is not included in the GROUP BY clause or an aggregate function?
A. The query runs but returns incorrect results B. SQL automatically groups it C. The query fails D. The column is ignored
✅ Correct Answer: C
Explanation: In SQL, any column in SELECT must either be aggregated or included in GROUP BY.
8. Which query returns average sales amount per country?
A.
SELECT Country, AVG(SalesAmount)
FROM Sales;
B.
SELECT Country, AVG(SalesAmount)
FROM Sales
GROUP BY Country;
C.
SELECT Country, SUM(SalesAmount)
GROUP BY Country;
D.
SELECT AVG(SalesAmount)
FROM Sales
GROUP BY Country;
✅ Correct Answer: B
Explanation: Grouping by Country allows AVG(SalesAmount) to be calculated per country.
9. Which filter removes rows with NULL values in a column?
A.
WHERE SalesAmount = NULL
B.
WHERE SalesAmount <> NULL
C.
WHERE SalesAmount IS NOT NULL
D.
WHERE NOT NULL SalesAmount
✅ Correct Answer: C
Explanation: SQL uses IS NULL and IS NOT NULL to check for null values.
10. Which SQL pattern is most efficient for analytics queries in Microsoft Fabric?
A. Selecting all columns and filtering later B. Using SELECT * for simplicity C. Filtering early and selecting only needed columns D. Aggregating without grouping
✅ Correct Answer: C
Explanation: Filtering early and selecting only required columns improves performance by reducing data movement—an important Fabric best practice.
In Microsoft Fabric, the Visual Query Editor (VQE) provides a low-code, graphical experience for querying data across lakehouses, warehouses, and semantic models. It allows analytics engineers to explore, shape, and summarize data without writing SQL or KQL, while still generating optimized queries behind the scenes.
For the DP-600 exam, you should understand what the Visual Query Editor is, where it’s used, and how to perform common data analysis tasks such as selecting columns, filtering rows, and aggregating data.
What Is the Visual Query Editor?
The Visual Query Editor is a graphical query-building interface available in multiple Fabric experiences, including:
Lakehouse SQL analytics endpoint
Warehouse
Power BI (Direct Lake and DirectQuery scenarios)
Data exploration within Fabric items
Instead of writing queries manually, you interact with:
Tables and columns
Drag-and-drop operations
Menus for filters, grouping, and aggregations
Fabric then translates these actions into optimized SQL or engine-specific queries.
Selecting Data
Selecting data in the Visual Query Editor focuses on choosing the right columns and datasets for analysis.
Key Capabilities
Select or deselect columns from one or more tables
Rename columns for readability
Reorder columns for analysis or reporting
Combine columns from related tables (via existing relationships)
Exam Tips
Selecting fewer columns improves performance and reduces data transfer.
Column renaming in VQE affects the query result, not the underlying table schema.
The Visual Query Editor respects relationships defined in semantic models and warehouses.
Filtering Data
Filtering allows you to limit rows based on conditions, ensuring only relevant data is included.
Common Filter Types
Equality filters (e.g., Status = "Active")
Range filters (e.g., dates, numeric thresholds)
Text filters (contains, starts with, ends with)
Null / non-null filters
Relative date filters (last 7 days, current month)
Where Filtering Is Applied
At the query level, not permanently in the data source
Before aggregation (important for correct results)
Exam Tips
Filters applied in the Visual Query Editor are executed at the data source when possible (query folding).
Filtering early improves performance and reduces memory usage.
Be aware of how filters interact with aggregations.
Aggregating Data
Aggregation summarizes data by grouping rows and applying calculations.
Common Aggregations
Sum
Count / Count Distinct
Average
Min / Max
Grouping Data
Select one or more columns as group-by fields
Apply aggregations to numeric or date columns
Results return one row per group
Examples
Total sales by product category
Count of orders per customer
Average response time by day
Exam Tips
Aggregations in the Visual Query Editor are conceptually similar to GROUP BY in SQL.
Aggregated queries reduce dataset size and improve performance.
Understand the difference between row-level data and aggregated results.
Behind the Scenes: Generated Queries
Although the Visual Query Editor is low-code, Fabric generates:
SQL queries for warehouses and lakehouse SQL endpoints
Optimized engine-specific queries for semantic models
This ensures:
Efficient execution
Compatibility with Direct Lake and DirectQuery
Consistent results across Fabric experiences
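For example, a visual query that filters a date range and groups sales by category might be translated into SQL roughly like the following (table and column names are illustrative):
SELECT
    ProductCategory,
    SUM(SalesAmount) AS TotalSales
FROM Sales
WHERE OrderDate >= '2025-01-01'
GROUP BY ProductCategory;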
Exam Tip
You are not required to read or write the generated SQL, but you should understand that the Visual Query Editor does not bypass query optimization.
When to Use the Visual Query Editor
Use the Visual Query Editor when:
Quickly exploring unfamiliar datasets
Building queries without writing code
Creating reusable query logic for reports
Teaching or collaborating with less SQL-focused users
Avoid it when:
Complex transformations are required (use SQL, Spark, or Dataflows)
Highly customized logic is needed beyond supported operations
Key Exam Takeaways
For the DP-600 exam, remember:
The Visual Query Editor is a graphical query-building tool in Microsoft Fabric.
It supports selecting columns, filtering rows, and aggregating data.
Operations are translated into optimized queries executed at the data source.
Filtering occurs before aggregation, affecting results and performance.
It is commonly used with lakehouses, warehouses, and semantic models.
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Expect scenario-based questions rather than direct definitions
Know the purpose and scope of the Visual Query Editor
Know how to select, filter, and aggregate data
Understand execution order and performance implications
Know when to use (and not use) the Visual Query Editor
Question 1
What is the primary purpose of the Visual Query Editor in Microsoft Fabric?
A. To permanently modify table schemas B. To build queries visually without writing SQL C. To replace semantic models D. To manage workspace permissions
Correct Answer: B
Explanation: The Visual Query Editor provides a low-code, graphical interface for building queries. It does not modify schemas, replace models, or manage security.
Question 2
When you deselect a column in the Visual Query Editor, what happens?
A. The column is deleted from the source table B. The column is hidden permanently for all users C. The column is excluded only from the query results D. The column data type is changed
Correct Answer: C
Explanation: Column selection affects only the query output, not the underlying data or schema.
Question 3
Why is it considered a best practice to select only required columns in a query?
A. It enforces data security B. It reduces query complexity and improves performance C. It enables Direct Lake mode D. It prevents duplicate rows
Correct Answer: B
Explanation: Selecting fewer columns reduces data movement and memory usage, leading to better performance.
Question 4
Which type of filter is commonly used to restrict data to a recent time period?
A. Equality filter B. Text filter C. Relative date filter D. Aggregate filter
Correct Answer: C
Explanation: Relative date filters (e.g., “Last 30 days”) dynamically adjust based on the current date and are commonly used in analytics.
Question 5
At what stage of query execution are filters applied in the Visual Query Editor?
A. After aggregation B. After the query result is returned C. Before aggregation D. Only in the Power BI report layer
Correct Answer: C
Explanation: Filters are applied before aggregation, ensuring accurate summary results and better performance.
Question 6
Which aggregation requires grouping to produce meaningful results?
A. SUM B. COUNT C. GROUP BY D. MIN
Correct Answer: C
Explanation: Grouping defines how rows are summarized. Aggregations like SUM or COUNT rely on GROUP BY logic to produce per-group results.
Question 7
You want to see total sales by product category. Which Visual Query Editor actions are required?
A. Filter Product Category and sort by Sales B. Group by Product Category and apply SUM to Sales C. Count Product Category and filter Sales D. Rename Product Category and aggregate rows
Correct Answer: B
Explanation: This scenario requires grouping on Product Category and applying a SUM aggregation to the Sales column.
Question 8
What happens behind the scenes when you build a query using the Visual Query Editor?
A. Fabric stores a cached dataset only B. Fabric generates optimized SQL or engine-specific queries C. Fabric converts the query into DAX D. Fabric disables query folding
Correct Answer: B
Explanation: The Visual Query Editor translates visual actions into optimized queries (such as SQL) that execute at the data source.
Question 9
Which Fabric items commonly support querying through the Visual Query Editor?
A. Pipelines and notebooks only B. Dashboards only C. Lakehouses, warehouses, and semantic models D. Eventhouses only
Correct Answer: C
Explanation: The Visual Query Editor is widely used across lakehouses, warehouses, and semantic models in Fabric.
Question 10
When should you avoid using the Visual Query Editor?
A. When exploring new datasets B. When building quick aggregations C. When complex transformation logic is required D. When filtering data
Correct Answer: C
Explanation: For advanced or complex transformations, SQL, Spark, or Dataflows are more appropriate than the Visual Query Editor.
Filtering data is one of the most fundamental transformation operations used when preparing analytics data. It ensures that only relevant, valid, and accurate records are included in curated tables or models. Filtering improves performance, reduces unnecessary processing overhead, and helps enforce business logic early in the analytics pipeline.
In Microsoft Fabric, filtering occurs at multiple transformation layers — from ingestion tools to interactive modeling. For the DP-600 exam, you should understand where, why, and how to filter data effectively using various tools and technologies within Fabric.
Why Filter Data?
Filtering data serves several key purposes in analytics:
1. Improve Query and Report Performance
Reduces the amount of data scanned and processed
Enables faster refresh and retrieval
2. Enforce Business Logic
Excludes irrelevant segments (e.g., test data, canceled transactions)
Supports clean analytical results
3. Prepare Analytics-Ready Data
Limits datasets to required time periods or categories
Produces smaller, focused outputs for reporting
4. Reduce Cost
Smaller processing needs reduce compute and storage overhead
Where Filtering Happens in Microsoft Fabric
Filtering can be implemented at multiple stages:
Power Query (Dataflows Gen2 / Lakehouse): UI filters or M code
SQL (Warehouse & Lakehouse SQL analytics): WHERE clauses
Spark (Lakehouse notebooks): DataFrame filter() / where()
Pipelines (data movement): source filters or query-based extraction
Semantic models (Power BI / DAX): query filters, slicers, and row-level security
Filtering early, as close to the data source as possible, ensures better performance downstream.
Tools and Techniques
1. Power Query (Low-Code)
Power Query provides a user-friendly interface to filter rows:
Text filters: Equals, Begins With, Contains, etc.
Number filters: Greater than, Between, Top N, etc.
Date filters: Before, After, This Month, Last 12 Months, etc.
Remove blank or null values
These filters are recorded as transformation steps and can be reused or versioned.
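Behind the UI, each filter is recorded as a step in the query's M script. A small illustrative sketch (the sample rows stand in for the real source step):
let
    // Sample data standing in for the output of earlier steps
    Source = #table(
        {"Country", "SalesAmount"},
        {{"USA", 1500}, {"USA", 400}, {"Canada", 2200}}
    ),
    // Equivalent of a UI number filter: keep USA rows over 1,000
    FilteredRows = Table.SelectRows(Source, each [Country] = "USA" and [SalesAmount] > 1000)
in
    FilteredRows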
2. SQL (Warehouses & Lakehouses)
SQL filtering uses the WHERE clause:
SELECT *
FROM Sales
WHERE OrderDate >= '2025-01-01'
AND Country = 'USA';
SQL filtering is efficient and pushed down to the engine, reducing row counts early.
3. Spark (Notebooks)
Filtering in Spark (PySpark example):
filtered_df = df.filter(df["SalesAmount"] > 1000)
Or with SQL in Spark:
SELECT *
FROM sales
WHERE SalesAmount > 1000;
Spark filtering is optimized for distributed processing across big datasets.
4. Pipelines (Data Movement)
During ingestion or ETL, you can apply filters in:
Copy activity query filters
Source queries
Pre-processing steps
This ensures only needed rows land in the target store.
5. Semantic Model Filters
In Power BI and semantic models, filtering can happen as report, page, and visual filters, slicers, and row-level security (RLS).
These filters control what users see rather than what data is stored.
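For example, a row-level security role defined on the model might use a DAX filter expression like the following (table and column names are illustrative):
'Sales'[Country] = "USA"
-- or, for dynamic security tied to the signed-in user:
'Customer'[Email] = USERPRINCIPALNAME()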
Business and Data Quality Scenarios
Filtering is often tied to business needs such as:
Excluding invalid, test, or archived records
Restricting to active customers only
Selecting a specific date range (e.g., last fiscal year)
Filtering data for regional or product segments
Filtering vs Security
It’s important to distinguish filtering for transformation from security filters:
Filtering: removes unwanted rows during transformation, improves performance, and happens before modeling.
Security: controls what users are allowed to see, enforces access control, and is applied during query evaluation.
Best Practices
When filtering data in Microsoft Fabric:
Filter early in the pipeline to reduce volume
Use pushdown filters in SQL when querying large sources
Document filtering logic for audit and governance
Combine filters logically (AND/OR) to match business rules
Avoid filtering in the semantic model when it can be done upstream
Common Exam Scenarios
You may be asked to:
Choose the correct tool and stage for filtering
Translate business rules into filter logic
Recognize when filtering improves performance
Identify risks of filtering too late or in the wrong layer
Example exam prompt: “A dataset should exclude test transactions and include only the last 12 months of sales. Which transformation step should be applied and where?” The correct answer will involve filtering early with SQL or Power Query before modeling.
Key Takeaways
Filtering data is a core part of preparing analytics-ready datasets.
Multiple Fabric components support filtering (Power Query, SQL, Spark, pipelines).
Filtering early improves performance and reduces unnecessary workload.
Understand filtering in context — transformation vs. security.
Final Exam Tips
When a question asks about reducing dataset size, improving performance, or enforcing business logic before loading into a model, filtering is often the correct action — and it usually belongs upstream.
Filter early and upstream whenever possible
Use SQL or Power Query for transformation-level filtering
Avoid relying solely on report-level filters for large datasets
Distinguish filtering for performance from security filtering
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Expect scenario-based questions rather than direct definitions
Question 1
What is the primary purpose of filtering data during the transformation phase?
A. To enforce user-level security B. To reduce data volume and improve performance C. To encrypt sensitive columns D. To normalize data structures
Correct Answer: B
Explanation: Filtering removes unnecessary rows early in the pipeline, reducing data volume, improving performance, and lowering compute costs. Security and normalization are separate concerns.
Question 2
Which Fabric component allows low-code, UI-driven row filtering during data preparation?
A. Spark notebooks B. SQL warehouse C. Power Query (Dataflows Gen2) D. Semantic models
Correct Answer: C
Explanation: Power Query provides a graphical interface for filtering rows using text, numeric, and date-based filters, making it ideal for low-code transformations.
Question 3
Which SQL clause is used to filter rows in a lakehouse or warehouse?
A. GROUP BY B. HAVING C. WHERE D. ORDER BY
Correct Answer: C
Explanation: The WHERE clause filters rows before aggregation or sorting, making it the primary SQL mechanism for data filtering.
Question 4
Which filtering approach is most efficient for very large datasets?
A. Filtering in Power BI visuals B. Filtering after loading data into a semantic model C. Filtering at the source using SQL or ingestion queries D. Filtering using calculated columns
Correct Answer: C
Explanation: Filtering as close to the source as possible minimizes data movement and processing, making it the most efficient approach for large datasets.
Question 5
In a Spark notebook, which method is commonly used to filter a DataFrame?
A. select() B. filter() or where() C. join() D. distinct()
Correct Answer: B
Explanation: Spark DataFrames use filter() or where() to remove rows based on conditions.
Question 6
Which scenario is an example of business-rule filtering?
A. Removing duplicate rows B. Converting text to numeric data types C. Excluding canceled orders from sales analysis D. Creating a star schema
Correct Answer: C
Explanation: Business-rule filtering enforces organizational logic, such as excluding canceled or test transactions from analytics.
Question 7
What is the key difference between data filtering and row-level security (RLS)?
A. Filtering improves query speed; RLS does not B. Filtering removes data; RLS restricts visibility C. Filtering is applied only in SQL; RLS is applied only in Power BI D. Filtering is mandatory; RLS is optional
Correct Answer: B
Explanation: Filtering removes rows from the dataset, while RLS controls which rows users can see without removing the data itself.
Question 8
Which filtering method is typically applied after data has already been loaded?
A. Source query filters B. Pipeline copy activity filters C. Semantic model report filters D. Power Query transformations
Correct Answer: C
Explanation: Report and visual filters in semantic models are applied at query time and do not reduce stored data volume.
Question 9
Why is filtering data early in the pipeline considered a best practice?
A. It increases data redundancy B. It simplifies semantic model design C. It reduces processing and storage costs D. It improves data encryption
Correct Answer: C
Explanation: Early filtering minimizes unnecessary data processing and storage, improving efficiency across the entire analytics solution.
Question 10
A dataset should include only the last 12 months of data. Where should this filter ideally be applied?
A. In Power BI slicers B. In the semantic model C. During data ingestion or transformation D. In calculated measures
Correct Answer: C
Explanation: Applying time-based filters during ingestion or transformation ensures only relevant data is processed and stored, improving performance and consistency.
Converting data types is a fundamental transformation task in data preparation. It helps ensure data consistency, accurate calculations, filter behavior, sorting, joins, and overall query correctness. In Microsoft Fabric, data type conversion can happen in Power Query, SQL, or Spark depending on the workload and where you are in your data pipeline.
This article explains why, where, and how you convert data types in Fabric, with an emphasis on real-world scenarios and exam relevance.
Why Convert Data Types?
Data type mismatches can lead to:
Erroneous joins (e.g., joining text to numeric)
Incorrect aggregations (e.g., sums treating numbers as text)
Filtering issues (e.g., date strings not filtering as dates)
Unexpected sort order (e.g., text sorts differently from numbers)
In analytics, getting data types right is critical for both the correctness of results and query performance.
Common Data Types in Analytics
Here are some common data types you’ll work with:
Numeric: INT, BIGINT, DECIMAL, FLOAT
Text: STRING, VARCHAR
Date/Time: DATE, TIME, DATETIME, TIMESTAMP
Boolean: TRUE / FALSE
Where Data Type Conversion Occurs in Fabric
Depending on workload and tool, you may convert data types in:
Power Query (Dataflows Gen2 & Lakehouses)
Visual change type steps (Menu → Transform → Data Type)
Applied steps stored in the query
Useful for low-code transformation
SQL (Warehouse & Lakehouse SQL Analytics)
CAST, CONVERT, or TRY_CAST in SQL
Applies at query time or when persisting transformed data
Spark (Lakehouse Notebooks)
Explicit schema definitions
Transformation commands like withColumn() with type conversion functions
Each environment has trade-offs. For example, Power Query is user-friendly but may not scale like SQL or Spark for very large datasets.
How to Convert Data Types
In Power Query
Select the column
Go to Transform → Data Type
Choose the correct type (e.g., Whole Number, Decimal Number, Date)
Power Query generates a Change Type step that applies at refresh.
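The generated step is typically a Table.TransformColumnTypes call in M, roughly like this (column names are illustrative):
// Converts OrderDate to a date and SalesAmount to a decimal number in one step
ChangedType = Table.TransformColumnTypes(Source, {{"OrderDate", type date}, {"SalesAmount", type number}})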
In SQL
SELECT
CAST(order_amount AS DECIMAL(18,2)) AS order_amount,
CONVERT(DATE, order_date) AS order_date
FROM Sales;
CAST() and CONVERT() are standard.
Some engines support TRY_CAST() to avoid errors on incompatible values.
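In Spark, the same conversions use withColumn() with cast(). A minimal PySpark sketch, assuming an existing DataFrame df with string columns order_amount and order_date:
from pyspark.sql import functions as F

# Cast string columns to the numeric and date types analytics expects
typed_df = (
    df.withColumn("order_amount", F.col("order_amount").cast("decimal(18,2)"))
      .withColumn("order_date", F.col("order_date").cast("date"))
)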
When creating semantic models (Power BI datasets):
Data types determine field behavior (e.g., date hierarchies)
Incorrect types can cause:
Incorrect aggregations
Misleading visuals
DAX errors
Always validate types before importing data into the model.
Best Practices
Always validate data values before conversion
Use schema enforcement where possible (e.g., Spark schema)
Avoid implicit type conversions during joins
Keep logs or steps of transformations for reproducibility
Key Takeaways for the DP-600 Exam
Know why data type conversion matters for analytics
Be able to choose the right tool (Power Query / SQL / Spark) for the context
Understand common conversions (text→numeric, text→date, boolean conversion)
Recognize when conversion must occur in the pipeline for correctness and performance
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Expect scenario-based questions rather than direct definitions
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Keep in mind that if a question mentions unexpected calculations, broken joins, or filtering issues, always consider data type mismatches as a possible root cause.
Question 1
Why is converting column data types important in an analytics solution?
A. It reduces storage costs B. It ensures accurate calculations, joins, and filtering C. It improves report visuals automatically D. It encrypts sensitive data
✅ Correct Answer: B
Explanation: Correct data types ensure accurate aggregations, proper join behavior, correct filtering, and predictable sorting.
Question 2
Which Fabric tool provides a visual, low-code interface for changing column data types?
A. SQL Analytics endpoint B. Spark notebooks C. Power Query D. Eventhouse
✅ Correct Answer: C
Explanation: Power Query allows users to change data types through a graphical interface and automatically records the steps.
Question 3
What is a common risk when converting text values to numeric data types?
A. Increased storage usage B. Duplicate rows C. Conversion errors or null values D. Slower report rendering
✅ Correct Answer: C
Explanation: Text values that are not valid numbers may cause conversion failures or be converted to nulls, depending on the method used.
Question 4
Which SQL function safely attempts to convert a value and returns NULL if conversion fails?
A. CAST B. CONVERT C. TRY_CAST D. FORMAT
✅ Correct Answer: C
Explanation: TRY_CAST avoids query failures by returning NULL when a value cannot be converted.
Question 5
When should data types ideally be converted in a Fabric analytics pipeline?
A. At report query time B. After publishing reports C. Early in the transformation process D. Only in the semantic model
✅ Correct Answer: C
Explanation: Converting data types early prevents downstream issues in joins, aggregations, and semantic models.
Question 6
Which data type is most appropriate for calendar-based filtering and time intelligence?
A. Text B. Integer C. Date or DateTime D. Boolean
✅ Correct Answer: C
Explanation: Date and DateTime types enable proper time-based filtering, hierarchies, and time intelligence calculations.
Question 7
Which Spark operation converts a column’s data type?
A. changeType() B. convert() C. cast() D. toType()
✅ Correct Answer: C
Explanation: The cast() method is used in Spark to convert a column’s data type.
Question 8
Why can implicit data type conversion during joins be problematic?
A. It improves performance B. It hides data lineage C. It may cause incorrect matches or slow performance D. It automatically removes duplicates
✅ Correct Answer: C
Explanation: Implicit conversions can prevent index usage and lead to incorrect or inefficient joins.
Question 9
A numeric column is stored as text and sorts incorrectly (e.g., 1, 10, 2). What is the cause?
A. Incorrect aggregation B. Missing values C. Wrong data type D. Duplicate rows
✅ Correct Answer: C
Explanation: Text sorting is lexicographical, not numeric, leading to incorrect ordering.
Question 10
What is the impact of incorrect data types in a Power BI semantic model?
A. Only visuals are affected B. Aggregations, filters, and DAX behavior may be incorrect C. Reports fail to load D. Sensitivity labels are removed
✅ Correct Answer: B
Explanation: Data types influence how fields behave in calculations, visuals, and DAX expressions.
Ensuring data quality is foundational for reliable analytics. Duplicate records, missing values, and nulls can lead to inaccurate aggregations, misleading insights, and broken joins. Microsoft Fabric provides multiple tools and techniques to identify, investigate, and resolve these issues during data preparation.
Why Data Quality Matters
Poor data quality can cause:
Incorrect business metrics (e.g., inflated counts)
Failed joins or mismatches
Incorrect aggregates or KPIs
Discrepancies across reports
The DP-600 exam expects you to know how to detect and fix these issues using Fabric’s transformation tools — without degrading performance or losing important data.
Key Data Quality Issues
1. Duplicate Data
Duplicates occur when the same record appears multiple times. Common causes:
Repeated ingestion jobs
Incorrect joins
Source system errors
Impact of duplicates:
Inflated metrics
Misleading counts
Distorted analytics
2. Missing Data
Missing data refers to complete absence of expected rows for certain categories or time periods.
Examples:
No sales records for a specific store in a date range
Missing customer segments
Impact:
Bias in analysis
Understated performance
3. Null Values
Nulls represent unknown or undefined values in a dataset.
Common cases:
Missing customer name
Missing numeric values
Unpopulated fields in incomplete records
Consequences:
SQL functions may ignore nulls
Aggregations may be skewed
Joins may fail or produce incorrect results
Tools and Techniques in Microsoft Fabric
1. Power Query (Dataflows Gen2 / Lakehouse)
Power Query provides a visual and programmatic interface to clean data:
Remove duplicates: Home → Remove Rows → Remove Duplicates
Replace or fill nulls: Transform → Replace Values Or use Fill Up / Fill Down
Filter nulls: Filter rows where column is null or not null
Benefits:
No-code/low-code
Reusable transformation steps
Easy preview and validation
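Expressed in M, the same cleanup might look like this (sample rows stand in for the real source; in the UI these correspond to Remove Duplicates and Replace Values):
let
    Source = #table(
        {"CustomerID", "CustomerName"},
        {{1, "Contoso"}, {1, "Contoso"}, {2, null}}
    ),
    // Keep one row per business key (Remove Duplicates)
    RemovedDuplicates = Table.Distinct(Source, {"CustomerID"}),
    // Substitute "Unknown" for nulls in the name column (Replace Values)
    ReplacedNulls = Table.ReplaceValue(RemovedDuplicates, null, "Unknown", Replacer.ReplaceValue, {"CustomerName"})
in
    ReplacedNulls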
2. SQL (Warehouses / Lakehouse SQL Analytics)
Using SQL, you can identify and fix issues:
Detect duplicates:
SELECT Col1, Col2, COUNT(*) AS Cnt
FROM table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1;
Remove duplicates (example pattern):
WITH RankedRows AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY keycol ORDER BY keycol) AS rn
FROM table
)
SELECT * FROM RankedRows WHERE rn = 1;
Replace nulls:
SELECT COALESCE(column, 0) AS column_fixed
FROM table;
3. Spark (Lakehouses via Notebooks)
Identify nulls:
df.filter(df["column"].isNull()).show()
Drop duplicates:
df.dropDuplicates(["keycol"])
Fill nulls:
df.na.fill({"column": "Unknown"})
Best Practices for Resolution
Addressing Duplicates
Use business keys (unique identifiers) to define duplicates
Validate whether duplicates are true duplicates or legitimate repeats
Document deduplication logic
Handling Nulls
Use domain knowledge to decide substitute values
Zero for numeric
“Unknown” or “Not Provided” for text
Preserve nulls when they carry meaning (e.g., missing responses)
Handling Missing Data
Understand the business meaning
Is absence valid?
Should data be imputed?
Or should missing rows be generated via reference tables?
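One way to surface missing rows is to left join a date reference table to the fact table and keep the unmatched dates. An illustrative SQL sketch (table and column names are assumptions):
SELECT d.CalendarDate
FROM DimDate AS d
LEFT JOIN Sales AS s
    ON s.OrderDate = d.CalendarDate
WHERE s.OrderDate IS NULL
  AND d.CalendarDate BETWEEN '2025-01-01' AND '2025-12-31';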
Data Profiling
Use profiling to understand distributions and quality:
Column completeness
Unique value distribution
Null frequency
Data profiling helps you decide which cleaning steps are required.
When to Clean Data in Fabric
Data quality transformations should be performed:
Early in the pipeline (at the ingestion or transformation layer)
Before building semantic models
Before aggregations or joins
Before publishing curated datasets
Early cleaning prevents issues from propagating into semantic models and reports.
Exam Scenarios
In DP-600 exam questions, you might see scenarios like:
Metrics appear inflated due to duplicate records
Reports show missing date ranges
Joins fail due to null key values
Aggregations ignore null values
Your job is to choose the correct transformation action — e.g., filtering nulls, deduplicating, replacing values, or imputing missing data — and the best tool (Power Query vs SQL vs Spark).
Key Takeaways
Duplicate rows inflate counts and distort analytics.
Missing rows can bias time-series or segment analysis.
Null values can break joins and cause incorrect aggregation results.
Effective resolution relies on understanding business context and using the right Fabric tools.
Clean data early for better downstream performance and governance.
Final Exam Tip: If a metric doesn’t look right, think data quality first. Missing or null values and duplicates are among the most common real-world issues covered in DP-600 scenarios.
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the usage scenarios they point to
Expect scenario-based questions rather than direct definitions
General Exam Tips for this section: If something looks wrong in a report:
Too high? → Check for duplicates
Blank or missing? → Check for nulls or missing rows
Not joining correctly? → Check nulls and key integrity
Question 1
Which issue is most likely to cause inflated totals in aggregated metrics?
A. Null values in numeric columns B. Missing rows for a time period C. Duplicate records D. Incorrect column data types
✅ Correct Answer: C
Explanation: Duplicate records result in the same data being counted more than once, which inflates sums, counts, and averages.
Question 2
In Power Query, which action is used to remove duplicate rows?
A. Filter Rows B. Group By C. Remove Duplicates D. Replace Values
✅ Correct Answer: C
Explanation: The Remove Duplicates operation removes repeated rows based on selected columns.
Question 3
Which SQL function is commonly used to replace null values with a default value?
A. NULLIF B. ISNULL or COALESCE C. COUNT D. CAST
✅ Correct Answer: B
Explanation: ISNULL() and COALESCE() return a specified value when a column contains NULL.
Question 4
Why can null values cause problems in joins?
A. Nulls increase query runtime B. Nulls are treated as zero C. Nulls never match other values D. Nulls are automatically filtered
✅ Correct Answer: C
Explanation: NULL values do not match any value (including other NULLs), which can cause rows to be excluded from join results.
Question 5
Which scenario best justifies keeping null values rather than replacing them?
A. The column is used in joins B. The null indicates “unknown” or “not applicable” C. The column is numeric D. The column has duplicates
✅ Correct Answer: B
Explanation: Nulls may carry important business meaning and should be preserved when they accurately represent missing or unknown information.
Question 6
Which Fabric tool is most appropriate for visual data profiling to identify missing and null values?
A. Power BI visuals B. Power Query C. Semantic models D. Eventhouse
✅ Correct Answer: B
Explanation: Power Query provides built-in data profiling features such as column distribution, column quality, and column profile.
Question 7
What is the purpose of using an anti join when checking data quality?
A. To merge tables B. To append data C. To identify unmatched records D. To replace null values
✅ Correct Answer: C
Explanation: Anti joins return rows that do not have a match in another table, making them ideal for identifying missing or orphaned records.
Question 8
Which approach is considered a best practice for handling data quality issues?
A. Fix issues only in reports B. Clean data as late as possible C. Resolve issues early in the pipeline D. Ignore null values
✅ Correct Answer: C
Explanation: Resolving data quality issues early prevents them from propagating into semantic models and reports.
Question 9
Which Spark operation removes duplicate rows from a DataFrame?
A. filter() B. groupBy() C. dropDuplicates() D. distinctColumns()
✅ Correct Answer: C
Explanation: dropDuplicates() removes duplicate rows based on one or more columns.
Question 10
A report is missing values for several dates. What is the most likely cause?
A. Duplicate rows B. Incorrect aggregation logic C. Missing source data D. Incorrect data type conversion
✅ Correct Answer: C
Explanation: Missing dates usually indicate that source records are absent rather than null or duplicated.
Merging or joining data is a fundamental transformation task in Microsoft Fabric. It enables you to combine related data from multiple tables or sources into a single dataset for analysis, modeling, or reporting. This skill is essential for preparing clean, well-structured data in lakehouses, warehouses, dataflows, and Power BI semantic models.
For the DP-600 exam, you are expected to understand when, where, and how to merge or join data using the appropriate Fabric tools, as well as the implications for performance, data quality, and modeling.
Merge vs. Join: Key Distinction
Although often used interchangeably, the terms have slightly different meanings depending on the tool:
Merge
Commonly used in Power Query
Combines tables by matching rows based on one or more key columns
Produces a new column that can be expanded
Join
Commonly used in SQL and Spark
Combines tables using explicit join logic (JOIN clauses)
Output schema is defined directly in the query
Where Merging and Joining Occur in Fabric
Fabric Experience → How It's Done
Power Query (Dataflows Gen2, Lakehouse) → Merge Queries UI
Warehouse → SQL JOIN statements
Lakehouse (Spark notebooks) → DataFrame joins
Power BI Desktop → Power Query merges
Common Join Types (Exam-Critical)
Understanding join types is heavily tested; a Spark sketch follows the list below:
Inner Join
Returns only matching rows from both tables
Left Outer Join
Returns all rows from the left table and matching rows from the right
Right Outer Join
Returns all rows from the right table and matching rows from the left
Full Outer Join
Returns all rows from both tables
Left Anti / Right Anti Join
Returns rows with no match in the other table
👉 Exam tip: Anti joins are commonly used to identify missing or unmatched data.
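A minimal PySpark sketch of these join types, using hypothetical orders and customers tables and assuming a Fabric notebook, might look like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [(1, "C1", 100.0), (2, "C2", 50.0), (3, "C9", 75.0)],  # C9 has no matching customer
    ["order_id", "customer_id", "amount"],
)
customers = spark.createDataFrame(
    [("C1", "Alice"), ("C2", "Bob"), ("C3", "Cara")],  # C3 has no orders
    ["customer_id", "customer_name"],
)

# Inner join: only orders with a matching customer (order 3 is dropped)
orders.join(customers, "customer_id", "inner").show()

# Left outer join: all orders, with nulls where no customer matches
orders.join(customers, "customer_id", "left").show()

# Full outer join: all rows from both sides
orders.join(customers, "customer_id", "full").show()

# Left anti join: orders whose customer_id has no match (useful for data quality checks)
orders.join(customers, "customer_id", "left_anti").show()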
Join Keys and Data Quality Considerations
Before merging or joining data, it’s critical to ensure:
Join columns:
Have matching data types
Are cleaned and standardized
Represent the same business entity
Duplicate values in join keys can:
Create unexpected row multiplication (see the sketch below)
Impact aggregations and performance
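The PySpark sketch below, built on hypothetical data, illustrates the row multiplication that duplicate join keys can cause:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

facts = spark.createDataFrame([("P1", 10.0), ("P2", 20.0)], ["product_id", "amount"])

# The lookup table accidentally holds product P1 twice
products = spark.createDataFrame(
    [("P1", "Widget"), ("P1", "Widget (duplicate)"), ("P2", "Gadget")],
    ["product_id", "product_name"],
)

joined = facts.join(products, "product_id", "left")
joined.show()  # P1 now appears twice, so SUM(amount) would double-count it

# Deduplicating the lookup table on its business key before joining avoids this
facts.join(products.dropDuplicates(["product_id"]), "product_id", "left").show()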
Performance and Design Considerations
Prefer SQL joins or Spark joins for large datasets rather than Power Query
Filter and clean data before joining to reduce data volume (see the sketch after this list)
In dimensional modeling:
Fact tables typically join to dimension tables using left joins
Avoid unnecessary joins in the semantic layer when they can be handled upstream
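As an illustration of filtering before joining and of the typical fact-to-dimension left join, here is a small PySpark sketch with hypothetical tables; the broadcast hint is optional and just one way to keep a small dimension join cheap:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical fact and dimension tables (in practice, loaded from a lakehouse)
fact_sales = spark.createDataFrame(
    [(1, "C1", "2024", 100.0), (2, "C2", "2023", 50.0)],
    ["sale_id", "customer_id", "year", "amount"],
)
dim_customer = spark.createDataFrame(
    [("C1", "Alice"), ("C2", "Bob")],
    ["customer_id", "customer_name"],
)

# Filter first so less data flows into the join
current_sales = fact_sales.filter(F.col("year") == "2024")

# Fact-to-dimension left join; broadcasting the small dimension avoids a shuffle
enriched = current_sales.join(F.broadcast(dim_customer), "customer_id", "left")
enriched.show()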
Common Use Cases
Combining fact data with descriptive attributes
Enriching transactional data with reference or lookup tables
Building dimension tables for star schema models
Validating data completeness using anti joins
Exam Tips and Pitfalls
Don’t confuse merge vs. append (append stacks rows vertically)
Know which tool to use based on:
Data size
Refresh frequency
Complexity
Expect scenario questions asking:
Which join type to use
Where the join should occur in the architecture
Key Takeaways
Merging and joining data is essential for data preparation in Fabric
Different Fabric experiences offer different ways to join data
Correct join type and clean join keys are critical for accuracy
Performance and modeling best practices matter for the DP-600 exam
Practice Questions:
Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …
Identify and understand why an option is correct (or incorrect), not just which one
Look for keywords in exam questions and understand the scenario they point to
Expect scenario-based questions rather than direct definitions
Question 1
What is the primary purpose of merging or joining data in Microsoft Fabric?
A. To reduce storage costs B. To vertically stack tables C. To combine related data based on a common key D. To encrypt sensitive columns
✅ Correct Answer: C
Explanation: Merging or joining data combines related datasets horizontally using shared key columns so that related attributes appear in a single dataset.
Question 2
In Power Query, what is the result of a Merge Queries operation?
A. Rows from both tables are appended B. A new table is automatically created C. A new column containing related table data is added D. A relationship is created in the semantic model
✅ Correct Answer: C
Explanation: Power Query merges add a column that contains matching rows from the second table, which can then be expanded.
Question 3
Which join type returns only rows that exist in both tables?
A. Left outer join B. Right outer join C. Full outer join D. Inner join
✅ Correct Answer: D
Explanation: An inner join returns only rows with matching keys in both tables.
Question 4
You want to keep all rows from a fact table and bring in matching dimension attributes. Which join type should you use?
A. Inner join B. Left outer join C. Right outer join D. Full outer join
✅ Correct Answer: B
Explanation: A left outer join preserves all rows from the left (fact) table while bringing in matching rows from the dimension table.
Question 5
Which join type is most useful for identifying records that do not have a match in another table?
A. Inner join B. Full outer join C. Left anti join D. Right outer join
✅ Correct Answer: C
Explanation: A left anti join returns rows from the left table that do not have matching rows in the right table, making it ideal for data quality checks.
Question 6
What issue can occur when joining tables that contain duplicate values in the join key?
A. Data type conversion errors B. Row multiplication C. Data loss D. Query failure
✅ Correct Answer: B
Explanation: Duplicate keys can cause one-to-many or many-to-many matches, resulting in more rows than expected after the join.
Question 7
Which Fabric experience is best suited for performing joins on very large datasets?
A. Power BI Desktop B. Power Query C. Warehouse using SQL D. Excel
✅ Correct Answer: C
Explanation: SQL joins in a warehouse are optimized for large-scale data processing and typically outperform Power Query for large datasets.
Question 8
Which operation should not be confused with merging or joining data?
A. Append B. Inner join C. Left join D. Anti join
✅ Correct Answer: A
Explanation: Append stacks tables vertically (row-wise), while merges and joins combine tables horizontally (column-wise).
Question 9
What should you verify before merging two tables?
A. That both tables have the same number of rows B. That join columns use compatible data types C. That all columns are indexed D. That the tables are in the same workspace
✅ Correct Answer: B
Explanation: Join columns must have compatible data types and clean values; otherwise, matches may fail or produce incorrect results.
Question 10
From a modeling best-practice perspective, where should complex joins ideally be performed?
A. In Power BI visuals B. In DAX measures C. Upstream in lakehouse or warehouse transformations D. At query time in reports
✅ Correct Answer: C
Explanation: Performing joins upstream simplifies semantic models, improves performance, and ensures consistency across reports.