
Analytics, Business Intelligence, Business Intelligence (BI) Development, Data Analysis, Data Development, Data Integration, Data Modeling, Data Quality Assurance, Data Security, Data Strategy, DP-600, Microsoft Certification, Microsoft Fabric, Performance Tuning, Power BI, Power Query, Reporting, SQL | December 28, 2025 | January 5, 2026

Design and Build Composite Models (DP-600 Exam Prep)

This post is part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub. This topic falls under these sections:
Implement and manage semantic models (25-30%) 
    --> Design and build semantic models 
        --> Design and Build Composite Models

What Is a Composite Model?

A composite model in Power BI and Microsoft Fabric combines data from multiple data sources and multiple storage modes in a single semantic model. Rather than importing all data into the model’s in-memory cache, composite models let you mix different query/storage patterns such as:

Import
DirectQuery
Direct Lake
Live connections

Composite models enable flexible design and optimized performance across diverse scenarios.

Why Composite Models Matter

Semantic models often need to support:

Large datasets that cannot be imported fully
Real-time or near-real-time requirements
Federation across disparate sources
Mix of highly dynamic and relatively static data

Composite models let you combine the benefits of in-memory performance with direct source access.

Core Concepts

Storage Modes in Composite Models

Storage Mode	Description	Typical Use
Import	Data is cached in the semantic model memory	Fast performance for static or moderately sized data
DirectQuery	Queries are pushed to the source at runtime	Real-time or large relational sources
Direct Lake	Queries Delta tables in OneLake	Large OneLake data with faster interactive access
Live Connection	Delegates all query processing to an external model	Shared enterprise semantic models

A composite model may include tables using different modes — for example, imported dimension tables and DirectQuery/Direct Lake fact tables.

Key Features of Composite Models

1. Table-Level Storage Modes

Every table in a composite model may use a different storage mode:

Dimensions may be imported
Fact tables may use DirectQuery or Direct Lake
Bridge or helper tables may be imported

This flexibility enables performance and freshness trade-offs.

2. Relationships Across Storage Modes

Relationships can span tables even if they use different storage modes, enabling:

Filtering between imported and DirectQuery tables
Cross-mode joins (handled intelligently by the engine)

Underlying engines push queries to the appropriate source (SQL, OneLake, Semantic layer), depending on where the data resides.

3. Aggregations and Hierarchies

You can define:

Aggregated tables (pre-summarized import tables)
Detail tables (DirectQuery or Direct Lake)

Power BI automatically uses aggregations when a visual’s query can be satisfied with summary data, enhancing performance.
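
As a rough illustration of that matching idea, the Python sketch below decides whether a query can be answered from a pre-summarized table. It is a simplified model, not Power BI's actual algorithm; the function name can_use_aggregation and the column names are hypothetical.

```python
# Simplified model of aggregation matching (not Power BI's real algorithm).
# An aggregation table can answer a query only if it contains every
# grouping column and every (column, function) pair the query asks for.

def can_use_aggregation(query_group_by, query_measures, agg_group_by, agg_measures):
    """Return True when the summary table covers the whole query."""
    groups_covered = set(query_group_by) <= set(agg_group_by)
    measures_covered = set(query_measures) <= set(agg_measures)
    return groups_covered and measures_covered


# Hypothetical aggregation table: sales summed by year and product category.
agg_group_by = {"Year", "Category"}
agg_measures = {("SalesAmount", "sum")}

# A visual grouped by Year can be served from the summary table.
hit = can_use_aggregation({"Year"}, {("SalesAmount", "sum")}, agg_group_by, agg_measures)

# A visual grouped by CustomerID needs the detail table instead.
miss = can_use_aggregation({"CustomerID"}, {("SalesAmount", "sum")}, agg_group_by, agg_measures)

print(hit, miss)
```

The point of the sketch is the coverage test: when a visual asks for a grain finer than the summary table holds, the query goes to the detail table.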

4. Calculation Groups and Measures

Composite models work with complex semantic logic:

Calculation groups (standardized transformations)
DAX measures that span imported and DirectQuery tables

These models require careful modeling to ensure that context transitions behave predictably.

When to Use Composite Models

Composite models are ideal when:

A. Data Is Too Large to Import

Large fact tables (hundreds of millions of rows or more)
Delta/OneLake data too big for full in-memory import
Use Direct Lake for these, while importing dimensions

B. Real-Time Data Is Required

Operational reporting
Systems with high update frequency
Use DirectQuery to relational sources

C. Multiple Data Sources Must Be Combined

Relational databases
OneLake & Delta
Cloud services (e.g., Synapse, SQL DB, Spark)
On-prem gateways

Composite models let you combine these seamlessly.

D. Different Performance vs Freshness Needs

Import for static master data
DirectQuery or Direct Lake for dynamic fact data

Composite vs Pure Models

Aspect	Import Only	Composite
Performance	Very fast	Depends on source/query pattern
Freshness	Scheduled refresh	Real-time/near-real-time possible
Source diversity	Limited	Multiple heterogeneous sources
Model complexity	Simpler	Higher

Query Execution and Optimization

Query Folding

DirectQuery and Power Query transformations rely on query folding to push logic back to the source
Query folding is essential for performance in composite models

Storage Mode Selection

Good modeling practices for composite models include:

Import small dimension tables
Direct Lake for large storage in OneLake
DirectQuery for real-time relational sources
Use aggregations to optimize performance
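
The guidelines above can be sketched as a small decision helper. This is only a study aid in Python: the function suggest_storage_mode and the small_table_rows threshold are assumptions made for illustration, not Fabric limits or an official rule.

```python
# Study-aid heuristic mirroring the guidelines above.
# The row threshold is an arbitrary assumption, not a Fabric limit.

def suggest_storage_mode(row_count, in_onelake, needs_real_time, small_table_rows=1_000_000):
    """Suggest a storage mode for one table in a composite model."""
    if row_count <= small_table_rows and not needs_real_time:
        return "Import"          # small, fairly static tables such as dimensions
    if in_onelake:
        return "Direct Lake"     # large Delta tables stored in OneLake
    return "DirectQuery"         # relational sources that must stay current


dimension_mode = suggest_storage_mode(5_000, in_onelake=False, needs_real_time=False)
lake_fact_mode = suggest_storage_mode(2_000_000_000, in_onelake=True, needs_real_time=False)
sql_fact_mode = suggest_storage_mode(500_000_000, in_onelake=False, needs_real_time=True)

print(dimension_mode, lake_fact_mode, sql_fact_mode)
```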

Modeling Considerations

1. Relationship Direction

Prefer single-direction relationships
Use bidirectional filtering only when required (careful with ambiguity)

2. Data Type Consistency

Ensure fields used in joins have matching data types
In composite models, mismatches can cause query fallbacks
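
A minimal Python sketch of why matching key types matter. The table contents are invented for illustration, and the behavior of a real engine (implicit casts, fallbacks, or errors) depends on the source; the sketch only shows that text keys and integer keys do not match without a conversion.

```python
# Hypothetical dimension keyed by integers, fact rows keyed by text.
dim_product = {1001: "Bike", 1002: "Helmet"}
fact_rows = [("1001", 250.0), ("1002", 40.0)]

# Joining on the raw keys finds nothing because "1001" != 1001.
matched_raw = [(dim_product[key], amount) for key, amount in fact_rows if key in dim_product]

# Casting the fact key to the dimension's type restores the matches.
matched_cast = [
    (dim_product[int(key)], amount) for key, amount in fact_rows if int(key) in dim_product
]

print(matched_raw)
print(matched_cast)
```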

3. Cardinality

High cardinality DirectQuery columns can slow queries
Use star schema patterns

4. Security

Row-level security crosses modes but must be carefully tested
Security logic must consider where filters are applied

Common Exam Scenarios

Exam questions may ask you to:

Choose between Import, DirectQuery, Direct Lake and composite
Assess performance vs freshness requirements
Determine query folding feasibility
Identify correct relationship patterns across modes

Example prompt:

“Your model combines a large OneLake dataset and a small dimension table. Users need current data daily but also fast filtering. Which storage and modeling approach is best?”

Correct exam choices often point to composite models using Direct Lake + imported dimensions.

Best Practices

Define a clear star schema even in composite models
Import dimension tables where reasonable
Use aggregations to improve performance for heavy visuals
Limit direct many-to-many relationships
Use calculation groups to apply analytics consistently
Test query performance across storage modes

Exam-Ready Summary/Tips

Composite models enable flexible and scalable semantic models by mixing storage modes:

Import – best performance for static or moderate data
DirectQuery – real-time access to source systems
Direct Lake – scalable querying of OneLake Delta data
Live Connection – federated or shared datasets

Design composite models to balance performance, freshness, and data volume, using strong schema design and query optimization.

For DP-600, always evaluate:

Data volume
Freshness requirements
Performance expectations
Source location (OneLake vs relational)

Composite models are frequently the correct answer when these requirements conflict.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. What is the primary purpose of using a composite model in Microsoft Fabric?

A. To enable row-level security across workspaces
B. To combine multiple storage modes and data sources in one semantic model
C. To replace DirectQuery with Import mode
D. To enforce star schema design automatically

✅ Correct Answer: B

Explanation:
Composite models allow you to mix Import, DirectQuery, Direct Lake, and Live connections within a single semantic model, enabling flexible performance and data-freshness tradeoffs.

2. You are designing a semantic model with a very large fact table stored in OneLake and small dimension tables. Which storage mode combination is most appropriate?

A. Import all tables
B. DirectQuery for all tables
C. Direct Lake for the fact table and Import for dimension tables
D. Live connection for the fact table and Import for dimensions

✅ Correct Answer: C

Explanation:
Direct Lake is optimized for querying large Delta tables in OneLake, while importing small dimension tables improves performance for filtering and joins.

3. Which storage mode allows querying OneLake Delta tables without importing data into memory?

A. Import
B. DirectQuery
C. Direct Lake
D. Live Connection

✅ Correct Answer: C

Explanation:
Direct Lake reads Delta tables directly from OneLake and loads columns into memory on demand, so no scheduled import copy of the data is needed. This combines scalability with better interactive performance than traditional DirectQuery.

4. What happens when a DAX query in a composite model references both imported and DirectQuery tables?

A. The query fails
B. The data must be fully imported
C. The engine generates a hybrid query plan
D. All tables are treated as DirectQuery

✅ Correct Answer: C

Explanation:
Power BI’s engine generates a hybrid query plan, pushing operations to the source where possible and combining results with in-memory data.

5. Which scenario most strongly justifies using a composite model instead of Import mode only?

A. All data fits in memory and refreshes nightly
B. The dataset is static and small
C. Users require near-real-time data from a large relational source
D. The model contains only calculated tables

✅ Correct Answer: C

Explanation:
Composite models are ideal when real-time or near-real-time access is needed, especially for large datasets that are impractical to import.

6. In a composite model, which table type is typically best suited for Import mode?

A. High-volume transactional fact tables
B. Streaming event tables
C. Dimension tables with low cardinality
D. Tables requiring second-by-second freshness

✅ Correct Answer: C

Explanation:
Importing dimension tables improves query performance and reduces load on source systems due to their relatively small size and low volatility.

7. How do aggregation tables improve performance in composite models?

A. By replacing DirectQuery with Import
B. By pre-summarizing data to satisfy queries without scanning detail tables
C. By eliminating the need for relationships
D. By enabling bidirectional filtering automatically

✅ Correct Answer: B

Explanation:
Aggregations allow Power BI to answer queries using pre-summarized Import tables, avoiding expensive queries against large DirectQuery or Direct Lake fact tables.

8. Which modeling pattern is strongly recommended when designing composite models?

A. Snowflake schema
B. Flat tables
C. Star schema
D. Many-to-many relationships

✅ Correct Answer: C

Explanation:
A star schema simplifies relationships, improves performance, and reduces ambiguity—especially important in composite and cross-storage-mode models.

9. What is a potential risk of excessive bidirectional relationships in composite models?

A. Reduced data freshness
B. Increased memory consumption
C. Ambiguous filter paths and unpredictable query behavior
D. Loss of row-level security

✅ Correct Answer: C

Explanation:
Bidirectional relationships can introduce ambiguity, cause unexpected filtering, and negatively affect query performance—risks that are amplified in composite models.

10. Which feature allows a composite model to reuse an enterprise semantic model while extending it with additional data?

A. Direct Lake
B. Import mode
C. Live connection with local tables
D. Calculation groups

✅ Correct Answer: C

Explanation:
Adding local tables to a live-connected model converts the live connection into a DirectQuery connection to the shared semantic model. The result is a composite model that extends the enterprise model with new tables and measures.

Analytics, Business Intelligence, Business Intelligence (BI) Development, Data Analysis, Data Development, Data Governance, Data Integration, Data Integration (ETL), Data Modeling, Data Quality Assurance, Data Security, Data Strategy, Data Visualization, Data Warehousing, DP-600, Microsoft Certification, Microsoft Fabric, Performance Tuning, Power BI, Power Query, Python, Reporting, SQL | December 28, 2025

Implement a Star Schema for a Semantic Model

This post is part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub. This topic falls under these sections:
Implement and manage semantic models 
    --> Design and build semantic models 
        --> Implement a Star Schema for a Semantic Model

What Is a Star Schema?

A star schema is a logical data modeling pattern optimized for analytics and reporting. It organizes data into:

Fact tables: Contain numeric measurements (metrics) of business processes
Dimension tables: Contain descriptive attributes used for slicing, grouping, and filtering

The schema resembles a star: a central fact table with multiple dimensions radiating outward.

Why Use a Star Schema for Semantic Models?

Star schemas are widely used in Power BI semantic models (Tabular models) because they:

Improve query performance: Simplified joins and clear relationships enable efficient engine processing
Simplify reporting: Easy for report authors to understand and navigate
Support fast aggregations: Summary measures are computed more efficiently
Integrate with DAX naturally: Reduces complexity of measures

In DP-600 scenarios where performance and reusability matter, star schemas are often the best design choice.

Semantic Models and Star Schema

Semantic models define business logic that sits on top of data. Star schemas support semantic models by:

Providing clean dimensional context (e.g., Product, Region, Time)
Ensuring facts are centrally located for aggregations
Reducing the number of relationships and cycles
Enabling measures to be defined once and reused across visuals

Semantic models typically bring star schema tables in through the Import, Direct Lake, or DirectQuery storage modes.

Elements of a Star Schema

Fact Tables

A fact table stores measurable, numeric data about business events.

Examples:

Sales
Orders
Transactions
Inventory movements

Characteristics:

Contains foreign keys referring to dimensions
Contains numeric measures (e.g., quantity, revenue)

Dimension Tables

Dimension tables store contextual attributes that describe facts.

Examples:

Customer (name, segment, region)
Product (category, brand)
Date (calendar attributes)
Store or location

Characteristics:

Typically smaller than fact tables
Used to filter and group measures

Building a Star Schema for a Semantic Model

1. Identify the Grain of the Fact Table

The grain defines the level of detail in the fact table — for example:

One row per sales transaction per customer per day

Understand the grain before building dimensions.
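
One way to check a declared grain is to confirm that no combination of the grain columns appears on more than one row. The Python sketch below does that on invented sample rows; the helper grain_violations and the column names are hypothetical.

```python
from collections import Counter


def grain_violations(rows, grain_columns):
    """Return grain keys that appear on more than one row."""
    counts = Counter(tuple(row[col] for col in grain_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Invented fact rows; the last row repeats an earlier grain key.
fact_sales = [
    {"TransactionID": 1, "CustomerID": 10, "DateKey": 20250101, "SalesAmount": 100.0},
    {"TransactionID": 2, "CustomerID": 10, "DateKey": 20250101, "SalesAmount": 40.0},
    {"TransactionID": 2, "CustomerID": 10, "DateKey": 20250101, "SalesAmount": 40.0},
]

grain = ["TransactionID", "CustomerID", "DateKey"]
duplicates = grain_violations(fact_sales, grain)
print(duplicates)
```

An empty result means every row is unique at the stated grain; any key returned points to rows that would double count in measures.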

2. Design Dimension Tables

Dimensions should be:

Descriptive
De-duplicated
Hierarchical where relevant (e.g., Country > State > City)

Example:

DimProduct	DimCustomer	DimDate
ProductID	CustomerID	DateKey
Name	Name	Year
Category	Segment	Quarter
Brand	Region	Month

3. Define Relationships

Semantic models should have clear relationships:

Dimension → Fact: one-to-many (one dimension row relates to many fact rows)
No ambiguous cycles
Avoid overly complex circular relationships

In a star schema:

Fact table joins to each dimension
Dimensions do not join to each other directly

4. Import into Semantic Model

In Power BI Desktop or Fabric:

Load fact and dimension tables
Validate relationships
Ensure correct cardinality
Mark the Date dimension as a Date table if appropriate

Benefits in Semantic Modeling

Benefit	Description
Performance	Simplified relationships yield faster queries
Usability	Model is intuitive for report authors
Maintenance	Easier to document and manage
DAX Simplicity	Measures use clear filter paths

DAX and Star Schema

Star schemas make DAX measures more predictable:

Example measure:

Total Sales = SUM(FactSales[SalesAmount])

With a proper star schema:

Filtering by dimension (e.g., DimCustomer[Region] = "West") automatically propagates to the fact table
DAX measure logic is clean and consistent
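
The propagation described above can be mimicked in a few lines of Python. This is a conceptual sketch on invented rows, not how the engine is implemented: a filter on the dimension selects keys, and only fact rows with those keys are summed.

```python
# Invented star schema: one dimension and one fact table.
dim_customer = [
    {"CustomerID": 1, "Region": "West"},
    {"CustomerID": 2, "Region": "East"},
    {"CustomerID": 3, "Region": "West"},
]
fact_sales = [
    {"CustomerID": 1, "SalesAmount": 100.0},
    {"CustomerID": 2, "SalesAmount": 250.0},
    {"CustomerID": 3, "SalesAmount": 50.0},
    {"CustomerID": 1, "SalesAmount": 25.0},
]


def total_sales(region=None):
    """Sum SalesAmount for fact rows whose customer passes the region filter."""
    keys = {
        c["CustomerID"]
        for c in dim_customer
        if region is None or c["Region"] == region
    }
    # The dimension filter reaches the fact table through the shared key.
    return sum(f["SalesAmount"] for f in fact_sales if f["CustomerID"] in keys)


print(total_sales())
print(total_sales("West"))
```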

Star Schema vs Snowflake Schema

Feature	Star Schema	Snowflake Schema
Complexity	Simple	More complex
Query performance	Typically better	Slightly slower
Modeling effort	Lower	Higher
Normalization	Low	High

For analytical workloads (like in Fabric and Power BI), star schemas are generally preferred.

When to Apply a Star Schema

Use star schema design when:

You are building semantic models for BI/reporting
Data is sourced from multiple systems
You need to support slicing and dicing by multiple dimensions
Performance and maintainability are priorities

Semantic models built on star schemas work well with:

Import mode
Direct Lake with dimensional context
Composite models

Common Exam Scenarios

You might encounter questions like:

“Which table should be the fact in this model?”
“Why should dimensions be separated from fact tables?”
“How does a star schema improve performance in a semantic model?”

Key answers will focus on:

Simplified relationships
Better DAX performance
Intuitive filtering and slicing

Best Practices for Semantic Star Schemas

Explicitly define date tables and mark them as such
Avoid many-to-many relationships where possible
Keep dimensions denormalized (flattened)
Ensure fact tables have surrogate keys linking to dimensions
Validate cardinality and relationship directions

Exam Tip

If a question emphasizes performance, simplicity, clear filtering behavior, and ease of reporting, a star schema is likely the correct design choice / optimal answer.

Summary

Implementing a star schema for a semantic model is a proven best practice in analytics:

Central fact table
Descriptive dimensions
One-to-many relationships
Optimized for DAX and interactive reporting

This approach supports Fabric’s goal of providing fast, flexible, and scalable analytics.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. What is the primary purpose of a star schema in a semantic model?

A. To normalize data to reduce storage
B. To optimize transactional workloads
C. To simplify analytics and improve query performance
D. To enforce row-level security

Correct Answer: C

Explanation:
Star schemas are designed specifically for analytics. They simplify relationships and improve query performance by organizing data into fact and dimension tables.

2. In a star schema, what type of data is typically stored in a fact table?

A. Descriptive attributes such as names and categories
B. Hierarchical lookup values
C. Numeric measures related to business processes
D. User-defined calculated columns

Correct Answer: C

Explanation:
Fact tables store measurable, numeric values such as revenue, quantity, or counts, which are analyzed across dimensions.

3. Which relationship type is most common between fact and dimension tables in a star schema?

A. One-to-one
B. One-to-many
C. Many-to-many
D. Bidirectional many-to-many

Correct Answer: B

Explanation:
Each dimension record (e.g., a customer) can relate to many fact records (e.g., multiple sales), making one-to-many relationships standard.

4. Why are star schemas preferred over snowflake schemas in Power BI semantic models?

A. Snowflake schemas require more storage
B. Star schemas improve DAX performance and model usability
C. Snowflake schemas are not supported in Fabric
D. Star schemas eliminate the need for relationships

Correct Answer: B

Explanation:
Star schemas reduce relationship complexity, making DAX calculations simpler and improving query performance.

5. Which table should typically contain a DateKey column in a star schema?

A. Dimension tables only
B. Fact tables only
C. Both fact and dimension tables
D. Neither table type

Correct Answer: C

Explanation:
The fact table uses DateKey as a foreign key, while the Date dimension uses it as a primary key.

6. What is the “grain” of a fact table?

A. The number of rows in the table
B. The level of detail represented by each row
C. The number of dimensions connected
D. The data type of numeric columns

Correct Answer: B

Explanation:
Grain defines what a single row represents (e.g., one sale per customer per day).

7. Which modeling practice helps ensure optimal performance in a semantic model?

A. Creating relationships between dimension tables
B. Using many-to-many relationships by default
C. Keeping dimensions denormalized
D. Storing text attributes in the fact table

Correct Answer: C

Explanation:
Denormalized (flattened) dimension tables reduce joins and improve query performance in analytic models.

8. What happens when a dimension is used to filter a report in a properly designed star schema?

A. The filter applies only to the dimension table
B. The filter automatically propagates to the fact table
C. The filter is ignored by measures
D. The filter causes a many-to-many relationship

Correct Answer: B

Explanation:
Filters flow from dimension tables to the fact table through one-to-many relationships.

9. Which scenario is best suited for a star schema in a semantic model?

A. Real-time transactional processing
B. Log ingestion with high write frequency
C. Interactive reporting with slicing and aggregation
D. Application-level CRUD operations

Correct Answer: C

Explanation:
Star schemas are optimized for analytical queries involving aggregation, filtering, and slicing.

10. What is a common modeling mistake when implementing a star schema?

A. Using surrogate keys
B. Creating direct relationships between dimension tables
C. Marking a date table as a date table
D. Defining one-to-many relationships

Correct Answer: B

Explanation:
Dimensions should not typically relate to each other directly in a star schema, as this introduces unnecessary complexity.

Analytics, Business Intelligence, Business Intelligence (BI) Development, Data Analysis, Data Cleaning, Data Development, Data Modeling, Data Quality Assurance, Data Security, Data Strategy, Data Visualization, Data Warehousing, Data Wrangling, DP-600, Microsoft Certification, Microsoft Fabric, Performance Tuning, Power BI December 28, 2025

Select, Filter, and Aggregate Data Using DAX

This post is part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub. This topic falls under these sections:
Prepare data 
    --> Query and analyze data 
        --> Select, Filter, and Aggregate Data Using DAX

Data Analysis Expressions (DAX) is a formula language used to create dynamic calculations in Power BI semantic models. Unlike SQL or KQL, DAX works within the analytical model and is designed for filter context–aware calculations, interactive reporting, and business logic. For DP-600, you should understand how to use DAX to select, filter, and aggregate data within a semantic model for analytics and reporting.

What Is DAX?

DAX is similar to Excel formulas but optimized for relational, in-memory analytics. It is used in:

Measures (dynamic calculations)
Calculated columns (row-level derived values)
Calculated tables (additional, reusable query results)

In a semantic model, DAX queries run in response to visuals and can produce results based on current filters and slicers.

Selecting Data in DAX

DAX itself doesn’t use a traditional SELECT statement like SQL. Instead:

Data is selected implicitly by filter context
DAX measures operate over table columns referenced in expressions

Example of a simple DAX measure selecting and displaying sales:

Total Sales = SUM(Sales[SalesAmount])

Here:

Sales[SalesAmount] references the column in the Sales table
The measure returns the sum of all values in that column

Filtering Data in DAX

Filtering in DAX is context-driven and can be applied in multiple ways:

1. Implicit Filters

Visual-level filters and slicers automatically apply filters to DAX measures.

Example:
A card visual showing Total Sales reflects only the rows that remain after product or date filters are applied.

2. FILTER Function

Used within measures or calculated tables to narrow down rows:

HighValueSales = CALCULATE(
    SUM(Sales[SalesAmount]),
    FILTER(Sales, Sales[SalesAmount] > 1000)
)

Here:

FILTER returns a table with rows meeting the condition
CALCULATE modifies the filter context

3. CALCULATE as Filter Modifier

CALCULATE changes the context under which a measure evaluates:

SalesLastYear = CALCULATE(
    [Total Sales],
    SAMEPERIODLASTYEAR('Date'[Date])
)

This measure selects data for the previous year based on current filters.

Aggregating Data in DAX

Aggregation in DAX is done using built-in functions and is influenced by filter context.

Common Aggregation Functions

SUM(): totals a numeric column
AVERAGE(): computes the mean
COUNT() / COUNTA(): count non-blank values in a column (use COUNTROWS() to count table rows)
MAX() / MIN(): extreme values
SUMX(): row-by-row iteration and sum

Example of row-by-row aggregation:

Total Profit = SUMX(
    Sales,
    Sales[SalesAmount] - Sales[Cost]
)

This computes the difference per row and then sums it.
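
For readers who think in general-purpose code, the difference between a plain aggregation and an iterator can be pictured in Python. The rows are invented and the comparison is only an analogy for SUM and SUMX.

```python
# Invented sales rows.
sales = [
    {"SalesAmount": 100.0, "Cost": 60.0},
    {"SalesAmount": 250.0, "Cost": 200.0},
    {"SalesAmount": 40.0, "Cost": 10.0},
]

# Like SUM: aggregate one existing column.
total_sales = sum(row["SalesAmount"] for row in sales)

# Like SUMX: evaluate an expression per row, then add up the results.
total_profit = sum(row["SalesAmount"] - row["Cost"] for row in sales)

print(total_sales, total_profit)
```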

Filter Context and Row Context

Understanding how DAX handles filter context and row context is essential:

Filter context: Set by the report (slicers, column filters) or modified by CALCULATE
Row context: Used in calculated columns and iteration functions (SUMX, FILTER)

DAX measures always respect the current filter context unless explicitly modified.
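
A simplified Python model can make the two ideas concrete. Here the filter context is a dictionary of allowed values per column, and the hypothetical calculate helper replaces the filter on any column it is given, which loosely mirrors how a CALCULATE column filter overrides an existing filter on the same column. Real DAX semantics are richer than this sketch.

```python
# Invented sales rows.
sales = [
    {"Year": 2024, "Category": "Bikes", "SalesAmount": 100.0},
    {"Year": 2024, "Category": "Helmets", "SalesAmount": 40.0},
    {"Year": 2025, "Category": "Bikes", "SalesAmount": 300.0},
    {"Year": 2025, "Category": "Helmets", "SalesAmount": 60.0},
]


def total_sales(filter_context):
    """Measure: sum the rows that satisfy every filter in the context."""
    return sum(
        row["SalesAmount"]
        for row in sales
        if all(row[col] in allowed for col, allowed in filter_context.items())
    )


def calculate(measure, filter_context, **overrides):
    """Evaluate a measure in a copy of the context with some filters replaced."""
    new_context = dict(filter_context)
    new_context.update({col: set(values) for col, values in overrides.items()})
    return measure(new_context)


slicer = {"Year": {2025}}                      # set by the report
print(total_sales(slicer))                     # respects the slicer
print(calculate(total_sales, slicer, Category={"Bikes"}))  # adds a filter
print(calculate(total_sales, slicer, Year={2024}))         # overrides the slicer
```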

Grouping and Summarization

While DAX doesn’t use GROUP BY in the same way SQL does, measures inherently aggregate over groups determined by filter context or visual grouping.

Example:
In a table visual grouped by Product Category, the measure Total Sales returns aggregated values per category automatically.

Time Intelligence Functions

DAX includes built-in functions for time-based aggregation:

TOTALYTD(), TOTALQTD(), TOTALMTD() — year-to-date, quarter-to-date, month-to-date
SAMEPERIODLASTYEAR() — compare values year-over-year
DATESINPERIOD() — custom period

Example:

SalesYTD = TOTALYTD(
    [Total Sales],
    'Date'[Date]
)
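
Conceptually, a year-to-date total sums everything from the start of the year through the last date in the current context. The Python sketch below shows that arithmetic on invented data, assuming a calendar year that starts on January 1; the function sales_ytd is hypothetical.

```python
from datetime import date

# Invented (date, amount) pairs.
sales = [
    (date(2024, 12, 15), 80.0),
    (date(2025, 1, 10), 100.0),
    (date(2025, 2, 20), 50.0),
    (date(2025, 3, 5), 70.0),
]


def sales_ytd(as_of):
    """Sum amounts from January 1 of as_of's year through as_of."""
    start = date(as_of.year, 1, 1)
    return sum(amount for day, amount in sales if start <= day <= as_of)


ytd_feb = sales_ytd(date(2025, 2, 28))
ytd_mar = sales_ytd(date(2025, 3, 31))
print(ytd_feb, ytd_mar)
```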

Best Practices

Use measures, not calculated columns, for dynamic, filter-sensitive aggregations.
Let visuals control filter context via slicers, rows, and columns.
Avoid unnecessary row-by-row calculations when simple aggregation functions suffice.
Explicitly use CALCULATE to modify filter context for advanced scenarios.

When to Use DAX vs SQL/KQL

Scenario	Best Tool
Static relational querying	SQL
Streaming/event analytics	KQL
Report-level dynamic calculations	DAX
Interactive dashboards with slicers	DAX

Example Use Cases

1. Total Sales Measure

Total Sales = SUM(Sales[SalesAmount])

2. Filtered Sales for Big Orders

Big Orders Sales = CALCULATE(
    [Total Sales],
    Sales[SalesAmount] > 1000
)

3. Year-over-Year Sales

Sales YOY = [Total Sales] - CALCULATE(
    [Total Sales],
    SAMEPERIODLASTYEAR('Date'[Date])
)

Key Takeaways for the Exam

DAX operates based on filter context and evaluates measures dynamically.
There is no explicit SELECT statement — rather, measures compute values based on current context.
Use CALCULATE to change filter context.
Aggregation functions (e.g., SUM, COUNT, AVERAGE) are fundamental to summarizing data.
Filtering functions like FILTER and time intelligence functions enhance analytical flexibility.

Final Exam Tips

If a question mentions interactive reports, dynamic filters, slicers, or time-based comparisons, DAX is likely the right language to use for the solution.
Measures + CALCULATE + filter context appear frequently.
If the question mentions slicers, visuals, or dynamic results, think DAX measure.
Time intelligence functions are high-value topics.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. Which DAX function is primarily used to modify the filter context of a calculation?

A. FILTER
B. SUMX
C. CALCULATE
D. ALL

Correct answer: ✅ C
Explanation: CALCULATE changes the filter context under which an expression is evaluated.

2. A Power BI report contains slicers for Year and Product. A measure returns different results as slicers change. What concept explains this behavior?

A. Row context
B. Filter context
C. Evaluation context
D. Query context

Correct answer: ✅ B
Explanation: Filter context is affected by slicers, filters, and visual interactions.

3. Which DAX function iterates row by row over a table to perform a calculation?

A. SUM
B. COUNT
C. AVERAGE
D. SUMX

Correct answer: ✅ D
Explanation: SUMX evaluates an expression for each row and then aggregates the results.

4. You want to calculate total sales only for transactions greater than $1,000. Which approach is correct?

A. SUM(Sales[SalesAmount] > 1000)

B. FILTER(Sales, Sales[SalesAmount] > 1000)

C. CALCULATE(
    SUM(Sales[SalesAmount]),
    Sales[SalesAmount] > 1000
)

D. SUMX(Sales, Sales[SalesAmount] > 1000)

Correct answer: ✅ C
Explanation: CALCULATE applies a filter condition while aggregating.

5. Which DAX object is evaluated dynamically based on report filters and slicers?

A. Calculated column
B. Calculated table
C. Measure
D. Relationship

Correct answer: ✅ C
Explanation: Measures respond dynamically to filter context; calculated columns do not.

6. Which function is commonly used to calculate year-to-date (YTD) values in DAX?

A. DATESINPERIOD
B. SAMEPERIODLASTYEAR
C. TOTALYTD
D. CALCULATE

Correct answer: ✅ C
Explanation: TOTALYTD is designed for year-to-date aggregations.

7. A DAX measure returns different totals when placed in a table visual grouped by Category. Why does this happen?

A. The measure contains row context
B. The table visual creates filter context
C. The measure is recalculated per row
D. Relationships are ignored

Correct answer: ✅ B
Explanation: Visual grouping applies filter context automatically.

8. Which DAX function returns a table instead of a scalar value?

A. SUM
B. AVERAGE
C. FILTER
D. COUNT

Correct answer: ✅ C
Explanation: FILTER returns a table that can be consumed by other functions like CALCULATE.

9. Which scenario is the best use case for DAX instead of SQL or KQL?

A. Cleaning raw data before ingestion
B. Transforming streaming event data
C. Creating interactive report-level calculations
D. Querying flat files in a lakehouse

Correct answer: ✅ C
Explanation: DAX excels at dynamic, interactive calculations in semantic models.

10. What is the primary purpose of the SAMEPERIODLASTYEAR function?

A. Aggregate values by fiscal year
B. Remove filters from a date column
C. Compare values to the previous year
D. Calculate rolling averages

Correct answer: ✅ C
Explanation: It shifts the date context back one year for year-over-year analysis.

Analytics, Big Data, Business Intelligence (BI) Development, Data Analysis, Data Cleaning, Data Development, Data Integration, Data Integration (ETL), Data Modeling, Data Security, Data Strategy, Data Visualization, Data Warehousing, DP-600, Microsoft Certification, Microsoft Fabric, Performance Tuning December 28, 2025

Select, Filter, and Aggregate Data by Using KQL

This post is part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub. This topic falls under these sections:
Prepare data 
    --> Query and analyze data 
        --> Select, filter, and aggregate data by using KQL

The Kusto Query Language (KQL) is a read-only request language used for querying large, distributed, event-driven datasets — especially within Eventhouse and Azure Data Explorer–backed workloads in Microsoft Fabric. KQL enables you to select, filter, and aggregate data efficiently in scenarios involving high-velocity data like telemetry, logs, and streaming events.

For the DP-600 exam, you should understand KQL basics and how it supports data exploration and analytical summarization in a real-time analytics context.

KQL Basics

KQL is designed to be expressive and performant for time-series or log-like data. Queries are built as a pipeline of operations, where each operator transforms the data and passes it to the next.

Selecting Data

In KQL, the project operator performs the equivalent of selecting columns:

EventHouseTable
| project Timestamp, Country, EventType, Value

project lets you choose which fields to include
You can rename fields inline: | project Time=Timestamp, Sales=Value

Exam Tip:
Use project early to limit data to relevant columns and reduce processing downstream.

Filtering Data

Filtering in KQL is done using the where operator:

EventHouseTable
| where Country == "USA"

Multiple conditions can be combined with and/or:

| where Value > 100 and EventType == "Purchase"

Filtering early in the pipeline improves performance by reducing the dataset before subsequent transformations.

Aggregating Data

KQL uses the summarize operator to perform aggregations such as counts, sums, averages, min, max, etc.

Example – Aggregate Total Sales:

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value)

Example – Grouped Aggregation:

EventHouseTable
| where EventType == "Purchase"
| summarize CountEvents = count(), TotalSales = sum(Value) by Country

Time-Bucketed Aggregation

KQL supports time binning using bin():

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value) by Country, bin(Timestamp, 1h)

This groups results into hourly buckets, which is ideal for time-series analytics and dashboards.
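
To make the bucketing behavior concrete, here is a minimal Python sketch (not KQL) of what bin(Timestamp, 1h) does conceptually: each timestamp is rounded down to the start of its interval, and rows sharing a rounded value land in the same group. The helper names bin_timestamp and total_by_hour, the sample rows, and the choice of 1970-01-01 as the alignment origin are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def bin_timestamp(ts: datetime, size: timedelta) -> datetime:
    """Round ts down to the start of its fixed-size bucket (similar in spirit to KQL bin())."""
    epoch = datetime(1970, 1, 1)          # assumed alignment origin for this sketch
    buckets = (ts - epoch) // size        # whole buckets elapsed since the origin
    return epoch + buckets * size

def total_by_hour(rows):
    """Emulate: summarize TotalSales = sum(Value) by Country, bin(Timestamp, 1h)."""
    totals = defaultdict(float)
    for country, ts, value in rows:
        totals[(country, bin_timestamp(ts, timedelta(hours=1)))] += value
    return dict(totals)

# Hypothetical sample rows: (Country, Timestamp, Value)
rows = [
    ("USA", datetime(2025, 1, 1, 9, 5), 10.0),
    ("USA", datetime(2025, 1, 1, 9, 55), 5.0),
    ("USA", datetime(2025, 1, 1, 10, 1), 7.0),
]
hourly = total_by_hour(rows)
```

The two 09:xx events collapse into the 09:00 bucket, while the 10:01 event starts a new bucket. That is why grouping by the raw Timestamp column would instead produce one group per distinct timestamp.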

Common KQL Aggregation Functions

Function	Description
`count()`	Total number of records
`sum(column)`	Sum of numeric values
`avg(column)`	Average value
`min(column)` / `max(column)`	Minimum / maximum value
`percentile(column, p)`	Percentile calculation

Combining Operators

KQL queries are often a combination of select, filter, and aggregation:

EventHouseTable
| where EventType == "Purchase" and Timestamp >= ago(7d)
| project Country, Value, Timestamp
| summarize TotalSales = sum(Value), CountPurchases = count() by Country
| order by TotalSales desc

This pipeline:

Filters for purchases in the last 7 days
Projects relevant fields
Aggregates totals and counts
Orders the result by highest total sales
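
The same four steps can be mirrored in plain Python to see what each operator contributes. This is only an illustrative sketch: the sample events, the extra Device column, and the fixed "now" value are made up, and a real Eventhouse would run the KQL above rather than anything like this.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events; "now" is fixed so the example is reproducible.
now = datetime(2025, 1, 10)
events = [
    {"Country": "USA", "EventType": "Purchase", "Value": 120.0,
     "Timestamp": now - timedelta(days=1), "Device": "web"},
    {"Country": "USA", "EventType": "Purchase", "Value": 80.0,
     "Timestamp": now - timedelta(days=2), "Device": "app"},
    {"Country": "Canada", "EventType": "Purchase", "Value": 50.0,
     "Timestamp": now - timedelta(days=3), "Device": "web"},
    {"Country": "Canada", "EventType": "View", "Value": 0.0,
     "Timestamp": now - timedelta(days=1), "Device": "web"},
    {"Country": "USA", "EventType": "Purchase", "Value": 999.0,
     "Timestamp": now - timedelta(days=30), "Device": "web"},
]

# where EventType == "Purchase" and Timestamp >= ago(7d)
filtered = [e for e in events
            if e["EventType"] == "Purchase"
            and e["Timestamp"] >= now - timedelta(days=7)]

# project Country, Value, Timestamp
projected = [{k: e[k] for k in ("Country", "Value", "Timestamp")} for e in filtered]

# summarize TotalSales = sum(Value), CountPurchases = count() by Country
groups = defaultdict(lambda: {"TotalSales": 0.0, "CountPurchases": 0})
for e in projected:
    groups[e["Country"]]["TotalSales"] += e["Value"]
    groups[e["Country"]]["CountPurchases"] += 1

# order by TotalSales desc
result = sorted(
    ({"Country": c, **agg} for c, agg in groups.items()),
    key=lambda r: r["TotalSales"],
    reverse=True,
)
```

Each stage only sees what the previous stage passed on: the 30-day-old purchase and the View event never reach the aggregation, and the Device column is dropped before grouping.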

KQL vs SQL: What’s Different?

Feature	SQL	KQL
Syntax	Clause-based statements	Pipeline-based
Joins	Extensive support	Supported through the join operator, but less central to typical event queries
Use cases	Relational data	Time-series, event, logs
Aggregation	`GROUP BY`	`summarize`

KQL shines when querying streaming or event data at scale — exactly the kinds of scenarios Eventhouse targets.

Performance Considerations in KQL

Apply where as early as possible.
Use project to keep only necessary fields.
Time-range filters (e.g., last 24h) drastically reduce scan size.
KQL runs distributed and is optimized for large event streams.

Practical Use Cases

Example – Top Countries by Event Count:

EventHouseTable
| summarize EventCount = count() by Country
| top 10 by EventCount

Example – Average Value of Events per Day:

EventHouseTable
| where EventType == "SensorReading"
| summarize AvgValue = avg(Value) by bin(Timestamp, 1d)

Exam Relevance

In DP-600 exam scenarios involving event or near-real-time analytics (such as with Eventhouse or KQL databases), you may be asked to:

Write or interpret KQL that:
- projects specific fields
- filters records based on conditions
- aggregates and groups results
Choose the correct operator (where, project, summarize) for a task
Understand how KQL can be optimized with time-based filtering

Key Takeaways

project selects specific fields.
where filters rows based on conditions.
summarize performs aggregations.
Time-series queries often use bin() for bucketing.
The KQL pipeline enables modular, readable, and optimized queries for large datasets.

Final Exam Tips

If a question involves event streams, telemetry, metrics over time, or real-time analytics, and asks about summarizing values after filtering, think KQL with where, project, and summarize.

project → select columns
where → filter rows
summarize → aggregate and group
bin() → time-based grouping
KQL reads as a top-to-bottom pipeline of operators, unlike SQL's clause-based statements
Used heavily in Eventhouse / real-time analytics

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. Which KQL operator is used to select specific columns from a dataset?

A. select
B. where
C. project
D. summarize

✅ Correct Answer: C

Explanation:
project is the KQL operator used to select and optionally rename columns. KQL does not use SELECT like SQL.

2. Which operator is used to filter rows in a KQL query?

A. filter
B. where
C. having
D. restrict

✅ Correct Answer: B

Explanation:
The where operator filters rows based on conditions and is typically placed early in the query pipeline for performance.

3. How do you count the number of records in a table using KQL?

A. count(*)
B. project count()
C. summarize count(*)
D. summarize count()

✅ Correct Answer: D

Explanation:
In KQL, aggregation functions are used inside summarize. count() counts rows; count(*) is SQL syntax.

4. Which KQL operator performs aggregations similar to SQL’s GROUP BY?

A. group
B. aggregate
C. summarize
D. partition

✅ Correct Answer: C

Explanation:
summarize is the KQL operator used for aggregation and grouping.

5. Which query returns total sales grouped by country?

A. | group by Country sum(Value)
B. | summarize sum(Value) Country
C. | summarize TotalSales = sum(Value) by Country
D. | aggregate Value by Country

✅ Correct Answer: C

Explanation:
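KQL performs grouped aggregation with summarize … by. Naming the aggregate (TotalSales = …) is optional but recommended; without it, KQL assigns a default name such as sum_Value.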

6. What is the purpose of the `bin()` function in KQL?

A. To sort data
B. To convert values to a numeric type
C. To round values down into fixed-size buckets, such as time intervals
D. To remove null values

✅ Correct Answer: C

Explanation:
bin() groups values—commonly timestamps—into fixed-size intervals (for example, hourly or daily buckets).

7. Which query correctly summarizes event counts per hour?

A. | summarize count() by Timestamp
B. | summarize count() by hour(Timestamp)
C. | summarize count() by bin(Timestamp, 1h)
D. | count() by Timestamp

✅ Correct Answer: C

Explanation:
Time-based grouping in KQL requires bin() to define the interval size.

8. Which operator should be placed as early as possible in a KQL query for performance reasons?

A. summarize
B. project
C. order by
D. where

✅ Correct Answer: D

Explanation:
Applying where early reduces the dataset size before further processing, improving performance.

9. Which KQL query returns the top 5 countries by event count?

A. | top 5 Country by count()
B. | summarize count() by Country | take 5
C. | summarize EventCount = count() by Country | top 5 by EventCount
D. | order by Country limit 5

✅ Correct Answer: C

Explanation:
You must first aggregate using summarize, then use top based on the aggregated column.

10. In Microsoft Fabric, KQL is primarily used with which workload?

A. Warehouse
B. Lakehouse SQL endpoint
C. Eventhouse
D. Semantic model

✅ Correct Answer: C

Explanation:
KQL is the primary query language for Eventhouse and real-time analytics scenarios in Microsoft Fabric.

Analytics, Big Data, Business Intelligence, Business Intelligence (BI) Development, Data Analysis, Data Cleaning, Data Development, Data Governance, Data Integration, Data Integration (ETL), Data Modeling, Data Quality Assurance, Data Security, Data Visualization, Data Warehousing, Data Wrangling, DP-600, Microsoft Certification, Microsoft Fabric, Performance Tuning, Python, SQL December 28, 2025

Select, Filter, and Aggregate Data Using SQL

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data 
    --> Query and analyze data 
        --> Select, Filter, and Aggregate Data Using SQL

Working with SQL to select, filter, and aggregate data is a core skill for analytics engineers using Microsoft Fabric. Whether you are querying a warehouse, a lakehouse SQL analytics endpoint, or a relational source behind a DirectQuery semantic model, SQL enables precise data retrieval and summarization for reporting, dashboards, and analytics solutions.

For DP-600, you should understand how to construct SQL queries that perform:

Selecting specific data columns
Filtering rows based on conditions
Aggregating values with grouping and summary functions

SQL Data Selection

Selecting data refers to using the SELECT clause to choose which columns or expressions to return.

Example:

SELECT
    CustomerID,
    OrderDate,
    SalesAmount
FROM Sales;

Use * to return all columns:

SELECT * FROM Sales;

Use expressions to compute derived values:

SELECT
    OrderDate,
    SalesAmount,
    SalesAmount * 1.1 AS AdjustedRevenue
FROM Sales;

Exam Tip: Be purposeful in selecting only needed columns to improve performance.

SQL Data Filtering

Filtering data determines which rows are returned based on conditions using the WHERE clause.

Basic Filtering:

SELECT *
FROM Sales
WHERE OrderDate >= '2025-01-01';

Combined Conditions:

AND: WHERE Country = 'USA' AND SalesAmount > 1000
OR: WHERE Region = 'East' OR Region = 'West'

Null and Missing Value Filters:

WHERE SalesAmount IS NOT NULL

Exam Tip: Understand how WHERE filters reduce dataset size before aggregation.

SQL Aggregation

Aggregation summarizes grouped rows using functions like SUM, COUNT, AVG, MIN, and MAX.

Basic Aggregation:

SELECT
    SUM(SalesAmount) AS TotalSales
FROM Sales;

Grouped Aggregation:

SELECT
    Country,
    SUM(SalesAmount) AS TotalSales,
    COUNT(*) AS OrderCount
FROM Sales
GROUP BY Country;

Filtering After Aggregation:

Use HAVING instead of WHERE to filter aggregated results:

SELECT
    Country,
    SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY Country
HAVING SUM(SalesAmount) > 100000;

Exam Tip:

Use WHERE for row-level filters before grouping.
Use HAVING to filter group-level aggregates.
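
The difference between the two clauses is easiest to see when the same threshold idea is applied at each level. The sketch below uses Python's built-in sqlite3 module as a stand-in for a Fabric SQL engine (an assumption made so the example is runnable anywhere); the Sales table and its rows are hypothetical, and T-SQL behaves the same way for these two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Country TEXT, SalesAmount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [
        ("USA", 600.0), ("USA", 300.0),
        ("Canada", 400.0), ("Canada", 400.0), ("Canada", 400.0),
        ("Mexico", 1500.0),
    ],
)

# WHERE removes individual rows before grouping:
# only single sales above 500 are included in each country's sum.
where_rows = conn.execute(
    "SELECT Country, SUM(SalesAmount) FROM Sales "
    "WHERE SalesAmount > 500 GROUP BY Country ORDER BY Country"
).fetchall()

# HAVING removes whole groups after aggregation:
# every sale is summed, then countries totaling 1000 or less are dropped.
having_rows = conn.execute(
    "SELECT Country, SUM(SalesAmount) FROM Sales "
    "GROUP BY Country HAVING SUM(SalesAmount) > 1000 ORDER BY Country"
).fetchall()
```

With this data, WHERE drops Canada entirely (no single sale exceeds 500) and keeps only part of the USA total, while HAVING keeps Canada (its total is 1200) and drops the USA (its total is 900).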

Combining Select, Filter, and Aggregate

A complete SQL query often blends all three:

SELECT
    ProductCategory,
    COUNT(*) AS Orders,
    SUM(SalesAmount) AS TotalSales,
    AVG(SalesAmount) AS AvgSale
FROM Sales
WHERE OrderDate BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY ProductCategory
ORDER BY TotalSales DESC;

This example:

Selects specific columns and expressions
Filters by date range
Aggregates by product category
Orders results by summary metric

SQL in Different Fabric Workloads

Workload	SQL Usage
Warehouse	Standard T-SQL for BI queries
Lakehouse SQL Analytics	SQL against Delta tables
Semantic Models via DirectQuery	SQL pushed to source where supported
Dataflows/Power Query	SQL-like operations through M (not direct SQL)

Performance and Pushdown

When using SQL in Fabric:

The engine can often push filters and aggregations down to the data source, which reduces the amount of data that has to be moved.
Select only needed columns early to limit data movement.
Avoid SELECT * in production queries unless necessary.

Key SQL Concepts for the Exam

Concept	Why It Matters
SELECT	Defines what data to retrieve
WHERE	Filters data before aggregation
GROUP BY	Organizes rows into groups
HAVING	Filters after aggregation
Aggregate functions	Summarize numeric data

Understanding how these work together is essential for creating analytics-ready datasets.

Common Exam Scenarios

You may be asked to:

Write SQL to filter data based on conditions
Summarize data across groups
Decide whether to use WHERE or HAVING
Identify the correct SQL pattern for a reporting requirement

Example exam prompt:

“Which SQL query correctly returns the total sales per region, only for regions with more than 1,000 orders?”

Understanding aggregate filters (HAVING) and groupings will be key.
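
One possible answer to a prompt like this is sketched below, again using Python's sqlite3 module as a runnable stand-in for a Fabric SQL engine. The Sales table with one row per order, the Region column, and the generated rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Region TEXT, SalesAmount REAL)")

# Hypothetical data: East has 1,001 orders, West has exactly 1,000, North has 5.
rows = [("East", 10.0)] * 1001 + [("West", 10.0)] * 1000 + [("North", 500.0)] * 5
conn.executemany("INSERT INTO Sales VALUES (?, ?)", rows)

# The condition is on the number of orders per region, which is an aggregate,
# so it belongs in HAVING rather than WHERE.
query = """
SELECT Region, SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY Region
HAVING COUNT(*) > 1000
"""
result = conn.execute(query).fetchall()
```

Only East is returned: West has exactly 1,000 orders, which does not satisfy "more than 1,000", and North's high individual amounts are irrelevant because the filter is on order count, not on sales value.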

Final Exam Tips

If a question mentions:

“Return summary metrics”
“Only include rows that meet conditions”
“Group results by category”

…you’re looking at combining SELECT, WHERE, and GROUP BY in SQL.

WHERE filters rows before aggregation
HAVING filters after aggregation
GROUP BY is required for per-group metrics
Use aggregate functions intentionally
Performance matters — avoid unnecessary columns

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. Which SQL clause is used to filter rows before aggregation occurs?

A. HAVING
B. GROUP BY
C. WHERE
D. ORDER BY

✅ Correct Answer: C

Explanation:
The WHERE clause filters individual rows before any aggregation or grouping takes place. HAVING filters results after aggregation.

2. You need to calculate total sales per product category. Which clause is required?

A. WHERE
B. GROUP BY
C. ORDER BY
D. HAVING

✅ Correct Answer: B

Explanation:
GROUP BY groups rows so aggregate functions (such as SUM) can be calculated per category.

3. Which function returns the number of rows in each group?

A. SUM()
B. COUNT()
C. AVG()
D. MAX()

✅ Correct Answer: B

Explanation:
COUNT() counts the number of rows in a group. It is commonly used to count records or transactions.

4. Which query correctly filters aggregated results?

A. WHERE SUM(SalesAmount) > 10000
B. HAVING SUM(SalesAmount) > 10000
C. GROUP BY SUM(SalesAmount) > 10000
D. ORDER BY SUM(SalesAmount) > 10000

✅ Correct Answer: B

Explanation:
HAVING is used to filter aggregated values. WHERE cannot reference aggregate functions.

5. Which SQL statement returns the total number of orders?

A. SELECT COUNT(*) FROM Orders;
B. SELECT SUM(*) FROM Orders;
C. SELECT TOTAL(Orders) FROM Orders;
D. SELECT COUNT(Orders) FROM Orders;

✅ Correct Answer: A

Explanation:
COUNT(*) counts all rows in a table, making it the correct way to return total order count.

6. Which clause is used to sort aggregated query results?

A. GROUP BY
B. WHERE
C. ORDER BY
D. HAVING

✅ Correct Answer: C

Explanation:
ORDER BY sorts the final result set, including aggregated columns.

7. What happens if a column in the SELECT statement is not included in the GROUP BY clause or an aggregate function?

A. The query runs but returns incorrect results
B. SQL automatically groups it
C. The query fails
D. The column is ignored

✅ Correct Answer: C

Explanation:
In SQL, any column in SELECT must either be aggregated or included in GROUP BY.

8. Which query returns average sales amount per country?

A. SELECT Country, AVG(SalesAmount)
FROM Sales;

B. SELECT Country, AVG(SalesAmount)
FROM Sales
GROUP BY Country;

C. SELECT Country, SUM(SalesAmount)
GROUP BY Country;

D. SELECT AVG(SalesAmount)
FROM Sales
GROUP BY Country;

✅ Correct Answer: B

Explanation:
Grouping by Country allows AVG(SalesAmount) to be calculated per country.

9. Which filter removes rows with NULL values in a column?

A. WHERE SalesAmount = NULL
B. WHERE SalesAmount <> NULL
C. WHERE SalesAmount IS NOT NULL
D. WHERE NOT NULL SalesAmount

✅ Correct Answer: C

Explanation:
SQL uses IS NULL and IS NOT NULL to check for null values.

10. Which SQL pattern is most efficient for analytics queries in Microsoft Fabric?

A. Selecting all columns and filtering later
B. Using SELECT * for simplicity
C. Filtering early and selecting only needed columns
D. Aggregating without grouping

✅ Correct Answer: C

Explanation:
Filtering early and selecting only required columns improves performance by reducing data movement—an important Fabric best practice.

Analytics, Business Intelligence, Business Intelligence (BI) Development, Data Analysis, Data Development, Data Governance, Data Integration, Data Integration (ETL), Data Modeling, Data Security, Data Strategy, Data Visualization, Data Warehousing, Data Wrangling, DP-600, Microsoft Certification, Microsoft Fabric, Performance Tuning, Power BI, Power Query, SQL December 28, 2025

Implement a Star Schema for a Lakehouse or Warehouse

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data 
    --> Transform data 
        --> Implement a star schema for a lakehouse or warehouse

Designing and implementing an effective schema is foundational to efficient analytics. In Microsoft Fabric, structuring your data into a star schema dramatically improves query performance, simplifies reporting, and aligns with best practices for BI workloads.

This article explains what a star schema is, why it matters in Fabric, and how to implement it in a lakehouse or data warehouse.

What Is a Star Schema?

A star schema is a relational modeling technique that organizes data into two primary types of tables:

Fact tables: Contain measurable, quantitative data (metrics, transactions, events).
Dimension tables: Contain descriptive attributes (e.g., customer info, product details, dates).

Star schemas get their name because the design resembles a star—a central fact table linked to multiple dimension tables.

Why Use a Star Schema?

A star schema offers multiple advantages for analytical workloads:

Improved query performance: Queries are simplified and optimized due to straightforward joins.
Simpler reporting: BI tools like Power BI map naturally to star schemas.
Aggregations and drill-downs: Dimension tables support filtering and hierarchy reporting.
Better scalability: Optimized for large datasets and parallel processing.

In Fabric, both lakehouses and warehouses support star schema implementations, depending on workload and user needs.

Core Components of a Star Schema

1. Fact Tables

Fact tables store the numeric measurements of business processes.
Common characteristics:

Contains keys linking to dimensions
Usually very large (many rows) but relatively narrow, holding mostly keys and numeric measures
Used for aggregations (SUM, COUNT, AVG, etc.)

Examples:
Sales transactions, inventory movement, website events

2. Dimension Tables

Dimension tables describe contextual attributes.
Common characteristics:

Contain descriptive fields
Usually smaller than fact tables
Often used for filtering/grouping

Examples:
Customer, product, date, geography

Implementing a Star Schema in a Lakehouse

Lakehouses in Fabric store tables in Delta format. You can read and write them with Spark (Spark SQL or PySpark) and query them read-only with T-SQL through the SQL analytics endpoint.

Steps to Implement:

1. Ingest raw data into your lakehouse (as files or staging tables).
2. Transform data:
   - Cleanse and conform fields
   - Derive business keys
3. Create dimension tables:
   - Deduplicate
   - Add descriptive attributes
4. Create fact tables:
   - Join transactional data to dimension keys
   - Store numeric measures
5. Optimize:
   - Partition and Z-ORDER for performance
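
The dimension and fact steps above can be sketched end to end on a tiny dataset. The example below uses Python's sqlite3 module purely as a runnable stand-in; in a Fabric lakehouse the same logic would typically be written in Spark SQL or PySpark against Delta tables. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw staging data: one row per sale, with repeated customer attributes.
CREATE TABLE StagingSales (
    OrderID INTEGER, CustomerCode TEXT, CustomerName TEXT, SalesAmount REAL
);
INSERT INTO StagingSales VALUES
    (1, 'C001', 'Contoso',  100.0),
    (2, 'C001', 'Contoso',  250.0),
    (3, 'C002', 'Fabrikam',  75.0);

-- Dimension: deduplicate on the business key and assign a surrogate key.
CREATE TABLE DimCustomer (
    CustomerKey  INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerCode TEXT UNIQUE,
    CustomerName TEXT
);
INSERT INTO DimCustomer (CustomerCode, CustomerName)
SELECT DISTINCT CustomerCode, CustomerName
FROM StagingSales
ORDER BY CustomerCode;

-- Fact: look up the surrogate key and keep only keys and numeric measures.
CREATE TABLE FactSales AS
SELECT s.OrderID, d.CustomerKey, s.SalesAmount
FROM StagingSales AS s
JOIN DimCustomer AS d ON d.CustomerCode = s.CustomerCode;
""")

# A typical star-schema query: aggregate the fact, group by a dimension attribute.
totals = conn.execute("""
    SELECT d.CustomerName, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN DimCustomer AS d ON d.CustomerKey = f.CustomerKey
    GROUP BY d.CustomerName
    ORDER BY d.CustomerName
""").fetchall()
```

The descriptive CustomerName column lives only in the dimension, so the fact table stays narrow and each report query needs just one simple join per dimension.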

Tools You Might Use:

Notebooks (PySpark)
Lakehouse SQL
Data pipelines

Exam Tip:
Lakehouses are ideal when you need flexibility, schema evolution, or combined batch + exploratory analytics.

Implementing a Star Schema in a Warehouse

Data warehouses in Fabric provide a SQL-optimized store designed for BI workloads.

Steps to Implement:

Stage raw data in warehouse tables
Build conforming dimension tables
Build fact tables with proper keys
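Declare primary and foreign key constraints where useful (in Fabric Warehouse these are NOT ENFORCED, and there are no user-managed indexes to tune)
Optimize with pre-aggregated summary tables and up-to-date statistics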

Warehouse advantages:

Strong query performance for BI
Native SQL analytics
Excellent integration with Power BI and semantic models

Exam Tip:
Choose a warehouse when your priority is high-performance BI analytics with well-defined dimensional models.

Common Star Schema Patterns

Conformed Dimensions

Dimensions shared across multiple fact tables
Ensures consistent filtering and reporting across business processes

Slowly Changing Dimensions (SCD)

Maintain historical attribute changes
Types include Type 1 (overwrite) and Type 2 (versioning)
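
The difference between the two types is easier to see on a single dimension row. The following Python sketch is illustrative only: the function names, the ValidFrom/ValidTo/IsCurrent tracking columns, and the sample customer are assumptions, and a real implementation in Fabric would more likely be a MERGE-style operation in Spark or T-SQL.

```python
from datetime import date

def apply_type1(dim_rows, customer_code, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim_rows:
        if row["CustomerCode"] == customer_code:
            row["City"] = new_city

def apply_type2(dim_rows, customer_code, new_city, change_date):
    """Type 2: close the current version and append a new one, keeping history."""
    for row in dim_rows:
        if row["CustomerCode"] == customer_code and row["IsCurrent"]:
            if row["City"] == new_city:
                return  # attribute unchanged, nothing to version
            row["IsCurrent"] = False
            row["ValidTo"] = change_date
    dim_rows.append({
        "CustomerKey": max((r["CustomerKey"] for r in dim_rows), default=0) + 1,
        "CustomerCode": customer_code,
        "City": new_city,
        "ValidFrom": change_date,
        "ValidTo": None,
        "IsCurrent": True,
    })

# Hypothetical customer who moves from Seattle to Denver.
dim1 = [{"CustomerKey": 1, "CustomerCode": "C001", "City": "Seattle"}]
apply_type1(dim1, "C001", "Denver")

dim = [{"CustomerKey": 1, "CustomerCode": "C001", "City": "Seattle",
        "ValidFrom": date(2024, 1, 1), "ValidTo": None, "IsCurrent": True}]
apply_type2(dim, "C001", "Denver", date(2025, 6, 1))
```

After the Type 1 update there is still one row and Seattle is gone. After the Type 2 update there are two rows with different surrogate keys, so facts recorded before the move can still join to the Seattle version.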

Fact Table Grain

Define the “grain” (level of detail) clearly—for example, “one row per sales transaction.”

Star Schema and Power BI Semantic Models

Semantic models often sit on top of star schemas:

Fact tables become measure containers
Dimensions become filtering hierarchies
Reduces DAX complexity
Improves performance

Best Practice: Structure your lakehouse or warehouse into a star schema before building the semantic model.

Star Schema in Lakehouse vs Warehouse

Feature	Lakehouse	Warehouse
Query engines	Spark & SQL	SQL only
Best for	Mixed workloads (big data + SQL)	BI & reporting
Optimization	Partition/Z-ORDER	Statistics and engine-managed optimizations
Tooling	Notebooks, pipelines	SQL scripts, BI artifacts
Schema flexibility	Flexible (supports schema evolution)	More rigid, well-defined

Governance and Performance Considerations

Use consistent keys across facts and dimensions
Validate referential integrity where possible
Avoid overly wide tables for BI queries; keep only the columns that reports need
Apply sensitivity labels to Fabric items (such as the lakehouse, warehouse, or semantic model) for governance
Document schema and business logic

What to Know for the DP-600 Exam

Be prepared to:

Explain the purpose of star schema components
Identify when to implement star schema in lakehouses vs warehouses
Recognize patterns like conformed dimensions and SCDs
Understand performance implications of schema design
Relate star schema design to Power BI and semantic models

Final Exam Tip
If the question emphasizes high-performance reporting, simple joins, and predictable filtering, think star schema.
If it mentions big data exploration or flexible schema evolution, star schema in a lakehouse may be part of the answer.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. What is the defining characteristic of a star schema?

A. Multiple fact tables connected through bridge tables
B. A central fact table connected directly to dimension tables
C. Fully normalized transactional tables
D. A schema optimized for OLTP workloads

Correct Answer: B

Explanation:
A star schema consists of a central fact table directly linked to surrounding dimension tables, forming a star-like structure optimized for analytics.

2. Which type of data is stored in a fact table?

A. Descriptive attributes such as names and categories
B. Hierarchical metadata for navigation
C. Quantitative, measurable values
D. User access permissions

Correct Answer: C

Explanation:
Fact tables store numeric measures (e.g., sales amount, quantity) that are aggregated during analytical queries.

3. Which table type is typically smaller and used for filtering and grouping?

A. Fact table
B. Dimension table
C. Bridge table
D. Staging table

Correct Answer: B

Explanation:
Dimension tables store descriptive attributes and are commonly used for filtering, grouping, and slicing fact data in reports.

4. Why are star schemas preferred for Power BI semantic models?

A. They eliminate the need for relationships
B. They align naturally with BI tools and optimize query performance
C. They reduce OneLake storage usage
D. They replace DAX calculations

Correct Answer: B

Explanation:
Power BI and other BI tools are optimized for star schemas, which simplify joins, reduce model complexity, and improve performance.

5. When implementing a star schema in a Fabric lakehouse, which storage format is typically used?

A. CSV
B. JSON
C. Parquet
D. Delta

Correct Answer: D

Explanation:
Fabric lakehouses store tables in Delta format, which supports ACID transactions and efficient analytical querying.

6. Which scenario most strongly suggests using a warehouse instead of a lakehouse for a star schema?

A. Schema evolution and exploratory data science
B. High-performance, SQL-based BI reporting
C. Streaming ingestion of real-time events
D. Semi-structured data exploration

Correct Answer: B

Explanation:
Fabric warehouses are optimized for SQL-based analytics and BI workloads, making them ideal for star schemas supporting reporting scenarios.

7. What does the “grain” of a fact table describe?

A. The number of dimensions in the table
B. The level of detail represented by each row
C. The size of the table in storage
D. The indexing strategy

Correct Answer: B

Explanation:
The grain defines the level of detail for each row in the fact table (e.g., one row per transaction or per day).

8. What is a conformed dimension?

A. A dimension used by only one fact table
B. A dimension that contains only numeric values
C. A shared dimension used consistently across multiple fact tables
D. A dimension generated dynamically at query time

Correct Answer: C

Explanation:
Conformed dimensions are shared across multiple fact tables, enabling consistent filtering and reporting across different business processes.

9. Which design choice improves performance when querying star schemas?

A. Highly normalized dimension tables
B. Complex many-to-many relationships
C. Simple joins between fact and dimension tables
D. Storing dimensions inside the fact table

Correct Answer: C

Explanation:
Star schemas minimize join complexity by using simple, direct relationships between facts and dimensions, improving query performance.

10. Which statement best describes how star schemas fit into the Fabric analytics lifecycle?

A. They replace semantic models entirely
B. They are used only for real-time analytics
C. They provide an analytics-ready structure for reporting and modeling
D. They are required only for data ingestion

Correct Answer: C

Explanation:
Star schemas organize data into an analytics-ready structure that supports semantic models, reporting, and scalable BI workloads.

Analytics, Big Data, Business Intelligence, Business Intelligence (BI) Development, Business Intelligence Platform, Data Analysis, Data Cleaning, Data Development, Data Governance, Data Integration, Data Integration (ETL), Data Modeling, Data Quality Assurance, Data Security, Data Strategy, Data Visualization, Data Warehousing, Data Wrangling, Databases, DP-600, Microsoft Certification, Microsoft Fabric, SQL December 28, 2025

Create Views, Functions, and Stored Procedures

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data 
    --> Transform data 
        --> Create views, functions, and stored procedures

Creating views, functions, and stored procedures is a core data transformation and modeling skill for analytics engineers working in Microsoft Fabric. These objects help abstract complexity, improve reusability, enforce business logic, and optimize downstream analytics and reporting.

This section of the DP-600 exam focuses on when, where, and how to use these objects effectively across Fabric components such as Lakehouses, Warehouses, and SQL analytics endpoints.

Views

What are Views?

A view is a virtual table defined by a SQL query. It does not store data itself but presents data dynamically from underlying tables.

Where Views Are Used in Fabric

Fabric Data Warehouse
Lakehouse SQL analytics endpoint
Exposed to Power BI semantic models and other consumers

Common Use Cases

Simplify complex joins and transformations
Present curated, analytics-ready datasets
Enforce column-level or row-level filtering logic
Provide a stable schema over evolving raw data

Key Characteristics

Always reflect the latest data
Can be used like tables in SELECT statements
Improve maintainability and readability
Can support security patterns when combined with permissions

Exam Tip

Know that views are ideal for logical transformations, not heavy compute or data persistence.
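
A short demonstration of the "virtual table" idea: the view below stores only a query, so new rows in the base table show up the next time the view is read. Python's sqlite3 module is used here only as a runnable stand-in for a Fabric warehouse or SQL analytics endpoint, and the table, view name, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (Country TEXT, SalesAmount REAL);
INSERT INTO Sales VALUES ('USA', 100.0), ('Canada', 40.0);

-- Hypothetical view name using a vw_ prefix.
CREATE VIEW vw_SalesByCountry AS
SELECT Country, SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY Country;
""")

before = conn.execute(
    "SELECT * FROM vw_SalesByCountry ORDER BY Country"
).fetchall()

# New rows in the base table appear in the view immediately,
# because the view stores only the query definition, not the data.
conn.execute("INSERT INTO Sales VALUES ('USA', 50.0)")
after = conn.execute(
    "SELECT * FROM vw_SalesByCountry ORDER BY Country"
).fetchall()
```

The aggregation is recomputed on every read, which is why views suit lightweight logical shaping and why expensive transformations are usually better persisted as tables.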

Functions

What are Functions?

Functions encapsulate reusable logic and return a value or a table. They help standardize calculations and transformations across queries.

Types of Functions (SQL)

Scalar functions: Return a single value (e.g., formatted date, calculated metric)
Table-valued functions (TVFs): Return a result set that behaves like a table

Where Functions Are Used in Fabric

Fabric Warehouses
SQL analytics endpoints for Lakehouses

Common Use Cases

Standardized business calculations
Reusable transformation logic
Parameterized filtering or calculations
Cleaner and more modular SQL code

Key Characteristics

Improve consistency across queries
Can be referenced in views and stored procedures
May impact performance if overused in large queries

Exam Tip

Functions promote reuse and consistency, but should be used thoughtfully to avoid performance overhead.

Stored Procedures

What are Stored Procedures?

Stored procedures are named, stored blocks of T-SQL that can accept parameters and perform multiple operations. They are compiled when first executed, and the resulting plan can be reused on later calls.

Where Stored Procedures Are Used in Fabric

Fabric Data Warehouses
Lakehouse SQL analytics endpoints (where the underlying lakehouse tables are read-only)

Common Use Cases

Complex transformation workflows
Batch processing logic
Conditional logic and control-of-flow (IF/ELSE, loops)
Data loading, validation, and orchestration steps

Key Characteristics

Can perform multiple SQL statements
Can accept input and output parameters
Improve performance by reducing repeated compilation
Support automation and operational workflows

Exam Tip

Stored procedures are best for procedural logic and orchestration, not ad-hoc analytics queries.

Choosing Between Views, Functions, and Stored Procedures

Object	Best Used For
Views	Simplifying data access and shaping datasets
Functions	Reusable calculations and logic
Stored Procedures	Complex, parameter-driven workflows

Understanding why you would choose one over another is frequently tested on the DP-600 exam.

Integration with Power BI and Analytics

Views are commonly consumed by Power BI semantic models
Functions help ensure consistent calculations across reports
Stored procedures are typically part of data preparation or orchestration, not directly consumed by reports

Governance and Best Practices

Use clear naming conventions (e.g., vw_, fn_, usp_); avoid the sp_ prefix, which SQL Server reserves for system procedures
Document business logic embedded in SQL objects
Minimize logic duplication across objects
Apply permissions carefully to control access
Balance reusability with performance considerations

What to Know for the DP-600 Exam

You should be comfortable with:

When to use views vs. functions vs. stored procedures
How these objects support data transformation
Their role in analytics-ready data preparation
How they integrate with Lakehouses, Warehouses, and Power BI
Performance and governance implications

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. What is the primary purpose of creating a view in a Fabric lakehouse or warehouse?

A. To permanently store transformed data
B. To execute procedural logic with parameters
C. To provide a virtual, query-based representation of data
D. To orchestrate batch data loads

Correct Answer: C

Explanation:
A view is a virtual table defined by a SQL query. It does not store data but dynamically presents data from underlying tables, making it ideal for simplifying access and shaping analytics-ready datasets.

2. Which Fabric component commonly exposes views directly to Power BI semantic models?

A. Eventhouse
B. SQL analytics endpoint
C. Dataflow Gen2
D. Real-Time hub

Correct Answer: B

Explanation:
The SQL analytics endpoint (for lakehouses and warehouses) exposes tables and views that Power BI semantic models can consume using SQL-based connectivity.

3. When should you use a scalar function instead of a view?

A. When you need to return a dataset with multiple rows
B. When you need to encapsulate reusable calculation logic
C. When you need to perform batch updates
D. When you want to persist transformed data

Correct Answer: B

Explanation:
Scalar functions are designed to return a single value and are ideal for reusable calculations such as formatting, conditional logic, or standardized metrics.

4. Which object type can return a result set that behaves like a table?

A. Scalar function
B. Stored procedure
C. Table-valued function
D. View index

Correct Answer: C

Explanation:
A table-valued function (TVF) returns a table and can be used in FROM clauses, similar to a view but with parameterization support.

5. Which scenario is the best use case for a stored procedure?

A. Creating a simplified reporting dataset
B. Applying row-level filters for security
C. Running conditional logic with multiple SQL steps
D. Exposing data to Power BI reports

Correct Answer: C

Explanation:
Stored procedures are best suited for procedural logic, including conditional branching, looping, and executing multiple SQL statements as part of a workflow.

6. Why are views commonly preferred over duplicating transformation logic in reports?

A. Views improve report rendering speed automatically
B. Views centralize and standardize transformation logic
C. Views permanently store transformed data
D. Views replace semantic models

Correct Answer: B

Explanation:
Views allow transformation logic to be defined once and reused consistently across multiple reports and consumers, improving maintainability and governance.

7. What is a potential downside of overusing functions in large SQL queries?

A. Increased storage costs
B. Reduced data freshness
C. Potential performance degradation
D. Loss of security enforcement

Correct Answer: C

Explanation:
Functions, especially scalar functions, can negatively impact query performance when used extensively on large datasets due to repeated execution per row.
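
The per-row cost can be illustrated with a small sketch, again using Python's sqlite3 module as a stand-in rather than a Fabric warehouse. The function name fn_net_amount and the 10% discount are invented for the example. How a given engine executes or inlines scalar functions varies, so treat this as an illustration of the general pattern, not a measurement of Fabric behavior.

```python
import sqlite3


class CountingDiscount:
    """Hypothetical scalar function that records how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, amount):
        self.calls += 1
        return round(amount * 0.9, 2)  # flat 10% discount, illustrative only


fn = CountingDiscount()

conn = sqlite3.connect(":memory:")
# Register the Python callable as a one-argument SQL scalar function.
conn.create_function("fn_net_amount", 1, fn)

conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(float(i),) for i in range(1, 1001)],
)

results = conn.execute("SELECT fn_net_amount(amount) FROM orders").fetchall()
print(len(results), fn.calls)
```

In this sketch the function runs once for every row returned, which is why heavy scalar-function use over large tables deserves a performance check.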

8. Which object is most appropriate for parameter-driven data preparation steps in a warehouse?

A. View
B. Scalar function
C. Table
D. Stored procedure

Correct Answer: D

Explanation:
Stored procedures support parameters, control-of-flow logic, and multiple statements, making them ideal for complex, repeatable data preparation tasks.

9. How do views support governance and security in Microsoft Fabric?

A. By encrypting data at rest
B. By defining workspace-level permissions
C. By exposing only selected columns or filtered rows
D. By controlling OneLake storage access

Correct Answer: C

Explanation:
Views can limit the columns and rows exposed to users, helping implement logical data access patterns when combined with permissions and security models.

10. Which statement best describes how these objects fit into Fabric’s analytics lifecycle?

A. They replace Power BI semantic models
B. They are primarily used for real-time streaming
C. They prepare and standardize data for downstream analytics
D. They manage infrastructure-level security

Correct Answer: C

Explanation:
Views, functions, and stored procedures play a key role in transforming, standardizing, and preparing data for consumption by semantic models, reports, and analytics tools.

Analytics, BI Administration, Business Intelligence, Business Intelligence (BI) Development, Business Intelligence Platform, Data Analysis, Data Development, Data Governance, Data Integration, Data Integration (ETL), Data Modeling, Data Munging, Data Security, Data Strategy, Data Visualization, Data Warehousing, Data Wrangling, Databases, DP-600, Microsoft Certification, Microsoft Fabric, Microsoft OneLake December 28, 2025

Choose Between a Lakehouse, Warehouse, or Eventhouse

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub, and this topic falls under these sections:
Prepare data 
    --> Get data 
        --> Choose Between a Lakehouse, Warehouse, or Eventhouse

One of the most important architectural decisions a Microsoft Fabric Analytics Engineer must make is selecting the right analytical store for a given workload. For the DP-600 exam, this topic tests your ability to choose between a Lakehouse, Warehouse, or Eventhouse based on data type, query patterns, latency requirements, and user personas.

Overview of the Three Options

Microsoft Fabric provides three primary analytics storage and query experiences:

Option	Primary Purpose
Lakehouse	Flexible analytics on files and tables using Spark and SQL
Warehouse	Enterprise-grade SQL analytics and BI reporting
Eventhouse	Real-time and near-real-time analytics on streaming data

Understanding why and when to use each is critical for DP-600 success.

Lakehouse

What Is a Lakehouse?

A Lakehouse combines the flexibility of a data lake with the structure of a data warehouse. Data is stored in Delta Lake format in OneLake and can be accessed using both Spark and SQL.

When to Choose a Lakehouse

Choose a Lakehouse when you need:

Flexible schema (schema-on-read or schema-on-write)
Support for data engineering and data science
Access to raw, curated, and enriched data
Spark-based transformations and notebooks
Mixed workloads (batch analytics, exploration, ML)

Key Characteristics

Supports files and tables
Uses Spark (including Spark SQL) plus a read-only T-SQL SQL analytics endpoint
Ideal for ELT and advanced transformations
Easy integration with notebooks and pipelines

Exam signal words: flexible, raw data, Spark, data science, experimentation

Warehouse

What Is a Warehouse?

A Warehouse is a fully managed, SQL-first analytical store optimized for business intelligence and reporting. It enforces schema-on-write and provides a traditional relational experience.

When to Choose a Warehouse

Choose a Warehouse when you need:

Strong SQL-based analytics
High-performance reporting
Well-defined schemas and governance
Centralized enterprise BI
Compatibility with Power BI Import or DirectQuery

Key Characteristics

T-SQL only (no Spark)
Optimized for structured data
Best for star/snowflake schemas
Familiar experience for SQL developers

Exam signal words: enterprise BI, reporting, structured, governed, SQL-first

Eventhouse

What Is an Eventhouse?

An Eventhouse is optimized for real-time and streaming analytics, built on KQL (Kusto Query Language). It is designed to handle high-velocity event data.

When to Choose an Eventhouse

Choose an Eventhouse when you need:

Near-real-time or real-time analytics
Streaming data ingestion
Operational or telemetry analytics
Event-based dashboards and alerts

Key Characteristics

Uses KQL for querying
Integrates with Eventstreams
Handles massive ingestion rates
Optimized for time-series data

Exam signal words: streaming, telemetry, IoT, real-time, events

Choosing the Right Option (Exam-Critical)

The DP-600 exam often presents scenarios where multiple options could work, but only one best fits the requirements.

Decision Matrix

Requirement	Best Choice
Raw + curated data	Lakehouse
Complex Spark transformations	Lakehouse
Enterprise BI reporting	Warehouse
Strong governance and schemas	Warehouse
Streaming or telemetry data	Eventhouse
Near-real-time dashboards	Eventhouse
SQL-only users	Warehouse
Data science workloads	Lakehouse
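
As a study aid, the matrix above can be written as a lookup table with a small helper that tallies which store a scenario's requirements point to. This is a sketch only: the recommend function and the majority-vote rule are assumptions made for illustration. Real exam scenarios usually hinge on one dominant constraint rather than a count.

```python
from collections import Counter

# Requirement -> best choice, copied from the decision matrix above.
DECISION_MATRIX = {
    "Raw + curated data": "Lakehouse",
    "Complex Spark transformations": "Lakehouse",
    "Enterprise BI reporting": "Warehouse",
    "Strong governance and schemas": "Warehouse",
    "Streaming or telemetry data": "Eventhouse",
    "Near-real-time dashboards": "Eventhouse",
    "SQL-only users": "Warehouse",
    "Data science workloads": "Lakehouse",
}


def recommend(requirements, matrix=DECISION_MATRIX):
    """Return (best_store, votes) for a list of scenario requirements.

    Unknown requirements raise KeyError so typos are not silently ignored.
    """
    if not requirements:
        raise ValueError("at least one requirement is needed")
    votes = Counter(matrix[req] for req in requirements)
    best_store, _ = votes.most_common(1)[0]
    return best_store, dict(votes)


# Hypothetical scenario: mostly BI-oriented, with some raw data involved.
scenario = [
    "Enterprise BI reporting",
    "SQL-only users",
    "Raw + curated data",
]
best, votes = recommend(scenario)
print(best, votes)
```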

Common Exam Scenarios

You may be asked to:

Choose a storage type for a new analytics solution
Migrate from traditional systems to Fabric
Support both engineers and analysts
Enable real-time monitoring
Balance governance with flexibility

Always identify:

Data type (batch vs streaming)
Latency requirements
User personas
Query language
Governance needs

Best Practices to Remember

Use Lakehouse as a flexible foundation for analytics
Use Warehouse for polished, governed BI solutions
Use Eventhouse for real-time operational insights
Avoid forcing one option to handle all workloads
Let business requirements—not familiarity—drive the choice

Key Takeaway
For the DP-600 exam, choosing between a Lakehouse, Warehouse, or Eventhouse is about aligning data characteristics and access patterns with the right Fabric experience. Lakehouses provide flexibility, Warehouses deliver enterprise BI performance, and Eventhouses enable real-time analytics. The correct answer is almost always the one that best fits the scenario constraints.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions, with the possible associations below:
- Spark, raw, experimentation → Lakehouse
- Enterprise BI, governed, SQL reporting → Warehouse
- Streaming, telemetry, real-time → Eventhouse
Expect scenario-based questions rather than direct definitions

1. Which Microsoft Fabric component is BEST suited for flexible analytics on both files and tables using Spark and SQL?

A. Warehouse
B. Eventhouse
C. Lakehouse
D. Semantic model

Correct Answer: C

Explanation:
A Lakehouse stores data in Delta format in OneLake and supports both Spark and SQL, making it ideal for flexible analytics across files and tables.

2. A team of data scientists needs to experiment with raw and curated data using notebooks. Which option should they choose?

A. Warehouse
B. Eventhouse
C. Semantic model
D. Lakehouse

Correct Answer: D

Explanation:
Lakehouses are designed for data engineering and data science workloads, offering Spark-based notebooks and flexible schema handling.

3. Which option is MOST appropriate for enterprise BI reporting with well-defined schemas and strong governance?

A. Lakehouse
B. Warehouse
C. Eventhouse
D. OneLake

Correct Answer: B

Explanation:
Warehouses are SQL-first, schema-on-write systems optimized for structured data, governance, and high-performance BI reporting.

4. A solution must support near-real-time analytics on streaming IoT telemetry data. Which Fabric component should be used?

A. Lakehouse
B. Warehouse
C. Eventhouse
D. Dataflow Gen2

Correct Answer: C

Explanation:
Eventhouses are optimized for high-velocity streaming data and real-time analytics using KQL.

5. Which query language is primarily used to analyze data in an Eventhouse?

A. T-SQL
B. Spark SQL
C. DAX
D. KQL

Correct Answer: D

Explanation:
Eventhouses are built on KQL (Kusto Query Language), which is optimized for querying event and time-series data.

6. A business analytics team requires fast dashboard performance and is familiar only with SQL. Which option best meets this requirement?

A. Lakehouse
B. Warehouse
C. Eventhouse
D. Spark notebook

Correct Answer: B

Explanation:
Warehouses provide a traditional SQL experience optimized for BI dashboards and reporting performance.

7. Which characteristic BEST distinguishes a Lakehouse from a Warehouse?

A. Lakehouses support Power BI
B. Warehouses store data in OneLake
C. Lakehouses support Spark-based processing
D. Warehouses cannot be governed

Correct Answer: C

Explanation:
Lakehouses uniquely support Spark-based processing, enabling advanced transformations and data science workloads.

8. A solution must store structured batch data and unstructured files in the same analytical store. Which option should be selected?

A. Warehouse
B. Eventhouse
C. Semantic model
D. Lakehouse

Correct Answer: D

Explanation:
Lakehouses support both structured tables and unstructured or semi-structured files within the same environment.

9. Which scenario MOST strongly indicates the need for an Eventhouse?

A. Monthly financial reporting
B. Slowly changing dimension modeling
C. Real-time operational monitoring
D. Ad hoc SQL analysis

Correct Answer: C

Explanation:
Eventhouses are designed for real-time analytics on streaming data, making them ideal for operational monitoring scenarios.

10. When choosing between a Lakehouse, Warehouse, or Eventhouse on the DP-600 exam, which factor is MOST important?

A. Personal familiarity with the tool
B. The default Fabric option
C. Data characteristics and latency requirements
D. Workspace size

Correct Answer: C

Explanation:
DP-600 emphasizes selecting the correct component based on data type (batch vs streaming), latency needs, user personas, and governance—not personal preference.

Analytics, BI Administration, Business Intelligence, Business Intelligence (BI) Development, Business Intelligence Platform, Data Analysis, Data Cleaning, Data Development, Data Governance, Data Integration, Data Integration (ETL), Data Modeling, Data Quality Assurance, Data Security, Data Strategy, Data Visualization, Data Warehousing, Data Wrangling, Databases, DP-600, Microsoft Certification, Microsoft Fabric, Power Query December 28, 2025

Ingest or Access Data as Needed

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub, and this topic falls under these sections:
Prepare data 
    --> Get data 
        --> Ingest or access data as needed

A core responsibility of a Microsoft Fabric Analytics Engineer is deciding how data should be brought into Fabric—or whether it should be brought in at all. For the DP-600 exam, this topic focuses on selecting the right ingestion or access pattern based on performance, freshness, cost, and governance requirements.

Ingest vs. Access: Key Concept

Before choosing a tool or method, understand the distinction:

Ingest data: Physically copy data into Fabric-managed storage (OneLake)
Access data: Query or reference data where it already lives, without copying
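
The distinction can be pictured with plain Python objects standing in for storage. This is a conceptual sketch, not a Fabric API: source_system, ingested, and accessed are hypothetical names. A deep copy plays the role of ingestion and a reference plays the role of in-place access.

```python
import copy

# Hypothetical source system holding one table of orders.
source_system = {"orders": [{"id": 1, "amount": 100}]}

# Ingest: take a physical copy of the data at this point in time.
ingested = copy.deepcopy(source_system["orders"])

# Access: keep only a reference to the data where it already lives.
accessed = source_system["orders"]

# The source changes after both were set up.
source_system["orders"].append({"id": 2, "amount": 250})

print(len(ingested))  # copy is unchanged until the next refresh
print(len(accessed))  # reference sees the new row immediately
```

The copy stays as it was until it is refreshed, while the reference always reflects the source. That is the freshness-versus-independence trade-off behind most ingest-or-access questions.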

The exam frequently tests your ability to choose the most appropriate option—not just a working one.

Common Data Ingestion Methods in Microsoft Fabric

1. Dataflows Gen2

Best for:

Low-code ingestion and transformation
Reusable ingestion logic
Business-friendly data preparation

Key characteristics:

Uses Power Query Online
Supports scheduled refresh
Stores results in OneLake (Lakehouse or Warehouse)
Ideal for centralized, governed ingestion

Exam tip:
Use Dataflows Gen2 when reuse, transformation, and governance are priorities.

2. Data Pipelines (Copy Activity)

Best for:

High-volume or frequent ingestion
Orchestration across multiple sources
ELT-style workflows

Key characteristics:

Supports many source and sink types
Enables scheduling, dependencies, and retries
Minimal transformation (primarily copy)

Exam tip:
Choose pipelines when performance and orchestration matter more than transformation.

3. Notebooks (Spark)

Best for:

Complex transformations
Data science or advanced engineering
Custom ingestion logic

Key characteristics:

Full control using Spark (PySpark, Scala, SQL)
Suitable for large-scale processing
Writes directly to OneLake

Exam tip:
Notebooks are powerful but require engineering skills—don’t choose them for simple ingestion scenarios.

Accessing Data Without Ingesting

1. OneLake Shortcuts

Best for:

Avoiding data duplication
Reusing data across workspaces
Accessing external storage

Key characteristics:

Logical reference only (no copy)
Supports external sources such as ADLS Gen2 and Amazon S3
Appears native in Lakehouse tables or files

Exam tip:
Shortcuts are often the best answer when the question mentions avoiding duplication or reducing storage cost.

2. DirectQuery

Best for:

Near-real-time data access
Large datasets that cannot be imported
Centralized source-of-truth systems

Key characteristics:

Queries run against the source system
Performance depends on source
Limited modeling flexibility compared to Import

Exam tip:
Expect trade-off questions involving DirectQuery vs. Import.

3. Real-Time Access (Eventstreams / KQL)

Best for:

Streaming and telemetry data
Operational and real-time analytics

Key characteristics:

Event-driven ingestion
Supports near-real-time dashboards
Often discovered via Real-Time hub

Exam tip:
Use real-time ingestion when freshness is measured in seconds, not hours.

Choosing the Right Approach (Exam-Critical)

You should be able to decide based on these factors:

Requirement	Best Option
Reusable ingestion logic	Dataflows Gen2
High-volume copy	Data pipelines
Complex transformations	Notebooks
Avoid duplication	OneLake shortcuts
Near real-time reporting	DirectQuery / Eventstreams
Governance and trust	Ingestion + endorsement
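
One way to rehearse the table above is to encode it as a small rule function. The sketch below is an illustration under stated assumptions: the Scenario fields, the order in which the rules are checked, and the Dataflows Gen2 fallback are all choices made for this example, not guidance from Microsoft.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical flags describing what an exam scenario emphasizes."""

    avoid_duplication: bool = False
    near_real_time: bool = False
    streaming_source: bool = False
    complex_transformations: bool = False
    high_volume_copy: bool = False


def choose_approach(s: Scenario) -> str:
    # Rule order is an assumption: the most restrictive constraint wins.
    if s.avoid_duplication:
        return "OneLake shortcuts"
    if s.near_real_time:
        return "Eventstreams" if s.streaming_source else "DirectQuery"
    if s.complex_transformations:
        return "Notebooks"
    if s.high_volume_copy:
        return "Data pipelines"
    # Fallback assumption: reusable, low-code ingestion.
    return "Dataflows Gen2"


print(choose_approach(Scenario(avoid_duplication=True)))
print(choose_approach(Scenario(near_real_time=True, streaming_source=True)))
print(choose_approach(Scenario(high_volume_copy=True)))
```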

Governance and Security Considerations

Ingested data can inherit sensitivity labels
Access-based methods rely on source permissions
Workspace roles determine who can ingest or access data
Endorsed datasets should be preferred for reuse

DP-600 often frames ingestion questions within a governance context.

Common Exam Scenarios

You may be asked to:

Choose between ingesting data or accessing it directly
Identify when shortcuts are preferable to ingestion
Select the right tool for a specific ingestion pattern
Balance data freshness vs. performance
Reduce duplication across workspaces

Best Practices to Remember

Ingest when performance and modeling flexibility are required
Access when freshness, cost, or duplication is a concern
Centralize ingestion logic for reuse
Prefer Fabric-native patterns over external tools
Let business requirements drive architectural decisions

Key Takeaway
For the DP-600 exam, “Ingest or access data as needed” is about making intentional, informed choices. Microsoft Fabric provides multiple ways to bring data into analytics solutions, and the correct approach depends on scale, freshness, reuse, governance, and cost. Understanding why one method is better than another is far more important than memorizing features.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Look for and understand the usage scenario of keywords in exam questions (for example, low code/no code, large dataset, high-volume data, reuse, complex transformations)
Expect scenario-based questions rather than direct definitions

Also, keep in mind that …

DP-600 questions often include multiple valid options, but only one that best aligns with the scenario’s constraints. Always identify and consider factors such as:
- Data volume
- Freshness requirements
- Reuse and duplication concerns
- Transformation complexity

1. What is the primary difference between ingesting data and accessing data in Microsoft Fabric?

A. Ingested data cannot be secured
B. Accessed data is always slower
C. Ingesting copies data into OneLake, while accessing queries data in place
D. Accessed data requires a gateway

Correct Answer: C

Explanation:
Ingestion physically copies data into Fabric-managed storage (OneLake), while access-based approaches query or reference data where it already exists.

2. Which option is BEST when the goal is to avoid duplicating large datasets across multiple workspaces?

A. Import mode
B. Dataflows Gen2
C. OneLake shortcuts
D. Notebooks

Correct Answer: C

Explanation:
OneLake shortcuts allow data to be referenced without copying it, making them ideal for reuse and cost control.

3. A team needs reusable, low-code ingestion logic with scheduled refresh. Which Fabric feature should they use?

A. Spark notebooks
B. Data pipelines
C. Dataflows Gen2
D. DirectQuery

Correct Answer: C

Explanation:
Dataflows Gen2 provide Power Query–based ingestion with refresh scheduling and reuse across Fabric items.

4. Which ingestion method is MOST appropriate for complex transformations requiring custom logic?

A. Dataflows Gen2
B. Copy activity in pipelines
C. OneLake shortcuts
D. Spark notebooks

Correct Answer: D

Explanation:
Spark notebooks offer full control over transformation logic and are suited for complex, large-scale processing.

5. When should DirectQuery be preferred over Import mode?

A. When the dataset is small
B. When data freshness is critical
C. When transformations are complex
D. When performance must be maximized

Correct Answer: B

Explanation:
DirectQuery is preferred when near-real-time access to data is required, even though performance depends on the source system.

6. Which Fabric component is BEST suited for orchestrating high-volume data ingestion with dependencies and retries?

A. Dataflows Gen2
B. Data pipelines
C. Semantic models
D. Power BI Desktop

Correct Answer: B

Explanation:
Data pipelines are designed for orchestration, handling large volumes of data, scheduling, and dependency management.

7. A dataset is queried infrequently but must support advanced modeling features. Which approach is most appropriate?

A. DirectQuery
B. Access via shortcut
C. Import mode
D. Eventstream ingestion

Correct Answer: C

Explanation:
Import mode supports full modeling capabilities and high query performance, making it suitable even for infrequently accessed data.

8. Which scenario best fits the use of real-time ingestion methods such as Eventstreams or KQL databases?

A. Monthly financial reporting
B. Static reference data
C. IoT telemetry and operational monitoring
D. Slowly changing dimensions

Correct Answer: C

Explanation:
Real-time ingestion is designed for continuous, event-driven data such as IoT telemetry and operational metrics.

9. Why might ingesting data be preferred over accessing it directly?

A. It always reduces storage costs
B. It eliminates the need for security
C. It improves performance and modeling flexibility
D. It avoids data refresh

Correct Answer: C

Explanation:
Ingesting data into OneLake enables faster query performance and full support for modeling features.

10. Which factor is MOST important when deciding between ingesting data and accessing it?

A. The color of the dashboard
B. The number of reports
C. Business requirements such as freshness, scale, and governance
D. The Fabric region

Correct Answer: C

Explanation:
The decision to ingest or access data should be driven by business needs, including performance, freshness, cost, and governance—not technical convenience alone.

BI Administration, Business Intelligence, Business Intelligence (BI) Development, Business Intelligence Platform, Data Analysis, Data Governance, Data Integration, Data Integration (ETL), Data Modeling, Data Security, Data Strategy, Data Visualization, Data Warehousing, DP-600, Microsoft Certification, Microsoft Fabric December 28, 2025

Discover Data by Using OneLake Catalog and Real-Time Hub

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub, and this topic falls under these sections:
Prepare data 
    --> Get data 
        --> Discover data by using OneLake catalog and Real-Time hub

Discovering existing data assets efficiently is a critical capability for a Microsoft Fabric Analytics Engineer. For the DP-600 exam, this topic emphasizes how to find, understand, and evaluate data sources using Fabric’s built-in discovery experiences: OneLake catalog and Real-Time hub.

Purpose of Data Discovery in Microsoft Fabric

In large Fabric environments, data already exists across:

Lakehouses
Warehouses
Semantic models
Streaming and event-based sources

The goal of data discovery is to:

Avoid duplicate ingestion
Promote reuse of trusted data
Understand data ownership, sensitivity, and freshness
Accelerate analytics development

OneLake Catalog

What Is the OneLake Catalog?

The OneLake catalog is a centralized metadata and discovery experience that allows users to browse and search data assets stored in OneLake, Fabric’s unified data lake.

It provides visibility into:

Lakehouses and Warehouses
Tables, views, and files
Shortcuts to external data
Endorsement and sensitivity metadata

Key Capabilities of the OneLake Catalog

For the exam, you should understand that the OneLake catalog enables users to:

Search and filter data assets across workspaces
View schema details (columns, data types)
Identify endorsed (Certified or Promoted) assets
See sensitivity labels applied to data
Discover data ownership and location
Reuse existing data rather than re-ingesting it

This supports both governance and efficiency.

Endorsement and Trust Signals

Within the OneLake catalog, users can quickly identify:

Certified items (approved and governed)
Promoted items (recommended but not formally certified)

These trust signals are important in exam scenarios that ask how to guide users toward reliable data sources.

Shortcuts and External Data

The catalog also exposes OneLake shortcuts, which allow data from:

Azure Data Lake Storage Gen2
Amazon S3
Other Fabric workspaces

to appear as native OneLake data without duplication. This is a key discovery mechanism tested in DP-600.

Real-Time Hub

What Is the Real-Time Hub?

The Real-Time hub is a discovery experience focused on streaming and event-driven data sources in Microsoft Fabric.

It centralizes access to:

Eventstreams
Azure Event Hubs
Azure IoT Hub
Azure Data Explorer (KQL databases)
Other real-time data producers

Key Capabilities of the Real-Time Hub

For exam purposes, understand that the Real-Time hub allows users to:

Discover available streaming data sources
Preview live event data
Subscribe to or reuse existing event streams
Understand data velocity and schema
Reduce duplication of real-time ingestion pipelines

This is especially important in architectures involving operational analytics or near real-time reporting.

OneLake Catalog vs. Real-Time Hub

Feature	OneLake Catalog	Real-Time Hub
Primary focus	Stored data	Streaming / event data
Data types	Tables, files, shortcuts	Events, streams, telemetry
Use case	Analytical and historical data	Real-time and operational analytics
Governance signals	Endorsement, sensitivity	Ownership, stream metadata

Understanding when to use each is a common exam theme.

Security and Governance Considerations

Data discovery respects Fabric security:

Users only see items they have permission to access
Sensitivity labels are visible in discovery views
Workspace roles control discovery depth

This ensures compliance while still promoting self-service analytics.
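
The combination of permission-aware visibility and endorsement signals can be sketched as a filter followed by a sort. Everything here is hypothetical: CatalogItem, discover, and the sample items are invented, and permissions are simplified to workspace membership even though Fabric also supports item-level permissions. It is a mental model, not the catalog's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogItem:
    name: str
    workspace: str
    endorsement: str  # "Certified", "Promoted", or "None"


# Lower rank sorts first, so Certified items surface at the top.
ENDORSEMENT_RANK = {"Certified": 0, "Promoted": 1, "None": 2}


def discover(items, accessible_workspaces, keyword, rank=ENDORSEMENT_RANK):
    """Return the items the user may see that match keyword, most trusted first."""
    visible = [
        item
        for item in items
        if item.workspace in accessible_workspaces
        and keyword.lower() in item.name.lower()
    ]
    return sorted(visible, key=lambda item: rank[item.endorsement])


catalog = [
    CatalogItem("sales_raw", "Engineering", "None"),
    CatalogItem("sales_curated", "Analytics", "Certified"),
    CatalogItem("sales_forecast", "Analytics", "Promoted"),
    CatalogItem("sales_finance", "Finance", "Certified"),
]

# This user has no access to the Finance workspace.
found = discover(catalog, {"Engineering", "Analytics"}, "sales")
print([item.name for item in found])
```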

Exam-Relevant Scenarios

On the DP-600 exam, you may be asked to:

Identify how users can discover existing datasets before ingesting new data
Choose between OneLake catalog and Real-Time hub based on data type
Locate endorsed or certified data assets
Reduce duplication by reusing existing tables or streams
Enable self-service discovery while maintaining governance

Best Practices (Aligned to DP-600)

Use OneLake catalog first before creating new data connections
Encourage use of endorsed and certified assets
Use Real-Time hub to discover existing event streams
Leverage shortcuts to reuse data without copying
Combine discovery with proper labeling and endorsement

Key Takeaway
For the DP-600 exam, discovering data in Microsoft Fabric is about visibility, trust, and reuse. The OneLake catalog helps users find and understand stored analytical data, while the Real-Time hub enables discovery of live streaming sources. Together, they reduce redundancy, improve governance, and accelerate analytics development.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identify and understand why an option is correct (or incorrect), not just which one
Pay close attention to when to use OneLake catalog vs. Real-Time hub
Look for and understand the usage scenario of keywords in exam questions (for example, discover, reuse, streaming, endorsed, shortcut)
Expect scenario-based questions that test architecture choices, rather than direct definitions

1. What is the primary purpose of the OneLake catalog in Microsoft Fabric?

A. To ingest streaming data
B. To schedule data refreshes
C. To discover and explore data stored in OneLake
D. To manage workspace permissions

Correct Answer: C

Explanation:
The OneLake catalog is a centralized discovery and metadata experience that helps users find, understand, and reuse data stored in OneLake across Fabric workspaces.

2. Which type of data is the Real-Time hub primarily designed to help users discover?

A. Historical data in Lakehouses
B. Structured warehouse tables
C. Streaming and event-driven data sources
D. Power BI semantic models

Correct Answer: C

Explanation:
The Real-Time hub focuses on streaming and event-based data such as Eventstreams, Azure Event Hubs, IoT Hub, and KQL databases.

3. A user wants to avoid re-ingesting data that already exists in another workspace. Which Fabric feature best supports this goal?

A. Data pipelines
B. OneLake shortcuts
C. Import mode
D. DirectQuery

Correct Answer: B

Explanation:
OneLake shortcuts allow data stored externally or in another workspace to appear as native OneLake data without physically copying it.

4. Which metadata element in the OneLake catalog helps users identify trusted and approved data assets?

A. Workspace name
B. File size
C. Endorsement status
D. Refresh schedule

Correct Answer: C

Explanation:
Endorsements (Promoted and Certified) act as trust signals, helping users quickly identify reliable and governed data assets.

5. Which statement about data visibility in the OneLake catalog is true?

A. All users can see all data across the tenant
B. Only workspace admins can see catalog entries
C. Users can only see items they have permission to access
D. Sensitivity labels hide data from discovery

Correct Answer: C

Explanation:
The OneLake catalog respects Fabric security boundaries—users only see data assets they are authorized to access.

6. A team is building a real-time dashboard and wants to see what streaming data already exists. Where should they look first?

A. OneLake catalog
B. Power BI Service
C. Dataflows Gen2
D. Real-Time hub

Correct Answer: D

Explanation:
The Real-Time hub centralizes discovery of streaming and event-based data sources, making it the best starting point for real-time analytics scenarios.

7. Which of the following items is most likely discovered through the Real-Time hub?

A. Parquet files in OneLake
B. Lakehouse Delta tables
C. Azure Event Hubs streams
D. Warehouse SQL views

Correct Answer: C

Explanation:
Azure Event Hubs and other event-driven sources are exposed through the Real-Time hub, not the OneLake catalog.

8. What advantage does data discovery provide in large Fabric environments?

A. Faster Power BI rendering
B. Reduced licensing costs
C. Reduced data duplication and improved reuse
D. Automatic data modeling

Correct Answer: C

Explanation:
Discovering existing data assets helps teams reuse trusted data, reducing redundant ingestion and improving governance.

9. Which information is commonly visible when browsing an asset in the OneLake catalog?

A. User passwords
B. Column-level schema details
C. Tenant-wide permissions
D. Gateway configuration

Correct Answer: B

Explanation:
The OneLake catalog exposes metadata such as table schemas, column names, and data types to help users evaluate suitability before use.

10. Which scenario best demonstrates correct use of OneLake catalog and Real-Time hub together?

A. Using DirectQuery for all reports
B. Creating a new pipeline for every dataset
C. Discovering historical data in OneLake and live events in Real-Time hub
D. Applying sensitivity labels to dashboards

Correct Answer: C

Explanation:
OneLake catalog is optimized for discovering stored analytical data, while Real-Time hub is designed for discovering live streaming sources. Using both ensures comprehensive data discovery.