
Create Views, Functions, and Stored Procedures

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Create views, functions, and stored procedures

Creating views, functions, and stored procedures is a core data transformation and modeling skill for analytics engineers working in Microsoft Fabric. These objects help abstract complexity, improve reusability, enforce business logic, and optimize downstream analytics and reporting.

This section of the DP-600 exam focuses on when, where, and how to use these objects effectively across Fabric components such as Lakehouses, Warehouses, and SQL analytics endpoints.

Views

What are Views?

A view is a virtual table defined by a SQL query. It does not store data itself but presents data dynamically from underlying tables.
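
A minimal sketch of what this looks like in a Fabric Warehouse or SQL analytics endpoint (the table and column names below are hypothetical):

  -- Curated, analytics-ready view over hypothetical fact and dimension tables
  CREATE VIEW dbo.vw_SalesByRegion
  AS
  SELECT
      c.Region,
      s.OrderDate,
      SUM(s.SalesAmount) AS TotalSales
  FROM dbo.FactSales AS s
  INNER JOIN dbo.DimCustomer AS c
      ON s.CustomerKey = c.CustomerKey
  WHERE s.IsCancelled = 0              -- filtering logic baked into the view
  GROUP BY c.Region, s.OrderDate;
  GO

  -- Consumers query the view like a table; results always reflect the latest data
  SELECT Region, SUM(TotalSales) AS TotalSales
  FROM dbo.vw_SalesByRegion
  GROUP BY Region;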

Where Views Are Used in Fabric

  • Fabric Data Warehouse
  • Lakehouse SQL analytics endpoint
  • Exposed to Power BI semantic models and other consumers

Common Use Cases

  • Simplify complex joins and transformations
  • Present curated, analytics-ready datasets
  • Enforce column-level or row-level filtering logic
  • Provide a stable schema over evolving raw data

Key Characteristics

  • Always reflect the latest data
  • Can be used like tables in SELECT statements
  • Improve maintainability and readability
  • Can support security patterns when combined with permissions

Exam Tip

Know that views are ideal for logical transformations, not heavy compute or data persistence.

Functions

What are Functions?

Functions encapsulate reusable logic and return a value or a table. They help standardize calculations and transformations across queries.

Types of Functions (SQL)

  • Scalar functions: Return a single value (e.g., formatted date, calculated metric)
  • Table-valued functions (TVFs): Return a result set that behaves like a table
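
A minimal sketch of both types, using hypothetical table and column names (standard T-SQL syntax shown for illustration; exact function support can vary across Fabric engines):

  -- Scalar function: returns a single value (a standardized margin calculation)
  CREATE FUNCTION dbo.fn_GrossMargin (@Revenue DECIMAL(18,2), @Cost DECIMAL(18,2))
  RETURNS DECIMAL(18,2)
  AS
  BEGIN
      RETURN @Revenue - @Cost;
  END;
  GO

  -- Inline table-valued function: returns a parameterized result set
  CREATE FUNCTION dbo.fn_SalesForRegion (@Region VARCHAR(50))
  RETURNS TABLE
  AS
  RETURN
  (
      SELECT OrderDate, SalesAmount, CostAmount   -- hypothetical columns
      FROM dbo.FactSales
      WHERE Region = @Region
  );
  GO

  -- Usage: the TVF behaves like a table, the scalar function like a column expression
  SELECT OrderDate, dbo.fn_GrossMargin(SalesAmount, CostAmount) AS Margin
  FROM dbo.fn_SalesForRegion('West');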

Where Functions Are Used in Fabric

  • Fabric Warehouses
  • SQL analytics endpoints for Lakehouses

Common Use Cases

  • Standardized business calculations
  • Reusable transformation logic
  • Parameterized filtering or calculations
  • Cleaner and more modular SQL code

Key Characteristics

  • Improve consistency across queries
  • Can be referenced in views and stored procedures
  • May impact performance if overused in large queries

Exam Tip

Functions promote reuse and consistency, but should be used thoughtfully to avoid performance overhead.

Stored Procedures

What are Stored Procedures?

Stored procedures are precompiled SQL code blocks that can accept parameters and perform multiple operations.

Where Stored Procedures Are Used in Fabric

  • Fabric Data Warehouses
  • SQL endpoints that support procedural logic

Common Use Cases

  • Complex transformation workflows
  • Batch processing logic
  • Conditional logic and control-of-flow (IF/ELSE, loops)
  • Data loading, validation, and orchestration steps
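
A minimal sketch of a parameterized procedure with simple control-of-flow, again using hypothetical table names:

  CREATE PROCEDURE dbo.sp_LoadDailySales
      @LoadDate DATE
  AS
  BEGIN
      -- Control-of-flow: load the date only if it has not been loaded already
      IF NOT EXISTS (SELECT 1 FROM dbo.FactSales WHERE OrderDate = @LoadDate)
      BEGIN
          INSERT INTO dbo.FactSales (OrderDate, CustomerKey, SalesAmount)
          SELECT OrderDate, CustomerKey, SalesAmount
          FROM dbo.StagingSales                  -- hypothetical staging table
          WHERE OrderDate = @LoadDate;
      END
  END;
  GO

  -- Run on demand, on a schedule, or from a pipeline activity
  EXEC dbo.sp_LoadDailySales @LoadDate = '2024-06-01';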

Key Characteristics

  • Can perform multiple SQL statements
  • Can accept input and output parameters
  • Improve performance by reducing repeated compilation
  • Support automation and operational workflows

Exam Tip

Stored procedures are best for procedural logic and orchestration, not ad-hoc analytics queries.

Choosing Between Views, Functions, and Stored Procedures

  • Views: best for simplifying data access and shaping datasets
  • Functions: best for reusable calculations and logic
  • Stored procedures: best for complex, parameter-driven workflows

Understanding why you would choose one over another is frequently tested on the DP-600 exam.

Integration with Power BI and Analytics

  • Views are commonly consumed by Power BI semantic models
  • Functions help ensure consistent calculations across reports
  • Stored procedures are typically part of data preparation or orchestration, not directly consumed by reports

Governance and Best Practices

  • Use clear naming conventions (e.g., vw_, fn_, sp_)
  • Document business logic embedded in SQL objects
  • Minimize logic duplication across objects
  • Apply permissions carefully to control access
  • Balance reusability with performance considerations

What to Know for the DP-600 Exam

You should be comfortable with:

  • When to use views vs. functions vs. stored procedures
  • How these objects support data transformation
  • Their role in analytics-ready data preparation
  • How they integrate with Lakehouses, Warehouses, and Power BI
  • Performance and governance implications

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions

1. What is the primary purpose of creating a view in a Fabric lakehouse or warehouse?

A. To permanently store transformed data
B. To execute procedural logic with parameters
C. To provide a virtual, query-based representation of data
D. To orchestrate batch data loads

Correct Answer: C

Explanation:
A view is a virtual table defined by a SQL query. It does not store data but dynamically presents data from underlying tables, making it ideal for simplifying access and shaping analytics-ready datasets.

2. Which Fabric component commonly exposes views directly to Power BI semantic models?

A. Eventhouse
B. SQL analytics endpoint
C. Dataflow Gen2
D. Real-Time hub

Correct Answer: B

Explanation:
The SQL analytics endpoint (for lakehouses and warehouses) exposes tables and views that Power BI semantic models can consume using SQL-based connectivity.

3. When should you use a scalar function instead of a view?

A. When you need to return a dataset with multiple rows
B. When you need to encapsulate reusable calculation logic
C. When you need to perform batch updates
D. When you want to persist transformed data

Correct Answer: B

Explanation:
Scalar functions are designed to return a single value and are ideal for reusable calculations such as formatting, conditional logic, or standardized metrics.

4. Which object type can return a result set that behaves like a table?

A. Scalar function
B. Stored procedure
C. Table-valued function
D. View index

Correct Answer: C

Explanation:
A table-valued function (TVF) returns a table and can be used in FROM clauses, similar to a view but with parameterization support.

5. Which scenario is the best use case for a stored procedure?

A. Creating a simplified reporting dataset
B. Applying row-level filters for security
C. Running conditional logic with multiple SQL steps
D. Exposing data to Power BI reports

Correct Answer: C

Explanation:
Stored procedures are best suited for procedural logic, including conditional branching, looping, and executing multiple SQL statements as part of a workflow.

6. Why are views commonly preferred over duplicating transformation logic in reports?

A. Views improve report rendering speed automatically
B. Views centralize and standardize transformation logic
C. Views permanently store transformed data
D. Views replace semantic models

Correct Answer: B

Explanation:
Views allow transformation logic to be defined once and reused consistently across multiple reports and consumers, improving maintainability and governance.

7. What is a potential downside of overusing functions in large SQL queries?

A. Increased storage costs
B. Reduced data freshness
C. Potential performance degradation
D. Loss of security enforcement

Correct Answer: C

Explanation:
Functions, especially scalar functions, can negatively impact query performance when used extensively on large datasets due to repeated execution per row.

8. Which object is most appropriate for parameter-driven data preparation steps in a warehouse?

A. View
B. Scalar function
C. Table
D. Stored procedure

Correct Answer: D

Explanation:
Stored procedures support parameters, control-of-flow logic, and multiple statements, making them ideal for complex, repeatable data preparation tasks.

9. How do views support governance and security in Microsoft Fabric?

A. By encrypting data at rest
B. By defining workspace-level permissions
C. By exposing only selected columns or filtered rows
D. By controlling OneLake storage access

Correct Answer: C

Explanation:
Views can limit the columns and rows exposed to users, helping implement logical data access patterns when combined with permissions and security models.

10. Which statement best describes how these objects fit into Fabric’s analytics lifecycle?

A. They replace Power BI semantic models
B. They are primarily used for real-time streaming
C. They prepare and standardize data for downstream analytics
D. They manage infrastructure-level security

Correct Answer: C

Explanation:
Views, functions, and stored procedures play a key role in transforming, standardizing, and preparing data for consumption by semantic models, reports, and analytics tools.

Choose Between a Lakehouse, Warehouse, or Eventhouse

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Get data
--> Choose Between a Lakehouse, Warehouse, or Eventhouse

One of the most important architectural decisions a Microsoft Fabric Analytics Engineer must make is selecting the right analytical store for a given workload. For the DP-600 exam, this topic tests your ability to choose between a Lakehouse, Warehouse, or Eventhouse based on data type, query patterns, latency requirements, and user personas.

Overview of the Three Options

Microsoft Fabric provides three primary analytics storage and query experiences:

  • Lakehouse: flexible analytics on files and tables using Spark and SQL
  • Warehouse: enterprise-grade SQL analytics and BI reporting
  • Eventhouse: real-time and near-real-time analytics on streaming data

Understanding why and when to use each is critical for DP-600 success.

Lakehouse

What Is a Lakehouse?

A Lakehouse combines the flexibility of a data lake with the structure of a data warehouse. Data is stored in Delta Lake format in OneLake and can be accessed using both Spark and SQL.
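
For example, assuming a hypothetical Lakehouse named SalesLakehouse with a Delta table FactSales, the same table that a Spark notebook writes to can be read with plain T-SQL through the Lakehouse's SQL analytics endpoint:

  -- Read-only T-SQL query against a Lakehouse Delta table via the SQL analytics endpoint
  SELECT TOP 10 OrderDate, SalesAmount
  FROM SalesLakehouse.dbo.FactSales
  ORDER BY OrderDate DESC;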

When to Choose a Lakehouse

Choose a Lakehouse when you need:

  • Flexible schema (schema-on-read or schema-on-write)
  • Support for data engineering and data science
  • Access to raw, curated, and enriched data
  • Spark-based transformations and notebooks
  • Mixed workloads (batch analytics, exploration, ML)

Key Characteristics

  • Supports files and tables
  • Uses Spark SQL and T-SQL endpoints
  • Ideal for ELT and advanced transformations
  • Easy integration with notebooks and pipelines

Exam signal words: flexible, raw data, Spark, data science, experimentation

Warehouse

What Is a Warehouse?

A Warehouse is a fully managed, SQL-first analytical store optimized for business intelligence and reporting. It enforces schema-on-write and provides a traditional relational experience.

When to Choose a Warehouse

Choose a Warehouse when you need:

  • Strong SQL-based analytics
  • High-performance reporting
  • Well-defined schemas and governance
  • Centralized enterprise BI
  • Compatibility with Power BI Import or DirectQuery

Key Characteristics

  • T-SQL only (no Spark)
  • Optimized for structured data
  • Best for star/snowflake schemas
  • Familiar experience for SQL developers

Exam signal words: enterprise BI, reporting, structured, governed, SQL-first

Eventhouse

What Is an Eventhouse?

An Eventhouse is optimized for real-time and streaming analytics, built on KQL (Kusto Query Language). It is designed to handle high-velocity event data.

When to Choose an Eventhouse

Choose an Eventhouse when you need:

  • Near-real-time or real-time analytics
  • Streaming data ingestion
  • Operational or telemetry analytics
  • Event-based dashboards and alerts

Key Characteristics

  • Uses KQL for querying
  • Integrates with Eventstreams
  • Handles massive ingestion rates
  • Optimized for time-series data

Exam signal words: streaming, telemetry, IoT, real-time, events

Choosing the Right Option (Exam-Critical)

The DP-600 exam often presents scenarios where multiple options could work, but only one best fits the requirements.

Decision Matrix

  • Raw + curated data: Lakehouse
  • Complex Spark transformations: Lakehouse
  • Enterprise BI reporting: Warehouse
  • Strong governance and schemas: Warehouse
  • Streaming or telemetry data: Eventhouse
  • Near-real-time dashboards: Eventhouse
  • SQL-only users: Warehouse
  • Data science workloads: Lakehouse

Common Exam Scenarios

You may be asked to:

  • Choose a storage type for a new analytics solution
  • Migrate from traditional systems to Fabric
  • Support both engineers and analysts
  • Enable real-time monitoring
  • Balance governance with flexibility

Always identify:

  1. Data type (batch vs streaming)
  2. Latency requirements
  3. User personas
  4. Query language
  5. Governance needs

Best Practices to Remember

  • Use Lakehouse as a flexible foundation for analytics
  • Use Warehouse for polished, governed BI solutions
  • Use Eventhouse for real-time operational insights
  • Avoid forcing one option to handle all workloads
  • Let business requirements—not familiarity—drive the choice

Key Takeaway
For the DP-600 exam, choosing between a Lakehouse, Warehouse, or Eventhouse is about aligning data characteristics and access patterns with the right Fabric experience. Lakehouses provide flexibility, Warehouses deliver enterprise BI performance, and Eventhouses enable real-time analytics. The correct answer is almost always the one that best fits the scenario constraints.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for and understand the usage scenario of keywords in exam questions, with the following possible associations:
    • Spark, raw, experimentation → Lakehouse
    • Enterprise BI, governed, SQL reporting → Warehouse
    • Streaming, telemetry, real-time → Eventhouse
  • Expect scenario-based questions rather than direct definitions

1. Which Microsoft Fabric component is BEST suited for flexible analytics on both files and tables using Spark and SQL?

A. Warehouse
B. Eventhouse
C. Lakehouse
D. Semantic model

Correct Answer: C

Explanation:
A Lakehouse stores data in Delta format in OneLake and supports both Spark and SQL, making it ideal for flexible analytics across files and tables.

2. A team of data scientists needs to experiment with raw and curated data using notebooks. Which option should they choose?

A. Warehouse
B. Eventhouse
C. Semantic model
D. Lakehouse

Correct Answer: D

Explanation:
Lakehouses are designed for data engineering and data science workloads, offering Spark-based notebooks and flexible schema handling.

3. Which option is MOST appropriate for enterprise BI reporting with well-defined schemas and strong governance?

A. Lakehouse
B. Warehouse
C. Eventhouse
D. OneLake

Correct Answer: B

Explanation:
Warehouses are SQL-first, schema-on-write systems optimized for structured data, governance, and high-performance BI reporting.

4. A solution must support near-real-time analytics on streaming IoT telemetry data. Which Fabric component should be used?

A. Lakehouse
B. Warehouse
C. Eventhouse
D. Dataflow Gen2

Correct Answer: C

Explanation:
Eventhouses are optimized for high-velocity streaming data and real-time analytics using KQL.

5. Which query language is primarily used to analyze data in an Eventhouse?

A. T-SQL
B. Spark SQL
C. DAX
D. KQL

Correct Answer: D

Explanation:
Eventhouses are built on KQL (Kusto Query Language), which is optimized for querying event and time-series data.

6. A business analytics team requires fast dashboard performance and is familiar only with SQL. Which option best meets this requirement?

A. Lakehouse
B. Warehouse
C. Eventhouse
D. Spark notebook

Correct Answer: B

Explanation:
Warehouses provide a traditional SQL experience optimized for BI dashboards and reporting performance.

7. Which characteristic BEST distinguishes a Lakehouse from a Warehouse?

A. Lakehouses support Power BI
B. Warehouses store data in OneLake
C. Lakehouses support Spark-based processing
D. Warehouses cannot be governed

Correct Answer: C

Explanation:
Lakehouses uniquely support Spark-based processing, enabling advanced transformations and data science workloads.

8. A solution must store structured batch data and unstructured files in the same analytical store. Which option should be selected?

A. Warehouse
B. Eventhouse
C. Semantic model
D. Lakehouse

Correct Answer: D

Explanation:
Lakehouses support both structured tables and unstructured or semi-structured files within the same environment.

9. Which scenario MOST strongly indicates the need for an Eventhouse?

A. Monthly financial reporting
B. Slowly changing dimension modeling
C. Real-time operational monitoring
D. Ad hoc SQL analysis

Correct Answer: C

Explanation:
Eventhouses are designed for real-time analytics on streaming data, making them ideal for operational monitoring scenarios.

10. When choosing between a Lakehouse, Warehouse, or Eventhouse on the DP-600 exam, which factor is MOST important?

A. Personal familiarity with the tool
B. The default Fabric option
C. Data characteristics and latency requirements
D. Workspace size

Correct Answer: C

Explanation:
DP-600 emphasizes selecting the correct component based on data type (batch vs streaming), latency needs, user personas, and governance—not personal preference.

Ingest or Access Data as Needed

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Get data
--> Ingest or access data as needed

A core responsibility of a Microsoft Fabric Analytics Engineer is deciding how data should be brought into Fabric—or whether it should be brought in at all. For the DP-600 exam, this topic focuses on selecting the right ingestion or access pattern based on performance, freshness, cost, and governance requirements.

Ingest vs. Access: Key Concept

Before choosing a tool or method, understand the distinction:

  • Ingest data: Physically copy data into Fabric-managed storage (OneLake)
  • Access data: Query or reference data where it already lives, without copying

The exam frequently tests your ability to choose the most appropriate option—not just a working one.

Common Data Ingestion Methods in Microsoft Fabric

1. Dataflows Gen2

Best for:

  • Low-code ingestion and transformation
  • Reusable ingestion logic
  • Business-friendly data preparation

Key characteristics:

  • Uses Power Query Online
  • Supports scheduled refresh
  • Stores results in OneLake (Lakehouse or Warehouse)
  • Ideal for centralized, governed ingestion

Exam tip:
Use Dataflows Gen2 when reuse, transformation, and governance are priorities.

2. Data Pipelines (Copy Activity)

Best for:

  • High-volume or frequent ingestion
  • Orchestration across multiple sources
  • ELT-style workflows

Key characteristics:

  • Supports many source and sink types
  • Enables scheduling, dependencies, and retries
  • Minimal transformation (primarily copy)

Exam tip:
Choose pipelines when performance and orchestration matter more than transformation.

3. Notebooks (Spark)

Best for:

  • Complex transformations
  • Data science or advanced engineering
  • Custom ingestion logic

Key characteristics:

  • Full control using Spark (PySpark, Scala, SQL)
  • Suitable for large-scale processing
  • Writes directly to OneLake

Exam tip:
Notebooks are powerful but require engineering skills—don’t choose them for simple ingestion scenarios.

Accessing Data Without Ingesting

1. OneLake Shortcuts

Best for:

  • Avoiding data duplication
  • Reusing data across workspaces
  • Accessing external storage

Key characteristics:

  • Logical reference only (no copy)
  • Supports ADLS Gen2 and Amazon S3
  • Appears native in Lakehouse tables or files

Exam tip:
Shortcuts are often the best answer when the question mentions avoiding duplication or reducing storage cost.

2. DirectQuery

Best for:

  • Near-real-time data access
  • Large datasets that cannot be imported
  • Centralized source-of-truth systems

Key characteristics:

  • Queries run against the source system
  • Performance depends on source
  • Limited modeling flexibility compared to Import

Exam tip:
Expect trade-off questions involving DirectQuery vs. Import.

3. Real-Time Access (Eventstreams / KQL)

Best for:

  • Streaming and telemetry data
  • Operational and real-time analytics

Key characteristics:

  • Event-driven ingestion
  • Supports near-real-time dashboards
  • Often discovered via Real-Time hub

Exam tip:
Use real-time ingestion when freshness is measured in seconds, not hours.

Choosing the Right Approach (Exam-Critical)

You should be able to decide based on these factors:

  • Reusable ingestion logic: Dataflows Gen2
  • High-volume copy: data pipelines
  • Complex transformations: notebooks
  • Avoid duplication: OneLake shortcuts
  • Near real-time reporting: DirectQuery / Eventstreams
  • Governance and trust: ingestion + endorsement

Governance and Security Considerations

  • Ingested data can inherit sensitivity labels
  • Access-based methods rely on source permissions
  • Workspace roles determine who can ingest or access data
  • Endorsed datasets should be preferred for reuse

DP-600 often frames ingestion questions within a governance context.

Common Exam Scenarios

You may be asked to:

  • Choose between ingesting data or accessing it directly
  • Identify when shortcuts are preferable to ingestion
  • Select the right tool for a specific ingestion pattern
  • Balance data freshness vs. performance
  • Reduce duplication across workspaces

Best Practices to Remember

  • Ingest when performance and modeling flexibility are required
  • Access when freshness, cost, or duplication is a concern
  • Centralize ingestion logic for reuse
  • Prefer Fabric-native patterns over external tools
  • Let business requirements drive architectural decisions

Key Takeaway
For the DP-600 exam, “Ingest or access data as needed” is about making intentional, informed choices. Microsoft Fabric provides multiple ways to bring data into analytics solutions, and the correct approach depends on scale, freshness, reuse, governance, and cost. Understanding why one method is better than another is far more important than memorizing features.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for and understand the usage scenario of keywords in exam questions (for example, low code/no code, large dataset, high-volume data, reuse, complex transformations)
  • Expect scenario-based questions rather than direct definitions

Also, keep in mind that …

  • DP-600 questions often include multiple valid options, but only one that best aligns with the scenario’s constraints. Always identify and consider factors such as:
    • Data volume
    • Freshness requirements
    • Reuse and duplication concerns
    • Transformation complexity

1. What is the primary difference between ingesting data and accessing data in Microsoft Fabric?

A. Ingested data cannot be secured
B. Accessed data is always slower
C. Ingesting copies data into OneLake, while accessing queries data in place
D. Accessed data requires a gateway

Correct Answer: C

Explanation:
Ingestion physically copies data into Fabric-managed storage (OneLake), while access-based approaches query or reference data where it already exists.

2. Which option is BEST when the goal is to avoid duplicating large datasets across multiple workspaces?

A. Import mode
B. Dataflows Gen2
C. OneLake shortcuts
D. Notebooks

Correct Answer: C

Explanation:
OneLake shortcuts allow data to be referenced without copying it, making them ideal for reuse and cost control.

3. A team needs reusable, low-code ingestion logic with scheduled refresh. Which Fabric feature should they use?

A. Spark notebooks
B. Data pipelines
C. Dataflows Gen2
D. DirectQuery

Correct Answer: C

Explanation:
Dataflows Gen2 provide Power Query–based ingestion with refresh scheduling and reuse across Fabric items.

4. Which ingestion method is MOST appropriate for complex transformations requiring custom logic?

A. Dataflows Gen2
B. Copy activity in pipelines
C. OneLake shortcuts
D. Spark notebooks

Correct Answer: D

Explanation:
Spark notebooks offer full control over transformation logic and are suited for complex, large-scale processing.

5. When should DirectQuery be preferred over Import mode?

A. When the dataset is small
B. When data freshness is critical
C. When transformations are complex
D. When performance must be maximized

Correct Answer: B

Explanation:
DirectQuery is preferred when near-real-time access to data is required, even though performance depends on the source system.

6. Which Fabric component is BEST suited for orchestrating high-volume data ingestion with dependencies and retries?

A. Dataflows Gen2
B. Data pipelines
C. Semantic models
D. Power BI Desktop

Correct Answer: B

Explanation:
Data pipelines are designed for orchestration, handling large volumes of data, scheduling, and dependency management.

7. A dataset is queried infrequently but must support advanced modeling features. Which approach is most appropriate?

A. DirectQuery
B. Access via shortcut
C. Import into OneLake
D. Eventstream ingestion

Correct Answer: C

Explanation:
Import mode supports full modeling capabilities and high query performance, making it suitable even for infrequently accessed data.

8. Which scenario best fits the use of real-time ingestion methods such as Eventstreams or KQL databases?

A. Monthly financial reporting
B. Static reference data
C. IoT telemetry and operational monitoring
D. Slowly changing dimensions

Correct Answer: C

Explanation:
Real-time ingestion is designed for continuous, event-driven data such as IoT telemetry and operational metrics.

9. Why might ingesting data be preferred over accessing it directly?

A. It always reduces storage costs
B. It eliminates the need for security
C. It improves performance and modeling flexibility
D. It avoids data refresh

Correct Answer: C

Explanation:
Ingesting data into OneLake enables faster query performance and full support for modeling features.

10. Which factor is MOST important when deciding between ingesting data and accessing it?

A. The color of the dashboard
B. The number of reports
C. Business requirements such as freshness, scale, and governance
D. The Fabric region

Correct Answer: C

Explanation:
The decision to ingest or access data should be driven by business needs, including performance, freshness, cost, and governance—not technical convenience alone.

Discover Data by Using OneLake Catalog and Real-Time Hub

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Get data
--> Discover data by using OneLake catalog and Real-Time hub

Discovering existing data assets efficiently is a critical capability for a Microsoft Fabric Analytics Engineer. For the DP-600 exam, this topic emphasizes how to find, understand, and evaluate data sources using Fabric’s built-in discovery experiences: OneLake catalog and Real-Time hub.

Purpose of Data Discovery in Microsoft Fabric

In large Fabric environments, data already exists across:

  • Lakehouses
  • Warehouses
  • Semantic models
  • Streaming and event-based sources

The goal of data discovery is to:

  • Avoid duplicate ingestion
  • Promote reuse of trusted data
  • Understand data ownership, sensitivity, and freshness
  • Accelerate analytics development

OneLake Catalog

What Is the OneLake Catalog?

The OneLake catalog is a centralized metadata and discovery experience that allows users to browse and search data assets stored in OneLake, Fabric’s unified data lake.

It provides visibility into:

  • Lakehouses and Warehouses
  • Tables, views, and files
  • Shortcuts to external data
  • Endorsement and sensitivity metadata

Key Capabilities of the OneLake Catalog

For the exam, you should understand that the OneLake catalog enables users to:

  • Search and filter data assets across workspaces
  • View schema details (columns, data types)
  • Identify endorsed (Certified or Promoted) assets
  • See sensitivity labels applied to data
  • Discover data ownership and location
  • Reuse existing data rather than re-ingesting it

This supports both governance and efficiency.

Endorsement and Trust Signals

Within the OneLake catalog, users can quickly identify:

  • Certified items (approved and governed)
  • Promoted items (recommended but not formally certified)

These trust signals are important in exam scenarios that ask how to guide users toward reliable data sources.

Shortcuts and External Data

The catalog also exposes OneLake shortcuts, which allow data from:

  • Azure Data Lake Storage Gen2
  • Amazon S3
  • Other Fabric workspaces

to appear as native OneLake data without duplication. This is a key discovery mechanism tested in DP-600.

Real-Time Hub

What Is the Real-Time Hub?

The Real-Time hub is a discovery experience focused on streaming and event-driven data sources in Microsoft Fabric.

It centralizes access to:

  • Eventstreams
  • Azure Event Hubs
  • Azure IoT Hub
  • Azure Data Explorer (KQL databases)
  • Other real-time data producers

Key Capabilities of the Real-Time Hub

For exam purposes, understand that the Real-Time hub allows users to:

  • Discover available streaming data sources
  • Preview live event data
  • Subscribe to or reuse existing event streams
  • Understand data velocity and schema
  • Reduce duplication of real-time ingestion pipelines

This is especially important in architectures involving operational analytics or near real-time reporting.

OneLake Catalog vs. Real-Time Hub

OneLake catalog:

  • Primary focus: stored data
  • Data types: tables, files, shortcuts
  • Use case: analytical and historical data
  • Governance signals: endorsement, sensitivity

Real-Time hub:

  • Primary focus: streaming / event data
  • Data types: events, streams, telemetry
  • Use case: real-time and operational analytics
  • Governance signals: ownership, stream metadata

Understanding when to use each is a common exam theme.

Security and Governance Considerations

Data discovery respects Fabric security:

  • Users only see items they have permission to access
  • Sensitivity labels are visible in discovery views
  • Workspace roles control discovery depth

This ensures compliance while still promoting self-service analytics.

Exam-Relevant Scenarios

On the DP-600 exam, you may be asked to:

  • Identify how users can discover existing datasets before ingesting new data
  • Choose between OneLake catalog and Real-Time hub based on data type
  • Locate endorsed or certified data assets
  • Reduce duplication by reusing existing tables or streams
  • Enable self-service discovery while maintaining governance

Best Practices (Aligned to DP-600)

  • Use OneLake catalog first before creating new data connections
  • Encourage use of endorsed and certified assets
  • Use Real-Time hub to discover existing event streams
  • Leverage shortcuts to reuse data without copying
  • Combine discovery with proper labeling and endorsement

Key Takeaway
For the DP-600 exam, discovering data in Microsoft Fabric is about visibility, trust, and reuse. The OneLake catalog helps users find and understand stored analytical data, while the Real-Time hub enables discovery of live streaming sources. Together, they reduce redundancy, improve governance, and accelerate analytics development.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Pay close attention to when to use OneLake catalog vs. Real-Time hub
  • Look for and understand the usage scenario of keywords in exam questions (for example, discover, reuse, streaming, endorsed, shortcut)
  • Expect scenario-based questions that test architecture choices, rather than direct definitions

1. What is the primary purpose of the OneLake catalog in Microsoft Fabric?

A. To ingest streaming data
B. To schedule data refreshes
C. To discover and explore data stored in OneLake
D. To manage workspace permissions

Correct Answer: C

Explanation:
The OneLake catalog is a centralized discovery and metadata experience that helps users find, understand, and reuse data stored in OneLake across Fabric workspaces.

2. Which type of data is the Real-Time hub primarily designed to help users discover?

A. Historical data in Lakehouses
B. Structured warehouse tables
C. Streaming and event-driven data sources
D. Power BI semantic models

Correct Answer: C

Explanation:
The Real-Time hub focuses on streaming and event-based data such as Eventstreams, Azure Event Hubs, IoT Hub, and KQL databases.

3. A user wants to avoid re-ingesting data that already exists in another workspace. Which Fabric feature best supports this goal?

A. Data pipelines
B. OneLake shortcuts
C. Import mode
D. DirectQuery

Correct Answer: B

Explanation:
OneLake shortcuts allow data stored externally or in another workspace to appear as native OneLake data without physically copying it.

4. Which metadata element in the OneLake catalog helps users identify trusted and approved data assets?

A. Workspace name
B. File size
C. Endorsement status
D. Refresh schedule

Correct Answer: C

Explanation:
Endorsements (Promoted and Certified) act as trust signals, helping users quickly identify reliable and governed data assets.

5. Which statement about data visibility in the OneLake catalog is true?

A. All users can see all data across the tenant
B. Only workspace admins can see catalog entries
C. Users can only see items they have permission to access
D. Sensitivity labels hide data from discovery

Correct Answer: C

Explanation:
The OneLake catalog respects Fabric security boundaries—users only see data assets they are authorized to access.

6. A team is building a real-time dashboard and wants to see what streaming data already exists. Where should they look first?

A. OneLake catalog
B. Power BI Service
C. Dataflows Gen2
D. Real-Time hub

Correct Answer: D

Explanation:
The Real-Time hub centralizes discovery of streaming and event-based data sources, making it the best starting point for real-time analytics scenarios.

7. Which of the following items is most likely discovered through the Real-Time hub?

A. Parquet files in OneLake
B. Lakehouse Delta tables
C. Azure Event Hub streams
D. Warehouse SQL views

Correct Answer: C

Explanation:
Azure Event Hubs and other event-driven sources are exposed through the Real-Time hub, not the OneLake catalog.

8. What advantage does data discovery provide in large Fabric environments?

A. Faster Power BI rendering
B. Reduced licensing costs
C. Reduced data duplication and improved reuse
D. Automatic data modeling

Correct Answer: C

Explanation:
Discovering existing data assets helps teams reuse trusted data, reducing redundant ingestion and improving governance.

9. Which information is commonly visible when browsing an asset in the OneLake catalog?

A. User passwords
B. Column-level schema details
C. Tenant-wide permissions
D. Gateway configuration

Correct Answer: B

Explanation:
The OneLake catalog exposes metadata such as table schemas, column names, and data types to help users evaluate suitability before use.

10. Which scenario best demonstrates correct use of OneLake catalog and Real-Time hub together?

A. Using DirectQuery for all reports
B. Creating a new pipeline for every dataset
C. Discovering historical data in OneLake and live events in Real-Time hub
D. Applying sensitivity labels to dashboards

Correct Answer: C

Explanation:
OneLake catalog is optimized for discovering stored analytical data, while Real-Time hub is designed for discovering live streaming sources. Using both ensures comprehensive data discovery.

Create a Data Connection in Microsoft Fabric

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Get data
--> Create a data connection

Creating data connections is a foundational skill for a Microsoft Fabric Analytics Engineer. In the DP-600 exam, this topic focuses on how to securely and efficiently connect Fabric workloads—such as Lakehouses, Warehouses, Dataflows Gen2, and semantic models—to a wide variety of data sources.

What a Data Connection Means in Microsoft Fabric

A data connection defines how Fabric authenticates to, accesses, and retrieves data from a source system. It includes:

  • The data source type
  • Connection details (server, database, endpoint, file path, etc.)
  • Authentication method
  • Optional privacy and credential reuse settings

Once created, a data connection can often be reused across multiple items within a workspace.

Common Data Sources in Fabric

For the exam, you should be familiar with connecting to the following categories of data sources:

1. Azure and Microsoft Data Sources

  • Azure SQL Database
  • Azure Synapse (dedicated and serverless pools)
  • Azure Data Lake Storage Gen2
  • Azure Blob Storage
  • OneLake (Fabric-native storage)
  • Power BI semantic models (DirectQuery)

2. On-Premises Data Sources

  • SQL Server
  • Oracle
  • Other relational databases

These typically require an On-premises Data Gateway.

3. Files and Semi-Structured Data

  • CSV, JSON, Parquet, Excel
  • Files stored in OneLake, ADLS Gen2, SharePoint, or local file systems

Where Data Connections Are Created

In Microsoft Fabric, data connections can be created from several entry points:

  • Lakehouse: Add data via shortcuts or ingestion
  • Warehouse: Connect external data or ingest via pipelines
  • Dataflows Gen2: Define connections as part of Power Query Online
  • Pipelines: Configure source connections in copy activities
  • Semantic models: Connect via Import or DirectQuery

Understanding where the connection is configured is important for exam scenarios.

Authentication Methods

The DP-600 exam commonly tests authentication concepts. Be familiar with:

  • Microsoft Entra ID (OAuth) – Recommended and most secure
  • Service principal – Common for automation and CI/CD
  • Account key / Shared Access Signature (SAS) – Often used for storage
  • Username and password – Less secure, sometimes legacy

You should also understand when credentials are:

  • Stored at the connection level
  • Managed per workspace
  • Reused across multiple items

Gateways and Connectivity Modes

On-Premises Data Gateway

Required when connecting Fabric to on-premises sources. Key points:

  • Can be standard or personal (standard is preferred)
  • Must be online for refresh and query operations
  • Uses outbound connections only

Connectivity Modes

  • Import: Data is loaded into Fabric storage
  • DirectQuery: Queries run against the source system
  • Shortcut-based access: Data remains external but appears native in OneLake

Security and Governance Considerations

When creating data connections, Fabric enforces governance through:

  • Workspace roles (Viewer, Contributor, Member, Admin)
  • Credential isolation per workspace
  • Sensitivity labels inherited from data sources (when applicable)

Exam questions may test your ability to choose the most secure and scalable connection method.

Best Practices (Exam-Relevant)

  • Prefer Entra ID authentication over credentials or keys
  • Use OneLake shortcuts to avoid unnecessary data duplication
  • Centralize connections in Dataflows Gen2 for reuse
  • Validate gateway availability for on-premises sources
  • Align connection methods with performance needs (Import vs DirectQuery)

How This Appears on the DP-600 Exam

You may be asked to:

  • Identify the correct data connection method for a scenario
  • Choose the appropriate authentication type
  • Determine when a gateway is required
  • Decide where to create a connection for reuse and governance
  • Troubleshoot refresh or connectivity issues

Key Takeaway
Creating data connections in Microsoft Fabric is about more than just accessing data—it’s about security, performance, reusability, and governance. For the DP-600 exam, focus on understanding source types, authentication options, gateways, and where connections are defined within the Fabric ecosystem.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for and understand the usage scenario of keywords in exam questions (for example, gateway, authentication, reuse, DirectQuery vs Import)
  • Expect scenario-based questions rather than direct definitions

1. Which authentication method is generally recommended when creating data connections in Microsoft Fabric?

A. Username and password
B. Shared Access Signature (SAS)
C. Microsoft Entra ID (OAuth)
D. Account key

Correct Answer: C

Explanation:
Microsoft Entra ID (OAuth) is the recommended authentication method because it provides centralized identity management, better security, support for conditional access, and easier credential rotation compared to passwords or keys.

2. When is an On-premises Data Gateway required in Microsoft Fabric?

A. When connecting to Azure SQL Database
B. When connecting to OneLake
C. When connecting to an on-premises SQL Server
D. When connecting to Azure Data Lake Storage Gen2

Correct Answer: C

Explanation:
An On-premises Data Gateway is required when Fabric needs to access data sources that are hosted on-premises. Cloud-based sources such as Azure SQL Database or ADLS Gen2 do not require a gateway.

3. Which Fabric feature allows external data to appear as if it is stored in OneLake without copying the data?

A. Import mode
B. DirectQuery mode
C. OneLake shortcuts
D. Data pipelines

Correct Answer: C

Explanation:
OneLake shortcuts provide a logical reference to external storage locations (such as ADLS Gen2 or S3) without physically moving or duplicating the data.

4. You want multiple Fabric items in the same workspace to reuse a single data connection. Where should you create the connection?

A. In each semantic model
B. In Dataflows Gen2
C. In Power BI Desktop only
D. In Excel

Correct Answer: B

Explanation:
Dataflows Gen2 are designed for centralized data ingestion and transformation, making them ideal for creating reusable data connections across multiple Fabric items.

5. Which connectivity mode loads data into Fabric storage and provides the best query performance?

A. DirectQuery
B. Live connection
C. Shortcut-based access
D. Import

Correct Answer: D

Explanation:
Import mode copies data into Fabric-managed storage, enabling high-performance queries and full modeling capabilities at the cost of data freshness.

6. Which statement about DirectQuery connections in Fabric is true?

A. Data is stored in OneLake
B. Queries are always faster than Import mode
C. Queries are executed against the source system
D. A gateway is never required

Correct Answer: C

Explanation:
With DirectQuery, queries are sent directly to the source system at runtime. Performance depends on the source, and a gateway may be required for on-premises sources.

7. Which role is required to create or edit data connections within a Fabric workspace?

A. Viewer
B. Contributor
C. Member
D. Admin

Correct Answer: B

Explanation:
Users must have at least Contributor permissions to create or modify data connections. Viewers have read-only access and cannot manage connections.

8. Which file formats are commonly supported when creating file-based data connections in Fabric?

A. CSV only
B. CSV, JSON, Parquet, Excel
C. TXT only
D. XML only

Correct Answer: B

Explanation:
Microsoft Fabric supports a wide range of structured and semi-structured file formats, including CSV, JSON, Parquet, and Excel, especially when stored in OneLake or ADLS Gen2.

9. What is the primary security benefit of using a service principal for data connections?

A. Faster query performance
B. No need for a gateway
C. Automated, non-interactive authentication
D. Unlimited access to all workspaces

Correct Answer: C

Explanation:
Service principals enable secure, automated authentication scenarios (such as CI/CD pipelines) without relying on individual user credentials.

10. A data refresh in Fabric fails because credentials are missing. What is the most likely cause?

A. The dataset is in Import mode
B. The gateway is offline or misconfigured
C. The semantic model contains calculated columns
D. The file format is unsupported

Correct Answer: B

Explanation:
If a data source requires an On-premises Data Gateway and the gateway is offline or incorrectly configured, Fabric cannot access the credentials, causing refresh failures.

Configure Direct Lake, including default fallback and refresh behavior

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Optimize enterprise-scale semantic models
--> Configure Direct Lake, including default fallback and refresh behavior

Overview

Direct Lake is a storage and connectivity mode in Microsoft Fabric semantic models that enables Power BI to query data directly from OneLake without importing data into VertiPaq or sending queries back to the data source (as in DirectQuery). It is designed to deliver near–Import performance with DirectQuery-like freshness, making it a key feature for enterprise-scale analytics.

For the DP-600 exam, you are expected to understand:

  • How Direct Lake works
  • When and why fallback occurs
  • How default fallback behavior is configured
  • How refresh behaves in Direct Lake models
  • Common performance and design considerations

How Direct Lake Works

In Direct Lake mode:

  • Data resides in Delta tables stored in OneLake (typically from a Lakehouse or Warehouse).
  • The semantic model reads Parquet/Delta files directly, bypassing data import.
  • Metadata and file statistics are cached to optimize query performance.
  • Queries are executed without duplicating data into VertiPaq storage.

This architecture reduces data duplication while still enabling fast, interactive analytics.


Default Fallback Behavior

What Is Direct Lake Fallback?

Fallback occurs when a query or operation cannot be executed using Direct Lake. In these cases, the semantic model automatically falls back to another mode to ensure the query still returns results.

Depending on configuration, fallback may occur to:

  • DirectQuery, or
  • Import (VertiPaq), if data is available

Fallback is automatic and transparent to report users unless explicitly restricted.


Common Causes of Fallback

Direct Lake fallback can be triggered by:

  • Unsupported DAX functions or expressions
  • Unsupported data types in Delta tables
  • Complex model features (certain calculation patterns, security scenarios)
  • Queries that cannot be resolved efficiently using file-based access
  • Temporary unavailability of OneLake files

Understanding these triggers is important for diagnosing performance issues.


Configuring Default Fallback Behavior

In Fabric semantic model settings, you can configure:

  • Allow fallback (default) – Ensures queries continue to work even when Direct Lake is not supported.
  • Disable fallback – Queries fail instead of falling back, which is useful for enforcing performance expectations or testing Direct Lake compatibility.

From an exam perspective:

  • Allowing fallback prioritizes reliability
  • Disabling fallback prioritizes predictability and performance validation

Refresh Behavior in Direct Lake Models

Do Direct Lake Models Require Refresh?

Unlike Import mode:

  • Direct Lake does not require scheduled data refresh to reflect new data in OneLake.
  • New or updated Delta files are automatically visible to the semantic model.

However, metadata refreshes are still relevant.


Types of Refresh in Direct Lake

  1. Metadata Refresh
    • Updates table schemas, partitions, and statistics
    • Required when:
      • Columns are added or removed
      • Table structures change
    • Lightweight compared to Import refresh
  2. Hybrid Scenarios
    • If fallback to Import is enabled and used, those imported parts do require refresh
    • Mixed behavior may exist in composite or fallback-heavy models

Impact of Refresh on Performance

  • No large-scale data movement during refresh
  • Faster model readiness after schema changes
  • Reduced refresh windows compared to Import models
  • Lower memory pressure in capacity

This makes Direct Lake especially suitable for large, frequently updated datasets.


Performance and Design Considerations

To optimize Direct Lake usage:

  • Use supported Delta table features and data types
  • Keep models simple and star-schema based
  • Avoid unnecessary bidirectional relationships
  • Monitor fallback behavior using performance tools
  • Test critical DAX measures for Direct Lake compatibility

From an exam standpoint, expect scenario-based questions asking you to choose Direct Lake and configure fallback appropriately for scale, freshness, and reliability.


When to Use Direct Lake

Direct Lake is best suited for:

  • Large datasets stored in OneLake
  • Near-real-time analytics
  • Enterprise models that need both performance and freshness
  • Organizations standardizing on Fabric Lakehouse or Warehouse architectures

Key DP-600 Takeaways

  • Direct Lake queries Delta tables directly in OneLake
  • Default fallback ensures query continuity when Direct Lake isn’t supported
  • Fallback behavior can be enabled or disabled
  • Data refresh is not required, but metadata refresh still matters
  • Understanding fallback and refresh behavior is critical for enterprise-scale optimization

DP-600 Exam Tip 💡

Expect scenario-based questions where you must decide:

  • Whether to enable or disable fallback
  • How refresh behaves after schema changes
  • Why a query is falling back unexpectedly

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect) — not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions

1. What is the primary benefit of using Direct Lake mode in a Fabric semantic model?

A. It fully imports data into VertiPaq for maximum compression
B. It queries Delta tables in OneLake directly without data import
C. It sends all queries back to the source system
D. It eliminates the need for semantic models

Correct Answer: B

Explanation:
Direct Lake reads Delta/Parquet files directly from OneLake, avoiding both data import (Import mode) and source query execution (DirectQuery), enabling near-Import performance with fresher data.


2. When does a Direct Lake semantic model fall back to another query mode?

A. When scheduled refresh fails
B. When unsupported features or queries are encountered
C. When the dataset exceeds 1 GB
D. When row-level security is enabled

Correct Answer: B

Explanation:
Fallback occurs when a query or model feature is not supported by Direct Lake, such as certain DAX expressions or unsupported data types.


3. What is the default behavior of Direct Lake when a query cannot be executed in Direct Lake mode?

A. The query fails immediately
B. The query retries using Import mode only
C. The query automatically falls back to another supported mode
D. The semantic model is disabled

Correct Answer: C

Explanation:
By default, Direct Lake allows fallback to ensure query reliability. This allows reports to continue functioning even if Direct Lake cannot handle a specific request.


4. Why might an organization choose to disable fallback in a Direct Lake semantic model?

A. To reduce OneLake storage costs
B. To enforce consistent Direct Lake performance and detect incompatibilities
C. To allow automatic data imports
D. To improve data refresh frequency

Correct Answer: B

Explanation:
Disabling fallback ensures queries only run in Direct Lake mode. This is useful for performance validation and preventing unexpected query behavior.


5. Which action typically requires a metadata refresh in a Direct Lake semantic model?

A. Adding new rows to a Delta table
B. Updating existing fact table values
C. Adding a new column to a Delta table
D. Running a Power BI report

Correct Answer: C

Explanation:
Schema changes such as adding or removing columns require a metadata refresh so the semantic model can recognize structural changes.


6. How does Direct Lake handle new data written to Delta tables in OneLake?

A. Data is visible only after a scheduled refresh
B. Data is visible automatically without data refresh
C. Data is visible only after manual import
D. Data is cached permanently

Correct Answer: B

Explanation:
Direct Lake reads data directly from OneLake, so new or updated data becomes available without needing a traditional Import refresh.


7. Which scenario is MOST likely to cause Direct Lake fallback?

A. Simple SUM aggregation on a fact table
B. Querying a supported Delta table
C. Using unsupported DAX functions in a measure
D. Filtering data using slicers

Correct Answer: C

Explanation:
Certain complex or unsupported DAX functions can force fallback because Direct Lake cannot execute them efficiently using file-based access.


8. What happens if fallback is disabled and a query cannot be executed in Direct Lake mode?

A. The query automatically switches to DirectQuery
B. The query fails and returns an error
C. The semantic model imports the data
D. The model switches to Import mode permanently

Correct Answer: B

Explanation:
When fallback is disabled, unsupported queries fail instead of switching modes, making incompatibilities more visible during testing.


9. Which statement about refresh behavior in Direct Lake models is TRUE?

A. Full data refresh is always required
B. Direct Lake models do not support refresh
C. Only metadata refresh may be required
D. Refresh behaves the same as Import mode

Correct Answer: C

Explanation:
Direct Lake does not require full data refreshes because it reads data directly from OneLake. Metadata refresh is needed only for structural changes.


10. Why is Direct Lake well suited for enterprise-scale semantic models?

A. It eliminates the need for Delta tables
B. It supports unlimited bidirectional relationships
C. It combines near-Import performance with fresh data access
D. It forces all data into memory

Correct Answer: C

Explanation:
Direct Lake offers high performance without importing data, making it ideal for large datasets that require frequent updates and scalable analytics.

Choose Between Direct Lake on OneLake and Direct Lake on SQL Endpoints

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Optimize enterprise-scale semantic models
--> Choose between Direct Lake on OneLake and Direct Lake on SQL endpoints

In Microsoft Fabric, Direct Lake is a high-performance semantic model storage mode that allows Power BI and Fabric semantic models to query data directly from OneLake without importing it into VertiPaq. When implementing Direct Lake, you must choose where the semantic model reads from, either:

  • Direct Lake on OneLake
  • Direct Lake on SQL endpoints

Understanding the differences, trade-offs, and use cases for each option is critical for optimizing enterprise-scale semantic models, and this topic appears explicitly in the DP-600 exam blueprint.


Direct Lake on OneLake

What It Is

Direct Lake on OneLake connects the semantic model directly to Delta tables stored in OneLake, bypassing SQL engines entirely. Queries operate directly on Parquet/Delta files using the Fabric Direct Lake engine.
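
To make this concrete, here is a minimal PySpark sketch, assuming it runs in a Fabric notebook with a default lakehouse attached (the table and column names are purely illustrative). It writes a Delta table into OneLake; a Direct Lake on OneLake semantic model built over that lakehouse would read these files directly, with no Import refresh step.

```python
# Minimal sketch, assuming a Fabric notebook with a default lakehouse attached.
# Table and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already available in Fabric notebooks

sales = spark.createDataFrame(
    [(1, "2024-01-01", 120.50), (2, "2024-01-02", 89.99)],
    ["order_id", "order_date", "amount"],
).withColumn("order_date", F.to_date("order_date"))

# Save as a managed Delta table in the lakehouse. Direct Lake on OneLake reads
# these Delta/Parquet files directly, so newly appended rows show up in reports
# without an Import refresh (only schema changes need a metadata refresh).
sales.write.format("delta").mode("append").saveAsTable("fact_sales")
```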

Key Characteristics

  • Reads Delta tables directly from OneLake
  • No dependency on a SQL query engine
  • Near-Import performance with zero data duplication
  • Minimal latency between data ingestion and reporting
  • Requires supported Delta table structures and data types

Advantages

  • Best performance for large-scale analytics
  • Always reflects the latest data written to OneLake
  • Eliminates Import refresh overhead
  • Ideal for lakehouse-centric architectures

Limitations

  • Some complex DAX patterns may cause fallback
  • Requires schema compatibility with Direct Lake
  • Less flexibility for SQL-based transformations

Typical Use Cases

  • Enterprise lakehouse analytics
  • High-volume fact tables
  • Near-real-time reporting
  • Fabric-native data pipelines

Direct Lake on SQL Endpoints

What It Is

Direct Lake on SQL endpoints connects the semantic model to the SQL analytics endpoint of a Lakehouse or Warehouse, while still using Direct Lake storage mode behind the scenes.

Instead of reading files directly, the semantic model relies on the SQL endpoint to expose the data.
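
To illustrate why this option appeals to SQL-first teams, the hedged sketch below uses pyodbc to create a T-SQL view on a Lakehouse SQL analytics endpoint. The server name, database, and object names are placeholders (copy the real SQL connection string from the Fabric portal); a Direct Lake on SQL endpoints semantic model can then consume the view instead of the raw Delta tables.

```python
# Hedged sketch: defining SQL-based business logic as a view on a Lakehouse
# SQL analytics endpoint. Server, database, and object names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=SalesLakehouse;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)

# The view encapsulates SQL transformations; the semantic model builds on it
# rather than on the underlying Delta files.
conn.cursor().execute("""
    CREATE OR ALTER VIEW dbo.vw_daily_sales AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM dbo.fact_sales
    GROUP BY order_date;
""")
conn.close()
```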

Key Characteristics

  • Queries go through the SQL endpoint
  • Still benefits from Direct Lake storage
  • Enables SQL views and transformations
  • Slightly higher latency than pure OneLake access

Advantages

  • Supports SQL-based modeling (views, joins, calculated columns)
  • Easier integration with existing SQL logic
  • Familiar experience for SQL-first teams
  • Useful when business logic is already defined in SQL

Limitations

  • Additional query layer may impact performance
  • Less efficient than direct file access
  • SQL endpoint availability becomes a dependency

Typical Use Cases

  • Organizations with strong SQL development practices
  • Reuse of existing SQL views and transformations
  • Gradual migration from Warehouse or SQL models
  • Mixed BI and ad-hoc SQL workloads

Key Comparison Summary

Aspect | Direct Lake on OneLake | Direct Lake on SQL Endpoint
Data access | Direct file access | Via SQL analytics endpoint
Performance | Highest | Slightly lower
SQL dependency | None | Required
Schema flexibility | Lower | Higher
Transformation style | Lakehouse / Spark | SQL-based
Ideal for | Scale & performance | SQL reuse & flexibility

Choosing Between the Two (Exam-Focused Guidance)

On the DP-600 exam, questions typically focus on architectural intent and performance optimization:

Choose Direct Lake on OneLake when:

  • Performance is the top priority
  • Data is already modeled in Delta tables
  • You want the simplest, most scalable architecture
  • Near-real-time analytics are required

Choose Direct Lake on SQL endpoints when:

  • You need SQL views or transformations
  • Existing logic already exists in SQL
  • Teams are more comfortable with SQL than Spark
  • Some flexibility is preferred over maximum performance

Exam Tip 💡

If a question emphasizes:

  • Maximum performance, minimal latency, or scalability/large-scale analytics → Direct Lake on OneLake
  • SQL views, SQL transformations, or SQL reuse → Direct Lake on SQL endpoints

Expect scenario-based questions where both options are technically valid, but only one best aligns with the business and performance requirements.


Practice Questions:

Here are 10 questions to test and solidify your learning. As you review these and other questions in your preparation, make sure to:

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Watch for keywords in each question and understand the scenario they signal
  • Expect scenario-based questions rather than direct definitions

Question 1

A company has Delta tables stored in OneLake and wants the lowest possible query latency for Power BI reports without using SQL views. Which option should they choose?

A. Import mode
B. DirectQuery on SQL endpoint
C. Direct Lake on SQL endpoint
D. Direct Lake on OneLake

Correct Answer: D

Explanation:
Direct Lake on OneLake reads Delta tables directly from OneLake without a SQL layer, delivering the best performance and lowest latency.


Question 2

Which requirement would most strongly favor Direct Lake on SQL endpoints over Direct Lake on OneLake?

A. Maximum performance
B. Real-time data visibility
C. Use of SQL views for business logic
D. Minimal infrastructure dependencies

Correct Answer: C

Explanation:
Direct Lake on SQL endpoints allows semantic models to consume SQL views and transformations, making it ideal when business logic is defined in SQL.


Question 3

What is a key architectural difference between Direct Lake on OneLake and Direct Lake on SQL endpoints?

A. Only OneLake supports Delta tables
B. SQL endpoints require data import
C. OneLake access bypasses the SQL engine
D. SQL endpoints cannot be used with semantic models

Correct Answer: C

Explanation:
Direct Lake on OneLake reads Delta files directly from storage, while SQL endpoints introduce an additional SQL query layer.


Question 4

A Fabric semantic model uses Direct Lake on OneLake. Under which condition might it fallback to DirectQuery?

A. The model contains calculated columns
B. The dataset exceeds 1 TB
C. The Delta table schema is unsupported
D. The SQL endpoint is unavailable

Correct Answer: C

Explanation:
If the Delta table schema or data types are not supported by Direct Lake, Fabric automatically falls back to DirectQuery.


Question 5

Which scenario is best suited for Direct Lake on SQL endpoints?

A. High-volume streaming telemetry
B. SQL-first team reusing existing warehouse views
C. Near-real-time dashboards on raw lake data
D. Large fact tables optimized for scan performance

Correct Answer: B

Explanation:
Direct Lake on SQL endpoints is ideal when teams rely on SQL views and want to reuse existing SQL logic.


Question 6

Which statement about performance is most accurate?

A. SQL endpoints always outperform OneLake
B. OneLake always requires Import mode
C. Direct Lake on OneLake typically offers better performance
D. Direct Lake on SQL endpoints does not use Direct Lake

Correct Answer: C

Explanation:
Direct Lake on OneLake avoids the SQL layer, resulting in faster query execution in most scenarios.


Question 7

A Power BI model must reflect new data immediately after ingestion into OneLake. Which option best supports this requirement?

A. Import mode
B. DirectQuery
C. Direct Lake on SQL endpoint
D. Direct Lake on OneLake

Correct Answer: D

Explanation:
Direct Lake on OneLake reads data directly from Delta tables and reflects changes immediately without refresh.


Question 8

Which dependency exists when using Direct Lake on SQL endpoints that does not exist with Direct Lake on OneLake?

A. Delta Lake support
B. VertiPaq compression
C. SQL analytics endpoint availability
D. Semantic model compatibility

Correct Answer: C

Explanation:
Direct Lake on SQL endpoints depends on the SQL analytics endpoint being available, while OneLake access does not.


Question 9

From a DP-600 exam perspective, which factor most often determines the correct choice between these two options?

A. Dataset size alone
B. Whether SQL transformations are required
C. Number of report users
D. Power BI license type

Correct Answer: B

Explanation:
Exam questions typically focus on whether SQL logic (views, joins, transformations) is needed, which drives the choice.


Question 10

You are designing an enterprise semantic model focused on scalability and minimal complexity. The data is already curated as Delta tables. What is the best choice?

A. Import mode
B. DirectQuery on SQL endpoint
C. Direct Lake on SQL endpoint
D. Direct Lake on OneLake

Correct Answer: D

Explanation:
Direct Lake on OneLake offers the simplest architecture with the highest scalability and performance when Delta tables are already prepared.


Apply sensitivity labels to items in Microsoft Fabric

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Maintain a data analytics solution
--> Implement security and governance
--> Apply sensitivity labels to items

To Do:
Complete the related module for this topic in the Microsoft Learn course: Secure data access in Microsoft Fabric

Sensitivity labels are a data protection and governance feature in Microsoft Fabric that help organizations classify, protect, and control the handling of sensitive data. They integrate with Microsoft Purview Information Protection and extend data protection consistently across Fabric, Power BI, and Microsoft 365.

For the DP-600 exam, you should understand what sensitivity labels are, how they are applied, what they affect, and how they differ from access controls.

What Are Sensitivity Labels?

Sensitivity labels:

  • Classify data based on confidentiality and business impact
  • Travel with the data across supported services
  • Can trigger protection behaviors, such as encryption or usage restrictions

Common label examples include:

  • Public
  • Internal
  • Confidential
  • Highly Confidential

Labels are organizationally defined and managed centrally.

Where Sensitivity Labels Come From

Sensitivity labels in Fabric are:

  • Created and managed in Microsoft Purview
  • Defined at the tenant level by security or compliance administrators
  • Made available to Fabric and Power BI through tenant settings

Fabric users apply labels, but typically do not define them.

Items That Can Be Labeled in Microsoft Fabric

Sensitivity labels can be applied to many Fabric items, including:

  • Semantic models (datasets)
  • Reports
  • Dashboards
  • Dataflows
  • Lakehouses and Warehouses (where supported)
  • Exported artifacts (Excel, PowerPoint, PDF)

This makes labeling a cross-workload governance mechanism.

How Sensitivity Labels Are Applied

Labels can be applied:

  • Manually by item owners or authorized users
  • Automatically through inherited labeling
  • Programmatically via APIs (advanced scenarios)
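
For the programmatic path, here is a hedged sketch of bulk-applying a label with the Power BI admin REST API. The endpoint path, payload shape, item IDs, and label GUID are assumptions for illustration only; verify them against the official admin API reference before use.

```python
# Hedged sketch: bulk-applying a sensitivity label via the Power BI admin REST
# API. The endpoint path, payload shape, IDs, and label GUID are assumptions
# for illustration -- verify against the official API reference.
import requests

token = "<entra-id-access-token>"  # assumed: token with the required admin API permissions

url = "https://api.powerbi.com/v1.0/myorg/admin/informationprotection/setLabels"
payload = {
    "artifacts": {
        "datasets": [{"id": "11111111-1111-1111-1111-111111111111"}],
        "reports":  [{"id": "22222222-2222-2222-2222-222222222222"}],
    },
    # The label itself is defined centrally in Microsoft Purview; only its GUID
    # is referenced here.
    "labelId": "33333333-3333-3333-3333-333333333333",
}

resp = requests.post(url, json=payload,
                     headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
```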

Label Inheritance

In many cases:

  • Reports inherit the label from their underlying semantic model
  • Dashboards inherit labels from pinned tiles
  • Exported files inherit the label of the source item

This inheritance model is frequently tested in exam scenarios.

What Sensitivity Labels Do (and Do Not Do)

What they do:

  • Classify data for compliance and governance
  • Enable protection such as:
    • Encryption
    • Watermarking
    • Usage restrictions (e.g., block external sharing)
  • Travel with data when exported or shared

What they do NOT do:

  • Grant or restrict user access
  • Replace workspace, item-level, or data-level security
  • Filter rows or columns

Key exam distinction:
Sensitivity labels protect data after access is granted.

Sensitivity Labels vs Endorsements

These two concepts are often confused on exams.

Feature | Sensitivity Labels | Endorsements
Purpose | Data protection | Trust and quality
Enforced | Yes | No
Affects behavior | Yes (encryption, sharing rules) | No
Security-related | Yes | Governance guidance

Governance and Compliance Benefits

Sensitivity labels support:

  • Regulatory compliance (e.g., GDPR, HIPAA)
  • Data loss prevention (DLP)
  • Auditing and reporting
  • Consistent handling of sensitive data across platforms

They are especially important in environments with:

  • Self-service analytics
  • Data exports to Excel or PowerPoint
  • External sharing scenarios

Common Exam Scenarios

You may see questions such as:

  • A report exported to Excel must remain encrypted → sensitivity label
  • Data should be classified as confidential but still shared internally → labeling, not access restriction
  • Users can view data but cannot share externally → label-driven protection
  • A report automatically inherits its dataset’s classification → label inheritance

Best Practices to Remember

  • Apply labels at the semantic model level to ensure inheritance
  • Use sensitivity labels alongside:
    • Workspace and item-level access controls
    • RLS and CLS
    • Endorsements
  • Review labeling regularly to ensure accuracy
  • Educate users on selecting the correct label

Key Exam Takeaways

  • Sensitivity labels classify and protect data
  • They are defined in Microsoft Purview
  • Labels can enforce encryption and sharing restrictions
  • Labels do not control access
  • Inheritance behavior is important for DP-600 questions

Exam Tips

  • If a question focuses on classifying, protecting, or controlling how data is shared after access, think sensitivity labels.
  • If it focuses on who can see the data, think security roles or permissions.
  • Expect scenario questions involving:
    • PII, financial data, or confidential data
    • Export restrictions
    • Label inheritance
  • Know the difference between:
    • Security (RLS, OLS, item access)
    • Governance & compliance (sensitivity labels)
  • Always associate sensitivity labels with Microsoft Purview

Practice Questions

Question 1 (Single choice)

What is the PRIMARY purpose of applying sensitivity labels to items in Microsoft Fabric?

A. Improve query performance
B. Control row-level data access
C. Classify and protect data based on sensitivity
D. Grant workspace permissions

Correct Answer: C

Explanation:
Sensitivity labels are used for data classification, protection, and governance, not for performance or access control.


Question 2 (Scenario-based)

Your organization requires that all reports containing customer PII automatically display a watermark and restrict external sharing. What feature enables this?

A. Row-level security
B. Sensitivity labels with protection settings
C. Item-level access controls
D. Conditional access policies

Correct Answer: B

Explanation:
Sensitivity labels can apply visual markings, encryption, and sharing restrictions when integrated with Microsoft Purview.


Question 3 (Multi-select)

Which Fabric items can have sensitivity labels applied? (Select all that apply.)

A. Power BI reports
B. Semantic models
C. Lakehouses and warehouses
D. Notebooks

Correct Answers: A, B, C, D

Explanation:
Sensitivity labels can be applied to most Fabric artifacts, enabling consistent governance across analytics assets.


Question 4 (Scenario-based)

A semantic model inherits a sensitivity label from its underlying data source. What does this behavior represent?

A. Manual labeling
B. Label inheritance
C. Workspace-level labeling
D. Object-level security

Correct Answer: B

Explanation:
Label inheritance ensures that downstream artifacts maintain appropriate sensitivity classifications automatically.


Question 5 (Single choice)

Which service must be configured to define and manage sensitivity labels used in Microsoft Fabric?

A. Azure Active Directory
B. Microsoft Defender
C. Microsoft Purview
D. Power BI Admin portal

Correct Answer: C

Explanation:
Sensitivity labels are defined and managed in Microsoft Purview, then applied across Microsoft Fabric and Power BI.


Question 6 (Scenario-based)

A report is labeled Highly Confidential, but a user attempts to export its data to Excel. What is the expected behavior?

A. Export always succeeds
B. Export is blocked or encrypted based on label policy
C. Export ignores sensitivity labels
D. Only row-level security applies

Correct Answer: B

Explanation:
Sensitivity labels can restrict exports, apply encryption, or enforce protection based on policy.


Question 7 (Multi-select)

Which actions can sensitivity labels enforce? (Select all that apply.)

A. Data encryption
B. Watermarks and headers
C. External sharing restrictions
D. Row-level filtering

Correct Answers: A, B, C

Explanation:
Sensitivity labels control protection and compliance, not data filtering.


Question 8 (Scenario-based)

You apply a sensitivity label to a lakehouse. Which downstream artifact is MOST likely to automatically inherit the label?

A. A Power BI report built on the semantic model
B. A notebook in a different workspace
C. An external CSV export
D. An Azure SQL Database

Correct Answer: A

Explanation:
Label inheritance flows through Fabric analytics artifacts, especially semantic models and reports.


Question 9 (Single choice)

Who is typically allowed to apply or change sensitivity labels on Fabric items?

A. Any workspace Viewer
B. Only Microsoft admins
C. Users with sufficient item permissions
D. External users

Correct Answer: C

Explanation:
Users must have appropriate permissions (Contributor/Owner or item-level rights) to apply labels.


Question 10 (Scenario-based)

Your compliance team wants visibility into how sensitive data is used across Fabric. Which feature supports this requirement?

A. Query caching
B. Audit logs
C. Sensitivity labels with Purview reporting
D. Direct Lake mode

Correct Answer: C

Explanation:
Sensitivity labels integrate with Microsoft Purview reporting and auditing for compliance and governance tracking.


Endorse items in Microsoft Fabric

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Maintain a data analytics solution
--> Implement security and governance
--> Endorse items

To Do:
Complete the related module for this topic in the Microsoft Learn course: Secure data access in Microsoft Fabric

Item endorsement is a governance feature in Microsoft Fabric that helps organizations identify trusted, high-quality, and officially supported analytics assets. Endorsements guide users toward the right data and reports, reduce duplication, and promote consistent decision-making.

For the DP-600 exam, you should understand what endorsement is, the types of endorsements available, who can apply them, and how endorsements affect user behavior (not security).

What Does It Mean to Endorse an Item?

Endorsing an item signals to users that the content is:

  • Reliable
  • Well-maintained
  • Appropriate for reuse and decision-making

Endorsement is not a security mechanism. It does not grant or restrict access—it provides trust and visibility cues within the Fabric experience.

Endorsements can be applied to:

  • Semantic models (datasets)
  • Reports
  • Dataflows Gen2
  • Other supported Fabric items (Power BI dashboards are a notable exception and cannot be endorsed)

Types of Endorsements

Microsoft Fabric supports three endorsement states:

1. None

There is no endorsement on the content.

2. Promoted

Promoted items are considered:

  • Useful
  • Reviewed
  • Suitable for reuse

Key characteristics:

  • Any item owner can promote their own content
  • Indicates quality, but not official certification
  • Common for team-approved or department-level assets

Typical use cases

  • A curated dataset used by multiple analysts
  • A well-designed report shared across a department

3. Certified

Certified items represent the highest level of trust.

Key characteristics:

  • Only authorized users (often admins or designated certifiers) can certify
  • Indicates the item meets organizational standards for:
    • Data quality
    • Governance
    • Security
  • Intended for enterprise-wide consumption

Typical use cases

  • Official financial reporting datasets
  • Executive dashboards
  • Enterprise semantic models

Who Can Endorse Items?

  • Promoted: Item owners
  • Certified: Users authorized by Fabric or Power BI tenant settings (often admins or data stewards)

This distinction is important for the exam: not everyone can certify content, even if they own it.

Where Endorsements Appear

Endorsements are visible across the Fabric and Power BI experiences:

  • In search results
  • In lineage view
  • In the data hub
  • When users select data sources for report creation

Certified items are typically:

  • Ranked higher
  • More visible
  • Preferred in self-service analytics workflows

Endorsements vs Security Controls

A common exam trap is confusing endorsements with access control.

Feature | Endorsement | Access Control
Purpose | Trust and quality | Security and restriction
Limits access? | No | Yes
Affects visibility | Yes | Yes
Enforced by system | No (informational) | Yes (mandatory)

The “Make discoverable” setting

Within the endorsement settings dialog there is also a “Make discoverable” option. When selected, it allows users to discover the content even if they do not currently have access to it, so they can request access.

Summary of endorsement and discovery states

None
– What it is: No endorsement has been applied to the content.

Promoted
– What it is: The content is flagged as Promoted, meaning it is considered useful, reviewed, and suitable for reuse. It indicates quality, but not official certification, and is common for team-approved or department-level assets.
– Who can do it: Any item owner can promote their own content. No specific admin settings are required; users can promote as long as they have write permissions on the item (for example, a semantic model).
– Typical use cases: A curated dataset used by multiple analysts; a well-designed report shared across a department.

Certified
– What it is: The content is flagged as Certified, the highest level of trust. It indicates the item meets organizational standards for data quality, governance, and security, and is intended for enterprise-wide consumption.
– Who can do it: Only authorized users (often admins or designated certifiers); certification requires admin approval to set.
– Typical use cases: Official financial reporting datasets; executive dashboards; enterprise semantic models.

Make discoverable
– What it is: The content is flagged as discoverable. Discoverability can be scoped to selected users, the entire company, or everyone except selected users.
– Purpose: Makes the content visible even to users who do not currently have access, so they become aware it exists and can request access.

Key takeaway:
A user must still have workspace or item-level access to use an endorsed item.

Role of Endorsements in Governance

Endorsements support governance by:

  • Encouraging reuse of approved assets
  • Reducing “shadow BI”
  • Helping users choose the right data source
  • Aligning self-service analytics with enterprise standards

They are especially important in large Fabric environments with:

  • Many workspaces
  • Multiple datasets covering similar subject areas
  • Mixed technical and business users

Common Exam Scenarios

Expect questions such as:

  • When to use Promoted vs Certified
  • Who is allowed to certify an item
  • Whether certification affects access permissions (it does not)
  • How endorsements support discoverability and trust

Example scenario:

Business users are building reports from multiple datasets and need guidance on which one is authoritative.
Correct concept: Certified semantic models.

Best Practices to Remember

  • Promote items early to guide reuse
  • Reserve certification for high-value, governed assets
  • Combine endorsements with:
    • Clear workspace organization
    • Descriptions and documentation
    • Proper access controls
  • Review certifications periodically to ensure relevance

Key Exam Takeaways

  • Endorsements indicate trust, not permission
  • Two endorsement levels: Promoted and Certified
  • Certification requires special authorization
  • Endorsements improve discoverability and governance in Fabric

Final Exam Tips

  • If a question is about helping users identify trusted or official data, think endorsements.
  • If it’s about restricting access, think workspace, item-level, or data-level security.
  • Know the difference between Promoted and Certified
  • Expect scenario questions about:
    • Data trust
    • Self-service vs governed BI
    • Discoverability in Data hub
  • Remember:
    • Endorsements ≠ security
    • Endorsements ≠ performance tuning
  • Certification permissions are centrally controlled

Link to documentation on this topic: Endorse your content


Practice Questions


Question 1 (Single choice)

What is the PRIMARY purpose of endorsing items in Microsoft Fabric?

A. Improve dataset refresh performance
B. Control data access permissions
C. Identify trusted and authoritative content
D. Apply compliance policies

Correct Answer: C

Explanation:
Endorsements help users quickly identify reliable, trusted content such as official semantic models and reports.


Question 2 (Multi-select)

Which endorsement types are available in Microsoft Fabric? (Select all that apply.)

A. Certified
B. Promoted
C. Verified
D. Approved

Correct Answers: A, B

Explanation:
Fabric supports Promoted and Certified endorsements. “Verified” and “Approved” are not valid endorsement types.


Question 3 (Scenario-based)

A business analyst creates a report that is useful but not officially validated. What endorsement is MOST appropriate?

A. Certified
B. Promoted
C. Deprecated
D. Restricted

Correct Answer: B

Explanation:
Promoted indicates content that is useful and recommended, but not formally governed or validated.


Question 4 (Scenario-based)

Your organization wants only centrally governed semantic models to be marked as official sources of truth. Which endorsement should be used?

A. Promoted
B. Shared
C. Certified
D. Published

Correct Answer: C

Explanation:
Certified content represents authoritative, validated data assets approved by data owners or governance teams.


Question 5 (Single choice)

Who can typically certify an item in Microsoft Fabric?

A. Any workspace Member
B. Only the item creator
C. Users authorized by tenant or workspace settings
D. External users

Correct Answer: C

Explanation:
Certification is restricted and controlled by tenant-level or workspace-level governance policies.


Question 6 (Multi-select)

Which Fabric items can be endorsed? (Select all that apply.)

A. Semantic models
B. Reports
C. Dashboards
D. Dataflows Gen2

Correct Answers: A, B, D

Explanation:
Semantic models, reports, and Dataflows Gen2 can all be endorsed. Power BI dashboards are the notable exception and cannot be endorsed.


Question 7 (Scenario-based)

A user searches for datasets in the Data hub. How do endorsements help in this scenario?

A. They hide non-endorsed items
B. They improve query performance
C. They help users identify trusted content
D. They automatically grant access

Correct Answer: C

Explanation:
Endorsements improve discoverability and trust, not access or performance.


Question 8 (Single choice)

What is the relationship between endorsements and security?

A. Endorsements enforce access controls
B. Endorsements replace RLS
C. Endorsements are independent of security
D. Endorsements automatically grant read access

Correct Answer: C

Explanation:
Endorsements do not control access. Security must be handled separately via permissions and access controls.


Question 9 (Scenario-based)

Your organization wants users to prefer centrally curated datasets without blocking self-service models. What approach BEST supports this?

A. Apply row-level security
B. Disable dataset creation
C. Certify governed datasets
D. Use Direct Lake mode

Correct Answer: C

Explanation:
Certifying official datasets encourages reuse while still allowing self-service analytics.


Question 10 (Fill in the blank)

In Microsoft Fabric, ________ items represent fully validated and authoritative content, while ________ items indicate recommended but not formally governed content.

Correct Answer:
Certified, Promoted

Explanation:
Certified = authoritative source of truth
Promoted = useful and recommended, but not governed


Configure version control for a workspace in Microsoft Fabric

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Maintain a data analytics solution
--> Maintain the analytics development lifecycle
--> Configure version control for a workspace

Version control in Microsoft Fabric enables teams to track changes, collaborate safely, and manage the lifecycle of analytics assets using source control practices. Fabric integrates workspace items with Git repositories, bringing DevOps discipline to analytics development.

For the DP-600 exam, you should understand how Git integration works in Fabric, what items are supported, how changes flow, and common governance scenarios.

What Is Workspace Version Control in Fabric?

Workspace version control allows you to:

  • Connect a Fabric workspace to a Git repository
  • Store item definitions as code artifacts
  • Track changes through commits, branches, and pull requests
  • Support collaborative and auditable development

This capability is often referred to as Git integration for Fabric workspaces.

Supported Source Control Platforms

Microsoft Fabric supports Git integration with:

  • Azure DevOps (ADO) Git repositories
  • GitHub repositories

Key points:

  • Azure DevOps has the longest-standing support; GitHub support is newer (exam questions typically reference Azure DevOps)
  • Repositories must already exist
  • Authentication to Azure DevOps is handled via Microsoft Entra ID

Exam note: Expect Azure DevOps to be the default answer unless stated otherwise.

What Items Can Be Version Controlled?

Common Fabric items that support version control include:

  • Semantic models
  • Reports
  • Lakehouses
  • Warehouses
  • Notebooks
  • Data pipelines
  • Dataflows Gen2

Items are serialized into files and folders in the Git repo, allowing:

  • Diffing
  • History tracking
  • Rollbacks

How to Configure Version Control for a Workspace

At a high level, the process is:

  1. Open the Fabric workspace settings
  2. Enable Git integration
  3. Select:
    • Azure DevOps organization
    • Project
    • Repository
    • Branch
  4. Choose a workspace folder structure
  5. Initialize synchronization

Once configured:

  • Workspace changes can be committed to Git
  • Repo changes can be synced back into the workspace
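
The same connection can also be made programmatically. Below is a hedged sketch using the Fabric REST API's Git integration Connect operation; the endpoint path and payload fields reflect my understanding of that API and should be verified against the Fabric REST API reference. The workspace ID, Azure DevOps names, and `token` (an Entra ID access token with rights on both the workspace and the repo) are placeholders.

```python
# Hedged sketch: connecting a workspace to an Azure DevOps repo via the Fabric
# REST API instead of the portal UI. Endpoint path and payload fields should be
# verified against the official Git integration API reference; all names and
# IDs below are placeholders.
import requests

token = "<entra-id-access-token>"   # assumed: token with workspace + repo rights
workspace_id = "<workspace-guid>"   # the workspace to connect

url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/git/connect"
payload = {
    "gitProviderDetails": {
        "gitProviderType": "AzureDevOps",
        "organizationName": "contoso",
        "projectName": "Analytics",
        "repositoryName": "fabric-items",
        "branchName": "dev",
        "directoryName": "/",   # folder in the repo that holds workspace items
    }
}

resp = requests.post(url, json=payload,
                     headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
```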

How Changes Flow Between Workspace and Git

From Workspace to Git

  • Users make changes in Fabric (e.g., update a report)
  • Changes are committed to the connected branch
  • Commit history tracks who changed what and when

From Git to Workspace

  • Changes merged into the branch can be pulled into Fabric
  • Enables controlled deployment across environments

Important exam concept:
Synchronization is not automatic—users must explicitly commit and sync.
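
To underline that point, the hedged sketch below expresses the workspace-to-Git direction as an explicit API call. The operation name (commitToGit) and payload are assumptions to verify against the Fabric REST API reference; `token` and `workspace_id` follow the earlier sketch. Nothing moves between the workspace and the repo until a call like this, or the equivalent button in the workspace, is triggered.

```python
# Hedged sketch: explicitly committing pending workspace changes to the
# connected Git branch. Operation name and payload are assumptions to verify
# against the Fabric REST API reference.
import requests

token = "<entra-id-access-token>"
workspace_id = "<workspace-guid>"

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/git/commitToGit",
    json={"mode": "All", "comment": "Update semantic model definitions"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# The reverse direction (Git -> workspace) is a separate, equally explicit
# operation (updateFromGit), typically driven by the current git/status result.
```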

Branching and Environment Strategy

A common lifecycle pattern:

  • Development workspace → linked to a dev branch
  • Test workspace → linked to a test branch
  • Production workspace → linked to a main branch

This supports:

  • Code reviews
  • Pull requests
  • Controlled promotion of changes

Permissions and Governance Considerations

To configure and use version control:

  • Users need sufficient workspace permissions (typically Admin or Member)
  • Users also need Git repository access
  • Git permissions are managed outside Fabric

Version control complements—but does not replace:

  • Workspace-level access controls
  • Item-level permissions
  • Endorsements and sensitivity labels

Benefits of Version Control in Fabric

Version control enables:

  • Collaboration among multiple developers
  • Change traceability and auditability
  • Rollback of problematic changes
  • CI/CD-style deployment patterns
  • Alignment with enterprise DevOps practices

These benefits are a frequent theme in DP-600 scenario questions.

Common Exam Scenarios

You may be asked to:

  • Identify when Git integration is appropriate
  • Choose the correct platform for source control
  • Understand how changes move between Git and Fabric
  • Design a dev/test/prod workspace strategy
  • Troubleshoot why changes are not reflected (sync not performed)

Example:

Multiple developers need to work on the same semantic model with change tracking.
Correct concept: Configure workspace version control with Git.

Key Exam Takeaways

  • Fabric supports Git-based version control at the workspace level
  • Azure DevOps is the primary supported platform
  • Changes require explicit commit and sync
  • Version control supports structured development and deployment
  • It is a core part of the analytics development lifecycle

Exam Tips

  • If a question mentions tracking changes, collaboration, rollback, or DevOps practices, think workspace version control with Git.
  • If it mentions moving changes between environments, think branches and multiple workspaces.
  • Know who can configure it → Workspace Admins
  • Understand Git integration flow
  • Expect scenario questions comparing:
    • Git vs deployment pipelines
    • Collaboration vs governance
  • Remember:
    • JSON-based artifacts
    • Not all items are supported
    • No automatic commits

Practice Questions

Question 1 (Single choice)

What is the PRIMARY purpose of configuring version control for a Fabric workspace?

A. Improve query execution performance
B. Enable collaboration, change tracking, and rollback
C. Enforce row-level security
D. Automatically deploy content to production

Correct Answer: B

Explanation:
Version control enables source control integration, allowing teams to track changes, collaborate safely, and roll back when needed.


Question 2 (Multi-select)

Which version control systems can be integrated with Microsoft Fabric workspaces? (Select all that apply.)

A. Azure DevOps Git repositories
B. GitHub repositories
C. OneDrive for Business
D. SharePoint document libraries

Correct Answers: A, B

Explanation:
Fabric supports Git integration using Azure DevOps and GitHub. OneDrive and SharePoint are not supported for workspace version control.


Question 3 (Scenario-based)

A team wants to manage Power BI reports, semantic models, and dataflows using pull requests and branching. What should they configure?

A. Deployment pipelines
B. Sensitivity labels
C. Workspace version control with Git
D. Incremental refresh

Correct Answer: C

Explanation:
Git-based workspace version control enables branching, pull requests, and code reviews.


Question 4 (Single choice)

Which workspace role is REQUIRED to configure version control for a workspace?

A. Viewer
B. Contributor
C. Member
D. Admin

Correct Answer: D

Explanation:
Only workspace Admins can connect a workspace to a Git repository.


Question 5 (Scenario-based)

After connecting a workspace to a Git repository, where are Fabric items stored?

A. As binary files
B. As JSON-based artifact definitions
C. As SQL scripts
D. As Excel files

Correct Answer: B

Explanation:
Fabric artifacts are stored as JSON files, making them suitable for source control and comparison.


Question 6 (Multi-select)

Which items can be included in workspace version control? (Select all that apply.)

A. Reports
B. Semantic models
C. Dataflows Gen2
D. Dashboards

Correct Answers: A, B, C

Explanation:
Reports, semantic models, and Dataflows Gen2 are supported for Git integration, while dashboards are not.


Question 7 (Scenario-based)

A developer modifies a semantic model directly in the Fabric workspace while Git integration is enabled. What happens NEXT?

A. The change is automatically committed
B. The change is rejected
C. The workspace shows uncommitted changes
D. The change is immediately deployed to production

Correct Answer: C

Explanation:
Changes made in the workspace appear as pending/uncommitted changes until explicitly committed to the repository.


Question 8 (Single choice)

What is the relationship between workspace version control and deployment pipelines?

A. They are the same feature
B. Version control replaces deployment pipelines
C. They complement each other
D. Deployment pipelines require version control

Correct Answer: C

Explanation:
Version control handles source management, while deployment pipelines manage promotion across environments.


Question 9 (Scenario-based)

Your organization wants to prevent accidental overwrites when multiple developers edit the same item. Which feature BEST helps?

A. Row-level security
B. Sensitivity labels
C. Git branching and pull requests
D. Incremental refresh

Correct Answer: C

Explanation:
Git workflows enable controlled collaboration through branches, reviews, and merges.


Question 10 (Fill in the blank)

When version control is enabled, Fabric workspace changes must be ________ to the repository and ________ to update the workspace from Git.

Correct Answer:
Committed, synced (or pulled)

Explanation:
Changes flow both ways:

  • Commit workspace → Git
  • Sync Git → workspace