Tag: Dataflows Gen2

DP-700, Microsoft Certification, Microsoft Fabric June 3, 2026

Identify and resolve Dataflow Gen2 errors (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Monitor and optimize an analytics solution (30–35%)
   --> Identify and resolve errors
      --> Identify and resolve Dataflow Gen2 errors

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Overview

Dataflow Gen2 is a powerful data ingestion and transformation service in Microsoft Fabric that enables data engineers and analysts to perform Extract, Transform, and Load (ETL) operations using a low-code, visual interface based on Power Query. Dataflow Gen2 supports hundreds of data sources and can load data into destinations such as Lakehouses, Warehouses, KQL Databases, and other Fabric items.

Because Dataflow Gen2 is often used to prepare and transform data before it reaches analytical solutions, failures can have significant downstream impacts. For the DP-700 exam, candidates must understand how to identify, troubleshoot, and resolve Dataflow Gen2 errors, interpret refresh results, analyze execution details, and implement practices that reduce operational issues.

Understanding Dataflow Gen2 Execution

A Dataflow Gen2 execution consists of several stages:

Source connection
Data extraction
Query transformation
Data validation
Data loading
Refresh completion

Errors can occur at any stage of this process.

Unlike pipelines, where multiple activities execute sequentially, Dataflow Gen2 refreshes are generally focused on executing Power Query transformations and loading results into destination systems.

Successful troubleshooting requires identifying which stage failed.

Common Categories of Dataflow Gen2 Errors

Connection Errors

Connection failures occur when Dataflow Gen2 cannot access the source system.

Common causes include:

Invalid credentials
Expired passwords
Revoked access
Incorrect connection strings
Network issues
Unsupported authentication methods

Example:

A Dataflow Gen2 refresh connects to Azure SQL Database using a username whose password has expired.

Result:

The refresh fails before any data is retrieved.

Typical troubleshooting steps:

Verify credentials.
Test connectivity.
Reauthenticate the connection.
Confirm source availability.

Authentication and Authorization Errors

Authentication confirms identity.

Authorization confirms permissions.

Common examples:

Missing database permissions
Insufficient workspace permissions
Revoked service account access
Missing OneLake permissions

Example error:

“Access denied while attempting to access source.”

Resolution:

Verify user permissions and security roles on both source and destination systems.

Source Schema Changes

Schema drift occurs when source structures change unexpectedly.

Examples include:

Columns removed
Columns renamed
Data types modified
New columns added

Example:

A transformation references a column named CustomerStatus.

A source application update renames the column to Status.

Result:

The transformation step fails.

Resolution:

Update transformation logic to reflect the new schema.

Power Query Transformation Errors

Many Dataflow Gen2 failures occur during transformation processing.

Missing Column Errors

Example:

A step attempts to select a column that no longer exists.

Error:

“Column not found.”

Resolution:

Review source schema and update transformation steps.

Data Type Conversion Errors

Example:

A text value such as “ABC123” is converted to a Whole Number data type.

Result:

Conversion failure.

Resolution:

Validate source data.
Clean data before conversion.
Use error handling logic.

Invalid Formula Errors

Power Query transformations use M language behind the scenes.

Example:

A custom column contains an incorrect expression.

Result:

Refresh failure.

Resolution:

Review and correct the transformation formula.

Reference Errors

Queries may reference:

Other queries
Parameters
Functions

If referenced objects are deleted or renamed, failures occur.

Resolution:

Validate dependencies within the dataflow.

Destination Errors

Errors may occur after transformations complete successfully.

Lakehouse Write Failures

Examples:

Missing destination table
Permission issues
Storage limitations
Schema mismatch

Resolution:

Verify table structure and permissions.

Warehouse Loading Errors

Examples:

Unsupported data types
Primary key violations
Schema conflicts

Resolution:

Validate compatibility between transformed data and destination schema.

KQL Database Loading Errors

Examples:

Incorrect mappings
Unsupported formats
Data ingestion policy issues

Resolution:

Review destination configuration and ingestion mappings.

Refresh Failures

Dataflow Gen2 refresh operations generate execution information that should be examined whenever failures occur.

Refresh details often provide:

Failure stage
Error messages
Execution duration
Rows processed
Destination information

For troubleshooting, refresh history is usually the first place to investigate.

Monitoring Dataflow Gen2 Refreshes

Refresh History

Refresh history provides information about:

Successes
Failures
Start times
End times
Refresh duration

Engineers should review failed refreshes immediately after errors occur.

Detailed Error Messages

Refresh details often contain:

Error codes
Source system messages
Transformation failures
Destination loading issues

Always review the detailed error rather than relying solely on the refresh status.

Example:

Generic message:

“Refresh failed.”

Detailed message:

“Cannot convert value ‘N/A’ to Whole Number.”

The detailed error immediately identifies the issue.

Dataflow Monitoring in Fabric

Fabric monitoring tools can help identify:

Failed refreshes
Long-running refreshes
Capacity-related issues
Destination write failures

Monitoring trends over time can reveal recurring problems.

Troubleshooting Common Dataflow Gen2 Errors

Scenario 1: Source Authentication Failure

Symptoms:

Refresh fails immediately.
No records processed.

Investigation:

Verify credentials.
Test source access.
Reauthenticate the connection.
Confirm account permissions.

Resolution:

Update credentials or restore permissions.

Scenario 2: Missing Column Error

Symptoms:

Refresh fails during transformation.
Error references a missing field.

Investigation:

Review source schema.
Compare against transformation steps.
Identify renamed or deleted columns.

Resolution:

Modify transformation logic.

Scenario 3: Data Type Conversion Failure

Symptoms:

Refresh stops during transformation.

Example:

A column contains:

The query attempts to convert the column to numeric values.

Resolution:

Clean invalid values.
Replace errors.
Filter problematic records.

Scenario 4: Destination Table Failure

Symptoms:

Transformations succeed.
Loading fails.

Investigation:

Verify destination exists.
Validate permissions.
Review destination schema.

Resolution:

Correct schema or permission issues.

Scenario 5: Long-Running Refresh

Symptoms:

Refresh takes significantly longer than expected.

Possible causes:

Large data volume
Complex transformations
Source system bottlenecks
Capacity constraints

Resolution:

Optimize transformations and reduce unnecessary processing.

Using Query Diagnostics

Power Query provides diagnostic capabilities that can help identify:

Expensive transformation steps
Slow source queries
Bottlenecks during execution

Query diagnostics are particularly useful when refreshes succeed but perform poorly.

Areas to investigate include:

Excessive row operations
Repeated transformations
Non-folding queries

Query Folding and Error Prevention

What is Query Folding?

Query folding occurs when transformations are pushed back to the source system.

Instead of processing data inside Fabric:

The source executes filtering.
The source performs aggregations.
The source reduces result sets.

Benefits:

Faster refreshes
Reduced resource consumption
Lower failure risk

How Query Folding Affects Troubleshooting

Poor query folding can lead to:

Excessive processing
Memory consumption
Long refresh durations

When troubleshooting performance-related refresh issues, query folding should be evaluated.

Capacity-Related Errors

Dataflow Gen2 consumes Fabric compute resources.

Potential issues include:

High concurrency
Capacity throttling
Resource contention

Symptoms:

Slow refreshes
Intermittent failures
Unexpected cancellations

Monitoring Fabric capacity metrics can help identify these issues.

Best Practices for Preventing Dataflow Gen2 Errors

Validate Source Schemas

Regularly review source structures.

This helps detect schema drift before failures occur.

Use Defensive Transformations

Handle unexpected values through:

Null handling
Error replacement
Data validation

This improves refresh reliability.

Minimize Complex Transformations

Perform only necessary transformations.

Simpler dataflows are easier to maintain and troubleshoot.

Monitor Refresh History

Review failures and performance trends regularly.

Early detection reduces operational impact.

Test After Source Changes

Whenever source applications are modified:

Validate schemas.
Test refreshes.
Confirm transformation logic.

Optimize Query Folding

Push processing to source systems whenever possible.

This reduces execution times and resource consumption.

Document Dependencies

Track:

Source systems
Queries
Parameters
Destination tables

Documentation simplifies troubleshooting.

DP-700 Exam Tips

For the exam, remember:

Most Dataflow Gen2 troubleshooting begins with refresh history.
Source schema changes are a common cause of refresh failures.
Data type conversion errors frequently occur during transformations.
Destination errors can occur even when transformations succeed.
Query folding significantly affects performance and reliability.
Detailed error messages provide more value than high-level failure notifications.
Authentication and authorization issues are common root causes.
Capacity constraints can impact refresh performance.
Missing columns and renamed fields often cause transformation failures.
Monitoring refresh history is a core operational responsibility for Fabric data engineers.

Practice Exam Questions

Question 1

A Dataflow Gen2 refresh fails with the error:

“Column ‘CustomerType’ was not found.”

What is the most likely cause?

A. Destination table permissions were removed.

B. Fabric capacity is overloaded.

C. A source schema change occurred.

D. Query folding is disabled.

Correct Answer: C

Explanation:

The transformation references a column that no longer exists or has been renamed.

A would generate authorization errors.
C would typically cause performance or resource issues.
D affects performance rather than column existence.

Question 2

A Dataflow Gen2 refresh fails immediately after starting and processes zero rows.

Which issue is most likely?

A. Authentication failure

B. Query folding problem

C. Aggregation error

D. Destination schema mismatch

Correct Answer: A

Explanation:

Authentication issues generally prevent data retrieval from beginning.

B affects execution efficiency.
C occurs during transformation.
D typically appears later during loading.

Question 3

A data engineer wants to determine exactly why a Dataflow Gen2 refresh failed.

What should they examine first?

A. Fabric capacity metrics

B. Lakehouse storage statistics

C. Refresh history details

D. Workspace role assignments

Correct Answer: C

Explanation:

Refresh history contains detailed execution information and error messages.

A and B may be useful later.
D should only be investigated if permissions are suspected.

Question 4

A Dataflow Gen2 transformation attempts to convert a text value of “Unknown” into a Whole Number.

What type of error will occur?

A. Data type conversion error

B. Capacity error

C. Query dependency error

D. Authentication error

Correct Answer: A

Explanation:

Text values that cannot be converted to numeric formats generate conversion failures.

A, B, and C are unrelated to data conversion.

Question 5

Which capability pushes transformation processing back to the source system whenever possible?

A. Data validation

B. Query folding

C. Incremental refresh

D. Parameterization

Correct Answer: B

Explanation:

Query folding allows supported transformations to execute on the source system.

A validates data.
C limits refresh scope.
D provides dynamic values.

Question 6

Transformations complete successfully, but data cannot be written to the destination Warehouse.

Which category of issue is most likely?

A. Destination loading error

B. Source connectivity failure

C. Missing source table

D. Query folding issue

Correct Answer: A

Explanation:

If transformations finish successfully, the failure likely occurs during the loading phase.

B and C would occur earlier.
D typically affects performance.

Question 7

A data engineer wants to reduce failures caused by unexpected values in source data.

Which approach is best?

A. Increase capacity size

B. Disable query folding

C. Use defensive transformations and error handling

D. Reduce refresh frequency

Correct Answer: C

Explanation:

Handling nulls, invalid values, and conversion errors proactively improves reliability.

A may help performance but not data quality.
B often reduces efficiency.
D does not address the root cause.

Question 8

Which issue commonly results from source schema drift?

A. Missing or renamed columns

B. Capacity throttling

C. Refresh scheduling conflicts

D. Workspace role inheritance

Correct Answer: A

Explanation:

Schema drift occurs when source structures change unexpectedly.

B, C, and D are unrelated.

Question 9

A refresh suddenly begins taking twice as long as usual without failing.

Which tool would be most useful for identifying expensive transformation steps?

A. Workspace permissions

B. Query diagnostics

C. Tenant settings

D. Dataflow ownership settings

Correct Answer: B

Explanation:

Query diagnostics help identify bottlenecks and inefficient transformations.

A, C, and D do not provide execution analysis.

Question 10

Which best practice helps prevent Dataflow Gen2 failures after application updates modify source tables?

A. Disable refresh schedules temporarily

B. Increase concurrency limits

C. Recreate all dataflows monthly

D. Validate source schemas and test refreshes after changes

Correct Answer: D

Explanation:

Testing after source changes helps identify schema drift and compatibility issues before production failures occur.

A is reactive rather than preventive.
B does not address schema changes.
C is unnecessary and inefficient.

Go to the DP-700 Exam Prep Hub main page.

DP-700, Microsoft Certification, Microsoft Fabric June 3, 2026

Configure Dataflows Gen2 workspace settings (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Implement and manage an analytics solution (30–35%)
   --> Configure Microsoft Fabric workspace settings
      --> Configure Dataflows Gen2 workspace settings

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Dataflows Gen2 are a core component of Microsoft Fabric’s data ingestion and transformation capabilities. They provide a low-code/no-code method for extracting, transforming, and loading (ETL) data into Fabric destinations such as Lakehouses, Warehouses, and other analytics assets.

For the DP-700 exam, it is important to understand not only how to create Dataflows Gen2, but also how workspace settings affect their operation, governance, security, performance, and administration.

Workspace-level settings help administrators establish standards and controls for how Dataflows Gen2 are used within a Fabric environment. Understanding these settings enables data engineers to create scalable, maintainable, and governed data integration solutions.

What Are Dataflows Gen2?

Dataflows Gen2 are cloud-based data transformation solutions built on Power Query technology.

They allow users to:

Connect to data sources
Clean and transform data
Combine multiple datasets
Perform data quality operations
Load data into Fabric destinations

Unlike notebooks or Spark jobs that require coding skills, Dataflows Gen2 provide a graphical interface for data preparation.

Common use cases include:

Data ingestion
Data cleansing
Dimension table creation
Data enrichment
ETL and ELT workflows
Self-service data preparation

Dataflows Gen2 Architecture

A typical Dataflow Gen2 process consists of:

			
Data Source
     ↓
Power Query Transformations
     ↓
Dataflow Gen2
     ↓
Destination

		

Possible destinations include:

Lakehouse Tables
Warehouse Tables
Azure SQL Database
Other supported Fabric destinations

Why Workspace Settings Matter

In small environments, Dataflows Gen2 can be managed individually.

However, in enterprise environments, administrators need centralized control over:

Dataflow creation
Dataflow execution
Compute usage
Security
Data destinations
Governance

Workspace settings help establish consistent behavior across all Dataflows Gen2 within a workspace.

Dataflows Gen2 Workspace Administration

Workspace administrators control who can:

Create Dataflows Gen2
Modify Dataflows Gen2
Schedule refreshes
Access source data
Access destinations

These permissions are governed through Fabric workspace roles.

Workspace Role	Dataflow Capability
Admin	Full control
Member	Create and manage
Contributor	Create and edit
Viewer	Read-only

DP-700 Exam Tip

Remember that Dataflows Gen2 do not have a separate security model.

They inherit Fabric workspace permissions.

Configure Dataflow Creation Permissions

Organizations often restrict who can create Dataflows Gen2.

Reasons include:

Governance requirements
Cost management
Data quality controls
Standardization

A common enterprise pattern is:

Contributors create Dataflows
Members manage Dataflows
Admins govern Dataflows

This prevents uncontrolled proliferation of ETL processes.

Configure Data Destinations

One of the most important Dataflows Gen2 settings involves destination configuration.

Supported destinations include:

Lakehouse

The most common destination.

Benefits:

Delta table storage
Integration with Spark
Medallion architecture support

Common usage:

Bronze layer ingestion
Silver layer transformation

Warehouse

Dataflows can load directly into Fabric Warehouses.

Benefits:

Structured analytics
SQL querying
Dimensional modeling support

Multiple Destinations

Dataflows Gen2 support loading data into multiple destinations from a single transformation pipeline.

Benefits include:

Reduced duplication of transformation logic
Improved maintainability
Consistent outputs

Configure Refresh Settings

Refresh configuration is one of the most frequently tested Dataflow topics.

Refresh settings determine:

When Dataflows execute
How often they run
How data is updated

Options include:

Manual Refresh

Execution occurs only when initiated by a user.

Best for:

Testing
Development
Small workloads

Scheduled Refresh

Execution occurs automatically based on a defined schedule.

Examples:

Hourly
Daily
Weekly

Most production Dataflows use scheduled refresh.

Pipeline-Orchestrated Refresh

Dataflows can be executed through Fabric Data Factory pipelines.

Benefits:

End-to-end orchestration
Dependency management
Complex workflow support

This is commonly used in enterprise ETL solutions.

Refresh Failure Notifications

Administrators can configure monitoring and notifications for refresh failures.

Benefits:

Faster troubleshooting
Improved reliability
Reduced downtime

Monitoring is particularly important when Dataflows support business-critical reporting systems.

Configure Data Source Credentials

Dataflows require access credentials for source systems.

Supported authentication methods vary by connector and may include:

Organizational account
OAuth
Basic authentication
Service principals
API keys

Workspace administrators often establish governance policies around credential management.

Best Practice

Use service accounts or service principals whenever possible for production workloads.

This avoids refresh failures caused by employee account changes.

Configure Gateway Usage

Some data sources reside inside private corporate networks.

Examples:

On-premises SQL Server
Oracle databases
File shares

In these scenarios, Dataflows Gen2 may require an On-Premises Data Gateway.

Gateway settings determine:

Connectivity
Authentication
Data access paths

A common DP-700 scenario involves selecting a gateway for on-premises data access.

Dataflow Compute and Performance Considerations

Dataflows Gen2 execute within Fabric-managed infrastructure.

Administrators should understand factors that impact performance:

Data Volume

Larger datasets increase:

Refresh duration
Resource consumption

Transformation Complexity

Operations such as:

Merges
Joins
Group By
Aggregations

increase processing requirements.

Number of Refreshes

Frequent refresh schedules can consume additional capacity resources.

Administrators should balance:

Data freshness
Capacity utilization

Dataflow Lineage and Impact Analysis

Fabric automatically captures lineage information.

Administrators can view:

			
Source
 ↓
Dataflow Gen2
 ↓
Lakehouse
 ↓
Semantic Model
 ↓
Report

		

Benefits include:

Impact analysis
Dependency tracking
Governance visibility

Lineage is an important governance feature frequently associated with Dataflows.

Dataflow Monitoring

Workspace administrators can monitor:

Refresh history
Success rates
Failure messages
Duration metrics

Monitoring tools include:

Refresh history
Monitoring Hub
Fabric capacity metrics

Common Troubleshooting Areas

Credential failures
Gateway connectivity issues
Schema changes
Destination write failures
Capacity limitations

Dataflow Governance Best Practices

Standardize Naming Conventions

Example:

			
DF_Bronze_Customer_Ingestion
DF_Silver_Sales_Transform
DF_Gold_Product_Aggregation

Consistent naming improves maintainability.

Use Scheduled Refresh Sparingly

Avoid unnecessary refresh frequency.

Example:

Do not refresh every 15 minutes if daily updates are sufficient.

Implement Service Principals

Reduce dependency on individual user accounts.

Leverage Lineage Views

Monitor downstream dependencies before making changes.

Align with Medallion Architecture

Use Dataflows strategically within:

Bronze Layer
Silver Layer
Gold Layer

Common DP-700 Exam Scenarios

Scenario 1

A Dataflow must load data from an on-premises SQL Server.

Solution:

Configure an On-Premises Data Gateway.

Scenario 2

A Dataflow should execute only after a source ingestion process completes.

Solution:

Use a Data Factory pipeline to orchestrate execution.

Scenario 3

A Dataflow should load transformed data into a Lakehouse for downstream Spark processing.

Solution:

Configure the Lakehouse as the destination.

DP-700 Exam Focus Areas

You should understand:

✓ Dataflows Gen2 architecture

✓ Workspace permissions

✓ Dataflow creation governance

✓ Data destinations

✓ Refresh scheduling

✓ Pipeline orchestration

✓ Credential management

✓ Gateway configuration

✓ Monitoring and troubleshooting

✓ Lineage and impact analysis

✓ Performance considerations

10 Practice Exam Questions

Question 1

Which technology provides the transformation engine used by Dataflows Gen2?

A. Power Query

B. Apache Spark

C. Kusto Query Language (KQL)

D. T-SQL

Answer: A

Explanation

Dataflows Gen2 use Power Query as their transformation engine, providing a low-code interface for data preparation.

Question 2

A Dataflow Gen2 needs to access an on-premises SQL Server database.

What must be configured?

A. Eventstream

B. Data Activator

C. On-Premises Data Gateway

D. OneLake Shortcut

Answer: C

Explanation

An On-Premises Data Gateway enables Fabric services to securely access data sources located inside private networks.

Question 3

Which destination is most commonly used for storing Dataflow Gen2 outputs within a medallion architecture?

A. Semantic Model

B. Dashboard

C. Notebook

D. Lakehouse

Answer: D

Explanation

Lakehouses are commonly used as Bronze, Silver, and Gold layers within Fabric medallion architectures.

Question 4

What is the primary advantage of scheduled refresh?

A. Eliminates authentication requirements

B. Automatically updates data without manual intervention

C. Increases storage capacity

D. Creates backup copies of source systems

Answer: B

Explanation

Scheduled refresh ensures that data remains current without requiring users to manually run the Dataflow.

Question 5

Which Fabric feature can orchestrate Dataflow Gen2 execution as part of a larger workflow?

A. Data Factory Pipeline

B. Lakehouse Explorer

C. Monitoring Hub

D. OneLake File Explorer

Answer: A

Explanation

Data Factory pipelines provide orchestration, dependency management, and scheduling capabilities.

Question 6

What information can lineage views provide?

A. Network bandwidth consumption

B. Spark executor utilization

C. Upstream and downstream dependencies

D. Gateway installation logs

Answer: C

Explanation

Lineage views show how data moves between sources, Dataflows, Lakehouses, semantic models, and reports.

Question 7

Which workspace role has full administrative control over Dataflows Gen2?

A. Viewer

B. Contributor

C. Member

D. Admin

Answer: D

Explanation

Workspace Admins have complete control over all workspace items, including Dataflows Gen2.

Question 8

A company wants to minimize production refresh failures caused by employee account changes.

What is the recommended approach?

A. Increase refresh frequency

B. Use service principals or service accounts

C. Disable scheduled refresh

D. Use Viewer permissions

Answer: B

Explanation

Service principals provide stable authentication that is not tied to individual users.

Question 9

Which factor is most likely to increase Dataflow refresh duration?

A. Smaller datasets

B. Reduced transformations

C. Complex joins and aggregations

D. Fewer destination tables

Answer: C

Explanation

Complex transformation logic increases processing requirements and refresh times.

Question 10

What is the primary purpose of Dataflow monitoring?

A. Create semantic models

B. Manage workspace domains

C. Configure Spark runtimes

D. Identify refresh failures and performance issues

Answer: D

Explanation

Monitoring helps administrators detect failures, troubleshoot issues, and optimize performance.

Final Exam Tip

For DP-700, Dataflows Gen2 questions typically focus on data ingestion, destinations, refresh management, gateways, orchestration, and governance. When evaluating exam scenarios, remember that Dataflows Gen2 are designed to provide a low-code ETL experience using Power Query, while Fabric Pipelines provide orchestration and Lakehouses commonly serve as the destination within modern medallion architectures.

Go to the DP-700 Exam Prep Hub main page.

DP-700, Microsoft Certification, Microsoft Fabric June 3, 2026

Choose Between Dataflows Gen2, Notebooks, KQL, and T-SQL for data transformation (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform batch data
      --> Choose Between Dataflows Gen2, Notebooks, KQL, and T-SQL for data transformation

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Microsoft Fabric provides multiple technologies for transforming data. One of the most common challenges for a Data Engineer is determining which transformation tool is best suited for a specific business requirement.

The DP-700 exam frequently tests your ability to select the appropriate transformation technology based on:

Data volume
Data complexity
Required programming skills
Data source type
Performance requirements
Real-time versus batch processing needs
User expertise
Maintainability

The four most important transformation technologies covered in the exam are:

Dataflows Gen2
Notebooks
KQL
T-SQL

Although all four can transform data, they are optimized for different workloads and use cases.

Understanding their strengths, limitations, and ideal scenarios is critical for success on the DP-700 exam.

Overview of Transformation Technologies

Technology	Primary Purpose	Best For
Dataflows Gen2	Low-code ETL	Business-friendly transformations
Notebooks	Advanced engineering and Spark processing	Large-scale data engineering
T-SQL	Relational transformations	Warehouses and SQL workloads
KQL	Real-time analytics and telemetry processing	Logs and streaming data

Dataflows Gen2

What Are Dataflows Gen2?

Dataflows Gen2 are low-code data transformation tools within Microsoft Fabric that use Power Query.

They allow users to:

Connect to data sources
Clean data
Transform data
Load data into Fabric destinations

without writing significant amounts of code.

Transformation Engine

Dataflows Gen2 use:

Power Query
M Language (behind the scenes)

Most transformations are performed through a graphical interface.

Typical Transformations

Examples include:

Renaming columns
Removing duplicates
Filtering rows
Merging datasets
Splitting columns
Data type conversions
Calculated columns

When to Use Dataflows Gen2

Choose Dataflows Gen2 when:

Low-code development is desired
Data volumes are moderate
Business analysts participate in development
Transformations are relatively straightforward
Self-service data preparation is required

Examples:

Preparing Excel data
Cleaning CSV files
Combining multiple business datasets
Standard ETL processes

Advantages

Low-Code Experience

Minimal coding required.

Large Connector Library

Supports numerous source systems.

Easy Maintenance

Visual transformation steps are easier to understand.

Integration with Fabric

Loads directly into:

Lakehouses
Warehouses
Other Fabric destinations

Limitations

Less Flexible

Complex logic may become difficult.

Not Ideal for Very Large Data Volumes

Spark-based solutions often scale better.

Limited Advanced Programming

Compared to notebooks.

Notebooks

What Are Notebooks?

Notebooks are code-based development environments that support:

PySpark
Python
Scala
Spark SQL
R

within Microsoft Fabric.

Transformation Engine

Notebooks execute on Spark clusters.

This enables:

Distributed processing
Parallel execution
Large-scale transformations

Typical Transformations

Examples:

Complex joins
Data enrichment
Machine learning preparation
Feature engineering
Data quality validation
Custom business logic

When to Use Notebooks

Choose notebooks when:

Large data volumes exist
Spark processing is required
Advanced transformations are needed
Custom programming is necessary
Machine learning integration is planned

Examples:

Processing billions of records
Data science workflows
Medallion architecture pipelines
Complex transformations

Advantages

Massive Scalability

Handles large datasets efficiently.

Flexible Programming

Supports multiple languages.

Machine Learning Integration

Works with Spark ML libraries.

Advanced Data Engineering

Ideal for enterprise-scale pipelines.

Limitations

Requires Coding Skills

Less accessible for business users.

More Complex Development

Compared to Dataflows Gen2.

T-SQL

What Is T-SQL?

T-SQL (Transact-SQL) is Microsoft’s extension of SQL.

Fabric Warehouses and SQL endpoints support T-SQL for:

Querying
Transforming
Managing relational data

Transformation Techniques

Common operations include:

			
SELECT
JOIN
GROUP BY
CASE
CTE
MERGE
WINDOW FUNCTIONS

		

When to Use T-SQL

Choose T-SQL when:

Data resides in a Warehouse
Relational transformations are required
SQL expertise already exists
Dimensional models are being built

Examples:

Fact table loading
Dimension updates
Data warehouse ETL
Reporting data preparation

Advantages

Familiar Language

Widely used by data professionals.

Excellent Relational Processing

Optimized for structured data.

Strong Performance

Particularly for warehouse workloads.

Easy Integration

Works naturally with BI tools.

Limitations

Less Suitable for Unstructured Data

Not ideal for files and raw data.

Limited Distributed Processing

Compared to Spark.

KQL

What Is KQL?

Kusto Query Language (KQL) is designed for:

Log analytics
Telemetry analysis
Real-time data processing
Event analytics

KQL is commonly used in:

KQL Databases
Eventhouse
Real-Time Intelligence

Typical Transformations

Examples include:

Filtering events
Aggregations
Pattern detection
Time-series analysis
Stream transformations

When to Use KQL

Choose KQL when:

Working with telemetry data
Processing logs
Analyzing streaming events
Building real-time dashboards

Examples:

Sensor monitoring
Application logs
Security analytics
Operational monitoring

Advantages

Optimized for Time-Series Data

Excellent for event-driven workloads.

Fast Query Performance

Handles large event volumes efficiently.

Real-Time Analytics

Supports low-latency analysis.

Limitations

Not a General ETL Tool

Less suitable for traditional batch ETL.

Not Designed for Dimensional Modeling

Warehouses are generally better for reporting models.

Comparing Transformation Technologies

Requirement	Dataflows Gen2	Notebooks	T-SQL	KQL
Low-Code Development	Excellent	Poor	Moderate	Moderate
Large-Scale Processing	Moderate	Excellent	Good	Excellent
Relational Transformations	Moderate	Good	Excellent	Limited
Streaming Analytics	Limited	Moderate	Poor	Excellent
Machine Learning Support	Poor	Excellent	Poor	Limited
Telemetry Analytics	Poor	Moderate	Poor	Excellent
Business User Friendly	Excellent	Poor	Moderate	Moderate
Advanced Programming	Limited	Excellent	Moderate	Limited

Decision Framework

Choose Dataflows Gen2 When:

Low-code ETL is preferred
Business users are involved
Data volumes are moderate
Transformations are straightforward

Choose Notebooks When:

Spark processing is required
Data volumes are large
Complex transformations exist
Machine learning is involved

Choose T-SQL When:

Working with a Warehouse
Building dimensional models
SQL skills are available
Data is highly structured

Choose KQL When:

Processing logs
Analyzing telemetry
Supporting streaming analytics
Building operational monitoring solutions

Common DP-700 Scenario Questions

Scenario 1

A business analyst needs to combine Excel spreadsheets and remove duplicate rows using a visual interface.

Best choice:

Dataflows Gen2

Scenario 2

A data engineer must transform billions of records stored in a Lakehouse.

Best choice:

Notebook

Scenario 3

A warehouse team must populate fact and dimension tables.

Best choice:

T-SQL

Scenario 4

An operations team analyzes millions of application log events each hour.

Best choice:

KQL

Scenario 5

A machine learning team requires custom Python transformations.

Best choice:

Notebook

Exam Tips

Many DP-700 questions are not asking what can perform a transformation, but what should perform the transformation.

Remember these associations:

Requirement	Best Choice
Visual ETL	Dataflows Gen2
Spark processing	Notebook
Data warehouse transformations	T-SQL
Telemetry and logs	KQL
Machine learning preparation	Notebook
Self-service data preparation	Dataflows Gen2
Streaming analytics	KQL

Practice Exam Questions

Question 1

A business analyst needs to cleanse CSV files using a graphical interface with minimal coding. Which transformation technology should be used?

A. T-SQL

B. Notebook

C. KQL

D. Dataflows Gen2

Answer: D

Explanation

Dataflows Gen2 provide a low-code, visual interface that is ideal for business users and simple ETL processes.

Question 2

A data engineer must process several billion records stored in a Lakehouse using distributed computing.

Which option should be selected?

A. Notebook

B. Dataflows Gen2

C. T-SQL

D. KQL

Answer: A

Explanation

Notebooks leverage Spark for distributed processing and are designed for large-scale data transformations.

Question 3

Which technology is specifically optimized for transforming and analyzing telemetry and log data?

A. Dataflows Gen2

B. Notebook

C. KQL

D. T-SQL

Answer: C

Explanation

KQL is designed for log analytics, telemetry processing, and real-time operational analytics.

Question 4

A team is loading dimension and fact tables within a Fabric Warehouse.

Which transformation technology is most appropriate?

A. Notebook

B. Dataflows Gen2

C. KQL

D. T-SQL

Answer: D

Explanation

T-SQL is the preferred technology for relational transformations in Fabric Warehouses.

Question 5

A company requires machine learning feature engineering using Python libraries.

Which technology should be selected?

A. Notebook

B. Dataflows Gen2

C. T-SQL

D. KQL

Answer: A

Explanation

Notebooks support Python, Spark, and machine learning frameworks, making them ideal for feature engineering.

Question 6

Which technology relies primarily on Power Query transformations?

A. Notebook

B. Dataflows Gen2

C. T-SQL

D. KQL

Answer: B

Explanation

Dataflows Gen2 use Power Query and the M language behind the scenes for data transformations.

Question 7

An operations team needs to perform real-time aggregations on streaming sensor data.

Which option should be used?

A. Dataflows Gen2

B. Notebook

C. KQL

D. T-SQL

Answer: C

Explanation

KQL is optimized for real-time event processing and telemetry analysis.

Question 8

A data engineer needs maximum flexibility to implement custom business logic across multiple data sources.

Which technology is most appropriate?

A. KQL

B. Dataflows Gen2

C. T-SQL

D. Notebook

Answer: D

Explanation

Notebooks provide the highest degree of customization through programming languages such as Python and PySpark.

Question 9

A team already has extensive SQL expertise and needs to transform highly structured relational data in a Warehouse.

Which option is best?

A. Notebook

B. T-SQL

C. Dataflows Gen2

D. KQL

Answer: B

Explanation

T-SQL is optimized for relational transformations and leverages existing SQL skills.

Question 10

Which technology is generally the most business-user-friendly option for creating batch data transformation processes?

A. Notebook

B. KQL

C. T-SQL

D. Dataflows Gen2

Answer: D

Explanation

Dataflows Gen2 provide a visual, low-code experience that is easier for business users and citizen developers than code-based solutions.

DP-700 Exam Summary

When deciding between transformation technologies, focus on the primary workload:

Dataflows Gen2 → Low-code ETL and self-service data preparation
Notebooks → Spark, large-scale processing, advanced engineering, and machine learning
T-SQL → Relational transformations and warehouse development
KQL → Telemetry, logs, time-series analytics, and real-time event processing

A common DP-700 exam strategy is to identify the keywords in the scenario:

Visual interface → Dataflows Gen2
Billions of rows / Spark → Notebook
Warehouse / dimensional model → T-SQL
Logs / telemetry / real-time analytics → KQL

These keywords often point directly to the correct answer.

Go to the DP-700 Exam Prep Hub main page.

Analytics, DP-700, Microsoft Certification, Microsoft Fabric June 3, 2026

Choose between Dataflow Gen2, a pipeline and a notebook (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Implement and manage an analytics solution (30–35%)
   --> Orchestrate processes
      --> Choose between Dataflow Gen2, a pipeline and a notebook

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important skills for a Microsoft Fabric Data Engineer is selecting the appropriate tool for a particular task. Microsoft Fabric provides several powerful technologies for data ingestion, transformation, orchestration, and automation. Three of the most commonly used are:

Dataflow Gen2
Data Pipelines
Notebooks

Although these tools often work together, they serve different purposes. Choosing the wrong tool can lead to unnecessary complexity, reduced maintainability, and increased development effort.

For the DP-700 exam, you should understand:

The primary purpose of each tool
When to use each tool
Strengths and limitations
Common design patterns
How these tools interact with one another

A significant number of DP-700 scenario questions are likely to test your ability to determine which Fabric component best fits a given business requirement.

Understanding the Three Tools

Before comparing them, it is important to understand their primary functions.

Tool	Primary Purpose
Dataflow Gen2	Low-code data ingestion and transformation
Data Pipeline	Workflow orchestration and automation
Notebook	Code-based data processing and advanced analytics

A useful way to remember this is:

			
Dataflow Gen2 = Transform Data
Pipeline = Orchestrate Processes
Notebook = Execute Code

What Is Dataflow Gen2?

Dataflow Gen2 is a low-code/no-code data integration and transformation tool built on Power Query technology.

It allows users to:

Connect to data sources
Clean data
Transform data
Merge datasets
Filter records
Load data into Fabric destinations

Dataflow Gen2 is designed for users who prefer visual development rather than coding.

Dataflow Gen2 Architecture

			
Data Source
      ↓
Power Query Transformations
      ↓
Dataflow Gen2
      ↓
Lakehouse / Warehouse

		

The transformation logic is built using a graphical interface.

Common Dataflow Gen2 Tasks

Examples include:

Removing duplicates
Filtering rows
Renaming columns
Data cleansing
Combining files
Joining datasets
Type conversions
Data standardization

These activities require little or no programming.

Advantages of Dataflow Gen2

Low-Code Development

Business analysts and citizen developers can build transformations without extensive coding knowledge.

Reusable Transformations

Transformations can be reused across multiple projects.

Familiar Power Query Experience

Users familiar with Power BI often adapt quickly.

Large Connector Library

Supports many cloud and on-premises data sources.

Limitations of Dataflow Gen2

Dataflow Gen2 is not ideal for:

Complex machine learning workloads
Advanced Spark processing
Custom Python development
Large-scale distributed programming

For those scenarios, notebooks are often more appropriate.

What Is a Data Pipeline?

A Data Pipeline is an orchestration tool.

Its primary purpose is not data transformation but rather coordinating and automating activities.

Think of a pipeline as a workflow engine.

Pipeline Architecture

			
Activity 1
      ↓
Activity 2
      ↓
Activity 3
      ↓
Activity 4

		

Pipelines determine:

What runs
When it runs
In what order it runs
Under what conditions it runs

Common Pipeline Activities

Examples include:

Copy Data
Execute Notebook
Execute Dataflow
Stored Procedures
Web Activities
Conditional Logic
Scheduling Jobs

Pipelines coordinate these activities into a complete workflow.

Advantages of Pipelines

Workflow Automation

Automates complex end-to-end processes.

Scheduling

Supports recurring execution schedules.

Dependency Management

Controls execution order.

Error Handling

Supports retries and failure paths.

Integration

Can orchestrate multiple Fabric components.

Limitations of Pipelines

Pipelines are not intended for:

Complex data transformations
Interactive analysis
Advanced programming

Pipelines orchestrate work; they do not replace transformation tools.

What Is a Notebook?

A notebook is a code-based environment that allows developers and data engineers to execute code directly against Fabric data.

Notebooks commonly use:

Python
PySpark
Spark SQL
Scala (where supported)

They run on Spark compute engines.

Notebook Architecture

			
Data Source
      ↓
Spark Processing
      ↓
Notebook
      ↓
Lakehouse / Warehouse

		

Notebooks provide maximum flexibility and control.

Common Notebook Tasks

Examples include:

PySpark transformations
Data engineering workflows
Machine learning preparation
Advanced data cleansing
Streaming data processing
Delta table optimization
Custom business logic

Advantages of Notebooks

Full Programming Flexibility

Developers can implement virtually any logic.

Spark Integration

Supports distributed processing.

Advanced Transformations

Suitable for highly complex data engineering workloads.

Machine Learning Support

Works well with AI and ML frameworks.

Scalability

Can process very large datasets.

Limitations of Notebooks

Coding Required

Requires programming knowledge.

Higher Complexity

Can be more difficult to maintain.

Less Accessible

Business users typically prefer Dataflow Gen2.

Side-by-Side Comparison

Feature	Dataflow Gen2	Pipeline	Notebook
Primary Purpose	Data Transformation	Orchestration	Advanced Processing
Coding Required	Minimal	Minimal	Extensive
Scheduling	Limited	Yes	Usually via Pipeline
Spark Support	No Direct Coding	No	Yes
Visual Interface	Yes	Yes	No
Advanced Logic	Limited	Limited	Extensive
Best for ETL	Yes	Coordinates ETL	Yes
Machine Learning	No	No	Yes

When to Choose Dataflow Gen2

Choose Dataflow Gen2 when:

Data cleansing is required
Users prefer visual tools
Power Query transformations are sufficient
Business analysts are building solutions
Coding should be minimized

Example

Requirement:

			
Import CSV files
Remove duplicates
Rename columns
Load into Lakehouse

Best Choice:

Dataflow Gen2

When to Choose a Pipeline

Choose a Pipeline when:

Multiple tasks must be coordinated
Processes require scheduling
Activities depend on one another
Workflows need monitoring
Automation is required

Example

Requirement:

			
Run Dataflow
Run Notebook
Load Warehouse
Send Notification

Best Choice:

Pipeline

When to Choose a Notebook

Choose a Notebook when:

Complex transformations are required
PySpark processing is needed
Machine learning is involved
Custom code is necessary
Large-scale distributed processing is required

Example

Requirement:

			
Apply custom PySpark transformation
Process 10 TB dataset
Optimize Delta tables

Best Choice:

Notebook

Common Real-World Pattern

In many Fabric environments, all three tools are used together.

Example:

			
Dataflow Gen2
      ↓
Pipeline
      ↓
Notebook
      ↓
Warehouse

		

Workflow:

Dataflow Gen2 cleans source files.
Pipeline orchestrates execution.
Notebook performs advanced transformations.
Results load into a Warehouse.

This layered approach is common in enterprise solutions.

Decision Framework for DP-700

When reading exam questions, ask:

Is the requirement primarily data transformation?

Choose:

Dataflow Gen2

Is the requirement workflow orchestration?

Choose:

Pipeline

Is the requirement advanced coding or Spark processing?

Choose:

Notebook

Common Exam Traps

Trap #1

Question mentions:

Scheduling
Dependencies
Automation

Correct answer:

Pipeline

Even if transformations are involved.

Trap #2

Question mentions:

PySpark
Python
Machine Learning
Spark

Correct answer:

Notebook

Trap #3

Question mentions:

Power Query
Visual transformation
No-code development

Correct answer:

Dataflow Gen2

DP-700 Exam Focus Areas

You should understand:

✓ Purpose of Dataflow Gen2

✓ Purpose of Data Pipelines

✓ Purpose of Notebooks

✓ Visual versus code-based development

✓ Workflow orchestration

✓ Spark processing

✓ Power Query transformations

✓ Scheduling and automation

✓ Common integration patterns

✓ Appropriate tool selection for business scenarios

Practice Exam Questions

Question 1

A business analyst needs to import CSV files, remove duplicate rows, and standardize column names using a visual interface with minimal coding.

Which Fabric component should be used?

A. Notebook

B. Data Pipeline

C. Dataflow Gen2

D. Deployment Pipeline

Answer: C

Explanation

Dataflow Gen2 is designed for low-code data ingestion and transformation using Power Query.

Question 2

A data engineering solution must execute the following sequence:

Run a Dataflow Gen2 process
Execute a Notebook
Load a Warehouse
Send a notification email

Which Fabric component should coordinate this workflow?

A. Lakehouse

B. Data Pipeline

C. Notebook

D. Semantic Model

Answer: B

Explanation

Pipelines are designed to orchestrate and automate multiple activities and dependencies.

Question 3

A team needs to perform complex PySpark transformations against several terabytes of data.

Which Fabric component is most appropriate?

A. Dataflow Gen2

B. Pipeline

C. Dashboard

D. Notebook

Answer: D

Explanation

Notebooks provide Spark-based programming environments suitable for large-scale transformations.

Question 4

Which Fabric component is primarily responsible for workflow orchestration?

A. Dataflow Gen2

B. Lakehouse

C. Warehouse

D. Data Pipeline

Answer: D

Explanation

Data Pipelines coordinate and automate execution of multiple activities.

Question 5

A solution requires users with no programming experience to create reusable data cleansing transformations.

Which component should be selected?

A. Notebook

B. Dataflow Gen2

C. Pipeline

D. Spark Job Definition

Answer: B

Explanation

Dataflow Gen2 provides a low-code visual environment for data preparation.

Question 6

Which Fabric component offers the greatest flexibility for implementing custom business logic?

A. Dataflow Gen2

B. Warehouse

C. Notebook

D. Data Pipeline

Answer: C

Explanation

Notebooks support Python, PySpark, and Spark SQL, allowing virtually unlimited customization.

Question 7

A company wants to schedule nightly execution of several notebooks and monitor failures.

Which Fabric component should be used?

A. Dataflow Gen2

B. Notebook

C. Lakehouse

D. Data Pipeline

Answer: D

Explanation

Pipelines provide scheduling, monitoring, dependencies, and failure handling.

Question 8

Which statement best describes Dataflow Gen2?

A. It is primarily a workflow orchestration tool.

B. It is a low-code data transformation solution based on Power Query.

C. It is designed for machine learning development.

D. It replaces Spark notebooks.

Answer: B

Explanation

Dataflow Gen2 is optimized for low-code ETL and data transformation workloads.

Question 9

A data engineer must optimize Delta tables using Spark commands and Python code.

Which Fabric component should be used?

A. Notebook

B. Data Pipeline

C. Dataflow Gen2

D. Warehouse

Answer: A

Explanation

Notebook environments provide direct access to Spark capabilities and custom code execution.

Question 10

Which scenario is the best fit for a Data Pipeline?

A. Creating Power Query transformations

B. Applying machine learning algorithms

C. Coordinating multiple Fabric activities into an automated workflow

D. Writing custom PySpark code

Answer: C

Explanation

Pipelines are specifically designed for orchestration, automation, scheduling, dependency management, and monitoring.

Exam Tip

A useful DP-700 memory aid is:

Requirement	Best Tool
Visual ETL and data preparation	Dataflow Gen2
Scheduling and orchestration	Data Pipeline
Spark, Python, and advanced processing	Notebook

When a scenario focuses on automation and coordinating activities, think Pipeline.

When it focuses on Power Query transformations, think Dataflow Gen2.

When it focuses on PySpark, Spark SQL, machine learning, or custom code, think Notebook.

Go to the DP-700 Exam Prep Hub main page.