This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Monitor and optimize an analytics solution (30–35%)
--> Identify and resolve errors
--> Identify and resolve Dataflow Gen2 errors
Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Overview
Dataflow Gen2 is a powerful data ingestion and transformation service in Microsoft Fabric that enables data engineers and analysts to perform Extract, Transform, and Load (ETL) operations using a low-code, visual interface based on Power Query. Dataflow Gen2 supports hundreds of data sources and can load data into destinations such as Lakehouses, Warehouses, KQL Databases, and other Fabric items.
Because Dataflow Gen2 is often used to prepare and transform data before it reaches analytical solutions, failures can have significant downstream impacts. For the DP-700 exam, candidates must understand how to identify, troubleshoot, and resolve Dataflow Gen2 errors, interpret refresh results, analyze execution details, and implement practices that reduce operational issues.
Understanding Dataflow Gen2 Execution
A Dataflow Gen2 execution consists of several stages:
- Source connection
- Data extraction
- Query transformation
- Data validation
- Data loading
- Refresh completion
Errors can occur at any stage of this process.
Unlike pipelines, where multiple activities execute sequentially, Dataflow Gen2 refreshes are generally focused on executing Power Query transformations and loading results into destination systems.
Successful troubleshooting requires identifying which stage failed.
Common Categories of Dataflow Gen2 Errors
Connection Errors
Connection failures occur when Dataflow Gen2 cannot access the source system.
Common causes include:
- Invalid credentials
- Expired passwords
- Revoked access
- Incorrect connection strings
- Network issues
- Unsupported authentication methods
Example:
A Dataflow Gen2 refresh connects to Azure SQL Database using a username whose password has expired.
Result:
The refresh fails before any data is retrieved.
Typical troubleshooting steps:
- Verify credentials.
- Test connectivity.
- Reauthenticate the connection.
- Confirm source availability.
Authentication and Authorization Errors
Authentication confirms identity.
Authorization confirms permissions.
Common examples:
- Missing database permissions
- Insufficient workspace permissions
- Revoked service account access
- Missing OneLake permissions
Example error:
“Access denied while attempting to access source.”
Resolution:
Verify user permissions and security roles on both source and destination systems.
Source Schema Changes
Schema drift occurs when source structures change unexpectedly.
Examples include:
- Columns removed
- Columns renamed
- Data types modified
- New columns added
Example:
A transformation references a column named CustomerStatus.
A source application update renames the column to Status.
Result:
The transformation step fails.
Resolution:
Update transformation logic to reflect the new schema.
Power Query Transformation Errors
Many Dataflow Gen2 failures occur during transformation processing.
Missing Column Errors
Example:
A step attempts to select a column that no longer exists.
Error:
“Column not found.”
Resolution:
Review source schema and update transformation steps.
Data Type Conversion Errors
Example:
A text value such as “ABC123” is converted to a Whole Number data type.
Result:
Conversion failure.
Resolution:
- Validate source data.
- Clean data before conversion.
- Use error handling logic.
Invalid Formula Errors
Power Query transformations use M language behind the scenes.
Example:
A custom column contains an incorrect expression.
Result:
Refresh failure.
Resolution:
Review and correct the transformation formula.
Reference Errors
Queries may reference:
- Other queries
- Parameters
- Functions
If referenced objects are deleted or renamed, failures occur.
Resolution:
Validate dependencies within the dataflow.
Destination Errors
Errors may occur after transformations complete successfully.
Lakehouse Write Failures
Examples:
- Missing destination table
- Permission issues
- Storage limitations
- Schema mismatch
Resolution:
Verify table structure and permissions.
Warehouse Loading Errors
Examples:
- Unsupported data types
- Primary key violations
- Schema conflicts
Resolution:
Validate compatibility between transformed data and destination schema.
KQL Database Loading Errors
Examples:
- Incorrect mappings
- Unsupported formats
- Data ingestion policy issues
Resolution:
Review destination configuration and ingestion mappings.
Refresh Failures
Dataflow Gen2 refresh operations generate execution information that should be examined whenever failures occur.
Refresh details often provide:
- Failure stage
- Error messages
- Execution duration
- Rows processed
- Destination information
For troubleshooting, refresh history is usually the first place to investigate.
Monitoring Dataflow Gen2 Refreshes
Refresh History
Refresh history provides information about:
- Successes
- Failures
- Start times
- End times
- Refresh duration
Engineers should review failed refreshes immediately after errors occur.
Detailed Error Messages
Refresh details often contain:
- Error codes
- Source system messages
- Transformation failures
- Destination loading issues
Always review the detailed error rather than relying solely on the refresh status.
Example:
Generic message:
“Refresh failed.”
Detailed message:
“Cannot convert value ‘N/A’ to Whole Number.”
The detailed error immediately identifies the issue.
Dataflow Monitoring in Fabric
Fabric monitoring tools can help identify:
- Failed refreshes
- Long-running refreshes
- Capacity-related issues
- Destination write failures
Monitoring trends over time can reveal recurring problems.
Troubleshooting Common Dataflow Gen2 Errors
Scenario 1: Source Authentication Failure
Symptoms:
- Refresh fails immediately.
- No records processed.
Investigation:
- Verify credentials.
- Test source access.
- Reauthenticate the connection.
- Confirm account permissions.
Resolution:
Update credentials or restore permissions.
Scenario 2: Missing Column Error
Symptoms:
- Refresh fails during transformation.
- Error references a missing field.
Investigation:
- Review source schema.
- Compare against transformation steps.
- Identify renamed or deleted columns.
Resolution:
Modify transformation logic.
Scenario 3: Data Type Conversion Failure
Symptoms:
- Refresh stops during transformation.
Example:
A column contains:
- 100
- 200
- ABC
The query attempts to convert the column to numeric values.
Resolution:
- Clean invalid values.
- Replace errors.
- Filter problematic records.
Scenario 4: Destination Table Failure
Symptoms:
- Transformations succeed.
- Loading fails.
Investigation:
- Verify destination exists.
- Validate permissions.
- Review destination schema.
Resolution:
Correct schema or permission issues.
Scenario 5: Long-Running Refresh
Symptoms:
- Refresh takes significantly longer than expected.
Possible causes:
- Large data volume
- Complex transformations
- Source system bottlenecks
- Capacity constraints
Resolution:
Optimize transformations and reduce unnecessary processing.
Using Query Diagnostics
Power Query provides diagnostic capabilities that can help identify:
- Expensive transformation steps
- Slow source queries
- Bottlenecks during execution
Query diagnostics are particularly useful when refreshes succeed but perform poorly.
Areas to investigate include:
- Excessive row operations
- Repeated transformations
- Non-folding queries
Query Folding and Error Prevention
What is Query Folding?
Query folding occurs when transformations are pushed back to the source system.
Instead of processing data inside Fabric:
- The source executes filtering.
- The source performs aggregations.
- The source reduces result sets.
Benefits:
- Faster refreshes
- Reduced resource consumption
- Lower failure risk
How Query Folding Affects Troubleshooting
Poor query folding can lead to:
- Excessive processing
- Memory consumption
- Long refresh durations
When troubleshooting performance-related refresh issues, query folding should be evaluated.
Capacity-Related Errors
Dataflow Gen2 consumes Fabric compute resources.
Potential issues include:
- High concurrency
- Capacity throttling
- Resource contention
Symptoms:
- Slow refreshes
- Intermittent failures
- Unexpected cancellations
Monitoring Fabric capacity metrics can help identify these issues.
Best Practices for Preventing Dataflow Gen2 Errors
Validate Source Schemas
Regularly review source structures.
This helps detect schema drift before failures occur.
Use Defensive Transformations
Handle unexpected values through:
- Null handling
- Error replacement
- Data validation
This improves refresh reliability.
Minimize Complex Transformations
Perform only necessary transformations.
Simpler dataflows are easier to maintain and troubleshoot.
Monitor Refresh History
Review failures and performance trends regularly.
Early detection reduces operational impact.
Test After Source Changes
Whenever source applications are modified:
- Validate schemas.
- Test refreshes.
- Confirm transformation logic.
Optimize Query Folding
Push processing to source systems whenever possible.
This reduces execution times and resource consumption.
Document Dependencies
Track:
- Source systems
- Queries
- Parameters
- Destination tables
Documentation simplifies troubleshooting.
DP-700 Exam Tips
For the exam, remember:
- Most Dataflow Gen2 troubleshooting begins with refresh history.
- Source schema changes are a common cause of refresh failures.
- Data type conversion errors frequently occur during transformations.
- Destination errors can occur even when transformations succeed.
- Query folding significantly affects performance and reliability.
- Detailed error messages provide more value than high-level failure notifications.
- Authentication and authorization issues are common root causes.
- Capacity constraints can impact refresh performance.
- Missing columns and renamed fields often cause transformation failures.
- Monitoring refresh history is a core operational responsibility for Fabric data engineers.
Practice Exam Questions
Question 1
A Dataflow Gen2 refresh fails with the error:
“Column ‘CustomerType’ was not found.”
What is the most likely cause?
A. Destination table permissions were removed.
B. Fabric capacity is overloaded.
C. A source schema change occurred.
D. Query folding is disabled.
Correct Answer: C
Explanation:
The transformation references a column that no longer exists or has been renamed.
- A would generate authorization errors.
- C would typically cause performance or resource issues.
- D affects performance rather than column existence.
Question 2
A Dataflow Gen2 refresh fails immediately after starting and processes zero rows.
Which issue is most likely?
A. Authentication failure
B. Query folding problem
C. Aggregation error
D. Destination schema mismatch
Correct Answer: A
Explanation:
Authentication issues generally prevent data retrieval from beginning.
- B affects execution efficiency.
- C occurs during transformation.
- D typically appears later during loading.
Question 3
A data engineer wants to determine exactly why a Dataflow Gen2 refresh failed.
What should they examine first?
A. Fabric capacity metrics
B. Lakehouse storage statistics
C. Refresh history details
D. Workspace role assignments
Correct Answer: C
Explanation:
Refresh history contains detailed execution information and error messages.
- A and B may be useful later.
- D should only be investigated if permissions are suspected.
Question 4
A Dataflow Gen2 transformation attempts to convert a text value of “Unknown” into a Whole Number.
What type of error will occur?
A. Data type conversion error
B. Capacity error
C. Query dependency error
D. Authentication error
Correct Answer: A
Explanation:
Text values that cannot be converted to numeric formats generate conversion failures.
- A, B, and C are unrelated to data conversion.
Question 5
Which capability pushes transformation processing back to the source system whenever possible?
A. Data validation
B. Query folding
C. Incremental refresh
D. Parameterization
Correct Answer: B
Explanation:
Query folding allows supported transformations to execute on the source system.
- A validates data.
- C limits refresh scope.
- D provides dynamic values.
Question 6
Transformations complete successfully, but data cannot be written to the destination Warehouse.
Which category of issue is most likely?
A. Destination loading error
B. Source connectivity failure
C. Missing source table
D. Query folding issue
Correct Answer: A
Explanation:
If transformations finish successfully, the failure likely occurs during the loading phase.
- B and C would occur earlier.
- D typically affects performance.
Question 7
A data engineer wants to reduce failures caused by unexpected values in source data.
Which approach is best?
A. Increase capacity size
B. Disable query folding
C. Use defensive transformations and error handling
D. Reduce refresh frequency
Correct Answer: C
Explanation:
Handling nulls, invalid values, and conversion errors proactively improves reliability.
- A may help performance but not data quality.
- B often reduces efficiency.
- D does not address the root cause.
Question 8
Which issue commonly results from source schema drift?
A. Missing or renamed columns
B. Capacity throttling
C. Refresh scheduling conflicts
D. Workspace role inheritance
Correct Answer: A
Explanation:
Schema drift occurs when source structures change unexpectedly.
- B, C, and D are unrelated.
Question 9
A refresh suddenly begins taking twice as long as usual without failing.
Which tool would be most useful for identifying expensive transformation steps?
A. Workspace permissions
B. Query diagnostics
C. Tenant settings
D. Dataflow ownership settings
Correct Answer: B
Explanation:
Query diagnostics help identify bottlenecks and inefficient transformations.
- A, C, and D do not provide execution analysis.
Question 10
Which best practice helps prevent Dataflow Gen2 failures after application updates modify source tables?
A. Disable refresh schedules temporarily
B. Increase concurrency limits
C. Recreate all dataflows monthly
D. Validate source schemas and test refreshes after changes
Correct Answer: D
Explanation:
Testing after source changes helps identify schema drift and compatibility issues before production failures occur.
- A is reactive rather than preventive.
- B does not address schema changes.
- C is unnecessary and inefficient.
Go to the DP-700 Exam Prep Hub main page.
