DP-700, Microsoft Certification, Microsoft Fabric June 3, 2026

Identify and resolve notebook errors (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Monitor and optimize an analytics solution (30–35%)
   --> Identify and resolve errors
      --> Identify and resolve notebook errors

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Notebook troubleshooting is an important skill for the DP-700 certification exam because notebooks are one of the primary tools used for data ingestion, transformation, orchestration, machine learning, and advanced analytics in Microsoft Fabric. Data engineers must be able to quickly identify failures, interpret error messages, diagnose root causes, and implement corrective actions.

This topic focuses on understanding notebook execution, common notebook errors, monitoring tools, debugging techniques, and best practices for building reliable notebook solutions in Microsoft Fabric.

Understanding Notebooks in Microsoft Fabric

A notebook is an interactive development environment that allows engineers to write and execute code using:

PySpark
Spark SQL
Scala
Python
R (where supported)

Fabric notebooks run on Spark clusters and are commonly used to:

Ingest data into Lakehouses
Transform data
Build ETL processes
Execute streaming workloads
Perform data quality checks
Orchestrate complex data engineering workflows

Because notebooks often process large datasets and depend on external systems, failures are inevitable. Effective troubleshooting is therefore a critical data engineering skill.

Common Categories of Notebook Errors

Notebook failures generally fall into several categories:

Syntax Errors

These occur when code violates language rules.

Example

df = spark.read.csv("/Files/data.csv"

Error:

SyntaxError: '(' was never closed

(Python versions before 3.10 report the same problem as: SyntaxError: unexpected EOF while parsing)

Cause:

Missing closing parenthesis

Resolution:

Review code carefully
Use notebook syntax highlighting
Validate code before execution
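
One way to validate code before execution is to parse it without running it. The sketch below uses Python's standard `ast` module for that. The helper name `check_syntax` is hypothetical and is not a Fabric feature; it only illustrates the idea on the broken example from above.

```python
import ast
from typing import Optional


def check_syntax(source: str) -> Optional[str]:
    """Return None if the source parses, otherwise a short error description."""
    try:
        ast.parse(source)  # parses only; nothing in the source is executed
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


# The broken example is missing its closing parenthesis.
broken = 'df = spark.read.csv("/Files/data.csv"'
fixed = 'df = spark.read.csv("/Files/data.csv")'

print(check_syntax(broken))  # a description of the syntax error
print(check_syntax(fixed))   # None
```

Because the code is only parsed, names such as `spark` do not need to exist for the check to work.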

Runtime Errors

Runtime errors occur when code is syntactically correct but fails during execution.

Example

value = 100 / 0

Error:

ZeroDivisionError: division by zero

Cause:

Division by zero

Resolution:

Add validation logic
Implement exception handling
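
The two resolutions can be sketched side by side. The function names `safe_ratio` and `ratio_or_default` are illustrative assumptions, and the choice of `None` or `0.0` as a fallback depends on what downstream code expects.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Validation approach: check the input before dividing."""
    if denominator == 0:
        return None
    return numerator / denominator


def ratio_or_default(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Exception-handling approach: attempt the division and recover on failure."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return default


print(safe_ratio(100, 0))        # None
print(ratio_or_default(100, 0))  # 0.0
print(safe_ratio(100, 4))        # 25.0
```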

Data Access Errors

These are among the most common notebook failures.

Examples

File not found
Table not found
Permission denied
Invalid storage path

Example:

df = spark.read.parquet("/Files/Sales2025")

Error:

Path does not exist

Possible causes:

Incorrect path
Deleted file
Typographical error
Missing shortcut

Resolution:

Verify file location
Confirm OneLake shortcut configuration
Check permissions
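
Verifying a file location often comes down to spotting a typo in the last path segment. The sketch below checks a local path and suggests similarly named siblings. It assumes the Lakehouse files are reachable through a local file system mount (in Fabric, the default Lakehouse is typically mounted under `/lakehouse/default`); the helper name `explain_missing_path` is hypothetical, and the demonstration uses a temporary folder rather than a real Lakehouse.

```python
import difflib
import os
import tempfile


def explain_missing_path(path: str) -> str:
    """Describe why a local path is missing and suggest similarly named siblings."""
    if os.path.exists(path):
        return f"OK: {path}"
    parent, name = os.path.split(path.rstrip("/"))
    if not os.path.isdir(parent):
        return f"Parent folder not found: {parent}"
    siblings = os.listdir(parent)
    close = difflib.get_close_matches(name, siblings, n=3)
    if close:
        return f"Not found: {name}. Similar names: {close}"
    return f"Not found: {name}. No similar names in {parent}."


# A temporary folder stands in for the Files area of a Lakehouse.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "Sales2025"))
    print(explain_missing_path(os.path.join(root, "Sales2052")))
```

A check like this does not cover permissions or shortcut configuration, which still need to be confirmed separately.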

Authentication and Authorization Errors

A notebook may be unable to access resources because the user or service principal lacks required permissions.

Examples:

Access Denied
Unauthorized
Permission denied

Common causes:

Workspace role limitations
Missing Lakehouse permissions
Source-system authentication failures

Resolution:

Verify workspace access
Confirm security settings
Validate credentials

Spark Resource Errors

Spark jobs require compute resources.

Failures may occur because of:

Insufficient memory
Driver overload
Executor failures
Large shuffle operations

Typical errors:

OutOfMemoryError
ExecutorLostFailure
Driver memory exceeded

Resolution:

Increase Spark resources
Optimize queries
Partition data appropriately
Reduce data movement
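
"Partition data appropriately" can be turned into rough arithmetic. The sketch below estimates a partition count from the data size, assuming a target of about 128 MiB per partition. That target is a rule of thumb (it matches Spark's default for `spark.sql.files.maxPartitionBytes`), not a Fabric requirement, and `suggest_partition_count` is a hypothetical helper.

```python
import math

MIB = 1024 * 1024


def suggest_partition_count(total_bytes: int, target_partition_bytes: int = 128 * MIB) -> int:
    """Estimate how many partitions keep each one near the target size."""
    if total_bytes <= 0:
        return 1
    return max(1, math.ceil(total_bytes / target_partition_bytes))


ten_gib = 10 * 1024 * MIB
print(suggest_partition_count(ten_gib))  # 80
```

The result is only a starting point, for example as the argument to `df.repartition(...)`; skewed keys or wide rows may call for a different number.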

Dependency Errors

Notebook code may depend on external packages.

Example:

import pandas_profiling

Error:

ModuleNotFoundError

Cause:

Package not installed

Resolution:

Install required libraries
Use supported package versions
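
A notebook can check for required packages up front instead of failing halfway through a run. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name and the module names in the example are illustrative assumptions.

```python
import importlib.util
from typing import Iterable, List


def find_missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


required = ["json", "a_module_that_does_not_exist_123"]
missing = find_missing_modules(required)
print(missing)  # ['a_module_that_does_not_exist_123']
```

Anything reported as missing can then be installed, for example with an inline `%pip install` in the notebook session or by adding the library to the Fabric environment the notebook uses.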

Monitoring Notebook Execution

Fabric provides several methods for monitoring notebook runs.

Notebook Run Status

Execution status may show:

Running
Completed
Failed
Cancelled

A failed run should always be investigated using execution logs.

Cell-Level Error Analysis

Notebook failures typically identify:

Failed cell
Error type
Line number
Stack trace

Example:

Cell 8 failed
AnalysisException
Table not found

This information significantly narrows troubleshooting efforts.
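
The same details (error type, message, failing line) can be captured programmatically, which is useful when a notebook writes its own failure records. This is a minimal sketch using Python's standard `traceback` module; `summarize_exception` and `compute_ratio` are hypothetical names.

```python
import traceback
from typing import Any, Dict


def summarize_exception(exc: BaseException) -> Dict[str, Any]:
    """Extract the error type, message, and innermost failing line from an exception."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1] if frames else None
    return {
        "error_type": type(exc).__name__,
        "message": str(exc),
        "function": last.name if last else None,
        "line": last.lineno if last else None,
    }


def compute_ratio() -> float:
    return 100 / 0


try:
    compute_ratio()
except Exception as exc:
    summary = summarize_exception(exc)

print(summary["error_type"], "in", summary["function"])  # ZeroDivisionError in compute_ratio
```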

Spark Job Monitoring

Fabric allows engineers to inspect Spark jobs generated by notebook execution.

Useful information includes:

Job duration
Task failures
Stage failures
Resource utilization
Data shuffle activity

This information is particularly valuable for performance-related issues.

Reading Spark Error Messages

One of the most important DP-700 skills is interpreting Spark exceptions.

AnalysisException

Example:

AnalysisException: Table customer_dim not found

Cause:

Missing table
Incorrect table name
Incorrect Lakehouse attachment

Resolution:

Verify table existence
Check notebook Lakehouse context

FileNotFoundException

Example:

FileNotFoundException

Cause:

Missing file
Incorrect path

Resolution:

Validate storage path
Confirm file availability

OutOfMemoryError

Example:

java.lang.OutOfMemoryError: Java heap space

Cause:

Dataset too large
Inefficient transformations

Resolution:

Optimize Spark processing
Use partitioning
Increase cluster resources

NullPointerException

Cause:

Unexpected null values
Missing objects

Resolution:

Validate inputs
Add null handling
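
As a study aid, the mapping from exception to first checks can be written as a small lookup. The sketch below only restates the causes and resolutions listed above; the `first_checks` function and its keyword matching are illustrative assumptions, not a Fabric or Spark API.

```python
from typing import Dict, List

MEMORY_CHECKS = ["Optimize Spark processing", "Use partitioning", "Increase cluster resources"]

# First checks per error keyword, taken from the resolutions listed above.
FIRST_CHECKS: Dict[str, List[str]] = {
    "AnalysisException": ["Verify the table exists", "Check the notebook Lakehouse context"],
    "FileNotFoundException": ["Validate the storage path", "Confirm the file is available"],
    "OutOfMemoryError": MEMORY_CHECKS,
    "Java heap space": MEMORY_CHECKS,
    "NullPointerException": ["Validate inputs", "Add null handling"],
}


def first_checks(error_text: str) -> List[str]:
    """Return suggested first checks for the first known keyword found in the error text."""
    for keyword, checks in FIRST_CHECKS.items():
        if keyword in error_text:
            return checks
    return ["Read the full stack trace and identify the failing cell"]


print(first_checks("AnalysisException: Table customer_dim not found"))
```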

Debugging Techniques

Execute Incrementally

Rather than running an entire notebook:

Run cells individually
Verify outputs
Isolate failures

This approach greatly reduces troubleshooting time.

Inspect Intermediate Results

Example:

df.show()

display(df)

Benefits:

Verify schema
Validate transformations
Detect null values
Confirm expected row counts

Check Schemas

Schema mismatches are a common source of errors.

Example:

df.printSchema()

Verify:

Column names
Data types
Nullable settings

Validate Row Counts

Example:

df.count()

Useful for identifying:

Missing records
Unexpected filtering
Data quality issues
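
Row counts become more useful when source and target are compared against a tolerance. In this sketch the two counts would come from calls such as `df.count()` on the source and target DataFrames; the helper `row_count_issue` and the `max_loss_pct` threshold are hypothetical.

```python
from typing import Optional


def row_count_issue(source_rows: int, target_rows: int, max_loss_pct: float = 0.0) -> Optional[str]:
    """Return a description of a row-count problem, or None if the counts look consistent."""
    if source_rows < 0 or target_rows < 0:
        raise ValueError("row counts must be non-negative")
    if target_rows > source_rows:
        return f"Target has {target_rows - source_rows} more rows than the source"
    lost = source_rows - target_rows
    loss_pct = 100.0 * lost / source_rows if source_rows else 0.0
    if loss_pct > max_loss_pct:
        return f"Lost {lost} rows ({loss_pct:.1f}%), above the allowed {max_loss_pct:.1f}%"
    return None


print(row_count_issue(1000, 900))                    # Lost 100 rows (10.0%), above the allowed 0.0%
print(row_count_issue(1000, 990, max_loss_pct=5.0))  # None
```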

Exception Handling

PySpark notebooks can implement error handling using Python exceptions.

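Example:

try:
    df = spark.read.parquet(path)
except Exception as e:
    # Log the failure with context, then re-raise so the run is still marked as failed
    print(f"Failed to read {path}: {e}")
    raise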

Benefits:

Graceful failure handling
Better logging
Easier troubleshooting

Logging Best Practices

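Instead of relying solely on ad hoc print output, create structured logging with levels and timestamps.

Example:

import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("ingestion")

logger.info("Starting ingestion...")
logger.info("Reading source data...")
logger.info("Writing destination table...")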

Benefits:

Easier root-cause analysis
Better operational monitoring
Faster issue resolution

Many organizations write logs to:

Lakehouse tables
Monitoring databases
Log Analytics environments
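
Whatever the destination, logs are easier to query when each entry is a record with fixed fields rather than free text. The sketch below builds such records as plain dictionaries and prints them as JSON lines. The field names (`run_id`, `step`, `status`, `details`) are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def make_log_record(run_id: str, step: str, status: str, **details: Any) -> Dict[str, Any]:
    """Build one structured log record as a plain dictionary."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "step": step,
        "status": status,
        "details": details,
    }


records: List[Dict[str, Any]] = [
    make_log_record("run-001", "read_source", "succeeded", rows=1200),
    make_log_record("run-001", "write_table", "failed", error="Path does not exist"),
]

# One JSON document per line is easy to load into a table later.
for record in records:
    print(json.dumps(record))
```

In a notebook, records like these could be collected during the run and appended to a logging table at the end; the exact write step depends on the chosen destination.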

Notebook Failures in Pipelines

Many Fabric notebooks are executed through Data Pipelines.

When a notebook activity fails, pipeline monitoring provides:

Activity status
Error messages
Execution duration
Retry history

Common troubleshooting process:

Identify failed activity
Open notebook run details
Review Spark logs
Identify root cause
Correct notebook logic

Common Production Notebook Issues

Lakehouse Not Attached

Symptoms:

Table not found

Resolution:

Attach correct Lakehouse

Schema Drift

Symptoms:

New columns appear
Data types change

Resolution:

Add schema validation logic
Handle schema evolution
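
Schema validation logic can be as simple as comparing an expected column-to-type mapping with the one actually received. The sketch below works on plain dictionaries; in PySpark, a mapping of this shape can be obtained with `dict(df.dtypes)`. The function name and the example columns are hypothetical.

```python
from typing import Dict, List


def detect_schema_drift(expected: Dict[str, str], actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare column-name-to-type mappings and report the differences."""
    return {
        "new_columns": sorted(set(actual) - set(expected)),
        "missing_columns": sorted(set(expected) - set(actual)),
        "type_changes": sorted(
            f"{col}: {expected[col]} -> {actual[col]}"
            for col in set(expected) & set(actual)
            if expected[col] != actual[col]
        ),
    }


expected = {"order_id": "bigint", "amount": "double", "order_date": "date"}
actual = {"order_id": "bigint", "amount": "string", "order_date": "date", "channel": "string"}

drift = detect_schema_drift(expected, actual)
print(drift)
# {'new_columns': ['channel'], 'missing_columns': [], 'type_changes': ['amount: double -> string']}
```

A notebook could stop early when `type_changes` or `missing_columns` is non-empty and merely log `new_columns`, depending on how strict the pipeline needs to be.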

Large Data Volumes

Symptoms:

Slow execution
Memory failures

Resolution:

Optimize partitions
Filter data earlier
Reduce shuffle operations

Missing Upstream Data

Symptoms:

File not found

Resolution:

Verify ingestion completion
Add dependency checks
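
One possible dependency check is to confirm that an upstream marker file exists and is recent before processing starts. This sketch assumes the upstream job writes a marker such as `_SUCCESS` and that the location is reachable as a local path; both are assumptions, as is the helper name `is_fresh`.

```python
import os
import tempfile
import time
from typing import Optional


def is_fresh(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the file exists and was modified within the allowed window."""
    if not os.path.isfile(path):
        return False
    current = time.time() if now is None else now
    return (current - os.path.getmtime(path)) <= max_age_seconds


# A temporary folder stands in for the upstream output location.
with tempfile.TemporaryDirectory() as root:
    marker = os.path.join(root, "_SUCCESS")
    open(marker, "w").close()
    print(is_fresh(marker, max_age_seconds=3600))                         # True
    print(is_fresh(os.path.join(root, "missing"), max_age_seconds=3600))  # False
```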

Notebook Optimization to Prevent Errors

Proactive optimization reduces future failures.

Best practices include:

Use partition pruning
Cache only when necessary
Avoid excessive collect() operations
Filter data early
Use Delta tables
Monitor Spark resource usage
Implement retry logic where appropriate
Validate input datasets before processing
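
Retry logic is usually implemented with a bounded number of attempts and a growing delay between them. The following is a minimal sketch of that pattern, not a Fabric feature; `run_with_retry`, its parameters, and the `flaky_read` stand-in are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    action: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


calls = {"count": 0}


def flaky_read() -> str:
    """Fail twice, then succeed, to imitate a transient source error."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "data"


# sleep is replaced with a no-op so the demonstration finishes immediately.
print(run_with_retry(flaky_read, sleep=lambda seconds: None))  # data
```

Catching every exception also retries errors that will never succeed, such as a missing table, so in practice the `except` clause is better narrowed to the transient error types of the source system.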

Exam Tips

For the DP-700 exam, remember:

AnalysisException usually indicates missing tables, views, or schema issues.
FileNotFoundException typically indicates invalid paths or missing files.
OutOfMemoryError often indicates resource constraints or inefficient Spark processing.
Notebook debugging frequently involves reviewing Spark logs and cell-level errors.
Lakehouse attachment problems commonly cause table-access failures.
Pipelines provide monitoring information when notebook activities fail.
Exception handling and logging improve operational reliability.
Schema validation helps prevent runtime failures caused by schema drift.
Spark monitoring tools help diagnose performance and execution problems.
Resource optimization can prevent many notebook failures before they occur.

Practice Exam Questions

Question 1

A Fabric notebook fails with the following error:

AnalysisException: Table sales_fact not found

What is the MOST likely cause?

A. Spark cluster memory exhaustion

B. The referenced table does not exist or the wrong Lakehouse is attached

C. Network connectivity failure

D. Missing Python package

Correct Answer: B

Explanation:
AnalysisException commonly occurs when a referenced table, view, or schema object cannot be found. An incorrect Lakehouse attachment is also a frequent cause.

Question 2

A notebook fails with a FileNotFoundException when reading a parquet file.

What should be investigated first?

A. Spark executor configuration

B. Notebook language version

C. Storage path and file existence

D. Semantic model refresh history

Correct Answer: C

Explanation:
FileNotFoundException generally indicates an incorrect path, deleted file, missing shortcut, or unavailable source file.

Question 3

Which tool provides the MOST detailed information about Spark stage failures and executor issues?

A. Semantic model refresh history

B. Power BI usage metrics

C. Workspace role assignments

D. Spark job monitoring details

Correct Answer: D

Explanation:
Spark monitoring provides insight into jobs, stages, tasks, executor failures, and resource utilization.

Question 4

A notebook consistently fails due to Java heap space errors.

What is the MOST likely root cause?

A. Lakehouse attachment issue

B. Missing notebook parameter

C. Insufficient memory for the workload

D. Authentication failure

Correct Answer: C

Explanation:
Java heap space errors typically indicate memory pressure caused by large datasets or inefficient Spark operations.

Question 5

Which practice is MOST useful for isolating the source of a notebook failure?

A. Executing the entire notebook repeatedly

B. Running notebook cells individually and validating outputs

C. Increasing semantic model refresh frequency

D. Deleting Spark logs

Correct Answer: B

Explanation:
Executing cells incrementally helps identify exactly where a failure occurs and simplifies troubleshooting.

Question 6

A notebook references a Python package that is unavailable in the Spark environment.

Which error is MOST likely?

A. ModuleNotFoundError

B. AnalysisException

C. FileNotFoundException

D. TimeoutException

Correct Answer: A

Explanation:
ModuleNotFoundError occurs when required libraries or dependencies are unavailable.

Question 7

Which technique helps detect schema drift before downstream failures occur?

A. Increasing cluster size

B. Restarting the Spark session

C. Validating schemas during ingestion and transformation

D. Disabling logging

Correct Answer: C

Explanation:
Schema validation identifies unexpected columns, missing fields, or data type changes before they impact processing.

Question 8

A notebook activity fails within a Fabric pipeline.

Where should an engineer typically begin troubleshooting?

A. Power BI report usage metrics

B. Semantic model refresh schedule

C. Workspace branding settings

D. Pipeline activity run details and notebook execution logs

Correct Answer: D

Explanation:
Pipeline activity logs provide error messages, execution status, duration, and links to notebook execution details.

Question 9

Which action can help reduce the likelihood of OutOfMemoryError exceptions?

A. Using partition pruning and filtering data early

B. Disabling Spark monitoring

C. Removing notebook logging

D. Creating additional semantic models

Correct Answer: A

Explanation:
Reducing data volume processed by Spark lowers memory requirements and improves execution efficiency.

Question 10

Why should exception handling be implemented in production notebooks?

A. To eliminate all Spark errors

B. To increase Lakehouse storage capacity

C. To improve report rendering speed

D. To capture errors gracefully and improve troubleshooting

Correct Answer: D

Explanation:
Exception handling enables controlled failure behavior, better logging, easier diagnosis, and more resilient notebook execution.

Go to the DP-700 Exam Prep Hub main page.


DP-700, Microsoft Certification, Microsoft Fabric June 3, 2026

Choose Between Dataflows Gen2, Notebooks, KQL, and T-SQL for data transformation (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform batch data
      --> Choose Between Dataflows Gen2, Notebooks, KQL, and T-SQL for data transformation

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

Microsoft Fabric provides multiple technologies for transforming data. One of the most common challenges for a Data Engineer is determining which transformation tool is best suited for a specific business requirement.

The DP-700 exam frequently tests your ability to select the appropriate transformation technology based on:

Data volume
Data complexity
Required programming skills
Data source type
Performance requirements
Real-time versus batch processing needs
User expertise
Maintainability

The four most important transformation technologies covered in the exam are:

Dataflows Gen2
Notebooks
KQL
T-SQL

Although all four can transform data, they are optimized for different workloads and use cases.

Understanding their strengths, limitations, and ideal scenarios is critical for success on the DP-700 exam.

Overview of Transformation Technologies

Technology	Primary Purpose	Best For
Dataflows Gen2	Low-code ETL	Business-friendly transformations
Notebooks	Advanced engineering and Spark processing	Large-scale data engineering
T-SQL	Relational transformations	Warehouses and SQL workloads
KQL	Real-time analytics and telemetry processing	Logs and streaming data

Dataflows Gen2

What Are Dataflows Gen2?

Dataflows Gen2 are low-code data transformation tools within Microsoft Fabric that use Power Query.

They allow users to:

Connect to data sources
Clean data
Transform data
Load data into Fabric destinations

without writing significant amounts of code.

Transformation Engine

Dataflows Gen2 use:

Power Query
M Language (behind the scenes)

Most transformations are performed through a graphical interface.

Typical Transformations

Examples include:

Renaming columns
Removing duplicates
Filtering rows
Merging datasets
Splitting columns
Data type conversions
Calculated columns

When to Use Dataflows Gen2

Choose Dataflows Gen2 when:

Low-code development is desired
Data volumes are moderate
Business analysts participate in development
Transformations are relatively straightforward
Self-service data preparation is required

Examples:

Preparing Excel data
Cleaning CSV files
Combining multiple business datasets
Standard ETL processes

Advantages

Low-Code Experience

Minimal coding required.

Large Connector Library

Supports numerous source systems.

Easy Maintenance

Visual transformation steps are easier to understand.

Integration with Fabric

Loads directly into:

Lakehouses
Warehouses
Other Fabric destinations

Limitations

Less Flexible

Complex logic may become difficult.

Not Ideal for Very Large Data Volumes

Spark-based solutions often scale better.

Limited Advanced Programming

Compared to notebooks.

Notebooks

What Are Notebooks?

Notebooks are code-based development environments that support:

PySpark
Python
Scala
Spark SQL
R

within Microsoft Fabric.

Transformation Engine

Notebooks execute on Spark clusters.

This enables:

Distributed processing
Parallel execution
Large-scale transformations

Typical Transformations

Examples:

Complex joins
Data enrichment
Machine learning preparation
Feature engineering
Data quality validation
Custom business logic

When to Use Notebooks

Choose notebooks when:

Large data volumes exist
Spark processing is required
Advanced transformations are needed
Custom programming is necessary
Machine learning integration is planned

Examples:

Processing billions of records
Data science workflows
Medallion architecture pipelines
Complex transformations

Advantages

Massive Scalability

Handles large datasets efficiently.

Flexible Programming

Supports multiple languages.

Machine Learning Integration

Works with Spark ML libraries.

Advanced Data Engineering

Ideal for enterprise-scale pipelines.

Limitations

Requires Coding Skills

Less accessible for business users.

More Complex Development

Compared to Dataflows Gen2.

T-SQL

What Is T-SQL?

T-SQL (Transact-SQL) is Microsoft’s extension of SQL.

Fabric Warehouses and SQL endpoints support T-SQL for:

Querying
Transforming
Managing relational data

Transformation Techniques

Common operations include:

SELECT
JOIN
GROUP BY
CASE
Common table expressions (CTEs)
MERGE
Window functions

When to Use T-SQL

Choose T-SQL when:

Data resides in a Warehouse
Relational transformations are required
SQL expertise already exists
Dimensional models are being built

Examples:

Fact table loading
Dimension updates
Data warehouse ETL
Reporting data preparation

Advantages

Familiar Language

Widely used by data professionals.

Excellent Relational Processing

Optimized for structured data.

Strong Performance

Particularly for warehouse workloads.

Easy Integration

Works naturally with BI tools.

Limitations

Less Suitable for Unstructured Data

Not ideal for files and raw data.

Limited Distributed Processing

Compared to Spark.

KQL

What Is KQL?

Kusto Query Language (KQL) is designed for:

Log analytics
Telemetry analysis
Real-time data processing
Event analytics

KQL is commonly used in:

KQL Databases
Eventhouse
Real-Time Intelligence

Typical Transformations

Examples include:

Filtering events
Aggregations
Pattern detection
Time-series analysis
Stream transformations

When to Use KQL

Choose KQL when:

Working with telemetry data
Processing logs
Analyzing streaming events
Building real-time dashboards

Examples:

Sensor monitoring
Application logs
Security analytics
Operational monitoring

Advantages

Optimized for Time-Series Data

Excellent for event-driven workloads.

Fast Query Performance

Handles large event volumes efficiently.

Real-Time Analytics

Supports low-latency analysis.

Limitations

Not a General ETL Tool

Less suitable for traditional batch ETL.

Not Designed for Dimensional Modeling

Warehouses are generally better for reporting models.

Comparing Transformation Technologies

Requirement	Dataflows Gen2	Notebooks	T-SQL	KQL
Low-Code Development	Excellent	Poor	Moderate	Moderate
Large-Scale Processing	Moderate	Excellent	Good	Excellent
Relational Transformations	Moderate	Good	Excellent	Limited
Streaming Analytics	Limited	Moderate	Poor	Excellent
Machine Learning Support	Poor	Excellent	Poor	Limited
Telemetry Analytics	Poor	Moderate	Poor	Excellent
Business User Friendly	Excellent	Poor	Moderate	Moderate
Advanced Programming	Limited	Excellent	Moderate	Limited

Decision Framework

Choose Dataflows Gen2 When:

Low-code ETL is preferred
Business users are involved
Data volumes are moderate
Transformations are straightforward

Choose Notebooks When:

Spark processing is required
Data volumes are large
Complex transformations exist
Machine learning is involved

Choose T-SQL When:

Working with a Warehouse
Building dimensional models
SQL skills are available
Data is highly structured

Choose KQL When:

Processing logs
Analyzing telemetry
Supporting streaming analytics
Building operational monitoring solutions
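
The decision framework above can be practiced as a keyword-matching exercise. The sketch below scores a scenario description against keywords drawn from the four lists; the keyword choices and the `suggest_technology` function are illustrative assumptions and a study aid only, not an official selection rule.

```python
from typing import Dict, List

# Keywords per technology, drawn from the decision framework above.
KEYWORDS: Dict[str, List[str]] = {
    "Dataflows Gen2": ["low-code", "visual", "business analyst", "self-service", "excel"],
    "Notebook": ["spark", "billions", "machine learning", "python", "feature engineering"],
    "T-SQL": ["warehouse", "fact", "dimension", "relational", "sql"],
    "KQL": ["log", "telemetry", "streaming", "real-time", "sensor", "event"],
}


def suggest_technology(scenario: str) -> str:
    """Return the technology whose keywords match the scenario most often."""
    text = scenario.lower()
    scores = {tech: sum(word in text for word in words) for tech, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "No clear match"


print(suggest_technology("A warehouse team must populate fact and dimension tables."))
# T-SQL
print(suggest_technology("An operations team analyzes millions of application log events each hour."))
# KQL
```

Substring matching is deliberately naive (for example, "log" also matches "logic"), and ties go to the first technology in the dictionary, so real scenarios still need to be read for their primary workload.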

Common DP-700 Scenario Questions

Scenario 1

A business analyst needs to combine Excel spreadsheets and remove duplicate rows using a visual interface.

Best choice:

Dataflows Gen2

Scenario 2

A data engineer must transform billions of records stored in a Lakehouse.

Best choice:

Notebook

Scenario 3

A warehouse team must populate fact and dimension tables.

Best choice:

T-SQL

Scenario 4

An operations team analyzes millions of application log events each hour.

Best choice:

KQL

Scenario 5

A machine learning team requires custom Python transformations.

Best choice:

Notebook

Exam Tips

Many DP-700 questions ask not which tool can perform a transformation, but which tool should perform it.

Remember these associations:

Requirement	Best Choice
Visual ETL	Dataflows Gen2
Spark processing	Notebook
Data warehouse transformations	T-SQL
Telemetry and logs	KQL
Machine learning preparation	Notebook
Self-service data preparation	Dataflows Gen2
Streaming analytics	KQL

Practice Exam Questions

Question 1

A business analyst needs to cleanse CSV files using a graphical interface with minimal coding. Which transformation technology should be used?

A. T-SQL

B. Notebook

C. KQL

D. Dataflows Gen2

Answer: D

Explanation

Dataflows Gen2 provide a low-code, visual interface that is ideal for business users and simple ETL processes.

Question 2

A data engineer must process several billion records stored in a Lakehouse using distributed computing.

Which option should be selected?

A. Notebook

B. Dataflows Gen2

C. T-SQL

D. KQL

Answer: A

Explanation

Notebooks leverage Spark for distributed processing and are designed for large-scale data transformations.

Question 3

Which technology is specifically optimized for transforming and analyzing telemetry and log data?

A. Dataflows Gen2

B. Notebook

C. KQL

D. T-SQL

Answer: C

Explanation

KQL is designed for log analytics, telemetry processing, and real-time operational analytics.

Question 4

A team is loading dimension and fact tables within a Fabric Warehouse.

Which transformation technology is most appropriate?

A. Notebook

B. Dataflows Gen2

C. KQL

D. T-SQL

Answer: D

Explanation

T-SQL is the preferred technology for relational transformations in Fabric Warehouses.

Question 5

A company requires machine learning feature engineering using Python libraries.

Which technology should be selected?

A. Notebook

B. Dataflows Gen2

C. T-SQL

D. KQL

Answer: A

Explanation

Notebooks support Python, Spark, and machine learning frameworks, making them ideal for feature engineering.

Question 6

Which technology relies primarily on Power Query transformations?

A. Notebook

B. Dataflows Gen2

C. T-SQL

D. KQL

Answer: B

Explanation

Dataflows Gen2 use Power Query and the M language behind the scenes for data transformations.

Question 7

An operations team needs to perform real-time aggregations on streaming sensor data.

Which option should be used?

A. Dataflows Gen2

B. Notebook

C. KQL

D. T-SQL

Answer: C

Explanation

KQL is optimized for real-time event processing and telemetry analysis.

Question 8

A data engineer needs maximum flexibility to implement custom business logic across multiple data sources.

Which technology is most appropriate?

A. KQL

B. Dataflows Gen2

C. T-SQL

D. Notebook

Answer: D

Explanation

Notebooks provide the highest degree of customization through programming languages such as Python and PySpark.

Question 9

A team already has extensive SQL expertise and needs to transform highly structured relational data in a Warehouse.

Which option is best?

A. Notebook

B. T-SQL

C. Dataflows Gen2

D. KQL

Answer: B

Explanation

T-SQL is optimized for relational transformations and leverages existing SQL skills.

Question 10

Which technology is generally the most business-user-friendly option for creating batch data transformation processes?

A. Notebook

B. KQL

C. T-SQL

D. Dataflows Gen2

Answer: D

Explanation

Dataflows Gen2 provide a visual, low-code experience that is easier for business users and citizen developers than code-based solutions.

DP-700 Exam Summary

When deciding between transformation technologies, focus on the primary workload:

Dataflows Gen2 → Low-code ETL and self-service data preparation
Notebooks → Spark, large-scale processing, advanced engineering, and machine learning
T-SQL → Relational transformations and warehouse development
KQL → Telemetry, logs, time-series analytics, and real-time event processing

A common DP-700 exam strategy is to identify the keywords in the scenario: