This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Monitor and optimize an analytics solution (30–35%)
--> Identify and resolve errors
--> Identify and resolve notebook errors
Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
Notebook troubleshooting is an important skill for the DP-700 certification exam because notebooks are one of the primary tools used for data ingestion, transformation, orchestration, machine learning, and advanced analytics in Microsoft Fabric. Data engineers must be able to quickly identify failures, interpret error messages, diagnose root causes, and implement corrective actions.
This topic focuses on understanding notebook execution, common notebook errors, monitoring tools, debugging techniques, and best practices for building reliable notebook solutions in Microsoft Fabric.
Understanding Notebooks in Microsoft Fabric
A notebook is an interactive development environment that allows engineers to write and execute code using:
- PySpark
- Spark SQL
- Scala
- Python
- R (where supported)
Fabric notebooks run on Spark clusters and are commonly used to:
- Ingest data into Lakehouses
- Transform data
- Build ETL processes
- Execute streaming workloads
- Perform data quality checks
- Orchestrate complex data engineering workflows
Because notebooks often process large datasets and depend on external systems, failures are inevitable. Effective troubleshooting is therefore a critical data engineering skill.
Common Categories of Notebook Errors
Notebook failures generally fall into several categories:
Syntax Errors
These occur when code violates language rules.
Example
df = spark.read.csv("/Files/data.csv"
Error:
SyntaxError: '(' was never closed (Python versions before 3.10 report "unexpected EOF while parsing")
Cause:
- Missing closing parenthesis
Resolution:
- Review code carefully
- Use notebook syntax highlighting
- Validate code before execution
Runtime Errors
Runtime errors occur when code is syntactically correct but fails during execution.
Example
value = 100 / 0
Error:
ZeroDivisionError
Cause:
- Division by zero
Resolution:
- Add validation logic
- Implement exception handling
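A minimal sketch of the two resolutions above, validation and exception handling, applied to the division example. The function name `safe_ratio` and its `default` parameter are hypothetical and exist only for illustration.

```python
def safe_ratio(numerator, denominator, default=None):
    """Return numerator / denominator, or `default` when the division is invalid."""
    # Validation: reject a zero denominator before dividing.
    if denominator == 0:
        return default
    try:
        return numerator / denominator
    except TypeError:
        # Exception handling: non-numeric input (for example None) uses the default.
        return default


print(safe_ratio(100, 4))  # 25.0
print(safe_ratio(100, 0))  # None
```

Returning a default suits optional metrics. For values the rest of the notebook depends on, raising a descriptive error is usually the safer choice.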
Data Access Errors
These are among the most common notebook failures.
Examples
- File not found
- Table not found
- Permission denied
- Invalid storage path
Example:
df = spark.read.parquet("/Files/Sales2025")
Error:
Path does not exist
Possible causes:
- Incorrect path
- Deleted file
- Typographical error
- Missing shortcut
Resolution:
- Verify file location
- Confirm OneLake shortcut configuration
- Check permissions
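The path verification can be scripted so that a missing input is reported before Spark attempts the read. The sketch below is plain Python. `find_missing_paths` and `require_paths` are hypothetical helpers. The `exists` argument defaults to `os.path.exists`, which only sees local or mounted file systems. Whether a Lakehouse path is visible to that check depends on how the notebook mounts storage, so treat the default as an assumption and pass a different check where needed.

```python
import os
from typing import Callable, Iterable, List


def find_missing_paths(
    paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the paths for which the `exists` check fails, in input order."""
    return [p for p in paths if not exists(p)]


def require_paths(
    paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> None:
    """Raise one error that lists every missing path instead of failing on the first."""
    missing = find_missing_paths(paths, exists)
    if missing:
        raise FileNotFoundError(f"Missing input path(s): {', '.join(missing)}")
```

Calling `require_paths([...])` at the top of a notebook turns a late "Path does not exist" failure into an early error that names every missing input at once.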
Authentication and Authorization Errors
A notebook may be unable to access resources because the user or service principal lacks required permissions.
Examples:
Access Denied
Unauthorized
Permission denied
Common causes:
- Workspace role limitations
- Missing Lakehouse permissions
- Source-system authentication failures
Resolution:
- Verify workspace access
- Confirm security settings
- Validate credentials
Spark Resource Errors
Spark jobs require compute resources.
Failures may occur because of:
- Insufficient memory
- Driver overload
- Executor failures
- Large shuffle operations
Typical errors:
OutOfMemoryError
ExecutorLostFailure
Driver memory exceeded
Resolution:
- Increase Spark resources
- Optimize queries
- Partition data appropriately
- Reduce data movement
Dependency Errors
Notebook code may depend on external packages.
Example:
import pandas_profiling
Error:
ModuleNotFoundError
Cause:
- Package not installed
Resolution:
- Install required libraries
- Use supported package versions
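One way to surface a missing dependency before any processing starts is to check up front whether each module can be located. This is a sketch using the standard library's `importlib.util.find_spec`. The helper name `find_missing_modules` is hypothetical, and the check only confirms that a top-level module can be found, not that its version is supported.

```python
import importlib.util
from typing import Iterable, List


def find_missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


required = ["json", "pandas_profiling"]
missing = find_missing_modules(required)
if missing:
    print(f"Missing packages: {missing}")
```

Installing whatever is reported remains a separate step, for example through an in-line `%pip install` or the environment attached to the notebook.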
Monitoring Notebook Execution
Fabric provides several methods for monitoring notebook runs.
Notebook Run Status
Execution status may show:
- Running
- Completed
- Failed
- Cancelled
A failed run should always be investigated using execution logs.
Cell-Level Error Analysis
Notebook failures typically identify:
- Failed cell
- Error type
- Line number
- Stack trace
Example:
Cell 8 failed
AnalysisException
Table not found
This information significantly narrows troubleshooting efforts.
Spark Job Monitoring
Fabric allows engineers to inspect Spark jobs generated by notebook execution.
Useful information includes:
- Job duration
- Task failures
- Stage failures
- Resource utilization
- Data shuffle activity
This information is particularly valuable for performance-related issues.
Reading Spark Error Messages
One of the most important DP-700 skills is interpreting Spark exceptions.
AnalysisException
Example:
AnalysisException: Table customer_dim not found
Cause:
- Missing table
- Incorrect table name
- Incorrect Lakehouse attachment
Resolution:
- Verify table existence
- Check notebook Lakehouse context
FileNotFoundException
Example:
FileNotFoundException
Cause:
- Missing file
- Incorrect path
Resolution:
- Validate storage path
- Confirm file availability
OutOfMemoryError
Example:
Java heap space
Cause:
- Dataset too large
- Inefficient transformations
Resolution:
- Optimize Spark processing
- Use partitioning
- Increase cluster resources
NullPointerException
Cause:
- Unexpected null values
- Missing objects
Resolution:
- Validate inputs
- Add null handling
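The exception-to-resolution mapping above can be captured as a small lookup, which is handy when attaching a first-pass hint to logged failures. This is only an illustrative sketch. `TRIAGE_HINTS` and `triage` are hypothetical names, and simple substring matching will not cover every Spark error message.

```python
# Hypothetical lookup table assembled from the exception notes in this section.
TRIAGE_HINTS = {
    "AnalysisException": "Verify the table exists and check the notebook's Lakehouse context",
    "FileNotFoundException": "Validate the storage path and confirm the file is available",
    "OutOfMemoryError": "Optimize Spark processing, use partitioning, or increase resources",
    "Java heap space": "Optimize Spark processing, use partitioning, or increase resources",
    "NullPointerException": "Validate inputs and add null handling",
}


def triage(error_text: str) -> str:
    """Return the hint for the first known marker found in the error text."""
    for marker, hint in TRIAGE_HINTS.items():
        if marker in error_text:
            return hint
    return "No known pattern matched: read the full stack trace"


print(triage("AnalysisException: Table customer_dim not found"))
```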
Debugging Techniques
Execute Incrementally
Rather than running an entire notebook:
- Run cells individually
- Verify outputs
- Isolate failures
This approach greatly reduces troubleshooting time.
Inspect Intermediate Results
Example:
df.show()
or
display(df)
Benefits:
- Verify schema
- Validate transformations
- Detect null values
- Confirm expected row counts
Check Schemas
Schema mismatches are a common source of errors.
Example:
df.printSchema()
Verify:
- Column names
- Data types
- Nullable settings
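Reading `printSchema()` output by eye works for ad hoc debugging. For repeated runs, the same check can be automated. The sketch below compares an expected column-to-type mapping with a list of `(name, type)` pairs, which is the shape PySpark's `df.dtypes` returns. `schema_diff` is a hypothetical helper, the column names are invented for the example, and nullable settings are not covered.

```python
from typing import Dict, List, Tuple


def schema_diff(
    expected: Dict[str, str],
    actual_dtypes: List[Tuple[str, str]],
) -> Dict[str, list]:
    """Compare expected column types with (name, type) pairs such as df.dtypes."""
    actual = dict(actual_dtypes)
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        # (column, expected type, actual type) for every mismatched column
        "type_changed": sorted(
            (name, expected[name], actual[name])
            for name in shared
            if expected[name] != actual[name]
        ),
    }


expected = {"order_id": "bigint", "amount": "double", "order_date": "date"}
actual = [("order_id", "bigint"), ("amount", "string"), ("region", "string")]
print(schema_diff(expected, actual))
```

In a notebook the second argument would typically be `df.dtypes`. A result with three empty lists means the schema matches the expectation.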
Validate Row Counts
Example:
df.count()
Useful for identifying:
- Missing records
- Unexpected filtering
- Data quality issues
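Row counts are most useful when compared between stages rather than inspected one at a time. As a sketch, the hypothetical helper below takes counts that were already collected (for example from `df.count()` after each transformation) and flags any step where the count falls by more than a chosen ratio. The stage names and the 10% threshold are assumptions for illustration.

```python
def row_count_drops(stage_counts, max_drop_ratio=0.1):
    """Flag consecutive stages where the row count falls by more than max_drop_ratio."""
    flagged = []
    stages = list(stage_counts.items())  # dicts keep insertion order
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        if prev_count > 0 and (prev_count - count) / prev_count > max_drop_ratio:
            flagged.append((prev_name, name, prev_count, count))
    return flagged


counts = {"raw": 1000, "deduplicated": 980, "filtered": 400}
print(row_count_drops(counts))
```

Here only the step from `deduplicated` to `filtered` is flagged, which points at the filter as the place to look for unexpectedly removed records.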
Exception Handling
PySpark notebooks can implement error handling using Python exceptions.
Example:
try:
    df = spark.read.parquet(path)
except Exception as e:
    print(e)
Benefits:
- Graceful failure handling
- Better logging
- Easier troubleshooting
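One caveat with catching `Exception` and only printing it is that the cell then completes normally, which can hide the failure from whatever scheduled the run. A common alternative, sketched here with the standard `logging` module, is to log the error and re-raise it. `run_step` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("notebook")


def run_step(name, func, *args, **kwargs):
    """Run one notebook step, log any failure with its traceback, then re-raise."""
    try:
        result = func(*args, **kwargs)
    except Exception:
        # logger.exception records the message plus the full stack trace.
        logger.exception("Step '%s' failed", name)
        raise  # re-raise so the failure still propagates to the caller
    logger.info("Step '%s' succeeded", name)
    return result


total = run_step("add", lambda a, b: a + b, 2, 3)
print(total)  # 5
```

In a Spark notebook the wrapped call might be `run_step("read sales", spark.read.parquet, path)`, which keeps the traceback in the log while still letting the run fail visibly.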
Logging Best Practices
Rather than relying on error output alone, log a progress message at each major step so that a failure can be traced to a specific stage.
Example:
print("Starting ingestion...")
print("Reading source data...")
print("Writing destination table...")
Benefits:
- Easier root-cause analysis
- Better operational monitoring
- Faster issue resolution
Many organizations write logs to:
- Lakehouse tables
- Monitoring databases
- Log Analytics environments
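Plain `print` messages are easy to add but hard to query once they are written to a table or log store. A minimal sketch of structured logging with Python's standard `logging` module follows, emitting one JSON object per message. The `JsonFormatter` class, the `build_logger` helper, and the `step` field are hypothetical. The in-memory buffer stands in for whichever destination is actually used.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "step": getattr(record, "step", None),  # set through `extra`
        })


def build_logger(stream, name="ingestion"):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers when a cell is re-run
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("Reading source data", extra={"step": "read"})
log.info("Writing destination table", extra={"step": "write"})
print(buffer.getvalue())
```

Because every line is valid JSON with the same fields, the output can be loaded into a table and filtered by `step` or `level` during root-cause analysis.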
Notebook Failures in Pipelines
Many Fabric notebooks are executed through Data Pipelines.
When a notebook activity fails, pipeline monitoring provides:
- Activity status
- Error messages
- Execution duration
- Retry history
Common troubleshooting process:
- Identify failed activity
- Open notebook run details
- Review Spark logs
- Identify root cause
- Correct notebook logic
Common Production Notebook Issues
Lakehouse Not Attached
Symptoms:
Table not found
Resolution:
- Attach correct Lakehouse
Schema Drift
Symptoms:
- New columns appear
- Data types change
Resolution:
- Add schema validation logic
- Handle schema evolution
Large Data Volumes
Symptoms:
- Slow execution
- Memory failures
Resolution:
- Optimize partitions
- Filter data earlier
- Reduce shuffle operations
Missing Upstream Data
Symptoms:
File not found
Resolution:
- Verify ingestion completion
- Add dependency checks
Notebook Optimization to Prevent Errors
Proactive optimization reduces future failures.
Best practices include:
- Use partition pruning
- Cache only when necessary
- Avoid excessive collect() operations
- Filter data early
- Use Delta tables
- Monitor Spark resource usage
- Implement retry logic where appropriate
- Validate input datasets before processing
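As an illustration of the retry item in the list above, the sketch below retries a callable with exponential backoff. It is generic Python rather than a Fabric feature. `retry` and its parameters are hypothetical, and the `sleep` argument is injectable only so the behaviour can be checked without waiting.

```python
import time


def retry(func, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


calls = []


def flaky_read():
    """Simulated source that fails twice before succeeding."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []
print(retry(flaky_read, attempts=3, base_delay=1.0, sleep=delays.append))  # ok
```

Retries suit transient problems such as a briefly unavailable source. Narrowing `retry_on` to those error types avoids repeating work that would fail the same way every time, such as a missing table.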
Exam Tips
For the DP-700 exam, remember:
- AnalysisException usually indicates missing tables, views, or schema issues.
- FileNotFoundException typically indicates invalid paths or missing files.
- OutOfMemoryError often indicates resource constraints or inefficient Spark processing.
- Notebook debugging frequently involves reviewing Spark logs and cell-level errors.
- Lakehouse attachment problems commonly cause table-access failures.
- Pipelines provide monitoring information when notebook activities fail.
- Exception handling and logging improve operational reliability.
- Schema validation helps prevent runtime failures caused by schema drift.
- Spark monitoring tools help diagnose performance and execution problems.
- Resource optimization can prevent many notebook failures before they occur.
Practice Exam Questions
Question 1
A Fabric notebook fails with the following error:
AnalysisException: Table sales_fact not found
What is the MOST likely cause?
A. Spark cluster memory exhaustion
B. The referenced table does not exist or the wrong Lakehouse is attached
C. Network connectivity failure
D. Missing Python package
Correct Answer: B
Explanation:
AnalysisException commonly occurs when a referenced table, view, or schema object cannot be found. An incorrect Lakehouse attachment is also a frequent cause.
Question 2
A notebook fails with a FileNotFoundException when reading a parquet file.
What should be investigated first?
A. Spark executor configuration
B. Notebook language version
C. Storage path and file existence
D. Semantic model refresh history
Correct Answer: C
Explanation:
FileNotFoundException generally indicates an incorrect path, deleted file, missing shortcut, or unavailable source file.
Question 3
Which tool provides the MOST detailed information about Spark stage failures and executor issues?
A. Semantic model refresh history
B. Power BI usage metrics
C. Workspace role assignments
D. Spark job monitoring details
Correct Answer: D
Explanation:
Spark monitoring provides insight into jobs, stages, tasks, executor failures, and resource utilization.
Question 4
A notebook consistently fails due to Java heap space errors.
What is the MOST likely root cause?
A. Lakehouse attachment issue
B. Missing notebook parameter
C. Insufficient memory for the workload
D. Authentication failure
Correct Answer: C
Explanation:
Java heap space errors typically indicate memory pressure caused by large datasets or inefficient Spark operations.
Question 5
Which practice is MOST useful for isolating the source of a notebook failure?
A. Executing the entire notebook repeatedly
B. Running notebook cells individually and validating outputs
C. Increasing semantic model refresh frequency
D. Deleting Spark logs
Correct Answer: B
Explanation:
Executing cells incrementally helps identify exactly where a failure occurs and simplifies troubleshooting.
Question 6
A notebook references a Python package that is unavailable in the Spark environment.
Which error is MOST likely?
A. ModuleNotFoundError
B. AnalysisException
C. FileNotFoundException
D. TimeoutException
Correct Answer: A
Explanation:
ModuleNotFoundError occurs when required libraries or dependencies are unavailable.
Question 7
Which technique helps detect schema drift before downstream failures occur?
A. Increasing cluster size
B. Restarting the Spark session
C. Validating schemas during ingestion and transformation
D. Disabling logging
Correct Answer: C
Explanation:
Schema validation identifies unexpected columns, missing fields, or data type changes before they impact processing.
Question 8
A notebook activity fails within a Fabric pipeline.
Where should an engineer typically begin troubleshooting?
A. Power BI report usage metrics
B. Semantic model refresh schedule
C. Workspace branding settings
D. Pipeline activity run details and notebook execution logs
Correct Answer: D
Explanation:
Pipeline activity logs provide error messages, execution status, duration, and links to notebook execution details.
Question 9
Which action can help reduce the likelihood of OutOfMemoryError exceptions?
A. Using partition pruning and filtering data early
B. Disabling Spark monitoring
C. Removing notebook logging
D. Creating additional semantic models
Correct Answer: A
Explanation:
Reducing data volume processed by Spark lowers memory requirements and improves execution efficiency.
Question 10
Why should exception handling be implemented in production notebooks?
A. To eliminate all Spark errors
B. To increase Lakehouse storage capacity
C. To improve report rendering speed
D. To capture errors gracefully and improve troubleshooting
Correct Answer: D
Explanation:
Exception handling enables controlled failure behavior, better logging, easier diagnosis, and more resilient notebook execution.
