This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Implement and manage an analytics solution (30–35%)
   --> Orchestrate processes
      --> Choose between Dataflow Gen2, a pipeline and a notebook

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important skills for a Microsoft Fabric Data Engineer is selecting the appropriate tool for a particular task. Microsoft Fabric provides several powerful technologies for data ingestion, transformation, orchestration, and automation. Three of the most commonly used are:

Dataflow Gen2
Data Pipelines
Notebooks

Although these tools often work together, they serve different purposes. Choosing the wrong tool can lead to unnecessary complexity, reduced maintainability, and increased development effort.

For the DP-700 exam, you should understand:

The primary purpose of each tool
When to use each tool
Strengths and limitations
Common design patterns
How these tools interact with one another

A significant number of DP-700 scenario questions are likely to test your ability to determine which Fabric component best fits a given business requirement.

Understanding the Three Tools

Before comparing them, it is important to understand their primary functions.

Tool	Primary Purpose
Dataflow Gen2	Low-code data ingestion and transformation
Data Pipeline	Workflow orchestration and automation
Notebook	Code-based data processing and advanced analytics

A useful way to remember this is:

			
Dataflow Gen2 = Transform Data
Pipeline = Orchestrate Processes
Notebook = Execute Code

What Is Dataflow Gen2?

Dataflow Gen2 is a low-code/no-code data integration and transformation tool built on Power Query technology.

It allows users to:

Connect to data sources
Clean data
Transform data
Merge datasets
Filter records
Load data into Fabric destinations

Dataflow Gen2 is designed for users who prefer visual development rather than coding.

Dataflow Gen2 Architecture

			
Data Source
      ↓
Power Query Transformations
      ↓
Dataflow Gen2
      ↓
Lakehouse / Warehouse

		

The transformation logic is built using a graphical interface.

Common Dataflow Gen2 Tasks

Examples include:

Removing duplicates
Filtering rows
Renaming columns
Data cleansing
Combining files
Joining datasets
Type conversions
Data standardization

These activities require little or no programming.

Advantages of Dataflow Gen2

Low-Code Development

Business analysts and citizen developers can build transformations without extensive coding knowledge.

Reusable Transformations

Transformations can be reused across multiple projects.

Familiar Power Query Experience

Users familiar with Power BI often adapt quickly.

Large Connector Library

Supports many cloud and on-premises data sources.

Limitations of Dataflow Gen2

Dataflow Gen2 is not ideal for:

Complex machine learning workloads
Advanced Spark processing
Custom Python development
Large-scale distributed programming

For those scenarios, notebooks are often more appropriate.

What Is a Data Pipeline?

A Data Pipeline is an orchestration tool.

Its primary purpose is not data transformation but rather coordinating and automating activities.

Think of a pipeline as a workflow engine.

Pipeline Architecture

			
Activity 1
      ↓
Activity 2
      ↓
Activity 3
      ↓
Activity 4

		

Pipelines determine:

What runs
When it runs
In what order it runs
Under what conditions it runs
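
The four questions above can be modeled in a few lines of plain Python. This is only a conceptual sketch of orchestration, not the Fabric pipeline API: the activity names, the `depends_on` map, and the `run_pipeline` helper are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical activities: each name maps to the activities it depends on.
depends_on = {
    "copy_data": set(),
    "run_dataflow": {"copy_data"},
    "run_notebook": {"run_dataflow"},
    "send_notification": {"run_notebook"},
}

def run_pipeline(depends_on, should_run=lambda name: True):
    """Run activities in dependency order, skipping any whose condition is false.

    An activity is also skipped when one of its upstream activities did not run.
    """
    executed = []
    skipped = set()
    for name in TopologicalSorter(depends_on).static_order():
        upstream_skipped = any(dep in skipped for dep in depends_on[name])
        if upstream_skipped or not should_run(name):
            skipped.add(name)
            continue
        executed.append(name)  # a real pipeline would start the activity here
    return executed

order = run_pipeline(depends_on)
partial = run_pipeline(depends_on, should_run=lambda name: name != "run_dataflow")
print(order)
print(partial)
```

The sketch does no data work itself. It only decides which steps run and in what sequence, which is the role the pipeline plays in Fabric.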

Common Pipeline Activities

Examples include:

Copy data
Notebook (runs a notebook)
Dataflow (runs a Dataflow Gen2)
Stored procedure
Web
Conditional logic (for example, If Condition)

Pipelines coordinate these activities into a complete workflow, which can then be run on a schedule.

Advantages of Pipelines

Workflow Automation

Automates complex end-to-end processes.

Scheduling

Supports recurring execution schedules.

Dependency Management

Controls execution order.

Error Handling

Supports retries and failure paths.
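
As a rough illustration of what "retries and failure paths" means, the following Python sketch retries a failing step and then falls back to a failure handler. It is a conceptual model only. The `run_with_retry` helper and the sample activities are hypothetical, and in Fabric this behavior is configured on pipeline activities rather than written as code.

```python
def run_with_retry(activity, max_retries=2, on_failure=None):
    """Call `activity`; on an exception, retry up to `max_retries` more times.

    If every attempt fails, call `on_failure` (the failure path) with the
    last error and return its result. Without a failure path, re-raise.
    """
    last_error = None
    for _ in range(1 + max_retries):
        try:
            return activity()
        except Exception as exc:
            last_error = exc
    if on_failure is not None:
        return on_failure(last_error)
    raise last_error

# Hypothetical flaky activity: fails twice, then succeeds on the third call.
attempts = {"count": 0}

def flaky_copy():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient source error")
    return "copied"

def always_fails():
    raise RuntimeError("source unavailable")

result = run_with_retry(flaky_copy, max_retries=2)
fallback = run_with_retry(
    always_fails, max_retries=1, on_failure=lambda err: f"notified: {err}"
)
print(result, fallback)
```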

Integration

Can orchestrate multiple Fabric components.

Limitations of Pipelines

Pipelines are not intended for:

Complex data transformations
Interactive analysis
Advanced programming

Pipelines orchestrate work; they do not replace transformation tools.

What Is a Notebook?

A notebook is a code-based environment that allows developers and data engineers to execute code directly against Fabric data.

Notebooks commonly use:

Python
PySpark
Spark SQL
Scala (where supported)

They run on Spark compute engines.

Notebook Architecture

			
Data Source
      ↓
Spark Processing
      ↓
Notebook
      ↓
Lakehouse / Warehouse

		

Notebooks provide maximum flexibility and control.

Common Notebook Tasks

Examples include:

PySpark transformations
Data engineering workflows
Machine learning preparation
Advanced data cleansing
Streaming data processing
Delta table optimization
Custom business logic

Advantages of Notebooks

Full Programming Flexibility

Developers can implement virtually any logic.

Spark Integration

Supports distributed processing.

Advanced Transformations

Suitable for highly complex data engineering workloads.

Machine Learning Support

Works well with AI and ML frameworks.

Scalability

Can process very large datasets.

Limitations of Notebooks

Coding Required

Requires programming knowledge.

Higher Complexity

Can be more difficult to maintain.

Less Accessible

Business users without coding skills typically find Dataflow Gen2 more approachable.

Side-by-Side Comparison

Feature	Dataflow Gen2	Pipeline	Notebook
Primary Purpose	Data Transformation	Orchestration	Advanced Processing
Coding Required	Minimal	Minimal	Extensive
Scheduling	Built-in scheduled refresh	Yes	Built-in schedule or via Pipeline
Spark Support	No	No (can run notebooks)	Yes
Visual Interface	Yes	Yes	No
Advanced Logic	Limited	Limited	Extensive
Best for ETL	Yes	Coordinates ETL	Yes
Machine Learning	No	No	Yes

When to Choose Dataflow Gen2

Choose Dataflow Gen2 when:

Data cleansing is required
Users prefer visual tools
Power Query transformations are sufficient
Business analysts are building solutions
Coding should be minimized

Example

Requirement:

			
Import CSV files
Remove duplicates
Rename columns
Load into Lakehouse

Best Choice:

Dataflow Gen2
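
To make the requirement concrete, here is roughly what the deduplicate and rename steps amount to if written by hand in plain Python. In Dataflow Gen2 each of these would be a Power Query step with no code. The sample data, the `rename_map`, and the `clean_rows` helper are hypothetical, and the final load into a Lakehouse is left out.

```python
import csv
import io

# Hypothetical sample standing in for an imported CSV file.
raw_csv = """cust_id,cust_nm
1,Ada
2,Grace
1,Ada
"""

# Hypothetical rename map: source column -> standardized column.
rename_map = {"cust_id": "customer_id", "cust_nm": "customer_name"}

def clean_rows(text, rename_map):
    """Parse CSV text, drop exact duplicate rows, and rename columns."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row.items())
        if key in seen:
            continue  # duplicate of a row that was already kept
        seen.add(key)
        cleaned.append({rename_map.get(col, col): val for col, val in row.items()})
    return cleaned

rows = clean_rows(raw_csv, rename_map)
print(rows)
```

When the logic is this simple, the visual tool covers it without anyone having to write or maintain code like the above, which is why Dataflow Gen2 is the better fit here.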

When to Choose a Pipeline

Choose a Pipeline when:

Multiple tasks must be coordinated
Processes require scheduling
Activities depend on one another
Workflows need monitoring
Automation is required

Example

Requirement:

			
Run Dataflow
Run Notebook
Load Warehouse
Send Notification

Best Choice:

Pipeline

When to Choose a Notebook

Choose a Notebook when:

Complex transformations are required
PySpark processing is needed
Machine learning is involved
Custom code is necessary
Large-scale distributed processing is required

Example

Requirement:

			
Apply custom PySpark transformation
Process 10 TB dataset
Optimize Delta tables

Best Choice:

Notebook

Common Real-World Pattern

In many Fabric environments, all three tools are used together.

Example:

			
Pipeline (orchestrates the steps below)
      ↓
Dataflow Gen2
      ↓
Notebook
      ↓
Warehouse

Workflow:

A Pipeline orchestrates the overall run.
Dataflow Gen2 cleans source files.
Notebook performs advanced transformations.
Results load into a Warehouse.

This layered approach is common in enterprise solutions.

Decision Framework for DP-700

When reading exam questions, ask:

Is the requirement primarily low-code data transformation that Power Query can handle?

Choose:

Dataflow Gen2

Is the requirement workflow orchestration?

Choose:

Pipeline

Is the requirement advanced coding or Spark processing?

Choose:

Notebook

Common Exam Traps

Trap #1

Question mentions:

Scheduling
Dependencies
Automation

Correct answer:

Pipeline

Even if transformations are involved.

Trap #2

Question mentions:

PySpark
Python
Machine Learning
Spark

Correct answer:

Notebook

Trap #3

Question mentions:

Power Query
Visual transformation
No-code development

Correct answer:

Dataflow Gen2
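
The decision framework and the three traps above can be condensed into a small lookup. This is a study aid sketched under simplifying assumptions, not an official rule set: the `choose_tool` function and its keyword sets are hypothetical, and real exam scenarios need to be read in full.

```python
def choose_tool(keywords):
    """Map scenario keywords to a Fabric tool using the questions above.

    Orchestration is checked first, mirroring Trap #1: scheduling and
    dependencies point to a pipeline even when transformations are involved.
    """
    words = {k.lower() for k in keywords}
    orchestration = {"scheduling", "dependencies", "automation", "orchestration"}
    code = {"pyspark", "python", "spark", "machine learning", "custom code"}
    low_code = {"power query", "visual transformation", "no-code", "low-code"}
    if words & orchestration:
        return "Pipeline"
    if words & code:
        return "Notebook"
    if words & low_code:
        return "Dataflow Gen2"
    return "Unclear: reread the requirement"

print(choose_tool(["Scheduling", "Power Query"]))
print(choose_tool(["PySpark", "Machine Learning"]))
print(choose_tool(["Visual transformation"]))
```

The order of the checks is the design choice that matters. A scenario that mentions both Power Query and scheduling is asking which component coordinates the work, so the orchestration keywords take priority.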

DP-700 Exam Focus Areas

You should understand:

✓ Purpose of Dataflow Gen2

✓ Purpose of Data Pipelines

✓ Purpose of Notebooks

✓ Visual versus code-based development

✓ Workflow orchestration

✓ Spark processing

✓ Power Query transformations

✓ Scheduling and automation

✓ Common integration patterns

✓ Appropriate tool selection for business scenarios

Practice Exam Questions

Question 1

A business analyst needs to import CSV files, remove duplicate rows, and standardize column names using a visual interface with minimal coding.

Which Fabric component should be used?

A. Notebook

B. Data Pipeline

C. Dataflow Gen2

D. Deployment Pipeline

Answer: C

Explanation

Dataflow Gen2 is designed for low-code data ingestion and transformation using Power Query.

Question 2

A data engineering solution must execute the following sequence:

Run a Dataflow Gen2 process
Execute a Notebook
Load a Warehouse
Send a notification email

Which Fabric component should coordinate this workflow?

A. Lakehouse

B. Data Pipeline

C. Notebook

D. Semantic Model

Answer: B

Explanation

Pipelines are designed to orchestrate and automate multiple activities and dependencies.

Question 3

A team needs to perform complex PySpark transformations against several terabytes of data.

Which Fabric component is most appropriate?

A. Dataflow Gen2

B. Pipeline

C. Dashboard

D. Notebook

Answer: D

Explanation

Notebooks provide Spark-based programming environments suitable for large-scale transformations.

Question 4

Which Fabric component is primarily responsible for workflow orchestration?

A. Dataflow Gen2

B. Lakehouse

C. Warehouse

D. Data Pipeline

Answer: D

Explanation

Data Pipelines coordinate and automate execution of multiple activities.

Question 5

A solution requires users with no programming experience to create reusable data cleansing transformations.

Which component should be selected?

A. Notebook

B. Dataflow Gen2

C. Pipeline

D. Spark Job Definition

Answer: B

Explanation

Dataflow Gen2 provides a low-code visual environment for data preparation.

Question 6

Which Fabric component offers the greatest flexibility for implementing custom business logic?

A. Dataflow Gen2

B. Warehouse

C. Notebook

D. Data Pipeline

Answer: C

Explanation

Notebooks support Python, PySpark, and Spark SQL, so custom business logic can be written directly in code rather than fitted to a visual tool's built-in steps.

Question 7

A company wants to schedule nightly execution of several notebooks and monitor failures.

Which Fabric component should be used?

A. Dataflow Gen2

B. Notebook

C. Lakehouse

D. Data Pipeline

Answer: D

Explanation

Pipelines provide scheduling, monitoring, dependencies, and failure handling.

Question 8

Which statement best describes Dataflow Gen2?

A. It is primarily a workflow orchestration tool.

B. It is a low-code data transformation solution based on Power Query.

C. It is designed for machine learning development.

D. It replaces Spark notebooks.

Answer: B

Explanation

Dataflow Gen2 is optimized for low-code ETL and data transformation workloads.

Question 9

A data engineer must optimize Delta tables using Spark commands and Python code.

Which Fabric component should be used?

A. Notebook

B. Data Pipeline

C. Dataflow Gen2

D. Warehouse

Answer: A

Explanation

Notebook environments provide direct access to Spark capabilities and custom code execution.

Question 10

Which scenario is the best fit for a Data Pipeline?

A. Creating Power Query transformations

B. Applying machine learning algorithms

C. Coordinating multiple Fabric activities into an automated workflow

D. Writing custom PySpark code

Answer: C

Explanation

Pipelines are specifically designed for orchestration, automation, scheduling, dependency management, and monitoring.

Exam Tip

A useful DP-700 memory aid is:

Requirement	Best Tool
Visual ETL and data preparation	Dataflow Gen2
Scheduling and orchestration	Data Pipeline
Spark, Python, and advanced processing	Notebook

When a scenario focuses on automation and coordinating activities, think Pipeline.

When it focuses on Power Query transformations, think Dataflow Gen2.

When it focuses on PySpark, Spark SQL, machine learning, or custom code, think Notebook.
