Choose between Dataflow Gen2, a pipeline and a notebook (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Implement and manage an analytics solution (30–35%)
   --> Orchestrate processes
      --> Choose between Dataflow Gen2, a pipeline and a notebook


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important skills for a Microsoft Fabric Data Engineer is selecting the appropriate tool for a particular task. Microsoft Fabric provides several powerful technologies for data ingestion, transformation, orchestration, and automation. Three of the most commonly used are:

  • Dataflow Gen2
  • Data Pipelines
  • Notebooks

Although these tools often work together, they serve different purposes. Choosing the wrong tool can lead to unnecessary complexity, reduced maintainability, and increased development effort.

For the DP-700 exam, you should understand:

  • The primary purpose of each tool
  • When to use each tool
  • Strengths and limitations
  • Common design patterns
  • How these tools interact with one another

A significant number of DP-700 scenario questions are likely to test your ability to determine which Fabric component best fits a given business requirement.


Understanding the Three Tools

Before comparing them, it is important to understand their primary functions.

Tool | Primary Purpose
Dataflow Gen2 | Low-code data ingestion and transformation
Data Pipeline | Workflow orchestration and automation
Notebook | Code-based data processing and advanced analytics

A useful way to remember this is:

Dataflow Gen2 = Transform Data
Pipeline = Orchestrate Processes
Notebook = Execute Code

What Is Dataflow Gen2?

Dataflow Gen2 is a low-code/no-code data integration and transformation tool built on Power Query technology.

It allows users to:

  • Connect to data sources
  • Clean data
  • Transform data
  • Merge datasets
  • Filter records
  • Load data into Fabric destinations

Dataflow Gen2 is designed for users who prefer visual development rather than coding.


Dataflow Gen2 Architecture

Data Source --> Power Query Transformations --> Dataflow Gen2 --> Lakehouse / Warehouse

The transformation logic is built using a graphical interface.


Common Dataflow Gen2 Tasks

Examples include:

  • Removing duplicates
  • Filtering rows
  • Renaming columns
  • Data cleansing
  • Combining files
  • Joining datasets
  • Type conversions
  • Data standardization

These activities require little or no programming.
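
To make these tasks concrete, here is a minimal plain-Python sketch of four of them: removing duplicates, filtering rows, renaming columns, and type conversion with standardization. In Dataflow Gen2 you would build the same steps visually in the Power Query editor rather than writing code. The sample rows, column names, and the `clean` function are hypothetical and exist only for illustration.

```python
# Plain-Python stand-in for a few Dataflow Gen2 (Power Query) steps.
# The sample rows and column names are made up for illustration.
raw_rows = [
    {"cust id": "1", "Country": " us ", "amount": "10.50"},
    {"cust id": "1", "Country": " us ", "amount": "10.50"},  # exact duplicate
    {"cust id": "2", "Country": "DE", "amount": "7"},
    {"cust id": "3", "Country": "de", "amount": ""},  # missing amount
]

RENAMES = {"cust id": "customer_id", "Country": "country"}


def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        # Remove duplicates: skip rows whose values were already seen.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Filter rows: drop records with no amount.
        if not row["amount"]:
            continue
        # Rename columns.
        out = {RENAMES.get(name, name): value for name, value in row.items()}
        # Type conversions and standardization.
        out["customer_id"] = int(out["customer_id"])
        out["amount"] = float(out["amount"])
        out["country"] = out["country"].strip().upper()
        cleaned.append(out)
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Each commented step corresponds to one applied step you would see in a Power Query query; the point is the kind of logic involved, not how Dataflow Gen2 executes it.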


Advantages of Dataflow Gen2

Low-Code Development

Business analysts and citizen developers can build transformations without extensive coding knowledge.

Reusable Transformations

Transformations can be reused across multiple projects.

Familiar Power Query Experience

Users familiar with Power BI often adapt quickly.

Large Connector Library

Supports many cloud and on-premises data sources.


Limitations of Dataflow Gen2

Dataflow Gen2 is not ideal for:

  • Complex machine learning workloads
  • Advanced Spark processing
  • Custom Python development
  • Large-scale distributed programming

For those scenarios, notebooks are often more appropriate.


What Is a Data Pipeline?

A Data Pipeline is an orchestration tool.

Its primary purpose is not data transformation but rather coordinating and automating activities.

Think of a pipeline as a workflow engine.


Pipeline Architecture

Activity 1 --> Activity 2 --> Activity 3 --> Activity 4

Pipelines determine:

  • What runs
  • When it runs
  • In what order it runs
  • Under what conditions it runs

Common Pipeline Activities

Examples include:

  • Copy Data
  • Execute Notebook
  • Execute Dataflow
  • Stored Procedures
  • Web Activities
  • Conditional Logic

Pipelines coordinate these activities into a complete workflow. Scheduling is configured on the pipeline itself rather than added as an activity.


Advantages of Pipelines

Workflow Automation

Automates complex end-to-end processes.

Scheduling

Supports recurring execution schedules.

Dependency Management

Controls execution order.

Error Handling

Supports retries and failure paths.
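
As a rough mental model of retries and failure paths, the sketch below retries an activity a fixed number of times and then hands the last error to a failure handler. In Fabric you configure this behavior in the pipeline designer rather than in code; `run_with_retry`, `flaky_copy`, `always_fails`, and `notify` are hypothetical names used only to illustrate the idea.

```python
def run_with_retry(activity, max_retries, on_failure):
    """Run activity(); retry on any exception; fall back to on_failure."""
    last_error = None
    for attempt in range(1, max_retries + 2):  # first try plus retries
        try:
            return activity()
        except Exception as error:
            print(f"attempt {attempt} failed: {error}")
            last_error = error
    # All attempts failed: follow the failure path.
    return on_failure(last_error)


calls = {"count": 0}


def flaky_copy():
    # Hypothetical activity that fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("source temporarily unavailable")
    return "copied"


def always_fails():
    # Hypothetical activity that never succeeds.
    raise RuntimeError("bad credentials")


def notify(error):
    # Hypothetical failure path, such as sending an alert.
    return f"failure path taken: {error}"


result_ok = run_with_retry(flaky_copy, max_retries=2, on_failure=notify)
result_failed = run_with_retry(always_fails, max_retries=1, on_failure=notify)
print(result_ok)
print(result_failed)
```

The first call recovers because the transient error clears within the retry budget; the second exhausts its retries and takes the failure path instead of stopping the whole workflow silently.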

Integration

Can orchestrate multiple Fabric components.


Limitations of Pipelines

Pipelines are not intended for:

  • Complex data transformations
  • Interactive analysis
  • Advanced programming

Pipelines orchestrate work; they do not replace transformation tools.


What Is a Notebook?

A notebook is a code-based environment that allows developers and data engineers to execute code directly against Fabric data.

Notebooks commonly use:

  • Python
  • PySpark
  • Spark SQL
  • Scala (where supported)

In most data engineering scenarios, they run on Spark compute.


Notebook Architecture

Data Source --> Spark Processing --> Notebook --> Lakehouse / Warehouse

Notebooks provide maximum flexibility and control.


Common Notebook Tasks

Examples include:

  • PySpark transformations
  • Data engineering workflows
  • Machine learning preparation
  • Advanced data cleansing
  • Streaming data processing
  • Delta table optimization
  • Custom business logic
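
As an illustration of "custom business logic" that is awkward to express in a visual tool, the sketch below groups user events into sessions, starting a new session whenever the gap between two events exceeds 30 minutes. It is written in plain Python so it runs anywhere; in a Fabric notebook the same rule would more likely be expressed over a Spark DataFrame. The 30-minute rule, the sample events, and `assign_sessions` are assumptions made for this example.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed business rule


def assign_sessions(events):
    """Take (user, timestamp) pairs; return (user, timestamp, session_number)."""
    last_seen = {}
    session_number = {}
    result = []
    # Process each user's events in time order.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        previous = last_seen.get(user)
        # Start a new session on the first event or after a long gap.
        if previous is None or ts - previous > SESSION_GAP:
            session_number[user] = session_number.get(user, 0) + 1
        last_seen[user] = ts
        result.append((user, ts, session_number[user]))
    return result


events = [
    ("ana", datetime(2024, 1, 1, 9, 0)),
    ("ana", datetime(2024, 1, 1, 9, 10)),
    ("ana", datetime(2024, 1, 1, 10, 0)),
    ("bob", datetime(2024, 1, 1, 9, 5)),
]

sessions = assign_sessions(events)
for user, ts, number in sessions:
    print(user, ts.isoformat(), number)
```

Logic like this depends on the previous row for the same user, which is the kind of requirement that tends to push a scenario toward a notebook.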

Advantages of Notebooks

Full Programming Flexibility

Developers can implement virtually any logic.

Spark Integration

Supports distributed processing.

Advanced Transformations

Suitable for highly complex data engineering workloads.

Machine Learning Support

Works well with AI and ML frameworks.

Scalability

Can process very large datasets.


Limitations of Notebooks

Coding Required

Requires programming knowledge.

Higher Complexity

Can be more difficult to maintain.

Less Accessible

Business users typically prefer Dataflow Gen2.


Side-by-Side Comparison

Feature | Dataflow Gen2 | Pipeline | Notebook
Primary Purpose | Data Transformation | Orchestration | Advanced Processing
Coding Required | Minimal | Minimal | Extensive
Scheduling | Limited | Yes | Usually via Pipeline
Spark Support | No Direct Coding | No | Yes
Visual Interface | Yes | Yes | No
Advanced Logic | Limited | Limited | Extensive
Best for ETL | Yes | Coordinates ETL | Yes
Machine Learning | No | No | Yes

When to Choose Dataflow Gen2

Choose Dataflow Gen2 when:

  • Data cleansing is required
  • Users prefer visual tools
  • Power Query transformations are sufficient
  • Business analysts are building solutions
  • Coding should be minimized

Example

Requirement:

Import CSV files
Remove duplicates
Rename columns
Load into Lakehouse

Best Choice:

Dataflow Gen2


When to Choose a Pipeline

Choose a Pipeline when:

  • Multiple tasks must be coordinated
  • Processes require scheduling
  • Activities depend on one another
  • Workflows need monitoring
  • Automation is required

Example

Requirement:

Run Dataflow
Run Notebook
Load Warehouse
Send Notification

Best Choice:

Pipeline
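
The requirement above is essentially a dependency chain, and a pipeline's job is to run each activity only after the ones it depends on. The sketch below models that with Python's standard `graphlib` module (Python 3.9 or later). The activity names and the `make_activity` and `run_pipeline` helpers are hypothetical placeholders, not Fabric API calls.

```python
from graphlib import TopologicalSorter

# Each activity maps to the set of activities it depends on.
dependencies = {
    "run_dataflow": set(),
    "run_notebook": {"run_dataflow"},
    "load_warehouse": {"run_notebook"},
    "send_notification": {"load_warehouse"},
}


def make_activity(name):
    # Placeholder for real work; simply reports success.
    def activity():
        return f"{name}: succeeded"
    return activity


def run_pipeline(dependencies, activities):
    # Resolve an order in which every dependency runs first.
    order = list(TopologicalSorter(dependencies).static_order())
    log = [activities[name]() for name in order]
    return order, log


activities = {name: make_activity(name) for name in dependencies}
execution_order, run_log = run_pipeline(dependencies, activities)
print(execution_order)
```

Notice that none of the placeholder activities transform data themselves. The orchestration layer only decides what runs and in what order, which is why this requirement points to a pipeline.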


When to Choose a Notebook

Choose a Notebook when:

  • Complex transformations are required
  • PySpark processing is needed
  • Machine learning is involved
  • Custom code is necessary
  • Large-scale distributed processing is required

Example

Requirement:

Apply custom PySpark transformation
Process 10 TB dataset
Optimize Delta tables

Best Choice:

Notebook


Common Real-World Pattern

In many Fabric environments, all three tools are used together.

Example:

Dataflow Gen2 --> Pipeline --> Notebook --> Warehouse

Workflow:

  1. Dataflow Gen2 cleans source files.
  2. Pipeline orchestrates execution.
  3. Notebook performs advanced transformations.
  4. Results load into a Warehouse.

This layered approach is common in enterprise solutions.


Decision Framework for DP-700

When reading exam questions, ask:

Is the requirement primarily low-code (Power Query) data transformation?

Choose:

Dataflow Gen2


Is the requirement workflow orchestration?

Choose:

Pipeline


Is the requirement advanced coding or Spark processing?

Choose:

Notebook


Common Exam Traps

Trap #1

Question mentions:

  • Scheduling
  • Dependencies
  • Automation

Correct answer:

Pipeline, even if transformations are involved.


Trap #2

Question mentions:

  • PySpark
  • Python
  • Machine Learning
  • Spark

Correct answer:

Notebook


Trap #3

Question mentions:

  • Power Query
  • Visual transformation
  • No-code development

Correct answer:

Dataflow Gen2
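
The three traps can be summarized as a priority order: orchestration cues win first, then code and Spark cues, then visual and Power Query cues. The sketch below encodes that order as a naive keyword check. It is a study aid only; the keyword lists and `choose_tool` are assumptions made for this example, and real exam questions still need to be read in full.

```python
# Keyword stems taken from the three traps; matching is deliberately naive.
ORCHESTRATION_CUES = ("schedul", "dependen", "automat", "orchestrat")
CODE_CUES = ("pyspark", "python", "machine learning", "spark")
VISUAL_CUES = ("power query", "visual", "no-code", "low-code")


def choose_tool(requirement):
    text = requirement.lower()
    # Trap #1: orchestration cues win even if transformations are mentioned.
    if any(cue in text for cue in ORCHESTRATION_CUES):
        return "Pipeline"
    # Trap #2: code and Spark cues point to a notebook.
    if any(cue in text for cue in CODE_CUES):
        return "Notebook"
    # Trap #3: visual, Power Query cues point to Dataflow Gen2.
    if any(cue in text for cue in VISUAL_CUES):
        return "Dataflow Gen2"
    return "Unclear: reread the scenario"


print(choose_tool("Schedule nightly runs of several notebooks"))
print(choose_tool("Apply PySpark transformations to a large dataset"))
print(choose_tool("Clean files with Power Query in a no-code tool"))
```

The first example mentions notebooks but still resolves to Pipeline, because the scheduling cue is checked before the code cues.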


DP-700 Exam Focus Areas

You should understand:

✓ Purpose of Dataflow Gen2

✓ Purpose of Data Pipelines

✓ Purpose of Notebooks

✓ Visual versus code-based development

✓ Workflow orchestration

✓ Spark processing

✓ Power Query transformations

✓ Scheduling and automation

✓ Common integration patterns

✓ Appropriate tool selection for business scenarios


Practice Exam Questions

Question 1

A business analyst needs to import CSV files, remove duplicate rows, and standardize column names using a visual interface with minimal coding.

Which Fabric component should be used?

A. Notebook

B. Data Pipeline

C. Dataflow Gen2

D. Deployment Pipeline

Answer: C

Explanation

Dataflow Gen2 is designed for low-code data ingestion and transformation using Power Query.


Question 2

A data engineering solution must execute the following sequence:

  1. Run a Dataflow Gen2 process
  2. Execute a Notebook
  3. Load a Warehouse
  4. Send a notification email

Which Fabric component should coordinate this workflow?

A. Lakehouse

B. Data Pipeline

C. Notebook

D. Semantic Model

Answer: B

Explanation

Pipelines are designed to orchestrate and automate multiple activities and dependencies.


Question 3

A team needs to perform complex PySpark transformations against several terabytes of data.

Which Fabric component is most appropriate?

A. Dataflow Gen2

B. Pipeline

C. Dashboard

D. Notebook

Answer: D

Explanation

Notebooks provide Spark-based programming environments suitable for large-scale transformations.


Question 4

Which Fabric component is primarily responsible for workflow orchestration?

A. Dataflow Gen2

B. Lakehouse

C. Warehouse

D. Data Pipeline

Answer: D

Explanation

Data Pipelines coordinate and automate execution of multiple activities.


Question 5

A solution requires users with no programming experience to create reusable data cleansing transformations.

Which component should be selected?

A. Notebook

B. Dataflow Gen2

C. Pipeline

D. Spark Job Definition

Answer: B

Explanation

Dataflow Gen2 provides a low-code visual environment for data preparation.


Question 6

Which Fabric component offers the greatest flexibility for implementing custom business logic?

A. Dataflow Gen2

B. Warehouse

C. Notebook

D. Data Pipeline

Answer: C

Explanation

Notebooks support Python, PySpark, and Spark SQL, allowing virtually unlimited customization.


Question 7

A company wants to schedule nightly execution of several notebooks and monitor failures.

Which Fabric component should be used?

A. Dataflow Gen2

B. Notebook

C. Lakehouse

D. Data Pipeline

Answer: D

Explanation

Pipelines provide scheduling, monitoring, dependencies, and failure handling.


Question 8

Which statement best describes Dataflow Gen2?

A. It is primarily a workflow orchestration tool.

B. It is a low-code data transformation solution based on Power Query.

C. It is designed for machine learning development.

D. It replaces Spark notebooks.

Answer: B

Explanation

Dataflow Gen2 is optimized for low-code ETL and data transformation workloads.


Question 9

A data engineer must optimize Delta tables using Spark commands and Python code.

Which Fabric component should be used?

A. Notebook

B. Data Pipeline

C. Dataflow Gen2

D. Warehouse

Answer: A

Explanation

Notebook environments provide direct access to Spark capabilities and custom code execution.


Question 10

Which scenario is the best fit for a Data Pipeline?

A. Creating Power Query transformations

B. Applying machine learning algorithms

C. Coordinating multiple Fabric activities into an automated workflow

D. Writing custom PySpark code

Answer: C

Explanation

Pipelines are specifically designed for orchestration, automation, scheduling, dependency management, and monitoring.


Exam Tip

A useful DP-700 memory aid is:

Requirement | Best Tool
Visual ETL and data preparation | Dataflow Gen2
Scheduling and orchestration | Data Pipeline
Spark, Python, and advanced processing | Notebook

When a scenario focuses on automation and coordinating activities, think Pipeline.

When it focuses on Power Query transformations, think Dataflow Gen2.

When it focuses on PySpark, Spark SQL, machine learning, or custom code, think Notebook.


