This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
--> Ingest and transform batch data
--> Ingest data by using pipelines
Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
Microsoft Fabric Data Pipelines are one of the primary tools used by data engineers to ingest, move, and orchestrate data across various sources and destinations. Pipelines provide a low-code orchestration framework that allows organizations to build scalable, repeatable, and maintainable data ingestion solutions.
For the DP-700 exam, it is important to understand:
- What pipelines are
- Pipeline architecture and components
- Common ingestion patterns
- Copy Data activity
- Data source and destination connectivity
- Pipeline orchestration
- Parameters and dynamic content
- Scheduling and triggering
- Monitoring and troubleshooting
- Best practices for pipeline-based ingestion
What Is a Microsoft Fabric Data Pipeline?
A Data Pipeline is a workflow orchestration service within Microsoft Fabric that enables data engineers to:
- Move data between systems
- Schedule data ingestion
- Execute transformation activities
- Coordinate multiple processes
- Automate data workflows
Pipelines are derived from the same core concepts used in Azure Data Factory and Azure Synapse Analytics, making them familiar to many data professionals.
A pipeline is essentially a container that holds one or more activities that execute in a defined sequence.
Why Use Pipelines for Data Ingestion?
Organizations often need to ingest data from:
- SQL Server
- Azure SQL Database
- Azure Blob Storage
- Amazon S3
- REST APIs
- CSV files
- Excel files
- On-premises systems
- Data warehouses
- SaaS applications
Pipelines provide a centralized and scalable way to move this data into Fabric.
Benefits include:
Automation
No manual intervention required once configured.
Scalability
Handles large volumes of data efficiently.
Reusability
Pipelines can be reused across multiple ingestion scenarios.
Monitoring
Built-in execution tracking and logging.
Integration
Works with many Fabric workloads and external systems.
Pipeline Architecture
A pipeline consists of several components:
Pipeline
The overall workflow container.
Activities
Tasks performed within the pipeline.
Examples:
- Copy Data
- Notebook execution
- Stored procedure execution
- Dataflow execution
- Variable assignment
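Datasets
Represent source or destination data structures in Azure Data Factory. Fabric pipelines have no separate dataset object; source and destination settings are configured directly on each activity through a connection.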
Connections
Define how the pipeline connects to external systems.
Parameters
Provide runtime flexibility.
Triggers
Determine when pipelines execute.
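As a rough mental model only, the components above can be sketched in Python. This is not how Fabric stores or runs pipelines. The class names (`Activity`, `Pipeline`), the `depends_on` field, and the example activity names are all hypothetical, chosen to show how a container of activities and parameters resolves into an execution order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Activity:
    name: str
    kind: str  # e.g. "Copy", "Notebook", "StoredProcedure" (illustrative labels)
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Pipeline:
    name: str
    activities: list[Activity]
    parameters: dict[str, str] = field(default_factory=dict)

    def execution_order(self) -> list[str]:
        """Return activity names so each activity follows its dependencies."""
        graph = {a.name: set(a.depends_on) for a in self.activities}
        return list(TopologicalSorter(graph).static_order())


# Hypothetical pipeline: copy, then transform, then update metadata.
pipeline = Pipeline(
    name="IngestSales",
    activities=[
        Activity("TransformSales", "Notebook", depends_on=["CopySales"]),
        Activity("CopySales", "Copy"),
        Activity("UpdateMetadata", "StoredProcedure", depends_on=["TransformSales"]),
    ],
    parameters={"SourceTable": "Sales"},
)

order = pipeline.execution_order()
print(order)
```

The order in which activities are listed does not matter here; the dependencies alone decide the sequence, which mirrors how activities are chained in the pipeline designer.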
Common Pipeline Activities
For DP-700, understanding activities is essential.
Copy Data Activity
The most commonly used ingestion activity.
Used to:
- Copy files
- Move tables
- Transfer structured data
- Load data into Fabric destinations
Examples:
- SQL Server → Lakehouse
- Azure SQL → Warehouse
- CSV → OneLake
- Blob Storage → Lakehouse
Notebook Activity
Executes Spark notebooks.
Common uses:
- Data transformation
- Data cleansing
- Machine learning processing
Dataflow Activity
Runs Dataflow Gen2 processes.
Used when:
- Low-code transformations are preferred
- Business users participate in data preparation
Stored Procedure Activity
Executes SQL stored procedures.
Useful for:
- Database maintenance
- Incremental processing
- Metadata updates
Using the Copy Data Activity
The Copy Data activity is heavily emphasized on the DP-700 exam.
Source
Defines where data originates.
Examples:
- SQL Database
- Oracle
- REST API
- CSV File
- Blob Storage
Destination
Defines where data is written.
Examples:
- Lakehouse
- Data Warehouse
- OneLake files
- Azure SQL Database
Mapping
Maps source columns to destination columns.
Example:
| Source | Destination |
|---|---|
| CustomerID | CustomerKey |
| Name | CustomerName |
| City | CustomerCity |
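The mapping table above can be illustrated with a small Python sketch. It only mimics the idea of renaming columns between source and destination; the function name `apply_mapping`, the sample row, and the choice to drop unmapped columns are assumptions made for this example, not a description of the Copy Data activity's internals.

```python
# Source column -> destination column, taken from the table above.
COLUMN_MAPPING = {
    "CustomerID": "CustomerKey",
    "Name": "CustomerName",
    "City": "CustomerCity",
}


def apply_mapping(row: dict, mapping: dict) -> dict:
    """Rename mapped source columns; columns without a mapping are dropped."""
    return {dest: row[src] for src, dest in mapping.items() if src in row}


# Hypothetical source row with one extra column that has no mapping.
source_row = {"CustomerID": 42, "Name": "Contoso", "City": "Seattle", "Fax": None}
mapped_row = apply_mapping(source_row, COLUMN_MAPPING)
print(mapped_row)
```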
Data Sources Supported by Pipelines
Fabric pipelines support numerous source systems.
Common examples include:
Relational Databases
- SQL Server
- Azure SQL Database
- Oracle
- PostgreSQL
- MySQL
File-Based Sources
- CSV
- JSON
- Parquet
- Excel
Cloud Storage
- Azure Blob Storage
- Azure Data Lake Storage
- Amazon S3
Web-Based Sources
- REST APIs
- HTTP endpoints
Pipeline Destinations
Common destinations include:
Lakehouse
Frequently used for raw and curated data storage.
Benefits:
- Delta format
- Open storage
- Spark compatibility
Data Warehouse
Ideal for structured analytical workloads.
Benefits:
- SQL support
- Relational design
- High-performance reporting
OneLake Files
Used for raw file storage.
Batch Data Ingestion Patterns
The DP-700 exam focuses heavily on batch ingestion.
Full Load Pattern
Every execution loads the entire dataset.
Example:
Daily import of a 5,000-row lookup table.
Advantages:
- Simple implementation
Disadvantages:
- Higher processing costs
- Longer runtimes
Incremental Load Pattern
Only new or changed records are loaded.
Example:
Import orders created since the last execution.
Advantages:
- Faster
- Lower costs
- Reduced data movement
Disadvantages:
- More complex configuration
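One common way to implement an incremental load is a watermark: store the highest modified timestamp already loaded, and on the next run load only rows newer than it. The Python sketch below illustrates that idea under stated assumptions. The `modified` column, the in-memory lists standing in for a source table and a Lakehouse table, and the function name `incremental_load` are all hypothetical.

```python
from datetime import datetime


def incremental_load(source_rows, destination, last_watermark):
    """Append rows modified after last_watermark and return the new watermark."""
    new_rows = [r for r in source_rows if r["modified"] > last_watermark]
    destination.extend(new_rows)
    # Keep the old watermark when nothing new was found.
    return max((r["modified"] for r in new_rows), default=last_watermark)


orders = [
    {"id": 1, "modified": datetime(2024, 1, 1)},
    {"id": 2, "modified": datetime(2024, 1, 2)},
    {"id": 3, "modified": datetime(2024, 1, 3)},
]
lakehouse_table = []

# First run: order 1 was loaded previously, so only orders 2 and 3 move.
watermark = incremental_load(orders, lakehouse_table, datetime(2024, 1, 1))

# Second run with an unchanged source: nothing is loaded again.
watermark_after_rerun = incremental_load(orders, lakehouse_table, watermark)
```

A full load would instead replace the destination with every source row on each run, which is simpler but moves all the data every time.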
Parameterized Pipelines
Parameters make pipelines reusable.
Example parameter:
SourceTable
Pipeline executions can specify:
- Customers
- Orders
- Products
- Invoices
This allows one pipeline design to ingest many tables.
Benefits:
- Reduced development effort
- Easier maintenance
- Consistent ingestion processes
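The effect of a `SourceTable` parameter can be sketched in Python as a single function that is called once per table. This is an analogy, not pipeline code: `build_copy_settings`, the `dbo` schema default, and the shape of the returned settings are assumptions made for illustration.

```python
def build_copy_settings(source_table: str, schema: str = "dbo") -> dict:
    """Build the settings for one run from the SourceTable parameter value."""
    return {
        "source_query": f"SELECT * FROM [{schema}].[{source_table}]",
        "destination_table": source_table,
    }


# One design, four runs: only the parameter value changes.
tables = ["Customers", "Orders", "Products", "Invoices"]
runs = [build_copy_settings(t) for t in tables]

for run in runs:
    print(run["source_query"], "->", run["destination_table"])
```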
Dynamic Content
Dynamic expressions enable runtime flexibility.
Examples:
Generate file names:
Sales_@{utcnow()}.csv
Generate folders:
Raw/@{formatDateTime(utcnow(),'yyyy/MM/dd')}
Use parameter values:
@pipeline().parameters.TableName
Dynamic content is commonly tested on DP-700.
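To make the results of such expressions concrete, here is a Python analogue of the folder and file name examples. It is only an illustration of what the expressions produce, not pipeline syntax. The helper names and the fixed timestamp are assumptions, and the file name uses a compact `yyyyMMdd` stamp because an unformatted `utcnow()` value is a full timestamp that includes colons.

```python
from datetime import datetime, timezone


def raw_folder(now: datetime) -> str:
    """Analogue of Raw/@{formatDateTime(utcnow(),'yyyy/MM/dd')}."""
    return f"Raw/{now.strftime('%Y/%m/%d')}"


def sales_file_name(now: datetime) -> str:
    """Analogue of a Sales_<date>.csv name using a yyyyMMdd stamp."""
    return f"Sales_{now.strftime('%Y%m%d')}.csv"


# Fixed timestamp so the output is reproducible; a real run would use
# datetime.now(timezone.utc) in place of this value.
fixed_now = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)

print(raw_folder(fixed_now))
print(sales_file_name(fixed_now))
```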
Control Flow Activities
Pipelines can include logic and branching.
If Condition
Executes different paths depending on conditions.
Example:
- File exists → Continue
- File missing → Send notification
Switch Activity
Handles multiple execution paths.
Example:
Process data differently based on source type.
ForEach Activity
Loops through collections.
Example:
Load 100 source tables using one pipeline.
Until Activity
Repeats execution until a condition becomes true.
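For readers who think in code, the four control flow activities map loosely onto familiar programming constructs. The Python sketch below is an analogy under stated assumptions: the function names, the source types, and the file names are hypothetical, and real activities are configured in the pipeline designer rather than written like this.

```python
def process(source_type: str) -> str:
    """Switch analogue: pick one path based on the source type."""
    handlers = {"sql": "copy table", "file": "copy file", "api": "call REST endpoint"}
    return handlers.get(source_type, "default path")


def run_until(check, max_attempts: int) -> int:
    """Until analogue: repeat until check() is true or attempts run out."""
    attempts = 0
    while True:
        attempts += 1
        if check() or attempts >= max_attempts:
            return attempts


# ForEach analogue, with an If Condition analogue inside the loop.
files = {"sales.csv": True, "returns.csv": False}  # file name -> exists?
actions = []
for name, exists in files.items():
    actions.append(f"load {name}" if exists else f"notify: {name} missing")

# Simulated status checks: the condition becomes true on the third poll.
polls = iter([False, False, True])
attempts_needed = run_until(lambda: next(polls), max_attempts=10)
```

The `max_attempts` guard plays the role of a timeout, so the loop cannot run forever if the condition never becomes true.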
Scheduling Pipelines
Pipelines commonly run on schedules.
Examples:
- Hourly
- Daily
- Weekly
- Monthly
Typical workloads:
| Workload | Schedule |
|---|---|
| Sales Data | Hourly |
| ERP Data | Daily |
| Financial Data | Nightly |
| Master Data | Weekly |
Event-Based Triggers
Instead of schedules, pipelines can run when events occur.
Examples:
- New file arrives
- Data source updated
- Upstream process completed
Benefits:
- Reduced latency
- Faster processing
- More responsive architecture
Monitoring Pipeline Executions
Fabric provides execution monitoring.
Data engineers can review:
Run Status
- Succeeded
- Failed
- In Progress
- Cancelled
Duration
How long the run took to complete.
Activity-Level Results
Identify which step failed.
Error Messages
Useful for troubleshooting.
Common issues include:
- Authentication failures
- Missing files
- Schema mismatches
- Permission problems
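The kind of review described above can be sketched in Python as a scan over activity-level results. The record shape, field names, and sample error text are invented for illustration and do not represent the output of any Fabric monitoring API.

```python
# Hypothetical activity-level results for one pipeline run.
activity_runs = [
    {"activity": "CopySales", "status": "Succeeded", "duration_s": 42, "error": None},
    {
        "activity": "TransformSales",
        "status": "Failed",
        "duration_s": 7,
        "error": "Schema mismatch: column 'Amount' not found",
    },
]


def first_failure(runs):
    """Return the first failed activity record, or None if all succeeded."""
    return next((r for r in runs if r["status"] == "Failed"), None)


failed = first_failure(activity_runs)
total_duration = sum(r["duration_s"] for r in activity_runs)

if failed:
    print(f"{failed['activity']} failed: {failed['error']}")
print(f"Total duration: {total_duration}s")
```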
Error Handling
Reliable ingestion solutions require proper error handling.
Common approaches:
Retry Policies
Automatically rerun failed activities.
Logging
Record execution details.
Validation
Check data quality before loading.
Notifications
Alert administrators when failures occur.
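The retry and logging ideas above can be sketched in a few lines of Python. In a pipeline, retries are a setting on the activity rather than code you write, so treat this as an illustration of the behavior only. The function `run_with_retry`, its parameters, and the simulated `flaky_copy` activity are hypothetical.

```python
import time


def run_with_retry(activity, retries: int = 3, interval_seconds: float = 0.0):
    """Run activity(); on a transient failure, retry up to `retries` more times."""
    last_error = None
    for attempt in range(1, retries + 2):  # one initial attempt plus the retries
        try:
            return activity(), attempt
        except ConnectionError as err:
            last_error = err
            print(f"attempt {attempt} failed: {err}")  # stands in for logging
            if attempt <= retries:
                time.sleep(interval_seconds)
    raise last_error


# Simulated activity that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_copy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network interruption")
    return "copied"


result, attempts = run_with_retry(flaky_copy, retries=3)
print(result, "after", attempts, "attempts")
```

Only the transient error type is retried here. A failure such as a schema mismatch would not be fixed by rerunning, so it is better surfaced immediately through a notification.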
Security Considerations
Pipeline ingestion must follow security best practices.
Secure Credentials
Use managed identities and secure connections whenever possible.
Least Privilege
Grant only required permissions.
Workspace Security
Control who can modify pipelines.
Data Governance
Apply sensitivity labels and auditing where appropriate.
Pipeline Best Practices
Use Parameterization
Avoid hardcoding values.
Build Reusable Components
Create generic ingestion pipelines.
Use Incremental Loads
Where possible, load only new or changed records to reduce data movement.
Monitor Executions
Review failures proactively.
Implement Error Handling
Design for operational resilience.
Separate Environments
Maintain Dev, Test, and Production pipelines.
Pipeline vs Dataflow Gen2 vs Notebook
Understanding when to use each tool is a common exam objective.
| Feature | Pipeline | Dataflow Gen2 | Notebook |
|---|---|---|---|
| Orchestration | Excellent | Limited | Limited |
| Data Movement | Excellent | Good | Good |
| Low-Code | Yes | Yes | No |
| Spark Processing | No | No | Yes |
| Complex Programming | No | No | Yes |
| Scheduling | Excellent | Good | Good |
Use Pipelines When:
- Moving data between systems
- Orchestrating workflows
- Scheduling processes
- Managing dependencies
Use Dataflow Gen2 When:
- Low-code transformations are required
Use Notebooks When:
- Spark processing is needed
- Custom Python or Scala logic is required
DP-700 Exam Tips
Remember these key points:
✓ Pipelines are primarily orchestration and data movement tools.
✓ The Copy Data activity is the most common ingestion activity.
✓ Pipelines support both scheduled and event-based execution.
✓ Parameters and dynamic expressions improve reusability.
✓ Incremental loads are preferred for large datasets.
✓ Pipelines can execute notebooks and dataflows.
✓ Monitoring and troubleshooting pipeline runs are important operational responsibilities.
✓ Control flow activities such as ForEach and If Condition are frequently used in enterprise solutions.
✓ Pipelines are generally the preferred Fabric tool for orchestrating end-to-end ingestion workflows.
Practice Exam Questions
Question 1
A data engineer needs to copy data nightly from Azure SQL Database into a Fabric Lakehouse. Which Fabric component is most appropriate?
A. Semantic Model
B. Data Pipeline
C. Dashboard
D. KQL Queryset
Correct Answer: B
Explanation:
Data Pipelines are designed for orchestrating and executing data movement activities such as copying data from Azure SQL Database into a Lakehouse.
Question 2
Which pipeline activity is primarily used to move data from a source system to a destination?
A. Notebook Activity
B. Copy Data Activity
C. If Condition Activity
D. Switch Activity
Correct Answer: B
Explanation:
The Copy Data activity is specifically designed for ingesting and transferring data between sources and destinations.
Question 3
A company wants a pipeline to process 50 tables using a single reusable workflow. Which feature should be implemented?
A. Data Warehouse
B. OneLake Shortcut
C. Parameters
D. Mirroring
Correct Answer: C
Explanation:
Parameters allow a pipeline to accept table names and other runtime values, making the solution reusable.
Question 4
Which control flow activity is used to repeatedly process a collection of items?
A. ForEach
B. Wait
C. Lookup
D. If Condition
Correct Answer: A
Explanation:
The ForEach activity iterates through collections and executes activities for each item.
Question 5
A data engineer wants a pipeline to run automatically every night at midnight. What should be configured?
A. Sensitivity Label
B. Scheduled Trigger
C. Dataflow Refresh Policy
D. Lakehouse Shortcut
Correct Answer: B
Explanation:
Scheduled triggers are used to execute pipelines at predefined times.
Question 6
Which Fabric destination is most commonly used for storing raw and curated Delta tables?
A. Lakehouse
B. Dashboard
C. Workspace Role
D. Semantic Model
Correct Answer: A
Explanation:
Lakehouses provide Delta Lake storage and are commonly used as ingestion targets.
Question 7
A pipeline should execute only when a new file arrives in storage. What should be used?
A. Manual Execution
B. Incremental Refresh
C. Event-Based Trigger
D. Full Load
Correct Answer: C
Explanation:
Event-based triggers allow pipelines to start when specific events occur, such as file creation.
Question 8
Which statement about incremental loading is correct?
A. It reloads all records every execution.
B. It loads only new or changed records.
C. It requires deleting the destination table first.
D. It cannot be implemented in pipelines.
Correct Answer: B
Explanation:
Incremental loading minimizes processing by transferring only new or modified data.
Question 9
A data engineer needs to execute custom PySpark transformation logic as part of a pipeline. Which activity should be used?
A. Copy Data Activity
B. If Condition Activity
C. Stored Procedure Activity
D. Notebook Activity
Correct Answer: D
Explanation:
Notebook activities allow execution of Spark notebooks containing custom Python, Scala, SQL, or Spark code.
Question 10
A pipeline execution fails due to a temporary network interruption. Which design practice can help improve reliability?
A. Use dashboard subscriptions
B. Apply endorsement labels
C. Configure retry policies
D. Disable monitoring
Correct Answer: C
Explanation:
Retry policies automatically reattempt failed activities and are a key best practice for building resilient ingestion pipelines.
