This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
--> Design and implement loading patterns
--> Design and implement a loading pattern for streaming data
Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
Traditional batch data processing has been the foundation of analytics systems for decades. However, many modern business scenarios require data to be processed and analyzed as soon as it is generated. Examples include IoT sensors, website clickstreams, financial transactions, manufacturing equipment telemetry, and application monitoring.
Microsoft Fabric supports streaming and real-time analytics through its Real-Time Intelligence workload (Eventstreams, KQL databases, and Activator, formerly Data Activator), together with Lakehouses and Spark technologies.
For the DP-700 exam, you should understand:
- Streaming versus batch processing
- Real-time and near real-time architectures
- Event-driven data ingestion
- Eventstreams
- Event processing patterns
- Streaming destinations
- KQL databases
- Lakehouse streaming ingestion
- Event-driven orchestration
- Windowing concepts
- Checkpointing and fault tolerance
- Performance and scalability considerations
Many DP-700 scenario questions focus on choosing the appropriate loading pattern based on latency requirements and business needs.
Understanding Streaming Data
Streaming data is data that arrives continuously over time rather than in large batches.
Examples include:
| Source | Example Data |
|---|---|
| IoT Devices | Temperature readings |
| Web Applications | User clicks |
| Retail Systems | Purchases |
| Mobile Apps | User activity |
| Manufacturing Equipment | Sensor telemetry |
| Financial Systems | Transaction events |
Instead of loading data once per day, streaming systems continuously process incoming events.
Batch vs Streaming Processing
Batch Processing
Processes accumulated data at scheduled intervals.
Example:
Daily Sales File
↓
Midnight ETL Process
↓
Data Warehouse
Characteristics:
- High latency
- Simpler architecture
- Efficient for large historical datasets
Streaming Processing
Processes events continuously as they arrive.
Example:
Sensor Event
↓
Immediate Processing
↓
Analytics Platform
Characteristics:
- Low latency
- Near real-time insights
- Event-driven architecture
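To make the contrast concrete, here is a minimal Python sketch, assuming events are plain dictionaries with a hypothetical `amount` field. The batch function waits for the full collection before computing anything, while the streaming function updates its state and emits a new result as each event arrives.

```python
from typing import Iterable, Iterator


def batch_total(events: list[dict]) -> float:
    """Batch: process the accumulated data once, at a scheduled time."""
    return sum(e["amount"] for e in events)


def streaming_totals(events: Iterable[dict]) -> Iterator[float]:
    """Streaming: update running state and emit a result per event."""
    running = 0.0
    for event in events:
        running += event["amount"]
        yield running  # an up-to-date answer is available immediately


sales = [{"amount": 10.0}, {"amount": 5.0}, {"amount": 2.5}]
print(batch_total(sales))             # one answer, after all data has arrived
print(list(streaming_totals(sales)))  # an updated answer after every event
```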
Streaming Data Latency Categories
Real-Time
Typically seconds or less.
Example:
Fraud Detection
Near Real-Time
Typically seconds to minutes.
Example:
Operational Dashboards
Micro-Batch
Small batches processed frequently.
Example:
Every 30 seconds, every 1 minute, or every 5 minutes
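Many streaming implementations in Fabric use micro-batch processing internally. For example, Spark Structured Streaming in Fabric notebooks processes a stream as a sequence of small micro-batches by default.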
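Micro-batching can be sketched as cutting a continuous stream into small chunks that are each processed as a unit. This minimal Python example assumes each event carries a numeric `ts` field (a hypothetical timestamp in seconds) and starts a new batch once the interval measured from the current batch's first event has elapsed. Real engines such as Spark Structured Streaming use processing-time triggers and manage offsets for you, so treat this only as an illustration of the idea.

```python
from typing import Iterable, Iterator


def micro_batches(events: Iterable[dict], interval_s: int) -> Iterator[list[dict]]:
    """Group time-ordered events; each batch spans < interval_s from its first event."""
    batch: list[dict] = []
    batch_start = None
    for event in events:
        if batch_start is None:
            batch_start = event["ts"]
        elif event["ts"] - batch_start >= interval_s:
            yield batch                      # interval elapsed: hand the batch downstream
            batch, batch_start = [], event["ts"]
        batch.append(event)
    if batch:
        yield batch                          # flush the final partial batch


stream = [{"ts": t} for t in (0, 10, 25, 31, 45, 70)]
for b in micro_batches(stream, interval_s=30):
    print([e["ts"] for e in b])
```

With a 30-second interval this yields three batches: `[0, 10, 25]`, `[31, 45]`, and `[70]`.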
Streaming Architecture in Microsoft Fabric
A common Fabric streaming architecture:
Event Source
↓
Eventstream
↓
Transformation
↓
Destination
↓
Analytics
Possible destinations include:
- KQL Database
- Lakehouse
- Warehouse
- Real-Time Dashboard
Event-Driven Processing
Streaming systems are event-driven.
An event represents something that happened.
Examples:
- Order Created
- Order Updated
- Machine Started
- Temperature Changed
- Sensor Failed
Events are generated continuously and processed immediately.
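An event is usually represented as a small, immutable record of what happened, where, and when. This minimal Python sketch models one as a frozen dataclass (the field names and handler actions are hypothetical) and dispatches each event to a handler registered for its type, which is the basic shape of event-driven processing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Event:
    event_type: str                      # e.g. "OrderCreated", "SensorFailed"
    source: str                          # system or device that produced the event
    payload: dict = field(default_factory=dict)
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Hypothetical handlers keyed by event type.
handlers: dict[str, Callable[[Event], str]] = {
    "OrderCreated": lambda e: f"reserve stock for order {e.payload['order_id']}",
    "SensorFailed": lambda e: f"open maintenance ticket for {e.source}",
}


def dispatch(event: Event) -> str:
    """React to an event based on its type; unknown types are ignored."""
    handler = handlers.get(event.event_type)
    return handler(event) if handler else "ignored"


print(dispatch(Event("OrderCreated", "web-shop", {"order_id": 42})))
print(dispatch(Event("SensorFailed", "press-07")))
print(dispatch(Event("TemperatureChanged", "sensor-3", {"value": 71.2})))
```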
Eventstreams
Eventstreams are one of the core ingestion services in Microsoft Fabric Real-Time Intelligence.
Eventstreams provide:
- Event ingestion
- Routing
- Filtering
- Transformation
- Distribution
Eventstreams simplify streaming architecture by reducing custom development requirements.
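Routing and distribution can be pictured as a set of rules, each selecting the events that should reach one destination; an event that matches several rules is sent to each of them. Eventstreams typically configure this in a no-code editor rather than in code, so the rule names and conditions below are hypothetical and only illustrate the concept.

```python
from typing import Callable

# Hypothetical routing rules: destination name -> predicate on the event.
ROUTES: dict[str, Callable[[dict], bool]] = {
    "hot_store": lambda e: True,                               # everything, for live queries
    "alerts": lambda e: e.get("temperature", 0) > 80,          # only threshold breaches
    "archive": lambda e: e.get("event_type") != "heartbeat",   # skip noise
}


def route(event: dict) -> list[str]:
    """Return every destination whose rule matches the event."""
    return [dest for dest, rule in ROUTES.items() if rule(event)]


print(route({"event_type": "reading", "temperature": 85}))
print(route({"event_type": "heartbeat"}))
```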
Eventstream Sources
Common sources include:
Azure Event Hubs
High-volume event ingestion service.
Azure IoT Hub
Designed for IoT device communication.
Fabric Events
Events generated within Fabric workloads.
Custom Applications
Applications publishing events directly.
Eventstream Destinations
Eventstreams can route data to:
KQL Databases
Optimized for real-time analytics.
Lakehouses
Supports historical storage and analytics.
Eventhouse
Hosts one or more KQL databases and supports large-scale streaming workloads.
Activator
Supports automated actions and alerts.
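Activator rules are typically defined in the Fabric portal rather than written as code, but the idea behind a threshold alert can be sketched in Python. The hypothetical `ThresholdAlert` class below fires only when a reading crosses above the threshold, not on every reading that stays above it, which avoids repeated alerts for the same ongoing condition. A real solution would keep this state per device.

```python
class ThresholdAlert:
    """Hypothetical rule: alert when a value moves from <= threshold to > threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.above = False  # was the previous reading already above the threshold?

    def check(self, value: float) -> bool:
        crossed = value > self.threshold and not self.above
        self.above = value > self.threshold
        return crossed


rule = ThresholdAlert(threshold=80.0)
readings = [75.0, 82.0, 85.0, 79.0, 90.0]
alerts = [r for r in readings if rule.check(r)]
print(alerts)  # fires on the two upward crossings: 82.0 and 90.0
```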
Designing a Streaming Loading Pattern
A typical design includes:
Event Producer
↓
Eventstream
↓
Validation
↓
Transformation
↓
Storage Layer
↓
Analytics
Each stage serves a specific purpose.
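One way to picture the stages is as a chain of functions, each consuming the output of the previous one. The Python skeleton below is a hypothetical sketch: the stage logic is deliberately trivial, and the in-memory `storage` list stands in for a real destination such as a KQL database or Lakehouse table.

```python
from typing import Iterable, Iterator


def validate(events: Iterable[dict]) -> Iterator[dict]:
    # Validation: drop events with a missing temperature reading.
    return (e for e in events if e.get("temperature") is not None)


def transform(events: Iterable[dict]) -> Iterator[dict]:
    # Transformation: example conversion from Celsius to Fahrenheit.
    for e in events:
        yield {**e, "temperature_f": e["temperature"] * 9 / 5 + 32}


def load(events: Iterable[dict], storage: list[dict]) -> None:
    # Storage layer: stand-in for writing to a destination table.
    storage.extend(events)


storage: list[dict] = []
producer = [
    {"device": "d1", "temperature": 20.0},
    {"device": "d2", "temperature": None},
    {"device": "d3", "temperature": 30.0},
]
load(transform(validate(producer)), storage)
print([e["temperature_f"] for e in storage])  # [68.0, 86.0]
```

Because the stages are generators, each event flows through validation and transformation one at a time rather than waiting for the whole input.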
Step 1: Event Ingestion
The first step is capturing events from source systems.
Example:
Manufacturing Sensor
↓
Temperature Reading
↓
Eventstream
The ingestion layer must support:
- High throughput
- Reliability
- Scalability
Step 2: Data Validation
Streaming data often contains:
- Missing fields
- Invalid values
- Corrupt messages
Example:
Temperature = NULL
Such events may be:
- Rejected
- Corrected
- Routed elsewhere
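The three outcomes above can be combined in a single validation function. This minimal Python sketch assumes a hypothetical event shape with `device_id` and `temperature` fields and an assumed plausible range of -50 to 150. Only a safe, well-defined correction is applied (parsing a numeric string); everything else that fails is routed to a dead-letter list with a reason instead of being silently dropped.

```python
def validate_event(event: dict, valid: list, dead_letter: list) -> None:
    """Route an event to `valid` (possibly corrected) or to `dead_letter`."""
    if not event.get("device_id"):
        dead_letter.append({"event": event, "reason": "missing device_id"})
        return
    temp = event.get("temperature")
    if temp is None:
        dead_letter.append({"event": event, "reason": "temperature is NULL"})
        return
    if isinstance(temp, str):
        try:
            temp = float(temp)  # correct: numeric value sent as a string
        except ValueError:
            dead_letter.append({"event": event, "reason": "non-numeric temperature"})
            return
    if not -50 <= temp <= 150:  # assumed plausible sensor range
        dead_letter.append({"event": event, "reason": "out of range"})
        return
    valid.append({**event, "temperature": temp})


valid, dead_letter = [], []
for e in [
    {"device_id": "d1", "temperature": 72.5},
    {"device_id": "d2", "temperature": None},
    {"device_id": "d3", "temperature": "68.1"},
    {"device_id": "d4", "temperature": 999},
]:
    validate_event(e, valid, dead_letter)
print(len(valid), [d["reason"] for d in dead_letter])
```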
Step 3: Stream Transformation
Common transformations include:
Filtering
Remove unnecessary events.
Example:
Temperature > 80
Enrichment
Add contextual information.
Example:
Device ID + Location Data
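Filtering and enrichment can be sketched together in Python. The reference table `DEVICE_LOCATIONS` and the field names are hypothetical; in practice the reference data might come from a table in a Lakehouse, but here it is a plain dictionary so the example is self-contained.

```python
from typing import Iterable, Iterator

# Hypothetical reference data: device ID -> location.
DEVICE_LOCATIONS = {"d1": "Plant A", "d2": "Plant B"}


def filter_hot(events: Iterable[dict], threshold: float = 80.0) -> Iterator[dict]:
    """Filtering: keep only events with a temperature above the threshold."""
    return (e for e in events if e["temperature"] > threshold)


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Enrichment: add location from reference data (None if the device is unknown)."""
    for e in events:
        yield {**e, "location": DEVICE_LOCATIONS.get(e["device_id"])}


readings = [
    {"device_id": "d1", "temperature": 85.0},
    {"device_id": "d2", "temperature": 70.0},
    {"device_id": "d3", "temperature": 91.0},
]
print(list(enrich(filter_hot(readings))))
```

Filtering before enriching means the lookup runs only for events that will actually be kept.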
Aggregation
Combine multiple events.
Example:
Average Temperature Per Minute
Step 4: Storage
Streaming systems often separate:
Hot Storage
Recent data for immediate analysis.
Cold Storage
Historical data for long-term reporting.
Fabric commonly pairs a KQL database (hot storage) with a Lakehouse (cold storage) for this purpose.
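The hot/cold split can be sketched as writing every event to both stores, with the hot store trimmed to a short retention window. In this Python sketch the two in-memory structures and the 60-second retention are assumptions, standing in for a KQL database with a retention policy and an append-only Lakehouse Delta table respectively.

```python
from collections import deque


class HotColdStore:
    """Write each event to a short-retention hot store and an append-only cold store."""

    def __init__(self, hot_retention_s: int) -> None:
        self.hot_retention_s = hot_retention_s
        self.hot: deque = deque()   # recent events, oldest first
        self.cold: list[dict] = []  # full history

    def write(self, event: dict) -> None:
        self.cold.append(event)
        self.hot.append(event)
        # Evict hot events older than the retention window (assumes in-order arrival).
        while self.hot and event["ts"] - self.hot[0]["ts"] > self.hot_retention_s:
            self.hot.popleft()


store = HotColdStore(hot_retention_s=60)
for ts in (0, 30, 70, 100):
    store.write({"ts": ts})
print([e["ts"] for e in store.hot], len(store.cold))  # [70, 100] 4
```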
KQL Databases
KQL databases are optimized for:
- Time-series data
- Telemetry
- Log analytics
- Streaming workloads
Benefits include:
- Fast ingestion
- High query performance
- Real-time dashboards
For DP-700, KQL databases are frequently associated with streaming scenarios.
Lakehouse Streaming Storage
Streaming data can also be written into Delta tables within a Lakehouse.
Benefits:
- Historical retention
- Data science workloads
- Machine learning
- Unified analytics
This pattern combines real-time and batch analytics.
Eventhouse
Eventhouse is designed for:
- Large-scale event analytics
- Streaming workloads
- Real-time intelligence solutions
An Eventhouse contains one or more KQL databases and is a common destination for Eventstreams.
Windowing Concepts
Streaming systems often process data using windows.
A window groups events together for calculations.
Tumbling Window
Fixed non-overlapping intervals.
Example:
12:00-12:05, 12:05-12:10, 12:10-12:15
Each event belongs to one window.
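A tumbling window can be computed by flooring each event's timestamp to the window size, so every event maps to exactly one half-open window `[start, start + size)`. This Python sketch assumes timestamps in seconds since midnight (a hypothetical representation chosen to keep the arithmetic readable) and computes an average temperature per 5-minute window, the same kind of per-interval aggregation described earlier.

```python
from collections import defaultdict


def tumbling_avg(events: list[tuple[int, float]], size_s: int) -> dict[int, float]:
    """Average value per fixed, non-overlapping window, keyed by window start."""
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for ts, value in events:
        start = ts - ts % size_s  # floor to the window boundary
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# 12:00 = 43,200 seconds after midnight; windows are 300 s (5 minutes) wide.
events = [(43200, 70.0), (43260, 72.0), (43500, 80.0), (43620, 84.0), (43800, 90.0)]
print(tumbling_avg(events, size_s=300))  # {43200: 71.0, 43500: 82.0, 43800: 90.0}
```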
Sliding Window
Windows overlap.
Example:
Every minute, calculate over the last 5 minutes
Provides continuously updated (rolling) calculations. Because windows overlap, a single event can contribute to several results.
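A sliding window can be sketched by evaluating, at each hop (every minute), an aggregate over the trailing window (the last 5 minutes). This Python example assumes timestamps in seconds, counts events per window, and takes explicit start and end evaluation times for clarity. It rescans all events at each hop, which is fine for illustration; streaming engines maintain incremental state instead.

```python
def sliding_counts(
    timestamps: list[int], size_s: int, hop_s: int, start: int, end: int
) -> list[tuple[int, int]]:
    """At each hop time t, count events in the trailing window (t - size_s, t]."""
    results = []
    for t in range(start, end + 1, hop_s):
        count = sum(1 for ts in timestamps if t - size_s < ts <= t)
        results.append((t, count))
    return results


# Evaluate every 60 s over the last 300 s.
arrivals = [30, 50, 200, 310, 400]
print(sliding_counts(arrivals, size_s=300, hop_s=60, start=300, end=420))
```

The event at `200` falls inside all three overlapping windows, so it contributes to every result.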
Session Window
Groups events into sessions of activity; a session closes after a defined period of inactivity (the session gap or timeout).
Example:
User Activity Session
Useful for clickstream analysis.
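Session windows have no fixed size: a session stays open while events keep arriving and closes once the gap since the previous event exceeds a timeout. This Python sketch assumes a single user's click timestamps in seconds, already sorted, and a hypothetical 30-minute (1,800-second) gap.

```python
def sessionize(timestamps: list[int], gap_s: int) -> list[list[int]]:
    """Split sorted timestamps into sessions wherever the gap exceeds gap_s."""
    sessions: list[list[int]] = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= gap_s:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity gap exceeded: start a new session
    return sessions


clicks = [0, 120, 600, 4000, 4100, 9000]
print(sessionize(clicks, gap_s=1800))  # [[0, 120, 600], [4000, 4100], [9000]]
```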
Checkpointing
Checkpointing tracks processing progress.
Purpose:
- Recovery after failures
- Prevent data loss
- Reduce duplicate processing
Without checkpointing:
System Failure
↓
Reprocess Everything
With checkpointing:
System Failure
↓
Resume From Last Checkpoint
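The difference can be sketched with a processor that persists the offset of the next event to process. This minimal Python example stores the checkpoint as a JSON file in a temporary directory (an assumption for illustration; Spark Structured Streaming, for instance, manages its own checkpoint location) and simulates a failure partway through. On restart, processing resumes from the saved offset instead of from the beginning.

```python
import json
import os
import tempfile
from typing import Optional


def load_offset(path: str) -> int:
    """Return the next offset to process, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_offset(path: str, next_offset: int) -> None:
    """Persist progress atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp, path)


def run(events: list[str], path: str, sink: list[str], fail_at: Optional[int] = None) -> None:
    for offset in range(load_offset(path), len(events)):
        if offset == fail_at:
            raise RuntimeError(f"simulated failure at offset {offset}")
        sink.append(events[offset])    # process the event
        save_offset(path, offset + 1)  # record progress only after success


events = ["e0", "e1", "e2", "e3", "e4"]
sink: list[str] = []
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run(events, ckpt, sink, fail_at=3)  # crashes after e0..e2 are processed
except RuntimeError:
    pass
run(events, ckpt, sink)                 # restart resumes at offset 3, not 0
print(sink)
```

Because the offset is saved after each event is processed, a crash between those two steps would reprocess one event. This at-least-once behavior is why checkpointing reduces, but does not by itself eliminate, duplicates.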
Fault Tolerance
Streaming architectures must handle failures.
Strategies include:
Retry Logic
Automatically retry failed operations.
Checkpointing
Resume processing after failures.
Durable Storage
Persist data before processing.
Dead-Letter Queues
Store problematic events for investigation.
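Retry logic and a dead-letter queue often work together: transient failures are retried with increasing delays, and events that still fail are set aside rather than blocking the stream. This Python sketch uses a hypothetical `write` callable that may raise an exception; the attempt count and base delay are assumptions.

```python
import time
from typing import Callable


def deliver(event: dict, write: Callable[[dict], None], dead_letter: list[dict],
            max_attempts: int = 3, base_delay_s: float = 0.1) -> bool:
    """Write an event with exponential backoff; dead-letter it if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            write(event)
            return True
        except Exception as exc:  # in practice, retry only transient error types
            if attempt == max_attempts:
                dead_letter.append({"event": event, "error": str(exc), "attempts": attempt})
                return False
            time.sleep(base_delay_s * 2 ** (attempt - 1))  # delays double each time
    return False


calls = {"n": 0}


def flaky_write(event: dict) -> None:
    # Fails twice, then succeeds: simulates a transient outage.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("destination unavailable")


def broken_write(event: dict) -> None:
    raise ValueError("schema mismatch")


dlq: list[dict] = []
print(deliver({"id": 1}, flaky_write, dlq, base_delay_s=0.01))   # True, on the third attempt
print(deliver({"id": 2}, broken_write, dlq, base_delay_s=0.01))  # False, dead-lettered
print(dlq)
```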
Event Ordering
Events may arrive out of sequence.
Example:
Event 3, Event 1, Event 2
Streaming solutions may require:
- Event timestamps
- Watermarks
- Reordering logic
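Event timestamps, watermarks, and reordering can be sketched together. Events are buffered, and an event is released in timestamp order only once the watermark (the maximum event time seen so far minus an allowed lateness) has passed it; an event older than the current watermark is treated as too late and set aside. This Python sketch assumes integer event timestamps and a hypothetical allowed lateness of 5 seconds. Streaming engines such as Spark Structured Streaming use watermarks mainly to decide when a window's result can be finalized and its state discarded.

```python
import heapq
from typing import Iterable, Iterator


def reorder(events: Iterable[tuple[int, str]], lateness_s: int,
            late: list[tuple[int, str]]) -> Iterator[tuple[int, str]]:
    """Emit (event_time, payload) pairs in event-time order using a watermark."""
    buffer: list[tuple[int, str]] = []  # min-heap ordered by event time
    max_seen = None
    for ts, payload in events:
        if max_seen is not None and ts < max_seen - lateness_s:
            late.append((ts, payload))  # older than the watermark: too late
            continue
        heapq.heappush(buffer, (ts, payload))
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - lateness_s
        # Release everything at or below the watermark; no accepted event can precede it.
        while buffer and buffer[0][0] <= watermark:
            yield heapq.heappop(buffer)
    while buffer:  # end of input: flush the remainder in order
        yield heapq.heappop(buffer)


arrivals = [(1, "a"), (3, "c"), (2, "b"), (9, "d"), (4, "e"), (1, "x")]
late: list[tuple[int, str]] = []
ordered = list(reorder(arrivals, lateness_s=5, late=late))
print(ordered)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'e'), (9, 'd')]
print(late)     # [(1, 'x')]
```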
Scalability Considerations
Streaming systems must scale with event volume.
Important considerations:
Throughput
Events processed per second.
Parallelism
Multiple processors handling data simultaneously.
Partitioning
Distributing events across resources.
Resource Management
Balancing cost and performance.
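Partitioning is commonly done by hashing a key, such as a device ID, so that all events for one key land in the same partition and keep their relative order, while different partitions can be processed in parallel. This is the same principle behind partition keys in services such as Azure Event Hubs. The Python sketch below uses a stable CRC32 hash from the standard library (Python's built-in `hash()` for strings is randomized per process, so it is unsuitable here) and an assumed partition count of 4.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (same key -> same partition)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_events(events: list[dict], num_partitions: int) -> dict[int, list[dict]]:
    """Distribute events across partitions by device_id, preserving per-key order."""
    partitions: dict[int, list[dict]] = defaultdict(list)
    for e in events:
        partitions[partition_for(e["device_id"], num_partitions)].append(e)
    return dict(partitions)


events = [{"device_id": f"d{i % 3}", "seq": i} for i in range(9)]
parts = partition_events(events, num_partitions=4)
for p, evs in sorted(parts.items()):
    print(p, [(e["device_id"], e["seq"]) for e in evs])
```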
Streaming vs Batch Loading in Fabric
| Characteristic | Batch | Streaming |
|---|---|---|
| Latency | Minutes to Hours | Seconds |
| Trigger | Schedule | Event |
| Processing | Periodic | Continuous |
| Use Case | Historical Reporting | Operational Analytics |
| Architecture | Simpler | More Complex |
Common Fabric Streaming Patterns
Pattern 1: IoT Analytics
IoT Devices
↓
Eventstream
↓
KQL Database
↓
Real-Time Dashboard
Pattern 2: Operational Monitoring
Applications
↓
Eventstream
↓
Eventhouse
↓
Alerts
Pattern 3: Real-Time + Historical Analytics
Events
↓
Eventstream
↓
Lakehouse
↓
Delta Tables
↓
Analytics
Common DP-700 Exam Scenarios
Scenario 1
A company wants dashboards updated within seconds of receiving telemetry.
Best solution:
Streaming ingestion using Eventstreams and KQL databases
Scenario 2
A manufacturing system generates millions of sensor events daily.
Best solution:
Eventstream → Eventhouse → KQL Database
Scenario 3
An organization wants real-time analytics and historical reporting.
Best solution:
Eventstream → Lakehouse → Delta Tables
Scenario 4
A system must automatically alert users when a sensor exceeds a threshold.
Best solution:
Streaming ingestion with Data Activator
Best Practices
Use Eventstreams for Ingestion
Provides scalable event routing and transformation.
Use KQL Databases for Real-Time Analytics
Optimized for telemetry and time-series data.
Store Historical Data in Lakehouses
Supports long-term analytics and machine learning.
Implement Checkpointing
Improves reliability and recovery.
Design for Scalability
Plan for growth in event volume.
Validate Data Early
Prevent poor-quality events from contaminating downstream systems.
DP-700 Exam Focus Areas
You should understand:
✓ Streaming vs batch processing
✓ Event-driven architectures
✓ Eventstreams
✓ Eventhouse
✓ KQL databases
✓ Real-time analytics
✓ Near real-time processing
✓ Windowing concepts
✓ Streaming transformations
✓ Event routing
✓ Checkpointing
✓ Fault tolerance
✓ Lakehouse streaming ingestion
✓ Real-Time Intelligence workloads
Practice Exam Questions
Question 1
A company requires dashboards to update within seconds of receiving IoT telemetry. Which loading pattern should be implemented?
A. Weekly snapshot loading
B. Daily batch processing
C. Streaming ingestion
D. Full data reloads
Answer: C
Explanation
Streaming ingestion provides low-latency processing and supports near real-time dashboard updates.
Question 2
Which Microsoft Fabric component is primarily used to ingest, route, and transform streaming events?
A. Dataflow Gen2
B. Eventstream
C. Warehouse
D. Deployment Pipeline
Answer: B
Explanation
Eventstreams are specifically designed for real-time event ingestion, transformation, and routing.
Question 3
A data engineer needs a destination optimized for time-series analytics and rapid ingestion of telemetry data.
Which destination should be selected?
A. Lakehouse
B. Warehouse
C. KQL Database
D. Dataflow Gen2
Answer: C
Explanation
KQL databases are optimized for real-time analytics, telemetry, and log data.
Question 4
What is the primary benefit of checkpointing in a streaming solution?
A. Enables recovery after processing failures
B. Compresses event data
C. Eliminates duplicates permanently
D. Encrypts incoming events
Answer: A
Explanation
Checkpointing records processing progress, allowing recovery from the last successful point after failures.
Question 5
Which window type uses fixed, non-overlapping intervals?
A. Session window
B. Tumbling window
C. Dynamic window
D. Watermark window
Answer: B
Explanation
Tumbling windows divide data into fixed intervals without overlap.
Question 6
An organization wants to preserve streaming data for long-term analytics and machine learning workloads.
Which destination is most appropriate?
A. Lakehouse
B. Data Activator
C. Eventstream
D. Workspace
Answer: A
Explanation
Lakehouses provide scalable storage and support advanced analytics and machine learning.
Question 7
Which characteristic most distinguishes streaming processing from batch processing?
A. Lower storage requirements
B. Simpler architecture
C. Continuous event processing
D. Larger processing windows
Answer: C
Explanation
Streaming systems process data continuously as events arrive rather than at scheduled intervals.
Question 8
A user activity analysis solution must group events based on periods of user activity separated by inactivity.
Which window type should be used?
A. Sliding window
B. Tumbling window
C. Fixed window
D. Session window
Answer: D
Explanation
Session windows are designed to group events according to user activity sessions.
Question 9
What is the primary purpose of event enrichment during stream processing?
A. Delete invalid records
B. Add contextual information to events
C. Increase event frequency
D. Reduce storage costs
Answer: B
Explanation
Enrichment adds additional business or reference data to incoming events to improve analytical value.
Question 10
A company requires a Fabric architecture that supports both real-time analytics and historical analysis of streaming data.
Which design is most appropriate?
A. Eventstream → KQL Database only
B. Dataflow Gen2 → Warehouse
C. Eventstream → Lakehouse → Delta Tables
D. Scheduled Pipeline → Warehouse
Answer: C
Explanation
Writing streaming data to a Lakehouse enables historical retention while supporting analytical workloads through Delta tables.
Exam Tip
For DP-700, remember the following associations:
| Requirement | Recommended Fabric Technology |
|---|---|
| Real-time event ingestion | Eventstream |
| Time-series analytics | KQL Database |
| Large-scale event analytics | Eventhouse |
| Long-term storage | Lakehouse |
| Automated event-driven actions | Data Activator |
| Continuous processing | Streaming Pattern |
| Scheduled processing | Batch Pattern |
A common exam clue is wording such as:
“Data must be available for analysis within seconds of being generated.”
When you see this requirement, the correct solution will almost always involve streaming ingestion, Eventstreams, and often KQL databases or Eventhouse, rather than traditional batch-oriented pipelines.
