This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
--> Ingest and transform streaming data
--> Choose an appropriate streaming engine
Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Overview
Modern analytics solutions increasingly rely on the ability to process data as it is generated rather than waiting for scheduled batch loads. Streaming data enables organizations to react to events in near real time, support operational analytics, monitor systems, detect anomalies, and power intelligent applications.
In Microsoft Fabric, selecting the appropriate streaming engine is a critical design decision. The DP-700 exam expects candidates to understand the strengths, limitations, and ideal use cases of the various streaming technologies available in Fabric and to choose the most appropriate option based on business requirements.
This article explores the major streaming engines and technologies within Microsoft Fabric, how they compare, and when to use each one.
What Is Streaming Data?
Streaming data is data that arrives continuously from sources such as:
- IoT devices
- Sensors
- Application logs
- Clickstream events
- Social media feeds
- Financial transactions
- Manufacturing equipment
- Website activity
- Real-time telemetry
Unlike batch processing, where data is collected and processed periodically, streaming systems process data as events arrive.
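The difference can be sketched in a few lines of plain Python. This is only a conceptual illustration: the function names and the sample temperature readings are hypothetical and are not part of any Fabric API.

```python
from typing import Iterable, Iterator, List


def to_fahrenheit(event: dict) -> dict:
    """Shared transformation: add a Fahrenheit field to a reading."""
    return {**event, "fahrenheit": event["celsius"] * 9 / 5 + 32}


def process_batch(events: List[dict]) -> List[dict]:
    """Batch style: the full collection must exist before any result is produced."""
    return [to_fahrenheit(e) for e in events]


def process_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Streaming style: each event is handled and emitted as soon as it arrives."""
    for e in events:
        yield to_fahrenheit(e)


# Hypothetical sample readings.
readings = [
    {"sensor": "s1", "celsius": 20.0},
    {"sensor": "s2", "celsius": 100.0},
]

batch_result = process_batch(readings)   # nothing is available until the whole list is processed
stream = process_stream(iter(readings))
first = next(stream)                     # the first result is available before s2 is even read
```

Both styles apply the same transformation; what changes is when each result becomes available.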
Common requirements include:
- Low-latency processing
- Real-time dashboards
- Event detection
- Alert generation
- Continuous data ingestion
- Streaming analytics
Streaming Technologies in Microsoft Fabric
The primary streaming technologies that data engineers encounter in Fabric include:
| Technology | Primary Purpose |
|---|---|
| Eventstream | Real-time event ingestion and routing |
| Eventhouse | Real-time analytics using KQL; hosts one or more KQL databases |
| KQL Database | High-performance streaming analytics store inside an Eventhouse |
| Real-Time Intelligence | End-to-end real-time analytics platform |
| Spark Structured Streaming | Large-scale streaming transformations |
| Data Activator | Event-driven actions and alerts |
| Pipelines | Scheduled orchestration (not true streaming) |
Understanding when to use each is essential for the exam.
Eventstream
What Is Eventstream?
Eventstream is Fabric’s low-code real-time ingestion service.
It captures, transforms, filters, and routes streaming events from multiple sources to multiple destinations.
Think of Eventstream as the ingestion layer of a streaming architecture.
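Eventstream itself is configured through a low-code editor rather than written as code, so the following is not its API. It is a minimal Python model of the capture, filter, and route pattern described above, with hypothetical destination names and event fields.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Destination:
    """Hypothetical stand-in for a sink such as an Eventhouse or a Lakehouse."""
    name: str
    accepts: Callable[[Event], bool]              # filter applied before delivery
    received: List[Event] = field(default_factory=list)


def route(event: Event, destinations: List[Destination]) -> List[str]:
    """Deliver one event to every destination whose filter accepts it."""
    delivered = []
    for dest in destinations:
        if dest.accepts(event):
            dest.received.append(dict(event))     # each consumer gets its own copy
            delivered.append(dest.name)
    return delivered


# One stream, two consumers: the store keeps everything, the alerting path only hot readings.
eventhouse = Destination("eventhouse", accepts=lambda e: True)
alerts = Destination("activator", accepts=lambda e: e["temperature_c"] > 90)

incoming = [
    {"device": "pump-1", "temperature_c": 72},
    {"device": "pump-2", "temperature_c": 95},
]
for event in incoming:
    route(event, [eventhouse, alerts])
```

The point of the model is the fan-out: one incoming stream feeds several downstream consumers, each with its own filter.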
Common Sources
Eventstream can ingest data from:
- Azure Event Hubs
- Kafka endpoints
- Fabric events
- IoT sources
- Real-time telemetry systems
- Custom event producers
Common Destinations
Eventstream can send data to:
- Eventhouse
- KQL Databases
- Lakehouses
- Custom destinations
- Activator (Data Activator)
Best Use Cases
Choose Eventstream when:
- Events must be continuously ingested
- Minimal coding is desired
- Data routing is required
- Multiple downstream consumers need the same events
- Building real-time analytics solutions
Exam Tip
If a scenario focuses on ingesting and routing real-time events, Eventstream is usually the best answer.
Eventhouse
What Is Eventhouse?
Eventhouse is a Real-Time Intelligence component optimized for storing and analyzing streaming data.
It is built on Kusto technology and provides:
- High ingestion rates
- Near real-time analytics
- Time-series analysis
- Log analytics
- Event exploration
Key Characteristics
- Optimized for append-only data
- Supports KQL
- Fast query performance
- Near real-time visibility
- Massive scalability
Best Use Cases
Use Eventhouse when:
- Large volumes of events arrive continuously
- Log analytics is required
- Telemetry analysis is needed
- Operational dashboards require low latency
Examples:
- Website activity monitoring
- Application diagnostics
- Manufacturing telemetry
- Security monitoring
KQL Databases
What Is a KQL Database?
A KQL database is the storage and query engine behind many real-time solutions in Fabric. Each KQL database lives inside an Eventhouse.
It uses Kusto Query Language (KQL) and is highly optimized for:
- Streaming ingestion
- Log analytics
- Time-series data
- Event correlation
Advantages
- Extremely fast analytical queries
- Handles high ingestion volumes
- Rich time-series functions
- Powerful aggregation capabilities
Best Use Cases
Choose KQL databases when:
- Event analysis is the primary objective
- Massive event volumes exist
- Time-based analysis is required
- Operational monitoring is needed
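Time-based analysis often comes down to aggregating events into fixed-width time buckets. In KQL this is commonly written with `summarize` and `bin()`. The Python sketch below models the same idea so the logic is easy to follow; the sample timestamps and the `AppLogs` table name in the comment are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable

EPOCH = datetime(1970, 1, 1)


def count_per_bucket(timestamps: Iterable[datetime], width: timedelta) -> Dict[datetime, int]:
    """Count events per fixed-width time bucket, keyed by bucket start time."""
    counts: Counter = Counter()
    for ts in timestamps:
        # Floor the timestamp to the start of its bucket.
        buckets_since_epoch = (ts - EPOCH) // width
        counts[EPOCH + buckets_since_epoch * width] += 1
    return dict(counts)


# Roughly what a KQL query such as
#   AppLogs | summarize count() by bin(Timestamp, 1m)
# would compute (AppLogs and Timestamp are assumed names).
log_times = [
    datetime(2025, 1, 1, 12, 0, 5),
    datetime(2025, 1, 1, 12, 0, 40),
    datetime(2025, 1, 1, 12, 1, 10),
]
per_minute = count_per_bucket(log_times, timedelta(minutes=1))
```

A KQL database runs this kind of aggregation inside the engine over continuously ingested data, which is why it suits operational monitoring better than pulling raw events into application code.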
Spark Structured Streaming
What Is Structured Streaming?
Spark Structured Streaming enables continuous processing using Apache Spark.
Unlike Eventstream and Eventhouse, Spark Structured Streaming is developer-focused: you write and run the processing logic as code, typically in a notebook or a Spark job definition.
Supported languages include:
- Python (PySpark)
- Scala
- Spark SQL
Capabilities
Spark Structured Streaming supports:
- Complex transformations
- Data enrichment
- Machine learning integration
- Streaming joins
- Stateful processing
- Advanced business logic
Best Use Cases
Choose Spark Structured Streaming when:
- Complex transformations are required
- Large-scale processing is needed
- Machine learning must be integrated
- Events must be joined with reference datasets
- Custom code is acceptable
Examples:
- Fraud detection
- Customer behavior analytics
- Streaming feature engineering
- Predictive maintenance
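Two of the capabilities listed above, joining events with reference data and stateful processing, can be modeled in plain Python. This sketch deliberately avoids the Spark API so that it runs without a cluster; in Fabric the same logic would be written against Spark's streaming DataFrames. The account names, risk tiers, and limits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator

# Hypothetical static reference data: account -> risk tier.
ACCOUNT_TIERS = {"acct-1": "standard", "acct-2": "high_risk"}
# Hypothetical business rule: per-tier limit on cumulative spend.
TIER_LIMITS = {"standard": 1000.0, "high_risk": 200.0}


def flag_transactions(transactions: Iterable[dict]) -> Iterator[dict]:
    """Enrich each transaction and flag accounts whose running total exceeds the tier limit."""
    running_total: Dict[str, float] = defaultdict(float)  # state kept across events
    for tx in transactions:
        account = tx["account"]
        tier = ACCOUNT_TIERS.get(account, "standard")      # enrichment join with reference data
        running_total[account] += tx["amount"]             # stateful update
        yield {
            **tx,
            "tier": tier,
            "running_total": running_total[account],
            "suspicious": running_total[account] > TIER_LIMITS[tier],
        }


stream = [
    {"account": "acct-2", "amount": 150.0},
    {"account": "acct-1", "amount": 150.0},
    {"account": "acct-2", "amount": 100.0},
]
results = list(flag_transactions(stream))
```

Only the third transaction is flagged: `acct-2` is in the high-risk tier, and its running total of 250.0 exceeds the 200.0 limit. The per-account total is exactly the kind of state that a streaming engine has to keep between events.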
Exam Tip
If a scenario requires advanced coding and transformation logic, Spark Structured Streaming is often the correct answer.
Real-Time Intelligence
What Is Real-Time Intelligence?
Real-Time Intelligence is Fabric’s complete platform for handling real-time data workloads.
It combines:
- Eventstream
- Eventhouse
- KQL Databases
- Data Activator
- Real-time dashboards
Benefits
Real-Time Intelligence provides:
- End-to-end streaming architecture
- Real-time monitoring
- Event processing
- Alerting
- Operational analytics
Best Use Cases
Use Real-Time Intelligence when an organization needs:
- Comprehensive streaming analytics
- Operational dashboards
- Real-time monitoring
- Event-driven insights
Data Activator
What Is Data Activator?
Data Activator (named Activator in current Fabric documentation) monitors events and automatically takes actions when specified conditions occur.
Examples include:
- Sending emails
- Triggering workflows
- Generating notifications
- Creating alerts
Example
If machine temperature exceeds 90°C:
- Generate an alert
- Notify engineers
- Open a support ticket
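The rule in this example can be expressed as a small Python sketch. Data Activator rules are defined in a low-code interface, so this is not how the rule is authored in Fabric. It is only a model of the condition-then-actions behavior, and firing only when the temperature first crosses the threshold is an assumption made here to avoid repeated alerts.

```python
from typing import Callable, Iterable, List

THRESHOLD_C = 90.0  # threshold taken from the example above


def monitor(readings: Iterable[float], actions: List[Callable[[float], None]]) -> int:
    """Run every action each time the temperature rises above the threshold.

    Fires on the transition from normal to breached, so a machine that stays
    hot does not trigger the actions again on every reading.
    Returns the number of times the rule fired.
    """
    fired = 0
    breached = False
    for temp in readings:
        if temp > THRESHOLD_C and not breached:
            for act in actions:
                act(temp)
            fired += 1
        breached = temp > THRESHOLD_C
    return fired


# Hypothetical actions that just record what they would do.
log: List[str] = []
actions = [
    lambda t: log.append(f"alert: {t}°C"),
    lambda t: log.append("notify engineers"),
    lambda t: log.append("open support ticket"),
]
fired = monitor([85.0, 91.5, 93.0, 88.0, 95.0], actions)
```

With these readings the rule fires twice, once at 91.5 and again at 95.0 after the temperature has dropped back below the threshold.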
Best Use Cases
Choose Data Activator when:
- Business users need alerts
- Event-driven automation is required
- Low-code monitoring is desired
Pipelines Are Not Streaming Engines
A common DP-700 exam trap is confusing pipelines with streaming solutions.
Pipelines:
- Execute scheduled workloads
- Orchestrate activities
- Handle batch data movement
Pipelines do NOT provide continuous event processing.
Appropriate Pipeline Scenarios
- Daily data loads
- Weekly ETL jobs
- Scheduled orchestration
- Batch transformations
Inappropriate Pipeline Scenarios
- Second-by-second monitoring
- Real-time alerts
- Continuous event processing
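A quick calculation shows why a schedule cannot meet these requirements. The sketch below assumes a pipeline that runs at fixed intervals starting at time zero and picks up every event that arrived at or before the start of a run; the function name is hypothetical.

```python
import math


def batch_detection_delay(event_time_s: float, interval_s: float) -> float:
    """Seconds an event waits until the next scheduled run picks it up.

    Assumes runs start at t = 0, interval_s, 2 * interval_s, and so on.
    """
    next_run = math.ceil(event_time_s / interval_s) * interval_s
    return next_run - event_time_s


# A threshold breach 61 seconds after an hourly run started
# is not seen until the next run, almost an hour later.
hourly_delay = batch_detection_delay(61, 3600)

# With a daily schedule, an event at t = 90,000 s waits for the run at t = 172,800 s.
daily_delay = batch_detection_delay(90_000, 86_400)
```

Here `hourly_delay` is 3539 seconds and `daily_delay` is 82,800 seconds, before the run's own processing time is counted. A streaming engine handles the same event as it arrives.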
Selecting the Appropriate Streaming Engine
Scenario 1: IoT Sensor Telemetry
Requirements:
- Millions of sensor events
- Real-time monitoring
- Fast analytics
Best choice:
Eventstream + Eventhouse
Scenario 2: Fraud Detection
Requirements:
- Stream transactions
- Apply advanced business rules
- Perform enrichment joins
Best choice:
Spark Structured Streaming
Scenario 3: Website Log Analysis
Requirements:
- Continuous ingestion
- Fast aggregations
- Time-series analysis
Best choice:
KQL Database (in an Eventhouse)
Scenario 4: Equipment Failure Alerts
Requirements:
- Detect threshold breaches
- Notify operators
Best choice:
Data Activator
Scenario 5: Enterprise Real-Time Analytics Platform
Requirements:
- Complete streaming solution
- Dashboards
- Alerts
- Analytics
Best choice:
Real-Time Intelligence
Comparison of Streaming Engines
| Requirement | Recommended Technology |
|---|---|
| Event ingestion | Eventstream |
| Event routing | Eventstream |
| Real-time analytics | Eventhouse |
| Log analytics | KQL Database |
| Time-series analysis | KQL Database |
| Complex transformations | Spark Structured Streaming |
| Machine learning on streams | Spark Structured Streaming |
| Alerts and notifications | Data Activator |
| Complete real-time platform | Real-Time Intelligence |
| Scheduled ETL | Pipelines |
DP-700 Exam Tips
Remember these key distinctions:
- Eventstream = ingestion and routing.
- Eventhouse = real-time storage and analytics.
- KQL Database = high-performance event analytics.
- Spark Structured Streaming = advanced code-based processing.
- Data Activator = alerts and automated actions.
- Pipelines = orchestration, not streaming.
- Real-Time Intelligence = end-to-end streaming solution.
Many exam questions focus on matching business requirements to the correct streaming technology.
Practice Exam Questions
Question 1
A company needs to ingest streaming telemetry from thousands of IoT devices and route the data to multiple downstream consumers.
Which Fabric component should be used?
A. Data Activator
B. Eventstream
C. Pipeline
D. Notebook
Answer: B
Explanation:
Eventstream is specifically designed for real-time event ingestion and routing. Data Activator generates actions, pipelines handle batch orchestration, and notebooks perform transformations rather than ingestion.
Question 2
A solution requires advanced stream processing with custom Python code, joins against reference datasets, and machine learning inference.
Which technology should be selected?
A. Eventhouse
B. Spark Structured Streaming
C. KQL Database
D. Data Activator
Answer: B
Explanation:
Spark Structured Streaming supports complex transformations, enrichment, stateful processing, and machine learning integration through PySpark.
Question 3
A team needs extremely fast analytics over continuously arriving log data and plans to use KQL.
Which storage engine is most appropriate?
A. KQL Database
B. Dataflow Gen2
C. Warehouse
D. Pipeline
Answer: A
Explanation:
KQL databases are optimized for streaming ingestion, time-series analysis, and log analytics.
Question 4
A business user wants automatic notifications whenever inventory levels fall below a threshold.
Which Fabric component is best suited?
A. Eventstream
B. Notebook
C. Data Activator
D. Pipeline
Answer: C
Explanation:
Data Activator monitors data conditions and triggers automated actions such as alerts and notifications.
Question 5
Which Fabric component is primarily responsible for routing real-time events to destinations?
A. Warehouse
B. Eventstream
C. Dataflow Gen2
D. Notebook
Answer: B
Explanation:
Eventstream serves as the ingestion and routing layer for streaming architectures.
Question 6
A company requires an end-to-end platform for ingesting, storing, analyzing, and monitoring streaming events.
Which solution should be recommended?
A. Real-Time Intelligence
B. Dataflow Gen2
C. Warehouse
D. SQL Endpoint
Answer: A
Explanation:
Real-Time Intelligence combines ingestion, analytics, monitoring, alerting, and visualization capabilities into a unified platform.
Question 7
Which technology is best suited for analyzing application logs with time-series queries and low-latency reporting?
A. Notebook
B. Warehouse
C. Eventhouse
D. Pipeline
Answer: C
Explanation:
Eventhouse is optimized for streaming analytics, log analysis, and time-series workloads.
Question 8
A solution requires nightly ingestion of source data into a lakehouse.
Which option is most appropriate?
A. Eventstream
B. Data Activator
C. Eventhouse
D. Pipeline
Answer: D
Explanation:
Nightly ingestion is a batch process and is best handled through scheduled pipeline execution.
Question 9
A data engineer needs to continuously enrich streaming events using lookup data and perform custom business-rule calculations.
Which technology should be selected?
A. Spark Structured Streaming
B. Data Activator
C. Eventstream
D. Dashboard
Answer: A
Explanation:
Spark Structured Streaming provides advanced transformation capabilities including joins, aggregations, and custom code execution.
Question 10
Which statement best describes Eventhouse?
A. A workflow orchestration service for ETL processes
B. A low-code data preparation tool
C. A real-time analytics store optimized for event and telemetry data
D. A machine learning training environment
Answer: C
Explanation:
Eventhouse is designed for high-scale event ingestion, real-time analytics, log analytics, and KQL-based querying of streaming data.
