Choose an appropriate streaming engine (DP-700 Exam Prep)

This post is part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Choose an appropriate streaming engine


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Overview

Modern analytics solutions increasingly rely on the ability to process data as it is generated rather than waiting for scheduled batch loads. Streaming data enables organizations to react to events in near real time, support operational analytics, monitor systems, detect anomalies, and power intelligent applications.

In Microsoft Fabric, selecting the appropriate streaming engine is a critical design decision. The DP-700 exam expects candidates to understand the strengths, limitations, and ideal use cases of the various streaming technologies available in Fabric and to choose the most appropriate option based on business requirements.

This article explores the major streaming engines and technologies within Microsoft Fabric, how they compare, and when to use each one.


What Is Streaming Data?

Streaming data is data that arrives continuously from sources such as:

  • IoT devices
  • Sensors
  • Application logs
  • Clickstream events
  • Social media feeds
  • Financial transactions
  • Manufacturing equipment
  • Website activity
  • Real-time telemetry

Unlike batch processing, where data is collected and processed periodically, streaming systems process data as events arrive.
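The difference can be shown with a minimal Python sketch. It is only an illustration of the two processing styles, not Fabric code; the function names and sample values are hypothetical.

```python
from typing import Iterable, Iterator


def batch_total(readings: list[float]) -> float:
    """Batch style: wait until the whole collection exists, then process it once."""
    return sum(readings)


def streaming_totals(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated result as each event arrives."""
    running = 0.0
    for value in readings:
        running += value
        yield running


events = [2.0, 3.5, 1.5]  # hypothetical sensor readings
print(batch_total(events))             # 7.0
print(list(streaming_totals(events)))  # [2.0, 5.5, 7.0]
```

The batch function produces one answer after all data is available, while the streaming function has a usable answer after every event.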

Common requirements include:

  • Low-latency processing
  • Real-time dashboards
  • Event detection
  • Alert generation
  • Continuous data ingestion
  • Streaming analytics

Streaming Technologies in Microsoft Fabric

The primary streaming technologies that data engineers encounter in Fabric include:

| Technology | Primary Purpose |
| --- | --- |
| Eventstream | Real-time event ingestion and routing |
| Eventhouse | Real-time analytics using KQL |
| KQL Database | High-performance streaming analytics |
| Real-Time Intelligence | End-to-end real-time analytics platform |
| Spark Structured Streaming | Large-scale streaming transformations |
| Data Activator | Event-driven actions and alerts |
| Pipelines | Scheduled orchestration (not true streaming) |

Understanding when to use each is essential for the exam.


Eventstream

What Is Eventstream?

Eventstream is Fabric’s low-code real-time ingestion service.

It captures, transforms, filters, and routes streaming events from multiple sources to multiple destinations.

Think of Eventstream as the ingestion layer of a streaming architecture.


Common Sources

Eventstream can ingest data from:

  • Azure Event Hubs
  • Kafka endpoints
  • Fabric events
  • IoT sources
  • Real-time telemetry systems
  • Custom event producers

Common Destinations

Eventstream can send data to:

  • Eventhouse
  • KQL Databases
  • Lakehouses
  • Custom destinations
  • Activator

Best Use Cases

Choose Eventstream when:

  • Events must be continuously ingested
  • Minimal coding is desired
  • Data routing is required
  • Multiple downstream consumers need the same events
  • A real-time analytics solution is being built

Exam Tip

If a scenario focuses on ingesting and routing real-time events, Eventstream is usually the best answer.
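The ingest-filter-route pattern that Eventstream provides without code can be pictured with a small Python sketch. This is a conceptual model only, not the Eventstream API: the `route_events` function, the destination names, and the sample events are all hypothetical.

```python
from typing import Callable, Iterable

Event = dict
# Each route pairs a filter predicate with the sink that receives matching events.
Route = tuple[Callable[[Event], bool], list[Event]]


def route_events(events: Iterable[Event], routes: dict[str, Route]) -> None:
    """Deliver every incoming event to each destination whose filter accepts it."""
    for event in events:
        for predicate, sink in routes.values():
            if predicate(event):
                sink.append(event)


# Hypothetical destinations: one receives everything, one only hot readings.
analytics_store: list[Event] = []
alerting: list[Event] = []
routes = {
    "analytics_store": (lambda e: True, analytics_store),
    "alerting": (lambda e: e["temperature_c"] > 90, alerting),
}

incoming = [
    {"device": "pump-1", "temperature_c": 72},
    {"device": "pump-2", "temperature_c": 95},
]
route_events(incoming, routes)
print(len(analytics_store), len(alerting))  # 2 1
```

The point of the sketch is the fan-out: one stream of events feeds several downstream consumers, each with its own filter.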


Eventhouse

What Is Eventhouse?

Eventhouse is a Real-Time Intelligence component optimized for storing and analyzing streaming data.

It is built on Kusto technology and provides:

  • High ingestion rates
  • Near real-time analytics
  • Time-series analysis
  • Log analytics
  • Event exploration

Key Characteristics

  • Optimized for append-only data
  • Supports KQL
  • Fast query performance
  • Near real-time visibility
  • Massive scalability

Best Use Cases

Use Eventhouse when:

  • Large volumes of events arrive continuously
  • Log analytics is required
  • Telemetry analysis is needed
  • Operational dashboards require low latency

Examples:

  • Website activity monitoring
  • Application diagnostics
  • Manufacturing telemetry
  • Security monitoring

KQL Databases

What Is a KQL Database?

A KQL database is the storage and query engine behind many real-time solutions. In Fabric, KQL databases are created inside an Eventhouse, which can host one or more of them.

It uses Kusto Query Language (KQL) and is highly optimized for:

  • Streaming ingestion
  • Log analytics
  • Time-series data
  • Event correlation

Advantages

  • Extremely fast analytical queries
  • Handles high ingestion volumes
  • Rich time-series functions
  • Powerful aggregation capabilities

Best Use Cases

Choose KQL databases when:

  • Event analysis is the primary objective
  • Massive event volumes exist
  • Time-based analysis is required
  • Operational monitoring is needed
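Time-based analysis usually means grouping events into fixed-width time bins and aggregating each bin. In KQL this is typically expressed with `summarize` and `bin()`; the Python sketch below only mimics that idea so the logic is visible. The function names and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bin_timestamp(ts: datetime, width: timedelta) -> datetime:
    """Round a timestamp down to the start of its fixed-width bin."""
    bins_elapsed = (ts - EPOCH) // width  # whole bins since the epoch
    return EPOCH + bins_elapsed * width


def count_per_bin(timestamps, width: timedelta) -> Counter:
    """Count how many events fall into each time bin."""
    return Counter(bin_timestamp(ts, width) for ts in timestamps)


# Hypothetical log event times: two in the 12:00 minute, one in the 12:01 minute.
log_times = [
    datetime(2025, 1, 1, 12, 0, 5),
    datetime(2025, 1, 1, 12, 0, 40),
    datetime(2025, 1, 1, 12, 1, 10),
]
counts = count_per_bin(log_times, timedelta(minutes=1))
for bin_start in sorted(counts):
    print(bin_start, counts[bin_start])
```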

Spark Structured Streaming

What Is Structured Streaming?

Spark Structured Streaming enables continuous processing using Apache Spark.

Unlike Eventstream, which is low-code, and Eventhouse, which is queried with KQL, Spark Structured Streaming is developer-focused and code-driven.

Supported languages include:

  • PySpark
  • Scala
  • Spark SQL

Capabilities

Spark Structured Streaming supports:

  • Complex transformations
  • Data enrichment
  • Machine learning integration
  • Streaming joins
  • Stateful processing
  • Advanced business logic

Best Use Cases

Choose Spark Structured Streaming when:

  • Complex transformations are required
  • Large-scale processing is needed
  • Machine learning must be integrated
  • Events must be joined with reference datasets
  • Custom code is acceptable

Examples:

  • Fraud detection
  • Customer behavior analytics
  • Streaming feature engineering
  • Predictive maintenance

Exam Tip

If a scenario requires advanced coding and transformation logic, Spark Structured Streaming is often the correct answer.
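Two of the capabilities listed above, joining events with reference data and stateful processing, can be illustrated in plain Python. This is deliberately not the Spark API; it is a sketch of the logic a streaming job would implement, and the field names (`customer_id`, `amount`, `segment`) and sample data are assumptions made for the example.

```python
from collections import defaultdict


def enrich_and_track(transactions, customers):
    """Join each transaction to reference data and keep a running total per customer."""
    running_total = defaultdict(float)  # state carried across events
    for txn in transactions:
        customer_id = txn["customer_id"]
        profile = customers.get(customer_id, {})  # lookup against reference data
        running_total[customer_id] += txn["amount"]
        yield {
            **txn,
            "segment": profile.get("segment", "unknown"),
            "running_total": running_total[customer_id],
        }


# Hypothetical reference dataset and event stream.
customers = {"c1": {"segment": "retail"}}
transactions = [
    {"customer_id": "c1", "amount": 40.0},
    {"customer_id": "c2", "amount": 10.0},
    {"customer_id": "c1", "amount": 60.0},
]
results = list(enrich_and_track(transactions, customers))
print(results[2]["segment"], results[2]["running_total"])  # retail 100.0
```

In a real Structured Streaming job the engine manages this state and its fault tolerance; the sketch only shows why such workloads call for custom code rather than a low-code router.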


Real-Time Intelligence

What Is Real-Time Intelligence?

Real-Time Intelligence is Fabric’s complete platform for handling real-time data workloads.

It combines:

  • Eventstream
  • Eventhouse
  • KQL Databases
  • Data Activator
  • Real-Time Dashboards

Benefits

Real-Time Intelligence provides:

  • End-to-end streaming architecture
  • Real-time monitoring
  • Event processing
  • Alerting
  • Operational analytics

Best Use Cases

Use Real-Time Intelligence when an organization needs:

  • Comprehensive streaming analytics
  • Operational dashboards
  • Real-time monitoring
  • Event-driven insights

Data Activator

What Is Data Activator?

Data Activator (called Activator in current Fabric documentation) monitors events and automatically takes actions when specified conditions occur.

Examples include:

  • Sending emails
  • Triggering workflows
  • Generating notifications
  • Creating alerts

Example

If a machine's temperature exceeds 90°C:

  • Generate an alert
  • Notify engineers
  • Open a support ticket
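The rule above amounts to a condition plus a list of actions. The Python sketch below models that shape under stated assumptions: `ThresholdRule` is a hypothetical class, not part of any Fabric SDK, and the three actions simply record messages instead of sending real notifications.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ThresholdRule:
    """Run every action when an event's temperature exceeds the threshold."""
    threshold_c: float
    actions: list[Callable[[dict], None]] = field(default_factory=list)

    def evaluate(self, event: dict) -> bool:
        if event["temperature_c"] > self.threshold_c:
            for action in self.actions:
                action(event)
            return True
        return False


notifications: list[str] = []  # stands in for email, chat, or ticketing systems
rule = ThresholdRule(
    threshold_c=90.0,
    actions=[
        lambda e: notifications.append(f"Alert: {e['machine']} at {e['temperature_c']}°C"),
        lambda e: notifications.append(f"Notify engineers about {e['machine']}"),
        lambda e: notifications.append(f"Open support ticket for {e['machine']}"),
    ],
)

# Hypothetical readings: only the second one breaches the threshold.
readings = [
    {"machine": "press-4", "temperature_c": 88.0},
    {"machine": "press-4", "temperature_c": 93.5},
]
fired = [rule.evaluate(r) for r in readings]
print(fired)               # [False, True]
print(len(notifications))  # 3
```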

Best Use Cases

Choose Data Activator when:

  • Business users need alerts
  • Event-driven automation is required
  • Low-code monitoring is desired

Pipelines Are Not Streaming Engines

A common DP-700 exam trap is confusing pipelines with streaming solutions.

Pipelines:

  • Execute scheduled workloads
  • Orchestrate activities
  • Handle batch data movement

Pipelines do NOT provide continuous event processing.


Appropriate Pipeline Scenarios

  • Daily data loads
  • Weekly ETL jobs
  • Scheduled orchestration
  • Batch transformations

Inappropriate Pipeline Scenarios

  • Second-by-second monitoring
  • Real-time alerts
  • Continuous event processing

Selecting the Appropriate Streaming Engine

Scenario 1: IoT Sensor Telemetry

Requirements:

  • Millions of sensor events
  • Real-time monitoring
  • Fast analytics

Best choice:

Eventstream + Eventhouse


Scenario 2: Fraud Detection

Requirements:

  • Stream transactions
  • Apply advanced business rules
  • Perform enrichment joins

Best choice:

Spark Structured Streaming


Scenario 3: Website Log Analysis

Requirements:

  • Continuous ingestion
  • Fast aggregations
  • Time-series analysis

Best choice:

KQL Database/Eventhouse


Scenario 4: Equipment Failure Alerts

Requirements:

  • Detect threshold breaches
  • Notify operators

Best choice:

Data Activator


Scenario 5: Enterprise Real-Time Analytics Platform

Requirements:

  • Complete streaming solution
  • Dashboards
  • Alerts
  • Analytics

Best choice:

Real-Time Intelligence


Comparison of Streaming Engines

| Requirement | Recommended Technology |
| --- | --- |
| Event ingestion | Eventstream |
| Event routing | Eventstream |
| Real-time analytics | Eventhouse |
| Log analytics | KQL Database |
| Time-series analysis | KQL Database |
| Complex transformations | Spark Structured Streaming |
| Machine learning on streams | Spark Structured Streaming |
| Alerts and notifications | Data Activator |
| Complete real-time platform | Real-Time Intelligence |
| Scheduled ETL | Pipelines |
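As a study aid, the comparison above can be treated as a lookup from requirement to technology. The sketch below encodes exactly the rows of that comparison; the `recommend` helper is hypothetical and deliberately simplistic, since real exam scenarios need judgment rather than keyword matching.

```python
# Rows copied from the comparison above; keys are lower-cased requirement names.
RECOMMENDED_TECHNOLOGY = {
    "event ingestion": "Eventstream",
    "event routing": "Eventstream",
    "real-time analytics": "Eventhouse",
    "log analytics": "KQL Database",
    "time-series analysis": "KQL Database",
    "complex transformations": "Spark Structured Streaming",
    "machine learning on streams": "Spark Structured Streaming",
    "alerts and notifications": "Data Activator",
    "complete real-time platform": "Real-Time Intelligence",
    "scheduled etl": "Pipelines",
}


def recommend(requirements):
    """Return the distinct technologies for the given requirements, in first-seen order."""
    chosen = []
    for requirement in requirements:
        tech = RECOMMENDED_TECHNOLOGY.get(requirement.strip().lower())
        if tech is not None and tech not in chosen:
            chosen.append(tech)
    return chosen


# Scenario 1 style input: ingestion plus fast analytics.
print(recommend(["Event ingestion", "Real-time analytics"]))
# ['Eventstream', 'Eventhouse']
```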

DP-700 Exam Tips

Remember these key distinctions:

  • Eventstream = ingestion and routing.
  • Eventhouse = real-time storage and analytics.
  • KQL Database = high-performance event analytics.
  • Spark Structured Streaming = advanced code-based processing.
  • Data Activator = alerts and automated actions.
  • Pipelines = orchestration, not streaming.
  • Real-Time Intelligence = end-to-end streaming solution.

Many exam questions focus on matching business requirements to the correct streaming technology.


Practice Exam Questions

Question 1

A company needs to ingest streaming telemetry from thousands of IoT devices and route the data to multiple downstream consumers.

Which Fabric component should be used?

A. Data Activator
B. Eventstream
C. Pipeline
D. Notebook

Answer: B

Explanation:
Eventstream is specifically designed for real-time event ingestion and routing. Data Activator generates actions, pipelines handle batch orchestration, and notebooks perform transformations rather than ingestion.


Question 2

A solution requires advanced stream processing with custom Python code, joins against reference datasets, and machine learning inference.

Which technology should be selected?

A. Eventhouse
B. Spark Structured Streaming
C. KQL Database
D. Data Activator

Answer: B

Explanation:
Spark Structured Streaming supports complex transformations, enrichment, stateful processing, and machine learning integration through PySpark.


Question 3

A team needs extremely fast analytics over continuously arriving log data and plans to use KQL.

Which storage engine is most appropriate?

A. KQL Database
B. Dataflow Gen2
C. Warehouse
D. Pipeline

Answer: A

Explanation:
KQL databases are optimized for streaming ingestion, time-series analysis, and log analytics.


Question 4

A business user wants automatic notifications whenever inventory levels fall below a threshold.

Which Fabric component is best suited?

A. Eventstream
B. Notebook
C. Data Activator
D. Pipeline

Answer: C

Explanation:
Data Activator monitors data conditions and triggers automated actions such as alerts and notifications.


Question 5

Which Fabric component is primarily responsible for routing real-time events to destinations?

A. Warehouse
B. Eventstream
C. Dataflow Gen2
D. Notebook

Answer: B

Explanation:
Eventstream serves as the ingestion and routing layer for streaming architectures.


Question 6

A company requires an end-to-end platform for ingesting, storing, analyzing, and monitoring streaming events.

Which solution should be recommended?

A. Real-Time Intelligence
B. Dataflow Gen2
C. Warehouse
D. SQL Endpoint

Answer: A

Explanation:
Real-Time Intelligence combines ingestion, analytics, monitoring, alerting, and visualization capabilities into a unified platform.


Question 7

Which technology is best suited for analyzing application logs with time-series queries and low-latency reporting?

A. Notebook
B. Warehouse
C. Eventhouse
D. Pipeline

Answer: C

Explanation:
Eventhouse is optimized for streaming analytics, log analysis, and time-series workloads.


Question 8

A solution requires nightly ingestion of source data into a lakehouse.

Which option is most appropriate?

A. Eventstream
B. Data Activator
C. Eventhouse
D. Pipeline

Answer: D

Explanation:
Nightly ingestion is a batch process and is best handled through scheduled pipeline execution.


Question 9

A data engineer needs to continuously enrich streaming events using lookup data and perform custom business-rule calculations.

Which technology should be selected?

A. Spark Structured Streaming
B. Data Activator
C. Eventstream
D. Dashboard

Answer: A

Explanation:
Spark Structured Streaming provides advanced transformation capabilities including joins, aggregations, and custom code execution.


Question 10

Which statement best describes Eventhouse?

A. A workflow orchestration service for ETL processes
B. A low-code data preparation tool
C. A real-time analytics store optimized for event and telemetry data
D. A machine learning training environment

Answer: C

Explanation:
Eventhouse is designed for high-scale event ingestion, real-time analytics, log analytics, and KQL-based querying of streaming data.


