
Create windowing functions (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Create windowing functions


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Overview

Windowing functions are a fundamental concept in stream processing and real-time analytics. In Microsoft Fabric, windowing functions enable you to group continuous streams of events into logical segments called windows, allowing aggregations and calculations to be performed on streaming data as it arrives. Windowing is heavily used in Eventstreams, Real-Time Intelligence, KQL queries, and stream processing scenarios. (Reitse’s blog)

Unlike batch processing, where all data is available before processing begins, streaming systems deal with potentially infinite streams of incoming events. Windowing functions provide a mechanism to divide this endless stream into manageable chunks for analysis. (MindMesh Academy)

For the DP-700 exam, you should understand:

  • Why windowing functions are required
  • The different window types
  • When each window type should be used
  • How windowing applies in Eventstreams and KQL
  • The differences between tumbling, hopping, sliding, session, and snapshot windows
  • Common real-world scenarios

Why Windowing Functions Are Needed

Imagine a sensor generating thousands of temperature readings every second.

Without windows:

  • Data arrives continuously.
  • Aggregations never complete.
  • Calculating averages, counts, or sums becomes difficult.

Windowing functions solve this problem by grouping events into defined time intervals where calculations can be performed. (MindMesh Academy)

Examples include:

  • Count website visits every 5 minutes
  • Calculate average temperature every minute
  • Measure sales totals every hour
  • Detect unusual activity within a rolling 10-minute period
  • Analyze user sessions based on inactivity

Windowing in Microsoft Fabric

Windowing is primarily encountered in:

  • Eventstreams
  • Real-Time Intelligence
  • Eventhouse queries
  • KQL transformations
  • Streaming analytics solutions

Fabric supports several window types, each designed for different business requirements. (Reitse’s blog)


Tumbling Windows

Definition

A tumbling window divides a stream into fixed, non-overlapping time intervals. Each event belongs to exactly one window. (MindMesh Academy)

Example

Five-minute windows:

Window
09:00–09:05
09:05–09:10
09:10–09:15

Events are assigned to one and only one window.
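
As a rough illustration of the assignment rule, the following Python sketch maps event timestamps to five-minute tumbling windows. It is not Eventstream or KQL code. The function name tumbling_window_start, the sample timestamps, and the choices to align windows to midnight and to treat each window as start-inclusive and end-exclusive are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta


def tumbling_window_start(ts: datetime, size: timedelta) -> datetime:
    """Return the start of the single fixed window that contains ts."""
    # Assumption: windows are aligned to midnight of the event's day.
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Integer division drops the remainder, so ts is rounded down
    # to the nearest multiple of the window size.
    return midnight + ((ts - midnight) // size) * size


# Hypothetical event timestamps.
events = [
    datetime(2025, 1, 1, 9, 1),
    datetime(2025, 1, 1, 9, 4, 59),
    datetime(2025, 1, 1, 9, 5),
    datetime(2025, 1, 1, 9, 12),
]

# Count events per window; each event contributes to exactly one key.
counts = Counter(tumbling_window_start(e, timedelta(minutes=5)) for e in events)
for start, n in sorted(counts.items()):
    print(start.strftime("%H:%M"), n)
```

Because every timestamp rounds down to exactly one window start, no event can be counted twice, which is the defining property of a tumbling window.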


Characteristics

  • Fixed size
  • No overlap
  • Continuous
  • Predictable results

Use Cases

Website Traffic

Count visitors every five minutes.

Sensor Monitoring

Calculate average temperature every minute.

Sales Reporting

Generate hourly revenue summaries.


Exam Tip

If a question mentions:

  • Fixed intervals
  • Non-overlapping periods
  • Each event belongs to one window

The answer is almost always Tumbling Window.


Hopping Windows

Definition

A hopping window uses fixed-length windows that overlap. New windows start at specified intervals called the hop size. (Reitse’s blog)


Example

Window Size = 10 minutes

Hop Interval = 5 minutes

Windows:

Window
09:00–09:10
09:05–09:15
09:10–09:20

An event may appear in multiple windows.
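
To make the overlap concrete, here is a minimal Python sketch that lists every hopping window containing a given event. It is only an illustration of the idea, not how Fabric implements it. The function name hopping_windows, the midnight alignment, and the start-inclusive, end-exclusive window boundaries are assumptions of this example.

```python
from datetime import datetime, timedelta


def hopping_windows(ts: datetime, size: timedelta, hop: timedelta):
    """Return (start, end) pairs for every hopping window that contains ts."""
    # Assumption: window starts are multiples of the hop, counted from midnight.
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + ((ts - midnight) // hop) * hop  # latest start <= ts
    windows = []
    # Walk backwards one hop at a time while the window still covers ts.
    while start + size > ts:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


# An event at 09:07 with a 10-minute window and a 5-minute hop.
found = hopping_windows(
    datetime(2025, 1, 1, 9, 7), timedelta(minutes=10), timedelta(minutes=5)
)
for start, end in found:
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
```

With these settings the 09:07 event falls into both the 09:00–09:10 and 09:05–09:15 windows from the table above.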


Characteristics

  • Fixed size
  • Overlapping
  • Events can belong to multiple windows

Use Cases

Rolling Analytics

Monitor sales over the previous 10 minutes every 5 minutes.

Performance Monitoring

Analyze server utilization trends.

Operational Dashboards

Create smoother trend analysis.


Exam Tip

If a question describes:

  • Overlapping windows
  • Fixed intervals
  • Repeated calculations over rolling periods

Choose Hopping Window.


Sliding Windows

Definition

Sliding windows continuously evaluate data over a moving time range. Unlike tumbling windows, calculations are updated whenever new events arrive. (Reitse’s blog)


Example

Monitor failed logins within the previous 10 minutes.

As each new event arrives:

  • Old events leave the window
  • New events enter the window
  • Results update continuously
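
The enter-and-leave behaviour can be sketched in plain Python with a queue. This is an illustrative model only: the class name SlidingCounter and the sample login times are hypothetical, and the sketch assumes events arrive in timestamp order, which a real stream does not guarantee.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingCounter:
    """Counts events seen within the last `size` of time, updated per event."""

    def __init__(self, size: timedelta):
        self.size = size
        self.events = deque()

    def add(self, ts: datetime) -> int:
        # The new event enters the window.
        self.events.append(ts)
        # Events at or before (ts - size) leave the window.
        while self.events[0] <= ts - self.size:
            self.events.popleft()
        return len(self.events)


# Hypothetical failed-login timestamps, assumed to arrive in order.
logins = [
    datetime(2025, 1, 1, 9, 0),
    datetime(2025, 1, 1, 9, 4),
    datetime(2025, 1, 1, 9, 9),
    datetime(2025, 1, 1, 9, 12),
    datetime(2025, 1, 1, 9, 30),
]
counter = SlidingCounter(timedelta(minutes=10))
running = [counter.add(t) for t in logins]
print(running)
```

The count is recomputed on every arrival rather than at fixed clock boundaries, so an alert rule such as "three or more failures in 10 minutes" could fire as soon as the third event lands.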

Characteristics

  • Continuous evaluation
  • Overlapping by nature
  • Event-driven processing

Use Cases

Fraud Detection

Detect suspicious transaction patterns.

Security Monitoring

Identify repeated failed logins.

IoT Alerts

Trigger warnings when sensor thresholds are exceeded.


Exam Tip

If the question mentions:

  • Real-time rolling calculations
  • Continuous updates
  • Last X minutes of activity

The correct answer is usually Sliding Window.


Session Windows

Definition

A session window groups events based on periods of activity separated by inactivity gaps. (Reitse’s blog)

Instead of fixed times, session windows are defined by user behavior.


Example

User activity:

Event Time
10:00
10:03
10:05
10:25

If timeout = 10 minutes:

Session 1:

  • 10:00
  • 10:03
  • 10:05

Session 2:

  • 10:25

The 20-minute gap creates a new session.
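
The grouping above can be reproduced with a short Python sketch. The function name sessionize is hypothetical, and the sketch assumes that a gap exactly equal to the timeout still continues the session; a real engine may treat that boundary differently and may also cap the maximum session length.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout: timedelta):
    """Split timestamps into sessions separated by gaps longer than timeout."""
    sessions = []
    for ts in sorted(timestamps):
        # Continue the current session if the gap since its last event
        # is within the timeout; otherwise start a new session.
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# The event times from the example above.
activity = [
    datetime(2025, 1, 1, 10, 0),
    datetime(2025, 1, 1, 10, 3),
    datetime(2025, 1, 1, 10, 5),
    datetime(2025, 1, 1, 10, 25),
]
sessions = sessionize(activity, timedelta(minutes=10))
for number, session in enumerate(sessions, start=1):
    print(number, [t.strftime("%H:%M") for t in session])
```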


Characteristics

  • Activity-based
  • Dynamic duration
  • Defined by inactivity timeout

Use Cases

Website User Sessions

Track user visits.

Application Usage

Measure active engagement periods.

Customer Behavior Analytics

Group interactions into sessions.


Exam Tip

Look for keywords:

  • User sessions
  • Inactivity timeout
  • Activity periods

These indicate Session Window.


Snapshot Windows

Definition

A snapshot window groups events that share the same timestamp, so it captures the stream at a specific point in time rather than over a duration. (TechTacoFriday)

Think of it as taking a picture of the stream at a particular instant.


Use Cases

Point-in-Time Metrics

Current active users.

Device Status Monitoring

Current state of equipment.

Operational Dashboards

Real-time snapshots of system health.


Comparing Window Types

Window Type | Overlap | Fixed Duration       | Based on Inactivity
Tumbling    | No      | Yes                  | No
Hopping     | Yes     | Yes                  | No
Sliding     | Yes     | Yes                  | No
Session     | No      | No (varies)          | Yes
Snapshot    | No      | No (single instant)  | No

Windowing in Eventstreams

In Microsoft Fabric Eventstreams, windowing is commonly implemented using the Group By transformation. After selecting a window type, you can apply aggregations such as:

  • Count
  • Sum
  • Average
  • Minimum
  • Maximum

These aggregations help convert raw event streams into meaningful business metrics. (Reitse’s blog)


Windowing in KQL

KQL supports time-based aggregations by combining the summarize operator with the bin() function:

SalesEvents
| summarize TotalSales=sum(Amount)
by bin(Timestamp, 5m)

The bin() function creates fixed time buckets similar to tumbling windows. (A Guide to Cloud & AI)

Common KQL windowing scenarios include:

  • Time-series analytics
  • Streaming dashboards
  • Real-time monitoring
  • Trend analysis

Windowing and Streaming Analytics

Windowing is critical because streaming data never stops arriving.

Without windows:

  • Aggregations would never complete.
  • Metrics could not be calculated efficiently.
  • Real-time dashboards would be difficult to build.

Windows provide structure and enable:

  • Aggregation
  • Alerting
  • Trend detection
  • Session analysis
  • Operational monitoring

DP-700 Exam Tips

Know the Window Types

Microsoft frequently tests differences between:

  • Tumbling
  • Hopping
  • Sliding
  • Session

Remember Tumbling

If:

  • Windows are fixed
  • Windows do not overlap
  • Events belong to exactly one window

Choose Tumbling.


Remember Session

If:

  • User behavior is involved
  • There is an inactivity timeout
  • Windows vary in length

Choose Session.


Remember Hopping

If:

  • Windows overlap
  • Windows have fixed sizes
  • Events can appear multiple times

Choose Hopping.


Remember Sliding

If:

  • Continuous recalculation occurs
  • Rolling analysis is needed
  • Alerts depend on recent activity

Choose Sliding.


Practice Exam Questions

Question 1

A streaming solution must calculate the average temperature every minute. Each reading should belong to exactly one aggregation period.

What should you use?

A. Sliding window

B. Session window

C. Tumbling window

D. Hopping window

Answer: C

Explanation: Tumbling windows use fixed, non-overlapping intervals and each event belongs to only one window. (Scribd)


Question 2

You need to analyze sales from the previous 10 minutes every 5 minutes.

Which window type should you use?

A. Hopping window

B. Session window

C. Snapshot window

D. Tumbling window

Answer: A

Explanation: Hopping windows overlap and allow repeated analysis over rolling periods.


Question 3

A website analytics solution must group user activity until no activity occurs for 15 minutes.

Which window type is most appropriate?

A. Tumbling window

B. Snapshot window

C. Sliding window

D. Session window

Answer: D

Explanation: Session windows are based on inactivity periods and user behavior.


Question 4

You need a fraud detection solution that continuously evaluates transactions from the last five minutes whenever a new transaction arrives.

Which window type should be used?

A. Snapshot window

B. Session window

C. Tumbling window

D. Sliding window

Answer: D

Explanation: Sliding windows continuously recalculate results as new events arrive.


Question 5

Which window type allows an event to appear in multiple windows?

A. Tumbling window

B. Snapshot window

C. Hopping window

D. Session window

Answer: C

Explanation: Hopping windows overlap, allowing events to participate in multiple aggregations.


Question 6

What is the primary purpose of windowing functions in streaming systems?

A. Encrypt streaming data

B. Divide continuous streams into manageable groups for processing

C. Compress incoming events

D. Eliminate duplicate records

Answer: B

Explanation: Windowing organizes continuous streams into finite chunks that can be aggregated and analyzed. (MindMesh Academy)


Question 7

Which window type is most suitable for calculating hourly sales totals where no overlap is desired?

A. Sliding window

B. Hopping window

C. Session window

D. Tumbling window

Answer: D

Explanation: Tumbling windows create fixed, non-overlapping intervals.


Question 8

A streaming query groups events whenever there is activity and closes the group after ten minutes of inactivity.

What is being used?

A. Snapshot window

B. Hopping window

C. Session window

D. Tumbling window

Answer: C

Explanation: Session windows are based on inactivity timeouts.


Question 9

Which statement accurately describes a sliding window?

A. Events belong to only one interval

B. Results are calculated only after the window closes

C. Windows are based on inactivity gaps

D. Results are continuously updated as events arrive

Answer: D

Explanation: Sliding windows continuously recalculate as new events enter and old events leave the window.


Question 10

In Microsoft Fabric Eventstreams, windowing is commonly configured through which transformation?

A. Group By

B. Expand

C. Join

D. Union

Answer: A

Explanation: Eventstreams typically implement windowing through the Group By transformation, where window type and aggregations are defined. (Reitse’s blog)



Process data by using KQL (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Process data by using KQL



Introduction

As organizations increasingly rely on real-time analytics, the ability to query, transform, and analyze streaming data efficiently has become a critical skill for data engineers. Within Microsoft Fabric, one of the most important technologies for real-time data processing is Kusto Query Language (KQL).

KQL is the primary query language used in Real-Time Intelligence, Eventhouses, KQL Databases, and many streaming analytics scenarios within Microsoft Fabric. It is specifically optimized for high-performance analysis of large volumes of telemetry, log, event, and time-series data.

For the DP-700 exam, candidates should understand how KQL is used to process streaming data, when it should be selected over Spark or SQL, common KQL operators, ingestion concepts, aggregation techniques, windowing functions, and real-time analytics patterns.


What Is KQL?

Kusto Query Language (KQL) is a read-optimized query language developed by Microsoft for exploring, analyzing, and transforming large volumes of structured, semi-structured, and streaming data.

KQL is the primary language used in:

  • Microsoft Fabric Real-Time Intelligence
  • Eventhouses
  • KQL Databases
  • Azure Data Explorer
  • Microsoft Sentinel
  • Azure Monitor Logs

KQL is designed for:

  • Fast interactive analytics
  • Log analysis
  • Telemetry processing
  • Streaming data analytics
  • Time-series analysis
  • Monitoring solutions

Unlike traditional T-SQL, KQL uses a pipeline-style syntax that makes analytical queries easier to read and maintain.


Why Use KQL for Streaming Data?

KQL is optimized for scenarios involving:

  • High ingestion rates
  • Near real-time querying
  • Large event volumes
  • Time-series analysis
  • Operational monitoring
  • IoT telemetry
  • Application logs
  • Security analytics

A major advantage is that newly ingested streaming data can often be queried within seconds of arrival.


KQL in Microsoft Fabric

Within Microsoft Fabric, KQL is primarily used in:

Eventhouses

Eventhouses provide scalable storage and analytics for real-time data.

Capabilities include:

  • High-speed ingestion
  • KQL querying
  • Streaming analytics
  • Time-series analysis
  • Dashboard integration

Eventhouses are commonly used as the central repository for streaming event data.


KQL Databases

A KQL Database is a database inside an Eventhouse.

It stores:

  • Tables
  • Functions
  • Materialized views
  • Policies

KQL queries execute against these databases.


KQL Processing Workflow

A typical streaming architecture looks like:

Event Source
|
v
Eventstream
|
v
Eventhouse
|
v
KQL Database
|
v
KQL Queries
|
v
Reports / Dashboards

Data arrives continuously and becomes available for KQL analysis almost immediately.


Understanding KQL Query Structure

A basic KQL query:

Sales
| where Region == "East"
| summarize TotalSales = sum(Amount)

The pipe symbol (|) passes results from one operation to the next.

This pipeline approach is a key exam topic.


Filtering Streaming Data

The where operator filters records.

Example:

DeviceReadings
| where Temperature > 100

Common uses:

  • Error events
  • High temperatures
  • Security incidents
  • Suspicious transactions

Filtering early in a query improves performance.


Selecting Columns

The project operator selects specific columns.

Example:

Orders
| project OrderID, CustomerID, Amount

Benefits:

  • Reduced memory usage
  • Faster query execution
  • Cleaner output

Sorting Results

The sort operator orders data.

Example:

Orders
| sort by OrderDate desc

This is frequently used in monitoring and dashboard scenarios.


Aggregating Data with Summarize

The summarize operator is one of the most important KQL operators.

Example:

Sales
| summarize TotalSales = sum(Amount)

Common aggregation functions:

Function | Purpose
sum()    | Total of values
avg()    | Average
count()  | Row count
min()    | Minimum value
max()    | Maximum value
dcount() | Distinct count (an estimate)

Grouping Data

Grouping is accomplished with summarize and a grouping column.

Example:

Sales
| summarize TotalSales=sum(Amount)
by Region

Output:

Region | TotalSales
East   | 250000
West   | 300000

This pattern is heavily used in analytics solutions.


Time-Based Analysis

Streaming data is frequently analyzed by time.

Example:

Events
| summarize Count=count()
by bin(Timestamp, 1h)

The bin() function groups records into fixed time windows.

Common windows:

  • 1 minute
  • 5 minutes
  • 15 minutes
  • 1 hour
  • 1 day

Working with Time-Series Data

Time-series analysis is one of KQL’s strengths.

Example:

SensorData
| summarize AvgTemp=avg(Temperature)
by bin(Timestamp, 5m)

This creates temperature averages every five minutes.

Typical use cases:

  • IoT monitoring
  • Server performance
  • Manufacturing systems
  • Financial transactions

Parsing Semi-Structured Data

Streaming data often arrives as JSON.

Example:

Events
| extend DeviceID = tostring(Event.DeviceID)

Common functions:

Function     | Purpose
tostring()   | Convert to string
toint()      | Convert to integer
todouble()   | Convert to a floating-point number (real)
parse_json() | Parse a JSON string into a dynamic value

Creating Calculated Columns

The extend operator adds calculated values.

Example:

Sales
| extend Tax = Amount * 0.07

Common uses:

  • Calculations
  • Data enrichment
  • Derived metrics

Joining Streaming Data

KQL supports joins between datasets.

Example:

Orders
// The join kind is stated explicitly; when omitted, KQL defaults to innerunique.
| join kind=inner (Customers) on CustomerID

Common scenarios:

  • Customer enrichment
  • Product lookups
  • Reference data joins

However, excessive joins can impact performance on very large streaming datasets.


Materialized Views

Materialized views precompute query results.

Benefits include:

  • Faster analytics
  • Reduced query costs
  • Improved dashboard performance

Example scenario:

A dashboard continuously displays hourly sales totals.

Instead of recalculating every query, a materialized view stores precomputed results.

This is a frequently tested DP-700 optimization topic.
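
The precomputation idea can be sketched in Python as an aggregate that is maintained while rows arrive, so that a dashboard query only reads the stored totals. This is an analogy, not the way the KQL engine materializes views internally. The class name HourlySalesView and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


class HourlySalesView:
    """Keeps hourly sales totals up to date as rows are ingested."""

    def __init__(self):
        self.totals = defaultdict(float)

    def ingest(self, ts: datetime, amount: float) -> None:
        # Work is done once per row at ingestion time.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        self.totals[hour] += amount

    def query(self) -> dict:
        # A dashboard refresh reads the stored totals without rescanning rows.
        return dict(sorted(self.totals.items()))


view = HourlySalesView()
# Hypothetical sales rows.
view.ingest(datetime(2025, 1, 1, 9, 15), 100.0)
view.ingest(datetime(2025, 1, 1, 9, 45), 50.0)
view.ingest(datetime(2025, 1, 1, 10, 5), 25.0)
print(view.query())
```

The cost of the aggregation moves from query time to ingestion time, which pays off when the same summary is requested far more often than the underlying data changes.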


Update Policies

Update policies automatically transform data during ingestion.

Example:

RawEvents Table
|
Update Policy
|
ProcessedEvents Table

Benefits:

  • Automatic transformation
  • Consistent processing
  • Reduced query complexity

Common use cases:

  • JSON parsing
  • Data enrichment
  • Data normalization
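
Conceptually, an update policy behaves like a function that runs on each raw record as it lands and writes the result to a second table. The Python sketch below models that flow with two in-memory lists. In Fabric the transformation itself would be written in KQL and attached to the target table; the names parse_raw_event and ingest, and the DeviceID and Temperature fields, are assumptions of this example.

```python
import json

raw_events = []        # stands in for the RawEvents table
processed_events = []  # stands in for the ProcessedEvents table


def parse_raw_event(raw: str) -> dict:
    """Turn one raw JSON string into a typed, flattened row."""
    doc = json.loads(raw)
    return {
        "DeviceID": str(doc["DeviceID"]),            # normalize to string
        "Temperature": float(doc["Temperature"]),    # normalize to number
    }


def ingest(raw: str) -> None:
    # The raw record is stored as-is ...
    raw_events.append(raw)
    # ... and the transformation runs automatically as part of ingestion.
    processed_events.append(parse_raw_event(raw))


# Hypothetical payload with inconsistent types.
ingest('{"DeviceID": 42, "Temperature": "101.5"}')
print(processed_events)
```

Queries can then target the processed table directly instead of repeating the parsing logic each time.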

Streaming Ingestion

Fabric supports streaming ingestion into Eventhouses.

Characteristics:

  • Low latency
  • High throughput
  • Near real-time availability

Common sources include:

  • Eventstreams
  • Azure Event Hubs
  • IoT devices
  • Application telemetry
  • Custom applications

KQL vs Spark Structured Streaming

DP-700 commonly tests when to choose each technology.

Requirement                 | KQL       | Spark Structured Streaming
Real-time analytics         | Excellent | Good
Data science workloads      | Limited   | Excellent
Machine learning            | Limited   | Excellent
Interactive querying        | Excellent | Moderate
Time-series analysis        | Excellent | Good
Large-scale transformations | Moderate  | Excellent
SQL-like querying           | Excellent | Moderate

Use KQL When:

  • Analyzing event data
  • Monitoring telemetry
  • Building operational dashboards
  • Performing log analytics
  • Working with Eventhouses

Use Spark When:

  • Complex transformations are required
  • Machine learning workloads exist
  • Advanced ETL processing is needed
  • Large-scale data engineering pipelines are required

KQL vs T-SQL

Feature              | KQL       | T-SQL
Streaming analytics  | Excellent | Limited
Time-series analysis | Excellent | Moderate
OLTP operations      | Poor      | Excellent
Real-time dashboards | Excellent | Moderate
Log analytics        | Excellent | Poor

For streaming analytics scenarios in Fabric, KQL is often the preferred option.


Performance Best Practices

Filter Early

Good:

Events
| where EventType == "Error"
| summarize count()

Poor:

Events
| summarize count() by EventType
| where EventType == "Error"

Filtering early reduces processing volume because the aggregation only processes the rows that are actually needed.


Project Only Required Columns

Avoid retrieving unnecessary data.

Events
| project Timestamp, DeviceID

Use Materialized Views

For frequently executed analytical queries, materialized views improve performance significantly.


Use Appropriate Time Bins

Choose bin sizes carefully:

  • Smaller bins = more detailed analysis
  • Larger bins = better performance

Common DP-700 Exam Scenarios

Scenario 1

You need near real-time analysis of millions of IoT events.

Best choice: Eventhouse + KQL


Scenario 2

You need complex machine learning transformations on streaming data.

Best choice: Spark Structured Streaming


Scenario 3

You need a dashboard showing transaction counts for each hour.

Best choice: KQL summarize with bin() function


Scenario 4

You need automatic transformation of incoming JSON data.

Best choice: Update policies


DP-700 Exam Tips

Remember these key points:

  • KQL is optimized for real-time analytics and event data.
  • Eventhouses are the primary storage and analytics engine for KQL workloads.
  • KQL uses a pipeline syntax (|).
  • where filters data.
  • project selects columns.
  • extend creates calculated columns.
  • summarize performs aggregations.
  • bin() groups time-series data into intervals.
  • Materialized views improve query performance.
  • Update policies automate ingestion-time transformations.
  • KQL is generally preferred over Spark for interactive streaming analytics.

Practice Exam Questions

Question 1

You need to analyze streaming telemetry data arriving from thousands of IoT devices and provide near real-time dashboards. Which technology should you primarily use?

A. Warehouse stored procedures
B. Dataflow Gen2
C. KQL in an Eventhouse
D. Power Query

Correct Answer: C

Explanation: KQL and Eventhouses are optimized for real-time analytics, telemetry processing, and interactive querying of streaming data.


Question 2

Which KQL operator is used to filter rows from a dataset?

A. summarize
B. where
C. project
D. extend

Correct Answer: B

Explanation: The where operator filters records based on specified conditions.


Question 3

A query needs to calculate total sales by region. Which KQL operator should be used?

A. project
B. where
C. summarize
D. extend

Correct Answer: C

Explanation: summarize performs aggregations such as sums, averages, and counts.


Question 4

Which operator is used to create a calculated column?

A. join
B. where
C. summarize
D. extend

Correct Answer: D

Explanation: The extend operator creates new calculated columns within a query.


Question 5

You need to display the number of events generated every hour. Which function should be used?

A. bin()
B. tostring()
C. parse_json()
D. countif()

Correct Answer: A

Explanation: The bin() function groups data into fixed time intervals for time-series analysis.


Question 6

Which Fabric component serves as the primary analytics engine for KQL workloads?

A. Lakehouse
B. Warehouse
C. Eventhouse
D. Dataflow Gen2

Correct Answer: C

Explanation: Eventhouses are designed for high-scale event ingestion and KQL-based analytics.


Question 7

What is the primary benefit of a materialized view?

A. Data encryption
B. Faster query performance through precomputed results
C. Reduced storage requirements
D. Automatic schema detection

Correct Answer: B

Explanation: Materialized views store precomputed query results, reducing query execution time.


Question 8

A data engineer must automatically transform incoming JSON data during ingestion. Which feature should be used?

A. Spark checkpointing
B. Eventstream routing
C. Data Activator
D. Update policies

Correct Answer: D

Explanation: Update policies automatically transform data as it is ingested into KQL tables.


Question 9

Which scenario is best suited for KQL instead of Spark Structured Streaming?

A. Large-scale machine learning pipeline
B. Deep learning model training
C. Interactive analysis of streaming telemetry data
D. Complex ETL involving hundreds of joins

Correct Answer: C

Explanation: KQL excels at real-time querying and analytics of telemetry, log, and event data.


Question 10

Which KQL operator is used to select specific columns from a dataset?

A. project
B. summarize
C. extend
D. where

Correct Answer: A

Explanation: The project operator returns only the specified columns, improving efficiency and readability.



Process data by using Spark structured streaming (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Process data by using Spark structured streaming



Introduction

Modern analytics platforms increasingly require the ability to process data continuously as it arrives rather than waiting for scheduled batch loads. Microsoft Fabric supports this requirement through Spark Structured Streaming, a scalable and fault-tolerant stream processing engine built on Apache Spark.

For the DP-700 exam, you should understand when and how to use Spark Structured Streaming, how it differs from other real-time processing options such as Eventstreams and KQL Querysets, and how to design streaming solutions that write data into OneLake and Delta tables.

Spark Structured Streaming is commonly used when data engineers need to process streaming data with complex transformations, enrichments, joins, aggregations, and machine learning workloads while leveraging the scalability of Spark. (Microsoft Learn)


What Is Spark Structured Streaming?

Spark Structured Streaming is a stream-processing framework built on top of Apache Spark. It treats a continuous stream of incoming data as an unbounded table to which new rows are constantly appended. Developers write code using familiar DataFrame and Spark SQL operations while Spark handles the continuous execution behind the scenes. (Microsoft Learn)

Key characteristics include:

  • Near real-time processing
  • Fault tolerance
  • Automatic recovery
  • Horizontal scalability
  • Support for complex transformations
  • Integration with Delta Lake
  • Exactly-once processing capabilities through checkpointing and transaction logs (Microsoft Learn)

How Structured Streaming Works

The processing flow typically follows these steps:

  1. Read data from a streaming source.
  2. Apply transformations.
  3. Write results to a destination.
  4. Store checkpoints to track processing progress.
  5. Continue processing new data as it arrives.

Common Sources

Spark Structured Streaming supports sources such as:

  • Azure Event Hubs
  • Apache Kafka
  • JSON files
  • CSV files
  • Parquet files
  • Delta tables
  • Eventstreams outputs (Microsoft Learn)

Common Destinations

Results can be written to:

  • Lakehouse Delta tables
  • OneLake storage
  • Eventhouses
  • Memory sinks for testing
  • Other supported storage locations (Microsoft Learn)

Structured Streaming in Microsoft Fabric

Within Microsoft Fabric, Structured Streaming is most commonly implemented through:

  • Notebooks
  • Spark Job Definitions
  • Lakehouses
  • Delta tables
  • OneLake storage

A typical architecture looks like this:

Azure Event Hub
|
v
Spark Structured Streaming
|
v
Delta Table in Lakehouse
|
v
SQL Analytics Endpoint
|
v
Power BI

This pattern enables streaming data to become queryable almost immediately after arrival. (Microsoft Learn)


Structured Streaming vs Batch Processing

Feature            | Batch Processing     | Structured Streaming
Data arrival       | Periodic             | Continuous
Processing latency | Minutes or hours     | Seconds or minutes
Resource usage     | Scheduled            | Continuous
Typical use case   | Historical reporting | Real-time analytics
Data availability  | After load completes | Near real time

Use Batch Processing When:

  • Data changes infrequently
  • Overnight processing is acceptable
  • Real-time insights are unnecessary

Use Structured Streaming When:

  • IoT devices generate events continuously
  • Fraud detection requires immediate action
  • Operational dashboards need live updates
  • Telemetry data must be analyzed continuously

Micro-Batch Processing

A common DP-700 exam topic is understanding that Structured Streaming typically uses a micro-batch architecture.

Instead of processing every individual event separately, Spark groups events into small batches and processes them continuously.

Example:

Incoming Events
-----------------
10:00:00 - Event A
10:00:01 - Event B
10:00:02 - Event C
Micro-batch executes
Process A+B+C together

This approach provides:

  • Better performance
  • Higher throughput
  • Easier fault recovery
  • Familiar Spark execution model (Microsoft Learn)
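
The grouping idea can be modelled in a few lines of plain Python. This is a simplified sketch rather than Spark code: it buckets events by arrival time, whereas Spark forms each micro-batch from whatever data is available when the trigger fires. The function name micro_batches and the sample events are hypothetical.

```python
def micro_batches(events, interval_seconds: int):
    """Group (arrival_second, payload) pairs into batches by trigger interval."""
    batches = {}
    for arrival, payload in events:
        # Events arriving within the same interval share a batch.
        batch_id = arrival // interval_seconds
        batches.setdefault(batch_id, []).append(payload)
    return [batches[key] for key in sorted(batches)]


# Arrival times are seconds after 10:00:00 (hypothetical data).
incoming = [(0, "Event A"), (1, "Event B"), (2, "Event C"), (5, "Event D")]
batched = micro_batches(incoming, interval_seconds=3)
for batch in batched:
    print(batch)
```

Events A, B, and C are processed together in the first batch, and Event D waits for the next one, which trades a little latency for higher throughput.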

Reading Streaming Data

Streaming ingestion begins with:

df = (
    spark.readStream
    .format("eventhubs")
    # Connection settings, passed with .options(...), are required but omitted here.
    .load()
)

Important points:

  • readStream creates a streaming DataFrame.
  • Data remains continuously available.
  • Spark automatically detects new events.
  • The query remains active until stopped. (Microsoft Learn)

Writing Streaming Data

Streaming results are written using writeStream.

Example:

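query = (
    df.writeStream
    .format("delta")
    .outputMode("append")
    # Illustrative checkpoint path; each streaming query needs its own location.
    .option("checkpointLocation", "Files/checkpoints/sales_events")
    .toTable("SalesEvents")
)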

Common output modes include:

Mode     | Description
Append   | Only new rows written
Complete | Entire result rewritten
Update   | Only changed rows written

For Fabric data engineering scenarios, Append mode is most common. (Microsoft Learn)
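
The difference between Complete and Update is easiest to see on an aggregation. The following plain-Python sketch keeps a running count per device and shows what each mode would emit after every micro-batch. It is a model of the semantics, not PySpark code, and the function name run_query and the device IDs are hypothetical. Append is left out because, for aggregations, Spark only emits a row in Append mode once it can no longer change, which requires a watermark.

```python
def run_query(batches, mode: str):
    """Return what a running count per device emits after each micro-batch."""
    state = {}    # running counts, carried across micro-batches
    emitted = []
    for batch in batches:
        changed = set()
        for device_id in batch:
            state[device_id] = state.get(device_id, 0) + 1
            changed.add(device_id)
        if mode == "complete":
            # The entire result table is written every time.
            emitted.append(dict(state))
        elif mode == "update":
            # Only rows whose value changed in this batch are written.
            emitted.append({key: state[key] for key in changed})
        else:
            raise ValueError(f"mode not covered by this sketch: {mode}")
    return emitted


# Two hypothetical micro-batches of device IDs.
batches = [["d1", "d2"], ["d1"]]
complete_out = run_query(batches, "complete")
update_out = run_query(batches, "update")
print(complete_out)
print(update_out)
```

After the second batch, Complete rewrites both device rows while Update writes only the row for d1.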


Delta Lake Integration

One of the most important DP-700 concepts is integrating Structured Streaming with Delta Lake.

Benefits include:

  • ACID transactions
  • Schema evolution
  • Time travel
  • Data versioning
  • Reliable streaming ingestion

Streaming data can be written directly into Delta tables:

.writeStream
.format("delta")
.toTable("Orders")

This creates a continuously updated Delta table within the Lakehouse. (Microsoft Learn)


Checkpointing

Checkpointing is critical for fault tolerance.

Example:

.option(
"checkpointLocation",
"Files/checkpoints/orders"
)

Checkpoints store:

  • Processed offsets
  • Query progress
  • State information

Benefits:

  • Prevents duplicate processing
  • Enables recovery after failures
  • Supports exactly-once processing semantics

A frequent exam scenario involves identifying missing checkpoint configurations as the root cause of duplicate or reprocessed data. (mindmeshacademy.com)
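
The recovery behaviour can be sketched without Spark. In the Python example below, a small JSON file plays the role of the checkpoint and records the next offset to process, so a restarted job resumes where it stopped instead of starting over. The function name process, the fail_at parameter, and the file name offsets.json are hypothetical, and the sketch ignores the case where a failure occurs between writing to the sink and saving the offset, which real systems handle with transactional or idempotent sinks.

```python
import json
import os
import tempfile


def process(events, checkpoint_path, sink, fail_at=None):
    """Process events in order, saving progress after each one."""
    # Resume from the last saved offset, or start from the beginning.
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]
    for i in range(offset, len(events)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        sink.append(events[i])
        # Record that everything up to and including event i is done.
        with open(checkpoint_path, "w") as f:
            json.dump({"offset": i + 1}, f)


events = ["e0", "e1", "e2", "e3"]
sink = []
with tempfile.TemporaryDirectory() as folder:
    checkpoint = os.path.join(folder, "offsets.json")
    try:
        process(events, checkpoint, sink, fail_at=2)  # crashes before e2
    except RuntimeError:
        pass
    process(events, checkpoint, sink)  # restart resumes at offset 2
print(sink)
```

Without the checkpoint file, the second run would start again at offset 0 and write e0 and e1 to the sink a second time.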


Triggers

Triggers control how often Spark processes incoming data.

Example:

.trigger(
processingTime="1 minute"
)

Possible trigger strategies:

Trigger Type          | Purpose
Continuous processing | Lowest latency
Processing time       | Fixed intervals
Available Now         | Process all available data and stop

Larger trigger intervals often improve throughput because more events are processed together. (Microsoft Learn)


Stateful vs Stateless Processing

Stateless Processing

Each event is processed independently.

Examples:

  • Filtering
  • Column selection
  • Simple transformations

Example:

stream.filter("temperature > 100")

Stateful Processing

Spark maintains information between batches.

Examples:

  • Running totals
  • Session windows
  • Stream aggregations
  • Deduplication

Stateful processing is more powerful but consumes additional memory and storage resources. (jumpstart.fabric.microsoft.com)
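
Deduplication shows why state is needed: deciding whether an event is new requires remembering what earlier batches contained. The plain-Python sketch below models this with a set that survives across micro-batches. The function name deduplicate and the event IDs are hypothetical. The set in this sketch grows without limit, which is the memory cost mentioned above; Spark bounds such state with watermarks.

```python
def deduplicate(batches):
    """Drop event IDs that were already seen in this or any earlier batch."""
    seen = set()   # state carried from one micro-batch to the next
    output = []
    for batch in batches:
        fresh = []
        for event_id in batch:
            if event_id not in seen:
                seen.add(event_id)
                fresh.append(event_id)
        output.append(fresh)
    return output


# Three hypothetical micro-batches with repeated IDs.
deduped = deduplicate([["a", "b"], ["b", "c"], ["a"]])
print(deduped)
```

A stateless operation such as a filter could process the third batch on its own, but here the result for that batch depends on what the first batch contained.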


Stream Aggregations

Streaming aggregations allow continuous calculations.

Examples:

  • Sales totals
  • Device counts
  • Average temperatures
  • Transaction volumes

Example:

stream.groupBy("DeviceID") \
.count()

This continuously updates counts as new events arrive.


Common Streaming Scenarios in Fabric

IoT Monitoring

Sensors continuously send readings.

Process:

IoT Devices
|
Event Hub
|
Spark Structured Streaming
|
Lakehouse
|
Power BI Dashboard

Application Telemetry

Applications send logs and metrics continuously.

Use cases:

  • Performance monitoring
  • Error tracking
  • Operational dashboards

Real-Time Business Analytics

Examples include:

  • Online sales monitoring
  • Inventory tracking
  • Customer activity analysis
  • Fraud detection

Structured Streaming vs Eventstreams

DP-700 often tests when to use each technology.

| Requirement | Eventstreams | Structured Streaming |
|---|---|---|
| No-code ingestion | Yes | No |
| Visual design | Yes | No |
| Complex transformations | Limited | Excellent |
| Custom code | No | Yes |
| Machine learning integration | Limited | Excellent |
| Advanced Spark operations | No | Yes |

Use Eventstreams for simple routing and ingestion.

Use Structured Streaming for advanced engineering workloads. (Microsoft Learn)


Production Best Practices

Use Spark Job Definitions

For production workloads, Microsoft recommends Spark Job Definitions rather than leaving notebooks running continuously. They provide better reliability and restart capabilities. (Microsoft Learn)

Configure Retry Policies

Retry policies allow automatic recovery from infrastructure failures. (Microsoft Learn)

Always Use Checkpoints

Never deploy production streaming jobs without checkpoint locations. (mindmeshacademy.com)

Optimize Partitioning

Appropriate partitioning improves throughput and downstream query performance. (Microsoft Learn)

Monitor Streaming Jobs

Use Fabric Monitoring Hub to monitor:

  • Input rate
  • Processing rate
  • Batch duration
  • Streaming query health (Microsoft Learn)

DP-700 Exam Tips

Remember these frequently tested concepts:

  • Structured Streaming treats streams as continuously growing tables.
  • readStream reads streaming data.
  • writeStream writes streaming data.
  • Delta tables are common streaming destinations.
  • Checkpointing enables fault tolerance.
  • Spark Job Definitions are preferred for production streaming workloads.
  • Event Hubs is a common streaming source.
  • Micro-batch processing is the default execution model.
  • Structured Streaming is preferred when complex transformations are required.
  • Eventstreams are often preferred for simpler ingestion scenarios.

Practice Exam Questions

Question 1

A company needs to process telemetry data from thousands of IoT devices as soon as it arrives. The solution must perform complex transformations before storing data in a Lakehouse.

Which technology should you choose?

A. Dataflow Gen2
B. Warehouse Stored Procedures
C. Spark Structured Streaming
D. Copy Activity

Correct Answer: C

Explanation: Spark Structured Streaming is designed for continuous data processing and complex transformations on streaming data.


Question 2

What is the primary purpose of a checkpoint location in Structured Streaming?

A. Increase Spark cluster size
B. Store temporary query results
C. Track processing progress and support recovery
D. Compress Delta files

Correct Answer: C

Explanation: Checkpoints store offsets and state information that allow recovery without reprocessing all data.


Question 3

Which SparkSession property is used to create a streaming DataFrame?

A. readStream
B. streamRead
C. loadStreaming
D. readDelta

Correct Answer: A

Explanation: spark.readStream returns a DataStreamReader, which is the Spark API for creating streaming DataFrames. It is accessed as a property, without parentheses.


Question 4

Which destination is most commonly used for Spark Structured Streaming in Microsoft Fabric?

A. Delta table in a Lakehouse
B. Excel workbook
C. Dataflow Gen2
D. Semantic model

Correct Answer: A

Explanation: Delta tables in Lakehouses are the primary streaming storage destination in Fabric.


Question 5

What execution model does Spark Structured Streaming primarily use?

A. Row-by-row execution
B. Continuous SQL polling
C. Micro-batch processing
D. Manual scheduling

Correct Answer: C

Explanation: Structured Streaming processes incoming data as small batches at regular intervals.


Question 6

Which Fabric component is recommended for running production Structured Streaming workloads?

A. Notebook only
B. Dataflow Gen2
C. Pipeline activity
D. Spark Job Definition

Correct Answer: D

Explanation: Spark Job Definitions provide improved reliability, retry policies, and production-grade execution.


Question 7

A streaming job must continuously calculate running totals by customer.

What type of processing is required?

A. Stateless processing
B. Stateful processing
C. Batch processing
D. Snapshot processing

Correct Answer: B

Explanation: Running totals require maintaining state across multiple batches.


Question 8

Which statement about Eventstreams and Structured Streaming is correct?

A. Eventstreams supports more advanced Spark transformations.
B. Structured Streaming is a no-code solution.
C. Structured Streaming supports complex custom code transformations.
D. Eventstreams requires Spark coding.

Correct Answer: C

Explanation: Structured Streaming provides full Spark capabilities and custom coding flexibility.


Question 9

What is the benefit of writing streaming data to Delta tables?

A. Eliminates storage costs
B. Prevents all schema changes
C. Converts data to CSV automatically
D. Provides ACID transactions and reliability

Correct Answer: D

Explanation: Delta Lake provides transactional consistency, schema evolution, and reliable streaming ingestion.


Question 10

A data engineer wants to process incoming events every 60 seconds instead of immediately.

Which feature should be configured?

A. Checkpointing
B. Consumer groups
C. Trigger interval
D. Data partitioning

Correct Answer: C

Explanation: Trigger intervals control how frequently Spark processes incoming streaming data.


Process data by using Eventstreams (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Process data by using Eventstreams


Introduction

As organizations increasingly rely on real-time analytics, the ability to ingest, process, route, and analyze streaming data has become a critical skill for data engineers. Microsoft Fabric provides Eventstreams as a low-code, scalable solution for processing streaming data within the Real-Time Intelligence workload.

For the DP-700 exam, you should understand how Eventstreams work, how they integrate with other Fabric components, how to perform basic stream processing, and when to use Eventstreams instead of alternatives such as notebooks, pipelines, or KQL databases.


What Are Eventstreams?

An Eventstream is a real-time data processing service within Microsoft Fabric that enables users to:

  • Ingest streaming data from various sources
  • Process and transform events in motion
  • Route data to multiple destinations
  • Monitor streaming pipelines visually
  • Build real-time analytics solutions

Eventstreams serve as the ingestion and routing layer of many Real-Time Intelligence solutions.

Conceptually:

Data Sources
|
Eventstream
|
Processing & Routing
|
Destinations

Eventstreams allow organizations to handle millions of events while maintaining low latency and high scalability.


Why Use Eventstreams?

Traditional batch processing waits for data to accumulate before processing.

Streaming scenarios require:

  • Immediate processing
  • Low-latency analytics
  • Real-time alerts
  • Continuous monitoring

Examples include:

  • IoT sensor monitoring
  • Website clickstream analysis
  • Application telemetry
  • Manufacturing equipment monitoring
  • Financial transaction processing
  • Security event monitoring

Eventstreams provide a managed platform for handling these requirements.


Eventstream Architecture

An Eventstream consists of three major components:

1. Sources

Sources provide incoming event data.

Common sources include:

  • Event Hubs
  • Fabric Eventhouses
  • Azure IoT Hub
  • Fabric Real-Time Hub
  • Custom applications
  • Sample streaming data

Example:

IoT Devices
|
Azure Event Hubs
|
Eventstream

2. Processing

After ingestion, Eventstreams can perform lightweight transformations.

Examples include:

  • Filtering records
  • Selecting columns
  • Enriching events
  • Basic data transformations
  • Event routing

Processing occurs while data is flowing through the stream.


3. Destinations

Processed events can be delivered to one or more destinations.

Common destinations include:

  • Eventhouse
  • KQL Database
  • Lakehouse
  • Activator
  • Custom endpoints

Example:

Eventstream
|
┌──────────┬──────────┬──────────┐
│Lakehouse │Eventhouse│Activator │
└──────────┴──────────┴──────────┘

One incoming stream can be delivered to multiple destinations simultaneously.


Eventstreams and Real-Time Intelligence

Eventstreams are a foundational component of Fabric Real-Time Intelligence.

A typical architecture may include:

IoT Devices
|
Eventstream
|
Eventhouse
|
KQL Queries
|
Dashboards

In this architecture:

  • Eventstream ingests data.
  • Eventhouse stores data.
  • KQL analyzes data.
  • Dashboards visualize results.

Common Eventstream Sources

Azure Event Hubs

One of the most common production sources.

Use when:

  • High-volume streaming data exists
  • Enterprise-scale ingestion is required
  • External systems already publish events

Azure IoT Hub

Designed specifically for IoT devices.

Examples:

  • Manufacturing sensors
  • Smart buildings
  • Connected vehicles

Real-Time Hub

Fabric Real-Time Hub provides a centralized location for discovering and connecting streaming data sources.

Benefits include:

  • Simplified discovery
  • Easy integration
  • Centralized event management

Eventstream Processing Capabilities

Eventstreams support several lightweight transformation capabilities.

Filtering

Filter unwanted records before storage.

Example:

Only process temperatures above 80°F.

Input:

Device A: 75
Device B: 84
Device C: 81

Output:

Device B: 84
Device C: 81

Filtering reduces storage and processing costs.
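In Eventstreams this filter is configured visually, but the logic is equivalent to the following plain-Python sketch. The `above_threshold` helper and the dictionary layout are illustrative assumptions; the readings are the ones from the example above.

```python
readings = [
    {"device": "Device A", "temperature": 75},
    {"device": "Device B", "temperature": 84},
    {"device": "Device C", "temperature": 81},
]


def above_threshold(events, threshold_f=80):
    """Keep only events whose temperature is strictly above the threshold."""
    return [e for e in events if e["temperature"] > threshold_f]


kept = above_threshold(readings)
kept_devices = [e["device"] for e in kept]
```

Only Device B and Device C pass the filter, so one third of this sample never reaches storage.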


Column Selection

Keep only required fields.

Input:

DeviceID
Temperature
Location
BatteryLevel
Timestamp

Output:

DeviceID
Temperature
Timestamp

This reduces data volume.
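The same projection can be written as a short plain-Python sketch. The `select_columns` helper is hypothetical, and the field values are made up purely so the example runs; only the column names come from the lists above.

```python
KEEP = ("DeviceID", "Temperature", "Timestamp")


def select_columns(event, columns=KEEP):
    """Return a copy of the event that contains only the required fields."""
    return {name: event[name] for name in columns}


incoming = {
    "DeviceID": 100,
    "Temperature": 84,
    "Location": "Building 1",          # illustrative value
    "BatteryLevel": 0.9,               # illustrative value
    "Timestamp": "2025-01-01T00:00:00Z",  # illustrative value
}

trimmed = select_columns(incoming)
```

Dropping `Location` and `BatteryLevel` before the event is written means every downstream destination stores and scans less data.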


Data Enrichment

Additional information can be added to streaming events.

Example:

Incoming Event:
DeviceID = 100

Enriched Event:
DeviceID = 100
Region = East
Facility = Orlando

Enrichment improves downstream analytics.
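Conceptually, enrichment is a lookup against reference data. The sketch below shows the idea in plain Python; the `DEVICE_REFERENCE` table and the `enrich` helper are hypothetical, and how the reference data is supplied in an actual Eventstream is not covered here.

```python
# Hypothetical reference data keyed by device ID.
DEVICE_REFERENCE = {
    100: {"Region": "East", "Facility": "Orlando"},
}


def enrich(event, reference=DEVICE_REFERENCE):
    """Add reference attributes to an event; unknown devices pass through."""
    extra = reference.get(event["DeviceID"], {})
    return {**event, **extra}


enriched = enrich({"DeviceID": 100})
unknown = enrich({"DeviceID": 999})
```

Events for devices that are missing from the reference data pass through unchanged in this sketch, which is one reasonable design choice; dropping or flagging them would be another.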


Routing Events

One of the most important Eventstream features is routing.

A single incoming stream can be sent to multiple destinations.

Example:

Telemetry Stream
|
Eventstream
|
┌──────────┬──────────┬──────────┐
│Lakehouse │Eventhouse│Activator │
└──────────┴──────────┴──────────┘

Routing one stream to several destinations enables all of the following from the same data:

  • Historical storage
  • Real-time analytics
  • Automated actions
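The fan-out pattern can be pictured with a few lines of plain Python. This is only a conceptual sketch: the `route` function is hypothetical, and the three lists stand in for the Lakehouse, Eventhouse, and Activator destinations.

```python
def route(event, destinations):
    """Deliver one incoming event to every configured destination."""
    for sink in destinations.values():
        sink.append(event)


destinations = {"Lakehouse": [], "Eventhouse": [], "Activator": []}

telemetry = [
    {"DeviceID": 1, "Temperature": 72},
    {"DeviceID": 2, "Temperature": 101},
]

for event in telemetry:
    route(event, destinations)
```

The source is read once, and every destination still receives the full stream.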


Eventstream Destinations

Eventhouse

Best for:

  • KQL analytics
  • Real-time dashboards
  • Time-series analysis

Often the primary destination in Real-Time Intelligence solutions.


Lakehouse

Best for:

  • Historical retention
  • Data science
  • Long-term storage
  • Delta table analytics

Commonly used alongside Eventhouse.


Activator

Used to trigger actions based on conditions.

Examples:

  • Send alerts
  • Trigger workflows
  • Notify users

Example:

Temperature > 100°F
|
Send Alert
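Activator rules are configured in the Fabric UI, but the condition-then-action logic amounts to something like this plain-Python sketch. The `evaluate_rule` function and the alert text are hypothetical.

```python
def evaluate_rule(event, threshold_f=100):
    """Return an alert message when the condition is met, otherwise None."""
    if event["Temperature"] > threshold_f:
        return f"ALERT: device {event['DeviceID']} reported {event['Temperature']} F"
    return None


alert = evaluate_rule({"DeviceID": 7, "Temperature": 104})
no_alert = evaluate_rule({"DeviceID": 8, "Temperature": 98})
```

Only the first event crosses the threshold, so only that event produces an action.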

Eventstream Monitoring

Fabric provides monitoring capabilities for Eventstreams.

Metrics include:

  • Throughput
  • Incoming events
  • Failed events
  • Processing latency
  • Destination status

Monitoring helps identify:

  • Bottlenecks
  • Connection issues
  • Data quality problems

Eventstreams vs Pipelines

This comparison is important for the DP-700 exam.

| Feature | Eventstream | Pipeline |
|---|---|---|
| Real-time processing | Yes | No |
| Streaming data | Yes | No |
| Batch processing | Limited | Yes |
| Continuous execution | Yes | No |
| Scheduling | No | Yes |
| Data movement | Yes | Yes |

Use Eventstreams When

  • Data arrives continuously
  • Low latency is required
  • Real-time monitoring is needed

Use Pipelines When

  • Batch processing is required
  • Scheduled execution is needed
  • ETL orchestration is required

Eventstreams vs Notebooks

| Feature | Eventstream | Notebook |
|---|---|---|
| Low-code | Yes | No |
| Streaming ingestion | Yes | Possible |
| Complex transformations | Limited | Extensive |
| Spark processing | No | Yes |
| Machine learning | No | Yes |

Use Eventstreams

For simple streaming ingestion and routing.

Use Notebooks

For advanced Spark transformations and machine learning workloads.


Eventstreams vs Eventhouse

Candidates often confuse these services.

Eventstream

Focuses on:

  • Ingestion
  • Processing
  • Routing

Eventhouse

Focuses on:

  • Storage
  • Querying
  • Analytics

A common architecture uses both together.

Eventstream
|
Eventhouse
|
KQL Queries

Best Practices

Filter Early

Remove unnecessary events before storage.

Benefits:

  • Lower storage costs
  • Faster queries
  • Reduced processing requirements

Route Once, Consume Many

Instead of duplicating ingestion pipelines, use one Eventstream and multiple destinations.

Benefits:

  • Simpler architecture
  • Lower maintenance effort

Monitor Throughput

Regularly review:

  • Event ingestion rates
  • Failed events
  • Processing latency

Separate Real-Time and Historical Analytics

A common architecture is:

Eventstream
|
┌──────────┬──────────┐
│Eventhouse│Lakehouse │
└──────────┴──────────┘

Eventhouse supports operational analytics while Lakehouse supports historical analysis.


DP-700 Exam Tips

Remember the following:

  1. Eventstreams are designed for real-time data ingestion and routing.
  2. Eventstreams consist of sources, processing, and destinations.
  3. Eventstreams commonly feed Eventhouses.
  4. Multiple destinations can receive the same stream.
  5. Eventstreams support filtering, selection, and enrichment.
  6. Eventstreams are not replacements for notebooks.
  7. Pipelines are primarily for batch orchestration.
  8. Eventhouse stores and analyzes streaming data.
  9. Activator can trigger actions from streaming events.
  10. Eventstreams are a key component of Fabric Real-Time Intelligence architectures.

Practice Exam Questions

Question 1

A company receives telemetry from thousands of IoT devices every second. The data must be processed immediately and sent to an Eventhouse.

Which Fabric component should be used?

A. Eventstream
B. Dataflow Gen2
C. Warehouse
D. Deployment Pipeline

Correct Answer: A

Explanation:
Eventstreams are designed specifically for real-time ingestion, processing, and routing of streaming data.


Question 2

Which component of an Eventstream receives incoming events?

A. Destination
B. Source
C. Activator
D. Eventhouse

Correct Answer: B

Explanation:
Sources are responsible for providing incoming streaming data to the Eventstream.


Question 3

A data engineer wants to remove all records where temperature is below 70°F before storing the data.

Which Eventstream capability should be used?

A. Mirroring
B. Aggregation
C. Filtering
D. Scheduling

Correct Answer: C

Explanation:
Filtering removes unwanted records before they reach downstream destinations.


Question 4

Which destination is best suited for real-time KQL analytics?

A. Warehouse
B. Notebook
C. Dataflow Gen2
D. Eventhouse

Correct Answer: D

Explanation:
Eventhouse is optimized for real-time analytics and KQL querying.


Question 5

A company wants the same streaming data to be stored historically and analyzed in real time.

What should be done?

A. Create two separate Eventstreams
B. Route the Eventstream to both a Lakehouse and an Eventhouse
C. Export the data twice
D. Use Dataflow Gen2

Correct Answer: B

Explanation:
Eventstreams can send data to multiple destinations simultaneously.


Question 6

Which Fabric service can trigger alerts based on conditions detected in streaming data?

A. Pipeline
B. Activator
C. Warehouse
D. Notebook

Correct Answer: B

Explanation:
Activator can generate notifications and actions based on event conditions.


Question 7

Which statement best describes Eventstreams?

A. Primarily used for batch ETL scheduling
B. Primarily used for dashboard creation
C. Primarily used for real-time ingestion and routing
D. Primarily used for SQL warehousing

Correct Answer: C

Explanation:
Eventstreams specialize in streaming ingestion, lightweight processing, and routing.


Question 8

Which service is generally preferred for complex Spark-based transformations?

A. Eventstream
B. Activator
C. Eventhouse
D. Notebook

Correct Answer: D

Explanation:
Notebooks provide extensive Spark and PySpark transformation capabilities that exceed Eventstream processing functionality.


Question 9

What is a major benefit of routing a stream to multiple destinations?

A. Eliminates all storage costs
B. Allows different workloads to consume the same stream simultaneously
C. Removes the need for Eventhouse
D. Prevents data retention

Correct Answer: B

Explanation:
Multiple destinations allow operational analytics, historical storage, and alerting from the same data stream.


Question 10

Which statement accurately compares Eventstreams and pipelines?

A. Pipelines are optimized for continuous streaming ingestion.
B. Eventstreams are primarily used for batch scheduling.
C. Both services are identical.
D. Eventstreams are optimized for real-time processing, while pipelines are optimized for batch orchestration.

Correct Answer: D

Explanation:
Eventstreams handle continuously arriving data, while pipelines are designed for orchestrated batch processing and scheduled workflows.


Choose between Query Acceleration for OneLake shortcuts and standard OneLake shortcuts in Real-Time Intelligence (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Choose between Query Acceleration for OneLake shortcuts and standard OneLake shortcuts in Real-Time Intelligence


Introduction

Microsoft Fabric provides multiple ways to access data stored in OneLake from Real-Time Intelligence workloads such as Eventhouses and KQL databases. One of the most important design decisions for data engineers is determining whether to use:

  • Standard OneLake shortcuts
  • Query-accelerated OneLake shortcuts

Understanding the differences between these options is essential for the DP-700 exam because they directly affect performance, cost, latency, storage consumption, and analytics architecture.

This article explains how each option works, when to use them, their limitations, and the decision-making criteria you should understand for the exam.


Understanding OneLake Shortcuts

A OneLake shortcut is a virtual reference to data stored elsewhere. Instead of copying data, the shortcut points to an existing data source. This allows multiple Fabric experiences to access the same data without creating duplicate copies. (Microsoft Learn)

For example:

  • A Lakehouse contains sales data.
  • An Eventhouse creates a shortcut to that data.
  • Queries can access the data through the shortcut.
  • The original data remains in its source location.

Benefits include:

  • No data duplication
  • Reduced storage costs
  • Single source of truth
  • Simplified data management
  • Faster implementation

Standard OneLake Shortcuts

A standard OneLake shortcut allows Real-Time Intelligence workloads to query external data directly from OneLake without ingesting it into the Eventhouse. (Microsoft Learn)

How It Works

When a query executes:

  1. Eventhouse accesses the shortcut.
  2. Data is retrieved from the source Delta table.
  3. Results are returned to the query.

No additional indexing or caching is performed.

Advantages

  • Minimal setup effort
  • No duplicated storage
  • Lower cost
  • Immediate access to existing data
  • Suitable for infrequent queries

Disadvantages

  • Slower query performance
  • Higher query latency
  • External storage access required during execution
  • Limited optimization opportunities

Query Acceleration for OneLake Shortcuts

Query Acceleration is a feature in Real-Time Intelligence that improves query performance against OneLake shortcut data by automatically caching and indexing selected data. (Video2 Skills Academy)

Instead of repeatedly reading Delta files from storage, Fabric creates optimized structures that significantly improve performance.

How It Works

When acceleration is enabled:

  1. A shortcut is created.
  2. Fabric indexes the data.
  3. Fabric caches data based on the configured retention period.
  4. Queries use optimized structures instead of repeatedly scanning raw files. (Microsoft Learn)

The experience becomes similar to querying native Eventhouse data.


Query Acceleration Architecture

Without acceleration:

Delta Table
|
OneLake Shortcut
|
Query Reads Files Directly

With acceleration:

Delta Table
|
OneLake Shortcut
|
Indexing and Caching
|
High-Performance Queries

Performance Comparison

| Characteristic | Standard Shortcut | Query Accelerated Shortcut |
|---|---|---|
| Data duplication | No | No |
| Caching | No | Yes |
| Indexing | No | Yes |
| Query latency | Higher | Lower |
| Large-scale analytics | Moderate | Excellent |
| Cost | Lower | Higher |
| Setup complexity | Low | Moderate |

When to Use Standard OneLake Shortcuts

Choose standard shortcuts when:

Query Frequency is Low

If users only occasionally access the data, acceleration may not provide sufficient value.

Example:

  • Monthly compliance reports
  • Ad hoc investigations
  • Occasional auditing

Cost Optimization is Critical

Since acceleration introduces caching and indexing costs, standard shortcuts are often preferred for budget-sensitive workloads.

Data Volumes are Small

Smaller datasets generally perform well enough without acceleration.


When to Use Query Acceleration

Choose query acceleration when:

High Query Volume Exists

Examples:

  • Interactive dashboards
  • Continuous monitoring
  • Frequent analytics workloads

Large Delta Tables Are Queried

Large historical datasets often benefit significantly from acceleration.

Real-Time and Historical Data Must Be Combined

A common Real-Time Intelligence pattern involves:

  • Streaming data arriving in Eventhouse
  • Historical data stored in OneLake

Query acceleration enables efficient joins between both datasets. (Video2 Skills Academy)

Example:

Live Sensor Stream
+
Historical Equipment Data
=
Real-Time Analytics

Dimension Data Must Be Joined Frequently

Organizations often mirror dimension data into OneLake and then use accelerated shortcuts for enrichment and lookup operations. (Video2 Skills Academy)


Configuring Query Acceleration

Acceleration can be enabled:

  • During shortcut creation
  • After shortcut creation through Data Policies settings (Microsoft Learn)

Administrators can also define:

  • Number of cached days
  • Retention period
  • Acceleration policies

The caching period determines how much data remains optimized for high-performance access. (Microsoft Learn)
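Besides the UI, the policy can also be set with a KQL management command on the external table behind the shortcut. The sketch below builds such a command string in Python. It assumes the `.alter external table ... policy query_acceleration` syntax with `IsEnabled` and `Hot` properties as described in the Kusto documentation, so verify it against the current docs before relying on it; the `query_acceleration_command` helper and the `SalesShortcut` table name are hypothetical.

```python
import json


def query_acceleration_command(table_name, cached_days):
    """Build a KQL command that enables acceleration for the last N days."""
    policy = {
        "IsEnabled": True,
        # Timespan in d.hh:mm:ss form; assumed to define the cached period.
        "Hot": f"{cached_days}.00:00:00",
    }
    return (
        f".alter external table {table_name} "
        f"policy query_acceleration '{json.dumps(policy)}'"
    )


command = query_acceleration_command("SalesShortcut", 7)
```

The resulting string would be run in a KQL queryset against the database that contains the shortcut.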


Caching Period Considerations

The caching period directly impacts:

  • Query performance
  • Storage consumption
  • Cost

Example:

| Cached Period | Typical Use Case |
|---|---|
| 7 days | Operational monitoring |
| 30 days | Business analytics |
| 90 days | Historical trend analysis |

Longer periods improve performance across larger time ranges but increase storage costs.
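A toy model makes the tradeoff concrete. The sketch below assumes the cached window is a simple cutoff counted back from today; which timestamp Fabric actually compares against is not covered here, and the `served_from_cache` helper and the dates are hypothetical.

```python
from datetime import date, timedelta


def served_from_cache(record_date, today, cached_days):
    """True when the record falls inside the cached (accelerated) window."""
    return record_date >= today - timedelta(days=cached_days)


today = date(2025, 6, 30)
recent = date(2025, 6, 25)  # 5 days old
older = date(2025, 5, 15)   # 46 days old

recent_in_7 = served_from_cache(recent, today, 7)
older_in_30 = served_from_cache(older, today, 30)
older_in_90 = served_from_cache(older, today, 90)
```

The 46-day-old record is only accelerated under the 90-day setting, so a trend query that reaches back that far benefits from the longer period, while an operational dashboard over the last few days is already covered by 7 days.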


Cost Considerations

This topic frequently appears in architecture-based exam questions.

Standard Shortcuts

Costs include:

  • Storage
  • Query processing

No additional acceleration charges apply.

Query Acceleration

Additional costs include:

  • Storage for the cached data
  • Indexing of the accelerated data

The tradeoff is higher cost in exchange for much better performance.

Limitations of Query Acceleration

Candidates should understand major limitations.

Examples include: (Video2 Skills Academy)

  • Materialized views are not supported.
  • Update policies are not supported.
  • External tables with extremely large file counts may experience reduced effectiveness.
  • Certain Delta table schema changes may require reacceleration.
  • Some advanced Delta features may require disabling and re-enabling acceleration.

Decision Framework for the Exam

A useful exam strategy:

Choose Standard Shortcuts When

  • Cost is the highest priority.
  • Data is queried infrequently.
  • Data volume is small to moderate.
  • Performance requirements are relaxed.

Choose Query Acceleration When

  • Performance is critical.
  • Queries occur frequently.
  • Large datasets are analyzed.
  • Historical and streaming data are combined.
  • Interactive analytics workloads exist.
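The two checklists above can be condensed into a small helper for practicing scenario questions. This is a study aid sketched in plain Python, not an official rule: the `choose_shortcut_type` function, its parameters, and the order in which the criteria are applied are assumptions.

```python
def choose_shortcut_type(*, frequent_queries, large_dataset,
                         performance_critical,
                         joins_streaming_with_historical=False,
                         cost_is_top_priority=False):
    """Apply the exam decision framework to one scenario."""
    # Cost wins unless the scenario explicitly demands performance.
    if cost_is_top_priority and not performance_critical:
        return "standard shortcut"
    if (frequent_queries or large_dataset or performance_critical
            or joins_streaming_with_historical):
        return "query-accelerated shortcut"
    return "standard shortcut"


monthly_compliance_report = choose_shortcut_type(
    frequent_queries=False,
    large_dataset=False,
    performance_critical=False,
    cost_is_top_priority=True,
)

interactive_dashboard = choose_shortcut_type(
    frequent_queries=True,
    large_dataset=True,
    performance_critical=True,
)
```

Reading a question for these few signals (query frequency, data size, performance, cost) is usually enough to pick between the two options.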

DP-700 Exam Tips

Remember These Key Points

  1. OneLake shortcuts avoid data duplication.
  2. Standard shortcuts access data directly.
  3. Query acceleration adds indexing and caching.
  4. Query acceleration improves performance but increases cost.
  5. Accelerated shortcuts are ideal for frequent analytical queries.
  6. Standard shortcuts are ideal for occasional access scenarios.
  7. Query acceleration is especially valuable when combining streaming and historical datasets.
  8. Cached retention periods directly affect cost and performance.
  9. Accelerated shortcuts behave like external tables and inherit some external table limitations.
  10. The exam often focuses on choosing the most cost-effective versus highest-performance solution.

Practice Exam Questions

Question 1

A company uses Eventhouse to analyze telemetry data. Historical data resides in OneLake and is queried thousands of times per day. Query performance is poor.

What should you implement?

A. Dataflow Gen2
B. Query acceleration on the OneLake shortcut
C. Warehouse mirroring
D. Notebook scheduling

Correct Answer: B

Explanation:
Query acceleration adds indexing and caching that significantly improves query performance for frequently accessed shortcut data. (Video2 Skills Academy)


Question 2

What is the primary benefit of a standard OneLake shortcut?

A. Eliminates all query latency
B. Automatically indexes data
C. Provides access to data without duplication
D. Creates materialized views

Correct Answer: C

Explanation:
Shortcuts reference existing data rather than copying it, allowing a single source of truth. (Microsoft Learn)


Question 3

A solution prioritizes the lowest possible storage and acceleration costs. Data is queried only once per month.

Which option should be selected?

A. Query-accelerated shortcut
B. Materialized view
C. Standard OneLake shortcut
D. Native Eventhouse ingestion

Correct Answer: C

Explanation:
When query frequency is very low, the additional acceleration costs are generally not justified.


Question 4

What additional capability does query acceleration provide?

A. Encryption
B. Data mirroring
C. Row-level security
D. Caching and indexing

Correct Answer: D

Explanation:
Query acceleration improves performance through indexing and caching. (Video2 Skills Academy)


Question 5

Which scenario most strongly justifies query acceleration?

A. Small dataset queried quarterly
B. Development environment testing
C. Large historical dataset used in interactive dashboards
D. One-time data migration

Correct Answer: C

Explanation:
Interactive dashboards require low latency and frequent queries, making acceleration highly beneficial.


Question 6

What happens to the source data when a OneLake shortcut is created?

A. It is copied into Eventhouse
B. It is archived
C. It is compressed
D. It remains in its original location

Correct Answer: D

Explanation:
A shortcut is only a reference to the original data source. (Microsoft Learn)


Question 7

An engineer wants to join streaming Eventhouse data with historical OneLake data while maintaining high query performance.

Which approach should be recommended?

A. Query-accelerated shortcut
B. Dataflow Gen2
C. Warehouse endpoint
D. Manual exports

Correct Answer: A

Explanation:
One of the primary use cases for query acceleration is combining streaming and historical data efficiently. (Video2 Skills Academy)


Question 8

What configuration primarily controls how much accelerated data remains cached?

A. Workspace role assignments
B. Retention and caching period settings
C. Lakehouse schema definitions
D. Fabric tenant settings

Correct Answer: B

Explanation:
Administrators specify how many days of data are retained in the acceleration cache. (Microsoft Learn)


Question 9

Which statement about accelerated shortcuts is true?

A. They always cost less than standard shortcuts.
B. They require data duplication.
C. They can improve performance through cached and indexed data.
D. They eliminate storage requirements.

Correct Answer: C

Explanation:
Acceleration works by indexing and caching data while still avoiding data duplication. (Video2 Skills Academy)


Question 10

A company needs the fastest possible query performance against frequently accessed OneLake data and is willing to accept additional cost.

Which option should be chosen?

A. Standard OneLake shortcut
B. Manual exports to CSV
C. Dataflow Gen2
D. Query-accelerated OneLake shortcut

Correct Answer: D

Explanation:
Query acceleration is specifically designed to maximize query performance by using caching and indexing mechanisms. (Video2 Skills Academy)


Choose between native tables and OneLake shortcuts in Real-Time Intelligence (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Choose between native tables and OneLake shortcuts in Real-Time Intelligence


Introduction

One of the key design decisions when building real-time analytics solutions in Microsoft Fabric is determining where data should reside and how it should be accessed. Within Real-Time Intelligence, data engineers frequently encounter scenarios where they must choose between:

  • Native Tables in Eventhouse/KQL databases
  • OneLake Shortcuts to data stored elsewhere

Understanding the differences between these approaches is important for the DP-700 exam because the choice impacts:

  • Query performance
  • Data latency
  • Storage costs
  • Data governance
  • Data duplication
  • Maintenance complexity

A successful data engineer must understand when to ingest data directly into Real-Time Intelligence and when to reference existing data through shortcuts.


Understanding Real-Time Intelligence

Real-Time Intelligence is Microsoft Fabric’s solution for ingesting, analyzing, and acting upon streaming and operational data.

Key components include:

  • Eventstream
  • Eventhouse
  • KQL Databases
  • Activator
  • Real-Time Dashboards

Data stored within Eventhouse and KQL databases can come from multiple sources:

  • Direct streaming ingestion
  • Batch ingestion
  • External storage systems
  • OneLake data sources

This is where the choice between native tables and OneLake shortcuts becomes important.


What Are Native Tables?

Native tables are physical tables stored directly inside a KQL database or Eventhouse.

When data is ingested into Real-Time Intelligence, it is written into these tables and becomes part of the Eventhouse storage engine.


Characteristics of Native Tables

Native tables:

  • Physically store data
  • Support extremely fast query performance
  • Are optimized for time-series analytics
  • Support continuous streaming ingestion
  • Provide low-latency access
  • Support update policies and materialized views
  • Enable advanced KQL analytics

Native Table Architecture

Event Source → Eventstream → Native Table → KQL Queries → Dashboards / Analytics

Data resides directly within the Eventhouse environment.


Advantages of Native Tables

Highest Query Performance

Because data is physically stored in the Eventhouse engine, query execution is highly optimized.

Benefits include:

  • Faster aggregations
  • Faster filtering
  • Lower latency
  • Better concurrency

Optimized for Streaming Workloads

Native tables are specifically designed for:

  • High ingestion rates
  • Continuous event streams
  • Telemetry data
  • Operational analytics

Support for Advanced Features

Native tables support:

  • Materialized views
  • Update policies
  • Data retention policies
  • Cached query execution
  • Time-series functions

Lower Query Latency

Real-time dashboards often require results within seconds.

Native tables generally provide the lowest latency.


Disadvantages of Native Tables

Data Duplication

The same data may already exist elsewhere:

  • Lakehouse
  • Warehouse
  • ADLS Gen2
  • Other databases

Ingesting into native tables creates another copy.


Increased Storage Costs

More copies of data mean:

  • More storage consumption
  • Additional retention management

Additional Ingestion Processing

Data must be:

  • Moved
  • Loaded
  • Managed

before it becomes available.


What Are OneLake Shortcuts?

A OneLake shortcut provides a virtual reference to data stored elsewhere.

Rather than copying data into Eventhouse, Real-Time Intelligence accesses the existing data through the shortcut.


Shortcut Concept

Instead of:

Source → Copy → Eventhouse

You get:

Source → OneLake Shortcut → Query

No physical duplication occurs.


Supported Sources

Shortcuts can reference:

  • Fabric Lakehouses
  • Fabric Warehouses
  • Azure Data Lake Storage Gen2
  • Amazon S3
  • Other supported storage locations

Characteristics of OneLake Shortcuts

Shortcuts:

  • Avoid copying data
  • Provide a single source of truth
  • Reduce storage costs
  • Simplify governance
  • Enable data reuse

Advantages of OneLake Shortcuts

Eliminate Data Duplication

Eliminating duplication is one of the biggest advantages of shortcuts.

Instead of storing multiple copies:

One Source → Multiple Consumers

All consumers access the same data.


Lower Storage Costs

Since data is not duplicated:

  • Less storage consumption
  • Lower management overhead

Faster Data Availability

No ingestion process is required.

Data becomes accessible immediately after the shortcut is created.


Improved Governance

Governance becomes easier because:

  • Data remains in one location
  • Policies remain centralized
  • Data lineage remains clearer

Supports the One Copy Vision

OneLake is built around the principle of:

“One copy of data for the entire organization.”

Shortcuts are a key enabler of this strategy.


Disadvantages of OneLake Shortcuts

Potentially Higher Query Latency

Because data is not stored locally:

  • Queries may require additional access steps
  • Performance can be slower than native tables

Limited Optimization

Some advanced Eventhouse optimization capabilities are most effective with native data.

Examples include:

  • Materialized views
  • Update policies
  • Streaming ingestion optimizations

Dependency on Source Availability

If the source becomes unavailable:

  • Queries may fail
  • Performance may degrade

Native tables do not have this dependency.


When to Choose Native Tables

Choose native tables when:

Real-Time Performance Is Critical

Examples:

  • Monitoring dashboards
  • Security analytics
  • Fraud detection
  • Manufacturing telemetry

Continuous Streaming Ingestion Exists

Examples:

  • IoT sensors
  • Application logs
  • Device telemetry

High Query Volumes Are Expected

Examples:

  • Enterprise dashboards
  • Operational reporting

Advanced KQL Features Are Required

Examples:

  • Materialized views
  • Update policies
  • Retention policies

When to Choose OneLake Shortcuts

Choose shortcuts when:

Data Already Exists in OneLake

Avoid creating unnecessary copies.


Storage Costs Must Be Minimized

Shortcuts reduce storage requirements.


Data Sharing Is Important

Multiple teams can access the same dataset.


Data Is Primarily Historical

Examples:

  • Historical archives
  • Reference datasets
  • Slowly changing datasets

Governance Is a Priority

Maintaining a single source of truth simplifies compliance and governance efforts.


Comparing Native Tables and OneLake Shortcuts

Feature | Native Tables | OneLake Shortcuts
Physical storage | Yes | No
Data duplication | Yes | No
Storage cost | Higher | Lower
Query performance | Highest | Good
Streaming ingestion | Excellent | Not primary purpose
Advanced KQL features | Full support | Limited scenarios
Data governance | More complex | Simpler
Single source of truth | No | Yes
Real-time analytics | Best choice | Suitable in some cases
Historical data access | Good | Excellent

Common DP-700 Exam Scenarios

Scenario 1

A manufacturing company ingests millions of telemetry events every minute and requires dashboards that refresh within seconds.

Best Choice: Native Tables

Reason:

  • Maximum ingestion performance
  • Lowest query latency

Scenario 2

An organization already stores enterprise sales data in a Fabric Lakehouse and wants Eventhouse users to analyze it without creating another copy.

Best Choice: OneLake Shortcut

Reason:

  • Eliminates duplication
  • Supports centralized governance

Scenario 3

A security operations center performs continuous threat monitoring using KQL.

Best Choice: Native Tables

Reason:

  • Optimized for streaming analytics
  • Fast query response times

Scenario 4

A data engineering team needs occasional access to historical archive data stored in ADLS Gen2.

Best Choice: OneLake Shortcut

Reason:

  • No need to ingest large historical datasets
  • Lower storage costs

Decision Framework

Ask the following questions:

Is the data arriving continuously?

If yes → Native Tables.


Is ultra-low latency required?

If yes → Native Tables.


Does the data already exist in OneLake?

If yes → Consider OneLake Shortcuts.


Is avoiding duplication important?

If yes → OneLake Shortcuts.


Are advanced KQL optimization features required?

If yes → Native Tables.
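
The five questions above can be sketched as a small decision function. This is only an illustration of the framework, not an official rule set: the `Workload` class, its field names, and the tie-breaking order (native-table signals win when answers conflict) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the five decision-framework questions (hypothetical field names)."""
    arrives_continuously: bool = False
    needs_ultra_low_latency: bool = False
    already_in_onelake: bool = False
    avoid_duplication: bool = False
    needs_advanced_kql: bool = False


def recommend_storage(w: Workload) -> str:
    """Return a storage recommendation for a workload.

    Assumption of this sketch: when answers conflict, the native-table
    signals (streaming, latency, advanced KQL) take precedence.
    """
    if w.arrives_continuously or w.needs_ultra_low_latency or w.needs_advanced_kql:
        return "Native Tables"
    if w.already_in_onelake or w.avoid_duplication:
        return "OneLake Shortcuts"
    # No question was answered "yes", so the framework gives no guidance.
    return "No clear signal"


telemetry = Workload(arrives_continuously=True, needs_ultra_low_latency=True)
sales_history = Workload(already_in_onelake=True, avoid_duplication=True)
print(recommend_storage(telemetry))
print(recommend_storage(sales_history))
```

In practice the answers rarely line up this cleanly, which is why exam questions usually state one dominant requirement (latency or duplication) to make the choice unambiguous.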


DP-700 Exam Tips

Remember these key distinctions:

  • Native tables physically store data inside Eventhouse.
  • Native tables provide the highest performance.
  • Native tables are ideal for streaming ingestion.
  • OneLake shortcuts reference data without copying it.
  • Shortcuts support the One Copy vision of OneLake.
  • Shortcuts reduce storage costs.
  • Native tables are preferred when low-latency analytics is critical.
  • Shortcuts are preferred when data already exists elsewhere and duplication should be avoided.
  • Exam questions often focus on balancing performance versus storage and governance.

Practice Exam Questions

Question 1

A company requires sub-second analytics on continuously arriving IoT telemetry data in Eventhouse.

Which storage approach should be selected?

A. OneLake shortcut to a Lakehouse
B. OneLake shortcut to ADLS Gen2
C. Native table
D. Dataflow Gen2

Answer: C

Explanation:
Native tables provide the lowest latency and are optimized for continuous streaming ingestion and real-time analytics.


Question 2

An organization already stores customer history in a Fabric Lakehouse and wants Eventhouse users to analyze the data without creating additional copies.

Which option should be used?

A. Native table
B. OneLake shortcut
C. Eventstream ingestion
D. Data Activator

Answer: B

Explanation:
OneLake shortcuts allow access to existing data without physically copying it into Eventhouse.


Question 3

What is the primary advantage of using OneLake shortcuts?

A. Faster ingestion speeds
B. Automatic materialized views
C. Lower query latency
D. Elimination of data duplication

Answer: D

Explanation:
Shortcuts provide virtual access to data and eliminate the need to create additional copies.


Question 4

Which feature is most strongly associated with native tables?

A. Single source of truth
B. External data access
C. Physical storage within Eventhouse
D. Reduced storage costs

Answer: C

Explanation:
Native tables physically store data within Eventhouse and are optimized for real-time analytics.


Question 5

A team wants to minimize storage costs while analyzing historical datasets already stored in OneLake.

Which option is best?

A. Native tables
B. OneLake shortcuts
C. Spark cache tables
D. Temporary KQL tables

Answer: B

Explanation:
Shortcuts allow direct access to existing data without storing another copy.


Question 6

Which scenario most strongly favors native tables?

A. Historical archive access
B. Shared enterprise data reuse
C. High-volume streaming telemetry analytics
D. Storage cost reduction

Answer: C

Explanation:
Native tables are designed for continuous ingestion and high-performance real-time analytics.


Question 7

A data engineer wants to support the OneLake principle of maintaining a single copy of organizational data.

Which option best aligns with this goal?

A. Native tables
B. Materialized views
C. Streaming ingestion
D. OneLake shortcuts

Answer: D

Explanation:
Shortcuts are specifically designed to support OneLake’s single-copy architecture.


Question 8

Which statement about native tables is true?

A. They never store data physically.
B. They generally provide better query performance than shortcuts.
C. They require external storage systems.
D. They cannot be queried with KQL.

Answer: B

Explanation:
Because the data is stored directly inside Eventhouse, native tables typically deliver the highest performance.


Question 9

A company wants to use advanced KQL features such as update policies and materialized views on streaming data.

Which approach should be selected?

A. OneLake shortcut
B. Warehouse shortcut
C. Native table
D. Dataflow Gen2

Answer: C

Explanation:
Update policies and materialized views are defined on native Eventhouse tables, so streaming data that needs these features should be ingested into a native table.


Question 10

Which factor most commonly drives the decision to use a OneLake shortcut instead of a native table?

A. Requirement for lowest latency analytics
B. Requirement for continuous event ingestion
C. Requirement for materialized views
D. Requirement to avoid storing duplicate copies of data

Answer: D

Explanation:
The primary benefit of OneLake shortcuts is enabling data access without physically duplicating data, reducing storage costs and simplifying governance.



Choose an appropriate streaming engine (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Choose an appropriate streaming engine



Overview

Modern analytics solutions increasingly rely on the ability to process data as it is generated rather than waiting for scheduled batch loads. Streaming data enables organizations to react to events in near real time, support operational analytics, monitor systems, detect anomalies, and power intelligent applications.

In Microsoft Fabric, selecting the appropriate streaming engine is a critical design decision. The DP-700 exam expects candidates to understand the strengths, limitations, and ideal use cases of the various streaming technologies available in Fabric and to choose the most appropriate option based on business requirements.

This article explores the major streaming engines and technologies within Microsoft Fabric, how they compare, and when to use each one.


What Is Streaming Data?

Streaming data is data that arrives continuously from sources such as:

  • IoT devices
  • Sensors
  • Application logs
  • Clickstream events
  • Social media feeds
  • Financial transactions
  • Manufacturing equipment
  • Website activity
  • Real-time telemetry

Unlike batch processing, where data is collected and processed periodically, streaming systems process data as events arrive.

Common requirements include:

  • Low-latency processing
  • Real-time dashboards
  • Event detection
  • Alert generation
  • Continuous data ingestion
  • Streaming analytics

Streaming Technologies in Microsoft Fabric

The primary streaming technologies that data engineers encounter in Fabric include:

Technology | Primary Purpose
Eventstream | Real-time event ingestion and routing
Eventhouse | Real-time analytics using KQL
KQL Database | High-performance streaming analytics
Real-Time Intelligence | End-to-end real-time analytics platform
Spark Structured Streaming | Large-scale streaming transformations
Data Activator | Event-driven actions and alerts
Pipelines | Scheduled orchestration (not true streaming)

Understanding when to use each is essential for the exam.


Eventstream

What Is Eventstream?

Eventstream is Fabric’s low-code real-time ingestion service.

It captures, transforms, filters, and routes streaming events from multiple sources to multiple destinations.

Think of Eventstream as the ingestion layer of a streaming architecture.


Common Sources

Eventstream can ingest data from:

  • Azure Event Hubs
  • Kafka endpoints
  • Fabric events
  • IoT sources
  • Real-time telemetry systems
  • Custom event producers

Common Destinations

Eventstream can send data to:

  • Eventhouse
  • KQL Databases
  • Lakehouses
  • Custom destinations
  • Activator

Best Use Cases

Choose Eventstream when:

  • Events must be continuously ingested
  • Minimal coding is desired
  • Data routing is required
  • Multiple downstream consumers need the same events
  • Building real-time analytics solutions

Exam Tip

If a scenario focuses on ingesting and routing real-time events, Eventstream is usually the best answer.


Eventhouse

What Is Eventhouse?

Eventhouse is a Real-Time Intelligence component optimized for storing and analyzing streaming data.

It is built on Kusto technology and provides:

  • High ingestion rates
  • Near real-time analytics
  • Time-series analysis
  • Log analytics
  • Event exploration

Key Characteristics

  • Optimized for append-only data
  • Supports KQL
  • Fast query performance
  • Near real-time visibility
  • Massive scalability

Best Use Cases

Use Eventhouse when:

  • Large volumes of events arrive continuously
  • Log analytics is required
  • Telemetry analysis is needed
  • Operational dashboards require low latency

Examples:

  • Website activity monitoring
  • Application diagnostics
  • Manufacturing telemetry
  • Security monitoring

KQL Databases

What Is a KQL Database?

A KQL database is the storage and query engine behind many real-time solutions. In Fabric, KQL databases are created inside an Eventhouse, which can manage one or more of them.

It uses Kusto Query Language (KQL) and is highly optimized for:

  • Streaming ingestion
  • Log analytics
  • Time-series data
  • Event correlation

Advantages

  • Extremely fast analytical queries
  • Handles high ingestion volumes
  • Rich time-series functions
  • Powerful aggregation capabilities

Best Use Cases

Choose KQL databases when:

  • Event analysis is the primary objective
  • Massive event volumes exist
  • Time-based analysis is required
  • Operational monitoring is needed

Spark Structured Streaming

What Is Structured Streaming?

Spark Structured Streaming enables continuous processing using Apache Spark.

Unlike Eventstream and Eventhouse, Spark streaming is developer-focused and code-driven.

Supported languages include:

  • PySpark
  • Scala
  • Spark SQL

Capabilities

Spark Structured Streaming supports:

  • Complex transformations
  • Data enrichment
  • Machine learning integration
  • Streaming joins
  • Stateful processing
  • Advanced business logic

Best Use Cases

Choose Spark Structured Streaming when:

  • Complex transformations are required
  • Large-scale processing is needed
  • Machine learning must be integrated
  • Events must be joined with reference datasets
  • Custom code is acceptable

Examples:

  • Fraud detection
  • Customer behavior analytics
  • Streaming feature engineering
  • Predictive maintenance

Exam Tip

If a scenario requires advanced coding and transformation logic, Spark Structured Streaming is often the correct answer.


Real-Time Intelligence

What Is Real-Time Intelligence?

Real-Time Intelligence is Fabric’s complete platform for handling real-time data workloads.

It combines:

  • Eventstream
  • Eventhouse
  • KQL Databases
  • Data Activator
  • Real-time dashboards

Benefits

Real-Time Intelligence provides:

  • End-to-end streaming architecture
  • Real-time monitoring
  • Event processing
  • Alerting
  • Operational analytics

Best Use Cases

Use Real-Time Intelligence when an organization needs:

  • Comprehensive streaming analytics
  • Operational dashboards
  • Real-time monitoring
  • Event-driven insights

Data Activator

What Is Data Activator?

Data Activator monitors events and automatically takes actions when specified conditions occur.

Examples include:

  • Sending emails
  • Triggering workflows
  • Generating notifications
  • Creating alerts

Example

If machine temperature exceeds 90°C:

  • Generate an alert
  • Notify engineers
  • Open a support ticket
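
The example above boils down to a condition plus a list of actions. The sketch below shows that shape in plain Python; it is not the Data Activator API, and the function name and action strings are hypothetical placeholders for whatever a real rule would trigger.

```python
from typing import Callable

THRESHOLD_C = 90.0  # threshold taken from the example above


def evaluate_rule(reading_c: float, actions: list[Callable[[float], str]]) -> list[str]:
    """Run every action when the reading exceeds the threshold; otherwise do nothing."""
    if reading_c > THRESHOLD_C:
        return [action(reading_c) for action in actions]
    return []


# Hypothetical actions; a real rule would send an email, post a message,
# or start a workflow instead of returning a string.
actions = [
    lambda t: f"ALERT: temperature {t:.1f} C exceeds {THRESHOLD_C:.0f} C",
    lambda t: "notify: engineers",
    lambda t: "ticket: opened",
]

print(evaluate_rule(95.2, actions))  # all three actions fire
print(evaluate_rule(72.0, actions))  # below the threshold, nothing fires
```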

Best Use Cases

Choose Data Activator when:

  • Business users need alerts
  • Event-driven automation is required
  • Low-code monitoring is desired

Pipelines Are Not Streaming Engines

A common DP-700 exam trap is confusing pipelines with streaming solutions.

Pipelines:

  • Execute scheduled workloads
  • Orchestrate activities
  • Handle batch data movement

Pipelines do NOT provide continuous event processing.


Appropriate Pipeline Scenarios

  • Daily data loads
  • Weekly ETL jobs
  • Scheduled orchestration
  • Batch transformations

Inappropriate Pipeline Scenarios

  • Second-by-second monitoring
  • Real-time alerts
  • Continuous event processing

Selecting the Appropriate Streaming Engine

Scenario 1: IoT Sensor Telemetry

Requirements:

  • Millions of sensor events
  • Real-time monitoring
  • Fast analytics

Best choice:

Eventstream + Eventhouse


Scenario 2: Fraud Detection

Requirements:

  • Stream transactions
  • Apply advanced business rules
  • Perform enrichment joins

Best choice:

Spark Structured Streaming


Scenario 3: Website Log Analysis

Requirements:

  • Continuous ingestion
  • Fast aggregations
  • Time-series analysis

Best choice:

KQL Database/Eventhouse


Scenario 4: Equipment Failure Alerts

Requirements:

  • Detect threshold breaches
  • Notify operators

Best choice:

Data Activator


Scenario 5: Enterprise Real-Time Analytics Platform

Requirements:

  • Complete streaming solution
  • Dashboards
  • Alerts
  • Analytics

Best choice:

Real-Time Intelligence


Comparison of Streaming Engines

Requirement | Recommended Technology
Event ingestion | Eventstream
Event routing | Eventstream
Real-time analytics | Eventhouse
Log analytics | KQL Database
Time-series analysis | KQL Database
Complex transformations | Spark Structured Streaming
Machine learning on streams | Spark Structured Streaming
Alerts and notifications | Data Activator
Complete real-time platform | Real-Time Intelligence
Scheduled ETL | Pipelines

DP-700 Exam Tips

Remember these key distinctions:

  • Eventstream = ingestion and routing.
  • Eventhouse = real-time storage and analytics.
  • KQL Database = high-performance event analytics.
  • Spark Structured Streaming = advanced code-based processing.
  • Data Activator = alerts and automated actions.
  • Pipelines = orchestration, not streaming.
  • Real-Time Intelligence = end-to-end streaming solution.

Many exam questions focus on matching business requirements to the correct streaming technology.


Practice Exam Questions

Question 1

A company needs to ingest streaming telemetry from thousands of IoT devices and route the data to multiple downstream consumers.

Which Fabric component should be used?

A. Data Activator
B. Eventstream
C. Pipeline
D. Notebook

Answer: B

Explanation:
Eventstream is specifically designed for real-time event ingestion and routing. Data Activator generates actions, pipelines handle batch orchestration, and notebooks perform transformations rather than ingestion.


Question 2

A solution requires advanced stream processing with custom Python code, joins against reference datasets, and machine learning inference.

Which technology should be selected?

A. Eventhouse
B. Spark Structured Streaming
C. KQL Database
D. Data Activator

Answer: B

Explanation:
Spark Structured Streaming supports complex transformations, enrichment, stateful processing, and machine learning integration through PySpark.


Question 3

A team needs extremely fast analytics over continuously arriving log data and plans to use KQL.

Which storage engine is most appropriate?

A. KQL Database
B. Dataflow Gen2
C. Warehouse
D. Pipeline

Answer: A

Explanation:
KQL databases are optimized for streaming ingestion, time-series analysis, and log analytics.


Question 4

A business user wants automatic notifications whenever inventory levels fall below a threshold.

Which Fabric component is best suited?

A. Eventstream
B. Notebook
C. Data Activator
D. Pipeline

Answer: C

Explanation:
Data Activator monitors data conditions and triggers automated actions such as alerts and notifications.


Question 5

Which Fabric component is primarily responsible for routing real-time events to destinations?

A. Warehouse
B. Eventstream
C. Dataflow Gen2
D. Notebook

Answer: B

Explanation:
Eventstream serves as the ingestion and routing layer for streaming architectures.


Question 6

A company requires an end-to-end platform for ingesting, storing, analyzing, and monitoring streaming events.

Which solution should be recommended?

A. Real-Time Intelligence
B. Dataflow Gen2
C. Warehouse
D. SQL Endpoint

Answer: A

Explanation:
Real-Time Intelligence combines ingestion, analytics, monitoring, alerting, and visualization capabilities into a unified platform.


Question 7

Which technology is best suited for analyzing application logs with time-series queries and low-latency reporting?

A. Notebook
B. Warehouse
C. Eventhouse
D. Pipeline

Answer: C

Explanation:
Eventhouse is optimized for streaming analytics, log analysis, and time-series workloads.


Question 8

A solution requires nightly ingestion of source data into a lakehouse.

Which option is most appropriate?

A. Eventstream
B. Data Activator
C. Eventhouse
D. Pipeline

Answer: D

Explanation:
Nightly ingestion is a batch process and is best handled through scheduled pipeline execution.


Question 9

A data engineer needs to continuously enrich streaming events using lookup data and perform custom business-rule calculations.

Which technology should be selected?

A. Spark Structured Streaming
B. Data Activator
C. Eventstream
D. Dashboard

Answer: A

Explanation:
Spark Structured Streaming provides advanced transformation capabilities including joins, aggregations, and custom code execution.


Question 10

Which statement best describes Eventhouse?

A. A workflow orchestration service for ETL processes
B. A low-code data preparation tool
C. A real-time analytics store optimized for event and telemetry data
D. A machine learning training environment

Answer: C

Explanation:
Eventhouse is designed for high-scale event ingestion, real-time analytics, log analytics, and KQL-based querying of streaming data.



Design and implement a loading pattern for streaming data (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Design and implement loading patterns
      --> Design and implement a loading pattern for streaming data



Introduction

Traditional batch data processing has been the foundation of analytics systems for decades. However, many modern business scenarios require data to be processed and analyzed as soon as it is generated. Examples include IoT sensors, website clickstreams, financial transactions, manufacturing equipment telemetry, and application monitoring.

Microsoft Fabric provides several capabilities that support streaming and real-time analytics through its Real-Time Intelligence workloads, Eventstreams, KQL databases, Data Activator, Lakehouses, and Spark technologies.

For the DP-700 exam, you should understand:

  • Streaming versus batch processing
  • Real-time and near real-time architectures
  • Event-driven data ingestion
  • Eventstreams
  • Event processing patterns
  • Streaming destinations
  • KQL databases
  • Lakehouse streaming ingestion
  • Event-driven orchestration
  • Windowing concepts
  • Checkpointing and fault tolerance
  • Performance and scalability considerations

Many DP-700 scenario questions focus on choosing the appropriate loading pattern based on latency requirements and business needs.


Understanding Streaming Data

Streaming data is data that arrives continuously over time rather than in large batches.

Examples include:

Source | Example Data
IoT Devices | Temperature readings
Web Applications | User clicks
Retail Systems | Purchases
Mobile Apps | User activity
Manufacturing Equipment | Sensor telemetry
Financial Systems | Transaction events

Instead of loading data once per day, streaming systems continuously process incoming events.


Batch vs Streaming Processing

Batch Processing

Processes accumulated data at scheduled intervals.

Example:

Daily Sales File → Midnight ETL Process → Data Warehouse

Characteristics:

  • High latency
  • Simpler architecture
  • Efficient for large historical datasets

Streaming Processing

Processes events continuously as they arrive.

Example:

Sensor Event → Immediate Processing → Analytics Platform

Characteristics:

  • Low latency
  • Near real-time insights
  • Event-driven architecture

Streaming Data Latency Categories

Real-Time

Typically seconds or less.

Example:

Fraud Detection

Near Real-Time

Typically seconds to minutes.

Example:

Operational Dashboards

Micro-Batch

Small batches processed frequently.

Example:

Every 30 Seconds
Every 1 Minute
Every 5 Minutes

Many streaming implementations use micro-batch processing internally. Spark Structured Streaming, for example, processes a stream as a series of small batches by default.


Streaming Architecture in Microsoft Fabric

A common Fabric streaming architecture:

Event Source → Eventstream → Transformation → Destination → Analytics

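Possible destinations include:

  • KQL Database (in an Eventhouse)
  • Lakehouse
  • Activator
  • Custom endpoints

Warehouses and Real-Time Dashboards typically consume the data further downstream, for example by querying the KQL database or Lakehouse, rather than receiving events directly from Eventstream.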

Event-Driven Processing

Streaming systems are event-driven.

An event represents something that happened.

Examples:

Order Created
Order Updated
Machine Started
Temperature Changed
Sensor Failed

Events are generated continuously and processed immediately.


Eventstreams

Eventstreams are one of the core ingestion services in Microsoft Fabric Real-Time Intelligence.

Eventstreams provide:

  • Event ingestion
  • Routing
  • Filtering
  • Transformation
  • Distribution

Eventstreams simplify streaming architecture by reducing custom development requirements.


Eventstream Sources

Common sources include:

Azure Event Hubs

High-volume event ingestion service.

Azure IoT Hub

Designed for IoT device communication.

Fabric Events

Events generated within Fabric workloads.

Custom Applications

Applications publishing events directly.


Eventstream Destinations

Eventstreams can route data to:

KQL Databases

Optimized for real-time analytics.

Lakehouses

Supports historical storage and analytics.

Eventhouse

Supports large-scale streaming workloads.

Activator

Supports automated actions and alerts.


Designing a Streaming Loading Pattern

A typical design includes:

Event Producer → Eventstream → Validation → Transformation → Storage Layer → Analytics

Each stage serves a specific purpose.


Step 1: Event Ingestion

The first step is capturing events from source systems.

Example:

Manufacturing Sensor → Temperature Reading → Eventstream

The ingestion layer must support:

  • High throughput
  • Reliability
  • Scalability

Step 2: Data Validation

Streaming data often contains:

  • Missing fields
  • Invalid values
  • Corrupt messages

Example:

Temperature = NULL

Such events may be:

  • Rejected
  • Corrected
  • Routed elsewhere
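
A minimal sketch of the reject / correct / route idea in plain Python follows. The specific rules (a required `device_id`, numeric strings corrected to floats, NULL temperatures rejected) and the field names are assumptions chosen for illustration; a real pipeline would apply its own rules inside Eventstream or Spark.

```python
def validate(event):
    """Classify one event as 'valid', 'corrected', or 'rejected'.

    Hypothetical rules: device_id is required; a temperature supplied as a
    numeric string is corrected to a float; anything else is rejected.
    """
    if not event.get("device_id"):
        return "rejected", event
    temp = event.get("temperature")
    if isinstance(temp, (int, float)):
        return "valid", event
    if isinstance(temp, str):
        try:
            return "corrected", {**event, "temperature": float(temp)}
        except ValueError:
            return "rejected", event
    return "rejected", event  # covers Temperature = NULL (None)


def route(events):
    """Send each event to the list that matches its validation status."""
    routed = {"valid": [], "corrected": [], "rejected": []}
    for event in events:
        status, result = validate(event)
        routed[status].append(result)
    return routed


sample = [
    {"device_id": "d1", "temperature": 21.5},
    {"device_id": "d2", "temperature": "22"},
    {"device_id": "d3", "temperature": None},
    {"temperature": 20.0},
]
routed = route(sample)
print({status: len(items) for status, items in routed.items()})
```

Keeping rejected events in their own list, rather than dropping them, leaves room to inspect them later.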

Step 3: Stream Transformation

Common transformations include:

Filtering

Keep only the events that matter and drop the rest.

Example:

Keep readings where Temperature > 80

Enrichment

Add contextual information.

Example:

Device ID + Location Data

Aggregation

Combine multiple events.

Example:

Average Temperature Per Minute
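
The three transformations can be sketched in plain Python as follows. The event fields (`device_id`, `ts` in seconds, `temperature`) and the `DEVICE_LOCATIONS` lookup table are hypothetical; in Fabric the same steps would typically be configured in Eventstream or written in KQL or Spark.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reference data used for enrichment.
DEVICE_LOCATIONS = {"d1": "Plant A", "d2": "Plant B"}


def filter_hot(events, threshold=80):
    """Filtering: keep only events with a temperature above the threshold."""
    return [e for e in events if e["temperature"] > threshold]


def enrich(events):
    """Enrichment: add the device's location from the lookup table."""
    return [
        {**e, "location": DEVICE_LOCATIONS.get(e["device_id"], "unknown")}
        for e in events
    ]


def average_per_minute(events):
    """Aggregation: average temperature per one-minute bucket (ts is in seconds)."""
    buckets = defaultdict(list)
    for e in events:
        buckets[e["ts"] // 60].append(e["temperature"])
    return {minute: mean(temps) for minute, temps in sorted(buckets.items())}


events = [
    {"device_id": "d1", "ts": 5, "temperature": 78.0},
    {"device_id": "d1", "ts": 40, "temperature": 82.0},
    {"device_id": "d2", "ts": 70, "temperature": 90.0},
]
print(filter_hot(events))
print(enrich(events))
print(average_per_minute(events))
```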

Step 4: Storage

Streaming systems often separate:

Hot Storage

Recent data for immediate analysis.

Cold Storage

Historical data for long-term reporting.

Fabric commonly uses a KQL database for hot storage and a Lakehouse for cold storage.


KQL Databases

KQL databases are optimized for:

  • Time-series data
  • Telemetry
  • Log analytics
  • Streaming workloads

Benefits include:

  • Fast ingestion
  • High query performance
  • Real-time dashboards

For DP-700, KQL databases are frequently associated with streaming scenarios.


Lakehouse Streaming Storage

Streaming data can also be written into Delta tables within a Lakehouse.

Benefits:

  • Historical retention
  • Data science workloads
  • Machine learning
  • Unified analytics

This pattern combines real-time and batch analytics.


Eventhouse

Eventhouse is designed for:

  • Large-scale event analytics
  • Streaming workloads
  • Real-time intelligence solutions

It integrates closely with KQL databases and Eventstreams.


Windowing Concepts

Streaming systems often process data using windows.

A window groups events together for calculations.


Tumbling Window

Fixed non-overlapping intervals.

Example:

12:00-12:05
12:05-12:10
12:10-12:15

Each event belongs to one window.
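
To make the "one event, one window" property concrete, here is a small plain-Python sketch that assigns timestamps to 5-minute tumbling windows. The function names are made up for this example; in Fabric the same grouping would normally be expressed in Eventstream, KQL, or Spark rather than hand-written.

```python
from collections import Counter
from datetime import datetime, timedelta


def tumbling_window_start(ts: datetime, size: timedelta) -> datetime:
    """Return the start of the fixed, non-overlapping window that contains ts."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % size  # how far ts is into its window
    return ts - offset


def count_per_window(timestamps, size=timedelta(minutes=5)):
    """Count events per tumbling window; each event lands in exactly one window."""
    return Counter(tumbling_window_start(ts, size) for ts in timestamps)


arrivals = [
    datetime(2024, 1, 1, 12, 1),
    datetime(2024, 1, 1, 12, 4, 59),
    datetime(2024, 1, 1, 12, 5),   # a boundary event belongs to the 12:05 window
    datetime(2024, 1, 1, 12, 12),
]
for start, count in sorted(count_per_window(arrivals).items()):
    print(start.strftime("%H:%M"), count)
```

Windows here are half-open: 12:00 up to but not including 12:05, so no event is counted twice.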


Sliding Window

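Windows overlap, so a single event can belong to more than one window.

Example:

The last 5 minutes of events, recalculated every minute

This provides continuously updated calculations. Note that Eventstream and Azure Stream Analytics call this fixed-step overlapping pattern a hopping window; in those engines, a sliding window produces output only when an event enters or leaves the window.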
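
The overlapping "last 5 minutes, evaluated every minute" pattern can be sketched in plain Python as shown below. The function name and the `(timestamp_seconds, value)` event shape are assumptions for this example, and timestamps are assumed to be non-negative.

```python
def overlapping_windows(events, size=300, step=60):
    """Average value over the last `size` seconds, evaluated every `step` seconds.

    events: list of (timestamp_seconds, value) with non-negative timestamps.
    Each window is the half-open interval (end - size, end]; consecutive
    windows overlap because step < size.
    """
    if not events:
        return {}
    last_ts = max(ts for ts, _ in events)
    results = {}
    end = step
    while end - size < last_ts:  # stop once the window has moved past the last event
        in_window = [value for ts, value in events if end - size < ts <= end]
        if in_window:
            results[end] = sum(in_window) / len(in_window)
        end += step
    return results


readings = [(30, 10.0), (90, 20.0)]
averages = overlapping_windows(readings)
for window_end, average in averages.items():
    print(window_end, average)
```

The reading at second 30 contributes to five consecutive results (window ends 60 through 300), which is exactly the overlap that a tumbling window does not have.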


Session Window

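Groups events that arrive close together and closes the window after a period of inactivity (a timeout).

Example:

All clicks from one user visit, ending once the user has been idle for a set time

Useful for clickstream analysis.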
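
A session window can be sketched as "start a new group whenever the gap since the previous event is too long". The plain-Python example below uses an assumed 30-minute inactivity gap; the function name and the gap value are illustrative, not values from any Fabric default.

```python
def session_windows(timestamps, gap=1800):
    """Split event times (seconds) into sessions separated by more than `gap` of inactivity.

    The 30-minute default gap is an assumed timeout for this sketch.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity exceeded the gap: start a new session
    return sessions


clicks = [0, 600, 1200, 5000, 5100]
print(session_windows(clicks))
```

Unlike tumbling windows, session windows have no fixed length: their size depends entirely on how the events are spaced.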


Checkpointing

Checkpointing tracks processing progress.

Purpose:

  • Recovery after failures
  • Prevent data loss
  • Avoid duplicate processing

Without checkpointing:

System Failure → Reprocess Everything

With checkpointing:

System Failure → Resume From Last Checkpoint
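
The resume-from-checkpoint behavior can be illustrated with a small file-based sketch. This is a simplified stand-in for what engines such as Spark Structured Streaming manage internally; the JSON file layout, the function names, and the `fail_at` parameter (used only to simulate a crash) are assumptions of this example.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the offset of the next event to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_checkpoint(path, next_offset):
    """Persist progress atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp, path)


def process(events, path, handle, fail_at=None):
    """Process events starting at the last checkpoint; fail_at simulates a crash."""
    for offset in range(load_checkpoint(path), len(events)):
        if offset == fail_at:
            raise RuntimeError("simulated failure")
        handle(events[offset])
        save_checkpoint(path, offset + 1)


events = ["e0", "e1", "e2", "e3"]
seen = []
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    process(events, path, seen.append, fail_at=2)  # crashes before handling e2
except RuntimeError:
    pass
process(events, path, seen.append)  # resumes at offset 2 instead of 0
print(seen)
```

A crash between handling an event and saving the checkpoint would replay that one event on restart, so handlers in a design like this should tolerate an occasional duplicate.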

Fault Tolerance

Streaming architectures must handle failures.

Strategies include:

Retry Logic

Automatically retry failed operations.

Checkpointing

Resume processing after failures.

Durable Storage

Persist data before processing.

Dead-Letter Queues

Store problematic events for investigation.
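
Retry logic and a dead-letter queue often work together: transient failures are retried, and events that still fail are set aside instead of blocking the stream. The sketch below shows that interaction in plain Python; the function names, the in-memory list standing in for a dead-letter queue, and the sample handler are all hypothetical.

```python
def process_with_retry(events, handle, max_attempts=3):
    """Try each event up to max_attempts times; park persistent failures in a dead-letter list."""
    dead_letters = []
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handle(event)
                break  # success: move on to the next event
            except Exception as exc:
                if attempt == max_attempts:
                    # Keep the event and the error so it can be investigated later.
                    dead_letters.append({"event": event, "error": str(exc)})
    return dead_letters


attempts = {}


def flaky_handler(event):
    """Hypothetical handler: 'transient' fails once, 'corrupt' always fails."""
    attempts[event] = attempts.get(event, 0) + 1
    if event == "corrupt":
        raise ValueError("cannot parse event")
    if event == "transient" and attempts[event] == 1:
        raise TimeoutError("temporary outage")


dlq = process_with_retry(["ok", "transient", "corrupt"], flaky_handler)
print(dlq)
print(attempts)
```

A production version would usually wait between attempts (for example with an increasing delay); that is omitted here to keep the sketch short.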


Event Ordering

Events may arrive out of sequence.

Example:

Event 3
Event 1
Event 2

Streaming solutions may require:

  • Event timestamps
  • Watermarks
  • Reordering logic
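
One way these three pieces fit together is sketched below: events carry an event timestamp, a watermark trails the highest timestamp seen so far, and buffered events are released in order once the watermark passes them. The function name, the `allowed_lateness` parameter, and the choice to return late events separately are assumptions of this example, not a description of how any specific Fabric engine is implemented.

```python
import heapq


def reorder(events, allowed_lateness=2):
    """Return (ordered, late) lists for a stream of (event_time, payload) tuples.

    The watermark is the highest event time seen so far minus allowed_lateness.
    Buffered events at or below the watermark are released in event-time order.
    An event older than the watermark arrives too late to be placed in order
    and is returned separately.
    """
    buffer, ordered, late = [], [], []
    watermark = float("-inf")
    for event_time, payload in events:
        if event_time < watermark:
            late.append((event_time, payload))
            continue
        heapq.heappush(buffer, (event_time, payload))
        watermark = max(watermark, event_time - allowed_lateness)
        while buffer and buffer[0][0] <= watermark:
            ordered.append(heapq.heappop(buffer))
    # End of input: flush whatever is still buffered, in event-time order.
    while buffer:
        ordered.append(heapq.heappop(buffer))
    return ordered, late


arrivals = [(3, "Event 3"), (1, "Event 1"), (2, "Event 2")]
ordered, late = reorder(arrivals)
print(ordered)
print(late)
```

A larger `allowed_lateness` tolerates more disorder but delays results, which is the usual trade-off when tuning watermarks.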

Scalability Considerations

Streaming systems must scale with event volume.

Important considerations:

Throughput

Events processed per second.

Parallelism

Multiple processors handling data simultaneously.

Partitioning

Distributing events across resources.

Resource Management

Balancing cost and performance.
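
Partitioning is commonly done by hashing a key so that all events for the same key land on the same partition while different keys spread across resources. The sketch below illustrates the idea with a stable CRC32 hash; the function names, the partition count of 4, and the use of a device ID as the key are assumptions for this example.

```python
import zlib


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition with a stable hash, so the same key always lands together."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def distribute(events, partitions=4):
    """Group (device_id, payload) events by their assigned partition."""
    assigned = {p: [] for p in range(partitions)}
    for device_id, payload in events:
        assigned[partition_for(device_id, partitions)].append((device_id, payload))
    return assigned


sample_events = [("d1", "a"), ("d2", "b"), ("d1", "c"), ("d3", "d")]
assignment = distribute(sample_events)
for partition, items in assignment.items():
    print(partition, items)
```

Python's built-in `hash()` is avoided here because its value for strings changes between processes, which would break the "same key, same partition" guarantee.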


Streaming vs Batch Loading in Fabric

Characteristic | Batch | Streaming
Latency | Minutes to Hours | Seconds
Trigger | Schedule | Event
Processing | Periodic | Continuous
Use Case | Historical Reporting | Operational Analytics
Architecture | Simpler | More Complex

Common Fabric Streaming Patterns

Pattern 1: IoT Analytics

IoT Devices → Eventstream → KQL Database → Real-Time Dashboard

Pattern 2: Operational Monitoring

Applications → Eventstream → Eventhouse → Alerts

Pattern 3: Real-Time + Historical Analytics

Events → Eventstream → Lakehouse → Delta Tables → Analytics

Common DP-700 Exam Scenarios

Scenario 1

A company wants dashboards updated within seconds of receiving telemetry.

Best solution:

Streaming ingestion using Eventstreams and KQL databases


Scenario 2

A manufacturing system generates millions of sensor events daily.

Best solution:

Eventstream → Eventhouse → KQL Database


Scenario 3

An organization wants real-time analytics and historical reporting.

Best solution:

Eventstream → Lakehouse → Delta Tables


Scenario 4

A system must automatically alert users when a sensor exceeds a threshold.

Best solution:

Streaming ingestion with Data Activator


Best Practices

Use Eventstreams for Ingestion

Provides scalable event routing and transformation.


Use KQL Databases for Real-Time Analytics

Optimized for telemetry and time-series data.


Store Historical Data in Lakehouses

Supports long-term analytics and machine learning.


Implement Checkpointing

Improves reliability and recovery.


Design for Scalability

Plan for growth in event volume.


Validate Data Early

Prevent poor-quality events from contaminating downstream systems.


DP-700 Exam Focus Areas

You should understand:

✓ Streaming vs batch processing

✓ Event-driven architectures

✓ Eventstreams

✓ Eventhouse

✓ KQL databases

✓ Real-time analytics

✓ Near real-time processing

✓ Windowing concepts

✓ Streaming transformations

✓ Event routing

✓ Checkpointing

✓ Fault tolerance

✓ Lakehouse streaming ingestion

✓ Real-Time Intelligence workloads


Practice Exam Questions

Question 1

A company requires dashboards to update within seconds of receiving IoT telemetry. Which loading pattern should be implemented?

A. Weekly snapshot loading

B. Daily batch processing

C. Streaming ingestion

D. Full data reloads

Answer: C

Explanation

Streaming ingestion provides low-latency processing and supports near real-time dashboard updates.


Question 2

Which Microsoft Fabric component is primarily used to ingest, route, and transform streaming events?

A. Dataflow Gen2

B. Eventstream

C. Warehouse

D. Deployment Pipeline

Answer: B

Explanation

Eventstreams are specifically designed for real-time event ingestion, transformation, and routing.


Question 3

A data engineer needs a destination optimized for time-series analytics and rapid ingestion of telemetry data.

Which destination should be selected?

A. Lakehouse

B. Warehouse

C. KQL Database

D. Dataflow Gen2

Answer: C

Explanation

KQL databases are optimized for real-time analytics, telemetry, and log data.


Question 4

What is the primary benefit of checkpointing in a streaming solution?

A. Enables recovery after processing failures

B. Compresses event data

C. Eliminates duplicates permanently

D. Encrypts incoming events

Answer: A

Explanation

Checkpointing records processing progress, allowing recovery from the last successful point after failures.


Question 5

Which window type uses fixed, non-overlapping intervals?

A. Session window

B. Tumbling window

C. Dynamic window

D. Watermark window

Answer: B

Explanation

Tumbling windows divide data into fixed intervals without overlap.


Question 6

An organization wants to preserve streaming data for long-term analytics and machine learning workloads.

Which destination is most appropriate?

A. Lakehouse

B. Data Activator

C. Eventstream

D. Workspace

Answer: A

Explanation

Lakehouses provide scalable storage and support advanced analytics and machine learning.


Question 7

Which characteristic most distinguishes streaming processing from batch processing?

A. Lower storage requirements

B. Simpler architecture

C. Continuous event processing

D. Larger processing windows

Answer: C

Explanation

Streaming systems process data continuously as events arrive rather than at scheduled intervals.


Question 8

A user activity analysis solution must group events based on periods of user activity separated by inactivity.

Which window type should be used?

A. Sliding window

B. Tumbling window

C. Fixed window

D. Session window

Answer: D

Explanation

Session windows group events that arrive close together and close after a period of inactivity, which matches activity separated by idle gaps.


Question 9

What is the primary purpose of event enrichment during stream processing?

A. Delete invalid records

B. Add contextual information to events

C. Increase event frequency

D. Reduce storage costs

Answer: B

Explanation

Enrichment adds additional business or reference data to incoming events to improve analytical value.


Question 10

A company requires a Fabric architecture that supports both real-time analytics and historical analysis of streaming data.

Which design is most appropriate?

A. Eventstream → KQL Database only

B. Dataflow Gen2 → Warehouse

C. Eventstream → Lakehouse → Delta Tables

D. Scheduled Pipeline → Warehouse

Answer: C

Explanation

Writing streaming data to a Lakehouse enables historical retention while supporting analytical workloads through Delta tables.


Exam Tip

For DP-700, remember the following associations:

| Requirement | Recommended Fabric Technology |
| --- | --- |
| Real-time event ingestion | Eventstream |
| Time-series analytics | KQL Database |
| Large-scale event analytics | Eventhouse |
| Long-term storage | Lakehouse |
| Automated event-driven actions | Data Activator |
| Continuous processing | Streaming Pattern |
| Scheduled processing | Batch Pattern |

A common exam clue is wording such as:

“Data must be available for analysis within seconds of being generated.”

When you see this requirement, the correct solution will almost always involve streaming ingestion, Eventstreams, and often KQL databases or Eventhouse, rather than traditional batch-oriented pipelines.


Go to the DP-700 Exam Prep Hub main page.

Describe the difference between Batch and Streaming data (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe an analytics workload (25–30%)
   --> Describe considerations for real-time data analytics
      --> Describe the difference between Batch and Streaming data


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Understanding the difference between batch data and streaming data is fundamental for designing modern analytics solutions. These two approaches define how data is ingested, processed, and analyzed.


What Is Batch Data?

Batch data refers to data that is:

  • Collected over a period of time
  • Processed in large chunks (batches)
  • Handled at scheduled intervals

Key Characteristics of Batch Data

  • High latency (minutes, hours, or days)
  • Processes large volumes at once
  • Typically scheduled (e.g., nightly jobs)
  • Efficient and cost-effective

Common Use Cases

  • Daily sales reports
  • Monthly financial summaries
  • Historical data analysis
  • Data warehousing workloads

Azure Services for Batch Processing

  • Azure Data Factory → batch ingestion and orchestration
  • Azure Synapse Analytics → batch processing and analytics

What Is Streaming Data?

Streaming data refers to data that is:

  • Generated continuously
  • Processed in real time (or near real time)
  • Handled as individual events or small micro-batches

Key Characteristics of Streaming Data

  • Low latency (seconds or milliseconds)
  • Continuous data flow
  • Enables real-time insights
  • Often requires more complex processing

Common Use Cases

  • IoT sensor monitoring
  • Fraud detection
  • Live dashboards
  • Website activity tracking

Azure Services for Streaming

  • Azure Event Hubs → event ingestion
  • Azure Stream Analytics → real-time processing

Batch vs Streaming — Key Differences

| Feature | Batch Processing | Streaming Processing |
| --- | --- | --- |
| Data Flow | Periodic | Continuous |
| Latency | High | Low |
| Data Size | Large chunks | Small events |
| Complexity | Simpler | More complex |
| Cost | Lower | Higher |
| Use Case | Historical analysis | Real-time insights |
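To make the latency difference concrete, the sketch below computes the same average two ways: once over the full collected dataset (batch) and once incrementally as each reading arrives (streaming). The class name, function name, and sample readings are hypothetical, and the code is plain Python rather than any Azure service.

```python
def batch_average(readings):
    """Batch style: wait until all readings are collected, then compute once."""
    return sum(readings) / len(readings)

class RunningAverage:
    """Streaming style: update the result as each reading arrives."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count

# Hypothetical temperature readings
readings = [20.0, 22.0, 24.0]

batch_result = batch_average(readings)

stream = RunningAverage()
streaming_results = [stream.update(r) for r in readings]

print(batch_result)
print(streaming_results)
```

Both approaches end at the same value, but the streaming version has a usable answer after every event, while the batch version has nothing to report until the whole dataset is available.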

When to Use Batch Processing

Choose batch when:

  • Real-time data is not required
  • You are working with large historical datasets
  • Cost efficiency is important
  • Processing can occur on a schedule

When to Use Streaming Processing

Choose streaming when:

  • You need real-time or near real-time insights
  • Data is generated continuously
  • Immediate action is required

Hybrid Approaches (Lambda / Modern Architectures)

Many modern systems use both:

  • Batch layer → historical analysis
  • Streaming layer → real-time insights

✔ Example:

  • Real-time dashboard + nightly aggregated reports

Why This Matters for DP-900

On the exam, you may be asked to:

  • Distinguish between batch and streaming scenarios
  • Choose the appropriate processing method
  • Identify Azure services for each approach
  • Understand trade-offs (latency, cost, complexity)

Summary — Exam-Relevant Takeaways

Batch processing

  • Processes data in chunks
  • Higher latency
  • Lower cost
  • Best for historical analysis

Streaming processing

  • Processes data continuously
  • Low latency
  • Enables real-time insights
  • More complex

✔ Azure services:

  • Batch → Azure Data Factory, Azure Synapse Analytics
  • Streaming → Azure Event Hubs, Azure Stream Analytics

✔ Exam tip:
👉 Real-time requirement → Streaming
👉 Scheduled / historical → Batch


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe the difference between Batch and Streaming data (DP-900 Exam Prep)

Practice Questions


Question 1

What is the primary characteristic of batch data processing?

A. Continuous data flow
B. Real-time processing
C. Processing data in scheduled chunks
D. Immediate event handling

Answer: C

Explanation:
Batch processing handles data in groups at scheduled intervals, not continuously.


Question 2

Which type of processing is BEST suited for real-time analytics?

A. Batch processing
B. Stream processing
C. Periodic processing
D. Manual processing

Answer: B

Explanation:
Stream processing enables real-time or near real-time insights.


Question 3

Which Azure service is commonly used for streaming data ingestion?

A. Azure Data Factory
B. Azure Event Hubs
C. Azure Synapse Analytics
D. Azure SQL Database

Answer: B

Explanation:
Azure Event Hubs is designed for high-throughput, real-time data ingestion.


Question 4

Which scenario is BEST suited for batch processing?

A. Monitoring live stock prices
B. Detecting fraud in real time
C. Generating a monthly financial report
D. Tracking website clicks instantly

Answer: C

Explanation:
Batch processing is ideal for scheduled, periodic workloads like reports.


Question 5

What is the typical latency for streaming data processing?

A. Hours
B. Days
C. Seconds or milliseconds
D. Weeks

Answer: C

Explanation:
Streaming processing provides low-latency, near real-time results.


Question 6

Which Azure service is used to process streaming data in real time?

A. Azure Blob Storage
B. Azure Stream Analytics
C. Azure Files
D. Azure Virtual Machines

Answer: B

Explanation:
Azure Stream Analytics processes streaming data in real time.


Question 7

Which statement about batch processing is TRUE?

A. It processes data continuously
B. It always requires real-time data sources
C. It is typically more cost-effective than streaming
D. It has lower latency than streaming

Answer: C

Explanation:
Batch processing is generally more cost-efficient than continuous streaming.


Question 8

Which scenario requires streaming processing?

A. Archiving old data
B. Processing annual tax records
C. Monitoring IoT sensor data in real time
D. Generating quarterly reports

Answer: C

Explanation:
Streaming is needed for continuous, real-time data flows like IoT.


Question 9

What is a key difference between batch and streaming processing?

A. Batch uses structured data, streaming does not
B. Streaming has higher latency than batch
C. Batch processes data in chunks, streaming processes data continuously
D. Streaming is always cheaper than batch

Answer: C

Explanation:
Batch = periodic chunks, Streaming = continuous flow.


Question 10

Which approach would you choose if immediate action is required based on incoming data?

A. Batch processing
B. Stream processing
C. Scheduled processing
D. Offline processing

Answer: B

Explanation:
Streaming is required when real-time decisions are needed.


✅ Quick Exam Takeaways

Batch processing

  • Scheduled
  • High latency
  • Cost-effective
  • Best for historical analysis

Streaming processing

  • Continuous
  • Low latency
  • Real-time insights
  • More complex

✔ Azure services:

  • Batch → Azure Data Factory, Azure Synapse Analytics
  • Streaming → Azure Event Hubs, Azure Stream Analytics

✔ Exam tip:
👉 Real-time = Streaming
👉 Scheduled/historical = Batch

