
Choose between native tables and OneLake shortcuts in Real-Time Intelligence (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform streaming data
      --> Choose between native tables and OneLake shortcuts in Real-Time Intelligence


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the key design decisions when building real-time analytics solutions in Microsoft Fabric is determining where data should reside and how it should be accessed. Within Real-Time Intelligence, data engineers frequently encounter scenarios where they must choose between:

  • Native Tables in Eventhouse/KQL databases
  • OneLake Shortcuts to data stored elsewhere

Understanding the differences between these approaches is important for the DP-700 exam because the choice impacts:

  • Query performance
  • Data latency
  • Storage costs
  • Data governance
  • Data duplication
  • Maintenance complexity

A successful data engineer must understand when to ingest data directly into Real-Time Intelligence and when to reference existing data through shortcuts.


Understanding Real-Time Intelligence

Real-Time Intelligence is Microsoft Fabric’s solution for ingesting, analyzing, and acting upon streaming and operational data.

Key components include:

  • Eventstream
  • Eventhouse
  • KQL Databases
  • Activator (formerly Data Activator)
  • Real-Time Dashboards

Data stored within Eventhouse and KQL databases can come from multiple sources:

  • Direct streaming ingestion
  • Batch ingestion
  • External storage systems
  • OneLake data sources

This is where the choice between native tables and OneLake shortcuts becomes important.


What Are Native Tables?

Native tables are physical tables stored directly inside a KQL database or Eventhouse.

When data is ingested into Real-Time Intelligence, it is written into these tables and becomes part of the Eventhouse storage engine.


Characteristics of Native Tables

Native tables:

  • Physically store data
  • Support extremely fast query performance
  • Are optimized for time-series analytics
  • Support continuous streaming ingestion
  • Provide low-latency access
  • Support update policies and materialized views
  • Enable advanced KQL analytics

Native Table Architecture

Event Source → Eventstream → Native Table → KQL Queries → Dashboards / Analytics

Data resides directly within the Eventhouse environment.


Advantages of Native Tables

Highest Query Performance

Because data is physically stored in the Eventhouse engine, query execution is highly optimized.

Benefits include:

  • Faster aggregations
  • Faster filtering
  • Lower latency
  • Better concurrency

Optimized for Streaming Workloads

Native tables are specifically designed for:

  • High ingestion rates
  • Continuous event streams
  • Telemetry data
  • Operational analytics

Support for Advanced Features

Native tables support:

  • Materialized views
  • Update policies
  • Data retention policies
  • Caching policies that keep recent data in hot cache for fast queries
  • Time-series functions
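
As a rough illustration of these features, the sketch below assembles the KQL management commands that would create a native table, set a retention policy, and define a materialized view over it. The table name, columns, view name, and 30-day retention are all hypothetical, and the Python helpers only build command strings; sending them to a KQL database is left out.

```python
# Hypothetical names throughout: Telemetry, its columns, and TelemetryHourly
# are illustrative and do not come from a real Eventhouse.


def create_table_command(table: str, columns: dict[str, str]) -> str:
    """Build a `.create table` command from a column-name -> KQL-type mapping."""
    schema = ", ".join(f"{name}: {kql_type}" for name, kql_type in columns.items())
    return f".create table {table} ({schema})"


def retention_command(table: str, days: int) -> str:
    """Build a command that sets the soft-delete retention period in days."""
    return f".alter-merge table {table} policy retention softdelete = {days}d"


def materialized_view_command(view: str, table: str, query: str) -> str:
    """Build a `.create materialized-view` command over a native table."""
    return f".create materialized-view {view} on table {table} {{ {query} }}"


commands = [
    create_table_command(
        "Telemetry",
        {"Timestamp": "datetime", "DeviceId": "string", "Temperature": "real"},
    ),
    retention_command("Telemetry", 30),
    materialized_view_command(
        "TelemetryHourly",
        "Telemetry",
        "Telemetry | summarize avg(Temperature) by DeviceId, bin(Timestamp, 1h)",
    ),
]

for command in commands:
    print(command)
```

Each command targets a native table, which is the point of the example: retention policies and materialized views are defined on data that has been ingested into Eventhouse.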

Lower Query Latency

Real-time dashboards often require results within seconds.

Native tables generally provide the lowest latency.


Disadvantages of Native Tables

Data Duplication

The same data may already exist elsewhere:

  • Lakehouse
  • Warehouse
  • ADLS Gen2
  • Other databases

Ingesting into native tables creates another copy.


Increased Storage Costs

More copies of data mean:

  • More storage consumption
  • Additional retention management

Additional Ingestion Processing

Data must be moved, loaded, and managed before it becomes available.


What Are OneLake Shortcuts?

A OneLake shortcut provides a virtual reference to data stored elsewhere.

Rather than copying data into Eventhouse, Real-Time Intelligence accesses the existing data through the shortcut.


Shortcut Concept

Instead of:

Source → Copy → Eventhouse

You get:

Source → OneLake Shortcut → Query

No physical duplication occurs.
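
In a KQL database, a shortcut is typically read through the external_table() function rather than by bare table name. The sketch below builds such a query string in Python next to the equivalent native-table query; the shortcut name SalesHistory, the table name Telemetry, and the column names are hypothetical.

```python
def shortcut_query(shortcut: str, filter_expr: str, limit: int = 100) -> str:
    """Build a KQL query that reads a OneLake shortcut via external_table()."""
    return f"external_table('{shortcut}') | where {filter_expr} | take {limit}"


def native_query(table: str, filter_expr: str, limit: int = 100) -> str:
    """Build the equivalent query against a native table, for comparison."""
    return f"{table} | where {filter_expr} | take {limit}"


# Hypothetical shortcut and table names, used only to show the two shapes.
print(shortcut_query("SalesHistory", "OrderDate >= datetime(2024-01-01)"))
print(native_query("Telemetry", "Timestamp > ago(5m)"))
```

The only difference in the query text is the source expression; the data behind the shortcut stays where it is.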


Supported Sources

Shortcuts can reference:

  • Fabric Lakehouses
  • Fabric Warehouses
  • Azure Data Lake Storage Gen2
  • Amazon S3
  • Other supported storage locations

Characteristics of OneLake Shortcuts

Shortcuts:

  • Avoid copying data
  • Provide a single source of truth
  • Reduce storage costs
  • Simplify governance
  • Enable data reuse

Advantages of OneLake Shortcuts

Eliminate Data Duplication

Avoiding duplicate copies is one of the biggest advantages of shortcuts.

Instead of storing multiple copies:

One Source → Multiple Consumers

All consumers access the same data.


Lower Storage Costs

Since data is not duplicated:

  • Less storage consumption
  • Lower management overhead

Faster Data Availability

No ingestion process is required.

Data becomes accessible immediately after the shortcut is created.


Improved Governance

Governance becomes easier because:

  • Data remains in one location
  • Policies remain centralized
  • Data lineage remains clearer

Supports the One Copy Vision

OneLake is built around the principle of:

“One copy of data for the entire organization.”

Shortcuts are a key enabler of this strategy.


Disadvantages of OneLake Shortcuts

Potentially Higher Query Latency

Because data is not stored locally:

  • Queries may require additional access steps
  • Performance can be slower than native tables

Limited Optimization

Some Eventhouse capabilities operate on ingested data, so they generally cannot be applied directly to data accessed through a shortcut.

Examples include:

  • Materialized views
  • Update policies
  • Streaming ingestion optimizations

Dependency on Source Availability

If the source becomes unavailable:

  • Queries may fail
  • Performance may degrade

Native tables do not have this dependency.


When to Choose Native Tables

Choose native tables when:

Real-Time Performance Is Critical

Examples:

  • Monitoring dashboards
  • Security analytics
  • Fraud detection
  • Manufacturing telemetry

Continuous Streaming Ingestion Exists

Examples:

  • IoT sensors
  • Application logs
  • Device telemetry

High Query Volumes Are Expected

Examples:

  • Enterprise dashboards
  • Operational reporting

Advanced KQL Features Are Required

Examples:

  • Materialized views
  • Update policies
  • Retention policies

When to Choose OneLake Shortcuts

Choose shortcuts when:

Data Already Exists in OneLake

Avoid creating unnecessary copies.


Storage Costs Must Be Minimized

Shortcuts reduce storage requirements.


Data Sharing Is Important

Multiple teams can access the same dataset.


Data Is Primarily Historical

Examples:

  • Historical archives
  • Reference datasets
  • Slowly changing datasets

Governance Is a Priority

Maintaining a single source of truth simplifies compliance and governance efforts.


Comparing Native Tables and OneLake Shortcuts

Feature                | Native Tables | OneLake Shortcuts
Physical storage       | Yes           | No
Data duplication       | Yes           | No
Storage cost           | Higher        | Lower
Query performance      | Highest       | Good
Streaming ingestion    | Excellent     | Not primary purpose
Advanced KQL features  | Full support  | Limited scenarios
Data governance        | More complex  | Simpler
Single source of truth | No            | Yes
Real-time analytics    | Best choice   | Suitable in some cases
Historical data access | Good          | Excellent

Common DP-700 Exam Scenarios

Scenario 1

A manufacturing company ingests millions of telemetry events every minute and requires dashboards that refresh within seconds.

Best Choice: Native Tables

Reason:

  • Maximum ingestion performance
  • Lowest query latency

Scenario 2

An organization already stores enterprise sales data in a Fabric Lakehouse and wants Eventhouse users to analyze it without creating another copy.

Best Choice: OneLake Shortcut

Reason:

  • Eliminates duplication
  • Supports centralized governance

Scenario 3

A security operations center performs continuous threat monitoring using KQL.

Best Choice: Native Tables

Reason:

  • Optimized for streaming analytics
  • Fast query response times

Scenario 4

A data engineering team needs occasional access to historical archive data stored in ADLS Gen2.

Best Choice: OneLake Shortcut

Reason:

  • No need to ingest large historical datasets
  • Lower storage costs

Decision Framework

Ask the following questions:

Is the data arriving continuously?

If yes → Native Tables.


Is ultra-low latency required?

If yes → Native Tables.


Does the data already exist in OneLake?

If yes → Consider OneLake Shortcuts.


Is avoiding duplication important?

If yes → OneLake Shortcuts.


Are advanced KQL optimization features required?

If yes → Native Tables.
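
The questions above can be sketched as a small helper. This is a minimal sketch, not an official rule set: the `Workload` fields and the `recommend` function are hypothetical names, and the precedence used when answers conflict (performance needs outrank duplication concerns) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the five decision questions (all hypothetical field names)."""
    arrives_continuously: bool = False
    needs_ultra_low_latency: bool = False
    already_in_onelake: bool = False
    must_avoid_duplication: bool = False
    needs_advanced_kql: bool = False


def recommend(w: Workload) -> str:
    """Return 'native table', 'OneLake shortcut', or 'either'."""
    # Assumption: any performance-driven "yes" wins over duplication concerns.
    if w.arrives_continuously or w.needs_ultra_low_latency or w.needs_advanced_kql:
        return "native table"
    if w.already_in_onelake or w.must_avoid_duplication:
        return "OneLake shortcut"
    # No question answered "yes": the framework gives no preference.
    return "either"


telemetry = Workload(arrives_continuously=True, needs_ultra_low_latency=True)
sales_history = Workload(already_in_onelake=True, must_avoid_duplication=True)

print(recommend(telemetry))
print(recommend(sales_history))
```

Real designs often mix both approaches, for example native tables for hot streaming data and shortcuts for history, which a single-answer helper like this does not capture.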


DP-700 Exam Tips

Remember these key distinctions:

  • Native tables physically store data inside Eventhouse.
  • Native tables provide the highest performance.
  • Native tables are ideal for streaming ingestion.
  • OneLake shortcuts reference data without copying it.
  • Shortcuts support the One Copy vision of OneLake.
  • Shortcuts reduce storage costs.
  • Native tables are preferred when low-latency analytics is critical.
  • Shortcuts are preferred when data already exists elsewhere and duplication should be avoided.
  • Exam questions often focus on balancing performance versus storage and governance.

Practice Exam Questions

Question 1

A company requires sub-second analytics on continuously arriving IoT telemetry data in Eventhouse.

Which storage approach should be selected?

A. OneLake shortcut to a Lakehouse
B. OneLake shortcut to ADLS Gen2
C. Native table
D. Dataflow Gen2

Answer: C

Explanation:
Native tables provide the lowest latency and are optimized for continuous streaming ingestion and real-time analytics.


Question 2

An organization already stores customer history in a Fabric Lakehouse and wants Eventhouse users to analyze the data without creating additional copies.

Which option should be used?

A. Native table
B. OneLake shortcut
C. Eventstream ingestion
D. Data Activator

Answer: B

Explanation:
OneLake shortcuts allow access to existing data without physically copying it into Eventhouse.


Question 3

What is the primary advantage of using OneLake shortcuts?

A. Faster ingestion speeds
B. Automatic materialized views
C. Lower query latency
D. Elimination of data duplication

Answer: D

Explanation:
Shortcuts provide virtual access to data and eliminate the need to create additional copies.


Question 4

Which feature is most strongly associated with native tables?

A. Single source of truth
B. External data access
C. Physical storage within Eventhouse
D. Reduced storage costs

Answer: C

Explanation:
Native tables physically store data within Eventhouse and are optimized for real-time analytics.


Question 5

A team wants to minimize storage costs while analyzing historical datasets already stored in OneLake.

Which option is best?

A. Native tables
B. OneLake shortcuts
C. Spark cache tables
D. Temporary KQL tables

Answer: B

Explanation:
Shortcuts allow direct access to existing data without storing another copy.


Question 6

Which scenario most strongly favors native tables?

A. Historical archive access
B. Shared enterprise data reuse
C. High-volume streaming telemetry analytics
D. Storage cost reduction

Answer: C

Explanation:
Native tables are designed for continuous ingestion and high-performance real-time analytics.


Question 7

A data engineer wants to support the OneLake principle of maintaining a single copy of organizational data.

Which option best aligns with this goal?

A. Native tables
B. Materialized views
C. Streaming ingestion
D. OneLake shortcuts

Answer: D

Explanation:
Shortcuts are specifically designed to support OneLake’s single-copy architecture.


Question 8

Which statement about native tables is true?

A. They never store data physically.
B. They generally provide better query performance than shortcuts.
C. They require external storage systems.
D. They cannot be queried with KQL.

Answer: B

Explanation:
Because the data is stored directly inside Eventhouse, native tables typically deliver the highest performance.


Question 9

A company wants to use advanced KQL features such as update policies and materialized views on streaming data.

Which approach should be selected?

A. OneLake shortcut
B. Warehouse shortcut
C. Native table
D. Dataflow Gen2

Answer: C

Explanation:
Update policies and materialized views are defined on native tables, so the streaming data should be ingested into Eventhouse rather than accessed through a shortcut.


Question 10

Which factor most commonly drives the decision to use a OneLake shortcut instead of a native table?

A. Requirement for lowest latency analytics
B. Requirement for continuous event ingestion
C. Requirement for materialized views
D. Requirement to avoid storing duplicate copies of data

Answer: D

Explanation:
The primary benefit of OneLake shortcuts is enabling data access without physically duplicating data, reducing storage costs and simplifying governance.


Go to the DP-700 Exam Prep Hub main page.

Identify Microsoft Cloud Services for real-time analytics (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe an analytics workload (25–30%)
   --> Describe considerations for real-time data analytics
      --> Identify Microsoft Cloud Services for real-time analytics


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Real-time analytics enables organizations to ingest, process, and analyze data as it is generated, allowing for immediate insights and actions. Microsoft Azure provides several services specifically designed to support real-time analytics workloads.

For the DP-900 exam, you should understand which services are used, their roles, and how they work together in a streaming architecture.


What Is Real-Time Analytics?

Real-time analytics refers to:

  • Processing data as it arrives (streaming data)
  • Producing insights with low latency (seconds or milliseconds)
  • Supporting immediate decision-making

Key Components of a Real-Time Analytics Solution

A typical real-time pipeline includes:

  1. Ingestion → Capture streaming data
  2. Processing → Analyze and transform data
  3. Storage → Persist results
  4. Visualization → Display insights

Core Azure Services for Real-Time Analytics


1. Event Ingestion Services


Azure Event Hubs

Purpose

  • High-throughput event ingestion service

Key Features

  • Handles millions of events per second
  • Scalable and distributed
  • Supports real-time data pipelines

Use Cases

  • IoT telemetry ingestion
  • Application logs
  • Streaming data pipelines

Think: “Entry point for streaming data”


Azure IoT Hub

Purpose

  • Specialized ingestion for IoT devices

Key Features

  • Device-to-cloud communication
  • Secure device management

Use Cases

  • Sensor data
  • Connected devices

Think: “Event Hubs for IoT scenarios”


2. Stream Processing Services


Azure Stream Analytics

Purpose

  • Real-time data processing using SQL-like queries

Key Features

  • Low-latency processing
  • Easy-to-use query language
  • Built-in integrations with Azure services

Use Cases

  • Real-time dashboards
  • Fraud detection
  • Alerting systems

Think: “Real-time analytics with SQL”


Azure Databricks

Purpose

  • Advanced stream and batch processing using Apache Spark

Key Features

  • Supports structured streaming
  • Handles large-scale data processing
  • Integrates with machine learning workflows

Use Cases

  • Complex event processing
  • Advanced analytics
  • Machine learning pipelines

Think: “Powerful, flexible streaming + big data processing”


3. Real-Time Analytics & Query Services


Azure Synapse Analytics

Purpose

  • Analyze streaming and batch data

Key Features

  • Integrates with streaming pipelines
  • Supports near real-time analytics

✔ Often used as part of a larger analytics architecture


Microsoft Fabric

Purpose

  • End-to-end analytics including real-time capabilities

Key Features

  • Real-Time Intelligence workload (formerly Real-Time Analytics)
  • Integrated with OneLake and Power BI
  • Unified platform for ingestion, processing, and visualization

Think: “All-in-one analytics platform (including real-time)”


How These Services Work Together

Typical Real-Time Pipeline

  1. Ingestion
    • Azure Event Hubs / Azure IoT Hub
  2. Processing
    • Azure Stream Analytics / Azure Databricks
  3. Storage
    • Data Lake / Synapse / Fabric OneLake
  4. Visualization
    • Power BI / Fabric dashboards
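
The pipeline above can be written down as an ordered mapping from stage to services, which makes it easy to check which stage a service belongs to. This is only an illustrative sketch; the `PIPELINE` constant and the helper names are hypothetical, and the service lists simply mirror the outline in this post.

```python
# Stage -> example services, in pipeline order (dicts preserve insertion order).
PIPELINE = {
    "Ingestion": ["Azure Event Hubs", "Azure IoT Hub"],
    "Processing": ["Azure Stream Analytics", "Azure Databricks"],
    "Storage": ["Data Lake", "Synapse", "Fabric OneLake"],
    "Visualization": ["Power BI", "Fabric dashboards"],
}


def stage_of(service: str):
    """Return the pipeline stage for a service, or None if it is not listed."""
    for stage, services in PIPELINE.items():
        if service in services:
            return stage
    return None


def is_valid_order(stages) -> bool:
    """Check that a sequence of stage names matches the pipeline order."""
    return list(stages) == list(PIPELINE)


print(stage_of("Azure Stream Analytics"))
print(is_valid_order(["Ingestion", "Processing", "Storage", "Visualization"]))
```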

Service Selection Guidance


Use Azure Event Hubs when:

  • You need high-throughput event ingestion
  • You are handling streaming data at scale

Use Azure IoT Hub when:

  • You are working with connected devices (IoT)

Use Azure Stream Analytics when:

  • You want simple, SQL-based real-time processing
  • You need quick setup and low complexity

Use Azure Databricks when:

  • You need advanced processing or machine learning
  • You are working with complex or large-scale streaming data

Use Microsoft Fabric when:

  • You want a unified platform with real-time analytics built in
  • You need end-to-end analytics (data + BI)
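
For quick revision, the guidance above can be condensed into a lookup from requirement to service. The requirement labels and the `suggest` function are hypothetical shorthand invented for this sketch, not official terminology.

```python
# Requirement label (hypothetical shorthand) -> service suggested by the guidance.
GUIDANCE = {
    "high-throughput ingestion": "Azure Event Hubs",
    "connected devices": "Azure IoT Hub",
    "sql-based processing": "Azure Stream Analytics",
    "advanced processing or ml": "Azure Databricks",
    "unified platform": "Microsoft Fabric",
}


def suggest(requirements: list[str]) -> list[str]:
    """Map each requirement to a service, rejecting labels not in GUIDANCE."""
    unknown = [r for r in requirements if r not in GUIDANCE]
    if unknown:
        raise ValueError(f"Unknown requirement(s): {unknown}")
    return [GUIDANCE[r] for r in requirements]


print(suggest(["connected devices", "sql-based processing"]))
```

A scenario with several requirements usually maps to several services working together, which is why the helper returns a list rather than a single answer.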

Why This Matters for DP-900

On the exam, you may be asked to:

  • Identify which service handles streaming ingestion vs processing
  • Choose the correct service for real-time scenarios
  • Understand how services work together in a pipeline

Summary — Exam-Relevant Takeaways

✔ Real-time analytics = low-latency insights from streaming data

✔ Core services:

  • Ingestion
    • Azure Event Hubs
    • Azure IoT Hub
  • Processing
    • Azure Stream Analytics
    • Azure Databricks
  • Analytics / Platform
    • Azure Synapse Analytics
    • Microsoft Fabric

✔ Key distinctions:

  • Event Hubs → ingestion
  • Stream Analytics → real-time processing
  • Databricks → advanced processing
  • Fabric → unified analytics platform

✔ Exam tip:
👉 Streaming ingestion → Event Hubs
👉 Real-time processing → Stream Analytics
👉 Advanced analytics → Databricks
👉 Unified solution → Fabric


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Identify Microsoft Cloud Services for real-time analytics (DP-900 Exam Prep)

Practice Questions


Question 1

Which Azure service is primarily used for ingesting large volumes of streaming data?

A. Azure Data Factory
B. Azure Event Hubs
C. Azure SQL Database
D. Azure Files

Answer: B

Explanation:
Azure Event Hubs is designed for high-throughput event ingestion in real time.


Question 2

Which Azure service is specifically designed for ingesting data from IoT devices?

A. Azure Blob Storage
B. Azure IoT Hub
C. Azure Synapse Analytics
D. Azure Table Storage

Answer: B

Explanation:
Azure IoT Hub enables secure communication with IoT devices and ingests telemetry data.


Question 3

Which Azure service allows real-time data processing using a SQL-like query language?

A. Azure Databricks
B. Azure Data Factory
C. Azure Stream Analytics
D. Azure Virtual Machines

Answer: C

Explanation:
Azure Stream Analytics processes streaming data using SQL-like queries.


Question 4

Which service is BEST suited for advanced real-time analytics and machine learning on streaming data?

A. Azure Files
B. Azure Databricks
C. Azure Table Storage
D. Azure DNS

Answer: B

Explanation:
Azure Databricks supports advanced analytics, Spark processing, and ML workflows.


Question 5

Which service provides a unified analytics platform that includes real-time analytics capabilities?

A. Azure Virtual Machines
B. Azure Blob Storage
C. Microsoft Fabric
D. Azure Files

Answer: C

Explanation:
Microsoft Fabric integrates real-time analytics, data engineering, and BI into one platform.


Question 6

Which component of a real-time analytics solution is responsible for capturing incoming data?

A. Processing
B. Storage
C. Visualization
D. Ingestion

Answer: D

Explanation:
The ingestion layer is responsible for capturing streaming data.


Question 7

You need to process streaming data with minimal setup using SQL-like queries. Which service should you choose?

A. Azure Databricks
B. Azure Synapse Analytics
C. Azure Stream Analytics
D. Azure Data Factory

Answer: C

Explanation:
Stream Analytics is ideal for simple, real-time processing with SQL syntax.


Question 8

Which service is MOST appropriate for handling millions of streaming events per second?

A. Azure SQL Database
B. Azure Files
C. Azure Event Hubs
D. Azure Table Storage

Answer: C

Explanation:
Event Hubs is built for high-throughput event ingestion at scale.


Question 9

Which of the following describes a typical real-time analytics pipeline?

A. Storage → Visualization → Ingestion → Processing
B. Processing → Ingestion → Storage → Visualization
C. Ingestion → Processing → Storage → Visualization
D. Visualization → Storage → Processing → Ingestion

Answer: C

Explanation:
The standard flow is:
Ingestion → Processing → Storage → Visualization


Question 10

Which scenario BEST demonstrates a real-time analytics use case?

A. Generating a yearly financial report
B. Archiving historical data
C. Monitoring live sensor data and triggering alerts
D. Migrating legacy databases

Answer: C

Explanation:
Real-time analytics is used for immediate insights and actions, such as alerts from live data.


✅ Quick Exam Takeaways

✔ Real-time analytics = low-latency insights from streaming data

✔ Core services:

  • Ingestion
    • Azure Event Hubs
    • Azure IoT Hub
  • Processing
    • Azure Stream Analytics
    • Azure Databricks
  • Platform
    • Microsoft Fabric

✔ Key roles:

  • Event Hubs → ingestion
  • Stream Analytics → real-time processing
  • Databricks → advanced analytics
  • Fabric → unified analytics platform

✔ Exam tip:
👉 Ingest streaming data → Event Hubs
👉 Process with SQL → Stream Analytics
👉 Advanced analytics → Databricks
👉 End-to-end solution → Fabric


Go to the DP-900 Exam Prep Hub main page.