
This post is part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub, and this topic falls under the following sections of the exam outline:
Prepare data
--> Get data
--> Ingest or access data as needed
A core responsibility of a Microsoft Fabric Analytics Engineer is deciding how data should be brought into Fabric—or whether it should be brought in at all. For the DP-600 exam, this topic focuses on selecting the right ingestion or access pattern based on performance, freshness, cost, and governance requirements.
Ingest vs. Access: Key Concept
Before choosing a tool or method, understand the distinction:
- Ingest data: Physically copy data into Fabric-managed storage (OneLake)
- Access data: Query or reference data where it already lives, without copying
The exam frequently tests your ability to choose the most appropriate option—not just a working one.
Common Data Ingestion Methods in Microsoft Fabric
1. Dataflows Gen2
Best for:
- Low-code ingestion and transformation
- Reusable ingestion logic
- Business-friendly data preparation
Key characteristics:
- Uses Power Query Online
- Supports scheduled refresh
- Stores results in OneLake (Lakehouse or Warehouse)
- Ideal for centralized, governed ingestion
Exam tip:
Use Dataflows Gen2 when reuse, transformation, and governance are priorities.
2. Data Pipelines (Copy Activity)
Best for:
- High-volume or frequent ingestion
- Orchestration across multiple sources
- ELT-style workflows
Key characteristics:
- Supports many source and sink types
- Enables scheduling, dependencies, and retries
- Minimal transformation (primarily copy)
Exam tip:
Choose pipelines when performance and orchestration matter more than transformation.
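Pipelines are usually built and scheduled in the Fabric UI, but runs can also be triggered programmatically. As a rough sketch (not an official recipe), the Python snippet below calls what is, at the time of writing, the Fabric REST API's on-demand job endpoint; the workspace ID, pipeline item ID, and token acquisition are all placeholders you would supply in a real environment.

```python
# Hedged sketch: trigger a Fabric data pipeline run on demand via the
# Fabric REST API. All IDs and the access token are placeholders.
import requests

WORKSPACE_ID = "<workspace-guid>"          # placeholder
PIPELINE_ID = "<pipeline-item-guid>"       # placeholder
TOKEN = "<azure-ad-access-token>"          # e.g., acquired via azure-identity

# "Run on-demand item job" endpoint; jobType=Pipeline targets data pipelines.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
)

response = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
response.raise_for_status()  # a 202 Accepted response means the run was queued
print("Pipeline run requested:", response.status_code)
```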
3. Notebooks (Spark)
Best for:
- Complex transformations
- Data science or advanced engineering
- Custom ingestion logic
Key characteristics:
- Full control using Spark (PySpark, Scala, SQL)
- Suitable for large-scale processing
- Writes directly to OneLake
Exam tip:
Notebooks are powerful but require engineering skills—don’t choose them for simple ingestion scenarios.
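To make the notebook pattern concrete, here is a minimal PySpark sketch of custom ingestion logic. It assumes the notebook is attached to a default Lakehouse; the file path, column names, and table name are illustrative only.

```python
# Minimal PySpark ingestion sketch for a Fabric notebook with a default
# Lakehouse attached. Paths and names are illustrative, not prescriptive.
from pyspark.sql import functions as F

# Read raw CSV files landed in the Lakehouse Files area.
raw = (
    spark.read                      # `spark` is predefined in Fabric notebooks
    .option("header", "true")
    .csv("Files/landing/sales/*.csv")
)

# Example transformation: type casting plus an audit column.
cleaned = (
    raw.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .withColumn("ingested_at", F.current_timestamp())
)

# Write directly to OneLake as a managed Delta table in the Lakehouse.
cleaned.write.mode("overwrite").format("delta").saveAsTable("sales_clean")
```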
Accessing Data Without Ingesting
1. OneLake Shortcuts
Best for:
- Avoiding data duplication
- Reusing data across workspaces
- Accessing external storage
Key characteristics:
- Logical reference only (no copy)
- Supports external sources such as ADLS Gen2 and Amazon S3, as well as other OneLake locations
- Appears as native content in a Lakehouse's Tables or Files areas
Exam tip:
Shortcuts are often the best answer when the question mentions avoiding duplication or reducing storage cost.
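Because a shortcut is only a logical reference, code that consumes it is indistinguishable from code reading native data. For example, assuming a Lakehouse contains a hypothetical shortcut named sales_s3 pointing at external storage, a notebook can query it like any local table or folder:

```python
# A shortcut behaves like native Lakehouse content; no data is copied.
# "sales_s3" and the "region" column are hypothetical examples.

# If the shortcut surfaces as a Delta table in the Tables area:
df = spark.read.table("sales_s3")

# If it surfaces as a folder in the Files area (Parquet assumed here):
files_df = spark.read.parquet("Files/sales_s3/")

df.groupBy("region").count().show()
```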
2. DirectQuery
Best for:
- Near-real-time data access
- Large datasets that cannot be imported
- Centralized source-of-truth systems
Key characteristics:
- Queries run against the source system
- Performance depends on source
- Limited modeling flexibility compared to Import
Exam tip:
Expect trade-off questions involving DirectQuery vs. Import.
3. Real-Time Access (Eventstreams / KQL)
Best for:
- Streaming and telemetry data
- Operational and real-time analytics
Key characteristics:
- Event-driven ingestion
- Supports near-real-time dashboards
- Often surfaced through the Real-Time hub
Exam tip:
Use real-time ingestion when freshness is measured in seconds, not hours.
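A common way to feed an eventstream is through its custom endpoint source, which exposes an Event Hubs-compatible connection string. As a hedged illustration, the snippet below uses the azure-eventhub Python SDK to send a single JSON telemetry event; the connection details and payload are placeholders.

```python
# Sketch: send a telemetry event to a Fabric eventstream through its
# Event Hubs-compatible custom endpoint. Connection values are placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

CONNECTION_STR = "<custom-endpoint-connection-string>"  # from the eventstream
EVENT_HUB_NAME = "<entity-name-from-endpoint-details>"  # placeholder

producer = EventHubProducerClient.from_connection_string(
    CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
)

event = {"deviceId": "sensor-01", "temperature": 21.7}  # sample payload

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)  # the event reaches the stream within seconds
```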
Choosing the Right Approach (Exam-Critical)
You should be able to decide based on these factors:
| Requirement | Best Option |
| --- | --- |
| Reusable ingestion logic | Dataflows Gen2 |
| High-volume copy | Data pipelines |
| Complex transformations | Notebooks |
| Avoid duplication | OneLake shortcuts |
| Near real-time reporting | DirectQuery / Eventstreams |
| Governance and trust | Ingestion + endorsement |
Governance and Security Considerations
- Ingested data can inherit sensitivity labels
- Access-based methods rely on source permissions
- Workspace roles determine who can ingest or access data
- Endorsed datasets should be preferred for reuse
DP-600 often frames ingestion questions within a governance context.
Common Exam Scenarios
You may be asked to:
- Choose between ingesting data or accessing it directly
- Identify when shortcuts are preferable to ingestion
- Select the right tool for a specific ingestion pattern
- Balance data freshness vs. performance
- Reduce duplication across workspaces
Best Practices to Remember
- Ingest when performance and modeling flexibility are required
- Access when freshness, cost, or duplication is a concern
- Centralize ingestion logic for reuse
- Prefer Fabric-native patterns over external tools
- Let business requirements drive architectural decisions
Key Takeaway
For the DP-600 exam, “Ingest or access data as needed” is about making intentional, informed choices. Microsoft Fabric provides multiple ways to bring data into analytics solutions, and the correct approach depends on scale, freshness, reuse, governance, and cost. Understanding why one method is better than another is far more important than memorizing features.
Practice Questions:
Here are 10 questions to test your knowledge and help solidify your learning. As you review these and other questions in your preparation, make sure to:
- Identify and understand why an option is correct (or incorrect), not just which one
- Watch for keywords in exam questions and understand the scenario each one signals (for example, low code/no code, large dataset, high-volume data, reuse, complex transformations)
- Expect scenario-based questions rather than direct definitions
Also, keep in mind that:
- DP-600 questions often include multiple valid options, but only one best aligns with the scenario's constraints. Always identify and consider factors such as:
- Data volume
- Freshness requirements
- Reuse and duplication concerns
- Transformation complexity
1. What is the primary difference between ingesting data and accessing data in Microsoft Fabric?
A. Ingested data cannot be secured
B. Accessed data is always slower
C. Ingesting copies data into OneLake, while accessing queries data in place
D. Accessed data requires a gateway
Correct Answer: C
Explanation:
Ingestion physically copies data into Fabric-managed storage (OneLake), while access-based approaches query or reference data where it already exists.
2. Which option is BEST when the goal is to avoid duplicating large datasets across multiple workspaces?
A. Import mode
B. Dataflows Gen2
C. OneLake shortcuts
D. Notebooks
Correct Answer: C
Explanation:
OneLake shortcuts allow data to be referenced without copying it, making them ideal for reuse and cost control.
3. A team needs reusable, low-code ingestion logic with scheduled refresh. Which Fabric feature should they use?
A. Spark notebooks
B. Data pipelines
C. Dataflows Gen2
D. DirectQuery
Correct Answer: C
Explanation:
Dataflows Gen2 provide Power Query–based ingestion with refresh scheduling and reuse across Fabric items.
4. Which ingestion method is MOST appropriate for complex transformations requiring custom logic?
A. Dataflows Gen2
B. Copy activity in pipelines
C. OneLake shortcuts
D. Spark notebooks
Correct Answer: D
Explanation:
Spark notebooks offer full control over transformation logic and are suited for complex, large-scale processing.
5. When should DirectQuery be preferred over Import mode?
A. When the dataset is small
B. When data freshness is critical
C. When transformations are complex
D. When performance must be maximized
Correct Answer: B
Explanation:
DirectQuery is preferred when near-real-time access to data is required, even though performance depends on the source system.
6. Which Fabric component is BEST suited for orchestrating high-volume data ingestion with dependencies and retries?
A. Dataflows Gen2
B. Data pipelines
C. Semantic models
D. Power BI Desktop
Correct Answer: B
Explanation:
Data pipelines are designed for orchestration, handling large volumes of data, scheduling, and dependency management.
7. A dataset is queried infrequently but must support advanced modeling features. Which approach is most appropriate?
A. DirectQuery
B. Access via shortcut
C. Import into OneLake
D. Eventstream ingestion
Correct Answer: C
Explanation:
Import mode supports full modeling capabilities and high query performance, making it suitable even for infrequently accessed data.
8. Which scenario best fits the use of real-time ingestion methods such as Eventstreams or KQL databases?
A. Monthly financial reporting
B. Static reference data
C. IoT telemetry and operational monitoring
D. Slowly changing dimensions
Correct Answer: C
Explanation:
Real-time ingestion is designed for continuous, event-driven data such as IoT telemetry and operational metrics.
9. Why might ingesting data be preferred over accessing it directly?
A. It always reduces storage costs
B. It eliminates the need for security
C. It improves performance and modeling flexibility
D. It avoids data refresh
Correct Answer: C
Explanation:
Ingesting data into OneLake enables faster query performance and full support for modeling features.
10. Which factor is MOST important when deciding between ingesting data and accessing it?
A. The color of the dashboard
B. The number of reports
C. Business requirements such as freshness, scale, and governance
D. The Fabric region
Correct Answer: C
Explanation:
The decision to ingest or access data should be driven by business needs, including performance, freshness, cost, and governance—not technical convenience alone.
