This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub.
This topic falls under these sections:
Describe an analytics workload (25–30%)
--> Describe common elements of large-scale analytics
--> Describe considerations for data ingestion and processing
Note that each section includes 10 practice questions (with answers and explanations) to help you solidify your knowledge of the material. There are also 2 practice tests with 60 questions each, available on the hub below the exam topics section.
In modern data platforms, data ingestion and processing are critical steps that determine how raw data becomes meaningful insights. For the DP-900 exam, you should understand how data enters a system, how it is transformed, and the key design considerations involved.
What Is Data Ingestion?
Data ingestion is the process of collecting and importing data from various sources into a storage or analytics system.
Common Data Sources
- Databases (relational and NoSQL)
- Files (CSV, JSON, logs)
- Streaming data (IoT devices, sensors)
- Applications and APIs
Types of Data Ingestion
1. Batch Ingestion
- Data is collected and processed at scheduled intervals
- Suitable for large volumes of data
- Higher latency (not real-time)
✔ Example:
- Daily sales data uploads
✔ Common Azure service:
- Azure Data Factory
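The batch pattern can be sketched in plain Python. This is not Azure Data Factory itself, just an illustration of the "whole file lands at once on a schedule" idea; the sample sales rows are invented:

```python
import csv
import io

# A day's worth of sales data arriving as a single file (invented sample rows).
daily_upload = io.StringIO("order_id,amount\n1001,25.50\n1002,9.99\n1003,42.00\n")

def ingest_batch(source):
    """Load the entire file in one pass -- the whole batch is ingested together."""
    return list(csv.DictReader(source))

rows = ingest_batch(daily_upload)
print(len(rows))  # 3 -- all of yesterday's records arrive at once
```

The key property is latency: none of these rows were visible to the analytics system until the scheduled upload ran.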
2. Stream (Real-Time) Ingestion
- Data is ingested continuously as it is generated
- Low latency (near real-time processing)
✔ Example:
- IoT sensor data
- Live website activity
✔ Common Azure services:
- Azure Event Hubs
- Azure Stream Analytics
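By contrast, streaming ingestion handles each event as it arrives. A minimal sketch (a Python generator standing in for an event source like Event Hubs; the sensor readings are invented):

```python
# Simulated IoT sensor readings arriving one event at a time (invented values).
def sensor_events():
    for reading in [21.5, 22.0, 21.8]:
        yield {"device": "sensor-1", "temp_c": reading}

ingested = []
for event in sensor_events():
    # Each event is processed the moment it arrives,
    # rather than waiting for a complete file or batch window.
    ingested.append(event)

print(len(ingested))  # 3
```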
What Is Data Processing?
Data processing involves transforming raw data into a usable format for analysis.
Typical Processing Tasks
- Cleaning data (removing errors, duplicates)
- Transforming formats (e.g., JSON → tabular)
- Aggregating data (summaries, totals)
- Enriching data (adding additional context)
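Several of these tasks can be shown in one small sketch: parsing JSON into tabular records, dropping duplicates and bad values, then aggregating. The raw records are invented for illustration:

```python
import json

# Raw records as JSON strings, including a duplicate and a bad value (invented).
raw = [
    '{"id": 1, "amount": "10.0"}',
    '{"id": 1, "amount": "10.0"}',   # duplicate
    '{"id": 2, "amount": "oops"}',   # invalid amount
    '{"id": 3, "amount": "5.5"}',
]

seen, cleaned = set(), []
for line in raw:
    rec = json.loads(line)           # transform: JSON -> dict (tabular-ish)
    try:
        rec["amount"] = float(rec["amount"])
    except ValueError:
        continue                     # clean: discard rows with bad data
    if rec["id"] in seen:
        continue                     # clean: discard duplicates
    seen.add(rec["id"])
    cleaned.append(rec)

total = sum(r["amount"] for r in cleaned)  # aggregate: summary total
print(total)  # 15.5
```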
Types of Data Processing
1. Batch Processing
- Processes large datasets at scheduled intervals
- Efficient for historical analysis
✔ Example:
- Monthly financial reporting
✔ Common Azure service:
- Azure Synapse Analytics
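A monthly reporting job is essentially a group-and-aggregate over a historical dataset. A toy version of that batch computation (invented transactions; a real job in Synapse would express this in SQL or Spark):

```python
from collections import defaultdict

# Historical transactions keyed by month (invented sample data).
transactions = [
    ("2024-01", 100.0), ("2024-01", 50.0),
    ("2024-02", 75.0),
]

# Batch job: scan the whole dataset at once and produce monthly totals.
totals = defaultdict(float)
for month, amount in transactions:
    totals[month] += amount

print(dict(totals))  # {'2024-01': 150.0, '2024-02': 75.0}
```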
2. Stream Processing
- Processes data in real time as it arrives
- Enables immediate insights and actions
✔ Example:
- Fraud detection
- Real-time dashboards
✔ Common Azure service:
- Azure Stream Analytics
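The fraud-detection example boils down to evaluating a rule against each event the instant it arrives. A minimal sketch, where the threshold rule and transaction amounts are invented for illustration:

```python
# Simulated card transactions arriving as a stream (invented amounts).
stream = [12.0, 7.5, 950.0, 15.0]

THRESHOLD = 500.0  # assumed screening rule for this sketch
alerts = []
for amount in stream:
    # Each event is checked immediately -- an alert can fire
    # seconds after the suspicious transaction, not hours later.
    if amount > THRESHOLD:
        alerts.append(amount)

print(alerts)  # [950.0]
```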
Key Considerations for Data Ingestion and Processing
1. Latency Requirements
- Batch → Higher latency (minutes/hours)
- Streaming → Low latency (seconds)
✔ Choose based on how quickly insights are needed.
2. Data Volume and Velocity
- Large datasets require scalable solutions
- High-velocity data requires streaming platforms
✔ Many Azure data services can scale elastically to match demand.
3. Data Variety
- Structured, semi-structured, and unstructured data
- Requires flexible processing tools
4. Data Quality
- Ensure accuracy and consistency
- Clean and validate data during processing
5. Scalability
- Systems must handle increasing data sizes
- Cloud platforms provide elastic scaling
6. Cost Optimization
- Batch processing is generally more cost-efficient
- Streaming may cost more due to continuous processing
7. Reliability and Fault Tolerance
- Ensure data is not lost during ingestion
- Use checkpointing and retry mechanisms
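The checkpoint-and-retry idea can be sketched in a few lines: the checkpoint only advances after an event is processed successfully, so a transient failure means the same event is retried rather than lost. The contrived failure is invented to show the mechanism; real services (e.g., Event Hubs consumers) implement this far more robustly:

```python
events = ["a", "b", "c"]
checkpoint = 0       # index of the next unprocessed event
processed = []

attempts = {"b": 0}  # contrived: make "b" fail once to demonstrate the retry

def handle(event):
    if event == "b" and attempts["b"] == 0:
        attempts["b"] += 1
        raise RuntimeError("transient failure")
    return event.upper()

while checkpoint < len(events):
    try:
        processed.append(handle(events[checkpoint]))
        checkpoint += 1   # advance the checkpoint only after success
    except RuntimeError:
        pass              # retry the same event from the checkpoint

print(processed)  # ['A', 'B', 'C'] -- nothing lost despite the failure
```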
Common Architecture Pattern
A typical analytics pipeline:
- Ingestion
- Batch: Azure Data Factory
- Stream: Azure Event Hubs
- Storage
- Data lake or storage account
- Processing
- Batch: Azure Synapse Analytics
- Stream: Azure Stream Analytics
- Visualization
- Reporting tools (e.g., Power BI)
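The four stages above can be chained in a toy end-to-end sketch, with plain Python functions standing in for the Azure services (all data and function names invented for illustration):

```python
def ingest():              # stand-in for Data Factory / Event Hubs
    return [{"region": "east", "sales": 120}, {"region": "west", "sales": 80}]

def store(records):        # stand-in for a data lake / storage account
    return list(records)   # pretend this persists to durable storage

def process(records):      # stand-in for Synapse / Stream Analytics
    return sum(r["sales"] for r in records)

def visualize(total):      # stand-in for a Power BI report
    return f"Total sales: {total}"

print(visualize(process(store(ingest()))))  # Total sales: 200
```

The point of the pattern is the separation: each stage can be scaled, swapped, or re-run independently of the others.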
Batch vs Stream — Quick Comparison
| Feature | Batch Processing | Stream Processing |
|---|---|---|
| Data Flow | Periodic | Continuous |
| Latency | High | Low |
| Use Case | Historical analysis | Real-time insights |
| Cost | Lower | Higher |
Why This Matters for DP-900
On the exam, you may be asked to:
- Distinguish between batch and stream processing
- Identify appropriate ingestion methods
- Choose Azure services based on scenarios
- Understand trade-offs (latency, cost, scalability)
Summary — Exam-Relevant Takeaways
✔ Data ingestion = bringing data into the system
✔ Data processing = transforming data for analysis
✔ Two main patterns:
- Batch → periodic, high latency
- Streaming → real-time, low latency
✔ Key considerations:
- Latency
- Volume and velocity
- Data quality
- Scalability
- Cost
✔ Azure services to know:
- Azure Data Factory (batch ingestion)
- Azure Event Hubs (stream ingestion)
- Azure Stream Analytics (real-time processing)
- Azure Synapse Analytics (batch processing)
Go to the Practice Exam Questions for this topic.
Go to the DP-900 Exam Prep Hub main page.
