Category: azure

Describe the difference between Batch and Streaming data (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe an analytics workload (25–30%)
--> Describe considerations for real-time data analytics
--> Describe the difference between Batch and Streaming data


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Understanding the difference between batch data and streaming data is fundamental for designing modern analytics solutions. These two approaches define how data is ingested, processed, and analyzed.


What Is Batch Data?

Batch data refers to data that is:

  • Collected over a period of time
  • Processed in large chunks (batches)
  • Handled at scheduled intervals

Key Characteristics of Batch Data

  • High latency (minutes, hours, or days)
  • Processes large volumes at once
  • Typically scheduled (e.g., nightly jobs)
  • Efficient and cost-effective

Common Use Cases

  • Daily sales reports
  • Monthly financial summaries
  • Historical data analysis
  • Data warehousing workloads

Azure Services for Batch Processing

  • Azure Data Factory → batch ingestion and orchestration
  • Azure Synapse Analytics → batch processing and analytics
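
To make the batch pattern concrete, here is a minimal, illustrative sketch in Python. The column names and sample data are hypothetical; in practice a service such as Azure Data Factory would trigger the job on a schedule rather than a hand-rolled script.

```python
# A minimal sketch of a nightly batch job over a day's worth of sales data.
import pandas as pd

def run_nightly_batch(sales: pd.DataFrame) -> pd.DataFrame:
    # Aggregate the entire batch in one pass.
    return sales.groupby("product_id")["amount"].sum().reset_index()

# In practice the input would be a scheduled export (e.g., loaded with
# pd.read_csv); a tiny inline sample keeps the sketch self-contained.
sample = pd.DataFrame({"product_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]})
print(run_nightly_batch(sample))
```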

What Is Streaming Data?

Streaming data refers to data that is:

  • Generated continuously
  • Processed in real time (or near real time)
  • Handled as individual events or in micro-batches

Key Characteristics of Streaming Data

  • Low latency (seconds or milliseconds)
  • Continuous data flow
  • Enables real-time insights
  • Often requires more complex processing

Common Use Cases

  • IoT sensor monitoring
  • Fraud detection
  • Live dashboards
  • Website activity tracking

Azure Services for Streaming

  • Azure Event Hubs → event ingestion
  • Azure Stream Analytics → real-time processing
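
As a sketch of what event ingestion looks like in code, the snippet below uses the azure-eventhub Python SDK; the connection string and hub name are placeholders you would supply, and the payload fields are invented for illustration.

```python
# A minimal sketch of sending an event to Azure Event Hubs;
# the connection string and hub name are placeholders.
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",
    eventhub_name="<EVENT_HUB_NAME>",
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"sensorId": 42, "temperature": 21.7}'))
    producer.send_batch(batch)  # events land within seconds, not on a schedule
```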

Batch vs Streaming — Key Differences

Feature     | Batch Processing     | Streaming Processing
Data Flow   | Periodic             | Continuous
Latency     | High                 | Low
Data Size   | Large chunks         | Small events
Complexity  | Simpler              | More complex
Cost        | Lower                | Higher
Use Case    | Historical analysis  | Real-time insights

When to Use Batch Processing

Choose batch when:

  • Real-time data is not required
  • You are working with large historical datasets
  • Cost efficiency is important
  • Processing can occur on a schedule

When to Use Streaming Processing

Choose streaming when:

  • You need real-time or near real-time insights
  • Data is generated continuously
  • Immediate action is required

Hybrid Approaches (Lambda / Modern Architectures)

Many modern systems combine both, a pattern often called the Lambda architecture:

  • Batch layer → historical analysis
  • Streaming layer → real-time insights

✔ Example:

  • Real-time dashboard + nightly aggregated reports

Why This Matters for DP-900

On the exam, you may be asked to:

  • Distinguish between batch and streaming scenarios
  • Choose the appropriate processing method
  • Identify Azure services for each approach
  • Understand trade-offs (latency, cost, complexity)

Summary — Exam-Relevant Takeaways

Batch processing

  • Processes data in chunks
  • Higher latency
  • Lower cost
  • Best for historical analysis

Streaming processing

  • Processes data continuously
  • Low latency
  • Enables real-time insights
  • More complex

✔ Azure services:

  • Batch → Azure Data Factory, Azure Synapse Analytics
  • Streaming → Azure Event Hubs, Azure Stream Analytics

✔ Exam tip:
👉 Real-time requirement → Streaming
👉 Scheduled / historical → Batch


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe the difference between Batch and Streaming data (DP-900 Exam Prep)

Practice Questions


Question 1

What is the primary characteristic of batch data processing?

A. Continuous data flow
B. Real-time processing
C. Processing data in scheduled chunks
D. Immediate event handling

Answer: C

Explanation:
Batch processing handles data in groups at scheduled intervals, not continuously.


Question 2

Which type of processing is BEST suited for real-time analytics?

A. Batch processing
B. Stream processing
C. Periodic processing
D. Manual processing

Answer: B

Explanation:
Stream processing enables real-time or near real-time insights.


Question 3

Which Azure service is commonly used for streaming data ingestion?

A. Azure Data Factory
B. Azure Event Hubs
C. Azure Synapse Analytics
D. Azure SQL Database

Answer: B

Explanation:
Azure Event Hubs is designed for high-throughput, real-time data ingestion.


Question 4

Which scenario is BEST suited for batch processing?

A. Monitoring live stock prices
B. Detecting fraud in real time
C. Generating a monthly financial report
D. Tracking website clicks instantly

Answer: C

Explanation:
Batch processing is ideal for scheduled, periodic workloads like reports.


Question 5

What is the typical latency for streaming data processing?

A. Hours
B. Days
C. Seconds or milliseconds
D. Weeks

Answer: C

Explanation:
Streaming processing provides low-latency, near real-time results.


Question 6

Which Azure service is used to process streaming data in real time?

A. Azure Blob Storage
B. Azure Stream Analytics
C. Azure Files
D. Azure Virtual Machines

Answer: B

Explanation:
Azure Stream Analytics processes streaming data in real time.


Question 7

Which statement about batch processing is TRUE?

A. It processes data continuously
B. It always requires real-time data sources
C. It is typically more cost-effective than streaming
D. It has lower latency than streaming

Answer: C

Explanation:
Batch processing is generally more cost-efficient than continuous streaming.


Question 8

Which scenario requires streaming processing?

A. Archiving old data
B. Processing annual tax records
C. Monitoring IoT sensor data in real time
D. Generating quarterly reports

Answer: C

Explanation:
Streaming is needed for continuous, real-time data flows like IoT.


Question 9

What is a key difference between batch and streaming processing?

A. Batch uses structured data, streaming does not
B. Streaming has higher latency than batch
C. Batch processes data in chunks, streaming processes data continuously
D. Streaming is always cheaper than batch

Answer: C

Explanation:
Batch = periodic chunks, Streaming = continuous flow.


Question 10

Which approach would you choose if immediate action is required based on incoming data?

A. Batch processing
B. Stream processing
C. Scheduled processing
D. Offline processing

Answer: B

Explanation:
Streaming is required when real-time decisions are needed.


✅ Quick Exam Takeaways

Batch processing

  • Scheduled
  • High latency
  • Cost-effective
  • Best for historical analysis

Streaming processing

  • Continuous
  • Low latency
  • Real-time insights
  • More complex

✔ Azure services:

  • Batch → Azure Data Factory, Azure Synapse Analytics
  • Streaming → Azure Event Hubs, Azure Stream Analytics

✔ Exam tip:
👉 Real-time = Streaming
👉 Scheduled/historical = Batch


Go to the DP-900 Exam Prep Hub main page.

Describe options for analytical data stores (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe an analytics workload (25–30%)
--> Describe common elements of large-scale analytics
--> Describe options for analytical data stores


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Analytical data stores are designed to support reporting, business intelligence, and large-scale data analysis. For the DP-900 exam, you should understand the different types of analytical stores, their characteristics, and when to use each.


What Is an Analytical Data Store?

An analytical data store is optimized for:

  • Querying large volumes of data
  • Aggregations and reporting
  • Historical analysis

✔ Unlike transactional systems, analytical stores focus on read-heavy workloads rather than frequent updates.


Key Characteristics

  • Optimized for complex queries and aggregations
  • Stores historical data
  • Handles large datasets (TBs to PBs)
  • Typically uses denormalized schemas
  • Designed for high-performance reads

Main Types of Analytical Data Stores


1. Data Warehouse

Definition

A structured repository designed for relational analytical queries.

Key Features

  • Uses structured data
  • Schema-based (often star or snowflake schema)
  • Supports SQL queries

Azure Example

Azure Synapse Analytics

Use Cases

  • Business intelligence reporting
  • Financial analysis
  • Enterprise dashboards

Best for: Structured data and SQL-based analytics


2. Data Lake

Definition

A storage repository for raw data in its native format.

Key Features

  • Supports structured, semi-structured, and unstructured data
  • Schema-on-read (schema applied when querying)
  • Highly scalable and cost-effective

Azure Example

Azure Data Lake Storage

Use Cases

  • Big data analytics
  • Machine learning
  • Storing raw ingestion data

Best for: Flexible, large-scale data storage
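
Schema-on-read is easy to picture in code: the raw file lands in the lake as-is, and structure is imposed only when it is queried. Here is a minimal sketch with pandas, where the field names stand in for whatever the raw events actually contain:

```python
# Schema-on-read sketch: raw JSON-lines data, structured only at query time.
import io
import pandas as pd

# Stand-in for a raw file landed in the lake (hypothetical fields).
raw = io.StringIO(
    '{"device_id": "a", "value": 21.5}\n'
    '{"device_id": "b", "value": 19.0}\n'
)

events = pd.read_json(raw, lines=True)  # schema inferred when the data is read
print(events.groupby("device_id")["value"].mean())
```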


3. Data Lakehouse (Conceptual)

Definition

A hybrid approach combining features of data lakes and data warehouses.

Key Features

  • Stores raw data like a data lake
  • Supports structured queries like a warehouse
  • Often uses open formats (e.g., Parquet, Delta)

Azure Context

  • Often implemented using:
    • Azure Data Lake Storage
    • Azure Synapse Analytics

Best for: Unified analytics platform


4. Analytical Databases / Big Data Processing Systems

Definition

Systems designed for distributed processing of large datasets.

Azure Example

Azure Synapse Analytics

Key Features

  • Parallel processing
  • Handles massive datasets
  • Supports batch and interactive queries

Best for: Large-scale analytics workloads


Comparison of Analytical Data Stores

Feature           | Data Warehouse  | Data Lake      | Lakehouse
Data Type         | Structured      | All types      | All types
Schema            | Schema-on-write | Schema-on-read | Hybrid
Cost              | Higher          | Lower          | Moderate
Flexibility       | Low             | High           | High
Query Performance | High            | Variable       | High

Key Design Considerations


1. Data Structure

  • Structured → Data warehouse
  • Mixed or raw → Data lake

2. Query Requirements

  • Complex SQL queries → Data warehouse
  • Exploratory analytics → Data lake

3. Cost

  • Data lakes are generally more cost-effective
  • Warehouses provide optimized performance at higher cost

4. Scalability

  • All Azure analytical stores scale
  • Data lakes excel in massive data storage

5. Performance Needs

  • Warehouses → optimized for speed
  • Lakes → optimized for storage and flexibility

Typical Analytics Architecture

  1. Data Ingestion
    • Batch or streaming
  2. Storage
    • Data lake or data warehouse
  3. Processing
    • Transformations and aggregations
  4. Visualization
    • BI tools (e.g., Power BI)

Why This Matters for DP-900

On the exam, you may be asked to:

  • Identify the correct analytical store for a scenario
  • Compare data lakes vs data warehouses
  • Understand schema-on-read vs schema-on-write
  • Recognize Azure services used for analytics

Summary — Exam-Relevant Takeaways

✔ Analytical data stores are used for:

  • Reporting
  • Analytics
  • Historical data analysis

✔ Main types:

  • Data Warehouse → structured, high-performance queries
  • Data Lake → raw, flexible storage
  • Lakehouse → hybrid approach

✔ Key concepts:

  • Schema-on-write (warehouse)
  • Schema-on-read (lake)

✔ Azure services to know:

  • Azure Synapse Analytics → data warehouse & analytics
  • Azure Data Lake Storage → scalable data lake

✔ Exam tip:
👉 Structured + SQL analytics → Data Warehouse
👉 Raw + flexible + big data → Data Lake


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe options for analytical data stores (DP-900 Exam Prep)

Practice Questions


Question 1

What is the primary purpose of an analytical data store?

A. To process high-volume transactions
B. To store temporary application data
C. To support reporting and data analysis
D. To manage user authentication

Answer: C

Explanation:
Analytical data stores are optimized for reporting, querying, and analysis, not transactions.


Question 2

Which type of data store is BEST suited for structured data and complex SQL queries?

A. Data lake
B. Data warehouse
C. File storage
D. Key-value store

Answer: B

Explanation:
Data warehouses are designed for structured data and high-performance SQL queries.


Question 3

Which Azure service is commonly used as a data warehouse?

A. Azure Data Lake Storage
B. Azure Synapse Analytics
C. Azure Files
D. Azure Table Storage

Answer: B

Explanation:
Azure Synapse Analytics provides data warehousing and large-scale analytics capabilities.


Question 4

What is a key characteristic of a data lake?

A. Requires predefined schema before loading data
B. Stores only structured data
C. Stores data in its raw format
D. Optimized for transactional workloads

Answer: C

Explanation:
Data lakes store raw data in native formats, supporting schema-on-read.


Question 5

Which concept describes applying schema when data is read rather than when it is written?

A. Schema-on-write
B. Schema-on-read
C. Data normalization
D. Data partitioning

Answer: B

Explanation:
Schema-on-read is used in data lakes, allowing flexible analysis.


Question 6

Which scenario is BEST suited for a data lake?

A. Financial reporting with strict schema
B. Running complex SQL joins on structured data
C. Storing raw IoT and log data for later analysis
D. Processing online transactions

Answer: C

Explanation:
Data lakes are ideal for large volumes of raw, diverse data.


Question 7

Which analytical data store typically uses schema-on-write?

A. Data lake
B. Data warehouse
C. Object storage
D. Key-value store

Answer: B

Explanation:
Data warehouses require a defined schema before data is loaded.


Question 8

Which of the following best describes a data lakehouse?

A. A transactional database system
B. A file storage system only
C. A hybrid of data lake and data warehouse
D. A key-value storage solution

Answer: C

Explanation:
A lakehouse combines flexibility of data lakes with performance of warehouses.


Question 9

Which factor is MOST important when choosing between a data lake and a data warehouse?

A. Screen resolution
B. Data structure and query requirements
C. Programming language
D. User interface design

Answer: B

Explanation:
The choice depends on data type (structured vs raw) and query needs.


Question 10

Which Azure service is BEST suited for storing large volumes of raw, unstructured data?

A. Azure SQL Database
B. Azure Data Lake Storage
C. Azure Synapse Analytics
D. Azure Table Storage

Answer: B

Explanation:
Azure Data Lake Storage is optimized for large-scale raw data storage.


✅ Quick Exam Takeaways

✔ Analytical data stores support:

  • Reporting
  • Business intelligence
  • Large-scale analytics

✔ Main types:

  • Data Warehouse → structured, SQL, high performance
  • Data Lake → raw, flexible, scalable
  • Lakehouse → hybrid approach

✔ Key concepts:

  • Schema-on-write → warehouse
  • Schema-on-read → lake

✔ Azure services:

  • Azure Synapse Analytics → data warehouse / analytics
  • Azure Data Lake Storage → data lake

✔ Exam tip:
👉 Structured + SQL → Data Warehouse
👉 Raw + flexible → Data Lake


Go to the DP-900 Exam Prep Hub main page.

Describe considerations for data ingestion and processing (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe an analytics workload (25–30%)
--> Describe common elements of large-scale analytics
--> Describe considerations for data ingestion and processing


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

In modern data platforms, data ingestion and processing are critical steps that determine how raw data becomes meaningful insights. For the DP-900 exam, you should understand how data enters a system, how it is transformed, and the key design considerations involved.


What Is Data Ingestion?

Data ingestion is the process of collecting and importing data from various sources into a storage or analytics system.

Common Data Sources

  • Databases (relational and NoSQL)
  • Files (CSV, JSON, logs)
  • Streaming data (IoT devices, sensors)
  • Applications and APIs

Types of Data Ingestion


1. Batch Ingestion

  • Data is collected and processed at scheduled intervals
  • Suitable for large volumes of data
  • Higher latency (not real-time)

✔ Example:

  • Daily sales data uploads

✔ Common Azure service:
Azure Data Factory


2. Stream (Real-Time) Ingestion

  • Data is ingested continuously as it is generated
  • Low latency (near real-time processing)

✔ Example:

  • IoT sensor data
  • Live website activity

✔ Common Azure services:

  • Azure Event Hubs
  • Azure Stream Analytics
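
On the receiving side, a consumer handles each event as it arrives. Below is a minimal sketch with the azure-eventhub Python SDK; the connection string and hub name are placeholders, and a checkpoint store would be added in production for fault tolerance.

```python
# A minimal sketch of continuous (streaming) ingestion with azure-eventhub;
# the connection string and hub name are placeholders.
from azure.eventhub import EventHubConsumerClient

def on_event(partition_context, event):
    # Called once per event as it arrives, rather than on a batch schedule.
    print(event.body_as_str())

client = EventHubConsumerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",
    consumer_group="$Default",
    eventhub_name="<EVENT_HUB_NAME>",
)
with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = from start
```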

What Is Data Processing?

Data processing involves transforming raw data into a usable format for analysis.

Typical Processing Tasks

  • Cleaning data (removing errors, duplicates)
  • Transforming formats (e.g., JSON → tabular)
  • Aggregating data (summaries, totals)
  • Enriching data (adding additional context)
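
These tasks map naturally onto dataframe operations. A minimal, self-contained sketch (the column names and values are hypothetical):

```python
# A minimal sketch of cleaning, transforming, and aggregating raw data.
import pandas as pd

raw = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "amount":   ["10.5", "10.5", "7.0", None],
})

clean = (
    raw.drop_duplicates()                                   # clean: remove duplicates
       .dropna(subset=["amount"])                           # clean: drop bad rows
       .assign(amount=lambda d: d["amount"].astype(float))  # transform: fix types
)
print(clean.groupby("order_id")["amount"].sum())            # aggregate: totals
```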

Types of Data Processing


1. Batch Processing

  • Processes large datasets at scheduled intervals
  • Efficient for historical analysis

✔ Example:

  • Monthly financial reporting

✔ Common Azure service:

  • Azure Synapse Analytics

2. Stream Processing

  • Processes data in real time as it arrives
  • Enables immediate insights and actions

✔ Example:

  • Fraud detection
  • Real-time dashboards

✔ Common Azure service:

  • Azure Stream Analytics
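
A common stream-processing operation is the tumbling window: fixed, non-overlapping time buckets that an Azure Stream Analytics job would express in its SQL-like query language. The framework-free Python sketch below shows the idea, with invented sample events.

```python
# A minimal, framework-free sketch of a tumbling-window count: each event is
# assigned to a fixed, non-overlapping time bucket and counted per key.
from collections import Counter

def tumbling_window_counts(events, window_seconds=60):
    # events: iterable of (timestamp_in_seconds, key), assumed in arrival order
    counts = Counter()
    for ts, key in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return counts

print(tumbling_window_counts([(3, "a"), (61, "a"), (65, "b")]))
# Counter({(0, 'a'): 1, (60, 'a'): 1, (60, 'b'): 1})
```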

Key Considerations for Data Ingestion and Processing


1. Latency Requirements

  • Batch → Higher latency (minutes/hours)
  • Streaming → Low latency (seconds)

✔ Choose based on how quickly insights are needed.


2. Data Volume and Velocity

  • Large datasets require scalable solutions
  • High-velocity data requires streaming platforms

✔ Azure services are designed to scale automatically.


3. Data Variety

  • Structured, semi-structured, and unstructured data
  • Requires flexible processing tools

4. Data Quality

  • Ensure accuracy and consistency
  • Clean and validate data during processing

5. Scalability

  • Systems must handle increasing data sizes
  • Cloud platforms provide elastic scaling

6. Cost Optimization

  • Batch processing is generally more cost-efficient
  • Streaming may cost more due to continuous processing

7. Reliability and Fault Tolerance

  • Ensure data is not lost during ingestion
  • Use checkpointing and retry mechanisms

Common Architecture Pattern

A typical analytics pipeline:

  1. Ingestion
    • Batch: Azure Data Factory
    • Stream: Azure Event Hubs
  2. Storage
    • Data lake or storage account
  3. Processing
    • Batch: Azure Synapse Analytics
    • Stream: Azure Stream Analytics
  4. Visualization
    • Reporting tools (e.g., Power BI)

Batch vs Stream — Quick Comparison

Feature   | Batch Processing    | Stream Processing
Data Flow | Periodic            | Continuous
Latency   | High                | Low
Use Case  | Historical analysis | Real-time insights
Cost      | Lower               | Higher

Why This Matters for DP-900

On the exam, you may be asked to:

  • Distinguish between batch and stream processing
  • Identify appropriate ingestion methods
  • Choose Azure services based on scenarios
  • Understand trade-offs (latency, cost, scalability)

Summary — Exam-Relevant Takeaways

Data ingestion = bringing data into the system
Data processing = transforming data for analysis

✔ Two main patterns:

  • Batch → periodic, high latency
  • Streaming → real-time, low latency

✔ Key considerations:

  • Latency
  • Volume and velocity
  • Data quality
  • Scalability
  • Cost

✔ Azure services to know:

  • Azure Data Factory (batch ingestion)
  • Azure Event Hubs (stream ingestion)
  • Azure Stream Analytics (real-time processing)
  • Azure Synapse Analytics (batch processing)

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe considerations for data ingestion and processing (DP-900 Exam Prep)

Practice Questions


Question 1

What is the primary purpose of data ingestion?

A. To visualize data
B. To store data permanently
C. To collect and import data into a system
D. To delete outdated data

Answer: C

Explanation:
Data ingestion is the process of bringing data into a storage or analytics system.


Question 2

Which type of ingestion processes data at scheduled intervals?

A. Stream ingestion
B. Batch ingestion
C. Real-time ingestion
D. Event-driven ingestion

Answer: B

Explanation:
Batch ingestion processes data periodically, not continuously.


Question 3

Which Azure service is commonly used for batch data ingestion?

A. Azure Event Hubs
B. Azure Data Factory
C. Azure Stream Analytics
D. Azure Virtual Machines

Answer: B

Explanation:
Azure Data Factory is designed for batch ETL/ELT workflows.


Question 4

Which scenario requires stream (real-time) ingestion?

A. Monthly sales reporting
B. Archiving old data
C. Monitoring live sensor data from IoT devices
D. Migrating historical records

Answer: C

Explanation:
Streaming ingestion is used for continuous, real-time data like IoT.


Question 5

What is the primary benefit of stream processing?

A. Lower cost
B. Simpler architecture
C. Real-time insights
D. Reduced storage requirements

Answer: C

Explanation:
Stream processing enables low-latency, real-time analysis.


Question 6

Which Azure service is used for real-time data ingestion at scale?

A. Azure Synapse Analytics
B. Azure Blob Storage
C. Azure Event Hubs
D. Azure Files

Answer: C

Explanation:
Azure Event Hubs is designed for high-throughput streaming ingestion.


Question 7

Which type of processing is BEST suited for historical data analysis?

A. Stream processing
B. Batch processing
C. Real-time processing
D. Event-driven processing

Answer: B

Explanation:
Batch processing is ideal for large, historical datasets.


Question 8

Which factor is MOST important when choosing between batch and stream processing?

A. File format
B. Latency requirements
C. Storage account type
D. Programming language

Answer: B

Explanation:
The key decision is how quickly the data needs to be processed.


Question 9

Which Azure service is used to process streaming data in real time?

A. Azure Data Factory
B. Azure Stream Analytics
C. Azure SQL Database
D. Azure Files

Answer: B

Explanation:
Azure Stream Analytics processes real-time streaming data.


Question 10

Which of the following is a key consideration when designing a data ingestion pipeline?

A. Screen resolution
B. Latency, scalability, and data volume
C. Programming language syntax
D. User interface design

Answer: B

Explanation:
Important considerations include latency, scalability, volume, and data quality.


✅ Quick Exam Takeaways

Data ingestion = bringing data into the system
Data processing = transforming data for analysis

✔ Two main approaches:

  • Batch → scheduled, high latency
  • Streaming → continuous, low latency

✔ Key Azure services:

  • Azure Data Factory → batch ingestion
  • Azure Event Hubs → streaming ingestion
  • Azure Stream Analytics → real-time processing
  • Azure Synapse Analytics → batch processing

✔ Key decision factor:
👉 Do you need real-time insights or not?


Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe Azure Cosmos DB APIs (DP-900 Exam Prep)

Practice Questions


Question 1

Which API in Azure Cosmos DB uses a SQL-like query language?

A. Gremlin API
B. Cassandra API
C. Core (SQL) API
D. Table API

Answer: C

Explanation:
The Core (SQL) API uses a SQL-like syntax to query JSON documents.


Question 2

Which Azure Cosmos DB API is BEST suited for applications currently using MongoDB?

A. Core (SQL) API
B. MongoDB API
C. Cassandra API
D. Table API

Answer: B

Explanation:
The MongoDB API provides compatibility with MongoDB drivers and queries.


Question 3

Which API should you choose for graph-based data and relationships?

A. Table API
B. Cassandra API
C. Gremlin API
D. MongoDB API

Answer: C

Explanation:
The Gremlin API is designed for graph data models and relationship analysis.


Question 4

Which API in Cosmos DB is most similar to Azure Table Storage?

A. MongoDB API
B. Cassandra API
C. Table API
D. Core (SQL) API

Answer: C

Explanation:
The Table API uses a key-value model similar to Azure Table Storage.


Question 5

Which statement about Azure Cosmos DB APIs is TRUE?

A. You can switch APIs after creating the account
B. Each API uses a different query language and data model
C. All APIs use T-SQL
D. APIs determine storage redundancy

Answer: B

Explanation:
Each API has its own data model and query language.


Question 6

Which API would you choose for a distributed system currently using Apache Cassandra?

A. Core (SQL) API
B. MongoDB API
C. Cassandra API
D. Gremlin API

Answer: C

Explanation:
The Cassandra API supports Cassandra Query Language (CQL) and workloads.


Question 7

Which API is the default and most commonly used in Azure Cosmos DB?

A. Table API
B. Gremlin API
C. Core (SQL) API
D. Cassandra API

Answer: C

Explanation:
The Core (SQL) API is the most commonly used and general-purpose API.


Question 8

Which scenario is BEST suited for the Table API?

A. Complex graph traversal
B. Large-scale relational queries
C. Simple key-value data storage
D. Document-based analytics

Answer: C

Explanation:
The Table API is ideal for simple, scalable key-value storage.


Question 9

What is a key consideration when choosing a Cosmos DB API?

A. The size of the storage account
B. The number of virtual machines
C. The application’s existing data model and query language
D. The type of Azure subscription

Answer: C

Explanation:
API selection depends on existing technologies and data models.


Question 10

Which statement best describes Azure Cosmos DB APIs?

A. Each API uses a different underlying database engine
B. APIs provide different ways to interact with the same service
C. APIs are only used for relational data
D. APIs determine the pricing tier only

Answer: B

Explanation:
All APIs use the same Cosmos DB service but offer different interfaces and models.


✅ Quick Exam Takeaways

✔ Cosmos DB APIs allow different ways to interact with the same service

✔ APIs:

  • Core (SQL) → SQL-like queries (most common)
  • MongoDB → MongoDB compatibility
  • Cassandra → Distributed systems (CQL)
  • Table → Key-value storage
  • Gremlin → Graph data

✔ Key concepts:

  • API choice depends on data model and existing system
  • API selection is permanent after creation

✔ Exam tip:
👉 Match data model → API type


Go to the DP-900 Exam Prep Hub main page.

Describe Azure Cosmos DB APIs (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe considerations for working with non-relational data on Azure (15–20%)
--> Describe capabilities and features of Azure Cosmos DB
--> Describe Azure Cosmos DB APIs


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Azure Cosmos DB supports multiple APIs that allow developers to interact with the database using different data models and familiar query languages.

For the DP-900 exam, you should understand what these APIs are, how they differ, and when to use each one.


What Are Azure Cosmos DB APIs?

APIs in Azure Cosmos DB define:

  • How data is structured
  • How it is queried
  • Which tools and SDKs are used

✔ Each API provides a different way to interact with the same underlying Cosmos DB service.


Why Multiple APIs?

Azure Cosmos DB supports multiple APIs to:

  • Allow developers to use familiar tools
  • Enable easy migration from existing systems
  • Support different types of applications and data models

💡 Key idea:
👉 Choose the API based on your application’s existing technology or data model


Core Azure Cosmos DB APIs


1. Core (SQL) API

Also known as the SQL API.

Key Features

  • Uses a SQL-like query language
  • Stores data as JSON documents
  • Most commonly used API

Use Cases

  • New application development
  • General-purpose NoSQL workloads

Best for: Developers familiar with SQL who want flexibility
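
Here is a minimal sketch with the azure-cosmos Python SDK; the endpoint, key, and database/container names are placeholders, and the database and container are assumed to already exist:

```python
# A minimal sketch of the Core (SQL) API via the azure-cosmos Python SDK;
# endpoint, key, and names are placeholders for an existing account.
from azure.cosmos import CosmosClient

client = CosmosClient("<ACCOUNT_ENDPOINT>", credential="<ACCOUNT_KEY>")
container = client.get_database_client("retail").get_container_client("products")

# Data is stored as JSON documents...
container.upsert_item({"id": "1", "category": "books", "name": "DP-900 Guide"})

# ...and queried with SQL-like, parameterized syntax:
for item in container.query_items(
    query="SELECT c.name FROM c WHERE c.category = @cat",
    parameters=[{"name": "@cat", "value": "books"}],
    enable_cross_partition_query=True,
):
    print(item)
```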


2. MongoDB API

Key Features

  • Compatible with MongoDB drivers and tools
  • Uses MongoDB query syntax

Use Cases

  • Migrating existing MongoDB applications
  • Applications already using MongoDB

Best for: MongoDB workloads moving to Azure


3. Cassandra API

Key Features

  • Compatible with Apache Cassandra
  • Supports Cassandra Query Language (CQL)

Use Cases

  • Large-scale distributed workloads
  • Applications using Cassandra

Best for: Cassandra-based systems needing cloud scalability


4. Table API

Key Features

  • Similar to Azure Table Storage
  • Key-value data model
  • Uses OData-based queries

Use Cases

  • Simple key-value workloads
  • Applications already using Table Storage

Best for: Lightweight, scalable key-value scenarios


5. Gremlin API

Key Features

  • Supports graph data models
  • Uses Gremlin query language

Use Cases

  • Graph-based applications
  • Relationship-heavy data

Best for: Social networks, recommendation engines, network analysis


Key Differences Between APIs

API        | Data Model      | Query Language | Best For
Core (SQL) | Document (JSON) | SQL-like       | General-purpose apps
MongoDB    | Document        | MongoDB query  | MongoDB migration
Cassandra  | Wide-column     | CQL            | Distributed systems
Table      | Key-value       | OData          | Simple scalable storage
Gremlin    | Graph           | Gremlin        | Relationship-based data

Important Concepts for DP-900


1. Same Service, Different Interfaces

All APIs run on Azure Cosmos DB, but:

  • Each API has its own endpoint
  • Each uses different query syntax
  • Each supports different SDKs

2. API Choice Is Permanent

  • You choose the API when creating a Cosmos DB account
  • You cannot switch APIs later

3. Performance and Features Are Shared

  • Global distribution
  • Low latency
  • High availability
  • Scalability

✔ These benefits apply regardless of API choice.


When to Choose Each API

  • Core (SQL) API → Default choice for most applications
  • MongoDB API → Existing MongoDB apps
  • Cassandra API → Distributed, large-scale systems
  • Table API → Simple key-value workloads
  • Gremlin API → Graph relationships

Why This Matters for DP-900

On the exam, you may be asked to:

  • Identify the correct API for a scenario
  • Match APIs to data models
  • Understand why multiple APIs exist
  • Recognize migration scenarios

Summary — Exam-Relevant Takeaways

✔ Azure Cosmos DB supports multiple APIs:

  • Core (SQL) API
  • MongoDB API
  • Cassandra API
  • Table API
  • Gremlin API

✔ Each API:

  • Uses a different data model
  • Has its own query language

✔ Key concept:
👉 Choose the API based on your application’s needs or existing system

✔ Important:

  • API choice is fixed at creation
  • All APIs benefit from Cosmos DB features (scalability, global distribution)

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Identify use cases for Azure Cosmos DB (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe considerations for working with non-relational data on Azure (15–20%)
--> Describe capabilities and features of Azure Cosmos DB
--> Identify use cases for Azure Cosmos DB


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Azure Cosmos DB is a fully managed, globally distributed database service designed for modern applications that require low latency, massive scalability, and flexible data models.

For the DP-900 exam, you should understand when and why to use Azure Cosmos DB, especially compared to other Azure storage and database services.


What Is Azure Cosmos DB?

Azure Cosmos DB is a NoSQL, multi-model database service that supports:

  • Global distribution across multiple regions
  • Low-latency reads and writes
  • Automatic scaling
  • Multiple APIs (Core SQL, MongoDB, Cassandra, Table, Gremlin)

✔ It is designed for high-performance, internet-scale applications.


Key Characteristics That Drive Use Cases

Understanding Cosmos DB use cases starts with its capabilities:

1. Global Distribution

  • Replicate data across multiple Azure regions
  • Users access data from the closest region

✔ Enables global applications with low latency
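
In the Python SDK this surfaces as a read-region preference. A minimal sketch, where the endpoint, key, and region names are placeholders and preferred_locations assumes the account is actually replicated to those regions:

```python
# A minimal sketch: prefer reads from the regions closest to the user,
# assuming the account is replicated there (placeholders throughout).
from azure.cosmos import CosmosClient

client = CosmosClient(
    "<ACCOUNT_ENDPOINT>",
    credential="<ACCOUNT_KEY>",
    preferred_locations=["West Europe", "East US"],  # nearest regions first
)
```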


2. Low Latency

  • Single-digit millisecond response times
  • Ideal for real-time applications

3. Massive Scalability

  • Scales throughput and storage independently
  • Handles millions of requests per second

4. Flexible Schema

  • Schema-less (JSON-based data model)
  • Supports evolving application requirements

5. Multiple APIs

  • Supports different data models:
    • SQL (Core API)
    • MongoDB
    • Cassandra
    • Table
    • Gremlin (graph)

✔ Allows developers to use familiar tools and frameworks


Common Use Cases for Azure Cosmos DB


1. Global Web and Mobile Applications

Scenario

Applications with users distributed worldwide.

Why Cosmos DB?

  • Global distribution
  • Low latency access
  • High availability

✔ Example:

  • Social media platforms
  • E-commerce applications

2. Real-Time Personalization

Scenario

Applications that tailor content to users instantly.

Why Cosmos DB?

  • Fast read/write performance
  • Flexible schema

✔ Example:

  • Product recommendations
  • Personalized dashboards

3. IoT and Telemetry Data

Scenario

Large volumes of streaming data from devices.

Why Cosmos DB?

  • High ingestion rates
  • Scalable storage
  • Schema flexibility

✔ Example:

  • Sensor data collection
  • Smart devices

4. Gaming Applications

Scenario

Online games requiring real-time interactions.

Why Cosmos DB?

  • Low latency
  • Global availability
  • High throughput

✔ Example:

  • Leaderboards
  • Player profiles
  • Game state storage

5. E-commerce Platforms

Scenario

High-traffic applications with variable workloads.

Why Cosmos DB?

  • Elastic scalability
  • Fast performance
  • Global distribution

✔ Example:

  • Shopping carts
  • Product catalogs

6. Content Management Systems

Scenario

Managing diverse and evolving content.

Why Cosmos DB?

  • Schema-less design
  • Flexible data models

✔ Example:

  • Blogs
  • Media platforms

7. Event-Driven and Microservices Architectures

Scenario

Modern distributed applications.

Why Cosmos DB?

  • Scales independently per service
  • Supports high-throughput operations

✔ Example:

  • Microservices storing independent datasets

When NOT to Use Azure Cosmos DB

Cosmos DB is not ideal when:

  • You need complex joins and relational queries
  • You require strict relational consistency across multiple tables
  • Your workload is small and cost-sensitive

✔ In these cases, relational databases like Azure SQL may be more appropriate.


Cosmos DB vs Other Azure Storage Options

Service       | Best For
Blob Storage  | Unstructured files (images, videos)
Azure Files   | File shares
Table Storage | Simple key-value storage
Cosmos DB     | Global, high-performance NoSQL apps

Why This Matters for DP-900

On the exam, you may be asked to:

  • Identify appropriate Cosmos DB use cases
  • Choose Cosmos DB for global, low-latency applications
  • Compare it with other Azure storage services
  • Recognize scenarios requiring scalability and flexibility

Summary — Exam-Relevant Takeaways

✔ Azure Cosmos DB = globally distributed NoSQL database

✔ Key strengths:

  • Low latency
  • Global distribution
  • Massive scalability
  • Flexible schema

✔ Common use cases:

  • Global apps
  • Real-time personalization
  • IoT and telemetry
  • Gaming
  • E-commerce

✔ Not suitable for:

  • Complex relational workloads
  • Heavy join operations

✔ Key decision factor:
👉 High scale + low latency + global users = Cosmos DB


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Identify use cases for Azure Cosmos DB (DP-900 Exam Prep)

Practice Questions


Question 1

Which scenario is BEST suited for Azure Cosmos DB?

A. Running complex SQL joins across multiple tables
B. Storing structured financial transactions with strict relational constraints
C. Supporting a globally distributed mobile application with low latency
D. Hosting a traditional on-premises file share

Answer: C

Explanation:
Cosmos DB is ideal for globally distributed applications requiring low latency.


Question 2

Which type of application benefits MOST from Cosmos DB’s global distribution capabilities?

A. Local desktop application
B. Single-region reporting system
C. Global e-commerce website
D. Batch processing system

Answer: C

Explanation:
Global applications benefit from multi-region replication and low latency.


Question 3

Which use case is BEST suited for Cosmos DB?

A. Data warehouse for historical reporting
B. IoT application collecting real-time sensor data
C. Relational database with complex joins
D. File storage for images

Answer: B

Explanation:
Cosmos DB is optimized for high-ingestion, real-time data scenarios like IoT.


Question 4

Why is Cosmos DB suitable for real-time personalization scenarios?

A. It enforces strict relational schemas
B. It supports high-latency operations
C. It provides low-latency read/write performance
D. It requires predefined schemas

Answer: C

Explanation:
Low latency enables instant updates and responses for personalization.


Question 5

Which application would MOST benefit from Cosmos DB?

A. Payroll system requiring strict ACID compliance across multiple tables
B. Static website hosting images
C. Gaming application storing player state globally
D. Spreadsheet-based reporting system

Answer: C

Explanation:
Gaming apps require low latency, high throughput, and global availability.


Question 6

Which scenario is NOT a good fit for Cosmos DB?

A. Global content management system
B. Real-time analytics dashboard
C. Complex relational reporting with joins
D. Social media application

Answer: C

Explanation:
Cosmos DB is not ideal for complex relational queries or joins.


Question 7

Which feature of Cosmos DB makes it ideal for microservices architectures?

A. Fixed schema design
B. Independent scalability for each service
C. Requirement for relational constraints
D. Limited throughput options

Answer: B

Explanation:
Each microservice can scale independently using Cosmos DB.


Question 8

Which use case involves storing flexible, evolving data structures?

A. Financial ledger system
B. Product catalog with changing attributes
C. Relational reporting system
D. Fixed-schema inventory system

Answer: B

Explanation:
Cosmos DB’s schema-less design supports evolving data models.


Question 9

Which scenario best demonstrates Cosmos DB’s high-throughput capabilities?

A. Processing monthly reports
B. Handling millions of real-time user requests
C. Archiving old documents
D. Storing backup files

Answer: B

Explanation:
Cosmos DB is designed for high-throughput, real-time workloads.


Question 10

Which Azure service would you choose for a globally distributed application requiring millisecond response times?

A. Azure Blob Storage
B. Azure Files
C. Azure Cosmos DB
D. Azure Table Storage

Answer: C

Explanation:
Cosmos DB is specifically designed for low-latency, globally distributed applications.


✅ Quick Exam Takeaways

✔ Cosmos DB = global, low-latency NoSQL database

✔ Best for:

  • Global web/mobile apps
  • IoT and telemetry
  • Gaming
  • Real-time personalization
  • Microservices

✔ Key strengths:

  • Global distribution
  • Massive scalability
  • Flexible schema
  • High throughput

✔ Not ideal for:

  • Complex joins
  • Strict relational workloads

✔ Exam tip:
👉 If you see “global + real-time + high scale” → think Cosmos DB


Go to the DP-900 Exam Prep Hub main page.