Month: May 2026

Describe responsibilities for data engineers (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
--> Identify roles and responsibilities for data workloads
--> Describe responsibilities for database engineers


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Data engineers play a foundational role in modern data ecosystems. They are responsible for designing, building, and maintaining data systems and pipelines that enable organizations to collect, store, and process data for analysis.

For the DP-900 exam, you should understand what data engineers do, how they differ from other roles, and how their work supports analytics and business intelligence.


What Is a Data Engineer?

A data engineer is responsible for:

  • Designing and building data pipelines
  • Integrating data from multiple sources
  • Transforming raw data into usable formats
  • Ensuring data is available, reliable, and scalable

They act as the bridge between raw data sources and analytics systems.


Core Responsibilities of a Data Engineer


1. Data Ingestion

Data engineers collect data from various sources, such as:

  • Transactional databases
  • Application logs
  • IoT devices
  • External APIs

They design processes to ingest data into storage systems like data lakes or data warehouses.

This can be:

  • Batch ingestion (scheduled loads)
  • Streaming ingestion (real-time data flow)

2. Data Transformation and Processing

Raw data is often messy and inconsistent. Data engineers:

  • Clean and validate data
  • Transform it into structured formats
  • Aggregate and enrich datasets

This process is often referred to as ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform).


3. Building Data Pipelines

Data engineers design and maintain data pipelines, which automate the movement and transformation of data.

Pipelines typically include:

  • Data ingestion
  • Data transformation
  • Data storage
  • Data delivery to analytics tools

Pipelines must be:

  • Reliable
  • Scalable
  • Efficient

4. Managing Data Storage Solutions

Data engineers choose and manage appropriate storage systems based on use cases:

  • Data lakes for raw and large-scale data
  • Data warehouses for structured analytical data
  • Databases for operational data

They ensure data is stored in formats optimized for processing (e.g., Parquet).


5. Ensuring Data Quality

Data engineers are responsible for maintaining high-quality data by:

  • Validating data accuracy
  • Handling missing or inconsistent data
  • Implementing data validation rules

High-quality data is essential for reliable analytics.


6. Optimizing Data Performance

To ensure efficient data processing, data engineers:

  • Optimize data pipelines
  • Choose efficient file formats (e.g., columnar formats)
  • Partition and index data where appropriate

This improves performance for downstream analytics.


7. Supporting Analytical Workloads

Data engineers prepare data for:

  • Data analysts
  • Data scientists
  • Business intelligence tools

They ensure that curated datasets are:

  • Clean
  • Structured
  • Easy to query

8. Monitoring and Maintaining Data Systems

Data engineers monitor pipelines and systems to ensure:

  • Data is processed successfully
  • Failures are detected and resolved
  • Systems remain scalable and reliable

They often use logging, alerts, and monitoring tools.


Data Engineer Responsibilities in Azure

Azure provides a wide range of services that data engineers use:


Data Ingestion & Integration

  • Azure Data Factory → Orchestrates ETL/ELT pipelines
  • Azure Event Hubs → Handles streaming data ingestion

Data Storage

  • Azure Data Lake Storage Gen2 → Scalable storage for raw and processed data
  • Azure Blob Storage → General-purpose object storage

Data Processing

  • Azure Databricks → Apache Spark-based data processing
  • Azure Synapse Analytics → Unified analytics platform

Data Transformation & Orchestration

  • Pipeline orchestration using Data Factory or Synapse pipelines
  • Batch and streaming transformations

Data Engineer vs Other Roles

Understanding role distinctions is important for DP-900:

RolePrimary Focus
Data EngineerBuild pipelines, manage data flow
DBAManage database performance and security
Data AnalystAnalyze data and create reports
Data ScientistBuild predictive models and ML solutions

Why This Matters for DP-900

On the exam, you may be asked to:

  • Identify tasks performed by data engineers
  • Distinguish data engineers from DBAs or analysts
  • Recognize tools and services used in data engineering
  • Understand how data pipelines support analytics

Summary — Exam-Relevant Takeaways

✔ Data engineers build and manage data pipelines
✔ They handle data ingestion, transformation, and storage
✔ They ensure data quality, reliability, and scalability
✔ They support analytical workloads by preparing clean datasets
✔ In Azure, they commonly use:

  • Azure Data Factory
  • Azure Data Lake Storage
  • Azure Databricks
  • Azure Synapse Analytics

✔ They act as the bridge between raw data and insights


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe responsibilities for data analysts (DP-900 Exam Prep)

Practice Questions


Question 1

Which task is a primary responsibility of a data analyst?

A. Building data pipelines
B. Managing database security
C. Creating dashboards and reports
D. Configuring storage systems

Answer: C

Explanation:
Data analysts focus on visualizing data and creating reports/dashboards.


Question 2

A company wants to understand sales trends over the past year using visual reports.

Which role is MOST appropriate?

A. Data Engineer
B. Database Administrator
C. Data Analyst
D. Network Engineer

Answer: C

Explanation:
Data analysts analyze historical data and create insights through reports and dashboards.


Question 3

Which tool is most commonly used by data analysts in Azure environments?

A. Azure Data Factory
B. Azure DevOps
C. Power BI
D. Azure Kubernetes Service

Answer: C

Explanation:
Power BI is the primary tool for data visualization and reporting.


Question 4

Which activity is MOST associated with a data analyst?

A. Designing ETL pipelines
B. Writing SQL queries to explore data
C. Managing server infrastructure
D. Encrypting databases

Answer: B

Explanation:
Data analysts commonly use SQL to query and analyze data.


Question 5

What is the main goal of a data analyst?

A. Store large volumes of raw data
B. Build machine learning models
C. Turn data into actionable insights
D. Manage database performance

Answer: C

Explanation:
Data analysts focus on interpreting data and generating insights for decision-making.


Question 6

Which task is LEAST likely to be performed by a data analyst?

A. Creating a sales dashboard
B. Identifying trends in data
C. Building data ingestion pipelines
D. Summarizing business performance

Answer: C

Explanation:
Building pipelines is a data engineer responsibility, not an analyst task.


Question 7

A data analyst receives cleaned and structured data from a data warehouse. What is their PRIMARY focus?

A. Data ingestion
B. Data transformation
C. Data visualization and analysis
D. Database administration

Answer: C

Explanation:
Analysts work with prepared data to analyze and visualize insights.


Question 8

Which statement best describes the role of a data analyst?

A. They design physical database servers
B. They create and maintain ETL pipelines
C. They analyze data to support business decisions
D. They manage user permissions in databases

Answer: C

Explanation:
Data analysts focus on interpreting data and supporting decision-making.


Question 9

Which Azure service is MOST directly associated with data analyst reporting?

A. Azure Data Lake Storage
B. Azure Synapse Analytics (SQL querying)
C. Azure Virtual Machines
D. Azure Key Vault

Answer: B

Explanation:
Data analysts often query and analyze data using Azure Synapse Analytics.


Question 10

Which activity involves communicating insights from data to business stakeholders?

A. Data encryption
B. Data visualization and reporting
C. Database replication
D. Network configuration

Answer: B

Explanation:
Data analysts communicate findings through visualizations, dashboards, and reports.


✅ Key Exam Takeaways

For DP-900, remember:

✔ Data analysts focus on analysis, visualization, and reporting
✔ They work with cleaned, structured data
✔ They commonly use Power BI and SQL
✔ Their goal is to support business decision-making
✔ They do NOT typically build pipelines or manage databases


Go to the DP-900 Exam Prep Hub main page.

Describe responsibilities for data analysts (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
--> Identify roles and responsibilities for data workloads
--> Describe responsibilities for database analysts


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Data analysts play a key role in turning data into insights that drive business decisions. While data engineers prepare and organize data, and DBAs manage databases, data analysts focus on exploring, analyzing, and presenting data in meaningful ways.

For the DP-900 exam, you should understand what data analysts do, how their responsibilities differ from other roles, and how they use tools (especially in Azure) to deliver insights.


What Is a Data Analyst?

A data analyst is responsible for:

  • Exploring and interpreting data
  • Identifying trends and patterns
  • Creating reports and visualizations
  • Communicating insights to stakeholders

Their primary goal is to help organizations make data-driven decisions.


Core Responsibilities of a Data Analyst


1. Data Exploration and Analysis

Data analysts examine datasets to:

  • Identify trends and patterns
  • Detect anomalies or outliers
  • Answer business questions

They often use:

  • SQL queries
  • Data exploration tools
  • Statistical techniques (basic level for DP-900)

2. Data Visualization

A major responsibility is presenting data visually in a clear and meaningful way.

This includes creating:

  • Charts (bar, line, pie, etc.)
  • Dashboards
  • Interactive reports

Visualization helps stakeholders quickly understand insights.


3. Reporting and Dashboard Creation

Data analysts build reports that summarize data and track key metrics.

These reports may include:

  • Sales performance dashboards
  • Operational KPIs
  • Financial summaries

Reports are often refreshed regularly to provide up-to-date insights.


4. Querying Data

Data analysts use query languages (like SQL) to:

  • Retrieve specific data
  • Filter and aggregate datasets
  • Join data from multiple sources

They typically work with analytical datasets prepared by data engineers.


5. Communicating Insights

Data analysts translate technical findings into business-friendly insights.

This includes:

  • Writing summaries
  • Presenting findings to stakeholders
  • Explaining trends and recommendations

Strong communication skills are essential.


6. Working with Cleaned and Curated Data

Unlike data engineers, analysts usually do not handle raw data pipelines.

Instead, they work with:

  • Cleaned datasets
  • Structured data models
  • Data warehouses or semantic models

This allows them to focus on analysis rather than data preparation.


Data Analyst Responsibilities in Azure

Data analysts commonly use Azure tools designed for analytics and visualization:


Microsoft Power BI

The primary tool for data analysts in Azure environments:

  • Create interactive dashboards and reports
  • Connect to multiple data sources
  • Perform data modeling and transformation (Power Query)
  • Share insights across the organization

Azure Synapse Analytics (Query Layer)

Data analysts may:

  • Query data using SQL
  • Access data warehouse or lakehouse data
  • Perform analysis on large datasets

Azure SQL Database / Data Warehouse

Analysts retrieve structured data from:

  • Relational databases
  • Data warehouses

Data Analyst vs Other Roles

Understanding role differences is important for DP-900:

RolePrimary Focus
Data AnalystAnalyze data, create reports, visualize insights
Data EngineerBuild pipelines, prepare and transform data
DBAManage database performance, security, availability
Data ScientistBuild predictive models and advanced analytics

Why This Matters for DP-900

On the exam, you may be asked to:

  • Identify responsibilities of a data analyst
  • Distinguish analyst tasks from engineering or DBA tasks
  • Recognize tools used for visualization and reporting
  • Understand how analysts use data to support decisions

Summary — Exam-Relevant Takeaways

✔ Data analysts focus on analyzing and visualizing data
✔ Key responsibilities include:

  • Data exploration
  • Querying data (SQL)
  • Creating reports and dashboards
  • Communicating insights

✔ They primarily work with cleaned, structured data
✔ In Azure, they commonly use:

  • Power BI
  • Azure Synapse (querying)
  • Azure SQL / data warehouses

✔ Their goal is to turn data into actionable insights


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Describe Features of Transactional Workloads (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
--> Describe common data workloads
--> Describe features of transactional workloads


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

In the DP-900 exam, transactional workloads are one of the foundational types of data workloads you should understand. These workloads are central to many operational systems in enterprises and are fundamentally different from analytical or big data workloads. This article explains what transactional workloads are, their key features, and how they map to Azure services.


What Is a Transactional Workload?

A transactional workload refers to data processing that focuses on day-to-day operations — capturing and managing business events as they occur. These workloads are typically found in systems that require frequent inserts, updates, and deletes with strong guarantees of consistency and isolation.

Examples:

  • Online purchasing systems
  • Banking and financial transactions
  • Inventory management
  • Customer relationship management (CRM)

Transactional workloads form the core of OLTP (Online Transaction Processing) systems.


Key Features of Transactional Workloads

Transactional workloads have several defining characteristics that distinguish them from analytical workloads:


1. ACID Properties

Transactional systems must maintain ACID properties to ensure reliable data processing:

PropertyWhat It Means
AtomicityAll actions of a transaction succeed or fail as one unit
ConsistencyData remains valid and consistent before & after a transaction
IsolationConcurrent transactions do not interfere with each other
DurabilityOnce a transaction completes, changes are permanent

ACID guarantees are critical for business correctness in financial and operational systems.


2. High Volume of Small Operations

Transactional workloads are made up of many small, frequent operations, such as:

  • Adding a new customer
  • Updating an order status
  • Recording a payment

These operations are typically short-lived and affect a small number of rows or records at a time.


3. Real-Time or Near Real-Time Requirements

Transactional systems often have low latency requirements — applications and users expect fast responses:

  • Web applications must serve users in milliseconds
  • Background services must process requests quickly

This is crucial for user experience (e.g., online ordering, banking apps).


4. Strong Consistency and Integrity

Transactional workloads require strict guarantees that data remains reliable at all times:

  • No partial updates
  • Referential integrity between related tables
  • Synchronous processing of business events

These guarantees prevent data corruption and enforce business rules.


5. Normalized Data Structures

Transactional systems usually use normalized schemas — breaking data into related tables to reduce redundancy and maintain integrity.

For example:

  • Separate tables for Customers, Orders, OrderItems
  • Use of primary keys and foreign keys to enforce relationships

Normalized designs are efficient for transactional updates.


Transactional vs. Analytical Workloads

Understanding the difference between transactional and analytical workloads is important for the DP-900 exam.

FeatureTransactional WorkloadAnalytical Workload
Primary UseOperational processingReporting & insights
Data StructureNormalizedDenormalized
Query TypesSimple & frequentComplex & batch
Response TimeLow latencyHigh throughput
ExamplesPoint of sale, CRMDashboards, BI

Transactional systems support operational needs — operations performed today — while analytical systems support strategic decision-making based on historical data.


How Transactional Workloads Map to Azure Services

Azure provides managed services that support transactional workloads with strong consistency and fast response times.


Azure SQL Database

A fully managed relational database service that supports:

  • ACID transactions
  • High availability
  • Automatic updates and backups

It’s suitable for most OLTP systems such as:

  • E-commerce platforms
  • Customer management systems

Azure Database for PostgreSQL and MySQL

Managed open-source relational databases that provide:

  • ACID compliance
  • Compatibility with existing tools
  • Horizontal scaling options

These are good choices when applications already use PostgreSQL or MySQL.


Azure Cosmos DB (Transactional Core)

Although often thought of as NoSQL, Cosmos DB can support transactional workloads using:

  • Transactional batch operations within a partition
  • Strong consistency models

Cosmos DB is especially useful when you need globally distributed transactional systems.


Exam-Relevant Takeaways

When preparing for DP-900, focus on these points:

✔ Transactional workloads are all about frequent, small, consistent operations.
✔ They must satisfy ACID properties to ensure correctness.
✔ They prioritize real-time or near-real-time responsiveness.
✔ They typically use normalized data models.
✔ Azure services such as Azure SQL Database, Azure Database for PostgreSQL/MySQL, and Azure Cosmos DB support transactional use cases.


Summary — What You Need to Know for the Exam

  • A transactional workload handles operational tasks (e.g., orders, banking operations).
  • These workloads require ACID properties and fast, dependable performance.
  • Normalized schemas help maintain consistency and integrity.
  • Azure supports transactional workloads with services designed for reliable transaction processing.

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe Features of Analytical Workloads (DP-900 Exam Prep)

Practice Questions


Question 1

Which scenario best represents an analytical workload?

A. Recording a new customer order
B. Updating inventory quantities
C. Generating a yearly sales trends report
D. Processing a credit card payment

Answer: C

Explanation:
Analytical workloads focus on reporting and historical analysis, not real-time operations.


Question 2

Analytical workloads primarily involve which type of queries?

A. Short insert and update statements
B. Point lookups by primary key
C. Complex queries with aggregations
D. Transactional batch commits

Answer: C

Explanation:
Analytical workloads typically use complex SELECT queries with GROUP BY, SUM, AVG, etc.


Question 3

Which characteristic is MOST associated with analytical workloads?

A. Many small write operations
B. ACID transaction enforcement
C. Read-heavy access to large datasets
D. Millisecond response requirements

Answer: C

Explanation:
Analytical systems mainly read and aggregate large volumes of data.


Question 4

Which schema design is commonly used for analytical workloads?

A. Fully normalized schema
B. Hierarchical schema
C. Denormalized star schema
D. Key-value schema

Answer: C

Explanation:
Analytical systems often use denormalized schemas (such as star schemas) to improve query performance.


Question 5

Which Azure service is MOST appropriate for enterprise-scale analytical reporting?

A. Azure SQL Database
B. Azure Synapse Analytics
C. Azure Table Storage
D. Azure Queue Storage

Answer: B

Explanation:
Azure Synapse Analytics is designed for large-scale analytical and data warehousing workloads.


Question 6

Which statement about analytical workloads is TRUE?

A. They prioritize low-latency updates
B. They mainly support operational business processes
C. They often analyze historical data
D. They require normalized schemas

Answer: C

Explanation:
Analytical workloads typically analyze historical and aggregated data for insights.


Question 7

Which storage format is commonly used to optimize analytical queries?

A. Row-based text files
B. Columnar formats such as Parquet
C. Binary key-value files
D. XML documents

Answer: B

Explanation:
Columnar formats like Parquet improve performance for analytical queries by minimizing I/O.


Question 8

Which workload characteristic differentiates analytical systems from transactional systems?

A. Use of indexes
B. Support for SQL
C. Focus on throughput over latency
D. Ability to store structured data

Answer: C

Explanation:
Analytical systems prioritize processing large volumes of data efficiently rather than ultra-fast response times.


Question 9

A company combines data from sales, marketing, and customer systems to build dashboards in Power BI.

What type of workload is this?

A. Transactional
B. Streaming
C. Analytical
D. Operational

Answer: C

Explanation:
Combining multiple sources for dashboards and insights is an analytical workload.


Question 10

Which activity is LEAST likely to be part of an analytical workload?

A. Running aggregate queries
B. Creating executive dashboards
C. Performing nightly ETL jobs
D. Updating individual customer records

Answer: D

Explanation:
Updating individual records is transactional, not analytical.


✅ Exam Memory Anchors

For DP-900, remember analytical workloads as:

✔ OLAP
✔ Large datasets
✔ Complex read-heavy queries
✔ Aggregations & reporting
✔ Historical analysis
✔ Denormalized schemas
✔ Columnar storage
✔ Azure Synapse + Data Lake
✔ Power BI consumers


Go to the DP-900 Exam Prep Hub main page.

Describe Features of Analytical Workloads (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
--> Describe common data workloads
--> Describe features of analytical workloads


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Analytical workloads are essential for deriving insights from data. Unlike transactional workloads — which support day-to-day operations — analytical workloads focus on querying, aggregating, summarizing, and analyzing large volumes of data to help with reporting, decision making, and trends.


What Is an Analytical Workload?

An analytical workload refers to data processing that is oriented toward analysis, rather than operational updates. It is optimized for:

  • Complex queries
  • Aggregations across large datasets
  • Historical analysis and reporting
  • Business intelligence (BI)

Analytical workloads are often associated with OLAP (Online Analytical Processing) systems.


Key Features of Analytical Workloads

1. Large Volumes of Data

Analytical systems often operate on datasets that are:

  • Much larger than transactional tables
  • Historical — spanning months or years of records
  • Combined from multiple sources (e.g., transactional systems, logs, external data)

These datasets can be stored in data warehouses, data lakes, or big data systems.


2. Complex, Read-Heavy Queries

Analytical workloads are dominated by complex SELECT queries, often involving:

  • Aggregations (SUM, AVG, COUNT)
  • Grouping by categories
  • Filtering on multiple dimensions
  • Joining large tables

These queries can be computationally intensive and are often used for reporting and dashboards.


3. Denormalized or Columnar Storage

Unlike transactional systems that use normalized schemas, analytical workloads often use:

  • Denormalized schemas (e.g., star or snowflake schemas)
  • Columnar storage formats (e.g., Parquet, ORC)

These formats improve query performance by minimizing I/O and enabling efficient aggregation.


4. Longer Query Response Times (But High Throughput)

Queries in analytical systems are not always expected to return results in milliseconds, as they:

  • Scan large amounts of data
  • Compute aggregates and summaries
  • May be optimized for throughput rather than low latency

This contrasts with transactional systems where fast, small transactions are critical.


5. Batch or Bulk Processing

Analytical workloads often rely on:

  • Batch ingestion of data (e.g., nightly ETL jobs)
  • Data transformation pipelines (cleaning, aggregating, enriching)
  • Tools like Azure Data Factory, Databricks, or Synapse pipelines

These pipelines prepare data for analytics and reporting.


6. Support for BI and Reporting Tools

Analytical workloads integrate with business intelligence tools, such as:

  • Power BI
  • Excel
  • Azure Synapse Analytics Studio

These tools connect directly to analytical stores to produce dashboards, charts, and insights.


Analytical vs Transactional Workloads — Quick Comparison

FeatureTransactionalAnalytical
Primary PurposeOperational processing (OLTP)Decision support & reporting (OLAP)
Data SizeSmall to moderateLarge or very large
Workload TypeFrequent inserts/updates/deletesComplex queries/aggregations
SchemaNormalizedOften denormalized
Query FocusSingle record operationsScanning many records
Typical ToolsRelational OLTP databasesData warehouses, big data systems

Where Analytical Workloads Run in Azure

Azure offers several services optimized for analytical workloads:

Azure Synapse Analytics

A unified analytics service that enables:

  • Data warehousing
  • Big data processing
  • Integration with Spark and SQL
  • High-performance analytics

It is ideal for large-scale reporting and BI scenarios.


Azure Data Lake Storage + Analytics

Azure Data Lake Storage Gen2 works with:

  • Apache Spark
  • Azure Databricks
  • Synapse Analytics

This combination supports big data analytics, machine learning, and data science workloads.


Azure SQL Data Warehouse (Synapse Dedicated SQL Pools)

This is the former SQL DW offering (now part of Synapse) optimized for:

  • Massive parallel processing
  • Distributed query execution
  • High-volume analytical queries

Why Analytical Workloads Matter for DP-900

For DP-900, you should be able to:

  • Define analytical workloads and distinguish them from transactional workloads
  • Recognize use cases where analytical workloads are appropriate
  • Identify Azure services designed for analytical processing
  • Understand schema design and storage options that support analytics

Being able to describe these features shows your understanding of how modern data ecosystems support business intelligence and analytics.


Summary — Exam-Relevant Takeaways

✔ Analytical workloads focus on complex queries and analysis across large datasets
✔ They use denormalized schemas and columnar storage to boost performance
✔ They are optimized for throughput and summarization, not real-time transactions
✔ They typically support reports, dashboards, and insights
✔ Azure services like Azure Synapse Analytics, Azure Data Lake, and Databricks support these workloads


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe Features of Transactional Workloads (DP-900 Exam Prep)

Practice Questions


Question 1

Which scenario best represents a transactional workload?

A. Generating monthly sales reports
B. Training a machine learning model
C. Recording a customer purchase in real time
D. Visualizing historical trends

Answer: C

Explanation:
Transactional workloads capture operational business events as they occur.


Question 2

Which characteristic is most closely associated with transactional workloads?

A. Large batch queries
B. Complex aggregations
C. Frequent small read/write operations
D. Historical trend analysis

Answer: C

Explanation:
Transactional systems perform many small, fast inserts, updates, and deletes.


Question 3

Which ACID property ensures that completed transactions are permanently saved?

A. Atomicity
B. Consistency
C. Isolation
D. Durability

Answer: D

Explanation:
Durability guarantees that once a transaction commits, it remains stored even after failures.


Question 4

A banking system transfers money between accounts. If either debit or credit fails, both must roll back.

Which ACID property does this demonstrate?

A. Consistency
B. Isolation
C. Atomicity
D. Durability

Answer: C

Explanation:
Atomicity ensures that a transaction is all-or-nothing.


Question 5

Transactional workloads typically use which type of schema design?

A. Denormalized
B. Star schema
C. Snowflake schema
D. Normalized

Answer: D

Explanation:
Transactional systems usually use normalized schemas to reduce redundancy and enforce integrity.


Question 6

Which Azure service is MOST appropriate for a traditional OLTP application?

A. Azure Synapse Analytics
B. Azure SQL Database
C. Azure Data Lake Storage
D. Azure Blob Storage

Answer: B

Explanation:
Azure SQL Database is optimized for transactional (OLTP) workloads with ACID support.


Question 7

Which requirement is most critical for transactional workloads?

A. High throughput for batch queries
B. Schema flexibility
C. Low latency and strong consistency
D. Historical data retention

Answer: C

Explanation:
Transactional workloads prioritize fast response times and data consistency.


Question 8

Which workload is LEAST likely to be transactional?

A. Updating inventory levels
B. Processing credit card payments
C. Inserting new customer records
D. Running yearly financial summaries

Answer: D

Explanation:
Yearly summaries are analytical, not transactional.


Question 9

Which statement about transactional workloads is TRUE?

A. They primarily analyze historical data
B. They usually involve complex joins across millions of rows
C. They support operational business processes
D. They are optimized for reporting

Answer: C

Explanation:
Transactional workloads support daily operations such as orders, payments, and updates.


Question 10

An e-commerce application must confirm orders instantly and ensure inventory counts are always correct.

Which workload type does this describe?

A. Analytical
B. Batch
C. Streaming
D. Transactional

Answer: D

Explanation:
Real-time order processing with consistency requirements is transactional.


✅ Exam Tips for Transactional Workloads

For DP-900, remember:

✔ Focus on real-time operational processing
✔ Think OLTP
✔ Many small reads/writes
ACID compliance
Low latency + strong consistency
✔ Typically normalized schemas
✔ Azure SQL Database is the classic example


Go to the DP-900 Exam Prep Hub main page.

Describe Types of Databases (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
--> Identify options for data storage
--> Describe types of databases


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Databases are systems that store and manage data so applications can retrieve, update, and organize it efficiently. For DP-900, you should be familiar with the major types of databases, how they differ, and common use cases — especially in relation to Azure services.


What Is a Database?

A database is an organized collection of data that enables efficient access, management, and update of information. Databases may differ in how they model, structure, and query data depending on the data type, scale, and workload requirements.


Primary Types of Databases

At a high level, databases fall into two broad categories:

  1. Relational Databases
  2. Non-Relational Databases (NoSQL)

1. Relational Databases

Relational databases (RDBMS) store data in tables with rows and columns.

Key Features

  • Structured schema: Tables have defined columns with data types.
  • Relationships: Tables can be linked using keys (e.g., primary and foreign keys).
  • SQL Queries: Use Structured Query Language (SQL) to retrieve and manipulate data.
  • ACID transactions: Support atomicity, consistency, isolation, and durability for reliable data operations.

When to Use

  • Applications requiring strong data integrity
  • Banking, accounting, inventory systems
  • Workloads where relationships among data matter

Examples

  • Azure SQL Database
  • Azure Database for PostgreSQL
  • Azure Database for MySQL

2. Non-Relational Databases (NoSQL)

Non-relational databases, often called NoSQL databases, store data in ways that differ from traditional tables. They are generally schema-less and more flexible, which helps with scalability and handling varied data types.

Key Characteristics

  • No fixed schema
  • Designed for horizontal scaling and large data volumes
  • Support for semi-structured and unstructured data
  • Often optimized for specific access patterns

The most common NoSQL models include:


a. Key-Value Databases

Key-value stores are the simplest type of NoSQL database.

  • Data stored as pairs: key (identifier) and value (data).
  • Efficient for simple lookups when the key is known.

Use cases: Session state, caching, user preferences.


b. Document Databases

Document databases store data as documents, typically in JSON format.

  • Each document is a self-describing object with a unique ID.
  • Supports nested fields and flexible attributes.

Use cases: Content management, user profiles, web apps.


c. Column-Family (Wide-Column) Databases

Column-family databases use tables with column families — groups of related columns that can vary by row.

  • Designed for wide tables where columns are sparse.
  • Good for distributed data and analytical workloads.

Use cases: Time-series data, analytics, event logging.


d. Graph Databases

Graph databases focus on relationships between data elements.

  • Use nodes (entities) and edges (relationships).
  • Optimized for queries involving deep connections (e.g., social networks).

Use cases: Recommendation engines, fraud detection, network analysis.


Relational vs Non-Relational: A Quick Comparison

FeatureRelationalNon-Relational (NoSQL)
SchemaFixedFlexible / Schema-less
Data ModelTablesVaries (documents, keys, graphs)
Query LanguageSQLVaries by database
ScalabilityVertical scalingHorizontal scaling
Typical UseStrong consistency & relationshipsLarge, evolving, semi/unstructured data

How Azure Supports These Databases

Relational Database Services

Azure provides managed relational services:

  • Azure SQL Database: Managed SQL service
  • Azure Database for MySQL and PostgreSQL: Managed open-source options

These are ideal for structured data and transactional workloads.


Non-Relational Database Services

Azure supports NoSQL and other flexible databases:

  • Azure Cosmos DB: A globally distributed, multi-model NoSQL database service that supports document, key-value, column-family, and graph models.

This makes Cosmos DB unique in supporting multiple non-relational data models from a single service.


Why Understanding Types of Databases Matters for DP-900

On the DP-900 exam, you may be asked to:

  • Classify a database type based on a description of its structure.
  • Choose the best database model for a given business scenario.
  • Identify Azure services that match a database type.

Knowing relational vs non-relational databases, and the sub-types of NoSQL models, will help you answer these questions with confidence.


Summary — Exam-Relevant Takeaways

Relational databases store structured data using tables, enforce schemas, and use SQL.
NoSQL databases store non-relational data and include key-value, document, column-family, and graph types.
Azure SQL Database and open-source relational offerings support structured workloads.
Azure Cosmos DB supports multiple non-relational models for schema-flexible data.


Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Practice Questions: Describe Types of Databases (DP-900 Exam Prep)

Practice Questions


Question 1

You need to store customer orders in tables with fixed columns and enforce relationships between customers and orders.

Which type of database should you use?

A. Graph
B. Document
C. Relational
D. Key-value

Answer: C

Explanation:
Relational databases store structured data in tables with defined schemas and support relationships via keys.


Question 2

Which characteristic best describes a relational database?

A. Schema-less storage
B. Data stored as JSON documents
C. Tables with rows and columns
D. Nodes and edges

Answer: C

Explanation:
Relational databases organize data into tables (rows and columns) and use SQL for querying.


Question 3

An application must store user profiles in flexible JSON documents where each user may have different attributes.

Which database type is most appropriate?

A. Column-family
B. Document
C. Relational
D. Graph

Answer: B

Explanation:
Document databases store data as JSON-like documents and allow flexible schemas — ideal for user profiles.


Question 4

Which Azure service supports multiple NoSQL data models such as Core (SQL) API, Table API, Cassandra API, and Gremlin API?

A. Azure SQL Database
B. Azure Table Storage
C. Azure Cosmos DB
D. Azure Database for PostgreSQL

Answer: C

Explanation:
Azure Cosmos DB is a globally distributed, multi-model NoSQL database service.


Question 5

You are designing a recommendation engine that analyzes relationships between users and products.

Which database type is best suited?

A. Relational
B. Key-value
C. Graph
D. Column-family

Answer: C

Explanation:
Graph databases specialize in relationship-heavy data using nodes and edges.


Question 6

Which statement about NoSQL databases is TRUE?

A. They always require fixed schemas
B. They primarily use SQL
C. They are optimized for horizontal scaling
D. They cannot store structured data

Answer: C

Explanation:
NoSQL databases are designed for horizontal scaling and flexible schemas.


Question 7

You need extremely fast lookups using a unique identifier, and the data structure is simple.

Which NoSQL model should you choose?

A. Document
B. Graph
C. Column-family
D. Key-value

Answer: D

Explanation:
Key-value databases store data as key/value pairs and provide very fast retrieval.


Question 8

Which Azure service is best suited for structured transactional workloads using SQL?

A. Azure Blob Storage
B. Azure Cosmos DB
C. Azure SQL Database
D. Azure Data Lake Storage

Answer: C

Explanation:
Azure SQL Database is a managed relational database service optimized for structured transactional data.


Question 9

Which feature is typically associated with relational databases but not guaranteed in NoSQL systems?

A. Global distribution
B. Flexible schemas
C. ACID transactions
D. Horizontal scaling

Answer: C

Explanation:
Relational databases traditionally provide full ACID transaction support.


Question 10

A company collects massive volumes of time-series telemetry data where columns may vary across rows.

Which database type fits this scenario best?

A. Relational
B. Document
C. Column-family
D. Graph

Answer: C

Explanation:
Column-family (wide-column) databases are well suited for large, sparse datasets such as time-series data.


✅ Key Exam Reminders

For DP-900, make sure you can confidently:

  • Distinguish relational vs non-relational
  • Recognize NoSQL models (key-value, document, column-family, graph)
  • Match Azure services to database types (especially Azure SQL vs Azure Cosmos DB)
  • Choose the right database type for a scenario

Go to the DP-900 Exam Prep Hub main page.

Describe Common Formats for Data Files (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Describe core data concepts (25–30%)
--> Identify options for data storage
--> Describe common formats for data files


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

In DP-900, Microsoft expects you to understand common data file formats, what type of data they typically store (structured, semi-structured, or unstructured), and why certain formats are used in analytics and Azure storage scenarios.

This topic connects directly to Azure Blob Storage, Azure Data Lake Storage, and analytics pipelines.


Why Data File Formats Matter

Data file formats define:

  • How data is organized inside a file
  • Whether the data is human-readable or binary
  • How efficiently it can be stored and queried
  • Which tools and services can process it

Choosing the right format impacts:

  • Performance
  • Storage cost
  • Analytics capabilities
  • Interoperability between systems

For DP-900, focus on understanding what each format is used for, not deep implementation details.


Common Data File Formats You Should Know

1. CSV (Comma-Separated Values)

CSV is one of the simplest and most widely used formats for structured data.

Key Characteristics

  • Plain text
  • Each row represents a record
  • Columns separated by commas (or other delimiters)
  • No embedded schema
  • Human readable

Example:

CustomerID,Name,City
1,John,Seattle
2,Maria,Austin

Typical Use Cases

  • Data exports and imports
  • Simple datasets
  • Spreadsheet interoperability

Exam Notes

  • Represents structured data
  • Lightweight and easy to move between systems
  • No support for nested structures or data types

2. JSON (JavaScript Object Notation)

JSON is the most common format for semi-structured data, especially in modern applications and APIs.

Key Characteristics

  • Key–value pairs
  • Supports nested objects and arrays
  • Self-describing
  • Human readable
  • Schema-on-read

Example:

{
"CustomerID": 1,
"Name": "John",
"Orders": [
{ "OrderID": 100, "Amount": 50 }
]
}

Typical Use Cases

  • Web APIs
  • Application data
  • Azure Cosmos DB documents
  • Logs and telemetry

Exam Notes

  • Represents semi-structured data
  • Flexible schema
  • Commonly used with Azure Cosmos DB and Azure Data Lake

3. XML (Extensible Markup Language)

XML is another semi-structured format that uses tags to describe data.

Key Characteristics

  • Tag-based hierarchy
  • Supports nested structures
  • Human readable but verbose
  • Self-describing

Example:

<Customer>
<CustomerID>1</CustomerID>
<Name>John</Name>
</Customer>

Typical Use Cases

  • Legacy systems
  • Configuration files
  • Enterprise data exchange

Exam Notes

  • Semi-structured
  • Less common than JSON in modern Azure solutions

4. Parquet

Parquet is a columnar, binary file format optimized for analytics workloads.

Key Characteristics

  • Column-based storage
  • Highly compressed
  • Not human readable
  • Very fast for analytical queries

Typical Use Cases

  • Big data analytics
  • Azure Synapse Analytics
  • Azure Data Lake Storage

Exam Notes

  • Used for large analytical datasets
  • Optimized for performance and storage efficiency
  • Common in modern data engineering pipelines

5. Avro

Avro is a binary format designed for data serialization and streaming.

Key Characteristics

  • Compact binary format
  • Includes schema with the data
  • Efficient for data movement
  • Not human readable

Typical Use Cases

  • Data pipelines
  • Event streaming
  • Big data ingestion

Exam Notes

  • Often used behind the scenes in analytics platforms
  • Supports schema evolution

6. Plain Text Files

Simple text files may also be used to store unstructured or loosely structured data.

Examples

  • Log files
  • Notes
  • Raw exports

Exam Notes

  • Usually treated as unstructured data
  • Stored in Azure Blob Storage or Data Lake

How These Formats Map to Data Types

This mapping is important for DP-900 questions:

FormatData Type
CSVStructured
JSONSemi-structured
XMLSemi-structured
ParquetStructured / Analytics
AvroSemi-structured
TXTUnstructured

Where These Formats Are Stored in Azure

You’ll commonly see these formats stored in:

Azure Blob Storage

  • Primary storage for files
  • Supports all formats (CSV, JSON, Parquet, images, etc.)
  • Used for unstructured and semi-structured data

Azure Data Lake Storage Gen2

  • Built on Blob Storage
  • Optimized for analytics
  • Common for Parquet and Avro files
  • Used with Azure Synapse and Azure Data Factory

Why This Matters for DP-900

On the exam, file formats typically appear in scenarios like:

  • Choosing storage for CSV or JSON files
  • Identifying formats used in analytics pipelines
  • Recognizing Parquet in big data workloads
  • Distinguishing structured vs semi-structured file types

You’re expected to understand purpose and characteristics, not internal file mechanics.


Summary — Exam-Relevant Takeaways

For DP-900, remember:

✔ CSV → structured, simple, text-based
✔ JSON / XML → semi-structured, flexible, self-describing
✔ Parquet → columnar, compressed, analytics-optimized
✔ Avro → binary, schema included, streaming-friendly
✔ TXT → unstructured

And:

  • These formats are commonly stored in Azure Blob Storage or Azure Data Lake Storage
  • Analytics formats (Parquet/Avro) are used with Azure Synapse and big data workloads

Go to the Practice Exam Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.