Enrich data by adding new columns and tables

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Enrich data by adding new columns and tables

Data enrichment is a critical step in preparing analytics-ready datasets in Microsoft Fabric. This section of the DP-600 exam focuses on how analytics engineers enhance existing data by adding derived columns, augmenting datasets with new tables, and combining multiple data sources to provide more business context and analytical value.

What Does Data Enrichment Mean in Fabric?

Data enrichment involves extending raw or curated datasets with additional attributes, calculations, or related entities so that the data is more useful for reporting, analytics, and downstream consumption.

In Microsoft Fabric, enrichment can occur across:

  • Lakehouses
  • Data Warehouses
  • Dataflows Gen2
  • Power BI semantic models
  • SQL analytics endpoints

Adding New Columns

Common Ways to Add Columns

  • Calculated columns using SQL expressions
  • Derived columns in Dataflows Gen2
  • Computed columns in warehouses or lakehouses
  • Calculated columns in semantic models (DAX)

Typical Use Cases

  • Creating business-friendly attributes (e.g., full name, year-month)
  • Applying business rules (e.g., customer segment, status flags)
  • Performing type conversions or formatting
  • Adding derived metrics (e.g., profit, margin, age)

Where This Is Done

  • Lakehouse / Warehouse (SQL): Persistent, reusable transformations
  • Dataflows Gen2: Low-code, ETL-style transformations
  • Semantic models (DAX): Report-specific or analytical calculations

Exam Tip

Know where to add a column based on reuse, performance, and governance. Foundational logic belongs earlier in the data pipeline.

Adding New Tables

How New Tables Are Created

  • Creating tables from queries or transformations
  • Materializing enriched datasets
  • Joining multiple source tables into curated outputs
  • Creating dimension or fact tables for analytics

Common Use Cases

  • Creating lookup or reference tables
  • Building star schema components
  • Storing aggregated or summarized data
  • Supporting reuse across multiple reports and models

Fabric Components Involved

  • Lakehouse tables (Delta format)
  • Warehouse tables
  • Dataflows Gen2 outputs
  • Shared semantic models

Enrichment Through Joins and Relationships

Enrichment Patterns

  • Joining transactional data with reference data
  • Adding descriptive attributes from lookup tables
  • Combining data from multiple domains (e.g., sales + geography)

Best Practices

  • Use appropriate join types (inner, left, etc.)
  • Ensure consistent data types and keys
  • Validate row counts and data integrity
  • Avoid unnecessary denormalization when not required

Exam Tip

Understand the impact of joins on data volume, performance, and data correctness.

Enrichment Using Dataflows Gen2

Dataflows Gen2 are a key enrichment tool in Fabric:

  • Low-code Power Query transformations
  • Combine, append, and merge datasets
  • Add derived and conditional columns
  • Output enriched tables to OneLake

Ideal for:

  • Source-level enrichment
  • Repeatable, governed transformations
  • Non-SQL-based data preparation

Enrichment in Semantic Models

Some enrichment happens at the modeling layer:

  • Calculated columns (DAX)
  • Calculated tables
  • Role-playing dimensions

However:

  • Semantic-layer enrichment is best for analysis, not heavy transformation
  • Overuse can impact model performance and complexity

Governance and Performance Considerations

  • Prefer enriching data upstream when logic is reusable
  • Document derived columns and tables
  • Apply consistent naming conventions
  • Avoid duplicating enrichment logic across layers
  • Balance flexibility with maintainability

What to Know for the DP-600 Exam

You should be comfortable with:

  • When to add columns vs. when to add tables
  • Choosing the right Fabric component for enrichment
  • SQL vs. Power Query vs. DAX enrichment
  • Performance and governance trade-offs
  • Supporting analytics-ready and reusable datasets

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identifying and understand why an option is correct (or incorrect) — not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions

1. What is the primary goal of data enrichment in Microsoft Fabric?

A. Reduce data storage costs
B. Improve infrastructure security
C. Increase analytical value by adding context
D. Replace raw data sources

Correct Answer: C

Explanation:
Data enrichment enhances datasets by adding derived attributes or related data so that the data becomes more meaningful and useful for analytics and reporting.

2. Where should reusable business logic for derived columns ideally be implemented?

A. Power BI report visuals
B. Semantic model calculated columns
C. Lakehouse or warehouse SQL transformations
D. Ad-hoc DAX measures

Correct Answer: C

Explanation:
Reusable business logic should be implemented upstream (lakehouse or warehouse) to promote consistency, reuse, and better governance across multiple reports and models.

3. Which Fabric feature is best suited for low-code enrichment such as merging datasets and adding conditional columns?

A. SQL analytics endpoint
B. Dataflows Gen2
C. Eventhouse
D. Real-Time hub

Correct Answer: B

Explanation:
Dataflows Gen2 use Power Query to perform low-code transformations, including merges, derived columns, and conditional logic, making them ideal for enrichment scenarios.

4. When enriching data by joining tables, which join type preserves all rows from the primary dataset?

A. Inner join
B. Right join
C. Left join
D. Cross join

Correct Answer: C

Explanation:
A left join preserves all rows from the left (primary) table while adding matching data from the right table where available.

5. Which scenario best justifies creating a new enriched table instead of adding columns to an existing one?

A. Adding a formatting column for display
B. Creating a reusable dimension table
C. Renaming an existing column
D. Filtering rows for a specific report

Correct Answer: B

Explanation:
Creating a new table is appropriate when building reusable dimension or reference tables that support multiple fact tables or analytics use cases.

6. Why should heavy transformation logic generally be avoided in Power BI semantic models?

A. Semantic models cannot handle transformations
B. It increases OneLake storage usage
C. It can negatively affect performance and maintainability
D. Semantic models do not support calculated columns

Correct Answer: C

Explanation:
While semantic models support calculated columns and tables, heavy transformation logic is better handled upstream to improve performance and simplify model maintenance.

7. Which of the following is an example of enriching data by adding a new column?

A. Changing workspace permissions
B. Adding a “Customer Segment” column based on business rules
C. Publishing a semantic model
D. Creating a new workspace

Correct Answer: B

Explanation:
Derived columns such as customer segmentation add business context to data, which is a classic enrichment scenario.

8. What is a key benefit of enriching data earlier in the data pipeline?

A. Faster report publishing
B. Reduced need for Power BI licenses
C. Improved consistency across analytics assets
D. Automatic index creation

Correct Answer: C

Explanation:
Enriching data upstream ensures that all downstream consumers use the same logic and definitions, improving consistency and governance.

9. Which Fabric storage format is typically used when creating enriched tables in a lakehouse?

A. CSV
B. Parquet
C. Delta
D. JSON

Correct Answer: C

Explanation:
Lakehouse tables in Microsoft Fabric are stored in Delta format, which supports ACID transactions and efficient analytics.

10. Which factor should most influence where enrichment logic is implemented?

A. User interface preferences
B. Reusability and performance considerations
C. The number of Power BI visuals
D. Workspace naming conventions

Correct Answer: B

Explanation:
Choosing where to enrich data depends on how reusable the logic is and how it affects performance, scalability, and governance.

One thought on “Enrich data by adding new columns and tables”

Leave a comment