Data Cleaning – Page 3 – The Data Community

Perform impact analysis of downstream dependencies from lakehouses, data warehouses, dataflows, and semantic models in Microsoft Fabric

This post is part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub, and this topic falls under these sections:
Maintain a data analytics solution 
    --> Maintain the analytics development lifecycle 
        --> Perform impact analysis of downstream dependencies from lakehouses, 
            data warehouses, dataflows, and semantic models

Impact analysis in Microsoft Fabric helps analytics engineers understand how changes to upstream data assets affect downstream items such as semantic models, reports, dashboards, notebooks, and pipelines. It is a critical lifecycle practice that reduces the risk of breaking analytics solutions when making schema, logic, or data changes.

For the DP-600 exam, you should understand what impact analysis is, which Fabric tools support it, what dependencies are tracked, and how to use it in real-world lifecycle scenarios.

What Is Impact Analysis?

Impact analysis answers the question:

“If I change or delete this item, what else will be affected?”

It allows you to:

Identify downstream dependencies
Assess risk before making changes
Communicate potential impacts to stakeholders
Support safe development and deployment practices

Impact analysis is observational and informational; it does not enforce controls.

Where Impact Analysis Is Used in Fabric

Impact analysis applies across many Fabric items, including:

Lakehouses
Data Warehouses
Dataflows Gen2
Semantic models
Reports and dashboards
Notebooks and pipelines

These items form a connected analytics graph, which Fabric can visualize.

Lineage View: The Core Tool for Impact Analysis

The primary tool for impact analysis in Fabric is Lineage View.

What Lineage View Shows

Upstream data sources
Transformations and processing steps
Downstream consumers
Relationships between items

Lineage view provides a visual map of dependencies across workloads.
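
As a rough illustration of the dependency map that lineage view draws, the sketch below models a few items as a directed graph and walks it to list everything downstream of one item. The item names and edges are hypothetical and hand-written; this is not how Fabric stores or exposes lineage, only a way to picture the idea.

```python
from collections import deque

# Hypothetical lineage edges: upstream item -> items that consume it directly.
LINEAGE = {
    "dataflow:SalesIngest": ["lakehouse:SalesLakehouse"],
    "lakehouse:SalesLakehouse": ["notebook:SalesPrep", "model:SalesModel"],
    "model:SalesModel": ["report:SalesOverview", "report:ExecSummary"],
    "report:SalesOverview": ["dashboard:Leadership"],
}


def downstream_items(graph, start):
    """Return every item reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


affected = downstream_items(LINEAGE, "lakehouse:SalesLakehouse")
print(affected)
```

The traversal returns indirect consumers (the dashboard behind a report) as well as direct ones, which is the point of reviewing the whole chain rather than only the next hop.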

Impact Analysis by Asset Type

Lakehouses

Changing a Lakehouse can impact:

Notebooks reading tables
Semantic models using Direct Lake
Dataflows writing or reading data
Reports built on dependent models

Common risk: Dropping or renaming a column.

Data Warehouses

Warehouse changes may affect:

Views and SQL queries
Semantic models using DirectQuery
Reports and dashboards
External tools

Exam insight: Schema changes are a common source of downstream failures.
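
To make the schema-change risk concrete, here is a minimal sketch that compares a table's columns before and after a planned change and lists what was dropped, added, or retyped. The table, column names, and type labels are hypothetical, and the schemas are plain dictionaries rather than anything read from a lakehouse or warehouse.

```python
def schema_changes(old_schema, new_schema):
    """Compare two {column: type} mappings and report potentially breaking changes."""
    dropped = sorted(set(old_schema) - set(new_schema))
    added = sorted(set(new_schema) - set(old_schema))
    retyped = sorted(
        col for col in set(old_schema) & set(new_schema)
        if old_schema[col] != new_schema[col]
    )
    return {"dropped": dropped, "added": added, "retyped": retyped}


# Hypothetical before/after schemas for a table named "sales".
old = {"order_id": "int", "amount": "decimal", "region": "string"}
new = {"order_id": "bigint", "amount": "decimal", "sales_region": "string"}

changes = schema_changes(old, new)
print(changes)
```

Note that a rename shows up as one dropped column plus one added column, which is why renames break downstream items in the same way that drops do.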

Dataflows Gen2

Dataflows often sit between raw data and analytics.

Changes can impact:

Lakehouses or Warehouses they load into
Semantic models consuming curated tables
Pipelines orchestrating refreshes

Semantic Models

Semantic models are among the most sensitive assets.

Changes may affect:

Reports and dashboards
Excel workbooks
Composite models
End-user self-service analytics

Exam note: Removing measures or renaming fields is high risk.
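
The sketch below illustrates why removing measures is risky. It assumes you already have an inventory of which fields each downstream item references (collected separately; the inventory, item names, and measure names here are all hypothetical) and reports which items would still point at a removed field.

```python
# Hypothetical inventory: which model fields each downstream item references.
FIELD_USAGE = {
    "report:SalesOverview": {"Total Sales", "Order Count"},
    "report:ExecSummary": {"Total Sales", "Margin %"},
    "workbook:FinanceExtract": {"Order Count"},
}


def items_using(field_usage, removed_fields):
    """Map each affected item to the removed fields it still references."""
    removed = set(removed_fields)
    hits = {}
    for item, fields in field_usage.items():
        overlap = sorted(fields & removed)
        if overlap:
            hits[item] = overlap
    return hits


broken = items_using(FIELD_USAGE, ["Margin %", "Order Count"])
print(broken)
```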

How to Perform Impact Analysis (High Level)

Select the item (Lakehouse, Warehouse, Dataflow, or Semantic Model)
Open Lineage view
Review downstream dependencies
Identify:
- Reports
- Semantic models
- Pipelines
- Other dependent items
Communicate or mitigate risk before making changes
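
The last two steps above (identify, then communicate) can be sketched as a small summary routine. It assumes the downstream items were noted by hand while reviewing lineage view; the item names and owner addresses are hypothetical.

```python
from collections import defaultdict

# Hypothetical downstream items noted while reviewing lineage view: (type, name, owner).
DOWNSTREAM = [
    ("Report", "Sales Overview", "ana@example.com"),
    ("Report", "Exec Summary", "raj@example.com"),
    ("Semantic model", "Sales Model", "ana@example.com"),
    ("Pipeline", "Nightly Refresh", "ops@example.com"),
]


def summarize(items):
    """Group item names by type and collect the distinct owners to notify."""
    by_type = defaultdict(list)
    owners = set()
    for item_type, name, owner in items:
        by_type[item_type].append(name)
        owners.add(owner)
    return dict(by_type), sorted(owners)


by_type, owners = summarize(DOWNSTREAM)
for item_type, names in by_type.items():
    print(f"{item_type}: {', '.join(names)}")
print("Notify:", ", ".join(owners))
```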

Impact Analysis in the Development Lifecycle

Impact analysis is typically performed:

Before deploying changes
Before modifying schemas
Before deleting items
During troubleshooting

It supports:

Safe Git commits
Controlled pipeline deployments
Production stability

Common Exam Scenarios

You may see questions such as:

A column change breaks multiple reports → impact analysis was skipped
An engineer needs to know which reports use a semantic model → lineage view
A Lakehouse schema update affects downstream models → review dependencies
A semantic model should not be modified due to executive reports → high downstream impact

Example:

Before removing a table from a semantic model, what should you do?
Correct concept: Perform impact analysis using lineage view.

Impact Analysis vs Deployment Pipelines

These concepts are related but distinct.

Feature	Impact Analysis	Deployment Pipelines
Purpose	Risk assessment	Controlled promotion
Enforced	No	Yes
Timing	Before changes	During deployment
Tool	Lineage view	Pipeline UI

Best Practices to Remember

Always check lineage before schema changes
Pay extra attention to semantic models and certified items
Communicate impacts to report owners
Pair impact analysis with:
- Version control
- Deployment pipelines
- Endorsements and certification

Key Exam Takeaways

Impact analysis identifies downstream dependencies
Lineage view is the primary tool in Fabric
Applies to Lakehouses, Warehouses, Dataflows, and Semantic Models
Supports safe lifecycle and governance practices
A common scenario-based exam topic

Final Exam Tip

If a question asks “What will break if I change this?”, the answer is impact analysis via lineage view.
If it asks “How do I safely move changes?”, the answer is deployment pipelines or Git.
Expect questions that test:
- When to perform impact analysis
- Which items are affected by changes
- Operational decision-making before deployments

Common traps:
- Confusing impact analysis with lineage documentation
- Assuming Fabric blocks breaking changes automatically
- Forgetting semantic models are often the most impacted layer

Practice Questions

Question 1 (Single choice)

What is the PRIMARY purpose of impact analysis in Microsoft Fabric?

A. Improve query performance
B. Identify downstream objects affected by a change
C. Enforce data security policies
D. Reduce data refresh frequency

Correct Answer: B

Explanation:
Impact analysis helps you understand what items depend on a given artifact, so you can assess the risk of changes.

❌ A: Performance tuning is separate
❌ C: Security is not the focus
❌ D: Refresh tuning is unrelated

Question 2 (Multi-select)

Which Fabric items can be analyzed for downstream dependencies? (Select all that apply.)

A. Lakehouses
B. Data warehouses
C. Dataflows
D. Semantic models

Correct Answers: A, B, C, D

Explanation:
Microsoft Fabric supports dependency tracking across all major analytical artifacts, enabling end-to-end lineage visibility.

Question 3 (Scenario-based)

You plan to rename a column in a lakehouse table. Which Fabric feature should you use FIRST?

A. Version control
B. Deployment pipeline
C. Impact analysis
D. Incremental refresh

Correct Answer: C

Explanation:
Renaming a column may break:

Semantic models
SQL queries
Reports

Impact analysis identifies what will be affected before the change.

Question 4 (Single choice)

Where do you access impact analysis for an item in Fabric?

A. Power BI Desktop
B. Microsoft Purview portal
C. Item settings in the Fabric workspace
D. Azure DevOps

Correct Answer: C

Explanation:
Impact analysis is accessed from the item itself within a Fabric workspace (for example, from the item's card in lineage view), not from a separate tool.

❌ Purview focuses on governance/catalog
❌ DevOps is not used for lineage

Question 5 (Scenario-based)

A dataflow loads data into a lakehouse that feeds multiple semantic models. What does impact analysis show?

A. Only the lakehouse
B. Only the semantic models
C. All downstream dependencies
D. Only refresh schedules

Correct Answer: C

Explanation:
Impact analysis provides a full dependency graph, showing all downstream items affected by changes.

Question 6 (Multi-select)

Which changes typically REQUIRE impact analysis before execution? (Select all that apply.)

A. Dropping columns
B. Renaming tables
C. Changing data types
D. Adding a new report page

Correct Answers: A, B, C

Explanation:
Structural changes can break dependencies. Adding a report page does not affect downstream items.

Question 7 (Scenario-based)

A semantic model is used by several reports and dashboards. What happens if you delete the model without impact analysis?

A. Nothing; reports are cached
B. Reports automatically reconnect
C. Reports and dashboards break
D. Fabric blocks the deletion

Correct Answer: C

Explanation:
Deleting a semantic model removes the data source for:

Reports
Dashboards

Impact analysis helps prevent such disruptions.

Question 8 (Single choice)

Which view best represents impact analysis results?

A. Tabular grid
B. SQL execution plan
C. Dependency graph
D. DAX query view

Correct Answer: C

Explanation:
Impact analysis is presented as a visual dependency graph, showing upstream and downstream relationships.

Question 9 (Scenario-based)

Which role MOST benefits from performing impact analysis regularly?

A. Report consumers
B. Workspace admins and data engineers
C. End-user analysts
D. External auditors

Correct Answer: B

Explanation:
Admins and engineers are responsible for:

Schema changes
Deployments
Stability

Impact analysis supports safe operational changes.

Question 10 (Multi-select)

Which best practices apply when using impact analysis? (Select all that apply.)

A. Perform before structural changes
B. Use in conjunction with deployment pipelines
C. Skip for minor schema updates
D. Communicate findings to stakeholders

Correct Answers: A, B, D

Explanation:
Impact analysis should:

Precede schema changes
Inform deployment decisions
Be communicated to stakeholders

❌ “Minor” changes can still break dependencies.

Creating a DATE value in Power BI DAX, Power Query M, and Excel

You may at times need to create a date value in Power BI, using either DAX or M, or in Excel. This quick post describes how to create a date value in Power BI DAX, the Power Query M language, and Excel. Working with dates is an everyday task for anyone who works with data.

In Power BI DAX, the syntax is:

DATE(<year>, <month>, <day>) //the parameters must be valid numbers

DATE(2025, 8, 23) //returns August 23, 2025

In Power Query M, the syntax is:

#date(<year>, <month>, <day>) //the parameters must be valid numbers

#date(2022, 3, 6) //returns March 6, 2022

In Excel, the syntax is:

DATE(<year>, <month>, <day>) //the parameters must be valid numbers

DATE(1989, 12, 3) //produces 12/3/1989 (technically, it returns a serial number that represents the date in Excel's date-time code)
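
For comparison, here is a small Python sketch that builds the same date and computes the serial number Excel stores behind it. It assumes Excel's default 1900 date system; the helper name `excel_serial` is made up for this example and is not part of any library.

```python
from datetime import date

# Python's date constructor takes the same (year, month, day) order.
d = date(1989, 12, 3)

# Assumption: Excel's default 1900 date system, where serial numbers for dates
# on or after 1900-03-01 equal the day count since 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(value):
    """Return the Excel 1900-system serial number for a date after 1900-02-28."""
    return (value - EXCEL_EPOCH).days


print(d.isoformat())
print(excel_serial(d))

# Out-of-range parts raise ValueError in Python.
try:
    date(2025, 13, 1)
except ValueError as exc:
    print("invalid date:", exc)
```

The serial number is what Excel stores in the cell; 12/3/1989 is only how that number is formatted for display.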

Thanks for reading. Hope you found this useful.

Data Cleaning methods

Handling missing values:

Delete rows or columns with a high percentage of missing values if they don’t contribute significantly to the analysis.

Impute missing values by replacing them with a statistical measure such as mean, median, mode, or using more advanced techniques like regression imputation or k-nearest neighbors imputation.
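
A minimal sketch of both approaches, using only the Python standard library: columns with too many missing values are dropped, and the remaining gaps are filled with the column median. The sample rows and the 50% threshold are hypothetical, and the sketch assumes every column that is kept is numeric.

```python
from statistics import median

# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "notes": None},
    {"age": None, "income": 61000, "notes": None},
    {"age": 29, "income": None, "notes": "vip"},
    {"age": 41, "income": 48000, "notes": None},
]


def missing_fraction(rows, column):
    """Share of rows where `column` is None."""
    return sum(r[column] is None for r in rows) / len(rows)


def clean(rows, max_missing=0.5):
    """Drop columns above the missing threshold, then median-impute the rest."""
    columns = list(rows[0])
    keep = [c for c in columns if missing_fraction(rows, c) <= max_missing]
    # Assumes the kept columns are numeric so that a median is meaningful.
    fill = {c: median(r[c] for r in rows if r[c] is not None) for c in keep}
    return [
        {c: (r[c] if r[c] is not None else fill[c]) for c in keep}
        for r in rows
    ]


cleaned = clean(rows)
print(cleaned)
```

The median is used here because it is less sensitive to extreme values than the mean; swapping in `statistics.mean` or `statistics.mode` gives the other simple imputation options mentioned above.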

Data integrity issues:

Finally, you need to address data integrity issues.

Check for data integrity problems such as inconsistent data types, incorrect data ranges, or violations of business rules.

Resolve integrity issues by correcting or removing problematic data.
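
The three kinds of checks named above can be sketched as a simple validation pass. The order rows, the field names, and the business rule (total equals quantity times unit price) are all assumptions made for illustration; here the problematic rows are removed, though correcting them is equally valid.

```python
def find_violations(rows):
    """Return (row_index, message) pairs for rows that fail basic integrity checks."""
    problems = []
    for i, row in enumerate(rows):
        qty, price, total = row["quantity"], row["unit_price"], row["total"]
        # Inconsistent data type: quantity should be an integer, not text.
        if not isinstance(qty, int):
            problems.append((i, "quantity is not an integer"))
            continue
        # Incorrect range: quantities must be positive.
        if qty <= 0:
            problems.append((i, "quantity out of range"))
        # Business rule (assumed): total equals quantity times unit price.
        if abs(total - qty * price) > 0.005:
            problems.append((i, "total does not match quantity * unit_price"))
    return problems


# Hypothetical order rows.
orders = [
    {"quantity": 2, "unit_price": 9.5, "total": 19.0},
    {"quantity": "3", "unit_price": 4.0, "total": 12.0},
    {"quantity": -1, "unit_price": 5.0, "total": -5.0},
    {"quantity": 4, "unit_price": 2.5, "total": 12.0},
]

violations = find_violations(orders)
bad_rows = {i for i, _ in violations}
valid = [row for i, row in enumerate(orders) if i not in bad_rows]
print(violations)
print(valid)
```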

It’s important to note that the specific data cleaning methods that need to be applied to a dataset will vary depending on the nature of the dataset, the analysis goals, and domain knowledge. It’s recommended to thoroughly understand the data and consult with domain experts when preparing to perform data cleaning tasks.
