Category: Data Cleaning

Perform impact analysis of downstream dependencies from lakehouses, data warehouses, dataflows, and semantic models in Microsoft Fabric

This post is part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub, and this topic falls under these sections:
Maintain a data analytics solution
--> Maintain the analytics development lifecycle
--> Perform impact analysis of downstream dependencies from lakehouses, data warehouses, dataflows, and semantic models

Impact analysis in Microsoft Fabric helps analytics engineers understand how changes to upstream data assets affect downstream items such as datasets, reports, dashboards, notebooks, and pipelines. It is a critical lifecycle practice that reduces the risk of breaking analytics solutions when making schema, logic, or data changes.

For the DP-600 exam, you should understand what impact analysis is, which Fabric tools support it, what dependencies are tracked, and how to use it in real-world lifecycle scenarios.

What Is Impact Analysis?

Impact analysis answers the question:

“If I change or delete this item, what else will be affected?”

It allows you to:

  • Identify downstream dependencies
  • Assess risk before making changes
  • Communicate potential impacts to stakeholders
  • Support safe development and deployment practices

Impact analysis is observational and informational: it does not enforce controls or block changes.

Where Impact Analysis Is Used in Fabric

Impact analysis applies across many Fabric items, including:

  • Lakehouses
  • Data Warehouses
  • Dataflows Gen2
  • Semantic models
  • Reports and dashboards
  • Notebooks and pipelines

These items form a connected analytics graph, which Fabric can visualize.

Lineage View: The Core Tool for Impact Analysis

The primary tool for impact analysis in Fabric is Lineage View.

What Lineage View Shows

  • Upstream data sources
  • Transformations and processing steps
  • Downstream consumers
  • Relationships between items

Lineage view provides a visual map of dependencies across workloads.

Impact Analysis by Asset Type

Lakehouses

Changing a Lakehouse can impact:

  • Notebooks reading tables
  • Semantic models using Direct Lake
  • Dataflows writing or reading data
  • Reports built on dependent models

Common risk: Dropping or renaming a column.

Data Warehouses

Warehouse changes may affect:

  • Views and SQL queries
  • Semantic models using DirectQuery
  • Reports and dashboards
  • External tools

Exam insight: Schema changes are a common source of downstream failures.
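Before you open lineage view, it can help to know whether a planned schema change is breaking at all. The following is a minimal Python sketch, not a Fabric feature: it assumes you can describe a table's schema as a {column: type} dictionary (the table, column, and type names are hypothetical), and it treats dropped or renamed columns and changed data types as breaking, while treating added columns as usually safe.

```python
from typing import Dict, List


def diff_schema(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two table schemas ({column: type}) and group the differences by risk."""
    removed = sorted(set(old) - set(new))   # dropped or renamed columns: breaking
    added = sorted(set(new) - set(old))     # new columns: usually safe for consumers
    retyped = sorted(
        col for col in set(old) & set(new) if old[col] != new[col]
    )                                       # changed data types: potentially breaking
    return {"removed": removed, "retyped": retyped, "added": added}


def is_breaking(diff: Dict[str, List[str]]) -> bool:
    """Any removed or retyped column means downstream items should be checked first."""
    return bool(diff["removed"] or diff["retyped"])


# Hypothetical "before" and "after" schemas for one lakehouse or warehouse table.
old_schema = {"CustomerID": "int", "Region": "string", "Sales": "decimal"}
new_schema = {"CustomerID": "int", "SalesRegion": "string", "Sales": "double"}

changes = diff_schema(old_schema, new_schema)
print(changes)
print(is_breaking(changes))
```

A rename shows up as one removed column plus one added column, which is exactly why renames break semantic models, views, and queries that still reference the old name.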

Dataflows Gen2

Dataflows often sit between raw data and analytics.

Changes can impact:

  • Lakehouses or Warehouses they load into
  • Semantic models consuming curated tables
  • Pipelines orchestrating refreshes

Semantic Models

Semantic models are among the most sensitive assets.

Changes may affect:

  • Reports and dashboards
  • Excel workbooks
  • Composite models
  • End-user self-service analytics

Exam note: Removing measures or renaming fields is high risk.

How to Perform Impact Analysis (High Level)

  1. Select the item (Lakehouse, Warehouse, Dataflow, or Semantic Model)
  2. Open Lineage view
  3. Review downstream dependencies
  4. Identify:
    • Reports
    • Semantic models (formerly called datasets)
    • Pipelines
    • Other dependent items
  5. Communicate or mitigate risk before making changes
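Conceptually, steps 3 and 4 walk the dependency graph outward from the selected item. The sketch below models that idea as a breadth-first traversal in Python over a hypothetical, hand-written dependency map (every item name is invented). In Fabric, lineage view builds and displays this graph for you, so treat this only as an illustration of what the view shows, not as a Fabric API.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency map: each item lists the items that consume it directly.
DOWNSTREAM: Dict[str, List[str]] = {
    "SalesDataflow": ["SalesLakehouse"],
    "SalesLakehouse": ["SalesModel", "ForecastNotebook"],
    "SalesModel": ["ExecReport", "RegionalReport"],
    "ExecReport": ["ExecDashboard"],
    "ForecastNotebook": [],
    "RegionalReport": [],
    "ExecDashboard": [],
}


def downstream_items(start: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return every item that directly or indirectly depends on `start`, nearest first."""
    seen = set()
    order = []
    queue = deque(graph.get(start, []))
    while queue:
        item = queue.popleft()
        if item in seen:  # an item can be reachable through more than one path
            continue
        seen.add(item)
        order.append(item)
        queue.extend(graph.get(item, []))
    return order


print(downstream_items("SalesLakehouse", DOWNSTREAM))
```

Starting from the dataflow instead returns all six other items, which is why changes further upstream tend to affect more of the solution.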

Impact Analysis in the Development Lifecycle

Impact analysis is typically performed:

  • Before deploying changes
  • Before modifying schemas
  • Before deleting items
  • During troubleshooting

It supports:

  • Safe Git commits
  • Controlled pipeline deployments
  • Production stability

Common Exam Scenarios

You may see questions such as:

  • A column change breaks multiple reports → impact analysis was skipped
  • An engineer needs to know which reports use a dataset → lineage view
  • A Lakehouse schema update affects downstream models → review dependencies
  • A dataset should not be modified due to executive reports → high downstream impact

Example:

Before removing a table from a semantic model, what should you do?
Correct concept: Perform impact analysis using lineage view.

Impact Analysis vs Deployment Pipelines

These concepts are related but distinct.

Feature | Impact Analysis | Deployment Pipelines
Purpose | Risk assessment | Controlled promotion
Enforced | No | Yes
Timing | Before changes | During deployment
Tool | Lineage view | Pipeline UI

Best Practices to Remember

  • Always check lineage before schema changes
  • Pay extra attention to semantic models and certified items
  • Communicate impacts to report owners
  • Pair impact analysis with:
    • Version control
    • Deployment pipelines
    • Endorsements and certification

Key Exam Takeaways

  • Impact analysis identifies downstream dependencies
  • Lineage view is the primary tool in Fabric
  • Applies to Lakehouses, Warehouses, Dataflows, and Semantic Models
  • Supports safe lifecycle and governance practices
  • A common scenario-based exam topic

Final Exam Tip

  • If a question asks “What will break if I change this?”, the answer is impact analysis via lineage view.
  • If it asks how to move changes safely, the answer is deployment pipelines or Git integration.
  • Expect questions that test:
    • When to perform impact analysis
    • Which items are affected by changes
    • Operational decision-making before deployments
  • Common traps:
    • Confusing impact analysis with lineage documentation
    • Assuming Fabric blocks breaking changes automatically
    • Forgetting semantic models are often the most impacted layer

Practice Questions

Question 1 (Single choice)

What is the PRIMARY purpose of impact analysis in Microsoft Fabric?

A. Improve query performance
B. Identify downstream objects affected by a change
C. Enforce data security policies
D. Reduce data refresh frequency

Correct Answer: B

Explanation:
Impact analysis helps you understand what items depend on a given artifact, so you can assess the risk of changes.

  • ❌ A: Performance tuning is separate
  • ❌ C: Security is not the focus
  • ❌ D: Refresh tuning is unrelated

Question 2 (Multi-select)

Which Fabric items can be analyzed for downstream dependencies? (Select all that apply.)

A. Lakehouses
B. Data warehouses
C. Dataflows
D. Semantic models

Correct Answers: A, B, C, D

Explanation:
Microsoft Fabric supports dependency tracking across all major analytical artifacts, enabling end-to-end lineage visibility.


Question 3 (Scenario-based)

You plan to rename a column in a lakehouse table. Which Fabric feature should you use FIRST?

A. Version control
B. Deployment pipeline
C. Impact analysis
D. Incremental refresh

Correct Answer: C

Explanation:
Renaming a column may break:

  • Semantic models
  • SQL queries
  • Reports

Impact analysis identifies what will be affected before the change.


Question 4 (Single choice)

Where do you access impact analysis for an item in Fabric?

A. Power BI Desktop
B. Microsoft Purview portal
C. Item settings in the Fabric workspace
D. Azure DevOps

Correct Answer: C

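Explanation:
Impact analysis is accessed directly from the item in its Fabric workspace, through the item's context menu or settings, rather than from a separate tool.

  • ❌ A: Power BI Desktop is an authoring tool and does not show workspace-level downstream dependencies
  • ❌ B: Purview focuses on governance and cataloging
  • ❌ D: Azure DevOps is not used for lineage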

Question 5 (Scenario-based)

A dataflow loads data into a lakehouse that feeds multiple semantic models. What does impact analysis show?

A. Only the lakehouse
B. Only the semantic models
C. All downstream dependencies
D. Only refresh schedules

Correct Answer: C

Explanation:
Impact analysis provides a full dependency graph, showing all downstream items affected by changes.


Question 6 (Multi-select)

Which changes typically REQUIRE impact analysis before execution? (Select all that apply.)

A. Dropping columns
B. Renaming tables
C. Changing data types
D. Adding a new report page

Correct Answers: A, B, C

Explanation:
Structural changes can break dependencies. Adding a report page does not affect downstream items.


Question 7 (Scenario-based)

A semantic model is used by several reports and dashboards. What happens if you delete the model without impact analysis?

A. Nothing; reports are cached
B. Reports automatically reconnect
C. Reports and dashboards break
D. Fabric blocks the deletion

Correct Answer: C

Explanation:
Deleting a semantic model removes the data source for:

  • Reports
  • Dashboards

Impact analysis helps prevent such disruptions.


Question 8 (Single choice)

Which view best represents impact analysis results?

A. Tabular grid
B. SQL execution plan
C. Dependency graph
D. DAX query view

Correct Answer: C

Explanation:
Impact analysis is presented as a visual dependency graph, showing upstream and downstream relationships.


Question 9 (Scenario-based)

Which role MOST benefits from performing impact analysis regularly?

A. Report consumers
B. Workspace admins and data engineers
C. End-user analysts
D. External auditors

Correct Answer: B

Explanation:
Admins and engineers are responsible for:

  • Schema changes
  • Deployments
  • Stability

Impact analysis supports safe operational changes.


Question 10 (Multi-select)

Which best practices apply when using impact analysis? (Select all that apply.)

A. Perform before structural changes
B. Use in conjunction with deployment pipelines
C. Skip for minor schema updates
D. Communicate findings to stakeholders

Correct Answers: A, B, D

Explanation:
Impact analysis should:

  • Precede schema changes
  • Inform deployment decisions
  • Be communicated to stakeholders

❌ C: “Minor” schema changes can still break dependencies.


Power BI load error: load was cancelled by error in loading a previous table

You may run into this error when loading data in Power BI:

"load was cancelled by error in loading a previous table"

If you get this error, scroll down through the error list to find the error that triggered it. This message only indicates that an earlier table in the load process failed; the original error is more descriptive and is the one to fix. Resolve that error (or errors) first, and this message will go away.

I hope you found this helpful.

Creating a DATE value in Power BI DAX, Power Query M, and Excel

You may at times need to create a date value in Power BI, using either DAX or M, or in Excel. This quick post describes how to create a date value in Power BI DAX, in the Power Query M language, and in Excel. Working with dates is an everyday task for anyone who works with data.

In Power BI DAX, the syntax is:

DATE(<year>, <month>, <day>) //the parameters must be valid numbers

DATE(2025, 8, 23) //returns August 23, 2025

In Power Query M, the syntax is:

#date(<year>, <month>, <day>) //the parameters must be valid numbers

#date(2022, 3, 6) //returns March 6, 2022

In Excel, the syntax is:

DATE(<year>, <month>, <day>) //the parameters must be valid numbers

DATE(1989, 12, 3) //produces 12/3/1989 (Excel actually returns the serial number 32845, its date-time code for that date, which displays as 12/3/1989 in a date-formatted cell)
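Excel's date-time code is a serial day count. As a cross-check, here is a small Python sketch that reproduces the serial number Excel stores for a date, assuming Excel's default 1900 date system (not the 1904 system used by some older Mac workbooks). The function name is my own, not part of any Excel API.

```python
from datetime import date

# In Excel's 1900 date system, 1900-01-01 is serial 1. Excel also treats 1900 as a
# leap year (a legacy quirk), so from 1900-03-01 onward the serial number equals
# the number of days since 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d: date) -> int:
    """Excel 1900-system serial number for a date on or after 1900-03-01."""
    if d < date(1900, 3, 1):
        raise ValueError("dates before 1900-03-01 are shifted by Excel's 1900 leap-year quirk")
    return (d - EXCEL_EPOCH).days


print(excel_serial(date(1989, 12, 3)))  # 32845
```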

Thanks for reading. Hope you found this useful.

Data Cleaning methods

Data cleaning is an essential step in the data preprocessing pipeline when preparing data for analytics or data science. It involves identifying and correcting or removing errors, inconsistencies, and inaccuracies in a dataset to improve its quality and reliability. Data should be cleaned before it is used in analysis, reporting, development, or integration. Here are some common data cleaning methods:

Handling missing values:

  • Delete rows or columns with a high percentage of missing values if they don’t contribute significantly to the analysis.
  • Impute missing values, either by replacing them with a statistical measure such as the mean, median, or mode, or by using more advanced techniques such as regression imputation or k-nearest neighbors imputation.
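To make the two options concrete, here is a minimal sketch in plain Python (standard library only, rows as dictionaries, None meaning "missing"). The 50% cutoff, the helper names, and the sample rows are assumptions for illustration; the imputation shown is simple median imputation, not the regression or k-nearest neighbors techniques mentioned above.

```python
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def drop_sparse_columns(rows: List[Row], max_missing: float = 0.5) -> List[Row]:
    """Drop columns whose share of missing (None) values exceeds max_missing."""
    if not rows:
        return []
    columns = sorted({key for row in rows for key in row})
    keep = [
        col for col in columns
        if sum(row.get(col) is None for row in rows) / len(rows) <= max_missing
    ]
    return [{col: row.get(col) for col in keep} for row in rows]


def impute_median(rows: List[Row], column: str) -> List[Row]:
    """Replace missing values in one column with the median of its observed values."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    fill = median(observed)
    return [{**row, column: fill if row.get(column) is None else row[column]} for row in rows]


# Hypothetical sample data.
rows = [
    {"age": 34.0, "income": 52000.0, "fax": None},
    {"age": None, "income": 61000.0, "fax": None},
    {"age": 29.0, "income": None, "fax": 5551234.0},
    {"age": 41.0, "income": 58000.0, "fax": None},
]

cleaned = drop_sparse_columns(rows)         # "fax" is 75% missing, so it is dropped
cleaned = impute_median(cleaned, "age")     # median of 34, 29, 41 is 34
cleaned = impute_median(cleaned, "income")  # median of 52000, 61000, 58000 is 58000
print(cleaned)
```

The median is used here rather than the mean because it is less sensitive to outliers; which statistic fits best depends on the column.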

Handling categorical variables:

  • Encode categorical variables into numerical representations using techniques like one-hot encoding, label encoding, or target encoding.
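A minimal plain-Python sketch of two of those encodings follows (target encoding is not shown). The function names and the sample colors are hypothetical; libraries such as scikit-learn or pandas provide production versions.

```python
from typing import Dict, List


def label_encode(values: List[str]) -> Dict[str, int]:
    """Map each distinct category to an integer code (sorted so codes are reproducible)."""
    return {cat: code for code, cat in enumerate(sorted(set(values)))}


def one_hot_encode(values: List[str]) -> List[Dict[str, int]]:
    """Turn each value into a row of 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [{f"is_{cat}": int(v == cat) for cat in categories} for v in values]


colors = ["red", "green", "red", "blue"]
print(label_encode(colors))       # {'blue': 0, 'green': 1, 'red': 2}
print(one_hot_encode(colors)[0])  # {'is_blue': 0, 'is_green': 0, 'is_red': 1}
```

Label encoding implies an order (blue < green < red) that may not exist, which is why one-hot encoding is often preferred for nominal categories with few distinct values.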

Removing duplicates:

  • Identify and remove duplicate records based on one or more key variables.
  • Be cautious when removing duplicates, as sometimes duplicated entries may be valid and intentional.
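Here is a minimal sketch of key-based deduplication in plain Python, keeping the first occurrence of each key. The helper name and sample orders are hypothetical. The second call shows why the choice of key matters.

```python
from typing import Dict, Iterable, List


def drop_duplicates(rows: List[Dict[str, object]], keys: Iterable[str]) -> List[Dict[str, object]]:
    """Keep the first row for each combination of key values; later repeats are dropped."""
    key_list = list(keys)
    seen = set()
    result = []
    for row in rows:
        signature = tuple(row.get(k) for k in key_list)  # key values must be hashable
        if signature in seen:
            continue
        seen.add(signature)
        result.append(row)
    return result


orders = [
    {"order_id": 1, "customer": "A", "amount": 10},
    {"order_id": 2, "customer": "B", "amount": 20},
    {"order_id": 1, "customer": "A", "amount": 10},  # true duplicate of the first row
    {"order_id": 3, "customer": "A", "amount": 10},  # a separate, legitimate order
]

print([r["order_id"] for r in drop_duplicates(orders, ["order_id"])])            # [1, 2, 3]
print([r["order_id"] for r in drop_duplicates(orders, ["customer", "amount"])])  # [1, 2]
```

Deduplicating on customer and amount wrongly discards order 3, a valid record that merely looks like a repeat.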

Handling outliers:

  • Identify outliers using statistical methods like z-scores, box plots, or domain knowledge.
  • Decide whether to remove outliers or transform them based on the nature of the data and the analysis goals.
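The two statistical checks mentioned above can be sketched with Python's statistics module: a z-score rule, and the 1.5 × IQR fences that a box plot draws. The thresholds are common defaults, not fixed rules, and the readings are made-up sample data.

```python
from statistics import mean, pstdev, quantiles
from typing import List


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[float]:
    """Values more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))                  # [95]
print(zscore_outliers(readings))               # []
print(zscore_outliers(readings, threshold=2))  # [95]
```

The empty z-score result is not a bug: with n values, no point can sit more than sqrt(n - 1) population standard deviations from the mean (about 2.45 for 7 values), so a cutoff of 3 can never fire on a sample this small. The IQR rule does not have this limitation.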

Correcting inconsistent data:

  • Standardize data formats: Convert data into a consistent format (e.g., converting dates to a specific format).
  • Resolve inconsistencies: Identify and correct inconsistent values (e.g., correcting misspelled words, merging similar categories).
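A minimal sketch of both ideas in plain Python follows. The accepted date formats, the variant-to-canonical mapping, and the function names are all assumptions for illustration. Format order matters for ambiguous strings: 03/06/2022 is read here as month/day (US style), and the %b month abbreviation assumes an English (C) locale.

```python
from datetime import datetime
from typing import Dict, List

# Assumed input formats, tried in order.
DATE_FORMATS: List[str] = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]


def standardize_date(text: str) -> str:
    """Parse a date written in any of DATE_FORMATS and return it as ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


# Hypothetical mapping of known variants and misspellings to one canonical category.
CATEGORY_FIXES: Dict[str, str] = {"ny": "New York", "n.y.": "New York", "new yrok": "New York"}


def clean_category(text: str) -> str:
    """Trim and lower-case the value, then map known variants to a canonical label."""
    return CATEGORY_FIXES.get(text.strip().lower(), text.strip())


print(standardize_date("03/06/2022"))  # 2022-03-06
print(standardize_date("6 Mar 2022"))  # 2022-03-06
print(clean_category("  N.Y. "))       # New York
```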

Dealing with irrelevant or redundant features:

  • Remove irrelevant features that do not contribute to the analysis or prediction task.
  • Identify and handle redundant features that provide similar information to avoid multicollinearity issues.
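One common way to spot redundant numeric features is to flag pairs with a very high absolute correlation. The sketch below computes the Pearson correlation by hand in plain Python. The 0.95 threshold, the helper names, and the sample columns are assumptions; height in inches is just height in centimeters rescaled (and rounded), so one of the two is redundant.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length numeric columns (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def redundant_pairs(features: Dict[str, List[float]], threshold: float = 0.95) -> List[Tuple[str, str]]:
    """Feature pairs whose absolute correlation meets the threshold (candidates to drop one of)."""
    return [
        (a, b)
        for a, b in combinations(sorted(features), 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "shoe_rank": [4.0, 1.0, 3.0, 2.0],
}
print(redundant_pairs(features))  # [('height_cm', 'height_in')]
```

Correlation only catches linear, pairwise redundancy; domain knowledge is still needed to decide which feature of a flagged pair to keep.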

Data normalization or scaling:

  • Normalize numerical features to a common scale (e.g., min-max scaling or z-score normalization) to prevent certain features from dominating the analysis due to their larger magnitudes.
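The two scaling methods named above come down to short formulas: min-max scaling maps x to (x - min) / (max - min), and z-score normalization maps x to (x - mean) / standard deviation. Here is a minimal plain-Python sketch; the sample values are made up, and the population standard deviation is an assumption (libraries differ on this).

```python
from statistics import mean, pstdev
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values: List[float]) -> List[float]:
    """Center values on 0 with a (population) standard deviation of 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


incomes = [30000.0, 45000.0, 60000.0]
ages = [20.0, 35.0, 50.0]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
print(min_max_scale(ages))     # [0.0, 0.5, 1.0]
```

After scaling, income and age sit on the same 0 to 1 range, so income no longer dominates distance-based methods simply because its raw numbers are larger.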

Data integrity issues:

Finally, you need to address data integrity issues.

  • Check for data integrity problems such as inconsistent data types, incorrect data ranges, or violations of business rules.
  • Resolve integrity issues by correcting or removing problematic data.
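A simple way to check integrity is to express each rule as a named check and report which rows break which rules. The sketch below is plain Python. The "orders" rules (type, range, and a date business rule), the field names, and the sample rows are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[dict], bool]]

# Hypothetical rules for an orders table: (description, check) pairs.
RULES: List[Rule] = [
    ("quantity is an int", lambda r: isinstance(r.get("quantity"), int)),
    ("quantity is between 1 and 1000",
     lambda r: isinstance(r.get("quantity"), int) and 1 <= r["quantity"] <= 1000),
    ("ship_date is not before order_date",
     lambda r: r.get("ship_date") is None or r["ship_date"] >= r["order_date"]),
]


def find_violations(rows: List[dict], rules: List[Rule]) -> Dict[int, List[str]]:
    """Map each failing row index to the descriptions of the rules it breaks."""
    violations: Dict[int, List[str]] = {}
    for i, row in enumerate(rows):
        failed = [desc for desc, check in rules if not check(row)]
        if failed:
            violations[i] = failed
    return violations


orders = [
    {"quantity": 5, "order_date": date(2024, 1, 10), "ship_date": date(2024, 1, 12)},
    {"quantity": "5", "order_date": date(2024, 1, 10), "ship_date": None},
    {"quantity": 0, "order_date": date(2024, 1, 10), "ship_date": date(2024, 1, 9)},
]
print(find_violations(orders, RULES))
```

Reporting violations rather than silently fixing them leaves the decision to correct or remove each row with someone who understands the business rules.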

It’s important to note that the specific data cleaning methods that need to be applied to a dataset will vary depending on the nature of the dataset, the analysis goals, and domain knowledge. It’s recommended to thoroughly understand the data and consult with domain experts when preparing to perform data cleaning tasks.