Category: Data Development

Merging Two Excel Files or Sheets Using Power Query (with the merge based on Multiple Columns)

Excel Power Query is a powerful, no-code/low-code tool that lets you combine and transform data from multiple sources in a repeatable, refreshable way. One common use case is merging two Excel files or worksheets based on multiple matching columns, similar to a SQL join. Power Query is also a core component of Power BI, but it is fully available within Excel.

When to Use Power Query for Merging

Power Query is ideal when:

  • You receive recurring Excel files with the same structure
  • You need a reliable, refreshable merge process
  • You want to avoid complex formulas like VLOOKUP or XLOOKUP across many columns

Step-by-Step Overview

1. Load Both Data Sources into Power Query

  • Open Excel and go to Data → Get Data
  • Choose From Workbook (for separate files) or From Table/Range (for sheets in the same file)

Tip: Ensure the columns you plan to merge on have the same data types (e.g., text vs. number).


  • Load each dataset into Power Query as a separate query

2. Start the Merge Operation

  • In Power Query, select the primary table
  • Go to Home → Merge Queries
  • Choose the secondary table from the dropdown

3. Select Multiple Matching Columns

  • Click the first matching column in the primary table
  • Hold Ctrl (or Cmd on Mac) and select additional matching columns
  • Repeat the same column selections in the secondary table, in the same order

For example, if you needed to perform the merge on CustomerID, OrderDate, and Region, you would click CustomerID, then hold the Ctrl key and click OrderDate, then (while still holding Ctrl) click Region.

Power Query treats this as a composite key, and all selected columns must match for rows from both tables to merge.
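Behind the scenes, the Merge dialog writes a single M step for you. A minimal sketch of what that step looks like for a three-column merge is shown below; the query names (Orders, CustomerDetails) and column names are illustrative assumptions, not taken from a specific workbook:

= Table.NestedJoin(
      Orders,                                 // primary query
      {"CustomerID", "OrderDate", "Region"},  // key columns in the primary table
      CustomerDetails,                        // secondary query
      {"CustomerID", "OrderDate", "Region"},  // key columns in the secondary table, in the same order
      "CustomerDetails",                      // name of the new column of nested tables
      JoinKind.LeftOuter                      // join kind chosen in the dialog
  )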


4. Choose the Join Type

Select the appropriate join kind:

  • Left Outer – Keeps all rows from the first table (most common) and brings in the values from matching rows in the second table
  • Inner – Keeps only the rows that match in both tables
  • Full Outer – Keeps all rows from both tables, merging rows where there is a match and leaving the columns from the other table blank where there is not

Click OK to complete the merge.


5. Expand the Merged Data

  • A new column appears containing nested tables
  • Click the expand icon to select which columns to bring in (see the sketch below)
  • Remove unnecessary columns to keep the dataset clean
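The expand step that Power Query generates looks roughly like the following sketch; the step name, nested column name, and field names are illustrative assumptions:

= Table.ExpandTableColumn(
      #"Merged Queries",               // the step produced by the merge
      "CustomerDetails",               // the nested-table column created by the merge
      {"Segment", "Account Manager"},  // columns to bring in from the secondary table
      {"Segment", "Account Manager"}   // names to give them in the result
  )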

6. Load and Refresh

  • Click Close & Load
  • The merged dataset is now available in Excel
  • When source files change, simply click Refresh to update everything automatically

Key Benefits

  • Handles multi-column joins cleanly and reliably
  • Eliminates fragile lookup formulas
  • Fully refreshable and auditable
  • Scales well as data volume grows

In Summary

Using Power Query to merge Excel data on multiple columns brings database-style joins into Excel, making your workflows more robust, maintainable, and professional. Once set up, it saves time and reduces errors—especially for recurring reporting and analytics tasks.

Thanks for reading!

Creating a DATE value in Power BI DAX, Power Query M, and Excel

You may at times need to create a date value in Power BI, using either DAX or Power Query M, or in Excel. This quick post describes how to create a date value in all three. Working with dates is an everyday task for anyone who works with data.

In Power BI DAX, the syntax is:

DATE(<year>, <month>, <day>) //the parameters must be valid numbers

DATE(2025, 8, 23) //returns August 23, 2025

In Power Query M, the syntax is:

#date(<year>, <month>, <day>) //the parameters must be valid numbers

#date(2022, 3, 6) //returns March 6, 2022

In Excel, the syntax is:

DATE(<year>, <month>, <day>) //the parameters must be valid numbers

DATE(1989, 12, 3) //produces 12/3/1989 (officially returns a number that represents the date in Excel date-time code)

Thanks for reading. Hope you found this useful.

Using MAXX in Power BI to find the Latest Event Date across several event date columns in each row

We were working with some HR data which included multiple date fields such as Hire Date, Transfer Date, Promotion Date, and Termination Date. We needed to determine the most recent event date before termination. We ended up using the MAXX function to do this.

Sample dataset used to demonstrate the scenario:

Employee | Hire Date | Transfer Date | Promotion Date | Termination Date
Alice | 2020-01-15 | 2021-05-10 | 2022-03-20 | 2023-06-15
Bob | 2019-11-01 | 2020-07-15 | 2021-10-05 | (blank)
Carol | 2021-03-25 | (blank) | 2021-09-14 | 2022-02-28

The goal is to calculate the most recent event and event date (i.e., the latest event and its date) between Hire Date, Transfer Date, and Promotion Date for each row. Termination Date was excluded from the comparison because the goal was to find the latest event before Termination (if that had occurred).

Using MAXX for Row-by-Row Evaluation

MAXX is an iterator function in DAX, meaning it evaluates an expression for each row of a table, then returns the maximum value. Iterator functions such as MAXX and SUMX work row-by-row over a table, in contrast to aggregate functions like MAX and SUM which operate over an entire column at once.

  • Aggregate example (MAX): Finds the highest value in a column across all rows.
  • Iterator example (MAXX): Evaluates an expression for each row in a virtual table, then finds the highest value.

This makes MAXX ideal for scenarios like this where the various dates are in multiple columns of the same row, and we need to find the max of these dates on each row.

DAX Code Example: This is an example of the code that was used to derive the latest event date.

Latest Event Date =
MAXX(
    {
        [Hire Date],
        [Transfer Date],
        [Promotion Date]
    },
    [Value]
)

Code Explanation:

  1. Create a virtual table with one column and three rows—one for each date we want to consider.
  2. MAXX iterates through this virtual table, evaluates [Value] (the date) for each row, and returns the latest (maximum) date.

Expected Output based on the sample dataset:

Employee | Hire Date | Transfer Date | Promotion Date | Termination Date | Latest Event Date
Alice | 2020-01-15 | 2021-05-10 | 2022-03-20 | 2023-06-15 | 2022-03-20
Bob | 2019-11-01 | 2020-07-15 | 2021-10-05 | (blank) | 2021-10-05
Carol | 2021-03-25 | (blank) | 2021-09-14 | 2022-02-28 | 2021-09-14

This is much cleaner than using nested IF checks to determine the latest date / latest event for each record. Of course, the MAXX function can be used in other scenarios where you want to find the max value across multiple columns on each row.
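If you also need the name of the latest event, one possible approach (a sketch, not the code from the original project) is to compare the MAXX result back against each column with SWITCH, assuming the same column names:

Latest Event =
VAR LatestDate =
    MAXX( { [Hire Date], [Transfer Date], [Promotion Date] }, [Value] )
RETURN
    SWITCH(
        LatestDate,
        [Promotion Date], "Promotion",
        [Transfer Date], "Transfer",
        [Hire Date], "Hire"
    )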

Thanks for reading and I hope you found this useful!

Mastering the SWITCH Function in Power BI DAX – includes usage and code examples

When building measures and calculated columns in Power BI, we often face situations where we need to evaluate multiple possible outcomes and return different results based on conditions. While IF statements can handle this, they can become cumbersome, confusing, and hard to read when there are many branches.

The SWITCH function in DAX is a cleaner, more efficient alternative for handling multiple condition checks. Let’s take a look.

SWITCH Function Signature

The basic DAX syntax of the SWITCH function is:

SWITCH(<expression>,
       <value1>, <result1>,
       <value2>, <result2>,
       ...,
       [<else>])
  • <expression> – The value or expression you want to evaluate once.
  • <valueN> – A possible value that the expression could equal.
  • <resultN> – The result to return if the expression equals the corresponding <valueN>.
  • [<else>] – (Optional) The default result if none of the value-result pairs match.

Note: the <expression> is evaluated once and compared against multiple <value> options in sequence until a match is found.

Example 1 – Basic SWITCH Usage

Suppose we have a Customer[Category] column containing numeric codes:

  • 1 = “Bronze”
  • 2 = “Silver”
  • 3 = “Gold”

We can translate these codes into readable labels using SWITCH as in the following DAX code example:

Customer Category Label =
SWITCH(Customer[Category],
       1, "Bronze",
       2, "Silver",
       3, "Gold",
       "Unknown"
      )

Explanation:
– The Customer[Category] column is evaluated once.
– If it equals 1, "Bronze" is returned; if 2, "Silver"; if 3, "Gold".
– Otherwise, "Unknown" is returned.

Example 2 – Multiple Match Checks

If we wanted to calculate a commission rate based on a sales tier:

Commission Rate =
SWITCH(Sales[SalesTier],
       "Low", 0.02,
       "Medium", 0.05,
       "High", 0.08,
       0
)

Explanation:
– The Sales[SalesTier] column is evaluated once.
– If it equals "Low", 0.02 is returned; if "Medium", 0.05; if "High", 0.08.
– Otherwise, 0 is returned.

The SWITCH TRUE Pattern

This is a really cool and handy usage of the function. Sometimes, we don’t have a single value to compare against. Instead, we want to evaluate different logical conditions. In these cases, we can use the SWITCH TRUE pattern, which works like multiple IF statements but is much cleaner.

DAX Syntax:

SWITCH( TRUE(),
       <condition1>, <result1>,
       <condition2>, <result2>,
       ...,
       [<else>])

Here, TRUE() acts as the <expression>. Each <condition> is a Boolean expression that returns TRUE or FALSE. The function returns the corresponding result of the first condition that evaluates to TRUE.

Example 3 – SWITCH TRUE for Ranges

Suppose we want to grade students based on their score:

Grade =
SWITCH(TRUE(),
       Scores[Score] >= 90, "A",
       Scores[Score] >= 80, "B",
       Scores[Score] >= 70, "C",
       Scores[Score] >= 60, "D",
       "F"
)

Explanation:
– The Scores[Score] conditions are evaluated in order from top to bottom, and the first one that evaluates to TRUE returns its result.
– If Scores[Score] >= 90, “A” is returned; if Scores[Score] >= 80, “B”; if Scores[Score] >= 70, “C”; if Scores[Score] >= 60, “D”.
– Otherwise, “F” is returned.

Note: Other more complex conditions, such as ones using OR and AND logic or including complex calculations, can be used.
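For instance, here is a sketch of a compound condition; the Scores[Attendance] column is a hypothetical addition used only for illustration:

Grade Adjusted =
SWITCH(TRUE(),
       Scores[Score] >= 90 && Scores[Attendance] >= 0.9, "A+",
       Scores[Score] >= 90, "A",
       Scores[Score] >= 80 || Scores[Attendance] >= 0.95, "B",
       "C or below"
)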

Why SWITCH is such a great, clean, easy to use function:

  • No nested IFs.
  • Each condition is independent.
  • Easy to add or modify conditions.

When to Use SWITCH Instead of IF

While IF can achieve the same results, SWITCH has several advantages:

  1. Readability – SWITCH structures conditions in a clear, top-to-bottom list.
  2. Maintainability – Easier to add, remove, or change cases without dealing with messy nested parentheses.
  3. Performance – In some cases, SWITCH can be more efficient because the expression (in the basic form) is evaluated once, not multiple times as with nested IF statements.
  4. Logical Branching – The SWITCH TRUE pattern handles complex conditions without deep nesting.

Example 4 – IF vs. SWITCH

Let’s take a look at a comparison example:

IF Version:
Category Label =
IF(Customer[Category] = 1, "Bronze",
    IF(Customer[Category] = 2, "Silver",
        IF(Customer[Category] = 3, "Gold", "Unknown")
    )
)

SWITCH Version:
Category Label =
SWITCH(Customer[Category],
    1, "Bronze",
    2, "Silver",
    3, "Gold",
    "Unknown"
)

Result: As you can see, the SWITCH version is shorter, easier to read, less error-prone, and easier to maintain.

Key Takeaways

  • Use basic SWITCH when comparing one expression to multiple possible values.
  • Use SWITCH TRUE when checking multiple conditions or ranges.
  • SWITCH often results in cleaner, more maintainable DAX than deeply nested IF statements.
  • Always include a default (else) value to handle unexpected cases.

Thanks for reading and I hope you found this useful.

Calculated Columns vs. Measures in Power BI: Understanding the Differences

When working in Power BI, two common ways to add custom calculations to your data model are calculated columns and measures. While they both use DAX (Data Analysis Expressions), their purposes, storage, and performance implications differ significantly. Understanding these differences can help you design more efficient and maintainable Power BI reports.

1. What They Are

Calculated Column
A calculated column is a new column added to a table in your data model. It is calculated row-by-row based on the existing data and stored in the model like any other column.

Measure
A measure is a calculation that is evaluated on the fly, usually aggregated at the visual level. Measures don’t exist as stored data in your table—they are computed dynamically based on filter context.

To create a Calculated Column or a Measure, go to either the Home ribbon or the Table Tools ribbon and select “New Column” (to create a Calculated Column) or “New Measure” (to create a Measure). Then enter the relevant DAX for the column or measure, as shown in the next section.

2. DAX Syntax Examples

Imagine a Sales table with columns: Product, Quantity, and Unit Price.

Calculated Column Example
Creating a calculated column:
Total Price = Sales[Quantity] * Sales[Unit Price]

This new column will appear in the table and will be stored for every row in the Sales table.

Measure Example
Creating a measure:
Total Sales = SUMX(Sales, Sales[Quantity] * Sales[Unit Price])

This measure calculates the total across all rows in the current filter context—without physically storing a column for every row.

3. When They Are Computed

Feature | Calculated Column | Measure
When computed | During data model processing (data refresh) | At query time (when a visual or query is run)
Where stored | In-memory within the data model (VertiPaq storage) | Not stored; calculated on demand
Performance impact | Increases model size (RAM & disk space) | Consumes CPU at query time; minimal storage overhead

4. Storage and Performance Implications

  • Calculated Columns
    • RAM & Disk Space: Stored in VertiPaq compression format. Large columns increase your .pbix file size and memory footprint.
    • CPU: Low impact at query time since results are precomputed, but refresh time increases.
    • Good for: Fields you need for filtering, sorting, or joining tables.
  • Measures
    • RAM & Disk Space: No significant impact on storage since they’re not persisted.
    • CPU: Can be CPU-intensive if the calculation is complex and used across large datasets.
    • Good for: Aggregations, KPIs, and calculations that change based on slicers or filters.

5. When to Use Each

When to Use a Calculated Column

  • You need a field for row-level filtering or grouping in visuals.
  • You need a column to create relationships between tables.
  • The calculation is row-specific and independent of report filters.

Example:

Sales Category = IF(Sales[Quantity] > 100, "High Volume", "Low Volume")

When to Use a Measure

  • You want calculations that respond dynamically to slicers and filters.
  • You want to avoid inflating your data model with unnecessary stored columns.
  • The calculation is aggregate-based.

Example:

Average Order Value = DIVIDE([Total Sales], DISTINCTCOUNT(Sales[Order ID]))

6. When They Cannot Be Used

Situation | Calculated Column | Measure
Relationship creation | ✅ Can be used | ❌ Cannot be used
Row-level filtering in slicers | ✅ Can be used | ❌ Cannot be used
Dynamic response to slicers | ❌ Cannot recalculate | ✅ Fully dynamic
Reduce model size | ❌ Adds storage | ✅ No storage impact

7. Summary Table

Feature | Calculated Column | Measure
Stored in model | Yes | No
Calculated at | Data refresh | Query time
Memory impact | Higher (stored per row) | Minimal
Disk size impact | Higher | Minimal
Dynamic filters | No | Yes
Best for | Filtering, relationships, sorting | Aggregations, KPIs, dynamic calculations

8. Best Practices

  • Default to measures when possible—they’re lighter and more flexible.
  • Use calculated columns sparingly, only when the calculation must exist at the row level in the data model.
  • If a calculated column is only used in visuals, try converting it to a measure to save memory (see the sketch after this list).
  • Be mindful of CPU impact for very complex measures—optimize DAX to avoid performance bottlenecks.
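To illustrate the third point, suppose the earlier Sales Category column were only used to count high-volume rows in a visual. A minimal sketch of a measure that could replace it, assuming the same Sales table:

High Volume Orders =
CALCULATE(
    COUNTROWS(Sales),
    Sales[Quantity] > 100
)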

I hope this was helpful in clarifying the differences between Calculated Columns and Measures, and will help you to determine which you need in various scenarios for your Power BI solutions.

Thanks for reading!

Understanding UNION, INTERSECT, and EXCEPT in Power BI DAX

When working with data in Power BI, it’s common to need to combine, compare, or filter tables based on their rows. DAX provides three powerful table / set functions for this: UNION, INTERSECT, and EXCEPT.

These functions are especially useful in advanced calculations, comparative analysis, and custom table creation in reports. If you have used these functions in SQL, the concepts here will be familiar.

Sample Dataset

We’ll use the following two tables throughout our examples:

Table: Sales_2024

The Sales_2024 table was created using DAX code utilizing the DATATABLE function (or you could enter the data directly using the Enter Data feature in Power BI).
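The original table and code are not reproduced here, but a minimal sketch of what such a definition could look like follows; the column names and rows are illustrative assumptions chosen to be consistent with the examples later in this post:

Sales_2024 =
DATATABLE(
    "Product", STRING,
    "Region", STRING,
    "Amount", INTEGER,
    {
        { "A", "East", 100 },
        { "B", "North", 150 },
        { "D", "West", 180 }
    }
)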

Table: Sales_2025

The Sales_2025 table was also created using DAX code utilizing the DATATABLE function (or, again, you could enter the data directly using the Enter Data feature in Power BI).
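Again, a comparable illustrative sketch (note the duplicated “D – West – 180” row, which the INTERSECT example below relies on):

Sales_2025 =
DATATABLE(
    "Product", STRING,
    "Region", STRING,
    "Amount", INTEGER,
    {
        { "C", "South", 200 },
        { "D", "West", 180 },
        { "D", "West", 180 }
    }
)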

Now that we have our two test tables, we can use them to explore the three table / set functions – UNION, INTERSECT, and EXCEPT.

1. UNION – Combine Rows from Multiple Tables

The UNION function returns all rows from both tables, including duplicates. It requires that the tables being combined have the same number of columns and compatible data types in corresponding columns. The column names do not have to match, but the number of columns and their data types must.

DAX Syntax:

UNION(<Table1>, <Table2>)

For our example, here is the syntax and resulting dataset:

UnionTable = UNION(Sales_2024, Sales_2025)

As you can see, the UNION returns all rows from both tables, including duplicates.

If you were to reverse the order of the tables in the function call, the result contains the same rows.

To remove duplicates, you can wrap the UNION inside a DISTINCT() function call, as shown below:
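For example (the new table name is illustrative):

DistinctUnionTable = DISTINCT(UNION(Sales_2024, Sales_2025))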

2. INTERSECT – Returns Rows Present in Both Tables

The INTERSECT function returns only the rows that appear in both tables (based on exact matches across all columns).

DAX Syntax:

INTERSECT(<Table1>, <Table2>)

For our example, here is the syntax and resulting dataset:

IntersectTable = INTERSECT(Sales_2024, Sales_2025)

Only the rows in Sales_2024 that are also found in Sales_2025 are returned.

If you were to reverse the order of the tables, you would get the following result:

IntersectTableReverse = INTERSECT(Sales_2025, Sales_2024)

In this case, it returns only the rows in Sales_2025 that are also found in Sales_2024. Since the record with “D – West – $180” exists twice in Sales_2025 and also exists in Sales_2024, both records are returned. So, while it might not be relevant for all datasets, order does matter when using INTERSECT.

3. EXCEPT – Returns Rows in One Table but Not the Other

The EXCEPT function returns rows from the first table that do not exist in the second.

DAX Syntax:

EXCEPT(<Table1>, <Table2>)

For our example, here is the syntax and resulting dataset:

ExceptTable = EXCEPT(Sales_2024, Sales_2025)

Only the rows in Sales_2024 that are not in Sales_2025 are returned.

If you were to reverse the order of the tables, you would get the following result:

ExceptTableReverse = EXCEPT(Sales_2025, Sales_2024)

Only the rows in Sales_2025 that are not in Sales_2024 are returned. Therefore, as you have seen, since it pulls data from the first table that does not exist in the second, order does matter when using EXCEPT.

Comparison table summarizing the 3 functions:

Function | UNION | INTERSECT | EXCEPT
Purpose & Output | Returns all rows from both tables | Returns rows that appear in both tables (i.e., rows that match across all columns) | Returns rows from the first table that do not exist in the second
Match Criteria | Column position (number of columns) and data types | Column position (number of columns), data types, and values | Column position (number of columns) and data types must match; values must not match
Order Sensitivity | Order does not matter | Order matters if you want duplicates returned when they exist in the first table | Order matters
Duplicate Handling | Keeps duplicates; they can be removed with DISTINCT() | Returns duplicates only if they exist in the first table | Returns duplicates only if they exist in the first table

Additional Notes for your consideration:

  • Column Names: Only the column names from the first table are kept; the second table’s columns must match in count and data type.
  • Performance: On large datasets, these functions can be expensive, so you should consider filtering the data before using them.
  • Case Sensitivity: String comparisons are generally case-insensitive in DAX.
  • Real-World Use Cases:
    • UNION – Combining a historical dataset and a current dataset for analysis.
    • INTERSECT – Finding products sold in both years.
    • EXCEPT – Identifying products discontinued or newly introduced.

Thanks for reading!

How to enter data using DAX DATATABLE function in Power BI

I previously posted an article about how to manually enter data into Power BI using the Enter Data feature. That article can be found here: https://thedatacommunity.org/2024/08/03/how-to-enter-data-manually-and-update-it-in-power-bi/. This post shares how to use DAX to create a table with data in Power BI, an alternative way of creating data manually, which can be useful in many scenarios.

In Power BI Desktop, from the Table tools menu, click New Table:

Enter the code for the new table and data as shown in the example below:

The example code below shows the syntax for the DATATABLE function. You specify the table name (in this example, “Games Table created using DAX”), the column names with their corresponding data types, and then the data as lists of values within braces, one list per row. STRING values are enclosed in quotes as usual.
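The original screenshot is not reproduced here, but the sketch below shows the general shape of such a definition; the column names and rows are illustrative assumptions:

Games Table created using DAX =
DATATABLE(
    "Game", STRING,
    "Genre", STRING,
    "Release Year", INTEGER,
    {
        { "Chess", "Strategy", 1475 },
        { "Monopoly", "Board", 1935 },
        { "Tetris", "Puzzle", 1984 }
    }
)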

Then click the checkmark to run the code/create the table with data.

When you review the table in Table View, it looks just like any other table.

This table and data can then be used as any other table and data loaded into Power BI.

Thanks for reading. I hope you found this useful.

Data Cleaning methods

Data cleaning is an essential step in the data preprocessing pipeline when preparing data for analytics or data science. It involves identifying and correcting or removing errors, inconsistencies, and inaccuracies in the dataset to improve its quality and reliability. It is essential that data is cleaned before being used in analyses, reporting, development or integration. Here are some common data cleaning methods:

Handling missing values:

  • Delete rows or columns with a high percentage of missing values if they don’t contribute significantly to the analysis.
  • Impute missing values by replacing them with a statistical measure such as mean, median, mode, or using more advanced techniques like regression imputation or k-nearest neighbors imputation (a small sketch of median imputation follows this list).
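As one concrete illustration of simple imputation, here is a Power Query (M) sketch that replaces nulls in a numeric column with that column's median; the SourceTable query and Amount column names are assumptions:

let
    // SourceTable is assumed to be a previously loaded query with a numeric Amount column
    MedianAmount = List.Median(List.RemoveNulls(SourceTable[Amount])),
    ImputedTable = Table.ReplaceValue(SourceTable, null, MedianAmount, Replacer.ReplaceValue, {"Amount"})
in
    ImputedTable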

Handling categorical variables:

  • Encode categorical variables into numerical representations using techniques like one-hot encoding, label encoding, or target encoding.

Removing duplicates:

  • Identify and remove duplicate records based on one or more key variables.
  • Be cautious when removing duplicates, as sometimes duplicated entries may be valid and intentional.

Handling outliers:

  • Identify outliers using statistical methods like z-scores, box plots, or domain knowledge.
  • Decide whether to remove outliers or transform them based on the nature of the data and the analysis goals.

Correcting inconsistent data:

  • Standardize data formats: Convert data into a consistent format (e.g., converting dates to a specific format).
  • Resolve inconsistencies: Identify and correct inconsistent values (e.g., correcting misspelled words, merging similar categories).

Dealing with irrelevant or redundant features:

  • Remove irrelevant features that do not contribute to the analysis or prediction task.
  • Identify and handle redundant features that provide similar information to avoid multicollinearity issues.

Data normalization or scaling:

  • Normalize numerical features to a common scale (e.g., min-max scaling or z-score normalization) to prevent certain features from dominating the analysis due to their larger magnitudes.

Data integrity issues:

Finally, you need to address data integrity issues.

  • Check for data integrity problems such as inconsistent data types, incorrect data ranges, or violations of business rules.
  • Resolve integrity issues by correcting or removing problematic data.

It’s important to note that the specific data cleaning methods that need to be applied to a dataset will vary depending on the nature of the dataset, the analysis goals, and domain knowledge. It’s recommended to thoroughly understand the data and consult with domain experts when preparing to perform data cleaning tasks.

Transpose vs. Pivot vs. Unpivot in Power BI

In this post, I will quickly show how to perform 3 key transformations in Power BI – Transpose, Pivot, and Unpivot – and what impact they have on the data they are applied to. These transformations are often needed to manipulate the original data to get it into the shape needed to create the Power BI model and reports. We will use a simple example so that you can see clearly the effect of each transformation.

This is the original dataset – Student Grades:

Transpose

Transpose the data by going to the Transform ribbon and clicking “Transpose”.

This is what the data looks like after the Transpose transformation is applied. The rows are converted into columns, and the columns are converted into rows. Default headers / column names are applied.

Pivot Column

Starting with the original dataset, we will now Pivot the dataset. Go to the Transform ribbon and click “Pivot Column”.

The Pivot Column dialog opens. As noted, it will use the values in the Names column to create new columns, and will use the Score column as the Values Column.

This is what the data looks like after Pivot is applied as described above. Pivot Column converts the selected Name column values into column headers.

Unpivot Columns

Starting with the original dataset, we will now Unpivot the dataset. First select all 4 name columns by using “Ctrl + Click”.

Then, go to the Transform ribbon and click “Unpivot Columns”. This is what the data looks like after the Unpivot Columns transformation is applied. Unpivot Columns converts the selected columns (the name columns in this case) into a column of attribute names (Attribute), with the corresponding cell values stored in the Value column.
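For reference, the kind of M these three ribbon buttons generate looks roughly like the sketches below; Source stands for the previous step, and the Names, Score, and student column names are assumptions based on the example above:

// Transpose: rows become columns and columns become rows
Transposed = Table.Transpose(Source)

// Pivot Column: values in the Names column become new column headers, filled from the Score column
Pivoted = Table.Pivot(Source, List.Distinct(Source[Names]), "Names", "Score")

// Unpivot Columns: the selected columns collapse into Attribute/Value pairs
Unpivoted = Table.Unpivot(Source, {"Alice", "Bob", "Carol", "Dave"}, "Attribute", "Value")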

Thanks for reading!

SQL Tips: How to rename a column in a table – Oracle database – Oracle SQL

At times you will need to change the name of a column in an existing table. If you are not changing the data type, it is just one statement / step that needs to be executed. However, I strongly recommend that you also do a backup step, especially if you’re making the change in a production environment, just in case of an unexpected issue.

If you choose to do the backup, you may perform this with a “create-table-as-select” statement in this form:

create table [table_name_backup] as select * from [table_name];

Here is an example of the above statement:

create table EMPLOYEES_BKUP as select * from EMPLOYEES;

Now that the table you are modifying is all backed up, you can proceed to rename the column.

The rename SQL statement would take this form:

alter table [table_name] rename column [existing_column_name] to [new_column_name];

An example of the statement:

alter table EMPLOYEES rename column SEX to GENDER;

Thanks for reading! I hope you found this information useful.