Category: Data Engineering

Create and Transform Columns (PL-300 Exam Prep)

This post is a part of the PL-300: Microsoft Power BI Data Analyst Exam Prep Hub; and this topic falls under these sections:
Prepare the data (25–30%)
--> Transform and load the data
--> Create and Transform Columns


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Columns are the foundation of data analysis in Power BI. The ability to create new columns and transform existing ones is essential for shaping your dataset into a structure that supports meaningful insights and accurate reports.

In the PL-300 exam, Microsoft tests not only whether you can perform transformations but also whether you understand when and why to apply them.


Why Create and Transform Columns?

Before data can be modeled and visualized:

  • It must be clean, consistent, and in the right format
  • New columns may be needed to support business logic
  • Transformations ensure data is reliable and analysis-ready

For example:

  • Converting text dates into true Date types
  • Extracting parts of a string (e.g., Year from a date)
  • Splitting a full name into first and last names
  • Normalizing inconsistent text values

These are not just useful—they are often necessary for accurate DAX measures and reporting.


Where Column Transformations Happen

Most column creation and transformation tasks happen in Power Query Editor (before the data loads into the model).

Key places include:

  • Transform tab
  • Add Column tab
  • Applied Steps pane
  • Advanced Editor (for M code)

Power BI also allows column creation after loading the data through:

  • DAX Calculated Columns (in the data model)

The exam may present scenarios where you choose which tool (Power Query vs DAX) to use.


Common Column Transformations

Here are the main categories of column operations you should be ready to apply:


1. Basic Transformations

These change existing columns:

  • Rename columns
  • Change data types
  • Trim, clean, or format text
  • Replace values
  • Remove columns

These are the bread-and-butter tasks that clean and standardize data.


2. Splitting and Merging Columns

When data is combined within one field:

  • Split Column (by delimiter or number of characters)
    • Example: Split Full NameFirst Name and Last Name
  • Merge Columns
    • Example: Combine City and State into a single location field

This is essential when data needs to be restructured for modeling.


3. Extracting Components

Examples include:

  • Extracting Year, Month, or Day from a Date column
  • Taking the first/last characters from a text string
  • Extracting text before or after a specific character

These operations prepare granular fields needed for grouping or calculations.


4. Calculations Using “Add Column”

You can create derived columns based on logic:

  • Custom Columns (via M formulas)
  • Conditional Columns
    • Example: Flag High Value sales where sales > $1,000
  • Index Columns
    • Useful for row ordering

These columns often support business metrics or classifications.


Text Transformations

Text columns commonly require cleaning and standardization:

  • Uppercase / Lowercase
  • Trim (removes leading/trailing spaces)
  • Clean (removes non-printable characters)
  • Replace Values (e.g., “N/A” → null)

The exam often tests whether you know how to fix inconsistent text data.


Date and Time Transformations

Working with dates is core to analysis:

  • Change text to date/time type
  • Extract Year, Quarter, Month, Day
  • Add custom time intelligence columns
  • Use locale conversion for date parsing

This enables time-based grouping and accurate measures like YTD (Year-to-Date).


Conditional and Custom Columns

Conditional Columns

  • Created through UI (Add Column → Conditional Column)
  • Define logic visually (e.g., if Sales > 500 then "High" else "Low")

Custom Columns

  • Created using Power Query M code
  • More advanced logic and functions

Both are useful depending on the complexity of your requirement. Exam questions often compare these approaches.


Column Transformations vs DAX Calculated Columns

Power Query Column

  • Transformation occurs before data loads into model
  • Changes physical data shape
  • Useful for cleaning and structuring data

DAX Calculated Column

  • Created after data loads into the model
  • Evaluated per row in the model
  • Useful for measures and relationships tied to data model context

Exam insight:
Use Power Query transformations for structural cleanup. Use DAX calculated columns when the logic depends on model relationships or evaluation context.


Best Practices for the Exam

  • Clean data before creating columns.
    Don’t derive new data from dirty input.
  • Apply the right transformation tool.
    Power Query for structural cleanup; DAX for model-aware calculations.
  • Name columns clearly.
    Report consumers and measures depend on intuitive names.
  • Avoid unnecessary columns.
    Only keep what’s needed for reporting to improve model performance.
  • Group related transformations logically.
    Use Query folding where possible (especially for large datasets).

How This Appears on the PL-300 Exam

You might see scenarios like:

You need to split a full address column into street, city, and postal code for better filtering. Which transformation should you use?

This tests:

  • Knowledge of Split Column
  • When to apply it
  • How to maintain data type integrity afterward

Or:

Your date column is text and not aggregating correctly. What do you do?

This tests:

  • Understanding of data types
  • Ability to convert to proper Date/Time

Most questions are scenario-based, requiring both decision and action reasoning.


Key Takeaways

✔ Column transformations are a core part of shaping data
✔ Power Query is the primary environment for creating and transforming columns
✔ Use Add Column for new fields and Transform for modifying existing fields
✔ Know the difference between Power Query and DAX calculated columns
✔ Common transformations include text, date, splitting/merging, conditional logic, and custom formulas


Practice Questions

Go to the Practice Exam Questions for this topic.

Resolve Data Import Errors (PL-300 Exam Prep)

This post is a part of the PL-300: Microsoft Power BI Data Analyst Exam Prep Hub; and this topic falls under these sections: 
Prepare the data (25–30%)
--> Profile and clean the data
--> Resolve data import errors


Note that there are 10 practice questions (with answers and explanations) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub's main page.

Data import errors are a common issue when bringing data into Power BI. These errors typically arise during the Power Query stage and must be resolved before data can be successfully loaded into the data model. The PL-300 exam tests your ability to identify, interpret, and fix these errors using Power Query’s built-in tools and transformations.


What Are Data Import Errors?

Import errors occur when Power BI cannot process or convert incoming data as expected. These errors can arise from:

  • Invalid data formats
  • Incompatible data types
  • Data corruption
  • Unexpected null or missing values
  • Transformation steps that fail

Identifying and resolving these errors early ensures that your dataset is clean, consistent, and ready for modeling and reporting.


Where Import Errors Occur

Import errors are most commonly encountered:

🧩 During Data Type Conversion

When the source value cannot be converted to the target type
(e.g., text "N/A" converted to number)

🧩 In Applied Steps

If a transformation step references a column that doesn’t exist
or expects a format that isn’t present

🧩 While Combining Queries

When merging or appending tables with mismatched structures

🧩 When Parsing Complex Formats

Such as dates in nonstandard formats or malformed JSON


How Power BI Signals Import Errors

In Power Query Editor, import errors are typically shown as:

  • Error icons in the preview cells
  • A warning message in the query results (“Error” link)
  • Red dotted underlines or warnings in applied steps
  • The “Load failed” message when refreshing

The first step in resolving errors is to examine the error details.


Viewing Error Details

When an error appears in Power Query:

  1. Click the Error indicator in the cell or
  2. Use View → Column quality / Column profile

You can also filter the column to show only error values by filtering on Errors.

Exam tip:
Power BI often shows technical error messages, so part of the task is interpreting what the underlying issue is (e.g., type mismatch, invalid format, null where not expected).


Common Import Errors & How to Fix Them

1. Type Conversion Errors

Scenario: A column expected to be numeric contains text such as "Unknown".

Fix Options:

  • Use Replace Errors to substitute a default value
  • Use Replace Values to convert specific text to numeric (e.g., "Unknown"0)
  • Adjust data type after cleaning

Key Idea: Always fix the root cause before changing the data type.


2. Unexpected Null Values

Scenario: A key column has nulls where values are required, causing subsequent transformations to fail.

Fix Options:

  • Replace nulls with default values via Replace Values
  • Remove rows where the column is null
  • Use conditional logic (Add Column → Conditional Column) to handle nulls appropriately

Key Idea: Nulls can break transformations (like merges) if not handled first.


3. Transformation Step Errors

Scenario: A transformation step refers to a column removed or renamed earlier in the applied steps.

Fix Options:

  • Review and reorder steps in the APPLIED STEPS pane
  • Rename the column consistently before referencing it
  • Delete the problematic step and reapply it correctly

Key Idea: Power BI applies steps sequentially. A downstream step can fail if an upstream change invalidates assumptions.


4. Merge/Append Structure Errors

Scenario: You merge or append tables that don’t share compatible column structures (e.g., mismatched data types).

Fix Options:

  • Ensure columns used for merger/join have identical data types
  • Rename or reorder columns to match structures
  • Preclean individual tables before combining

Key Idea: Always validate structure and types before merging or appending tables.


5. Parsing & Date Format Errors

Scenario: Date values import as text due to regional format differences (MM/DD/YYYY vs DD/MM/YYYY).

Fix Options:

  • Change the column data type to Date after validating format
  • Use Transform → Using Locale to define the correct regional format
  • Use Custom Columns to parse dates manually with Date.FromText

Key Idea: Locale-aware parsing helps resolve ambiguous date formats.


Tools to Help Diagnose Import Errors

Power BI provides several tools to help you locate and fix import errors:

🔍 Error Filtering

Filter columns to show only error rows.

📊 Column Quality / Distribution / Profile

Use profiling tools to identify patterns, nulls, and anomalies.

🧠 Step Validation

Hover over each Applied Step to see whether it is valid or failing.

📝 Advanced Editor

Review M code for logic errors or incorrect references.


Best Practices for Fixing Import Errors

1. Clean Before Converting Types
Always fix textual anomalies and nulls before assigning data types.

2. Avoid Hard-Coding Values
Replace problematic values using conditional logic or parameters for maintenance.

3. Inspect Impact of Each Step
Use the Applied Steps pane to ensure each transformation is valid.

4. Test Incrementally
Fix errors one at a time and refresh often to confirm success.

5. Document Assumptions
Add comments or descriptive step names to make logic clearer.


How This Appears on the PL-300 Exam

The exam commonly tests your ability to:

✔ Identify why a query fails (type mismatch, nulls, missing column)
✔ Choose the correct sequence to fix the issue
✔ Understand the difference between Replace Errors and Remove Errors
✔ Apply transformations in the correct order (clean → convert → transform)

Most questions are scenario-based, asking what action you would take next to successfully import data.


Key Exam Takeaways

  • Import errors can be caused by data type mismatches, unexpected nulls, invalid formats, and broken transformation steps.
  • Use Power Query tools to diagnose and resolve errors before loading data into the model.
  • Always understand the root cause before applying a fix.
  • Knowing how to use Replace Errors, Replace Values, Conditional Columns, and Data Type changes is essential.

Practice Questions

Go to the Practice Exam Questions for this topic.

Glossary – 100 “Data Engineering” Terms

Below is a glossary that includes 100 common “Data Engineering” terms and phrases in alphabetical order. Enjoy!

TermDefinition & Example
Access ControlManaging who can access data. Example: Role-based permissions.
At-Least-Once ProcessingData may be processed more than once. Example: Duplicate-safe pipelines.
At-Most-Once ProcessingData processed zero or one time. Example: No retries on failure.
BackfillProcessing historical data. Example: Reloading last year’s data.
Batch ProcessingProcessing data in scheduled chunks. Example: Daily sales aggregation.
Blue-Green DeploymentDeployment strategy minimizing downtime. Example: Switching pipeline versions.
Canary ReleaseGradual rollout to detect issues. Example: New pipeline tested on 5% of data.
Change Data Capture (CDC)Capturing database changes. Example: Streaming updates from OLTP DB.
CheckpointingSaving progress during processing. Example: Spark streaming checkpoints.
Cloud StorageScalable remote data storage. Example: Azure Data Lake Storage.
Cold StorageLow-cost storage for infrequent access. Example: Archived logs.
Columnar StorageData stored by column instead of row. Example: Parquet files.
CompressionReducing data size. Example: Gzip-compressed files.
Compute EngineSystem performing data processing. Example: Spark cluster.
Consumption LayerData prepared for analytics. Example: Gold layer.
Cost OptimizationReducing infrastructure costs. Example: Query optimization.
Curated LayerCleaned and transformed data. Example: Silver layer.
DAG (Directed Acyclic Graph)Workflow structure with dependencies. Example: Airflow pipeline.
Data CatalogSearchable inventory of data assets. Example: Azure Purview.
Data ContractAgreement defining data structure and expectations. Example: Producer guarantees column names and types.
Data EngineeringThe practice of designing, building, and maintaining data systems. Example: Creating pipelines that feed analytics dashboards.
Data GovernancePolicies for data management and usage. Example: Access control rules.
Data IngestionCollecting data from source systems. Example: Ingesting API data hourly.
Data LakeCentralized storage for raw data. Example: S3-based data lake.
Data LatencyTime delay in data availability. Example: 5-minute pipeline delay.
Data LineageTracking data flow from source to output. Example: Source-to-dashboard trace.
Data MartSubset of warehouse for specific use. Example: Finance data mart.
Data MaskingObscuring sensitive data. Example: Masked credit card numbers.
Data MeshDomain-oriented decentralized data ownership. Example: Teams own their data products.
Data ModelingDesigning data structures for usage. Example: Star schema design.
Data ObservabilityMonitoring data health and pipelines. Example: Freshness alerts.
Data Partition PruningSkipping irrelevant partitions. Example: Querying one date only.
Data PipelineAn automated process that moves and transforms data. Example: Nightly ETL job from CRM to warehouse.
Data PlatformIntegrated set of data tools. Example: End-to-end analytics stack.
Data ProductA dataset treated as a product. Example: Curated customer table.
Data ProfilingAnalyzing data characteristics. Example: Value distributions.
Data QualityAccuracy, completeness, and reliability of data. Example: No duplicate records.
Data ReplayReprocessing historical events. Example: Rebuilding aggregates from logs.
Data RetentionRules for data lifespan. Example: Delete logs after 1 year.
Data SecurityProtecting data from unauthorized access. Example: Encryption at rest.
Data SerializationConverting data for storage or transport. Example: Avro encoding.
Data SinkThe destination where data is stored. Example: Data warehouse.
Data SourceThe origin of data. Example: ERP system, SaaS application.
Data ValidationEnsuring data meets expectations. Example: Null checks.
Data VersioningTracking dataset changes. Example: Snapshot tables.
Data WarehouseOptimized storage for analytics queries. Example: Azure Synapse Analytics.
Dead Letter Queue (DLQ)Storage for failed records. Example: Invalid messages routed for review.
Dimension TableTable storing descriptive attributes. Example: Customer details.
ELTExtract, Load, Transform approach. Example: Transforming data inside Snowflake.
ETLExtract, Transform, Load process. Example: Cleaning data before loading into a database.
Event TimeTimestamp when event occurred. Example: User click time.
Event-Driven ArchitectureSystems reacting to events in real time. Example: Trigger pipeline on file arrival.
Exactly-Once ProcessingEnsuring data is processed only once. Example: Preventing duplicate events.
Fact TableTable storing quantitative measures. Example: Order transactions.
Fault ToleranceSystem resilience to failures. Example: Node failure recovery.
File FormatHow data is stored on disk. Example: Parquet, CSV.
Foreign KeyField linking tables together. Example: CustomerID in orders table.
Full LoadReloading all data. Example: Initial table population.
High AvailabilitySystem uptime and reliability. Example: Multi-zone deployment.
Hot StorageHigh-performance storage for frequent access. Example: Real-time tables.
IdempotencyAbility to rerun pipelines safely. Example: Reprocessing without duplicates.
Incremental LoadLoading only new or changed data. Example: CDC-based ingestion.
IndexingCreating structures to speed queries. Example: Index on order date.
Infrastructure as Code (IaC)Managing infrastructure via code. Example: Terraform scripts.
LakehouseHybrid of data lake and warehouse. Example: Databricks Lakehouse.
Late-Arriving DataData that arrives after expected time. Example: Delayed event logs.
LoggingRecording system events. Example: Job execution logs.
Message QueueBuffer for asynchronous data transfer. Example: Kafka topic for events.
MetadataData about data. Example: Table definitions and lineage.
MetricsQuantitative indicators of performance. Example: Rows processed per run.
OrchestrationCoordinating pipeline execution. Example: DAG scheduling.
PartitioningDividing data for performance. Example: Partitioning by date.
Personally Identifiable Information (PII)Data identifying individuals. Example: Email addresses.
Pipeline MonitoringTracking pipeline execution status. Example: Failure notifications.
Primary KeyUnique identifier for a record. Example: CustomerID.
Processing TimeTimestamp when data is processed. Example: Ingestion time.
Query OptimizationImproving query efficiency. Example: Predicate pushdown.
Raw LayerStorage of unprocessed data. Example: Bronze layer.
Real-Time DataData available with minimal latency. Example: Live dashboard updates.
Retry LogicAutomatic reruns on failure. Example: Retry failed ingestion job.
ScalabilityAbility to handle growing workloads. Example: Auto-scaling clusters.
SchedulerTool managing execution timing. Example: Cron, Airflow.
SchemaThe structure of a dataset. Example: Table columns and data types.
Schema EvolutionHandling schema changes over time. Example: Adding new columns safely.
Secrets ManagementSecure handling of credentials. Example: Key Vault for passwords.
Semi-Structured DataData with flexible schema. Example: JSON, Parquet.
ServerlessInfrastructure managed by provider. Example: Serverless SQL pools.
Serving LayerLayer optimized for consumption. Example: BI-ready tables.
ShardingDistributing data across nodes. Example: User data split across servers.
Snowflake SchemaNormalized version of star schema. Example: Product broken into sub-dimensions.
Star SchemaFact table surrounded by dimensions. Example: Sales fact with date dimension.
Stream ProcessingProcessing data in real time. Example: Clickstream event processing.
Structured DataData with a fixed schema. Example: SQL tables.
Technical DebtLong-term cost of quick fixes. Example: Hardcoded transformations.
ThroughputAmount of data processed per unit time. Example: Records per second.
Transformation LayerLayer where business logic is applied. Example: dbt models.
Unstructured DataData without a predefined structure. Example: Images, PDFs.
WatermarkMarker for processed data. Example: Last processed timestamp.
WindowingGrouping stream data by time windows. Example: 5-minute aggregations.
Workload IsolationSeparating workloads to avoid contention. Example: Dedicated compute pools.

Please share your suggestions for any terms that should be added.