Tag: SQL

DP-700, Python, SQL June 3, 2026

Transform data by using PySpark, SQL, and KQL (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform batch data
      --> Transform data by using PySpark, SQL, and KQL

Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important skills for the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric certification exam is knowing how to transform data using the appropriate technology. Microsoft Fabric provides multiple transformation engines, each optimized for specific workloads:

PySpark for large-scale distributed data engineering and advanced transformations
SQL for relational data manipulation, warehousing, and analytics
KQL (Kusto Query Language) for high-volume log, telemetry, event, and time-series data analysis

A successful Fabric Data Engineer must understand not only how each technology works, but also when to choose one over another.

Understanding the Transformation Options in Microsoft Fabric

Microsoft Fabric supports several data processing experiences:

Technology	Primary Use Case	Common Fabric Components
PySpark	Big data processing and engineering	Lakehouse, Notebooks
SQL	Relational transformations and analytics	Warehouse, SQL Endpoint
KQL	Streaming, telemetry, logs, event analytics	Eventhouse, Real-Time Intelligence

While all three can transform data, they are designed for different scenarios.

Transforming Data with PySpark

What is PySpark?

PySpark is the Python API for Apache Spark.

Spark is a distributed processing engine that allows data engineers to process extremely large datasets across multiple nodes simultaneously.

Within Microsoft Fabric, PySpark is typically used in:

Notebooks
Lakehouses
Spark Job Definitions

When to Use PySpark

PySpark is ideal when:

Working with large-scale datasets
Performing complex transformations
Processing semi-structured data
Building data engineering pipelines
Performing machine learning preparation
Handling Delta Lake tables

Examples include:

Cleaning raw data
Parsing JSON files
Aggregating billions of records
Creating dimensional model tables
Performing data quality checks

Reading Data with PySpark

Example:

df = spark.read.format("delta").load("Tables/Sales")

Filtering Data

filtered_df = df.filter(df.Amount > 1000)

Creating New Columns

			
from pyspark.sql.functions import col

new_df = df.withColumn(
    "TaxAmount",
    col("Amount") * 0.07
)

Aggregating Data

			
from pyspark.sql import functions as F

# The F alias avoids shadowing Python's built-in sum()
summary_df = (
    df.groupBy("Region")
      .agg(F.sum("Amount").alias("TotalSales"))
)

Writing Results

summary_df.write.mode("overwrite").saveAsTable("SalesSummary")

PySpark Advantages

Scalability

Handles terabytes and petabytes of data.

Distributed Processing

Automatically parallelizes workloads.

Flexibility

Supports:

Structured data
Semi-structured data
Unstructured data

Data Engineering Focus

Excellent for ETL and ELT processes.

PySpark Limitations

More complex than SQL
Requires programming skills
Less familiar to business analysts
Higher resource consumption for small workloads

Transforming Data with SQL

What is SQL in Fabric?

SQL remains one of the most commonly used languages in Fabric.

You can use SQL within:

Fabric Data Warehouse
Lakehouse SQL Endpoint
SQL Query Editor
Stored Procedures
Data Pipelines

When to Use SQL

SQL is ideal for:

Relational transformations
Data warehouse development
Reporting datasets
Aggregations
Joins
Dimensional modeling

Examples:

Creating fact tables
Loading dimensions
Building reporting views
Data validation

Filtering Records

			
SELECT *
FROM Sales
WHERE Amount > 1000;

Aggregations

			
SELECT
    Region,
    SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Region;

		

Joining Tables

			
SELECT
    s.SaleID,
    c.CustomerName
FROM Sales s
INNER JOIN Customer c
    ON s.CustomerID = c.CustomerID;

		

Creating Transformation Tables

			
CREATE TABLE SalesSummary AS
SELECT
    Region,
    SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Region;
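
To see what a create-table-as-select statement like the one above produces, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in for a Fabric Warehouse here, and the three sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Region TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales (Region, Amount) VALUES (?, ?)",
    [("East", 1200.0), ("East", 300.0), ("West", 500.0)],  # hypothetical rows
)

# Same shape as the statement in the post: aggregate and persist in one step.
conn.execute(
    """
    CREATE TABLE SalesSummary AS
    SELECT Region, SUM(Amount) AS TotalSales
    FROM Sales
    GROUP BY Region
    """
)

summary = conn.execute(
    "SELECT Region, TotalSales FROM SalesSummary ORDER BY Region"
).fetchall()
print(summary)  # [('East', 1500.0), ('West', 500.0)]
```

The new table holds one row per region, which is the same result the PySpark groupBy example writes with saveAsTable.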

		

SQL Advantages

Familiarity

Most data professionals know SQL.

Readability

Easy to understand and maintain.

Relational Optimization

Optimized for joins and aggregations.

Warehousing Support

Ideal for star schemas and dimensional models.

SQL Limitations

Less effective for complex data engineering workflows
Not ideal for large-scale semi-structured data processing
Limited flexibility compared to PySpark

Transforming Data with KQL

What is KQL?

Kusto Query Language (KQL) is a read-only query language designed for:

Telemetry
Log analytics
Event processing
Streaming data
Time-series analysis

KQL is commonly used in:

Eventhouse
Real-Time Intelligence
KQL Databases

When to Use KQL

Use KQL when working with:

Sensor data
IoT events
Application logs
Security monitoring
Streaming datasets
Time-series analytics

Examples:

Monitoring manufacturing equipment
Detecting anomalies
Security event analysis
Operational dashboards

Filtering Data

			
Events
| where Temperature > 100

Summarization

			
Events
| summarize AvgTemp = avg(Temperature)
    by DeviceID

Time-Series Analysis

			
Events
| summarize Count=count()
    by bin(Timestamp, 1h)
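
If the bin() function is unfamiliar, the following Python sketch mimics what the query above does: it rounds each timestamp down to the start of its hour and counts the events in each bucket. The helper name bin_to_hour and the sample timestamps are hypothetical and only illustrate the idea; this is not how KQL is implemented.

```python
from collections import Counter
from datetime import datetime


def bin_to_hour(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its hour, similar to bin(Timestamp, 1h)."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Hypothetical event timestamps; only the time matters for this count.
events = [
    datetime(2026, 6, 3, 9, 5),
    datetime(2026, 6, 3, 9, 47),
    datetime(2026, 6, 3, 10, 1),
    datetime(2026, 6, 3, 10, 30),
    datetime(2026, 6, 3, 10, 59),
]

# Equivalent in spirit to: summarize Count=count() by bin(Timestamp, 1h)
counts = Counter(bin_to_hour(ts) for ts in events)
for hour, count in sorted(counts.items()):
    print(hour.isoformat(), count)
```

This prints two buckets: 2 events for the 09:00 hour and 3 events for the 10:00 hour.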

Detecting Trends

			
Events
| make-series AvgTemp=avg(Temperature)
    on Timestamp
    step 1h

KQL Advantages

High Performance

Optimized for large event datasets.

Time-Series Analytics

Excellent for temporal analysis.

Streaming Support

Designed for real-time workloads.

Fast Query Execution

Ideal for operational dashboards.

KQL Limitations

Not intended for traditional data warehousing
Less suitable for dimensional modeling
Not commonly used for batch ETL

Comparing PySpark, SQL, and KQL

Requirement	Best Choice
Large-scale ETL	PySpark
Data warehouse transformations	SQL
Star schema creation	SQL
Streaming analytics	KQL
Time-series analysis	KQL
Semi-structured JSON processing	PySpark
Machine learning preparation	PySpark
Business reporting datasets	SQL
Eventhouse analytics	KQL
Massive Delta Lake processing	PySpark

Choosing the Right Transformation Tool

Choose PySpark When

Processing very large datasets
Working with Data Lake data
Building engineering pipelines
Handling JSON or Parquet files
Performing advanced transformations

Choose SQL When

Building warehouses
Creating dimensional models
Developing reporting datasets
Performing relational transformations
Creating views and stored procedures

Choose KQL When

Working with event streams
Analyzing telemetry
Investigating logs
Performing time-series analysis
Monitoring operational systems

Exam Tips

Know the Primary Use Cases

A common DP-700 exam question asks which technology is most appropriate for a scenario.

Remember:

PySpark = Big Data Engineering
SQL = Relational Analytics and Warehousing
KQL = Real-Time and Time-Series Analytics

Understand Fabric Components

Know where each technology is primarily used:

Technology	Fabric Experience
PySpark	Lakehouse, Notebook
SQL	Warehouse, SQL Endpoint
KQL	Eventhouse

Focus on Scenario-Based Questions

The exam frequently describes a business requirement and asks which technology should be used.

For example:

IoT sensors → KQL
Warehouse dimension tables → SQL
Processing billions of JSON records → PySpark

Practice Exam Questions

Question 1

A data engineer must transform 20 TB of semi-structured JSON data stored in OneLake. Which technology is the best choice?

A. SQL

B. PySpark

C. KQL

D. Power Query

Answer: B

Explanation: PySpark is designed for distributed processing of massive datasets and handles semi-structured formats such as JSON efficiently.

Question 2

A Fabric solution requires creation of a star schema consisting of fact and dimension tables. Which technology is most appropriate?

A. SQL

B. KQL

C. Power BI DAX

D. Data Activator

Answer: A

Explanation: SQL is optimized for relational transformations and dimensional modeling commonly used in data warehouses.

Question 3

A company wants to analyze millions of IoT events arriving continuously from factory equipment. Which technology should be used?

A. KQL

B. Power Query

C. SQL

D. Excel

Answer: A

Explanation: KQL is designed specifically for high-volume event, telemetry, and time-series analysis workloads.

Question 4

Which Fabric component is most closely associated with KQL transformations?

A. Warehouse

B. Notebook

C. SQL Endpoint

D. Eventhouse

Answer: D

Explanation: Eventhouse is the primary Fabric experience for KQL-based analytics and real-time intelligence workloads.

Question 5

A data engineer needs to process Delta Lake tables using distributed compute. Which technology should be selected?

A. KQL

B. SQL

C. PySpark

D. Power BI

Answer: C

Explanation: PySpark integrates directly with Delta Lake and supports scalable distributed processing.

Question 6

Which language is specifically optimized for time-series analysis?

A. SQL

B. KQL

C. Python

D. DAX

Answer: B

Explanation: KQL includes built-in capabilities for temporal aggregation, anomaly detection, and time-series analytics.

Question 7

A Fabric Warehouse team needs to build a reusable transformation layer consisting of joins, aggregations, and views. Which technology should they use?

A. SQL

B. KQL

C. Dataflows Gen2

D. Spark ML

Answer: A

Explanation: SQL is the preferred language for relational transformations and warehouse development.

Question 8

Which technology is generally the best choice for preparing large datasets for machine learning?

A. KQL

B. SQL

C. DAX

D. PySpark

Answer: D

Explanation: PySpark provides scalable data preparation capabilities and integrates well with machine learning workflows.

Question 9

An engineer needs to summarize application log events by hour and identify usage trends. Which technology is most appropriate?

A. PySpark

B. Power Query

C. KQL

D. SQL

Answer: C

Explanation: KQL excels at log analytics, event monitoring, and time-based aggregations.

Question 10

A team needs a transformation language that is familiar to most database developers and optimized for relational joins. Which should they choose?

A. PySpark

B. KQL

C. Power Query

D. SQL

Answer: D

Explanation: SQL remains the standard language for relational querying, joins, aggregations, and warehouse transformations.

Go to the DP-700 Exam Prep Hub main page.

Data Development, Data Education & Training, Data Modeling, Databases, DP-900, SQL, Uncategorized May 10, 2026

Identify common Structured Query Language (SQL) statements (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Identify considerations for relational data on Azure (20–25%)
   --> Describe relational concepts
      --> Identify common Structured Query Language (SQL) statements

Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Understanding basic SQL statements is essential for working with relational data and is a key requirement for the DP-900 exam. You are not expected to be an advanced SQL developer, but you should recognize common SQL commands, their purpose, and when they are used.

What Is SQL?

Structured Query Language (SQL) is the standard language used to:

Query data
Insert new data
Update existing data
Delete data
Define database structures

SQL is used across relational database systems, including Azure services like:

Azure SQL Database
Azure Database for PostgreSQL
Azure Database for MySQL

Categories of SQL Statements

SQL statements are typically grouped into categories:

Category	Purpose
DDL (Data Definition Language)	Define and modify database structures
DML (Data Manipulation Language)	Work with data in tables
DQL (Data Query Language)	Retrieve data
DCL (Data Control Language)	Manage permissions

For DP-900, focus primarily on DDL, DML, and DQL.

1. Data Query Language (DQL)

SELECT

Used to retrieve data from a table.

			
SELECT Name, City
FROM Customers;

You can filter results:

			
SELECT Name
FROM Customers
WHERE City = 'Seattle';

💡 Key Points:

Most commonly used SQL statement
Can include filtering, sorting, and grouping

2. Data Manipulation Language (DML)

INSERT

Adds new rows to a table.

			
INSERT INTO Customers (Name, City)
VALUES ('John', 'Seattle');

UPDATE

Modifies existing data.

			
UPDATE Customers
SET City = 'Austin'
WHERE Name = 'John';

DELETE

Removes rows from a table.

			
DELETE FROM Customers
WHERE Name = 'John';

💡 Important:
Always use a WHERE clause with UPDATE and DELETE unless you really intend to change or remove every row in the table.
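
The effect of a missing WHERE clause is easy to demonstrate. This minimal sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Azure SQL Database; the extra customers (Maria and Wei) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers (Name, City) VALUES (?, ?)",
    [("John", "Seattle"), ("Maria", "Denver"), ("Wei", "Boston")],
)

# With a WHERE clause, only John's row is updated.
targeted = conn.execute(
    "UPDATE Customers SET City = 'Austin' WHERE Name = 'John'"
).rowcount

# Without a WHERE clause, the same UPDATE touches every row in the table.
unfiltered = conn.execute("UPDATE Customers SET City = 'Austin'").rowcount

cities = [row[0] for row in conn.execute("SELECT City FROM Customers")]
print(targeted, unfiltered, cities)
```

The first UPDATE reports 1 affected row and the second reports 3, leaving every customer in Austin.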

3. Data Definition Language (DDL)

CREATE

Creates new database objects such as tables.

			
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    City VARCHAR(50)
);

		

ALTER

Modifies an existing table.

			
ALTER TABLE Customers
ADD Email VARCHAR(100);

DROP

Deletes a table or database object.

DROP TABLE Customers;

💡 Warning:
DROP permanently removes the object and its data.
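
To tie the three DDL statements together, here is a small sketch that runs CREATE, ALTER, and DROP in sequence and inspects the table's columns after each step. It assumes an in-memory SQLite database as a stand-in for an Azure relational service, and the column_names helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def column_names(table):
    # PRAGMA table_info returns one row per column; the name is at index 1.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE: define the table structure.
conn.execute(
    "CREATE TABLE Customers ("
    "CustomerID INT PRIMARY KEY, Name VARCHAR(100), City VARCHAR(50))"
)
after_create = column_names("Customers")

# ALTER: add a column (written with the COLUMN keyword for SQLite).
conn.execute("ALTER TABLE Customers ADD COLUMN Email VARCHAR(100)")
after_alter = column_names("Customers")

# DROP: remove the table and its data entirely.
conn.execute("DROP TABLE Customers")
after_drop = column_names("Customers")  # empty list: the table is gone

print(after_create, after_alter, after_drop)
```

After the DROP, the column list is empty because the table no longer exists, which is the behavior the warning above describes.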

4. Additional Common SQL Clauses

WHERE

Filters rows:

			
SELECT * FROM Orders
WHERE Amount > 100;

ORDER BY

Sorts results:

			
SELECT * FROM Orders
ORDER BY Amount DESC;

GROUP BY

Aggregates data:

			
SELECT City, COUNT(*)
FROM Customers
GROUP BY City;

JOIN

Combines data from multiple tables:

			
SELECT Orders.OrderID, Customers.Name
FROM Orders
JOIN Customers
ON Orders.CustomerID = Customers.CustomerID;

💡 DP-900 Tip:
You don’t need deep JOIN knowledge — just understand that JOINs combine related tables.

SQL in Azure

SQL is used across many Azure services:

Azure SQL Database

Fully managed relational database
Uses T-SQL (Microsoft’s SQL variant)

Azure Synapse Analytics

Used for analytical queries on large datasets

Azure Database for PostgreSQL

Uses PostgreSQL SQL dialect

Why This Matters for DP-900

On the exam, you may be asked to:

Identify what a SQL statement does
Match commands to their purpose (SELECT, INSERT, etc.)
Recognize DDL vs DML
Understand basic query concepts like filtering and sorting

Summary — Exam-Relevant Takeaways

✔ SELECT → Retrieve data
✔ INSERT → Add new data
✔ UPDATE → Modify existing data
✔ DELETE → Remove data

✔ CREATE / ALTER / DROP → Define and modify structures
✔ WHERE → Filter results
✔ ORDER BY → Sort data
✔ GROUP BY → Aggregate data
✔ JOIN → Combine tables

✔ SQL is the standard language for relational databases

Go to the Practice Exam Questions for this topic.

Go to the Additional Practice Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

DP-900, Microsoft Certification, SQL May 10, 2026

Additional Practice Questions: Identify common Structured Query Language (SQL) statements – SQL JOIN Focused (DP-900 Exam Prep)

Practice Questions – SQL JOIN focused questions

Question 1

What is the purpose of a SQL JOIN?

A. To delete duplicate rows
B. To combine data from multiple tables
C. To sort query results
D. To filter columns

✅ Answer: B

Explanation:
JOIN is used to combine rows from two or more related tables.

Question 2

Which type of JOIN returns only matching rows from both tables?

A. LEFT JOIN
B. RIGHT JOIN
C. INNER JOIN
D. CROSS JOIN

✅ Answer: C

Explanation:
INNER JOIN returns only rows where there is a match in both tables.

Question 3

A LEFT JOIN returns:

A. Only matching rows
B. All rows from the right table only
C. All rows from the left table and matching rows from the right
D. Only non-matching rows

✅ Answer: C

Explanation:
LEFT JOIN keeps all rows from the left table, even if there is no match.

Question 4

What happens when there is no matching row in a RIGHT JOIN?

A. The row is removed
B. NULL values are returned for missing matches
C. The query fails
D. Only matched rows are shown

✅ Answer: B

Explanation:
Unmatched columns return NULL values.

Question 5

Which JOIN type returns all possible combinations of rows between two tables?

A. INNER JOIN
B. LEFT JOIN
C. CROSS JOIN
D. FULL JOIN

✅ Answer: C

Explanation:
CROSS JOIN produces a Cartesian product (all combinations).

Question 6

Which SQL clause is used to define how tables are related in a JOIN?

A. WHERE
B. GROUP BY
C. ON
D. ORDER BY

✅ Answer: C

Explanation:
The ON clause specifies the relationship between tables.

Question 7

Given two tables: Customers and Orders. Each customer may have multiple orders. Which JOIN is typically used to retrieve all customers and their orders?

A. INNER JOIN
B. LEFT JOIN
C. CROSS JOIN
D. SELF JOIN

✅ Answer: B

Explanation:
LEFT JOIN ensures all customers appear, even those without orders.

Question 8

What does an INNER JOIN exclude?

A. Duplicate rows
B. Non-matching rows
C. NULL values only
D. Primary keys

✅ Answer: B

Explanation:
INNER JOIN only returns rows with matching values in both tables.

Question 9

Which JOIN is MOST likely to return fewer rows than the original tables?

A. CROSS JOIN
B. INNER JOIN
C. LEFT JOIN
D. FULL OUTER JOIN

✅ Answer: B

Explanation:
INNER JOIN returns only matches, often reducing row count.

Question 10

Which statement best describes a FULL OUTER JOIN?

A. Returns only matching rows
B. Returns all rows from both tables, matching where possible
C. Returns only left table rows
D. Returns only right table rows

✅ Answer: B

Explanation:
FULL OUTER JOIN returns all rows from both tables, with NULLs where no match exists.

✅ Quick Exam Takeaways

For DP-900 JOINs, remember:

✔ JOIN = combine related tables
✔ INNER JOIN = only matches
✔ LEFT JOIN = all left + matches
✔ RIGHT JOIN = all right + matches
✔ CROSS JOIN = all combinations
✔ ON clause defines relationships
✔ Unmatched values become NULL
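
The takeaways above can be checked with a few rows of data. This sketch uses Python's sqlite3 module with an in-memory database; the customers and orders are hypothetical, and only INNER, LEFT, and CROSS joins are shown because older SQLite versions do not support RIGHT or FULL OUTER joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID INT PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INT PRIMARY KEY, CustomerID INT);
    INSERT INTO Customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
    """
)

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute(
    "SELECT c.Name, o.OrderID FROM Customers c "
    "INNER JOIN Orders o ON o.CustomerID = c.CustomerID "
    "ORDER BY c.Name, o.OrderID"
).fetchall()

# LEFT JOIN: every customer; OrderID is NULL (None) when there is no match.
left_rows = conn.execute(
    "SELECT c.Name, o.OrderID FROM Customers c "
    "LEFT JOIN Orders o ON o.CustomerID = c.CustomerID "
    "ORDER BY c.Name, o.OrderID"
).fetchall()

# CROSS JOIN: every combination of rows (3 customers x 3 orders).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM Customers CROSS JOIN Orders"
).fetchone()[0]

print(inner_rows)
print(left_rows)
print(cross_count)
```

Cy has no orders, so he is missing from the INNER JOIN result but appears in the LEFT JOIN result with None in place of an OrderID.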

Go to the DP-900 Exam Prep Hub main page.

Data Development, DP-900, Microsoft Certification, SQL May 10, 2026

Practice Questions: Identify common Structured Query Language (SQL) statements (DP-900 Exam Prep)

Practice Questions

Question 1

Which SQL statement is used to retrieve data from a database?

A. INSERT
B. SELECT
C. UPDATE
D. DELETE

✅ Answer: B

Explanation:
The SELECT statement is used to query and retrieve data from tables.

Question 2

Which SQL statement adds new rows to a table?

A. INSERT
B. CREATE
C. ALTER
D. SELECT

✅ Answer: A

Explanation:
INSERT is used to add new records to a table.

Question 3

Which SQL statement modifies existing data in a table?

A. UPDATE
B. DELETE
C. SELECT
D. DROP

✅ Answer: A

Explanation:
UPDATE changes existing values in one or more rows.

Question 4

Which SQL statement removes rows from a table?

A. DROP
B. DELETE
C. ALTER
D. TRUNCATE

✅ Answer: B

Explanation:
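DELETE removes rows from a table, usually those matching a WHERE condition. TRUNCATE also empties a table, but it always removes every row, and DROP removes the table itself.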

Question 5

Which SQL statement creates a new table?

A. ALTER
B. CREATE
C. INSERT
D. SELECT

✅ Answer: B

Explanation:
CREATE is used to define new database objects such as tables.

Question 6

Which clause is used to filter rows in a SQL query?

A. ORDER BY
B. GROUP BY
C. WHERE
D. HAVING

✅ Answer: C

Explanation:
WHERE filters rows based on conditions.

Question 7

Which SQL clause is used to sort query results?

A. ORDER BY
B. GROUP BY
C. WHERE
D. JOIN

✅ Answer: A

Explanation:
ORDER BY sorts results in ascending or descending order.

Question 8

Which SQL statement permanently removes a table and its structure?

A. DELETE
B. DROP
C. REMOVE
D. CLEAR

✅ Answer: B

Explanation:
DROP deletes the table and its structure completely.

Question 9

Which SQL operation is used to combine data from two related tables?

A. GROUP BY
B. JOIN
C. UNION
D. FILTER

✅ Answer: B

Explanation:
JOIN combines rows from multiple tables based on related columns.

Question 10

Which category of SQL statements is used to define or modify database structures?

A. DML
B. DQL
C. DDL
D. DCL

✅ Answer: C

Explanation:
DDL (Data Definition Language) includes CREATE, ALTER, and DROP.

✅ Quick Exam Takeaways

For DP-900, remember:

✔ SELECT → retrieve data
✔ INSERT → add data
✔ UPDATE → modify data
✔ DELETE → remove data
✔ CREATE / ALTER / DROP → manage structure
✔ WHERE → filter results
✔ ORDER BY → sort results
✔ JOIN → combine tables
✔ SQL categories: DDL, DML, DQL

Go to the DP-900 Exam Prep Hub main page.

Data Careers, Data Engineering, Data Integration January 22, 2026

What Exactly Does a Data Engineer Do?

A Data Engineer is responsible for building and maintaining the systems that allow data to be collected, stored, transformed, and delivered reliably for analytics and downstream use cases. While Data Analysts focus on insights and decision-making, Data Engineers focus on making data available, trustworthy, and scalable.

In many organizations, nothing in analytics works well without strong data engineering underneath it.

The Core Purpose of a Data Engineer

At its core, the role of a Data Engineer is to:

Design and build data pipelines
Ensure data is reliable, timely, and accessible
Create the foundation that enables analytics, reporting, and data science

Data Engineers make sure that when someone asks a question of the data, the data is actually there—and correct.

Typical Responsibilities of a Data Engineer

While the exact responsibilities vary by company size and maturity, most Data Engineers spend time across the following areas.

Ingesting Data from Source Systems

Data Engineers build processes to ingest data from:

Operational databases
SaaS applications
APIs and event streams
Files and external data sources

This ingestion can be batch-based, streaming, or a mix of both, depending on the business needs.

Building and Maintaining Data Pipelines

Once data is ingested, Data Engineers:

Transform raw data into usable formats
Handle schema changes and data drift
Manage dependencies and scheduling
Monitor pipelines for failures and performance issues

Pipelines must be repeatable, resilient, and observable.

Managing Data Storage and Platforms

Data Engineers design and maintain:

Data warehouses and lakehouses
Data lakes and object storage
Partitioning, indexing, and performance strategies

They balance cost, performance, scalability, and ease of use while aligning with organizational standards.

Ensuring Data Quality and Reliability

A key responsibility is ensuring data can be trusted. This includes:

Validating data completeness and accuracy
Detecting anomalies or missing data
Implementing data quality checks and alerts
Supporting SLAs for data freshness

Reliable data is not accidental—it is engineered.

Enabling Analytics and Downstream Use Cases

Data Engineers work closely with:

Data Analysts and BI developers
Analytics engineers
Data scientists and ML engineers

They ensure datasets are structured in a way that supports efficient querying, consistent metrics, and self-service analytics.

Common Tools Used by Data Engineers

The exact toolset varies, but Data Engineers often work with:

Databases & Warehouses (e.g., cloud data platforms)
ETL / ELT Tools and orchestration frameworks
SQL for transformations and validation
Programming Languages such as Python, Java, or Scala
Streaming Technologies for real-time data
Infrastructure & Cloud Platforms
Monitoring and Observability Tools

Tooling matters, but design decisions matter more.

What a Data Engineer Is Not

Understanding role boundaries helps teams work effectively.

A Data Engineer is typically not:

A report or dashboard builder
A business stakeholder defining KPIs
A data scientist focused on modeling and experimentation
A system administrator managing only infrastructure

That said, in smaller teams, Data Engineers may wear multiple hats.

What the Role Looks Like Day-to-Day

A typical day for a Data Engineer might include:

Investigating a failed pipeline or delayed data load
Updating transformations to accommodate schema changes
Optimizing a slow query or job
Reviewing data quality alerts
Coordinating with analysts on new data needs
Deploying pipeline updates

Much of the work is preventative—ensuring problems don’t happen later.

How the Role Evolves Over Time

As organizations mature, the Data Engineer role evolves:

From manual ETL → automated, scalable pipelines
From siloed systems → centralized platforms
From reactive fixes → proactive reliability engineering
From data movement → data platform architecture

Senior Data Engineers often influence platform strategy, standards, and long-term technical direction.

Why Data Engineers Are So Important

Data Engineers are critical because:

They prevent analytics from becoming fragile or inconsistent
They enable speed without sacrificing trust
They scale data usage across the organization
They reduce technical debt and operational risk

Without strong data engineering, analytics becomes slow, unreliable, and difficult to scale.

Final Thoughts

A Data Engineer’s job is not just moving data from one place to another. It is about designing systems that make data dependable, usable, and sustainable.

When Data Engineers do their job well, everyone downstream—from analysts to executives—can focus on asking better questions instead of questioning the data itself.

Good luck on your data journey!

Data Munging, Data Warehousing, Data Wrangling, SQL August 30, 2024 / July 29, 2024

SQL Tips: How to generate insert statements using the data from the output of a select statement using Toad

If you need to build insert statements for many rows of data, whether to load them into another table or into the same table in a different environment, there is a convenient way to do this in Toad. This is often needed when moving data between environments.

In Toad, execute the appropriate select statement on the source table …

Select * from [my_schema].[my_simple_table];

I used a simple “select *” above, but your SQL statement can be any valid SQL statement that returns the data you want to insert into the other table. You may add specific columns, add filters, joins, and any other valid SQL operation.

Let’s say you want to insert the output into another table in a different schema.

Right-click on the output result data, and click “Export Dataset…”

From the “Export format” drop down menu, choose “Insert Statements”

In the Output section of the Export Dataset dialog box, enter the location and name of the Insert Script file that will be generated.

There are several other parameters that you could choose but we won’t cover them all here.

If you only wanted to generate inserts for some selected rows, select “Export only selected rows”.

If you need to specify the schema of the target table, select “Include schema name”

In the Table section, enter the name of the target schema and table

Note, there are data masking options available that can be very useful if, for example, you are moving some data from a Production environment to a Non-Production environment, and you do not want to expose the data there.

After you have set the parameters relevant to your scenario, Click “OK”.

The insert script file will be generated with the filename and at the location you specified, and its insert statements will include the schema and table names you entered.
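
For readers without Toad, the same idea can be sketched in a few lines of Python: run a select statement, then turn each returned row into an insert statement for the target table. This is an illustrative sketch only, not how Toad's export works internally. It assumes an in-memory SQLite source, and the helper names (sql_literal, generate_inserts), the sample rows, and the target name target_schema.my_simple_table are hypothetical.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal (NULL, number, or quoted text)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quotes so the text is safe inside a SQL string literal.
    return "'" + str(value).replace("'", "''") + "'"


def generate_inserts(cursor, target_table):
    """Build one INSERT statement per row of an already executed SELECT."""
    columns = ", ".join(col[0] for col in cursor.description)
    return [
        f"INSERT INTO {target_table} ({columns}) VALUES "
        f"({', '.join(sql_literal(v) for v in row)});"
        for row in cursor
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_simple_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO my_simple_table VALUES (?, ?)",
    [(1, "O'Brien"), (2, None)],  # hypothetical sample rows
)

cursor = conn.execute("SELECT * FROM my_simple_table ORDER BY id")
statements = generate_inserts(cursor, "target_schema.my_simple_table")
print("\n".join(statements))
```

The sample rows include an apostrophe and a NULL on purpose, since those are the values most likely to break hand-written insert scripts.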

Thanks for reading!

Data Development, Database Administration, Databases, SQL August 10, 2023

SQL Tips: How to rename a column in a table – Oracle database – Oracle SQL

At times you will need to change the name of a column in an existing table. If you are not changing the data type, it is just one statement / step that needs to be executed. However, I strongly recommend that you also do a backup step, especially if you’re making the change in a production environment, just in case of an unexpected issue.

If you choose to do the backup, you may perform this with a “create-table-as-select” statement in this form:

create table [table_name_backup] as select * from [table_name];

Here is an example of the above statement:

create table EMPLOYEES_BKUP as select * from EMPLOYEES;

Now that the table you are modifying is all backed up, you can proceed to rename the column.

The rename SQL statement would take this form:

alter table [table_name] rename column [existing_column_name] to [new_column_name];

An example of the statement:

alter table EMPLOYEES rename column SEX to GENDER;
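
The backup-then-rename sequence can be tried safely before touching a real database. The sketch below runs the same two steps against an in-memory SQLite database from Python. SQLite (version 3.25 or later) happens to accept the same rename syntax, but it is only a stand-in for Oracle here, and the ID column and sample row are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (ID INTEGER, SEX TEXT)")
conn.execute("INSERT INTO EMPLOYEES VALUES (1, 'F')")  # hypothetical row


def column_names(table):
    # PRAGMA table_info returns one row per column; the name is at index 1.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Step 1: back up the table with a create-table-as-select statement.
conn.execute("CREATE TABLE EMPLOYEES_BKUP AS SELECT * FROM EMPLOYEES")

# Step 2: rename the column in the original table.
conn.execute("ALTER TABLE EMPLOYEES RENAME COLUMN SEX TO GENDER")

renamed = column_names("EMPLOYEES")
backup = column_names("EMPLOYEES_BKUP")
backup_rows = conn.execute("SELECT COUNT(*) FROM EMPLOYEES_BKUP").fetchone()[0]
print(renamed, backup, backup_rows)
```

The original table now has the GENDER column, while the backup keeps the old SEX column and the data as it was before the change.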

Thanks for reading! I hope you found this information useful.

Data Development, SQL July 14, 2023

SQL tips: Get last day and first day of month – Oracle SQL

At times you may need to dynamically determine the first or last day of a month based on the current date or some date in your data. This post provides a few options for deriving the first day and last day of a month, plus a couple of tweaks for getting the first or last day of the next or previous month.

Get the current date (today’s date):
select sysdate from dual;

SYSDATE

16-MAY-22

Get the last day of the current month:
select trunc(last_day(sysdate)) as LastDayOfMonth from dual;

LASTDAYOFMONTH

31-MAY-22

Side note: The last_day() function can also be useful for dynamically determining leap years (that is, whether February has 28 or 29 days):
select
last_day(date '2020-02-01') LastDayOfFeb2020, -- leap year
last_day(date '2021-02-01') LastDayOfFeb2021 -- not a leap year
from dual;

LASTDAYOFFEB2020 LASTDAYOFFEB2021

29-FEB-20 28-FEB-21

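Get the last day of the next month (this works because add_months() returns the last day of the resulting month when the input date is the last day of its month):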
select add_months(trunc(last_day(sysdate)), 1) as LastDayOfNextMonth from dual;

LASTDAYOFNEXTMONTH

30-JUN-22

Get the last day of the previous month:
select add_months(trunc(last_day(sysdate)), -1) as LastDayOfPreviousMonth from dual;

LASTDAYOFPREVIOUSMONTH

30-APR-22

Get the first day of the current month:

select trunc(sysdate,'month') as FirstDayOfMonth from dual;
select trunc(sysdate,'MM') as FirstDayOfMonth from dual;

FIRSTDAYOFMONTH

01-MAY-22

Get the first day of the next month:

select add_months(trunc(sysdate,'MM'), 1) as FirstDayOfNextMonth from dual;

FIRSTDAYOFNEXTMONTH

01-JUN-22

You can also get the first day of the next month using this:
select trunc(last_day(sysdate)+1) as FirstDayOfNextMonth from dual;

FIRSTDAYOFNEXTMONTH

01-JUN-22

Get the first day of the previous month:

select add_months(trunc(sysdate,'MM'), -1) as FirstDayOfPreviousMonth from dual;

FIRSTDAYOFPREVIOUSMONTH

01-APR-22
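
If you need the same values outside the database, the logic is straightforward to reproduce. Here is a minimal Python sketch using only the standard library; the helper names are hypothetical, and the date is fixed to 16-MAY-22 so the results line up with the Oracle output above.

```python
import calendar
from datetime import date


def first_day_of_month(d: date) -> date:
    return d.replace(day=1)


def last_day_of_month(d: date) -> date:
    # monthrange returns (weekday of the 1st, number of days in the month).
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def shift_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


today = date(2022, 5, 16)  # same date as the sysdate output shown earlier

print(first_day_of_month(today))                   # 2022-05-01
print(last_day_of_month(today))                    # 2022-05-31
print(shift_months(today, 1))                      # 2022-06-01
print(last_day_of_month(shift_months(today, 1)))   # 2022-06-30
print(shift_months(today, -1))                     # 2022-04-01
print(last_day_of_month(shift_months(today, -1)))  # 2022-04-30
print(last_day_of_month(date(2020, 2, 1)).day)     # 29 (leap year)
```

Shifting to the first of the target month before asking for its last day avoids invalid dates such as June 31.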

Here are 3 resources for getting the first and last day of the month values on a SQL Server database:

https://zarez.net/?p=2462

SQL SERVER – Simple Method to Find FIRST and LAST Day of Current Date

SQL Server EOMONTH Function

Thanks for reading! I hope you found this information useful!

Good luck on your data journey.