
Transform data by using PySpark, SQL, and KQL (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Ingest and transform data (30–35%)
   --> Ingest and transform batch data
      --> Transform data by using PySpark, SQL, and KQL


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the most important skills for the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric certification exam is knowing how to transform data using the appropriate technology. Microsoft Fabric provides multiple transformation engines, each optimized for specific workloads:

  • PySpark for large-scale distributed data engineering and advanced transformations
  • SQL for relational data manipulation, warehousing, and analytics
  • KQL (Kusto Query Language) for high-volume log, telemetry, event, and time-series data analysis

A successful Fabric Data Engineer must understand not only how each technology works, but also when to choose one over another.


Understanding the Transformation Options in Microsoft Fabric

Microsoft Fabric supports several data processing experiences:

Technology | Primary Use Case | Common Fabric Components
PySpark | Big data processing and engineering | Lakehouse, Notebooks
SQL | Relational transformations and analytics | Warehouse, SQL Endpoint
KQL | Streaming, telemetry, logs, event analytics | Eventhouse, Real-Time Intelligence

While all three can transform data, they are designed for different scenarios.


Transforming Data with PySpark

What is PySpark?

PySpark is the Python API for Apache Spark.

Spark is a distributed processing engine that allows data engineers to process extremely large datasets across multiple nodes simultaneously.

Within Microsoft Fabric, PySpark is typically used in:

  • Notebooks
  • Lakehouses
  • Spark Job Definitions

When to Use PySpark

PySpark is ideal when:

  • Working with large-scale datasets
  • Performing complex transformations
  • Processing semi-structured data
  • Building data engineering pipelines
  • Performing machine learning preparation
  • Handling Delta Lake tables

Examples include:

  • Cleaning raw data
  • Parsing JSON files
  • Aggregating billions of records
  • Creating dimensional model tables
  • Performing data quality checks

Reading Data with PySpark

Example:

df = spark.read.format("delta").load("Tables/Sales")

Filtering Data

filtered_df = df.filter(df.Amount > 1000)

Creating New Columns

from pyspark.sql.functions import col

new_df = df.withColumn(
    "TaxAmount",
    col("Amount") * 0.07
)

Aggregating Data

from pyspark.sql import functions as F

# F.sum avoids shadowing Python's built-in sum()
summary_df = (
    df.groupBy("Region")
      .agg(F.sum("Amount").alias("TotalSales"))
)

Writing Results

summary_df.write.mode("overwrite").saveAsTable("SalesSummary")
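
To see what the filter, derived-column, and aggregation steps above compute, here is a minimal plain-Python sketch over a tiny in-memory sample. It does not use Spark at all, and the sample rows are invented for illustration; only the column names (Region, Amount, TaxAmount) come from the examples above.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the Sales table.
sales = [
    {"Region": "East", "Amount": 1500.0},
    {"Region": "East", "Amount": 800.0},
    {"Region": "West", "Amount": 2500.0},
    {"Region": "West", "Amount": 1200.0},
]

# Same idea as df.filter(df.Amount > 1000): keep rows above the threshold.
filtered = [row for row in sales if row["Amount"] > 1000]

# Same idea as df.withColumn("TaxAmount", col("Amount") * 0.07): add a derived column.
with_tax = [{**row, "TaxAmount": row["Amount"] * 0.07} for row in sales]

# Same idea as df.groupBy("Region").agg(sum of Amount): one total per region.
totals = defaultdict(float)
for row in sales:
    totals[row["Region"]] += row["Amount"]

print(len(filtered))
print(dict(totals))
```

In Spark the same operations run in parallel across partitions of the data, which is what makes the PySpark version suitable for very large tables.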

PySpark Advantages

Scalability

Handles terabytes and petabytes of data.

Distributed Processing

Automatically parallelizes workloads.

Flexibility

Supports:

  • Structured data
  • Semi-structured data
  • Unstructured data

Data Engineering Focus

Excellent for ETL and ELT processes.


PySpark Limitations

  • More complex than SQL
  • Requires programming skills
  • Less familiar to business analysts
  • Higher resource consumption for small workloads

Transforming Data with SQL

What is SQL in Fabric?

SQL remains one of the most commonly used languages in Fabric.

You can use SQL within:

  • Fabric Data Warehouse
  • Lakehouse SQL Endpoint
  • SQL Query Editor
  • Stored Procedures
  • Data Pipelines

When to Use SQL

SQL is ideal for:

  • Relational transformations
  • Data warehouse development
  • Reporting datasets
  • Aggregations
  • Joins
  • Dimensional modeling

Examples:

  • Creating fact tables
  • Loading dimensions
  • Building reporting views
  • Data validation

Filtering Records

SELECT *
FROM Sales
WHERE Amount > 1000;

Aggregations

SELECT
    Region,
    SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Region;

Joining Tables

SELECT
    s.SaleID,
    c.CustomerName
FROM Sales s
INNER JOIN Customer c
    ON s.CustomerID = c.CustomerID;

Creating Transformation Tables

CREATE TABLE SalesSummary AS
SELECT
    Region,
    SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Region;
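
The create-table-as-select pattern above can be tried locally. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a Fabric Warehouse; the sample rows are invented, and SQLite's dialect differs from T-SQL in other respects, so treat it only as an illustration of the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data for the Sales table.
conn.executescript("""
    CREATE TABLE Sales (SaleID INTEGER, Region TEXT, Amount REAL);
    INSERT INTO Sales VALUES (1, 'East', 1500), (2, 'East', 800), (3, 'West', 2500);

    -- Materialize the aggregated result as a new table.
    CREATE TABLE SalesSummary AS
    SELECT Region, SUM(Amount) AS TotalSales
    FROM Sales
    GROUP BY Region;
""")

summary = conn.execute(
    "SELECT Region, TotalSales FROM SalesSummary ORDER BY Region"
).fetchall()
print(summary)
```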

SQL Advantages

Familiarity

Most data professionals know SQL.

Readability

Easy to understand and maintain.

Relational Optimization

Optimized for joins and aggregations.

Warehousing Support

Ideal for star schemas and dimensional models.


SQL Limitations

  • Less effective for complex data engineering workflows
  • Not ideal for large-scale semi-structured data processing
  • Limited flexibility compared to PySpark

Transforming Data with KQL

What is KQL?

Kusto Query Language (KQL) is a read-optimized query language designed for:

  • Telemetry
  • Log analytics
  • Event processing
  • Streaming data
  • Time-series analysis

KQL is commonly used in:

  • Eventhouse
  • Real-Time Intelligence
  • KQL Databases

When to Use KQL

Use KQL when working with:

  • Sensor data
  • IoT events
  • Application logs
  • Security monitoring
  • Streaming datasets
  • Time-series analytics

Examples:

  • Monitoring manufacturing equipment
  • Detecting anomalies
  • Security event analysis
  • Operational dashboards

Filtering Data

Events
| where Temperature > 100

Summarization

Events
| summarize AvgTemp = avg(Temperature)
    by DeviceID

Time-Series Analysis

Events
| summarize Count=count()
    by bin(Timestamp, 1h)

Detecting Trends

Events
| make-series AvgTemp=avg(Temperature)
    on Timestamp
    step 1h
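
The difference between summarizing by bin() and using make-series is easier to see on a small sample. The sketch below is plain Python, not KQL, and the timestamps are invented; for brevity it counts events per hour rather than averaging a temperature. It assumes the usual behavior that bin() rounds a value down to a multiple of the bin size, and that make-series emits a value for every step in the range, filling empty steps with a default of 0.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event timestamps; not taken from a real Events table.
timestamps = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 11, 15),
]

def bin_hour(ts):
    """Round a timestamp down to the start of its hour, like bin(Timestamp, 1h)."""
    return ts.replace(minute=0, second=0, microsecond=0)

# summarize-style result: one entry per hour that actually has events.
counts = Counter(bin_hour(ts) for ts in timestamps)

# make-series-style result: every hour from the first to the last bin,
# with hours that have no events filled with 0.
series = []
step = min(counts)
while step <= max(counts):
    series.append((step, counts.get(step, 0)))
    step += timedelta(hours=1)

for hour, count in series:
    print(hour, count)
```

The gap-free series is what makes trend and anomaly functions practical, since they generally expect evenly spaced points.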

KQL Advantages

High Performance

Optimized for large event datasets.

Time-Series Analytics

Excellent for temporal analysis.

Streaming Support

Designed for real-time workloads.

Fast Query Execution

Ideal for operational dashboards.


KQL Limitations

  • Not intended for traditional data warehousing
  • Less suitable for dimensional modeling
  • Not commonly used for batch ETL

Comparing PySpark, SQL, and KQL

Requirement | Best Choice
Large-scale ETL | PySpark
Data warehouse transformations | SQL
Star schema creation | SQL
Streaming analytics | KQL
Time-series analysis | KQL
Semi-structured JSON processing | PySpark
Machine learning preparation | PySpark
Business reporting datasets | SQL
Eventhouse analytics | KQL
Massive Delta Lake processing | PySpark

Choosing the Right Transformation Tool

Choose PySpark When

  • Processing very large datasets
  • Working with Data Lake data
  • Building engineering pipelines
  • Handling JSON or Parquet files
  • Performing advanced transformations

Choose SQL When

  • Building warehouses
  • Creating dimensional models
  • Developing reporting datasets
  • Performing relational transformations
  • Creating views and stored procedures

Choose KQL When

  • Working with event streams
  • Analyzing telemetry
  • Investigating logs
  • Performing time-series analysis
  • Monitoring operational systems

Exam Tips

Know the Primary Use Cases

A common DP-700 exam question asks which technology is most appropriate for a scenario.

Remember:

  • PySpark = Big Data Engineering
  • SQL = Relational Analytics and Warehousing
  • KQL = Real-Time and Time-Series Analytics

Understand Fabric Components

Know where each technology is primarily used:

Technology | Fabric Experience
PySpark | Lakehouse, Notebook
SQL | Warehouse, SQL Endpoint
KQL | Eventhouse

Focus on Scenario-Based Questions

The exam frequently describes a business requirement and asks which technology should be used.

For example:

  • IoT sensors → KQL
  • Warehouse dimension tables → SQL
  • Processing billions of JSON records → PySpark

Practice Exam Questions

Question 1

A data engineer must transform 20 TB of semi-structured JSON data stored in OneLake. Which technology is the best choice?

A. SQL

B. PySpark

C. KQL

D. Power Query

Answer: B

Explanation: PySpark is designed for distributed processing of massive datasets and handles semi-structured formats such as JSON efficiently.


Question 2

A Fabric solution requires creation of a star schema consisting of fact and dimension tables. Which technology is most appropriate?

A. SQL

B. KQL

C. Power BI DAX

D. Data Activator

Answer: A

Explanation: SQL is optimized for relational transformations and dimensional modeling commonly used in data warehouses.


Question 3

A company wants to analyze millions of IoT events arriving continuously from factory equipment. Which technology should be used?

A. KQL

B. Power Query

C. SQL

D. Excel

Answer: A

Explanation: KQL is designed specifically for high-volume event, telemetry, and time-series analysis workloads.


Question 4

Which Fabric component is most closely associated with KQL transformations?

A. Warehouse

B. Notebook

C. SQL Endpoint

D. Eventhouse

Answer: D

Explanation: Eventhouse is the primary Fabric experience for KQL-based analytics and real-time intelligence workloads.


Question 5

A data engineer needs to process Delta Lake tables using distributed compute. Which technology should be selected?

A. KQL

B. SQL

C. PySpark

D. Power BI

Answer: C

Explanation: PySpark integrates directly with Delta Lake and supports scalable distributed processing.


Question 6

Which language is specifically optimized for time-series analysis?

A. SQL

B. KQL

C. Python

D. DAX

Answer: B

Explanation: KQL includes built-in capabilities for temporal aggregation, anomaly detection, and time-series analytics.


Question 7

A Fabric Warehouse team needs to build a reusable transformation layer consisting of joins, aggregations, and views. Which technology should they use?

A. SQL

B. KQL

C. Dataflows Gen2

D. Spark ML

Answer: A

Explanation: SQL is the preferred language for relational transformations and warehouse development.


Question 8

Which technology is generally the best choice for preparing large datasets for machine learning?

A. KQL

B. SQL

C. DAX

D. PySpark

Answer: D

Explanation: PySpark provides scalable data preparation capabilities and integrates well with machine learning workflows.


Question 9

An engineer needs to summarize application log events by hour and identify usage trends. Which technology is most appropriate?

A. PySpark

B. Power Query

C. KQL

D. SQL

Answer: C

Explanation: KQL excels at log analytics, event monitoring, and time-based aggregations.


Question 10

A team needs a transformation language that is familiar to most database developers and optimized for relational joins. Which should they choose?

A. PySpark

B. KQL

C. Power Query

D. SQL

Answer: D

Explanation: SQL remains the standard language for relational querying, joins, aggregations, and warehouse transformations.


Go to the DP-700 Exam Prep Hub main page.

Identify common Structured Query Language (SQL) statements (DP-900 Exam Prep)

This post is a part of the DP-900: Microsoft Azure Data Fundamentals Exam Prep Hub. 
This topic falls under these sections:
Identify considerations for relational data on Azure (20–25%)
   --> Describe relational concepts
      --> Identify common Structured Query Language (SQL) statements


Note that there are 10 practice questions (with answers and explanations) for each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available on the hub below the exam topics section.

Understanding basic SQL statements is essential for working with relational data and is a key requirement for the DP-900 exam. You are not expected to be an advanced SQL developer, but you should recognize common SQL commands, their purpose, and when they are used.


What Is SQL?

Structured Query Language (SQL) is the standard language used to:

  • Query data
  • Insert new data
  • Update existing data
  • Delete data
  • Define database structures

SQL is used across relational database systems, including Azure services like:

  • Azure SQL Database
  • Azure Database for PostgreSQL
  • Azure Database for MySQL

Categories of SQL Statements

SQL statements are typically grouped into categories:

Category | Purpose
DDL (Data Definition Language) | Define and modify database structures
DML (Data Manipulation Language) | Work with data in tables
DQL (Data Query Language) | Retrieve data
DCL (Data Control Language) | Manage permissions

For DP-900, focus primarily on DDL, DML, and DQL.


1. Data Query Language (DQL)


SELECT

Used to retrieve data from a table.

SELECT Name, City
FROM Customers;

You can filter results:

SELECT Name
FROM Customers
WHERE City = 'Seattle';

💡 Key Points:

  • Most commonly used SQL statement
  • Can include filtering, sorting, and grouping

2. Data Manipulation Language (DML)


INSERT

Adds new rows to a table.

INSERT INTO Customers (Name, City)
VALUES ('John', 'Seattle');

UPDATE

Modifies existing data.

UPDATE Customers
SET City = 'Austin'
WHERE Name = 'John';

DELETE

Removes rows from a table.

DELETE FROM Customers
WHERE Name = 'John';

💡 Important:
Always use a WHERE clause with UPDATE and DELETE to avoid affecting all rows.
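
To make the warning concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The extra customers (Maria, Wei) are invented sample data. It compares how many rows an UPDATE touches with and without a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers (Name, City) VALUES (?, ?)",
    [("John", "Seattle"), ("Maria", "Denver"), ("Wei", "Boston")],
)

# With WHERE: only the matching row is changed.
targeted = conn.execute(
    "UPDATE Customers SET City = 'Austin' WHERE Name = 'John'"
).rowcount

# Without WHERE: every row in the table is changed.
unfiltered = conn.execute("UPDATE Customers SET City = 'Austin'").rowcount

print(targeted, unfiltered)
```

The same applies to DELETE: without a WHERE clause it removes every row in the table.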


3. Data Definition Language (DDL)


CREATE

Creates new database objects such as tables.

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    City VARCHAR(50)
);

ALTER

Modifies an existing table.

ALTER TABLE Customers
ADD Email VARCHAR(100);

DROP

Deletes a table or database object.

DROP TABLE Customers;

💡 Warning:
DROP permanently removes the object and its data.
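
The three DDL statements can be followed end to end in a small sketch. It uses Python's built-in sqlite3 module as a stand-in, so the syntax is SQLite's (for example, ADD COLUMN and the PRAGMA used to list columns) rather than T-SQL; the column_names helper is a hypothetical name introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def column_names(table):
    """List a table's columns via SQLite's PRAGMA table_info (empty if no such table)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# CREATE: define a new table.
conn.execute(
    "CREATE TABLE Customers (CustomerID INT PRIMARY KEY, Name VARCHAR(100), City VARCHAR(50))"
)
after_create = column_names("Customers")

# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE Customers ADD COLUMN Email VARCHAR(100)")
after_alter = column_names("Customers")

# DROP: remove the table and everything in it.
conn.execute("DROP TABLE Customers")
after_drop = column_names("Customers")

print(after_create, after_alter, after_drop)
```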


4. Additional Common SQL Clauses


WHERE

Filters rows:

SELECT * FROM Orders
WHERE Amount > 100;

ORDER BY

Sorts results:

SELECT * FROM Orders
ORDER BY Amount DESC;

GROUP BY

Aggregates data:

SELECT City, COUNT(*)
FROM Customers
GROUP BY City;

JOIN

Combines data from multiple tables:

SELECT Orders.OrderID, Customers.Name
FROM Orders
JOIN Customers
ON Orders.CustomerID = Customers.CustomerID;

💡 DP-900 Tip:
You don’t need deep JOIN knowledge — just understand that JOINs combine related tables.
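
The four clauses above can be run together on a small sample. This is a sketch using Python's built-in sqlite3 module as a stand-in database; the table contents are invented, and an ORDER BY is added to each query only so the output is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data.
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, City TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL);
    INSERT INTO Customers VALUES (1, 'John', 'Seattle'), (2, 'Maria', 'Seattle'), (3, 'Wei', 'Boston');
    INSERT INTO Orders VALUES (10, 1, 250), (11, 1, 75), (12, 3, 120);
""")

# WHERE filters rows; ORDER BY sorts the rows that remain.
large_orders = conn.execute(
    "SELECT OrderID FROM Orders WHERE Amount > 100 ORDER BY Amount DESC"
).fetchall()

# GROUP BY produces one row per city with its customer count.
by_city = conn.execute(
    "SELECT City, COUNT(*) FROM Customers GROUP BY City ORDER BY City"
).fetchall()

# JOIN pairs each order with the customer who placed it.
joined = conn.execute("""
    SELECT Orders.OrderID, Customers.Name
    FROM Orders
    JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()

print(large_orders, by_city, joined)
```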


SQL in Azure

SQL is used across many Azure services:


Azure SQL Database

  • Fully managed relational database
  • Uses T-SQL (Microsoft’s SQL variant)

Azure Synapse Analytics

  • Used for analytical queries on large datasets

Azure Database for PostgreSQL

  • Uses PostgreSQL SQL dialect

Why This Matters for DP-900

On the exam, you may be asked to:

  • Identify what a SQL statement does
  • Match commands to their purpose (SELECT, INSERT, etc.)
  • Recognize DDL vs DML
  • Understand basic query concepts like filtering and sorting

Summary — Exam-Relevant Takeaways

SELECT → Retrieve data
INSERT → Add new data
UPDATE → Modify existing data
DELETE → Remove data

CREATE / ALTER / DROP → Define and modify structures
WHERE → Filter results
ORDER BY → Sort data
GROUP BY → Aggregate data
JOIN → Combine tables

✔ SQL is the standard language for relational databases


Go to the Practice Exam Questions for this topic.

Go to the Additional Practice Questions for this topic.

Go to the DP-900 Exam Prep Hub main page.

Additional Practice Questions: Identify common Structured Query Language (SQL) statements – SQL JOIN Focused (DP-900 Exam Prep)

Practice Questions – SQL JOIN focused questions


Question 1

What is the purpose of a SQL JOIN?

A. To delete duplicate rows
B. To combine data from multiple tables
C. To sort query results
D. To filter columns

Answer: B

Explanation:
JOIN is used to combine rows from two or more related tables.


Question 2

Which type of JOIN returns only matching rows from both tables?

A. LEFT JOIN
B. RIGHT JOIN
C. INNER JOIN
D. CROSS JOIN

Answer: C

Explanation:
INNER JOIN returns only rows where there is a match in both tables.


Question 3

A LEFT JOIN returns:

A. Only matching rows
B. All rows from the right table only
C. All rows from the left table and matching rows from the right
D. Only non-matching rows

Answer: C

Explanation:
LEFT JOIN keeps all rows from the left table, even if there is no match.


Question 4

What happens when there is no matching row in a RIGHT JOIN?

A. The row is removed
B. NULL values are returned for missing matches
C. The query fails
D. Only matched rows are shown

Answer: B

Explanation:
Unmatched columns return NULL values.


Question 5

Which JOIN type returns all possible combinations of rows between two tables?

A. INNER JOIN
B. LEFT JOIN
C. CROSS JOIN
D. FULL JOIN

Answer: C

Explanation:
CROSS JOIN produces a Cartesian product (all combinations).


Question 6

Which SQL clause is used to define how tables are related in a JOIN?

A. WHERE
B. GROUP BY
C. ON
D. ORDER BY

Answer: C

Explanation:
The ON clause specifies the relationship between tables.


Question 7

Given two tables: Customers and Orders. Each customer may have multiple orders. Which JOIN is typically used to retrieve all customers and their orders?

A. INNER JOIN
B. LEFT JOIN
C. CROSS JOIN
D. SELF JOIN

Answer: B

Explanation:
LEFT JOIN ensures all customers appear, even those without orders.


Question 8

What does an INNER JOIN exclude?

A. Duplicate rows
B. Non-matching rows
C. NULL values only
D. Primary keys

Answer: B

Explanation:
INNER JOIN only returns rows with matching values in both tables.


Question 9

Which JOIN is MOST likely to return fewer rows than the original tables?

A. CROSS JOIN
B. INNER JOIN
C. LEFT JOIN
D. FULL OUTER JOIN

Answer: B

Explanation:
INNER JOIN returns only matches, often reducing row count.


Question 10

Which statement best describes a FULL OUTER JOIN?

A. Returns only matching rows
B. Returns all rows from both tables, matching where possible
C. Returns only left table rows
D. Returns only right table rows

Answer: B

Explanation:
FULL OUTER JOIN returns all rows from both tables, with NULLs where no match exists.


✅ Quick Exam Takeaways

For DP-900 JOINs, remember:

✔ JOIN = combine related tables
✔ INNER JOIN = only matches
✔ LEFT JOIN = all left + matches
✔ RIGHT JOIN = all right + matches
✔ CROSS JOIN = all combinations
✔ ON clause defines relationships
✔ Unmatched values become NULL
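
The takeaways above can be checked on a tiny dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in database with invented rows: three customers, of whom only one has orders. RIGHT and FULL OUTER JOIN are left out because older SQLite versions do not support them. The run helper is a hypothetical name used only here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: John has two orders, Maria and Wei have none.
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'John'), (2, 'Maria'), (3, 'Wei');
    INSERT INTO Orders VALUES (10, 1), (11, 1);
""")

def run(join_clause):
    """Run the same SELECT with a different join clause and return all rows."""
    sql = (
        "SELECT c.Name, o.OrderID FROM Customers c "
        f"{join_clause} ORDER BY c.Name, o.OrderID"
    )
    return conn.execute(sql).fetchall()

inner = run("INNER JOIN Orders o ON c.CustomerID = o.CustomerID")  # matches only
left = run("LEFT JOIN Orders o ON c.CustomerID = o.CustomerID")    # all customers
cross = run("CROSS JOIN Orders o")                                 # every combination

print(len(inner), len(left), len(cross))
```

In the LEFT JOIN result, the customers without orders appear with None in the OrderID position, which is how Python represents SQL NULL.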



Practice Questions: Identify common Structured Query Language (SQL) statements (DP-900 Exam Prep)

Practice Questions


Question 1

Which SQL statement is used to retrieve data from a database?

A. INSERT
B. SELECT
C. UPDATE
D. DELETE

Answer: B

Explanation:
The SELECT statement is used to query and retrieve data from tables.


Question 2

Which SQL statement adds new rows to a table?

A. INSERT
B. CREATE
C. ALTER
D. SELECT

Answer: A

Explanation:
INSERT is used to add new records to a table.


Question 3

Which SQL statement modifies existing data in a table?

A. UPDATE
B. DELETE
C. SELECT
D. DROP

Answer: A

Explanation:
UPDATE changes existing values in one or more rows.


Question 4

Which SQL statement removes rows from a table?

A. DROP
B. DELETE
C. ALTER
D. TRUNCATE

Answer: B

Explanation:
DELETE removes specific rows based on a condition.


Question 5

Which SQL statement creates a new table?

A. ALTER
B. CREATE
C. INSERT
D. SELECT

Answer: B

Explanation:
CREATE is used to define new database objects such as tables.


Question 6

Which clause is used to filter rows in a SQL query?

A. ORDER BY
B. GROUP BY
C. WHERE
D. HAVING

Answer: C

Explanation:
WHERE filters rows based on conditions.


Question 7

Which SQL clause is used to sort query results?

A. ORDER BY
B. GROUP BY
C. WHERE
D. JOIN

Answer: A

Explanation:
ORDER BY sorts results in ascending or descending order.


Question 8

Which SQL statement permanently removes a table and its structure?

A. DELETE
B. DROP
C. REMOVE
D. CLEAR

Answer: B

Explanation:
DROP deletes the table and its structure completely.


Question 9

Which SQL operation is used to combine data from two related tables?

A. GROUP BY
B. JOIN
C. UNION
D. FILTER

Answer: B

Explanation:
JOIN combines rows from multiple tables based on related columns.


Question 10

Which category of SQL statements is used to define or modify database structures?

A. DML
B. DQL
C. DDL
D. DCL

Answer: C

Explanation:
DDL (Data Definition Language) includes CREATE, ALTER, and DROP.


✅ Quick Exam Takeaways

For DP-900, remember:

SELECT → retrieve data
INSERT → add data
UPDATE → modify data
DELETE → remove data
CREATE / ALTER / DROP → manage structure
WHERE → filter results
ORDER BY → sort results
JOIN → combine tables
✔ SQL categories: DDL, DML, DQL



What Exactly Does a Data Engineer Do?

A Data Engineer is responsible for building and maintaining the systems that allow data to be collected, stored, transformed, and delivered reliably for analytics and downstream use cases. While Data Analysts focus on insights and decision-making, Data Engineers focus on making data available, trustworthy, and scalable.

In many organizations, nothing in analytics works well without strong data engineering underneath it.


The Core Purpose of a Data Engineer

At its core, the role of a Data Engineer is to:

  • Design and build data pipelines
  • Ensure data is reliable, timely, and accessible
  • Create the foundation that enables analytics, reporting, and data science

Data Engineers make sure that when someone asks a question of the data, the data is actually there—and correct.


Typical Responsibilities of a Data Engineer

While the exact responsibilities vary by company size and maturity, most Data Engineers spend time across the following areas.


Ingesting Data from Source Systems

Data Engineers build processes to ingest data from:

  • Operational databases
  • SaaS applications
  • APIs and event streams
  • Files and external data sources

This ingestion can be batch-based, streaming, or a mix of both, depending on the business needs.


Building and Maintaining Data Pipelines

Once data is ingested, Data Engineers:

  • Transform raw data into usable formats
  • Handle schema changes and data drift
  • Manage dependencies and scheduling
  • Monitor pipelines for failures and performance issues

Pipelines must be repeatable, resilient, and observable.


Managing Data Storage and Platforms

Data Engineers design and maintain:

  • Data warehouses and lakehouses
  • Data lakes and object storage
  • Partitioning, indexing, and performance strategies

They balance cost, performance, scalability, and ease of use while aligning with organizational standards.


Ensuring Data Quality and Reliability

A key responsibility is ensuring data can be trusted. This includes:

  • Validating data completeness and accuracy
  • Detecting anomalies or missing data
  • Implementing data quality checks and alerts
  • Supporting SLAs for data freshness

Reliable data is not accidental—it is engineered.


Enabling Analytics and Downstream Use Cases

Data Engineers work closely with:

  • Data Analysts and BI developers
  • Analytics engineers
  • Data scientists and ML engineers

They ensure datasets are structured in a way that supports efficient querying, consistent metrics, and self-service analytics.


Common Tools Used by Data Engineers

The exact toolset varies, but Data Engineers often work with:

  • Databases & Warehouses (e.g., cloud data platforms)
  • ETL / ELT Tools and orchestration frameworks
  • SQL for transformations and validation
  • Programming Languages such as Python, Java, or Scala
  • Streaming Technologies for real-time data
  • Infrastructure & Cloud Platforms
  • Monitoring and Observability Tools

Tooling matters, but design decisions matter more.


What a Data Engineer Is Not

Understanding role boundaries helps teams work effectively.

A Data Engineer is typically not:

  • A report or dashboard builder
  • A business stakeholder defining KPIs
  • A data scientist focused on modeling and experimentation
  • A system administrator managing only infrastructure

That said, in smaller teams, Data Engineers may wear multiple hats.


What the Role Looks Like Day-to-Day

A typical day for a Data Engineer might include:

  • Investigating a failed pipeline or delayed data load
  • Updating transformations to accommodate schema changes
  • Optimizing a slow query or job
  • Reviewing data quality alerts
  • Coordinating with analysts on new data needs
  • Deploying pipeline updates

Much of the work is preventative—ensuring problems don’t happen later.


How the Role Evolves Over Time

As organizations mature, the Data Engineer role evolves:

  • From manual ETL → automated, scalable pipelines
  • From siloed systems → centralized platforms
  • From reactive fixes → proactive reliability engineering
  • From data movement → data platform architecture

Senior Data Engineers often influence platform strategy, standards, and long-term technical direction.


Why Data Engineers Are So Important

Data Engineers are critical because:

  • They prevent analytics from becoming fragile or inconsistent
  • They enable speed without sacrificing trust
  • They scale data usage across the organization
  • They reduce technical debt and operational risk

Without strong data engineering, analytics becomes slow, unreliable, and difficult to scale.


Final Thoughts

A Data Engineer’s job is not just moving data from one place to another. It is about designing systems that make data dependable, usable, and sustainable.

When Data Engineers do their job well, everyone downstream—from analysts to executives—can focus on asking better questions instead of questioning the data itself.

Good luck on your data journey!

SQL Tips: How to generate insert statements using the data from the output of a select statement using Toad

If you need to build insert statements for many rows of data, whether to load that data into another table or into the same table in a different environment, Toad provides a convenient way to do it. This is often needed when moving data between environments.

In Toad, execute the appropriate select statement on the source table …

Select * from [my_schema].[my_simple_table];

I used a simple “select *” above, but your SQL statement can be any valid SQL statement that returns the data you want to insert into the other table. You may add specific columns, add filters, joins, and any other valid SQL operation.

Let’s say you want to insert the output into another table in a different schema.

Right-click on the output result data, and click “Export Dataset…”

From the “Export format” drop down menu, choose “Insert Statements”

In the Output section of the Export Dataset dialog box, enter the location and name of the Insert Script file that will be generated.

There are several other parameters that you could choose but we won’t cover them all here.

If you only want to generate inserts for some selected rows, select “Export only selected rows”.

If you need to specify the schema of the target table, select “Include schema name”

In the Table section, enter the name of the target schema and table

Note, there are data masking options available that can be very useful if, for example, you are moving some data from a Production environment to a Non-Production environment, and you do not want to expose the data there.

After you have set the parameters relevant to your scenario, click “OK”.

The Insert Script file containing all the data will be generated with the filename and at the location you specified, and the insert statements will include the schema and table names you specified.
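
If Toad is not available, the same idea can be approximated in a few lines of code. The following is a rough sketch, not Toad's implementation: it uses Python's built-in sqlite3 module as a stand-in database, and the helper names (sql_literal, insert_statements), the sample rows, and the target name target_schema.my_simple_table are all hypothetical. It only handles NULLs, numbers, and strings.

```python
import sqlite3

def sql_literal(value):
    """Render a Python value as a SQL literal: NULL, a number, or a quoted string."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quotes so the string stays valid SQL.
    return "'" + str(value).replace("'", "''") + "'"

def insert_statements(cursor, target_table):
    """Build one INSERT statement per row returned by an executed SELECT."""
    columns = ", ".join(col[0] for col in cursor.description)
    statements = []
    for row in cursor:
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {target_table} ({columns}) VALUES ({values});")
    return statements

# Hypothetical source table and data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_simple_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO my_simple_table VALUES (?, ?)", [(1, "O'Brien"), (2, None)])

cursor = conn.execute("SELECT * FROM my_simple_table ORDER BY id")
statements = insert_statements(cursor, "target_schema.my_simple_table")
print("\n".join(statements))
```

Dates, binary values, and database-specific types would need their own formatting rules, which is one reason the built-in export is usually the safer choice.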

Thanks for reading!

SQL Tips: How to rename a column in a table – Oracle database – Oracle SQL

At times you will need to change the name of a column in an existing table. If you are not changing the data type, it is just one statement / step that needs to be executed. However, I strongly recommend that you also do a backup step, especially if you’re making the change in a production environment, just in case of an unexpected issue.

If you choose to do the backup, you may perform this with a “create-table-as-select” statement in this form:

create table [table_name_backup] as select * from [table_name];

Here is an example of the above statement:

create table EMPLOYEES_BKUP as select * from EMPLOYEES;

Now that the table you are modifying is all backed up, you can proceed to rename the column.

The rename SQL statement would take this form:

alter table [table_name] rename column [existing_column_name] to [new_column_name];

An example of the statement:

alter table EMPLOYEES rename column SEX to GENDER;
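
The backup-then-rename sequence can be rehearsed outside Oracle before touching a real table. This is a sketch using Python's built-in sqlite3 module as a stand-in; it assumes the bundled SQLite is version 3.25 or later, which is when RENAME COLUMN was added. The EMP_ID column, the sample row, and the column_names helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout and data.
conn.execute("CREATE TABLE EMPLOYEES (EMP_ID INTEGER, SEX TEXT)")
conn.execute("INSERT INTO EMPLOYEES VALUES (1, 'F')")

def column_names(table):
    """List a table's columns via SQLite's PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Step 1: back up the table with create-table-as-select.
conn.execute("CREATE TABLE EMPLOYEES_BKUP AS SELECT * FROM EMPLOYEES")

# Step 2: rename the column in the original table.
conn.execute("ALTER TABLE EMPLOYEES RENAME COLUMN SEX TO GENDER")

renamed = column_names("EMPLOYEES")
backup = column_names("EMPLOYEES_BKUP")
backup_rows = conn.execute("SELECT COUNT(*) FROM EMPLOYEES_BKUP").fetchone()[0]
print(renamed, backup, backup_rows)
```

The backup keeps the old column name and the original rows, so it can be used to compare or restore if the rename causes an unexpected issue.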

Thanks for reading! I hope you found this information useful.

SQL tips: Get last day and first day of month – Oracle SQL

At times you may need to dynamically determine the first or last day of a month based on the current date or some date in your data. This post provides a few options for deriving the first day and the last day of the month, along with a couple of tweaks for getting the first or last day of the next or previous month.

Get the current date (today’s date):
select sysdate from dual;

SYSDATE

16-MAY-22

Get the last day of the current month:
select trunc(last_day(sysdate)) as LastDayOfMonth from dual;

LASTDAYOFMONTH

31-MAY-22

Side note: The last_day() function can also be useful for dynamically determining leap years (that is, whether February has 28 or 29 days)
select
    last_day(date '2020-02-01') LastDayOfFeb2020, -- leap year
    last_day(date '2021-02-01') LastDayOfFeb2021 -- not a leap year
from dual;

LASTDAYOFFEB2020 LASTDAYOFFEB2021

29-FEB-20 28-FEB-21

Get the last day of the next month:
select add_months(trunc(last_day(sysdate)), 1) as LastDayOfNextMonth from dual;

LASTDAYOFNEXTMONTH

30-JUN-22

Get the last day of the previous month:
select add_months(trunc(last_day(sysdate)), -1) as LastDayOfPreviousMonth from dual;

LASTDAYOFPREVIOUSMONTH

30-APR-22

Get the first day of the current month:

select trunc(sysdate,'month') as FirstDayOfMonth from dual;
select trunc(sysdate,'MM') as FirstDayOfMonth from dual;

FIRSTDAYOFMONTH

01-MAY-22

Get the first day of the next month:

select add_months(trunc(sysdate,'MM'), 1) as FirstDayOfNextMonth from dual;

FIRSTDAYOFNEXTMONTH

01-JUN-22

You can also get the first day of the next month using this:
select trunc(last_day(sysdate)+1) as FirstDayOfNextMonth from dual;

FIRSTDAYOFNEXTMONTH

01-JUN-22

Get the first day of the previous month:

select add_months(trunc(sysdate,'MM'), -1) as FirstDayOfPreviousMonth from dual;

FIRSTDAYOFPREVIOUSMONTH

01-APR-22
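
If you need the same values outside the database, for example to cross-check query results in a script, the logic can be sketched with Python's standard library. The function names below are hypothetical helpers written for this example, and the date matches the sample output shown in this post (16-MAY-22).

```python
import calendar
from datetime import date

def first_day_of_month(d):
    """First day of d's month, like trunc(sysdate, 'MM')."""
    return d.replace(day=1)

def last_day_of_month(d):
    """Last day of d's month, like last_day(sysdate)."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

def shift_month_start(d, months):
    """First day of the month that is `months` away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

today = date(2022, 5, 16)

print(first_day_of_month(today))                       # first day of current month
print(last_day_of_month(today))                        # last day of current month
print(shift_month_start(today, 1))                     # first day of next month
print(shift_month_start(today, -1))                    # first day of previous month
print(last_day_of_month(shift_month_start(today, 1)))  # last day of next month
print(last_day_of_month(shift_month_start(today, -1))) # last day of previous month
```

Shifting from the first day of the month avoids invalid dates such as 31 June, since every month has a day 1.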

For getting the first and last day of the month values on a SQL Server database, see:

https://zarez.net/?p=2462

Thanks for reading! I hope you found this information useful!

Good luck on your data journey.