Tag: SQL

What Exactly Does a Data Engineer Do?

A Data Engineer is responsible for building and maintaining the systems that allow data to be collected, stored, transformed, and delivered reliably for analytics and downstream use cases. While Data Analysts focus on insights and decision-making, Data Engineers focus on making data available, trustworthy, and scalable.

In many organizations, nothing in analytics works well without strong data engineering underneath it.


The Core Purpose of a Data Engineer

At its core, the role of a Data Engineer is to:

  • Design and build data pipelines
  • Ensure data is reliable, timely, and accessible
  • Create the foundation that enables analytics, reporting, and data science

Data Engineers make sure that when someone asks a question of the data, the data is actually there—and correct.


Typical Responsibilities of a Data Engineer

While the exact responsibilities vary by company size and maturity, most Data Engineers spend time across the following areas.


Ingesting Data from Source Systems

Data Engineers build processes to ingest data from:

  • Operational databases
  • SaaS applications
  • APIs and event streams
  • Files and external data sources

This ingestion can be batch-based, streaming, or a mix of both, depending on the business needs.


Building and Maintaining Data Pipelines

Once data is ingested, Data Engineers:

  • Transform raw data into usable formats
  • Handle schema changes and data drift
  • Manage dependencies and scheduling
  • Monitor pipelines for failures and performance issues

Pipelines must be repeatable, resilient, and observable.


Managing Data Storage and Platforms

Data Engineers design and maintain:

  • Data warehouses and lakehouses
  • Data lakes and object storage
  • Partitioning, indexing, and performance strategies

They balance cost, performance, scalability, and ease of use while aligning with organizational standards.


Ensuring Data Quality and Reliability

A key responsibility is ensuring data can be trusted. This includes:

  • Validating data completeness and accuracy
  • Detecting anomalies or missing data
  • Implementing data quality checks and alerts
  • Supporting SLAs for data freshness

Reliable data is not accidental—it is engineered.


Enabling Analytics and Downstream Use Cases

Data Engineers work closely with:

  • Data Analysts and BI developers
  • Analytics engineers
  • Data scientists and ML engineers

They ensure datasets are structured in a way that supports efficient querying, consistent metrics, and self-service analytics.


Common Tools Used by Data Engineers

The exact toolset varies, but Data Engineers often work with:

  • Databases & Warehouses (e.g., cloud data platforms)
  • ETL / ELT Tools and orchestration frameworks
  • SQL for transformations and validation
  • Programming Languages such as Python, Java, or Scala
  • Streaming Technologies for real-time data
  • Infrastructure & Cloud Platforms
  • Monitoring and Observability Tools

Tooling matters, but design decisions matter more.


What a Data Engineer Is Not

Understanding role boundaries helps teams work effectively.

A Data Engineer is typically not:

  • A report or dashboard builder
  • A business stakeholder defining KPIs
  • A data scientist focused on modeling and experimentation
  • A system administrator managing only infrastructure

That said, in smaller teams, Data Engineers may wear multiple hats.


What the Role Looks Like Day-to-Day

A typical day for a Data Engineer might include:

  • Investigating a failed pipeline or delayed data load
  • Updating transformations to accommodate schema changes
  • Optimizing a slow query or job
  • Reviewing data quality alerts
  • Coordinating with analysts on new data needs
  • Deploying pipeline updates

Much of the work is preventative—ensuring problems don’t happen later.


How the Role Evolves Over Time

As organizations mature, the Data Engineer role evolves:

  • From manual ETL → automated, scalable pipelines
  • From siloed systems → centralized platforms
  • From reactive fixes → proactive reliability engineering
  • From data movement → data platform architecture

Senior Data Engineers often influence platform strategy, standards, and long-term technical direction.


Why Data Engineers Are So Important

Data Engineers are critical because:

  • They prevent analytics from becoming fragile or inconsistent
  • They enable speed without sacrificing trust
  • They scale data usage across the organization
  • They reduce technical debt and operational risk

Without strong data engineering, analytics becomes slow, unreliable, and difficult to scale.


Final Thoughts

A Data Engineer’s job is not just moving data from one place to another. It is about designing systems that make data dependable, usable, and sustainable.

When Data Engineers do their job well, everyone downstream—from analysts to executives—can focus on asking better questions instead of questioning the data itself.

Good luck on your data journey!

SQL Tips: How to generate insert statements using the data from the output of a select statement using Toad

If you need to build insert statements for many rows of data for inserting that data into another table or into the same table in a different environment, there is a convenient way to do this in Toad. This is often needed to move data around.

In Toad, execute the appropriate select statement on the source table …

Select * from [my_schema].[my_simple_table];

I used a simple “select *” above, but your SQL statement can be any valid SQL statement that returns the data you want to insert into the other table. You may add specific columns, add filters, joins, and any other valid SQL operation.

Let’s say you want to insert the output into another table in a different schema.

Right-click on the output result data, and click “Export Dataset…”

From the “Export format” drop down menu, choose “Insert Statements”

In the Output section of the Export Dataset dialog box, enter the location and name of the Insert Script file that will be generated.

There are several other parameters that you could choose but we won’t cover them all here.

If you only wanted to generate inserts for some selected rows, select “Export only selected rows”.

If you need to specify the schema of the target table, select “Include schema name”

In the Table section, enter the name of the target schema and table

Note, there are data masking options available that can be very useful if, for example, you are moving some data from a Production environment to a Non-Production environment, and you do not want to expose the data there.

After you have set the parameters relevant to your scenario, Click “OK”.

The Insert Script file of all the data will be generated with the filename and at the location you specified. And the insert statements will include the name of the schema and table you specified.

Thanks for reading!

SQL Tips: How to rename a column in a table – Oracle database – Oracle SQL

At times you will need to change the name of a column in an existing table. If you are not changing the data type, it is just one statement / step that needs to be executed. However, I strongly recommend that you also do a backup step, especially if you’re making the change in a production environment, just in case of an unexpected issue.

If you choose to do the backup, you may perform this with a “create-table-as-select” statement in this form:

create table [table_name_backup] as select * from [table_name];

Here is an example of the above statement:

create table EMPLOYEES_BKUP as select * from EMPLOYEES;

Now that the table you are modifying is all backed up, you can proceed to rename the column.

The rename SQL statement would take this form:

alter table [table_name] rename column [existing_column_name] to [new_column_name];

An example of the statement:

alter table EMPLOYEES rename column SEX to GENDER;

Thanks for reading! I hope you found this information useful.

SQL tips: Get last day and first day of month – Oracle SQL

At times you may need to dynamically determine the first or last day of a month based on the current date or some date in your data. This post provides a few options for deriving the First day of month, and Last day of month. And also, it provides a couple tweaks to get Last day of the next or previous month.

Get the current date (today’s date):
select sysdate from dual;

SYSDATE

16-MAY-22

Get the last day of the current month:
select trunc(last_day(sysdate)) as LastDayOfMonth from dual;

LASTDAYOFMONTH

31-MAY-22

Side note: The last_day() function can also be useful for dynamically determining leap years (that is, whether February has 28 or 29 days)
select
last_day(date ‘2020-02-01’) LastDayOfFeb2020, — leap year
last_day(date ‘2021-02-01’) LastDayOfFeb2021 — not a leap year
from dual;

LASTDAYOFFEB2020 LASTDAYOFFEB2021

29-FEB-20 28-FEB-21

Get the last day of the next month:
select add_months(trunc(last_day(sysdate)), 1) as LastDayOfNextMonth from dual;

LASTDAYOFNEXTMONTH

30-JUN-22

Get the last day of the previous month:
select add_months(trunc(last_day(sysdate)), -1) as LastDayOfPreviousMonth from dual;

LASTDAYOFPREVIOUSMONTH

30-APR-22

Get the first day of the current month:

select trunc(sysdate,’month’) as FirstDayOfMonth from dual;
select trunc(sysdate,’MM’) as FirstDayOfMonth from dual;

FIRSTDAYOFMONTH

01-MAY-22

Get the first day of the next month:

select add_months(trunc(sysdate,’MM’), 1) as FirstDayOfNextMonth from dual;

FIRSTDAYOFNEXTMONTH

01-JUN-22

You can also get the first day of the next month using this:
select trunc(last_day(sysdate)+1) as FirstDayOfNextMonth from dual;

FIRSTDAYOFNEXTMONTH

01-JUN-22

Get the first day of the previous month:

select add_months(trunc(sysdate,’MM’), -1) as FirstDayOfPreviousMonth from dual;

FIRSTDAYOFPREVIOUSMONTH

01-APR-22

Here are 3 resources for getting the first and last day of the month values on a SQL Server database:

https://zarez.net/?p=2462

Thanks for reading! I hope you found this information useful!

Good luck on your data journey.