Select, Filter, and Aggregate Data by Using KQL

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data 
    --> Query and analyze data 
        --> Select, filter, and aggregate data by using KQL

The Kusto Query Language (KQL) is a read-only request language used for querying large, distributed, event-driven datasets — especially within Eventhouse and Azure Data Explorer–backed workloads in Microsoft Fabric. KQL enables you to select, filter, and aggregate data efficiently in scenarios involving high-velocity data like telemetry, logs, and streaming events.

For the DP-600 exam, you should understand KQL basics and how it supports data exploration and analytical summarization in a real-time analytics context.

KQL Basics

KQL is designed to be expressive and performant for time-series or log-like data. Queries are built as a pipeline of operations, where each operator transforms the data and passes it to the next.

Selecting Data

In KQL, the project operator performs the equivalent of selecting columns:

EventHouseTable
| project Timestamp, Country, EventType, Value

project lets you choose which fields to include
You can rename fields inline: | project Time=Timestamp, Sales=Value

Exam Tip:
Use project early to limit data to relevant columns and reduce processing downstream.

Filtering Data

Filtering in KQL is done using the where operator:

EventHouseTable
| where Country == "USA"

Multiple conditions can be combined with and/or:

| where Value > 100 and EventType == "Purchase"

Filtering early in the pipeline improves performance by reducing the dataset before subsequent transformations.

Aggregating Data

KQL uses the summarize operator to perform aggregations such as counts, sums, averages, min, max, etc.

Example – Aggregate Total Sales:

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value)

Example – Grouped Aggregation:

EventHouseTable
| where EventType == "Purchase"
| summarize CountEvents = count(), TotalSales = sum(Value) by Country

Time-Bucketed Aggregation

KQL supports time binning using bin():

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value) by Country, bin(Timestamp, 1h)

This groups results into hourly buckets, which is ideal for time-series analytics and dashboards.

Common KQL Aggregation Functions

Function	Description
`count()`	Total number of records
`sum(column)`	Sum of numeric values
`avg(column)`	Average value
`min(column)` / `max(column)`	Minimum / maximum value
`percentile(column, p)`	Percentile calculation

Combining Operators

KQL queries are often a combination of select, filter, and aggregation:

EventHouseTable
| where EventType == "Purchase" and Timestamp >= ago(7d)
| project Country, Value, Timestamp
| summarize TotalSales = sum(Value), CountPurchases = count() by Country
| order by TotalSales desc

This pipeline:

Filters for purchases in the last 7 days
Projects relevant fields
Aggregates totals and counts
Orders the result by highest total sales

KQL vs SQL: What’s Different?

Feature	SQL	KQL
Syntax	Declarative	Pipeline-based
Joins	Extensive support	Limited pivot semantics
Use cases	Relational data	Time-series, event, logs
Aggregation	`GROUP BY`	`summarize`

KQL shines when querying streaming or event data at scale — exactly the kinds of scenarios Eventhouse targets.

Performance Considerations in KQL

Apply where as early as possible.
Use project to keep only necessary fields.
Time-range filters (e.g., last 24h) drastically reduce scan size.
KQL runs distributed and is optimized for large event streams.

Practical Use Cases

Example – Top Countries by Event Count:

EventHouseTable
| summarize EventCount = count() by Country
| top 10 by EventCount

Example – Average Value of Events per Day:

EventHouseTable
| where EventType == "SensorReading"
| summarize AvgValue = avg(Value) by bin(Timestamp, 1d)

Exam Relevance

In DP-600 exam scenarios involving event or near-real-time analytics (such as with Eventhouse or KQL-backed lakehouse sources), you may be asked to:

Write or interpret KQL that:
- projects specific fields
- filters records based on conditions
- aggregates and groups results
Choose the correct operator (where, project, summarize) for a task
Understand how KQL can be optimized with time-based filtering

Key Takeaways

project selects specific fields.
where filters rows based on conditions.
summarize performs aggregations.
Time-series queries often use bin() for bucketing.
The KQL pipeline enables modular, readable, and optimized queries for large datasets.

Final Exam Tips

If a question involves event streams, telemetry, metrics over time, or real-time analytics, and asks about summarizing values after filtering, think KQL with where, project, and summarize.

project → select columns
where → filter rows
summarize → aggregate and group
bin() → time-based grouping
KQL is pipeline-based, not declarative like SQL
Used heavily in Eventhouse / real-time analytics

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

Identifying and understand why an option is correct (or incorrect) — not just which one
Look for and understand the usage scenario of keywords in exam questions to guide you
Expect scenario-based questions rather than direct definitions

1. Which KQL operator is used to select specific columns from a dataset?

A. select
B. where
C. project
D. summarize

✅ Correct Answer: C

Explanation:
project is the KQL operator used to select and optionally rename columns. KQL does not use SELECT like SQL.

2. Which operator is used to filter rows in a KQL query?

A. filter
B. where
C. having
D. restrict

✅ Correct Answer: B

Explanation:
The where operator filters rows based on conditions and is typically placed early in the query pipeline for performance.

3. How do you count the number of records in a table using KQL?

A. count(*)
B. summarize count()
C. summarize count(*)
D. summarize count()

✅ Correct Answer: D

Explanation:
In KQL, aggregation functions are used inside summarize. count() counts rows; count(*) is SQL syntax.

4. Which KQL operator performs aggregations similar to SQL’s GROUP BY?

A. group
B. aggregate
C. summarize
D. partition

✅ Correct Answer: C

Explanation:
summarize is the KQL operator used for aggregation and grouping.

5. Which query returns total sales grouped by country?

| group by Country sum(Value)

| summarize sum(Value) Country

| summarize TotalSales = sum(Value) by Country

| aggregate Value by Country

✅ Correct Answer: C

Explanation:
KQL requires explicit naming of aggregates and grouping using summarize … by.

6. What is the purpose of the `bin()` function in KQL?

A. To sort data
B. To group numeric values
C. To bucket values into time intervals
D. To remove null values

✅ Correct Answer: C

Explanation:
bin() groups values—commonly timestamps—into fixed-size intervals (for example, hourly or daily buckets).

7. Which query correctly summarizes event counts per hour?

| summarize count() by Timestamp

| summarize count() by hour(Timestamp)

| summarize count() by bin(Timestamp, 1h)

| count() by Timestamp

✅ Correct Answer: C

Explanation:
Time-based grouping in KQL requires bin() to define the interval size.

8. Which operator should be placed as early as possible in a KQL query for performance reasons?

A. summarize
B. project
C. order by
D. where

✅ Correct Answer: D

Explanation:
Applying where early reduces the dataset size before further processing, improving performance.

9. Which KQL query returns the top 5 countries by event count?

| top 5 Country by count()

| summarize count() by Country | top 5 by count_

| summarize EventCount = count() by Country | top 5 by EventCount

| order by Country limit 5

✅ Correct Answer: C

Explanation:
You must first aggregate using summarize, then use top based on the aggregated column.

10. In Microsoft Fabric, KQL is primarily used with which workload?

A. Warehouse
B. Lakehouse SQL endpoint
C. Eventhouse
D. Semantic model

✅ Correct Answer: C

Explanation:
KQL is the primary query language for Eventhouse and real-time analytics scenarios in Microsoft Fabric.

The Data Community

Select, Filter, and Aggregate Data by Using KQL

For the DP-600 exam, you should understand KQL basics and how it supports data exploration and analytical summarization in a real-time analytics context.

KQL Basics

Selecting Data

Filtering Data

Aggregating Data

Example – Aggregate Total Sales:

Example – Grouped Aggregation:

Time-Bucketed Aggregation

Common KQL Aggregation Functions

Combining Operators

KQL vs SQL: What’s Different?

Performance Considerations in KQL

Practical Use Cases

Example – Top Countries by Event Count:

Example – Average Value of Events per Day:

Exam Relevance

Key Takeaways

Final Exam Tips

Practice Questions:

1. Which KQL operator is used to select specific columns from a dataset?

2. Which operator is used to filter rows in a KQL query?

3. How do you count the number of records in a table using KQL?

4. Which KQL operator performs aggregations similar to SQL’s GROUP BY?

5. Which query returns total sales grouped by country?

6. What is the purpose of the `bin()` function in KQL?

7. Which query correctly summarizes event counts per hour?

8. Which operator should be placed as early as possible in a KQL query for performance reasons?

9. Which KQL query returns the top 5 countries by event count?

10. In Microsoft Fabric, KQL is primarily used with which workload?

One thought on “Select, Filter, and Aggregate Data by Using KQL”

Leave a comment Cancel reply

Information and resources for the data professionals' community

For the DP-600 exam, you should understand KQL basics and how it supports data exploration and analytical summarization in a real-time analytics context.

KQL Basics

Selecting Data

Filtering Data

Aggregating Data

Example – Aggregate Total Sales:

Example – Grouped Aggregation:

Time-Bucketed Aggregation

Common KQL Aggregation Functions

Combining Operators

KQL vs SQL: What’s Different?

Performance Considerations in KQL

Practical Use Cases

Example – Top Countries by Event Count:

Example – Average Value of Events per Day:

Exam Relevance

Key Takeaways

Final Exam Tips

Practice Questions:

1. Which KQL operator is used to select specific columns from a dataset?

2. Which operator is used to filter rows in a KQL query?

3. How do you count the number of records in a table using KQL?

4. Which KQL operator performs aggregations similar to SQL’s GROUP BY?

5. Which query returns total sales grouped by country?

6. What is the purpose of the bin() function in KQL?

7. Which query correctly summarizes event counts per hour?

8. Which operator should be placed as early as possible in a KQL query for performance reasons?

9. Which KQL query returns the top 5 countries by event count?

10. In Microsoft Fabric, KQL is primarily used with which workload?

Share this:

Related

One thought on “Select, Filter, and Aggregate Data by Using KQL”

Leave a comment Cancel reply

Information and resources for the data professionals' community

6. What is the purpose of the `bin()` function in KQL?