Select, Filter, and Aggregate Data by Using KQL

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Query and analyze data
--> Select, filter, and aggregate data by using KQL

The Kusto Query Language (KQL) is a read-only request language used for querying large, distributed, event-driven datasets — especially within Eventhouse and Azure Data Explorer–backed workloads in Microsoft Fabric. KQL enables you to select, filter, and aggregate data efficiently in scenarios involving high-velocity data like telemetry, logs, and streaming events.

For the DP-600 exam, you should understand KQL basics and how it supports data exploration and analytical summarization in a real-time analytics context.


KQL Basics

KQL is designed to be expressive and performant for time-series or log-like data. Queries are built as a pipeline of operations, where each operator transforms the data and passes it to the next.


Selecting Data

In KQL, the project operator performs the equivalent of selecting columns:

EventHouseTable
| project Timestamp, Country, EventType, Value

  • project lets you choose which fields to include
  • You can rename fields inline: | project Time=Timestamp, Sales=Value

Exam Tip:
Use project early to limit data to relevant columns and reduce processing downstream.


Filtering Data

Filtering in KQL is done using the where operator:

EventHouseTable
| where Country == "USA"

Multiple conditions can be combined with and/or:

| where Value > 100 and EventType == "Purchase"

Filtering early in the pipeline improves performance by reducing the dataset before subsequent transformations.


Aggregating Data

KQL uses the summarize operator to perform aggregations such as counts, sums, averages, min, max, etc.

Example – Aggregate Total Sales:

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value)

Example – Grouped Aggregation:

EventHouseTable
| where EventType == "Purchase"
| summarize CountEvents = count(), TotalSales = sum(Value) by Country

Time-Bucketed Aggregation

KQL supports time binning using bin():

EventHouseTable
| where EventType == "Purchase"
| summarize TotalSales = sum(Value) by Country, bin(Timestamp, 1h)

This groups results into hourly buckets, which is ideal for time-series analytics and dashboards.


Common KQL Aggregation Functions

FunctionDescription
count()Total number of records
sum(column)Sum of numeric values
avg(column)Average value
min(column) / max(column)Minimum / maximum value
percentile(column, p)Percentile calculation

Combining Operators

KQL queries are often a combination of select, filter, and aggregation:

EventHouseTable
| where EventType == "Purchase" and Timestamp >= ago(7d)
| project Country, Value, Timestamp
| summarize TotalSales = sum(Value), CountPurchases = count() by Country
| order by TotalSales desc

This pipeline:

  1. Filters for purchases in the last 7 days
  2. Projects relevant fields
  3. Aggregates totals and counts
  4. Orders the result by highest total sales

KQL vs SQL: What’s Different?

FeatureSQLKQL
SyntaxDeclarativePipeline-based
JoinsExtensive supportLimited pivot semantics
Use casesRelational dataTime-series, event, logs
AggregationGROUP BYsummarize

KQL shines when querying streaming or event data at scale — exactly the kinds of scenarios Eventhouse targets.


Performance Considerations in KQL

  • Apply where as early as possible.
  • Use project to keep only necessary fields.
  • Time-range filters (e.g., last 24h) drastically reduce scan size.
  • KQL runs distributed and is optimized for large event streams.

Practical Use Cases

Example – Top Countries by Event Count:

EventHouseTable
| summarize EventCount = count() by Country
| top 10 by EventCount

Example – Average Value of Events per Day:

EventHouseTable
| where EventType == "SensorReading"
| summarize AvgValue = avg(Value) by bin(Timestamp, 1d)


Exam Relevance

In DP-600 exam scenarios involving event or near-real-time analytics (such as with Eventhouse or KQL-backed lakehouse sources), you may be asked to:

  • Write or interpret KQL that:
    • projects specific fields
    • filters records based on conditions
    • aggregates and groups results
  • Choose the correct operator (where, project, summarize) for a task
  • Understand how KQL can be optimized with time-based filtering

Key Takeaways

  • project selects specific fields.
  • where filters rows based on conditions.
  • summarize performs aggregations.
  • Time-series queries often use bin() for bucketing.
  • The KQL pipeline enables modular, readable, and optimized queries for large datasets.

Final Exam Tips

If a question involves event streams, telemetry, metrics over time, or real-time analytics, and asks about summarizing values after filtering, think KQL with where, project, and summarize.

  • project → select columns
  • where → filter rows
  • summarize → aggregate and group
  • bin() → time-based grouping
  • KQL is pipeline-based, not declarative like SQL
  • Used heavily in Eventhouse / real-time analytics

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identifying and understand why an option is correct (or incorrect) — not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions

1. Which KQL operator is used to select specific columns from a dataset?

A. select
B. where
C. project
D. summarize

Correct Answer: C

Explanation:
project is the KQL operator used to select and optionally rename columns. KQL does not use SELECT like SQL.


2. Which operator is used to filter rows in a KQL query?

A. filter
B. where
C. having
D. restrict

Correct Answer: B

Explanation:
The where operator filters rows based on conditions and is typically placed early in the query pipeline for performance.


3. How do you count the number of records in a table using KQL?

A. count(*)
B. summarize count()
C. summarize count(*)
D. summarize count()

Correct Answer: D

Explanation:
In KQL, aggregation functions are used inside summarize. count() counts rows; count(*) is SQL syntax.


4. Which KQL operator performs aggregations similar to SQL’s GROUP BY?

A. group
B. aggregate
C. summarize
D. partition

Correct Answer: C

Explanation:
summarize is the KQL operator used for aggregation and grouping.


5. Which query returns total sales grouped by country?

A.

| group by Country sum(Value)

B.

| summarize sum(Value) Country

C.

| summarize TotalSales = sum(Value) by Country

D.

| aggregate Value by Country

Correct Answer: C

Explanation:
KQL requires explicit naming of aggregates and grouping using summarize … by.


6. What is the purpose of the bin() function in KQL?

A. To sort data
B. To group numeric values
C. To bucket values into time intervals
D. To remove null values

Correct Answer: C

Explanation:
bin() groups values—commonly timestamps—into fixed-size intervals (for example, hourly or daily buckets).


7. Which query correctly summarizes event counts per hour?

A.

| summarize count() by Timestamp

B.

| summarize count() by hour(Timestamp)

C.

| summarize count() by bin(Timestamp, 1h)

D.

| count() by Timestamp

Correct Answer: C

Explanation:
Time-based grouping in KQL requires bin() to define the interval size.


8. Which operator should be placed as early as possible in a KQL query for performance reasons?

A. summarize
B. project
C. order by
D. where

Correct Answer: D

Explanation:
Applying where early reduces the dataset size before further processing, improving performance.


9. Which KQL query returns the top 5 countries by event count?

A.

| top 5 Country by count()

B.

| summarize count() by Country | top 5 by count_

C.

| summarize EventCount = count() by Country | top 5 by EventCount

D.

| order by Country limit 5

Correct Answer: C

Explanation:
You must first aggregate using summarize, then use top based on the aggregated column.


10. In Microsoft Fabric, KQL is primarily used with which workload?

A. Warehouse
B. Lakehouse SQL endpoint
C. Eventhouse
D. Semantic model

Correct Answer: C

Explanation:
KQL is the primary query language for Eventhouse and real-time analytics scenarios in Microsoft Fabric.


One thought on “Select, Filter, and Aggregate Data by Using KQL”

Leave a comment