Working With SQL’s Window Functions

Karthik Bhandary
Level Up Coding
Published in
4 min readJul 18, 2022

--

SQL is a crucial skill for a Software Engineer, be it a Data Scientist, Data Engineer, or anyone else. In this blog, I will be going over how to use SQL’s Window functions.

Photo by Laura Cleffmann on Unsplash

Window Functions:

Window Functions — The name may intimidate you, but you don’t need to worry. Window functions are the functions that work with a group of rows instead of a single one. The “Window” in the “Window Functions” refers to a set of rows.

They can be used for many different things one of the examples is performing aggregate. Typically when we aggregate we need to group by the non-aggregated columns, otherwise, we’ll be getting an error. When using Window Functions we don’t need to worry about that.

Now coming to the syntax, we define a Window Function by using the OVER().

Syntax:

function() OVER(...) AS alias

The above syntax can be interpreted as follows:

A function is getting applied Over a window.

Now an example. Let’s say we have a match table. What we want to do is that we want to find out “How many goals were scored in each match and how did that compare to the overall average from 2011/2012”

The query looks something like this:

SELECT date, 
(home_goals + away_goals) AS goals,
AVG(home_goals + away_goals) OVER() AS overall_avg
FROM match
WHERE season = '2011/2012';

In the above query, home_goals means goals scored as a home team, and away_goals means, goals scored as an away team if you observe we did not GROUP BY date since we are using a Window Function.

Generating a RANK:

As the name suggests it will generate a RANK. But we are going to add an ORDER BY clause inside of the OVER(). If it two values are the same, then it ties those values and skips the next value in the rank(You will understand when you take a look at the example).

Example:

Using the same match table we are going to be ranking the total goals.

SELECT date,
(home_goal + away_goal) AS goals,
RANK() OVER(ORDER BY home_goal + away_goal) AS goal_rank
FROM match
WHERE season = '2011/2012';

The above would result in something like this:

Goals    goal_rank
10 1
10 1
10 1
10 1
9 5
8 6
8 6
7 8

As you can see if the goals are tied they all are getting the same value but when it comes to the next one it is skipping right to the next rank in count (Like 1 repeated 4 times so the next is 5) We can arrange in decrease order as well by mentioning DESC after the col name in ORDER BY.

If you want to avoid skipping ranks then you can use DENSE_RANK() which is the same as rank except it does not skip the ranks. The RANK() function is processed after every part of the query except ORDER BY. Meaning that it uses information from the result rather than from the table.

PARTITION:

Let’s say you are in a situation where you want to calculate AVG() for each season separated. In this situation, you can use the PARTITION BY clause in the OVER().

Take a look at this example. “How many goals were scored in each match and how did they compare to the season’s average?”

SELECT date,
(home_goal+away_goal) AS goals,
AVG(home_goals+away_goal) OVER(PARTITION BY season) AS season_avg
FROM match;

The above query returns a table with each match’s goal along with the season average.

date  goals   season_avg
2010 4 4.55
2010 5 4.55
2010 1 4.55
2011 3 6.55
2011 6 6.55
2011 2 6.55

The result may look something like this. Now if we want to PARTITION BY more than one column, we can do that by separating the columns with a “,”.

Sliding Window:

It is a kind of window function that basically is a time-based or row-based window. You can apply analysis to a given set of rows.

This too uses OVER() but the only thing extra is that it uses the following:

ROWS BETWEEN <start> AND <finish>

We substitute some keywords in the place of start and finish. Those keywords are:

  • PRECEDING — rows before the current row.
  • FOLLOWING — rows after the current row.
  • UNBOUNDED PRECEDING — slides to the edge before the current row.
  • UNBOUNDED FOLLOWING — slides to edge row after current row
  • CURRENT ROW — refers to the current row.

Now I will show you a query let me know what it does in the comment section.

SELECT date,
home_goal,
away_goal,
SUM(home_goal)
OVER( ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM match
WHERE hometeam_id = 8456 AND season = '2011/2012';

Let me know your answer. Help each other out in the comment section.

Conclusion:

In this blog, we took a look at:

  • What a window function is?
  • How to work with them?
  • How to RANK?
  • How to Partition the result?
  • What sliding windows are?

I hope this blog was helpful. Follow me on LinkedIn. If you liked my work then buy me a cup of coffee: dataguy6@ybl

Also, check out my recent works:

Level Up Coding

Thanks for being a part of our community! More content in the Level Up Coding publication.
Follow: Twitter, LinkedIn, Newsletter
Level Up is transforming tech recruiting 👉 Join our talent collective

--

--