Understanding Rolling Window Counts with SQL: A Recursive Query Solution

Understanding Rolling Window Counts with SQL

In this article, we will delve into the world of rolling window counts in SQL. Specifically, we’ll explore how to calculate counts based on a 90-day window per unique ID. This problem can be challenging due to the need for complex date calculations and counting logic.

Problem Statement

The problem involves a table with id and date columns, where multiple transactions can occur within a 90-day window. The goal is to create new columns (window and count) that display the start of each 90-day window and the count of transactions occurring during those windows.

Current Approach

The original question suggests using a combination of CASE, MIN(date) OVER (PARTITION BY id), and DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) to calculate the window column. It also proposes using ROW_NUMBER() OVER (PARTITION BY id, window) to increment the count.

However, this approach is limited due to the complexities of SQL window functions and the need for recursive logic.

Recursive Query Solution

The provided answer outlines a recursive query solution, which is more suitable for solving this problem. This approach involves two key steps: ranking records by date within each ID partition and then traversing the data set to compute the starting date of each group.

Step 1: Ranking Records by Date

with 
    tab as (
        select t.*, row_number() over(partition by id order by date) rn
        from mytable t
    )

In this step, we rank records having the same id by increasing date. This is done using the ROW_NUMBER() function with a partitioning clause that groups rows by id and orders them by date.

Step 2: Traversing the Data Set

cte as (
    select id, date, rn, date date0 from tab where rn = 1
    union all
    select t.id, t.date, t.rn, greatest(t.date, c.date + interval '90' day)
    from cte c
    inner join tab t on t.id = c.id and t.rn = c.rn + 1
)

Here, we define a recursive CTE (Common Table Expression) that starts with the first record for each id partition. Then, we traverse the data set by joining the CTE with the original table, keeping only rows where the rn value is one more than the previous row’s rn. We calculate the starting date of each group as the maximum of the current date and 90 days ago from the start of the previous group.

Final Step: Counting Transactions

select
    id,
    date,
    dense_rank() over(partition by id order by date0) grp,
    count(*)     over(partition by id order by date0, date) cnt
from cte

In this final step, we rank the records within each id partition by their start date (date0). We then calculate the total count of transactions within each 90-day window using a subquery with an outer clause.

Understanding the Recursive Query

The recursive query is composed of three main components:

  1. Ranking Records: The first part ranks records by id and date, creating a sequence of rows for each id.
  2. Traversing the Data Set: The second part traverses this ranked data set, computing the start date of each group (i.e., the first day of the next 90-day window).
  3. Counting Transactions: The final step counts the number of transactions within each group.

By combining these components, we obtain a comprehensive solution that accurately calculates rolling window counts for unique IDs.

Example Use Case

Suppose we have a table mytable with the following data:

iddate
12019-07-01
12019-12-31
12020-01-23
12020-02-12
12020-04-01
22019-06-30
22019-08-14

Using the recursive query solution outlined above, we can calculate the rolling window counts for each id as follows:

with 
    tab as (
        select id, date, row_number() over(partition by id order by date) rn
        from mytable
    )
,
cte as (
    select id, date, rn, date date0
    from tab
    where rn = 1
    union all
    select t.id, t.date, t.rn, greatest(t.date, cte.date0 + interval '90' day)
    from cte c
    inner join tab t on t.id = c.id and t.rn = c.rn + 1
),
cte2 as (
    select 
        id,
        date,
        dense_rank() over(partition by id order by date0) grp,
        count(*)     over(partition by id order by date0, date) cnt
    from cte
)
select * from cte2

The resulting CTE cte2 will contain the calculated rolling window counts for each id, including the start of each group and the total transaction count within each 90-day window.

Conclusion

In this article, we explored a complex problem involving rolling window counts in SQL. We introduced a recursive query solution that accurately calculates these counts by ranking records by date within each ID partition and then traversing the data set to compute the start date of each group. By applying this approach, you can efficiently solve similar problems in your own projects.

Note: The provided code snippets are written in PostgreSQL syntax and may need modifications for other SQL dialects.


Last modified on 2023-08-04