Using array functions to mimic window-functions alike behavior

There are cases where you may need to mimic window functions using arrays in ClickHouse. This could be for optimization purposes, to better manage memory, or to enable on-disk spilling, especially if you’re working with an older version of ClickHouse that doesn’t natively support window functions.

Here’s an example demonstrating how to mimic a window function like runningDifference() using arrays:

Step 1: Create Sample Data

We’ll start by creating a test table with some sample data:

DROP TABLE IS EXISTS test_running_difference

CREATE TABLE test_running_difference
ENGINE = Log AS
SELECT 
    number % 20 AS id, 
    toDateTime('2010-01-01 00:00:00') + (intDiv(number, 20) * 15) AS ts, 
    (number * round(xxHash32(number % 20) / 1000000)) - round(rand() / 1000000) AS val
FROM numbers(100)


SELECT * FROM test_running_difference;

┌─id─┬──────────────────ts─┬────val─┐
  0  2010-01-01 00:00:00   -1209 
  1  2010-01-01 00:00:00      43 
  2  2010-01-01 00:00:00    4322 
  3  2010-01-01 00:00:00     -25 
  4  2010-01-01 00:00:00   13720 
  5  2010-01-01 00:00:00     903 
  6  2010-01-01 00:00:00   18062 
  7  2010-01-01 00:00:00   -2873 
  8  2010-01-01 00:00:00    6286 
  9  2010-01-01 00:00:00   13399 
 10  2010-01-01 00:00:00   18320 
 11  2010-01-01 00:00:00   11731 
 12  2010-01-01 00:00:00     857 
 13  2010-01-01 00:00:00    8752 
 14  2010-01-01 00:00:00   23060 
 15  2010-01-01 00:00:00   41902 
 16  2010-01-01 00:00:00   39406 
 17  2010-01-01 00:00:00   50010 
 18  2010-01-01 00:00:00   57673 
 19  2010-01-01 00:00:00   51389 
  0  2010-01-01 00:00:15   66839 
  1  2010-01-01 00:00:15   19440 
  2  2010-01-01 00:00:15   74513 
  3  2010-01-01 00:00:15   10542 
  4  2010-01-01 00:00:15   94245 
  5  2010-01-01 00:00:15    8230 
  6  2010-01-01 00:00:15   87823 
  7  2010-01-01 00:00:15    -128 
  8  2010-01-01 00:00:15   30101 
  9  2010-01-01 00:00:15   54321 
 10  2010-01-01 00:00:15   64078 
 11  2010-01-01 00:00:15   31886 
 12  2010-01-01 00:00:15    8749 
 13  2010-01-01 00:00:15   28982 
 14  2010-01-01 00:00:15   61299 
 15  2010-01-01 00:00:15   95867 
 16  2010-01-01 00:00:15   93667 
 17  2010-01-01 00:00:15  114072 
 18  2010-01-01 00:00:15  124279 
 19  2010-01-01 00:00:15  109605 
  0  2010-01-01 00:00:30  135082 
  1  2010-01-01 00:00:30   37345 
  2  2010-01-01 00:00:30  148744 
  3  2010-01-01 00:00:30   21607 
  4  2010-01-01 00:00:30  171744 
  5  2010-01-01 00:00:30   14736 
  6  2010-01-01 00:00:30  155349 
  7  2010-01-01 00:00:30   -3901 
  8  2010-01-01 00:00:30   54303 
  9  2010-01-01 00:00:30   89629 
 10  2010-01-01 00:00:30  106595 
 11  2010-01-01 00:00:30   54545 
 12  2010-01-01 00:00:30   18903 
 13  2010-01-01 00:00:30   48023 
 14  2010-01-01 00:00:30   97930 
 15  2010-01-01 00:00:30  152165 
 16  2010-01-01 00:00:30  146130 
 17  2010-01-01 00:00:30  174854 
 18  2010-01-01 00:00:30  189194 
 19  2010-01-01 00:00:30  170134 
  0  2010-01-01 00:00:45  207471 
  1  2010-01-01 00:00:45   54323 
  2  2010-01-01 00:00:45  217984 
  3  2010-01-01 00:00:45   31835 
  4  2010-01-01 00:00:45  252709 
  5  2010-01-01 00:00:45   21493 
  6  2010-01-01 00:00:45  221271 
  7  2010-01-01 00:00:45    -488 
  8  2010-01-01 00:00:45   76827 
  9  2010-01-01 00:00:45  131066 
 10  2010-01-01 00:00:45  149087 
 11  2010-01-01 00:00:45   71934 
 12  2010-01-01 00:00:45   25125 
 13  2010-01-01 00:00:45   65274 
 14  2010-01-01 00:00:45  135980 
 15  2010-01-01 00:00:45  210910 
 16  2010-01-01 00:00:45  200007 
 17  2010-01-01 00:00:45  235872 
 18  2010-01-01 00:00:45  256112 
 19  2010-01-01 00:00:45  229371 
  0  2010-01-01 00:01:00  275331 
  1  2010-01-01 00:01:00   72668 
  2  2010-01-01 00:01:00  290366 
  3  2010-01-01 00:01:00   46074 
  4  2010-01-01 00:01:00  329207 
  5  2010-01-01 00:01:00   26770 
  6  2010-01-01 00:01:00  287619 
  7  2010-01-01 00:01:00   -2207 
  8  2010-01-01 00:01:00  100456 
  9  2010-01-01 00:01:00  165688 
 10  2010-01-01 00:01:00  194136 
 11  2010-01-01 00:01:00   94113 
 12  2010-01-01 00:01:00   35810 
 13  2010-01-01 00:01:00   85081 
 14  2010-01-01 00:01:00  170256 
 15  2010-01-01 00:01:00  265445 
 16  2010-01-01 00:01:00  254828 
 17  2010-01-01 00:01:00  297238 
 18  2010-01-01 00:01:00  323494 
 19  2010-01-01 00:01:00  286252 
└────┴─────────────────────┴────────┘

100 rows in set. Elapsed: 0.003 sec. 

This table contains IDs, timestamps (ts), and values (val), where each id appears multiple times with different timestamps.

Step 2: Running Difference Example

If you try using runningDifference directly, it works block by block, which can be problematic when the data needs to be ordered or when group changes occur.

select id, val, runningDifference(val) from (select * from test_running_difference order by id, ts);

┌─id─┬────val─┬─runningDifference(val)─┐
  0   -1209                       0 
  0   66839                   68048 
  0  135082                   68243 
  0  207471                   72389 
  0  275331                   67860 
  1      43                 -275288 
  1   19440                   19397 
  1   37345                   17905 
  1   54323                   16978 
  1   72668                   18345 
  2    4322                  -68346 
  2   74513                   70191 
  2  148744                   74231 
  2  217984                   69240 
  2  290366                   72382 
  3     -25                 -290391 
  3   10542                   10567 
  3   21607                   11065 
  3   31835                   10228 
  3   46074                   14239 
  4   13720                  -32354 
  4   94245                   80525 
  4  171744                   77499 
  4  252709                   80965 
  4  329207                   76498 
  5     903                 -328304 
  5    8230                    7327 
  5   14736                    6506 
  5   21493                    6757 
  5   26770                    5277 
  6   18062                   -8708 
  6   87823                   69761 
  6  155349                   67526 
  6  221271                   65922 
  6  287619                   66348 
  7   -2873                 -290492 
  7    -128                    2745 
  7   -3901                   -3773 
  7    -488                    3413 
  7   -2207                   -1719 
  8    6286                    8493 
  8   30101                   23815 
  8   54303                   24202 
  8   76827                   22524 
  8  100456                   23629 
  9   13399                  -87057 
  9   54321                   40922 
  9   89629                   35308 
  9  131066                   41437 
  9  165688                   34622 
 10   18320                 -147368 
 10   64078                   45758 
 10  106595                   42517 
 10  149087                   42492 
 10  194136                   45049 
 11   11731                 -182405 
 11   31886                   20155 
 11   54545                   22659 
 11   71934                   17389 
 11   94113                   22179 
 12     857                  -93256 
 12    8749                    7892 
 12   18903                   10154 
 12   25125                    6222 
 12   35810                   10685 
 13    8752                  -27058 
 13   28982                   20230 
 13   48023                   19041 
 13   65274                   17251 
 13   85081                   19807 
 14   23060                  -62021 
 14   61299                   38239 
 14   97930                   36631 
 14  135980                   38050 
 14  170256                   34276 
 15   41902                 -128354 
 15   95867                   53965 
 15  152165                   56298 
 15  210910                   58745 
 15  265445                   54535 
 16   39406                 -226039 
 16   93667                   54261 
 16  146130                   52463 
 16  200007                   53877 
 16  254828                   54821 
 17   50010                 -204818 
 17  114072                   64062 
 17  174854                   60782 
 17  235872                   61018 
 17  297238                   61366 
 18   57673                 -239565 
 18  124279                   66606 
 18  189194                   64915 
 18  256112                   66918 
 18  323494                   67382 
 19   51389                 -272105 
 19  109605                   58216 
 19  170134                   60529 
 19  229371                   59237 
 19  286252                   56881 
└────┴────────┴────────────────────────┘

100 rows in set. Elapsed: 0.005 sec. 

The output may look inconsistent because runningDifference requires ordered data within blocks.

Step 3: Using Arrays for Grouping and Calculation

Instead of using runningDifference, we can utilize arrays to group data, sort it, and apply similar logic more efficiently.

Grouping Data into Arrays - You can group multiple columns into arrays by using the groupArray function. For example, to collect several columns as arrays of tuples, you can use the following query:

SELECT 
    id, 
    groupArray(tuple(ts, val))
FROM test_running_difference
GROUP BY id

┌─id─┬─groupArray(tuple(ts, val))──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  0  [('2010-01-01 00:00:00',-1209),('2010-01-01 00:00:15',66839),('2010-01-01 00:00:30',135082),('2010-01-01 00:00:45',207471),('2010-01-01 00:01:00',275331)]  
  1  [('2010-01-01 00:00:00',43),('2010-01-01 00:00:15',19440),('2010-01-01 00:00:30',37345),('2010-01-01 00:00:45',54323),('2010-01-01 00:01:00',72668)]        
  2  [('2010-01-01 00:00:00',4322),('2010-01-01 00:00:15',74513),('2010-01-01 00:00:30',148744),('2010-01-01 00:00:45',217984),('2010-01-01 00:01:00',290366)]   
  3  [('2010-01-01 00:00:00',-25),('2010-01-01 00:00:15',10542),('2010-01-01 00:00:30',21607),('2010-01-01 00:00:45',31835),('2010-01-01 00:01:00',46074)]       
  4  [('2010-01-01 00:00:00',13720),('2010-01-01 00:00:15',94245),('2010-01-01 00:00:30',171744),('2010-01-01 00:00:45',252709),('2010-01-01 00:01:00',329207)]  
  5  [('2010-01-01 00:00:00',903),('2010-01-01 00:00:15',8230),('2010-01-01 00:00:30',14736),('2010-01-01 00:00:45',21493),('2010-01-01 00:01:00',26770)]        
  6  [('2010-01-01 00:00:00',18062),('2010-01-01 00:00:15',87823),('2010-01-01 00:00:30',155349),('2010-01-01 00:00:45',221271),('2010-01-01 00:01:00',287619)]  
  7  [('2010-01-01 00:00:00',-2873),('2010-01-01 00:00:15',-128),('2010-01-01 00:00:30',-3901),('2010-01-01 00:00:45',-488),('2010-01-01 00:01:00',-2207)]       
  8  [('2010-01-01 00:00:00',6286),('2010-01-01 00:00:15',30101),('2010-01-01 00:00:30',54303),('2010-01-01 00:00:45',76827),('2010-01-01 00:01:00',100456)]     
  9  [('2010-01-01 00:00:00',13399),('2010-01-01 00:00:15',54321),('2010-01-01 00:00:30',89629),('2010-01-01 00:00:45',131066),('2010-01-01 00:01:00',165688)]   
 10  [('2010-01-01 00:00:00',18320),('2010-01-01 00:00:15',64078),('2010-01-01 00:00:30',106595),('2010-01-01 00:00:45',149087),('2010-01-01 00:01:00',194136)]  
 11  [('2010-01-01 00:00:00',11731),('2010-01-01 00:00:15',31886),('2010-01-01 00:00:30',54545),('2010-01-01 00:00:45',71934),('2010-01-01 00:01:00',94113)]     
 12  [('2010-01-01 00:00:00',857),('2010-01-01 00:00:15',8749),('2010-01-01 00:00:30',18903),('2010-01-01 00:00:45',25125),('2010-01-01 00:01:00',35810)]        
 13  [('2010-01-01 00:00:00',8752),('2010-01-01 00:00:15',28982),('2010-01-01 00:00:30',48023),('2010-01-01 00:00:45',65274),('2010-01-01 00:01:00',85081)]      
 14  [('2010-01-01 00:00:00',23060),('2010-01-01 00:00:15',61299),('2010-01-01 00:00:30',97930),('2010-01-01 00:00:45',135980),('2010-01-01 00:01:00',170256)]   
 15  [('2010-01-01 00:00:00',41902),('2010-01-01 00:00:15',95867),('2010-01-01 00:00:30',152165),('2010-01-01 00:00:45',210910),('2010-01-01 00:01:00',265445)]  
 16  [('2010-01-01 00:00:00',39406),('2010-01-01 00:00:15',93667),('2010-01-01 00:00:30',146130),('2010-01-01 00:00:45',200007),('2010-01-01 00:01:00',254828)]  
 17  [('2010-01-01 00:00:00',50010),('2010-01-01 00:00:15',114072),('2010-01-01 00:00:30',174854),('2010-01-01 00:00:45',235872),('2010-01-01 00:01:00',297238)] 
 18  [('2010-01-01 00:00:00',57673),('2010-01-01 00:00:15',124279),('2010-01-01 00:00:30',189194),('2010-01-01 00:00:45',256112),('2010-01-01 00:01:00',323494)] 
 19  [('2010-01-01 00:00:00',51389),('2010-01-01 00:00:15',109605),('2010-01-01 00:00:30',170134),('2010-01-01 00:00:45',229371),('2010-01-01 00:01:00',286252)] 
└────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Sorting Arrays - To sort the arrays by a specific element, for example, by the second element of the tuple, you can use the arraySort function:

SELECT 
    id, 
    arraySort(x -> (x.2), groupArray((ts, val)))
FROM test_running_difference
GROUP BY id

┌─id─┬─arraySort(lambda(tuple(x), tupleElement(x, 2)), groupArray(tuple(ts, val)))─────────────────────────────────────────────────────────────────────────────────┐
  0  [('2010-01-01 00:00:00',-1209),('2010-01-01 00:00:15',66839),('2010-01-01 00:00:30',135082),('2010-01-01 00:00:45',207471),('2010-01-01 00:01:00',275331)]  
  1  [('2010-01-01 00:00:00',43),('2010-01-01 00:00:15',19440),('2010-01-01 00:00:30',37345),('2010-01-01 00:00:45',54323),('2010-01-01 00:01:00',72668)]        
  2  [('2010-01-01 00:00:00',4322),('2010-01-01 00:00:15',74513),('2010-01-01 00:00:30',148744),('2010-01-01 00:00:45',217984),('2010-01-01 00:01:00',290366)]   
  3  [('2010-01-01 00:00:00',-25),('2010-01-01 00:00:15',10542),('2010-01-01 00:00:30',21607),('2010-01-01 00:00:45',31835),('2010-01-01 00:01:00',46074)]       
  4  [('2010-01-01 00:00:00',13720),('2010-01-01 00:00:15',94245),('2010-01-01 00:00:30',171744),('2010-01-01 00:00:45',252709),('2010-01-01 00:01:00',329207)]  
  5  [('2010-01-01 00:00:00',903),('2010-01-01 00:00:15',8230),('2010-01-01 00:00:30',14736),('2010-01-01 00:00:45',21493),('2010-01-01 00:01:00',26770)]        
  6  [('2010-01-01 00:00:00',18062),('2010-01-01 00:00:15',87823),('2010-01-01 00:00:30',155349),('2010-01-01 00:00:45',221271),('2010-01-01 00:01:00',287619)]  
  7  [('2010-01-01 00:00:30',-3901),('2010-01-01 00:00:00',-2873),('2010-01-01 00:01:00',-2207),('2010-01-01 00:00:45',-488),('2010-01-01 00:00:15',-128)]       
  8  [('2010-01-01 00:00:00',6286),('2010-01-01 00:00:15',30101),('2010-01-01 00:00:30',54303),('2010-01-01 00:00:45',76827),('2010-01-01 00:01:00',100456)]     
  9  [('2010-01-01 00:00:00',13399),('2010-01-01 00:00:15',54321),('2010-01-01 00:00:30',89629),('2010-01-01 00:00:45',131066),('2010-01-01 00:01:00',165688)]   
 10  [('2010-01-01 00:00:00',18320),('2010-01-01 00:00:15',64078),('2010-01-01 00:00:30',106595),('2010-01-01 00:00:45',149087),('2010-01-01 00:01:00',194136)]  
 11  [('2010-01-01 00:00:00',11731),('2010-01-01 00:00:15',31886),('2010-01-01 00:00:30',54545),('2010-01-01 00:00:45',71934),('2010-01-01 00:01:00',94113)]     
 12  [('2010-01-01 00:00:00',857),('2010-01-01 00:00:15',8749),('2010-01-01 00:00:30',18903),('2010-01-01 00:00:45',25125),('2010-01-01 00:01:00',35810)]        
 13  [('2010-01-01 00:00:00',8752),('2010-01-01 00:00:15',28982),('2010-01-01 00:00:30',48023),('2010-01-01 00:00:45',65274),('2010-01-01 00:01:00',85081)]      
 14  [('2010-01-01 00:00:00',23060),('2010-01-01 00:00:15',61299),('2010-01-01 00:00:30',97930),('2010-01-01 00:00:45',135980),('2010-01-01 00:01:00',170256)]   
 15  [('2010-01-01 00:00:00',41902),('2010-01-01 00:00:15',95867),('2010-01-01 00:00:30',152165),('2010-01-01 00:00:45',210910),('2010-01-01 00:01:00',265445)]  
 16  [('2010-01-01 00:00:00',39406),('2010-01-01 00:00:15',93667),('2010-01-01 00:00:30',146130),('2010-01-01 00:00:45',200007),('2010-01-01 00:01:00',254828)]  
 17  [('2010-01-01 00:00:00',50010),('2010-01-01 00:00:15',114072),('2010-01-01 00:00:30',174854),('2010-01-01 00:00:45',235872),('2010-01-01 00:01:00',297238)] 
 18  [('2010-01-01 00:00:00',57673),('2010-01-01 00:00:15',124279),('2010-01-01 00:00:30',189194),('2010-01-01 00:00:45',256112),('2010-01-01 00:01:00',323494)] 
 19  [('2010-01-01 00:00:00',51389),('2010-01-01 00:00:15',109605),('2010-01-01 00:00:30',170134),('2010-01-01 00:00:45',229371),('2010-01-01 00:01:00',286252)] 
└────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

20 rows in set. Elapsed: 0.004 sec. 

This sorts each array by the val (second element of the tuple) for each id.

Simplified Sorting Example - We can rewrite the query in a more concise way using WITH clauses for better readability:

WITH 
    groupArray(tuple(ts, val)) as window_rows,
    arraySort(x -> x.1, window_rows) as sorted_window_rows
SELECT 
    id, 
    sorted_window_rows
FROM test_running_difference
GROUP BY id

Applying Calculations with Arrays - Once the data is sorted, you can apply array functions like arrayMap and arrayDifference to calculate differences between values in the arrays:

WITH 
    groupArray(tuple(ts, val)) as window_rows,
    arraySort(x -> x.1, window_rows) as sorted_window_rows,
    arrayMap(x -> x.2, sorted_window_rows) as sorted_window_rows_val_column,
	arrayDifference(sorted_window_rows_val_column) as sorted_window_rows_val_column_diff
SELECT 
    id, 
    sorted_window_rows_val_column_diff
FROM test_running_difference
GROUP BY id

┌─id─┬─sorted_window_rows_val_column_diff─┐
  0  [0,68048,68243,72389,67860]        
  1  [0,19397,17905,16978,18345]        
  2  [0,70191,74231,69240,72382]        
  3  [0,10567,11065,10228,14239]        
  4  [0,80525,77499,80965,76498]        
  5  [0,7327,6506,6757,5277]            
  6  [0,69761,67526,65922,66348]        
  7  [0,2745,-3773,3413,-1719]          
  8  [0,23815,24202,22524,23629]        
  9  [0,40922,35308,41437,34622]        
 10  [0,45758,42517,42492,45049]        
 11  [0,20155,22659,17389,22179]        
 12  [0,7892,10154,6222,10685]          
 13  [0,20230,19041,17251,19807]        
 14  [0,38239,36631,38050,34276]        
 15  [0,53965,56298,58745,54535]        
 16  [0,54261,52463,53877,54821]        
 17  [0,64062,60782,61018,61366]        
 18  [0,66606,64915,66918,67382]        
 19  [0,58216,60529,59237,56881]        
└────┴────────────────────────────────────┘

20 rows in set. Elapsed: 0.005 sec. 

You can do also a lot of magic with arrayEnumerate and accessing different values by their ids.

Reverting Arrays Back to Rows - You can convert the arrays back into rows using arrayJoin:

WITH 
    groupArray(tuple(ts, val)) as window_rows,
    arraySort(x -> x.1, window_rows) as sorted_window_rows,
    arrayMap(x -> x.2, sorted_window_rows) as sorted_window_rows_val_column,
	arrayDifference(sorted_window_rows_val_column) as sorted_window_rows_val_column_diff,
	arrayJoin(sorted_window_rows_val_column_diff) as diff
SELECT 
    id, 
    diff
FROM test_running_difference
GROUP BY id

Or use ARRAY JOIN to join the arrays back to the original structure:

SELECT 
  id,
  diff,
  ts
FROM 
(
WITH 
    groupArray(tuple(ts, val)) as window_rows,
    arraySort(x -> x.1, window_rows) as sorted_window_rows,
    arrayMap(x -> x.2, sorted_window_rows) as sorted_window_rows_val_column
SELECT 
    id, 
	arrayDifference(sorted_window_rows_val_column) as sorted_window_rows_val_column_diff,
    arrayMap(x -> x.1, sorted_window_rows) as sorted_window_rows_ts_column
FROM test_running_difference
GROUP BY id
) as t1
ARRAY JOIN sorted_window_rows_val_column_diff as diff, sorted_window_rows_ts_column as ts

This allows you to manipulate and analyze data within arrays effectively, using powerful functions such as arrayMap, arrayDifference, and arrayEnumerate.