arrayMap, arrayJoin or ARRAY JOIN memory usage
arrayMap-like functions memory usage calculation.
In order to calculate arrayMap or similar array* functions ClickHouse® temporarily does arrayJoin-like operation, which in certain conditions can lead to huge memory usage for big arrays.
So for example, you have 2 columns:
Let’s say we want to multiply array elements at corresponding positions.
ClickHouse create temporary structure in memory like this:
We can roughly estimate memory usage by multiplying the size of columns participating in the lambda function by the size of the unnested array.
And total memory usage will be 55 values (5(array size)*2(array count)*5(row count) + 5(unnested array size)), which is 5.5 times more than initial array size.
But what if we write this function in a more logical way, so we wouldn’t use any unnested arrays in lambda.
ClickHouse create temporary structure in memory like this:
We have only 10 values, which is no more than what we have in initial arrays.
The same approach can be applied to other array* function with arrayMap-like capabilities to use lambda functions and ARRAY JOIN (arrayJoin).
Examples with bigger arrays:
Which data types we have in those arrays?
So each value use 4 bytes.
By default ClickHouse execute query by blocks of 65515 rows (max_block_size
setting value)
Lets estimate query total memory usage given previous calculations.
Correlation is pretty clear.
What if we will reduce size of blocks used for query execution?
Memory usage down in 4 times, which has strong correlation with our change: 65k -> 16k ~ 4 times.
Almost 100 times faster than first query!