Queries & Syntax
- 1: GROUP BY
- 1.1: GROUP BY tricks
- 2: DateTime64
- 3: DISTINCT & GROUP BY & LIMIT 1 BY: what's the difference
- 4: Imprecise parsing of literal Decimal or Float64
- 5: Multiple aligned date columns in PARTITION BY expression
- 6: Why is a simple `SELECT count()` slow in ClickHouse?
- 7: Collecting query execution flamegraphs using system.trace_log
- 8: Using array functions to mimic window-function-like behavior
- 9: -State & -Merge combinators
- 10: ALTER MODIFY COLUMN is stuck, the column is inaccessible.
- 11: ANSI SQL mode
- 12: Async INSERTs
- 13: Atomic insert
- 14: Cumulative Anything
- 15: Data types on disk and in RAM
- 16: DELETE via tombstone column
- 17: EXPLAIN query
- 18: Fill missing values at query time
- 19: FINAL clause speed
- 20: Join with Calendar using Arrays
- 21: JOINs
- 21.1: JOIN table engine
- 22: JSONExtract to parse many attributes at a time
- 23: KILL QUERY
- 24: Lag / Lead
- 25: Machine learning in ClickHouse
- 26: Mutations
- 27: OPTIMIZE vs OPTIMIZE FINAL
- 28: Parameterized views
- 29: Use both projection and raw data in single query
- 30: PIVOT / UNPIVOT
- 31: Possible deadlock avoided. Client should retry
- 32: Projections examples
- 33: Roaring bitmaps for calculating retention
- 34: SAMPLE by
- 35: Sampling Example
- 36: Simple aggregate functions & combinators
- 37: Skip indexes
- 37.1: Example: minmax
- 37.2: Skip index bloom_filter Example
- 37.3: Skip indexes examples
- 38: Time zones
- 39: Time-series alignment with interpolation
- 40: Top N & Remain
- 41: Troubleshooting
- 42: TTL
- 42.1: MODIFY (ADD) TTL
- 42.2: What are my TTL settings?
- 42.3: TTL GROUP BY Examples
- 42.4: TTL Recompress example
- 43: UPDATE via Dictionary
- 44: Values mapping
- 45: Window functions
1 - GROUP BY
Internal implementation
ClickHouse uses non-blocking hash tables, so each thread has at least one hash table.
This makes it easier to avoid synchronization between multiple threads, but it has disadvantages:
- Bigger memory usage.
- The per-thread hash tables need to be merged afterwards.
Because that second step can be a bottleneck for a really big GROUP BY with a lot of distinct keys, another solution was implemented.
Two-Level
https://youtu.be/SrucFOs8Y6c?t=2132
┌─name───────────────────────────────┬─value────┬─changed─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┬─type───┐
│ group_by_two_level_threshold │ 100000 │ 0 │ From what number of keys, a two-level aggregation starts. 0 - the threshold is not set. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ UInt64 │
│ group_by_two_level_threshold_bytes │ 50000000 │ 0 │ From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ UInt64 │
└────────────────────────────────────┴──────────┴─────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴──────────┴────────┘
To parallelize the merging of hash tables, i.e. execute such a merge via multiple threads, ClickHouse uses a two-level approach:
In the first step, ClickHouse creates 256 buckets for each thread (determined by one byte of the hash). In the second step, ClickHouse can merge those 256 buckets independently by multiple threads.
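You can watch the switch happen per query by lowering the thresholds and reading the trace log (a minimal sketch; the values are only for experimenting, not recommendations):
SET send_logs_level = 'trace';

SELECT
    number % 1000000 AS key,
    count()
FROM numbers_mt(10000000)
GROUP BY key
SETTINGS group_by_two_level_threshold = 1, group_by_two_level_threshold_bytes = 0
FORMAT `Null`
With such a low threshold the Aggregator should report converting the aggregation data to two-level almost immediately in the trace log.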
GROUP BY in external memory
It utilizes the two-level GROUP BY and dumps those buckets on disk. At the last stage ClickHouse reads those buckets from disk one by one and merges them, so you need enough RAM to hold one bucket (1/256 of the whole GROUP BY size).
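A minimal sketch of enabling it per query (the byte values are illustrative, not recommendations):
SET max_bytes_before_external_group_by = 10000000000; -- start spilling aggregation state to disk once it exceeds ~10 GB
SET max_memory_usage = 20000000000;                   -- the documentation recommends setting the spill threshold to about half of max_memory_usage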
optimize_aggregation_in_order GROUP BY
Usually it works slower than a regular GROUP BY, because ClickHouse needs to read and process data in a specific ORDER, which makes it much more complicated to parallelize reading and aggregating.
But it uses much less memory, because ClickHouse can stream the resultset and there is no need to keep it in memory.
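A minimal sketch, assuming a table like the `sessions` table from the "GROUP BY tricks" article below, whose ORDER BY starts with (app, user_id); the GROUP BY keys must be a prefix of the table's sorting key for this to apply:
SELECT
    app,
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    app,
    user_id
SETTINGS optimize_aggregation_in_order = 1
FORMAT `Null`
When it applies, the trace log shows AggregatingInOrderTransform: Aggregating in order instead of a regular Aggregator method.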
Last item cache
ClickHouse saves the value of the previous hash calculation, just in case the next value is the same.
https://github.com/ClickHouse/ClickHouse/pull/5417 https://github.com/ClickHouse/ClickHouse/blob/808d9afd0f8110faba5ae027051bf0a64e506da3/src/Common/ColumnsHashingImpl.h#L40
StringHashMap
It actually uses 5 different hash tables:
- For empty strings
- For strings ≤ 8 bytes
- For strings ≤ 16 bytes
- For strings ≤ 24 bytes
- For strings > 24 bytes
SELECT count()
FROM
(
SELECT materialize('1234567890123456') AS key -- length(key) = 16
FROM numbers(1000000000)
)
GROUP BY key
Aggregator: Aggregation method: key_string
Elapsed: 8.888 sec. Processed 1.00 billion rows, 8.00 GB (112.51 million rows/s., 900.11 MB/s.)
SELECT count()
FROM
(
SELECT materialize('12345678901234567') AS key -- length(key) = 17
FROM numbers(1000000000)
)
GROUP BY key
Aggregator: Aggregation method: key_string
Elapsed: 9.089 sec. Processed 1.00 billion rows, 8.00 GB (110.03 million rows/s., 880.22 MB/s.)
SELECT count()
FROM
(
SELECT materialize('123456789012345678901234') AS key -- length(key) = 24
FROM numbers(1000000000)
)
GROUP BY key
Aggregator: Aggregation method: key_string
Elapsed: 9.134 sec. Processed 1.00 billion rows, 8.00 GB (109.49 million rows/s., 875.94 MB/s.)
SELECT count()
FROM
(
SELECT materialize('1234567890123456789012345') AS key -- length(key) = 25
FROM numbers(1000000000)
)
GROUP BY key
Aggregator: Aggregation method: key_string
Elapsed: 12.566 sec. Processed 1.00 billion rows, 8.00 GB (79.58 million rows/s., 636.67 MB/s.)
length   elapsed, sec
16       8.89
17       9.09
24       9.13
25       12.57
What the GROUP BY statement uses memory for
- Hash tables
They grow with:
the number of unique combinations of keys participating in GROUP BY
the size of the keys participating in GROUP BY
- States of aggregation functions:
Be careful with functions whose state can use an unrestricted amount of memory and grow indefinitely (where an approximation is acceptable, see the bounded alternatives in the sketch after this list):
- groupArray (groupArray(1000)())
- uniqExact (uniq,uniqCombined)
- quantileExact (medianExact) (quantile,quantileTDigest)
- windowFunnel
- groupBitmap
- sequenceCount (sequenceMatch)
- *Map
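Where an approximate answer is acceptable, the bounded-state alternatives mentioned in parentheses help keep memory in check (a minimal sketch; `sessions` is just an illustrative table):
SELECT uniqExact(user_id) FROM sessions;           -- exact: keeps every distinct key in the state
SELECT uniqCombined(user_id) FROM sessions;        -- approximate: state size is capped
SELECT quantileExact(0.9)(clicks) FROM sessions;   -- exact: keeps all values in the state
SELECT quantileTDigest(0.9)(clicks) FROM sessions; -- approximate: small fixed-size sketch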
Why does my GROUP BY eat all the RAM?
Run your query with
set send_logs_level='trace'
Remove all aggregation functions from the query and check how much memory the plain GROUP BY takes.
Then remove the aggregation functions from the full query one by one to see which of them takes the most memory.
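Afterwards you can compare the peak memory of those test runs via system.query_log (a sketch):
SELECT
    query_id,
    formatReadableSize(memory_usage) AS peak_memory,
    query
FROM system.query_log
WHERE (type = 'QueryFinish') AND (event_date = today())
ORDER BY memory_usage DESC
LIMIT 10;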
1.1 - GROUP BY tricks
Tricks
Testing dataset
CREATE TABLE sessions
(
`app` LowCardinality(String),
`user_id` String,
`created_at` DateTime,
`platform` LowCardinality(String),
`clicks` UInt32,
`session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at)
INSERT INTO sessions WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000)
0 rows in set. Elapsed: 46.078 sec. Processed 1.00 billion rows, 8.00 GB (21.70 million rows/s., 173.62 MB/s.)
┌─database─┬─table────┬─column─────┬─type───────────────────┬───────rows─┬─compressed_bytes─┬─compressed─┬─uncompressed─┬──────────────ratio─┬─codec─┐
│ default │ sessions │ session_id │ UUID │ 1000000000 │ 16065918103 │ 14.96 GiB │ 14.90 GiB │ 0.9958970223439835 │ │
│ default │ sessions │ user_id │ String │ 1000000000 │ 3056977462 │ 2.85 GiB │ 13.04 GiB │ 4.57968701896828 │ │
│ default │ sessions │ clicks │ UInt32 │ 1000000000 │ 1859359032 │ 1.73 GiB │ 3.73 GiB │ 2.151278979023993 │ │
│ default │ sessions │ created_at │ DateTime │ 1000000000 │ 1332089630 │ 1.24 GiB │ 3.73 GiB │ 3.0028009451586226 │ │
│ default │ sessions │ platform │ LowCardinality(String) │ 1000000000 │ 329702248 │ 314.43 MiB │ 956.63 MiB │ 3.042446801879252 │ │
│ default │ sessions │ app │ LowCardinality(String) │ 1000000000 │ 4825544 │ 4.60 MiB │ 956.63 MiB │ 207.87333386660654 │ │
└──────────┴──────────┴────────────┴────────────────────────┴────────────┴──────────────────┴────────────┴──────────────┴────────────────────┴───────┘
All queries and datasets are unique, so in different situations different hacks could work better or worse.
PreFilter values before GROUP BY
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE created_at > '2021-11-01 00:00:00'
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`
<Debug> MemoryTracker: Peak memory usage (for query): 18.36 GiB.
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE user_id IN (
SELECT user_id
FROM sessions
WHERE (platform = 'Rat') AND (clicks = 16) AND (created_at > '2021-11-01 00:00:00') -- So we will select user_id which could potentially match our HAVING clause in OUTER query.
) AND (created_at > '2021-11-01 00:00:00')
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`
<Debug> MemoryTracker: Peak memory usage (for query): 4.43 GiB.
Use fixed-width data types instead of String
For example, you have strings whose values follow a special pattern like this:
'ABX 1412312312313'
You can just remove the first 4 characters and convert the rest to UInt64:
toUInt64(substr('ABX 1412312312313', 5))
And you have packed 17 bytes into 8 - more than a 2x size improvement!
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
Aggregator: Aggregation method: serialized
<Debug> MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
WITH
CAST(user_id, 'FixedString(14)') AS user_fx,
CAST(platform, 'FixedString(4)') AS platform_fx
SELECT
user_fx,
sum(clicks)
FROM sessions
GROUP BY
user_fx,
platform_fx
FORMAT `Null`
Aggregator: Aggregation method: keys256
MemoryTracker: Peak memory usage (for query): 22.24 GiB.
Elapsed: 6.637 sec. Processed 1.00 billion rows, 27.00 GB (150.67 million rows/s., 4.07 GB/s.)
WITH
CAST(user_id, 'FixedString(14)') AS user_fx,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_fx,
sum(clicks)
FROM sessions
GROUP BY
user_fx,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 14.14 GiB.
Elapsed: 5.335 sec. Processed 1.00 billion rows, 27.00 GB (187.43 million rows/s., 5.06 GB/s.)
WITH
toUInt32OrZero(trim( LEADING '0' FROM substr(user_id,6))) AS user_int,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_int,
sum(clicks)
FROM sessions
GROUP BY
user_int,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys64
MemoryTracker: Peak memory usage (for query): 10.14 GiB.
Elapsed: 8.549 sec. Processed 1.00 billion rows, 27.00 GB (116.97 million rows/s., 3.16 GB/s.)
WITH
toUInt32('1' || substr(user_id,6)) AS user_int,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_int,
sum(clicks)
FROM sessions
GROUP BY
user_int,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys64
Peak memory usage (for query): 10.14 GiB.
Elapsed: 6.247 sec. Processed 1.00 billion rows, 27.00 GB (160.09 million rows/s., 4.32 GB/s.)
It can be especially useful when you try to do GROUP BY lc_column_1, lc_column_2 and ClickHouse falls back to the serialized algorithm.
Two LowCardinality Columns in GROUP BY
SELECT
app,
sum(clicks)
FROM sessions
GROUP BY app
FORMAT `Null`
Aggregator: Aggregation method: low_cardinality_key_string
MemoryTracker: Peak memory usage (for query): 43.81 MiB.
Elapsed: 0.545 sec. Processed 1.00 billion rows, 5.00 GB (1.83 billion rows/s., 9.17 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM sessions
GROUP BY
app,
platform
FORMAT `Null`
Aggregator: Aggregation method: serialized -- Slowest method!
MemoryTracker: Peak memory usage (for query): 222.86 MiB.
Elapsed: 2.923 sec. Processed 1.00 billion rows, 6.00 GB (342.11 million rows/s., 2.05 GB/s.)
SELECT
CAST(app, 'FixedString(6)') AS app_fx,
CAST(platform, 'FixedString(4)') AS platform_fx,
sum(clicks)
FROM sessions
GROUP BY
app_fx,
platform_fx
FORMAT `Null`
Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 160.23 MiB.
Elapsed: 1.632 sec. Processed 1.00 billion rows, 6.00 GB (612.63 million rows/s., 3.68 GB/s.)
Split your query into multiple smaller queries and execute them one by one
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE (cityHash64(user_id) % 4) = 0
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 8.16 GiB.
Elapsed: 2.910 sec. Processed 1.00 billion rows, 27.00 GB (343.64 million rows/s., 9.28 GB/s.)
Shard your data by one of the common high-cardinality GROUP BY keys
Then each shard holds only 1/N of all unique key combinations, which results in smaller hash tables.
Let's create 2 distributed tables with different sharding keys: rand() and cityHash64(user_id).
CREATE TABLE sessions_distributed AS sessions
ENGINE = Distributed('distr-groupby', default, sessions, rand());
INSERT INTO sessions_distributed WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000);
CREATE TABLE sessions_2 ON CLUSTER 'distr-groupby'
(
`app` LowCardinality(String),
`user_id` String,
`created_at` DateTime,
`platform` LowCardinality(String),
`clicks` UInt32,
`session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at);
CREATE TABLE sessions_distributed_2 AS sessions
ENGINE = Distributed('distr-groupby', default, sessions_2, cityHash64(user_id));
INSERT INTO sessions_distributed_2 WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000);
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed
GROUP BY user_id
)
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 12.02 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
MemoryTracker: Peak memory usage (for query): 12.20 GiB.
12 rows in set. Elapsed: 28.345 sec. Processed 1.00 billion rows, 32.00 GB (35.28 million rows/s., 1.13 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.952 sec. Processed 1.00 billion rows, 32.00 GB (83.66 million rows/s., 2.68 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.916 sec. Processed 1.00 billion rows, 32.00 GB (83.92 million rows/s., 2.69 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(clicks) as clicks
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform
))
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.55 GiB.
12 rows in set. Elapsed: 9.491 sec. Processed 1.00 billion rows, 32.00 GB (105.36 million rows/s., 3.37 GB/s.)
Query with bigger state:
SELECT
app,
platform,
sum(last_click) as sum,
max(max_clicks) as max,
min(min_clicks) as min,
max(max_time) as max_time,
min(min_time) as min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed
GROUP BY user_id
)
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 19.95 GiB.
12 rows in set. Elapsed: 34.339 sec. Processed 1.00 billion rows, 32.00 GB (29.12 million rows/s., 932.03 MB/s.)
SELECT
app,
platform,
sum(last_click) as sum,
max(max_clicks) as max,
min(min_clicks) as min,
max(max_time) as max_time,
min(min_time) as min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.220 sec. Processed 1.00 billion rows, 32.00 GB (75.64 million rows/s., 2.42 GB/s.)
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.361 sec. Processed 1.00 billion rows, 32.00 GB (74.85 million rows/s., 2.40 GB/s.)
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS distributed_group_by_no_merge=2;
MemoryTracker: Peak memory usage (for query): 10.02 GiB.
12 rows in set. Elapsed: 9.789 sec. Processed 1.00 billion rows, 32.00 GB (102.15 million rows/s., 3.27 GB/s.)
SELECT
app,
platform,
sum(sum),
max(max),
min(min),
max(max_time) AS max_time,
min(min_time) AS min_time
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform
))
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 9.525 sec. Processed 1.00 billion rows, 32.00 GB (104.98 million rows/s., 3.36 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
MemoryTracker: Peak memory usage (for query): 14.57 GiB.
12 rows in set. Elapsed: 37.730 sec. Processed 1.00 billion rows, 44.01 GB (26.50 million rows/s., 1.17 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;
MemoryTracker: Peak memory usage (for query): 14.56 GiB.
12 rows in set. Elapsed: 37.792 sec. Processed 1.00 billion rows, 44.01 GB (26.46 million rows/s., 1.16 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS distributed_group_by_no_merge = 2;
MemoryTracker: Peak memory usage (for query): 14.54 GiB.
12 rows in set. Elapsed: 17.762 sec. Processed 1.00 billion rows, 44.01 GB (56.30 million rows/s., 2.48 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(sessions) as sessions
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform))
GROUP BY
app,
platform
MemoryTracker: Peak memory usage (for query): 14.55 GiB.
12 rows in set. Elapsed: 17.574 sec. Processed 1.00 billion rows, 44.01 GB (56.90 million rows/s., 2.50 GB/s.)
Reduce the number of threads
Because each thread uses an independent hash table, lowering the number of threads reduces the number of hash tables as well and lowers memory usage, at the cost of slower query execution.
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
SET max_threads = 2;
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 13.26 GiB.
Elapsed: 62.014 sec. Processed 1.00 billion rows, 27.00 GB (16.13 million rows/s., 435.41 MB/s.)
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
app,
user_id
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 24.19 GiB.
Elapsed: 5.043 sec. Processed 1.00 billion rows, 27.00 GB (198.29 million rows/s., 5.35 GB/s.)
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Orange'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Red'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Melon'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Blue'
GROUP BY
user_id
FORMAT Null
MemoryTracker: Peak memory usage (for query): 7.99 GiB.
Elapsed: 2.852 sec. Processed 1.00 billion rows, 27.01 GB (350.74 million rows/s., 9.47 GB/s.)
aggregation_in_order
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY user_id
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 969.33 MiB.
Elapsed: 2.494 sec. Processed 250.09 million rows, 6.75 GB (100.27 million rows/s., 2.71 GB/s.)
SET optimize_aggregation_in_order = 1;
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY
app,
user_id
FORMAT `Null`
AggregatingInOrderTransform: Aggregating in order
MemoryTracker: Peak memory usage (for query): 169.24 MiB.
Elapsed: 4.925 sec. Processed 250.09 million rows, 6.75 GB (50.78 million rows/s., 1.37 GB/s.)
Reduce dimensions from GROUP BY with functions like sumMap, *Resample
One
SELECT
user_id,
toDate(created_at) AS day,
sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY
user_id,
day
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 50.74 GiB.
Elapsed: 22.671 sec. Processed 594.39 million rows, 18.46 GB (26.22 million rows/s., 814.41 MB/s.)
SELECT
user_id,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
SELECT
user_id,
sumResample(0, 31, 1)(clicks, toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
arrayEnumerate(clicks_arr) AS date_diff
FORMAT `Null`
Peak memory usage (for query): 8.24 GiB.
Elapsed: 5.191 sec. Processed 594.39 million rows, 18.46 GB (114.50 million rows/s., 3.56 GB/s.)
Multiple
SELECT
user_id,
platform,
toDate(created_at) AS day,
sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange')) AND user_id ='UID: 08525196'
GROUP BY
user_id,
platform,
day
ORDER BY user_id,
day,
platform
FORMAT `Null`
Peak memory usage (for query): 29.50 GiB.
Elapsed: 8.181 sec. Processed 198.14 million rows, 6.34 GB (24.22 million rows/s., 775.14 MB/s.)
WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
user_id,
CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks_res.1 AS clicks
FROM
(
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks_arr_lvl_2,
range(31) AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.92 GiB.
Elapsed: 4.170 sec. Processed 198.14 million rows, 6.34 GB (47.52 million rows/s., 1.52 GB/s.)
WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
user_id,
CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks_res.1 AS clicks
FROM
(
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks_arr_lvl_2,
range(31) AS date_diff
WHERE clicks > 0
FORMAT `Null`
Peak memory usage (for query): 10.14 GiB.
Elapsed: 9.533 sec. Processed 198.14 million rows, 6.34 GB (20.78 million rows/s., 665.20 MB/s.)
SELECT
user_id,
CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
WITH
(SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) as days))) AS platform_arr,
(SELECT flatten(arrayMap(x -> [x, x, x], range(31) as days))) AS days_arr
SELECT
user_id,
flatten(sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01'))) AS clicks_arr,
platform_arr,
days_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
platform_arr AS plat,
days_arr AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.95 GiB.
Elapsed: 3.095 sec. Processed 198.14 million rows, 6.34 GB (64.02 million rows/s., 2.05 GB/s.)
SELECT
user_id,
CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
WITH
(SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) as days))) AS platform_arr,
(SELECT flatten(arrayMap(x -> [x, x, x], range(31) as days))) AS days_arr
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr,
arrayFilter(x -> ((x.1) > 0), arrayZip(flatten(clicks_arr), platform_arr, days_arr)) AS result
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
result.1 AS clicks,
result.2 AS plat,
result.3 AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.93 GiB.
Elapsed: 4.717 sec. Processed 198.14 million rows, 6.34 GB (42.00 million rows/s., 1.34 GB/s.)
SELECT
user_id,
CAST(range % 3, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)') AS platform,
toDate('2021-10-01') + intDiv(range, 3) AS day,
clicks
FROM
(
WITH (
SELECT range(93)
) AS range_arr
SELECT
user_id,
sumResample(0, 93, 1)(clicks, ((toDate(created_at) - toDate('2021-10-01')) * 3) + toUInt8(CAST(platform, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)'))) AS clicks_arr,
range_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
range_arr AS range
FORMAT `Null`
Peak memory usage (for query): 8.24 GiB.
Elapsed: 4.838 sec. Processed 198.14 million rows, 6.36 GB (40.95 million rows/s., 1.31 GB/s.)
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
FORMAT `Null`
Peak memory usage (for query): 5.19 GiB.
0 rows in set. Elapsed: 1.160 sec. Processed 198.14 million rows, 6.34 GB (170.87 million rows/s., 5.47 GB/s.)
ARRAY JOIN can be expensive
https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/
sumMap, *Resample
https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/
Play with two-level
Disable:
SET group_by_two_level_threshold = 0, group_by_two_level_threshold_bytes = 0;
Since 22.4 ClickHouse can predict when it makes sense to initialize the aggregation as two-level from the start, instead of rehashing on the fly. This can improve query time. https://github.com/ClickHouse/ClickHouse/pull/33439
GROUP BY in external memory
Slow!
Use a hash function for GROUP BY keys
GROUP BY cityHash64('xxxx')
Can lead to incorrect results, as hash functions are not a 1-to-1 mapping.
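A sketch of the trade-off using the `sessions` table from above: the key shrinks from a String to a UInt64, but any colliding user_ids get merged into one group and the original key value is lost:
SELECT
    cityHash64(user_id) AS user_hash,
    sum(clicks)
FROM sessions
GROUP BY user_hash
FORMAT `Null`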
Performance bugs
https://github.com/ClickHouse/ClickHouse/issues/15005
https://github.com/ClickHouse/ClickHouse/issues/29131
https://github.com/ClickHouse/ClickHouse/issues/31120
https://github.com/ClickHouse/ClickHouse/issues/35096 Fixed in 22.7
2 - DateTime64
Subtract fractional seconds
WITH toDateTime64('2021-09-07 13:41:50.926', 3) AS time
SELECT
time - 1,
time - 0.1 AS no_affect,
time - toDecimal64(0.1, 3) AS uncorrect_result,
time - toIntervalMillisecond(100) AS correct_result -- from 22.4
Query id: 696722bd-3c22-4270-babe-c6b124fee97f
┌──────────minus(time, 1)─┬───────────────no_affect─┬────────uncorrect_result─┬──────────correct_result─┐
│ 2021-09-07 13:41:49.926 │ 2021-09-07 13:41:50.926 │ 1970-01-01 00:00:00.000 │ 2021-09-07 13:41:50.826 │
└─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┘
WITH
toDateTime64('2021-03-03 09:30:00.100', 3) AS time,
fromUnixTimestamp64Milli(toInt64(toUnixTimestamp64Milli(time) + (1.25 * 1000))) AS first,
toDateTime64(toDecimal64(time, 3) + toDecimal64('1.25', 3), 3) AS second,
reinterpret(reinterpret(time, 'Decimal64(3)') + toDecimal64('1.25', 3), 'DateTime64(3)') AS third,
time + toIntervalMillisecond(1250) AS fourth, -- from 22.4
addMilliseconds(time, 1250) AS fifth -- from 22.4
SELECT
first,
second,
third,
fourth,
fifth
Query id: 176cd2e7-68bf-4e26-a492-63e0b5a87cc5
┌───────────────────first─┬──────────────────second─┬───────────────────third─┬──────────────────fourth─┬───────────────────fifth─┐
│ 2021-03-03 09:30:01.350 │ 2021-03-03 09:30:01.350 │ 2021-03-03 09:30:01.350 │ 2021-03-03 09:30:01.350 │ 2021-03-03 09:30:01.350 │
└─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┘
SET max_threads=1;
Starting from 22.4
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
time + toIntervalMillisecond(1250) AS fourth
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(fourth)
1 rows in set. Elapsed: 0.215 sec. Processed 100.03 million rows, 800.21 MB (464.27 million rows/s., 3.71 GB/s.)
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
addMilliseconds(time, 1250) AS fifth
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(fifth)
1 rows in set. Elapsed: 0.208 sec. Processed 100.03 million rows, 800.21 MB (481.04 million rows/s., 3.85 GB/s.)
###########
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
fromUnixTimestamp64Milli(reinterpretAsInt64(toUnixTimestamp64Milli(time) + (1.25 * 1000))) AS first
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(first)
1 rows in set. Elapsed: 0.370 sec. Processed 100.03 million rows, 800.21 MB (270.31 million rows/s., 2.16 GB/s.)
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
fromUnixTimestamp64Milli(toUnixTimestamp64Milli(time) + toInt16(1.25 * 1000)) AS first
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(first)
1 rows in set. Elapsed: 0.256 sec. Processed 100.03 million rows, 800.21 MB (391.06 million rows/s., 3.13 GB/s.)
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
toDateTime64(toDecimal64(time, 3) + toDecimal64('1.25', 3), 3) AS second
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(second)
1 rows in set. Elapsed: 2.240 sec. Processed 100.03 million rows, 800.21 MB (44.65 million rows/s., 357.17 MB/s.)
SET decimal_check_overflow=0;
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
toDateTime64(toDecimal64(time, 3) + toDecimal64('1.25', 3), 3) AS second
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(second)
1 rows in set. Elapsed: 1.991 sec. Processed 100.03 million rows, 800.21 MB (50.23 million rows/s., 401.81 MB/s.)
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
reinterpret(reinterpret(time, 'Decimal64(3)') + toDecimal64('1.25', 3), 'DateTime64(3)') AS third
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(third)
1 rows in set. Elapsed: 0.515 sec. Processed 100.03 million rows, 800.21 MB (194.39 million rows/s., 1.56 GB/s.)
SET decimal_check_overflow=0;
WITH
materialize(toDateTime64('2021-03-03 09:30:00.100', 3)) AS time,
reinterpret(reinterpret(time, 'Decimal64(3)') + toDecimal64('1.25', 3), 'DateTime64(3)') AS third
SELECT count()
FROM numbers(100000000)
WHERE NOT ignore(third)
1 rows in set. Elapsed: 0.281 sec. Processed 100.03 million rows, 800.21 MB (356.21 million rows/s., 2.85 GB/s.)
3 - DISTINCT & GROUP BY & LIMIT 1 BY: what's the difference
DISTINCT
SELECT DISTINCT number
FROM numbers_mt(100000000)
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 4.00 GiB.
0 rows in set. Elapsed: 18.720 sec. Processed 100.03 million rows, 800.21 MB (5.34 million rows/s., 42.75 MB/s.)
SELECT DISTINCT number
FROM numbers_mt(100000000)
SETTINGS max_threads = 1
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 4.00 GiB.
0 rows in set. Elapsed: 18.349 sec. Processed 100.03 million rows, 800.21 MB (5.45 million rows/s., 43.61 MB/s.)
SELECT DISTINCT number
FROM numbers_mt(100000000)
LIMIT 1000
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 21.56 MiB.
0 rows in set. Elapsed: 0.014 sec. Processed 589.54 thousand rows, 4.72 MB (43.08 million rows/s., 344.61 MB/s.)
SELECT DISTINCT number % 1000
FROM numbers_mt(1000000000)
LIMIT 1000
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 1.80 MiB.
0 rows in set. Elapsed: 0.005 sec. Processed 589.54 thousand rows, 4.72 MB (127.23 million rows/s., 1.02 GB/s.)
SELECT DISTINCT number % 1000
FROM numbers(1000000000)
LIMIT 1001
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 847.05 KiB.
0 rows in set. Elapsed: 0.448 sec. Processed 1.00 billion rows, 8.00 GB (2.23 billion rows/s., 17.88 GB/s.)
- The final DISTINCT step is single-threaded
- Streams the resultset
GROUP BY
SELECT number
FROM numbers_mt(100000000)
GROUP BY number
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 4.04 GiB.
0 rows in set. Elapsed: 8.212 sec. Processed 100.00 million rows, 800.00 MB (12.18 million rows/s., 97.42 MB/s.)
SELECT number
FROM numbers_mt(100000000)
GROUP BY number
SETTINGS max_threads = 1
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 6.00 GiB.
0 rows in set. Elapsed: 19.206 sec. Processed 100.03 million rows, 800.21 MB (5.21 million rows/s., 41.66 MB/s.)
SELECT number
FROM numbers_mt(100000000)
GROUP BY number
LIMIT 1000
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 4.05 GiB.
0 rows in set. Elapsed: 4.852 sec. Processed 100.00 million rows, 800.00 MB (20.61 million rows/s., 164.88 MB/s.)
This query is faster than the first one, because ClickHouse doesn't need to merge states for all keys, only for the first 1000 (based on LIMIT).
SELECT number % 1000 AS key
FROM numbers_mt(1000000000)
GROUP BY key
LIMIT 1000
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 3.15 MiB.
0 rows in set. Elapsed: 0.770 sec. Processed 1.00 billion rows, 8.00 GB (1.30 billion rows/s., 10.40 GB/s.)
SELECT number % 1000 AS key
FROM numbers_mt(1000000000)
GROUP BY key
LIMIT 1001
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 3.77 MiB.
0 rows in set. Elapsed: 0.770 sec. Processed 1.00 billion rows, 8.00 GB (1.30 billion rows/s., 10.40 GB/s.)
- Multi-threaded
- Returns the result only after completion of the aggregation
LIMIT BY
SELECT number
FROM numbers_mt(100000000)
LIMIT 1 BY number
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 6.00 GiB.
0 rows in set. Elapsed: 39.541 sec. Processed 100.00 million rows, 800.00 MB (2.53 million rows/s., 20.23 MB/s.)
SELECT number
FROM numbers_mt(100000000)
LIMIT 1 BY number
SETTINGS max_threads = 1
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 6.01 GiB.
0 rows in set. Elapsed: 36.773 sec. Processed 100.03 million rows, 800.21 MB (2.72 million rows/s., 21.76 MB/s.)
SELECT number
FROM numbers_mt(100000000)
LIMIT 1 BY number
LIMIT 1000
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 10.56 MiB.
0 rows in set. Elapsed: 0.019 sec. Processed 589.54 thousand rows, 4.72 MB (30.52 million rows/s., 244.20 MB/s.)
SELECT number % 1000 AS key
FROM numbers_mt(1000000000)
LIMIT 1 BY key
LIMIT 1000
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 5.14 MiB.
0 rows in set. Elapsed: 0.008 sec. Processed 589.54 thousand rows, 4.72 MB (71.27 million rows/s., 570.16 MB/s.)
SELECT number % 1000 AS key
FROM numbers_mt(1000000000)
LIMIT 1 BY key
LIMIT 1001
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 3.23 MiB.
0 rows in set. Elapsed: 36.027 sec. Processed 1.00 billion rows, 8.00 GB (27.76 million rows/s., 222.06 MB/s.)
- Single-threaded
- Streams the resultset
- Can return an arbitrary number of rows per key
4 - Imprecise parsing of literal Decimal or Float64
Decimal
SELECT
9.2::Decimal64(2) AS postgresql_cast,
toDecimal64(9.2, 2) AS to_function,
CAST(9.2, 'Decimal64(2)') AS cast_float_literal,
CAST('9.2', 'Decimal64(2)') AS cast_string_literal
┌─postgresql_cast─┬─to_function─┬─cast_float_literal─┬─cast_string_literal─┐
│ 9.2 │ 9.19 │ 9.19 │ 9.2 │
└─────────────────┴─────────────┴────────────────────┴─────────────────────┘
When we try to typecast 64.32 to Decimal128(2), the resulting value is 64.31.
When the parser sees a number with a decimal separator, it interprets it as a Float64 literal (where 64.32 has no exact representation, so you actually get something like 64.319999999999999999), and later that Float is cast to Decimal by removing the extra precision.
The workaround is very simple: wrap the number in quotes (then the query parser treats it as a string literal and converts it to Decimal directly), or use the postgres-like casting syntax:
select cast(64.32,'Decimal128(2)') a, cast('64.32','Decimal128(2)') b, 64.32::Decimal128(2) c;
┌─────a─┬─────b─┬─────c─┐
│ 64.31 │ 64.32 │ 64.32 │
└───────┴───────┴───────┘
Float64
SELECT
toFloat64(15008753.) AS to_func,
toFloat64('1.5008753E7') AS to_func_scientific,
CAST('1.5008753E7', 'Float64') AS cast_scientific
┌──to_func─┬─to_func_scientific─┬────cast_scientific─┐
│ 15008753 │ 15008753.000000002 │ 15008753.000000002 │
└──────────┴────────────────────┴────────────────────┘
5 - Multiple aligned date columns in PARTITION BY expression
An alternative to doing this with a minmax skip index: the extra date columns are wrapped in ignore() inside the PARTITION BY expression, so ClickHouse keeps per-partition min/max values for them and can prune partitions by conditions on created_at / inserted_at, while ignore() always returns 0 and therefore does not increase the number of partitions.
CREATE TABLE part_key_multiple_dates
(
`key` UInt32,
`date` Date,
`time` DateTime,
`created_at` DateTime,
`inserted_at` DateTime
)
ENGINE = MergeTree
PARTITION BY (toYYYYMM(date), ignore(created_at, inserted_at))
ORDER BY (key, time);
INSERT INTO part_key_multiple_dates SELECT
number,
toDate(x),
now() + intDiv(number, 10) AS x,
x - (rand() % 100),
x + (rand() % 100)
FROM numbers(100000000);
SELECT count()
FROM part_key_multiple_dates
WHERE date > (now() + toIntervalDay(105));
┌─count()─┐
│ 8434210 │
└─────────┘
1 rows in set. Elapsed: 0.022 sec. Processed 11.03 million rows, 22.05 MB (501.94 million rows/s., 1.00 GB/s.)
SELECT count()
FROM part_key_multiple_dates
WHERE inserted_at > (now() + toIntervalDay(105));
┌─count()─┐
│ 9279818 │
└─────────┘
1 rows in set. Elapsed: 0.046 sec. Processed 11.03 million rows, 44.10 MB (237.64 million rows/s., 950.57 MB/s.)
SELECT count()
FROM part_key_multiple_dates
WHERE created_at > (now() + toIntervalDay(105));
┌─count()─┐
│ 9279139 │
└─────────┘
1 rows in set. Elapsed: 0.043 sec. Processed 11.03 million rows, 44.10 MB (258.22 million rows/s., 1.03 GB/s.)
6 - Why is a simple `SELECT count()` slow in ClickHouse?
Why is a simple SELECT count() slow in ClickHouse?
ClickHouse is a columnar database that provides excellent performance for analytical queries. However, in some cases, a simple count query can be slow. In this article, we’ll explore the reasons why this can happen and how to optimize the query.
Three Strategies for Counting Rows in ClickHouse
There are three ways to count rows in a table in ClickHouse:
- optimize_trivial_count_query: This strategy extracts the number of rows from the table metadata. It's the fastest and most efficient way to count rows, but it only works for simple count queries.
- allow_experimental_projection_optimization: This strategy uses a virtual projection called _minmax_count_projection to count rows. It's faster than scanning the table but slower than the trivial count query.
- Scanning the smallest column in the table and reading rows from that. This is the slowest strategy and is only used when the other two strategies can't be used.
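A quick way to see whether the fast path applies is to compare a plain count() against the same query with the trivial-count optimization disabled (a sketch; `my_table` is illustrative):
SELECT count() FROM my_table;
SELECT count() FROM my_table SETTINGS optimize_trivial_count_query = 0;
If the first query returns instantly and reports almost no rows read, while the second one scans the smallest column, the metadata strategy is being used.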
Why Does ClickHouse Sometimes Choose the Slowest Counting Strategy?
In some cases, ClickHouse may choose the slowest counting strategy even when there are faster options available. Here are some possible reasons why this can happen:
Row policies are used on the table: If row policies are used, ClickHouse needs to filter rows to give the proper count. You can check if row policies are used by selecting from system.row_policies.
Experimental light-weight delete feature was used on the table: If the experimental light-weight delete feature was used, ClickHouse may use the slowest counting strategy. You can check this by looking into parts_columns for the column named _row_exists. To do this, run the following query:
SELECT DISTINCT database, table FROM system.parts_columns WHERE column = '_row_exists';
You can also refer to this issue on GitHub for more information: https://github.com/ClickHouse/ClickHouse/issues/47930.
- SELECT FINAL or the final=1 setting is used.
- max_parallel_replicas > 1 is used.
- Sampling is used.
- Some other features like allow_experimental_query_deduplication or empty_result_for_aggregation_by_empty_set are used.
7 - Collecting query execution flamegraphs using system.trace_log
Collecting query execution flamegraph using system.trace_log
ClickHouse has embedded functionality to analyze the details of query performance.
It's the system.trace_log table.
By default it collects information only about queries that run longer than 1 sec (and collects stacktraces every second).
You can adjust that per query using the settings query_profiler_real_time_period_ns & query_profiler_cpu_time_period_ns.
Both work very similarly (at the desired interval they dump the stacktraces of all threads which execute the query). The real timer lets you 'see' the situations when the CPU was not doing much but time was spent, for example, on IO. The CPU timer lets you see the 'hot' points in calculations more accurately (it skips the IO time).
Trying to collect stacktraces with a frequency higher than a few KHz is usually not possible.
To check where most of the RAM is used, you can collect stacktraces during memory allocations / deallocations by using the setting memory_profiler_sample_probability.
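To check what kind of samples have been collected so far (a sketch):
SELECT
    trace_type,
    count()
FROM system.trace_log
WHERE event_date = today()
GROUP BY trace_type;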
clickhouse-speedscope
# install
wget https://github.com/laplab/clickhouse-speedscope/archive/refs/heads/master.tar.gz -O clickhouse-speedscope.tar.gz
tar -xvzf clickhouse-speedscope.tar.gz
cd clickhouse-speedscope-master/
pip3 install -r requirements.txt
For debugging particular query:
clickhouse-client
SET query_profiler_cpu_time_period_ns=1000000; -- 1000 times per 'cpu' sec
-- or SET query_profiler_real_time_period_ns=2000000; -- 500 times per 'real' sec.
-- or SET memory_profiler_sample_probability=0.1; -- to debug the memory allocations
SELECT ... <your select>
SYSTEM FLUSH LOGS;
-- get the query_id from the clickhouse-client output or from system.query_log (also pay attention on query_id vs initial_query_id for distributed queries).
Now let’s process that:
python3 main.py & # start the proxy in background
python3 main.py --query-id 908952ee-71a8-48a4-84d5-f4db92d45a5d # process the stacktraces
fg # get the proxy from background
Ctrl + C # stop it.
To access ClickHouse with other username / password etc. - see the sources of https://github.com/laplab/clickhouse-speedscope/blob/master/main.py
clickhouse-flamegraph
Installation & usage instructions: https://github.com/Slach/clickhouse-flamegraph
pure flamegraph.pl examples
git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph
clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat( addressToLine(x), '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log WHERE event_time >= subtractMinutes(now(),10) GROUP BY trace FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > flamegraph.svg
clickhouse-client -q "SELECT arrayStringConcat((arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, sum(abs(size)) AS samples FROM system.trace_log where trace_type = 'Memory' and event_date = today() group by trace order by samples desc FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > allocs.svg
clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log where trace_type = 'Memory' group by trace FORMAT TabSeparated SETTINGS allow_introspection_functions=1" | /opt/flamegraph/flamegraph.pl > ~/mem1.svg
similar using perf
apt-get update -y
apt-get install -y linux-tools-common linux-tools-generic linux-tools-`uname -r` git
apt-get install -y clickhouse-common-static-dbg clickhouse-common-dbg
mkdir -p /opt/flamegraph
git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph
perf record -F 99 -p $(pidof clickhouse) -G
perf script > /tmp/out.perf
/opt/flamegraph/stackcollapse-perf.pl /tmp/out.perf | /opt/flamegraph/flamegraph.pl > /tmp/flamegraph.svg
also
https://kb.altinity.com/altinity-kb-queries-and-syntax/troubleshooting/#flamegraph
8 - Using array functions to mimic window-function-like behavior
Using array functions to mimic window-function-like behavior
There are some use cases when you may want to mimic window functions using arrays - as an optimization step, to control memory better / use on-disk spilling, or just if you have an old ClickHouse version.
Running difference sample
DROP TABLE IF EXISTS test_running_difference
CREATE TABLE test_running_difference
ENGINE = Log AS
SELECT
number % 20 AS id,
toDateTime('2010-01-01 00:00:00') + (intDiv(number, 20) * 15) AS ts,
(number * round(xxHash32(number % 20) / 1000000)) - round(rand() / 1000000) AS val
FROM numbers(100)
SELECT * FROM test_running_difference
┌─id─┬──────────────────ts─┬────val─┐
│ 0 │ 2010-01-01 00:00:00 │ -1209 │
│ 1 │ 2010-01-01 00:00:00 │ 43 │
│ 2 │ 2010-01-01 00:00:00 │ 4322 │
│ 3 │ 2010-01-01 00:00:00 │ -25 │
│ 4 │ 2010-01-01 00:00:00 │ 13720 │
│ 5 │ 2010-01-01 00:00:00 │ 903 │
│ 6 │ 2010-01-01 00:00:00 │ 18062 │
│ 7 │ 2010-01-01 00:00:00 │ -2873 │
│ 8 │ 2010-01-01 00:00:00 │ 6286 │
│ 9 │ 2010-01-01 00:00:00 │ 13399 │
│ 10 │ 2010-01-01 00:00:00 │ 18320 │
│ 11 │ 2010-01-01 00:00:00 │ 11731 │
│ 12 │ 2010-01-01 00:00:00 │ 857 │
│ 13 │ 2010-01-01 00:00:00 │ 8752 │
│ 14 │ 2010-01-01 00:00:00 │ 23060 │
│ 15 │ 2010-01-01 00:00:00 │ 41902 │
│ 16 │ 2010-01-01 00:00:00 │ 39406 │
│ 17 │ 2010-01-01 00:00:00 │ 50010 │
│ 18 │ 2010-01-01 00:00:00 │ 57673 │
│ 19 │ 2010-01-01 00:00:00 │ 51389 │
│ 0 │ 2010-01-01 00:00:15 │ 66839 │
│ 1 │ 2010-01-01 00:00:15 │ 19440 │
│ 2 │ 2010-01-01 00:00:15 │ 74513 │
│ 3 │ 2010-01-01 00:00:15 │ 10542 │
│ 4 │ 2010-01-01 00:00:15 │ 94245 │
│ 5 │ 2010-01-01 00:00:15 │ 8230 │
│ 6 │ 2010-01-01 00:00:15 │ 87823 │
│ 7 │ 2010-01-01 00:00:15 │ -128 │
│ 8 │ 2010-01-01 00:00:15 │ 30101 │
│ 9 │ 2010-01-01 00:00:15 │ 54321 │
│ 10 │ 2010-01-01 00:00:15 │ 64078 │
│ 11 │ 2010-01-01 00:00:15 │ 31886 │
│ 12 │ 2010-01-01 00:00:15 │ 8749 │
│ 13 │ 2010-01-01 00:00:15 │ 28982 │
│ 14 │ 2010-01-01 00:00:15 │ 61299 │
│ 15 │ 2010-01-01 00:00:15 │ 95867 │
│ 16 │ 2010-01-01 00:00:15 │ 93667 │
│ 17 │ 2010-01-01 00:00:15 │ 114072 │
│ 18 │ 2010-01-01 00:00:15 │ 124279 │
│ 19 │ 2010-01-01 00:00:15 │ 109605 │
│ 0 │ 2010-01-01 00:00:30 │ 135082 │
│ 1 │ 2010-01-01 00:00:30 │ 37345 │
│ 2 │ 2010-01-01 00:00:30 │ 148744 │
│ 3 │ 2010-01-01 00:00:30 │ 21607 │
│ 4 │ 2010-01-01 00:00:30 │ 171744 │
│ 5 │ 2010-01-01 00:00:30 │ 14736 │
│ 6 │ 2010-01-01 00:00:30 │ 155349 │
│ 7 │ 2010-01-01 00:00:30 │ -3901 │
│ 8 │ 2010-01-01 00:00:30 │ 54303 │
│ 9 │ 2010-01-01 00:00:30 │ 89629 │
│ 10 │ 2010-01-01 00:00:30 │ 106595 │
│ 11 │ 2010-01-01 00:00:30 │ 54545 │
│ 12 │ 2010-01-01 00:00:30 │ 18903 │
│ 13 │ 2010-01-01 00:00:30 │ 48023 │
│ 14 │ 2010-01-01 00:00:30 │ 97930 │
│ 15 │ 2010-01-01 00:00:30 │ 152165 │
│ 16 │ 2010-01-01 00:00:30 │ 146130 │
│ 17 │ 2010-01-01 00:00:30 │ 174854 │
│ 18 │ 2010-01-01 00:00:30 │ 189194 │
│ 19 │ 2010-01-01 00:00:30 │ 170134 │
│ 0 │ 2010-01-01 00:00:45 │ 207471 │
│ 1 │ 2010-01-01 00:00:45 │ 54323 │
│ 2 │ 2010-01-01 00:00:45 │ 217984 │
│ 3 │ 2010-01-01 00:00:45 │ 31835 │
│ 4 │ 2010-01-01 00:00:45 │ 252709 │
│ 5 │ 2010-01-01 00:00:45 │ 21493 │
│ 6 │ 2010-01-01 00:00:45 │ 221271 │
│ 7 │ 2010-01-01 00:00:45 │ -488 │
│ 8 │ 2010-01-01 00:00:45 │ 76827 │
│ 9 │ 2010-01-01 00:00:45 │ 131066 │
│ 10 │ 2010-01-01 00:00:45 │ 149087 │
│ 11 │ 2010-01-01 00:00:45 │ 71934 │
│ 12 │ 2010-01-01 00:00:45 │ 25125 │
│ 13 │ 2010-01-01 00:00:45 │ 65274 │
│ 14 │ 2010-01-01 00:00:45 │ 135980 │
│ 15 │ 2010-01-01 00:00:45 │ 210910 │
│ 16 │ 2010-01-01 00:00:45 │ 200007 │
│ 17 │ 2010-01-01 00:00:45 │ 235872 │
│ 18 │ 2010-01-01 00:00:45 │ 256112 │
│ 19 │ 2010-01-01 00:00:45 │ 229371 │
│ 0 │ 2010-01-01 00:01:00 │ 275331 │
│ 1 │ 2010-01-01 00:01:00 │ 72668 │
│ 2 │ 2010-01-01 00:01:00 │ 290366 │
│ 3 │ 2010-01-01 00:01:00 │ 46074 │
│ 4 │ 2010-01-01 00:01:00 │ 329207 │
│ 5 │ 2010-01-01 00:01:00 │ 26770 │
│ 6 │ 2010-01-01 00:01:00 │ 287619 │
│ 7 │ 2010-01-01 00:01:00 │ -2207 │
│ 8 │ 2010-01-01 00:01:00 │ 100456 │
│ 9 │ 2010-01-01 00:01:00 │ 165688 │
│ 10 │ 2010-01-01 00:01:00 │ 194136 │
│ 11 │ 2010-01-01 00:01:00 │ 94113 │
│ 12 │ 2010-01-01 00:01:00 │ 35810 │
│ 13 │ 2010-01-01 00:01:00 │ 85081 │
│ 14 │ 2010-01-01 00:01:00 │ 170256 │
│ 15 │ 2010-01-01 00:01:00 │ 265445 │
│ 16 │ 2010-01-01 00:01:00 │ 254828 │
│ 17 │ 2010-01-01 00:01:00 │ 297238 │
│ 18 │ 2010-01-01 00:01:00 │ 323494 │
│ 19 │ 2010-01-01 00:01:00 │ 286252 │
└────┴─────────────────────┴────────┘
100 rows in set. Elapsed: 0.003 sec.
runningDifference works only within a block, requires ordered data, and is problematic when the group changes
select id, val, runningDifference(val) from (select * from test_running_difference order by id, ts);
┌─id─┬────val─┬─runningDifference(val)─┐
│ 0 │ -1209 │ 0 │
│ 0 │ 66839 │ 68048 │
│ 0 │ 135082 │ 68243 │
│ 0 │ 207471 │ 72389 │
│ 0 │ 275331 │ 67860 │
│ 1 │ 43 │ -275288 │
│ 1 │ 19440 │ 19397 │
│ 1 │ 37345 │ 17905 │
│ 1 │ 54323 │ 16978 │
│ 1 │ 72668 │ 18345 │
│ 2 │ 4322 │ -68346 │
│ 2 │ 74513 │ 70191 │
│ 2 │ 148744 │ 74231 │
│ 2 │ 217984 │ 69240 │
│ 2 │ 290366 │ 72382 │
│ 3 │ -25 │ -290391 │
│ 3 │ 10542 │ 10567 │
│ 3 │ 21607 │ 11065 │
│ 3 │ 31835 │ 10228 │
│ 3 │ 46074 │ 14239 │
│ 4 │ 13720 │ -32354 │
│ 4 │ 94245 │ 80525 │
│ 4 │ 171744 │ 77499 │
│ 4 │ 252709 │ 80965 │
│ 4 │ 329207 │ 76498 │
│ 5 │ 903 │ -328304 │
│ 5 │ 8230 │ 7327 │
│ 5 │ 14736 │ 6506 │
│ 5 │ 21493 │ 6757 │
│ 5 │ 26770 │ 5277 │
│ 6 │ 18062 │ -8708 │
│ 6 │ 87823 │ 69761 │
│ 6 │ 155349 │ 67526 │
│ 6 │ 221271 │ 65922 │
│ 6 │ 287619 │ 66348 │
│ 7 │ -2873 │ -290492 │
│ 7 │ -128 │ 2745 │
│ 7 │ -3901 │ -3773 │
│ 7 │ -488 │ 3413 │
│ 7 │ -2207 │ -1719 │
│ 8 │ 6286 │ 8493 │
│ 8 │ 30101 │ 23815 │
│ 8 │ 54303 │ 24202 │
│ 8 │ 76827 │ 22524 │
│ 8 │ 100456 │ 23629 │
│ 9 │ 13399 │ -87057 │
│ 9 │ 54321 │ 40922 │
│ 9 │ 89629 │ 35308 │
│ 9 │ 131066 │ 41437 │
│ 9 │ 165688 │ 34622 │
│ 10 │ 18320 │ -147368 │
│ 10 │ 64078 │ 45758 │
│ 10 │ 106595 │ 42517 │
│ 10 │ 149087 │ 42492 │
│ 10 │ 194136 │ 45049 │
│ 11 │ 11731 │ -182405 │
│ 11 │ 31886 │ 20155 │
│ 11 │ 54545 │ 22659 │
│ 11 │ 71934 │ 17389 │
│ 11 │ 94113 │ 22179 │
│ 12 │ 857 │ -93256 │
│ 12 │ 8749 │ 7892 │
│ 12 │ 18903 │ 10154 │
│ 12 │ 25125 │ 6222 │
│ 12 │ 35810 │ 10685 │
│ 13 │ 8752 │ -27058 │
│ 13 │ 28982 │ 20230 │
│ 13 │ 48023 │ 19041 │
│ 13 │ 65274 │ 17251 │
│ 13 │ 85081 │ 19807 │
│ 14 │ 23060 │ -62021 │
│ 14 │ 61299 │ 38239 │
│ 14 │ 97930 │ 36631 │
│ 14 │ 135980 │ 38050 │
│ 14 │ 170256 │ 34276 │
│ 15 │ 41902 │ -128354 │
│ 15 │ 95867 │ 53965 │
│ 15 │ 152165 │ 56298 │
│ 15 │ 210910 │ 58745 │
│ 15 │ 265445 │ 54535 │
│ 16 │ 39406 │ -226039 │
│ 16 │ 93667 │ 54261 │
│ 16 │ 146130 │ 52463 │
│ 16 │ 200007 │ 53877 │
│ 16 │ 254828 │ 54821 │
│ 17 │ 50010 │ -204818 │
│ 17 │ 114072 │ 64062 │
│ 17 │ 174854 │ 60782 │
│ 17 │ 235872 │ 61018 │
│ 17 │ 297238 │ 61366 │
│ 18 │ 57673 │ -239565 │
│ 18 │ 124279 │ 66606 │
│ 18 │ 189194 │ 64915 │
│ 18 │ 256112 │ 66918 │
│ 18 │ 323494 │ 67382 │
│ 19 │ 51389 │ -272105 │
│ 19 │ 109605 │ 58216 │
│ 19 │ 170134 │ 60529 │
│ 19 │ 229371 │ 59237 │
│ 19 │ 286252 │ 56881 │
└────┴────────┴────────────────────────┘
100 rows in set. Elapsed: 0.005 sec.
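If you still want to use runningDifference, one possible workaround (a sketch, not part of the original recipe) is to zero out the difference whenever the id changes, relying on the sorted subquery fitting into a single block for this small dataset:
SELECT
    id,
    val,
    if(runningDifference(id) = 0, runningDifference(val), 0) AS diff
FROM
(
    SELECT *
    FROM test_running_difference
    ORDER BY id, ts
)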
Arrays!
1. Group & collect the data into an array
You can collect several columns by building an array of tuples:
SELECT
id,
groupArray(tuple(ts, val))
FROM test_running_difference
GROUP BY id
┌─id─┬─groupArray(tuple(ts, val))──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 0 │ [('2010-01-01 00:00:00',-1209),('2010-01-01 00:00:15',66839),('2010-01-01 00:00:30',135082),('2010-01-01 00:00:45',207471),('2010-01-01 00:01:00',275331)] │
│ 1 │ [('2010-01-01 00:00:00',43),('2010-01-01 00:00:15',19440),('2010-01-01 00:00:30',37345),('2010-01-01 00:00:45',54323),('2010-01-01 00:01:00',72668)] │
│ 2 │ [('2010-01-01 00:00:00',4322),('2010-01-01 00:00:15',74513),('2010-01-01 00:00:30',148744),('2010-01-01 00:00:45',217984),('2010-01-01 00:01:00',290366)] │
│ 3 │ [('2010-01-01 00:00:00',-25),('2010-01-01 00:00:15',10542),('2010-01-01 00:00:30',21607),('2010-01-01 00:00:45',31835),('2010-01-01 00:01:00',46074)] │
│ 4 │ [('2010-01-01 00:00:00',13720),('2010-01-01 00:00:15',94245),('2010-01-01 00:00:30',171744),('2010-01-01 00:00:45',252709),('2010-01-01 00:01:00',329207)] │
│ 5 │ [('2010-01-01 00:00:00',903),('2010-01-01 00:00:15',8230),('2010-01-01 00:00:30',14736),('2010-01-01 00:00:45',21493),('2010-01-01 00:01:00',26770)] │
│ 6 │ [('2010-01-01 00:00:00',18062),('2010-01-01 00:00:15',87823),('2010-01-01 00:00:30',155349),('2010-01-01 00:00:45',221271),('2010-01-01 00:01:00',287619)] │
│ 7 │ [('2010-01-01 00:00:00',-2873),('2010-01-01 00:00:15',-128),('2010-01-01 00:00:30',-3901),('2010-01-01 00:00:45',-488),('2010-01-01 00:01:00',-2207)] │
│ 8 │ [('2010-01-01 00:00:00',6286),('2010-01-01 00:00:15',30101),('2010-01-01 00:00:30',54303),('2010-01-01 00:00:45',76827),('2010-01-01 00:01:00',100456)] │
│ 9 │ [('2010-01-01 00:00:00',13399),('2010-01-01 00:00:15',54321),('2010-01-01 00:00:30',89629),('2010-01-01 00:00:45',131066),('2010-01-01 00:01:00',165688)] │
│ 10 │ [('2010-01-01 00:00:00',18320),('2010-01-01 00:00:15',64078),('2010-01-01 00:00:30',106595),('2010-01-01 00:00:45',149087),('2010-01-01 00:01:00',194136)] │
│ 11 │ [('2010-01-01 00:00:00',11731),('2010-01-01 00:00:15',31886),('2010-01-01 00:00:30',54545),('2010-01-01 00:00:45',71934),('2010-01-01 00:01:00',94113)] │
│ 12 │ [('2010-01-01 00:00:00',857),('2010-01-01 00:00:15',8749),('2010-01-01 00:00:30',18903),('2010-01-01 00:00:45',25125),('2010-01-01 00:01:00',35810)] │
│ 13 │ [('2010-01-01 00:00:00',8752),('2010-01-01 00:00:15',28982),('2010-01-01 00:00:30',48023),('2010-01-01 00:00:45',65274),('2010-01-01 00:01:00',85081)] │
│ 14 │ [('2010-01-01 00:00:00',23060),('2010-01-01 00:00:15',61299),('2010-01-01 00:00:30',97930),('2010-01-01 00:00:45',135980),('2010-01-01 00:01:00',170256)] │
│ 15 │ [('2010-01-01 00:00:00',41902),('2010-01-01 00:00:15',95867),('2010-01-01 00:00:30',152165),('2010-01-01 00:00:45',210910),('2010-01-01 00:01:00',265445)] │
│ 16 │ [('2010-01-01 00:00:00',39406),('2010-01-01 00:00:15',93667),('2010-01-01 00:00:30',146130),('2010-01-01 00:00:45',200007),('2010-01-01 00:01:00',254828)] │
│ 17 │ [('2010-01-01 00:00:00',50010),('2010-01-01 00:00:15',114072),('2010-01-01 00:00:30',174854),('2010-01-01 00:00:45',235872),('2010-01-01 00:01:00',297238)] │
│ 18 │ [('2010-01-01 00:00:00',57673),('2010-01-01 00:00:15',124279),('2010-01-01 00:00:30',189194),('2010-01-01 00:00:45',256112),('2010-01-01 00:01:00',323494)] │
│ 19 │ [('2010-01-01 00:00:00',51389),('2010-01-01 00:00:15',109605),('2010-01-01 00:00:30',170134),('2010-01-01 00:00:45',229371),('2010-01-01 00:01:00',286252)] │
└────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2. Do the needed ordering in each array
For example, by the second element of the tuple:
SELECT
id,
arraySort(x -> (x.2), groupArray((ts, val)))
FROM test_running_difference
GROUP BY id
┌─id─┬─arraySort(lambda(tuple(x), tupleElement(x, 2)), groupArray(tuple(ts, val)))─────────────────────────────────────────────────────────────────────────────────┐
│ 0 │ [('2010-01-01 00:00:00',-1209),('2010-01-01 00:00:15',66839),('2010-01-01 00:00:30',135082),('2010-01-01 00:00:45',207471),('2010-01-01 00:01:00',275331)] │
│ 1 │ [('2010-01-01 00:00:00',43),('2010-01-01 00:00:15',19440),('2010-01-01 00:00:30',37345),('2010-01-01 00:00:45',54323),('2010-01-01 00:01:00',72668)] │
│ 2 │ [('2010-01-01 00:00:00',4322),('2010-01-01 00:00:15',74513),('2010-01-01 00:00:30',148744),('2010-01-01 00:00:45',217984),('2010-01-01 00:01:00',290366)] │
│ 3 │ [('2010-01-01 00:00:00',-25),('2010-01-01 00:00:15',10542),('2010-01-01 00:00:30',21607),('2010-01-01 00:00:45',31835),('2010-01-01 00:01:00',46074)] │
│ 4 │ [('2010-01-01 00:00:00',13720),('2010-01-01 00:00:15',94245),('2010-01-01 00:00:30',171744),('2010-01-01 00:00:45',252709),('2010-01-01 00:01:00',329207)] │
│ 5 │ [('2010-01-01 00:00:00',903),('2010-01-01 00:00:15',8230),('2010-01-01 00:00:30',14736),('2010-01-01 00:00:45',21493),('2010-01-01 00:01:00',26770)] │
│ 6 │ [('2010-01-01 00:00:00',18062),('2010-01-01 00:00:15',87823),('2010-01-01 00:00:30',155349),('2010-01-01 00:00:45',221271),('2010-01-01 00:01:00',287619)] │
│ 7 │ [('2010-01-01 00:00:30',-3901),('2010-01-01 00:00:00',-2873),('2010-01-01 00:01:00',-2207),('2010-01-01 00:00:45',-488),('2010-01-01 00:00:15',-128)] │
│ 8 │ [('2010-01-01 00:00:00',6286),('2010-01-01 00:00:15',30101),('2010-01-01 00:00:30',54303),('2010-01-01 00:00:45',76827),('2010-01-01 00:01:00',100456)] │
│ 9 │ [('2010-01-01 00:00:00',13399),('2010-01-01 00:00:15',54321),('2010-01-01 00:00:30',89629),('2010-01-01 00:00:45',131066),('2010-01-01 00:01:00',165688)] │
│ 10 │ [('2010-01-01 00:00:00',18320),('2010-01-01 00:00:15',64078),('2010-01-01 00:00:30',106595),('2010-01-01 00:00:45',149087),('2010-01-01 00:01:00',194136)] │
│ 11 │ [('2010-01-01 00:00:00',11731),('2010-01-01 00:00:15',31886),('2010-01-01 00:00:30',54545),('2010-01-01 00:00:45',71934),('2010-01-01 00:01:00',94113)] │
│ 12 │ [('2010-01-01 00:00:00',857),('2010-01-01 00:00:15',8749),('2010-01-01 00:00:30',18903),('2010-01-01 00:00:45',25125),('2010-01-01 00:01:00',35810)] │
│ 13 │ [('2010-01-01 00:00:00',8752),('2010-01-01 00:00:15',28982),('2010-01-01 00:00:30',48023),('2010-01-01 00:00:45',65274),('2010-01-01 00:01:00',85081)] │
│ 14 │ [('2010-01-01 00:00:00',23060),('2010-01-01 00:00:15',61299),('2010-01-01 00:00:30',97930),('2010-01-01 00:00:45',135980),('2010-01-01 00:01:00',170256)] │
│ 15 │ [('2010-01-01 00:00:00',41902),('2010-01-01 00:00:15',95867),('2010-01-01 00:00:30',152165),('2010-01-01 00:00:45',210910),('2010-01-01 00:01:00',265445)] │
│ 16 │ [('2010-01-01 00:00:00',39406),('2010-01-01 00:00:15',93667),('2010-01-01 00:00:30',146130),('2010-01-01 00:00:45',200007),('2010-01-01 00:01:00',254828)] │
│ 17 │ [('2010-01-01 00:00:00',50010),('2010-01-01 00:00:15',114072),('2010-01-01 00:00:30',174854),('2010-01-01 00:00:45',235872),('2010-01-01 00:01:00',297238)] │
│ 18 │ [('2010-01-01 00:00:00',57673),('2010-01-01 00:00:15',124279),('2010-01-01 00:00:30',189194),('2010-01-01 00:00:45',256112),('2010-01-01 00:01:00',323494)] │
│ 19 │ [('2010-01-01 00:00:00',51389),('2010-01-01 00:00:15',109605),('2010-01-01 00:00:30',170134),('2010-01-01 00:00:45',229371),('2010-01-01 00:01:00',286252)] │
└────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
20 rows in set. Elapsed: 0.004 sec.
That can be rewritten like this:
WITH
groupArray(tuple(ts, val)) as window_rows,
arraySort(x -> x.1, window_rows) as sorted_window_rows
SELECT
id,
sorted_window_rows
FROM test_running_difference
GROUP BY id
3. Apply the needed logic: arrayMap / arrayDifference, etc.
WITH
groupArray(tuple(ts, val)) as window_rows,
arraySort(x -> x.1, window_rows) as sorted_window_rows,
arrayMap(x -> x.2, sorted_window_rows) as sorted_window_rows_val_column,
arrayDifference(sorted_window_rows_val_column) as sorted_window_rows_val_column_diff
SELECT
id,
sorted_window_rows_val_column_diff
FROM test_running_difference
GROUP BY id
┌─id─┬─sorted_window_rows_val_column_diff─┐
│ 0 │ [0,68048,68243,72389,67860] │
│ 1 │ [0,19397,17905,16978,18345] │
│ 2 │ [0,70191,74231,69240,72382] │
│ 3 │ [0,10567,11065,10228,14239] │
│ 4 │ [0,80525,77499,80965,76498] │
│ 5 │ [0,7327,6506,6757,5277] │
│ 6 │ [0,69761,67526,65922,66348] │
│ 7 │ [0,2745,-3773,3413,-1719] │
│ 8 │ [0,23815,24202,22524,23629] │
│ 9 │ [0,40922,35308,41437,34622] │
│ 10 │ [0,45758,42517,42492,45049] │
│ 11 │ [0,20155,22659,17389,22179] │
│ 12 │ [0,7892,10154,6222,10685] │
│ 13 │ [0,20230,19041,17251,19807] │
│ 14 │ [0,38239,36631,38050,34276] │
│ 15 │ [0,53965,56298,58745,54535] │
│ 16 │ [0,54261,52463,53877,54821] │
│ 17 │ [0,64062,60782,61018,61366] │
│ 18 │ [0,66606,64915,66918,67382] │
│ 19 │ [0,58216,60529,59237,56881] │
└────┴────────────────────────────────────┘
20 rows in set. Elapsed: 0.005 sec.
You can also do a lot of magic with arrayEnumerate, accessing different values by their indexes.
4. Now turn your arrays back into rows
Use arrayJoin:
WITH
groupArray(tuple(ts, val)) as window_rows,
arraySort(x -> x.1, window_rows) as sorted_window_rows,
arrayMap(x -> x.2, sorted_window_rows) as sorted_window_rows_val_column,
arrayDifference(sorted_window_rows_val_column) as sorted_window_rows_val_column_diff,
arrayJoin(sorted_window_rows_val_column_diff) as diff
SELECT
id,
diff
FROM test_running_difference
GROUP BY id
or ARRAY JOIN
SELECT
id,
diff,
ts
FROM
(
WITH
groupArray(tuple(ts, val)) as window_rows,
arraySort(x -> x.1, window_rows) as sorted_window_rows,
arrayMap(x -> x.2, sorted_window_rows) as sorted_window_rows_val_column
SELECT
id,
arrayDifference(sorted_window_rows_val_column) as sorted_window_rows_val_column_diff,
arrayMap(x -> x.1, sorted_window_rows) as sorted_window_rows_ts_column
FROM test_running_difference
GROUP BY id
) as t1
ARRAY JOIN sorted_window_rows_val_column_diff as diff, sorted_window_rows_ts_column as ts
etc.
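For example, here is a sketch of a lag-like "previous value" per id built with arrayEnumerate over the same sorted arrays (prev_val is 0 for the first row of each id):
SELECT
    id,
    ts,
    val,
    prev_val
FROM
(
    WITH arraySort(x -> x.1, groupArray((ts, val))) AS sorted_window_rows
    SELECT
        id,
        arrayMap(x -> x.1, sorted_window_rows) AS ts_arr,
        arrayMap(x -> x.2, sorted_window_rows) AS val_arr,
        arrayMap(i -> if(i = 1, 0, val_arr[i - 1]), arrayEnumerate(val_arr)) AS prev_val_arr
    FROM test_running_difference
    GROUP BY id
)
ARRAY JOIN
    ts_arr AS ts,
    val_arr AS val,
    prev_val_arr AS prev_val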
9 - -State & -Merge combinators
The -State combinator doesn’t actually store information about the -If combinator, so aggregate functions with and without -If have the same serialized data.
$ clickhouse-local --query "SELECT maxIfState(number,number % 2) as x, maxState(number) as y FROM numbers(10) FORMAT RowBinary" | clickhouse-local --input-format RowBinary --structure="x AggregateFunction(max,UInt64), y AggregateFunction(max,UInt64)" --query "SELECT maxMerge(x), maxMerge(y) FROM table"
9 9
$ clickhouse-local --query "SELECT maxIfState(number,number % 2) as x, maxState(number) as y FROM numbers(11) FORMAT RowBinary" | clickhouse-local --input-format RowBinary --structure="x AggregateFunction(max,UInt64), y AggregateFunction(max,UInt64)" --query "SELECT maxMerge(x), maxMerge(y) FROM table"
9 10
The -State combinator has the same serialized data footprint regardless of the parameters used in the definition of the aggregate function. That’s true for quantile* and sequenceMatch/sequenceCount functions.
$ clickhouse-local --query "SELECT quantilesTDigestIfState(0.1,0.9)(number,number % 2) FROM numbers(1000000) FORMAT RowBinary" | clickhouse-local --input-format RowBinary --structure="x AggregateFunction(quantileTDigestWeighted(0.5),UInt64,UInt8)" --query "SELECT quantileTDigestWeightedMerge(0.4)(x) FROM table"
400000
$ clickhouse-local --query "SELECT quantilesTDigestIfState(0.1,0.9)(number,number % 2) FROM numbers(1000000) FORMAT RowBinary" | clickhouse-local --input-format RowBinary --structure="x AggregateFunction(quantilesTDigestWeighted(0.5),UInt64,UInt8)" --query "SELECT quantilesTDigestWeightedMerge(0.4,0.8)(x) FROM table"
[400000,800000]
SELECT quantileMerge(0.9)(x)
FROM
(
SELECT quantileState(0.1)(number) AS x
FROM numbers(1000)
)
┌─quantileMerge(0.9)(x)─┐
│ 899.1 │
└───────────────────────┘
SELECT
sequenceMatchMerge('(?2)(?3)')(x) AS `2_3`,
sequenceMatchMerge('(?1)(?4)')(x) AS `1_4`,
sequenceMatchMerge('(?1)(?2)(?3)')(x) AS `1_2_3`
FROM
(
SELECT sequenceMatchState('(?1)(?2)(?3)')(number, number = 8, number = 5, number = 6, number = 9) AS x
FROM numbers(10)
)
┌─2_3─┬─1_4─┬─1_2_3─┐
│ 1 │ 1 │ 0 │
└─────┴─────┴───────┘
SELECT
sequenceCountMerge('(?1)(?2)')(x) AS `2_3`,
sequenceCountMerge('(?1)(?4)')(x) AS `1_4`,
sequenceCountMerge('(?1)(?2)(?3)')(x) AS `1_2_3`
FROM
(
WITH number % 4 AS cond
SELECT sequenceCountState('(?1)(?2)(?3)')(number, cond = 1, cond = 2, cond = 3, cond = 5) AS x
FROM numbers(11)
)
┌─2_3─┬─1_4─┬─1_2_3─┐
│ 3 │ 0 │ 2 │
└─────┴─────┴───────┘
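A typical place where -State and -Merge meet is an AggregatingMergeTree table; here is a minimal sketch (daily_uniq is a hypothetical table name):
CREATE TABLE daily_uniq
(
    `day` Date,
    `uniq_state` AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY day;

-- store partial states per day
INSERT INTO daily_uniq
SELECT
    toDate('2021-01-01') + (number % 3) AS day,
    uniqState(number) AS uniq_state
FROM numbers(1000)
GROUP BY day;

-- finalize the states at query time
SELECT
    day,
    uniqMerge(uniq_state) AS uniq
FROM daily_uniq
GROUP BY day
ORDER BY day;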
10 - ALTER MODIFY COLUMN is stuck, the column is inaccessible.
Problem
You have a table:
CREATE TABLE modify_column(column_n String) ENGINE=MergeTree() ORDER BY tuple();
Populate it with data:
INSERT INTO modify_column VALUES ('key_a');
INSERT INTO modify_column VALUES ('key_b');
INSERT INTO modify_column VALUES ('key_c');
You tried to apply an ALTER TABLE query that changes the column type:
ALTER TABLE modify_column MODIFY COLUMN column_n Enum8('key_a'=1, 'key_b'=2);
But it didn’t succeed, and you see an error in the system.mutations table:
SELECT *
FROM system.mutations
WHERE (table = 'modify_column') AND (is_done = 0)
FORMAT Vertical
Row 1:
──────
database: default
table: modify_column
mutation_id: mutation_4.txt
command: MODIFY COLUMN `column_n` Enum8('key_a' = 1, 'key_b' = 2)
create_time: 2021-03-03 18:38:09
block_numbers.partition_id: ['']
block_numbers.number: [4]
parts_to_do_names: ['all_3_3_0']
parts_to_do: 1
is_done: 0
latest_failed_part: all_3_3_0
latest_fail_time: 2021-03-03 18:38:59
latest_fail_reason: Code: 36, e.displayText() = DB::Exception: Unknown element 'key_c' for type Enum8('key_a' = 1, 'key_b' = 2): while executing 'FUNCTION CAST(column_n :: 0, 'Enum8(\'key_a\' = 1, \'key_b\' = 2)' :: 1) -> cast(column_n, 'Enum8(\'key_a\' = 1, \'key_b\' = 2)') Enum8('key_a' = 1, 'key_b' = 2) : 2': (while reading from part /var/lib/clickhouse/data/default/modify_column/all_3_3_0/): While executing MergeTree (version 21.3.1.6041)
And you can’t query that column anymore:
SELECT column_n
FROM modify_column
┌─column_n─┐
│ key_a │
└──────────┘
┌─column_n─┐
│ key_b │
└──────────┘
↓ Progress: 2.00 rows, 2.00 B (19.48 rows/s., 19.48 B/s.)
2 rows in set. Elapsed: 0.104 sec.
Received exception from server (version 21.3.1):
Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Unknown element 'key_c' for type Enum8('key_a' = 1, 'key_b' = 2): while executing 'FUNCTION CAST(column_n :: 0, 'Enum8(\'key_a\' = 1, \'key_b\' = 2)' :: 1) -> cast(column_n, 'Enum8(\'key_a\' = 1, \'key_b\' = 2)') Enum8('key_a' = 1, 'key_b' = 2) : 2': (while reading from part /var/lib/clickhouse/data/default/modify_column/all_3_3_0/): While executing MergeTreeThread.
Solution
You should do the following:
Check which mutation is stuck and kill it:
SELECT * FROM system.mutations WHERE table = 'modify_column' AND is_done=0 FORMAT Vertical;
KILL MUTATION WHERE table = 'modify_column' AND mutation_id = 'id_of_stuck_mutation';
Apply a reverting MODIFY COLUMN query to convert the table back to the previous column type:
ALTER TABLE modify_column MODIFY COLUMN column_n String;
Check that the column is accessible now:
SELECT column_n, count() FROM modify_column GROUP BY column_n;
Run the fixed ALTER MODIFY COLUMN query:
ALTER TABLE modify_column MODIFY COLUMN column_n Enum8('key_a'=1, 'key_b'=2, 'key_c'=3);
You can monitor the progress of the column type change with the system.mutations or system.parts_columns tables:
SELECT
command,
parts_to_do,
is_done
FROM system.mutations
WHERE table = 'modify_column'
SELECT
column,
type,
count() AS parts,
sum(rows) AS rows,
sum(bytes_on_disk) AS bytes
FROM system.parts_columns
WHERE (table = 'modify_column') AND (column = 'column_n') AND active
GROUP BY
column,
type
11 - ANSI SQL mode
It’s possible to tune some settings which make ClickHouse more ANSI SQL compatible (and slower):
SET join_use_nulls=1; -- introduced long ago
SET cast_keep_nullable=1; -- introduced in 20.5
SET union_default_mode='DISTINCT'; -- introduced in 21.1
SET allow_experimental_window_functions=1; -- introduced in 21.3
SET prefer_column_name_to_alias=1; -- introduced in 21.4;
SET group_by_use_nulls=1; -- introduced in 22.7;
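A quick illustration of one of these settings (a sketch; cast_keep_nullable controls whether CAST preserves the Nullable wrapper):
SELECT toTypeName(CAST(toNullable(1) AS UInt8)) SETTINGS cast_keep_nullable = 0;
-- UInt8

SELECT toTypeName(CAST(toNullable(1) AS UInt8)) SETTINGS cast_keep_nullable = 1;
-- Nullable(UInt8)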
12 - Async INSERTs
Async INSERTs is a ClickHouse feature that enables batching data automatically and transparently on the server side. Although async inserts work, they still have issues, but they have been improved in the latest versions. We recommend batching at the app/ingestor level, because you will have more control and you decouple this responsibility from ClickHouse. That being said, here are some insights about async inserts you should know:
- Async inserts give acknowledgment immediately after the data got inserted into the buffer (wait_for_async_insert = 0) or, by default, after the data got written to a part after flushing from the buffer (wait_for_async_insert = 1).
- INSERT .. SELECT is NOT async insert.
- Async inserts will do (idempotent) retries.
- Async inserts can collect data for some offline remote clusters: Yandex self-driving cars were collecting the metrics data during the ride into ClickHouse installed on the car computer to a distributed table with Async inserts enabled, which were flushed to the cluster once the car was plugged to the network.
- Async inserts can do batching, so multiple inserts can be squashed as a single insert (but in that case, retries are not idempotent anymore).
- Async inserts can lose your data in case of sudden restart (no fsyncs by default).
- Async inserted data becomes available for selects not immediately after acknowledgment.
- Async inserts generally have more moving parts: there are some background threads monitoring new data to be sent and pushing it out.
- Async inserts require extra monitoring from different system tables (see system.part_log, system.query_log and system.asynchronous_inserts for 22.8). Previously such queries didn’t appear in the query log. Check: #33239.
- It is important to use wait_for_async_insert = 1, because with any error you will lose data without knowing it. For example: your table is read only -> losing data, out of disk space -> losing data, too many parts -> losing data.
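A minimal usage sketch (t is a hypothetical table; async_insert enables the feature and wait_for_async_insert controls when the acknowledgment is sent):
CREATE TABLE t (A UInt64) ENGINE = MergeTree ORDER BY A;

SET async_insert = 1, wait_for_async_insert = 1;

-- Many small inserts like these can be buffered server-side and flushed together as one part
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);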
22.10+ bugfixes/features
- Fixed bug which could lead to deadlock while using asynchronous inserts. See #43233.
- Async insert dedup: Support block deduplication for asynchronous inserts. Before this change, async inserts did not support deduplication, because multiple small inserts coexisted in one inserted batch. See #38075 and #43304.
- Added system table asynchronous_insert_log. It contains information about asynchronous inserts (including results of queries in fire-and-forget mode, with wait_for_async_insert=0) for better introspection. See #42040.
- Support async inserts in clickhouse-client for queries with inlined data. See #34267.
To improve observability / introspection
It is not possible to relate the part_log/query_id column with the asynchronous_insert_log/query_id column. We need to use query_log/query_id: asynchronous_insert_log shows the query_id and flush_query_id of each async insert. The query_id from an async insert shows up in system.query_log as type = 'QueryStart', but the same query_id does not show up in the query_id column of system.part_log, because the query_id column in part_log is the identifier of the INSERT query that created a data part, and it seems it is filled for sync INSERTs but not for async inserts.
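A hedged introspection sketch over system.asynchronous_insert_log (22.10+; column names as I understand them, adjust to your version):
SELECT
    event_time,
    query_id,
    flush_query_id,
    bytes,
    status
FROM system.asynchronous_insert_log
ORDER BY event_time DESC
LIMIT 10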
Metrics
SELECT *
FROM system.metrics
WHERE metric LIKE '%AsyncInsert%'
Query id: 7384b8c8-3d87-4059-b1c4-e9955e97232b
┌─metric───────────────┬─value─┬─description────────────────────────────────────────────────┐
│ PendingAsyncInsert │ 0 │ Number of asynchronous inserts that are waiting for flush. │
│ AsyncInsertCacheSize │ 0 │ Number of async insert hash id in cache │
└──────────────────────┴───────┴────────────────────────────────────────────────────────────┘
2 rows in set. Elapsed: 0.001 sec.
13 - Atomic insert
An insert is atomic if it creates only one part.
An insert will create one part if:
- Data is inserted directly into a MergeTree table
- Data is inserted into a single partition.
- For INSERT FORMAT:
  - Number of rows is less than max_insert_block_size (default is 1048545)
  - Parallel formatting is disabled (for TSV, TSKV, CSV, and JSONEachRow formats the setting input_format_parallel_parsing=0 is set).
- For INSERT SELECT:
  - Number of rows is less than max_block_size
- Smaller blocks are properly squashed up to the configured block size (min_insert_block_size_rows and min_insert_block_size_bytes)
- The MergeTree table doesn’t have Materialized Views (there is no atomicity Table <> MV)
https://github.com/ClickHouse/ClickHouse/issues/9195#issuecomment-587500824 https://github.com/ClickHouse/ClickHouse/issues/5148#issuecomment-487757235
Example of how to make a large insert atomically
Generate test data in Native and TSV format (100 million rows)
Text formats and the Native format require different sets of settings; here I want to find / demonstrate the mandatory minimum of settings for any case.
clickhouse-client -q \
'SELECT toInt64(number) A, toString(number) S FROM numbers(100000000) FORMAT Native' > t.native
clickhouse-client -q \
'SELECT toInt64(number) A, toString(number) S FROM numbers(100000000) FORMAT TSV' > t.tsv
Insert with default settings (not atomic)
DROP TABLE IF EXISTS trg;
CREATE TABLE trg(A Int64, S String) Engine=MergeTree ORDER BY A;
-- Load data in Native format
clickhouse-client -q 'INSERT INTO trg FORMAT Native' <t.native
-- Check how many parts were created
SELECT
count(),
min(rows),
max(rows),
sum(rows)
FROM system.parts
WHERE (level = 0) AND (table = 'trg');
┌─count()─┬─min(rows)─┬─max(rows)─┬─sum(rows)─┐
│ 90 │ 890935 │ 1113585 │ 100000000 │
└─────────┴───────────┴───────────┴───────────┘
-- 90 parts were created - not atomic
DROP TABLE IF EXISTS trg;
CREATE TABLE trg(A Int64, S String) Engine=MergeTree ORDER BY A;
-- Load data in TSV format
clickhouse-client -q 'INSERT INTO trg FORMAT TSV' <t.tsv
-- Check how many parts were created
SELECT
count(),
min(rows),
max(rows),
sum(rows)
FROM system.parts
WHERE (level = 0) AND (table = 'trg');
┌─count()─┬─min(rows)─┬─max(rows)─┬─sum(rows)─┐
│ 85 │ 898207 │ 1449610 │ 100000000 │
└─────────┴───────────┴───────────┴───────────┘
-- 85 parts were created - not atomic
Insert with adjusted settings (atomic)
An atomic insert uses more memory because it needs to hold 100 million rows in memory.
DROP TABLE IF EXISTS trg;
CREATE TABLE trg(A Int64, S String) Engine=MergeTree ORDER BY A;
clickhouse-client --input_format_parallel_parsing=0 \
--min_insert_block_size_bytes=0 \
--min_insert_block_size_rows=1000000000 \
-q 'INSERT INTO trg FORMAT Native' <t.native
-- Check that only one part is created
SELECT
count(),
min(rows),
max(rows),
sum(rows)
FROM system.parts
WHERE (level = 0) AND (table = 'trg');
┌─count()─┬─min(rows)─┬─max(rows)─┬─sum(rows)─┐
│ 1 │ 100000000 │ 100000000 │ 100000000 │
└─────────┴───────────┴───────────┴───────────┘
-- 1 part, success.
DROP TABLE IF EXISTS trg;
CREATE TABLE trg(A Int64, S String) Engine=MergeTree ORDER BY A;
-- Load data in TSV format
clickhouse-client --input_format_parallel_parsing=0 \
--min_insert_block_size_bytes=0 \
--min_insert_block_size_rows=1000000000 \
-q 'INSERT INTO trg FORMAT TSV' <t.tsv
-- Check that only one part is created
SELECT
count(),
min(rows),
max(rows),
sum(rows)
FROM system.parts
WHERE (level = 0) AND (table = 'trg');
┌─count()─┬─min(rows)─┬─max(rows)─┬─sum(rows)─┐
│ 1 │ 100000000 │ 100000000 │ 100000000 │
└─────────┴───────────┴───────────┴───────────┘
-- 1 part, success.
14 - Cumulative Anything
Sample data
CREATE TABLE events
(
`ts` DateTime,
`user_id` UInt32
)
ENGINE = Memory;
INSERT INTO events SELECT
toDateTime('2021-04-29 10:10:10') + toIntervalHour(7 * number) AS ts,
toDayOfWeek(ts) + (number % 2) AS user_id
FROM numbers(15);
Using arrays
WITH
groupArray(_ts) AS ts_arr,
groupArray(state) AS state_arr
SELECT
arrayJoin(ts_arr) AS ts,
arrayReduce('uniqExactMerge', arrayFilter((x, y) -> (y <= ts), state_arr, ts_arr)) AS uniq
FROM
(
SELECT
toStartOfDay(ts) AS _ts,
uniqExactState(user_id) AS state
FROM events
GROUP BY _ts
)
ORDER BY ts ASC
┌──────────────────ts─┬─uniq─┐
│ 2021-04-29 00:00:00 │ 2 │
│ 2021-04-30 00:00:00 │ 3 │
│ 2021-05-01 00:00:00 │ 4 │
│ 2021-05-02 00:00:00 │ 5 │
│ 2021-05-03 00:00:00 │ 7 │
└─────────────────────┴──────┘
WITH arrayJoin(range(toUInt32(_ts) AS int, least(int + toUInt32((3600 * 24) * 5), toUInt32(toDateTime('2021-05-04 00:00:00'))), 3600 * 24)) AS ts_expanded
SELECT
toDateTime(ts_expanded) AS ts,
uniqExactMerge(state) AS uniq
FROM
(
SELECT
toStartOfDay(ts) AS _ts,
uniqExactState(user_id) AS state
FROM events
GROUP BY _ts
)
GROUP BY ts
ORDER BY ts ASC
┌──────────────────ts─┬─uniq─┐
│ 2021-04-29 00:00:00 │ 2 │
│ 2021-04-30 00:00:00 │ 3 │
│ 2021-05-01 00:00:00 │ 4 │
│ 2021-05-02 00:00:00 │ 5 │
│ 2021-05-03 00:00:00 │ 7 │
└─────────────────────┴──────┘
Using window functions (starting from ClickHouse 21.3)
SELECT
ts,
uniqExactMerge(state) OVER (ORDER BY ts ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS uniq
FROM
(
SELECT
toStartOfDay(ts) AS ts,
uniqExactState(user_id) AS state
FROM events
GROUP BY ts
)
ORDER BY ts ASC
┌──────────────────ts─┬─uniq─┐
│ 2021-04-29 00:00:00 │ 2 │
│ 2021-04-30 00:00:00 │ 3 │
│ 2021-05-01 00:00:00 │ 4 │
│ 2021-05-02 00:00:00 │ 5 │
│ 2021-05-03 00:00:00 │ 7 │
└─────────────────────┴──────┘
Using runningAccumulate (incorrect result over blocks)
SELECT
ts,
runningAccumulate(state) AS uniq
FROM
(
SELECT
toStartOfDay(ts) AS ts,
uniqExactState(user_id) AS state
FROM events
GROUP BY ts
ORDER BY ts ASC
)
ORDER BY ts ASC
┌──────────────────ts─┬─uniq─┐
│ 2021-04-29 00:00:00 │ 2 │
│ 2021-04-30 00:00:00 │ 3 │
│ 2021-05-01 00:00:00 │ 4 │
│ 2021-05-02 00:00:00 │ 5 │
│ 2021-05-03 00:00:00 │ 7 │
└─────────────────────┴──────┘
15 - Data types on disk and in RAM
| DataType | RAM size (=byteSize) | Disk Size |
|---|---|---|
| String | string byte length + 9 (string length: 64-bit integer, zero-byte terminator: 1 byte) | string length prefix (varint) + string itself |
| AggregateFunction(count, ...) | varint | |
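The RAM figures can be checked with the byteSize function; for a String it should return the byte length plus 9 bytes of overhead:
SELECT byteSize('abc') AS ram_bytes
-- 12 = 3 bytes of data + 8 bytes length + 1 byte terminator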
See also https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup41/data_processing.pdf (slide 17-22)
16 - DELETE via tombstone column
CREATE TABLE test_delete
(
`key` UInt32,
`ts` UInt32,
`value_a` String,
`value_b` String,
`value_c` String,
`is_active` UInt8 DEFAULT 1
)
ENGINE = MergeTree
ORDER BY