Insert Deduplication / Insert idempotency

Insert Deduplication / Insert idempotency , insert_deduplicate setting.

Insert Deduplication

Replicated tables have a special feature insert deduplication (enabled by default).

Documentation: Data blocks are deduplicated. For multiple writes of the same data block (data blocks of the same size containing the same rows in the same order), the block is only written once. The reason for this is in case of network failures when the client application does not know if the data was written to the DB, so the INSERT query can simply be repeated. It does not matter which replica INSERTs were sent to with identical data. INSERTs are idempotent. Deduplication parameters are controlled by merge_tree server settings.

Example

create table test_insert ( A Int64 ) 
Engine=ReplicatedMergeTree('/clickhouse/cluster_test/tables/{table}','{replica}') 
order by A;
 
insert into test_insert values(1);
insert into test_insert values(1);
insert into test_insert values(1);
insert into test_insert values(1);

select * from test_insert;
┌─A─┐
 1                                        -- only one row has been inserted, the other rows were deduplicated
└───┘

alter table test_insert delete where 1;    -- that single row was removed

insert into test_insert values(1);

select * from test_insert;
0 rows in set. Elapsed: 0.001 sec.         -- the last insert was deduplicated again, 
                                           -- because `alter ... delete` does not clear deduplication checksums
                                           -- only `alter table drop partition` and `truncate` clear checksums

In clickhouse-server.log you may see trace messages Block with ID ... already exists locally as part ... ignoring it

# cat /var/log/clickhouse-server/clickhouse-server.log|grep test_insert|grep Block
..17:52:45.064974.. Block with ID all_7615936253566048997_747463735222236827 already exists locally as part all_0_0_0; ignoring it.
..17:52:45.068979.. Block with ID all_7615936253566048997_747463735222236827 already exists locally as part all_0_0_0; ignoring it.
..17:52:45.072883.. Block with ID all_7615936253566048997_747463735222236827 already exists locally as part all_0_0_0; ignoring it.
..17:52:45.076738.. Block with ID all_7615936253566048997_747463735222236827 already exists locally as part all_0_0_0; ignoring it.

Deduplication checksums are stored in Zookeeper in /blocks table’s znode for each partition separately, so when you drop partition, they could be identified and removed for this partition. (during alter table delete it’s impossible to match checksums, that’s why checksums stay in Zookeeper).

SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/cluster_test/tables/test_insert/blocks'
┌─name───────────────────────────────────────┬─value─────┐
 all_7615936253566048997_747463735222236827  all_0_0_0 
└────────────────────────────────────────────┴───────────┘

insert_deduplicate setting

Insert deduplication is controled by the insert_deduplicate setting

Let’s disable it:

set insert_deduplicate = 0;              -- insert_deduplicate is now disabled in this session

insert into test_insert values(1);
insert into test_insert values(1);
insert into test_insert values(1);

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┐
 1 
 1 
 1                                    -- all 3 insterted rows are in the table
└───┘

alter table test_insert delete where 1;

insert into test_insert values(1);
insert into test_insert values(1);

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┐
 1 
 1 
└───┘

Insert deduplication is a user-level setting, it can be disabled in a session or in a user’s profile (insert_deduplicate=0).

clickhouse-client --insert_deduplicate=0 ....

How to disable insert_deduplicate by default for all queries:

# cat /etc/clickhouse-server/users.d/insert_deduplicate.xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <insert_deduplicate>0</insert_deduplicate>
        </default>
    </profile
</yandex>    

Other related settings: replicated_deduplication_window, replicated_deduplication_window_seconds, insert_deduplication_token.

More info: https://github.com/ClickHouse/ClickHouse/issues/16037 https://github.com/ClickHouse/ClickHouse/issues/3322

Non-replicated MergeTree tables

By default insert deduplication is disabled for non-replicated tables (for backward compatibility).

It can be enabled by the merge_tree setting non_replicated_deduplication_window.

Example:

create table test_insert ( A Int64 ) 
Engine=MergeTree 
order by A
settings non_replicated_deduplication_window = 100;          -- 100 - how many latest checksums to store
 
insert into test_insert values(1);
insert into test_insert values(1);
insert into test_insert values(1);

insert into test_insert values(2);
insert into test_insert values(2);

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┐
 2 
 1 
└───┘

In case of non-replicated tables deduplication checksums are stored in files in the table’s folder:

cat /var/lib/clickhouse/data/default/test_insert/deduplication_logs/deduplication_log_1.txt
1	all_1_1_0	all_7615936253566048997_747463735222236827
1	all_4_4_0	all_636943575226146954_4277555262323907666

Checksums calculation

Checksums are calculated not from the inserted data but from formed parts.

Insert data is separated to parts by table’s partitioning.

Parts contain rows sorted by the table’s order by and all values of functions (i.e. now()) or Default/Materialized columns are expanded.

Example with partial insertion because of partitioning:

create table test_insert ( A Int64, B Int64 ) 
Engine=MergeTree 
partition by B 
order by A
settings non_replicated_deduplication_window = 100;  


insert into test_insert values (1,1);
insert into test_insert values (1,1)(1,2);

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┬─B─┐
 1  1 
 1  2                                    -- the second insert was skipped for only one partition!!!
└───┴───┘

Example with deduplication despite the rows order:

drop table test_insert;

create table test_insert ( A Int64, B Int64 ) 
Engine=MergeTree 
order by (A, B)
settings non_replicated_deduplication_window = 100;  

insert into test_insert values (1,1)(1,2);
insert into test_insert values (1,2)(1,1);  -- the order of rows is not equal with the first insert

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┬─B─┐
 1  1 
 1  2 
└───┴───┘
2 rows in set. Elapsed: 0.001 sec.          -- the second insert was skipped despite the rows order

Example to demonstrate how Default/Materialize columns are expanded:

drop table test_insert;

create table test_insert ( A Int64, B Int64 Default rand() ) 
Engine=MergeTree 
order by A
settings non_replicated_deduplication_window = 100;

insert into test_insert(A) values (1);                 -- B calculated as  rand()
insert into test_insert(A) values (1);                 -- B calculated as  rand()

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┬──────────B─┐
 1  3467561058 
 1  3981927391 
└───┴────────────┘

insert into test_insert(A, B) values (1, 3467561058);  -- B is not calculated / will be deduplicated

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┬──────────B─┐
 1  3981927391 
 1  3467561058 
└───┴────────────┘

Example to demonstrate how functions are expanded:

drop table test_insert;
create table test_insert ( A Int64, B DateTime64 ) 
Engine=MergeTree 
order by A
settings non_replicated_deduplication_window = 100;

insert into test_insert values (1, now64());
....
insert into test_insert values (1, now64());

select * from test_insert format PrettyCompactMonoBlock;
┌─A─┬───────────────────────B─┐
 1  2022-01-31 15:43:45.364 
 1  2022-01-31 15:43:41.944 
└───┴─────────────────────────┘

insert_deduplication_token

Since Clikhouse 22.2 there is a new setting insert_dedupplication_token. This setting allows you to define an explicit token that will be used for deduplication instead of calculating a checksum from the inserted data.

CREATE TABLE test_table
( A Int64 )
ENGINE = MergeTree
ORDER BY A
SETTINGS non_replicated_deduplication_window = 100;

INSERT INTO test_table SETTINGS insert_deduplication_token = 'test' VALUES (1);

-- the next insert won't be deduplicated because insert_deduplication_token is different
INSERT INTO test_table SETTINGS insert_deduplication_token = 'test1' VALUES (1);

-- the next insert will be deduplicated because insert_deduplication_token 
-- is the same as one of the previous
INSERT INTO test_table SETTINGS insert_deduplication_token = 'test' VALUES (2);
SELECT * FROM test_table
┌─A─┐
 1 
└───┘
┌─A─┐
 1 
└───┘