Add/remove a replica to/from a ClickHouse® cluster
ADD nodes/replicas to a ClickHouse® cluster
To add replicas to an existing cluster holding around 30TB of data or less, it is better to use replication:
- Don't add the new replica to remote_servers.xml until the replication is done.
- Add these config files and restart, to limit bandwidth and avoid saturating the network (cap replication at about 70% of total bandwidth); see Core Settings | ClickHouse Docs:
💡 Do the Gbps to B/s math correctly: a 10 Gbps link is 1250 MB/s, i.e. 1,250,000,000 B/s, and change the max_replicated_* settings accordingly:
- Nodes being replicated from (the existing replicas): limit replicated sends.
- Nodes replicating to (the new replicas): limit replicated fetches.
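A minimal sketch of such a config file, assuming the server-level settings max_replicated_sends_network_bandwidth_for_server (source side) and max_replicated_fetches_network_bandwidth_for_server (destination side); the file name and the value (70% of 10 Gbps ≈ 875,000,000 B/s) are illustrative, and each node only needs the setting for its own role:

```xml
<!-- e.g. config.d/replication_bandwidth.xml (hypothetical file name) -->
<clickhouse>
    <!-- On nodes being replicated FROM: cap outgoing replicated sends.
         10 Gbps = 1,250,000,000 B/s; 70% ≈ 875,000,000 B/s. -->
    <max_replicated_sends_network_bandwidth_for_server>875000000</max_replicated_sends_network_bandwidth_for_server>

    <!-- On nodes replicating TO (the new replicas): cap incoming replicated fetches. -->
    <max_replicated_fetches_network_bandwidth_for_server>875000000</max_replicated_fetches_network_bandwidth_for_server>
</clickhouse>
```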
Manual method (DDL)
- Create the tables manually and make sure the macros on all replicas are aligned with the ZooKeeper path. If the ZK path uses {cluster}, this method won't work; the ZK path should only use {shard} and {replica}, or {uuid} (if the databases are Atomic).
This will generate the UUIDs in the CREATE TABLE definition, something like this:
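For illustration only (the database, table, columns, UUID, and ZooKeeper path below are made up), such a definition might look like:

```sql
-- Hypothetical example: the table UUID is embedded in the DDL and reused via the {uuid} macro.
CREATE TABLE my_db.events UUID 'a1b2c3d4-e5f6-47a8-9b0c-1d2e3f4a5b6c'
(
    `event_date` Date,
    `event_id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
ORDER BY event_id;
```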
- Copy both SQL statements to the destination replica and execute them.
Using clickhouse-backup
- Using clickhouse-backup to copy the schema from one replica to another is also convenient, especially when using Atomic databases with the {uuid} macro in the ReplicatedMergeTree engines:
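A possible sketch of that workflow (the backup name and the way the backup is transferred between hosts are assumptions; adapt to your setup):

```bash
# On the source replica: create a schema-only backup (no data).
clickhouse-backup create --schema schema_only

# Transfer the backup to the new replica, e.g. by copying the backup directory,
# or via `clickhouse-backup upload` / `clickhouse-backup download` if remote storage is configured.

# On the new replica: restore only the schema.
clickhouse-backup restore --schema schema_only
```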
Using the Altinity operator
If there is at least one alive replica in the shard, you can remove the PVCs and STS (StatefulSet) for the affected nodes and trigger a reconciliation. The operator will try to copy the schema from the other replicas.
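A rough sketch with kubectl (the namespace, CHI name, and resource names are hypothetical; they depend on how the operator named your StatefulSets and PVCs):

```bash
# Delete the StatefulSet and PVC of the affected replica (hypothetical names).
kubectl -n clickhouse delete statefulset chi-demo-cluster-0-1
kubectl -n clickhouse delete pvc data-volume-chi-demo-cluster-0-1-0

# Trigger a reconciliation, e.g. by re-applying the ClickHouseInstallation manifest;
# the operator should recreate the pod/volume and copy the schema from a live replica.
kubectl -n clickhouse apply -f my-clickhouse-installation.yaml
```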
Check that the schema migration was successful and the node is replicating
- To check that the schema migration has been successful, query system.replicas:
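For example, a query along these lines (the thresholds are arbitrary) should return no rows on a healthy replica:

```sql
-- Replicas that are read-only, have lost the ZK session, or have a suspiciously
-- large delay or queue deserve a closer look.
SELECT database, table, is_readonly, is_session_expired, absolute_delay, queue_size, active_replicas, total_replicas
FROM system.replicas
WHERE is_readonly
   OR is_session_expired
   OR absolute_delay > 30
   OR queue_size > 100;
```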
Check how the replication process is performing using https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-replication-queue/
- If there are many postponed tasks whose postpone_reason says the maximum number of simultaneous fetches is already in use, that is OK: all replication slots are busy. Exceptions are not OK and should be investigated.
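A query sketch to summarize the queue state and the postpone reasons (columns are from system.replication_queue):

```sql
-- How many tasks of each type are postponed, why, and whether any are failing with exceptions.
SELECT
    type,
    count() AS tasks,
    countIf(postpone_reason != '') AS postponed,
    anyIf(postpone_reason, postpone_reason != '') AS sample_postpone_reason,
    countIf(last_exception != '') AS with_exceptions
FROM system.replication_queue
GROUP BY type;
```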
If the migration was successful and replication is working, wait until the replication is finished. It may take some days depending on how much data is being replicated. After this, edit the cluster configuration XML file on all replicas (remote_servers.xml) and add the new replica to the cluster.
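A sketch of what the updated remote_servers.xml could look like (the cluster name, host names, and port are placeholders):

```xml
<clickhouse>
    <remote_servers>
        <my_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-01</host>
                    <port>9000</port>
                </replica>
                <!-- The newly added replica -->
                <replica>
                    <host>clickhouse-03</host>
                    <port>9000</port>
                </replica>
            </shard>
        </my_cluster>
    </remote_servers>
</clickhouse>
```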
Possible problems
Exception REPLICA_ALREADY_EXISTS
The DDLs have been executed, some tables were created and later dropped, but some leftovers remain in ZooKeeper:
- If the databases can be dropped, use DROP DATABASE xxxxx SYNC
- If the databases cannot be dropped, use SYSTEM DROP REPLICA 'replica_name' FROM TABLE db.table
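SYSTEM DROP REPLICA also accepts broader scopes; a sketch of the variants (the names and path are placeholders), to be run from a different, healthy replica:

```sql
-- Remove the leftover replica metadata in ZooKeeper for a single table ...
SYSTEM DROP REPLICA 'replica_name' FROM TABLE my_db.my_table;
-- ... for a whole database ...
SYSTEM DROP REPLICA 'replica_name' FROM DATABASE my_db;
-- ... or directly by ZooKeeper path.
SYSTEM DROP REPLICA 'replica_name' FROM ZKPATH '/clickhouse/tables/s1/my_db/my_table';
```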
Exception TABLE_ALREADY_EXISTS
Tables have not been dropped correctly:
- If the databases can be dropped, use DROP DATABASE xxxxx SYNC
- If the databases cannot be dropped, use:
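The original snippet is missing from this copy; a sketch of the per-table cleanup that fits this situation (table and replica names are placeholders):

```sql
-- Drop the offending table synchronously so its ZooKeeper metadata is removed as well.
DROP TABLE IF EXISTS my_db.my_table SYNC;
-- If the table cannot be dropped locally, remove its leftover replica metadata
-- from ZooKeeper, running this on another replica:
-- SYSTEM DROP REPLICA 'replica_name' FROM TABLE my_db.my_table;
```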
Tuning
- Sometimes replication goes very fast, and if you have tiered hot/cold storage you could run out of space, so in that case it is interesting to (see the sketch below):
  - reduce fetches from 8 to 4
  - increase moves from 8 to 16
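Assuming these map to the background pool settings background_fetches_pool_size and background_move_pool_size (an assumption; depending on the ClickHouse version these live in config.xml or in the default profile), a sketch:

```xml
<!-- e.g. config.d/replication_tuning.xml (hypothetical file name) -->
<clickhouse>
    <!-- Fewer concurrent fetches: slows replication down a bit ... -->
    <background_fetches_pool_size>4</background_fetches_pool_size>
    <!-- ... more concurrent moves: evacuates the hot tier to cold storage faster. -->
    <background_move_pool_size>16</background_move_pool_size>
</clickhouse>
```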
- Also monitor this (a query sketch is shown after this list).
- There are new tables in v23, system.replicated_fetches and system.moves; check them out for more info.
- If needed, just stop replication using SYSTEM STOP FETCHES on the replicating nodes.
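The monitoring query itself is not included in this copy; a possible sketch that watches the in-flight fetches and the remaining queue:

```sql
-- In-flight fetches on this replica: which parts are being downloaded and how far along they are.
SELECT database, table, result_part_name, round(100 * progress, 1) AS pct, total_size_bytes_compressed
FROM system.replicated_fetches;

-- Remaining queue size per table, to see how much work is still pending.
SELECT database, table, count() AS queued
FROM system.replication_queue
GROUP BY database, table
ORDER BY queued DESC;
```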
REMOVE nodes/replicas from a cluster
- It is important to know which replica/node you want to remove to avoid problems. To check this, connect to the replica/node you want to remove and run:
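The original query is not shown here; one way to confirm which replica you are connected to uses standard functions and tables:

```sql
-- Which host this session is connected to and which shard/replica macros it identifies as.
SELECT hostName() AS host, getMacro('shard') AS shard, getMacro('replica') AS replica;

-- Or simply inspect all configured macros:
SELECT * FROM system.macros;
```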
- After that, connect to a replica different from the one we want to remove (arg_tg01) and execute:
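The exact statement is not included in this copy; based on the DROP REPLICA syntax it would be along these lines (arg_tg01 is the replica being removed, as identified above):

```sql
-- Remove the replica's metadata from ZooKeeper/Keeper for all replicated tables.
SYSTEM DROP REPLICA 'arg_tg01';
-- Or limit the scope if needed:
-- SYSTEM DROP REPLICA 'arg_tg01' FROM DATABASE my_db;
-- SYSTEM DROP REPLICA 'arg_tg01' FROM TABLE my_db.my_table;
```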
- This cannot be executed on the replica we want to remove (it will refuse to drop the local replica); use DROP TABLE/DATABASE for that. DROP REPLICA does not drop any tables and does not remove any data or metadata from disk.
- After DROP REPLICA, we need to check that the replica is gone from the list of replicas. Connect to a node and execute:
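The original query is not included here; one way to verify this is to look at a table's replicas path in ZooKeeper (the path below is a placeholder, take the real one from zookeeper_path in system.replicas):

```sql
-- Replicas still registered in ZooKeeper for one of the replicated tables.
-- 'arg_tg01' should no longer be listed.
SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/s1/my_db/my_table/replicas';
```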
- Delete the replica from the cluster configuration (remote_servers.xml) and shut down the removed node/replica.