Data Migration

Data Migration

Export & Import into common data formats

Pros and cons: (plus) Data can be inserted into any DBMS. (minus) Decoding & encoding of common data formats may be slower / require more CPU (minus) The data size is usually bigger than ClickHouse formats. (minus) Some of the common data formats have limitations.

remote/remoteSecure or cluster/Distributed table

Pros and cons: (plus) Simple to run. (plus) It’s possible to change the schema and distribution of data between shards. (plus) It’s possible to copy only some subset of data (plus) Needs only access to ClickHouse TCP port. (minus) Uses CPU / RAM (mostly on the receiver side)

See details in:

remote-table-function.md

clickhouse-copier

Pros and cons:

(plus) Possible to do some changes schema. (plus) Needs only access to ClickHouse TCP port. (plus) It’s possible to change the distribution of data between shards. (plus) Suitable for large clusters: many clickhouse-copier can execute the same task together. (minus) May create an inconsistent result if source cluster data is changing during the process (minus) Hard to setup. (minus) Requires zookeeper. (minus) Uses CPU / RAM (mostly on the clickhouse-copier and receiver side)

See details in:

altinity-kb-clickhouse-copier

Manual parts moving: freeze / rsync / attach

Pros and cons: (plus) Low CPU / RAM usage. (minus) Table schema should be the same. (minus) A lot of manual operations/scripting.

See details in:

rsync.md

clickhouse-backup

Pros and cons: (plus) Low CPU / RAM usage. (plus) Suitable to recover both schema & data for all tables at once. (minus) Table schema should be the same.

Just create the backup on server 1, upload it to server 2, and restore the backup.

See https://github.com/AlexAkulov/clickhouse-backup

https://altinity.com/blog/introduction-to-clickhouse-backups-and-clickhouse-backup

Fetch from zookeeper path

Pros and cons: (plus) Low CPU / RAM usage (minus) Table schema should be the same. (minus) Works only when the source and the destination clickhouse servers share the same zookeeper (without chroot) (minus) Needs to access zookeeper and ClickHouse replication ports: (interserver_http_port or interserver_https_port)

ALTER TABLE table_name FETCH PARTITION partition_expr FROM 'path-in-zookeeper'

Replication protocol

Just make one more replica in another place.

Pros and cons: (plus) Simple to setup (plus) Data is consistent all the time automatically. (plus) Low CPU and network usage. (minus) Needs to reach both zookeeper client (2181) and ClickHouse replication ports: (interserver_http_port or interserver_https_port) (minus) In case of cluster migration, zookeeper need’s to be migrated too. (minus) Replication works both ways.

../altinity-kb-zookeeper/altinity-kb-zookeeper-cluster-migration.md

See also

Github issues

https://github.com/ClickHouse/ClickHouse/issues/10943 https://github.com/ClickHouse/ClickHouse/issues/20219 https://github.com/ClickHouse/ClickHouse/pull/17871

https://habr.com/ru/company/avito/blog/500678/


clickhouse-copier

clickhouse-copier

Remote table function

Remote table function

rsync

rsync

Last modified 2021.08.24 : Code format corrections. (6fe6a9e5)