Successful ClickHouse deployment plan

Stage 0. Build POC

  1. Install a single-node ClickHouse.
  2. Start by creating a single table (the biggest one) using the MergeTree engine. Create ‘some’ schema (most probably it will be far from optimal). Prefer a denormalized approach for all immutable dimensions; for mutable dimensions, consider dictionaries (see the schema and dictionary sketches after this list).
  3. Load some amount of data (at least 5 GB and 10 million rows) - preferably real data, or as close to real as possible. Usually the simplest options are CSV / TSV files, or INSERT INTO clickhouse_table SELECT * FROM mysql(...) WHERE ... (see the loading sketch after this list).
  4. Create several representative queries.
  5. Check the column cardinality and pick appropriate types; use the minimal type that fits the data.
  6. Review the PARTITION BY and ORDER BY keys: https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/
  7. Create the schema(s) with better / more promising ORDER BY / partitioning, load the data in, and pick the best schema.
  8. Consider different improvements for particular columns (codecs / better data types etc.): https://kb.altinity.com/altinity-kb-schema-design/codecs/altinity-kb-how-to-test-different-compression-codecs/
  9. If the performance of certain queries is not sufficient, consider using PREWHERE / skipping indexes (see the sketch after this list).
  10. Repeat steps 2-9 for the next big table(s). Avoid scenarios where you need to join big tables.
  11. Pick the client library for your programming language (the most mature are Python / Golang / Java / C++), and build a pipeline: inserts (low QPS, lots of rows in a single insert, check acknowledgements and retry the same block on failures), ETLs if needed, and some reporting layer (https://kb.altinity.com/altinity-kb-integrations/bi-tools/).
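
A minimal schema sketch for steps 2, 5, 6 and 8 above. The table and column names are hypothetical; the point is the explicit PARTITION BY / ORDER BY, the minimal integer types, LowCardinality for low-cardinality strings, and per-column codecs:

```sql
-- Hypothetical web-analytics-style table; names and types are
-- illustrative, not a recommendation for your data.
CREATE TABLE events
(
    event_date  Date,
    event_time  DateTime CODEC(Delta, ZSTD),   -- mostly monotonic -> Delta helps
    user_id     UInt64,
    event_type  LowCardinality(String),        -- few distinct values
    country     LowCardinality(String),
    duration_ms UInt32 CODEC(T64, ZSTD),       -- small ints -> T64 helps
    url         String CODEC(ZSTD(3))
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)              -- roughly monthly partitions
ORDER BY (event_type, user_id, event_time);    -- should match your filters
```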
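
For step 3, the two simplest loading paths look roughly like this (host, credentials and the source table are placeholders):

```sql
-- From a CSV / TSV file, using clickhouse-client:
--   clickhouse-client --query "INSERT INTO events FORMAT CSVWithNames" < events.csv

-- Or directly from an existing MySQL database via the mysql() table function:
INSERT INTO events
SELECT *
FROM mysql('mysql-host:3306', 'source_db', 'events', 'mysql_user', 'mysql_password')
WHERE event_date >= '2024-01-01';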
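
For mutable dimensions (step 2), a dictionary that is periodically refreshed from the source database is often a good fit. A sketch with placeholder connection details:

```sql
CREATE DICTIONARY country_names
(
    code String,
    name String
)
PRIMARY KEY code
SOURCE(MYSQL(host 'mysql-host' port 3306 user 'mysql_user'
             password 'mysql_password' db 'source_db' table 'countries'))
LAYOUT(COMPLEX_KEY_HASHED())   -- complex_key_* layouts allow String keys
LIFETIME(MIN 300 MAX 600);     -- refresh every 5-10 minutes

-- Resolve the mutable dimension at query time instead of storing it:
SELECT dictGet('country_names', 'name', tuple(country)) AS country_name, count()
FROM events
GROUP BY country_name;
```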
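
For step 9, a sketch of both techniques against the hypothetical table above:

```sql
-- PREWHERE reads only the filter column first, then the remaining
-- columns for rows that passed; useful for highly selective filters:
SELECT user_id, count()
FROM events
PREWHERE country = 'DE'
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY user_id;

-- A bloom-filter skipping index can speed up point lookups on a column
-- that is not part of the ORDER BY key:
ALTER TABLE events ADD INDEX url_idx url TYPE bloom_filter GRANULARITY 4;
ALTER TABLE events MATERIALIZE INDEX url_idx;  -- build it for existing parts
```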

Stage 1. Planning the production setup

  1. Collect more data / estimate the insert speed; estimate the column sizes per day / month (see the sizing queries after this list).
  2. Measure the speed of queries.
  3. Consider improvements using materialized views / projections / dictionaries (see the sketch after this list).
  4. Collect requirements (HA / number of simultaneous queries / insert pressure / ‘exactly once’ delivery etc.).
  5. Do a cluster sizing estimation and plan the hardware.
  6. Plan the network; if needed, consider using load balancers etc.
  7. If you need sharding, consider different sharding approaches.
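
For step 1, the system tables already collect what you need. Two queries that are usually enough for a first estimate (the table name is a placeholder):

```sql
-- Compressed / uncompressed size per column:
SELECT
    column,
    formatReadableSize(sum(column_data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed
FROM system.parts_columns
WHERE table = 'events' AND active
GROUP BY column
ORDER BY sum(column_data_compressed_bytes) DESC;

-- Size and row count per partition (per month with the schema above):
SELECT
    partition,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    sum(rows)                              AS rows
FROM system.parts
WHERE table = 'events' AND active
GROUP BY partition
ORDER BY partition;
```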
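
For step 3, a materialized view maintaining a daily aggregate alongside the raw table (names are illustrative):

```sql
CREATE TABLE events_daily
(
    event_date Date,
    event_type LowCardinality(String),
    cnt        UInt64
)
ENGINE = SummingMergeTree        -- rows with the same key are summed on merge
ORDER BY (event_date, event_type);

-- The MV acts as an insert trigger: each block inserted into `events`
-- is aggregated and written into `events_daily`.
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT event_date, event_type, count() AS cnt
FROM events
GROUP BY event_date, event_type;

-- Query the aggregate; sum(cnt) also covers parts not merged yet:
SELECT event_date, event_type, sum(cnt) AS cnt
FROM events_daily
GROUP BY event_date, event_type;
```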

Stage 2. Preprod setup & development

  1. Install ClickHouse as a cluster: several nodes / VMs plus ZooKeeper (or ClickHouse Keeper).
  2. Create a good config and automate config / OS management / restarts (Ansible / Puppet etc.).
  3. Set up monitoring / log processing / alerts etc.
  4. Set up users (see the access-control sketch after this list).
  5. Think of schema management. Deploy the schema.
  6. Design backup / failover strategies (see the backup sketch after this list).
  7. Develop pipelines / queries, create a test suite, set up CI/CD.
  8. Run benchmark / stress tests.
  9. Test configuration changes / server restarts / failovers / version upgrades.
  10. Review the security topics (TLS, limits / restrictions, network, passwords).
  11. Document the solution for operations
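
For step 4, SQL-driven access control is usually easier to automate than users.xml. A sketch (user, database and limits are placeholders; the bootstrap user needs access_management enabled):

```sql
CREATE USER reporting IDENTIFIED WITH sha256_password BY 'change-me'
SETTINGS PROFILE 'readonly';

GRANT SELECT ON analytics.* TO reporting;

-- Cap the load a single user can generate:
CREATE QUOTA reporting_quota
    FOR INTERVAL 1 hour MAX queries = 1000
    TO reporting;
```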
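
For step 6, recent ClickHouse versions (roughly 22.8 and later) ship a native BACKUP / RESTORE command; it needs a backup destination (here a hypothetical 'backups' disk) configured in the server config. Older versions typically rely on clickhouse-backup or filesystem snapshots:

```sql
BACKUP TABLE analytics.events TO Disk('backups', 'events-2024-01-01.zip');
-- and, on the failover / restore path:
RESTORE TABLE analytics.events FROM Disk('backups', 'events-2024-01-01.zip');
```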

Stage 3. Production setup

  1. Deploy the production setup (consider also canary / blue-green deployments etc.).
  2. Schedule ClickHouse upgrades every 6 to 12 months (if possible).