Successful ClickHouse deployment plan

Stage 0. Build POC

  1. Install a single-node ClickHouse.
  2. Start by creating a single table (the biggest one) using the MergeTree engine. Create ‘some’ schema (most probably it will be far from optimal). Prefer a denormalized approach for all immutable dimensions; for mutable dimensions, consider dictionaries (see the schema and dictionary sketches after this list).
  3. Load some amount of data (at least 5 GB and 10 million rows) - preferably real data, or as close to real as possible. Usually the simplest options are CSV / TSV files, or INSERT INTO clickhouse_table SELECT * FROM mysql(...) WHERE ... (see the loading sketch after this list).
  4. Create several representative queries.
  5. Check the column cardinality and pick appropriate types; use the minimal type that fits the data.
  6. Review the PARTITION BY and ORDER BY keys: https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/
  7. Create the schema(s) with better / more promising ORDER BY / partitioning, load the data in, and pick the best schema.
  8. Consider different improvements for particular columns (codecs / better data types etc.): https://kb.altinity.com/altinity-kb-schema-design/codecs/altinity-kb-how-to-test-different-compression-codecs/
  9. If the performance of certain queries is not sufficient, consider using PREWHERE / skipping indexes (see the sketch after this list).
  10. Repeat steps 2-9 for the next big table(s). Avoid scenarios where you need to join big tables.
  11. Pick the client library for your programming language (the most mature are Python / Golang / Java / C++), and build a pipeline: inserts (low QPS, lots of rows in a single insert, check acknowledgements and retry the same block on failures), ETLs if needed, and some reporting layer (https://kb.altinity.com/altinity-kb-integrations/bi-tools/).
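
A minimal schema sketch for steps 2, 5, 6 and 8 above. The table and column names are hypothetical; the point is the explicit PARTITION BY / ORDER BY, the minimal integer types, LowCardinality for low-cardinality strings, and per-column codecs:

```sql
-- Hypothetical web-analytics-style table; names and types are
-- illustrative, not a recommendation for your data.
CREATE TABLE events
(
    event_date  Date,
    event_time  DateTime CODEC(Delta, ZSTD),   -- mostly monotonic -> Delta helps
    user_id     UInt64,
    event_type  LowCardinality(String),        -- few distinct values
    country     LowCardinality(String),
    duration_ms UInt32 CODEC(T64, ZSTD),       -- small ints -> T64 helps
    url         String CODEC(ZSTD(3))
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)              -- roughly monthly partitions
ORDER BY (event_type, user_id, event_time);    -- should match your filters
```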
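
For step 3, the two simplest loading paths look roughly like this (host, credentials and the source table are placeholders):

```sql
-- From a CSV / TSV file, using clickhouse-client:
--   clickhouse-client --query "INSERT INTO events FORMAT CSVWithNames" < events.csv

-- Or directly from an existing MySQL database via the mysql() table function:
INSERT INTO events
SELECT *
FROM mysql('mysql-host:3306', 'source_db', 'events', 'mysql_user', 'mysql_password')
WHERE event_date >= '2024-01-01';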
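
For mutable dimensions (step 2), a dictionary that is periodically refreshed from the source database is often a good fit. A sketch with placeholder connection details:

```sql
CREATE DICTIONARY country_names
(
    code String,
    name String
)
PRIMARY KEY code
SOURCE(MYSQL(host 'mysql-host' port 3306 user 'mysql_user'
             password 'mysql_password' db 'source_db' table 'countries'))
LAYOUT(COMPLEX_KEY_HASHED())   -- complex_key_* layouts allow String keys
LIFETIME(MIN 300 MAX 600);     -- refresh every 5-10 minutes

-- Resolve the mutable dimension at query time instead of storing it:
SELECT dictGet('country_names', 'name', tuple(country)) AS country_name, count()
FROM events
GROUP BY country_name;
```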
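
For step 9, a sketch of both techniques against the hypothetical table above:

```sql
-- PREWHERE reads only the filter column first, then the remaining
-- columns for rows that passed; useful for highly selective filters:
SELECT user_id, count()
FROM events
PREWHERE country = 'DE'
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY user_id;

-- A bloom-filter skipping index can speed up point lookups on a column
-- that is not part of the ORDER BY key:
ALTER TABLE events ADD INDEX url_idx url TYPE bloom_filter GRANULARITY 4;
ALTER TABLE events MATERIALIZE INDEX url_idx;  -- build it for existing parts
```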

Stage 1. Planning the production setup

  1. Collect more data / estimate the insert speed; estimate the column sizes per day / month (see the sizing queries after this list).
  2. Measure the speed of queries.
  3. Consider improvements using materialized views / projections / dictionaries (see the sketch after this list).
  4. Collect requirements (HA / number of simultaneous queries / insert pressure / ‘exactly once’ delivery etc.).
  5. Do a cluster sizing estimation and plan the hardware.
  6. Plan the network; if needed, consider using load balancers etc.
  7. If you need sharding, consider different sharding approaches.
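
For step 1, the system tables already collect what you need. Two queries that are usually enough for a first estimate (the table name is a placeholder):

```sql
-- Compressed / uncompressed size per column:
SELECT
    column,
    formatReadableSize(sum(column_data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed
FROM system.parts_columns
WHERE table = 'events' AND active
GROUP BY column
ORDER BY sum(column_data_compressed_bytes) DESC;

-- Size and row count per partition (per month with the schema above):
SELECT
    partition,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    sum(rows)                              AS rows
FROM system.parts
WHERE table = 'events' AND active
GROUP BY partition
ORDER BY partition;
```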
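
For step 3, a materialized view maintaining a daily aggregate alongside the raw table (names are illustrative):

```sql
CREATE TABLE events_daily
(
    event_date Date,
    event_type LowCardinality(String),
    cnt        UInt64
)
ENGINE = SummingMergeTree        -- rows with the same key are summed on merge
ORDER BY (event_date, event_type);

-- The MV acts as an insert trigger: each block inserted into `events`
-- is aggregated and written into `events_daily`.
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT event_date, event_type, count() AS cnt
FROM events
GROUP BY event_date, event_type;

-- Query the aggregate; sum(cnt) also covers parts not merged yet:
SELECT event_date, event_type, sum(cnt) AS cnt
FROM events_daily
GROUP BY event_date, event_type;
```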

Stage 2. Preprod setup & development

  1. Install ClickHouse as a cluster: several nodes / VMs plus ZooKeeper (or ClickHouse Keeper).
  2. Create a good config and automate config / OS management / restarts (Ansible / Puppet etc.).
  3. Set up monitoring / log processing / alerts etc.
  4. Set up users (see the access-control sketch after this list).
  5. Think of schema management. Deploy the schema.
  6. Design backup / failover strategies (see the backup sketch after this list).
  7. Develop pipelines / queries, create a test suite, set up CI/CD.
  8. Run benchmark / stress tests.
  9. Test configuration changes / server restarts / failovers / version upgrades.
  10. Review the security topics (TLS, limits / restrictions, network, passwords).
  11. Document the solution for operations
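
For step 4, SQL-driven access control is usually easier to automate than users.xml. A sketch (user, database and limits are placeholders; the bootstrap user needs access_management enabled):

```sql
CREATE USER reporting IDENTIFIED WITH sha256_password BY 'change-me'
SETTINGS PROFILE 'readonly';

GRANT SELECT ON analytics.* TO reporting;

-- Cap the load a single user can generate:
CREATE QUOTA reporting_quota
    FOR INTERVAL 1 hour MAX queries = 1000
    TO reporting;
```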
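
For step 6, recent ClickHouse versions (roughly 22.8 and later) ship a native BACKUP / RESTORE command; it needs a backup destination (here a hypothetical 'backups' disk) configured in the server config. Older versions typically rely on clickhouse-backup or filesystem snapshots:

```sql
BACKUP TABLE analytics.events TO Disk('backups', 'events-2024-01-01.zip');
-- and, on the failover / restore path:
RESTORE TABLE analytics.events FROM Disk('backups', 'events-2024-01-01.zip');
```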

Stage 3. Production setup

  1. Deploy the production setup (consider also canary / blue-green deployments etc.).
  2. Schedule ClickHouse upgrades every 6 to 12 months (if possible).