Network Configuration

Networking And Server Room Planning

The network used for your ClickHouse cluster should be fast, ideally 10 Gbit/s or more. ClickHouse nodes generate a lot of traffic to exchange data between nodes (port 9009 for replication, and port 9000 for distributed queries). ZooKeeper traffic is moderate in normal circumstances, but in some special cases it can also be very significant.

For ZooKeeper, low latency is more important than bandwidth.
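To get a quick read on ZooKeeper latency from a ClickHouse node, one option is to query ZooKeeper's mntr four-letter-word command, which reports zk_avg_latency and zk_max_latency. Below is a minimal Python sketch; it assumes port 2181 is reachable, that mntr is whitelisted (4lw.commands.whitelist) on newer ZooKeeper versions, and zoo1.example.com is a placeholder hostname.

```python
import socket
import time

def zk_mntr(host: str, port: int = 2181, timeout: float = 5.0) -> dict:
    """Send the ZooKeeper 'mntr' four-letter command and parse the reply.

    Note: recent ZooKeeper versions require 'mntr' to be enabled via
    4lw.commands.whitelist in zoo.cfg.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    rtt_ms = (time.monotonic() - start) * 1000
    # mntr replies with "key<TAB>value" lines.
    stats = {}
    for line in b"".join(chunks).decode().splitlines():
        key, _, value = line.partition("\t")
        if key:
            stats[key] = value
    stats["request_rtt_ms"] = f"{rtt_ms:.1f}"
    return stats

if __name__ == "__main__":
    # zoo1.example.com is a placeholder; use one of your ensemble members.
    stats = zk_mntr("zoo1.example.com")
    for key in ("zk_avg_latency", "zk_max_latency", "request_rtt_ms"):
        print(key, stats.get(key))
```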

Keep the replicas isolated at the hardware level, so the cluster can fail over during outages.

  • For Physical Environments: Avoid placing two ClickHouse replicas on the same server rack. Ideally, they should be on isolated network switches and isolated power supplies.
  • For Cloud Environments: Use different availability zones for the ClickHouse replicas when possible (but be aware of the cross-zone traffic costs).

The same considerations apply to the ZooKeeper nodes.

For example:

Rack      Server          Server          Server          Server
Rack 1    CH_SHARD1_R1    CH_SHARD2_R1    CH_SHARD3_R1    ZOO_1
Rack 2    CH_SHARD1_R2    CH_SHARD2_R2    CH_SHARD3_R2    ZOO_2
Rack 3                                                    ZOO_3

Network Ports And Firewall

ClickHouse listens on the following ports:

  • 9000: Native protocol; clickhouse-client, other native clients, and other clickhouse-server instances connect here.
  • 8123: HTTP clients.
  • 9009: Interserver communication; other replicas connect here to download data.

For more information, see ClickHouse Networking, Part 1.
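To confirm that firewalls allow these ports between nodes, a plain TCP connect test is usually enough. The following is a minimal Python sketch; the hostnames are placeholders for your own nodes, and a successful connect only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket

# Placeholder hostnames; replace with your own cluster nodes.
CLICKHOUSE_NODES = ["ch-sh01-r01.example.com", "ch-sh01-r02.example.com"]
CLICKHOUSE_PORTS = {
    9000: "native protocol",
    8123: "HTTP",
    9009: "interserver replication",
}

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host in CLICKHOUSE_NODES:
        for port, purpose in CLICKHOUSE_PORTS.items():
            status = "open" if port_open(host, port) else "BLOCKED/unreachable"
            print(f"{host}:{port} ({purpose}): {status}")
```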

ZooKeeper listens on the following ports:

  • 2181: Client connections.
  • 2888: Peer connections within the ensemble (followers connect to the leader).
  • 3888: Leader election.

ClickHouse makes outbound connections to the following ports:

  • ZooKeeper: port 2181.
  • Other ClickHouse nodes in the cluster: ports 9000 and 9009.
  • Dictionary sources: ports depend on the configured source (HTTP, MySQL, MongoDB, etc.).
  • Kafka or Hadoop: if those integrations are enabled.

SSL

For untrusted networks, enable SSL/HTTPS. If acceptable, it is better to keep interserver communication unencrypted for performance reasons.
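Where TLS is enabled, it is worth checking that clients see the expected certificate. The sketch below assumes the conventional secure ports 8443 (HTTPS) and 9440 (secure native protocol) are enabled in config.xml, that the certificate is signed by a CA the client trusts, and that the hostname is a placeholder.

```python
import socket
import ssl

def check_tls(host: str, port: int) -> None:
    """Open a TLS connection and print the certificate subject and expiry."""
    # create_default_context() verifies the chain, so a self-signed or
    # untrusted certificate will raise an SSLError here.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5.0) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            print(f"{host}:{port} protocol={tls.version()}")
            print("  subject:", cert.get("subject"))
            print("  expires:", cert.get("notAfter"))

if __name__ == "__main__":
    # ch-sh01-r01.example.com is a placeholder; 8443/9440 are the ports
    # conventionally used for HTTPS and the secure native protocol when
    # TLS is enabled in config.xml.
    for port in (8443, 9440):
        check_tls("ch-sh01-r01.example.com", port)
```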

Naming Schema

The best time to start creating a naming schema for the servers is before they’re created and configured.

A few ClickHouse features make use of good server naming:

  • clickhouse-client prompts: A different clickhouse-client prompt can be configured per server hostname.
  • Nearest hostname load balancing: For more information, see Nearest Hostname. (A simplified sketch of the hostname-distance idea follows this list.)
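As a rough illustration of the metric behind nearest hostname load balancing (the number of character positions in which two hostnames differ), here is a simplified Python sketch; it is not ClickHouse's exact implementation, and the hostnames are placeholders.

```python
def hostname_distance(a: str, b: str) -> int:
    """Count positions with different characters, up to the shorter length.

    Simplified form of the comparison documented for the nearest_hostname
    load-balancing mode: hostnames that differ only in a replica or shard
    digit score as very "close" to each other.
    """
    return sum(1 for x, y in zip(a, b) if x != y)

if __name__ == "__main__":
    local = "ch-prod-sh01-r01.example.com"   # placeholder hostnames
    replicas = [
        "ch-prod-sh01-r02.example.com",
        "ch-prod-sh02-r01.example.com",
        "ch-dr-sh01-r01.example.org",
    ]
    for replica in sorted(replicas, key=lambda r: hostname_distance(local, r)):
        print(hostname_distance(local, replica), replica)
```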

A good option is to use the following:

{datacenter}-{serverroom}-{rack identifier}-{clickhouse cluster identifier}-{shard number or server number}.
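For illustration, here is a small hypothetical helper that builds and parses hostnames following this pattern; the field names, the example domain, and the values are all placeholders.

```python
from dataclasses import dataclass

@dataclass
class NodeName:
    """Fields for the {datacenter}-{serverroom}-{rack}-{cluster}-{number} pattern."""
    datacenter: str
    serverroom: str
    rack: str
    cluster: str
    number: str

    def hostname(self, domain: str = "example.com") -> str:
        return f"{self.datacenter}-{self.serverroom}-{self.rack}-{self.cluster}-{self.number}.{domain}"

    @classmethod
    def parse(cls, fqdn: str) -> "NodeName":
        # Take the short name and split it back into the five fields.
        short = fqdn.split(".", 1)[0]
        return cls(*short.split("-", 4))

if __name__ == "__main__":
    name = NodeName("dc1", "room2", "rack15", "chprod", "sh01r02")
    print(name.hostname())            # dc1-room2-rack15-chprod-sh01r02.example.com
    print(NodeName.parse(name.hostname()))
```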

Other examples:

  • rxv-olap-ch-master-sh01-r01:
    • rxv - location (rack #15)
    • olap - product name
    • ch - clickhouse
    • master - stage
    • sh01 - shard 1
    • r01 - replica 1
  • hetnzerde1-ch-prod-01.local:
    • hetnzerde1 - location (also replica id)
    • ch - clickhouse
    • prod - stage
    • 01 - server number / shard number in that DC
  • sh01.ch-front.dev.aws-east1a.example.com:
    • sh01 - shard 01
    • ch-front - cluster name
    • dev - stage
    • aws - cloud provider
    • east1a - region and availability zone

Additional Hostname Tips

  • Hostnames configured on the server should not change. If you do need to change the hostname, one reference to use is How to Change Hostname on Ubuntu 18.04.
  • The server should be accessible to other servers in the cluster via its hostname. Otherwise you will need to configure interserver_http_host in your config.
  • Ensure that hostname --fqdn and getent hosts $(hostname --fqdn) return the correct name and IP address. (The sketch after this list automates these checks.)
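The following Python sketch approximates those checks using the host's own resolver; it is a convenience sketch, not a replacement for running the shell commands above.

```python
import socket

def check_hostname_resolution() -> None:
    """Check that the FQDN resolves and that the address maps back to a name.

    A rough Python equivalent of running `hostname --fqdn` and
    `getent hosts $(hostname --fqdn)` by hand.
    """
    fqdn = socket.getfqdn()
    print("FQDN:", fqdn)
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror as exc:
        print("FQDN does not resolve:", exc)
        return
    addresses = sorted({info[4][0] for info in infos})
    print("Resolves to:", ", ".join(addresses))
    for address in addresses:
        try:
            name, _aliases, _addrs = socket.gethostbyaddr(address)
            print(f"Reverse lookup of {address}: {name}")
        except socket.herror as exc:
            print(f"No reverse record for {address}: {exc}")

if __name__ == "__main__":
    check_hostname_resolution()
```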