Network Configuration

Network Configuration

Networking And Server Room Planning

The network used for your ClickHouse cluster should be a fast network, ideally 10 Gbit. ClickHouse nodes generate a lot of traffic along with the Zookeeper connections and inter-Zookeeper communications.

For the zookeeper low latency is more important than bandwidth.

Keep the replicas isolated on the hardware level. This allows for cluster failover from possible outages.

  • For Physical Environments: Avoid placing 2 ClickHouse replicas on the same server rack. Ideally, they should be on isolated network switches and an isolated power supply.
  • For Clouds Environments: Use different availability zones between the ClickHouse replicas when possible (but be aware of the interzone traffic costs)

These considerations are the same as the Zookeeper nodes.

For example:

Rack Server Server Server Server
Rack 1 CH_SHARD1_R1 CH_SHARD2_R1 CH_SHARD3_R1 ZOO_1
Rack 2 CH_SHARD1_R2 CH_SHARD2_R2 CH_SHARD3_R2 ZOO_2
Rack 3 ZOO3

Network Ports And Firewall

ClickHouse listens the following ports:

  • 9000: clickhouse-client, native clients, other clickhouse-servers connect to here.
  • 8123: HTTP clients
  • 9009: Other replicas will connect here to download data.

For more information, see CLICKHOUSE NETWORKING, PART 1.

Zookeeper listens the following ports:

  • 2181: Client connections.
  • 2888: Inter-ensemble connections.
  • 3888: Leader election.

Outbound traffic from ClickHouse connects to the following ports:

  • ZooKeeper: On port 2181.
  • Other CH nodes in the cluster: On port 9000 and 9009.
  • Dictionary sources: Depending on what was configured such as HTTP, MySQL, Mongo, etc.
  • Kafka or Hadoop: If those integrations were enabled.

SSL

For non-trusted networks enable SSL/HTTPS. If acceptable, it is better to keep interserver communications unencrypted for performance reasons.

Naming Schema

The best time to start creating a naming schema for the servers is before they’re created and configured.

There are a few features based on good server naming in ClickHouse:

  • clickhouse-client prompts: Allows a different prompt for clickhouse-client per server hostname.
  • Nearest hostname load balancing: For more information, see Nearest Hostname.

A good option is to use the following:

{datacenter}-{serverroom}-{rack identifier}-{clickhouse cluster identifier}-{shard number or server number}.

Other examples:

  • rxv-olap-ch-master-sh01-r01:
    • rxv - location (rack#15)
    • olap - product name
    • ch = clickhouse
    • master = stage
    • sh01 = shard 1
    • r01 = replica 1
  • hetnzerde1-ch-prod-01.local:
    • hetnzerde1 - location (also replica id)
    • ch = clickhouse
    • prod = stage
    • 01 - server number / shard number in that DC
  • sh01.ch-front.dev.aws-east1a.example.com:
    • sh01 - shard 01
    • ch-front - cluster name
    • dev = stage
    • aws = cloud provider
    • east1a = region and availability zone

Host Name References

Additional Hostname Tips

  • Hostnames configured on the server should not change. If you do need to change the host name, one reference to use is How to Change Hostname on Ubuntu 18.04.
  • The server should be accessible to other servers in the cluster via it’s hostname. Otherwise you will need to configure interserver_hostname in your config.
  • Ensure that hostname --fqdn and getent hosts $(hostname --fqdn) return the correct name and ip.
Last modified 2021.08.12 : General corrections and updates. (5969b688)