How to Replicate ClickHouse RBAC Users and Grants with ZooKeeper/Keeper

Practical guide to configure Keeper-backed RBAC replication for users, roles, grants, policies, quotas, and profiles across ClickHouse nodes, including migration and troubleshooting.

How can I replicate CREATE USER and other RBAC commands automatically between servers?

This KB explains how to make SQL RBAC changes (CREATE USER, CREATE ROLE, GRANT, row policies, quotas, settings profiles, masking policies) automatically appear on all servers by storing access entities in ZooKeeper/ClickHouse Keeper.

Keeper below means either ClickHouse Keeper or ZooKeeper.

TL;DR:

  • By default, SQL RBAC changes (CREATE USER, GRANT, etc.) are local to each server.
  • Replicated access storage keeps RBAC entities in ZooKeeper/ClickHouse Keeper so changes automatically appear on all nodes.
  • This guide shows how to configure replicated RBAC, validate it, and migrate existing users safely.

Before diving into the details, the core concept is:

  • ClickHouse stores access entities in access storages configured by user_directories.
  • By default, following the shared-nothing concept, SQL RBAC objects are local (local_directory), so changes done on one node do not automatically appear on another node unless you run ... ON CLUSTER ....
  • With user_directories.replicated, ClickHouse stores the RBAC model in Keeper under a configured path (for example /clickhouse/access) and every node watches that path.
  • Each node maintains a local in-memory cache of replicated access entities and updates it via Keeper watch callbacks. As a result, access checks are fast and performed locally in memory, while RBAC modifications depend on Keeper availability and propagation.

The flow of this article:

  1. Why this model helps.
  2. How to configure it on a new cluster.
  3. How to validate and operate it.
  4. How to migrate existing RBAC safely.
  5. Advanced troubleshooting and internals.

1. ON CLUSTER vs Keeper-backed RBAC: when to use which

ON CLUSTER executes DDL on hosts that exist at execution time. In practice, it fans out the query through the distributed DDL queue (also Keeper/ZooKeeper-dependent) to currently known cluster nodes. It does not automatically replay old RBAC DDL for replicas/shards added later.

Keeper-backed RBAC differences:

  • one shared RBAC state for the cluster;
  • new servers read the same RBAC state when they join;
  • no need to remember ON CLUSTER for every RBAC statement.

Mental model: Keeper-backed RBAC replicates access state, while ON CLUSTER fans out DDL to currently known nodes.
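For illustration, the day-to-day SQL difference looks like this (my_cluster, app_reader, and db are hypothetical names used only for the sketch):

-- ON CLUSTER model: every RBAC statement must carry the clause
CREATE USER app_reader ON CLUSTER my_cluster IDENTIFIED BY 'secret';
GRANT ON CLUSTER my_cluster SELECT ON db.* TO app_reader;

-- Keeper-backed model: plain statements replicate via the shared access state
CREATE USER app_reader IDENTIFIED BY 'secret';
GRANT SELECT ON db.* TO app_reader;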

1.1 Pros and Cons of Keeper-backed RBAC

Pros:

  • Single source of truth for RBAC across nodes.
  • No manual file sync of users.xml / local access files.
  • Fast propagation through Keeper watch-driven refresh.
  • Natural SQL RBAC workflow (CREATE USER, GRANT, REVOKE, etc.).
  • Integrates with access-entity backup/restore.

Cons:

  • Writes depend on Keeper availability. CREATE/ALTER/DROP USER/ROLE and GRANT/REVOKE fail if Keeper is unavailable, while existing authentication/authorization may continue from already loaded cache until restart.
  • Operational complexity increases (Keeper health directly affects RBAC operations).
  • Keeper data loss or accidental Keeper path damage can remove replicated RBAC state, and users may lose access; keep regular RBAC backups and test restore procedures.
  • Can conflict with ON CLUSTER if both mechanisms are used without guard settings.
  • Invalid/corrupted payload in Keeper can be skipped or be startup-fatal, depending on throw_on_invalid_replicated_access_entities.
  • Very large RBAC sets (thousands of users/roles or very complex grants) can increase Keeper/watch pressure.
  • If Keeper is unavailable during server startup and replicated RBAC storage is configured, the server may fail to start.
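Because of the last two points, it is worth keeping at least one break-glass admin defined in users.xml so someone can always log in even if Keeper is down. A minimal sketch (the user name is an assumption; replace the hash placeholder with your own SHA-256 password hash):

<clickhouse>
  <users>
    <!-- hypothetical break-glass admin, available even without Keeper -->
    <breakglass_admin>
      <password_sha256_hex>REPLACE_WITH_SHA256_HEX</password_sha256_hex>
      <networks><ip>::1</ip></networks>
      <profile>default</profile>
      <quota>default</quota>
      <!-- allows this user to manage SQL RBAC entities -->
      <access_management>1</access_management>
    </breakglass_admin>
  </users>
</clickhouse>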

2. Configure Keeper-backed RBAC on a new cluster

user_directories is the ClickHouse server configuration section that defines:

  • where access entities are read from (users.xml, local SQL access files, Keeper, LDAP, etc.),
  • and in which order those sources are checked (precedence).

In short: it is the access-storage routing configuration for users/roles/policies/profiles/quotas.

Apply on every ClickHouse node:

<clickhouse>
  <user_directories replace="replace">
    <users_xml>
      <path>/etc/clickhouse-server/users.xml</path>
    </users_xml>
    <replicated>
      <zookeeper_path>/clickhouse/access/</zookeeper_path>
    </replicated>
  </user_directories>
</clickhouse>

Why replace="replace" matters:

  • without replace="replace", your fragment can be merged with defaults;
  • defaults include local_directory, so SQL RBAC may still be written locally;
  • this can cause mixed behavior (some entities in Keeper, some in local files).

Recommended configuration for clusters using replicated RBAC:

  • users_xml: bootstrap/break-glass admin users and static defaults.
  • replicated: all SQL RBAC objects (CREATE USER, CREATE ROLE, GRANT, policies, profiles, quotas).
  • avoid local_directory as an active writable SQL RBAC storage to prevent mixed write behavior.

2.1 Understand user_directories: defaults, precedence, coexistence

What can be configured in user_directories:

  • users_xml (read-only config users),
  • local_directory (SQL users/roles in local files),
  • replicated (SQL users/roles in Keeper),
  • memory,
  • ldap (read-only remote auth source).

Defaults if user_directories is not specified:

  • ClickHouse uses legacy settings (users_config and access_control_path).
  • In typical default deployments this means users_xml + local_directory.

If user_directories is specified:

  • ClickHouse uses storages from this section and ignores users_config / access_control_path paths for access storages.
  • Order in user_directories defines precedence for lookup/auth.

When several storages coexist:

  • reads/auth checks storages by precedence order;
  • CREATE USER/ROLE/... without explicit IN ... goes to the first writable target by that order (and may conflict with entities found in higher-precedence storages).

There is special syntax to target a storage explicitly:

CREATE USER my_user IDENTIFIED BY '***' IN replicated;

This is supported, but for access control we usually do not recommend mixing storages intentionally. For sensitive access rights, a single source of truth (typically replicated) is safer and easier to operate.
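To confirm which storage an entity actually landed in, the storage column in the system tables names the access storage (my_user here matches the example above):

-- Shows 'replicated' if the user was written to Keeper-backed storage
SELECT name, storage
FROM system.users
WHERE name = 'my_user';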

3. Altinity Operator (CHI) configuration example

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: rbac-replicated
spec:
  configuration:
    files:
      config.d/user_directories.xml: |
        <clickhouse>
          <user_directories replace="replace">
            <users_xml>
              <path>/etc/clickhouse-server/users.xml</path>
            </users_xml>
            <replicated>
              <zookeeper_path>/clickhouse/access/</zookeeper_path>
            </replicated>
          </user_directories>
        </clickhouse>        

4. Validate the setup quickly

Check active storages and precedence:

SELECT name, type, params, precedence
FROM system.user_directories
ORDER BY precedence;

Expected result (values can vary by version/config; precedence values are relative and order matters):

name        type        precedence
users_xml   users_xml   0
replicated  replicated  1

Check where users are stored:

SELECT name, storage
FROM system.users
ORDER BY name;

Expected result for a SQL-created user:

name         storage
kb_test      replicated

Smoke test:

  1. On node A: CREATE USER kb_test IDENTIFIED WITH no_password;
  2. On node B: SHOW CREATE USER kb_test;
  3. On either node: DROP USER kb_test;

RBAC changes usually propagate within milliseconds to seconds, depending on Keeper latency and cluster load.

Check Keeper data exists:

SELECT *
FROM system.zookeeper
WHERE path = '/clickhouse/access';

5. Handle existing ON CLUSTER RBAC scripts safely

There are two independent propagation mechanisms:

  • Replicated access storage: Keeper-based replication of RBAC entities.
  • ON CLUSTER: query fan-out through the distributed DDL queue (also Keeper/ZooKeeper-dependent).

When replicated access storage is enabled, combining both can be redundant or problematic.

Recommended practice:

  • Prefer RBAC SQL without ON CLUSTER, or enable ignore mode:
SET ignore_on_cluster_for_replicated_access_entities_queries = 1;

With this setting, existing RBAC scripts containing ON CLUSTER can still be used safely: the clause is rewritten away for replicated-access queries.

For production, prefer configuring this in a profile (for example default in users.xml) rather than relying on session-level SET:

<clickhouse>
  <profiles>
    <default>
      <ignore_on_cluster_for_replicated_access_entities_queries>1</ignore_on_cluster_for_replicated_access_entities_queries>
    </default>
  </profiles>
</clickhouse>

6. Migrate existing clusters/users

Switching to Keeper-backed RBAC should be treated as a storage migration.

Important: replay/restore RBAC on one node only. Objects are written to Keeper and then reflected on all nodes.

Key facts before migration:

  • Changing user_directories storage or changing zookeeper_path does not move existing SQL RBAC objects automatically.
  • If the path changes, old users and roles are not deleted but become effectively hidden from the new storage path.
  • zookeeper_path cannot be changed at runtime via SQL.

Recommended high-level steps:

  1. Export/backup RBAC.
  2. Apply the new user_directories config on all nodes.
  3. Restart/reload as needed.
  4. Restore/replay RBAC.
  5. Validate from multiple nodes.

6.1 SQL-only migration (export/import RBAC DDL)

This path is useful when:

  • RBAC DDL is already versioned in your repo, or
  • you want to dump/replay access entities using SQL only.

Note: replaying SHOW ACCESS output is idempotent only if you handle IF NOT EXISTS/cleanup; otherwise prefer restoring into an empty RBAC namespace.

Recommended SQL-only flow:

  1. On the source, check where the entities are stored (local vs. replicated):
SELECT name, storage FROM system.users ORDER BY name;
SELECT name, storage FROM system.roles ORDER BY name;
SELECT name, storage FROM system.settings_profiles ORDER BY name;
SELECT name, storage FROM system.quotas ORDER BY name;
SELECT name, storage FROM system.row_policies ORDER BY name;
SELECT name, storage FROM system.masking_policies ORDER BY name;
  2. Export RBAC DDL from the source:
  • simplest full dump:
SHOW ACCESS;

Save the output as SQL (for example rbac_dump.sql) in your repo/artifacts.

You can also export individual objects with SHOW CREATE USER/ROLE/... when needed.

  3. Switch the configuration to replicated user_directories on the target cluster and restart/reload.
  4. Replay the exported SQL on one node (without ON CLUSTER in replicated mode).
  5. Validate from another node (SHOW CREATE USER ..., SHOW GRANTS FOR ...).

6.2 Migration with clickhouse-backup (--rbac-only)

# backup local RBAC users/roles/etc.
clickhouse-backup create --rbac --rbac-only users_bkp_20260304

# restore (on node configured with replicated user directory)
clickhouse-backup restore --rbac-only users_bkp_20260304

Important:

  • this applies to SQL/RBAC users (created with CREATE USER ..., CREATE ROLE ..., etc.);
  • if your users are in users.xml, those are config-based (--configs) and this is not an automatic local->replicated RBAC conversion.
  • run restore on one node only; entities will be replicated through Keeper.
  • If clickhouse-backup is configured with use_embedded_backup_restore: true, it delegates to SQL BACKUP/RESTORE and follows embedded rules (see below).

6.3 Migration with embedded SQL BACKUP/RESTORE

BACKUP
    TABLE system.users,
    TABLE system.roles,
    TABLE system.row_policies,
    TABLE system.quotas,
    TABLE system.settings_profiles,
    TABLE system.masking_policies
TO <backup_destination>;

-- after switching config
RESTORE
    TABLE system.users,
    TABLE system.roles,
    TABLE system.row_policies,
    TABLE system.quotas,
    TABLE system.settings_profiles,
    TABLE system.masking_policies
FROM <backup_destination>;
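As a concrete sketch, the destination could be a disk defined in your storage configuration (the disk name 'backups' and the archive name below are assumptions; substitute your own destination):

-- Hypothetical destination: a disk named 'backups' from storage configuration
BACKUP
    TABLE system.users,
    TABLE system.roles
TO Disk('backups', 'rbac_20260304.zip');

-- After switching user_directories to replicated, restore on ONE node only
RESTORE
    TABLE system.users,
    TABLE system.roles
FROM Disk('backups', 'rbac_20260304.zip');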

allow_backup behavior for embedded SQL backup/restore:

  • Storage-level flag in user_directories (<replicated>, <local_directory>, <users_xml>) controls whether that storage participates in backup/restore.
  • Entity-level setting allow_backup (for users/roles/settings profiles) can exclude specific RBAC objects from backup.

Defaults in ClickHouse code:

  • users_xml: allow_backup = false by default.
  • local_directory: allow_backup = true by default.
  • replicated: allow_backup = true by default.

Operational implication:

  • If you disable allow_backup for replicated storage, embedded BACKUP TABLE system.users ... may skip those entities (or fail if no backup-allowed access storage remains).
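The storage-level flag is set inside the corresponding user_directories entry. A sketch, shown with the default value for replicated storage:

<clickhouse>
  <user_directories replace="replace">
    <users_xml>
      <path>/etc/clickhouse-server/users.xml</path>
    </users_xml>
    <replicated>
      <zookeeper_path>/clickhouse/access/</zookeeper_path>
      <!-- default is true; set to false to exclude this storage from BACKUP/RESTORE -->
      <allow_backup>true</allow_backup>
    </replicated>
  </user_directories>
</clickhouse>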

7. Troubleshooting: common support issues

Symptom | Typical root cause | What to do
User created on node A is missing on node B | RBAC still stored in local_directory | Verify system.user_directories; ensure replicated is configured on all nodes and active
RBAC objects “disappeared” after config change/restart | zookeeper_path or storage source changed | Restore from backup or recreate RBAC in the new storage; keep path stable
New replica has no historical users/roles | Team used only ... ON CLUSTER ... before scaling | Enable Keeper-backed RBAC so new nodes load shared state
CREATE USER ... ON CLUSTER throws “already exists in replicated” | Query fan-out + replicated storage both applied | Remove ON CLUSTER for RBAC or enable ignore_on_cluster_for_replicated_access_entities_queries
CREATE USER/GRANT fails with Keeper/ZooKeeper error | Keeper unavailable or connection lost | Check system.zookeeper_connection, system.zookeeper_connection_log, and server logs
RBAC writes still go to local_directory even though replicated is configured | local_directory remains the first writable storage | Use user_directories replace="replace" and avoid writable local SQL storage in front of replicated
Server does not start when Keeper is down; no one can log in | Replicated access storage needs Keeper during initialization | Restore Keeper first, then restart; if needed use a temporary fallback config and keep a break-glass users.xml admin
Startup fails (or users are skipped) because of invalid RBAC payload in Keeper | Corrupted/invalid replicated entity and strict validation mode | Use throw_on_invalid_replicated_access_entities deliberately: true fail-fast, false skip+log; fix bad Keeper payload before re-enabling strict mode
Two independent clusters unexpectedly share the same users/roles | Both clusters point to the same Keeper ensemble and the same zookeeper_path | Use unique RBAC paths per cluster (recommended), or isolate with Keeper chroot (requires Keeper metadata repopulation/migration)
Cannot change RBAC keeper path with SQL at runtime | Not supported by design | Change config + controlled migration/restore
Trying to “sync” RBAC between independent clusters by pointing to another path | Wrong migration model | Use backup/restore or SQL export/import, not ad hoc path switching
Authentication errors from app/job, but local tests work | Network/IP/user mismatch, not replication itself | Check system.query_log and source IP; verify user host restrictions
Short window where user seems present/absent via load balancer | Propagation + node routing timing | Validate directly on each node; avoid assuming LB view is instantly consistent
Server fails after aggressive user_directories replacement | Required base users/profiles missing in config | Keep users_xml (or equivalent base definitions) intact

8. Operational guardrails for production

  • Keep the same user_directories config on all nodes.
  • Keep zookeeper_path unique per cluster/tenant.
  • Use a dedicated admin user for provisioning; avoid using default for automation.
  • Track configuration rollouts (who/when/what) to avoid hidden behavior changes.
  • Treat Keeper health as part of access-management SLO.
  • Plan RBAC backup/restore before changing storage path or cluster topology.

9. Observability and debugging signals

9.1 Check Keeper connectivity

SELECT * FROM system.zookeeper_connection;
SELECT * FROM system.zookeeper_connection_log ORDER BY event_time DESC LIMIT 100;
SELECT * FROM system.zookeeper WHERE path = '/clickhouse/access';

9.2 Relevant server log patterns

You can find feature-related lines in the server log by searching for these patterns:

Access(replicated)
ZooKeeperReplicator
Can't have Replicated access without ZooKeeper
ON CLUSTER clause was ignored for query

9.3 Force RBAC reload

Force access reload:

SYSTEM RELOAD USERS;

10. Keeper path structure and semantics (advanced)

The following details are useful for advanced debugging or when inspecting Keeper paths manually.

If zookeeper_path=/clickhouse/access:

/clickhouse/access
  /uuid/<entity_uuid>   -> serialized ATTACH statements for one entity
  /U/<escaped_name>     -> user name -> UUID
  /R/<escaped_name>     -> role name -> UUID
  /S/<escaped_name>     -> settings profile name -> UUID
  /P/<escaped_name>     -> row policy name -> UUID
  /Q/<escaped_name>     -> quota name -> UUID
  /M/<escaped_name>     -> masking policy name -> UUID

When these paths are accessed:

  • startup/reconnect: ClickHouse syncs Keeper, creates roots if missing, loads all entities;
  • CREATE/ALTER/DROP RBAC SQL: updates uuid and type/name index nodes in Keeper transactions;
  • runtime: watch callbacks refresh changed entities into local in-memory mirror.
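The layout above can be inspected read-only from SQL (the paths assume zookeeper_path=/clickhouse/access as in the example):

-- List entity UUID nodes; 'value' holds the serialized ATTACH statement
SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/access/uuid';

-- Resolve user names to UUIDs via the 'U' index nodes
SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/access/U';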

11. Low-level internals

Advanced note:

  • each ClickHouse node keeps a local in-memory cache of all replicated access entities;

  • cache is updated from Keeper watch notifications (list/entity watches), so auth/lookup paths use local memory and not direct Keeper reads on each request.

  • watch patterns used:

    • list watch on /uuid children for create/delete detection;
    • per-entity watch on /uuid/<id> for payload changes.
  • thread model:

    • dedicated watcher thread (runWatchingThread);
    • on errors: reset cached Keeper client, sleep, retry;
    • after refresh: send AccessChangesNotifier notifications.
  • cache layers:

    • primary cache: MemoryAccessStorage inside replicated access storage;
    • higher-level caches in AccessControl (RoleCache, RowPolicyCache, QuotaCache, SettingsProfilesCache) are updated/invalidated via access change notifications.
  • Read path is memory-backed (MemoryAccessStorage mirror), not direct Keeper reads per query.

  • Write path requires Keeper availability; if Keeper is down, RBAC writes fail while some reads can continue from loaded state.

  • Insert target is selected by storage order and writeability in MultipleAccessStorage; this is why leftover local_directory can hijack SQL user creation.

  • ignore_on_cluster_for_replicated_access_entities_queries is implemented as AST rewrite that removes ON CLUSTER for access queries when replicated access storage is enabled.

12. Version and history highlights

Date | Change | Why it matters
2021-07-21 | ReplicatedAccessStorage introduced (e33a2bf7bc9, PR #27426) | First Keeper-backed RBAC replication
2023-08-18 | Ignore ON CLUSTER for replicated access entities (14590305ad0, PR #52975) | Reduced duplicate/overlap behavior
2023-12-12 | Extended ignore behavior to GRANT/REVOKE (b33f1245559, PR #57538) | Fixed common operational conflict with grants
2025-06-03 | Keeper replication logic extracted to ZooKeeperReplicator (39eb90b73ef, PR #81245) | Cleaner architecture, shared replication core
2026-01-24 | Optional strict mode on invalid replicated entities (3d654b79853) | Lets operators fail fast on corrupted Keeper payloads

13. Code references for deep dives

  • src/Access/AccessControl.cpp
  • src/Access/MultipleAccessStorage.cpp
  • src/Access/ReplicatedAccessStorage.cpp
  • src/Access/ZooKeeperReplicator.cpp
  • src/Interpreters/removeOnClusterClauseIfNeeded.cpp
  • src/Access/IAccessStorage.cpp
  • src/Backups/BackupCoordinationOnCluster.cpp
  • src/Backups/RestoreCoordinationOnCluster.cpp
  • tests/integration/test_replicated_users/test.py
  • tests/integration/test_replicated_access/test_invalid_entity.py