S3 & object storage
- 1: AWS S3 Recipes
- 2: S3Disk
1 - AWS S3 Recipes
Using AWS IAM (Identity and Access Management) roles
For an EC2 instance, you can attach an IAM role.
The role must contain a policy with permissions like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow-put-and-get",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET_NAME/test_s3_disk/*"
    }
  ]
}
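If you manage the role from the command line, it can help to render this policy for a concrete bucket and prefix so that the Resource ARN always matches the path used in the ClickHouse endpoint. A minimal POSIX sh sketch; the script name render_policy.sh and the function render_policy are hypothetical, and only the two actions from the policy above are included:

```shell
#!/bin/sh
# Sketch: print the IAM policy above for a given bucket and key prefix.
# render_policy is a hypothetical helper, not part of ClickHouse or the AWS CLI.
set -eu

render_policy() {
  bucket="$1"   # replaces BUCKET_NAME
  prefix="$2"   # replaces test_s3_disk/; must match the endpoint path
  # The heredoc expands ${bucket} and ${prefix}; the trailing * stays literal.
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow-put-and-get",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::${bucket}/${prefix}*"
    }
  ]
}
EOF
}

# Run only when called with arguments: sh render_policy.sh BUCKET PREFIX
if [ "$#" -eq 2 ]; then
  render_policy "$1" "$2"
fi
```

For example, sh render_policy.sh my-bucket test_s3_disk/ > policy.json, then attach it with aws iam put-role-policy --role-name ROLE_NAME --policy-name clickhouse-s3-disk --policy-document file://policy.json (the bucket, role, and policy names are placeholders).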
The corresponding ClickHouse configuration:
<clickhouse>
  <storage_configuration>
    <disks>
      <disk_s3>
        <type>s3</type>
        <endpoint>http://s3.us-east-1.amazonaws.com/BUCKET_NAME/test_s3_disk/</endpoint>
        <use_environment_credentials>true</use_environment_credentials>
      </disk_s3>
    </disks>
    <policies>
      <policy_s3_only>
        <volumes>
          <volume_s3>
            <disk>disk_s3</disk>
          </volume_s3>
        </volumes>
      </policy_s3_only>
    </policies>
  </storage_configuration>
</clickhouse>
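One way to deploy this is as a drop-in file, since ClickHouse merges XML files from its config.d directory into the main configuration. A minimal sh sketch, assuming the usual package layout; the file name s3_disk.xml and the function write_s3_disk_config are hypothetical:

```shell
#!/bin/sh
# Sketch: write the IAM-based S3 disk configuration above into a config.d
# drop-in file. File and function names are hypothetical.
set -eu

write_s3_disk_config() {
  config_dir="$1"   # e.g. /etc/clickhouse-server/config.d
  bucket="$2"       # replaces BUCKET_NAME
  prefix="$3"       # replaces test_s3_disk/; must match the IAM policy Resource
  mkdir -p "$config_dir"
  cat > "$config_dir/s3_disk.xml" <<EOF
<clickhouse>
  <storage_configuration>
    <disks>
      <disk_s3>
        <type>s3</type>
        <endpoint>http://s3.us-east-1.amazonaws.com/${bucket}/${prefix}</endpoint>
        <use_environment_credentials>true</use_environment_credentials>
      </disk_s3>
    </disks>
    <policies>
      <policy_s3_only>
        <volumes>
          <volume_s3>
            <disk>disk_s3</disk>
          </volume_s3>
        </volumes>
      </policy_s3_only>
    </policies>
  </storage_configuration>
</clickhouse>
EOF
}

# Run only when called with arguments: sh write_s3_disk_config.sh DIR BUCKET PREFIX
if [ "$#" -eq 3 ]; then
  write_s3_disk_config "$1" "$2" "$3"
fi
```

Restarting clickhouse-server afterwards is the safe assumption for the new disk and policy to be picked up.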
A quick check:
CREATE TABLE table_s3 (number Int64) ENGINE=MergeTree() ORDER BY tuple() PARTITION BY tuple() SETTINGS storage_policy='policy_s3_only';
INSERT INTO table_s3 SELECT * FROM system.numbers LIMIT 100000000;
SELECT sum(number) FROM table_s3; -- reads all rows back from S3, expected 4999999950000000
DROP TABLE table_s3;
2 - S3Disk
Settings
<clickhouse>
  <storage_configuration>
    <disks>
      <s3>
        <type>s3</type>
        <endpoint>http://s3.us-east-1.amazonaws.com/BUCKET_NAME/test_s3_disk/</endpoint>
        <access_key_id>ACCESS_KEY_ID</access_key_id>
        <secret_access_key>SECRET_ACCESS_KEY</secret_access_key>
        <skip_access_check>true</skip_access_check>
        <send_metadata>true</send_metadata>
      </s3>
    </disks>
  </storage_configuration>
</clickhouse>
- skip_access_check: if true, ClickHouse skips the disk access check at startup, so you can use read-only credentials with a regular MergeTree table. In that case you also need to disable merges on the S3 volume (the prefer_not_to_merge setting).
- send_metadata: if true, ClickHouse stores the original part and file path with each S3 object. This lets you recover metadata from S3 and makes debugging easier.
Restore metadata from S3
Default
Limitations:
- ClickHouse needs read-write (RW) access to this bucket.
To restore metadata, create a file named restore in the metadata_path/_s3_disk_name_ directory:
touch /var/lib/clickhouse/disks/_s3_disk_name_/restore
In this case, ClickHouse restores to the same bucket and path and updates only the metadata files in the S3 bucket.
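When several servers or disks are involved, a small guard against typos in the disk name can help. A minimal sh sketch, assuming the default /var/lib/clickhouse/disks layout; the function request_default_restore is hypothetical. It writes an empty restore file, which by the rule above means "restore into the same bucket and path":

```shell
#!/bin/sh
# Sketch: request a default (same bucket, same path) metadata restore for one
# S3 disk. The function name and arguments are hypothetical.
set -eu

request_default_restore() {
  disks_dir="$1"   # e.g. /var/lib/clickhouse/disks
  disk_name="$2"   # name of the S3 disk as declared under <disks>
  dir="$disks_dir/$disk_name"
  # Refuse to guess: the disk's metadata directory must already exist.
  if [ ! -d "$dir" ]; then
    echo "no metadata directory for disk '$disk_name' at $dir" >&2
    return 1
  fi
  # Create (or truncate to) an empty restore file: no custom key=value settings.
  : > "$dir/restore"
}

# Run only when called with arguments: sh request_restore.sh DISKS_DIR DISK_NAME
if [ "$#" -eq 2 ]; then
  request_default_restore "$1" "$2"
fi
```

Truncating instead of a plain touch is a deliberate choice: it guarantees that leftover custom settings from an earlier attempt do not turn a default restore into a custom one.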
Custom
Limitations:
- ClickHouse needs read-only (RO) access to the old bucket and read-write (RW) access to the new one.
- ClickHouse copies objects when restoring to a different bucket or path.
To restore from a different bucket or path, populate the restore file with settings in key=value format:
cat /var/lib/clickhouse/disks/_s3_disk_name_/restore
source_bucket=s3disk
source_path=vol1/
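The restore file above can be produced the same way. A minimal sh sketch that writes only the two keys shown in the example (source_bucket and source_path); the function request_custom_restore is hypothetical, and the disk's own endpoint is assumed to already point at the new bucket and path:

```shell
#!/bin/sh
# Sketch: request a metadata restore from a different bucket/path by writing
# key=value settings into the restore file. Function name is hypothetical.
set -eu

request_custom_restore() {
  disks_dir="$1"      # e.g. /var/lib/clickhouse/disks
  disk_name="$2"      # name of the S3 disk as declared under <disks>
  source_bucket="$3"  # old bucket (ClickHouse needs RO access to it)
  source_path="$4"    # old path inside that bucket, e.g. vol1/
  dir="$disks_dir/$disk_name"
  if [ ! -d "$dir" ]; then
    echo "no metadata directory for disk '$disk_name' at $dir" >&2
    return 1
  fi
  # One setting per line, in the key=value format shown above.
  printf 'source_bucket=%s\nsource_path=%s\n' "$source_bucket" "$source_path" > "$dir/restore"
}

# Run only when called with arguments:
#   sh request_custom_restore.sh DISKS_DIR DISK_NAME SOURCE_BUCKET SOURCE_PATH
if [ "$#" -eq 4 ]; then
  request_custom_restore "$1" "$2" "$3" "$4"
fi
```

For example, request_custom_restore /var/lib/clickhouse/disks _s3_disk_name_ s3disk vol1/ produces the same file contents as the cat output above.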