ClickHouse: Support S3 as persistent storage

Created on 24 Oct 2017 · 10 comments · Source: ClickHouse/ClickHouse

Some OLAP systems, such as Snowflake, use S3 directly as their table storage, even for temporary data. This saves money for cloud users and, additionally, saves time on data ETL. From this benchmark we can see that an S3-based OLAP system (Snowflake) shows no remarkable performance difference from a local-storage-based one (Redshift). There are also similar projects, such as rocksdb-cloud, which uses S3 as RocksDB's persistent storage; these could serve as a reference for making ClickHouse more cloud-native.

feature

All 10 comments

FYI, we successfully used ClickHouse on top of Google Cloud Storage via gcsfuse.

Hi @valyala, is this an open-source project?

gcsfuse is a user-space file system for interacting with Google Cloud Storage, meaning you can store your data on any class of Google Cloud Storage. gcsfuse is written in Go, and being a FUSE file system it will impact performance (see https://github.com/GoogleCloudPlatform/gcsfuse#performance), but it could be a useful option for storing old, rarely accessed data.

@hagen1778 Yes, I mean the plugin that hooks ClickHouse up to gcsfuse storage, not gcsfuse itself.

You don't need a plugin.
We are using gcsfuse only for RO purposes, and I don't know how it will handle writes, but you can try it out. What we do (a shell sketch follows the list):

  • install gcsfuse
  • uncomment the user_allow_other option in the FUSE config /etc/fuse.conf
  • mount the Google Cloud Storage bucket somewhere in the filesystem; don't forget the allow_other parameter
  • make symlinks from the mounted disk to your CH working dir (or even try to change the working dir to the mounted disk)
  • add the mount to startup actions if you want
  • restart CH
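
A minimal shell sketch of the steps above, assuming a hypothetical bucket name my-ch-bucket and the default ClickHouse data directory /var/lib/clickhouse; adjust names and paths to your setup:

    # install gcsfuse (Debian/Ubuntu package; repo setup per the gcsfuse docs)
    sudo apt-get install -y gcsfuse

    # uncomment user_allow_other in the FUSE config
    sudo sed -i 's/^#\s*user_allow_other/user_allow_other/' /etc/fuse.conf

    # mount the bucket (my-ch-bucket is hypothetical); allow_other lets the
    # clickhouse user read a mount created by another user
    sudo mkdir -p /mnt/gcs
    gcsfuse -o allow_other --implicit-dirs my-ch-bucket /mnt/gcs

    # symlink the mounted disk into the ClickHouse working dir
    sudo ln -s /mnt/gcs /var/lib/clickhouse/gcs

    # restart ClickHouse
    sudo systemctl restart clickhouse-server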

Thanks @hagen1778, I will try it.

Did anyone end up using ClickHouse on top of FUSE in the long term?

AFAIK, from the roadmap, 2019 Q2 seems to include support for S3-like object storage, right?

@tangyong that item is not about replacing native storage with S3; it's more about import/export and on-the-fly processing. Actually, this is already partially possible with the URL table engine or table function, but it lacks authentication support, so it won't work with non-public S3 buckets.
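
For reference, a minimal sketch of the URL table function mentioned above, assuming a hypothetical publicly readable bucket and CSV object (private buckets won't work, since there is no authentication support):

    clickhouse-client --query "
        SELECT count()
        FROM url('https://my-bucket.s3.amazonaws.com/events.csv',
                 CSV,
                 'event_date Date, user_id UInt64')"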

@hagen1778

We are using gcsfuse only for RO purposes

How is the performance of such reads, e.g. how many megabytes per second? Thinking of using gcsfuse too.
