I can see that ClickHouse has a backup feature described in the documentation: https://clickhouse.yandex/docs/en/query_language/queries.html#backups-and-replication. What I think would be useful for users is incremental backups, which would protect against total disasters such as major hardware failures or simple human error. Elasticsearch, for example, has plugins that let users take snapshots of all indices (an index is roughly the equivalent of a ClickHouse table) or just a subset of them. One such plugin lets users send data to distributed storage like AWS S3, Azure Storage or Google Cloud Storage. Storing backups in S3 has two major advantages: it cuts costs, and S3 is very durable. Some companies also have large HDFS clusters that could be used to keep backups.
In the title of this issue I mentioned incremental backups, which have a huge advantage and could be very useful in ClickHouse. For more sophisticated tasks ClickHouse users can/should use the MergeTree engine. Since MergeTree stores data in parts that are immutable once written (merges produce new parts rather than modifying existing ones), one could hypothetically create one full snapshot and then create incremental snapshots every day or every hour. Each incremental snapshot would only require ClickHouse to send the new parts to the distributed storage system. Incremental backups would make ClickHouse more durable because snapshots would become inexpensive to execute and store.
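To illustrate the idea, here is a minimal sketch, not an official mechanism: it assumes a MergeTree table named `events` partitioned by month, a hypothetical bucket `my-ch-backups`, the default data path `/var/lib/clickhouse`, and the AWS CLI installed on the server. `ALTER TABLE ... FREEZE PARTITION` hard-links the current parts into the `shadow/` directory, and `aws s3 sync` only transfers files that are not already in the bucket, so repeated runs approximate incremental snapshots.

```bash
#!/usr/bin/env bash
# Sketch of an incremental backup: freeze the current immutable parts,
# then sync only the files that are not yet in S3.
# Table, partition and bucket names are placeholders.
set -euo pipefail

TABLE="events"
PARTITION="'201712'"           # partition to snapshot; adjust to your schema
BUCKET="s3://my-ch-backups"    # hypothetical bucket
CH_DATA="/var/lib/clickhouse"

# 1. Hard-link the partition's current parts into ${CH_DATA}/shadow/<N>/
clickhouse-client --query "ALTER TABLE ${TABLE} FREEZE PARTITION ${PARTITION}"

# 2. The server records the latest freeze increment number here.
N=$(cat "${CH_DATA}/shadow/increment.txt")

# 3. Upload. `aws s3 sync` skips files already present under the same key
#    with the same size, so only newly created parts are transferred.
aws s3 sync "${CH_DATA}/shadow/${N}/" "${BUCKET}/${TABLE}/"

# 4. Optionally remove the local hard-linked copy once the upload succeeds.
rm -rf "${CH_DATA:?}/shadow/${N}"
```

Because frozen parts are hard links, the freeze step costs almost no extra disk space, and only the S3 transfer of new parts does real work.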
This would be great and would make CH more appealing to organizations that need truly production-ready, mature tech. Is this something that could be added to the CH roadmap for early 2018?
I can offer hands-on help with this, but I think I can only assist; I am not in a position to lead it.
This is a really good point. We will prepare to develop these plugins; if anyone else wants to join this plan, welcome!
A Bash implementation of CH backups that I found: https://github.com/jetbrains-infra/clickhouse-tasks/blob/master/backup/entrypoint.sh
My tool can store backups efficiently on disk and upload them to S3 either in full or incrementally.
I will be glad to get feedback and bug reports.
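For readers new to the tool, a minimal usage sketch might look like the following. The subcommand and flag names (`create`, `upload`, `restore`, `list`, `--diff-from`) are assumptions based on the project's documentation and may differ between versions; check the README for the exact interface and for the remote-storage configuration.

```bash
# Hypothetical clickhouse-backup usage; command and flag names are assumptions.

# Create a local backup (hard-links the frozen parts, so it is cheap on disk).
clickhouse-backup create full-2019-01-01

# Upload it to the remote storage (e.g. S3) configured for the tool.
clickhouse-backup upload full-2019-01-01

# Later: create another backup and upload only the parts that are new
# relative to the previous one (incremental upload).
clickhouse-backup create incr-2019-01-02
clickhouse-backup upload --diff-from=full-2019-01-01 incr-2019-01-02

# List local and remote backups.
clickhouse-backup list
```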
@AlexAkulov you could have saved everyone a few clicks by providing a direct link: https://github.com/AlexAkulov/clickhouse-backup/ 😉