Can I create about 3000 tables in one database?
Most probably it will work ok, with certain limitations:
Most probably ClickHouse (in versions > 19) will work with acceptable speed while the number of partitions across all tables is below ~10K. As that number grows higher, the situation gets worse.
Why? Actually, it's not a ClickHouse limitation - it's a filesystem limitation: data in ClickHouse is stored in the following manner:
/var/lib/clickhouse/data/<database_name>/<table_name>/<partitionid>_<partid>/<columnfiles>
So if you have too many tables & partitions, you will have a lot of files & subfolders. Most probably you will not yet hit inode or other filesystem hard limits, but filesystem performance degrades significantly as the number of files and folders grows. Simple operations (like traversing/listing those folders) start working really slowly when you have a lot of files and folders.
So the answer mostly depends on the storage and filesystem used. On SSD it will work better with a lot of files and folders; on HDD - much worse. Maybe the performance of XFS / BTRFS or some other specialized filesystem will degrade more slowly than ext4 (not sure), but ClickHouse is tested only against ext4, so you may face filesystem-related issues if you use a non-ext4 filesystem in production.
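If you already run ClickHouse and want a rough idea of how close you are to that ~10K-parts ballpark, you can count tables and active parts via the standard system tables (a minimal sketch using clickhouse-client; exact numbers and column sets may differ slightly between versions):
# Total number of tables, excluding the system database
clickhouse-client --query "SELECT count() FROM system.tables WHERE database != 'system'"
# Total number of active data parts across all tables
clickhouse-client --query "SELECT count() FROM system.parts WHERE active"
# Tables with the most active parts
clickhouse-client --query "SELECT database, table, count() AS parts FROM system.parts WHERE active GROUP BY database, table ORDER BY parts DESC LIMIT 10"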
To check how your OS and filesystem deal with such a number of files/directories, try this simple check:
# Emulate the layout: tables -> partitions -> per-column data and mark files
NUMBER_OF_TABLES=3000
NUMBER_OF_PARTS_IN_TABLE=30
NUMBER_OF_COLUMNS=30
mkdir filesystem_test
cd filesystem_test
for (( tableid=1; tableid <= NUMBER_OF_TABLES; tableid++ ))
do
    for (( partid=1; partid <= NUMBER_OF_PARTS_IN_TABLE; partid++ ))
    do
        mkdir -p table_${tableid}/partition_${partid}
        for (( columnid=1; columnid <= NUMBER_OF_COLUMNS; columnid++ ))
        do
            echo $columnid > table_${tableid}/partition_${partid}/column_${columnid}_data.txt
            echo $columnid > table_${tableid}/partition_${partid}/column_${columnid}_mark.txt
        done
    done
done
Then check how fast recursive filesystem operations run in that folder. On my PC (ext4 on SSD) with the above parameters:
➜ time bash -c 'find . | wc -l'
5493001
real 0m16.353s
user 0m2.580s
sys 0m4.800s
➜ time du --max-depth=0 -h ./
21G ./
real 0m35.890s
user 0m1.893s
sys 0m19.750s
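The test tree created above takes noticeable space (about 21G in the run shown), so remember to clean it up afterwards - removing millions of small files is itself a good illustration of how slow such operations can get:
cd ..
time rm -rf filesystem_test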
Some extra info:
https://events.static.linuxfound.org/slides/2010/linuxcon2010_wheeler.pdf (quite old article actually)
http://old.datalad.org/test_fs_analysis.html
http://fsbench.netnation.com
etc.
@filimonov Thank you a lot. You are so awesome.