Hi,
I was trying to load 500 million rows of data into a table, split across 200 file chunks of 5 million rows each.
A few files loaded without any error, but some did not, and I received the exceptions below.
Error 1:
Received exception from server (version 18.14.15):
Code: 76. DB::Exception: Received from 1.2..1:9. DB::Exception: Cannot open file /ti*/clickhouse/data/ins*/staging_contact**/tmp_insert_6b55e561a54f424a3441250045dc7123_51051_51051_0/company_name.mrk, errno: 24, strerror: Too many open files.
Error 2:
Code: 210. DB::NetException: Connection reset by peer while writing to socket (1.2.3.1:9*)
Error 3:
Code: 252. DB::Exception: Received from 1.2.3.1:9*. DB::Exception: Too many parts (300). Merges are processing significantly slower than inserts..
Kindly help us with answers as soon as possible.
Thanks.
1. Kafka? https://github.com/yandex/ClickHouse/issues/3198
macOS? https://clickhouse.yandex/docs/en/operations/server_settings/settings/#max-open-files
Wrong limits? Check cat /proc/{CH-Server-PID}/limits (see also the query sketched after this list).

3. CPU/IO exhaustion. The merge thresholds can be raised in the server config.xml:
<merge_tree>
<parts_to_delay_insert>150</parts_to_delay_insert>
<parts_to_throw_insert>900</parts_to_throw_insert>
<max_delay_to_insert>5</max_delay_to_insert>
</merge_tree>
1, 2, 3. Wrong partitioning: post the output of show create table TABLE.
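For case 1, file-handle usage can also be watched from inside ClickHouse. A minimal sketch, assuming the OpenFile* counters exist in system.metrics on your version (verify the metric names first):

SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'OpenFile%'

If the values sit near the limit shown in /proc/{CH-Server-PID}/limits, raising max_open_files (or the OS limit) is the fix.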
1 & 2 can be related (sockets are also files). - Check @den-crane answer.
"Too many parts" with high rate of inserts see https://github.com/yandex/ClickHouse/issues/3174#issuecomment-423435071
@den-crane I haven't changed anything since the last load, and it was working fine.
Below is the table schema, i.e. the output of show create table:
CREATE TABLE in.staging_contact (column 1, column 2,.......) ENGINE = MergeTree PARTITION BY category ORDER BY (topic, domain) SETTINGS index_granularity = 8192
Let me know if anything needs to be changed in the CREATE TABLE statement.
PARTITION BY category
CH was designed to use monthly partitioning (PARTITION BY toYYYYMM(some_date_or_datetime_column)) to manage data retention (dropping old data). So the design assumes a small number of partitions overall, and a single-digit number of partitions touched by each insert batch.
If you insert 5 million rows per insert and that batch contains hundreds of different categories, CH splits the batch into hundreds of parts, which leads to huge random I/O (to merge them) and, as a result, to the "Too many parts" error.
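A minimal sketch of that advice applied to the posted schema; event_date and all column types are placeholders, since the real column list was elided:

CREATE TABLE in.staging_contact_new
(
    event_date Date,  -- placeholder: any date/time column present in the data
    category String,  -- placeholder type
    topic String,     -- placeholder type
    domain String     -- placeholder type
    -- ... remaining columns
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (topic, domain)
SETTINGS index_granularity = 8192

With this layout a 5-million-row batch usually touches one or two monthly partitions, so each insert creates a handful of parts instead of hundreds.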