If we have a lot of parts to load, can a query start with only part of the parts information, so that query execution is pipelined with the parts loading?
When the query finishes reading the previously loaded parts, it would consume the next batch of parts, and so on.
Any ideas on how to support this? Thanks.
Why? The query result would be inconsistent and non-deterministic.
@filimonov
Currently, all parts are loaded into memory before the query can run. As the number of parts grows, the query is blocked for a longer time.
I am wondering whether loading and running can be done in parallel. For example, with 10 million parts we could load the first 10k, let the query start running on those, then load the next batch, and so on (a rough sketch of the idea is below).
Why would the result be inconsistent if we still wait for all parts before finishing the query?
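To make the idea concrete, here is a minimal, hypothetical sketch of the pipelining I have in mind. It is plain Python with made-up load_part/scan_part helpers, not ClickHouse code; it only illustrates overlapping part loading with query execution.

```python
import queue
import threading

# Hypothetical helpers -- stand-ins for "load one part" and "run the
# query over one part". These are not real ClickHouse APIs.
def load_part(part_id):
    return {"id": part_id, "rows": 1000}

def scan_part(part):
    return part["rows"]  # pretend the query counts rows

def pipelined_query(part_ids, batch_size=10_000):
    q = queue.Queue(maxsize=4)  # bounded: the loader stays a few batches ahead

    def loader():
        for i in range(0, len(part_ids), batch_size):
            batch = [load_part(p) for p in part_ids[i:i + batch_size]]
            q.put(batch)
        q.put(None)  # sentinel: all parts have been loaded

    threading.Thread(target=loader, daemon=True).start()

    total = 0
    while (batch := q.get()) is not None:
        for part in batch:            # the query consumes parts as they arrive
            total += scan_part(part)  # instead of waiting for the full part set
    return total

print(pipelined_query(list(range(100_000))))
```

The question is whether the ClickHouse query pipeline could start this way instead of waiting for the complete part set.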
@hustnn CH design expects a small number of parts. The reason for that is performance.
Every part === a random seek. A million random seeks against an HDD (about 200 seeks per second) equals 5000 seconds. So if your query needs to process 1 million parts, it will take at least about 1.4 hours just to navigate to those files. That's why the MergeTree design expects that your table will have fewer than 1000 parts.
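A quick back-of-the-envelope check of that figure, assuming the 200-seeks-per-second HDD rate from the comment above (the rate itself is an assumption, not a measured value):

```python
# Rough seek-cost estimate, assuming ~200 random seeks/s on an HDD
# and at least one seek per part (both figures are assumptions).
def seek_time_hours(num_parts, seeks_per_second=200):
    seconds = num_parts / seeks_per_second
    return seconds / 3600

print(f"{seek_time_hours(1_000_000):.2f} h")      # ~1.39 h for 1 million parts
print(f"{seek_time_hours(1_000) * 3600:.0f} s")   # ~5 s for 1000 parts
```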
@den-crane Understood.
That is one example. As you mentioned, part loading consumes a lot of time. If query execution can be pipelined with the loading phase, the end-to-end time may be only a little longer than the loading time; otherwise the end-to-end time will be the loading time plus the execution time.
What do you think?
To execute any query, CH needs knowledge about all parts of the tables the query touches.
I do not see any flaws in the current architecture. What problem are you trying to solve?
@den-crane Thanks for your reply.
Yes, currently, CH needs knowledge about all parts of the tables this query touches.
I made some changes to ClickHouse: the parts are now distributed through a so-called coordinator, and that distribution may take some time. If all parts still need to be ready, the query cannot start until the distribution is finished. To avoid being blocked by the distribution, I am wondering whether query execution can be pipelined with the part distribution, so that the query can start without waiting for all parts to be ready.
Is my problem statement clear to you?
Then your coordinator should store information about all parts (parts metadata) and be able to provide this metadata to CH instances.
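As an illustration only (these names are my guesses, not ClickHouse's actual internal part-metadata layout), the kind of per-part record such a coordinator would have to serve might look like this:

```python
from dataclasses import dataclass

# Hypothetical per-part metadata record a coordinator could hand to a
# CH instance before the query can plan which parts to read. Field names
# are illustrative, not ClickHouse's real structures.
@dataclass
class PartMetadata:
    name: str            # e.g. "202105_1_10_2"
    partition_id: str
    rows: int
    bytes_on_disk: int
    min_block: int
    max_block: int

def parts_for_table(coordinator, table):
    """Placeholder: ask the (hypothetical) coordinator for every part of
    `table`; the query can only be planned once this list is complete."""
    return [PartMetadata(**p) for p in coordinator.list_parts(table)]
```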
@den-crane
Yes, you are right. All needed parts will be provided by the coordinator to the CH instance. That is why I am considering making query execution on the CH instance pipelined with the transfer of these parts.
@hustnn and what are the advantages of having 1 mln tiny parts, instead of (let's say) 1000 bigger parts?
@filimonov You are correct. Bigger parts are better. I was just imagining some worst case. I am not sure the pipelined execution model helps, so I am asking you guys.
You are correct. Bigger parts are better
Glad you agree.
I was just imagining some worst case.
Try to create 1 mln subfolders in a single folder and run a simple ls command there.
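If you want to reproduce that (in a throwaway directory only), a small sketch along these lines shows the effect; the directory count is arbitrary, and nothing here is ClickHouse-specific:

```python
import os
import time
import tempfile

# Create N empty subfolders in a temporary directory and time a single
# listing of them -- a rough stand-in for "ls" on a part-heavy table dir.
# N is arbitrary; push it toward 1_000_000 to see the cost grow.
def time_listing(n=100_000):
    root = tempfile.mkdtemp(prefix="many_parts_")
    for i in range(n):
        os.mkdir(os.path.join(root, f"part_{i}"))
    start = time.monotonic()
    entries = os.listdir(root)
    elapsed = time.monotonic() - start
    print(f"listed {len(entries)} entries in {elapsed:.2f}s from {root}")

time_listing()
```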
I am not sure the pipelined execution model helps.
I'm definitely against that.
A partially loaded table = "inconsistency". For example, in the replicated case it is not clear whether the replica is healthy and in sync unless you check the full set of parts. Knowledge about replica health is important for other features, etc. Too many potential bad side effects.
The alternative (change the partitioning scheme to produce fewer parts) is clear and simple.
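For readers landing here, a sketch of what "change the partitioning scheme" can mean in practice. The table and column names are made up; the only point is that a coarser PARTITION BY expression keeps the part count low:

```python
# Illustrative only: table/column names are invented. The PARTITION BY
# expression controls how many partitions (and therefore parts) a table
# accumulates. Partitioning by a raw timestamp creates a partition per
# distinct value; partitioning by month keeps the part count small.
COARSE_PARTITIONING_DDL = """
CREATE TABLE events_monthly
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)  -- one partition per month instead of per timestamp
ORDER BY (user_id, event_time)
"""

print(COARSE_PARTITIONING_DDL)
```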
A partially loaded table = "inconsistency". For example, in the replicated case it is not clear whether the replica is healthy and in sync unless you check the full set of parts. Knowledge about replica health is important for other features, etc. Too many potential bad side effects.
Just one question: does the replica consistency check not work on a partially loaded table?
The alternative (change the partitioning scheme to produce fewer parts) is clear and simple.
Agreed, and thanks.
Does the replica consistency check not work on a partially loaded table?
I have 10 parts, you have 20. Does that mean I need to download 10 parts from you, or am I still loading my own set of parts?