If we have a lot of parts to load, can a query start with only part of the parts information, so that query execution is pipelined with the parts loading?
When the query finishes reading the previously loaded parts, it would consume the next batch of parts, and so on.
Any ideas on how to support this? Thanks.
Why? The query result would be inconsistent and non-deterministic.
@filimonov
Currently, all parts are loaded into memory before the query can run. As the number of parts grows, the query is blocked for a longer time.
I am wondering whether loading and running can be done in parallel. For example, with 10 million parts we could load the first 10k, let the query start running on those, then load the next batch, and so on (a rough sketch of the idea is below).
Why would the result be inconsistent if we still wait for all parts before finishing the query?
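To make the idea concrete, here is a minimal, hypothetical sketch of the pipelining I have in mind. It is plain Python with made-up load_part/scan_part helpers, not ClickHouse code; it only illustrates overlapping part loading with query execution.

```python
import queue
import threading

# Hypothetical helpers -- stand-ins for "load one part" and "run the
# query over one part". These are not real ClickHouse APIs.
def load_part(part_id):
    return {"id": part_id, "rows": 1000}

def scan_part(part):
    return part["rows"]  # pretend the query counts rows

def pipelined_query(part_ids, batch_size=10_000):
    q = queue.Queue(maxsize=4)  # bounded: the loader stays a few batches ahead

    def loader():
        for i in range(0, len(part_ids), batch_size):
            batch = [load_part(p) for p in part_ids[i:i + batch_size]]
            q.put(batch)
        q.put(None)  # sentinel: all parts have been loaded

    threading.Thread(target=loader, daemon=True).start()

    total = 0
    while (batch := q.get()) is not None:
        for part in batch:            # the query consumes parts as they arrive
            total += scan_part(part)  # instead of waiting for the full part set
    return total

print(pipelined_query(list(range(100_000))))
```

The question is whether the ClickHouse query pipeline could start this way instead of waiting for the complete part set.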
@hustnn CH design expects a small number of parts. The reason for that is performance.
Every part === a random seek. A million random seeks against an HDD (about 200 seeks per second) equals 5000 seconds. So if your query needs to process 1 million parts, it will take at least about 1.4 hours just to navigate to those files. That's why the MergeTree design expects that your table will have fewer than 1000 parts.
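A quick back-of-the-envelope check of that figure, assuming the 200-seeks-per-second HDD rate from the comment above (the rate itself is an assumption, not a measured value):

```python
# Rough seek-cost estimate, assuming ~200 random seeks/s on an HDD
# and at least one seek per part (both figures are assumptions).
def seek_time_hours(num_parts, seeks_per_second=200):
    seconds = num_parts / seeks_per_second
    return seconds / 3600

print(f"{seek_time_hours(1_000_000):.2f} h")      # ~1.39 h for 1 million parts
print(f"{seek_time_hours(1_000) * 3600:.0f} s")   # ~5 s for 1000 parts
```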
@den-crane Understood.
That is one example. As you mentioned, part loading consumes a lot of time. If query execution can be pipelined with the loading phase, the end-to-end time may be only a little longer than the loading time; otherwise the end-to-end time will be the loading time plus the execution time.
What do you think?
To execute any query, CH needs knowledge about all parts of the tables the query touches.
I do not see any flaws in the current architecture. What problem are you trying to solve?
@den-crane Thanks for your reply.
Yes, currently, CH needs knowledge about all parts of the tables this query touches.
I made some changes to ClickHouse: the parts are now distributed through a so-called coordinator, and that distribution may take some time. If all parts still need to be ready, the query cannot start until the distribution is finished. To avoid being blocked by the distribution, I am wondering whether query execution can be pipelined with the part distribution, so that the query can start without waiting for all parts to be ready.
Is my problem statement clear to you?
Then your coordinator should store information about all parts (parts metadata) and be able to provide this metadata to CH instances.
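As an illustration only (these names are my guesses, not ClickHouse's actual internal part-metadata layout), the kind of per-part record such a coordinator would have to serve might look like this:

```python
from dataclasses import dataclass

# Hypothetical per-part metadata record a coordinator could hand to a
# CH instance before the query can plan which parts to read. Field names
# are illustrative, not ClickHouse's real structures.
@dataclass
class PartMetadata:
    name: str            # e.g. "202105_1_10_2"
    partition_id: str
    rows: int
    bytes_on_disk: int
    min_block: int
    max_block: int

def parts_for_table(coordinator, table):
    """Placeholder: ask the (hypothetical) coordinator for every part of
    `table`; the query can only be planned once this list is complete."""
    return [PartMetadata(**p) for p in coordinator.list_parts(table)]
```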
@den-crane
Yes, you are right. All needed parts will be provided by the coordinator to the CH instance. That is why I am considering making query execution on the CH instance pipelined with the transfer of these parts.
@hustnn and what are the advantages of having 1 mln tiny parts, instead of (let's say) 1000 bigger parts?
@filimonov You are correct. Bigger parts are better. I was just imagining some worst case. I am not sure the pipelined execution model helps, so I am asking you guys.
You are correct. Bigger parts are better
Glad you agree.
I was just imagining some worst case.
Try to create 1 mln subfolders in a single folder and run a simple ls command there.
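If you want to reproduce that (in a throwaway directory only), a small sketch along these lines shows the effect; the directory count is arbitrary, and nothing here is ClickHouse-specific:

```python
import os
import time
import tempfile

# Create N empty subfolders in a temporary directory and time a single
# listing of them -- a rough stand-in for "ls" on a part-heavy table dir.
# N is arbitrary; push it toward 1_000_000 to see the cost grow.
def time_listing(n=100_000):
    root = tempfile.mkdtemp(prefix="many_parts_")
    for i in range(n):
        os.mkdir(os.path.join(root, f"part_{i}"))
    start = time.monotonic()
    entries = os.listdir(root)
    elapsed = time.monotonic() - start
    print(f"listed {len(entries)} entries in {elapsed:.2f}s from {root}")

time_listing()
```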
I am not sure the pipelined execution model helps.
I'm definitely against that.
A partially loaded table = "inconsistency". For example, in the replicated case it is not clear whether the replica is healthy and in sync unless you check the full set of parts. Knowledge about replica health is important for other features, etc. Too many potential bad side effects.
The alternative (change the partitioning scheme to produce fewer parts) is clear and simple.
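For readers landing here, a sketch of what "change the partitioning scheme" can mean in practice. The table and column names are made up; the only point is that a coarser PARTITION BY expression keeps the part count low:

```python
# Illustrative only: table/column names are invented. The PARTITION BY
# expression controls how many partitions (and therefore parts) a table
# accumulates. Partitioning by a raw timestamp creates a partition per
# distinct value; partitioning by month keeps the part count small.
COARSE_PARTITIONING_DDL = """
CREATE TABLE events_monthly
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)  -- one partition per month instead of per timestamp
ORDER BY (user_id, event_time)
"""

print(COARSE_PARTITIONING_DDL)
```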
A partially loaded table = "inconsistency". For example, in the replicated case it is not clear whether the replica is healthy and in sync unless you check the full set of parts. Knowledge about replica health is important for other features, etc. Too many potential bad side effects.
Just one question: does the replica consistency check not work on a partially loaded table?
The alternative (change the partitioning scheme to produce fewer parts) is clear and simple.
Agreed, and thanks.
Does the replica consistency check not work on a partially loaded table?
I have 10 parts, you have 20. Does that mean I need to download 10 parts from you, or am I still loading my own set of parts?