Hello.
I wonder if there are any plans to write official client libraries that could perhaps use the native TCP protocol instead of HTTP?
Right now, as far as I can see, there are various third-party libraries of not very good quality. As a result, all of ClickHouse's performance and speed is wasted inside those libraries.
Perhaps some kind of C++ driver with native bindings for other languages?
@SkeLLLa there are no specific short-term plans, but which libraries exactly have you tried?
P.S. There's also a JDBC driver for JVM-based products that is kind of official: http://github.com/yandex/clickhouse-jdbc
@blinkov thanks for the reply.
I'm talking mostly about libraries that use the HTTP interface and don't use ODBC/JDBC.
Recently I reviewed a couple of Node.js drivers. For example, in one case the time the library adds is up to 2x the time ClickHouse takes to perform the query. So if a ClickHouse request takes 1s, the library could add 2s for processing the response.
And all of them have several common problems:
So I think it would be good if ClickHouse had a binary library written in C++/Go/Rust/whatever that is responsible for communication with the database, preferably via TCP, with parsers built in. Other languages would use bindings to it. For example, Couchbase uses this approach: https://github.com/couchbase/couchnode.
@SkeLLLa, have you looked at https://github.com/artpaul/clickhouse-cpp?
Also, there is a TCP protocol driver for Go, which is used for high-performance data ingestion: https://github.com/kshvakov/clickhouse
There is a TCP protocol driver for .NET: https://github.com/killwort/ClickHouse-Net
And for Java: https://github.com/housepower/ClickHouse-Native-JDBC
These are all third-party projects, so quality may be a concern, though they are used in production systems. The official JDBC driver is also very good and production quality.
@alex-zaitsev thanks for the suggestions.
My main goal is to have a good official library (a driver) and build libraries for other languages around it.
As for JDBC, the main drawback is that it requires a JVM to run, so it's not a good dependency to have if you're building a library for Node.js or PHP. The .NET client seems to be quite a young project, and C# is also not very cross-platform at the moment.
clickhouse-cpp looks quite good for making clients with bindings to it, except for one thing: https://github.com/artpaul/clickhouse-cpp/issues/21. Until it supports an async interface, it will be difficult to create a library with bindings for languages like JS (Node.js) due to their async nature.
Another option is to try to write a good TCP-based client, but I haven't found any relevant docs about the TCP protocol.
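To illustrate the async concern above: a common workaround for a blocking driver is to run each call on a worker thread and await the result. The sketch below shows that pattern in Python. `blocking_query` is a hypothetical stand-in for a synchronous driver call, not part of clickhouse-cpp or any real library, and the pool size is an arbitrary assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(sql: str) -> str:
    """Hypothetical stand-in for a synchronous driver call."""
    time.sleep(0.05)  # simulate waiting on the server
    return f"result of: {sql}"


async def async_query(executor: ThreadPoolExecutor, sql: str) -> str:
    # Run the blocking call on a worker thread so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_query, sql)


async def main() -> list:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Both queries wait concurrently; gather keeps the argument order.
        return await asyncio.gather(
            async_query(pool, "SELECT 1"),
            async_query(pool, "SELECT 2"),
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This hides the blocking call but still ties up one thread per in-flight query, which is why a natively async interface in the underlying driver would be preferable for bindings.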
> My main goal is to have a good official library (a driver) and build libraries for other languages around it.
> Another option is to try to write a good TCP-based client, but I haven't found any relevant docs about the TCP protocol.
I like the idea of having such a client. But AFAIK it's not a priority: there are a lot of server tasks, and there are not enough resources for such 'nice to have' stuff.
Binary protocol details were intentionally not published, as the developers prefer to keep the ability to change that protocol in ways that break compatibility. So the idea was that external libraries should not rely on the internal protocol (which can change) and should use simple HTTP instead.
For ClickHouse itself HTTP is not a bottleneck; the bottleneck is usually serializing data to TSV / JSON / another format, or API overhead on the library side. So if going to a lower level (like C / asm) is not a problem for you, most probably you will be able to solve the performance issues just by fixing those most common bottlenecks (improving serialization, maybe changing the HTTP library used, decreasing the API overhead, etc.).
If you have the resources and experience, you can help the project by creating such a driver. If it is of good quality, covered by tests, and documented, it can probably get 'official' or 'recommended' status. You can also contribute to artpaul/clickhouse-cpp, or to ClickHouse itself to add some important building blocks for HTTP communication that are still missing (like exchange of some meta-information about queries and results).
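One way to check where the time goes before rewriting anything is to time the client-side decoding step on its own. The sketch below does that for a synthetic payload. The row shape, the helper names, and the JSON layout are all assumptions made for illustration and are not the actual ClickHouse response format.

```python
import json
import time


def make_payloads(n_rows: int):
    """Build the same synthetic rows as a JSON document and as TSV text."""
    rows = [[i, f"name{i}", i * 0.5] for i in range(n_rows)]
    as_json = json.dumps({"data": rows})
    as_tsv = "".join(f"{a}\t{b}\t{c}\n" for a, b, c in rows)
    return as_json, as_tsv


def parse_json(payload: str):
    return json.loads(payload)["data"]


def parse_tsv(payload: str):
    # Values stay as strings here; type conversion would add more cost.
    return [line.split("\t") for line in payload.splitlines()]


def time_it(fn, payload: str, repeats: int = 5) -> float:
    """Return the best wall-clock time of `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    as_json, as_tsv = make_payloads(100_000)
    print("json:", time_it(parse_json, as_json))
    print("tsv: ", time_it(parse_tsv, as_tsv))
```

The comparison is not quite like for like, since the TSV path skips type conversion, but comparing numbers like these with the query time reported by the server gives a rough idea of how much the library itself adds.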
@filimonov thanks for the reply.
HTTP is OK, but I find it not very good because of its limitations:
Also, if I have time I'll try to make a library for Node.js that will probably use TSV instead of JSON and have some kind of transport abstraction, so that the underlying protocol can be easily replaced if something better becomes available later.
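A rough sketch of what that design could look like, written in Python rather than Node.js for brevity. All class and function names here are hypothetical. The client only knows about a `Transport` that yields TSV lines, so an HTTP transport could later be swapped for a TCP one without touching the parser. The escape handling follows the TabSeparated format as I understand it (backslash escapes, `\N` for NULL) and should be checked against the format docs.

```python
from typing import Iterable, Iterator, List, Optional, Protocol

# Backslash escapes assumed for the TabSeparated format.
_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "0": "\0",
            "b": "\b", "f": "\f", "\\": "\\", "'": "'"}


def unescape_field(raw: str) -> Optional[str]:
    """Decode one TSV field; a bare \\N is treated as NULL."""
    if raw == "\\N":
        return None
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def parse_tsv_line(line: str) -> List[Optional[str]]:
    # Tabs inside values are escaped, so splitting on a raw tab is safe.
    return [unescape_field(f) for f in line.rstrip("\n").split("\t")]


class Transport(Protocol):
    """Anything that can run a query and yield raw TSV lines."""
    def execute(self, sql: str) -> Iterable[str]: ...


class InMemoryTransport:
    """Fake transport for tests; a real one would speak HTTP or TCP."""
    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def execute(self, sql: str) -> Iterable[str]:
        return iter(self._lines)


class Client:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def query(self, sql: str) -> Iterator[List[Optional[str]]]:
        # Rows are parsed lazily, one line at a time.
        for line in self._transport.execute(sql + " FORMAT TabSeparated"):
            yield parse_tsv_line(line)
```

For example, `Client(InMemoryTransport(["1\ta\\tb\n"]))` yields a single row whose second value contains a real tab character.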
> query string length limit (which can actually vary depending on the language used).
You can use POST and send the query in the POST body instead of the query string (I think most HTTP-based libraries do it like that).
There is no size limit for the POST body, and in POST you can also just send unencoded queries as plain text.
@filimonov, thanks. As far as I can see, not all libraries use POST :).
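A minimal sketch of the POST approach using only the Python standard library. The base URL assumes a local server on the default HTTP port 8123, and the helper names are made up for illustration.

```python
import urllib.request


def build_query_request(sql: str,
                        base_url: str = "http://localhost:8123/") -> urllib.request.Request:
    """Put the SQL text in the POST body as plain UTF-8, with no URL encoding."""
    return urllib.request.Request(base_url, data=sql.encode("utf-8"), method="POST")


def run_query(sql: str, base_url: str = "http://localhost:8123/") -> str:
    """Send the query and return the raw response text (needs a running server)."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        return resp.read().decode("utf-8")
```

Because the query travels in the body, its length is not constrained by URL length limits in the client, a proxy, or the server.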
@SkeLLLa these properties are kind of independent of each other:
Yandex internally has a whitelist of allowed programming languages. One of the advantages of this is that infrastructure teams like ClickHouse don't have to scatter resources to maintain client libraries for all possible languages/frameworks/etc. that exist in the world. This allows us to keep the libraries we have at relatively high quality, but it also makes it hard to justify hiring dedicated people to maintain libraries for technologies not on the whitelist, since nobody inside would be allowed to use them. In the long term this situation might change, but at the moment all relevant libraries we have are already public.
No such plans as of 2019.