Clickhouse: Any plans to add official libraries for languages other than C++?

Created on 15 Nov 2018 · 10 Comments · Source: ClickHouse/ClickHouse

Hello.

I wonder if there are any plans to write official client libraries that could, perhaps, use the TCP protocol instead of HTTP?
Right now, as far as I can see, there are various 3rd-party libs of not very good quality. As a result, all the performance and speed of ClickHouse is wasted inside those libs.

Perhaps some kind of C++ driver with native bindings to other languages?

question

All 10 comments

@SkeLLLa there are no specific short-term plans, but which libraries exactly have you tried?

P.S. There's also a JDBC driver for JVM-based products that is kind of official: http://github.com/yandex/clickhouse-jdbc

@blinkov thanks for reply.

I'm talking mostly about libraries that use the HTTP interface and don't use ODBC/JDBC.
I recently reviewed a couple of node.js drivers. In one case the time the lib adds is up to 2x the time ClickHouse takes to perform the query. So if the ClickHouse request takes 1s, the library can add another 2s for processing the response.
And all of them have several common problems:

  • they use HTTP, which has obvious overhead compared to TCP
  • they use a streaming interface, but consume responses in JSON, which is not quite a good format for streaming
  • JavaScript also adds some performance overhead
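To illustrate the second point, here is a minimal sketch of why a line-oriented format is easier to consume as a stream: each row can be emitted as soon as its newline arrives, without waiting for a closing bracket as with a single JSON document. It assumes the server returns ClickHouse's TabSeparated format; the function names `unescape` and `iter_rows` are made up for this example and are not part of any existing driver.

```python
from typing import Iterable, Iterator, List, Optional

# Backslash escapes used by the TabSeparated format.
_ESCAPES = {"b": "\b", "f": "\f", "r": "\r", "n": "\n",
            "t": "\t", "0": "\0", "'": "'", "\\": "\\"}


def unescape(field: str) -> Optional[str]:
    """Decode one field; the literal \\N stands for NULL."""
    if field == "\\N":
        return None
    if "\\" not in field:
        return field  # fast path: nothing to decode
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # Assumption: an unknown escape keeps the escaped character as-is.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def iter_rows(chunks: Iterable[bytes]) -> Iterator[List[Optional[str]]]:
    """Yield a row for every complete line, whatever the chunk boundaries are."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the tail is kept.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield [unescape(f) for f in line.decode("utf-8").split("\t")]
    if buffer:  # last row without a trailing newline
        yield [unescape(f) for f in buffer.decode("utf-8").split("\t")]
```

Raw tabs and newlines inside values are escaped in this format, so splitting on the newline byte is safe even when a chunk ends in the middle of a row.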

So I think it would be good if ClickHouse had a binary library written in C++/Go/Rust/whatever that is responsible for communicating with the database, preferably via TCP, with parsers built in. Other languages would then use bindings to it. Couchbase, for example, uses this approach: https://github.com/couchbase/couchnode.

@SkeLLLa, have you looked at https://github.com/artpaul/clickhouse-cpp?
Also, there is a TCP protocol driver for Go, which is used for high-performance data ingestion: https://github.com/kshvakov/clickhouse
There is a TCP protocol driver for .NET: https://github.com/killwort/ClickHouse-Net
And for Java: https://github.com/housepower/ClickHouse-Native-JDBC

These are all 3rd-party projects, so quality may be a concern, though they are used in production systems. The official JDBC driver is also very good and of production quality.

@alex-zaitsev thanks for suggestions.

My main goal is to have some good official lib aka driver and build libs for other langs around it.

As for JDBC, the main con is that it requires a JVM to run, so it's not a good dependency to have if you're building a library for node.js or PHP. The .NET client seems to be quite a young project, and C# is still not very cross-platform.
clickhouse-cpp looks quite good for making clients with bindings to it, except for one thing: https://github.com/artpaul/clickhouse-cpp/issues/21. Unless it supports an async interface, it will be difficult to create a lib with bindings for languages like JS (node.js) due to their async nature.

Another option is to try to write a good TCP-based client, but I haven't found any relevant docs about the TCP protocol.

> My main goal is to have some good official lib aka driver and build libs for other langs around it.
> Another option is to try to write a good TCP-based client, but I haven't found any relevant docs about the TCP protocol.

I like the idea of having such a client. But AFAIK it's not a priority: there are a lot of server tasks, and there are not enough resources for such 'nice to have' stuff.

Binary protocol details were intentionally not published, as the developers prefer to keep the ability to change the protocol in ways that break compatibility. So the idea was that external libs should not rely on the internal protocol (which can change) and should use plain HTTP instead.

For ClickHouse itself HTTP is not a bottleneck; the bottleneck is usually serializing data to TSV / JSON / another format, or API overhead on the library side. So if going to a lower level (like C / asm) is not a problem for you, you will most probably be able to solve the performance issues just by fixing those common bottlenecks (improving serialization, maybe changing the HTTP lib used, decreasing the API overhead, etc.).

If you have the resources and experience, you can help the project by creating such a driver. If it is of good quality, covered by tests, and documented, it can probably get 'official' or 'recommended' status. You can also contribute to artpaul/clickhouse-cpp, or to ClickHouse itself to add some important building blocks for HTTP communication that are still missing (like exchanging some meta-information about the query and results).

@filimonov thanks for reply.

HTTP is OK, but I don't find it very good because of its limitations:

  • query string length limit (which can actually vary depending on the language used).
  • query string parameter encoding can be a potential issue that is hard to predict.

That's why I was also looking at the TCP protocol, or for some library that provides an abstraction over it with a simple async interface, so that simple bindings can be created for other languages.

Also, if I have time I'll try to make a lib for node that will probably use TSV instead of JSON and will have some kind of transport abstraction, so that the underlying protocol can easily be replaced if something better becomes available later.
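A rough sketch of what such a transport abstraction might look like, written in Python only for brevity. All class and method names here (`Transport`, `InMemoryTransport`, `Client`) are hypothetical and do not come from any existing library; row parsing is deliberately naive and ignores escaping.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class Transport(ABC):
    """Anything that can send a query and hand back raw response chunks."""

    @abstractmethod
    def execute(self, query: str) -> Iterable[bytes]:
        ...


class InMemoryTransport(Transport):
    """Stand-in transport that replays a canned response; handy for tests."""

    def __init__(self, canned: bytes, chunk_size: int = 4) -> None:
        self.canned = canned
        self.chunk_size = chunk_size
        self.queries: List[str] = []

    def execute(self, query: str) -> Iterable[bytes]:
        self.queries.append(query)
        data, step = self.canned, self.chunk_size
        return [data[i:i + step] for i in range(0, len(data), step)]


class Client:
    """Depends only on the Transport interface, not on HTTP or TCP."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def query(self, sql: str) -> Iterator[List[str]]:
        buffer = b""
        for chunk in self._transport.execute(sql + " FORMAT TabSeparated"):
            buffer += chunk
            *lines, buffer = buffer.split(b"\n")
            for line in lines:
                yield line.decode("utf-8").split("\t")
        if buffer:
            yield buffer.decode("utf-8").split("\t")
```

An HTTP-based transport, or later a TCP-based one, would implement the same `execute` method, so `Client` would not need to change when the protocol does.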

> query string length limit (that could actually vary depending on language used).

You can use POST and send the query in the POST body instead of the query string (I think most HTTP-based libs do it like that).

For POST body there is no limitation on size, also in POST you can just send unencoded queries as plain text.
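As a minimal sketch of this, assuming a ClickHouse server listening on the default HTTP port 8123 on localhost, the query can go into the request body untouched. The helper names `build_query_request` and `run_query` are made up for this example.

```python
import urllib.request


def build_query_request(query: str,
                        base_url: str = "http://localhost:8123/") -> urllib.request.Request:
    """Put the SQL text into the POST body, so no URL encoding is needed."""
    return urllib.request.Request(
        base_url,
        data=query.encode("utf-8"),  # raw query text, not form-encoded
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def run_query(query: str, base_url: str = "http://localhost:8123/") -> bytes:
    """Send the query and return the raw response body (needs a running server)."""
    with urllib.request.urlopen(build_query_request(query, base_url)) as response:
        return response.read()
```

Characters such as `&`, `=` or spaces in the query need no special handling here, because nothing is placed in the URL.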

@filimonov, thanks. And as far as I can see not all libs use POST :).

@SkeLLLa these properties are kind of independent of each other:

  1. high quality usually means having a stable public interface, reasonable test coverage and infrastructure (including integration tests), easy-to-understand and sufficiently detailed documentation, a lack of serious issues, and so on.
  2. official usually just means that the client library is maintained by the same company as the main product (Yandex in our case).
  3. whether the TCP or HTTP protocol is used. For example, the official drivers for both ODBC and JDBC use HTTP, and it's rarely a bottleneck.

Yandex internally has a whitelist of allowed programming languages. One advantage of this is that infrastructure teams like ClickHouse don't have to scatter resources on maintaining client libraries for all possible languages/frameworks/etc. that exist in the world. This allows us to keep the libraries we have at relatively high quality, but it also makes it hard to justify hiring dedicated people to maintain libraries for technologies not on the whitelist, since nobody inside will be allowed to use them. In the long term this situation might change, but at the moment all relevant libraries we have are already public.

No such plans as of 2019.
