Clickhouse: UDF support

Created on 16 Jun 2016  路  7Comments  路  Source: ClickHouse/ClickHouse

It would be great if we could create user defined functions in ClickHouse. I don't know whether it has C++ API for that but a high level language such as V8 for Lua would be better since they allow us to create functions in runtime similar to PL/pgSQL and PL/SQL.

Since ClickHouse is an analytical database, users will want to perform complex analytical queries such as funnel and retention and implementing them in ANSI SQL (with joins, CTEs etc.) is quite inefficient. UDFs would help us to avoid expensive JOINS so it would be a huge win.

UDFs can be created using CREATE FUNCTION syntax similar to this one:

CREATE FUNCTION dummy_func() RETURNS int8 AS '
    return 1;
' LANGUAGE V8;

Implementing aggregate functions may be harder than scalar functions but even the support for scalar functions would be great.

feature st-community-taken

Most helpful comment

We have not support for UDFs yet for several reasons:

  1. To not fixate on some API. Internal interfaces are subjects to change. We want to have the ability to "break" the code.
  2. Interface of ordinary functions is quite complex: due to vector engine, functions are dispatched not on single values, but on arrays of values. It allows to unroll and vectorize loops; or for example, to run substring search on whole array. Also, functions provide different code path for constants and for non-constants - they depend on memory layout of data, for efficiency. It made interface quite ugly for using from foreign language. It is possible to provide some simple interface (without ability to vectorize) for foreign languages and it could be done in different ways... Whole idea remains reasonable for implementing 'dirty' temporary functions.

For now, it is better to add every new function directly to ClickHouse source. If function is not too specific, code will be merged.

All 7 comments

We have not support for UDFs yet for several reasons:

  1. To not fixate on some API. Internal interfaces are subjects to change. We want to have the ability to "break" the code.
  2. Interface of ordinary functions is quite complex: due to vector engine, functions are dispatched not on single values, but on arrays of values. It allows to unroll and vectorize loops; or for example, to run substring search on whole array. Also, functions provide different code path for constants and for non-constants - they depend on memory layout of data, for efficiency. It made interface quite ugly for using from foreign language. It is possible to provide some simple interface (without ability to vectorize) for foreign languages and it could be done in different ways... Whole idea remains reasonable for implementing 'dirty' temporary functions.

For now, it is better to add every new function directly to ClickHouse source. If function is not too specific, code will be merged.

Since CH already bundles Clang, how about UDFs in C/CPP? Scalar functions could be a good start...

Since CH already bundles Clang, how about UDFs in C/CPP? Scalar functions could be a good start...

First we will try to add dynamic code generation for compositions of simple functions.
For example, a * b + c will be compiled to single loop.

Only after we will think about adding C++ UDF.

First we will try to add dynamic code generation for compositions of simple functions.

FYI, dynamic code generation has been implemented.

This task is assigned to @ObjatieGroba and he will do research and proof-of-concepts on various variants of UDFs.

Some discussion around UDF in ClickHouse with code is here

Maybe this is a crazy idea, but that about the Julia language as UDF language? It's fast, can broadcast function calls to arrays out of box and can be embedded into c/c++ code.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

zhicwu picture zhicwu  路  3Comments

innerr picture innerr  路  3Comments

vixa2012 picture vixa2012  路  3Comments

fizerkhan picture fizerkhan  路  3Comments

derekperkins picture derekperkins  路  3Comments