Clickhouse: function extract how to use (?<=exp)

Created on 28 Sep 2018  Β·  2Comments  Β·  Source: ClickHouse/ClickHouse

use extract to match https://github.com/yandex/ClickHouse/issues/new;
re:(?<=://)[A-Za-z0-9.]+(?=/)
error:

:) SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '(?<=://)[A-Za-z0-9.]+(?=/)')

SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '(?<=://)[A-Za-z0-9.]+(?=/)')

Received exception from server (version 1.1.54362):
Code: 1000. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Exception: OptimizedRegularExpression: cannot compile re2: (?<=://)[A-Za-z0-9.]+(?=/), error: invalid perl operator: (?<.

question

Most helpful comment

Clickhouse uses re2 engine for regexp matching. It support only some subset of PCRE.
You can find docs about re2 syntax here: https://github.com/google/re2/wiki/Syntax
As you can see lookaround expressions are not supported.
In your case you actually don't need lookbehind / lookahead expressions. Just mark capture group with parentheses:

SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')

β”Œβ”€extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')─┐
β”‚ github.com                                                                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

P.S. For manipulating urls in clickhouse there are a lot of nice functions.

SELECT domain('https://github.com/yandex/ClickHouse/issues/new')

β”Œβ”€domain('https://github.com/yandex/ClickHouse/issues/new')─┐
β”‚ github.com                                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

All 2 comments

Clickhouse uses re2 engine for regexp matching. It support only some subset of PCRE.
You can find docs about re2 syntax here: https://github.com/google/re2/wiki/Syntax
As you can see lookaround expressions are not supported.
In your case you actually don't need lookbehind / lookahead expressions. Just mark capture group with parentheses:

SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')

β”Œβ”€extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')─┐
β”‚ github.com                                                                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

P.S. For manipulating urls in clickhouse there are a lot of nice functions.

SELECT domain('https://github.com/yandex/ClickHouse/issues/new')

β”Œβ”€domain('https://github.com/yandex/ClickHouse/issues/new')─┐
β”‚ github.com                                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

thanks

Was this page helpful?
0 / 5 - 0 ratings

Related issues

amonakhov picture amonakhov  Β·  3Comments

vixa2012 picture vixa2012  Β·  3Comments

bseng picture bseng  Β·  3Comments

fizerkhan picture fizerkhan  Β·  3Comments

derekperkins picture derekperkins  Β·  3Comments