Clickhouse: function extract how to use (?<=exp)

Created on 28 Sep 2018 · 2Comments · Source: ClickHouse/ClickHouse

use extract to match https://github.com/yandex/ClickHouse/issues/new;
re:(?<=://)[A-Za-z0-9.]+(?=/)
error:

:) SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '(?<=://)[A-Za-z0-9.]+(?=/)')

SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '(?<=://)[A-Za-z0-9.]+(?=/)')

Received exception from server (version 1.1.54362):
Code: 1000. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Exception: OptimizedRegularExpression: cannot compile re2: (?<=://)[A-Za-z0-9.]+(?=/), error: invalid perl operator: (?<.

question

Source

qietingfengling

Most helpful comment

Clickhouse uses re2 engine for regexp matching. It support only some subset of PCRE.
You can find docs about re2 syntax here: https://github.com/google/re2/wiki/Syntax
As you can see lookaround expressions are not supported.
In your case you actually don't need lookbehind / lookahead expressions. Just mark capture group with parentheses:

SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')

┌─extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')─┐
│ github.com                                                                        │
└───────────────────────────────────────────────────────────────────────────────────┘

P.S. For manipulating urls in clickhouse there are a lot of nice functions.

SELECT domain('https://github.com/yandex/ClickHouse/issues/new')

┌─domain('https://github.com/yandex/ClickHouse/issues/new')─┐
│ github.com                                                │
└───────────────────────────────────────────────────────────┘

filimonov on 28 Sep 2018

👍3

All 2 comments

SELECT extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')

┌─extract('https://github.com/yandex/ClickHouse/issues/new', '://([A-Za-z0-9.]+)/')─┐
│ github.com                                                                        │
└───────────────────────────────────────────────────────────────────────────────────┘

P.S. For manipulating urls in clickhouse there are a lot of nice functions.

SELECT domain('https://github.com/yandex/ClickHouse/issues/new')

┌─domain('https://github.com/yandex/ClickHouse/issues/new')─┐
│ github.com                                                │
└───────────────────────────────────────────────────────────┘

filimonov on 28 Sep 2018

👍3

thanks

qietingfengling on 29 Sep 2018

Was this page helpful?

0 / 5 - 0 ratings