Crystal: Support for \P, \p, and \X has not been compiled

Created on 27 May 2016 · 7 Comments · Source: crystal-lang/crystal

Hi all,
I'm trying to use a regex to find UTF-8 letters.
When I try to run:

"string".scan(/\p{L}/)

I get the following error:

invalid regex: support for \P, \p, and \X has not been compiled 

What can I do to use Unicode matchers like \p?

Thank you !


All 7 comments

To add some minimal information for tracking what is needed to support this:

Since we use PCRE for regexes, here is the relevant section of http://www.pcre.org/pcre.txt:

UNICODE CHARACTER PROPERTY SUPPORT

       UTF support allows the libraries to process character codepoints up to 0x10ffff in the strings that they handle. On its own, however, it does not provide any facilities for accessing the properties of such characters. If you want to be able to use the pattern escapes \P, \p, and \X, which refer to Unicode character properties, you must add

         --enable-unicode-properties

       to the configure command. This implies UTF support, even if you have not explicitly requested it.

       Including Unicode property support adds around 30K of tables to the PCRE library. Only the general category properties such as Lu and Nd are supported. Details are given in the pcrepattern documentation.

@arambert, maybe you can build Crystal on your system after installing libpcre with the above flag and see if it works. The error message suggests that the regex reached libpcre.

Check http://crystal-lang.org/docs/installation/from_source_repository.html and http://crystal-lang.org/docs/installation/on_mac_osx_using_homebrew.html for information on how to build Crystal from source.
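To "see if it works" without crashing the program, one option is to compile the pattern at runtime. The helper below is a hypothetical sketch, not part of the standard library. It relies on `Regex.new` raising `ArgumentError` when PCRE rejects a pattern. On a build whose libpcre lacks `--enable-unicode-properties`, the helper should return false.

```crystal
# Hypothetical helper (not part of Crystal's stdlib): reports whether the PCRE
# library linked into this Crystal build can compile Unicode property escapes.
# Regex.new compiles the pattern at runtime and raises ArgumentError if PCRE
# rejects it, e.g. when \p was not compiled in.
def unicode_properties_supported? : Bool
  Regex.new("\\p{L}")
  true
rescue ArgumentError
  false
end

if unicode_properties_supported?
  puts "\\p{...} escapes are available"
else
  puts "PCRE was built without Unicode property support"
end
```

Because the pattern is a runtime string rather than a regex literal, a program built against an older libpcre can still detect the problem and report it.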

This is nothing we can fix in the standard library, right? It's a question for your distro and possibly for our omnibus project.

👍 Installing PCRE from source with --enable-unicode-properties (enabled by default with Homebrew) and then installing Crystal from source (also with Homebrew) did the trick:

brew install --build-from-source pcre
# remove the previous Crystal version if necessary, then:
brew install --build-from-source crystal-lang

When Crystal was installed from a Homebrew bottle (as a binary), this problem occurred.
Is there a way to include PCRE with Unicode properties in the binaries?

I guess we'd just need to pass that flag when we build crystal for distribution: https://github.com/crystal-lang/omnibus-crystal/blob/master/config/software/pcre.rb

Maybe off topic, but I guess in the end we should conform to RE2, which I think is becoming a kind of standard for regular expressions, rather than depend on PCRE. (I mean, we could still use PCRE, but eventually replace it with something else that also conforms to RE2.)

This is now working in 0.18.0: https://play.crystal-lang.org/#/r/11pc
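Here is a small sketch of the original request working on a build with Unicode property support (the comment above reports this for 0.18.0). The sample string is illustrative. It uses `\p{L}` (any letter) and `\p{Lu}` (uppercase letter), which are general categories of the kind the PCRE documentation quoted earlier says are supported.

```crystal
# Sample text mixing accented letters, spaces and digits.
text = "Crème brûlée 42"

# String#scan returns an Array(Regex::MatchData); m[0] is the whole match.
letters = text.scan(/\p{L}/).map { |m| m[0] }
uppercase = text.scan(/\p{Lu}/).map { |m| m[0] }

puts letters.join # => Crèmebrûlée
puts uppercase    # => ["C"]
```

Spaces and the digits `42` are skipped because they are not in category L.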

Awesome! Thanks.
