Hi all,
I'm trying to use a regex to match Unicode letters in a UTF-8 string.
When I try to run:
"string".scan(/\p{L}/)
I get the following error:
invalid regex: support for \P, \p, and \X has not been compiled
What can I do to use Unicode property escapes like \p?
Thank you!
Adding a bit of information to track what is needed to support this.
Since we use PCRE for regex, here is the relevant section from http://www.pcre.org/pcre.txt:
UNICODE CHARACTER PROPERTY SUPPORT
UTF support allows the libraries to process character codepoints up to
0x10ffff in the strings that they handle. On its own, however, it does
not provide any facilities for accessing the properties of such characters.
If you want to be able to use the pattern escapes \P, \p, and \X,
which refer to Unicode character properties, you must add
--enable-unicode-properties
to the configure command. This implies UTF support, even if you have
not explicitly requested it.
Including Unicode property support adds around 30K of tables to the
PCRE library. Only the general category properties such as Lu and Nd
are supported. Details are given in the pcrepattern documentation.
@arambert maybe you can build Crystal on your system after installing libpcre with the above flag and see if it works. The error message suggests that the regex did reach libpcre.
Check http://crystal-lang.org/docs/installation/from_source_repository.html and http://crystal-lang.org/docs/installation/on_mac_osx_using_homebrew.html for information on how to build Crystal from source.
This is nothing we can fix in the standard library, right? It's a question for your distro and possibly our omnibus project.
👍 Installing pcre from source with --enable-unicode-properties (the default with Homebrew) and then installing Crystal from source (also with Homebrew) did the trick.
brew install --build-from-source pcre
# remove the previous Crystal version if necessary, then:
brew install --build-from-source crystal-lang
When Crystal was installed from a Homebrew bottle (as a binary), this problem occurred.
Is there a way to include PCRE with Unicode properties in the binaries?
I guess we'd just need to pass that flag when we build crystal for distribution: https://github.com/crystal-lang/omnibus-crystal/blob/master/config/software/pcre.rb
Maybe off topic, but I guess in the end we should conform to RE2, which I think is becoming a kind of standard for regular expressions, and not depend on PCRE (I mean, we could still use PCRE, but eventually replace it with something else that still conforms to RE2).
This is now working in 0.18.0: https://play.crystal-lang.org/#/r/11pc
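For anyone who wants to check their own install, here is a minimal sketch of the kind of test that should pass once the bundled PCRE has Unicode property support. The sample string and variable names are made up for illustration and are not taken from the playground link above.

```crystal
# Sample input (an assumption for illustration): accented letters, a space and digits.
text = "héllo wörld 42"

# \p{L} matches any character in the Unicode general category "Letter".
# Each match is a single character; m[0] is the matched text.
letters = text.scan(/\p{L}/).map { |m| m[0] }

# With Unicode property support this prints only the letters,
# without the space or the digits.
puts letters.join
```

If PCRE was built without --enable-unicode-properties, this is expected to fail with the "support for \P, \p, and \X has not been compiled" error quoted at the top of the thread instead of printing anything.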
Awesome! Thanks.