Crystal: Regex can't match utf-8 characters

Created on 12 Jul 2017 · 3 comments · Source: crystal-lang/crystal

Hello,

I was trying to match some utf-8 characters but it doesn't seem to work.

/[[:alnum:]]/x.match "à" # => nil

It works in Ruby and seems compliant with PCRE. Is this a bug in Crystal, or is there a subtlety I didn't notice here?
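For comparison, here is a minimal Ruby sketch of the behavior described above. The helper name `posix_alnum?` is made up for illustration and is not part of the issue; the accented character is written as an escape so the result does not depend on the source file encoding.

```ruby
# Hypothetical helper (not from the issue): true when the string contains
# at least one character matched by the POSIX [[:alnum:]] bracket.
def posix_alnum?(str)
  !/[[:alnum:]]/x.match(str).nil?
end

accented = "\u00E0" # the character "à" from the report

puts posix_alnum?(accented) # true: Ruby's engine treats the bracket as Unicode-aware
puts posix_alnum?("a")      # true: plain ASCII letter
puts posix_alnum?("-")      # false: punctuation is not alphanumeric
```

The same pattern returning nil in Crystal is what prompted the report.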

Note: I also wrote a malformed pattern by mistake, /[class]]/, and it did not raise any error. Maybe escaping the ']' should be mandatory, whether it appears inside or outside the class.
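To show why the stray bracket goes unnoticed, here is a small Ruby sketch of how such a pattern is read. It shows Ruby's behavior only; the issue does not confirm that Crystal parses it identically, and the helper `matches?` is a made-up name. Ruby may print a warning about the unescaped ']' on stderr, but it still compiles the pattern.

```ruby
# Hypothetical helper (not from the issue): compile the pattern at run time
# and report whether it matches anywhere in the text.
def matches?(pattern, text)
  !Regexp.new(pattern).match(text).nil?
end

unescaped = "[class]]"   # parsed as the class [class] followed by a literal "]"
escaped   = "[class\\]]" # one class whose members include "]"

puts matches?(unescaped, "s]") # true: one class character, then "]"
puts matches?(unescaped, "s")  # false: the trailing "]" is missing
puts matches?(unescaped, "]")  # false: no class character precedes the "]"
puts matches?(escaped, "]")    # true: "]" is a member of the class
```

So the unescaped form is a valid pattern with a different meaning, which is why no error is raised.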

All 3 comments

This behavior is caused by a missing option:

class Regex
  @[Flags]
  enum Options
    # ... existing options ...
    PCRE_UCP = 0x20000000
  end
end

But this option requires a PCRE build with UCP support.

The pcre brew formula on macOS passes the '--enable-unicode-properties' option, but for some reason PCRE_UCP still failed. This needs more investigation.

See also https://gist.github.com/jweyrich/9803969

P.S.: Ruby works fine because it is built with a special Oniguruma branch (Onigmo), not PCRE.

Looks like somebody should replace PCRE with Onigmo.

I think we could at least provide Regex::Options::UCP so people can use it if necessary. We just need to document that this option requires a compatible libpcre.
