Julia: Issue with eachmatch using unicode and regex pipes

Created on 17 Apr 2018 · 2 Comments · Source: JuliaLang/julia

This call doesn't terminate (eachmatch produces an infinite iterator):
collect(eachmatch(r"^$|\S", "ö"))

The following calls do terminate:

collect(eachmatch(r"\S", "ö"))
collect(eachmatch(r"^$", "ö"))
match(r"^$|\S", "ö")
matchall(r"^$|\S", "ö")
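
One way to inspect the iterator without hanging the session is to cap the number of results. The following is only a sketch: it assumes Julia 0.7 or later, where `Iterators` is available by that name, and the cap of 5 is an arbitrary value chosen for illustration.

```julia
# Cap the iterator so an endless stream of matches cannot hang the REPL.
# The limit of 5 is an arbitrary choice for illustration.
limit = 5
matches = collect(Iterators.take(eachmatch(r"^$|\S", "ö"), limit))

# On an affected build the vector is expected to fill up to `limit`;
# on a build that contains the fix it should hold a single match for "ö".
println(length(matches))
foreach(m -> println(repr(m.match)), matches)
```

If the count equals the cap, the iterator is most likely not terminating on its own.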

Related issue(?): https://github.com/JuliaLang/julia/issues/26199

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Phenom(tm) II X4 965 Processor
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Barcelona)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, amdfam10)
Labels: bug, strings

All 2 comments

It seems like this line should be calling ncodeunits rather than lastindex.
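
To make the distinction concrete: `lastindex` returns the index at which the final character starts, while `ncodeunits` returns the total number of code units (bytes, for a UTF-8 `String`). A small sketch, assuming Julia 0.7 or later, where both functions exist; the variable names are illustrative only:

```julia
# Assumes Julia 0.7 or later, where both lastindex and ncodeunits exist.
s = "ö"                           # U+00F6, encoded as two bytes in UTF-8

last_char_index = lastindex(s)    # index where the final character starts
code_unit_count = ncodeunits(s)   # total number of code units (bytes)

println("lastindex  = ", last_char_index)
println("ncodeunits = ", code_unit_count)

# For an ASCII-only string the two values coincide, which can hide the
# difference when tests only use ASCII input.
println(lastindex("o") == ncodeunits("o"))
```

For "ö" the two values differ by one, so a bounds check written against `lastindex` stops one byte short of the true end of the string.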

Yup, it seems like that fixes it. Will submit a PR shortly.

(It's pretty surprising that no one has noticed it before now, since it seems like there is a problem whenever the end of a match is not an ASCII character, and the problem has been in the code for years.)
