Julia: Unexpected behaviour in `matchall`

Created on 14 Feb 2018 · 5 Comments · Source: JuliaLang/julia

For example, the code below returns each whole match (the captured group together with the surrounding `<tag>` markup) as a SubString, rather than just the captured group that was requested.

matchall(r"<tag>(.*?)</tag>", "<tag>I</tag> <tag>like</tag> <tag>writing</tag> <tag>code</tag>")
# 4-element Array{SubString{String},1}:
#  "<tag>I</tag>"      
#  "<tag>like</tag>"   
#  "<tag>writing</tag>"
#  "<tag>code</tag>"

I expected it to return:

"I"
"like"
"writing"
"code"

Or maybe:

["I"]
["like"]
["writing"]
["code"]

The second form would account for multiple possible captures in each match.

I am currently using the eachmatch function as an alternative.
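
A minimal sketch of that workaround, assuming only the first capture group of each match is wanted. The variable names `text`, `rx`, and `first_captures` are illustrative and not part of the original report:

```julia
# Same input and pattern as in the report above.
text = "<tag>I</tag> <tag>like</tag> <tag>writing</tag> <tag>code</tag>"
rx = r"<tag>(.*?)</tag>"

# eachmatch yields one RegexMatch per match; `captures[1]` is the first
# parenthesised group, without the surrounding <tag>...</tag> markup.
first_captures = [m.captures[1] for m in eachmatch(rx, text)]

println(first_captures)
```

Replacing `m.captures[1]` with `m.captures` would give the second form, one vector of captures per match.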

search & find

dataPulverizer

Most helpful comment

Let's just deprecate it. We can add something else back later.

StefanKarpinski on 14 Feb 2018

All 5 comments

AFAICT it works as documented, i.e. it returns m.match rather than m.captures. I guess the docs could be made more explicit.

As you note, the problem with returning captures is that there can be several of them, and returning a vector of vectors would be inefficient. In that case, it is better to use eachmatch and handle the multiple captures as appropriate.
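
To illustrate the difference between `m.match` and `m.captures` when a pattern has more than one group, here is a small sketch. The input string, the pattern, and the name `key_values` are made up for illustration and do not come from the thread:

```julia
# Hypothetical input with two capture groups per match: a key and a number.
text = "a=1 b=22"
rx = r"(\w+)=(\d+)"

# `m.match` is the whole matched substring; `m.captures` holds the groups.
m = match(rx, text)
println(m.match)     # the whole first match
println(m.captures)  # the two groups of the first match

# With eachmatch, the caller decides how to combine the captures,
# here as (key, value) tuples.
key_values = [(m.captures[1], m.captures[2]) for m in eachmatch(rx, text)]
println(key_values)
```

Handling the groups at the call site like this avoids committing `matchall` to one particular return shape.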

We could also deprecate matchall since the best behavior isn't completely obvious. It could be changed to return a vector of RegexMatch objects, equivalent to collect(eachmatch(...)). That would be consistent with the to-be-added findeach function proposed in the Search & Find Julep. But then for full consistency match should be called matchfirst.
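
As a rough sketch of what that alternative would look like from the caller's side, `collect(eachmatch(...))` already gives a vector of RegexMatch objects, from which either the whole matches or the captures can be extracted. The names `ms`, `whole`, and `caps` are illustrative:

```julia
text = "<tag>I</tag> <tag>like</tag> <tag>writing</tag> <tag>code</tag>"
rx = r"<tag>(.*?)</tag>"

# Collect the lazy iterator into a vector of RegexMatch objects.
ms = collect(eachmatch(rx, text))

# The whole matches, i.e. what matchall is reported to return above.
whole = [m.match for m in ms]

# The captures of each match, one vector per match.
caps = [m.captures for m in ms]

println(whole)
println(caps)
```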