For example, the code below returns the whole match (including the surrounding expression) as a SubString, rather than just the group captured by the parentheses.
matchall(r"<tag>(.*?)</tag>", "<tag>I</tag> <tag>like</tag> <tag>writing</tag> <tag>code</tag>")
# 4-element Array{SubString{String},1}:
# "<tag>I</tag>"
# "<tag>like</tag>"
# "<tag>writing</tag>"
# "<tag>code</tag>"
I would expect it to return:
"I"
"like"
"writing"
"code"
Or maybe, to account for multiple possible captures in each match:
["I"]
["like"]
["writing"]
["code"]
I am currently using the eachmatch function as an alternative.
AFAICT it works as documented, i.e. it returns m.match rather than m.captures. I guess the docs could be made more explicit.
As you note, the problem with returning captures is that there can be several of them, and it's going to be inefficient to return a vector of vectors. In that case, it's better to use eachmatch and handle the multiple captures as appropriate.
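A minimal sketch of that eachmatch approach, reusing the regex and string from the report above. The variable names (s, re, firsts, allcaps) are only illustrative, and taking captures[1] assumes the pattern has at least one capture group:

```julia
s  = "<tag>I</tag> <tag>like</tag> <tag>writing</tag> <tag>code</tag>"
re = r"<tag>(.*?)</tag>"

# m.match is the whole "<tag>...</tag>" text; m.captures holds the groups.
# Take only the first capture group of every match:
firsts = [m.captures[1] for m in eachmatch(re, s)]

# Or keep every capture group of every match (a vector of vectors):
allcaps = [m.captures for m in eachmatch(re, s)]
```

With a single group, firsts holds "I", "like", "writing" and "code", while allcaps holds one one-element vector per match.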
We could also deprecate matchall since the best behavior isn't completely obvious. It could be changed to return a vector of RegexMatch objects, equivalent to collect(eachmatch(...)). That would be consistent with the to-be-added findeach function proposed in the Search & Find Julep. But then for full consistency match should be called matchfirst.
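For illustration, here is roughly what the collect(eachmatch(...)) form gives, next to match for comparison. This is only a sketch using the example from the report; the variable names are made up:

```julia
s  = "<tag>I</tag> <tag>like</tag> <tag>writing</tag> <tag>code</tag>"
re = r"<tag>(.*?)</tag>"

# All matches as RegexMatch objects, not plain substrings
matches = collect(eachmatch(re, s))

# match returns a single RegexMatch (or nothing if there is no match)
first_match = match(re, s)

# Both the whole match and the captures are available on each element
whole    = [m.match for m in matches]
captured = [m.captures[1] for m in matches]
```

Here every element of matches has the same type as the result of match, so the whole match and the captures can both be read from the same object.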
Another issue to consider is the return type discrepancy between match and matchall.
I find that confusing, despite the correct documentation.
@sschelm that's a good point! matchall should return an Array{RegexMatch}.
Let's just deprecate it. We can add something else back later.