by denys.seguret:
ReplaceAllStringFunc is useful when you need to process the match to compute the
replacement, but sometimes you need to match a bigger string than the one you want to
replace. A similar function able to replace submatch(es) seems necessary.
Let's say you have strings like
input := `bla b:foo="hop" blabla b:bar="hu?"`
and you want to replace the part between quotes in b:foo="hop" and
b:bar="hu?" using a function.
It's easy to build a regular expression to get the match and submatch, for example
r := regexp.MustCompile(`\bb:\w+="([^"]+)"`)
but when you use ReplaceAllStringFunc, the callback is only provided the whole match,
not the submatch, and must return the whole string. Practically this means you need to
execute the regexp (or another one) in the callback, for example like this:
input := `bla bla b:foo="hop" blablabla b:bar="hu?"`
r := regexp.MustCompile(`(\bb:\w+=")([^"]+)`)
fmt.Println(r.ReplaceAllStringFunc(input, func(m string) string {
parts := r.FindStringSubmatch(m)
return parts[1] + complexFunc(parts[2])
}))
I think a function ReplaceAllStringSubmatchFunc would be useful and would avoid the
second pass. The callback would receive the submatch and return the replacement of the
submatch. The last example would be rewritten as
input := `bla bla b:foo="hop" blablabla b:bar="hu?"`
r := regexp.MustCompile(`\bb:\w+="([^"]+)"`)
fmt.Println(r.ReplaceAllStringSubmatchFunc(input, complexFunc))
A similar function (ReplaceAllStringSubmatchSliceFunc?) could be designed to give the
callback a slice of strings that the callback would change. In fact, it could be decided
that only this last function is really necessary.
Links:
- "How-to" question on Stack Overflow: http://stackoverflow.com/q/17065465/263525
- Playground link: http://play.golang.org/p/I6Pg8OUeTj
CL https://golang.org/cl/106360043 mentions this issue.
I just hit this issue as well. Does "Unplanned" mean this is unlikely to get worked on?
I'm also including some information on my use-case, in case that helps.
I'm trying to transform log lines containing key-value pairs, redacting any string values. So for example:
name: "Joe", last_name: "Bloggs", age: 5, nickname: "Jogs" }
might become:
name: "SOME_HASH", last_name: "SOME_HASH", age: 5, $comment: "do not redact me", nickname: "SOME_HASH" }
I only want to target quoted strings that are followed by either , (comma) or } (closing curly-braces), and I also want to ignore any $comment fields.
I know that Go's regexp doesn't have lookaheads/lookbehinds, which means I can't use those to check for the above. That restricts me somewhat. However, I figured I'd just capture everything using a regex like this:
quoted_string_regex, _ := regexp.Compile(`(\$comment: )?"([^"]*)"[,| }]`)
and then check the actual subgroups to see if $comment was there, and also grab out the comma or curly-brace, and put that back on at the end.
However, I'm using ReplaceAllStringFunc, which only gives you the entire match, so it seems like I either need to run a second regex inside my callback function, or I need to do a bunch of contains/splits/ends-with etc.
(Obviously, if I've missed something obvious that is available in Go, please feel free to correct the above).
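For what it's worth, this use case can be handled in one pass with FindAllStringSubmatchIndex, which reports the position of every subgroup. The sketch below makes some assumptions: redact is a hypothetical helper, the hash callback is a placeholder that returns a fixed string, and the trailing part of the pattern is written as `\s*[,}]` (inside a character class, `|` and space are literal characters, so `[,| }]` would also accept a pipe or a bare space).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Group 1 is the optional "$comment: " prefix, group 2 is the quoted value.
var quoted = regexp.MustCompile(`(\$comment: )?"([^"]*)"\s*[,}]`)

// redact is a hypothetical helper: it replaces the contents of every quoted
// string that is followed by ',' or '}' with hash(contents), except when the
// string belongs to a $comment field.
func redact(line string, hash func(string) string) string {
	var b strings.Builder
	last := 0 // end of the text already copied to b
	for _, loc := range quoted.FindAllStringSubmatchIndex(line, -1) {
		if loc[2] >= 0 {
			continue // group 1 matched: a $comment field, leave it alone
		}
		// loc[4] and loc[5] delimit group 2, the text between the quotes.
		b.WriteString(line[last:loc[4]])
		b.WriteString(hash(line[loc[4]:loc[5]]))
		last = loc[5]
	}
	b.WriteString(line[last:])
	return b.String()
}

func main() {
	line := `{ name: "Joe", last_name: "Bloggs", age: 5, $comment: "do not redact me", nickname: "Jogs" }`
	// Placeholder hash: a real one would derive the result from its argument.
	placeholder := func(string) string { return "SOME_HASH" }
	fmt.Println(redact(line, placeholder))
}
```

Because only the span of group 2 is rewritten, the comma or closing brace does not have to be captured and put back by hand.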
Does "Unplanned" mean this is unlikely to get worked on?
Unplanned just means that this won't potentially block a release. I know that @michaelmatloob has been looking at regexp stuff recently; perhaps he is interested.
Just wanted to add that I hit the very same issue today. I was trying to implement a simple tag replacement, e.g.
Name: {name}
First name: {firstname}
becomes
Name: Doe
First name: Jon
I'm coming from a Perl background; my first intuition was to use a regexp like /{([^}]+)}/. Note the submatch in parentheses: in Perl, it would be possible to use replace (and call a function on the submatch) or use split (and get the submatches returned). In Go, Split never returns the part that matches, and ReplaceAllStringFunc passes the complete match to the callback instead of just the submatch.
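For this particular case, where the delimiters are single known characters, a workaround without a second regexp pass is to slice the braces off the match inside the callback. A minimal sketch, assuming a hypothetical fill helper and a plain map of tag values:

```go
package main

import (
	"fmt"
	"regexp"
)

// The braces are escaped so they are always treated as literal characters.
var tag = regexp.MustCompile(`\{([^}]+)\}`)

// fill is a hypothetical helper: it replaces every {key} in tmpl with
// values[key] and leaves tags with unknown keys untouched.
func fill(tmpl string, values map[string]string) string {
	return tag.ReplaceAllStringFunc(tmpl, func(m string) string {
		key := m[1 : len(m)-1] // strip the surrounding braces
		if v, ok := values[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	tmpl := "Name: {name}\nFirst name: {firstname}"
	fmt.Println(fill(tmpl, map[string]string{"name": "Doe", "firstname": "Jon"}))
	// prints:
	// Name: Doe
	// First name: Jon
}
```

This only works because the text around the submatch has a fixed length; with a pattern like the b:foo="..." one from the original report, the callback would still need the submatch positions.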
I'm not planning on working on this. If you're interested in contributing this, feel free to do so, but note that the freeze will start in a few days.
Is this issue solved by Regexp.Expand and Regexp.ExpandString?
@AlekSi
I guess not, at least not in a straightforward way. The number of variables in the expand template is limited, whereas the number of matches in a string isn't.
I came across this post by Elliot Chance; it solved a JavaScript to Go porting problem I was having (for consistency, it would be nice if it were incorporated as a new method in the Go regexp package):
http://elliot.land/post/go-replace-string-with-regular-expression-callback
Gist here: https://gist.github.com/elliotchance/d419395aa776d632d897
Thanks for the link @srackham - I hit exactly the same problem with trying to port something from JavaScript to Go. It would definitely be nice to see this functionality inside the standard regexp package.
I also found another project which appears to implement similar functionality in perhaps a cleaner way because it replaces the default regexp: https://github.com/agext/regexp
This gives some idea of how the solution could look: https://github.com/agext/regexp/blob/master/agext.go#L105
Here is a snippet for anyone else looking for a way to replace submatches with a function using bytes (not strings) and without having to deal with intermediate (non-captured) data: https://gist.github.com/slimsag/14c66b88633bd52b7fa710349e4c6749