Julia: split() enhancement, keep_splitter

Created on 16 Feb 2017  ·  3 Comments  ·  Source: JuliaLang/julia

Hello,

I would like to propose a feature, which I'm currently working on in a new branch, for the split(str, splitter; limit=0, keep=true) function: a keep_splitter keyword.

I found myself trying to use the function this way, and this is roughly what I would expect:

julia> split("abcabcdabbcd", "b"; keep_splitter = true)
5-element Array{SubString{String},1}:
 "a"
 "bca"
 "bcda"
 "b"
 "bcd"

Should I keep on working? Would it be a breaking change for the rest of the ecosystem?
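
To make the expected behaviour concrete, here is a minimal sketch written as a standalone helper rather than as a change to split itself. The name split_keep is made up for this illustration and is not part of Base, and the sketch assumes Julia 1.3 or later, where findall accepts a string pattern:

```julia
# Hypothetical helper (not part of Base): split `str` at every occurrence of the
# literal `delim`, keeping each delimiter at the start of the piece that follows it.
function split_keep(str::String, delim::String)
    isempty(delim) && throw(ArgumentError("delim must not be empty"))
    pieces = SubString{String}[]
    start = firstindex(str)
    for r in findall(delim, str)      # non-overlapping index ranges of each match
        cut = first(r)                # cut just before the delimiter
        if cut > start                # skip the empty piece before a leading delimiter
            push!(pieces, SubString(str, start, prevind(str, cut)))
            start = cut
        end
    end
    # Whatever is left after the last cut forms the final piece.
    start <= lastindex(str) && push!(pieces, SubString(str, start))
    return pieces
end

println(split_keep("abcabcdabbcd", "b"))
```

Because every cut lands on a delimiter, no piece can be empty, which is why a flag for dropping empty results would have nothing to do here.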

Some ideas:

  • If the keep_splitter flag is true, it should make no difference whether keep (which controls whether empty results are kept) is true or false, since there would be no empty results at all.
  • I'm not sure whether to add another flag that controls whether the splitter is attached to the following substring or to the preceding one. This is how readlines(file) works, isn't it? Each element of the array has '\n' at the end.
julia> split("abcabcdabbcd", "b"; keep_splitter = true, prepend = false)
5-element Array{SubString{String},1}:
 "ab"
 "cab"
 "cdab"
 "b"
 "cd"

Thanks in advance!

PS: maybe this can already be done with another function that I don't know about yet.
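
One existing candidate is split with a regex splitter that uses a zero-width lookaround, which matches a position instead of consuming the delimiter. A small sketch, assuming a current Julia 1.x release (the variable names are only for illustration):

```julia
str = "abcabcdabbcd"

# Lookahead: split at every position that is followed by "b",
# so each "b" stays at the start of the next piece.
prepended = split(str, r"(?=b)")

# Lookbehind: split at every position that is preceded by "b",
# so each "b" stays at the end of the previous piece.
appended = split(str, r"(?<=b)")

println(prepended)
println(appended)
```

This only works when the splitter can be written as a regex, so it does not cover splitting on a predicate function.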

collections design strings

All 3 comments

Related to a possible splitlines function (https://github.com/JuliaLang/julia/pull/20390), and to the new chomp argument to readline/eachline (https://github.com/JuliaLang/julia/pull/19944).

Cc: @mpastell

Note that there is another way of looking at the current keep keyword: setting keep=false effectively says that your splitter can be repeated one or more times, i.e. it implicitly wraps the splitter in a (...)+ as a regex.
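
A small sketch of that equivalence, assuming a Julia 1.x release, where the keyword called keep in this thread is spelled keepempty. The sample string is made up:

```julia
s = "a,,b,,,c"

# Default: every single comma is a boundary, so repeated commas yield empty pieces.
with_empty = split(s, ",")

# Dropping the empty pieces ...
no_empty = split(s, ","; keepempty=false)

# ... gives the same pieces here as treating a run of commas as one splitter.
run_split = split(s, r"(?:,)+")

println(with_empty)
println(no_empty)
println(run_split)
```

The two forms differ at the ends of the string: with a leading or trailing splitter the regex form still returns an empty piece there, while keepempty=false drops it.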

This also crops up when trying to split camel case strings:

foo = "ThisShouldBeSeparate"
split(foo, isuppercase)
5-element Array{SubString{String},1}:
 ""
 "his"
 "hould"
 "e"
 "eparate"

A solution is to use a regex delimiter: split(foo, r"(?=[A-Z])") but that's far less intuitive (and I wouldn't have solved it without outside help).
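
For anyone landing here with the same camel-case problem, here is a short sketch of the lookahead approach next to an alternative that collects the words directly with eachmatch. It assumes a Julia 1.x release and ASCII-only input, since [A-Z] and [a-z] do not cover other alphabets:

```julia
foo = "ThisShouldBeSeparate"

# Zero-width lookahead: split before each uppercase letter without consuming it.
by_lookahead = split(foo, r"(?=[A-Z])")

# Alternative: match each word (one uppercase letter plus the lowercase letters after it).
by_match = [m.match for m in eachmatch(r"[A-Z][a-z]*", foo)]

println(by_lookahead)
println(by_match)
```

Both should give the four words This, Should, Be and Separate. The eachmatch version states what a word looks like rather than where to cut, which some may find easier to read.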
