Openrefine: Allow smartSplit() to handle String of chars, not just a single char

Created on 7 Jul 2018  路  7Comments  路  Source: OpenRefine/OpenRefine

Is your feature request related to a problem or area of OpenRefine? Please describe.
smartSplit() does not currently handle more than 1 separator character to split on.

Describe the solution you'd like
allow smartSplit() to take a full string of characters to split on. And have it work like the other split functions and allow sep to be a String and not just charAt(0) , and keeping its "smarts" with special handling of quotes and guessing tabs or commas if sep String is not supplied.

Describe alternatives you've considered
none

Additional context
Comment from issue #1586
@ostephens Its a smartSplit() issue. Looks like David Huynh never went in and added the multichar support (I checked the code and history in link below) and so it is only looking at a single char for the sep String instead of any number of characters for sep. This was added as an enhancement for split() and also on the menu for "Split into several columns..." and the reason I know this is because I asked @dfhuynh to change that functionality back in 2010, which he did. BUT...doesn't look like we did that for smartSplit() .... so yeah... smartSplit() only takes the 1st character in the sep argument supplied and ignores any other characters in the argument.

Notice here it splits on everything with "r" ... but not my supplied "rd"
I'd say that's a enhancement from 2010 that I asked for that never happened in smartSplit() 馃榾 and not a classic bug 馃槃
image

So your Store Array functionality is fine. And works great from my testing on all sides, so far.

Issue is our inaccurate documentation and description of smartSplit() saying it works with a sep String, when it fact it only works with a sep Char ... and make a change to have it work like the other split functions and allow sep to be a String and not just charAt(0) , and keeping its "smarts" with special handling of quotes and guessing tabs or commas if sep String is not supplied.

@ostephens @ettorerizza @wetneb If you agree on above enhancement with smartSplit() then create new issue with above info copied in.

enhancement good first issue help wanted

All 7 comments

@thadguidry Fixed this issue in following PR. Could you please take a look? https://github.com/OpenRefine/OpenRefine/pull/1761

@thadguidry, if I am not mistaken, while working on this issue I figured that functions don't have test coverage. So do we have any plans to add them in one go or should we just add them incrementally as we encounter the low test coverage? e.g in this fix, should I add test for SmartSplit in SmartSplitTests directly by invoking the new SmartSplit().call(...) ?

@omkarnix Directly in SmartSplitTest. You are welcome to improve SmartSplitTests. Take a look at how CoalesceTests.java is performed, where @ostephens did a great job in the test setup.

@thadguidry Thanks for the response. I would certainly like to contribute in unit tests for not only SmartSplit but other functions as well.
for my https://github.com/OpenRefine/OpenRefine/pull/1761, should I create new issue for adding test in SmartSplitTests or amend the same PR? Also, should I create new issue to add other functions' tests?

@omkarnix no need to create issues about tests, we will welcome them in any form and any PR!

@wetneb. Sure Will add the tests and update my PR tomorrow. Thanks!

@wetneb @omkarnix great job guys ! thanks so much for this !

Was this page helpful?
0 / 5 - 0 ratings

Related issues

katrinleinweber picture katrinleinweber  路  3Comments

thadguidry picture thadguidry  路  3Comments

wetneb picture wetneb  路  3Comments

ettorerizza picture ettorerizza  路  4Comments

wetneb picture wetneb  路  3Comments