Openrefine: [Wishlist] Seamless conversion of arrays into multiple columns

Created on 15 Oct 2012  路  12Comments  路  Source: OpenRefine/OpenRefine

_Original author: [email protected] (May 14, 2010 14:53:52)_

Unless I'm very much mistaken, there's no direct way to turn an array value into multiple columns.
Using join() requires finding a suitable separator character to split on later which requires knowledge
of the full content of the column.

Perhaps returning an array from a transform should create new columns representing the array
values? I'm not fussy about the specifics, but at the moment arrays seem to be 2nd class data types.

_Original issue: http://code.google.com/p/google-refine/issues/detail?id=36_

enhancement grel imported from old code repo logic usability

All 12 comments

_From stefano.[email protected] on May 14, 2010 16:13:57:_
Yeah, I agree, it would be useful to have your expression return an array and have Gridworks treat that as a way to build multiple columns... the problem though is that an expression could yield arrays with different length on each row, which means that Gridworks would have to do this in two passes: the first to understand how many columns will have to be created in total (taking the max of all the arrays returned by applying the expression on each row) and the second to create and fill up the cells in the new columns.

Another issue is naming the columns, but we could just come up with random names (say columnXX with an incremental counter).

Another option is to have some sort of 'column creator manifest', something like

{{{
value.split(',').make_columns({ "something" : result[0], "whatever" : result[1] });
}}}

but gets very verbose pretty fast.

Thoughts?

_From [email protected] on May 14, 2010 16:37:36:_
Actually the existing column splitting command already deals with both issues. We only
need to make it take any arbitrary expression that produces arrays.

_From thadguidry on May 29, 2010 20:18:52:_
Does this issue also deal with a simple UI interface for Edit Column / Join ??? For
example, I have 2 or more columns (first name, last name) that I want to easily combine
in order to reconcile with /person, and I simply just type the column names themselves
with a , separator to handle performing the join upon apply into my new column name.
We have Edit Cell / Join but no Edit Column / Join ??

_From [email protected] on July 31, 2011 17:44:33:_
In case anyone is looking for a simple workaround to join two columns (for example, joining a firstname and lastname column into a single 'name' column) - I found the simplest solution was to export the data from Refine as an Excel spreadsheet, and then to use the 'concatenate' function in Excel to join them.

The concatenate formula (including a whitespace between the firstname and lastname) is:

=A2&" "&B2

I found that for this to work neatly, you should use Refine to trim the whitespace from before and after the text strings. Other than that, worked a charm!

_From tfmorris on September 18, 2012 17:20:52:_
Remove obsolete milestone

@jackyq2015 @wetneb @ostephens @ettorerizza Read through this issue... I think David Huynh has a good point, and this would be a very useful new GREL function.

Imagine a value...

Yang,David,John,Charlie

and doing GREL

value.split(',')

produces

[ "Yang", "David", "John", "Charlie" ]

and by adding a new command

value.split(',').toColumns()

the preview window will instead show

Creating Columns: [ "Yang", "David", "John", "Charlie" ]

and then clicking OK on Transform window it executes the existing core/column-split command for those rows that have valid Arrays (non-error rows).

Additionally, the new toColumns() can also take parameters just like it does on the Edit columns -> Split into several columns

Thoughts ?

This could be very neatly solved by a new "by expression" option to the "Split into several columns..." dialogue.

@simonwiles Not any expression, right? Can you give an example of how you are thinking this might work?

Sorry if I was unclear. I simply meant an expression which returns multiple values -- an array (GREL) or list (Jython).

Would it be better to show the option where the user puts in their expression... and we tell the user if the provided expression produces an array or list that we can split on into columns? "You need to supply an expression which will produce an array or list for this option to work correctly"

Something like that is what you are thinking? What or how would it work in your perfect world, colors and all :)

That's a good idea @simonwiles, and yes Thad we would need to find a way to make this understandable easily for the user.

value.split(',').toColumns()

the preview window will instead show

Creating Columns: [ "Yang", "David", "John", "Charlie" ]

@thadguidry: I think we cannot go into the direction of a function toColumns(), because they can be called a few times in a single expression. We can't control that (i.e. or([ "Yang", "David"].toColums, [ "Antonin", "Antoine"].toColums) could be written).

But, what I would really consider, is, adding to ALL places with a 芦 expression preview聽禄, the ability to choose WHERE the result is going to be stored. Right now, we can store in 2 places: the value of the cell, or the cell object in itself. I think we should change that, and make it explicit, and add other columns also as destination (i.e., you transform from the 芦聽Name聽禄 column, so the first element of an array could be stored there, and that element _could be_ an array, but if you choose also to affect column 芦聽Given name聽禄 and column 芦聽Surname聽禄, then item 2 of the resulting array would go in 芦聽Given name聽禄 and item 3 of the resulting array could go in 芦聽Surname聽禄.

I think the user should have more control over where the result is going to go, and adding extra columns as a destination would be great.

Regards, Antoine

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ettorerizza picture ettorerizza  路  4Comments

thadguidry picture thadguidry  路  3Comments

stellasia picture stellasia  路  4Comments

anchardo picture anchardo  路  3Comments

thadguidry picture thadguidry  路  4Comments