This is a feature-request to add R support in Edit cells > Transform > Language.
Would be a great extension of course, but a lot of work. Maybe with JRI ?
Better I think for us long term would be Renjin http://docs.renjin.org/en/latest/introduction.html
I also really like the fact that it uses javax.scripting interfaces http://docs.renjin.org/en/latest/library/evaluating.html
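To make the javax.script point a bit more concrete, here is a minimal sketch of evaluating an expression against a single cell through the standard JSR-223 API. It assumes Renjin registers itself under the engine name "Renjin" when its jar is on the classpath; the class name, the method names and the `value` binding are hypothetical, chosen only for illustration, and this is not how OpenRefine wires its languages today.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.util.Optional;

// Hypothetical sketch: none of these names exist in OpenRefine.
class RExpressionSketch {

    // Look up a JSR-223 engine by name; empty if no such engine is registered.
    static Optional<ScriptEngine> findEngine(String name) {
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName(name));
    }

    // Evaluate `expression` with the cell content bound to the variable `value`.
    // Returns the raw engine result, or the unchanged cell value if no engine was found.
    static Object transformCell(String engineName, String expression, Object cellValue) {
        Optional<ScriptEngine> engine = findEngine(engineName);
        if (!engine.isPresent()) {
            return cellValue;
        }
        ScriptEngine scriptEngine = engine.get();
        scriptEngine.put("value", cellValue);
        try {
            return scriptEngine.eval(expression);
        } catch (ScriptException e) {
            throw new IllegalStateException("Expression failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: "Renjin" is the registered engine name when the Renjin jar is present.
        System.out.println(transformCell("Renjin", "toupper(value)", "openrefine"));
    }
}
```

Without Renjin on the classpath the lookup returns nothing and the cell comes back unchanged, so the same code path would work whether or not the extension is installed.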
I think just having R within OpenRefine would REALLY expand our user base.
I see there are a few language implementation now - can anyone point me to how these are defined? Would the jython extension be a good model?
@akbertram yes, the jython extension is a good model, as it isolates very well the support for that language.
@thadguidry +1000
R has a very active community, though one that sits a bit apart from other programmers. It's a booming language, as evidenced by each StackOverflow survey. R also has a steep learning curve, and being able to use it in a visual interface like that of OR would make it easier to learn.
Most importantly, full support for R would open up a lot of possibilities in OpenRefine. The Jython extension was a great idea. The problem is that Jython only supports some of the Python modules (those that are not written in C). We cannot use a whole range of great packages like numpy, pandas, NLTK and so on. With full support for R (including packages written in C++), there will not be much you cannot do in OpenRefine.
@ettorerizza N.B. Renjin tries to provide the best of both worlds - platform independence and full support for modules with C, C++ and Fortran code. We have a tool chain that compiles these languages to JVM bytecode so they can be used as normal JVM libraries without having to help your users navigate the setup of a Fortran compiler. Builds of the CRAN packages are published to http://packages.renjin.org
UPDATE: So I posted a nice email to the R Lang users group to let them know we'd love to collaborate and explore some use cases, and not just this one. Who knows, maybe later after our UI Refresh, they could help build some cool extensions for data exploration. So we'll see some folks coming into this issue and also asking about things on our mailing list. Expect it to get busy around here ! Like Alex @akbertram coming into the picture to take a look at things and see where he can help out with Renjin ! Thanks Alex !!!
This makes me wonder: if OpenRefine was a Jupyter client, and supported a Jupyter kernel connection on the back of Edit cells > Transform > Language, you would be opening up support for all manner of things, especially if there was Apache Arrow (https://github.com/OpenRefine/OpenRefine/issues/1469) data transfer in place (which would work with R or python/pandas dataframes?).
I wonder if there are packages on the back of https://github.com/minrk/thebelab that might help with this?
@psychemedia Do you know R Lang well enough ? Would you like to have a Hangout this weekend to show me ideas ?
@thadguidry apols for not picking this up sooner, I try to go offline over the w/e.
My R Lang knowledge is sketchy. Thinking a bit more about the Jupyter route as a way of providing different sorts of language support for transformations, this perhaps places an undesirable requirement of having a Jupyter server running. In which case, 'native' support for R, as with Python/Jython, is perhaps more appropriate.
That said, for people working in a Jupyter environment, the ability to connect to a Jupyter kernel (of which there are many) would provide a more general way of supporting transformations using arbitrary languages.
One way of doing this might be to provide a way of opening a connection to a Jupyter kernel and somehow (?!) passing state between it and OpenRefine. I'm pitching ideas as a user here, but will try to read up on the docs to see if I can better articulate what I have in mind. But my feeling is that supporting a Jupyter client as part of OpenRefine would be a big part of that.
Related: Jupyter client docs, what looks like it might be an old proof of concept(?) Java Jupyter client; UI/UX: the ThebeLab Javascript Jupyter client, or the SageCell server.
I'm basically just riffing on the idea that Jupyter lets you run code in a "cell" against an arbitrary language kernel and then trying to imagine how OpenRefine could leverage it.
@psychemedia Thanks Tony. We are now documenting ideas and more technical details into a Wiki page so that it's more readable. But feel free to continue discussion in this issue or on our mailing list.
Linking to the discussion on OpenRefine user mailing list.
So the world has certainly changed in just the 2 years since this issue was first opened.
There is now growing support for polyglot applications through various tooling.
One of those is GraalVM
Graal can help and do many things actually, like make Java programs faster or make applications extensible.
Currently GraalVM has limited support for Ruby, R, Python.
Lots of cool things, like guest language functions can be eval'ed and used as Java Values
I allow others to have full rights on experimentation or ideas :-)
Yes Graal would be a great fit to implement expression languages in OpenRefine. I don't think we can replace our current Jython extension with their Python implementation, as it is still less mature than Jython, but Graal could potentially be used to implement new languages. Also, it should not be too hard to reimplement GREL in Graal, which would probably add a lot of benefits too.
As GREL shows, OpenRefine's expression languages do not need to be fully-fledged programming languages to be useful, so Graal's main shortcoming of the partiality of its implementations is less of a concern than in many other places.
@wetneb yea exactly my feelings for same reasons.
I have very little experience with R but my understanding is that it would not be very useful as an expression language: what is interesting in R is its handling of tables, and that would not really help people write better expressions to compute single values.
Instead, we would need to just let users run an arbitrary R function taking the current table as argument and returning a new table which would replace the current one.
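As a rough illustration of that whole-table idea, here is a small Java sketch of what such an operation's contract could look like: a function from the current table to its replacement. Everything in it is hypothetical (the `TableTransform` interface, the list-of-maps table model, the helper names); it is not OpenRefine's data model, and the lambda in `main` only stands in for an R function that would be evaluated by whichever R engine ends up being used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a table is a list of rows, each row maps column name to cell value.
class TableTransformSketch {

    // A whole-table operation takes the current table and returns the one that replaces it.
    interface TableTransform extends UnaryOperator<List<Map<String, Object>>> {}

    // Run the transform on a copy, so the original table is untouched if the transform fails.
    static List<Map<String, Object>> applyTransform(
            List<Map<String, Object>> table, TableTransform transform) {
        List<Map<String, Object>> copy = new ArrayList<>();
        for (Map<String, Object> row : table) {
            copy.add(new LinkedHashMap<>(row));
        }
        return transform.apply(copy);
    }

    // Build a one-cell row; illustrative helper only.
    static Map<String, Object> row(String column, Object value) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put(column, value);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> table = new ArrayList<>();
        table.add(row("count", 3));
        table.add(row("count", 12));

        // Stand-in for an R function over a data frame, e.g. something like subset(df, count > 10).
        TableTransform keepLarge = rows -> {
            rows.removeIf(r -> ((Integer) r.get("count")) <= 10);
            return rows;
        };

        System.out.println(applyTransform(table, keepLarge));
    }
}
```

The point of the shape is that the user's function sees the whole table at once, rather than one cell at a time as expression languages like GREL do.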