_Original author: [email protected] (April 20, 2012 18:48:30)_
Is there any way to export facets?
What steps will reproduce the problem?
1. Produce a project in Google Refine
2. Extract your Undo/Redo
3. Export your project to .tar.gz
What is the expected output? What do you see instead?
I expect that some remnant of my data facets and their state are exported, so that when I "Apply" or "Import", I have the same facets and selections as when I did Extract or Export.
What version of Google Refine are you using?
r2491 (20120420 checkout from SVN trunk)
What operating system and browser are you using?
Any
Is this problem specific to the type of browser you're using or it happens in all the browsers you tried?
Any
Please provide any additional information below.
I am able to save my actions as a selenium Test Case, and with that I am able to bring back my facets, but this seems unnecessary. I thought that the facets would persist a little bit longer.
_Original issue: http://code.google.com/p/google-refine/issues/detail?id=560_
_From tfmorris on April 20, 2012 22:17:08:_
Facets don't even persist across project editing sessions, let alone export/import cycles, so that would be the first thing to do before being able to export them.
One possible workaround, depending on what you're trying to do is, if you've got a set of rows selected with facets, you can flag or star the rows so that you have a permanent record of which rows were selected.
The one way to preserve facets and their state (selections, etc.) is to click the "Permalink" link at the top of the project page. This fills the browser address bar with a very long URL that has the facet state encoded in it.
The permalink option for preserving facets doesn't seem to work for larger numbers of facet selections.
I tested the permalink option with approximately 50 text facets (worked fine), then with 136, which returned a blank page.
Using OpenRefine 2.6 beta 1 [TRUNK] for Windows
@anayram wouldn't permalinking 136 facets exceed the 2000 character limit of URLs?
I've wondered whether there is a more concise way of encoding facet settings in a permalink than the current implementation.
You are right, the current method creates really long URLs.
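To make the length problem concrete, here is a minimal sketch comparing two ways of packing facet settings into a URL parameter: percent-encoding the JSON directly, and compressing it with zlib before base64url-encoding it. The facet JSON shape and the helper names (`make_facets`, `plain_param`, `compact_param`, `decode_compact`) are assumptions for illustration only, not OpenRefine's actual permalink format.

```python
import base64
import json
import zlib
from urllib.parse import quote


def make_facets(n):
    """Build n hypothetical text-facet configs (field names are illustrative)."""
    return [
        {
            "type": "list",
            "name": f"column {i}",
            "columnName": f"column {i}",
            "expression": "value",
            "selection": [{"v": {"v": f"choice {i}", "l": f"choice {i}"}}],
        }
        for i in range(n)
    ]


def plain_param(facets):
    """Percent-encode the facet JSON directly, as a long permalink would."""
    text = json.dumps({"facets": facets}, separators=(",", ":"))
    return quote(text, safe="")


def compact_param(facets):
    """Compress the facet JSON with zlib, then base64url-encode the bytes."""
    raw = json.dumps({"facets": facets}, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw, 9)).decode("ascii")


def decode_compact(param):
    """Reverse compact_param and return the original JSON structure."""
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(param)))


facets = make_facets(136)
print("plain length:  ", len(plain_param(facets)))
print("compact length:", len(compact_param(facets)))
```

This synthetic data is highly repetitive, so it compresses better than real facet selections would. Compression shortens the URL but does not remove the limit, so very large facet sets would still need another mechanism.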
Even the ability to print the contents of the facet results to a file would help. As it is, there seems to be no way to select, export, or save the results of a large faceting exercise.
I am keeping this issue in mind for 4.0, it's one of the things we can change as we redesign the architecture of the tool.
Hi Alan,
Actually, we do currently support copying facet values via the clipboard.
On each facet, you will have a blue link for "choices" that you can click on.
This will open another panel that has the textual representation of the facet values.
You can copy and paste this text into whatever file you need (CTRL-A to select all text in the panel that appears, then CTRL-C to copy, then CTRL-V to paste into a text editor or file).
I agree it would be better not to make the user copy and paste values through the clipboard manually, and instead have the panel offer options such as "Save to file...".
But for now, all we have is the clipboard handling, so hopefully the number of facet choices you want to export is not large enough to overflow your system's clipboard.
The real problem is to paste back those values, so the whole faceting work does not get lost and it comes back to life when you reopen your project the next day.
Is this possible @thadguidry ?
@abubelinha No, not at the moment. I think what you are asking for is persisting Facet settings (not the values themselves, because they might change from Project to Project) and applying the same Facet settings to another project. That is also what this issue, #560, is asking for.
There are really at least 3 considerations for facets:
That would be a great user feature… So tempting… ;-)
Regards,
Antoine
@antoine2711 Yes. That is what I was leading to... We might need a few issues that separate the concerns and needs of the community around Facets. This particular issue seems to be focused (based on original comment) on Facet settings.
If you can find some of the other "facet" labeled issues that cover those needs, and link to this, would be great.
> _From tfmorris on April 20, 2012 22:17:08:_
> Facets don't even persist across project editing sessions, let alone export/import cycles, so that would be the first thing to do before being able to export them.
I can’t believe this issue is more than 8 years old for such a basic thing.
If facets don’t persist, they are almost useless for cleaning long and complex datasets: I have more than 2 million rows, so I can’t do this without several working days.
And I do have to turn off my computer when I leave the office.
I totally agree with @tfmorris ... first things first:
@abubelinha Thanks for your insight and opinion. I can believe it is this old. Here's why: OpenRefine only has about 3 part-time developers volunteering their time, with over 700 issues at one point and 400+ now. We certainly understand that there's a need for easier repeatable processes in OpenRefine, with Facets as part of that. Finding time to think about how to do that without breaking existing functionality is sometimes hard, and the team has other immediate priorities.
We sincerely appreciate you telling us which issues are important to you right now.
We have a rough roadmap of important features requested in our Community Survey, which we post about. There were many more votes for other areas of OpenRefine that folks wanted to see fixed or added, so we have been focusing on those this year. But rest assured, this feature is within the top 20%. It's just not in the top 5% of what folks cared about, which is what we're working on now behind the scenes.
@thadguidry: I read a few months ago that one of the goals of OR’s community was to open up, so that volunteers could come here, learn OR well enough to write small PRs, and contribute.
Is this still an objective?
The way I see it, a newcomer (but an intermediate-level user) who knows coding, JS, and a bit of Java can, if he learns GitHub and the philosophy of OR and gets up to date with, let’s say, 3 to 6 months of interaction here and on gGroups, handle simple issues.
But he will need guidance, and he will ask many questions, some of them not so bright. It will take some of the core contributors' time. And he cannot take on big or medium issues, because those come with testing, bug reports, design critiques, etc.
But this issue falls among those smaller issues, one that a newcomer with a few merged PRs could handle, IF support can be provided…
> @antoine2711 Yes. That is what I was leading to... We might need a few issues that separate the concerns and needs of the community around Facets. This particular issue seems to be focused (based on original comment) on Facet settings.
> If you can find some of the other "facet" labeled issues that cover those needs, and link to this, would be great.
I already did that two months ago. I looked at all of them. Facets are one of the features most exposed to the user, and they have, surprisingly, a lot of blind spots. The level of sophistication is below the standard you have asked of me on the issues we've worked on.
Maybe we could open more issues, but keep them smaller and more incremental. This would be a way to get more people to come here and exchange.
And it would be, I believe, a more « agile » way of working.
As a user, I share @abubelinha's view that 8 years is long. But as a new contributor here, I'm more tempted to code a solution; my year of working with OR has given me MUCH motivation. But if I code, I might need help, and I would have to manage all the human overhead associated with a contribution on GitHub.
I know the topic of this issue is not « contributions to OR by other people », but I feel it is indirectly related.
Regards,
Antoine
@antoine2711 Let's move further discussion on those topics to the dev mailing list. I've started a thread already.
Perhaps the main reason why this has not been tackled is that it requires taking important design decisions: how should the persistence of facets work?
From a user perspective, I would personally like that when I reopen a project, the facets are restored in the state I left them. No particular action would be required to store the facets, they would just be persisted by default, just like the position in the history is always persisted too. (It is always possible to remove all facets if that is not desired).
From a technical perspective, we need to decide where these facets could be serialized. It could be done browser-side, for instance in cookies or via the more modern local storage or IndexedDB APIs. It could be done server-side, by serializing the facet configuration (i.e. EngineConfig) in the project data. Doing this backend-side would also mean that facets would be preserved when exporting / importing an OpenRefine project.
I think my preference would be to do this backend-side: this would also avoid bad surprises when switching browsers or clearing browser data. This is particularly important because OpenRefine is designed to run locally, so all cookies or local data are shared with other web applications running on localhost.
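As a rough illustration of the backend-side option, the sketch below writes a facet configuration to a JSON file inside a project directory and restores it on reopen, falling back to an empty configuration when nothing was saved. The file name `facets.json`, the helper names, and the exact shape of the configuration are assumptions made for this example; OpenRefine's real project storage is Java code and may look quite different.

```python
import json
import tempfile
from pathlib import Path

FACETS_FILE = "facets.json"  # hypothetical file name, not an OpenRefine convention


def empty_engine_config():
    """Return a fresh config with no facets (shape assumed for illustration)."""
    return {"facets": [], "mode": "row-based"}


def save_engine_config(project_dir, config):
    """Write the facet configuration next to the rest of the project data."""
    path = Path(project_dir) / FACETS_FILE
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(config), encoding="utf-8")
    tmp.replace(path)  # rename over the old file so a crash cannot leave half a file


def load_engine_config(project_dir):
    """Restore saved facets, or start without facets if none were saved."""
    path = Path(project_dir) / FACETS_FILE
    if not path.exists():
        return empty_engine_config()
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as project_dir:
    config = empty_engine_config()
    config["facets"].append(
        {"type": "list", "columnName": "country", "expression": "value"}
    )
    save_engine_config(project_dir, config)
    print(load_engine_config(project_dir) == config)
```

Because the file lives with the project data, anything that archives the project directory would carry the facets along, which is the export/import benefit described above.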
So, to do this backend-side, we would need to:
@wetneb: I agree with all you have written. Personally, I think we should do both, but not at the same time. The way I see it, there are many parts.
Now, all those can be separate issues, done by separate people, if some kind of active coding coordination is done. And for that to work, the coordinator must not be you, since you will already have to review the code anyway.
Step 1 has to be done first, and step 4 should probably be cut into 3 or 4 issues, but most of that could be worked on in parallel, at its own pace.
A lot can be done, but if it's all one issue, not many people will have the time or skills to undertake such a commitment. It would also take a pretty good programmer.
Regards,
Antoine
I am not sure about serializing facets both server-side and client-side. When you open a project and both the backend and the frontend have facets to restore, which ones do you pick?
So, since the « browser facets » are stored in the « browser facets group » in the project data, it's just a duplicate (in theory; if it were not, the browser copy would be loaded and only the « browser facets group » would be updated on the server side). The server copy would always be a relatively recent version of the browser one, a backup in case your browser user data gets destroyed.
EDIT:
The difference is that server side would have a bunch of « folders » of facets configuration (itself a bunch of facets), whereas, client side would only have one active « facets configurations », the current. But such a system would enable the user to « swap » all his current facets to a « folder » on the project.
EDIT2:
Maybe this discussion could be in a gDocs dedicated to Facets Improvements. That would be what I call second « RoadMap ».
Regards, A.
I still do not see the benefit of storing facets in the browser if they are already stored in the backend. If you want to expand on your proposal of "folders", feel free to do so in an external document, but I think this is diverging from the original issue. By trying to fit in too many improvements at once, the risk is that no improvement gets merged.
Yes, I will write a gDoc about that. It will also show a bit of what I'm trying to explain about smaller increments. I know I am proposing more work on Facets, but in fact it can be cut into many small pieces, which would make it easier for new contributors. Showing might be clearer than explaining.
Regards, A.
I agree that facets should be stored server-side, only, like all the rest of the OpenRefine permanent state.
The key question in my mind is whether a set of facets (& selections) is part of the project state or editing session state. Currently we persist operation history and data as part of the project and, separately, expression history, starred expressions, and settings values on a per-user basis. Other session viewing state: current page, page size, record vs row mode, are, for better or worse, completely transient.
As I mentioned in 2014, the permalink mechanism does save the current facets and selections, but it's limited by maximum URL length. A short URL mechanism such as provided by Grafana would be one way to avoid this limitation. That would involve saving the facets & selections in the metadata.json and assigning them an identifier which could be used to look them up and resolve them later. This would probably be the lowest cost way to improve the current situation. This would be in addition to, not instead of, the current functionality.
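A minimal sketch of the short-identifier idea, assuming the project metadata can be treated as a plain dictionary: the facet state is stored under a short ID derived from its content, and the ID (rather than the full state) is what would go into the URL. The key name `savedFacetStates` and both function names are hypothetical, chosen only for this example.

```python
import hashlib
import json


def store_facet_state(metadata, engine_config, id_length=8):
    """Save a facet state in the metadata and return a short ID for it."""
    # Canonical JSON so the same facet state always maps to the same ID.
    canonical = json.dumps(engine_config, sort_keys=True, separators=(",", ":"))
    short_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:id_length]
    metadata.setdefault("savedFacetStates", {})[short_id] = engine_config
    return short_id


def resolve_facet_state(metadata, short_id):
    """Look up a previously stored facet state, or None if the ID is unknown."""
    return metadata.get("savedFacetStates", {}).get(short_id)


metadata = {"name": "demo project"}
state = {
    "facets": [{"type": "list", "columnName": "country", "expression": "value"}],
    "mode": "row-based",
}
short_id = store_facet_state(metadata, state)
print(short_id, resolve_facet_state(metadata, short_id) == state)
```

With this approach the URL length stays constant no matter how many facets are saved, at the cost of the link only resolving against a project whose metadata contains that ID.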
We could also declare facets & selections to be an intrinsic part of the project state and save them every time the project is saved (every 5 min). This is a more fundamental change and would probably also require that we implement things like "Open with no facets" for recovery from messed up (ie poorly performing) facet sets as well as work out the implications for multiple simultaneous edit sessions for a project, etc.
@tfmorris I am not a real programmer but I also have the feeling that the permalink thing is not the way to go.
On this page I found a plugin for saving facets.
From its manual I guess it works also using a permalink.
(these urls above refer to the wayback machine since the original site is not working right now).
The problem is that when I tried that plugin, it simply failed because my dataset had many rows and facets and the permalink was too long, or something like that.
So I googled for something else and then I finally found this issue.
I am happy there are so many people talking about it again.
Forgive me if I go somewhat off-topic, but where is that community survey mentioned by @thadguidry above? I would like to vote to raise the priority of this issue, if possible ;)
@abubelinha: The survey is done for this year, but there will be another one next year. Here's the compilation of the last years: https://groups.google.com/forum/#!topic/openrefine/sHOYuoGrzNE
You probably understand that most of the people working here are volunteers, like me, Thad, Tom, etc. There is a process here, so even if someone codes something, it still has to be reviewed for code and design.
I also have this problem, so I tried to code it: PR https://github.com/OpenRefine/OpenRefine/pull/2685. It's functional, but only for ListFacet (the most common). Other facets could be added quickly if there is a consensus in the community to go ahead.
If you can download and build from a PR, try it and let me know how it goes. If you can't, tell me your OS and I will compile it for you so you can test it. Oh, and it doesn't export: a facet will only persist in the browser, per project, if you save it.
Regards,
Antoine
Sorry @antoine2711, I don't know how to build it.
I use refine in two environments: Windows7 at home, and a Linux account at work.
If any of them is available (or both) I would be happy to try them.
Thanks a lot to all of you for your help!
@abubelinha if you want to "vote" for this issue, you can click the smiley face at the top on the original post and then click the thumbs up sign.
@wetneb re backward compatibility:
> make sure this is backwards-compatible: previous OpenRefine versions should not choke on projects serialized with the new version. They should just ignore the serialized facets.
This is difficult to do because the old version a) doesn't know whether what it's ignoring is optional and b) can't write it back out again. I also don't see it as being very valuable. It's much more common to make sure that forward migration works (ie v3 can open v2 files).
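The second difficulty (an old version cannot write back what it ignored) can be shown with a toy example. The "old version" below only knows two fields, which are hypothetical and chosen for illustration; it opens a file written by a newer version without error, but the facets are gone as soon as it saves.

```python
import json

# Hypothetical set of fields that an older version understands.
OLD_KNOWN_FIELDS = {"name", "modified"}


def old_version_load(text):
    """Parse a project file, silently dropping any field the old version does not know."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if key in OLD_KNOWN_FIELDS}


def old_version_save(project):
    """Serialize only what the old version kept in memory."""
    return json.dumps(project, sort_keys=True)


new_file = json.dumps(
    {
        "name": "demo",
        "modified": "2020-06-01",
        "facets": [{"type": "list", "columnName": "country"}],
    }
)

project = old_version_load(new_file)            # opens without choking
round_tripped = json.loads(old_version_save(project))
print("facets" in round_tripped)                # False: the facets were lost on save
```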
I am ecstatic to find the tip about the 'choices' popup - that's very helpful, thank you!
I have to say that I would also like facets to persist for a project, not just because I often use many facets and need to revisit projects, but also because I am able to crash OpenRefine from time to time and it's tedious to start over. I'm not too bothered about saving the state of the facet (as in, which choices are selected), just the existence of the facet itself. I frequently use GREL to make custom facets and it's tiresome to have to type these in again when revisiting a project.
Independent of savable facets, there is a GREL expression history which you can use to retrieve recent expressions, including those that you used to create custom facets.
Ah, so there is (GREL history) - that's fantastic, thank you for pointing it out!