OpenRefine: Proposals to improve the metadata system

Created on 22 Dec 2017 · 15 Comments · Source: OpenRefine/OpenRefine

After using the metadata system for a month, including the development version with tags, here are some suggestions for improvement.

  • Experience shows that the best time to fill in a document's metadata is during its creation (after that, we forget). So it would be useful to be able to fill in at least the "subject" and "description" metadata in the creation window, just as one can already add tags and change the name of the file.

Example:

[Screenshot: project creation window]

  • The metadata editing window is not very user-friendly. A simple improvement would be to move the "edit" button closer to the corresponding label.

[Screenshot: metadata editing window]

  • The "description" field can only contain a few lines of text. It would be great to have a "comments" button in order to add structured text with paragraphs and basic formatting. It could be, for example, a small txt or html file in which the user could describe the operations that he/she performed on the file. A sort of README. Alternatively: allow users to upload and attach a txt file to a project.

  • As mentioned in another thread, the "creator" metadata is 99% of the time the user. I do not think many people will bother to fill it. It should be hidden and filled automatically one way or the other.

Well, I hope you will consider these proposals relevant. Don't hesitate to share any other suggestions you may have.

metadata

All 15 comments

There is a JSON editor on my branch for the "data package":
https://github.com/jackyq2015/OpenRefine/
It supports in-place editing and validation. You can take a look.

We can change the "About" dialog to work the same way, but it does not support attachments for now. Also, I am afraid that the way you are trying to use the "description" field is unusual. Those operations/comments should be kept separate from the metadata. We have another ticket to deal with operation history/comments.

I agree that metadata such as creator should be populated automatically. We can have a preference setting "creator", and each time a new project is created we pull the value from there and feed it to the project. What do you think?

"we have another ticket to deal with operation history/comments."

Hi @jackyq2015. I'll take a look at this ticket. Where is it?

To be clearer, the "comments" metadata that I propose is not meant for commenting on the history of JSON operations. It would rather be a notepad integrated into the project. Indeed, most users (librarians, data journalists, scientists ...) do not just clean up their data with OpenRefine: most often, they must also write a report on the operations they have performed. By attaching the draft of this report to the OpenRefine project, all the information would be stored in one place.

"we can have a preference setting "creator" and each time we pull it from there and feed to the new project created."

Good idea. We could define a preference setting "username". When a project is created, this username becomes the "Creator". If another username modifies the project, that name would be added to a "Contributor" metadata list. Once a "username" is set, all old projects that have no "Creator" metadata would automatically be assigned to this username.
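To make the rule above concrete, here is a minimal Python sketch of how the "username" preference could feed "Creator" and "Contributor". It is only an illustration: `ProjectMetadata`, `create_project`, `record_modification` and `backfill_creator` are hypothetical names, not part of OpenRefine's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProjectMetadata:
    """Hypothetical stand-in for a project's metadata record."""
    name: str
    creator: Optional[str] = None
    contributors: List[str] = field(default_factory=list)


def create_project(name: str, username: Optional[str]) -> ProjectMetadata:
    # The "username" preference, if set, becomes the Creator of a new project.
    return ProjectMetadata(name=name, creator=username)


def record_modification(meta: ProjectMetadata, username: Optional[str]) -> None:
    # A username other than the Creator is added (once) to the Contributor list.
    if username and username != meta.creator and username not in meta.contributors:
        meta.contributors.append(username)


def backfill_creator(projects: List[ProjectMetadata], username: str) -> int:
    # Once a username is set, old projects without a Creator are assigned to it.
    updated = 0
    for meta in projects:
        if meta.creator is None:
            meta.creator = username
            updated += 1
    return updated
```

One open design question in this sketch is what to do when someone edits a project before any username has been set; here the edit is simply not recorded.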

Some background history on the need...
We had heard the need for both Project notes and Column notes... I think the best way to handle that is with a new Menu option called Add Notes ... that does a "no-op" like David Huynh suggested in #368.

  1. Support an Add Notes edit button that opens a freeform input box at the Project level from the ALL menu.
  2. Support an Add Notes edit button that opens a freeform input box from a Column menu (where the "no-op" holds the column name as well, so that the user doesn't have to type the Column name but can just say "this column" anywhere in the text).
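As a rough illustration of the two entry points above, a note could be stored as a history entry that changes no data and optionally carries a column name. This is a Python sketch under that assumption; `NoteEntry` and `History` are hypothetical names, not OpenRefine classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NoteEntry:
    """A "no-op" history entry: it holds free text and changes no data."""
    text: str
    column: Optional[str] = None  # None means a Project-level note


class History:
    """Hypothetical, simplified project history that only tracks notes."""

    def __init__(self) -> None:
        self.entries: List[NoteEntry] = []

    def add_note(self, text: str, column: Optional[str] = None) -> NoteEntry:
        # The column name is stored with the entry, so the note text itself
        # can just say "this column".
        entry = NoteEntry(text=text.strip(), column=column)
        self.entries.append(entry)
        return entry

    def notes_for(self, column: Optional[str] = None) -> List[str]:
        # Project-level notes when column is None, otherwise that column's notes.
        return [e.text for e in self.entries if e.column == column]
```

For example, `History().add_note("Force uppercase on this column", column="values_2017")` would record a Column-level note without touching any cell.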

I am not worried about sequencing... I would let the user Add or Modify any text blob they want to the Notes input dialog box at both Project and Column level...and let them deal with the sequencing of any text by themselves by numbering or whatever characters or text they want to use to organize. For example in a Column level...

"1. First I need to facet on this column to find all unique types. 2. Then I force uppercase all the unique types to make things look better. 3. Then I can work more with Column "values_2017" and see if there are duplicates
"

@ettorerizza is that what you are referring to? Or you can provide a sample of the notes, if you don't mind. I think attaching the notes at the project level and the column level makes sense because it is natural for the user and easy to access when they want to edit or view them. Metadata-level data is only accessible on the "project index" page. Most importantly, my understanding is that those notes should be part of the project and its operations rather than of the metadata itself.

Exactly, @jackyq2015, even if @thadguidry goes a step further with the idea of annotating each column. When working on an OpenRefine project, we take all sorts of notes to document our work and then describe what we did. We sometimes create GREL formulas or Jython scripts that we would like to reuse. It is not always easy, months later, to extract these scripts from the JSON history.

Here is an example of a note taken months ago (in French, sorry). This text file is in my Dropbox, far from the relevant OpenRefine project. I do not even know which project it is, since there are six versions of the same one. It would be great if this kind of note could be integrated into the project itself.

I have no strong views on whether such notes should be regarded as metadata or as part of the project. After all, any data can be considered metadata depending on the way we look at it.

I really believe that the documentation of research data is a problem in the scientific community. With all the improvements in recent months, OpenRefine could become a true Electronic Lab Notebook.

Operations_Refine_BelgicaPress.txt

@ettorerizza I added the "username" preference, and it will be pulled in as the creator in the metadata.
I will hide the "creator" column from the index page. Do you want me to hide the "description" and "subject" columns as well?

Thank you, @jackyq2015. Regarding "subject", I use it for the moment to group files that belong to the same real-world project. Maybe we will discover later that it duplicates the tags, but for now, better to leave it visible. Description, on the other hand, should not be hidden under any circumstances. It's this field that makes it easy to find a project among others with very similar names.

OK, then only the "creator" column will be hidden.

@jackyq2015 your commits about this issue are on the data package branch, maybe it would be simpler to put them on a separate branch as the changes are mostly unrelated.

@ettorerizza I am not sure I follow you. Why do you want to keep your note in the project metadata?
I'd rather use the Add Notes option described by @thadguidry to let the user add a new step in the history tab with a description of what a group of GREL expressions does (option 1 from Thad's comment).

I think storing the note in the project metadata is cleaner than having an operation for it. Renaming a project does not perform any operation, why should adding a note do that?

@magdmartin Ah, but I totally agree with Thad's comment. As long as OpenRefine one day allows notes to be associated with a project, I'm happy, no matter how those notes are stored. I first suggested including them in the metadata schema because, for me, any information about the file is metadata. But from a conceptual point of view, these notes can also be considered a separate document that would be linked to the OR project using a DCMI term like "isReferencedBy" or "relation".

@wetneb I think we are discussing two different things and I just hijacked this thread (sorry about that).

  1. A master note of the project in the metadata as discussed in this ticket.
  2. A documentation of the code (like you add comments in other programming languages) in the history tab. I think this should be discussed in #368.

implemented by PR #1398
