OpenRefine already provides support for reconciliation with Wikidata/Wikibase, and Wikidata/Wikibase editing. This is a very popular and powerful feature that has seen large uptake by the Wikidata community, and that has made OpenRefine the most powerful and refined batch Wikidata editing tool for non-coders.
Recently, Wikimedia Commons - the media repository of the Wikimedia ecosystem - has been enhanced with structured data. It is now possible to describe files on Wikimedia Commons with (federated) Wikidata statements as well; Commons now also contains a federated Wikibase instance.
I'd like to ask the OpenRefine developer community whether you would be willing to consider structured data editing support for Wikimedia Commons as well, in addition to the Wikidata module.
I can imagine that OpenRefine could provide support in two different ways:
Describe alternatives you've considered
Alternative batch upload and editing tools for Wikimedia Commons exist. Some (will) support structured data, some not.
I will be happy to advise you if you decide you are interested in following up on this feature request; I work with the structured data team and can connect you with our developers if needed.
Thanks for making a ticket about this. I have been thinking about that for a while - it's definitely something that I would like to see happening.
The way I would approach this is as follows:
This should be enough to enable editing of existing MediaInfo entities. As a user, the workflow would look like this:
Chick Corea & Stanley Clarke.jpg, ids are MediaInfo ids such as M74698470)
Chick Corea to Q192465)
Support for uploading new files would be interesting too. I have not thought a lot about the possible workflows for this, but here is one that comes to mind:
This requires adding support for file upload using the MediaWiki API (either within Wikidata-Toolkit or directly in OpenRefine).
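As a rough illustration of what such an upload request involves, here is a minimal sketch in Python (not OpenRefine's or Wikidata-Toolkit's code). It only builds the form fields for the MediaWiki `action=upload` API module and sends nothing over the network. The function name `build_upload_params` and the sample values are hypothetical; the file contents themselves would go in a separate multipart `file` field, and the token would be a CSRF token obtained from the API beforehand.

```python
from typing import Dict

# Public API endpoint of Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_upload_params(filename: str, wikitext: str, comment: str,
                        csrf_token: str) -> Dict[str, str]:
    """Return the form fields for a MediaWiki action=upload request.

    Hypothetical helper for illustration only: the file bytes are not
    included here and would be sent as a multipart "file" field.
    """
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,   # target file name on Commons, without "File:"
        "text": wikitext,       # initial wikitext of the file description page
        "comment": comment,     # upload summary shown in the page history
        "token": csrf_token,    # CSRF token fetched from the API beforehand
    }


params = build_upload_params(
    filename="Example image.jpg",
    wikitext="{{Information|description=Example}}",
    comment="Batch upload sketch",
    csrf_token="dummy-token",  # placeholder, not a real token
)
```

Keeping the parameter construction separate from the HTTP call would make it easy to unit-test the batch logic without touching Commons.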
Note that to treat Commons as a "generic" Wikibase install (and therefore rely on #1640) we need to figure out how to represent its relationship with Wikidata in a simple way (which amounts to some limited support for Wikibase federation). That might be slightly challenging given that Wikibase federation is still in its very early days, so the mechanisms needed to make it work seamlessly might not be fully in place or subject to change. Another option would be to make a separate Commons extension which would duplicate the Wikidata extension, tailored specifically for Commons. Such a duplication would likely make things harder to maintain, but it might be necessary if the federation between Commons and Wikidata has not been designed with a generic Wikibase federation scenario in mind.
Anyway, I think it is important to discuss what the ideal workflows would look like from a user perspective (not having uploaded a lot of things to Commons, I might not be aware of the real needs around this).
Thanks Sandra for reaching out!
I am generally in favor of this enhancement request.
Since @wetneb is our primary maintainer for the OpenRefine Wikidata functionality, I trust him on the feasibility and functionality.
My questions are around the uploading of new files: what is absolutely required, or not, for uploading to Commons from a batch tool like OpenRefine? (I didn't see a strict set of rules mentioned, but might have missed them somewhere.)
Anyways, my questions:
I would love to see an OpenRefine feature that enables uploading media files to Commons. It would make life a lot easier for GLAM institutions that combine data donations about their collections to Wikidata with image donations to Wikimedia Commons.
@fokky Hi Sandra again, could you find some time to answer my previous questions above here in the Github issue you opened?
Since this is a rather complex feature request, I have started a Google document that attempts to describe various workflows and use cases. Anyone is welcome to comment there and ping me in the document. I'll be happy to answer questions.
- would the "title blacklist" need to be enforced as the Upload Wizard does? https://commons.wikimedia.org/wiki/Commons:Upload_Wizard
https://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist
Answer: yes, absolutely, and it would also be extremely helpful to give users feedback that they're violating this when attempting to do an upload with 'faulty' characters. (This is missing in many external Wikimedia Commons upload tools, and is the source of a lot of confusion and frustration.)
- Would feedback upon uploading be needed as in the Upload Wizard?
Some feedback for users would be welcome, just like the feedback users get in OpenRefine when editing Wikidata, but I'd say this is not a very urgent issue. I'll be happy to look at specific examples and review where these could fit.
- What about "campaign support" for multimedia competitions like Wiki Loves Monuments?
These campaigns happen fully on wiki (or, to a very limited extent, potentially via other applications like Monumental) and I think these are generally out of scope here. One use case I would see: campaign organizers may want to take uploads through their campaign and improve metadata of the files after the participants have uploaded them. But that would fit into the use cases I described in the Google doc.
@fokky Thanks so much for answering these. We'll review the Google doc and if I have any questions I will comment within that doc.
Concerning file upload from OpenRefine, there are some generic libraries for MediaWiki that we could potentially use: https://www.mediawiki.org/wiki/API:Client_code/Java
For instance JWBF seems to support file upload: https://github.com/eldur/jwbf/blob/master/src/main/java/net/sourceforge/jwbf/mediawiki/actions/editing/FileUpload.java
(I haven't tried it out or researched more deeply to see if it would make sense to rely on it)
- would the "title blacklist" need to be enforced as the Upload Wizard does? https://commons.wikimedia.org/wiki/Commons:Upload_Wizard
https://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist
Answer: yes, absolutely, and it would also be extremely helpful to give users feedback that they're violating this when attempting to do an upload with 'faulty' characters. (This is missing in many external Wikimedia Commons upload tools, and is the source of a lot of confusion and frustration.)
To add to that: the tool itself doesn't have to implement the blacklist. You just have to handle the errors the API will throw at you when you happen to hit the blacklist.
In practice if you use a decent file name mask, you'll never hit it.
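To make the error-handling idea concrete, here is a minimal sketch in Python of turning an API error response into user feedback. It relies on the standard MediaWiki error envelope (`{"error": {"code": ..., "info": ...}}`); matching on a `titleblacklist` code prefix is an assumption that should be checked against real responses, and the function name `describe_upload_error` is hypothetical.

```python
from typing import Any, Dict, Optional


def describe_upload_error(response: Dict[str, Any]) -> Optional[str]:
    """Return a user-facing message for a failed upload, or None on success.

    Assumes the standard MediaWiki API error envelope. The "titleblacklist"
    code prefix is an assumption, not a verified constant.
    """
    error = response.get("error")
    if error is None:
        return None
    code = str(error.get("code", ""))
    info = str(error.get("info", ""))
    if code.startswith("titleblacklist"):
        # Tell the user that the file name itself is the problem.
        return f"File name rejected by the title blacklist: {info}"
    return f"Upload failed ({code}): {info}"


# Made-up responses, for illustration only.
ok_response = {"upload": {"result": "Success"}}
blocked_response = {
    "error": {"code": "titleblacklist-forbidden", "info": "Title is not allowed."}
}
```

A tool could surface the returned message next to the offending row, which addresses the feedback gap mentioned earlier in the thread.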
Metadata editing and reconciliation support for Wikimedia Commons sounds like a useful and logical extension, but file uploading seems like a stretch. It would be an entirely new, MediaWiki-specific capability. I think it makes much more sense to allow the user to output a package of metadata in JSON (or whatever) format that could be used with some standard MediaWiki file uploader utility/client (or perhaps a Pattypan spreadsheet to feed to its "Validate & Upload" step?).
Just preparing the metadata would render it quite useless from the user's point of view. Breaking it out doesn't make much sense to me. The actual upload part should probably be an extension based on an existing library that takes care of the upload and returns the Media ID (M12345) so OpenRefine can add the structured data to that. See for example https://commons.wikimedia.org/w/index.php?title=File:Footpath_off_Lodge_Lane_-_geograph.org.uk_-_6065126.jpg&action=history
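A minimal sketch in Python of the "upload, then add structured data" hand-off, under stated assumptions: the MediaInfo id is "M" followed by the page id of the file page, and a "depicts" (P180) statement is added through the Wikibase `wbcreateclaim` API module. The function names are hypothetical, how the page id is obtained after upload is left open, and nothing is sent over the network. The ids reuse the examples mentioned earlier in the thread.

```python
import json
from typing import Dict


def mediainfo_id(page_id: int) -> str:
    """Derive the MediaInfo entity id ("M" + page id) for a file page."""
    if page_id <= 0:
        raise ValueError("page_id must be a positive integer")
    return f"M{page_id}"


def build_depicts_claim_params(entity_id: str, item_id: str,
                               csrf_token: str) -> Dict[str, str]:
    """Form fields for a wbcreateclaim request adding a P180 (depicts) statement.

    Hypothetical helper: it only builds the parameters and sends nothing.
    """
    if not item_id.startswith("Q") or not item_id[1:].isdigit():
        raise ValueError("item_id must look like Q123")
    value = {"entity-type": "item", "numeric-id": int(item_id[1:])}
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": entity_id,         # e.g. M74698470
        "property": "P180",          # "depicts"
        "snaktype": "value",
        "value": json.dumps(value),  # JSON-encoded item reference
        "token": csrf_token,         # CSRF token fetched beforehand
    }


entity = mediainfo_id(74698470)
claim = build_depicts_claim_params(entity, "Q192465", "dummy-token")
```

In a real extension this second step would presumably go through the same schema and edit machinery as the Wikidata extension rather than hand-built requests.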
Yes, an extension feels right for this.
The Wikidata extension already contains a lot of MediaWiki-specific features (in fact, Wikibase-specific) and that is a pretty popular feature so I feel it is worth going the extra mile to offer seamless workflows to users, when that is possible.
That does not prevent us from shipping a Commons extension separately from the main software if we are worried that this integration is taking over the project too much (or for other reasons, such as size).
One could say that having the full CRUD with MediaWiki, including file upload, could be a very good addition to OR. That would make us direct competitors to PattyPan, as a side note.
But being able to Create, Read, Update and Delete MediaWiki pages could indeed serve many purposes. Packaging it as an extension makes sense, but I guess it would end up integrated into OR, like the Wikibase/Wikidata extension.
I would love to see that — full CRUD to MediaWiki — just after full CRUD to WD.
Regards,
Antoine
One could say that having the full CRUD with MediaWiki, including file upload, could be a very good addition to OR.
I assume you're referring to "Create, read, update and delete" here and not the unpleasant substance. AFAIK OpenRefine currently only supports editing through Wikibase and not through the classic MediaWiki api. Adding that would be quite an endeavor. Not sure what the use case would be. Sounds like scope creep to me, at least for this task. Probably best to create a new task for that.
I assume you're referring to "Create, read, update and delete" here and not the unpleasant substance.
Yes, of course. We lack update & delete for WD currently.
AFAIK OpenRefine currently only supports editing through Wikibase and not through the classic MediaWiki api.
Right.
Adding that would be quite an endeavor. Not sure what the use case would be. Sounds like scope creep to me, at least for this task. Probably best to create a new task for that.
Well, if you plan to upload a file to Commons, that requires MediaWiki file upload. And if you want to set the template on WC, that would be through a modification of the MediaWiki page. I might be wrong, but I believe WC uses the MediaWiki file upload functionality and that templates are stored in the page data. I believe the same code is involved. I'm just saying it in a generic way.
Regards,
Antoine
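To illustrate the point about templates living in the page text, here is a minimal sketch in Python that renders the wikitext of a file description page using the Commons `{{Information}}` template. The function name `build_file_page_wikitext`, the choice of fields, and the sample values are assumptions for illustration; a real workflow would need to agree on which fields and license templates to support.

```python
def build_file_page_wikitext(description: str, date: str, source: str,
                             author: str, license_template: str) -> str:
    """Render a simple Commons file page: an Information template plus a license.

    Hypothetical helper for illustration; values are inserted without escaping.
    """
    lines = [
        "=={{int:filedesc}}==",
        "{{Information",
        f"|description={description}",
        f"|date={date}",
        f"|source={source}",
        f"|author={author}",
        "}}",
        "",
        "=={{int:license-header}}==",
        "{{" + license_template + "}}",
    ]
    return "\n".join(lines)


# Made-up sample values.
page_text = build_file_page_wikitext(
    description="{{en|1=Example description}}",
    date="2020-01-01",
    source="Own work",
    author="Example author",
    license_template="Cc-zero",
)
```

This text would be passed as the initial page content of an upload, or written back through a regular page edit when updating an existing file.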
Summary of the questions that are still up in the air for me concerning SDC's specific setup as a Wikibase instance:
https://hackmd.io/ZYWPoLrZSUSE9paRnXe7hg?view