OpenRefine: Add OAuth support to Wikidata extension

Created on 18 May 2018 · 30 Comments · Source: OpenRefine/OpenRefine

With @lucaswerkmeister we are exploring hosting OpenRefine instances for the Wikidata community. As part of that, we would need to migrate Wikidata authentication to OAuth.

https://github.com/scribejava/scribejava/pull/852 adds support for MediaWiki OAuth to the scribejava OAuth library. We also need support in Wikidata-Toolkit (see https://github.com/Wikidata/Wikidata-Toolkit/issues/268).


_This is a proposed Google Summer of Code project in 2020. If you are not planning to apply for an internship via GSoC, we kindly ask that you do not work on this task yet, in order to leave the floor to potential interns._

enhancement export gsooutreachy multi-user support wikibase

All 30 comments

Note that, as far as I’m aware, it’s not yet clear what this will mean for the instances we’re not hosting :) I currently see two possibilities:

  1. Keep username/password authentication and use that if no OAuth consumer has been configured.
  2. Automatically register an owner-only consumer (which does not require review by Wikidata administrators) and use that if no other OAuth consumer has been configured.

(The advantage of option 2 would be that we don’t need to store the user’s password indefinitely, we only need it once to register the owner-only consumer and then store the consumer/access key/secret instead. And I suppose this would automatically group all edits made from that instance via the OAuth revision tag.)

BTW the Phabricator task for a hosted OpenRefine instance is T194767.

I thought about the workflow for the owner-only tokens and it still looks a bit blurry to me. To request an owner-only token, I assume you must be logged in as the owner. So, the first time a user uses OpenRefine, they would first need to log in with username and password via OpenRefine so that we can request an owner-only token, and then go through the OAuth process with these credentials… so they would log in twice.

Yeah, and apparently owner-only tokens don’t support the usual “three-legged OAuth flow”, you have to use the “one-legged OAuth flow” instead, so we won’t even be able to reuse all the code from full OAuth… it’s probably much simpler to just stick to normal username/password authentication for local instances.

Okay. By the way, do you know if it is possible to edit with a bot flag via OAuth? Maybe there is a particular scope to request for that, but at least for https://tools.wmflabs.org/oabot/ it seems that the bot flag is not added to the edits… So we should make sure to get the right scopes when registering the consumer.

I think you have to explicitly set the bot parameter in action=edit or action=wbeditentity, and have the bot right (usually: be part of the “Bots” group).

Yes, but even when both conditions are met, User:OAbot's edits made via OAuth were not flagged.

Are you sure that’s not a bug in OAbot? You’re setting data['bot'] = '', and it looks like that might cause Python requests to completely drop the parameter:

$ python3
Python 3.6.5 (default, Apr  1 2018, 05:46:30) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> data = {'action':'edit'}
>>> if 'yes':
...     data['bot'] = ''
...     
>>> r = requests.post("http://httpbin.org/post", data=data)
>>> print(r.text)
{"args":{},"data":"","files":{},"form":{"action":"edit"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Content-Length":"11","Content-Type":"application/x-www-form-urlencoded","Host":"httpbin.org","User-Agent":"python-requests/2.18.4"},"json":null,"origin":"158.109.94.211","url":"http://httpbin.org/post"}

Notice how the form key only has action: edit, but no bot.

data = (('action', 'edit'), ('bot', '')) (list of tuples instead of dict) works, though. (That is, it sends an empty bot param in the request – I don’t know whether MediaWiki will interpret it correctly! Probably safest to simply use bot=1.)
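The explicit variant suggested above can be sketched with the standard library alone, which makes it easy to see what the form body looks like before it is posted. The parameter names action and bot come from the discussion; the helper name build_edit_form, the title parameter value, and the page name are hypothetical, and nothing here confirms how MediaWiki treats an empty bot value.

```python
from urllib.parse import urlencode


def build_edit_form(title, bot=False):
    """Return the URL-encoded form body for a hypothetical action=edit request.

    The bot flag is sent as an explicit bot=1 rather than an empty value,
    so the body does not depend on how empty values are handled.
    """
    # A list of tuples keeps the parameter order stable and explicit.
    params = [("action", "edit"), ("title", title)]
    if bot:
        params.append(("bot", "1"))
    return urlencode(params)


print(build_edit_form("Sandbox", bot=True))
print(build_edit_form("Sandbox"))
```

Printing the encoded body before sending it is a cheap way to check that the bot parameter is really present in the request.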

Thought: I think it would be best to use a consumer with a fixed callback URL (OpenRefine just sends oauth_callback=oob to Special:OAuth/initiate), so that OpenRefine doesn’t need to know its public hostname. That way, only two configuration variables should be necessary: consumer key and consumer secret.

Makes sense, so if the callback is supplied when applying for OAuth credentials, it does not have to be supplied by OR during the OAuth process.
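To make the oauth_callback=oob idea concrete, here is a minimal sketch of how an OAuth 1.0a request to Special:OAuth/initiate could be signed with HMAC-SHA1, following the general OAuth 1.0a signing rules. It is not the code used by OpenRefine, scribejava, or Wikidata-Toolkit. The endpoint URL, the HTTP method, the credentials, the nonce, and the timestamp are all placeholder assumptions.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode for OAuth 1.0a: only unreserved characters stay as-is."""
    return quote(str(value), safe="~")


def signature_base_string(method, url, params):
    """Build the OAuth 1.0a signature base string from a dict of parameters."""
    # Encode names and values first, then sort the encoded pairs.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(normalized)])


def sign_hmac_sha1(base_string, consumer_secret, token_secret=""):
    """Sign the base string; the token secret is empty before a token exists."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Assumed endpoint and placeholder credentials, for illustration only.
INITIATE_URL = "https://www.wikidata.org/w/index.php"
params = {
    "title": "Special:OAuth/initiate",
    "oauth_callback": "oob",  # the real callback is fixed at registration time
    "oauth_consumer_key": "example-consumer-key",
    "oauth_nonce": "fixed-nonce-for-demo",  # must be random in real use
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1589760000",  # must be the current time in real use
    "oauth_version": "1.0",
}
base = signature_base_string("POST", INITIATE_URL, params)
signature = sign_hmac_sha1(base, "example-consumer-secret")
```

Only the consumer key and consumer secret appear as instance-specific inputs, which matches the point that the public hostname never has to be configured.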

Hi, I've been studying the GSoC idea oauth-support-for-wikidata-extension these days.

it’s probably much simpler to just stick to normal username/password authentication for local instances.

I do agree with @lucaswerkmeister on this. OR is used as local software in most situations, so it's important to keep it easy.

I think it's better that we just support the “three-legged OAuth flow”.

To be more specific, for the host, we can offer consumer key and consumer secret configuration options in the OR configuration file refine.ini (or a better place?). It's the host's responsibility to register its OR instance at Special:OAuthConsumerRegistration/propose and get the key/secret pair.

As for the users, we can have a "use OAuth" option in the Manage Wikidata account interface (which should be highly recommended if the OR instance has the corresponding consumer key/secret).
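A minimal sketch of the selection logic this implies, assuming the settings have already been read into a dictionary. The property names below are hypothetical and are not actual refine.ini keys; the fallback to username/password when no consumer is configured follows the option discussed earlier in the thread.

```python
def choose_login_method(settings):
    """Pick the Wikidata login method from instance settings (hypothetical keys)."""
    key = settings.get("wikidata.oauth.consumer_key", "").strip()
    secret = settings.get("wikidata.oauth.consumer_secret", "").strip()
    # OAuth is only usable when both halves of the consumer are configured.
    if key and secret:
        return "oauth"
    return "password"


hosted = {
    "wikidata.oauth.consumer_key": "example-key",
    "wikidata.oauth.consumer_secret": "example-secret",
}
print(choose_login_method(hosted))  # oauth
print(choose_login_method({}))      # password
```

Treating a half-configured consumer as "not configured" keeps local instances working with plain username/password login.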

As for making the Wikidata extension configurable to work against other Wikibase instances, honestly, I don't have a clear idea on that for now, but I'm confident that I can make it work in the future.

I'm thinking about the expected outcomes of oauth-support-for-wikidata-extension. We have two options:

  1. We provide embedded OAuth credentials for the Wikidata extension in release versions, just as we do with the gdata extension. With #2392, we can now provide our own credentials to override the default ones, and we can do the same with the Wikidata extension. That way users can acquire and use their own credentials, but do not have to. The disadvantage is that this is not good practice in principle: users are recommended to get their own credentials.

  2. We don't provide embedded OAuth credentials. OR hosts then need to acquire their own credentials (which requires review by Wikidata administrators). In principle, though, this is better.

So what are your suggestions on these two options?

For Wikidata, I would go for option 2. The feature will still be very useful to users who care about not providing their password to OR and to those who want to host instances for others to use.

The Android Commons App also considered implementing OAuth and they decided against it because of this issue (https://phabricator.wikimedia.org/T179519, https://github.com/commons-app/apps-android-commons/issues/819).

The Android Commons App also considered implementing OAuth and they decided against it.

Seems they canceled the plan mainly because they don't have a backend. That won't be a problem for us.

@wetneb With scribejava/scribejava#852, scribejava now supports OAuth 1.0a with MediaWiki, but OAuth 2 is not supported. Accordingly, OAuthApiConnection only supports OAuth 1.0a. So it's easy to add OAuth 1.0a support for wikidata extension.

Do you think we should support OAuth 2 as well?

That's something I would leave to GSoC applicants: let them propose what they think is best :)

Fine, I'll try to find out which version is more suitable for OR :)

Hello,
I started studying OAuth and its different flows. I realized that OAuth is about authorization, not authentication, so basically what happens is "asking for permission to access or modify stuff".
Going through the articles and thinking about each step, in very simple terms it looks like an implementation of hash maps: the user enters their credentials into our client application, which then exchanges the consumer key/secret and access key/secret with the authorization server, which gives the final approval based on a flag.
I might have got it all wrong, so please correct me if there is an error in my understanding.

Now, coming to the solution: the two main types of flow I saw in every article were the three-legged and two-legged OAuth flows, where the two-legged flow differs only in that end-user authorization isn't involved. In my opinion, three-legged OAuth support is the better fit for the Wikidata extension: the user does not need to remember or re-enter their credentials after the first time (while registering), so the time spent on authorization is saved, giving a better user experience, faster onboarding, and more security.

What is your opinion on this?

The Android Commons App also considered implementing OAuth and they decided against it.

Seems they canceled the plan mainly because they don't have a backend. That won't be a problem for us.

I approve!

Hello, I am Aman Singh Rajput from IIT BHU Varanasi. I am studying OAuth support and have also done some case studies on its features. I am finishing my proposal for GSoC. If anyone finds an issue, please let me know.

Hello @wetneb, I want to contribute to this project through Google Summer of Code 2020. I'm a university student majoring in Software Engineering in Shanghai, China. I have teamwork experience in web programming with Vue and Spring; I have basic frontend skills but am more familiar with the backend. I have uploaded my proposal to the GSoC website. If possible, please give me some guidance. Here is my e-mail: [email protected], thank you.

@afkbrb We'll also need to have a test case added to support the redirection endpoint URI for both /oauth2/authorize and /oauth2/access_token

  • redirect_uri (optional) - if present, must match the URI that was set when client was registered exactly

Additional context: https://tools.ietf.org/html/rfc6749#section-3.1.2
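The rule quoted in the bullet above is small enough to sketch as a check that such a test case could exercise. The function name and the example URIs are hypothetical; the comparison is a plain string equality, as "must match ... exactly" suggests, and this is not taken from MediaWiki's own code.

```python
def redirect_uri_allowed(registered_uri, requested_uri=None):
    """Return True if the request's redirect_uri is acceptable.

    redirect_uri is optional; when present it must equal the URI that was
    set at client registration, character for character.
    """
    if requested_uri is None:
        return True
    return requested_uri == registered_uri


registered = "https://refine.example.org/oauth/callback"
print(redirect_uri_allowed(registered))                           # True
print(redirect_uri_allowed(registered, registered))               # True
print(redirect_uri_allowed(registered, registered + "?extra=1"))  # False
```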

@lucaswerkmeister Authorization servers can have metadata describing their configuration. I poked around and didn't see anything about Authorization Server Metadata Request. Does MediaWiki support that, either through HTTPS or values in JWT?

Additional context: https://tools.ietf.org/html/rfc8414
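For reference, RFC 8414 places the metadata document under a well-known path derived from the issuer identifier. The sketch below only builds that URL; the issuer values are placeholders, and whether MediaWiki serves such a document is exactly the open question here.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN = "/.well-known/oauth-authorization-server"


def metadata_url(issuer):
    """Return the RFC 8414 metadata URL for an issuer identifier.

    The well-known segment is inserted between the host and any path
    component of the issuer.
    """
    parts = urlsplit(issuer)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN + path, "", ""))


print(metadata_url("https://example.com"))
print(metadata_url("https://example.com/issuer1"))
```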

@lucaswerkmeister Authorization servers can have metadata describing their configuration. I poked around and didn't see anything about Authorization Server Metadata Request. Does MediaWiki support that, either through HTTPS or values in JWT?

No idea. (It sounds like it’s an OAuth 2.0 only thing? I haven’t looked into MediaWiki support for that version at all yet.)

We'll also need to have a test case added to support the redirection endpoint URI for both /oauth2/authorize and /oauth2/access_token

@thadguidry , we're going to use OAuth 1.0a for authorization, since the OAuth version supported by Wikidata-Toolkit is 1.0a. OAuth 1.0a uses oauth_callback for redirection, see https://oauth.net/core/1.0a/#obtain_request_token. I'll add test cases for it.

@afkbrb That's something that was missing from this discussion: what your final decision was on supporting OAuth 2.0, since @wetneb said he left it in GSoC applicants' hands. No problem. OAuth 1.0a it shall be then, for now. Thanks for clarifying.

Fixed by #2661

As discussed under #2661 , three-legged OAuth support for the Wikidata extension is not useful now or in the near future, since it requires the multi-user support of OpenRefine, which still has a long way to go.

#2661 doesn't achieve all the goals of this issue; it just adds two-legged OAuth support (it enables logging in with an owner-only consumer). Three-legged OAuth support was added partway through #2661, but since it's not useful, the code has been deleted.

Actually, perhaps I was hasty in closing this. The enhancements discussed in #2729 might be enough to get a useful level of support for three-legged OAuth without full multi-user support, but it also sounds like actual multi-user support is needed for the use case being discussed of a shared resource.
