OpenRefine: Fetch URLs should follow redirections

Created on 1 Jan 2018 · 5 comments · Source: OpenRefine/OpenRefine

Apparently, HTTP redirects are not followed by the URL fetcher; they probably should be.

enhancement fetch urls

All 5 comments

I'm already looking at the URL fetching as part of #1217, so I can look at this as well.

OK, the issue seems to be that the URLConnection library does not follow redirects that switch to a different protocol from the original request. In particular, if you request an http URI and the server redirects to an https URI, the redirect will not be followed. Redirects within the same protocol (http->http or https->https) work as expected in OpenRefine.

This behaviour is deliberate, as it stops redirects from taking the user from a secure protocol to an insecure one (https->http). The second answer in this StackOverflow post gives a good explanation of why that is a bad idea: https://stackoverflow.com/questions/1884230/urlconnection-doesnt-follow-redirect

What is less clear to me is whether there are any issues with supporting http->https redirects, which will be by far the more common scenario. They feel increasingly common, and it seems we could support them without any security concerns.

Any other views?
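
A minimal sketch of the http->https policy proposed above, assuming plain `java.net.HttpURLConnection`. The class `RedirectFollower`, the `MAX_REDIRECTS` limit, and both method names are hypothetical, not OpenRefine code. Automatic redirects are switched off and every `Location` header is checked before it is followed, so same-protocol hops and http->https upgrades go through while https->http downgrades are refused.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectFollower {
    // Hypothetical cap on redirect hops, to guard against redirect loops.
    static final int MAX_REDIRECTS = 10;

    /** Allows same-protocol redirects and http -> https upgrades; refuses https -> http. */
    static boolean isRedirectAllowed(URL from, URL to) {
        String src = from.getProtocol();
        String dst = to.getProtocol();
        boolean srcIsWeb = src.equals("http") || src.equals("https");
        boolean dstIsWeb = dst.equals("http") || dst.equals("https");
        if (!srcIsWeb || !dstIsWeb) {
            return false; // only HTTP(S) redirects are considered
        }
        return src.equals(dst) || (src.equals("http") && dst.equals("https"));
    }

    /** Opens a connection, following at most MAX_REDIRECTS redirects under the policy above. */
    static HttpURLConnection openFollowingRedirects(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            // Handle every redirect ourselves so one code path enforces the policy.
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            boolean isRedirect = status == HttpURLConnection.HTTP_MOVED_PERM   // 301
                    || status == HttpURLConnection.HTTP_MOVED_TEMP             // 302
                    || status == HttpURLConnection.HTTP_SEE_OTHER              // 303
                    || status == 307 || status == 308;
            String location = conn.getHeaderField("Location");
            if (!isRedirect || location == null) {
                return conn; // final response: caller reads the body from here
            }
            // Location may be relative, so resolve it against the current URL.
            URL next = new URL(current, location);
            conn.disconnect();
            if (!isRedirectAllowed(current, next)) {
                throw new IOException("Refusing redirect from " + current + " to " + next);
            }
            current = next;
        }
        throw new IOException("Too many redirects starting from " + url);
    }
}
```

A caller would then read the response with something like `RedirectFollower.openFollowingRedirects(new URL("http://example.org/")).getInputStream()`. Disabling automatic redirects for every hop, not just cross-protocol ones, keeps the security decision in a single place.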

I agree there should not be any security concern with HTTP -> HTTPS. By the way, if URLConnection does this sort of nonsense, it might be worth migrating to a more modern library (https://stackoverflow.com/questions/1322335/what-is-the-best-java-library-to-use-for-http-post-get-etc). Ideally, one that we already have among our dependencies and, in a dream world, one that can easily be mocked in tests.
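
One possible target, not one named in this thread and requiring Java 11 or later, is the JDK's own `java.net.http.HttpClient`. Its `HttpClient.Redirect.NORMAL` policy follows redirects except from HTTPS URLs to HTTP URLs, which matches the policy proposed above. A minimal sketch; the class name `ModernFetcher` and the 10-second connect timeout are illustrative assumptions:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class ModernFetcher {
    // Redirect.NORMAL: always redirect, except HTTPS -> HTTP downgrades.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** Fetches a URL as text; redirects are followed according to CLIENT's policy. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        // response.uri() is the URI the response actually came from, after any redirects.
        return response.body();
    }
}
```

With this client, an http URL that redirects to https is followed transparently, while an https URL that redirects to http is not followed and the 3xx response is returned to the caller instead.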

It looks like a workaround for cross-protocol redirects was already implemented for data import in response to #748. The relevant commit is 4f7da9d18e05361a6b1135528394b59f1e13b244.


Related issues

dantexier · 4 comments

antoine2711 · 3 comments

ralcazar-oeg · 3 comments

katrinleinweber · 3 comments

lapoisse · 3 comments