Go-ipfs: Update subdomain redirection logic to only engage on browsers

Created on 11 Mar 2020 · 9 Comments · Source: ipfs/go-ipfs

From conversation with @stebalien and @jbenet: we should only redirect to a subdomain gateway if the client is detected to be a browser, so as to avoid breaking curl and wget usage (curl doesn't follow redirects unless passed -L, and wget won't span hosts on redirect). Note this is needed for 0.5 to avoid breaking existing usage.

Required for https://github.com/ipfs/go-ipfs/issues/6776

kind/enhancement

All 9 comments

For example, this is what ipfs-search does:
curl http://localhost:8081/ipfs/QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv/readme

Potential solution: Just return the page with the redirect response but set the "Clear-Site-Data" header. curl will download the page and won't follow the redirect (and will even politely exit with a zero status by default). Browsers should follow the redirect.

@Stebalien I implemented that idea (Option A - see https://github.com/ipfs/go-ipfs/pull/6982). Let me know what you think.

Personally I don't feel too confident about it: it produces valid responses, but feels like a hack that could bite us in the future.

What if we go with a much simpler hack (Option B - see https://github.com/ipfs/go-ipfs/pull/6984)?

  • see if User-Agent matches ^(curl|wget)/i

    • add Clear-Site-Data header

    • disable subdomain, no redirect, return regular HTTP 200 response

Would it be worse than A? I feel it is a bit safer and more explicit behavior.
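The User-Agent check in Option B could look roughly like the Go sketch below. It assumes the pattern is meant as ^(curl|wget) with a case-insensitive flag, written here as (?i) in Go's regexp syntax. The names cliUserAgent and wantsSubdomainRedirect are hypothetical and are not taken from the linked PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// cliUserAgent is an assumed reading of the pattern from the list above:
// a User-Agent that starts with "curl" or "wget", ignoring case.
var cliUserAgent = regexp.MustCompile(`(?i)^(curl|wget)`)

// wantsSubdomainRedirect is a hypothetical helper. It reports whether the
// gateway should redirect to the subdomain (true) or serve a regular
// HTTP 200 response from the path gateway (false).
func wantsSubdomainRedirect(userAgent string) bool {
	return !cliUserAgent.MatchString(userAgent)
}

func main() {
	agents := []string{
		"curl/7.68.0",
		"Wget/1.20.3",
		"Mozilla/5.0 (X11; Linux x86_64)",
		"", // client sent no User-Agent at all
	}
	for _, ua := range agents {
		fmt.Printf("%q redirect=%v\n", ua, wantsSubdomainRedirect(ua))
	}
}
```

Under this sketch any client that sends a different User-Agent, or none, is treated like a browser, which is the gap the discussion points at: other tools that do not follow redirects would still break.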

My concern is that there are probably other tools that don't follow redirects either (although this looks rare). Option A is the most general option available, as far as I know, and it doesn't rely on user agent sniffing.

If it does bite us, I believe we can switch to option B in the future, right?

I feel it is a bit safer and more explicit behavior.

How?


I'm fine just whitelisting curl if option A causes issues, it's just that user-agent sniffing makes me uncomfortable.

I feel [user-agent sniff] is a bit safer and more explicit behavior.
How?

In my mind User-Agent sniffing (B) would be a bit "safer" because if it breaks something, the user can change or remove the User-Agent sent to go-ipfs and control gateway behavior that way:

$ curl -H "User-Agent:" 

If there is an issue with redirect+payload (A) user can't easily fix it on the client side alone, they need to either wait for new go-ipfs or put a reverse proxy in front of go-ipfs and fix responses on the fly.

To be fair, both hacks feel a bit bad to me and I'd rather ask people to fix their curl scripts by adding -L or simply switching from localhost to 127.0.0.1, which remains a path gateway anyway (C).

:point_right: that being said, if I had to pick between A and B I'd probably try to make A work, just because it feels more magical ;-)

I'm fine just whitelisting curl if option A causes issues, it's just that user-agent sniffing makes me uncomfortable.

I managed to fix the errors in the PoC for A (https://github.com/ipfs/go-ipfs/pull/6982), so right now both options are available.

If [A] does bite us, I believe we can switch to option B in the future, right?

Yes, we can switch from A to B at any time, code is in https://github.com/ipfs/go-ipfs/pull/6984

If there is an issue with redirect+payload (A) user can't easily fix it on the client side alone, they need to either wait for new go-ipfs or put a reverse proxy in front of go-ipfs and fix responses on the fly.

Well, my thinking here is that this is _less_ likely to go wrong because there's nothing special about it. I guess the worst-case scenario is if someone assumes that the HTTP response body for a redirect is always some kind of error message.

Sounds like A (301+payload) is the way to go then?

@Stebalien is it ok if I rebase&merge A (https://github.com/ipfs/go-ipfs/pull/6982) into subdomain PR at https://github.com/ipfs/go-ipfs/pull/6096, and close the other PR?

For now, yes.

For the record A (https://github.com/ipfs/go-ipfs/pull/6982) was merged with https://github.com/ipfs/go-ipfs/pull/6096 and released with go-ipfs 0.5.0.


To illustrate: asking dweb.link for CID bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
returns an HTTP 301 redirect to the subdomain, with the text payload "hello":

$ curl -sD - https://dweb.link/ipfs/bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
HTTP/2 301
server: nginx
date: Wed, 13 May 2020 19:05:41 GMT
content-type: text/plain; charset=utf-8
content-length: 5
accept-ranges: bytes
access-control-allow-methods: GET
cache-control: public, max-age=29030400, immutable
clear-site-data: "cookies", "storage"
etag: "bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq"
last-modified: Thu, 01 Jan 1970 00:00:01 GMT
location: https://bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq.ipfs.dweb.link/
x-ipfs-gateway-host: gateway-bank1-fra2
x-ipfs-path: /ipfs/bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
access-control-allow-origin: *
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
access-control-expose-headers: Content-Range, X-Chunked-Output, X-Stream-Output
x-ipfs-pop: gateway-bank1-fra2
strict-transport-security: max-age=31536000; includeSubDomains; preload

hello