From conversation with @stebalien and @jbenet we should only redirect to a subdomain gateway if the user is detected to be a browser node so as to avoid breaking curl and wget usage (curl doesn't auto-redirect without passing -L and wget won't span hosts on redirect). Note this is needed for 0.5 to avoid breaking existing usage.
Required for https://github.com/ipfs/go-ipfs/issues/6776
Ex what ipfs-search does:
curl http://localhost:8081/ipfs/QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv/readme
Potential solution: Just return the page with the redirect response but set the "Clear-Site-Data" header. CURL will download the page and won't follow the redirect (and will even politely exit with a zero status by default). Browsers should follow the redirect.
@Stebalien I implemented that idea (Option A - see https://github.com/ipfs/go-ipfs/pull/6982). Let me know what you think.
Personally I don't feel too confident with it, produces valid responses, but feels like a hack that could bite us in the future.
What if we go with a much simpler hack (Option B - see https://github.com/ipfs/go-ipfs/pull/6984)
User-Agent match ^(curl|wget)/iClear-Site-Data headerWould it be worse than A? I feel it is a bit safer and more explicit behavior.
My concern is that there are probably other tools that don't follow redirects either (although this looks rare). Option A is the most general option available, as far as I know, and it doesn't rely on user agent sniffing.
If it does bite us, I believe we can switch to option B in the future, right?
I feel it is a bit safer and more explicit behavior.
How?
I'm fine just white listing curl if option A causes issues, it's just that user-agent sniffing makes me uncomfortable.
I feel [user-agent sniff] is a bit safer and more explicit behavior.
How?
In my mind User-Agent sniff (B) would be bit "safer" because if it breaks something, user can change/remove User-Agent sent to go-ipfs and control gateway behavior that way:
$ curl -H "User-Agent:"
If there is an issue with redirect+payload (A) user can't easily fix it on the client side alone, they need to either wait for new go-ipfs or put a reverse proxy in front of go-ipfs and fix responses on the fly.
To be fair both hacks feel bit bad to me and I'd rather ask people to fix their curl scripts by adding -L or simply switching from localhost to 127.0.0.1 which remains path gateway anyway (C).
:point_right: that being said, if I had to pick between A and B I'd probably try to make A work, just because it feels more magical ;-)
I'm fine just white listing curl if option A causes issues, it's just that user-agent sniffing makes me uncomfortable.
I managed to fix errors in PoC for A (https://github.com/ipfs/go-ipfs/pull/6982), so right now both options are available
If [A] does bite us, I believe we can switch to option B in the future, right?
Yes, we can switch from A to B at any time, code is in https://github.com/ipfs/go-ipfs/pull/6984
If there is an issue with redirect+payload (A) user can't easily fix it on the client side alone, they need to either wait for new go-ipfs or put a reverse proxy in front of go-ipfs and fix responses on the fly.
Well, my thinking here is that this is _less_ likely to go wrong because there's nothing special about it. I guess the worst-case scenario is if someone assumes that the HTTP response body for a redirect is always some kind of error message.
Sounds like A (301+payload) is the way to go then?
@Stebalien is it ok if I rebase&merge A (https://github.com/ipfs/go-ipfs/pull/6982) into subdomain PR at https://github.com/ipfs/go-ipfs/pull/6096, and close the other PR?
For now, yes.
For the record A (https://github.com/ipfs/go-ipfs/pull/6982) was merged with https://github.com/ipfs/go-ipfs/pull/6096 and released with go-ipfs 0.5.0.
To illustrate: asking dweb.link for CID bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
returns HTTP 301 to subdomain, with text payload: hello:
$ curl -sD - https://dweb.link/ipfs/bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
HTTP/2 301
server: nginx
date: Wed, 13 May 2020 19:05:41 GMT
content-type: text/plain; charset=utf-8
content-length: 5
accept-ranges: bytes
access-control-allow-methods: GET
cache-control: public, max-age=29030400, immutable
clear-site-data: "cookies", "storage"
etag: "bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq"
last-modified: Thu, 01 Jan 1970 00:00:01 GMT
location: https://bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq.ipfs.dweb.link/
x-ipfs-gateway-host: gateway-bank1-fra2
x-ipfs-path: /ipfs/bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
access-control-allow-origin: *
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
access-control-expose-headers: Content-Range, X-Chunked-Output, X-Stream-Output
x-ipfs-pop: gateway-bank1-fra2
strict-transport-security: max-age=31536000; includeSubDomains; preload
hello