Go-ipfs: Proposal: manifest file for the IPFS gateway

Created on 13 Apr 2019 · 13 Comments · Source: ipfs/go-ipfs

Problem

Some people (myself included) are running websites through IPFS, and relying on gateways (like Cloudflare's) to serve them to users via HTTP(S).

The current IPFS protocol has some limitations when doing this. The biggest one is the inability to set custom headers that an HTTP web server might need, starting with Content-Type.

Proposed solution

I propose we create a manifest file that can be stored inside each folder added to IPFS. The manifest file is a YAML (or JSON) document, called for example .ipfs-gateway.yaml, that contains additional metadata relevant to IPFS gateways only. For example:

````yaml
# Version of this manifest format
version: 1

# Add rules to specific files/patterns
files:
  - name: 'logo.svg'
    contentType: 'image/svg+xml' # Set the Content-Type header
  - name: 'images/*.dng' # Use glob-style patterns
    contentType: 'image/x-adobe-dng'
    contentDisposition: 'attachment' # Set the Content-Disposition header
  - name: 'index.html'
    contentSecurityPolicy: '...' # Set the Content-Security-Policy header
    etag: 'abcdef123' # Set the ETag header
    contentLanguage: 'en-us' # Set the Content-Language header
  - name: 'redirect.html'
    redirect: 'other-page.html' # Set the Location header (requires a 3xx status code)

# Configure additional options for the gateway
options:
  # Redirects HTTP -> HTTPS traffic
  alwaysUseHTTPS: true
````

When the IPFS gateway serves a folder, it needs to check if there's a manifest file, and apply the rules configured in it.

The manifest allows adding certain HTTP headers to files served by the gateway. We should explicitly whitelist the allowed headers, because on a shared gateway a header set by one app could affect every other app served from it (e.g. imagine someone deployed an app that enabled HSTS: that would impact the entire gateway).

The manifest file should be placed in the root of the folder added to IPFS. Since it's just another document published through the IPFS network, a change in the manifest file would result in the entire folder having a completely different hash, and this is by design.
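A minimal sketch of how a gateway might apply such a manifest, written in Go and assuming a JSON encoding of the example above. The type names, the `HeadersFor` helper, and the "later rules override earlier ones" behavior are all assumptions made for illustration; nothing here is part of go-ipfs. The struct fields double as the header whitelist: keys the gateway does not know about are never turned into headers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// FileRule mirrors one entry of the "files" list in the proposed manifest.
// Only a few of the proposed keys are modeled here (hypothetical subset).
type FileRule struct {
	Name               string `json:"name"`
	ContentType        string `json:"contentType,omitempty"`
	ContentDisposition string `json:"contentDisposition,omitempty"`
	ContentLanguage    string `json:"contentLanguage,omitempty"`
}

// Manifest is the top-level document stored in the root of the folder.
type Manifest struct {
	Version int        `json:"version"`
	Files   []FileRule `json:"files"`
}

// HeadersFor returns the whitelisted headers for a path relative to the
// folder root. Rules are applied in order, so a later matching rule
// overrides an earlier one (an assumption, the proposal does not say).
func HeadersFor(m Manifest, relPath string) map[string]string {
	headers := map[string]string{}
	set := func(key, value string) {
		if value != "" {
			headers[key] = value
		}
	}
	for _, rule := range m.Files {
		// path.Match implements glob-style patterns; "*" does not cross "/".
		matched, err := path.Match(rule.Name, relPath)
		if err != nil || !matched {
			continue
		}
		set("Content-Type", rule.ContentType)
		set("Content-Disposition", rule.ContentDisposition)
		set("Content-Language", rule.ContentLanguage)
	}
	return headers
}

func main() {
	raw := []byte(`{
	  "version": 1,
	  "files": [
	    {"name": "logo.svg", "contentType": "image/svg+xml"},
	    {"name": "images/*.dng", "contentType": "image/x-adobe-dng", "contentDisposition": "attachment"}
	  ]
	}`)
	var m Manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		panic(err)
	}
	fmt.Println(HeadersFor(m, "images/photo.dng"))
}
```

Because the manifest lives inside the folder, the gateway would read it from the same root hash it is serving, so the rules and the content can never drift apart.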

Alternative proposals

There have been many users asking to implement custom metadata/headers for files inside IPFS, including on https://github.com/ipfs/faq/issues/224. I believe that, while the ask was for the ability to add metadata to files published on IPFS, in reality what users want/need could be better satisfied with a proposal like this.

Compared to adding support for metadata in IPFS, this proposal has many pros:

- It's easier to implement as it doesn't require changes to the IPFS protocol, and users on old versions of IPFS would simply ignore the manifest.
- The manifest file would be picked up by the IPFS gateway only, and that's good. Users who just request documents via the CLI don't need metadata anyway.
- Adding metadata to each file is cumbersome when you try to add/pin files using the IPFS APIs and CLIs. For example, you couldn't just do ipfs add -r folder/ anymore, and the CLI would become complex fast.
- The manifest file is extensible in the future, should we want to use it for other gateway configuration.
- The manifest file becomes part of the folder published on IPFS, so a change in the document would lead to the folder having a completely different hash. I believe this should be considered a feature, as it maintains the immutability principle of IPFS.
- Lastly, manifest files can be checked into source control together with the web app published on IPFS.

The cons:

- It requires adding another file to the published folder.
- It doesn't work for files published on IPFS that aren't part of a folder.

kind/enhancement status/accepted topic/gateway


ItalyPaleAle

👍4

Most helpful comment

@kevincox I disagree on HSTS. It should be something set at the infrastructure level and not at the app level. Also, let's say you deploy an app, serve it on that custom domain, and enable HSTS. If the next version of the app does not contain that key in the manifest, you can't easily roll back HSTS.

As for format... My preference would be to support both YAML and JSON if possible. We could use the same file and parse it with a YAML parser. Since YAML is a superset of JSON, there shouldn't be any issues.

ItalyPaleAle on 27 Feb 2020

👍3

All 13 comments

I like this approach but we have to be careful:

- We should only allow headers that are safe to set.
- UnixFSv2 might start moving again soon, and we might want to use it for some of this.

magik6k on 16 Apr 2019

👍2

We've been waiting for any sort of progress on UnixFS for a very long time. It's starting to get to the point where I'd rather have something that works now than something that is always at some point in the future. Specifically: mimetype support.

Concretely, there haven't been any changes to UnixFSv2 for over a year!
https://github.com/ipfs/unixfs-v2

dokterbob on 5 May 2019

I am not very familiar with UnixFSv2, but echoing @dokterbob I am wondering too why this would have a dependency on that?

ItalyPaleAle on 5 May 2019

We've done some soul searching recently on mime-types and realized that, really, they probably don't belong in the filesystem itself anyways. Even if we _did_ store them in the filesystem, you're right about tooling.

Given that, I like this proposal and would be happy to accept a patch (although, to be realistic, I may take a while to review it).

Changes from the original proposal:

- I wouldn't support alwaysUseHTTPS.
  - Using HTTPS should be a gateway-level option.
  - Really, all public gateways should use HTTPS.
- I would use JSON. While it sucks to write, this is a part of a protocol. We can always add toml/yaml support later if this becomes too much trouble.

Thank you @ItalyPaleAle for taking the time to think this through.

Stebalien on 16 Nov 2019

👍2 🎉1

Very related:

- An idea for customising 404 handling, which raises the idea of an htaccess-like mechanism for configuring the gateways: https://github.com/ipfs/go-ipfs/pull/4233#issuecomment-337359457
- This proposal skates close to the cohosted service-worker of doom (https://github.com/ipfs/go-ipfs/issues/4025) but dodges it, because it doesn't let you set content types for raw files, only for things contained in directories. Still, it's worth being aware of.

olizilla on 26 Nov 2019

👍2

I am in favor of something like this specification for gateways. One of our issues in adopting IPFS for our frontend is that we serve HSTS preload and CSP headers on our domains, making it necessary for these headers to be present on any gateway solution. While this can be solved by making our own gateway, that defeats the purpose of the decentralized nature of IPFS.

I do wonder about integrating HTTPS and custom certificates into gateways as well. Does anyone have a solution to that? Perhaps a DNS-level solution for the gateway to provide a valid certificate?

EDIT:

- I support the alwaysUseHTTPS toggle, as having valid HTTP -> HTTPS upgrades on the connection is required for HSTS preload websites.
- I would use YML, not JSON. While I understand the "purity" of JSON, YML is a commonly accepted configuration language (see: Serverless.com, Swagger, AWS CloudFormation), especially relevant to the group of programmers that would use IPFS gateways for content.

rhyeal on 26 Nov 2019

@rhyeal I think that @Stebalien's point, which I do subscribe to, is that all decisions related to the transport layer, such as enabling or enforcing TLS, or adding HSTS, should be done outside of the gateway. Indeed, you likely don't want the ipfs-gateway directly exposed on the Internet, but you should proxy it with Nginx or something similar. You can then enable HTTPS and HSTS on the gateway.

ItalyPaleAle on 26 Nov 2019

👍3

Note that gateways are also used for serving sites on custom domains. For these domains the user may want to enforce things such as HSTS. I think that there should be a suggested set of allowed headers for shared-domain gateways, while custom-domain gateways can use whatever headers they would like.
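One way to picture this split is a filter that depends on how the gateway is being used. The Go sketch below is only an illustration: the `GatewayMode` type, the `FilterHeaders` helper, and the contents of the shared-domain list are assumptions, not an agreed-upon set.

```go
package main

import (
	"fmt"
	"net/http"
)

// GatewayMode distinguishes a gateway shared by many sites from one that
// serves a single site on its own domain (hypothetical type).
type GatewayMode int

const (
	SharedDomain GatewayMode = iota
	CustomDomain
)

// sharedAllowed is an example allowlist for shared-domain gateways.
// Keys are in canonical form as produced by http.CanonicalHeaderKey.
var sharedAllowed = map[string]bool{
	"Content-Type":            true,
	"Content-Disposition":     true,
	"Content-Language":        true,
	"Content-Security-Policy": true,
}

// FilterHeaders keeps only the headers permitted for the given mode.
// In CustomDomain mode every requested header passes through.
func FilterHeaders(mode GatewayMode, requested map[string]string) map[string]string {
	out := map[string]string{}
	for key, value := range requested {
		canonical := http.CanonicalHeaderKey(key)
		if mode == CustomDomain || sharedAllowed[canonical] {
			out[canonical] = value
		}
	}
	return out
}

func main() {
	requested := map[string]string{
		"content-type":              "text/html",
		"Strict-Transport-Security": "max-age=31536000",
	}
	fmt.Println("shared:", FilterHeaders(SharedDomain, requested))
	fmt.Println("custom:", FilterHeaders(CustomDomain, requested))
}
```

On a shared gateway the HSTS header is dropped, which is the case the original proposal worries about; on a custom domain it passes through.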

kevincox on 27 Feb 2020

I would also highly recommend JSON for the manifest as it is ubiquitous. YAML parsers have notable variation in what they accept, which could cause compatibility problems. We can consider automatically compiling YAML to JSON on upload; however, the actual protocol should be JSON.

kevincox on 27 Feb 2020

ItalyPaleAle on 27 Feb 2020

👍3

I guess I can be convinced about HSTS specifically. However, it would be nice if there was a portable way to specify this in case I ever need to move between gateways. Still, I can see the argument that it can be infrastructure-specific.

kevincox on 28 Feb 2020

I think the advantages of YAML are small compared to allowing the user to write YAML and convert it to JSON before storing.

Note that YAML also has security concerns including insecure features (often disabled by default but this is a footgun that will keep happening over time) as well as Denial of Service options (a small YAML file can expand to consume arbitrary amounts of RAM). In the end you need to restrict to a subset of YAML which will cause confusion, interoperability concerns and security vulnerabilities (when people fail to make this restriction).

I strongly recommend that we stick to something simple (like JSON) for the protocol end of things.

I agree that YAML can be simpler to write, however we can benefit from that while still keeping the protocol efficient, simple and safe.
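To make the "simple and safe" point concrete, here is a hedged Go sketch of strict JSON parsing on the gateway side. The size cap, the `ParseManifest` name, and the struct layout are assumptions for illustration. It uses only the standard `encoding/json` decoder, which has no aliases or anchors to expand, and rejects unknown keys and trailing data.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// maxManifestBytes is a hypothetical upper bound on manifest size.
const maxManifestBytes = 64 << 10

type FileRule struct {
	Name        string `json:"name"`
	ContentType string `json:"contentType,omitempty"`
}

type Manifest struct {
	Version int        `json:"version"`
	Files   []FileRule `json:"files"`
}

// ParseManifest decodes a manifest strictly: bounded size, no unknown
// keys, exactly one JSON value, and a supported version number.
func ParseManifest(raw []byte) (Manifest, error) {
	var m Manifest
	if len(raw) > maxManifestBytes {
		return m, errors.New("manifest too large")
	}
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&m); err != nil {
		return m, fmt.Errorf("invalid manifest: %w", err)
	}
	// After the first value, only whitespace may remain.
	if _, err := dec.Token(); err != io.EOF {
		return m, errors.New("unexpected data after manifest")
	}
	if m.Version != 1 {
		return m, fmt.Errorf("unsupported manifest version %d", m.Version)
	}
	return m, nil
}

func main() {
	m, err := ParseManifest([]byte(`{"version": 1, "files": [{"name": "logo.svg", "contentType": "image/svg+xml"}]}`))
	fmt.Println(m, err)
}
```

Authors could still write YAML locally and convert it to JSON before adding the folder, as suggested above; the gateway would only ever see the JSON form.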

kevincox on 28 Feb 2020

This is looking really interesting.
Regarding redirects: web apps can perform redirects client-side, but for SEO purposes a server-side redirect would be preferred.
So it would be nice to be able to set a status code in the config:

... 

# Add rules to specific files/patterns
files:
  - name: 'redirect.html'
    redirect: 'other-page.html' # Set the Location header (requires a 3xx status code)
    statusCode: 302 # default could be 301

...
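A gateway honoring the redirect rule above might look roughly like the following Go sketch. The `RedirectRule` type and `RedirectStatus` helper are hypothetical; the default of 301 follows the comment in the snippet, and restricting the code to common 3xx redirect statuses is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// RedirectRule models the redirect-related keys of one manifest entry.
type RedirectRule struct {
	Name       string
	Redirect   string
	StatusCode int // 0 means "not set in the manifest"
}

// RedirectStatus returns the status code to use for a rule, defaulting
// to 301 and rejecting anything that is not a redirect status.
func RedirectStatus(r RedirectRule) (int, error) {
	if r.StatusCode == 0 {
		return http.StatusMovedPermanently, nil
	}
	switch r.StatusCode {
	case http.StatusMovedPermanently, http.StatusFound, http.StatusSeeOther,
		http.StatusTemporaryRedirect, http.StatusPermanentRedirect:
		return r.StatusCode, nil
	}
	return 0, fmt.Errorf("status code %d is not a supported redirect", r.StatusCode)
}

// Handler builds an HTTP handler that issues the redirect for a rule.
func Handler(r RedirectRule) (http.HandlerFunc, error) {
	code, err := RedirectStatus(r)
	if err != nil {
		return nil, err
	}
	return func(w http.ResponseWriter, req *http.Request) {
		// http.Redirect sets the Location header and writes the status.
		http.Redirect(w, req, r.Redirect, code)
	}, nil
}

func main() {
	rule := RedirectRule{Name: "redirect.html", Redirect: "other-page.html", StatusCode: 302}
	code, err := RedirectStatus(rule)
	fmt.Println(code, err)
}
```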