js-ipfs: Embracing web-native FormData / File where possible

Created on 11 May 2020  ·  14 comments  ·  Source: ipfs/js-ipfs

As discussed at https://github.com/ipfs/js-ipfs/issues/2838#issuecomment-599688414, mainstream browsers today do not yet support passing a ReadableStream as a request body. To work around this limitation, js-ipfs currently encodes the stream of content (on ipfs.add / ipfs.files.write) into multipart/form-data by buffering it:

https://github.com/ipfs/js-ipfs/blob/cb5b9ec420431e43c26d183f76e2bd5e4a8bdf77/packages/ipfs-http-client/src/add.js#L13-L23
https://github.com/ipfs/js-ipfs/blob/cb5b9ec420431e43c26d183f76e2bd5e4a8bdf77/packages/ipfs-http-client/src/lib/multipart-request.js#L58-L64

https://github.com/ipfs/js-ipfs/blob/cb5b9ec420431e43c26d183f76e2bd5e4a8bdf77/packages/ipfs-http-client/src/lib/to-stream.browser.js#L11-L21

This is problematic for webui, for instance, as some files may not fit into memory or may exceed the browser's quota.


Modern browsers do have native FormData and File / Blob primitives that, if used carefully, can facilitate streaming. It appears that in the past browsers loaded all of the FormData / File / Blob content during fetch despite not reading it. However, that has evidently been fixed, and modern browsers (tested with Firefox, Chrome, Safari) no longer exhibit that problem.

The following example was used to verify this behavior:
https://ipfs.io/ipfs/QmWVrTAeA1FqRSK3owpCfvB69wsz2Dbr2THK6ehn5uEKbp
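For reference, the core of that verification can be sketched as follows. This is a minimal stand-in, not the linked example itself; the '/post' endpoint is a placeholder. The key observation is that appending a large Blob / File to FormData does not read its bytes up front:

```javascript
// Stand-in for a large file: appending it to FormData does not read the
// bytes; whether the upload streams is decided by the browser when the
// request body is serialized. '/post' is a placeholder endpoint.
const bigBlob = new Blob([new Uint8Array(1024 * 1024)])

const data = new FormData()
data.append('file', bigBlob, 'bigfile.bin')

// In a browser, watch the memory profile while this runs:
// fetch('/post', { method: 'POST', body: data })
```

With a multi-gigabyte File from an `<input type="file">` in place of `bigBlob`, memory usage can be observed in the browser's task manager during the upload.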

On my Mac I do not observe a visible increase in memory when uploading a 5 GB file (across the mentioned browsers). I also observe streaming behavior, as the file size shown in WebUI changes during the upload:

(Screenshots from 2020-05-08 showing the uploaded file's size growing in WebUI during the upload.)


This investigation suggests that there is an opportunity to avoid the buffering overhead in browsers. It also appears that doing it for ipfs.files.write should be fairly straightforward, since it always writes a single file.

For ipfs.add things appear more complicated, as headers need to be added to each multipart part (e.g. unix mode & mtime). Some investigation is required to see how feasible that is with native FormData. There are also some concerns about supporting directory structures.

Labels: P1 · dif/expert · browser · kind/maintenance · topic/perf

All 14 comments

> There are also some concerns about supporting directory structures.

As per the HTTP API docs:

> Directory parts have a special content type application/x-directory. These parts do not carry any data. The part headers look as follows:

Content-Disposition: form-data; name="file"; filename="folderName"
Content-Type: application/x-directory

I suppose that is the problem that can't be immediately addressed via browser APIs. However, I wonder if it is worth considering browser limitations in the HTTP API so that such metadata could be encoded differently. Given that new FormData().append(name, content, fileName) provides a way to specify the name and filename metadata, I see two options:

  1. filename may contain the path delimiter /. Is that not a good enough clue that it is a directory entry?
  2. Maybe instead of name="file" something else, e.g. name="directory", could be used?

I did some investigation which leads me to believe that directory structures are not a problem. The following code

let data = new FormData()
let dir = new File([], 'folderName', {
  lastModified: new Date('1995-12-17T03:24:00'),
  type: 'x-directory'
})
let file = new File(['could uploded file'], 'folderName/demo.txt', {
  lastModified: new Date('1995-12-18T03:24:00'),
  type: 'plain/text'
})
data.append('file', dir)
data.append('file', file)


fetch('/post', {
  method: "POST",
  body: data
})

Produces the following request body:

-----------------------------1783252016325641052066437817
Content-Disposition: form-data; name="file"; filename="folderName"
Content-Type: x-directory


-----------------------------1783252016325641052066437817
Content-Disposition: form-data; name="file"; filename="folderName/demo.txt"
Content-Type: plain/text

could uploded file
-----------------------------1783252016325641052066437817--

However, it also turns out that the lastModified metadata is ignored.

Now, as far as I can tell, there are three other headers (mode, mtime, mtime-nsecs) currently being used (although not documented in the HTTP API docs):

https://github.com/ipfs/js-ipfs/blob/cb5b9ec420431e43c26d183f76e2bd5e4a8bdf77/packages/ipfs-http-client/src/lib/multipart-request.js#L29-L43

I wonder if we could allow providing metadata as a prefix of the content, e.g. the following code

let data = new FormData()
let dir = new File([], 'folderName', {
  lastModified: new Date('1995-12-17T03:24:00'),
  type: 'x-directory'
})

let rawFile = new File(['could uploded file'], 'demo.txt')
let stat = [
  `mode: 0666`,
  `mtime: ${new Date('1995-12-18T03:24:00').getTime()}`,
  `\r\n` // trailing entry yields the blank line separating headers from content
]

let fileWithMeta = new Blob([stat.join('\r\n'), rawFile], {
  type: 'x-file'
})
data.append('file', dir)
data.append('file', fileWithMeta, 'folderName/demo.txt')


fetch('/post', {
  method: "POST",
  body: data
})

Produces the following request body:

-----------------------------72860279640036950501085036757
Content-Disposition: form-data; name="file"; filename="folderName"
Content-Type: x-directory


-----------------------------72860279640036950501085036757
Content-Disposition: form-data; name="file"; filename="folderName/demo.txt"
Content-Type: x-file

mode: 0666
mtime: 819285840000

could uploded file
-----------------------------72860279640036950501085036757--

This also would not require a breaking change to the HTTP API, but rather a backwards-compatible extension that recognizes the x-file content type as an encoding of headers + content.
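For illustration, the server side of that extension could decode such a part by splitting it at the first blank line. This is a hypothetical sketch of the x-file encoding proposed above, not existing HTTP API behavior, and decodeXFilePart is a made-up name:

```javascript
// Hypothetical decoder for the proposed `x-file` part encoding: a block of
// "name: value" header lines, a blank line, then the raw file content,
// mirroring the Blob built in the example above.
function decodeXFilePart (text) {
  const separator = '\r\n\r\n'
  const index = text.indexOf(separator)
  const headerBlock = index === -1 ? '' : text.slice(0, index)
  const content = index === -1 ? text : text.slice(index + separator.length)

  const headers = {}
  for (const line of headerBlock.split('\r\n')) {
    const colon = line.indexOf(':')
    if (colon !== -1) {
      headers[line.slice(0, colon).trim()] = line.slice(colon + 1).trim()
    }
  }
  return { headers, content }
}
```

Given the part body from the example above, this yields { headers: { mode: '0666', mtime: '819285840000' }, content: '…' }, so existing parts without the x-file content type would be unaffected.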

Alternatively, we could implement our own FormData encoder that uses the Blob API and avoids reading the underlying data, e.g.:

class FormDataWriter {
  constructor() {
    // Accumulates one Blob per multipart part
    this.parts = []
  }
  append(name, value, filename, headers = {}) {
    const partHeaders = []
    // Note: File is a subclass of Blob, so a File passed without an explicit
    // `filename` argument falls into the "blob" branch below.
    const partFilename =
      filename != null
        ? encodeURIComponent(filename)
        : value instanceof Blob
        ? "blob"
        : value instanceof File
        ? encodeURIComponent(value.name)
        : null

    const contentDisposition =
      partFilename == null
        ? `form-data; name="${name}"`
        : `form-data; name="${name}"; filename="${partFilename}"`

    partHeaders.push(`Content-Disposition: ${contentDisposition}`)

    // File is a subclass of Blob so we don't need a separate check for it.
    // Also, if a `Content-Type` header is passed there is no need to add one.
    if (value instanceof Blob && headers["Content-Type"] == null) {
      partHeaders.push(`Content-Type: ${value.type}`)
    }

    for (const [headerName, headerValue] of Object.entries(headers)) {
      partHeaders.push(`${headerName}: ${headerValue}`)
    }

    // A blank line (\r\n\r\n) separates the part headers from the content
    this.parts.push(new Blob([partHeaders.join("\r\n"), "\r\n\r\n", value]))
  }
  // nanoid() stands in for any random-string helper (e.g. the nanoid package)
  toBlob(boundary = `-----------------------------${nanoid()}`) {
    const chunks = []
    for (const part of this.parts) {
      // Each part: boundary line, the part blob, then a trailing CRLF
      chunks.push(`--${boundary}\r\n`, part, `\r\n`)
    }
    chunks.push(`--${boundary}--\r\n`)
    return new Blob(chunks)
  }
}

With that, the former example could be updated to include custom headers with the parts:

let data = new FormDataWriter()
let dir = new File([], 'folderName', {
  lastModified: new Date('1995-12-17T03:24:00'),
  type: 'x-directory'
})

let file = new File(['could uploded file'], 'demo.txt')

data.append('file', dir)
data.append('file', file, 'folderName/demo.txt', {
  mode: '0666',
  mtime: file.lastModified,
})

let boundary = `-----------------------------${nanoid()}`
fetch('/post', {
  method: "POST",
  body: data.toBlob(boundary),
  headers: {
    'Content-Type': `multipart/form-data; boundary=${boundary}`
  }
})

Which produces the following request body:

-------------------------------UFeaHa0FIdCE_8YhUw-5T
Content-Disposition: form-data; name="file"; filename="blob"
Content-Type: x-directory

-------------------------------UFeaHa0FIdCE_8YhUw-5T
Content-Disposition: form-data; name="file"; filename="folderName%2Fdemo.txt"
Content-Type: 
mode: 0666
mtime: 1589225417186

could uploded file
-------------------------------UFeaHa0FIdCE_8YhUw-5T--

However, there is a major downside to this approach:

  • It implies that any user of the IPFS HTTP API needs to import this or some other custom FormData encoder to use the API. This is why I think it would be better to take into account the limitations of the web platform and make the HTTP API compatible with them (as proposed in the previous comment, or possibly something else).

I decided to write this here instead of in #3033, as this seems to be more on topic (streaming in the browser during HTTP POST).

We use ipfs.add in both ipfs-companion and ipfs-webui (cc @rafaelramalho19).
Both apps use the old API, but there is WIP for switching to the async-iterators version.

The problem of ipfs-http-client buffering the entire thing (https://github.com/ipfs/js-ipfs/issues/2863) is still present in the new API and is really painful, as it impacts the initial user experience when interacting with Web UI.

My thoughts on the matter:

  • It is CRITICAL to leverage FormData / File / Blob to facilitate streaming uploads in ipfs-http-client running in the browser

    • as noted by @Gozala above, it seems vendors fixed it and it is now possible to do it

    • ipfs.add and ipfs.files.write should just "do the right thing"

    • @achingbrain what are your thoughts on special-casing inputs with those types and using browser-specific normalization that does not break streaming? Note the idea here is to keep the existing API intact (unlike https://github.com/ipfs/js-ipfs/issues/3033, which is about creating something new)

  • We are missing reliable progress reporting

    • right now ipfs.add reports 0% and then jumps to 100% after the entire thing is accepted on the other end

  • Additional metadata attributes (mode and mtime) are optional:

    • we call them "unixfsv1.5" internally

    • context: https://github.com/ipfs/specs/issues/217

    • afaik unixfsv1.5 is not even implemented by go-ipfs (only js-ipfs in Node atm)

    • the way I see it:

    • update core-api docs to indicate that those are optional and opt-in

    • most users don't need them; if the choice is between supporting them and streaming uploads in the browser, we should pick streaming and update the docs noting those attributes are supported only in Node for now.

Does this sound sensible?

> • We are missing reliable progress reporting
>
>   • right now ipfs.add reports 0% and then jumps to 100% after the entire thing is accepted on the other end

It is still possible to get progress reporting by using old-school XHR upload progress events (xhr.upload.onprogress). The API is not as ergonomic as fetch, but it can send the same data types, although I have not verified that it does not load the whole thing into memory before upload. I'm assuming it will stream, but that needs verification to be sure.

> Additional metadata attributes (mode and mtime) are optional:

I do not believe we need to choose between metadata and no buffering; I believe my last two comments illustrate how we could get both. That said, it would require either:

  1. Some changes to the HTTP API so metadata is encoded / provided differently.
  2. A custom FormData encoder implementation that builds up the request body as a blob.

It is worth considering the trade-offs between those two. I personally would prefer to keep the client as lean & simple as possible, so that using the HTTP API requires no client library; it's just there for convenience.

Breaking changes to the HTTP API /api/v0 are something we should avoid, if possible.
(People still struggle with the switch to the POST-only API, slowing down lib upgrades etc.)

If support for opt-in mode and mtime is the only reason to change the HTTP API, I'd go as far as to say that we could simply not support those in browser contexts and nobody would care. AFAIK it is a niche feature used in CLI/daemon contexts.

But the need for a fix is there.
I often stumble upon new prior art of userland support/workarounds for FormData etc:

  • https://www.npmjs.com/package/ipfs-mini (2.76 kB compressed)
    > [..] in the browser, it uses a very light FormData Blob handling proceedure [..]

I believe that, as a bare minimum, we need to improve ipfs-http-client's ipfs.add in browser as part of Pinning Service integration (https://github.com/ipfs/ipfs-gui/issues/91) work in Q3.

Importing files with ipfs-webui (over the HTTP API) should be fast and work with files >4GB.
(Use case: I want to drag & drop my 4K video into WebUI and then pin it to a remote service.)

> Breaking changes to the HTTP API /api/v0 are something we should avoid, if possible.
> (People still struggle with the switch to the POST-only API, slowing down lib upgrades etc.)
>
> If support for opt-in mode and mtime is the only reason to change the HTTP API, I'd go as far as to say that we could simply not support those in browser contexts and nobody would care. AFAIK it is a niche feature used in CLI/daemon contexts.

I should point out that the changes I mention are not breaking changes; they would just allow passing those options in an alternative way. If users choose not to pass them, or pass them in the old fashion, it should still work.

I will try to drive this effort as my time allows (right now I'm blocked on reviews). With that, I intend to do the following:

  1. Add this item to the agenda of IPFS Core Implementations Weekly meeting.
  2. Schedule a Design Review Proposal

    • We need a decision on what changes, if any, could be made to the HTTP API to make it compatible with browser-native FormData.

    • Get a consensus on how ipfs.add / ipfs.files.write should behave here.

    • How do we test this? I'm hesitant to add 4 GiB of data to our tests.

As with #3022, I remain very skeptical of keeping the ipfs.add API as is in the hope that the provided input is going to allow for the optimizations described. That is because it is just too easy to pass an input that defeats the optimizations described here.

Unless I'm missing something, I think the simplest thing to do here is to write a browser-specific version of /packages/ipfs-http-client/src/lib/multipart-request.js that probes the input: if it's a Blob or { content: Blob } or a (non-async) iterable of either of them, use the native FormData object as proposed here. If it's not, fall back to the default normalisation. That should cover the drag & drop use case @lidel is interested in, and we can apply further optimisations later down the line as and when necessary.
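A sketch of what that probe might look like (canUseNativeFormData and isBlobLike are hypothetical names, not existing ipfs-http-client functions):

```javascript
// A value qualifies if it is a Blob (File is a Blob subclass, so one
// check covers both) or a { content: Blob } wrapper.
const isBlobLike = (value) =>
  value instanceof Blob ||
  (value != null && value.content instanceof Blob)

// Decide whether the input can go through native FormData without
// breaking streaming; anything else falls back to default normalisation.
function canUseNativeFormData (input) {
  if (isBlobLike(input)) return true
  // A sync iterable (e.g. an Array or FileList) of Blob-likes also
  // qualifies; strings are iterable but should fall back.
  if (input != null &&
      typeof input !== 'string' &&
      typeof input[Symbol.iterator] === 'function') {
    const items = [...input]
    return items.length > 0 && items.every(isBlobLike)
  }
  return false
}
```

Note that async iterables deliberately fail the probe, since consuming them forces the client back into the buffering path.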

> How do we test this? I'm hesitant to add 4 GiB of data to our tests.

There's always the age-old hack of using it-buffer-stream or something similar to generate buffers filled with 0s: 4 GiB of data will be transferred, but IPFS block de-duping will only write one block to the blockstore, and no data files need to be added to the tests.
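A hand-rolled stand-in for it-buffer-stream along these lines would let tests push gigabytes through the client while only ever allocating one small chunk at a time (the zeroes helper is illustrative, not part of any package):

```javascript
// Yields `total` bytes as zero-filled chunks of up to `chunkSize` bytes.
// Every full chunk has identical content, so IPFS dedupes them to a
// single data block in the blockstore.
function * zeroes (total, chunkSize = 1024 * 1024) {
  while (total > 0) {
    const size = Math.min(chunkSize, total)
    yield new Uint8Array(size) // Uint8Array is already zero-filled
    total -= size
  }
}
```

Passing something like { path: 'big.bin', content: zeroes(4 * 1024 ** 3) } to ipfs.add would then transfer 4 GiB over the wire while writing essentially one deduplicated data block.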

I could get this working in Safari with a nearly 300 MB video, but it crashes in Chrome, Brave, Firefox...
I think Safari has a 300 MB buffer limit and the rest of the browsers have a 150 MB buffer limit.

I think FormData is a must for the browser... all the buffer-stream methods stop working in the browser when they reach 300 MB.
You can see Safari complain about memory when it is buffering the last file, which is 271 MB itself.

https://youtu.be/J4CvwfRpHeA

  // `uploadStatus` and `percent` are UI state defined elsewhere in the component
  const updateProgress = (bytesSent, fileName, fileSize) => {
    uploadStatus = uploadStatus + "."
    percent = (bytesSent / fileSize) * 100
    //console.log(percent + "%")
  }

  const createFile = (file) => {
    return {
      path: file.name,
      // content could be a stream, a url, a Buffer, a File etc
      content: file
    }
  }

  const streamFiles = async (node, files) => {
    const chunkSize = 1048576 // 1 MiB

    for (const file of files) {
      uploadStatus = uploadStatus + "\n"

      const fileName = file.name
      const fileSize = file.size

      // Create a stream that yields this file
      const stream = new ReadableStream({
        start (controller) {
          controller.enqueue(createFile(file))
          // When we have no more files to add, close the stream
          controller.close()
        }
      }, {
        highWaterMark: chunkSize
      })

      for await (const data of node.add(stream, {
        pin: false,
        chunker: 'size-' + chunkSize,
        wrapWithDirectory: false,
        progress: (bytesSent) => updateProgress(bytesSent, fileName, fileSize)
      })) {
        // Top-level entries have no '/' in their path; skip intermediate ones
        if (data.path.indexOf('/') === -1 && data.path !== '') {
          try {
            uploadStatus = uploadStatus + "\n" + data.path + ": " + data.cid

            const src = "/ipfs/" + data.cid
            const dst = "/" + data.path
            // await so the catch below actually sees failures
            await node.files.cp(src, dst)
          } catch (err) {
            // ignore: folder already exists (ERR_FOLDER_EXISTS)
          }
        }
      }
    }
  }

For anyone following along, there are two PRs that aim (among other things) to improve the performance of ipfs.add in ipfs-http-client in the browser context by addressing the findings from this issue in alternative ways:

Was fixed by #3184

