Please specify what version of the library you are using: [ 2.0.13 ]
Please specify what version(s) of SharePoint you are targeting: [ SPO ]
The main question: how to upload a >2 GB file to SPO without loading it into memory with fs.readFile()?
I do not see a good example of uploading very large local files with pnpjs under nodejs.
With the help of @koltyakov I managed to adapt the example of using nodejs streams shown here:
https://pnp.github.io/pnpjs/sp/files/#adding-a-file-using-nodejs-streams
to use fs.createReadStream(), but I hit the SharePoint Online limitation -> File size (request body) is limited to 262144000 bytes (more info: https://github.com/koltyakov/pnp-upload/issues/6).
Unfortunately, that's 262 MB (250 MiB), so not really workable for my 2 GB and bigger files.
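For reference, a rough sketch of that attempt (assuming sp is already configured with a Node fetch client such as SPFetchClient from @pnp/nodejs; the path, site URL, and cast are placeholders):

import { createReadStream } from "fs";
import { sp } from "@pnp/sp/presets/all";

async function uploadViaStream(): Promise<void> {
  // Passing a read stream avoids fs.readFile, but the whole stream still
  // becomes a single request body, so SPO rejects anything over ~250 MB.
  const stream = createReadStream("/data/huge-file.bin");
  await sp.web
    .getFolderByServerRelativeUrl("/sites/dev/Shared Documents")
    .files.add("huge-file.bin", stream as any, true); // cast: the typings expect string | Blob | ArrayBuffer
}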
Up to 2 GB can be uploaded with addChunked(), but that requires loading the file into memory first in order to chunk it.
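Sketched under the same assumptions, that approach looks roughly like this, and it is the readFile step that forces the whole file into memory:

import { promises as fsp } from "fs";
import { sp } from "@pnp/sp/presets/all";

async function uploadViaAddChunked(): Promise<void> {
  // readFile pulls the whole file into a Buffer (Buffers themselves top out
  // around 2 GB), and only then can addChunked slice it into chunks.
  const content = await fsp.readFile("/data/huge-file.bin");
  await sp.web
    .getFolderByServerRelativeUrl("/sites/dev/Shared Documents")
    .files.addChunked("huge-file.bin", content as any, undefined, true); // cast: the typings expect a Blob
}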
Curious what you think we can do if the service has a 2 GB limit?
Edit: Not being glib, but it is a service limitation. Did you have an idea or proposal? So far as I know that is the current file size upload limit. We support addChunked as you note, which can support files of any size logically, so you are hitting the service limit beyond that.
Nik is referring to a slightly different thing.
The file size limit for a file in an SPO library is greater than 2 GB (10 or 100 GB). Yet with the ordinary file.add, even when providing a readable stream to the method, the limit is 250 MB, as that is the maximum size of a request body. Here we can do nothing.
With addChunked, it's possible to upload up to the 10/100 GB limit (a brave assumption), but... addChunked uses Blob/ArrayBuffer, so to upload a file its whole body has to be loaded into memory first and then passed to the method. With a readable stream, the next chunk read could happen in a stream version of addChunked (which I could create and PR) right before calling continueUpload/the other internal methods used in addChunked. So the memory consumption would be sized not to the file volume but to a single chunk.
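A conceptual sketch of that idea (the uploadChunk callback here is hypothetical and stands in for the startUpload/continueUpload/finishUpload internals, which aren't shown):

import { createReadStream } from "fs";

// Hypothetical "send this chunk to SPO" step.
type UploadChunk = (chunk: Buffer, offset: number) => Promise<void>;

async function uploadFromStream(filePath: string, chunkSize: number, uploadChunk: UploadChunk): Promise<void> {
  // highWaterMark = chunkSize means the stream hands out chunkSize-sized
  // pieces; only the current piece is held in memory while the upload awaits.
  const rs = createReadStream(filePath, { highWaterMark: chunkSize });
  let offset = 0;
  for await (const chunk of rs) {
    const buf = chunk as Buffer;
    await uploadChunk(buf, offset);
    offset += buf.length;
  }
}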
Before creating a stream version of addChunked, an open question for a bit of research: maybe there is another simple way to upload a file as a stream without hitting the body length limit, e.g. providing a content type header (application/octet-stream) or something.
Different tech, but still using the same REST requests: https://github.com/pnp/pnpcore/blob/dev/src/sdk/PnP.Core/Model/SharePoint/Core/Internal/FileCollection.cs#L84 shows how we do a chunked upload in the PnP Core SDK. https://pnp.github.io/pnpcore/using-the-sdk/files-large.html documents working with large files (SPO supports up to 100GB files).
If you can read the Node.js stream via a buffer, it should not load the full file in memory, just the size of your buffer with some overhead... at least that's how it works in .NET, where one can upload/download very large files without any noticeable memory impact.
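For illustration, a quick Node check of that behaviour (file path is a placeholder): reading a multi-GB file with a fixed buffer size keeps resident memory near the buffer size, not the file size.

import { createReadStream } from "fs";

// Read in 10 MB pieces and watch RSS stay roughly flat.
const rs = createReadStream("/data/huge-file.bin", { highWaterMark: 10 * 1024 * 1024 });
let read = 0;
rs.on("data", (chunk: Buffer) => {
  read += chunk.length;
  console.log(`read ${read} bytes, rss ${(process.memoryUsage().rss / 1048576).toFixed(0)} MB`);
});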
Thanks @jansenbe! In PnPjs we use the same APIs; the only difference is the input for a file, which historically is not a stream but a Blob/ArrayBuffer. A version of the method with a stream reader is an option here; planning to add it gently without affecting existing users' code bases.
For a collection of different tech solving the same things: https://github.com/koltyakov/gosip/blob/master/api/filesChunked.go#L30 =)
Wow... amazing piece of work @koltyakov!
These are all great and I will review - but curious, with streams we would then be Node-only, yes? Which is fine, just thinking it through. Maybe we can add something to the nodejs package extending the sp lib, like we did for stream support previously.
@patrick-rodgers thanks for considering my request! And for your first question - @koltyakov answered exactly what I had in mind.
From a nodejs client perspective, it is up to you guys how to implement it, but it would be great to have it as part of PnPjs. The Go example is my backup option; it does not seem hard to implement with custom code, but the better spot for it is inside PnP.
Thanks again, everyone!
Hi all, I created an extension for Node streams. Backward compatibility is kept. Added tests as well.
@patrick-rodgers could you please review whether the way I implemented the extension methods is aligned with the global vision?
In the stream implementation there might be room for enhancements regarding the progress callback: with a stream there is no way of getting the size other than externally (e.g. with fs.stat). I intended to keep all the interfaces unchanged, so I didn't introduce a new argument for the size. As a result, currently the progress callback receives IFileUploadProgressData without fileSize and totalBlocks. But for folks who need to show progress, it's still possible by getting the size externally and comparing it with currentPointer.
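For example (illustrative only; the path is a placeholder and IFileUploadProgressData is assumed to be imported from "@pnp/sp/files"), a caller could do:

import { statSync } from "fs";
import { IFileUploadProgressData } from "@pnp/sp/files";

// Get the size externally, since the stream-based upload can't know it,
// then compare it against currentPointer inside the progress callback.
const { size } = statSync("/data/huge-file.bin");
const progress = (data: IFileUploadProgressData): void => {
  console.log(`uploaded ${((100 * data.currentPointer) / size).toFixed(1)}%`);
};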
Closing this as @koltyakov's great work on the stream methods was released as part of 2.1.0. Thanks!
It's time for me to update the pnp-upload sample as well! But I'm lazy, so will only do it over the weekend.
Updated the sample. One thing I found to be required is aligning highWaterMark with the upload method's chunkSize. Also suggested a way of passing the file size through to the progress callback.
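// Context (assumed from the pnp-upload sample): this snippet runs inside a class
// where `this.web` is an IWeb, and `filePath`, `folderRelativeUrl`, `chunkSize`,
// and `progress` are provided by the caller; `fs`, `path`, and
// `IFileUploadProgressData` come from "fs", "path", and "@pnp/sp/files".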
const ticker: (data: IFileUploadProgressData) => void = 'function' === typeof progress ? (() => {
const stats = fs.statSync(filePath);
// In a stream object there is no `size` property, so IFileUploadProgressData object can't know
// `fileSize` and `totalBlocks` without externally provided size received e.g. with fs.stat.
// This wraps provided `progress` callback and enriches data argument to contain missed props.
return (data: IFileUploadProgressData): void => {
data.fileSize = stats.size;
data.totalBlocks = data.totalBlocks ??
parseInt((data.fileSize / chunkSize).toString(), 10) + ((data.fileSize % chunkSize === 0) ? 1 : 0);
progress(data);
};
})() : null;
const fileName = path.parse(filePath).name + path.parse(filePath).ext;
// Important: highWaterMark must be equal to chunkSize
const rs = fs.createReadStream(filePath, { highWaterMark: chunkSize });
return this.web.getFolderByServerRelativeUrl(folderRelativeUrl)
.files.addChunked(fileName, rs, ticker, true, chunkSize);