It seems that upload stream backpressure is not handled properly.
In the following example, the stream is consumed very quickly (100% reached in 2 seconds) whereas the upload takes a while, which is expected with my ADSL connection. This means the whole file is loaded into memory and then sent gradually, which negates the benefit of using streams.
Using your example:
var fileName = 'ReactOS-0.4.3-iso.zip'; // test file, ~100MB
const fileSize = fs.statSync(fileName).size;
const res = await drive.files.create(
  {
    requestBody: {
      // a requestBody element is required if you want to use multipart
    },
    media: {
      body: fs.createReadStream(fileName),
    },
  },
  {
    // Use the `onUploadProgress` event from Axios to track the
    // number of bytes uploaded to this point.
    onUploadProgress: evt => {
      const progress = (evt.bytesRead / fileSize) * 100;
      process.stdout.clearLine();
      process.stdout.cursorTo(0);
      process.stdout.write(`${Math.round(progress)}% complete`);
    },
  }
);
console.log(res.data);
Configuration:
Node.js v10.10.0
googleapis 33.0.0
OS: Windows 7
Yeah, I can confirm this is real. This is a bug with axios where the streams aren't exactly piped around correctly.
Same problem here!
Node.js v10.13.0
googleapis 35.0.0
Also experiencing the same issue. This bug basically renders the progress feature useless and creates a lot of confusion. As you can imagine, I was quite puzzled when it reported 100% in 2 seconds while I was trying to upload a 1 GB file.
Is there any workaround at the moment?
@trulyshelton Yes there is: set maxRedirects to 0 in Axios' request config. For more info, check this issue: Axios Backpressuring Issue
So your code can be something like this:
drive.files
  .create(
    {
      resource: {
        name,
        mimeType,
      },
      media: {
        mimeType,
        body,
      },
    },
    // Work around axios' incorrect stream backpressuring, issue: https://github.com/googleapis/google-api-nodejs-client/issues/1107
    { maxRedirects: 0 },
  )
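For reference, here is a rough, untested sketch combining the two snippets above: the second argument to drive.files.create() carries both the maxRedirects workaround and the onUploadProgress hook. It assumes fs, drive, fileName, name, mimeType and fileSize are defined as in the earlier example.
// Untested sketch only: combines the axios workaround with the progress hook
// from the first example. Assumes `drive`, `fileName`, `name`, `mimeType`
// and `fileSize` are defined as in the snippets above.
const res = await drive.files.create(
  {
    requestBody: { name, mimeType },
    media: { mimeType, body: fs.createReadStream(fileName) },
  },
  {
    maxRedirects: 0, // avoid buffering the whole file for redirect handling
    onUploadProgress: evt => {
      const progress = (evt.bytesRead / fileSize) * 100;
      process.stdout.write(`\r${Math.round(progress)}% complete`);
    },
  }
);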
@JustinBeckwith I guess we can consider that it's not a bug and close the issue, since it's actually an Axios feature: to allow following redirects, it buffers the entire file. That can be turned off by setting maxRedirects to 0 in the request config, and then the streams are piped correctly.
@omardoma I tried it and it works fine, except there is a problem when uploading an mp4 video file: it uploads, but some bytes are missing, the size is slightly smaller than the original, and I cannot open it.
Edited: I have the same problem with an image file; it can be opened, but about 1/10 of the frame is missing.
I can confirm, I'm experiencing the same loss after using maxRedirects: 0. I tried a 5MB.zip file and a 100MB.zip file; both are missing the final 131,072 bytes (0.125 MiB).
This is worse than I expected.
If I do not set maxRedirects: 0, I get the high memory usage issue from #1107: the whole file is loaded into memory, which means I cannot upload any file larger than 1 GB on my VM, which has less memory than that.
If I do set maxRedirects: 0, the uploaded file is incomplete. There's no way around it.
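If it helps anyone reproduce the memory side of this, here is a minimal sketch (not from the library, just an illustration) that logs process memory while an upload runs; with the default config the RSS should climb toward the file size, while with maxRedirects: 0 it should stay roughly flat.
// Illustration only: watch process memory during an upload to see whether
// the request body is being buffered. Assumes `drive` is an authenticated
// drive_v3.Drive client and `fileName` points to a large local file.
const fs = require('fs');

async function uploadAndWatchMemory(drive, fileName) {
  const timer = setInterval(() => {
    const rssMb = Math.round(process.memoryUsage().rss / 1024 / 1024);
    console.log(`rss: ${rssMb} MB`);
  }, 1000);
  try {
    return await drive.files.create(
      {
        requestBody: { name: fileName },
        media: { body: fs.createReadStream(fileName) },
      },
      { maxRedirects: 0 } // remove this to compare against the buffered case
    );
  } finally {
    clearInterval(timer);
  }
}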
Exactly the same issue for me.
I guess using axios instead of request wasn't actually a very good idea :-/
Even a small workaround would be amazing, because right now we just cannot use your library at all for this kind of job :-/
This PR googleapis/nodejs-googleapis-common/pull/65 fixes this issue.
I've tested with a 129 MB mp4 file with maxRedirects: 0 and it uploaded successfully with no loss.
With your PR and maxRedirects set to 0 I can see the upload progress more reasonably. However, I am still experiencing loss. Looking at the output of onUploadProgress, it seems like the last chunk of the file was not uploaded.
// output of console.log on `onUploadProgress`
...
{ bytesRead: 14745600 }
{ bytesRead: 14811136 }
{ bytesRead: 14876672 }
{ bytesRead: 14942208 } // uploaded file size shown in Google Drive
{ bytesRead: 14960582 } // actual file size
However, this is better than the case without the PR:
...
{ bytesRead: 14745600 }
{ bytesRead: 14811136 } // uploaded file size shown in Google Drive
{ bytesRead: 14876672 }
{ bytesRead: 14942208 }
// missing the final { bytesRead: 14960582 } actual file size
I confirm that.
I'm still experiencing some loss; I only tested with "very" large files (~8 GB).
I didn't notice that the last chunk wasn't uploaded, but it's worth investigating.
I propose the following patch, as it seems to fix the issue (tested on a 5,242,880-byte zip file and a 14,960,582-byte dmg file).
Based on PR https://github.com/googleapis/nodejs-googleapis-common/pull/65, change the 'end' handler to be:
pStream.on('end', () => {
  rStream.on('drain', () => {
    rStream.push('\r\n');
    rStream.push(finale);
    rStream.push(null);
  });
});
This still requires setting maxRedirects: 0
I suspect the cause of the bug is piping between multiple streams, with the end event not being emitted properly. Regardless, this can be fixed more elegantly by using only one stream. Since pStream is a ProgressStream, which extends Transform, and rStream is a PassThrough, which is also a Transform, there is no need to use two streams; we can just get rid of pStream and declare rStream as a ProgressStream instead.
I am making some assumptions here, so if I am missing any feature please ignore this. But it is working for me. The patch is here.
Relevant Code:
...
const boundary = uuid.v4();
const finale = `--${boundary}--`;
const rStream = new ProgressStream();
const isStream = isReadableStream(multipart[1].body);
headers['Content-Type'] = `multipart/related; boundary=${boundary}`;
for (const part of multipart) {
  const preamble = `--${boundary}\r\nContent-Type: ${part['Content-Type']}\r\n\r\n`;
  rStream.push(preamble);
  if (typeof part.body === 'string') {
    rStream.push(part.body);
    rStream.push('\r\n');
  } else {
    // Axios does not natively support onUploadProgress in node.js.
    // Pipe through rStream (a ProgressStream) to count the number of
    // bytes read for the purpose of tracking progress.
    rStream.on('progress', bytesRead => {
      if (options.onUploadProgress) {
        options.onUploadProgress({ bytesRead });
      }
    });
    part.body.pipe(rStream, { end: false });
    part.body.on('end', () => {
      rStream.push('\r\n');
      rStream.push(finale);
      rStream.push(null);
    });
  }
}
if (!isStream) {
  rStream.push(finale);
  rStream.push(null);
}
options.data = rStream;
...
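For readers following along, here is a rough sketch of what a byte-counting ProgressStream could look like; the real one lives in nodejs-googleapis-common, so treat this only as an illustration of the idea used above.
// Sketch only - not the actual nodejs-googleapis-common implementation.
// A Transform that passes data through unchanged while counting bytes and
// emitting a 'progress' event, which is what the code above listens for.
const { Transform } = require('stream');

class ProgressStream extends Transform {
  constructor(options) {
    super(options);
    this.bytesRead = 0;
  }
  _transform(chunk, encoding, callback) {
    this.bytesRead += chunk.length;
    this.emit('progress', this.bytesRead);
    callback(null, chunk);
  }
}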
Hmm... I tried your patch and I still get loss issues :-/. Actually, I tried with very small files (1 MB) and I don't get the full file uploaded to Google Drive either :-/.
I don't know much about streams (really a beginner), but could the end event be emitted even if the last chunks haven't arrived yet?
I still get the issue anyway.
Could you tell me what you are testing (your process), so I can try to reproduce the same thing? :-)
Thank you @trulyshelton for the suggestion.
@GhyslainBruno I confirm there is still some data loss with the patch when using the maxRedirects: 0 setting; mainly, the last chunk is not uploaded.
I believe I have the fix, submitting a PR shortly.
Wow, you rock if you do!
I'm trying to understand some stream mechanisms right now, and I'm seeing that rStream.on('unpipe') is called before part.body.on('end'), which doesn't sound logical to me, but I might be wrong :-/.
It really sounds like the last chunk is not uploaded indeed.
Anyway, can't wait for your PR ^^. Thanks for your work, guys!
That PR fixes the issue.
On a side note, if my understanding is correct, we probably don't need ProgressStream to emit a progress event. The drain event is always emitted when it is appropriate to resume writing data to the stream, which in this case, I believe, is when Axios finishes uploading that part of the stream.
So we can probably get a more accurate upload progress if we do this:
rStream.on('drain', () => {
  if (options.onUploadProgress) {
    options.onUploadProgress({ bytesRead: rStream.bytesRead });
  }
});
@trulyshelton do you keep using maxRedirects: 0?
@GhyslainBruno you need to pass maxRedirects: 0 if you do not want the request to be fully buffered before the upload.
Oh yeah right, my bad.
I don't know if it's only my use case, but I still get some loss. Actually, the stream I pass to body is the response of a web request (I download a file and pass the download stream to the create API).
.on('response', async function(response) {
  ...
  await drive.files.create({
    resource: {
      ...
    },
    media: {
      ...
      body: response,
      ...
    }
  });
});
Maybe some of you have a similar use case.
If someone has a clue about that... In my case I'm still having the loss issue :-/
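Not from the thread, but one thing that may be worth checking with this use case: attach error and end listeners to the source stream before handing it to drive.files.create, so a silent early end or error on the download side becomes visible. In the sketch below, url and name are placeholders and drive is assumed to be an authenticated client.
// Illustration only: instrument the download stream so an early 'end' or a
// swallowed 'error' on the source side is at least logged.
const https = require('https');

function uploadFromUrl(drive, url, name) {
  return new Promise((resolve, reject) => {
    https.get(url, res => {
      const expected = Number(res.headers['content-length'] || 0);
      res.on('error', err => {
        console.error('download stream errored:', err);
        reject(err);
      });
      res.on('end', () => console.log(`download ended (expected ${expected} bytes)`));

      drive.files
        .create(
          {
            requestBody: { name },
            media: { body: res },
          },
          { maxRedirects: 0 }
        )
        .then(resolve, reject);
    }).on('error', reject);
  });
}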
Actually, sometimes it works and sometimes it doesn't... I can't understand why --'
Actually, I tried with some files on disk and everything works just fine.
So for me it's just another issue.
Anyway, thanks for your work guys. @AVaksman your changes in your PR are perfect.
@GhyslainBruno with the web download stream, are you getting the same result with and without maxRedirects: 0?
Yes, sometimes (but really not often) the upload is complete, and sometimes it's not.
From my understanding, it looks like sometimes some "chunks" (or packets of data) are lost (or otherwise don't make it where they should go), and since no error is handled, the upload is not complete.
I'm starting to read the Node.js docs on streams to understand more precisely what's going on.
Anyway, for my previous use case I decided to use resumable uploads (with a single request) and the Got library to handle the process without using googleapis for now, and it works like a charm, if anyone is interested.
@GhyslainBruno Hey, can you share your approach with me? I was thinking of doing the same thing until googleapis is stable enough to use.
Hi @omardoma, sorry about the delay.
Sure I can. To make my web requests I use Got (which is a really elegant solution, in my opinion).
const metadata = {
  mimeType: link.mimeType,
  name: link.filename,
  parents: [movieFolderCreated.id]
};

const options = {
  headers: {
    'Authorization': 'Bearer ' + oAuth2Client.credentials.access_token,
    'Content-Type': 'application/json; charset=UTF-8',
    'X-Upload-Content-Length': link.filesize,
    'X-Upload-Content-Type': link.mimeType
  },
  body: JSON.stringify(metadata)
};

// Initialize the upload - getting a URL where to PUT chunks to
const response = await got('https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable', options);
const resumableUrl = response.headers.location;

const downloadStream = got.stream(myUrlToStream);
// The 'downloadProgress' event is documented in Got
downloadStream.on('downloadProgress', async progress => { /* make the stuff wanted */ });

await got.put(resumableUrl, {
  headers: {
    'Content-Length': link.filesize // the size of the file you want to upload to Google Drive
  },
  body: downloadStream
});
It really works like a charm.
The only thing is the resumable part, which I can't get to work properly.
At the end of some file uploads (really big ones, I guess) I suddenly get a 308 response code, and I then follow the steps in the Google Drive API docs to resume the upload.
But when I upload the last part of my data (basically a few hundred bytes), the new download stream ends, yet the new PUT request that uploads the remaining part of the file to the Google Drive API never seems to get closed, and it eventually fails with a timeout.
So if anyone knows something about how to resume an upload using streams and the Google Drive REST API, I would be glad to get some help :-/ (this might not be the right place to post; if so, I apologize).
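Not a verified solution, but for what it's worth, here is a sketch of the resume handshake as described in the Drive API docs: ask the session URI how many bytes it already has (a 308 response with a Range header), then PUT the remainder with an explicit Content-Range. In this sketch, resumableUrl and filesize come from the snippet above, and getStreamFrom is a placeholder for something that returns a stream starting at a given byte offset.
// Sketch only, based on the documented resumable-upload protocol; untested.
// `getStreamFrom(start)` could be e.g. fs.createReadStream(path, { start }).
const got = require('got');

async function resumeUpload(resumableUrl, filesize, getStreamFrom) {
  // 1. Ask the session how much it has already received.
  const status = await got.put(resumableUrl, {
    headers: { 'Content-Range': `bytes */${filesize}` },
    body: '',
    throwHttpErrors: false, // a 308 here means "resume incomplete", not an error
  });
  if (status.statusCode !== 308) return status; // 200/201 => already complete

  // 2. The Range header looks like "bytes=0-42", so resume at byte 43.
  const range = status.headers.range; // may be absent if nothing was received
  const start = range ? Number(range.split('-').pop()) + 1 : 0;

  // 3. Upload the remaining bytes with an explicit Content-Range.
  return got.put(resumableUrl, {
    headers: {
      'Content-Range': `bytes ${start}-${filesize - 1}/${filesize}`,
    },
    body: getStreamFrom(start),
  });
}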
Thanks @GhyslainBruno for your example. Question: where are you getting link from? How do you know all the meta information if the file is at a URL? Or did you have it on a local drive before uploading?