So I tried to stream an image (after manipulating it with gm), and did something like this:
gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').stream(function(err, stdout, stderr) {
  var data, request;
  data = {
    Bucket: 'my-bucket',
    Key: 'image.jpg',
    Body: stdout,
    ContentType: mime.lookup('image.jpg')
  };
  s3.client.putObject(data, function(err, res) {
    console.log('done');
  });
});
That throws the following error: Cannot determine length of [object Object]
Of course it does, because in the 'afterBuild' event it tries to get the byte length of the httpRequest body, and if my body is a stream it cannot determine the content length.
But that's only part 1 of the issue. If I remove the 'afterBuild' handler for testing purposes, streaming still doesn't work (the writeBody method is called every couple of seconds, but the request never finishes).
I installed the module from GitHub (not npm), to have the latest available sources.
+1 I ran into the exact same issue today. Based on https://github.com/aws/aws-sdk-js/issues/3 I assumed that streams were supported in the Body of a putObject, but it appears that this will only work if the stream is created from fs.createReadStream().
The code that is throwing the exception is in the util.js file:
byteLength: function byteLength(string) {
  if (string === null || string === undefined) return 0;
  if (typeof string === 'string') string = new Buffer(string);
  if (string.length !== undefined) {
    return string.length;
  } else if (string.path !== undefined) {
    return require('fs').lstatSync(string.path).size;
  } else {
    throw AWS.util.error(new Error(), {
      message: 'Cannot determine length of ' + string, object: string
    });
  }
}
I think that supporting this should be fairly simple, but will require a change to the API so that ContentLength can be specified by the caller.
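To see why a generic stream fails that check, here is the same logic in isolation (a standalone copy for illustration, not the SDK module itself; `Buffer.from` is used in place of the deprecated `new Buffer`). A string or Buffer exposes `.length` and an fs read stream exposes `.path`, but a plain stream object has neither, so it falls through to the error branch:

```javascript
// Standalone copy of the SDK's length check, for illustration only.
function byteLength(body) {
  if (body === null || body === undefined) return 0;
  if (typeof body === 'string') body = Buffer.from(body);
  if (body.length !== undefined) {
    // Strings, Buffers, or any object with a numeric .length property
    return body.length;
  } else if (body.path !== undefined) {
    // fs.createReadStream() results carry .path, so the file can be stat'ed
    return require('fs').lstatSync(body.path).size;
  } else {
    // A generic stream has neither, hence the error in this issue
    throw new Error('Cannot determine length of ' + body);
  }
}
```

This is also why manually setting a `.length` property on the stream object satisfies the first branch.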
An easier workaround for this would be to set the .length property on the stream that you created, for instance:
gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').stream(function(err, stdout, stderr) {
  var data, request;
  // set stream length
  stdout.length = SIZE_HERE;
  data = {
    Bucket: 'my-bucket',
    Key: 'image.jpg',
    Body: stdout,
    ContentType: mime.lookup('image.jpg')
  };
  s3.client.putObject(data, function(err, res) {
    console.log('done');
  });
});
The above _should_ work as advertised. It also avoids messing around with extra stream specific logic (something we unfortunately have to do for fs streams) and makes use of a fairly standard JS convention.
I do agree that exposing ContentLength in S3 would be useful, but note that if it were added, it would be available on a per-service basis only.
That little trick worked for me, thanks!
So how do I do that in my example? If I try to get the file size with the gm library and then stream inside a callback, I get this error from the request library: You cannot pipe after data has been emitted from the response. What I tried was this:
gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').filesize({ bufferStream: true }, function(err, filesize) {
  this.stream(function(err, stdout, stderr) {
    var data, request;
    stdout.length = filesize;
    data = {
      Bucket: 'my-bucket',
      Key: 'image.jpg',
      Body: stdout,
      ContentType: mime.lookup('image.jpg')
    };
    s3.client.putObject(data, function(err, res) {
      console.log('done');
    });
  });
});
I'm not sure how gm() works, but you're likely going to have to ask the gm stream, not the request stream, how many bytes it will be generating. Even if request() gave you a number, it won't be the same number that your manipulated image is. It may not even be possible to get the size via streams with that library, in which case you would probably want to write to disk first (or an in-memory buffer) and then stream from there. I don't know for sure though; it would be best to check the docs linked from https://npmjs.org/package/gm
I'm going to close this since it's specific to a third party library. S3 requires a Content-Length to be provided on all payload requests, so this is something that _must_ be supported by whatever third party library you use.
FWIW I see a "filesize()" property in the gm docs that might allow you to get this value: http://aheckmann.github.io/gm/docs.html#getters
Another workaround using just gm, aws-sdk, http, fs
http.get('http://s3-sa-east-1.amazonaws.com/bucket/path/image.jpg', function(res) {
  if (res.statusCode != 200) {
    console.log("Err\n");
  } else {
    gm(res).resize(w, h, '^').gravity('Center').extent(w, h).quality(80).stream(function(err, stdout, stderr) {
      var buf = new Buffer(0);
      stdout.on('data', function(d) {
        buf = Buffer.concat([buf, d]);
      });
      stdout.on('end', function() {
        var data = {
          Bucket: bucket,
          Key: 'pathtoimage/thumb.jpg',
          Body: buf
        };
        s3.client.putObject(data, function(err, resp) {
          console.log("Done\n");
        });
      });
    });
  }
});
This last trick, using fs instead of http.get, worked for me
thanks perfect :)
@aws @lsegal You either need to accept readable streams as per the ReadableStream documentation (which does not specify a length) or document somewhere that your implementation is limited in this way.
If I have to know the size then I either need to load (potentially) very large amounts of data into memory or write it to disk, neither of which is a good option in the environment I am operating in.
@terribleplan the SDK supports ReadableStream per the docs, and you do not need to specify a length with the stream. The issue is that the underlying service (S3) needs to know how many bytes are in your PUT request. Checking the .length property on the stream is just a convenience (since it is implemented in readable streams returned by the fs module)-- you can alternatively provide a regular vanilla stream and pass the byte length as the ContentLength parameter to the putObject operation.
There is not much the SDK can do about this limitation in S3, as needing to know the size of the payload is a requirement of the service. If the only way you can determine the size is to load large amounts of data into memory, this is not something the SDK could do much about-- again, the restriction comes from the service-- though I would strongly recommend buffering in chunks if you can so as to not load all data into memory at once (potentially writing out to disk for larger files if you're doing some kind of transform on the stream).
That said, thanks for the feedback. I agree that better documentation about S3's limitation could be useful here, I will add a note to look into making this limitation more explicit. I would also recommend visiting S3's forums to put in a feature request to remove the Content-Length restriction. The ability to support streaming payloads would be a huge benefit not just for the JS SDK, but other tools as well. Let them know you believe this would be an important improvement to the service!
@lsegal It looks like some of what I want can be done through the multipart API, and s3-upload-stream wraps it nicely, so I doubt any change will be made, since there are probably benefits on the S3 side to knowing the size of the upload in advance.
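For context on why the multipart route sidesteps the Content-Length problem: a multipart uploader only needs to know the size of each part, not of the whole stream, so it accumulates incoming data into fixed-size parts (S3 requires at least 5 MB per part, except the last) and uploads each part as it fills. A minimal, SDK-independent sketch of that part-accumulation step:

```javascript
// Minimal sketch of the part-accumulation step behind a multipart upload:
// incoming chunks are buffered until a full part is available, then
// onPart(partNumber, buffer) fires (in a real uploader, an UploadPart call).
function makePartitioner(partSize, onPart) {
  var chunks = [];
  var buffered = 0;
  var partNumber = 1;
  return {
    write: function(chunk) {
      chunks.push(chunk);
      buffered += chunk.length;
      while (buffered >= partSize) {
        var all = Buffer.concat(chunks);
        onPart(partNumber++, all.slice(0, partSize));
        chunks = [all.slice(partSize)];
        buffered = chunks[0].length;
      }
    },
    end: function() {
      // The final part may be smaller than partSize
      if (buffered > 0) onPart(partNumber++, Buffer.concat(chunks));
    }
  };
}
```

Libraries like s3-upload-stream wrap this kind of pattern around the CreateMultipartUpload / UploadPart / CompleteMultipartUpload calls.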
@rcmonteiro your example may work, but it defeats the purpose of streaming (you're putting everything into memory).
Google ranks this issue highly, so perhaps it's worth noting for the next person that http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property handles streaming.
Ultimately they should copy some code from https://github.com/nathanpeck/s3-upload-stream , as it is done right.
@nmccready, according to its README, that is made obsolete by the functionality I linked.
Ah nice, misread your post. So this supports streaming. Can there be an example where .pipe is being used?
@nmccready you still don't need to use pipe, simply pass the stream object as the body parameter of the upload() call as shown in the doc examples. The stream object depends on the library you are using, but it's typically the thing you are trying to call .pipe on.
Something like this. Beware that I haven't run this so it probably contains typos.
var https = require("follow-redirects").https;
var s3 = new (require("aws-sdk")).S3();
var gitHubToken = "hex string from github personal token";

module.exports = function(callback) {
  var request = https.get(
    {
      host: "api.github.com",
      path: "/repos/aws/aws-sdk-js/tarball/4a404cb8c06bba6b7b00c323671376c6377889ed?access_token=" + gitHubToken,
      headers: {
        "User-Agent": "Unique user agent string",
      }
    },
    function(response) {
      return handleTarball(null, response, callback);
    }
  );
  request.on("error", function(err) {
    return handleTarball(err, null, callback);
  });
};

var handleTarball = function(err, response, callback) {
  if (err) {
    return callback(err);
  }
  if (response.statusCode != 200) {
    return callback(new Error("unsuccessful status code: " + response.statusCode));
  }
  s3.upload({
    Bucket: "some-bucket",
    Key: "aws-sdk-js-latest.tar.gz",
    Body: response, // <--------------------- it's that simple if you use s3.upload
    ContentType: response.headers["content-type"], // shouldn't hurt
  }, callback);
};
it's that simple if you use s3.upload
I was definitely hoping that would be the case from how it's described in the docs, but in trying to do this, I still end up with this error:
{
  "errorMessage": "Cannot determine length of [object Object]",
  "errorType": "Error",
  "stackTrace": [
    "byteLength (/var/runtime/node_modules/aws-sdk/lib/util.js:179:26)",
    "ManagedUpload.fillBuffer (/var/runtime/node_modules/aws-sdk/lib/s3/managed_upload.js:385:19)",
    "ManagedUpload.send (/var/runtime/node_modules/aws-sdk/lib/s3/managed_upload.js:199:33)",
    "/var/runtime/node_modules/aws-sdk/lib/util.js:799:25",
    "new Promise (<anonymous>)",
    "ManagedUpload.promise (/var/runtime/node_modules/aws-sdk/lib/util.js:798:14)",
    "/var/task/index.js:55:106",
    "handler (/var/task/index.js:19:54)",
    "<anonymous>",
    "process._tickDomainCallback (internal/process/next_tick.js:228:7)"
  ]
}
Basic flow of my code is reading a list of 'filenames' from s3, then fetching each as a stream, which are then written out as a stream using the s3 upload function. I'm using highland to handle some of the higher level concepts, but it just returns a node ReadableStream at the end, so I can't see why this would be an issue.
const fileList = await client.listResultsFiles();
const mergedFiles = highland(fileList)
  .map(client.streamFile)
  .sequence()
  .toNodeStream();
return client.uploadStream('results-AAAAAA.json')(mergedFiles);
Helpers for reference
const listResultsFiles = async function(): Promise<string[]> {
  // List
  const s3Objects = await s3.listObjectsV2({
    ...commonParams,
    Prefix: cfg.resultsKeyPrefix
  }).promise();
  // ..snip.. some bits that make the filenames sortable
  // Sort
  return lodash
    .sortBy(files, ['prefix', 'start', 'end', 'suffix'])
    .map(({ filename }) => filename);
};

const streamFile = (key: string) => {
  const fileStream = s3.getObject({...commonParams, Key: key}).createReadStream();
  return highland(fileStream);
};

const uploadStream = (destinationKey: string) => (streamToUpload: ReadableStream) => {
  return s3.upload({
    ...commonParams,
    Key: destinationKey,
    Body: streamToUpload,
  }).promise();
};
Edit: I'm not sure of the 'why', but this solved (or at least worked around) the issue for me, and now works as expected.. https://github.com/aws/aws-sdk-js/issues/1713#issuecomment-447979639
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.