Make a database query returning a stream of results, as opposed to current implementation that returns an all-in-one in-memory array.
Original description
I'm not very experienced with LoopBack yet, but I tried to investigate various methods for streaming the result of a simple query to a client (instead of sending the whole result over HTTP at once).
I found that
Did I miss anything else, or does LoopBack not yet support this feature, leaving me to implement it myself?
Just to highlight my use case: I currently have an Express.js app running on Heroku that I'm trying to rewrite in LoopBack. Heroku has a 30-second request timeout out of the box (a hard limit), and my application fetches potentially tens of thousands of records, so some requests may take longer than 30 seconds to respond. A simple way to avoid the timeout is to stream the result over sockets; it's also more user-friendly, since the user can see the loading progress.
@Krisa : I'm looking into this issue. Will update you when I have something to share. Thanks.
@Krisa : Hi, I think using server-sent events is close to your use case. Let me know your thoughts.
It would be great if you could share any sample implementation you might have seen on the web for your use case.
Even using that, I would still be left with the implementation myself.
My current implementation uses Express.js, with Mongoose handling the connection to the MongoDB server (and fetching records) and node ws streaming the result to the client. It's completely bespoke, so it's not really worth sharing more than that. I could probably do the same in LoopBack in a hacky, non-LoopBack way, likely bypassing any ACLs, etc., but I would be happy to have something more or less supported natively.
I didn't want to reference any other framework here, but for the sake of avoiding any misunderstanding, this is exactly what I'm looking for...however with Loopback :-) => 1- the client starts a request, which is 2- processed by the server and 3- the server streams the output to the client.
@Krisa : Hey, it looks like LB doesn't have out-of-the-box streaming for load events. I was able to implement streaming using a loaded operation hook and by registering a remote method.
Take a look at PR above that I sent to my sandbox repo. I believe that's something close to what you want.
Try cloning the repo, switching to the branch, launching the server, and running:
curl -X GET --header "Accept: application/json" "http://localhost:3000/api/MyModels"
This sounds like a feature request for an internal component to add for streaming data (i.e. websockets). @ritch Do you have any opinion here? Should this remain in userland, or be something that we should be looking at implementing as part of LB?
@bajtos : PTAL and share your views as well.
> Heroku has a 30s timeout out of the box (hard limit) and my application is fetching potentially 10s of thousands of records so that some requests may take longer than 30s to respond. Simple solution to avoid the timeout is to stream the result over sockets - it's also more user friendly since the user is able to see the loading going on.
Why aren't you batching / paging the requests? I think making several smaller requests would be a simple solution to this problem.
OTOH I think the feature request is valid and something I've wanted for a while. The ability to create a cursor against a loopback datasource and incrementally respond with data.
We do have support for responding with data incrementally, as @gunjpan pointed out, using server-sent events. You can use this without ChangeStreams. Here is a basic example that batches the queries into pages of 10 and streams them to the client using server-sent events.
var PassThrough = require('stream').PassThrough;
var clone = require('lodash').clone;
var async = require('async');

module.exports = function(User) {
  var DEFAULT_LIMIT = 10;

  User.stream = function(limit, filter, cb) {
    var stream = new PassThrough({ objectMode: true });
    var clonedFilter = clone(filter) || {};
    var page = 0;
    var isDone = false;

    limit = limit || DEFAULT_LIMIT;
    clonedFilter.limit = limit;

    // Hand the stream back to the caller before the paging loop starts.
    cb(null, stream);

    async.whilst(function() {
      return !isDone;
    }, function(next) {
      clonedFilter.skip = page * limit;
      User.find(clonedFilter, function(err, users) {
        if (err) return next(err);
        stream.write(users);
        page++;
        // A short page means we have reached the end of the result set.
        if (users.length < limit) isDone = true;
        next();
      });
    }, done);

    function done(err) {
      stream.write({ end: true, error: err || null });
      stream.end();
    }
  };

  User.remoteMethod('stream', {
    description: 'Create a get stream.',
    accessType: 'READ',
    http: [
      { verb: 'get', path: '/stream' }
    ],
    accepts: [{
      arg: 'limit',
      type: 'number'
    }, {
      arg: 'filter',
      type: 'object'
    }],
    returns: {
      arg: 'stream',
      type: 'ReadableStream',
      json: true
    }
  });
};
> OTOH I think the feature request is valid and something I've wanted for a while. The ability to create a cursor against a loopback datasource and incrementally respond with data.
:+1:
Thank you @ritch, this makes sense and answers some of my initial problems (also, I didn't know about stream.PassThrough, it looks interesting...). Yet this solution may not be fully optimal performance-wise, since it opens a new cursor on the database for every page (i.e. db request => records streamed => db request => etc.). Is there eventually a way to make a stream from the request like below?
My current solution (mongoose/ws) indeed takes advantage of the stream operator on mongo itself. Very simplified example:
var _stream = mongo.Collection.find({...}, 'field1 field2 etc.').lean().batchSize(_batchSize).stream();
_stream.on('data', streamData(_stream));
_stream.on('error', streamError);
_stream.on('end', streamEnd);

function streamData(myStream) {
  return function(data) {
    ws.send(data);
  };
}
The feature request makes sense to me too. I think it has two parts:
1) How to make a database query returning a stream of results, as opposed to current implementation that returns an all-in-one in-memory array.
2) How to get the stream of results to the HTTP client (web browser).
As @ritch pointed out in his comment, 2) should be already available.
The remaining part is 1), which is something that needs to get implemented in loopback-datasource-juggler and then (eventually) in all connectors.
Having written that, I'm afraid we don't have the bandwidth to work on this feature in the near future (the next 3 months at least).
@bajtos agreed. I could get the example above working (with a few amendments to the code), so streaming (of anything) is definitely working well.
Regarding 1), I wanted to see whether issuing a new query every time is really slower:
I ran each sample 5 times:
The example using Collection.find(...) takes between 5 and 6 seconds (code similar to the one proposed by Ritch).
The example using Collection.find(...)....stream() takes between 2 and 2.5 seconds (similar to the code shown above).
Definitely looking forward to some support for query streaming.
EDIT: a simple Collection.find({}) (returning my 21,477 records in just one query) using LoopBack is still slower than mongodb#stream, for some reason I don't understand. It takes roughly 3 to 3.5 seconds.
@Krisa : Meanwhile, did you have a chance to look at this PR: https://github.com/gunjpan/sandbox/issues/2 . It should help you to move forward while we wait for this feature implementation. Thanks.
Thanks @gunjpan for taking the time to share that. It's an elegant solution for streaming from the server to the client, but it's similar to the one Ritch proposed earlier and doesn't help further with streaming from the database to the server.
I have not experimented beyond the relatively simple performance test I shared in my previous post, but I'm concerned that the built-in adapter is noticeably slower than the native MongoDB one, even with any streaming excluded. I understand streaming is not supported yet, but when you implement it, you may want to double-check whether there are underlying raw-performance issues.
Any update on this? It's been 14 months since any activity. I find it very frustrating that LoopBack doesn't offer any sort of streaming solution. How are we supposed to work with large data sets?
Is there any update on this for version 2.x?
Is it possible that we'll see this in loopback 3.x?
Definitely an interesting feature.
I suppose it's not going to happen for LoopBack 2.
Is it in the pipeline for LoopBack 3 / loopback-next?
LoopBack version 3 is in LTS, we won't be adding any new features.
Feel free to open a new issue in loopback-next if you are interested in this feature.