Tedious: StreamParser stalling other queries

Created on 4 Oct 2015  路  11Comments  路  Source: tediousjs/tedious

I have an issue with node-mssql: "node-mssql is not truly concurrent on queries".
Issue: https://github.com/patriksimek/node-mssql/issues/217

short description: multiple queries with stream enabled do not run concurrently because the tedious stream parser _might_ block other streamed queries from being processed:

  StreamParser.prototype._transform = function (input, encoding, done) {
    var job, length, offset, result;
    offset = 0;
    this.buffer.append(input);
    if (!this.generator) {
      this.generator = this.parser();
      this.currentStep = this.generator.next();
    }
    job = void 0;
    result = void 0;
    length = void 0;
    while (!this.currentStep.done) {
      job = this.currentStep.value;
      if (!(job instanceof Job)) {
        return done(new Error('invalid job type'));
      }
      length = job.length;
      if (this.buffer.length - offset < length) {
        break;
      }
      result = job.execute(this.buffer, offset);
      offset += length;
      this.currentStep = this.generator.next(result);
    }
    this.buffer.consume(offset);
    if (this.currentStep.done) {
      this.push(null);
    }
    return done();
  };

I am debugging my program with Visual Studio Code and it it looks like the while loop hit inside the stream parser blocks my other queries from being processed or returning any rows.

Can you give me any inside on how the StreamParser is handled when multiple concurrent connections are open?

@arthurschreiber
@patriksimek

Action Required Response needed known issue

Most helpful comment

I'd also like to know if there is any progress on this topic because this issue is a mayor blocker for writing scaling business applications working with large data.

If the whole app can't accept any new requests because the query() blocks it all, this leads to real worse user experiences.

I'd really appreciate any progress here.

Regards,
Michael

All 11 comments

Hey @XemsDoom

I've read your issue description, and I'm wondering whether your issue is with the way tedious works or with Node.JS in general. Node.js is a single threaded, event based environment, so doing any sort of processing that happens on the CPU always blocks other CPU processing operations.

Regarding the stream parsing code in tedious: the parsing of incoming data is "asynchronously synchronous". This means that data that is (asynchronously) pushed into the stream parser is parsed synchronously. So as long as data is available on the stream, it will be parsed, which is a blocking, synchronous operation.

I've been playing around with making the parsing "asynchronously asynchronous", but existing internals of tedious depend on the existing behaviour and need to be changed.

You can try using an older version of tedious (e.g. 1.11.5) and see if the issue was already existing there. Fixing this is definately on my list, it's just not a very high priority right now.

Guten Abend @arthurschreiber. I know how NodeJS works and that CPU intensive JS code will block any other processing since everything runs in a single-thread.

In my opinion there's a bug in tedious or a wrong design decision, because im pretty sure it is not supposed to function as it does now. Multiple streamed queries must not be processed sequentially. By "sequentially" I mean the queries being processed after the one before has finished.

If one query has a lot of rows, let's say 100k and a second query 50k rows, the queries should be processed by the stream parser intertwined as rows are fetched from the database server.

Why? Because if different web requests want to retrieve data from two different tables, I don't want one user to be waiting because of other database queries. We are getting the same dilemma as in traditional web servers with this design.


I try to explain it in german, maybe it is easier to understand :)

Tedious bearbeitet nur eine Query zu einem gegebenen Zeitpunkt, statt alle ausgef眉hrten Queries gemeinsam _nebenl盲ufig_ im Stream Parser zu bearbeiten, momentan werden die anderen Queries nicht bearbeitet, bis die erste Query durch den Stream Parser durch ist, es wird also das Bearbeiten der weiteren Queries blockiert bzw. werden diese somit sequenziell abgearbeitet.

Ergo, f眉hrst du 3 Queries aus und hast 3 Callbacks f眉r wenn Rows gefetcht wurden, werden die 2 weiteren Callbacks erst aufgerufen, wenn die erste Query durch ist, obwohl eigentlich durch das Streaming die verschiedenen Queries miteinander bearbeitet werden sollen. Mit _Miteinander_ meine ich, dass alle 3 Callbacks jeweils in einer offensichtlich nicht geordneten Abfolge gerufen werden.

Also zum Beispiel so:
Query 1, Query 2, Query 3 werden nebenl盲ufig ausgef眉hrt. Die Queries werden jeweils in eigenen Connections ausgef眉hrt aus dem Connection Pool von mssql.

Nun sollten die Rows von allen 3 Queries eigentlich gleichzeitig zur眉ck kommen:
row-query1, row-query1, row-query3, row-query2, row-query3, row-query2, row-query2....
Halt ohne Determinismus da wir ja nicht vorhersagen k枚nnen wie die Resultate zur眉ck kommen.

Tedious aber f眉hrt das ganze so aus:
row-query1, row-query1, row-query1.....bis halt fertig....dann row-query2, row-query2, row-query2....bis wieder fertig....dann row-query3, row-query3, row-query3....

Was nicht so sein sollte. Das hat ja nichts damit zu tun, dass NodeJS single-threaded ist, sondern dass der Parser nicht verschiedene Queries _konkurrierend_ bearbeitet.

Hey @XemsDoom

Thanks for the through explanation (and the german translation - I'll reply in english so other people understand what we're talking about). :smile:

As explained above this is a known but undocumented limitation in tedious right now, and I'm planning to work on fixing it after I fixed the unfortunate performance issues I introduced with the 1.12 release line and the switch to generators.

I have a local branch that changes the parsing to be intertwined, but that breaks other tedious internals that currently rely on events being fired in a specific order in one event loop turn. All of this is fixable, but requires time and thorough testing, as I don't want to do the same mistake I did with the 1.12 release.

I'd tell you to feel free at trying to fix this, but the upcoming changes to improve performance will change the token parsing code a lot, and I don't want to you to waste your time. :disappointed_relieved:

Thanks Arthur. Looks like you are already aware of this problem and I'm thankful that you are working on it. Can we please let this issue be open, so that I can see when the problem got fixed? If I can help in any way, just tell me, I would be glad to do so.

Have a nice evening.

@arthurschreiber Is any work being done with this? It's been a year and a half since last reported "working on it"

I'd also like to know if there is any progress on this topic because this issue is a mayor blocker for writing scaling business applications working with large data.

If the whole app can't accept any new requests because the query() blocks it all, this leads to real worse user experiences.

I'd really appreciate any progress here.

Regards,
Michael

Hi @arthurschreiber,

I am encountering the same (I think ?) issue where a stream will block the entire nodeJS app until the stream is complete, however i am struggling to locate the affected block of code as quoted earlier by @luca-moser. Could you please point me in the correct direction as I would like to see i I can fix it for my particular use case. You mentioned that you have a local branch where you were testing an async async solution - would you be able to share this and the known issues this causes? Thanks!

Hello. I am having trouble too. I use stream to get large amounts of data and send them in the front-end with websockets. All works well, but tedious is still blocking while streaming. Are there any news? Can we use it in a non-blocking way?

Thanks

Five and a half years later and the asynchronous synchronous behavior of streams is still not fixed. :-(

@bluefire2121 I think https://github.com/tediousjs/tedious/pull/1240 will fix this. Five and a half years later. 馃槵

We just released [email protected] to the tedious@next release channel. We'd be happy to hear back from you if this fixes the issue you're describing here.

If no issues come up, this will be promoted to the tedious@latest release channel automatically in a week.

Was this page helpful?
0 / 5 - 0 ratings