Node: Faster `Transform` stream

Created on 14 May 2020 · 7 Comments · Source: nodejs/node

Some thoughts on how to make Transform streams faster.

Part of the overhead of Transform (and PassThrough) is that it is actually 2 streams, one Writable and one Readable, both with buffering and state management, which are connected together.
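A minimal sketch of that two-sided nature, using only public `stream` APIs. The high-water-mark values are arbitrary and chosen only for illustration. It shows that a single `PassThrough` is recognized as both a `Readable` and a `Writable` and carries a separate buffer limit for each side:

```javascript
const { PassThrough, Readable, Writable } = require('stream')

// Each side of the Duplex gets its own buffer limit (values are arbitrary).
const pt = new PassThrough({
  readableHighWaterMark: 1024,
  writableHighWaterMark: 2048
})

// One object, two stream identities.
const isReadable = pt instanceof Readable
const isWritable = pt instanceof Writable

// Two independent high-water marks, one per internal state object.
const marks = [pt.readableHighWaterMark, pt.writableHighWaterMark]

console.log(isReadable, isWritable, marks)
```

Every chunk written to such a stream passes through the writable-side bookkeeping before it reaches the readable-side buffer, which is the duplication the proposal tries to avoid.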

We could try to skip this and implement Transform as a Readable which implements the Writable interface and proxies the naming.

e.g.

const { Readable } = require('stream')

class FastTransform extends Readable {
  constructor(options) {
    super(options)
    this._writableState = {
      length: 0,
      needDrain: false,
      ended: false,
      finished: false
    }
  }
  get writableEnded () {
    return this._writableState.ended
  }
  get writableFinished () {
    return this._writableState.finished
  }
  _read () {
    const rState = this._readableState
    const wState = this._writableState

    if (!wState.needDrain) {
      return
    }

    if (wState.length + rState.length > rState.highWaterMark) {
      return
    }

    wState.needDrain = false
    this.emit('drain')
  }
  write (chunk) {
    const rState = this._readableState
    const wState = this._writableState

    const len = chunk.length

    wState.length += len
    this._transform(chunk, null, (err, data) => {
      wState.length -= len
      if (err) {
        this.destroy(err)
      } else if (data != null) {
        this.push(data)
      }
      this._read()
    })

    wState.needDrain = wState.length + rState.length > rState.highWaterMark

    // Like Writable#write: return false when the caller should wait for 'drain'.
    return !wState.needDrain
  }
  end () {
    const wState = this._writableState

    wState.ended = true
    if (this._flush) {
      this._flush((err, data) => {
        const wState = this._writableState
        if (err) {
          this.destroy(err)
        } else {
          if (data != null) {
            this.push(data)
          }
          this.push(null)
          wState.finished = true
          this.emit('finish')
        }
      })
    } else {
      this.push(null)
      wState.finished = true
      this.emit('finish')
    }
  }
}

// TODO: Make Writable[Symbol.hasInstance] recognize `FastTransform`.
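One way the TODO above could work is duck typing through `Symbol.hasInstance`. The sketch below uses hypothetical stand-in classes (`WritableLike`, `FastLike`, `SubWritable`), not Node's actual `Writable`, and the shape check it performs is an assumption rather than what Node core does:

```javascript
// Hypothetical stand-in for a base class that wants to recognize
// look-alike objects in `instanceof` checks.
class WritableLike {
  static [Symbol.hasInstance] (obj) {
    // Ordinary prototype-chain check first.
    if (Function.prototype[Symbol.hasInstance].call(this, obj)) return true
    // Only the base class duck-types; subclasses keep strict semantics.
    if (this !== WritableLike) return false
    // Assumed shape check: a write() method plus a _writableState object.
    return obj != null &&
      typeof obj.write === 'function' &&
      typeof obj._writableState === 'object' &&
      obj._writableState !== null
  }
}

class SubWritable extends WritableLike {}

// Hypothetical class that does not inherit from WritableLike.
class FastLike {
  constructor () {
    this._writableState = { length: 0 }
  }

  write () {
    return true
  }
}

const recognized = new FastLike() instanceof WritableLike
const plainObject = {} instanceof WritableLike
const viaSubclass = new FastLike() instanceof SubWritable

console.log(recognized, plainObject, viaSubclass)
```

Restricting the duck-typed branch to the base class keeps `instanceof` checks against subclasses strict, which is probably the less surprising behaviour.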

Making this fully backwards compatible with Transform might be difficult.

stream

All 7 comments

Thoughts @nodejs/streams?

How much will we gain from this? I'm +1, possibly with a better name.

@mcollina: ~60%

$ time node fast-transform.js 

real    0m1.582s
user    0m1.737s
sys     0m0.157s

$ time node transform.js 

real    0m2.671s
user    0m2.683s
sys     0m0.069s

possibly with a better name.

FasterTransform? :smile:

How does it compare to transforming with an async generator function?

$ time node async-generator.js 
real    0m0.788s
user    0m0.772s
sys     0m0.075s

$ time node fast-transform.js 
real    0m1.612s
user    0m1.729s
sys     0m0.169s

So using an async generator in pipeline might be significantly faster: roughly twice as fast as the FastTransform prototype in this run. Is it worth further optimizing Transform? Note that the optimization suggested here would make Transform harder to maintain than today's rather trivial implementation.
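For reference, a transform written as an async generator inside `pipeline` might look like the sketch below. The benchmark script itself is not shown in this thread, so the `upperCase` transform, the sample input, and the collecting `sink` are all made up for illustration:

```javascript
const { pipeline, Readable, Writable } = require('stream')
const { promisify } = require('util')

const pipelineAsync = promisify(pipeline)

// Hypothetical transform step: upper-case every chunk.
async function * upperCase (source) {
  for await (const chunk of source) {
    yield String(chunk).toUpperCase()
  }
}

// Collect the output so the result can be inspected.
const received = []
const sink = new Writable({
  write (chunk, encoding, callback) {
    received.push(chunk.toString())
    callback()
  }
})

// The generator takes the place of a Transform or PassThrough stage.
const done = pipelineAsync(Readable.from(['a', 'b', 'c']), upperCase, sink)
  .then(() => console.log(received))
```

No `Transform` instance is created here; `pipeline` iterates the source and feeds the generator's output to the destination.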

pipeline will convert a Readable source into an async iterator and read from it that way. Might introduce unnecessary overhead. Will investigate that first.

Apparently I should be using pipeline. This does seem like a good replacement for PassThrough.
