I maintain the popular readdirp module. Decided to try fs.opendir from node 12.12. The results are suboptimal: it's 3x slower to walk file trees when compared to fs.readdir.
To reproduce:
if (opendir) to if (!opendir) I get 3x speedup, which means opendir is 3x slower than readdir.Refs: #583, #29349
Fwiw, https://github.com/nodejs/node/pull/29893 isn鈥檛 released yet and is going to improve performance significantly.
That being said, the big advantage of fs.opendir() is that it has a low upper limit on the memory necessary to read a directory, and not necessarily being faster than fs.readdir().
I think it must have comparable performance (-+ 30%), because it's reasonable to assume the software doesn't know how big a directory is ahead of time. Would be especially useful for readdirp.
With that PR, I鈥檓 getting something closer to an 80聽% perf difference.
One possibility that was suggested in it was making the number of entries read in one go configurable. I think that would bring this into acceptable parameters for you.
Great!
So, this will always be slower than fs.readdir(). Always, because to use it reasonably, you're going to need an async iterator (Promises) or an object stream (ugh).
You'd be able to get much closer by having a different but frankly weird/awful api: an async iteratable of sync iterables...
That being said that kind of subverts the point of the API. Sure, it should buffer to some extent because performance can reasonably be substantially improved without significant drawback. However, the large idea of such an API is to allow scheduled work on said directory entries to be started/return as iterator progresses, and to avoid such a large memory allocation on large directories.
Would you like a PR for that?
@Fishrock123 We have a highWaterMark option for streams, so adding something similar here seems fine to me. In the end it just gives the user more control over the memory-vs-time tradeoff that鈥檚 involved here.
@piyukore06 Unless there鈥檚 a strong objection, I鈥檇 say go for it. :)
Oh, I wasn't clear but I certainly support being able to configure this. In fact, I was thinking about that as I was reviewing your PR. 馃槃
@piyukore06 Please let us know if you need any help making that PR. It'l require a bit of C++, but only a little bit.
Yes @Fishrock123 I feel a bit stuck, I will push my changes as soon as possible. Maybe you people could take a look and point me in right direction..
Thanks in advance :)
@addaleax @Fishrock123 I don't think I completely get the whole building, compiling and testing process. It's taking more time than expected, so If anyone else wants to go ahead and finish it, please do. I will try and understand more about the ecosystem and try to find another issue to contribute to. Thank you for your patience.
@piyukore06 I don't see any performance improvements in the test from first post. Basically no changes with node 13-latest and bufferSize: 1024 / 4096 / whatevs. BufferSize: 1 makes this slower, so the arg works.
Most helpful comment
Oh, I wasn't clear but I certainly support being able to configure this. In fact, I was thinking about that as I was reviewing your PR. 馃槃