Node: Feature request: fs.ReadStream.tell()

Created on 26 Mar 2016 · 7 comments · Source: nodejs/node

When I parse a large local UTF-8-encoded file, the job takes about 2 or 3 hours, so I need to monitor its progress. Because it is a UTF-8 file, I can't just read it and convert each chunk with chunk.toString("utf8"): a multibyte character split across chunk boundaries comes out broken. So I call .setEncoding('utf8') first, but then chunk.length no longer gives me the byte position in the file.

I have read some related discussions, but none of them addresses this.
(e.g., https://github.com/nodejs/node-v0.x-archive/issues/1527 http://stackoverflow.com/questions/13932967/how-do-i-do-random-access-reads-from-large-files-using-node-js )

If we had an "ftell" like C does (see http://www.gnu.org/software/libc/manual/html_node/File-Positioning.html ), it would be possible to monitor progress. So could we add an "ftell"-style function to the file stream?

  • Version: 5.9.1
  • Platform: Linux
Labels: feature request, fs, stream


All 7 comments

FWIW, there is already an undocumented readStream.pos, which gets updated if you pass a numeric start value in your fs.ReadStream options (e.g. fs.createReadStream('foo', { start: 0 })). I'm not sure whether merely documenting this would be ideal or not...

Thank you. But I find that .pos is not accurate. For example, the file is 2372126 bytes (with CJK Unified Ideographs), yet at the end .pos points to 2490368, which is bigger than the file size. I'm wondering how this could happen. (2490368 is exactly 38 × 65536, so .pos seems to be advanced by the full 64 KiB request size even when the final read returns fewer bytes.)

Perhaps we should move the line at https://github.com/nodejs/node/blob/master/lib/fs.js#L1902 :

this.pos += toRead;

into the onread(er, bytesRead) callback, and update it with the number of bytes actually read:

this.pos += bytesRead;

Probably a dumb question: Can't you pipe it through a stream that counts how many bytes it saw and then convert the output stream to UTF8?

const { Transform } = require("stream");

class CounterTransform extends Transform {
  constructor(options) {
    super(options); // required before using `this` in a subclass
    this.bytes = 0;
  }

  _transform(chunk, encoding, cb) {
    this.bytes += chunk.length;
    this.push(chunk);
    cb();
  }
}

Or, just keep track of chunk.length and aggregate it in your stream?

(I realize this might not give you the accurate number of UTF-8 characters, but you'll get an estimate of how far into the file you are.)

You can use StringDecoder (which is what the stream uses internally) to convert chunks to strings while still seeing how many bytes of the file have been read so far.

Thank you, benjamingr, calvinmetcalf. Well, I have tried another approach, Buffer.byteLength(), and it works for me, although I'm still not satisfied.
https://github.com/kanasimi/CeJS/blob/6b1bc65a00bcec5840df579ffb6c520a83b19cb6/application/net/wiki.js#L5874
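For reference, the distinction that makes this approach work (a minimal sketch, not taken from the linked code): on a decoded string chunk, .length counts UTF-16 code units, while Buffer.byteLength() recovers the UTF-8 byte count:

```javascript
// With .setEncoding('utf8'), data events yield strings, so chunk.length
// counts characters (UTF-16 code units), not bytes.
const chunk = "中文abc";
console.log(chunk.length);                     // 5 code units
console.log(Buffer.byteLength(chunk, "utf8")); // 9 bytes (3 + 3 + 1 + 1 + 1)
```

Summing Buffer.byteLength() over decoded chunks gives an approximate byte position, which is why it works but may still drift slightly from the true file offset.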

Given that there are plenty of ways this can be implemented, closing. We can reopen if folks feel it's necessary.

