When I try to parse a large local UTF-8-encoded file, the job takes about 2 or 3 hours, so I need to monitor its progress. Because it is a UTF-8 file, I can't just read it and convert each chunk with chunk.toString("utf8"), since a chunk boundary can split a multi-byte character and corrupt it. So I call .setEncoding('utf8') first, but then chunk.length no longer gives the correct byte position in the file.
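A minimal demonstration of the corruption described above: if a read chunk ends in the middle of a multi-byte UTF-8 sequence, decoding each piece separately produces replacement characters instead of the original text.

```javascript
// "中" encodes to three UTF-8 bytes: e4 b8 ad
const buf = Buffer.from("中", "utf8");

// decoding the whole buffer at once is fine
console.log(buf.toString("utf8")); // 中

// but if a chunk boundary falls inside the character, decoding each
// piece separately mangles it into replacement characters
const broken = buf.slice(0, 2).toString("utf8") + buf.slice(2).toString("utf8");
console.log(broken === "中"); // false
```

This is why .setEncoding('utf8') (which buffers partial sequences across chunks) is needed in the first place.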
I have read some related discussions, but none of them address my need
(e.g., https://github.com/nodejs/node-v0.x-archive/issues/1527 http://stackoverflow.com/questions/13932967/how-do-i-do-random-access-reads-from-large-files-using-node-js )
If we had an "ftell" function like C's (see http://www.gnu.org/software/libc/manual/html_node/File-Positioning.html ), it would be possible to monitor progress. So, could we add an "ftell" function to the file stream?
FWIW, there is already an undocumented readStream.pos which gets updated if you pass a numeric start value in your fs.ReadStream options (e.g. fs.createReadStream('foo', { start: 0 })). I'm not sure whether merely documenting this would be the ideal fix or not...
Thank you. But I find that .pos is not accurate. For example, my file is 2372126 bytes (containing CJK Unified Ideographs), yet at the end .pos points to 2490368, which is bigger than the file size. I'm wondering how that can happen.
Perhaps we should move the line at https://github.com/nodejs/node/blob/master/lib/fs.js#L1902 :
this.pos += toRead;
into the onread(er, bytesRead) callback, and change it to:
this.pos += bytesRead;
Probably a dumb question: Can't you pipe it through a stream that counts how many bytes it saw and then convert the output stream to UTF8?
const { Transform } = require("stream");

class CounterTransform extends Transform { // yes, this works with ES6 classes
  constructor(options) {
    super(options); // super() must run before touching `this`
    this.bytes = 0;
  }
  _transform(chunk, encoding, cb) {
    this.bytes += chunk.length; // chunk is a Buffer here, so this counts bytes
    this.push(chunk);
    cb();
  }
}
Or, just keep track of chunk.length and aggregate it in your stream?
(I realize this might not give you the accurate number of UTF8 chars but you'll get an estimate of how far into the file you are)
You can use StringDecoder from the string_decoder module (which is what the stream uses internally) to convert chunks to strings while tracking how much of the file has been read so far.
Thank you, benjamingr, calvinmetcalf. Well, I have tried another approach, Buffer.byteLength(), and it works for me, although I'm still not satisfied.
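The Buffer.byteLength() approach works in the other direction: once .setEncoding('utf8') is set, 'data' events deliver strings, and Buffer.byteLength() recovers how many file bytes each string corresponds to. A minimal sketch, with made-up strings standing in for decoded chunks:

```javascript
// after .setEncoding('utf8'), chunk.length counts characters, not bytes;
// Buffer.byteLength() gives back the UTF-8 byte count of each string
let position = 0;
for (const s of ["hello ", "中文"]) { // stand-ins for decoded 'data' chunks
  position += Buffer.byteLength(s, "utf8");
}
console.log(position); // 12: six ASCII bytes plus two 3-byte CJK characters
```

One caveat that may explain the dissatisfaction: the internal decoder can hold back the trailing bytes of an incomplete character, so the byte total of the decoded strings can briefly lag the true file position mid-stream.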
https://github.com/kanasimi/CeJS/blob/6b1bc65a00bcec5840df579ffb6c520a83b19cb6/application/net/wiki.js#L5874
Given that there are plenty of ways this can be implemented in userland, I'm closing this. We can reopen it if folks feel that's necessary.