I am trying to read a big file (13,147,026 lines of text) line by line with Deno, but it gives me this error:
BufferFullError: Buffer full
at BufReader.readSlice (bufio.ts:339:15)
at async BufReader.readString (bufio.ts:217:20)
at async stream_file (buffer.ts:9:18)
Here is my code:
import { BufReader } from "https://deno.land/std/io/bufio.ts";

export async function stream_file(filename: string) {
  const file = await Deno.open(filename);
  const bufReader = new BufReader(file);
  console.log("Reading data...");
  let line: string | any;
  let lineCount: number = 0;
  while ((line = await bufReader.readString("\n")) != Deno.EOF) {
    lineCount++;
    // do something with `line`.
  }
  file.close();
  console.log(`${lineCount} lines read.`);
}
versions:
deno 0.33.0
v8 8.1.108
typescript 3.7.2
I am trying to find an equivalent of Node streams. Please advise!
In Deno, the I/O model is different. The Reader interface is what you want. The idea of a reader is to give the caller a way to request data while getting backpressure (which you can detect by checking whether the Reader gave you the number of bytes you asked for). If you aren't familiar with Go, this might seem a bit odd.
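As a minimal sketch of that contract (assuming the Deno.EOF sentinel from this era; later versions signal end-of-file with null instead):

// countBytes is a hypothetical helper (not in std): it pulls chunks from any
// Deno.Reader until EOF and tallies how many bytes actually came back.
async function countBytes(r: Deno.Reader): Promise<number> {
  const buf = new Uint8Array(16 * 1024); // the caller decides how much to ask for
  let total = 0;
  while (true) {
    const result = await r.read(buf);
    if (result === Deno.EOF) break; // no more data
    total += result; // may be fewer bytes than buf.length
  }
  return total;
}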
But to give an example based on your code above:
import { BufReader } from "https://deno.land/std/io/mod.ts";
import { TextProtoReader } from "https://deno.land/std/textproto/mod.ts";
import { parse } from "https://deno.land/std/flags/mod.ts";
import { basename } from "https://deno.land/std/path/mod.ts";

export async function read(r: Deno.Reader) {
  const reader = new TextProtoReader(BufReader.create(r));
  console.log("Reading data...");
  let lineCount = 0;
  while (true) {
    let line = await reader.readLine();
    if (line === Deno.EOF) break;
    // do something with `line`
    lineCount += 1;
  }
  console.log(`${lineCount} lines read.`);
}

if (import.meta.main) {
  const args = parse(Deno.args, {
    boolean: ["h"],
    alias: {
      h: ["help"]
    }
  });
  if (args.h) {
    printUsage();
    Deno.exit(0);
  }
  const [filename] = args._;
  if (!filename) {
    printUsage();
    Deno.exit(1);
  }
  const file = filename === "-" ? Deno.stdin : await Deno.open(filename);
  await read(file);
  file.close();

  function printUsage() {
    console.error(
      `Usage: deno --allow-read ${basename(import.meta.url)} <filename>`
    );
  }
}
The thing to note is that I'm using TextProtoReader as a convenience because it has logic for reading lines when the underlying BufReader's buffer is full (you can check out the source; it's pretty straightforward).
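To make that concrete, here is a rough sketch of the idea (not the actual std source; it assumes the API of this std version, where BufReader.readLine() returns { line: Uint8Array, more: boolean } or Deno.EOF, and more means the line did not fit in the buffer):

// Hypothetical helper doing roughly what TextProtoReader does for us:
// keep calling readLine() while `more` is set, then decode the whole line once.
async function readFullLine(bufReader: BufReader): Promise<string | typeof Deno.EOF> {
  const chunks: Uint8Array[] = [];
  while (true) {
    const result = await bufReader.readLine();
    if (result === Deno.EOF) {
      if (chunks.length === 0) return Deno.EOF;
      break; // EOF right after a partial chunk: return what we have
    }
    // Copy the slice: readLine() hands out a view into its internal buffer,
    // which is reused on the next call.
    chunks.push(result.line.slice());
    if (!result.more) break; // this chunk ends the line
  }
  // Join the raw bytes first and decode once, so multi-byte characters split
  // across buffer boundaries are not mangled.
  const joined = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
  let offset = 0;
  for (const c of chunks) {
    joined.set(c, offset);
    offset += c.length;
  }
  return new TextDecoder().decode(joined);
}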
I tried 3 approaches to read a 307 MB test file:
Approach 1 (the original readString code):

import { BufReader } from "https://deno.land/std/io/bufio.ts";

export async function stream_file(filename: string) {
  const file = await Deno.open(filename);
  const bufReader = new BufReader(file);
  console.log("Reading data...");
  let line: string | any;
  let lineCount: number = 0;
  while ((line = await bufReader.readString("\n")) != Deno.EOF) {
    lineCount++;
    // do something with `line`.
  }
  file.close();
  console.log(`${lineCount} lines read.`);
}
Approach 2 (TextProtoReader):

import { BufReader } from "https://deno.land/std/io/mod.ts";
import { TextProtoReader } from "https://deno.land/std/textproto/mod.ts";

export async function textProtoReader(filename: string) {
  const r: Deno.Reader = await Deno.open(filename);
  const reader = new TextProtoReader(BufReader.create(r));
  console.log("Reading data...");
  let lineCount = 0;
  while (true) {
    let line = await reader.readLine();
    if (line === Deno.EOF) break;
    // do something with `line`
    lineCount += 1;
  }
  console.log(`${lineCount} lines read.`);
}
Approach 3 (BufReader.readLine):

import { BufReader } from "https://deno.land/std/io/bufio.ts";

export async function readLine(filename: string) {
  const file = await Deno.open(filename);
  const bufReader = new BufReader(file);
  console.log("Reading data...");
  let line: string | any;
  let lineCount: number = 0;
  while ((line = await bufReader.readLine()) != Deno.EOF) {
    lineCount++;
    // do something with `line`.
  }
  file.close();
  console.log(`${lineCount} lines read.`);
}
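One caveat I'll add about approach 3 (my own note, not from the docs): readLine() hands back the raw bytes of the line rather than a string, so the counting works as written, but doing anything with the text needs a decode step, e.g.:

// Inside the loop of approach 3; `line` there is { line: Uint8Array, more: boolean }.
const text = new TextDecoder().decode(line.line);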
Rust itself does it in:
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
use std::io::{stdin, stdout, Read, Write};
use std::time::Instant;

fn main() {
    let start = Instant::now();
    let mut counter = 0;
    // File hosts must exist in current path before this produces output
    if let Ok(lines) = read_lines("./enwik9") {
        // Consumes the iterator, returns an (Optional) String
        for _line in lines {
            counter = counter + 1;
        }
        println!("{}", counter)
    }
    let duration = start.elapsed();
    println!("Time elapsed in expensive_function() is: {:?}", duration);
    pause()
}

fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}

fn pause() {
    let mut stdout = stdout();
    stdout.write(b"Press Enter to continue...").unwrap();
    stdout.flush().unwrap();
    stdin().read(&mut [0]).unwrap();
}
Node does it like this:
const fs = require("fs");
const readline = require("readline");

async function processLineByLine() {
  console.log(Date());
  const fileStream = fs.createReadStream("./enwik9");
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  let counter = 0;
  for await (const line of rl) {
    counter++;
  }
  console.log(Date());
  return counter;
}
Can anybody explain why it's different?
I guess the best way to read a big file in Deno is approach 3.
Side note, feels like something we need to add to the benchmarks.
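For anyone who wants to reproduce the comparison, a minimal timing harness like this should do (it assumes the readLine function and the ./enwik9 file from above; swap in stream_file or textProtoReader to compare the other approaches):

// Hypothetical driver: time one approach over the same file, then compare.
async function main() {
  const start = Date.now();
  await readLine("./enwik9");
  console.log(`Elapsed: ${Date.now() - start} ms`);
}

main();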