Zig: Reading from stdin is incredibly slow

Created on 1 Feb 2020 · 12 Comments · Source: ziglang/zig

On 0.5.0+357f42da6, the following program takes around 125 microseconds to read each line from stdin:

const warn = @import("std").debug.warn;
const std = @import("std");
const mem = std.mem;

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const stream = &std.io.getStdIn().inStream().stream;
    var buffer = try std.Buffer.initCapacity(allocator, 1000);

    var count: usize = 0;
    var timer = try std.time.Timer.start();
    while (stream.readUntilDelimiterBuffer(&buffer, '\n', 1000)) {
        warn("{}\n", .{timer.lap()});
        count += 1;
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    warn("{}\n", .{count});
}

With the test data:

awk 'BEGIN{for (i=0; i<10000; i++){print "abcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123"}}' > big.tsv

Performance is poor in both debug and release-fast builds.

main < big.tsv

The performance issue was observed on Windows and macOS.

question

All 12 comments

The same performance issue occurs when using std.io.readLine.

It also occurs when opening '/dev/stdin' as a file and reading from that.

It also seems others have hit performance issues with this in the past; it was mentioned here: https://freenode.irclog.whitequark.org/zig/2019-11-02

Actually, the performance issue is not limited to stdin. Here's the same program, but reading the file directly.

const warn = @import("std").debug.warn;
const std = @import("std");

pub fn main() !void {
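    const file = try std.fs.cwd().openFile("big.tsv", .{.read=true, .write=false});
    defer file.close();
    // Keep the File.InStream in a variable; copying `.stream` out of it would
    // break the stream's link back to the file it reads from.
    var file_in = file.inStream();
    var buffer: [1000]u8 = undefined;

    var count: usize = 0;
    var timer = try std.time.Timer.start();
    // readLineSlice always reads stdin; readLineSliceFrom reads the given stream.
    while (std.io.readLineSliceFrom(&file_in.stream, buffer[0..])) |line| {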
        warn("{}\n", .{timer.lap()});
        count += 1;
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    warn("{}\n", .{count});
}

This also performs poorly, taking 125 microseconds to read a line.

https://github.com/ziglang/zig/blob/1baaf9a503bd399f4da4a9d3e80695b1739ea966/lib/std/io/in_stream.zig#L107

So using a buffered stream is an option, but even buffered read methods like the above do not seem to actually buffer reads and just read a byte at a time.
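To make the cost concrete, here is a minimal sketch of manual block buffering. It assumes the std lib snapshot used in this thread, where getStdIn() returns a File directly and File.read returns the number of bytes read; both may differ in other Zig versions. Each read call fills up to 4 KiB, so the number of read calls should come out far smaller than the number of bytes, instead of one call per byte.

```zig
const std = @import("std");

pub fn main() !void {
    // Assumption: at this snapshot getStdIn() returns a File (no error union)
    // and File.read(buffer) returns how many bytes were read.
    const stdin = std.io.getStdIn();

    var block: [4096]u8 = undefined; // one read call fills up to 4 KiB
    var reads: usize = 0;
    var lines: usize = 0;

    while (true) {
        const n = try stdin.read(block[0..]);
        if (n == 0) break; // a zero-length read means end of input
        reads += 1;
        // Scan the block in memory; no further read calls are needed here.
        for (block[0..n]) |byte| {
            if (byte == '\n') lines += 1;
        }
    }

    std.debug.warn("{} lines in {} read calls\n", .{ lines, reads });
}
```

Run it the same way as the original (main < big.tsv) and compare the two counts it prints.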

Try using the buffered input stream.

That's not a buffered read method; that's a method to read into a std.Buffer. For buffering, use a buffered input stream (as Andrew mentioned).

Maybe buffered input should be the default, and those who really want an unbuffered stream could use an UnbufferedStream.

Agreed, considering how awkward the buffered API is to use:

const warn = @import("std").debug.warn;
const std = @import("std");

pub fn main() !void {
    var stream = std.io.BufferedInStream(std.fs.File.InStream.Error).init(&std.io.getStdIn().inStream().stream).stream;
    var buffer: [1000]u8 = undefined;

    var count: usize = 0;
    var timer = try std.time.Timer.start();

    while (std.io.readLine(&stream, buffer[0..])) {
        warn("{}\n", .{timer.lap()});
        count += 1;
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    warn("{}\n", .{count});
}

This results in an integer overflow for some reason. So I'm stumped.

Yeah, using the BufferedInStream fixed the issue for me, but I would not have found it had it not been mentioned here. Defaulting to buffered and making that the clean API would be nice.

Side note, this was my first foray into Zig. I love the language!

Working code for me:

const warn = @import("std").debug.warn;
const std = @import("std");
const mem = std.mem;

const Record = struct {
    name: []u8,
    count: u32,

    pub fn init(allocator: *mem.Allocator, input: []u8) !Record {
        // split the string
        var vals = mem.separate(input, "\t");
        var count: u32 = 0;
        while (vals.next()) |val| {
            if (mem.indexOf(u8, val[1..4], "bc")) |pos| {
                count = count + 1;
            }
            if (mem.indexOf(u8, val[1..4], "BC")) |pos| {
                count = count + 1;
            }
        }
        const name_slice = input[0..@intCast(usize, mem.indexOf(u8, input, "\t").?)];
        const name = try allocator.alloc(u8, name_slice.len);
        mem.copy(u8, name, name_slice);
        return Record{ .name = name, .count = count };
    }
};

/// A readLine function like Nim's (unimplemented stub, not used below)
pub fn readLine(fh: *std.fs.File, buffer: *[]u8) !void {}

pub fn main() !void {
    var mainArena = std.heap.ArenaAllocator.init(std.heap.c_allocator);
    // Use defer, not errdefer, so the arena is also freed on the success path.
    defer mainArena.deinit();
    const arena = &mainArena.allocator;

    const input = try std.fs.File.openRead("/dev/stdin");
    defer input.close();

    var buf_in = std.io.BufferedInStream(std.os.ReadError).init(&input.inStream().stream);
    const stream = &buf_in.stream;
    var buffer = std.Buffer.initNull(arena);
    var records = std.ArrayList(Record).init(arena);
    while (stream.readUntilDelimiterBuffer(&buffer, '\n', 1000)) {
        try records.append(try Record.init(arena, buffer.toSlice()));
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    var total: u32 = 0;
    for (records.toSlice()) |r| {
        total += r.count;
    }
    warn("{}", total);
}

That did the job in a little over a second, putting it on par with my C version, despite the dumb stuff I'm doing in string compares.

I understand how buffered by default would have solved this issue for you. However, it's a bit antithetical to the Zig philosophy to opt you in by default to abstractions that you didn't ask for. Another thing that C has by default is locking on stdio files, but Zig expects you to manage your own locking if necessary. (std.debug.warn has locking, don't worry.)

I agree the interface is a bit clunky. That's an important issue to solve independent of stdio.

I think this issue will be solved with sufficiently complete and available std lib documentation, which is not yet the case today. I think it's fair to expect a programmer to have read the documentation for every function call they use.

I think that's reasonable.

One part of the current API that's a bit awkward is having to pass an error type to the BufferedInStream. Wouldn't one always want the buffered stream to just pull the error type off the stream being buffered? It wouldn't be so bad to type std.io.buffered(&stream) which synthesises the new stream type with the existing error type and calls its constructor.

edit: I guess this would impose a compile time interface on stream that doesn't currently exist, so maybe this could exist as a member on some common streams like file, in which case the syntax would be some_file.InStream().bufferedStream()
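A rough sketch of what such a helper could look like, written against the std lib snapshot used in this thread. The function name buffered is hypothetical and not part of std. The sketch also assumes that every stream passed in is a pointer to some std.io.InStream(E) exposing its error type as the public declaration Error, which is exactly the compile-time interface mentioned in the edit above. Older releases spell the builtin @typeOf rather than @TypeOf.

```zig
const std = @import("std");

/// Hypothetical helper, not part of std: wraps `stream` in a BufferedInStream
/// whose error type is taken from the stream being wrapped.
/// Assumption: `stream` points to a std.io.InStream(E) that exposes `Error`.
pub fn buffered(stream: var) std.io.BufferedInStream(@TypeOf(stream.*).Error) {
    return std.io.BufferedInStream(@TypeOf(stream.*).Error).init(stream);
}

pub fn main() !void {
    // Keep both wrappers in variables and only pass pointers to their
    // `stream` fields; the stream locates its parent through that pointer.
    var file_in = std.io.getStdIn().inStream();
    var buf_in = buffered(&file_in.stream);
    const stream = &buf_in.stream;

    var line = try std.Buffer.initCapacity(std.heap.page_allocator, 1000);
    var count: usize = 0;
    while (stream.readUntilDelimiterBuffer(&line, '\n', 1000)) {
        count += 1;
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    std.debug.warn("{}\n", .{count});
}
```

The caller no longer spells out the error type, but still has to keep the returned wrapper in a variable of its own.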
