On 0.5.0+357f42da6, the following takes around 125 microseconds to read each line from stdin:
```zig
const std = @import("std");
const warn = std.debug.warn;
const mem = std.mem;

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const stream = &std.io.getStdIn().inStream().stream;
    var buffer = try std.Buffer.initCapacity(allocator, 1000);
    var count: usize = 0;
    var timer = try std.time.Timer.start();
    while (stream.readUntilDelimiterBuffer(&buffer, '\n', 1000)) {
        warn("{}\n", .{timer.lap()});
        count += 1;
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    warn("{}\n", .{count});
}
```
With the test data:

```shell
awk 'BEGIN{for (i=0; i<10000; i++){print "abcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123"}}' > big.tsv
./main < big.tsv
```

Performance is bad in both debug and release-fast builds. The same slowness shows up on Windows and macOS, when using std.io.readLine, and when opening '/dev/stdin' as a file and reading from that.
Others seem to have hit performance issues with this in the past; it was mentioned at https://freenode.irclog.whitequark.org/zig/2019-11-02
Actually, the performance issues are not limited to stdin. Here's the same program reading the file directly:
```zig
const std = @import("std");
const warn = std.debug.warn;

pub fn main() !void {
    const file = try std.fs.cwd().openFile("big.tsv", .{ .read = true, .write = false });
    defer file.close();
    // Keep the stream struct in a variable and take a pointer to the
    // embedded base stream; copying the base out detaches it.
    var in_stream = file.inStream();
    const stream = &in_stream.stream;
    var buffer: [1000]u8 = undefined;
    var count: usize = 0;
    var timer = try std.time.Timer.start();
    // readLineSliceFrom reads from the given stream (plain readLineSlice
    // would read from stdin, not the opened file).
    while (std.io.readLineSliceFrom(stream, buffer[0..])) |_| {
        warn("{}\n", .{timer.lap()});
        count += 1;
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    warn("{}\n", .{count});
}
```
This also performs poorly, taking around 125 microseconds per line. So using a buffered stream is an option, but even read methods like the above that take a buffer do not actually buffer their reads; they still pull one byte at a time from the underlying stream.
Try using the buffered input stream.
That's not a buffered read method; that's a method to read into a std.io.Buffer. For buffering, use a buffered input stream (as Andrew mentioned).
Maybe buffered input should be the default, and those who really want an unbuffered stream could use an UnbufferedStream.
Agreed, considering how awkward the buffered API is to use:
```zig
const std = @import("std");
const warn = std.debug.warn;

pub fn main() !void {
    var stream = std.io.BufferedInStream(std.fs.File.InStream.Error).init(&std.io.getStdIn().inStream().stream).stream;
    var buffer: [1000]u8 = undefined;
    var count: usize = 0;
    var timer = try std.time.Timer.start();
    while (std.io.readLine(&stream, buffer[0..])) {
        warn("{}\n", .{timer.lap()});
        count += 1;
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    warn("{}\n", .{count});
}
```
This results in an integer overflow for some reason. So I'm stumped. (Most likely the trailing `.stream` copies the embedded stream out of the temporary `BufferedInStream`, so the interface's `@fieldParentPtr`-based read ends up operating on garbage.)
Yeah, using the BufferedInStream fixed the issue for me, but I would not have found it had it not been mentioned here. Defaulting to buffered and making that the clean API would be nice.
Side note, this was my first foray into Zig. I love the language!
Working code for me:
```zig
const std = @import("std");
const warn = std.debug.warn;
const mem = std.mem;

const Record = struct {
    name: []u8,
    count: u32,

    pub fn init(allocator: *mem.Allocator, input: []u8) !Record {
        // Split the line on tabs and count fields containing "bc" or "BC".
        var vals = mem.separate(input, "\t");
        var count: u32 = 0;
        while (vals.next()) |val| {
            if (mem.indexOf(u8, val[1..4], "bc")) |_| {
                count += 1;
            }
            if (mem.indexOf(u8, val[1..4], "BC")) |_| {
                count += 1;
            }
        }
        // Copy the first field (up to the first tab) as the name.
        const name_slice = input[0..mem.indexOf(u8, input, "\t").?];
        const name = try allocator.alloc(u8, name_slice.len);
        mem.copy(u8, name, name_slice);
        return Record{ .name = name, .count = count };
    }
};

/// A readLine function like Nim's (unused stub).
pub fn readLine(fh: *std.fs.File, buffer: *[]u8) !void {}

pub fn main() !void {
    var mainArena = std.heap.ArenaAllocator.init(std.heap.c_allocator);
    defer mainArena.deinit();
    const arena = &mainArena.allocator;
    const input = try std.fs.File.openRead("/dev/stdin");
    defer input.close();
    var buf_in = std.io.BufferedInStream(std.os.ReadError).init(&input.inStream().stream);
    const stream = &buf_in.stream;
    var buffer = std.Buffer.initNull(arena);
    var records = std.ArrayList(Record).init(arena);
    while (stream.readUntilDelimiterBuffer(&buffer, '\n', 1000)) {
        try records.append(try Record.init(arena, buffer.toSlice()));
    } else |err| {
        switch (err) {
            error.EndOfStream => {},
            else => return err,
        }
    }
    var total: u32 = 0;
    for (records.toSlice()) |r| {
        total += r.count;
    }
    warn("{}\n", .{total});
}
```
That did the job in a little over a second, putting it on par with my C version, despite the dumb stuff I'm doing in the string compares.
I understand how buffered by default would have solved this issue for you. However, it's a bit antithetical to the Zig philosophy to opt you in by default to abstractions that you didn't ask for. Another thing that C gives you by default is locking on stdio files, but Zig expects you to manage your own locking if necessary. (std.debug.warn does lock, don't worry.)
I agree the interface is a bit clunky. That's an important issue to solve independent of stdio.
I think this issue will be solved with sufficiently complete and available std lib documentation, which is not yet the case today. I think it's fair to expect a programmer to have read the documentation for every function call they use.
I think that's reasonable.
One part of the current API that's a bit awkward is having to pass an error type to BufferedInStream. Wouldn't one always want the buffered stream to just pull the error type off the stream being buffered? It wouldn't be so bad to type `std.io.buffered(&stream)`, which synthesizes the new stream type with the existing error type and calls its constructor.
Edit: I guess this would impose a compile-time interface on the stream that doesn't currently exist, so maybe this could exist as a member on some common streams like File, in which case the syntax would be `some_file.inStream().bufferedStream()`.