I changed the current implementation:
pub fn skipBytes(self: Self, num_bytes: u64) !void {
    var i: u64 = 0;
    while (i < num_bytes) : (i += 1) {
        _ = try self.readByte();
    }
}
to
pub fn skipBytes(self: Self, num_bytes: u64) !void {
    try self.context.seekBy(@intCast(i64, num_bytes));
}
This solved my issues, but I'm not sure that it will work for all streams (e.g. non-seekable ones).
The io API is still a dark forest for me. :)
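For what it's worth, here is a minimal sketch of why the seek-based version can't be trusted on every stream (my own illustration, not from the patch; API names match the Zig version used in this thread and may have shifted since): seeking a pipe fails at runtime, because the underlying lseek returns ESPIPE, which Zig surfaces as error.Unseekable.

```zig
const std = @import("std");

pub fn main() !void {
    // stdin is a File, so this compiles, but when stdin is a pipe the
    // underlying lseek fails with ESPIPE -> error.Unseekable.
    const stdin = std.io.getStdIn();
    stdin.seekBy(4) catch |err| {
        std.debug.print("seekBy failed: {}\n", .{err});
    };
}
```

Run it with input piped in (e.g. `echo hi | zig run seek_pipe.zig`) to see the failure; with a regular file redirected to stdin, the seek succeeds.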
Yeah, seekBy won't work on all streams. An improvement could be made by replacing the readByte loop with reads into a temporary buffer.
I drew up a simple proof of concept improvement of skipBytes:
const std = @import("std");
const benchmark = @import("zig-bench/bench.zig").benchmark;

fn skipByte(reader: anytype, num_bytes: u64) !void {
    const buf_size = 512;
    var buf: [buf_size]u8 = undefined;
    var remaining = num_bytes;
    while (remaining > 0) {
        const amt = std.math.min(remaining, buf_size);
        try reader.readNoEof(buf[0..amt]);
        remaining -= amt;
    }
}
pub fn main() !void {
    try benchmark(struct {
        pub fn skipByteStd() !void {
            var buf: [4096]u8 = undefined;
            var reader = std.io.fixedBufferStream(&buf);
            try reader.reader().skipBytes(4000);
        }
        pub fn skipByteNew() !void {
            var buf: [4096]u8 = undefined;
            var reader = std.io.fixedBufferStream(&buf);
            try skipByte(reader.reader(), 4000);
        }
    });
}
The benchmark results:
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)         10000      120217
skipByteNew(0)         38547       12971
However, in release-fast, the benchmarks look like this:
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)        100000          19
skipByteNew(0)        100000          18
It seems that LLVM optimizes this quite a bit.
However, in release-safe, the performance improvements are still very noticeable:
joachim@joachimsNixos ~/asdf> zig run read_byte.zig
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)         10000      120217
skipByteNew(0)         38547       12971
joachim@joachimsNixos ~/asdf> zig run --release-fast read_byte.zig
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)        100000          19
skipByteNew(0)        100000          18
joachim@joachimsNixos ~/asdf> zig run --release-safe read_byte.zig
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)        100000        3356
skipByteNew(0)        100000          16
joachim@joachimsNixos ~/asdf>
@joachimschmidt557
Can you benchmark with my ugly patch, please? :)
@data-man Benchmark code:
const std = @import("std");
const benchmark = @import("zig-bench/bench.zig").benchmark;

fn skipByte(reader: anytype, num_bytes: u64) !void {
    try reader.context.seekBy(@intCast(i64, num_bytes));
}
pub fn main() !void {
    try benchmark(struct {
        pub fn skipByteStd() !void {
            var file = try std.fs.cwd().openFile("file", .{});
            defer file.close();
            var reader = file.reader();
            try reader.skipBytes(400);
        }
        pub fn skipByteNew() !void {
            var file = try std.fs.cwd().openFile("file", .{});
            defer file.close();
            var reader = file.reader();
            try skipByte(reader, 400);
        }
    });
}
Benchmark results:
joachim@joachimsNixos ~/asdf> zig run read_byte.zig
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)         10000      226675
skipByteNew(0)        100000        2317
joachim@joachimsNixos ~/asdf> zig run --release-fast read_byte.zig
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)         10000      210312
skipByteNew(0)        100000        1902
joachim@joachimsNixos ~/asdf> zig run --release-safe read_byte.zig
Benchmark         Iterations    Mean(ns)
------------------------------------
skipByteStd(0)         10000      210145
skipByteNew(0)        100000        1933
joachim@joachimsNixos ~/asdf>
But as I was saying, this patch does not work for all streams. If you want to exploit the full potential of the underlying structure (a file), you should use std.fs.File directly instead of std.io.Reader.
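To sketch what "use std.fs.File directly" could look like (my own illustration; the 400-byte skip and the filename "file" mirror the benchmark above):

```zig
const std = @import("std");

pub fn main() !void {
    var file = try std.fs.cwd().openFile("file", .{});
    defer file.close();

    // Skip 400 bytes with a single seek on the File itself, instead of
    // going through the generic std.io.Reader interface.
    try file.seekBy(400);
}
```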
> However, in release-fast, the benchmarks look like this
The reader's position doesn't get observed after skipBytes, so any piece of code that isn't directly involved in returning an error will be marked dead. That might be why you're seeing such a huge reduction: the optimizer is probably deleting the whole thing. That said, your code still looks much faster, so nice work!
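One way to double-check that theory (a sketch of mine, not part of the original benchmark) is to make the skip observable by returning the stream's final position; if the release-fast timings jump back up, the optimizer really was deleting the work:

```zig
const std = @import("std");

fn skipByteStdObservable() !usize {
    var buf: [4096]u8 = undefined;
    var stream = std.io.fixedBufferStream(&buf);
    try stream.reader().skipBytes(4000);
    // Returning the final position creates a data dependency on the
    // skipped bytes, so the loop can no longer be eliminated as dead code.
    return stream.pos;
}
```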
With #6174 merged, this can be closed for now.