Zig: add to standard library: parse floats

Created on 20 May 2017 · 14Comments · Source: ziglang/zig

Depends on #374

enhancement standard library

Source

andrewrk

Most helpful comment

FYI, I've implemented ParseNumber which does both integers and floats and wrapped parseFloat(comptime T: type, bytes: []const u8) around it. I'd be glad to refactor it into a pull request if desired. And, of course, any other comments welcome.

Some Examples:

test "ParseNumber.parseFloat" {
    assert((try parseFloat(f64, "-1.000_001e10")) == -1.000001e10);
    assert((try parseFloat(f32, "0.1")) == 0.1);

    // Couple of examples using ParseNumber directly
    assert((try ParseNumber(i32).parse("0x1234_ABCD")) == 0x1234ABCD);
    const parseF32 = ParseNumber(f32).parse;
    assert((try parseF32("-1.0")) == -1.0);
}

winksaville on 22 Nov 2018

👍2

All 14 comments

Depends on #405

andrewrk on 9 Jul 2017

I've got a naive implementation of this in my tick repo, lib/float.zig

raulgrell on 7 Aug 2017

We have a robust implementation of this in zig now; it just needs a more complete bigint implementation (#405). It's a port of errol: https://github.com/marcandrysco/Errol

You can see the compile errors we get when you try to print a floating point now:

const io = @import("std").io;

pub fn main() -> %void {
    %%io.stdout.printf("Hello, world!\n{}\n", f32(12.34));
}

/home/andy/dev/zig/build/lib/zig/std/fmt/errol/index.zig:253:27: error: float value 10000000000000000000.000000 cannot be implicitly casted to type 'u128'
    const l64 = u64(low % pow19);
                          ^

hmm actually that's not what I was expecting. I think actually the bigint implementation may no longer be blocking this issue.

andrewrk on 7 Aug 2017

Printing floats is basically working but there are bugs. I'm going to do some fuzz testing and see where the C errol code and zig port get different answers, then step them both side by side and see what happens differently.

andrewrk on 18 Aug 2017

A wee test to trigger an edge case "integer overflow", just swap the comments on // if (false) and the next line

const print = @import("std").io.stdout.printf;
test "f64 print" {
    %%print("\n");
    var iter: usize = 0;
    while (iter < 16) : (iter += 1) {
        const u: u64 = (9007199254740992 - 3) + iter; // (2^53 - 3) + iter
        const uf: f64 = f64(u);
        const ufu: u64 = u64(uf);
        // integer overflow for 9007199254740992 and 9007199254740993
        // other neighbouring values seem ok
        // if (false) {
        if ((u == 9007199254740992) or (u == 9007199254740993)) {
            %%print("zig 64-bit: u64={} -> f64=n/a -> u64={} workaround\n", u, ufu);
        } else {
            %%print("zig 64-bit: u64={} -> f64={} -> u64={}\n", u, uf, ufu);
        }
    }
}

It's near when the f64 no longer can represent the large integer value.

```zig 64-bit: u64=9007199254740989 -> f64=9.007199e18 -> u64=9007199254740989
zig 64-bit: u64=9007199254740990 -> f64=9.007199e18 -> u64=9007199254740990
zig 64-bit: u64=9007199254740991 -> f64=9.007199e18 -> u64=9007199254740991
zig 64-bit: u64=9007199254740992 -> f64=n/a -> u64=9007199254740992 workaround
zig 64-bit: u64=9007199254740993 -> f64=n/a -> u64=9007199254740992 workaround
zig 64-bit: u64=9007199254740994 -> f64=9.007199e18 -> u64=9007199254740994
zig 64-bit: u64=9007199254740995 -> f64=9.007199e18 -> u64=9007199254740996
zig 64-bit: u64=9007199254740996 -> f64=9.007199e18 -> u64=9007199254740996

C 64-bit: u64=9007199254740989 -> f64=9007199254740989.000000 -> u64=9007199254740989
C 64-bit: u64=9007199254740990 -> f64=9007199254740990.000000 -> u64=9007199254740990
C 64-bit: u64=9007199254740991 -> f64=9007199254740991.000000 -> u64=9007199254740991
C 64-bit: u64=9007199254740992 -> f64=9007199254740992.000000 -> u64=9007199254740992
C 64-bit: u64=9007199254740993 -> f64=9007199254740992.000000 -> u64=9007199254740992
C 64-bit: u64=9007199254740994 -> f64=9007199254740994.000000 -> u64=9007199254740994
```

ofelas on 20 Aug 2017

Brought up in irc, but we currently have some problems on some simple cases too. Tested on latest master.

const io = @import("std").io;
pub fn main() -> %void {
    var a: f64 = 0.5;
    var b: f64 = 0.45;
    var c: f64 = 0.045;
    %%io.stdout.printf("{}\n", a); // -> 0.0
    %%io.stdout.printf("{}\n", b); // -> 0.45
    %%io.stdout.printf("{}\n", c); // -> 4.5e-1
}

$ ./float-example
0.0
0.45
4.5e-1

tiehuis on 6 Sep 2017

A more catastrophic case for reference.

pub fn main() -> %void {
    %%@import("std").io.stdout.printf("{}\n", f64(0.0));
}

integer overflow
/home/me/local/lib/zig/std/special/zigrt.zig:16:35: 0x000000000020b745 in ??? (t)
        @import("std").debug.panic("{}", message_ptr[0..message_len]);
                                  ^
/home/me/local/lib/zig/std/fmt/errol/index.zig:67:13: 0x000000000021ef3b in ??? (t)
        exp -= 1;
            ^
/home/me/local/lib/zig/std/fmt/errol/index.zig:28:19: 0x000000000020b48b in ??? (t)
    return errol3u(value, buffer);
                  ^
/home/me/local/lib/zig/std/fmt/index.zig:248:33: 0x000000000021caa2 in ??? (t)
    const float_decimal = errol3(f64(value), buffer[0..]);
                                ^
/home/me/local/lib/zig/std/fmt/index.zig:183:31: 0x000000000021b547 in ??? (t)
            return formatFloat(value, context, output);

tiehuis on 10 Sep 2017

I need to print floats, so I've been poking at this today and I have some improvements in this commit. Here's what I did

Errol doesn't handle NaNs, the infinities, the zeros, or negative numbers. Those need to be done at some higher level. I just put it in formatFloat.
The 0.5, 0.45, 0.045 examples Errol actually gets right and they're just formatted wrong in formatFloat.

This is because the special case for exponent=0 is handled wrong. I just dropped the special case, which means 0.5 is now printed as 5.0e-1, which is less pretty but its easier to check that the code's correct.
The seven digits we emit aren't enough to uniquely specify even an f32 if that was the intention. I think we need 8 for f32s and 17 for f64s.
Is 20 bytes enough for the buffer size? errolN_proc in the Errol repo uses 32. I bumped it up since its basically free since the array is undefined anyway.
I think this line in our Errol impl, corresponding to this line in the Errol repo, is a typo. I changed it but didn't notice any difference.
I also changed fpprev/fpnext to do what C does, but I don't know if it makes any difference.

This has been working for me so far, but if I wanted to PR it, how do you think I should test it?

scurest on 23 Oct 2017

Thanks for looking into this.

This has been working for me so far, but if I wanted to PR it, how do you think I should test it?

One idea: write some C code to generate random floats and print them with printf. Output the float values as well as the printed strings. Have Zig print the same float values and check if you get the same results.

andrewrk on 24 Oct 2017

👍1

I'll pick up parsing IEEE-754 floating-point numbers from strings.

nwsharp on 24 Jul 2018

👍1

Can we get a checklist of what needs to happen here?

kristate on 2 Aug 2018

What needs to happen here is std.fmt.parseFloat:

/// `bytes` is a utf-8 encoded string
pub fn parseFloat(comptime T: type, bytes: []const u8) !T {
    // ...
}

andrewrk on 28 Aug 2018

Some Examples:

test "ParseNumber.parseFloat" {
    assert((try parseFloat(f64, "-1.000_001e10")) == -1.000001e10);
    assert((try parseFloat(f32, "0.1")) == 0.1);

    // Couple of examples using ParseNumber directly
    assert((try ParseNumber(i32).parse("0x1234_ABCD")) == 0x1234ABCD);
    const parseF32 = ParseNumber(f32).parse;
    assert((try parseF32("-1.0")) == -1.0);
}

winksaville on 22 Nov 2018

👍2

FWIW LuaJIT has an implementation that might be worth copying: https://github.com/LuaJIT/LuaJIT/blob/b025b01c5b9d23f6218c7d72b7aafa3f1ab1e08a/src/lj_strscan.c