Nim: Proposal: endianness-aware I/O in stdlib streams

Created on 2 Dec 2019 · 11 Comments · Source: nim-lang/Nim

I've been working on an endianness-aware buffered file I/O module for quite a while now and have just realised that 1) I could actually use streams instead, 2) this functionality is generic enough to be added to the streams standard module.

Not sure what the general view is on extending the standard library at this point, so I thought I'd ask here first to avoid wasting my time.

The changes I'm proposing would not alter any existing behaviour; I'd just add the following functions to support endianness-aware I/O. The BE and LE postfixes indicate the endianness of the data in the stream. For example, readInt16BE would read 2 bytes from the stream and swap them on a little-endian platform, but perform no conversion on a big-endian platform.

The single-value read & write functions would look like this:

proc readInt16BE(s: Stream): int16
proc readInt16LE(s: Stream): int16
...
proc readFloat64BE(s: Stream): float64
proc readFloat64LE(s: Stream): float64

proc peekInt16BE(s: Stream): int16
proc peekInt16LE(s: Stream): int16
...
proc peekFloat64BE(s: Stream): float64
proc peekFloat64LE(s: Stream): float64

proc writeInt16BE(s: Stream, x: int16)
proc writeInt16LE(s: Stream, x: int16)
...
proc writeFloat64BE(s: Stream, x: float64)
proc writeFloat64LE(s: Stream, x: float64)
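
For illustration, here's a minimal sketch of how the single-value BE variants might be implemented on top of the existing streams API. This is only a hypothetical implementation using std/endians; the proposal doesn't prescribe one:

import streams, endians

proc readInt16BE(s: Stream): int16 =
  var raw = s.readInt16()             # bytes exactly as stored in the stream
  bigEndian16(addr result, addr raw)  # big endian -> host order (no-op on big-endian hosts)

proc writeInt16BE(s: Stream, x: int16) =
  var v = x
  var be: int16
  bigEndian16(addr be, addr v)        # host order -> big endian (no-op on big-endian hosts)
  s.write(be)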

And here's the interface for the chunked versions (LE variants omitted for brevity):

proc writeDataBE*(s: Stream, buffer: openArray[int16|uint16], numItems: Natural)
proc writeDataBE*(s: Stream, buffer: openArray[int32|uint32|float32], numItems: Natural)
proc writeDataBE*(s: Stream, buffer: openArray[int64|uint64|float64], numItems: Natural)

proc writeData16BE*(s: Stream, buffer: pointer, bufLen: Natural)
proc writeData32BE*(s: Stream, buffer: pointer, bufLen: Natural)
proc writeData64BE*(s: Stream, buffer: pointer, bufLen: Natural)

proc readData16BE*(s: Stream, buffer: var openArray[int16|uint16], numItems: Natural)
proc readData32BE*(s: Stream, buffer: var openArray[int32|uint32|float32], numItems: Natural)
proc readData64BE*(s: Stream, buffer: var openArray[int64|uint64|float64], numItems: Natural)

proc readData16BE*(s: Stream, buffer: pointer, bufLen: Natural)
proc readData32BE*(s: Stream, buffer: pointer, bufLen: Natural)
proc readData64BE*(s: Stream, buffer: pointer, bufLen: Natural)
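
And a rough sketch of one chunked variant (again hypothetical, using std/endians), swapping each element into a temporary buffer and writing it out in a single call:

import streams, endians

proc writeDataBE(s: Stream, buffer: openArray[int16], numItems: Natural) =
  if numItems == 0: return
  var tmp = newSeq[int16](numItems)
  for i in 0..<numItems:
    bigEndian16(addr tmp[i], unsafeAddr buffer[i])  # host order -> big endian (no-op on big-endian hosts)
  s.writeData(addr tmp[0], numItems * sizeof(int16))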

This should cover all use cases, and it introduces no breaking changes because
we're only adding functionality.

Let me know what you think. I'm happy to implement this in streams, but
if you don't like the idea for some reason, I'll just carry on with my own
library :)

Stdlib

Most helpful comment

the way to go about this would be to develop said library, battle-test it, _then_ suggest it for stdlib inclusion, if it really turns out to be generally useful

All 11 comments

Love the idea, not sure about the function naming scheme :)

@dom96 Cheers! Apart from the naming, do you have any comments about the proposed API? Names like readUint64AsLittleEndian would perhaps be more descriptive, but they're pretty tedious to use and just way too long. Any better ideas? I thought the BE/LE postfixes were pretty standard in such libraries.

What we could also do is add an Endianness parameter; then people like me could alias the values to BE and LE :)
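
For example, something along these lines (a hypothetical signature reusing system.Endianness and std/endians, not part of the proposal as written):

import streams, endians

proc readInt16(s: Stream, endian: Endianness): int16 =
  var raw = s.readInt16()                # bytes exactly as stored in the stream
  if endian == cpuEndian:
    result = raw                         # stream order matches host order
  else:
    swapEndian16(addr result, addr raw)  # swap into host order

# usage: the caller picks the byte order at the call site
#   let v = fs.readInt16(bigEndian)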

BE/LE is standard; we use it all over the place in the Status codebase as well.

We use the names fromBytes and fromBytesBE with a typedesc parameter:

https://github.com/status-im/nim-stew/blob/1c4293b3e754b5ea68a188b60b192801162cd44e/stew/endians2.nim

func fromBytes*(
    T: typedesc[SomeEndianInt],
    x: array[sizeof(T), byte],
    endian: Endianness = system.cpuEndian): T {.inline.} =
  ## Convert a byte sequence to a native endian integer. By default, native
  ## endianness is used which is not portable!
  for i in 0..<sizeof(result): # No copymem in vm
    result = result or T(x[i]) shl (i * 8)

  if endian != system.cpuEndian:
    result = swapBytes(result)

func fromBytes*(
    T: typedesc[SomeEndianInt],
    x: openArray[byte],
    endian: Endianness = system.cpuEndian): T {.inline.} =
  ## Read bytes and convert to an integer according to the given endianness. At
  ## runtime, x must contain at least sizeof(T) bytes. By default, native
  ## endianness is used which is not portable!
  ##
  ## REVIEW COMMENT (zah)
  ## This API is very strange. Why can't I pass an open array of 3 bytes
  ## to be interpreted as a LE number? Also, why is `endian` left as a
  ## run-time parameter (with such short functions, it could easily be static).

  const ts = sizeof(T) # Nim bug: can't use sizeof directly
  var tmp: array[ts, byte]
  for i in 0..<tmp.len: # Loop since vm can't copymem
    tmp[i] = x[i]
  fromBytes(T, tmp, endian)

func fromBytesBE*(
    T: typedesc[SomeEndianInt],
    x: array[sizeof(T), byte]): T {.inline.} =
  ## Read big endian bytes and convert to a native endian integer.
  fromBytes(T, x, bigEndian)

func fromBytesBE*(
    T: typedesc[SomeEndianInt],
    x: openArray[byte]): T {.inline.} =
  ## Read big endian bytes and convert to a native endian integer. At runtime,
  ## x must contain at least sizeof(T) bytes.
  fromBytes(T, x, bigEndian)
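
A short usage sketch for the API quoted above (assumes the nim-stew package is available):

import stew/endians2

let x = uint32.fromBytesBE([byte 0xDE, 0xAD, 0xBE, 0xEF])
doAssert x == 0xDEADBEEF'u32  # the four bytes are interpreted as big endian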

I think we need a way to evolve the standard library core APIs like streams or endians.

@mratsim Thanks for that, it gave me some good ideas! I've decided to go with the generic typedesc approach instead. Hope you won't mind that I'll borrow a few ideas from the snippet above :)

I think these are welcome additions to the stdlib if introduced in a new module. I'm not a fan of bloating the existing modules any further -- at least until IC reaches production quality, or even alpha status.

@Araq Sure, so you're thinking of introducing streams2 with just the extra functions, and possibly merging it with streams and/or deprecating the non-endian-aware functions later, perhaps? Sorry, I don't know what "IC" refers to in this context.

EDIT: I think I figured it out... Incremental Compiler :)

I would name it std / binstreams.

Ok cool, that's a good name!

Be sure to check out our own faststreams, which uses memory mapping for zero-overhead streaming: https://github.com/status-im/nim-faststreams.

Though it might be a tad too big (600 lines plus a couple of ptr arithmetic helpers).

@mratsim Thanks for that!

the way to go about this would be to develop said library, battle-test it, _then_ suggest it for stdlib inclusion, if it really turns out to be generally useful
