Nim: questionable `csize` definition in `system.nim`

Created on 14 Sep 2019 · 16Comments · Source: nim-lang/Nim

I'm sure the story of csize is different from cstring.

Current definition

csize* {.importc: "size_t", nodecl.} = int
    ## This is the same as the type ``size_t`` in *C*.

what interesting is the comment claims csize has the same type as size_t in C.
in reality, when we take a look at C or CPP header file, especially stddef.h, they agree it should be an unsigned int.

CPP doc and GNU C doc also agree size_t should be unsigned int.

if csize is an int, it implies csize can be negative or cannot exceed high(int) which is questionable.

also if it is used in FFI, it can be a subtle bug, very difficult to track down.

Source

jangko

👍4

Most helpful comment

We had this discussion internally already, but I repeat my points for the public here. I agree that size_t should have been a signed integer for C and C++. I also agree that C's way of using unsigned for Natural is broken. But in my opinion that is pretty much irrelevant here, because csize is not about that at all. It is only for wrapper code. And here it is important the csize is actually what size_t is in C, and not what you wish size_t should have been in C.

krux02 on 17 Sep 2019

👍3

All 16 comments

I strongly agree here. csize should be uint. I remember I had a discussion about this with @Araq where I told him that we need to change this in order to be correct, but can't remember his arguments anymore. Maybe he can repeat them. I thought I had an issue about this open already, but aparently there isn't.

krux02 on 14 Sep 2019

Sizes beyond 2GB on 32 bit machines never come up anyway and C's way of using unsigned for "Natural" is so broken that I don't want to replicate it.

Araq on 16 Sep 2019

krux02 on 17 Sep 2019

👍3

Cool but it is what it is and you ignored my actual argument which is "it simply doesn't come up in practice".

Araq on 17 Sep 2019

From my understanding of that csize definition, it would mean that:

In the generated C code, the type will be size_t.
In nimvm, it will be int.

If that's the case, then we should just change it to uint, since IIRC we have proper range checking for uint now, so that would help catching csize < 0 bugs.

This will be helpful in practice as ISO C defines size_t as:

Unsigned integer type of the result of the sizeof operator.

So having csize changed into a type that errors on values < 0 at compile time will help eliminate problems at runtime.

alaviss on 19 Sep 2019

it simply doesn't come up in practice

What about memory mapped file? it can easily exceed 2GB. And actually I found this discrepancy when I thought I can use csize in place of SIZE_T used in a memory mapped proc.

jangko on 24 Sep 2019

Yes, you can use it in place of size_t because it is size_t, check the generated code for more information.

In your case, it's clear that size_t didn't fit the bill. You should use a bigger type like int64 instead.

alaviss on 24 Sep 2019

😕1

Yes, you can use it in place of size_t because it is size_t, check the generated code for more information.

probably the codegen is correct, but when we are still in Nim side and not yet crossing to the C, the Nim compiler will treat it as an int.

In your case, it's clear that size_t didn't fit the bill. You should use a bigger type like int64 instead.

It's an OS proc, I cannot choose how it is implemented.

jangko on 24 Sep 2019

Yet you can use a different type in your wrapper.

Araq on 30 Sep 2019

but i agree with @jangko

What about memory mapped file? it can easily exceed 2GB.

alehander92 on 30 Sep 2019

Sorry, would break too much code for no gain. Rejected.

Araq on 1 Oct 2019

This is now properly fixed. So the "Won't fix" label can be removed.

Clyybber on 8 Oct 2019

Should we reattach won't fix?

krux02 on 18 Oct 2019

proc foobar(x,y: csize): void =
  echo x, " < ", y, ": ", x < y

const
  a = csize(-1)
  b = csize(123)

foobar(a,b)
static:
  foobar(a,b)

krux02 on 19 Oct 2019

the intent of csize is clearly that it's unsigned - the documentation says so straight out, so any code using it is already broken with regards to this intent - it's UB land.

re overloads, this is generally an something that the person wrapping the C codeneeds to sort out - if you think you have better ideas about the signedness of types than C, it's fair that there is an explicit conversion. Hiding this conversion inside the generated C code, where it will be swallowed by the many NI casts that nim outputs is worse: it's hidden from view and the side effects only come out in less tested corner cases - the C compiler will use one definition and the nim compiler another.

using int here also leads to unexpected behaviour with regards to wrapping and exceptions: csize, being unsigned, has wrapping behaviour - again, this makes csize depart from its documented behaviour because in nim, signed integers raise.

typically in C, size_t is used where you expect that you might need its full range - large memory buffers.