Nim: questionable `csize` definition in `system.nim`

Created on 14 Sep 2019  路  16Comments  路  Source: nim-lang/Nim

I'm sure the story of csize is different from cstring.

Current definition

csize* {.importc: "size_t", nodecl.} = int
    ## This is the same as the type ``size_t`` in *C*.

what interesting is the comment claims csize has the same type as size_t in C.
in reality, when we take a look at C or CPP header file, especially stddef.h, they agree it should be an unsigned int.

CPP doc and GNU C doc also agree size_t should be unsigned int.

if csize is an int, it implies csize can be negative or cannot exceed high(int) which is questionable.

also if it is used in FFI, it can be a subtle bug, very difficult to track down.

Most helpful comment

We had this discussion internally already, but I repeat my points for the public here. I agree that size_t should have been a signed integer for C and C++. I also agree that C's way of using unsigned for Natural is broken. But in my opinion that is pretty much irrelevant here, because csize is not about that at all. It is only for wrapper code. And here it is important the csize is actually what size_t is in C, and not what you wish size_t should have been in C.

All 16 comments

I strongly agree here. csize should be uint. I remember I had a discussion about this with @Araq where I told him that we need to change this in order to be correct, but can't remember his arguments anymore. Maybe he can repeat them. I thought I had an issue about this open already, but aparently there isn't.

Sizes beyond 2GB on 32 bit machines never come up anyway and C's way of using unsigned for "Natural" is so broken that I don't want to replicate it.

We had this discussion internally already, but I repeat my points for the public here. I agree that size_t should have been a signed integer for C and C++. I also agree that C's way of using unsigned for Natural is broken. But in my opinion that is pretty much irrelevant here, because csize is not about that at all. It is only for wrapper code. And here it is important the csize is actually what size_t is in C, and not what you wish size_t should have been in C.

Cool but it is what it is and you ignored my actual argument which is "it simply doesn't come up in practice".

From my understanding of that csize definition, it would mean that:

  • In the generated C code, the type will be size_t.
  • In nimvm, it will be int.

If that's the case, then we should just change it to uint, since IIRC we have proper range checking for uint now, so that would help catching csize < 0 bugs.

This will be helpful in practice as ISO C defines size_t as:

Unsigned integer type of the result of the sizeof operator.

So having csize changed into a type that errors on values < 0 at compile time will help eliminate problems at runtime.

it simply doesn't come up in practice

What about memory mapped file? it can easily exceed 2GB. And actually I found this discrepancy when I thought I can use csize in place of SIZE_T used in a memory mapped proc.

Yes, you can use it in place of size_t because it is size_t, check the generated code for more information.

In your case, it's clear that size_t didn't fit the bill. You should use a bigger type like int64 instead.

Yes, you can use it in place of size_t because it is size_t, check the generated code for more information.

probably the codegen is correct, but when we are still in Nim side and not yet crossing to the C, the Nim compiler will treat it as an int.

In your case, it's clear that size_t didn't fit the bill. You should use a bigger type like int64 instead.

It's an OS proc, I cannot choose how it is implemented.

Yet you can use a different type in your wrapper.

but i agree with @jangko

What about memory mapped file? it can easily exceed 2GB.

Sorry, would break too much code for no gain. Rejected.

This is now properly fixed. So the "Won't fix" label can be removed.

Should we reattach won't fix?

proc foobar(x,y: csize): void =
  echo x, " < ", y, ": ", x < y

const
  a = csize(-1)
  b = csize(123)

foobar(a,b)
static:
  foobar(a,b)

the intent of csize is clearly that it's unsigned - the documentation says so straight out, so any code using it is already broken with regards to this intent - it's UB land.

re overloads, this is generally an something that the person wrapping the C codeneeds to sort out - if you think you have better ideas about the signedness of types than C, it's fair that there is an explicit conversion. Hiding this conversion inside the generated C code, where it will be swallowed by the many NI casts that nim outputs is worse: it's hidden from view and the side effects only come out in less tested corner cases - the C compiler will use one definition and the nim compiler another.

using int here also leads to unexpected behaviour with regards to wrapping and exceptions: csize, being unsigned, has wrapping behaviour - again, this makes csize depart from its documented behaviour because in nim, signed integers raise.

typically in C, size_t is used where you expect that you might need its full range - large memory buffers.

We now have system.csize_t and csize is deprecated.

Was this page helpful?
0 / 5 - 0 ratings