Rust: Need to stress that byte-string literals are not null terminated

Created on 25 Jan 2017 · 9 Comments · Source: rust-lang/rust

I noticed in another project that someone had ported some C code, and I think they assumed that the best analogue for a C string literal was a Rust ASCII byte string literal.

The problem with this assumption is that C string literals are implicitly null-terminated. In C, "Hello" represents a sequence of six bytes: ['H', 'e', 'l', 'l', 'o', '\0']. But in Rust, b"Hello" represents just the five bytes ['H', 'e', 'l', 'l', 'o'].
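As a minimal sketch of the difference described above, the following compares the two literals directly. The helper name `literal_lengths` is made up for illustration; only the literal types and `len` are standard Rust.

```rust
/// Hypothetical helper: returns the lengths of `b"Hello"` and `b"Hello\0"`.
fn literal_lengths() -> (usize, usize) {
    // The literal's type records its exact length; no hidden terminator is counted.
    let plain: &[u8; 5] = b"Hello";
    // The terminator only exists if it is written out explicitly.
    let terminated: &[u8; 6] = b"Hello\0";
    (plain.len(), terminated.len())
}

fn main() {
    let (plain, terminated) = literal_lengths();
    assert_eq!(plain, 5);
    assert_eq!(terminated, 6);
    // The last byte of the plain literal is 'o', not a null byte.
    assert_eq!(b"Hello".last(), Some(&b'o'));
    assert_eq!(b"Hello\0".last(), Some(&0u8));
}
```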

And that representation runs deep: if you cast the byte string reference to an unsafe pointer and use that to read one byte past the 'o', chances are you will not see a '\0'. (At least, that is my observation.)

It is too easy for a new user to make the mistake of equating these two things. We could add usage notes pointing this out, and advising people to add the '\0' when transcribing C string literals to Rust byte strings.
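One way to follow that advice, sketched here under the assumption that the bytes are headed for a C API expecting a null-terminated string, is to write the '\0' explicitly and validate it with `std::ffi::CStr::from_bytes_with_nul`. The wrapper `to_c_str` is a hypothetical name, not something from the issue.

```rust
use std::ffi::CStr;

/// Hypothetical wrapper: accepts the bytes only if they end in exactly one
/// trailing null byte and contain no interior nulls.
fn to_c_str(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}

fn main() {
    // A naive transcription of the C literal "Hello" is rejected.
    assert!(to_c_str(b"Hello").is_none());

    // With the explicit terminator, the bytes form a valid C string.
    let c = to_c_str(b"Hello\0").expect("literal ends in a null byte");
    assert_eq!(c.to_bytes(), &b"Hello"[..]);

    // This pointer is what would be handed to a C function.
    let _ptr = c.as_ptr();
}
```

The check runs at run time, so a missing terminator shows up as an error value rather than as a read past the end of the literal.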

P-medium T-doc

All 9 comments

(Originally I was going to suggest an alternative "fix": changing the representation so that we guarantee that a null byte terminates every byte string literal. But Simon Sapin pointed out to me that the compiler assigns a specific fixed-size array type to each byte string literal, so adding that null byte would require increasing the length by one, which would be observable in existing type signatures and thus a breaking change...)
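To illustrate why the length is observable, here is a small sketch. The function `takes_five` is hypothetical; it stands in for any existing signature that names the array length.

```rust
/// Hypothetical function whose signature depends on the literal's exact length.
fn takes_five(bytes: &[u8; 5]) -> usize {
    bytes.len()
}

fn main() {
    // b"Hello" has type &[u8; 5], so this call type-checks today.
    assert_eq!(takes_five(b"Hello"), 5);
    assert_eq!(std::mem::size_of_val(b"Hello"), 5);

    // If the literal's type became &[u8; 6], the call above would stop
    // compiling, just as this one does now:
    // takes_five(b"Hello\0"); // error: expected `[u8; 5]`, found `[u8; 6]`
}
```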

would require increasing the length by one

No, it wouldn't. We can still translate strings with a trailing null. The problem begins when you start slicing stuff. Pretty much every slice/str operation will not preserve the trailing null, and therefore it only serves to make the bug in user code more invisible.
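The slicing point can be sketched as follows. Here the terminator is written explicitly rather than emitted secretly by the compiler, which is an assumption made so the example runs today; `ends_with_nul` is a hypothetical helper.

```rust
/// Hypothetical helper: true if the last byte of the slice is a null byte.
fn ends_with_nul(bytes: &[u8]) -> bool {
    bytes.last() == Some(&0u8)
}

fn main() {
    let full: &[u8] = b"Hello\0";
    // A sub-slice shares the backing storage but stops wherever it was cut.
    let prefix = &full[..3];

    assert!(ends_with_nul(full));
    // The prefix b"Hel" ends in 'l'; nothing terminates it.
    assert!(!ends_with_nul(prefix));
    assert_eq!(prefix, &b"Hel"[..]);
}
```

Passing `prefix.as_ptr()` to a C function that scans for a terminator would read on into "lo\0", which is the kind of silently wrong behaviour the comment warns about.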

@nagisa even stuff like Box::new(b"foo") wouldn't copy this secret null terminator. Maybe we can have a clippy lint that heuristically looks for literals being passed across FFI boundaries. Or even some kind of "FFI sanitizer" (ffigrind?) that wraps all function calls and checks stuff.

@nagisa wrote:

We can still translate strings with trailing null.

I'm trying to understand this suggestion; is the idea that the type of b"Hello" would still be &[u8; 5], but since it's generated into the static data area, we would emit 6 bytes into the static area (and include the trailing null at the end of it)?

I'm personally not actually all that worried about people who are slicing things; at that point, you've gotten deep enough into the language that hopefully you've acquired an appreciation for how slices are represented to share backing storage.

My main concern was people who are naively transcribing C code. Slicing isn't an issue there, right?

We can still translate strings with trailing null.

Or we can translate them with trailing b" your string literal is buggy\0".

I'm trying to understand this suggestion; is the idea that the type of b"Hello" would still be &[u8; 5], but since it's generated into the static data area, we would emit 6 bytes into the static area (and include the trailing null at the end of it)?

That's the idea, but I'm strongly opposed to it. It hides bugs in the code. I would much rather have the code crash so I can fix it immediately than make the same mistake over nine thousand times all over the place before I realise it.

We certainly could make the note more prominent than what's written in the book.

We certainly could make the note more prominent than what's written in the book.

More prominent and less vague. In particular, I read that note as referring solely to (character) strings, not necessarily including byte strings.

Docs team triage: p-medium. Not totally sure where exactly this should go, but seems good.

I think I'm going to close this as WONTFIX. Here's why:

  1. We're not tracking issues for the old book. A PR would be welcome though!
  2. The new book no longer has an FFI chapter.
  3. Eventually, we should have some sort of comprehensive FFI documentation, and hopefully this would be discussed there.
  4. I'm not sure where this should go; in theory it could go in std::ffi maybe, but are people going to read those module docs? It's not like they're a guide to FFI generally.

If anyone wants to send in a PR for 1, please do so!
