Rust: Suggestion: CStr literals

Created on 11 Aug 2020 · 15 Comments · Source: rust-lang/rust

Why?

Currently, creating a CStr, even from a bytestring literal, is quite noisy.

// NOTE: Don't forget to add \0 at the end or this is unsound!
let cstr = unsafe { CStr::from_bytes_with_nul_unchecked(b"Hello, World\0") };

Furthermore, there's no way to ensure the well-formedness of that literal at compile time¹, despite hardcoded C-strings being fairly common when creating bindings for C libraries.
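
For comparison, here is a minimal sketch of the checked constructor that std already offers, CStr::from_bytes_with_nul. It rejects a missing or interior nul byte, but only when the code runs, which is the gap described above. The helper name checked_cstr is hypothetical and used for illustration only.

```rust
use std::ffi::CStr;

/// Hypothetical helper: wraps the checked std constructor.
/// Validation happens at run time, not at compile time.
fn checked_cstr(bytes: &[u8]) -> Option<&CStr> {
    // Fails if the slice lacks a trailing nul or contains an interior one.
    CStr::from_bytes_with_nul(bytes).ok()
}
```

Calling checked_cstr(b"Hello, World\0") yields Some, while dropping the trailing \0 or adding one in the middle yields None.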

How?

To address this, I would like to propose a new string literal type:

let cstr = c"Hello, World!";

It would function nearly identically to byte-string literals, with the following key differences:

1) Its type is &CStr, not &[u8]
2) It may not contain any nul bytes (\0, \x00 as well as a 'physical' nul byte are forbidden)
3) The compiler automatically adds a nul byte at the end of the string.

See also "Alternatives?" below.
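
The three rules above can be approximated at run time with CString::new, which behaves much like rules 2 and 3. This is only a sketch of the intended semantics, not the proposed literal itself; the function name c_literal is hypothetical.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical run-time stand-in for the proposed `c"..."` literal.
fn c_literal(contents: &[u8]) -> Result<CString, NulError> {
    // Rule 2: `CString::new` returns an error if `contents` holds any nul byte.
    // Rule 3: on success it appends the terminating nul itself.
    CString::new(contents)
}
```

Rule 1 corresponds to calling as_c_str() on the result, which borrows it as a &CStr. The difference from the proposal is that a violation here surfaces as an Err at run time rather than as a compile error.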

Pros?

  • Fills a niche in the language people are forced to build around²
  • Allows creation of const CStr items (currently not possible on stable)

Cons?

  • Have to commit to adding a new literal (sub)type to the language, which may require small tweaks to the parser

Alternatives?

1) Make associated functions on CStr const. This would still leave the burden of checking on the user.
2) Add a cstr! macro taking a byte-string literal and applying the needed checks and transformations.

¹ It is possible using proc-macros, though this poses different issues regarding ergonomics and stability.
² GitHub code search for CStr::from_bytes_with_nul_unchecked

A-ffi C-feature-request T-lang

All 15 comments

Alternative: Make CStr associated methods const. That is a library-only change rather than a language change.

Another alternative: Provide a cstr! macro that turns a regular string literal into a &'static CStr (and checks that it doesn't contain null bytes)

> Alternative: Make CStr associated methods const. That is a library-only change rather than a language change.

This does not resolve the ergonomics / potential unsoundness issue.

> Another alternative: Provide a cstr! macro that turns a regular string literal into a &'static CStr (and checks that it doesn't contain null bytes)

A macro would work (and is in fact probably better as CStr is specific to std and not a built-in type), but would have to take a byte string, as C-strings need not conform to any encoding, including str's UTF-8.

It could take either; it just has to check that there are no 0 bytes contained within.

Yeah this can just be a proc macro in a crate that takes a string literal or byte string literal and then produces a &[u8].

I suppose so. I still think it would make sense to include it in std/rustc.

I'm not familiar with writing proc-macros, unfortunately

Probably something _similar to_ this https://github.com/Lokathor/utf16_lit/blob/main/src/lib.rs#L62, but instead of recoding the bytes from utf8 into utf16, you'd just spit them out directly as u8 values.

Combining the suggestions, cstr! could soon just expand to:

const { CStr::from_bytes_with_nul($input).unwrap() }

no proc-macros necessary.
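
A minimal sketch of that idea, assuming a toolchain on which CStr::from_bytes_with_nul is a const fn and panicking is allowed in constants (to my knowledge Rust 1.72 or later). It uses a named const item and a match instead of inline const and .unwrap(), because calling Result::unwrap in a const context is not something this sketch relies on. The macro and constant names are hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical `cstr!` sketch: appends the nul and validates inside a `const` item.
macro_rules! cstr {
    ($s:literal) => {{
        const CHECKED: &::std::ffi::CStr =
            match ::std::ffi::CStr::from_bytes_with_nul(concat!($s, "\0").as_bytes()) {
                Ok(c) => c,
                // Reached during constant evaluation, so it becomes a compile error.
                Err(_) => panic!("cstr! literal contains an interior nul byte"),
            };
        CHECKED
    }};
}

/// Example constant built with the sketch above.
const GREETING: &CStr = cstr!("Hello, World!");
```

With this shape, cstr!("a\0b") should be rejected during compilation rather than at run time. It only accepts string literals, since concat! is the part that appends the terminator.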

Is inline const implemented?

No, but it is RFC-accepted and doesn't seem terribly hard to implement (he says as a non compiler developer).

IIRC, concat! cannot currently work with bytestring values, so that is one limit to the macro_rules version.

Still, regardless of the macro details, I think we can all agree that this ability, while nice, doesn't need to exist directly within the compiler or standard library. It can be done as a standard user crate.

So people should make the helper macro they want as a crates.io crate, and then worry about it maybe being moved into the standard library at a later date.

The const features required to make that work are not near stabilization, so only std could feasibly implement it that way in the near future. It would be possible to do without inline const, but even making from_bytes_with_nul a stable const fn seems quite far away, given that it needs reference transmutes, and I'm unsure of the const-panic status.

Well, I should clarify: you can get a const &[u8] using various macro setups. Which is basically good enough.
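
One possible shape of such a setup, sketched without proc-macros: a const fn checks the bytes, and a const assertion runs that check during compilation. It assumes a toolchain where assert! is usable in constants. The names is_c_string and GREETING_BYTES are hypothetical.

```rust
/// Returns true if `bytes` ends in a nul and has no nul before that.
const fn is_c_string(bytes: &[u8]) -> bool {
    if bytes.is_empty() || bytes[bytes.len() - 1] != 0 {
        return false;
    }
    // `while` is used because `for` loops are not available in const fn.
    let mut i = 0;
    while i < bytes.len() - 1 {
        if bytes[i] == 0 {
            return false;
        }
        i += 1;
    }
    true
}

/// Hypothetical constant; the assertion below is evaluated at compile time.
const GREETING_BYTES: &[u8] = b"Hello, World!\0";
const _: () = assert!(is_c_string(GREETING_BYTES));
```

The result is still a plain &[u8], so callers pass GREETING_BYTES.as_ptr() to C and cast the pointer type themselves.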

I sure don't think you'd get a new language construct into Stable faster than a proc-macro that spits out &[u8] values could go up on crates.io.

Haha, perhaps I should've looked on crates.io before opening this :D

I think I will close this, since there doesn't seem to be any particular interest in introducing something like this to core or std.

Note that you can't directly pass those to C, as &CStr is a fat pointer. I sometimes do

macro_rules! lit_cstr {
    // Appends a trailing NUL to the literal and yields a *const c_char.
    ($s:literal) => {
        (concat!($s, "\0").as_bytes().as_ptr() as *const ::libc::c_char)
    };
}

which produces a *const c_char that I can use directly in calls to C functions like

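// Must be called inside an unsafe block.
let errc = libc::sysctlbyname(
    lit_cstr!("hw.cpufrequency"),
    // Cast to the untyped output buffer pointer the function expects.
    &mut out as *mut _ as *mut ::libc::c_void,
    &mut msize,
    null_mut(),
    0,
);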

and the like. Of course, this doesn't defend against an interior NUL, but it also takes a literal (reducing the likelihood), and is still safe if it happens since it would just see the end of the str sooner.
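
To illustrate the interior NUL point, here is a small sketch that reads back what a C function would see through the pointer. It redefines lit_cstr! with std's c_char so that it does not need the libc crate; the helper bytes_seen_by_c is hypothetical.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Same idea as `lit_cstr!`, using std's `c_char` instead of the libc crate.
macro_rules! lit_cstr {
    ($s:literal) => {
        (concat!($s, "\0").as_ptr() as *const ::std::os::raw::c_char)
    };
}

/// Hypothetical helper: copies the bytes a C function would read through `ptr`.
/// Safety: `ptr` must point to a nul-terminated buffer that is valid for the call.
unsafe fn bytes_seen_by_c(ptr: *const c_char) -> Vec<u8> {
    // `from_ptr` scans up to the first nul, exactly as C string functions do.
    unsafe { CStr::from_ptr(ptr) }.to_bytes().to_vec()
}
```

Passing lit_cstr!("ab\0cd") to bytes_seen_by_c gives back only the bytes of "ab": the string is cut short, but nothing is read past the end of the literal.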

IMO CStr's primary value is when dealing with:

  1. non-literal Rust strings that either need checking, or are non-'static and thus need to carry the lifetime of whatever they're derived from, etc.
  2. converting from *const c_char to Rust strs using std::ffi::CStr::from_ptr and such.
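
The second case can be sketched as follows. The helper name c_to_string is hypothetical, and the lossy conversion is just one choice; CStr::to_str would instead return an error on invalid UTF-8.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: copies a C string handed over by foreign code into a `String`.
/// Safety: `ptr` must be non-null and point to a nul-terminated buffer.
unsafe fn c_to_string(ptr: *const c_char) -> String {
    let cstr: &CStr = unsafe { CStr::from_ptr(ptr) };
    // C strings carry no encoding guarantee, so invalid UTF-8 is replaced with U+FFFD.
    cstr.to_string_lossy().into_owned()
}
```

Copying into an owned String sidesteps the lifetime question, since the borrowed &CStr is only valid for as long as the C side keeps the buffer alive.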

But for literals I've been using lit_cstr-alike macros for a bit now and haven't hit a downside. (Okay, sometimes I end up adding a bit more type safety to the pointers that come out of the macro, but nothing crazy).
