Rfcs: Pre-RFC: Add minimal uninitialized statics

Created on 13 Feb 2017 · 10 comments · Source: rust-lang/rfcs

I have been thinking about this problem for a while, so I wanted to get feedback for my proposed solution before submitting an actual RFC.

Thanks in advance :smile:

Summary

Currently, Rust has no way to represent non-mut statics that cannot be initialized by a constant expression, i.e. a static variable that is initialized once after the program starts running and subsequently remains constant. This forces people into annoying workarounds such as lazy_static or even raw pointers just to avoid lots of boilerplate everywhere. This proposal allows minimal runtime initialization of statics to address the problem.

Motivation

Imagine we want to use a static that represents the set of current system users. We just want to capture this set of users at the beginning of program execution and present a read-only copy as a global variable.

pub static USERS: &'static [User] = get_set_of_users();

However, we cannot do this unless get_set_of_users is const. What if get_set_of_users is not const, though?

Currently, you have a choice between several annoying options:

- Use lazy_static. This means that forevermore you are stuck dereferencing USERS to get the underlying value, which is kind of weird and non-intuitive. It can also lead to non-intuitive errors involving strange types and macro internals, and it may lead to heap allocation in some cases. Finally, it only works if the type is Sync, even though a static is read-only and thus thread-safe.
- Make it a static mut and initialize the value at runtime, or use interior mutability (which is what lazy_static does behind the scenes). This is unsafe and so not preferable. I actually had to resort to this in a recent project with lots of low-level code where the heap was unavailable for a while.
- Make it non-global and pass it as a parameter everywhere. This clutters method/function signatures.
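To make the second option concrete, here is a minimal sketch of the static mut workaround. The `User` struct, the body of `get_set_of_users()`, and the `init_users()`/`users()` helpers are all hypothetical stand-ins, not code from a real project. The slice is leaked with `Box::leak` so that it can legitimately be `&'static`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub name: String,
}

// Hypothetical stand-in for the non-const `get_set_of_users()`.
fn get_set_of_users() -> Vec<User> {
    vec![
        User { name: "root".to_string() },
        User { name: "alice".to_string() },
    ]
}

// A `static` needs a const initializer, so it starts out as an empty slice.
static mut USERS: &'static [User] = &[];

/// Safety: must be called before any other thread can read `USERS`,
/// and never concurrently with `users()`.
pub unsafe fn init_users() {
    // Leak the allocation so the slice lives for the rest of the program.
    let leaked: &'static [User] = Box::leak(get_set_of_users().into_boxed_slice());
    unsafe {
        USERS = leaked;
    }
}

/// Copies the slice reference out of the static.
pub fn users() -> &'static [User] {
    // Only sound if `init_users` is not running at the same time.
    unsafe { USERS }
}

fn main() {
    unsafe { init_users() };
    for user in users() {
        println!("{}", user.name);
    }
}
```

Nothing here stops a caller from reading `users()` before `init_users()` has run (they would silently get the empty slice), which is exactly the kind of mistake the proposal wants the compiler to catch.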

The most frustrating part of this conundrum is that it should not be necessary to do all of this just to create a constant value that I know will be initialized before it is used and will remain valid until termination.

This relates to the 2017 Roadmap in that it makes it easier to come to Rust from other languages where this is not a problem, particularly C/C++. It is a minimal change with potentially high impact in ergonomics.

Detailed Design

The idea is to allow minimal uninitialized static variables under the condition that the compiler can statically guarantee that they are initialized before use. This, I think, should be sufficient for most cases.

Meanwhile, it does not seem that it would require much change to the compiler. I am not really knowledgeable about static analysis, but I believe this can be done with some straightforward taint analysis. (Please correct me if this is not true.) I am frankly not knowledgeable enough about rustc's design to know at what phase of compilation this should happen, but it seems like it could happen after type and lifetime inference (see Unresolved Questions).
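For comparison, Rust already performs this kind of initialized-before-use check for local variables within a single function. The sketch below shows the existing behavior, using a hypothetical `load_user_names()` helper as the non-const computation. Extending the same check to statics across function boundaries is what the proposal would additionally require.

```rust
// Hypothetical helper standing in for any non-const computation.
fn load_user_names() -> Vec<String> {
    vec!["root".to_string(), "alice".to_string()]
}

fn first_user() -> String {
    // Declared without a value: the compiler tracks `users` as uninitialized.
    let users: Vec<String>;

    // Uncommenting the next line is rejected with error E0381, because
    // `users` would be read before it has been assigned:
    // println!("{:?}", users);

    // A single assignment initializes the binding; no `mut` is needed.
    users = load_user_names();

    // From here on `users` behaves like any other immutable binding.
    users[0].clone()
}

fn main() {
    println!("{}", first_user());
}
```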

Under this proposal the above example would be:

pub static USERS: &'static [User];

pub fn main() {
    // Can do some other initialization stuff here...
    // BUT no code can use `USERS` yet, since it is uninited. 
    // That would result in a compile error.

    ...

    // Initialize `USERS`
    USERS = get_set_of_users();

    // Can now use `USERS` like a normal `static`...

    ...
}

The compiler should not need to handle undecidable problems, so the following should result in compile errors:

pub static USERS: &'static [User];

pub fn main() {

    if some_condition {
        init_users();
    }

    // Ahhhh! Possibly uninited

    ...
}

pub static USERS: &'static [User];

pub fn main() {

    if some_condition {
        init_users();
    }

    if some_condition {
        init_users2(); // Ahhhh! Possibly double inited
    }

    // Ahhhh! Possibly uninited

    ...
}
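One way to read the rule behind these rejections is as a three-state check per static: uninitialized, initialized, or "maybe" after a conditional. The toy checker below is only a sketch under that reading, not a description of how rustc works. The `Stmt`, `State`, and `CheckError` types are invented for illustration, and it models assignments directly rather than calls such as `init_users()`, which would need inter-procedural analysis.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Uninit,
    Init,
    Maybe, // initialized on some paths only
}

enum Stmt {
    Init,          // models `USERS = ...;`
    Use,           // models any read of `USERS`
    If(Vec<Stmt>), // models `if cond { ... }` with no else branch
}

#[derive(Debug, PartialEq, Eq)]
enum CheckError {
    UseBeforeInit,
    PossibleDoubleInit,
}

// Walks the statements in order, tracking the state of the single static.
fn check(stmts: &[Stmt], mut state: State) -> Result<State, CheckError> {
    for stmt in stmts {
        state = match stmt {
            Stmt::Init => match state {
                State::Uninit => State::Init,
                _ => return Err(CheckError::PossibleDoubleInit),
            },
            Stmt::Use => match state {
                State::Init => State::Init,
                _ => return Err(CheckError::UseBeforeInit),
            },
            Stmt::If(body) => {
                let after_body = check(body, state)?;
                // Join the "branch taken" and "branch skipped" paths.
                if after_body == state { state } else { State::Maybe }
            }
        };
    }
    Ok(state)
}

fn main() {
    // Straight-line init followed by a use: accepted.
    println!("{:?}", check(&[Stmt::Init, Stmt::Use], State::Uninit));
    // Conditional init followed by a use: possibly uninitialized.
    println!("{:?}", check(&[Stmt::If(vec![Stmt::Init]), Stmt::Use], State::Uninit));
    // Two conditional inits: possibly double initialized.
    let twice = [Stmt::If(vec![Stmt::Init]), Stmt::If(vec![Stmt::Init])];
    println!("{:?}", check(&twice, State::Uninit));
}
```

The checker is deliberately conservative: it never tries to reason about whether the two conditions are related, which keeps it away from undecidable questions.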

I think otherwise, statics would function as they already do...

How We Teach This

The idea seems intuitive enough that it should only need a couple of simple examples like the ones above in the book or Rust by Example. I wouldn't even say that this is really a distinct feature from statics; rather, it is a useful way of using them.

Other than that, it should not need much additional teaching effort, I think. Nor do I think that much confusion could arise from this proposal's additions, as it seems pretty self-contained.

Drawbacks

- Maybe there is a more robust or general solution to uninitialized values? In that case, this proposal would clutter up the language.
- It requires some implementation effort, I suppose, though I did design it to be minimal -- a stepping stone to a fuller solution.

Alternatives

- Stick with what we have.
- If the mysterious "more robust or general solution" turns up, we should do that...

Unresolved Questions

I'm not 100% sure how this interacts with lifetime inference. For example, I would like for the lifetime of USERS to be 'static since it does remain alive for the entire execution of the program after initialization, while the compiler enforces that it is not used before that. However, I am not sure if this causes soundness issues.

T-lang

mark-i-m

All 10 comments

> The idea is to allow minimal uninitialized static variables under the condition that the compiler can statically guarantee that they are initialized before use.

So this can only be used in crates that define main and are compiled as executable? (Never in a library.)

SimonSapin on 13 Feb 2017

Won't most use cases become obsolete with better const evaluation?

oli-obk on 13 Feb 2017

@SimonSapin TBH, I am not sure how much metadata rlibs carry; i.e. are they like shared objects or DLLs, which just carry linker symbols?

Ideally, a library can define functions which can be used to initialize any statics and make sure those functions are called before the values are used.

@oli-obk

const evaluation would alleviate the pain in a lot of places, but it cannot help if there is a need to do computation before initializing a static. This comes in handy in two places:

- If initializing your static requires a heap allocation. In this case, the heap needs to be initialized first.
- If you need to do some non-const evaluation, as is the case with my example above.

I have run into both cases in the past, and IIUC, const evaluation would not help.
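As a small illustration of where that line falls, the sketch below contrasts a `const fn` initializer with a runtime-only one. The names `default_limit`, `limit_from_env`, and the `LIMIT` environment variable are made up for the example.

```rust
// A `const fn` can be evaluated at compile time, so it may initialize a `static`.
const fn default_limit() -> usize {
    8 * 1024
}

static LIMIT: usize = default_limit();

// `Vec::new` is a `const fn` that does not allocate, so this is accepted too.
static EMPTY: Vec<u32> = Vec::new();

// Hypothetical runtime-only computation: it reads the process environment.
fn limit_from_env() -> usize {
    std::env::var("LIMIT")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(LIMIT)
}

// Rejected with error E0015, because `limit_from_env` is not a `const fn`:
// static RUNTIME_LIMIT: usize = limit_from_env();

fn main() {
    println!("{} {} {}", LIMIT, EMPTY.len(), limit_from_env());
}
```

No amount of compile-time evaluation can make `limit_from_env` usable in a static initializer, since its result depends on the environment the program is started in.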

mark-i-m on 13 Feb 2017

> Ideally, a library can define functions which can be used to initialize any statics and make sure those functions are called before the values are used.

This sounds like adding a whole other "dimension" to functions (and methods?) that affects what can be called where, in addition to const and unsafe. I don’t know if this feature is worth that complexity.

SimonSapin on 13 Feb 2017

Correct me if I'm wrong, but the improvement proposed here would only apply to variables which:

- definitely need to be global, despite all of the usual reasons why globals are bad and accepting the clutter of "pass it as a parameter everywhere" is generally better
- cannot be evaluated at compile-time, even after the CTFE enhancements that miri integration is expected to bring
- can be theoretically proven by the compiler (or cargo???) to be initialized prior to all use sites
- are "popsicle immutable", i.e. computed once and then "frozen" never to be mutated again
- and either:
  - should be computed at the beginning of the program, rather than lazily evaluated at first use
  - is okay to lazily evaluate, but explicit dereferencing at every use site is too unergonomic

Maybe this is just me, but that _seems_ like a very narrow set of use cases, especially for a feature that relies on some very non-local reasoning. Could this be better addressed by making lazy_static more ergonomic somehow?

> If initializing your static requires a heap allocation. In this case, heap needs to be initialized first.

Minor point, but the previews of miri I've seen stated it will be able to simulate heap allocations and unsafe code as part of compile-time evaluation, meaning most operations on standard containers will be available in CTFE when that gets integrated. Of course that doesn't help if your static object really needs a pointer into the heap for some reason, but it still covers a lot of things that "require a heap allocation".

Ixrec on 13 Feb 2017

> Maybe this is just me, but that seems like a very narrow set of use cases, especially for a feature that relies on some very non-local reasoning.

@Ixrec The proposal is intentionally small in scope, as I wished to minimize the implementation effort required. That said, you raise a good point about non-local reasoning (I believe that this is what @SimonSapin is referring to also). To be honest, I don’t know enough about rustc’s design to know how much actual implementation effort is required. Intuitively, it doesn’t seem like the non-local reasoning in this case is hard. Starting at the entry point, you just check for any uses of a static before the first assignment to that static.

> Could this be better addressed by making lazy_static more ergonomic somehow?

Possibly, but lazy_static still feels like a bit of a hack for something that the language ought to provide, IMHO.
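For readers on newer toolchains: the standard library has since gained `std::sync::OnceLock` (stable since Rust 1.70), which covers the "initialize once at startup, read everywhere" pattern without a macro. The sketch below assumes a hypothetical `User` struct and a stand-in `get_set_of_users()`. Note that it differs from the proposal in two ways: a read before initialization is a runtime panic rather than a compile error, and the stored type must still be `Sync`.

```rust
use std::sync::OnceLock;

#[derive(Debug)]
pub struct User {
    pub name: String,
}

// Starts empty; `OnceLock::new` is a `const fn`, so this is a valid initializer.
static USERS: OnceLock<Vec<User>> = OnceLock::new();

// Hypothetical stand-in for the non-const `get_set_of_users()`.
fn get_set_of_users() -> Vec<User> {
    vec![
        User { name: "root".to_string() },
        User { name: "alice".to_string() },
    ]
}

/// Initializes `USERS`. Returns `Err` with the rejected value if it was
/// already initialized, so double initialization is detected at runtime.
pub fn init_users() -> Result<(), Vec<User>> {
    USERS.set(get_set_of_users())
}

/// Read-only access; panics if called before `init_users()`.
pub fn users() -> &'static [User] {
    USERS
        .get()
        .expect("USERS read before init_users()")
        .as_slice()
}

fn main() {
    init_users().expect("USERS initialized twice");
    for user in users() {
        println!("{}", user.name);
    }
}
```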

> definitely need to be global, despite all of the usual reasons why globals are bad and accepting the clutter of "pass it as a parameter everywhere" is generally better

If we want Rust to be friendly for systems and low-level code, then good static support is a must. I agree that in general global state is not preferred, but in low-level code, having a variable in the static section of the binary can be really handy.

Also, most of the arguments I have heard against global state are really more applicable for shared mutable state than shared immutable state. Notice that I could have made this proposal applicable for static muts, but I don't think those should be easier to use.

> cannot be evaluated at compile-time, even after the CTFE enhancements that miri integration is expected to bring

> are "popsicle immutable", i.e. computed once and then "frozen" never to be mutated again

This may just be me, but my experience is that these are actually more useful than you seem to imply, especially in low-level programs and for data structures with interior mutability.

> Minor point, but the previews of miri I've seen stated it will be able to simulate heap allocations and unsafe code as part of compile-time evaluation, meaning most operations on standard containers will be available in CTFE when that gets integrated. Of course that doesn't help if your static object really needs a pointer into the heap for some reason, but it still covers a lot of things that "require a heap allocation".

That's pretty cool! Still, I would argue that especially in low-level code, you would want to have fine control over exactly what memory operations happen and when.

mark-i-m on 14 Feb 2017

I think we should look at how much miri and other const eval can do before pursuing this further. It's sad that miri blocks so many things, but right now it still has a chance of being integrated.