I have been thinking about this problem for a while, so I wanted to get feedback on my proposed solution before submitting an actual RFC.
Thanks in advance :smile:
Currently, Rust has no way to represent non-mut statics that cannot be initialized const-ly; i.e. a static variable that is initialized once after the program starts running and subsequently remains constant. This forces people into using annoying solutions such as lazy_static or even raw pointers to avoid lots of boilerplate everywhere. The proposed fix allows minimal runtime initialization of statics to fix this problem.
Imagine we want to use a static that represents the set of current system users. We just want to capture this set of users at the beginning of program execution and present a read-only copy as a global variable.
pub static USERS: &'static [User] = get_set_of_users();
However, we cannot do this unless get_set_of_users is const. What if get_set_of_users is not const, though?
Currently, you have a choice between several annoying options:
Use lazy_static. This means that forevermore you are stuck dereferencing USERS to get the underlying value, which is kind of weird and non-intuitive. It also can lead to non-intuitive errors with strange types and macro stuff. It also may lead to heap allocation in some cases. Finally, it only works if the type is Sync, even though a static is read-only and thus thread-safe.
Make it a static mut and initialize the value at runtime, or use interior mutability (which is what lazy_static does behind the scenes). This is unsafe and so not preferable. I actually had to resort to this in a recent project with lots of low-level code where the heap was unavailable for a while.
Make it non-global and pass it as a parameter everywhere. This clutters method/function signatures.
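To make the second option concrete, here is a minimal sketch of the static mut workaround. `User`, `get_set_of_users`, `init_users`, and `users` are hypothetical stand-ins (the post does not define them), and the safety contract in the comments is an assumption about how such code would have to be used, not something the compiler checks.

```rust
// Hypothetical stand-in for the `User` type mentioned in the post.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub name: String,
}

// Hypothetical non-const initializer: it allocates, and a real version
// would query the operating system.
fn get_set_of_users() -> Vec<User> {
    vec![
        User { name: "root".to_string() },
        User { name: "alice".to_string() },
    ]
}

// A static needs a const initializer, so it starts as an empty slice.
static mut USERS: &'static [User] = &[];

/// Safety (assumed contract): call exactly once, before any other thread
/// exists and before any call to `users()`.
unsafe fn init_users() {
    // Leak the allocation so the slice is valid for `'static`.
    let leaked: &'static [User] = Box::leak(get_set_of_users().into_boxed_slice());
    unsafe {
        USERS = leaked;
    }
}

/// Safety (assumed contract): `init_users()` has finished and nothing
/// writes to `USERS` afterwards.
unsafe fn users() -> &'static [User] {
    // Copies the slice reference out of the static.
    unsafe { USERS }
}

fn main() {
    unsafe { init_users() };
    let users = unsafe { users() };
    println!("{} users, first = {}", users.len(), users[0].name);
}
```

Every access goes through `unsafe`, and nothing stops a caller from reading `USERS` before `init_users()` runs; in that case it silently sees the empty placeholder slice.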
The most frustrating part of this conundrum is that it should not be necessary to do all of this just to create a constant value that I know will be initialized before it is used and will remain valid until termination.
This relates to the 2017 Roadmap in that it makes it easier to come to Rust from other languages where this is not a problem, particularly C/C++. It is a minimal change with potentially high impact in ergonomics.
The idea is to allow minimal uninitialized static variables under the condition that the compiler can statically guarantee that they are initialized before use. This, I think, should be sufficient for most cases.
Meanwhile, it does not seem that it would require much change to the compiler. I am not really knowledgeable about static analysis, but I believe this can be done with some straightforward taint analysis. (Please correct me if this is not true.) I am frankly not knowledgeable enough about rustc's design to know at what phase of compilation this should happen, but it seems like it could happen after type and lifetime inference (see Unresolved Questions).
Under this proposal the above example would be:
pub static USERS: &'static [User];
pub fn main() {
// Can do some other initialization stuff here...
// BUT no code can use `USERS` yet, since it is uninited.
// That would result in a compile error.
...
// Initialize `USERS`
USERS = get_users();
// Can now use `USERS` like a normal `static`...
...
}
The compiler should not need to handle undecidable problems, so the following should result in compile errors:
pub static USERS: &'static [User];
pub fn main() {
if some_condition {
init_users();
}
// Ahhhh! Possibly uninited
...
}
pub static USERS: &'static [User];
pub fn main() {
if some_condition {
init_users();
}
if some_condition {
init_users2(); // Ahhhh! Possibly double inited
}
// Ahhhh! Possibly uninited
...
}
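The rule behind the accepted and rejected examples can be modeled as a small three-state analysis. The sketch below is only a toy: the `Stmt` mini-language, the `State` and `Error` enums, and `analyze` are all invented for illustration, calls such as `init_users()` are modeled directly as an initialization, and none of this reflects how rustc is actually structured. (rustc does already perform a similar definite-initialization check for locals declared with a deferred `let x;`.)

```rust
/// Tiny statement language, just enough to model the examples above.
#[derive(Debug, Clone)]
enum Stmt {
    /// `USERS = ...;` (or a call that is known to initialize it)
    Init,
    /// Any read of `USERS`
    Use,
    /// `if cond { then } else { otherwise }` with an unknown condition
    If(Vec<Stmt>, Vec<Stmt>),
}

/// What is known about the static at a given program point.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Uninit,
    Inited,
    Maybe,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Error {
    PossiblyUninit,
    PossibleDoubleInit,
}

/// Merge the states coming out of two branches.
fn join(a: State, b: State) -> State {
    if a == b { a } else { State::Maybe }
}

/// Walk the statements in order, tracking the state and recording errors.
fn check(stmts: &[Stmt], mut state: State, errors: &mut Vec<Error>) -> State {
    for stmt in stmts {
        match stmt {
            Stmt::Init => {
                // Initializing is only allowed when definitely uninitialized.
                if state != State::Uninit {
                    errors.push(Error::PossibleDoubleInit);
                }
                state = State::Inited;
            }
            Stmt::Use => {
                // Using is only allowed when definitely initialized.
                if state != State::Inited {
                    errors.push(Error::PossiblyUninit);
                }
            }
            Stmt::If(then_branch, else_branch) => {
                let after_then = check(then_branch, state, errors);
                let after_else = check(else_branch, state, errors);
                state = join(after_then, after_else);
            }
        }
    }
    state
}

/// Analyze a whole program, starting from the uninitialized state.
fn analyze(program: &[Stmt]) -> Vec<Error> {
    let mut errors = Vec::new();
    check(program, State::Uninit, &mut errors);
    errors
}

fn main() {
    use Stmt::*;
    // Straight-line init followed by a use: accepted.
    println!("{:?}", analyze(&[Init, Use]));
    // Init in only one branch, then a use: rejected.
    println!("{:?}", analyze(&[If(vec![Init], vec![]), Use]));
    // Two conditional inits, then a use: rejected twice over.
    println!(
        "{:?}",
        analyze(&[If(vec![Init], vec![]), If(vec![Init], vec![]), Use])
    );
}
```

This model is deliberately conservative: it never inspects the condition, so even two branches guarded by the same `some_condition` are treated as independent. It also only covers a single function body; following initialization through calls in other functions or crates is the harder, non-local part.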
I think otherwise, statics would function as they already do...
The idea seems intuitive enough that it should only need a couple of simple examples like the ones above in the book or Rust by Example. I wouldn't even say that this is really even a distinct feature from statics; rather, it is a useful way of using them.
Other than that, it should not need much additional teaching effort, I think. Nor do I think that much confusion could arise from this proposal's additions, as it seems pretty self-contained.
USERS to be 'static since it does remain alive for the entire execution of the program after initialization, while the compiler enforces that it is not used before that. However, I am not sure if this causes soundness issues.
The idea is to allow minimal uninitialized static variables under the condition that the compiler can statically guarantee that they are initialized before use.
So this can only be used in crates that define main and are compiled as executable? (Never in a library.)
Won't most use cases become obsolete with better const evaluation?
@SimonSapin TBH, I am not sure how much metadata rlibs carry; i.e., are they like shared objects or DLLs, which just carry linker symbols?
Ideally, a library can define functions which can be used to initialize any statics and make sure those functions are called before the values are used.
@oli-obk
const evaluation would alleviate the pain in a lot of places, but it cannot help if there is a need to do computation before initializing a static. This comes in handy in two places:
If initializing your static requires a heap allocation. In this case, heap needs to be initialized first.
const evaluation, as is the case with my example above.
I have run into both cases in the past, and IIUC, const evaluation would not help.
Ideally, a library can define functions which can be used to initialize any statics and make sure those functions are called before the values are used.
This sounds like adding a whole other "dimension" to functions (and methods?) that affects what can be called where, in addition to const and unsafe. I don't know if this feature is worth that complexity.
Correct me if I'm wrong, but the improvement proposed here would only apply to variables which:
Maybe this is just me, but that _seems_ like a very narrow set of use cases, especially for a feature that relies on some very non-local reasoning. Could this be better addressed by making lazy_static more ergonomic somehow?
If initializing your static requires a heap allocation. In this case, heap needs to be initialized first.
Minor point, but the previews of miri I've seen stated it will be able to simulate heap allocations and unsafe code as part of compile-time evaluation, meaning most operations on standard containers will be available in CTFE when that gets integrated. Of course that doesn't help if your static object really needs a pointer into the heap for some reason, but it still covers a lot of things that "require a heap allocation".
Maybe this is just me, but that seems like a very narrow set of use cases, especially for a feature that relies on some very non-local reasoning.
@Ixrec The proposal is intentionally small in scope, as I wished to minimize the implementation effort required. That said, you raise a good point about non-local reasoning (I believe that this is what @SimonSapin is referring to also). To be honest, I don't know enough about rustc's design to know how much actual implementation effort is required. Intuitively, it doesn't seem like the non-local reasoning in this case is hard. Starting at the entry point, you just check for any uses of a static before the first assignment to that static.
Could this be better addressed by making lazy_static more ergonomic somehow?
Possibly, but lazy_static still feels like a bit of a hack for something that the language ought to provide, IMHO.
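For comparison, here is a sketch of the lazily initialized approach using only the standard library. It uses `std::sync::OnceLock` as a stand-in for lazy_static; `OnceLock` was stabilized in Rust 1.70, well after this thread, so it is not what anyone here was using at the time. `User`, `get_set_of_users`, and `users` are hypothetical names.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the `User` type from the original example.
#[derive(Debug)]
pub struct User {
    pub name: String,
}

// Hypothetical non-const initializer; a real one would query the OS.
fn get_set_of_users() -> Vec<User> {
    vec![
        User { name: "root".to_string() },
        User { name: "alice".to_string() },
    ]
}

// The cell itself is const-initialized; the value is filled in at runtime.
static USERS: OnceLock<Vec<User>> = OnceLock::new();

/// Accessor that initializes on first use and returns a `'static` slice.
pub fn users() -> &'static [User] {
    USERS.get_or_init(get_set_of_users).as_slice()
}

fn main() {
    for user in users() {
        println!("{}", user.name);
    }
    // Later calls return the same data without re-running the initializer.
    assert!(std::ptr::eq(users(), users()));
}
```

This is safe code, but it still has the shape the proposal objects to: readers go through an accessor rather than a plain static, each access pays for an initialization check, and the type stored in the cell must be `Send + Sync`.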
- definitely need to be global, despite all of the usual reasons why globals are bad and accepting the clutter of "pass it as a parameter everywhere" is generally better
If we want Rust to be friendly for systems and low-level code, then good static support is a must. I agree that in general global state is not preferred, but in low-level code, having a variable in the static section of the binary can be really handy.
Also, most of the arguments I have heard against global state are really more applicable for shared mutable state than shared immutable state. Notice that I could have made this proposal applicable for static muts, but I don't think those should be easier to use.
- cannot be evaluated at compile-time, even after the CTFE enhancements that miri integration is expected to bring
- are "popsicle immutable", i.e. computed once and then "frozen" never to be mutated again
This may just be me, but my experience is that these are actually more useful than you seem to imply, especially in low-level programs and for data structures with interior mutability.
Minor point, but the previews of miri I've seen stated it will be able to simulate heap allocations and unsafe code as part of compile-time evaluation, meaning most operations on standard containers will be available in CTFE when that gets integrated. Of course that doesn't help if your static object really needs a pointer into the heap for some reason, but it still covers a lot of things that "require a heap allocation".
That's pretty cool! Still, I would argue that especially in low-level code, you would want to have fine control over exactly what memory operations happen and when.
I think we should look at how much miri and other const eval can do before pursuing this further. It's sad that miri blocks so many things, but right now it still has a chance of being integrated.
@est31 That's a fair point... do you know if there is a place where we can look further into this (e.g. an RFC or PR or something)?
@mark-i-m The "design document" is a comment on internals (from last year), and miri is on GitHub.
I'm going to close this since it looks like consts are going to subsume a lot of use cases...