Multiple developers have registered a desire to allow public access to fields that expose abstraction-breaking impl details (rather than following the current convention of making all such fields private to the structure in question, thus containing the abstraction-breakage to the structure's module).
See the following postponed (or otherwise closed) RFCs:
I think there is a use for such unsafe fields even when they are private, see https://internals.rust-lang.org/t/pre-rfc-unsafe-types/3073.
Honestly speaking, none of these RFCs really convinced me we want unsafe public fields. The further away an access is from where the invariant "is", the more likely it is for that access to be done without enough consideration of the invariant. I'm already worried about modules growing big, I don't think there should ever be direct access to fields carrying invariants from outside the module. If something like this is needed, just provide a public, unsafe setter function. This provides a good place to add precise documentation about when this function is safe to use (thus also very explicitly making this part of the promised, stable API), and the name of the function may give some additional hints to callers.
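Concretely, something like this (just a sketch, names invented):

```rust
pub struct Buffer {
    data: Box<[u8]>,
    // Invariant: `len <= data.len()`; the unchecked read below relies on it.
    len: usize,
}

impl Buffer {
    /// The "public, unsafe setter" alternative to a `pub unsafe` field.
    ///
    /// # Safety
    /// Callers must ensure `len <= self.data.len()`.
    pub unsafe fn set_len(&mut self, len: usize) {
        self.len = len;
    }

    pub fn first(&self) -> Option<u8> {
        if self.len == 0 {
            None
        } else {
            // Sound because of the invariant documented (and promised) above.
            Some(unsafe { *self.data.get_unchecked(0) })
        }
    }
}
```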
@RalfJung if I'm reading your two comments correctly, it sounds like you _do_ want this feature, but you don't want it for public fields?
(Seems odd to add it solely for non-`pub` fields, just in terms of it seeming like a semi-arbitrary restriction ... don't get me wrong, I can understand the reasoning that "it's unsafe, so there's surely some invariant that should be maintained by the declaring module itself"; but nonetheless, it seems like that is an argument to _lint_ and warn against unsafe `pub` fields, but not a reason to outlaw them in the language itself.)
Update: Oh, whoops, I didn't read the linked pre-RFC or even look carefully at its title. After skimming the proposal there, I can see how that model means that `pub` and `unsafe` are no longer orthogonal, but rather quite intermingled.
I'm not sure whether I want that feature (unsafe private fields); I am uncertain. But I thought it was worth bringing it up for discussion.
I do not want unsafe public fields. I think it is a good idea to keep the intuitive notion of abstraction boundary and "how far out do I have to care about this invariant" aligned.
(just for proper cross referencing purposes, there is an excellent series of comments regarding unsafe fields that unfortunately occurred on an RFC issue other than this one.)
In that RFC issue, glaebhoerl mentions that safe code can break unsafe code assumptions in at least two ways:
A) Modify a field incorrectly (this is what declaring fields unsafe could prevent).
B) Unsafe code calls safe code and makes assumptions (say, that arithmetic is done correctly). If the called safe code is wrong, the unsafe code now does horrible things.
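For concreteness, a hypothetical sketch of case B (the types and the bug here are invented):

```rust
// Case B: unsafe code trusting the *functional correctness* of safe code.
fn index_of_last(len: usize) -> usize {
    len // imagine an off-by-one bug here, e.g. `len + 1`
}

pub struct Buf {
    ptr: *mut u8,
    len: usize, // invariant: `ptr..ptr+len` is allocated and initialized
}

impl Buf {
    pub fn last(&self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        let i = index_of_last(self.len) - 1;
        // Sound only if the safe `index_of_last` is correct; a bug in it
        // turns this into an out-of-bounds read without any change to the
        // unsafe block itself.
        unsafe { Some(*self.ptr.add(i)) }
    }
}
```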
glaebhoerl, tsion and RalfJung seem about half convinced that the existence of B invalidates the value of unsafe fields, but I think B is much less terrible than A. In a nutshell, eliminating A is like limiting the use of global mutable state in software.
Unsafe fields would create a wonderful invariant that "every memory-safety critical code is an explicit dependency of some unsafe code". This makes inspecting code for safety much easier and more intuitive than the current situation, where starting from unsafe code we need to propagate through every variable it relies on to all code that may access that variable (a much more implicit rule).
The rule "only code in unsafe blocks can contribute to any memory unsafety" is not a good goal IMO; unsafe code X relying on the functional correctness of code Y should not mean we need to increase the proof burden around Y by turning off some compiler checks, right? I think the explicit dependency invariant, if we can get it, is about as good as it gets.
Some quick musings in response to @daniel-vainsencher 's comment
A. Modify[ing] a field incorrectly [... in the context of safe code ...] is what declaring fields unsafe could prevent
In the absence of an actual RFC, it's hard to make verifiable absolute statements about what a given feature would provide.
Will it be legal for unsafe code to first borrow an unsafe field into a usual `&`-reference, and then pass that reference into safe code? Would there be some new kind of `&unsafe`-references as part of this hypothetical feature? Or would borrows of unsafe fields only go straight to unsafe `*`-pointers, never to `&`-references? (I am just trying to say that there is a non-trivial burden of proof here, even when it comes to supporting seemingly obvious claims such as A above.)
Update: One might argue that safe code won't be able to do anything harmful with the `&`-reference; but of course there is always both `&mut` and also interior mutability to consider.
Update 2: And _of course_ my argument above is a bit of a straw-man, in the sense that buggy unsafe code could also pass a borrowed reference to private data to clients who should not have any access to it. But I'm not claiming that privacy prevents such things. I'm just wary of what claims _are_ made of a proposed feature.
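To illustrate the kind of question I mean with today's syntax (everything here is invented):

```rust
pub struct Counter {
    // Invariant (today enforced only by privacy): `count <= 16`.
    count: usize,
}

// Safe helper, possibly elsewhere in the module; it knows nothing about
// the invariant and can freely write through the `&mut`.
fn add_a_lot(n: &mut usize) {
    *n += 100;
}

impl Counter {
    pub fn bump(&mut self) {
        // Unsafe code elsewhere may rely on the invariant, yet a plain
        // `&mut` to the invariant-carrying field can be handed to safe
        // code. Would an `unsafe` field forbid creating this `&mut`, or
        // only the direct read/write?
        add_a_lot(&mut self.count);
    }
}
```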
@daniel-vainsencher Also how do (or don't) indirect calls factor into that scenario?
To clarify (to myself first of all) I am referring to RFC https://github.com/rust-lang/rfcs/pull/80/ and believe that the proposed feature is that safe code _by itself_ cannot access unsafe fields. Your scenario of unsafe code passing references to unsafe fields into safe code is allowed. This could require annotating the parameter in the safe function as unsafe (which makes some sense) but I am not assuming this (allowing more uses for the unsafe keyword is a bigger change).
Note that unsafe code may access both safe and unsafe fields, but should be written so that _safety_ depends only on values of unsafe fields. I was not at all clear on this in my previous comment.
@pnkfelix :
I do not understand how you are using "at fault" in this context. My point of view is in terms of the guarantees that programmers can get while acting "naturally", and what it costs.
I am not advocating for making unsafe fields potentially public (or against it). I can more easily imagine uses for private unsafe fields (restricting access even more than now, hence reducing the inspection burden), but system-level abstractions may exist where public makes sense. Do we know of examples of such?
@daniel-vainsencher What I was thinking of is that "every memory-safety critical code is an explicit dependency of some unsafe code" seems to break down the moment that the unsafe code makes an indirect call -- suddenly the range of code which may be implicated becomes open-ended again.
It may be that this isn't relevant in practice because if `unsafe` code relies on particular assumptions about the behavior of `fn`s or trait objects passed into it for its safety, then it is violating the `unsafe` contract - that it should be safe for all possible inputs? I haven't thought that all the way through...
FWIW, if `unsafe` fields really would let us so sharply delineate the code which may or may not be implicated in memory safety, the idea sounds appealing to me. Are `struct` fields really the only vector through which unsafe code can indirectly depend on the behavior of safe code?
@glaebhoerl sorry, I thought by indirect you meant merely transitive callees, now I understand you mean calls to functions (or methods of traits) passed into the unsafe code as parameters.
I am still calling that an explicit dependency in terms of a human inspector, even if the call is more or less opaque to the compiler: when inspecting unsafe code with a function parameter, you are in general liable to find out what values (code) those parameters might have and reason about their functional correctness.
The cost incurred is proportional to the power being used (in contrast to the current situation, where accessing any field requires inspection of the whole module), so it seems ok to me.
@glaebhoerl I am not sure it is very meaningful as a contract to require unsafe code to be "safe for all possible inputs" when in the current situation, it is often not safe for every value of fields that can be accessed by safe code anywhere in the module. Feels a bit like requiring iron bars on the window when the door is missing... ;)
I may well be missing ways to screw things up, we'll see.
Wondering what @Gankro thinks about this proposal, having just read his thesis.
I have frequently argued against unsafe fields and unsafe types. I essentially introduce the problem they're supposed to solve, and then reject them in this section of the nomicon.
I don't really have the energy to put in a detailed argument, but basically I don't believe it's well-motivated. unsafe-fn is, unsafe-trait is. Pub unsafe fields could be argued for as a useful ergonomic boon for exposing an unsafe piece of data (see: mutating statics, accessing repr(packed) fields, derefing raw ptrs). Priv-unsafe fields are basically pointless. I don't think I've ever seen an error in the wild that could have been prevented, and I've seen several that wouldn't have.
I just read that section (all of chapter two, actually); unsafe fields are not mentioned nor argued against, except if you mean implicitly by stating privacy works perfectly?

The argument I have put above for unsafe private fields is twofold: the code one needs to inspect to ensure invariants is smaller (possibly much smaller, for a large module only small parts of which are unsafe relevant), and explicitly enumerated, instead of "the whole module".

Isn't this _the_ place to argue for/against this feature?
the code one needs to inspect to ensure invariants is smaller (possibly much smaller, for a large module only small parts of which are unsafe relevant), and explicitly enumerated, instead of "the whole module".
This is not actually true. First of all, this relies on nobody forgetting to mark any fields unsafe. Secondly, the code that has to be inspected is still "any function that contains an unsafe block", since there will usually be local variables that we rely on to have certain invariants. Finally, I'd be surprised if, after properly marking all fields as unsafe, there would be a significant number of functions left that have no unsafe block.
Pub unsafe fields could be argued for as a useful ergonomic boon for exposing an unsafe piece of data (see: mutating statics, accessing repr(packed) fields, derefing raw ptrs).
I disagree, and think "pub unsafe" is a contradiction in itself. Privacy is tied to abstraction, and safety is tied to hiding unsafe behind an abstraction boundary; I think it would just be confusing to loosen this connection.
I once opened an RFC for this, but after talking to @Gankro I am convinced this feature doesn't offer much. The number of lines of code in unsafe blocks is not at all a good proxy for the "unsafety" of the code.
A single line of unsafe code can do basically anything and affect any other code (usually within the privacy boundary) so you don't shorten review time by typing unsafe a smaller number of times.
@RalfJung
`pub unsafe` makes perfect sense - it means that the fields have additional invariants that are not represented by the type system - for example, possibly the `len` field of a `Vec`. It's not like they add an additional proof obligation to _every_ piece of unsafe code that is written, only to unsafe code that writes to the field.
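(For comparison, std today keeps `len` private and exposes the write side as an unsafe *method*; a `pub unsafe` field would put the same contract directly on the field:)

```rust
fn main() {
    let mut v: Vec<u8> = Vec::with_capacity(4);
    v.push(1);
    v.push(2);

    // Reading the length is safe.
    assert_eq!(v.len(), 2);

    // Writing it bypasses the "len <= capacity, and all elements below len
    // are initialized" invariant, so it is an unsafe method today.
    unsafe { v.set_len(1); }
    assert_eq!(v.len(), 1);
}
```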
If fields have additional invariants, they should be private. If you make them public, you actually make your invariants part of the public API, and you can never ever change the invariants associated with that field again because that could break in combination with client code that still assumes (and establishes!) the old invariants. This is just really bad API design, and IMHO confusing. If you _really_ want this, go ahead and add a getter/setter, that doesn't even incur any overhead after inlining. But at least you had to make some conscious effort to decide that this should be part of your public API.
The API concern isn't restricted to `unsafe` fields. Every time you define a `pub` field with an invariant, even if it is not a memory-safety-related invariant, you allow users of your struct to depend on it. That's how things work.
Sure. And the answer to this problem is to make the field private, not to make it `unsafe`. There is a way to solve this problem in Rust right now, and it is superior to `pub unsafe`; we should not add a second way.
That was also my argument the first time this feature came around - I don't believe this feature pulls its weight.
OTOH, privacy is primarily intended for abstraction (preventing users from depending on incidental details), not for protection (ensuring that invariants always hold). The fact that it can be used for protection is basically a happy accident.
To clarify the difference, C strings have no abstraction whatever - they are a raw pointer to memory. However, they do have an invariant - they must point to a valid NUL-terminated string. Every place that constructs such a string must ensure it is valid, and every place that consumes it can rely on it. OTOH, a safe, say, buffered reader needs abstraction but doesn't need protection - it does not hold any critical invariant, but may want to change its internal representation.
On the other 2 corners, a tuple has no abstraction and no protection - it is just its contents - while an unsafe `HashMap` both wants to hide its internals and has complex invariants that need to be protected from careless (or theoretically, malicious) programmers.
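(To make the "protection without abstraction" corner concrete, a small sketch - the function is made up, but the contract is the usual C-string one:)

```rust
use std::os::raw::c_char;

// No abstraction: the representation is fully exposed (just a raw pointer),
// but there is still an invariant every producer and consumer must respect.
//
// Safety: `s` must point to a valid, NUL-terminated string that stays alive
// for the duration of the call.
unsafe fn c_strlen(s: *const c_char) -> usize {
    let mut n = 0;
    let mut p = s;
    unsafe {
        while *p != 0 {
            n += 1;
            p = p.add(1);
        }
    }
    n
}
```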
I disagree, that's not an accident. Abstraction is exactly what allows code to _rely_ on local invariants to be maintained, no matter what well-typed code runs using our data structure. Besides type safety, establishing abstraction is the key feature of type systems.
In other words, a well-established way to make sure that your invariants always hold is to ensure that users cannot even tell you have invariants, because they can't see any of the implementation details. People are used to this kind of thinking and programming from all kinds of languages that have privacy. This even works formally: once you have proven "representation independence", you can use that proof to show that your invariants are maintained.
Now, Rust has this mechanism to pretty much embed untyped code within typed code (I know that `unsafe` still does enforce typing rules, but fundamentally, not much is lost when thinking of unsafe code as untyped). Of course untyped code can poke the abstraction. But that doesn't mean we should make it _convenient_ for that code to do so.
Of course untyped code can poke the abstraction. But that doesn't mean we should make it convenient for that code to do so.
Here we go again.
@reem I am also not convinced that the feature would pull its weight.
But part of your response confused me:
you don't shorten review time by typing unsafe a smaller number of times.
I didn't think this feature leads to fewer unsafe blocks. If anything, there will be _more_.
I thought the reason some claim it shortens review times is that, in some cases, you get to focus your attention solely on the unsafe blocks. That's quite different, no?
Maybe "accidentally" wasn't the right word. Still, the existence of (public) unsafe
functions demonstrate the usefulness of protection without abstraction. Unsafe fields are just a better interface in some cases.
Moderator note: Just a gentle reminder to all to keep comments constructive. Thanks!
@RalfJung:
the code one needs to inspect to ensure invariants is smaller (possibly much smaller, for a large module only small parts of which are unsafe relevant), and explicitly enumerated, instead of "the whole module".
This is not actually true. First of all, this relies on nobody forgetting to mark any fields unsafe.
I disagree, in a sense. I'll get back to that.
@pnkfelix
I thought the reason some claim it shortens review times is that, in some cases, you get to focus your attention solely on the unsafe blocks. That's quite different, no?
Yes thanks, that's the point; let's use a model to make this crystal clear.
Suppose we can "edit" code which costs 1 and has probability 0.1 of forgetting some invariant, and "prove" which costs c ( say 20) is careful enough to have zero probability (say, by formalizing the code into coq, or whatever). Is the model more or less ok? does anyone think c is closer to 2, in which case all of this doesn't matter? I am abstracting away the fact that changes in the code may require us to "edit" and then re"prove" many times, but lets keep that implicitly in mind.
My claim is:
Without unsafe fields, to ensure safety of a module we need to pay `c * (size of module + possibly some external functions)`: the first because any code in the module can write to fields whose invariants affect safety, the second because if the module uses the result of some outside function in setting an invariant-laden variable we might need to read it.

With unsafe fields, to ensure safety of a module, we need to pay `size of module + c * (total size of functions: marked unsafe + mentioning unsafe + those that they call)`.
So in a module that has a tiny unsafe function relying on a single private field, unsafe fields allow us to be approximately `c` times more efficient by marking that field unsafe. Do we agree on the math, assuming the model? This is the benefit I am arguing for.
Now if we agree on the model and the math, we can argue whether it is more convenient to use this feature, or to keep modules with any unsafety minimal so the privacy boundary does the same job about as efficiently.
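(Plugging in made-up numbers: a 1000-line module of which only 50 lines are unsafe-relevant, with c = 20:)

```
without unsafe fields:  c * module           = 20 * 1000      = 20000
with unsafe fields:     module + c * unsafe  = 1000 + 20 * 50 =  2000
```

Roughly a 10x saving in this case, approaching c as the unsafe-relevant fraction shrinks.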
Coming back to particular arguments:
First of all, this relies on nobody forgetting to mark any fields unsafe.
The model shows something else: the "proof" step starts with looking at the explicitly unsafe code, and marking unsafe any fields for which safety depends on non-trivial invariants holding. So even if someone forgot in the past, this is efficient to deal with on the first "proof". By setting the field unsafe, this invariant will now be maintained even when doing mere "edit" of code not marked unsafe because the compiler helps. In contrast, without unsafe fields, even if you remember that a field has an invariant once, that doesn't lower the probability of error on the next edit (possibly by someone else).
Secondly, the code that has to be inspected is still "any function that contains an unsafe block", ...
This was never sufficient; in the current situation any function that writes a variable with a safety relevant invariant must be inspected to maintain safety. These functions are currently effectively unsafe, without any marking.
... since there will usually be local variables that we rely on to have certain invariants.
Thanks for bringing this case up: even with unsafe fields, we still need to inspect the whole function, not just the unsafe block. Still better than the whole module, though!
Finally, I'd be surprised if, after properly marking all fields as unsafe, there would be a significant number of functions left that have no unsafe block.
Here is an example of how this could happen. Take a matrix abstraction module. It has a little code that provides safe access without bounds checking (relying on LLVM is not always sufficient), hence the module uses unsafe blocks. Its implementation understandably relies on an invariant like "the allocated size equals the product of rows and cols". Now add a hundred different linear algebra algorithms (from sum to LU decomposition), none of which write any of the invariant-carrying variables, but all read them. Also add one silly function that does write these variables, say by providing a view of the matrix as a vector.
Now "proving" this class without unsafe fields requires reviewing the linear algebra, and with unsafe fields it does not. Does this example make sense?
Yes, most of this makes sense. I was not able to entirely follow the part about forgetting to mark fields as unsafe. But one thing I might say (and maybe that's what you said) is that forgetting to mark a field "unsafe" is only relevant if some `unsafe` code actually relies on this invariant. So as part of the "proof", when we come to a function containing unsafe, and we want to justify its safety - we _have_ to think about the invariant, and that's when we should check that the field actually _is_ unsafe. So, again, we don't have to look at functions that don't contain unsafe and that are not called by functions containing unsafe.
So, yeah, you pretty much summed up why I originally made the proposal in the form of my pre-RFC. And I still believe this is all true. But you really need an example like the one you have at the end to get any benefit out of this. The typical unsafe data structure, on the other hand, looks more like `Vec` or `HashMap` or `RefCell`, where there's just no way around checking pretty much everything. There may be a function here or there that needs no checking, but when you do the "proof", you'll see at a glance that a short function doesn't actually do anything that could violate the invariants, and just skip it. Saying "check all functions that contain unsafe or that write to a field of a struct" gives you pretty much all the benefit of unsafe fields.
Since unsafe code can rely on the functional behavior of safe code, you also still have to check some (or many) functions that are actually entirely safe. For this reason, it is not clear whether this change actually carries its own weight - i.e., any change we make adds complexity and needs to be documented and communicated, so it needs to be better than just "a little useful in a few corner cases".
Safe things that can break unsafe code:

- incorrect control flow -- if ptr.is_null() { unsafe { free(ptr) } }
- reading mem::uninitialized() (equiv, writing without ptr::write when destructors are involved)

This is of course ignoring all the places where we play fast-and-loose with marking functions unsafe internally at all. Regardless, _these_ are the kinds of errors I actually see in practice. These are the things that keep me up at night. Someone accidentally updating a field they're not supposed to? Nah. Also, you can screw up in exactly the inverse manner:
```rust
fn pop(&mut self) -> Option<T> {
    if self.len == 0 { None } else { unsafe { Some(ptr::read(self.ptr.offset(self.len as isize - 1))) } }
    // oops, didn't update `self.len`, memory unsafe.
}
```
This example doesn't _even_ mutate. It only reads, but that breaks an invariant, because the read is a semantic move.
I don't "get" the probabilistic model. I see no reason why evaluating a safe or unsafe function is materially different in an unsafe module. I want all my code to be correct, not just the unsafe code. I'm going to have it reviewed, I'm going to toss in debug_asserts, I'm going to test the crap out of it. There's no material difference to me if BTreeMap has a segfault or if the largest element isn't yielded by its iterator. The code is buggy and needs to be fixed. Yes, the segfault is a _worse_ error in terms of security, but that's life.
In my experience, every single struct field is `unsafe` in unsafe modules. Why can't your formal analysis just assume that? It seems to completely handle your matrix example. It's easy enough to tell if an access is read-only or not in Rust.
Edit: Really, it seems to me that we all just want dependent types and/or design-by-contract invariants/constraints. All this unsafe field/type stuff is just a horribly poor version of those features.
- incorrect control flow -- if ptr.is_null() { unsafe { free(ptr) } }
- reading mem::uninitialized() (equiv, writing without ptr::write when destructors are involved)
Both of these fall under "you have to completely check every function that contains an unsafe block".
Why can't your formal analysis just assume that?
Just in case you are referring to my attempts at proving unsafe code safe, that one is entirely independent of anything like that. I didn't suggest unsafe fields to facilitate that work, it would not help at all. This is all just about the compiler-programmer interface.
@RalfJung yes, I'm sure this doesn't relate to your work, but @daniel-vainsencher seemed to be suggesting proving stuff in a more ad hoc way ("say, by formalizing the code into Coq, or whatever").
Yes, @RalfJung, @Gankro: I am using "proof" as a shorthand for any way of applying more effort to gain confidence by analyzing the code.
@Gankro I understand your points as:
I'll focus on the last. And specifically, you say:
"I see no reason why evaluating a safe or unsafe function is materially different in an unsafe module. I want all my code to be correct, not just the unsafe code."
But history says that bugs always do remain, so the "verification" budget (however implicit) is always short, so we had better use it efficiently. So preventing bugs which are more expensive (to debug, to have exploited, etc.) takes priority. And then it makes sense to apply the "proof" level to prevent unsafety _first_, and then whatever remains of your budget to test general correctness. Another reason to take care of safety first and the rest of correctness later is that after you've "proved" safety, you are back to the pleasantness of reading Safe Rust, even in an unsafe module! That seems like a much, much easier job than treating every unsafe module as Unsafe Rust throughout.
and
In my experience, every single struct field is unsafe in unsafe modules.
I find this statement very surprising. As an example, is the depth field in a BTreeMap unsafe? By a quick inspection I don't see that it is; it seems its only use is as a hint for the search stack capacity.
@RalfJung I agree, unsafe private fields are strongly useful only in modules with unsafe code but with much more safe code. In other words, in Rust without unsafe fields, every module is encouraged to be either completely safe, or with minimalist functionality.
Do we want performant abstractions to carry this extra cost for additional functionality? I think removing this tradeoff makes design easier.
BTW, what _is_ the weight of unsafe private fields?
@daniel-vainsencher my contention is that (private) unsafe fields are basically useless, so literally any implementation, documentation, and learning complexity that they add is inordinate. The fact that something is relatively minor to add is not an adequate argument to add it here. We do not favour a grab-bag approach to language design.
I don't worry about invariants that unsafe fields rely on getting broken much because they're liable to show up in tests _very quickly_. Of course, I work on data structures, which are gloriously well-specified and convenient to test. The only hard thing is exception safety, and I don't think unsafe fields help with that.
I still haven't seen a well-motivated example.
While I am not yet decided on the usefulness of unsafe fields, I have to say I don't think testing is an adequate alternative. Testing is great, and important, but it usually does not find subtle violations of invariants, such as those occurring only non-deterministically (a pervasive problem in concurrent data structures) or those that need strange combinations of operations or just a _lot_ of operations to occur (like bugs relating to overflows).
I've started to become more in favor of unsafe fields. I don't think they are a panacea by any means, but I do think that they could be quite helpful as a form of documentation. Basically, they give you another way to identify actions that could potentially disturb invariants.
Let me argue in narrative form (the best kind of argument ;). I was recently doing some refactoring work on Rayon where I tried to think more carefully about unsafety. Basically, before I had a lot of unsafe traits, and I wanted to see if I could narrow down the unsafety into a few, more local invariants. To this end, my first step was that I removed _all_ unsafe fns and unsafe blocks from the codebase, and then just checked what would and would not compile. It turned out that the set of places actually taking unsafe actions was really quite small. For each of those fns, I either asserted locally the conditions I needed (if I could), or else documented the invariants that must hold and declared the fn as `unsafe`. I could then recompile and repeat the exercise at the callers.
In some cases, I uncovered invariants that were "struct-wide" invariants -- for example, that it is ok to write to `self.target.offset(x)` so long as `x >= 0 && x < self.len`. In other words, that the `len` field represents a range of writable memory. Here I could document the invariant on the struct, and I could mark the `new` function unsafe (because it established the invariant), but I could not declare the fields involved as unsafe. This means I can't use my strategy of letting the compiler guide me to the next place where I have to check things.
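(Roughly this shape -- simplified, with invented names:)

```rust
pub struct Producer<T> {
    // Struct-wide invariant: `target.offset(x)` is writable
    // for all `x` with `0 <= x < len`.
    target: *mut T,
    len: usize,
}

impl<T> Producer<T> {
    /// Unsafe because it *establishes* the invariant.
    ///
    /// # Safety
    /// `target..target+len` must be valid, writable, and unaliased for the
    /// lifetime of the `Producer`.
    pub unsafe fn new(target: *mut T, len: usize) -> Self {
        Producer { target, len }
    }

    pub fn write(&mut self, x: usize, value: T) {
        assert!(x < self.len);
        // Relies on the struct-wide invariant; with unsafe fields the
        // compiler would point at every place that could break it.
        unsafe { self.target.offset(x as isize).write(value); }
    }
}
```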
I found this exercise very interesting. Among other things, explicitly writing out my assumptions did lead me to uncover some flaws. For example, relating to that `len` field specifically, I was assuming that the user would invoke the trait methods in the expected order: if they did not, they could modify the `len` field in such a way as to permit bad writes.
So, I do think it would have been helpful if I could have declared fields unsafe. It would also have given me an obvious place to write out the invariants that pertain to those fields. _However,_ it's not a panacea. For example, marking the fields as unsafe helps initially, but if I modified the invariants on those fields, it wouldn't help me identify the places where I now need to do additional checks. (I have some thoughts on that too, but that's a separate thread I guess.)
I also most definitely wouldn't want to suggest that `unsafe` fields obviate one from the need of thinking carefully about all the code in your module. I see them as a tool for helping to identify places of interest, but they don't fundamentally change the fact that the "trust boundary" is the module.
As for the topic of `pub unsafe` fields and whether they are a good idea, I think it will all depend on context. If you are publishing a library on crates.io, I think `pub unsafe` is a bad idea. If you are writing some code for your application, I think `pub unsafe` may be a fine idea, because it is quite plausible then for you to identify and change all users of the field if you want to -- and sometimes, for performance, it is helpful to be able to directly access fields.
(I'm curious to hear if other people think this strategy for examining unsafety is clearly flawed, by the way. Also, it's true that in lieu of unsafe fields, each time I discovered a new invariant, I could have eagerly searched the module for uses of the relevant fields (and I did). But it just makes the process harder.)
I don't think it is flawed at all. Actually, I think I'd follow a similar strategy. Unsafety is all about assigning _more_ meaning to types than they (syntactically) have. Carefully documenting what exactly it is that is additionally assumed is certainly the first step to a formal proof of safety (e.g., in the framework that I am trying to develop). In other words, if a module has been documented the way you suggested, that's already taking some burden from whoever ends up doing a full, formal proof of correctness. If then every unsafe block also has a comment explaining _why_ this particular access maintains the invariant, that's already half the proof ;-) .
I think something like this has been in my mind when I suggested unsafe fields (well, I suggested unsafe types, but I was quickly convinced that doing this on the field level is better). I just wasn't able to articulate it so clearly.
In a discussion [1] on Reddit, /u/diwic proposed a solution within current Rust. I cleaned it up a bit and put it on a playpen at [2]. I think the best way forward is to use (a bikeshedded variant of) this module in some relevant use cases, and if the pattern is deemed important enough that the ugly syntax is unacceptable to incorporate into the language, pick up this issue again.
[1] https://www.reddit.com/r/rust/comments/4gdi4a/curious_about_state_in_safeunsafe_code/
[2] http://is.gd/9kGSvF
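The gist of the pattern (my own approximation, not the exact playpen code):

```rust
/// A library-level approximation of "unsafe fields": the wrapped value can
/// only be touched through `unsafe` accessors.
pub struct UnsafeMember<T> {
    value: T,
}

impl<T> UnsafeMember<T> {
    pub fn new(value: T) -> Self {
        UnsafeMember { value }
    }

    /// # Safety
    /// Callers must uphold whatever invariant the owning struct documents
    /// for this member.
    pub unsafe fn get(&self) -> &T {
        &self.value
    }

    /// # Safety
    /// Callers must re-establish the documented invariant.
    pub unsafe fn set(&mut self, value: T) {
        self.value = value;
    }
}
```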
That solution doesn't actually work though - you can still swap out a field with that wrapper type with another instance of that wrapper type that might contain an entirely different value, completely safely.
I think of this as "public privacy" and "private unsafety". Privacy is a safety feature, allowing the programmer to make local reasoning about type invariants. Adding unsafe fields adds the ability to set public fields which might hold some guarantee (e.g., `vec.len <= vec.cap`) without pushing to the stack (i.e., a setter method). Although this is inlined most of the time when optimizations are activated, it will certainly improve performance in e.g. debug builds (for example, you can read a vector's length without pushing to the stack).
Even more interesting is the ability to express module-local invariants through the type system. In particular, you can ensure certain guarantees about your type in the defining module _itself_. Let's look at `vec.rs` from `libcollections`. This module is 1761 lines long (excluding tests), and has many invariants it relies on for safety.

Currently, Rust has a module-level safety model. You can shoot yourself in the foot by breaking invariants of structures in the defining module. Adding unsafe fields helps this situation a lot. It will bring down the responsibility for holding the invariants to the function itself, making reasoning about unsafety even easier.
The last thing to mention is better syntax. A lot of getters can be removed, arguably improving readability.

On the flip side, there is a lot of code that needs to be rewritten. And I imagine that, for consistency, certain fields in the standard library should be made `pub unsafe`. Note that such a change won't be breaking, since you can already have fields and methods with overlapping names.
@Kimundi: that version had a more general bug: initialization should also take invariants into account.
The fix is that `new` should be `unsafe` also. I have already applied this change (I am preparing a micro crate for unsafe_field).
This fix addresses your concern as well: http://is.gd/4JWZc2
@Kimundi also, thanks for having a poke at it!
Is there any compelling usecase for reading to be unsafe?
@Kimundi: actually my fix does not quite address your concern: creating the new value is unsafe, so the common case of `struc.a = UnsafeMember::<usize>::new(4);` would fail to compile. But if we happen to have a value of that type lying around, we certainly could swap it in and break an invariant in safe code, as you say :(

This means that the proposed library only adds a speed bump, not a guarantee. So it is not a reasonable replacement for the language feature, and I am not sure it is even worth using in a project: it will probably prevent some bugs, but also make maintainers over-confident. Blech. Great catch!
In short, yes, unsafe fields seem to require a language feature.
@ticki Unsynchronized reading during a write from another thread can be unsafe, because it may produce an invalid value. If a public field is declared as safely readable and there's some kind of unsafe (non-enforced) synchronization strategy, unsafe code can't guard against reading from safe code, and UB may result.
No, that's not possible.
Why not? Am I missing something?
I am guessing that @ticki is relying on "mutable xor shared" and @golddranks is talking about a module that breaks it internally e.g., to provide an efficient concurrent data structure.
I think that my assumption that only writing is unsafe needs discarding. Unsafe fields should be unsafe to access, period. And I think that fields not subject to "mutable xor shared" should never be public, even if any unsafe fields are allowed to be (which I don't know about).
If unsafe fields are unsafe to read, they don't have the same power. Everyone would prefer to write the respective getter function instead of the field directly, sacrificing performance.
How important is the performance benefit of shaving off an accessor in debug builds?
I think the main argument for private unsafe fields is preserving memory safety at a small maintenance expense. In this case, accessing those fields through a small set of functions that ensure written and read values are indeed valid is probably a best practice anyway.
How important is the performance benefit of shaving off an accessor in debug builds?
Well, considering how many getters there are, I think that _some_ gain will be there.
But the argument for local guarantees for invariants is much more compelling.
My thoughts on this:

```rust
unsafe fn get_ref(&self) -> &T;
unsafe fn get_mut(&mut self) -> &mut T;
unsafe fn get_move(self) -> T;
unsafe fn set(t: T);
```

- This has the same problem `static mut` has - you can't know in what state the field is.
- Restrict it to `Copy` data: in combination with a copyable `ManualDrop` type, this would still allow all types.
@Kimundi About safety of dropping: To my understanding your concern is that any implementation of the Drop trait that accesses the unsafe fields has to include an `unsafe` section? Why is that a problem exactly? It seems to exactly encode the fact that these fields carry invariants, and any finalization done on them requires care.

Or are you concerned about memory safety when Drop is not implemented at all? IIUC, Rust does not guarantee that drops will occur, so we should not depend on them for memory safety purposes anyway. Right?
But it is strongly wanted (and often assumed, though not for memory safety) for destructors to be called. Not doing so would be quite counterintuitive. If destructors are not to be called in unsafe fields, I would require a `!Drop`.
@ticki, @Kimundi, you seem to be referring to dropping the value in the field that is marked unsafe, right? I thought Kimundi was referring to a drop of the struct containing the field.
But unsafe code already should ensure (beforehand) that it is safe to drop any sensitive fields before they get dropped, right?
So the unsafe fields proposal merely helps formalize this obligation: any field that might be unsafe to drop at any time should be marked an unsafe field, and brought to a safe state before it gets dropped.
Hence, I think the compiler should drop values in unsafe fields exactly when it would in regular ones. Am I missing something? This is also an important property if we want porting modules (with unsafe) to use unsafe fields to be an easy enhancement of documentation with no associated danger.
@daniel-vainsencher: I'm referring to this case:

```rust
struct Foo {
    unsafe bar: SomeType
}
```
where `SomeType` has a destructor.

If `Foo` were a regular struct, then dropping an instance of `Foo` would call the destructor of `SomeType` with a `&mut` to the `bar` field.

If, however, `bar` is marked as an unsafe field, and thus unsafe to access, then it would be unsafe for a drop of `Foo` to also drop the `bar` field.
So either `unsafe` fields need to be defined as needing manual dropping, requiring `Drop` to be implemented explicitly; or they need to be defined as being `Copy` to ensure they don't need to be dropped at all. A third option would be to silently not drop them, but that seems like a footgun.
@Kimundi Thank you for taking the time to write your view very explicitly. The term `unsafe` is being overloaded (maybe more specialized terms exist and I am not aware of them):

1) Some actions are dangerous bugs, causing UB etc. We call those unsafe or memory unsafe.
2) `unsafe` sections allow us to do some things in category (1), but have another desired use: the Rust community seems to want to be able to trace all cases of (1) to cases of (2).
3) For this purpose, we use for example the annotation of functions as `unsafe` to propagate the danger signal; marking a function unsafe does not add potential for bugs!
You seem to treat accessing unsafe fields as unsafe in sense (1): it actually can cause UB, like `static mut` variables. This could happen, for example, if we allowed the compiler more liberty in optimizing around such variables. In this case, allowing the regular drop behavior (so when we drop `Foo`, we call drop on `SomeType` as well) can be a bug, therefore the compiler should not do it. Do I understand correctly?
As far as I am concerned, unsafe fields are exactly like unsafe functions, a tool for (3). Marking a field unsafe is sufficient warning to any maintainer of the code that when the compiler does its usual thing around `drop`s, heavy objects had better be put away. This is the same responsibility that writers of unsafe code face now, just made a bit more precise: with correctly annotated unsafe fields, the `drop` of other fields should never cause unsafety (in sense 1). I imagine that when it is non-trivial, this responsibility will often be discharged using an unsafe section inside the destructor of `Foo`. Does this make sense?
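(For example, something along these lines -- the type is invented, but the destructor is ordinary Rust:)

```rust
struct Foo {
    // Invariant (imagine it marked `unsafe`): either null, or a pointer
    // obtained from `Box::into_raw` that has not been freed yet.
    resource: *mut String,
}

impl Drop for Foo {
    fn drop(&mut self) {
        // The finalization touching the invariant-carrying field naturally
        // lives in an unsafe section inside the destructor.
        if !self.resource.is_null() {
            unsafe { drop(Box::from_raw(self.resource)); }
        }
        // The compiler's usual drop glue for any remaining fields is then
        // harmless.
    }
}
```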
Therefore, in my opinion, unsafe fields should be dropped exactly when regular fields are; they should not require manual dropping, an implementation of Drop, or a Copy bound. All of those are tools an implementor might use to avoid unsafety in sense (1), but it's his choice.
@daniel-vainsencher
I think you convinced me :) To put what you said in different words, would you agree that unsafe fields could be defined as this:

- Marking a field of a struct as unsafe requires an unsafe block for any direct access of the field.
  - This also includes construction of the struct.
- The implementation of the struct still needs to ensure that the field is always in a state where destroying the struct does not cause memory unsafety.
  - Specifically, if the field contains a type with a destructor, it needs to be in a safe-to-call-drop state after the drop of the containing struct.
  - This would be an invariant that needs to be kept in mind at initialization/modification time of the field value.
That is a concise and precise definition that I agree with.
I think unions will need to eventually support a very similar capability, but with the opposite sign - making fields safe.

Unsafe blocks are a huge burden, especially on something that is statically known (to the user, but not to the language) to be safe. A good example is unions like LARGE_INTEGER (again).
cc https://github.com/rust-lang/rust/issues/32836
I feel like https://github.com/rust-lang/rust/issues/41622 is another point where unsafe fields could have helped: if we make it so that types with unsafe fields do NOT get any automatic instances for unsafe OIBITs, that would have prevented accidentally implementing `Sync` for `MutexGuard<Cell<i32>>`.

Really, there is no way to predict how the invariant of a type with unsafe fields could interact with the assumptions provided by unsafe traits, so implementing them automatically is always dangerous. The author of a type with unsafe fields has to convince himself that the guarantees of an unsafe trait are satisfied. Not getting these instances automatically is an ergonomics hit, but the alternative is to err on the side of "what could possibly go wrong", which is not very Rust-y.
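(A compressed illustration of that hazard -- a toy guard, not the actual std code:)

```rust
use std::cell::Cell;
use std::sync::Mutex;

// Simplified shape of the old `MutexGuard`: it holds a reference to the
// lock, not to the protected data directly.
#[allow(dead_code)]
struct MyGuard<'a, T: 'a> {
    lock: &'a Mutex<T>,
}

fn assert_sync<T: Sync>() {}

fn main() {
    // `Mutex<Cell<i32>>` is `Sync` (because `Cell<i32>` is `Send`), hence
    // `&Mutex<Cell<i32>>` is `Sync`, hence the *automatic* `Sync` instance
    // makes the guard `Sync` -- even though a real guard hands out
    // `&Cell<i32>` via `Deref`, which must never be shared across threads.
    // With an unsafe field, this instance would not be derived silently.
    assert_sync::<MyGuard<'static, Cell<i32>>>();
}
```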
@RalfJung good point. I am pretty strongly in favor of unsafe fields at this point. The only thing that holds me back is some desire to think a bit more about the "unsafe" model more generally.
Bit OT: I wonder how hard it would be to implement a way to express invariants maintained by said struct and insert lints (and possibly runtime checks in debug mode) about them. Probably it wouldn't help with broken `Send` and `Sync`, but it might be interesting e.g. in the case of `Vec` (`len <= capacity`).
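(Roughly the checks one can already hand-roll today -- names invented:)

```rust
struct MiniVec {
    buf: Vec<u8>, // stand-in backing storage
    len: usize,   // declared invariant: `len <= buf.capacity()`
}

impl MiniVec {
    // A hand-rolled stand-in for "declared invariant + debug-mode check":
    // called at the end of every method that touches the fields.
    fn check_invariant(&self) {
        debug_assert!(self.len <= self.buf.capacity(), "len exceeds capacity");
    }

    fn set_len(&mut self, len: usize) {
        self.len = len;
        self.check_invariant(); // compiled out in release builds
    }
}
```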
I have a question: I want to be able to point to the same entity with one pointer being owned, the other being mutably borrowed. My example is to implement a singly linked list which is optimized for appending elements to the end, so I have a first_node and a last_node which can potentially be overlapping:
```rust
pub struct SingleLinkedList<T> {
    unsafe first_node: OptionNode<T>,
    unsafe last_node: &OptionNode<T>,
    pub length: mut u32,
}
```
Would that be possible? Or is there any workaround to this?
@sighoya This won't be possible, as it would break Rust's aliasing model.
The only thing this does is make accessing these fields `unsafe`, nothing more.
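A common workaround is to keep the tail as a raw pointer rather than a reference, and treat its validity as a module-internal invariant. A rough sketch (everything here is illustrative):

```rust
use std::ptr;

struct Node<T> {
    elem: T,
    next: *mut Node<T>,
}

pub struct SinglyLinkedList<T> {
    head: *mut Node<T>,
    // May alias `head` (a one-element list): fine for raw pointers, but not
    // expressible with `&`/`&mut` plus ownership as in the question above.
    tail: *mut Node<T>,
    pub length: u32,
}

impl<T> SinglyLinkedList<T> {
    pub fn new() -> Self {
        SinglyLinkedList { head: ptr::null_mut(), tail: ptr::null_mut(), length: 0 }
    }

    pub fn push_back(&mut self, elem: T) {
        let node = Box::into_raw(Box::new(Node { elem, next: ptr::null_mut() }));
        if self.tail.is_null() {
            self.head = node;
        } else {
            unsafe { (*self.tail).next = node; }
        }
        self.tail = node;
        self.length += 1;
    }
}

impl<T> Drop for SinglyLinkedList<T> {
    fn drop(&mut self) {
        let mut cur = self.head;
        while !cur.is_null() {
            // Reconstitute the box so each node is freed exactly once.
            let boxed = unsafe { Box::from_raw(cur) };
            cur = boxed.next;
        }
    }
}
```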