In a.rs
#![crate_type = "lib"]
pub struct X(pub u8);
impl Drop for X {
fn drop(&mut self) {
println!("{}", self.0);
}
}
pub fn f(x: &mut X, g: fn()) {
x.0 = 1;
g();
x.0 = 0;
}
In b.rs
extern crate a;
fn main() {
a::f(&mut a::X(0), g);
}
fn g() {
panic!();
}
$ rustc -O a.rs && rustc -L . b.rs && ./b
thread '<main>' panicked at 'explicit panic', b.rs:8
0
$ rustc a.rs && rustc -L . b.rs && ./b
thread '<main>' panicked at 'explicit panic', b.rs:8
1
$ rustc -V
rustc 1.6.0-nightly (2e07996a9 2015-10-29)
NB: Some pre-1.0 nightlies already behaved this way.
Looks like a codegen bug related to the noalias attribute we place on the IR. It also looks like this isn't necessarily related to a function pointer. The codegen for this function is a little messy but when cleaned up it looks like:
%X = type { i8, i8 }
declare void @foo()
define void @f(%X* noalias) {
%x = alloca %X*
store %X* %0, %X** %x
%2 = load %X*, %X** %x
%3 = getelementptr inbounds %X, %X* %2, i32 0, i32 0
store i8 1, i8* %3
call void @foo()
%4 = load %X*, %X** %x
%5 = getelementptr inbounds %X, %X* %4, i32 0, i32 0
store i8 0, i8* %5
ret void
}
When optimized, we get:
%X = type { i8, i8 }
declare void @foo()
define void @f(%X* noalias nocapture) {
%2 = getelementptr inbounds %X, %X* %0, i64 0, i32 0
tail call void @foo()
store i8 0, i8* %2, align 1
ret void
}
And as we can see, the store of 1 has been optimized away. If we instead remove the noalias attribute on the pointer, the store remains
%X = type { i8, i8 }
declare void @foo()
define void @f(%X* nocapture) {
%2 = getelementptr inbounds %X, %X* %0, i64 0, i32 0
store i8 1, i8* %2, align 1
tail call void @foo()
store i8 0, i8* %2, align 1
ret void
}
cc @dotdash, perhaps we can't place noalias on &mut? That seems unfortunate, but I guess there's also a possibility this is a bug in LLVM (but seems unlikely).
I'm no expert, but isn't the bug here that we're emitting call instead of invoke?
AFAIK marking &mut's noalias is correct. It seems to me that LLVM is using the fact that a call can't throw to determine that the first store is unobservable. If we emited invoke it would be forced to conservatively assume that g can throw and perform the store.
I'm wrong.
An LLVM IR call is semantically the same as an invoke with a trivial landing pad.
There's a decent argument that LLVM's treatment of noalias is wrong. Consider the following C++ (GNU-dialect) testcase:
#include <iostream>
using namespace std;
void xx() {
throw 1;
}
void (*volatile g)() = xx;
void f(int* __restrict x) {
*x = 1;
g();
*x = 0;
}
int main() {
int x = 0;
try { f(&x); } catch(...) {}
std::cout << x;
return 0;
}
gcc prints 1; clang prints 0.
On the other hand, you're likely to run into resistance proposing any restriction on how alias analysis can reason about noalias pointers.
This isn't just a noalias issue; it can occur with other kinds of alias information as well. For example, the first store gets deleted in this code too, even though the call to foo may unwind.
declare void @foo()
@g = external global i8
define void @f() {
store i8 1, i8* @g
call void @foo() readnone
store i8 0, i8* @g
ret void
}
MemoryDependenceAnalysis in LLVM currently says that the second store here has a direct local dependence on the first store. There's a sense in which that is a correct answer, but DeadStoreElimination inteprets this answer to mean that the first store is dead. It's not immediately obvious which of these two should be fixed.
call having an implicit control path requires special handling in a other passes too (eg., the inliner has to know how to convert call into invoke if it's being inlined into a context where it'll have a cleanup), so it'd be cool if LLVM had something like an invoke with only one successor, which the optimizer could treat like a kind of conditional return, however, that's a much bigger change.
triage: P-high
Is the problem here that LLVM is optimizing too heavily, or that we are providing incorrect annotations?
Either way, it may be prudent to supply fewer annotations for the time being. That'd certainly be an easy patch to do.
I asked sunfish to clarify on IRC, and he believes this is 100% an llvm bug.
I've now filed an LLVM bug: https://llvm.org/bugs/show_bug.cgi?id=25422
We were discussing in the compiler team meeting. Do we expect LLVM to fix this in a short time frame? I would think probably not -- we were thinking therefore that it would be prudent to workaround in the time being. Would this mean removing the aliasing annotations and thus taking some speed hit? Are there other solutions? Other implications?
Do we expect LLVM to fix this in a short time frame?
No; fixing DSE is probably easy enough... but trusting noalias to work correctly alongside unwinding would probably require auditing every use of alias analysis in LLVM (for example, I'm pretty sure LICM is also broken). In any case, no matter how fast this is fixed, Rust can't unconditionally depend on any fix because we support linking against 3.7.
Would this mean removing the aliasing annotations and thus taking some speed hit?
Yes.
Are there other solutions?
No.
Other implications?
I don't think so.
LICM is not obviously broken; see eg. isGuaranteedToExecute in LICM.cpp; it explicitly checks for instructions which may unwind before doing related things.
The bug here does appear to arise from the difference between MemDep's way of viewing the world and DSE's.
I don't know how conservative Rust wants to be here, but there is room for optimism that this may just be an isolated DSE bug (or a MemDep bug which only affects DSE).
Won't removing noalias kill our performance?
Okay, LICM's is in fact correct... the definition of isGuaranteedToExecute is overly conservative in a way which happens to cover up this issue.
That said, my non-comprehensive audit shows -memcpyopt, -mldst-motion, and -loop-idiom are broken. (Granted, -mldst-motion isn't part of the default pass pipeline.) A couple more testcases:
#![crate_type="lib"]
// Optimized to memset before loop, but g() could panic.
pub fn f(x: &mut [i32; 1000], g: fn() -> i32) {
for i in 0..1000 {
x[i] = 0;
g();
}
}
#![crate_type="lib"]
// Optimized to output result of h() directly into x, but g() could panic.
pub fn f(x: &mut [i32; 100], g: fn() -> i32, h: fn() -> [i32; 100]) {
let a = h();
g();
*x = a;
}
Wow. I revoke my optimism then.
@arielb1
Won't removing noalias kill our performance?
I am curious what effect it will have, I don't really have a good feeling
at all for how important this is.
On Thu, Nov 12, 2015 at 7:39 PM, Dan Gohman [email protected]
wrote:
Wow. I revoke my optimism then.
—
Reply to this email directly or view it on GitHub
https://github.com/rust-lang/rust/issues/29485#issuecomment-156281427.
This looks rather unambiguously like an LLVM bug (though we may want to work around it).
(a reasonable subtask for a relative newcomer to the project could be to evaluate the actual performance impact removing the noalias attribute here.)
For rust itself, the text segment in the .so files grows by 1-3%. Runtime for rustc -Zno-trans also goes up by 1-3%, depending on the crate.
@dotdash
Could we have concrete numbers (-Z time-passes output)? This is much lower than I thought - what change have you done?
Of course, 1-3% perf losses still suck - this might be a reason to make official rustc builds use -Z no-landing-pads.
@arielb1 I removed all noalias attributes from function arguments and return values.
diff --git a/src/librustc_trans/trans/attributes.rs b/src/librustc_trans/trans/attributes.rs
index 28dfa4e..32dd31d 100644
--- a/src/librustc_trans/trans/attributes.rs
+++ b/src/librustc_trans/trans/attributes.rs
@@ -10,7 +10,7 @@
//! Set and unset common attributes on LLVM values.
use libc::{c_uint, c_ulonglong};
-use llvm::{self, ValueRef, AttrHelper};
+use llvm::{self, ValueRef};
use middle::ty;
use middle::infer;
use session::config::NoDebugInfo;
@@ -117,7 +117,6 @@ pub fn from_fn_attrs(ccx: &CrateContext, attrs: &[ast::Attribute], llfn: ValueRe
llvm::ColdAttribute as u64)
}
} else if attr.check_name("allocator") {
- llvm::Attribute::NoAlias.apply_llfn(llvm::ReturnIndex as c_uint, llfn);
} else if attr.check_name("unwind") {
unwind(llfn, true);
}
@@ -190,24 +189,12 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
// invisible to the program. We also know it's nonnull as well
// as how many bytes we can dereference
attrs.arg(1, llvm::Attribute::StructRet)
- .arg(1, llvm::Attribute::NoAlias)
.arg(1, llvm::Attribute::NoCapture)
.arg(1, llvm::DereferenceableAttribute(llret_sz));
// Add one more since there's an outptr
idx += 1;
} else {
- // The `noalias` attribute on the return value is useful to a
- // function ptr caller.
- match ret_ty.sty {
- // `Box` pointer return values never alias because ownership
- // is transferred
- ty::TyBox(it) if common::type_is_sized(ccx.tcx(), it) => {
- attrs.ret(llvm::Attribute::NoAlias);
- }
- _ => {}
- }
-
// We can also mark the return value as `dereferenceable` in certain cases
match ret_ty.sty {
// These are not really pointers but pairs, (pointer, len)
@@ -233,8 +220,7 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
// For non-immediate arguments the callee gets its own copy of
// the value on the stack, so there are no aliases. It's also
// program-invisible so can't possibly capture
- attrs.arg(idx, llvm::Attribute::NoAlias)
- .arg(idx, llvm::Attribute::NoCapture)
+ attrs.arg(idx, llvm::Attribute::NoCapture)
.arg(idx, llvm::DereferenceableAttribute(llarg_sz));
}
@@ -244,8 +230,6 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
// `Box` pointer parameters never alias because ownership is transferred
ty::TyBox(inner) => {
- attrs.arg(idx, llvm::Attribute::NoAlias);
-
if common::type_is_sized(ccx.tcx(), inner) {
let llsz = machine::llsize_of_real(ccx, type_of::type_of(ccx, inner));
attrs.arg(idx, llvm::DereferenceableAttribute(llsz));
@@ -265,10 +249,6 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
// on memory dependencies rather than pointer equality
let interior_unsafe = mt.ty.type_contents(ccx.tcx()).interior_unsafe();
- if mt.mutbl == hir::MutMutable || !interior_unsafe {
- attrs.arg(idx, llvm::Attribute::NoAlias);
- }
-
if mt.mutbl == hir::MutImmutable && !interior_unsafe {
attrs.arg(idx, llvm::Attribute::ReadOnly);
}
diff --git a/src/librustc_trans/trans/foreign.rs b/src/librustc_trans/trans/foreign.rs
index 95e9e85..39858e9 100644
--- a/src/librustc_trans/trans/foreign.rs
+++ b/src/librustc_trans/trans/foreign.rs
@@ -363,8 +363,7 @@ pub fn trans_native_call<'blk, 'tcx>(bcx: Block<'blk, 'tcx>,
// The outptr can be noalias and nocapture because it's entirely
// invisible to the program. We also know it's nonnull as well
// as how many bytes we can dereference
- attrs.arg(1, llvm::Attribute::NoAlias)
- .arg(1, llvm::Attribute::NoCapture)
+ attrs.arg(1, llvm::Attribute::NoCapture)
.arg(1, llvm::DereferenceableAttribute(llret_sz));
};
Sizes before:
text data bss dec filename
9021 38464 8 47493 lib/libarena-71b07a99.so
32863 12766 8 45637 lib/libflate-71b07a99.so
19286 40241 4 59531 lib/libfmt_macros-71b07a99.so
60656 70240 8 130904 lib/libgetopts-71b07a99.so
12498 47964 4 60466 lib/libgraphviz-71b07a99.so
25740 23790 24 49554 lib/liblog-71b07a99.so
49478 104506 4 153988 lib/librbml-71b07a99.so
3367632 3223067 32 6590731 lib/librustc-71b07a99.so
216766 87705 4 304475 lib/librustc_back-71b07a99.so
304387 176979 4 481370 lib/librustc_borrowck-71b07a99.so
18597 149256 8 167861 lib/librustc_data_structures-71b07a99.so
1926412 121359 8 2047779 lib/librustc_driver-71b07a99.so
768321 1497412 16 2265749 lib/librustc_front-71b07a99.so
259104 65749 8 324861 lib/librustc_lint-71b07a99.so
29222128 1897150 105744 31225022 lib/librustc_llvm-71b07a99.so
1611292 218121 4 1829417 lib/librustc_metadata-71b07a99.so
420463 165511 8 585982 lib/librustc_mir-71b07a99.so
926977 39857 4 966838 lib/librustc_platform_intrinsics-71b07a99.so
39491 11789 8 51288 lib/librustc_plugin-71b07a99.so
89545 21788 4 111337 lib/librustc_privacy-71b07a99.so
409397 164117 8 573522 lib/librustc_resolve-71b07a99.so
2262107 715243 12 2977362 lib/librustc_trans-71b07a99.so
1749382 562100 4 2311486 lib/librustc_typeck-71b07a99.so
3166949 705446 24 3872419 lib/librustdoc-71b07a99.so
163400 500427 16 663843 lib/libserialize-71b07a99.so
1623203 1825834 5912 3454949 lib/libstd-71b07a99.so
4221433 3111003 24 7332460 lib/libsyntax-71b07a99.so
149020 80788 4 229812 lib/libterm-71b07a99.so
215043 126814 16 341873 lib/libtest-71b07a99.so
53340591 15805486 111932 69258009 (TOTALS)
Sizes after:
text data bss dec filename
9245 38469 8 47722 lib/libarena-71b07a99.so
32903 12772 4 45679 lib/libflate-71b07a99.so
20786 40241 4 61031 lib/libfmt_macros-71b07a99.so
63584 70235 4 133823 lib/libgetopts-71b07a99.so
12744 47949 8 60701 lib/libgraphviz-71b07a99.so
26132 23790 24 49946 lib/liblog-71b07a99.so
49860 104494 8 154362 lib/librbml-71b07a99.so
3393388 3223197 32 6616617 lib/librustc-71b07a99.so
219222 87724 4 306950 lib/librustc_back-71b07a99.so
317487 176994 4 494485 lib/librustc_borrowck-71b07a99.so
18965 149257 4 168226 lib/librustc_data_structures-71b07a99.so
1980840 121366 8 2102214 lib/librustc_driver-71b07a99.so
777253 1497469 16 2274738 lib/librustc_front-71b07a99.so
261536 65773 8 327317 lib/librustc_lint-71b07a99.so
29222240 1897162 105744 31225146 lib/librustc_llvm-71b07a99.so
1661684 218169 4 1879857 lib/librustc_metadata-71b07a99.so
427259 165522 4 592785 lib/librustc_mir-71b07a99.so
927005 39868 4 966877 lib/librustc_platform_intrinsics-71b07a99.so
39967 11805 8 51780 lib/librustc_plugin-71b07a99.so
90133 21789 8 111930 lib/librustc_privacy-71b07a99.so
407357 164139 4 571500 lib/librustc_resolve-71b07a99.so
2351875 715590 8 3067473 lib/librustc_trans-71b07a99.so
1763514 562155 4 2325673 lib/librustc_typeck-71b07a99.so
3172349 705522 24 3877895 lib/librustdoc-71b07a99.so
165452 500460 16 665928 lib/libserialize-71b07a99.so
1642203 1825793 5912 3473908 lib/libstd-71b07a99.so
4319225 3111203 24 7430452 lib/libsyntax-71b07a99.so
151144 80785 4 231933 lib/libterm-71b07a99.so
218399 126818 16 345233 lib/libtest-71b07a99.so
53743751 15806510 111920 69662181 (TOTALS)
Ignoring the rustc_llvm crate, which is pretty much unaffected by this change, the size of the text segment is 1.67% larger without the noalias attributes.
I'll gather new timings in the evening. It'll probably be output from perf stat though, as the -Ztime-passes times didn't seem stable enough to be useful. I'll check them again though.
More numbers. I went with the -Ztime-passes times, because this time they seemed rather stable. Maybe I previously forgot to turn off something that ran in the background and messed with the timings.
| With noalias | Without noalias | Change | Pass |
| --: | --: | --: | --- |
| 0.095 | 0.099 | +4.21% | parsing |
| 0.034 | 0.034 | +0.00% | configuration 1 |
| 0.000 | 0.000 | +0.00% | recursion limit |
| 0.002 | 0.002 | +0.00% | gated macro checking |
| 0.000 | 0.000 | +0.00% | crate injection |
| 0.001 | 0.001 | +0.00% | macro loading |
| 0.000 | 0.000 | +0.00% | plugin loading |
| 0.000 | 0.000 | +0.00% | plugin registration |
| 0.634 | 0.649 | +2.36% | expansion |
| 0.012 | 0.013 | +8.33% | complete gated feature checking 1 |
| 0.146 | 0.149 | +2.05% | configuration 2 |
| 0.000 | 0.000 | +0.00% | gated configuration checking |
| 0.071 | 0.072 | +1.40% | maybe building test harness |
| 0.000 | 0.000 | +0.00% | prelude injection |
| 0.006 | 0.006 | +0.00% | checking that all macro invocations are gone |
| 0.000 | 0.000 | +0.00% | checking for inline asm in case the target doesn't support it |
| 0.013 | 0.013 | +0.00% | complete gated feature checking 2 |
| 0.067 | 0.068 | +1.49% | assigning node ids |
| 0.046 | 0.046 | +0.00% | lowering ast -> hir |
| 0.101 | 0.102 | +0.99% | indexing hir |
| 0.000 | 0.000 | +0.00% | attribute checking |
| 0.023 | 0.023 | +0.00% | early lint checks |
| 0.000 | 0.000 | +0.00% | external crate/lib resolution |
| 0.004 | 0.004 | +0.00% | language item collection |
| 0.092 | 0.093 | +1.08% | resolution |
| 0.011 | 0.011 | +0.00% | lifetime resolution |
| 0.000 | 0.000 | +0.00% | looking for entry point |
| 0.000 | 0.000 | +0.00% | looking for plugin registrar |
| 0.030 | 0.030 | +0.00% | region resolution |
| 0.004 | 0.004 | +0.00% | loop checking |
| 0.004 | 0.004 | +0.00% | static item recursion checking |
| 0.105 | 0.105 | +0.00% | type collecting |
| 0.001 | 0.001 | +0.00% | variance inference |
| 0.451 | 0.469 | +3.99% | coherence checking |
| 0.553 | 0.579 | +4.70% | wf checking (old) |
| 0.729 | 0.735 | +0.82% | item-types checking |
| 4.844 | 4.987 | +2.95% | item-bodies checking |
| 0.000 | 0.000 | +0.00% | drop-impl checking |
| 0.999 | 1.035 | +3.60% | wf checking (new) |
| 0.194 | 0.198 | +2.06% | const checking |
| 0.055 | 0.055 | +0.00% | privacy checking |
| 0.009 | 0.009 | +0.00% | stability index |
| 0.041 | 0.042 | +2.43% | intrinsic checking |
| 0.011 | 0.011 | +0.00% | effect checking |
| 0.070 | 0.071 | +1.42% | match checking |
| 0.258 | 0.262 | +1.55% | MIR dump |
| 0.025 | 0.026 | +4.00% | liveness checking |
| 0.408 | 0.416 | +1.96% | borrow checking |
| 0.204 | 0.207 | +1.47% | rvalue checking |
| 0.092 | 0.093 | +1.08% | reachability checking |
| 0.033 | 0.033 | +0.00% | death checking |
| 0.031 | 0.032 | +3.22% | stability checking |
| 0.000 | 0.000 | +0.00% | unused lib feature checking |
| 0.696 | 0.725 | +4.16% | lint checking |
| 11.300 | 11.609 | +2.73% | total real |
| 11.158 | 11.696 | +4.82% | total user |
| With noalias | Without noalias | Change | Pass |
| --: | --: | --: | --- |
| 0.200 | 0.226 | +13.00% | parsing |
| 0.065 | 0.065 | +0.00% | configuration 1 |
| 0.000 | 0.000 | +0.00% | recursion limit |
| 0.003 | 0.003 | +0.00% | gated macro checking |
| 0.000 | 0.000 | +0.00% | crate injection |
| 0.004 | 0.004 | +0.00% | macro loading |
| 0.000 | 0.000 | +0.00% | plugin loading |
| 0.000 | 0.000 | +0.00% | plugin registration |
| 0.248 | 0.253 | +2.01% | expansion |
| 0.009 | 0.009 | +0.00% | complete gated feature checking 1 |
| 0.068 | 0.068 | +0.00% | configuration 2 |
| 0.000 | 0.000 | +0.00% | gated configuration checking |
| 0.031 | 0.031 | +0.00% | maybe building test harness |
| 0.029 | 0.029 | +0.00% | prelude injection |
| 0.004 | 0.004 | +0.00% | checking that all macro invocations are gone |
| 0.000 | 0.000 | +0.00% | checking for inline asm in case the target doesn't support it |
| 0.009 | 0.009 | +0.00% | complete gated feature checking 2 |
| 0.029 | 0.029 | +0.00% | assigning node ids |
| 0.025 | 0.025 | +0.00% | lowering ast -> hir |
| 0.013 | 0.014 | +7.69% | indexing hir |
| 0.000 | 0.000 | +0.00% | attribute checking |
| 0.013 | 0.013 | +0.00% | early lint checks |
| 0.001 | 0.001 | +0.00% | external crate/lib resolution |
| 0.003 | 0.003 | +0.00% | language item collection |
| 0.071 | 0.071 | +0.00% | resolution |
| 0.005 | 0.005 | +0.00% | lifetime resolution |
| 0.000 | 0.000 | +0.00% | looking for entry point |
| 0.000 | 0.000 | +0.00% | looking for plugin registrar |
| 0.015 | 0.016 | +6.66% | region resolution |
| 0.002 | 0.002 | +0.00% | loop checking |
| 0.002 | 0.002 | +0.00% | static item recursion checking |
| 0.038 | 0.038 | +0.00% | type collecting |
| 0.001 | 0.001 | +0.00% | variance inference |
| 0.049 | 0.045 | -8.17% | coherence checking |
| 0.061 | 0.063 | +3.27% | wf checking (old) |
| 0.085 | 0.086 | +1.17% | item-types checking |
| 1.529 | 1.570 | +2.68% | item-bodies checking |
| 0.000 | 0.000 | +0.00% | drop-impl checking |
| 0.171 | 0.181 | +5.84% | wf checking (new) |
| 0.143 | 0.150 | +4.89% | const checking |
| 0.030 | 0.029 | -3.34% | privacy checking |
| 0.006 | 0.004 | -33.34% | stability index |
| 0.013 | 0.014 | +7.69% | intrinsic checking |
| 0.006 | 0.006 | +0.00% | effect checking |
| 0.036 | 0.037 | +2.77% | match checking |
| 0.153 | 0.158 | +3.26% | MIR dump |
| 0.015 | 0.016 | +6.66% | liveness checking |
| 0.262 | 0.271 | +3.43% | borrow checking |
| 0.180 | 0.190 | +5.55% | rvalue checking |
| 0.008 | 0.008 | +0.00% | reachability checking |
| 0.018 | 0.018 | +0.00% | death checking |
| 0.019 | 0.019 | +0.00% | stability checking |
| 0.000 | 0.000 | +0.00% | unused lib feature checking |
| 0.151 | 0.153 | +1.32% | lint checking |
| 3.898 | 4.010 | +2.87% | total real |
| 3.821 | 3.929 | +2.82% | total user |
| With noalias | Without noalias | Change | Pass |
| --: | --: | --: | --- |
| 0.167 | 0.176 | +5.38% | parsing |
| 0.058 | 0.061 | +5.17% | configuration 1 |
| 0.000 | 0.000 | +0.00% | recursion limit |
| 0.004 | 0.003 | -25.00% | gated macro checking |
| 0.000 | 0.000 | +0.00% | crate injection |
| 0.006 | 0.007 | +16.66% | macro loading |
| 0.000 | 0.000 | +0.00% | plugin loading |
| 0.000 | 0.000 | +0.00% | plugin registration |
| 0.741 | 0.773 | +4.31% | expansion |
| 0.019 | 0.019 | +0.00% | complete gated feature checking 1 |
| 0.159 | 0.167 | +5.03% | configuration 2 |
| 0.000 | 0.000 | +0.00% | gated configuration checking |
| 0.078 | 0.080 | +2.56% | maybe building test harness |
| 0.072 | 0.074 | +2.77% | prelude injection |
| 0.009 | 0.009 | +0.00% | checking that all macro invocations are gone |
| 0.000 | 0.000 | +0.00% | checking for inline asm in case the target doesn't support it |
| 0.019 | 0.019 | +0.00% | complete gated feature checking 2 |
| 0.073 | 0.075 | +2.73% | assigning node ids |
| 0.067 | 0.067 | +0.00% | lowering ast -> hir |
| 0.056 | 0.057 | +1.78% | indexing hir |
| 0.000 | 0.000 | +0.00% | attribute checking |
| 0.035 | 0.033 | -5.72% | early lint checks |
| 0.003 | 0.003 | +0.00% | external crate/lib resolution |
| 0.006 | 0.006 | +0.00% | language item collection |
| 0.251 | 0.256 | +1.99% | resolution |
| 0.012 | 0.013 | +8.33% | lifetime resolution |
| 0.000 | 0.000 | +0.00% | looking for entry point |
| 0.000 | 0.000 | +0.00% | looking for plugin registrar |
| 0.060 | 0.063 | +5.00% | region resolution |
| 0.006 | 0.006 | +0.00% | loop checking |
| 0.006 | 0.006 | +0.00% | static item recursion checking |
| 0.064 | 0.064 | +0.00% | type collecting |
| 0.001 | 0.001 | +0.00% | variance inference |
| 0.062 | 0.059 | -4.84% | coherence checking |
| 0.072 | 0.077 | +6.94% | wf checking (old) |
| 0.128 | 0.139 | +8.59% | item-types checking |
| 9.500 | 9.826 | +3.43% | item-bodies checking |
| 0.000 | 0.000 | +0.00% | drop-impl checking |
| 0.535 | 0.560 | +4.67% | wf checking (new) |
| 0.510 | 0.513 | +0.58% | const checking |
| 0.082 | 0.079 | -3.66% | privacy checking |
| 0.011 | 0.011 | +0.00% | stability index |
| 0.033 | 0.029 | -12.13% | intrinsic checking |
| 0.025 | 0.024 | -4.00% | effect checking |
| 0.115 | 0.115 | +0.00% | match checking |
| 0.560 | 0.568 | +1.42% | MIR dump |
| 0.075 | 0.075 | +0.00% | liveness checking |
| 1.085 | 1.113 | +2.58% | borrow checking |
| 0.530 | 0.548 | +3.39% | rvalue checking |
| 0.024 | 0.024 | +0.00% | reachability checking |
| 0.065 | 0.060 | -7.70% | death checking |
| 0.060 | 0.062 | +3.33% | stability checking |
| 0.000 | 0.000 | +0.00% | unused lib feature checking |
| 0.435 | 0.440 | +1.14% | lint checking |
| 16.035 | 16.514 | +2.98% | total real |
| 15.430 | 15.897 | +3.02% | total user |
@dotdash great, thanks! This is sort of inline with my expectations. I also expect we'll see more impact on smaller examples.
As an aside, at the work week I got a lot of great suggestions for benchmarks, I have to try and draw them up together into a central list now.
Assigning to myself for the purpose of gathering up a few benchmarks.
Doesn't the above patch remove way more noalias annotations than is necessary? In particular, The noalias on returns from Box::new and the noalias on shared references should be much less critical, and not be affected by the LLVM bug at hand. I can't quite tell where many of the other noalias used to be added.
So I ran the benchmarks from https://github.com/cristicbz/rust-doom/tree/rustc-benchmark
with a current master build and one with the above patch applied.
master:
test freedoom1 ... bench: 396,057,607 ns/iter (+/- 14,013,493)
test freedoom2 ... bench: 508,963,398 ns/iter (+/- 19,068,029)
Patched:
test freedoom1 ... bench: 416,906,637 ns/iter (+/- 5,445,396)
test freedoom2 ... bench: 530,530,952 ns/iter (+/- 8,757,330)
That is interesting. Looks like about a 5% slowdown.
This patch also removes NoAlias for "Box pointer return values" and for shared references. Neither of them are affected by the problem discussed here, and I assume at least Box return values are not even controversial. (The other places NoAlias is removed, I could not recognize; mind you I did not spend much time analyzing that.)
Maybe it'd be interesting to see how big the impact of just disabling NoAlias for shared borrows is?
EDIT: _oops_ obviously I mixed up shared and mutable references,
Time for some new numbers. This time only removing the noalias attribute from mutable borrows as suggested by @RalfJung. I've been slightly sceptical about that the return value ones being ok to be marked as noalias, but that was because I thought of them as an out-parameter (like in the LLVM IR) instead of the return values they are in the rust code. The problematic cases don't arise with return values because we only write to them at the end of the function.
diff --git a/src/librustc_trans/trans/attributes.rs b/src/librustc_trans/trans/attributes.rs
index 28dfa4e..95c3941 100644
--- a/src/librustc_trans/trans/attributes.rs
+++ b/src/librustc_trans/trans/attributes.rs
@@ -265,7 +265,7 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
// on memory dependencies rather than pointer equality
let interior_unsafe = mt.ty.type_contents(ccx.tcx()).interior_unsafe();
- if mt.mutbl == hir::MutMutable || !interior_unsafe {
+ if mt.mutbl != hir::MutMutable && !interior_unsafe {
attrs.arg(idx, llvm::Attribute::NoAlias);
}
Current master (rustc 1.8.0-dev (d0ef74026 2016-02-04)):
test freedoom1 ... bench: 384,162,874 ns/iter (+/- 11,155,763)
test freedoom2 ... bench: 492,713,445 ns/iter (+/- 18,825,658)
Patched master:
test freedoom1 ... bench: 387,141,369 ns/iter (+/- 4,549,142)
test freedoom2 ... bench: 493,736,564 ns/iter (+/- 6,199,499)
So basically no difference in performance. I'll make an PR for this.
Thanks for persevering, @dotdash!
On Thu, Feb 04, 2016 at 02:48:48PM -0800, Björn Steinbrink wrote:
So basically no difference in performance. I'll make an PR for this.
Great.