New in netcoreapp2.1 (doesn't impact netcoreapp2.0 or net47)
Full repro below; synopsis: fixed buffers go pretty wild when used on a field in a key in a Dictionary<TKey,TValue> (and possibly some other scenarios), causing:
foreach) that GetHashCode() and Equals() are working correctlySince this could affect many more scenarios, I'd wager this is a pretty serious regression.
Code:
```c#
// #define EXPLICIT_FIELDS
// Hypothesis: JIT bug affecting how fixed buffers are handled
//
// Affects: netcoreapp2.1, debug and release
// Does not seem to affect: netcoreapp2.0, net47
//
// the idea behind CommandBytes is that it is a fixed-sized string-like thing
// used for matching commands; it is implemented as a fixed buffer
// of longs, but: the first byte of the first element is coerced into
// a byte and used to store the length; the actual text payload (ASCII)
// starts at the second byte of the first element
//
// as far as I can tell, it is all validly implemented, and it works fine
// in isolation, however: when used in a dictionary, it goes bad;
// - items not being found despite having GetHashCode and Equals match
// - items over 1 chunk size becoming corrupted (see: ToInnerString)
//
// however, if I replace the fixed buffer with the same number of
// regular fields (_c0,_c1,_c2) and use all the same code, it
// all works correctly! to see this, define EXPLICIT_FIELDS
//
// The "Main" method populates a dictionary in the expected way,
// then attempts to find things - either via TryGetValue or manually;
// it then compares the contents
//
// Yes, this code is evil; it is for a very specific optimized scenario.
using System;
using System.Collections.Generic;
using System.Text;
unsafe struct CommandBytes : IEquatable
{
private const int ChunkLength = 3;
public const int MaxLength = (ChunkLength * 8) - 1;
fixed long _chunks[ChunkLength];
long _c0, _c1, _c2;
public override int GetHashCode()
{
fixed (long* lPtr = _chunks)
fixed (long* lPtr = &_c0)
{
var hashCode = -1923861349;
long* x = lPtr;
for (int i = 0; i < ChunkLength; i++)
{
hashCode = hashCode * -1521134295 + (*x++).GetHashCode();
}
return hashCode;
}
}
public override string ToString()
{
fixed (long* lPtr = _chunks)
fixed (long* lPtr = &_c0)
{
var bPtr = (byte*)lPtr;
return Encoding.ASCII.GetString(bPtr + 1, bPtr[0]);
}
}
public int Length
{
get
{
fixed (long* lPtr = _chunks)
fixed (long* lPtr = &_c0)
{
var bPtr = (byte*)lPtr;
return bPtr[0];
}
}
}
public byte this[int index]
{
get
{
fixed (long* lPtr = _chunks)
fixed (long* lPtr = &_c0)
{
byte* bPtr = (byte*)lPtr;
int len = bPtr[0];
if (index < 0 || index >= len) throw new IndexOutOfRangeException();
return bPtr[index + 1];
}
}
}
public CommandBytes(string value)
{
value = value.ToLowerInvariant();
var len = Encoding.ASCII.GetByteCount(value);
if (len > MaxLength) throw new ArgumentOutOfRangeException("Maximum command length exceeed");
fixed (long* lPtr = _chunks)
_c0 = _c1 = _c2 = 0L;
fixed (long* lPtr = &_c0)
{
Clear(lPtr);
byte* bPtr = (byte*)lPtr;
bPtr[0] = (byte)len;
fixed (char* cPtr = value)
{
Encoding.ASCII.GetBytes(cPtr, value.Length, bPtr + 1, len);
}
}
}
public override bool Equals(object obj) => obj is CommandBytes cb && Equals(cb);
public string ToInnerString()
{
fixed (long* lPtr = _chunks)
fixed (long* lPtr = &_c0)
{
long* x = lPtr;
var sb = new StringBuilder();
for (int i = 0; i < ChunkLength; i++)
{
if (sb.Length != 0) sb.Append(',');
sb.Append(*x++);
}
return sb.ToString();
}
}
public bool Equals(CommandBytes value)
{
fixed (long* lPtr = _chunks)
fixed (long* lPtr = &_c0)
{
long* x = lPtr;
long* y = value._chunks;
long* y = &value._c0;
for (int i = 0; i < ChunkLength; i++)
{
if (*x++ != *y++) return false;
}
return true;
}
}
private static void Clear(long* ptr)
{
for (int i = 0; i < ChunkLength; i++)
{
*ptr++ = 0L;
}
}
}
static class Program
{
static void Main()
{
var lookup = new Dictionary
void Add(string val)
{
var cb = new CommandBytes(val);
// prove we didn't screw up
if (cb.ToString() != val)
throw new InvalidOperationException("oops!");
lookup.Add(cb, val);
}
Add("client");
Add("cluster");
Add("command");
Add("config");
Add("dbsize");
Add("decr");
Add("del");
Add("echo");
Add("exists");
Add("flushall");
Add("flushdb");
Add("get");
Add("incr");
Add("incrby");
Add("info");
Add("keys");
Add("llen");
Add("lpop");
Add("lpush");
Add("lrange");
Add("memory");
Add("mget");
Add("mset");
Add("ping");
Add("quit");
Add("role");
Add("rpop");
Add("rpush");
Add("sadd");
Add("scard");
Add("select");
Add("set");
Add("shutdown");
Add("sismember");
Add("spop");
Add("srem");
Add("strlen");
Add("subscribe");
Add("time");
Add("unlink");
Add("unsubscribe");
void HuntFor(string lookFor)
{
Console.WriteLine($"Looking for: '{lookFor}'");
var hunt = new CommandBytes(lookFor);
if (lookup.TryGetValue(hunt, out var found))
{
Console.WriteLine($"Found via TryGetValue: '{found}'");
}
else
{
Console.WriteLine("**NOT FOUND** via TryGetValue");
}
Console.WriteLine("looking manually");
foreach (var pair in lookup)
{
if (pair.Value == lookFor)
{
Console.WriteLine($"Found manually: '{pair.Value}'");
var key = pair.Key;
void Compare<T>(string caption, Func<CommandBytes, T> func)
{
T x = func(hunt), y = func(key);
Console.WriteLine($"{caption}: {EqualityComparer<T>.Default.Equals(x, y)}, '{x}' vs '{y}'");
}
Compare("GetHashCode", _ => _.GetHashCode());
Compare("ToString", _ => _.ToString());
Compare("Length", _ => _.Length);
Compare("ToInnerString", _ => _.ToInnerString());
Console.WriteLine($"Equals: {key.Equals(hunt)}, {hunt.Equals(key)}");
var eq = EqualityComparer<CommandBytes>.Default;
Console.WriteLine($"EqualityComparer: {eq.Equals(key, hunt)}, {eq.Equals(hunt, key)}");
Compare("eq GetHashCode", _ => eq.GetHashCode(_));
}
}
Console.WriteLine();
}
HuntFor("ping");
HuntFor("subscribe");
}
}
Expected output:
Looking for: 'ping'
Found via TryGetValue: 'ping'
looking manually
Found manually: 'ping'
GetHashCode: True, '-1991953770' vs '-1991953770'
ToString: True, 'ping' vs 'ping'
Length: True, '4' vs '4'
ToInnerString: True, '444234035204,0,0' vs '444234035204,0,0'
Equals: True, True
EqualityComparer: True, True
eq GetHashCode: True, '-1991953770' vs '-1991953770'
Looking for: 'subscribe'
Found via TryGetValue: 'subscribe'
looking manually
Found manually: 'subscribe'
GetHashCode: True, '764962383' vs '764962383'
ToString: True, 'subscribe' vs 'subscribe'
Length: True, '9' vs '9'
ToInnerString: True, '7598244868551701257,25954,0' vs '7598244868551701257,25954,0'
Equals: True, True
EqualityComparer: True, True
eq GetHashCode: True, '764962383' vs '764962383'
(note `Found via TryGetValue`, and lots of `True` indicating no corruption)
Actual output on netcoreapp2.1
Looking for: 'ping'
NOT FOUND via TryGetValue
looking manually
Found manually: 'ping'
GetHashCode: True, '-1991953770' vs '-1991953770'
ToString: True, 'ping' vs 'ping'
Length: True, '4' vs '4'
ToInnerString: True, '444234035204,0,0' vs '444234035204,0,0'
Equals: True, True
EqualityComparer: True, True
eq GetHashCode: True, '-1991953770' vs '-1991953770'
Looking for: 'subscribe'
NOT FOUND via TryGetValue
looking manually
Found manually: 'subscribe'
GetHashCode: False, '764962383' vs '945069981'
ToString: False, 'subscribe' vs 'subscri '
Length: True, '9' vs '9'
ToInnerString: False, '7598244868551701257,25954,0' vs '7598244868551701257,0,0'
Equals: False, False
EqualityComparer: False, False
eq GetHashCode: False, '764962383' vs '945069981'
```
Note the **NOT FOUND** for both; the first item doesn't have data corruption (but fails anyway) - the second item (longer, hits 2 elements) has become corrupted.
This repros with netcoreapp3.0 with the nightly sdk 3.0.0-preview1-26710-03 on my system too.
.NET Core SDK (gem盲脽 "global.json"):
Version: 2.2.100-refac-20180613-1
Commit: c8899f8124
Laufzeitumgebung:
OS Name: Windows
OS Version: 10.0.17134
OS Platform: Windows
RID: win10-x64
Base Path: C:\Users\Tornhoof\Downloads\dotnet-sdk-latest-win-x64\sdk\3.0.100-alpha1-009105\
Host (useful for support):
Version: 3.0.0-preview1-26710-03
Commit: d3b49d62b2
.NET Core SDKs installed:
[...]
3.0.100-alpha1-009105 [C:\Users\Tornhoof\Downloads\dotnet-sdk-latest-win-x64\sdk]
.NET Core runtimes installed:
[...]
Microsoft.NETCore.App 3.0.0-preview1-26710-03 [C:\Users\Tornhoof\Downloads\dotnet-sdk-latest-win-x64\shared\Microsoft.NETCore.App]
Looking for: 'ping'
**NOT FOUND** via TryGetValue
looking manually
Found manually: 'ping'
GetHashCode: True, '-1991953770' vs '-1991953770'
ToString: True, 'ping' vs 'ping'
Length: True, '4' vs '4'
ToInnerString: True, '444234035204,0,0' vs '444234035204,0,0'
Equals: True, True
EqualityComparer: True, True
eq GetHashCode: True, '-1991953770' vs '-1991953770'
Looking for: 'subscribe'
**NOT FOUND** via TryGetValue
looking manually
Found manually: 'subscribe'
GetHashCode: True, '945069981' vs '945069981'
ToString: True, 'subscri ' vs 'subscri '
Length: True, '9' vs '9'
ToInnerString: True, '7598244868551701257,0,0' vs '7598244868551701257,0,0'
Equals: True, True
EqualityComparer: True, True
eq GetHashCode: True, '945069981' vs '945069981'
@Tornhoof interestingly though - the data corruption issue ~looks to be resolved~. It just didn't work :)
edit: oof; scratch that; the data corruption issue got worse - it now corrupts identically inside and outside the dictionary! see the 'subscri ' and the '7598244868551701257,0,0' (meaning: the last 2 segments are 0, but this is using 10 bytes, so the second segment should not be zero)
I don't get the data corruption on 2.1 either, but i'd expect it to be named 'subscribe' in the ToString()?
.NET Core SDK (gem盲脽 "global.json"):
Version: 2.1.400-preview-009171
Commit: 6f5d38734b
Laufzeitumgebung:
OS Name: Windows
OS Version: 10.0.17134
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\2.1.400-preview-009171\
Host (useful for support):
Version: 2.1.2
Commit: 811c3ce6c0
Looking for: 'ping'
**NOT FOUND** via TryGetValue
looking manually
Found manually: 'ping'
GetHashCode: True, '-1991953770' vs '-1991953770'
ToString: True, 'ping' vs 'ping'
Length: True, '4' vs '4'
ToInnerString: True, '444234035204,0,0' vs '444234035204,0,0'
Equals: True, True
EqualityComparer: True, True
eq GetHashCode: True, '-1991953770' vs '-1991953770'
Looking for: 'subscribe'
**NOT FOUND** via TryGetValue
looking manually
Found manually: 'subscribe'
GetHashCode: True, '945069981' vs '945069981'
ToString: True, 'subscri ' vs 'subscri '
Length: True, '9' vs '9'
ToInnerString: True, '7598244868551701257,0,0' vs '7598244868551701257,0,0'
Equals: True, True
EqualityComparer: True, True
eq GetHashCode: True, '945069981' vs '945069981'
@Tornhoof that is the corruption :)
Ok, then the corruption is on my system too, I just was confused because in your output you had one "correct" subscribe string (in the ToString line), in mine both are invalid.
I can repro this in coreclr master, and complus_jitminopts=1 fixes things, so it does seem to be related to jit optimization.
Currently suspect there's a missing check in struct promotion, or perhaps we missed seeing an address exposure.
Could be a consequence of dotnet/coreclr#16220 where we started allowing promotion more broadly, which jives with this being 2.1 or later behavior.
Local var header for many methods shows we've promoted we think is an 8 byte _chunks field.
; Assembly listing for method Program:<Main>g__Add|0_0(ref,byref)
; ...
;* V10 tmp7 [V10 ] ( 0, 0 ) struct (24) zero-ref unsafe-buffer
; V11 tmp8 [V11,T06] ( 4, 4 ) bool -> rcx
; V12 tmp9 [V12 ] ( 3, 3 ) long -> [rsp+0x28] do-not-enreg[XF] addr-exposed ptr V02._chunks(offs=0x00) P-DEP
; V13 tmp10 [V13,T11] ( 2, 2 ) long -> rdx V10._chunks(offs=0x00) P-INDEP
Per metadata, CommandBytes is a 24 byte struct with one 24 byte _chunks field. That field is a struct <chunks>e__FixedBuffer with one 8 byte field FixedElementField. Inner struct is marked as an unsafe value type and it looks like runtime propagates this up to the outer struct too.
lvaCanPromoteStructType decides we can unwrap the inner struct and replace it with its single field, so we end up with a 24 byte promotable instance with one 8 byte field and "with holes".
Hence _chunks loses its last 16 bytes.
So two possible avenues towards a fix (may need both):
Just blocking promotion fixes this test case. Still not sure it's a completely general fix, but it looks like a step in the right direction.
@mgravell is it ok if I use your example as a regression test case here in CoreCLR?
@AndyAyersMS absolutely fine
@AndyAyersMS possibly a silly question (I'm way out of my depth when I look at the JIT implementation), but: would this change accidentally prevent promotion when the size difference is just due to padding/alignment?
No, not silly....
It is not always easy for the jit to know why there is extra space in the struct and whether or not user code might attempt to access it somehow. So we try to be careful, but we also want to make sure we can promote as many cases as possible, as promotion is a key to optimizing structs aggressively.
The assumption we make with dotnet/coreclr#19156 is that if the there is just one field at offset 0 and the enclosing struct is larger than the field, the user code might access the extra space. And additionally, we only do this when the enclosing struct is itself enclosed by a struct.
When we make changes like this we also examine the jit generated code on a large number of methods (via the jit-diff tools from jitutils ... and I didn't see any code changes in my initial scan. I'll do a bit broader sweep in a bit and post the results over on the PR.
That calls into question whether we properly handle the case where we don't have the outer enclosing struct. I think we're ok here -- we have made fixes for this case recently, see for instance dotnet/runtime#9666. But perhaps we should invest in some additional testing.
Should be fixed via dotnet/coreclr#19156.
Description
Small structs (<= 64 bytes) containing fixed fields may be incorrectly promoted by the jit, leaving the trailing part of the struct uninitialized in some struct copies.
Customer Impact
Struct contents are incorrect, can cause crashes or other unexpcted behavior.
Regression?
Regression in 2.1
Risk
Low: fixed struct construct is not commonly used; no diffs across all existing framework and coreclr code.
2.1 PR: dotnet/coreclr#19224
Reopen to consider porting fix to release/2.1 servicing.
Approved for 2.1.4
Fix is now merged into the 2.1 branch via dotnet/coreclr#19224.
Most helpful comment
I can repro this in coreclr master, and
complus_jitminopts=1fixes things, so it does seem to be related to jit optimization.Currently suspect there's a missing check in struct promotion, or perhaps we missed seeing an address exposure.
Could be a consequence of dotnet/coreclr#16220 where we started allowing promotion more broadly, which jives with this being 2.1 or later behavior.
Local var header for many methods shows we've promoted we think is an 8 byte
_chunksfield.