Runtime: WPF Applications crashing when loading DWrite

Created on 20 May 2019  Â·  30Comments  Â·  Source: dotnet/runtime

  • .NET Core Version: 3.0.100-preview6-012012, regression from preview5, exact build that regresses unknown at this time.
  • Windows version: N/A
  • Does the bug reproduce also in WPF for .NET Framework 4.8?:No

Problem description:
When a WPF application loads DirectWriteForwarder, the module constructor attempts to load DirectWrite and instantiate the appropriate factory for use with font rendering. In this build, the invocation of the creation function for the factory fails with an invalid cast HResult.

The offending function is in Factory,cpp on line 122 (pasted here as this source is not currently available):
```c#
__declspec(noinline) void Factory::Initialize(
FactoryType factoryType
)
{
IUnknown* factoryTemp;
DWRITECREATEFACTORY pfnDWriteCreateFactory = (DWRITECREATEFACTORY)GetDWriteCreateFactoryFunctionPointer();

    HRESULT hr = (*pfnDWriteCreateFactory)(
        DWriteTypeConverter::Convert(factoryType),
        (REFIID)(*(_guidForIDWriteFactory->Value)),
        &factoryTemp
        );

    ConvertHresultToException(hr, "Factory::Initialize");

    _pFactory = (IDWriteFactory*)factoryTemp;
}

```
Actual behavior: Invalid cast exception.

Expected behavior: DirectWrite is initialized.

Minimal repro: https://github.com/rladuca/DWriteIssue

@dotnet/wpf-developers

area-Interop-coreclr blocking bug

Most helpful comment

@rladuca @vatsan-madhavan If you need a quick fix you may want to just pass the IID directly, without involving _guidForIDWriteFactory. The justification for that complication that I see in the reference source

            /// This variable stores the GUID of the IDWriteFactory interface.
            /// The reason we are not using __uuidof(IDWriteFactory) is because the complier generates a global
            /// variable and a static method to initialize it which is not annotated properly with security tags.
            /// This makes the static method fail NGENing and causes Jitting which affects perf.
            /// If the complier gets fixed then we can remove this scheme and use __uuidof(IDWriteFactory).

is likely irrelevant in .NET Core where security is no longer relevant.

All 30 comments

I was able to repro this in 3.0.100-preview6-011846. Replacing DirectWriteForwarder.dll from an older build (specifically, 3.0.100-preview6-012013) resolves the problem.

This doesn't seem related to loading of dwrite - dwrite has already been loaded when this error happens.

Nor does this seem related to anything in coreclr - attempting to swap out to an older version of coreclr makes no difference.

@tgani-msft, I see a lot of unexplained IL differences in DirectWriteForwarder between those two versions.

Changing milestone from 3.0 => Preview. This will be a Preview 6 blocker

It's fixed in 3.0.100-preview6-012016

Closing as this is fixed.

@jeffschwMSFT, was there a problem/bug in coreclr recently that might have caused this?

/cc @ericstj
/cc @dotnet/wpf-developers

We have not made any recent fixes that look similar to this issue.

3.0.100-preview6-011846 reproduces this for anyone - WPF is completely dead on this version of core-sdk.
3.0.100-preview6-012016 resolved it.

We have no idea what changed underneath.

@tgani-msft, can you think of anything related to the C++ compiler/tools that might have been the cause for this?

We have a report that 3.0.100-preview6-012027 is affected again. Reactivating.

Since it happens that I run into this exception myself I took a quick look out of curiosity. It would seem that DWriteCreateFactory is called with an invalid IID and thus the call fails with E_NOINTERFACE.

The IID doesn't seem to be completely invalid, it's more like truncated - 4 bytes are fine and the rest are 0. That smalls like bad code generation to me. I'll try to take a closer look.

This is a bug in the JIT compiler. The static constructor of Factory

static Factory::Factory()
{
    System::Guid guid = System::Guid("b859ee5a-d838-4b5b-a2e8-1adc7d93db48");
    _GUID* pGuidForIDWriteFactory = new _GUID();
    *pGuidForIDWriteFactory = Native::Util::ToGUID(guid);
    _guidForIDWriteFactory = gcnew NativePointerWrapper<_GUID>(pGuidForIDWriteFactory);  
}

contains a 16 byte cpblk

    IL_0034: ldloc.0
    IL_0035: ldloca.s 3
    IL_0037: ldc.i4.s 16
    IL_0039: unaligned. 4
    IL_003c: cpblk

that copies a _GUID to *pGuidForIDWriteFactory. The JIT gets confused by the fact that the _GUID struct contains a single int field and copies only 4 bytes instead of 16.

fgMorphCopyBlock:block assignment to morph:
               [000037] ----G+-N----              /--*  LCL_VAR   struct(AX)(P) V03 loc3         
                                                  /--*    int    V03.<alignment member> (offs=0x00) -> V08 tmp3         
               [000041] -A-XG-------              *  ASG       struct (copy)
               [000040] ---X-+-N----              \--*  BLK(16)   struct
               [000036] -----+------                 \--*  LCL_VAR   long   V00 loc0         
 (srcDoFldAsg=true) using field by field assignments.

fgMorphCopyBlock (after):
               [000096] ----G--N----              /--*  LCL_VAR   int   (AX) V08 tmp3         
               [000097] -A-XG+------              *  ASG       int   
               [000095] *--X---N----              \--*  IND       int   
               [000093] ------------                 |  /--*  CNS_INT   long   0 Fseq[<alignment member>]
               [000094] ------------                 \--*  ADD       byref 
               [000092] ------------                    \--*  LCL_VAR   long   V00 loc0         

@CarolEidt FYI I'll take look. In theory this shouldn't be difficult to fix but I wouldn't surprised to find that it is - we need to distinguish between an actual cpblk (which under no circumstances can ignore holes) and a cpobj which may do field by field copy.

@rladuca @vatsan-madhavan If you need a quick fix you may want to just pass the IID directly, without involving _guidForIDWriteFactory. The justification for that complication that I see in the reference source

            /// This variable stores the GUID of the IDWriteFactory interface.
            /// The reason we are not using __uuidof(IDWriteFactory) is because the complier generates a global
            /// variable and a static method to initialize it which is not annotated properly with security tags.
            /// This makes the static method fail NGENing and causes Jitting which affects perf.
            /// If the complier gets fixed then we can remove this scheme and use __uuidof(IDWriteFactory).

is likely irrelevant in .NET Core where security is no longer relevant.

@mikedn Thank you for the detailed analysis and suggested fix. I am going to make the change and test locally to see if everything works. Do you foresee any other side effects of the codegen issue? Is this just a very specific case or is there a broader general issue we should be watching for (in maybe other unrelated structs)?

Also, can the compiler (/cc @tgani-msft) do something to avoid tripping this or related bugs?

Do you foresee any other side effects of the codegen issue? Is this just a very specific case or is there a broader general issue we should be watching for (in maybe other unrelated structs)?

I wouldn't be surprised to find more similar bugs in the dwrite forwarder code, it's written in C++/CLI and the IL produced by VC++ compiler is somewhat unusual.

I don't quite understand why this happened only now. I don't think there was any recent JIT change that could have caused this, maybe @CarolEidt would remember as she's familiar with this part of the JIT.

I suppose it may have caused by a VC++ change as well, the _GUID struct is:

c# [StructLayout(LayoutKind.Sequential, Size = 16)] [NativeCppClass] [UnsafeValueType] internal struct _GUID { private int <alignment member>; }
I don't remember seeing such "alignment member" fields in the past. But maybe it's only my memory, it's been a while since I looked at C++/CLI generated code.

P.S.
Nope, it's not VC++ related. The _GUID struct and the IL code seem to be identical in .NET 4.8.

@mikedn This seems to be ping-ponging for some reason. I just installed 3.0.100-preview6-012028 and I am not getting the crash (sans fix).

@arpitmathur and @vatsan-madhavan Also confirmed this.

Maybe that will help in diagnosing some difference?

It would be nice to have minimal repro (ideally without WPF), so that JIT team can take closer look.
@mikedn are you able to craft it? (and do you have the time?)

cc @RussKeldorph @jkotas @CarolEidt @BruceForstall as this is likely JIT bug

@dotnet/jit-contrib

We are observing an issue where static type constructors are NOT called:

Here is the problem (it repro in WinForms also):

public class MyWfpControl : Control
{
   static int foo = Goo();
   ….
}

When the type MyWpfControl is loaded by the .NET core, the static type initializer Goo() is not called.

We think is a dot net core problem because of 2 reasons:

  1. Started happening after upgrading to a newer version of dotnet core.
  2. It repro in both WinForms and WPF.

@ocalvo Could you please share complete repro? There is nothing that guarantees the static type initializer Goo() to be called in your example. Your example has "before field init" static constructor. Such static constructor is guaranteed to be called only when the field foo is accessed.

@ocalvo, which version of sdk did you start noticing your problem?

It would be nice to have minimal repro (ideally without WPF), so that JIT team can take closer look.

C# only (no WPF, no C++/CLI):
```C#
unsafe class Program
{
[StructLayout(LayoutKind.Sequential, Size = 16)]
struct GUID
{
private int align;
}

static void Main()
{
    GUID g = default;
    Test(&g);
    Console.WriteLine(Unsafe.As<GUID, Guid>(ref g));
}

[MethodImpl(MethodImplOptions.NoInlining)]
static GUID GetGUID(ref Guid guid) => Unsafe.As<Guid, GUID>(ref guid);

[MethodImpl(MethodImplOptions.NoInlining)]
static unsafe void Test(GUID* result)
{
    Guid g = Guid.NewGuid();
    GUID guid = GetGUID(ref g);
    //*result = guid;
    Unsafe.CopyBlock(result, &guid, (uint)sizeof(GUID));
}

}
`` This prints something like02551bc7-0000-0000-0000-000000000000, like the IID passed toDWriteCreateFactory` this has 4 valid bytes and the rest are all 0.

Still using cpblk, like in the C++/CLI version. The issue can be reproduced without cpblk but it's debatable if copying such a struct requires copying the entire struct or only its sole field. Obviously this is not up for debate in the case of cpblk, it must copy all 16 bytes.

This seems to be ping-ponging for some reason. I just installed 3.0.100-preview6-012028 and I am not getting the crash (sans fix).

So far I haven't found anything that would explain any kind of random behavior. It really looks like there's a hole in the JIT/VM logic - the JIT does take some precautions to not promote such structs, exactly due to C++/CLI, but it relies on the VM classifying the struct as having custom layout. Except that the VM doesn't classify GUID as "custom layout", despite it having explicit size. This has something to do with a certain VMFLAG_NOT_TIGHTLY_PACKED flag that I don't quite understand when it is supposed to be set.

Any ideas @jkotas? Shouldn't GUID be CORINFO_FLG_CUSTOMLAYOUT?

Here is the sample code. You need to uncomment these lines.

@ocalvo, which version of sdk did you start noticing your problem?

@marb2000, @azchohfi do you remember which version specifically of the dontnet we started seeing the problem?

By looking at the toolkit history I want to say "3.0.100-preview5-011568"

preview 4 worked without this workaround, AFAIR.

The C# static field issue has nothing to do with this issue. Please ignore.

Shouldn't GUID be CORINFO_FLG_CUSTOMLAYOUT?

Yes, it should.

This is regression introduced by https://github.com/dotnet/coreclr/pull/23013/ . Before this change, pEEClassLayoutInfoOut->SetHasExplicitSize(TRUE) got called for this struct that in turn set the custom layout flag. The SetHasExplicitSize(TRUE) call was deleted in this change with no replacement.

Note that the bug repros only with optimizations on. You have to turn off tiered JITing; or run the repro in a loop; to see it.

@jkoritzinsky Could you please take a look?

cc @AaronRobinsonMSFT @jeffschwMSFT

@dotnet-actwx-bot FYI

Hi, I'm still getting the error in net core 3.1.8, easy to repeat in C#

Typeface font = new Typeface("Arial");

Was there a fix/work around?
Thanks

@BentleyJamesSlater since this fix is in 3.0/3.1, and the issue is closed, I suggest to open a new issue with as much info as possible eg., callstack.

Was this page helpful?
0 / 5 - 0 ratings