Runtime: Do not use coreclr.{dll/so/dylib} name for Mono runtime

Created on 27 Mar 2020 · 23Comments · Source: dotnet/runtime

Reusing CoreCLR name for Mono is confusing
There are number of places (e.g. diagnostic/debuggers) that look for coreclr.dll. When they see coreclr.dll that is not actually coreclr.dll, they will likely break in creative ways.

cc @leculver @dotnet/dotnet-diag

area-VM-meta-mono

Source

jkotas

👍8

Most helpful comment

@steveisok Couple of questions about the intent of the changes:

Do we plan to support mono VM in FDD (shared framework)? I hope not, as that would introduce lot of new challenges?
If we don't support mono VM FDD, then having a command line option to switch between the VMs doesn't seem to be useful as SCD (self-contained) apps will not come with two runtimes.
As such I don't see a need to pass any new info between hostfxr and hostpolicy
The code seems to suggest that the mono VM lives in a mono subdirectory (or maybe I'm reading this wrong)?

If we plan to support FDD with mono VM, then that's a whole other problem with probably large number of hosting changes required (depending on how the Mono VM's NETCore.App framework would be named, versioned and how it would be installed).

If we plan to support only SCD with mono VM, then I think we should take one of the two approaches:

Share hostpolicy between the two runtimes fully - in that case hostpolicy would simply look for both coreclr.dll and mono.dll (or the respective OS variants of these names). It would use the one it finds (presumably coreclr.dll would be tested first purely for backward compat reasons).
- Pros: shared code, no need to build two versions of hostpolicy. Easier to test as most functionality of hostpolicy would be identical between the two runtimes.
- Cons: ideally the ABI of coreclr and mono would not be too different. Hostpolicy could adapt the ABI based on the filename, but it would better to not diverge too much. In theory we would need to support this "shared hostpolicy" moving forward, but in reality this is probably non-issue as hostpolicy versions and ships with the runtime anyway.
Have different hostpolicy for each runtime. Mono would ship its own version of hostpolicy. Both versions would expose the same ABI to hostfxr, but otherwise they could be entirely different.
- Pros: complete freedom in the runtime ABI and behavior. The interface between hostpolicy and the runtimes could be entirely different.
- Cons: Necessity to build hostpolicy twice - probably with some code share via source code sharing. Complicates build, delivery. More complex testing as we would effectively need two test suites covering the same scenarios separately.

I personally would prefer the first option - we share the hostpolicy - we only teach it to recognize both runtime names. If we have the ABIs exactly the same, that's the only change necessary. We could diverge the ABIs slightly if necessary (but not too much).

Note that in either case we need to modify hostfxr to recognize both runtime names and treat them as identical. This is unfortunate legacy behavior where hostfxr uses the presence of coreclr.dll in the app as one of the clues to detect self-contained apps (while .runtimconfig.json should be the only source of truth really). You already made this change in the above branch.

vitek-karas on 25 May 2020

👍4

All 23 comments

I think this also affects the hosting layer. For hostpolicy to be able to load Mono when it's not named coreclr.dll/so/dylib, it will have to know how to identify the mono runtime and load it (preferably using the same APIs as CoreCLR provides).

jkoritzinsky on 27 Mar 2020

We need to avoid naming Mono coreclr.dll at all costs.

Not only does it confuse a lot of diagnostic tools we've built, it will also imply work to go "fix" the same problems in partner teams. This would include places like WinDbg, Watson, and !analyze. We've already burned a lot of our goodwill recently with some of the features we've built and not informed partner teams, creating unexpected work for them long after they had a chance to give input into the process. (For example not building consensus around Portable PDBs with partner teams before it was built, among other examples. We shouldn't repeat that mistake here.)

Aside from diagnostic tools and partner teams, this also could potentially affect our ability to properly diagnose Watson failures. Watson uses a module's filename and version alone to bucket failures so we'd potentially collide with other products' coreclr.dll. Note this problem is why we went out of our way to re-version coreclr.dll to a seemingly nonsensical values (e.g. 4.700.20.6603) to avoid collisions with the previous Silverlight versions, we are going to cause even more problems if the file version of a Mono coreclr.dll happens to be even _near_ any of the other versions of coreclr.dll we've shipped. It will effectively mix all failures together making it very, very difficult to tell what our actual failures are in the products where these version/filenames collide.

We've been down this road before and we know it's a bad idea that will cause a lot of problems for us and our partner teams. Let's just not do this and not have the headache (or revert it now if it's already been done).

leculver on 28 Mar 2020

👍3

cc: @jeffschwMSFT @vitek-karas @swaroop-sridhar for possibly required hosting changes to enable discovering a different name for the mono runtime when running on desktop.

jkoritzinsky on 21 May 2020

I actually messed around trying to learn the host and got it to load libMono.

I may have some local changes that aren't pushed, but this is a (hacked) start... maybe :-).

https://github.com/steveisok/runtime/tree/dotnet-vm-switch

steveisok on 25 May 2020

@steveisok Couple of questions about the intent of the changes:

Do we plan to support mono VM in FDD (shared framework)? I hope not, as that would introduce lot of new challenges?
If we don't support mono VM FDD, then having a command line option to switch between the VMs doesn't seem to be useful as SCD (self-contained) apps will not come with two runtimes.
As such I don't see a need to pass any new info between hostfxr and hostpolicy
The code seems to suggest that the mono VM lives in a mono subdirectory (or maybe I'm reading this wrong)?

If we plan to support only SCD with mono VM, then I think we should take one of the two approaches:

Share hostpolicy between the two runtimes fully - in that case hostpolicy would simply look for both coreclr.dll and mono.dll (or the respective OS variants of these names). It would use the one it finds (presumably coreclr.dll would be tested first purely for backward compat reasons).
- Pros: shared code, no need to build two versions of hostpolicy. Easier to test as most functionality of hostpolicy would be identical between the two runtimes.
- Cons: ideally the ABI of coreclr and mono would not be too different. Hostpolicy could adapt the ABI based on the filename, but it would better to not diverge too much. In theory we would need to support this "shared hostpolicy" moving forward, but in reality this is probably non-issue as hostpolicy versions and ships with the runtime anyway.
Have different hostpolicy for each runtime. Mono would ship its own version of hostpolicy. Both versions would expose the same ABI to hostfxr, but otherwise they could be entirely different.
- Pros: complete freedom in the runtime ABI and behavior. The interface between hostpolicy and the runtimes could be entirely different.
- Cons: Necessity to build hostpolicy twice - probably with some code share via source code sharing. Complicates build, delivery. More complex testing as we would effectively need two test suites covering the same scenarios separately.

vitek-karas on 25 May 2020

👍4

I think the first option makes sense for our use case. I don’t see us needing to diverge at this point at all. Also, the reason why I had a separate folder was primarily for mono’s System.Private.Corlib.

This branch was just for my learning, so if it helps at all, great. Thankfully, we are deploying mono as stand alone only because I agree there would be challenges with alternating runtimes in FDD.

steveisok on 26 May 2020

Why does the name of the DLL matter? Mono implements the same hosting API as libcoreclr.so. If other tools are looking for other undocumented APIs, those APIs should be documented and Mono should implement them.

The two runtimes should be drop in replacements for each other. Differentiating runtime capabilities based on the name of the DLL seems to move in the opposite direction.

lambdageek on 26 May 2020

The host needs to be able to find and load the runtime dll. See this doc which currently explicitly says that the final step is to find coreclr.dll and load it.
As mentioned above, it's not a good idea to call mono's runtime coreclr.dll, so we need to change the host to be able to find and load mono instead.

Note that none of the host configuration files (.runtimeconfig.json and .deps.json) doesn't exactly specify which dll is the runtime - it does list the runtime dll as one of the native dependencies, but it doesn't say which out of the many native dependencies is the runtime dll (the entrypoint).

vitek-karas on 26 May 2020

My concern is that if we build code that makes decisions based on the _name_ of the runtime, rather than its _capabilities_, we're adding future technical debt. (What happens when Mono converges closer to CoreCLR?) I appreciate that the host doesn't have this limitation.

Watson's bucketing based on the DLL name is unfortunate.

lambdageek on 26 May 2020

I do agree that ideally the only "difference" would be the name of the dll.

vitek-karas on 26 May 2020

Watson's bucketing based on the DLL name is unfortunate.

That's how all Watson bucketing works and has worked for decades. It would not be a small project to try to change this just for coreclr.dll. We require all products to "play nice" with Watson so that we can find bugs (especially security related) in products which build on top of us.

By making it harder to distinguish between coreclr and mono, we are not just making our lives harder but also those of our partner teams at Microsoft who use Watson.

My concern is that if we build code that makes decisions based on the _name_ of the runtime, rather than its _capabilities_, we're adding future technical debt.
The two runtimes should be drop in replacements for each other. Differentiating runtime capabilities based on the name of the DLL seems to move in the opposite direction.

Please note that while the runtimes are converging in terms of _runtime_ apis and behaviors, there's no plan to unify the diagnostics story between the two. At least there's no plan as far as I've heard. This means:

By naming them the same thing we are adding technical debt _today_, and it's debt that has to be mostly paid by our partner teams and external customers who use and build diagnostic tools.
The diagnostics capabilities, diagnostic apis, and diagnostic implementation of the two runtimes _are_ different. From a diagnostics standpoint the two runtimes aren't drop in replacements for each other even if they are at runtime when executing user code.
We will always have to distinguish the two runtimes from each other in some way in order for debuggers to properly handle crashdumps/coredumps/live debugging. Today we distinguish them by loaded module name. We could do something different, but debuggers and diagnostics tools will have a separate codepath for each runtime for the forseeable future. Making this change forces every debugger/tool currently enlightened to coreclr.dll to make changes for no benefit.

@lambdageek Architecturally I understand what you are pushing for, but the reality of our diagnostics story is complicated. The work we would need to do to accommodate naming both runtimes coreclr.dll the same is a very high just to make everything work exactly the same as it does today, and it's not scoped to our teams. It's changing the ecosystem in a way that everyone has to react to just to stay in the same place.

leculver on 26 May 2020

One last thought I wanted to capture here. Even ignoring other scenarios, when naming these two libraries the same name it would be very difficult to disambiguate between the two of them in some crashdump scenarios. Minidumps might not contain any parts of the module itself, not even the first few bytes of the PE header. Instead, we sometimes (but rarely) have to use only the MINIDUMP_MODULE structure to know about an image.

In cases where we have a minidump without module data, and where we cannot obtain the original PE image (such as no connection to the symbol server or CLR forgot to index a binary), then a debugger developer is in a real bind on what to do in that case. It would have to have two different debugging engines for mono vs coreclr, but it would have to guess as to which one to use because there's not enough info to know whether mono is loaded or coreclr is loaded...and debugging those two things aren't compatible.

This scenario should be rare, and it's not something a debugger couldn't overcome, but it's another example illustrating the messy situations that arise from naming these two DLLs the same name.

leculver on 26 May 2020

Watson's bucketing based on the DLL name is unfortunate.

I'm not sure why this is viewed as a negative and I may not understand some background issue? I agree that some places where code is currently using the module name to discriminate should not be doing so, but IMO Watson isn't one of them. In the case of Watson an abstraction based on capabilities appears undesirable, the implementation is the thing I want to correlate issues with.

noahfalk on 26 May 2020

👍1

@noahfalk:

Watson's bucketing based on the DLL name is unfortunate.

I'm not sure why this is viewed as a negative and I may not understand some background issue?

I'm not sure about Aleksey's position, but keying on the module name alone has the unfortunate side effect of grouping like-named modules into the same bucket, for example, everything I mentioned in my topmost reply. In a perfect world Watson would understand that coreclr.dll can be separate products such as Silverlight, .Net Core, and Mono (though the last case is what we are debating). That would let us bucket the products' failures separately instead of getting one giant bucket mixed together.

I say in an ideal world because I don't know how we would accomplish that given the technical challenges of Watson dumps. I agree it's unfortunate, but I don't know of any better way to do it.

leculver on 30 May 2020

@jkotas Since a lot of our scope was readjusted for net6, I think we should move this issue to the 6.0 milestone.

steveisok on 7 Jul 2020

👍3

The two runtimes should be drop in replacements for each other.

@lambdageek, sorry for an off-topic question (looking for a quick pointer / pro tip 🙂): currently, when we build all components of dotnet/runtime repo, i.e. ./build.sh -c Release, we get tarballs produced in artifacts/packages/Release/Shipping directory. Extracting the artifacts/packages/Release/Shipping/dotnet-runtime-5.0.0-dev-illumos-x64.tar.gz does not gives us mono. I manually overwrote mono's libcoreclr.so and corelib (built from the same git rev) on openindiana to the extracted location, and dotnet(1) was able to execute program using mono runtime.

What is the official packaging plan for 5.0; will we have a separate dotnet installer for mono for itanium or would it be in the same installer as coreclr? (it would be nice to have an environment variable to select the default runtime on dev box, e.g. DOTNET_RUNTIME_FLAVOR=mono)

am11 on 11 Aug 2020

@am11 Aleksey is currently on paternity leave. Maybe @steveisok or @akoeplinger has an idea?

CoffeeFlux on 11 Aug 2020

👍1

@am11 The only mono runtime packs we are generating at this time are for osx, linux x64, and linux arm64 (technically wasm too). As I understand it for net5, there is no plan beyond what we're already generating.

steveisok on 11 Aug 2020

@steveisok, are those mono dedicated packs? and does ./build.sh -c Release give us those packs in artifacts/packages/Release/Shipping or is there a different switch used to produce them?

am11 on 11 Aug 2020

If you're, for example, on osx, you'll get a mono specific microsoft.netcore.app runtime pack when you build like that. And if mono is all you're after, then you can skip building coreclr w/ ./build.sh -Subset mono+libs+installer -c Release. It will speed up your build considerably.

steveisok on 11 Aug 2020

Thank you. I have found artifacts/bin/ref/microsoft.netcore.app/Release/ on macOS and linux. However, that is missing bunch of things to execute an app (.dll), such as Microsoft.NETCore.App.deps.json and libhostpolicy.so (which we can fix by hand, sure). imo:

if it is going to be a separate installation for 5.0, it would be nice to have similar self-contained package as coreclr produced under artifacts/packages/Release/Shipping (preferably with mono in package's filename), so devs can extract and use it (e.g. artifacts/packages/Release/Shipping/dotnet-runtime-5.0.0-dev-osx-x64.tar.gz has everything needed from the runtime).
if it is going to be part of unified dotnet-runtime package (containing coreclr and mono bits SxS somehow), then it would be nice to have a switch to select runtime (in dotnet --runtime {mono,coreclr} path/to/my.dll as well as environment variable DOTNET_RUNTIME_FLAVOR={mono,coreclr}). this will prevent users from manually overwriting binaries).

am11 on 12 Aug 2020

We don't plan to do this for 5.0 as the main usage of mono in 5.0 is WebAssembly, but we'll flush out the "mono on desktop" details for a later release.