cc @leculver @dotnet/dotnet-diag
I think this also affects the hosting layer. For hostpolicy to be able to load Mono when it's not named coreclr.dll/so/dylib, it will have to know how to identify the mono runtime and load it (preferably using the same APIs as CoreCLR provides).
We need to avoid naming Mono coreclr.dll at all costs.
Not only does it confuse a lot of diagnostic tools we've built, it will also imply work to go "fix" the same problems in partner teams. This would include places like WinDbg, Watson, and !analyze. We've already burned a lot of our goodwill recently with some of the features we've built and not informed partner teams, creating unexpected work for them long after they had a chance to give input into the process. (For example not building consensus around Portable PDBs with partner teams before it was built, among other examples. We shouldn't repeat that mistake here.)
Aside from diagnostic tools and partner teams, this also could potentially affect our ability to properly diagnose Watson failures. Watson uses a module's filename and version alone to bucket failures so we'd potentially collide with other products' coreclr.dll. Note this problem is why we went out of our way to re-version coreclr.dll to a seemingly nonsensical values (e.g. 4.700.20.6603) to avoid collisions with the previous Silverlight versions, we are going to cause even more problems if the file version of a Mono coreclr.dll happens to be even _near_ any of the other versions of coreclr.dll we've shipped. It will effectively mix all failures together making it very, very difficult to tell what our actual failures are in the products where these version/filenames collide.
We've been down this road before and we know it's a bad idea that will cause a lot of problems for us and our partner teams. Let's just not do this and not have the headache (or revert it now if it's already been done).
cc: @jeffschwMSFT @vitek-karas @swaroop-sridhar for possibly required hosting changes to enable discovering a different name for the mono runtime when running on desktop.
I actually messed around trying to learn the host and got it to load libMono.
I may have some local changes that aren't pushed, but this is a (hacked) start... maybe :-).
@steveisok Couple of questions about the intent of the changes:
hostfxr
and hostpolicy
mono
subdirectory (or maybe I'm reading this wrong)?If we plan to support FDD with mono VM, then that's a whole other problem with probably large number of hosting changes required (depending on how the Mono VM's NETCore.App framework would be named, versioned and how it would be installed).
If we plan to support only SCD with mono VM, then I think we should take one of the two approaches:
hostpolicy
between the two runtimes fully - in that case hostpolicy
would simply look for both coreclr.dll
and mono.dll
(or the respective OS variants of these names). It would use the one it finds (presumably coreclr.dll
would be tested first purely for backward compat reasons).hostpolicy
. Easier to test as most functionality of hostpolicy
would be identical between the two runtimes.hostpolicy
for each runtime. Mono would ship its own version of hostpolicy
. Both versions would expose the same ABI to hostfxr
, but otherwise they could be entirely different.I personally would prefer the first option - we share the hostpolicy
- we only teach it to recognize both runtime names. If we have the ABIs exactly the same, that's the only change necessary. We could diverge the ABIs slightly if necessary (but not too much).
Note that in either case we need to modify hostfxr
to recognize both runtime names and treat them as identical. This is unfortunate legacy behavior where hostfxr
uses the presence of coreclr.dll
in the app as one of the clues to detect self-contained apps (while .runtimconfig.json
should be the only source of truth really). You already made this change in the above branch.
I think the first option makes sense for our use case. I don鈥檛 see us needing to diverge at this point at all. Also, the reason why I had a separate folder was primarily for mono鈥檚 System.Private.Corlib.
This branch was just for my learning, so if it helps at all, great. Thankfully, we are deploying mono as stand alone only because I agree there would be challenges with alternating runtimes in FDD.
Why does the name of the DLL matter? Mono implements the same hosting API as libcoreclr.so
. If other tools are looking for other undocumented APIs, those APIs should be documented and Mono should implement them.
The two runtimes should be drop in replacements for each other. Differentiating runtime capabilities based on the name of the DLL seems to move in the opposite direction.
The host needs to be able to find and load the runtime dll. See this doc which currently explicitly says that the final step is to find coreclr.dll
and load it.
As mentioned above, it's not a good idea to call mono's runtime coreclr.dll
, so we need to change the host to be able to find and load mono instead.
Note that none of the host configuration files (.runtimeconfig.json
and .deps.json
) doesn't exactly specify which dll is the runtime - it does list the runtime dll as one of the native dependencies, but it doesn't say which out of the many native dependencies is the runtime dll (the entrypoint).
My concern is that if we build code that makes decisions based on the _name_ of the runtime, rather than its _capabilities_, we're adding future technical debt. (What happens when Mono converges closer to CoreCLR?) I appreciate that the host doesn't have this limitation.
Watson's bucketing based on the DLL name is unfortunate.
I do agree that ideally the only "difference" would be the name of the dll.
Watson's bucketing based on the DLL name is unfortunate.
That's how all Watson bucketing works and has worked for decades. It would not be a small project to try to change this just for coreclr.dll. We require all products to "play nice" with Watson so that we can find bugs (especially security related) in products which build on top of us.
By making it harder to distinguish between coreclr and mono, we are not just making our lives harder but also those of our partner teams at Microsoft who use Watson.
My concern is that if we build code that makes decisions based on the _name_ of the runtime, rather than its _capabilities_, we're adding future technical debt.
The two runtimes should be drop in replacements for each other. Differentiating runtime capabilities based on the name of the DLL seems to move in the opposite direction.
Please note that while the runtimes are converging in terms of _runtime_ apis and behaviors, there's no plan to unify the diagnostics story between the two. At least there's no plan as far as I've heard. This means:
@lambdageek Architecturally I understand what you are pushing for, but the reality of our diagnostics story is complicated. The work we would need to do to accommodate naming both runtimes coreclr.dll the same is a very high just to make everything work exactly the same as it does today, and it's not scoped to our teams. It's changing the ecosystem in a way that everyone has to react to just to stay in the same place.
One last thought I wanted to capture here. Even ignoring other scenarios, when naming these two libraries the same name it would be very difficult to disambiguate between the two of them in some crashdump scenarios. Minidumps might not contain any parts of the module itself, not even the first few bytes of the PE header. Instead, we sometimes (but rarely) have to use only the MINIDUMP_MODULE structure to know about an image.
In cases where we have a minidump without module data, and where we cannot obtain the original PE image (such as no connection to the symbol server or CLR forgot to index a binary), then a debugger developer is in a real bind on what to do in that case. It would have to have two different debugging engines for mono vs coreclr, but it would have to guess as to which one to use because there's not enough info to know whether mono is loaded or coreclr is loaded...and debugging those two things aren't compatible.
This scenario should be rare, and it's not something a debugger couldn't overcome, but it's another example illustrating the messy situations that arise from naming these two DLLs the same name.
Watson's bucketing based on the DLL name is unfortunate.
I'm not sure why this is viewed as a negative and I may not understand some background issue? I agree that some places where code is currently using the module name to discriminate should not be doing so, but IMO Watson isn't one of them. In the case of Watson an abstraction based on capabilities appears undesirable, the implementation is the thing I want to correlate issues with.
@noahfalk:
Watson's bucketing based on the DLL name is unfortunate.
I'm not sure why this is viewed as a negative and I may not understand some background issue?
I'm not sure about Aleksey's position, but keying on the module name alone has the unfortunate side effect of grouping like-named modules into the same bucket, for example, everything I mentioned in my topmost reply. In a perfect world Watson would understand that coreclr.dll can be separate products such as Silverlight, .Net Core, and Mono (though the last case is what we are debating). That would let us bucket the products' failures separately instead of getting one giant bucket mixed together.
I say in an ideal world because I don't know how we would accomplish that given the technical challenges of Watson dumps. I agree it's unfortunate, but I don't know of any better way to do it.
@jkotas Since a lot of our scope was readjusted for net6, I think we should move this issue to the 6.0 milestone.
The two runtimes should be drop in replacements for each other.
@lambdageek, sorry for an off-topic question (looking for a quick pointer / pro tip 馃檪): currently, when we build all components of dotnet/runtime repo, i.e. ./build.sh -c Release
, we get tarballs produced in artifacts/packages/Release/Shipping
directory. Extracting the artifacts/packages/Release/Shipping/dotnet-runtime-5.0.0-dev-illumos-x64.tar.gz
does not gives us mono. I manually overwrote mono's libcoreclr.so and corelib (built from the same git rev) on openindiana to the extracted location, and dotnet(1) was able to execute program using mono runtime.
What is the official packaging plan for 5.0; will we have a separate dotnet installer for mono for itanium or would it be in the same installer as coreclr? (it would be nice to have an environment variable to select the default runtime on dev box, e.g. DOTNET_RUNTIME_FLAVOR=mono
)
@am11 Aleksey is currently on paternity leave. Maybe @steveisok or @akoeplinger has an idea?
@am11 The only mono runtime packs we are generating at this time are for osx, linux x64, and linux arm64 (technically wasm too). As I understand it for net5, there is no plan beyond what we're already generating.
@steveisok, are those mono dedicated packs? and does ./build.sh -c Release
give us those packs in artifacts/packages/Release/Shipping
or is there a different switch used to produce them?
If you're, for example, on osx, you'll get a mono specific microsoft.netcore.app
runtime pack when you build like that. And if mono is all you're after, then you can skip building coreclr w/ ./build.sh -Subset mono+libs+installer -c Release
. It will speed up your build considerably.
Thank you. I have found artifacts/bin/ref/microsoft.netcore.app/Release/
on macOS and linux. However, that is missing bunch of things to execute an app (.dll), such as Microsoft.NETCore.App.deps.json
and libhostpolicy.so
(which we can fix by hand, sure). imo:
artifacts/packages/Release/Shipping
(preferably with mono
in package's filename), so devs can extract and use it (e.g. artifacts/packages/Release/Shipping/dotnet-runtime-5.0.0-dev-osx-x64.tar.gz
has everything needed from the runtime).dotnet --runtime {mono,coreclr} path/to/my.dll
as well as environment variable DOTNET_RUNTIME_FLAVOR={mono,coreclr}
). this will prevent users from manually overwriting binaries).We don't plan to do this for 5.0 as the main usage of mono in 5.0 is WebAssembly, but we'll flush out the "mono on desktop" details for a later release.
Removing blocking-release a this was moved to 6.0 and that label is being used for 5.0 tracking.
Most helpful comment
@steveisok Couple of questions about the intent of the changes:
hostfxr
andhostpolicy
mono
subdirectory (or maybe I'm reading this wrong)?If we plan to support FDD with mono VM, then that's a whole other problem with probably large number of hosting changes required (depending on how the Mono VM's NETCore.App framework would be named, versioned and how it would be installed).
If we plan to support only SCD with mono VM, then I think we should take one of the two approaches:
hostpolicy
between the two runtimes fully - in that casehostpolicy
would simply look for bothcoreclr.dll
andmono.dll
(or the respective OS variants of these names). It would use the one it finds (presumablycoreclr.dll
would be tested first purely for backward compat reasons).hostpolicy
. Easier to test as most functionality ofhostpolicy
would be identical between the two runtimes.hostpolicy
for each runtime. Mono would ship its own version ofhostpolicy
. Both versions would expose the same ABI tohostfxr
, but otherwise they could be entirely different.I personally would prefer the first option - we share the
hostpolicy
- we only teach it to recognize both runtime names. If we have the ABIs exactly the same, that's the only change necessary. We could diverge the ABIs slightly if necessary (but not too much).Note that in either case we need to modify
hostfxr
to recognize both runtime names and treat them as identical. This is unfortunate legacy behavior wherehostfxr
uses the presence ofcoreclr.dll
in the app as one of the clues to detect self-contained apps (while.runtimconfig.json
should be the only source of truth really). You already made this change in the above branch.