https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fpull~2F34871~2Fmerge/test~2Ffunctional~2Fcli~2F~2Fouterloop~2F/20190128.1/workItem/System.Runtime.Serialization.Formatters.Tests/analysis/xunit/System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests~2FValidateAgainstBlobs(obj:%20Bitmap%20%7B%20Flags%20=%2077840,%20FrameDimensionsList%20=%20%5B7462dc86-6180-4c7e-8e3f-ee7333a7a483%5D,%20Height%20=%20)
```
Assert.Equal() Failure
Expected: 77840
Actual: 73744
at System.Runtime.Serialization.Formatters.Tests.EqualityExtensions.IsEqual(Bitmap this, Bitmap other, Boolean isSamePlatform) in /__w/1/s/src/System.Runtime.Serialization.Formatters/tests/EqualityExtensions.cs:line 1233
```
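For context, the failing check boils down to roughly the following (a minimal sketch, assuming a local test image; the real test validates the deserialized object against pre-serialized blobs rather than a fresh roundtrip):

```csharp
// Sketch: roundtrip a Bitmap through BinaryFormatter and compare Flags,
// which is the property the failing assertion above is checking.
using System.Drawing;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using Xunit;

using var original = new Bitmap("test.png");   // hypothetical input image
var formatter = new BinaryFormatter();
using var stream = new MemoryStream();
formatter.Serialize(stream, original);         // Bitmap serializes itself as encoded image bytes
stream.Position = 0;
using var roundtripped = (Bitmap)formatter.Deserialize(stream);

// On an old libgdiplus, the deserialized bitmap can report different
// ImageFlags than the original, which is exactly this failure.
Assert.Equal(original.Flags, roundtripped.Flags);
```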
Failing in Outerloop. @safern, can you please take a look?
Hasn't this been failing for a long time in outer loop?
Right.
Ok. I was just confused by the "safern can you please take a look" part :)
I pinged him since he owns System.Drawing and also added some serialization tests for it. I hadn't noticed that test failing in Outerloop before, which is why I created this issue now.
Got it. :)
This is consistently failing on OpenSUSE.
I'll take a look.
I ran this in an openSUSE 42.3 container that I've got, and it didn't repro; I actually got the expected value from both the raw image and the blob-deserialized image. I will try to use the repro tool tomorrow to validate that the machines have the latest libgdiplus installed.
Thanks.
I'm waiting for the engineering team to provide me with a machine with the Helix setup so I can validate this issue and understand whether the problem is the libgdiplus version itself or not.
I tried to repro this in 2 different openSUSE 42.3 environments, one a Docker container and the other one of the Helix machines, and I didn't have any luck reproducing the issue. I also just downloaded the latest testResults.xml from some of the latest outerloop runs (official builds run outerloop) and the test is passing:
```xml
<test name="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.ValidateAgainstBlobs(obj: Bitmap { Flags = 73744, FrameDimensionsList = [7462dc86-6180-4c7e-8e3f-ee7333a7a483], Height = 100, HorizontalResolution = 96, Palette = ColorPalette { Entries = [...], Flags = 0 }, ... }, blobs: [System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue, System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue])" type="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests" method="ValidateAgainstBlobs" time="0.0115822" result="Pass" />
```
This is also the output from the Helix machine I got access to, across multiple runs:
```
~/safern/tests> cat testResults.xml | grep 'ValidateAgainstBlobs' | grep 'Bitmap'
<test name="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.ValidateAgainstBlobs(obj: Bitmap { Flags = 73744, FrameDimensionsList = [7462dc86-6180-4c7e-8e3f-ee7333a7a483], Height = 100, HorizontalResolution = 96, Palette = ColorPalette { Entries = [...], Flags = 0 }, ... }, blobs: [System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue, System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue])" type="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests" method="ValidateAgainstBlobs" time="0.0143476" result="Pass" />
```
Also, I ran a Kusto query and the last failure was on January 27th. From digging in libgdiplus, it seems there were some fixes to Bitmap there, which could be the reason it is not failing anymore.
Closing; if we see the failure again, we can reopen.
@safern it now seems to be reproducing on the "Libraries Test Run release coreclr Linux x64 Debug" leg. It has failed on my PR with this exact issue on x64 Ubuntu 18.04 several times (I've tried to restart the leg and keep getting it):
https://dev.azure.com/dnceng/public/_build/results?buildId=579375&view=results
There are similar failures in other tests:
```
System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.RoundtripManyObjectsInOneStream:
Assert.Equal() Failure
Expected: 19977332
Actual: 19978652

System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.ValidateBasicObjectsRoundtrip:
Assert.Equal() Failure
Expected: 93535970
Actual: 93537180
```
This hasn't hit a rolling build yet but it is starting to show up in a number of PRs.
|Build|Pull Request | Test Failure Count|
| --- | --- | --- |
|#579294|#32592|2|
|#579319|#34261|2|
|#579375|#34154|2|
|#579412|#34263|1|
|#579428|#32592|2|
|#579508|#34064|1|
|#579536|#34275|1|
|#579539|#34275|1|
|#579587|#34225|1|
|#579589|#34166|1|
|#579596|#34046|1|
|#579629|#34022|1|
|#579675|#34086|1|
|#579684|#32592|2|
|#579750|#34249|1|
|Build|Pull Request|Console|Core|Test Results|Run Client|
| --- | --- | --- | --- | --- | --- |
|#579294|#32592|console.log||testResults.xml|run_client.py|
|#579294|#32592|console.log|||run_client.py|
|#579319|#34261|console.log||testResults.xml|run_client.py|
|#579319|#34261|console.log||testResults.xml|run_client.py|
|#579375|#34154|console.log||testResults.xml|run_client.py|
|#579375|#34154|console.log||testResults.xml|run_client.py|
|#579412|#34263|console.log||testResults.xml|run_client.py|
|#579428|#32592|console.log||testResults.xml|run_client.py|
|#579428|#32592|console.log|||run_client.py|
|#579508|#34064|console.log||testResults.xml|run_client.py|
|#579536|#34275|console.log||testResults.xml|run_client.py|
|#579539|#34275|console.log||testResults.xml|run_client.py|
|#579587|#34225|console.log||testResults.xml|run_client.py|
|#579589|#34166|console.log||testResults.xml|run_client.py|
|#579596|#34046|console.log||testResults.xml|run_client.py|
|#579629|#34022|console.log||testResults.xml|run_client.py|
|#579675|#34086|console.log||testResults.xml|run_client.py|
|#579684|#32592|console.log||testResults.xml|run_client.py|
|#579684|#32592|console.log|||run_client.py|
|#579750|#34249|console.log||testResults.xml|run_client.py|
```
runfo tests -d runtime -c 100 -pr -n "System.Runtime.Serialization.Formatters.Tests Work Item" -m -e 579185
```
Excluded 579185 from data because it appears to be a legitimate failing PR
> This hasn't hit a rolling build yet but it is starting to show up in a number of PRs.
Some of the PRs in that data aren't hitting this issue. For example, in PR https://github.com/dotnet/runtime/pull/34166, build https://dev.azure.com/dnceng/public/_build/results?buildId=579185&view=ms.vss-test-web.build-test-results-tab, all work items are crashing because of the changes in the PR itself.
> @safern it now seems to be reproducing on the "Libraries Test Run release coreclr Linux x64 Debug" leg.
@joperezr and I are looking at it at the moment, trying to repro locally.
OK, so between @MattGal, @joperezr, and me, we were able to root-cause the issue. Here's the summary of our investigation:
First, I thought it was a libgdiplus version issue, because on my local Ubuntu 18.04 I was running against libgdiplus 6.0.4 (the latest) and it didn't repro. When I rolled back to the in-box version, it started failing consistently. So we talked to @MattGal to see if something had changed on the machines, and it hadn't (however, they use libgdiplus 4.2, so they should definitely be updated to the latest). Since nothing changed in how the machines are set up, we looked at the build history and found this change: https://github.com/dotnet/runtime/pull/34251, which seems suspicious because in this specific scenario we build a Bitmap from a Stream and save it into a Stream, which uses managed delegates passed down via P/Invoke to libgdiplus, and libgdiplus calls back into them (a sketch of that pattern is below).
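To make that callback pattern concrete, here is a rough sketch of the interop shape. All names here are hypothetical stand-ins for the real System.Drawing.Common Unix interop; the point is that native libgdiplus code calls back into a managed delegate while streaming the image:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// The delegate libgdiplus invokes (as a native function pointer) to pull
// bytes out of the managed Stream.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
internal delegate int StreamGetBytesDelegate(IntPtr buffer, int count);

internal static class GdipNative
{
    // Hypothetical entry point standing in for the real libgdiplus flat API.
    [DllImport("libgdiplus", CallingConvention = CallingConvention.Cdecl)]
    internal static extern int GdipLoadImageFromDelegate(
        StreamGetBytesDelegate getBytes, out IntPtr image);
}

internal static class StreamImageLoader
{
    public static IntPtr Load(Stream stream)
    {
        // Native code invokes this delegate repeatedly during the call below,
        // so the runtime has to keep the delegate (and its marshaled thunk)
        // alive and callable for the whole duration of the native call.
        StreamGetBytesDelegate getBytes = (buffer, count) =>
        {
            byte[] managed = new byte[count];
            int read = stream.Read(managed, 0, count);
            Marshal.Copy(managed, 0, buffer, read);
            return read;
        };

        GdipNative.GdipLoadImageFromDelegate(getBytes, out IntPtr image);
        GC.KeepAlive(getBytes); // don't let the GC collect the delegate early
        return image;
    }
}
```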
So I reverted the change, and after 10 runs on my local machine it didn't repro at all on either of the two libgdiplus versions above.
@jkotas even though it only repros on an old libgdiplus, I think it is worth investigating why it started failing and whether there is a bug in the runtime.
We have two options to unblock PRs: either revert https://github.com/dotnet/runtime/pull/34251, or condition this test data to only run on Windows or when the libgdiplus version is > 6 (see the sketch below). @jkotas which one do you prefer?
I'll follow up on updating the machines to use the latest libgdiplus, since we should do that anyway, and once that happens, enable in other Linux distros the check we already have for macOS that verifies we run on > 6.
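The second option could look roughly like this. This is a sketch only, assuming the existing test project's types: `GetLibgdiplusMajorVersion` is a hypothetical probe, the member-data name is an approximation, and `ConditionalTheory` is the xunit extension attribute the test tree already uses elsewhere:

```csharp
using System.Runtime.InteropServices;
using Xunit;

public partial class BinaryFormatterTests
{
    // Runs the Bitmap blob data only on Windows or on libgdiplus >= 6.
    public static bool BitmapBlobsSupported =>
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ||
        GetLibgdiplusMajorVersion() >= 6;

    // Hypothetical stub; a real implementation would query the loaded
    // libgdiplus for its version.
    private static int GetLibgdiplusMajorVersion() => 6;

    [ConditionalTheory(nameof(BitmapBlobsSupported))]
    [MemberData("SerializableObjects_MemberData")]   // name is an approximation
    public void ValidateAgainstBlobs(object obj, TypeSerializableValue[] blobs)
    {
        // ...existing test body unchanged...
    }
}
```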
Note that this only repros in PRs because in CI we run these tests against a Release framework, while in PRs we run them against Debug. Also, your PR didn't catch it because it only changed CoreCLR, so we only ran tests against a checked runtime, and this test is skipped on a checked runtime because it is very slow.
I ended up putting the revert PR here in the meantime: https://github.com/dotnet/runtime/pull/34306