Walletwasabi: Make the CI, Unit and RegTest stable and faster

Created on 10 Aug 2020 · 18Comments · Source: zkSNACKs/WalletWasabi

CI Tests are failing for a long time now. We have a known bug somewhere here https://github.com/zkSNACKs/WalletWasabi/issues/3660

Other tests are started to fail on the Linux CI too, sometimes on macOS.
RegTests are in a disastrous state too, takes almost 20 minutes to make them pass.

Here is a list of the task as I see now.

[x] Fix failing CI tests.
[x] Check azure logs and fix randomly failing CI tests.
[ ] Speed up CI tests, replace delays with "signaling".
[ ] Fix RegTests, bonus: try to add back RegTests to CI.
[x] Add deterministic build test to CI in some way - figure out the concept first. https://github.com/zkSNACKs/WalletWasabi/pull/4065

debug

Source

molnard

🚀1

All 18 comments

Most of the tests could use a fake node like this one https://github.com/lontivero/MockBitcoinRpc

Some pros:

It runs in isolation what means you can run as many instances as you want in parallel without conflicts
It makes possible to run almost all our regression tests in parallel and without conflict
It starts immediately. We don't need to wait the node to do all the bootstrapping. In other words: it is fast.

Some pros:

it is not finished and would require maintenance from time to time.
it is useful for testing our code and logic but it is not useful to test real interaction with real node.
we have to be able to switch and test with the real node at any time.

lontivero on 10 Aug 2020

I think that https://github.com/lontivero/MockBitcoinRpc is a nice way to add new tests. But the current tests should be fixed anyway. However, it may require back-and-forth between refactoring and fixing tests. That's because the tests often instantiate something but that something is a big beast itself.

kiminuo on 10 Aug 2020

Maybe we can start with specifying which tests fail often:

WalletWasabi.Tests.UnitTests.BlockNotifierTests.LongChainReorgAsync
WalletWasabi.Tests.UnitTests.BitcoinCore.P2pBasedTests.MempoolNotifiesAsync
WalletWasabi.Tests.UnitTests.BitcoinCore.NodeBuildingTests.GetVersionTestsAsync
WalletWasabi.Tests.UnitTests.BlockNotifierTests.SimpleReorgAsync
WalletWasabi.Tests.UnitTests.BitcoinCore.StalledBitcoinNodeAsync
WalletWasabi.Tests.UnitTests.Hwi.DefaultResponseTests.HwiProcessBridgeTestAsync
System.Threading.Tasks.TaskCanceledException : Waiting for process exiting was canceled. ---- System.OperationCanceledException : The operation was canceled.

WORK IN PROGRESS

kiminuo on 10 Aug 2020

WalletWasabi.Tests.IntegrationTests.ItBitExchangeRateProviderTestsAsync

yahiheb on 10 Aug 2020

[DONE] ~~https://github.com/zkSNACKs/WalletWasabi/pull/3863 - this PR should help with tests too. So if you have some time, please review this PR.~~

kiminuo on 11 Aug 2020

Okay, so there is one more thing I have found out when working on https://github.com/zkSNACKs/WalletWasabi/pull/3910:

Tests seem to be run in parallel which is fine setting but the consequence seems to be that time-sensitive tests may break easily. I was thinking: Right, but give it 5 seconds to do a operation and it will always pass. That's not the case. When using ConfigureAwait(true) which is correct and default value for await operations, it may take a lot of time to return to given place in a test with the same thread.

My current feeling about the flaky tests is that it may make sense to create a new test project whose tests are run serially so that you can test even time-sensitive features with long-ish timeouts more easily. Or maybe not a new test project but a new folder for these tests. But xunit parallelization settings are not that straightforward :-(

kiminuo on 12 Aug 2020

* WalletWasabi.Tests.IntegrationTests.ItBitExchangeRateProviderTestsAsync

I'm not sure whether xunit has some attribute for saying "retry test for x-times". That would help a bit but to be honest I don't like these "tests touching external services" - it may fail for too many reasons.

kiminuo on 12 Aug 2020

👍1

If ItBitExchangeRateProviderTestsAsync fails then that means that the service is not working well (even when it seems to work okay for me, who cares). The point is that we need to know that because the price providers are external APIs that have a big impact on wasabi users. We don't want to retry x-times because that would give us a false idea about the availability of the service. However I think those tests shouldn't run as part of a normal build, i mean, why should my PR fail when Coinbase server is overloaded?

lontivero on 12 Aug 2020

However I think those tests shouldn't run as part of a normal build, i mean, why should my PR fail when Coinbase server is overloaded?

Yeah, this is what monitor services should do in general. You have a C# project (or any other for that matter) that you can run regularly (sort of cron job) which prints status of particular services. If this monitoring service shows that some 3rd party service is not working anymore, it can be removed from Wasabi Wallet.

kiminuo on 12 Aug 2020

Remove those tests then.

lontivero on 13 Aug 2020

Tests seem to be run in parallel which is fine setting but the consequence seems to be that time-sensitive tests may break easily. I was thinking: Right, but give it 5 seconds to do a operation and it will always pass. That's not the case. When using ConfigureAwait(true) which is correct and default value for await operations, it may take a lot of time to return to given place in a test with the same thread.

https://xunit.net/docs/running-tests-in-parallel.html

Turn off parallelism for specific Test Collection
[CollectionDefinition(DisableParallelization = true)]
Default: false
Parallel-capable test collections will be run first (in parallel), followed by parallel-disabled test collections (run sequentially).

A couple of solutions in my mind:

Creating a test collection for sensitive tests and include tests one-by-one.
Disable parallelism at all.
Move sensitive Unit Tests into the different Test class where parallelism is disabled.

I vote for the first one.

molnard on 13 Aug 2020

Parallel-capable test collections will be run first (in parallel), followed by parallel-disabled test collections (run sequentially).

Create a Test collection called "DisableParallel". Set [CollectionDefinition(DisableParallelization = true)].
Add DisableParallel Test collection to all sensitive test classes.

Does this make sense?

molnard on 13 Aug 2020

Create a Test collection called "DisableParallel". Set [CollectionDefinition(DisableParallelization = true)].
Add DisableParallel Test collection to all sensitive test classes.

Well, I don't think so. This collection won't be run in parallel, right. But all other tests will be run parallel to this collection of tests. That's imo not what you want.

I would vote for 3.

kiminuo on 13 Aug 2020

According to this:

Parallel-capable test collections will be run first (in parallel), followed by parallel-disabled test collections (run sequentially).

It will run parallel enable tests first, wait until all finishes then start the parallel disable collections one-by-one. The question is if not having a test collection is a test collection too?

molnard on 13 Aug 2020

Parallel-capable test collections will be run first (in parallel), followed by parallel-disabled test collections (run sequentially).

Ah. I didn't know that. That sounds good.

kiminuo on 13 Aug 2020

It will run parallel enable tests first, wait until all finishes then start the parallel disable collections one-by-one. The question is if not having a test collection is a test collection too?

I would simply try to add it. We will see, if we add a few logger.LogInfo calls.

kiminuo on 13 Aug 2020

👍1

I have spent some time looking into output of CI tasks:

2020-08-26T17:02:45.9797397Z Info: .NET Core SDK/runtime 2.2 and 3.0 are now End of Life(EOL) and have been removed from all hosted agents. If you're using these SDK/runtimes on hosted agents, kindly upgrade to newer versions which are not EOL, or else use UseDotNet task to install the required version.

(An example here: https://dev.azure.com/zkSNACKs/e79964fe-d087-4901-b0b7-69a8d48cd7a7/_apis/build/builds/31895/logs/10)

as I thought it's an issue in Azure Pipelines settings but it turns out it's just a (unfortunate) info message that does not apply to us.

Relevant sources: