rspec-rails 🚀 - Add support for Rails 6 built-in Parallel Testing

We'd love to support parallel tests with RSpec itself; it's a rather large task, however. If we get parallelisation support into rspec-core, you can be sure we'll integrate with the Rails helpers.

JonRowe on 1 Apr 2019

👍22

Note that the linked #1254 says:

There are bunch of gems that allow for parallel execution, but all of these are process based and not thread based.

I believe the parallel testing now built into Rails can be process-based or thread-based, and by default is process-based.

Is supporting Rails' process-based parallel testing less of a challenge? It looks like one of the main things Rails does for you here is set up a separate database for each test worker, as well as orchestrate which tests each worker runs and aggregate the results.
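For reference, this is roughly what the Rails 6 feature looks like on the Minitest side: a minimal sketch of test/test_helper.rb modelled on the file Rails 6 generates. `parallelize` and `parallelize_setup` are ActiveSupport::TestCase class methods; the body of the setup hook is only a placeholder. Note that this hooks into Minitest-based ActiveSupport::TestCase, not into RSpec example groups, which is why rspec-rails can't simply reuse it.

```ruby
# test/test_helper.rb (Minitest, not RSpec)
ENV["RAILS_ENV"] ||= "test"
require_relative "../config/environment"
require "rails/test_help"

class ActiveSupport::TestCase
  # Fork one worker process per CPU core (processes are the default;
  # `with: :threads` switches to threads). For Active Record apps, Rails
  # gives each worker its own test database.
  parallelize(workers: :number_of_processors)

  # Runs inside each worker after it is forked and before its tests start;
  # `worker` is that worker's number.
  parallelize_setup do |worker|
    # placeholder: per-worker setup of other shared resources goes here
  end
end
```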

jrochkind on 29 Apr 2019

In the short term all we could offer is to help integrate 3rd party parallel testing with Rails tools.

Our own parallel journey requires changes to rspec-core which are mostly identical for process or thread based.

Please note there are parallel testing extensions for RSpec already!

JonRowe on 30 Apr 2019

Discourse now has a solution based on parallel_tests https://github.com/discourse/discourse/pull/7778

p8 on 13 Aug 2019

👍13

Is there any update on supporting the ActiveSupport parallelize helper in rspec-rails, now that Rails 6 has been released?

iainbeeston on 21 Aug 2019

Sorry, I meant to include a link with more details:

https://github.com/rails/rails/pull/31900

iainbeeston on 21 Aug 2019

No, RSpec itself needs to support parallelisation in order for it to be supported.

JonRowe on 21 Aug 2019

👍2

Sorry @JonRowe - can I please clarify something?

rspec-core needs updating even for multi-process parallelisation, right? (not just multi-threaded parallelisation as described in https://github.com/rspec/rspec-core/issues/1254, which I would assume is more challenging)

iainbeeston on 21 Aug 2019

The main rspec-core runner doesn't support parallelisation, so it has no means of collating results from workers, be they threads or processes. Process-based parallelisation is simpler than thread-based due to, well, the GIL on MRI Rubies and the lack of 100% thread safety within the other gems. (I'm fairly certain, for example, that there are threading bugs lurking within the mocks code.)

My ideal plan for taking us to parallelisation is to use the "fork" model that the bisect runner uses, maybe with a fall back to a "shell" model (essentially processes) but I haven't had the time to work much on it.

JonRowe on 21 Aug 2019

👍10 ❤9

To summarize: regardless of whether it's threads or processes, rspec-core doesn't support it yet, so if we run a Rails 6.0 application and use RSpec, we won't benefit from it. Correct?

krtschmr on 25 Aug 2019

The helpers are there to support parallelisation, but RSpec on its own doesn't support parallelisation, so you'll need an extension gem to benefit from it.

JonRowe on 25 Aug 2019

My ideal plan for taking us to parallelisation is to use the "fork" model that the bisect runner uses, maybe with a fall back to a "shell" model (essentially processes) but I haven't had the time to work much on it.

@JonRowe Would you copy all the bisect code, rename the namespace to parallel, and make the required changes?

p8 on 5 Sep 2019

No, I'm writing it from first principles, using the bisect runner and some other external gems as a guide.

JonRowe on 5 Sep 2019

👍23 ❤18

@JonRowe thanks for taking this on! Is there anything we can do to help this along? Not sure I'm qualified to write any of the runner but if there's something, please let us know!

wwahammy on 8 Nov 2019

👍2

Hey guys! What's the latest on this?

courtsimas on 22 Jan 2020

👍3

Still waiting on finishing the core work.

JonRowe on 23 Jan 2020

❤22

Since the Support Multithread Execution ticket on rspec-core doesn't seem to be a good fit, let me do a brain dump here of tools that allow for parallelized RSpec suite execution:

https://github.com/skroutz/rspecq - (added 31 July 2020) ci-queue inspired, Redis
https://github.com/Shopify/ci-queue - RSpec support is deprecated, but not planned for removal, just not actively maintained; no support for before(:context)
https://github.com/tmm1/test-queue - quite well known
https://github.com/sandro/specjour - uses Bonjour to find worker processes
https://github.com/cookpad/rrrspec - unmaintained for a year
https://github.com/conversation/rspec-queue
https://github.com/grosser/forking_test_runner
https://github.com/paraspec/paraspec - just found it recently, didn't investigate
https://github.com/lonelyplanet/queuecumber - the name says it all
https://github.com/grosser/parallel_tests - does not aggregate reports; serves as a base for some other tools in this list
https://github.com/ArturT/knapsack - splits the suite into roughly equal parts based on previous runs' results; the commercial version provides a dynamic queue for distributed RSpec processes, but doesn't have an on-premise option AFAIK
https://github.com/aslakhellesoy/rspec-distributed - quite old; uses DRb, so it can accumulate build results not only from several processes running on different CPU cores, but also from several machines; seems abandoned
At my previous company, I was working on an in-house tool, but unfortunately it's still not open-sourced
... I keep bumping into new ones each and every month

All or most of these tools parallelize execution across processes, and work with several databases via the more or less common TEST_ENV_NUMBER environment variable.
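To make the TEST_ENV_NUMBER convention concrete: with parallel_tests, the per-process database name usually comes from ERB in config/database.yml. The sketch below renders such a template with Ruby's standard ERB and YAML libraries. The app name `myapp`, the PostgreSQL adapter, and the helper `test_database_for` are hypothetical, and it assumes the first process runs with an empty or unset TEST_ENV_NUMBER.

```ruby
require "erb"
require "yaml"

# Hypothetical database.yml excerpt: the test database name is suffixed
# with TEST_ENV_NUMBER, so each parallel process gets its own database.
DATABASE_YML = <<~ERB_YAML
  test:
    adapter: postgresql
    database: myapp_test<%= ENV["TEST_ENV_NUMBER"] %>
ERB_YAML

# Render the template (ERB first, then YAML) for a given TEST_ENV_NUMBER,
# restoring the previous environment value afterwards.
def test_database_for(env_number)
  previous = ENV["TEST_ENV_NUMBER"]
  ENV["TEST_ENV_NUMBER"] = env_number # assigning nil unsets the variable
  YAML.safe_load(ERB.new(DATABASE_YML).result)["test"]["database"]
ensure
  ENV["TEST_ENV_NUMBER"] = previous
end
```

For example, `test_database_for("2")` yields "myapp_test2", while `test_database_for(nil)` yields "myapp_test" for the first process.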

With such a variety of tools, it seems that parallel testing of Rails applications with RSpec is not really an unsolved problem.
On my last project, we ran a dozen machines with eight RSpec processes each, which reduced the spec suite's wall-clock run time 50x.
By the way, it isn't tied to Rails 6; it works equally well on Rails 4.2+.

Based on the effort it took to build and maintain such a tool while working full-time, I wouldn't really expect a comparable tool to appear out of thin air in the RSpec distribution and immediately become better than everything else on the market.

pirj on 5 Feb 2020

👍18 ❤2

This will be awesome with Rails 6+.

jtoy on 11 Mar 2020

@pirj I recently attempted to improve our CI testing using some of the tools you mentioned above, and I didn't have much luck. I didn't test all of them, but I did test the ones that were the most popular, and none really seemed to meet what we're looking for.

In an ideal world, a solution for this would do the following:

If multiple cores are available, automatically parallelize the specs. This would have to be configurable.
Automatically aggregate the results into a single, readable output stream using the provided formatter.
Continue to work with coverage tools like SimpleCov.

My understanding of the Rails feature is that it gets us more of the way there. This still feels like an unsolved problem for RSpec to me. I believe Jest does something like this at the file level in JavaScript. It would be awesome to see.

LandonSchropp on 7 May 2020

❤1

@LandonSchropp I have bad news for you if you really need to aggregate the results into a single standard RSpec report. It's a hard task with a number of non-obvious edge cases. We've solved it on my previous project, at least for those cases that we've faced. The report is gathered from different workers, spread over different processes or even machines. Unfortunately, it's not open-sourced yet.
If you're using a JUnit-compatible XML-based reporter, combining the results in the end for CI to consume shouldn't be a big deal though.

As far as I remember, we had no big issues with aggregating SimpleCov results, but that was set up before I joined and I didn't work on that part.

Auto-scaling to multiple available cores is a trivial task. Once you get the number of processors (how depends on your system flavour and CPU type), you can run rake db:test:create and rspec, prefixing each with a TEST_ENV_NUMBER, and you're done. Some prefer leaving one core free, but you may experiment and run up to 2x more parallel processes than you have cores. Which setting yields the best results depends on your tests. Typically, this is all done in a 20-LoC shell script.
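The same idea as a short Ruby sketch rather than a shell script, under stated assumptions: the worker count defaults to `Etc.nprocessors`, spec files are pre-divided round-robin, TEST_ENV_NUMBER follows the parallel_tests convention (empty for the first worker, then "2", "3", ...), and each worker runs `bundle exec rspec` on its share. Creating the per-worker databases is left out; run whatever setup your app needs for each number first. All method names are hypothetical.

```ruby
require "etc"

# TEST_ENV_NUMBER values, assuming the parallel_tests convention:
# "" for the first worker, then "2", "3", ...
def test_env_numbers(count)
  (1..count).map { |i| i == 1 ? "" : i.to_s }
end

# Pre-divide spec files across workers, round-robin.
def split_round_robin(files, count)
  groups = Array.new(count) { [] }
  files.each_with_index { |file, i| groups[i % count] << file }
  groups
end

# One [env, argv] pair per worker that actually received files.
def worker_commands(files, count = Etc.nprocessors)
  split_round_robin(files, count)
    .zip(test_env_numbers(count))
    .reject { |group, _number| group.empty? }
    .map { |group, number| [{ "TEST_ENV_NUMBER" => number }, ["bundle", "exec", "rspec", *group]] }
end

# Spawn every worker, wait for all of them, and report overall success.
def run_workers(commands)
  pids = commands.map { |env, argv| Process.spawn(env, *argv) }
  pids.map { |pid| Process.wait2(pid).last }.all?(&:success?)
end
```

For example, `run_workers(worker_commands(Dir.glob("spec/**/*_spec.rb").sort, Etc.nprocessors - 1))` leaves one core free, as some prefer. Output from the workers is interleaved, which is exactly the problem raised further down the thread.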

Another non-trivial task is to evenly distribute your specs across those parallel processes. If you don't, you risk one of the processes running for a significant amount of time while the other cores have already finished and sit idle. This is solved in Knapsack Pro and, AFAIR, in rrrspec.
The key is to record the execution time of the specs and feed them to the RSpec runner dynamically, one by one. The most time-consuming specs, specs changed since the last run, and newly introduced specs are run first to reduce the unevenness of the distribution.
Frankly, I don't have any proof that this queuing system is faster than pre-divided execution on a pretty large suite.
Don't split a spec file into individual examples if you're using let_it_be, or some other form of before(:context) with a transaction rollback. The expensive setup will multiply in this case.
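The ordering described above can be sketched as follows. This assumes recorded per-file timings from a previous run (a Hash of path to seconds) and a list of changed files. Putting files without a usable timing first and then the rest slowest-first is one possible reading of the priority order; the method names and the 1-second default for unknown files are hypothetical. The small simulation has workers pull the next file as soon as they are free.

```ruby
# Queue order: new files (no recorded timing) and files changed since the
# last run first, then the remaining files by recorded duration, slowest first.
def queue_order(files, timings, changed: [])
  fresh, known = files.partition { |f| !timings.key?(f) || changed.include?(f) }
  fresh + known.sort_by { |f| -timings[f] }
end

# Simulate workers that pull the next file as soon as they become free.
# Returns the per-worker assignment and the wall-clock time (makespan).
def simulate_pull(queue, timings, workers, default_time: 1.0)
  busy_until = Array.new(workers, 0.0)
  assignment = Array.new(workers) { [] }
  queue.each do |file|
    worker = busy_until.each_index.min_by { |w| busy_until[w] } # earliest free
    assignment[worker] << file
    busy_until[worker] += timings.fetch(file, default_time)
  end
  [assignment, busy_until.max]
end
```

With timings of 5, 3, 2 and 1 seconds plus one new file, two pulling workers finish at 6 seconds, which is the best possible for 12 seconds of total work.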

My understanding of the Rails feature is that it gets us more of the way there.

Honestly, I'm not so sure about that. A major flaw I could spot is that Rails doesn't detect if a worker died without finishing all the work it was supposed to execute. A better approach is to use a timeout, consider a worker dead once it expires, and redistribute its remaining workload among the live workers. This was also solved on my previous project.
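A minimal sketch of that timeout idea, assuming a pull model where each worker holds at most one file at a time: a handed-out file is leased until a deadline, and if the worker hasn't reported completion by then it is treated as dead and the file goes back on the queue. The class name and the injectable clock (there only so the behaviour can be tested) are hypothetical. One trade-off: a file from a slow-but-alive worker may end up running twice.

```ruby
# Hypothetical lease-based work queue for a pull model where each worker
# holds at most one file at a time.
class LeasedQueue
  attr_reader :dead_workers

  def initialize(files, timeout:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @pending = files.dup
    @leases = {}       # file => [worker_id, deadline]
    @timeout = timeout
    @clock = clock
    @dead_workers = [] # workers whose lease expired (presumed dead)
  end

  # Hand the next file to a worker, reclaiming expired leases first.
  def next_for(worker_id)
    reclaim_expired
    file = @pending.shift
    @leases[file] = [worker_id, @clock.call + @timeout] if file
    file
  end

  # A worker reports that it finished a file.
  def complete(file)
    @leases.delete(file)
  end

  # True once nothing is pending or leased. Also reclaims expired leases, so
  # live workers should keep polling until this returns true.
  def done?
    reclaim_expired
    @pending.empty? && @leases.empty?
  end

  private

  def reclaim_expired
    now = @clock.call
    expired = @leases.select { |_file, (_worker, deadline)| deadline <= now }
    expired.each do |file, (worker, _deadline)|
      @leases.delete(file)
      @pending.unshift(file) # retry reclaimed work before untouched files
      @dead_workers << worker unless @dead_workers.include?(worker)
    end
  end
end
```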

pirj on 7 May 2020

👀2

@pirj Thanks for the detailed reply!

Sorry if my post sounded like I was trivializing the amount of work involved in this. That definitely wasn't my intention. My goal in my last message was more to enumerate my team's use case in the hopes that it could influence the future design of this feature. I'm sure that a refactor along these lines is a massive undertaking.

As to my memory, we had no big issues with aggregating SimpleCov results, but that worked before I joined and I didn't work on this part.

I'll have to take a second look at that. When I was investigating, it wasn't clear to me how I'd go about that. I found one plugin for integrating SimpleCov with parallel-tests, but it said it was only compatible with CircleCI, and we're using Codeship, so I didn't explore it further. If you have any hints you could share, I'd appreciate it.

Auto-scaling to multiple available cores is a trivial task. Once you (depending on your system flavour and CPU type) get the number of processors, you're able to run rake db:test:create and rspec by prefixing them with a TEST_ENV_NUMBER, and you're done.

Yep, that's what we're currently doing. On my laptop, I've got 8 cores, each with hyperthreading, so that spins up 15 processes. With that many tests running simultaneously, you can see how managing output would be helpful. I hadn't looked into aggregating JUnit XML-compatible results. Do you have an example of something like that with RSpec you could point me to? Maybe that would help solve some of our problems. Our CI processes have four cores available, so it's somewhat less of a concern there.

Frankly, I don't have any proof that this queuing system is faster than pre-divided execution on a pretty large suite.

That's roughly my experience as well. We have a few specs in our application that are slower than others, but for the most part the costs of individual specs are amortized over our files. Based on my observations of Jest, it looks like it breaks the spec files into a queue and then runs each file as a worker becomes available. In our use case, pre-splitting the specs has worked fine.

When I was exploring Knapsack, I actually felt like it was trying to solve a different problem than what we were experiencing. As you said, for us, it was less about distribution and more about taking advantage of multiple cores locally and in our CI environment.

A major flaw I could spot is that Rails doesn't detect if a worker died and did not finish all the workload it was supposed to execute. A better way is to have a timeout and consider worker as dead and spread over its remaining workload among alive workers. This is also solved in my previous project.

I can't speak to that. I'm sure you're absolutely correct, and I agree with you that the behavior you're describing sounds better than what's being done by the Rails runner.

What I meant by my comment is that what the Rails runner advertises is closer to what we're after. It advertises that it automatically detects the number of cores on the machine, creates the test databases for us, and then splits up our tests and runs them. We love RSpec and will continue to use it over Minitest, but it would be awesome if there were an RSpec analog to the Rails parallel features that was as easy to use as the Rails solution.

Thanks again for your detailed reply. I really appreciate it!

LandonSchropp on 8 May 2020

I can't speak to the rest of it, but as for merging JUnit reports, I've done that before in a different context. The file format is pretty simple; it would just be a matter of a little Nokogiri code to open each report and output a bigger report with the contents of each. Just have each rspec process output in JUnit format, and then handle the aggregation and printing of results yourself.
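A minimal sketch of that aggregation, using REXML (shipped with Ruby, as a bundled gem since Ruby 3.0) instead of Nokogiri so it needs no extra dependency. It assumes each per-process report has either a bare testsuite root element (the shape JUnit formatters such as rspec_junit_formatter typically write) or a testsuites wrapper, and sums the tests, failures, errors, skipped and time attributes onto a new testsuites root. The method name is hypothetical.

```ruby
require "rexml/document"

# Counter attributes summed onto the merged <testsuites> element.
COUNTERS = %w[tests failures errors skipped].freeze

# Merge JUnit XML reports (given as strings) into one document whose root
# <testsuites> element wraps every original <testsuite>.
def merge_junit(xml_strings)
  merged = REXML::Document.new
  root = merged.add_element("testsuites")
  totals = Hash.new(0)
  time = 0.0
  xml_strings.each do |xml|
    doc = REXML::Document.new(xml)
    # Accept either a bare <testsuite> root or an existing <testsuites> wrapper.
    suites = doc.root.name == "testsuites" ? doc.root.get_elements("testsuite") : [doc.root]
    suites.each do |suite|
      COUNTERS.each { |name| totals[name] += suite.attributes[name].to_i }
      time += suite.attributes["time"].to_f
      root.add_element(suite.deep_clone)
    end
  end
  COUNTERS.each { |name| root.add_attribute(name, totals[name].to_s) }
  root.add_attribute("time", format("%.3f", time))
  merged
end
```

Usage might look like `File.write("merged.xml", merge_junit(Dir["tmp/junit-*.xml"].map { |f| File.read(f) }).to_s)`, where the tmp/junit-*.xml paths are hypothetical per-process report files.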

rabidaudio on 9 May 2020

@JonRowe Discourse would be happy to fund some work here if you or any other contributors are looking to build it.

Some requirements from me :)

Dedicated gem, plug-and-play into your existing RSpec Rails app. Must be a drop-in replacement for Discourse.
Fork per testing process
Pull rather than push model. (Our bin/turbo_rspec still uses the old push model, which means some workers can sit idle while others have a backlog of work.) Instead, similar to @tmm1's test_queue, a master process holds a queue of tests and child processes just ask it for the next test to run via an IO.pipe or some other cheap transport.
Non interleaved output like Discourse's bin/turbo_rspec
Long term, also offer an equivalent of Discourse's bin/rake autospec, which is basically an interruptible guard.
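The pull model in the third point can be sketched with plain fork and IO.pipe. This is a minimal illustration under stated assumptions, not how test_queue or bin/turbo_rspec are implemented: the parent owns the queue, each forked child asks for one item at a time over its own pair of pipes, and messages are JSON lines. It needs fork (so not Windows), assumes results are JSON-serializable, and simply drops a worker (losing the item it was running) if its pipe closes. The method name is hypothetical; a real runner would run spec files in the children instead of an arbitrary block.

```ruby
require "json"

# Run `work` on every item in forked workers that pull items one at a time.
# Returns the results in completion order (not input order).
def run_pull_queue(items, workers: 2, &work)
  queue = items.dup
  results = []
  # One pipe pair per worker: "up" carries child -> parent messages,
  # "down" carries parent -> child replies.
  channels = Array.new(workers) do
    up_r, up_w = IO.pipe
    down_r, down_w = IO.pipe
    up_w.sync = true
    down_w.sync = true
    { up_r: up_r, up_w: up_w, down_r: down_r, down_w: down_w }
  end

  pids = channels.map do |ch|
    fork do
      # Keep only this worker's child-side ends, so the parent sees EOF
      # on up_r if this child exits.
      channels.each { |other| other.each_value(&:close) unless other.equal?(ch) }
      ch[:up_r].close
      ch[:down_w].close
      ch[:up_w].puts(JSON.generate({})) # first request carries no result
      while (line = ch[:down_r].gets)
        msg = JSON.parse(line)
        break if msg["stop"]
        ch[:up_w].puts(JSON.generate({ "result" => work.call(msg["item"]) }))
      end
      exit!(0) # skip at_exit hooks inherited from the parent
    end
  end

  # The parent keeps only its own ends.
  channels.each do |ch|
    ch[:up_w].close
    ch[:down_r].close
  end

  active = channels.dup
  until active.empty?
    readable, = IO.select(active.map { |ch| ch[:up_r] })
    readable.each do |io|
      ch = active.find { |c| c[:up_r].equal?(io) }
      line = io.gets
      if line.nil?
        active.delete(ch) # child exited early; the item it was running is lost
        next
      end
      msg = JSON.parse(line)
      results << msg["result"] if msg.key?("result")
      reply = queue.empty? ? { "stop" => true } : { "item" => queue.first }
      begin
        ch[:down_w].puts(JSON.generate(reply))
        if reply.key?("item")
          queue.shift
        else
          active.delete(ch)
        end
      rescue Errno::EPIPE
        active.delete(ch) # child went away before reading; item stays queued
      end
    end
  end

  pids.each { |pid| Process.wait(pid) }
  results
end
```

Because a worker only asks for more work when it is idle, a slow item never leaves other workers waiting on a pre-assigned backlog, and since only the parent sees results, it can print them without interleaving.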

If anyone wants to work on this and would like the work funded, DM me via Twitter: @samsaffron

SamSaffron on 18 May 2020

🚀17 ❤15 👍15

My plan is to build support for parallelism into rspec-core, which solves many of the plug'n'play problems, both as our own option and as the ability to farm work out to other plugins. Fork is absolutely in those plans. I very much like the idea of the master RSpec process controlling the queue like that, as it makes non-interleaved output much easier, which is essential.

From there, rspec-rails would then integrate the Rails system test helpers for their database support, etc.

I hadn't thought about the possibility of such a model being a long-running process that reloads and reruns specs; that's got its own set of complications (as I'm sure you know!)

JonRowe on 18 May 2020

👍8

@JonRowe it must also be noted that parallelism in RSpec would significantly speed up all the "TDD devops" tools that happen to be based on RSpec (like InSpec and ServerSpec). Just a side dish, but a very nice one, and definitely useful to the reputation of Ruby as a whole ^_^. Given that, maybe Chef would also be willing to provide some funding in that area (I'm not working there, just making a wild guess!).

thbar on 18 May 2020

👍4

@JonRowe @SamSaffron My company, CommitChange, would be able to add a small amount to the parallel testing pot. It'd be well under $1000 (we just don't have the resources to spend a lot more) but it's something if it'd help. I totally get if this isn't enough to bother but there's likely many smaller users who might be able to put in a few hundred.

wwahammy on 18 May 2020

❤7 👍3

Thanks everyone, I have been chatting privately to @JonRowe and it looks like Jon may have some time for this project, stay tuned 🎊

SamSaffron on 19 May 2020

🎉64 👍19 ❤7

+1

tobsch on 18 Jun 2020

Any updates or PRs to share here? Super excited.

adenta on 15 Jul 2020

Discourse are sponsoring @JonRowe here a bit to ensure he has enough time to work on internals

We are also sponsoring @ioquatix who is working on a new gem to do general work queue based parallel testing with an rspec backend.

Nothing is ready for testing yet, but we hope to have something in the next couple of months.

SamSaffron on 15 Jul 2020

👍45 ❤15

That's great news! FWIW, we've also recently put up our own implementation that uses a pull model with separate processes and a centralized work queue, backed by Redis: rspecq. It's been working great (it has executed over 700 CI builds so far for a large RSpec suite in our Rails monolith). That said, it'd be awesome to have built-in support in rspec-core 🚀

agis on 15 Jul 2020

👍7

cc @ArturT

pirj on 31 Jul 2020

Thanks @pirj for pinging me.

Recently I've been working with @shadre on running a slow RSpec test file in parallel on multiple CI nodes, split by individual test examples.

We also prepared an article on how to run slow RSpec files in parallel using GitHub Actions.

ArturT on 31 Jul 2020

👎4 👀1

@SamSaffron @JonRowe may we have a quick update on the progress, please?

longnd on 19 Nov 2020

👍1

@ioquatix is working on a parallel runner and it is getting there; he can link to the repo and provide a bit of an update if you feel like testing.

We still need to polish off some small rough edges but it works overall.

SamSaffron on 20 Nov 2020

❤8 👍6 🚀5 🎉4

@ioquatix is working on a parallel runner, it is getting there he can link to the repo and provide a bit of an update if you feel like testing.

We still need to polish off some small rough edges but it works overall.

Do you mean this one: https://github.com/ioquatix/turbo_test?

ilyazub on 23 Nov 2020

👍1

Yes that is it ... Samuel is still working on it.


SamSaffron on 24 Nov 2020

👍1

I will give you an update this weekend. Sorry, it has been hectic preparing for (virtual) conferences.

ioquatix on 24 Nov 2020

❤2 👍2 🚀1

Meanwhile, I've extracted turbo tests from the Discourse and Rubygems source code into a separate gem: turbo_tests. Samuel's work is kinda more long-term from my perspective, as it doesn't use the parallel_tests gem.

PS. It's funny that we started working on the same thing separately on the same day (October 30th).

ilyazub on 24 Nov 2020

🚀1 ❤1
