Use Rails 6 built-in parallelizer API for running parallel RSpec executors
https://edgeguides.rubyonrails.org/testing.html#parallel-testing
https://github.com/rails/rails/pull/31900#issuecomment-374299402
We'd love to support parallel tests with RSpec itself; it's a rather large task, however. If we get parallelisation support into rspec-core, you can be sure we'll integrate with the Rails helpers.
Note that linked #1254 says:
There are a bunch of gems that allow for parallel execution, but all of these are process based and not thread based.
I believe the parallel testing now built into Rails can be process-based or thread-based, and by default is process-based.
Is supporting Rails process-based parallel testing less of a challenge? It looks like one of the main things Rails does for you when supporting this is setting up a separate database for each test worker, as well as orchestrating which tests each worker runs and aggregating the results.
In the short term all we could offer is to help integrate 3rd party parallel testing with Rails tools.
Our own parallel journey requires changes to rspec-core which are mostly identical for process or thread based.
Please note there are parallel testing extensions for RSpec already!
Discourse now has a solution based on parallel_tests https://github.com/discourse/discourse/pull/7778
Is there any update on supporting the activesupport parallelize helper in rspec-rails, now that Rails 6 has been released?
Sorry, I meant to include a link with more details:
No, RSpec itself needs to support parallelisation in order for it to be supported.
Sorry @JonRowe - can I please clarify something?
rspec-core needs updating even for multi-process parallelisation, right? (not just multi-threaded parallelisation as described in https://github.com/rspec/rspec-core/issues/1254, which I would assume is more challenging)
The main rspec-core runner doesn't support parallelisation, so it has no means of collating results from workers, be they threads or processes. Process based parallelisation is simpler than thread based due to, well, the GIL on MRI rubies and the lack of 100% thread safety within the other gems. (I'm fairly certain, for example, that there are threading bugs lurking within the mocks code.)
My ideal plan for taking us to parallelisation is to use the "fork" model that the bisect runner uses, maybe with a fall back to a "shell" model (essentially processes) but I haven't had the time to work much on it.
To summarize: regardless of whether it's threads or processes, rspec-core doesn't support it yet, so if we run a Rails 6.0 application and use RSpec, we won't benefit from it. Correct?
The helpers are there to support parallelisation, rspec on its own doesn't support parallelisation so you'll need an extension gem to benefit from it.
My ideal plan for taking us to parallelisation is to use the "fork" model that the bisect runner uses, maybe with a fall back to a "shell" model (essentially processes) but I haven't had the time to work much on it.
@JonRowe Would you copy all the bisect code, rename the namespace to parallel, and make the required changes?
No I'm writing it from first principles using the bisect runner as a guide and some other external gems.
@JonRowe thanks for taking this on! Is there anything we can do to help this along? Not sure I'm qualified to write any of the runner but if there's something, please let us know!
Hey guys! What's the latest on this?
Still awaiting finishing the core work
Since the Support Multithread Execution ticket on rspec-core doesn't seem to be a good fit, let me do a brain dump of tools that allow for parallelization of RSpec suite execution here:
before(:context)
All or most of those tools allow for parallelized execution in processes, arranging work with several databases via a more or less common TEST_ENV_NUMBER env var.
With such a variety of tools, it seems that parallel testing of Rails applications with RSpec is not really an unsolved problem.
On my last project, we were running a dozen machines, each running eight RSpec processes, which reduced the spec suite's wall-clock run time 50x.
By the way, it's not constrained to Rails 6; it works equally well in Rails 4.2+.
Based on the effort that had to be spent on building and maintaining such a tool while working full-time, I wouldn't really expect a comparable tool to appear out of thin air in the RSpec distribution and immediately become better than everything else on the market.
this will be awesome with rails 6+
@pirj I recently attempted to improve our CI testing using some of the tools you mentioned above, and I didn't have much luck. I didn't test all of them, but I did test the ones that were the most popular, and none really seemed to meet what we're looking for.
In an ideal world, a solution for this would do the following:
My understanding of the Rails feature is that it gets us more of the way there. This still feels like an unsolved problem to me for Rspec. I believe Jest does something like this at the file level in JavaScript. It would be awesome to see.
@LandonSchropp I have bad news for you if you really need to aggregate the results into a single standard RSpec report. It's a hard task with a number of non-obvious edge cases. We've solved it on my previous project, at least for those cases that we've faced. The report is gathered from different workers, spread over different processes or even machines. Unfortunately, it's not open-sourced yet.
If you're using a JUnit-compatible XML-based reporter, combining the results in the end for CI to consume shouldn't be a big deal though.
As to my memory, we had no big issues with aggregating SimpleCov results, but that worked before I joined and I didn't work on this part.
Auto-scaling to multiple available cores is a trivial task. Once you (depending on your system flavour and CPU type) get the number of processors, you're able to run rake db:test:create and rspec by prefixing them with a TEST_ENV_NUMBER, and you're done. Some prefer leaving one core free, but you may play around and run 2x more parallel processes than you have cores available. It depends on your tests what will yield the best results. Typically this is all done in a 20 LoC shell script.
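As a rough illustration of the core-count step, here is a minimal Ruby sketch rather than the shell script mentioned above. The helper names `worker_plans` and `run_workers` are hypothetical. It assumes the parallel_tests numbering convention (first worker gets an empty TEST_ENV_NUMBER, later ones get "2", "3", ...), and it leaves database creation out entirely.

```ruby
require "etc"

# Build the environment and command for each parallel RSpec worker.
# Assumption: worker 1 gets an empty TEST_ENV_NUMBER and later workers get
# "2", "3", ... so that database.yml can append it to the database name.
def worker_plans(worker_count = Etc.nprocessors, command: %w[bundle exec rspec])
  (1..worker_count).map do |n|
    { env: { "TEST_ENV_NUMBER" => (n == 1 ? "" : n.to_s) }, command: command }
  end
end

# Spawn every worker at once, then wait for all of them.
# Returns true only if every worker exited successfully.
def run_workers(plans)
  pids = plans.map { |plan| Process.spawn(plan[:env], *plan[:command]) }
  pids.map { |pid| Process.wait2(pid).last.success? }.all?
end
```

Calling `run_workers(worker_plans)` would start one process per detected core; every worker here runs the full command, so splitting the spec files between workers still has to be layered on top.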
Another non-trivial task is to evenly distribute your specs across those parallel processes. If you don't, you risk that one of the processes will be running for a significant amount of time, while other cores will finish already and will be idle. It is solved in Knapsack Pro and AFAIR in rrrspec.
The key is to record the execution time of the specs and feed specs to the RSpec runner dynamically, one by one. The most time-consuming specs, specs changed since the last run, and newly introduced specs are run first to reduce the unevenness of the distribution.
Frankly, I don't have any proof that this queuing system is faster than pre-divided execution on a pretty large suite.
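For comparison, the pre-divided approach can be sketched in a few lines of Ruby. This is not how Knapsack Pro or rrrspec work internally; it is a plain greedy "longest first" split, and `partition_specs` plus the shape of the `timings` hash are assumptions made for the example.

```ruby
# Split spec files across workers using recorded runtimes (greedy, longest first).
# `timings` maps a file path to its last known runtime in seconds. Files with no
# recorded runtime are treated as being as slow as the slowest known file, so
# new specs get scheduled early.
def partition_specs(files, timings, worker_count)
  fallback = timings.values.max || 1.0
  buckets = Array.new(worker_count) { { files: [], total: 0.0 } }

  files.sort_by { |file| -(timings[file] || fallback) }.each do |file|
    lightest = buckets.min_by { |bucket| bucket[:total] } # least loaded worker so far
    lightest[:files] << file
    lightest[:total] += timings[file] || fallback
  end

  buckets
end
```

Each bucket's `:files` list would then be passed to one RSpec process. Because the split is fixed up front, a single unexpectedly slow file can still leave other workers idle, which is the weakness the dynamic queue is meant to address.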
Don't divide the spec into individual examples if you're using let_it_be, or some other form of before(:context) with a transaction rollback. The expensive setup will multiply in this case.
My understanding of the Rails feature is that it gets us more of the way there.
Honestly, I'm not so sure about that. A major flaw I could spot is that Rails doesn't detect if a worker died and did not finish all the workload it was supposed to execute. A better way is to have a timeout, consider the worker dead, and spread its remaining workload among the live workers. This is also solved in my previous project.
@pirj Thanks for the detailed reply!
Sorry if my post sounded like I was trivializing the amount of work involved in this. That definitely wasn't my intention. My goal in my last message was more to enumerate my team's use case in the hopes that it could influence the future design of this feature. I'm sure that a refactor along these lines is a massive undertaking.
As to my memory, we had no big issues with aggregating SimpleCov results, but that worked before I joined and I didn't work on this part.
I'll have to take a second look at that. When I was investigating, it wasn't clear to me how I'd go about that. I found one plugin for integrating SimpleCov with parallel-tests, but it said it was only compatible with CircleCI, and we're using Codeship, so I didn't explore it further. If you have any hints you could share, I'd appreciate it.
Auto-scaling to multiple available cores is a trivial task. Once you (depending on your system flavour and CPU type) get the number of processors, you're able to run rake db:test:create and rspec by prefixing them with a TEST_ENV_NUMBER, and you're done.
Yep, that's what we're currently doing. On my laptop, I've got 8 cores, each with hyperthreading, so that spins up 15 processes to run on. With that many simultaneous tests running, you can see how managing output would be helpful. I hadn't looked into aggregating JUnit XML compatible results. Do you have an example of something like that with Rspec you could point me to? Maybe that would help solve some of our problems. Our CI processes have four cores available, so it's somewhat less of a concern there.
Frankly, I don't have any proof that this queuing system is faster than pre-divided execution on a pretty large suite.
That's roughly my experience as well. We have a few specs in our application that are slower than others, but for the most part the costs of individual specs are amortized over our files. Based on my observations of using Jest, it looks like it breaks the spec files into a queue, and then runs each file as a worker becomes available. In our use case, pre-splitting the specs has worked fine.
When I was exploring Knapsack, I actually felt like it was trying to solve a different problem than what we were experiencing. As you said, for us, it was less about distribution and more about taking advantage of multiple cores locally and in our CI environment.
A major flaw I could spot is that Rails doesn't detect if a worker died and did not finish all the workload it was supposed to execute. A better way is to have a timeout, consider the worker dead, and spread its remaining workload among the live workers. This is also solved in my previous project.
I can't speak to that. I'm sure you're absolutely correct, and I agree with you that the behavior you're describing sounds better than what's being done by the Rails runner.
What I meant more from my comment is that what's advertised from the Rails runner is more what we're after. It advertises that it automatically detects the number of cores on the machine, creates the test databases for us, and then splits up our tests and runs them. We love Rspec and will continue to use it over Minitest, but it would be awesome if there were an analog to the parallel features with Rspec that was as easy to use as the Rails solution.
Thanks again for your detailed reply. I really appreciate it!
I can't speak to the rest of it, but as for merging JUnit reports, I've done that before in a different context. The file format is pretty simple; it would just be a matter of a little Nokogiri code to open each report and output a bigger report with the contents of each. Just have each rspec process output in JUnit format and then handle the aggregation and printing of results yourself.
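A minimal sketch of that kind of merge might look like the following. It uses REXML, which ships with Ruby, in place of Nokogiri so that it has no extra gem dependency. The `merge_junit` helper is hypothetical, and it assumes each report's root is a flat `<testsuite>` or `<testsuites>` element carrying `tests`, `failures` and `errors` attributes.

```ruby
require "rexml/document"

# Merge several JUnit XML reports (given as strings) into one <testsuites> document.
# Assumption: suites are not nested, so every <testsuite> is counted exactly once.
def merge_junit(xml_reports)
  merged = REXML::Document.new
  merged << REXML::XMLDecl.new("1.0", "UTF-8")
  root = merged.add_element("testsuites")
  totals = Hash.new(0)

  xml_reports.each do |xml|
    doc = REXML::Document.new(xml)
    REXML::XPath.match(doc, "//testsuite").each do |suite|
      %w[tests failures errors].each { |key| totals[key] += suite.attributes[key].to_i }
      root.add_element(suite.deep_clone) # copy the suite with all of its test cases
    end
  end

  # Summed counters go on the root so CI tools can read the overall totals.
  %w[tests failures errors].each { |key| root.add_attribute(key, totals[key].to_s) }
  merged
end
```

In practice the strings would come from reading each worker's report file, and `merge_junit(reports).to_s` gives the combined XML to write out for CI.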
@JonRowe Discourse would be happy to fund some work here if you or any other contributors are looking to build it.
Some requirements from me :)
Dedicated gem, plug-and-play into your existing rspec rails app. Must be drop in replacement for Discourse.
Fork per testing process
Pull vs push model. (Our bin/turbo_rspec is still the old push model, which means that some workers can be doing nothing while other workers have backlogs of work.) Instead, something similar to @tmm1's test_queue: the master process has a queue of tests and child processes just ask it for the next test to run via an IO.pipe or some other cheap transport.
Non interleaved output like Discourse's bin/turbo_rspec
Long term, also offer Discourse's bin/rake autospec, which is basically an interruptible guard.
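To make the pull model in the list above concrete, here is a small Ruby sketch. It is not how test_queue or bin/turbo_rspec are implemented. `run_pull_queue` and its READY/DONE/STOP line protocol are invented for illustration, it uses `UNIXSocket.pair` as the cheap two-way transport instead of a pair of `IO.pipe`s, and it needs a platform with `fork`.

```ruby
require "socket"

# Pull-model sketch: the master owns the queue and each forked worker asks for
# its next job only when it is idle. The given block runs inside the worker.
# Returns a hash of worker index => jobs that worker completed.
def run_pull_queue(jobs, worker_count)
  queue = jobs.dup
  finished = Hash.new { |hash, key| hash[key] = [] }

  sockets = Array.new(worker_count) do |index|
    parent, child = UNIXSocket.pair
    fork do
      parent.close
      child.puts("READY")                 # first request for work
      while (job = child.gets&.chomp) && job != "STOP"
        yield job                         # e.g. run one spec file
        child.puts("DONE #{job}")         # report, and implicitly ask for more
      end
      exit!(0)                            # skip at_exit hooks inherited from the master
    end
    child.close
    [parent, index]
  end.to_h

  until sockets.empty?
    readable, = IO.select(sockets.keys)
    readable.each do |sock|
      line = sock.gets
      if line.nil?                        # worker died; its in-flight job is lost in this sketch
        sockets.delete(sock)
        next
      end
      finished[sockets[sock]] << line.chomp.delete_prefix("DONE ") if line.start_with?("DONE ")
      if (job = queue.shift)
        sock.puts(job)
      else
        sock.puts("STOP")
        sockets.delete(sock)
        sock.close
      end
    end
  end

  Process.waitall
  finished
end
```

Since all results pass through the master, it is also the natural place to buffer each worker's output and print it without interleaving. Requeueing the job of a worker that died is deliberately left out.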
If anyone wants to work on this and would like the work funded, DM me via Twitter. @samsaffron
My plan is to build support for parallelism into rspec-core, which solves many of the problems with plug'n'play, both as our own option, and the ability to farm work out to other plugins. Fork is absolutely in those plans. I very much like the idea of the master rspec process controlling the queue like that, as it allows much easier non interleaved output, which is very much essential.
From there rspec-rails would then integrate the Rails system test helpers for their database support etc.
I hadn't thought about the possibility of such a model being a long running process, and reloading / rerunning specs; that's got a set of complications all on its own (as I'm sure you know!)
@JonRowe it must be noted too that parallelism in Rspec will also give a strong acceleration to all the "TDD devops" tools which happen to be based on RSpec (like InSpec and ServerSpec). Just a side-dish, but a very nice one, and definitely useful to the reputation of Ruby as a whole ^_^. Given that, maybe Chef could also be willing to provide some funding in that area (I'm not working there, but just making a wild guess!).
@JonRowe @SamSaffron My company, CommitChange, would be able to add a small amount to the parallel testing pot. It'd be well under $1000 (we just don't have the resources to spend a lot more) but it's something if it'd help. I totally get if this isn't enough to bother but there's likely many smaller users who might be able to put in a few hundred.
thanks everyone, I have been chatting privately to @JonRowe and it looks like Jon may have some time for this project, stay tuned :confetti_ball:
+1
Any updates or PRs to share here? Super excited.
Discourse are sponsoring @JonRowe here a bit to ensure he has enough time to work on internals
We are also sponsoring @ioquatix who is working on a new gem to do general work queue based parallel testing with an rspec backend.
Nothing is ready for testing yet, but we hope to have something in the next couple of months.
That's great news! FWIW we've also recently put up our own implementation that uses a pull model with separate processes and a centralized work queue, backed by Redis: rspecq. It's been working great (having executed over 700 CI builds so far, for a large RSpec suite in our Rails monolith). That said, it'd be awesome to have built-in support in rspec-core :rocket:
cc @ArturT
Thanks @pirj for pinging me.
Recently I've been working with @shadre on running in parallel a slow RSpec test file split by test examples on multiple CI nodes.
We also prepared an article on how to run slow RSpec files in parallel using Github Actions.
@SamSaffron @JonRowe may we have a quick update on the progress, please?
@ioquatix is working on a parallel runner; it is getting there. He can link to the repo and provide a bit of an update if you feel like testing.
We still need to polish off some small rough edges but it works overall.
@ioquatix is working on a parallel runner; it is getting there. He can link to the repo and provide a bit of an update if you feel like testing.
We still need to polish off some small rough edges but it works overall.
Do you mean this one: https://github.com/ioquatix/turbo_test?
Yes that is it ... Samuel is still working on it.
I will give you an update this weekend. Sorry, it has been hectic preparing for (virtual) conferences.
Meanwhile, I've extracted turbo tests from the Discourse and Rubygems source code into a separate gem: turbo_tests. Samuel's work is kinda more long-term from my perspective, as it doesn't use the parallel_tests gem.
_PS. It's funny that we started working on the same thing separately on the same day (October 30th)._