Pants: Target Hitting Recursion Limit During Pants Setup (with workaround)

Created on 18 Nov 2020  路  3Comments  路  Source: pantsbuild/pants

Description of Problem

We鈥檙e in the process of migrating from 1.25 to 2.1.0., and hit an issue trying to run a test on specific target. The target is large and results in a max recursion limit exceeded.

I tried hacking on sys.setrecursionlimit and found for our use case 1021 was the min that would allow the test to succeed.

We can try breaking that target up, but the app it is testing is kind of a monolith so i don鈥檛 know how successful that would be.

Can you make a runtime limit in pants to handle?

This error happens in the pants setup before our pytest is run.

Workaround

In one of our plugin's register.py we added sys.setrecursionlimit(1021) and this resolved our problem.

bug user reported

All 3 comments

Although the original issue was resolved, the fix from #11202 seems to be able to peg CPUs when running on a target with a sufficiently large number of files (44 files with 1500 deps each in a graph of more than 200k edges).

We have not yet been able to confirm that the issue is an infinite loop via a profile: profiles appear to show multiple threads individually making progress. And diving in to microbenchmark (of just the code ported in #11202), it appears that we have some super-linear behavior due to lock contention (both the GIL and possibly also our interning):

>>> took 0.40951013565063477 with 1 threads
>>> took 2.203711986541748 with 2 threads
>>> took 11.812678098678589 with 4 threads
>>> took 24.522939205169678 with 8 threads

This is a good microbenchmark to examine both GIL acquisition and interning, so I'd like to see how much we can improve this by reducing contention here (which should pay dividends elsewhere in the codebase). If reducing lock contention isn't sufficient, we can look into adjusting the TransitiveTargets code to do more structure sharing (as it did before #10409) to avoid redundant work.

Status update: I believe that I have a holistic fix for this, as well as for a few other TransitiveTarget performance concerns. I need to nail down one issue with the @rule graph first, but am hopeful that I can have it reviewable/cherry-pickable tomorrow.

Unfortunately the holistic solution (#11270) has a blocker (#11269). For now I'm going to revert the first fix for this ticket, and fix it via #11271.

Sorry for the big mess! Graph cycles are the bane of my existence =)

Was this page helpful?
0 / 5 - 0 ratings