Pants: Generic incremental strategy (white/blacklist) for fmt/lint

Created on 11 Dec 2018 · 27Comments · Source: pantsbuild/pants

Chances are when folks want to enable any linter/formatter, not all targets can be compliant at the same time, and normally this would be an incremental process.

Therefore a generic mechanism to allow incremental fmt/lint tasks migration would be beneficial, i.e. blacklist, whitelist, or both.

Source

wisechengyi

🎉1 👍1

Most helpful comment

@codealchemy is going to take this on as a starter project. Thanks!

benjyw on 19 Feb 2019

🎉2

All 27 comments

Hi @stuhood @benjyw @jsirois, if the idea sounds sane, I'll probably follow the pattern of https://github.com/pantsbuild/pants/blob/327ddc52e212e972ba9130599e850e570d9baa50/src/python/pants/task/target_restriction_mixins.py

Logic:

whitelist and blacklist cannot be specified at the same time
the content for the lists would be a list of target addresses, so the lint/fmt tasks can filter on them.

wisechengyi on 11 Dec 2018

Yea, could work.

The other prior art though is https://github.com/pantsbuild/pants/blob/327ddc52e212e972ba9130599e850e570d9baa50/src/python/pants/task/fmt_task_mixin.py ... and that might be closer to the mark for this case in general.

stuhood on 11 Dec 2018

Maybe this could be generic to all tasks? You can imagine tasks that are allowed to not compile for some reason (e.g., we have some in the pants repo, and they prevent us from doing ./pants compile ::). So maybe this could be a global option that points to a JSON file that maps task scopes to target regexps? And these would apply to subscopes, so that this would work:

{
  'target_restrictions': {
    'blacklist': 
      '': [<target regexes>], # Targets ignored by all tasks, similar to --exclude-target-regexp
      'lint': [<target regexes>], # Targets ignored by all tasks in the lint goal
      'lint.scalastyle': [<target regexes>], # Targets ignored by just the scalastyle task
  },
  'whitelist': {
    ... same idea ...
  }
}

Then you can implement this completely generically at the v1 task level, by returning just the right targets from context.targets(). It seems like about the same amount of work, and might be better than an ad-hoc solution for lint/fmt. Does that make sense?

benjyw on 11 Dec 2018

@benjyw the problem I see with the generic approach is that you can write down invalid combinations when product dependency chains are involved. If I blacklist a jvm target from compile but I forget to blacklist it from test, test will blow up. It seems like the generic approach would need to propagate target restrictions along product dependency edges. It seems like a solvable problem, but maybe more work up front than "leaf" restrictioons like fmt and lint that (currently) produce no products that other tasks consume.

jsirois on 11 Dec 2018

👍1

Yea, IMO this should only apply to roots via the root Task (or @console_rule) being aware of some explicit option and skipping those roots.

EDIT: And also, that's much more portable to v2.

stuhood on 11 Dec 2018

We still need some kind of solution for, e.g.,./pants test :: not working in our own repo. We should strive for a way of making that work, as it's a pretty natural idiom.

We can restrict it to tasks that produce no products at first, until we figure out the rest?

Or, going to the opposite extreme - could we implement this via target tags? Or is it a strong requirement that this list is managed in one place? If so, could the lists in fact be lists of tags that are centralized but act as-if present on the targets? Then we can completely decouple how we specify which things to skip from the logic of skipping, which now only needs to look at tags on individual targets.

benjyw on 11 Dec 2018

Basically I'm trying to make this as general-use as possible, and not ad-hoc. And we have the tags mechanism that is crying out for uses.

benjyw on 11 Dec 2018

I did not consider the details of Benjy's proposal closely enough.

Then you can implement this completely generically at the v1 task level, by returning just the right targets from context.targets().

Currently, we create 1 Context object for the whole pants run that all Tasks are constructed with. This would mean, that for a given target restriction configuration we'd supply the intersection of restrictions to the 1 Context object. As a result, ./pants fmt compile doc :: would format compile and generate docs for just the slice of targets passing lint restrictions for example. That would be safe, but might be surprising. We could switch to creating a Context per task (each sharing a single Products) in order to feed a different set of active targets to tasks with restrictions configured, but this would lead to the worry about missing products discussed above. The engine though does have enough information to fail the run early when it calculates an implicated task restricts off targets / products needed by other Tasks. We could then say something like "The lint task cannot run for [targets] which are needed for lint-report, please either: 1. run ./pants [restricted targets] lint-report or 2. Update lint-report restrictions to match lint." when the engine fails fast.

I do agree it would be nice to not be ad-hoc with this.

jsirois on 11 Dec 2018

Well, the existing context.targets() can present different sets of targets to different tasks, since the v1 engine knows which task is currently executing, and can modify context appropriately.

benjyw on 11 Dec 2018

But again I'd also be fine with a target tags solution. We can structure the tags so they can be more generic in the future (e.g., 'skip', 'skip-', 'skip-.') but initially only support these for goal=lint or goal=fmt, and implement specially for those tasks.

In other words, I don't object to this currently only being implemented for lint and fmt, I just don't want an ad-hoc schema that only applies to those tasks. As long as it's generalizable in some obvious way, I'm happy.

benjyw on 11 Dec 2018

👍1

We still need some kind of solution for, e.g.,./pants test :: not working in our own repo. We should strive for a way of making that work, as it's a pretty natural idiom.

I agree that this is a problem, but the solution proposed via --no-fast (and also in v2's natural mode of operation: no batching of targets) is to actually fix ./pants test :: to work the same way that running ./pants test $target_n N times would work.

BUT, it is still the case that it can be useful to skip flaky tests using ticket flags (Twitter uses a known_to_fail=JIRA-1234 tag convention for skipping flaky tests). So a generic target flag / tag skipping facility might be good... I would just ask that we make sure it integrates well with the v2/@console_rule model. And it would require a lot more design to make generic I think.

stuhood on 11 Dec 2018

Regarding the scope of the change:

I think John had a good point on having to specify the combination of targets. In the fmt/lint case, targets are independent from each other, i.e. say A depends on B, when ./pants lint A, if linting on B fails, it is okay just to blacklist B, and A is not affected.

Maybe I am oversimplifying the issue here because, for example, android linting would need the entire transitively context, but so far the lint/fmt tasks we currently have in Pants operate on individual target level.

Via tags

The idea seems doable, would it look like the following?

scala_library(
  sources = ...
  tags = {lint.scalafmt}
}

java_library(
  sources = ...
  tags = {no.lint.google-java-format}
}

Via lists

The white/blacklist options seem more straight-forward as there's a separate list for each task.

wisechengyi on 11 Dec 2018

Yes, except I'd do no-lint.google-java-format, to differentiate the prefix from the dotted scope.

The lists can be implemented as out-of-band tags. That is, we apply the tags from the lists to each target on creation. Then everything can proceed as-if they were inline tags. Then this is reusable for all sorts of things, e.g., we could add the integration tag to integration tests by matching a name regex, instead of having to individually tag every integration test target.

benjyw on 11 Dec 2018

Ok I can attempt to implement having fmt/lint reading tags first, then add lists at a later step.

wisechengyi on 11 Dec 2018

This functionality would be exceptionally useful in next project I want to do to add type hints to Pants.

Would also be helpful to Foursquare's gradual adoption of type hints, as they haven't been able to turn mypy on in CI yet even though some modules are now typed. cc @OniOni and @GoingTharn

MyPy may violate one of your assumptions, though.

In the fmt/lint case, targets are independent from each other, i.e. say A depends on B, when ./pants lint A, if linting on B fails, it is okay just to blacklist B, and A is not affected.

With type hints, if you use code from other modules, it will depend on that module, such as module A depending on module B. If there are issues with underlying module B, it will result in those errors for module B surfacing when running mypy moduleA. It will, at least, only surface the errors from module B relevant to module A, so the scope is limited. But there is scope creep.

This applies to other linters too, such as Pylint complaining about not-a-member when module A tries to use a value from module B that does not exist.

I think this is all valid behavior that we want to keep—which we will because the tools themselves handle these relationships. If I understand this proposal correctly, it sounds like Pants will simply be passing a limited set of files to target to the tool itself.

Eric-Arellano on 12 Dec 2018

@Eric-Arellano In the mypy case, would the lint call mypy in a single invocation for all the transitive sources? That is, in the case of A depending on B, ./pants lint A would call mypy A.py B.py

wisechengyi on 12 Dec 2018

@wisechengyi with MyPy, ./pants lint A would call mypy A.py. In the process, MyPy would look up the code for B.py and see if it's used correctly in A.py. That is, it is not directly linting B.py, but using it to help lint A.py, and it may complains about issues that are in B.py that relate to A.py.

This is compared to something like pycodestyle, where ./pants fmt A has zero dependencies on other files.

Eric-Arellano on 12 Dec 2018

@Eric-Arellano : I think that for that case, mypy should maybe be in the compile goal instead.

stuhood on 12 Dec 2018

@stuhood MyPy being in the compile goal could make sense. Technically, MyPy is classified as a linter because it's run over code that's already written and is not at all necessary for Python to actually work. But it is doing similar work to what a strongly typed compiler does.

One counterpoint to this, however: Pylint's not-a-member check. ./pants lint A will call pylint A.py, which will look up the code to B.py to see that it is used correctly in A.py. The relationship dependencies are the same to MyPy (although the scope of the check is of course simpler, only checking if the member exists rather than understanding the module's types).

With both of these tools, they themselves will manage the dependencies for us. We don't need to pass our relation graph to them, and I don't think that would even be possible. All we will do is pass ./pants lint A and then MyPy and Pylint will find the relevant files, based off of import statements.

Eric-Arellano on 12 Dec 2018

./pants lint A will call pylint A.py, which will look up the code to B.py to see that it is used correctly in A.py.

This essentially makes the pants call transitive because when calling mypy A.py, Pants has to prepare B.py to be on PYTHONPATH

wisechengyi on 12 Dec 2018

Dropping in to say: we determined during a retro that this would have been helpful with the 1.14.x release, which got python 3 linting working stably (and thus need a bunch of per-file suppressions, and fixes for anything that could not be suppressed). Making this work generically for more linters/formatters would be great.

stuhood on 13 Feb 2019

👍1

It might be difficult for me to jump back on this within one month or two. Please reassign if it's urgent.

In the meantime, is #6947 still an option to get the feature going, then we can follow up to swap the internal implementation?

wisechengyi on 14 Feb 2019

I can take a swipe at this, perhaps.

benjyw on 14 Feb 2019

Or actually - we have a new employee starting next week, this might be a great starter project for them.

benjyw on 14 Feb 2019

🎉2

In the meantime, is #6947 still an option to get the feature going, then we can follow up to swap the internal implementation?

I think that with minor tweaks to use explicit Subsystems rather than Tasks, that model could work, yea.

stuhood on 15 Feb 2019

@codealchemy is going to take this on as a starter project. Thanks!

benjyw on 19 Feb 2019

🎉2

@codealchemy : Thanks for the patch! The meat of this is now implemented, so will resolve in favor of #7302.

stuhood on 2 Mar 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Add gRPC and MyPy support to Python Protobuf

Eric-Arellano · 5Comments

add support for binary build with pyinstaller + staticx

adabuleanu · 6Comments

python_binary doesn't have the singular `source=` argument documented

cosmicexplorer · 4Comments

Add Black support to pants and dogfood it

pierrechevalier83 · 4Comments

`genrule` support via isolated processes

stuhood · 4Comments