Dvc: hooks: pylint: pre-commit failure

Created on 1 Jul 2020 · 19Comments · Source: iterative/dvc

.pre-commit-config.yaml states that the pylint hook is defined locally, yet .pre-commit-hooks.yaml fails to define it. As a result, attempting to commit will result in an error:

~/gits/dvc (master)$ git commit
black....................................................................Passed
seed isort known_third_party.............................................Passed
isort....................................................................Passed
flake8...................................................................Passed
pylint...................................................................Failed
- hook id: pylint
- exit code: 1

Executable `pylint` not found

beautysh.............................................(no files to check)Skipped
DVC pre-commit...........................................................Passed
- hook id: dvc-pre-commit
- duration: 0.5s

Data and pipelines are up to date.

Defining the hook's language as system and asking devs to manually run pip install .[tests] goes against fundamental principles of pre-commit hooks. None of the other hooks require this.

In any case pylint itself defines .pre-commit-hooks.yaml so we should just use theirs.

build discussion testing

Source

casperdcl

Most helpful comment

Ok, so looks like "do nothing" is the choice for now. Closing.

efiop on 15 Jul 2020

👍2

All 19 comments

@casperdcl, as pre-commit runs on an isolated environment, we'd need to disable a lot of configuration to do this way.

Also, contributors should already install from .[tests]. Without that, how'll they be running tests?

None of the other hooks require this.

dvc-pre-commit is already a local run, for which you do need to have activated virtualenv beforehand and installed all reqs.

Is it too difficult to pip install .[tests]? If it's that inconvenience, we could just pop it off from the pre-commit hooks.

See: https://github.com/pre-commit/pre-commit-hooks/issues/157

skshetry on 2 Jul 2020

setting correct configuration via the hook itself (e.g. commandline args) is the correct way
I'd strongly recommend against pip install .[tests] especially for DVC where there are many tests which won't even run locally due to lack of dependencies/credentials. pre-commit => quick local tests, CI => full tests. On the rare occasion where an outside contributor is developing a big new feature where it's a pain to constantly wait for the full CI tests, it's usually easiest (and probably required) to write a new test... and just run that test manually locally when debugging. Certainly not the whole test suite.
- I'd even more strongly recommend against breaking pre-commit install by suddenly insisting on pip install .[tests] as a new pre-requisite
re: dvc-pre-commit; having a local run hook that is genuinely provided locally is acceptable - in this case, dvc is really provided by the local repo
even with the hook's dependencies locally installed - pylint, pylint-pytest (not available via conda) and pylint-plugin-utils - the hook takes a long time to run, which is annoying.

For now I'm fine with popping off from the pre-commit hooks as you suggest; this seems far less painful than leaving it as-is.

casperdcl on 2 Jul 2020

re: dvc-pre-commit; having a local run hook that is genuinely provided locally is acceptable - in this case, dvc is really provided by the local repo

@casperdcl We still need to pip install all dvc dependencies for that, and pylint is a test dependency, so we can run system's pylint just fine.

Do I understand correctly that you guys are discussing removing pylint from pre-commit hooks? I'm strongly against that, it is helping to discover issues before creating a PR and is certainly faster than waiting for CI to error-out.

efiop on 2 Jul 2020

@efiop, I'm still looking for answer for "Is it too difficult to pip install .[tests]?" from @casperdcl.
Ideally, I'd also love to have it isolated, but in practice, the benefits outweigh the cons.

If it's too hard to install our test dependencies, I could create a custom script that just throws a message to the user
complaining that pylint is not installed and does not break pre-commit runs. But, there should be a very strong argument to do so.

skshetry on 2 Jul 2020

👍2

@skshetry @casperdcl why does pylint is different in this sense from black, isort and other "linters"? (just curious where does this problem come from). They should be the same to mind, have the same flow, conventions, etc.

shcheklein on 2 Jul 2020

@shcheklein, pylint also does dynamic analysis for some checks, not just static analysis.

You can see some failures here: https://github.com/iterative/dvc/pull/4146/checks?check_run_id=827566035#step:5:124

When in isolated environment, not all dependencies are installed and pylint starts complaining about missing-import or no-member. We could either disable these, but as we have just started now, it's better to keep it for now, and we'll see.

Also, we are using a plugin pylint_pytest that does dynamic execution to get all the fixtures to cut down the noise (related to function arguments name being same as pytest fixtures).
And, also I wanted to enable pylint for all of our codebase (including tests), we have our own custom plugin to silence some issues that are thrown in tests.

So, TLDR: it does dynamic execution too, and is hence able to find more issues.

skshetry on 2 Jul 2020

@skshetry gotcha, makes sense! I would vote probably for providing some happy path in simple scenarios (warn but do not fail if env is not fully setup), assuming we always run it on CI/CD.

shcheklein on 2 Jul 2020

👍1

Is it too difficult to pip install .[tests]?

"Difficult" is not really the adjective I'd use... but for a casual newcomer, yes. It would be a complication that would be difficult and off-putting. Personally I'd say "nuisance" and "incorrect" rather than "difficult" (see below)

why does pylint is different in this sense from black, isort and other "linters"?

This is the point. It's not really a linter. It's really more of a full-blown test suite.

it takes a long time to run (several seconds each commit)
it actually imports all libraries mentioned in the code to do the analysis and thus errors out when run in an isolated env where these deps aren't installed - very unlike a linter (black/isort/flake8)
python -m tests is not added to pre-commit so neither should pylint, really
pip install .[tests] is for python -m tests, while pre-commit install is for, well, pre-commit. There should not be a dependency.

casperdcl on 2 Jul 2020

So, TLDR: it does dynamic execution too, and is hence able to find more issues.

exactly. python -m tests can be described in precisely the same words. Unless you want to add python -m tests (perhaps with --lf flag) to pre-commit hooks too, pylint really shouldn't be there.

casperdcl on 2 Jul 2020

it takes a long time to run (several seconds each commit)

this is a good point, btw. I would say that even existing DVC hook (dvc status?) is quite annoying already when you develop fast and do commits.

shcheklein on 2 Jul 2020

So looks like nothing to do here? Or am I missing something?

For the record: if you want to disable the hooks just use --no-verify.

efiop on 14 Jul 2020

@efiop I think the consensus between (@shcheklein, @skshetry, and I) is to either remove or allow failures of the hook in pre-commit (but leave it in CI).

we could just pop it off from the pre-commit hooks.

See: pre-commit/pre-commit-hooks#157

gotcha, makes sense! I would vote probably for providing some happy path in simple scenarios (warn but do not fail if env is not fully setup), assuming we always run it on CI/CD.

As I understand, @efiop is against this idea because:

it is helping to discover issues before creating a PR and is certainly faster than waiting for CI to error-out

However, the same could be said of every single non-authenticated CI/CD test. It is also not a fast test, it takes a few seconds so really shouldn't happen pre-commit. Using --no-verify also disables ALL pre-commit hooks which is not a solution.

this is a good point, btw. I would say that even existing DVC hook (dvc status?) is quite annoying already when you develop fast and do commits.

casperdcl on 14 Jul 2020

I feel like we are mixing up topics here.

The original issue is pre-commit failure, which, as described above, is not nice, but an intended way of using it. I see @casperdcl suggested using init hook in https://github.com/iterative/dvc/pull/4146#issuecomment-652552793 , which looks like the proper way to solve it. @skshetry could you take a look, please?

The second issue that came up is pylint not being very fast (and, well, the whole pre-commit set of hooks). I can't see any way of making it significantly less intrusive without defeating the purpose of having pre-commit hooks in the first place. I would say that the slowest in the pack is our dvc hook itself, so probarly would start there (it was the intention so we eat our own food).

efiop on 14 Jul 2020

👍1

Not sure about the "DVC hook taking the longest" though - on my machine pylint it much slower; maybe that's just me.

casperdcl on 14 Jul 2020

I see @casperdcl suggested using init hook in #4146 (comment)

I don't like that approach/hack, it's far easier to pre-activate virtualenv. And, it does not solve the issue, it still needs
all of the tests dependencies to be installed.

The possible ways to solve this are:

Remove pylint from pre-commit altogether. Add a guide to how-to run pylint in contributing docs.
Create a wrapper script that'd be a no-op if pylint is not installed.
Do nothing.
Make it a pre-push hooks?

I don't have any strong opinions on either of the above.
If we cannot commit to anything, I'd suggest we close the issue and defer it for later.

cc @casperdcl @efiop

skshetry on 14 Jul 2020

👍2

I was thinking about option (2) as well; sounds reasonable?

casperdcl on 14 Jul 2020

Noop is too dangerous, people will just ignore it. Could try to improve the message, but I'm not sure if it could help much, since it is descriptive already. If not that and there is no better option, I would just leave it as is for now.

efiop on 14 Jul 2020

I think no-op would be frustrating for contributors -> silent pass just to fail on PR, each time we have new python env.