Dvc.org: scripts: link check compatibility and docs

Created on 8 Apr 2020  Â·  13Comments  Â·  Source: iterative/dvc.org

Compatibility: it doesn't run out-of-the-box because pcregrep is not usually on the OS.

  • Not sure about MacOS (but I remember sed was a problem there)
  • On Ubuntu I had to sudo apt install pcregrep
  • For Windows, I got it from https://sourceforge.net/projects/gnuwin32/files/pcre/, installed and put in PATH.
    sed will likely be a problem in some Windows terminals too, but it can also be obtained from GNUWin (http://sourceforge.net/projects/gnuwin32/files/sed).

By "and docs" I mean we should probably at least write the above info and how to use these commands in the README.

bug dependencies enhancement

All 13 comments

Cc @casperdcl

It sounds too complicated if we run it locally on pre-commit. We should not require installing stuff with apt (or even worse on Windows) for the link checker.

Sure, I guess I'm just suggesting we add this info to the README, at least 🙂
...so maybe this is a duplicate or can be merged into #1125

no, I think it's a bug to require this in the first place. We should not require more than yarn on most reasonable platforms.

...so maybe this is a duplicate or can be merged into #1125

no, this is a different one and is not related with the recent migration to Gatsby.

Got it. Also note it seems to have another bug as noted in https://github.com/iterative/dvc.org/issues/1000#issuecomment-610762657:

It's outputting

find: ‘/.../dvc.org/pages/’: No such file or directory

every time now. I think it has to be updated given the new project structure.

find: ‘/.../dvc.org/pages/’: No such file or directory

can we do a quick fix for this?

You're right, that's pretty simple ha. OK fixed in 9665edc3 (#1124)

Compatibility: it doesn't run out-of-the-box because pcregrep is not usually on the OS [...] Not sure about MacOS (but I remember sed was a problem there) [...] likely be a problem in some Windows terminals too

and

[...] sounds too complicated if we run it locally on pre-commit. We should not require installing stuff with apt (or even worse on Windows) for the link checker [...] it's a bug to require this in the first place. We should not require more than yarn on most reasonable platforms.

link-check was never intended to be run locally. We had discussions about this originally and there were several reasons:

  • pcregrep really is required. Most regex implementations (including python's own as well as any I could find via yarn) lack the features required to make nested bracket detection (as with markdown links) work flawlessly
  • it can take a few minutes to check all links (which is why it's best done in the cloud via a cron job)
  • ergo the scripts were written in bash, which itself could be an issue on e.g. Windows

Compared to the hassle of finding work-arounds, I don't think it's too much of an issue to wait for small diffs to be checked by CI rather than installing & running locally.

By "and docs" I mean we should probably at least write the above info and how to use these commands in the README.

... maybe. I think just documenting in code is enough purely for the purpose of maintenance of CI scripts.

The relevant lines are:

https://github.com/iterative/dvc.org/blob/7d259fb8a480c46e26a42577524efd61945e1b61/.circleci/config.yml#L35
https://github.com/iterative/dvc.org/blob/7d259fb8a480c46e26a42577524efd61945e1b61/package.json#L18-L19

using:

both of which point to:

https://github.com/iterative/dvc.org/blob/7d259fb8a480c46e26a42577524efd61945e1b61/scripts/link-check.sh#L1-L6

@jorgeorpinel let's remove it from docs for now. Since there is not intention to support it locally for now, let's not bother documenting a lot of stuff.

@casperdcl may be we can write a comment in the script itself about its dependencies and this script is purely for CI to run?

let's remove it from docs for now

Removed in 4bf9eea7 (for #1160)

Also worth looking at the original PR #958.

E.g.: https://github.com/iterative/dvc.org/pull/958#issuecomment-583359254 states:

[prcregrep is] one of the reasons why this is a CI-only check by default rather than a pre-commit hook... no guarantee that it'll run on dev machines

Closing this. I think it's resolved more or less. Decided to keep it CI focused mostly.

Was this page helpful?
0 / 5 - 0 ratings