Dvc.org: scripts: link check improvements

Created on 15 Feb 2020  ·  10 Comments  ·  Source: iterative/dvc.org

UPDATE (by @shcheklein) - making it p0 since people stopped paying attention to CI due to the last item in the list below. We can address only the last item in a separate p0 PR.


Regarding checking for dead links:

  • support comments in exclusion list (https://github.com/iterative/dvc.org/pull/978#issuecomment-586562774)
  • support excluding files/dirs from checks (.gitignore-like https://github.com/iterative/dvc.org/pull/997#issuecomment-586545594)?
  • support excluding lines (noqa-like https://github.com/iterative/dvc.org/pull/997#issuecomment-586545211)?
  • possibly yarn build and then look for <a href>/<img src>
  • update CHECK_LINKS_RELATIVE_URL in CI automatically (instead of https://dvc.org, use predictable URLs - https://devcenter.heroku.com/articles/github-integration-review-apps#selecting-the-url-pattern) so that cross-references to newly added content in PRs can be checked
  • on 405 try GET instead of HEAD. Affects towardsdatascience links. Remove them from the exclusion list after that.
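
As a rough illustration of the first and last items above, here is a minimal Python sketch (the actual link check scripts are shell scripts, so this is not their implementation). It assumes the exclusion list is a text file with one pattern per line and that only full-line `#` comments are allowed, since `#` also appears in URL fragments. The names `load_exclusions`, `fetch_status` and `check_link` are hypothetical.

```python
import urllib.error
import urllib.request


def load_exclusions(text):
    """Return exclusion patterns, skipping blank lines and full-line '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        # Only whole-line comments are dropped, so "https://x/#anchor" survives.
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def fetch_status(url, method, timeout=10):
    """Return the HTTP status code for a single request (needs network access)."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        return error.code


def check_link(url, fetch=fetch_status):
    """Try HEAD first; if the server answers 405 Method Not Allowed, retry with GET."""
    status = fetch(url, "HEAD")
    if status == 405:
        status = fetch(url, "GET")
    return status
```

The `fetch` parameter exists only so the fallback logic can be exercised without network access. Connection failures (`urllib.error.URLError`) are not handled here and would need their own policy.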
Labels: discussion, enhancement, priority-p0, website

All 10 comments

btw @shcheklein if you're happy to revert https://github.com/iterative/dvc.org/blob/64f4c4840bac3262710def805ad609de04650f8c/scripts/link-check-git-all.sh#L2 back to using git instead of find then it would be much easier to actually use git's internal .gitignore logic to exclude files
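
A minimal Python sketch of the idea, for illustration only (the real script is a shell script): `git ls-files` lists tracked files, so untracked and ignored files are left out without reimplementing .gitignore matching. The function names and the `*.md` pathspec are assumptions, not taken from the repository.

```python
import subprocess


def parse_ls_files(output):
    """Split NUL-delimited `git ls-files -z` output into a list of paths."""
    return [path for path in output.split("\0") if path]


def tracked_files(repo_dir=".", pathspec="*.md"):
    """List files tracked by Git in repo_dir that match pathspec.

    Untracked files, including those matched by .gitignore, are not listed.
    """
    result = subprocess.run(
        ["git", "ls-files", "-z", "--", pathspec],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_files(result.stdout)
```

Using `-z` keeps paths with spaces or unusual characters intact, which plain line splitting would not guarantee.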

@casperdcl also it's outputting

find: ‘/.../dvc.org/pages/’: No such file or directory

every time now. I think it has to be updated given the new project structure.

See also #1123

Feel free to use #1124 to address some of this stuff.

UPDATE: Moving to https://github.com/iterative/dvc.org/issues/1123#issuecomment-610765722

Updated. Made the last item in the list p0, since people stopped paying attention to CI.

The first step to solve this is to set the predictable Heroku deploy URLs on the dvc.org Heroku project. This opens up the question of which prefix to use, but I would think the example behavior of just using the project name, dvc-landing, is pretty acceptable to everyone since nothing but machines will interact with these URLs.

From there, we'll probably have to make sure the link checker is able to handle the case where a CI check runs before the deploy preview is up. The current integrations on Heroku or CircleCI may already handle this, but if so I'm unaware.

Looking at the link check scripts, it should be pretty easy to adapt them to the predictable Heroku PR URLs once Heroku and CircleCI are set up for it. There is one quirk: the CircleCI docs for the built-in CIRCLE_PR_NUMBER env var say it is only present on "forked PRs", which could be interpreted to mean it isn't friendly to a branch workflow, but I'm not sure, especially in the present day.

Though, if that is the case, I believe we can chop up the CIRCLE_PULL_REQUEST env var which is present on every run and get the PR number from that. It just seems really weird if we'd have to do that, but the option exists in case the worst case is true.
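
To make that fallback concrete, here is a small Python sketch. It assumes CIRCLE_PULL_REQUEST holds a URL of the form https://github.com/<org>/<repo>/pull/<number>, and that the review app follows Heroku's predictable `<prefix>-pr-<number>` pattern with the dvc-landing prefix suggested above. The function names are hypothetical, and the real scripts are shell, so treat this as an outline rather than the actual change.

```python
import os
from urllib.parse import urlparse


def pr_number_from_url(pull_request_url):
    """Extract the PR number from a URL ending in /pull/<number>, else None."""
    parts = urlparse(pull_request_url).path.rstrip("/").split("/")
    if len(parts) >= 2 and parts[-2] == "pull" and parts[-1].isdigit():
        return int(parts[-1])
    return None


def review_app_url(pr_number, prefix="dvc-landing"):
    """Build the assumed predictable review app URL for a PR."""
    return f"https://{prefix}-pr-{pr_number}.herokuapp.com"


def relative_url_for_ci(environ=os.environ, default="https://dvc.org"):
    """Pick the base URL for relative links: the review app if a PR is known."""
    number = pr_number_from_url(environ.get("CIRCLE_PULL_REQUEST", ""))
    return default if number is None else review_app_url(number)
```

If the variable is missing or malformed, the sketch falls back to https://dvc.org, which matches the current behavior described in the issue.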

I can make the Heroku changes and get started at any time, but I think it'd be prudent to hold off until I get the go-ahead in case there's a particular reason we don't already have predictable preview URLs. Would @shcheklein or @jorgeorpinel know if this is the case?

@rogermparent consider this option (if it's technically possible):

  • use a GitHub Action for this check
  • in the GitHub Action, hopefully there is a way to run it when the "environment" is ready and keep it pending until it is deployed
  • hopefully we should be able to get the environment URL from the GH Action

Also, for forked PRs, Heroku doesn't deploy them automatically, which means we can't keep CircleCI running indefinitely.

> handle the case where a CI check runs before the deploy preview is up

> hopefully there is a way to run it when "environment" is ready and keep it pending until it is deployed

What's wrong with periodically pinging the predictable url with a timeout?

> What's wrong with periodically pinging the predictable url with a timeout?

wasting resources? deploy might take >10mins
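
For reference, the polling approach being discussed could look roughly like the Python sketch below. The name `wait_until_up` and the defaults are assumptions; the 15-minute timeout is chosen only because deploys may take more than 10 minutes. The cost raised above still applies: the CI job stays occupied for the whole wait.

```python
import time


def wait_until_up(probe, timeout=900.0, interval=15.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call probe() every `interval` seconds until it returns True.

    Returns True as soon as the probe succeeds, or False once `timeout`
    seconds have passed without success.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Here `probe` would be any callable that returns True once the preview responds, for example a single HTTP request against the predictable URL. The `clock` and `sleep` parameters are injectable so the loop can be tested without actually waiting.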

Quick comment: we're still getting 404 errors when new docs are added in a PR e.g. https://app.circleci.com/pipelines/github/iterative/dvc.org/5841/workflows/3fb81bb8-eb90-42f7-8bf7-6e845b278ecb/jobs/5897 (for #1705). Do we have a separate issue or hope to be able to fix that at some point? 🙂

Thanks

@jorgeorpinel Try merging master into your branch. The GitHub Actions-based checker that fixes the issue runs off the workflow files in the PR, so a PR opened before the new checker was added needs master merged into it before the issue is fixed for that PR.

Ah makes sense, OK cool np then!
