Is your feature request related to a problem? Please describe.
Currently, the Accept HTTP header is hardcoded: https://github.com/sphinx-doc/sphinx/blob/dbefc9865d8c2c4006ed52475d1bff865358cd00/sphinx/builders/linkcheck.py#L111. When I hit servers that require custom headers, the only option is to add those URLs to the ignore list, which I'd like to avoid.
Describe the solution you'd like
Make HTTP headers configurable.
Describe alternatives you've considered
Adding the affected URL to linkcheck_ignore
Additional context
We have a GitHub Actions badge in the README, which then gets embedded into the Sphinx docs. Running linkcheck used to work, but now it doesn't. After some debugging, I discovered that the request succeeds if it doesn't include an Accept: header. But the header that Sphinx injects causes GitHub's server to respond with HTTP/1.1 406 Not Acceptable.
Interestingly, if you open this URL in a browser, it works: https://github.com/cherrypy/cheroot/workflows/Test%20suite/badge.svg. Google Chrome sends the following header: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9.
$ curl --head -H 'User-Agent: Sphinx/2.4.3 requests/2.23.0 python/3.7.4' https://github.com/cherrypy/cheroot/workflows/Test%20suite/badge.svg
HTTP/1.1 200 OK
date: Tue, 03 Mar 2020 18:53:13 GMT
content-type: image/svg+xml; charset=utf-8
server: GitHub.com
status: 200 OK
vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With
cache-control: max-age=300, private
etag: W/"6e6be7ee648f0c6c3c74f436c281da7e"
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com
Age: 0
Set-Cookie: _gh_sess=p238CMtx5HWH1dro34Ug5297UE6yfWFIdIXjOC%2Fz6c0KFat8kP6FKO%2BpnLDFOrOop4N%2FjA%2FnKLDavWjC6VVQYoPNNbqh%2B4N41map9mUfvFhhx8HMW19Du1h5fn9g2Tv4TZcNSJfwfFV465Xzxq9t213ud1LEQEukuzbcIFn1hNy%2FBbmJ%2BF0MjS6eZk%2BPVQ2kLNdrtaBz%2BJ6RFTwhyu7nrxXLbgh08T2mBKLI8BREu3%2Fh1f7S%2FJ%2BIaQFq5mFItrQ140%2BSDmMgWF7tGKuZqDnHYw%3D%3D--YFLr0%2B3yKMbqGo%2Ff--P2WJDemx1goxFvxleo%2FnsQ%3D%3D; Path=/; HttpOnly; Secure
Set-Cookie: _octo=GH1.1.1438747173.1583261593; Path=/; Domain=github.com; Expires=Wed, 03 Mar 2021 18:53:13 GMT; Secure
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Wed, 03 Mar 2021 18:53:13 GMT; HttpOnly; Secure
Accept-Ranges: bytes
Content-Length: 2211
X-GitHub-Request-Id: 1C24:16DCA:5FBDEC6:880AF26:5E5EA799
$ curl --head -H 'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8' -H 'User-Agent: Sphinx/2.4.3 requests/2.23.0 python/3.7.4' https://github.com/cherrypy/cheroot/workflows/Test%20suite/badge.svg
HTTP/1.1 406 Not Acceptable
date: Tue, 03 Mar 2020 18:53:49 GMT
content-type: text/html
server: GitHub.com
status: 406 Not Acceptable
vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With
cache-control: no-cache
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com
Age: 0
Set-Cookie: _gh_sess=cq2fhZutOVFanPybUxb%2F5FN5FRD9j%2FKOq2N5WN83m30t6Xnu8y1Zgcc4kBIw0MiYid9VOJTComfgw5O4jAWg91GLK0peYu9XfNKn2bPmd7GDmjYwak2QE%2FvElg%2BVs8yuL8lMOdtZSxAfQdObkQHyPM9KCs%2FXj7qofetrUASScJ2v%2BBdIw%2BUDANHDp%2FoH0ckbWIY4ouHQD%2BAy1KG00IMLjyRJ%2Fgr0V57JhemCUNk0pqscP7vFagUR%2BicETzEd2%2B%2Fy45pkpTTiwqds%2BFyoPoxn1g%3D%3D--Po2%2Boh3TsKnH2dDk--uLvCvDG7SDRtQP9jQ5%2B3Pw%3D%3D; Path=/; HttpOnly; Secure
Set-Cookie: _octo=GH1.1.1102872677.1583261629; Path=/; Domain=github.com; Expires=Wed, 03 Mar 2021 18:53:49 GMT; Secure
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Wed, 03 Mar 2021 18:53:49 GMT; HttpOnly; Secure
Content-Length: 0
X-GitHub-Request-Id: 1E08:1FAA7:4596C76:6318A3E:5E5EA7BD
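The two requests above differ only in the Accept header. As a rough check of what that header nominally allows, here is a simplified sketch of Accept matching. The helper names `parse_accept` and `accepts` are made up for illustration, and the matching ignores specificity ordering and media-type parameters, so it is not a full content-negotiation implementation and says nothing about how GitHub's server actually decides.

```python
# Simplified Accept-header matching (illustrative only).
# parse_accept() and accepts() are hypothetical helpers, not part of Sphinx.

def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if media_range:
            ranges.append((media_range, q))
    return ranges


def accepts(header, content_type):
    """Return True if any media range with q > 0 matches content_type.

    Simplification: the first matching range wins; a more specific
    range with q=0 is not given precedence.
    """
    ctype, _, csub = content_type.lower().partition("/")
    for media_range, q in parse_accept(header):
        rtype, _, rsub = media_range.partition("/")
        if q > 0 and rtype in ("*", ctype) and rsub in ("*", csub):
            return True
    return False


# The header used in the failing curl call above.
SPHINX_ACCEPT = "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"

print(accepts(SPHINX_ACCEPT, "image/svg+xml"))
print(accepts("text/html", "image/svg+xml"))
```

Under this simplified reading, the `*/*;q=0.8` entry in the failing request matches `image/svg+xml`, so the 406 does not follow from the header's wording alone.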
Confirmed. It seems better not to send the Accept: header to GitHub. On the other hand, some servers require the header (see #5140). So it would be better to allow customizing it via code or configuration.
Just an idea: a linkcheck_request_header setting might be helpful for such cases:
linkcheck_request_header = {
    '*': {'Accept': 'text/html,application/xhtml+xml;q=0.9,*/*;q=0.8'},
    'https://github.com': {},
    ...
}
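To make the proposed mapping concrete, here is a minimal sketch of how a per-URL lookup could work. It is not Sphinx's implementation: the helper `headers_for` is hypothetical, and the matching rule (exact scheme-plus-host key, falling back to `'*'`) is an assumption, since the proposal does not say how keys should be matched.

```python
from urllib.parse import urlsplit

# Example configuration in the shape proposed above.
linkcheck_request_header = {
    "*": {"Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"},
    "https://github.com": {},
}


def headers_for(url, config):
    """Pick the request headers for url (hypothetical helper).

    Assumed rule: a key equal to "scheme://host" wins; otherwise the
    "*" entry is used; otherwise no extra headers are sent.
    """
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    if origin in config:
        return dict(config[origin])
    return dict(config.get("*", {}))


badge = "https://github.com/cherrypy/cheroot/workflows/Test%20suite/badge.svg"
print(headers_for(badge, linkcheck_request_header))
print(headers_for("https://example.org/docs", linkcheck_request_header))
```

With this rule, the GitHub badge URL is requested without an Accept header, while every other host still receives the default one.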
@tk0miya this looks like a good idea.
Oops, I overlooked this issue for the 3.0 release... I've just set the milestone for it now.