It supposed to behave in the same way as man.dvc.org/get-url
The fix should be applies to the server.js file.
That is funny. Probably a bug in the redirect code...
This is still broken after attempting a hotfix. In fact now man.dvc.org/get-url is also broken! Which is weird because I tested the logic in server.js and it works well locally. Here's the latest prod deploy diff: https://github.com/iterative/dvc.org/compare/37aab6a0...a07bd0c5#diff-78c12f5adc1848d13b1c6f07055d996eL37-R37 for this code:
https://github.com/iterative/dvc.org/blob/a07bd0c5b1efd749e830dd950a19cc4a8a45202c/server.js#L34-L42
['/get-url', '/import-url'].indexOf(pathname) < 0 basically checks that pathname is not one of those 2 exceptions, before the code replaces - for / in any other path. Tested locally with node:
> ['/get-url', '/import-url'].indexOf('/get-url')
0
> ['/get-url', '/import-url'].indexOf('/import-url')
1
> ['/get-url', '/import-url'].indexOf('/cache-dir')
-1
However on prod everything gets replaced, even when indexOf results in >= 0
Changed code to this:
https://github.com/iterative/dvc.org/blob/1376f2a3fce6886faa2ff344e66826039cd43395/server.js#L34-L46
to make it easier to read. This code also works well in manual tests, but it's also not working yet on prod ):
There's 2 things going on here ^ BTW:
- for / in man.dvc.org/ paths (except /get-url and /import-url).We could separate each one for even further clarity, but that would result in an extra redirect.
Related: #757
I think the reason it does not work for me is caching of redirects:
Request URL: https://man.dvc.org/import-url
Request Method: GET
Status Code: 301 Moved Permanently (from disk cache)
Remote Address: 35.170.171.200:443
Referrer Policy: no-referrer-when-downgrade
If I run it a private windows everything works fine. So, we should check if there is a way to manage cache lifetime for redirects.
I guess this is the issue we bumped into! 301 Redirects: The Horror That Cannot Be Uncached
Basically you can only do a 301 redirect once...
So right now we have to deal with the fact that browsers and search engines have already cached all of our 301 redirects. We would need to turn the ones we're not so sure are permanent into 302 redirects and then wait a month or more to assume the caching is gone.
In the mean time, for these 2 specific cases we could setup 302 redirects from /import/url and /get/url to /import-url and /get-url respectively. These would be second redirects after the cached one from man.dvc.org...
I would try to avoid 302s . As I mentioned our redirects are 301s in nature. We do not plan to go back to the previous page and serve content directly from man.dvc.org/something for example.
Let's do some experiments with no-cache. Let's not worry about stuff that is already cached somewhere outside. First let's worry about the solution moving forwards (new redirects, new clients).
Right, that's an example of a permanent one, from man.dvc.org to dvc.org/doc/command-reference
But it used to turn - into / and now we don't really want that, so that part was not so permanent. 馃構
man. subdomain to the cmd ref path, but make a separate 302 that turns - to / in some paths only (instead of excluding just import-url and get-url) for backward compatibility.What do you think?
Let's not worry about stuff that is already cached somewhere outside.
I guess in this case we can indeed ignore it. I doubt many people have used man.dvc.org/import-url or /get-url and definitely no search engines since man.dvc.org URLs are only printed in our command line. 馃檪 (I think)
I think
there are probably some links outside, not many indeed
We should keep the 301 redirect from the man. subdomain to the cmd ref path, but make a separate 302 that turns - to / in some paths only (instead of excluding just import-url and get-url) for backward compatibility.
sounds too complicated to be honest. Let's first see what we can with no-cache policy.
Almost all our redirects are permanent in nature, I actually don't see a good example of 302.
Almost all our redirects are permanent in nature, I actually don't see a good example of 302.
Well, let's see our current redirects in server.js:
Seems like you're right! Only the - for / replacement under man.dvc.org redirects seems to me like something we should address with 302s or 303s or 307s or 308s. (See https://github.com/iterative/dvc.org/issues/757#issuecomment-549494538)
Let's first see what we can (do) with
no-cachepolicy.
But OK! Let's try this first...
replace - for / in {cmd} except for /get-url, /import-url = Temporary (backward compat)
it's also permanent, I think. We don't plan to change it back right? It's temporary in a sense that we will remove it completely soon and start returning 404. Tbh, would not worry that much about redirects like man.dvc.org/remote-add at all because of this.
302s or 303s or 307s or 308s.
need to be careful with this - I don't know anything about 30x, x>2. :)
Let's first see what we can (do) with
no-cachepolicy.
OK I added Cache-Control: no-cache to the 301 header. Will have to merge #772 to check the effect (on new client only).
I don't know anything about 30x
302 (Found (Previously "Moved temporarily")) has been superseded by 303 and 307.
303 See Other: "The response to the request can be found under another URI using the GET method."
307 Temporary Redirect (since HTTP/1.1): "The request should be repeated with another URI; however, future requests should still use the original URI... the request method is not allowed to be changed..."
Seems to me 307 is exactly what we want during a period of backward compatibility with certain man.dvc.org/cmd-subcmd URL paths that used to need converting - to /.
From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection
yep, I just don't know how well all these codes are supported that's why a bit cautious
OK so now that #772 is merged, I opened an incognito window in Chrome and went to https://man.dvc.org/import-url, then tried again. Both times I see this same Network response:
General
Request URL: https://man.dvc.org/cache-dir
Request Method: GET
Status Code: 301 Moved Permanently
Remote Address: 52.72.245.79:443
Referrer Policy: no-referrer-when-downgrade
Response Headers
HTTP/1.1 301 Moved Permanently
Server: Cowboy
Connection: keep-alive
Cache-Control: no-cache
Location: https://dvc.org/doc/command-reference/import-url
Date: Tue, 05 Nov 2019 00:08:45 GMT
Transfer-Encoding: chunked
Via: 1.1 vegur
No more (from disk cache). I guess this means the issue is fixed? Not sure how else to confirm.
Can also text with
curl -Lv man.dvc.org/some-thing > /dev/null.
Alright, seems like this is fixed now even for previous clients.
Having a no-cache 301 "Moved Permanently" seems counterintuitive, but I guess that's the easiest solution here, and allowed by the HTTP 1.1 standard: https://tools.ietf.org/html/rfc2616#section-14.9
I still think we should use 307 for some or all the backward compatibility redirects (see https://github.com/iterative/dvc.org/issues/768#issuecomment-549494662) but OK, will address this later in #757.
@jorgeorpinel I don't think it's fixed unfortunately. I think Firefox at least does not respect the flag.
Boo FF. Happily it has like 5% of the market. https://en.wikipedia.org/wiki/Usage_share_of_web_browsers#Summary_tables
Safari is more relevant, but I can't tell whether it respects the cache-directive.
It's 11% for dvc.org, the same as Safari :)
So should we separate the 301 redirect from man.dvc.org to /doc/cmd-ref from a new 302 or 307 redirect that replaces - for / in certain paths only? And if so, do you know which exact commands with dash we are trying to provide backward compatibility for in this redirect?
And if so, do you know which exact commands
almost all of them (except import-url and get-url)
from a new 302 or 307 redirect that replaces - for / in certain paths only
let's not do this. Not that we understand that it's only caching issue, I think we can keep 301s. Most of our users will see the right result. When we get to the #757 we can think one more time about what codes we apply where - most likely most of them will be 301s anyway to keep SEO.
let's close this for now? good findings and good research for 757.
let's not do this... When we get to the #757 we can think one more time about what codes
Sounds like a plan.
most likely most of them will be 301s anyway to keep SEO
We'll see haha 馃檲
let's close this for now?
Sure! You reopened it so I was just thinking out-loud what alternatives we had left. Closing! 馃檪