4, 5
This is a follow up on https://github.com/silverstripe/silverstripe-framework/pull/9074/files#r298450879
Director::absoluteURL method does not seem to have any means to produce proper absolute URLs including SS_BASE_URL. As long as input path is absolute, it will always ignore SS_BASE_URL.
This behaviour also seems to be covered by the tests.
Basically these tests cover the behaviour of ignoring the base url for absolute paths:
Director::config()->set('alternate_base_url', 'http://www.mysite.com:9090/mysite/');
$this->assertEquals("http://www.mysite.com:9090/root", Director::absoluteURL("/root", Director::BASE));
$this->assertEquals("http://www.mysite.com:9090/root", Director::absoluteURL("/root", Director::ROOT));
$this->assertEquals("http://www.mysite.com:9090/root", Director::absoluteURL("/root", Director::REQUEST));
That means we need to either "sanitize" the input URL before passing it to that method, or don't use the method.
Another option is to deprecate it and provide more clear APIs around working with URLs?
Just clarifying my understanding of the problem.
Let's say your site is accessible under 2 domains: example.com and example.org. You want example.org to be the primary domain, so you set that as your SS_BASE_URL.
However, when a user access the site on the DotCOM domain and the site outputs an absolute URL, it ignores the SS_BASE_URL value in favour of the domain used to access the site.
Is that the behaviour we are finding problematic?
It鈥檚 absoluteURL(), not canonicalIRL(). 馃槈
My thinking aligns with @unclecheese. The primary use case for absoluteURL is to create a URL that will work in Emails or in other places outside of SilverStripe.
Although there could be a use case for canonicalURL as well.
I reckon if nobody knows what is this method for and why it's doing what it's doing, we should deprecate it and provide several others, which would be more SOLID :)
I reckon if nobody knows what is this method for and why it's doing what it's doing, we should deprecate it and provide several others, which would be more SOLID :)
It's ensuring that any url you pass into it is prefixed with the appropriate protocol, hostname, and base url.
As an aside, make sure you are only using alternate_base_url for testing purposes (i.e. to override SS_BASE_URL), since it interferes with the operation of urls on live sites. That may be what you're experiencing?
@dnsl48 I answered your concern on that other PR at https://github.com/silverstripe/silverstripe-framework/pull/9074/files#r317874705
It's ensuring that any url you pass into it is prefixed with the appropriate protocol, hostname, and base url.
Yes, that's been my assumption as well. However, as I mentioned above, it seems to be producing incorrect results for absolute URLs.
As an aside, make sure you are only using alternate_base_url for testing purposes. That may be what you're experiencing?
Good to know, cheers :)
However, that's not the case. The snippet I put in the first message is a copy-paste from the framework tests covering the discussion subject.
it seems to be producing incorrect results for absolute URLs
Maybe that's correct result from some perspective, but at least 3 developers have already been confused by what the method actually does. Seems likely the method has misleading name/behaviour/documentation.
This looks like a bug to me, if your base URL includes a sub-folder then absoluteURL() should respect that (IMO)
However, when a user access the site on the DotCOM domain and the site outputs an absolute URL, it ignores the SS_BASE_URL value in favour of the domain used to access the site.
This behaviour is intentional in order to support modules like subsites, by the way. Not that this is necessarily correct (one could argue the SS_BASE_URL should supersede this), but we would need to be careful in changing this behaviour.
Personally, I dislike the Director::REQUEST behaviour of Director::absoluteLink(), and it was maintained primarily to support old legacy behaviour (see the deprecation in the top of the method). I'd be happy to drop this entirely in ss5 to help make the logic more clear. :)
In fact, instead of passing the base "type" as the second argument, we could probably just allow the users to pass in the base path explicitly; If the user wants a "current request" based url, they could pass it in as second, otherwise it would default to the SS_BASE_URL.
I suggest to keep the is_root_relative() check, in order to prevent the double base url combination, however,
How about the main body look something like this.
/**
* Turns the given URL into an absolute URL. By default non-site root relative urls will be
* evaluated relative to the current base_url.
*
* @param string $url URL To transform to absolute.
* @param string $base Base URL to use, otherwise SS_BASE_URL will be used
*
* @return string
*/
public static function absoluteURL($url, $base = null)
{
// Use base provided, or default to SS_BASE_URL
$parent = $base ?: Director::absoluteBaseURL();
// Absolute urls without protocol are added
// E.g. //google.com -> http://google.com
if (strpos($url, '//') === 0) {
return self::parseProtocol($base) . substr($url, 2);
}
// Already absolute
if (self::is_absolute_url($url)) {
return $url;
}
// If url is root-relative, strip subdirectory off base to prevent double-added subdir
if (self::is_root_relative_url($url)) {
$parent = self::parseProtocol($base) . self::parsehost($base);
}
// Map empty urls to relative slash and join to base
if (empty($url) || $url === '.' || $url === './') {
$url = '/';
}
return Controller::join_links($parent, $url);
}
@dnsl48 thoughts?
If you want to fit this into ss4, you could make a new method, and deprecate the old one, however. :) @robbieaverill ?
I guess with the above code, you could always make Director::baseUrl() injectable by subsites?
I suggest to keep the is_root_relative() check, in order to prevent the double base url combination, however.
I cannot think about use cases where it's required. On the other hand, I already had a couple use cases where is_root_relative() prevents that method from being useful.
I think we could come up with a list of use cases we want to address by this particular method, then would be more clear what it should and shouldn't do.
My feeling we might potentially even split SS_BASE_URL into two different vars: SS_BASE_ROOT and SS_BASE_HOST and then check for substr(...) !== $SS_BASE_ROOT instead of is_root_relative, as long as the method needs that concern.
Another potential thing to consider is splitting concerns between two methods: absoluteURL and canonicalURL (I guess that might make sense in context of subsites module?).
I cannot think about use cases where it's required. On the other hand, I already had a couple use cases where is_root_relative() prevents that method from being useful.
Your SS_BASE_URL is http://localhost/mysite/. You have a link you parsed from a HTML text block with <a href="/mysite/about-us" /> and you want to convert it to an absolute link. Your page head has <base href="http://localhost/mysite/" />.
In your PHP code you run Director::absoluteLink('/mysite/about-us').
http://localhost/mysite/about-us.http://localhost/mysite/mysite/about-us.If you click that link in your browser, you will navigate to http://localhost/mysite/about-us. The PHP code needs to replicate the same logic as the browser, otherwise you'll end up with inconsistent behaviour when resolving links on the serverside.
Let's say you have another link in the same situation above. <a href="contact-us"> You want to get the absolute URL of this, so you run Director::absoluteLink('contact-us'). Because this is a non-root relative url, our web browser would have resolved this link differently, appending it to the base tag instead, sending us to http://localhost/mysite/contact-us. Thus, our PHP code replicates the same logic, by appending it to the base instead of stripping the trailing path component (/mysite/).
Please check https://mor10.com/html-basics-hyperlink-syntax-absolute-relative-and-root-relative/ and make sure you understand the distinction.
The important silverstripe-specific principle here is that, when using <base > tag (which SilverStripe currently relies on), root relative URLS IGNORE the path component of the base tag, where non-root relative urls append to the base.
I think we may need to define some common terminology. I suggest we refer to the following RFCs
Simplifying the BNFs from RFC1808 we have the following:
http://localhost/root/page///localhost/root/page/ || /root/page/ || root/page/root/page/root/pageHere's the use cases I see and derived from the discussion so far:
My guess is we're thinking in different contexts about how absolute paths should be handled.
In your PHP code you run Director::absoluteLink('/mysite/about-us').
I'm guessing you meant Director::absoluteURL by that and not absoluteLink.
You have a link you parsed from a HTML text block with
<a href="/mysite/about-us" />and you want to convert it to an absolute link.
Feels like an edge case to me as long as we're talking about backend application?
My main use case for this method so far was to convert routes (read url paths) from yaml configuration into URLs to use them for redirections. In this context considering SS_BASE_URL as a base tag for an html document doesn't make much sense. It should rather be considered as an application subfolder of the web root which should be added for absolute paths as well as for relative ones. Using relative paths for configuring routes doesn't feel semantically correct since that should indicate a route relative to the requested route and not to the application base.
I'm guessing you meant Director::absoluteURL by that and not absoluteLink.
Yes sorry.
Feels like an edge case to me as long as we're talking about backend application?
It's less of an edge case than you'd think; E.g. HTML editor field generating content for an email could use root relative links (or even simply, static templates with such links passing through an absolute link rewriter).
My main use case for this method so far was to convert routes (read url paths) from yaml
There are a few options you could use in this case:
/There are a few options you could use in this case
Yes, I agree, these are valid options for working around.
On the other hand, our APIs around working with URLs are still confusing for developers, which this issue brings up.
Also, looks like Director::is_absolute_url does treat a subset of relative urls as absolute, which is semantically incorrect (accordingly to the RFCs I provided above).
Nearly every Link() method produces a root relative url, but the way. E.g.
Director::absoluteBaseURL(); // http://localhost/base/
Director::baseURL(); // /base/
$page->RelativeLink(); // about-us/
$page->Link(); // /base/about-us
$page->AbsoluteLink(); // http://localhost/base/about-us
Director::absoluteLink($page->Link()); // http://localhost/base/about-us
Notice how the last line converts the root-relative link to an absolute link. SilverStripe follows the convention of preferring to generate root-relative links in Link() method, whenever possible.
//localhost is not a relative link. It's a protocol-less absolute link. I'm happy to be proved wrong if that's the case though.
What relative urls are treated as absolute?
That's correct and documented feature though :) Still, semantically incorrect. This one: //localhost/path is a relative URL, accordingly to the RFC 1808
Ok, according to the rfc it should have a protocol to be absolute. I'm happy for you to suggest a change to remedy that in the is_absolute_url method.
I'll get back to you when I'm back in the office tomorrow. In the mean time please look at the examples I have above and let me know how you feel we should change this api.
Nearly every Link() method produces a root relative url, but the way.
Sorry, I'm not familiar with that terminology too much, so I've been trying to find the common ground by referring to the RFCs above.
To clarify what I mean by that, here are a couple of ideas accordingly to RFCs:
url and pathabsolute or relative//host/ or be a path (either absolute or relative)My feeling is that the use case convert an absolute path to an absolute url is hard to achieve by existing APIs.
Another question is where the absolute path comes from and what it indicates.
how you feel we should change this api
Before suggesting the particular changes I'd like to put together a list of use cases we want it to cover.
I'll start doing that in this post.
HTTP 1.1 Location header supports both Absolute URL and Relative URL. RFC 7231
HTTP 1.0 Location header only supports Absolute URL. RFC 1945
As such, for keeping backward compatibility with HTTP 1.0, as the best practice we may want to always generate absolute URLs for redirects.
As such, I can see 2 scenarios:
Absolute URL for Absolute PathsAbsolute URL for Relative PathsIn case we get the redirect address from YAML configs, we may want to:
SS_BASE_URL to it if it's an absolute pathREQUEST_URI path to it in case it's a relative pathHTTP 1.1 is 20 years old; Do you think we need to worry about 1.0 support? I agree we should never provide relative urls, but at the very least, root relative urls should be safe enough, and are the current convention in silverstripe core.
add SS_BASE_URL to it if it's an absolute path
I've explained in detail why you can't add the whole base url to absolute paths; You can only add the protocol and hostname to resolve these urls to get a correct result. Please read above again, especially the example I gave with $page->Link().
add REQUEST_URI path to it in case it's a relative path
$Link in templates would never follow this resolution method, because it would always use the <base > tag (SS_BASE_URL) as I have also explained. If you use the whole REQUEST_URI you will end up with inconsistent behaviour. E.g. a link from http://localhost/about-us/ to about-us/our-staff would end up as http://localhost/about-us/about-us/our-staff.
@robbieaverill @chillu could either of you please check our assumptions here to make sure we come to a resolution. It may be blocking #9207
HTTP 1.1 is 20 years old; Do you think we need to worry about 1.0 support?
Well, that's a controversial question. However, here's a long living stackoverflow topic which has some stats from someone's random internet servers: https://stackoverflow.com/questions/2073392/is-http-1-0-still-in-use
I wouldn't confidently say 1.0 is not in use anymore.
I've explained in detail why you can't add the whole base url to absolute paths ... Please read above again, especially the example I gave with $page->Link().
It feels like you're mixing up hyperlink with url path and url. Sorry, I'm not exactly sure how Page->Link() is related to what absoluteURL should be doing? These are not really interchangeable things.
Hyperlink is a part of HTML specifications, whereas URL and url-path is from HTTP specifications (I've given a number of RFCs about that above).
If you're saying: we cannot do that because we have Page->link implemented in this way, then it feels similar to what this issue is about - we have APIs which are confusing for developers.
$Link in templates would never follow this resolution method,
This is exactly why I'd like to write up the list of use cases before we discuss particular API implementations.
You're talking about generating hyperlinks, I'm talking about generating HTTP redirects - those are different use cases and may need different approaches.
Sorry, I'm not exactly sure how Page->Link() is related to what absoluteURL should be doing?
SiteTree::AbsoluteLink() calls this method to generate it's absolute link.
/**
* Get the absolute URL for this page, including protocol and host.
*
* @param string $action See {@link Link()}
* @return string
*/
public function AbsoluteLink($action = null)
{
if ($this->hasMethod('alternateAbsoluteLink')) {
return $this->alternateAbsoluteLink($action);
} else {
return Director::absoluteURL($this->Link($action));
}
}
SilverStripe must be internally consistent. This means that both template logic and redirects must be consistent with one another; This is why Link() generates root-relative links, in order to ensure resolution is consistent. The system only works if all the parts follow the same conventions.
SilverStripe wasn't developed with RFC consistent terminology in mind, so try not to get too hung up on terms used internally.
E.g. a link from
http://localhost/about-us/toabout-us/our-staffwould end up ashttp://localhost/about-us/about-us/our-staff.
That is expected behaviour IMHO.
Only without a base tag
Clarifying the docs is a good start.
Although, the documented behaviour itself is confusing for a lot of devs (at least 3 people I've counted so far including myself). And some others think it's not doing what it should do.
If you think that's not a problem I'll close this issue in a week. Otherwise, I'll keep writing down the use cases which we want to cover by the APIs and then I'll come up with some suggestions for refactoring it.
__Please upvote this post if you think it's worth refactoring__
Most helpful comment
Clarifying the docs is a good start.
Although, the documented behaviour itself is confusing for a lot of devs (at least 3 people I've counted so far including myself). And some others think it's not doing what it should do.
If you think that's not a problem I'll close this issue in a week. Otherwise, I'll keep writing down the use cases which we want to cover by the APIs and then I'll come up with some suggestions for refactoring it.
__Please upvote this post if you think it's worth refactoring__