Joomla-cms: [4.0] Dramatically Improve SEO -> New SEF canonical URL options

Created on 6 Jul 2019 · 21 Comments · Source: joomla/joomla-cms

Is your feature request related to a problem? Please describe.

The Joomla SEF plugin has a canonical URL feature that seems to be both poorly documented (it doesn't even say the word "canonical" in the Joomla backend) and it is implemented in a way that will create more SEO problems than it solves.

My suggestions below will not make this perfect for all websites, but they should dramatically improve SEO for the majority of Joomla websites that use the built-in SEF URL functionality. Websites that use third-party components for e-commerce or publishing will likely be using a more powerful URL management system anyway (sh404SEF, etc.).

This feature request aims to fix a significant SEO issue that Joomla core creates.

Describe the solution you'd like

Add three new options to the SEF Plugin to give website owners more control over their canonical URLs. Screenshot attached.

Simpler solution:

  1. Remove query parameters from Canonical URLs
  2. Canonicalise index.php to home
  3. Enforce a trailing slash on SEF URLs

More complex solution:

  1. Remove query parameters from canonical URLs, but give users the ability to whitelist / blacklist certain query parameters. More flexible, but more difficult to write and probably more error prone for users.
  2. Canonicalise index.php to home
  3. Enforce a trailing slash on SEF URLs
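A minimal sketch of the more flexible option, assuming a hypothetical plugin setting that supplies a mode ('whitelist' or 'blacklist') and a list of parameter names, where a trailing '*' matches a prefix such as utm_*. The function names filterCanonicalQuery() and matchesAny() are illustrative only and do not exist in Joomla core.

```php
<?php

/**
 * Hypothetical helper (not Joomla core): filter the query string of a URL
 * before it is used as the canonical URL.
 *
 * @param string   $url   Absolute URL of the current request.
 * @param string   $mode  'whitelist' keeps only listed parameters,
 *                        'blacklist' drops listed parameters.
 * @param string[] $names Parameter names; a trailing '*' matches a prefix.
 */
function filterCanonicalQuery(string $url, string $mode, array $names): string
{
    if ($mode !== 'whitelist' && $mode !== 'blacklist') {
        throw new InvalidArgumentException("Unknown mode: $mode");
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    $query = [];
    parse_str($parts['query'] ?? '', $query);

    $kept = [];
    foreach ($query as $name => $value) {
        $listed = matchesAny((string) $name, $names);
        // Whitelist keeps listed names; blacklist keeps unlisted names.
        if (($mode === 'whitelist') === $listed) {
            $kept[$name] = $value;
        }
    }

    $canonical = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/');

    if ($kept !== []) {
        ksort($kept); // stable order, so ?a=1&b=2 and ?b=2&a=1 collapse
        $canonical .= '?' . http_build_query($kept);
    }

    // Any #fragment is dropped; browsers never send it to the server anyway.
    return $canonical;
}

/** True if $name equals a pattern, or starts with a pattern ending in '*'. */
function matchesAny(string $name, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (substr($pattern, -1) === '*') {
            $prefix = substr($pattern, 0, -1);
            if (strncmp($name, $prefix, strlen($prefix)) === 0) {
                return true;
            }
        } elseif ($name === $pattern) {
            return true;
        }
    }
    return false;
}

// Prints https://www.example.com/blog?page=2
echo filterCanonicalQuery(
    'https://www.example.com/blog?utm_source=organic&fbclid=XXXXX&page=2',
    'blacklist',
    ['utm_*', 'fbclid']
), PHP_EOL;
```

Sorting the surviving parameters means ?a=1&b=2 and ?b=2&a=1 produce the same canonical. Note that PHP's parse_str() rewrites dots and spaces in parameter names to underscores, so a production version would need to parse the raw query string itself.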

Additional context

If a Joomla user has SEF URLs and rewrites turned off, then they do not need this option.

But with mod_rewrite on and SEF URLs enabled, most query parameters on a standard Joomla installation are (to my knowledge) for analytics or sorting/filtering.

More details on removing query parameters
Canonical URLs are instructions to search engines to collapse their index of duplicate or very similar URLs into one. This can have HUGE impacts on SEO rankings and website traffic. But under the current Joomla implementation, the following URLs would all canonicalise to themselves, telling search engines they are all unique, even though the page content is identical:

https://www.example.com/
https://www.example.com/?utm_source=organic
https://www.example.com/?fbclid=XXXXX
https://www.example.com/?view=remind
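For the simpler option, a minimal sketch of query-stripping, assuming the canonical is computed from the absolute URL of the current request. stripQueryForCanonical() is a hypothetical name, not a Joomla API. Applied to the four URLs above, every variant declares the same canonical.

```php
<?php

/**
 * Hypothetical helper (not Joomla core): build a canonical URL by dropping
 * the query string and fragment of the requested URL.
 */
function stripQueryForCanonical(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Scheme and host are case-insensitive, so normalise them to lower case.
    return strtolower($parts['scheme']) . '://' . strtolower($parts['host'])
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/');
}

$requested = [
    'https://www.example.com/',
    'https://www.example.com/?utm_source=organic',
    'https://www.example.com/?fbclid=XXXXX',
    'https://www.example.com/?view=remind',
];

// All four requests now declare the same canonical URL.
$canonicals = array_unique(array_map('stripQueryForCanonical', $requested));
echo implode(PHP_EOL, $canonicals), PHP_EOL;
```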

More details on canonicalising index.php
Currently, Joomla can render these URLs with SEF turned on, and the content is identical. They should be canonicalised to the homepage URL.

https://www.example.com/
https://www.example.com/index.php

This also has some considerations on multi-language sites, where you get URLs like this:

https://www.example.com/index.php
https://www.example.com/es/index.php

These would need to canonicalise to:

https://www.example.com/
https://www.example.com/es/
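A minimal sketch of the index.php rule, working on the URL path only. It assumes index.php should be collapsed only when it is the last path segment, so SEF URLs served without URL rewriting (such as /index.php/en, which appears later in this thread) are left alone. canonicaliseIndexPhp() is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper (not Joomla core): collapse a trailing "index.php"
 * segment to its directory, e.g. /index.php -> / and /es/index.php -> /es/.
 * Paths where index.php is not the last segment are returned unchanged.
 */
function canonicaliseIndexPhp(string $path): string
{
    if (preg_match('#^(.*/)index\.php$#', $path, $matches) === 1) {
        return $matches[1];
    }
    return $path;
}

foreach (['/index.php', '/es/index.php', '/index.php/en'] as $path) {
    echo $path, ' => ', canonicaliseIndexPhp($path), PHP_EOL;
}
```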

More details on the option to enforce a trailing slash
Currently, Joomla can render these URLs with SEF turned on, and the content is identical. A website should adhere to a standard pattern and 301 redirect to the desired pattern to minimise duplicate content.

https://www.example.com/category/my-article
https://www.example.com/category/my-article/

If the option is enabled, then the first URL is 301 redirected to the second URL.
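A minimal sketch of that redirect, assuming it runs early in the request (for example in a system plugin) and receives the raw request URI. trailingSlashRedirectTarget() is a hypothetical name; only PHP's built-in header() call in the comment is a real API.

```php
<?php

/**
 * Hypothetical helper (not Joomla core): return the URI a request should be
 * 301-redirected to so that SEF URLs always end in "/", or null if the
 * request already matches that pattern. The query string is preserved.
 */
function trailingSlashRedirectTarget(string $requestUri): ?string
{
    $path  = (string) parse_url($requestUri, PHP_URL_PATH);
    $query = parse_url($requestUri, PHP_URL_QUERY);

    if ($path === '' || substr($path, -1) === '/') {
        return null; // already ends in a slash, or nothing to redirect
    }

    return $path . '/' . ($query !== null ? '?' . $query : '');
}

// The redirect itself would then be:
// if (($target = trailingSlashRedirectTarget($_SERVER['REQUEST_URI'])) !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```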

Please consider these enhancements. URL management is tricky, but it is also critical for success in SEO and driving free traffic.

[Screenshot: proposed SEF plugin options]

Labels: J4 Issue, No Code Attached Yet

All 21 comments

Can a Bug Squad team member please respond? This issue has been open for more than 12 hours.

This all makes sense to me. I think this would be a great addition.

Joomla! core does not use trailing slashes.

@SharkyKZ How do you mean?

I can access https://www.joomla.org/announcements.html/ the same as https://www.joomla.org/announcements.html and they both display the same page.

On a client site, I fixed this exact issue the other day.

You can access URLs with trailing slashes, but Joomla! core (except Language Switcher on homepage, but that seems to be unintentional) generates URLs without trailing slashes. If anything is to be enforced, it should be URLs without slashes, unless we actually want to change the URL structure to include a trailing slash for some reason.

The present so-called "canonical" setting in Joomla SEF has nothing to do with the real use of a canonical. In short: it does not deal at all with duplicate URLs within the same domain/site and was never designed for that. It just redirects to another domain.
The tooltip is clear:
PLG_SEF_DOMAIN_DESCRIPTION="If your site can be accessed through more than one domain enter the preferred (sometimes referred to as canonical) domain here. <br /><strong>Note:</strong> https://example.com and https://www.example.com are different domains."

(except Language Switcher on homepage, but that seems to be unintentional)

I think it is intentional as
mysite.com/en
is not at all equal to
mysite.com/en/
because en is the language SEF prefix (for en-GB), and it is the only way to differentiate in code the language used for the Home page.

It loads the same page though. Also hreflang does not contain a trailing slash:

<link href="http://localhost/index.php/de" rel="alternate" hreflang="de-DE">
<link href="http://localhost/index.php/en" rel="alternate" hreflang="en-GB">
<link href="http://localhost/index.php/en" rel="alternate" hreflang="x-default">

It does in 3.x (note that here "Remove URL Language Code" is enabled and en-GB is the site default language):

<link href="http://localhost:8888/installmulti/trunkgitnew/fr/" rel="alternate" hreflang="fr-FR" />
<link href="http://localhost:8888/installmulti/trunkgitnew/" rel="alternate" hreflang="en-GB" />
<link href="http://localhost:8888/installmulti/trunkgitnew/it/" rel="alternate" hreflang="it-IT" />

So there were some changes in 4.0 and I think this could be an issue.
Needs further testing with cookie, default lang browser, etc.

RE: Joomla does not use a trailing slash: agreed, the enforcement could instead be to remove the trailing slash. The SEO advantage of a trailing slash is that it signals a clear directory-style URL structure, but managing this becomes more difficult if users have enabled a URL suffix such as ".html". Generally, you wouldn't want a URL to end with "alias.html/"; it's odd formatting. Some people are very particular about their URL structure, so giving people options may be best: do nothing, enforce a trailing slash, or remove the trailing slash. As long as a 301 redirect is in place, it solves the duplicate content issues that arise when URLs get malformed by bugs, plugins, user error, etc.
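The three-way option could look something like this minimal sketch. The mode names ('none', 'add', 'remove') and the $suffix parameter are assumptions standing in for hypothetical plugin settings; normaliseTrailingSlash() is not a Joomla API. Following the "alias.html/" point, a slash is never added after the URL suffix.

```php
<?php

/**
 * Hypothetical tri-state option (not an existing Joomla setting):
 *   'none'   - leave the path alone
 *   'add'    - enforce a trailing slash
 *   'remove' - strip a trailing slash
 * Outside 'none' mode, paths ending in the URL suffix (e.g. ".html") never
 * end up with a trailing slash.
 */
function normaliseTrailingSlash(string $path, string $mode, string $suffix = '.html'): string
{
    if (!in_array($mode, ['none', 'add', 'remove'], true)) {
        throw new InvalidArgumentException("Unknown mode: $mode");
    }
    if ($mode === 'none' || $path === '/') {
        return $path; // the site root always keeps its single slash
    }

    $trimmed   = rtrim($path, '/');
    $hasSuffix = $suffix !== '' && substr($trimmed, -strlen($suffix)) === $suffix;

    if ($mode === 'add' && !$hasSuffix) {
        return $trimmed . '/';
    }
    return $trimmed; // 'remove' mode, or 'add' mode on a suffixed path
}
```

A redirect would only be issued when the normalised path differs from the requested one. Because the function returns the same result when applied to its own output, the 301 cannot loop.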

RE: the current canonical implementation being intended for cross-domain canonicalization: if that's the case, it still falls very short of solving duplicate content problems. When you say "Redirects to another domain", what do you mean? Does it actually trigger a 301 redirect in some scenarios?

RE: Trailing slashes in HREFLANG tags on the homepage. I see a trailing slash in HREFLANG on my personal website when accessing the homepage. At least on my install, the configuration appears correct, but you might have found an issue under a different config. And if I navigate to the index.php page, the HREFLANGS reference the correct value while the canonical does not.

BROWSER REQUESTS: https://www.example.com/index.php
CANONICAL: https://www.example.com/index.php
HREFLANG EN: https://www.example.com/
HREFLANG TH: https://www.example.com/th/

RE: the current canonical implementation being intended for cross-domain canonicalization: if that's the case, it still falls very short of solving duplicate content problems. When you say "Redirects to another domain", what do you mean? Does it actually trigger a 301 redirect in some scenarios?

I expressed myself badly.
It is not a redirect _per se_. It just adds a canonical URL whose domain is the one entered in the field, telling search engines that the "real" domain to crawl is the "canonical" one.
It expects both domains to have exactly the same structure.
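A simplified sketch of the behaviour as described in this comment, not the plugin's actual code: keep the requested path and query, and swap in the domain entered in the field. This only works if both domains share the same URL structure. crossDomainCanonical() is a hypothetical name.

```php
<?php

/**
 * Simplified sketch (not the SEF plugin's real implementation): build a
 * canonical URL by putting the requested path and query onto the preferred
 * domain, e.g. "https://www.example.com".
 */
function crossDomainCanonical(string $requestedUrl, string $preferredDomain): string
{
    $path  = parse_url($requestedUrl, PHP_URL_PATH) ?? '/';
    $query = parse_url($requestedUrl, PHP_URL_QUERY);

    return rtrim($preferredDomain, '/') . $path
        . ($query !== null ? '?' . $query : '');
}

echo crossDomainCanonical('https://example.com/fr/', 'https://www.example.com'), PHP_EOL;
```

In this simplified version the query string is copied as-is, so ?utm_source=... variants still produce distinct canonicals, which is the gap raised in the next comment.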

Ok, very interesting that this feature was initially added to solve cross-domain canonical issues. But I think the fact remains that this implementation is still problematic: it only solves a very narrow issue (cross-domain duplicate content), while creating new duplicate content issues by excluding critical options in the logic (ability to remove/whitelist/blacklist query parameters). Given the broad use of marketing and analytic tracking parameters in URLs, this is a pretty important concept to deal with and would be impactful for users trying to configure SEO options.

I see a trailing slash in HREFLANG on my personal website when accessing the homepage.

Yes, because you use 3.x and not 4.0

And if I navigate to the index.php page, the HREFLANGS reference the correct value while the canonical does not.

It does here, even in multilang (3.x)
<link href="http://anotherdomain.org/installmulti/trunkgitnew/fr/" rel="canonical" />

So yes, Joomla has no real code to handle true canonical URLs.
Normally, with the new routing in 4.0, we should get far fewer duplicates, if not none (not sure).

@SharkyKZ
Looks like I misunderstood your post about the switcher.

@infograf768 I'm not familiar with the new routing, but if there is no canonical URL with proper logic to remove query parameters from SEF URLs, then the duplicate content problem still exists and would be a huge thing to address for the platform.

I suggest you test 4.0 (PHP 7.2 minimum): https://developer.joomla.org/nightly-builds.html

@rhotog we can add a canonical tag. At least to some pages like articles.

@SharkyKZ

we can add a canonical tag. At least to some pages like articles.

Not that I know of with default core. Can you explain?

To get one, I had to apply this hack for articles (to be sure, I did not enter anything in the SEF domain field):

diff --git a/administrator/components/com_content/models/forms/article.xml b/administrator/components/com_content/models/forms/article.xml
index 422206b..ead0f98 100644
--- a/administrator/components/com_content/models/forms/article.xml
+++ b/administrator/components/com_content/models/forms/article.xml
@@ -602,4 +602,19 @@
                size="25" 
            />
+
+           <field
+               name="spacer3"
+               type="spacer"
+               hr="true"
+           />
+
+           <field
+               name="canonical"
+               type="url"
+               label="JCANONICAL"
+               validate="url"
+               filter="url"
+               relative="false"
+           />
        </fieldset>

diff --git a/administrator/language/en-GB/en-GB.ini b/administrator/language/en-GB/en-GB.ini
index f9d8e7f..985afa4 100644
--- a/administrator/language/en-GB/en-GB.ini
+++ b/administrator/language/en-GB/en-GB.ini
@@ -57,4 +57,5 @@
 JASSOCIATIONS_DESC="Associations descending"
 JCANCEL="Cancel"
+JCANONICAL="Canonical URL"
 JCATEGORIES="Categories"
 JCATEGORY="Category"
diff --git a/components/com_content/views/article/tmpl/default.php b/components/com_content/views/article/tmpl/default.php
index ad21b37..9b0c420 100644
--- a/components/com_content/views/article/tmpl/default.php
+++ b/components/com_content/views/article/tmpl/default.php
@@ -169,5 +169,10 @@
    ?>
    <?php endif; ?>
-   <?php // Content is generated by content plugin event "onContentAfterDisplay" ?>
+   <?php
+   if (!empty($params->get('canonical')))
+   {
+       JFactory::getApplication()->getDocument()->addHeadLink(htmlspecialchars($params->get('canonical')), 'canonical');
+   }
+    // Content is generated by content plugin event "onContentAfterDisplay" ?>
    <?php echo $this->item->event->afterDisplayContent; ?>
 </div>

It is a hack, as I simply entered the resulting SEF link copied from the frontend:

[Screenshot: the Canonical URL field in the article edit form, filled with the SEF link]

I guess it should be done with JRoute, using the non-SEF link in the field.

We can use whatever we normally use to generate item links to get canonical URLs. Once such a link is inserted, it will stay the same even if the page is accessed from different URLs. See #18341 for example.

No, we cannot set a canonical in our code.

First of all about the canonical link: This is NOT supposed to be on every page. The canonical URL is the URL that a page should be accessed by and if the page is rendered under a different URL, that different URL should have a canonical tag that links to the correct URL. But a page should NOT contain a canonical that just points to itself.
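Following the rule described above, a minimal sketch that emits a canonical tag only when the page is served at a URL other than its preferred one. As the next paragraph explains, core cannot know the preferred URL, so here it is simply passed in by the caller; canonicalLinkTag() is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper: return the <link rel="canonical"> tag to emit, or
 * null when the page is already being served at its canonical URL.
 * How $canonicalUrl is determined is site-specific and left to the caller.
 */
function canonicalLinkTag(string $requestedUrl, string $canonicalUrl): ?string
{
    if ($requestedUrl === $canonicalUrl) {
        return null; // no self-referencing canonical
    }

    // Escape the URL for use inside an HTML attribute (e.g. & becomes &amp;).
    $href = htmlspecialchars($canonicalUrl, ENT_QUOTES, 'UTF-8');

    return '<link href="' . $href . '" rel="canonical">';
}
```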

Regarding our canonical implementation: We, as Joomla core developer, have no way to know if the current page is supposed to be the canonical URL. Thus we can't set that canonical URL correctly. A site integrator would know the right URLs and could thus code something for the site he is working on, but for us it is not possible. A component itself could implement a canonical link behavior, but again, this is not something that I see us as Joomla core doing and instead would point to custom/third party solutions.
