Magento2: sitemap.xml builds wrong URLs for sitemapindex via cron

Created on 28 Apr 2017 · 12 Comments · Source: magento/magento2

In our setup, sitemap generation via cron builds wrong URLs for the sitemapindex.

Preconditions

  1. Magento 2.1.6
  2. PHP 7.0.8
  3. Home directory of the www-data user (which executes the cron) is /var/www/
  4. Sitemap generation via cron is activated

    • Generation from the Magento 2 Admin Panel works correctly

    • Generation via the Magento 2 cron produces wrong URLs

Steps to reproduce

  1. Start sitemap generation via cron

Expected result

  1. The URLs in the sitemapindex are correct and relative to the Magento 2 main directory.

Actual result

  1. In the sitemapindex generated via cron, the URLs include the complete path from the home directory of the www-data user to the Magento 2 directory.

Labels: Cannot Reproduce, Clear Description, Format is valid, needs update, bug report

All 12 comments

Hello.
I tried to reproduce the issue you have reported.

Steps to reproduce:

  1. Execute this to get the time on the server: php -r '$date = date("m/d/Y h:i:s a", time()); echo "Server time is: " . $date . "\r\n"; exit;'
  2. Create the directory 'sitemap' in the root folder of your site.
  3. Navigate to Stores -> Settings -> Configuration -> Catalog -> XML Sitemap.
  4. Expand 'Generation' panel.
  5. Change 'Enable' to 'Yes'.
  6. Change 'Start Time' to 10 minutes later than the time from the step 1.
  7. Select 'Hourly' from 'Frequency' dropdown in 'Products Options', 'Categories Options' and 'CMS Pages Options'.
  8. Save config.
  9. Navigate to Marketing -> SEO & Search -> Site Map.
  10. Press 'Add Sitemap'.
  11. Change 'Filename' to 'sitemap.xml' and 'Path' to '/sitemap/'
  12. Press 'Save and Generate'.
  13. Wait for the cron generation to be executed by the time you specified in step 6.

Here is my sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:content="http://www.google.com/schemas/sitemap-content/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url><loc>http://magento216.vg/home</loc><lastmod>2017-07-04T08:48:45+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
<url><loc>http://magento216.vg/enable-cookies</loc><lastmod>2017-07-04T08:48:43+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
<url><loc>http://magento216.vg/privacy-policy-cookie-restriction-mode</loc><lastmod>2017-07-04T08:48:43+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
</urlset>

Please correct me if you reproduced it a different way.

@TomashKhamlai, just FYI: it feels like this is a duplicate of https://github.com/magento/magento2/issues/5321#issuecomment-310448904, but @christian-forgacs isn't talking about images in his opening post, so I'm not entirely convinced that he is reporting the same bug.

@christian-forgacs are you using NGINX as a server?

@TomashKhamlai yes we're using NGINX as server.

I can also replicate this. The problem happens for me when running the sitemap generation through bin/magento cron:run from outside the Magento directory. For example, our Magento code is in /var/www/src; if I run php src/bin/magento cron:run from /var/www, the URLs contain src as part of the base URL (https://www.example.com/src). Running from the Magento directory, e.g. php bin/magento cron:run from /var/www/src, works as expected.

I believe the problem can be traced back to \Magento\Sitemap\Model\Sitemap::_getStoreBaseDomain: the $storeDomain variable it returns is incorrect under the conditions described. I believe it is due to the logic in this condition

Similarly, if I run php /var/www/src/bin/magento cron:run from the root of our server, the job actually fails with the following message:

Notice: Undefined property: Magento\Sitemap\Model\Observer::$_translateModel in /var/www/src/vendor/magento/module-sitemap/Model/Observer.php

This is due to $documentRoot being empty in the same condition, which causes this error:

Warning: strpos(): Empty needle in /var/www/src/vendor/magento/module-sitemap/Model/Sitemap.php

@christian-forgacs , thank you for your report.
We were not able to reproduce this issue by following the steps you provided.

Steps to reproduce with NGINX and Magento version 2.1.6 :

  1. Execute this to get the time on the server: php -r '$date = date("m/d/Y h:i:s a", time()); echo "Server time is: " . $date . "\r\n"; exit;'
  2. Create the directory 'sitemap' in the root folder of your site.
  3. Navigate to Stores -> Settings -> Configuration -> Catalog -> XML Sitemap.
  4. 'Generation' -> Change 'Enable' to 'Yes'.
  5. Change 'Start Time' to 2 minutes later than the time from the step 1.
  6. Save config. Flush cache.
  7. Navigate to Marketing -> SEO & Search -> Site Map. Press 'Add Sitemap'.
  8. Change 'Filename' to 'sitemap.xml' and 'Path' to '/sitemap/'.
  9. Press 'Save'.
  10. Wait until the time from step 1 plus 2 minutes, then run the command php magento2ce/bin/magento cron:run && magento2ce/bin/magento cron:run several times.

Please provide more details regarding your environment, or try to reproduce this issue on a clean installation.

@christian-forgacs, thank you for your report.
We were not able to reproduce this issue by following the steps you provided. If you'd like to update it, please reopen the issue.

I can confirm this happens when the sitemap itself is an index file that contains sitemap children (in shops with 50k+ pages).

So if I create a sitemap.xml file under the /pub/ path, its address will be

/pub/sitemap.xml

but it will contain sitemap children such as

http://example.com/websites/example.com/public_html/pub/sitemap-1-1.xml
http://example.com/websites/example.com/public_html/pub/sitemap-1-2.xml
...

which aren't valid paths.
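A quick way to spot such entries is to list the <loc> values of the generated index and filter for a filesystem fragment. The following is only a sketch: the function names list_locs and flag_path_leaks are made up for illustration, and it assumes each <loc> element sits on a single line and that grep supports -o.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Magento.

# list_locs: print the value of every <loc> element read from stdin.
list_locs() {
    grep -o '<loc>[^<]*</loc>' | sed -e 's#<loc>##' -e 's#</loc>##'
}

# flag_path_leaks FRAGMENT: print only the URLs that contain FRAGMENT,
# e.g. a directory name that should never show up in a public URL.
flag_path_leaks() {
    list_locs | grep -F -- "$1"
}
```

For the example above, something like flag_path_leaks /public_html/ < pub/sitemap.xml would print the child sitemap URLs that carry the filesystem path.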

Hope that helps!

I can also confirm that this is an issue with the sitemap index.
The issue can be traced back to \Magento\Sitemap\Model\Sitemap::_getDocumentRoot.
When Magento runs via cron, $this->_request->getServer('DOCUMENT_ROOT') is empty, and realpath with an empty input returns the current working directory, i.e. the directory the cron job was started from. Normally this is the home directory of the user running the cron job.
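To make the effect concrete, here is a small shell simulation of the behaviour described in this thread. It is not Magento's actual code: sitemap_url is a made-up function that assumes the URL path is obtained by stripping the document root from the sitemap file's path, with an empty document root falling back to the current directory.

```shell
#!/bin/sh
# Illustration only: mimics the reported behaviour, not Magento's implementation.
# sitemap_url BASE_URL DOCUMENT_ROOT FILE_PATH
sitemap_url() {
    base_url=$1
    doc_root=$2
    file_path=$3
    # An empty DOCUMENT_ROOT falls back to the current working directory,
    # which is what PHP's realpath('') resolves to.
    if [ -z "$doc_root" ]; then
        doc_root=$PWD
    fi
    # Strip the document root prefix from the file path to get the URL path.
    url_path=${file_path#"$doc_root"}
    printf '%s%s\n' "$base_url" "$url_path"
}
```

With the layout mentioned earlier in the thread, (cd /var/www && sitemap_url https://www.example.com "" /var/www/src/pub/sitemap-1-1.xml) prints https://www.example.com/src/pub/sitemap-1-1.xml, while passing /var/www/src as the document root yields https://www.example.com/pub/sitemap-1-1.xml.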

A workaround would be to change the cron job like this:

* * * * * cd /path/to/magento/root; /usr/bin/php /path/to/magento/root/bin/magento cron:run

And if you are running your document root inside pub:

* * * * * cd /path/to/magento/root/pub; /usr/bin/php /path/to/magento/root/bin/magento cron:run
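One caveat with the cron lines above: because they use ";", the command still runs from the wrong directory if cd fails. A possible variant is a small wrapper that refuses to run the command in that case. This is a sketch only, and run_in_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a command from a given directory, or not at all.
# run_in_dir DIR COMMAND [ARGS...]
run_in_dir() {
    dir=$1
    shift
    (
        # Abort the subshell if the directory cannot be entered, so the
        # command never runs from an unexpected working directory.
        cd "$dir" || exit 1
        "$@"
    )
}

# Example call from a cron wrapper script (paths are placeholders):
# run_in_dir /path/to/magento/root /usr/bin/php bin/magento cron:run
```

Replacing ";" with "&&" in the crontab entry achieves the same guard without a wrapper.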

More errors:
1) The product URLs inside the sitemap files are not fully SEO-friendly: index.php should be removed from the URLs.
2) Image URLs must point to the source image path, not to the cache folder.

https://piezas-portatiles.com/index.php/bateria-para-acer-aspire-e1-522-e1-530-li-ion-14-8v-2600mah-bt28.html
2017-05-31T08:30:48+00:00
daily
1.0
https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28.jpg
Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28
https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-1.jpg
Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28
https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-2.jpg
Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28
https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-3.jpg
Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28
https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-4.jpg
Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28

Some products and categories have unfriendly URLs:

<loc>
https://piezas-portatiles.com/index.php/catalog/product/view/id/221071
</loc>

https://github.com/magento/magento2/issues/9440#issuecomment-351671697

The solution above works for me.
