In our setup, sitemap generation via cron builds wrong URLs for the sitemap index.
Hello.
I tried to reproduce the issue you have reported.
php -r '$date = date("m/d/Y h:i:s a", time()); echo "Server time is: " . $date . "\r\n"; exit;'
Here is my sitemap.xml:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:content="http://www.google.com/schemas/sitemap-content/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url><loc>http://magento216.vg/home</loc><lastmod>2017-07-04T08:48:45+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
<url><loc>http://magento216.vg/enable-cookies</loc><lastmod>2017-07-04T08:48:43+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
<url><loc>http://magento216.vg/privacy-policy-cookie-restriction-mode</loc><lastmod>2017-07-04T08:48:43+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
</urlset>
Please, correct me if you tried in another way.
@TomashKhamlai, just FYI: it feels like this is a duplicate of https://github.com/magento/magento2/issues/5321#issuecomment-310448904, but @christian-forgacs isn't talking about images in his opening post, so I'm not entirely convinced that he is reporting the same bug.
@christian-forgacs are you using NGINX as a server?
@TomashKhamlai yes we're using NGINX as server.
I can also replicate this. The problem happens for me when running the sitemap generation through bin/magento cron:run from outside of the Magento directory. For example, our Magento code is in /var/www/src; if I run php src/bin/magento cron:run from /var/www, the URLs will contain src as part of the base URL (https://www.example.com/src). Running from the Magento directory, e.g. php bin/magento cron:run from /var/www/src, works as expected.
I believe the problem can be traced back to \Magento\Sitemap\Model\Sitemap::_getStoreBaseDomain: the $storeDomain value it returns is incorrect under the conditions described. I believe it is due to the logic in this condition
Similarly, if I run php /var/www/src/bin/magento cron:run from the root of our server, the job actually fails with the following message:
Notice: Undefined property: Magento\Sitemap\Model\Observer::$_translateModel in /var/www/src/vendor/magento/module-sitemap/Model/Observer.php
This is due to $documentRoot being empty in the same condition causing this error:
Warning: strpos(): Empty needle in /var/www/src/vendor/magento/module-sitemap/Model/Sitemap.php
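To make the behaviour described above easier to follow, here is a minimal shell sketch of how the result changes with the starting directory. It is a simplified model based only on the symptoms reported in this thread, not Magento's actual code; the function name model_store_domain and its three arguments are hypothetical.

```shell
#!/bin/sh
# Simplified model (an assumption, NOT Magento's source code): the sitemap
# base URL is derived by stripping the document root from the Magento base
# directory and appending whatever is left to the store URL.
# Usage: model_store_domain STORE_URL BASE_DIR DOCUMENT_ROOT
model_store_domain() {
    url=${1%/}
    base_dir=${2%/}
    doc_root=${3%/}

    # A document root of "/" ends up empty after trimming, which mirrors
    # the "Empty needle" warning reported above.
    if [ -z "$doc_root" ]; then
        echo "error: empty document root" >&2
        return 1
    fi

    case "$base_dir" in
        "$doc_root"|"$doc_root"/*)
            # Base dir lies inside the document root: the remainder of the
            # path is treated as part of the URL.
            printf '%s%s\n' "$url" "${base_dir#"$doc_root"}"
            ;;
        *)
            # Otherwise the store URL is used unchanged.
            printf '%s\n' "$url"
            ;;
    esac
}
```

Under this model, starting in /var/www with the code in /var/www/src yields https://www.example.com/src, while starting in /var/www/src yields the plain store URL, which matches the two cases reported above.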
@christian-forgacs, thank you for your report.
We were not able to reproduce this issue by following the steps you provided.
Please provide more details regarding your environment, or try to reproduce this
issue on a clean installation.
@christian-forgacs, thank you for your report.
We were not able to reproduce this issue by following the steps you provided. If you'd like to update it, please reopen the issue.
I can confirm this happens when the sitemap itself is an index file that contains sitemap children (in shops with 50k+ pages).
So if I create a sitemap.xml file under the /pub/ path, its address will be
/pub/sitemap.xml
but it will contain sitemap children such as
http://example.com/websites/example.com/public_html/pub/sitemap-1-1.xml
http://example.com/websites/example.com/public_html/pub/sitemap-1-2.xml
...
which aren't valid paths.
Hope that helps!
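For anyone who wants to check whether their own index is affected, the following is a rough shell sketch that lists child URLs not sitting directly under the expected prefix. The function name list_unexpected_locs and the idea of passing the expected prefix as an argument are assumptions made for illustration; it does simple text matching on <loc> elements rather than real XML parsing.

```shell
#!/bin/sh
# Hypothetical helper: print every <loc> value in FILE that is not of the
# form EXPECTED_PREFIX/<file name>.
# Usage: list_unexpected_locs FILE EXPECTED_PREFIX
list_unexpected_locs() {
    file=$1
    prefix=${2%/}

    # Put each tag on its own line, then keep only the text after "<loc>".
    tr '<' '\n' < "$file" | sed -n 's/^loc>//p' | while IFS= read -r loc; do
        [ -n "$loc" ] || continue
        rest=${loc#"$prefix"/}
        case "$rest" in
            # Either the prefix did not match at all, or extra path
            # segments follow it.
            "$loc"|*/*) printf '%s\n' "$loc" ;;
        esac
    done
}
```

For example, list_unexpected_locs pub/sitemap.xml http://example.com/pub would print the two child URLs quoted above, because they contain the filesystem path between the host name and /pub/.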
I can also confirm that this is an issue with the sitemap index.
The issue can be traced back to \Magento\Sitemap\Model\Sitemap::_getDocumentRoot.
When you run as cron, $this->_request->getServer('DOCUMENT_ROOT') will be empty, and realpath() with an empty input returns the current working directory, i.e. the directory the cron job starts in. Normally this is the home directory of the user running the cron job.
A workaround would be to change the cron job like this:
* * * * * cd /path/to/magento/root; /usr/bin/php /path/to/magento/root/bin/magento cron:run
And if you are running your document root inside pub:
* * * * * cd /path/to/magento/root/pub; /usr/bin/php /path/to/magento/root/bin/magento cron:run
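The same workaround can also be wrapped in a small script, sketched below. The variable names MAGENTO_ROOT, PHP_BIN and CRON_CWD and the function name run_magento_cron are placeholders chosen for this example, and the default paths are the same placeholders used in the cron lines above.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust them for your server.
MAGENTO_ROOT=${MAGENTO_ROOT:-/path/to/magento/root}
PHP_BIN=${PHP_BIN:-/usr/bin/php}
# Set CRON_CWD to "$MAGENTO_ROOT/pub" if your document root is inside pub.

run_magento_cron() {
    # Run in a subshell so the caller's working directory is untouched.
    (
        # Abort if the directory is missing instead of running cron:run
        # from whatever directory cron happened to start in.
        cd "${CRON_CWD:-$MAGENTO_ROOT}" || exit 1
        exec "$PHP_BIN" "$MAGENTO_ROOT/bin/magento" cron:run
    )
}
```

To use it, add a final line calling run_magento_cron and point the crontab entry at the script. Unlike the cd ...; php ... form, where the command after the semicolon runs even if cd fails, this version stops when the directory change does not succeed.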
More errors:
1) the product URLs inside the sitemap files are not fully SEO-friendly => index.php should be removed from the URLs
2) images must point to the source image path and not to the cache folder
Some products and categories have unfriendly URLs:
<loc>
https://piezas-portatiles.com/index.php/catalog/product/view/id/221071
</loc>
https://github.com/magento/magento2/issues/9440#issuecomment-351671697
The above solution works for me.