Hi, we have a store with around 2 million products, and we installed the extension to optimize category and search pages.
We had an issue with the full reindex time. We increased batch_indexing_size to 10000, which brought the full reindex down to 4 hours.
But the catalog_product index contains 15 million records. We have 23 attributes that are is_searchable.
So I have 2 questions:
First of all, please ensure you are not affected by this (strange) issue with specific versions of MariaDB: https://github.com/Smile-SA/elasticsuite/issues/1621
If that's not the case, there are several other things we'll need to know to help you:
Are your metrics collected on a production environment? If so, please describe the specs of the servers used and the configuration of each component.
The index size depends on the number of visible products and, of course, on the number of attributes used by the search engine (is_searchable, but also is_filterable, is_filterable_in_search, is_sortable and even is_used_for_promo_rules attributes are indexed into the ES index).
That being said, we recently had to deal with a large catalog on a client project, and we made some improvements that we should be able to implement in a generic manner here.
We hope to release them in the next version.
Regards
Thanks @romainruaud
We are using MySQL 5.7, so we shouldn't be affected by issue #1621.
The metrics were not collected on a production server but on a server we use for development purposes. I will find out the server specs and let you know. Which components and which configuration settings do you need?
We will review the attributes and do our best to reduce their number.
And lastly, when do you expect to release the next version with the improvements for large catalogs?
If you are collecting these metrics on a "small" or "multi-tenant" development server, we may consider them either "irrelevant" or "not that bad for a small server". No offense intended, but for a catalog of this size, it's no surprise that a reindex takes several hours on a small infrastructure.
That being said, I'll reconsider according to the server specs. What I generally like to know when running this kind of benchmark is:
Regards
PR #1777 should bring significant improvements to indexing speed.
It's merged into master and scheduled for the next major version.
@romainruaud Thanks,
Do you know when we can expect this 2.9 version?
As soon as Magento releases 2.3.5, we should be able to release our 2.9.0.
@vpashovski did you test the new version?
@romainruaud not yet, because 2.9.0 is not compatible with 2.3.4. We need to upgrade the site to 2.3.5 first, but this is not possible for now.
When we do, I will let you know.
Thanks for asking :)
@romainruaud we did the upgrade and now use Magento 2.3.5-p1 and Elasticsuite 2.9.0. We are also using Elasticsearch 7.x (also tested on Elasticsearch 6.8.x).
We used the default values for batch_indexing_size and max_parallel_handles.
And it looks like the result is worse.
The full catalogsearch_fulltext reindex now takes 19 hours.
Before the upgrade it took 4 hours.
After that I tried tuning these two parameters, but there was no improvement.
We will try adjusting the heap size on the server.
Can you advise which other parameters we can optimize?
I also have another question: is a full reindex run automatically every day? If so, why is this necessary?
Hello @vpashovski,
Are you using the MSI (Multi-Source Inventory) features?
How many websites/stores do you have?
The dramatic issue with MSI is that with additional stock/stock sources, the stock data lives in tables (inventory_stock_X) with the required MySQL indices, whereas the default stock relies on a view (inventory_stock_1) pulling records from the legacy cataloginventory_stock_status table (which will be removed by Magento at some point).
And the problem is that there are no indices on a view, so whenever a query is performed with a WHERE condition on batch_indexing_size product ids, it actually performs a full scan of the cataloginventory_stock_status table.
So, divide your catalog size by your batch_indexing_size: that's how many full scans will be performed on your 2M-row cataloginventory_stock_status table... that hurts.
That hurts especially if you are not actually using multiple stock/stock sources.
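To put rough numbers on the division described above, here is a minimal back-of-the-envelope sketch in Python. It assumes a single store, exactly one stock query per indexing batch, and a stock status table with as many rows as the catalog; the function names are made up for illustration and are not part of Elasticsuite or Magento.

```python
import math


def full_scan_count(catalog_size: int, batch_size: int) -> int:
    """Number of indexing batches, assuming one stock query (one full scan) per batch."""
    return math.ceil(catalog_size / batch_size)


def rows_read(catalog_size: int, batch_size: int, table_rows: int) -> int:
    """Total rows read if every batch triggers a full scan of the stock status table."""
    return full_scan_count(catalog_size, batch_size) * table_rows


if __name__ == "__main__":
    catalog = 2_000_000  # catalog size mentioned in this thread
    for batch in (1_000, 10_000):  # illustrative batch_indexing_size values
        scans = full_scan_count(catalog, batch)
        total = rows_read(catalog, batch, catalog)
        print(f"batch={batch}: {scans} full scans, {total} rows read")
```

Under these assumptions, a larger batch_indexing_size reduces the number of scans but does not remove them, which is why avoiding the view altogether matters more than tuning the batch size.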
The sad story is that, in such a context (not having multiple stock/stock sources), I can no longer recommend disabling the MSI modules. As they are now embedded and enabled by default, pretty much nobody tests Magento without them, and I've encountered multiple legacy cataloginventory bugs on 2.3.4.
What I can recommend, though, is to try using our "deprecated" inventory source provider resource model \Smile\ElasticsuiteCatalog\Model\ResourceModel\Product\Indexer\Fulltext\Datasource\Deprecation\InventoryData instead of \Smile\ElasticsuiteCatalog\Model\ResourceModel\Product\Indexer\Fulltext\Datasource\InventoryData.
I used that approach on a 300K-product catalog, reducing the indexing time by a factor of at least 6 (from 12+ hours down to 2 hours on an undersized platform).
This requires a bit of DI and a custom inventory source provider model.
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
    <!-- Use the deprecated stock data provider to avoid using the "inventory_stock_1" index view which lacks indices and kills performances -->
    <!-- Datasources resolver -->
    <type name="Smile\ElasticsuiteCore\Index\DataSourceResolver">
        <arguments>
            <argument name="datasources" xsi:type="array">
                <item name="catalog_product" xsi:type="array">
                    <item name="stock" xsi:type="object">YourProject\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource\InventoryData</item>
                </item>
            </argument>
        </arguments>
    </type>
    <preference for="Smile\ElasticsuiteCatalog\Model\ResourceModel\Product\Indexer\Fulltext\Datasource\InventoryDataInterface"
                type="Smile\ElasticsuiteCatalog\Model\ResourceModel\Product\Indexer\Fulltext\Datasource\Deprecation\InventoryData" />
</config>
and YourProject\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource\InventoryData consisting of
<?php
/**
 * YourProject ElasticsuiteCatalog stock data provider model.
 * Reimplemented to use the deprecated resource model which pulls data from the legacy stock index table
 * and not from the "inventory_stock_1" view which lacks indexes and kills performances.
 *
 * @category  YourProject
 * @package   YourProject\ElasticsuiteCatalog
 * @author    Richard BAYET
 * @copyright 2020 Smile
 * @license   Open Software License ("OSL") v. 3.0
 */
namespace YourProject\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource;

use Smile\ElasticsuiteCore\Api\Index\DatasourceInterface;
use Smile\ElasticsuiteCatalog\Model\ResourceModel\Product\Indexer\Fulltext\Datasource\InventoryDataInterface;

/**
 * Datasource used to append inventory data to product during indexing.
 *
 * @category Smile
 * @package  Smile\ElasticsuiteCatalog
 * @author   Romain Ruaud
 */
class InventoryData implements DatasourceInterface
{
    /**
     * @var InventoryDataInterface
     */
    private $resourceModel;

    /**
     * Constructor.
     *
     * @param InventoryDataInterface $resourceModel Resource model
     */
    public function __construct(InventoryDataInterface $resourceModel)
    {
        $this->resourceModel = $resourceModel;
    }

    /**
     * Add inventory data to the index data.
     * {@inheritdoc}
     */
    public function addData($storeId, array $indexData)
    {
        $inventoryData = $this->resourceModel->loadInventoryData($storeId, array_keys($indexData));

        foreach ($inventoryData as $inventoryDataRow) {
            $productId = (int) $inventoryDataRow['product_id'];
            $indexData[$productId]['stock'] = [
                'is_in_stock' => (bool) $inventoryDataRow['stock_status'],
                'qty'         => (int) $inventoryDataRow['qty'],
            ];
        }

        return $indexData;
    }
}
I'll discuss this internally with @romainruaud: we might add that "deprecated" model (along with the existing "deprecated" resource model) to the code base, so that people with a huge catalog who are not actually using the MSI multi stock/stock sources features can fall back on the legacy approach for indexing with just the bit of DI on Smile\ElasticsuiteCore\Index\DataSourceResolver.
Regards,
Hello @vpashovski,
Concerning
But the catalog_product index have 15 millions of records inside. We have 23 attributes that are is_searchable.
This is normal: in Elasticsearch, the document/record count includes every document, sub-documents included.
For products, those sub-documents are the price and category data.


For instance, a single product located in 3 categories on a Magento instance with 4 customer groups will generate 8 documents in the catalog_product index : 1 for the global document, 3 sub-documents for category data and 4 sub-documents for the price data.
One simple way to reduce the total number of documents in the index, which also benefits Magento price indexing performance, is to remove unused customer groups.
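The counting rule from the example above can be written down as a small Python sketch. It is a simplification that assumes one global document per product, one category sub-document per category and one price sub-document per customer group, and it ignores any other nested data; the function names are hypothetical and only serve the illustration.

```python
def documents_per_product(categories: int, customer_groups: int) -> int:
    """1 global document + 1 sub-document per category + 1 price sub-document per customer group."""
    return 1 + categories + customer_groups


def index_document_count(products: int, categories: int, customer_groups: int) -> int:
    """Estimated document count if every product had the same category and group counts."""
    return products * documents_per_product(categories, customer_groups)


if __name__ == "__main__":
    # The example from the reply above: 3 categories, 4 customer groups.
    print(documents_per_product(3, 4))
    # Same product after removing 3 unused customer groups.
    print(documents_per_product(3, 1))
    # Rough estimate for a 2M-product catalog under the same assumptions.
    print(index_document_count(2_000_000, 3, 4))
```

With the figures reported in this thread (15 million records for about 2 million products), the average works out to roughly 7.5 documents per product, which is in line with this kind of breakdown.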
Regards,
Hi @rbayet
Thanks for the response.
We have one website with one store.
MSI is enabled, but at the moment we are not actually using it. There is an ERP system that will manage stock.
I will try your suggestion.
What about the Full reindex - is it running every day?
No @vpashovski, Elasticsuite does not include a specific cronjob which would run a full reindex every day.
But the catalogsearch_fulltext index has dependencies on the other indices (price, stock, category products), so any daily product update (as you mentioned an ERP system) will schedule the impacted products for a catalogsearch_fulltext partial reindex.
Regards,
This issue was waiting update from the author for too long.
Without any update, we are unfortunately not sure how to resolve this issue.
We are therefore reluctantly going to close this bug for now. Please don't hesitate to comment on the bug if you have any more information for us; we will reopen it right away!
Thanks for your contribution.
Hello @rbayet,
Sorry for the late response :)
We tried your fix, and the catalogsearch_fulltext reindex is much faster: it now takes only 1.5 hours to reindex 2 million products, whereas before it took 25+ hours.
We are now testing whether there are any other issues.
Thanks for your help :)
@vpashovski can I close this one?
We've got everything needed for future reference in @rbayet's note.
Regards