Magento2: Why "search_tmp_" used? and whats use of it?

Created on 26 Dec 2019  路  27Comments  路  Source: magento/magento2

15545

https://magento.stackexchange.com/questions/143420/magento2-search-tmp-join-leads-to-empty-results-in-category-listing

Preconditions

There are lot of issues with Magento, biggest is performance issue at category/search.
Main issue is creating dynamic tables on run time and causing lot of issues.

  1. create search_tmp temp table when click on category/search
  2. it creates more than 12 or 24 tables for every search you do for any keyword or sometime it goes more than 40.

Steps to reproduce (*)

  1. take any magento2 website and search for any product
  2. it creates dynamic tables and drop it after search is done

Expected result (*)

  1. it should not create temp tables on run time and causing website down

Actual result (*)

  1. it should not create temp tables on run time

questions:

  1. why search_tmp tables introduced in Magento2 , what is reason?
  2. lets says it has been introduced but why is it generated at website run time , in frontend or when you search for product?
  3. why same Magento1 approach not used? is it something being used with scheduling?
  4. How can we not use search_tmp tables?
  5. There are so many queries regarding this, why is it not being answered? is it resolved in latest versions?
  6. Any actions had been taken in Magento2.3.3?
  7. Why it is being used when customer searching products and creating more than 12 tables in single page?
  8. Lets says if search_tmp not completed, page goes to not responding, it ends up creating sleep queries. queries are not being killed which supposed to kill after page render.
  9. is this happening with Elastic search? I dont think so, because I could see issue even after enabling mysql search.

Please let all questions answered because all customers facing this issue. customers are running huge servers to keep website up and running.

Please help.

Thanks
nagaraj

CatalogSearch Clear Description Confirmed Format is valid Ready for Work done Reproduced on 2.4.x S2 Dev.Experience

Most helpful comment

@engcom-Hotel, it doesn't mean that if the class is marked as deprecated that it is no longer used within Magento.
I can still find at least 3 references in Magento's CatalogSearch module where it is being used.

Also be aware that not every single Magento user is able to run ElasticSearch on their hosting, a lot of Magento users will still use Mysql as the search engine.

I'm reopening, because it was closed for the wrong reason.

All 27 comments

Hi @nagrgk. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • [ ] Summary of the issue
  • [ ] Information on your environment
  • [ ] Steps to reproduce
  • [ ] Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, please, review the Magento Contributor Assistant documentation.

@nagrgk do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?

  • [ ] yes
  • [ ] no

yes, it is reproducible in vanilla Magento

Hi @engcom-Hotel. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.4-develop branch

    Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 5. Add label Issue: Confirmed once verification is complete.

  • [ ] 6. Make sure that automatic system confirms that report has been added to the backlog.

Hello @nagrgk
I am not able to reproduce this issue on a fresh Magento 2.4-develop because search_tmp_ table usage was deprecated.
image
Magento is using search_query table now:
image

So i have to close this issue as not relevant.

Thanks for your report!

@engcom-Hotel, it doesn't mean that if the class is marked as deprecated that it is no longer used within Magento.
I can still find at least 3 references in Magento's CatalogSearch module where it is being used.

Also be aware that not every single Magento user is able to run ElasticSearch on their hosting, a lot of Magento users will still use Mysql as the search engine.

I'm reopening, because it was closed for the wrong reason.

@engcom-Hotel
2.4 is long way to go. search_temp queries are killing stores.

Before you justify, please think about the performance and business use cases always.
here people are very eager to close tickets and not take any actions and not thinking about the end users.

@hostep @nagrgk
i have to discuss with my team lead about this issue, because we cannot just confirm issues that are not reproducible on the Magento 2.4-develop.

@engcom-Hotel
looks this is perfect reply. taking time and investigating is always best than just closing.

thanks

Hi @engcom-Golf. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.4-develop branch

    Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 5. Add label Issue: Confirmed once verification is complete.

  • [ ] 6. Make sure that automatic system confirms that report has been added to the backlog.

:white_check_mark: Confirmed by @engcom-Hotel
Thank you for verifying the issue. Based on the provided information internal tickets MC-30417 were created

Issue Available: @engcom-Hotel, _You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself._

Hello @hostep @nagrgk
Sorry for my mistake. I did not understand the description correctly.
Now we found a problem.

Thanks for collaboration!

Is there any fix or patch so far, Our productions is getting killed atleast once a day.

This is the query we are being slammed with increasing the DB CPU and connection count

SELECT e . * , cat_index . position AS cat_index_position , price_index . price , price_index . tax_class_id , price_index . final_price , IF ( price_index . tier_price IS NOT ? , LEAST ( price_index . min_price , price_index . tier_price ) , price_index . min_price ) AS minimal_price , price_index . min_price , price_index . max_price , price_index . tier_price , IFNULL ( review_summary . reviews_count , ? ) AS reviews_count , IFNULL ( review_summary . rating_summary , ? ) AS rating_summary , stock_status_index . stock_status AS is_salable FROM catalog_product_entity AS e INNER JOIN catalog_category_product_index_store2 AS cat_index ON cat_index . product_id = e . entity_id AND cat_index . store_id = ? AND cat_index . visibility IN ( ?+ ) AND cat_index . category_id = ? INNER JOIN catalog_product_index_price AS price_index ON price_index . entity_id = e . entity_id AND price_index . website_id = ? AND price_index . customer_group_id = ? LEFT JOIN review_entity_summary AS review_summary ON e . entity_id = review_summary . entity_pk_value AND review_summary . store_id = ? AND review_summary . entity_type = ( SELECT review_entity . entity_id FROM review_entity WHERE ( entity_code = ? ) ) LEFT JOIN cataloginventory_stock_status AS stock_status_index ON e . entity_id = stock_status_index . product_id AND stock_status_index . website_id = ? AND stock_status_index . stock_id = ? INNER JOIN search_tmp_5e60fd001dca54_28934909 AS search_result ON e . entity_id = search_result . entity_id ORDER BY cat_index . position ASC , e . entity_id DESC LIMIT ?

Using ES may help. Or even better to profile the application and eliminate redundant queries on DB
dxebug dumps
this profiler
https://github.com/NoiseByNorthwest/php-spx
or xdebug
removing join from that table would result with empty filters in layered navigation

@ilnytskyi We are using elastic search, but not sure what went wrong. We get same error. The above query is hitting with DB again whenever we lift up maintenance page

Can you lead me to a possible fix?

@magento-engcom-team @engcom-Hotel Any insight on the above issue?

@nagrgk @arunelangovan
We've experienced the issue with the queries INSERT INTO search_tmp_* overloading the MySQL database hosted at AWS. Normally, such queries execute very quickly in under ~10ms. However, occasionally dramatic spikes are observed in the New Relic APM.
https://twitter.com/SergiiShymko/status/1045332638989312001?s=20
DoHDkDbU8AEhqKR

The root cause turned out to be exceeding the Write IOPS limit of RDS database. By default, RDS databases have a General Purpose (SSD) storage. Such databases accrue the IOPS credits at a rate dependent on the storage size. For a 100G database, the limit is 300 Write IOPS. If the usage exceeds the credits accumulation rate, all credits will be used up and AWS will artificially limit the database throughput. Performance will be restored once enough credits have been accumulated. You can see your Write IOPS stats in the RDS monitoring charts.
aws_rds_write_iops_limit

The solution is to upgrade the storage to the Provisioned IOPS. It costs $100/mo for 1000 IOPS. That's a reasonable expense to guarantee reliable handling of the elevated traffic levels.
aws_rds_provisioned_iops

Related bug #9807

@sshymko Thanks for the explanation

hey @nagrgk,
the main reason why we have dynamically generated search_tmp table is the way how Magento search works right now, and in particular that internally our Search API returns the only IDs of matched documents, but not documents by themselves.

The simplified algorithm is quite simple.

  1. User makes request specifying needed filters and providing search phrase
  2. Magento builds either MySQL (deprecated since 2.4.0) or Elasticsearch Full-Text Search query and processes this request. As a result a bunch (up to 10 000 results) of document IDs which correspond to the search query is returned
  3. Then Magento has to refine data for those entities (currently we support only product search). So, the system needs to retrieve data for the bunch of product IDs above. There are two options here:
    a. Make a query similar toSelect ... from catalog_product_entity where entity_ID in (lis_of_ids_here)
    b. Store the IDs into a temporary table and then make a query with join, similar to select .. from catalog_product_entity inner join search_tmp where catalog_product_entity.entity_id = search_tmp.entiry_id

The last option is more efficient if the number of returned documents is high because MySQL would use an index for joining tables, while in case of the query with IN (many_ids) - most probably query optimizer will decide to end up with disk scan.

The behavior is the same for both adapters MySQL and Elasticsearch because currently both of them follow the same API (read return IDs of documents found, but not data itself).

Regarding the spikes you mentioned @sshymko, we noticed similar spikes on some installations before. It was really hard to reproduce those spikes while testing, and as far as I remember those issues were specific to particular MySQL/Maria version and been solved with an upgrade.

The issue may be not actual as MySQL was removed from Magento version 2.4.0 and Elasticsearch should be used instead

Hi @gabrieldagama. Thank you for working on this issue.
Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please, go though the following instruction:


    1. Add/Edit Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

    1. Verify that the issue is reproducible on 2.4-develop branch
      Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
      - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
      - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!
    1. If the issue is not relevant or is not reproducible any more, feel free to close it.

Closing this issue because it is not actual since MySQL was removed from Magento version 2.4.0 and Elasticsearch should be used instead.

I believe all the comments above state that the issue is still present even when you have Elasticsearch enabled. It's also reproduced in 2.4

Hi @hws47a, I've done some investigation on this before closing, but maybe I'm missing something.

As far as I understood Magento\Framework\Search\Adapter\Mysql\TemporaryStorage is being used by Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier which is an implementation of Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplierInterface and it's preference is being overwriten by ElasticSearch's implementation Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier.

On my investigation, from what I could see, the ids that come from ElasticSearch are filtered with an e.entity_id IN (?) condition, as shown on Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier line 73, as a result we are getting the following query:

SELECT `e`.*,
       `price_index`.`price`,
       `price_index`.`tax_class_id`,
       `price_index`.`final_price`,
       IF(price_index.tier_price IS NOT NULL, Least(price_index.min_price,
                                              price_index.tier_price),
       price_index.min_price)                   AS `minimal_price`,
       `price_index`.`min_price`,
       `price_index`.`max_price`,
       `price_index`.`tier_price`,
       Ifnull(review_summary.reviews_count, 0)  AS `reviews_count`,
       Ifnull(review_summary.rating_summary, 0) AS `rating_summary`,
       `stock_status_index`.`stock_status`      AS `is_salable`
FROM   `catalog_product_entity` AS `e`
       INNER JOIN `catalog_product_index_price` AS `price_index`
               ON price_index.entity_id = e.entity_id
                  AND price_index.customer_group_id = 0
                  AND price_index.website_id = '1'
       INNER JOIN `catalog_product_website` AS `product_website`
               ON product_website.product_id = e.entity_id
                  AND product_website.website_id = 1
       LEFT JOIN `review_entity_summary` AS `review_summary`
              ON e.entity_id = review_summary.entity_pk_value
                 AND review_summary.store_id = 1
                 AND review_summary.entity_type = (SELECT
                     `review_entity`.`entity_id`
                                                   FROM   `review_entity`
                                                   WHERE  ( entity_code =
                                                            'product' ))
       INNER JOIN `cataloginventory_stock_status` AS `stock_status_index`
               ON e.entity_id = stock_status_index.product_id
                  AND stock_status_index.website_id = 0
                  AND stock_status_index.stock_id = 1
WHERE  ( stock_status_index.stock_status = 1 )
       AND ( e.entity_id IN ( 1 ) )
ORDER  BY Field(e.entity_id, 1) 

I've tried to find any usage of this temporary table but wasn't able to. Can you please provide more details on why you believe this is still actual?

Thanks

Closing this issue again, please, if you find evidence that the search_tmp is being used provide more details here and re-open the issue.

Thank you.

Hi, on magento 2.4.1, tmp table is still used, see logged query in catalog page:

SELECT COUNT(DISTINCT e.entity_id)
FROM `catalog_product_flat_1` AS `e`
INNER JOIN `catalog_product_index_price` AS `price_index` ON price_index.entity_id = e.entity_id AND price_index.customer_group_id = 0 AND price_index.website_id = '1'
INNER JOIN `catalog_category_product_index_store1` AS `cat_index` ON cat_index.product_id=e.entity_id AND cat_index.store_id=1 AND cat_index.visibility IN(3, 4) AND cat_index.category_id=1997
INNER JOIN `cataloginventory_stock_status` AS `stock_status_index` ON e.entity_id = stock_status_index.product_id
INNER JOIN `search_tmp_5fa7aa3ad505c7_73512525` AS `search_result` ON e.entity_id = search_result.entity_id
WHERE (stock_status_index.stock_status = 1)

EDIT:
actually this seems to be linked to my new theme which includes some modules and somehow still use search_tmp table :-(
With LUMA them, I don't see search_tmp in queries aymore

Hi @hws47a, I've done some investigation on this before closing, but maybe I'm missing something.

As far as I understood Magento\Framework\Search\Adapter\Mysql\TemporaryStorage is being used by Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier which is an implementation of Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplierInterface and it's preference is being overwriten by ElasticSearch's implementation Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier.

On my investigation, from what I could see, the ids that come from ElasticSearch are filtered with an e.entity_id IN (?) condition, as shown on Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier line 73, as a result we are getting the following query:

SELECT `e`.*,
       `price_index`.`price`,
       `price_index`.`tax_class_id`,
       `price_index`.`final_price`,
       IF(price_index.tier_price IS NOT NULL, Least(price_index.min_price,
                                              price_index.tier_price),
       price_index.min_price)                   AS `minimal_price`,
       `price_index`.`min_price`,
       `price_index`.`max_price`,
       `price_index`.`tier_price`,
       Ifnull(review_summary.reviews_count, 0)  AS `reviews_count`,
       Ifnull(review_summary.rating_summary, 0) AS `rating_summary`,
       `stock_status_index`.`stock_status`      AS `is_salable`
FROM   `catalog_product_entity` AS `e`
       INNER JOIN `catalog_product_index_price` AS `price_index`
               ON price_index.entity_id = e.entity_id
                  AND price_index.customer_group_id = 0
                  AND price_index.website_id = '1'
       INNER JOIN `catalog_product_website` AS `product_website`
               ON product_website.product_id = e.entity_id
                  AND product_website.website_id = 1
       LEFT JOIN `review_entity_summary` AS `review_summary`
              ON e.entity_id = review_summary.entity_pk_value
                 AND review_summary.store_id = 1
                 AND review_summary.entity_type = (SELECT
                     `review_entity`.`entity_id`
                                                   FROM   `review_entity`
                                                   WHERE  ( entity_code =
                                                            'product' ))
       INNER JOIN `cataloginventory_stock_status` AS `stock_status_index`
               ON e.entity_id = stock_status_index.product_id
                  AND stock_status_index.website_id = 0
                  AND stock_status_index.stock_id = 1
WHERE  ( stock_status_index.stock_status = 1 )
       AND ( e.entity_id IN ( 1 ) )
ORDER  BY Field(e.entity_id, 1) 

I've tried to find any usage of this temporary table but wasn't able to. Can you please provide more details on why you believe this is still actual?

Thanks

What if we're using a third party search like Algolia or Klevu, rather the Elasticsearch, which means we have to have Elasticsearch turned off to work?

Was this page helpful?
0 / 5 - 0 ratings