Elasticsearch: “totalTermFreq must be at least docFreq” error after upgrading to 7.0.1

Created on 8 May 2019  ·  4 Comments  ·  Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version):
Version: 7.0.1, Build: default/tar/e4efcb5/2019-04-29T12:56:03.145736Z, JVM: 1.8.0_202

Plugins installed:
Just the plugins which are shipped with ES

JVM version (java -version):
openjdk version "1.8.0_202"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_202-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.202-b08, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
After upgrading from 6.7.1 I get lots of “totalTermFreq must be at least docFreq” errors.

Steps to reproduce:

I don't have a simple reproduction because the error only occurs when I index a large amount of data. So far I haven't been able to reproduce it with just a few documents.

But I'll share what I've figured out so far.
Also see the following discussion, where two other users report the same problem:
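Since the error reportedly shows up only at volume, one way to look for a reproduction is to generate many documents shaped like the example in this report and send them through the `_bulk` API. The following is a minimal sketch, not a confirmed reproduction: the vocabulary, the document count, and the helper name `bulk_body` are all assumptions made for illustration. It only builds the NDJSON request body and makes no network calls.

```python
import json
import random


def bulk_body(n_docs, seed=0):
    """Build an NDJSON body for POST /some_index/_bulk (hypothetical helper).

    Each document mimics the shape used in this report: a text
    "description" and an array-valued "tags" field.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    vocab = ["foo", "bar", "foo bar", "baz", "qux"]  # assumed vocabulary
    lines = []
    for i in range(n_docs):
        doc = {
            "description": f"Some description {i}",
            "tags": rng.sample(vocab, k=rng.randint(1, 3)),
        }
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # source line
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = bulk_body(3)
    print(body, end="")
```

The output can be written to a file and posted with `Content-Type: application/x-ndjson`. Whether any particular document count triggers the error is unknown.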
https://discuss.elastic.co/t/totaltermfreq-must-be-at-least-docfreq-error-after-upgrading-to-7-0-1/179977

I search using multi_match queries of type cross_fields. Switching to best_fields avoids the error, but it is not the behavior I want.

In my environment it seems that array fields are causing the problem.

I've got a mapping like:

{
    "properties": {
        "description": {
            "type": "text",
            "similarity": "custom_similarity",
            "term_vector" : "with_positions_offsets",
            "analyzer": "standard_analyzer",
            "search_analyzer": "standard_search_analyzer",
            "fields": {
                "ngram": {
                    "type": "text",
                    "similarity": "custom_similarity",
                    "analyzer": "ngram_analyzer",
                    "search_analyzer": "standard_search_analyzer"
                },
                "edge_ngram_prefix": {
                    "type": "text",
                    "similarity": "custom_similarity",
                    "analyzer": "edge_ngram_1_analyzer",
                    "search_analyzer": "standard_search_analyzer"
                }
            }
        },
        "tags": {
            "type": "text",
            "similarity": "custom_similarity",
            "analyzer": "standard_analyzer",
            "search_analyzer": "standard_search_analyzer",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        }
    }
}

And I post data like:

POST /some_index/_doc
{
    "description": "Some description",
    "tags": ["foo", "bar", "foo bar"]
}

When I search in both fields the query fails.

The query looks like:

{  
   "from":0,
   "size":100,
   "query":{  
      "bool":{  
         "must":[  
            {  
               "function_score":{  
                  "query":{  
                     "bool":{  
                        "must":[  
                           {  
                              "function_score":{  
                                 "query":{  
                                    "multi_match":{  
                                       "query":"foo",
                                       "fields":[  
                                          "description^3.0",
                                          "description.edge_ngram_prefix^0.90000004",
                                          "description.ngram^0.6",
                                          "tags^1.0"
                                       ],
                                       "type":"cross_fields",
                                       "operator":"AND",
                                       "slop":0,
                                       "prefix_length":0,
                                       "max_expansions":50,
                                       "tie_breaker":0.05,
                                       "zero_terms_query":"NONE",
                                       "auto_generate_synonyms_phrase_query":true,
                                       "fuzzy_transpositions":true,
                                       "boost":1.0
                                    }
                                 },
                                 "functions":[  
                                    {  
                                       "filter":{  
                                          "match_all":{  
                                             "boost":1.0
                                          }
                                       },
                                       "field_value_factor":{  
                                          "field":"boost",
                                          "factor":1.0,
                                          "modifier":"none"
                                       }
                                    }
                                 ],
                                 "score_mode":"sum",
                                 "max_boost":3.4028235E38,
                                 "boost":1.0
                              }
                           }
                        ],
                        "adjust_pure_negative":true,
                        "boost":1.0
                     }
                  }
               }
            }
         ],
         "adjust_pure_negative":true,
         "boost":1.0
      }
   }
}

When I remove all the array fields from the query, I no longer get this error.
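To narrow down which part of the query matters, it can help to generate the variants described here from one template. The sketch below is a hypothetical helper (`search_body` is not part of any client library) that keeps only the `multi_match` clause from the query above. It assumes the surrounding `function_score` and `bool` wrappers are not needed to trigger the error, which has not been verified.

```python
import json

# Field list copied from the query in this report.
FIELDS = [
    "description^3.0",
    "description.edge_ngram_prefix^0.90000004",
    "description.ngram^0.6",
    "tags^1.0",
]


def search_body(text, match_type="cross_fields", exclude=()):
    """Return a reduced search body; `exclude` lists field names to drop."""
    fields = [f for f in FIELDS if f.split("^")[0] not in exclude]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "type": match_type,
                "operator": "AND",
                "tie_breaker": 0.05,
            }
        }
    }


variants = {
    "cross_fields with tags": search_body("foo"),
    "best_fields workaround": search_body("foo", match_type="best_fields"),
    "cross_fields without tags": search_body("foo", exclude=("tags",)),
}

if __name__ == "__main__":
    for name, body in variants.items():
        print(name, json.dumps(body))
```

Running each variant against the same index shows whether the failure follows the `cross_fields` type, the array field, or only the combination of both.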

So the problem seems to be related to the combination of the cross_fields option and array fields.
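The error message states an invariant: for any term in a field, the total number of occurrences (totalTermFreq) must be at least the number of documents containing it (docFreq). The sketch below illustrates that invariant and one hypothetical way a cross-field blend could break it, by pairing a document frequency taken from one field with a total term frequency from another. It is a toy model with a whitespace tokenizer, not the actual Elasticsearch or Lucene code, and the blending rule shown is an assumption for illustration only.

```python
from collections import Counter


def field_stats(docs, field):
    """Return {term: (doc_freq, total_term_freq)} for one field.

    All values of an array field are treated as one token stream,
    split on whitespace (a stand-in for the real analyzer).
    """
    doc_freq, total_term_freq = Counter(), Counter()
    for doc in docs:
        values = doc.get(field, [])
        if isinstance(values, str):
            values = [values]
        tokens = [tok.lower() for value in values for tok in value.split()]
        total_term_freq.update(tokens)  # every occurrence counts
        doc_freq.update(set(tokens))    # each document counts once per term
    return {term: (doc_freq[term], total_term_freq[term]) for term in doc_freq}


def violates_invariant(doc_freq, total_term_freq):
    """True when totalTermFreq is smaller than docFreq."""
    return total_term_freq < doc_freq


docs = [
    {"description": "Some description", "tags": ["foo", "bar", "foo bar"]},
    {"description": "foo one", "tags": ["bar"]},
    {"description": "foo two", "tags": []},
    {"description": "foo three", "tags": ["baz"]},
]

description = field_stats(docs, "description")
tags = field_stats(docs, "tags")

# Within a single field the invariant always holds.
assert not any(violates_invariant(df, ttf) for df, ttf in tags.values())

# Hypothetical blend: largest docFreq across fields, totalTermFreq from "tags".
blended_df = max(description["foo"][0], tags["foo"][0])
tags_ttf = tags["foo"][1]
print(blended_df, tags_ttf, violates_invariant(blended_df, tags_ttf))
# prints: 3 2 True
```

Here "foo" occurs in three descriptions but only twice in the tags field, so the mixed statistics fail the check even though each field is consistent on its own.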

Let me know if you need more details, logs, etc.

Edit:
I also deleted and reindexed the data, but that didn't help either.

:Search/Ranking >bug v7.0.1

All 4 comments

Pinging @elastic/es-search

Related TTF changes in cross fields: https://github.com/elastic/elasticsearch/pull/41125/files#diff-9ccffb97cfc8d6b98ec206fe570d7a04L180

I haven't reproduced this here yet with simplified mappings. Can you supply a full mapping that allows me to reproduce the issue on a clean install of 7.0.1?

Thanks for reporting @TheRealChrisS , @markharwood I am able to reproduce and I opened https://github.com/elastic/elasticsearch/pull/41938 for the fix.
