Elasticsearch version (bin/elasticsearch --version):
Version: 7.0.1, Build: default/tar/e4efcb5/2019-04-29T12:56:03.145736Z, JVM: 1.8.0_202
Plugins installed:
Just the plugins which are shipped with ES
JVM version (java -version):
openjdk version "1.8.0_202"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_202-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.202-b08, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
After upgrading from 6.7.1 I get lots of “totalTermFreq must be at least docFreq” errors.
Steps to reproduce:
I don't have a simple reproduction because it only happens when I index a large amount of data. So far I haven't been able to reproduce it with just a few documents.
But I'll share what I've figured out so far.
Also see the following discussion, where two other users report the same problem:
https://discuss.elastic.co/t/totaltermfreq-must-be-at-least-docfreq-error-after-upgrading-to-7-0-1/179977
I search using multi_match queries with type cross_fields. Changing the type to best_fields avoids the error, but is not what I want.
In my environment it seems that array fields are causing the problem.
I've got a mapping like:
{
  "properties": {
    "description": {
      "type": "text",
      "similarity": "custom_similarity",
      "term_vector": "with_positions_offsets",
      "analyzer": "standard_analyzer",
      "search_analyzer": "standard_search_analyzer",
      "fields": {
        "ngram": {
          "type": "text",
          "similarity": "custom_similarity",
          "analyzer": "ngram_analyzer",
          "search_analyzer": "standard_search_analyzer"
        },
        "edge_ngram_prefix": {
          "type": "text",
          "similarity": "custom_similarity",
          "analyzer": "edge_ngram_1_analyzer",
          "search_analyzer": "standard_search_analyzer"
        }
      }
    },
    "tags": {
      "type": "text",
      "similarity": "custom_similarity",
      "analyzer": "standard_analyzer",
      "search_analyzer": "standard_search_analyzer",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
And I post data like:
POST /some_index/_doc
{
  "description": "Some description",
  "tags": ["foo", "bar", "foo bar"]
}
When I search in both fields the query fails.
The query looks like:
{
  "from":0,
  "size":100,
  "query":{
    "bool":{
      "must":[
        {
          "function_score":{
            "query":{
              "bool":{
                "must":[
                  {
                    "function_score":{
                      "query":{
                        "multi_match":{
                          "query":"foo",
                          "fields":[
                            "description^3.0",
                            "description.edge_ngram_prefix^0.90000004",
                            "description.ngram^0.6",
                            "tags^1.0"
                          ],
                          "type":"cross_fields",
                          "operator":"AND",
                          "slop":0,
                          "prefix_length":0,
                          "max_expansions":50,
                          "tie_breaker":0.05,
                          "zero_terms_query":"NONE",
                          "auto_generate_synonyms_phrase_query":true,
                          "fuzzy_transpositions":true,
                          "boost":1.0
                        }
                      },
                      "functions":[
                        {
                          "filter":{
                            "match_all":{
                              "boost":1.0
                            }
                          },
                          "field_value_factor":{
                            "field":"boost",
                            "factor":1.0,
                            "modifier":"none"
                          }
                        }
                      ],
                      "score_mode":"sum",
                      "max_boost":3.4028235E38,
                      "boost":1.0
                    }
                  }
                ],
                "adjust_pure_negative":true,
                "boost":1.0
              }
            }
          }
        }
      ],
      "adjust_pure_negative":true,
      "boost":1.0
    }
  }
}
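For anyone trying to reproduce this, the query above can probably be reduced to its multi_match core. Here is a minimal Python sketch that builds the request body for both the cross_fields and best_fields variants. The helper name build_multi_match is made up for illustration, and I am assuming (not verified) that the outer bool and function_score wrappers are not needed to trigger the error.

```python
import json

# Field boosts copied from the original query.
FIELDS = [
    "description^3.0",
    "description.edge_ngram_prefix^0.90000004",
    "description.ngram^0.6",
    "tags^1.0",
]


def build_multi_match(text, match_type="cross_fields"):
    """Hypothetical helper: search body containing only the multi_match clause."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": FIELDS,
                "type": match_type,
                "operator": "AND",
                "tie_breaker": 0.05,
            }
        }
    }


if __name__ == "__main__":
    # Variant that fails in the report.
    print(json.dumps(build_multi_match("foo"), indent=2))
    # Variant reported to work.
    print(json.dumps(build_multi_match("foo", "best_fields"), indent=2))
```

Either printed body can be sent to the _search endpoint of the index to compare the two types.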
When I remove all the array fields from the query, I no longer get this error.
So the problem seems to be related to the combination of the cross_fields type and array fields.
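To make the error message concrete, here is a small Python sketch of the invariant it refers to. Within a single field, every document that contains a term contributes at least one occurrence, so totalTermFreq can never be smaller than docFreq. All values of an array are indexed into the same field, which is why "foo" counts twice for the sample document. The function field_stats, the whitespace tokenizer, and the second and third documents are made up for illustration. The "mixed" combination at the end is only a hypothetical way of combining statistics from different fields, not Elasticsearch's actual cross_fields logic.

```python
from collections import Counter


def field_stats(docs, field, term):
    """Return (doc_freq, total_term_freq) of `term` in `field`, whitespace-tokenized."""
    doc_freq = 0
    total_term_freq = 0
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        # All array elements end up in the same field of the same document.
        tokens = Counter(tok for v in values for tok in v.lower().split())
        if tokens[term] > 0:
            doc_freq += 1
            total_term_freq += tokens[term]
    return doc_freq, total_term_freq


docs = [
    {"description": "Some description", "tags": ["foo", "bar", "foo bar"]},  # from the report
    {"description": "Other text", "tags": ["foo"]},  # made-up document
    {"description": "foo", "tags": ["bar"]},  # made-up document
]

tags_df, tags_ttf = field_stats(docs, "tags", "foo")
desc_df, desc_ttf = field_stats(docs, "description", "foo")

# Per field the invariant holds.
assert tags_ttf >= tags_df and desc_ttf >= desc_df

# Hypothetical mix: docFreq taken from one field, totalTermFreq from another.
mixed_df = max(tags_df, desc_df)
mixed_ttf = min(tags_ttf, desc_ttf)
print("mixed statistics valid:", mixed_ttf >= mixed_df)
```

The point is only that statistics taken from different fields do not automatically satisfy the invariant, so any code that blends them across fields has to enforce it explicitly.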
Let me know if you need more details, logs, etc.
Edit:
I also deleted and reindexed the data, but that didn't help either.
Pinging @elastic/es-search
Related TTF changes in cross fields: https://github.com/elastic/elasticsearch/pull/41125/files#diff-9ccffb97cfc8d6b98ec206fe570d7a04L180
I haven't reproduced this here yet with simplified mappings. Can you supply a full mapping that lets me reproduce the issue on a clean install of 7.0.1?
Thanks for reporting, @TheRealChrisS. @markharwood, I am able to reproduce this and I opened https://github.com/elastic/elasticsearch/pull/41938 for the fix.