Hello,
I've been trying to find all documents that contains a field with an empty string value (_textContent
) inside my index using the missing
filter. But I think that this filter doesn't treat empty strings as a null value.
Is this a bug, or the intended behavior for this filter? If it is like this by design, may I suggest adding an empty
filter, if that's even possible?
As a plus, I'll post the mapping that I'm currently using, the document that contains the empty string field and the query that I'm trying to run:
Mapping:
{
"documents": {
"mappings": {
"document": {
"properties": {
"_contratante": {
"type": "string"
},
"_dateFields": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "date",
"format": "dateOptionalTime"
}
}
},
"_indexadoPor": {
"type": "string"
},
"_textContent": {
"type": "string"
},
"_textFields": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "string"
}
}
},
"_tipoDocumento": {
"type": "string"
}
}
}
}
}
}
Document:
{
"_index": "documents",
"_type": "document",
"_id": "xxx",
"_version": 1,
"found": true,
"_source": {
"_id": "xxx",
"_contratante": "xxx",
"_tipoDocumento": "xxx",
"_indexadoPor": "xxx",
"_dateFields": [
{
"id": "538730ece4b0d13600208d7a:2",
"value": 1404183600000
}
],
"_textFields": [
{
"id": "538730ece4b0d13600208d7a:0",
"value": "xxx"
},
{
"id": "538730ece4b0d13600208d7a:1",
"value": "xxx"
},
{
"id": "538730ece4b0d13600208d7a:3",
"value": ""
},
{
"id": "538730ece4b0d13600208d7a:4",
"value": "xxxx"
}
],
"_textContent": ""
}
}
Query:
{
"query": {
"filtered": {
"filter": {
"missing": {
"field": "_textContent"
}
}
}
}
}
Thanks a lot for the amazing tool developed here!
Hi Paulo and everyone,
I just ran a few tests and also have same questions about it.
The situation can be produced by the gist: https://gist.github.com/hxuanji/d941c21fc75648ce7ba4
On the ES 1.3.2, the empty string cannot be find and the gist above can only find one document {"name": null}
. And its explanation is
ConstantScore(cache(NotFilter(cache(BooleanFilter(_field_names:name)))))
But on the ES 1.0.1, the final result of the gist can both find the empty-string and null-value documents. Also its explanation is
ConstantScore(cache(NotFilter(cache(BooleanFilter(name:[* TO *])))))
.
It seems the parsing rule changed, _field_names
seems not consider some special null cases on the ES 1.3.2 ?!
And this _field_names
added from #5659.
Any ideas?
I think this is a duplicate of https://github.com/elasticsearch/elasticsearch/issues/7348 ?
Hi @jprante,
Yes, I think so.
Does adding an empty
filter sounds silly? I think there's some good use cases for this.
Here's a workaround:
PUT t/t/1
{
"textContent": ""
}
PUT t/t/2
{
"textContent": "foo"
}
GET t/t/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "textContent"
}
}
],
"must_not": [
{
"wildcard": {
"textContent": "*"
}
}
]
}
}
}
Not optimal, but it works
You're a life saver @clintongormley
I want to do the inverse of this:
"query": {
"bool": {
"filter": {
"wildcard": {
"name": "*"
}
}
}
And it appears to work. Any document with a name matches and any doc with null or '' name does not match. But I don't know why it works! I thought that the wildcard filter expands * to some number of terms so that if I have, say, 9999 terms in the name field, if I index a document with name = "zzzzzzzzzzzzebra" then that will be well past the term expansion and it shouldn't match. But it does. Why?
Somehow I guess it's doing the constant_score
thing here and it's using the variant that doesn't hit the 1024 clause error. But how does that work? How can you look up anything in the index w/o having a should
clause with all the terms inside of it?
@clintongormley ?
Now it's 2020. Still not found a better solution for "field is not the empty string".
Used the following solution, but is this the really good practice?
{
"wildcard":{"field_name":"*"}
}
Most helpful comment
Here's a workaround:
Not optimal, but it works