If I have a query string like "one AND two OR three", I expect the results to be the same as one of "(one AND two) OR three" or "one AND (two OR three)" but it seems like they are actually the same as "one OR two OR three".
Also, if the default_operator is AND, actually I expect "one two OR three" to work either like "(one AND two) OR three" or "one AND (two OR three)" -- but instead it works like "one OR two OR three".
Here's an example:
# create a test index
curl -XPUT 'localhost:9200/test'
# index a doc with a field containing the text "one"
curl -XPUT 'localhost:9200/test/mytype/1' -d '
{
"text": "one"
}'
# query the index with no default_operator, for "one AND two OR three" (0 results as expected)
curl -XGET 'localhost:9200/test/mytype/_search?pretty' -d '
{
"query": {
"query_string": {
"default_field": "_all",
"query": "text:(one AND two OR three)"
}
}
}'
# query with default operator, "(one AND two) OR three" (0 results as expected)
# "one AND (two OR three)" also gives the expected 0 results
curl -XGET 'localhost:9200/test/mytype/_search?pretty' -d '
{
"query": {
"query_string": {
"default_field": "_all",
"default_operator": "AND",
"query": "text:((one AND two) OR three)"
}
}
}'
# query "one AND two OR three" now with default operator, returns one result but I expect 0
curl -XGET 'localhost:9200/test/mytype/_search?pretty' -d '
{
"query": {
"query_string": {
"default_field": "_all",
"default_operator": "AND",
"query": "text:(one AND two OR three)"
}
}
}'
Using elasticsearch-1.7.1.
This may help:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html?q=query_string#_boolean_operators
Especially, "While the + and - only affect the term to the right of the operator, AND and OR can affect the terms to the left and right."
By the way, there are several problems with your specific queries:
You're probably better off using the +/- syntax or the match query DSL, though.
The problem is that the query string does not use pure boolean logic. It is intended to be used as a query, not as a filter. Queries have required matches (must) and optional matches (should) which are not required but improve the score if they are present.
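As a rough illustration of the must/should distinction, here is a small Python sketch that models a boolean query over a set of terms. The name `matches` and the "score" (a simple count of matched clauses) are assumptions made for illustration, not how Lucene actually scores. It also assumes that when there are no must clauses, at least one should clause has to match.

```python
def matches(doc_terms, must=(), should=()):
    """Return (matched, score) for a document given must/should terms."""
    doc = set(doc_terms)
    # Every required term has to be present.
    if not all(term in doc for term in must):
        return False, 0
    # Optional terms never exclude a document on their own; they only add to the score.
    optional_hits = sum(1 for term in should if term in doc)
    # Assumption: with no must clauses, at least one should clause must match.
    if not must and should and optional_hits == 0:
        return False, 0
    return True, len(must) + optional_hits


doc = {"one"}
print(matches(doc, must=["one", "two"], should=["three"]))   # (False, 0)
print(matches(doc, must=["one"], should=["two", "three"]))   # (True, 1)
print(matches({"one", "three"}, must=["one"], should=["two", "three"]))  # (True, 2)
```

The second call shows why a document containing only "one" can still match when "two" and "three" are merely optional.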
Worth reading this blogpost to understand more: https://lucidworks.com/blog/why-not-and-or-and-not/
As @sarwarbhuiyan said, you're better off using the query DSL if you want real boolean logic.
Thanks! You're probably right it's better to simply always use +/- but at this point that would require us to (further) translate user queries.
However, I still don't understand the default_operator behavior -- why for the query "x AND y OR z" is the behavior different when the default_operator is "AND" vs. when it's not specified (and from the docs I understand defaults to OR)?
To answer @sarwarbhuiyan -- initially I was trying to understand why "x y OR z" returned results with only x and not y and not z with the default_operator AND. And sorry about the default_field/specifying field in query string redundancy, unfortunately I constructed the test query from various sources. I don't think this is related to the behavior I see though. My point is actually about the default_operator behavior, not about the AND/OR operators.
Also according to that linked doc page (unless I understood incorrectly, again!) "a AND b OR c" should be equivalent to "(a AND b) OR c" because AND takes precedence -- and in case the default_operator is unspecified, it seems to be, but in case the default_operator is AND, it is not. This is the behavior I am trying to understand/work around.
The answer is explained in that blog post, to quote:
Things definitely get very confusing when these “boolean operators” are used in ways other then those described above. In some cases this is because the query parser is trying to be forgiving about “natural language” style usage of operators that many boolean logic systems would consider a parse error. In other cases, the behavior is bizarrely esoteric:
- Queries are parsed left to right
- NOT sets the Occurs flag of the clause to it’s right to MUST_NOT
- AND will change the Occurs flag of the clause to it’s left to MUST unless it has already been set to MUST_NOT
- AND sets the Occurs flag of the clause to it’s right to MUST
- If the default operator of the query parser has been set to “And”: OR will change the Occurs flag of the clause to it’s left to SHOULD unless it has already been set to MUST_NOT
- OR sets the Occurs flag of the clause to it’s right to SHOULD
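To make the quoted rules concrete, here is a minimal Python sketch that applies them to a flat query (bare terms plus AND, OR and NOT, with no parentheses or field prefixes). It is not the Lucene parser. `assign_occur` and `render` are hypothetical names, and the sketch assumes that a clause with no explicit operator gets MUST when the default operator is AND and SHOULD otherwise, which the quoted list does not spell out.

```python
def assign_occur(query, default_operator="OR"):
    """Return [(term, occur)] for a flat query, following the quoted rules."""
    # Assumption: clauses without an explicit operator take the default's flag.
    default_occur = "MUST" if default_operator == "AND" else "SHOULD"
    clauses = []                 # [term, occur] pairs, mutated as parsing proceeds
    conj, negate = None, False   # operator and NOT seen since the previous term
    for token in query.split():  # queries are parsed left to right
        if token in ("AND", "OR"):
            conj = token
            continue
        if token == "NOT":
            negate = True
            continue
        # Effect of the operator on the clause to its left.
        if clauses and clauses[-1][1] != "MUST_NOT":
            if conj == "AND":
                clauses[-1][1] = "MUST"
            elif conj == "OR" and default_operator == "AND":
                clauses[-1][1] = "SHOULD"
        # Flag for the clause to the operator's right.
        if negate:
            occur = "MUST_NOT"
        elif conj == "AND":
            occur = "MUST"
        elif conj == "OR":
            occur = "SHOULD"
        else:
            occur = default_occur
        clauses.append([token, occur])
        conj, negate = None, False
    return [tuple(clause) for clause in clauses]


def render(clauses):
    """Render clauses in +/- notation: + for MUST, - for MUST_NOT, bare for SHOULD."""
    prefix = {"MUST": "+", "SHOULD": "", "MUST_NOT": "-"}
    return " ".join(prefix[occur] + term for term, occur in clauses)


print(render(assign_occur("x AND y OR z", "OR")))   # +x +y z
print(render(assign_occur("x AND y OR z", "AND")))  # +x y z
print(render(assign_occur("x y OR z", "AND")))      # +x y z
```

Under these rules, with the default operator set to AND the OR demotes y back to SHOULD, so only x remains required. That is consistent with the thread's report that a document containing only "one" matches "one AND two OR three" when default_operator is AND but not when it is left unspecified.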
Frankly, these rules are just too hard to remember. This is one of the many reasons I don't like using the query_string query at all. Here's another reason. Look at these two queries for example:
http://foo # finds an empty regex in field `http` and `foo` in the `_all` field
http://foo/ # throws a malformed regex exception
If you want to understand how the query string syntax is being understood, then use the validate-query API:
GET _validate/query?explain
{
"query": {
"query_string": {
"query": "x AND y OR z",
"default_operator": "OR"
}
}
}
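For comparing the two default operators side by side, the same request can be scripted. The following Python sketch uses only the standard library. The host `localhost:9200` and the index name `test` are taken from the earlier curl examples, the helper names `build_validate_request` and `explain` are hypothetical, and the Content-Type header is an assumption that newer Elasticsearch versions require and older ones should ignore.

```python
import json
import urllib.request


def build_validate_request(query, default_operator,
                           base_url="http://localhost:9200", index="test"):
    """Build (but do not send) a validate-query request with ?explain."""
    body = {
        "query": {
            "query_string": {
                "query": query,
                "default_operator": default_operator,
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_validate/query?explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def explain(query, default_operator):
    """Send the request and return the parsed JSON response."""
    request = build_validate_request(query, default_operator)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `explain("x AND y OR z", "OR")` and `explain("x AND y OR z", "AND")` against a running node and comparing the two responses should show how the parser rewrote the query string in each case.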
Thanks so much for explaining!