Yugabyte-db: [YSQL] Wrong results when using ORDER BY clause

Created on 18 Mar 2020  路  7Comments  路  Source: yugabyte/yugabyte-db

The issue

I've been trying to use yugabytedb with MongooseIM and some of its tests have failed. After debugging the issues it's turned out that the problem boils down to using ORDER BY column DESC clause in a SELECT query. Number of returned rows for query that uses ORDER BY clause and the one that does not use the clause are different.

How to reproduce the issue

Please follow the steps below to reproduce the issue:

  1. Create a table:
CREATE TABLE mam_message_test(
  id BIGINT NOT NULL,
  user_id INT NOT NULL,
  PRIMARY KEY(user_id, id)
);
  1. Populate the table:
INSERT INTO mam_message_test
VALUES (12870657, 17),
  (16393729, 17),
  (19780865, 17),
  (21770241, 17),
  (23683329, 17),
  (25524993, 17),
  (27495937, 17),
  (29214465, 17),
  (31265793, 17),
  (32910593, 17),
  (34800129, 17),
  (36739073, 17),
  (38732033, 17),
  (40899841, 17),
  (42786561, 17);
  1. Select some rows from the table:
SELECT id FROM mam_message_test 
WHERE id < 32910593 AND user_id = 17
LIMIT 9;
    id
----------
 12870657
 16393729
 19780865
 21770241
 23683329
 25524993
 27495937
 29214465
 31265793
(9 rows)
  1. Select some rows from the table but sort them in descending order:
SELECT id FROM mam_message_test 
WHERE id < 32910593 AND user_id = 17
ORDER BY id DESC LIMIT 9;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 21770241
 19780865
 16393729
(8 rows)

As we can see we now have only 8 rows instead of 9. I would expect having still 9 rows.

When the query works properly

I have found a few ways of avoiding the issue:

  1. Do not use PRIMARY KEY - if we create a table without PRIMARY KEY the query works as expected:
CREATE TABLE mam_message_test(
  id BIGINT NOT NULL,
  user_id INT NOT NULL
);
  1. Do not use user_id = 17 condition:
SELECT id FROM mam_message_test 
WHERE id < 32910593
ORDER BY id DESC LIMIT 9;

Test environment

I've run the above tests on macOS Catalina 10.15.3 using yugabyte-2.1.2.0 and 1-node cluster. I've used ysqlsh for interacting with the db.

See yb-ctl status output below:

| Node Count: 1 | Replication Factor: 1                                                            |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/postgres                                  |
| YSQL Shell          : bin/ysqlsh                                                                 |
| YCQL Shell          : bin/cqlsh                                                                  |
| YEDIS Shell         : bin/redis-cli                                                              |
| Web UI              : http://127.0.0.1:7000/                                                     |
| Cluster Data        : /Users/kacpermentel/yugabyte-data                                          |
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
| Node 1: yb-tserver (pid 80789), yb-master (pid 80786)                                            |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/postgres                                  |
| YSQL Shell          : bin/ysqlsh                                                                 |
| YCQL Shell          : bin/cqlsh                                                                  |
| YEDIS Shell         : bin/redis-cli                                                              |
| data-dir[0]         : /Users/kacpermentel/yugabyte-data/node-1/disk-1/yb-data                    |
| yb-tserver Logs     : /Users/kacpermentel/yugabyte-data/node-1/disk-1/yb-data/tserver/logs       |
| yb-master Logs      : /Users/kacpermentel/yugabyte-data/node-1/disk-1/yb-data/master/logs        |
----------------------------------------------------------------------------------------------------

I'd be grateful if you could take a look at this. Thanks.

areysql communitrequest priorithigh

Most helpful comment

@mkacper Thanks for reporting the issue and the thorough description. It should be fixed by 6ae98bc (also specifically tested your examples locally).
Note: The fix should be included in our next minor release (version 2.1.3) which should be ready in about 1-2 weeks or less.

Root issue was with using both reverse-order scans (because of the ORDER BY .. DESC) and paging results. The paging was indirectly caused by the LIMIT clause due to the LESS_THAN vs LESS_THAN_EQUAL issue that @ndeodhar mentioned above.
We will do additional filtering in the query layer if needed so having a more general condition for the internal request is still _correct_, but filtering that first row meant we needed to send another internal request to fetch the full result (See the diff description for more details).

All 7 comments

Yes, able to repro in 2.1.

ORDER BY id desc seems to work fine; but when a LIMIT is added, and specifically LIMIT 9 (in this test case) and not other values for the LIMIT-- then it seems to trigger the bug.

##OK##
yugabyte=# SELECT id FROM mam_message_test
WHERE id < 32910593 and user_id=17 ORDER BY id DESC;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 21770241
 19780865
 16393729
 12870657
(9 rows)

##NOT OK##
yugabyte=# SELECT id FROM mam_message_test 
WHERE id < 32910593 and user_id=17 ORDER BY id DESC LIMIT 9;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 21770241
 19780865
 16393729
(8 rows)

##OK##
yugabyte=# SELECT id FROM mam_message_test 
WHERE id < 32910593 and user_id=17 ORDER BY id DESC LIMIT 10;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 21770241
 19780865
 16393729
 12870657
(9 rows)

##NOT OK: Num Rows OK; but not the correct set of rows ##
yugabyte=# SELECT id FROM mam_message_test 
WHERE id < 32910593 and user_id=17 ORDER BY id DESC LIMIT 6;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 19780865
(6 rows)

##NOT OK: Num Rows OK; but not the correct set of rows ##
yugabyte=# SELECT id FROM mam_message_test 
WHERE id < 32910593 and user_id=17 ORDER BY id DESC LIMIT 7;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 21770241
 16393729
(7 rows)

##NOT OK: Num Rows OK; but not the correct set of rows ##
yugabyte=# SELECT id FROM mam_message_test 
WHERE id < 32910593 and user_id=17 ORDER BY id DESC LIMIT 8;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 21770241
 19780865
 12870657
(8 rows)

Confirm bug exists.
Even on range-based partitioning.

Thx @mkacper for reporting with clear repro/test case. We'll look into this as hi-pri.

Thank you guys for the quick reaction!

@kmuthukk I am not sure if the examples you've marked as OK are really correct.

Let's consider the example with LIMIT 6:

##OK##
yugabyte=# SELECT id FROM mam_message_test 
WHERE id < 32910593 and user_id=17 ORDER BY id DESC LIMIT 6;
    id
----------
 31265793
 29214465
 27495937
 25524993
 23683329
 19780865
(6 rows)

When we look at the results the last row should be 21770241 not 19780865. The analogical situation happens for LIMIT 7 and LIMIT 8.

@mkacper - You are correct -- I was looking only at the number of rows returned, but not the values. Will update my post inline soon.

There are 2 problems:

  1. Read RPC sent to tserver has wrong opcode - we send LESS_THAN_OR_EQUAL instead of LESS_THAN. See example RPC (part of RPC):
  max_hash_code: 28919                                                                               
  table_id: "000030a9000030008000000000004100"                                                       
  condition_expr {                                                                                   
    condition {                                                                                      
      op: QL_OP_AND                                                                                  
      operands {                                                                                     
        condition {                                                                                  
          op: QL_OP_LESS_THAN_EQUAL                                                                  
          operands {                                                                                 
            column_id: 1                                                                             
          }                                                                                          
          operands {                                                                                 
            value {                                                                                  
              int64_value: 32910593                                                                  
            }                                                                                        
          }                                                                                          
        }                                                                                            
      }                                                                                              
    }                                                                                                
  }                  

Due to this, docDB ends up sending 9 rows <= condition. Postgres then filters out the ones that match = condition and we see lesser results.
Note that if you change the condition in the where clause to be id <= 32910593, things work as expected.

  1. Another problem which contributes to above queries like LIMIT 3 returning wrong entries is that once we receive 3 entries from docdb after the first read RPC, we continue to make read RPC requests over and over again to the tservers (essentially scanning all data for that hash). In the case of LIMIT 3, the first entry from the second RPC gets picked and we see wrong results.

@mkacper Thanks for reporting the issue and the thorough description. It should be fixed by 6ae98bc (also specifically tested your examples locally).
Note: The fix should be included in our next minor release (version 2.1.3) which should be ready in about 1-2 weeks or less.

Root issue was with using both reverse-order scans (because of the ORDER BY .. DESC) and paging results. The paging was indirectly caused by the LIMIT clause due to the LESS_THAN vs LESS_THAN_EQUAL issue that @ndeodhar mentioned above.
We will do additional filtering in the query layer if needed so having a more general condition for the internal request is still _correct_, but filtering that first row meant we needed to send another internal request to fetch the full result (See the diff description for more details).

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ajcaldera1 picture ajcaldera1  路  4Comments

robertpang picture robertpang  路  3Comments

nocaway picture nocaway  路  3Comments

bmatican picture bmatican  路  3Comments

hudclark picture hudclark  路  4Comments