Tidb: push Limit down to the probe side of IndexLookUp even if there is an Order

Created on 24 Nov 2020 · 6Comments · Source: pingcap/tidb

Development Task

mysql> create table t (a int, b int, key(a));
mysql> explain select * from t where b>10 order by a limit 1;
+-----------------------------------+---------+-----------+---------------------+--------------------------------+
| id                                | estRows | task      | access object       | operator info                  |
+-----------------------------------+---------+-----------+---------------------+--------------------------------+
| Limit_12                          | 1.00    | root      |                     | offset:0, count:1              |
| └─Projection_25                   | 1.00    | root      |                     | test.t.a, test.t.b             |
|   └─IndexLookUp_24                | 1.00    | root      |                     |                                |
|     ├─IndexFullScan_21(Build)     | 3.00    | cop[tikv] | table:t, index:a(a) | keep order:true, stats:pseudo  |
|     └─Selection_23(Probe)         | 1.00    | cop[tikv] |                     | gt(test.t.b, 10)               |
|       └─TableRowIDScan_22         | 3.00    | cop[tikv] | table:t             | keep order:false, stats:pseudo |
+-----------------------------------+---------+-----------+---------------------+--------------------------------+

In the case above, we can push Limit down to the probe side to get a better plan:

Limit                           root
  Projection                    root
    IndexLookUp                 root
      IndexFullScan             cop
      Limit                     cop
        Selection               cop
           TableRowIDScan       cop

siplanner statuhelp-wanted typenhancement

Source

qw4990

All 6 comments

@tangwz Would you like to take a look at this?

qw4990 on 24 Nov 2020

/assign

tangwz on 25 Nov 2020

👍1

After discussion, we find the Limit cannot be pushed down to the table side in this case since rows read from the index side would be reordered before being sent to the table side.

qw4990 on 1 Dec 2020

This problem affects many of our users, so we still need to solve it.

qw4990 on 22 Dec 2020

Something about why this would not improve too much. Let us look at how the reader works currently:

We scan the index by its order.
We get the row_id which is ordered by index.
And for simplification, we say that the the rows we scan is {indexKey: key_0, row_id: id_0}, {indexKey: key_1, row_id: id_1}, .... {indexKey: key_n, row_id, id_n} and for each i and j, i < j <=> key_i < key_j.
We first split them to many tasks, each one contains 20000 rows by default. Thus the ith task contains id_{20000*(i-1)}, ..., id_{20000*i-1}. Since here we need to return the data which keep order by index, we need to return the task's results in order.
This SQL has run 1 min and doesn't get it result yet. => At least the first x tasks returns no rows. => The rows with id in {id_0, .... id_{20000*x-1} don't satisfiy the filters in the Selection exeutor.
Whatever we push a limit or top-n to the table side or not with correctness satisfied, it won't change that the first x tasks must be executed and the rows with id in id_0, ..., id_{20000*x-1} are still scanned. We still won't scan the rows we really want earlier than before.

winoros on 8 Jan 2021

👀2

One more example to clarify:

Suppose that in the query select * from t where b=? order by a limit 1;, the column a is province and the column b is the city.
We first inserted 1 million rows with a = 'Hebei' and b = 'Tangshan' then insert 1 million rows a = 'Hebei' and b = 'Shijiazhuang'.
The query here is where b = 'Shijiazhuang' order by a limit 1;
Since the rows are ordered by column a, we must first scan 1 million rows b = 'Tangshan' then we're able to scan the rows with b = 'Shijiazhuang'.
Whatever we push the limit down to the table scan side or not. The selection will filter the first 1 million rows(whose b is Tangshan) out. And the limit receieves no data. It cannot help until the first 1 million rows is scanned and filtered.