mysql> create table t (a int, b int, key(a));
mysql> explain select * from t where b>10 order by a limit 1;
+-----------------------------------+---------+-----------+---------------------+--------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------------+---------+-----------+---------------------+--------------------------------+
| Limit_12 | 1.00 | root | | offset:0, count:1 |
| 鈹斺攢Projection_25 | 1.00 | root | | test.t.a, test.t.b |
| 鈹斺攢IndexLookUp_24 | 1.00 | root | | |
| 鈹溾攢IndexFullScan_21(Build) | 3.00 | cop[tikv] | table:t, index:a(a) | keep order:true, stats:pseudo |
| 鈹斺攢Selection_23(Probe) | 1.00 | cop[tikv] | | gt(test.t.b, 10) |
| 鈹斺攢TableRowIDScan_22 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-----------------------------------+---------+-----------+---------------------+--------------------------------+
In the case above, we can push Limit down to the probe side to get a better plan:
Limit root
Projection root
IndexLookUp root
IndexFullScan cop
Limit cop
Selection cop
TableRowIDScan cop
@tangwz Would you like to take a look at this?
/assign
After discussion, we find the Limit cannot be pushed down to the table side in this case since rows read from the index side would be reordered before being sent to the table side.
This problem affects many of our users, so we still need to solve it.
Something about why this would not improve too much. Let us look at how the reader works currently:
{indexKey: key_0, row_id: id_0}, {indexKey: key_1, row_id: id_1}, .... {indexKey: key_n, row_id, id_n} and for each i and j, i < j <=> key_i < key_j.id_{20000*(i-1)}, ..., id_{20000*i-1}. Since here we need to return the data which keep order by index, we need to return the task's results in order.{id_0, .... id_{20000*x-1} don't satisfiy the filters in the Selection exeutor.id_0, ..., id_{20000*x-1} are still scanned. We still won't scan the rows we really want earlier than before.One more example to clarify:
select * from t where b=? order by a limit 1;, the column a is province and the column b is the city.a = 'Hebei' and b = 'Tangshan' then insert 1 million rows a = 'Hebei' and b = 'Shijiazhuang'.where b = 'Shijiazhuang' order by a limit 1;b = 'Tangshan' then we're able to scan the rows with b = 'Shijiazhuang'.b is Tangshan) out. And the limit receieves no data. It cannot help until the first 1 million rows is scanned and filtered.This is why this pushdown won't help a lot when such query executes a long time.