10.0.17134 Windows 10 Pro (VM)
MacOS 10.13.6 (17G4015)
3.3.1 and 2.11.2
Add a LOG(INFO) statement to the virtual table you want to test and recompile (e.g. LOG(INFO) << "genProcesses()" in processes.cpp).
SELECT count(1) FROM processes WHERE path IN ('a','b','c','d','e','f','g','h')
The virtual table should be queried once, regardless of how many items are in the list. I should only see one "genProcesses()" log line.
osquery> SELECT count(1) FROM processes WHERE path IN ('a','b','c','d','e','f','g','h');
I0121 13:09:52.299521 2920076160 processes.cpp:473] genProcesses
I0121 13:09:52.309139 2920076160 processes.cpp:473] genProcesses
I0121 13:09:52.317179 2920076160 processes.cpp:473] genProcesses
I0121 13:09:52.323804 2920076160 processes.cpp:473] genProcesses
I0121 13:09:52.329715 2920076160 processes.cpp:473] genProcesses
I0121 13:09:52.335345 2920076160 processes.cpp:473] genProcesses
I0121 13:09:52.341004 2920076160 processes.cpp:473] genProcesses
I0121 13:09:52.346843 2920076160 processes.cpp:473] genProcesses
I see one "genProcesses()" line for every item in the list (8 times in the example above).
As you can imagine, frequent queries on processes looking for specific paths can add significant load to the system. Especially the old WMI-based processes table.
Wow, this looks like a great finding. Thanks for testing it on multiple platforms/osquery versions. Let me look into that. (This is even weirder, because for processes table cacheable=True)
btw. Have you had chance to check any other table?
UPD UPD:
So what happens internally, is that sqlite created 8 queries
select path from processes where path = 'a'
select path from processes where path = 'b'
and so on ...
UPD : I also tried with the preferences table on MACOS, current head of osquery. There is the same issue. Looks like we have to look into the sqlite implementation.
The problem is in our virtual_table.cpp::xBestIndex heuristic. It is biased to more constrained, queries. We definitely need to improve it.
According to the following link, xFilter will be called multiple times, once for each value in the RHS
of the IN operator. And xFilter is where we are scanning rows by calling vtable->generate(). So seems like a bit more work than tweaking xFilter, unfortunately.
http://sqlite.1065341.n5.nabble.com/xBestIndex-XFilter-and-virtual-tables-td79955.html
Oh. That sounds worse. Looks like someone has to patch sqlite.
For note: I traced xBestIndex and it's definitely so :(
I should note that this happens with joins as well. Running SELECT pid, path, authority FROM processes LEFT JOIN signature USING (path) on MacOS, it will scan the signature table once for each process path.
I assume, it happens on any other platform. That sounds worse than the problem with IN.
In case if someone does not want to collaborate with the sqlite community, I'll try to deal with it when I have time.
I wouldn't say we need to alter sqlite code itself. It's in the way osquery handles xBestIndex, xFilter, xNext, xEOF. The xFilter should be used to gather constraints only, not scan the tables.
Actually, I think you are right that we need to modify sqlite to get constraints sent together to xFilter. Oy.
I had some time to play with it. Looks like there is the easy fix.
When I removed lines, SQLite worked as we wanted to.
All is left, is to come up with proper if condition, when to exclude useless constraints.
Interesting. It looks like that will result in a single xFilter call, but with no constraints, so there's no way for virtual table to filter. As you can see from the sqlite instructions below (instructions 5-11 are a loop). Which is still better than calling xFilter list_size times. I'm looking at the sqlite spaghetti code to see how we can pass a full constraint list.
EXPLAIN SELECT pid,path FROM processes WHERE path IN ('/bin/bash','/bin/cat')
+------+-------------+----+----+----+-------------------+----+---------+
| addr | opcode | p1 | p2 | p3 | p4 | p5 | comment |
+------+-------------+----+----+----+-------------------+----+---------+
| 0 | Init | 0 | 13 | 0 | | 00 | |
| 1 | VOpen | 0 | 0 | 0 | vtab:7FD6EC67AC00 | 00 | |
| 2 | Integer | 0 | 1 | 0 | | 00 | |
| 3 | Integer | 0 | 2 | 0 | | 00 | |
| 4 | VFilter | 0 | 12 | 1 | | 00 | |
| 5 | VColumn | 0 | 2 | 3 | | 00 | |
| 6 | Eq | 3 | 8 | 4 | (BINARY) | 42 | |
| 7 | Ne | 3 | 11 | 5 | (BINARY) | 52 | |
| 8 | VColumn | 0 | 0 | 6 | | 00 | |
| 9 | VColumn | 0 | 2 | 7 | | 00 | |
| 10 | ResultRow | 6 | 2 | 0 | | 00 | |
| 11 | VNext | 0 | 5 | 0 | | 00 | |
| 12 | Halt | 0 | 0 | 0 | | 00 | |
| 13 | Transaction | 0 | 0 | 2 | 0 | 01 | |
| 14 | String8 | 0 | 4 | 0 | /bin/bash | 00 | |
| 15 | String8 | 0 | 5 | 0 | /bin/cat | 00 | |
| 16 | Goto | 0 | 1 | 0 | | 00 | |
+------+-------------+----+----+----+-------------------+----+---------+
All the tables I checked so far, either can not filter things at all, or can filter things using EQUAL. Do you think, it makes a lot of difference if we provide all the EQUAL constraints together or separately?
So I took a step back and now better understand the situation. If xBestIndex indicates use of an index for a column, sqlite will invoke xFilter for each constraint. If xBestIndex does not indicate a column as an index, sqlite will call xFilter once without any constraints and do the filtering afterwards. So the fix is this: virtual_table changes.
NOTE: The problem remains, that if a user attempts to JOIN on a column that isn't indexed, it will result in xFilter being N times without any constraints.
Yup, that's what I was aiming to.
There are few more cases, when constraints need to be required.
ColumnOptions
INDEX, REQUIRED, ADDITIONAL, OPTIMIZED.
But I did not know about the last part. It looks like a deal breaker :(
I see why SQLite does not want to do the merging job there. It looks hard to implement. So they just pushed logic to the vtab side.
The Worst scenario, which I can not explain at all is.
select count(1) from chrome_extensions join users on users.gid=chrome_extensions.uid;
It executed both chrome_extensions and users tables len(users) times. :(
More understandable scenario happened with
select count(1) from chrome_extensions join users on users.uid=chrome_extensions.uid;
chrome_extensions once, and users len(chrome_extensions) times
UPD:
So, query
select count(1) from chrome_extensions join users on users.gid=chrome_extensions.uid;
acts so weirdly because of the xBestIndex (It's resolvable). Only hard problem we have left, is what you found out.
With the xBestIndex change, and using LEFT JOIN for your chrome extensions, it works as expected, since the columns are not indexed. It looks like the optimizer will swap users and chrome_extensions unless you specify LEFT.
FYI @guliashvili , I use the following change for line, along with #5422 fix:
if (index_used) {
pIdxInfo->aConstraintUsage[i].argvIndex = static_cast<int>(++expr_index);
}
@theopolis Can I close this? I think indexes are fixed now, right?
Yeap!