Version CockroachDB CCL v1.1-beta.20170907-120-g6f7e3b0
I backed up some data from reg and restored it on sapphire and did a quick sanity check count(*) and was surprised to see 4m rows. I tried to narrow the date range where the new rows appeared and instead found pretty much any WHERE clause got rid of them, even a NULL OR NOT NULL:
> SELECT count(*) FROM updates WHERE timestamp IS NULL OR timestamp IS NOT NULL;
+----------+
| count(*) |
+----------+
| 1436341 |
+----------+
(1 row)
Time: 2.381654597s
> SELECT count(*) FROM updates;
+----------+
| count(*) |
+----------+
| 4308600 |
+----------+
(1 row)
Time: 8.987759834s
root@localhost:26257/registration>
I tried SET DISTSQL = off; but it doesn't change anything.
@vivekmenezes for routing
Sounds like it could be related to #18524, but the version number you state doesn't seem to include that PR. Could you please double check to see whether the version you were testing with includes that PR?
@dt can you retry your test with the latest changes on master including #18702
Thanks!
root@localhost:26257/> set database = registration;
SET
Time: 888.017碌s
root@localhost:26257/registration> SELECT version();
+---------------------------------------------------------------------------------------------------+
| version() |
+---------------------------------------------------------------------------------------------------+
| CockroachDB CCL v1.1-alpha.20170817-1058-gfbcd8ca (linux amd64, built 2017/09/24 15:09:04, go1.9) |
+---------------------------------------------------------------------------------------------------+
(1 row)
Time: 578.011碌s
root@localhost:26257/registration> SELECT count(*) FROM updates;
+----------+
| count(*) |
+----------+
| 4308600 |
+----------+
(1 row)
Time: 13.667848111s
root@localhost:26257/registration> SELECT count(*) FROM updates WHERE timestamp IS NULL OR timestamp IS NOT NULL;
+----------+
| count(*) |
+----------+
| 1436341 |
+----------+
(1 row)
Time: 5.45897963s
root@localhost:26257/registration>
@arjunravinarayan can you follow up with @dt on this issue. Thanks!
FWIW this isn't a distsql issue as set distsql=off doesn't change the output as @dt mentioned above. ISTM this is unrelated to #18702.
Ok, an EXPLAIN helps to reveal what's going on here. COUNT is returning different results for a scan over the timestamp index vs the primary index.
root@:26257/registration> explain SELECT count(*) FROM registration.updates where timestamp is null or timestamp is not null;
+-------+--------+-------+-------------------------------+
| Level | Type | Field | Description |
+-------+--------+-------+-------------------------------+
| 0 | group | | |
| 1 | render | | |
| 2 | scan | | |
| 2 | | table | updates@updates_timestamp_idx |
| 2 | | spans | ALL |
+-------+--------+-------+-------------------------------+
(5 rows)
root@:26257/registration> explain SELECT count(*) FROM registration.updates;
+-------+--------+-------+-----------------+
| Level | Type | Field | Description |
+-------+--------+-------+-----------------+
| 0 | group | | |
| 1 | render | | |
| 2 | scan | | |
| 2 | | table | updates@primary |
| 2 | | spans | ALL |
+-------+--------+-------+-----------------+
(5 rows)
On register itself, the index and the primary index don't match either.
root@localhost:26257/registration> SELECT count(*) FROM updates;
+----------+
| count(*) |
+----------+
| 1458683 |
+----------+
(1 row)
root@localhost:26257/registration> SELECT count(*) FROM updates@updates_timestamp_idx;
+----------+
| count(*) |
+----------+
| 1458828 |
+----------+
(1 row)
So it seems there are 2 problems here.
The original issue that @dt reported is caused by #18524. I'm not sure the cause yet but I'll be reverting it if I can't figure out a fix soon.
There's also the issue that the two indexes don't match in the source data. cc @vivekmenezes for possible cross-reference with #18533.
Going to fork this issue for the remaining investigation of the second issue - why the primary index and secondary index do not match in size.
Most helpful comment
18524 is flawed because it doesn't account for tables with multiple column families. I'll submit a fix today, along with tests for COUNT(*) on tables with more than 1 column family, which apparently did not exist before.