In YB 1.3.1.0, on Debian, it looks like columns with a default value of NOW() may, occasionally, be initialized to NULL instead. With a schema like
CREATE TABLE foo (
k TIMESTAMP DEFAULT NOW(),
v INT
) IF NOT EXISTS;
And a workload comprised of inserts (INSERT INTO foo (v) VALUES (?)), and reads (SELECT k, v FROM foo ORDER BY k), it appears that sometimes, k may wind up NULL, rather than being a timestamp. This log line shows a series of [k, v] pairs from a read; notice that it contains one row (13) with a timestamp, and one row (9) with null. 20190812T154729.000-0400.zip
2019-08-12 15:50:01,257{GMT} INFO [jepsen worker 5] yugabyte.ysql.append-table: table append11 has ([#inst "2019-08-12T19:49:57.609143000-00:00" 13] [nil 9])
Other tables may contain no null entries, or many nulls. These null values appear stable over multiple seconds. To reproduce, try Jepsen e324b33, and run
lein run test --os debian --version 1.3.1.0 -w ysql/append-table --concurrency 2n --time-limit 300
The checker doesn't verify this property, so the test failing or succeeding isn't meaningful. Instead, grep the logs for a line with a table with nil entries.
grep -e 'table append.\+ has.\+nil' store/latest/jepsen.log
Notes:
As per https://www.postgresql.org/docs/devel/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT "now() is a traditional PostgreSQL equivalent to transaction_timestamp()."
@spolitov wrote: << I suppose that null is caused by our lazy picking of transaction start time. we should change NOW implementation to pick this time, if it was not picked yet >>
I've changed PG NOW function code (timestamp.c) to log its state:
Datum
now(PG_FUNCTION_ARGS)
{
elog(LOG, " === Calling NOW() ===");
TimestampTz txnStartTs = GetCurrentTransactionStartTimestamp();
Datum result = TimestampTzGetDatum(txnStartTs);
elog(LOG, " === Start TS: %ld, result: %ld ===", txnStartTs, result);
return result;
}
When using ysqlsh, it triggers both when using original schema/query, or even simple SELECT NOW().
When tested with Jepsen, let's take a look at this log excerpt
(click to expand)
2019-08-14 15:37:32,297{GMT} INFO [jepsen worker 5] jepsen.util: 5 :invoke :txn [[:append 1 3] [:r 1 nil]]
2019-08-14 15:37:32,297{GMT} INFO [jepsen worker 0] jepsen.util: 0 :invoke :txn [[:append 0 2]]
2019-08-14 15:37:32,297{GMT} INFO [jepsen worker 4] jepsen.util: 4 :invoke :txn [[:r 1 nil] [:r 1 nil] [:r 1 nil]]
2019-08-14 15:37:32,299{GMT} INFO [jepsen worker 6] jepsen.util: 6 :invoke :txn [[:append 1 5] [:r 2 nil]]
2019-08-14 15:37:32,299{GMT} INFO [jepsen worker 1] jepsen.util: 1 :invoke :txn [[:append 1 1] [:append 1 2] [:append 0 1] [:append 2 1]]
2019-08-14 15:37:32,299{GMT} INFO [jepsen worker 9] jepsen.util: 9 :invoke :txn [[:r 2 nil] [:append 0 3]]
2019-08-14 15:37:32,299{GMT} INFO [jepsen worker 3] jepsen.util: 3 :invoke :txn [[:r 0 nil] [:append 1 4] [:append 2 2] [:append 2 3]]
2019-08-14 15:37:32,300{GMT} INFO [jepsen worker 8] jepsen.util: 8 :invoke :txn [[:append 1 6] [:r 1 nil] [:r 2 nil]]
2019-08-14 15:37:32,300{GMT} INFO [jepsen worker 2] jepsen.util: 2 :invoke :txn [[:append 2 4] [:append 1 7] [:r 0 nil] [:r 2 nil]]
2019-08-14 15:37:32,300{GMT} INFO [jepsen worker 7] jepsen.util: 7 :invoke :txn [[:r 1 nil] [:r 1 nil]]
2019-08-14 15:37:32,345{GMT} INFO [jepsen worker 4] yugabyte.ysql.append-table: Creating table append1 and retrying
2019-08-14 15:37:32,345{GMT} INFO [jepsen worker 3] yugabyte.ysql.append-table: Creating table append0 and retrying
2019-08-14 15:37:32,346{GMT} INFO [jepsen worker 9] yugabyte.ysql.append-table: Creating table append2 and retrying
2019-08-14 15:37:32,346{GMT} INFO [jepsen worker 7] yugabyte.ysql.append-table: Creating table append1 and retrying
2019-08-14 15:37:32,347{GMT} INFO [jepsen worker 1] yugabyte.ysql.append-table: Creating table append1 and retrying
2019-08-14 15:37:32,350{GMT} INFO [jepsen worker 6] yugabyte.ysql.append-table: Creating table append1 and retrying
2019-08-14 15:37:32,354{GMT} INFO [jepsen worker 8] yugabyte.ysql.append-table: Creating table append1 and retrying
2019-08-14 15:37:32,360{GMT} INFO [jepsen worker 5] yugabyte.ysql.append-table: Creating table append1 and retrying
2019-08-14 15:37:32,360{GMT} INFO [jepsen worker 2] yugabyte.ysql.append-table: Creating table append2 and retrying
2019-08-14 15:37:32,362{GMT} INFO [jepsen worker 0] yugabyte.ysql.append-table: Creating table append0 and retrying
2019-08-14 15:37:40,485{GMT} WARN [jepsen worker 7] yugabyte.ysql.append-table: Encountered error with conn "n3"; reopening
2019-08-14 15:37:40,499{GMT} WARN [jepsen worker 8] yugabyte.ysql.append-table: Encountered error with conn "n4"; reopening
2019-08-14 15:37:40,500{GMT} WARN [jepsen worker 2] yugabyte.ysql.append-table: Encountered error with conn "n3"; reopening
2019-08-14 15:37:40,527{GMT} WARN [jepsen worker 0] yugabyte.ysql.append-table: Encountered error with conn "n1"; reopening
2019-08-14 15:37:40,531{GMT} WARN [jepsen worker 1] yugabyte.ysql.append-table: Encountered error with conn "n2"; reopening
2019-08-14 15:37:40,546{GMT} WARN [jepsen worker 5] yugabyte.ysql.append-table: Encountered error with conn "n1"; reopening
2019-08-14 15:37:40,672{GMT} INFO [jepsen worker 8] jepsen.util: 8 :info :txn [[:append 1 6] [:r 1 nil] [:r 2 nil]] [:psql-exception "ERROR: duplicate key value violates unique constraint \"pg_type_typname_nsp_index\""]
2019-08-14 15:37:40,680{GMT} INFO [jepsen worker 2] jepsen.util: 2 :info :txn [[:append 2 4] [:append 1 7] [:r 0 nil] [:r 2 nil]] [:psql-exception "ERROR: duplicate key value violates unique constraint \"pg_type_typname_nsp_index\""]
2019-08-14 15:37:40,685{GMT} INFO [jepsen worker 8] jepsen.util: 18 :invoke :txn [[:r 1 nil] [:r 1 nil]]
2019-08-14 15:37:40,685{GMT} INFO [jepsen worker 7] jepsen.util: 7 :info :txn [[:r 1 nil] [:r 1 nil]] [:psql-exception "ERROR: duplicate key value violates unique constraint \"pg_type_typname_nsp_index\""]
2019-08-14 15:37:40,690{GMT} INFO [jepsen worker 7] jepsen.util: 17 :invoke :txn [[:r 2 nil]]
2019-08-14 15:37:40,690{GMT} INFO [jepsen worker 1] jepsen.util: 1 :info :txn [[:append 1 1] [:append 1 2] [:append 0 1] [:append 2 1]] [:psql-exception "ERROR: duplicate key value violates unique constraint \"pg_type_typname_nsp_index\""]
2019-08-14 15:37:40,697{GMT} INFO [jepsen worker 2] jepsen.util: 12 :invoke :txn [[:append 2 5] [:append 2 6]]
2019-08-14 15:37:40,697{GMT} INFO [jepsen worker 1] jepsen.util: 11 :invoke :txn [[:append 0 4] [:append 2 7]]
2019-08-14 15:37:40,730{GMT} INFO [jepsen worker 0] jepsen.util: 0 :info :txn [[:append 0 2]] [:psql-exception "ERROR: duplicate key value violates unique constraint \"pg_type_typname_nsp_index\""]
2019-08-14 15:37:40,734{GMT} INFO [jepsen worker 0] jepsen.util: 10 :invoke :txn [[:append 0 5]]
2019-08-14 15:37:40,757{GMT} INFO [jepsen worker 5] jepsen.util: 5 :info :txn [[:append 1 3] [:r 1 nil]] [:psql-exception "ERROR: duplicate key value violates unique constraint \"pg_type_typname_nsp_index\""]
2019-08-14 15:37:40,764{GMT} INFO [jepsen worker 5] jepsen.util: 15 :invoke :txn [[:append 1 8] [:append 0 6] [:r 0 nil] [:r 1 nil]]
2019-08-14 15:37:40,964{GMT} INFO [jepsen worker 2] jepsen.util: 12 :ok :txn [[:append 2 5] [:append 2 6]]
2019-08-14 15:37:40,964{GMT} INFO [jepsen worker 2] jepsen.util: 12 :invoke :txn [[:r 1 nil]]
2019-08-14 15:37:41,010{GMT} INFO [jepsen worker 0] jepsen.util: 10 :ok :txn [[:append 0 5]]
2019-08-14 15:37:41,010{GMT} INFO [jepsen worker 0] jepsen.util: 10 :invoke :txn [[:r 1 nil] [:r 2 nil]]
2019-08-14 15:37:41,090{GMT} INFO [jepsen worker 1] jepsen.util: 11 :ok :txn [[:append 0 4] [:append 2 7]]
2019-08-14 15:37:41,091{GMT} INFO [jepsen worker 1] jepsen.util: 11 :invoke :txn [[:append 2 8] [:r 2 nil] [:r 0 nil] [:append 0 7]]
2019-08-14 15:37:41,484{GMT} INFO [jepsen worker 7] yugabyte.ysql.append-table: table append2 has ([nil 5] [nil 6] [nil 7])
2019-08-14 15:37:41,484{GMT} INFO [jepsen worker 7] jepsen.util: 17 :ok :txn [[:r 2 [5 6 7]]]
2019-08-14 15:37:41,485{GMT} INFO [jepsen worker 7] jepsen.util: 17 :invoke :txn [[:append 1 9] [:r 1 nil] [:append 2 9] [:r 1 nil]]
2019-08-14 15:37:41,556{GMT} INFO [jepsen worker 5] yugabyte.ysql.append-table: table append0 has ([nil 6] [nil 4] [nil 5])
2019-08-14 15:37:43,273{GMT} WARN [jepsen worker 0] yugabyte.ysql.append-table: Encountered error with conn "n1"; reopening
2019-08-14 15:37:43,889{GMT} WARN [jepsen worker 3] yugabyte.ysql.append-table: Encountered error with conn "n4"; reopening
2019-08-14 15:37:43,985{GMT} WARN [jepsen worker 8] yugabyte.ysql.append-table: Encountered error with conn "n4"; reopening
2019-08-14 15:37:44,389{GMT} INFO [jepsen worker 9] yugabyte.ysql.append-table: table append2 has ([nil 5] [nil 6] [nil 7])
We can search for the first NOW() invocation in the logs:
alex@alex-debian-jessie:~/code/jepsen/yugabyte$ grep -iR '=== Start TS' store/latest/ | cut -d' ' -f2 | sort | head -n 1
15:37:43.851
Note that first append transaction returned :ok at 15:37:40,964 - meaning that NOW() wasn't invoked at all!
Logs: logs-jepsen_2019-08-14_20-40-08__append-table_c2n_t60_logs2.tar.gz
Interesting thing: we're actually writing tombstone instead of NOW
I0815 11:58:14.998617 3349 docdb.cc:1409] === Apply: SubDocKey(DocKey(0x04cf, ["B\xe8Z2mUO\xba\xb1R\xeaO\xb0\x18v\xff"], []), [SystemColumnId(0)]) = null (type = kNullLow) (int64 = 4067611415479844864), time: { physical: 1565870294982314 }, txn: 7af7b080-2dd3-4dbb-9a3b-f371dd089576, raw: 4704CF5342E85A326D554FBAB152EA4FB01876FF000021214A800D08238001C41760777966804A
I0815 11:58:15.000723 3349 docdb.cc:1409] === Apply: SubDocKey(DocKey(0x04cf, ["B\xe8Z2mUO\xba\xb1R\xeaO\xb0\x18v\xff"], []), [ColumnId(1)]) = DEL (type = kTombstone) (int64 = 232), time: { physical: 1565870294982314 }, txn: 7af7b080-2dd3-4dbb-9a3b-f371dd089576, raw: 4704CF5342E85A326D554FBAB152EA4FB01876FF000021214B810D08238001C41760777966803FAB
I0815 11:58:15.001297 3349 docdb.cc:1409] === Apply: SubDocKey(DocKey(0x04cf, ["B\xe8Z2mUO\xba\xb1R\xeaO\xb0\x18v\xff"], []), [ColumnId(2)]) = 9 (type = kInt32) (int64 = 38654705824), time: { physical: 1565870294982314 }, txn: 7af7b080-2dd3-4dbb-9a3b-f371dd089576, raw: 4704CF5342E85A326D554FBAB152EA4FB01876FF000021214B820D08238001C41760777966803F8B
@m-iancu made an assumption about the cause:
What can happen is that:
a. conn 1 creates the table (this does multiple ops internally):
- create YB table
- add
pg_class/pg_attributeentries- modify entries to set default (done as a separate step even when part of create table query itself)
b. conn 2 does an insert between step
2and3then it could cause this issue.
This appears to be correct, the following Java test reliably reproduces this:
https://gist.github.com/frozenspider/9c5719968f5acf21a89c82fdf22f37a3
Should be fixed as part of #1404
I've expanded the test which uncovered this issue into a workload which specifically demonstrates the issue and reports its presence. Note that this problem affects any default value, not just timestamps or NOW(). With Jepsen d9e6a09, try
lein run test --os debian --url https://s3-us-west-2.amazonaws.com/downloads.yugabyte.com/new/yugabyte-1.3.1.2-b1-release-centos-x86_64.tar.gz -w ysql/default-value --time-limit 300
This has been resolved and is no longer reproducible. This was confirmed by verifying the following:
Note that in order to be safe, the testing was done using 100 iterations of the Jepsen test
lein run test --os debian ... -w ysql/append-table --concurrency 2n --time-limit 300
(using Jepsen commit e324b33 and corresponding --version or --url)
Heyyyy congats!!! :)
Most helpful comment
Heyyyy congats!!! :)