A user is running into this crash intermittently:
Core was generated by `/home/yugabyte/tserver/bin/yb-tserver --flagfile /home/yugabyte/tserver/conf/se'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f73a455f28f in _Alloc_hider (__a=..., __dat=<optimized out>, this=<optimized out>)
at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h:109
109 /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h: No such file or directory.
(gdb) where
#0 0x00007f73a455f28f in _Alloc_hider (__a=..., __dat=<optimized out>, this=<optimized out>)
at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h:109
#1 basic_string (__str=..., this=0x453d688)
at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h:400
#2 YBTableName (this=0x453d688) at ../../src/yb/client/yb_table_name.h:40
#3 yb::ql::PreparedResult::PreparedResult (this=0x453d680, stmt=...) at ../../src/yb/yql/cql/ql/util/statement_result.cc:125
#4 0x00007f73a6ee1f0d in yb::ql::Statement::Prepare (this=<optimized out>, processor=processor@entry=0x609ea00, mem_tracker=...,
internal=internal@entry=false, result=result@entry=0x7f7341bc4350) at ../../src/yb/yql/cql/ql/statement.cc:61
#5 0x00007f73a7bcf4d1 in yb::cqlserver::CQLProcessor::ProcessRequest (this=0x609ea00, req=...) at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:299
#6 0x00007f73a7bd12e5 in yb::cqlserver::CQLProcessor::ProcessRequest (this=this@entry=0x609ea00, req=...)
at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:225
#7 0x00007f73a7bd1656 in yb::cqlserver::CQLProcessor::ProcessCall (this=this@entry=0x609ea00, call=...)
at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:176
#8 0x00007f73a7bea37e in yb::cqlserver::CQLServiceImpl::Handle (this=0x4aa8210, inbound_call=...) at ../../src/yb/yql/cql/cqlserver/cql_service.cc:142
#9 0x00007f739fa6cd49 in yb::rpc::ServicePoolImpl::Handle (this=0x37edd40, incoming=...) at ../../src/yb/rpc/service_pool.cc:262
#10 0x00007f739fa10fb4 in yb::rpc::InboundCall::InboundCallTask::Run (this=<optimized out>) at ../../src/yb/rpc/inbound_call.cc:212
#11 0x00007f739fa78998 in yb::rpc::(anonymous namespace)::Worker::Execute (this=<optimized out>) at ../../src/yb/rpc/thread_pool.cc:99
#12 0x00007f739e0435ff in operator() (this=0x4d0fc78)
at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#13 yb::Thread::SuperviseThread (arg=0x4d0fc20) at ../../src/yb/util/thread.cc:739
#14 0x00007f7398b02694 in start_thread (arg=0x7f7341bc5700) at pthread_create.c:333
#15 0x00007f739823f41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
They are using the YCQL Go driver (gocql) with yugabyte-2.0.11.0-b8-centos-x86_64.
I looked into the stack, and the root cause appears to be that stmt.bind_table() is null in PreparedResult::PreparedResult (statement_result.cc:125), so copying the table name (frames #1 and #2) dereferences a null pointer.
I checked the table_ member of the statement class (pt_dml.h) in the core, and it is indeed null:
is_system_ = true,
table_ = {
  <std::__shared_ptr<yb::client::YBTable, (__gnu_cxx::_Lock_policy)2>> = {
    _M_ptr = 0x0,
    _M_refcount = {
      _M_pi = 0x0
    }
  }, <No data fields>},
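To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the pattern in frames #2 and #3: a prepared-result constructor copies the table name through the statement's table pointer without checking it, while a guarded variant reports an error instead. The types Table, DmlStatement and PreparedResult, the functions MakePreparedResultUnsafe and MakePreparedResult, and the error text are hypothetical stand-ins, not the actual YugabyteDB classes or messages.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>

// Hypothetical stand-in for yb::client::YBTable (NOT the real type).
struct Table {
  std::string name;
};

// Hypothetical stand-in for the parse-tree DML node in pt_dml.h.
struct DmlStatement {
  bool is_system = false;
  std::shared_ptr<Table> table;  // stays empty when the table was never resolved
  const std::shared_ptr<Table>& bind_table() const { return table; }
};

// Hypothetical stand-in for the prepared result.
struct PreparedResult {
  std::string table_name;
};

// Unsafe pattern, shown only for contrast: it dereferences the table pointer
// unconditionally, so an empty shared_ptr is a null dereference (SIGSEGV).
// Never call this with an unresolved table.
PreparedResult MakePreparedResultUnsafe(const DmlStatement& stmt) {
  return PreparedResult{stmt.bind_table()->name};
}

// Guarded variant: checks the pointer first and reports an error
// instead of crashing the server.
std::optional<PreparedResult> MakePreparedResult(const DmlStatement& stmt,
                                                 std::string* error) {
  if (!stmt.bind_table()) {
    *error = "table not found";
    return std::nullopt;
  }
  return PreparedResult{stmt.bind_table()->name};
}
```

The guard turns a process-wide crash into a per-request error, which is the kind of safety net issue 2 below asks for; where exactly such a check belongs in the real code path is not established here.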
Interestingly, this is a system table (is_system_ = true, the first line in the snippet above). Correlating with the logs, this looks like the problematic query:
SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces WHERE keyspace_name = ?
Testing locally, I get:
-- system.schema_keyspaces without bind vars
cqlsh> SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces;
'schema_keyspaces' not found in keyspace 'system'
cqlsh> SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces where keyspace_name = 'a';
'schema_keyspaces' not found in keyspace 'system'
-- system.schema_keyspaces with a bind var (the suspicious query from the logs)
cqlsh> SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces where keyspace_name = ?;
(0 rows)
-- bind vars on an actually supported system table (system.peers)
cqlsh> select * from system.peers where peer = ?;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid Arguments. Bind variable at position 1 not found
select * from system.peers where peer = ?;
^^^^^^
(ql error -304)"
(0 rows)
So system.schema_keyspaces is an unsupported system table, but we seem to skip some basic checks when the query uses bind variables.
I didn't test preparing such a statement and then executing it, but presumably the prepare-time analysis (wrongly) succeeds and the null table pointer is then dereferenced; the backtrace above shows this happening inside Statement::Prepare, while the PreparedResult is being built.
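The cqlsh results suggest that table resolution is skipped when bind variables are present. Below is a minimal C++ sketch of a prepare-time check whose outcome does not depend on bind variables. ValidateSystemTable, PrepareOutcome and kKnownSystemTables are hypothetical; the table list is an illustrative subset, not YugabyteDB's actual set of supported system tables. Only the error message format is taken from the cqlsh output above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Illustrative subset only; NOT YugabyteDB's real list of system tables.
const std::set<std::pair<std::string, std::string>> kKnownSystemTables = {
    {"system", "local"},
    {"system", "peers"},
    {"system_schema", "keyspaces"},
    {"system_schema", "tables"},
};

struct PrepareOutcome {
  bool ok;
  std::string message;
};

// Hypothetical check for SELECTs on system keyspaces. Table resolution runs
// before, and independently of, bind-variable handling, so adding
// "WHERE keyspace_name = ?" cannot bypass it.
PrepareOutcome ValidateSystemTable(const std::string& keyspace,
                                   const std::string& table,
                                   int num_bind_vars) {
  if (kKnownSystemTables.count({keyspace, table}) == 0) {
    return {false, "'" + table + "' not found in keyspace '" + keyspace + "'"};
  }
  // Bind-variable analysis would only start here, after the table is known.
  (void)num_bind_vars;
  return {true, ""};
}
```

With this ordering, the plain and the bind-variable forms of the system.schema_keyspaces query fail with the same error instead of diverging.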
So there are two issues:
1) The driver is querying system.schema_keyspaces, the schema table in older versions of Cassandra, instead of system_schema.keyspaces (the table used in newer versions of Cassandra and in YugabyteDB YCQL).
The user provided this additional observation:
If the host reported the wrong CQL version (https://github.com/yugabyte/gocql/blob/master/session.go#L245), the gocql client would fall back to querying that old system table with a bind var (https://github.com/yugabyte/gocql/blob/master/metadata.go#L513).
2) Even if the driver does query a nonexistent system table, YugabyteDB should not crash; the server needs to handle such a query gracefully.
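The driver-side fallback in issue 1 amounts to choosing a schema query from the version the host reports. The sketch below models that decision in C++ (gocql itself is written in Go). The function name KeyspaceMetadataQuery, the major-version threshold of 3, and the exact text of the newer query are assumptions; the legacy query is the one seen in the logs. The linked session.go and metadata.go lines hold the driver's actual logic.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the gocql fallback: pick the keyspace-metadata query
// from the major version the host reports. The ">= 3" threshold and the
// newer query text are assumptions for illustration.
std::string KeyspaceMetadataQuery(int reported_major_version) {
  if (reported_major_version >= 3) {
    return "SELECT durable_writes, replication FROM system_schema.keyspaces "
           "WHERE keyspace_name = ?";
  }
  // Legacy path: the exact statement observed in the tserver logs.
  return "SELECT durable_writes, strategy_class, strategy_options "
         "FROM system.schema_keyspaces WHERE keyspace_name = ?";
}
```

Under this model, a host that misreports its version below the threshold makes the driver send the legacy query, which is the statement that reached the crashing prepare path.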
For issue 1, @rajukumaryb landed https://github.com/yugabyte/gocql/commit/df05d34da67ba698f6a8caedebed8d4b2bff5641 in the gocql driver, which should keep the driver from issuing the problematic query.
For issue 2, the extra safety net on the yb-tserver side to handle the null pointer (the table pointer in the statement class in pt_dml.h being null) is still TBD.
The null pointer crash is fixed by the commit above.
For issue 1, @rajukumaryb landed yugabyte/gocql@df05d34 in the gocql driver.
So, I'm closing the issue.