Yugabyte-db: [YCQL] signal 11: seg fault: yb::ql::PreparedResult::PreparedResult

Created on 23 Jan 2020 · 4 Comments · Source: yugabyte/yugabyte-db

A user is running into this crash intermittently:

Core was generated by `/home/yugabyte/tserver/bin/yb-tserver --flagfile /home/yugabyte/tserver/conf/se'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f73a455f28f in _Alloc_hider (__a=..., __dat=<optimized out>, this=<optimized out>)
    at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h:109
109 /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h: No such file or directory.
(gdb) where
#0  0x00007f73a455f28f in _Alloc_hider (__a=..., __dat=<optimized out>, this=<optimized out>)
    at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h:109
#1  basic_string (__str=..., this=0x453d688)
    at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/basic_string.h:400
#2  YBTableName (this=0x453d688) at ../../src/yb/client/yb_table_name.h:40
#3  yb::ql::PreparedResult::PreparedResult (this=0x453d680, stmt=...) at ../../src/yb/yql/cql/ql/util/statement_result.cc:125
#4  0x00007f73a6ee1f0d in yb::ql::Statement::Prepare (this=<optimized out>, processor=processor@entry=0x609ea00, mem_tracker=...,
    internal=internal@entry=false, result=result@entry=0x7f7341bc4350) at ../../src/yb/yql/cql/ql/statement.cc:61
#5  0x00007f73a7bcf4d1 in yb::cqlserver::CQLProcessor::ProcessRequest (this=0x609ea00, req=...) at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:299
#6  0x00007f73a7bd12e5 in yb::cqlserver::CQLProcessor::ProcessRequest (this=this@entry=0x609ea00, req=...)
    at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:225
#7  0x00007f73a7bd1656 in yb::cqlserver::CQLProcessor::ProcessCall (this=this@entry=0x609ea00, call=...)
    at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:176
#8  0x00007f73a7bea37e in yb::cqlserver::CQLServiceImpl::Handle (this=0x4aa8210, inbound_call=...) at ../../src/yb/yql/cql/cqlserver/cql_service.cc:142
#9  0x00007f739fa6cd49 in yb::rpc::ServicePoolImpl::Handle (this=0x37edd40, incoming=...) at ../../src/yb/rpc/service_pool.cc:262
#10 0x00007f739fa10fb4 in yb::rpc::InboundCall::InboundCallTask::Run (this=<optimized out>) at ../../src/yb/rpc/inbound_call.cc:212
#11 0x00007f739fa78998 in yb::rpc::(anonymous namespace)::Worker::Execute (this=<optimized out>) at ../../src/yb/rpc/thread_pool.cc:99
#12 0x00007f739e0435ff in operator() (this=0x4d0fc78)
    at /home/yugabyte/yb-software/yugabyte-2.0.11.0-b8-centos-x86_64/linuxbrew-xxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#13 yb::Thread::SuperviseThread (arg=0x4d0fc20) at ../../src/yb/util/thread.cc:739
#14 0x00007f7398b02694 in start_thread (arg=0x7f7341bc5700) at pthread_create.c:333
#15 0x00007f739823f41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

They are using the YCQL Go driver with yugabyte-2.0.11.0-b8-centos-x86_64.

priority/high

All 4 comments

Looking at the stack, the root cause appears to be that stmt.bind_table() is null in PreparedResult::PreparedResult (statement_result.cc).
I checked the table_name_ variable in the statement class (pt_dml.h), and it is indeed null.

  is_system_ = true,
  table_ = {
    <std::__shared_ptr<yb::client::YBTable, (__gnu_cxx::_Lock_policy)2>> = {
      _M_ptr = 0x0,
      _M_refcount = {
        _M_pi = 0x0
      }
    }, <No data fields>},

Interestingly, this is a system table (first line in snippet above). Correlating with the logs it looks like this might be the problematic query:

SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces WHERE keyspace_name = ?

Testing locally I get:

-- system.schema_keyspaces without bind vars
cqlsh> SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces;
'schema_keyspaces' not found in keyspace 'system'
cqlsh> SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces where keyspace_name = 'a';
'schema_keyspaces' not found in keyspace 'system'

-- system.schema_keyspaces with bind vars (suspicious query from logs)
cqlsh> SELECT durable_writes, strategy_class, strategy_options FROM system.schema_keyspaces where keyspace_name = ?;

(0 rows)
-- bind vars on an actually supported system table (system.peers)
cqlsh> select * from system.peers where peer = ?;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid Arguments. Bind variable at position 1 not found
select * from system.peers where peer = ?;
^^^^^^
 (ql error -304)"

(0 rows)

So it looks like this is an unsupported system table but somehow we are skipping some basic checks when using bind variables.

I didn't test what happens if we actually execute a statement prepared as above, but presumably "prepare" (wrongly) goes through and then execute hits this bug.
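The behavior one would expect here can be sketched as follows: table resolution happens first, and a missing table fails the prepare step whether or not the WHERE clause contains bind markers. This is only an illustration under stated assumptions. `Table`, `Catalog`, `PrepareOutcome`, and `PrepareSelect` are hypothetical names invented for this sketch and are not YugabyteDB types or functions.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a resolved table; not a YugabyteDB type.
struct Table {
  std::string keyspace;
  std::string name;
};

// Hypothetical catalog keyed by "keyspace.table".
using Catalog = std::map<std::string, std::shared_ptr<Table>>;

// Hypothetical result of preparing a statement.
struct PrepareOutcome {
  bool ok;
  std::string error;             // populated when ok == false
  std::shared_ptr<Table> table;  // non-null when ok == true
};

// Resolve the table before accepting the statement. Bind markers in the
// WHERE clause play no part in this check, so "keyspace_name = ?" and
// "keyspace_name = 'a'" are rejected the same way for an unknown table.
PrepareOutcome PrepareSelect(const Catalog& catalog,
                             const std::string& keyspace,
                             const std::string& table_name) {
  auto it = catalog.find(keyspace + "." + table_name);
  if (it == catalog.end() || it->second == nullptr) {
    return {false,
            "'" + table_name + "' not found in keyspace '" + keyspace + "'",
            nullptr};
  }
  return {true, "", it->second};
}
```

With a catalog that contains only `system.peers`, preparing against `system.schema_keyspaces` yields an error outcome instead of a prepared statement that carries a null table.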

So two issues:

1) The driver is making calls to system.schema_keyspaces, the system table in older versions of Cassandra, instead of system_schema.keyspaces (the system table in newer versions of Cassandra and in YugabyteDB YCQL).

User provided this additional observation:

If the host reported the wrong cql version -->
https://github.com/yugabyte/gocql/blob/master/session.go#L245
then the gocql client would fall back to using that old system table with a bind var -->
https://github.com/yugabyte/gocql/blob/master/metadata.go#L513

2) Even if the driver does issue a query to a nonexistent system table, we (YugabyteDB) shouldn't crash. We need to handle that gracefully.

For issue 1, @rajukumaryb landed https://github.com/yugabyte/gocql/commit/df05d34da67ba698f6a8caedebed8d4b2bff5641 in the gocql driver and that should help avoid the issue.

Issue 2: the extra safety net on the yb-tserver side to handle the NULL pointer issue (the table_name_ variable in the statement class (pt_dml.h) being null) is still TBD.
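A safety net of this kind could look roughly like the sketch below: check the shared pointer before copying the table name out of it, and report an invalid result instead of dereferencing null. This is not the actual YugabyteDB fix. `TableName`, `Statement`, `PreparedMetadata`, and `BuildPreparedMetadata` are hypothetical names used only for illustration.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins; the real classes live in the YugabyteDB source
// and are not reproduced here.
struct TableName {
  std::string keyspace;
  std::string table;
};

struct Statement {
  // Null when analysis did not resolve a table, as in the core dump.
  std::shared_ptr<TableName> bind_table;
};

struct PreparedMetadata {
  bool valid = false;    // false means "no table bound, report an error"
  TableName table_name;  // only meaningful when valid == true
};

// Copy the table name only after checking the pointer. Copying through a
// null shared_ptr is the kind of access that faults in the string copy.
PreparedMetadata BuildPreparedMetadata(const Statement& stmt) {
  PreparedMetadata result;
  if (!stmt.bind_table) {
    return result;  // caller turns this into an error response
  }
  result.table_name = *stmt.bind_table;
  result.valid = true;
  return result;
}
```

The caller would turn `valid == false` into an "invalid query" style error sent back to the client, which keeps the server process alive.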

The null pointer crash is fixed by the commit above.

For issue 1, @rajukumaryb landed yugabyte/gocql@df05d34 in the gocql driver.

So, I'm closing the issue.
