Yugabyte-db: [YSCQL] Querying Transaction Enabled YCL table using Rust Driver v1.8 results Seg Fault: SQL Error: Execution Error. Timed out waiting for read time

Created on 7 Aug 2019  Â·  9Comments  Â·  Source: yugabyte/yugabyte-db

This is my setup

sudo $HOME/yugabyte-1.3.1.0/bin/yb-master \

  --master_addresses 139.99.221.130:7100 \

  --rpc_bind_addresses 139.99.221.130 \

  --fs_data_dirs "/home/ubuntu/disk1,/home/ubuntu/disk2" \

  >& /home/ubuntu/disk1/yb-master.out \

  --replication_factor=1 &
sudo $HOME/yugabyte-1.3.1.0/bin/yb-tserver \
  --tserver_master_addrs 139.99.221.130:7100 \
  --rpc_bind_addresses 139.99.221.130 \
  --start_pgsql_proxy \
  --pgsql_proxy_bind_address=139.99.221.130:5433 \
  --cql_proxy_bind_address=139.99.221.130:9042 \
  --fs_data_dirs "/home/ubuntu/disk1,/home/ubuntu/disk2" \
  >& /home/ubuntu/disk1/yb-tserver.out &

When I run SELECT * FROM articles WHERE key="key_id" LIMIT 1 using Cassandra driver this works fine.

But when I run SELECT * FROM articles LIMIT 1

get Seg fault from the server Logs can be found here 139.99.221.130:7100

Some error messages

Timed out (yb/tserver/tablet_service.cc:980): Failed Read(tablet: 71b4d17c06e949d7a6ea653277b7deb2, num_ops: 1, num_attempts: 47, txn: 0000000
E0806 12:00:28.461766  2095 process_context.cc:180] SQL Error: Execution Error. Timed out waiting for read time
SELECT * FROM rekisa.articles WHERE store_id = ? LIMIT 6 OFFSET 0
^^^^^^
W0806 12:00:28.462041  2097 cql_rpc.cc:268] CQL Call from 120.29.52.204:54710, stream id: 10492 took 60001ms. Details:
W0806 12:00:28.462188  2097 cql_rpc.cc:271] cql_details {
  type: "QUERY"
  call_details {
    sql_string: "SELECT * FROM rekisa.articles WHERE store_id = ? LIMIT 6 OFFSET 0  "
  }
}
Trace:
W0806 12:00:28.462211  2097 cql_rpc.cc:272]
aredocdb communitrequest kinbug

All 9 comments

@davidoabile provided access to his test environment. Here's the stack trace from the core file:

(gdb) where
#0  0x00007fc6bd284c63 in tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) () from /home/centos/yugabyte-1.3.1.0/lib/yb-thirdparty/libtcmalloc.so.4
#1  0x00007fc6bd284f5c in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /home/centos/yugabyte-1.3.1.0/lib/yb-thirdparty/libtcmalloc.so.4
#2  0x00007fc6bd285009 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /home/centos/yugabyte-1.3.1.0/lib/yb-thirdparty/libtcmalloc.so.4
#3  0x00007fc6bd288967 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) () from /home/centos/yugabyte-1.3.1.0/lib/yb-thirdparty/libtcmalloc.so.4
#4  0x00007fc6b922c0cd in google::protobuf::TextFormat::Printer::SetUseUtf8StringEscaping(bool) () from /home/centos/yugabyte-1.3.1.0/lib/yb-thirdparty/libprotobuf.so.15
#5  0x00007fc6b922dac2 in google::protobuf::TextFormat::Printer::Printer() () from /home/centos/yugabyte-1.3.1.0/lib/yb-thirdparty/libprotobuf.so.15
#6  0x00007fc6b922f59f in google::protobuf::Message::DebugString() const () from /home/centos/yugabyte-1.3.1.0/lib/yb-thirdparty/libprotobuf.so.15
#7  0x00007fc6c6f4bc6b in yb::cqlserver::CQLInboundCall::LogTrace (this=0x2361410) at ../../src/yb/yql/cql/cqlserver/cql_rpc.cc:271
#8  0x00007fc6bf3d9699 in yb::rpc::InboundCall::QueueResponse (this=this@entry=0x2361410, is_success=is_success@entry=true) at ../../src/yb/rpc/inbound_call.cc:178
#9  0x00007fc6c6f47d81 in yb::cqlserver::CQLInboundCall::RespondSuccess (this=0x2361410, buffer=..., metrics=...) at ../../src/yb/yql/cql/cqlserver/cql_rpc.cc:189
#10 0x00007fc6c6f3d6aa in yb::cqlserver::CQLProcessor::SendResponse (this=this@entry=0x286a000, response=...) at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:189
#11 0x00007fc6c6f3e5cd in yb::cqlserver::CQLProcessor::StatementExecuted (this=0x286a000, s=..., result=...) at ../../src/yb/yql/cql/cqlserver/cql_processor.cc:387
#12 0x00007fc6c62e89d2 in Run (a2=..., a1=..., this=0x7fc69d4e8480) at ../../src/yb/gutil/callback.h:494
#13 yb::ql::QLProcessor::RunAsyncDone(std::string const&, yb::ql::StatementParameters const&, yb::ql::ParseTree const*, yb::Callback<void (yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&)>, yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&) (this=<optimized out>, stmt=..., params=..., parse_tree=<optimized out>, cb=..., s=..., result=...) at ../../src/yb/yql/cql/ql/ql_processor.cc:386
#14 0x00007fc6c62e8ccf in Run (a6=..., a5=..., a4=..., a3=<synthetic pointer>, a2=..., a1=..., object=<optimized out>, this=<synthetic pointer>) at ../../src/yb/gutil/bind_internal.h:545
#15 MakeItSo (a7=..., a6=..., a5=..., a4=<optimized out>, a3=..., a2=..., a1=<optimized out>, runnable=...) at ../../src/yb/gutil/bind_internal.h:1005
#16 yb::internal::Invoker<5, yb::internal::BindState<yb::internal::RunnableAdapter<void (yb::ql::QLProcessor::*)(std::string const&, yb::ql::StatementParameters const&, yb::ql::ParseTree const*, yb::Callback<void (yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&)>, yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&)>, void (yb::ql::QLProcessor*, std::string const&, yb::ql::StatementParameters const&, yb::ql::ParseTree const*, yb::Callback<void (yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&)>, yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&), void (yb::internal::UnretainedWrapper<yb::ql::QLProcessor>, yb::internal::ConstRefWrapper<std::string>, yb::internal::ConstRefWrapper<yb::ql::StatementParameters>, yb::internal::OwnedWrapper<yb::ql::ParseTree const>, yb::Callback<void (yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&)>)>, void (yb::ql::QLProcessor*, std::string const&, yb::ql::StatementParameters const&, yb::ql::ParseTree const*, yb::Callback<void (yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&)>, yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&)>::Run(yb::internal::BindStateBase*, yb::Status const&, std::shared_ptr<yb::ql::ExecutedResult> const&) (base=<optimized out>, x6=..., x7=...) at ../../src/yb/gutil/bind_internal.h:2323
#17 0x00007fc6c5e8f5db in Run (a2=..., a1=..., this=0x7fc69d4e8500) at ../../src/yb/gutil/callback.h:494
#18 yb::ql::Executor::StatementExecuted (this=this@entry=0x286a478, s=...) at ../../src/yb/yql/cql/ql/exec/executor.cc:2184
#19 0x00007fc6c5e95440 in yb::ql::Executor::ProcessAsyncResults (this=0x286a478, rescheduled=<optimized out>) at ../../src/yb/yql/cql/ql/exec/executor.cc:1524
#20 0x00007fc6bf437788 in yb::rpc::(anonymous namespace)::Worker::Execute (this=<optimized out>) at ../../src/yb/rpc/thread_pool.cc:100
#21 0x00007fc6bd825849 in operator() (this=0x11f7d08) at /home/centos/yugabyte-1.3.1.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#22 yb::Thread::SuperviseThread (arg=<optimized out>) at ../../src/yb/util/thread.cc:690
#23 0x00007fc6b853f694 in start_thread (arg=0x7fc69d4e9700) at pthread_create.c:333
#24 0x00007fc6b7c7c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Per @davidoabile -

queries like:

SELECT COUNT(id) as row FROM articles

runs fine on in cqlsh, but fails/crashes from their Rust program.

He mentioned they are using
Rust version 1.18
from https://github.com/AlexPikalov/cdrs

Perhaps the above stack trace is unrelated to crash.

When @davidoabile ran the workload again, the corresponding core file generated had this backtrace this time:

Core was generated by `/home/centos/yugabyte-1.3.1.0/bin/yb-tserver --tserver_master_addrs 139.99.221.'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000005075b00 in ?? ()
(gdb) where
#0  0x0000000005075b00 in ?? ()
#1  0x00007f1ce5a4ad6e in MessageSize (value=...) at /n/jenkins/thirdparty/yugabyte-thirdparty-2019-04-29T16_41_21/thirdparty/installed/uninstrumented/include/google/protobuf/wire_format_lite_inl.h:1020
#2  yb::tserver::ReadResponsePB::ByteSizeLong (this=0x50a9d88) at src/yb/tserver/tserver.pb.cc:5951
#3  0x00007f1ce5571c6e in ByteSize (this=<optimized out>) at /n/jenkins/thirdparty/yugabyte-thirdparty-2019-04-29T16_41_21/thirdparty/installed/uninstrumented/include/google/protobuf/message_lite.h:346
#4  yb::rpc::RpcContext::RespondSuccess (this=this@entry=0x7f1cb6603938) at ../../src/yb/rpc/rpc_context.cc:131
#5  0x00007f1ceaed746e in yb::tserver::RpcOperationCompletionCallback<yb::tserver::ReadResponsePB>::OperationCompleted (this=this@entry=0x7f1cb6603920) at ../../src/yb/tserver/service_util.h:159
#6  0x00007f1ceaf33e43 in OperationCompleted (this=0x7f1cb6603920) at /home/centos/yugabyte-1.3.1.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/atomic_base.h:477
#7  yb::tserver::TabletServiceImpl::CompleteRead (this=this@entry=0x1e787e0, read_context=read_context@entry=0x7f1cb6603b70) at ../../src/yb/tserver/tablet_service.cc:1274
#8  0x00007f1ceaf351f2 in yb::tserver::TabletServiceImpl::Read (this=this@entry=0x1e787e0, req=req@entry=0x50a9cf0, resp=resp@entry=0x50a9d88, context=...) at ../../src/yb/tserver/tablet_service.cc:1207
#9  0x00007f1ce935317a in yb::tserver::TabletServerServiceIf::Handle (this=0x1e787e0, call=...) at src/yb/tserver/tserver_service.service.cc:154
#10 0x00007f1ce5589dc1 in yb::rpc::ServicePoolImpl::Handle (this=0x227da40, incoming=...) at ../../src/yb/rpc/service_pool.cc:243
#11 0x00007f1ce5536ef4 in yb::rpc::InboundCall::InboundCallTask::Run (this=<optimized out>) at ../../src/yb/rpc/inbound_call.cc:207
#12 0x00007f1ce5595788 in yb::rpc::(anonymous namespace)::Worker::Execute (this=<optimized out>) at ../../src/yb/rpc/thread_pool.cc:100
#13 0x00007f1ce3983849 in operator() (this=0x4478e28) at /home/centos/yugabyte-1.3.1.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#14 yb::Thread::SuperviseThread (arg=<optimized out>) at ../../src/yb/util/thread.cc:690
#15 0x00007f1cde69d694 in start_thread (arg=0x7f1cb6605700) at pthread_create.c:333
#16 0x00007f1cdddda41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

UPDATE:

@davidoabile indicated that he was able to narrow this down further.

He's running into these issues when using transaction enabled YCQL tables with Rust driver. The same table, when queried from cqlsh works just fine.

@srhickma was able to repro this locally using Rust driver and the sample program provided by @davidoabile . Looks like the combination of using Rust driver against a transaction enabled YCQL table is somehow causing simple selects to timeout and some time segfault/crash.

@srhickma - for now, assigning to you now to share the steps you used to reproduce the program.

After that, we can then make a call on who can help further investigate/work on the fix.

Thanks!

This can be reproduced using the following steps:

  1. Download files from https://gist.github.com/srhickma/4baf24dbbb2bce311e0550ce02e5661a and place the files in the following structure:
test_yb/Cargo.toml
test_yb/src/main.rs
  1. Build the project:
$ cd test_yb
$ cargo build
  1. Start a local yugabyte cluster using yb-ctl create, then run CREATE=1 ./target/debug/test_yb to create the test table and keyspace.

  2. Running the following will produce the error (it will hang until a timeout):

./target/debug/test_yb

I was able to fix this issue by manually setting the query consistency to QUORUM (cqlsh uses SERIAL internally):

    let params = QueryParamsBuilder::new()
        .consistency(Consistency::Quorum)
        .finalize();

    session.query_with_params("SELECT * FROM test_ks.users", params)
        .expect("Query failed");

So this could be used as a workaround for the time being.

@davidoabile :

This seems to be related to issue we saw with Lua Driver (#684) which was also using a default consistency level of ONE. For YB, the recommended default consistency is QUORUM (since we use Raft for distributed consensus based replication), and as @srhickma mentioned, using that workaround should unblock you.

Thank you for the feedback and for looking I to this.

While I can work with the workaround, I personally believe the server
should detect a malformed behaviour and return an error instead of crashing
with no error message.

Regards,
David Oabile

On Tue, 13 Aug. 2019, 06:48 Kannan Muthukkaruppan, notifications@github.com
wrote:

@davidoabile https://github.com/davidoabile :

This seems to be related to issue we saw with Lua Driver (#684
https://github.com/YugaByte/yugabyte-db/issues/684) which was also
using a default consistency level of ONE. For YB, the recommended default
consistency is QUORUM (since we use Raft for distributed consensus based
replication), and as @srhickma https://github.com/srhickma mentioned,
using that workaround should unblock you.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/YugaByte/yugabyte-db/issues/1982?email_source=notifications&email_token=AANBOC5QI2QSFDG7VUTURGLQEHECZA5CNFSM4IJ3AB5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4DZA4I#issuecomment-520589425,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AANBOC4XZYSU2SWKFYAJ3YLQEHECZANCNFSM4IJ3AB5A
.

@davidoabile - you are absolutely correct. That's why we will keep this issue open, and we have re-opened #684 also.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

robertpang picture robertpang  Â·  3Comments

robertpang picture robertpang  Â·  3Comments

hudclark picture hudclark  Â·  4Comments

fabiocmazzo picture fabiocmazzo  Â·  4Comments

bmatican picture bmatican  Â·  3Comments