Yugabyte-db: YCQL select with offset/limit timeout

Created on 1 Apr 2019 · 8 Comments · Source: yugabyte/yugabyte-db

Version: yugabyte-ce-1.2.2.0-darwin

When running a SELECT with offset/limit where offset + limit is greater than the number of rows, cqlsh returns the error "Client request timeout". After that, the CPU load of yb-tserver stays above 100%.

It looks like yb-tserver enters an infinite loop in this case.

Labels: community/request, kind/bug

All 8 comments

@tigerzhang thank you for reporting this issue!

I tried to reproduce this issue on a small example, but it did not happen with 3 rows: https://gist.githubusercontent.com/mbautin/2c378a6dba7d49d495bf36689e6607bb/raw

Will also try to create an automated test with different numbers of rows.

@tigerzhang is there anything else you could tell us about your workload/dataset, e.g. the schema, the number of rows, if the workload writes new keys vs. overwrites old keys, that might help diagnose this issue?

OS: macOS Mojave 10.14.3

  • Local one node cluster:
➜  yugabyte-1.2.2.0 ./bin/yb-ctl destroy
➜  yugabyte-1.2.2.0 ./bin/yb-ctl create

Workload: https://github.com/tigerzhang/yb-sample-apps/blob/master/src/main/java/com/yugabyte/sample/apps/CassandraYunbaSub.java

➜  yb-sample-apps master ✗ java -jar ./target/yb-sample-apps.jar --workload CassandraYunbaSub --nodes 127.0.0.1:9042
  • Stop the workload after 1 minute.

  • Query data using cqlsh

➜  yugabyte-1.2.2.0 ~/yugabyte/yugabyte-1.2.2.0/bin/cqlsh
cqlsh> use ybdemo_keyspace ;
cqlsh:ybdemo_keyspace> select * from cassandrayunbasub2 where appkey_topic='appkey0_topic1';

 appkey  | id  | appkey_topic
---------+-----+----------------
 appkey0 | 100 | appkey0_topic1
 appkey0 | 101 | appkey0_topic1
 appkey0 | 102 | appkey0_topic1
 appkey0 | 103 | appkey0_topic1
 appkey0 | 104 | appkey0_topic1
 appkey0 | 105 | appkey0_topic1
 appkey0 | 106 | appkey0_topic1
 appkey0 | 107 | appkey0_topic1
 appkey0 | 108 | appkey0_topic1
 appkey0 | 109 | appkey0_topic1
 appkey0 | 110 | appkey0_topic1
 appkey0 | 111 | appkey0_topic1
 appkey0 | 112 | appkey0_topic1
 appkey0 | 113 | appkey0_topic1
 appkey0 | 114 | appkey0_topic1
 appkey0 | 115 | appkey0_topic1
 appkey0 | 116 | appkey0_topic1
 appkey0 | 117 | appkey0_topic1
 appkey0 | 118 | appkey0_topic1
 appkey0 | 119 | appkey0_topic1
 appkey0 | 120 | appkey0_topic1
 appkey0 | 121 | appkey0_topic1
 appkey0 | 122 | appkey0_topic1
 appkey0 | 123 | appkey0_topic1
 appkey0 | 124 | appkey0_topic1
 appkey0 | 125 | appkey0_topic1
 appkey0 | 126 | appkey0_topic1
 appkey0 | 127 | appkey0_topic1
 appkey0 | 128 | appkey0_topic1
 appkey0 | 129 | appkey0_topic1
 appkey0 | 130 | appkey0_topic1
 appkey0 | 131 | appkey0_topic1
 appkey0 | 132 | appkey0_topic1
 appkey0 | 133 | appkey0_topic1
 appkey0 | 134 | appkey0_topic1
 appkey0 | 135 | appkey0_topic1
 appkey0 | 136 | appkey0_topic1
 appkey0 | 137 | appkey0_topic1
 appkey0 | 138 | appkey0_topic1
 appkey0 | 139 | appkey0_topic1
 appkey0 | 140 | appkey0_topic1
 appkey0 | 141 | appkey0_topic1
 appkey0 | 142 | appkey0_topic1
 appkey0 | 143 | appkey0_topic1
 appkey0 | 144 | appkey0_topic1
 appkey0 | 145 | appkey0_topic1
 appkey0 | 146 | appkey0_topic1
 appkey0 | 147 | appkey0_topic1
 appkey0 | 148 | appkey0_topic1
 appkey0 | 149 | appkey0_topic1
 appkey0 | 150 | appkey0_topic1
 appkey0 | 151 | appkey0_topic1
 appkey0 | 152 | appkey0_topic1
 appkey0 | 153 | appkey0_topic1
 appkey0 | 154 | appkey0_topic1
 appkey0 | 155 | appkey0_topic1
 appkey0 | 156 | appkey0_topic1
 appkey0 | 157 | appkey0_topic1
 appkey0 | 158 | appkey0_topic1
 appkey0 | 159 | appkey0_topic1
 appkey0 | 160 | appkey0_topic1
 appkey0 | 161 | appkey0_topic1
 appkey0 | 162 | appkey0_topic1
 appkey0 | 163 | appkey0_topic1
 appkey0 | 164 | appkey0_topic1
 appkey0 | 165 | appkey0_topic1
 appkey0 | 166 | appkey0_topic1
 appkey0 | 167 | appkey0_topic1
 appkey0 | 168 | appkey0_topic1
 appkey0 | 169 | appkey0_topic1
 appkey0 | 170 | appkey0_topic1
 appkey0 | 171 | appkey0_topic1
 appkey0 | 172 | appkey0_topic1
 appkey0 | 173 | appkey0_topic1
 appkey0 | 174 | appkey0_topic1
 appkey0 | 175 | appkey0_topic1
 appkey0 | 176 | appkey0_topic1
 appkey0 | 177 | appkey0_topic1
 appkey0 | 178 | appkey0_topic1
 appkey0 | 179 | appkey0_topic1
 appkey0 | 180 | appkey0_topic1
 appkey0 | 181 | appkey0_topic1
 appkey0 | 182 | appkey0_topic1
 appkey0 | 183 | appkey0_topic1
 appkey0 | 184 | appkey0_topic1
 appkey0 | 185 | appkey0_topic1
 appkey0 | 186 | appkey0_topic1
 appkey0 | 187 | appkey0_topic1
 appkey0 | 188 | appkey0_topic1
 appkey0 | 189 | appkey0_topic1
 appkey0 | 190 | appkey0_topic1
 appkey0 | 191 | appkey0_topic1
 appkey0 | 192 | appkey0_topic1
 appkey0 | 193 | appkey0_topic1
 appkey0 | 194 | appkey0_topic1
 appkey0 | 195 | appkey0_topic1
 appkey0 | 196 | appkey0_topic1
 appkey0 | 197 | appkey0_topic1
 appkey0 | 198 | appkey0_topic1
 appkey0 | 199 | appkey0_topic1

(100 rows)

cqlsh:ybdemo_keyspace> select * from cassandrayunbasub2 where appkey_topic='appkey0_topic1' limit 30 offset 90;
OperationTimedOut: errors={'127.0.0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.1
cqlsh:ybdemo_keyspace>
  • CPU load of yb-tserver is about 108% after this error occurred.
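
For reference, the result this query would be expected to return can be worked out with a few lines of Python. This is only a sketch of the usual LIMIT/OFFSET semantics, not YugabyteDB code; the helper name `expected_rows` is hypothetical, and the id mapping assumes the rows come back in the order shown in the unpaged SELECT above.

```python
def expected_rows(total_rows: int, limit: int, offset: int) -> range:
    """Return the 0-based row positions a LIMIT/OFFSET query should yield.

    Hypothetical helper for illustration: both ends are clamped to the
    number of matching rows, so asking past the end yields fewer rows
    (possibly none) rather than an error or a hang.
    """
    start = min(offset, total_rows)
    end = min(offset + limit, total_rows)
    return range(start, end)


# The query from the report: 100 matching rows, LIMIT 30 OFFSET 90.
positions = expected_rows(total_rows=100, limit=30, offset=90)
print(len(positions))  # 10

# Assuming the ordering shown above (ids 100..199), these are ids 190..199.
ids = [100 + p for p in positions]
print(ids[0], ids[-1])  # 190 199
```

In other words, the query should come back with the last 10 rows instead of timing out.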

@tigerzhang thank you for the detailed instructions! We will reproduce and debug the problem on our side ASAP.

High CPU load is observed even with no workload running.

Thread name | Cumulative User CPU(s) | Cumulative Kernel CPU(s) | Cumulative IO-wait(s)
rpc_tp_CQLServer_7-86170 | 313.57 | 24.38 | 0
    @     0x7ffbb9f6911f  (unknown)
    @     0x7ffbbf27c7ff  (unknown)
    @     0x7ffbc6ab5afa  yb::ql::Executor::FetchMoreRows()
    @     0x7ffbc6ab682c  yb::ql::Executor::ProcessTnodeResults()
    @     0x7ffbc6ab6262  yb::ql::Executor::ProcessTnodeResults()
    @     0x7ffbc6ab6d96  yb::ql::Executor::ProcessAsyncResults()
    @     0x7ffbc6ab7825  yb::ql::Executor::FlushAsyncDone()
    @     0x7ffbc6ab7d49  _ZN5boost6detail8function26void_function_obj_invoker1IZN2yb2ql8Executor10FlushAsyncEvEUlRKNS3_6StatusEE0_vS8_E6invokeERNS1_15function_bufferES8_
    @     0x7ffbc37819c6  _ZNSt17_Function_handlerIFvvEZN2yb6client8internal7Batcher11RunCallbackERKNS1_6StatusEEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7ffbc377f462  yb::client::internal::Batcher::RunCallback()
    @     0x7ffbc3780e1d  yb::client::internal::Batcher::CheckForFinishedFlush()
    @     0x7ffbc37760e8  yb::client::internal::AsyncRpc::Finished()
    @     0x7ffbc0a13d79  yb::rpc::OutboundCall::CallCallback()
    @     0x7ffbc0a14100  yb::rpc::OutboundCall::SetFinished()
    @     0x7ffbc0a1be87  yb::rpc::LocalYBInboundCall::Respond()
    @     0x7ffbc0a5c96b  yb::rpc::YBInboundCall::RespondSuccess()
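
The trace shows the executor going back through `FetchMoreRows()` from its result-processing path. As a rough illustration of the termination condition a paged fetch with LIMIT/OFFSET needs, here is a minimal Python sketch. It is not the YugabyteDB implementation and makes no claim about the actual root cause; `make_source`, `select_with_limit_offset`, and the page size are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A page is (rows, next_position); next_position is None when the source is exhausted.
Page = Tuple[List[int], Optional[int]]


def make_source(data: List[int], page_size: int) -> Callable[[int], Page]:
    """Build a hypothetical paged reader over an in-memory list."""
    def fetch_page(position: int) -> Page:
        chunk = data[position:position + page_size]
        next_position = position + len(chunk)
        return chunk, (next_position if next_position < len(data) else None)
    return fetch_page


def select_with_limit_offset(fetch_page: Callable[[int], Page],
                             limit: int, offset: int) -> List[int]:
    """Collect up to `limit` rows after skipping `offset` rows."""
    result: List[int] = []
    to_skip = offset
    position: Optional[int] = 0
    # Stop when the limit is satisfied OR the source is exhausted. Dropping the
    # second condition would keep fetching forever once offset + limit
    # exceeds the number of rows.
    while position is not None and len(result) < limit:
        rows, position = fetch_page(position)
        skipped = min(to_skip, len(rows))
        to_skip -= skipped
        result.extend(rows[skipped:skipped + (limit - len(result))])
    return result


source = make_source(list(range(100, 200)), page_size=25)
print(select_with_limit_offset(source, limit=30, offset=90))
```

With 100 rows, this returns the ids 190 through 199 and then stops, because the last page reports that nothing is left to fetch.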

@tigerzhang I was able to reproduce the issue locally with your instructions. We'll work on a fix -- stay tuned!

If I may, any clue?

hi @tigerzhang - checked with the team- I believe they have narrowed down the issue, and expect to have a fix in the next 3-4 days.

@tigerzhang - this was fixed in https://github.com/YugaByte/yugabyte-db/commit/9c140a552b4c08dc945881344cd07c603afe667e.

Will keep you posted once a new release/build is available with this change. If you are building from source, you can try the fix already.
