Version: yugabyte-ce-1.2.2.0-darwin
When I SELECT from a table with OFFSET/LIMIT and offset + limit exceeds the number of rows, cqlsh returns the error "Client request timeout", and the CPU load of yb-tserver then rises above 100%.
It looks like yb-tserver enters an infinite loop in this case.
@tigerzhang thank you for reporting this issue!
I tried to reproduce this issue on a small example, but it did not happen with 3 rows: https://gist.githubusercontent.com/mbautin/2c378a6dba7d49d495bf36689e6607bb/raw
I will also try to create an automated test with different numbers of rows.
@tigerzhang is there anything else you could tell us about your workload/dataset, e.g. the schema, the number of rows, if the workload writes new keys vs. overwrites old keys, that might help diagnose this issue?
OS: macOS Mojave 10.14.3
➜ yugabyte-1.2.2.0 ./bin/yb-ctl destroy
➜ yugabyte-1.2.2.0 ./bin/yb-ctl create
➜ yb-sample-apps master ✗ java -jar ./target/yb-sample-apps.jar --workload CassandraYunbaSub --nodes 127.0.0.1:9042
Stop the workload after 1 minute.
Query the data using cqlsh:
➜ yugabyte-1.2.2.0 ~/yugabyte/yugabyte-1.2.2.0/bin/cqlsh
cqlsh> use ybdemo_keyspace ;
cqlsh:ybdemo_keyspace> select * from cassandrayunbasub2 where appkey_topic='appkey0_topic1';
appkey | id | appkey_topic
---------+-----+----------------
appkey0 | 100 | appkey0_topic1
appkey0 | 101 | appkey0_topic1
appkey0 | 102 | appkey0_topic1
appkey0 | 103 | appkey0_topic1
appkey0 | 104 | appkey0_topic1
appkey0 | 105 | appkey0_topic1
appkey0 | 106 | appkey0_topic1
appkey0 | 107 | appkey0_topic1
appkey0 | 108 | appkey0_topic1
appkey0 | 109 | appkey0_topic1
appkey0 | 110 | appkey0_topic1
appkey0 | 111 | appkey0_topic1
appkey0 | 112 | appkey0_topic1
appkey0 | 113 | appkey0_topic1
appkey0 | 114 | appkey0_topic1
appkey0 | 115 | appkey0_topic1
appkey0 | 116 | appkey0_topic1
appkey0 | 117 | appkey0_topic1
appkey0 | 118 | appkey0_topic1
appkey0 | 119 | appkey0_topic1
appkey0 | 120 | appkey0_topic1
appkey0 | 121 | appkey0_topic1
appkey0 | 122 | appkey0_topic1
appkey0 | 123 | appkey0_topic1
appkey0 | 124 | appkey0_topic1
appkey0 | 125 | appkey0_topic1
appkey0 | 126 | appkey0_topic1
appkey0 | 127 | appkey0_topic1
appkey0 | 128 | appkey0_topic1
appkey0 | 129 | appkey0_topic1
appkey0 | 130 | appkey0_topic1
appkey0 | 131 | appkey0_topic1
appkey0 | 132 | appkey0_topic1
appkey0 | 133 | appkey0_topic1
appkey0 | 134 | appkey0_topic1
appkey0 | 135 | appkey0_topic1
appkey0 | 136 | appkey0_topic1
appkey0 | 137 | appkey0_topic1
appkey0 | 138 | appkey0_topic1
appkey0 | 139 | appkey0_topic1
appkey0 | 140 | appkey0_topic1
appkey0 | 141 | appkey0_topic1
appkey0 | 142 | appkey0_topic1
appkey0 | 143 | appkey0_topic1
appkey0 | 144 | appkey0_topic1
appkey0 | 145 | appkey0_topic1
appkey0 | 146 | appkey0_topic1
appkey0 | 147 | appkey0_topic1
appkey0 | 148 | appkey0_topic1
appkey0 | 149 | appkey0_topic1
appkey0 | 150 | appkey0_topic1
appkey0 | 151 | appkey0_topic1
appkey0 | 152 | appkey0_topic1
appkey0 | 153 | appkey0_topic1
appkey0 | 154 | appkey0_topic1
appkey0 | 155 | appkey0_topic1
appkey0 | 156 | appkey0_topic1
appkey0 | 157 | appkey0_topic1
appkey0 | 158 | appkey0_topic1
appkey0 | 159 | appkey0_topic1
appkey0 | 160 | appkey0_topic1
appkey0 | 161 | appkey0_topic1
appkey0 | 162 | appkey0_topic1
appkey0 | 163 | appkey0_topic1
appkey0 | 164 | appkey0_topic1
appkey0 | 165 | appkey0_topic1
appkey0 | 166 | appkey0_topic1
appkey0 | 167 | appkey0_topic1
appkey0 | 168 | appkey0_topic1
appkey0 | 169 | appkey0_topic1
appkey0 | 170 | appkey0_topic1
appkey0 | 171 | appkey0_topic1
appkey0 | 172 | appkey0_topic1
appkey0 | 173 | appkey0_topic1
appkey0 | 174 | appkey0_topic1
appkey0 | 175 | appkey0_topic1
appkey0 | 176 | appkey0_topic1
appkey0 | 177 | appkey0_topic1
appkey0 | 178 | appkey0_topic1
appkey0 | 179 | appkey0_topic1
appkey0 | 180 | appkey0_topic1
appkey0 | 181 | appkey0_topic1
appkey0 | 182 | appkey0_topic1
appkey0 | 183 | appkey0_topic1
appkey0 | 184 | appkey0_topic1
appkey0 | 185 | appkey0_topic1
appkey0 | 186 | appkey0_topic1
appkey0 | 187 | appkey0_topic1
appkey0 | 188 | appkey0_topic1
appkey0 | 189 | appkey0_topic1
appkey0 | 190 | appkey0_topic1
appkey0 | 191 | appkey0_topic1
appkey0 | 192 | appkey0_topic1
appkey0 | 193 | appkey0_topic1
appkey0 | 194 | appkey0_topic1
appkey0 | 195 | appkey0_topic1
appkey0 | 196 | appkey0_topic1
appkey0 | 197 | appkey0_topic1
appkey0 | 198 | appkey0_topic1
appkey0 | 199 | appkey0_topic1
(100 rows)
cqlsh:ybdemo_keyspace> select * from cassandrayunbasub2 where appkey_topic='appkey0_topic1' limit 30 offset 90;
OperationTimedOut: errors={'127.0.0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.1
cqlsh:ybdemo_keyspace>
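For reference, here is what the failing query should return, worked out in a small Python sketch. It assumes the rows come back in the same order as the unpaged SELECT above (ids 100..199); `apply_offset_limit` is a hypothetical helper, not part of YugaByte or cqlsh. With 100 rows, OFFSET 90 LIMIT 30 should simply yield the last 10 rows (ids 190..199) rather than time out.

```python
def apply_offset_limit(rows, offset, limit):
    """Return the slice of rows that OFFSET/LIMIT should select.

    When offset + limit exceeds len(rows), the result is just shorter
    than `limit`; nothing should wait for rows that do not exist.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return rows[offset:offset + limit]


# The partition from the report: ids 100..199, i.e. 100 rows.
ids = list(range(100, 200))
result = apply_offset_limit(ids, offset=90, limit=30)
print(len(result), result[0], result[-1])  # 10 190 199
```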
@tigerzhang thank you for the detailed instructions! We will reproduce and debug the problem on our side ASAP.
High CPU load persists even when no workload is running.
Thread name | Cumulative User CPU(s) | Cumulative Kernel CPU(s) | Cumulative IO-wait(s)
rpc_tp_CQLServer_7-86170 | 313.57 | 24.38 | 0
@ 0x7ffbb9f6911f (unknown)
@ 0x7ffbbf27c7ff (unknown)
@ 0x7ffbc6ab5afa yb::ql::Executor::FetchMoreRows()
@ 0x7ffbc6ab682c yb::ql::Executor::ProcessTnodeResults()
@ 0x7ffbc6ab6262 yb::ql::Executor::ProcessTnodeResults()
@ 0x7ffbc6ab6d96 yb::ql::Executor::ProcessAsyncResults()
@ 0x7ffbc6ab7825 yb::ql::Executor::FlushAsyncDone()
@ 0x7ffbc6ab7d49 _ZN5boost6detail8function26void_function_obj_invoker1IZN2yb2ql8Executor10FlushAsyncEvEUlRKNS3_6StatusEE0_vS8_E6invokeERNS1_15function_bufferES8_
@ 0x7ffbc37819c6 _ZNSt17_Function_handlerIFvvEZN2yb6client8internal7Batcher11RunCallbackERKNS1_6StatusEEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7ffbc377f462 yb::client::internal::Batcher::RunCallback()
@ 0x7ffbc3780e1d yb::client::internal::Batcher::CheckForFinishedFlush()
@ 0x7ffbc37760e8 yb::client::internal::AsyncRpc::Finished()
@ 0x7ffbc0a13d79 yb::rpc::OutboundCall::CallCallback()
@ 0x7ffbc0a14100 yb::rpc::OutboundCall::SetFinished()
@ 0x7ffbc0a1be87 yb::rpc::LocalYBInboundCall::Respond()
@ 0x7ffbc0a5c96b yb::rpc::YBInboundCall::RespondSuccess()
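The hot thread above is repeatedly going through `yb::ql::Executor::FetchMoreRows()` from `ProcessTnodeResults()`. As a rough illustration of why a paged OFFSET/LIMIT read must terminate on end-of-data and not only on reaching the limit, here is a hypothetical Python model. It is not the actual executor code, and it does not describe the fix in the linked commit; `fetch_with_offset_limit`, `make_pager`, and the integer paging state are all assumptions made for the sketch.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a page is (rows, next_state), where next_state is
# None once the scan has reached the end of the data.
Page = Tuple[List[int], Optional[int]]
FetchFn = Callable[[Optional[int]], Page]


def fetch_with_offset_limit(fetch_page: FetchFn, offset: int, limit: int) -> List[int]:
    """Skip `offset` rows, then collect up to `limit` rows across pages."""
    to_skip = offset
    result: List[int] = []
    state: Optional[int] = None  # None on the first call means "start of scan"
    while len(result) < limit:
        rows, state = fetch_page(state)
        # Consume the offset first, then fill the result up to the limit.
        skipped = min(to_skip, len(rows))
        to_skip -= skipped
        remaining = limit - len(result)
        result.extend(rows[skipped:skipped + remaining])
        # Exit as soon as the data is exhausted, even if fewer than `limit`
        # rows were collected. Without this check, the None state would be
        # treated as "start of scan" again and the loop would re-read the
        # data instead of returning.
        if state is None:
            break
    return result


def make_pager(data: List[int], page_size: int) -> FetchFn:
    """Serve `data` in fixed-size pages; the state is the next start index."""
    def fetch_page(state: Optional[int]) -> Page:
        start = 0 if state is None else state
        end = start + page_size
        next_state = end if end < len(data) else None
        return data[start:end], next_state
    return fetch_page


# The partition from the report (ids 100..199), served 7 rows per page.
pager = make_pager(list(range(100, 200)), page_size=7)
print(fetch_with_offset_limit(pager, offset=90, limit=30))
# [190, 191, 192, 193, 194, 195, 196, 197, 198, 199]
```

The page size of 7 is arbitrary; it just forces the offset and the limit to straddle page boundaries, which is where this kind of bookkeeping tends to go wrong.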
@tigerzhang I was able to reproduce the issue locally with your instructions. We'll work on a fix -- stay tuned!
If I may ask, is there any clue yet?
Hi @tigerzhang, I checked with the team. I believe they have narrowed down the issue and expect to have a fix in the next 3-4 days.
@tigerzhang - this was fixed in https://github.com/YugaByte/yugabyte-db/commit/9c140a552b4c08dc945881344cd07c603afe667e.
We will keep you posted once a new release/build with this change is available. If you build from source, you can already try it.