Yugabyte-db: Leader not ready to serve requests (Error 24) - Replicate database model into Yuga on Vagrant

Created on 28 Mar 2019 · 7Comments · Source: yugabyte/yugabyte-db

Hi Team,
I'm working on trying to create all tables of our Rails application into yuga, but after certain point I'm getting the following error, when the process of table creation is running:

ActiveRecord::StatementInvalid: PG::InternalError: ERROR: Timed out: Read(tablet: 682b5b0d7ab749bfa5710a5386dafcc4, num_ops: 1, num_attempts: 344, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 1212.291s (now: 1212.341s): Leader not ready to serve requests. (yb/consensus/consensus.cc:137): Leader not yet replicated NoOp to be ready to serve requests (error 24)

At first time I thought that it could be related to a memory size problem of my VM so I increased the size to reach almost 9GB, but unfortunately the issue is still happening.
My current configuration is:

vagrant@vagrant:~/yugabyte-1.2.0.2$ free -mh
total        used        free      shared  buff/cache   available
Mem:           8.6G        330M        7.7G         15M        580M        8.0G

it’s a 1 yb-master and 1 yb-tserver:

2019-03-28 20:55:37,353 INFO: Server is running: type=master, node_id=1, PID=1361, admin service=http://192.168.33.10:7000
2019-03-28 20:55:37,357 INFO: Server is running: type=tserver, node_id=1, PID=1364, admin service=http://192.168.33.10:9000, cql service=192.168.33.10:9042, redis service=192.168.33.10:6379, pgsql service=192.168.33.10:5433

PS NOTE: Every time I increased the memory size of my VM i had to destroy and recreate the nodes because after this error they seemed to be blocked/inaccessible.

I'm attaching postgresql* yb-tserver* yb-master* logs which could be helpful for you.

error_24-logs.zip

aredocdb communitrequest kinbug

Source

dbronzetti

All 7 comments

Possibly you are running into this recently fixed issue on the YSQL side.

https://github.com/YugaByte/yugabyte-db/issues/1066

CC: @bmatican , @rao-vasireddy

kmuthukk on 29 Mar 2019

👍1

From the logs:

Application fingerprint: version 1.2.0.2 build 2 revision 8a8750c30366ecc5e6bbe05dae05269b1686dfd4 build_type RELEASE built at 14 Mar 2019 02:57:43 UTC

So it definitely doesn't have the fix for #1066. But not yet sure if it duplicate of that or a different issue.

The pg log contains:

2019-03-28 16:24:14.509 UTC [1330] LOG:  database system is ready to accept connections
2019-03-28 16:42:52.106 UTC [1474] ERROR:  Timed out: Read(tablet: 682b5b0d7ab749bfa5710a5386dafcc4, num_ops: 1, num_attempts: 344, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 1212.291s (now: 1212.341s): Leader not ready to serve requests. (yb/consensus/consensus.cc:137): Leader not yet replicated NoOp to be ready to serve requests (error 24)
2019-03-28 16:42:52.106 UTC [1474] STATEMENT:  CREATE  INDEX "index_batch_sub_list_on_batch" ON "batch_sub_list_members" ("batch_id", "temp_member_id", "sub_list_id")
019-03-28 19:56:52.404 UTC [1330] LOG:  received smart shutdown request
2019-03-28 19:56:52.426 UTC [1330] LOG:  received SIGHUP, reloading configuration files
2019-03-28 19:56:52.427 UTC [1354] LOG:  shutting down

kmuthukk on 29 Mar 2019

And when the issue happens at 16:42:52 in the pg log, there's nothing in the yb-tserver log for:

I0328 16:41:51.254041  2165 replica_state.cc:710] T 5d14f43f72934e0d843ef22b06562ba9 P 394c7b78a2f54b28b5b946d6e26beb8d [term 1 LEADER]: Advanced the committed_index across terms. Last committed operation was: term: 0 index: 0 New committed index is: term: 1 index: 1
I0328 17:24:25.380347  1345 server_base.cc:169] Running on host vagrant

for like 45 minutes; almost as if things froze.

Similar story in the yb-master log:

I0328 16:41:51.254341  1442 catalog_manager.cc:4090] Tablet: 5d14f43f72934e0d843ef22b06562ba9 reported consensus state change. New consensus state: current_term: 1 leader_uuid: "394c7b78a2f54b28b5b946d6e26beb8d" config { opid_index: -1 peers { permanent_uuid\
: "394c7b78a2f54b28b5b946d6e26beb8d" member_type: VOTER last_known_private_addr { host: "192.168.33.10" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }from 394c7b78a2f54b28b5b946d6e26beb8d
I0328 17:24:22.604445  1303 server_base.cc:169] Running on host vagrant

@dbronzetti:

1) Any clues from /var/log/messages from around the same time?
2) Can you give 1.2.2.0 version a try (instead of 1.2.0.2). That has the fix for #1066.

kmuthukk on 29 Mar 2019

@kmuthukk, thanks for analyzing this and help to try to figure out this issue.

Unfortunately I destroy the yuga data directories because I tested some more times some other things so I cant check that now.
As far as I read the issue #1066 seems to have a 3-node cluster config but my config here is the default one (2-node cluster), anyways I will be glad to try the 1.2.2.0 version to see if this case could be another collateral issue to that failure found.

dbronzetti on 29 Mar 2019

1066 leak fix applies to SQL workloads (on single or multi-node).

kmuthukk on 29 Mar 2019

👍1

@dbronzetti added (on slack) that:

<<<
I’m doing the tests with the version 1.2.2.0 as you suggested but I’m getting the error at the beginning of my model replication:

ActiveRecord::StatementInvalid: PG::FeatureNotSupported: \
   ERROR:  WHERE condition option 32 not supported yet
DETAIL:  The WHERE condition option is not supported yet.