Yugabyte-db: Failed to initialize sys tables async

Created on 30 Jul 2018 · 11 Comments · Source: yugabyte/yugabyte-db

Hello,
I'm trying to evaluate YugaByte for use in a large multi-tenant (graph-based) setup, currently running on GCP.

The good news is that this worked fine the first time I tried it. I ran it a couple of times successfully with a JanusGraph backend.

The bad news is that it now won't run. I get variations of this error:

Not found (yb/util/env_posix.cc:1011): Unable to initialize catalog manager: Failed to initialize sys tables async: Could not load tablet metadata from /mnt/data0/yb-data/master/tablet-meta/00000000000000000000000000000000: /mnt/data0/yb-data/master/tablet-meta/00000000000000000000000000000000: No such file or directory (error 2)

depending on which tag I tried.

I was using 'latest' on Friday; neither that nor any of the other recent tags works any more. I wondered whether this might be caused by a recent update, since there is nothing for me to change on my side.

The masters keep restarting, presumably because of the error above.
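Because the masters are in a restart loop, the log of the crashed container is the one that matters; `kubectl logs <pod> --previous` prints it. The helper below is a minimal sketch for pulling the missing metadata path out of such a log. The function name `missing_tablet_meta` is hypothetical, and it only matches the message format quoted above.

```shell
# missing_tablet_meta (hypothetical helper): read master log text on stdin and
# print the path of any tablet metadata file the master could not load.
# sed -n with the p flag prints only lines where the substitution matched,
# so unrelated log lines produce no output.
missing_tablet_meta() {
  # Capture everything after "Could not load tablet metadata from " up to the
  # next colon, which is where the path ends in the message shown above.
  sed -n 's/.*Could not load tablet metadata from \([^:]*\):.*/\1/p'
}
```

Usage, assuming the pod names of a three-master install (`yb-master-0` to `yb-master-2`): `for p in yb-master-0 yb-master-1 yb-master-2; do kubectl logs "$p" --previous | missing_tablet_meta; done`.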

There is one big caveat here: I am quite new to Kubernetes, GCP and YugaByte, so I might have missed something obvious. I have tried a new cluster, and manually deleting disks, storage, images, etc. I have cleared out all I can.

Any help would be appreciated.

All 11 comments

Hi @therealnb,

Great that you are trying out Janus on YugaByte, would love to hear how that goes as that would be useful for the community!

In this case, for some reason, the data on disk does not seem to have been persisted.

I am assuming you did not get the cluster to work after clearing everything out? If so, that's useful in itself. You might need to clear up the data on the underlying machines; the questions below would help us understand that better.

Here is some information that would help us debug further:

  • Could you please describe your environment - is this GKE, or are you running your own Kubernetes on GCP? What nodes/disks are you using for your k8s cluster?

  • Could you please describe how you brought up the cluster (which YAML you used; it would be great if you could post it)? Mainly we are looking for details such as whether you are using a StatefulSet or some other controller type, the volume type (local disk or a persistent mount), etc.

  • Can you post the output of the following for your Kubernetes YugaByte cluster:

kubectl get pods
kubectl get pv
kubectl get pvc
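For a larger cluster it can help to filter the `kubectl get pods` output mechanically. This is a minimal sketch, assuming the default column order (NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_pods` is hypothetical. It prints every pod whose STATUS is not `Running` or whose READY count is incomplete.

```shell
# unhealthy_pods (hypothetical helper): read `kubectl get pods` output on stdin
# and print "NAME STATUS" for each pod that is not Running or not fully ready.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk 'NR > 1 {
    # READY looks like "0/1": split into ready and total container counts.
    split($2, r, "/")
    if ($3 != "Running" || r[1] != r[2]) print $1, $3
  }'
}
```

Usage: `kubectl get pods | unhealthy_pods`. Run against the pod listing in the reply below, it prints the three `yb-master` pods with status `CrashLoopBackOff`.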

air:results1core12cli nigel$ kubectl get pods
NAME           READY   STATUS             RESTARTS   AGE
yb-master-0    0/1     CrashLoopBackOff   2          1m
yb-master-1    0/1     CrashLoopBackOff   2          1m
yb-master-2    0/1     CrashLoopBackOff   2          1m
yb-tserver-0   1/1     Running            0          1m
yb-tserver-1   1/1     Running            0          1m
yb-tserver-2   1/1     Running            0          1m

air:results1core12cli nigel$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   REASON   AGE
pvc-5e391cb9-9402-11e8-b961-42010a8000e9   1Gi        RWO            Delete           Bound    default/datadir-yb-master-0    standard                2h
pvc-5e3b45bb-9402-11e8-b961-42010a8000e9   1Gi        RWO            Delete           Bound    default/datadir-yb-master-1    standard                2h
pvc-5e4018d5-9402-11e8-b961-42010a8000e9   1Gi        RWO            Delete           Bound    default/datadir-yb-master-2    standard                2h
pvc-5f5783d1-9402-11e8-b961-42010a8000e9   1Gi        RWO            Delete           Bound    default/datadir-yb-tserver-0   standard                2h
pvc-5f5a81e7-9402-11e8-b961-42010a8000e9   1Gi        RWO            Delete           Bound    default/datadir-yb-tserver-1   standard                2h
pvc-5f5f972a-9402-11e8-b961-42010a8000e9   1Gi        RWO            Delete           Bound    default/datadir-yb-tserver-2   standard                2h

air:results1core12cli nigel$ kubectl get pvc
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-yb-master-0    Bound    pvc-5e391cb9-9402-11e8-b961-42010a8000e9   1Gi        RWO            standard       2h
datadir-yb-master-1    Bound    pvc-5e3b45bb-9402-11e8-b961-42010a8000e9   1Gi        RWO            standard       2h
datadir-yb-master-2    Bound    pvc-5e4018d5-9402-11e8-b961-42010a8000e9   1Gi        RWO            standard       2h
datadir-yb-tserver-0   Bound    pvc-5f5783d1-9402-11e8-b961-42010a8000e9   1Gi        RWO            standard       2h
datadir-yb-tserver-1   Bound    pvc-5f5a81e7-9402-11e8-b961-42010a8000e9   1Gi        RWO            standard       2h
datadir-yb-tserver-2   Bound    pvc-5f5f972a-9402-11e8-b961-42010a8000e9   1Gi        RWO            standard       2h

This is vanilla GCP/GKE at the moment.

To make the cluster:
gcloud container clusters create graphtest --machine-type n1-standard-4 --scopes "https://www.googleapis.com/auth/bigtable.admin","https://www.googleapis.com/auth/bigtable.data"
The scopes are there because I was trying Bigtable before. I can try without them if you think it will help. That was just a command I reused (cut and paste); this cluster has been recreated since the Bigtable effort.

I removed the underlying machines and their disks. As far as I can see, I started everything from scratch, short of creating a new project in GCP.
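One detail worth knowing when starting from scratch: Kubernetes does not delete the PersistentVolumeClaims created from a StatefulSet's `volumeClaimTemplates` when the StatefulSet itself is deleted, so re-applying the same YAML reattaches the existing volumes. In the `kubectl get pv` / `kubectl get pvc` output posted above, the volumes are 2h old while the pods are 1m old, which is consistent with that, whether or not it is related to this error. Below is a minimal sketch for listing the YugaByte claims so they can be removed explicitly; `yb_pvcs` is a hypothetical name, and it assumes the `datadir-yb-` prefix seen in this cluster.

```shell
# yb_pvcs (hypothetical helper): read `kubectl get pvc` output on stdin and
# print the names of the YugaByte data claims, skipping the header row.
# Assumes claim names start with "datadir-yb-", as in the output above.
yb_pvcs() {
  awk 'NR > 1 && $1 ~ /^datadir-yb-/ { print $1 }'
}
```

Usage: `kubectl get pvc | yb_pvcs | xargs kubectl delete pvc`. With the `Delete` reclaim policy shown above, deleting a claim also deletes its volume and the backing disk, so all data on it is lost; only do this on a cluster you intend to wipe.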

It was going well, but something has changed, maybe in GCP: it prompted me to do an update today.

I have tried these tags:

Tag           Size   Age
latest        1 GB   3 days ago
1.0.5.3-b24   1 GB   3 days ago
1.0.5.3-b20   1 GB   3 days ago
1.0.5.0-b20   1 GB   4 days ago

These are from https://hub.docker.com/r/yugabytedb/yugabyte/tags/; I switched between them by manually changing the image tag in yugabyte-statefulset.yaml.
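Editing the tag by hand works; a small filter makes switching between tags repeatable. This is a minimal sketch, assuming the StatefulSet YAML pins the image as an unquoted `image: yugabytedb/yugabyte:<tag>` line; `set_yb_tag` is a hypothetical name.

```shell
# set_yb_tag (hypothetical helper): copy YAML from stdin to stdout, replacing
# the tag on every "image: yugabytedb/yugabyte:<tag>" line with $1.
# Assumes the image reference is unquoted; other images are left untouched.
set_yb_tag() {
  # \1 keeps "image: yugabytedb/yugabyte" (and any indentation before it is
  # not matched, so it is preserved); the old tag runs to the next whitespace.
  sed "s#\(image: *yugabytedb/yugabyte\):[^[:space:]]*#\1:$1#"
}
```

Usage: `set_yb_tag 1.0.5.3-b24 < yugabyte-statefulset.yaml | kubectl apply -f -`. As the rest of the thread shows, the tag was not the culprit here; the fix was pointing the docs at an up-to-date file.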

I tried this without the extra Bigtable scopes, as described in https://docs.yugabyte.com/latest/deploy/public-clouds/gcp/gke/
and it gave the same error.

I tried this as well, since some of the gcloud defaults changed today:
gcloud container clusters create graphtest --no-enable-autorepair --scopes "https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/devstorage.full_control"
Still the same result.

Can anyone else successfully run up a yugabyte cluster as of 30th July 2018? Is it just me?

@therealnb yeah, we have some internal integration testing, but we're going to test it manually to make sure everything is good on our end. After that I can reach out to you, and we can probably jump on a call (Hangouts) and sort this out. Let me know what time works for you.

Sorry for the inconvenience.

I pinged you on LinkedIn. It would be good to connect, working or not.
I'm in the UK, so sooner is better than later for me.

With some outstanding support this is now working. There was a stale file in the docs, which is being addressed now.
Thanks a lot. This looks like a great product.

Thanks @therealnb! Connected with you on LinkedIn as well.

Docs updated with the correct URL.
