Google-cloud-go: Datastore: Calls to Put hang when run inside Kubernetes cluster, fine out of cluster.

Created on 9 Mar 2018 · 9 Comments · Source: googleapis/google-cloud-go

I've been having an issue that I cannot figure out or even properly debug. When developing locally with "cloud.google.com/go/datastore" on Kubernetes using an in-cluster configuration, I can write to Cloud Datastore just fine. However, when I deploy to my cluster, my program hangs and never returns once .Put(...) is called on my datastore client. I don't get any output whatsoever. I've been able to get rudimentary gdb access to a running process on my cluster, but I have not been able to figure out what is going wrong or where the code is getting stuck.

I have followed the directions here.

I have tried loading my service account file by these two methods.

// Method 1: pass the service account key file explicitly.
client, err := datastore.NewClient(ctx, projectID, option.WithServiceAccountFile("/var/secrets/google/key.json"))
if err != nil {
    log.Fatalf("Failed to create client: %v", err)
}

// Method 2: rely on the default credentials found in the environment.
client, err := datastore.NewClient(ctx, projectID)
if err != nil {
    log.Fatalf("Failed to create client: %v", err)
}

Both work in creating a valid client.

I also tried moving to new nodes with more permissions enabled with:

gcloud --project MY_PROJECT container node-pools create main-pool \
    --cluster my-cluster-us-cntrl1a \
    --zone us-central1-a \
    --enable-autoupgrade \
    --num-nodes 1 --machine-type n1-standard-2 \
    --enable-autoscaling --min-nodes=1 --max-nodes=6 \
    --scopes cloud-platform,datastore

The permissions on my cluster look like this:
[screenshot of the cluster permissions, taken 2018-03-08]

My service account has the Cloud Datastore User role, plus Owner for good measure.

What are other things to check for when running on Kubernetes from within the cluster? Is there any good way to debug this to get logs as to what's happening?

datastore p1 bug

All 9 comments

I tried to replicate this. I wrote the following program:

package main

import (
    "context"
    "flag"
    "fmt"
    "log"

    "cloud.google.com/go/datastore"
)

var projectID = flag.String("project", "", "project ID")

type Task struct {
    Description string
}

func main() {
    flag.Parse()
    ctx := context.Background()
    client, err := datastore.NewClient(ctx, *projectID)
    if err != nil {
        log.Fatal(err)
    }
    log.Print("created client")
    key := datastore.NameKey("Task", "milk", nil)
    task := &Task{Description: "Buy milk"}
    if _, err := client.Put(ctx, key, task); err != nil {
        log.Fatalf("Put: %v", err)
    }
    log.Print("Put succeeded")
    var gtask Task
    if err := client.Get(ctx, key, &gtask); err != nil {
        log.Fatalf("Get: %v", err)
    }
    fmt.Println(gtask)
}

I made a docker container for it:

FROM gcr.io/distroless/base
ENV GRPC_GO_LOG_SEVERITY_LEVEL=INFO
ADD put-get .
ENTRYPOINT ["./put-get"]

(Note the environment variable enabling gRPC logging.)

I tagged and pushed it:

docker build -t datastore-put-get .
docker tag datastore-put-get gcr.io/MY_PROJECT/datastore-put-get
gcloud docker -- push gcr.io/MY_PROJECT/datastore-put-get

I wrote a pod yaml:

apiVersion: v1
kind: Pod
metadata:
  name: datastore-put-get
spec:
  containers:
  - name: datastore-put-get
    image: gcr.io/MY_PROJECT/datastore-put-get
    args: [-project, MY_PROJECT]

Then I ran it on my GKE cluster and grabbed the output:

$ kubectl create -f put-get.yaml 
pod "datastore-put-get" created
$ kubectl logs datastore-put-get
INFO: 2018/03/09 18:57:18 dialing to target with scheme: ""
2018/03/09 18:57:18 created client
INFO: 2018/03/09 18:57:18 ccResolverWrapper: sending new addresses to cc: [{datastore.googleapis.com:443 0  <nil>}]
INFO: 2018/03/09 18:57:18 ClientConn switching balancer to "pick_first"
INFO: 2018/03/09 18:57:18 pickfirstBalancer: HandleSubConnStateChange: 0xc420180630, CONNECTING
INFO: 2018/03/09 18:57:18 pickfirstBalancer: HandleSubConnStateChange: 0xc420180630, READY
2018/03/09 18:57:19 Put succeeded
{Buy milk}

Could you duplicate that and see if it works? If it does, how do your real code and commands differ from these?

Thanks for looking into this!

I put that code into my setup and also set the GRPC_GO_LOG_SEVERITY_LEVEL environment variable.

Here's what I got in the logs (newest entries first):

INFO: 2018/03/09 21:38:33 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:33 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:33 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:26 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:26 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:26 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:22 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:22 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:22 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:19 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:18 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:18 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:18 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:17 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:17 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:17 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:17 ClientConn switching balancer to "pick_first"
INFO: 2018/03/09 21:38:17 ccResolverWrapper: sending new addresses to cc: [{datastore.googleapis.com:443 0 <nil>}]
2018/03/09 21:38:17 created client
INFO: 2018/03/09 21:38:17 dialing to target with scheme: ""

Could you be using alpine? See #791.

Ah! Yes I am running it on alpine.

Adding RUN apk --no-cache --update add ca-certificates to my Dockerfile did the trick!

Jeffd, if we have a multi-stage build, would you run 'apk add ca-certificates' in the build stage, the final stage, or both? We use golang:alpine for the builder stage and alpine:latest for the final stage that copies from the builder.

We also build the Go binary with 'RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo', but gRPC calls still hang no matter what we try.

Below are the gRPC logs:

INFO: 2018/11/14 17:09:29 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, CONNECTING
WARNING: 2018/11/14 17:09:29 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
INFO: 2018/11/14 17:09:29 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, TRANSIENT_FAILURE
INFO: 2018/11/14 17:09:30 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, CONNECTING
WARNING: 2018/11/14 17:09:30 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
INFO: 2018/11/14 17:09:30 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, TRANSIENT_FAILURE
INFO: 2018/11/14 17:09:32 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, CONNECTING
WARNING: 2018/11/14 17:09:32 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
INFO: 2018/11/14 17:09:32 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, TRANSIENT_FAILURE

On Google Cloud Platform, I start an instance from a Docker image (a VM with Container-Optimized OS). The image is built from scratch and contains only a compiled Go binary. Setting GRPC_GO_LOG_SEVERITY_LEVEL to INFO shows that here, too, the gRPC call underlying datastoreClient.Put() fails silently because of an x509 unknown certificate authority error.

My Docker image is based on scratch, contains only the binary, and exposes ports 80 and 443. Since it is based on scratch rather than Alpine, I cannot use the
"RUN apk --no-cache --update add ca-certificates"
trick unless I switch to a multi-stage build.

Is there any other way to include ca-certificates?
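
One possibility, offered as a sketch rather than a tested recipe for this setup, is to copy a CA bundle file into the image next to the binary and have the program load it itself. On Unix systems Go also reads the bundle named by the SSL_CERT_FILE environment variable, which may be enough without code changes. The example below shows the explicit route using only the standard library. The name rootsFromPEM and the path /ca-certificates.crt are assumptions for illustration, and wiring the resulting tls.Config into the Datastore client's gRPC connection is left out.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// rootsFromPEM builds a certificate pool from PEM-encoded CA certificates.
// It fails if the data contains no parsable certificate.
func rootsFromPEM(pemBytes []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no certificates found in PEM data")
	}
	return pool, nil
}

func main() {
	// caPath is a hypothetical location; the bundle must be copied into
	// the image at whatever path is used here.
	const caPath = "/ca-certificates.crt"

	pemBytes, err := os.ReadFile(caPath)
	if err != nil {
		fmt.Println("cannot read CA bundle:", err)
		return
	}
	pool, err := rootsFromPEM(pemBytes)
	if err != nil {
		fmt.Println("cannot parse CA bundle:", err)
		return
	}

	// A config like this could back the TLS credentials of a connection.
	cfg := &tls.Config{RootCAs: pool}
	fmt.Println("loaded custom roots:", cfg.RootCAs != nil)
}
```

Either route still requires getting a trusted bundle file into the image, so the bundle has to come from somewhere, such as the build host or another image.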

...

I'm migrating from App Engine, where I did not need client.Put(); I used an older package and simply called datastore.Put(ctx, key, entity), so I did not have to care about TLS, gRPC, or certificates.

Does anybody have an idea?

@JohnAntonusMaximus If I understood correctly, the first stage of the Docker build should build your Go binary and handle imports and tests; the second stage should start from a scratch image and copy in only the artifacts the app needs.

@twiggg The scratch distro appears to have a package manager https://github.com/emmett1/scratchpkg. I would imagine you could add it using this.

I finally just used a multi-stage Docker image, which 1) installs certificates from an alpine image, then 2) creates the final image from scratch, copying my app's binary and the certificates.

Works fine.
