Ambassador: Ambassador not working on freshly installed Kubernetes

Created on 16 May 2018 · 41 Comments · Source: datawire/ambassador

Describe the bug
The Ambassador container within the ambassador pod does not start properly on a local cluster running Kubernetes 1.10.2.

To Reproduce
Steps to reproduce the behavior:
A fresh install of kubernetes 1.10.2 on a local cluster.
Follow the guidance to install ambassador-no-rbac.

Expected behavior
I expect Ambassador to start, but the container exits with code 137. Both the liveness probe and the readiness probe fail with getsockopt: connection refused.

ambassador-6c7dd7799b-2mjzx   1/2       CrashLoopBackOff   13         37m
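
For context on the exit code reported above: by shell convention, an exit code above 128 means the process was terminated by a signal, and the signal number is the code minus 128. A minimal Python sketch of that decoding follows; the helper name `describe_exit_code` is hypothetical and is not part of Kubernetes or Ambassador.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a container exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

On Linux, 137 decodes to SIGKILL (signal 9), which is consistent with the kubelet killing a container that keeps failing its liveness probe.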

Versions (please complete the following information):

Ambassador: 0.32.1
Kubernetes environment: local cluster installed using kubeadm. Configured with calico.
Version 1.10.2
Ubuntu 16.04

Additional context
@richarddli suggested this is due to a DNS problem, but the local DNS (/etc/resolv.conf) is configured with 8.8.8.8 and 8.8.4.4, and I am sure I can at least reach Google. (I deployed a busybox pod into my cluster and confirmed that the busybox container can ping google.com, but I am not sure about others.)
Here are some extra logs for debugging:

kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0503 10:46:23.833551       1 dns.go:48] version: 1.14.8
I0503 10:46:23.862220       1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I0503 10:46:23.862374       1 server.go:119] FLAG: --alsologtostderr="false"
I0503 10:46:23.862433       1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I0503 10:46:23.862466       1 server.go:119] FLAG: --config-map=""
I0503 10:46:23.862485       1 server.go:119] FLAG: --config-map-namespace="kube-system"
I0503 10:46:23.862504       1 server.go:119] FLAG: --config-period="10s"
I0503 10:46:23.862532       1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I0503 10:46:23.862551       1 server.go:119] FLAG: --dns-port="10053"
I0503 10:46:23.862592       1 server.go:119] FLAG: --domain="cluster.local."
I0503 10:46:23.862621       1 server.go:119] FLAG: --federations=""
I0503 10:46:23.862648       1 server.go:119] FLAG: --healthz-port="8081"
I0503 10:46:23.862668       1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I0503 10:46:23.862688       1 server.go:119] FLAG: --kube-master-url=""
I0503 10:46:23.862742       1 server.go:119] FLAG: --kubecfg-file=""
I0503 10:46:23.862761       1 server.go:119] FLAG: --log-backtrace-at=":0"
I0503 10:46:23.862794       1 server.go:119] FLAG: --log-dir=""
I0503 10:46:23.862815       1 server.go:119] FLAG: --log-flush-frequency="5s"
I0503 10:46:23.862835       1 server.go:119] FLAG: --logtostderr="true"
I0503 10:46:23.862854       1 server.go:119] FLAG: --nameservers=""
I0503 10:46:23.862872       1 server.go:119] FLAG: --stderrthreshold="2"
I0503 10:46:23.862891       1 server.go:119] FLAG: --v="2"
I0503 10:46:23.862911       1 server.go:119] FLAG: --version="false"
I0503 10:46:23.862953       1 server.go:119] FLAG: --vmodule=""
I0503 10:46:23.863269       1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I0503 10:46:23.864231       1 server.go:220] Skydns metrics enabled (/metrics:10055)
I0503 10:46:23.864282       1 dns.go:146] Starting endpointsController
I0503 10:46:23.864363       1 dns.go:149] Starting serviceController
I0503 10:46:23.901849       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0503 10:46:23.901936       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0503 10:46:24.364803       1 dns.go:170] Initialized services and endpoints from apiserver
I0503 10:46:24.364871       1 server.go:135] Setting up Healthz Handler (/readiness)
I0503 10:46:24.364935       1 server.go:140] Setting up cache handler (/cache)
I0503 10:46:24.364962       1 server.go:126] Status HTTP port 8081
I0504 04:33:43.169511       1 dns.go:555] Could not find endpoints for service "tf-hub-0" in namespace "kubeflow". DNS records will be created once endpoints show up.
I0515 01:02:23.572540       1 dns.go:555] Could not find endpoints for service "tf-hub-0" in namespace "kubeflow". DNS records will be created once endpoints show up.

kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq
I0503 10:46:24.860136       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0503 10:46:24.870496       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0503 10:46:25.334307       1 nanny.go:116] dnsmasq[12]: started, version 2.78 cachesize 1000
I0503 10:46:25.334618       1 nanny.go:116] dnsmasq[12]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0503 10:46:25.334632       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0503 10:46:25.334638       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0503 10:46:25.334643       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0503 10:46:25.334651       1 nanny.go:116] dnsmasq[12]: reading /etc/resolv.conf
I0503 10:46:25.334657       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0503 10:46:25.334662       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0503 10:46:25.334668       1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0503 10:46:25.334672       1 nanny.go:116] dnsmasq[12]: using nameserver 8.8.8.8#53
I0503 10:46:25.334678       1 nanny.go:116] dnsmasq[12]: using nameserver 8.8.4.4#53
I0503 10:46:25.334716       1 nanny.go:116] dnsmasq[12]: read /etc/hosts - 7 addresses
I0503 10:46:25.334851       1 nanny.go:119]
W0503 10:46:25.334861       1 nanny.go:120] Got EOF from stdout

kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c sidecar
I0503 10:46:25.240827       1 main.go:51] Version v1.14.8
I0503 10:46:25.240868       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
I0503 10:46:25.240906       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
I0503 10:46:25.240992       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
W0503 10:46:25.241374       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:58432->127.0.0.1:53: read: connection refused

Can anyone help me have a look? Many thanks in advance!

jiaanguo

Most helpful comment

@victortrac can you please provide more detail how you fixed this?

ttfreeman on 18 Sep 2019

👍2

All 41 comments

Hey! Could you also try doing kubectl logs $ambassadorpod -c ambassador and/or kubectl describe pod $ambassadorpod? My first thought is actually RBAC, but the logs will be a big help. Thanks!

kflynn on 16 May 2018

Hi @kflynn,
These are my logs

kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs ambassador-99dcfd54c-pbrw9 -n kubeflow -c ambassador
./entrypoint.sh: set: line 63: can't access tty; job control turned off

kubernetes@local-cluster-0:~/my-kubeflow$ kubectl describe pod ambassador-99dcfd54c-pbrw9 -n kubeflow
Labels:         pod-template-hash=558798107
                service=ambassador
Annotations:    <none>
Status:         Running
IP:             192.168.126.70
Controlled By:  ReplicaSet/ambassador-99dcfd54c
Containers:
  ambassador:
    Container ID:   docker://f5b5493b61ffc5ebc1670e17150d40a3c111b3b035e4adbe74f7539ff0752394
    Image:          quay.io/datawire/ambassador:0.26.0
    Image ID:       docker-pullable://quay.io/datawire/ambassador@sha256:76ecbafc7716006b76255916b2a2f4cadb8adbd664fa909aee9d8330773d158d
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Thu, 17 May 2018 19:36:30 +1000
      Finished:     Thu, 17 May 2018 19:38:29 +1000
    Ready:          False
    Restart Count:  374
    Limits:
      cpu:     1
      memory:  400Mi
    Requests:
      cpu:      200m
      memory:   100Mi
    Liveness:   http-get http://:8877/ambassador/v0/check_alive delay=30s timeout=1s period=30s #success=1 #failure=3
    Readiness:  http-get http://:8877/ambassador/v0/check_ready delay=30s timeout=1s period=30s #success=1 #failure=3
    Environment:
      AMBASSADOR_NAMESPACE:         kubeflow (v1:metadata.namespace)
      AMBASSADOR_SINGLE_NAMESPACE:  true
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from ambassador-token-bwzbp (ro)
  statsd:
    Container ID:   docker://9c82a8a224b16dfc6948ec2244c5d386b04303876ffa7de02ddfe44cbfa5e530
    Image:          quay.io/datawire/statsd:0.22.0
    Image ID:       docker-pullable://quay.io/datawire/statsd@sha256:38f2bd8ddc299762f523cab8e485f980ee90b3f25a3c22a30fab401dab8527f7
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 16 May 2018 15:50:51 +1000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from ambassador-token-bwzbp (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  ambassador-token-bwzbp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ambassador-token-bwzbp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                 From                      Message
  ----     ------     ----                ----                      -------
  Warning  Unhealthy  6m (x1121 over 1d)  kubelet, local-cluster-2  Readiness probe failed: Get http://192.168.126.70:8877/ambassador/v0/check_ready: dial tcp 192.168.126.70:8877: getsockopt: connection refused
  Warning  BackOff    1m (x4425 over 1d)  kubelet, local-cluster-2  Back-off restarting failed container
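
As a rough cross-check of the timestamps in the describe output above, the time from container start to a forced kill can be estimated from the liveness probe settings. This is only a sketch under stated assumptions: every probe fails, the pod uses the Kubernetes default termination grace period of 30 seconds, the process does not exit on SIGTERM, and probe jitter is ignored. The function name `estimated_kill_time` is hypothetical.

```python
def estimated_kill_time(initial_delay: int, period: int,
                        failure_threshold: int, grace_period: int = 30) -> int:
    """Rough seconds from container start to SIGKILL when every liveness probe fails."""
    # The first probe runs after the initial delay; each further one a period later.
    last_failed_probe = initial_delay + period * (failure_threshold - 1)
    # After the final failure the container gets SIGTERM, then SIGKILL once the grace period ends.
    return last_failed_probe + grace_period


# Values from the Liveness line above: delay=30s period=30s #failure=3
print(estimated_kill_time(30, 30, 3))
```

With those values the estimate is 120 seconds, close to the 119 seconds between the Started and Finished timestamps above.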

jiaanguo on 17 May 2018

@jiaanguo Hmmm. I don't actually see the kubectl logs output -- should that have been attached? (This might be a good candidate for the Gitter channel, too.)

kflynn on 22 May 2018

@kflynn

kubernetes@local-cluster-0:~$ kubectl logs ambassador-647b46cdf5-mjzmd -n kubeflow ambassador
./entrypoint.sh: set: line 63: can't access tty; job control turned off

I do not think Ambassador produces any log output at all.
It just shuts down silently.

jiaanguo on 23 May 2018

So it looks like your Ambassador pod is running 0.26.0 with a 0.22.0 statsd container. The mismatch isn't really supported, and Kubeflow is currently shipping with Ambassador 0.30.1. I think the first thing to do here is to upgrade to the latest Kubeflow, and let's see what happens.

kflynn on 23 May 2018

Same issue on a brand new kops-built AWS cluster running kubernetes 1.8.7 and calico. I tried Ambassador 0.34.0 as well as the latest master image.

victortrac on 24 May 2018

@victortrac do you have your logs? that would be a big help. thanks!

richarddli on 24 May 2018

@richarddli Here's the describe from one of the pods:

  Normal   SuccessfulMountVolume  4m                kubelet, ip-10-201-44-14.ec2.internal  MountVolume.SetUp succeeded for volume "ambassador-token-6hndh"
  Normal   Pulling                4m                kubelet, ip-10-201-44-14.ec2.internal  pulling image "quay.io/datawire/ambassador:master-f982c77"
  Normal   Pulled                 4m                kubelet, ip-10-201-44-14.ec2.internal  Successfully pulled image "quay.io/datawire/ambassador:master-f982c77"
  Normal   Created                4m                kubelet, ip-10-201-44-14.ec2.internal  Created container
  Normal   Started                4m                kubelet, ip-10-201-44-14.ec2.internal  Started container
  Normal   Pulled                 4m                kubelet, ip-10-201-44-14.ec2.internal  Container image "quay.io/datawire/statsd:0.34.0" already present on machine
  Normal   Created                4m                kubelet, ip-10-201-44-14.ec2.internal  Created container
  Normal   Started                4m                kubelet, ip-10-201-44-14.ec2.internal  Started container
  Warning  Unhealthy              3m (x3 over 3m)   kubelet, ip-10-201-44-14.ec2.internal  Liveness probe failed: Get http://100.119.139.133:8877/ambassador/v0/check_alive: dial tcp 100.119.139.133:8877: getsockopt: connection refused
  Warning  Unhealthy              3m (x12 over 3m)  kubelet, ip-10-201-44-14.ec2.internal  Readiness probe failed: Get http://100.119.139.133:8877/ambassador/v0/check_ready: dial tcp 100.119.139.133:8877: getsockopt: connection refused
  Normal   Killing                3m                kubelet, ip-10-201-44-14.ec2.internal  Killing container with id docker://ambassador:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled                 3m                kubelet, ip-10-201-44-14.ec2.internal  Container image "quay.io/datawire/ambassador:master-f982c77" already present on machine

It looks like the issue is here:

$ kubectl logs ambassador-npgw4 ambassador
./entrypoint.sh: set: line 65: can't access tty; job control turned off

https://github.com/datawire/ambassador/blob/master/ambassador/entrypoint.sh#L65

It looks like the container never actually starts envoy, so once the liveness probe has failed enough times, k8s kills it.

I'm not really sure what the job control stuff is doing. Did the base Docker image change in a way that could cause this breakage?

victortrac on 24 May 2018

Ok, fixed my problem. The entrypoint.sh error was a red herring. After investigation, it turns out the issue was that kubewatch.py was hanging due to a serviceAccount mismatch and not having network access to the Kube API.

I had Ambassador running in its own namespace, but the subject of the ClusterRoleBinding referred to the service account in the default namespace. After fixing that, Ambassador starts as expected.
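
A minimal Python sketch of how such a mismatch could be spotted, assuming the binding has been loaded as a dict (for example from `kubectl get clusterrolebinding ambassador -o json`). The function name and the `my-namespace` value are hypothetical and only illustrate the check.

```python
def binding_misses_service_account(binding: dict, sa_name: str, sa_namespace: str) -> bool:
    """Return True if no subject of the binding matches the given ServiceAccount."""
    for subject in binding.get("subjects") or []:
        if (subject.get("kind") == "ServiceAccount"
                and subject.get("name") == sa_name
                and subject.get("namespace") == sa_namespace):
            return False
    return True


binding = {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "ambassador"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "ambassador", "namespace": "default"},
    ],
}

# Ambassador runs in a hypothetical "my-namespace", but the subject points at "default".
print(binding_misses_service_account(binding, "ambassador", "my-namespace"))
```

The check reports a miss whenever the namespace in `subjects` differs from the namespace the pod's service account actually lives in.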

It would have saved a lot of time to put some echos in entrypoint.sh or some logging messages in kubewatch.py. I may send in a PR.

victortrac on 25 May 2018

Nice work debugging! A PR would be great.

I wonder if there's something we could be doing in the diagnostics service where we could test the general Kubernetes configuration (can you access the Kube API, for example) which could make things easier to troubleshoot?
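
One possible shape for such a check, sketched in Python with only the standard library. It assumes it runs inside a pod with the usual service account mount and the `KUBERNETES_SERVICE_HOST` / `KUBERNETES_SERVICE_PORT` environment variables; the function names are hypothetical and this is not Ambassador's diagnostics code.

```python
import os
import ssl
import urllib.request

# Standard mount point for the pod's service account credentials.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def api_server_url(env=os.environ) -> str:
    """Build the in-cluster API server URL from the standard environment variables."""
    return f"https://{env['KUBERNETES_SERVICE_HOST']}:{env['KUBERNETES_SERVICE_PORT']}"


def can_query_api(timeout: float = 5.0) -> bool:
    """Try an authenticated GET of /version and report whether it succeeded."""
    try:
        with open(os.path.join(SA_DIR, "token")) as f:
            token = f.read().strip()
        context = ssl.create_default_context(cafile=os.path.join(SA_DIR, "ca.crt"))
        request = urllib.request.Request(
            api_server_url() + "/version",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return response.status == 200
    except (OSError, KeyError) as exc:
        # Covers a missing token file, missing env vars, TLS, network, and HTTP errors.
        print(f"cannot query Kubernetes API: {exc}")
        return False
```

A check like this fails fast with a message instead of hanging, so the reason shows up in `kubectl logs`.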

richarddli on 25 May 2018

@victortrac So the bizarre thing here is that all the processes started by entrypoint.sh are, in fact, supposed to be logging to stdout, so that you can see the output easily with kubectl logs. It's quite strange to me that you're getting no output.

I've built a test version of Ambassador at dwflynn/ambassador:0.34.0-jobcontrol-0 -- could you try that, by any chance? It drops the attempt to muck with job control in entrypoint.sh, and tries to request unbuffered output for the Python processes started to handle the heavy lifting...
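
For readers unfamiliar with the buffering problem: when a Python process writes to a pipe instead of a terminal, its stdout is block-buffered by default, so output can be lost if the process hangs or is killed. The sketch below illustrates the effect with the interpreter's `-u` flag. It is an assumption that the test image uses a similar mechanism; the thread does not say which one it uses, and `run_child` is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_child(code: str, unbuffered: bool) -> str:
    """Run a Python snippet with stdout attached to a pipe and return what arrives."""
    args = [sys.executable]
    if unbuffered:
        args.append("-u")  # force unbuffered stdout and stderr
    args += ["-c", code]
    # Drop PYTHONUNBUFFERED so the buffered run really uses the default behaviour.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(args, stdout=subprocess.PIPE, env=env, text=True)
    return result.stdout


# os._exit() skips the flush at interpreter shutdown, standing in for a killed process.
CHILD = "import os; print('kubewatch: starting'); os._exit(0)"

print(repr(run_child(CHILD, unbuffered=False)))
print(repr(run_child(CHILD, unbuffered=True)))
```

With default buffering the message never reaches the pipe, while with `-u` it does. Setting the `PYTHONUNBUFFERED` environment variable to a non-empty value has the same effect as `-u`.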

Many thanks!

kflynn on 25 May 2018

This issue can be reproduced with a fresh minikube install as well (minikube 0.27, k8s 1.10.0).

Update: it seems the k8s API is not responsive from the pod because of kube-dns.
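
A quick way to test that theory from inside a pod is to check whether the API server's in-cluster name resolves at all. A minimal Python sketch, assuming the default `cluster.local` cluster domain (which matches the kube-dns flags earlier in this thread); the helper name `resolves` is hypothetical.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the name resolves through the resolver configured for this process."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Inside a pod this name is normally served by kube-dns.
    print(resolves("kubernetes.default.svc.cluster.local"))
```

If this prints False inside the pod while external names resolve, the problem is likely in the cluster DNS rather than in Ambassador itself.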

ihrwein on 26 May 2018

@kflynn I tried to reproduce the failure condition (where the ClusterRoleBinding had a ServiceAccount subject in the "default" namespace instead of my ambassador namespace):

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: ambassador
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ambassador
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: default

Here is the log output based on your debug image:

$ kubectl logs -n verocity -f ambassador-cg5tt ambassador 
2018-05-28 16:40:56 kubewatch 0.34.0-jobcontrol-0 INFO: generating config with gencount 1 (0 changes)
2018-05-28 16:40:56 kubewatch 0.34.0-jobcontrol-0 INFO: Scout reports {"latest_version": "0.32.2", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1527525656.155374}
[2018-05-28 16:40:56.563][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-28 16:40:56.563][8][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-28 16:40:56.571][8][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-28 16:40:56.571][8][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 9:diagd 10:envoy 11:kubewatch
[2018-05-28 16:40:56.782][12][info][main] source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-05-28 16:40:57.045][12][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-28 16:40:57.045][12][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-28 16:40:57.148][12][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-28 16:40:57.148][12][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
[2018-05-28 16:40:57.148][12][info][main] source/server/server.cc:343] all clusters initialized. initializing init manager
[2018-05-28 16:40:57.148][12][info][config] source/server/listener_manager_impl.cc:543] all dependencies initialized. starting workers
[2018-05-28 16:40:57.149][12][info][main] source/server/server.cc:359] starting main dispatch loop
2018-05-28 16:40:58 diagd 0.34.0-jobcontrol-0 INFO: Scout reports {"latest_version": "0.32.2", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1527525658.449703}
[2018-05-28 16:41:07.150][12][info][main] source/server/drain_manager_impl.cc:65] shutting down parent after drain

victortrac on 28 May 2018

@victortrac Wow! Many thanks. I'm going to land that unbuffering change, then, and then get kubewatch to be more vocal about being unable to talk to Kubernetes. Thanks!

kflynn on 29 May 2018

Hmmmmmmm. @victortrac, can you give me the full YAML you're using to configure things? When I try to reproduce with what I expect is the same, I'm getting a stack trace in the logs and an immediate restart.

kflynn on 29 May 2018

@kflynn I have one. Sorry for the format

apiVersion: v1
data:
  jupyterhub_config.py: |
    import json
    import os
    from kubespawner.spawner import KubeSpawner
    from jhub_remote_user_authenticator.remote_user_auth import RemoteUserAuthenticator
    from oauthenticator.github import GitHubOAuthenticator

    class KubeFormSpawner(KubeSpawner):

        # relies on HTML5 for image datalist
        def _options_form_default(self):
            return '''
        <label for='image'>Image</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input list="image" name="image" placeholder='repo/image:tag'>
        <datalist id="image">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.4.1-notebook-cpu:v20180419-0ad94c4e">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.4.1-notebook-gpu:v20180419-0ad94c4e">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.5.1-notebook-cpu:v20180419-0ad94c4e">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.5.1-notebook-gpu:v20180419-0ad94c4e">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-cpu:v20180419-0ad94c4e">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-gpu:v20180419-0ad94c4e">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-cpu:v20180419-0ad94c4e">
          <option value="gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu:v20180419-0ad94c4e">
        </datalist>
        <br/><br/>

        <label for='cpu_guarantee'>CPU</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input name='cpu_guarantee' placeholder='200m, 1.0, 2.5, etc'></input>
        <br/><br/>

        <label for='mem_guarantee'>Memory</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input name='mem_guarantee' placeholder='100Mi, 1.5Gi'></input>
        <br/><br/>

        <label for='extra_resource_limits'>Extra Resource Limits</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input name='extra_resource_limits' placeholder='{&apos;nvidia.com/gpu&apos;: &apos;3&apos;}'></input>
        <br/><br/>
        '''

        def options_from_form(self, formdata):
            options = {}
            options['image'] = formdata.get('image', [''])[0].strip()
            options['cpu_guarantee'] = formdata.get('cpu_guarantee', [''])[0].strip()
            options['mem_guarantee'] = formdata.get('mem_guarantee', [''])[0].strip()
            options['extra_resource_limits'] = formdata.get('extra_resource_limits', [''])[0].strip()
            return options

        @property
        def singleuser_image_spec(self):
            image = 'gcr.io/kubeflow/tensorflow-notebook-cpu'
            if self.user_options.get('image'):
                image = self.user_options['image']
            return image

        @property
        def cpu_guarantee(self):
            cpu = '500m'
            if self.user_options.get('cpu_guarantee'):
                cpu = self.user_options['cpu_guarantee']
            return cpu

        @property
        def mem_guarantee(self):
            mem = '1Gi'
            if self.user_options.get('mem_guarantee'):
                mem = self.user_options['mem_guarantee']
            return mem

        @property
        def extra_resource_limits(self):
            extra = ''
            if self.user_options.get('extra_resource_limits'):
                extra = json.loads(self.user_options['extra_resource_limits'])
            return extra

    ##################################################
    ### JupyterHub Options
    ##################################################
    c.JupyterHub.ip = '0.0.0.0'
    c.JupyterHub.hub_ip = '0.0.0.0'
    # Don't try to cleanup servers on exit - since in general for k8s, we want
    # the hub to be able to restart without losing user containers
    c.JupyterHub.cleanup_servers = False
    ##################################################

    ##################################################
    ### Spawner Options
    ##################################################
    c.JupyterHub.spawner_class = KubeFormSpawner
    c.KubeSpawner.singleuser_image_spec = 'gcr.io/kubeflow/tensorflow-notebook'
    c.KubeSpawner.cmd = 'start-singleuser.sh'
    c.KubeSpawner.args = ['--allow-root']
    # gpu images are very large ~15GB. need a large timeout.
    c.KubeSpawner.start_timeout = 60 * 30
    # Increase timeout to 5 minutes to avoid HTTP 500 errors on JupyterHub
    c.KubeSpawner.http_timeout = 60 * 5

    ##################################################
    ### Persistent volume options
    ##################################################
    # Using persistent storage requires a default storage class.
    # TODO(jlewi): Verify this works on minikube.
    # TODO(jlewi): Should we set c.KubeSpawner.singleuser_fs_gid = 1000
    # see https://github.com/kubeflow/kubeflow/pull/22#issuecomment-350500944
    pvc_mount = os.environ.get('NOTEBOOK_PVC_MOUNT')
    if pvc_mount and pvc_mount != 'null':
        c.KubeSpawner.user_storage_pvc_ensure = True
        # How much disk space do we want?
        c.KubeSpawner.user_storage_capacity = '10Gi'
        c.KubeSpawner.pvc_name_template = 'claim-{username}{servername}'
        c.KubeSpawner.volumes = [
          {
            'name': 'volume-{username}{servername}',
            'persistentVolumeClaim': {
              'claimName': 'claim-{username}{servername}'
            }
          }
        ]
        c.KubeSpawner.volume_mounts = [{
            'mountPath': pvc_mount,
            'name': 'volume-{username}{servername}'
          }
        ]

    ######## Authenticator ######
    c.JupyterHub.authenticator_class = 'dummyauthenticator.DummyAuthenticator'
kind: ConfigMap
metadata:
  name: jupyterhub-config
  namespace: kubeflow
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: tf-hub
  name: tf-hub-0
  namespace: kubeflow
spec:
  clusterIP: None
  ports:
  - name: hub
    port: 8000
  selector:
    app: tf-hub
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: tf-hub-lb
  name: tf-hub-lb
  namespace: kubeflow
spec:
  ports:
  - name: hub
    port: 80
    targetPort: 8000
  selector:
    app: tf-hub
  type: ClusterIP
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: tf-hub
  namespace: kubeflow
spec:
  replicas: 1
  serviceName: ""
  template:
    metadata:
      labels:
        app: tf-hub
    spec:
      containers:
      - command:
        - jupyterhub
        - -f
        - /etc/config/jupyterhub_config.py
        image: gcr.io/kubeflow/jupyterhub-k8s:1.0.1
        name: tf-hub
        ports:
        - containerPort: 8000
        - containerPort: 8081
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
      serviceAccountName: jupyter-hub
      volumes:
      - configMap:
          name: jupyterhub-config
        name: config-volume
  updateStrategy:
    type: RollingUpdate
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: jupyter-role
  namespace: kubeflow
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: jupyter-hub
  name: jupyter-hub
  namespace: kubeflow
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: jupyter-role
  namespace: kubeflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jupyter-role
subjects:
- kind: ServiceAccount
  name: jupyter-hub
  namespace: kubeflow
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tf-job-operator
  namespace: kubeflow
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: tf-job-operator
    spec:
      containers:
      - command:
        - /opt/mlkube/tf-operator
        - --controller-config-file=/etc/config/controller_config_file.yaml
        - --alsologtostderr
        - -v=1
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: gcr.io/kubeflow-images-public/tf_operator:v20180329-a7511ff
        name: tf-job-operator
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
      serviceAccountName: tf-job-operator
      volumes:
      - configMap:
          name: tf-job-operator-config
        name: config-volume
---
apiVersion: v1
data:
  controller_config_file.yaml: |-
    {
        "grpcServerFilePath": "/opt/mlkube/grpc_tensorflow_server/grpc_tensorflow_server.py"
    }
kind: ConfigMap
metadata:
  name: tf-job-operator-config
  namespace: kubeflow
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: tf-job-operator
  name: tf-job-operator
  namespace: kubeflow
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    app: tf-job-operator
  name: tf-job-operator
rules:
- apiGroups:
  - tensorflow.org
  - kubeflow.org
  resources:
  - tfjobs
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - '*'
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  labels:
    app: tf-job-operator
  name: tf-job-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tf-job-operator
subjects:
- kind: ServiceAccount
  name: tf-job-operator
  namespace: kubeflow
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tfjobs.kubeflow.org
spec:
  group: kubeflow.org
  names:
    kind: TFJob
    plural: tfjobs
    singular: tfjob
  version: v1alpha1
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    app: tf-job-dashboard
  name: tf-job-dashboard
rules:
- apiGroups:
  - tensorflow.org
  - kubeflow.org
  resources:
  - tfjobs
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - '*'
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  labels:
    app: tf-job-dashboard
  name: tf-job-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tf-job-dashboard
subjects:
- kind: ServiceAccount
  name: tf-job-dashboard
  namespace: kubeflow
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: tfjobs-ui-mapping
      prefix: /tfjobs/
      rewrite: /tfjobs/
      service: tf-job-dashboard.kubeflow
  name: tf-job-dashboard
  namespace: kubeflow
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    name: tf-job-dashboard
  type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: tf-job-dashboard
  name: tf-job-dashboard
  namespace: kubeflow
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tf-job-dashboard
  namespace: kubeflow
spec:
  template:
    metadata:
      labels:
        name: tf-job-dashboard
    spec:
      containers:
      - command:
        - /opt/tensorflow_k8s/dashboard/backend
        image: gcr.io/kubeflow-images-public/tf_operator:v20180329-a7511ff
        name: tf-job-dashboard
        ports:
        - containerPort: 8080
      serviceAccountName: tf-job-dashboard
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador
  name: ambassador
  namespace: kubeflow
spec:
  ports:
  - name: ambassador
    port: 80
    targetPort: 80
  selector:
    service: ambassador
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador-admin
  name: ambassador-admin
  namespace: kubeflow
spec:
  ports:
  - name: ambassador-admin
    port: 8877
    targetPort: 8877
  selector:
    service: ambassador
  type: ClusterIP
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: ambassador
  namespace: kubeflow
rules:
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - update
  - patch
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ambassador
  namespace: kubeflow
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: ambassador
  namespace: kubeflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ambassador
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: kubeflow
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ambassador
  namespace: kubeflow
spec:
  replicas: 3
  template:
    metadata:
      labels:
        service: ambassador
      namespace: kubeflow
    spec:
      containers:
      - env:
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: AMBASSADOR_SINGLE_NAMESPACE
          value: "true"
        image: quay.io/datawire/ambassador:0.30.1
        livenessProbe:
          httpGet:
            path: /ambassador/v0/check_alive
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 30
        name: ambassador
        readinessProbe:
          httpGet:
            path: /ambassador/v0/check_ready
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 30
        resources:
          limits:
            cpu: 1
            memory: 400Mi
          requests:
            cpu: 200m
            memory: 100Mi
      - image: quay.io/datawire/statsd:0.30.1
        name: statsd
      restartPolicy: Always
      serviceAccountName: ambassador
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---yaml
      apiVersion: ambassador/v0
      kind: Mapping
      name: k8s-dashboard-ui-mapping
      prefix: /k8s/ui/
      rewrite: /
      tls: true
      service: kubernetes-dashboard.kube-system
  name: k8s-dashboard
  namespace: kubeflow
spec:
  ports:
  - port: 443
    targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard
  type: ClusterIP

372046933 on 30 May 2018

@372046933, if you wrap all of that in three backticks when pasting, it'll be _much_ more helpful! You can take a look at the Syntax Highlighting section in https://guides.github.com/features/mastering-markdown/ for more. Thanks!

kflynn on 30 May 2018

@kflynn Thank you. I will spend some time looking at Markdown.

372046933 on 31 May 2018

Here is one that I used on a fresh minikube install:

apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador-admin
  name: ambassador-admin
spec:
  type: NodePort
  ports:
  - name: ambassador-admin
    port: 8877
    targetPort: 8877
  selector:
    service: ambassador
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ambassador
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        service: ambassador
    spec:
      containers:
      - name: ambassador
        image: quay.io/datawire/ambassador:0.34.0
        resources:
          limits:
            cpu: 1
            memory: 400Mi
          requests:
            cpu: 200m
            memory: 100Mi
        env:
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        livenessProbe:
          httpGet:
            path: /ambassador/v0/check_alive
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 3
        readinessProbe:
          httpGet:
            path: /ambassador/v0/check_ready
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 3
      - name: statsd
        image: quay.io/datawire/statsd:0.34.0
      restartPolicy: Always

I get the following:

$ kubectl logs ambassador-7cbc8d6c78-m4qxd -c ambassador
./entrypoint.sh: set: line 65: can't access tty; job control turned off
Traceback (most recent call last):
  File "/application/kubewatch.py", line 493, in <module>
    main()
  File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/application/kubewatch.py", line 476, in main
    sync(restarter)
  File "/application/kubewatch.py", line 313, in sync
    for x in v1.list_namespaced_config_map(restarter.namespace).items ]
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12395, in list_namespaced_config_map
    (data) = self.list_namespaced_config_map_with_http_info(namespace, **kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12497, in list_namespaced_config_map_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 335, in call_api
    _preload_content, _request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 148, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 371, in request
    headers=headers)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 250, in GET
    query_params=query_params)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 240, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 01 Jun 2018 14:55:14 GMT', 'Content-Length': '269'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps is forbidden: User \"system:serviceaccount:default:default\" cannot list configmaps in the namespace \"default\"","reason":"Forbidden","details":{"kind":"configmaps"},"code":403}


AMBASSADOR: kubewatch sync exited with status 1
Here's the envoy.json we were trying to run with:
ls: /etc/envoy*.json: No such file or directory
No config generated.
AMBASSADOR: shutting down
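
For anyone decoding a similar failure, the useful part of the traceback is the JSON Status body on the last "HTTP response body" line. Below is a minimal sketch that pulls out who was denied what. The body is copied from the log above; the helper name explain_denial and the regular expression are illustrative assumptions, and the message wording may differ between Kubernetes versions.

```python
import json
import re

# Response body copied verbatim from the log above.
STATUS_BODY = r'''{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps is forbidden: User \"system:serviceaccount:default:default\" cannot list configmaps in the namespace \"default\"","reason":"Forbidden","details":{"kind":"configmaps"},"code":403}'''

# Assumed message shape; other Kubernetes versions may word this differently.
DENIAL = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) (?P<resource>\S+) '
    r'in the namespace "(?P<namespace>[^"]+)"'
)


def explain_denial(body):
    """Return the denied user, verb, resource and namespace from a Status body."""
    status = json.loads(body)
    message = status.get("message", "")
    match = DENIAL.search(message)
    if match is None:
        raise ValueError("unrecognized Forbidden message: %r" % message)
    info = match.groupdict()
    info["code"] = status.get("code")
    return info


if __name__ == "__main__":
    for key, value in explain_denial(STATUS_BODY).items():
        print(key, "=", value)
```

Here the denied user is the default service account of the default namespace, which is what a pod runs as when its Deployment sets no serviceAccountName, as in the YAML above.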

PierrickI3 on 1 Jun 2018

@PierrickI3 That looks like an RBAC failure. Did you start Minikube with RBAC? (Are you on our Slack? This might be easier there -- instructions are at www.getambassador.io :) ).

kflynn on 1 Jun 2018

I am not using RBAC. And yes, I am on Slack.

PierrickI3 on 1 Jun 2018

@PierrickI3 Hmm, I don't see you on the datawire-oss slack -- what's your username there?

I think it'd be interesting to try setting up RBAC on your minikube. https://www.getambassador.io/yaml/ambassador-rbac.yaml should work for you; can you give that a shot?

kflynn on 1 Jun 2018

I have passed --authorization-mode=Node,RBAC to kube-apiserver

372046933 on 4 Jun 2018

@372046933 let's try this -- see the "attach files by dragging & dropping, etc." area below the comment text box on this issue? Can you attach a zipfile of the YAML you tried to paste earlier?

Thanks!!

kflynn on 4 Jun 2018

@kflynn I just ran 'kubectl apply -f https://www.getambassador.io/yaml/ambassador/ambassador-rbac.yaml', but got the output below.

./entrypoint.sh: set: line 65: can't access tty; job control turned off
2018-06-05 09:32:59,483 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59 kubewatch 0.34.0 WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59,488 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59 kubewatch 0.34.0 WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59,493 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59 kubewatch 0.34.0 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/usr/lib/python3.6/ssl.py", line 814, in __init__
    self.do_handshake()
  File "/usr/lib/python3.6/ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/application/kubewatch.py", line 493, in <module>
    main()
  File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/application/kubewatch.py", line 476, in main
    sync(restarter)
  File "/application/kubewatch.py", line 313, in sync
    for x in v1.list_namespaced_config_map(restarter.namespace).items ]
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12395, in list_namespaced_config_map
    (data) = self.list_namespaced_config_map_with_http_info(namespace, **kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12497, in list_namespaced_config_map_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 335, in call_api
    _preload_content, _request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 148, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 371, in request
    headers=headers)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 250, in GET
    query_params=query_params)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 223, in request
    headers=headers)
  File "/usr/lib/python3.6/site-packages/urllib3/request.py", line 66, in request
    **urlopen_kw)
  File "/usr/lib/python3.6/site-packages/urllib3/request.py", line 87, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/usr/lib/python3.6/site-packages/urllib3/poolmanager.py", line 321, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
    **response_kw)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
    **response_kw)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
    **response_kw)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.111.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default/configmaps (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),))
AMBASSADOR: kubewatch sync exited with status 1
Here's the envoy.json we were trying to run with:
ls: /etc/envoy*.json: No such file or directory
No config generated.
AMBASSADOR: shutting down

abmybgx on 5 Jun 2018

@abmybgx I think this is a different issue (you're running into some sort of cert error) from the core issues mentioned above. Could you open a new issue, with details on your configuration (e.g., Kubernetes version, etc.)?

richarddli on 5 Jun 2018

@richarddli Thanks for your comment. After I changed self.verify_ssl from True to False in /usr/lib/python3.6/site-packages/kubernetes/client/configuration.py, it works now.

abmybgx on 6 Jun 2018

@PierrickI3 did you manage to fix your issue? I ran into the same error.

I was following the tutorial https://github.com/SeldonIO/seldon-core/blob/master/notebooks/ksonnet_ambassador_minikube.ipynb, but with RBAC enabled when starting minikube and using the RBAC version of the Ambassador YAML.

AdrianLsk on 8 Jun 2018

@AdrianLsk Ah, thanks for that! I'll try that and see what happens for me.

kflynn on 8 Jun 2018

@AdrianLsk I haven't been able to do it just yet. You should probably try @richarddli's comment above.
I would be interested in knowing what you found out though.

PierrickI3 on 8 Jun 2018

@kflynn great, hopefully you can find a solution :)

@PierrickI3 ah, that's a shame. Unfortunately, @richarddli's comment corresponds to an error due to a certificate verification failure, which is different from ours.

AdrianLsk on 8 Jun 2018

I think the root of our problem is the set of verbs specified for configmaps in the Ambassador RBAC YAML file:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: ambassador
rules:
- apiGroups: [""]
  resources:
  - services
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["create", "update", "patch", "get", "list", "watch"]
- apiGroups: [""]
  resources:
  - secrets
  verbs: ["get", "list", "watch"]

since the error says that:

configmaps is forbidden: User <user> cannot list configmaps in the namespace <namespace>,"reason":"Forbidden","details":{"kind":"configmaps"}

where @PierrickI3 has:
<user> = \"system:serviceaccount:default:default\"
<namespace> = \"default\"
and for me it's
<user> = \"system:serviceaccount:seldon:ambassador\"
<namespace> = \"seldon\"
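
As a quick sanity check on that hypothesis, here is a rough sketch that tests whether the rules quoted above grant a given verb on a resource. The rules are transcribed from the ClusterRole in this comment; the helper rules_allow is hypothetical and only approximates Kubernetes RBAC matching (exact names plus the "*" wildcard, no resourceNames or subresources).

```python
# Rules transcribed from the ClusterRole quoted above.
RULES = [
    {"apiGroups": [""], "resources": ["services"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["configmaps"],
     "verbs": ["create", "update", "patch", "get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["secrets"],
     "verbs": ["get", "list", "watch"]},
]


def _matches(allowed, wanted):
    """True if the wanted value is listed explicitly or covered by "*"."""
    return "*" in allowed or wanted in allowed


def rules_allow(rules, verb, resource, api_group=""):
    """Simplified check: does any single rule grant verb on resource?"""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


if __name__ == "__main__":
    print("list configmaps:", rules_allow(RULES, "list", "configmaps"))
    print("delete configmaps:", rules_allow(RULES, "delete", "configmaps"))
```

Under these simplifications the rules already grant list on configmaps, so the denial more likely comes from which service account the role is bound to than from the verb list itself.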

AdrianLsk on 8 Jun 2018

Augh. So. Seldon deploys into the seldon namespace, indeed, and Ambassador will need to be tweaked for that.

@AdrianLsk Are you on our Slack channel? www.getambassador.io has instructions, if not -- I'd like to give you a different ambassador.yaml to try, but it'll likely be easier to interact there.

kflynn on 8 Jun 2018

OK, so here's a better Ambassador deployment YAML for Seldon. I'll figure out how to create a PR for this for the Seldon folks.

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: ambassador
rules:
- apiGroups: [""]
  resources:
  - services
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["create", "update", "patch", "get", "list", "watch"]
- apiGroups: [""]
  resources:
  - secrets
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ambassador
  namespace: seldon
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: ambassador
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ambassador
# Both subjects live in one binding: ClusterRoleBindings are cluster-scoped,
# so two separate bindings both named "ambassador" cannot coexist.
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: default
- kind: ServiceAccount
  name: ambassador
  namespace: seldon
---
apiVersion: v1
kind: Service
metadata:
  name: ambassador
  namespace: seldon
spec:
  selector:
    service: ambassador
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
  type: NodePort
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador-admin
  name: ambassador-admin
  namespace: seldon
spec:
  ports:
  - name: ambassador-admin
    port: 8877
    targetPort: 8877
  selector:
    service: ambassador
  type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ambassador
  namespace: seldon
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: 'false'
      labels:
        service: ambassador
    spec:
      containers:
      - image: quay.io/datawire/ambassador:0.34.1
        name: ambassador
        env:
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        resources:
          limits:
            cpu: 1
            memory: 400Mi
          requests:
            cpu: 200m
            memory: 100Mi
      - image: quay.io/datawire/statsd:0.34.1
        name: statsd
      restartPolicy: Always
      serviceAccountName: ambassador

kflynn on 8 Jun 2018

Thanks to @AdrianLsk for checking this for me! Anyone want to give it a shot?

The issue is indeed the RBAC configuration around the separate namespace. It's not enough to create RBAC roles and accounts in the default namespace; you need some of them in the seldon namespace.
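
To make the namespace point concrete, here is a small sketch that checks whether a binding's subjects cover the service account named in a Forbidden error. It relies on the Kubernetes username convention system:serviceaccount:<namespace>:<name>; the helper names and the two subject lists are illustrative assumptions, and Group or User subjects are ignored.

```python
PREFIX = "system:serviceaccount:"


def parse_service_account(user):
    """Split 'system:serviceaccount:<namespace>:<name>' into (namespace, name)."""
    if not user.startswith(PREFIX):
        raise ValueError("not a service account user: %r" % user)
    namespace, _, name = user[len(PREFIX):].partition(":")
    if not namespace or not name:
        raise ValueError("malformed service account user: %r" % user)
    return namespace, name


def binding_covers(subjects, user):
    """True if a ServiceAccount subject matches both namespace and name."""
    namespace, name = parse_service_account(user)
    return any(
        subject.get("kind") == "ServiceAccount"
        and subject.get("name") == name
        and subject.get("namespace") == namespace
        for subject in subjects
    )


# Hypothetical subject lists in the shape of a ClusterRoleBinding's "subjects".
DEFAULT_ONLY = [
    {"kind": "ServiceAccount", "name": "ambassador", "namespace": "default"},
]
WITH_SELDON = DEFAULT_ONLY + [
    {"kind": "ServiceAccount", "name": "ambassador", "namespace": "seldon"},
]

if __name__ == "__main__":
    denied = "system:serviceaccount:seldon:ambassador"
    print("default-only binding covers it:", binding_covers(DEFAULT_ONLY, denied))
    print("binding with seldon covers it:", binding_covers(WITH_SELDON, denied))
```

A binding that lists only the default namespace's account does not cover the account Seldon's Ambassador pod runs as, which matches the error reported above.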

kflynn on 8 Jun 2018

cc @cliveseldon

richarddli on 8 Jun 2018

Thanks for the info. Yes, please create an issue on Seldon-Core so we can try to solve it. We have ksonnet and helm packages.

cliveseldon on 8 Jun 2018

@cliveseldon I believe if you want to just update your default install docs to use the above YAML, that would work.

richarddli on 8 Jun 2018

Opened https://github.com/SeldonIO/seldon-core/issues/165 with the YAML above for seldon.

kflynn on 12 Jun 2018

OK, I'm closing this issue because at this point, everything is pointing to RBAC. We've updated Ambassador's docs to talk about how to verify that RBAC is enabled, to hopefully prevent this from being a major problem in the future.

Please reopen this issue, or file a new one, if you see this problem again. Thanks!

kflynn on 18 Jun 2018

@victortrac can you please provide more detail on how you fixed this?

ttfreeman on 18 Sep 2019
