Describe the bug
The ambassador container in the Ambassador pod does not start properly on a local cluster running Kubernetes 1.10.2.
To Reproduce
Steps to reproduce the behavior:
A fresh install of kubernetes 1.10.2 on a local cluster.
Follow the guidance to install ambassador-no-rbac.
Expected behavior
I expect Ambassador to start, but the container exits with code 137. Both the liveness probe and the readiness probe fail with getsockopt: connection refused.
ambassador-6c7dd7799b-2mjzx 1/2 CrashLoopBackOff 13 37m
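For reference, exit code 137 follows the common convention of 128 plus a signal number, so it points to the container being killed by a signal rather than exiting on its own. A minimal Python sketch of that arithmetic, assuming a Linux host (the helper name `signal_from_exit_code` is made up for illustration):

```python
import signal


def signal_from_exit_code(exit_code: int) -> str:
    """Map a 128+N exit code to the name of signal N."""
    if exit_code <= 128:
        raise ValueError("exit code does not encode a signal")
    return signal.Signals(exit_code - 128).name


# 137 - 128 = 9, which is SIGKILL on Linux.
print(signal_from_exit_code(137))
```

This fits a container being killed after failed liveness probes, although 137 on its own cannot distinguish that from other causes of a SIGKILL, such as an out-of-memory kill.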
Versions (please complete the following information):
Additional context
@richarddli suggested this might be a DNS problem, but the local DNS (/etc/resolv.conf) is configured with 8.8.8.8 and 8.8.4.4, and I am sure I can at least reach Google. (I deployed a busybox container into my cluster and confirmed that it can ping google.com, but I have not checked other pods.)
Here are some extra logs for debugging:
kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0503 10:46:23.833551 1 dns.go:48] version: 1.14.8
I0503 10:46:23.862220 1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I0503 10:46:23.862374 1 server.go:119] FLAG: --alsologtostderr="false"
I0503 10:46:23.862433 1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I0503 10:46:23.862466 1 server.go:119] FLAG: --config-map=""
I0503 10:46:23.862485 1 server.go:119] FLAG: --config-map-namespace="kube-system"
I0503 10:46:23.862504 1 server.go:119] FLAG: --config-period="10s"
I0503 10:46:23.862532 1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I0503 10:46:23.862551 1 server.go:119] FLAG: --dns-port="10053"
I0503 10:46:23.862592 1 server.go:119] FLAG: --domain="cluster.local."
I0503 10:46:23.862621 1 server.go:119] FLAG: --federations=""
I0503 10:46:23.862648 1 server.go:119] FLAG: --healthz-port="8081"
I0503 10:46:23.862668 1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I0503 10:46:23.862688 1 server.go:119] FLAG: --kube-master-url=""
I0503 10:46:23.862742 1 server.go:119] FLAG: --kubecfg-file=""
I0503 10:46:23.862761 1 server.go:119] FLAG: --log-backtrace-at=":0"
I0503 10:46:23.862794 1 server.go:119] FLAG: --log-dir=""
I0503 10:46:23.862815 1 server.go:119] FLAG: --log-flush-frequency="5s"
I0503 10:46:23.862835 1 server.go:119] FLAG: --logtostderr="true"
I0503 10:46:23.862854 1 server.go:119] FLAG: --nameservers=""
I0503 10:46:23.862872 1 server.go:119] FLAG: --stderrthreshold="2"
I0503 10:46:23.862891 1 server.go:119] FLAG: --v="2"
I0503 10:46:23.862911 1 server.go:119] FLAG: --version="false"
I0503 10:46:23.862953 1 server.go:119] FLAG: --vmodule=""
I0503 10:46:23.863269 1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I0503 10:46:23.864231 1 server.go:220] Skydns metrics enabled (/metrics:10055)
I0503 10:46:23.864282 1 dns.go:146] Starting endpointsController
I0503 10:46:23.864363 1 dns.go:149] Starting serviceController
I0503 10:46:23.901849 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0503 10:46:23.901936 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0503 10:46:24.364803 1 dns.go:170] Initialized services and endpoints from apiserver
I0503 10:46:24.364871 1 server.go:135] Setting up Healthz Handler (/readiness)
I0503 10:46:24.364935 1 server.go:140] Setting up cache handler (/cache)
I0503 10:46:24.364962 1 server.go:126] Status HTTP port 8081
I0504 04:33:43.169511 1 dns.go:555] Could not find endpoints for service "tf-hub-0" in namespace "kubeflow". DNS records will be created once endpoints show up.
I0515 01:02:23.572540 1 dns.go:555] Could not find endpoints for service "tf-hub-0" in namespace "kubeflow". DNS records will be created once endpoints show up.
kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq
I0503 10:46:24.860136 1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0503 10:46:24.870496 1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0503 10:46:25.334307 1 nanny.go:116] dnsmasq[12]: started, version 2.78 cachesize 1000
I0503 10:46:25.334618 1 nanny.go:116] dnsmasq[12]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0503 10:46:25.334632 1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0503 10:46:25.334638 1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0503 10:46:25.334643 1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0503 10:46:25.334651 1 nanny.go:116] dnsmasq[12]: reading /etc/resolv.conf
I0503 10:46:25.334657 1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0503 10:46:25.334662 1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0503 10:46:25.334668 1 nanny.go:116] dnsmasq[12]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0503 10:46:25.334672 1 nanny.go:116] dnsmasq[12]: using nameserver 8.8.8.8#53
I0503 10:46:25.334678 1 nanny.go:116] dnsmasq[12]: using nameserver 8.8.4.4#53
I0503 10:46:25.334716 1 nanny.go:116] dnsmasq[12]: read /etc/hosts - 7 addresses
I0503 10:46:25.334851 1 nanny.go:119]
W0503 10:46:25.334861 1 nanny.go:120] Got EOF from stdout
kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c sidecar
I0503 10:46:25.240827 1 main.go:51] Version v1.14.8
I0503 10:46:25.240868 1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
I0503 10:46:25.240906 1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
I0503 10:46:25.240992 1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
W0503 10:46:25.241374 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:58432->127.0.0.1:53: read: connection refused
Can anyone help me have a look? Many thanks in advance!
Hey! Could you also try doing kubectl logs $ambassadorpod -c ambassador and/or kubectl describe pod $ambassadorpod? My first thought is actually RBAC, but the logs will be a big help. Thanks!
Hi @kflynn,
These are my logs
kubernetes@local-cluster-0:~/my-kubeflow$ kubectl logs ambassador-99dcfd54c-pbrw9 -n kubeflow -c ambassador
./entrypoint.sh: set: line 63: can't access tty; job control turned off
kubernetes@local-cluster-0:~/my-kubeflow$ kubectl describe pod ambassador-99dcfd54c-pbrw9 -n kubeflow
Labels: pod-template-hash=558798107
service=ambassador
Annotations: <none>
Status: Running
IP: 192.168.126.70
Controlled By: ReplicaSet/ambassador-99dcfd54c
Containers:
ambassador:
Container ID: docker://f5b5493b61ffc5ebc1670e17150d40a3c111b3b035e4adbe74f7539ff0752394
Image: quay.io/datawire/ambassador:0.26.0
Image ID: docker-pullable://quay.io/datawire/ambassador@sha256:76ecbafc7716006b76255916b2a2f4cadb8adbd664fa909aee9d8330773d158d
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 17 May 2018 19:36:30 +1000
Finished: Thu, 17 May 2018 19:38:29 +1000
Ready: False
Restart Count: 374
Limits:
cpu: 1
memory: 400Mi
Requests:
cpu: 200m
memory: 100Mi
Liveness: http-get http://:8877/ambassador/v0/check_alive delay=30s timeout=1s period=30s #success=1 #failure=3
Readiness: http-get http://:8877/ambassador/v0/check_ready delay=30s timeout=1s period=30s #success=1 #failure=3
Environment:
AMBASSADOR_NAMESPACE: kubeflow (v1:metadata.namespace)
AMBASSADOR_SINGLE_NAMESPACE: true
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from ambassador-token-bwzbp (ro)
statsd:
Container ID: docker://9c82a8a224b16dfc6948ec2244c5d386b04303876ffa7de02ddfe44cbfa5e530
Image: quay.io/datawire/statsd:0.22.0
Image ID: docker-pullable://quay.io/datawire/statsd@sha256:38f2bd8ddc299762f523cab8e485f980ee90b3f25a3c22a30fab401dab8527f7
Port: <none>
Host Port: <none>
State: Running
Started: Wed, 16 May 2018 15:50:51 +1000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from ambassador-token-bwzbp (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
ambassador-token-bwzbp:
Type: Secret (a volume populated by a Secret)
SecretName: ambassador-token-bwzbp
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 6m (x1121 over 1d) kubelet, local-cluster-2 Readiness probe failed: Get http://192.168.126.70:8877/ambassador/v0/check_ready: dial tcp 192.168.126.70:8877: getsockopt: connection refused
Warning BackOff 1m (x4425 over 1d) kubelet, local-cluster-2 Back-off restarting failed container
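The timestamps and probe settings in the describe output above can be compared with a little arithmetic. The sketch below is only a rough model: it assumes the first liveness probe fires after the initial delay, one probe runs per period, and the container is restarted after `#failure` consecutive failures. It ignores probe timeouts, jitter, and shutdown time.

```python
from email.utils import parsedate_to_datetime

# Values copied from the `kubectl describe pod` output above.
started = parsedate_to_datetime("Thu, 17 May 2018 19:36:30 +1000")
finished = parsedate_to_datetime("Thu, 17 May 2018 19:38:29 +1000")
observed_seconds = int((finished - started).total_seconds())

# Liveness probe settings from the same output: delay=30s period=30s #failure=3
initial_delay, period, failure_threshold = 30, 30, 3

# Rough lower bound (an assumption, see the note above): the third
# consecutive failure happens two periods after the first probe.
earliest_restart = initial_delay + (failure_threshold - 1) * period

print(observed_seconds, earliest_restart)
```

The container ran for roughly two minutes before being terminated, which is the same order of magnitude as the lower bound from the probe settings. That fits the liveness probe, rather than a crash inside the container, ending each run.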
@jiaanguo Hmmm. I don't actually see the kubectl logs output -- should that have been attached? (This might be a good candidate for the Gitter channel, too.)
@kflynn
kubernetes@local-cluster-0:~$ kubectl logs ambassador-647b46cdf5-mjzmd -n kubeflow ambassador
./entrypoint.sh: set: line 63: can't access tty; job control turned off
I don't think the ambassador container produces any log output at all.
It just shuts down silently.
So it looks like your Ambassador pod is running 0.26.0 with a 0.22.0 statsd container. The mismatch isn't really supported, and Kubeflow is currently shipping with Ambassador 0.30.1. I think the first thing to do here is to upgrade to the latest Kubeflow, and let's see what happens.
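A quick way to spot this kind of mismatch is to compare the image tags of the containers in the pod. The sketch below uses the two image references from the describe output above; the helper `image_tag` is hypothetical and only handles simple `name:tag` references.

```python
def image_tag(image: str) -> str:
    """Return the tag of an image reference such as registry/name:tag."""
    _, sep, tag = image.rpartition(":")
    # A "/" after the last ":" means the colon belonged to a registry port.
    if not sep or "/" in tag:
        raise ValueError(f"no tag in image reference: {image!r}")
    return tag


# Image references copied from the `kubectl describe pod` output.
images = {
    "ambassador": "quay.io/datawire/ambassador:0.26.0",
    "statsd": "quay.io/datawire/statsd:0.22.0",
}

tags = {name: image_tag(ref) for name, ref in images.items()}
mismatch = len(set(tags.values())) > 1
print(tags, mismatch)
```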
Same issue on a brand new kops-built AWS cluster running kubernetes 1.8.7 and calico. I tried Ambassador 0.34.0 as well as the latest master image.
@victortrac do you have your logs? that would be a big help. thanks!
@richarddli Here's the describe from one of the pods:
Normal SuccessfulMountVolume 4m kubelet, ip-10-201-44-14.ec2.internal MountVolume.SetUp succeeded for volume "ambassador-token-6hndh"
Normal Pulling 4m kubelet, ip-10-201-44-14.ec2.internal pulling image "quay.io/datawire/ambassador:master-f982c77"
Normal Pulled 4m kubelet, ip-10-201-44-14.ec2.internal Successfully pulled image "quay.io/datawire/ambassador:master-f982c77"
Normal Created 4m kubelet, ip-10-201-44-14.ec2.internal Created container
Normal Started 4m kubelet, ip-10-201-44-14.ec2.internal Started container
Normal Pulled 4m kubelet, ip-10-201-44-14.ec2.internal Container image "quay.io/datawire/statsd:0.34.0" already present on machine
Normal Created 4m kubelet, ip-10-201-44-14.ec2.internal Created container
Normal Started 4m kubelet, ip-10-201-44-14.ec2.internal Started container
Warning Unhealthy 3m (x3 over 3m) kubelet, ip-10-201-44-14.ec2.internal Liveness probe failed: Get http://100.119.139.133:8877/ambassador/v0/check_alive: dial tcp 100.119.139.133:8877: getsockopt: connection refused
Warning Unhealthy 3m (x12 over 3m) kubelet, ip-10-201-44-14.ec2.internal Readiness probe failed: Get http://100.119.139.133:8877/ambassador/v0/check_ready: dial tcp 100.119.139.133:8877: getsockopt: connection refused
Normal Killing 3m kubelet, ip-10-201-44-14.ec2.internal Killing container with id docker://ambassador:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 3m kubelet, ip-10-201-44-14.ec2.internal Container image "quay.io/datawire/ambassador:master-f982c77" already present on machine
It looks like the issue is here:
$ kubectl logs ambassador-npgw4 ambassador
./entrypoint.sh: set: line 65: can't access tty; job control turned off
https://github.com/datawire/ambassador/blob/master/ambassador/entrypoint.sh#L65
It looks like the container never actually starts Envoy, so once the liveness probe has failed a few times, k8s kills it.
I'm not really sure what the job control stuff is doing. Did the base Docker image change in a way that could cause this breakage?
Ok, fixed my problem. The entrypoint.sh error was a red herring. After investigation, it turns out the issue was kubewatch.py was hanging due to a serviceAccount mismatch and not having network access to the Kube API.
I had ambassador running in a namespace but the ClusterRoleBinding was trying to match the service account in the default namespace. After fixing that, ambassador is starting as expected.
It would have saved a lot of time to put some echos in entrypoint.sh or some logging messages in kubewatch.py. I may send in a PR.
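The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of Ambassador: it assumes the binding has already been loaded into a dict (for example from `kubectl get clusterrolebinding ambassador -o json`), and both the helper name `binding_covers` and the namespace `ambassador-ns` are hypothetical.

```python
def binding_covers(binding: dict, sa_name: str, sa_namespace: str) -> bool:
    """Return True if the binding lists the given ServiceAccount as a subject."""
    for subject in binding.get("subjects") or []:
        if (
            subject.get("kind") == "ServiceAccount"
            and subject.get("name") == sa_name
            and subject.get("namespace") == sa_namespace
        ):
            return True
    return False


# A binding whose subject points at the "default" namespace.
binding = {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "ambassador"},
    "roleRef": {"kind": "ClusterRole", "name": "ambassador"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "ambassador", "namespace": "default"}
    ],
}

# "ambassador-ns" stands in for the namespace the pod actually runs in.
print(binding_covers(binding, "ambassador", "ambassador-ns"))
print(binding_covers(binding, "ambassador", "default"))
```

If the first check is false, the service account used by the pod is not granted the ClusterRole, whatever the role itself allows.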
Nice work debugging! A PR would be great.
I wonder if there's something we could be doing in the diagnostics service where we could test the general Kubernetes configuration (can you access the Kube API, for example) which could make things easier to troubleshoot?
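As a rough idea of what such a check could look like, here is a small Python sketch. It is not the diagnostics service's code. It only tests TCP reachability of the API server, using the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables that Kubernetes sets inside pods; the function name `can_reach_api` is made up.

```python
import os
import socket


def can_reach_api(host=None, port=None, timeout=3.0) -> bool:
    """Return True if a TCP connection to the API server can be opened."""
    host = host or os.environ.get("KUBERNETES_SERVICE_HOST")
    port = port or os.environ.get("KUBERNETES_SERVICE_PORT")
    if not host or not port:
        # Not running in a pod, or the variables were not injected.
        return False
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    print("Kubernetes API reachable:", can_reach_api())
```

A successful connection says nothing about authentication or RBAC, so a fuller check would also need to make an authorized request and report the HTTP status.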
@victortrac So the bizarre thing here is that all the processes started by entrypoint.sh are, in fact, supposed to be logging to stdout, so that you can see the output easily with kubectl logs. It's quite strange to me that you're getting no output.
I've built a test version of Ambassador at dwflynn/ambassador:0.34.0-jobcontrol-0 -- could you try that, by any chance? It drops the attempt to muck with job control in entrypoint.sh, and tries to request unbuffered output for the Python processes started to handle the heavy lifting...
Many thanks!
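Some background on why unbuffered output matters here: when stdout is a pipe rather than a terminal, Python buffers it in blocks, so a process that hangs or is killed may never flush what it printed. One way to request unbuffered output is the interpreter's `-u` flag (the `PYTHONUNBUFFERED` environment variable has the same effect). The sketch below only illustrates the flag; how the test image does it is not stated in this thread.

```python
import subprocess
import sys

# Stand-in for a long-running helper process.
child_code = "print('hello from child')"

# -u asks the child interpreter not to buffer stdout and stderr, so each
# message reaches the pipe as soon as it is printed.
result = subprocess.run(
    [sys.executable, "-u", "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout, end="")
```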
This issue can be reproduced with a fresh minikube install as well (minikube 0.27, k8s 1.10.0).
Update: it seems like the k8s API is not responsive from the pod due to kube-dns.
@kflynn I tried to reproduce the failure condition (where the ClusterRoleBinding had a ServiceAccount subject in the "default" namespace instead of my ambassador namespace):
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: ambassador
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ambassador
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: default
Here is the log output based on your debug image:
$ kubectl logs -n verocity -f ambassador-cg5tt ambassador
2018-05-28 16:40:56 kubewatch 0.34.0-jobcontrol-0 INFO: generating config with gencount 1 (0 changes)
2018-05-28 16:40:56 kubewatch 0.34.0-jobcontrol-0 INFO: Scout reports {"latest_version": "0.32.2", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1527525656.155374}
[2018-05-28 16:40:56.563][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-28 16:40:56.563][8][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-28 16:40:56.571][8][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-28 16:40:56.571][8][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 9:diagd 10:envoy 11:kubewatch
[2018-05-28 16:40:56.782][12][info][main] source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-05-28 16:40:57.045][12][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-28 16:40:57.045][12][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-28 16:40:57.148][12][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-28 16:40:57.148][12][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
[2018-05-28 16:40:57.148][12][info][main] source/server/server.cc:343] all clusters initialized. initializing init manager
[2018-05-28 16:40:57.148][12][info][config] source/server/listener_manager_impl.cc:543] all dependencies initialized. starting workers
[2018-05-28 16:40:57.149][12][info][main] source/server/server.cc:359] starting main dispatch loop
2018-05-28 16:40:58 diagd 0.34.0-jobcontrol-0 INFO: Scout reports {"latest_version": "0.32.2", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1527525658.449703}
[2018-05-28 16:41:07.150][12][info][main] source/server/drain_manager_impl.cc:65] shutting down parent after drain
@victortrac Wow! Many thanks. I'm going to land that unbuffering change, then, and then get kubewatch to be more vocal about being unable to talk to Kubernetes. Thanks!
Hmmmmmmm. @victortrac, can you give me the full YAML you're using to configure things? When I try to reproduce with what I expect is the same, I'm getting a stack trace in the logs and an immediate restart.
@kflynn I have one. Sorry for the format
apiVersion: v1
data:
jupyterhub_config.py: |
import json
import os
from kubespawner.spawner import KubeSpawner
from jhub_remote_user_authenticator.remote_user_auth import RemoteUserAuthenticator
from oauthenticator.github import GitHubOAuthenticator

 class KubeFormSpawner(KubeSpawner):

 # relies on HTML5 for image datalist
def _options_form_default(self):
return '''
<label for='image'>Image</label>
<input list="image" name="image" placeholder='repo/image:tag'>
<datalist id="image">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.4.1-notebook-cpu:v20180419-0ad94c4e">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.4.1-notebook-gpu:v20180419-0ad94c4e">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.5.1-notebook-cpu:v20180419-0ad94c4e">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.5.1-notebook-gpu:v20180419-0ad94c4e">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-cpu:v20180419-0ad94c4e">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-gpu:v20180419-0ad94c4e">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-cpu:v20180419-0ad94c4e">
<option value="gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu:v20180419-0ad94c4e">
</datalist>
<br/><br/>

<label for='cpu_guarantee'>CPU</label>
<input name='cpu_guarantee' placeholder='200m, 1.0, 2.5, etc'></input>
<br/><br/>

<label for='mem_guarantee'>Memory</label>
<input name='mem_guarantee' placeholder='100Mi, 1.5Gi'></input>
<br/><br/>

<label for='extra_resource_limits'>Extra Resource Limits</label>
<input name='extra_resource_limits' placeholder='{'nvidia.com/gpu': '3'}'></input>
<br/><br/>
'''

 def options_from_form(self, formdata):
options = {}
options['image'] = formdata.get('image', [''])[0].strip()
options['cpu_guarantee'] = formdata.get('cpu_guarantee', [''])[0].strip()
options['mem_guarantee'] = formdata.get('mem_guarantee', [''])[0].strip()
options['extra_resource_limits'] = formdata.get('extra_resource_limits', [''])[0].strip()
return options

 @property
def singleuser_image_spec(self):
image = 'gcr.io/kubeflow/tensorflow-notebook-cpu'
if self.user_options.get('image'):
image = self.user_options['image']
return image

 @property
def cpu_guarantee(self):
cpu = '500m'
if self.user_options.get('cpu_guarantee'):
cpu = self.user_options['cpu_guarantee']
return cpu

 @property
def mem_guarantee(self):
mem = '1Gi'
if self.user_options.get('mem_guarantee'):
mem = self.user_options['mem_guarantee']
return mem

 @property
def extra_resource_limits(self):
extra = ''
if self.user_options.get('extra_resource_limits'):
extra = json.loads(self.user_options['extra_resource_limits'])
return extra

##################################################
### JupyterHub Options
##################################################
c.JupyterHub.ip = '0.0.0.0'
c.JupyterHub.hub_ip = '0.0.0.0'
# Don't try to cleanup servers on exit - since in general for k8s, we want
# the hub to be able to restart without losing user containers
c.JupyterHub.cleanup_servers = False
##################################################

##################################################
### Spawner Options
##################################################
c.JupyterHub.spawner_class = KubeFormSpawner
c.KubeSpawner.singleuser_image_spec = 'gcr.io/kubeflow/tensorflow-notebook'
c.KubeSpawner.cmd = 'start-singleuser.sh'
c.KubeSpawner.args = ['--allow-root']
# gpu images are very large ~15GB. need a large timeout.
c.KubeSpawner.start_timeout = 60 * 30
# Increase timeout to 5 minutes to avoid HTTP 500 errors on JupyterHub
c.KubeSpawner.http_timeout = 60 * 5

##################################################
### Persistent volume options
##################################################
# Using persistent storage requires a default storage class.
# TODO(jlewi): Verify this works on minikube.
# TODO(jlewi): Should we set c.KubeSpawner.singleuser_fs_gid = 1000
# see https://github.com/kubeflow/kubeflow/pull/22#issuecomment-350500944
pvc_mount = os.environ.get('NOTEBOOK_PVC_MOUNT')
if pvc_mount and pvc_mount != 'null':
c.KubeSpawner.user_storage_pvc_ensure = True
# How much disk space do we want?
c.KubeSpawner.user_storage_capacity = '10Gi'
c.KubeSpawner.pvc_name_template = 'claim-{username}{servername}'
c.KubeSpawner.volumes = [
{
'name': 'volume-{username}{servername}',
'persistentVolumeClaim': {
'claimName': 'claim-{username}{servername}'
}
}
]
c.KubeSpawner.volume_mounts = [
{
'mountPath': pvc_mount,
'name': 'volume-{username}{servername}'
}
]

 ######## Authenticator ######
c.JupyterHub.authenticator_class = 'dummyauthenticator.DummyAuthenticator'
kind: ConfigMap
metadata:
name: jupyterhub-config
namespace: kubeflow
---
apiVersion: v1
kind: Service
metadata:
labels:
app: tf-hub
name: tf-hub-0
namespace: kubeflow
spec:
clusterIP: None
ports:
- name: hub
port: 8000
selector:
app: tf-hub
---
apiVersion: v1
kind: Service
metadata:
labels:
app: tf-hub-lb
name: tf-hub-lb
namespace: kubeflow
spec:
ports:
- name: hub
port: 80
targetPort: 8000
selector:
app: tf-hub
type: ClusterIP
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: tf-hub
namespace: kubeflow
spec:
replicas: 1
serviceName: ""
template:
metadata:
labels:
app: tf-hub
spec:
containers:
- command:
- jupyterhub
- -f
- /etc/config/jupyterhub_config.py
image: gcr.io/kubeflow/jupyterhub-k8s:1.0.1
name: tf-hub
ports:
- containerPort: 8000
- containerPort: 8081
volumeMounts:
- mountPath: /etc/config
name: config-volume
serviceAccountName: jupyter-hub
volumes:
- configMap:
name: jupyterhub-config
name: config-volume
updateStrategy:
type: RollingUpdate
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
name: jupyter-role
namespace: kubeflow
rules:
- apiGroups:
- '*'
resources:
- '*'
verbs:
- '*'
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app: jupyter-hub
name: jupyter-hub
namespace: kubeflow
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
name: jupyter-role
namespace: kubeflow
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: jupyter-role
subjects:
- kind: ServiceAccount
name: jupyter-hub
namespace: kubeflow
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: tf-job-operator
namespace: kubeflow
spec:
replicas: 1
template:
metadata:
labels:
name: tf-job-operator
spec:
containers:
- command:
- /opt/mlkube/tf-operator
- --controller-config-file=/etc/config/controller_config_file.yaml
- --alsologtostderr
- -v=1
env:
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
image: gcr.io/kubeflow-images-public/tf_operator:v20180329-a7511ff
name: tf-job-operator
volumeMounts:
- mountPath: /etc/config
name: config-volume
serviceAccountName: tf-job-operator
volumes:
- configMap:
name: tf-job-operator-config
name: config-volume
---
apiVersion: v1
data:
controller_config_file.yaml: |-
{
"grpcServerFilePath": "/opt/mlkube/grpc_tensorflow_server/grpc_tensorflow_server.py"
}
kind: ConfigMap
metadata:
name: tf-job-operator-config
namespace: kubeflow
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app: tf-job-operator
name: tf-job-operator
namespace: kubeflow
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
labels:
app: tf-job-operator
name: tf-job-operator
rules:
- apiGroups:
- tensorflow.org
- kubeflow.org
resources:
- tfjobs
verbs:
- '*'
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- '*'
- apiGroups:
- storage.k8s.io
resources:
- storageclasses
verbs:
- '*'
- apiGroups:
- batch
resources:
- jobs
verbs:
- '*'
- apiGroups:
- ""
resources:
- configmaps
- pods
- services
- endpoints
- persistentvolumeclaims
- events
verbs:
- '*'
- apiGroups:
- apps
- extensions
resources:
- deployments
verbs:
- '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
labels:
app: tf-job-operator
name: tf-job-operator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: tf-job-operator
subjects:
- kind: ServiceAccount
name: tf-job-operator
namespace: kubeflow
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: tfjobs.kubeflow.org
spec:
group: kubeflow.org
names:
kind: TFJob
plural: tfjobs
singular: tfjob
version: v1alpha1
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
labels:
app: tf-job-dashboard
name: tf-job-dashboard
rules:
- apiGroups:
- tensorflow.org
- kubeflow.org
resources:
- tfjobs
verbs:
- '*'
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- '*'
- apiGroups:
- storage.k8s.io
resources:
- storageclasses
verbs:
- '*'
- apiGroups:
- batch
resources:
- jobs
verbs:
- '*'
- apiGroups:
- ""
resources:
- configmaps
- pods
- services
- endpoints
- persistentvolumeclaims
- events
verbs:
- '*'
- apiGroups:
- apps
- extensions
resources:
- deployments
verbs:
- '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
labels:
app: tf-job-dashboard
name: tf-job-dashboard
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: tf-job-dashboard
subjects:
- kind: ServiceAccount
name: tf-job-dashboard
namespace: kubeflow
---
apiVersion: v1
kind: Service
metadata:
annotations:
getambassador.io/config: |-
---
apiVersion: ambassador/v0
kind: Mapping
name: tfjobs-ui-mapping
prefix: /tfjobs/
rewrite: /tfjobs/
service: tf-job-dashboard.kubeflow
name: tf-job-dashboard
namespace: kubeflow
spec:
ports:
- port: 80
targetPort: 8080
selector:
name: tf-job-dashboard
type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app: tf-job-dashboard
name: tf-job-dashboard
namespace: kubeflow
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: tf-job-dashboard
namespace: kubeflow
spec:
template:
metadata:
labels:
name: tf-job-dashboard
spec:
containers:
- command:
- /opt/tensorflow_k8s/dashboard/backend
image: gcr.io/kubeflow-images-public/tf_operator:v20180329-a7511ff
name: tf-job-dashboard
ports:
- containerPort: 8080
serviceAccountName: tf-job-dashboard
---
apiVersion: v1
kind: Service
metadata:
labels:
service: ambassador
name: ambassador
namespace: kubeflow
spec:
ports:
- name: ambassador
port: 80
targetPort: 80
selector:
service: ambassador
type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
labels:
service: ambassador-admin
name: ambassador-admin
namespace: kubeflow
spec:
ports:
- name: ambassador-admin
port: 8877
targetPort: 8877
selector:
service: ambassador
type: ClusterIP
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
name: ambassador
namespace: kubeflow
rules:
- apiGroups:
- ""
resources:
- services
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- update
- patch
- get
- list
- watch
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
- list
- watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: ambassador
namespace: kubeflow
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
name: ambassador
namespace: kubeflow
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ambassador
subjects:
- kind: ServiceAccount
name: ambassador
namespace: kubeflow
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: ambassador
namespace: kubeflow
spec:
replicas: 3
template:
metadata:
labels:
service: ambassador
namespace: kubeflow
spec:
containers:
- env:
- name: AMBASSADOR_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: AMBASSADOR_SINGLE_NAMESPACE
value: "true"
image: quay.io/datawire/ambassador:0.30.1
livenessProbe:
httpGet:
path: /ambassador/v0/check_alive
port: 8877
initialDelaySeconds: 30
periodSeconds: 30
name: ambassador
readinessProbe:
httpGet:
path: /ambassador/v0/check_ready
port: 8877
initialDelaySeconds: 30
periodSeconds: 30
resources:
limits:
cpu: 1
memory: 400Mi
requests:
cpu: 200m
memory: 100Mi
- image: quay.io/datawire/statsd:0.30.1
name: statsd
restartPolicy: Always
serviceAccountName: ambassador
---
apiVersion: v1
kind: Service
metadata:
annotations:
getambassador.io/config: |-
---yaml
apiVersion: ambassador/v0
kind: Mapping
name: k8s-dashboard-ui-mapping
prefix: /k8s/ui/
rewrite: /
tls: true
service: kubernetes-dashboard.kube-system
name: k8s-dashboard
namespace: kubeflow
spec:
ports:
- port: 443
targetPort: 8443
selector:
k8s-app: kubernetes-dashboard
type: ClusterIP
@372046933, if you wrap all of that in three backticks when pasting, it'll be _much_ more helpful! You can take a look at the Syntax Highlighting section in https://guides.github.com/features/mastering-markdown/ for more. Thanks!
@kflynn Thank you. I will spend some time looking at Markdown.
Here is one that I used on a fresh minikube install:
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador-admin
  name: ambassador-admin
spec:
  type: NodePort
  ports:
  - name: ambassador-admin
    port: 8877
    targetPort: 8877
  selector:
    service: ambassador
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ambassador
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        service: ambassador
    spec:
      containers:
      - name: ambassador
        image: quay.io/datawire/ambassador:0.34.0
        resources:
          limits:
            cpu: 1
            memory: 400Mi
          requests:
            cpu: 200m
            memory: 100Mi
        env:
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        livenessProbe:
          httpGet:
            path: /ambassador/v0/check_alive
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 3
        readinessProbe:
          httpGet:
            path: /ambassador/v0/check_ready
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 3
      - name: statsd
        image: quay.io/datawire/statsd:0.34.0
      restartPolicy: Always
I get the following:
$ kubectl logs ambassador-7cbc8d6c78-m4qxd -c ambassador
./entrypoint.sh: set: line 65: can't access tty; job control turned off
Traceback (most recent call last):
File "/application/kubewatch.py", line 493, in <module>
main()
File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/application/kubewatch.py", line 476, in main
sync(restarter)
File "/application/kubewatch.py", line 313, in sync
for x in v1.list_namespaced_config_map(restarter.namespace).items ]
File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12395, in list_namespaced_config_map
(data) = self.list_namespaced_config_map_with_http_info(namespace, **kwargs)
File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12497, in list_namespaced_config_map_with_http_info
collection_formats=collection_formats)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 335, in call_api
_preload_content, _request_timeout)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 148, in __call_api
_request_timeout=_request_timeout)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 371, in request
headers=headers)
File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 250, in GET
query_params=query_params)
File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 240, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 01 Jun 2018 14:55:14 GMT', 'Content-Length': '269'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps is forbidden: User \"system:serviceaccount:default:default\" cannot list configmaps in the namespace \"default\"","reason":"Forbidden","details":{"kind":"configmaps"},"code":403}
AMBASSADOR: kubewatch sync exited with status 1
Here's the envoy.json we were trying to run with:
ls: /etc/envoy*.json: No such file or directory
No config generated.
AMBASSADOR: shutting down
@PierrickI3 That looks like an RBAC failure. Did you start Minikube with RBAC? (Are you on our Slack? This might be easier there -- instructions are at www.getambassador.io :) ).
I am not using RBAC. And yes, I am on slack
@PierrickI3 Hmm, I don't see you on the datawire-oss slack -- what's your username there?
I think it'd be interesting to try setting up RBAC on your minikube. https://www.getambassador.io/yaml/ambassador-rbac.yaml should work for you; can you give that a shot?
I have passed `--authorization-mode=Node,RBAC` to kube-apiserver.
@372046933 let's try this -- see the "attach files by dragging & dropping, etc." below the text box to add a comment to the PR? Can you attach a zipfile of the YAML you tried to paste earlier?
Thanks!!
@kflynn I just ran 'kubectl apply -f https://www.getambassador.io/yaml/ambassador/ambassador-rbac.yaml', but got the output below:
./entrypoint.sh: set: line 65: can't access tty; job control turned off
2018-06-05 09:32:59,483 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59 kubewatch 0.34.0 WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59,488 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59 kubewatch 0.34.0 WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59,493 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
2018-06-05 09:32:59 kubewatch 0.34.0 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)': /api/v1/namespaces/default/configmaps
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 326, in connect
ssl_context=context)
File "/usr/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/usr/lib/python3.6/ssl.py", line 814, in __init__
self.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 1068, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/application/kubewatch.py", line 493, in <module>
main()
File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/application/kubewatch.py", line 476, in main
sync(restarter)
File "/application/kubewatch.py", line 313, in sync
for x in v1.list_namespaced_config_map(restarter.namespace).items ]
File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12395, in list_namespaced_config_map
(data) = self.list_namespaced_config_map_with_http_info(namespace, **kwargs)
File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12497, in list_namespaced_config_map_with_http_info
collection_formats=collection_formats)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 335, in call_api
_preload_content, _request_timeout)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 148, in __call_api
_request_timeout=_request_timeout)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 371, in request
headers=headers)
File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 250, in GET
query_params=query_params)
File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 223, in request
headers=headers)
File "/usr/lib/python3.6/site-packages/urllib3/request.py", line 66, in request
**urlopen_kw)
File "/usr/lib/python3.6/site-packages/urllib3/request.py", line 87, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "/usr/lib/python3.6/site-packages/urllib3/poolmanager.py", line 321, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
**response_kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
**response_kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
**response_kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.111.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default/configmaps (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),))
AMBASSADOR: kubewatch sync exited with status 1
Here's the envoy.json we were trying to run with:
ls: /etc/envoy*.json: No such file or directory
No config generated.
AMBASSADOR: shutting down
@abmybgx I think this is a different issue (you're running into some sort of cert error) from the core issues mentioned above. Could you open a new issue, with details on your configuration (e.g., Kubernetes version, etc.)?
@richarddli Thanks for your comment. After I changed `self.verify_ssl` from `True` to `False` in /usr/lib/python3.6/site-packages/kubernetes/client/configuration.py, it works now.
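Setting verify_ssl to False turns off certificate checking for every call to the API server. For anyone who would rather keep verification on while debugging, here is a minimal sketch using only the Python standard library. It assumes the failure comes from the client not trusting the cluster CA, which this thread does not confirm. `build_verified_context` is a hypothetical helper; the CA path is where Kubernetes normally mounts the service account CA inside a pod.

```python
import os
import ssl

# Usual in-pod location of the cluster CA certificate.
IN_CLUSTER_CA = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"


def build_verified_context(ca_path=IN_CLUSTER_CA):
    """Return an SSL context that still verifies server certificates.

    If ca_path exists, it is loaded as the trusted root; otherwise the
    system default trust store is used. (Hypothetical helper, not part
    of Ambassador or the kubernetes client.)
    """
    if os.path.isfile(ca_path):
        return ssl.create_default_context(cafile=ca_path)
    return ssl.create_default_context()
```

Passing the resulting context to `urllib.request.urlopen(url, context=ctx)` against the API server address should show whether the mounted CA validates the server certificate, without touching the installed client library.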
@PierrickI3 did you manage to fix your issue? I ran into the same error.
I was following the tutorial https://github.com/SeldonIO/seldon-core/blob/master/notebooks/ksonnet_ambassador_minikube.ipynb
but with RBAC enabled when starting minikube and also using the right ambassador yaml with RBAC.
@AdrianLsk Ah, thanks for that! I'll try that and see what happens for me.
@AdrianLsk I haven't been able to do it just yet. You should probably try @richarddli's comment above.
I would be interested in knowing what you found out though.
@kflynn great, hopefully you can find a solution :)
@PierrickI3 ah, that's a shame. Unfortunately, @richarddli's comment addresses a certificate verification failure, which is a different error from ours.
I think the root of our problem is the set of verbs specified for configmaps in the Ambassador RBAC YAML file:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: ambassador
rules:
- apiGroups: [""]
  resources:
  - services
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["create", "update", "patch", "get", "list", "watch"]
- apiGroups: [""]
  resources:
  - secrets
  verbs: ["get", "list", "watch"]
since the error says that:
configmaps is forbidden: User <user> cannot list configmaps in the namespace <namespace>,"reason":"Forbidden","details":{"kind":"configmaps"}
where @PierrickI3 has:
<user> = "system:serviceaccount:default:default"
<namespace> = "default"
and for me it's:
<user> = "system:serviceaccount:seldon:ambassador"
<namespace> = "seldon"
Augh. So. Seldon deploys into the seldon namespace, indeed, and Ambassador will need to be tweaked for that.
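The two denials quoted above differ only in the service account and namespace, and both can be read straight out of the 403 body. A minimal sketch of pulling those fields apart, assuming the message keeps the wording shown in the log earlier in this thread (`parse_denial` and the `DENIAL` pattern are hypothetical names, not part of any Kubernetes client):

```python
import json
import re

# Matches the denial message format seen in the 403 body earlier in this thread.
DENIAL = re.compile(
    r'User "system:serviceaccount:(?P<sa_namespace>[^:"]+):(?P<sa_name>[^:"]+)" '
    r'cannot (?P<verb>\w+) (?P<resource>\w+) in the namespace "(?P<namespace>[^"]+)"'
)


def parse_denial(body):
    """Extract who was denied what from a 403 Status body (a JSON string).

    Returns a dict of the named groups, or None if the message does not
    look like a service account denial.
    """
    status = json.loads(body)
    match = DENIAL.search(status.get("message", ""))
    return match.groupdict() if match else None


# The response body from the first log in this thread.
body = (
    '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    '"message":"configmaps is forbidden: User \\"system:serviceaccount:default:default\\" '
    'cannot list configmaps in the namespace \\"default\\"",'
    '"reason":"Forbidden","details":{"kind":"configmaps"},"code":403}'
)
print(parse_denial(body))
```

In the first log the account name comes back as default, which is the account a pod runs as when its spec sets no serviceAccountName. In the Seldon case it is ambassador in the seldon namespace, so that is the subject a binding has to name.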
@AdrianLsk Are you on our Slack channel? www.getambassador.io has instructions, if not -- I'd like to give you a different ambassador.yaml to try, but it'll likely be easier to interact there.
OK, so here's a better Ambassador deployment YAML for Seldon. I'll figure out how to create a PR for this for the Seldon folks.
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: ambassador
rules:
- apiGroups: [""]
  resources:
  - services
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["create", "update", "patch", "get", "list", "watch"]
- apiGroups: [""]
  resources:
  - secrets
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ambassador
  namespace: seldon
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: ambassador
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ambassador
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: ambassador
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ambassador
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: seldon
---
apiVersion: v1
kind: Service
metadata:
  name: ambassador
  namespace: seldon
spec:
  selector:
    service: ambassador
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  type: NodePort
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador-admin
  name: ambassador-admin
  namespace: seldon
spec:
  ports:
  - name: ambassador-admin
    port: 8877
    targetPort: 8877
  selector:
    service: ambassador
  type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ambassador
  namespace: seldon
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: 'false'
      labels:
        service: ambassador
    spec:
      containers:
      - image: quay.io/datawire/ambassador:0.34.1
        name: ambassador
        env:
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        resources:
          limits:
            cpu: 1
            memory: 400Mi
          requests:
            cpu: 200m
            memory: 100Mi
      - image: quay.io/datawire/statsd:0.34.1
        name: statsd
      restartPolicy: Always
      serviceAccountName: ambassador
Thanks to @AdrianLsk for checking this for me! Anyone want to give it a shot?
The issue is indeed the RBAC configuration around the separate namespace. It's not enough to create RBAC roles and accounts in the default namespace; you need some of them in the seldon namespace.
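A ServiceAccount subject in a ClusterRoleBinding names both the account and its namespace, so ambassador in seldon is a different subject from ambassador in default. A minimal sketch of that matching rule, deliberately ignoring User and Group subjects; `binding_covers` and the sample subject lists are hypothetical and only mirror the YAML above:

```python
def binding_covers(subjects, sa_namespace, sa_name):
    """True if any ServiceAccount subject matches the given account exactly.

    Simplified illustration: real RBAC evaluation also considers User and
    Group subjects, which are ignored here.
    """
    return any(
        s.get("kind") == "ServiceAccount"
        and s.get("namespace") == sa_namespace
        and s.get("name") == sa_name
        for s in subjects
    )


# Subjects as they would appear in a binding that only covers the default namespace.
default_only = [
    {"kind": "ServiceAccount", "name": "ambassador", "namespace": "default"}
]
# The same list with the seldon account added.
with_seldon = default_only + [
    {"kind": "ServiceAccount", "name": "ambassador", "namespace": "seldon"}
]

print(binding_covers(default_only, "seldon", "ambassador"))  # False
print(binding_covers(with_seldon, "seldon", "ambassador"))   # True
```

This is why a pod running as ambassador in the seldon namespace is denied when only the default-namespace account is bound, even though the ClusterRole itself allows listing configmaps.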
cc @cliveseldon
Thanks for the info. Yes please create an issue on Seldon-Core so we can try to solve it. We have ksonnet and helm packages.
@cliveseldon I believe if you want to just update your default install docs to use the above YAML, that would work.
Opened https://github.com/SeldonIO/seldon-core/issues/165 with the YAML above for seldon.
OK, I'm closing this issue because at this point everything points to RBAC. We've updated Ambassador's docs to explain how to verify that RBAC is enabled, which should hopefully keep this from being a major problem in the future.
Please reopen this issue, or file a new one, if you see this problem again. Thanks!
@victortrac can you please provide more detail how you fixed this?