When creating a new workspace, the following error is returned from the web UI:
Error: Request createWorkspace failed with message: 14 UNAVAILABLE: failed to connect to all addresses
Workspace launches
Red Hat 7.8 (Docker CE 19.03.13)
Self-hosted installation (0.4.0) integrated with GitLab (on-premise)
URL: https://gitpod.domain.local/#https://gitlab.domain.local/joesmith/myrepo/-/tree/master/
Error from server
{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","serviceContext":{"service":"server","version":"v0.4.0"},"stack_trace":"Error: 14 UNAVAILABLE: failed to connect to all addresses
at Object.exports.createStatusError (/app/node_modules/grpc/src/common.js:91:15)
at Object.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:1204:28)
at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:568:42)
at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:618:8)
at callback (/app/node_modules/grpc/src/client_interceptors.js:845:24)","component":"server","severity":"ERROR","time":"2020-10-15T21:32:37.208Z","environment":"production","region":"local","message":"Request createWorkspace failed with internal server error","error":"Error: 14 UNAVAILABLE: failed to connect to all addresses
at Object.exports.createStatusError (/app/node_modules/grpc/src/common.js:91:15)
at Object.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:1204:28)
at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:568:42)
at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:618:8)
at callback (/app/node_modules/grpc/src/client_interceptors.js:845:24)","payload":{"method":"createWorkspace","args":[{"contextUrl":"https://gitlab.devlnk.net/john.gallucci/bisf-cli/-/tree/master/","mode":"select-if-running"},{"_isCancelled":false}]}}
n/a
I can confirm I have the same issue with my docker-compose self-hosted Gitpod. After fixing issue #1906, this is the error that now pops up.
Self-hosted installation (0.4.0) integrated with GitLab (on-premise)
I also had this error with 0.4.0 occasionally. Usually, a re-deploy fixed the problem.
Since 0.5.0 this has never happened to me again. Have you tried to upgrade to version 0.5.0?
See also: https://community.gitpod.io/t/clean-install-and-unable-to-launch-workspace/1547
So I'm using the current latest tag from eu.gcr.io/gitpod-core-dev/build/gitpod-k3s
This uses eu.gcr.io/gitpod-io/self-hosted/theia-server:0.5.0
I ran some Wireshark captures on the pods, and what I find strange is that the server pod is making a tcp/8080 connection to the image builder and getting connection resets. This is surprising because the image-builder is not listening on tcp/8080, so why would the server pod be making this connection attempt in the first place?
@corneliusludmann I see only 0.5.0 available for gitpod chart and not gitpod-selfhosted which is only available up to 0.4.0. Will the regular gitpod chart work the same on-premise?
Well the server is definitely configured to communicate to image-builder over tcp/8080. Here is the list of environment variables from within the server pod:
unode@server-85d4499574-p8cf5:/app/node_modules/@typefox/server$ env | grep IMAGE_BUILDER
IMAGE_BUILDER_SERVICE_HOST=10.43.101.132
IMAGE_BUILDER_PORT_8080_TCP_PROTO=tcp
IMAGE_BUILDER_PORT=tcp://10.43.101.132:8080
IMAGE_BUILDER_PORT_8080_TCP_ADDR=10.43.101.132
IMAGE_BUILDER_SERVICE_PORT_RPC=8080
IMAGE_BUILDER_PORT_8080_TCP_PORT=8080
IMAGE_BUILDER_PORT_8080_TCP=tcp://10.43.101.132:8080
IMAGE_BUILDER_SERVICE_PORT=8080
However, I can verify that image-builder is in fact NOT listening on tcp/8080, so perhaps this has not been deployed correctly (even though it is marked as active).
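For anyone wanting to reproduce this check, a quick probe from inside the server pod could look like the sketch below. It assumes bash and coreutils `timeout` are available in the container and reuses the env var names listed above; the localhost fallbacks are just for illustration.

```shell
# Probe the image-builder service address from inside the server pod.
# Uses bash's /dev/tcp redirection so no netcat is required.
host="${IMAGE_BUILDER_SERVICE_HOST:-127.0.0.1}"   # fallback values are assumptions
port="${IMAGE_BUILDER_SERVICE_PORT_RPC:-8080}"
if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
  echo "image-builder is listening on ${host}:${port}"
else
  echo "no listener on ${host}:${port}"
fi
```

If this prints "no listener" while the pod is marked as running, the gRPC "14 UNAVAILABLE" from the server is exactly what you would expect.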
Thanks for your analysis, @jgallucci32.
I see only 0.5.0 available for gitpod chart and not gitpod-selfhosted which is only available up to 0.4.0. Will the regular gitpod chart work the same on-premise?
The gitpod-selfhosted repo is deprecated. I just added a note. The Gitpod 0.5.0 helm charts work more or less the same. You'll find sample values.yaml files at https://github.com/gitpod-io/gitpod/tree/master/chart and https://github.com/gitpod-io/gitpod/tree/master/install/helm.
It would be great if you could check whether you experience the same with Gitpod 0.5.0. Perhaps @csweichel could have a look at your findings.
@corneliusludmann I'm not sure if it's helpful, but I'm experiencing the same issue when using the provided docker-compose.yaml. I'm happy to provide some logs if you tell me what you need.
Thanks,
philjak
@corneliusludmann @philjak I was able to get past this issue with a workaround. It came down to the MTU setting of the Docker-in-Docker image being set to 1500 when Kubernetes (which uses Calico networking) has an overlay with MTU 1450 on the base container/pod.
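For context, the 1450 figure comes from subtracting the overlay's encapsulation overhead from the standard Ethernet MTU. A sketch of the arithmetic, assuming a VXLAN-style overlay with 50 bytes of overhead (which matches the 1450 reported above; other overlay modes have different overheads):

```shell
# MTU arithmetic (the 50-byte figure assumes VXLAN-style encapsulation:
# outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14 = 50 bytes).
outer_mtu=1500                       # standard Ethernet MTU on the node
overhead=50                          # assumed overlay encapsulation overhead
inner_mtu=$((outer_mtu - overhead))
echo "dockerd inside the pod should use --mtu=${inner_mtu} or lower"
```

If the inner dockerd keeps the default 1500, large packets get fragmented or silently dropped inside the overlay, which shows up as the connection resets observed earlier in this thread.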
In order to fix this I had to add the flag --mtu=1450 to the entrypoint for the image-builder pod. Here is a snippet of the manifest for it:
- args:
- dockerd
- --userns-remap=default
- -H tcp://127.0.0.1:2375
- --mtu=1450
Apparently this is a known issue with K8s + DinD + Alpine running an apk fetch, as noted in this GitHub issue:
https://github.com/gliderlabs/docker-alpine/issues/307
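For a running cluster, one way to apply the same flag without hand-editing the manifest could be a JSON patch. The deployment name and container index below are assumptions based on the env vars shown earlier; verify them against your install (e.g. `kubectl get deployment image-builder -o yaml`) first, and note that, like any live edit, the change is lost when the chart is re-rendered.

```shell
# Hypothetical JSON patch appending --mtu=1450 to the first container's args.
patch='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--mtu=1450"}]'
if command -v kubectl >/dev/null 2>&1; then
  kubectl patch deployment image-builder --type=json -p "$patch" \
    || echo "patch failed; check the deployment name and container index"
else
  echo "kubectl not available here"
fi
```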
Awesome @jgallucci32 ! I can confirm. Edited the image-builder deployment and it's working!
@philjak could you tell me what exactly needs to be changed in the docker-compose.yaml?
Okay, my current fix is to create a new volume that maps /chart/templates to a local folder and insert a modified image-builder-deployment.yaml.
But this problem should really be fixed in the Docker image; I think the MTU could be changed statically without much impact even on non-DinD environments.
Thanks @BenjaminBeichler for sharing. I really just edited the running deployment, so this was not a persistent change. But until the issue has been fixed, I guess I'll also try using a temporary volume.
@jgallucci32 Thank you very much for investigating this issue. :+1:
To make sure that this fixes the issue, I ran some tests with the docker-compose.yaml setup: Without your fix, in 4 of 10 cases I get the “failed to connect to all addresses” error after deployment. With your fix, I successfully deployed Gitpod 10 times in a row without this error. It's still a rather small sample, but I am convinced that this fixes the bug. :smile:
As already described, a temporary fix would be to mount a patched image-builder-deployment.yaml into the chart folder. Add this to your docker-compose.yaml volumes section:
- ./image-builder-deployment.yaml:/chart/templates/image-builder-deployment.yaml
You can get this patched file e.g. by running:
$ docker-compose exec gitpod sed 's/"dockerd"/"dockerd", "--mtu=1450"/' /chart/templates/image-builder-deployment.yaml > ./image-builder-deployment.yaml
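To see what that substitution does without a running container, here it is applied to a sample args line (the real manifest formatting may differ slightly):

```shell
# The sed expression inserts the MTU flag right after "dockerd" in the args list.
echo 'args: ["dockerd", "--userns-remap=default", "-H tcp://127.0.0.1:2375"]' \
  | sed 's/"dockerd"/"dockerd", "--mtu=1450"/'
# -> args: ["dockerd", "--mtu=1450", "--userns-remap=default", "-H tcp://127.0.0.1:2375"]
```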
I'll create a PR to fix this soon.