What happened:
$ kubectl logs b4e7d8f6-494f-11e9-ad67-00155d19cd6a -c clonerefs
{"component":"clonerefs","error":"could not create/append to /root/.ssh/known_hosts: open /root/.ssh/known_hosts: no such file or directory","level":"error","msg":"failed to add host fingerprints","time":"2019-03-18T07:30:59Z"}
What you expected to happen:
Clonerefs processes SSHHostFingerprints configuration successfully.
How to reproduce it (as minimally and precisely as possible):
Note: This should be reproducible by replacing cbuchacher/testrepo with a public repo, and removing ssh_key_secrets (not ssh_host_fingerprints).
$ kubectl create secret generic prow-github-ssh-key --from-file=ssh-privatekey=/path/to/id_rsa
$ kubectl create secret generic prow-github-ssh-knownhosts --from-file=known_hosts=/path/to/known_hosts
$ go get -u k8s.io/test-infra/prow/cmd/mkpj
$ mkpj --github-token-path=github-access-token --job=bar-job --config-path=config.yaml >bar-job.yaml
$ kubectl create -f bar-job.yaml
plank:
  job_url_template: 'https://example.com/{{.Spec.Job}}/{{.Status.BuildID}}/'
  report_template: '[Full PR test history](https://example.com/?org={{.Spec.Refs.Org}}&repo={{.Spec.Refs.Repo}}).'
  job_url_prefix: https://example.com/view/gcs/
  pod_pending_timeout: 60m
  default_decoration_config:
    timeout: 7200000000000 # 2h
    grace_period: 15000000000 # 15s
    utility_images:
      clonerefs: "gcr.io/k8s-prow/clonerefs:v20190312-abfe0e0"
      initupload: "gcr.io/k8s-prow/initupload:v20190312-abfe0e0"
      entrypoint: "gcr.io/k8s-prow/entrypoint:v20190312-abfe0e0"
      sidecar: "gcr.io/k8s-prow/sidecar:v20190312-abfe0e0"
    gcs_configuration:
      bucket: "... SNIP ..."
      path_strategy: "legacy"
      default_org: "cbuchacher"
      default_repo: "testrepo"
    gcs_credentials_secret: "prow-service-account"
presubmits:
  cbuchacher/testrepo:
  - name: bar-job
    always_run: true
    decorate: true
    decoration_config:
      ssh_key_secrets:
      - prow-github-ssh-key
      ssh_host_fingerprints:
      - prow-github-ssh-knownhosts
    clone_uri: "[email protected]:cbuchacher/testrepo.git"
    skip_report: false
    spec:
      containers:
      - image: gcr.io/cloud-builders/docker
        command:
        - "/usr/bin/docker"
        args: ['build', '-f', 'Dockerfile', '.']
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
      volumes:
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
          type: Socket
Please provide links to example occurrences, if any:
Anything else we need to know?:
Related to #9450.
/area prow/pod-utilities
/cc @cjwagner @fejta
/assign
The issue here is that we configure the SSH keys with ssh-agent but use $HOME/.ssh/known_hosts for the fingerprints. We need to use an ephemeral configuration for the fingerprints, too, so that we do not depend on any specific part of the filesystem.
Looks like you can configure the paths to look at with ssh-broker-config.xml, but that doesn't solve our problem, as that will not be a writable path either.
We may need to pass -o UserKnownHostsFile to the downstream git calls?
/unassign
/assign @fejta
LMK if you would like me to do the impl, Erick, but I am not sure what you think would be the best way forward here
We could set the environment variable GIT_SSH_COMMAND='ssh -o UserKnownHostsFile=/path/to/known_hosts' if this is needed only for git commands.
Love that idea!
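A minimal sketch of the GIT_SSH_COMMAND idea above, assuming the variable only needs to reach the git child processes. The helper names `gitSSHCommand` and `gitCommand` are hypothetical and not taken from clonerefs; the known_hosts path is a placeholder.

````go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// gitSSHCommand returns a value for GIT_SSH_COMMAND that points ssh at an
// explicit known_hosts file instead of $HOME/.ssh/known_hosts.
// Hypothetical helper; it does not quote the path, so paths containing
// spaces would need extra handling.
func gitSSHCommand(knownHostsPath string) string {
	return fmt.Sprintf("ssh -o UserKnownHostsFile=%s", knownHostsPath)
}

// gitCommand builds (but does not run) a git command whose environment is
// the current process environment plus GIT_SSH_COMMAND.
func gitCommand(knownHostsPath string, args ...string) *exec.Cmd {
	cmd := exec.Command("git", args...)
	cmd.Env = append(os.Environ(), "GIT_SSH_COMMAND="+gitSSHCommand(knownHostsPath))
	return cmd
}

func main() {
	// Placeholder path; in practice this would be wherever the file lives.
	cmd := gitCommand("/path/to/known_hosts", "fetch", "origin")
	// The variable we appended is the last entry of the environment.
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
````

Setting the variable per command rather than process-wide keeps the change scoped to git, which is all this issue seems to need.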
/remove-lifecycle rotten
@stevekuznetsov I'm having the same problem now. I "fixed" it by adding the following hack to clonerefs/run.go addHostFingerprints:
````go
sshDir := filepath.Join(os.Getenv("HOME"), ".ssh")
path := filepath.Join(sshDir, "known_hosts")
if _, err := os.Stat(sshDir); os.IsNotExist(err) {
err := os.MkdirAll(sshDir, 0755)
if err != nil {
return fmt.Errorf("could not create sshDir %s: %v", sshDir, err)
}
}
````
The problem in my case was just that the ~/.ssh folder didn't exist so I just added the folder and everything else worked as expected.
What do you think would be the right way forward?
That assumes that there's a user, $HOME is set, and the process can create that directory, none of which are true in the general case. I think the above solution (setting $GIT_SSH_COMMAND) is a more complete solution -- could you try that?
@stevekuznetsov so write the file to a place where it can be written and set the envvar? Yes I can try that
The file should already be written and exist in a known place from the mount, so we just need to set the var!
@stevekuznetsov
I'm not sure if I configured Prow wrong, but in my case only the ssh keys are mounted. The known hosts fingerprints are parsed directly from the CLONEREFS_OPTIONS env var: https://github.com/kubernetes/test-infra/blob/master/prow/clonerefs/run.go#L50-L54
````
initContainers:
````
````go
// KeyFiles are files containing SSH keys to be used
// when cloning. Will be added to `ssh-agent`.
KeyFiles []string `json:"key_files,omitempty"`
// HostFingerPrints are ssh-keyscan host fingerprint lines to use
// when cloning. Will be added to ~/.ssh/known_hosts
HostFingerprints []string `json:"host_fingerprints,omitempty"`
````
It's similar in the plank config:
plank:
  default_decoration_config:
    ssh_key_secrets:
    - ssh-secret
    ssh_host_fingerprints:
    - "43.18.255.110 ssh-rsa A...1p"
    - "43.18.255.110 ecdsa-sha2-nistp256 A...Q8="
So I could think of two alternatives:
1. [...]
2. [...] GIT_SSH_COMMAND env var

What do you think?
Oh, sorry. I misremembered how it was working. Your second approach sounds good! Do we fail today when we try to update ~/.ssh/known_hosts as well?
Okay perfect. Depends on how you define failing :). I got an error in the log (because the file did not exist) and then the git clone or whatever it's doing later fails because the known_hosts had not been updated/created. So I would say if somebody depends on entries in known_hosts I'm not sure how it could work with the current code.
@stevekuznetsov I started implementing it and I'm not sure what's better:
1. Add an emptyDir (clonerefs-tmp) to the pod and mount this emptyDir into the clonerefs initContainer, just to ensure the /tmp folder exists
2. Use the /tmp folder (which in our current case is always there because clonerefs uses Alpine and not scratch)

I implemented the second variant in this PR: https://github.com/kubernetes/test-infra/pull/14468
But I could change it to the first variant, if it's really the preferred one. It just looks like a lot of overhead to ensure the /tmp folder exists, especially because then I would have to change the code which generates the initContainers. (https://github.com/kubernetes/test-infra/blob/master/prow/pod-utils/decorate/podspec.go#L294)
I think adding the EmptyDir would be preferred -- it also ensures that we will have write access to that dir. Prow generally prefers this method and we mount an EmptyDir for /tmp in other components as well.
/cc @droslean
Okay no problem.
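For reference, the direction discussed above (write the fingerprints from the options into a writable directory, then point git at that file) could look roughly like the sketch below. It is not the code from the PR: `formatKnownHosts` and `writeKnownHosts` are hypothetical names, the fingerprint line is a made-up placeholder, and a temporary directory stands in for the mounted emptyDir.

````go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// formatKnownHosts turns fingerprint lines into known_hosts file content:
// one entry per line, with a trailing newline. Hypothetical helper.
func formatKnownHosts(fingerprints []string) string {
	if len(fingerprints) == 0 {
		return ""
	}
	return strings.Join(fingerprints, "\n") + "\n"
}

// writeKnownHosts writes the fingerprints to <dir>/known_hosts and returns
// the file path. dir must already exist and be writable, for example a
// mounted emptyDir; this function does not create it.
func writeKnownHosts(dir string, fingerprints []string) (string, error) {
	path := filepath.Join(dir, "known_hosts")
	if err := os.WriteFile(path, []byte(formatKnownHosts(fingerprints)), 0600); err != nil {
		return "", fmt.Errorf("could not write %s: %v", path, err)
	}
	return path, nil
}

func main() {
	// A temp dir stands in for the emptyDir mount in this sketch.
	dir, err := os.MkdirTemp("", "clonerefs-tmp")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	// Placeholder fingerprint line, not a real host key.
	path, err := writeKnownHosts(dir, []string{"example.invalid ssh-rsa AAAA"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("GIT_SSH_COMMAND=ssh -o UserKnownHostsFile=" + path)
}
````

Because the directory is passed in rather than derived from $HOME, the same function works whether /tmp comes from the image or from an emptyDir.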