Test-infra: clonerefs: could not create/append to /root/.ssh/known_hosts

Created on 18 Mar 2019 · 20 Comments · Source: kubernetes/test-infra

What happened:

$ kubectl logs b4e7d8f6-494f-11e9-ad67-00155d19cd6a -c clonerefs
{"component":"clonerefs","error":"could not create/append to /root/.ssh/known_hosts: open /root/.ssh/known_hosts: no such file or directory","level":"error","msg":"failed to add host fingerprints","time":"2019-03-18T07:30:59Z"}

What you expected to happen:

Clonerefs processes SSHHostFingerprints configuration successfully.

How to reproduce it (as minimally and precisely as possible):

Note: This should be reproducible by replacing cbuchacher/testrepo with a public repo, and removing ssh_key_secrets (not ssh_host_fingerprints).

$ kubectl create secret generic prow-github-ssh-key --from-file=ssh-privatekey=/path/to/id_rsa
$ kubectl create secret generic prow-github-ssh-knownhosts --from-file=known_hosts=/path/to/known_hosts
$ go get -u k8s.io/test-infra/prow/cmd/mkpj
$ mkpj --github-token-path=github-access-token --job=bar-job --config-path=config.yaml >bar-job.yaml
$ kubectl create -f bar-job.yaml
  • config.yaml
plank:
  job_url_template: 'https://example.com/{{.Spec.Job}}/{{.Status.BuildID}}/'
  report_template: '[Full PR test history](https://example.com/?org={{.Spec.Refs.Org}}&repo={{.Spec.Refs.Repo}}).'
  job_url_prefix: https://example.com/view/gcs/
  pod_pending_timeout: 60m
  default_decoration_config:
    timeout: 7200000000000 # 2h
    grace_period: 15000000000 # 15s
    utility_images:
      clonerefs: "gcr.io/k8s-prow/clonerefs:v20190312-abfe0e0"
      initupload: "gcr.io/k8s-prow/initupload:v20190312-abfe0e0"
      entrypoint: "gcr.io/k8s-prow/entrypoint:v20190312-abfe0e0"
      sidecar: "gcr.io/k8s-prow/sidecar:v20190312-abfe0e0"
    gcs_configuration:
      bucket: "... SNIP ..."
      path_strategy: "legacy"
      default_org: "cbuchacher"
      default_repo: "testrepo"
    gcs_credentials_secret: "prow-service-account"

presubmits:
  cbuchacher/testrepo:
  - name: bar-job
    always_run: true
    decorate: true
    decoration_config:
      ssh_key_secrets:
      - prow-github-ssh-key
      ssh_host_fingerprints:
      - prow-github-ssh-knownhosts
    clone_uri: "git@github.com:cbuchacher/testrepo.git"
    skip_report: false
    spec:
      containers:
      - image: gcr.io/cloud-builders/docker
        command:
        - "/usr/bin/docker"
        args: ['build', '-f', 'Dockerfile', '.']
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
      volumes:
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
          type: Socket

Please provide links to example occurrences, if any:

Anything else we need to know?:

Related to #9450.

area/prow/pod-utilities kind/bug

All 20 comments

/area prow/pod-utilities
/cc @cjwagner @fejta
/assign

The issue here is that we configure the SSH keys with ssh-agent but use $HOME/.ssh/known_hosts for the fingerprints. We need to use an ephemeral configuration for the fingerprints, too, so that we do not depend on any specific part of the filesystem.

Looks like you can configure the paths to look at with ssh-broker-config.xml but that doesn't solve our problem as that will not be a writable path either

We may need to pass -o UserKnownHostsFile to the downstream git calls?

/unassign
/assign @fejta
LMK if you would like me to do the impl, Erick, but I am not sure what you think would be the best way forward here

We could set the environment variable GIT_SSH_COMMAND='ssh -o UserKnownHostsFile=/path/to/known_hosts' if this is needed only for git commands.
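A minimal Go sketch of that suggestion, for illustration only: it builds a git invocation whose environment carries `GIT_SSH_COMMAND`, so ssh reads the given known_hosts file instead of `$HOME/.ssh/known_hosts`. The helper names `gitSSHCommandEnv` and `gitCommand` are hypothetical and are not taken from clonerefs. Because git passes `GIT_SSH_COMMAND` to a shell, a path containing spaces would need quoting, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// gitSSHCommandEnv returns the environment entry that tells git to run ssh
// with an explicit known_hosts file. (Hypothetical helper name.)
func gitSSHCommandEnv(knownHostsPath string) string {
	return fmt.Sprintf("GIT_SSH_COMMAND=ssh -o UserKnownHostsFile=%s", knownHostsPath)
}

// gitCommand builds, but does not run, a git command that inherits the
// current environment plus the GIT_SSH_COMMAND override.
func gitCommand(knownHostsPath string, args ...string) *exec.Cmd {
	cmd := exec.Command("git", args...)
	cmd.Env = append(os.Environ(), gitSSHCommandEnv(knownHostsPath))
	return cmd
}

func main() {
	// The path is a placeholder, as in the comment above.
	cmd := gitCommand("/path/to/known_hosts", "version")
	// The override is the last entry appended to the environment.
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```

Scoping the override to the git child processes keeps it away from any other ssh usage in the container.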

Love that idea!

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

@stevekuznetsov I'm having the same problem now. I "fixed" it by adding the following hack to clonerefs/run.go addHostFingerprints:

````go
// Hack: make sure $HOME/.ssh exists before known_hosts is opened for appending.
sshDir := filepath.Join(os.Getenv("HOME"), ".ssh")
path := filepath.Join(sshDir, "known_hosts")

// Create the directory only if it is missing.
if _, err := os.Stat(sshDir); os.IsNotExist(err) {
    if err := os.MkdirAll(sshDir, 0755); err != nil {
        return fmt.Errorf("could not create sshDir %s: %v", sshDir, err)
    }
}
````

The problem in my case was just that the ~/.ssh folder didn't exist so I just added the folder and everything else worked as expected.
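To illustrate why the missing folder produces the "no such file or directory" error from the report, here is a small standalone sketch. It assumes nothing about the clonerefs code beyond the create/append behaviour named in the error message; `appendLine` and `demo` are hypothetical names, and a temporary directory stands in for `$HOME`. Opening a file with the create flag creates the file but not its parent directory.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// appendLine opens path in create/append mode and writes a single line.
func appendLine(path, line string) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(line + "\n")
	return err
}

// demo appends to <tmp>/.ssh/known_hosts twice: once before the .ssh
// directory exists and once after creating it. It returns both results.
func demo() (before, after error) {
	home, err := os.MkdirTemp("", "fake-home-")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(home)

	sshDir := filepath.Join(home, ".ssh")
	path := filepath.Join(sshDir, "known_hosts")

	// Fails: O_CREATE does not create the missing parent directory.
	before = appendLine(path, "example.invalid ssh-rsa AAAA-placeholder")

	if err := os.MkdirAll(sshDir, 0700); err != nil {
		return before, err
	}
	// Succeeds now that the directory exists.
	after = appendLine(path, "example.invalid ssh-rsa AAAA-placeholder")
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("missing parent reported as not-exist:", errors.Is(before, fs.ErrNotExist))
	fmt.Println("error after MkdirAll:", after)
}
```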

What do you think would be the right way forward?

That assumes that there's a user, $HOME is set, and the process can create that directory, none of which are true in the general case. I think the above solution (setting $GIT_SSH_COMMAND) is a more complete solution -- could you try that?

@stevekuznetsov so write the file to a place where it can be written and set the env var? Yes, I can try that.

The file should already be written and exist in a known place from the mount, so we just need to set the var!

@stevekuznetsov

I'm not sure if I configured Prow wrong, but in my case only the ssh keys are mounted. The known hosts fingerprints are parsed directly from the CLONEREFS_OPTIONS env var: https://github.com/kubernetes/test-infra/blob/master/prow/clonerefs/run.go#L50-L54

````yaml
initContainers:
- command:
  - /clonerefs
  env:
  - name: CLONEREFS_OPTIONS
    value: '{"src_root":"/home/prow/go","log":"/logs/clone.json",...,"key_files":["/secrets/ssh/ssh-secret"],"host_fingerprints":["43.18.255.110 ssh-rsa A...1p","43.18.255.110 ecdsa-sha2-nistp256 A...Q8="]}'
  image: **/clonerefs:latest
  imagePullPolicy: Always
  name: clonerefs
  resources: {}
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /logs
    name: logs
  - mountPath: /home/prow/go
    name: code
  - mountPath: /secrets/ssh/ssh-secret
    name: ssh-keys-ssh-secret
    readOnly: true
````

````go
// KeyFiles are files containing SSH keys to be used
// when cloning. Will be added to `ssh-agent`.
KeyFiles []string `json:"key_files,omitempty"`

// HostFingerPrints are ssh-keyscan host fingerprint lines to use
// when cloning. Will be added to ~/.ssh/known_hosts
HostFingerprints []string `json:"host_fingerprints,omitempty"`
````
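As a rough sketch of how such a JSON value decodes into those two fields, the following standalone program uses a trimmed-down stand-in struct. The type `options`, the function `parseOptions`, and the sample values are hypothetical; the real clonerefs options type has more fields, and only the two JSON tags are taken from the snippet above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// options is a hypothetical, reduced stand-in for the clonerefs options.
// Only the two JSON tags come from the quoted source.
type options struct {
	KeyFiles         []string `json:"key_files,omitempty"`
	HostFingerprints []string `json:"host_fingerprints,omitempty"`
}

// parseOptions decodes a JSON document of the shape carried in the
// CLONEREFS_OPTIONS variable. Fields not declared on the struct are ignored.
func parseOptions(raw string) (options, error) {
	var o options
	err := json.Unmarshal([]byte(raw), &o)
	return o, err
}

func main() {
	// Placeholder values, not real keys or hosts.
	raw := `{"src_root":"/home/prow/go","key_files":["/secrets/ssh/ssh-secret"],"host_fingerprints":["203.0.113.10 ssh-rsa AAAA-placeholder"]}`
	o, err := parseOptions(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("key files:", o.KeyFiles)
	fmt.Println("fingerprint lines:", len(o.HostFingerprints))
}
```

The point for this issue is that the key files arrive as paths to mounted secrets, while the fingerprints arrive as literal lines that still have to be written somewhere.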

It's similar in the plank config:
plank:
  default_decoration_config:
    ssh_key_secrets:
    - ssh-secret
    ssh_host_fingerprints:
    - "43.18.255.110 ssh-rsa A...1p"
    - "43.18.255.110 ecdsa-sha2-nistp256 A...Q8="

So I could think of two alternatives:

  • mount the ssh_host_fingerprints similar to the ssh secret, but I think this would be a breaking change
  • mount an emptyDir (e.g. /tmp), write the known hosts file to this directory, and set the GIT_SSH_COMMAND env var
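
The second alternative could look roughly like the sketch below. It is an assumption-laden illustration, not the code from any PR: `writeKnownHosts` and `setup` are hypothetical names, and a directory from `os.MkdirTemp` stands in for the mounted emptyDir.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// writeKnownHosts writes one fingerprint line per entry to dir/known_hosts
// and returns the path of the file. dir must already exist and be writable.
func writeKnownHosts(dir string, fingerprints []string) (string, error) {
	path := filepath.Join(dir, "known_hosts")
	data := strings.Join(fingerprints, "\n") + "\n"
	if err := os.WriteFile(path, []byte(data), 0600); err != nil {
		return "", fmt.Errorf("could not write %s: %w", path, err)
	}
	return path, nil
}

// setup creates a scratch directory (standing in for the emptyDir mount),
// writes the fingerprints there, and returns the file path together with
// the environment entry that points git's ssh at that file.
func setup(fingerprints []string) (path, env string, err error) {
	dir, err := os.MkdirTemp("", "clonerefs-")
	if err != nil {
		return "", "", err
	}
	path, err = writeKnownHosts(dir, fingerprints)
	if err != nil {
		return "", "", err
	}
	return path, "GIT_SSH_COMMAND=ssh -o UserKnownHostsFile=" + path, nil
}

func main() {
	// Placeholder fingerprint line, not a real host key.
	path, env, err := setup([]string{"203.0.113.10 ssh-rsa AAAA-placeholder"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(filepath.Dir(path))
	fmt.Println(env)
}
```

Nothing here depends on `$HOME` or on a user existing in the container, which is the objection raised against the directory-creation hack.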

What do you think?

Oh, sorry. I misremembered how it was working. Your second approach sounds good! Do we fail today when we try to update ~/.ssh/known_hosts as well?

Okay perfect. Depends on how you define failing :). I got an error in the log (because the file did not exist) and then the git clone or whatever it's doing later fails because the known_hosts had not been updated/created. So I would say if somebody depends on entries in known_hosts I'm not sure how it could work with the current code.

@stevekuznetsov I started implementing it and I'm not sure what's better:

  1. Add an emptyDir (clonerefs-tmp) to the pod and mount this emptyDir into the clonerefs initContainer just to ensure the /tmp folder exists
  2. Just rely on the existing /tmp folder (which in our current case is always there because clonerefs uses Alpine and not scratch)

I implemented the second variant in this PR: https://github.com/kubernetes/test-infra/pull/14468

But I could change it to the first variant, if it's really the preferred one. It just looks like a lot of overhead to ensure the /tmp folder exists, especially because then I would have to change the code which generates the initContainers. (https://github.com/kubernetes/test-infra/blob/master/prow/pod-utils/decorate/podspec.go#L294)

I think adding the EmptyDir would be preferred -- it also ensures that we will have write access to that dir. Prow generally prefers this method and we mount an EmptyDir for /tmp in other components as well.

/cc @droslean

Okay no problem.
