Harbor: since 1.8 automating full install/configuration of harbor is harder

Created on 19 Jun 2019 · 10Comments · Source: goharbor/harbor

Before Harbor 1.8 I could configure Harbor to use external Auth via config file or environment variable, all automation tooling know how to do both of those things. whether its chef or puppet, or something like confd pulling from consul. Even better languages like spring have integration to use kubeconfig maps as config sources.

These are all things we've learned to treat as code, and as devops/sre/whatever we have built up strong muscles around managing, testing, securing, etc, treating config changes as "code changes".

With the new configuration API however we cannot do any of those things, and instead of tried and true trusted ways to configure our app, the developers are forcing us to use an API without a decent client tool. now instead of the simple task of "write file -> restart service" we have to rewrite automation tooling to take our list of preferred settings and sling them into the API via curl or similar.

What's even more painful is that some settings are still set via environment variable (such as UAA_CA_ROOT) so its not even completely consistent.

Proposal:

We bring configuration via ENV or config file back and we mark anything set in them as read-only in the API/UI.
or
We write a config cli* and people can use that, as its less error prone than just running curl -X POST a bunch of times.

a config cli is how spinnaker's halyard works, you start halyard, run a bunch of hal commands to configure it and then tell it to deploy/apply and it then deploys all of the spinnaker microservices with your settings. It's slightly different in that its actually managing the lifecycle of the components via halyard vs via kube manifests directly, but the idea is there.

areconfiguration kinrequirement prioritmedium

Source

paulczar

👍20

Most helpful comment

This issue has been open for over a year, and I still cannot easily configure Harbor without performing circus tricks. We need to get back to being able to configure auth settings via config files and/or environment variables.

Here's my current method for configuring auth after the fact ... it uses the raw chart to write out some secrets and a job to curl put to the api. It works ... but its not a great experience.

{{- if eq .Values._.auth.type "oidc" }}
  - metadata:
      name: auth-config-json
    apiVersion: v1
    kind: Secret
    stringData:
      auth.json: |
        {
          "auth_mode": "oidc_auth",
          "oidc_name": "oidc",
          "oidc_scope": "openid,offline_access,email,profile",
          "oidc_client_id": "{{ requiredEnv "HARBOR_OIDC_CLIENT_ID" }}",
          "oidc_client_secret": "{{ requiredEnv "HARBOR_OIDC_CLIENT_SECRET" }}",
          "oidc_endpoint": "{{ requiredEnv "HARBOR_OIDC_URL" }}",
          "oidc_verify_cert": "false"
        }
      password: {{ env "HARBOR_ADMIN_PASSWORD" }}
  - metadata:
      name: configure-auth
    apiVersion: batch/v1
    kind: Job
    spec:
      template:
        spec:
          volumes:
          - name: config
            secret:
              secretName: auth-config-json
          initContainers:
          - name: wait-for-harbor
            image: curlimages/curl:7.72.0
            command:
            - "bin/sh"
            - "-c"
            - "while ! curl -s http://harbor-core > /dev/null ; do sleep 10 ; done"
          containers:
          - name: configure-auth
            image: curlimages/curl:7.72.0
            volumeMounts:
            - name: config
              mountPath: "/config"
              readOnly: true
            env:
            - name: "PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: auth-config-json
                  key: password
            command:
            - "bin/sh"
            - "-c"
            - 'curl -s -i -X PUT -u "admin:$PASSWORD" -H "Content-Type: application/json" http://harbor-core/api/v2.0/configurations -d @/config/auth.json'
          restartPolicy: Never
      backoffLimit: 4
{{- end }}

paulczar on 22 Aug 2020

👍6

All 10 comments

One more thing ... When setting the config settings via the API, the settings can later still be overwritten by the Web Console ... so we still suffer from the same problem of config drift as "the configuration in my app does not match the configuration in the automation", it's just this time a user setting via the app wins over the user setting via the automation.

I know a lot of folks use automation ( and the frequency of rerunning it every say 30 seconds ) to re-mediate changes when somebody changes the live config to be different from the expected config.

It's a bit like Kubernetes itself ... If you set something in a deployment and then later somebody modifies a pod owned by that deployment ... when it comes to reconcile that .. the deployment being the owner of the podTemplate will win. This should be the same case for configuring an application.

paulczar on 19 Jun 2019

Essentially configuration should have a single source of truth, and in this modern age of devops/ci-cd that sort of truth should be an artifact that can be traced and audited. Something like one of the following:

1) config-management - chef/puppet/helm code + declarative state - both stored in github, both tested on change, and ideally applied on by ci/cd tool. changes auditable via github and/or ci/cd tooling.

2) centralized config service - consul, etcd, etc. Usually requires a client app that takes values and writes them to a template/config file, although some apps can request config directly from them. the centralized config service should track changes and be auditable.

3) cowboys sharing an admin password clicking around in the webui changing settings at will.

One of the above is an anti-pattern ...

paulczar on 21 Jun 2019

👍4

@paulczar Hey paul, you raise some good points and we are evaluating some of the options you mentioned but I want to give some color to everyone on why we made those changes, there's quite an audience here it seems :) Prior to 1.8, everything was stuffed into the config file and it was growing out of control, and opening a floodgate in terms of user behavior of just modifying a single file for harbor admins. Just mod it, copy it over, and deploy, but you can see how that can become problematic. Some of those same params in the cfg which were editable in the web console only persists through the DB but won't backdate the config file obviously, leading to stale config files. The decision to create the yml as it is currently was really to create a clearer seperation between system variables and user/project level settings so it got gutted essentially, but creating two complementary subsets of a previous superset. So there shoudn't be any param that can be edited via different paths basically, we were really striving for the single version of truth you mentioned. User settings should be set via API/UI, it makes no sense to leave these in the cfg. The UAA_CA_ROOT is indeed an an oversight, but I think you might have found the only guy in the intersection, thanks for catching that :)

The current setup using API does require curling, and it can be (a bit?) more painful and the harbor instances have to be up and running already. We have explored a cli option previously like the one you mentioned as replacement but advantages over curling did not tip it in its favor compared to more urgent features. Can you elaborate on specific automation tooling you have in mind here, maybe share some specific use cases? ie. I'm guessing users like the old setup because they can auto deploy multiple harbor instances via a single config file by just changing the hostname or something to that effect, without having to tinker with automating APIs or hitting buttons on the UI.

"the developers are forcing us to use an API without a decent client tool." Yep, good client tooling is indeed what we are after. But I don't think throwing everything in the yaml is what we want.

xaleeks on 11 Jul 2019

Thanks for the response! I still believe that having a configuration API for setting underlying config settings is an antipattern. If you look at similar software, you set all of this stuff up via config files or env variables:

examples:

Grafana - https://grafana.com/docs/auth/github/
Concourse - https://concourse-ci.org/cf-uaa-auth.html

paulczar on 17 Jul 2019

Just want to add some emphasis to this one:

I know a lot of folks use automation ( and the frequency of rerunning it every say 30 seconds ) to re-mediate changes when somebody changes the live config to be different from the expected config.

This is particularly important in highly regulated environments that require frequent audits and compliance to know that settings are correctly configured. We need to be able to track who makes config changes and when those config changes occurred, and know that the last set configuration is what is still live in the environment. This is extremely easy to do when the configuration lives in a single source of truth (a git repository) where we have a lot of mechanisms to handle everything mentioned -- pull requests, signed commits, and RBAC on accessing the repositories. Industry is leaning heavily towards that declarative format with prevention of drift.

voor on 17 Jul 2019

👍6

What is the solution to creating and then using robot accounts created with the API?

Even with the approaches goharbor/harbor-helm#254, goharbor/harbor-helm#256 or *any "call the API" approach, there is no possibility to declare the robot account token/password.

Yes, the robot account can be created via the API and the token extracted from the response.
However, the "API caller" must now have write permission to store the token where it can be accessed and used by external systems. This makes the use of the API for creating robot accounts impractical in an automated deployment.

If the credentials could be declared through configuration or via the API, then at least the credentials are not generated during the API call. Additionally, this would allow the robot credentials to be updated, where the current approach with the API seems to be that the robot account would need to be deleted and recreated, which breaks all active users of the robot account until newly generated token can be manually propagated.

hobti01 on 9 Aug 2019

👍1

I think you can use configmap like below.

jenkins : https://github.com/helm/charts/blob/master/stable/jenkins/values.yaml#L38-L39
grafana : https://github.com/helm/charts/blob/master/stable/grafana/values.yaml#L364-L373

Hokwang on 19 Aug 2019

Here's my current method for configuring auth after the fact ... it uses the raw chart to write out some secrets and a job to curl put to the api. It works ... but its not a great experience.

{{- if eq .Values._.auth.type "oidc" }}
  - metadata:
      name: auth-config-json
    apiVersion: v1
    kind: Secret
    stringData:
      auth.json: |
        {
          "auth_mode": "oidc_auth",
          "oidc_name": "oidc",
          "oidc_scope": "openid,offline_access,email,profile",
          "oidc_client_id": "{{ requiredEnv "HARBOR_OIDC_CLIENT_ID" }}",
          "oidc_client_secret": "{{ requiredEnv "HARBOR_OIDC_CLIENT_SECRET" }}",
          "oidc_endpoint": "{{ requiredEnv "HARBOR_OIDC_URL" }}",
          "oidc_verify_cert": "false"
        }
      password: {{ env "HARBOR_ADMIN_PASSWORD" }}
  - metadata:
      name: configure-auth
    apiVersion: batch/v1
    kind: Job
    spec:
      template:
        spec:
          volumes:
          - name: config
            secret:
              secretName: auth-config-json
          initContainers:
          - name: wait-for-harbor
            image: curlimages/curl:7.72.0
            command:
            - "bin/sh"
            - "-c"
            - "while ! curl -s http://harbor-core > /dev/null ; do sleep 10 ; done"
          containers:
          - name: configure-auth
            image: curlimages/curl:7.72.0
            volumeMounts:
            - name: config
              mountPath: "/config"
              readOnly: true
            env:
            - name: "PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: auth-config-json
                  key: password
            command:
            - "bin/sh"
            - "-c"
            - 'curl -s -i -X PUT -u "admin:$PASSWORD" -H "Content-Type: application/json" http://harbor-core/api/v2.0/configurations -d @/config/auth.json'
          restartPolicy: Never
      backoffLimit: 4
{{- end }}

paulczar on 22 Aug 2020

👍6

I would just like to +1 this as being a pretty critical feature. So far I really like harbor and in my testing so far I think it has a lot of potential as a private registry.

However from an IaC/automation standpoint it is really hard to sell. While there is an API - some things (like replication rules) don't seem to be present - essentially forcing manual configuration. We extensively use IaC tools like terraform and gitlab to build, deploy and configure our applications storing all of the "code" in git. Objectively I believe this is a better way of managing applications (vs manual configuration) due to great transparency, traceability, accountability and ease and automation of deployment and configuration.

Some key changes that would be ideal from my perspective for this tools would be:
1) Declarative configuration through static files (that can be easily updated/changed on the fly)
2) Fully featured API

stuff nice to have would be integration with a tool like Terraform for example.

Some example applications that we use that are configured like this:
1) Vault (Hashicorp)
2) Grafana (as mentioned above)
3) Prometheus
I am sure there are more that I'm not mentioning.

I definitely understand the challenges and just wanted to offer my perspective as someone trying to implement Harbor in production. Thanks for all your great work and this looks like a great project.

gmintoco on 29 Sep 2020

👍1

I think this issue is critical. More and more people want the setup of their clusters to be set up in a git repository.
A nice UI is a plus but not a requirement, same for a rest API.
Trying to support something else than k8s is a waste of effort, k8s is everywhere, even on your laptop now.
If you have a look at ArgoCD, they have a nice UI but everything you do is, in the end, a manifest.
If everything was on a manifest in harbor, we could even use a tool like argocd to choose if we want the users to be able to change the thinks in the UI or not without changing the UI. We could use gitops tools to monitor config drift (to commit them into git or to prevent overwriting them) or to force instantly a reconfiguration with the defined settings.
We could also restore, create, delete robot accounts the way we want.
The only things that should not be in the cluster configuration should be artifacts and their related metadata.

We evaluated harbor and would really like to use it but we won't because of this.
I understand it's a big change but I don't think harbor will go anywhere if harbor does not take this direction.