Harbor: Harbor containers fail to start on docker startup

Created on 23 Feb 2019  Â·  23Comments  Â·  Source: goharbor/harbor

After restarting the docker daemon (or rebooting the system), not all harbor containers get started successfully. Because of the restart: always directive in docker-compose.yml, I expected harbor to automatically start up after rebooting.

Note about my setup: I'm running harbor behind nginx as a reverse proxy, as explained in #3114 (main changes: port of proxy service, public_url fixup in prepare script, X-Forwarded-Proto changes as detailed in troubleshooting documentation).

Steps to reproduce the problem:

  • restart docker
  • observe docker ps -a and see that only some containers reach Up status

Manually starting harbor via docker-compose -f ./docker-compose.yml -f docker-compose.clair.yml up -d works fine.

  • harbor version: 1.7.1
  • docker engine version: 5:18.09.2~3-0~debian-stretch
  • docker-compose version: 1.23.2, build 1110ad01

Additional context:

See below for log messages from dockerd. It seems that the ordering is somehow not observed, that is, services that depend on the log service try to start up before syslog is fully up and running, but give up early, when they find that they cannot reach syslog. Docker doesn't appear to be attempting any more restarts in this case.

Feb 23 16:39:17 code dockerd[28575]: time="2019-02-23T16:39:17.457432463+01:00" level=info msg="Loading containers: start."
Feb 23 16:39:19 code dockerd[28575]: time="2019-02-23T16:39:19.534774219+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb 23 16:39:22 code dockerd[28575]: time="2019-02-23T16:39:22.951084811+01:00" level=error msg="5cbeb21200bb05437f9fd599d829babd6ca5d6b946ddbc0986733d85b52ec3ca cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:22 code dockerd[28575]: time="2019-02-23T16:39:22.951114412+01:00" level=error msg="Failed to start container 5cbeb21200bb05437f9fd599d829babd6ca5d6b946ddbc0986733d85b52ec3ca: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:25 code dockerd[28575]: time="2019-02-23T16:39:25.395062463+01:00" level=error msg="942dd05349486b3d1646415572abc126ebce7b31aec54888ea859467b2d0b422 cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:25 code dockerd[28575]: time="2019-02-23T16:39:25.395085466+01:00" level=error msg="Failed to start container 942dd05349486b3d1646415572abc126ebce7b31aec54888ea859467b2d0b422: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:25 code dockerd[28575]: time="2019-02-23T16:39:25.491026866+01:00" level=error msg="371c7e506f09e1f6c1c154d6fcacb862432bd7b4cb831a59d6fdc4b3dac92d87 cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:25 code dockerd[28575]: time="2019-02-23T16:39:25.491056484+01:00" level=error msg="Failed to start container 371c7e506f09e1f6c1c154d6fcacb862432bd7b4cb831a59d6fdc4b3dac92d87: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:26 code dockerd[28575]: time="2019-02-23T16:39:26.670989556+01:00" level=error msg="0420313e7b29e1eb74776495d3ccd6be383ab6c14dc2a4c9c31f770b3f8ac724 cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:26 code dockerd[28575]: time="2019-02-23T16:39:26.671032205+01:00" level=error msg="Failed to start container 0420313e7b29e1eb74776495d3ccd6be383ab6c14dc2a4c9c31f770b3f8ac724: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:27 code dockerd[28575]: time="2019-02-23T16:39:27.771129969+01:00" level=error msg="5dc8e982846d7f339a86ff9e20c937b4ce18b7634d5b28e3b291723dc5403a38 cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:27 code dockerd[28575]: time="2019-02-23T16:39:27.771157135+01:00" level=error msg="Failed to start container 5dc8e982846d7f339a86ff9e20c937b4ce18b7634d5b28e3b291723dc5403a38: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:27 code containerd[871]: time="2019-02-23T16:39:27.791326620+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/71a79f0da980e15c3e425066c0ef83d455d67361140245cfc34d6abdd37913f6/shim.sock" debug=false pid=28994
Feb 23 16:39:28 code containerd[871]: time="2019-02-23T16:39:28.161320952+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/d462cae2d8bc4614fbfddd692706866e301d5e0650cd8cd232929470a40d4c67/shim.sock" debug=false pid=29062
Feb 23 16:39:28 code containerd[871]: time="2019-02-23T16:39:28.874403639+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/5004316792bc356ade270d746be0fe886df634f50ae6bf4730d1696f95632fcc/shim.sock" debug=false pid=29165
Feb 23 16:39:29 code dockerd[28575]: time="2019-02-23T16:39:29.278982945+01:00" level=error msg="9708ab1bfbcfce250d7d6876517792f01e17fdbd682e4f6e80445313ff638de6 cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:29 code dockerd[28575]: time="2019-02-23T16:39:29.279020093+01:00" level=error msg="Failed to start container 9708ab1bfbcfce250d7d6876517792f01e17fdbd682e4f6e80445313ff638de6: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:29 code dockerd[28575]: time="2019-02-23T16:39:29.678990965+01:00" level=error msg="05e5805d46720f24a854a361159cefdbfe31773ccf52c784c08c14a23b8a57d0 cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:29 code dockerd[28575]: time="2019-02-23T16:39:29.679026771+01:00" level=error msg="Failed to start container 05e5805d46720f24a854a361159cefdbfe31773ccf52c784c08c14a23b8a57d0: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:30 code dockerd[28575]: time="2019-02-23T16:39:30.078974849+01:00" level=error msg="ae98ae73d7b252321a86f2bef8c27da5f974f98e74435b9fd35775bf98c9bf84 cleanup: failed to delete container from containerd: no such container"
Feb 23 16:39:30 code dockerd[28575]: time="2019-02-23T16:39:30.079012597+01:00" level=error msg="Failed to start container ae98ae73d7b252321a86f2bef8c27da5f974f98e74435b9fd35775bf98c9bf84: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused"
Feb 23 16:39:30 code dockerd[28575]: time="2019-02-23T16:39:30.079048260+01:00" level=info msg="Loading containers: done."
Feb 23 16:39:30 code dockerd[28575]: time="2019-02-23T16:39:30.115257599+01:00" level=info msg="Docker daemon" commit=6247962 graphdriver(s)=overlay2 version=18.09.2
Feb 23 16:39:30 code dockerd[28575]: time="2019-02-23T16:39:30.115336823+01:00" level=info msg="Daemon has completed initialization"
Feb 23 16:39:30 code dockerd[28575]: time="2019-02-23T16:39:30.139043317+01:00" level=info msg="API listen on /var/run/docker.sock"

Most helpful comment

You can also create a systemd service (see config below). This wil start all Harbor Docker containers using systemd.

[Unit]
Description=Harbor
After=docker.service systemd-networkd.service systemd-resolved.service
Requires=docker.service
Documentation=http://github.com/vmware/harbor

[Service]
Type=simple
Restart=on-failure
RestartSec=5
ExecStart=/usr/local/bin/docker-compose -f {{ harbor_install_path }}/harbor/docker-compose.yml up
ExecStop=/usr/local/bin/docker-compose -f {{ harbor_install_path }}/harbor/docker-compose.yml down

[Install]
WantedBy=multi-user.target

All 23 comments

@sk1p If you see this again. Could you provide the logs under /var/logs/harbor?

@reasonerjt sure. I just restarted docker and the following containers managed to come up:

CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS                             PORTS                       NAMES
3b7ba6282f33        goharbor/nginx-photon:v1.7.5   "nginx -g 'daemon of…"   4 days ago          Restarting (1) 48 seconds ago                                  nginx
17c7ea09eba4        goharbor/harbor-core:v1.7.5    "/harbor/start.sh"       4 days ago          Up 41 seconds (health: starting)                               harbor-core
2e87ab1a6699        goharbor/harbor-db:v1.7.5      "/entrypoint.sh post…"   4 days ago          Up 2 minutes (healthy)             5432/tcp                    harbor-db
9bc9e042fd16        goharbor/harbor-log:v1.7.5     "/bin/sh -c /usr/loc…"   4 days ago          Up 2 minutes (healthy)             127.0.0.1:1514->10514/tcp   harbor-log

In core.log, these messages are repeated (because harbor-adminserver didn't come up):

Apr 14 20:05:45 172.18.0.1 core[24003]: 2019-04-14T18:05:45Z [ERROR] [utils.go:101]: failed to connect to tcp://adminserver:8080, retry after 2 seconds :dial tcp: lookup adminserver on 127.0.0.11:53: no such host
Apr 14 20:05:47 172.18.0.1 core[24003]: 2019-04-14T18:05:47Z [ERROR] [utils.go:101]: failed to connect to tcp://adminserver:8080, retry after 2 seconds :dial tcp: lookup adminserver on 127.0.0.11:53: no such host
Apr 14 20:05:49 172.18.0.1 core[24003]: 2019-04-14T18:05:49Z [ERROR] [utils.go:101]: failed to connect to tcp://adminserver:8080, retry after 2 seconds :dial tcp: lookup adminserver on 127.0.0.11:53: no such host
Apr 14 20:05:51 172.18.0.1 core[24003]: 2019-04-14T18:05:51Z [ERROR] [utils.go:101]: failed to connect to tcp://adminserver:8080, retry after 2 seconds :dial tcp: lookup adminserver on 127.0.0.11:53: no such host

proxy.log repeats this after the restart (makes sense because harbor-portal didn't come up):

Apr 14 20:05:59 172.18.0.1 proxy[24003]: 2019/04/14 18:05:59 [emerg] 1#0: host not found in upstream "portal:80" in /etc/nginx/nginx.conf:20
Apr 14 20:05:59 172.18.0.1 proxy[24003]: nginx: [emerg] host not found in upstream "portal:80" in /etc/nginx/nginx.conf:20

Postgres appears to start up without major problems (though it does complain about unclean shutdown):

Apr 14 20:00:31 172.18.0.1 postgresql[17555]: 
Apr 14 20:00:33 172.18.0.1 postgresql[17555]: Session terminated, terminating shell... ...killed.
Apr 14 20:01:01 172.18.0.1 postgresql[24003]: LOG:  database system was not properly shut down; automatic recovery in progress
Apr 14 20:01:01 172.18.0.1 postgresql[24003]: LOG:  invalid record length at 4/6E13DD28: wanted 24, got 0
Apr 14 20:01:01 172.18.0.1 postgresql[24003]: LOG:  redo is not required
Apr 14 20:01:02 172.18.0.1 postgresql[24003]: LOG:  MultiXact member wraparound protections are now enabled
Apr 14 20:01:02 172.18.0.1 postgresql[24003]: LOG:  database system is ready to accept connections
Apr 14 20:01:02 172.18.0.1 postgresql[24003]: LOG:  autovacuum launcher started

All others stopped logging after the restart, for example in the redis log, the last message appears to be part of process of shutting down:

Apr 14 19:58:22 172.18.0.1 redis[17555]: 8:M 14 Apr 17:58:22.013 * Background saving started by pid 1187
Apr 14 19:58:22 172.18.0.1 redis[17555]: 1187:C 14 Apr 17:58:22.063 * DB saved on disk
Apr 14 19:58:22 172.18.0.1 redis[17555]: 1187:C 14 Apr 17:58:22.064 * RDB: 0 MB of memory used by copy-on-write
Apr 14 19:58:22 172.18.0.1 redis[17555]: 8:M 14 Apr 17:58:22.113 * Background saving terminated with success
Apr 14 20:00:31 172.18.0.1 redis[17555]: sudo: kill(8, SIGTERM): Operation not permitted

I experience the same issue with:

harbor version: 1.7.5
docker engine version: 18.09.5
docker-compose version: 1.24.0, build 0aa5906
OS: CentOS 7.6

"docker container ls -a" shows 4 exited containers after some reboot:
aa8f4ee26790 goharbor/harbor-jobservice:v1.7.5 "/harbor/start.sh" About an hour ago Exited (137) 3 minutes ago harbor-jobservice
860f4e5cd2d8 goharbor/clair-photon:v2.0.8-v1.7.5 "/docker-entrypoint.…" About an hour ago Exited (137) 3 minutes ago 6060-6061/tcp clair
e014632adcce goharbor/redis-photon:v1.7.5 "docker-entrypoint.s…" About an hour ago Exited (137) 3 minutes ago 6379/tcp redis
a7e7bb4f42df goharbor/harbor-adminserver:v1.7.5 "/harbor/start.sh" About an hour ago Exited (137) 3 minutes ago harbor-adminserver

"docker inspect" on harbor-adminserver shows the error:
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 137,
"Error": "failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused",

It seems to be related to start ordering issue. It should not be an issue with the "restart: always" policy.

Actually, the issue seems that this restart policy does not seem to be applied. It's more a docker issue in my opinion.

I have a fix for the logging error: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused.

Configured the logging driver option mode: "non-blocking" inside the docker-compose.yml file (for each container).

    logging:
      driver: "syslog"
      options:
        syslog-address: "tcp://127.0.0.1:1514"
        tag: "registry"
        mode: "non-blocking"

Same issue here:

harbor version: 1.8.1
docker engine version: 18.09.7
docker-compose version: 1.23.2, build 1110ad01
OS: CentOS 7.6

You can also create a systemd service (see config below). This wil start all Harbor Docker containers using systemd.

[Unit]
Description=Harbor
After=docker.service systemd-networkd.service systemd-resolved.service
Requires=docker.service
Documentation=http://github.com/vmware/harbor

[Service]
Type=simple
Restart=on-failure
RestartSec=5
ExecStart=/usr/local/bin/docker-compose -f {{ harbor_install_path }}/harbor/docker-compose.yml up
ExecStop=/usr/local/bin/docker-compose -f {{ harbor_install_path }}/harbor/docker-compose.yml down

[Install]
WantedBy=multi-user.target

You can also create a systemd service (see config below). This wil start all Harbor Docker containers using systemd.

[Unit]
Description=Harbor
After=docker.service systemd-networkd.service systemd-resolved.service
Requires=docker.service
Documentation=http://github.com/vmware/harbor

[Service]
Type=simple
Restart=on-failure
RestartSec=5
ExecStart=/usr/local/bin/docker-compose -f {{ harbor_install_path }}/harbor/docker-compose.yml up
ExecStop=/usr/local/bin/docker-compose -f {{ harbor_install_path }}/harbor/docker-compose.yml down

[Install]
WantedBy=multi-user.target

thanks, this works for me.

I have a fix for the logging error: failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused.

Configured the logging driver option mode: "non-blocking" inside the docker-compose.yml file (for each container).

    logging:
      driver: "syslog"
      options:
        syslog-address: "tcp://127.0.0.1:1514"
        tag: "registry"
        mode: "non-blocking"

it doesn't work.
docker version : Docker version 1.13.1, build dded712/1.13.1
docker-compose version: docker-compose version 1.9.0, build 2585387
harbor: v1.5.1
OS: CentOS Linux release 7.3.1611 (Core)

docker-compose uses "depends_on" to make sure the logging container is started before the others.
The "restart: always" setting ensures the docker daemon starts the containers again after a restart, but it has no notion of any dependencies. So all containers are started at the same time, but since the syslog service is not reachable, some of the containers fail to start up, and the daemon doesn't try to start them again afterwards.

Using "depends_on" seems to be a poor way to ensure a correct startup procedure, since the docker daemon doesn't support it.

So the fix is to either get rid of the syslog logging (which is quite confusing anyway, tbh) and just use "docker logs" to view log entries, or add a systemd unit as a workaround (as mentioned above)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Having same issue as well. Adding mode: "non-blocking" everywhere didn't fix issue :(

Adding a systemd unit works for docker restart but doesnt work after docker stop and start

Came here after admin UI failed to load after the VM hosting harbor was restarted. Having this same issue in v 1.10.1. container for nginx-photon continually restarts and never gets to a healthy state.

I have v 1.10.1 and I have the issue where only harbor-portal and harbor-log containers will start after vm reboot. The systemd approach outlined by ryan4yin works.

Even v2.0.0 has this issue. After a reboot of the server only half of the containers are restarted. Running "docker-compose start" manually again is a possible workaround.

Wow. This broke our production servers 3 times now.

I will try @rvanbutselaar solution, but we had large unavailabilities after a simple docker update due to the containers not restarting correctly.

I'm not sure how high this issue is in your priorities, but this is really a big problem in production. We have to connect manually and issue a docker-compose up ourselves each time the docker daemon is updated automatically through security updates.

I have the same problem,In the end ,I find that the harbor-log container is not working properly; So it is normally after replacing the log containers!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I believe this problem still exists.

This problem still exists in the latest release

Yes, the 20.10 version of docker tonight broke our harbor because it didn't restart correctly.

Also, @rvanbutselaar 's solution is not working for us: https://github.com/goharbor/harbor/issues/7008#issuecomment-507665340

I had an idea reading the Docker doc:

https://docs.docker.com/config/containers/live-restore/

Maybe if we configure our daemon to not stop the containers we'll "skip" the issue.

@rvanbutselaar 's solution still works for me (#7008 (comment))

Without it, I still have the error "failed to initialize logging driver: dial tcp 127.0.0.1:1514: connect: connection refused" randomly. The systemd unit ensures a start of all the services with the correct dependencies.
@Gui13, are you sure your unit has correctly started ? What is your output of "systemctl status " ?

Mine is:

â—Ź harbor.service - Harbor Container Registry
   Loaded: loaded (/etc/systemd/system/harbor.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-12-16 02:28:02 CET; 13h ago
 Main PID: 4260 (docker-compose)
    Tasks: 13
   Memory: 43.9M
   CGroup: /system.slice/harbor.service
           └─4260 /usr/bin/python2 /usr/bin/docker-compose -f /opt/harbor/docker-compose.yml up

Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: harbor-db        | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: registry         | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: redis            | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: harbor-portal    | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: harbor-core      | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: nginx            | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: harbor-jobservice | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: clair            | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: clair-adapter    | WARNING: no logs are available with the 'syslog' log driver
Dec 16 02:28:06 lnx-gva310.gva.dts.corp.local docker-compose[4260]: registryctl      | WARNING: no logs are available with the 'syslog' log driver

My config is:
docker v19.03.6
docker-compose v1.24.0
harbor v2.1.1

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mramanathan picture mramanathan  Â·  3Comments

protochron picture protochron  Â·  3Comments

cten picture cten  Â·  3Comments

Hotege picture Hotege  Â·  3Comments

pingcrosby picture pingcrosby  Â·  3Comments