Hi all, I have a problem using cadvisor on CentOS 7. While cadvisor is running, docker fails to remove other containers, saying that the container's filesystem is busy. After cadvisor is stopped, container removal works again.
I demonstrated that in this gist: https://gist.github.com/cornelius-keller/0fd2d23b68ccd88c9328
I also included the OS version and docker info in the gist.
Thanks for reporting, @cornelius-keller
What cadvisor version are you running? Can you get host:port/validate for cadvisor?
Is this a temporary situation, or does the container fs stay busy until you delete cadvisor?
@rjnagal
Cadvisor version is:
[root@583274-app35 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
docker.io/google/cadvisor latest 399ae3c46a0e 47 hours ago 19.89 MB
[root@583274-app35 ~]#
This is a permanent situation. The container fs stays busy until I delete cadvisor.
What do you mean by getting host:port/validate for cadvisor? Cadvisor was still running and responsive on the web ui if that is what you mean. Unfortunately I can't give you any public host port to validate as cadvisor is only exposed via a vpn.
Yeah, I just need the output from the /validate endpoint on the cadvisor UI. You can scrub any data that's private in there. Thanks
Sorry, it was a long day; I did not realize that this was an endpoint. I added the output to the gist.
I am facing this same issue. Essentially, running cadvisor with --volume=/:/rootfs:ro causes other containers' devicemapper mounts to be mounted inside the cadvisor container, so they can't be properly destroyed when issuing docker rm on the target container as they will appear in use.
How can this be solved?
When I run it on Fedora 21, it works fine. But when I run it on Ubuntu 14.04.2 LTS, I get the same error as described above.
Error response from daemon: Cannot destroy container xxx_jenkinsMaster_1230: Driver aufs failed to remove root filesystem 13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d: rename /var/lib/docker/aufs/mnt/13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d /var/lib/docker/aufs/mnt/13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d-removing: device or resource busy
The main difference is that Ubuntu uses AUFS, whereas Fedora uses devicemapper. Maybe that's the problem.
@rjnagal I can confirm that this issue happens on Ubuntu trusty x64 with Docker 1.8.1, cadvisor:latest and devicemapper.
'1cb6051b30a1' being the container ID.
# grep -l 1cb6051b30a1 /proc/*/mountinfo
/proc/1963/mountinfo
# ps aux | grep -i 1963
root 1963 1.9 0.8 588740 71688 ? Ssl Aug26 30:08 /usr/bin/cadvisor
root 14767 0.0 0.0 11744 952 pts/0 S+ 00:56 0:00 grep --color=auto -i 1963
Please suggest a workaround for this.
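The mountinfo check above can be wrapped in a small helper. This is only a sketch: the function name mount_holders and its optional second argument (an alternate proc root, useful for trying it against a copy) are assumptions made for illustration and are not part of Docker or cAdvisor.

```sh
#!/bin/sh
# mount_holders ID [PROC_ROOT]
# Print "PID COMMAND" for every process whose mount table still
# references the given container ID. PROC_ROOT defaults to /proc.
mount_holders() {
    id=$1
    proc_root=${2:-/proc}
    for info in "$proc_root"/[0-9]*/mountinfo; do
        # Skip entries that vanished or that we are not allowed to read
        [ -r "$info" ] || continue
        if grep -qF -- "$id" "$info" 2>/dev/null; then
            pid_dir=${info%/mountinfo}
            pid=${pid_dir##*/}
            # comm holds the short command name; fall back to "?" if unreadable
            comm=$(cat "$pid_dir/comm" 2>/dev/null || echo "?")
            echo "$pid $comm"
        fi
    done
}

# Example (run as root so every process is readable):
#   mount_holders 1cb6051b30a1
```

If the cadvisor process shows up in the output, its mount namespace still holds the container's mount, which matches the grep result above.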
Same here with CentOS + Docker 1.8.1 (devicemapper).
Had to remove --volume=/:/rootfs:ro and --volume=/var/lib/docker:/var/lib/docker:ro
@rjnagal: Excepting disk usage calculation, cAdvisor does not poke at any of these directories, right?
Same problem here with Ubuntu 14.04.3.
@difro's solution works, but cadvisor can't provide docker stats anymore.
Any workaround?
The last time I ran into this problem, I dug a little into the cAdvisor source code. I'm not 100% sure, because it was a few weeks ago, but this is essentially the gist:
If you use cAdvisor as shown in README.md, you'll mount /var/lib/docker as a volume into the container. This will create dead containers.
The reason cAdvisor wants you to mount /var/lib/docker is, as far as I could see, only to display certain info that is only interesting for admins and should be known beforehand.
We should be able to get all info from a docker inspect rather than parsing the container config file. Seems like mounting /var/lib/docker is causing more trouble than it's worth.
We also encounter the same problem (cadvisor:latest, Ubuntu 14.04).
Any updates regarding this?
The best we can do for now is to let users optionally disable filesystem
usage metrics. We are waiting for some of the new upstream kernel features
to simplify disk accounting.
Same situation.
My Docker version is 1.9.1
cAdvisor version is 0.18.0
When docker rm fails on a container, the status of that container changes to "dead".
Is it possible to unmount that specific mountpoint when the container status changes to "exit" or "dead"?
+1
cAdvisor doesn't mount anything. It runs du periodically to collect
filesystem stats. Other than that, it does not touch the container's
filesystem at all.
The easy fix for this would be to retry docker deletion or disable
filesystem aggregation in cadvisor.
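A retry wrapper for the deletion could look like the sketch below. The function name retry and its argument order are assumptions for illustration. Whether a retry ever succeeds depends on the host: other reports in this thread describe the busy state as permanent for as long as cadvisor is running.

```sh
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (the container name "test" is a placeholder):
#   retry 5 2 docker rm test
```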
Running cAdvisor without --volume=/:/rootfs:ro seems to fix it, as pointed out in https://github.com/google/cadvisor/blob/master/docs/running.md
I haven't fully tested it yet, but it works fine so far.
Upgraded docker to 1.10.3 and now cAdvisor can only see the docker images, but no containers, if I only use volume mounts:
If I add /:/rootfs:ro, cAdvisor can see the containers, but I get device or resource busy, when trying to remove any container.
@xbglowx Are you using the latest cadvisor release?
Using google/cadvisor:v0.22.0
Any ideas or suggestions on how I can dig into the issue?
cc @timstclair
I was able to reproduce this locally with docker v1.9.1 and cAdvisor 0.22.0, but only right after starting cAdvisor and only once (removing a second container works). I could not reproduce with docker v1.11.
Is this consistent with everyone else's experience?
With docker 1.11.1 the issue is gone. With the latest fixes on the docker side, it seems to be working now.
I'm still able to reproduce this with docker 1.11.1 and cAdvisor 0.23.0. Ubuntu 14.04.
@ashkop Can you try running cAdvisor with --disable_metrics="tcp,disk" and see if that resolves the issue? Note that you will not get docker container filesystem metrics by adding this flag.
If I try using --disable_metrics="tcp,disk" I get the following:
sudo docker run -ti -v /var/lib/docker/:/var/lib/docker:ro -v /var/run:/var/run:rw -v /sys:/sys:ro -v /:/rootfs:ro google/cadvisor --disable_metrics="tcp,disk"
panic: assignment to entry in nil map
goroutine 1 [running]:
panic(0xb0c8c0, 0xc8201c0440)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
main.(*metricSetValue).Set(0x15ac528, 0x7ffe3cea1f59, 0x8, 0x0, 0x0)
/go/src/github.com/google/cadvisor/cadvisor.go:85 +0x1da
flag.(*FlagSet).parseOne(0xc82004e060, 0xc82005e901, 0x0, 0x0)
/usr/local/go/src/flag/flag.go:881 +0xdd9
flag.(*FlagSet).Parse(0xc82004e060, 0xc82000a100, 0x2, 0x2, 0x0, 0x0)
/usr/local/go/src/flag/flag.go:900 +0x6e
flag.Parse()
/usr/local/go/src/flag/flag.go:928 +0x6f
main.main()
/go/src/github.com/google/cadvisor/cadvisor.go:99 +0x68
This is with cAdvisor version 0.23.0 (750f18e). Works fine with 0.22.0.
I still need to see if using --disable_metrics="tcp,disk" fixes the problem.
Yeah, that was fixed in https://github.com/google/cadvisor/pull/1259, but it's not integrated into any release.
@vishh Unfortunately the flag didn't help. As @xbglowx mentioned, this option causes 0.23.0 to crash, so I tried 0.22.0 and canary. Both still prevent me from removing containers. Here's the error message I get:
Error response from daemon: Unable to remove filesystem for 9e96817fba0a443f75d1426b6d7a586f4bc84217b06eb021f6d28bae4f341473: remove /var/lib/docker/containers/9e96817fba0a443f75d1426b6d7a586f4bc84217b06eb021f6d28bae4f341473/shm: device or resource busy
Same here on Debian 8, Docker 1.11.1 and latest cAdvisor.
@timstclair Can we make a v0.23.1 release with the fix for --disable_metrics flag?
I am experiencing the same issue with the following versions
"cAdvisor version: 0.23.0-750f18e"
google/cadvisor latest 5cda8139955b 8 days ago 48.92 MB
CentOS Linux release 7.2.1511 (Core)
Docker version 1.11.1, build 5604cbe
The workaround was to remove /var/lib/docker from the shared volumes.
@vishh Is this fixed if we just stopped tracking disk metrics for these machines? Are there other dependencies?
@rjnagal Disk metrics should be the only dependency. Disabling that by using --disable_metrics=tcp,disk should fix this issue.
Can we do that by default when we detect devicemapper?
@rjnagal AFAIK, it is not limited to devicemapper alone. AUFS is also affected. If we need a default solution, we will have to disable per-container disk metrics by default.
The issue persists in v0.23.1 on CentOS7, Docker 1.10.1, devicemapper
docker run \
--rm \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
google/cadvisor:v0.23.1 \
-docker_only \
--disable_metrics="tcp,disk"
To add more info - the issue persists on v0.23.1 and v0.23.2 on CentOS7, Docker 1.11.1, devicemapper.
However the issue only occurs when cadvisor is run from docker.
Running cadvisor directly on CentOS7 works without issues.
Could you add more details about your repro steps? How many containers are you running, with what options? It would help if we could reproduce from a clean VM centos image.
I tried to reproduce it on a fresh VM, but failed. I'll try to find the difference that is actually causing the issue. Meanwhile, I ran lsof inside the cadvisor container on the file that is being blocked. Here's what I got:
1 /usr/bin/cadvisor pipe:[70918923]
1 /usr/bin/cadvisor pipe:[70918924]
1 /usr/bin/cadvisor pipe:[70918925]
1 /usr/bin/cadvisor socket:[70919220]
1 /usr/bin/cadvisor anon_inode:[eventpoll]
1 /usr/bin/cadvisor anon_inode:inotify
1 /usr/bin/cadvisor socket:[70919240]
I also noticed that the issue occurs only if I start cadvisor after my own containers. If cadvisor is the first one started, then I can restart my containers without any issue.
@ashkop That's actually correct. I tried to reproduce the error, but couldn't. If the other containers are started first, only then cadvisor blocks removal.
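To see which containers fall into the "started before cadvisor" group, the start times reported by docker inspect can be compared. The sketch below is an illustration only: started_before is a hypothetical helper, it expects "NAME STARTED_AT" lines in the form printed by docker inspect --format '{{.Name}} {{.State.StartedAt}}', and it compares timestamps at whole-second resolution on the assumption that they are all in UTC.

```sh
#!/bin/sh
# started_before REFERENCE_NAME
# Reads "NAME STARTED_AT" lines on stdin and prints the names of the
# containers that started earlier than REFERENCE_NAME.
started_before() {
    awk -v ref="$1" '
        # Drop the leading "/" docker puts on names; keep YYYY-MM-DDTHH:MM:SS
        { sub(/^\//, "", $1); ts[$1] = substr($2, 1, 19) }
        END {
            if (!(ref in ts)) exit 1
            for (name in ts)
                if (name != ref && ts[name] < ts[ref]) print name
        }
    ' | sort
}

# Example:
#   docker inspect --format '{{.Name}} {{.State.StartedAt}}' $(docker ps -q) \
#       | started_before cadvisor
```

Going by the observations above, the containers it lists would be the ones at risk of failing to be removed.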
Here's a script to replicate the error on CentOS 7.
You will need a machine with an empty block device (just replace the path to the device in DOCKER_DATA_DISK). The script sets up docker with devicemapper through LVM's thin-pool, runs a container, then cadvisor, and then stops and removes the first container.
#!/bin/bash
DOCKER_DATA_DISK=/dev/vdb
set -exo pipefail
setenforce Permissive
yum update -y
yum install -y lvm2
systemctl enable lvm2-lvmetad
systemctl start lvm2-lvmetad
pvcreate $DOCKER_DATA_DISK
vgcreate data $DOCKER_DATA_DISK
lvcreate -l 100%free -T data/docker_thin
curl -sSL https://get.docker.com/ | sh
mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF > /etc/systemd/system/docker.service.d/docker-lvm.conf
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// \
-s devicemapper \
--storage-opt dm.thinpooldev=/dev/mapper/data-docker_thin
TimeoutStartSec=3000
EOF
systemctl daemon-reload
systemctl enable docker
systemctl start docker
sleep 3
docker run \
--name=test \
-d \
debian:jessie \
/bin/sh -c "while true; do foo; sleep 1; done"
docker run \
-d \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--name=cadvisor \
google/cadvisor:v0.23.1 \
-docker_only \
--disable_metrics="tcp,disk"
docker stop test
docker rm test
The output is:
... some data ...
+ docker stop test
test
+ docker rm test
Error response from daemon: Unable to remove filesystem for 7d7513b0c3310f26e7425728f9c34e219db53a5e4dbb6e0e4259c2e6eb760044: remove /var/lib/docker/containers/7d7513b0c3310f26e7425728f9c34e219db53a5e4dbb6e0e4259c2e6eb760044/shm: device or resource busy
On Ubuntu 14.04, using --disable_metrics="tcp,disk" still does not fix the problem. I've confirmed @ashkop 's observation: If cAdvisor is started after another container, then removing said container fails.
To get around this issue I have tried running cadvisor standalone. However, it does not get data on RHEL;
cadvisor complains "unable to get fs usage from thin pool for device". It seems it can't get the right information about the storage driver.
Using RHEL 7.1
version 0.23.3 (6607e7c)
docker 1.9.1
Has anybody tried something similar?
This issue is hitting us often and affecting production container deployments (Debian 8.5 hosts, Docker 1.11.1).
Can anyone spell out what we lose by omitting the /:/rootfs:ro mount? Is it just disk usage metrics?
AFAIK, it should be just the disk usage metrics
So, is it possible to stop cadvisor before stopping/starting any other containers and then start cadvisor again?
cadvisor should be the first container to start.
One should not have to worry about starting/stopping containers in order to properly run cAdvisor. Monitoring should have no effect on the running of containers.
100% agreed. I'm just saying that, to work around the issue, you can start cAdvisor before other containers.
But once cAdvisor is monitoring the other containers, you are not able to remove one of the monitored ones until you remove cAdvisor. At least that happened to me a lot, so now I stop cAdvisor, update the containers, and start it again. Am I doing this wrong?
I ran into this issue as well. Removing the /:/rootfs:ro volume works around the issue for me, but I do lose some stats... looks like network inside the containers, and process lists... maybe others that I haven't noticed right off.
Docker 1.11.2, cAdvisor 0.23.1
I am able to confirm that with cAdvisor loaded first, any containers loaded afterwards can be removed without the 'device or resource busy' error.
Wanted to add that removing the root volume (/:/rootfs:ro) did not solve this issue for us. We ended up removing cadvisor from our deployment ecosystem until this issue is resolved as it was causing too much pain in our deployment scheme.
upgrading to Docker 1.12 also did not make any difference
Removing the root volume did not make any difference for me. Still blocking containers from removal.
docker 1.11.1
Going to remove cAdvisor from all systems as it is blocking my deployment.
We have this issue too in our production environment.
It's very frustrating because it blocks our upgrade process.
We use Debian 8, Docker 1.10.3 and cadvisor 0.23.2.
I've opted to take cadvisor down while deploying/removing containers and then bring it up again. I don't like it much, but it works.
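That stop, deploy, start sequence can be scripted so cadvisor comes back even when the deployment step fails. This is a minimal sketch under stated assumptions: the wrapper name with_cadvisor_stopped, the container name "cadvisor", and the DOCKER and CADVISOR_NAME override variables are all hypothetical choices for illustration.

```sh
#!/bin/sh
# with_cadvisor_stopped COMMAND [ARGS...]
# Stop the cadvisor container, run COMMAND, then start cadvisor again
# even if COMMAND fails. Returns COMMAND's exit status.
with_cadvisor_stopped() {
    docker_bin=${DOCKER:-docker}
    name=${CADVISOR_NAME:-cadvisor}
    "$docker_bin" stop "$name" >/dev/null || return 1
    rc=0
    "$@" || rc=$?
    # Bring monitoring back regardless of how the command went
    "$docker_bin" start "$name" >/dev/null
    return "$rc"
}

# Example (the container name "test" is a placeholder):
#   with_cadvisor_stopped docker rm -f test
```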
I ended up just running cAdvisor on the host instead of as a container, and it is working well that way.
Hello Chadxz,
Did you build the latest release or master? I have the impression that the current master branch has issues, or else I'm doing something wrong.
../../../golang.org/x/oauth2/jws/jws.go:67:17: error: reference to undefined identifier ‘base64.RawURLEncoding’
return base64.RawURLEncoding.EncodeToString(b), nil
^
../../../golang.org/x/oauth2/jws/jws.go:85:16: error: reference to undefined identifier ‘base64.RawURLEncoding’
return base64.RawURLEncoding.EncodeToString(b), nil
^
../../../golang.org/x/oauth2/jws/jws.go:105:16: error: reference to undefined identifier ‘base64.RawURLEncoding’
return base64.RawURLEncoding.EncodeToString(b), nil
^
../../../golang.org/x/oauth2/jws/jws.go:116:25: error: reference to undefined identifier ‘base64.RawURLEncoding’
decoded, err := base64.RawURLEncoding.DecodeString(s[1])
^
and it goes on like that.
We too removed cadvisor from our dev systems as it created 'dead' containers when we tried to remove others.
Evert, which tool have you moved to for monitoring containers?
@EvertMDC I downloaded the prebuilt binary from the latest stable release on the releases page https://github.com/google/cadvisor/releases/tag/v0.23.2
Thanks @chadxz . I overlooked that.
Hello @zevarito. None at the moment, but I have used the container exporter images (https://github.com/docker-infra/container_exporter) and they worked fine. However, they suggest using cadvisor instead, as container exporter is no longer maintained.
Going to run it on the system itself now as Chadxz suggested and see how it goes.
Hi Evert,
I do everything with cAdvisor, but I will stop monitoring the host itself with cAdvisor and just monitor the containers. For the host, I think Node Exporter should be the safest bet.
:+1: to node exporter plus cadvisor. That's what we use & very happy with that combo.
Hey @chadxz, have you continued to have success running cadvisor on the host directly?
I am facing the same problem in production when running cadvisor in a container. It's hard to validate in a short period of time whether the bare-metal approach fixes this bug when it appears so rarely.
@jaybennett89 cadvisor running on host has been working fine. The "Unable to remove filesystem" error never occurs.
I suggest adding dm.use_deferred_removal=true and dm.use_deferred_deletion=true to /etc/docker/daemon.json as a possible workaround:
{
"live-restore": true,
"storage-driver": "devicemapper",
"storage-opts": [
"dm.use_deferred_removal=true",
"dm.use_deferred_deletion=true"
]
}
Does the new release have something to do with this bug? The description suggests it does.
No, the issue was https://github.com/google/cadvisor/issues/1461. I'll update the release notes.
@rhuddleston I tried using dm.use_deferred_removal=true and dm.use_deferred_deletion=true. The "resource busy" error is still thrown, but containers that previously ended up in the Dead state are now getting removed. Is it the same for you?
I noticed something different the other day. My lvm volume couldn't be removed until I stopped the cadvisor container. This must be related.
I had the same problem using docker-compose. As a workaround (as has been mentioned before), I removed my cAdvisor service from the compose file and started the container manually with plain docker before anything else.
Also, I had to connect this container to the same network as my compose services using docker network connect default cadvisor; this way my services can now see the container. I can now restart my services without running into this nasty error.
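Put together, that ordering might look like the sketch below. It is an assumption-laden illustration rather than a tested recipe: start_stack and run are hypothetical helpers, the volume list is copied from the cadvisor commands posted earlier in this thread, and the network name has to be whatever docker network ls shows for the compose project.

```sh
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_stack NETWORK
# Start cadvisor on its own first, then the compose services, then attach
# cadvisor to the compose network so the services can reach it.
start_stack() {
    network=$1
    run docker run -d --name=cadvisor \
        --volume=/:/rootfs:ro \
        --volume=/var/run:/var/run:rw \
        --volume=/sys:/sys:ro \
        --volume=/var/lib/docker/:/var/lib/docker:ro \
        google/cadvisor &&
    run docker-compose up -d &&
    run docker network connect "$network" cadvisor
}

# Example: preview the commands without running them
#   DRY_RUN=1; start_stack myproject_default
```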
The workaround that always works, in our experience, is running cAdvisor on the host as a process rather than as a docker container. We have it running in production without any incident.
Same problem for us too... We had to stop using cAdvisor in production until this is fixed.
The workaround of installing cAdvisor directly on the host is not possible for us.
Hey,
Any news on this issue? I'm still stuck with this:
The "device is busy" bug appears only if cadvisor is started after the other containers we want to manage (restart, remove, ...).
I tried all the workarounds, but I'm not completely satisfied:
Remove these two volumes
- /:/rootfs:ro
- /var/lib/docker/:/var/lib/docker:ro
Problem: We lose most container metrics...
Start cadvisor first
This is the easiest workaround to put into practice, but it is not really convenient or scalable...
Stop all containers, start cadvisor, restart all containers:
#!/bin/sh
# Remember which containers are running so that only those are restarted
RUNNING=$(docker ps -q)
# Stop the running containers
docker stop $RUNNING
# Start cadvisor
docker run \
-d \
--name=cadvisor \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--net logs_back-tier \
google/cadvisor
# Restart the containers that were running before
docker start $RUNNING
This is very slow if you have a lot of containers... and it does not work well if you use docker-compose and depends_on for start ordering...
Is there any documentation or advice on this? It is really painful with docker to link a container with a service on the host (you need to find the host IP for the container network, ...). If someone has anything on this, I would appreciate it.
That's it ;) The best would be to fix this issue in a future version, as it seems to be fairly critical and reproducible.
Thanks in advance!
I haven't seen any solution to this yet, apart from installing cadvisor on the host itself without a container. But your machines have to support that.
It's a difficult one to solve in my opinion since cadvisor actually works. It's only in the runtime environment of docker that issues arise.
I noticed today that my systems were using devicemapper, so I switched them to aufs. So far I have not had any issue removing containers that were started before the cAdvisor container. Using all volumes, too... docker 1.11.1 / aufs / cAdvisor 0.24.1
Seems like (f)statfs are the problem, according to docker at least:
https://docs.docker.com/engine/admin/troubleshooting_volume_errors/
This is no longer an issue for me, since I switched to:
None of the workarounds above were working for me. Like @xbglowx, the issue has been solved after upgrading the kernel (from 3.16 to 4.9).
@viossat: which storage driver are you using?
It switched from aufs to overlay2 by itself after the kernel upgrade (Docker 17.04.0-ce).
(overlay is in the mainline from kernel 3.18 and overlay2 is supported from 4.0)
I got a similar issue with prometheus node_exporter as well:
https://github.com/prometheus/node_exporter/issues/602
It seems that bind mounting a path that includes /var/lib/docker makes the mount namespace leak.
Both are resolved by running them on the host directly.
@keyolk yeah, with node_exporter it is even worse. I'd suggest moving it to the host (as the developers say) to avoid any trouble.
@zevarito
I think it can be mitigated if I can specify the exact volumes to be used by the container.
What I mean is simply keeping /var/lib/docker/devicemapper from being mounted.
Could you tell me exactly which host data it uses?
Same issue with my system:
OS: ubuntu 14.04 LTS
Kernel: 3.13.0-48-generic
Docker: 17.04.0-ce
Got this issue when running cadvisor v0.26 with docker (even cadvisor:latest). Everything seems OK with node_exporter.
@jindov Try to upgrade your kernel, you need to switch to the overlay driver. See my previous comment.
Thanks @viossat, I will try to upgrade in the dev env, but in the prod env we can't do this, so I decided to run it on the host directly. It works well.
How to run cadvisor on the host directly?
Gary
You can use supervisord to run it directly. This is my configuration for running cadvisor:
[program:cadvisor]
directory=/build/metric_exporter/cadvisor/src/github.com/google/cadvisor
command=/build/metric_exporter/cadvisor/src/github.com/google/cadvisor/cadvisor -port 9080
autostart=true
autorestart=unexpected
redirect_stderr=true
environment=GOROOT="/usr/local/go",GOPATH="/build/metric_exporter/cadvisor",PATH="/build/metric_exporter/cadvisor/bin:/usr/local/go/bin:%(ENV_PATH)s"
Jin
Same problem with RHEL 7.4, Docker 17.06.2. Doesn't matter if I'm using ZFS or Overlay2.
Any solution for this by now? Or just run cAdvisor directly on the host?
Hope this helps someone else:
Ubuntu 16.X (kernel 4.4.X) and Docker 1.11.2 w/ AUFS works fine.
Ubuntu 14.X (kernel 3.13.X) and Docker 1.11.2 w/ AUFS exhibits the problem.
So, it looks like overlay isn't necessary; a kernel upgrade is all that's required.
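A host can be checked against these data points with a small version comparison. This is a hedged sketch: kernel_at_least is a hypothetical helper, and the 4.0 threshold in the example is only an assumption drawn from the reports in this thread (3.10, 3.13 and 3.16 affected; 4.4 and 4.9 not), not a documented boundary.

```sh
#!/bin/sh
# kernel_at_least RELEASE MAJOR MINOR
# Succeeds if a kernel release string such as "3.13.0-48-generic"
# is at least MAJOR.MINOR.
kernel_at_least() {
    release=$1
    want_major=$2
    want_minor=$3
    major=${release%%.*}
    rest=${release#*.}
    # Keep only the leading digits of what follows the first dot
    minor=${rest%%[!0-9]*}
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example: warn on kernels older than 4.0
#   kernel_at_least "$(uname -r)" 4 0 || echo "kernel older than 4.0" >&2
```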
I'm having this issue (can't remove container) on latest version of cadvisor on Centos 7 with kernel version:
$ uname -r
3.10.0-514.26.2.el7.x86_64
Unfortunately I can't upgrade the kernel version, as it is provisioned by our infra team. And we can't upgrade the OS ourselves.
I bypassed this issue with systemctl restart cadvisor; after that, docker rm <container id> worked.