Netdata: Multi-host Support

Created on 1 Apr 2016 · 58 Comments · Source: netdata/netdata

I have been looking at #10 and other issues where Docker is discussed. Isolating and combining at the same time, to visualize internal container stats alongside the host machine, can be difficult and confusing. One option would be to support multiple hosts in a single instance by specifying the ROOT path for each. The default for a single, independent machine can stay the way it is right now. All the data would then be encapsulated, or namespaced, per host.

This will allow visualizing certain cases that can't easily be done right now. For instance, someone has a few Nginx instances running in Docker containers. The current solution, mounting their logs at a specific location inside the netdata container, can only support one instance of Nginx. Multi-host config support would allow monitoring each Docker container separately, along with the host machine, or even remotely mounted hosts, right from a single instance of netdata.

This can also be helpful for shared hosting providers that have multiple user instances and would like to monitor individual instances separately.

Most helpful comment

Recent netdata development allows monitoring ephemeral containers: containers are added and removed, and netdata follows closely, adding and removing charts (including their alarms).

netdata can also be centralized, meaning that instances of it may be running inside containers to collect application metrics from hosts, which are streamed in real-time to a central netdata (probably running at the host, or elsewhere). When netdata runs in headless mode, it needs just 5MB of RAM.

Finally, netdata is able to monitor cgroups used for systemd services. Actually it can monitor any cgroup based technology (e.g. even ubuntu snaps).

I think we can close this issue now.

All 58 comments

@bigfoot90 Interesting

There is already a request to monitor all containers from the host #91
If we do this right, all system metrics (CPU, memory, disk, etc) will be available to a netdata running at the host (or running in a container but with access to the host's /proc and /sys).

There are also a few enhancements to be made to apps.plugin, possibly grouping charts per user, per group and probably per cgroup #146
This will provide additional information regarding the apps running in each container.

What remains is to find a solution to collect values from the applications themselves. You say something about logs. This is not done by parsing their logs. Netdata plugins talk to dedicated interfaces the applications expose with key performance metrics.

So, let's say that you have 10 nginx, or 10 mysql, or 10 apache, or 10 named, each in a container and you want to collect data for each of them.

A few plugins can already do it: mysql and named as such examples. They can be configured to connect to any number of servers.

The others will have to be extended to collect values from multiple servers.

This could also be a policy for all future plugins. If we write a JMX plugin for example to monitor java, it should be able to connect to multiple java instances, or at least allow us to copy it under different names and run it more than once.

Will this do?
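As an illustration of a plugin polling several instances of the same service, here is a minimal shell sketch that parses nginx stub_status output. The endpoint URLs in the comment are hypothetical; only the standard stub_status layout ("Active connections: N ...") is assumed.

```shell
# Extract the active connection count from stub_status output.
parse_active_connections() {
    # $1: full stub_status text
    echo "$1" | awk '/^Active connections:/ { print $3 }'
}

# Example payload, shaped like the output of nginx's stub_status module:
sample='Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106'

parse_active_connections "$sample"

# A real plugin would loop over its configured instances, e.g.:
#   for url in http://10.0.0.1/stub_status http://10.0.0.2/stub_status; do
#       parse_active_connections "$(curl -s "$url")"
#   done
```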

I would take the same approach Datadog uses. They also have a Docker image for their agent and mount /proc and /sys/fs/cgroup in similar fashion (to get host and per-cgroup stats). For Docker container statistics I'd vote to hook into appropriate TCP or HTTP interfaces (such as the Docker stats API) to get statistics.

For my use case (heavy LXC usage) grouping per cgroup would work well if we were able to find meaningful cgroup names. Many cgroups can be created, whether by users or by systemd itself (systemd creates a cgroup for each instance of multi-instance services).

Those meaningful cgroups could be collected and named (for example LXC - $NAME) by Docker, LXC and other plugins, or the paths could be named manually in a config file (there could also be a toggle button to show all cgroups).

There is also the case of namespaces, especially network namespaces (other types, like PID, user and mount namespaces, do not hide information from the host).

Network namespaces would also have to be somehow linked to LXC and Docker instances to get meaningful names. In the case of veth interfaces it would be nice to show information about the linked peer.

Collecting data from LXC can be done in two ways: either with the command line tools (lxc-ls, lxc-info) or with the liblxc C library.
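For the command-line route, here is a rough sketch of pulling fields out of lxc-info style output. The sample mimics lxc-info's "Key: value" layout, which may vary slightly between LXC versions.

```shell
# Print the value of one field from lxc-info style "Key: value" output.
lxc_field() {
    # $1: field name, $2: lxc-info output
    echo "$2" | awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Sample shaped like `lxc-info -n web1` output (values fabricated):
sample='Name:           web1
State:          RUNNING
PID:            12345'

lxc_field State "$sample"
lxc_field PID "$sample"
```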

It's a very tricky question. For those of you familiar with ganglia-monitor, the way it works is that it has agents on all nodes, which send out data over multicast. One node has a collector (the central node), which aggregates all the data together so you can view metrics for your whole cluster.

There are a few key differences here between traditional VMs or bare-metal machines. You can install whatever you like into any of them - and most likely, all of them are using the same OS.

Problem 1: Docker images can be based on anything: Debian, Ubuntu, Alpine. There is no info about available libraries, etc. (solution: static compilation?).
Problem 2: The only way to extend a container with an agent is docker cp, but you still need to run it somehow - there are entrypoints, systemd, shell scripts (run.sh, like in the netdata docker image). No reliable fix?
Problem 3: Even if you managed to install and run an agent within each container, you would need network connectivity from the central instance (which pulls), or to the central instance (to which it would push metrics). Taking ganglia as an example, multicast is config-less (in the sense that all agents send to the same channel, and netdata listens to that channel). But it does require extending privileges in _each_ existing container.

tl;dr: installing an agent in your existing containers is probably a non-starter.

So, currently, your options are just finding additional metrics on the host (lxc?) and pushing them to netdata. I would advise against exposing additional services/access to the netdata docker instance. For example, if you wanted to map docker stats from within the container, it would give the container access to other docker commands like docker run, docker exec, docker rm/rmi - the security implications are obvious.

What could be done:

Under the /proc and /sys folders there already exist various cgroups interfaces with metrics available for docker. Some more information is available in the docker documentation here. It would be possible to extend the graphs in netdata itself, to map the cgroups filesystems in a more readable way. I'm not sure if all data is available from here (like container names, for example). From what I understand, this is what @andyshinn also suggests here. :+1: on that.
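As a sketch of what reading those interfaces looks like, here is a minimal parser for a memory.stat style payload, as found under e.g. /sys/fs/cgroup/memory/docker/<long-id>/memory.stat on cgroup v1 systems. The sample values are fabricated; a real collector would read the file directly.

```shell
# Print the value of one key from memory.stat contents.
mem_stat() {
    # $1: key (e.g. rss), $2: memory.stat contents
    echo "$2" | awk -v key="$1" '$1 == key { print $2 }'
}

# Sample memory.stat contents (cgroup v1 layout, fabricated values):
sample='cache 2453504
rss 12386304
rss_huge 0
mapped_file 1044480'

mem_stat rss "$sample"
mem_stat cache "$sample"

# Real usage would be something like:
#   mem_stat rss "$(cat /sys/fs/cgroup/memory/docker/<long-id>/memory.stat)"
```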

I see some interest in keeping a central netdata collector node, which would aggregate data from multiple netdata hosts. This would have some impact on individual hosts, as they would have to send this data over the wire all the time. I would configure the collector node to register with the hosts (all the configuration would be in the collector) and then netdata would send it the collected metrics at whatever rate is possible (keeping low latency - tcp, 1-2 seconds? or a configurable, less-realtime amount). This is similar to munin.

Obviously such a collector is currently out of scope for netdata, but I'm sure something similar to the redis MONITOR command could be built into the netdata web APIs (websocket + start receiving all metrics). Such changes are up to @ktsaou. I would vote :+1: on such an endpoint, but one of the main strengths of netdata is its powerful graphing. The guys really put so much effort into it that I doubt it can be replicated very quickly by someone else. So the question is, guys - how about a collector netdata which would collect all the metrics from a cluster of netdata hosts? :)

ok, so I understand the best way to collect cgroup data is using the host's /proc and /sys. Whether netdata runs in a container or not is another story. I understand the best solution is to collect the values directly from the filesystem.

Someone with a docker installation, please post a gist with this:

find /sys/fs/cgroup

lxc does provide its names in the filesystem (as given by @Kubuxu here: https://gist.github.com/Kubuxu/a443014329fbdb3bbd6202e4f90304b2)

If docker does the same, we have the names and it is time to collect data...

@titpetric you raised another question regarding a "central" netdata. You all know I don't believe in this, but I would like to discuss it. So, I'll open another issue to discuss how we can monitor a large number of netdata servers.

As you say, netdata's strength is its speed, but if you take the same chart libraries (they are open source too) and put them on any other back-end, you will not get netdata... The netdata server responds in about 2ms for most chart refreshes. This is why the UI is so snappy (of course I spent quite some time trying to eliminate UI bottlenecks too, but it is both together that give this result).

Here is the gist of cgroups on mine: https://gist.github.com/titpetric/c6460037145e116323ceea31bdc46f50

Also, as a FYI, this is my docker ps -a -q (just docker IDs). Readable names, e.g. "netdata", are not available anywhere under the cgroups folder (not as file/folder names, nor as file contents).

root@serenity:/src/netdata/bin# docker ps -a -q
d88f008a6004
668058ae4159
653c4fe9fd45
ab0e9bb71210
95cb39da38c2
dc6b99c4121a
c19d37217acf
8c02fcc8aa75
d87bf7d0cfce
2f65ea214248
312934ee62f6

The hash d88f008a6004 is located under cgroups (in slightly longer format). Think of it as a git commit (short hash), the ones under cgroups will be longer.
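The prefix relationship between the short and long IDs can be sketched like this (the long IDs below are fabricated for illustration; real cgroup directory names are 64 hex characters):

```shell
# Match a short docker ID (as printed by `docker ps -q`) against the
# long IDs that appear as cgroup directory names: the short form is
# simply a prefix of the long one.
expand_id() {
    # $1: short id, $2: newline-separated list of long ids
    echo "$2" | grep "^$1"
}

# Fabricated 64-character IDs standing in for cgroup directory names:
long_ids='d88f008a60040123456789abcdef0123456789abcdef0123456789abcdef0123
668058ae4159fedcba9876543210fedcba9876543210fedcba9876543210fedc'

expand_id d88f008a6004 "$long_ids"
```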

and how do we find the names?

From within the container, the only way is forwarding the docker command socket, which I'm not advocating. It opens up access to all docker commands.

# docker ps --format="{{.ID}} {{.Names}}"
d88f008a6004 netdata
668058ae4159 dropbox
653c4fe9fd45 nginx-front
ab0e9bb71210 redis
95cb39da38c2 serenity-black-dev
dc6b99c4121a dev
c19d37217acf dnsmasq
8c02fcc8aa75 serenity-black-ext
d87bf7d0cfce phpmyadmin
2f65ea214248 db1
312934ee62f6 samba

This is the only way I know to get the names from the host.

and this is only available as root?

It's also available in the docker group.

tl;dr: installing an agent in your existing containers is probably a non-starter.

Having an agent in every container is a real no-go. It is against the philosophy of microservice architecture, where every container is responsible for only one task, which is especially important in large and complex orchestrated systems and equally applicable in small set-ups. Additionally, injecting another binary into every container (or image) will affect the shareability of the image. Stats should ideally be unobtrusive: externally collected, controlled, and visualized. One reason jQuery was such a big hit was its unobtrusive nature: without littering all the HTML elements with onclick attributes, one can externally bind events to them and bring all the benefits to the table.

I created a charts.d plugin for monitoring all running Docker containers on a host. It does a docker ps to get the container IDs and then a docker inspect to build a dictionary of id/name and id/pid. From there you should have everything you need to dig through /sys/fs and /proc to get whatever stats you want to graph.
I ended up using sudo because adding the netdata user to the docker group didn't seem to work in allowing the netdata user to run docker commands directly.
I'm sure this can be improved upon, but I offer it up as a working proof of concept that might help people out...

https://github.com/davechouinard/docker-netdata-plugin

@davechouinard very nice!

I have already started writing the data collection for containers in C (I thought that was the proper way to do it, since I have seen several installations with dozens of containers running - anything else would be very slow). The only problem I have is getting the docker id-name mappings.

I guess per docker information could also be exported from apps.plugin, that already walks through the entire process tree.

I'll have a look at your work and I'll let you know of my progress.

@ktsaou Let me know if I can help somehow. If system calls are not to your liking (calling "docker ps", "docker inspect"), the only other way would be to build/use a client which talks directly to the docker daemon via the docker socket. There's an implementation available in Go. https://github.com/fsouza/go-dockerclient

Edit: official docs here

Running docker commands to get some information such as id/name mapping would mean that this has to be done on the docker host not inside a container which mounts certain host files, right?

@ibnesayeed technically, no. You can pass docker.sock to a container, but that also exposes the complete docker API to the container. It has security implications, as Docker, to my knowledge, doesn't currently provide an ACL to limit which API calls are available.

The sickest work-around I can think of is to run a minimal Go program on the host, which would be a proxy to the docker socket, exposing only a limited API. This program could register its own socket, which could be securely forwarded to the netdata container. It's currently the only way to have security and at the same time not add too much complexity to a running host/container.

In this case the security implications are a bit more relaxed, as the socket is local (not available on the network), passed to a specific container with -v, and assumed to be secure as it implements a read-only interface.

Well, as I see it, the only info missing from /sys/fs/cgroup is the docker id-name mapping. I don't know what to do yet. Alternatives:

  1. netdata could connect to docker API
  2. netdata could run something with escalated privileges
  3. netdata could just wait for a file with this mapping in /etc/netdata/docker-names.conf, so you can have a cron job to dump this info to this file once every few minutes, or somehow add a hook somewhere in docker to update this file, which netdata will read at runtime to get the docker names - no netdata restart should be required.

No solution is elegant though.
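Solution 3 could be sketched roughly like this. The "id name" per-line format and the file path used here are assumptions for illustration, not a fixed netdata format; the logic operates on captured sample output so it is testable without a docker daemon.

```shell
# Write `docker ps --format "{{.ID}} {{.Names}}"` output to the mapping
# file that netdata would read at runtime.
write_mapping() {
    # $1: docker ps output, $2: destination file
    echo "$1" > "$2"
}

conf=/tmp/docker-names.conf

# Captured sample of `docker ps --format "{{.ID}} {{.Names}}"`:
sample='d88f008a6004 netdata
668058ae4159 dropbox'

write_mapping "$sample" "$conf"

# Look up one name from the mapping file:
awk '$1 == "d88f008a6004" { print $2 }' "$conf"

# The cron job itself could then be as simple as:
#   */5 * * * * docker ps --format "{{.ID}} {{.Names}}" > /etc/netdata/docker-names.conf
```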

For the moment, I plan for solution 3 (the simplest), until we find something better. This would allow netdata to run in a Docker container itself, and it will be your responsibility to find a way to expose the names of the containers. If you fail to do this, you will see docker IDs on the dashboard. Of course, when we reach this point, we can re-evaluate the situation.


There is something more I need to understand and I need your help:

Shall netdata be able to participate in multiple containers?

How does it sound to say that netdata has to run at the host, and certain of its plugins will be able to enter different network namespaces, to also collect data from the apps you run inside the containers.

Something like this, in a more compact form:

ip netns exec CONTAINERID mysql -s -e "show global status;"

The above could expose the performance metrics of a mysql running at CONTAINERID to a netdata server running at the host. Am I right?

Is this behaviour desired?

and another question

I plan to do this Per container. One such section will be shown for every container you have.

  1. CPU

    • user

    • system

  2. Per core CPU utilization

    • core0

    • coreN

  3. Memory

    • cache

    • rss

    • rss_huge

    • mapped files

  4. Memory Writeback
  5. Memory Page Faults

    • in

    • out

  6. Swap
  7. Disk Bandwidth

    • read

    • write

  8. Disk Operations

    • read

    • write

  9. Disk Bandwidth per I/O type

    • sync

    • async

  10. Disk Operations per I/O type

    • sync

    • async

  11. Disk Throttled Bandwidth

    • read

    • write

  12. Disk Throttled Operations

    • read

    • write

  13. Disk Throttled Bandwidth per I/O type

    • sync

    • async

  14. Disk Throttled Operations per I/O type

    • sync

    • async

  15. Disk Queued Operations

    • read

    • write

  16. Disk Queued Operations per I/O type

    • sync

    • async

  17. Disk Merged Operations

    • read

    • write

  18. Disk Merged Operations per I/O type

    • sync

    • async

A lot of charts per container!

Questions:

  1. I think it would be too much to also show per-container and per-disk statistics. Do you agree?
  2. Do you also need a few charts comparing the containers (i.e. CPU Total per container, Memory Total per container, Per Disk Total per container)?

I am thinking of a different approach which is more in line with the original proposal of this ticket, but takes care of some of the Docker container concerns as well.

Let's assume that the netdata config file has namespacing support (according to this request) to isolate multiple virtual or real hosts logically. These namespaces are defined manually. For illustration purposes, allow me to use a YAML config file:

---
name: Some Global Name
history: 3600
other: global configs
hosts:
    this host:
        host access prefix: /path/where/root/of/this host/is/mounted
        other: host specific config
    a remote host:
        host access prefix: /path/where/root/of/a remote host/is/mounted
        other: remote host specific config
    a docker container name/id:
        host access prefix: /path/where/root/of/a docker container/is/mounted
        other: container host specific config
    another docker container name/id:
        host access prefix: /path/where/root/of/another docker container/is/mounted
        other: container host specific config

Say, if a config like this is available, netdata can generate namespaced reports for them. Now, it's the job of the maintainer to customize what they want their config files to look like and to make sure the corresponding paths are mounted in the appropriate places, irrespective of whether netdata is running in a container or on the host machine. The maintainers may choose to combine multiple hosts/containers together, or to split different services of the same host under separate virtual host configs.

This config file can then be generated and periodically updated using a templating system. This way, generation of the config file will not be the responsibility of netdata at all. There are some nice examples on the web where people automatically generate config files for Nginx to dynamically load-balance multiple docker container instances of different services, and this will be no different. This approach even solves the container name issue, and it can grab other metadata for each container. Such a routine can be run on the host machine where the netdata container is running, and the config file can then be mounted inside the netdata container.

Hm... I am not sure I understand your proposal.
Can you give a more detailed example for let's say a host with containers A and B?

Let's assume the name of the host machine is myhost and the two containers are named containerA and containerB. Also, assume that we are going to run netdata on the host machine rather than inside a container for now. This means we are planning to expose all the stats in three namespaces (I will sometimes call these namespaces hosts or virtual hosts).

Let's assume that netdata reads through the config file, obtains a list of different virtual hosts/namespaces, and generates stats for them, treating each as a separate machine/host. Assume that netdata expects a root path for each virtual host/namespace and generates stats accordingly. Under that root path, it can assume a complete host and generate any stats that make sense, utilizing netdata's built-in detection of known common data sources.

Let's assume that the two containers have their /proc and /sys volumes mounted from specific places on the host, and that the host machine has the following directory structure for those mounted volumes:

/
├── scratch
│   ├── containerA
│   │   ├── proc
│   │   └── sys
│   └── containerB
│       ├── proc
│       └── sys
├── proc
└── sys

Now assume the following netdata config file:

---
name: Some Global Name
history: 3600
other: global configs
virtual hosts:
    myhost:
        host access prefix: / # it's default so can be skipped
        other: myhost specific config
    containerA:
        host access prefix: /scratch/containerA/
        other: containerA host specific config
    containerB:
        host access prefix: /scratch/containerB/
        other: containerB host specific config

Now netdata can loop through the objects under the virtual hosts key and generate three namespaces: myhost, containerA, and containerB. And as described before, this config file can be updated by an external tool using some templating system and querying the docker engine.

Suppose containerB runs MySQL, and the corresponding directories are mounted under /scratch/containerB/; then netdata will notice them and generate MySQL-specific stats for containerB automatically.

Additionally, if someone wants to see stats per application, not per container, then they can combine corresponding directories from various related containers in a virtual host/namespace and mount the directories/files accordingly.

Now, consider that netdata itself needs to run inside a separate container; then basically one needs to mount the externally generated config file inside the container, along with the directory structure described above (perhaps read-only).
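A rough sketch of that loop, extracting each virtual host's name and root prefix from a config shaped like the one above. This is naive line matching for illustration only, not a real YAML parser:

```shell
# Print "name<TAB>prefix" pairs from a config shaped like the YAML above.
list_vhosts() {
    # $1: config text
    echo "$1" | awk '
        # a 4-space-indented line introduces a virtual host name
        /^    [^ ]/ { name = $0; sub(/^ +/, "", name); sub(/:.*$/, "", name) }
        /host access prefix:/ {
            prefix = $0
            sub(/^.*host access prefix: */, "", prefix)
            sub(/ *#.*$/, "", prefix)    # strip trailing comments
            print name "\t" prefix
        }'
}

sample='virtual hosts:
    myhost:
        host access prefix: / # default
    containerA:
        host access prefix: /scratch/containerA/'

list_vhosts "$sample"
```

A real implementation would then check for `${prefix}/proc` and `${prefix}/sys` under each returned prefix before starting collection.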

So, you suggest to run one netdata, which should behave as if you were running one for each container.

I see 2 problems with these:

  1. netdata will re-do the work for each container. So the CPU resources used by netdata will be multiplied by the number of containers running (it would be more or less the same with running multiple netdata, one for each container).

If we use the host's /proc and /sys we have all the information we need (except docker names). It is just a matter of organizing this information on the dashboard, and the work will be done just once.

  2. you have not solved the problem of netdata plugins needing to talk to applications running inside the containers. This communication, in most cases, is not filesystem based, but network based.

@ktsaou if container id/name mappings are still wanted, I'd vote for using the docker.sock API. There are two relevant API calls that need to be implemented; the protocol is HTTP, the response is JSON. There is some variance between Docker API versions, so mileage may vary on older Docker hosts.

You will need: list containers
And most likely: inspect a container

Inspect a container is mostly to expand the short Id you get with list containers into a long one. It can be avoided if you'd just like to match it against the cgroups filesystem (you can do this expansion by yourself, but it might be better to rely on the API). Also, some metrics are only exposed by this API, perhaps we should treat cgroups & docker more along the lines of a custom plugin/app, like mysql and named which you mentioned above - and only use the API as a source of truth?
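A sketch of pulling a name out of a /containers/json style response without a JSON library, in the spirit of the nc/sed one-liners elsewhere in this thread. A fabricated sample response stands in for the socket call, and a missing "Names" field is handled gracefully (empty output) rather than erroring:

```shell
# Print the first container name (without the leading "/") from one
# container's JSON, or nothing if there is no Names field.
container_name() {
    # $1: JSON for one container
    echo "$1" | grep -o '"Names":\["/[^"]*"' | head -1 | sed 's/.*\["\///; s/"$//'
}

# Fabricated response fragment shaped like the Docker list-containers API:
sample='{"Id":"d88f008a6004","Names":["/netdata"],"Image":"titpetric/netdata"}'

container_name "$sample"
container_name '{"Id":"abc"}'   # no Names field: prints nothing
```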

Check PR #308
First support for monitoring cgroups. Tomorrow I will try to work around the 2 issues mentioned there, before merging it.

@titpetric I am thinking of adding a shell script that will be called with the id of the container as taken from /sys and should respond with a name for it.

This will de-couple the names from netdata. It will allow us to do whatever we like with names, without having to push a new netdata version.

The script will be configurable at netdata.conf so it would be easy to just supply yours.

How does that sound?

I'm still in favor of opening the docker.sock from standard location.
Usually owned by root.docker (or root.root in some cases). It's one well
defined api call with a json response. Don't see the benefit of an exec
call because: 1) api call wouldn't need to spawn a process, and would be
faster, 2) script would be bundled with netdata anyway? Decoupling in this
case is a weak argument.

There's only the question of non-docker use of cgroups (lxc?) for which I
don't have the info. Obviously in such a case, docker.sock would not exist
but you would have cgroups. But this i think is the minority use case for
now. And in newer versions, cgroups has container names I believe (can't
confirm, someone with lxc care to pitch in?)

P.s. i have the proxy which limits requests to the docker socket already in
testing, so there will be a viable decoupled option to provide docker.sock
to a netdata container safely.


ok. You are right. But this needs more work (different API versions to be tested, JSON parsing to be written in C, the proxy to be tested, etc.).

I suggest to follow this path:

  1. I will move cgroups to a new thread. Right now they are handled together with all the other proc plugins. We need this, since there might be latency in collecting the container names, and if it stays with proc, all the other data collectors will be slowed down too.
  2. netdata will call an external script only when a new container is found. It will not run periodically - just once per container found and never again. If you want to change a name, you will have to restart netdata. I will supply a sample script that, for docker containers, will look up the names in a file (/etc/netdata/cgroups-names.conf) or, if the container in question is not found there, will just shorten the docker names (keep the first 12 characters of the long id).

With just the above, I will merge PR #308 so that everyone can benefit from it.

Then, we can write whatever we like to collect the names properly and when we will be ready, we will replace point 2 with the new solution.

Is that a good path?
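The helper described in point 2 could be sketched like this. The file location and "id name" per-line format follow the comment above; the IDs and the /tmp path are fabricated for illustration:

```shell
# Given a container ID as found in /sys/fs/cgroup, print a friendly
# name from a mapping file if one exists, otherwise fall back to the
# first 12 characters of the ID (the short docker-style form).
cgroup_name() {
    # $1: long container id, $2: mapping file
    if [ -f "$2" ]; then
        name=$(awk -v id="$1" '$1 == id { print $2 }' "$2")
        [ -n "$name" ] && { echo "$name"; return; }
    fi
    echo "$1" | cut -c1-12
}

conf=/tmp/cgroups-names.conf
printf '%s netdata\n' d88f008a60040123456789abcdef0123456789abcdef0123456789abcdef0123 > "$conf"

# Mapped id prints the friendly name; unmapped id is just shortened:
cgroup_name d88f008a60040123456789abcdef0123456789abcdef0123456789abcdef0123 "$conf"
cgroup_name 668058ae4159fedcba9876543210fedcba9876543210fedcba9876543210fedc "$conf"
```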

I implemented the above: https://github.com/firehol/netdata/pull/308#issuecomment-214056814
From my part, this can be merged now.

@ktsaou I've just published docker-proxy-acl which will limit the access to docker.sock to only some subset of requests specified. For what netdata needs, it's enough to run it with ./run -a containers - this will enable the API request on /containers/json, which dumps all the containers, or /containers/[id]/json to get an individual mapping. The first endpoint could be used when netdata starts, the second could be used when a new ID is detected in cgroups.

What's your opinion on just using an existing JSON parser? Judging from this benchmark, the RapidJSON project would be the best fit for evaluation.

I wouldn't test against different API versions, but I would gracefully handle a missing field from the JSON. The "Names" field is present since API 1.15 from what I can see in the official documentation. You can get the API version by issuing a call against /version (requires -a version with docker-proxy-acl). We could say that you need a Docker API >=1.15 to use it in general, it would take some time to roll out some unit tests, but it should be possible... I'd rather just "gracefully" ignore the responses if they don't contain Name fields.

Threading: yes, cgroups should live in their own thread. Response times vary and may be quite slow. ~With the proxy I'm not decoding the protocol, so finalizing a response / closing the connection requires up to a second of dead air, in addition to the time to first byte.~
Edit: applies only to docker-proxy, which exposes the docker.sock API over HTTP. Ignore this note.

If there's some need here to wrap the API requests for container names into something like names-getAll and names-get $name, I can give it a stab with bash and some awk. I'd still rather it was done in the cgroups plugin, to avoid an exec call and to have the dependencies in code (as opposed to assuring that awk is the correct version, etc.).

Note, I'll be updating the docs for titpetric/netdata docker image to include instructions how to use it along with docker-proxy-acl to provide a limited API to the netdata container. Might take a few days with my travel/work schedule.

ok. I think the right way to do it is to first build an external C program and, once we are happy with it, merge it into netdata. The external command I made netdata call will be handy for the transition.

Regarding the JSON parser: I read that RapidJSON is C++. A few years ago I successfully used jsmn. This is pure ANSI C for embeddable applications (I used it in an application installed on card processing terminals with just 4MB of RAM).

Anyway, I think it is best first to have a simple shell script do the job, then move to a C binary and then merge it into netdata.

BTW, are the long ids of the docker containers persistent across reboots? The way I implemented this right now you can enable / disable a container in netdata.conf using the original container id as found in /sys/fs/cgroup. If they are not persistent, I'll have to change this...

are the long ids of the docker containers persistent across reboots?

no they are not. I am committing a change to fix netdata (it will first run the script to translate the id of the container and then check if it is enabled).

BTW, I installed and ran docker. @titpetric now I understand you. Docker made things a little bit complicated.

Well, it's quite clear in many ways. Once you turn your head around a bit... :) It's just very young tech (3 years a few days ago), so it might be missing some things which are available somewhere else. Nothing I haven't been able to work around yet - it's very powerful out of the box :)

it's very powerful out of the box

I am sure!

At cgroup-name.sh I added a docker ps just in case it can be run. On my system, as a test, I added the netdata user to the docker group. However I still cannot make it work. When the script is run under netdata I get this:

WARNING: Error loading config file:/root/.docker/config.json - stat /root/.docker/config.json: permission denied
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

Any ideas?

And when I run it from the command line, it works:

# su -c "docker ps" netdata
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
f6431eb57363        alpine              "sh"                2 hours ago         Up 2 hours                              dreamy_newton

I'm not exactly sure what you want to do. It seems the docker binary needs config.json (which might be empty), so you'd need root permissions to run it.

drwx------ 2 root root 4096 Apr 20 08:08 .docker/
-rw------- 1 root root  129 Apr 20 08:08 .docker/config.json

If you want a very hack-ish way to get a container ID from the docker socket:

# echo -e "GET /containers/d7007f81b34a/json HTTP/1.0\r\n" | nc -U /var/run/docker.sock | sed 's/"//g' | sed 's/,/,\n/g' | grep ^Name | sed 's/[,:]/ /g' | cut -d " " -f 2
/db1

There are also other ways (socat for one, curl > 7.40 for another, via this example). With those you shouldn't have problems; it seems just the docker binary has issues with that config.json

Well, according to the docs, any user who is a member of the docker group should be able to run docker ps.

The problem is that I cannot make this work. I add netdata to the docker group and still it does not work.

The problem is that I cannot make this work. I add netdata to the docker group and still it does not work.

I had this same problem with my charts.d plugin which is why I turned to sudo but I think I may have found something that may be the issue:
In my plugin if I run:
whoami > /tmp/whoami.txt
I get 'netdata' inside that file.

When I run:
echo $HOME > /tmp/home.txt
I get '/root' inside that file.

I start the netdata daemon as root and then it drops privileges and runs as the netdata user, but it appears that the $HOME environment variable can't be left as /root if you want to run docker commands.
Same issue they had here: https://github.com/docker/docker/issues/14669 when trying to get jenkins to run docker commands. It would work fine when they did 'su jenkins' because that set the correct $HOME environment but inside the application it was left as /root and wouldn't work...

update:
I exported HOME=/tmp in my script and the /root/.docker/config.json warning went away, but I still can't connect to the docker daemon, so this apparently isn't the fix...

update2: along these lines I fooled around with printing all the groups I'm a member of inside my charts.d netdata script:
echo ${GROUPS[*]} inside my script reveals only the netdata group id
If I change the netdata user's shell to /bin/bash and 'su - netdata' and run this same thing:
echo ${GROUPS[*]} I see the docker group id in the list.
So there seems to be some fundamental difference between how a normal user gets their groups in the shell and how the netdata daemon runs as the netdata user - somehow the supplementary group memberships are not kept intact...
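A quick way to see this mismatch is to compare the groups the user database grants with the supplementary groups the running process actually holds. This is only a sketch of the check, not netdata code; the /proc lookup is Linux-specific:

```shell
#!/bin/sh
# Groups the current user is entitled to, per the user database:
id -Gn "$(id -un)"

# Supplementary groups this process actually holds (Linux-specific;
# guarded so the sketch does not fail on systems without /proc):
grep '^Groups:' /proc/self/status 2>/dev/null || true
```

If the daemon drops privileges with setuid()/setgid() but never calls initgroups(), the second list will be missing groups (like docker) that the first one shows.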

ok. I fixed them both ($HOME and supplementary groups):

 # cat /proc/21726/status
Name:   netdata
State:  S (sleeping)
Tgid:   21726
Ngid:   0
Pid:    21726
PPid:   1
TracerPid:      0
Uid:    999     999     999     999
Gid:    981     981     981     981
FDSize: 64
Groups: 40 979 981           <<<<<<<<<<<<<<<<<<<<<<<<<<<
NStgid: 21726
NSpid:  21726
NSpgid: 21725
NSsid:  21725

and verified the cgroup-name.sh script can now run docker ps if the netdata user is in the docker group.

(screenshot)

and fixed cgroup-name.sh to properly pick the docker names. I used this:

docker ps --filter=id="${CGROUP:7:64}" --format="{{.Names}}"
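The `${CGROUP:7:64}` expansion above assumes (per my reading, not confirmed for every setup) a systemd-style cgroup name like `docker-<64-hex-id>.scope`: skip the 7-character `docker-` prefix and keep the 64-character container id. A sketch with a made-up id:

```shell
#!/bin/bash
# Hypothetical cgroup name; the 64-hex container id below is made up.
CGROUP="docker-f6431eb573634b68b1b8a2cf969d1b25fdb55f1d9cf6ae47cbb4eab0f09a4ecf.scope"

# Skip the 7-char "docker-" prefix, keep the 64-char container id
# (bash substring expansion: ${var:offset:length}).
echo "${CGROUP:7:64}"
```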

Tested this out on one of my work servers this morning, all I had to do was add netdata user to the docker group and all the container names populated, looks great, thanks!

Fixed all the pending issues (including using mountinfo for getting the proper names of /sys/fs/cgroup).

Until we use @titpetric 's proxy, I have made the installer add the netdata user to the docker group, so that it will work out of the box for most users.

@ktsaou You will have to keep netdata in the docker group, proxy or not.

I fixed the docker-proxy-acl just now to give the file 0666 permissions, but I can't resolve the group name until Go 1.7 - implemented here. Currently I can't set the user group to docker, because I'd have to hack it in. Since it's a "safe" interface, 0666 is a good compromise.

Keep in mind, some of the hosts might not even have a docker group. While I can't be sure what kind of percentage that is, it's very possible that in some cases when running netdata outside of the container, it will not be possible to access the socket because of root.root ownership. I guess this is just a note for the documentation, not sure if I'd want to escalate netdata privs because of it.

I would suggest to add a configuration flag to specify the location of the docker socket. It may be a path to any local unix socket (made with docker-proxy-acl), you could perhaps detect it by looking at /tmp/docker-proxy-acl/docker.sock first, and then look at /var/run/docker.sock if the first fails?

You will have to keep netdata in the docker group, proxy or not.

Then what do we need the proxy for?

With cgroups support as it is today in netdata, I think there are 2 areas that can be improved:

  1. I am trying to find a way to move the veth interfaces under their cgroup. Any ideas how I can find this information?
  2. It would be perfect to have at least 2 stacked charts per disk, showing their usage per container:
     1. I/O bandwidth
     2. I/O operations

Is this needed, or is it too much detail?
Generally I try to avoid charts that are not very useful. They require memory, especially the stacked ones with several dimensions.

About 1:
You would have to find which network namespace a given paired veth is under, and then check which network namespace the processes of the given cgroup are in.
Be wary of the fact that network namespaces are a totally different subsystem from cgroups.

@ktsaou It's not needed, but it's recommended for the security-concerned.

The proxy is meant to restrict access which is given via docker.sock - it's an obvious issue/caveat of forwarding docker.sock as-is to a container, since it doesn't limit access to the host. If netdata is running on the host - you still have the option of including docker-proxy-acl, for a wider audience (www-data for example), and close docker.sock only to root.root/chmod 0600.

This is why I recommend checking for /tmp/docker-proxy-acl/docker.sock first (world readable), and then for /var/run/docker.sock second (usually root.docker, may be root.root/0600 in some cases!). If you want a longer read: Exposing your Docker API.
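The lookup order recommended above can be sketched as a small helper. The function name is illustrative, not netdata's actual code; the two paths are the ones from this thread:

```shell
#!/bin/sh
# Return the first existing unix socket from a candidate list.
# Suggested order: the restricted docker-proxy-acl socket first,
# then the real docker socket.
pick_docker_socket() {
    for s in "$@"; do
        if [ -S "$s" ]; then
            echo "$s"
            return 0
        fi
    done
    return 1
}

# Usage, per the discussion above (prints nothing if neither exists):
pick_docker_socket /tmp/docker-proxy-acl/docker.sock /var/run/docker.sock \
    || echo "no docker socket found"
```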

An implementation for querying docker.sock:

#!/bin/bash
function docker_name_api {
        DOCKERID=$1
        echo $DOCKERID
        # request the api with nc, extract the name with grep/sed/cut,
        # trim quotes from the name, finally remove slashes.
        NAME=$(echo -e "GET /containers/$DOCKERID/json HTTP/1.0\r\n" | \
                nc -U /var/run/docker.sock | \
                sed 's/"//g' | sed 's/,/,\n/g' | grep ^Name | sed 's/[,:]/ /g' | cut -d " " -f 2)
        echo "${NAME/\//}"
}

docker_name_api $1

And an alternative version:

function docker_name_api {
        DOCKERID=$1
        echo $DOCKERID
        # request api with nc, extract name with JSON.sh,
        # trim quotes from name, finally remove slashes.
        NAME=$(echo -e "GET /containers/$DOCKERID/json HTTP/1.0\r\n" | \
                nc -U /var/run/docker.sock | \
                grep '^{' | bash JSON.sh -b | grep '^\["Name"\]' | \
                cut -f 2 | rev | cut -c2- | rev | cut -c2-)
        echo "${NAME/\//}"
}

I'm using JSON.sh to extract keys from the json response. Needs netcat-openbsd for the -U option, and egrep for JSON.sh. It's possible to replace that line with socat or curl >= 7.40. I'd still like something more self-contained (first example). My first example above, however, might not be the most resilient to json structure changes, but it has no new deps (only basic grep/cut).

I'd suggest:

  1. check for docker binary
  2. check for docker.sock
  3. if docker binary exists use that (as you do now)
  4. if docker socket exists use that (above implementation(s))

I'm mostly not forwarding docker binary into the netdata container, and would like to avoid it as there are a number of libraries needed to make the binary work.

Let me know if I should modify the script bundled with netdata or if you have other plans.

I'd suggest:

check for docker binary
check for docker.sock
if docker binary exists use that (as you do now)
if docker socket exists use that (above implementation(s))

It seems good.

Check this:

curl -sS --unix-socket /var/run/docker.sock "http://docker/containers/${DOCKERID}/json" |\
    sed 's|.*"Name":"\(/[^\"]\+\)",.*|\1|'

Since the docker output contains many Name: keys, the above is a hack. The docker container name starts with /, so it matches that.
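The hack can be checked against a canned response. The JSON below is a hypothetical, trimmed-down version of what the API returns (real responses are much larger, with other Name keys, which is why the leading slash is matched):

```shell
#!/bin/sh
# Hypothetical, trimmed-down container JSON (assumption: the real API
# output is much larger; the container name always starts with "/").
json='{"Id":"d7007f81b34a","Name":"/db1","Config":{"Hostname":"d7007f81b34a"}}'

# Same idea as the sed hack above: match the Name field whose value
# starts with a slash, keep only the captured group.
name=$(printf '%s' "$json" | sed 's|.*"Name":"\(/[^"]*\)".*|\1|')

# Strip the leading slash that docker reports in names.
echo "${name#/}"   # -> db1
```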

Another solution could be this:

curl -sS --unix-socket /var/run/docker.sock "http://docker/containers/${DOCKERID}/json" |\
    jq -r .Name

This one is probably the best way to do it, since jq is a full blown JSON parser and query language for JSON files.

Ok, jq seems fine. More reliable; I just don't want to encourage some dependency hell. I will look at it tomorrow to include it in the netdata dockerfile. I'm pretty sure that curl is outdated in debian jessie, however, and doesn't include --unix-socket. I'll amend your last suggestion to whatever is available (netcat is reliable).


You are right. However, I think people will start abusing JSON.sh for data collection, and this will make everything slow. I prefer a dependency on jq, which is fast enough...

Modify the cgroup-name.sh too.

I added jq and netcat-openbsd to the netdata docker repo. I've checked curl in jessie and it was as I feared: --unix-socket is not included (curl 7.38 ships in debian jessie; 7.40 is needed). I've also looked at what's available in the Docker API, and slightly extended the function to get the hostname for api versions < 1.17, which didn't include the name parameter.

function docker_name_api {
        DOCKERID=$1
        # request the api with nc, extract the name (or hostname) with jq,
        # trim the quotes, finally remove any leading slash.
        JSON=$(echo -e "GET /containers/$DOCKERID/json HTTP/1.0\r\n" | nc -U /var/run/docker.sock | egrep '^{.*')
        NAME=$(echo "$JSON" | jq .Name,.Config.Hostname | grep -v null | head -n1 | rev | cut -c2- | rev | cut -c2- | sed 's|^/||')
        echo "$NAME"
}

The .Name parameter was introduced in api 1.17 and before only .Config.Hostname was available. jq prints null if .Name is not included, that's why i added grep -v null. I take the first match (most specific), trim the quotes with rev/cut, and strip leading slash if it exists with sed. I think it's the most robust version so far.

Let me know if you have any comments before I fork/change cgroup-name.sh.
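The same .Name-then-.Config.Hostname fallback can also be sketched without jq, for hosts that only have sed. The field names follow the API versions discussed above; the helper name and sample JSON are illustrative, not real API output:

```shell
#!/bin/sh
# Prefer .Name (api >= 1.17), fall back to .Config.Hostname (older apis).
# Input: a one-line JSON document, as returned by the containers endpoint.
container_display_name() {
    json="$1"
    # "Name" is case-sensitive, so this does not also match "Hostname".
    # The \{0,1\} makes the leading slash optional and strips it.
    name=$(printf '%s' "$json" | sed -n 's|.*"Name":"/\{0,1\}\([^"]*\)".*|\1|p')
    if [ -z "$name" ]; then
        name=$(printf '%s' "$json" | sed -n 's|.*"Hostname":"\([^"]*\)".*|\1|p')
    fi
    echo "$name"
}

container_display_name '{"Name":"/db1","Config":{"Hostname":"d7007f81b34a"}}'  # -> db1
container_display_name '{"Config":{"Hostname":"f6431eb57363"}}'                # -> f6431eb57363
```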

nice. jq -r will not print the quotes

PR is up in #382, tested against my local netdata. I'd consider using just the API, but it would make jq a hard dependency as opposed to a soft one (a container with just the socket has jq installed, and there are no other use cases for having docker.sock without having the docker binary).

I find this post relevant, so leaving here for reference. Container Monitoring: Top 10 Docker Metrics to Track

I see 2 metrics missing from netdata:

  1. total number of containers running
  2. memory fail count per container

right?

Docker 1.12 has introduced built-in scaling and load-balancing along with swarm mode for clustering. They use labels to identify containers that belong to a specific service. This could be used to aggregate the metrics per service, with the option to drill down to the individual container level. I am not quite sure how easy it would be or how to present it effectively, but just tossing the idea here.

recent netdata development allows monitoring ephemeral containers. So, containers are added and removed and netdata follows closely, adding and removing charts (including their alarms).

netdata can also be centralized, meaning that instances of it may be running inside containers to collect application metrics from hosts, which are streamed in real-time to central netdata (probably running at the host, or elsewhere). When netdata runs in headless mode, it needs just 5MB of RAM.

Finally, netdata is able to monitor cgroups used for systemd services. Actually it can monitor any cgroup based technology (e.g. even ubuntu snaps).

I think we can close this issue now.
