Elasticsearch version: 5.2.2
Plugins installed: []
JVM version: 8u122
OS version: Linux
Description of the problem including expected versus actual behavior:
The OS probe regex fails to match, and Elasticsearch crashes on startup, if /proc/self/cgroup contains an entry with no controller. Example /proc/self/cgroup (see last line):
8:net_cls:/
7:devices:/user.slice
6:pids:/user.slice/user-1000.slice/session-3.scope
5:blkio:/
4:freezer:/
3:memory:/
2:cpu,cpuacct:/
1:cpuset:/
0::/user.slice/user-1000.slice/session-3.scope
Related issue: https://github.com/elastic/elasticsearch/issues/23218
Steps to reproduce:
1.
2.
3.
Provide logs (if relevant):
Mar 03 12:43:00 elasticsearch systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Mar 03 12:43:00 elasticsearch elasticsearch[641]: ... 6 more
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:241) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:241) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.node.Node.<init>(Node.java:232) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.node.Node.<init>(Node.java:345) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.monitor.MonitorService.<init>(MonitorService.java:45) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.monitor.os.OsService.<init>(OsService.java:45) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.monitor.os.OsProbe.osStats(OsProbe.java:466) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.monitor.os.OsProbe.getCgroup(OsProbe.java:414) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.monitor.os.OsProbe.getControlGroups(OsProbe.java:216) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at java.util.regex.Matcher.group(Matcher.java:536) ~[?:1.8.0_121]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: Caused by: java.lang.IllegalStateException: No match found
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-5.2.2.jar:5.2.2]
Mar 03 12:43:00 elasticsearch elasticsearch[641]: org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: No match found
Mar 03 12:43:00 elasticsearch elasticsearch[641]: [2017-03-03T12:43:00,027][WARN ][org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler] uncaught exception in thread [main]
Mar 03 12:42:59 elasticsearch elasticsearch[641]: [2017-03-03T12:42:59,000][WARN ][org.elasticsearch.deprecation.script.groovy.GroovyScriptEngineService] [groovy] scripts are deprecated, use [painless] scripts instead
Mar 03 12:42:58 elasticsearch elasticsearch[641]: [2017-03-03T12:42:58,412][INFO ][org.elasticsearch.plugins.PluginsService] no plugins loaded
Describe the feature:
Make the regex more robust. Changing the + to a * in the part of the failing regex that matches the cgroup controller name should do the trick. I would make a PR, but I am not willing to sign the CLA.
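To illustrate the suggestion, here is a minimal Java sketch. The two patterns and the names CgroupLineSketch, STRICT, LENIENT and fields are hypothetical stand-ins written for this example, not the actual expression or code in OsProbe. It only shows the general failure mode: calling Matcher.group after a failed match throws IllegalStateException, and relaxing the controller field from + to * lets the "0::/..." line match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CgroupLineSketch {
    // Hypothetical strict pattern: the controller field needs at least one character.
    static final Pattern STRICT = Pattern.compile("\\d+:([^:]+):(/.*)");
    // Hypothetical lenient pattern: '*' allows an empty controller field.
    static final Pattern LENIENT = Pattern.compile("\\d+:([^:]*):(/.*)");

    // Returns {controllers, path}. The result of matches() is deliberately
    // ignored to mirror the failure mode: group() throws
    // IllegalStateException when the line did not match.
    static String[] fields(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        matcher.matches();
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String line = "0::/user.slice/user-1000.slice/session-3.scope";
        try {
            fields(STRICT, line);
        } catch (IllegalStateException e) {
            System.out.println("strict pattern failed: " + e.getMessage());
        }
        String[] parsed = fields(LENIENT, line);
        System.out.println("lenient controllers='" + parsed[0] + "' path=" + parsed[1]);
    }
}
```

Note that with the lenient pattern the controller field comes back as an empty string, so whatever consumes it would still have to decide what an entry without a controller means.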
Thanks for the report @phile314-fh and sorry for the issue. I'll put together a fix soon. What Linux distribution are you using (including version, and kernel version)? Would you share the output of cat /proc/cgroups and mount | grep cgroup?
Kernel: Linux nixos 4.9.9 #1-NixOS SMP Thu Feb 9 07:08:40 UTC 2017 x86_64 GNU/Linux
systemd: 232
Distribution: NixOS 17.03
/proc/cgroups:
#subsys_name hierarchy num_cgroups enabled
cpuset 1 1 1
cpu 2 1 1
cpuacct 2 1 1
blkio 5 1 1
memory 3 1 1
devices 7 38 1
freezer 4 1 1
net_cls 8 1 1
pids 6 43 1
mount | grep cgroup:
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
Although you don't officially support NixOS, this ought to be fixed as the problem could also happen on other distributions. Furthermore, as the error (when it occurs) is quite severe, a more liberal parsing of the cgroups seems appropriate to me.
Okay, I wanted to ensure it was only because the cgroup version 2 hierarchy was mistakenly accounted for and that's exactly what is happening here. I opened #23493.
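As a rough illustration of what "not accounting for" the version 2 hierarchy could look like, here is a minimal Java sketch that builds a controller-to-path map from /proc/self/cgroup lines and skips any entry whose controller list is empty (the form the version 2 entry takes, with hierarchy ID 0). The class and method names are hypothetical, and this is not claimed to be the change made in #23493.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CgroupV1MapSketch {
    // Maps each cgroup v1 controller to its path. Lines are expected in the
    // form "hierarchy-ID:controller-list:cgroup-path".
    static Map<String, String> controllerPaths(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            // Limit of 3 keeps an empty controller field and any ':' in the path.
            String[] fields = line.split(":", 3);
            if (fields.length != 3) {
                continue; // not in the expected three-field form
            }
            if (fields[1].isEmpty()) {
                continue; // empty controller list: the cgroup v2 entry, skipped here
            }
            for (String controller : fields[1].split(",")) {
                result.put(controller, fields[2]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "3:memory:/",
            "2:cpu,cpuacct:/",
            "0::/user.slice/user-1000.slice/session-3.scope");
        System.out.println(controllerPaths(lines));
    }
}
```

Splitting on the field separator rather than matching one regex against the whole line is a design choice made for this sketch; it avoids calling Matcher.group on a line that did not match.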
Faced this on FC26 as well, with ES v5.2.2.
$ cat /etc/redhat-release
Fedora release 26 (Twenty Six)
$ uname -a
Linux jaypc 4.11.9-300.fc26.x86_64 #1 SMP Wed Jul 5 16:21:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
@jay-dihenkar You should upgrade, this is fixed in 5.3.1.
@jasontedor 5.3.1 is incompatible with my services already in production on AWS. Is there some workaround or fix for it?
I'm using Fedora 26 too... @jay-dihenkar did you find a fix for it?
@aymone You can disable the cgroup version 2 hierarchy on your system, otherwise you have to upgrade.
@jasontedor do you know how to do it?
I can't update my ES, same as @aymone, because services depend on this version.
Any workarounds? How can one disable the cgroup hierarchy, and what does that mean / what side effects can it have?
I have on Ubuntu 17.10:
12:rdma:/
11:pids:/user.slice/user-0.slice/session-3.scope
10:blkio:/user.slice/user-0.slice/session-3.scope
9:hugetlb:/
8:net_cls,net_prio:/
7:perf_event:/
6:devices:/user.slice/user-0.slice/session-3.scope
5:memory:/user.slice/user-0.slice/session-3.scope
4:cpu,cpuacct:/user.slice/user-0.slice/session-3.scope
3:cpuset:/
2:freezer:/
1:name=systemd:/user.slice/user-0.slice/session-3.scope
0::/user.slice/user-0.slice/session-3.scope
@aymone @KrzysztofMadejski Please poke around in documentation and the web for that; it is a general Linux issue, not an Elasticsearch issue.
@aymone I've cherry-picked https://github.com/elastic/elasticsearch/commit/ae6331f27e9237c0fbdf2a7a175026fbf91fccd7 onto the 5.2.2 tag, resolved conflicts, compiled from source, and it seems to work.
Run Gradle as: gradle assemble -Dbuild.snapshot=false
@jasontedor it would be good to add "wontfix" label here.
What do you mean? It is fixed in #23493 released in 5.3.1.
The bug report is against version 5.2.2, so I see it as "won't fix" for the 5.2.x branch. Such a notion makes sense to me because minor versions may introduce backwards-incompatible changes (5.3 does), so an upgrade is not a straightforward operation if you have ES in production.
The other issue which is troubling me more is why you introduce backwards incompatible changes in minor versions, which is contrary to the declaration at https://www.elastic.co/support/eol. But for clarity let's put it into another issue.
I understand where you're coming from, but our maintenance policy is very clear (when 5.3.0 is released, 5.2 sees no more releases) and all the information needed to determine what version this is fixed in is already available.
But for clarity let's put it into another issue.
Please do.