I've been evaluating journalbeat to send data to logstash, and journalbeat stops sending data daily, which I am assuming is tied to when journald is rotating journals. A quick restart of journalbeat gets data sending again.
My journald storage mode is set to auto, and I've tried it both with persistent storage set up and without. I've also tried it both with the journal location specified in journalbeat and without.
Also seeing same issue with
I'm able to reproduce using master with https://github.com/elastic/beats/commit/952d92272839248fe8a2cb98dc265ae8c608b674 as the newest commit.
This has been causing us lots of issues since we are processing around 10 million records a day. The "workaround" is to restart journalbeat every 5 minutes.
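(For the record, the blunt version of that workaround as a system cron entry, assuming a file such as /etc/cron.d/journalbeat-restart:)
# Restart journalbeat every 5 minutes.
*/5 * * * * root systemctl try-restart journalbeat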
Possibly related to this: https://github.com/coreos/go-systemd/issues/233
If that's the case, they were waiting on the Go 1.11 release earlier this year, so they may now be able to merge their pending fix. I bumped the upstream maintainers just to see.
Currently testing this as a workaround, installed in /etc/cron.hourly/journalbeat-rotate:
#!/bin/sh
set -ue
# File ID of the currently active system journal.
SYSTEM_JOURNAL_FILE_ID=$(journalctl --file /var/log/journal/*/system.journal --header | awk -F ': ' '/File ID/ { print $2 }')
# If the ID changed since the last run, the journal was rotated: restart journalbeat and record the new ID.
if [ ! -e /var/lib/journalbeat/system-journal.id ] || [ "$SYSTEM_JOURNAL_FILE_ID" != "$(cat /var/lib/journalbeat/system-journal.id)" ]; then
    systemctl try-restart journalbeat && echo "$SYSTEM_JOURNAL_FILE_ID" > /var/lib/journalbeat/system-journal.id
fi
After testing journalbeat on several systemd versions, it turns out that this is not a journalbeat bug. The Beat is affected by the same issue as journalctl, as it uses the same C API: https://github.com/systemd/systemd/issues/474
That issue was fixed and released in systemd v233, so journalbeat currently requires at least systemd v233.
We are experiencing this or some other issue that causes journalbeat to stop sending logs on Ubuntu 18.04 which uses systemd v237. Like the other posts, restarting journalbeat causes it to move along again.
Same issue here with systemd v238. Restarting journalbeat "solves" the issue.
@JustinHead @mickymiek Do you know what triggers the stopping of shipping logs (e.g. rotate, vacuum)?
I would like to track down whether the issue you are facing is due to a bug in the C API or in Journalbeat.
Hello @kvch ,
@mickymiek and I are now investigating the case. We believe it could be related to rotate but have no evidence yet.
During our investigation, and probably after deleting journal files and then running journalctl --vacuum-time=1seconds, we started to see tons of [input] input/input.go:167 Error while reading event: error while waiting for event: operation not permitted in the logs, caused by inotify_init1(O_NONBLOCK|O_CLOEXEC) = -1 EPERM (Operation not permitted).
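If it helps to pin down the trigger, here is a rough sketch of forcing each case by hand while watching whether journalbeat keeps shipping (paths assume a persistent journal under /var/log/journal):
# Force an immediate rotation of the journal (journalctl --rotate needs systemd >= 227).
journalctl --rotate
# Or aggressively vacuum archived journal files.
journalctl --vacuum-time=1s
# Compare the File ID of the active journal before and after to confirm a rotation happened.
journalctl --header --file /var/log/journal/*/system.journal | grep 'File ID'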
This might be due to an incorrect seccomp policy. Could you please turn off the seccomp policy to test this theory?
seccomp.enabled: false
Yes, this is related to seccomp, and disabling it works around that permission issue.
BTW, we're running CoreOS, the same case as https://github.com/elastic/beats/pull/10605, so it is probably covered by your https://github.com/elastic/beats/pull/10593
We're now monitoring it to make sure the "stops reading journals" behaviour is really related to that seccomp issue.
I'm experiencing this issue. I'm running ubuntu 18.04.2 (systemd v237).
Could the issue be that the journalbeat binaries are built on a debian stretch image, and stretch has libsystemd v232? I wonder if it might be possible to use libsystemd from backports (which is v239) to get the fixed version?
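One way to sanity-check that is to look at which libsystemd the running binary actually links against (the binary path below is the usual package location and may differ on your system):
# Show the libsystemd that the journalbeat binary resolves at runtime.
ldd /usr/share/journalbeat/bin/journalbeat | grep -i systemd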
👋, we faced a similar issue (operation not permitted) on CoreOS systemd (we reported https://github.com/elastic/beats/pull/10605) and found it was because the seccomp policy was missing ppoll (confirmed using strace).
We've been running with https://github.com/elastic/beats/pull/10593, which fixes the issue 🙌
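In case it is useful to others, the blocked syscall can be confirmed against a running journalbeat with strace, roughly:
# Watch for syscalls rejected by the seccomp filter (strace prints to stderr).
strace -f -p "$(pidof journalbeat)" -e trace=ppoll,inotify_init1 2>&1 | grep EPERM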
@kvch confirmed, no more issue since we disabled seccomp. Gonna go with the master branch since #10593 got merged. Thx!
I'm having the same trouble with Journalbeat 6.6.1 on Ubuntu 16.04. Setting "seccomp.enabled: false" has made no difference for me unfortunately. The debug logs show nothing useful, but the journal timestamps indicate that this is happening whenever the journal is rotated.
What do you mean by the same issue? Two problems are reported in this issue.
What is your systemd version? If it is v232 or below, journalbeat fails to read entries after rotation, because of a bug in systemd.
If you are seeing operation not permitted errors, that is due to a bug in journalbeat, which is going to be fixed in the next release. As a workaround you can set seccomp.enabled to false.
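To tell the two cases apart quickly, something like this works (a sketch; it assumes journalbeat runs as a systemd unit named journalbeat):
# Case 1: systemd older than v233 -> rotation bug in the C API journalbeat uses.
systemctl --version | head -n 1
# Case 2: seccomp problem -> look for "operation not permitted" in journalbeat's own logs.
journalctl -u journalbeat | grep -i 'operation not permitted'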
@kvch the issue being that Journalbeat stops shipping data. Ubuntu 16.04 uses systemd 229, and a backport of that fix to an LTS release is unlikely. Will #10593 help me with this?
Unfortunately, not. The systemd API call we are using right now has the bug. But in a future release, we would like to refactor the code to remove this dependency.
The same here for CentOS/RHEL 7.6. Is there a related open issue about that?
I have opened an issue to track the two problems in different issues: https://github.com/elastic/beats/issues/11110
Unfortunately, not. The systemd API call we are using right now has the bug. But in a future release, we would like to refactor the code to remove this dependency.
Sorry, but my question was about the bug in the systemd API (<= v232) used by journalbeat.
CentOS/RHEL 7.6 runs systemd v219. Is there a chance of seeing a fix in the near future?
This is the issue to track that problem. It is definitely something we want to fix in an upcoming release.
So, its status shouldn't be closed.
Reopened.
It looks like https://github.com/coreos/go-systemd/pull/279 fixes this problem.
I re-compiled journald with this patch applied and now there are no problems when the systemd journal is rotated. The test has been running for ~2 days and all messages were processed.
No other regressions were found during this testing.
Tested system is Ubuntu 16.04 (xenial) and Go is 1.11.
How did you apply the patch? We use github.com/coreos/go-systemd/sdjournal not github.com/coreos/go-systemd/journal. Did you copy the latter package also?
In the meantime I am looking into it.
Also, what's your systemd version?
@kvch I just replaced the whole go-systemd directory with a newer version.
You can try one of my docker image:
kvaps/journalbeat:7.0.0-fix9533
kvaps/journalbeat:8.0.0-fix9533
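For anyone who would rather rebuild than use the images, this is roughly what the procedure looks like (a sketch only; it assumes a pre-modules GOPATH checkout of beats and that make in the journalbeat directory produces a binary, which may differ between versions):
# Fetch beats and swap the vendored go-systemd for the patched version.
go get -d github.com/elastic/beats
cd "$GOPATH/src/github.com/elastic/beats"
rm -rf vendor/github.com/coreos/go-systemd
git clone https://github.com/coreos/go-systemd vendor/github.com/coreos/go-systemd
# Apply the patch from coreos/go-systemd#279 here if it is not yet merged upstream.
cd journalbeat && make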
Agree. This fix is working fine for me!
what's your systemd version?
- systemd 229 (xenial)
- systemd 232 (stretch)
- systemd 237 (bionic)
- systemd 241 (stretch)
go version go1.11.5 linux/amd64
journalbeat version 8.0.0 (amd64), libbeat 8.0.0 [258c1c8419ec945967f8a9e8e9d92e166f0bb3f7 built 2019-03-19 01:41:30 +0000 UTC]
Also, what's your systemd version?
systemd version is:
229-4ubuntu21.17 (xenial)
I applied the mentioned diff to go-systemd (master) and compiled journalbeat (master) with its dependencies.
I don't have exact steps to reproduce it now.
"system_info": {
"build": {
"commit": "005bb608b010fa777db527738ad27b5f40086986",
"libbeat": "8.0.0",
"time": "2019-03-16T22:51:04.000Z",
"version": "8.0.0"
}
}
When will this be available through the official elastic repo?
@liqw33d @kvaps Thank you for verifying the fix!
@vquiering Once the fix is released in a new version of github.com/coreos/go-systemd, I will update the vendored folder.
I re-compiled journald with this patch applied and now there are no problems when the systemd journal is rotated. The test has been running for ~2 days and all messages were processed.
Does it mean the fix consists of recompiling journald? The patch you mentioned only touches github.com/coreos/go-systemd/journal (for writing to systemd's logging service, journald), while journalbeat uses github.com/coreos/go-systemd/sdjournal (for reading from journald by wrapping its C API). Can you please clarify?
@hamelg yes, you just need to recompile journalbeat with the updated go-systemd
yes, you just need to recompile journalbeat with the updated go-systemd
@kvaps I'm a little confused. Why does it work when the package that was updated is not one journalbeat depends on?
That is what I am investigating. My hunch is that the init function of that package connects to the FD of the systemd journal, which lets Journalbeat see when and what changes are made to the journal it is reading from.
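One way to observe that from the outside is to check which journal files the running process keeps open across a rotation (assuming a persistent journal and that lsof is installed):
# Journal files journalbeat currently holds open.
lsof -p "$(pidof journalbeat)" | grep /var/log/journal
# Rotate, then check again whether the new system.journal shows up
# or only the renamed/archived file is still held.
journalctl --rotate
lsof -p "$(pidof journalbeat)" | grep /var/log/journal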
I did more investigation on the matter. It turns out that the problem was fixed in Journalbeat not in its dependency github.com/coreos/go-systemd/sdjournal. The upcoming 6.7 and 7.0.0-rc release works with systemd v232.
Do you mind verifying my findings once the release is available?
The upcoming 6.7 and 7.0.0-rc release works with systemd v232.
If I may ask, when are the 6.7 and 7.0.0-rc releases due?
6.7 has just been released.
The 6.7 release works fine for me on Ubuntu 16.04 with systemd version 229, but I found some errors as follows:
2019-03-27T17:52:36.001+0800 ERROR [input] input/input.go:167 Error while reading event: failed to get realtime timestamp: 99 {"id": "a6a84c06-f11d-4f37-9d8f-3251d731d235"}
Does anyone have any ideas? My feeling is that it happens when the journal rotates.
I confirm 6.7 has fixed the issue on centos 7.6 with systemd version 219.
But I found some errors as follows:
2019-03-27T17:52:36.001+0800 ERROR [input] input/input.go:167 Error while reading event: failed to get realtime timestamp: 99 {"id": "a6a84c06-f11d-4f37-9d8f-3251d731d235"}
The same here, I see this error each time the journal is rotating.
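For what it's worth, the 99 looks like a raw errno (an assumption on my part); decoding it gives EADDRNOTAVAIL, which the underlying sd_journal_get_realtime_usec call can return when the read pointer is no longer positioned on a valid entry, e.g. right after a rotation:
# Decode errno 99 on this system.
python3 -c 'import errno, os; print(errno.errorcode[99], os.strerror(99))'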
@hamelg Do you mind opening a separate issue for that? Thanks for the confirmation.
Fixed in 6.7 and on master.