Logstash: Add troubleshooting documentation for very slow startup times due to lack of entropy

Created on 25 Oct 2016 · 16 comments · Source: elastic/logstash

A lack of entropy in /dev/random can push the startup time of Logstash (or any other Java/JRuby software) beyond 5 minutes.

Example issue, where starting Logstash took 10 minutes: https://github.com/elastic/logstash/issues/6114

JRuby's wiki has a section on this issue: https://github.com/jruby/jruby/wiki/Improving-startup-time#ensure-your-system-has-adequate-entropy

We should address this in a troubleshooting section by either:

1) making the user aware that the problem exists and that other software could be causing it by draining /dev/random
2) suggesting software that generates entropy for /dev/random, thereby working around the draining issue

Usually there should be no issue unless /dev/random is read from very frequently (for example, a ruby filter that reads from it on every event).

bug v5.5.0

All 16 comments

Maybe we could also add a startup check that uses NIO to try to read from /dev/random; if we time out waiting for data, we abort startup (treating this as a health check required for startup to succeed) and log the reason.
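A rough shell sketch of such a check (hypothetical, not actual Logstash code): attempt a bounded read from /dev/random and abort if it blocks for too long.

```shell
#!/bin/sh
# Hypothetical startup health check: try to read 32 bytes from /dev/random,
# giving up after 5 seconds. A timeout means the entropy pool is too drained
# to seed SecureRandom quickly, so we fail fast instead of hanging.
if ! timeout 5 head -c 32 /dev/random > /dev/null; then
  echo "FATAL: timed out reading /dev/random; system entropy appears exhausted" >&2
  exit 1
fi
echo "entropy health check passed"
```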

+1 on the sanity check. On systems that support it, we could simply check /proc/sys/kernel/random/entropy_avail in the bash scripts.
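For example, a minimal sketch of such a check (the 1000-bit threshold is an arbitrary assumption, not an official recommendation):

```shell
#!/bin/sh
# Hypothetical sanity check for the startup scripts: warn when the kernel's
# entropy pool looks shallow enough that SecureRandom seeding may block.
ENTROPY_FILE=/proc/sys/kernel/random/entropy_avail
MIN_ENTROPY=1000   # assumed threshold, tune as needed

if [ -r "$ENTROPY_FILE" ]; then
  available=$(cat "$ENTROPY_FILE")
  if [ "$available" -lt "$MIN_ENTROPY" ]; then
    echo "WARNING: only $available bits of entropy available;" \
         "startup may block on /dev/random (consider installing haveged)" >&2
  fi
fi
```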

We could add a note for this in "getting started" for now

@suyograo I wonder if it makes sense at this point to start a Troubleshooting container? We could have something like:

Troubleshooting
    Performance Troubleshooting Guide
    Startup Issues

It would be a good place to add other troubleshooting advice. But if you think we'll have a programmatic check for adequate entropy (soonish) we could just add a note to the doc for now, as you suggest.

I hit this on a small VM and fixed it by installing haveged

As per a chat with @jasontedor, we might be able to solve this by switching to /dev/urandom, which should be configurable via securerandom.source=file:/dev/urandom in JAVA_HOME/jre/lib/security/java.security.

Note that this line will already exist and should be edited from securerandom.source=file:/dev/random to securerandom.source=file:/dev/urandom. Alternatively, you can add this as a JVM option via -Djava.security.egd=file:/dev/urandom. Lastly, this will only help if the underlying issue is caused by SecureRandom, which defaults to /dev/random; if something is misbehaving and gathering randomness directly from /dev/random, there is nothing we can do other than the suggestion @jakommo already offered (and correcting it to use /dev/urandom).
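Concretely, the two alternatives look like this (the JAVA_HOME path varies per install; both lines are exactly as described above):

```
# In $JAVA_HOME/jre/lib/security/java.security, edit the existing line to read:
securerandom.source=file:/dev/urandom

# Or, leaving java.security untouched, pass the equivalent JVM option:
-Djava.security.egd=file:/dev/urandom
```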

Changing /dev/random to /dev/urandom is OK in dev, but we should avoid recommending it for production, as it's considered a security issue.

It's not a security issue; /dev/urandom is cryptographically secure.

I agree with @jasontedor that /dev/urandom is good to go on Linux

Anything but documentation or notes - nobody reads those. A self-check plus a message in CAPS on startup will do :-)
P.S. fixed with haveged also

Confirmed this bug today with Logstash v5.2.2 on Ubuntu 16.04. No output appeared when running bin/logstash for 15+ minutes. I attempted to start the process multiple times.

After running sudo apt-get install rng-tools, Logstash started immediately. I don't know how secure this is for production use; haveged may be a better alternative.

For those who are troubleshooting, you can check the available system entropy by running: cat /proc/sys/kernel/random/entropy_avail on Ubuntu.

More information here: http://serverfault.com/questions/214605/gpg-not-enough-entropy

Same problem

Just wanted to chime in and say this made my first experience with logstash a bit stressful (now clocking 3 hours trying to troubleshoot this).

@gtirloni So sorry you had a hard first experience. This is still a bit of a "needle in a haystack" problem that only a relatively small subset of users are experiencing. Some VM environments see it, some don't. We're working out how to document this best, because the problem affects Logstash, but the problem isn't caused by Logstash. We still haven't sorted out the best way forward, and we appreciate your feedback. Sorry again you had a bad experience.

I'm OK with the solution for this (hopefully) being using urandom on Linux systems.

Implementation-wise, we would update our default JVM flags to include -Djava.security.egd=file:/dev/urandom and write some tests to verify that we are reading from the correct randomness source during startup.
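In practice, that would mean a line like this in the shipped config/jvm.options (an illustrative sketch of the proposal, not a merged change):

```
## use a non-blocking entropy source for SecureRandom seeding
-Djava.security.egd=file:/dev/urandom
```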
