Elasticsearch: ElasticSearch 1.6 fails to start with an empty/missing fstab

Created on 3 Jul 2015 · 27 Comments · Source: elastic/elasticsearch

Hi,

I'm running ES in a FreeBSD jail, so there is no fstab and no mount point visible. Since I upgraded ES to version 1.6, it no longer starts: it fails to obtain a lock because of "Mount point not found in fstab" (file permissions are OK).

[2015-07-03 12:22:10,088][INFO ][node                     ] [Awesome Android] version[1.6.0], pid[42071], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-03 12:22:10,088][INFO ][node                     ] [Awesome Android] initializing ...
[2015-07-03 12:22:10,092][INFO ][plugins                  ] [Awesome Android] loaded [], sites []
[2015-07-03 12:22:10,133][ERROR][bootstrap                ] Exception
org.elasticsearch.ElasticsearchIllegalStateException: Failed to obtain node lock, is the following location writable?: [/var/db/elasticsearch/mon]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:158)
        at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:162)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.io.IOException: failed to obtain lock on /var/db/elasticsearch/mon/nodes/49
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:145)
        ... 5 more
Caused by: java.io.IOException: Mount point not found in fstab
        at sun.nio.fs.BsdFileStore.findMountEntry(BsdFileStore.java:86)
        at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:65)
        at sun.nio.fs.BsdFileStore.<init>(BsdFileStore.java:40)
        at sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:53)
        at sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:37)
        at sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368)
        at java.nio.file.Files.getFileStore(Files.java:1413)
        at org.elasticsearch.env.NodeEnvironment.getFileStore(NodeEnvironment.java:256)
        at org.elasticsearch.env.NodeEnvironment.access$000(NodeEnvironment.java:62)
        at org.elasticsearch.env.NodeEnvironment$NodePath.<init>(NodeEnvironment.java:75)
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:134)
        ... 5 more

Why does it even need to access /etc/fstab just to obtain a lock?


peikk0

Most helpful comment

By moving the directory I chroot into, creating an empty dir where it was, and mounting the new location onto the old one with mount -o bind, I can get elasticsearch to start. Tell me again why you need elasticsearch's root to be its own filesystem? Surely if the rest of the world can lock files without it, elasticsearch can manage? Why add so much magic when it will only break things?

remram44 on 9 Mar 2017


All 27 comments

I think it looks up the filesystem type to determine whether the disk is an SSD or not, as that changes the default number of merge threads.

@mikemccand any ideas here?

clintongormley on 5 Jul 2015

For 1.x we use this for diagnostics (logging the filesystem type, mount point, and free space for each path.data on node init) and from JmxFsProbe (pulling "fs" node stats). This was done in #10502 and #10527 ...

In 2.0 we also log the spins detection (SSD or not). Currently ES does not set its merge scheduler defaults according to spins; rather, we always use aggressive settings (more than one merge thread). I'm not sure we should change to Lucene's defaults... that could be a sudden change for ES users upgrading.

Before, JmxFsProbe would only call Files.getFileStore API when you asked for fs stats (and sigar wasn't used), so you wouldn't hit this unless you pulled fs stats w/o sigar, but now we cache the FileStore on init instead.

mikemccand on 6 Jul 2015
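
For readers who want to see the diagnostics described above in isolation, here is a minimal Java sketch. It is not the Elasticsearch code: the class name FileStoreDiagnostics and the output format are made up for illustration. It only shows that Files.getFileStore, the JDK call at the bottom of the stack trace, is what resolves the filesystem type and free space for a path, and that it reports failure as an IOException.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Elasticsearch.
final class FileStoreDiagnostics {

    /** Describes the file store backing the given path, or throws if the JDK cannot resolve it. */
    static String describe(Path path) throws IOException {
        // Same JDK entry point that appears in the stack trace of the report.
        FileStore store = Files.getFileStore(path);
        return String.format("path=%s store=%s type=%s usable_bytes=%d",
                path, store.name(), store.type(), store.getUsableSpace());
    }

    public static void main(String[] args) {
        Path dataPath = Paths.get(args.length > 0 ? args[0] : ".");
        try {
            System.out.println(describe(dataPath));
        } catch (IOException e) {
            // In a jail or chroot without visible mount information, this branch is taken.
            System.err.println("could not resolve file store for " + dataPath + ": " + e.getMessage());
        }
    }
}
```

Running it with the data directory as its argument, inside the jail or chroot, should show whether the JDK can resolve the mount point at all, independently of Elasticsearch.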

@mikemccand so what should we do in this case (FreeBSD jail), where there is no fstab?

clintongormley on 7 Jul 2015

@peikk0 Can you configure your jail to include an fstab? I don't have much experience with jails in FreeBSD but on some quick googling it seems like this is possible, e.g. https://forums.freebsd.org/threads/jail-conf.34741/

mikemccand on 7 Jul 2015

Those fstab files are used by the host and are not visible inside the jail. Jails are not allowed to mount anything by default anyway and don't have access to devices, or else they could compromise the host and other jails.

Anyways, even if I create a fake fstab in the jail, it still won't start:

yavin ~ # mount
corellia/usr/jails/mon on / (zfs, local, noatime, nfsv4acls)
yavin ~ # cat /etc/fstab
corellia/usr/jails/mon / zfs noauto 0 0
yavin ~ # service elasticsearch console
[2015-07-07 17:19:02,750][INFO ][node                     ] [Mantra] version[1.6.0], pid[3165], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-07 17:19:02,750][INFO ][node                     ] [Mantra] initializing ...
[2015-07-07 17:19:02,754][INFO ][plugins                  ] [Mantra] loaded [], sites []
{1.6.0}: Initialization Failed ...
- ElasticsearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/var/db/elasticsearch/mon]]
        IOException[failed to obtain lock on /var/db/elasticsearch/mon/nodes/49]
                IOException[Mount point not found in fstab]

IMO, a proper fix would be to catch the exception and use a safe default in this case.

peikk0 on 7 Jul 2015
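
The "catch the exception and use a safe default" idea suggested above could look roughly like the following Java sketch. This is an illustration of the suggestion only, not something Elasticsearch does; the class LenientFileStore, its method names, and the "unknown" fallback value are all assumptions.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper illustrating the suggestion; not Elasticsearch code.
final class LenientFileStore {

    /** Returns the file store for the path, or an empty Optional if the JDK cannot resolve it. */
    static Optional<FileStore> tryGetFileStore(Path path) {
        try {
            return Optional.of(Files.getFileStore(path));
        } catch (IOException e) {
            // e.g. "Mount point not found in fstab" inside a jail
            return Optional.empty();
        }
    }

    /** Filesystem type for logging, with an assumed "unknown" fallback. */
    static String typeOrUnknown(Path path) {
        return tryGetFileStore(path).map(FileStore::type).orElse("unknown");
    }

    public static void main(String[] args) {
        Path dataPath = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("filesystem type: " + typeOrUnknown(dataPath));
    }
}
```

The trade-off is that callers needing free-space figures would have nothing to work with when the Optional is empty, so a fallback like this only covers the logging use case.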

I don't think that's necessarily safe. It's abnormal that you cannot retrieve information from the filestore. I don't think we should hide the error condition.

rmuir on 7 Jul 2015

Also keep in mind that, I think, the filestore is used to know the amount of disk space.

rmuir on 7 Jul 2015

/etc/fstab is not a reliable source anyway; it defines what should be mounted, not what actually is. How about simply using the output of mount? Or something equivalent (there is no /proc/mounts on FreeBSD either, or /proc/ at all).

peikk0 on 7 Jul 2015

Those are issues to take up with the BSD port of OpenJDK, IMO. We are just using the only way to do it in Java, and that's to call Files.getFileStore.

rmuir on 7 Jul 2015

It sounds like this configuration can't be supported - we need access to this info, and we rely on Java to provide it.

Closing

clintongormley on 10 Jul 2015

I just ran into this problem – it’s exceedingly unhelpful behaviour for those of us wishing to jail elasticsearch.

djneades on 9 Oct 2015

You need to open issues with Oracle about it. There is nothing we can do.

rmuir on 9 Oct 2015

I've read it works fine with OpenJDK 8, I haven't tried it yet though.

peikk0 on 9 Oct 2015

@peikk0, thank you for the pointer. Unfortunately, that doesn’t seem to be the case, at least in my configuration (I have a data directory nullfs mounted into the jail’s directory hierarchy). I have OpenJDK8 installed (and uninstalled OpenJDK7 just to make sure that JDK 8 is being used), but I still encounter the problem.

djneades on 9 Oct 2015

Having investigated this a little further, it seems that setting the jail’s enforce_statfs property to 1 allows Elasticsearch to obtain the information it needs (at least when running with OpenJDK 8). From the jail(8) man page:

             This determines what information processes in a jail are able to
             get about mount points.  It affects the behaviour of the
             following syscalls: statfs(2), fstatfs(2), getfsstat(2), and
             fhstatfs(2) (as well as similar compatibility syscalls).  When
             set to 0, all mount points are available without any
             restrictions.  When set to 1, only mount points below the jail's
             chroot directory are visible.  In addition to that, the path to
             the jail's chroot directory is removed from the front of their
             pathnames.  When set to 2 (default), above syscalls can operate
             only on a mount-point where the jail's chroot directory is
             located.

Perhaps this information will be useful to anyone else who encounters this issue.

djneades on 9 Oct 2015

Good catch! I just tried it and it works with OpenJDK 7 too!

peikk0 on 9 Oct 2015

Nice solution! I think we should return this information in the error if we hit an exception trying to pull the filestores on FreeBSD. I will take care of it.

rmuir on 9 Oct 2015
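
One way the proposal above could be realised is sketched below in Java. This is not the change that went into Elasticsearch; the class FileStoreHint, the withHint method, and the wording of the message are hypothetical. The idea is simply to rethrow the IOException from Files.getFileStore with a pointer to enforce_statfs when the JVM reports FreeBSD as its operating system.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper; names and message text are assumptions, not the real patch.
final class FileStoreHint {

    /** Wraps the original failure, adding a jail hint when the OS name looks like FreeBSD. */
    static IOException withHint(Path path, IOException cause, String osName) {
        String message = "failed to get file store for " + path;
        if (osName != null && osName.toLowerCase(Locale.ROOT).contains("freebsd")) {
            message += "; if running in a jail, check that enforce_statfs is set to 1";
        }
        return new IOException(message, cause);
    }

    /** Same as Files.getFileStore, but with the hint attached on failure. */
    static FileStore getFileStore(Path path) throws IOException {
        try {
            return Files.getFileStore(path);
        } catch (IOException e) {
            throw withHint(path, e, System.getProperty("os.name"));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dataPath = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(getFileStore(dataPath).type());
    }
}
```

Keeping the original exception as the cause preserves the "Mount point not found in fstab" message in the stack trace, so nothing is hidden from the user.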

@peikk0, thank you for the confirmation, that’s good to know.

@rmuir, an informational message would no doubt be very helpful, good call!

djneades on 9 Oct 2015

I have experienced the same problem on Linux (so no jails, no chroot) when the path.data is in a BTRFS subvolume that is not mounted.
Actually I think that it is more of a Java issue, but I wondered whether there is a workaround for btrfs subvolumes too... And, yes, there is one: mounting the subvolumes explicitly solved the issue for me.

marcoc610 on 30 Oct 2015

This is still an issue on Linux.

hydrapolic on 19 Apr 2016

I am running into this on Linux. I don't think it is right for elasticsearch to go through that much magic to determine if locking should work. At the very least, I'd expect it to try to lock rather than exit with the error "Failed to obtain node lock" even though the conditions are right.

remram44 on 9 Mar 2017
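
To check whether locking itself works in a given directory, independently of any file store lookup, a plain java.nio lock attempt can be used. The Java sketch below is a standalone diagnostic, not how Elasticsearch takes its node lock; the class DirectLockAttempt and the lock file name are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical diagnostic; not Elasticsearch code.
final class DirectLockAttempt {

    /** Returns true if an exclusive lock on a file in dir could be taken and released. */
    static boolean canLock(Path dir) throws IOException {
        Path lockFile = dir.resolve("node.lock"); // file name assumed for illustration
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // tryLock returns null when another process already holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("lockable: " + canLock(dir));
    }
}
```

If this succeeds on the data directory while Elasticsearch still reports "Failed to obtain node lock", the failure lies in the file store lookup rather than in the lock, which matches the nested "Mount point not found in fstab" cause in the original trace.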

@remram44 are you running this on Linux with no virtualization? Without an /etc/fstab file? Maybe you can give us more information about your setup.

dakrone on 9 Mar 2017

I'm running from a chroot, so there is no entry in /proc/mounts for /. Adding an fstab with fake info still produces ElasticsearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/var/lib/elasticsearch/elasticsearch]]

remram44 on 9 Mar 2017