Elasticsearch: ElasticSearch 1.6 fails to start with an empty/missing fstab

Created on 3 Jul 2015 · 27 Comments · Source: elastic/elasticsearch

Hi,

I'm running ES in a FreeBSD jail, so there is no fstab and no mount point is visible. Since I upgraded ES to version 1.6, it no longer starts: it fails to obtain the node lock with "Mount point not found in fstab" (file permissions are OK).

[2015-07-03 12:22:10,088][INFO ][node                     ] [Awesome Android] version[1.6.0], pid[42071], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-03 12:22:10,088][INFO ][node                     ] [Awesome Android] initializing ...
[2015-07-03 12:22:10,092][INFO ][plugins                  ] [Awesome Android] loaded [], sites []
[2015-07-03 12:22:10,133][ERROR][bootstrap                ] Exception
org.elasticsearch.ElasticsearchIllegalStateException: Failed to obtain node lock, is the following location writable?: [/var/db/elasticsearch/mon]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:158)
        at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:162)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.io.IOException: failed to obtain lock on /var/db/elasticsearch/mon/nodes/49
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:145)
        ... 5 more
Caused by: java.io.IOException: Mount point not found in fstab
        at sun.nio.fs.BsdFileStore.findMountEntry(BsdFileStore.java:86)
        at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:65)
        at sun.nio.fs.BsdFileStore.<init>(BsdFileStore.java:40)
        at sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:53)
        at sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:37)
        at sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368)
        at java.nio.file.Files.getFileStore(Files.java:1413)
        at org.elasticsearch.env.NodeEnvironment.getFileStore(NodeEnvironment.java:256)
        at org.elasticsearch.env.NodeEnvironment.access$000(NodeEnvironment.java:62)
        at org.elasticsearch.env.NodeEnvironment$NodePath.<init>(NodeEnvironment.java:75)
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:134)
        ... 5 more

Why does it even need to access /etc/fstab just to obtain a lock?

All 27 comments

It looks up the type of filesystem I think to determine whether it is an SSD or not, as that changes the default number of merge threads.

@mikemccand any ideas here?

For 1.x we use this for diagnostics (logging the filesystem type, mount point, and free space for each path.data on node init) and from JmxFsProbe (pulling "fs" node stats). This was done in #10502 and #10527 ...

In 2.0 we also log the spins detection (SSD or not). Currently ES does not set its merge scheduler defaults according to spins; rather, we always use aggressive settings (more than one merge thread). I'm not sure we should change to Lucene's defaults... that could be a sudden change for ES users upgrading.

Before, JmxFsProbe would only call Files.getFileStore API when you asked for fs stats (and sigar wasn't used), so you wouldn't hit this unless you pulled fs stats w/o sigar, but now we cache the FileStore on init instead.
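
For readers unfamiliar with the API under discussion, here is a minimal sketch of what the diagnostics described above roughly amount to: resolving the FileStore behind a data path and reporting its name, type, and usable space. `FileStoreProbe` and `describe` are hypothetical names for illustration, not Elasticsearch code; only `Files.getFileStore` and the `FileStore` accessors are real JDK calls.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not Elasticsearch code: reports what the JDK can
// resolve about the file store backing a data path.
class FileStoreProbe {

    static String describe(Path dataPath) throws IOException {
        // This is the JDK call that appears in the stack trace in this issue;
        // it throws IOException when the file store cannot be resolved.
        FileStore store = Files.getFileStore(dataPath);
        return String.format("path=%s store=%s type=%s usable_bytes=%d",
                dataPath, store.name(), store.type(), store.getUsableSpace());
    }

    public static void main(String[] args) {
        // The data path is taken from the first argument, defaulting to ".".
        Path dataPath = Paths.get(args.length > 0 ? args[0] : ".");
        try {
            System.out.println(describe(dataPath));
        } catch (IOException e) {
            System.err.println("could not resolve file store for "
                    + dataPath + ": " + e.getMessage());
        }
    }
}
```

Running it against the path.data directory from inside the jail or chroot should show whether the JDK alone, without Elasticsearch, can resolve the file store.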

@mikemccand so what should we do in this case (FreeBSD jail) where there is no fstab?

@peikk0 Can you configure your jail to include an fstab? I don't have much experience with jails in FreeBSD but on some quick googling it seems like this is possible, e.g. https://forums.freebsd.org/threads/jail-conf.34741/

Those fstab files are used by the host and are not visible inside the jail. Jails are not allowed to mount anything by default anyway and don't have access to devices; otherwise they could compromise the host and other jails.

Anyways, even if I create a fake fstab in the jail, it still won't start:

yavin ~ # mount
corellia/usr/jails/mon on / (zfs, local, noatime, nfsv4acls)
yavin ~ # cat /etc/fstab
corellia/usr/jails/mon / zfs noauto 0 0
yavin ~ # service elasticsearch console
[2015-07-07 17:19:02,750][INFO ][node                     ] [Mantra] version[1.6.0], pid[3165], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-07 17:19:02,750][INFO ][node                     ] [Mantra] initializing ...
[2015-07-07 17:19:02,754][INFO ][plugins                  ] [Mantra] loaded [], sites []
{1.6.0}: Initialization Failed ...
- ElasticsearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/var/db/elasticsearch/mon]]
        IOException[failed to obtain lock on /var/db/elasticsearch/mon/nodes/49]
                IOException[Mount point not found in fstab]

IMO, a proper fix would be to catch the exception and use a safe default in this case.
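
A minimal sketch of the kind of fallback proposed here, assuming the only information needed is the filesystem type. `FileStoreFallback`, `fileStoreTypeOrDefault`, and the `"unknown"` default are hypothetical and not part of Elasticsearch. The sketch deliberately swallows the error, which is exactly the behaviour questioned in the replies that follow.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of "catch the exception and use a safe default".
class FileStoreFallback {

    // Assumed placeholder value; any sentinel the caller understands would do.
    static final String UNKNOWN = "unknown";

    static String fileStoreTypeOrDefault(Path path) {
        try {
            return Files.getFileStore(path).type();
        } catch (IOException e) {
            // The file store could not be resolved (for example, no visible
            // mount entry); fall back instead of aborting startup.
            return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(fileStoreTypeOrDefault(path));
    }
}
```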

I don't think that's necessarily safe. It's abnormal that you cannot retrieve information from the filestore. I don't think we should hide the error condition.

Also keep in mind, I think the filestore is used to know the amount of disk space.

/etc/fstab is not a reliable source anyway, it defines what should be mounted, but not what actually is. How about simply using the output of mount? Or something equivalent (no /proc/mounts on FreeBSD either, or /proc/ at all).
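
Within Java, the closest analogue to reading the output of mount is probably `FileSystem.getFileStores()`, which enumerates the file stores the JVM can see. The sketch below is only a way to inspect what that enumeration returns in a given environment; `MountedStores` is a hypothetical name, and how each platform's JDK populates the list is implementation-specific, so it may or may not behave better inside a jail.

```java
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: list the file stores visible to this JVM.
class MountedStores {

    static List<String> list() {
        List<String> lines = new ArrayList<>();
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            // name() and type() are implementation-specific strings.
            lines.add(store.name() + " (" + store.type() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        list().forEach(System.out::println);
    }
}
```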

Those are issues to take up with the BSD port of OpenJDK, IMO. We are just using the only way in Java to do it, and that's to call Files.getFileStore.

It sounds like this configuration can't be supported - we need access to this info, and we rely on Java to provide it.

Closing

I just ran into this problem – it’s exceedingly unhelpful behaviour for those of us wishing to jail elasticsearch.

You need to open issues with Oracle about it. There is nothing we can do.

I've read it works fine with OpenJDK 8, I haven't tried it yet though.

@peikk0, thank you for the pointer. Unfortunately, that doesn’t seem to be the case, at least in my configuration (I have a data directory nullfs mounted into the jail’s directory hierarchy). I have OpenJDK8 installed (and uninstalled OpenJDK7 just to make sure that JDK 8 is being used), but I still encounter the problem.

Having investigated this a little further, it seems that setting the jail’s enforce_statfs property to 1 allows Elasticsearch to obtain the information it needs (at least when running with OpenJDK 8). From the jail(8) man page:

             This determines what information processes in a jail are able to
             get about mount points.  It affects the behaviour of the
             following syscalls: statfs(2), fstatfs(2), getfsstat(2), and
             fhstatfs(2) (as well as similar compatibility syscalls).  When
             set to 0, all mount points are available without any
             restrictions.  When set to 1, only mount points below the jail's
             chroot directory are visible.  In addition to that, the path to
             the jail's chroot directory is removed from the front of their
             pathnames.  When set to 2 (default), above syscalls can operate
             only on a mount-point where the jail's chroot directory is
             located.

Perhaps this information will be useful to anyone else who encounters this issue.

Good catch! I just tried it and it works with OpenJDK 7 too!

Nice solution! I think we should return this information in the error if we hit an exception trying to pull the filestores on FreeBSD. I will take care of it.

@peikk0, thank you for the confirmation, that’s good to know.

@rmuir, an informational message would no doubt be very helpful, good call!

I have experienced the same problem on Linux (so no jails, no chroot) when path.data is in a BTRFS subvolume that is not mounted.
Actually, I think it is more of a Java issue, but I wondered whether there is a workaround for BTRFS subvolumes too. And yes, there is one: mounting the subvolumes explicitly solved the issue for me.

This is still an issue on Linux.

I am running into this on Linux. I don't think it is right for elasticsearch to go through that much magic to determine if locking should work. At the very least, I'd expect it to try to lock rather than exit with a "Failed to obtain node lock" error even though the conditions are right.
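
To make "just try to lock" concrete, here is a minimal sketch of acquiring an exclusive file lock with plain java.nio, without consulting any FileStore. `DirectLockAttempt` and `canLock` are hypothetical names; this is not taken from Elasticsearch and only illustrates that the locking call itself does not depend on mount information.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: try to lock a file directly.
class DirectLockAttempt {

    // Returns true if an exclusive lock on lockFile could be acquired.
    static boolean canLock(Path lockFile) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // tryLock() returns null when another program holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses a temporary file unless a lock file path is given.
        Path lockFile = args.length > 0
                ? Path.of(args[0])
                : Files.createTempFile("example", ".lock");
        System.out.println("lockable: " + canLock(lockFile));
    }
}
```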

@remram44 are you running this on Linux with no virtualization? Without an /etc/fstab file? Maybe you can give us more information about your setup.

I'm running from a chroot, so there is no entry in /proc/mounts for /. Adding an fstab with fake info gives ElasticsearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/var/lib/elasticsearch/elasticsearch]]

Moving the directory I chroot, creating an empty dir where it was and mounting the new location to the old with mount -o bind, I can get elasticsearch to start. Tell me again why you need elasticsearch's root to be its own filesystem? Surely if the rest of the world can lock files without it, elasticsearch can manage? Why add so much magic when it will only break things?

Was this ever solved? I'm trying to set up elastic in an iocage with Version: 6.2.4, Build: ccec39f/2018-04-12T20:37:28.497551Z, JVM: 1.8.0_162

The iocage is running directly on an SSD with /var/db/elasticsearch mounted as nullfs from the host on a mechanical drive.

In my world with elasticsearch 7.3.1 I still need to apply the above bind-mount workaround from @remram44.
