Elasticsearch version (bin/elasticsearch --version):
Version: 5.5.2, Build: b2f0c09/2017-08-14T12:33:14.154Z, JVM: 1.8.0_121
Plugins installed:
JVM version (java -version):
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-tdc1-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux HOSTNAME 3.0.101-0.113.TDC.1.R.0-default #1 SMP Fri Dec 9 04:51:20 PST 2016 (ca32437) x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Listing snapshots for the repository-hdfs plugin snapshot repository should work, but instead there is a java.lang.SecurityException: access denied ("java.lang.reflect.ReflectPermission" "suppressAccessChecks") from the JVM security manager.
Steps to reproduce:
curl -i -u USERNAME -H 'Accept: application/json' -H 'Content-Type: application/json' "https://$(hostname -f):9200/_snapshot/CLUSTERNAME/_all?pretty"
Provide logs (if relevant):
Stacktrace from missing security policy permission
...
# first call to curl -i -u USERNAME -H 'Accept: application/json' -H 'Content-Type: application/json' "https://$(hostname -f):9200/_snapshot/CLUSTERNAME/_all?pretty"
org.elasticsearch.transport.RemoteTransportException: [master-HOSTNAME][IP_ADDRESS:9301][cluster:admin/snapshot/get]
Caused by: java.lang.SecurityException: access denied ("java.lang.reflect.ReflectPermission" "suppressAccessChecks")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_121]
at java.security.AccessController.checkPermission(AccessController.java:884) ~[?:1.8.0_121]
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[?:1.8.0_121]
at java.lang.reflect.AccessibleObject.setAccessible(AccessibleObject.java:128) ~[?:1.8.0_121]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:396) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335) ~[?:?]
at com.sun.proxy.$Proxy34.getServerDefaults(Unknown Source) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:640) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.shouldEncryptData(DFSClient.java:1755) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.newDataEncryptionKey(DFSClient.java:1761) ~[?:?]
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:210) ~[?:?]
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160) ~[?:?]
at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:581) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2933) ~[?:?]
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:815) ~[?:?]
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:740) ~[?:?]
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385) ~[?:?]
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:706) ~[?:?]
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647) ~[?:?]
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:918) ~[?:?]
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:974) ~[?:?]
at java.io.DataInputStream.read(DataInputStream.java:100) ~[?:1.8.0_121]
at org.elasticsearch.common.io.Streams.copy(Streams.java:79) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.io.Streams.copy(Streams.java:60) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:762) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.snapshots.SnapshotsService.getRepositoryData(SnapshotsService.java:140) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:97) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:55) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.5.2.jar:5.5.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
...
# second call to curl -i -u USERNAME -H 'Accept: application/json' -H 'Content-Type: application/json' "https://$(hostname -f):9200/_snapshot/CLUSTERNAME/_all?pretty"
[2017-09-01T14:08:10,615][WARN ][r.suppressed ] path: /_snapshot/CLUSTERNAME/_all, params: {pretty=, repository=CLUSTERNAME, snapshot=_all}
org.elasticsearch.transport.RemoteTransportException: [master-HOSTNAME][IP_ADDRESS:9301][cluster:admin/snapshot/get]
Caused by: java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:129) ~[?:?]
at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:116) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:160) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335) ~[?:?]
at com.sun.proxy.$Proxy34.getListing(Unknown Source) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1681) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1665) ~[?:?]
at org.apache.hadoop.fs.Hdfs.listStatus(Hdfs.java:257) ~[?:?]
at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1806) ~[?:?]
at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1802) ~[?:?]
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) ~[?:?]
at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1808) ~[?:?]
at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1767) ~[?:?]
at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1726) ~[?:?]
at org.elasticsearch.repositories.hdfs.HdfsBlobContainer$6.run(HdfsBlobContainer.java:145) ~[?:?]
at org.elasticsearch.repositories.hdfs.HdfsBlobContainer$6.run(HdfsBlobContainer.java:142) ~[?:?]
at org.elasticsearch.repositories.hdfs.HdfsBlobStore.lambda$execute$0(HdfsBlobStore.java:132) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_121]
at java.security.AccessController.doPrivileged(AccessController.java:713) ~[?:1.8.0_121]
at org.elasticsearch.repositories.hdfs.HdfsBlobStore.execute(HdfsBlobStore.java:129) ~[?:?]
at org.elasticsearch.repositories.hdfs.HdfsBlobContainer.listBlobsByPrefix(HdfsBlobContainer.java:142) ~[?:?]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.listBlobsToGetLatestIndexId(BlobStoreRepository.java:930) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:908) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:746) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.snapshots.SnapshotsService.getRepositoryData(SnapshotsService.java:140) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:97) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:55) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.5.2.jar:5.5.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
I was able to fix this with the following:
# Fix for security permission error
cat /home/elasticsearch/.java.policy
grant {
    permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
};
Interestingly, the repository-hdfs plugin already ships a Java security policy file that requests the suppressAccessChecks permission: https://github.com/elastic/elasticsearch/blob/master/plugins/repository-hdfs/src/main/plugin-metadata/plugin-security.policy#L26 From the stack trace, however, it looks like the failing call doesn't go through any of the repository-hdfs classes.
As you point out, the problem here is that the call is coming from core where we do not and will not grant suppressAccessChecks.
@jbaiera Would you take a look please?
@jasontedor yeah, I agree that adding suppressAccessChecks to the whole JVM is not preferred, but I didn't have a good way to work around it in the short term. We are currently moving from Elasticsearch 2.x to 5.x and this was a pain point on Friday. Since we are in a development environment, I can provide/test other things if necessary.
If I were to guess, the bulk of this issue comes from trying to get server defaults for the encryption support introduced in Hadoop:
at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:640) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.shouldEncryptData(DFSClient.java:1755) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.newDataEncryptionKey(DFSClient.java:1761) ~[?:?]
I didn't see a quick way to short-circuit this process by adding a Hadoop conf setting that says whether encryption is enabled or not. It looks like this code has to go to the NameNode.
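For context, here is a rough paraphrase of the relevant DFSClient logic from the linked 2.8.x source (simplified and reconstructed from the stack trace, not the verbatim code): the client can't decide locally whether data transfer encryption is required; it has to ask the NameNode for its server defaults first, and only caches the answer afterwards.

```java
// Simplified paraphrase of org.apache.hadoop.hdfs.DFSClient (Hadoop 2.8.x),
// reconstructed from the stack trace above and the linked source; not verbatim.
public FsServerDefaults getServerDefaults() throws IOException {
    long now = Time.monotonicNow();
    if (serverDefaults == null
            || now - serverDefaultsLastUpdate > SERVER_DEFAULTS_VALIDITY_PERIOD) {
        serverDefaults = namenode.getServerDefaults(); // RPC through the retrying proxy
        serverDefaultsLastUpdate = now;
    }
    return serverDefaults;
}

private boolean shouldEncryptData() throws IOException {
    FsServerDefaults d = getServerDefaults(); // must reach the NameNode at least once
    return d != null && d.getEncryptDataTransfer();
}
```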
I was going to try webhdfs:// instead of hdfs:// but saw that the repository-hdfs plugin explicitly checks for the scheme to be hdfs://.
Some of these issues might go away if the plugin were to allow webhdfs:// as an option since that is over HTTP instead of direct RPC calls. I saw #24455 is something related to that idea.
Indeed the problem seems to actually be here: https://github.com/apache/hadoop/blob/branch-2.8.1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L640
serverDefaults = namenode.getServerDefaults();
So it's trying to do reflection on the namenode proxy, which is something we don't allow? I say this based on:
at com.sun.proxy.$Proxy34.getServerDefaults(Unknown Source) ~[?:?]
Wonder if we could use the dfs.trustedchannel.resolver.class setting to force this to be bypassed: https://github.com/apache/hadoop/blob/branch-2.8.1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java#L206 ?
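If that route were explored, the resolver itself might look roughly like the sketch below. This is only a hypothetical illustration: the class name, package, and method signatures are assumptions taken from a reading of the Hadoop 2.8.x sources and should be verified; wiring it in would presumably be done via dfs.trustedchannel.resolver.class in the client configuration.

```java
import java.net.InetAddress;
import org.apache.hadoop.hdfs.protocol.datatransfer.TrustedChannelResolver; // assumed package

// Hypothetical, untested sketch: a resolver that treats every channel as trusted,
// so SaslDataTransferClient.checkTrustAndSend would skip fetching a data
// encryption key (and therefore the getServerDefaults call shown above).
public class AlwaysTrustedChannelResolver extends TrustedChannelResolver {

    @Override
    public boolean isTrusted() {
        return true;
    }

    @Override
    public boolean isTrusted(InetAddress peerAddress) {
        return true;
    }
}
```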
As you point out, the problem here is that the call is coming from core where we do not and will not grant suppressAccessChecks.
This is correct. Since a core utility (org.elasticsearch.common.io.Streams.copy) is in the stack trace, it trips up the security manager. What should really happen is that the stream gets wrapped in something that performs the HDFS calls with privileges. I took a quick look and found that this already seems to be done in master with https://github.com/elastic/elasticsearch/pull/22793. Perhaps this just needs to be backported to the 5.x branches.
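For illustration, a minimal sketch of the kind of wrapper described here (hypothetical names, not the actual change from #22793): every read() runs inside doPrivileged, so the lazy DFSClient work triggered by the first read is checked against the plugin's privileged frame rather than against the unprivileged core caller in Streams.copy.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;

// Hypothetical sketch of a privileged stream wrapper; class and method names
// are illustrative, not the actual code in #22793.
class PrivilegedHdfsInputStream extends FilterInputStream {

    PrivilegedHdfsInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        return doPrivileged(() -> super.read());
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return doPrivileged(() -> super.read(b, off, len));
    }

    private static <T> T doPrivileged(PrivilegedExceptionAction<T> action) throws IOException {
        try {
            return AccessController.doPrivileged(action);
        } catch (PrivilegedActionException e) {
            throw (IOException) e.getCause(); // the action only throws IOException
        }
    }
}
```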
It looks like #22793 is the only non test code change to repository-hdfs that isn't in the 5.x line.
There's a reason that #22793 is not in 5.6. That change was predicated on removing socket permissions from core, which was only done in 6.0+. Thus, we cannot simply backport that change: even if it picked cleanly (which it doesn't), it was engineered with different assumptions than are present in 5.6. It's plausible that some of the code in #22793 will help address the issue here, but instead we have to:
So I found the case where this can happen. I'm trying to boil it down to the simplest steps. From what I've gathered so far:
For reference my test case is here:
I haven't had a chance to see if this is fixed in 6.x or master yet. I've been focusing on trying to recreate it in a standalone test case with 5.5.2.
Thank you so much @risdenk, that's awesome effort and much appreciated. @jbaiera will work closely with you on this starting next week. And 5.5.2 is definitely the right place to focus efforts (or the 5.6 branch is fine too).
The code path differs for a readonly repository in this method:
Non-Read-Only Path
The BlobStoreRepository.getRepositoryData() method calls BlobStoreRepository.latestIndexBlobId(), which in the non-read-only case eventually calls DFSClient.getServerDefaults() with the correct file context through HdfsBlobContainer. This filters down to the HDFS blob store implementation with doPrivileged. The DFSClient.getServerDefaults() result is then cached for 60 minutes, so the next call (the one from Streams.copy that runs without the file context) works.
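The privileged hop visible in the second stack trace (HdfsBlobStore.lambda$execute$0 -> doPrivileged -> HdfsBlobStore.execute) looks roughly like the sketch below; it is an approximation for illustration, not the actual 5.5.2 source. This is what primes the DFSClient server-defaults cache under privileges on the non-read-only path.

```java
import java.io.IOException;
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.fs.FileContext;

// Approximate sketch of the HdfsBlobStore.execute(...) pattern from the stack
// trace; names are simplified and the wiring is hypothetical, not the 5.5.2 code.
class HdfsBlobStoreSketch {

    interface Operation<V> {
        V run(FileContext fileContext) throws IOException;
    }

    private final FileContext fileContext;

    HdfsBlobStoreSketch(FileContext fileContext) {
        this.fileContext = fileContext;
    }

    <V> V execute(Operation<V> operation) throws IOException {
        try {
            // HDFS calls routed through here are checked against this privileged
            // frame, so the plugin's own policy grants apply regardless of callers.
            return AccessController.doPrivileged(
                    (PrivilegedExceptionAction<V>) () -> operation.run(fileContext));
        } catch (PrivilegedActionException e) {
            throw (IOException) e.getCause(); // the operation only throws IOException
        }
    }
}
```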
Read-Only Path (Failure)
The BlobStoreRepository.getRepositoryData() method calls BlobStoreRepository.latestIndexBlobId(), which in the read-only case doesn't appear to call out to HDFS at all. That means there are no cached server defaults in DFSClient.getServerDefaults(), so the next call (the one from Streams.copy that runs without the file context) fails.
DFSClient and suppressAccessChecks
Side note: interestingly, Travis CI doesn't seem to fail, whereas locally on my Mac it does. All the testing is in Docker, so I'm not sure what the difference is right now.
For reference, I switched the base Elasticsearch image to 6.0.0-beta2 and could not reproduce the same error. The HDFS repository lists correctly even with readonly.
@risdenk Thanks for all the work toward distilling the issue down to a few simple steps. This is incredibly helpful information. I'll start working toward getting a failing test built and then we'll know where to go from there.
@jbaiera - Glad the information is helpful.
I finally got gradle integTestSecure working, and even after editing the secure_hdfs_repository REST test YAML files I wasn't able to reproduce the problem. I can see logically how the code path I'm hitting in my environment gets hit, but I'm having a hard time producing the issue in the test suite.
I ran into a few problems with Vagrant 2.0, and on a Mac the Vagrant test fixture just stops. On a Linux box the test suite works OK.
@risdenk I wanted to check in and share what I've got on this: I also was running into issues with reproducing this via the rest integration test tooling. I decided to try and reproduce by standing up the local hdfs fixture with a local elasticsearch instance and re-tracing your provided steps. This did indeed reproduce the error you've described.
This raises the question of why it doesn't cleanly reproduce in the REST integration test framework. Diving deeper uncovered what look like two unrelated bugs (one in the repository, and one potentially in the HDFS client), but it hasn't yielded any answers as to why it doesn't reproduce. I'll update back here with more info as it's available.
@jbaiera Glad I'm not crazy: it doesn't reproduce in the integration test, but it does if you stand up a cluster. Last night I was able to get it to reproduce once in the REST integration test framework.
The branch for this is here: https://github.com/risdenk/elasticsearch/tree/test_readonly_repository
I created this from the 5.5 branch. https://github.com/elastic/elasticsearch/compare/5.5...risdenk:test_readonly_repository
From that branch I was able to recreate the access denied with gradle integTestSecure from the plugins/repository-hdfs directory.
Checking in again on this since it's been the parent of a few real brain teasers:
I was able to get the REST test framework to reproduce the error, but only when the test case that reproduces it is run relatively early in the suite. This is a problem since our testing suite has no real sense of ordering (the test spec names with the leading numbers are a bit misleading). I found that one of the reasons the integration test passes when the test case runs later is that the suppressAccessChecks permission check is skipped: the methods in question have already been set accessible by privileged code in previous test cases.
That being said, since there is at least a test that reproduces it now, I'll just get a PR up for fixing it soon. We'll have to just accept that the ordering of the tests matters for reproducing it since the state of the HDFS Client doesn't give us much choice.
This is a problem since our testing suite has no real sense of ordering (The test spec names with the leading numbers are a bit misleading).
Ah that explains a lot! I couldn't figure out why the integration tests passed when I explicitly set the order with the leading numbers.
Do the tests run in parallel as well? If each test setup and tore down the repository there shouldn't be any overlap since the HDFS client is per repository.
The overlap in this scenario comes from all the tests executing on the same JVM and class instances. By the time the reproducing test case runs, previous test cases in the suite have already set the methods in question to be accessible from privileged call sites. When the reproducing test case attempts to run, the methods that it would normally try to set accessible are already accessible. The code skips trying to set them, and thus avoids the security exceptions.
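A tiny self-contained illustration of that masking effect (hypothetical code, not the actual Hadoop or test sources): a caller that only flips the accessible flag when it isn't already set never reaches the SecurityManager check if earlier, privileged code already set it.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of the masking effect: code that only calls
// setAccessible when the flag is not already set never triggers the
// SecurityManager check if earlier (privileged) code has already flipped
// the flag on the same Method instance.
public class AccessibleMaskingDemo {

    static Object invokeLikeTheRetryHandler(Method method, Object target, Object... args) throws Exception {
        if (!method.isAccessible()) {
            method.setAccessible(true); // the only place a ReflectPermission check would run
        }
        return method.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        Method hashCode = Object.class.getDeclaredMethod("hashCode");
        hashCode.setAccessible(true); // stands in for an earlier, privileged test case
        // Later "unprivileged" use of the same Method instance: the accessible flag is
        // already set, so setAccessible is skipped and no SecurityException can surface.
        System.out.println(invokeLikeTheRetryHandler(hashCode, new Object()));
    }
}
```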
That said, there are still some other discrepancies between running the rest test suite and running against a local cluster that I haven't pinned down yet, but my hunch so far has been that the differences are more likely caused by how rapid the calls are made than the substance of the calls themselves. The HDFS client is pretty sensitive when a security manager is installed.
After getting the requisite vagrant testing support commits backported to the 5.x branches and shoring up the last of the changes needed, I've opened #26714 for this.
Thanks @jbaiera!
I have merged #26714 and backported to the 5.5.x branch. This should make it into a release soon.