We know that in Presto, Hadoop's FileSystem#cache is replaced by PrestoExtendedFileSystemCache, and in PrestoExtendedFileSystemCache, all the FileSytems are wrapped by HadoopExtendedFileSystem:
// PrestoExtendedFileSystemCache.java
@Override
protected FileSystem createPrestoFileSystemWrapper(FileSystem original)
{
return new HadoopExtendedFileSystem(original);
}
The code has not take LocalFileSystem into consideration, many FileSystem implementation utilize LocalFileSystem to implement multipart upload(write) data into the FileSystem. When we get a LocalFileSystem, the type cast will fail:
// FileSystem.java
public static LocalFileSystem getLocal(Configuration conf)
throws IOException {
return (LocalFileSystem)get(LocalFileSystem.NAME, conf);
}
@jainxrohit, could you help to take a look?
Thanks @xumingming for raising this question. We also have this issue. I took a look at the code, just as you said, it seems Presto replaced Hadoop FileSystem Cache and do not take LocalFileSystem into consideration. We have a work around fix in our production like below:
diff --git a/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java b/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java
index c3575eb..a66ff99 100644
--- a/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java
+++ b/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java
@@ -109,7 +109,7 @@ public final class PrestoFileSystemCache
map.put(key, fileSystemHolder);
}
- return fileSystemHolder.getFileSystem();
+ return ((FilterFileSystem)fileSystemHolder.getFileSystem()).getRawFileSystem();
}
private static FileSystem createFileSystem(URI uri, Configuration conf)
@jainxrohit, do you think if @Naveen007 can take this ticket? It might require some digging. Of course, I see a very easy but hacky fix to it.
@jainxrohit, do you think if @Naveen007 can take this ticket? It might require some digging. Of course, I see a very easy but hacky fix to it.
Sure, we can try it.
@wujinhu Your patch might not work in the latest presto versions since the expected return class from PrestoExtendedFileSystemCache#get is ExtendedFileSystem which is a subclass of FileSystem.
With the current setup, there is no straightforward way of using both FileSystem.getLocal() and environment.getFilesystem("/", config), since the later call irrespective of local config or HDFS config expects a return object of ExtendedFileSystem.
How about not wrapping the FileSystem class, if it is type of LocalFileSystem ? @Naveen007
@xumingming Yeah that is a possibility, but in HdfsEnvironment class (it expects a ExtendedFileSystem object to be returned even if it is called with Path("/") and it would fail here).
public ExtendedFileSystem getFileSystem(String user, Path path, Configuration configuration)
throws IOException
{
return hdfsAuthentication.doAs(user, () -> {
FileSystem fileSystem = path.getFileSystem(configuration);
fileSystem.setVerifyChecksum(verifyChecksum);
checkState(fileSystem instanceof ExtendedFileSystem);
return (ExtendedFileSystem) fileSystem;
});
}
Most helpful comment
@jainxrohit, could you help to take a look?