Presto: PrestoExtendedFileSystemCache have not take LocalFileSystem into consideration

Created on 12 Jun 2020  路  8Comments  路  Source: prestodb/presto

We know that in Presto, Hadoop's FileSystem#cache is replaced by PrestoExtendedFileSystemCache, and in PrestoExtendedFileSystemCache, all the FileSytems are wrapped by HadoopExtendedFileSystem:

    // PrestoExtendedFileSystemCache.java
    @Override
    protected FileSystem createPrestoFileSystemWrapper(FileSystem original)
    {
        return new HadoopExtendedFileSystem(original);
    }

The code has not take LocalFileSystem into consideration, many FileSystem implementation utilize LocalFileSystem to implement multipart upload(write) data into the FileSystem. When we get a LocalFileSystem, the type cast will fail:

// FileSystem.java
  public static LocalFileSystem getLocal(Configuration conf)
    throws IOException {
    return (LocalFileSystem)get(LocalFileSystem.NAME, conf);
  }

Most helpful comment

@jainxrohit, could you help to take a look?

All 8 comments

@jainxrohit, could you help to take a look?

Thanks @xumingming for raising this question. We also have this issue. I took a look at the code, just as you said, it seems Presto replaced Hadoop FileSystem Cache and do not take LocalFileSystem into consideration. We have a work around fix in our production like below:

diff --git a/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java b/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java
index c3575eb..a66ff99 100644
--- a/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java
+++ b/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java
@@ -109,7 +109,7 @@ public final class PrestoFileSystemCache
             map.put(key, fileSystemHolder);
         }

-        return fileSystemHolder.getFileSystem();
+        return ((FilterFileSystem)fileSystemHolder.getFileSystem()).getRawFileSystem();
     }

     private static FileSystem createFileSystem(URI uri, Configuration conf)

@jainxrohit, do you think if @Naveen007 can take this ticket? It might require some digging. Of course, I see a very easy but hacky fix to it.

@jainxrohit, do you think if @Naveen007 can take this ticket? It might require some digging. Of course, I see a very easy but hacky fix to it.

Sure, we can try it.

@wujinhu Your patch might not work in the latest presto versions since the expected return class from PrestoExtendedFileSystemCache#get is ExtendedFileSystem which is a subclass of FileSystem.

With the current setup, there is no straightforward way of using both FileSystem.getLocal() and environment.getFilesystem("/", config), since the later call irrespective of local config or HDFS config expects a return object of ExtendedFileSystem.

How about not wrapping the FileSystem class, if it is type of LocalFileSystem ? @Naveen007

@xumingming Yeah that is a possibility, but in HdfsEnvironment class (it expects a ExtendedFileSystem object to be returned even if it is called with Path("/") and it would fail here).

public ExtendedFileSystem getFileSystem(String user, Path path, Configuration configuration)
            throws IOException
    {
        return hdfsAuthentication.doAs(user, () -> {
            FileSystem fileSystem = path.getFileSystem(configuration);
            fileSystem.setVerifyChecksum(verifyChecksum);
            checkState(fileSystem instanceof ExtendedFileSystem);
            return (ExtendedFileSystem) fileSystem;
        });
    }
Was this page helpful?
0 / 5 - 0 ratings