Aws-sdk-java-v2: S3: provide java.nio.FileSystem implementation

Created on 17 Aug 2019 · 17 Comments · Source: aws/aws-sdk-java-v2

Expected Behavior

The java.nio.FileSystem API provides an abstraction for dealing with different types of file systems, and for accessing files and folders within that file system. AWS should provide an implementation of this interface backed by an s3 client.

Current Behavior

Application code either has to explicitly know about s3 (by passing around S3Client everywhere) or else use some custom abstraction, which inevitably ends up being a half-baked implementation of parts of the FileSystem and Path APIs anyway.

Possible Solution

There are a few existing attempts to solve this problem that I can find:

  • Upplication/Amazon-S3-FileSystem-NIO2

    • built against the old 1.11 AWS SDK, so incompatible with codebases that use aws-sdk-java-v2
    • no code changes, and very little activity in issues, for over a year so not actively maintained?
  • elerch/Amazon-S3-FileSystem-NIO2

    • fork of the Upplication implementation, but switches to aws-sdk-java-v2
    • however this largely appears to be a solo effort, no guarantees about support, bugfixes etc going forwards
  • carlspring/s3fs-nio

Fortunately, this code is MIT-licensed so perhaps could form the basis of an official library?

Context

Developing application code against s3 can be problematic because it's not always possible to have a live s3 instance available: access policies might be very strict (no access to s3 from outside corp network, or only from specific (non-dev) machines), or developers might not even have internet access at all when e.g. working on the move.

Attempts to solve this problem in a different way exist (e.g. libraries which provide a local webserver with an s3-like interface), but this adds another set of dependencies and another tool to learn, configure and debug.

In my view, a cleaner solution would be to provide an implementation of java.nio.FileSystem which is the standard Java abstraction for dealing with different file systems. Application code would only need to talk to java.nio.file.Path and friends and developers could be confident of their code working reliably regardless of whether it's running against local disk storage, or s3.
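To illustrate the idea of application code that only talks to Path, here is a minimal sketch. It uses the JDK's built-in zip file system as a stand-in for a second, non-default provider, since no official S3 provider exists; the class and method names (PortableStorageDemo, saveAndReload) are hypothetical. An S3-backed FileSystem would, in principle, be passed to the same method unchanged.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

class PortableStorageDemo {

    // Application logic: depends only on Path, not on where the bytes live.
    static String saveAndReload(Path dir, String name, String content) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve(name);
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 1. Default file system (local disk).
        Path localDir = Files.createTempDirectory("portable-demo");
        System.out.println(saveAndReload(localDir, "hello.txt", "hello from local disk"));

        // 2. A non-default provider: the zip file system that ships with the JDK.
        //    This is only a stand-in for a provider such as an S3-backed one.
        Path zip = localDir.resolve("archive.zip");
        URI zipUri = URI.create("jar:" + zip.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(
                zipUri, Collections.singletonMap("create", "true"))) {
            System.out.println(saveAndReload(zipFs.getPath("/data"), "hello.txt", "hello from a zip"));
        }
    }
}
```

The same saveAndReload method runs against both file systems; only the code that obtains the Path differs.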

feature-request

All 17 comments

We've talked about this recently, actually. It would be cool to have, but our friends over in .NET (who have done something similar) have stated that it's actually surprisingly tricky to do correctly.

Marking as feature request.

Big +1 for this feature. We maintain a fork of the Upplication project (see here), but it would be very useful to have an official implementation.

Have there been any changes, and are there any plans for official support? :)

We still think it would be a cool idea. Right now the team is focused on getting customers' favorite V1 features into V2. Once we've gotten further along in that process, we can start more seriously considering new, cool features like this one.

Great! Thanks for the update! Please keep us in the loop :)

@millems This would be very useful for us. The proliferation of forks of the Upplication provider (which is itself a fork of an older provider) causes a lot of confusion.

Google has a very robust open source Path provider for gs: buckets. It would be great if Amazon did too.

Are there any updates on this?

Sorry, this still has not been prioritized.

Google's NIO storage provider works really well and is pretty straight-forward to integrate in any project.

What would be required in order to get the ball rolling for an S3 provider as well?

If somebody were to, say, walk over the different forks of Upplication/Amazon-S3-FileSystem-NIO2 and merge the useful changes that people have done in their forks, would the Amazon team be interested in adopting such a fork and continuing the work?

Unfortunately we aren't able to take on the project ourselves right now, even in just a maintenance capacity. That might change in the future, as demand for this feature rises (both here on GitHub and via any other official AWS channels of communication) or demand for our time elsewhere falls.

Until such time, we would be surprised and delighted if the open source community were to take up the mantle and develop such a feature. We'd be willing to provide any kind of AWS expertise you might require in the design or development of such a project.

"We've talked about this recently, actually. It would be cool to have, but our friends over in .NET (who have done something similar) have stated that it's actually surprisingly tricky to do correctly."

@millems , would you mind elaborating? What were the issues? What were they trying to do and what exactly didn't work?

Having been involved in the development of the Google implementation, as well as currently developing a generic https filesystem provider, I can say that it's a reasonable amount of work but definitely not insurmountable. One person working part time for a year should be able to come up with a very good solution. It probably has to be iterated on, though, as new error modes are discovered or appear due to changes in the underlying infrastructure.

I would say the hardest part is making it robust against intermittent failure. A file system can't fail at the same rate the internet does, so every operation has to be able to continue and retry in the face of failures.
Authentication is also tricky, and I don't know how Amazon handles this.
Performance is tricky because of the ludicrously high latency compared to local disk operations, so some sort of caching or prefetching layer is very helpful.
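The retry point above can be sketched roughly as follows. This is an illustrative helper only, not code from the Google provider or any AWS SDK; the names (RetryingIo, withRetries) and the choice of exponential backoff on IOException are assumptions made for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class RetryingIo {

    /**
     * Runs op, retrying on IOException with exponential backoff.
     * Gives up and rethrows the last IOException after maxAttempts tries.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // treated as transient: remember it and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // back off before the next attempt
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated remote read that fails twice before succeeding.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated network blip");
            }
            return "payload";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "payload after 3 attempts"
    }
}
```

A real provider would also need to decide which failures are retryable at all, and how to resume a partially completed read or write, which this sketch does not attempt.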

This is the sort of project that definitely benefits from a set of dedicated maintainers rather than a hodgepodge of forks with their own solutions.

@lbergelson's summary is great. @normj can weigh in on the struggles encountered doing it for .NET.

The "year" of time might sound more intense than I meant. It took initial work but then needed continual adjustment over time as we discovered new rare edge cases through use. Not a solid year of someone writing code.

For the .NET SDK we have a similar feature where we make S3 look like a file system matching the .NET File IO API. Although it does make S3 easier to traverse, it causes pitfalls that are not obvious to the user, because S3 really isn't a filesystem. For example, the .NET File IO API has file operations to append to an existing file. It looks like a simple and very tempting API for users to call, but under the covers we have to download the object, concatenate the new data and re-upload the whole thing. Likewise for a simple file system operation like moving or renaming a directory: S3 doesn't really have directories, so you end up having to list all of the objects, copy them over and then delete the originals. If there are a lot of objects under that S3 virtual directory, this can be very costly.

So although we have a similar approach in .NET, it has caused a lot of confusion for users, especially those new to S3, to the point that I actually regret us having the feature. I would rather users of S3 know what manipulations they are doing to S3 than do what looks like a simple operation and get a big surprise when it is actually a very slow and costly one.
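To make the cost described above concrete, here is a small in-memory simulation. It does not call S3 or any SDK; FlatObjectStore and its methods are hypothetical names, and the request counter is a simplification that treats every get, put, copy, delete and list as one round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** In-memory stand-in for an object store: flat keys, whole-object reads and writes only. */
class FlatObjectStore {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();
    int requests = 0; // number of simulated round trips

    byte[] get(String key)            { requests++; return objects.get(key); }
    void put(String key, byte[] data) { requests++; objects.put(key, data.clone()); }
    void delete(String key)           { requests++; objects.remove(key); }
    void copy(String from, String to) { requests++; objects.put(to, objects.get(from)); }

    /** Returns every key starting with prefix; there are no real directories. */
    List<String> list(String prefix) {
        requests++;
        List<String> keys = new ArrayList<>();
        for (String k : objects.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) break;
            keys.add(k);
        }
        return keys;
    }

    /** "Append" = download the whole object, concatenate, upload the whole object. */
    void append(String key, byte[] extra) {
        byte[] old = get(key);
        if (old == null) old = new byte[0];
        byte[] merged = new byte[old.length + extra.length];
        System.arraycopy(old, 0, merged, 0, old.length);
        System.arraycopy(extra, 0, merged, old.length, extra.length);
        put(key, merged);
    }

    /** "Rename directory" = list the prefix, then copy and delete every object. */
    void renamePrefix(String from, String to) {
        for (String key : list(from)) {
            copy(key, to + key.substring(from.length()));
            delete(key);
        }
    }

    public static void main(String[] args) {
        FlatObjectStore store = new FlatObjectStore();
        for (int i = 0; i < 1000; i++) {
            store.put("logs/2019/" + i + ".txt", "x".getBytes(StandardCharsets.UTF_8));
        }
        store.requests = 0;
        store.renamePrefix("logs/2019/", "archive/2019/");
        System.out.println("rename of 1000 objects took " + store.requests + " requests");
        // prints "rename of 1000 objects took 2001 requests"
    }
}
```

On a local disk the same rename is a single metadata operation, which is why the file system abstraction can hide a large difference in cost.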

Thanks for sharing your experience with GCS, as well as the S3 .NET implementation!

Hi guys,

We are pleased to let you know that we've created a spin-off project (rebranded fork called s3fs-nio) of Upplication/Amazon-S3-FileSystem-NIO2. This is a spin-off (of the latest Upplication/Amazon-S3-FileSystem-NIO2 master with fully preserved history), instead of just another fork, as the upstream has a plethora of forks which contain fixes for bits and pieces that had pull requests against the upstream which were never merged. Most of these forks appear to have sadly died out, just like the upstream seems to have died (Upplication/Amazon-S3-FileSystem-NIO2#135).

As there is a need for such a library, we have decided to take on this task and rebuild an active project with a knowledge base, chat channel and helpful community around it. Ultimately, we would really appreciate it, if we could have some of the Amazon folks helping out with advice and reviewing pull requests, as this would be a massive help for us!

We've done a big clean-up of the code, upgraded its dependencies and migrated to AWS SDK v2 (special thanks to @ptirador for all the hard work, as well as to @elerch + @markjschreiber for their advice and reviews!). The project is actively tested against JDK 8 and 11 via GitHub Actions. Our work is not done and we would like to keep working on this project! We intend to invite all the contributors with open pull requests to join our efforts and forward-port their fixes to our project.

If anyone is interested in lending a hand and joining our project, please reach out, as we have plenty to do! :)

@ashleymercer : Could you please add us to your list above? Thanks! :)

cc: @ptirador , @steve-todorov , @sbespalov
