Hi folks,
Bringing this over from the forums, as it seems to be a bug/design issue rather than a config problem.
I'm testing rclone as a read-only mount backend for s3ql (via unionfs); s3ql currently stores its files in encrypted 10MB chunks on Amazon Drive.
Most reads show an s3ql file being downloaded only once and FUSE reading it normally in 128k chunks, but there seems to be another download attempt each time rclone reports a seek:
12:34:57 DEBUG : s3ql_data_/264/s3ql_data_264611: Dir.Lookup
12:34:57 DEBUG : s3ql_data_/264/s3ql_data_264611: Dir.Lookup OK
12:34:57 DEBUG : s3ql_data_/264/s3ql_data_264611: File.Attr valid=1m0s ino=0 size=10486673 mode=-rw-rw-r--
12:34:57 DEBUG : s3ql_data_/264/s3ql_data_264611: File.Open OpenReadOnly
12:34:57 DEBUG : s3ql_data_/264/s3ql_data_264611: Downloading large object via tempLink
12:35:06 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 32768 offset 0
12:35:06 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
12:35:06 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 16384 offset 98304
12:35:06 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.seek from 32768 to 98304
12:35:06 DEBUG : s3ql_data_/264/s3ql_data_264611: Downloading large object via tempLink
12:35:06 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
12:35:06 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 65536 offset 32768
12:35:07 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.seek from 114688 to 32768
12:35:07 DEBUG : s3ql_data_/264/s3ql_data_264611: Downloading large object via tempLink
12:35:15 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
12:35:15 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 114688 offset 114688
12:35:15 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.seek from 98304 to 114688
12:35:15 DEBUG : s3ql_data_/264/s3ql_data_264611: Downloading large object via tempLink
12:35:21 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
12:35:21 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 131072 offset 229376
12:35:21 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
12:35:21 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 131072 offset 360448
12:35:21 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
...
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 131072 offset 10321920
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read size 36864 offset 10452992
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Read OK
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Flush
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Flush OK
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Flush
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Flush OK
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Release closing
12:35:22 DEBUG : s3ql_data_/264/s3ql_data_264611: ReadFileHandle.Release OK
This is with rclone v1.36-47-gd86ea86 and the following mount options:
--allow-other --read-only --buffer-size 100M --acd-templink-threshold 0 --max-read-ahead 100M
Adjusting or removing --buffer-size and --max-read-ahead seems to have no effect. Based on the timestamps, the download really is being repeated for the 10MB file, which kills read performance. Streaming media files causes playback to halt and buffer while the backend gets hammered with requests for the same file over and over.
Is there any way to ensure the file is downloaded only once and all reads are served from that download instead of re-downloading? I've been poking through the code but haven't yet found where the re-download is triggered. Any hints on how to fix this would be much appreciated!
but there seems to be another download attempt each time rclone reports a seek:
That is how seeking is currently implemented. When you do a seek, rclone will re-open the file with a Range request seeking to the position you asked for.
rclone does not yet buffer file reads, which leads to the behaviour above.
There are various efforts underway to improve this, including an LRU cache for file segments which would fix this, so watch this space!
Thanks for the info, Nick. rclone is fantastic otherwise, and it's surprising how well all of these layers (rclone/unionfs/s3ql) work together as-is. Looking forward to the fix for seek/read buffering; it could be the final piece of the puzzle for getting Amazon Drive to behave at least minimally consistently for long sequential reads.
This looks like an oldie as well.
This should be covered by the VFS rewrite that is underway, and since ACD is not working anymore, I'm assuming you are good :)
Let me know if you have anything else or I misunderstood something.
Thanks!