External Storage - S3 - has very high requests
In Oct 766K Requests
In Nov 22.6 Million requests
In Dec 53.5 Million requests
As of 1/7/15 5 Million requests
I have an S3 bucket mounted on my ownCloud server with about 20 GB (only 30 files) in it. I've removed S3 as a shared drive on all clients but left it on the server. SQLite is my database.
ownCloud Server 7.0.4
What should it be at?
Don't know but it is continuing to worsen.
Did you perform any updates in the time when it increased?
I must have updated from at least 7.0.3 in that time frame.
CC @butonic @icewind1991
Same for me. I used the sync client to store ~300 pictures in an S3 external mount. This produced ~1.5 million requests within a few days. The mount was created and used on 7.0.4. Is a request made for every file whenever the sync client syncs?
Same here, now at about 12 million requests, for syncing 2k files, with about 1k done.
Also on version 7.0.4 at the moment.
Related: I see the CPU of my Digital Ocean droplet at about 30% while 1 (Mac) client is syncing, which seems quite steep.
Same problem here: 19 million requests in the month of January, with around 100 files or so in ownCloud. I also use S3 for images on my website, which has next to no traffic.
You guys are all talking about database requests, right?
Regarding connections to the remote S3 server, implementing a stat cache storage wrapper might help avoid that many connections: #13971
I think they are talking about HTTP calls to S3. In contrast to S3 as objectstorage, the files_external implementation makes an HTTP request for every stat call. @PVince81 improved the situation with a stat cache. The number of calls can only be reduced if ownCloud could exclusively write to S3. Then we wouldn't have to scan for external changes, which is problematic anyway (also see https://github.com/owncloud/core/issues/11797).
Nope, the stat cache I added was for SWIFT, and it's still in PR: #7897
But I'd rather go for a generic solution that can be used for any storage: https://github.com/owncloud/core/issues/13971 (yet to be implemented)
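To make the idea concrete, here is a minimal sketch of a generic stat cache wrapper. It is written in Python purely for illustration (ownCloud itself is PHP), and every name in it (`CountingBackend`, `StatCacheWrapper`, `stat`, `write`, `demo`) is hypothetical and not part of the ownCloud or AWS APIs. It assumes a backend where each `stat` call costs one HTTP request, and it assumes the cache is only trusted while nothing else writes to the bucket.

```python
from typing import Dict, Optional, Tuple


class CountingBackend:
    """Hypothetical stand-in for a remote store: every call is one HTTP request."""

    def __init__(self, objects: Dict[str, int]) -> None:
        self.objects = dict(objects)  # path -> size in bytes
        self.requests = 0             # number of simulated HTTP requests

    def stat(self, path: str) -> Optional[dict]:
        self.requests += 1
        if path not in self.objects:
            return None
        return {"size": self.objects[path]}

    def write(self, path: str, size: int) -> None:
        self.requests += 1
        self.objects[path] = size


class StatCacheWrapper:
    """Wraps any backend exposing stat()/write() and remembers stat results."""

    def __init__(self, backend: CountingBackend) -> None:
        self.backend = backend
        self._cache: Dict[str, Optional[dict]] = {}

    def stat(self, path: str) -> Optional[dict]:
        # Only the first lookup of a path reaches the backend.
        # "Not found" (None) results are cached as well.
        if path not in self._cache:
            self._cache[path] = self.backend.stat(path)
        return self._cache[path]

    def write(self, path: str, size: int) -> None:
        self.backend.write(path, size)
        self._cache.pop(path, None)  # drop the stale entry for this path

    def clear(self) -> None:
        # Call this whenever external changes must be picked up again.
        self._cache.clear()


def demo() -> Tuple[int, int]:
    """Stat 30 files five times each, without and with the cache."""
    files = {f"photo{i}.jpg": 1000 for i in range(30)}

    raw = CountingBackend(files)
    for _ in range(5):
        for path in files:
            raw.stat(path)

    wrapped_backend = CountingBackend(files)
    cached = StatCacheWrapper(wrapped_backend)
    for _ in range(5):
        for path in files:
            cached.stat(path)

    return raw.requests, wrapped_backend.requests


if __name__ == "__main__":
    print(demo())
```

The trade-off is the one already mentioned above: a cached entry goes stale as soon as something other than the wrapper changes the bucket, so the cache has to be short-lived or cleared unless the bucket is written exclusively through it.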
I guess the main problem with the current version of the S3 external storage is that the high number of HTTP requests can lead to unexpectedly high fees, as you also pay a small fee per HTTP request. In my opinion, requiring an exclusive S3 bucket for ownCloud is pretty reasonable, at least on S3 where you can have many separate buckets.
I want to confirm the very high number of queries to S3 from ownCloud.
Very high.
I collected data in the hope that it helps.
Server: Ubuntu + nginx + php5-fpm + MySQL.
I installed the latest version of ownCloud from the Web Installer (8.0.2).
And the latest version of the desktop client (Windows, 1.8.0).
I created a separate S3 bucket specifically for this test.
I use my AWS account only for this ownCloud S3 bucket, so it's a clean experiment.
On the desktop I used a folder with some files, encrypted with encfs.
I waited for the full sync to finish, plus a day without touching these files.
I didn't touch any file in this folder.
Here is the web server log: http://scr.lexore.net/o.access.log.gz
Here is the desktop client log: http://scr.lexore.net/client.owncloudsync.log.gz
The filenames are encrypted too :)
As you can see, after the sync finishes, ownCloud starts a "Syncrun" every minute.
A Syncrun takes approximately an hour, then the client waits a minute and starts a new Syncrun, waits a minute, etc.
So I can say that the client keeps starting Syncruns even though no files were modified.
During a Syncrun, ownCloud makes a huge number of requests to S3.
And all these requests cost money.
So ownCloud + S3 turns into a money-eating machine :)
Here are some screenshots:
folder options (which was in sync): http://scr.lexore.net/20150415-slr-25kb.jpg
aws cost explorer (daily spend): http://scr.lexore.net/20150415-tt0-88kb.jpg
aws bill details (summary spend): http://scr.lexore.net/20150415-n6u-90kb.jpg
Here are some calculations:
On the "bill details" screenshot we can see 8 170 128 GET requests to Amazon S3.
In the client log file there are 34 syncruns on unchanged files ("for sure").
The first syncrun on unchanged files started at "2015-04-13T07:19:19".
From that time on there are approximately 26 thousand HTTP requests in the server log file (2/3 of the 38 thousand in total).
34 syncruns * 2584 files / 26000 requests = approx 3.4 files per request.
But this generated approximately 5 million requests to Amazon S3 (2/3 of 8170128).
5 000 000 / 26 000 = approx 192
Approximately 192 requests to S3 for each request to ownCloud!
On unchanged files!
I think this can be optimized :)
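For anyone who wants to re-check the arithmetic above, here is a small Python sketch that reproduces it from the figures quoted in this comment. The 2/3 share is the commenter's own estimate and the function and variable names are made up for illustration. It also shows the ratio without rounding the S3 count down to 5 million.

```python
def amplification() -> dict:
    """Recompute the request ratios from the numbers quoted in the comment."""
    syncruns = 34                       # syncruns on unchanged files (client log)
    files = 2584                        # files in the synced folder
    owncloud_requests = 26_000          # approx. HTTP requests to ownCloud in that period
    s3_get_requests_billed = 8_170_128  # GET requests on the AWS bill
    idle_share = 2 / 3                  # commenter's estimate of the share after the sync finished

    file_checks = syncruns * files
    s3_requests_idle = s3_get_requests_billed * idle_share

    return {
        "file_checks": file_checks,
        # file checks covered by each ownCloud request
        "files_per_owncloud_request": file_checks / owncloud_requests,
        # ratio using the rounded 5 million figure from the comment
        "s3_per_owncloud_request_rounded": 5_000_000 / owncloud_requests,
        # ratio using 2/3 of the billed GET requests without rounding
        "s3_per_owncloud_request_unrounded": s3_requests_idle / owncloud_requests,
    }


if __name__ == "__main__":
    for name, value in amplification().items():
        print(f"{name}: {value:.2f}")
```

Either way the result is on the order of 200 S3 requests for every request the client makes to ownCloud.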
Yes. A stat cache needs to be added in the S3 implementation.
If anyone has some time, you could look at https://github.com/owncloud/core/pull/7897 and do the same for https://github.com/owncloud/core/blob/master/apps/files_external/lib/amazons3.php (no need to pick the ArrayCache class, it already exists in core).
I'll set the milestone for this ticket to 8.2 for now.
I have been dealing with this issue since it started with 7.0.4. Prior to that (6.0.6, then 7.0.3), I used S3 with minimal overhead. Something changed in 7.0.4 that started hammering S3 with all these requests. It also puts a much higher load on the ownCloud server. For now I'm holding at an earlier version to avoid the problem, but I would love to help track down what caused it in the first place.
I'm not aware of any change that would increase the number of requests of S3 specifically.
But indirectly it is possible that some other part of the code is using more FS calls (like additional checks), which might also cause additional calls here.
In that case, would it be best to just implement a stat cache and forget it? I'll look through the changes that happened back then to see what I can find. I know all the extra requests are PROPFIND if that rings a bell.
The stat cache would be the ideal solution, yes.
Also having this issue. ~11M requests to S3 with a single Owncloud being the only service connected to it. I'm using the External Storage plugin and am using the Owncloud app on Android to sync photos to the S3 storage when they are taken.
Just wondering... is the fix for this considered to be the option (under the gear icon) for how often to check for changes? I just discovered this and selected "never". I'm hoping this will keep my requests down since, like the others, I expect they will add up to real money with any kind of real use. If this works, I'll be happy in this case since I only plan on managing the bucket through ownCloud.
@unleashit this is an option, yes.
The second step is to implement a stat cache for S3, similar to the one proposed for SWIFT: https://github.com/owncloud/core/pull/7897, which can help save redundant requests.
@MTRichards @cmonteroluque something to consider for 9.1 ? (implementing a stat cache for S3 to avoid too many requests to the S3 server, making it very slow)
Hard for me to tell if this is the way to fix the problem, but add to 9.1 list.
@MTRichards we started doing something similar for SWIFT and improved perf significantly, it's still in a PR and I'd like to finish that too for 9.1: https://github.com/owncloud/core/pull/7897
That's how I know :wink:
Excellent! Cross linked.
good idea @PVince81
Just moved to S3 external storage with a folder hosting upwards of 9k files. I am quite worried now that the AWS bill will be huge if, as one commenter said, only 30 files resulted in millions of S3 requests.
Also, the sync client is taking hours just to check for changes in that folder :-/
Is there something I can help out with improving the plugin step by step?
@cdhowie IMHO the S3 module is unusable due to the high number of requests. I had a very modest number of files, and I ate through the free tier in about a day, and put a stop to it just a few days later after I racked up $2 in API fees. As this is open source, I would never expect anyone to do anything. But again IMHO this is the kind of thing that can bite, so with that in mind I think it might be a good idea to remove it from the core (maybe offer it as an extension) unless it can be fixed.
@unleashit My comment was not directed at you, it was directed at a comment (since removed) by someone asking for help with AWS directly, and had nothing to do with ownCloud.
@cdhowie I actually thought it was to @nebulade. My bad for apparently not reading correctly :)
Hey, this issue has been closed because the label status/STALE is set and there were no updates for 7 days. Feel free to reopen this issue if you deem it appropriate.
(This is an automated comment from GitMate.io.)
Secondary S3 storage integration just got moved over to https://github.com/owncloud/files_external_s3.
Unfortunately GitHub does not allow moving issues. Please reopen the issue at https://github.com/owncloud/files_external_s3/issues
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.