I'm not sure if there's an existing cache of file attributes (e.g., size and file checksum) but in an ideal world, rather than make a call to S3 to see if anything has changed (which is what appears to be happening), the attribute cache could be queried instead for any comparison.
Since the documentation says nothing should touch the S3 bucket other than Nextcloud, in theory any change would go through Nextcloud so the cache would be properly updated.
Perhaps a walk through S3 (a slow random walk and/or a periodic walk) to update the cache (and invalidate it, perhaps with a warning to the admin: "DO NOT TOUCH S3 DIRECTLY") would be appropriate, as sketched below. A regular check on the total number of files in the bucket and the total bucket size would also allow for a quick cache invalidation and warning.
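To make that concrete, here is a rough sketch of the periodic walk in Python/boto3 (the bucket name and the `local_cache` structure are just placeholders, not Nextcloud's actual file cache API):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-nextcloud-bucket"  # placeholder bucket name

def walk_bucket(local_cache: dict) -> list:
    """List the bucket once and return keys whose size or ETag no longer
    match the local metadata cache (a stand-in for oc_filecache)."""
    drift = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            cached = local_cache.get(obj["Key"])
            if (cached is None
                    or cached["etag"] != obj["ETag"]
                    or cached["size"] != obj["Size"]):
                drift.append(obj["Key"])
    return drift

# Anything returned here would trigger a cache invalidation plus the
# admin warning ("DO NOT TOUCH S3 DIRECTLY").
```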
As a side note, the folder sizes don't seem to update correctly when using S3, presumably that information could be pulled from the cache as well.
Extremely high levels of S3 calls - it appears that all client checks ("has a file changed?") are hitting S3 directly. So, for 503 files, three users generated 15 million S3 GETs in two weeks.
Operating system:
Centos 7
Web server:
Apache 2.4
Database:
MariaDB
PHP version:
5.6
Nextcloud version: (see Nextcloud admin page)
11.0.1
Updated from an older Nextcloud/ownCloud or fresh install:
Fresh install
Where did you install Nextcloud from:
Signing status:
No errors have been found.
List of activated apps:
Enabled:
The content of config/config.php:
{
    "system": {
        "instanceid": "ocl1styuwl51",
        "passwordsalt": "REMOVED SENSITIVE VALUE",
        "secret": "REMOVED SENSITIVE VALUE",
        "trusted_domains": [
            "XX"
        ],
        "datadirectory": "\/var\/www\/html\/nextcloud\/data",
        "overwrite.cli.url": "XX",
        "dbtype": "mysql",
        "version": "11.0.1.2",
        "dbname": "nextcloud",
        "dbhost": "XX",
        "dbport": "3306",
        "dbtableprefix": "oc_",
        "dbuser": "REMOVED SENSITIVE VALUE",
        "dbpassword": "REMOVED SENSITIVE VALUE",
        "logtimezone": "UTC",
        "installed": true,
        "memcache.local": "\\OC\\Memcache\\APCu",
        "updater.release.channel": "stable",
        "mail_from_address": "owncloud",
        "mail_smtpmode": "smtp",
        "mail_domain": "thefoodcycleny.com",
        "mail_smtpauthtype": "PLAIN",
        "mail_smtpauth": 1,
        "mail_smtphost": "mail.chelsea.net",
        "mail_smtpname": "REMOVED SENSITIVE VALUE",
        "mail_smtppassword": "REMOVED SENSITIVE VALUE",
        "mail_smtpport": "465",
        "mail_smtpsecure": "ssl",
        "theme": "",
        "loglevel": 2,
        "maintenance": false
    }
}
Are you using external storage, if yes which one: Was using S3, now disabled
Yes, the way this handles S3 is really, horrendously bad. It should keep a copy of each file's metadata on your local server / DB and consult that whenever it needs the name, size, attributes, etc. Then configure S3 to push updates to you via Simple Notification Service, or alternatively poll SQS for notification of an update to a file, which would trigger a refresh of your metadata.
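Roughly what I have in mind, sketched in Python/boto3 (the queue URL and the `refresh_metadata` hook are placeholders, and this assumes the bucket already has event notifications wired to an SQS queue):

```python
import json
from urllib.parse import unquote_plus

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/nextcloud-s3-events"  # placeholder

def poll_once(refresh_metadata):
    """Drain one batch of S3 event notifications and refresh local metadata."""
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling keeps the request count low
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):
            # S3 URL-encodes object keys in event notifications
            key = unquote_plus(record["s3"]["object"]["key"])
            refresh_metadata(key)  # e.g. re-read size/ETag into the local DB
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```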
So is the verdict that, for now, S3 is unusable? I just tried connecting my NC instance to S3 and uploading some files - small ones (<5MB) work, but large ones seem to finish uploading to the instance and then vanish without a trace. They upload fine to local storage. This leaves a pretty bad taste and a general disillusionment with the state of affairs.
cc @karlitschek @oparoz I guess we should look into this sooner or later
@icewind1991 Any ideas how to profile this?
Turn on logging for that S3 bucket and try to upload a file again. How are you uploading the file? Does it upload to the Nextcloud server first and then get pushed to S3, or is it uploaded straight to the S3 bucket? It could be an issue with the webserver if you're uploading via the web interface (although if big files upload successfully to local storage, that would rule this out), or possibly a problem with the multipart upload to S3.
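For reference, bucket access logging can also be switched on via the API; a quick boto3 sketch (bucket names are placeholders, and the target bucket needs the usual log-delivery permissions):

```python
import boto3

s3 = boto3.client("s3")

# Enable S3 server access logging so the failed large uploads show up in the logs.
s3.put_bucket_logging(
    Bucket="my-nextcloud-bucket",          # placeholder source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-nextcloud-logs",  # placeholder log bucket
            "TargetPrefix": "nextcloud/",
        }
    },
)
```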
AFAIK, when using S3 as primary storage, Nextcloud uses the cache exclusively until the file needs to be updated. When using external storage, the user can choose whether to check files on every access or never, because that storage location may be updated from the outside. I don't think we can easily fix that behaviour. The solution would be to use S3 as primary storage.
@SpiraMirabilis @dbchelgit Do you use the primary storage or external storage app?
I'm using it as external storage, since my primary storage is local EBS at the moment. I took another look at the configuration options, and I'm not certain if I had my external stuff set to "Never" or "Once for every Direct Access". I have it set to Never in the little test one I have set up.
If it's set to Never and that eliminates the constant checking, perhaps this is just a documentation fix, i.e., "If the S3 bucket is dedicated to Nextcloud (as it should be), set this to 'Never' to reduce S3 charges." Also, if there is a concern about data integrity for other reasons, perhaps instead of checking on every access, a cron task could walk the S3 bucket on some regular basis and compare checksums to see if anything has changed. I think you can do a HEAD request and just get the object metadata to keep data transfer charges down.
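For example, the HEAD-only check could look something like this with boto3 (bucket and key names are placeholders); it returns only metadata, so no object data is transferred:

```python
import boto3

s3 = boto3.client("s3")

def object_metadata(key: str) -> dict:
    """Fetch size/ETag/LastModified for one object without downloading it."""
    head = s3.head_object(Bucket="my-nextcloud-bucket", Key=key)
    return {
        "size": head["ContentLength"],
        "etag": head["ETag"],
        "modified": head["LastModified"],
    }
```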
David.
Could you please retry with Nextcloud 12. We improved a lot here and found some issues. If there is still a problem please reopen as new ticket. Thanks
Having the same issue, but with Swift.
@dbchelgit did changing "check for changes" to Never solve the issue? I just set up Nextcloud backed solely by S3 and I woke up to a decent charge on my AWS account overnight due to data uploaded yesterday.
Derek, I finally just gave up. I would still have excessive usage charges, and occasionally things would just weirdly break - the S3 backend didn’t really seem stable. So I’m paying for disk instead and it’s been fine.
Just went to my billing dashboard after moving some of my stuff to S3 and I'm seeing about 500k tier 1 requests / day. That's going to add up to over $30 this month just for S3 requests. No bueno 👎. Is anyone looking into this, or is S3 in Nextcloud dead? Curious why this was closed?
@chiefy I bailed on S3. While I was still using Nextcloud, I switched to B2 (Backblaze) via MinIO (https://github.com/minio/minio) and it worked pretty well.
I no longer use external cloud storage on Nextcloud, only physical drives in my home server.