aws-cli: S3 - RequestTimeout during large file transfers

Created on 11 Oct 2013 · 30 comments · Source: aws/aws-cli

I'm trying to upload a large file (9 GB) and getting a RequestTimeout error using aws s3 mv ...

I haven't fully tested it yet, but it seems like if I run the command over and over it will eventually work.

Here's the debug log from a failed attempt: https://s3.amazonaws.com/nimbus-public/s3_backup.log

I'll post back if I determine that retrying the command several times works or not.

aws version: aws-cli/1.1.2 Python/2.7.3 Windows/2008ServerR2

bug

All 30 comments

After multiple retries the command does eventually work on these large files (7-11 GB), but it sometimes takes dozens of retries.

BTW, I'm running the command on an EC2 instance, so there shouldn't be any latency or network issues.

Looking into this, I believe I know what's causing the issue.

Note, I'm having similar reliability issues moving even larger files (~175 GB) to S3. We've tried mv, sync, and cp with varying results. We're running the following:
aws s3 --version
aws-cli/1.1.2 Python/2.6.6 Linux/2.6.32-358.el6.x86_64

Note this is a single (large) file. It's already compressed.

We often see:
A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

Because of the large file size, retrying is extremely expensive. The larger the file size, the less luck we're having.

Is there a better Linux command-line tool for moving big data to S3?
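
For what it's worth, one way to make retries cheaper is to let boto3's managed transfer do the multipart work with explicit part sizes, so a timeout costs a single re-sent part rather than the whole file. A minimal sketch, assuming boto3 is installed; the bucket and file names are placeholders:

import boto3
from boto3.s3.transfer import TransferConfig

# Explicit multipart settings: a dropped connection now costs one 64 MB part,
# not the whole upload. Tune max_concurrency down on flaky links.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,  # 64 MB per part
    max_concurrency=4,
)

s3 = boto3.client("s3")
s3.upload_file("backup.tar", "my-bucket", "backup.tar", Config=config)  # placeholder names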

Out of 4 attempts yesterday, only one of them was successful. 25% isn't a good enough success rate for us to depend on this for what we're trying to do.

Let me know if you need full logging.

Recent improvements in version 1.1.2 include returning a valid non-zero return code when we fail, so at least we know that we've failed.

Here's the most recent tail of the output:

2013-10-16 18:05:42,755 - botocore.hooks - DEBUG - Event before-auth.s3: calling handler
2013-10-16 18:05:42,755 - botocore.handlers - DEBUG - Checking for DNS compatible bucket for: https://s3.amazonaws.com/AWL-Backup/mysql/innobackup-mysql-2013-10-16.xbstream?uploadId=Y1fx18aYpHc91zTKCYJWrziFTDYGYIWmow80MSK28xrH9RX7ZzxKs61mHKB1opG7YNlIeiSr8fcsSVN5_LSn5j0wBQQfe1GizoFVhC4arAQRDWiGLt_6HNrFu02ej1au
2013-10-16 18:05:42,755 - botocore.handlers - DEBUG - Not changing URI, bucket is not DNS compatible: AWL-Backup
2013-10-16 18:05:42,755 - botocore.auth - DEBUG - Calculating signature using hmacv1 auth.
2013-10-16 18:05:42,755 - botocore.auth - DEBUG - HTTP request method: DELETE
2013-10-16 18:05:42,756 - botocore.auth - DEBUG - StringToSign:
DELETE

Thu, 17 Oct 2013 00:05:42 GMT
/AWL-Backup/mysql/innobackup-mysql-2013-10-16.xbstream?uploadId=Y1fx18aYpHc91zTKCYJWrziFTDYGYIWmow80MSK28xrH9RX7ZzxKs61mHKB1opG7YNlIeiSr8fcsSVN5_LSn5j0wBQQfe1GizoFVhC4arAQRDWiGLt_6HNrFu02ej1au
2013-10-16 18:05:42,756 - botocore.endpoint - DEBUG - Sending http request:
2013-10-16 18:05:43,448 - botocore.response - DEBUG - Response Body:

2013-10-16 18:05:43,448 - botocore.hooks - DEBUG - Event needs-retry.s3.AbortMultipartUpload: calling handler
2013-10-16 18:05:43,448 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-16 18:05:43,449 - botocore.hooks - DEBUG - Event after-call.s3.AbortMultipartUpload: calling handler
2013-10-16 18:05:43,449 - awscli.errorhandler - DEBUG - HTTP Response Code: 204

At this point the command exits and returns $? = 1 (failure).

This should be fixed now. The issue was that S3 was closing the connection while we were uploading data. We detect this and automatically retry the request when it happens. However, we need to ensure that if the body is a file-like object (which is the case when cp/mv/sync'ing to S3), we properly reset the stream back to the beginning so that the entire body contents are sent again.
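
For readers following along, the gist of that fix looks roughly like the sketch below. The names are hypothetical; the real logic lives inside botocore's retry handling, not in user code:

import time

def send_with_retry(send, body, max_attempts=5):
    """Hypothetical sketch: retry send(body), rewinding file-like bodies so
    the full payload is re-sent after S3 drops the connection mid-request."""
    for attempt in range(1, max_attempts + 1):
        if hasattr(body, "seek"):
            body.seek(0)  # without the rewind, a retry sends only the unread tail
        try:
            return send(body)
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff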

James, thanks. I assume the fix is deployed. We've got an overnight cron to retest. If it fails, I'll be back at it next week.

The fix is in our develop branch now. It will be incorporated into our next release soon.

garnatt, what's the release schedule or where can I watch for it?

The 1.2.0 release, which contains this bug fix, is now out.

Thank you!

Just got this error uploading a 6.6 GB file:

"A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed."
[root@digital ~]# aws s3 --version
aws-cli/1.2.0 Python/2.6.6 Linux/2.6.32-358.6.1.el6.x86_64

Debug output from 2nd attempt:

2013-10-20 20:49:05,319 - awscli.customizations.s3.tasks - DEBUG - Part number 477 completed for filename: FILE.tar
2013-10-20 20:49:05,348 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload: ..FILE.tar to s3://FILE.tar', 'total_parts': 483, 'error': False}
2013-10-20 20:49:06,642 - botocore.response - DEBUG - Response Body:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>REQUEST_ID</RequestId><HostId>HOST_ID</HostId></Error>
2013-10-20 20:49:06,643 - botocore.hooks - DEBUG - Event needs-retry.s3.UploadPart: calling handler <botocore.retryhandler.RetryHandler object at 0x31844d0>
2013-10-20 20:49:06,643 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-20 20:49:06,643 - botocore.hooks - DEBUG - Event after-call.s3.UploadPart: calling handler <awscli.errorhandler.ErrorHandler object at 0x2f4bdd0>
2013-10-20 20:49:06,643 - awscli.errorhandler - DEBUG - HTTP Response Code: 400
2013-10-20 20:49:06,644 - awscli.customizations.s3.tasks - DEBUG - Error during part upload: A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/tasks.py", line 148, in __call__
    self._filename.service, 'UploadPart', params)
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/utils.py", line 115, in operate
    http_response, response_data = operation.call(**kwargs)
  File "/usr/lib/python2.6/site-packages/botocore/operation.py", line 82, in call
    parsed=response[1])
  File "/usr/lib/python2.6/site-packages/botocore/session.py", line 550, in emit
    return self._events.emit(event_name, **kwargs)
  File "/usr/lib/python2.6/site-packages/botocore/hooks.py", line 158, in emit
    response = handler(**kwargs)
  File "/usr/lib/python2.6/site-packages/awscli/errorhandler.py", line 50, in __call__
    raise ClientError(msg)
ClientError: A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
2013-10-20 20:49:06,712 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload failed: ..FILE.tar to s3://FILE.tar\nA client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.', 'error': True}
upload failed: ..FILE.tar to s3://FILE.tar
A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
2013-10-20 20:49:07,128 - botocore.response - DEBUG - Response Body:

2013-10-20 20:49:07,128 - botocore.hooks - DEBUG - Event needs-retry.s3.UploadPart: calling handler <botocore.retryhandler.RetryHandler object at 0x31844d0>
2013-10-20 20:49:07,128 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-20 20:49:07,128 - botocore.hooks - DEBUG - Event after-call.s3.UploadPart: calling handler <awscli.errorhandler.ErrorHandler object at 0x2f4bdd0>
2013-10-20 20:49:07,128 - awscli.errorhandler - DEBUG - HTTP Response Code: 200
2013-10-20 20:49:07,129 - awscli.customizations.s3.tasks - DEBUG - Part number 478 completed for filename: FILE.tar
2013-10-20 20:49:07,149 - awscli.customizations.s3.executer - DEBUG - Error calling task: Upload has been cancelled.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/executer.py", line 104, in run
    function()
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/tasks.py", line 339, in __call__
    parts = self._upload_context.wait_for_parts_to_finish()
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/tasks.py", line 437, in wait_for_parts_to_finish
    raise UploadCancelledError("Upload has been cancelled.")
UploadCancelledError: Upload has been cancelled.
2013-10-20 20:49:07,177 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload: ..FILE.tar to s3://FILE.tar', 'total_parts': 483, 'error': False}
2013-10-20 20:49:08,119 - botocore.response - DEBUG - Response Body:

2013-10-20 20:49:08,119 - botocore.hooks - DEBUG - Event needs-retry.s3.UploadPart: calling handler <botocore.retryhandler.RetryHandler object at 0x31844d0>
2013-10-20 20:49:08,119 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-20 20:49:08,120 - botocore.hooks - DEBUG - Event after-call.s3.UploadPart: calling handler <awscli.errorhandler.ErrorHandler object at 0x2f4bdd0>
2013-10-20 20:49:08,120 - awscli.errorhandler - DEBUG - HTTP Response Code: 200
2013-10-20 20:49:08,120 - awscli.customizations.s3.tasks - DEBUG - Part number 480 completed for filename: FILE.tar
2013-10-20 20:49:08,142 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload: ..FILE.tar to s3://FILE.tar', 'total_parts': 483, 'error': False}

Could you check in the debug logs if there's a traceback that occurs earlier in the logs? Generally, we've seen that the RequestTimeoutError occurs because something earlier in the upload triggered a retry and we weren't properly resetting the IO streams on a retry attempt, but this should be fixed in 1.2.0. I'd like to see what caused the initial retry.

I'm also trying to reproduce this issue on v1.2.0. I'll update with what I find.

For clarification, I was using the --recursive option on a folder which contained two sub-folders with a file in each.
I have just tried the upload again explicitly specifying the .tar file only and it went through fine.

I will retry the folder with the --recursive option to get the debug logs you have requested.

So the request timeout issue is not resolved yet?
I am trying to upload a big file via Rails CarrierWave, and it hits the same RequestTimeout error.
This is a really critical issue for us.

Is there anyone who resolved this issue?

I've largely given up on the AWSCLI over a home connection. The connectivity and retry logic isn't robust enough to make this a viable solution. We timeout, retry, and eventually fail on large upload requests.

seems to work flawlessly on 1.3.12 :+1:

On 1.2.1 I got:

upload failed: ./bigzip.zip to s3://mybucket-test-s3/bigzip.zip
A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
I see a counter like

Completed 170 of 185 part(s) with -169 file(s) remaining

and the (negative) file count decreases further with each error message.
This is an aws s3 cp command run from an EC2 instance in the same region; bigzip.zip is about 1.3 GB in size.

---

I am seeing the same issue with a large file, getting Max retries exceeded with url (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer). There doesn't seem to be a way to …

Same problem here with

aws-cli/1.2.9 Python/3.4.0 Linux/3.13.0-29-generic

Hi, I'm getting this even for 2 KB files through DataPower. However, 1 KB files work fine.

Still appears to be a problem. Brand new Ubuntu VM, installed the AWS CLI tools:
aws-cli/1.10.1 Python/3.5.2 Linux/4.4.0-38-generic botocore/1.3.23

Once more I face the dreaded ConnectionResetError 104!

upload failed: 1.pdf to s3://mybucket/1.pdf ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
upload failed: 1.pdf to s3://mybucket/1.pdf ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
upload failed: 2.pdf to s3://mybucket/2.pdf ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
upload failed: 3.jpg to s3://mybucket/3.jpg ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

This happens when using the AWS UI to upload large files, too.

Hey guys, I am seeing this issue too while trying to do a multipart upload. I thought my chunk size was too large, but I changed it to 5 MB and it still hits the timeout error.

This needs to be reopened.

Same problem. I ended up using good ol' s3cmd:

s3cmd --continue-put put 5GB_file s3://some-bucket

Worked like a charm. Who says s3cmd is dead? :neckbeard: 📁 ➡️ 🌍 ➡️ 📤

Seeing this issue with:

> aws s3 --version
> aws-cli/1.11.44 Python/2.7.3 Linux/3.7.10-1.40-desktop botocore/1.5.7

20 GB file. Retrying usually works, but still…

Also seeing this with

aws s3 --version
aws-cli/1.11.56 Python/2.7.12+ Linux/4.6.0-kali1-amd64 botocore/1.5.9

Using both the S3 Accelerate endpoint and the non-accelerated endpoint gave the same 'connection aborted' error as described above, even after trying 10-12 times with a 3 GB file (us-west-2).

Interestingly/anecdotally: removing the --sse AES256 option from the 's3 cp' command allowed the copy to complete on the very next attempt.

This suggests to me that there is a (regression?) issue with the way the server-side confirmations of part completions are handled, which are likely slower when SSE is enabled. Looks like the client may be timing out too quickly?
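
If SSE really does slow the per-part acknowledgements, one hedged workaround is to keep --sse AES256 but give the client a longer read timeout. A sketch via boto3, with placeholder names; this follows the speculation above and is not confirmed CLI behaviour:

import boto3
from botocore.config import Config

# Longer read timeout (the default is 60 s), in case SSE slows the server's
# per-part acknowledgements. Bucket and file names are placeholders.
s3 = boto3.client("s3", config=Config(read_timeout=300))
s3.upload_file(
    "bigzip.zip",
    "my-bucket",
    "bigzip.zip",
    ExtraArgs={"ServerSideEncryption": "AES256"},  # same effect as --sse AES256
)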

This issue isn't specific to large files. I was just testing something and tried to upload a copy of the WordPress source files. After about 200 files, it starts giving me ConnectionResetError(104, 'Connection reset by peer') for files that are 300 KB in size and even smaller.

aws s3 --version
aws-cli/1.11.13 Python/3.5.2 Linux/4.4.0-1031-aws botocore/1.4.70

Just a small update: it seems to have been fixed in some later version (and even runs a lot faster). I installed the latest version using pip install --upgrade awscli, so my version was as follows:

aws s3 --version
aws-cli/1.11.138 Python/3.5.2 Linux/4.4.0-1031-aws botocore/1.6.5

Re-ran the command and it successfully copied all the files (and was a lot faster than the default version that ships with Ubuntu 16.04 LTS).

Thank you for the update.

This hasn't been fixed yet (running latest aws-cli). I did get it to work by setting --cli-read-timeout to 0.

aws --cli-read-timeout 0 s3 cp s3://file .

File successfully cp/mv after doing so.
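
For scripts hitting the same timeout through boto3 instead of the CLI, the rough equivalent of --cli-read-timeout is botocore's read_timeout. A sketch with placeholder bucket/key names; the CLI treats 0 as "never time out on reads", while here the 60-second default is simply raised:

import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    config=Config(
        read_timeout=600,              # default is 60 seconds
        retries={"max_attempts": 10},  # retry more before giving up
    ),
)
s3.download_file("my-bucket", "bigfile.bin", "bigfile.bin")  # placeholder names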

Experiencing the same issue with a 49 GB file; --cli-read-timeout 0 didn't help here.

aws s3 --version
aws-cli/1.14.44 Python/3.6.9 Linux/4.19.0-0.bpo.6-amd64 botocore/1.8.48

I wonder why the protocol cannot fall back to polling instead of relying on a socket connection being open for a long time while the chunks are being assembled on the server?
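
To illustrate that polling idea (a hypothetical sketch, not anything the CLI actually does): if CompleteMultipartUpload times out client-side, the assembly can still finish server-side, so a client could poll HeadObject to learn the outcome:

import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def assembled(bucket, key, attempts=12, delay=10):
    """Poll HeadObject until the assembled object appears, or give up."""
    for _ in range(attempts):
        try:
            s3.head_object(Bucket=bucket, Key=key)
            return True
        except ClientError as err:
            if err.response["Error"]["Code"] not in ("404", "NoSuchKey"):
                raise  # a real error, not just "not there yet"
        time.sleep(delay)
    return False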
