Aws-cli: s3 ls/cp unicode problem

Created on 1 Feb 2019  ·  8Comments  ·  Source: aws/aws-cli

for s3 ls
display is fine, but after redirect/pipe every Chinese character becomes "?" question mark.
Reproduce:
$ aws s3 ls s3://bucketname
PRE 健身/
PRE 手工制作水晶石头等/
PRE 菜谱/
$ aws s3 ls s3://bucketname > x.txt
$ cat x.txt
PRE ??/
PRE ?????????/
PRE ??/

FAILED: aws-cli/1.16.96 Python/2.7.15rc1 Linux/4.15.0-1031-aws botocore/1.12.86
SUCCESS: aws-cli/1.14.44 Python/3.6.7 Linux/4.15.0-1031-aws botocore/1.8.48 (ubuntu 18.04 LTS apt install)
SUCCESS: aws-cli/1.15.71 Python/3.5.2 Linux/4.15.0-1031-aws botocore/1.10.70 (ubuntu 18.04 LTS snap install)
files generated with version 1.14.44 and 1.15.71 are correct.

for s3 cp
one version works fine and another fails.
FAILED: version: aws-cli/1.15.71 Python/3.5.2 Linux/4.15.0-1031-aws botocore/1.10.70 (ubuntu 18.04 LTS snap install)
$ aws s3 cp s3://buck1/成都/006.JPG s3://buck2/成都/006.JPG
fatal error: 'utf-8' codec can't encode character '\udce6' in position 11: surrogates not allowed

SUCCESS: version aws-cli/1.14.44 Python/3.6.7 Linux/4.15.0-1031-aws botocore/1.8.48 (ubuntu 18.04 LTS apt install)
$ aws s3 cp s3://buck1/成都/006.JPG s3://buck2/成都/006.JPG
copy: s3://buck1/成都/006.JPG to s3://buck2/成都/006.JPG
$ echo $?
0

bug

Most helpful comment

Related PR https://github.com/aws/aws-cli/pull/1844.

I was able to get UTF-8 characters instead of ????? by executing:

export PYTHONIOENCODING=utf-8

before using aws s3 ls s3://bucketname | othertool.

All 8 comments

@dawnfantasy - Thank you for reaching out and reporting this behavior. I am investigating this problem. I was not able to reproduce the same results you are seeing when I tested:

$ aws s3 ls s3://bucketname > x.txt
$ cat x.txt

I digging a little deeper into this issue and will update you with any new findings. Any chance you can provide the debug logs using the --debug option? It would help us understand what is happening.

@dawnfantasy - Thank you for your feedback and the debug logs. I will investigate this issue and reply back as soon as I have more information.

@dawnfantasy - Are you using EC2 Linux instances? If so, did you use custom AMIs or Amazon AMIs? Do you have the Amazon AMI IDs?

ec2 instance, with Amazon AMI.
AMI ID: ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20180912 (ami-0ac019f4fcb7cb7e6)

@justnance Just in case you missed my last reply.

Related PR https://github.com/aws/aws-cli/pull/1844.

I was able to get UTF-8 characters instead of ????? by executing:

export PYTHONIOENCODING=utf-8

before using aws s3 ls s3://bucketname | othertool.

Looks like this should be fixed via PR #1844.

Was this page helpful?
0 / 5 - 0 ratings