Aws-cli: Unable to load file:// with non-English char in UTF-8 encoding on Windows

Created on 27 Mar 2020  ·  14 Comments  ·  Source: aws/aws-cli

OS: Win10
CLI: aws-cli/2.0.5 Python/3.7.5 Windows/10 botocore/2.0.0dev9
Not sure if this is relevant, since AWS CLI v2 uses its own copy of Python, but I have Python 3.7.6 installed as well.

I got the following error when calling aws dynamodb batch-write-item --request-items file://xxx.json from a Windows .bat file, where the JSON file contains non-English characters.

Error parsing parameter '--request-items': Unable to load paramfile (xxx.json), text contents could not be decoded.  If this is a binary file, please use the fileb:// prefix instead of the file:// prefix.

Adding --debug shows the following

2020-03-27 12:23:08,369 - MainThread - awscli.clidriver - DEBUG - Client side parameter validation failed
Traceback (most recent call last):
  File "lib\site-packages\awscli\paramfile.py", line 86, in get_file
UnicodeDecodeError: 'cp950' codec can't decode byte 0x96 in position 218: illegal multibyte sequence

I tried adding the following to the batch session, but none of them helped:

CHCP 65001
set PYTHONUTF8=1
set PYTHONIOENCODING=UTF-8

Obviously I should NOT modify the Python script to add encoding='utf8' to the parameters of open(), and I would rather avoid modifying Windows system variables or the registry.

Any help would be much appreciated, thanks.

bug

All 14 comments

Hi @Zeeeeta ,
can you tell me which character is causing the error so I can try and reproduce it?

Hi @KaibaLopez
Basically any Chinese characters (whether Traditional or Simplified); it could be as simple as 中文.
I have Traditional Chinese Win10, which I guess causes Python to use cp950 by default.

Hi @Zeeeeta ,
I'm having some trouble reproducing this. It looks like some characters are accepted and others are not (中文.json, for example, works for me). Let me test this and dig around a bit, and I'll update you as soon as I have an answer.

Hi @KaibaLopez
It's not the file name but the data inside the JSON.
The file name is plain English.

Hi @Zeeeeta,

I took a quick look at the issue and you're right about the Python encoding. Windows defaults to Microsoft's proprietary Code Page 950 (cp950) on a system set up with Traditional Chinese. We use the system's locale information to determine how we interpret the bytes in files, which is where the underlying issue comes from and why @KaibaLopez is unable to reproduce it.
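To illustrate the point about locale-dependent decoding, here is a minimal Python sketch. It is not the CLI's code; the helper name try_decode and the SAMPLES table are made up for this example. It decodes the UTF-8 bytes of two sample strings with a few code pages to show why the same file can load, fail, or load as the wrong characters depending on the encoding used.

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# UTF-8 bytes, as a text editor would typically save them in the JSON file.
# try_decode and SAMPLES are hypothetical names used only for this sketch.
SAMPLES = {
    "中文": "中文".encode("utf-8"),  # bytes: e4 b8 ad e6 96 87
    "经理": "经理".encode("utf-8"),  # bytes: e7 bb 8f e7 90 86
}

if __name__ == "__main__":
    for text, raw in SAMPLES.items():
        for encoding in ("utf-8", "cp1252", "cp950"):
            print(text, encoding, repr(try_decode(raw, encoding)))
```

Byte 0x8f is undefined in Python's cp1252 codec, so the bytes of 经理 cannot be decoded with it. Every byte of 中文 happens to be defined in cp1252, so that string decodes without an error but yields the wrong characters.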

Based on PEP 540, I'd expect set PYTHONUTF8=1 to resolve this on Python 3.7. Would you be able to run a couple of quick tests to help pinpoint the issue?

  • Could you run this command with PYTHONUTF8=1 set: python -c "import locale; print(locale.getpreferredencoding())"
  • Could you run your aws command with --debug and share the contents?
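For a slightly fuller picture than the one-liner, the check could be sketched as a small script. The function name text_decoding_defaults is made up for this example, and the script reports on whichever Python interpreter runs it, which is not necessarily the copy bundled with CLI v2.

```python
import locale
import sys


def text_decoding_defaults() -> dict:
    """Collect the settings that influence how open() decodes text by default."""
    return {
        # Locale encoding that open() falls back to when no encoding= is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # 1 when UTF-8 mode is on (python -X utf8 or PYTHONUTF8=1), otherwise 0.
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding used to convert file names, shown for comparison.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in text_decoding_defaults().items():
        print(f"{name}: {value}")
```

Running it once before and once after set PYTHONUTF8=1 should show whether the variable reaches that interpreter.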

Hi @nateprewitt

I tried to test this script with 3 different inputs in the json data:

@ECHO off

ECHO Printing Python locale...
CALL python -c "import locale; print(locale.getpreferredencoding())"
ECHO.

ECHO Setting PYTHONUTF8=1...
SET PYTHONUTF8=1
ECHO.

ECHO Printing Python locale...
CALL python -c "import locale; print(locale.getpreferredencoding())"
ECHO.

ECHO Batch-writing to DynamoDB...
CALL aws dynamodb batch-write-item --request-items file://test.json --debug

PAUSE

Results for different inputs:
"中文" (basic Chinese characters that are the same in Traditional and Simplified Chinese)
-> successfully processed, but the text in DynamoDB is unreadable

"经理" (Simplified Chinese) and "經理" (Traditional Chinese)
-> both failed with the debug log below

Printing Python locale...
cp1252

Setting PYTHONUTF8=1...

Printing Python locale...
UTF-8

Batch-writing to DynamoDB...
2020-04-20 12:28:39,935 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/2.0.3 Python/3.7.5 Windows/10 botocore/2.0.0dev7
2020-04-20 12:28:39,935 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['dynamodb', 'batch-write-item', '--request-items', 'file://test.json', '--debug']
2020-04-20 12:28:39,935 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_timestamp_parser at 0x000002A1FA8E43A8>
2020-04-20 12:28:39,935 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function register_uri_param_handler at 0x000002A1FA0CF8B8>
2020-04-20 12:28:39,935 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_binary_formatter at 0x000002A1FB12EB88>
2020-04-20 12:28:39,935 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x000002A1FA0FD3A8>
2020-04-20 12:28:39,935 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function attach_history_handler at 0x000002A1FA79ECA8>
2020-04-20 12:28:39,935 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_json_file_cache at 0x000002A1FA748A68>
2020-04-20 12:28:39,950 - MainThread - botocore.loaders - DEBUG - Loading JSON file: C:\Program Files\Amazon\AWSCLIV2\botocore\data\dynamodb\2012-08-10\service-2.json
2020-04-20 12:28:39,950 - MainThread - botocore.hooks - DEBUG - Event building-command-table.dynamodb: calling handler <function _add_wizard_command at 0x000002A1FB1255E8>
2020-04-20 12:28:39,950 - MainThread - botocore.hooks - DEBUG - Event building-command-table.dynamodb: calling handler <function add_waiters at 0x000002A1FA8EFDC8>
2020-04-20 12:28:39,966 - MainThread - botocore.loaders - DEBUG - Loading JSON file: C:\Program Files\Amazon\AWSCLIV2\botocore\data\dynamodb\2012-08-10\waiters-2.json
2020-04-20 12:28:39,966 - MainThread - awscli.clidriver - DEBUG - OrderedDict([('request-items', <awscli.arguments.CLIArgument object at 0x000002A1FB2DD188>), ('return-consumed-capacity', <awscli.arguments.CLIArgument object at 0x000002A1FB2D7088>), ('return-item-collection-metrics', <awscli.arguments.CLIArgument object at 0x000002A1FB2DD1C8>)])
2020-04-20 12:28:39,966 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.batch-write-item: calling handler <function add_streaming_output_arg at 0x000002A1FA8E6558>
2020-04-20 12:28:39,966 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.batch-write-item: calling handler <function add_cli_input_json at 0x000002A1FA101DC8>
2020-04-20 12:28:39,966 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.batch-write-item: calling handler <function add_cli_input_yaml at 0x000002A1FA101A68>
2020-04-20 12:28:39,966 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.batch-write-item: calling handler <function unify_paging_params at 0x000002A1FA75D1F8>
2020-04-20 12:28:39,988 - MainThread - botocore.loaders - DEBUG - Loading JSON file: C:\Program Files\Amazon\AWSCLIV2\botocore\data\dynamodb\2012-08-10\paginators-1.json
2020-04-20 12:28:39,988 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.batch-write-item: calling handler <function add_generate_skeleton at 0x000002A1FA8479D8>
2020-04-20 12:28:39,988 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.batch-write-item: calling handler <function add_auto_prompt at 0x000002A1FB12AD38>
2020-04-20 12:28:39,988 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.dynamodb.batch-write-item: calling handler <bound method OverrideRequiredArgsArgument.override_required_args of <awscli.customizations.cliinput.CliInputJSONArgument object at 0x000002A1FB2DD248>>
2020-04-20 12:28:39,988 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.dynamodb.batch-write-item: calling handler <bound method OverrideRequiredArgsArgument.override_required_args of <awscli.customizations.cliinput.CliInputYAMLArgument object at 0x000002A1FB2DD288>>
2020-04-20 12:28:39,988 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.dynamodb.batch-write-item: calling handler <bound method GenerateCliSkeletonArgument.override_required_args of <awscli.customizations.generatecliskeleton.GenerateCliSkeletonArgument object at 0x000002A1FB2D7C08>>
2020-04-20 12:28:39,988 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.dynamodb.batch-write-item: calling handler <bound method AutoPromptArgument.override_required_args of <awscli.customizations.autoprompt.AutoPromptArgument object at 0x000002A1FB2E9E08>>
2020-04-20 12:28:39,988 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.batch-write-item.request-items: calling handler <awscli.paramfile.URIArgumentHandler object at 0x000002A1FA638248>
2020-04-20 12:28:39,988 - MainThread - awscli.clidriver - DEBUG - Client side parameter validation failed
Traceback (most recent call last):
  File "site-packages\awscli\paramfile.py", line 86, in get_file
  File "c:\codebuild\tmp\output\src050522015\src\repos\awscli\.tox\exe\lib\encodings\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 448: character maps to <undefined>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "site-packages\awscli\paramfile.py", line 45, in __call__
  File "site-packages\awscli\paramfile.py", line 78, in get_paramfile
  File "site-packages\awscli\paramfile.py", line 91, in get_file
awscli.paramfile.ResourceLoadingError: Unable to load paramfile (test.json), text contents could not be decoded.  If this is a binary file, please use the fileb:// prefix instead of the file:// prefix.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "site-packages\awscli\clidriver.py", line 335, in main
  File "site-packages\awscli\clidriver.py", line 507, in __call__
  File "site-packages\awscli\clidriver.py", line 656, in __call__
  File "site-packages\awscli\clidriver.py", line 711, in _build_call_parameters
  File "site-packages\awscli\clidriver.py", line 723, in _unpack_arg
  File "site-packages\awscli\argprocess.py", line 81, in unpack_argument
  File "site-packages\botocore\session.py", line 664, in emit_first_non_none_response
  File "site-packages\botocore\hooks.py", line 227, in emit
  File "site-packages\botocore\hooks.py", line 210, in _emit
  File "site-packages\awscli\paramfile.py", line 47, in __call__
awscli.argprocess.ParamError: Error parsing parameter '--request-items': Unable to load paramfile (test.json), text contents could not be decoded.  If this is a binary file, please use the fileb:// prefix instead of the file:// prefix.

Error parsing parameter '--request-items': Unable to load paramfile (test.json), text contents could not be decoded.  If this is a binary file, please use the fileb:// prefix instead of the file:// prefix.
Press any key to continue . . .

Note that I am on a different machine, so the default code page is different from before. SET PYTHONUTF8=1 changed Python's locale encoding to UTF-8, but the CLI still seems to be trying to use cp1252. Maybe this is due to CLI v2 using embedded Python which isn't affected by the environment variable PYTHONUTF8?

@Zeeeeta Replying to this comment/question:

maybe this is due to CLI v2 using embedded Python which isn't affected by the environment variable PYTHONUTF8?

Yep that is exactly what the issue is. We are using pyinstaller to freeze Python into our executable and that does not respect the PYTHONUTF8 or PYTHONIOENCODING env vars at runtime. We are working on a fix for this so that you can set some sort of env var to control the encoding and be able to override the system's encoding.
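One possible stopgap, which is not proposed anywhere in this thread and has not been checked against the CLI here, is to rewrite the parameter file so that it contains only ASCII, with non-ASCII characters stored as JSON \u escapes. Such a file decodes identically under any ASCII-compatible code page. The function name escape_non_ascii is made up for this sketch.

```python
import json


def escape_non_ascii(json_text: str) -> str:
    """Re-serialize JSON so every non-ASCII character becomes a \\uXXXX escape."""
    # json.dumps uses ensure_ascii=True by default, which produces the escapes.
    return json.dumps(json.loads(json_text), indent=4)


if __name__ == "__main__":
    original = '{"key": {"S": "中文"}}'
    print(escape_non_ascii(original))
```

A JSON parser turns the escapes back into the original characters, so the data itself is unchanged.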

@kyleknap
Thank you very much, any ETA for this change please?

Hi @Zeeeeta,

AWS CLI version 2.0.13 added the environment variable AWS_CLI_FILE_ENCODING. It can be used to set an encoding for text files that differs from the locale's.

You can find more details in the docs.

Could you please try it out and confirm whether it fixes your issue?
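As a rough illustration of what such an override means, here is a minimal Python sketch. It is an assumption about the general pattern, not the CLI's actual implementation, and the function name read_text_file is made up for this example.

```python
import locale
import os


def read_text_file(path, env=None):
    """Read a text file, letting AWS_CLI_FILE_ENCODING override the locale encoding.

    Hypothetical helper for illustration; not taken from the AWS CLI source.
    """
    env = os.environ if env is None else env
    # Fall back to the locale's preferred encoding when the variable is unset or empty.
    encoding = env.get("AWS_CLI_FILE_ENCODING") or locale.getpreferredencoding(False)
    with open(path, encoding=encoding) as handle:
        return handle.read()
```

With the variable set to UTF-8, a UTF-8 file is decoded the same way regardless of the system code page.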

Hi @vz10

I also encountered the same problem.
I upgraded the CLI to 2.0.14 and set the environment variable AWS_CLI_FILE_ENCODING=UTF-8, but the problem was not resolved.

> aws --version
aws-cli/2.0.14 Python/3.7.7 Windows/10 botocore/2.0.0dev18

> set AWS_CLI_FILE_ENCODING=UTF-8
> aws cloudformation package --debug ^
  --template-file app-sam.yaml ^
  --s3-bucket %ARTIFACT_BUCKET% ^
  --output-template-file app-output_sam.yaml

--- snip ----

2020-05-17 22:50:43,831 - MainThread - awscli.clidriver - DEBUG - Exception caught in main()
Traceback (most recent call last):
  File "lib\site-packages\awscli\clidriver.py", line 335, in main
  File "lib\site-packages\awscli\clidriver.py", line 507, in __call__
  File "lib\site-packages\awscli\customizations\commands.py", line 190, in __call__
  File "lib\site-packages\awscli\customizations\cloudformation\package.py", line 150, in _run_main
  File "lib\site-packages\awscli\customizations\cloudformation\package.py", line 165, in _export
  File "lib\site-packages\awscli\customizations\cloudformation\artifact_exporter.py", line 565, in __init__
UnicodeDecodeError: 'cp932' codec can't decode byte 0x86 in position 847: illegal multibyte sequence
2020-05-17 22:50:43,832 - MainThread - awscli.clidriver - DEBUG - Exiting with rc 255

'cp932' codec can't decode byte 0x86 in position 847: illegal multibyte sequence

Are my settings incorrect?

@vz10

{
    "{table-name}": [
        {
            "PutRequest": {
                "Item": {
                    "group": {
                        "S": "test"
                    },
                    "key": {
                        "S": "涓枃"
                    },
                    "value": {
                        "L": [
                            {
                                "S": "涓枃"
                            },
                            {
                                "S": "涓枃"
                            }
                        ]
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "group": {
                        "S": "test"
                    },
                    "key": {
                        "S": "经理"
                    },
                    "value": {
                        "L": [
                            {
                                "S": "经理"
                            },
                            {
                                "S": "经理"
                            }
                        ]
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "group": {
                        "S": "test"
                    },
                    "key": {
                        "S": "經理"
                    },
                    "value": {
                        "L": [
                            {
                                "S": "經理"
                            },
                            {
                                "S": "經理"
                            }
                        ]
                    }
                }
            }
        }
    ]
}

I tried the above .json input with aws-cli/2.0.14, and the items were imported correctly.

Hi @monamu ,

Your settings are correct, but this cloudformation customization ignores the encoding.

I'll mark it as a bug. It'll be fixed in a future release.

Hi @monamu ,

Sorry for the delay.

AWS CLI version 2.0.24 fixes the bug that caused some customizations to ignore the AWS_CLI_FILE_ENCODING environment variable.

Could you please try it out and confirm whether it fixes your issue?

Hi @vz10

It worked fine!
I was able to use a template file with Japanese comments.
Thank you.
