Aws-cli: Provide an option to perform Unicode normalization on local file names

Created on 14 Nov 2015 · 13 Comments · Source: aws/aws-cli

Summary

aws s3 sync doesn't play well with HFS+ Unicode normalization on OS X. I suggest adding an option that normalizes file names read from the local filesystem to normal form C (NFC) before doing anything with them.

Reproduction steps

  1. Create a file on S3 whose name contains an accented character. For reasons that will become apparent later, do this on a Linux system.

    (linux) % echo test > test/café.txt
    (linux) % aws s3 sync test s3://<test-bucket>/test
    
  2. Synchronize that file on a Mac.

    (OS X) % aws s3 sync s3://<test-bucket>/test test
    download: s3://<test-bucket>/test/café.txt to test/café.txt
    
  3. Synchronize it back to S3.

    (OS X) % aws s3 sync test s3://<test-bucket>/test
    upload: test/café.txt to s3://<test-bucket>/test/café.txt
    
  4. Expected result: no upload because the file is identical locally and on S3: it was just sync'd!
  5. Actual result: the file is uploaded again.

At this point the file shows up twice in S3!

Why this happens

Unicode defines several normal forms, among them NFC (composed) and NFD (decomposed). The two differ for some characters, typically accented characters, which are common in Western European languages and even occur in English.

The documentation of unicodedata.normalize, the Python function that converts between the two forms, has a good explanation.

A quick illustration:

>>> import unicodedata
>>> "café".encode('utf-8')
b'caf\xc3\xa9'
>>> unicodedata.normalize('NFC', "café").encode('utf-8')
b'caf\xc3\xa9'
>>> unicodedata.normalize('NFD', "café").encode('utf-8')
b'cafe\xcc\x81'

The default filesystem of OS X, HFS+, enforces something that resembles NFD. (Let's say I haven't encountered the difference yet.)

Pretty much everything else, including typing on a keyboard on Linux or OS X, uses NFC. I'm not sure about Windows.
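
A minimal sketch of why the sync then sees two different names, assuming the S3 key was created in NFC on Linux and the local name comes back from the filesystem in NFD. The variable names are made up for illustration and none of this is aws-cli code:

```python
import unicodedata

# Key as stored on S3 after the upload from Linux (NFC, precomposed é).
s3_key = "caf\u00e9.txt"
# Name as a filesystem that stores NFD would report it (e + combining accent).
local_name = unicodedata.normalize("NFD", s3_key)

# The two strings render identically but are different code point sequences,
# so a plain string comparison treats them as two distinct files.
names_differ = s3_key != local_name
lengths = (len(s3_key), len(local_name))

# Normalizing the local name to NFC before comparing makes them match again.
match_after_nfc = unicodedata.normalize("NFC", local_name) == s3_key
```

Here the local name is one code point longer than the key, which is enough for the comparison to fail.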

Of course this is entirely HFS+'s fault, but since OS X is a popular system among your target audience, I hope you may have some interest in providing a solution to this problem.

What you can do about it

I think a --normalize-unicode option (possibly with a better name) for aws s3 sync would be useful. It would normalize file names read from the local filesystem with unicodedata.normalize('NFKC', filepath).
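
Note that NFKC goes further than NFC: besides composing accents, it also replaces compatibility characters such as ligatures with their plain equivalents. A small sketch of the difference, using file names invented for illustration:

```python
import unicodedata

decomposed = "cafe\u0301.txt"  # "café.txt" in NFD
ligature = "\ufb01le.txt"      # starts with the single character U+FB01 (fi ligature)

# For a plain accented name, NFC and NFKC give the same result.
nfc_accent = unicodedata.normalize("NFC", decomposed)
nfkc_accent = unicodedata.normalize("NFKC", decomposed)

# For a compatibility character, only NFKC rewrites the name.
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
```

With NFKC, a local file whose name starts with the ligature and another named file.txt would map to the same key, which may or may not be what you want.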

Its primary purpose would be to interact with S3 on OS X and have file names in NFC form on S3, which is what the rest of the world expects and will cause the least amount of problems.

I don't know aws cli well enough to tell which other parts could use this option. I just encountered the problem when trying to replace "rsync to file server" with "aws s3 sync to S3".

FWIW rsync provides a solution to this problem with the --iconv option. A common idiom is --iconv=UTF8-MAC,UTF8 when rsync'ing from OS X to Linux and --iconv=UTF8,UTF8-MAC when rsync'ing from Linux to OS X. UTF8-MAC is what rsync calls the encoding of file names on HFS+.

However this isn't a good API to tackle the specific problem I'm raising here. This API is about the encoding of file names. The bug is related to Unicode normalization. These are different concepts. UTF8-MAC mixes them.
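
To make the distinction concrete, here is a short sketch in plain Python (nothing rsync- or aws-cli-specific): an encoding maps the same code points to different bytes, while normalization changes the code points themselves within one encoding.

```python
import unicodedata

nfc = "caf\u00e9"
nfd = unicodedata.normalize("NFD", nfc)

# Encoding: same code points, different bytes depending on the codec.
utf8_bytes = nfc.encode("utf-8")
latin1_bytes = nfc.encode("latin-1")
same_text = utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1")

# Normalization: different code points, both perfectly valid UTF-8.
nfc_points = [hex(ord(ch)) for ch in nfc]
nfd_points = [hex(ord(ch)) for ch in nfd]
both_valid_utf8 = nfd.encode("utf-8").decode("utf-8") == nfd
```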

Thanks!

s3 unicode

All 13 comments

For what it's worth, the following patch solves my problem:

diff -r -u -x '*.pyc' -w awscli.orig/customizations/s3/filegenerator.py awscli/customizations/s3/filegenerator.py
--- awscli.orig/customizations/s3/filegenerator.py  2015-11-15 08:50:45.000000000 +0100
+++ awscli/customizations/s3/filegenerator.py   2015-11-15 18:56:31.000000000 +0100
@@ -13,6 +13,7 @@
 import os
 import sys
 import stat
+import unicodedata

 from dateutil.parser import parse
 from dateutil.tz import tzlocal
@@ -116,11 +117,12 @@
     ``FileInfo`` objects to send to a ``Comparator`` or ``S3Handler``.
     """
     def __init__(self, client, operation_name, follow_symlinks=True,
-                 page_size=None, result_queue=None):
+                 page_size=None, normalize_unicode=False, result_queue=None):
         self._client = client
         self.operation_name = operation_name
         self.follow_symlinks = follow_symlinks
         self.page_size = page_size
+        self.normalize_unicode = normalize_unicode
         self.result_queue = result_queue
         if not result_queue:
             self.result_queue = queue.Queue()
@@ -167,6 +169,8 @@
         """
         join, isdir, isfile = os.path.join, os.path.isdir, os.path.isfile
         error, listdir = os.error, os.listdir
+        if self.normalize_unicode:
+            path = unicodedata.normalize('NFKC', path)
         if not self.should_ignore_file(path):
             if not dir_op:
                 size, last_update = get_file_stat(path)
@@ -185,6 +189,8 @@
                 listdir_names = listdir(path)
                 names = []
                 for name in listdir_names:
+                    if self.normalize_unicode:
+                        name = unicodedata.normalize('NFKC', name)
                     if not self.should_ignore_file_with_decoding_warnings(
                             path, name):
                         file_path = join(path, name)
diff -r -u -x '*.pyc' -w awscli.orig/customizations/s3/s3handler.py awscli/customizations/s3/s3handler.py
--- awscli.orig/customizations/s3/s3handler.py  2015-11-15 08:50:45.000000000 +0100
+++ awscli/customizations/s3/s3handler.py   2015-11-15 09:25:54.000000000 +0100
@@ -64,7 +64,8 @@
                        'grants': None, 'only_show_errors': False,
                        'is_stream': False, 'paths_type': None,
                        'expected_size': None, 'metadata_directive': None,
-                       'ignore_glacier_warnings': False}
+                       'ignore_glacier_warnings': False,
+                       'normalize_unicode': False}
         self.params['region'] = params['region']
         for key in self.params.keys():
             if key in params:
diff -r -u -x '*.pyc' -w awscli.orig/customizations/s3/subcommands.py awscli/customizations/s3/subcommands.py
--- awscli.orig/customizations/s3/subcommands.py    2015-11-15 08:50:45.000000000 +0100
+++ awscli/customizations/s3/subcommands.py 2015-11-15 18:18:23.000000000 +0100
@@ -301,12 +301,21 @@
 }


+NORMALIZE_UNICODE = {
+    'name': 'normalize-unicode', 'action': 'store_true',
+    'help_text': (
+        'Normalizes file names read from the local filesystem in unicode '
+        'normal form KC. This is mainly useful when running on OS X.'
+    )
+}
+
+
 TRANSFER_ARGS = [DRYRUN, QUIET, INCLUDE, EXCLUDE, ACL,
                  FOLLOW_SYMLINKS, NO_FOLLOW_SYMLINKS, NO_GUESS_MIME_TYPE,
                  SSE, STORAGE_CLASS, GRANTS, WEBSITE_REDIRECT, CONTENT_TYPE,
                  CACHE_CONTROL, CONTENT_DISPOSITION, CONTENT_ENCODING,
                  CONTENT_LANGUAGE, EXPIRES, SOURCE_REGION, ONLY_SHOW_ERRORS,
-                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS]
+                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS, NORMALIZE_UNICODE]


 def get_client(session, region, endpoint_url, verify):
@@ -770,10 +779,12 @@
                                        operation_name,
                                        self.parameters['follow_symlinks'],
                                        self.parameters['page_size'],
+                                       self.parameters['normalize_unicode'],
                                        result_queue=result_queue)
         rev_generator = FileGenerator(self._client, '',
                                       self.parameters['follow_symlinks'],
                                       self.parameters['page_size'],
+                                      self.parameters['normalize_unicode'],
                                       result_queue=result_queue)
         taskinfo = [TaskInfo(src=files['src']['path'],
                              src_type='s3',

I'm not submitting it as a PR because it's missing at least tests and documentation. I'm mostly leaving it here in case others find it helpful.

Of course, feel free to use it as a starting point for fixing this issue if my approach doesn't seem too off base.

EDIT: just updated the patch to apply unicode normalization before sorting file names.

Wow, nice work! We'll look into it

I created a branch and opened a pull request in order to make it easier to maintain the patch -- the recent release broke it.

Here's a new version of the patch, recreated against the latest release.

In case someone else uses it:

  • I plan to maintain it for the foreseeable future because I need it. I'll post occasional updates here. Changes are extremely limited and should be easy to port to future versions.
  • While the initial response was positive, it's unclear whether AWS plans to fix this bug. Unfortunately, in my experience, American companies tend not to care much about Unicode, even if they do business internationally, so I'm not getting my hopes too high.
  • For this reason, I suggest sticking to ASCII file names on S3 rather than using this if it isn't too late for you. (It is too late for me.)

diff -r -u -x '*.pyc' -w awscli.orig/customizations/s3/filegenerator.py awscli/customizations/s3/filegenerator.py
--- awscli.orig/customizations/s3/filegenerator.py  2015-11-15 08:50:45.000000000 +0100
+++ awscli/customizations/s3/filegenerator.py   2015-11-15 18:56:31.000000000 +0100
@@ -13,6 +13,7 @@
 import os
 import sys
 import stat
+import unicodedata

 from dateutil.parser import parse
 from dateutil.tz import tzlocal
@@ -116,11 +117,12 @@
     ``FileInfo`` objects to send to a ``Comparator`` or ``S3Handler``.
     """
     def __init__(self, client, operation_name, follow_symlinks=True,
-                 page_size=None, result_queue=None):
+                 page_size=None, normalize_unicode=False, result_queue=None):
         self._client = client
         self.operation_name = operation_name
         self.follow_symlinks = follow_symlinks
         self.page_size = page_size
+        self.normalize_unicode = normalize_unicode
         self.result_queue = result_queue
         if not result_queue:
             self.result_queue = queue.Queue()
@@ -167,6 +169,8 @@
         """
         join, isdir, isfile = os.path.join, os.path.isdir, os.path.isfile
         error, listdir = os.error, os.listdir
+        if self.normalize_unicode:
+            path = unicodedata.normalize('NFKC', path)
         if not self.should_ignore_file(path):
             if not dir_op:
                 size, last_update = get_file_stat(path)
@@ -185,6 +189,8 @@
                 listdir_names = listdir(path)
                 names = []
                 for name in listdir_names:
+                    if self.normalize_unicode:
+                        name = unicodedata.normalize('NFKC', name)
                     if not self.should_ignore_file_with_decoding_warnings(
                             path, name):
                         file_path = join(path, name)
diff -r -u -x '*.pyc' -w awscli.orig/customizations/s3/s3handler.py awscli/customizations/s3/s3handler.py
--- awscli.orig/customizations/s3/s3handler.py  2015-11-15 08:50:45.000000000 +0100
+++ awscli/customizations/s3/s3handler.py   2015-11-15 09:25:54.000000000 +0100
@@ -64,7 +64,8 @@
                        'grants': None, 'only_show_errors': False,
                        'is_stream': False, 'paths_type': None,
                        'expected_size': None, 'metadata_directive': None,
-                       'ignore_glacier_warnings': False}
+                       'ignore_glacier_warnings': False,
+                       'normalize_unicode': False}
         self.params['region'] = params['region']
         for key in self.params.keys():
             if key in params:
diff -r -u -x '*.pyc' -w awscli.orig/customizations/s3/subcommands.py awscli/customizations/s3/subcommands.py
--- awscli.orig/customizations/s3/subcommands.py    2015-11-15 08:50:45.000000000 +0100
+++ awscli/customizations/s3/subcommands.py 2015-11-15 18:18:23.000000000 +0100
@@ -301,12 +301,21 @@
 }


+NORMALIZE_UNICODE = {
+    'name': 'normalize-unicode', 'action': 'store_true',
+    'help_text': (
+        'Normalizes file names read from the local filesystem in unicode '
+        'normal form KC. This is mainly useful when running on OS X.'
+    )
+}
+
+
 TRANSFER_ARGS = [DRYRUN, QUIET, INCLUDE, EXCLUDE, ACL,
                  FOLLOW_SYMLINKS, NO_FOLLOW_SYMLINKS, NO_GUESS_MIME_TYPE,
                  SSE, STORAGE_CLASS, GRANTS, WEBSITE_REDIRECT, CONTENT_TYPE,
                  CACHE_CONTROL, CONTENT_DISPOSITION, CONTENT_ENCODING,
                  CONTENT_LANGUAGE, EXPIRES, SOURCE_REGION, ONLY_SHOW_ERRORS,
-                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS]
+                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS, NORMALIZE_UNICODE]


 def get_client(session, region, endpoint_url, verify):
@@ -770,10 +779,12 @@
                                        operation_name,
                                        self.parameters['follow_symlinks'],
                                        self.parameters['page_size'],
+                                       self.parameters['normalize_unicode'],
                                        result_queue=result_queue)
         rev_generator = FileGenerator(self._client, '',
                                       self.parameters['follow_symlinks'],
                                       self.parameters['page_size'],
+                                      self.parameters['normalize_unicode'],
                                       result_queue=result_queue)
         taskinfo = [TaskInfo(src=files['src']['path'],
                              src_type='s3',

Thanks for the excellent analysis Aymeric, this is exactly the issue I'm experiencing and it was difficult to track down.

I hope somebody from AWS can help us here.

Updated version of the patch against the latest release.

commit 78640c7f7a345fb3740b72c239007470a5709caf
Author: Aymeric Augustin
Date:   Tue Dec 20 23:05:49 2016 +0100

    Add an option to normalize file names.

diff --git a/awscli/customizations/s3/filegenerator.py b/awscli/customizations/s3/filegenerator.py
index d33b77f..13a7f1d 100644
--- a/awscli/customizations/s3/filegenerator.py
+++ b/awscli/customizations/s3/filegenerator.py
@@ -13,6 +13,7 @@
 import os
 import sys
 import stat
+import unicodedata

 from dateutil.parser import parse
 from dateutil.tz import tzlocal
@@ -116,7 +117,8 @@ class FileGenerator(object):
     ``FileInfo`` objects to send to a ``Comparator`` or ``S3Handler``.
     """
     def __init__(self, client, operation_name, follow_symlinks=True,
-                 page_size=None, result_queue=None, request_parameters=None):
+                 page_size=None, result_queue=None, request_parameters=None,
+                 normalize_unicode=False):
         self._client = client
         self.operation_name = operation_name
         self.follow_symlinks = follow_symlinks
@@ -127,6 +129,7 @@ class FileGenerator(object):
         self.request_parameters = {}
         if request_parameters is not None:
             self.request_parameters = request_parameters
+        self.normalize_unicode = normalize_unicode

     def call(self, files):
         """
@@ -170,6 +173,8 @@ class FileGenerator(object):
         """
         join, isdir, isfile = os.path.join, os.path.isdir, os.path.isfile
         error, listdir = os.error, os.listdir
+        if self.normalize_unicode:
+            path = unicodedata.normalize('NFKC', path)
         if not self.should_ignore_file(path):
             if not dir_op:
                 size, last_update = get_file_stat(path)
@@ -189,6 +194,8 @@ class FileGenerator(object):
                 listdir_names = listdir(path)
                 names = []
                 for name in listdir_names:
+                    if self.normalize_unicode:
+                        name = unicodedata.normalize('NFKC', name)
                     if not self.should_ignore_file_with_decoding_warnings(
                             path, name):
                         file_path = join(path, name)
diff --git a/awscli/customizations/s3/subcommands.py b/awscli/customizations/s3/subcommands.py
index 4bc7398..04afe3f 100644
--- a/awscli/customizations/s3/subcommands.py
+++ b/awscli/customizations/s3/subcommands.py
@@ -417,6 +417,14 @@ REQUEST_PAYER = {
     )
 }

+NORMALIZE_UNICODE = {
+    'name': 'normalize-unicode', 'action': 'store_true',
+    'help_text': (
+        'Normalizes file names read from the local filesystem in unicode '
+        'normal form KC. This is mainly useful when running on OS X.'
+    )
+}
+
 TRANSFER_ARGS = [DRYRUN, QUIET, INCLUDE, EXCLUDE, ACL,
                  FOLLOW_SYMLINKS, NO_FOLLOW_SYMLINKS, NO_GUESS_MIME_TYPE,
                  SSE, SSE_C, SSE_C_KEY, SSE_KMS_KEY_ID, SSE_C_COPY_SOURCE,
@@ -424,7 +432,8 @@ TRANSFER_ARGS = [DRYRUN, QUIET, INCLUDE, EXCLUDE, ACL,
                  WEBSITE_REDIRECT, CONTENT_TYPE, CACHE_CONTROL,
                  CONTENT_DISPOSITION, CONTENT_ENCODING, CONTENT_LANGUAGE,
                  EXPIRES, SOURCE_REGION, ONLY_SHOW_ERRORS,
-                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS, FORCE_GLACIER_TRANSFER]
+                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS, FORCE_GLACIER_TRANSFER,
+                 NORMALIZE_UNICODE]


 def get_client(session, region, endpoint_url, verify, config=None):
@@ -963,12 +972,14 @@ class CommandArchitecture(object):
             'client': self._source_client, 'operation_name': operation_name,
             'follow_symlinks': self.parameters['follow_symlinks'],
             'page_size': self.parameters['page_size'],
+            'normalize_unicode': self.parameters['normalize_unicode'],
             'result_queue': result_queue,
         }
         rgen_kwargs = {
             'client': self._client, 'operation_name': '',
             'follow_symlinks': self.parameters['follow_symlinks'],
             'page_size': self.parameters['page_size'],
+            'normalize_unicode': self.parameters['normalize_unicode'],
             'result_queue': result_queue,
         }

This patch is perfect for me, thanks. 👍

Patch rebased on top of develop.

commit c5466f2191b073303edef62d531761591e7e6c90
Author: Aymeric Augustin <[email protected]>
Date:   Tue Dec 20 23:05:49 2016 +0100

    Add an option to normalize file names.

diff --git a/awscli/customizations/s3/filegenerator.py b/awscli/customizations/s3/filegenerator.py
index f24ca187..70a17581 100644
--- a/awscli/customizations/s3/filegenerator.py
+++ b/awscli/customizations/s3/filegenerator.py
@@ -13,6 +13,7 @@
 import os
 import sys
 import stat
+import unicodedata

 from dateutil.parser import parse
 from dateutil.tz import tzlocal
@@ -116,7 +117,8 @@ class FileGenerator(object):
     ``FileInfo`` objects to send to a ``Comparator`` or ``S3Handler``.
     """
     def __init__(self, client, operation_name, follow_symlinks=True,
-                 page_size=None, result_queue=None, request_parameters=None):
+                 page_size=None, result_queue=None, request_parameters=None,
+                 normalize_unicode=False):
         self._client = client
         self.operation_name = operation_name
         self.follow_symlinks = follow_symlinks
@@ -127,6 +129,7 @@ class FileGenerator(object):
         self.request_parameters = {}
         if request_parameters is not None:
             self.request_parameters = request_parameters
+        self.normalize_unicode = normalize_unicode

     def call(self, files):
         """
@@ -170,6 +173,8 @@ class FileGenerator(object):
         """
         join, isdir, isfile = os.path.join, os.path.isdir, os.path.isfile
         error, listdir = os.error, os.listdir
+        if self.normalize_unicode:
+            path = unicodedata.normalize('NFKC', path)
         if not self.should_ignore_file(path):
             if not dir_op:
                 stats = self._safely_get_file_stats(path)
@@ -188,6 +193,8 @@ class FileGenerator(object):
                 listdir_names = listdir(path)
                 names = []
                 for name in listdir_names:
+                    if self.normalize_unicode:
+                        name = unicodedata.normalize('NFKC', name)
                     if not self.should_ignore_file_with_decoding_warnings(
                             path, name):
                         file_path = join(path, name)
diff --git a/awscli/customizations/s3/subcommands.py b/awscli/customizations/s3/subcommands.py
index 02d591ea..b9b1d6c9 100644
--- a/awscli/customizations/s3/subcommands.py
+++ b/awscli/customizations/s3/subcommands.py
@@ -418,6 +418,14 @@ REQUEST_PAYER = {
     )
 }

+NORMALIZE_UNICODE = {
+    'name': 'normalize-unicode', 'action': 'store_true',
+    'help_text': (
+        'Normalizes file names read from the local filesystem in unicode '
+        'normal form KC. This is mainly useful when running on OS X.'
+    )
+}
+
 TRANSFER_ARGS = [DRYRUN, QUIET, INCLUDE, EXCLUDE, ACL,
                  FOLLOW_SYMLINKS, NO_FOLLOW_SYMLINKS, NO_GUESS_MIME_TYPE,
                  SSE, SSE_C, SSE_C_KEY, SSE_KMS_KEY_ID, SSE_C_COPY_SOURCE,
@@ -425,7 +433,8 @@ TRANSFER_ARGS = [DRYRUN, QUIET, INCLUDE, EXCLUDE, ACL,
                  WEBSITE_REDIRECT, CONTENT_TYPE, CACHE_CONTROL,
                  CONTENT_DISPOSITION, CONTENT_ENCODING, CONTENT_LANGUAGE,
                  EXPIRES, SOURCE_REGION, ONLY_SHOW_ERRORS,
-                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS, FORCE_GLACIER_TRANSFER]
+                 PAGE_SIZE, IGNORE_GLACIER_WARNINGS, FORCE_GLACIER_TRANSFER,
+                 NORMALIZE_UNICODE]


 def get_client(session, region, endpoint_url, verify, config=None):
@@ -964,12 +973,14 @@ class CommandArchitecture(object):
             'client': self._source_client, 'operation_name': operation_name,
             'follow_symlinks': self.parameters['follow_symlinks'],
             'page_size': self.parameters['page_size'],
+            'normalize_unicode': self.parameters['normalize_unicode'],
             'result_queue': result_queue,
         }
         rgen_kwargs = {
             'client': self._client, 'operation_name': '',
             'follow_symlinks': self.parameters['follow_symlinks'],
             'page_size': self.parameters['page_size'],
+            'normalize_unicode': self.parameters['normalize_unicode'],
             'result_queue': result_queue,
         }

I had a bit of free time this morning so I took a look at this. It doesn't look like this will work since we will need to operate on those files down the line and having the altered path will break that. I think the changes necessary to fully support this feature would need to be more invasive.
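
One way to avoid that breakage, sketched here under assumptions and not taken from aws-cli, would be to keep each name exactly as the filesystem returned it for local I/O and carry the normalized form alongside it for building and comparing S3 keys. The function name `list_with_normalized_keys` and its `form` parameter are hypothetical:

```python
import os
import tempfile
import unicodedata


def list_with_normalized_keys(directory, form="NFKC"):
    """Return (name_on_disk, normalized_key) pairs for entries in directory.

    name_on_disk is left untouched for later filesystem calls; only the
    second element is normalized and meant for comparison with S3 keys.
    """
    pairs = []
    for name in os.listdir(directory):
        pairs.append((name, unicodedata.normalize(form, name)))
    # Sort on the normalized key so ordering does not depend on the on-disk form.
    return sorted(pairs, key=lambda pair: pair[1])


with tempfile.TemporaryDirectory() as tmp:
    # Create a file whose name is in NFD, similar to what HFS+ would store.
    nfd_name = unicodedata.normalize("NFD", "caf\u00e9.txt")
    with open(os.path.join(tmp, nfd_name), "w") as handle:
        handle.write("test\n")

    pairs = list_with_normalized_keys(tmp)
    on_disk, key = pairs[0]
    # The original name still opens the file; the key is what gets compared.
    with open(os.path.join(tmp, on_disk)) as handle:
        content = handle.read()
```

Whether this fits the existing FileGenerator design is an open question; the sketch only shows that the two roles of a path can be separated.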

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - Search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168379-provide-an-option-to-perfom-unicode-normalization

This message was created automatically by mail delivery software.

A message that you sent could not be delivered to one or more of its
recipients. This is a temporary error. The following address(es) deferred:

[email protected]
Domain salmanwaheed.info has exceeded the max emails per hour (163/150 (108%)) allowed. Message will be reattempted later

------- This is a copy of the message, including all the headers. ------
Received: from github-smtp2-ext1.iad.github.net ([192.30.252.192]:34761 helo=github-smtp2a-ext-cp1-prd.iad.github.net)
by box1177.bluehost.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256)
(Exim 4.89_1)
(envelope-from noreply@github.com)
id 1ej0Pc-001aoJ-Eq
for [email protected]; Tue, 06 Feb 2018 03:23:40 -0700
Date: Tue, 06 Feb 2018 02:23:29 -0800
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=github.com;
s=pf2014; t=1517912609;
bh=s25/ZHjWhyhYV9V97C8YTJNZ5BORhSs5xPzdklFZIKk=;
h=From:Reply-To:To:Cc:In-Reply-To:References:Subject:List-ID:
List-Archive:List-Post:List-Unsubscribe:From;
b=Z5vLfuztlKa3gUlFxh+rQiu6Swt+G7hinUV/cSIOkbzYfAWamnhD0ULyBqsv52peJ
stwTFQoWt4in2Tf4AhG9ZXAivaotPW0i81bIOZjiXnFd8vfgaVj0s3bxRpwx4Tj/6r
FuFEFp5+1eaUj88/4+viBqt+X152syrZ3YEkGWjo=
From: Andre Sayre notifications@github.com
Reply-To: aws/aws-cli reply@reply.github.com
To: aws/aws-cli aws-cli@noreply.github.com
Cc: Subscribed subscribed@noreply.github.com
Message-ID:
In-Reply-To:
References:
Subject: Re: [aws/aws-cli] Provide an option to perfom unicode normalization
on local file names (#1639)
Mime-Version: 1.0
Content-Type: multipart/alternative;
boundary="--==_mimepart_5a798221e024c_1fee2ad9784d8ed44348e1";
charset=UTF-8
Content-Transfer-Encoding: 7bit
Precedence: list
X-GitHub-Sender: ASayre
X-GitHub-Recipient: salmanwaheed
X-GitHub-Reason: subscribed
List-ID: aws/aws-cli
List-Archive: https://github.com/aws/aws-cli
List-Post: reply@reply.github.com
List-Unsubscribe: ,
https://github.com/notifications/unsubscribe/AO8bOM9ETFXf7BbCu4Gt-bci8Pk4jmUHks5tSCghgaJpZM4Gibvq
X-Auto-Response-Suppress: All
X-GitHub-Recipient-Address: [email protected]
X-Spam-Status: No, score=0.5
X-Spam-Score: 5
X-Spam-Bar: /
X-Ham-Report: Spam detection software, running on the system "box1177.bluehost.com",
has NOT identified this incoming email as spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
root\@localhost for details.

Content preview: Closed #1639. -- You are receiving this because you are subscribed
to this thread. Reply to this email directly or view it on GitHub: https://github.com/aws/aws-cli/issues/1639#event-1459789997
Closed #1639. [...]

Content analysis details: (0.5 points, 5.0 required)

pts rule name description


0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked.
See
http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
for more information.
[URIs: github.com]
-0.5 SPF_PASS SPF: sender matches SPF record
0.0 HTML_MESSAGE BODY: HTML included in message
0.7 HTML_IMAGE_ONLY_20 BODY: HTML: images with 1600-2000 bytes of words
-0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
2.5 DCC_CHECK No description available.
-0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's
domain
0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid
-2.1 AWL AWL: Adjusted score from AWL reputation of From: address
X-Spam-Flag: NO

----==_mimepart_5a798221e024c_1fee2ad9784d8ed44348e1
Content-Type: text/plain;
charset=UTF-8
Content-Transfer-Encoding: 7bit

Closed #1639.


This message was created automatically by mail delivery software.

A message that you sent could not be delivered to one or more of its
recipients. This is a temporary error. The following address(es) deferred:

[email protected]
Domain salmanwaheed.info has exceeded the max emails per hour (162/150 (108%)) allowed. Message will be reattempted later

------- This is a copy of the message, including all the headers. ------
------ The body of the message is 6170 characters long; only the first
------ 5000 or so are included here.
Date: Tue, 06 Feb 2018 02:23:28 -0800
From: Andre Sayre notifications@github.com
Subject: Re: [aws/aws-cli] Provide an option to perfom unicode normalization on local file names (#1639)

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - Search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team
