Aws-cli: aws dynamodb update-item adding unnecessary base64 encoding to binary attributes

Created on 14 Jan 2015 · 13 comments · Source: aws/aws-cli

It seems to me that dynamodb update-item is adding an additional and unnecessary layer of base64 encoding to binary attribute values.

I'm using a table with a string hash key called key (very original) and a string range key called version_part.

Let's follow this example. I start by adding data that contains a binary field with the UTF-8 string "test!":

aws --no-paginate --output json --region '<redacted>' dynamodb update-item --table-name '<redacted>' --return-values 'UPDATED_OLD' --key 'file:///<redacted>.tmp' --attribute-updates 'file:///<redacted>.tmp'

These are the contents of the key file:

{  
   "key":{  
      "S":"test"
   },
   "version_part":{  
      "S":"index"
   }
}

These are the contents of the attribute-updates file:

{  
   "version":{  
      "Action":"ADD",
      "Value":{  
         "N":"1"
      }
   },
   "timestamp":{  
      "Action":"PUT",
      "Value":{  
         "S":"20150114 17:19:09"
      }
   },
   "parts":{  
      "Action":"PUT",
      "Value":{  
         "N":"0"
      }
   },
   "data":{  
      "Action":"PUT",
      "Value":{  
         "B":"dGVzdCE="
      }
   },
   "compress":{  
      "Action":"PUT",
      "Value":{  
         "S":"none"
      }
   }
}

Notice the value of the data attribute: it is the base64 representation of the string test!, as confirmed by the command-line tool base64:

| => echo dGVzdCE= | base64 -D
test!
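The same check can be done with Python's standard library, which is what the CLI uses under the hood:

```python
import base64

# Encoding the raw bytes of "test!" yields exactly the value supplied
# in the attribute-updates file.
encoded = base64.b64encode(b"test!").decode("ascii")
print(encoded)  # dGVzdCE=
```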

So the operation succeeds, and I proceed to read the string back.

aws --no-paginate --output json --region '<redacted>' dynamodb batch-get-item --request-items 'file:///<redacted>.tmp'

This is what the request-items file contains:

{  
   "<redacted>":{  
      "ConsistentRead":false,
      "Keys":[  
         {  
            "key":{  
               "S":"test"
            },
            "version_part":{  
               "S":"index"
            }
         }
      ]
   }
}

This is what the JSON output is:

{  
   "UnprocessedKeys":{  

   },
   "Responses":{  
      "<redacted>":[  
         {  
            "version":{  
               "N":"2"
            },
            "timestamp":{  
               "S":"20150114 17:19:09"
            },
            "compress":{  
               "S":"none"
            },
            "version_part":{  
               "S":"index"
            },
            "parts":{  
               "N":"0"
            },
            "key":{  
               "S":"test"
            },
            "data":{  
               "B":"ZEdWemRDRT0="
            }
         }
      ]
   }
}

Notice how the value of data does not match what was written, which was the result of base64encode('test!'). In fact, it is the equivalent of base64encode(base64encode("test!")), as this simple test confirms:

| => echo ZEdWemRDRT0= | base64 -D
dGVzdCE=
| => echo ZEdWemRDRT0= | base64 -D | base64 -D
test!
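The same double-decoding round trip, shown in Python:

```python
import base64

# One layer of decoding recovers what was actually written...
once = base64.b64decode("ZEdWemRDRT0=")
print(once)   # b'dGVzdCE='

# ...and a second layer recovers the original string.
twice = base64.b64decode(once)
print(twice)  # b'test!'
```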

So it seems that the update-item operation incorrectly assumes that a binary attribute value is not yet base64-encoded and applies an additional layer of base64 encoding. In fact, binary values _must_ already be base64-encoded, or it wouldn't be possible to represent them properly in a JSON format in the first place.

All of this was tested on Mac OS X Mavericks:

| => aws --version
aws-cli/1.7.0 Python/2.7.8 Darwin/14.0.0

All 13 comments

Thanks for the detailed writeup. I can see what's going on here. This is due to our generic processing for binary/blob types. Just to give more background here, whenever something is modeled as a blob type, we automatically base64 encode the input value. This is a general construct. It applies to any input for any service/operation that has binary input.

However on output, we don't automatically base64 decode values. This is because we don't want to write arbitrary binary data to stdout.

In the case of top level parameters where you can input binary content directly (via fileb://), this seems reasonable. However you raise a good point when the binary type is nested such that the input required is JSON. When this happens, it's not actually possible to enter binary content via JSON, so we need to handle this case.

I'll discuss this with the team and update when we have something to propose. Thanks for bringing this to our attention.
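The behavior described above can be illustrated with a simplified sketch (this is not the actual aws-cli code): a generic serializer that base64-encodes every blob-typed input will inevitably double-encode a value that the JSON input format already requires to be base64:

```python
import base64

def serialize_blob(value: bytes) -> str:
    # Simplified stand-in for the CLI's generic blob handling:
    # every blob-typed input is base64-encoded before being sent.
    return base64.b64encode(value).decode("ascii")

# The nested JSON input format forces the user to supply base64 text
# already, so the generic step encodes it a second time.
user_json_value = "dGVzdCE="  # base64 of b"test!"
sent = serialize_blob(user_json_value.encode("ascii"))
print(sent)  # ZEdWemRDRT0=
```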

Thanks for the update, and also for providing such a great tool in the first place. :)

Look forward to seeing this fixed, and let me know if there's anything else I can do to help.

Any news on this issue?

I believe this issue should be assigned a higher priority than it currently has, since using this CLI to interact with DynamoDB on EC2 instances is one of its most fundamental features.

We are suffering from this exact same issue here, too. Note that this base64 issue does not manifest only in update-item but in every operation, including put-item, delete-item, etc.

@asieira, using the term "gentlemen" to address developers is presumptuous, sexist, and unappreciated.

By default, when you call put_item on DynamoDB with the binary data type, boto3 will do the base64 encoding for you. So I think there are two options:
1: Skip the manual base64 step and let boto3/DynamoDB do the work. Below is my code sample.

import boto3

client = boto3.client('dynamodb', region_name='ap-southeast-2')

# Note: b'test!' is the original raw bytes, not an already-encoded string.
data = b'test!'
item = {
    'testbug': {
        'S': 'testbug'
    },
    'testbytes': {
        'B': data
    }
}

response = client.put_item(TableName='testbug', Item=item)
print(response)

2: Use data type "S" (String), and AWS won't apply an extra base64 encoding to your data.

This behaviour is really weird.

For example, this one-liner for copying all items from one DynamoDB table to another will fail because of that extra base64 encoding on binary data.

aws dynamodb scan --table-name from_table | jq -rc '.Items[]' | tr '\n' '\0' | xargs -0i aws dynamodb put-item --table-name to_table --item '{}'

There should be a way of disabling that when data is coming from json.
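Until a flag exists, one possible workaround is to strip one layer of base64 from each scanned item before feeding it back to put-item. This is a sketch assuming every B value in the scan output carries exactly one extra base64 layer; the helper name and sample attributes are hypothetical:

```python
import base64
import json

def strip_extra_base64(item):
    """Remove one layer of base64 from every binary ('B') attribute value."""
    fixed = {}
    for name, typed_value in item.items():
        if 'B' in typed_value:
            decoded = base64.b64decode(typed_value['B']).decode('ascii')
            fixed[name] = {'B': decoded}
        else:
            fixed[name] = typed_value
    return fixed

# Example: an item as returned by `aws dynamodb scan`, double-encoded.
scanned = {'key': {'S': 'test'}, 'data': {'B': 'ZEdWemRDRT0='}}
print(json.dumps(strip_extra_base64(scanned), sort_keys=True))
# {"data": {"B": "dGVzdCE="}, "key": {"S": "test"}}
```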

Is there any plan to address this?

I have just hit this issue as well. Can any of the devs on here please comment on whether or not this is going to be addressed?

This is still an issue. Please give me the option to not re-encode data I queried directly from DynamoDB

I am affected by this too. Is there a fix/workaround?

Could you please add some flag to the command to disable the default encoding behaviour when necessary?

The only workaround I found is to use boto3. If you have the AWS CLI, then you already have Python along with boto3 installed. Then you can use a one-liner like python -c 'from boto3 import client; c=client(\"dynamod\"....'.

It's worth noting that this is not a problem with the 2.x AWS CLI.
