Is there a way to delete these objects while avoiding any errors while using batch delete?
/\x10 in file name.
s3.Bucket.objectsCollection
objs = bucket.objects.filter(Prefix="path/to/includes_escape")
for obj in objs.all():
    print(repr(obj.key))
-->
'path/to/includes_escape/\x10\x10\x10\x10\x10\x10\x10foo.jpg'
'path/to/includes_escape/\x10\x10\x10\x10\x10\x10\x10bar.jpg'
objs.delete()
--> ClientError: An error occurred (MalformedXML) when calling the DeleteObjects operation: The XML you provided was not well-formed or did not validate against our published schema
(edit)
debug trace
boto3.resources.collection [DEBUG] Calling paginated s3:list_objects with {'Bucket': 'sawadev-s3object-filter', 'Prefix': 'path/to/includes_escape'}
boto3.resources.factory [DEBUG] Loading s3:ObjectSummary
boto3.resources.model [DEBUG] Renaming ObjectSummary attribute key
boto3.resources.action [DEBUG] Calling s3:delete_objects with {
'Bucket': 'my-example-bucket',
'Delete': {'Objects': [
{'Key': 'path/to/includes_escape/\x10\x10\x10\x10\x10\x10\x10foo.jpg'},
{'Key': 'path/to/includes_escape/\x10\x10\x10\x10\x10\x10\x10bar.jpg'}
]}}
Maybe we should replace control characters with character references?
e.g. \x10 => &#92;&#120;&#49;&#48;
We probably need to apply this substitution to the action parameters.
example:
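import re

unsafe = '\x00\x01\x07\x10foo.jpg'
# Matches the C0 control characters (0x00 to 0x1F).
reg = re.compile('[\x00-\x1F]')

def to_ref(string):
    # Turn each character of repr(string) into a numeric character reference,
    # then drop the references for the surrounding quote characters.
    s = []
    for x in repr(string):
        s.append('&#' + str(ord(x)) + ';')
    del s[-1]
    del s[0]
    return ''.join(s)

# Unique control characters found in the key.
mg = reg.findall(unsafe)
mg = list(set(mg))
print(repr(unsafe))
for x in mg:
    ref = to_ref(x)
    unsafe = unsafe.replace(x, ref)
    print(repr(unsafe))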
#-->
'\x00\x01\x07\x10foo.jpg'
'\x00\x01\x07\x10foo.jpg'
'\x00\x01\x07\x10foo.jpg'
'\x00\x01\x07\x10foo.jpg'
'\x00\x01\x07\x10foo.jpg'
@sawanoboly - Thank you for your post. I am not able to reproduce the problem. When I use your code inside a Lambda function, I do not get the same file name as you. Instead I receive this file:
"test2005/includes_escape/\\x10\\x10\\x10\\x10\\x10\\x10\\x10bar.jpg"
When I execute your code with Python 3.7, I get the same file name as yours. But in both cases the delete() method successfully deletes the file without any error.
Please provide me with the full debug log.
@swetashre Thank you for your response.
But when you created the objects, it looks like the keys were created in a safe way. So what you have is not a control character, it's an escaped backslash.
We have prepared a sample file for reproducing the problem and an example of how to create the S3 objects:
https://s3.amazonaws.com/download.getshifter.io/temp/wapuu.zip
$ wget https://s3.amazonaws.com/download.getshifter.io/temp/wapuu.zip
$ unzip -l wapuu.zip
Archive:  wapuu.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    31022  06-14-2019 10:39   wapuu/^P^P^Pwapuu_escape.png   ###<<< here is filename which includes control character
    31022  06-14-2019 10:39   wapuu/wapuu.png
---------                     -------
    62044                     2 files
Next, extract the zip file using a utility that does not remove control characters from file names (e.g. The Unarchiver.app). The CLI unzip command removes control characters from file names.
After confirming that control characters remain in the file name, upload the file using aws-cli.
$ ls -1 wapuu/
???wapuu_escape.png
wapuu.png
$ aws s3 sync wapuu s3://your_test_bucket/wapuu
upload: wapuu/wapuu_escape.png to s3://your_test_bucket/wapuu/wapuu_escape.png
upload: wapuu/wapuu.png to s3://your_test_bucket/wapuu/wapuu.png
You can use this procedure to prepare for reproduction.
Objects are created with the following key:
After that, you should be able to confirm the problem with the reproduction procedure presented at the beginning.
@sawanoboly - Thank you for providing the reproduction steps. With those steps I am now able to reproduce the issue. I also got this error when I tried objs.delete(), because S3 currently doesn't support those characters in a batch delete. That's why you are getting the MalformedXML error from the service.
But I am able to delete those keys individually without any error. Instead of calling objs.delete(), you can try deleting those files one at a time.
Hope it helps and let me know if you have any questions.
I often delete thousands to tens of thousands of objects.
As a result of this problem, if even one key among the objects matching the filter is in this state, none of the objects can be deleted. This is my biggest problem.
It would be easy to handle if the failure were reported under 'Errors' in the batch response, but in fact an exception is raised, so no object is deleted.
In that case (deleting thousands to tens of thousands of objects), I think the time taken by the following two approaches will differ greatly. Is that correct?
# batch delete
objs.delete()
# delete each object
for obj in objs.all():
    obj.delete()
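As a rough way to frame the comparison above, the sketch below counts the requests each approach would issue. It assumes a batch delete request carries at most 1000 keys; `estimated_requests` and `MAX_KEYS_PER_BATCH` are hypothetical names used only for illustration, and actual elapsed time also depends on latency and throttling, which are not modeled here.

```python
import math

# Assumption: one batch delete request carries at most 1000 keys.
MAX_KEYS_PER_BATCH = 1000

def estimated_requests(num_objects, batch_size=MAX_KEYS_PER_BATCH):
    """Return (batch_requests, single_requests) needed to delete num_objects keys."""
    batch_requests = math.ceil(num_objects / batch_size)
    single_requests = num_objects  # one request per object
    return batch_requests, single_requests

for n in (1000, 25000):
    batch, single = estimated_requests(n)
    print(f"{n} objects: {batch} batch request(s) vs {single} single request(s)")
```

Under that assumption, the per-object loop issues roughly a thousand times as many requests as the batch call.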
If my understanding is wrong, and the two approaches take similar time and are subject to similar API call limits, I will choose the 'each object' approach.
However, if the 'each object' approach significantly increases the required time or runs into API limits, I will use the following workaround.
from botocore.exceptions import ClientError

try:
    objs.delete()
except ClientError:
    # Fall back to deleting objects one at a time.
    for obj in objs.all():
        obj.delete()
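A possible refinement of this workaround is to apply the fallback per chunk, so that one bad key only forces single deletes for the chunk it is in. This is a minimal sketch, not something confirmed in this thread: `chunked`, `delete_with_fallback`, and the injected `delete_batch` / `delete_one` callables are hypothetical names, and the chunk size of 1000 is an assumption.

```python
def chunked(keys, size=1000):
    """Yield successive lists of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]

def delete_with_fallback(keys, delete_batch, delete_one,
                         batch_error=Exception, size=1000):
    """Delete keys chunk by chunk.

    If delete_batch raises batch_error for a chunk, only that chunk is
    retried key by key with delete_one. Returns the number of chunks
    that needed the fallback.
    """
    fallbacks = 0
    for chunk in chunked(keys, size):
        try:
            delete_batch(chunk)
        except batch_error:
            fallbacks += 1
            for key in chunk:
                delete_one(key)
    return fallbacks

# Possible wiring with boto3 (assumption, not executed here):
#   from botocore.exceptions import ClientError
#   delete_batch = lambda chunk: bucket.delete_objects(
#       Delete={'Objects': [{'Key': k} for k in chunk]})
#   delete_one = lambda key: bucket.Object(key).delete()
#   delete_with_fallback(keys, delete_batch, delete_one, batch_error=ClientError)
```

Passing the delete operations in as callables keeps the chunking logic testable without touching a real bucket.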
Alternatively, another option is to check all keys for control characters in advance and exclude those keys before the batch delete.
Either way, it would be nice if objs.delete() could someday run safely, given the high processing cost of the alternatives.
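The pre-check idea mentioned above could be sketched as follows. It reuses the control-character range from the earlier example in this thread ([\x00-\x1F]); `split_keys` and the `plain.jpg` key are hypothetical and only for illustration. Whether every character in that range actually triggers the MalformedXML error is an assumption, so the check is deliberately conservative.

```python
import re

# Same range as the earlier example in this thread: C0 control characters.
CONTROL_CHARS = re.compile(r'[\x00-\x1f]')

def split_keys(keys):
    """Split keys into (safe, unsafe); unsafe keys contain a control character."""
    safe, unsafe = [], []
    for key in keys:
        if CONTROL_CHARS.search(key):
            unsafe.append(key)
        else:
            safe.append(key)
    return safe, unsafe

keys = [
    'path/to/includes_escape/\x10\x10\x10\x10\x10\x10\x10foo.jpg',
    'path/to/includes_escape/plain.jpg',  # hypothetical key without control characters
]
safe, unsafe = split_keys(keys)
print('batch delete candidates:', safe)
print('delete one at a time:', [repr(k) for k in unsafe])
```

The safe keys could then go to the batch delete, while the unsafe keys are deleted individually, which was reported to work earlier in the thread.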
@sawanoboly - Thank you for sharing your feedback. Yes, you are right: this can be used as a workaround until the service team fixes the issue.
try:
    objs.delete()
except ClientError:
    for obj in objs.all():
        obj.delete()
This will be a feature request for the service team. I will let the service team know about the issue.
Please let me know if you have any questions.
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
I am still seeing this issue. Has there been any resolution to this?