Describe the bug
Using multi-byte characters in the SQL passed to BlockBlobClient#QueryAsync results in an HTTP 400 error.
Expected behavior
The query executes successfully without errors.
Actual behavior (include Exception or Stack Trace)
If I specify a multi-byte character (in my case, Japanese) in the SQL, a 400 error occurs and the query cannot be executed.
Azure.RequestFailedException: 'Service request failed.
Status: 400 (XML specified is not syntactically valid.)
ErrorCode: InvalidXmlDocument
Headers:
Server: Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-error-code: InvalidXmlDocument
x-ms-request-id: 951ecc82-401e-0122-5011-94d975000000
x-ms-version: 2019-12-12
x-ms-client-request-id: 8974ba70-388c-475e-91f2-5b3e949bd205
Date: Sat, 26 Sep 2020 14:31:11 GMT
Content-Length: 229
Content-Type: application/xml
'
To Reproduce
You can reproduce this by running the following code:
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

var blobServiceClient = new BlobServiceClient("CONNECTION_STRING");
var containerClient = blobServiceClient.GetBlobContainerClient("CONTAINER_NAME");
var blobClient = containerClient.GetBlockBlobClient("BLOB_NAME");
var options = new BlobQueryOptions
{
    InputTextConfiguration = new BlobQueryCsvTextOptions
    {
        HasHeaders = false
    }
};
// The Japanese literal in the WHERE clause is what triggers the 400.
var result = await blobClient.QueryAsync("SELECT * FROM BlobStorage WHERE _1 = '東京都'", options);
Environment:
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.
Thank you for your feedback. Tagging and routing to the team best able to assist.
Hi,
Thank you for the code snippet, I was able to reproduce your error.
Something strange is happening with the multi-byte characters: the last few bytes of our request body are getting chopped off, which causes the 400 InvalidXmlDocument error you are seeing.
Fiddler Request Trace
POST http://[redacted].blob.core.windows.net/test-container-fb279800-253a-4693-9ea2-0c751739577b/test-blob-01bc95cf-419a-4bbd-9d5a-7d60e9273f8c?comp=query HTTP/1.1
x-ms-version: 2019-12-12
x-ms-client-request-id: [redacted]
x-ms-return-client-request-id: true
User-Agent: azsdk-net-Storage.Blobs/12.7.0-alpha.20201001.1 (.NET Core 4.6.29220.03; Microsoft Windows 10.0.19041 )
x-ms-date: Thu, 01 Oct 2020 17:24:20 GMT
Authorization: [redacted]
Content-Type: application/xml
Content-Length: 122
Host: [redacted].blob.core.windows.net
<QueryRequest><QueryType>SQL</QueryType><Expression>SELECT * FROM BlobStorage WHERE _1 = '東京都'</Expression></QueryRe
We will look into why this is happening and update the issue accordingly. Thank you for bringing this issue to our attention.
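For anyone following along, the truncation in the trace can be checked without touching the SDK. The sketch below is only an illustration: the class and method names (TruncationCheck, TruncateToCharCount) are hypothetical, and it assumes the body is UTF-8 encoded and cut at the UTF-16 character count. It compares the character count of the full request body with its UTF-8 byte count, then keeps only as many bytes as there are characters.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Azure.Storage.Blobs.
public static class TruncationCheck
{
    // The full request body the SDK intends to send for the repro query.
    public const string Body =
        "<QueryRequest><QueryType>SQL</QueryType><Expression>" +
        "SELECT * FROM BlobStorage WHERE _1 = '東京都'" +
        "</Expression></QueryRequest>";

    // Simulates a sender that writes only text.Length bytes of the UTF-8 body.
    // Note: if the cut falls inside a multi-byte character, the decoded
    // string ends with a replacement character instead of clean text.
    public static string TruncateToCharCount(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        int sent = Math.Min(text.Length, utf8.Length);
        return Encoding.UTF8.GetString(utf8, 0, sent);
    }
}
```

Under these assumptions, TruncationCheck.Body.Length is 122 (the Content-Length in the trace), Encoding.UTF8.GetByteCount(TruncationCheck.Body) is 128 because each of the three kanji takes 3 bytes in UTF-8, and TruncateToCharCount(TruncationCheck.Body) ends with "</QueryRe", matching the truncated body above.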
I've tested it too, and the problem seems to be in how Content-Length is handled. It doesn't take into account that the string may contain multi-byte characters.
If I set Content-Length from Encoding.UTF8.GetByteCount instead, it works correctly. (The change below is inefficient and only for testing purposes.)
_request.Headers.SetValue("Content-Type", "application/xml");
- _request.Headers.SetValue("Content-Length", _text.Length.ToString(System.Globalization.CultureInfo.InvariantCulture));
+ _request.Headers.SetValue("Content-Length", System.Text.Encoding.UTF8.GetByteCount(_text).ToString(System.Globalization.CultureInfo.InvariantCulture));
_request.Content = Azure.Core.RequestContent.Create(System.Text.Encoding.UTF8.GetBytes(_text));
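The test change above walks the string twice, once for GetByteCount and once for GetBytes. A possibly cheaper variant is to encode once and take the length from the resulting byte array. The following is a minimal sketch of that idea, not the SDK's actual fix; EncodedBody and its members are hypothetical names.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of Azure.Core.
public sealed class EncodedBody
{
    public byte[] Bytes { get; }

    // Content-Length must be the number of bytes on the wire,
    // not the number of UTF-16 characters in the source string.
    public string ContentLength { get; }

    public EncodedBody(string text)
    {
        // Encode once; reuse the same array for the header value and the payload.
        Bytes = Encoding.UTF8.GetBytes(text);
        ContentLength = Bytes.Length.ToString(CultureInfo.InvariantCulture);
    }
}
```

With a helper like this, the header value and the content passed to RequestContent.Create would both come from the same byte array, so they cannot disagree. For pure ASCII text the result is the same as using _text.Length, which would explain why the bug only shows up with multi-byte characters.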