Azure-sdk-for-net: [BUG] Can't use multi-byte characters in BlockBlobClient#QueryAsync

Created on 26 Sep 2020  ·  4 Comments  ·  Source: Azure/azure-sdk-for-net

Describe the bug

Using multi-byte characters in the SQL passed to BlockBlobClient#QueryAsync results in an HTTP 400 error.

Expected behavior

Query can be executed successfully without errors.

Actual behavior (include Exception or Stack Trace)

If I specify a multi-byte character (in my case, Japanese) in the SQL, a 400 error occurs and the query cannot be executed.

Azure.RequestFailedException: 'Service request failed.
Status: 400 (XML specified is not syntactically valid.)
ErrorCode: InvalidXmlDocument

Headers:
Server: Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-error-code: InvalidXmlDocument
x-ms-request-id: 951ecc82-401e-0122-5011-94d975000000
x-ms-version: 2019-12-12
x-ms-client-request-id: 8974ba70-388c-475e-91f2-5b3e949bd205
Date: Sat, 26 Sep 2020 14:31:11 GMT
Content-Length: 229
Content-Type: application/xml
'

To Reproduce

You can reproduce this by running the following code:

var blobServiceClient = new BlobServiceClient("CONNECTION_STRING");
var containerClient = blobServiceClient.GetBlobContainerClient("CONTAINER_NAME");
var blobClient = containerClient.GetBlockBlobClient("BLOB_NAME");

var options = new BlobQueryOptions
{
    InputTextConfiguration = new BlobQueryCsvTextOptions
    {
        HasHeaders = false
    }
};

var result = await blobClient.QueryAsync("SELECT * FROM BlobStorage WHERE _1 = '東京都'", options);

Environment:

  • Name and version of the Library package used: Azure.Storage.Blobs 12.6.0
  • Hosting platform or OS and .NET runtime version: .NET Core SDK 3.1.402 / .NET Core Runtime 3.1.8
  • IDE and version: Visual Studio 16.7.4

Labels: Client, Service Attention, Storage, bug, customer-reported, needs-team-attention

All 4 comments

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

Thank you for your feedback. Tagging and routing to the team best able to assist.

Hi,
Thank you for the code snippet; I was able to reproduce your error.

Something strange is happening with the multi-byte characters: the tail end of our request body is getting cut off, causing the 400 InvalidXmlDocument error you are seeing.

Fiddler Request Trace

POST http://[redacted].blob.core.windows.net/test-container-fb279800-253a-4693-9ea2-0c751739577b/test-blob-01bc95cf-419a-4bbd-9d5a-7d60e9273f8c?comp=query HTTP/1.1
x-ms-version: 2019-12-12
x-ms-client-request-id: [redacted]
x-ms-return-client-request-id: true
User-Agent: azsdk-net-Storage.Blobs/12.7.0-alpha.20201001.1 (.NET Core 4.6.29220.03; Microsoft Windows 10.0.19041 )
x-ms-date: Thu, 01 Oct 2020 17:24:20 GMT
Authorization: [redacted]
Content-Type: application/xml
Content-Length: 122
Host: [redacted].blob.core.windows.net

<QueryRequest><QueryType>SQL</QueryType><Expression>SELECT * FROM BlobStorage WHERE _1 = '東京都'</Expression></QueryRe

We will look into why this is happening and update the issue accordingly. Thank you for bringing this issue to our attention.
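
The truncation in the trace is consistent with the body being cut at the byte offset given by Content-Length. The following is a minimal sketch, not SDK code: it assumes the intended body is the traced XML with its closing tag completed (`</QueryRequest>`), and that the body is sent as UTF-8. Under those assumptions the string has 122 characters but 128 UTF-8 bytes, and keeping only the first 122 bytes reproduces the traced body ending in `</QueryRe`.

```csharp
using System;
using System.Text;

static class TruncationDemo
{
    // Assumed full body: the traced XML with the closing tag completed.
    public const string Body =
        "<QueryRequest><QueryType>SQL</QueryType><Expression>" +
        "SELECT * FROM BlobStorage WHERE _1 = '東京都'" +
        "</Expression></QueryRequest>";

    // Returns what survives when only `length` bytes of the UTF-8 body are sent.
    public static string TruncateUtf8(string text, int length)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes, 0, Math.Min(length, bytes.Length));
    }

    public static void Main()
    {
        int charCount = Body.Length;                       // UTF-16 code units: 122
        int byteCount = Encoding.UTF8.GetByteCount(Body);  // UTF-8 bytes: 128

        Console.WriteLine($"chars = {charCount}, bytes = {byteCount}");

        // Each of the three kanji takes 3 bytes in UTF-8 but counts as 1 char,
        // so a length of 122 drops the last 6 bytes ("quest>").
        Console.WriteLine(TruncateUtf8(Body, charCount));
    }
}
```

The 6-byte gap matches the 6 characters missing from the closing tag in the Fiddler trace.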

I've tested it too, and there seems to be a problem with how Content-Length is calculated. It doesn't take into account that the string contains multi-byte characters.

If I call Encoding.UTF8.GetByteCount and use the result as Content-Length, it works correctly. (This is inefficient code, for testing purposes only.)

  _request.Headers.SetValue("Content-Type", "application/xml");
- _request.Headers.SetValue("Content-Length", _text.Length.ToString(System.Globalization.CultureInfo.InvariantCulture));
+ _request.Headers.SetValue("Content-Length", System.Text.Encoding.UTF8.GetByteCount(_text).ToString(System.Globalization.CultureInfo.InvariantCulture));
  _request.Content = Azure.Core.RequestContent.Create(System.Text.Encoding.UTF8.GetBytes(_text));
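
The test patch above encodes the string twice, once for the byte count and once for the content. One way to avoid that, sketched here under assumptions, is to encode once and take the length from the resulting array. `QueryBody.Encode` is a hypothetical helper, not part of Azure.Storage.Blobs or Azure.Core, and this is not necessarily how the SDK should be fixed.

```csharp
using System;
using System.Globalization;
using System.Text;

static class QueryBody
{
    // Hypothetical helper: encodes once and derives Content-Length from the
    // encoded byte array, so the header always matches the bytes sent.
    public static (byte[] Payload, string ContentLength) Encode(string text)
    {
        byte[] payload = Encoding.UTF8.GetBytes(text);
        string contentLength = payload.Length.ToString(CultureInfo.InvariantCulture);
        return (payload, contentLength);
    }
}

static class Program
{
    static void Main()
    {
        // 7 ASCII characters plus 3 kanji at 3 bytes each.
        var (payload, contentLength) = QueryBody.Encode("_1 = '東京都'");

        Console.WriteLine(contentLength);   // 16
        Console.WriteLine(payload.Length);  // 16
    }
}
```

In the patched code, the returned array would be passed to the request content and the returned string used as the Content-Length header value.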