Beast: Cannot read file name with umlauts in boost::beast::http::file_body

Created on 7 Sep 2020 · 17 Comments · Source: boostorg/beast

Thanks again for the very cool library, first and foremost! We have a minor issue with umlauts (low priority).

Version of Beast

#define BOOST_BEAST_VERSION 290

Steps necessary to reproduce the problem

My code is based on the advanced_server_flex.cpp example, and my problem appears specifically around line
https://github.com/boostorg/beast/blob/develop/example/advanced/server-flex/advanced_server_flex.cpp#L179:

    // Attempt to open the file
    beast::error_code ec;
    http::file_body::value_type body;
    body.open(path.c_str(), beast::file_mode::scan, ec);

    // Handle the case where the file doesn't exist
    if(ec == beast::errc::no_such_file_or_directory)  //< This is true for us when the file name has umlauts.

We have used this code successfully on multiple platforms (MSVC, Ubuntu and macOS) for about a year. However, as soon as we added clang-cl to the list of compilers, the web server could no longer read files that have umlauts in the file name.

We tested with a file named

üöälaut.txt

that is _not_ found on Windows when using clang-cl, whereas all other platforms/compilers work fine.

All relevant compiler information

The error occurs with Visual Studio 2019 x64 16.7.1 and the Microsoft-supplied clang-cl version 10.0.0. With the Microsoft compiler instead of clang-cl, everything works.

Bug

All 17 comments

Thanks for reporting this @emmenlau ,

Before I start looking into this, do you have the impression that this is a URL parsing problem or a file access problem?

Thanks a lot @madmongo1 ! Admittedly I don't know.

The file was created with boost::filesystem on all platforms, and can be correctly queried with boost::filesystem on all platforms. Also, Windows Explorer shows the file correctly. So my best guess is that it's not a problem with the file not existing or being inaccessible due to permissions.

The server also prints the file name correctly when I print the std::string filename:

handle_request(): The resource 'C:/data/stable-tmp-MSVC-Haswell-7-x64-cl19.27.29111_clang10.0.0/Debug/public/üöälaut.txt' was not found.

Does that help? Let me know anything I can do to further narrow this down...

probably having to do with fopen and utf-8 encoded paths or something like that

Yes that sounds about right. We encode paths as UTF-8, and now that you mention it, I remember that boost::filesystem required a default conversion for Windows to be set. Something along the lines of

boost::filesystem::path::imbue( 
    std::locale( std::locale(), new std::codecvt_utf8_utf16<wchar_t>() ) );

It's interesting, however, that this used to work with boost::beast and the MSVC compilers out of the box, whereas clang-cl seems to treat it differently.

ugh, facets....

beast::file is _supposed_ to handle utf-8 encoded filenames, but I never implemented it.

Hehe, fair enough! Do you think this will be on the road map for the next 3-6 months? Or is it rather unlikely to happen? The issue has no urgency for us, but it would be nice to know that this will come at some point...

Do you think this will be on the road map for the next 3-6 months?

@madmongo1 ?

We have a user with an unimplemented use case. My pride will not allow me to leave this wrong unrighted.
It's going to the top of the development queue.

Hahaha noooo! It's really not so urgent! :-) But guys, thanks again for the quick response!

Hi @emmenlau, would you mind checking whether adding a manifest to your executable solves this?

It will still need a code solution for older versions of windows, but I am interested to know whether this works:

https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page

This is awesome, I was not aware of it! Sadly, we run all our builds and tests on Windows 7 so I can't test it (yet). But this is a very interesting new insight for me. Nice work from Microsoft.

Hi @emmenlau ,

Are you able to give me the sequence of bytes you think you are getting for "üöälaut.txt"?

An online converter gives me: { 0xc3, 0xbc, 0xc3, 0xb6, 0xc3, 0xa4, 0x6c, 0x61, 0x75, 0x74, 0x2e, 0x74, 0x78, 0x74 };

But this is not converting correctly to UTF-16 via ::MultiByteToWideChar
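For anyone who wants to check the byte sequence quoted above independently of the Windows API, here is a minimal sketch of a hand-written UTF-8 to UTF-16 decoder in portable C++. The names `utf8_to_utf16` and `thread_bytes` are made up for this illustration and are not part of Beast or of the Windows SDK. The sketch rejects truncated or malformed sequences but does not check for overlong encodings, so it is a diagnostic aid rather than a replacement for `::MultiByteToWideChar`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-16 code units.
// Throws std::invalid_argument on truncated or malformed input.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// The 14 bytes quoted in the thread, as a std::string.
std::string thread_bytes()
{
    const unsigned char bytes[] = { 0xc3, 0xbc, 0xc3, 0xb6, 0xc3, 0xa4, 0x6c,
                                    0x61, 0x75, 0x74, 0x2e, 0x74, 0x78, 0x74 };
    return std::string(reinterpret_cast<const char*>(bytes), sizeof bytes);
}
```

Calling `utf8_to_utf16(thread_bytes())` yields 11 code units, starting with U+00FC, U+00F6 and U+00E4, so the quoted bytes are well-formed UTF-8 for the file name. If a Windows conversion of the same bytes gives something else, the code page argument passed to the conversion call is a plausible place to look first.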

Argh! I should be more helpful here, but sadly I don't know how to get this. Originally the sequence is encoded in UTF-8, but this is not what you want, no? We then use this UTF-8 string in boost::filesystem to create test files that boost::beast should serve. The client sends the UTF-8 string via a boost::beast-based HTTP connection to the server. I assume what we receive server-side is still UTF-8, and when I print it, it looks correct. So far, everything also works fine with every other compiler I tried.

The error seems to come when I pass this (probably UTF-8 encoded) string to body.open(path.c_str(), beast::file_mode::scan, ec). While reading the file works with all other compilers, it fails with clang-cl, claiming the file does not exist. But I have no way of tracking how clang translates/encodes/parses the string, and also no way of knowing what exactly fails in opening the file.

I can only say that in boost::filesystem, everything works when we imbue the automatic conversion with std::codecvt_utf8_utf16<wchar_t>().

Cutting a long story short, what can I do to be more helpful? Should I try to step into this with the debugger to see if I can find more details? Or does body.open() accept UTF-16 so I can try to perform a conversion manually?

Thanks for responding @emmenlau

What would really help me track this down is:

  1. a simple 10-line program that demonstrates this causing a problem on clang-cl but not on msvc or gcc.
  2. idiot-proof instructions on how to compile and link it :-)

This way I can replicate without too much effort and focus my attention on what is going wrong. It's beginning to sound as if we're not detecting Windows properly when compiling on clang (note 1), but I'd like to be able to prove it.

note 1: Because we do have code to detect that we're on Windows and convert UTF-8 to Windows Unicode internally.
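A starting point for such a small program could look like the sketch below. It is not the reporter's test and does not use Beast at all: `std::fopen` stands in for any narrow-character open call, and which file backend Beast actually selects under clang-cl is not established in this thread. The names `kName`, `create_via_filesystem` and `open_via_narrow_api` are hypothetical. The file name is spelled as UTF-8 bytes so the source file encoding does not matter.

```cpp
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// "üöälaut.txt" spelled as explicit UTF-8 bytes.
const std::string kName = "\xc3\xbc\xc3\xb6\xc3\xa4" "laut.txt";

// Create the file through std::filesystem, which converts the UTF-8 path
// to the platform's native encoding. (fs::u8path is deprecated in C++20
// but still available.)
bool create_via_filesystem(const std::string& utf8_path)
{
    const fs::path p = fs::u8path(utf8_path);
    std::ofstream(p) << "hello";
    return fs::exists(p);
}

// Open the same file by handing the UTF-8 bytes directly to a
// narrow-character API, with no explicit conversion.
bool open_via_narrow_api(const std::string& utf8_path)
{
    std::FILE* f = std::fopen(utf8_path.c_str(), "rb");
    if (!f)
        return false;
    std::fclose(f);
    return true;
}
```

On POSIX systems both functions are expected to return true for the same path. On Windows the result of the second one depends on how the C runtime interprets narrow strings, for example the active code page, so comparing its result across MSVC and clang-cl builds may help separate a path encoding problem from a Beast-specific one.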

R

Dear @madmongo1 ,

Sorry for the super late reply, I did not have time to continue with the issue. I have now created a relatively small test case that exhibits the aforementioned problem. It's based on gtest, boost::filesystem to create a test file, and boost::beast to open the test file. The test works for me in our CI system on Linux, macOS and with the Microsoft compiler. It only fails (as expected) when using clang-cl with Visual Studio, with the error message handle_request(): The resource 'üöälaut.txt' was not found.

Can you try to reproduce the problem on your side? Or do you spot any obvious mistakes in the test? Feel free to throw out gtest if you need to, it's not required.

Here the file for download:
BoostBeastClangCLUmlautsTest.txt

Fantastic. Thank you.
