Opencv: Unicode Path/Filename for imread and imwrite

Created on 27 Jul 2015 · 3 Comments · Source: opencv/opencv

Transferred from http://code.opencv.org/issues/1268

|| Richard Steffen on 2011-07-29 13:28
|| Priority: Low
|| Affected: None
|| Category: highgui-images
|| Tracker: Feature
|| Difficulty: 
|| PR: 
|| Platform: None / None

Unicode Path/Filename for imread and imwrite

Currently, the imread and imwrite methods only support std::string as input. This does not work with non-ASCII directories/paths. Therefore, software that depends on OpenCV cannot be guaranteed to work on all machines.

History

Alexander Shishkov on 2012-02-12 20:47
-   Description changed from Currently, the imread and imwrite method
    only supports std::string as an inpu... to Currently, the imread and
    imwrite method only supports std::string as an inpu...
Vadim Pisarevsky on 2012-04-04 11:58
std::string is still capable of storing any Unicode name via UTF-8 encoding, so it is fopen's responsibility to handle the UTF-8 name properly. On Mac and Linux I was able to save an image to a file with non-ASCII letters in its name using the normal cv::imwrite(). I guess it will work on Windows too, if you save the source file as UTF-8.
-   Priority changed from High to Low
-   Assignee set to Vadim Pisarevsky
-   Status changed from Open to Cancelled
Andrey Kamaev on 2012-05-18 14:20
-   Target version set to 2.4.0
n n on 2014-03-18 13:11
AFAIK fopen does not support Unicode on Windows and can't be used to open a path with Unicode characters. The UTF-8 string must be converted to UTF-16 and given to _wfopen instead. See ImageMagick's fopen_utf8 wrapper for example code: http://www.imagemagick.org/api/MagickCore/utility-private_8h_source.html#l00103
-   Target version changed from 2.4.0 to 2.4.9
-   Status changed from Cancelled to Open
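
The conversion described in the comment above can be sketched roughly as follows. This is a minimal, hand-rolled illustration, not ImageMagick's or OpenCV's code: utf8_to_utf16 and fopen_utf8 are hypothetical helper names, the decoder does not reject every malformed input (overlong forms, for example, pass through), and the _wfopen branch is only compiled on Windows.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Decode UTF-8 into UTF-16 code units (hypothetical helper, minimal validation).
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp >= 0x10000) {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Open a file whose path is given as UTF-8 (hypothetical wrapper).
std::FILE* fopen_utf8(const std::string& path, const char* mode)
{
#ifdef _WIN32
    // On Windows wchar_t is 16 bits wide, so the code units can be copied as-is.
    const std::u16string p = utf8_to_utf16(path);
    const std::u16string m = utf8_to_utf16(mode);
    const std::wstring wpath(p.begin(), p.end());
    const std::wstring wmode(m.begin(), m.end());
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // Elsewhere the narrow path is handed to fopen unchanged.
    return std::fopen(path.c_str(), mode);
#endif
}
```

The returned FILE* is used exactly like one from fopen, so the file contents can be read into a buffer and handed to cv::imdecode.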
n n on 2014-03-19 10:48
One possible workaround for now using Boost and a memory mapped file:

    mapped_file map(path(L"filename"), ios::in);
    Mat file(1, numeric_cast<int>(map.size()), CV_8S, const_cast<char*>(map.const_data()), CV_AUTOSTEP);
    Mat image(imdecode(file, 1));

The downside is that I/O errors cause access violations instead of C++ exceptions. Also don't write to the "file" Mat. :)
n n on 2014-03-19 11:20
Unfortunately, the trick of not holding the image file in memory does not work with imwrite, because imencode stores its output in a vector with the standard allocator specified. If memory is not an issue, the contents can of course be written to a file using Boost afterwards.
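
For the writing side, the "encode in memory, then write the file yourself" idea might look roughly like this. The sketch swaps in std::filesystem (C++17) for Boost, which is a substitution made here and not what the comment used. write_bytes is a hypothetical helper name, and the buffer is assumed to be the std::vector<uchar> that cv::imencode fills.

```cpp
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <vector>

// Write an already-encoded image buffer to a file (hypothetical helper).
// std::filesystem::path keeps the native encoding, so on Windows it can be
// built from a wide string and non-ASCII names survive.
void write_bytes(const std::filesystem::path& file,
                 const std::vector<unsigned char>& data)
{
    std::ofstream out(file, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot open file for writing");

    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    if (!out)
        throw std::runtime_error("write failed");
}
```

With OpenCV, the buffer would first be filled by cv::imencode (for example with the ".png" extension) and then passed to write_bytes together with a path built from a wide or UTF-16 string. Unlike the memory-mapped approach, I/O errors surface here as C++ exceptions.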
Alexander Smorkalov on 2014-04-02 01:18
-   Target version changed from 2.4.9 to 3.0
auto-transferred imgcodecs feature low

All 3 comments

Encoding issues... difficult but not impossible.

If you have this string: 'テスト/abc.jpg'
you can reinterpret its UTF-8 bytes as single-byte characters like this:
    print('テスト/abc.jpg'.encode('utf-8').decode('unicode-escape'))
and you get mojibake, something like 'ã\x83\x86ã\x82¹ã\x83\x88/abc.jpg' (shown here with the non-printable characters escaped).

Then, if you want to read the file and get filenames that are readable and usable, you can use some library to list the filenames in your path and then change the encoding back:
    # fname is the mojibake string from above
    fname.encode('iso-8859-1').decode('utf-8')  # gives back the initial string 'テスト/abc.jpg'

The OpenCV core team discussed the problem at the weekly meeting and decided to stay conservative and not introduce new API calls with wchar_t, wstring and other string types, for the following reasons:

  • Most image decoding and encoding libraries use the standard fopen call to open files, so extra wchar_t support would require modifying those libraries.
  • Modern Linux, Mac OS and the latest Windows releases support UTF-8 encoding, which allows std::string to be used as the container for paths passed to OpenCV functions.
  • Popular file systems on Linux do not use wchar_t natively, so the overloads would not be a cross-platform solution.

There are 2 alternatives for using wchar_t strings with OpenCV:

  1. Convert wchar_t strings to UTF-8 and pass the UTF-8 string as the cv::imread and cv::imwrite parameter. The UTF-8 string is handled by the system fopen call, and its behavior depends on OS support and locale. See wcstombs and mbstowcs in the C++ standard library for more details.

  2. OpenCV provides the cv::imdecode and cv::imencode functions, which decode and encode an image using a memory buffer as input and output. This decouples file I/O from image decoding and lets user code manage path strings, locales, etc. See the code snippet for cv::imdecode below. fopen can be replaced with _wfopen for wide string support. See the Microsoft reference manual for details: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=vs-2019

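    #include <cstdio>
    #include <vector>
    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/highgui.hpp>

    int main(int argc, char ** argv)
    {
        // Open the file ourselves; on Windows, _wfopen can be used here for wide paths.
        FILE* f = fopen("lena.jpg", "rb");
        if (!f)
            return 1; // file could not be opened

        fseek(f, 0, SEEK_END); // seek to end of file
        long file_size = ftell(f); // get current file pointer
        fseek(f, 0, SEEK_SET); // seek back to beginning of file
        if (file_size <= 0) {
            fclose(f);
            return 1; // empty file or ftell failure
        }

        // Unsigned bytes: the element type imdecode works with.
        std::vector<unsigned char> buffer(static_cast<size_t>(file_size));
        size_t bytes_read = fread(&buffer[0], 1, buffer.size(), f);
        fclose(f);
        if (bytes_read != buffer.size())
            return 1; // short read

        // Decode the image from the in-memory buffer instead of from a path.
        cv::Mat frame = cv::imdecode(buffer, cv::IMREAD_COLOR);
        if (frame.empty())
            return 1; // buffer did not contain a decodable image

        cv::imshow("Camera", frame);
        cv::waitKey();
        return 0;
    }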
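
Alternative 1 above (converting a wchar_t string to UTF-8 before calling cv::imread or cv::imwrite) could be sketched as below. This is a hand-rolled illustration under stated assumptions, not an OpenCV API: wide_to_utf8 is a hypothetical helper name, and wchar_t is assumed to hold UTF-16 code units when it is 16 bits wide (as on Windows) and whole code points otherwise (as on typical Linux and Mac OS builds).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert a wide string to UTF-8 (hypothetical helper).
std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    // Append one byte to the result.
    auto put = [&out](char32_t v) { out.push_back(static_cast<char>(v)); };

    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);

        // With 16-bit wchar_t, a high surrogate must be joined with the low one.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 >= in.size())
                throw std::invalid_argument("unpaired high surrogate");
            const char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x80) {                 // 1 byte
            put(cp);
        } else if (cp < 0x800) {         // 2 bytes
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string can then be passed to cv::imread or cv::imwrite. Whether the underlying fopen call accepts it still depends on OS support and locale, as noted in alternative 1.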