Kratos: Non-ASCII characters in input files

Created on 15 Feb 2018 · 27 comments · Source: KratosMultiphysics/Kratos

Some users try to use characters like 'º' or 'á' in their cases. It seems that ModelPartIO is unable to deal with them. Apparently there were some attempts in the past to fix this, but it was never solved. I am writing it here as a reminder, because limitations of this kind matter more and more as the community grows.

Help Wanted

All 27 comments

We can support Unicode; this will solve all the problems, even the lack of emojis.

Thx for the reminder.
Technically it is possible, as @loumalouomega says. We could change iostream to wiostream, which would allow us to support at least Unicode.

Problems:

  • The Windows and Linux implementations _differ_, and that is a very polite way to put it. For example, as far as I know, Windows handles Unicode as UTF-16 sequences (I may be wrong). We already have problems there.
  • wchar_t is wider than a char, hence the code will be slower and will need more memory to read the files, or not, depending on the encoding (see below).

Questions

  • It is not clear to me what would happen if we feed the new model_part_io with UTF-8 or UTF-32. Should we control this ourselves?
  • If we extend support to other encodings, where do we stop? For example, what if someone wants to write something in Chinese in their modelpart? Should we support that? It will probably be technically possible. For example, shall we allow this (it is not an image):

K̵̹͓̖̰̟̖̝͙̖͓͕͈̪̽́̽̍̈́͋͗͆̑̈́̔̋͝͝ͅŗ̷̟͉̘͓͕̪͔̗̤̣̫̬̍̋̉̄̍͑̒͑̇͑̍͠͝a̵̟̞̮̺̮̯̘̜̮͖̜͊̆̽̔͐̀͜ͅt̷̡̞̮̙̦̳̗̱̣̘̭̲̅̊̄̌̍͗̓͆̇͝o̵̧̧̧̖͖̜̩̘̐̓͒̈́̿͒͑̔̎̚͝͝͝s̵̼̹̫̙̜̯̰̅̌̀͝
̷̗̝͈̾̈́̓͊̋̒̈̓̍̇̈̂̓̈́̕

Conclusion: my opinion is that we should not support it.

P.S.: What exactly are you trying to write in the modelpart using those symbols? :thinking:

I am in favor of NOT supporting anything outside the original 7-bit ASCII set (as I recall, those 128 characters are completely portable). The question, however, is whether we can quickly check that a file is not compatible... even just telling the user: there is a "strange character" in line whatever.

https://wp.libpf.com/?p=626

These guys have the same problem as we do. Maybe we can write such a utility, distribute it with Kratos, and say in the error message that something went wrong and that the file format can be checked with the utility.

Alternatively, we could use the exception mechanism and only call this function on the line that causes trouble...

The problem @maceligueta is mentioning is not about having UTF-32 in the mdpa file. The problem is dealing with non-ASCII characters in the file name, which is not as deep as what you were describing. I am in favor of fixing this, and I am aware of the difficulties in dealing with Unicode settings.

Ah, OK. If you are saying that the FILENAME should allow special characters, I have no objections.

Let's limit this to the file name, then. If possible, I would extend it to the names of the SubModelParts (which currently don't even allow spaces), but I see that this is really complex, so let's do it just for the file name.

I understand that non-ASCII SubModelPart names would be a useful feature, but then we would have to change the entire mdpa file IO to wchar.

One question: does the error appear in Python or when we open the mdpa?

I'll just move this to the next release.

Is this solved?

I assume not; @roigcarlo is not in favor.

Aside from the fact that I don't like it, I don't think we agreed on how to implement this, and who has to do it.

On Thu, 1 Nov 2018 13:27, Vicente Mataix Ferrándiz <[email protected]> wrote:

> I am assume not @roigcarlo is not in favor



I think we could limit the issue to a simple check of the file name and of the content of the mdpa, with an understandable error. The library linked by @RiccardoRossi could help.
Actually, I made a simple attempt to adapt the Core to accept non-ASCII file names, but the change propagated to so many files and methods that I gave up (I thought it would be straightforward).
Let's see if we agree on adding a couple of methods, callable from Python, that check the format of any file name and the content of a text file...

I will post a meme about this


@maceligueta I think that's a really good idea!

@maceligueta if it is a major effort, we could try a partitioned refactoring, starting with the modelpart name and then moving on to the other parts of the code. Do you think this is possible?

Right now I do not remember the details. What I remember is that I had to modify the Python scripts first. After that, every char had to be moved to wchar, which was painful for cases like https://github.com/KratosMultiphysics/Kratos/blob/5f8c5498a07584c7ef661337978a718840b2a475/kratos/sources/model_part_io.cpp#L54
Adding checks would be relatively easy, though.

Following today's discussion:

#include <string>
#include <iostream>

int main()
{
    std::string a = "aaa";
    std::wstring b = L"ñ";

    std::cout << a << std::endl;
    std::wcout << b << std::endl;

    if(a == b)
        std::cout << "they are equal " << std::endl;
    else
        std::cout << "not equal " << std::endl;

    return 0;
}

This simple file does not compile with

  g++ -std=c++11 -fPIC main.cpp -o test.exe

The error is:

error: no match for ‘operator==’ (operand types are ‘std::__cxx11::string {aka std::__cxx11::basic_string<char>}’ and ‘std::__cxx11::wstring {aka std::__cxx11::basic_string<wchar_t>}’)
if(a == b)
^~

Actually, now I am lost:

#include <string>
#include <iostream>

int main()
{
    std::string a = "n";
    std::string aa = "ñ";
    std::wstring b = L"ñ";

    std::cout << a << std::endl;
    std::cout << aa << std::endl;
    std::wcout << b << std::endl;

    return 0;
}

outputs

g++ -std=c++14 -fPIC main.cpp -o test.exe 
 riccardo  ~/.../scratch/checkwstring  ./test.exe 
n
ñ
�

So std::string works correctly BUT wstring does not... which REALLY implies that I understood nothing about this... :-D

Mmm, maybe the problem is that the file name itself is not stored as UTF-8 on Windows...

https://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as

Is this still relevant? Should we open a new issue? It has been dead for a year and is not in any Project. (We can also add it to the @KratosMultiphysics/technical-committee list...)

Well... It is still dangerous if a user chooses a name with accents or º (degrees). If there is no solution for it, I would keep the issue open. It is also a good reminder of all the ideas that came up.
I think I remember @RiccardoRossi telling me that he had no problems with this type of file name in Python 3. Maybe this is just a matter of deprecating Python 2?

I just packaged a version of Kratos to be tested in a virtual machine (clean Windows 10) and had this problem when launching the computation:

    return ReorderConsecutiveFromGivenIdsModelPartIO(modelpart, nodeid, elemid, condid, IO.SKIP_TIMER)
RuntimeError: Error: Error opening mdpa file : C:\Users\Miguel Ángel\Desktop\borrar.gid\borrarDEM.mdpa

in kratos/sources/model_part_io.cpp:60:ModelPartIO::ModelPartIO

But the problem was solved when I moved the example to C:\.
I guess that the spaces and, most probably, the accented Á in the path caused the error.
This happened with Python 3.7 and the latest master of Kratos and GiDInterface.

I just opened a PR with a relatively simple change that might work.
