v5.6.1/Ubuntu18-7.3.tar
sqlserver 2017
Ubuntu1804
7.3
setting character encoding to SQLSRV_ENC_CHAR -> reads / writes data stored in existing DB with windows cp1252 encoding correctly
My issue strongly relates to #998
Essentially we have a DB running where several Windows (IIS) servers have been reading/writing data for a long time now.
Now Apache servers should access the same databases, reading the special characters stored in the DB correctly and also writing them in the same format so that the Windows servers continue to function.
We have a quick fix running that essentially converts all characters back and forth between UTF-8 and Windows-1252 encoding, after reading data from and before writing data to the DB, roughly like:
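// after read (mb_convert_encoding returns the converted string, so keep the result)
$retrieved_data_from_db = mb_convert_encoding($retrieved_data_from_db, 'windows-1252', 'UTF-8');
// before write
$data_about_to_be_written_to_db = mb_convert_encoding($data_about_to_be_written_to_db, 'UTF-8', 'windows-1252');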
Initially the solution looked ugly but functional; however, it turns out that several characters don't survive this encoding conversion, as documented here (https://www.i18nqa.com/debug/bug-double-conversion.html)
e.g.:
php> echo ( (mb_convert_encoding("Á", 'UTF-8', 'windows-1252')));
�
Thus customer names are regularly garbled.
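One way to see why only some characters break: Windows-1252 leaves five byte values unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D). When UTF-8 bytes are treated as Windows-1252 input, as in the echo example above, any character whose UTF-8 encoding contains one of those bytes has no clean mapping. The following is a minimal sketch of that check, not part of the original workaround; the helper name hitsUndefinedCp1252Byte is hypothetical, and it uses only core PHP string functions.

```php
<?php
/**
 * Hypothetical helper: returns true if the given bytes contain a value
 * that Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D).
 * Such bytes cannot be mapped cleanly when the string is (mis)read
 * as Windows-1252.
 */
function hitsUndefinedCp1252Byte(string $bytes): bool
{
    // strpbrk() is binary safe and returns false when none of the bytes occur.
    return strpbrk($bytes, "\x81\x8D\x8F\x90\x9D") !== false;
}

// UTF-8 byte sequences for a few accented letters.
$samples = [
    'A with acute (U+00C1)' => "\xC3\x81", // second byte 0x81 is unassigned
    'e with acute (U+00E9)' => "\xC3\xA9", // both bytes are assigned
    'I with acute (U+00CD)' => "\xC3\x8D", // second byte 0x8D is unassigned
];

foreach ($samples as $label => $utf8Bytes) {
    printf(
        "%s: bytes %s, %s\n",
        $label,
        strtoupper(bin2hex($utf8Bytes)),
        hitsUndefinedCp1252Byte($utf8Bytes) ? 'at risk' : 'round-trips'
    );
}
```

"Á" is encoded in UTF-8 as the bytes C3 81, so it falls into the at-risk group, which matches the garbled output shown above.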
Do you have any suggestion for us to go forward?
Any help is strongly appreciated (even suggestions on how to properly migrate the DB to UTF-8 or something along those lines).
Hi @lwohlhart, if you're willing to migrate your DB away from the ANSI code page, that is the way to go. Not only will you improve the performance of your application (no more conversions by calling mb_convert* functions), but UTF-8 is also the default in Ubuntu and other Linux distros. As you have also pointed out, Windows-1252 is not exactly identical to the standard ISO-8859-1.
This discussion might give you some ideas to start exploring your options. When using sqlsrv, you will need to connect using "CharacterSet" => "UTF-8". Please see this page for details.
Okay, it seems like keeping the current encoding really is not an option.
Thank you for the suggestions.
Looks like the conversion is going to be a long process but we'll see to it.
Just out of curiosity:
We will then have Windows IIS PHP and Linux Apache PHP servers running and accessing the same database.
Any grave caveats we should be aware of?
Such as: You have to make sure to use this/that locale on the Windows clients |or| set your connection string to this/that.
Thank you for your great work and support!
Hi @lwohlhart
If you're using both Windows and Linux to access the same database, I'd recommend using UTF-8 because it's the default on Linux platforms. As I mentioned above, if using sqlsrv, please make sure you connect using "CharacterSet" => "UTF-8". If you use our PDO driver, pdo_sqlsrv, UTF-8 is the default.
Recent versions of SQL Server should support UTF-8 strings well, but if you find any issue, please let us know.
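For the sqlsrv case, the "CharacterSet" => "UTF-8" setting mentioned above goes into the options array passed to sqlsrv_connect. Below is a minimal sketch, assuming the sqlsrv extension is installed; the helper names utf8ConnectionOptions and connectUtf8, as well as the server, database, and credential values, are placeholders invented for illustration.

```php
<?php
/**
 * Hypothetical helper: builds a sqlsrv connection options array
 * that asks the driver to exchange string data as UTF-8.
 */
function utf8ConnectionOptions(string $database, string $user, string $password): array
{
    return [
        "Database"     => $database,
        "UID"          => $user,
        "PWD"          => $password,
        "CharacterSet" => "UTF-8", // the setting recommended in this thread
    ];
}

/**
 * Hypothetical helper: opens the connection, or throws with the
 * driver's error details if the extension is missing or the connect fails.
 */
function connectUtf8(string $server, array $options)
{
    if (!function_exists('sqlsrv_connect')) {
        throw new RuntimeException('The sqlsrv extension is not loaded.');
    }
    $conn = sqlsrv_connect($server, $options);
    if ($conn === false) {
        throw new RuntimeException(print_r(sqlsrv_errors(), true));
    }
    return $conn;
}

// Usage (placeholder values, not executed here):
// $conn = connectUtf8(
//     "db.example.test",
//     utf8ConnectionOptions("ExampleDb", "example_user", "example_password")
// );
```

Using the same options on both the Windows IIS and the Linux Apache servers should keep the two sides consistent, since the encoding is then requested per connection rather than inherited from each machine's locale.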
hey,
thank you for your advice. We're in the process of preparing the conversion.
I'll close the issue for now and come back to you if something happens.
Thanks!
@yitam just wanted to say thank you.
We've gone through the trouble of converting all our databases and can now properly use the utf-8 encoding.
Thanks for your advice! Happy 2021 !
Glad to be of help @lwohlhart 👍
Happy 2021 to you as well!