Node-oracledb: Oracledb charset causes chinese garbled code

Created on 7 Nov 2019 · 20 Comments · Source: oracle/node-oracledb

The documentation says "Note that node-oracledb always uses the AL32UTF8 character set", but my database server character set is US7ASCII, so Chinese text in query results comes out garbled. How does node-oracledb handle query results encoded as US7ASCII?

enhancement

lfpei

All 20 comments

Please give us a complete, runnable test case that includes:

the DB character set (see the query in here),
the SQL to create the table and insert data,
the JS script that executes a query or whatever you are trying to do,
your operating system and version information (see here),
the expected output.

Thanks!

It seems that you are trying to store multibyte data in a single byte character set? That sounds "wrong". The information above will help us understand.

cjbj on 7 Nov 2019

@cjbj My database is on my company's server, so I can't give you the connection details. Please set up a local Oracle database with the US7ASCII character set and look at the query results. Normally, querying Chinese text works fine, but because the default character set of oracledb is AL32UTF8, the query results come back as boxes or question marks.

lfpei on 7 Nov 2019

@lfpei it is standard practice for the reporter (you) to create a test case when reporting a problem, as you can see from other recent issues reported.

I have NO idea what column types you are using, what data you have, or what computer and language environment you have. It would be a complete waste of my time to test without your assistance.

cjbj on 7 Nov 2019

@lfpei Storing non-ASCII characters in a US7ASCII database is not supported. For example, substr may split the byte sequence that makes up a multibyte character and return garbage. You should ask your DBA to change the database character set.
See: https://docs.oracle.com/en/database/oracle/oracle-database/19/nlspg/character-set-migration.html#GUID-68664144-771B-4AD2-B015-8EBC91D8EEA8
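
As a rough illustration of the substr point, here is a minimal Node.js sketch. It does not touch Oracle at all: `byteSubstr` is a hypothetical stand-in for a substring operation that counts bytes, and the sample bytes are assumed to be the GBK encoding of '张三三' from later in this thread. It also assumes a Node.js build with full ICU, so that `TextDecoder` accepts the `gbk` label.

```javascript
// GBK bytes for '张三三' (two bytes per character), hard-coded so the example
// does not depend on any encoder. A single-byte database character set
// treats each of these six bytes as one character.
const stored = Buffer.from([0xd5, 0xc5, 0xc8, 0xfd, 0xc8, 0xfd]);

// Hypothetical stand-in for a substring operation on a single-byte
// character set: it counts bytes, not Chinese characters.
function byteSubstr(bytes, length) {
  return bytes.subarray(0, length);
}

// Assumption: this Node.js build has full ICU, so 'gbk' is a known label.
const gbk = new TextDecoder('gbk');

const whole = byteSubstr(stored, 4); // ends on a character boundary
const split = byteSubstr(stored, 3); // ends in the middle of the second character

console.log(gbk.decode(whole)); // 张三
console.log(gbk.decode(split)); // 张 followed by a damaged trailing character
```

The three-byte slice cannot be a whole number of two-byte characters, so whatever reads it back has one leftover byte that no longer means anything.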

> Under normal circumstances, the query of Chinese is normal,

You get Chinese characters only by luck. I guess that you have not set NLS_LANG. When NLS_LANG isn't set, US7ASCII is selected as the client character set. When the database character set and the client character set are the same, characters are transmitted as they are, without character set conversion. So you get normal characters because the client and server character sets are both US7ASCII. If you set the NLS_LANG environment variable to a proper Chinese character set such as ZHS16GBK or ZHS32GB18030, you get garbled characters.
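
The pass-through effect described above can be sketched in plain Node.js. This is only an illustration under stated assumptions, not a model of what Oracle's conversion does internally: `passThrough` is a hypothetical stand-in for "same character set on both sides, so no conversion", the bytes are assumed to be GBK, and the `gbk` label requires a Node.js build with full ICU.

```javascript
// GBK bytes for '张三三', as a GBK-speaking client might have written them
// into a database that believes it holds US7ASCII.
const wireBytes = Buffer.from([0xd5, 0xc5, 0xc8, 0xfd, 0xc8, 0xfd]);

// Hypothetical stand-in for the no-conversion case: client and database
// declare the same character set, so the bytes arrive unchanged.
function passThrough(bytes) {
  return Buffer.from(bytes);
}

const received = passThrough(wireBytes);

// A tool that simply hands the bytes to a console using a GBK code page
// ends up showing the intended text.
const asGbk = new TextDecoder('gbk').decode(received);

// A runtime that must build a Unicode string and assumes UTF-8 cannot make
// sense of the same bytes and substitutes replacement characters.
const asUtf8 = received.toString('utf8');

console.log(asGbk);  // 张三三
console.log(asUtf8); // replacement characters (U+FFFD)
```

The bytes are identical in both cases; only the interpretation differs, which is why a tool that ignores encodings appears to work while one that converts to an internal representation does not.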

kubo on 7 Nov 2019

@kubo that's what I'm guessing too, but it would be ideal to know for sure.

cjbj on 8 Nov 2019

I only need to query the data, not store it. My NLS_LANG environment variable has been set to US7ASCII, and the Oracle setting in the registry has been changed as well. My system is Windows. Queries in CMD return correct results, but queries through node-oracledb return garbled text.

lfpei on 8 Nov 2019

@lfpei why is it a mess? What is the column type it is stored in? Do you have any example data? Have you tried a patch like the one mentioned in the issue you originally posted to?

cjbj on 8 Nov 2019

@cjbj

The DB character set: US7ASCII. This is important.
create table student
(
stuname varchar2(50) not null,
sex char(1) not null
);
insert into student (stuname, sex) values ('张三三', '男');
insert into student (stuname, sex) values ('李小四', '女');
select * from student;
OS: Windows Server 2008.
Querying from CMD gives the right result. Querying through oracledb gives the wrong result (the client environment variable NLS_LANG has been set to US7ASCII).

thanks

followbin on 15 Nov 2019

> the DB character set: US7ASCII. this is important

I call it wrong, since you are inserting non-ASCII characters. Why are you using that character set? What am I missing?

cjbj on 15 Nov 2019

> querying by cmd is right.

It works only when the querying tool doesn't care about character set encodings. I guess that all Java-based applications, whose internal character set is UTF-16, garble Chinese characters in your environment.

You should ask your DBA to change the database character set. Otherwise, you will have to accept whatever character problems come up.

kubo on 17 Nov 2019

I can't change the DB character set because the DB belongs to a third-party company, and it is running in production. I only query data from the DB's views.

followbin on 17 Nov 2019

@followbin Then you cannot use tools that convert characters from the client character set to their own internal representation, for example Node.js, Java, and Python 3.

You may be able to use tools that take the characters as they are and, optionally, label them with the client character set. Python 2 may work if cx_Oracle returns characters as byte strings. Ruby may work when Encoding::default_internal is nil and you call String#force_encoding on each queried string.

kubo on 17 Nov 2019

> I call it wrong, since you are inserting non-ASCII characters. Why are you using that character set? What am I missing?

@cjbj The DB belongs to a third-party company, so I can't change the DB character set. I want to try fetching the original data and then transcoding it (I'm not sure whether that will work). If it doesn't work, I will write a DLL to call instead. T_T By the way, my program is a client written in Electron.
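
One way the "get the original data, then transcode it" idea might look is sketched below. Everything here is an assumption rather than something confirmed in this thread: the query wraps the column in `UTL_RAW.CAST_TO_RAW` so that it comes back as RAW (which node-oracledb returns as a `Buffer`), the stored bytes are assumed to be GBK, `decodeRawRows` is a hypothetical helper, and `sampleRows` only imitates the shape of `result.rows`. It has not been tested against a US7ASCII database, and it needs a Node.js build with full ICU for the `gbk` label.

```javascript
// Hypothetical query: selecting the column as RAW is intended to keep
// character set conversion away from the stored bytes.
const sql = 'select utl_raw.cast_to_raw(stuname) as stuname_raw from student';

// Hypothetical helper: decode every Buffer cell with the encoding the data
// was really written in, and leave all other cells untouched.
function decodeRawRows(rows, encoding = 'gbk') {
  const decoder = new TextDecoder(encoding);
  return rows.map((row) =>
    row.map((cell) => (Buffer.isBuffer(cell) ? decoder.decode(cell) : cell))
  );
}

// Stand-in for result.rows from connection.execute(sql): the GBK bytes of
// the two sample names from the test case above.
const sampleRows = [
  [Buffer.from([0xd5, 0xc5, 0xc8, 0xfd, 0xc8, 0xfd])],
  [Buffer.from([0xc0, 0xee, 0xd0, 0xa1, 0xcb, 0xc4])],
];

console.log(decodeRawRows(sampleRows)); // [ [ '张三三' ], [ '李小四' ] ]
```

This keeps the decoding step in JavaScript, so no change to node-oracledb itself is needed, but it relies on knowing which encoding the third-party application used when it wrote the data.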

followbin on 18 Nov 2019

You could try to hack the character set - at your own risk. See, for example, https://github.com/natlibfi-arlehiko/odpi/blob/b26feec583c580a136d02f61330a4541e3701a21/src/dpiImpl.h#L103
I've no idea if this kind of change (using whatever character set you want) will work. I don't know the implications.

cjbj on 18 Nov 2019

If you have knowledge of C language, you can fork node-oracledb and enhance it for your needs.

Change the client character set to US7ASCII (here).
Replace napi_create_string_utf8 in this directory with your own function that converts the Chinese character set (GB18030?) to UTF-16 by calling MultiByteToWideChar (if it only needs to run on Windows) and then calls napi_create_string_utf16 instead.

I don't know whether that is all. Other tasks may be required.
I guess that the node-oracledb team won't merge such a workaround for unusual environments. So you need to do it at your own risk.

edited: If you need to pass Chinese characters to Oracle, you need to replace napi_get_value_string_utf8 also.

kubo on 18 Nov 2019

Thank you very much for your help. It may be difficult because I am a C# developer, but I will try it. I am a bit familiar with C. Thanks again.

followbin on 19 Nov 2019

@cjbj @kubo
I successfully got the right results. Here is my solution:

Configure the Oracle character set environment variable on the machine where my client runs: set NLS_LANG to AMERICAN_AMERICA.US7ASCII.
Use the .NET library System.Data.OleDb only to connect to the Oracle database on the server and return the query results, and build that into a DLL. Create a method in Node.js that calls the DLL.
oracledb calls the above JS method, which then returns the correct result.
There may be a better solution, but I am a C# developer and my ability is limited. I look forward to others suggesting better ones.

followbin on 19 Nov 2019

Why bother using Node.js !!!?

I'll close this - you have the solution (use a suitable DB character set) and a workaround.

Thanks @kubo for your efforts.

cjbj on 20 Nov 2019

> Why bother using Node.js !!!?

My client is written in Electron. It is popular and powerful, makes many features easy to implement, and depends very little on the system environment.

> I'll close this - you have the solution (use a suitable DB character set) and a work around.

Although I solved my problem, is it possible to directly support the encoding setting function? Because this may also be helpful to others.

followbin on 20 Nov 2019

> Although I solved my problem, is it possible to directly support the encoding setting function?

If I were a developer of node-oracledb, I would not support it.

What I wrote above is a workaround. It probably depends on undocumented behavior. It will stop working if Oracle ever adds a sanity check on character encoding that applies even when the client encoding is the same as the database encoding.

> Because this may also be helpful to others.

It would make node-oracledb more complex than necessary if no one uses it. As the maintainer of ruby-oci8 for many years, I have not received any such requests or issues.

kubo on 29 Nov 2019
