HeidiSQL: EEncodingError: No mapping for the Unicode character in target multi-byte code page

Created on 29 Jan 2019 · 39 Comments · Source: HeidiSQL/HeidiSQL

Steps to reproduce this issue

Step 1: Import data from a large (>300 MB) SQL file, in the Query tab -> Load SQL file
Step 2: Click "Run files directly"
Then I get...
bugreport.txt

Current behavior

The program crashes. When "Continue" is clicked a loading window appears, saying "Reading next chunk from file", but it never does anything.

Expected behavior

The file should be imported.

Environment

HeidiSQL version: 10.1.0.5464
Operating system: Windows 10 x64 build 17763

bug confirmed

NikolaySTZ

👍1

Most helpful comment

I just spent two hours testing with this 1.5 GB of international strings. HeidiSQL repeatedly crashed with the above message when loading the 20 MB chunk after query 214. My earlier approach to such read errors was to increase the chunk size by 4 bytes and try again, up to 10 times, hoping that the broken multi-byte characters would then be read correctly. But this strategy failed for many files. I just changed two things for the next nightly build:

increase that padding from 4 Bytes to 1 MB
catch the above crash and show an error dialog instead

For me, your file now imports correctly.

By the way, if your import is very slow, like mine, then I recommend these settings in Preferences > Logging:
[screenshot]

ansgarbecker on 14 Apr 2019

❤2 👍2

All 39 comments

I am having exactly the same problem and I do not know how to solve it.
When I do the restore through Workbench it finishes correctly.
But I would like to do it in HeidiSQL.

athaydemirela on 8 Feb 2019

Hi! Exactly the same issue.

isma274 on 4 Apr 2019

Could you please tell me which encoding you selected in the file-open dialog?

ansgarbecker on 4 Apr 2019

I've got exactly the same issue.
I select the file only. "Encoding" drop-down has "UTF-8" already selected.
The file was created by HeidiSQL when I wanted to export SQL dump of the database.

Investigation results
It happens when the first byte of a multi-byte UTF-8 character sits at position 20971519 of the SQL file, which is the last position of the file's first 20 megabytes. Does HeidiSQL split input SQL files into 20-megabyte chunks?! WHY??! I mean, it's 2019, computers have tens of gigabytes of memory (mine has 32 GB + a huge swap file). Why not load the entire SQL file into memory?

izogfif on 5 Apr 2019
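
To illustrate the finding above outside of HeidiSQL: a strict UTF-8 decoder rejects a buffer that ends on the first byte of a two-byte character, even though the complete data is valid. This is only a minimal Java sketch of that effect, not HeidiSQL's Delphi code; the class name `SplitCharDemo` and the helper `decodesCleanly` are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitCharDemo {
    // Hypothetical helper: strictly decodes the bytes as UTF-8 and
    // returns false if the input is malformed or truncated.
    static boolean decodesCleanly(byte[] chunk) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(chunk));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The SQL comment "/*т*/": the Cyrillic letter takes two bytes (0xD1 0x82).
        byte[] data = "/*\u0442*/".getBytes(StandardCharsets.UTF_8);
        // A "chunk" that stops right after 0xD1, i.e. in the middle of the character.
        byte[] chunkEndingMidChar = Arrays.copyOfRange(data, 0, 3);

        System.out.println(decodesCleanly(data));               // true
        System.out.println(decodesCleanly(chunkEndingMidChar)); // false
    }
}
```

The same bytes decode fine as a whole; only the cut position makes the chunk invalid.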

Here is the test file: https://drive.google.com/open?id=153Lpi0xNY9BKsM1ubrxaqHz_kQ3FWgBo

izogfif on 5 Apr 2019

Thanks for the file attachment. I'll test that.

Did you check which encoding you selected in the file-open dialog?

ansgarbecker on 5 Apr 2019

Yes. UTF-8.

izogfif on 8 Apr 2019

I just stumbled across a UTF-8 issue in HeidiSQL on files with only one character in them. I realized that Delphi's TEncoding.UTF8 expects a preamble/BOM, which I worked around with an overridden encoding. I am hoping this fix also resolves the issue here. Please report back after testing the new nightly build. Thanks!

ansgarbecker on 13 Apr 2019

I have run into the same issue. While importing the SQL file I got: EEncodingError: No mapping for the Unicode character in target multi-byte code page. I manually selected UTF-8 in the import dialog.

Tested with nightly build: Revision 10.1.0.5526 from 13 Apr 2019 10:52
If you are interested I can share the test file. It is 1.5 GB, but sharing it is no problem if needed.

From my investigation, the problem could be related to splitting the big file, as izogfif said, but I am not 100% sure about this ...

milanjo on 14 Apr 2019

v10.1.0.5526 includes an attempt to fix this. Could you please update and try again?

ansgarbecker on 14 Apr 2019

Tested yesterday with yesterday's nightly build: Revision 10.1.0.5526 from 13 Apr 2019 10:52
Should I test with today's build, Revision 10.1.0.5527? I think not ...

milanjo on 14 Apr 2019

I overlooked that you had already tested with that version. I will run some more tests. It would be good to have a test file, even if it's big. Could you compress it and make it available to me somehow?

ansgarbecker on 14 Apr 2019

Here is the test file compressed using 7zip: https://we.tl/t-eKD2XA9e6j

Thank you very much for your effort, I will donate for sure this time ...
I have been using HeidiSQL for a couple of years :)

milanjo on 14 Apr 2019

Thanks for the testfile - I just downloaded it, so you may delete it if you'd like to.

ansgarbecker on 14 Apr 2019

Hello .. sorry for the big test file with its complicated table structures .. Anyway, it is working fine for me now! The whole file was imported into the database with no errors! There was also no error dialog after it finished ...

Tested with version: Revision 10.1.0.5528 from 14 Apr 2019 19:54
Thank you very much for your hard work. Have a nice rest of the weekend.

milanjo on 14 Apr 2019

I am closing this now. Just shout if you have a file which does not yet import correctly with the latest build.

ansgarbecker on 16 Apr 2019

Hi, I just experienced this on the latest nightly build

IainCoSource on 20 May 2019

@IainCoSource please post the encoding of that file, and the one you selected in the file-open dialog.
Also, if you can, mail me that file if it does not contain critical information.

ansgarbecker on 20 May 2019

Will send it after I remove the sensitive data.

IainCoSource on 20 May 2019

The problem has been fixed after I updated my HeidiSQL.
Thank you Ansgar.

mamaly12 on 15 Jun 2019

Hi
This is still happening on the latest version. Please advise

soopermouse on 3 Sep 2019

👍2

I'm still experiencing this issue too

lareeth on 3 Sep 2019

@soopermouse and @lareeth can you please verify you are selecting the right encoding in the file-open-dialog.

ansgarbecker on 3 Sep 2019

@ansgarbecker
TL;DR:

Edit source/apphelpers.pas
Change

BufferPadding = SIZE_MB;

to

BufferPadding = 1;

Full explanation:

If you check the changes for this issue, you'll be able to figure what's going on:

SQL dump is read in chunks.
When chunk boundary splits multi-byte Unicode character into pieces, the chunk contains an invalid character sequence.
An attempt to parse chunk using the specified encoding is made.
Since the chunk contains only a piece of multi-byte Unicode character, string decoding fails.
Chunk size is increased by 1 megabyte, then entire process is repeated. Up to 10 times.

Now guess what happens if there is a multi-byte character at every 1048576-byte (1 MB) boundary in the SQL file? No matter how many times you increase the chunk size by 1048576 bytes, you'll keep getting an error until you have read the file to the very end.

Here is the file with the test case: https://drive.google.com/open?id=1T56Kw32gObgKLRkaylHkx-jrZrhTS0Ad

It was generated via this Java program:
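```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class Main {
    public static void main(String[] args) throws Throwable {
        final int FILE_SIZE = 31 * 1024 * 1024;
        // Spacing between sequences; 1 MB is a multiple of it, so every 1 MB boundary is hit too.
        final int CHUNK_SIZE = 1024;
        byte[] bytes = new byte[FILE_SIZE];
        // The SQL comment "/*т*/"; the Cyrillic letter is the two bytes 0xD1 0x82.
        // The sequence itself is valid UTF-8; it only becomes invalid when split.
        byte[] invalidSequence = {0x2f, 0x2a, (byte) 0xd1, (byte) 0x82, 0x2a, 0x2f};
        Arrays.fill(bytes, (byte) 32); // fill the rest of the file with spaces
        // Start 3 bytes before each boundary so that 0xD1 is the last byte before it
        // and 0x82 is the first byte after it.
        for (int pos = CHUNK_SIZE - 3; pos + invalidSequence.length < bytes.length; pos += CHUNK_SIZE) {
            System.arraycopy(invalidSequence, 0, bytes, pos, invalidSequence.length);
        }
        // Backslashes must be escaped in a Java string literal.
        Files.write(Paths.get("C:\\temp\\heidi-invalid.sql"), bytes, StandardOpenOption.CREATE);
    }
}
```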

Proposal: set BufferPadding in source/apphelpers.pas to 1 byte. It's highly unlikely that any multi-byte encoding needs more than 10 bytes to encode a single Unicode character (10 being the number of attempts). So it should work.

izogfif on 4 Sep 2019
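
The argument in the comment above can be checked with a scaled-down simulation. The sketch below is not HeidiSQL's actual code: it assumes a reader that decodes a chunk, and on failure grows the chunk by a fixed padding and retries up to 10 times, as described in this thread. The class `PaddingRetryDemo`, its method names, and the 16-byte period standing in for 1 MB are all invented for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PaddingRetryDemo {
    static final int MAX_RETRIES = 10; // retry count mentioned in the thread

    // Strictly decodes data[0..length) as UTF-8; false if malformed or truncated.
    static boolean decodesCleanly(byte[] data, int length) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data, 0, length));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Assumed retry strategy: grow the chunk by `padding` after each failure.
    // Returns the first chunk size that decodes, or -1 if all attempts fail.
    static int findDecodableChunkSize(byte[] data, int chunkSize, int padding) {
        int size = chunkSize;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            int length = Math.min(size, data.length);
            if (decodesCleanly(data, length)) {
                return length;
            }
            size += padding;
        }
        return -1;
    }

    // Spaces, with a two-byte character (0xD1 0x82) straddling every multiple of `period`.
    static byte[] buildTestData(int period, int totalSize) {
        byte[] data = new byte[totalSize];
        Arrays.fill(data, (byte) ' ');
        for (int pos = period - 1; pos + 1 < totalSize; pos += period) {
            data[pos] = (byte) 0xD1;
            data[pos + 1] = (byte) 0x82;
        }
        return data;
    }

    public static void main(String[] args) {
        int period = 16; // stands in for 1 MB
        byte[] data = buildTestData(period, 40 * period);

        // Padding equal to the period: every retry lands on another split character.
        System.out.println(findDecodableChunkSize(data, 2 * period, period)); // -1
        // Padding of 1 byte: the first retry already includes the missing byte.
        System.out.println(findDecodableChunkSize(data, 2 * period, 1));      // 33
    }
}
```

With a padding that matches the spacing of the split characters, all eleven attempts end in the middle of a character; with a 1-byte padding the second attempt succeeds.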

I have the same issue

nmoreaud on 17 Sep 2019

Have the same issue on version 10.2.0.5599 and 4.13 GB database dump.
Selected UTF-8.

decadence on 17 Sep 2019

@izogfif I just followed your advice and set BufferPadding to a 1-byte increment. It sounds logical, though I don't expect it to work. It's more a question of which encoding the user selected in the file-open dialog, which some of the reporters did not mention in their comments. However, please update to the latest build in half an hour and try again.

ansgarbecker on 17 Sep 2019

@ansgarbecker yes, this fixes the issue. You can check it yourself using the sample SQL file I mentioned previously. It only contains comments, so it should be safe to import as UTF-8 on any database. Or generate such a file yourself.

izogfif on 27 Sep 2019

Great! Thanks for your help and feedback.

ansgarbecker on 27 Sep 2019

Hi,

I have the same issue on the latest version, downloaded today. I selected UTF-8 in the dialog box.
Can someone assist, please?

roughed on 3 Jun 2020

@roughed this is an old and closed thread. Please file a new issue, but first make sure you are using the latest HeidiSQL build. In that new issue, it may be helpful to see some details about the file you loaded, e.g. its encoding and size, and the charset of the target tables.

ansgarbecker on 3 Jun 2020

Also having the same issue. This is not fixed

Nottt on 26 Jun 2020

@Nottt this is a closed issue - if you really have this issue with the latest HeidiSQL build, then please attach a sample file here.

ansgarbecker on 26 Jun 2020

@Nottt make sure to use the same database collation/charset.

@ansgarbecker Maybe an idea would be to make the error message a little easier to understand?

For example, when a user tries to import a database from one collation/charset into another collation/charset, he will get the error stated in this issue's title. Maybe it would help to point out that the charset/collation of the target database needs to be the same as the one used when exporting?

tobiasgraeber on 24 Aug 2020

The error is still observed in build 11.0.0.6096, please reopen the issue.

zxweed on 2 Sep 2020

👍1

Please look at the "new" report from @roughed : #1048 .

ansgarbecker on 3 Sep 2020

For me this error was caused by trying to import a more recent backup from the server to my desktop. It was a minor difference, something like MariaDB 10.4.2 to MariaDB 10.4.1, but it still caused issues.

So if someone is having this, make sure the versions of the databases you are running are identical down to the last number.

Nottt on 4 Sep 2020

I have the same