Notepad3: Can't handle files >=4GB

Created on 27 Oct 2019 · 25 Comments · Source: rizonesoft/Notepad3

I tried opening a 5.76 GB file with Notepad3 5.19.815.2595 (x64), but it stopped displaying the file at around 1.76 GB. Search and everything else also stop working past that point.
The problem appears to be here: https://github.com/rizonesoft/Notepad3/blob/68f0e44cfa6add6ddb820e3fb80923819701cc0b/src/Edit.c#L983
The high DWORD of the file size is ignored there, so only the low 32 bits of the size are used.
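
A minimal sketch of this failure mode, in portable C rather than the actual Notepad3 code: Win32 reports a file size as two 32-bit halves, and dropping the high half leaves only the size modulo 4 GiB. The helper names are hypothetical, and the byte count is an approximation of the 5.76 GB file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuild the full 64-bit size from the two 32-bit
   halves that Win32 reports (high DWORD and low DWORD). */
static uint64_t full_file_size(uint32_t size_high, uint32_t size_low)
{
    return ((uint64_t)size_high << 32) | (uint64_t)size_low;
}

/* What a loader effectively sees if it ignores the high DWORD. */
static uint32_t truncated_file_size(uint64_t real_size)
{
    return (uint32_t)(real_size & 0xFFFFFFFFu);
}

/* Usage example: a file of about 5.76 GiB looks like about 1.76 GiB. */
static void demo_truncation(void)
{
    const uint64_t real_size = full_file_size(1u, 1889785610u);
    assert(real_size == 6184752906ull);                    /* ~5.76 GiB */
    assert(truncated_file_size(real_size) == 1889785610u); /* ~1.76 GiB */
}
```

This matches the symptom in the report: the visible part of the file ends where the low 32 bits of the size say it does.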

Scintilla upstream enhancement / feature req.

All 25 comments

This will only be the first problem to surface; I expect many others for files larger than 2 GB. Large-document support in Scintilla is also still in beta (see also: https://www.scintilla.org/ScintillaDoc.html#MultipleViews : SC_DOCUMENTOPTION_TEXT_LARGE).

Besides the expected "sluggish" behavior of the lexers (better to switch to Lexer_NULL for files this large), I expect other operations, such as search and replace, to be inconveniently slow.
On the other hand, sensible text documents of this size are a "pathological case", which lowers the priority of "very large file handling" by an order of magnitude ... :thinking:

I think Notepad3 should at least show a warning and refuse to load (or something similar) when someone tries to open a file this large. Additionally, you might want to check slightly smaller sizes too, like 4GB-16 (which ends up at an allocation size of 0 bytes), to make sure they don't crash the program. Files of such sizes also currently bypass the file size warning.
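
To illustrate the 4GB-16 case: if the buffer size is computed in 32 bits as file size plus some padding, the sum can wrap around to 0. The sketch below assumes a 16-byte padding (inferred from the figure above, not checked against the Notepad3 source) and shows a checked alternative. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PADDING 16u /* assumed padding, for illustration only */

/* Unchecked 32-bit arithmetic: the result wraps around modulo 2^32. */
static uint32_t alloc_size_unchecked(uint32_t file_size)
{
    return file_size + PADDING;
}

/* Checked version: work in 64 bits and refuse files larger than `limit`.
   On success, stores the allocation size in *out and returns true. */
static bool alloc_size_checked(uint64_t file_size, uint64_t limit, uint64_t *out)
{
    assert(out != NULL);
    if (file_size > limit) {
        return false; /* caller should warn and refuse to load */
    }
    if (file_size > UINT64_MAX - PADDING) {
        return false; /* the addition itself would overflow */
    }
    *out = file_size + PADDING;
    return true;
}
```

With the unchecked version, a file of 0xFFFFFFF0 bytes (4 GiB minus 16) yields an allocation size of 0. The checked version rejects it whenever `limit` is below that size.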

(also just a sidenote: my file was in fact a sensible text document, namely a log file)

It's not slow when idle styling is used; see https://github.com/zufuliu/notepad2/issues/125.

Hi @RaiKoHoff, there are bugs in WideCharToMultiByteEx() and MultiByteToWideCharEx(): they don't handle UTF-16 surrogate pairs (CharNextW, CharPrevW) or DBCS multibyte characters (IsDBCSLeadByteEx, CharNextExA, CharPrevExA).
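
One way a chunked conversion can go wrong is by splitting the input between the two halves of a surrogate pair. Below is a minimal, portable sketch of choosing a safe split point for UTF-16 input. The function name is hypothetical and this is not the Notepad3 code. DBCS input would need an analogous lead-byte check (for example with IsDBCSLeadByteEx on Windows).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Return a chunk length <= max_chunk that does not end between a high and
   a low surrogate. `len` is the number of UTF-16 code units remaining. */
static size_t safe_utf16_chunk(const uint16_t *text, size_t len, size_t max_chunk)
{
    assert(text != NULL && max_chunk >= 2);
    if (len <= max_chunk) {
        return len; /* everything fits in one chunk */
    }
    size_t n = max_chunk;
    /* If the chunk would end on a high surrogate whose low half is the next
       unit, back off by one so the pair stays together in the next chunk. */
    if (is_high_surrogate(text[n - 1]) && is_low_surrogate(text[n])) {
        n--;
    }
    return n;
}
```

A caller would convert `safe_utf16_chunk(...)` code units at a time and advance by the returned length until the input is exhausted.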

Hi @zufuliu, thanks for the hint. It was just an idea for working around the 2GB (MAX_INT) limitation; I haven't tested or used it yet ... :thinking:

Hello @namazso , for very large files, I recommend for example: PilotEdit 13.3.0

PilotEdit is a handy and reliable file (text- and hex-) editor designed to help users to execute scripts, extract strings and edit large files.

Features:

  • PilotEdit is four times faster than PilotEdit Lite when opening huge files in ASCII mode.
  • Edit huge files of 400GB (40 billion lines) in quick mode.
  • Compare and merge two huge files of 100GB (10 billion lines).
  • Encrypt/decrypt files larger than 10GB.
  • Edit an encrypted file transparently.
  • Sort a huge file of 1GB.
  • Find/remove duplicate lines in a file larger than 1GB.
  • Extract strings matching a regular expression.
  • Execute PilotEdit scripts to replace strings automatically.
  • Automatically detect start tag and end tag.
  • Format source code.
  • Edit, download/upload large files through SFTP.
  • Highlight all occurrences of selected word.
  • Replace millions occurrences of strings in a huge file in quick mode.
  • Change the encoding of big files.
  • Code Collapse. ...

Hello @RaiKoHoff ,
With this link, you can download 3 log files to test the 2GB limit: Test_files_size_2GB_limit.rar

The file: Size_2.01 GB (2065 MB - 12.231.271 lines).log produces this dialog. 👍

[Screenshot of the dialog: Size_2 01 GB (2065 MB - 12 231 271 lines)]

The file: Size_1.99 GB (2042 MB - 12.095.368 lines).log opens (on a fast i7 system) after:

  • 1 min. 10 sec. with Notepad3 (64-bit) v5.19.1114.2674 BETA 🐌
  • 1 sec. with EditPadLite 7.6.5 😮
  • 12 sec. with EditPlus 5.2.2386
  • 8 sec. with Notepad++ v7.8.1
  • 21 sec. with Notepad2 (original) 4.2.25
  • 21 sec. with Notepad2-mod 4.2.25.998
  • crash with Notepad2e R92
  • 5 sec. with Notepad2-zufuliu 4.19.11r2524
  • 5 sec. with SciTE 4.2.1
  • 15 sec. with VSCode 1.40.1

@hpwamr how about Notepad2? You can test the AVX2 build.
Using idle styling makes loading bigger files fast.

Hello @zufuliu, indeed your AVX2 build opens a big file very fast.
I will update my previous list with the results for your Notepad2 and some other editors.
PS: What do you mean by "idle styling"?

Use SciCall_SetIdleStyling(SC_IDLESTYLING_ALL), or any other value except SC_IDLESTYLING_NONE.

Also, don't call SciCall_ColouriseAll() in Style_SetLexer(); call it only where colourising the whole document is needed, e.g. when toggling all folds.

SciTE uses SC_DOCUMENTOPTION_STYLES_NONE for bigger files (the default threshold is 10MB).
Notepad++ can't open files larger than 2GB because it explicitly casts Scintilla API return values to int.
EditPadLite might use file mapping.
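
A sketch of a SciTE-style size policy for choosing Scintilla document options. The constants are redeclared with the values from Scintilla.h only to keep the example standalone; real code should include Scintilla.h. The function name and the idea of passing the threshold as a parameter are assumptions, not code from SciTE or Notepad3.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the Scintilla.h constants, under local names. */
enum {
    DOC_OPTION_DEFAULT     = 0,     /* SC_DOCUMENTOPTION_DEFAULT */
    DOC_OPTION_STYLES_NONE = 0x1,   /* SC_DOCUMENTOPTION_STYLES_NONE */
    DOC_OPTION_TEXT_LARGE  = 0x100  /* SC_DOCUMENTOPTION_TEXT_LARGE */
};

/* Hypothetical policy: drop style storage above `no_style_threshold` bytes
   and request large-document support once the size no longer fits in a
   signed 32-bit integer. */
static int choose_document_options(uint64_t file_size, uint64_t no_style_threshold)
{
    int options = DOC_OPTION_DEFAULT;
    if (file_size > no_style_threshold) {
        options |= DOC_OPTION_STYLES_NONE;
    }
    if (file_size > (uint64_t)INT32_MAX) {
        options |= DOC_OPTION_TEXT_LARGE;
    }
    return options;
}
```

The result would be passed as the documentOptions argument when the document is created (for example with SCI_CREATEDOCUMENT), since these options cannot be changed on an existing document.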

@zufuliu: Sorry, we haven't found the time to look at your solution yet, but I agree that it is the right way.
Using the "file mapping" technology would be a lot more effort, especially since we want to keep supporting the AES encryption feature.

@RaiKoHoff basic changes:

  1. Avoid Sci_ApplyLexerStyle(0, -1); in Style_SetLexer; it can be changed to SCI_STARTSTYLING(0).
  2. Use https://www.scintilla.org/ScintillaDoc.html#SCI_SETIDLESTYLING in _InitializeSciEditCtrl().
  3. Profile the exe. "1 min. 10 sec." looks suspicious; it is even slower than the old GDI-based Notepad2-mod.
  4. IsUTF8() (takes 1~2 seconds for a 2GB file on my i5 system) and EditDetectEOLMode() (about 200ms for a 2GB file) have SSE2/AVX2 acceleration.
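
For reference, a plain scalar sketch of what an EOL-mode detector does: count CRLF, LF and CR line endings in one pass and pick the most frequent. This is not the Notepad2 implementation (which uses SSE2/AVX2); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EOL_CRLF, EOL_LF, EOL_CR } EolMode;

/* Count line endings in one pass and return the dominant mode, or
   `fallback` if the buffer contains no line ending at all. */
static EolMode detect_eol_mode(const char *buf, size_t len, EolMode fallback)
{
    assert(buf != NULL || len == 0);
    size_t crlf = 0, lf = 0, cr = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                crlf++;
                i++; /* skip the '\n' that belongs to this CRLF */
            } else {
                cr++;
            }
        } else if (buf[i] == '\n') {
            lf++;
        }
    }
    if (crlf == 0 && lf == 0 && cr == 0) {
        return fallback;
    }
    if (crlf >= lf && crlf >= cr) {
        return EOL_CRLF;
    }
    return (lf >= cr) ? EOL_LF : EOL_CR;
}
```

A vectorised version would compare many bytes per instruction against '\r' and '\n', which is where the speedup reported above would come from.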

About "file mapping": I think it's more complicated. I haven't worked out how to avoid loading the entire file into memory while still keeping correct syntax coloring and folding (all files in my Notepad2 have code folding). My only experience is with Vim (for viewing huge SQL dumps), which can load files larger than 4GB with coloring (it seems Vim loads file chunks on demand, but I haven't checked its source code).

Just tested file mapping (based on the example from https://docs.microsoft.com/en-us/windows/win32/memory/creating-a-view-within-a-file). Mapping the entire file is indeed faster than reading the entire file. The other benefit of file mapping is that the API supports files larger than 4GiB.
https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-mapviewoffile
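
When mapping only part of a file, MapViewOfFile takes the view offset as a high and a low DWORD, and the offset must be a multiple of the system allocation granularity. The portable sketch below covers only that arithmetic. The struct and function names are hypothetical, and the granularity is passed in as a parameter (on Windows it would come from GetSystemInfo).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t offset_high; /* high DWORD of the aligned view offset */
    uint32_t offset_low;  /* low DWORD of the aligned view offset */
    uint64_t view_delta;  /* bytes from view start to the requested offset */
    uint64_t view_size;   /* bytes to map, clamped to the end of the file */
} ViewParams;

/* Compute an aligned view that covers `want` bytes starting at
   `file_offset`, without mapping past the end of the file. */
static ViewParams view_params(uint64_t file_offset, uint64_t want,
                              uint64_t file_size, uint32_t granularity)
{
    assert(granularity != 0 && file_offset < file_size);
    ViewParams p;
    const uint64_t start = file_offset - (file_offset % granularity);
    p.offset_high = (uint32_t)(start >> 32);
    p.offset_low = (uint32_t)(start & 0xFFFFFFFFu);
    p.view_delta = file_offset - start;
    /* Bytes available after the requested offset; clamp `want` to it. */
    const uint64_t room = (file_size - start) - p.view_delta;
    p.view_size = p.view_delta + (want < room ? want : room);
    return p;
}
```

The caller would read from the mapped base address plus `view_delta`, so offsets beyond 4 GiB stay reachable without mapping the whole file.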

@zufuliu : Exactly what I expected: Creating only a Mapped View on huge files should be faster :smiley:.
What about changing and saving that huge file ?
What about searching and replacing strings spread over the huge file?

On the other hand, I have to dig deeper in that issue, maybe Notepad3 may not support a huge encrypted file (if it uses a stream cipher ...).

@RaiKoHoff searching is OK, including marking all occurrences. Editing (a rare operation for huge files) is very slow, with huge memory/CPU usage (resizing and moving). Saving is not slow (using WriteFile). Saving via file mapping (and range pointer) is untested, but I think it should be faster than WriteFile.

Anyway, loading the entire file into memory is not the proper way to view huge files, but it's the only way supported by current Scintilla.

Hello @hpwamr ,

Feel free to test the RC2 version "Notepad3Portable_5.20.225.3_RC2.paf.exe.7z" or higher.
See "Notepad3 BETA-channel access #1129" or here Notepad3Portable_5.20.225.3_RC2.paf.exe.7z or from site_2.

Note: "Notepad3Portable RC2" can be used in "2 flavors", see with/without ext.: ".7z" or from site_2.

Your comments and suggestions are welcome... 😃

Hello @namazso ,

Today, I've tested Notepad3 (64-bit) v5.20.304.1 RC2 with the file :

  • Size_2.01 GB (2065 MB - 12.231.271 lines).log

The file is opened after +/- 8 sec on a fast i7 system.

Note: The maximum size is changed from 2 GB to 4 GB. 👍

As far as I'm concerned, I think you (requester) can close this issue...

@hpwamr well, the original issue was about a 5.76 GB file, so I think it could stay open. However, I appreciate the increased limit.

We can leave this issue open, until the Scintilla Component can handle files bigger than 4GB ...

@hpwamr what's the time on your fast computer for SciTE 4.4.5, Notepad2 4.20.09 (AVX2, x64 and Win32 builds), Notepad3 with Scintilla 4.4.5?

Hello @zufuliu ,

Download test files: "Test_big_files_till_4GB.rar"

The test file: "Size_3.98 GB (4086 MB - 24.190.736 lines).log" opens (on a fast i7 system) after:

  • 7 sec. with Notepad2_en_x64_v4.20.08r3252 (Scintilla 4.4.4)
  • 6 sec. with Notepad2_en_AVX2_v4.20.08r3252 (Scintilla 4.4.4)
  • 5 sec. with Notepad2_en_x64_v4.20.09r3288 (Scintilla 4.4.5)
  • 4,7 sec. with Notepad2_en_AVX2_v4.20.09r3288 (Scintilla 4.4.5)
  • 14 sec. with Notepad3Portable (x64) v5.20.913.2 rc (Scintilla 4.4.4)
  • 12,7 sec. with Notepad3Portable (x64) v5.20.917.1 beta (Scintilla 4.4.5)

Hi @hpwamr, thank you 👍. What about the times for the AVX2 builds (v4.20.08r3252, v4.20.09r3288)?

For Notepad3, there will be improvements after Scintilla 5 (maybe in 5.1 or 5.2).
For now, I suggest updating EditDetectEOLMode to remove the eol_table, which turns out to be slower. The latest code for EditDetectEOLMode is available at https://github.com/zufuliu/notepad2/blob/master/src/Edit.c#L742

@zufuliu: Thank you for the enhancement suggestions. I will check your latest code for EOL detection soon... Best regards 👍

Hello @zufuliu ,
I've just updated my above list: https://github.com/rizonesoft/Notepad3/issues/1713#issuecomment-692164843

@hpwamr Thanks.
