Ghidra: Will Java always be the primary language of Ghidra?

Created on 21 Mar 2019 · 4 Comments · Source: NationalSecurityAgency/ghidra

Compared to other tools such as IDA, Ghidra feels quite sluggish even on an i9 machine. I can only assume this is because its roots are in Java, but maybe it's just an algorithm issue?

Anyway, I'm not trying to start a flame/language war here, but is Java always going to be the main/go-forward language?

If there was enough community backing to port to, say, C++/Qt, would it be considered and would it be accepted?

Also, I'd be interested to know why this was the original language of choice. I'd assume it's just because it's what the NSA internal devs were most familiar with at the time?

Most helpful comment

There are many reasons Ghidra might appear sluggish: the visible plugins that are displaying and updating information, background analysis on a program (running 6 decompilers in the background), the complexity of the analysis being performed (fixing damage from non-returning functions), or the size of the program. Even with very large programs, I hope you will find Ghidra performs essentially the same.

One startup cost when first loading a program is the automated re-compilation of language modules that appear to have changed. Sometimes when unzipping the files, the timestamp on a file will be reset. This causes the processor module for the binary being opened to be re-compiled to a .sla file. This should only happen once for that processor; subsequent uses of the processor will use the latest compiled .sla file. We've tried to limit this re-compile by providing pre-compiled .sla files, but occasionally the timestamps are reset, or timestamps that are too close together are put on the files at build time.
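To make the timestamp issue concrete, here is a minimal sketch of the kind of "is the compiled file stale?" check described above. It is not Ghidra's actual rebuild logic: the class name, method names, and the rule "recompile unless the compiled file is strictly newer than its source" are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

/** Illustrative only: a hypothetical staleness check, not Ghidra's real code. */
final class SlaStalenessCheck {

    /**
     * Assumed rule: the compiled output is stale unless it is strictly newer
     * than the source. Equal timestamps therefore also count as stale.
     */
    static boolean isStale(FileTime sourceTime, FileTime compiledTime) {
        return compiledTime.compareTo(sourceTime) <= 0;
    }

    /** File-based wrapper: a missing compiled file always needs a compile. */
    static boolean needsRecompile(Path source, Path compiled) throws IOException {
        if (!Files.exists(compiled)) {
            return true;
        }
        return isStale(Files.getLastModifiedTime(source),
                       Files.getLastModifiedTime(compiled));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SlaStalenessCheck <source-file> <compiled-file>");
            return;
        }
        Path source = Paths.get(args[0]);    // e.g. the language source file
        Path compiled = Paths.get(args[1]);  // e.g. the matching .sla file
        System.out.println(needsRecompile(source, compiled) ? "stale" : "up to date");
    }
}
```

Under a rule like this, an unzip tool that resets every file to the same timestamp makes the compiled file look stale, which would match the one-time re-compile described in the comment.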

Another potential issue on the i9 is the background threads that Java uses for garbage collection. On a processor with many available cores/threads we've seen some issues with GC thrashing. If this is the case, there is a setting to restrict the number of GC threads that might not be taking effect in your case.

Occasionally, plugins that display information which must be constantly updated due to changes in the background, or which is costly to compute, need to be re-designed. For example, the data type archives have historically had performance problems when processing large programs, and those have been addressed. An attempt is made to maintain interactivity and reduce unnecessary updates with good event reaction management. The code is profiled from time to time when sluggishness is noted, to identify the culprit.

If you note some sluggishness, it may be that there is computation going on in the background to display the data. For example, the code browser is doing an amazing number of controlled, complex tasks to provide maximum configurability and maintain interactivity during analysis, all while smoothly scrolling with many types of pluggable fields displaying unknown information. These costs would be borne in any GUI language chosen.

There are many advantages and some sacrifices to using Java. A few advantages come to mind that I depend on:

Ghidra rarely loses work due to bugs in the Ghidra base, contributions, or scripts, for example out-of-memory errors. Previously, libraries of C/C++ code were used via JNI, but it was found that bugs in that code caused crashes and loss of data at the worst time. Using C/C++ directly has been discouraged.
Scripts (and Ghidra) are fully interactively debuggable in modern development environments, with the ability to push changes to code on the fly while debugging, without re-setting up the issue under inspection.
Originally the portability of the GUI was an issue (Windows, Linux, Mac); however, Qt has gotten better on that front. Also, at the time of choice (not trying to get into a comparison of other options), Java was the only language that could provide the on-the-fly extensibility necessary for Ghidra to do what it does.

Normally Ghidra runs with only 1 GB of memory available to it, and I've found I rarely need to bump it up unless there is a greedy algorithm that needs to be run. The memory in Java is normally mostly just empty overhead to allow processing to be quick without excessive garbage collection. I have 125M in use with several programs up; Java is using 450M with about 300M free.
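For readers who want to see where numbers like these come from, the sketch below prints the same three quantities for whatever JVM it runs in, using the standard `Runtime` API. The class and helper names are made up for illustration, and mapping "Java is using" to `totalMemory()` and "free" to `freeMemory()` is my assumption about what the figures above refer to.

```java
/** Illustrative only: prints heap figures for the current JVM. */
final class HeapSnapshot {

    /** Heap actually occupied by objects = heap claimed from the OS minus its free part. */
    static long usedBytes(long totalBytes, long freeBytes) {
        return totalBytes - freeBytes;
    }

    /** Formats a byte count as whole mebibytes, e.g. "150M". */
    static String toMegabytes(long bytes) {
        return (bytes / (1024L * 1024L)) + "M";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();      // upper limit the heap may grow to
        long total = rt.totalMemory();  // heap currently claimed by the JVM
        long free = rt.freeMemory();    // unused part of the claimed heap

        System.out.println("max heap:     " + toMegabytes(max));
        System.out.println("claimed heap: " + toMegabytes(total));
        System.out.println("free in heap: " + toMegabytes(free));
        System.out.println("in use:       " + toMegabytes(usedBytes(total, free)));
    }
}
```

The gap between "claimed" and "in use" is the headroom mentioned above: it is held by the JVM so that allocation stays cheap and collections stay infrequent.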

Ghidra provides a tremendous amount of capability using Java. The advantages and disadvantages of moving to another architecture would need to be weighed very carefully, starting with a re-design of the sluggish algorithms, before another option could merit the time and potential loss of capability, versus just enjoying using the tool ;-)

All 4 comments

I think that the subsequent modules may be developed independently, and perhaps the core modules will gradually turn to other languages, such as C++.

As I am using an i9 system, any chance you could provide more details on how to prevent the GC thrashing? (Btw, that wouldn't be an issue with C++ ;)).

Also, I think most users probably come from IDA or the like, which just feels a lot snappier in general. I'm not saying it can't be fixed while staying with Java, but it is an observation worth noting.

IDA has a freeware version if you'd like to compare using some samples.

Edit: Btw, forgot to say thanks for the detailed response 👍

You can control the number of GC threads with the -XX:ParallelGCThreads= JVM argument. If you look at support/analyzeHeadless, you will see we restrict it to 2 because it may be a common use-case to start up several instances of the headless analyzer concurrently, and if we didn't restrict the number of GC threads, each process would create a GC thread for every available core on the system. For a system with a lot of cores, this can cause system-wide performance problems.

If you are launching the Ghidra GUI with the ghidraRun script, the number of GC threads is not explicitly defined, so by default it will start one thread per core. Since you most likely have only one instance of the GUI running at a time, this shouldn't be a problem, but it's certainly a setting you can play with to see if it helps you.
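If you want to check what a given JVM was actually started with, a small sketch like the following can help. It uses only standard `java.lang.management` calls; the class name and the `explicitGcThreads` helper are hypothetical, and it only reports whether `-XX:ParallelGCThreads=` was passed on the command line, not the thread count the JVM chose on its own.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

/** Illustrative only: reports GC-related startup settings of the current JVM. */
final class GcThreadReport {

    static final String FLAG_PREFIX = "-XX:ParallelGCThreads=";

    /**
     * Returns the value given to -XX:ParallelGCThreads=, if the flag appears
     * in the argument list. If it is repeated, the last occurrence is returned.
     */
    static Optional<String> explicitGcThreads(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(FLAG_PREFIX))
                .map(arg -> arg.substring(FLAG_PREFIX.length()))
                .reduce((first, second) -> second);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("available processors: " + cores);
        System.out.println("ParallelGCThreads:    "
                + explicitGcThreads(jvmArgs).orElse("(not set explicitly)"));

        // List the collectors in use and how much work they have done so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running it once as `java GcThreadReport.java` and once as `java -XX:ParallelGCThreads=2 GcThreadReport.java` shows the difference between the flag being unset and being restricted to 2, the value mentioned above for support/analyzeHeadless.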
