I want to integrate Ghidra RE tools into an application which does not expect the user/runner to have Ghidra installed. Leveraging the tools has proven to be quite difficult as most seem to expect DB objects -- which I can't seem to create without a project.
I've noticed that no one else has raised any issues relating to this use case, and there is very little documentation on it in the support files. Is there a best-practices guide or any set of examples that provides some insight into how to use Ghidra in external applications?
Minor update: I found the mechanism for accessing the application in jar mode:
Object consumer = new Object();
Application.initializeApplication(new GhidraJarApplicationLayout(), new HeadlessGhidraApplicationConfiguration());
Program program = AutoImporter.importByUsingBestGuess(programFile, null, consumer, new MessageLog(), TaskMonitor.DUMMY);
// do stuff with it
program.release(consumer);
This allows me to do a majority of what I needed, but having a best-practices or examples document/repo would still be quite helpful.
Here's one: https://github.com/nshalabi/Coding-Ghidra
What you did is fine, but you could simplify things a bit if you use the HeadlessAnalyzer class, which initializes the Application for you, and ultimately uses the AutoImporter.
HeadlessAnalyzer headless = HeadlessAnalyzer.getInstance();
headless.processLocal(...);
@ryanmkurtz Ooh, fantastic -- I'm assuming there are other optimisations as well that are simply not listed. Upon review, this is less optimal for my use case -- I have no desire to create any project files.
@polsab The link you provided is interesting, but I don't think that's optimal, either -- I have no intention of creating any project files (or, if possible, of touching disk at all).
I have raised #799 in association with this issue, as I expect there to be several use-cases for this (e.g. CI/CD for plugin development, applications/libraries developed using Ghidra, etc)
Could you outline your specific use case (i.e., workflow)? There are many possible use cases, and knowing yours would help us respond appropriately. All use cases will require disk access (or a RAM disk), temporary file storage, etc. Frequently a temporary/transient Ghidra project may be utilized for staging files where the use of a Ghidra Server is involved.
I would like to open programs, analyse them, and close them without the use of the Ghidra frontend or headless mode, but through the application itself. I would prefer not to save project files as I am not creating projects, simply analysing programs passed to the library and then discarding them. The resulting information garnered from this single analysis would be passed to other programs for further processing. I wish to avoid requesting that users install Ghidra and centralise the processing, so it is not appropriate to use either visual or headless mode and provide a script.
To be clear, I am not specifically requesting documentation or examples for my use case, but for the general use of Ghidra as a single jar. For example, I have noticed significant differences in decompiled output between single jar and frontend/headless mode (e.g. no resolution of libc functions). Some differences in behaviour are non-trivial and are difficult to resolve without extensive knowledge of the design of Ghidra.
Ultimately, my question is: What additional considerations must be taken while using the single jar vs Ghidra frontend or headless? Can documentation on those differences be added to READMEs in Ghidra so that developers using Ghidra as a library can do so more effectively?
The Ghidra jar mode can still utilize the HeadlessAnalyzer.getInstance() to perform work via the API. The HeadlessAnalyzer class can facilitate quite a few non-GUI and non-command-line use cases for cloud computing deployments. The script for building a ghidra.jar can be tailored to limit the content.
It is unclear what you mean by resolution of libc functions. For ELF, external function symbols should be identified as Imports. The differences you see may be a function of the import mechanism, the associated import options, and the use of auto-analysis with its options. Examine and compare the Program properties of the two resulting Ghidra programs.
I understand that the headless analyzer can be used to facilitate these actions, but they explicitly require project information -- which I do not want to use.
Taking your advice (sort of?) and looking into the behaviour of HeadlessAnalyzer a bit deeper, I was successfully able to complete full analysis rather than just minimal analysis by adding the following lines:
int analyse = program.startTransaction("ANALYSE");
FlatProgramAPI api = new FlatProgramAPI(program);
api.analyzeAll(program);
program.endTransaction(analyse, true);
I understand the need for disk access now (looking at the tmp dir) as it is used for db caching, relating to your previous comment.
I suspect that your ultimate advice for using Ghidra as a library is "instantiate the HeadlessAnalyzer and use it like a script", but this is not the means by which I had hoped to use Ghidra as a library. If this is the best mechanism by which to use Ghidra, then I guess my request for documentation on project-less Ghidra as a library is somewhat moot as it likely involves using Ghidra in possibly unintended manners.
Sorry for the delayed response. Yes, the headless analyzer (or doing what the headless analyzer is doing internally) is our best way of doing analysis when using Ghidra as a library. Not involving the project-mechanism is out of scope for us right now.
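Since some on-disk project or temporary storage appears unavoidable, one way to keep it transient is to stage everything in a throwaway directory and delete it once the analysis result has been extracted. The following is only a minimal sketch of that idea using the Java standard library. `TransientProjectDir` and `withTransientDir` are hypothetical names that are not part of Ghidra, and the Ghidra call itself is left as a placeholder comment rather than guessed at.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Function;
import java.util.stream.Stream;

/** Hypothetical helper (not a Ghidra class): runs work against a throwaway directory, then deletes it. */
final class TransientProjectDir {

    private TransientProjectDir() {}

    /** Creates a temp directory, passes it to the callback, and always removes it afterwards. */
    static <T> T withTransientDir(String prefix, Function<Path, T> work) {
        Path dir;
        try {
            dir = Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            return work.apply(dir);
        } finally {
            deleteRecursively(dir);
        }
    }

    private static void deleteRecursively(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order so that children are deleted before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String summary = withTransientDir("ghidra-transient-", dir -> {
            // Placeholder: a real caller would use `dir` as the project location for
            // the headless analyzer (or the import code shown earlier in this thread)
            // and return only the extracted results, not any Ghidra objects.
            return "analysed in " + dir.getFileName();
        });
        System.out.println(summary);
    }
}
```

The callback should return plain data only, because anything still backed by files in the directory becomes invalid once the directory is deleted. This does not avoid disk access; it only ensures nothing is left behind for the user to manage.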