Hello, I have many raw binary images that I imported using the headless analyzer. These files contain MIPS instructions that reference functions and data in some of the other imported raw files. How can I make the tool understand that such an address is valid but is contained in another file from the same project?
It's going to be a bit hacky, but you could possibly do it with the ExternalManager. In the program making the reference, you would need to create a library, create an external function in it, and then have at least one thunk to the created function in the EXTERNAL address space. (I don't fully understand why the thunk is a requirement.)
Then, in the binary that contains the function being called, you need to add an external entry point at that function's entry point.
I understand this may be a bit unclear, as I'm going off the top of my head and don't have the documentation in front of me. I have an example that I'll post later, as I was recently playing with this.
You should also be able to do it directly through the GUI. In the external programs window, create a new external program for the external binary and set its path. Then, in the corresponding entry under the imports category of the symbol tree, add an import for the function and set the symbol name and its address in the external binary. You should then be able to refer to it via an external reference or by thunking to the function in the EXTERNAL address space.
To do this from the GUI: fill out the import information, then create a reference to the created external location. That should do the job.
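Here is a crude example as a Ghidra script:
import ghidra.app.script.GhidraScript;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.Data;
import ghidra.program.model.listing.Function;
import ghidra.program.model.listing.Library;
import ghidra.program.model.symbol.ExternalLocation;
import ghidra.program.model.symbol.ExternalManager;
import ghidra.program.model.symbol.SourceType;

public class DummyScript extends GhidraScript {
    @Override
    public void run() throws Exception {
        // Placeholders: replace with the other program's name, the target's label, and the referencing address.
        String moduleName = "other_image.bin";
        String externalLabelName = "external_function";
        Address address = currentAddress;
        ExternalManager manager = currentProgram.getExternalManager();
        Library library = manager.addExternalLibraryName(moduleName, SourceType.ANALYSIS);
        // The address argument is the address in the external program, not the local function address; null leaves it unset.
        ExternalLocation loc = manager.addExtFunction(
            library, externalLabelName, null, SourceType.ANALYSIS);
        // Option 1: make the function at the referencing address a thunk to the external function.
        Function function = getFunctionAt(address);
        if (function != null && !function.isThunk()) {
            function.setThunkedFunction(loc.createFunction());
        }
        // Option 2: add an external reference from existing data at the address.
        Data data = getDataAt(address);
        if (data != null) {
            createExternalReference(data, library.getName(), externalLabelName, loc.getAddress());
        }
    }
}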
By "make the tool understand that the address is valid", what specifically would you want to happen when they are known to be valid?
I'm assuming you want the analysis to create pointers to those other programs.
You can just create an uninitialized block of memory that represents the address space available in the other programs. Ghidra will then recognize those locations. I'm assuming you are trying to get as automated as possible. You can always add individual external references by hand.
Alternatively you can just "import->add to program" the bytes from the other binaries into your first program.
Automatically creating external references for navigation to the other programs would be a little trickier. You could make the fake uninitialized blocks, let Ghidra make references to them, and then write a script similar to @astrelsky's that changes any reference into a fake block into an external reference to the other program.
An interesting idea would be to add the concept of "external memory mapped blocks" that would automatically add external references to the associated external program whenever a reference falls within such a block.
To do this automatically in headless mode, you would need to define all memory blocks for the external programs in some automated fashion, say in a script that knew the full memory map. You could also have a script that just creates uninitialized memory blocks for all memory, except maybe the zero block, so that you don't need to know the individual blocks.
@astrelsky: I believe the thunk is necessary to allow the function signature to be placed on the external location, so that the decompiler can pick up the function signature.
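The core of such a script is deciding which external program a reference target belongs to. Here is a minimal, Ghidra-independent sketch of that lookup. The class name `ExternalImageMap`, its methods, and the example program names and addresses are all hypothetical; a real script would fill the map from the project's actual load addresses and then rewrite each matching reference.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical lookup from a target address to the external program that owns it. */
final class ExternalImageMap {
    /** One external raw image: its program name, load address, and size in bytes. */
    private record Image(String programName, long start, long length) {}

    // Images keyed by load address so the candidate owner is found with floorEntry.
    private final TreeMap<Long, Image> imagesByStart = new TreeMap<>();

    /** Registers an external image; assumes images do not overlap. */
    void addImage(String programName, long start, long length) {
        imagesByStart.put(start, new Image(programName, start, length));
    }

    /** Returns the name of the program whose range contains the address, if any. */
    Optional<String> ownerOf(long address) {
        Map.Entry<Long, Image> entry = imagesByStart.floorEntry(address);
        if (entry == null) {
            return Optional.empty();
        }
        Image image = entry.getValue();
        return address - image.start() < image.length()
                ? Optional.of(image.programName())
                : Optional.empty();
    }
}
```

For every reference whose target lands in a fake block, `ownerOf` gives the library name to use when replacing it with an external reference; targets that return an empty result are left alone.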
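As a rough illustration of the "blocks for all memory" variant, the sketch below computes which address ranges are not yet covered by the program's own blocks, skipping the lowest addresses. It is plain Java with no Ghidra dependency; `MemoryGaps`, `Range`, and the example addresses are hypothetical names and values chosen for illustration. In a real script, each returned range would then be turned into an uninitialized block through Ghidra's memory API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: finds address ranges not covered by any known block. */
final class MemoryGaps {
    /** Inclusive address range [start, end], with 32-bit addresses held in a long. */
    record Range(long start, long end) {}

    /** Returns the parts of [spaceStart, spaceEnd] that no known block covers. */
    static List<Range> uncovered(List<Range> known, long spaceStart, long spaceEnd) {
        List<Range> sorted = new ArrayList<>(known);
        sorted.sort(Comparator.comparingLong(Range::start));
        List<Range> gaps = new ArrayList<>();
        long cursor = spaceStart; // lowest address not yet accounted for
        for (Range block : sorted) {
            if (block.end() < cursor) {
                continue; // block lies entirely below the cursor
            }
            if (block.start() > spaceEnd) {
                break; // block lies entirely above the space of interest
            }
            if (block.start() > cursor) {
                gaps.add(new Range(cursor, block.start() - 1));
            }
            cursor = Math.max(cursor, block.end() + 1);
        }
        if (cursor <= spaceEnd) {
            gaps.add(new Range(cursor, spaceEnd));
        }
        return gaps;
    }
}
```

Starting the search at a non-zero address, for example `0x1000`, keeps the zero block out of the result, so small constants are less likely to be mistaken for pointers.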
The suggestion to "import->add to program" the bytes from the other binaries into the first program was the easiest for me, and it worked. Thanks to both of you for your suggestions!
Not sure how hacky this is, but you could save your binaries as hex32 files and then paste them all into one single file. Hacky but very simple.
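The same merge can be done on the raw bytes instead of hex text. Below is a minimal sketch that copies each image into one flat buffer at its offset from a common base address. `ImageMerger`, `place`, and the addresses used with it are hypothetical, and the sketch assumes the images do not overlap and that the combined range is small enough to fit in one array.

```java
/** Hypothetical helper that lays several raw images out in one flat buffer. */
final class ImageMerger {
    /**
     * Copies an image into the combined buffer at (loadAddress - baseAddress).
     * Throws if the image would fall outside the buffer.
     */
    static void place(byte[] combined, long baseAddress, long loadAddress, byte[] image) {
        long offset = loadAddress - baseAddress;
        if (offset < 0 || offset + image.length > combined.length) {
            throw new IllegalArgumentException("image does not fit in the combined buffer");
        }
        System.arraycopy(image, 0, combined, (int) offset, image.length);
    }
}
```

The combined buffer can then be written out and imported as a single raw binary at the base address, with the gaps between images left as zero bytes.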
Are the programs raw binaries, or are they a format like ELF, PE, etc.?