Ghidra: Need a more general/flexible way to specify known non-returning functions

Created on 15 Jul 2020 · 9 comments · Source: NationalSecurityAgency/ghidra

Is your feature request related to a problem? Please describe.

I'm using the BinaryLoader on a number of programs that all use the same API. In this situation, the loader-specific lists of non-returning functions don't apply, so there's no way to use the Non-returning functions - Known analyzer.

Instead, for each program I have to manually flag whatever non-returning functions aren't found by the Non-returning functions - Discovered analyzer.

Describe the solution you'd like
I'd like an easy way for the average user to supply their own file of non-returning function names.

How this might be done:

  • Making BinaryLoader more flexible
  • A script for just this purpose

Describe alternatives you've considered

  • Writing my own custom analyzer or script. But,
    -- This would require a significant learning curve
    -- Whatever I construct on my own is likely to be
    --- Less well-integrated into Ghidra
    --- Harder to test
    --- More fragile than something coming from the experts.

Additional context
See issue #2101.

Although my request addresses the BinaryLoader, it also highlights a broader problem: each of the three format-specific loaders is tied to its own relatively fixed list of non-returning functions. This is especially problematic for the ELF format, which is used across many hardware architectures, operating systems, and APIs.

All 9 comments

The files containing such function names are located in the Ghidra installation directory Ghidra/Features/Base/data and may be readily modified there. Which file applies depends on the load format, as specified in the file noReturnFunctionConstraints.xml. For example, ELF non-returning function names are contained in the file ElfFunctionsThatDoNotReturn, while PE (i.e., Windows) non-returning function names are contained in the file PEFunctionsThatDoNotReturn.
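For orientation, the mapping described above presumably takes the form of one executable_format entry per loader, following the same pattern as the Raw Binary entry suggested later in this thread. The sketch below is a reconstruction, not a copy of the shipped file: the format name strings "Executable and Linking Format (ELF)" and "Portable Executable (PE)" are assumptions, and the enclosing root element is omitted, so check the actual noReturnFunctionConstraints.xml in Ghidra/Features/Base/data before relying on it.

```xml
<!-- Reconstructed sketch: format name strings are assumed, root element omitted -->
<executable_format name="Executable and Linking Format (ELF)">
    <functionNamesFile>ElfFunctionsThatDoNotReturn</functionNamesFile>
</executable_format>
<executable_format name="Portable Executable (PE)">
    <functionNamesFile>PEFunctionsThatDoNotReturn</functionNamesFile>
</executable_format>
```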

@ghidra1, there is some background for this request in #2101. In this case, he's using the BinaryLoader, so none of those files will be applied.

As a workaround, a Raw Binary entry could be added to the noReturnFunctionConstraints.xml file, along with a new data file containing the non-returning function names:

<executable_format name="Raw Binary">
    <functionNamesFile>BinaryFunctionsThatDoNotReturn</functionNamesFile>
</executable_format>

Additional constraints can be added to the above entry (by nesting) to narrow it down to a specific case. Other constraints that may be useful include compiler, language, and property. Compiler and language constraints specify an id attribute, while property constraints identify a specific _Program Information_ property by its name and value attributes. The language id also supports a wildcard between colons (e.g., id="ARM:LE:32:*"). Example:

<executable_format name="Raw Binary">
   <property name="OriginalFilename" value="mybinary">
       <functionNamesFile>MyBinaryFunctionsThatDoNotReturn</functionNamesFile>
   </property>
   <language id="ARM:LE:32:*">
       <functionNamesFile>BinaryARMFunctionsThatDoNotReturn</functionNamesFile>
   </language>
</executable_format>

NOTE: I have not actually tried the above; it is based on code inspection of the constraint parser DecisionTree.java and the various ProgramConstraint implementations.
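The example above covers the property and language constraints but not the compiler one. By analogy with the language entry, a compiler constraint would presumably take an id attribute as sketched below. The element name compiler, the id value default, and the file name BinaryDefaultFunctionsThatDoNotReturn are assumptions for illustration, and like the entries above this has not been tried.

```xml
<executable_format name="Raw Binary">
   <!-- Assumed compiler spec id; substitute the one chosen at import time -->
   <compiler id="default">
       <!-- Hypothetical data file name -->
       <functionNamesFile>BinaryDefaultFunctionsThatDoNotReturn</functionNamesFile>
   </compiler>
</executable_format>
```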

This will probably be okay as a workaround.

I'd like to leave my feature request out there, though, in hopes that a better approach can be found for a future release.

It's not clear to me how the example constraint(s) would work.

Of course, it's obvious that the constraint(s) are only applicable to Raw Binary format.

Beyond that, though, does it express:

  • Two totally independent constraints?
    -- The first restricts MyBinaryFunctionsThatDoNotReturn to _just_ raw binary files named mybinary.
    -- The second restricts BinaryARMFunctionsThatDoNotReturn to _just_ raw binary files whose language ID matches.
    If so, the straightforward interpretation would seem to be this:

    1. If the file being imported is named mybinary, the function name list comes from MyBinaryFunctionsThatDoNotReturn and the second constraint is ignored. [Else,]

    2. If the user-selected language for the import matches ARM:LE:32:*, the function name list comes from BinaryARMFunctionsThatDoNotReturn.

  • Some more subtle kind or combination of constraint(s)?

As you pointed out, these are two independent constraints, although I am unsure how precedence between them works. Finding out will take some code inspection or asking the right person.

The constraints can be 'and-ed' by nesting them:

<executable_format name="Raw Binary">
   <property name="OriginalFilename" value="mybinary">
      <language id="ARM:LE:32:*">
          <functionNamesFile>MyBinaryFunctionsThatDoNotReturn</functionNamesFile>
      </language>
   </property>
</executable_format>
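The same nesting might also address the ELF concern raised in the original request, where one format-wide list has to cover many architectures. A hypothetical sketch: the format name string "Executable and Linking Format (ELF)" is assumed, the data file ElfMipsFunctionsThatDoNotReturn does not ship with Ghidra, and the language id is only illustrative. Whether such an entry replaces or supplements the format-wide ElfFunctionsThatDoNotReturn list depends on the precedence question above, which this thread leaves open.

```xml
<!-- Hypothetical: narrows an ELF entry to big-endian 32-bit MIPS programs -->
<executable_format name="Executable and Linking Format (ELF)">
   <language id="MIPS:BE:32:*">
       <functionNamesFile>ElfMipsFunctionsThatDoNotReturn</functionNamesFile>
   </language>
</executable_format>
```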

Arranged that way, it is much clearer that the effect is like a logical "and" of the two constraint conditions. Thanks. That example is better suited for the write-up I'm making for our other team members here.

We are considering some changes to data type archives, and it got me thinking.

Function signatures can be tagged by name as non-returning in a data type archive.
You can apply all the function signatures from an archive by name.
So for a given project, IDE, etc., there is a known set of non-returning functions.

You could apply the non-returning functions from there regardless of format.
I suppose the Known non-returning analyzer could check whatever archives you have open for any signatures tagged as non-returning.

That said, if you import a binary as raw, where would the names come from?
Once you have applied some names, you can apply the archive function signatures that match them; any that carry the non-returning attribute would then make those functions non-returning.

This is what I've been doing. 😄 Fortunately, because of COVID-19 limits on the code I can access from home, the number of functions has been very small and easy to deal with.
