Radare2: How to interpret the output of radiff2, and how does it work?

Created on 26 Jul 2016 · 9 Comments · Source: radareorg/radare2

Hello,

I have successfully used the latest radiff2 to run a function-level similarity comparison. Could anyone shed light on how to interpret the output of this tool?

For example, in the output, here is one result:

sym.emit_mandatory_arg_note   44 0x40197c | UNMATCH  (1.000000) | 0x401480    48 sym.imp.dcgettext

How should I interpret the score (1.000000)? Does it represent a "confidence interval" for the UNMATCH?

In addition, I am curious about how radiff2 works. If I understand correctly, each function from Binary_1 is compared against every function in Binary_2, and the most similar match is reported. Is that right?


All 9 comments

RTFS


@radare I don't get it. My question is: what does 1.00000 mean next to "UNMATCH"?

Rtfs = read the fine source

The scale is from 0 to 1. 1 means exact match


@radare I fully understand what RTFS and RTFM mean, but this is weird. I checked the output of several function-level comparisons, and every score I can find is either 0 or 1. Besides, if 1 means exact match, that is inconsistent with "UNMATCH".

Hi guys, so yes, 0 to 1 is the similarity score. The radiff util returns the difference and the similarity calculated as:

            double diff = (double) (*distance) / (double) (R_MAX (aLen, bLen));
            *similarity = (double)1 - diff;
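To make the formula concrete, here is a minimal sketch that computes a distance and then the similarity from it. It assumes the distance is a Levenshtein-style edit distance over bytes (the v0/v1 rows in the snippets in this thread suggest that, but it is an assumption), and the names edit_distance and similarity_from_distance are hypothetical helpers, not radare2 API:

```c
#include <stdint.h>
#include <stdlib.h>

/* Plain two-row Levenshtein distance between two byte buffers.
 * Hypothetical helper for illustration; NOT radare2's optimised code.
 * Returns UINT32_MAX if memory allocation fails. */
uint32_t edit_distance(const uint8_t *a, uint32_t alen,
                       const uint8_t *b, uint32_t blen) {
    uint32_t *prev = malloc(((size_t)blen + 1) * sizeof *prev);
    uint32_t *cur = malloc(((size_t)blen + 1) * sizeof *cur);
    if (!prev || !cur) {
        free(prev);
        free(cur);
        return UINT32_MAX;
    }
    /* Row 0: turning an empty prefix of a into b[0..j) costs j insertions. */
    for (uint32_t j = 0; j <= blen; j++) {
        prev[j] = j;
    }
    for (uint32_t i = 0; i < alen; i++) {
        cur[0] = i + 1;
        for (uint32_t j = 0; j < blen; j++) {
            uint32_t cost = (a[i] == b[j]) ? 0 : 1;
            uint32_t del = prev[j + 1] + 1;
            uint32_t ins = cur[j] + 1;
            uint32_t sub = prev[j] + cost;
            uint32_t best = del < ins ? del : ins;
            cur[j + 1] = best < sub ? best : sub;
        }
        /* Swap rows so prev always holds the last completed row. */
        uint32_t *tmp = prev;
        prev = cur;
        cur = tmp;
    }
    uint32_t d = prev[blen];
    free(prev);
    free(cur);
    return d;
}

/* similarity = 1 - distance / max(alen, blen), as in the snippet above. */
double similarity_from_distance(uint32_t distance, uint32_t alen, uint32_t blen) {
    uint32_t longest = alen > blen ? alen : blen;
    if (longest == 0) {
        return 1.0; /* assumption: two empty buffers count as identical */
    }
    return 1.0 - (double)distance / (double)longest;
}
```

For "kitten" versus "sitting" the distance is 3 and the longer buffer has 7 bytes, so the similarity is 1 - 3/7, roughly 0.57. A distance of 0 gives exactly 1, and a distance equal to the longer length gives exactly 0.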

I haven't looked into the other parts of radiff; my changes were local to the r_diff_buffers_distance function. Could it be that the comparison operator that's calling the buffers distance is having issues with a floating-point comparison?

The stuff I worked on for optimisation was (I thought) entirely self-contained in the r_diff_buffers_distance black box. The main optimisation to r_diff_buffers_distance was to change the inner loop length depending on certain criteria. Some of these changes resulted in reads past the end of the buffer, but those appear to be fixed. The only modified buffers are (ut32) *distance and (double) *similarity:

    if (distance) {
        // the final distance is the last byte we processed in the inner loop.
        // v0 is used instead of v1 because we switched the pointers before exiting the outer loop
        *distance = v0[stop];
        if (similarity) {
            double diff = (double) (*distance) / (double) (R_MAX (aLen, bLen));
            *similarity = (double)1 - diff;
        }
    }

I had a bit of a look at where r_diff_buffers_distance is used, it made my brain hurt. But, here are some observations:
r_diff_buffers_distance is called from: radiff2.c (one location: main), and libr/anal/diff.c (3 locations: r_anal_diff_bb, and twice in r_anal_diff_fcn).

The lower level of this looks to be r_anal_diff_fcn (types of RAnalFunction are used by r_anal_diff_bb), but both functions set R_ANAL_DIFF_TYPE_MATCH or R_ANAL_DIFF_TYPE_UNMATCH.

In r_anal_diff_fcn, the struct for RAnalFunction includes: ut8 pointer to *fingerprint, and an RAnalDiff pointer to *diff.

Struct RAnalDiff has a (double) dist field and no similarity field, which is kinda hinkey because a distance should be an integer (ut32?). And the calling functions seem to store the similarity from &t in it, which should be (and is) a double.

r_diff_buffers_distance (
    NULL,
    fcn->fingerprint,
    fcn_size,
    fcn2->fingerprint,
    fcn2_size,
    NULL,
    &t);
fcn->diff->dist = fcn2->diff->dist = t;

while the declaration for the distance diffing function is:

R_API bool r_diff_buffers_distance(
    RDiff *d,
    const ut8 *a,
    ut32 la,
    const ut8 *b,
    ut32 lb,
    ut32 *distance,
    double *similarity) {..}

So, I don't know if it's significant, but the value stored in dist (the returned t) is actually the similarity, not the distance. t is a double from 0 to 1, while the distance output is a ut32 pointer that the calling function passes as NULL.
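One thing worth checking here: in the snippet quoted earlier in this comment, the similarity assignment sits inside the `if (distance)` branch. If that reflects the real function (an assumption, since only the excerpt is shown), a caller passing NULL for distance would never get t written. The sketch below reproduces just that output-parameter shape; report_nested and report_independent are hypothetical stand-ins, not radare2 functions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same shape as the quoted excerpt:
 * similarity is only written when distance is non-NULL.
 * Callers must pass longest > 0. */
bool report_nested(uint32_t computed, uint32_t longest,
                   uint32_t *distance, double *similarity) {
    if (distance) {
        *distance = computed;
        if (similarity) {
            *similarity = 1.0 - (double)computed / (double)longest;
        }
    }
    return true;
}

/* Variant that treats the two optional outputs independently,
 * so either pointer may be NULL without affecting the other. */
bool report_independent(uint32_t computed, uint32_t longest,
                        uint32_t *distance, double *similarity) {
    if (distance) {
        *distance = computed;
    }
    if (similarity) {
        *similarity = 1.0 - (double)computed / (double)longest;
    }
    return true;
}
```

With the nested form, calling `report_nested (12, 48, NULL, &t)` leaves t holding whatever it held before the call, while the independent form sets it to 0.75. If the real caller's t starts out as 0 or 1, that would be consistent with only ever seeing those two scores, but this is a guess that would need checking against the actual source.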

The t==1 comparison (see below) might be causing issues? I don't know. I guess it should work, because doubles represent small integers exactly, right?

            r_diff_buffers_distance (NULL, fcn->fingerprint, r_anal_fcn_size (fcn),
                    fcn2->fingerprint, r_anal_fcn_size (fcn2), NULL, &t);

note the &t will contain the (double) similarity result, then:

            /* Set flag in matched functions */
            fcn->diff->type = fcn2->diff->type = (t==1)?
                R_ANAL_DIFF_TYPE_MATCH: R_ANAL_DIFF_TYPE_UNMATCH;
            fcn->diff->dist = fcn2->diff->dist = t;

So yeah, there's some inconsistencies in the calling code as far as variable types and comparisons go that COULD perhaps be resulting in t==1 borking??
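On the t==1 question specifically, a small sketch suggests the exact comparison is safe for this formula: with a distance of 0 the division yields exactly 0.0 and 1 - 0.0 is exactly 1.0, while any non-zero ut32 distance gives a result strictly below 1. The enum, the function names, and the threshold variant are all hypothetical and only illustrate the comparison, not radare2's code:

```c
#include <stdint.h>

/* Hypothetical stand-ins for R_ANAL_DIFF_TYPE_MATCH / _UNMATCH. */
typedef enum { DIFF_MATCH, DIFF_UNMATCH } DiffType;

/* Same formula as in the thread; callers must pass longest > 0. */
double similarity_of(uint32_t distance, uint32_t longest) {
    return 1.0 - (double)distance / (double)longest;
}

/* Exact comparison, as in the quoted (t==1) expression. */
DiffType classify_exact(double t) {
    return (t == 1) ? DIFF_MATCH : DIFF_UNMATCH;
}

/* Hypothetical alternative: treat anything at or above a chosen
 * threshold as a match, so near-identical functions also pair up. */
DiffType classify_threshold(double t, double threshold) {
    return (t >= threshold) ? DIFF_MATCH : DIFF_UNMATCH;
}
```

So `classify_exact (similarity_of (0, 48))` yields a match and `classify_exact (similarity_of (1, 48))` does not. That points away from floating-point rounding as the cause and towards whatever value t holds when the comparison runs.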

Sorry guys, again, out of time but if that sounds like it's the right area to be looking, I'll have a peek over the next few weeks..

so it’s a regression after that optimization?


see the new radiff2 -S
