Hello,
I successfully used the latest radiff2 to run a function-level similarity comparison. Could anyone shed light on how to interpret the output of this tool?
For example, in the output, here is one result:
sym.emit_mandatory_arg_note 44 0x40197c | UNMATCH (1.000000) | 0x401480 48 sym.imp.dcgettext
How should I interpret the score (1.000000)? Does it represent a "confidence interval" for the UNMATCH?
In addition, I am curious how radiff2 works. If I understand correctly, each function from Binary_1 is compared to every function in Binary_2, and the most similar one is reported as the match. Is that right?
Dup of https://github.com/radare/radare2/issues/5142 it seems
RTFS
@radare I don't get it. Here is my question: what does 1.000000 mean next to "UNMATCH"?
RTFS = read the fine source.
The scale is from 0 to 1; 1 means an exact match.
@radare I totally understand the meaning of RTFS or RTFM. But this is so weird. I checked multiple outputs of the function-level comparison, and every score I can find is either 0 or 1. Besides, if 1 means an exact match, then it is inconsistent with "UNMATCH".
Hi guys, so yes, 0 to 1 is the similarity score. The radiff util returns the difference (distance) and the similarity, with the similarity calculated as:
    double diff = (double) (*distance) / (double) (R_MAX (aLen, bLen));
    *similarity = (double)1 - diff;
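To make the formula concrete, here is a minimal standalone sketch of the same arithmetic. The function name `similarity_from_distance` is hypothetical (it is not part of radare2), and treating two empty buffers as identical is an assumption of this sketch, not something the thread confirms.

```c
#include <stdint.h>

/* Hypothetical helper, not a radare2 API: mirrors the quoted formula
 * similarity = 1 - distance / max(len_a, len_b). */
double similarity_from_distance(uint32_t distance, uint32_t len_a, uint32_t len_b) {
	uint32_t longest = len_a > len_b ? len_a : len_b;
	if (longest == 0) {
		/* Assumption of this sketch: two empty buffers count as identical. */
		return 1.0;
	}
	return 1.0 - (double)distance / (double)longest;
}
```

Under this formula a distance of 0 gives 1.0, a distance equal to the longer length gives 0.0, and, for example, a distance of 12 between buffers of 44 and 48 bytes gives 1 - 12/48 = 0.75.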
I haven't looked into the other parts of radiff; my changes were local to the r_diff_buffers_distance function. Could it be that the code calling the buffers distance function is having issues with a floating-point comparison?
The stuff I worked on for optimisation was (I thought) entirely self-contained in the r_diff_buffers_distance black box. The main optimisation to r_diff_buffers_distance was to change the inner loop length depending on certain criteria. Some of these changes resulted in reads past the end of the buffer, but those appear to be fixed. The only modified buffers are (ut32) *distance and (double) *similarity:
    if (distance) {
        // the final distance is the last byte we processed in the inner loop.
        // v0 is used instead of v1 because we switched the pointers before exiting the outer loop
        *distance = v0[stop];
        if (similarity) {
            double diff = (double) (*distance) / (double) (R_MAX (aLen, bLen));
            *similarity = (double)1 - diff;
        }
    }
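For readers wondering why the result is read from v0 rather than v1, here is a minimal sketch of a two-row Levenshtein distance with a pointer swap at the end of each outer iteration. This is a plain textbook version, not radare2's optimised implementation: it has no variable inner-loop length, and the function name `levenshtein_two_rows` and its signature are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not radare2 code: two-row Levenshtein distance
 * between byte buffers a and b. Returns 0 on success, -1 if allocation fails. */
int levenshtein_two_rows(const uint8_t *a, uint32_t a_len,
		const uint8_t *b, uint32_t b_len, uint32_t *distance) {
	uint32_t *v0 = malloc(((size_t)b_len + 1) * sizeof *v0);
	uint32_t *v1 = malloc(((size_t)b_len + 1) * sizeof *v1);
	if (!v0 || !v1) {
		free(v0);
		free(v1);
		return -1;
	}
	/* Row for the empty prefix of a: j insertions turn "" into b[0..j). */
	for (uint32_t j = 0; j <= b_len; j++) {
		v0[j] = j;
	}
	for (uint32_t i = 0; i < a_len; i++) {
		v1[0] = i + 1; /* deleting the first i + 1 bytes of a */
		for (uint32_t j = 0; j < b_len; j++) {
			uint32_t best = v0[j] + (a[i] == b[j] ? 0 : 1); /* substitute or keep */
			if (v1[j] + 1 < best) {
				best = v1[j] + 1; /* insertion */
			}
			if (v0[j + 1] + 1 < best) {
				best = v0[j + 1] + 1; /* deletion */
			}
			v1[j + 1] = best;
		}
		/* Swap rows: the row just computed becomes v0 for the next pass. */
		uint32_t *tmp = v0;
		v0 = v1;
		v1 = tmp;
	}
	/* Because of the swap, the most recent row is v0, not v1. */
	*distance = v0[b_len];
	free(v0);
	free(v1);
	return 0;
}
```

With this sketch, the buffers "kitten" and "sitting" yield a distance of 3, and identical buffers yield 0.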
I had a bit of a look at where r_diff_buffers_distance is used; it made my brain hurt. But here are some observations:
r_diff_buffers_distance is called from radiff2.c (one location: main) and libr/anal/diff.c (three locations: once in r_anal_diff_bb and twice in r_anal_diff_fcn).
The lower level of this looks to be r_anal_diff_fcn (types of RAnalFunction are used by r_anal_diff_bb), but both functions set R_ANAL_DIFF_TYPE_MATCH or R_ANAL_DIFF_TYPE_UNMATCH.
In r_anal_diff_fcn, the RAnalFunction struct includes a ut8 pointer, *fingerprint, and an RAnalDiff pointer, *diff.
Struct RAnalDiff has a (double) distance field (dist) and no similarity field, which is kinda hinky because a distance should be an integer (ut32?). And the calling functions seem to pass &t to receive the similarity, which should be (and is) a double:
    r_diff_buffers_distance (NULL, fcn->fingerprint, fcn_size,
        fcn2->fingerprint, fcn2_size, NULL, &t);
    fcn->diff->dist = fcn2->diff->dist = t;
while the declaration for the distance diffing function is:
    R_API bool r_diff_buffers_distance(RDiff *d, const ut8 *a, ut32 la,
        const ut8 *b, ut32 lb, ut32 *distance, double *similarity) {..}
So, I don't know if it's significant, but the dist taken from the returned t is actually the similarity, not the distance. t is a double from 0 to 1, while distance is a ut32 whose pointer the caller passes as NULL.
The t==1 comparison (see below) might be causing issues? I don't know. I guess it should work, because doubles represent small integers exactly, right?
    r_diff_buffers_distance (NULL, fcn->fingerprint, r_anal_fcn_size (fcn),
        fcn2->fingerprint, r_anal_fcn_size (fcn2), NULL, &t);
Note that t will contain the (double) similarity result. Then:
    /* Set flag in matched functions */
    fcn->diff->type = fcn2->diff->type = (t==1)?
        R_ANAL_DIFF_TYPE_MATCH: R_ANAL_DIFF_TYPE_UNMATCH;
    fcn->diff->dist = fcn2->diff->dist = t;
So yeah, there are some inconsistencies in the calling code as far as variable types and comparisons go that COULD perhaps be causing the t==1 check to misbehave.
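On the t==1 question specifically, the comparison can be checked in isolation. The sketch below assumes the similarity is computed exactly as in the formula quoted earlier (1 minus distance over the longer length) and that the longer length is non-zero; the function name `similarity_is_one` is hypothetical. Under those assumptions, with IEEE 754 doubles, the result equals 1.0 exactly when the distance is 0, so the equality test itself should not be the source of a spurious MATCH or UNMATCH.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check, not radare2 code: does the quoted formula produce
 * exactly 1.0? Assumes longest > 0 (0/0 would give NaN, which never equals 1). */
bool similarity_is_one(uint32_t distance, uint32_t longest) {
	double t = 1.0 - (double)distance / (double)longest;
	/* A zero distance gives 1.0 - 0.0, which is exactly 1.0. Any non-zero
	 * ut32 distance over a ut32 length is at least about 2.3e-10, far larger
	 * than the spacing of doubles just below 1.0, so t stays strictly below 1.0. */
	return t == 1.0;
}
```

So if a pair is flagged UNMATCH while its printed score is 1.000000, the cause is more likely to lie elsewhere, for example in which value ends up stored in dist or in how it is printed, than in floating-point rounding of this expression.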
Sorry guys, again, I'm out of time, but if that sounds like the right area to be looking in, I'll have a peek over the next few weeks.
so it’s a regression after that optimization?
see the new radiff2 -S