Lots of false positives after upgrade to 4.0.0
aws-java-sdk-core-1.11.455.jar (cpe:/a:cce-interact:interact:1.11.455, com.amazonaws:aws-java-sdk-core:1.11.455, cpe:/a:interact:interact:1.11.455) : CVE-2006-1643, CVE-2006-1642, CVE-2007-4177, CVE-2006-1644
quartz-2.3.0.jar (org.quartz-scheduler:quartz:2.3.0, cpe:/a:jenkins:jenkins:2.3) : CVE-2018-1000169, CVE-2017-2610, CVE-2017-2611, CVE-2017-1000504, CVE-2017-2609, CVE-2017-2601, CVE-2017-2602, CVE-2017-2603, CVE-2017-2604, CVE-2017-2606, CVE-2017-2607, CVE-2017-2608, CVE-2017-1000354, CVE-2017-1000398, CVE-2017-1000355, CVE-2017-1000399, CVE-2017-1000396, CVE-2017-1000353, CVE-2017-1000356, CVE-2018-6356, CVE-2017-2612, CVE-2017-1000391, CVE-2017-2613, CVE-2017-1000394, CVE-2017-1000395, CVE-2018-1000170, CVE-2017-1000392, CVE-2017-1000393, CVE-2018-1000067, CVE-2017-2598, CVE-2018-1000068, CVE-2017-1000400, CVE-2017-2599, CVE-2017-1000401, CVE-2017-17383, CVE-2017-2600, CVE-2016-9299, CVE-2018-1999043, CVE-2018-1999042, CVE-2018-1000195, CVE-2018-1999005, CVE-2018-1999004, CVE-2018-1000193, CVE-2018-1999007, CVE-2018-1000194, CVE-2018-1999006, CVE-2018-1999001, CVE-2018-1999045, CVE-2018-1000192, CVE-2018-1999044, CVE-2018-1999003, CVE-2018-1999047, CVE-2018-1999002, CVE-2018-1999046
json-20180813.jar (cpe:/a:light:light:-, org.json:json:20180813, cpe:/a:all-for-one:all_for_one:-) : CVE-2018-12056
commons-lang-2.6.0.redhat-7.jar (cpe:/a:linux:util-linux:2.6.0, commons-lang:commons-lang:2.6.0.redhat-7) : CVE-2011-1677, CVE-2011-1676, CVE-2011-1675
Looks like a version match is enough, even when the other identifiers are different.
I can confirm that a matching version is enough... I have plenty of FPs such as:
micrometer-core-1.1.0.jar -> cpe:/a:git:git:1.1.0, cpe:/a:git_project:git:1.1.0
Having the same issue after switching from 3.X to 4.X:
netflix-eventbus-0.3.0.jar: ids:(cpe:/a:git_project:git:0.3.0, com.netflix.netflix-commons:netflix-eventbus:0.3.0, cpe:/a:git:git:0.3.0) : CVE-2008-5516, CVE-2010-2542, CVE-2010-3906, CVE-2013-0308, CVE-2014-9938, CVE-2015-7082, CVE-2015-7545, CVE-2017-14867
archaius-core-0.7.4.jar: ids:(com.netflix.archaius:archaius-core:0.7.4, cpe:/a:git:git:0.7.4, cpe:/a:git_project:git:0.7.4) : CVE-2008-5516, CVE-2010-2542, CVE-2010-3906, CVE-2013-0308, CVE-2014-9938, CVE-2015-7082, CVE-2015-7545, CVE-2017-14867
servo-core-0.10.1.jar: ids:(cpe:/a:docker:docker:0.10.1, com.netflix.servo:servo-core:0.10.1) : CVE-2014-0047, CVE-2014-5277, CVE-2014-6407, CVE-2014-9358, CVE-2015-3627, CVE-2015-3630, CVE-2015-3631, CVE-2016-3697, CVE-2017-14992, CVE-2017-7297
netflix-infix-0.3.0.jar: ids:(cpe:/a:git_project:git:0.3.0, cpe:/a:git:git:0.3.0, com.netflix.netflix-commons:netflix-infix:0.3.0) : CVE-2008-5516, CVE-2010-2542, CVE-2010-3906, CVE-2013-0308, CVE-2014-9938, CVE-2015-7082, CVE-2015-7545, CVE-2017-14867
jersey-apache-client4-1.19.1.jar: ids:(cpe:/a:oracle:oracle_client:1.19.1, com.sun.jersey.contribs:jersey-apache-client4:1.19.1) : CVE-2006-0550
Unfortunately I require a working 4.X version and can't move back to 3.X as I moved on to Gradle 5.
I've had some time to dig deeper into the problem and found the following issue:
With version 4.0.0 of dependency-check, the Lucene library was updated.
Here the Lucene index is queried to get the matching CPEs for a vendor/product:
https://github.com/jeremylong/DependencyCheck/blob/c11905684ceefe5c58077bb588d88f7af5fb41db/core/src/main/java/org/owasp/dependencycheck/analyzer/CPEAnalyzer.java#L323-L325
As you can see, the code then checks that the matching score is greater than 0.08.
I've tested the library "com.netflix.hystrix:hystrix-core:1.5.12" and the results were:
In fact it was a drastic score increase: the max score with the old version was "0.038690303", but with the new version it is "41.03853".
So the question is now: what triggered the increase of the Lucene result score?!
After I have read some news on the changes of Lucene 6 and 7, I can tell you what's going on here:
With Lucene 6 the scoring algorithm was changed from TF*IDF to BM25 - so it is expected that scores look different now with Lucene 6.
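For reference, the textbook form of BM25 is sketched below. This is the general formula, not a line-by-line account of Lucene's implementation, which may differ in details. The symbols are not from this thread: f(t,d) is the frequency of term t in document d, |d| is the document length, avgdl is the average document length, N is the number of documents, n_t is the number of documents containing t, and k_1 and b are tuning parameters (commonly 1.2 and 0.75).

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{idf}(t)\,
  \frac{f(t,d)\,(k_1 + 1)}{f(t,d) + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{idf}(t) = \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
```

Each query term contributes at most idf(t)(k_1 + 1), and the contributions are summed, so the total grows with the number of matching query terms and has no fixed upper bound such as 1.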
Additionally with Lucene 7 the query normalization was dropped. That means that all versions before Lucene 7 tried to bring the score to a value between 0 and 1. But with the new version the score can have any value.
And finally the developers of Lucene mention that it is discouraged (and already was for several Lucene versions) to use the score to compare the results to "numbers" or to other results. The score should only be used to sort the resulting documents of a query.
This in turn means: the current approach is broken and needs a different solution than before. Simply comparing the score to "0.08" will not work any longer.
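To make the effect concrete, here is a minimal sketch of a fixed-cutoff filter applied to the two hystrix-core max scores quoted above. The class and method names are hypothetical and this is not the actual CPEAnalyzer code; it only assumes the check is "score greater than 0.08" as described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from DependencyCheck.
class ScoreThresholdSketch {
    // Cutoff mentioned in this thread.
    static final float MIN_SCORE = 0.08f;

    /** Returns the scores that are strictly greater than minScore. */
    static List<Float> filter(float[] scores, float minScore) {
        List<Float> kept = new ArrayList<>();
        for (float s : scores) {
            if (s > minScore) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        float oldMax = 0.038690303f; // max score reported with the old Lucene
        float newMax = 41.03853f;    // max score reported with the new Lucene

        // The old score is rejected by the cutoff, the new one passes it.
        System.out.println("old kept: " + filter(new float[] {oldMax}, MIN_SCORE));
        System.out.println("new kept: " + filter(new float[] {newMax}, MIN_SCORE));
    }
}
```

The same unrelated match that used to be dropped now clears the cutoff by a wide margin, which would be consistent with the false positives listed above.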
I did some checks to see whether it would be possible to downgrade Lucene (from the user's build). Unfortunately it's not possible, as there are Lucene-version-related changes inside Engine.java. As a quick fix, the easy way is to revert this commit in dependency-check and run a release: https://github.com/jeremylong/DependencyCheck/commit/e0d644bdc9d7986f8153e91dbb8e7713565970f3
(This version is vulnerable according to the commit message. I do not think it's that big an issue, considering how long Lucene actually runs...)
@jeremylong Can you please provide your suggestion as author of the plugin? We as Gradle 5 users do not really have any option to go forward or backwards...
Edit: ~it seems that to make it "working" again, we just need to normalize it manually. It's rather simple to do using the maxScore reported by topDocs:~
https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/TopDocs.html#getMaxScore--
It would still be a broken approach, but it would at least make it as functional* as it was before.
*provided that the 0.08 constant works reasonably for the changed algorithm... hard to guess, no idea why it's 0.08
Edit2: More investigation. It's not that simple. The issue is that with maxScore normalization, the score is only normalized per query. So it still ends up with values way above 0.08 for hystrix (when a search has nothing relevant to find, none of the results are relevant, yet the best of them still normalizes to 1.0). It seems that the root cause is here:
Query normalization's goal was to make scores comparable across queries, which was only implemented by the ClassicSimilarity. Since ClassicSimilarity is not the default similarity anymore, this functionality has been removed. Boosts are now propagated through Query#createWeight.
https://lucene.apache.org/core/7_5_0/MIGRATE.html
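The per-query maxScore normalization described above can be sketched as follows, to show why it does not rescue the 0.08 cutoff. The names are hypothetical, and every score except 41.03853 is made up for illustration.

```java
// Hypothetical illustration only; not taken from DependencyCheck or Lucene.
class MaxScoreNormalizationSketch {
    /** Divides every score by the largest score of the same query. */
    static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] normalized = new float[scores.length];
        if (max <= 0f) {
            return normalized; // nothing to normalize, all zeros
        }
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = scores[i] / max;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // 41.03853 is the hystrix-core max score from this thread;
        // the remaining values are invented for the example.
        float[] unrelatedQuery = {41.03853f, 20.0f, 5.0f};
        float[] anotherQuery = {2.0f, 1.0f};

        // The best hit of any query becomes 1.0, whatever its raw score was.
        System.out.println(normalizeByMax(unrelatedQuery)[0]);
        System.out.println(normalizeByMax(anotherQuery)[0]);
    }
}
```

Because the top hit always maps to 1.0, a fixed cutoff on the normalized value says nothing about whether the best match is actually relevant.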
In my experiment I also added the use of ClassicSimilarity after this line:
https://github.com/jeremylong/DependencyCheck/blob/c11905684ceefe5c58077bb588d88f7af5fb41db/core/src/main/java/org/owasp/dependencycheck/data/cpe/CpeMemoryIndex.java#L138
The results are still the same (FPs still present). It seems to me that the only quick fix is to downgrade back to Lucene 5 or 6 (and set ClassicSimilarity).
I have done additional testing on this and should have a solution in the near future. I also plan on building a set of test cases so that this issue does not occur again (i.e. automated test cases should fail). I apologize for the 4.0.0 release - it was done fairly quickly due to the published vulnerabilities in some of the dependencies.
@GFriedrich the score is being used to filter out completely erroneous matches. Yes, the use of the score for more than sorting is discouraged - however, I could not come up with an effective way to filter the results. If you tell Lucene to give you 20 results - you will get 20 results even if the query:
antlr org.antlr http://www.antlr.org Antlr 3.4 Runtime antlr-runtime
returns
The filter of 0.08 no longer works as the scores are, as you pointed out, no longer between 0 and 1. In some initial testing a score filter of 30 as a minimum score appears to work. But I have several more rounds of testing to do to confirm this. I am also researching other mechanisms of filtering the results.
I can verify that I no longer see any issue in 4.0.1.
I've "reopened" the issue via #1637 as there are still several false positives.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.