Hi,
I'm curious about the details of how Harbor scans an image for CVEs/CWEs.
Let's say there is a CVE-001 in a binary "foobar" of an Ubuntu image. Does Harbor perform the following?
1) unpack the Ubuntu image into directory A
2) compute a hash of the binary "foobar" in A
3) use the same hash algorithm to compute the hash of a good "foobar" binary in which the CVE has been fixed
4) compare the two hash values to verify whether the CVE has been fixed.
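For reference, steps 2 to 4 of this proposal amount to an exact content-hash comparison. The following is a minimal Python sketch, assuming the image is already unpacked and that a set of SHA-256 digests of known-fixed "foobar" builds exists (`fixed_hashes`, `sha256_of` and `is_known_fixed` are hypothetical names). As the replies below explain, this is not how Harbor's scanners actually work.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Steps 2 and 3: hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_fixed(binary: Path, fixed_hashes: set) -> bool:
    """Step 4: True only if the binary is byte-for-byte identical to a
    known-fixed build. A patched binary rebuilt with different flags, or a
    distro backport, hashes differently and is reported as not fixed."""
    return sha256_of(binary) in fixed_hashes


if __name__ == "__main__":
    # Hypothetical demo: a stand-in "foobar" inside a temporary directory A.
    with tempfile.TemporaryDirectory() as a:
        foobar = Path(a) / "foobar"
        foobar.write_bytes(b"patched build")
        fixed = {hashlib.sha256(b"patched build").hexdigest()}
        print(is_known_fixed(foobar, fixed))  # True
```

The docstring of `is_known_fixed` points at the weakness of exact-hash matching: any rebuild changes the hash even when the fix is present.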
Any suggestions are welcome.
Hi @danielpacak, could you help?
👋 @liubogithub
TL;DR:
It's not Harbor that does the actual scanning. Harbor delegates scanning to configured scanner adapters, which implement the so-called Pluggable Scanner API.
Depending on the selected scanner and its vendor, the logic for discovering vulnerabilities differs.
Trivy in particular does not support binary scanning. It uses OS package managers and their related files to list the installed packages, and then matches that list against vulnerability databases. For application dependencies, it looks into dependency-manager-specific configuration files such as package.json for Node.js. For more details, please consult https://github.com/aquasecurity/trivy
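The package-list approach described above can be sketched in a few lines. This is a minimal Python illustration, not Trivy's code: it assumes a hypothetical in-memory advisory list (`Advisory`, `DB`) and sample dpkg data, and it matches exact affected versions instead of implementing Debian's version-comparison rules.

```python
from dataclasses import dataclass

# Hypothetical excerpt of an Ubuntu image's var/lib/dpkg/status file.
STATUS = """\
Package: foobar
Status: install ok installed
Version: 1.0-1
Description: example package
 continuation lines start with a space

Package: libbaz
Status: install ok installed
Version: 2.3-4
"""


@dataclass(frozen=True)
class Advisory:
    """Hypothetical vulnerability-database entry."""
    cve: str
    package: str
    affected_versions: frozenset


def parse_dpkg_status(text: str) -> dict:
    """Map package name -> installed version, one stanza per package."""
    packages = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line.startswith(" ") or ":" not in line:
                continue  # skip multi-line field continuations
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if "Package" in fields and "Version" in fields:
            packages[fields["Package"]] = fields["Version"]
    return packages


def find_vulnerabilities(packages: dict, advisories: list) -> list:
    """Return (cve, package, version) for each installed package whose
    version is listed as affected. Real scanners compare against a fixed
    version using dpkg's version ordering; exact matching keeps this short."""
    hits = []
    for adv in advisories:
        installed = packages.get(adv.package)
        if installed is not None and installed in adv.affected_versions:
            hits.append((adv.cve, adv.package, installed))
    return hits


DB = [Advisory("CVE-001", "foobar", frozenset({"1.0-1"}))]
print(find_vulnerabilities(parse_dpkg_status(STATUS), DB))
# [('CVE-001', 'foobar', '1.0-1')]
```

Note that no binary is ever hashed or inspected here: the verdict depends entirely on what the package database says is installed.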
The only scanner I know of that is integrated with Harbor and does binary scanning is the commercial Aqua CSP scanner. See https://github.com/aquasecurity/harbor-scanner-aqua for more details.
To find out whether other vendors support binary scanning, I'd ask in the Harbor Slack channel and see whether any vendor or end user can confirm.
@danielpacak Thanks a lot for your inputs.
So my team has created a new image format, rafs, in our container image service project nydus. The main motivation is to make containers start as fast as they should.
It separates metadata from data, which means it's now possible to scan an image using only its metadata.
Since our format is highly extensible, I'm trying to figure out how we can take advantage of that to make image scanning easier. If we succeed, we could offer Harbor an alternative way of scanning while also letting users start their containers instantly.
@liubogithub I've only had a quick flick through the rafs spec, but I'm wondering what your experience is with the Trivy vulnerability scanner, which is capable of scanning OCI artifacts. If rafs images are OCI-compatible, Trivy should just work. Is there anything you'd like to improve in terms of performance?
@danielpacak A rafs-based image is supposed to be equivalent to an OCIv1 image, only with a different implementation, so yes, it should work with Trivy as well.
After learning more about how Clair and Trivy work (looking for files like "var/lib/dpkg/status" and scanning their content), I now understand that a possible benefit of rafs-based images is embedding those "bill of materials" files in the metadata layer. Scanners would then only need to download the metadata layer, saving a lot of time otherwise spent downloading data layers.
That's very interesting, @liubogithub. Actually, if you look at fanal, which Trivy uses to create the BOM of the scanned artifact, it caches that information in the file system or in Redis. I'm not sure whether it's feasible or reasonable to store the BOM in the OCI registry directly, but it's definitely worth considering. /cc @knqyf263
Looking at the fanal example, it needs to open the container's layer.tar file (which might be large to download) to look for the image details. When integrated with rafs, fanal could download just a small metadata layer and directly access any file in the container image without downloading the entire layer.tar file. Is this a direction worth trying?
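To make the contrast concrete, here is a minimal Python sketch (using only the standard `tarfile` module, not fanal's actual code) of what extracting a single BOM file from a conventional gzipped layer involves. `read_file_from_layer` and `make_layer` are hypothetical helpers; `make_layer` fakes a downloaded layer in memory.

```python
import io
import tarfile
from typing import Optional


def read_file_from_layer(layer: bytes, path: str) -> Optional[bytes]:
    """Return the contents of `path` inside a layer tarball, or None.

    The whole layer blob must already be downloaded: a gzip stream has no
    index, so tarfile decompresses entries in order until it finds the
    one it wants."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:*") as tar:
        for member in tar:
            # Layer entries may be stored as "var/..." or "./var/...".
            if member.isfile() and member.name.removeprefix("./") == path:
                f = tar.extractfile(member)
                return f.read() if f is not None else None
    return None


def make_layer(files: dict) -> bytes:
    """Build a gzipped tar in memory, standing in for a downloaded layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


layer = make_layer({"var/lib/dpkg/status": b"Package: foobar\nVersion: 1.0-1\n"})
print(read_file_from_layer(layer, "var/lib/dpkg/status"))
```

With a metadata layer that indexes file locations, the same lookup could in principle skip the sequential decompression, which is the saving the question above is getting at.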
@danielpacak May I ask what exactly "binary scanning" means? Does it refer to scanning anything other than the BOM?
I'd look into the Redis implementation to see how we save the BOM per image layer as Redis keys. The example you shared is based on file system storage with a single BoltDB file, which might become large.
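A minimal Python sketch of the per-layer caching idea follows. A plain dict stands in for Redis, and the `bom::layer::<digest>` key format, `LayerBOMCache`, `image_boms` and the `analyze` callback are hypothetical, not fanal's actual schema or API.

```python
import json
from typing import Callable, Optional


class LayerBOMCache:
    """In-memory stand-in for a key-value store such as Redis.

    Entries are keyed by layer digest, so a base layer shared by many
    images is analysed only once."""

    def __init__(self) -> None:
        self._store = {}  # key -> JSON-encoded BOM

    @staticmethod
    def key(digest: str) -> str:
        return f"bom::layer::{digest}"  # hypothetical key scheme

    def get(self, digest: str) -> Optional[dict]:
        raw = self._store.get(self.key(digest))
        return json.loads(raw) if raw is not None else None

    def put(self, digest: str, bom: dict) -> None:
        self._store[self.key(digest)] = json.dumps(bom)


def image_boms(digests: list, cache: LayerBOMCache,
               analyze: Callable[[str], dict]) -> list:
    """Return one BOM per layer, calling `analyze` only on cache misses."""
    boms = []
    for digest in digests:
        bom = cache.get(digest)
        if bom is None:
            bom = analyze(digest)  # e.g. download and inspect the layer
            cache.put(digest, bom)
        boms.append(bom)
    return boms


analyzed = []


def fake_analyze(digest: str) -> dict:
    analyzed.append(digest)
    return {"layer": digest, "packages": {}}


cache = LayerBOMCache()
image_boms(["sha256:base", "sha256:app1"], cache, fake_analyze)
image_boms(["sha256:base", "sha256:app2"], cache, fake_analyze)
print(analyzed)  # ['sha256:base', 'sha256:app1', 'sha256:app2']
```

Keying by layer digest rather than by image is what keeps any single cache entry small, in contrast to one ever-growing database file.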
By binary scanning we mean scanning binaries added to an image that has no package manager or that doesn't use application dependency configuration files such as package-lock.json. Scratch and distroless images are typical examples.
Security vendors implement different techniques, but in very simple terms we look into a binary to find its signature and identify its CPE. Once you have the CPE, you can look up vulnerabilities in the NVD database, which includes CPE information. All of this is harder to implement than it sounds, which is why it's usually supported by commercial scanners such as Aqua Enterprise / CSP, for which we also have the Harbor plugin.
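The "find a signature, then identify the CPE" idea can be illustrated with a deliberately naive Python sketch: search the binary for an embedded version string and format the result as a CPE 2.3 name. The one-entry signature table is hypothetical, the NVD lookup step is not shown, and this is not how any particular commercial scanner works.

```python
import re
from typing import Optional

# Hypothetical one-entry signature table: product -> (CPE vendor, pattern for
# a version string embedded in the binary). Real scanners use far richer data.
SIGNATURES = {
    "openssl": ("openssl", re.compile(rb"OpenSSL (\d+\.\d+\.\d+[a-z]?)")),
}


def identify_cpe(binary: bytes) -> Optional[str]:
    """Return a CPE 2.3 string for the first matching signature, or None."""
    for product, (vendor, pattern) in SIGNATURES.items():
        match = pattern.search(binary)
        if match:
            version = match.group(1).decode("ascii")
            # cpe:2.3:part:vendor:product:version, then seven wildcard fields
            return f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"
    return None


blob = b"\x7fELF...\x00OpenSSL 1.1.1k  25 Mar 2021\x00..."
print(identify_cpe(blob))  # cpe:2.3:a:openssl:openssl:1.1.1k:*:*:*:*:*:*:*
```

A version string is easy to strip or fake, which is part of why robust binary identification is harder than this sketch suggests.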