Security-wg: Machine friendly vulnerability database

Created on 5 Feb 2018 · 30 Comments · Source: nodejs/security-wg

Hi,

I am working on https://github.com/dgonzalez/gammaray. One of the sources I plan to support is the security-wg, but the challenge is that the current format of the vuln folder is not very friendly for processing.

It would be fantastic to aggregate the data and publish it on a daily basis so we can pull that data from a single file for offline analysis and convenience.

At the moment you need to clone the repository and parse a number of files to list all the vulnerabilities. That is not the end of the world, but having a release whenever the data changes, or on a regular schedule (daily, weekly...), would be of great help.
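
For illustration, the aggregation step described here could look roughly like the following Node.js sketch. It assumes the vuln folder holds one JSON object per `.json` file; the function names `aggregateVulns` and `writeAggregate` and the added `file` property are hypothetical, not part of the repository.

```javascript
// Sketch only: merge every JSON file in a vulnerability folder into one array.
// Assumption: each .json file contains a single JSON object (one record).
const fs = require('fs');
const path = require('path');

function aggregateVulns(vulnDir) {
  return fs.readdirSync(vulnDir)
    .filter((name) => name.endsWith('.json'))
    .sort() // lexicographic order, so the output is stable between runs
    .map((name) => {
      const raw = fs.readFileSync(path.join(vulnDir, name), 'utf8');
      // Keep the source file name so each record stays traceable.
      return { file: name, ...JSON.parse(raw) };
    });
}

// Write the aggregated records to a single file and return the record count.
function writeAggregate(vulnDir, outFile) {
  const records = aggregateVulns(vulnDir);
  fs.writeFileSync(outFile, JSON.stringify(records, null, 2));
  return records.length;
}
```

A scheduled job could call `writeAggregate` and attach the resulting file to a release, so consumers download one file instead of cloning the repository.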

I can help with it if necessary.

DB good first issue help wanted

All 30 comments

Hey @dgonzalez,

It's great to see you working on a security topic that benefits the Node.js ecosystem.

I have a similar need with regard to vulns and was thinking of using one of the CVE databases and their APIs to get this information, but they are currently not up to date because of the historic vulns in the repo. I wonder if @evilpacket can comment on whether NSP provides some sort of open API for this kind of use?

Gammaray integrates with OSSIndex and I plan to integrate with as many DBs as possible, but the Node.js security working group is a must and I don't want to start coding until the format is settled.

https://github.com/dgonzalez/gammaray/pull/3/files -> I have added support for the vulnerabilities in the current format, but it would be ideal if we could agree on the format so that it does not break when things change.

@dgonzalez I have been thinking about this for a while now. If I published an npm package with the content of the vuln base each time a new record is added, would that work for you?

@nodejs/security-wg would this be an issue if I added a github webhook to trigger a job for this purpose?

That would be pretty handy! We could also add a very simple JS API to traverse the files.

@mcollina yes! I'm just not certain what the best way to host such an API is. I tried to find the best solution a few weeks ago but have not found anything I really liked so far. It depends on how popular such an API would end up being.

That would work like a charm!

Also, I've been following this project for a while: https://github.com/Grafeas/Grafeas. I am not sure if it is of interest to @nodejs/security-wg, but it is a way to make the metadata about packages available via an API. That would be another option, but it is not as simple as an npm package.

Thanks!

Pinging back on this, seems like we can go both ways:

  • Package the vulns/ and distribute it as an npm package
  • Provide an API - Seems to me that cloud functions would make a good candidate for this kind of service. They're cheap by nature, but we'll probably incur some relatively small monthly costs, so maybe one of the big cloud providers can sponsor this? (GCP/Azure/AWS, or even IBM's cloud with the help of Michael on this)

I also suggest we open up new issues for the above two as they are both handled differently.

P.S.
I think before we get more activity on the topic we can probably remove the security-wg-agenda label, as this doesn't yet require any decision making/review at the agenda level from the team.

I am happy either way, but the npm package looks like the better option to me: it is easier to distribute and needs minimal maintenance (it can be automated). Happy to help with this. Do you want me to automate the npm package creation and publication?

https://github.com/dgonzalez/node-swg-vulnerability-fetcher -> here is the fetcher (needs work for automating the npm publish but it is one manual step as of now)
https://github.com/dgonzalez/node-swg-vulnerabilities -> here are the vulnerabilities, published on https://www.npmjs.com/package/swg-vulnerabilities

Happy to transfer repos to the working group if we want to.

I'm wondering how the npm package makes it easier to consume? If it just has a copy of the vuln db files, is it not just as easy to clone the repo? I'm probably missing something here but thought I'd ask to better understand.

The separate module should make it easier to build versioned tooling that consumes the vulnerabilities data.

It also allows us to validate the data. If the format or the repo structure changes, the tool that processes it will fail, but the tools that consume the npm package will not. The Node.js security working group repo is for humans and the npm package is to be consumed by machines. Also, I don't think the current JSON format is optimal, as it requires querying all the files to find the vulns for a single package. We should probably change it to something like:

my-npm-package.json

and have a list of vulns inside, so we add it directly from HackerOne and the tool that generates the npm package aggregates the data into a single file per package.

Does it make sense?
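
A minimal sketch of that per-package aggregation, assuming each record names its package in a `module_name` field (the field name and the `groupByModule` helper are assumptions for illustration):

```javascript
// Sketch only: group flat vulnerability records by the package they affect.
// Assumption: every record carries the package name in `module_name`.
function groupByModule(records) {
  const byModule = new Map();
  for (const record of records) {
    const name = record.module_name;
    if (!byModule.has(name)) byModule.set(name, []);
    byModule.get(name).push(record);
  }
  return byModule;
}
```

With this shape, finding the vulns for one package is a single lookup such as `groupByModule(records).get('my-npm-package')` rather than a scan over every file.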

The two (package and service API) are indeed not mutually exclusive.
I agree that the current format is less than ideal in several respects: the internal id we use for filenames as well as the vulnerability id, the lack of a standard for fields like recommendation, and the author field, which is currently denormalized and therefore harder to parse.

I'm not sure about the origins and requirements of the vulnerability format, but it makes sense to me to open an issue dedicated to the formatting and also a PR for a document that explains the fields (which is helpful for newcomers), even though in the future we might have tools for automating the generation of the vulnerability file.

AFAIK the format is inherited from the archive handed over by the NSP.

Yeah, that makes sense. The question is how flexible the format is, and whether we can make changes in a non-breaking way at all?

We only have ~400 entries in the vuln database right now, so if we want to make significant changes to the format, we should do it quickly before modifying the data becomes too big of a task. I'm also not sure if anyone relies on the data in its current format, so maybe we don't need to worry about breaking anyone yet? For example, I don't know if NSP has moved to use the data from this repo or they're still using their own db.

Right. Another change I'd like to make is that instead of having:

"author": "lirantal" (https://twitter.com/liran_tal)

it can be broken down into several fields in a more normalized fashion:

"author": {
  "username": "lirantal",
  "url": "https://twitter.com/liran_tal"
}

We can do those things in a non-breaking way by just adding another structured field, or instead just parse out and restructure the current author field.
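
As a rough sketch of the non-breaking variant, the existing string can be left untouched and a structured copy added next to it. This assumes the current value has the form `name (url)`; the `author_info` field name and both helpers are hypothetical.

```javascript
// Sketch only: derive a structured author object from the legacy string.
// Assumption: the legacy value looks like "name (https://example.com/profile)".
const AUTHOR_PATTERN = /^\s*(.*?)\s*\((https?:\/\/[^)\s]+)\)\s*$/;

function parseAuthor(author) {
  // Already structured: pass it through unchanged.
  if (author && typeof author === 'object') return author;
  const text = String(author || '').trim();
  const match = AUTHOR_PATTERN.exec(text);
  // No URL found: keep the whole string as the username.
  if (!match) return { username: text, url: null };
  return { username: match[1], url: match[2] };
}

// Non-breaking: the original `author` string is kept, and a new
// (hypothetical) `author_info` field carries the structured form.
function withStructuredAuthor(record) {
  return { ...record, author_info: parseAuthor(record.author) };
}
```

Existing consumers that read `author` as a string keep working, while new tooling can read the structured field.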

Another thing that wouldn't scale well is the ids and the filenames based on those ids. It's quite a manual task to check what the current id should be, and also to query any open PRs to see whether an id is already pending from someone else. Maybe we can just use H1 ids, or possibly no ids at all since we don't really have a database. Another workaround is timestamped ids? I don't know, but I can't help thinking that the current id system is not scalable.

Agreed.

Maybe the ID could be a per-module ID? Like modulename-1, etc. It doesn't totally solve the problem, but makes collisions a lot less likely (and makes it easier to find the previous IDs).
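
A small sketch of how such per-module ids could be allocated, assuming ids are strings of the form `modulename-N` (the `nextModuleId` helper is hypothetical):

```javascript
// Sketch only: compute the next "modulename-N" id from the ids already in use.
// Assumption: ids are strings and N is a positive integer suffix.
function nextModuleId(moduleName, existingIds) {
  const prefix = `${moduleName}-`;
  let highest = 0;
  for (const id of existingIds) {
    if (!id.startsWith(prefix)) continue;
    const suffix = id.slice(prefix.length);
    // Skip ids of other modules that merely share the prefix, e.g. "foo-bar-1".
    if (/^\d+$/.test(suffix)) highest = Math.max(highest, Number(suffix));
  }
  return `${prefix}${highest + 1}`;
}
```

Two open PRs against the same module could still pick the same id, so this narrows the collision window rather than closing it.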

I think in the interest of making these entries easier to create, we could also remove the slug field (does anyone know what that's used for? NSP seems to use IDs now, like: https://nodesecurity.io/advisories/566).

I'd say it is the perfect time now to make modifications before it grows legs.

What about having a file per module with an array of vulnerabilities inside? Like
module-name.json with content:

[
   {vuln1},
   {vuln2},
   ...
]

That way we can keep the ID for traceability purposes, and it also makes it a lot easier to check for vulns on a certain module.
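
To make the lookup side of this proposal concrete, here is a hedged Node.js sketch. It assumes a directory with one `<module-name>.json` array per module; the helper names and the URL-encoding of file names are assumptions, not something the thread settled on.

```javascript
// Sketch only: look up vulnerabilities in a "one file per module" layout.
const fs = require('fs');
const path = require('path');

function fileNameFor(moduleName) {
  // Scoped names such as "@scope/pkg" contain "/", which cannot appear in a
  // file name, so the name is URL-encoded first (an assumption of this sketch).
  return `${encodeURIComponent(moduleName)}.json`;
}

function vulnsForModule(dbDir, moduleName) {
  const file = path.join(dbDir, fileNameFor(moduleName));
  // A module without a file simply has no known vulnerabilities.
  if (!fs.existsSync(file)) return [];
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

Checking a module then costs one file read, at the price of files that grow with every new report for that module.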

I'm not a big fan of having a file per module, as we might end up with bloated files. We want it to be machine readable, but human friendly too.

@vdeturckheim this has been labeled security-wg-agenda for 28 days now. Does it need to remain on the agenda for today's meeting?

I think @dgonzalez has made some progress on this? He might want to discuss it at today's meeting. Removing the flag anyway.

Yes!

I will.

On 22 March 2018 at 13:36, Vladimir de Turckheim wrote:

Closed #115 https://github.com/nodejs/security-wg/issues/115.

I'm not sure about closing this..? I mean, it seems some of us agreed there are improvements we can make to the vuln format, so do we want to continue that in a new issue?

@lirantal I'm not sure why I closed it. Most likely a mistake on my side.

Is this still an open issue, or do the changes to the structure of the vuln DB address it?

I think we can close this and link it to the issue that @bnb opened.
Plus there's also related work on this by @vdeturckheim with regard to the Algolia search.
