Security-wg: Machine-friendly vulnerability database

Created on 5 Feb 2018 · 30 comments · Source: nodejs/security-wg

Hi,

I am working on https://github.com/dgonzalez/gammaray. One of the sources I plan to support is the security-wg, but the challenge is that the current format of the vuln folder is not very friendly to automated processing.

It would be fantastic to aggregate the data and publish it on a daily basis so we can pull that data from a single file for offline analysis and convenience.

At the moment, you need to clone the repository and parse a number of files to list all the vulnerabilities. That is not the end of the world, but having a release whenever the data changes, or on a regular schedule (daily, weekly...), would be of great help.
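The aggregation step could look roughly like the following TypeScript sketch. It assumes each entry file in the vuln folder holds one JSON object with a numeric `id` and a `module_name` field (both assumed field names), and it leaves reading files from disk to the caller, so the hypothetical `aggregate` function only merges already-loaded file contents into the single document that would be published.

```typescript
// Assumed shape of one entry file; only the fields used here are typed.
interface VulnEntry {
  id: number;
  module_name: string;
  [field: string]: unknown;
}

// Merge the raw contents of every entry file into one JSON document.
// `files` maps a file name (e.g. "42.json") to that file's text.
function aggregate(files: Map<string, string>): string {
  const entries: VulnEntry[] = [];
  files.forEach((text, name) => {
    if (!name.endsWith(".json")) return; // skip READMEs and other non-entry files
    entries.push(JSON.parse(text) as VulnEntry);
  });
  // Sort by id so the published file is stable from one run to the next.
  entries.sort((a, b) => a.id - b.id);
  return JSON.stringify(entries, null, 2);
}
```

Sorting by `id` keeps the output identical when nothing changed, which makes it cheap to tell whether a new release is needed.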

I can help with it if necessary.

Labels: DB, good first issue, help wanted

dgonzalez

Hey @dgonzalez,

It's great to see you working on a security topic that benefits the Node.js ecosystem.

I have a similar need with regard to vulns and was thinking of using one of the CVE databases and their APIs to get this information, but they are currently not up to date with respect to the historic vulns in the repo. I wonder if @evilpacket can comment on whether NSP provides some sort of open API for this kind of use?

lirantal on 5 Feb 2018

Gammaray integrates with OSSIndex, and I plan to integrate with as many databases as possible, but the Node.js security working group is a must, and I don't want to start coding until the format is settled.

dgonzalez on 5 Feb 2018

https://github.com/dgonzalez/gammaray/pull/3/files -> I have added support for the vulnerabilities in the current format, but it would be ideal if we could agree on the format so that future changes do not break it.

dgonzalez on 5 Feb 2018

@dgonzalez I have been thinking about this for a while now. If I published an npm package with the contents of the vuln database each time a new record is added, would that work for you?

@nodejs/security-wg would it be an issue if I added a GitHub webhook to trigger a job for this purpose?
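A webhook-triggered job would first need to decide whether a push actually touched the data. The sketch below assumes GitHub's push event payload (the `ref` field and the `added`/`removed`/`modified` path lists on each commit), a `master` default branch and the `vuln/` directory; `shouldRepublish` is a hypothetical helper, and verifying the webhook's signature is deliberately left out.

```typescript
// Minimal subset of GitHub's push event payload used by this sketch.
interface PushCommit {
  added: string[];
  removed: string[];
  modified: string[];
}

interface PushPayload {
  ref: string;
  commits: PushCommit[];
}

// Hypothetical helper: should this push trigger a new package release?
function shouldRepublish(payload: PushPayload, branch = "master", dir = "vuln/"): boolean {
  // Ignore pushes to any branch other than the assumed default branch.
  if (payload.ref !== `refs/heads/${branch}`) return false;
  // Republish if any commit added, removed or modified a file under `dir`.
  return payload.commits.some((commit) =>
    commit.added
      .concat(commit.removed, commit.modified)
      .some((path) => path.startsWith(dir)),
  );
}
```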

vdeturckheim on 9 Feb 2018

> @dgonzalez I have been thinking about this for a while now. If I published an npm package with the contents of the vuln database each time a new record is added, would that work for you?

That would be pretty handy! We could also add a very simple JS API to traverse the files.
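A "very simple JS API" over the entries might be no more than a few lookup methods. This TypeScript sketch assumes the entries have already been loaded (for example from an aggregated file) and that each has an `id` and a `module_name`; the `VulnDb` class and its method names are hypothetical.

```typescript
// Assumed shape of a loaded entry.
interface VulnEntry {
  id: number;
  module_name: string;
  title?: string;
}

// Hypothetical read-only query API over the loaded entries.
class VulnDb {
  private readonly entries: VulnEntry[];

  constructor(entries: VulnEntry[]) {
    this.entries = entries.slice(); // defensive copy so callers cannot mutate our list
  }

  // Every entry, in load order (returned as a copy).
  all(): VulnEntry[] {
    return this.entries.slice();
  }

  // A single entry by id, or undefined if there is none.
  byId(id: number): VulnEntry | undefined {
    return this.entries.find((e) => e.id === id);
  }

  // All entries that affect the given npm module.
  forModule(name: string): VulnEntry[] {
    return this.entries.filter((e) => e.module_name === name);
  }
}
```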

mcollina on 9 Feb 2018

@mcollina yes! I'm just not certain what the best way to host such an API is. I tried to find the best solution a few weeks ago but haven't found one I really like so far. It depends on how popular such an API would end up being.

vdeturckheim on 9 Feb 2018

That would work like a charm!

dgonzalez on 12 Feb 2018

Also, I've been following this project for a while: https://github.com/Grafeas/Grafeas. I am not sure if it is of interest to @nodejs/security-wg, but it is a way to make metadata about packages available via an API. That would be another option, but it is not as simple as an npm package.

Thanks!

dgonzalez on 12 Feb 2018

Pinging back on this: it seems we can go both ways:

Package the vulns/ and distribute it as an npm package
Provide an API: cloud functions seem like a good candidate for this kind of service. They're cheap by nature, but we'll probably incur some relatively small monthly costs, so maybe one of the big cloud providers could sponsor this (GCP/Azure/AWS, or even IBM's cloud with Michael's help)?

I also suggest we open up new issues for the above two as they are both handled differently.
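If the API route were chosen, each cloud function could stay very small. This provider-neutral sketch assumes the aggregated entries are already in memory and that the module name arrives as a query parameter; `handleLookup` and its `{ status, body }` return shape are hypothetical and would need adapting to the chosen provider's function signature.

```typescript
// Assumed shape of an entry in the aggregated data.
interface VulnEntry {
  id: number;
  module_name: string;
}

// Provider-neutral response shape (hypothetical).
interface LookupResponse {
  status: number;
  body: string;
}

// Hypothetical handler for a request like GET /vulns?module=<name>.
function handleLookup(db: VulnEntry[], moduleName: string | undefined): LookupResponse {
  if (!moduleName) {
    return { status: 400, body: JSON.stringify({ error: "missing module parameter" }) };
  }
  const matches = db.filter((e) => e.module_name === moduleName);
  // An empty list is a valid answer: no known vulnerabilities for this module.
  return { status: 200, body: JSON.stringify(matches) };
}
```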

P.S.
I think that until we get more activity on the topic, we can probably remove the security-wg-agenda label, as this doesn't yet require any decision making or review at the agenda level from the team.

lirantal on 10 Mar 2018

I am happy either way, but the npm package looks like the better option to me: it is easier to distribute and needs minimal maintenance, since it can be automated. Happy to help with this. Do you want me to automate the npm package creation and publication?
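Automating the publication mostly comes down to deciding when to cut a new version. This sketch assumes one patch-level bump per data change (a date-based version would work just as well) and compares the newly aggregated snapshot with the last published one; `bumpPatch` and `nextRelease` are hypothetical helpers, and the actual `npm publish` step is not shown.

```typescript
// Bump the patch component of a plain "major.minor.patch" version string.
function bumpPatch(version: string): string {
  const parts = version.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${version}`);
  }
  const [major, minor, patch] = parts;
  return `${major}.${minor}.${patch + 1}`;
}

// Decide the next version to publish. Returns null when the new snapshot is
// identical to the last published one, i.e. no release is needed.
function nextRelease(lastVersion: string, lastData: string, newData: string): string | null {
  return lastData === newData ? null : bumpPatch(lastVersion);
}
```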

dgonzalez on 12 Mar 2018

https://github.com/dgonzalez/node-swg-vulnerability-fetcher -> here is the fetcher (it still needs work to automate the npm publish, which is one manual step for now)
https://github.com/dgonzalez/node-swg-vulnerabilities -> here are the vulnerabilities, published as https://www.npmjs.com/package/swg-vulnerabilities

Happy to transfer repos to the working group if we want to.

dgonzalez on 12 Mar 2018

I'm wondering how the npm package makes it easier to consume. If it just has a copy of the vuln DB files, isn't it just as easy to clone the repo? I'm probably missing something here, but thought I'd ask to better understand.

mhdawson on 12 Mar 2018

The separate module should make it easier to build versioned tooling that consumes the vulnerabilities data.

jasnell on 13 Mar 2018

It also allows us to validate the data. If the format or the repo structure changes, the tool that processes it will fail, but not the tools that consume the npm package. The Node.js security working group repository is for humans, and the npm package is to be consumed by machines. Also, I don't think the current JSON format is optimal, as it requires querying all the files to find the vulns for a single package. We should probably change it to something like:

my-npm-package.json

with a list of vulns inside. That way, we add entries directly from HackerOne, and the tool that generates the npm package aggregates the data into a single file per package.

Does it make sense?
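The per-package layout described above could be produced by a grouping pass in the generator. This TypeScript sketch assumes each entry has an `id` and a `module_name`; the `groupByModule` and `fileNameFor` helpers, and the convention of replacing the slash in scoped package names, are hypothetical.

```typescript
// Assumed shape of a single vulnerability entry.
interface VulnEntry {
  id: number;
  module_name: string;
}

// Group entries by module so each package gets one list, sorted by id.
function groupByModule(entries: VulnEntry[]): Map<string, VulnEntry[]> {
  const byModule = new Map<string, VulnEntry[]>();
  for (const entry of entries) {
    const list = byModule.get(entry.module_name);
    if (list) list.push(entry);
    else byModule.set(entry.module_name, [entry]);
  }
  byModule.forEach((list) => list.sort((a, b) => a.id - b.id));
  return byModule;
}

// File name for a module's aggregated file, e.g. "my-npm-package.json".
// Scoped names ("@scope/pkg") contain a slash, replaced here to keep one flat directory.
function fileNameFor(moduleName: string): string {
  return `${moduleName.replace("/", "__")}.json`;
}
```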

dgonzalez on 13 Mar 2018

They are indeed not contradictory (package and service API).
I agree about the current format, which is less desirable in several respects: the internal id we use for filenames and the vulnerability id, the lack of a standard for fields like recommendation, and the author field, which is currently denormalized and therefore harder to parse.

I'm not sure of the origins and requirements of the vulnerability format, but it makes sense to me to open an issue dedicated to the formatting, and also a PR adding a document that explains the fields (which is helpful for newcomers), even though in the future we might have tools for automating the generation of the vulnerability file.
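A field document like the one suggested here could be backed by a simple check in CI. In this sketch, the required field names and types are assumptions chosen for illustration (`recommendation` is mentioned in the thread; the rest of the list is not an agreed schema), and `validateEntry` is a hypothetical helper that reports problems instead of throwing.

```typescript
// Assumed required fields and their expected JSON types (illustrative only).
const REQUIRED_FIELDS: Record<string, "string" | "number"> = {
  id: "number",
  module_name: "string",
  title: "string",
  recommendation: "string",
};

// Return a list of problems; an empty list means the entry passed these checks.
function validateEntry(entry: unknown): string[] {
  if (typeof entry !== "object" || entry === null || Array.isArray(entry)) {
    return ["entry must be a JSON object"];
  }
  const record = entry as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of Object.keys(REQUIRED_FIELDS)) {
    const expected = REQUIRED_FIELDS[field];
    if (!(field in record)) problems.push(`missing field: ${field}`);
    else if (typeof record[field] !== expected) problems.push(`${field} should be a ${expected}`);
  }
  return problems;
}
```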

lirantal on 13 Mar 2018

AFAIK, the format is inherited from the archive handed over by NSP.

vdeturckheim on 14 Mar 2018

Yeah, that makes sense. The question is how flexible the format is, and whether we can make changes in a non-breaking way at all.

lirantal on 14 Mar 2018

We only have ~400 entries in the vuln database right now, so if we want to make significant changes to the format, we should do it quickly before modifying the data becomes too big of a task. I'm also not sure if anyone relies on the data in its current format, so maybe we don't need to worry about breaking anyone yet? For example, I don't know if NSP has moved to use the data from this repo or they're still using their own db.

drifkin on 14 Mar 2018

Right. And another change I'd like to make is, instead of having:

"author": "lirantal" (https://twitter.com/liran_tal)

it could be broken down into several fields in a more normalized fashion:

"author": {
  "username": "lirantal",
  "url": "https://twitter.com/liran_tal"
}

We can do this in a non-breaking way by adding another structured field, or instead parse and restructure the current author field.
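Restructuring the existing author field could be done once, with a small parser over the existing entries. This sketch assumes the current value is a single string of the form `name (url)`, such as `lirantal (https://twitter.com/liran_tal)`; `parseAuthor` is hypothetical, and values that do not match are kept as a bare `username`.

```typescript
// Structured author, following the shape proposed above.
interface Author {
  username: string;
  url?: string;
}

// Split an assumed "name (url)" string into the structured form.
function parseAuthor(raw: string): Author {
  const trimmed = raw.trim();
  // Lazy name, optional whitespace, then an http(s) URL in trailing parentheses.
  const match = /^(.*?)\s*\((https?:\/\/[^)\s]+)\)\s*$/.exec(trimmed);
  if (!match) return { username: trimmed };
  return { username: match[1], url: match[2] };
}
```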

Another thing that wouldn't scale well is the IDs and the filenames based on them. It's quite a manual task to check what the next ID should be, and also to check any open PRs to see whether an ID is already claimed by someone else. Maybe we can just use H1 (HackerOne) IDs, or possibly no IDs at all, since we don't really have a database. Another workaround could be timestamped IDs. I don't know, but I can't help thinking that the current ID system is not scalable.

lirantal on 14 Mar 2018

Agreed.

Maybe the ID could be a per-module ID? Like modulename-1, etc. It doesn't totally solve the problem, but makes collisions a lot less likely (and makes it easier to find the previous IDs).
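The per-module ID idea could be computed mechanically from the existing IDs. This sketch assumes IDs of the form `<module>-<n>`; `nextModuleId` is a hypothetical helper. As noted, it does not fully remove collisions: two open PRs for the same module could still pick the same number.

```typescript
// Hypothetical per-module ID scheme: "<module>-<n>", where n is one more than
// the highest existing number for that module. Other modules' IDs are ignored.
function nextModuleId(moduleName: string, existingIds: string[]): string {
  const prefix = `${moduleName}-`;
  let highest = 0;
  for (const id of existingIds) {
    if (!id.startsWith(prefix)) continue;
    const suffix = id.slice(prefix.length);
    // Skip IDs of modules that merely share the prefix, e.g. "foo-bar-7" for "foo".
    if (!/^\d+$/.test(suffix)) continue;
    highest = Math.max(highest, Number(suffix));
  }
  return `${prefix}${highest + 1}`;
}
```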

I think in the interest of making these entries easier to create, we could also remove the slug field (does anyone know what that's used for? NSP seems to use IDs now, like: https://nodesecurity.io/advisories/566).

drifkin on 14 Mar 2018

I'd say it is the perfect time now to make modifications before it grows legs.

dgonzalez on 15 Mar 2018

What about having a file per module with an array of vulnerabilities inside? Like:
module-name.json with content:

[
   {vuln1},
   {vuln2},
   ...
]

That way we can keep the ID for traceability purposes, and it also becomes a lot easier to check for vulns in a given module.

dgonzalez on 15 Mar 2018

I'm not a big fan of having a file per module, as we might end up with bloated files. We want it to be machine readable, but human friendly too.

vdeturckheim on 15 Mar 2018

@vdeturckheim this has been labeled security-wg-agenda for 28 days now. Does it need to remain on the agenda for today's meeting?

cjihrig on 22 Mar 2018

I think @dgonzalez has made some progress on this? He might want to discuss it at today's meeting. Removing the flag anyway.

vdeturckheim on 22 Mar 2018

Yes!

I will.

On 22 March 2018 at 13:36, Vladimir de Turckheim <notifications@github.com> wrote:

Closed #115 https://github.com/nodejs/security-wg/issues/115.

dgonzalez on 22 Mar 2018

I'm not sure about closing this. It seems some of us agreed there are improvements we can make to the vuln format, so do we want to track them in a new issue?

lirantal on 22 Mar 2018

@lirantal I'm not sure why I closed it. Most likely a mistake on my side.

vdeturckheim on 13 Apr 2018

Is this still an open issue, or do the changes to the structure of the vuln DB address it?

sam-github on 16 Apr 2019

I think we can close this and link it to the issue that @bnb opened.
There's also related work on this by @vdeturckheim on the Algolia search.

lirantal on 17 Apr 2019
