The current state of a report is not fully normalized which makes it hard to parse it as machine readable format. Moreover, the report has mixed formatting, for example: a newline indicated by \n and code emphasis indicated by markdown as ``
The weakness description of the vulnerability.
Always a clear-text string, no formatting allowed.
Markdown parsable text
Name would be a full name, username is as set in the hackerone report, and website as a social reference for the user (will be taken from the hackerone platform).
Author field will consist of an object with the following fields:
author: {
name: "",
username: "",
website: ""
}
Instead of a string, this should be an array of strings:
references: [
"https://hackerone.com/...",
"https://github.com/..."
]
Probably refers to the nodejs, or nsp blog where the vulnerability was disclosed.
Recommending to remove this entry.
I'm wondering if we can have some status enum thing here, where the path is easily understood such as either you should upgrade, or drop it completely due to no fix available. Something like:
recommendation: "UPGRADE|DROP|any other status we can have?"
If not this, then we should have a formatted string that follows a standard, such as:
{
"id": 399,
"title": "Command Injection - Generic",
"overview": "`whereis` concatenates unsanitized input into exec() command",
"created_at": "2018-02-25",
"updated_at": "2018-03-28",
"publish_date": "2018-03-28",
"author": "小泻芯胁芯褉芯写邪 袧懈泻懈褌邪 袗薪写褉械械胁懈褔 (https://github.com/ChALkeR)",
"module_name": "whereis",
"cves": [
""
],
"vulnerable_versions": "<=0.4.0",
"patched_versions": ">=0.4.1",
"slug": "whereis-command-injection---generic",
"recommendation": "use npm package `which` instead",
"references": "- https://hackerone.com/reports/319476\n- https://github.com/vvo/node-whereis/commit/0f64e3780235004fb6e43bfd153ea3e0e210ee2b",
"cvss_vector": "CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H",
"cvss_score": 9.9,
"coordinating_vendor": ""
}
A reference to the proposed changes:
{
"id": 399,
"title": "Command Injection - Generic",
"overview": "`whereis` concatenates unsanitized input into `exec()` command",
"created_at": "2018-02-25",
"updated_at": "2018-03-28",
"publish_date": "2018-03-28",
"author": {
"name": "小泻芯胁芯褉芯写邪 袧懈泻懈褌邪 袗薪写褉械械胁懈褔",
"username": "ChALkeR",
"website": "https://github.com/ChALkeR"
},
"module_name": "whereis",
"cves": [
],
"vulnerable_versions": "<=0.4.0",
"patched_versions": ">=0.4.1",
"recommendation": "DROP: the package and use npm package `which` instead",
"references": [
"https://hackerone.com/reports/319476",
"https://github.com/vvo/node-whereis/commit/0f64e3780235004fb6e43bfd153ea3e0e210ee2b"
],
"cvss_vector": "CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H",
"cvss_score": 9.9,
"coordinating_vendor": ""
}
slug I think refers to the nodejs blog where the vulnerability was disclosed. This often has more info so a link is useful, although that could fit in the references.
Is this for core vulnerabilities or those in 3rd party modules?
Yeah, so I think the slug field can be removed entirely.
The scope of this issue is currently only ecosystem vulns. The core node vulns have a different format entirely, which we can consider aligning up to this but I'm not sure what it will break if we do so.
I'd agree that for ecosystem vulns, dropping slug and using the references makes the most sense to me.
LGTM.
Also, when we have multiple versions range to handle, machine readability is not optimal.
Maybe having versions related field as
"vulnerable_versions": [
{"min": "=0.0.0", "max": "=4.0.0"} // the "=" means that the version is in the range
{"min": "5.0.0", "max": "=5.1.0"}
]
would be clearer.
@vdeturckheim I'm not sure if this further simplifies it or not.
it is however more machine-friendly I guess.
@greysteil would be great if you could chime in here with feedback regarding your recent changes on the formatting through out the vulns (i.e: https://github.com/nodejs/security-wg/pull/216)
@lirantal yeah, parsing versions ranges can be pretty complicated sometimes. Mostly trying to make my life easier here ^^
Interesting one. On requirement ranges I would go with either:
[">= 1.0.0 < 1.3.2", "^2.1.2"] would mean either ">= 1.0.0 < 1.3.2" or "^2.1.2").|| as an or separator, as specified in JS's semver setup (read the bit about || here).I think my preference is for (2), given that JS has support for that. Actually, looking at it, (2) is way cleverer than it needs to be. My recommendation would be to use the smallest amount of the language possible and go for (1).
(I mess about with these in a lot of languages in Dependabot. Everyone does it differently, but the || requirement is pretty easy to massage into other formats. If you're working in Ruby, I do it here.)
On the other suggestions, I think it's going to be easier to discuss these over individual PRs that change the validations. For starters, I'll put in PRs for:
title: validate that it is fewer than 100 characters, with no new lines (or something like that)slug: drop itdates: make created_at and updated_at YYYY-MM-DD (this one needs some discussion)I think https://github.com/nodejs/security-wg/issues/171 can be closed in favour of this PR.
The changes proposed until now (checked are merged already):
I took the liberty updated the reference provided in the issue with the proposed and applied changes:
{
"id": 399,
"title": "Command Injection - Generic",
"overview": "`whereis` concatenates unsanitized input into `exec()` command",
"created_at": "2018-02-25",
"updated_at": "2018-03-28",
"publish_date": "2018-03-28",
"author": {
"name": "小泻芯胁芯褉芯写邪 袧懈泻懈褌邪 袗薪写褉械械胁懈褔",
"username": "ChALkeR",
"website": "https://github.com/ChALkeR"
},
"module_name": "whereis",
"cves": [
],
"vulnerable_versions": "<=0.4.0",
"patched_versions": ">=0.4.1",
"recommendation": "DROP: the package and use npm package `which` instead",
"references": [
"https://hackerone.com/reports/319476",
"https://github.com/vvo/node-whereis/commit/0f64e3780235004fb6e43bfd153ea3e0e210ee2b"
],
"cvss_vector": "CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H",
"cvss_score": 9.9,
"coordinating_vendor": ""
}
Overall those changes make all sense to me and would make working with the npm now and in the future way easier.
What needs some final discussion imoh:
After working through the PRs I think this should not be too much work, and I would like to spend some time on this.
WDYT?
@pxlpnk I like that. What would be your take on version ranges format?
@vdeturckheim I like the [">= 1.0.0 < 1.3.2", "^2.1.2"] version more. It is a basic data type and does not need extra parsing as || would.
However I have no strong opinion on this.
The two versions:
2: [">= 1.0.0 < 1.3.2", "^2.1.2"]
2: ">= 1.0.0 || < 1.3.2 || ^2.1.2"
@greysteil Please correct me if I understood it wrong.
Agree on avoiding ||. In your example above, the second one would be ">= 1.0.0 < 1.3.2 || ^2.1.2", because npm treats the space as an AND which is what we want there. Avoiding using || avoids that kind of complexity entirely 馃檪
I also think || is rather confusing.
So what are the next steps for this then?
@pxlpnk would you have time to open a PR to iterate uppon?
I have time to work on this and move it forward.
I try to do it in the following order:
and will propose individual PRs for this.
@pxlpnk do you mind if I edit the original issue with your summary and proposed final format so we have one source of reference instead of having to scroll for it in the comments?
Please go ahead. 馃憤
@lirantal @vdeturckheim I proposed two PRs with changes for the author and references fields.
Please review them when you have some time and let me know if this is the direction we should go.
At the moment this is mostly about data hygiene and getting the data into a good machine-readable format.
@nodejs/security-wg we'd like to move forward with 2 PRs to be merged and update the vulnerability db format according with the suggestions made in this issue, specifically they are:
We'd like to ask your feedback with regards to any tooling or otherwise dependency around the db incase this may affect you.
** This has been open for a while (both PRs and this issue) so we'll merge in a week time if there are no objections.
After the PRs will be merged you should expect the following format to be applied:
Name would be a full name, username is as set in the hackerone report, and website as a social reference for the user (will be taken from the hackerone platform).
Author field will consist of an object with the following fields:
author: {
name: "",
username: "",
website: ""
}
Instead of a string, this should be an array of strings:
references: [
"https://hackerone.com/...",
"https://github.com/..."
]
@lirantal the changes are ok with me provided we are not going to cause undue breakage on those using the existing format.
Well we some what are... we'll be updating retroactively all the vuln db to the new format.
So if a tool used the vuln db and expected that references is a string with end of line control chars inside, then now it will be a JSON array with each reference URL.
@mhdawson How do you suggest providing an upgrade path could work here?
Changes to the data structure are something that I can and will happen in the future. I think it makes sense to come up with a strategy to support this in the future.
@pxlpnk I'm not necessarily suggesting an upgrade path, it is more about being comfortable that we've done our due diligence around understanding who uses the existing data/format, understanding if they are concerned with the change and, if they are, considering what we can do to avoid undue impact to them.
For example instead of just throwing the switch maybe we need to publicise, give people time to react etc. https://github.com/nodejs/security-wg/issues/200#issuecomment-399733312 is a start at that, but I wonder if we need to do broader messaging (blogs, tweets etc.). If we know there are npm modules using then reach out to maintainers directly etc. We might also need to give people more time to get ready to adjust.
I don't think I know enough about the users of the existing data to make a good call on what we should do in advance.
I agree a strategy that is well-documented helps. It can set expectations about how we'll communicate changes will be made, how much lead time we'll give consumers (as they will need time to make adjustments).
Agree with Michael on this. We've had people chime in on building toolset around it before on the issue queue and sharing it so we know some people are definitely depended on it.
By the way, this connects to another topic we raised in the last working group - if you remember I mentioned that we can have a "who uses our db" kind of page, and while suggesting it at that time for better engagement and evangelism I now see it being useful for also understanding whom are our consumers. I definitely think now it is worth pursuing this and would love to get at it.
Some key things in the strategy might be (just off the top of my head):
When changes are planned to the database format we will communicate through:
The community will include the date the format of the change to the data will be made.
The first communication will be sent out at least 2 months before the scheduled change and will be repeated every 2 weeks.
@lirantal getting a list of users, for notifications is definitely a good idea.
@mhdawson This seems all very reasonable to me. I am happy to take this on drive this forward.
I think what we should do here is:
Another approach would be to try versioning the format. However, I have not too much of an idea how this could be done.
Can any of the consumers of the data here maybe chip in and let us know what would work for them.
I鈥檓 fine with just being notified, but that鈥檚 because Dependabot (my app) is just SaaS. Imagine if any other data users are working with CLIs etc they鈥檒l want versioning. Actually, if I was building a deployed app that used this data, I'd definitely build a web proxy for it, which is where I'd handle changes in format. I think notifications should be totally fine - no need for versioning.
@pxlpnk +1 to the next steps you proposed.
Regardless to that process we can probably continue with the proposed changes (one of which was merged) in a few days if no one jumps here to say otherwise.
@mhdawson do you have any strong opinions on getting https://github.com/nodejs/security-wg/pull/314 merged soonish?
@lirantal, not sure why #314 could progress without following a good process to make sure we are not breaking anybody? I was thinking we should notify/follow what we believe will be the process in the future.
In https://github.com/nodejs/security-wg/issues/200 which was already opened several months ago we touched all of these updates, and @pxlpnk's PR is just applying them. I would assume any significant service/user of the DB will monitor our issue queue and jump in already with all of the changes we've made so far (i.e: enforcing nulls instead of empty strings, and so on, which are breaking changes as well).
I do agree with you, and maybe we can use #314 as a "use case" to outline a process of db changes (cc @pxlpnk this might be where you could jump in and formalize a new process), and follow it through.
@lirantal I added an announcement issue https://github.com/nodejs/security-wg/issues/345
After this one has been exercised I will draft a document and propose it.
Would it also make sense to update the header on the Node.js website to help get the message out?
I also think we want to build a list of db users can we contact directly as well.
I'm not sure what's the policy on that (the Node.js website), but definitely any formal means of announcing such change would make it easier for subscribers to get alerts.
At the moment the only thing I can think of is the repo's issue queue and tweeting from our own personal accounts.
Hello guys !
Please do check the pull request I opened recently for the markdown parsable overview .
Currently working on the format of the vulnerable versions. https://github.com/nodejs/security-wg/pull/408
@lirantal, you have done a lot of work on the vuln json format, can this be closed, or is it still ongoing?
Let's leave open so I can further review the remaining open items
Most helpful comment
I have time to work on this and move it forward.
I try to do it in the following order:
and will propose individual PRs for this.