Looking at the simple package index, there are a number of highly questionable packages (at least judging by their names).
Packages without proper names, authors, or descriptions should probably be removed, if not to reduce bloat, then for security reasons.
Stuff like this:
There are almost 200K projects on PyPI. We don't have the ability to manually audit each one. How do you propose this should be done?
> There are almost 200K projects on PyPI
Exactly! -- And probably 99.9% useless, outdated, fake, deprecated (at best), or possibly containing malware, at worst!
> How do you propose this should be done?
:) We are programmers so I'm sure we can figure that out!
How about searching for packages that:
That's just a start... and would probably remove a siht load of crud.
It would definitely be interesting to make such a search to see just how many hits we'd get.
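As a rough illustration of what such a search could look like, here is a minimal sketch that flags packages with a missing author, a missing description, or a name with no letters in it. It assumes each package is represented as a dict shaped like the `info` object of PyPI's JSON API (keys `name`, `author`, `summary`, `description`). The function names and the exact rules are hypothetical, not anything PyPI actually runs.

```python
def metadata_red_flags(info):
    """Return the reasons a package's metadata looks incomplete.

    `info` is assumed to be a dict with optional "name", "author",
    "summary" and "description" keys; missing or None values count as empty.
    """
    name = (info.get("name") or "").strip()
    author = (info.get("author") or "").strip()
    summary = (info.get("summary") or "").strip()
    description = (info.get("description") or "").strip()

    flags = []
    # Hypothetical rule: a "proper" name has at least two characters and one letter.
    if len(name) < 2 or not any(ch.isalpha() for ch in name):
        flags.append("name has no letters or is a single character")
    if not author:
        flags.append("no author")
    if not summary and not description:
        flags.append("no summary or description")
    return flags


def find_suspects(packages):
    """Map package name -> list of red flags, for packages with at least one flag."""
    suspects = {}
    for info in packages:
        flags = metadata_red_flags(info)
        if flags:
            suspects[info.get("name") or "<unnamed>"] = flags
    return suspects


if __name__ == "__main__":
    # Made-up sample records, not real PyPI projects.
    sample = [
        {"name": "examplepkg", "author": "Jane Doe", "summary": "Does a thing."},
        {"name": "0", "author": "", "summary": "", "description": None},
    ]
    for pkg, reasons in find_suspects(sample).items():
        print(pkg, "->", "; ".join(reasons))
```

A hit from a check like this only means "worth a closer look". Plenty of legitimate packages have sparse metadata, so it would feed a review queue rather than trigger removal.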
Another related issue is that there seems to be some kind of cybersquatting of package names going on there as well: packages with little or meaningless content that occupy useful names.
How do you plan to deal with that?
> Another related issue is that there seems to be some kind of cybersquatting of package names going on there as well: packages with little or meaningless content that occupy useful names.
> How do you plan to deal with that?
See PEP 541 and #1506.
Thanks for filing this issue, @E3V3A!
Per discussion today, we'll be addressing this problem during upcoming work on automated detection of malicious uploads. In this issue we'll be nailing down our criteria for "how do we determine what is a bad package?" and plans for removing those packages.
(Note that we're distinguishing between a malicious upload and spam, and between malware and typosquatting, and that there are other issues -- like #194, #4319 and #4004 -- that concentrate on filtering re: packages that have noncompliant metadata or no recent releases.)
Per a discussion with @ewdurbin last week:
The work we'll do on automated detection of malicious uploads will first concentrate on _finding_ malicious packages, and building the tools around that. Only after that will we be able to provide automated tools to help PyPI admins _remove_ them.
From #7061:
What's the problem this feature will solve?
Malicious and insecure packages are a challenge in the open source community. Malicious packages have been removed several times in the last few years. Improved automated auditing techniques would make it easier for security specialists to quickly remove malicious packages. Smart bad actors would be able to use the same test suite, certainly, but it would at minimum allow for the vetting of existing packages. Likewise, this would set up an automated process which could be enhanced over time.
Describe the solution you'd like
Python's exec() function is not secure and may be a good heuristic for finding malicious packages. There may be other additional heuristics that make a package appear more suspicious, and a likely target for manual auditing. Add a badge or other indicator for packages that pass/fail these tests.
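To make the exec() heuristic concrete, here is a minimal sketch using the standard library's `ast` module to list direct calls to `exec` in a source file. Including `eval` in the watch list is an assumption added for illustration, and the function name `find_dynamic_exec` is hypothetical. This is not the check that was eventually implemented.

```python
import ast

# exec() comes from the proposal above; eval() is an extra assumption.
SUSPICIOUS_CALLS = {"exec", "eval"}


def find_dynamic_exec(source, filename="<unknown>"):
    """Return sorted (line number, function name) pairs for direct exec/eval calls.

    Raises SyntaxError if `source` cannot be parsed by the running
    interpreter (for example, Python 2 only code).
    """
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        # Only plain calls such as exec(...) are matched; aliased or
        # attribute-based calls (builtins.exec, getattr tricks) are missed.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in SUSPICIOUS_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


if __name__ == "__main__":
    code = (
        "import base64\n"
        "exec(base64.b64decode(b'cHJpbnQoMSk='))\n"
        "x = eval('1+1')\n"
        "print('exec')\n"
    )
    for lineno, name in find_dynamic_exec(code):
        print(f"line {lineno}: call to {name}()")
```

Because the code is only parsed and never run, scanning an untrusted upload this way does not execute it. The heuristic is easy to evade, so a hit is better treated as a signal for manual auditing than as proof of malice.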
I'm very interested in this effort and would like to help. Given that there are so many packages, here are a few suggestions:
Hello friends! I will be working on the backend implementation of the system for adding malware checks. You can track the progress of this work by checking out the malware-detection label.
Hey everyone.
We are currently working on a proof of concept at GitHub to detect malicious code in package managers.
We are currently setting up an environment to run our tests, but our first step is to use a static analysis tool, CodeQL, to model the way certain backdoors work so that we can detect them as they get included in PyPI.
@xmunoz I'm excited about this work! Will we be able to discuss it with you at PyCon and/or help improve it during the sprints?
Yes, absolutely! I'm actually giving a charla about this system at PyCon, but for interested non-Spanish speakers, I can give the English version during the sprints. Also, I'd really love to get feedback on this contribution documentation, and this sounds like a great way to do that.
@xmunoz Are there any slides of that charla?
Do you guys maintain a database of previous backdoors/malware introduced to PyPI? I have slowly started building my own collection, and I would love to expand it.
For the first question, I'll follow up over email :)
The second question could potentially be answered by @ewdurbin.
The malware-detection branch has been merged into master with PR #7377.