According to pypistats, awscli and its dependencies are the most downloaded packages. I tried to investigate who downloads awscli from PyPI so much.
I found a very interesting result. It seems awscli is downloaded heavily from Amazon Linux 1.
date|kernel|downloads
---|---|---
2019-05-14 | 4.14.77-70.59.amzn1.x86_64 | 244827
2019-05-14 | 4.4.23-31.54.amzn1.x86_64 | 55211
2019-05-15 | 4.14.77-70.59.amzn1.x86_64 | 168414
2019-05-15 | 4.14.114-83.126.amzn1.x86_64 | 74483
2019-05-16 | 4.14.114-83.126.amzn1.x86_64 | 208952
2019-05-16 | 4.4.23-31.54.amzn1.x86_64 | 63206
2019-05-17 | 4.14.114-83.126.amzn1.x86_64 | 206870
2019-05-17 | 4.4.23-31.54.amzn1.x86_64 | 64965
... | ... | ...
2019-06-17 | 4.14.114-83.126.amzn1.x86_64 | 211850
2019-06-17 | 4.4.23-31.54.amzn1.x86_64 | 56809
2019-06-18 | 4.14.123-86.109.amzn1.x86_64 | 167728
2019-06-18 | 4.14.114-83.126.amzn1.x86_64 | 67278
... | ... | ...
2019-06-25 | 4.14.123-86.109.amzn1.x86_64 | 234755
2019-06-25 | 4.4.23-31.54.amzn1.x86_64 | 66793
I suspect that this huge number of downloads does not come from regular EC2 users because:
I'm sorry if I am wrong, but could you confirm whether some AWS service based on Amazon Linux 1 runs `pip install awscli` with a very old pip (6.1.1), about 200k times/day?
| date | kernel | python | pip |
|------|--------|--------|-----|
| 2019-06-18~ | 4.14.123-86.109.amzn1.x86_64 | 2.7.16 | 6.1.1 |
| 2019-05-15~2019-06-17 | 4.14.114-83.126.amzn1.x86_64 | 2.7.16 | 6.1.1 |
| 2018-11-20~2019-05-15 | 4.14.77-70.59.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| 2018-08-17~2018-11-20 | 4.14.62-65.117.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| 2018-05-18~2018-08-21 | 4.14.33-51.37.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| ~2018-05-14 | 4.14.26-46.32.amzn1.x86_64 | 2.7.13 | 6.1.1 |
It seems very strange. I suspect these huge downloads come from AWS itself or from a very large company's system.
I found that the "CloudWatch Logs Agent" downgrades pip to 6.1.1 and installs awscli!
I ran this query on BigQuery:
SELECT
details.system.release,
COUNT(*) AS cnt
FROM
[the-psf:pypi.downloads20190709]
WHERE
file.project = "pip"
AND file.version = "6.1.1"
AND details.implementation.version = "2.7.16"
GROUP BY
details.system.release
ORDER BY
cnt DESC
Result:
details_system_release | cnt
-- | --
4.14.123-86.109.amzn1.x86_64 | 195311
4.14.109-80.92.amzn1.x86_64 | 3578
4.9.27-14.33.amzn1.x86_64 | 2348
Bingo! About 200k downloads!
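For anyone who wants to reproduce this kind of check from a script, here is a minimal Python sketch. It assumes the `google-cloud-bigquery` package is installed and credentials are configured, and that the daily tables follow the `the-psf.pypi.downloadsYYYYMMDD` naming used in this thread. It uses standard SQL rather than the legacy syntax of the query above, and the names `build_query` and `run_query` are mine, not part of any tool.

```python
import datetime


def build_query(day):
    """Return a standard SQL query counting downloads per kernel release."""
    # Table naming is assumed from the queries in this thread.
    table = "the-psf.pypi.downloads{:%Y%m%d}".format(day)
    return (
        "SELECT details.system.release AS kernel_release, COUNT(*) AS cnt\n"
        "FROM `{}`\n"
        "WHERE file.project = @project\n"
        "  AND file.version = @version\n"
        "  AND details.implementation.version = @python_version\n"
        "GROUP BY kernel_release\n"
        "ORDER BY cnt DESC"
    ).format(table)


def run_query(day):
    """Run the query and return (kernel_release, cnt) tuples."""
    # Imported lazily so build_query works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("project", "STRING", "pip"),
            bigquery.ScalarQueryParameter("version", "STRING", "6.1.1"),
            bigquery.ScalarQueryParameter("python_version", "STRING", "2.7.16"),
        ]
    )
    rows = client.query(build_query(day), job_config=config).result()
    return [(row["kernel_release"], row["cnt"]) for row in rows]
```

Calling `run_query(datetime.date(2019, 7, 9))` should correspond to the query above, provided that table is still available.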
I created a pull request to update the doc to use standalone mode.
But many users already use the online install in their UserData.
Would you update the `awslogs-agent-setup.py` file to download dependencies from S3, not PyPI?
Thanks for digging into this so much. I have raised this internally with the CloudWatch Logs team.
FWIW, the download count of awscli is still huge even when excluding downloads from `awslogs-agent-setup.py`.
It would be helpful if the docs recommended the bundled installer over `pip install`.
`sudo pip install awscli` may conflict with system packages. The bundled installer is easier than manually setting up a virtual environment. Additionally, users get frozen dependency libraries, so a broken library update or a PyPI outage doesn't affect their server provisioning.
So the bundled installer is much better than pip for regular sysadmins.
A gentle ping on this. Any updates?
Pinging again, to see if folks are interested in taking this forward.
May I write a patch for `awslogs-agent-setup.py` to download files from S3?
I found that https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py has been updated to download dependencies from 'https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/AgentDependencies.tar.gz'.
I will close this issue later this week, after I confirm the numbers on PyPI Stats.
Confirmed.
It affects the Python 2 vs 3 ratio for some packages. For example, these are the download stats of urllib3.
Thank you for fixing this.
I found there are still a huge number of downloads from pip 6.1.1.
Is there any installer like `awslogs-agent-setup.py` but for awscli?
Query:
SELECT
file.project AS proj,
COUNT(*) AS cnt
FROM
`the-psf.pypi.downloads20200128`
WHERE
details.installer.name = "pip"
and details.installer.version = "6.1.1"
GROUP BY
proj
ORDER BY
cnt DESC
Result:
| proj | cnt |
| -- | -- |
| botocore | 188069 |
| s3transfer | 184599 |
| urllib3 | 181705 |
| awscli | 179487 |
| six | 174167 |
| python-dateutil | 173112 |
| docutils | 172611 |
| pyasn1 | 170876 |
| jmespath | 169021 |
| colorama | 168216 |
| rsa | 167941 |
| pyyaml | 166260 |
| futures | 163741 |
| simplejson | 128451 |
| argparse | 128146 |
| ordereddict | 126778 |
| awscli-cwlogs | 25806 |
| boto3 | 25099 |
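One rough way to read this result, sketched in Python: if we assume that `awscli-cwlogs` downloads are a proxy for CloudWatch Logs agent installs (my assumption, not something confirmed in this thread), then comparing its count with `awscli` suggests how much of the remaining pip 6.1.1 traffic the agent could explain. The counts are copied from the table above; `share_of` is a hypothetical helper name.

```python
# Counts copied from the result table above (pip 6.1.1, 2020-01-28).
COUNTS = {
    "awscli": 179487,
    "awscli-cwlogs": 25806,
    "boto3": 25099,
}


def share_of(counts, project, baseline):
    """Return downloads of `project` as a fraction of `baseline` downloads."""
    return counts[project] / counts[baseline]


if __name__ == "__main__":
    ratio = share_of(COUNTS, "awscli-cwlogs", "awscli")
    print("awscli-cwlogs / awscli = {:.1%}".format(ratio))
```

Under that assumption, only about one in seven awscli downloads from pip 6.1.1 would come from the agent setup, which is why I am asking whether another installer behaves the same way.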