Aws-cli: Huge download number from Amazon Linux 1

Created on 27 Jun 2019 · 11 Comments · Source: aws/aws-cli

According to pypistats, awscli and its dependencies are the most downloaded packages. I tried to find out who downloads awscli from PyPI so much.

I found a very interesting result. It seems awscli is downloaded very heavily from Amazon Linux 1.

date|kernel|downloads
---|---|---
2019-05-14 | 4.14.77-70.59.amzn1.x86_64 | 244827
2019-05-14 | 4.4.23-31.54.amzn1.x86_64 | 55211
2019-05-15 | 4.14.77-70.59.amzn1.x86_64 | 168414
2019-05-15 | 4.14.114-83.126.amzn1.x86_64 | 74483
2019-05-16 | 4.14.114-83.126.amzn1.x86_64 | 208952
2019-05-16 | 4.4.23-31.54.amzn1.x86_64 | 63206
2019-05-17 | 4.14.114-83.126.amzn1.x86_64 | 206870
2019-05-17 | 4.4.23-31.54.amzn1.x86_64 | 64965
... | ... | ...
2019-06-17 | 4.14.114-83.126.amzn1.x86_64 | 211850
2019-06-17 | 4.4.23-31.54.amzn1.x86_64 | 56809
2019-06-18 | 4.14.123-86.109.amzn1.x86_64 | 167728
2019-06-18 | 4.14.114-83.126.amzn1.x86_64 | 67278
... | ... | ...
2019-06-25 | 4.14.123-86.109.amzn1.x86_64 | 234755
2019-06-25 | 4.4.23-31.54.amzn1.x86_64 | 66793

I suspect that this huge number of downloads does not come from regular EC2 users, because:

  • Although Amazon Linux 2 was released a year ago, downloads from Amazon Linux 1 are not decreasing.
  • Downloads from Amazon Linux 1 seem much higher than downloads from Ubuntu, even though Ubuntu is popular too.

I'm sorry if I am wrong, but could you confirm whether some AWS service based on Amazon Linux 1 runs pip install awscli with a very old pip (6.1.1), about 200k times/day?

All 11 comments

| date | kernel | python | pip |
|------|--------|--------|-----|
| 2019-06-18~ | 4.14.123-86.109.amzn1.x86_64 | 2.7.16 | 6.1.1 |
| 2019-05-15~2019-06-17 | 4.14.114-83.126.amzn1.x86_64 | 2.7.16 | 6.1.1 |
| 2018-11-20~2019-05-15 | 4.14.77-70.59.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| 2018-08-17~2018-11-20 | 4.14.62-65.117.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| 2018-05-18~2018-08-21 | 4.14.33-51.37.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| ~2018-05-14 | 4.14.26-46.32.amzn1.x86_64 | 2.7.13 | 6.1.1 |

  • pip 6.1.1 is very old, and the combination of Python 2.7.16, pip 6.1.1, and an amzn1 kernel is uncommon. For example, the latest Amazon Linux 1 preinstalls Python 2.7.16 and pip 9.0.3, while Amazon Linux 1 2017.03 preinstalls Python 2.7.12 and pip 6.1.1.
  • Python and the kernel were updated several times, each time within one or two days.
  • 200k downloads/day, even on weekends.

It seems very strange. I suspect these huge download numbers come from AWS itself or from a very large company's system.
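The quick kernel switchover can be checked against the table in the first post. Here is a minimal Python sketch that finds the days on which the most-downloaded amzn1 kernel changes. The rows are copied from that table; the function names are my own, and the table only lists the top two kernels for selected days, so this is an illustration rather than a full analysis.

```python
# (date, kernel release, downloads) rows copied from the table in the first post.
ROWS = [
    ("2019-05-14", "4.14.77-70.59.amzn1.x86_64", 244827),
    ("2019-05-14", "4.4.23-31.54.amzn1.x86_64", 55211),
    ("2019-05-15", "4.14.77-70.59.amzn1.x86_64", 168414),
    ("2019-05-15", "4.14.114-83.126.amzn1.x86_64", 74483),
    ("2019-05-16", "4.14.114-83.126.amzn1.x86_64", 208952),
    ("2019-05-16", "4.4.23-31.54.amzn1.x86_64", 63206),
    ("2019-05-17", "4.14.114-83.126.amzn1.x86_64", 206870),
    ("2019-05-17", "4.4.23-31.54.amzn1.x86_64", 64965),
    ("2019-06-17", "4.14.114-83.126.amzn1.x86_64", 211850),
    ("2019-06-17", "4.4.23-31.54.amzn1.x86_64", 56809),
    ("2019-06-18", "4.14.123-86.109.amzn1.x86_64", 167728),
    ("2019-06-18", "4.14.114-83.126.amzn1.x86_64", 67278),
    ("2019-06-25", "4.14.123-86.109.amzn1.x86_64", 234755),
    ("2019-06-25", "4.4.23-31.54.amzn1.x86_64", 66793),
]


def top_kernel_per_day(rows):
    """Return {date: (kernel, downloads)} for the most-downloaded kernel each day."""
    best = {}
    for day, kernel, downloads in rows:
        if day not in best or downloads > best[day][1]:
            best[day] = (kernel, downloads)
    return best


def switchover_dates(rows):
    """Return (date, old_kernel, new_kernel) whenever the top kernel changes."""
    best = top_kernel_per_day(rows)
    changes = []
    previous = None
    # ISO dates sort correctly as plain strings.
    for day in sorted(best):
        kernel = best[day][0]
        if previous is not None and kernel != previous:
            changes.append((day, previous, kernel))
        previous = kernel
    return changes


for day, old, new in switchover_dates(ROWS):
    print(f"{day}: {old} -> {new}")
```

In both switchovers the new kernel first appears in second place one day and is already on top the next listed day, which is what I would expect from a centrally managed image rather than from many independent users.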

I found that the "CloudWatch Logs Agent" downgrades pip to 6.1.1 and installs awscli!

I ran this query on BigQuery:

SELECT
  details.system.release,
  COUNT(*) AS cnt
FROM
  [the-psf:pypi.downloads20190709]
WHERE
  file.project = "pip"
  AND file.version = "6.1.1"
  AND details.implementation.version = "2.7.16"
GROUP BY
  details.system.release
ORDER BY
  cnt DESC

Result:

details_system_release | cnt
-- | --
4.14.123-86.109.amzn1.x86_64 | 195311
4.14.109-80.92.amzn1.x86_64 | 3578
4.9.27-14.33.amzn1.x86_64 | 2348

Bingo! About 200k DL!
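To repeat this check for other days, the query can be generated from a date. This is a minimal sketch: `build_release_query` is a hypothetical helper of my own, it emits the standard SQL form with a backtick-quoted `the-psf.pypi.downloadsYYYYMMDD` table name instead of the legacy SQL bracket form used above, and it only builds the query text. Running it still needs a BigQuery client and credentials, which are not shown here.

```python
from datetime import date


def build_release_query(day: date,
                        pip_version: str = "6.1.1",
                        python_version: str = "2.7.16") -> str:
    """Build the per-kernel download count query for one daily table.

    The version arguments are interpolated directly into the SQL text,
    so only pass trusted constants (this is a sketch, not production code).
    """
    # Daily tables follow the downloadsYYYYMMDD naming used in this thread.
    table = f"the-psf.pypi.downloads{day:%Y%m%d}"
    return f"""
SELECT
  details.system.release,
  COUNT(*) AS cnt
FROM
  `{table}`
WHERE
  file.project = "pip"
  AND file.version = "{pip_version}"
  AND details.implementation.version = "{python_version}"
GROUP BY
  details.system.release
ORDER BY
  cnt DESC
""".strip()


print(build_release_query(date(2019, 7, 9)))
```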

I created a pull request to update the doc to use standalone mode.

But many users already use the online install in their UserData.
Would you update the awslogs-agent-setup.py file to download dependencies from S3, not PyPI?

Thanks for digging into this so much. I have raised this internally with the CloudWatch Logs team.

FWIW, the download count of awscli is still huge even when excluding downloads from awslogs-agent-setup.py.
It would be helpful if the bundled installer were recommended over pip install.

sudo pip install awscli may conflict with system packages. The bundled installer is easier than manually setting up a virtual environment. Additionally, users get frozen dependency libraries, so a broken library update or a PyPI outage doesn't affect their server provisioning.

So the bundled installer is much better than pip for regular sysadmins.

A gentle ping on this. Any updates?

Pinging again, to see if folks are interested in taking this forward.

May I write a patch for awslogs-agent-setup.py to download files from S3?

I found that https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py has been updated to download dependencies from 'https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/AgentDependencies.tar.gz'.

I will close this issue later this week, after I confirm the numbers on PyPIStats.

Confirmed.

image

It affects the Python 2 vs 3 ratio of some packages. For example, these are the download stats of urllib3.

image

Thank you for fixing this.

I found there is still a huge number of downloads from pip 6.1.1.
Is there any installer like awslogs-agent-setup.py but for awscli?

query:

SELECT
  file.project AS proj,
  COUNT(*) AS cnt
FROM
  `the-psf.pypi.downloads20200128`
WHERE
  details.installer.name = "pip"
  and details.installer.version = "6.1.1"
GROUP BY
  proj
ORDER BY
  cnt DESC

Result:

| proj | cnt |
| -- | -- |
| botocore | 188069 |
| s3transfer | 184599 |
| urllib3 | 181705 |
| awscli | 179487 |
| six | 174167 |
| python-dateutil | 173112 |
| docutils | 172611 |
| pyasn1 | 170876 |
| jmespath | 169021 |
| colorama | 168216 |
| rsa | 167941 |
| pyyaml | 166260 |
| futures | 163741 |
| simplejson | 128451 |
| argparse | 128146 |
| ordereddict | 126778 |
| awscli-cwlogs | 25806 |
| boto3 | 25099 |
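A rough way to read this table, as a minimal Python sketch: it assumes (my assumption, not something confirmed in this thread) that every awslogs-agent-setup.py run that still goes through PyPI downloads awscli-cwlogs once along with awscli. Under that assumption, the awscli downloads that are not paired with awscli-cwlogs must come from somewhere else.

```python
# Counts copied from the result table above (pip 6.1.1, 2020-01-28).
COUNTS = {
    "awscli": 179487,
    "awscli-cwlogs": 25806,
}

# Assumption: one awscli-cwlogs download per awslogs-agent-setup.py run,
# so subtracting it leaves awscli downloads from some other installer.
remaining = COUNTS["awscli"] - COUNTS["awscli-cwlogs"]
share = remaining / COUNTS["awscli"]

print(f"awscli downloads not explained by awscli-cwlogs: {remaining} ({share:.1%})")
```

So even if all awscli-cwlogs downloads are attributed to the CloudWatch Logs agent setup, most of the pip 6.1.1 downloads of awscli are left unexplained, which is why I am asking about another installer.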
