When integrating pandas CSV reading from S3 for local development (in Docker) with LocalStack or Minio containers, we need to be able to define a custom host as well as a custom port.
PR #12198 introduces the AWS_S3_HOST environment variable; I propose adding an AWS_S3_PORT variable as well. Something like:
import os
import boto

s3_host = os.environ.get('AWS_S3_HOST', 's3.amazonaws.com')
s3_port = os.environ.get('AWS_S3_PORT')
# boto expects an integer port, so convert the variable when it is set
s3_port = int(s3_port) if s3_port is not None else None
try:
    conn = boto.connect_s3(host=s3_host, port=s3_port)
except boto.exception.NoAuthHandlerFound:
    # no credentials available, fall back to an anonymous connection
    conn = boto.connect_s3(host=s3_host, port=s3_port, anon=True)
This would allow defining something like the following in docker-compose.yml, using Minio to serve the CSV files from a local S3 during development and AWS S3 in production:
environment:
  - AWS_ACCESS_KEY_ID=supersecret
  - AWS_SECRET_ACCESS_KEY=supersecret
  - AWS_S3_HOST=s3local
  - AWS_S3_PORT=9000
  - S3_USE_SIGV4=True
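For illustration, here is a minimal sketch of what the calling code could look like if the proposed AWS_S3_PORT variable were supported; the bucket and file names are placeholders:

import pandas as pd

# with AWS_S3_HOST/AWS_S3_PORT pointing at the Minio container, the
# application code stays the same in development and production
df = pd.read_csv('s3://my-bucket/data.csv')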
This is only applicable to pandas 0.18.X and 0.19.X, since 0.20.X uses s3fs. I would be willing to submit a PR for this.
We don't offer backports for any version before the last major one (0.20).
For the record, I ended up using a workaround with s3fs along with the change introduced in https://github.com/dask/s3fs/pull/69:
import pandas as pd
from s3fs.core import S3FileSystem

# point s3fs at the local Minio endpoint instead of the default AWS one
client_kwargs = {'endpoint_url': 'http://s3:9000'}
s3 = S3FileSystem(anon=False, client_kwargs=client_kwargs)
# pandas cannot infer compression from a file object, so pass it explicitly
df = pd.read_csv(s3.open('s3://bucket/file.csv.gz', mode='rb'), compression='gzip')
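The endpoint_url here ('http://s3:9000') is the service name and port of the Minio container from docker-compose.yml; for production the client_kwargs can simply be dropped so that s3fs talks to the default AWS endpoint.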