Pandas: Read/Write files on specific S3 accounts

Created on 13 Jun 2017 · 6 comments · Source: pandas-dev/pandas

Say I want to save a file to S3 using a specific account:

df.to_csv('s3://foo/bar/temp.csv')

where my accounts are listed in ~/.aws/credentials:

[default]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX

[foo]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX

[bar]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX

What's the best or recommended way to do this with Pandas 0.20.2?

Is there a way to specify which account to use when we have several of them?

Perhaps related: Does Pandas use boto or boto3?

Labels: Docs · IO CSV · IO Network

All 6 comments

As of 0.20, pandas uses s3fs (http://s3fs.readthedocs.io/en/latest/), which is built on boto3.

I believe you should be able to do

import pandas as pd
import s3fs

# Use the [foo] profile from ~/.aws/credentials
fs = s3fs.S3FileSystem(profile_name='foo')

df = pd.DataFrame({"a": [1, 2, 3]})  # example DataFrame

# Open a handle on the profile-scoped filesystem and write through it;
# the context manager closes (and flushes) the S3 object when done
with fs.open("my-bucket/file.csv", "wb") as f:
    df.to_csv(f)

Could you try that out, and if it works make a pull request for the documentation? I don't have a test bucket handy at the moment.

I know this post is quite old at this point, but @TomAugspurger's solution certainly works. For Python 3, I made one small change: using 'w' instead of 'wb'.
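
A minimal sketch of that Python 3 variant, reusing the fs and df objects from the snippet above:

# On Python 3, to_csv produces str, so open the S3 object in text mode
with fs.open("my-bucket/file.csv", "w") as f:
    df.to_csv(f)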

Would a solution to this be allowing a dask-style storage_options parameter on the read_* functions? It's a little frustrating not being able to just pass these things through; most often I'm trying to pass in credentials directly rather than let boto search my system for them.
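
For reference, a sketch of the dask pattern being described, with a hypothetical bucket name; the storage_options dict is forwarded to s3fs.S3FileSystem:

import dask.dataframe as dd

# The profile selection travels with the call instead of relying on
# boto's credential search
ddf = dd.read_csv("s3://my-bucket/file.csv", storage_options={"profile": "foo"})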

Yes, I think that request has come up in a few places. I'd be happy to see something like that.

#35381 closes this. You should now be able to use the storage_options kwarg to pass in "profile".
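
A minimal sketch of that usage (pandas >= 1.2), assuming the [foo] profile and the path from the original question; storage_options is forwarded to s3fs:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Write and read back using the [foo] profile from ~/.aws/credentials
df.to_csv("s3://foo/bar/temp.csv", storage_options={"profile": "foo"})
df2 = pd.read_csv("s3://foo/bar/temp.csv", storage_options={"profile": "foo"})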
