Turicreate: unable to read from/write to a non public S3 bucket

Created on 11 Dec 2017 · 29 comments · Source: apple/turicreate

It does not seem possible to use the method turicreate.aws.set_credentials() described in the documentation: https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.html

There is no aws object within turicreate.
It also does not seem possible to use environment variables, whether they are set in the shell environment or in Python before calling the load_sframe method.

This is the code I'm using to load an SFrame from S3.
It works with the sframe library but fails with turicreate:

def getSFrame(s3SFramePath):
    import os
    import turicreate as tc
    global S3_ACCESS_KEY, S3_SECRET_KEY
    os.environ['AWS_ACCESS_KEY_ID'] = S3_ACCESS_KEY
    os.environ['AWS_SECRET_ACCESS_KEY'] = S3_SECRET_KEY
    return tc.load_sframe(s3SFramePath)

Labels: bug, engine, p2, setup

All 29 comments

Thanks for reporting this. We will look into this issue and update the instructions on using S3.

Adding the bug tag since it looks like there is a real bug here as well. Even when credentials are set via env variables, reading from buckets doesn't always work.

The fix to support S3 regions other than the default is in #90. Leaving this issue open for now, to update the documentation as well.

Update: Bug resolved; waiting on docs.

Docs updated with #183.

Hi, any idea when this will be released? Thanks.

@davidswaven Stay tuned - I can't give an exact date but we're putting together a 4.1 release including all current fixes.

We are hoping for this week.

Turi Create 4.1 is now available.

I'm able to make it work from my laptop. Thanks.

Unfortunately, from an EC2 Linux server that has an IAM role, I have no way to access my SFrame stored in an S3 bucket.
If I don't set any environment variables, I get the following (expected) error:
KeyError('No access key found. Please set the environment variable AWS_ACCESS_KEY_ID.',)
If I provide an access key and secret key that have access to the bucket, I get the following (unexpected) error:
IOError: s3://{my-bucket}/{my_sframe_folder_path} not found.: iostream error
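On an EC2 instance with an IAM role there is no long-lived access key to put into the environment. One possible workaround, sketched below, is to let boto3 resolve the role's temporary credentials and export them as the environment variables Turi Create reads. Assumptions: boto3 is installed, `export_role_credentials` is a hypothetical helper (not a Turi Create API), and whether this Turi Create version honours `AWS_SESSION_TOKEN`, which temporary role credentials require, is not confirmed anywhere in this thread.

```python
import os


def export_role_credentials(environ=None, session=None):
    """Copy credentials resolved by boto3 into AWS_* environment variables.

    Hypothetical helper. `session` defaults to boto3.Session(), which walks the
    standard credential chain (env vars, config files, EC2 instance role, ...).
    """
    if environ is None:
        environ = os.environ
    if session is None:
        import boto3  # imported lazily so the helper can be tested without boto3
        session = boto3.Session()
    creds = session.get_credentials()
    if creds is None:
        raise RuntimeError("boto3 could not find any AWS credentials")
    # Take one consistent snapshot of key, secret and token.
    frozen = creds.get_frozen_credentials()
    environ["AWS_ACCESS_KEY_ID"] = frozen.access_key
    environ["AWS_SECRET_ACCESS_KEY"] = frozen.secret_key
    if frozen.token:
        # Role credentials are temporary and carry a session token.
        # ASSUMPTION: Turi Create reads AWS_SESSION_TOKEN; unverified here.
        environ["AWS_SESSION_TOKEN"] = frozen.token
    return environ
```

Usage would be `export_role_credentials()` followed by `tc.load_sframe("s3://my-bucket/path")` (placeholder path). Role credentials expire, so a long-running job would need to call the helper again.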

The same code worked with the sframe library (version 2.1).

Thanks @davidswaven - sounds like there is still a (now more obscure) bug here. I'll reopen this issue to track that.

@davidswaven Are you still able to repro this on the latest Turi Create (either 4.3.2 or 5.0b2)? If so, by any chance do you have capital letters in your bucket name? I think we may have issues specific to that case.

Closing for now -- please reopen if this has not been fixed.

We are experiencing the same issue with the latest version of Turi Create on EC2. When trying to access an S3 bucket from a Linux instance on AWS EC2, we receive the following errors:

Traceback (most recent call last):
  File "/root/venv/lib64/python3.6/site-packages/turicreate/data_structures/sframe.py", line 808, in __init__
    self.__proxy__.load_from_sframe_index(url)
  File "turicreate/cython/cy_sframe.pyx", line 71, in turicreate.cython.cy_sframe.UnitySFrameProxy.load_from_sframe_index
  File "turicreate/cython/cy_sframe.pyx", line 74, in turicreate.cython.cy_sframe.UnitySFrameProxy.load_from_sframe_index
OSError: s3:/bucket_name/path/to/sframe not found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "index.py", line 3, in <module>
    ratings = tc.load_sframe("s3:/bucket_name/path/to/sframe")
  File "/root/venv/lib64/python3.6/site-packages/turicreate/data_structures/sframe.py", line 83, in load_sframe
    sf = SFrame(data=filename)
  File "/root/venv/lib64/python3.6/site-packages/turicreate/data_structures/sframe.py", line 812, in __init__
    raise ValueError('Unknown input type: ' + format)
  File "/root/venv/lib64/python3.6/site-packages/turicreate/cython/context.py", line 49, in __exit__
    raise exc_type(exc_value)
OSError: s3:/bucket_name/path/to/sframe not found.

Traceback (most recent call last):
  File "save_model.py", line 23
    model.save("s3://bucket_name/path/to/save/file.model)
  File "/root/venv/lib64/python3.6/site-packages/turicreate/toolkits/_model.py", line 443, in save
    return glconnect.get_unity().save_model(self, _make_internal_url(location))
  File "turicreate/cython/cy_unity.pyx", line 97, in turicreate.cython.cy_unity.UnityGlobalProxy.save_model
  File "turicreate/cython/cy_unity.pyx", line 103, in turicreate.cython.cy_unity.UnityGlobalProxy.save_model
OSError: Unable to create directory structure at s3://id:key:bucket_name/path/to/file.model. Ensure that you have write permission to this location, or try again with a different path.

Despite these error messages, we can access our S3 bucket with the same credentials through the AWS CLI.

Can you share a small repro script that we can try out and reproduce the issue?

@srikris, here is a small script to reproduce the issue.

import turicreate as tc

ratings = tc.SFrame.read_csv("s3://path")
model = tc.recommender.create(ratings, target="rating", verbose=False)
model.save("s3://path")

For the environment, we spun up an Amazon Linux AMI 2018.03.0 (HVM), SSD Volume Type EC2 instance and installed Python 3.6. For credentials, we created an IAM user with full S3 access.

@oakesjessica Thanks for reporting this. We have found the bug. We will keep you posted!

@oakesjessica I think we have identified the issue. Fix is up for PR. Thanks!

Fixed with #1416.

@srikris, @znation. Thank you!

The fix for OSError: Unable to create directory structure is now available in Turi Create 5.3.1.

@srikris, @znation, thank you for the updated fix in 5.3.1. However, we are still having issues reading from and writing directly to our S3 bucket. The tracebacks follow the same paths as above but end in different errors.

>>> tc.SFrame.read_csv("s3://bucket/to/file.csv")
Traceback (most recent call last):
  File "/root/venv/lib64/python3.6/site-packages/turicreate/data_structures/sframe.py", line 1037, in _read_csv_impl
    errors = proxy.load_from_csvs(internal_url, parsing_config, type_hints)
  File "turicreate/cython/cy_sframe.pyx", line 76, in turicreate.cython.cy_sframe.UnitySFrameProxy.load_from_csvs
  File "turicreate/cython/cy_sframe.pyx", line 84, in turicreate.cython.cy_sframe.UnitySFrameProxy.load_from_csvs
RuntimeError: No files corresponding to the specified path (s3://bucket/to/file.csv).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/venv/lib64/python3.6/site-packages/turicreate/data_structures/sframe.py", line 1504, in read_csv
    **kwargs)[0]
  File "/root/venv/lib64/python3.6/site-packages/turicreate/data_structures/sframe.py", line 1037, in _read_csv_impl
    errors = proxy.load_from_csvs(internal_url, parsing_config, type_hints)
  File "/root/venv/lib64/python3.6/site-packages/turicreate/cython/context.py", line 49, in __exit__
    raise exc_type(exc_value)
RuntimeError: No files corresponding to the specified path (s3://bucket/to/file.csv).
>>> model.save('s3://bucket/to/save_model.model')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/venv/lib64/python3.6/site-packages/turicreate/toolkits/_model.py", line 443, in save
    return glconnect.get_unity().save_model(self, _make_internal_url(location))
  File "turicreate/cython/cy_unity.pyx", line 97, in turicreate.cython.cy_unity.UnityGlobalProxy.save_model
  File "turicreate/cython/cy_unity.pyx", line 103, in turicreate.cython.cy_unity.UnityGlobalProxy.save_model
OSError: Maximum retry time reached

We tested 5.3.1 with the same repro script provided above. CLI commands such as aws s3 cp s3://bucket/to/file.csv ./ succeed, so our credentials don't seem to be the problem. Is there another hidden issue, or is something missing on our end?

Hmmm... Can you try adding the following line of code before you perform any S3 access (say, immediately after the import)?

tc.config.set_runtime_config('TURI_FILEIO_INSECURE_SSL_CERTIFICATE_CHECKS', 1)

Setting that config did allow us to save to our S3 bucket successfully. However, each save took at least 30 minutes to finish. Is there something we can do to speed this up? Using a p2 instance did not seem to help.

Unfortunately, we are still getting the same retrieval error, RuntimeError: No files corresponding to the specified path (s3://bucket/to/file.csv), even though the file does exist.
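Until the read path works, the local-disk idea can also be applied in the read direction: download the object with boto3 and point Turi Create at the local copy. This is a sketch under stated assumptions (boto3 is installed and the object fits on local disk); `download_s3_object` is a hypothetical helper, and it handles a single object such as a CSV, not an SFrame directory.

```python
import os
import tempfile
from urllib.parse import urlparse


def download_s3_object(s3_url, client=None, dest_dir=None):
    """Download one S3 object to a local file and return the local path.

    Hypothetical helper; `client` defaults to boto3.client("s3").
    """
    parsed = urlparse(s3_url)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    name = os.path.basename(key)
    # Reject malformed URLs (e.g. "s3:/bucket/...") and bare prefixes up front.
    if parsed.scheme != "s3" or not bucket or not name:
        raise ValueError("expected s3://bucket/key pointing at an object, got %r" % s3_url)
    if client is None:
        import boto3  # imported lazily so the helper can be tested without boto3
        client = boto3.client("s3")
    if dest_dir is None:
        dest_dir = tempfile.mkdtemp()
    local_path = os.path.join(dest_dir, name)
    client.download_file(bucket, key, local_path)  # (Bucket, Key, Filename)
    return local_path
```

With this in place, the failing call would become `tc.SFrame.read_csv(download_s3_object("s3://bucket/to/file.csv"))`.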

The S3 write path could be optimized. I don't think we are taking advantage of parallel uploading capabilities. A workaround is to write it out to local disk and use awscli to upload it.
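The save-locally-then-upload workaround can also be scripted in Python instead of calling the AWS CLI. The sketch below is a hypothetical helper, not a Turi Create API; it assumes boto3 is installed and uploads the files of a locally saved model or SFrame directory several at a time, which is one way to get the parallel uploading mentioned above. `plan_uploads`, `upload_dir` and the paths in the usage line are illustrative names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def plan_uploads(local_dir, prefix):
    """Map every file under local_dir to an S3 key under prefix."""
    prefix = prefix.strip("/")
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            plan.append((path, prefix + "/" + rel if prefix else rel))
    return sorted(plan)


def upload_dir(local_dir, bucket, prefix, client=None, max_workers=8):
    """Upload a locally saved SFrame/model directory, several files at a time."""
    if client is None:
        import boto3  # imported lazily so the helper can be tested without boto3
        client = boto3.client("s3")  # low-level clients can be shared across threads
    plan = plan_uploads(local_dir, prefix)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(client.upload_file, path, bucket, key) for path, key in plan]
        for future in futures:
            future.result()  # re-raise the first upload error, if any
    return [key for _path, key in plan]
```

For example: `model.save("/tmp/file.model")`, then `upload_dir("/tmp/file.model", "bucket", "path/to/save/file.model")`. boto3's `upload_file` already splits a single large file into a multipart upload; the thread pool adds parallelism across files. The CLI equivalent is `aws s3 cp /tmp/file.model s3://bucket/path/to/save/file.model --recursive`.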

The read issue is odd though. It is surprising that you can write to the bucket, but not read from it. What region is your bucket in? Does the bucket name have uppercase characters?
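The questions about uppercase characters, together with the `s3:/bucket_name/...` path (a single slash) in an earlier traceback, suggest a quick local sanity check on the URL before suspecting credentials. The sketch below is a hypothetical helper, not part of Turi Create, and it does not contact AWS. It encodes a simplified form of the current S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit) and skips the IP-address rule. Older us-east-1 buckets may legally have uppercase names, which is exactly the case being asked about.

```python
import re
from urllib.parse import urlparse

# Simplified DNS-compatible bucket name: 3-63 chars, lowercase, digits, '.', '-'.
_DNS_BUCKET = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def s3_path_problems(url):
    """Return a list of likely problems with an s3:// URL (empty if none found)."""
    if not url.startswith("s3://"):
        return ["URL must start with 's3://' (note the two slashes)"]
    parsed = urlparse(url)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    problems = []
    if not bucket:
        problems.append("missing bucket name")
    elif any(c.isupper() for c in bucket):
        problems.append("bucket name contains uppercase characters")
    elif not _DNS_BUCKET.match(bucket):
        problems.append("bucket name is not DNS-compatible")
    if not key:
        problems.append("missing object key or prefix")
    return problems
```

An empty list means only that the URL looks well-formed; it says nothing about permissions or region.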

Can you help us with some diagnosis steps?

import turicreate as tc
tc.config.set_log_level(2)
print(tc.config.get_server_log_location() + ".0")
# attempt to read the CSV here

A log file will be produced in the location printed by the print statement.
You may need to strip S3 path information from it before attaching it here, or you can email it to me at [email protected]
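Stripping S3 paths from the log can be scripted. This matters beyond privacy: the save error earlier in this thread shows an internal URL of the form `s3://id:key:bucket_name/...`, so credentials can end up inside the logged URL. The sketch below is a hypothetical helper that replaces every `s3:/` or `s3://` URL with a placeholder. It assumes the log is text, and it will not catch keys printed outside an `s3://` URL, so the result still deserves a manual look.

```python
import re

# Matches s3:/ or s3:// URLs up to whitespace, a quote or a closing parenthesis.
# The whole URL is replaced because it may embed credentials (s3://id:key:bucket/...).
_S3_URL = re.compile(r"s3:/+[^\s'\")]+")


def redact_s3_paths(text, replacement="s3://<redacted>"):
    """Return text with every s3:// URL replaced by a placeholder."""
    return _S3_URL.sub(replacement, text)


def redact_log_file(src, dst):
    """Write a redacted copy of the log file at src to dst."""
    with open(src, encoding="utf-8", errors="replace") as fin:
        content = fin.read()
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write(redact_s3_paths(content))
```

For example: `redact_log_file(tc.config.get_server_log_location() + ".0", "server_log_redacted.txt")`, where the output file name is arbitrary.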

Thanks!

@ylow, cool. I'm currently using the workaround you suggested, so I'll keep using it until the upload method is optimized. Our bucket is in the us-east-1 region and its name does not contain uppercase characters. Sure, I will email you the log file. Thank you!

Is it resolved?

@franz101 - good question.

@davidswaven or @oakesjessica - we recently rewrote much of our S3 code to use AWS's SDK. I suspect this issue is now fixed. Please try the most recent version of Turi Create and let us know whether the issue has been resolved.

This issue should have been fixed in 6.2. I haven't heard back, so I'm going to close this issue.
