boto3 seems to be breaking with Apache Spark in YARN mode: `NoCredentialsError: Unable to locate credentials`.

Created on 1 Dec 2016  ·  13 comments  ·  Source: boto/boto3

This is a bit weird and I cannot rule out that I am doing something stupid.

With Apache Spark 2.0.0 on Hortonworks Data Platform 2.5 (HDP 2.5) I am seeing that parallelised tasks of jobs running through YARN are not able to locate credentials. I am very sure that the user I am using (centos) has the credentials stored in the right place (~/.aws); I have tested this very thoroughly with vanilla Python boto3 and the awscli.

I have a couple of boto3 calls. This one runs before the parallel stage and works:

for object in my_bucket.objects.filter(Prefix='1971-01'):

and this one is supposed to run in parallel, downloading the objects. It is this call that seems to be failing:

s3obj = boto3.resource('s3').Object(bucket_name='time-waits-for-no-man', key=s3Key)

The job fails with

NoCredentialsError: Unable to locate credentials.
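
As a quick sanity check (this sketch is not part of the original job; it assumes the same SparkContext `sc` as in the full script below), each executor can be asked which home directory it sees and whether the shared credentials file exists there:

def probe_credentials(_):
    import os
    home = os.path.expanduser('~')
    # Report the home directory and whether ~/.aws/credentials is visible.
    return (home, os.path.exists(os.path.join(home, '.aws', 'credentials')))

print(sc.parallelize(range(20)).map(probe_credentials).distinct().collect())

If any executor reports a home directory with no ~/.aws/credentials, the default credential chain cannot succeed in that container.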

Stacktrace:
Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, hadoop002.dbszod.aws.db.de): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0048/container_1480271222291_0048_01_000020/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0048/container_1480271222291_0048_01_000020/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0048/container_1480271222291_0048_01_000020/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1306, in takeUpToNumLeft
  File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 38, in distributedJsonRead
  File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 526, in _make_api_call
    operation_model, request_dict)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 141, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 166, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 150, in create_request
    operation_name=operation_model.name)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
    return self._emit(event_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
    response = handler(**kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 147, in sign
    auth.add_auth(request)
  File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 678, in add_auth
    raise NoCredentialsError
NoCredentialsError: Unable to locate credentials

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:

I am not sure if it is relevant, but the last thing I can see from the botocore debug output is:

2016-11-30 22:36:47,955 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjects: calling handler <botocore.retryhandler.RetryHandler object at 0x20f7310>
2016-11-30 22:36:47,955 botocore.retryhandler [DEBUG] No retry needed.
2016-11-30 22:36:47,955 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjects: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x223a6d0>>
2016-11-30 22:36:47,955 botocore.hooks [DEBUG] Event after-call.s3.ListObjects: calling handler <function decode_list_object at 0x16c3b90>
2016-11-30 22:36:47,956 botocore.hooks [DEBUG] Event creating-resource-class.s3.ObjectSummary: calling handler <function _handler at 0x1bd7488>

The full code (please excuse the mess):

import boto3
import ujson
import arrow
import sys
import os
from pyspark.sql import SQLContext
from pyspark import SparkContext

boto3.set_stream_logger('botocore', level='DEBUG')
sc = SparkContext()

version = sys.version
log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("pyspark script logger initialized")
LOGGER.info("Python Version: " + version)

# Listing the keys runs on the driver; this boto3 call works fine.
s3_list = []
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('time-waits-for-no-man')
for object in my_bucket.objects.filter(Prefix='1971-01'):
    s3_list.append(object.key)

def add_timestamp(dict):
    dict['timestamp'] = arrow.get(
                        int(dict['year']),
                        int(dict['month']),
                        int(dict['day']),
                        int(dict['hour']),
                        int(dict['minute']),
                        int(dict['second'])
                        ).timestamp
    return dict

def distributedJsonRead(s3Key):
    # This runs on the executors via YARN; it is the call that raises NoCredentialsError.
    s3obj = boto3.resource('s3').Object(bucket_name='time-waits-for-no-man', key=s3Key)
    contents = s3obj.get()['Body'].read().decode()
    meow = contents.splitlines()
    result_wo_timestamp = map(ujson.loads, meow)
    result_wi_timestamp = map(add_timestamp, result_wo_timestamp)
    return result_wi_timestamp

sqlContext = SQLContext(sc)
job = sc.parallelize(s3_list)
foo = job.flatMap(distributedJsonRead)
df = foo.toDF()
#df.show()
blah = df.count()
print(blah)
df.printSchema()

#df.write.parquet('dates_by_seconds', mode="overwrite", partitionBy=["second"])
sc.stop()
exit()

[centos@hadoop003 ~]$ cat .aws/config

[default]
region = eu-central-1

[Boto]

proxy = webproxy.foo.de
proxy_port = 8080

[centos@hadoop003 ~]$ cat .aws/credentials

[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXX
aws_secret_access_key = XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXx
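
For what it is worth, those same keys could be handed to boto3 explicitly inside the task, which takes the executors' view of ~/.aws out of the picture entirely. This is only a sketch with placeholder key values (hard-coding credentials in a job script is bad practice); it is useful purely to isolate whether the failure is in credential lookup:

def distributed_json_read_explicit(s3Key):
    import boto3
    # Explicit credentials bypass the default lookup chain on the executor.
    # The key values here are placeholders.
    s3 = boto3.resource(
        's3',
        aws_access_key_id='XXXXXXXXXXXXXXXXXX',
        aws_secret_access_key='XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXx',
    )
    s3obj = s3.Object(bucket_name='time-waits-for-no-man', key=s3Key)
    return s3obj.get()['Body'].read().decode().splitlines()
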
Labels: closing-soon, question

Most helpful comment

Please have a look at this SO thread:
https://stackoverflow.com/questions/37950728/boto3-cannot-create-client-on-pyspark-worker/42102858#42102858

It is because boto3 needs to read its data files from the location where it is installed.

If you package your Python app and its dependencies together as a zip, you are likely to encounter this issue, as those files cannot be read from inside a zip archive.

The workaround is to install boto3 on every node of your Spark cluster.
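
To confirm that diagnosis, a small check along the same lines as the probe above (again a sketch, reusing the driver's `sc`) reports where each executor actually imports botocore from; a path pointing inside a .zip archive indicates the packaging problem described in the answer:

def botocore_location(_):
    import botocore
    # The module path reveals whether botocore was imported from a real
    # installation or from a zipped archive shipped with the job.
    return botocore.__file__

print(sc.parallelize(range(20)).map(botocore_location).distinct().collect())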

All 13 comments

Full output

[centos@hadoop001 ~]$ spark-submit --master="yarn-client" --executor-memory 1G --num-executors 20 fun-functions/spark-parrallel-read-from-s3/tick.py 
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
16/11/30 23:03:57 INFO SparkContext: Running Spark version 2.0.0.2.5.0.0-1245
16/11/30 23:03:57 INFO SecurityManager: Changing view acls to: centos
16/11/30 23:03:57 INFO SecurityManager: Changing modify acls to: centos
16/11/30 23:03:57 INFO SecurityManager: Changing view acls groups to: 
16/11/30 23:03:57 INFO SecurityManager: Changing modify acls groups to: 
16/11/30 23:03:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(centos); groups with view permissions: Set(); users  with modify permissions: Set(centos); groups with modify permissions: Set()
16/11/30 23:03:57 INFO Utils: Successfully started service 'sparkDriver' on port 33482.
16/11/30 23:03:57 INFO SparkEnv: Registering MapOutputTracker
16/11/30 23:03:57 INFO SparkEnv: Registering BlockManagerMaster
16/11/30 23:03:57 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7d1fd6ff-8327-44e9-934d-0d565c1e8bfa
16/11/30 23:03:57 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/11/30 23:03:58 INFO SparkEnv: Registering OutputCommitCoordinator
16/11/30 23:03:58 INFO log: Logging initialized @2129ms
16/11/30 23:03:58 INFO Server: jetty-9.2.z-SNAPSHOT
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3bca1cc0{/jobs,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@948c112{/jobs/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@61f2f4f6{/jobs/job,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@39499c00{/jobs/job/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@520ebbf5{/stages,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1d01b0f7{/stages/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6b22b2e5{/stages/stage,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1e7e9a1d{/stages/stage/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@39085b34{/stages/pool,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@291ec2ea{/stages/pool/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5755e8b6{/storage,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3d112e33{/storage/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@463b035e{/storage/rdd,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@19ed80e1{/storage/rdd/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6185cb32{/environment,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@55511187{/environment/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4b6ed980{/executors,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3ef1a284{/executors/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@71623233{/executors/threadDump,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2ce2d8b1{/executors/threadDump/json,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@11fdb674{/static,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@222bd34d{/,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@42388e91{/api,null,AVAILABLE}
16/11/30 23:03:58 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@78a5e137{/stages/stage/kill,null,AVAILABLE}
16/11/30 23:03:58 INFO ServerConnector: Started ServerConnector@237d61c{HTTP/1.1}{0.0.0.0:4040}
16/11/30 23:03:58 INFO Server: Started @2259ms
16/11/30 23:03:58 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/11/30 23:03:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.43.191.178:4040
16/11/30 23:03:59 INFO RMProxy: Connecting to ResourceManager at hadoop002.dbszod.aws.db.de/10.43.191.160:8050
16/11/30 23:03:59 INFO Client: Requesting a new application from cluster with 4 NodeManagers
16/11/30 23:03:59 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
16/11/30 23:03:59 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/11/30 23:03:59 INFO Client: Setting up container launch context for our AM
16/11/30 23:03:59 INFO Client: Setting up the launch environment for our AM container
16/11/30 23:03:59 INFO Client: Preparing resources for our AM container
16/11/30 23:03:59 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.5.0.0-1245/spark2/spark2-hdp-yarn-archive.tar.gz
16/11/30 23:03:59 INFO Client: Source and destination file systems are the same. Not copying hdfs:/hdp/apps/2.5.0.0-1245/spark2/spark2-hdp-yarn-archive.tar.gz
16/11/30 23:03:59 INFO Client: Uploading resource file:/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip -> hdfs://hadoop001.dbszod.aws.db.de:8020/user/centos/.sparkStaging/application_1480271222291_0049/pyspark.zip
16/11/30 23:03:59 INFO Client: Uploading resource file:/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip -> hdfs://hadoop001.dbszod.aws.db.de:8020/user/centos/.sparkStaging/application_1480271222291_0049/py4j-0.10.1-src.zip
16/11/30 23:03:59 INFO Client: Uploading resource file:/tmp/spark-bfa7840d-dae8-4e16-8902-f3b257c410cd/__spark_conf__2000631962692114874.zip -> hdfs://hadoop001.dbszod.aws.db.de:8020/user/centos/.sparkStaging/application_1480271222291_0049/__spark_conf__.zip
16/11/30 23:04:00 INFO SecurityManager: Changing view acls to: centos
16/11/30 23:04:00 INFO SecurityManager: Changing modify acls to: centos
16/11/30 23:04:00 INFO SecurityManager: Changing view acls groups to: 
16/11/30 23:04:00 INFO SecurityManager: Changing modify acls groups to: 
16/11/30 23:04:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(centos); groups with view permissions: Set(); users  with modify permissions: Set(centos); groups with modify permissions: Set()
16/11/30 23:04:00 INFO Client: Submitting application application_1480271222291_0049 to ResourceManager
16/11/30 23:04:00 INFO YarnClientImpl: Submitted application application_1480271222291_0049
16/11/30 23:04:00 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1480271222291_0049 and attemptId None
16/11/30 23:04:01 INFO Client: Application report for application_1480271222291_0049 (state: ACCEPTED)
16/11/30 23:04:01 INFO Client: 
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1480547038808
     final status: UNDEFINED
     tracking URL: http://hadoop002.dbszod.aws.db.de:8088/proxy/application_1480271222291_0049/
     user: centos
16/11/30 23:04:02 INFO Client: Application report for application_1480271222291_0049 (state: ACCEPTED)
16/11/30 23:04:02 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/11/30 23:04:02 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop002.dbszod.aws.db.de, PROXY_URI_BASES -> http://hadoop002.dbszod.aws.db.de:8088/proxy/application_1480271222291_0049), /proxy/application_1480271222291_0049
16/11/30 23:04:02 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/11/30 23:04:03 INFO Client: Application report for application_1480271222291_0049 (state: RUNNING)
16/11/30 23:04:03 INFO Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.43.191.161
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1480547038808
     final status: UNDEFINED
     tracking URL: http://hadoop002.dbszod.aws.db.de:8088/proxy/application_1480271222291_0049/
     user: centos
16/11/30 23:04:03 INFO YarnClientSchedulerBackend: Application application_1480271222291_0049 has started running.
16/11/30 23:04:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43203.
16/11/30 23:04:03 INFO NettyBlockTransferService: Server created on 10.43.191.178:43203
16/11/30 23:04:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.43.191.178, 43203)
16/11/30 23:04:03 INFO BlockManagerMasterEndpoint: Registering block manager 10.43.191.178:43203 with 366.3 MB RAM, BlockManagerId(driver, 10.43.191.178, 43203)
16/11/30 23:04:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.43.191.178, 43203)
16/11/30 23:04:03 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@ad23cf5{/metrics/json,null,AVAILABLE}
16/11/30 23:04:03 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/application_1480271222291_0049
16/11/30 23:04:06 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.160:34318) with ID 3
16/11/30 23:04:06 INFO BlockManagerMasterEndpoint: Registering block manager hadoop002.dbszod.aws.db.de:39300 with 366.3 MB RAM, BlockManagerId(3, hadoop002.dbszod.aws.db.de, 39300)
16/11/30 23:04:06 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.143:39832) with ID 1
16/11/30 23:04:06 INFO BlockManagerMasterEndpoint: Registering block manager hadoop004.dbszod.aws.db.de:41873 with 366.3 MB RAM, BlockManagerId(1, hadoop004.dbszod.aws.db.de, 41873)
16/11/30 23:04:07 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.178:37346) with ID 2
16/11/30 23:04:07 INFO BlockManagerMasterEndpoint: Registering block manager hadoop001.dbszod.aws.db.de:42634 with 366.3 MB RAM, BlockManagerId(2, hadoop001.dbszod.aws.db.de, 42634)
16/11/30 23:04:07 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.161:37606) with ID 4
16/11/30 23:04:07 INFO BlockManagerMasterEndpoint: Registering block manager hadoop003.dbszod.aws.db.de:35840 with 366.3 MB RAM, BlockManagerId(4, hadoop003.dbszod.aws.db.de, 35840)
16/11/30 23:04:08 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.143:39836) with ID 5
16/11/30 23:04:08 INFO BlockManagerMasterEndpoint: Registering block manager hadoop004.dbszod.aws.db.de:43446 with 366.3 MB RAM, BlockManagerId(5, hadoop004.dbszod.aws.db.de, 43446)
16/11/30 23:04:08 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.178:37350) with ID 6
16/11/30 23:04:08 INFO BlockManagerMasterEndpoint: Registering block manager hadoop001.dbszod.aws.db.de:41339 with 366.3 MB RAM, BlockManagerId(6, hadoop001.dbszod.aws.db.de, 41339)
16/11/30 23:04:09 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.143:39840) with ID 9
16/11/30 23:04:09 INFO BlockManagerMasterEndpoint: Registering block manager hadoop004.dbszod.aws.db.de:41620 with 366.3 MB RAM, BlockManagerId(9, hadoop004.dbszod.aws.db.de, 41620)
16/11/30 23:04:09 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.178:37354) with ID 10
16/11/30 23:04:09 INFO BlockManagerMasterEndpoint: Registering block manager hadoop001.dbszod.aws.db.de:39154 with 366.3 MB RAM, BlockManagerId(10, hadoop001.dbszod.aws.db.de, 39154)
16/11/30 23:04:09 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.160:34326) with ID 7
16/11/30 23:04:09 INFO BlockManagerMasterEndpoint: Registering block manager hadoop002.dbszod.aws.db.de:39835 with 366.3 MB RAM, BlockManagerId(7, hadoop002.dbszod.aws.db.de, 39835)
16/11/30 23:04:09 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.160:34328) with ID 11
16/11/30 23:04:09 INFO BlockManagerMasterEndpoint: Registering block manager hadoop002.dbszod.aws.db.de:41191 with 366.3 MB RAM, BlockManagerId(11, hadoop002.dbszod.aws.db.de, 41191)
16/11/30 23:04:10 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.161:37630) with ID 12
16/11/30 23:04:10 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.161:37618) with ID 8
16/11/30 23:04:10 INFO BlockManagerMasterEndpoint: Registering block manager hadoop003.dbszod.aws.db.de:34682 with 366.3 MB RAM, BlockManagerId(12, hadoop003.dbszod.aws.db.de, 34682)
16/11/30 23:04:10 INFO BlockManagerMasterEndpoint: Registering block manager hadoop003.dbszod.aws.db.de:46832 with 366.3 MB RAM, BlockManagerId(8, hadoop003.dbszod.aws.db.de, 46832)
16/11/30 23:04:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.160:34334) with ID 19
16/11/30 23:04:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.143:39844) with ID 17
16/11/30 23:04:12 INFO BlockManagerMasterEndpoint: Registering block manager hadoop002.dbszod.aws.db.de:45599 with 366.3 MB RAM, BlockManagerId(19, hadoop002.dbszod.aws.db.de, 45599)
16/11/30 23:04:12 INFO BlockManagerMasterEndpoint: Registering block manager hadoop004.dbszod.aws.db.de:46309 with 366.3 MB RAM, BlockManagerId(17, hadoop004.dbszod.aws.db.de, 46309)
16/11/30 23:04:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.178:37362) with ID 14
16/11/30 23:04:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.178:37364) with ID 18
16/11/30 23:04:12 INFO BlockManagerMasterEndpoint: Registering block manager hadoop001.dbszod.aws.db.de:34318 with 366.3 MB RAM, BlockManagerId(14, hadoop001.dbszod.aws.db.de, 34318)
16/11/30 23:04:12 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/11/30 23:04:12 INFO BlockManagerMasterEndpoint: Registering block manager hadoop001.dbszod.aws.db.de:42807 with 366.3 MB RAM, BlockManagerId(18, hadoop001.dbszod.aws.db.de, 42807)
16/11/30 23:04:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.143:39848) with ID 13
16/11/30 23:04:12 INFO __main__: pyspark script logger initialized
16/11/30 23:04:12 INFO __main__: Python Version: 2.7.5 (default, Nov  6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]
2016-11-30 23:04:12,799 botocore.loaders [DEBUG] Loading JSON file: /usr/lib/python2.7/site-packages/boto3/data/s3/2006-03-01/resources-1.json
2016-11-30 23:04:12,811 botocore.credentials [DEBUG] Looking for credentials via: env
2016-11-30 23:04:12,811 botocore.credentials [DEBUG] Looking for credentials via: assume-role
2016-11-30 23:04:12,812 botocore.credentials [DEBUG] Looking for credentials via: shared-credentials-file
2016-11-30 23:04:12,812 botocore.credentials [INFO] Found credentials in shared credentials file: ~/.aws/credentials
2016-11-30 23:04:12,812 botocore.loaders [DEBUG] Loading JSON file: /usr/lib/python2.7/site-packages/botocore/data/endpoints.json
2016-11-30 23:04:12,825 botocore.loaders [DEBUG] Loading JSON file: /usr/lib/python2.7/site-packages/botocore/data/s3/2006-03-01/service-2.json
16/11/30 23:04:12 INFO BlockManagerMasterEndpoint: Registering block manager hadoop004.dbszod.aws.db.de:37388 with 366.3 MB RAM, BlockManagerId(13, hadoop004.dbszod.aws.db.de, 37388)
2016-11-30 23:04:12,853 botocore.loaders [DEBUG] Loading JSON file: /usr/lib/python2.7/site-packages/botocore/data/_retry.json
2016-11-30 23:04:12,855 botocore.client [DEBUG] Registering retry handlers for service: s3
2016-11-30 23:04:12,858 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x111dc08>
2016-11-30 23:04:12,858 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function _handler at 0x1682758>
2016-11-30 23:04:12,866 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x111d398>
2016-11-30 23:04:12,867 botocore.args [DEBUG] The s3 config key is not a dictionary type, ignoring its value of: None
2016-11-30 23:04:12,869 botocore.endpoint [DEBUG] Setting s3 timeout as (60, 60)
2016-11-30 23:04:12,869 botocore.client [DEBUG] Defaulting to S3 virtual host style addressing with path style addressing fallback.
2016-11-30 23:04:12,873 botocore.hooks [DEBUG] Event creating-resource-class.s3.Bucket: calling handler <function _handler at 0x16829b0>
2016-11-30 23:04:12,874 botocore.loaders [DEBUG] Loading JSON file: /usr/lib/python2.7/site-packages/botocore/data/s3/2006-03-01/paginators-1.json
2016-11-30 23:04:12,875 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjects: calling handler <function set_list_objects_encoding_type_url at 0x1160b18>
2016-11-30 23:04:12,875 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjects: calling handler <function validate_bucket_name at 0x115cc08>
2016-11-30 23:04:12,876 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjects: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x1cd86d0>>
2016-11-30 23:04:12,876 botocore.hooks [DEBUG] Event before-call.s3.ListObjects: calling handler <function add_expect_header at 0x11600c8>
2016-11-30 23:04:12,876 botocore.hooks [DEBUG] Event before-call.s3.ListObjects: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x1cd86d0>>
2016-11-30 23:04:12,876 botocore.endpoint [DEBUG] Making request for OperationModel(name=ListObjects) (verify_ssl=True) with params: {'body': '', 'url': u'https://s3.eu-central-1.amazonaws.com/time-waits-for-no-man?prefix=1971-01&encoding-type=url', 'headers': {'User-Agent': 'Boto3/1.4.1 Python/2.7.5 Linux/3.10.0-514.el7.x86_64 Botocore/1.4.80 Resource'}, 'context': {'encoding_type_auto_set': True, 'client_region': 'eu-central-1', 'signing': {'bucket': 'time-waits-for-no-man'}, 'has_streaming_input': False, 'client_config': <botocore.config.Config object at 0x1cd8290>}, 'query_string': {u'prefix': '1971-01', u'encoding-type': 'url'}, 'url_path': u'/time-waits-for-no-man', 'method': u'GET'}
2016-11-30 23:04:12,877 botocore.hooks [DEBUG] Event request-created.s3.ListObjects: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x1cd8250>>
2016-11-30 23:04:12,877 botocore.hooks [DEBUG] Event before-sign.s3.ListObjects: calling handler <function fix_s3_host at 0xf4fb90>
2016-11-30 23:04:12,877 botocore.auth [DEBUG] Calculating signature using v4 auth.
2016-11-30 23:04:12,878 botocore.auth [DEBUG] CanonicalRequest:
GET
/time-waits-for-no-man
encoding-type=url&prefix=1971-01
host:s3.eu-central-1.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20161130T230412Z

host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2016-11-30 23:04:12,878 botocore.auth [DEBUG] StringToSign:
AWS4-HMAC-SHA256
20161130T230412Z
20161130/eu-central-1/s3/aws4_request
b0193aee0874f86d54c11f07cf6c5ae65140882983229c9a2d95161cd98d6135
2016-11-30 23:04:12,878 botocore.auth [DEBUG] Signature:
6513b177cff9a21e05bff424d6a6525fcd7b13bfc6f69dce1ae7f77ea9d2c217
2016-11-30 23:04:12,879 botocore.endpoint [DEBUG] Sending http request: <PreparedRequest [GET]>
2016-11-30 23:04:12,879 botocore.vendored.requests.packages.urllib3.connectionpool [INFO] Starting new HTTPS connection (1): s3.eu-central-1.amazonaws.com
16/11/30 23:04:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.160:34338) with ID 15
2016-11-30 23:04:12,915 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "GET /time-waits-for-no-man?prefix=1971-01&encoding-type=url HTTP/1.1" 301 None
2016-11-30 23:04:12,915 botocore.parsers [DEBUG] Response headers: {'date': 'Wed, 30 Nov 2016 23:04:07 GMT', 'x-amz-id-2': 'sAmyVIcDzpEItWhcPrA+G7IhpzahlR1ivaRxkkTVNRnrvSmbhLVvw9+0DwFzevFe2fFYTRRXby4=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'x-amz-request-id': '6DE6CF34A6D99E3C', 'x-amz-bucket-region': 'eu-west-1', 'content-type': 'application/xml'}
2016-11-30 23:04:12,916 botocore.parsers [DEBUG] Response body:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Bucket>time-waits-for-no-man</Bucket><Endpoint>time-waits-for-no-man.s3-eu-west-1.amazonaws.com</Endpoint><RequestId>6DE6CF34A6D99E3C</RequestId><HostId>sAmyVIcDzpEItWhcPrA+G7IhpzahlR1ivaRxkkTVNRnrvSmbhLVvw9+0DwFzevFe2fFYTRRXby4=</HostId></Error>
2016-11-30 23:04:12,916 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjects: calling handler <botocore.retryhandler.RetryHandler object at 0x1b95310>
2016-11-30 23:04:12,916 botocore.retryhandler [DEBUG] No retry needed.
2016-11-30 23:04:12,916 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjects: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x1cd86d0>>
2016-11-30 23:04:12,916 botocore.utils [DEBUG] S3 client configured for region eu-central-1 but the bucket time-waits-for-no-man is in region eu-west-1; Please configure the proper region to avoid multiple unnecessary redirects and signing attempts.
2016-11-30 23:04:12,916 botocore.utils [DEBUG] Updating URI from https://s3.eu-central-1.amazonaws.com/time-waits-for-no-man?prefix=1971-01&encoding-type=url to https://s3-eu-west-1.amazonaws.com/time-waits-for-no-man?prefix=1971-01&encoding-type=url
2016-11-30 23:04:12,916 botocore.endpoint [DEBUG] Response received to retry, sleeping for 0 seconds
2016-11-30 23:04:12,917 botocore.hooks [DEBUG] Event request-created.s3.ListObjects: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x1cd8250>>
2016-11-30 23:04:12,917 botocore.hooks [DEBUG] Event before-sign.s3.ListObjects: calling handler <function fix_s3_host at 0xf4fb90>
2016-11-30 23:04:12,917 botocore.auth [DEBUG] Calculating signature using v4 auth.
2016-11-30 23:04:12,917 botocore.auth [DEBUG] CanonicalRequest:
GET
/time-waits-for-no-man
encoding-type=url&prefix=1971-01
host:s3-eu-west-1.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20161130T230412Z

host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2016-11-30 23:04:12,917 botocore.auth [DEBUG] StringToSign:
AWS4-HMAC-SHA256
20161130T230412Z
20161130/eu-west-1/s3/aws4_request
f93a29a6e152f5c930c3e0e00a0cea707a6582451f9b378f0eaa380c4fe36a82
2016-11-30 23:04:12,917 botocore.auth [DEBUG] Signature:
0f7bf03ebf112f7c7fa3c1107216dabd5c4b32a15ab55454d4cf160561802644
2016-11-30 23:04:12,918 botocore.endpoint [DEBUG] Sending http request: <PreparedRequest [GET]>
2016-11-30 23:04:12,918 botocore.vendored.requests.packages.urllib3.connectionpool [INFO] Starting new HTTPS connection (1): s3-eu-west-1.amazonaws.com
16/11/30 23:04:12 INFO BlockManagerMasterEndpoint: Registering block manager hadoop002.dbszod.aws.db.de:44987 with 366.3 MB RAM, BlockManagerId(15, hadoop002.dbszod.aws.db.de, 44987)
16/11/30 23:04:13 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.161:37636) with ID 20
16/11/30 23:04:13 INFO BlockManagerMasterEndpoint: Registering block manager hadoop003.dbszod.aws.db.de:43950 with 366.3 MB RAM, BlockManagerId(20, hadoop003.dbszod.aws.db.de, 43950)
2016-11-30 23:04:13,224 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "GET /time-waits-for-no-man?prefix=1971-01&encoding-type=url HTTP/1.1" 200 None
2016-11-30 23:04:13,225 botocore.parsers [DEBUG] Response headers: {'x-amz-bucket-region': 'eu-west-1', 'x-amz-id-2': 'hCJa551Z3JcalQks9xglnmOpFFuFr9sCcMiqVtkDm6xudLhbV3nPLN9K0n3MPRHYMOe9BK9fXz4=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'x-amz-request-id': 'A1F2405AEE42B2FE', 'date': 'Wed, 30 Nov 2016 23:04:08 GMT', 'content-type': 'application/xml'}
2016-11-30 23:04:13,225 botocore.parsers [DEBUG] Response body:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>time-waits-for-no-man</Name><Prefix>1971-01</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><EncodingType>url</EncodingType><IsTruncated>false</IsTruncated><Contents><Key>1971-01-01</Key><LastModified>2016-10-01T14:59:03.000Z</LastModified><ETag>&quot;78d55265f86ba8ad7bc134b26adf96fe-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-02</Key><LastModified>2016-10-01T14:59:03.000Z</LastModified><ETag>&quot;cd85ab695011303df90f5d496b44cd32-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-03</Key><LastModified>2016-10-01T14:59:03.000Z</LastModified><ETag>&quot;821c1ddfb951caef8a65bee3272c0254-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-04</Key><LastModified>2016-10-01T14:59:04.000Z</LastModified><ETag>&quot;131ee1840b2c50fcfb4c0bcf8fd1e312-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-05</Key><LastModified>2016-10-01T14:59:04.000Z</LastModified><ETag>&quot;17eb81d0990ab7da32102b6fe9c1149c-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-06</Key><LastModified>2016-10-01T15:00:03.000Z</LastModified><ETag>&quot;e2288e5e8f4c64763259f9dad5b909c9-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-07</Key><LastModified>2016-10-01T15:00:03.000Z</LastModified><ETag>&quot;2ef9aca034210ec40b8e3955da82ffe8-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-08</Key><LastModified>2016-10-01T15:00:03.000Z</LastModified><ETag>&quot;c6bb7195c10e5dec17f023885b70b2fd-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-09</Key><LastModified>2016-10-01T15:00:03.000Z</LastModified><ETag>&quot;f769e0f3e35903ca24bce24cbd469acf-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-10</Key><LastModified>2016-10-01T15:00:03.000Z</LastModified><ETag>&quot;ad5943dad6fddf8a085f82328e2b60a0-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</Displ
ayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-11</Key><LastModified>2016-10-01T15:00:04.000Z</LastModified><ETag>&quot;1519a1dfba997a26475dbf372630afd0-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-12</Key><LastModified>2016-10-01T15:00:04.000Z</LastModified><ETag>&quot;ce8b9e46ac349d65071309df9af59326-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-13</Key><LastModified>2016-10-01T15:00:04.000Z</LastModified><ETag>&quot;1f59bab2d0446c8440d4a4cbc4cb08d3-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-14</Key><LastModified>2016-10-01T15:00:04.000Z</LastModified><ETag>&quot;6d8787b79f1f7656f88fe7bf791403e9-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-15</Key><LastModified>2016-10-01T15:00:04.000Z</LastModified><ETag>&quot;5fea9014bad00285355b8f85fe47c442-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-16</Key><LastModified>2016-10-01T15:00:04.000Z</LastModified><ETag>&quot;62c393b72a896450e699f57cc233e340-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-17</Key><LastModified>2016-10-01T15:01:02.000Z</LastModified><ETag>&quot;111b906722d17ff86cf26beb351cb270-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-18</Key><LastModified>2016-10-01T15:01:02.000Z</LastModified><ETag>&quot;2c7e978966b5760cda5f8ab3dab2a53e-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-19</Key><LastModified>2016-10-01T15:01:02.000Z</LastModified><ETag>&quot;985f8c9cd5eace8b64ac99c0b7af0925-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-20</Key><LastModified>2016-10-01T15:01:02.000Z</LastModified><ETag>&quot;d5e05b8279123f0905940746426fab16-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-21</Key><LastModified>2016-10-01T15:01:02.000Z</LastModified><ETag>&quot;887989
87f35f8944b09cea81f89fc88a-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-22</Key><LastModified>2016-10-01T15:01:03.000Z</LastModified><ETag>&quot;038e8e79d2cbf698064e64c835de61d8-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-23</Key><LastModified>2016-10-01T15:01:03.000Z</LastModified><ETag>&quot;f520e225143bce9aa710bf312fb97f9e-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-24</Key><LastModified>2016-10-01T15:01:03.000Z</LastModified><ETag>&quot;0e35498f45844b648f5a6f204c200e6c-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-25</Key><LastModified>2016-10-01T15:01:03.000Z</LastModified><ETag>&quot;63356daac1cc4b235b6f44718fe86de3-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-26</Key><LastModified>2016-10-01T15:01:03.000Z</LastModified><ETag>&quot;ea86be9e776167efa432f27bcbc6e67f-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-27</Key><LastModified>2016-10-01T15:01:03.000Z</LastModified><ETag>&quot;736f8e23a1580545b414019355f58560-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-28</Key><LastModified>2016-10-01T15:02:02.000Z</LastModified><ETag>&quot;03b80b2da64c15dacc8f955cfedb81de-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-29</Key><LastModified>2016-10-01T15:02:02.000Z</LastModified><ETag>&quot;7a84c18765258ab00a4db12f85f0b2f2-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-30</Key><LastModified>2016-10-01T15:02:02.000Z</LastModified><ETag>&quot;000708ddd745f2cb8fd84e5279804f3b-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holway</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>1971-01-31</Key><LastModified>2016-10-01T15:02:02.000Z</LastModified><ETag>&quot;39a4bcdfe0a50651bd080d5f6cf02556-2&quot;</ETag><Size>9763200</Size><Owner><ID>5342c023f89d3d7188e53ae40251431f3f8377b3f9008c012b9bc3651e6235cb</ID><DisplayName>andrew.holw
ay</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
2016-11-30 23:04:13,234 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjects: calling handler <botocore.retryhandler.RetryHandler object at 0x1b95310>
2016-11-30 23:04:13,234 botocore.retryhandler [DEBUG] No retry needed.
2016-11-30 23:04:13,235 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjects: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x1cd86d0>>
2016-11-30 23:04:13,235 botocore.hooks [DEBUG] Event after-call.s3.ListObjects: calling handler <function decode_list_object at 0x1160b90>
2016-11-30 23:04:13,236 botocore.hooks [DEBUG] Event creating-resource-class.s3.ObjectSummary: calling handler <function _handler at 0x1682488>
16/11/30 23:04:13 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.43.191.161:37640) with ID 16
16/11/30 23:04:13 INFO BlockManagerMasterEndpoint: Registering block manager hadoop003.dbszod.aws.db.de:38293 with 366.3 MB RAM, BlockManagerId(16, hadoop003.dbszod.aws.db.de, 38293)
16/11/30 23:04:13 INFO SparkContext: Starting job: runJob at PythonRDD.scala:441
16/11/30 23:04:13 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:441) with 1 output partitions
16/11/30 23:04:13 INFO DAGScheduler: Final stage: ResultStage 0 (runJob at PythonRDD.scala:441)
16/11/30 23:04:13 INFO DAGScheduler: Parents of final stage: List()
16/11/30 23:04:13 INFO DAGScheduler: Missing parents: List()
16/11/30 23:04:13 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at RDD at PythonRDD.scala:48), which has no missing parents
16/11/30 23:04:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 5.6 KB, free 366.3 MB)
16/11/30 23:04:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.7 KB, free 366.3 MB)
16/11/30 23:04:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.43.191.178:43203 (size: 3.7 KB, free: 366.3 MB)
16/11/30 23:04:13 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/11/30 23:04:13 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (PythonRDD[1] at RDD at PythonRDD.scala:48)
16/11/30 23:04:13 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
16/11/30 23:04:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop002.dbszod.aws.db.de, partition 0, PROCESS_LOCAL, 5263 bytes)
16/11/30 23:04:13 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 11 hostname: hadoop002.dbszod.aws.db.de.
16/11/30 23:04:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop002.dbszod.aws.db.de:41191 (size: 3.7 KB, free: 366.3 MB)
16/11/30 23:04:14 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoop002.dbszod.aws.db.de): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000012/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000012/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000012/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1306, in takeUpToNumLeft
  File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 38, in distributedJsonRead
  File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 526, in _make_api_call
    operation_model, request_dict)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 141, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 166, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 150, in create_request
    operation_name=operation_model.name)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
    return self._emit(event_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
    response = handler(**kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 147, in sign
    auth.add_auth(request)
  File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 678, in add_auth
    raise NoCredentialsError
NoCredentialsError: Unable to locate credentials

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

16/11/30 23:04:14 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, hadoop001.dbszod.aws.db.de, partition 0, PROCESS_LOCAL, 5263 bytes)
16/11/30 23:04:14 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 2 hostname: hadoop001.dbszod.aws.db.de.
16/11/30 23:04:14 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop001.dbszod.aws.db.de:42634 (size: 3.7 KB, free: 366.3 MB)
16/11/30 23:04:15 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, hadoop001.dbszod.aws.db.de): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1306, in takeUpToNumLeft
  File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 38, in distributedJsonRead
  File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 526, in _make_api_call
    operation_model, request_dict)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 141, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 166, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 150, in create_request
    operation_name=operation_model.name)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
    return self._emit(event_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
    response = handler(**kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 147, in sign
    auth.add_auth(request)
  File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 678, in add_auth
    raise NoCredentialsError
NoCredentialsError: Unable to locate credentials

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

16/11/30 23:04:15 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, hadoop001.dbszod.aws.db.de, partition 0, PROCESS_LOCAL, 5263 bytes)
16/11/30 23:04:15 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 2 on executor id: 18 hostname: hadoop001.dbszod.aws.db.de.
16/11/30 23:04:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop001.dbszod.aws.db.de:42807 (size: 3.7 KB, free: 366.3 MB)
16/11/30 23:04:15 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, hadoop001.dbszod.aws.db.de): org.apache.spark.api.python.PythonException: NoCredentialsError: Unable to locate credentials [same traceback as task 0.1 above]

16/11/30 23:04:16 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, hadoop001.dbszod.aws.db.de, partition 0, PROCESS_LOCAL, 5263 bytes)
16/11/30 23:04:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 3 on executor id: 18 hostname: hadoop001.dbszod.aws.db.de.
16/11/30 23:04:16 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on executor hadoop001.dbszod.aws.db.de: org.apache.spark.api.python.PythonException (NoCredentialsError: Unable to locate credentials) [duplicate 1]
16/11/30 23:04:16 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
16/11/30 23:04:16 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/11/30 23:04:16 INFO YarnScheduler: Cancelling stage 0
16/11/30 23:04:16 INFO DAGScheduler: ResultStage 0 (runJob at PythonRDD.scala:441) failed in 2.779 s
16/11/30 23:04:16 INFO DAGScheduler: Job 0 failed: runJob at PythonRDD.scala:441, took 3.000237 s
Traceback (most recent call last):
  File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 47, in <module>
    df = foo.toDF()
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 57, in toDF
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 520, in createDataFrame
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 360, in _createFromRDD
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 331, in _inferSchema
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1328, in first
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1310, in take
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/context.py", line 941, in runJob
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, hadoop001.dbszod.aws.db.de): org.apache.spark.api.python.PythonException: NoCredentialsError: Unable to locate credentials [same traceback as above]

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
    at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:441)
    at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:211)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.api.python.PythonException: NoCredentialsError: Unable to locate credentials [same traceback as above]
    ... 1 more

16/11/30 23:04:16 INFO SparkContext: Invoking stop() from shutdown hook
16/11/30 23:04:16 INFO ServerConnector: Stopped ServerConnector@237d61c{HTTP/1.1}{0.0.0.0:4040}
16/11/30 23:04:16 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@78a5e137{/stages/stage/kill,null,UNAVAILABLE}
[23 similar ContextHandler stop messages for the remaining Spark UI endpoints omitted]
16/11/30 23:04:16 INFO SparkUI: Stopped Spark web UI at http://10.43.191.178:4040
16/11/30 23:04:16 INFO YarnClientSchedulerBackend: Interrupting monitor thread
16/11/30 23:04:16 INFO YarnClientSchedulerBackend: Shutting down all executors
16/11/30 23:04:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
16/11/30 23:04:16 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
16/11/30 23:04:16 INFO YarnClientSchedulerBackend: Stopped
16/11/30 23:04:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/11/30 23:04:16 INFO MemoryStore: MemoryStore cleared
16/11/30 23:04:16 INFO BlockManager: BlockManager stopped
16/11/30 23:04:16 INFO BlockManagerMaster: BlockManagerMaster stopped
16/11/30 23:04:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/11/30 23:04:16 INFO SparkContext: Successfully stopped SparkContext
16/11/30 23:04:16 INFO ShutdownHookManager: Shutdown hook called
16/11/30 23:04:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-bfa7840d-dae8-4e16-8902-f3b257c410cd/pyspark-4956dec2-33f4-49ab-b9e1-34453522ab4c
16/11/30 23:04:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-bfa7840d-dae8-4e16-8902-f3b257c410cd

I think I see the issue. Creating clients / resources is not thread safe. We recommend creating a Session object for each thread where you will be creating clients / resources.
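For illustration, a minimal sketch of that recommendation, reusing the bucket name and function shape from this report (this is not code from the thread itself):

import boto3

def distributedJsonRead(s3Key):
    # Build a fresh Session inside the worker; Sessions should not be
    # shared across threads, so each thread/task constructs its own.
    session = boto3.session.Session()
    s3obj = session.resource('s3').Object(bucket_name='time-waits-for-no-man', key=s3Key)
    return s3obj.get()['Body'].read().decode()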

@JordonPhillips It's rather messy, but I am instantiating a new resource each time.

before parallelism:

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('time-waits-for-no-man')
s3_list = []
for obj in my_bucket.objects.filter(Prefix='1971-01'):
    s3_list.append(obj.key)

in my parallelized function:

def distributedJsonRead(s3Key):
    s3obj = boto3.resource('s3').Object(bucket_name='time-waits-for-no-man', key=s3Key)
    contents = s3obj.get()['Body'].read().decode()
    return contents

Cheers,

Andrew

I had the same issue. I've tried a session as well and it doesn't work.
My code is like this:

import boto3

session = boto3.session.Session()
client = session.client('s3', region_name='ap-southeast-2')

Traceback (most recent call last):
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/./parallel_cp.py", line 11, in <module>
    client = session.client('s3', region_name='ap-southeast-2')
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/deps.zip/boto3/session.py", line 263, in client
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/deps.zip/botocore/session.py", line 826, in create_client
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/deps.zip/botocore/session.py", line 701, in get_component
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/deps.zip/botocore/session.py", line 897, in get_component
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/deps.zip/botocore/session.py", line 186, in create_default_resolver
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/deps.zip/botocore/loaders.py", line 132, in _wrapper
  File "/Users/minddriven/workspace/ww/data-one-one/spark_jobs/parallel_cp/deps.zip/botocore/loaders.py", line 424, in load_data
botocore.exceptions.DataNotFoundError: Unable to load data for: endpoints

@ttangww your issue appears to be something else: in your case botocore is unable to find the data files that should be bundled with it. I'm closing out this issue, but feel free to open another if this persists.

I am having the same problem when I run this in AWS EMR.

#!/usr/bin/python

import boto3

def sendTestBoto3S3():
    client = boto3.client('s3', region_name='us-east-1')
    response = client.list_buckets()

    print(response)

def mainTest():
    sendTestBoto3S3()

if __name__ == '__main__':
    mainTest()

Here is my error:

Traceback (most recent call last):
  File "mainTest.py", line 18, in <module>
    mainTest()
  File "mainTest.py", line 15, in mainTest
    sendTestBoto3S3()
  File "mainTest.py", line 9, in sendTestBoto3S3
    client = boto3.client('s3', region_name='us-east-1')
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/boto3/__init__.py", line 83, in client

  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/boto3/session.py", line 263, in client
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/botocore/session.py", line 851, in create_client
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/botocore/session.py", line 726, in get_component
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/botocore/session.py", line 922, in get_component
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/botocore/session.py", line 189, in create_default_resolver
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/botocore/loaders.py", line 132, in _wrapper
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514061511432_0008/container_1514061511432_0008_01_000001/emsencodinglibs.zip/botocore/loaders.py", line 424, in load_data
botocore.exceptions.DataNotFoundError: Unable to load data for: endpoints

Kinda. You have to be very careful to reinitialise your connections during each stage; it can be very difficult to keep track of them.

Does anyone else have a suggestion on running that simple code in AWS EMR PySpark?

I think it's the nature of PySpark that a new session is created for each stage, so a new connection initialisation is needed for each stage.
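
As a hedged sketch of that workaround, the client can be constructed inside the task function so it is created on the executor for each stage (keys_rdd here is a hypothetical RDD of S3 keys, not something from the thread):

def read_partition(keys):
    import boto3  # imported and constructed on the executor, not the driver
    s3 = boto3.session.Session().resource('s3')
    for key in keys:
        yield s3.Object('time-waits-for-no-man', key).get()['Body'].read().decode()

contents = keys_rdd.mapPartitions(read_partition)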

Please have a look at this SO thread:
https://stackoverflow.com/questions/37950728/boto3-cannot-create-client-on-pyspark-worker/42102858#42102858

It happens because boto3 needs to read some data files from the location where it is installed.

If you package your Python app and its dependencies together as a zip, you are likely to hit this issue, because those files cannot be read from inside a zip archive.

The workaround is to install boto3 on every instance of your Spark cluster.
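
One quick way to check for this condition (a sketch, not from the thread itself): print where botocore is imported from inside a task. A path inside your deps zip means the bundled JSON data files cannot be read, which would match the DataNotFoundError above.

import botocore

# A path like .../deps.zip/botocore/__init__.py suggests the endpoints
# data files are trapped inside the archive and cannot be loaded.
print(botocore.__file__)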

@tly1980 thanks for your feedback. I was able to get past the issue I was having.

I am not sure if I should open another issue, but now I am getting the following error:

Traceback (most recent call last):
  File "main.py", line 73, in <module>
    main()
  File "main.py", line 40, in main
    sendMetricsToCloudwatchBoto3(metrics)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1514935698679_0001/container_1514935698679_0001_01_000001/emsencodingjobs.zip/emsencodingjobs/common/metric.py", line 20, in sendMetricsToCloudwatchBoto3
  File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 317, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 602, in _make_api_call
    operation_model, request_dict)
  File "/usr/local/lib/python2.7/site-packages/botocore/endpoint.py", line 143, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python2.7/site-packages/botocore/endpoint.py", line 172, in _send_request
    success_response, exception):
  File "/usr/local/lib/python2.7/site-packages/botocore/endpoint.py", line 265, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/usr/local/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python2.7/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/usr/local/lib/python2.7/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/usr/local/lib/python2.7/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/usr/local/lib/python2.7/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/usr/local/lib/python2.7/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/usr/local/lib/python2.7/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
botocore.vendored.requests.exceptions.SSLError: [Errno 20] Not a directory

@ucrkarthik, I am happy that works.
By the way, could you upvote my answer on that SO thread, since you said it works? :)

I am wondering, just as a first step to diagnose the issue: if you comment out that sendMetricsToCloudwatchBoto3 function, does your code work?

Cheers.

I use the following package versions and they seem to work:
sudo pip install -I boto3==1.4.7
sudo pip install -I future==0.16.0
sudo pip install -I requests==2.18.4
sudo pip install -I six==1.11.0
sudo pip install -I tabulate==0.8.1
