Boto3: Lambda function to execute a query on Athena and store the results back in S3

Created on 20 Apr 2018  ·  12 Comments  ·  Source: boto/boto3

Hi,
Here is what I am trying to do:

  1. I have an application writing to AWS DynamoDB, with a Kinesis stream writing to an S3 bucket.
  2. I use Athena to query the data from S3, based on monthly/daily buckets, to create a table of cleaned-up data (extracting the required string from the CSV stored in S3).
  3. AWS Athena does not allow INSERT INTO / INSERT OVERWRITE to modify table contents, hence I am going the Lambda way: run a query on the Athena-created table and store the result back to S3, which I can use to create visualizations in AWS QuickSight.
  4. This is the Lambda function:
import boto3


def lambda_handler(event, context):
    query_1 = "SELECT REGEXP_EXTRACT(data,'[a-z]*[0-9]') AS datacenter, \
                REGEXP_EXTRACT(response_code,'[0-9]+') AS code, \
                REGEXP_EXTRACT(pool_id,'[a-z]*[0-9]+') AS tower, \
                CASE \
                    WHEN response_code LIKE '%2%' THEN '1' \
                    WHEN response_code LIKE '%3%' THEN '1' \
                    WHEN response_code LIKE '%4%' THEN '1' \
                    ELSE '0' \
                END AS status \
                FROM probe_result_v3.cwsproberesults \
                WHERE pool_id LIKE 'POOL_ID%';"

    database = "xxx-xxx-xx"
    s3_output = "s3://xxxx-results/resultfolder/"

    client = boto3.client('athena')

    # start_query_execution submits the query and returns immediately;
    # it does not wait for the query to finish.
    response = client.start_query_execution(
        QueryString=query_1,
        QueryExecutionContext={'Database': database},
        # Reuse s3_output so the output location is defined in one place
        # (previously it was hard-coded here while s3_output went unused).
        ResultConfiguration={'OutputLocation': s3_output},
    )
    return response
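For completeness: since `start_query_execution` only submits the query, the handler returns before Athena has written anything to S3. A minimal sketch of waiting for the result (`get_query_execution` is the real boto3 Athena call; `wait_for_query` and its parameters are made-up names for illustration):

```python
import time


def wait_for_query(client, query_execution_id, poll_seconds=1, max_polls=60):
    # Poll get_query_execution until the query reaches a terminal state,
    # then return that state ('SUCCEEDED', 'FAILED' or 'CANCELLED').
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = response['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return state
        time.sleep(poll_seconds)
    raise TimeoutError('query %s did not finish in time' % query_execution_id)
```

In the handler above, one could call `wait_for_query(client, response['QueryExecutionId'])` before returning, keeping Lambda's own execution timeout in mind.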

The execution log from Lambda returns success.
Response:

{
    "QueryExecutionId": "d8f8104f-407c-4eff-b57d-b9bbf57e5196",
    "ResponseMetadata": {
        "RetryAttempts": 0,
        "HTTPStatusCode": 200,
        "RequestId": "2e6f5d29-43b2-11e8-862c-077a4462e1c2",
        "HTTPHeaders": {
            "date": "Thu, 19 Apr 2018 09:15:19 GMT",
            "x-amzn-requestid": "2e6f5d29-43b2-11e8-862c-077a4462e1c2",
            "content-length": "59",
            "content-type": "application/x-amz-json-1.1",
            "connection": "keep-alive"
        }
    }
}
  1. However, when I go back to s3://xxxx-results/resultfolder/, I see nothing created.
  2. When I execute the query alone from the Athena query editor, I see the CSV created in the S3 bucket location, but that is an on-demand query, and I am trying to schedule this so that I can use it in QuickSight for an hourly graph.

Can you please help me fix this?

service-api


All 12 comments

I don't know offhand. Since this is a service-specific question, not a boto3 question, I would suggest you ask on the Athena service forums or Stack Overflow for a better chance of getting an answer. There are too many services for us to know the ins and outs of all of them.

Hi, can you please let me know which forum it is for Athena?

Regards,
Sneha

From: Donald Stufft <notifications@github.com>
Reply-To: boto/boto3 <reply@reply.github.com>
Date: Friday, April 20, 2018 at 11:34 PM
To: boto/boto3 <boto3@noreply.github.com>
Cc: smirajka <smirajka@cisco.com>, Author <author@noreply.github.com>
Subject: Re: [boto/boto3] Lambda function to execute a query on Athena and store the results back in S3 (#1538)


Hi Sneha, could you please tell me how you worked around this?
Or where did you continue the thread?

Thanks.

Thank you, I fixed it all.

Could you write up the solution, or give me a hint of what you did?
That would be very helpful! :D

@snehamirajkar it would be wonderful if you could share your solution for those who are still having this issue. Thanks ☺️

Thought I would chime in, as I was involved in getting this resolved. It was the simplest case of not having proper IAM permissions: the Lambda role had no S3 permissions and wasn't generating an exception. When it fires

    response = client.start_query_execution(
        QueryString=query_1,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': 's3://xxxx-results/resultfolder/'}
    )

the response does not fail (it starts a query in Athena with the given ResultConfiguration and assumes the job is done). Because it is an asynchronous call, it has no way of knowing whether the output was actually written to the S3 bucket. There are programmatic solutions to handle this, obviously.

How we found it:
Trial and error. Among a few other steps which didn't help, I replicated her setup in my test environment and gave the Lambda role "Full Admin" to isolate the cause. From there we figured it out.
Hope that helps.
@snehamirajkar Sorry... thought I would answer this to help the community.

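Because the call is asynchronous, a missing S3 permission does not raise an exception at `start_query_execution` time; it surfaces later in the query's status. A sketch of reading that status (`get_query_execution` and the `Status` fields are the real Athena API response shape; `query_failure_reason` is a made-up helper name, and the "Access Denied" text is only illustrative):

```python
def query_failure_reason(client, query_execution_id):
    # Return the failure reason for an Athena query, or None if it has
    # not failed. `client` is a boto3 Athena client.
    status = client.get_query_execution(
        QueryExecutionId=query_execution_id
    )['QueryExecution']['Status']
    if status['State'] == 'FAILED':
        # Missing S3 permissions typically surface here, e.g. an
        # "Access Denied" message for the output location.
        return status.get('StateChangeReason')
    return None
```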

Which S3 permissions were required? I have given Lambda full access to the required S3 bucket, but unless I give it full access to everything in S3 it does not seem to work. I think there is something going on here, but I can't put my finger on it.

Full Admin access is not required; S3 bucket access is sufficient. There is also an issue in the code: a comma is missing after the last statement:

    'OutputLocation': 's3://xxxx-results/resultfolder/',

@snehamirajkar & @warpspeed6 Sorry, I struggled a lot, so I thought I would update this to help others.

What is the solution?

As mentioned above by
"Full Admin Access is not required. S3 bucket access is sufficient. There is an issue in the code. one , is missing in the last statement.

OutputLocation': 's3://xxxx-results/resultfolder/' ,

@snehamirajkar & @warpspeed6 Sorry , i struggle a lot so thought to update this and will help others.
should be enough

Does anyone know how to export the result of a Redshift query to an S3 bucket using a Lambda function?

There is an UNLOAD statement which exports a query result to S3:

unload ('select * from venue')
to 's3://mybucket/tickit/venue_'
access_key_id ''
secret_access_key ''
session_token '';

but I want to schedule this daily.
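One way to schedule this: trigger a Lambda from a daily CloudWatch Events / EventBridge rule and have it submit the UNLOAD through the Redshift Data API. A sketch under assumptions — boto3's `redshift-data` client and `execute_statement` are real, but the cluster identifier, database, user, and IAM role ARN below are placeholders, the `iam_role` clause stands in for the credential clauses above, and the Data API must be available for your cluster:

```python
UNLOAD_SQL = """
unload ('select * from venue')
to 's3://mybucket/tickit/venue_'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole';
"""


def run_unload(client, sql, cluster_id='my-redshift-cluster',
               database='dev', db_user='awsuser'):
    # Submit the UNLOAD asynchronously via the Redshift Data API;
    # the Lambda does not need to hold a database connection open.
    return client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )


def lambda_handler(event, context):
    # Invoked by the scheduled rule; fires the UNLOAD and returns.
    import boto3  # imported here so run_unload can be exercised without AWS
    client = boto3.client('redshift-data')
    return run_unload(client, UNLOAD_SQL)
```

The Data API route avoids bundling a Postgres driver with the Lambda; the trade-off is that the statement runs asynchronously, so checking its completion would need a follow-up `describe_statement` call.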
