Terraform-provider-aws: Feature request: Support AWS Glue

Created on 15 Aug 2017  ·  36Comments  ·  Source: hashicorp/terraform-provider-aws

AWS Glue has now become GA https://aws.amazon.com/glue/

It would be nice to have this running through Terraform. Do you accept PR's as I could look to add it?

new-resource servicglue

Most helpful comment

Hi, everyone! :wave: I'm going to split this work into more manageable issues so that there can be a clear definition of done for various support. Looking at the API, here are what look like relevant resources and data sources for Terraform to manage with the AWS Glue service:

  • New aws_glue_classifier resource: #3873
  • New aws_glue_connection resource: #3874
  • New aws_glue_crawler resource: #3875
  • New aws_glue_dev_endpoint resource: #3876
  • New aws_glue_job resource: #3877
  • New aws_glue_partition resource: #3878
  • New aws_glue_script data source: #3879
  • New aws_glue_table resource: #3880
  • New aws_glue_trigger resource: #3882
  • New aws_glue_user_defined_function resource: #3883

Please follow along the split issues and note any that you would like to contribute in the new issues. Personally, I will be picking up connections and jobs shortly. Thanks!

All 36 comments

Hey @darrenhaken

It seems the service is available in the Go SDK and CRUDs operations are also available.
Feel free to open a PR for that, we'll be more than happy to review & help on that :)

Thanks :-) is there any docs on how to write Terraform to hook into the SDK? I'll do some digging myself of course.

On Tue, Aug 15, 2017 at 10:27 AM +0100, "Gauthier Wallet" notifications@github.com wrote:

Hey @darrenhaken

It seems the service is available in the Go SDK and CRUDs operations are also available.

Feel free to open a PR for that, we'll be more than happy to review & help on that :)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

Vendor dependencies

Generally speaking, you first need to vendor the dependencies, as exposed here: https://github.com/terraform-providers/terraform-provider-aws/pull/1417.

Connection management

To be able to interact with the SDK and thus the API, you would need to hook the connection, in the aws/config.go file. Then, you would be able to get & use it in the resource.

Resource creation

A good start is the plugin provider documentation page, along with the core repository documentation.

At first you would need to create a HCL structure that matches the API. This is about creating a schema, which is an abstraction of the resource itself.

Tests

Adding acceptance tests is a good way of testing without the need of compiling yourself but reproducing a real-world use-case.
I encourage you to check what was already made by other contributors, which is a good start.

Once the test is written, run it:

make testacc TEST=./aws TESTARGS='-run=TestAccAWSResourceGlueClassifier_'

This will run only the test with name prefix being TestAccAWSResourceGlueClassifier_.

Documentation

The documentation is a required step for people to use it. It should be crystal clear about how to use each options, what is required, optional.
Again, I encourage you to check what was already done, will help a lot 😄

Global example

Check out this example about hooking into the SDK, resource, tests, documentation, etc.

Also, as per the 0.10 providers split, you may then need to build your local provider version using the README page to check it locally.

Last bit, if you could open a PR for the vendoring, and then 1 for each resource, it would be awesome.
For instance, 1 PR for the classifier + tests + doc, 1 for the devendpoint + tests + doc, etc.
If you need any review (even in a WIP), just ask :)

Hope all of this is good to you!
Have fun :) 👍

I'm (probably) going to work my way through the resources. If anyone has any wishes as to which glue resource they would want next after the above database resource, let me know.

I'm taking requests, but can't promise anything.

@drewsonne I'd like the crawlers next and complete the data catalogue part of the product.

Do you need any help working on it?

The aws_glue_catalog_database resource has been released in terraform-provider-aws version 1.7.0. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

When did that land?


From: Brian Flad notifications@github.com
Sent: Friday, January 12, 2018 2:32:21 PM
To: terraform-providers/terraform-provider-aws
Cc: Darren Haken; Mention
Subject: Re: [terraform-providers/terraform-provider-aws] Feature request: Support AWS Glue (#1416)

The aws_glue_catalog_database resource has been released in terraform-provider-aws version 1.7.0https://github.com/terraform-providers/terraform-provider-aws/blob/master/CHANGELOG.md#170-january-12-2018. Please see the Terraform documentation on provider versioninghttps://www.terraform.io/docs/configuration/providers.html#provider-versions or reach out if you need any assistance upgrading.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHubhttps://github.com/terraform-providers/terraform-provider-aws/issues/1416#issuecomment-357253145, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AA6Me-NEmBhef9CDJyVRG2PQn6r2MMGOks5tJ2z1gaJpZM4O3Wep.

@drewsonne have you done any work on the crawlers? If not I will pick this up.

@darrenhaken I have not yet. My first plan was to build a test case with test steps in a similiar manner to the database catalogue. I might get onto this in the week, but let me know how you go and if you start work on this in another repo

@drewsonne I'll take a look at the Crawlers. Is the vendor code up to date and contains the AWS SDK API for it?

Can't remember, but it's in the vendor json file

@darrenhaken I've added a skeleton test case for the crawler here. https://github.com/drewsonne/terraform-provider-aws/commit/2dcd3c6c4eaafae76ac6e45301e24113441c78d9 . It's by no means exhaustive, but would definitely be a start.

Not sure how you want to do this, but that's a start for you.

@drewsonne thanks, I'll use that to start looking at it.

@drewsonne I'm pretty close to having the crawlers finished

@darrenhaken sweet, I assume it's the "glue_crawlers" branch in your repo?

@drewsonne that's the one. I haven't pushed it because I've been wrangling some issues with defining a schema.

Have you done much with TF before? Maybe you can provide a pointer or two if I show you.

WIP has been pushed now if interested to see progress

@darrenhaken I've got a bit of experience writing providers. What are the issues you've been wrangling with the schema specifically?

@drewsonne I'm working on creating the Crawler with the required fields Name, DatabaseName and Targets. See https://docs.aws.amazon.com/sdk-for-go/api/service/glue/#CreateCrawlerInput

Here is the struct definition for Targets which is a nested struct within CreateCrawlerInput:

type CrawlerTargets struct {

    // Specifies JDBC targets.
    JdbcTargets []*JdbcTarget `type:"list"`

    // Specifies Amazon S3 targets.
    S3Targets []*S3Target `type:"list"`
    // contains filtered or unexported fields
}

What I am trying to work out is what is the correct TF schema to map to the CrawlerTargets struct. I made an attempt in the acceptance test to do this but I didn't get any luck.

FYI the acceptance test isn't complete yet - you need to have an AWS account with an S3 bucket and an AWS Glue DB.

In these cases (and I think there's some precedence for this, but I can not remember where), I would flatten the schema a bit, eg:

"s3_targets": {
    Type:          schema.TypeList,
    Optional:      true,
    MinItems:      1,
    Elem: &schema.Resource{
        Schema: map[string]*schema.Schema{
            "path": {
                Type:     schema.TypeString,
                Required: true,
            },
            "exclusions": {
                Type:     schema.TypeSet,
                Optional: true,
                Elem:     schema.TypeString,
            },
        },
    },
},
"jdbc_targets": {
    Type:          schema.TypeList,
    Optional:      true,
    MinItems:      1,
    Elem: &schema.Resource{
        Schema: map[string]*schema.Schema{
            "connection_name": {
                Type:     schema.TypeString,
                Required: true,
            },
            "path": {
                Type:     schema.TypeString,
                Required: true,
            },
            "exclusions": {
                Type:     schema.TypeSet,
                Optional: true,
                Elem:     schema.TypeString,
            },
        },
    },
},

And then put them back into the schema structure back in the Read/Write/Update handlers. I think this is simpler as the two top-level attributes have very specific schemas, in comparison to either having two different types of objects (jdbc and s3) smushed into one schema, or having an extra layer in the config tree.

Hoping maybe @Ninir, @bflad, or @radeksimko would weigh in on this?

@drewsonne I'd thought about flattening the schema. The only drawback is that it affects Required: true fields, you only need s3_targets or jdbc_targets so declaring them as both being required doesn't make sense.

In general, we tend to prefer to follow the AWS API/schema unless the user experience can be heavily improved by doing our own implementation due to future maintenance costs.

While we cannot set a Required: true type schema definition to say require one of s3_targets or jdbc_targets (at the moment anyways -- its a decent request we could use in other places as well and probably worth submitting a feature request upstream into Terraform core), we can do the logic in create/update to essentially require one or the other.

e.g. something like this should do the trick

jdbc_targets, jdbc_targets_ok := d.GetOk("jdbc_targets")
s3_targets, s3_targets_ok := d.GetOk("s3_targets")
if !jdbc_targets_ok && !s3_targets_ok {
    return fmt.Errorf("jdbc_targets or s3_targets configuration is required")
}

@bflad that makes sense, I think deviating too much from the AWS API would only cause confusion in the long run. I'll look at combining both yours and @drewsonne suggestions.

btw, @darrenhaken if you keep your git branch up to date, I can write some acceptance tests for you with the S3 bucket and the glue db and even run them.

Sure I’ll pull in the changes today then that will be handy


From: Drew J. Sonne notifications@github.com
Sent: Thursday, January 25, 2018 6:26:39 AM
To: terraform-providers/terraform-provider-aws
Cc: Darren Haken; Mention
Subject: Re: [terraform-providers/terraform-provider-aws] Feature request: Support AWS Glue (#1416)

btw, @darrenhakenhttps://github.com/darrenhaken if you keep your git branch up to date, I can write some acceptance tests for you with the S3 bucket and the glue db and even run them.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHubhttps://github.com/terraform-providers/terraform-provider-aws/issues/1416#issuecomment-360373541, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AA6Me_dAR9WFnQxY62FXVzoa0u408Hawks5tOB6fgaJpZM4O3Wep.

I can give you write permissions to the branch if you’d like?


From: Darren Haken darrenhaken@gmail.com
Sent: Thursday, January 25, 2018 6:31:01 AM
To: terraform-providers/terraform-provider-aws; terraform-providers/terraform-provider-aws
Cc: Mention
Subject: Re: [terraform-providers/terraform-provider-aws] Feature request: Support AWS Glue (#1416)

Sure I’ll pull in the changes today then that will be handy


From: Drew J. Sonne notifications@github.com
Sent: Thursday, January 25, 2018 6:26:39 AM
To: terraform-providers/terraform-provider-aws
Cc: Darren Haken; Mention
Subject: Re: [terraform-providers/terraform-provider-aws] Feature request: Support AWS Glue (#1416)

btw, @darrenhakenhttps://github.com/darrenhaken if you keep your git branch up to date, I can write some acceptance tests for you with the S3 bucket and the glue db and even run them.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHubhttps://github.com/terraform-providers/terraform-provider-aws/issues/1416#issuecomment-360373541, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AA6Me_dAR9WFnQxY62FXVzoa0u408Hawks5tOB6fgaJpZM4O3Wep.

@darrenhaken what is the situation of your work? Any progress?

@cemo I’ll start looking at it again this week. I had to flip back to finishing off another TF PR which has been accepted now. Sorry for the delay!

I'm interested in using this as well, is there any expected timeline?

I’ll add crawlers this week (hopefully) with the required fields added. I will link the repo/branch.

@drewsonne did you still want to work on this?

I've been working on this some more and I have an acceptance test with the following TF:

resource "aws_glue_catalog_crawler" "test" {
      name = "test"
      database_name = "db_name"
      role = "${aws_iam_role.glue.arn}"
      s3_target {
        path = "s3://bucket"
      }
    }

    resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
        policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
        role = "${aws_iam_role.glue.arn}"
    }

    resource "aws_iam_role" "glue" {
        name = "AWSGlueServiceRoleDefault"
        assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
    }

I receive the following error:

=== RUN   TestAccAWSGlueCrawler_full
--- FAIL: TestAccAWSGlueCrawler_full (13.25s)
    testing.go:513: Step 0 error: Error applying: 2 error(s) occurred:

        * aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: 1 error(s) occurred:

        * aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: [WARN] Error attaching policy arn:aws:iam::aws:policy/AWSGlueServiceRole to IAM Role arn:aws:iam::{account_id}:role/AWSGlueServiceRoleDefault: ValidationError: The specified value for roleName is invalid. It must contain only alphanumeric characters and/or the following: +=,.@_-
            status code: 400, request id: 43da944e-27a7-11e8-8481-452f34f141d6
        * aws_glue_catalog_crawler.test: 1 error(s) occurred:

        * aws_glue_catalog_crawler.test: error creating Glue crawler: InvalidInputException: Service is unable to assume role arn:aws:iam::{account_id}:role/AWSGlueServiceRoleDefault. Please verify role's TrustPolicy
            status code: 400, request id: 4405c2e7-27a7-11e8-9a99-111706a35ce0
FAIL
FAIL    github.com/terraform-providers/terraform-provider-aws/aws   13.293s
make: *** [testacc] Error 1

Any ideas?

This error is being generated from the AWS API:

        * aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: [WARN] Error attaching policy arn:aws:iam::aws:policy/AWSGlueServiceRole to IAM Role arn:aws:iam::697329683179:role/AWSGlueServiceRoleDefault: ValidationError: The specified value for roleName is invalid. It must contain only alphanumeric characters and/or the following: +=,.@_-
            status code: 400, request id: 43da944e-27a7-11e8-8481-452f34f141d6

Relevant code:

https://github.com/terraform-providers/terraform-provider-aws/blob/cc36c1d6c312b0e2cc024f97bb35ba6b59c3f8c2/aws/resource_aws_iam_role_policy_attachment.go#L41-L44

It looks like we're missing some validation to ensure its an IAM role name being passed in instead of a IAM role ARN (or properly handle this situation by parsing the name out of the ARN). We should probably make a separate issue for that. 😉

You can fix your Terraform configuration by changing the role attribute to use aws_iam_role.role.name instead of aws_iam_role.role.arn for now 👍

resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
  policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
  role       = "${aws_iam_role.glue.name}"
}

I've tried the name and I still had the same issue. I'll be in the office in a couple of hours, I'll push my WIP on my fork if you wouldn't mind taking a look? The error is raised by Amazon around the crawler input and something about trust policy.

Any help is appreciated :-)

@bflad here is the changed code:

resource "aws_glue_catalog_crawler" "test" {
      name = "test"
      database_name = "db_name"
      role = "${aws_iam_role.glue.name}"
      s3_target {
        path = "s3://bucket"
      }
    }

    resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
        policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
        role = "${aws_iam_role.glue.name}"
    }

    resource "aws_iam_role" "glue" {
        name = "AWSGlueServiceRoleDefault"
        assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
    }

Error is same as the original question and mentions:

Service is unable to assume role arn:aws:iam::{account}:role/AWSGlueServiceRoleDefault. Please verify role's TrustPolicy

You can see the code at:
[email protected]:darrenhaken/terraform-provider-aws.git and the branch is glue_crawlers

Hi, everyone! :wave: I'm going to split this work into more manageable issues so that there can be a clear definition of done for various support. Looking at the API, here are what look like relevant resources and data sources for Terraform to manage with the AWS Glue service:

  • New aws_glue_classifier resource: #3873
  • New aws_glue_connection resource: #3874
  • New aws_glue_crawler resource: #3875
  • New aws_glue_dev_endpoint resource: #3876
  • New aws_glue_job resource: #3877
  • New aws_glue_partition resource: #3878
  • New aws_glue_script data source: #3879
  • New aws_glue_table resource: #3880
  • New aws_glue_trigger resource: #3882
  • New aws_glue_user_defined_function resource: #3883

Please follow along the split issues and note any that you would like to contribute in the new issues. Personally, I will be picking up connections and jobs shortly. Thanks!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

Was this page helpful?
0 / 5 - 0 ratings