Terraform-provider-aws: Add ability for aws_glue_job to use Python 3 and glue version 1.0

Created on 26 Jul 2019  ·  22 Comments  ·  Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

AWS Glue now supports running ETL jobs on Apache Spark 2.4.3 (with Python 3). Terraform support is needed.

New or Affected Resource(s)

aws_glue_job

Potential Terraform Configuration

resource "aws_glue_job" "aws_glue_job_foo" {
  glue_version = "1"
  name         = "job-name"
  description  = "job-desc"
  role_arn     = data.aws_iam_role.aws_glue_iam_role.arn
  max_capacity = 1
  max_retries  = 1
  connections  = [aws_glue_connection.connection.name]
  timeout      = 5

  command {
    name            = "pythonshell"
    script_location = "s3://bucket/script.py"
    python_version  = "3"
  }

  default_arguments = {
    "--job-language" = "python"
    "--ENV"          = "env"
    "--ROLE_ARN"     = data.aws_iam_role.aws_glue_iam_role.arn
  }

  execution_property {
    max_concurrent_runs = 1
  }
}

References

enhancement service/glue

Most helpful comment

Support for the new glue_version argument in the aws_glue_job resource has been merged and will release with version 2.34.0 of the Terraform AWS Provider, on Thursday. 👍

All 22 comments

It looks like this is enabled via the Glue version for a job, added in AWS SDK v1.21.4.

Related: #9409

+1

An alternative way to set the Python and Glue versions:

resource "aws_glue_job" "etl" {
  name     = "${var.job_name}"
  role_arn = "${var.iam_role_arn}"

  command {
    script_location = "s3://${var.bucket_name}/${aws_s3_bucket_object.script.key}"
  }

  default_arguments = {
    "--enable-metrics" = ""
    "--job-language" = "python"
    "--TempDir" = "s3://${var.bucket_name}/TEMP"
  }

  # Manually set python 3 and glue 1.0
  provisioner "local-exec" {
    command = "aws glue update-job --job-name ${var.job_name} --job-update 'Command={ScriptLocation=s3://${var.bucket_name}/${aws_s3_bucket_object.script.key},PythonVersion=3,Name=glueetl},GlueVersion=1.0,Role=${var.iam_role_arn},DefaultArguments={--enable-metrics=\"\",--job-language=python,--TempDir=\"s3://${var.bucket_name}/TEMP\"}'"
  }
}

Any idea when this change will get pushed?

The solution/workaround provided by @ezidio works exactly as expected.

But it would be good if this change were made in the provider itself and released.

I will do the same as @ezidio suggests, but with job-language scala instead of python.
I also think it would be good if this worked without the workaround.

Maybe more urgent given the announcement of Python 2's official sunsetting on Jan 1, 2020.

the "workaround" is a horrible PITA as all the arguments need to be flattened into one string... with nested escaping.

In fact there is no proper workaround because any modification will reset the job back to Python 2. The local-exec provisioner will not be rerun.

I think the script logic below should do the job for you, using a null_resource triggered by timestamp().

resource "aws_glue_job" "etl" {
  name     = "${local.name}"
  role_arn = "${module.crawler_role.role_arn}"

  command {
    script_location = "s3://abc/abc.py"
  }

  default_arguments = {
    "--job-language" = "python"
    "--database"     = "${local.name}"
    "--s3bucket"    = "${var.bucket_name}"
  }
}

resource "null_resource" "cluster" {
  depends_on = ["aws_glue_job.etl"]

  triggers = {
    time = "${timestamp()}"
  }

  provisioner "local-exec" {
    command = "aws glue update-job --job-name ${local.name} --job-update 'Role=${module.crawler_role.role_arn}, Command={ScriptLocation=s3://abc/abc.py,PythonVersion=3,Name=glueetl}, DefaultArguments={--job-language=python,--database=${local.name},--s3bucket=<bucket-name>}, Connections={Connections=[${local.name}]}, GlueVersion=1.0'"
  }
}

@Vedant-R now it will be _always_ run, 40 times for the 40 jobs, without any changes... :F

Yes, but it serves the purpose of not getting reset to Python 2.
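One way to avoid the unconditional rerun, sketched here under the same names as the example above (not from the thread, untested): key triggers off a hash of the job's inputs instead of timestamp(), so the provisioner fires only when the definition actually changes.

resource "null_resource" "set_glue_version" {
  depends_on = ["aws_glue_job.etl"]

  triggers = {
    # Rerun only when an input changes, not on every apply.
    config_hash = "${md5("${local.name}|${module.crawler_role.role_arn}|${var.bucket_name}")}"
  }

  provisioner "local-exec" {
    command = "aws glue update-job --job-name ${local.name} --job-update 'Role=${module.crawler_role.role_arn}, Command={ScriptLocation=s3://abc/abc.py,PythonVersion=3,Name=glueetl}, GlueVersion=1.0'"
  }
}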

The python_version = "3" option is enabled in the latest provider, terraform-provider-aws v2.29.0; however, it did not modify the Glue version on an existing job, so the job failed with the following error:

JobName:XXXXXXX and JobRunId:jr_XXXXXXXXX failed to execute with exception Unsupported pythonVersion 3 for given glueVersion 0.9

I am also getting the same issue. What am I missing?

For the time being I have deployed with python_version = "3" and then modified the job to Glue version 1.0 from the AWS console. This fixed it. However, it would be good to have a fix from the provider.

This issue can be worked around with a CloudFormation template, where we can declare the Glue version and Python version directly. It is an easy approach and does not require an AWS provider update.

{
  "Description": "AWS Glue Job",
  "Resources": {
    "GlueJob": {
      "Type": "AWS::Glue::Job",
      "Properties": {
        "Command": {
          "Name": "glueetl",
          "ScriptLocation": "${script_location}",
          "PythonVersion": "3"
        },
        "DefaultArguments": {
          "--job-language": "${job-language}",
          "--TempDir": "${TempDir}",
          "--extra-jars": "${extra-jars}"
        },
        "Name": "${Name}",
        "Role": "${role_arn}",
        "MaxCapacity": 10,
        "GlueVersion": "1.0"
      }
    }
  }
}

@g-sree all updates using Terraform will always reset the Python version...

This is my Terraform declaration:

resource "aws_glue_job" "test_glue_job" {
  name     = "name"
  role_arn = "iam_role"

  command {
    script_location = "script"
    python_version  = "3"
  }

  default_arguments = {
    ~~~~ truncated ~~~~
  }
}

I'm using Python 3. However, if you modify an existing job, only the Python version changes, not the Glue version, which should be 1.0. In that case the job will fail on the next run. I manually updated the Glue version from 0.9 to 1.0 in the AWS console and never had a problem afterwards.

A CloudFormation-based workaround: https://github.com/terraform-providers/terraform-provider-aws/issues/8526#issuecomment-490161140

resource "aws_cloudformation_stack" "network" {
  name = "${local.name}-glue-job"

  template_body = <<STACK
{
  "Resources" : {
    "MyJob": {
      "Type": "AWS::Glue::Job",
      "Properties": {
        "Command": {
          "Name": "glueetl",
          "ScriptLocation": "s3://${local.bucket_name}/jobs/${var.job}"
        },
        "ExecutionProperty": {
         "MaxConcurrentRuns": 2
        },
        "MaxRetries": 0,
        "Name": "${local.name}",
        "Role": "${var.role}"
      }
    }
  }
}
STACK
}
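Note that the stack above does not pin the versions this issue is about. Combining it with the properties from the CloudFormation comment earlier gives a sketch like the following (names are placeholders from the example above):

resource "aws_cloudformation_stack" "glue_job" {
  name = "${local.name}-glue-job"

  template_body = <<STACK
{
  "Resources": {
    "MyJob": {
      "Type": "AWS::Glue::Job",
      "Properties": {
        "Command": {
          "Name": "glueetl",
          "ScriptLocation": "s3://${local.bucket_name}/jobs/${var.job}",
          "PythonVersion": "3"
        },
        "ExecutionProperty": {
          "MaxConcurrentRuns": 2
        },
        "GlueVersion": "1.0",
        "MaxRetries": 0,
        "Name": "${local.name}",
        "Role": "${var.role}"
      }
    }
  }
}
STACK
}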

Support for the new glue_version argument in the aws_glue_job resource has been merged and will release with version 2.34.0 of the Terraform AWS Provider, on Thursday. 👍

This has been released in version 2.34.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
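For reference, a minimal configuration using the new argument might look like this (a sketch; names and paths are placeholders):

resource "aws_glue_job" "example" {
  name         = "job-name"
  role_arn     = aws_iam_role.example.arn
  glue_version = "1.0"

  command {
    script_location = "s3://bucket/script.py"
    python_version  = "3"
  }
}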

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
