### Terraform Version

```
$ terraform version
Terraform v0.11.11
```
### Affected Resource(s)
* aws_emr_cluster
### Terraform Configuration Files
```hcl
data "template_file" "emr_configuration_json" {
  template = <<EOF
[
  {
    "classification": "spark-log4j",
    "configurations": [],
    "properties": {
      "log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter": "WARN",
      "log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper": "WARN",
      "log4j.logger.org.apache.spark.streaming": "WARN",
      "log4j.rootCategory": "WARN, console"
    }
  },
  {
    "classification": "spark-env",
    "configurations": [
      {
        "classification": "export",
        "configurations": [],
        "properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3.6"
        }
      }
    ],
    "properties": {}
  }
]
EOF
}

resource "aws_emr_cluster" "data_emr_519" {
  name          = "data-emr-519"
  release_label = "emr-5.19.0"
  applications  = ["Spark", "Hadoop"]
  count         = "${lookup(var.emr_count, var.env)}"

  ec2_attributes {
    subnet_id                         = "${var.subnet_id}"
    emr_managed_master_security_group = "${var.master_security_group_id}"
    emr_managed_slave_security_group  = "${var.slave_security_group_id}"
    instance_profile                  = "${aws_iam_instance_profile.emr_profile.arn}"
    key_name                          = "${var.data_karl_marx_deploy_key_name}"
    additional_master_security_groups = "${var.sg_emr_gocd_id}"
    additional_slave_security_groups  = "${var.sg_emr_gocd_id}"
  }

  instance_group {
    instance_role  = "MASTER"
    instance_count = 1
    instance_type  = "${lookup(var.emr_master_instance_type, var.env)}"
  }

  instance_group {
    instance_role  = "CORE"
    instance_count = "${lookup(var.emr_core_instances_count, var.env)}"
    instance_type  = "${lookup(var.emr_core_instances_type, var.env)}"
  }

  tags {
    description    = "Managed by Terraform"
    "sf:costGroup" = "data"
    "sf:env"       = "${var.env}"
    "sf:team"      = "data"
  }

  configurations_json = "${data.template_file.emr_configuration_json.rendered}"
  service_role        = "${aws_iam_role.iam_emr_service_role.arn}"

  bootstrap_action {
    path = "s3://${aws_s3_bucket.data_emr_bootstrap_actions.bucket}/bootstrap.sh"
    name = "bootstrap"
  }

  depends_on = ["aws_s3_bucket.data_emr_bootstrap_actions", "aws_s3_bucket_object.bootstrap"]
}
```
### Debug Output

https://gist.github.com/l13t/cca0c1c195c34dc275bdae51cc413cfe
### Expected Behavior

No cluster recreation.
### Actual Behavior

The cluster is recreated because of a difference in configuration.
Trying to fix the issue, I copied the config from the AWS console, so `configurations_json` is:
```json
[
  {
    "classification": "spark-log4j",
    "configurations": [],
    "properties": {
      "log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter": "WARN",
      "log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper": "WARN",
      "log4j.logger.org.apache.spark.streaming": "WARN",
      "log4j.rootCategory": "WARN, console"
    }
  },
  {
    "classification": "spark-env",
    "configurations": [
      {
        "classification": "export",
        "configurations": [],
        "properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3.6"
        }
      }
    ],
    "properties": {}
  }
]
```
I agree that I need to remove `"configurations": []` from the config. But the real root cause of the issue is that Terraform fetches the config back from AWS in the form shown in the next code block:
```json
[
  {
    "Classification": "spark-log4j",
    "Properties": {
      "log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter": "WARN",
      "log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper": "WARN",
      "log4j.logger.org.apache.spark.streaming": "WARN",
      "log4j.rootCategory": "WARN, console"
    }
  },
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3.6"
        }
      }
    ],
    "Properties": {}
  }
]
```
So there is some magic happening between the actual config shown in the AWS console and the config Terraform downloads to my machine.
### Steps to Reproduce

`terraform plan` / `terraform apply`
I'm having this same issue, and I did a diff on mine as well. I suspect that the last block of JSON is getting capitalized JSON keys?
I had the same issue, and it was solved by capitalizing the JSON keys to match AWS's convention. Also, a note on `configurations_json`: if the `Configurations` value is empty, you should omit the `Configurations` field entirely instead of providing an empty list as the value (`"Configurations": []`).
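
For reference, applying that advice to the template from this issue would look roughly like the following. This is an untested sketch: the only changes from the original data source are the capitalized keys (matching the form the EMR API returns, shown in the code block above) and the dropped empty `configurations` lists.

```hcl
# Sketch: keys capitalized to match the EMR API's normalized form and the
# empty "Configurations" lists omitted, so the rendered JSON should match
# what Terraform reads back from AWS and no spurious diff is produced.
data "template_file" "emr_configuration_json" {
  template = <<EOF
[
  {
    "Classification": "spark-log4j",
    "Properties": {
      "log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter": "WARN",
      "log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper": "WARN",
      "log4j.logger.org.apache.spark.streaming": "WARN",
      "log4j.rootCategory": "WARN, console"
    }
  },
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3.6"
        }
      }
    ],
    "Properties": {}
  }
]
EOF
}
```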