This is related to the resource type aws_emr_cluster.
Many EMR clusters use S3 as their data store, even though S3 was designed to be eventually consistent.
That's why EMR offers an option called "EMRFS Consistent View" to ensure a higher level of consistency. This option uses a DynamoDB table to store metadata.
It would be nice to support it in Terraform, as it is a key feature of EMR.
Below is an example of using the CLI to create a cluster with EMRFS consistent view enabled:
aws emr create-cluster \
  --release-label emr-4.6.0 \
  --instance-type c4.xlarge \
  --emrfs Consistent=true,Args=[fs.s3.consistent.metadata.read.capacity=600,fs.s3.consistent.metadata.write.capacity=300] \
  --ec2-attributes KeyName=myKey
EMRFS consistent view comes with a number of options, which are listed in the EMR documentation.
It is actually already possible to enable this option via Terraform, but it's not documented.
Here is how:
data "template_file" "emr_config" {
  template = <<EOF
[
  {
    "classification":"emrfs-site",
    "properties":{
      "fs.s3.consistent.retryPeriodSeconds":"10",
      "fs.s3.consistent":"true",
      "fs.s3.consistent.retryCount":"5",
      "fs.s3.consistent.metadata.read.capacity":"600",
      "fs.s3.consistent.metadata.write.capacity":"300",
      "fs.s3.consistent.metadata.tableName":"EmrFSMetadata"
    },
    "configurations":[
    ]
  }
]
EOF
}

resource "aws_emr_cluster" "emrcluster" {
  [...]
  configurations = "${data.template_file.emr_config.rendered}"
  [...]
}
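On Terraform 0.12 or later, the same emrfs-site classification can probably be built without the template_file data source. The sketch below assumes an AWS provider version whose aws_emr_cluster resource supports the configurations_json argument, and it uses Terraform's jsonencode function to produce the JSON. The cluster name, the instance type, and the two role/profile variables are hypothetical placeholders, not values from this thread.

```hcl
# Sketch only: assumes Terraform 0.12+ and an AWS provider that supports
# configurations_json on aws_emr_cluster.
# The two variables below are hypothetical placeholders.
variable "emr_service_role" {
  type = string
}

variable "emr_instance_profile" {
  type = string
}

locals {
  # emrfs-site classification that enables EMRFS consistent view
  emr_configurations = [
    {
      Classification = "emrfs-site"
      Properties = {
        "fs.s3.consistent"                         = "true"          # turn consistent view on
        "fs.s3.consistent.retryCount"              = "5"             # retries after an inconsistency is detected
        "fs.s3.consistent.retryPeriodSeconds"      = "10"            # wait between those retries
        "fs.s3.consistent.metadata.tableName"      = "EmrFSMetadata" # DynamoDB metadata table
        "fs.s3.consistent.metadata.read.capacity"  = "600"           # table read capacity
        "fs.s3.consistent.metadata.write.capacity" = "300"           # table write capacity
      }
    }
  ]
}

resource "aws_emr_cluster" "emrcluster" {
  name          = "emrfs-consistent-view-example" # hypothetical name
  release_label = "emr-5.20.1"
  applications  = ["Hadoop"]
  service_role  = var.emr_service_role

  ec2_attributes {
    instance_profile = var.emr_instance_profile
  }

  master_instance_group {
    instance_type = "c4.xlarge"
  }

  # jsonencode renders the list above into the JSON string EMR expects
  configurations_json = jsonencode(local.emr_configurations)
}
```

Building the structure in HCL means a missing bracket or stray comma is reported by Terraform's own parser, rather than depending on hand-written JSON inside a string being well-formed.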
@n-my thank you for following up with your solution, it was super helpful. In case anyone else comes across this as confused as I was about where these settings come from: I found the docs listing all the available classifications for the configurations argument and what they mean, as well as all the available properties you can configure for EMRFS.
I think these solutions are great, and this issue can be closed.
I could not get this to work. I am using emr-5.20.1 and Terraform 0.13.3.
  template = <<EOF
[
  {
    "classification":"emrfs-site",
    "properties":{
      "fs.s3.consistent.retryPeriodSeconds":"10",
      "fs.s3.consistent":"true",
      "fs.s3.consistent.retryCount":"5",
      "fs.s3.consistent.metadata.autoCreate":"true",
      "fs.s3.consistent.metadata.read.capacity":"600",
      "fs.s3.consistent.metadata.write.capacity":"300",
      "fs.s3.consistent.metadata.tableName":"Emr05FSMetadata"
    },
    "configurations":[
    ]
  }
]
EOF
}
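One thing that may be worth trying on Terraform 0.13 (a suggestion, not a confirmed fix for the problem above) is to drop the template_file data source and pass the same JSON directly to the configurations_json argument, which the AWS provider documents as a JSON string, using it in place of configurations. Only the relevant argument is shown; the other cluster arguments are elided, as in the earlier snippet, so this fragment is not a complete cluster definition on its own.

```hcl
resource "aws_emr_cluster" "emrcluster" {
  # ... name, release_label, service_role, instance groups, etc. (elided) ...

  # Inline JSON passed straight to the cluster; no template_file needed
  configurations_json = <<EOF
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.retryPeriodSeconds": "10",
      "fs.s3.consistent.metadata.autoCreate": "true",
      "fs.s3.consistent.metadata.read.capacity": "600",
      "fs.s3.consistent.metadata.write.capacity": "300",
      "fs.s3.consistent.metadata.tableName": "Emr05FSMetadata"
    }
  }
]
EOF
}
```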
Thanks