Terraform-provider-aws: Feature Request: Manage Record Format Conversion In AWS Kinesis Firehose Stream

Created on 11 May 2018  ·  8 Comments  ·  Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

AWS released a feature today: Kinesis Firehose can now convert JSON records to Apache Parquet or Apache ORC before saving them to S3.

Previously, you had to write and pay for AWS Glue ETL jobs to do that.

Documentation: https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

The relevant Stack Overflow question has ~2500 views, which suggests this was a long-awaited feature.

New or Affected Resource(s)

  • aws_kinesis_firehose_delivery_stream

Potential Terraform Configuration

Suggested syntax, based on the API:

resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  name        = "terraform-kinesis-firehose-test-stream"
  destination = "s3"

  data_format_conversion {
    enabled = "true"

    input_format_configuration {
      deserializer = "Apache Hive JSON" # or OpenX JSON
    }

    output_format_configuration {
      serializer = "ORC" # or Parquet
    }

    schema_configuration {
      catalog_id    = "${aws_glue_catalog_database.main.catalog_id}"
      database_name = "${aws_glue_catalog_database.main.name}"
      table_name    = "${aws_glue_catalog_table.main.name}"
      role_arn      = "..."
      version_id    = "3" # or LATEST by default
    }
  }
}

References

enhancement, service/firehose

All 8 comments

Prerequisite: AWS Go SDK v1.13.47 (#4512)

I'm +1 on the proposed configuration syntax here as an MVP. However, the input_format_configuration and output_format_configuration sections expose many more knobs:

input:
https://docs.aws.amazon.com/firehose/latest/APIReference/API_HiveJsonSerDe.html
https://docs.aws.amazon.com/firehose/latest/APIReference/API_OpenXJsonSerDe.html

output:
https://docs.aws.amazon.com/firehose/latest/APIReference/API_ParquetSerDe.html
https://docs.aws.amazon.com/firehose/latest/APIReference/API_OrcSerDe.html

It appears that most of these are optional, but will be returned in DescribeDeliveryStream: https://docs.aws.amazon.com/firehose/latest/APIReference/API_DescribeDeliveryStream.html

I will try to get a pull request submitted for this tomorrow or Wednesday.

Ack -- I only got about halfway through implementing the 36(!) new attributes required in the full schemas for serializers/deserializers before running out of time ahead of a short vacation. I'll be able to pick this back up on Tuesday unless someone wants to get something in sooner.

Pull request submitted with all underlying options: #4842

Support has been merged into master and will release with version 1.24.0 of the AWS provider, likely middle of this week. 🎉

This has been released in version 1.24.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
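
For anyone landing here later, the following is a rough sketch of how the released syntax looks, as far as I can tell from the provider documentation. It differs from the proposal above: the block is named data_format_conversion_configuration, it is nested under extended_s3_configuration, and the serializer/deserializer are blocks rather than strings. The IAM role and S3 bucket resource names are hypothetical placeholders, and the buffer_size value is an assumption based on the AWS requirement of at least 64 MB when format conversion is enabled. Please check the docs for your provider version before relying on it.

```hcl
# Assumption: pin to the first provider release that includes this feature.
provider "aws" {
  version = "~> 1.24"
}

resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  name        = "terraform-kinesis-firehose-test-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = "${aws_iam_role.firehose.arn}" # hypothetical IAM role
    bucket_arn = "${aws_s3_bucket.bucket.arn}"  # hypothetical S3 bucket

    # Assumption: format conversion needs a buffer of at least 64 MB.
    buffer_size = 128

    data_format_conversion_configuration {
      input_format_configuration {
        deserializer {
          hive_json_ser_de {} # or open_x_json_ser_de {}
        }
      }

      output_format_configuration {
        serializer {
          orc_ser_de {} # or parquet_ser_de {}
        }
      }

      schema_configuration {
        database_name = "${aws_glue_catalog_database.main.name}"
        table_name    = "${aws_glue_catalog_table.main.name}"
        role_arn      = "${aws_iam_role.firehose.arn}"
      }
    }
  }
}
```

The empty ser_de blocks accept the service defaults; the extra knobs mentioned earlier in the thread (compression, timestamp formats, and so on) go inside them.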

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
