Aws-cdk: [glue] Table format is Unknown

Created on 21 Aug 2020 · 3 comments · Source: aws/aws-cdk


I am creating a Glue table with the CDK. After the table is created, the "Classification" field in the Glue console shows "Unknown", even though the table's data format is specified as CSV.

Reproduction Steps

const targetTable = new glue.Table(this, "TargetTable", {
  database: glueDatabase,
  dataFormat: glue.DataFormat.CSV,
  bucket: s3Bucket,
  tableName: "target_s3_table",
  columns: [
    {
      name: "sample_field",
      type: glue.Schema.STRING
    },
  ]
});

What did you expect to happen?

A Glue table is created with the CSV format.

What actually happened?

Table format is shown as "Unknown" in Glue console.

Environment

  • CLI Version :
  • Framework Version: 1.60.0 (build 8e3f53a)
  • Node.js Version: v14.8.0
  • OS : MacOS 10.14.6
  • Language (Version): TypeScript 3.7.2

Other


This is :bug: Bug Report

@aws-cdk/aws-glue bug effort/small in-progress p1

All 3 comments

Hi @SZubarev - Thanks for reporting this. Indeed looks like an oversight.

I did some digging and created a Glue table with the console, then described it with aws glue get-tables:

{
    "TableList": [
        {
            "Name": "epolon-test",
            "DatabaseName": "triage",
            "CreateTime": 1598175131.0,
            "UpdateTime": 1598175131.0,
            "Retention": 0,
            "StorageDescriptor": {
                "Columns": [
                    {
                        "Name": "count",
                        "Type": "smallint"
                    }
                ],
                "Location": "<location>",
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
                "Compressed": false,
                "NumberOfBuckets": 0,
                "SerdeInfo": {
                    "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
                    "Parameters": {
                        "separatorChar": ","
                    }
                },
                "SortColumns": [],
                "StoredAsSubDirectories": false
            },
            "PartitionKeys": [],
            "TableType": "EXTERNAL_TABLE",
            "Parameters": {
                "classification": "csv" // this parameter is not being passed by CDK.
            },
            "CreatedBy": "<role>",
            "IsRegisteredWithLakeFormation": false
        }
    ]
}

Looks like the Classification field in the console is controlled by the classification parameter. We need to add this parameter:

https://github.com/aws/aws-cdk/blob/25a9cc7fabbe3b70add48edfd01421f74429b97f/packages/%40aws-cdk/aws-glue/lib/table.ts#L262-L264
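To check other tables for the same gap, here is a minimal sketch in plain TypeScript (no CDK or AWS SDK imports) that scans a response shaped like the `aws glue get-tables` output above and lists tables with no `classification` parameter. The helper name `findUnclassifiedTables`, the trimmed-down interfaces, and the sample data are all made up for illustration; only the `TableList`, `Name`, and `Parameters` field names come from the output shown.

```typescript
// Minimal shape of the `aws glue get-tables` response used here; only the
// fields read below are declared.
interface GlueTable {
  Name: string;
  Parameters?: Record<string, string>;
}

interface GetTablesResponse {
  TableList: GlueTable[];
}

// Hypothetical helper: returns the names of tables that have no
// "classification" entry in their Parameters map.
function findUnclassifiedTables(response: GetTablesResponse): string[] {
  return response.TableList
    .filter((table) => !table.Parameters?.classification)
    .map((table) => table.Name);
}

// Made-up sample data: one table with the parameter, one without.
const sample: GetTablesResponse = {
  TableList: [
    { Name: "epolon-test", Parameters: { classification: "csv" } },
    { Name: "target_s3_table", Parameters: {} },
  ],
};

console.log(findUnclassifiedTables(sample));
```

Tables reported by a check like this would be the ones likely to show "Unknown" in the console's Classification field.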

I think I can try and implement the fix for this. Though one detail I noticed is, when creating a glue table in the console, some of the data formats require extra options: the choice of separator for CSV files, and the row tag for XML files. So, along these lines I think it's worth also extending the DataFormat class to support specifying these options (since they are required in the AWS console table creation workflow).
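As a rough sketch of what per-format options could look like, the plain TypeScript below maps a format choice to a classification and SerDe parameters. Everything here is hypothetical: `FormatOptions`, `FormatSettings`, and `toFormatSettings` are not CDK APIs, and the `rowTag` key for XML is an assumed name used only for illustration. The `separatorChar` key is the one visible in the get-tables output above.

```typescript
// Hypothetical option types; these are not part of the CDK DataFormat class.
type FormatOptions =
  | { kind: "csv"; separator: string }
  | { kind: "xml"; rowTag: string };

interface FormatSettings {
  classification: string;
  serdeParameters: Record<string, string>;
}

function toFormatSettings(options: FormatOptions): FormatSettings {
  switch (options.kind) {
    case "csv":
      // "separatorChar" matches the SerdeInfo.Parameters key in the
      // get-tables output above.
      return {
        classification: "csv",
        serdeParameters: { separatorChar: options.separator },
      };
    case "xml":
      // ASSUMPTION: the "rowTag" key name is illustrative only.
      return {
        classification: "xml",
        serdeParameters: { rowTag: options.rowTag },
      };
  }
}

console.log(JSON.stringify(toFormatSettings({ kind: "csv", separator: "," })));
```

Using a discriminated union means each format only accepts the options that apply to it, so a CSV table cannot be given a row tag by mistake.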

As a temporary workaround, @SZubarev, you can use code like this to override the property so that the table is marked with the CSV classification:

const cfnTable = myTable.node.defaultChild as glue.CfnTable;
((cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters as any) = {
  classification: "csv",
  ...(cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters
};

[It seems that in glue.generated.ts, all fields of CfnTable.TableInputProperty are marked as readonly, so in order to override them for an existing L2 construct, we need to do extra typecasting, but maybe there's a better way to do this which I don't see...]

@Chriscbr Thanks for the workaround code! The code is OK; those casts appear a lot when doing this sort of thing.

Another way of doing it would be:

cfnTable.addPropertyOverride('TableInput.Parameters', { classification: "csv", ...(cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters});

But that's essentially the same. In general, for anyone interested, these approaches are described in the CDK "Escape Hatches" documentation.
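One detail worth noting about both snippets is the order of the object spread: `classification: "csv"` comes first and the existing parameters are spread after it, so an existing `classification` key would win. The plain TypeScript sketch below shows the difference; `existingParams` and its values are made up for illustration and are not taken from a real table.

```typescript
type Params = Record<string, string>;

// Made-up existing parameters that already carry a classification.
const existingParams: Params = { classification: "json", owner: "etl" };

// Same order as the workaround: existing keys are spread last and win.
const keepExisting: Params = { classification: "csv", ...existingParams };

// Reversed order: the explicit classification comes last and wins.
const forceCsv: Params = { ...existingParams, classification: "csv" };

console.log(keepExisting.classification); // json
console.log(forceCsv.classification); // csv
```

If the table has no `classification` parameter to begin with, as in this issue, both orders produce the same result.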
