Creating a Glue table: after the table is created, the "Classification" field in the Glue console shows "Unknown", even though the table's data format is specified as CSV.
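```ts
import * as glue from "@aws-cdk/aws-glue";

// Inside a Stack constructor; glueDatabase and s3Bucket are defined elsewhere.
const targetTable = new glue.Table(this, "TargetTable", {
  database: glueDatabase,
  dataFormat: glue.DataFormat.CSV,
  bucket: s3Bucket,
  tableName: "target_s3_table",
  columns: [
    {
      name: "sample_field",
      type: glue.Schema.STRING,
    },
  ],
});
```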
To reproduce: create a Glue table with the CSV data format, as above.
Result: the table's format is shown as "Unknown" in the Glue console.
This is a :bug: Bug Report.
Hi @SZubarev, thanks for reporting this. It does indeed look like an oversight.
I did some digging: I created a Glue table in the console, then described it with `aws glue get-tables`:
```json
{
  "TableList": [
    {
      "Name": "epolon-test",
      "DatabaseName": "triage",
      "CreateTime": 1598175131.0,
      "UpdateTime": 1598175131.0,
      "Retention": 0,
      "StorageDescriptor": {
        "Columns": [
          {
            "Name": "count",
            "Type": "smallint"
          }
        ],
        "Location": "<location>",
        "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "Compressed": false,
        "NumberOfBuckets": 0,
        "SerdeInfo": {
          "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
          "Parameters": {
            "separatorChar": ","
          }
        },
        "SortColumns": [],
        "StoredAsSubDirectories": false
      },
      "PartitionKeys": [],
      "TableType": "EXTERNAL_TABLE",
      "Parameters": {
        "classification": "csv" // this parameter is not being passed by CDK.
      },
      "CreatedBy": "<role>",
      "IsRegisteredWithLakeFormation": false
    }
  ]
}
```
It looks like the Classification field in the console is controlled by the `classification` table parameter, so CDK needs to add this parameter.
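To make the gap concrete, here is a minimal sketch in plain TypeScript (no CDK imports) of the table-level `Parameters` map as it appears in the `get-tables` output above. The `TableParameters` type and the `withClassification` helper are hypothetical names for illustration, not part of the CDK API; the idea is only that the construct would put a `classification` entry next to whatever parameters it already sets.

```typescript
// Hypothetical shape of the table-level "Parameters" map in a Glue TableInput.
type TableParameters = { [key: string]: string };

// Returns a new map with "classification" set, the key the Glue console reads
// to fill in the "Classification" field. Keys already present in `existing`
// are spread last, so an explicit value is never overwritten.
function withClassification(
  classification: string,
  existing: TableParameters = {},
): TableParameters {
  return { classification, ...existing };
}

// A CSV table with no parameters yet picks up classification "csv".
const params = withClassification("csv");
console.log(params); // { classification: 'csv' }

// Other table properties are preserved alongside the new key.
const merged = withClassification("csv", { "skip.header.line.count": "1" });
console.log(merged);
```

Spreading `existing` last is a deliberate choice in this sketch: a user who already set a classification by hand keeps it.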
I think I can try to implement the fix for this. One detail I noticed, though: when creating a Glue table in the console, some data formats require extra options, such as the separator for CSV files and the row tag for XML files. Along these lines, I think it's also worth extending the DataFormat class to support specifying these options, since the console's table creation workflow requires them.
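One possible shape for those options, sketched in plain TypeScript under stated assumptions: a hypothetical `FormatOptions` union in which CSV carries a separator and XML carries a row tag. The CSV mapping follows the `get-tables` output above (a `separatorChar` entry in the SerDe parameters). Storing `rowTag` as a table parameter for XML is an assumption, and every type and function name here is hypothetical rather than the real `DataFormat` API.

```typescript
// Hypothetical per-format options; not the actual CDK DataFormat API.
type FormatOptions =
  | { kind: "csv"; separatorChar: string }
  | { kind: "xml"; rowTag: string };

// Hypothetical subset of what would end up in the Glue TableInput.
interface RenderedFormat {
  classification: string;
  serdeParameters: { [key: string]: string };
  tableParameters: { [key: string]: string };
}

function renderFormat(options: FormatOptions): RenderedFormat {
  if (options.kind === "csv") {
    // Mirrors the console-created table: SerdeInfo.Parameters.separatorChar
    return {
      classification: "csv",
      serdeParameters: { separatorChar: options.separatorChar },
      tableParameters: {},
    };
  }
  // Narrowed to the XML variant here.
  // Assumption: the row tag is stored as a table parameter named "rowTag".
  return {
    classification: "xml",
    serdeParameters: {},
    tableParameters: { rowTag: options.rowTag },
  };
}

const csv = renderFormat({ kind: "csv", separatorChar: "," });
console.log(csv.serdeParameters); // { separatorChar: ',' }
```

Using a discriminated union lets the compiler reject, for example, a row tag on a CSV table, which is one way to make the console's "required" options explicit in the type.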
As a temporary workaround, @SZubarev, you can use code like this to override the property so that the table is marked with the CSV classification:
```ts
const cfnTable = myTable.node.defaultChild as glue.CfnTable;
// TableInputProperty fields are readonly, so assign through an `any` cast
const tableInput = cfnTable.tableInput as glue.CfnTable.TableInputProperty;
(tableInput as any).parameters = {
  classification: "csv", // existing parameters spread below take precedence
  ...tableInput.parameters,
};
```
[It seems that in glue.generated.ts all fields of CfnTable.TableInputProperty are marked readonly, so overriding them on an existing L2 construct takes some extra type casting. Maybe there's a better way to do this that I'm not seeing.]
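On the question of a better way: one common TypeScript idiom (not something the CDK itself provides) is a mapped type that strips `readonly`, so the cast is written once and the compiler still checks property names and value types. The sketch below uses a hypothetical stand-in interface, `TableInputLike`, instead of the real `CfnTable.TableInputProperty`, so that it runs without the CDK installed.

```typescript
// Stand-in for a generated L1 property type: every field is readonly.
interface TableInputLike {
  readonly name?: string;
  readonly parameters?: { [key: string]: string };
}

// Mapped type that removes the readonly modifier from every property.
type Mutable<T> = { -readonly [K in keyof T]: T[K] };

const tableInput: TableInputLike = { name: "target_s3_table" };

// `tableInput.parameters = ...` would not compile: "parameters" is read-only.
// A single cast to Mutable<...> allows the write while keeping type checking.
const writable = tableInput as Mutable<TableInputLike>;
writable.parameters = { classification: "csv", ...(writable.parameters || {}) };

// Both names refer to the same object, so the change is visible here too.
console.log(tableInput.parameters); // { classification: 'csv' }
```

Compared with casting to `any`, a typo such as `writable.paramters` is still a compile error with this approach.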
@Chriscbr Thanks for the workaround code! The code is OK; casts like these come up a lot when doing this sort of thing.
Another way of doing it would be:
```ts
cfnTable.addPropertyOverride("TableInput.Parameters", {
  classification: "csv",
  ...(cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters,
});
```
But that's essentially the same. In general, for anyone interested, these approaches are described in the CDK Escape Hatches documentation.
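For intuition, `addPropertyOverride` addresses a nested property of the synthesized CloudFormation resource with a dotted path, here `TableInput.Parameters`. The sketch below is a simplified model of that idea, not CDK's implementation: the `setAtPath` helper is hypothetical and only writes the value at the path, creating intermediate objects as needed. Because the override value in the snippet spreads the existing parameters, nothing already present is lost even under this replace-only model.

```typescript
type Json = { [key: string]: unknown };

// Hypothetical helper: write `value` at a dotted path such as
// "TableInput.Parameters", creating missing intermediate objects.
// Replace-only: it does not merge with what was there before.
function setAtPath(target: Json, path: string, value: unknown): void {
  const keys = path.split(".");
  let node: Json = target;
  for (const key of keys.slice(0, -1)) {
    const child = node[key];
    if (typeof child !== "object" || child === null) {
      node[key] = {};
    }
    node = node[key] as Json;
  }
  node[keys[keys.length - 1]] = value;
}

// Rendered resource properties before the override (trimmed to the relevant part).
const existing = { "skip.header.line.count": "1" };
const properties: Json = {
  TableInput: { Name: "target_s3_table", Parameters: existing },
};

setAtPath(properties, "TableInput.Parameters", { classification: "csv", ...existing });
console.log(JSON.stringify(properties));
// {"TableInput":{"Name":"target_s3_table","Parameters":{"classification":"csv","skip.header.line.count":"1"}}}
```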