We have created an external Hive table with 'avro.schema.literal' as shown below:
create external table test_avro
partitioned by (user_id string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
LOCATION '/topics/new_data'
tblproperties ('avro.schema.literal'='
{ "name": "event_record", "type": "record", "fields": [ {"name":"name", "type":"string"},
{"name":"other_data", "type":{"type": "map", "values" : ["string", "int"]}},
{"name":"address", "type":{"type" : "map", "values" : "string"}} ] }' );
When we describe the table in Hive, it lists all the columns, but Presto doesn't list the other_data column because its value is of a union type.
Is this a known issue, and is there a workaround for it?
Unfortunately, the UNION type isn't supported yet, even when Avro isn't involved. Could you use the STRUCT type as a workaround?
Do you mean the Avro record type, as the equivalent of a struct?
We are using Avro and we want a field that can hold dynamic key-value pairs, i.e. a map in Avro.
The values in our case will be of different types: int and string.
How can we define a schema for this kind of field in Avro and use it in Presto?
Yes, I mean a record in Avro, as below. Alternatively, how about using a nested map type?
{"name":"other_data", "type":{"type": "map", "values" : {"type": "record", "name":"child_data", "fields" : [{"name":"data1", "type":"string"}, {"name":"data2", "type":"int"}]}}},
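To show where the suggested field fits, here is a minimal sketch of the original table definition with other_data declared as a map of records instead of a map of unions. The table name test_avro_struct and the LOCATION path are hypothetical, and STORED AS AVRO (available in Hive 0.14 and later) is an assumption rather than something taken from the original DDL. It also assumes the files under that location are written with this schema; data already written with the union-valued map would not match it.

```sql
-- Sketch only: table name and LOCATION are hypothetical placeholders.
-- Assumes the Avro files under LOCATION were produced with this exact schema.
CREATE EXTERNAL TABLE test_avro_struct
PARTITIONED BY (user_id string)
STORED AS AVRO
LOCATION '/topics/new_data_struct'
TBLPROPERTIES ('avro.schema.literal'='
{ "name": "event_record", "type": "record", "fields": [
  {"name":"name", "type":"string"},
  {"name":"other_data", "type":{"type":"map", "values":
    {"type":"record", "name":"child_data", "fields":[
      {"name":"data1", "type":"string"},
      {"name":"data2", "type":"int"}]}}},
  {"name":"address", "type":{"type":"map", "values":"string"}} ] }');
```

With a definition along these lines, other_data should surface as a map whose values are structs, so a Presto query could read a sub-field with an expression such as other_data['some_key'].data1, where some_key is a placeholder for one of your own map keys.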
JFYI, we have an internal patch that ignores a uniontype field inside a struct, i.e. it drops the sub-field rather than the entire column. We can contribute it if you think it would be helpful.
@lxynov Sorry for my late reply. Let me assign you. Thanks!
Related Slack thread: https://prestosql.slack.com/archives/CP1MUNEUX/p1581706921212000
+1, support for Avro union types would be a nice feature.
Example from my job: