Presto: Support for "skip.header.line.count" property for HIVE tables

Created on 22 Oct 2014 · 24 Comments · Source: prestodb/presto

We have text tables with skip.header.line.count property.

enhancement intermediate-task

All 24 comments

I hit this as well. Bumping, since it's still an issue.

This is still an issue...

skip.header.line.count is available from metastore. Record cursor will have to skip lines according to that property. Some changes in BackgroundHiveSplitLoader and ColumnarTextHiveRecordCursor will be necessary.
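
The line-skipping idea described above can be illustrated with a minimal, self-contained Java sketch. This is not Presto's code: `HeaderSkipSketch`, `headerLineCount`, and `readDataLines` are hypothetical names, and the sketch makes no claim about how `BackgroundHiveSplitLoader` or `ColumnarTextHiveRecordCursor` actually work. It assumes the table properties are available as a plain map, and that header lines should be dropped only by the reader whose split starts at byte 0 of the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of header skipping; not Presto's actual record cursor.
final class HeaderSkipSketch {
    private HeaderSkipSketch() {}

    // Reads skip.header.line.count from the table properties, defaulting to 0 when absent.
    static int headerLineCount(Map<String, String> tableProperties) {
        String value = tableProperties.getOrDefault("skip.header.line.count", "0");
        int count = Integer.parseInt(value.trim());
        if (count < 0) {
            throw new IllegalArgumentException("skip.header.line.count must be non-negative: " + value);
        }
        return count;
    }

    // Returns the data lines of one split. Header lines are dropped only when
    // the split begins at the start of the file (assumption: splitStart == 0).
    static List<String> readDataLines(BufferedReader reader, long splitStart, int headerLines) {
        int toSkip = (splitStart == 0) ? headerLines : 0;
        List<String> rows = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (toSkip > 0) {
                    toSkip--; // still inside the header, discard this line
                    continue;
                }
                rows.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

Calling `readDataLines` on a reader over a file that starts with one header line, with `splitStart` of 0 and `headerLines` of 1, returns only the data rows. A reader over a later split of the same file returns every line unchanged.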

Any update on this issue?

I would expect this to work for reading and writing to existing tables. I don't think there is a way to set this property when creating new tables.

👍 Plus one for this issue.

If this is still an issue, please reopen with specifics on how to reproduce the problem.

This is still an issue. Hive understands the skip.header.line.count property and skips the header while reading, but Presto returns the header record when querying the same table.

Example to reproduce the error:

Step 1: Create a CSV file with 2 columns, including a header record and a few data records.

Step 2: Create a table in Hive with the following property: TBLPROPERTIES("skip.header.line.count"="1");

Step 3: Query (select *) the table in Hive --> Does not show the header. As an added test, using the header value in a where clause produces no rows.

Step 4: Query (select *) the table in Presto --> Includes the header. As an added test, using the header value in a where clause returns the header record.

Facing the same issue. Running on AWS EMR with:
Hive = Hive 2.1.0
Presto = Presto 0.157.1

Has this issue been solved? I am encountering it as well.

We are experiencing the same issue. @dain, are @rupesh1183's specifics enough to reproduce the problem on your end?

Facing the same issue. Any progress since Oct 22, 2014?

Is there any other way around this?

Still an issue.

same issue here

yeah, same issue here...

This also affects the behaviour when skip.footer.line.count is used.

All, a related update: AWS has fixed this issue in Athena, AWS's product based on Presto.

@rupeshmalladi any plans for AWS Athena to contribute this back?

@findepi Not that I am aware of!

Fixed with #10323

Presto is still ignoring skip.header.line.count on the latest cluster deployment from AWS (5.13): Presto 0.194 with Hadoop 2.8.3 HDFS and Hive 2.3.2.

The "closing fix #10323" doesn't apply to this ticket. The linked ticket was closed for a reason that had nothing to do with Presto.

@cvandeve the fix has been available since Presto 0.199 (https://prestodb.io/docs/current/release/release-0.199.html#hive-changes), while the latest release is 0.201.

Thanks @findepi
