Lightgbm: [hdfs] support parquet file

Created on 26 Mar 2018  ·  7 Comments  ·  Source: microsoft/LightGBM

Can it only read a single text file on HDFS?
Please consider supporting Parquet files. We generally write data in Parquet format directly from Spark after feature engineering.

feature request help wanted

All 7 comments

@janelu9 Sorry, it only supports text files for now.

@janelu9 @guolinke
You can run LightGBM on MMLSpark, which can handle Parquet files once they are loaded into a Spark DataFrame.
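For readers who want to see what that looks like, here is a minimal sketch, not a definitive implementation. It assumes a binary classification task, all-numeric feature columns, and a label column named `label`. The helper names `feature_columns` and `train_on_parquet` are made up for illustration, and the import path of `LightGBMClassifier` depends on which MMLSpark/SynapseML release is installed, so both are tried.

```python
from typing import List


def feature_columns(all_columns: List[str], label_col: str) -> List[str]:
    """Return every column except the label, preserving order."""
    return [c for c in all_columns if c != label_col]


def train_on_parquet(parquet_path: str, label_col: str = "label"):
    """Fit a LightGBM classifier on a Parquet dataset through Spark.

    Spark imports live inside the function so this module can still be
    imported on a machine where Spark is not installed.
    """
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler

    # Assumption: the module path differs between releases of the package.
    try:
        from synapse.ml.lightgbm import LightGBMClassifier
    except ImportError:
        from mmlspark.lightgbm import LightGBMClassifier

    spark = SparkSession.builder.appName("lightgbm-parquet").getOrCreate()

    # Spark reads Parquet natively, including from hdfs:// locations.
    df = spark.read.parquet(parquet_path)

    # The Spark estimator expects the features packed into one vector column.
    assembler = VectorAssembler(
        inputCols=feature_columns(df.columns, label_col),
        outputCol="features",
    )
    train_df = assembler.transform(df).select("features", label_col)

    classifier = LightGBMClassifier(
        objective="binary", featuresCol="features", labelCol=label_col
    )
    return classifier.fit(train_df)
```

It would be called along the lines of `model = train_on_parquet("hdfs:///path/to/features.parquet")`, where the path is a placeholder. Non-numeric columns would need to be encoded before the `VectorAssembler` step.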

Closed in favor of being in #2302. We decided to keep all feature requests in one place.

Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.

@StrikerRUS yep, I was just saying this requested feature already exists in mmlspark, which seems to fit the user's scenario above (running lightgbm on a parquet file from spark). Since it already exists, it doesn't even need to be in #2302 and can be closed: it is an existing feature, not a requested one that has yet to be implemented.

@imatiach-msft Are you sure that this feature doesn't need to be implemented in pure LightGBM (like HDFS support), independently from mmlspark package?

@StrikerRUS it certainly could be. However, given the user's use case ("generally we write file as parquet format from spark"), it seems that running lightgbm in spark is the best solution. Maybe we can leave the feature open, but with low priority (if there is a way to assign priorities to tasks).

@imatiach-msft

"it seems that running lightgbm in spark is the best solution."

Agree with you! I think we can re-open it if a concrete request comes up in the future.
