Can it only read a single text file on HDFS?
I suggest supporting Parquet files. We generally write files in Parquet format directly from Spark after feature engineering.
@janelu9 Sorry, it only supports text files for now.
@janelu9 @guolinke
You can run LightGBM on mmlspark, which can handle Parquet files once they are loaded into a Spark DataFrame.
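As a rough illustration of that suggestion, here is a minimal sketch of reading Parquet into a Spark DataFrame and fitting a LightGBM model through mmlspark. Several things are assumptions rather than anything stated in this thread: the helper names `feature_columns` and `train_from_parquet` are hypothetical, the label column is assumed to be called `label`, all other columns are assumed to be numeric features, and the import path `mmlspark.lightgbm` matches recent mmlspark releases (older ones exposed the class differently, so check your installed version).

```python
def feature_columns(columns, label_col):
    """Return every column name except the label, preserving order."""
    return [c for c in columns if c != label_col]


def train_from_parquet(parquet_path, label_col="label"):
    """Hypothetical helper: fit a LightGBM classifier on a Parquet dataset."""
    # Imports are local so feature_columns() works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    # Assumed import path; it may differ between mmlspark versions.
    from mmlspark.lightgbm import LightGBMClassifier

    spark = SparkSession.builder.appName("lightgbm-parquet").getOrCreate()

    # Spark reads Parquet natively, including from HDFS paths.
    df = spark.read.parquet(parquet_path)

    # Pack the (assumed numeric) feature columns into one vector column.
    assembler = VectorAssembler(
        inputCols=feature_columns(df.columns, label_col),
        outputCol="features",
    )
    train_df = assembler.transform(df)

    classifier = LightGBMClassifier(labelCol=label_col, featuresCol="features")
    return classifier.fit(train_df)
```

Calling `train_from_parquet("hdfs:///path/to/features.parquet")` (a placeholder path) would return a fitted model whose `transform` method scores another DataFrame. This needs a Spark session with the mmlspark package on its classpath.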
Closed in favor of being in #2302. We decided to keep all feature requests in one place.
Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.
@StrikerRUS Yep, I was just saying that the requested feature already exists in mmlspark, which seems to fit the user's scenario above (running LightGBM on a Parquet file from Spark). Since it already exists, it doesn't need to be in #2302 and can be closed as an existing feature rather than a not-yet-existing requested one.
@imatiach-msft Are you sure that this feature doesn't need to be implemented in pure LightGBM (like HDFS support), independently from mmlspark package?
@StrikerRUS It certainly could be. However, given the user's use case ("generally we write file as parquet format from spark"), running LightGBM in Spark seems to be the best solution. Maybe we can leave the feature open, but with low priority (if there is a way to assign priorities to tasks).
@imatiach-msft
> it seems that running lightgbm in spark is the best solution.
Agree with you! I think we can re-open it if a concrete request comes up in the future.