Catboost: Correct parsing of dataset lines with ""

Created on 30 Oct 2017 · 3 comments · Source: catboost/catboost

train_pool = Pool(TRAIN_FILE, column_description=CD_FILE, delimiter=',') throws the following error:

_catboost.CatboostError: catboost/libs/data/load_data.cpp:365: Factor world" in column 4 and row 1 is declared as numeric and cannot be parsed as float. Try correcting column description file.

Column Description file:
0 Target
1 Num
2 Categ

Sample Data:
0,1,"Madurai,India"
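To see why a line like this fails, it helps to compare a naive split on the delimiter with a quote-aware parser. The sketch below uses Python's standard csv module for the quote-aware side. The column_types dict and the first_unparsable_numeric helper are hypothetical illustrations, and the "missing columns are numeric" default reflects our reading of CatBoost's column-description behavior (columns not listed in the file are treated as Num). Under that assumption, the stray fourth field 'India"' ends up in an undeclared, numeric column and cannot be parsed as a float, which is the same kind of failure the error above reports.

```python
import csv
import io

# The sample row from the issue: the third field is quoted and contains a comma.
line = '0,1,"Madurai,India"'

# Naive splitting on the delimiter ignores the quotes.
naive_fields = line.split(',')
# -> ['0', '1', '"Madurai', 'India"']  (4 fields instead of 3)

# The csv module honors quoting, so the embedded comma stays inside the field.
csv_fields = next(csv.reader(io.StringIO(line), delimiter=','))
# -> ['0', '1', 'Madurai,India']

# Hypothetical mirror of the issue's cd file: column index -> declared type.
# Assumption: columns missing from this mapping are treated as numeric ('Num').
column_types = {0: 'Target', 1: 'Num', 2: 'Categ'}


def first_unparsable_numeric(fields, types):
    """Return (index, value) of the first field treated as numeric that
    cannot be parsed as a float, or None if every numeric field parses."""
    for index, value in enumerate(fields):
        if types.get(index, 'Num') == 'Num':
            try:
                float(value)
            except ValueError:
                return index, value
    return None


print(naive_fields)
print(csv_fields)
print(first_unparsable_numeric(naive_fields, column_types))  # (3, 'India"')
print(first_unparsable_numeric(csv_fields, column_types))    # None
```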

Labels: good first issue, help wanted


All 3 comments

We split each line on the ',' symbol without taking quotes into account, so "Madurai,India" is split into two fields as well. If you need quote-aware parsing, consider loading the data into a pandas DataFrame first, or use a different delimiter.
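One way to follow the "use a different delimiter" advice is to re-encode the file once with a quote-aware reader and write it out tab-separated. The sketch below does that with Python's standard csv module; csv_to_tsv is a hypothetical helper, and the second sample row is made-up data for illustration. It assumes no field contains a tab or newline. After writing the result to a file (TSV_FILE is a hypothetical path), it could be loaded with Pool(TSV_FILE, column_description=CD_FILE, delimiter='\t'). The pandas route mentioned above works similarly in spirit: pandas.read_csv already honors quotes, and the resulting DataFrame can be passed to Pool instead of a file path.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Re-encode quote-aware CSV text as tab-separated text.

    Assumption: no field contains a tab or a newline; such fields cannot be
    represented in plain TSV and raise ValueError here.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=',')
    out_lines = []
    for row in reader:
        for field in row:
            if '\t' in field or '\n' in field:
                raise ValueError(f'field {field!r} cannot be written as plain TSV')
        out_lines.append('\t'.join(row))
    # Terminate every line with '\n'; empty input yields an empty string.
    return ''.join(out_line + '\n' for out_line in out_lines)


# First row is from the issue; the second row is hypothetical.
sample = '0,1,"Madurai,India"\n1,2,"Paris,France"\n'
print(repr(csv_to_tsv(sample)))
# '0\t1\tMadurai,India\n1\t2\tParis,France\n'
```

Tab is a reasonable choice here because it rarely appears inside categorical values, but any character that never occurs in the data would do.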

Actually, that's a good idea for contributing to catboost!

This is now fixed in the code and will be available on PyPI in the next release.
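The fix itself lives in CatBoost's C++ loader and is not shown here. As a rough illustration only, the kind of quote-aware splitting such a fix needs can be sketched as a small state machine: a double quote toggles "inside quotes" mode, a delimiter outside quotes ends the field, and a doubled quote ("") inside a quoted field stands for one literal quote character. split_quoted is a hypothetical helper, and it is deliberately simplified (for example, a quote in the middle of an unquoted field simply opens a quoted section).

```python
def split_quoted(line, delimiter=','):
    """Split one line into fields, honoring double-quoted sections.

    Inside quotes, the delimiter is literal text and "" is an escaped quote.
    Raises ValueError if a quoted field is never closed.
    """
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # "" inside quotes -> one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                current.append(ch)  # delimiters are literal inside quotes
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch == delimiter:
            fields.append(''.join(current))  # end of the current field
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError('unterminated quoted field')
    fields.append(''.join(current))
    return fields


print(split_quoted('0,1,"Madurai,India"'))  # ['0', '1', 'Madurai,India']
print(split_quoted('a,"say ""hi""",b'))     # ['a', 'say "hi"', 'b']
```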
