Machinelearning: [AutoML] Column inferencing is limited to 10000 Columns

Created on 10 May 2019 · 7 Comments · Source: dotnet/machinelearning

It looks like when a dataset has 10,000 or more columns, the column inference API in AutoML fails, and hence the CLI also fails.

System information
OS version/distro: Any OS
ML.NET CLI

What did you do?
Gave the CLI a CSV file with more than 10,000 columns.
What happened?
The CLI fails to run and throws an exception while inferring columns.

What did you expect?
The CLI should infer the columns and produce a model and a console project.

AutoML.NET · P2 · command-line · enhancement

All 7 comments

In ColumnTypeInference.cs, the issue could be the following check:

private const int SmartColumnsLim = 10000;
// ...

if (args.ColumnCount >= SmartColumnsLim)
{
    // too many columns for automatic inference
    return InferenceResult.Fail();
}

cc: @justinormont @vinodshanbhag @daholste @Dmitry-A

Does this mean the limit is going to stay, or could we specify that a vector be created instead? Perhaps one would not need 10K individual properties inferred; I'd be happy with a Label and a vector.

I would expect two things to happen:
1) Remove the limit on the number of columns (AutoML API).
2) Add a column-count threshold above which a vector column is generated (command line).

@PeterPann23: I'm expecting we'll simply remove the limit. But some investigation is needed to see why it existed in the past.

We'll have to do some perf testing first to ensure it's not atrocious. If it's too slow, we'll file another issue to rework the column type/purpose detection. I believe the code should be O(N) in the number of columns.

The other route, as you mention, is to push directly into a vector column, though for this we first need to know the columns' types, as a vector must be of a single datatype (int/float/bool/etc.).
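To illustrate why the types must be known before packing, here is a simplified C# sketch. It is not AutoML's actual ColumnTypeInference logic: the enum `InferredKind`, the class `VectorPacking` and its methods are hypothetical, and the type set is reduced to bool/float/text. Each column is classified from sampled string values, so the work is one pass per column (linear in the column count for a fixed sample size), and the columns can go into a single vector only if they all agree on one kind.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical, simplified set of item kinds (real AutoML distinguishes more types).
public enum InferredKind { Bool, Float, Text }

public static class VectorPacking
{
    // Classify one column from its sampled string values:
    // Bool if every value parses as a boolean, else Float if every value
    // parses as a number, otherwise Text. An empty sample is vacuously Bool.
    public static InferredKind InferKind(IReadOnlyList<string> values)
    {
        if (values.All(v => bool.TryParse(v.Trim(), out _)))
            return InferredKind.Bool;
        if (values.All(v => float.TryParse(v.Trim(), NumberStyles.Float,
                                           CultureInfo.InvariantCulture, out _)))
            return InferredKind.Float;
        return InferredKind.Text;
    }

    // A vector column holds a single item type, so the columns can be packed
    // only if they all infer to the same kind; returns null otherwise.
    // One InferKind call per column: cost grows linearly with the column count.
    public static InferredKind? CommonKind(IReadOnlyList<IReadOnlyList<string>> columns)
    {
        var kinds = columns.Select(c => InferKind(c)).Distinct().ToList();
        return kinds.Count == 1 ? kinds[0] : (InferredKind?)null;
    }

    public static void Main()
    {
        var numeric = new List<IReadOnlyList<string>>
        {
            new[] { "0.5", "1.25", "3" },
            new[] { "2", "-7.5", "0" },
        };
        var mixed = new List<IReadOnlyList<string>>
        {
            new[] { "0.5", "1.25" },
            new[] { "true", "false" },
        };
        Console.WriteLine(CommonKind(numeric));                                // Float
        Console.WriteLine(CommonKind(mixed)?.ToString() ?? "no common kind"); // no common kind
    }
}
```

The strict "all columns agree" rule is a deliberate simplification; a real implementation might instead widen compatible kinds (for example bool to float) before building the vector.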

@PeterPann23: If you're looking for a solution for today, you can feed an IDataView with the vector column directly into the AutoML API.
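A minimal sketch of that workaround, assuming the ML.NET 1.x `Microsoft.ML` and `Microsoft.ML.AutoML` packages and a comma-separated file with a header row, a boolean label in column 0 and 10,000 numeric feature columns after it. The class name `WideRow` and that column layout are hypothetical, and the AutoML API was still changing in 2019, so member names may differ between package versions. Loading columns 1 through 10000 into one `[VectorType]` field gives AutoML a single feature vector instead of 10,000 separate columns to infer from the CSV.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical row layout: column 0 is a boolean label,
// columns 1..10000 (inclusive) are numeric features.
public class WideRow
{
    [LoadColumn(0)]
    public bool Label { get; set; }

    // Map the whole range of feature columns into one fixed-size vector.
    [LoadColumn(1, 10000)]
    [VectorType(10000)]
    public float[] Features { get; set; }
}

public static class Program
{
    // Usage: dotnet run -- path/to/wide.csv
    public static void Main(string[] args)
    {
        var mlContext = new MLContext(seed: 0);

        // Build the IDataView ourselves instead of asking AutoML to infer columns.
        IDataView data = mlContext.Data.LoadFromTextFile<WideRow>(
            args[0], separatorChar: ',', hasHeader: true);

        // Run a binary-classification experiment for up to 60 seconds.
        var experiment = mlContext.Auto().CreateBinaryClassificationExperiment(60);
        var result = experiment.Execute(data, labelColumnName: "Label");

        Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}");
        Console.WriteLine($"Accuracy: {result.BestRun.ValidationMetrics.Accuracy:F4}");
    }
}
```

If the label is not binary, the same pattern should apply with the multiclass or regression experiment factories instead.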

I love the product, I'm just testing it. Happy to wait


Related issues

darren-zdc · 3 Comments

ddobric · 4 Comments

pgovind · 3 Comments

sethreidnz · 3 Comments

neven10 · 3 Comments