Hi all,
I have been doing a deep dive into some of my models to better understand feature relevance. My results from running feature explanatory analysis for binary classification are as follows:
2020-01-08 11:34:03.813 +00:00 [INF] BinaryFastTreeParameters
2020-01-08 11:34:03.815 +00:00 [INF] Bias: 0
2020-01-08 11:34:03.816 +00:00 [INF] Feature Weights:
2020-01-08 11:34:03.843 +00:00 [INF] Feature: CloseWeight: 0.1089412
2020-01-08 11:34:03.931 +00:00 [INF] Feature: OpenWeight: 0.3691619
2020-01-08 11:34:03.932 +00:00 [INF] Feature: HighWeight: 0.06676193
2020-01-08 11:34:03.933 +00:00 [INF] Feature: LowWeight: 0.1926264
2020-01-08 11:34:03.934 +00:00 [INF] Feature: STO_FastStochWeight: 0.19846
2020-01-08 11:34:03.938 +00:00 [INF] Feature: STO_StochKWeight: 0.5019926
2020-01-08 11:34:03.941 +00:00 [INF] Feature: STO_StochDWeight: 0.3781931
2020-01-08 11:34:03.942 +00:00 [INF] Feature: STOWeight: 0
2020-01-08 11:34:03.943 +00:00 [INF] Feature: CCI_TypicalPriceAvgWeight: 0.131141
2020-01-08 11:34:03.944 +00:00 [INF] Feature: CCI_TypicalPriceMADWeight: 0.1299266
2020-01-08 11:34:03.946 +00:00 [INF] Feature: CCIWeight: 1
2020-01-08 11:34:03.947 +00:00 [INF] Feature: RSIDownWeight: 0.4761779
2020-01-08 11:34:03.948 +00:00 [INF] Feature: RSIUpWeight: 0.1249975
2020-01-08 11:34:03.951 +00:00 [INF] Feature: RSIWeight: 0.2877662
2020-01-08 11:34:03.952 +00:00 [INF] Feature: MOMWeight: 0.1822069
2020-01-08 11:34:03.953 +00:00 [INF] Feature: ADX_PositiveDirectionalIndexWeight: 0.2435836
2020-01-08 11:34:03.954 +00:00 [INF] Feature: ADX_NegativeDirectionalIndexWeight: 0.4263106
2020-01-08 11:34:03.955 +00:00 [INF] Feature: ADXWeight: 0.1899773
2020-01-08 11:34:03.956 +00:00 [INF] Feature: CMOWeight: 0.2601428
But for PFI I have the following:
2020-01-08 11:34:09.369 +00:00 [INF] Calculating Binary Classification Feature PFI
2020-01-08 11:34:09.371 +00:00 [INF] Feature PFI for learner:BinaryFastTree
2020-01-08 11:34:09.383 +00:00 [INF] Close| 0.000000
2020-01-08 11:34:09.384 +00:00 [INF] Open| 0.000000
2020-01-08 11:34:09.385 +00:00 [INF] High| 0.000000
2020-01-08 11:34:09.386 +00:00 [INF] Low| 0.000000
2020-01-08 11:34:09.391 +00:00 [INF] STO_FastStoch| 0.000000
2020-01-08 11:34:09.400 +00:00 [INF] STO_StochK| 0.000000
2020-01-08 11:34:09.401 +00:00 [INF] STO_StochD| 0.000000
2020-01-08 11:34:09.402 +00:00 [INF] STO| 0.000000
2020-01-08 11:34:09.404 +00:00 [INF] CCI_TypicalPriceAvg| 0.000000
2020-01-08 11:34:09.406 +00:00 [INF] CCI_TypicalPriceMAD| 0.000113
2020-01-08 11:34:09.408 +00:00 [INF] CCI| 0.000000
2020-01-08 11:34:09.414 +00:00 [INF] RSIDown| 0.000221
2020-01-08 11:34:09.416 +00:00 [INF] RSIUp| 0.000000
2020-01-08 11:34:09.431 +00:00 [INF] RSI| 0.000000
2020-01-08 11:34:09.443 +00:00 [INF] MOM| -0.003003
2020-01-08 11:34:09.457 +00:00 [INF] ADX_PositiveDirectionalIndex| 0.000000
2020-01-08 11:34:09.467 +00:00 [INF] ADX_NegativeDirectionalIndex| 0.000000
2020-01-08 11:34:09.470 +00:00 [INF] ADX| 0.000000
2020-01-08 11:34:09.479 +00:00 [INF] CMO| 0.000000
My question is essentially: what, if anything, should I read into the zero values for PFI? The evaluation score is also worth noting:
2020-01-08 11:34:17.135 +00:00 [INF] Score: -4.640871
2020-01-08 11:34:17.138 +00:00 [INF] Probability: 0.1351293
I would appreciate any thoughts you may have on using such information to improve the model's quality.
Thank you
Fig
Can you please share the code you used to print those values? I'd like to check a couple of things.
Pleasure and thank you for your help!
The logging functions:
private void LogModelWeights(LinearBinaryModelParameters subModel, string name)
{
    var weights = subModel.Weights.ToList();

    // Log the model parameters.
    Logger.Info(name + "Parameters");
    Logger.Info("Bias: " + subModel.Bias);
    Logger.Info("Feature Weights:");

    // 1. Feature weights
    for (int i = 0; i < features.Length; i++)
    {
        contributions[i].Weight = weights[i];
        // The contribution will be assigned by the prediction engine,
        // using CalculateFeatureContribution (below).
        contributions[i].Contribution = 0;
        Logger.Info("    Feature: " + contributions[i].Name + " Weight: " + contributions[i].Weight);
    }
}
private void LogPermutationMetics(IDataView transformedData,
    ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics)
{
    var allFeatureNames = GetColumnNamesUsedForPFI(transformedData);
    var mapFields = new List<string>();

    for (int i = 0; i < allFeatureNames.Count(); i++)
    {
        var slotField = new VBuffer<ReadOnlyMemory<char>>();
        if (transformedData.Schema[allFeatureNames[i]].HasSlotNames())
        {
            transformedData.Schema[allFeatureNames[i]].GetSlotNames(ref slotField);
            for (int j = 0; j < slotField.Length; j++)
            {
                mapFields.Add(allFeatureNames[i]);
            }
        }
        else
        {
            mapFields.Add(allFeatureNames[i]);
        }
    }

    // Now let's look at which features are most important to the model overall.
    // Get the feature indices sorted by their impact on AUC. The importance, i.e.
    // the absolute average decrease in AUC calculated by
    // PermutationFeatureImportance, can then be ordered from most important to
    // least important.
    var sortedIndices = permutationMetrics
        .Select((metrics, index) => new { index, metrics.AreaUnderRocCurve })
        .OrderByDescending(feature => Math.Abs(feature.AreaUnderRocCurve.Mean));

    Console.WriteLine("Feature indices sorted by their impact on AUC:");
    foreach (var feature in sortedIndices)
    {
        Console.WriteLine($"{mapFields[feature.index],-20}|\t{Math.Abs(feature.AreaUnderRocCurve.Mean):F6}");
    }

    Console.WriteLine("PFI AUC logged as the following:");
    // Combine metrics with feature names and format for display.
    for (int i = 0; i < permutationMetrics.Length; i++)
    {
        Logger.Info($"{importances[i].Name}|\t{permutationMetrics[i].AreaUnderRocCurve.Mean:F6}");
        importances[i].AUC = permutationMetrics[i].AreaUnderRocCurve.Mean;
    }
}
Hi @lefig - can you share the code that generates the objects passed to these logging functions?
LinearBinaryModelParameters subModel
IDataView transformedData
ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics
Please also share code for any data processing and model training.
PFI values of 0 for features mean that permuting those features' values did not change the AreaUnderRocCurve much. This is not the same as the weight learned by the model being 0: a feature can have a non-zero weight that is not statistically significant, and you can still end up with a PFI of 0 for it.
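To make the weight-vs-PFI distinction concrete, here is a toy sketch (not ML.NET, and all names and numbers are invented for illustration): a fixed linear "model" carries a clearly non-zero weight on a noise feature, yet permuting that feature barely moves the accuracy metric, so its PFI comes out near zero.

```csharp
using System;
using System.Linq;

class PfiToySketch
{
    // Accuracy of a fixed linear scorer: predict 1 when w0*x0 + w1*x1 > 0.
    static double Accuracy(double[] x0, double[] x1, int[] y, double w0, double w1)
    {
        int correct = 0;
        for (int i = 0; i < y.Length; i++)
            if (((w0 * x0[i] + w1 * x1[i] > 0) ? 1 : 0) == y[i]) correct++;
        return (double)correct / y.Length;
    }

    public static double ComputeToyPfi()
    {
        var rng = new Random(42);
        int n = 2000;
        var x0 = new double[n]; var x1 = new double[n]; var y = new int[n];
        for (int i = 0; i < n; i++)
        {
            x0[i] = rng.NextDouble() * 2 - 1; // informative feature
            x1[i] = rng.NextDouble() * 0.01;  // near-constant noise feature
            y[i] = x0[i] > 0 ? 1 : 0;         // label depends only on x0
        }

        // A "model" with a clearly non-zero weight (0.5) on the noise feature.
        double baseline = Accuracy(x0, x1, y, w0: 1.0, w1: 0.5);

        // PFI for x1: average metric drop over several permutations of x1.
        double drop = 0;
        int permutations = 10;
        for (int p = 0; p < permutations; p++)
        {
            var x1Perm = x1.OrderBy(_ => rng.Next()).ToArray();
            drop += baseline - Accuracy(x0, x1Perm, y, w0: 1.0, w1: 0.5);
        }
        return drop / permutations;
    }

    static void Main()
    {
        // Near zero, even though the weight on x1 is 0.5.
        Console.WriteLine($"PFI(x1) ~ {ComputeToyPfi():F6}");
    }
}
```

Averaging over several permutations is also why a very small permutationCount can make PFI estimates noisy or degenerate.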
Note that PFI value is just one indicator of feature importance, not a conclusive statement of feature importance. That said, so many features having PFI of 0 warrants some further investigation. Here are a few reasons I can think of that can possibly explain this.
1. The permutationCount used for calculating PFI is 1 (or a small number). Please double check that the value of this argument is something reasonable (try something like 10 or 30).
2. The change in AreaUnderRocCurve isn't very large when a feature is permuted. What is the actual AreaUnderRocCurve of this model evaluated on the training and test data? An AreaUnderRocCurve of ~0.5 or ~0.6 would indicate a particularly poor model, which you would expect to be about as poor when a feature is permuted, hence no change in AreaUnderRocCurve.
3. Are you calculating the ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics on a very small dataset? That could give rise to 0 change in AreaUnderRocCurve.

Hi @najeeb-kazmi
Thank you for your kind help. The code that generates the metrics is as follows (this is an example of one such learner that requires a calibrator).
private void CalculateGamCalibratedClassificationPermutationFeatureImportance(MLContext mlContext,
    IDataView transformedData, ITransformer trainedModel, string learner)
{
    // Extract the trainer (last transformer in the model).
    var singleTrainerModel = trainedModel as BinaryPredictionTransformer<
        CalibratedModelParametersBase<GamBinaryModelParameters, PlattCalibrator>>;

    // Calculate permutation feature importance.
    ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics =
        mlContext.BinaryClassification.PermutationFeatureImportance(
            predictionTransformer: singleTrainerModel,
            data: transformedData,
            labelColumnName: "Label",
            numberOfExamplesToUse: 100,
            permutationCount: 50);

    Logger.Info("Calculating Binary Classification Feature PFI");
    Logger.Info("Feature PFI for learner:" + learner);
    LogPermutationMetics(transformedData, permutationMetrics);
}
I tend to think (your point 2) that the model is poor and needs some features removed. Hence I was hoping to gain some insight into which features those are, so that I can proceed with changing the model.
Best wishes
Fig
@lefig
What is the AUC of this model?
Maybe using only 100 rows is the reason you are not seeing non-zero PFI. Try using the entire dataset.
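A sketch of what that could look like, reusing the mlContext, singleTrainerModel, and transformedData names from your snippet above (I'm assuming numberOfExamplesToUse can simply be omitted so that PFI runs over every row; please verify against your ML.NET version):

```csharp
// Sketch: same call as before, but without numberOfExamplesToUse (use all rows)
// and with a larger permutationCount to average out noise in the metric deltas.
ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics =
    mlContext.BinaryClassification.PermutationFeatureImportance(
        predictionTransformer: singleTrainerModel,
        data: transformedData,
        labelColumnName: "Label",
        permutationCount: 30);
```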
@lefig any update on this and the information I requested? Also, did any of my suggestions help in debugging this?
I'm curious to see why this is happening, as it is quite unusual. As I mentioned, it's not clear which model is giving you 0 PFI, the GAM or the linear one. It would be nice to see a reproducible example so I can debug this (a small snippet of the data and the actual code for training the model and calculating PFI).
Hi @najeeb-kazmi
I really appreciate your time and help with this. Please let me generate some further test data and I will get back to you.
@lefig if this is still an issue, please feel free to reopen.