Machinelearningnotebooks: Vague on local explanation of classification

Created on 20 Aug 2019  ·  7 Comments  ·  Source: Azure/MachineLearningNotebooks

(or local_importance_rank[0][0] if classification) This assumes that the resulting values and names will be a nested list, one per target class, but it does not say which order the classes will be in. Since the ranked values re-sort the values, can I also assume that the classes are sorted? Do the classes retain the order in which they occur in the dataframe? Can I get the correct order by simply typing np.unique(df['target'])? The docs need to be clearer about how the classes are ordered in the output.


Inference product-question

All 7 comments

Hi Bill! In the classification situation, classes will always remain in their original order from the provided dataset; we will _not_ do any reordering. As far as getting the correct order, that will depend on the form your data takes. We'll work on clarifying this in our docs!

Thanks for the reply! I love your product. Here's the thing: one doesn't specify a target class order when sending dataframes to SHAP; you send the dataframe and the model. One could assume that the returned class order is either A) the order in which the classes occur in the dataframe (e.g. np.unique(df['target'])) or B) some other order based on the resulting SHAP values. It is reasonable to assume B because the local_explanation does sort the SHAP values in descending order regardless of the order in which they occur in the dataframe. My request is that you update the documentation to reflect the order of the output. Another solution would be to return a nested dictionary keyed by the actual class labels, instead of a list of lists, to remove the ambiguity.

Either way, you shouldn't just tell me (although I appreciate it); you should put that line in the docs so that people who read the docs know the answer. I was able to solve the problem by running the Iris dataset until I had a pretty good guess at what the output was doing.

@vmagelo

You should also consider that in many cases the user may not know (or care) what order the target values are in within the dataframe. The user would first have to use some kind of itertools call to work out the order in which the class labels occur in the dataframe, for the sole purpose of knowing what order the resulting shap_value tables are in. It could also be that they run a new dataframe (same columns, new data) and the resulting SHAP values shift order because the classes occur in a new order in the new dataset.
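To make the concern concrete, here is a minimal sketch using only NumPy. It says nothing about what the explainer itself does; it only shows that "order of first appearance" depends on the data, while the sorted order returned by np.unique does not. The helper name first_appearance_order and the sample labels are made up for illustration.

```python
import numpy as np

def first_appearance_order(labels):
    """Return the unique labels in the order they first occur."""
    labels = np.asarray(labels)
    # return_index gives the position of the first occurrence of each unique value
    _, first_idx = np.unique(labels, return_index=True)
    return labels[np.sort(first_idx)]

old_targets = ["dog", "cat", "bird", "cat"]
new_targets = ["bird", "dog", "cat", "dog"]  # same classes, new data

# Order of appearance differs between the two datasets
print(first_appearance_order(old_targets))
print(first_appearance_order(new_targets))

# np.unique sorts its result, so it is the same for both datasets
print(np.unique(old_targets), np.unique(new_targets))
```

Note that np.unique returns sorted values, not order of appearance, so the two readings of "original order" can disagree.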

Another thought: why the lists of lists? Every analyst using this tool is going to build a function that takes the list of values and the list of labels and rebuilds the dataframe the way they originally had it. A simple local_explanation.to_dataframe() would be awesome here. You probably produce those lists of lists because that's what the d3 viz library for the notebooks uses. Some data scientists may want the actual data, and they'll all have to coerce those lists back into a useful dataframe.
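A rough sketch of the kind of helper every analyst ends up writing, using pandas. It is not part of the library: ranked_lists_to_dataframe is a hypothetical function, and it assumes the names and values are nested as [class][row][rank], which is how the classification output is described in this thread. The feature names and numbers are made up.

```python
import pandas as pd

def ranked_lists_to_dataframe(names, values, class_labels):
    """Flatten [class][row][rank] nested lists into a long-format DataFrame."""
    records = []
    for label, class_names, class_values in zip(class_labels, names, values):
        for row, (row_names, row_values) in enumerate(zip(class_names, class_values)):
            for rank, (feature, importance) in enumerate(zip(row_names, row_values)):
                records.append({"class": label, "row": row, "rank": rank,
                                "feature": feature, "importance": importance})
    return pd.DataFrame.from_records(records)

# Made-up values: two classes, one row, two features each
names = [[["petal_len", "sepal_len"]], [["sepal_len", "petal_len"]]]
values = [[[0.4, 0.1]], [[0.3, -0.2]]]

# class_labels must be supplied in the same order as the outer list
df = ranked_lists_to_dataframe(names, values, class_labels=["setosa", "versicolor"])
print(df)
```

The caller still has to pass class_labels in the right order, which is exactly the ambiguity this issue is about.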

@BillmanH the order of the classes in SHAP's output is the order of the numeric indexes that the classifier outputs. Say you have an SVM classifier: evaluated on dataset A, it will output the label values [1, 0, 2, 1, 3, 0, 1, ...], and the predicted probabilities for each row, one column per index, will be [[0.2, 0.5, 0.1, 0.2], [0.7, 0.1, 0.1, 0.1], ...].
The order of the SHAP structure will correspond to the indexes.
The optional "classes" names that you pass to our library (only used for visualization) should be in the same order as the indexes, otherwise you will get the wrong output. If you have classes 1 = "bird", 0 = "cat", 2 = "dog", 3 = "fish", the classes should be passed in index order [0, 1, 2, 3], which corresponds to ["cat", "bird", "dog", "fish"].
This is nothing specific to our library, it's based on the way SHAP handles classes.
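The index-to-name mapping above can be sketched in a few lines of NumPy. This is only an illustration of the ordering rule, reusing the numbers from this comment; index_to_name, classes, and predicted are illustrative names, not library API.

```python
import numpy as np

# Mapping from the numeric index the classifier emits to a display name
index_to_name = {1: "bird", 0: "cat", 2: "dog", 3: "fish"}

# Class names ordered by index: position i holds the name for index i
classes = [index_to_name[i] for i in sorted(index_to_name)]

# Predicted probabilities: one row per sample, one column per class index
probabilities = np.array([[0.2, 0.5, 0.1, 0.2],
                          [0.7, 0.1, 0.1, 0.1]])

# The column with the highest probability is the predicted index
predicted = [classes[i] for i in probabilities.argmax(axis=1)]
print(classes)
print(predicted)
```

The same positional rule would apply to the outer list of a per-class explanation: entry i belongs to classes[i].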

Thanks @imatiach-msft! @BillmanH we'll work on getting this information into our documentation. To your point about the list of lists, a to_dataframe() method is a great suggestion, thanks for that feedback!

Thanks!
