
ADD permutation feature importance function, testing and documentation to permutation_feature_importance branch #33

Open · TortySivill wants to merge 5 commits into dev

Conversation

TortySivill commented



Description

  • Added Permutation Feature Importance (PFI) to existing fatf/transparency/models/feature_influence
  • Added Testing for Permutation Feature Importance to fatf/transparency/models/tests/test_feature_influence
  • Added documentation for Permutation Feature Importance to existing:
    • doc/api
    • doc/getting_started/structure
    • doc/tutorials/model-explainability
  • Added PFI example to examples/transparency/xmpl_transparency_pfi

Reason behind implementation

New PFI functionality as agreed with Kacper

Are there any other branches related to this work?

No

Example code

import numpy as np
import fatf.transparency.models.feature_influence as ftmfi
from sklearn.linear_model import LogisticRegression

# Toy dataset: the first feature perfectly separates the two classes.
X = np.asarray([[1, 9, 9], [1, 9, 9], [1, 9, 9],
                [0, 9, 9], [0, 9, 9], [0, 9, 9]])
y = np.asarray([1, 1, 1, 0, 0, 0])
clf = LogisticRegression().fit(X, y)

pfi_scores = ftmfi.permutation_feature_importance(X, clf, y, repeat_number=10)
print(np.mean(pfi_scores, axis=0))
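Assuming pfi_scores comes back with one row per repeat and one column per feature, the printed means are each feature's average importance over the 10 repeats; the first feature, which separates the classes perfectly, should dominate.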

Checklist

Please ensure you have done every task in this checklist.

  • [x] Created any additional unit tests required.
  • [x] All tests pass.
  • [x] Code style is consistent with the project.
  • [x] No additional Python packages except NumPy and SciPy are required.
  • [x] The code is documented.
  • [x] Appropriate API documentation, tutorials, how-to guides and user guide entries were created.

@So-Cool (Member) left a comment:

I've marked a few small (mostly syntactic) issues that we need to address before merging it.

doc/tutorials/model-explainability.rst (outdated, resolved):
This tutorial walked through using Individual Conditional Expectation and
Partial Dependence to explain influence of features on predictions of a model.
Permutation Feature Importance (PFI) tells us by how much does the model's
predictive error changes as we randomly permute each feature in the dataset.
@So-Cool (Member):
  • "changes" -> "change", I believe.
  • "each feature" -> "selected features" -- I think we should allow the user to specify whether PFI is computed for all of the features or just their selected subset. (I've left a separate comment in the source code file to request this feature.)

@perellonieto (reply):

I would suggest leaving feature selection as an added feature in a future release if it involves modifications in plenty of places, assuming that the most common scenario is to analyse all the features and sort them.
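As a side note, the "analyse all the features and sort them" scenario is a one-liner on top of the scores the example above already produces; a minimal sketch, assuming pfi_scores has one column per feature:

import numpy as np

# Rank features from most to least important by mean score over repeats.
mean_importance = np.mean(pfi_scores, axis=0)
ranking = np.argsort(mean_importance)[::-1]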

doc/tutorials/model-explainability.rst (3 further outdated threads, resolved)
fatf/transparency/models/feature_influence.py (4 outdated threads, resolved)
as_regressor: Optional[bool] = None
A boolean variable used to signify that the ``model``
is a regression model. Used to inform the ``scoring_metric``.
scoring_metric: Optional[str]=None
@So-Cool (Member):

Suggested change:
- scoring_metric: Optional[str]=None
+ scoring_metric : string, optional (default=None)

@perellonieto (reply):

I would suggest making the default value for scoring_metric explicit (instead of None), so that the documentation is clear about the default behaviour.
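To illustrate, the explicit-default version of the signature and docstring entry might read as follows; 'accuracy' is an assumption based on the classifier default described later in this review:

import numpy as np
from typing import Optional

def permutation_feature_importance(dataset: np.ndarray,
                                   model: object,
                                   target: np.ndarray,
                                   as_regressor: bool = False,
                                   scoring_metric: str = 'accuracy',
                                   repeat_number: Optional[int] = None):
    """
    Parameters
    ----------
    scoring_metric : string, optional (default='accuracy')
        Metric used to score the model; ``'accuracy'`` matches the
        documented behaviour for classifiers.
    """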

@perellonieto left a comment:

I have added some comments to the previous ones. Please let me know if there is anything unclear.

fatf/transparency/models/feature_influence.py (outdated, resolved):
Comment on lines 849 to 854
def permutation_feature_importance(dataset: np.ndarray,
                                   model: object,
                                   target: np.ndarray,
                                   as_regressor: Optional[bool] = None,
                                   scoring_metric: Optional[str] = None,
                                   repeat_number: Optional[int] = None):

@perellonieto:

Agree, I assume an array of integers indicating the feature columns.
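A minimal sketch of that idea, with a hypothetical features argument defaulting to all columns (the function name, features parameter and defaults are illustrative, not part of this PR):

import numpy as np
from typing import List, Optional

def _pfi_for_features(dataset: np.ndarray,
                      model: object,
                      target: np.ndarray,
                      features: Optional[List[int]] = None,
                      repeat_number: int = 10) -> np.ndarray:
    """Hypothetical sketch: PFI restricted to selected feature columns."""
    if features is None:  # default keeps the current all-features behaviour
        features = list(range(dataset.shape[1]))
    base_score = model.score(dataset, target)  # scikit-learn-style scorer
    scores = np.zeros((repeat_number, len(features)))
    for repeat in range(repeat_number):
        for i, column in enumerate(features):
            permuted = dataset.copy()  # do not mutate the caller's array
            permuted[:, column] = np.random.permutation(permuted[:, column])
            scores[repeat, i] = base_score - model.score(permuted, target)
    return scores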

                                   repeat_number: Optional[int] = None):
    '''
    Calculates the Permutation Feature Importance (PFI)
    of each feature in a dataset.

@perellonieto:

If we decide to add the list of features, this line needs to be changed from "of each feature in a dataset" to "of the selected features in a dataset".


Comment on lines 852 to 854
                                   as_regressor: Optional[bool] = None,
                                   scoring_metric: Optional[str] = None,
                                   repeat_number: Optional[int] = None):

@perellonieto:

About the mutable parameters: shouldn't it be the function that takes care of not modifying a mutable object, if mutation is not its intended behaviour?
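One way for the function to guarantee that, sketched as a hypothetical helper that shuffles a copy of a single column and leaves the caller's array untouched:

import numpy as np

def _permute_column(dataset: np.ndarray, column_index: int) -> np.ndarray:
    """Return a copy of ``dataset`` with one column shuffled."""
    permuted = dataset.copy()  # the caller's array is never modified
    permuted[:, column_index] = np.random.permutation(
        permuted[:, column_index])
    return permuted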

Permutation Feature Importance (PFI).
PFI works by
permuting the values of each feature and measuring
the change in prediction error compared to the original

@perellonieto:

If we decide to change "prediction error" to "predictive performance metric" this needs to be changed here accordingly.

if SKLEARN_MISSING is True:
    if as_regressor:
        predictions = model.predict(dataset)  # type: ignore
        score = -np.max(np.abs(target - predictions))

@perellonieto:

It is not clear to me if "max" should be replaced by "mean". Maybe you have a better understanding of why "max" is more appropriate here.
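For comparison, the mean-based variant would score the regressor by negative mean absolute error instead of the worst-case residual; a sketch using the same variables:

import numpy as np

def _regressor_score(target: np.ndarray, predictions: np.ndarray) -> float:
    # Negative mean absolute error: higher (closer to zero) is better,
    # matching the "higher is better" convention of accuracy.
    return -float(np.mean(np.abs(target - predictions)))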

for regressors. If the ``as_regressor`` parameter is unspecified,
the model will be treated as a classifier and ``accuracy``
will be used to generate scores.

@perellonieto:

The current method of deciding the scoring_metric sounds very convoluted. I wonder if we should simplify this in some way, e.g. expect a function that takes ground_truth and scores, or check whether the model has a scoring method.
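A sketch of that simplification, where scoring_metric is any callable taking ground truth and predictions, with the model's own scoring method as the fallback (all names hypothetical):

import numpy as np
from typing import Callable, Optional

MetricFn = Callable[[np.ndarray, np.ndarray], float]

def _score_model(model: object,
                 dataset: np.ndarray,
                 target: np.ndarray,
                 scoring_metric: Optional[MetricFn] = None) -> float:
    """Score with a user-supplied metric, else the model's own scorer."""
    if scoring_metric is not None:
        return scoring_metric(target, model.predict(dataset))
    return model.score(dataset, target)  # e.g. scikit-learn estimators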
