Hi, we have been trying out anomaly detection using Isolation Forest. The model flags anomalies in our data. We are providing about 25 features, and we want to understand each feature's contribution as a cause of an anomaly. For this we decided to use SHAP (Shapley) values to gauge feature importance, so we plotted the SHAP values at the instance of the anomaly, along with a SHAP summary plot, using `TreeExplainer`.
View of the dataset when the anomaly was observed:

![newplot](https://private-user-images.githubusercontent.com/48429290/257433722-98e0cad5-ec92-432c-8f7e-b2e6c80e93b2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkzOTE1NjAsIm5iZiI6MTcxOTM5MTI2MCwicGF0aCI6Ii80ODQyOTI5MC8yNTc0MzM3MjItOThlMGNhZDUtZWM5Mi00MzJjLThmN2UtYjJlNmM4MGU5M2IyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDA4NDEwMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWNjYjdkMTVmNDYwNTJhN2I5ZmExYzE2NTgzM2IxNzQzMzI4NTYzZTNmNWQ2NjVkMmI3NTVkMDk2ZGZiNTNkMmEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.NhU6OM5gSgOpRxnIa8CjcQySq8aBhqgRmJ_w9TG3KaQ)
The SHAP values were calculated using two methods:
1. By passing the Isolation Forest model directly to `TreeExplainer`.
2. By training an XGBoost regressor with the feature values as inputs and the anomaly scores as targets, and passing that regressor to `TreeExplainer` (following some examples online). A sketch of both approaches is shown below.
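For reference, here is a minimal sketch of the two approaches as I understand them, assuming scikit-learn's `IsolationForest` and the `shap` and `xgboost` packages; the random data, model settings, and variable names are placeholders rather than the exact setup from our pipeline:

```python
import numpy as np
import shap
import xgboost
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 25))              # stand-in for the real 25-feature dataset

# Method 1: explain the Isolation Forest itself.
iso = IsolationForest(random_state=0).fit(X)
explainer_iso = shap.TreeExplainer(iso)
shap_iso = explainer_iso.shap_values(X)      # one row of SHAP values per sample

# Method 2: fit a surrogate XGBoost regressor on the anomaly scores,
# then explain the surrogate instead of the Isolation Forest.
scores = iso.decision_function(X)            # lower = more anomalous
surrogate = xgboost.XGBRegressor(n_estimators=100).fit(X, scores)
explainer_xgb = shap.TreeExplainer(surrogate)
shap_xgb = explainer_xgb.shap_values(X)
```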
As shown in the diagram, the two plots show influences of opposite sign (flipped 180°) for the same feature. I understand these are two different models and the results would differ, but to what extent?
I also tried to take the sum of the SHAP values at the point of the anomaly, but the sum was nowhere close to the anomaly score.
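For context on that check: SHAP's local-accuracy property says the SHAP values sum to the explained model's *raw* output minus the base value, not necessarily to sklearn's anomaly score. My understanding (an assumption worth verifying against your `shap` version) is that for `IsolationForest`, `TreeExplainer` explains a quantity derived from the averaged tree path depths, while `score_samples` applies a further nonlinear transform on top, so the two land on different scales. A sketch of the check, reusing the objects from the previous snippet:

```python
i = 0                                        # index of the anomalous instance (placeholder)

# Local accuracy: base value + sum of SHAP values = the explained model output.
recon_iso = explainer_iso.expected_value + shap_iso[i].sum()
print("SHAP reconstruction (raw tree output):", recon_iso)
print("sklearn anomaly score:", iso.score_samples(X[i:i+1])[0])
# These are not expected to match: score_samples applies an extra
# nonlinear (2**(-depth / c(n))-style) transform on top of the raw output
# that TreeExplainer decomposes.

# For the surrogate, local accuracy holds against its own prediction.
recon_xgb = explainer_xgb.expected_value + shap_xgb[i].sum()
print("surrogate prediction:", surrogate.predict(X[i:i+1])[0])
print("SHAP reconstruction:", recon_xgb)
```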
I was unable to find a correlation. I tried to observe the change in the features by plotting the percentage change of each feature at the point of the anomaly, but it doesn't correlate with what we see in the SHAP values.

![newplot(2)](https://private-user-images.githubusercontent.com/48429290/257434239-70a79d56-b377-4182-8a05-7b9a2d1c189d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkzOTE1NjAsIm5iZiI6MTcxOTM5MTI2MCwicGF0aCI6Ii80ODQyOTI5MC8yNTc0MzQyMzktNzBhNzlkNTYtYjM3Ny00MTgyLThhMDUtN2I5YTJkMWMxODlkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDA4NDEwMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZmNGMwMjViNmQ2YzJiNzQzNTBjZTdhZDRiYWUxYzQ3NGYzNDc2YTNlYjJhZTc0ZTRjOTNjMDFiYjdjYjQyYmMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.mkP6P7ROFRB1wi5pGHnfU4fa6rpCwSb1N-wL9bHZSDU)
I need guidance on how to interpret the SHAP values and establish this correlation. Currently I assume that a large SHAP value (negative or positive) would be justified by a corresponding change in the value of that feature, i.e. if the SHAP value is positive and the change in the feature's value is positive, then that feature's contribution to the anomaly score should also be positive.
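As a rough way to test that assumption, one could compare the sign of each feature's SHAP value at the anomaly with the sign of its change from a baseline; the column median below is my own placeholder baseline, not something from our actual setup. Note that sign agreement is not guaranteed in general, since a tree model's output can move in either direction as a feature increases:

```python
baseline = np.median(X, axis=0)              # assumed baseline: column medians
delta = X[i] - baseline                      # change of each feature at the anomaly

for j in range(X.shape[1]):
    agree = np.sign(shap_iso[i, j]) == np.sign(delta[j])
    print(f"feature {j:2d}: shap={shap_iso[i, j]:+.4f}  "
          f"delta={delta[j]:+.4f}  signs agree={agree}")
```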