Understanding feature influence

Knowing which input variables (such as image regions or words in a text) influence a model's decisions helps users build trust. In some settings it is even necessary, for example in healthcare diagnostics: which patient measurement explains the AI model's output? For researchers and developers, understanding feature influence also helps in deciding which features to include in a model in the first place.

Example Methods

  • SHAP (SHapley Additive exPlanations): Assigns contribution scores to features based on cooperative game theory (Shapley values), providing both local and global explanations (a code sketch follows this list).
  • LIME (Local Interpretable Model-agnostic Explanations): Approximates complex model outputs locally with interpretable surrogate models (sketched in code below).
  • Saliency-map methods specific to image models:
    • Grad-CAM (Gradient-weighted Class Activation Mapping): Visualizes important image regions by using gradients flowing into the last convolutional layer to highlight areas relevant to the predicted class.
    • LRP (Layer-wise Relevance Propagation): Decomposes the prediction backward through the network layers, assigning relevance scores to input pixels.
    • Occlusion Sensitivity: Measures the impact on the prediction when parts of the image are occluded or masked, identifying important regions (a from-scratch sketch follows the list).
  • Note: Intrinsically interpretable models, such as linear models or decision trees, allow feature influence to be read directly from the fitted model itself, e.g., from coefficients or split rules; a short example follows the sketches below.
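
The following is a minimal sketch of computing SHAP values for a tree ensemble with the `shap` package. The regression dataset and random-forest model are placeholders chosen only for illustration, not part of any particular application.

```python
# Minimal SHAP sketch: attribute a tree model's predictions to its features.
# Assumes `shap` and `scikit-learn` are installed; dataset and model are
# illustrative placeholders.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# Local view: contribution of each feature to a single prediction.
print(dict(zip(X.columns, shap_values[0].round(2))))

# Global view: mean absolute SHAP value per feature across the sample.
shap.summary_plot(shap_values, X.iloc[:100], plot_type="bar")
```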
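
LIME explains one prediction by fitting a simple surrogate model around it. The sketch below uses the `lime` package's tabular explainer; the dataset, model, and parameter choices are again illustrative assumptions.

```python
# Minimal LIME sketch: explain one tabular prediction with a local surrogate.
# Assumes the `lime` and `scikit-learn` packages are installed; dataset,
# model, and parameters are illustrative placeholders.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X.values, feature_names=list(X.columns), mode="regression"
)

# Perturb the instance, fit a weighted linear surrogate, and report the
# features with the largest local influence.
explanation = explainer.explain_instance(
    X.iloc[0].values, model.predict, num_features=5
)
print(explanation.as_list())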
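
Of the image-specific methods, occlusion sensitivity is the easiest to implement from scratch. The sketch below assumes a PyTorch classifier that maps a (1, C, H, W) tensor to class logits; the function name, grey fill value, patch size, and stride are illustrative choices, not a library API.

```python
# From-scratch sketch of occlusion sensitivity for an image classifier.
# `model` and `image` are assumed inputs; all names here are illustrative.
import torch

def occlusion_sensitivity(model, image, target_class, patch=16, stride=8):
    """Slide a grey patch over the image and record the drop in the
    target-class score; large drops mark regions the model relies on."""
    model.eval()
    _, _, H, W = image.shape
    with torch.no_grad():
        baseline = model(image)[0, target_class].item()
    heatmap = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.clone()
            occluded[:, :, y:y + patch, x:x + patch] = 0.5  # grey patch
            with torch.no_grad():
                score = model(occluded)[0, target_class].item()
            heatmap[i, j] = baseline - score  # higher = more important
    return heatmap
```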
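
Finally, for intrinsically interpretable models the influence of each feature can be read directly from the fitted parameters. A minimal scikit-learn sketch, with standardized inputs so that coefficient magnitudes are comparable across features:

```python
# Minimal sketch: feature influence from the coefficients of a linear model.
# Dataset and regularization strength are illustrative placeholders.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)

# With standardized inputs, coefficient magnitude indicates feature influence.
coefs = model.named_steps["ridge"].coef_
for name, w in sorted(zip(X.columns, coefs), key=lambda t: -abs(t[1])):
    print(f"{name:>6}: {w:+.2f}")
```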

Further Reading