Method overview
Goals: understanding which input features drive an individual prediction.
What it does: LIME explains individual predictions of a black-box machine learning model by fitting a simple, interpretable surrogate model locally around the input. The surrogate model makes its prediction directly from parts of the input, thereby indicating which parts of the input contributed most to the original model’s prediction.
Limitations: LIME can be unstable or inconsistent, especially with high-dimensional or complex data. LIME also explains a prediction but gives no indication of whether that prediction is correct.
XAI taxonomy: model-agnostic, post-hoc, local (global possible when used for multiple instances)
Details and further intuition
To explain the prediction for a given image, LIME first segments the image into groups of similar neighboring pixels, for example pixels with similar colors. These groups are called superpixels and can be seen as candidate regions for explaining the prediction. LIME then aims to identify which superpixels are relevant for the prediction. Intuitively, it measures this importance by turning some superpixels “off” and observing how much the prediction changes.
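A minimal sketch of this segmentation step, assuming scikit-image’s SLIC algorithm as the superpixel method (LIME does not prescribe a particular segmentation algorithm; the example image and hyperparameters are illustrative only):

```python
import numpy as np
from skimage import data
from skimage.segmentation import slic

# Example image (any H x W x 3 RGB array would do)
image = data.astronaut()

# Segment the image into superpixels; n_segments and compactness are tunable hyperparameters
segments = slic(image, n_segments=50, compactness=10.0)

# segments is an H x W integer map: pixel (r, c) belongs to superpixel segments[r, c]
superpixel_labels = np.unique(segments)
print(f"{superpixel_labels.size} superpixels found")
```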
Specifically, LIME creates perturbed images by randomly turning some superpixels “off”, usually by replacing them with a fixed color. This results in a new image where some regions look unchanged, while others are visually blanked out.
LIME then passes each perturbed image through the black-box model, which produces a prediction for the perturbed image. By repeating this process many times, LIME builds a dataset of perturbed images along with their corresponding predictions.
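Continuing the sketch above, the perturbation and labeling steps could look as follows. The function `black_box_predict` and the variable `target_class` are assumed placeholders for the model under explanation and the class of interest; they are not part of any specific library:

```python
rng = np.random.default_rng(seed=0)
n_samples = 1000

# Each row of Z encodes which superpixels are kept ("on") in one perturbed image
Z = rng.integers(0, 2, size=(n_samples, superpixel_labels.size))

def make_perturbed_image(image, segments, z, fill_value=0):
    """Blank out the superpixels whose entry in z is 0 with a fixed color."""
    perturbed = image.copy()
    for j, label in enumerate(superpixel_labels):
        if z[j] == 0:
            perturbed[segments == label] = fill_value
    return perturbed

# black_box_predict (assumed) maps one image to class probabilities;
# target_class (assumed) is the class whose prediction we want to explain
y = np.array([
    black_box_predict(make_perturbed_image(image, segments, z))[target_class]
    for z in Z
])
```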
Based on this newly created data, LIME fits the interpretable surrogate model. Here, we use a linear model. LIME solves an optimization problem that consists of (1) a faithfulness term that measures how well the predictions of the surrogate linear model align with the original model, and (2) a regularization term that helps to keep the complexity of the surrogate model low. Mathematically, LIME solves
\[ \min_\beta \sum_{i=1}^n \pi_i \, (y_i - \beta^T z_i)^2 + \lambda \|\beta\|^2, \]
where
- \( n \) is the number of perturbed samples that have been created,
- \( y_i \) is the black-box prediction for the \( i \)-th perturbed sample (this is the target variable for the linear regression model),
- \( z_i \) is the binary vector encoding which superpixels are present/absent in the \(i \)-th perturbed sample (these are the features of the linear model),
- The regression coefficients \( \beta_j \) quantify the contribution of the \(j\)-th superpixel to the prediction: positive values indicate a superpixel increases the prediction for the class, whereas negative values indicate the opposite,
- \( \pi_i \geq0\) is a proximity weight for the \(i\)-th sample that measures how similar the perturbed sample is to the original image (more similar samples have higher weights),
- \( \lambda \) is a regularization parameter.
The explanation for the image is the set of superpixels with the largest positive or negative coefficients, typically visualized directly on the image.
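Building on the sketches above, the surrogate fit can be written as a weighted ridge regression. The choice of distance measure and kernel width for the proximity weights is a design choice here, not the only possibility:

```python
from sklearn.linear_model import Ridge

# Proximity weights: an exponential kernel on the fraction of superpixels turned off;
# the exact distance measure and kernel_width are illustrative choices
kernel_width = 0.25
distances = 1.0 - Z.mean(axis=1)
pi = np.exp(-(distances ** 2) / kernel_width ** 2)

# Weighted ridge regression: the weighted squared-error term is the faithfulness term,
# and alpha plays the role of the regularization parameter lambda
surrogate = Ridge(alpha=1.0)
surrogate.fit(Z, y, sample_weight=pi)

# The coefficients beta_j quantify each superpixel's contribution;
# the explanation highlights the superpixels with the largest |beta_j|
beta = surrogate.coef_
top_superpixels = superpixel_labels[np.argsort(np.abs(beta))[::-1][:5]]
```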
Further resources
- original paper: “Why Should I Trust You?”: Explaining the Predictions of Any Classifier
- LIME – Interpretable Machine Learning (Interpretable ML Book)
- Code/Packages/Implementations: the authors’ Python package `lime` (https://github.com/marcotcr/lime)
- from this project see also: Grad-CAM (Gradient-weighted Class Activation Mapping)