Method overview
Goals: Understand feature importance, especially for image classification tasks, and enable users to “see” what the neural network focused on while performing its task.
What it does: Grad-CAM generates a heatmap that highlights the regions of an input image that were most influential for the model’s prediction. The heatmap is created by looking at which parts of the image caused the most “activation” in the last convolutional layer before the model made its prediction, and it is then used as a visual overlay on the input image. This makes it easy to see which image regions the model relied on for its decision (a minimal overlay sketch follows this overview).
Limitations: Grad-CAM produces quite coarse localization maps.
XAI taxonomy: model-specific, post-hoc, local
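For illustration, here is a minimal sketch of the overlay step only, assuming the Grad-CAM heatmap has already been computed, upscaled to the input resolution, and normalized; `overlay_gradcam`, `image`, and `heatmap` are hypothetical names used for this sketch, not part of any specific library.

```python
# Sketch of the overlay step (not the Grad-CAM computation itself).
# Assumes `image` is an H x W x 3 array with values in [0, 1] and `heatmap`
# is an H x W Grad-CAM map already upscaled and normalized to [0, 1].
import matplotlib.pyplot as plt

def overlay_gradcam(image, heatmap, alpha=0.4):
    """Blend a semi-transparent Grad-CAM heatmap over the input image."""
    plt.imshow(image)
    plt.imshow(heatmap, cmap="jet", alpha=alpha)  # warm colors = more influential regions
    plt.axis("off")
    plt.show()
```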
Details and further intuition
Let’s start with some intuition: why does Grad-CAM use the last convolutional layer? At this point, features are already semantically aligned with classes, and it is the last point where spatial correspondence to the input persists. Earlier layers capture low-level features (edges, colors, textures), which are too primitive to be semantically meaningful on their own; a saliency map built from them would intuitively highlight “everything everywhere.” The last convolutional layer, in contrast, retains the spatial layout (albeit at lower resolution) and encodes high-level, semantically meaningful features (e.g., object parts). It can do so because its receptive field is large enough to “see” whole objects, not just edges.
Now, let’s examine some of the technical details. The last convolutional layer has multiple feature maps, so to obtain a single saliency map, these feature maps need to be combined in a meaningful way. Grad-CAM achieves this by assigning each feature map a weight that captures its relative importance for the class being investigated; indeed, each Grad-CAM map is specific to one class, which we denote as \(c\). To determine the weight for a feature map, Grad-CAM considers the gradient of the score for class \(c\) with respect to that map’s activations:
\[ \alpha_c^k = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}\]
where \(A_{ij}^k \) is the activation at spatial location \( (i,j) \) in feature map \(k\), \(y^c \) is the score (before softmax) for class \(c\), and \(Z\) is the number of spatial locations in the feature map. This corresponds to global average pooling of the gradients. Here, \( \frac{\partial y^c}{\partial A_{ij}^k}\) measures how sensitive the class score is to the activation at that location.
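As a concrete sketch, this weight computation maps onto a few lines of PyTorch. The names `score` and `activations` below are assumptions: they stand for the pre-softmax class score \(y^c\) and the captured feature maps of the chosen layer (e.g., saved via a forward hook) for a single image.

```python
# Sketch of the weight computation only, assuming `activations` is the captured
# feature-map tensor of the chosen layer (shape [K, H, W], still attached to the
# autograd graph) and `score` is the pre-softmax score y^c for class c.
import torch

# d y^c / d A^k_ij for every feature map k and spatial location (i, j): shape [K, H, W]
grads = torch.autograd.grad(score, activations, retain_graph=True)[0]

# global average pooling over the spatial dimensions -> one weight alpha_c^k per feature map
alpha = grads.mean(dim=(1, 2))  # shape [K]
```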
Next, the weighted combination of the feature maps is computed, and only the positive part is preserved (since we are only interested in the parts that positively influenced the prediction score for class \(c\)).
\[ L_{\text{Grad-CAM}}^c = \operatorname{ReLU}\Big(\sum_k \alpha_c^k A^k\Big). \]
Note that this has the same resolution as the chosen convolutional layer, which is usually much smaller than the input dimension. So, the heatmap, when upscaled to the input dimension, will be rather coarse.
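Putting the pieces together, a compact end-to-end sketch might look as follows. It uses forward/backward hooks on a torchvision ResNet-18 and requires a recent PyTorch/torchvision; the choice of `model.layer4[-1]` as target layer, the weight identifier `IMAGENET1K_V1`, and all variable names are illustrative assumptions, not a reference implementation.

```python
# Minimal Grad-CAM sketch for a single image, assuming a recent PyTorch/torchvision.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output            # feature maps A^k: [1, K, h, w]

def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0]      # d y^c / d A^k: [1, K, h, w]

# hook the last convolutional block (illustrative choice for ResNet-18)
target_layer = model.layer4[-1]
target_layer.register_forward_hook(save_activation)
target_layer.register_full_backward_hook(save_gradient)

def grad_cam(x, class_idx):
    """x: preprocessed input batch of shape [1, 3, H, W]; returns an [H, W] heatmap."""
    scores = model(x)                        # logits, i.e. pre-softmax scores y
    scores[0, class_idx].backward()          # populates the backward hook for class c

    A = activations["value"]                 # [1, K, h, w]
    dYdA = gradients["value"]                # [1, K, h, w]

    alpha = dYdA.mean(dim=(2, 3), keepdim=True)          # GAP over space: [1, K, 1, 1]
    cam = F.relu((alpha * A).sum(dim=1, keepdim=True))   # weighted sum + ReLU: [1, 1, h, w]

    # upscale the coarse map to the input resolution and normalize to [0, 1]
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]
```

The `F.interpolate` call is exactly where the coarseness mentioned above shows up: for a 224×224 input, ResNet-18’s last convolutional block outputs a 7×7 map that is stretched to the full input resolution.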
Intuition for how feature maps are combined. Each spatial location in a feature map has its own gradient, i.e., its own local sensitivity of the class score. Grad-CAM, however, needs a single scalar importance weight per feature map, so it takes the spatial average of the gradients, aggregating these per-location sensitivities into one global importance measure for that map (a toy numerical example follows the list below).
- If a map is generally useful for boosting class \(c\), its gradients will be large and positive across many spatial locations → high positive \(\alpha_c^k\).
- If the map tends to suppress class \(c\), its gradients will be largely negative → negative \(\alpha_c^k\).
- If the gradients are near zero, the map is not relevant for class \(c\) → \(\alpha_c^k \approx 0\).
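A toy numerical illustration of these three cases; the 2×2 gradient values are made up purely for illustration.

```python
# Made-up 2x2 gradient maps for three hypothetical feature maps of one image.
import numpy as np

grads_per_map = {
    "map boosting class c":    np.array([[0.8, 0.6], [0.7, 0.9]]),     # broadly positive
    "map suppressing class c": np.array([[-0.5, -0.4], [-0.6, -0.3]]), # broadly negative
    "irrelevant map":          np.array([[0.01, -0.02], [0.0, 0.01]]), # near zero
}

for name, g in grads_per_map.items():
    # spatial average of the gradients = Grad-CAM weight alpha_c^k for this map
    print(f"{name}: alpha = {g.mean():+.3f}")
# map boosting class c: alpha = +0.750
# map suppressing class c: alpha = -0.450
# irrelevant map: alpha = +0.000
```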
Further resources
- original paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Selvaraju et al., ICCV 2017)
- Code/Packages/Implementations: e.g., pytorch-grad-cam (PyTorch), Captum's LayerGradCam, tf-keras-vis (TensorFlow/Keras)
- from this project see also: LIME (Local Interpretable Model-Agnostic Explanations)