
Intrinsically interpretable AI or machine learning models hold great value because their decision-making processes are inherently transparent and understandable by humans.
Unlike post-hoc explanations, which attempt to explain black-box models after training, intrinsically interpretable models reveal their logic by design: users can trace how each input feature contributes to the output. This builds trust in the model, especially in sensitive business contexts or application domains. Their transparent structure also makes them easier to debug. They may trade off some predictive accuracy compared to complex black-box models, but in scenarios with little data they are often more robust, thanks to their (usually) much smaller number of parameters.
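To make this concrete, here is a minimal sketch (using scikit-learn and synthetic data, both chosen here purely for illustration) of how an intrinsically interpretable model lets you trace a prediction: for a linear model, each feature's contribution is simply its learned weight times its value.
```python
# Minimal sketch: decomposing a linear model's prediction into
# per-feature contributions (synthetic data, for illustration only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # three synthetic features
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

x_new = X[0]
contributions = model.coef_ * x_new           # contribution of each feature
prediction = model.intercept_ + contributions.sum()

for i, c in enumerate(contributions):
    print(f"feature {i}: weight {model.coef_[i]:+.2f}, contribution {c:+.2f}")
print(f"prediction: {prediction:+.2f}")
```
For a black-box model, such a per-feature attribution can only be approximated by post-hoc methods; here it follows directly from the model's structure.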
Examples of intrinsically interpretable models
- Decision trees: Model decisions as a sequence of simple, binary splits on feature values, which can be read as if-then paths (see the sketch after this list). Note that random forests (ensembles of many decision trees) largely lose this transparency.
- Linear regression: Predicts outcomes as weighted sums of input features.
- Logistic regression: Extends linear regression to classification by passing the weighted sum through a nonlinear sigmoid transformation. Generalized Additive Models (GAMs) generalize this idea further by replacing the linear terms with a sum of per-feature shape functions.
- Rule-based models: Use human-readable if-then rules for prediction.
- Naive Bayes classifier: Classifies by combining per-feature likelihoods under a conditional-independence assumption; the learned per-feature probabilities are easy to inspect.
- Prototype-based models: Make predictions by similarity to representative cases or prototypes (K-Nearest Neighbors is a popular example).
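As a short illustration of the decision tree and logistic regression entries, the sketch below (using scikit-learn and its built-in Iris dataset, chosen only for concreteness) fits both models and prints the structures that make them interpretable: the tree's if-then splits and the regression's per-class feature weights.
```python
# Minimal sketch: inspecting two intrinsically interpretable classifiers
# on scikit-learn's Iris dataset (dataset choice is illustrative).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Decision tree: the learned binary splits are the explanation.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))

# Logistic regression: per-class weights show how each feature shifts
# the class score before the sigmoid/softmax transformation.
clf = LogisticRegression(max_iter=1000).fit(X, y)
for cls, weights in zip(data.target_names, clf.coef_):
    print(cls, dict(zip(data.feature_names, weights.round(2))))
```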
Further Reading
- Interpretable Machine Learning: a book by Christoph Molnar, a comprehensive source for XAI methods, including intrinsically interpretable models
- scikit-learn: machine learning in Python: a toolkit that implements the major classic machine learning models, including the intrinsically interpretable ones listed above