Artificial Sweeteners or Organic Flavors? Inherent Interpretability vs. Post-Hoc Explainability
“Why would one want to have artificial sweeteners and flavors when you can have organic natural flavors?”
A puzzling comment I received recently on a post. I had to pause. Had I accidentally posted a food review?
It was from Agus Sudjianto, the person behind PiML and MoDeVa, libraries focusing on interpretable AI.
The post was actually a reading list on explainability for AI risk management (see https://www.linkedin.com/posts/garyang_aiexplainability-interpretableai-airiskmanagement-activity-7404493549909475328-Z7_A?utm_source=social_share_send&utm_medium=member_desktop_web&rcm=ACoAAAnEwqsBmNv-udZ8tKaEG_MQGlUiz7C_KAg).
Agus wasn’t talking about food, though. He was drawing a distinction between explanations that emerge from an inherently interpretable model and explanations we bolt on afterwards with post-hoc methods.
The Distinction
“Explainability” and “interpretability” are terms that are often used interchangeably.
But there is a practical distinction worth knowing: between (1) a model that is inherently interpretable and (2) a method applied post-hoc to explain a model’s outputs.
- Inherently interpretable models are like natural flavors because the explanation comes from the structure of the model itself. Examples include linear regressions and GAMs (generalized additive models). When an inherently interpretable model tells you how an input led to the output, that is not just an interpretation of the model; it is part of the model’s structure. No translation, no approximation.
- Post-hoc explainability methods are like artificial flavors because the explanation is an approximation. Examples include SHAP and LIME. When you apply a post-hoc method such as LIME to a model, you get an approximation of its behavior. The tricky part is that verifying such approximate explanations is hard, and trusting them at face value carries some pitfalls (see the sketch after this list).
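To make the contrast concrete, here is a minimal sketch in Python. The dataset and feature names are made up for illustration, not taken from any real model. The linear model’s coefficients are the explanation, read straight off the fitted structure; for the boosted model, SHAP reconstructs an approximate attribution after the fact.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor
import shap

# Illustrative data: 500 rows, 5 made-up features
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
feature_names = ["income", "credit_history", "age", "tenure", "debt_ratio"]

# "Natural flavor": the coefficients ARE the model; no extra step needed
linear = LinearRegression().fit(X, y)
for name, coef in zip(feature_names, linear.coef_):
    print(f"{name}: {coef:+.3f}")  # each coefficient is part of the model's own structure

# "Artificial flavor": a black-box model plus a post-hoc approximation
booster = XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)
explainer = shap.TreeExplainer(booster)       # explanation bolted on afterwards
shap_values = explainer.shap_values(X)        # per-row, per-feature attributions
mean_abs = np.abs(shap_values).mean(axis=0)   # a common global summary
for name, val in sorted(zip(feature_names, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: mean |SHAP| = {val:.3f}")
```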
To be clear: this is NOT a “post-hoc explainability is bad” argument. These methods have their place in helping us understand models, especially complex ones. But use them with an understanding of their weaknesses.
The Weaknesses
First, faithfulness.
Approximate post-hoc explanations might be plausible but wrong. Think about it this way: if you ask someone to explain why they made a decision after the fact, they’ll give you a coherent story. But is that story what actually drove the decision? Or is it a rationalization that sounds good?
Second, stability.
Because post-hoc explainability methods involve sampling, perturbation, or local approximation, slight changes to inputs can produce different explanations. Imagine two similar loan applicants who both get approved. Post-hoc explainability methods can show that income was important for one, and credit history for the other.
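Here is a minimal sketch of that instability, using a toy loan-approval classifier; the data, feature names, and model are all illustrative. Because LIME fits its local surrogate on randomly perturbed samples around the row, two runs with different seeds can rank features differently for the very same applicant.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Illustrative stand-in for a loan-approval model (feature names are made up)
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
feature_names = ["income", "credit_history", "age", "debt_ratio"]
model = RandomForestClassifier(random_state=0).fit(X, y)
applicant = X[0]  # one applicant

def top_features(seed):
    # A different seed means different perturbed samples, and possibly a different story.
    explainer = LimeTabularExplainer(
        X, feature_names=feature_names, mode="classification", random_state=seed
    )
    exp = explainer.explain_instance(applicant, model.predict_proba, num_features=3)
    return exp.as_list()  # [(feature condition, weight), ...]

print(top_features(seed=0))
print(top_features(seed=1))  # often a different ranking for the very same applicant
```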
Third, confirmation bias.
Given the two issues above, the precise-looking explanations that come from post-hoc explainability methods can create a trap: we see a plausible explanation and assume it’s correct.
A Comparison with a Use Case
As I was thinking about this, I realized I had already run exactly this experiment. Without knowing it at the time, I had basically run a taste test between natural and artificial flavors, and shown how using both together can be quite useful.
The experiment is “On Predicting ESG Ratings Using Dynamic Company Networks” (Read it at https://dl.acm.org/doi/full/10.1145/3607874). It studies the effects of different types of financial and network information on predicting ESG (Environmental, Social, Governance) ratings.
- The “Natural” Approach: Fixed Effects Panel Models. Fixed effects panel models are inherently interpretable. The coefficients directly tell you the relationship, with statistical significance tests built in.
- The “Artificial” Approach: XGBoost + SHAP. XGBoost is powerful but opaque. I applied SHAP to understand which features mattered (a sketch of both setups follows this list).
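Here is a rough sketch of how the two setups might sit side by side. This is not the paper’s code or data; the panel structure, feature names, and libraries (linearmodels, xgboost, shap) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
from xgboost import XGBRegressor
import shap

# Illustrative panel: 100 firms x 8 years, made-up features
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product([range(100), range(2014, 2022)], names=["firm", "year"])
df = pd.DataFrame({
    "network_centrality": rng.normal(size=len(idx)),
    "log_assets": rng.normal(size=len(idx)),
    "leverage": rng.normal(size=len(idx)),
}, index=idx)
df["esg_rating"] = 0.5 * df["network_centrality"] + 0.2 * df["log_assets"] + rng.normal(size=len(idx))

# "Natural" approach: fixed effects panel model; coefficients and p-values come built in
features = ["network_centrality", "log_assets", "leverage"]
fe = PanelOLS(df["esg_rating"], df[features], entity_effects=True)
res = fe.fit(cov_type="clustered", cluster_entity=True)
print(res.params)
print(res.pvalues)

# "Artificial" approach: XGBoost + SHAP; importance is reconstructed after the fact
booster = XGBRegressor(n_estimators=300, max_depth=3).fit(df[features].values, df["esg_rating"].values)
shap_values = shap.TreeExplainer(booster).shap_values(df[features].values)
print(np.abs(shap_values).mean(axis=0))  # global SHAP importance per feature
```

The last line is the kind of cross-check I mean: if the SHAP ranking does not echo what the fixed effects coefficients say, that is a signal to dig deeper, not a verdict.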
So did they agree? They did. Both pointed to some common factors, such as director network centrality.
But here’s the point. I only trusted the SHAP result because the interpretable model confirmed it. If they had diverged, I would not have trusted it.
Are post-hoc methods useless then?
To reiterate, post-hoc methods aren’t useless. They’re valuable as tools for understanding and evaluating models. But treat them as signals to investigate further. Not as the truth.
Artificial flavors have their place. They’re convenient, they’re often good enough, and sometimes they’re all you have. But understand why they may be inadequate.
#AIExplainability #InterpretableAI #AIRiskManagement #XAI #ResponsibleAI