
Model Interpretability Part 1: The Importance and Approaches

Source: eric susch

Amazingly, we can use Machine Learning to make wonderful predictions that greatly help the decision-making process. However, these models often lack interpretability: the ability for humans to easily understand how they work and how they arrive at particular outcomes.

ML model calculations can be so complex that some may find it near impossible to fully understand what an algorithm is doing. Artificial Intelligence algorithms have also been referred to as black boxes: models so complex that they are not interpretable by humans. They provide little understanding of their process, and their insights are difficult to explain to stakeholders.

Though some models are highly interpretable, such as logistic regression, once you start to complicate the model further, for example by using deep learning, interpretability gets harder. With the rise of deep learning, interpretability becomes an element we cannot ignore.

AI researchers are diving deep into the realms of machine learning and its interpretability, but interpretability differs from explainability.

Interpretability is defined as: “it is possible to find its meaning or possible to find a particular meaning in it.” And explainability means “capable of being understood.”

Interpretability goes that extra mile in discovering why, by revealing the causes and effects of changes within a model.

This three-part series will dive into the importance of model interpretability as this is an important element for both Data Scientists and stakeholders. Not only do we need to understand the importance of model interpretability, but we also need to understand the different types, and how we should approach interpreting our models using different methods and techniques.

This series will be split up into sections, with Part 1 consisting of:

  • What is model interpretability
  • Why is it important
  • Approaches to model interpretability

Let’s get started…

Why is Model Interpretability Important?

Human Curiosity

The human brain is naturally curious. It makes sense of the environment around it, and it learns and updates its understanding when something changes. That update only happens once the human fully understands and can explain the change. For example, a person may start to feel unwell after consuming cheese. They understand that the cheese is related to feeling unwell and investigate further to find the cause, for example an allergy or lactose intolerance. They have updated their knowledge and now know the repercussions of consuming cheese.

With machine learning models, however, understanding what the model has learned and how it has updated is far more difficult. If a model can only produce outputs with no explanation, its interpretability remains hidden.

As mentioned earlier, the outcomes of machine learning models are increasingly used to make high-stakes decisions. Therefore, stakeholders and other people within the decision-making process must understand the model’s process and all of its intricacies.

If a model is too difficult to interpret, enterprises may legally have to decline the use of its output in current and future insights. Various industries, such as banking and insurance, have to remain compliant, and therefore there must be a greater understanding of the overall modeling process from start to end.

If researchers and AI experts fail to understand how a model works, it creates a lack of trust and confidence in the revolution and adoption of AI. Understanding how an algorithm outputs its answers will become more critical by the day.

The simplest of neural networks with no interpretability

The most user-friendly and highly interpretable model out there is the linear model, where the predicted outcome Y depends on the features X through a simple weighted sum. However, not all models are that simple to understand, explain, and interpret.
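As an illustration, here is a minimal sketch of that idea, assuming scikit-learn and a small synthetic dataset (neither is specified in this article): the coefficients of a fitted linear model can be read directly as the effect of each feature on the prediction.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Hypothetical data: three features and a mostly linear target.
X, y = make_regression(n_samples=200, n_features=3, noise=0.1, random_state=0)
model = LinearRegression().fit(X, y)

# Each coefficient is directly readable: increasing feature i by one unit
# changes the predicted Y by roughly coef_[i], holding the others fixed.
for i, coef in enumerate(model.coef_):
    print(f"feature_{i}: {coef:.2f}")
print(f"intercept: {model.intercept_:.2f}")
```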

With larger datasets and the goal of achieving high accuracy, we require more complex models such as neural networks. Below is a simple fully connected neural network. However, what understanding do we have of what each neuron is doing, or of which input feature contributes to the model’s output? We don’t, which is why we call such models ‘black boxes.’

Source: v7labs

Approaches to Model Interpretability

Following the methodologies from Christoph Molnar’s book: Interpretable Machine Learning, there are four criteria to approach model interpretability:

  1. By Model
  2. By Method
  3. By Scope
  4. By Results

By Model

This criterion explores whether a model’s interpretability is achieved through its own structure and complexity (intrinsic) or through methods that analyze the model after training (post hoc).

Intrinsic Interpretability is when a machine learning model is regarded as interpretable due to its structure, for example, linear models or simple decision trees.

Simpler models make interpretability easier to achieve, and this is where intrinsically interpretable models are more than adequate. However, as the complexity of the model increases, it becomes harder for humans to comprehend.
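To make the intrinsic case concrete, here is a small sketch, assuming scikit-learn and the Iris dataset purely as stand-ins: a shallow decision tree can be printed as a handful of readable rules, while a deeper tree quickly loses that readability.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()

# A depth-2 tree: the entire decision logic fits in a few readable lines.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))
```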

Post hoc interpretability is when interpretation methods are applied after a model has been trained, for example, permutation feature importance.

Post hoc methods can also be applied to intrinsically interpretable models, such as computing permutation feature importance for a decision tree.
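Below is a minimal sketch of that idea, assuming scikit-learn’s permutation_importance and the breast cancer dataset purely for illustration: the post hoc method is applied to an already-trained decision tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An intrinsically interpretable model...
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# ...analyzed post hoc: shuffle each feature on held-out data and measure the
# drop in score; the larger the drop, the more the model relies on that feature.
result = permutation_importance(tree, X_test, y_test, n_repeats=10, random_state=0)
top = sorted(zip(X.columns, result.importances_mean), key=lambda p: p[1], reverse=True)[:5]
for name, score in top:
    print(f"{name}: {score:.3f}")
```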

By Method

Model-specific interpretability uses tools that are limited to specific model classes. For example, examining the regression weights of a linear model is a model-specific interpretation. Interpretability methods that output model internals are, by definition, model-specific. Tools built for a single class of model, such as methods designed specifically for neural networks, are also model-specific.

Model-agnostic interpretability uses tools that are applied after a model has been trained (post hoc). Agnostic methods do not have access to model internals such as weights or structure. An example is permutation feature importance, where the method only analyzes pairs of input features and outputs.

The biggest advantage of model-agnostic interpretability over model-specific methods is flexibility. It allows Data Scientists and Machine Learning Engineers to use any machine learning model they wish, because the interpretation method can be applied to all of them. This makes evaluating a task and comparing the interpretability of different models much simpler.
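As a small illustration of that flexibility (the models and data below are arbitrary choices, not from this article), the very same interpretation call can be run against entirely different model classes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

for model in (RandomForestClassifier(random_state=0),
              MLPClassifier(max_iter=2000, random_state=0)):
    model.fit(X, y)
    # Identical interpretation code, regardless of the model's internals.
    result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    print(type(model).__name__, result.importances_mean.round(3))
```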


By Scope

Scope is an important factor when defining the logic of interpretability. It asks whether the method explains the entire model, an individual prediction, or something in between the two.

Global Interpretability aims to capture the entire model. It focuses on the explanation and understanding of why the model makes particular decisions, based on the dependent and independent variables.

Global Interpretability centers around the understanding of how the model can achieve what it does. How does the model produce these predictions? How does the use of subsets of the data influence the model’s decision and prediction process?

Understanding feature interactions and their importance is a way to better understand global interpretation. For example, breaking down elements such as subsets of features that may directly or indirectly influence the model’s predictions. A fully granular understanding of the model’s entire structure, features, strengths, and weaknesses forms the solid foundation of global interpretability.
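One common global method is the partial dependence plot, which also appears later in this article. The sketch below is only an illustration, assuming scikit-learn’s PartialDependenceDisplay and the diabetes dataset: it averages the model’s predictions over the whole dataset while sweeping a single feature.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One curve per feature: how the average prediction changes across the
# whole dataset as the feature of interest is varied.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "bp"])
plt.show()
```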

Local Interpretability aims to capture individual predictions. It focuses on understanding a specific data point by exploring the feature space around it. This helps us understand why the model made the decision it did for that particular point, allowing for better interpretability.

Local Interpretability raises questions such as “Why did the model make this specific decision for an instance, or for a group of instances?” It cares little, or not at all, about the structure of the model, which is treated as a black box. Understanding the distribution of the data and its feature space at a local level can give us a more accurate explanation.

An example of a Local Interpretability method is the Local Interpretable Model-Agnostic Explanations (LIME) framework, which can be used for model-agnostic local interpretation. I will further explore this method in Part 3.
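As a small preview (Part 3 will go much deeper), here is a sketch of how LIME is typically called, assuming the `lime` package and an arbitrary classifier and dataset chosen only for illustration:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single prediction by fitting a simple, local surrogate model
# around this one data point.
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(explanation.as_list())
```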

Source: TDS

By Results

Interpretation methods can also be grouped by the type of result they produce and how that explanation is presented:

  1. Feature Summary Statistics: Interpretation methods provide summary statistics for each feature. Examples are feature importance or pairwise feature interaction strengths.
  2. Feature Summary Visualization: Statistics can be visualized to give a better understanding of their meaning, allowing for further interpretability. For example, partial dependence plots are curves that visualize a feature against the average predicted outcome.
  3. Model Internals: This relates to intrinsic interpretability, for example, the weights of a linear model or the splits learned by a decision tree. There is no solid line between model internals and feature summary statistics, as linear model weights belong to both. Interpretability methods that output model internals are, by definition, model-specific.
  4. Data Points: These methods return data points, existing or newly created, to make a model interpretable. One such method is counterfactual explanations, which answer the “what if”: if an input data point had value ‘a’ instead of ‘b’, would the model’s output change from ‘z’ to ‘y’? By changing some of the features, you can interpret whether and why the predicted outcome changes.
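To make the “what if” idea in point 4 concrete, here is a minimal sketch, assuming scikit-learn and the Iris dataset as placeholders: we perturb one feature of a single data point and check whether the predicted class changes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
model = LogisticRegression(max_iter=1000).fit(data.data, data.target)

original = data.data[0].copy()
counterfactual = original.copy()
counterfactual[2] += 3.0  # hypothetical change: increase petal length by 3 cm

# If the predicted class flips, the changed feature helps explain the decision.
print("original prediction:      ", data.target_names[model.predict([original])[0]])
print("counterfactual prediction:", data.target_names[model.predict([counterfactual])[0]])
```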

Conclusion:

In this part, we have covered what interpretability is and how it differs from explainability, as well as why model interpretability is important and the different approaches one can take to interpret a model better. In the next part, I will further explain Global Model-Agnostic Methods and then Local Model-Agnostic Methods.

Stay tuned!

Nisha Arya Ahmed
