The ability to interpret decisions made by machine learning algorithms helps ensure important criteria such as safety, fairness, unbiasedness, privacy, and reliability by allowing humans to confirm that the algorithms adhere to regulations and ethical guidelines.
Machine learning (ML) algorithms have proven to be powerful for learning from data and making decisions with high accuracy. In particular, they can outperform humans on many specific tasks and learn to generalise within the domain of the provided data sets. This has led to vast interest in deploying machine learning algorithms in diverse application domains, even in so-called "high-risk" applications where the decisions made by ML algorithms have considerable consequences for human lives and well-being. A few examples of such applications are diagnosing patients with a disease, operating semi-autonomous cars, deciding on criminal sentencing, and deciding on eligibility for a loan.
The design of an ML algorithm is typically validated against a criterion of accuracy, i.e., its expected ability to make the fewest mistakes. In high-risk applications, however, it is appropriate to design ML algorithms with additional optimisation criteria besides accuracy, such as safety, fairness, unbiasedness, privacy, and reliability. These criteria, unlike accuracy, are difficult to quantify and hence difficult to include in the optimisation of the algorithm. Consequently, Explainable AI has emerged as a set of techniques and approaches that help ensure the validity of these other criteria by providing reasoning and explanations for why certain predictions have been made by machine learning algorithms.
In general, three classes of approaches are used for explaining ML predictions. The first class relies on inherently interpretable models, such as sparse linear models or decision trees, which are easier for humans to comprehend. The second class, model-agnostic explainability, assumes that the ML model is given a priori and that it is either too complex or not permissible to access its internals. The third class consists of model-dependent explainability approaches, where model-specific information beyond the predicted output is used to build explanations rather than treating the classifier as a complete black box.
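As a minimal, illustrative sketch of the first class, the example below fits an L1-regularised logistic regression with scikit-learn (the bundled breast-cancer data set is used only as a placeholder) and reads the surviving non-zero coefficients as a compact, human-readable summary of which features the model relies on.

```python
# Sketch: interpretability via a sparse linear model (class 1 above).
# scikit-learn and its breast-cancer data set are assumptions made only for
# this example; any tabular data with named features would work the same way.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names

# The L1 penalty drives most coefficients to exactly zero, leaving a short
# list of features that the model actually uses.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X, y)

coefs = model.named_steps["logisticregression"].coef_.ravel()
used = [(n, c) for n, c in zip(names, coefs) if abs(c) > 1e-6]
for name, coef in sorted(used, key=lambda t: -abs(t[1])):
    print(f"{name:30s} {coef:+.3f}")
```

The sign and magnitude of each remaining coefficient can then be read directly as the model's reasoning, which is the essential appeal of this class of approaches.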
In addition, explainability may operate at a global scale, describing the reasoning of the whole ML model, or at a local scale, explaining the ML prediction for a specific data instance. For both global and local explainability, two popular concepts for providing explanations are counterfactual explanations and feature importance. A counterfactual explanation quantifies the changes to a data instance that are necessary for the ML algorithm to switch its decision to another, desirable outcome. In this way, it is possible to understand the factors that led to the initial decision; if the data attributes are actionable, counterfactual explanations tell us how to change them in order to obtain the desirable decision. Feature importance approaches determine how much each data attribute contributed to the ML prediction: a high feature importance value means that the feature contributes strongly to the classification, while a low value means that it has little effect.
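The sketch below illustrates both concepts on a black-box classifier, again using scikit-learn and its breast-cancer data set purely as placeholders. The permutation-based importance is a standard model-agnostic technique, while the single-feature counterfactual search is a deliberately simplified illustration rather than a production method.

```python
# Sketch: feature importance and a counterfactual explanation for a black-box
# classifier. scikit-learn and its breast-cancer data set are assumptions made
# only for this example; step sizes and iteration counts are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Feature importance (model-agnostic): permute one feature at a time and
# measure how much the test accuracy drops.
imp = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:30s} {imp.importances_mean[i]:.3f}")

# Counterfactual explanation (greedy, single-feature sketch): move the most
# important feature of one instance towards the mean of the opposite class
# until the prediction flips. A real counterfactual method would search over
# several features and enforce plausibility constraints.
x = X_test[0].copy()
original = clf.predict([x])[0]
feature = top[0]
target_mean = X_train[y_train != original][:, feature].mean()
step = 0.05 * (target_mean - x[feature])
for _ in range(200):
    x[feature] += step
    if clf.predict([x])[0] != original:
        print(f"Flipping class {original} requires changing "
              f"'{data.feature_names[feature]}' from {X_test[0, feature]:.2f} "
              f"to {x[feature]:.2f}")
        break
else:
    print("No single-feature counterfactual found for this instance.")
```

The printed importance ranking is a global explanation of the model, while the reported feature change is a local explanation for one instance, showing how the two concepts complement each other.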
Our experts at RISE have strong theoretical and practical experience in Explainable AI, spanning diverse machine learning tasks, models, and data types.