More and more companies, organizations and even government agencies use artificial intelligence to make predictions and decisions. Accurate predictions alone are often not enough: fairness, accountability and transparency must also be maintained. The model must therefore be interpretable and explainable, i.e. able to explain its reasoning in terms of the components of the problem domain and to offer a justification for its predictions. This post briefly explains some of the methods that provide explanations for the outcomes or behaviour of an artificial intelligence agent.
Post-hoc explanation methods
Some machine learning models are intrinsically interpretable. Decision trees can be read as a set of rules, linear/logistic regression can be interpreted through the (log-)linear relationship between features and outcome, and k-Nearest Neighbours can be explained by pointing to similar examples in the dataset. More complex machine learning models, however, do not offer such built-in explanations. For these models you can use post-hoc explanation methods (which can also be applied to intrinsically interpretable models); they are applied after the model has already been trained. Post-hoc methods can give explanations independently of the underlying machine learning model (model-agnostic) or be model-specific, and they can explain the model as a whole (global) or a single prediction (local). The sketch below illustrates the contrast between an intrinsically interpretable model and a black box.
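As a minimal illustration (assuming scikit-learn and its built-in diabetes dataset; this example is not taken from the cited sources), the sketch below trains a shallow decision tree, whose learned rules can be printed directly, next to a gradient boosting model whose reasoning is not visible and therefore calls for post-hoc explanations.

```python
# Minimal sketch (assumes scikit-learn is installed): an intrinsically
# interpretable decision tree versus a black-box gradient boosting model.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Intrinsically interpretable: the fitted tree is a readable set of rules.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))

# Black box: often more accurate, but its reasoning is not directly visible,
# so post-hoc explanation methods are needed (see below).
black_box = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("tree R^2:", tree.score(X_test, y_test))
print("black-box R^2:", black_box.score(X_test, y_test))
```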
Model-agnostic post-hoc methods
- Partial Dependence Plot: This method fixes one independent variable at a specific value for all instances, averages the resulting predictions, and repeats this over a range of values. It shows the marginal effect of the independent variable on the predicted outcome and can be visualized as a curve (see the sketch after this list).
- Individual Conditional Expectation: This method differs from the Partial Dependence Plot in that the effect of the independent variable is computed for each instance separately rather than averaged. This can uncover heterogeneous effects.
- Local surrogates: This method explains an individual prediction by fitting a simple, interpretable surrogate model (often a weighted linear model, as in LIME) to the black-box predictions for perturbed samples around that instance.
- Global surrogates: This method approximates the predictions of the complex model over the whole dataset with a simpler, interpretable surrogate model, such as a decision tree.
- Shapley values: This computationally expensive method, rooted in cooperative game theory, attributes the prediction for a given instance to the individual independent variables by averaging their contributions over possible feature combinations.
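The sketch below illustrates three of these methods on the same black-box model: a partial dependence/ICE computation, a global surrogate tree with a fidelity score, and a simplified LIME-style local surrogate. It is a minimal illustration assuming scikit-learn, numpy and pandas; the perturbation scale, kernel width and dataset are arbitrary choices, and Shapley values are omitted because they are usually computed with a dedicated library such as the shap package.

```python
# Minimal sketch (assumes scikit-learn, numpy and pandas) of three
# model-agnostic post-hoc explanations for a black-box model.
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

# 1. Partial Dependence / ICE for the "bmi" feature: kind="both" returns the
#    averaged curve (PDP) and one curve per instance (ICE).
pdp = partial_dependence(black_box, X, features=[X.columns.get_loc("bmi")], kind="both")
print("PDP curve (first 5 grid points):", np.round(pdp["average"][0][:5], 1))
print("number of ICE curves:", pdp["individual"][0].shape[0])

# 2. Global surrogate: a shallow decision tree fitted to the black-box
#    predictions; its R^2 against those predictions measures fidelity.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))
print("global surrogate fidelity R^2:", surrogate.score(X, black_box.predict(X)))

# 3. Local surrogate (simplified, LIME-style): perturb one instance, weight the
#    perturbed samples by proximity, and fit a weighted linear model whose
#    coefficients explain that single prediction.
rng = np.random.default_rng(0)
x0 = X.iloc[[0]].to_numpy()
noise = rng.normal(scale=X.std().to_numpy() * 0.3, size=(500, X.shape[1]))
perturbed = pd.DataFrame(x0 + noise, columns=X.columns)
dist2 = np.sum((perturbed.to_numpy() - x0) ** 2, axis=1)
weights = np.exp(-dist2 / dist2.mean())  # closer samples get larger weights
local = LinearRegression().fit(perturbed, black_box.predict(perturbed), sample_weight=weights)
top = sorted(zip(X.columns, local.coef_), key=lambda t: -abs(t[1]))[:3]
print("strongest local effects:", [(name, round(c, 1)) for name, c in top])
```

The fidelity R^2 of the global surrogate indicates how faithfully the simple tree mimics the black box; if it is low, the surrogate's explanation should not be trusted.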
Explainability and choosing intrinsically interpretable models
Usually there is a trade-off between the performance of machine learning models and their complexity/interpretability, and the right balance depends on the situation. If the stakes are high, it may be better to choose an intrinsically interpretable model. Following the principle of parsimony, among candidate models that fit the data and contain all the necessary information, the one with the simplest explanation should be preferred.
References:
Gevaert, C. M. (2022, August). Explainable AI for earth observation: A review including societal and regulatory perspectives. International Journal of Applied Earth Observation and Geoinformation, 112, 102869. https://doi.org/10.1016/j.jag.2022.102869
Molnar, C. (2020). Interpretable Machine Learning. Leanpub.
Ragini, R. (2021, December 10). Principle of Parsimony. Medium. https://medium.com/@ruhi3929/principle-of-parsimony-d510356ca06a