Algorithms, particularly of the machine learning (ML) variety, are becoming increasingly important to individuals’ lives in both the public and the private sector: they help choose what we see when we query the Google search engine or visit our social media feeds, and they help decide whether we are eligible for a mortgage or a loan, whether we will be hired or fired, whether someone will be granted parole, and whether we will be admitted to university. As “software is eating the world”[1], human beings are increasingly surrounded by technical systems which make decisions that “they do not understand and feel like they have no control over”[2].
This is nothing new; users have long been disturbed by the idea that machines might make decisions for them which they can neither understand nor countermand, a vision of out-of-control authority which derives from earlier notions of unfathomable bureaucracy found everywhere from Kafka to Terry Gilliam’s Brazil[3]. Turning to ML systems, this has led some to caution against the rise of a “black box society” and to demand increased transparency in algorithmic decision-making. Zarsky argues that the individual adversely affected by a predictive process has the right to “understand why”, and frames this in familiar terms of autonomy and respect as a human being[4].
Therefore, when in April 2016, for the first time in over two decades, the European Parliament adopted a comprehensive set of regulations for the collection, storage and use of personal information, the General Data Protection Regulation[5] (GDPR), the outlook for our transparency rights started to look brighter. The new regulation has been described as a “Copernican Revolution” in data protection law, “seeking to shift its focus away from paper-based, bureaucratic requirements and towards compliance in practice, harmonization of the law, and individual empowerment”[6]. Certain specific provisions garnered a lot of media attention because, at some meta level, they try to deal with the above-mentioned transparency concerns: the provisions on automated decision-making.
This flurry of interest has focused on a so-called ‘right to an explanation’ that has been claimed to have been introduced in the GDPR. The claim was fuelled in part by a short conference paper by Goodman and Flaxman presented at an ML conference in 2016. As you can probably guess, a right to an explanation would require that autonomous devices and programs tell consumers how the AI reached a decision. It would, allegedly, put consumers in a better position to evaluate and potentially correct such decisions. It would also give consumers the opportunity to see how their personal data are used to generate results. Sounds great, right? The nature of this requirement in the GDPR, however, is not completely clear: the regulation does not provide any specific format or content requirements, leaving experts in the field to make educated guesses and recommendations.
In all of the GDPR, a right to explanation is explicitly mentioned only in Recital 71, which states that “a person who has been subject to automated decision-making should be subject to suitable safeguards, which should include specific information to the data subject and the right to obtain human intervention, to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision”. If recitals were legally binding, this provision would require an ex post explanation of specific, individual decisions, as Recital 71 addresses safeguards to be in place once a decision has been reached. The aforementioned paper by Goodman and Flaxman combines this non-binding Recital 71 with the binding provisions of Articles 13-14 (the notification duties of the data controller when processing of data will take place) and Article 22 (dealing specifically with the data subject’s right to object to and to opt out of automated decision-making) to argue that “the law will […] effectively create a ‘right to explanation,’ whereby a user can ask for an explanation of an algorithmic decision that was made about them”.
Contrary to this opinion, Wachter, Mittelstadt and Floridi[7] argue that Articles 13 and 14 simply create ‘notification duties’ for data controllers (e.g. companies that hold personal data about us) at the time when data are collected. This means that the data controller needs to inform the data subject about “the existence of automated decision-making, including profiling […] and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject”. However, users can request that same information at any time, even after decisions have been made, using their “right of access” in Article 15. On this view, the phrasing of Articles 13-15 is future-oriented and appears to refer to the existence and planned scope of decision-making itself, rather than to the circumstances of a specific decision, as suggested in Recital 71. Hence data controllers only need to inform about system functionality (e.g. its decision tree or rules, or predictions about how input data will be processed), since no specific automated decision has yet been made that could be explained. Nevertheless, given the lack of an explicit deadline for invoking the right of access, one cannot be certain, on the basis of semantics alone, that the right of access is limited to explanations of system functionality. Edwards and Veale argue that, in contrast to Articles 13 and 14, Article 15 refers to a right of “access” to data held by a data controller, which seems to imply that data have been collected and processing has begun or taken place. Ex post tailored knowledge about a specific decision made in relation to a particular data subject could therefore be provided, i.e. “the logic or rationale, reasons, and individual circumstances of a specific automated decision”. The authors suggest this division seems moderately sensible and could promise a right to an explanation ex post.
However, even if, despite these textual quibbles, the GDPR were to introduce “a right to explanation”, the key issue that emerges is where to position the explanations that are conceived of legally, as “meaningful information about the logic of processing”, on the spectrum of ML “explanations” computer scientists have been developing. What are the answers we are looking for with these explanations? Can we even look for “whys” and “hows” for the purpose of the individual vindication of rights? In other words, what information is “meaningful”, and to whom?
In short, the information provided by a model-centered explanation (MCE) centers on and around the model itself and could include: setup information (the intentions behind the modelling process); the family of model (neural network, random forest, ensemble combination) and the parameters used to further specify it before training; training metadata, such as summary statistics and qualitative descriptions of the input data used to train the model, the provenance of such data, and the output data or classifications being predicted; and performance metrics, i.e. information on the model’s predictive skill on unseen data[8]. MCEs therefore provide one and the same set of information to everyone (system functionality information), but there are limits on how detailed and practical, and thus how “meaningful”, such an explanation can be for any particular individual.
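To make this concrete, here is a minimal sketch, in Python, of what such an MCE record might look like if a data controller assembled one. Every field name and value below is a hypothetical illustration, not something prescribed by the GDPR or by the literature; the point is simply that nothing in the record refers to any individual decision.

```python
# A minimal, illustrative sketch of a model-centric explanation (MCE) record.
# All fields and values are hypothetical examples.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCentricExplanation:
    # Setup information: the intentions behind the modelling process
    purpose: str
    # Family of model and the parameters fixed before training
    model_family: str
    hyperparameters: Dict[str, float]
    # Training metadata: summary statistics and provenance of the input data
    training_data_summary: str
    data_provenance: str
    output_classes: List[str]
    # Performance metrics: predictive skill on unseen data
    performance_metrics: Dict[str, float] = field(default_factory=dict)


# The same record would be handed to every data subject; it describes the
# system's functionality, not the circumstances of any specific decision.
example = ModelCentricExplanation(
    purpose="credit scoring for consumer loans",
    model_family="random forest",
    hyperparameters={"n_estimators": 500, "max_depth": 12},
    training_data_summary="120,000 anonymised loan applications, 2012-2016",
    data_provenance="internal records plus a licensed credit bureau feed",
    output_classes=["approve", "refer", "decline"],
    performance_metrics={"auc_on_held_out_data": 0.81},
)
print(example.model_family, example.performance_metrics)
```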
It seems that as A.I. and autonomous devices become more adept at interpreting our personal data, and our reliance on those devices increases, it becomes more important (more meaningful?) for every user to know what personal data are used and how they are being used. For data subjects, privacy concerns here carry an enormous weight of issues: how data about them are collected and bent into profiles, how they can control access to and processing of these data, and how they might control the dissemination and use of the derived profiles (ML, and big data analytics in general, are fundamentally based around the idea of repurposing data). As Hildebrandt pointed out, what we increasingly want is not a right not to be profiled, which would mean effectively secluding ourselves from society and its benefits, but a right to determine how we are profiled and on the basis of what data: a “right how to be read”[9]. So what happens in machine learning, and are the above-mentioned perspectives on what constitutes “meaningful” information realistic? This is obviously oversimplifying, but for the purpose of this article let’s say that transparency around system functionality and access to models gets you nowhere without the data. Furthermore, people mistakenly assume that personalization, for example, means that decisions are made based on their data alone. On the contrary, the whole point is to place your data in relation to others’. Take, for example, Facebook’s News Feed. Such systems are designed to adapt to any type of content and to evolve based on user feedback (e.g. clicks, likes and other metadata supplied unknowingly). When you hear that something is “personalized”, this means that the data you put into the system are compared to data others put into the system, such that the results you get are statistically relative to the results others get. Even if you required Facebook to turn over their News Feed algorithm, you’d know nothing without the data of the rest of your “algorithmic group”. This seems extremely difficult to organise in practice, and would probably also involve unwanted privacy disclosures. So it’s not that you can’t get some information about the model or some information about the data; rather, due to the complexity and multi-dimensionality involved in most of these systems, it is highly unlikely that your precise “why” or “how” explanation will be possible.
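The relational character of personalization can be shown with a toy sketch. The code below is not Facebook’s News Feed or any real system; it is a minimal, made-up collaborative-filtering example, assuming a small matrix of user-item interactions, meant only to show that a “personalized” score for one user is computed from other users’ data.

```python
# Toy sketch: a user's "personalized" scores depend on other users' behaviour,
# so they cannot be explained from that user's data alone. All data is made up.
import numpy as np

# Rows: users, columns: items; 1 = clicked/liked, 0 = ignored.
interactions = np.array([
    [1, 0, 1, 0],   # user 0
    [1, 1, 1, 0],   # user 1
    [0, 1, 0, 1],   # user 2
    [1, 0, 1, 1],   # user 3: the user asking "why was this ranked for me?"
], dtype=float)

def predict_scores(user_idx, matrix):
    """Score items for one user as a similarity-weighted vote of other users."""
    norms = np.linalg.norm(matrix, axis=1)
    sims = matrix @ matrix[user_idx] / (norms * norms[user_idx] + 1e-9)
    sims[user_idx] = 0.0                      # exclude the user themselves
    return sims @ matrix / (sims.sum() + 1e-9)

print(predict_scores(3, interactions))

# Changing *other* users' rows changes user 3's scores,
# even though user 3's own data is untouched.
other_world = interactions.copy()
other_world[1] = [0, 1, 0, 1]
print(predict_scores(3, other_world))
```

Even in this four-user toy, an explanation of user 3’s ranking would have to refer to the rest of the “algorithmic group”, which is exactly the disclosure problem described above.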
Here’s a possibly provocative statement: algorithmic transparency, as imagined around this so-called right to explanation, creates a false sense of individual agency, a transparency fallacy if you will. As Burrell pointed out, we are dealing with an opacity problem that stems from the mismatch between the mathematical optimization in high dimensionality characteristic of machine learning and the demands of human-scale reasoning and styles of semantic interpretation[10]. When we talk about algorithms we talk about them as if they are making decisions, when they’re not. What algorithms are doing is giving us conditional probabilities, as opposed to the more standard, human ways of analyzing things, which are based on causality. How sure are we, then, that individual explanations are actually an effective remedy, and if so, to achieve what? Isn’t it more practical to search for other ways in which the GDPR secures a better algorithmic society as a whole, rather than providing individual users with rights as tools which they may find impossible to wield to any great effect?
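The point about conditional probabilities can be illustrated with a minimal sketch, assuming a synthetic dataset and an off-the-shelf classifier; nothing here represents any real decision-making system.

```python
# Minimal sketch: what a trained classifier actually returns is a conditional
# probability P(outcome | features), not a causal account of the outcome.
# The data below is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # e.g. three numeric features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

applicant = np.array([[0.2, -1.0, 0.4]])
# Two columns: P(y=0 | x) and P(y=1 | x) for this input.
print(clf.predict_proba(applicant))

# The coefficients describe statistical association in the training data,
# not a causal "why": they do not tell us what would happen if the person
# actually changed one of these attributes.
print(clf.coef_)
```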
Individuals are mostly too time-poor, too resource-poor, and too lacking in the necessary expertise to make meaningful use of the above-mentioned ML information, but they still have an interest in understanding how decisions are made about them. On the other hand, data controllers have a legitimate interest in not disclosing their trade secrets, which could happen if they were required to give detailed explanations of their algorithmic decision-making processes and methods. A third party, such as a “neutral data arbiter”, could help find the middle ground. Different disclosure mechanisms, such as periodic reports by ombudspeople, may be more appropriate for factors like benchmarks, error analysis, or the methodology of data collection and processing, since these may not be of interest to, or even comprehensible for, many users, yet are demanded by those who value an expert account and explanation[11].
Within the GDPR in particular, the mandatory requirements for Privacy by Design (Art 25), Data Protection Impact Assessments (Art 35) and the appointment of Data Protection Officers (Art 37), together with the opportunities for certification schemes, might go beyond the individual to focus a priori on the creation of better algorithms, as well as on creative ways for individuals to be assured about algorithmic governance[12].
Still, all of these possible solutions bring with them a real danger of formalistic bureaucratic overkill, alongside a lack of substantive change, if we don’t know what values we’re aiming for. We think that if the process is transparent we can see how unfair decisions were made, but we don’t even know how to define terms like fairness. We can’t say when we care about which kind of explanation, when we care about which form of legitimacy, or what even constitutes a reasonable explanation that draws the line between the ethically desirable and the technically feasible. No one believes that computer scientists should be making the final decision about how to trade off different societal values. But they’re the ones who are programming those values into a system, and if they don’t have clear directions, they’re going to build something that affects people’s lives in unexpected ways. If we aren’t clear about what we want, accountability doesn’t stand a chance. How about, before we ask algorithms to do better, we challenge and scrutinize our own ability to build artificial worlds and live inside them?
[1] Andreessen, Marc (2011) ‘Why Software Is Eating The World’. Wall Street Journal, http://www.wsj.com/articles/SB10001424053111903480904576512250915629460 (accessed September 1st 2017).
[2] Article 29 Data Protection Working Party (2013) Opinion 03/2013 on Purpose Limitation, http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2013/wp203_en.pdf (accessed September 1st 2017).
[3] Edwards, Lilian and Veale, Michael, Slave to the Algorithm? Why a ‘Right to an Explanation’ Is Probably Not the Remedy You Are Looking For (May 23, 2017). Duke Law & Technology Review, pg 15.
[4] Ibid, pg 16.
[5] REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) 2016. http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679&from=EN (accessed September 3rd 2017).
[6] Bryce Goodman and Seth Flaxman, EU regulations on algorithmic decision making and “a right to an explanation”, 2016 ICML workshop on human interpretability in ML (2016).
[7] Sandra Wachter et al., ‘Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation’, International Data Privacy Law (forthcoming), doi:10.1093/idpl/ipx005.
[8] Edwards, Veale (n 3).
[9] Mireille Hildebrandt, Smart Technologies and the End(s) of Law: Novel Entanglements of Law and Technology (Edward Elgar, 2015).
[10] Jenna Burrell, ‘How the Machine “Thinks:” Understanding Opacity in Machine Learning Algorithms’ [2016] Big Data & Society.
[11] Nicholas Diakopoulos & Michael Koliska (2016): Algorithmic Transparency in the News Media, Digital Journalism, DOI: 10.1080/21670811.2016.1208053.
[12] Edwards, Veale (n 3).