Regulating AI

An EU proposal to keep your eyes on

I saw on Andrew Gelman & Co’s blog that the EU is considering a proposal to regulate the use of several data science techniques, among them Bayesian inference.

I gave the proposal a skim; here are some highlights:  

Some of the prohibited practices seem like they should be prohibited:

...subliminal techniques beyond a person’s consciousness in order to materially distort a person’s behaviour in a manner that causes or is likely to cause that person or another person physical or psychological harm

...exploits any of the vulnerabilities to cause that person or another person physical or psychological harm;

The proposal also seems to prevent governments from using any kind of individual risk-assessment algorithm unless it meets stringent fairness criteria. Given how thorny algorithmic fairness is even in theory, let alone in practice, I have to assume this will curtail the use of such algorithms. Which, I suppose, is the point. Here’s the language on what’s not allowed:

...use of AI systems by public authorities … for the evaluation or classification of the trustworthiness of natural persons … based on their social behaviour or known or predicted personal or personality characteristics … leading to either or both of the following:

(i)  detrimental … treatment of … persons or whole groups ... in social contexts which are unrelated to the contexts in which the data was originally generated or collected;
(ii) detrimental … treatment of … persons or whole groups ... that is unjustified or disproportionate to their social behaviour or its gravity;

Rounding out the prohibitions, the regulation would place hard limits on the use of facial recognition (and other biometric systems) by law enforcement. (Note that Amazon implemented a one-year moratorium on selling its facial recognition product to law enforcement last summer.) The prohibited practice reads:

(d)  the use of ‘real-time’ remote biometric identification systems in publicly accessible spaces for the purpose of law enforcement, unless and in as far as such use is strictly necessary for one of the following objectives:

Beyond prohibitions, the proposed regulation places restrictions on AI that it deems “high risk”. These are uses of AI that are allowed, but require safeguards. It’s hard to parse much of what they deem high risk, but the things they list in Annex III to the regulation are pretty clear -- things like: 

Management and operation of critical infrastructure:

(a) AI systems intended to be used as safety components in the management and operation of road traffic and the supply of water, gas, heating and electricity. 

Or, continuing the algorithmic fairness theme:

Education and vocational training:

(a) AI systems intended to be used for the purpose of determining access or assigning natural persons to educational and vocational training institutions

Beyond the lack of a clear definition of what constitutes high risk, my main concern with this regulation is what providers have to do to achieve compliance. It lists several requirements, but most (all?) are not well-defined. Take for example:

To address the opacity that may make certain AI systems incomprehensible to or too complex for natural persons, a certain degree of transparency should be required for high-risk AI systems. 

“A certain degree of transparency”. Perhaps that’s more clearly defined elsewhere; the proposal runs to several hundred pages and I did not read everything in detail. But given the number of paragraphs that were similarly opaque, my guess is that it is not.

Overall, this seems well-intentioned, and some of the prohibitions seem reasonable. But in the grey area, the details are vague. It might be worth taking a stab at the motivation behind this proposal, why the details are vague, and why even well-intentioned regulations such as this may backfire.


Much of the regulation seems aimed at establishing standards for algorithmic fairness, which this paper introduces thus:

The growing use of algorithms in social and economic life has raised a concern: that they may inadvertently discriminate against certain groups.

Examples abound. The paper above leads with the example of natural language processing algorithms associating the word “nurse” more closely with the word “she” than with “he”. Facial recognition algorithms notoriously work better with white faces than with non-white faces.

Unfortunately, though, eliminating bias in algorithms is anything but straightforward. Even defining what this means, let alone implementing it, is a thicket of conceptual issues. One paper in particular, Corbett-Davies and Goel’s The Measure and Mismeasure of Fairness, convinced me that fairness is exceedingly hard to define.

This paper takes three formal definitions of fairness and shows how each can lead to plainly unfair results. Two of the definitions are fairly involved, so I’ll save them for a different post, but the first is simple. It’s called “anti-classification”, and it means that you (or your statistical model) cannot consider certain protected attributes when making predictions.

Take, for example, this comment. Obamacare did not allow insurance companies to use gender as a factor when setting premiums. But since everything, more or less, correlates with gender, all of the variables they do use, such as whether or not you’re a smoker, contain information about whether or not you’re a man or a woman.

This simulation sets up a situation in which (a) being a man and (b) being a smoker both contribute to mortality, and (c) being a man makes you more likely to be a smoker. This table compares the error of a model that includes gender as a predictor with that of one that does not (the “fair” model). And lo! The error of the “fair” model is (1) larger by an order of magnitude and (2) systematically different by gender: women have their risks over-predicted and men have theirs under-predicted.

While gender is not explicitly in the model, it is there implicitly. And in this case, you can find it in the error term, among other places.
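For anyone who wants to poke at this themselves, here is a minimal sketch of that kind of simulation. It is not the original simulation: the probabilities, the sample size, and the choice of a linear probability model fit by least squares are all my own assumptions, picked only to make the mechanism visible. Because it is a simulation, we know each person's true risk and can compare predictions against it directly.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    """Simulate gender, smoking, true mortality risk, and outcomes.

    All parameter values are illustrative assumptions, not real data.
    """
    rng = np.random.default_rng(seed)
    male = rng.random(n) < 0.5
    # (c) being a man makes smoking more likely
    smoker = rng.random(n) < np.where(male, 0.4, 0.1)
    # (a) and (b): being a man and being a smoker both raise mortality risk
    risk = 0.05 + 0.10 * male + 0.10 * smoker
    died = rng.random(n) < risk
    return male, smoker, risk, died

def fit_predict(features, outcome):
    """Fit a linear probability model by least squares; return fitted risks."""
    columns = [np.ones(len(outcome))] + [f.astype(float) for f in features]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, outcome.astype(float), rcond=None)
    return X @ coef

def mean_error_by_gender(pred, risk, male):
    """Average (predicted - true) risk, split by gender."""
    err = pred - risk
    return {"women": err[~male].mean(), "men": err[male].mean()}

male, smoker, risk, died = simulate()

# Model that is allowed to see gender
full_err = mean_error_by_gender(fit_predict([male, smoker], died), risk, male)
# "Fair" model: gender is withheld, smoking status is not
fair_err = mean_error_by_gender(fit_predict([smoker], died), risk, male)

print("with gender:   ", full_err)
print("without gender:", fair_err)
```

With these made-up parameters, the model that sees gender should have a mean error close to zero for both groups, while the “fair” model should over-predict women’s risk and under-predict men’s by a few percentage points each. The exact size of the gap depends entirely on the assumed numbers, so treat it as an illustration of the direction of the effect rather than its magnitude.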

The point here is not that these issues aren’t important, just that they are tricky, and that intuitive approaches often fail to produce the results we expect.

One might therefore argue that certain applications of AI are simply too difficult to get right, and that we should prohibit them. I see a place for the precautionary principle here, especially where civil liberties are at stake. But in more prosaic applications, even if those “prosaic” applications include life-defining moments like loan applications, education decisions, etc., we should be skeptical of regulations that ham-fistedly require certain “fairness” criteria to be met. They can be anything but fair.

We should also be mindful of the relevant baseline. Algorithms may contain bias, but so do people. Is being declined for a loan by an algorithm more “black box” than being declined by a loan officer? Certainly not; at least in principle, even the blackest-box algorithm can be inspected with tools such as LIME. The same is not true for a person.