In the first detailed element of our AI framework blog series, Reuben Binns, our Research Fellow in AI, and Valeria Gallo, Technology Policy Adviser, explore how organisations can ensure ‘meaningful’ human involvement to make sure AI decisions are not classified as solely automated by mistake.
This blog forms part of our ongoing work on developing a framework for auditing AI. We are keen to hear your views in the comments below or you can email us.
Artificial Intelligence (AI) systems often process personal data to either support or make a decision. For example, AI could be used to approve or reject a financial loan automatically, or support recruitment teams to identify interview candidates by ranking job applications.
Article 22 of the General Data Protection Regulation (GDPR) establishes very strict conditions in relation to AI systems that make solely automated decisions, ie without human input, with legal or similarly significant effects on individuals. AI systems that only support or enhance human decision-making are not subject to these conditions. However, a decision will not fall outside the scope of Article 22 just because a human has ‘rubber-stamped’ it: human input needs to be ‘meaningful’.
The degree and quality of human review and intervention before a final decision is made about an individual is the key factor in determining whether an AI system is solely or non-solely automated.
Board members, data scientists, business owners, and oversight functions, among others, will be expected to play an active role in ensuring that AI applications are designed, built, and used as intended.
The meaningfulness of human review in non-solely automated AI applications, and the management of the risks associated with it, are key areas of focus for our proposed AI Auditing Framework and what we will be exploring further in this blog.
What’s already been said?
- Human reviewers must be involved in checking the system’s recommendation and should not “routinely” apply the automated recommendation to an individual;
- reviewers’ involvement must be active and not just a token gesture. They should have actual “meaningful” influence on the decision, including the “authority and competence” to go against the recommendation; and
- reviewers must ‘weigh-up’ and ‘interpret’ the recommendation, consider all available input data, and also take into account other additional factors.
The meaningfulness of human input must be considered in any automated decision-making system, however basic (e.g. simple decision trees). In more complex AI systems, however, we think there are two additional factors that could potentially cause a system to be considered solely automated: automation bias and a lack of interpretability.
What do we mean by automation bias?
AI models are based on mathematics and data, and because of this people tend to think of them as objective and trust their output.
The terms automation bias or automation-induced complacency describe how human users routinely rely on the output generated by a computer decision-support system and stop using their own judgement, or stop questioning whether the output might be wrong. If this happens when using an AI system, then there is a risk that the system may unintentionally be classed as solely automated under the law.
What do we mean by lack of interpretability?
Some types of AI systems, for example those using deep learning, may be difficult for a human reviewer to interpret.
If the inputs and outputs of AI systems are not easily interpretable, and other explanation tools are not available or reliable, there is a risk a human will not be able to meaningfully review the output of an AI system.
If meaningful reviews are not possible, the reviewer may start to just agree with the system’s recommendations without judgement or challenge; this would mean the decision was ‘solely automated’.
Organisations should take a clear view on the intended use of any AI application from the beginning. They should specify and document clearly whether AI will be used to enhance human decision-making or to make solely automated decisions.
The management body should review and sign off the intended use of any AI system, making sure that it is in line with the organisation’s risk appetite. This means board members need to have a solid understanding of the key risk implications associated with each option, and be ready and equipped to provide an appropriate degree of challenge.
The management body is also responsible for ensuring clear lines of accountability and effective risk management policies are in place from the outset. If AI systems are only intended to support human decisions, then such policies should specifically address additional risk factors such as automation bias and lack of interpretability.
It is possible organisations may not know in advance whether a partly or fully automated AI application would meet their needs best. In such cases, their risk management policies and Data Protection Impact Assessments (DPIAs) should clearly reflect this, and include the risks and controls for each option throughout the AI system’s lifecycle.
You may think automation bias can be addressed chiefly by improving the effectiveness of the training and monitoring of human reviewers. Training is a key component of effective AI risk management, but controls to mitigate automation bias should be in place from the start.
During the design and build phase, business owners, data scientists and oversight functions should work together to develop design requirements that support a meaningful human review from the outset.
They must think about what features they would expect the AI system to consider and which additional factors the human reviewers should take into account before finalising their decision. For instance, the AI system could consider measurable properties like how many years’ experience a job applicant has, while a human reviewer assesses the skills of applicants that cannot be captured in application forms.
If human reviewers can only access or use the same data used by the AI system, then arguably they are not taking into account other additional factors. This means that their review may not be sufficiently meaningful and the decision may end up being considered solely automated under GDPR.
If needed, organisations have to think about how to capture additional factors, for example by getting the human reviewers to interact directly with the person the decision is about to gather such information.
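One way to picture this in a system design is a review record that cannot be finalised until the reviewer has logged at least one factor beyond the model’s own inputs. The sketch below is purely illustrative; every class, field, and value name is hypothetical.

```python
from dataclasses import dataclass, field

# Minimal sketch: a review record that blocks a final decision until the
# human reviewer has recorded at least one factor the model never saw.
# All names and values here are hypothetical.

@dataclass
class LoanReview:
    model_score: float                    # the AI system's output
    model_features: dict                  # the inputs the model used
    reviewer_factors: dict = field(default_factory=dict)
    final_decision: str = ""

    def decide(self, decision: str) -> None:
        # Refuse to finalise a decision based on the model's inputs alone.
        if not self.reviewer_factors:
            raise ValueError("record additional factors before deciding")
        self.final_decision = decision

review = LoanReview(model_score=0.73, model_features={"years_experience": 6})
review.reviewer_factors["interview_notes"] = "strong communication skills"
review.decide("approve")
```

A control like this does not by itself make a review meaningful, but it creates an audit trail showing whether additional factors were actually considered.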
Those in charge of designing the front-end interface of an AI system must understand the needs, thought process, and behaviours of human reviewers and enable them to effectively intervene. It may therefore be helpful to consult and test options with human reviewers early on.
However, the features the AI system will use will also depend on the data available, the type of model(s) selected, and other system building choices. Any assumptions made in the design phase will need to be tested and confirmed once the AI system has been fully trained and built.
Interpretability should also be considered from the design phase.
Interpretability is challenging to define in absolute terms and can be measured in different ways. For example:
- Can the human reviewer predict how the system’s outputs will change if given different inputs?
- Can the human identify the most important inputs contributing to a particular output?
- Can the human identify when the output might be wrong?
This is why it is important for organisations to define and document what interpretability means, and how to measure it, in the specific context of each AI system they wish to use.
Some AI systems are more interpretable than others. For instance, models that use a small number of human-interpretable features (e.g. age and weight) are likely to be easier to interpret than models that use a large number of features, or involve heavy ‘pre-processing’.
The relationship between the input features and the model’s output can also be simple or complicated. Simple “if-then” rules, which can describe decision trees, will be easier to interpret. Similarly, linear relationships (where the value of the output increases in proportion to the input) may be easier to interpret than relationships that are non-linear (where the output is not proportional to the input) or non-monotonic (where the output value may increase or decrease as the input increases).
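To make the contrast concrete, a decision tree of the kind described above can be written out as plain if-then rules that a reviewer can trace line by line. The features and thresholds below are purely illustrative, not recommended criteria.

```python
# Illustrative only: a hand-written decision tree over two
# human-interpretable features, expressed as if-then rules.
# The thresholds are hypothetical, not recommended lending criteria.

def loan_recommendation(age: int, income: float) -> str:
    """Return a recommendation a reviewer can trace rule by rule."""
    if age < 21:
        return "refer"        # limited credit history: always refer
    if income >= 30_000:
        return "approve"
    return "reject"

print(loan_recommendation(35, 45_000.0))  # -> approve
print(loan_recommendation(19, 45_000.0))  # -> refer
```

Because each output can be traced to a specific rule, a human reviewer can see exactly why the recommendation was made and judge whether to depart from it.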
One approach to address low interpretability is the use of Local Interpretable Model-agnostic Explanations (LIMEs), which provide an explanation of the output after it has been generated. LIMEs use a simpler surrogate model to summarise the relationships between input and output pairs that are similar to those in the system you are trying to interpret. In addition to summaries of individual predictions, LIMEs can sometimes help detect errors (e.g. by showing which part of an image caused a classifier to mistakenly label it as a certain object). However, they do not represent the actual logic underlying the AI system and can be misleading if misused.
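The intuition behind a LIME-style explanation can be sketched in a few lines: perturb the input around the case of interest, query the black-box model, and fit a simple weighted linear surrogate whose slope summarises the model’s local behaviour. This is a toy, one-feature illustration of the idea, not the actual implementation of the lime library; all names are hypothetical.

```python
import math
import random

# Toy sketch of the idea behind LIME: approximate a black-box model near
# one input with a weighted linear surrogate fitted to perturbed samples.

def local_slope(model, x0: float, width: float = 0.5, n: int = 2000) -> float:
    random.seed(0)  # deterministic for the example
    xs = [x0 + random.gauss(0.0, width) for _ in range(n)]
    ys = [model(x) for x in xs]
    # Weight each sample by its proximity to x0 (exponential kernel).
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    # Closed-form weighted least squares for a single feature.
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    num = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    den = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return num / den

# Near x = 3 the black-box x**2 behaves like a line with slope about 6;
# that local summary is the kind of explanation a surrogate provides.
slope = local_slope(lambda x: x * x, 3.0)
```

The surrogate’s slope is only valid near the chosen input, which is exactly why such explanations can mislead if read as a description of the whole model.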
Many statistical models can also be designed to provide a confidence score alongside each output, which could help a human reviewer in their own decision-making. A lower confidence score would indicate that the human reviewer needs to have more input into the final decision.
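In practice this could be as simple as routing each case by its confidence score, with low-confidence outputs sent for a deeper review. The threshold below is a hypothetical value an organisation would set through its own risk management policy.

```python
# Illustrative sketch: route each case by the model's confidence score.
# The 0.8 threshold is hypothetical and would be a policy choice.

def review_level(confidence: float, threshold: float = 0.8) -> str:
    """Low-confidence outputs get more human input into the decision."""
    if confidence >= threshold:
        return "standard human review"
    return "enhanced human review"

print(review_level(0.92))  # -> standard human review
print(review_level(0.55))  # -> enhanced human review
```

Note that even high-confidence cases still receive a human review here; otherwise the high-confidence path would risk becoming solely automated.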
Assessing the interpretability requirements should be part of the design phase, allowing explanation tools to be developed as part of the system if required.
Organisations should try to maximise the interpretability of AI systems, but as we will explore in future blogs there will often be difficult trade-offs to make (eg interpretability vs accuracy).
This is why risk management policies should establish a robust, risk-based, and independent approval process for each AI system. They should also set out clearly who is responsible for the testing and final validation of the system before it is deployed. Those individuals should be accountable for any negative impact on interpretability and the effectiveness of human reviews, and only provide sign-off if AI systems are in line with the adopted risk management policy.
Training is pivotal in ensuring an AI system is considered non-solely automated.
As a starting point, human reviewers should be trained:
- to understand how an AI system works and its limitations;
- to anticipate when the system may be misleading or wrong, and why;
- to have a healthy level of scepticism towards the AI system’s output, and be given a sense of how often the system could be wrong;
- to understand how their own expertise is meant to complement the system, and be provided with a list of factors to take into account; and
- to provide meaningful explanations for either rejecting or accepting the AI system’s output – a decision they should be responsible for. A clear escalation policy should also be in place.
In order for the training to be effective, it is important that human reviewers have the authority to override the output generated by the AI system and are confident that they will not be penalised for doing so. This authority and confidence cannot be created by policies and training alone: a supportive organisational culture is also crucial.
We have focussed here on the training of human reviewers; however, it is worth noting that organisations should also consider whether any other function, eg risk or internal audit, requires additional training to provide effective oversight.
The analysis of why, and how many times, a human reviewer accepted or rejected the AI system’s output will be a key part of an effective risk monitoring system.
If risk monitoring reports flag that human reviewers are routinely agreeing with the AI system’s outputs, and cannot demonstrate they have genuinely assessed them, then their decisions may effectively be classed as solely automated under GDPR.
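A monitoring report of this kind could be sketched as follows: count, per reviewer, how often the AI system’s output was accepted, and flag any reviewer whose agreement rate is suspiciously high. Both thresholds below are hypothetical policy choices, and agreement rate alone would not prove a review was not meaningful.

```python
from collections import Counter

# Illustrative monitoring sketch: flag reviewers who agree with the AI
# system's output so often that their review may not be meaningful.
# The 98% agreement and 50-case thresholds are hypothetical.

def flag_reviewers(decisions, agree_threshold=0.98, min_cases=50):
    """decisions: iterable of (reviewer_id, agreed_with_ai: bool) pairs."""
    totals, agreed = Counter(), Counter()
    for reviewer, did_agree in decisions:
        totals[reviewer] += 1
        if did_agree:
            agreed[reviewer] += 1
    return [r for r in totals
            if totals[r] >= min_cases
            and agreed[r] / totals[r] >= agree_threshold]

log = [("alice", True)] * 60 + [("bob", True)] * 55 + [("bob", False)] * 5
print(flag_reviewers(log))  # -> ['alice']
```

Flagged reviewers would then be asked to demonstrate how they assessed the outputs, rather than being assumed to have rubber-stamped them.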
Organisations need to have controls in place to keep risk within target levels, including, if necessary, stopping the processing of personal data by the AI system, either temporarily or permanently.
We are keen to hear your thoughts on this topic and welcome any feedback on our current thinking. In particular, we would appreciate your views on the following two questions:
1) What other technical and organisational controls do you think organisations should put in place to reduce the risk of AI systems falling within the scope of GDPR Article 22 by mistake?
2) Are there any other risk factors, besides interpretability and automation bias, which we should address in this part of our AI Auditing Framework?
Dr Reuben Binns, a researcher working on AI and data protection, joined the ICO on a fixed term fellowship in December 2018. During his two-year term, Dr Binns will research and investigate a framework for auditing algorithms and conduct further in-depth research activities in AI and machine learning.
Valeria Gallo is currently seconded to the ICO as a Technology Policy Adviser. She works with Reuben Binns, our Artificial Intelligence (AI) Research Fellow, on the development of the ICO Auditing Framework for AI. Prior to her secondment, Valeria was responsible for analysing and developing thought leadership on the impact of technological innovation on regulation and supervision of financial services firms.
‘AI system’ refers to Artificial Intelligence software which generates ‘outputs’ or ‘recommendations’ relating to a decision, for instance, whether or not to grant a customer a loan or invite an applicant to an interview (elsewhere, these may be referred to as ‘decision support systems’). Such recommendations will often be based on the outputs of a ‘machine learning model’ trained on data to generate predictions or classifications.
Pre-processing is a practice in machine learning that involves modifying the training data so that it is more useful and effective in the learning process.