In war, civilians and civilian objects are rarely spared from harm. Sometimes the harm is a deliberate, malicious act, sometimes it is an accident, and most often, in the fog of war, it is hard to tell which. Yet understanding what (or who) caused the harm is critical to ensure compliance with the laws of war (international humanitarian law, or IHL) and to hold wrongdoers to account.
Now, as military technologies grow in complexity, the question is whether the task of investigating (mis)conduct in war will grow more complex as well. For instance, if an algorithm has been used to inform militaries about who, when, and where to strike, who is to be held accountable if a civilian is wrongly attacked, and how would investigators even determine responsibility? While modern militaries have yet to address this question in public-facing material, a better understanding of the implications of AI for war crime investigations is critical to ensure that humans, not machines, are held to account for unlawful incidents in future wars.
To shed light on this question, this article zooms in on the implications of military use of artificial intelligence (AI) for the ability to collect and assess evidence in the event of a harmful incident. It finds that while AI may have the potential to strengthen some aspects of investigations (for example, collecting and preserving evidence), increased reliance on AI also presents significant challenges, especially around the ability to assess the evidence collected.
Ensuring Human Responsibility in the Use of Military AI
Military applications of AI can be implemented at various stages of decisions to apply force, spanning from intelligence gathering to critical functions related to the identification, selection, and engagement of targets. However it is used, AI has the potential to support and give effect to human decision-making in war. But the use of AI to support military decision-making also raises concerns. One is that reliance on algorithmic processes could make it more difficult to trace an IHL violation back to humans, and perhaps result in an accountability gap.
The importance of ensuring human responsibility in the use of military AI is expressed repeatedly in national and regional principles. Current discussions in the U.N. around the regulation of lethal autonomous weapon systems (LAWS) are a good example. Here, the so-called Group of Governmental Experts (which is organized under the Convention on Certain Conventional Weapons and brings together more than 100 governments) has adopted the non-binding principle that “human responsibility for decisions on the use of weapons systems must be retained since accountability cannot be transferred to machines.” However, despite numerous aspirational principles and declarations, how States intend to retain human responsibility in decisions that involve, and perhaps rely on, military AI is not entirely clear. A better understanding is needed, given that current use of AI-enabled weapons on the battlefield is probably just the tip of the AI iceberg.
The Ability to Investigate Unlawful Conduct
One critical (but relatively overlooked) way to ensure human responsibility in the military use of AI pertains to the practical task of investigating harmful incidents. If a weapon (whether it relies on AI or not) is designed or used in a way that prevents a State from effectively investigating harmful incidents, it becomes more difficult to trace a breach back to humans and ensure responsibility for it. Indeed, we have already seen how the United States, among many others, has made “traceable AI” – which aims to ensure that relevant personnel “possess an appropriate understanding of the technology” and that there are processes in place to provide transparent and auditable data – a priority in its national positions around responsible use of AI in the military.
The need to investigate incidents is not just a matter of good practice, but a task flowing from States’ bedrock obligations under IHL to repress grave breaches and suppress other violations of IHL. Besides IHL obligations, international criminal law also requires State Parties to investigate and prosecute individuals for war crimes (which are defined in the Rome Statute of the International Criminal Court and include intentionally directing attacks against civilians). However, despite their important role, investigations into potential violations are a tricky and not very transparent affair. Complicating factors include issues around access to relevant information, long delays between the incident and the start of an investigation, and the subjectivity of the assessments. And while IHL may require States to repress and suppress violations, the laws of war do not say anything about how this should be done. States, therefore, enjoy great discretion in terms of how they investigate their own or others’ conduct and, ultimately, on what basis they determine whether IHL was violated.
Despite the lack of universal guidelines on how to investigate unlawful conduct, a deeper understanding of the implications AI could have for this critical task will help to ensure that (human) wrongdoers are held accountable in case of a breach. In the context of the unique features of AI, two aspects of most investigations are particularly important: the ability to collect evidence and the ability to assess it.
Implications of AI on the Ability to Collect Evidence
The ability to collect evidence is a crucial first step toward ensuring effective investigations. Relevant evidence could include everything from operational logs and data to intelligence, footage, and information from the ground (such as eyewitness testimony or weapons payload samples) to information provided by external sources such as non-governmental organizations or local news outlets. That said, IHL provides no explicit requirement for what types of evidence must be collected during an investigation. While States, for example, are expected to collect “sufficient evidence” when bringing criminal charges for a war crime, what constitutes “sufficient” is up to the State alone to decide. The International Committee of the Red Cross does recommend that States “use all feasible means” to collect and preserve evidence. The big – and so far, relatively unaddressed – question is whether AI makes it easier or more difficult to collect evidence, and what type of evidence would be considered feasible to collect in the context of weapons that rely on AI.
First, though AI may improve aspects of targeting decisions, the technical features associated with AI could make it more difficult to collect evidence. An investigation of an incident that involved AI would likely seek insight into the technical workings of the AI systems to establish what happened, and why. However, reliance on algorithms in decision-making may lead to more opacity, or what is often referred to as the “black box of AI.” The black box results from the fact that AI systems, notably those relying on machine learning algorithms, only allow users to understand system inputs (such as the training data a system has been fed) and outputs (such as target recommendations), but not necessarily the precise process in between: how the system arrived at a certain conclusion. Depending on the technical complexity and the level of human verification at different junctures, this could prevent investigators from accessing and collecting relevant technical information about the circumstances leading to the application of force.
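To make this concrete, consider the following sketch. It is purely illustrative: it uses the open-source scikit-learn library and invented sensor features, and it is not drawn from any real targeting system. It shows why the black box matters for evidence collection: the inputs and the recommendation are available to an investigator, but the “reasoning” sits in numeric weights that do not explain themselves.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 12))           # invented sensor features for 200 observed objects
    y = (X[:, 0] + X[:, 3] > 0).astype(int)  # stand-in labels: 1 = "military objective"

    # Train a small neural network on the hypothetical data.
    model = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=0)
    model.fit(X, y)

    candidate = rng.normal(size=(1, 12))
    print("recommendation:", model.predict(candidate))          # the output investigators can see
    print("learned weights:", [w.shape for w in model.coefs_])  # hundreds of numbers, no narrative
    # The weight matrices can be preserved as evidence, but by themselves they do not
    # explain why this particular object was recommended as a target.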
On the other hand, increased reliance on computing in warfare could also strengthen the ability to collect and preserve information. The war in Ukraine serves as a good example of how new technologies may have optimized evidence gathering. For example, reports indicate that AI has been used to obtain and analyze evidence, while algorithms have helped record and translate Russian military conversations. More generally, the potential of AI technologies to support investigations lies especially in improved reporting and recording mechanisms. Such mechanisms are usually considered critical for militaries as they serve as operational logs that can help investigators access and collect information about the moments preceding the application of force. While not unique to AI, the increased digital transparency flowing from AI could make this task easier, for example through digital logs and auditable algorithms (that is, algorithmic processing systems designed so that they can be reviewed). Algorithmic auditing, in particular, could be a crucial way to “institutionalize” accountability when using AI technologies.
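As a purely illustrative sketch (not based on any fielded system; the file name, field names, and model version below are invented), such a recording mechanism could be as simple as an append-only log that captures, for every algorithmic recommendation, which model was used, what data it saw, what it suggested, and who approved the result:

    import datetime
    import hashlib
    import json

    LOG_PATH = "decision_log.jsonl"  # hypothetical append-only log, one JSON record per line

    def log_recommendation(model_version, input_data, recommendation, reviewer, approved):
        record = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "model_version": model_version,    # which system and weights produced the suggestion
            "input_hash": hashlib.sha256(
                json.dumps(input_data, sort_keys=True).encode()
            ).hexdigest(),                     # fingerprint of the data shown to the operator
            "recommendation": recommendation,  # what the system suggested
            "human_reviewer": reviewer,        # who verified the suggestion, if anyone
            "approved": approved,              # whether the use of force was authorized
        }
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(record) + "\n")
        return record

    # Hypothetical entry an investigator could later retrieve and cross-check
    # against footage, testimony, and other evidence.
    log_recommendation("targeting-model-v2.3",
                       {"sensor": "uav-7", "grid": "38S MB 12345 67890"},
                       "possible military vehicle", "operator-142", True)

Records of this kind would not answer the legal questions on their own, but they would give investigators a verifiable trail of what a system recommended and who acted on it.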
Overall, the potential of AI to better document and record decision-making remains to be further unpacked and understood by States, especially those who seek to implement such technologies in their militaries.
Implications of AI on the Ability to Assess Evidence
Once evidence is collected, an investigatory body must assess it. The body will seek to determine whether the harmful incident was, for instance, the result of malicious conduct, a technical glitch, or a more systemic issue (that is, a situation where harm cannot be attributed to one specific unit, person, or bad intent, but rather to a set of underlying causes that could lead to further incidents).
Outside clear-cut cases of intentional attacks directed against civilians, assessing harmful incidents is already a difficult task. In the past, States have frequently been criticized for too readily concluding that a harmful incident was an accident rather than, for example, a systemic issue. Now, the question is how AI, and the increased technical complexity that comes with it, affects this already complicated task.
Discerning accidents from breaches
Harmful incidents do not inherently constitute a violation of IHL. Harm to civilians or civilian objects is not unlawful so long as they were not purposefully made the object of attack (known as distinction), the harm was not excessive compared to the expected military advantage (known as proportionality), and other IHL rules were followed. A critical task for an investigator is, therefore, to assess whether the incident resulted from a foreseen (but permissible) risk, from factors that could not have been reasonably foreseen (and therefore, arguably, not a violation), or from harm that could have been reasonably foreseen and prevented (which could amount to a violation). However, assessing whether a certain result was foreseeable and not prevented is likely to be one of the core challenges when assessing evidence of incidents in which AI is involved.
First, reliance on machine processes in decision-making may, by default, increase the risk of accidents. For example, it has been argued that LAWS may be more prone to accidents than traditional weapons, due to hacking, unexpected interactions with the environment, technical glitches, and data or software errors. These potential new sources of risk arguably challenge IHL’s (permissive) attitude toward accidents and call for a better understanding of the types of accidents involving AI technologies that would trigger responsibility. Such technical (and legal) clarification would ensure that investigators are well positioned to assess whether an incident was a breach, a permissible action that nevertheless caused civilian harm, or simply a “normal” accident.
Moreover, to distinguish accidents and permissible actions that cause civilian harm from IHL breaches, an investigator would need to establish what the users knew (or had reason to know) about the potentially harmful effects. This task is likely to become trickier with AI for two reasons. First, military AI, by definition, introduces new layers of unpredictability. In the case of LAWS, for example, their pre-programmed nature means that the effects will be determined by how the system interacts with the environment. Thus, the user may not know exactly when, where, or against whom force will be applied. This would likely make it more difficult to assess what human users knew at the time of a certain decision.
Second, an investigation requires a clear understanding of roles and responsibilities to assess whether someone acted unlawfully: who was supposed to know and do what, and when and where in the military decision-making tree was an action authorized or stopped? However, when using pre-programmed weapons like LAWS, decisions to apply force may be made further in advance, and by a wider range of people (such as programmers and engineers), than with traditional weapons. If these new roles and responsibilities are not well understood, it could pose an additional challenge for an investigator to assess whether someone had reason to know about the potentially harmful effects but did not prevent them.
The importance of identifying systemic misconduct
Another possible implication of AI pertains to the ability to identify systemic, rather than individual, misconduct. This is because harmful incidents involving AI are perhaps more likely to result from collective, systemic failures to ensure respect for IHL than from intentional misconduct by a single person. For example, the risk of unintended harm could increase if armed forces receive poor training on how to use the technology, if the system has undergone insufficient testing and verification, or if there is overreliance on certain types of data in the target identification process or battle damage assessments (raising the issue of data biases in areas such as gender and race). Also, decisions to deploy an autonomous weapon system are likely to be made by a wider range of people, from developers to programmers and commanders. This dynamic makes it harder to trace responsibility back to one individual and makes the actions of any single individual less decisive.
This means that, to ensure human responsibility for IHL violations involving AI, States should pay particular attention to mechanisms that inquire into State responsibility for a broader set of IHL violations, not just individual criminal responsibility for war crimes.
By focusing only on war crimes (which currently seems to be the focus in the LAWS debate, for instance), States not only fail to identify important underlying structures causing harm but also fail to comply with their overarching obligations to “ensure respect” for IHL and suppress all violations.
If Used, Military AI Should Strengthen (Not Weaken) Investigation Mechanisms
Investigating IHL violations is already difficult, and in some respects adding AI to the equation may make it harder still. The technical complexity and unpredictability associated with AI could make both collecting and assessing evidence more challenging. These challenges remain, however, only so long as they are not addressed and mitigated by the States that seek to implement AI in their militaries.
For States to fulfill their common ambition of retaining human responsibility in high-tech warfare, a useful first step would be to share (to the extent possible and feasible) the general contours of their existing investigation mechanisms. A better understanding of, and more transparency around, existing mechanisms would facilitate more constructive discussions on the ways in which AI affects existing procedures. States should pay particular attention to mechanisms for inquiring into technical and systemic issues. As such, the implementation of AI in the military could provide a useful and timely avenue for States both to share views on existing investigation mechanisms and to strengthen them.
Thus, as governments continue to discuss – and disagree on – how to govern (responsible) military AI, approaching the issue from an investigatory perspective would provide a productive next step.