Last week, Canadians learned that their foreign signals intelligence agency, the Communications Security Establishment (CSE), had improperly shared information with its American, Australian, British, and New Zealand counterparts (collectively referred to as the “Five Eyes”). The exposure was unintentional: techniques that CSE had developed to de-identify metadata containing Canadians’ personal information failed to keep Canadians anonymous when juxtaposed with allies’ re-identification capabilities. Canadians recognize the hazards of such exposures, given that lax information-sharing protocols with US agencies previously contributed to the mistaken rendition and subsequent torture of a Canadian citizen in 2002.
As with many of its partner foreign agencies, CSE is granted almost limitless legal latitude when gathering intelligence on non-nationals, but is legally constrained from directing its activities at Canadians. The agency relies on de-identification techniques to facilitate its gathering, analysis, and sharing of data it knows contains significant amounts of Canadian data. However, it is generally known that de-identification is an inherently tenuous activity. Given this, it is fair to ask both how the data it was sharing might have been re-identified by its partner intelligence agencies and what broader lessons follow from CSE’s failure to stay within its legislative mandate. Even more importantly, this incident raises questions regarding the ongoing viability of the agency’s old-fashioned mandates that bifurcate Canadian and non-Canadian persons’ data in light of the integrated nature of contemporary communications systems and data exchanges with foreign partners.
Identity Suppression in Metadata
CSE collects large volumes of data about Canadians in spite of operating under a general prohibition on directing its activities towards nationals. It justifies such collection on the basis that Canadian data is so intermingled with other digital communications that general monitoring of the global communications infrastructure will inevitably include significant amounts of Canadian data. By similar logic, CSE analyzes databases known to contain substantial amounts of Canadian metadata and employs identity suppression techniques as a means of justifying subsequent uses of its analytical outputs. As no individual Canadian is specifically targeted or immediately identifiable in the process or emergent from the analytical outputs, the activity is presumed not to be “directed at Canadians.”
In 2010–11, CSE began sharing this data with its Five Eyes allies, relying on the same de-identification (minimization) techniques to adhere to its prohibition on directing its activities at Canadians. The data in question can include the content of communications, but is mostly classified as metadata, including: IP address or geo-location of the source or destination of IP traffic, cookie-based identifiers, email identifiers, files downloaded, or websites visited. Documents disclosed by former NSA contractor Edward Snowden reveal just how extensive such “incidental” collection and analysis can be. CSE has sufficient metadata to, for example, track Canadians’ mobile devices as these are used across the country and to determine what files Canadians are downloading from file upload sites.
Last week, an Annual Report released by CSE’s oversight body revealed that since 2014 CSE’s de-identification techniques had been inadequate to prevent foreign agencies from re-identifying Canadian nationals in shared datasets:
… CSE discovered on its own that certain metadata was not being minimized properly. Minimization is the process by which Canadian identity information contained in metadata is rendered unidentifiable prior to being shared. … [T]he fact that CSE did not properly minimize Canadian identity information contained in certain metadata prior to being shared was contrary to the ministerial directive, and to CSE’s operational policy.
In a comment accompanying the release of the report, Canada’s Minister of National Defence added that the de-identification failure arose from technical deficiencies in CSE’s de-identification systems. Following this discovery, CSE stopped sharing “certain types of metadata” while working to resolve its minimization deficiencies.
Identity suppression in metadata is an inherently difficult activity. Any de-identified dataset of sufficient volume and utility will be subject to ongoing risks of re-identification. In other words, de-identification is not absolute, and will always entail tradeoffs, made according to established priorities, between the ongoing utility of a dataset and minimizing the risk that some or all of the data therein will be traceable to an individual. The risk of re-identification increases significantly where a dataset includes location-based data, digital identifiers such as IP addresses or cookies, or a wide range of individual activity. It is further heightened where the party attempting re-identification holds significant amounts of secondary data that can be linked to the de-identified dataset. It is now known that the NSA and GCHQ each operate extensive databases that provide these types of secondary datasets and are used for the express purpose of identifying otherwise anonymous traffic.
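To make the linkage risk concrete, here is a minimal sketch in Python of how pseudonymized records can be matched against a secondary dataset. Everything in it is hypothetical: the records, the field names, and the `candidate_identities` helper are invented for illustration and do not describe CSE's, the NSA's, or GCHQ's actual systems or data.

```python
from collections import defaultdict

# Hypothetical "minimized" metadata: the direct identifier has been replaced
# by a pseudonym, but coarse location and time of day survive.
minimized = [
    {"pseudonym": "u1", "cell": "OTT-12", "hour": 8},
    {"pseudonym": "u1", "cell": "OTT-40", "hour": 18},
    {"pseudonym": "u2", "cell": "TOR-03", "hour": 8},
    {"pseudonym": "u2", "cell": "TOR-17", "hour": 18},
]

# Hypothetical secondary dataset held by another party, with names attached.
auxiliary = [
    {"name": "Alice", "cell": "OTT-12", "hour": 8},
    {"name": "Alice", "cell": "OTT-40", "hour": 18},
    {"name": "Bob", "cell": "TOR-03", "hour": 8},
    {"name": "Carol", "cell": "TOR-03", "hour": 8},
    {"name": "Carol", "cell": "TOR-17", "hour": 18},
]


def candidate_identities(minimized, auxiliary):
    """For each pseudonym, list the named people whose known
    (cell, hour) observations cover that pseudonym's entire trace."""
    traces = defaultdict(set)
    for record in minimized:
        traces[record["pseudonym"]].add((record["cell"], record["hour"]))

    known = defaultdict(set)
    for record in auxiliary:
        known[record["name"]].add((record["cell"], record["hour"]))

    return {
        pseudonym: sorted(name for name, seen in known.items() if trace <= seen)
        for pseudonym, trace in traces.items()
    }


matches = candidate_identities(minimized, auxiliary)
for pseudonym, names in matches.items():
    print(pseudonym, "->", names)
```

In this toy data, a single observation (cell TOR-03 at hour 8) is consistent with two people, but the second observation narrows pseudonym u2 to one candidate. No name ever appears in the minimized records; the identification comes entirely from the linkage.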
The identity suppression failure reported here is perhaps unsurprising in light of these difficulties. More surprising is the account of CSE’s de-identification processes, which were “decentralized and lacked appropriate control and prioritization.” Prioritization decisions relating to tradeoffs between utility and re-identification risk can have far-reaching impact, and should be carried out in a systematic and centralized manner.
Implications of Erroneous Sharing
The first obvious implication of CSE’s activities is the temporary cessation of some metadata sharing with close allies. Neither CSE nor its oversight body has clarified how this data was collected, its intended uses in a de-identified/minimized format, or the possible domestic or international consequences associated with its no longer being shared. Presuming that such intelligence-sharing serves an important purpose, this incident underlines the importance of having clear and established privacy safeguards in place so that unexpected discoveries such as this one do not derail active intelligence-sharing programs.
A second implication relates to CSE’s general use of de-identification as a means of meeting its legal obligations. In explaining the unintended failure in minimization practices, the Minister of National Defence (who is legally responsible for CSE’s activities) noted that the “metadata in question that was shared with Canada’s partners did not contain names or enough information in its own to identify individuals” and, hence, that “the privacy impact was low.” In addition, the statement included a familiar reminder of the importance that metadata plays in intelligence more broadly, paving the way to continuation of this metadata sharing program once steps have been taken to mitigate the identified re-identification risks.
But as noted above, it is naive to think that re-identification risks can be wholly removed as opposed to mitigated and managed. Moreover, a lot can be done with metadata without any direct linkage of that data to a specific individual; one extreme example is the use of metadata to make drone targeting decisions in some locales. Further, given the challenges inherent in properly assessing de-identification risks, it can be anticipated that most shared metadata will include at least some identifiable datasets. Finally, it is notable that CSE likely retains the technical ability to re-link any data within the dataset back to a particular individual if requested by a foreign intelligence partner. In light of these various re-identification capacities, it is perhaps appropriate to question whether the fig leaf of de-identification is sufficient or whether more robust measures should be put in place to ensure privacy protection if, or when, re-identification occurs.
This raises the third, and perhaps most important, implication of CSE’s error, which relates directly to the broader framework under which foreign intelligence agencies generally operate. Indeed, CSE’s decision to suspend some data transfers to partner foreign agencies was made against a backdrop of growing policy discord, as states seek to reconcile the need to share intelligence with foreign partners with their obligations to protect the privacy of their citizens. This discord has led to policy developments like the adoption of PPD-28 in the US, which places some limits on the use (but not collection) of foreign national data, and proposals such as the “Umbrella Agreement” and EU-US “Privacy Shield,” both of which seek to grant EU nationals some additional avenues of recourse against potential privacy incursions by US intelligence and law enforcement agencies. Collectively, the growing integration of communications networks and foreign agencies highlights the need for robust debate concerning the prevailing legal paradigm, which places few limitations on the foreign-oriented surveillance activities of intelligence agencies.
These agencies are already well along the path of integrating their intelligence gathering and analysis activities, yet continue to operate under frameworks that provide minimal guarantees to non-nationals. CSE’s expansive surveillance framework, for example, is sold to the Canadian public on the basis that it is legally prohibited from directing its activities at Canadians. Yet, like its counterparts, it collects significant volumes of national data as this data is deeply intermixed with data of non-nationals in the global communications infrastructure CSE monitors. At least when this data is retained by CSE, other legal protections — such as constitutional and statutory privacy protections — can be asserted if misuses occur. The problem intensifies greatly, however, when the data is entrusted to a foreign partner agency; at that point, Canadian nationals are generally denied the ability to assert even basic protections, which are held not to apply extraterritorially.
It is this lack of protection for data in a cross-border context that requires CSE to operate with great caution when sharing any data with its intelligence partners. This same lack of assurances regarding the protection of non-national data is at the core of ongoing disputes between the United States and the European Union. Those disputes were intensified by a decision of the EU’s highest court last year that suspended a commercial data-sharing regime out of concern that it exposed EU data to bulk US surveillance without meaningful protections or remedies, and they have underpinned intense bilateral negotiations between the two. The paradigm itself, however, is greatly at odds with the interconnected nature of our communications networks as well as modern intelligence. It deserves a reevaluation, and the recent bilateral EU-US discussions demonstrate that such a reevaluation is indeed possible.