As March 2024 gave way to April, the cybersecurity community was abuzz with the news that liblzma, a component of the xz open source data compression utility, had been hijacked as a vehicle for code that could create a backdoor into computers that installed and ran the software. It’s likely that you’ve never heard of liblzma or xz, or spent much time thinking about software compression utilities. But whether you know it or not, you may well have installed and used xz through its inclusion in other software tools, as is the case with many obscure open source software packages, and that’s a problem for cybersecurity. The malicious code hidden in xz was not discovered through careful vetting of the software by teams of cybersecurity professionals on a mission to weed out malware. Rather, the problem was discovered by happenstance, and the bad news is that we are often at the mercy of luck when it comes to detecting cybersecurity attacks before they are actually deployed and used.
Despite its general obscurity, the xz utility is widely used on many computing platforms. There is nothing particularly noteworthy about xz itself: it is one of many software compression utilities that take large files and data streams and make them smaller by taking advantage of statistical redundancies in information. Compression tools can be presented as stand-alone pieces of software but are more often part of a larger software package that invisibly uses the compression functions as part of its overall purpose. This was how the xz package became embedded in larger software projects. The xz tool has been around for over 15 years without a major security incident and has been deployed in many operating systems in that time, including most Linux distributions and Microsoft Windows. In addition, xz had become a dependency of the OpenSSH software package, a widely used set of tools for secure login between computers. What is noteworthy is that xz was not part of OpenSSH’s core technology; it was pulled in indirectly, as an optional addition that lets OpenSSH work with internal functions of Linux operating systems.
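To make the idea of statistical redundancy concrete, here is a minimal sketch using Python's standard `lzma` module, which in CPython is built on the same liblzma library. The sample inputs are invented for illustration: one is a short phrase repeated many times, the other is random bytes with almost nothing for a compressor to exploit.

```python
import lzma
import os

# Highly redundant input: the same short phrase repeated many times.
redundant = b"the quick brown fox " * 5000

# Random bytes of the same length have almost no redundancy to exploit.
random_bytes = os.urandom(len(redundant))

compressed_redundant = lzma.compress(redundant)
compressed_random = lzma.compress(random_bytes)

print("redundant:", len(redundant), "->", len(compressed_redundant), "bytes")
print("random:   ", len(random_bytes), "->", len(compressed_random), "bytes")

# Compression is lossless: decompressing restores the original bytes exactly.
restored = lzma.decompress(compressed_redundant)
assert restored == redundant
```

The repeated text should shrink to a small fraction of its original size, while the random data barely shrinks at all. Note that a program calling `lzma.compress` never mentions xz by name, which is one way a library like liblzma ends up in use without its users being aware of it.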
You might think that an open source software project as widely depended upon as xz would be maintained by a sizable team of developers whose code was regularly reviewed by security experts. That was not the case. The xz utility was being maintained by a single developer who had started to have problems with his health and had been slow to publish updates to xz as a result. In October 2021, a developer named Jia Tan started making contributions to the xz code and offered to take over maintenance of the project. In 2023, the reins were passed to Jia Tan, who began carefully introducing a well-concealed piece of malware into the xz code. The final version was released in February 2024, and users of the code, such as Linux distributions, picked it up for inclusion in future versions of their own software.
The xz backdoor code was discovered by accident. In late March 2024, a Microsoft software engineer noticed that when he used OpenSSH to log in to remote computers, the logins were taking about 500ms longer than they ordinarily should have. He dug into the OpenSSH code and noticed that it was making unusual calls to xz’s liblzma library on his own computer running the Debian Linux operating system. After analyzing the xz code, he found a carefully concealed script that would enable a remote user to log into his computer without any authorization: a backdoor. He immediately alerted the security team at Debian, and the Red Hat Linux team followed up by submitting a CVE (an initialism for “Common Vulnerabilities and Exposures,” a reference method for tracking information security vulnerabilities) with a severity score of 10, the highest severity level.
All signs point to the xz backdoor being a well-planned and sophisticated hacking attempt. “Jia Tan,” likely a pseudonym, had suddenly appeared in 2021, making multiple updates to the xz project on GitHub, all of which were of generally high quality. Soon afterwards, the original xz maintainer began to be persistently pestered by multiple accounts run either by Jia Tan or by their associates. The maintainer eventually relented in 2023, and Jia Tan began adding their malicious code to the project. Engineers at Red Hat Linux also noted that Jia Tan had been repeatedly pressing distribution maintainers to use the new version of the xz code due to its “great new features.”
This model of open source software development is not unusual, and that presents a cybersecurity problem. In fact, the scenario, often referred to generally as the supply chain problem, is so common that the xkcd webcomic has published a clever illustration of it. Operating systems and software applications the world over regularly make use of open source software packages and libraries, and they are not always carefully vetted beforehand. Open source software is quite useful and often necessary for enormous (and enormously complex) software projects such as operating systems, which can then afford users functionality that might not otherwise be available without additional time and costs to the manufacturer. And there is very little in the way of regulation or legal liability that might give technology companies that use open source software the incentive to take on the costs necessary to avoid pushing cybersecurity risk onto customers as a kind of negative externality. The U.S. Government has recognized this problem, creating programs like the U.S. Cybersecurity and Infrastructure Security Agency’s (CISA’s) Software Bill of Materials (SBOM) concept to better identify supply chain risks from software dependencies. But it is unlikely that an SBOM requirement alone would have detected the xz backdoor. It turns out that solutions to this problem are frustratingly elusive.
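The limits of an SBOM can be sketched in a few lines of Python. This is a simplified illustration, not a real SBOM format or CISA tooling: the `Component` type, the `flag_components` helper, and the OpenSSH and systemd entries are hypothetical. The two xz versions listed are the releases publicly reported as backdoored under CVE-2024-3094.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One entry in a simplified, hypothetical bill of materials."""
    name: str
    version: str


# xz releases publicly reported as backdoored (CVE-2024-3094).
KNOWN_BAD = {("xz", "5.6.0"), ("xz", "5.6.1")}


def flag_components(sbom, known_bad):
    """Return the components whose (name, version) is on the known-bad list."""
    return [c for c in sbom if (c.name, c.version) in known_bad]


# Hypothetical product inventory; the non-xz entries are illustrative only.
sbom = [
    Component("openssh", "9.6"),
    Component("systemd", "255"),
    Component("xz", "5.6.1"),
]

# Before public disclosure the known-bad list is empty, so nothing is flagged.
before_disclosure = flag_components(sbom, set())

# After disclosure the same inventory immediately shows who is exposed.
after_disclosure = flag_components(sbom, KNOWN_BAD)

print("flagged before disclosure:", before_disclosure)
print("flagged after disclosure: ", after_disclosure)
```

An inventory like this answers "are we affected?" quickly once a problem is known, which is valuable for response. It does not examine what the listed code actually does, so it offers no help in finding a backdoor that nobody has reported yet.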
Relying on luck is not a sustainable cybersecurity strategy, especially when we consider just how much of our world depends on secure and reliable software. A sophisticated, malicious hacking attempt like the xz backdoor might not have been easily discovered were it not for a Microsoft engineer curious about a 500ms delay in his login time. We currently rely too heavily on good fortune when it comes to software security. What happens when open source projects begin letting AI-based tools write parts of their codebase? An already complex system moving at a faster pace is a recipe for disaster. At a minimum, we need to seriously consider regulatory systems with appropriately shared risk and responsibility.