Of XZ and Unknown Unknowns

Tomer Filiba

CTO

April 10, 2024

TL;DR

The recent XZ (liblzma) supply-chain attack is a marvel of social engineering and a great example of evading detection under the many eyes of the open source community. It was discovered by pure chance by a watchful developer, and it could easily have gone unnoticed for years.

Sweet’s runtime solution takes a different approach to catching and mitigating such risks. We can tell if your environment is actually affected by a CVE (while keeping the noise level minimal), and we can detect both known and unknown risks using our baseline mechanism. For instance, we detect anomalous connections to your servers, as well as suspicious identities using protocol-aware introspection.

XZ What?

XZ is a Linux command line utility that compresses data, much like WinZip on Windows. It relies on the LZMA algorithm, which is implemented as a library (liblzma) in the same project. This library is used by many other projects to compress data, be it files or data transferred over the network. One of the more important programs that ends up loading this library is the OpenSSH server (sshd): several Linux distributions patch it to link against libsystemd, which in turn depends on liblzma. OpenSSH was the actual target of the attack.

Through a meticulously planned attack, spanning at least two years, a sophisticated threat actor gained control of XZ’s open source repository on GitHub. The threat actor (speculated to be a state-sponsored organization) posed as a helpful contributor to the library and actually fixed issues over time, building trust with the original author. Then, through a series of innocent-looking changes, they managed to incorporate malicious code that allows remote code execution (RCE) by anyone who possesses a certain private key.

On a personal note, I authored some popular Python open source projects (RPyC, Construct and Plumbum, to name a few), and I also found myself delegating control to maintainers as my spare time dwindled.

It is really a fascinating story, with many more interesting details, but it goes to show that the human factor is the easiest way in (via social engineering), and that malicious code can hide in plain sight, even in popular open source repositories. Using the “official repo” is not enough.

But Am I Affected?

The first question any organization would ask itself is: are we affected by this? Many scanning-based security solutions will be able to detect the affected library on disk (e.g., liblzma-5.6.1). However, the interesting question is not whether this library exists on disk, but whether it’s actually being used. From our measurements, the false-positive rate in such cases exceeds 90%. That is to say, you normally get thousands (if not tens of thousands) of alerts on critical CVEs, and it’s practically impossible to tell which of them concern code that is actually being used (and thus pose a risk).

Sweet’s runtime solution distinguishes libraries that are actually loaded and executed from libraries that “merely exist” on disk. This means the alerts you get from Sweet are about active threats, so you are not left drowning in a sea of potential ones.
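To make the "on disk" versus "in use" distinction concrete, here is a minimal Python sketch. It is not Sweet’s implementation, only an illustration that assumes a Linux host where the memory mappings of each process can be read from /proc/<pid>/maps. The function names are made up for this example.

```python
from __future__ import annotations

import re
from pathlib import Path

# Captures the path column of a /proc/<pid>/maps line that points at liblzma.
LIBLZMA_RE = re.compile(r"(/\S*liblzma\.so\S*)")


def loaded_liblzma_paths(maps_text: str) -> set[str]:
    """Return the distinct liblzma shared-object paths mapped by one process."""
    paths = set()
    for line in maps_text.splitlines():
        match = LIBLZMA_RE.search(line)
        if match:
            paths.add(match.group(1))
    return paths


def processes_using_liblzma(proc_root: str = "/proc") -> dict[int, set[str]]:
    """Scan every readable <proc_root>/<pid>/maps and report PIDs that map liblzma."""
    root = Path(proc_root)
    if not root.is_dir():
        return {}  # not a Linux host, or procfs is not mounted here
    result = {}
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            maps_text = (entry / "maps").read_text(errors="replace")
        except OSError:
            continue  # the process exited, or we lack permission to read it
        paths = loaded_liblzma_paths(maps_text)
        if paths:
            result[int(entry.name)] = paths
    return result
```

A scanner that finds the library on disk and a check like this one that finds no process mapping it are telling you two different things. Only the second describes what is actually running, although a point-in-time scan like this still misses short-lived processes.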

Moreover, oftentimes a CVE only matters if the affected workload accepts connections from the internet (as opposed to being an internal service). Since our sensor sees actual connections, we can tell if the workload is indeed reachable from the outside world, and prioritize the risk accordingly.
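The prioritization logic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the three priority labels and the function names are invented for the example, and "external" is approximated by the standard library’s notion of a globally routable address.

```python
import ipaddress


def is_external(peer: str) -> bool:
    """Treat globally routable addresses as 'the outside world'."""
    return ipaddress.ip_address(peer).is_global


def prioritize(library_loaded: bool, inbound_peers: list[str]) -> str:
    """Rank a CVE finding by whether the code runs and who can reach it."""
    if not library_loaded:
        return "low"  # the vulnerable library merely exists on disk
    if any(is_external(peer) for peer in inbound_peers):
        return "critical"  # loaded and reached from the internet
    return "medium"  # loaded, but only internal peers were observed
```

Note that a workload behind a load balancer or proxy may only ever see internal peer addresses, so a real system would need more context than the raw socket peer.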

Behavioral Anomalies are the Only Way

Sweet’s solution relies heavily on baselines to learn your environment’s “expected pattern”, and then uses them to single out anomalies. For example, if we see that sshd (the OpenSSH server) only accepts connections from IPs inside your cloud’s VPC (or from a particular country), it would be easy for us to detect an external IP connecting to your servers – even if the IP itself is not a “known bad” one.
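As a rough illustration of the baseline idea (again, a sketch and not Sweet’s actual mechanism), the following Python class learns which source networks each service normally accepts connections from, and flags peers outside them. Grouping addresses into /24 (IPv4) and /64 (IPv6) networks is an assumption made for this example.

```python
import ipaddress
from collections import defaultdict


class ConnectionBaseline:
    """Learns the source networks seen per service and flags newcomers."""

    def __init__(self, v4_prefix: int = 24, v6_prefix: int = 64):
        self._prefix = {4: v4_prefix, 6: v6_prefix}
        self._seen = defaultdict(set)  # service name -> set of networks

    def _network(self, peer: str):
        ip = ipaddress.ip_address(peer)
        # strict=False masks the host bits, e.g. 10.0.1.7/24 -> 10.0.1.0/24
        return ipaddress.ip_network(f"{ip}/{self._prefix[ip.version]}", strict=False)

    def learn(self, service: str, peer: str) -> None:
        """Record a connection observed during the learning period."""
        self._seen[service].add(self._network(peer))

    def is_anomalous(self, service: str, peer: str) -> bool:
        """True if this peer's network was never seen for this service."""
        return self._network(peer) not in self._seen.get(service, set())
```

For example, after `learn("sshd", "10.0.1.7")`, a connection from 10.0.1.200 falls inside the learned network, while one from 203.0.113.9 is flagged without any threat-intelligence feed having to know that address.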

Similarly, assuming someone has managed to connect to your servers, we would detect the deviation from the normal process tree, even if the tools the attacker is using are benign (like wget or curl).
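The same baseline idea applies to process trees. The sketch below is a deliberately simple illustration: it assumes a process lineage is available as a list of executable names from the oldest ancestor down to the new process, and the class name is hypothetical.

```python
class ProcessTreeBaseline:
    """Learns parent -> child executable pairs and reports unseen ones."""

    def __init__(self):
        self._edges = set()  # (parent executable, child executable)

    def learn(self, parent: str, child: str) -> None:
        """Record a parent/child pair observed during the learning period."""
        self._edges.add((parent, child))

    def unexpected_edges(self, lineage: list[str]) -> list[tuple[str, str]]:
        """Return every parent/child pair in the lineage that was never learned."""
        pairs = zip(lineage, lineage[1:])
        return [pair for pair in pairs if pair not in self._edges]
```

If the baseline has only ever seen sshd spawn bash, and bash spawn ls, then a lineage of sshd, bash, curl yields the single unexpected pair (bash, curl). Nothing about curl itself is malicious; it is the deviation from the learned tree that stands out.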

Relying on hashes or signatures to detect malicious code is only good for known threats. The only way to detect zero-days and other unknown vulnerabilities is to detect anomalous behavior.

Identities are Everything

Sweet’s protocol-aware network capabilities are able to extract the identity of the “user” making the connection. Of course this need not be a human user, as non-human identities (NHIs) are an order of magnitude more common.

For instance, we extract the authorization headers used by HTTPS requests, or in the case of SSH, we extract the key’s name and signature. Using our baseline mechanism, we then learn where each identity is being used, and against which resources.

Should someone manage to connect over SSH to your servers using the private key that unlocks a backdoor (as in the case of XZ), we would be able to detect the anomalous identity connecting to your organization and alert on it.
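An identity baseline can be sketched along the same lines. The Python below is an illustration only and makes two assumptions that do not come from Sweet’s product: an SSH identity is represented by the OpenSSH-style SHA256 fingerprint of its public key, and the key is given as a plain "type base64-blob comment" line with no options prefix. The class and its three labels are hypothetical.

```python
import base64
import hashlib
from collections import defaultdict


def ssh_fingerprint(public_key_line: str) -> str:
    """OpenSSH-style SHA256 fingerprint of a 'type blob [comment]' key line."""
    _key_type, blob_b64 = public_key_line.split()[:2]
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    # OpenSSH prints the digest as base64 with the trailing padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


class IdentityBaseline:
    """Learns which targets each identity normally connects to."""

    def __init__(self):
        self._targets = defaultdict(set)  # identity -> set of targets

    def learn(self, identity: str, target: str) -> None:
        """Record an identity/target pair observed during the learning period."""
        self._targets[identity].add(target)

    def classify(self, identity: str, target: str) -> str:
        """Label a connection relative to what was learned."""
        if identity not in self._targets:
            return "unknown-identity"  # never seen anywhere in the environment
        if target not in self._targets[identity]:
            return "new-target"  # known identity, used somewhere unusual
        return "expected"
```

The two anomalous labels answer different questions: "unknown-identity" is the case of a key nobody in the organization has ever used, while "new-target" catches a legitimate identity turning up where it has never been used before.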

Wrapping Up

In conclusion, it’s not every day that such an elaborate backdoor is discovered, but it raises the question: how many more are there that we don’t know of? Relying on static solutions that verify the origin of the code or scan for vulnerable libraries is not enough – it’s only good in hindsight.

Sweet’s runtime approach both focuses you on the active threats, reducing the noise by an order (or two!) of magnitude, and keeps you informed of anomalous behaviors that are indicative of an attack, so you can keep your organization safe.