Intercepting Zoom's encrypted data with BPF

I wrote the first version of this post at the end of March, while I was working on adding uprobes support to redbpf. I wanted to blog about the work I was doing and needed an application to instrument for the purpose of this post. At that time Zoom's popularity was rising quickly, and I happened to read somewhere that it supported a creepy attention-tracking feature that let meeting hosts monitor whether attendees were paying attention. I figured I could try to use uprobes to snoop on the data Zoom was sending to its servers and see how the tracking worked.

But then Zoom quickly came under a lot of fire. Zoombombing became a thing, several security issues were discovered, and pretty much everyone started piling on the company. Considering all that, I was advised not to publish the post, and ultimately decided not to.

Now that things seem to have settled, Zoom has improved its security and, by popular demand, removed attention tracking. So I think I can finally publish this! I edited out the part about attention tracking (which no longer exists) and a couple of other things that could potentially get me in trouble.

TL;DR: I wrote a command-line tool that uses BPF uprobes to intercept the data Zoom sends and receives over TLS, before it gets encrypted and after it gets decrypted, and here I'm going to show the process I went through to write it. After I wrote this post I made the tool generic, so that it can now instrument any program that uses OpenSSL. I published the code at https://github.com/alessandrod/snuffy.

Instrumenting applications with uprobes

Uprobes let you instrument user space applications by attaching custom code to arbitrary locations inside a target process. It's a bit like running an application in a debugger, setting breakpoints and fiddling around, but programmatically and without the overhead of a debugger.

A uprobe is compiled and loaded like any other BPF program; it can then be attached with the following API:

pub fn attach_uprobe(
    &mut self,
    fn_name: Option<&str>,
    offset: u64,
    target: &str,
    pid: Option<pid_t>,
) -> Result<()>;

attach_uprobe() parses the target ELF binary or shared library, looks up the function fn_name and, once the target is running, injects the probe code at the resolved address. If offset is non-zero, its value is added to the address of fn_name. If fn_name is None, offset is interpreted relative to the beginning of the target's .text section. Finally, if pid is given, the probe is only attached to the process with that id.
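To make the three cases concrete, here's a small model of how the final probe address is derived from fn_name and offset. It's a sketch of the semantics described above, not redbpf's implementation: the symbol table is just a HashMap, and probe_address is a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical model of where a uprobe ends up, following the semantics
/// described for attach_uprobe(). `symbols` maps function names to their
/// addresses and `text_start` is the address of the .text section.
/// Returns None if `fn_name` is given but can't be found.
fn probe_address(
    symbols: &HashMap<&str, u64>,
    text_start: u64,
    fn_name: Option<&str>,
    offset: u64,
) -> Option<u64> {
    match fn_name {
        // A symbol was given: start from its address and add the offset.
        Some(name) => symbols.get(name).map(|addr| addr + offset),
        // No symbol: the offset is relative to the start of .text.
        None => Some(text_start + offset),
    }
}
```

Passing None is the mode that matters when a binary's symbols have been stripped, since then the only thing we can hand to the loader is a raw offset.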

In the rest of the post I'm going to show some examples of uprobes, focusing on the code that gets compiled to BPF bytecode, loaded into the kernel and then injected into the target process (in our case, Zoom). I'm not going to show much of the user-space code that loads the probes. That part is pretty standard Rust code that does some setup, then prints out the data coming from the probes as it receives it. If you're interested, you can still find all the user-space code at https://github.com/alessandrod/snuffy/blob/master/src/.

Poking into Zoom

We're going to use uprobes to inspect the network traffic between the zoom client and the company's servers. Zoom uses Transport Layer Security to encrypt the data. In order to intercept the unencrypted data, we need to find out which TLS library is used by the client, then attach uprobes to strategic places inside it.

Let's start by searching for common TLS symbols using objdump:

$ objdump -CT /opt/zoom/zoom | grep -iE "ssl|gnutls"
000000000080d5b0 g    DF .text	0000000000000013  Base        PreMeetingUIMgr::sig_blockUnknownSSLCertChanged()
000000000080d590 g    DF .text	0000000000000013  Base        PreMeetingUIMgr::sig_sslCertWarningChanged()

Those look like callbacks that get invoked when a certificate is invalid, and Zoom does indeed show a warning if you try to intercept its traffic with a tool like mitmproxy. The callbacks deal with certificates, not unencrypted buffers, so they are not useful to us.

Looking at the output of ldd, we can see that Zoom links to Qt Network, which includes some potentially relevant APIs:

$ objdump -CT /opt/zoom/zoom | grep -iE "QNetworkReq"
0000000000000000      DF *UND*	0000000000000000  Qt_5        QNetworkRequest::QNetworkRequest(QUrl const&)
0000000000000000      DF *UND*	0000000000000000  Qt_5        QNetworkRequest::~QNetworkRequest()
0000000000000000      DF *UND*	0000000000000000  Qt_5        QNetworkAccessManager::get(QNetworkRequest const&)

QNetworkRequest(QUrl const&) looks like something that could be used to communicate with the backend, and Qt Network does support TLS. I tried attaching to it and to other functions exported by the framework, but none of them turned out to be invoked. Zoom supports a number of platforms and devices, so it's possible that they use Qt just for the UI on Linux and have some lower-level networking code that is shared with their other clients.

At this point it seems pretty likely that Zoom links the TLS library statically. Let's see if there's anything in the binary's .rodata section that could point us in the right direction:

$ readelf -p .rodata /opt/zoom/zoom | grep -i ssl | wc -l
739
$ # 😏
$ readelf -p .rodata /opt/zoom/zoom | grep -i 'openssl 1'
  [4a1b66]  OpenSSL 1.1.1g  21 Apr 2020
  [58cd50]  OpenSSL 1.1.1g  21 Apr 2020

Aha! The client is using OpenSSL version 1.1.1g (knowing this will turn out to be very useful), and the library is statically linked.
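The readelf | grep step can also be done from Rust by scanning raw bytes for the version prefix. A minimal sketch: find_version_strings is a hypothetical helper that scans whatever bytes you give it rather than parsing ELF section headers, and unlike grep -i it matches case-sensitively.

```rust
/// Returns every string in `data` that starts with `prefix`, reading each
/// match up to its terminating NUL (C string literals in .rodata are
/// NUL-terminated) or to the end of the data.
fn find_version_strings(data: &[u8], prefix: &[u8]) -> Vec<String> {
    let mut found = Vec::new();
    if prefix.is_empty() {
        return found;
    }
    let mut i = 0;
    while i < data.len() {
        if data[i..].starts_with(prefix) {
            // Read up to the terminating NUL, or to the end of the data.
            let end = data[i..]
                .iter()
                .position(|&b| b == 0)
                .map_or(data.len(), |p| i + p);
            found.push(String::from_utf8_lossy(&data[i..end]).into_owned());
            // Continue after this string, always making progress.
            i = end.max(i + 1);
        } else {
            i += 1;
        }
    }
    found
}
```

Called as `find_version_strings(&std::fs::read("/opt/zoom/zoom")?, b"OpenSSL 1")`, it scans the whole file, which includes .rodata.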

Instrumenting OpenSSL

OpenSSL exports two functions named SSL_read and SSL_write, which have the following signatures:

int SSL_read(SSL *ssl, void *buf, int num);
int SSL_write(SSL *ssl, const void *buf, int num);

SSL_read reads encrypted data sent by a remote peer, decrypts it and stores the decrypted data in the provided buffer. SSL_write encrypts the given buffer and sends it to a remote peer. By attaching a uprobe where SSL_read returns, and one at the entry of SSL_write, we can therefore access the data while it is unencrypted.

Here are the uprobes that do just that:

use redbpf_probes::uprobe::prelude::*;

struct SSLArgs {
    ssl: usize,
    buf: usize,
}

// temporary storage map
#[map("ssl_args")]
static mut ssl_args: HashMap<u64, SSLArgs> = HashMap::with_max_entries(1024);

fn output_buf(regs: Registers, mode: AccessMode, buf_addr: usize, len: usize) {
  // Ignore how this is implemented for now. Assume it reads `len` bytes from `buf_addr`
  // and sends them to our user-space process where they are hex-dumped.
  ...
}

#[uprobe]
fn SSL_write_entry(regs: Registers) {
    let ssl = regs.parm1() as usize;
    let buf = regs.parm2() as usize;
    let num = regs.parm3() as i32;
    if num <= 0 {
        return;
    }

    // This is where SSL_write begins, the buffer isn't encrypted yet
    // so we send it to user-space
    output_buf(regs, AccessMode::Write, buf, num as usize);
}

#[uprobe]
fn SSL_read_entry(regs: Registers) {
    let ssl = regs.parm1() as usize;
    let buf = regs.parm2() as usize;

    // store the function arguments so we can retrieve them once the
    // function returns
    unsafe {
        ssl_args.set(&bpf_get_current_pid_tgid(), &SSLArgs { ssl, buf });
    }
}

#[uretprobe]
fn SSL_read_ret(regs: Registers) {
    // the return value of SSL_read is the number of bytes read,
    // or <= 0 if the read failed
    let num = regs.rc() as i32;
    let pid_tgid = bpf_get_current_pid_tgid();
    let args = unsafe { ssl_args.get(&pid_tgid) };
    if let Some(SSLArgs { ssl, buf }) = args {
        // This is where SSL_read returns, the buffer is now decrypted
        // so we send it to user-space
        if num > 0 {
            output_buf(regs, AccessMode::Read, *buf, num as usize);
        }
        // always remove the saved arguments, even if the read failed,
        // so that stale entries don't accumulate in the map
        unsafe { ssl_args.delete(&pid_tgid) };
    }
}

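Uprobes are annotated with the #[uprobe] attribute. When they are triggered, they get passed a Registers argument that exposes the CPU registers at the probe point, which is how they access the arguments (and, for return probes, the return value) of the instrumented function.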

The SSL_write_entry probe is the simplest. It reads the registers containing the values of the buf and num arguments passed to SSL_write, and sends a copy of the buffer to user-space before it gets encrypted.

The SSL_read_entry probe is similar in that it reads the ssl and buf arguments passed to SSL_read. It doesn't send the buffer to user-space, though. Remember that the data is only decrypted by the time SSL_read returns, so we need a second uprobe that we attach to the return address of the function. That's what SSL_read_ret is for. It's similar to the other two probes, but is annotated with #[uretprobe], which means that it triggers when the function it's attached to returns.

But why do we need two probes for SSL_read? Why not just SSL_read_ret? The answer is that by the time SSL_read returns, the registers that held the function arguments have most likely been overwritten, so we need to read their values at the start of the function and store them so we can retrieve them later. This is a very common pattern when writing BPF code.
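The store-on-entry, consume-on-return pattern is easy to model outside the kernel. The sketch below is plain user-space Rust, not BPF code: a HashMap stands in for the BPF map, and PendingReads, on_entry, on_return and split_pid_tgid are hypothetical names. It also shows how the u64 returned by bpf_get_current_pid_tgid() is laid out, which is why it works as a per-thread key.

```rust
use std::collections::HashMap;

/// The arguments saved at SSL_read entry (same fields as the BPF SSLArgs).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SSLArgs {
    ssl: usize,
    buf: usize,
}

/// bpf_get_current_pid_tgid() packs the thread group id (what user space
/// calls the PID) in the upper 32 bits and the thread id in the lower 32.
fn split_pid_tgid(pid_tgid: u64) -> (u32, u32) {
    ((pid_tgid >> 32) as u32, pid_tgid as u32)
}

/// User-space model of the entry/return pattern: the entry probe stores
/// the arguments keyed by pid_tgid, the return probe takes them back out.
#[derive(Default)]
struct PendingReads {
    args: HashMap<u64, SSLArgs>,
}

impl PendingReads {
    fn on_entry(&mut self, pid_tgid: u64, args: SSLArgs) {
        self.args.insert(pid_tgid, args);
    }

    /// Returns the saved buffer address and the number of bytes read if
    /// the call succeeded. The entry is removed either way.
    fn on_return(&mut self, pid_tgid: u64, ret: i32) -> Option<(usize, usize)> {
        let args = self.args.remove(&pid_tgid)?;
        if ret > 0 {
            Some((args.buf, ret as usize))
        } else {
            None
        }
    }
}
```

Keying by the full pid_tgid means two threads inside SSL_read at the same time never see each other's arguments, and removing the entry even when the call fails keeps the map from filling up with stale entries.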

Finally, if Zoom linked OpenSSL dynamically, or if its symbols were present, the user-space code to attach the probes would be as simple as:

use redbpf::load::Loader;

let mut loader = Loader::load_file(COMPILED_BPF_BINARY)?;
let pid = None;
for uprobe in loader.uprobes_mut() {
    // Attach to SSL_read and SSL_write inside libssl.
    // Let redbpf resolve the symbol addresses.
    match uprobe.name().as_str() {
        "SSL_read_entry" | "SSL_read_ret" => {
            uprobe.attach_uprobe(Some("SSL_read"), 0, "libssl", pid)?;
        }
        "SSL_write_entry" => {
            uprobe.attach_uprobe(Some("SSL_write"), 0, "libssl", pid)?;
        }
        _ => continue,
    }
}

Unfortunately, since OpenSSL is statically linked and the symbols have been stripped, redbpf can't automatically resolve the addresses of SSL_read and SSL_write; instead, we have to explicitly provide the offsets we want to attach to:

use redbpf::load::Loader;

let mut loader = Loader::load_file(COMPILED_BPF_BINARY)?;
let pid = None;
let zoom_binary = "/opt/zoom/zoom";
// the offset of SSL_read in zoom's .text section
let ssl_read_offset = ???;
// and the offset of SSL_write
let ssl_write_offset = ???;

for uprobe in loader.uprobes_mut() {
    // fn_name is None, so the offsets are relative to the start of .text
    match uprobe.name().as_str() {
        "SSL_read_entry" | "SSL_read_ret" => {
            uprobe.attach_uprobe(None, ssl_read_offset, zoom_binary, pid)?;
        }
        "SSL_write_entry" => {
            uprobe.attach_uprobe(None, ssl_write_offset, zoom_binary, pid)?;
        }
        _ => continue,
    }
}

But how do we find the offsets? What values do we give to ssl_read_offset and ssl_write_offset?

[ REDACTED ]

I had a nice little section on how to find the offsets here. When I first wrote it I was convinced that publishing two addresses couldn't possibly get me sued for reverse engineering. Some of the people who read the draft of this post changed my mind about it though, and it is 2020 after all, so this is not a good time to be optimistic.

Hypothetically, I suppose one could find the offsets by disassembling Zoom with objdump, then disassembling OpenSSL 1.1.1g and comparing the two. I guess the code wouldn't match exactly, but the function prologues and the relative addressing around the SSL * context used by SSL_read and SSL_write could make for good enough patterns. With a few carefully crafted ripgrep -U (multiline) searches on the disassembled code, I bet it wouldn't take that long to find the functions.

The rest of the post assumes that we did find the offsets, and that we put them in a file named zoom-offsets.toml with the following format:

# the values below are just examples, not the real ones
ssl_read = 0xBAAAAAAD
ssl_write = 0xDECAFBAD

so that the values passed to attach_uprobe() can be loaded from the file.
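A real implementation would more likely use a TOML crate for this. To keep the example self-contained, here is a hypothetical std-only parse_offsets that understands only the two-key format shown above (comments, blank lines, hex or decimal integers).

```rust
/// Offsets of SSL_read and SSL_write as read from zoom-offsets.toml.
#[derive(Debug, PartialEq)]
struct Offsets {
    ssl_read: u64,
    ssl_write: u64,
}

/// Parses lines of the form `key = 0xHEX` (or a decimal value).
/// Not a general TOML parser: anything beyond the two keys is an error.
fn parse_offsets(text: &str) -> Result<Offsets, String> {
    let (mut ssl_read, mut ssl_write) = (None, None);
    for line in text.lines() {
        // Strip comments and surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("expected `key = value`, got {line:?}"))?;
        let value = value.trim();
        let parsed = match value.strip_prefix("0x") {
            Some(hex) => u64::from_str_radix(hex, 16),
            None => value.parse::<u64>(),
        }
        .map_err(|e| format!("bad value for {}: {e}", key.trim()))?;
        match key.trim() {
            "ssl_read" => ssl_read = Some(parsed),
            "ssl_write" => ssl_write = Some(parsed),
            other => return Err(format!("unknown key {other:?}")),
        }
    }
    Ok(Offsets {
        ssl_read: ssl_read.ok_or("missing ssl_read")?,
        ssl_write: ssl_write.ok_or("missing ssl_write")?,
    })
}
```

The resulting offsets.ssl_read and offsets.ssl_write are then what gets passed as the offset argument of attach_uprobe(None, ...).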

Finally some data

Having found the offsets of SSL_read and SSL_write, if we load the uprobes we wrote above and then start zoom, we'll get output that looks like this:

$ sudo target/debug/snuffy --hex-dump --offsets zoom-offsets.toml
Write 575 bytes
|504f5354 202f7265 6c656173 656e6f74| POST /releasenot 00000000
|65732048 5454502f 312e310d 0a486f73| es HTTP/1.1..Hos 00000010
|743a2075 73303477 65622e7a 6f6f6d2e| t: us04web.zoom. 00000020
|75730d0a 55736572 2d416765 6e743a20| us..User-Agent:  00000030
|4d6f7a69 6c6c612f 352e3020 285a4f4f| Mozilla/5.0 (ZOO 00000040
|4d2e4c69 6e757820 5562756e 74752031| M.Linux Ubuntu 1 00000050
...

Read 3088 bytes
|48545450 2f312e31 20323030 200d0a44| HTTP/1.1 200 ..D 00000000
|6174653a 20467269 2c203034 20536570| ate: Fri, 04 Sep 00000010
|20323032 30203035 3a30343a 31352047|  2020 05:04:15 G 00000020
|4d540d0a 436f6e74 656e742d 54797065| MT..Content-Type 00000030
|3a206170 706c6963 6174696f 6e2f782d| : application/x- 00000040
|70726f74 6f627566 3b636861 72736574| protobuf;charset 00000050
...

When Zoom starts, it checks for updates with that HTTP POST request. The uprobes get triggered and send the unencrypted data to the snuffy process, where it gets hex-dumped. Success!
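The hex-dump layout (hex columns, ASCII column, offset) is easy to reproduce if you want similar output in your own tool. Here is a minimal, hypothetical hex_dump function that produces rows in the same format; it mimics the output above and is not necessarily how snuffy implements it.

```rust
/// Formats `data` like the dumps above: 16 bytes per row, shown as four
/// groups of 4 bytes in lowercase hex between `|`, then the ASCII rendering
/// (non-printable bytes as `.`) and the row offset in hex.
fn hex_dump(data: &[u8]) -> Vec<String> {
    data.chunks(16)
        .enumerate()
        .map(|(row, chunk)| {
            let mut hex = String::new();
            for (i, byte) in chunk.iter().enumerate() {
                // A space separates each group of four bytes.
                if i > 0 && i % 4 == 0 {
                    hex.push(' ');
                }
                hex.push_str(&format!("{byte:02x}"));
            }
            let ascii: String = chunk
                .iter()
                .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
                .collect();
            // Short final rows are padded so the columns stay aligned
            // (the output above doesn't show how a partial row is rendered).
            format!("|{hex:<35}| {ascii:<16} {:08x}", row * 16)
        })
        .collect()
}
```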

Tracing network connections

It turns out that Zoom uses many TLS connections simultaneously, so the output from snuffy quickly becomes an unreadable mess of interleaved data belonging to different connections.

To improve readability, we're going to try to associate reads and writes with IP addresses by digging into OpenSSL a bit more.

Extracting socket descriptors

OpenSSL performs I/O through its BIO API. Looking at the relevant header files, we can see:

// a pointer of this type gets passed to SSL_read and SSL_write
typedef struct ssl_st SSL;
struct ssl_st {
    int version;
    const SSL_METHOD *method;
    /* used by SSL_read */
    BIO *rbio;
    /* used by SSL_write */
    BIO *wbio;
    ...
};

typedef struct bio_st BIO;
struct bio_st {
    const BIO_METHOD *method;
    BIO_callback_fn callback;
    BIO_callback_fn_ex callback_ex;
    char *cb_arg;
    int init;
    int shutdown;
    int flags;
    int retry_reason;
    int num; // <- This is the socket descriptor
    ...
};

Given an SSL * pointer, which is the first argument passed to SSL_read and SSL_write, we can retrieve the associated BIO values. Inside those BIO values, the num field holds the underlying socket descriptor. Here's some hacky BPF code to get the descriptor given an SSL *:

// The offsets below assume a 64-bit target such as x86_64,
// where pointers are 8 bytes and int is 4 bytes.

// this is equivalent to ssl->rbio
// (int version + 4 bytes of padding + SSL_METHOD *method = 16 bytes)
fn ssl_rbio(ssl: usize) -> Result<*const c_void, i32> {
    unsafe { bpf_probe_read((ssl + 16) as *const *const c_void) }
}

// this is equivalent to ssl->wbio
fn ssl_wbio(ssl: usize) -> Result<*const c_void, i32> {
    unsafe { bpf_probe_read((ssl + 24) as *const *const c_void) }
}

// this is equivalent to bio->num, which happens to be the socket descriptor
// (4 pointer-sized fields + 4 ints = 48 bytes)
fn bio_fd(bio: *const c_void) -> Result<i32, i32> {
    unsafe { bpf_probe_read((bio as usize + 48) as *const i32) }
}

Note: For brevity, I computed the offsets of rbio, wbio and num manually by looking at the headers, but I could have used cargo bpf bindgen to generate Rust bindings for struct SSL.
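The hand-computed offsets (16, 24 and 48) only hold on a 64-bit target such as x86_64. One way to double-check them without bindgen is to mirror the leading fields in #[repr(C)] structs and let the compiler compute the offsets. This is a sketch assuming the OpenSSL 1.1.1 layouts shown above; SslSt and BioSt are hypothetical names, and the result is only as good as the assumption that the real headers match. It needs Rust 1.77 or later for std::mem::offset_of!.

```rust
use std::ffi::{c_char, c_int, c_void};
use std::mem::offset_of;

// Partial #[repr(C)] mirrors of struct ssl_st and struct bio_st. Only the
// leading fields are needed: a field's offset doesn't depend on what comes
// after it. Function pointers and the SSL_METHOD/BIO_METHOD pointers are
// all pointer-sized, so *const c_void stands in for them.
#[repr(C)]
struct SslSt {
    version: c_int,
    method: *const c_void,
    rbio: *const c_void,
    wbio: *const c_void,
}

#[repr(C)]
struct BioSt {
    method: *const c_void,
    callback: *const c_void,
    callback_ex: *const c_void,
    cb_arg: *mut c_char,
    init: c_int,
    shutdown: c_int,
    flags: c_int,
    retry_reason: c_int,
    num: c_int,
}

/// Offsets in bytes of ssl->rbio, ssl->wbio and bio->num on this target.
fn field_offsets() -> (usize, usize, usize) {
    (offset_of!(SslSt, rbio), offset_of!(SslSt, wbio), offset_of!(BioSt, num))
}
```

On a 64-bit target this yields (16, 24, 48), matching the constants used in the probes.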

Let's update the uprobes to send the file descriptors along with the data. Here are the relevant changes:

#[uprobe]
fn SSL_write_entry(regs: Registers) {
    ...

    // send the fd along with the buffer
    let fd = ssl_wbio(ssl).and_then(bio_fd).ok();
    output_buf(regs, fd, AccessMode::Write, buf, num as usize);
}

#[uretprobe]
fn SSL_read_ret(regs: Registers) {
    ...
        // send the fd along with the buffer
        let fd = ssl_rbio(*ssl).and_then(bio_fd).ok();
        output_buf(regs, fd, AccessMode::Read, *buf, num as usize);
    ...
}

Pretty much the same as before, except that now, with every intercepted buffer, we also send its corresponding socket descriptor.

Mapping socket descriptors to addresses

Now we have socket descriptors, but how do we get IP addresses from them? Let's take a look at the signature of connect():

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The connect function is used to establish a connection from the given socket descriptor sockfd to the network address addr. Let's write a new uprobe that sends all the (sockfd, addr) pairs to user-space:

use redbpf_probes::uprobe::prelude::*;

#[derive(Clone)]
pub struct Connection {
    pub fd: u64,
    pub addr: u32,
    pub port: u32,
}

// user-space will receive all the values we insert in this BPF map
#[map("connection")]
static mut connection_events: PerfMap<Connection> = PerfMap::with_max_entries(1024);

#[uprobe]
fn connect(regs: Registers) {
    let _ = do_connect(regs);
}

fn do_connect(regs: Registers) -> Option<()> {
    let fd = regs.parm1() as i32;
    let addr = regs.parm2() as *const sockaddr;

    // only record ipv4 connections
    if unsafe { &*addr }.sa_family()? as u32 != AF_INET {
        return None;
    }

    // and only for the zoom command
    if !comm_is_zoom() {
        return None;
    }

    let addr = unsafe { &*(addr as *const sockaddr_in) };
    let conn = Connection {
        fd: fd as u64,
        addr: addr.sin_addr()?.s_addr()?,
        port: u16::from_be(addr.sin_port()?) as u32,
    };

    unsafe {
        // send the value to user-space
        connection_events.insert(regs.ctx, &conn);
    }

    Some(())
}

fn comm_is_zoom() -> bool {
    let comm = bpf_get_current_comm();
    let cmd = unsafe { core::slice::from_raw_parts(comm.as_ptr() as *const u8, comm.len()) };
    // matches any command name starting with "zoom"
    cmd.starts_with(b"zoom")
}

When Zoom initiates a connection, do_connect gets called, creates a Connection value holding the socket descriptor and address of the connection, and sends it to the snuffy user-space process. There we store all the Connection values in a hash map keyed by socket descriptor. Then, whenever we receive the data and socket descriptor of an intercepted SSL_read or SSL_write, we can look up the IP address by indexing the connections hash map with the descriptor.
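On the user-space side, the lookup can be as simple as a HashMap from descriptor to address. A minimal sketch, assuming the Connection layout from the probe above; Connections, on_connect and peer are hypothetical names rather than snuffy's actual types. One subtlety is byte order: s_addr holds the address in network byte order, so its in-memory bytes are already the dotted quad in order.

```rust
use std::collections::HashMap;
use std::net::{Ipv4Addr, SocketAddrV4};

/// Mirrors the Connection struct sent by the connect() uprobe.
#[derive(Clone, Copy)]
struct Connection {
    fd: u64,
    addr: u32, // sin_addr.s_addr: network-order bytes read as a native u32
    port: u32, // already converted to host order by the probe
}

/// Hypothetical user-space table mapping socket descriptors to peers.
#[derive(Default)]
struct Connections {
    by_fd: HashMap<u64, SocketAddrV4>,
}

impl Connections {
    /// Records (or replaces) the peer for a descriptor.
    fn on_connect(&mut self, conn: Connection) {
        // Reinterpret the native u32 as its bytes instead of byte-swapping,
        // which is correct on both little- and big-endian hosts.
        let ip = Ipv4Addr::from(conn.addr.to_ne_bytes());
        self.by_fd.insert(conn.fd, SocketAddrV4::new(ip, conn.port as u16));
    }

    /// Looks up the peer for the fd sent along with an SSL_read/SSL_write buffer.
    fn peer(&self, fd: Option<i32>) -> Option<SocketAddrV4> {
        let fd = u64::try_from(fd?).ok()?;
        self.by_fd.get(&fd).copied()
    }
}
```

Because descriptors get reused after close(), overwriting on every connect() keeps the table current. Keying by descriptor alone works here because only one process is traced; tracing several would call for a (pid, fd) key.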

Since connect() is linked in dynamically via libpthread (part of glibc), to attach we can simply call:

uprobe.attach_uprobe(Some("connect"), 0, "libpthread", pid)?;

With connection tracing in place the output now looks like this:

$ sudo target/debug/snuffy --hex-dump --trace-connections --offsets zoom-offsets.toml
Write 577 bytes to 3.235.82.213:443
|504f5354 202f7265 6c656173 656e6f74| POST /releasenot 00000000
|65732048 5454502f 312e310d 0a486f73| es HTTP/1.1..Hos 00000010
|743a2075 73303477 65622e7a 6f6f6d2e| t: us04web.zoom. 00000020
|75730d0a 55736572 2d416765 6e743a20| us..User-Agent:  00000030
|4d6f7a69 6c6c612f 352e3020 285a4f4f| Mozilla/5.0 (ZOO 00000040
|4d2e4c69 6e757820 5562756e 74752031| M.Linux Ubuntu 1 00000050
...

Read 3088 bytes from 3.235.82.213:443
|48545450 2f312e31 20323030 200d0a44| HTTP/1.1 200 ..D 00000000
|6174653a 20467269 2c203034 20536570| ate: Fri, 04 Sep 00000010
|20323032 30203035 3a30383a 31322047|  2020 05:08:12 G 00000020
|4d540d0a 436f6e74 656e742d 54797065| MT..Content-Type 00000030
|3a206170 706c6963 6174696f 6e2f782d| : application/x- 00000040
|70726f74 6f627566 3b636861 72736574| protobuf;charset 00000050
...

As a final touch, to make the output even easier to read, we can find out which domain names those IPs correspond to by instrumenting getaddrinfo(), the function Zoom uses to resolve domain names to addresses:

int getaddrinfo(const char *node, const char *service,
                const struct addrinfo *hints, struct addrinfo **res);

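getaddrinfo() resolves the node domain name and populates the res out argument with the corresponding IP addresses. So we're going to create a new #[uretprobe] that, once getaddrinfo() returns, builds a hash map from each IP in res to the domain node. As with SSL_read, the arguments have to be saved by an entry probe, since the registers that held them are likely overwritten by the time the function returns. Since conceptually this is very similar to what we just did for connect(), I'm not going to show the code again. You can find it in the snuffy repository.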
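Once user-space has an IP to domain map, turning peers into readable names is a simple lookup. A sketch with hypothetical helpers: record_resolution fills the map from a getaddrinfo() result, and describe_peer formats a peer the way the final output below does, falling back to the bare address when nothing was resolved.

```rust
use std::collections::HashMap;
use std::net::{Ipv4Addr, SocketAddrV4};

/// Records every address returned for `node`. If several names resolve to
/// the same IP (common with CDNs), the most recent one wins.
fn record_resolution(resolved: &mut HashMap<Ipv4Addr, String>, node: &str, ips: &[Ipv4Addr]) {
    for ip in ips {
        resolved.insert(*ip, node.to_string());
    }
}

/// Formats a peer as `name:port (ip:port)` when its IP was resolved from a
/// domain name, or as plain `ip:port` otherwise.
fn describe_peer(peer: SocketAddrV4, resolved: &HashMap<Ipv4Addr, String>) -> String {
    match resolved.get(peer.ip()) {
        Some(name) => format!("{name}:{} ({peer})", peer.port()),
        None => peer.to_string(),
    }
}
```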

Putting it all together

We've written uprobes for SSL_read, SSL_write, connect and getaddrinfo. With them we can see which domain names the Zoom client resolves, which addresses it connects to, and the plaintext of the data it sends and receives over TLS.

The final output looks like this:

$ sudo target/debug/snuffy --hex-dump --trace-connections --command /opt/zoom/zoom --offsets zoom-offsets.toml
[4:56:18] Connected to 127.0.0.53:53
[4:56:18] Resolved us04web.zoom.us to 3.235.69.6
[4:56:18] Connected to us04web.zoom.us:443 (3.235.69.6:443)
[4:56:19] Write 571 bytes to us04web.zoom.us:443 (3.235.69.6:443)
[4:56:19] |504f5354 202f7265 6c656173 656e6f74| POST /releasenot 00000000
[4:56:19] |65732048 5454502f 312e310d 0a486f73| es HTTP/1.1..Hos 00000010
[4:56:19] |743a2075 73303477 65622e7a 6f6f6d2e| t: us04web.zoom. 00000020
[4:56:19] |75730d0a 55736572 2d416765 6e743a20| us..User-Agent:  00000030
[4:56:19] |4d6f7a69 6c6c612f 352e3020 285a4f4f| Mozilla/5.0 (ZOO 00000040
[4:56:19] |4d2e4c69 6e757820 5562756e 74752031| M.Linux Ubuntu 1 00000050
...

[4:56:19] Read 3088 bytes from us04web.zoom.us:443 (3.235.69.6:443)
[4:56:19] |48545450 2f312e31 20323030 200d0a44| HTTP/1.1 200 ..D 00000000
[4:56:19] |6174653a 20467269 2c203034 20536570| ate: Fri, 04 Sep 00000010
[4:56:19] |20323032 30203035 3a31313a 30352047|  2020 05:11:05 G 00000020
[4:56:19] |4d540d0a 436f6e74 656e742d 54797065| MT..Content-Type 00000030
[4:56:19] |3a206170 706c6963 6174696f 6e2f782d| : application/x- 00000040
[4:56:19] |70726f74 6f627566 3b636861 72736574| protobuf;charset 00000050
...

There's a lot of interesting stuff that Zoom does over the network (like XMPP 🤓), but analyzing that is left as an exercise to the reader.

The final version of the uprobes is at https://github.com/alessandrod/snuffy/blob/master/snuffy-probes/src/snuffy/ and the user-space code that loads them is at https://github.com/alessandrod/snuffy/blob/master/src/. The code is slightly different from what I inlined above because, after I first wrote the post, I made snuffy generic so that it can be used to instrument any program that uses OpenSSL. See the README for more info.

I had a lot of fun writing this! I'm going to keep working on snuffy and add support for more libraries in addition to OpenSSL. I've tested it with a few programs; if you find one it doesn't work with, please let me know. I'm not working full-time on redbpf anymore, but I'm still contributing to the project, so if you find bugs while writing your own uprobes, please open an issue on GitHub and I'll take a look. And finally, if your team is looking for Rust developers, please do get in touch!