Create a reproducer for https://sourceware.org/bugzilla/show_bug.cgi?id=14485
This is not supposed to be merged.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
robust_bug.c | 178 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 178 insertions(+)
create mode 100644 robust_bug.c
diff --git a/robust_bug.c b/robust_bug.c
new file mode 100644
index 000000000000..1ade4e6d66dd
--- /dev/null
+++ b/robust_bug.c
@@ -0,0 +1,178 @@
+/*
+ * gcc robust_bug.c -o robust_bug -lpthread
+ *
+ * This is a reproducer for "File corruption race condition in robust
+ * mutex unlocking" from https://sourceware.org/bugzilla/show_bug.cgi?id=14485
+ *
+ * To increase the chances of reaching the race condition, a delay can be
+ * added to the kernel function handle_futex_death(), just before the write
+ * to user memory done by futex_cmpxchg_value_locked().
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <linux/futex.h>
+#include <pthread.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <time.h>
+
+#define cpu_relax() asm volatile("rep; nop" ::: "memory")
+
+/*
+ * This struct is an example of a lock struct, shared between the threads.
+ */
+struct lock_struct {
+ uint32_t futex;
+ struct robust_list list;
+};
+
+static struct lock_struct *lock;
+
+/*
+ * This is the struct that we are going to use to allocate on top of the
+ * freed memory to observe the race condition.
+ */
+struct another_struct {
+ uint64_t value;
+};
+
+static pthread_barrier_t barrier;
+
+static int set_robust_list(struct robust_list_head *head)
+{
+ return syscall(SYS_set_robust_list, head, sizeof(*head));
+}
+
+/*
+ * This thread emulates the behaviour of a thread releasing a robust mutex:
+ * - It starts by adding the mutex to the op_pending field
+ * - Removes the mutex from the robust list
+ * - Releases the lock and wakes up waiters
+ * - Removes the mutex from the op_pending field
+ *
+ * However, this thread dies before doing this last step, leaving the mutex
+ * behind in the op_pending field.
+ */
+void *func_b(void *arg)
+{
+ static struct robust_list_head head;
+ uint32_t tid = gettid() | FUTEX_WAITERS;
+
+ /*
+ * Initial thread setup. This would happen in an earlier stage of the
+ * thread execution.
+ */
+ set_robust_list(&head);
+ head.list.next = &head.list;
+ head.futex_offset = (size_t) offsetof(struct lock_struct, futex) -
+ (size_t) offsetof(struct lock_struct, list);
+
+ /* This thread takes the lock... */
+ lock->futex = tid;
+
+ /* ...would do some work here... */
+
+ /*
+ * ...and starts the release process. Adds the mutex to be released on
+ * the op_pending.
+ */
+ head.list_op_pending = &lock->list;
+
+ /* Signal the main thread that the lock is taken... */
+ pthread_barrier_wait(&barrier);
+ usleep(100); /* ...and give it time to free and reuse the lock memory */
+
+ /*
+ * Here we would release the lock and wake up any waiters.
+ *
+ * lock->futex = LOCK_FREE;
+ * futex_wake(lock->futex, 1);
+ */
+
+ /*
+ * We would remove the lock from op_pending, but we emulate a thread
+ * exiting before doing it.
+ */
+ return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+ struct another_struct *new;
+ uint64_t original_val;
+ pthread_t thread_b;
+ uint32_t value;
+ int ret;
+
+ ret = pthread_barrier_init(&barrier, NULL, 2);
+ if (ret) {
+ puts("pthread_barrier_init failed");
+ return -1;
+ }
+
+ /* Initialize the lock */
+ lock = mmap(NULL, sizeof(struct lock_struct), PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ if (lock == MAP_FAILED) {
+ puts("mmap failed");
+ return -1;
+ }
+ memset(lock, 0, sizeof(*lock));
+
+ /* Create the thread B that will take the lock */
+ pthread_create(&thread_b, NULL, func_b, NULL);
+
+ /* Barrier to synchronize thread B taking the lock */
+ pthread_barrier_wait(&barrier);
+
+ /* Copy this value as we will use it later */
+ value = lock->futex;
+
+ /*
+ * Here, this thread would do the following:
+ * - It would wait for the lock, and be woken by thread B
+ * - Take the lock, do some work, and release it
+ * - After releasing the lock and being the last user, it can correctly
+ * free it
+ */
+ munmap(lock, sizeof(struct lock_struct));
+
+ /*
+ * After freeing the lock, this thread allocates memory, which
+ * happens to be at the same address as the lock and, by chance, it fills
+ * the memory with the TID of thread B.
+ */
+ new = mmap(NULL, sizeof(struct another_struct), PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ if (new == MAP_FAILED) {
+ puts("mmap failed");
+ return -1;
+ }
+ if ((uintptr_t) lock != (uintptr_t) new) {
+ puts("mmap got a different address");
+ return -1;
+ }
+
+ new->value = ((uint64_t) value << 32) + value;
+
+ /* Create a backup of the current value */
+ original_val = new->value;
+
+ /* Wait for the memory corruption to happen... */
+ while (new->value == original_val)
+ cpu_relax();
+
+ /* ...and now the kernel just overwrote an unrelated user memory! */
+ printf("Memory was corrupted by the kernel: %lx vs %lx\n",
+ original_val, new->value);
+
+ munmap(new, sizeof(struct another_struct));
+
+ return 0;
+}
--
2.53.0
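
To make the race easier to hit, the header comment in robust_bug.c suggests delaying the kernel in handle_futex_death() just before it writes to user memory. A minimal sketch of such a debug-only change, assuming a kernel tree where that function lives in kernel/futex/core.c and performs the write via futex_cmpxchg_value_locked() (the exact context differs between kernel versions, and <linux/delay.h> may need to be included):

	/* in handle_futex_death(), just before the cmpxchg on the user word */
	mdelay(100);	/* debug only: widen the race window */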
On 2026-02-20 17:26:19 [-0300], André Almeida wrote:
> --- /dev/null
> +++ b/robust_bug.c
…
> + new->value = ((uint64_t) value << 32) + value;
> +
> + /* Create a backup of the current value */
> + original_val = new->value;

Now I think I have finally got it and might have understood the issue.

You exit before unlocking the futex. You free this block, and the new memory (address) is the same as the old one. Your corruption comes from the fact that the old content is the same as the new content.

If the thread does unlock in userland (or the kernel) but the lock remains on the robust_list while it gets killed, then the kernel will attempt to unlock the lock. But this requires that the futex value matches the exiting thread's TID. So if it is unlocked (0x0) or used again, nothing happens. Unless the new memory by accident gets assigned the same value as the TID. Then it gets changed…

If the unlock did not happen and the lock is still owned by the thread that is killed, then the "fixup" here is the right thing to do. The memory should not be free()d, because the lock was still owned by the thread. The misunderstanding here might be "once the thread is gone, the lock is free, we can throw away the memory". At the very least, it was a locked mutex, and I think pthread_mutex_destroy() would complain here.

So is the issue here that the "new" value is the same as the "old" value and the robust-death-handling part in the kernel does its job? Or did I oversimplify something?
Let me continue with the thread…

Sebastian
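
For reference, here is a user-space sketch of the per-entry fixup described above — not the kernel's actual code, just an approximation of what handle_futex_death() does for each robust-list entry of an exiting thread (the real implementation lives under kernel/futex/ and uses futex_cmpxchg_value_locked(), followed by a wake-up if waiters were flagged). The helper name robust_death_fixup() is made up; the FUTEX_* constants come from <linux/futex.h>:

#include <stdbool.h>
#include <stdint.h>
#include <linux/futex.h>	/* FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED */

/* Hypothetical helper: roughly what the kernel does per robust-list entry. */
static bool robust_death_fixup(uint32_t *uaddr, uint32_t dead_tid)
{
	uint32_t uval = *uaddr;

	/* Only act if the word still looks owned by the exiting thread. */
	if ((uval & FUTEX_TID_MASK) != dead_tid)
		return false;

	/* Drop the TID, keep the waiters bit, mark the owner as dead. */
	uint32_t mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;

	/*
	 * cmpxchg on the (possibly reused) user word; on success the kernel
	 * would also wake one waiter if FUTEX_WAITERS was set.
	 */
	return __atomic_compare_exchange_n(uaddr, &uval, mval, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

This is why the reproducer triggers: after the mmap() reuse, the low 32 bits of the unrelated allocation still contain tid | FUTEX_WAITERS, so the TID check passes and the kernel rewrites memory that no longer belongs to the lock.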
On 12/03/2026 06:04, Sebastian Andrzej Siewior wrote:
> […]
> So is the issue here that the "new" value is the same as the "old" value
> and the robust-death-handling part in the kernel does its job? Or did I
> oversimplify something?

Yes, this is exactly what I understood as well. User thread A releases the lock, but exits before setting op_pending = NULL. Thread B can free the lock after using it and, by chance, ends up storing the same value as the TID at the same memory location. Then the kernel performs the robust list handling for the exiting thread A, and the corruption happens.
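
To put numbers on it (the TID below is made up purely for illustration): with tid = 0x4d2, thread B writes 0x800004d2 (tid | FUTEX_WAITERS) into the futex word, so after the mmap() reuse new->value holds 0x800004d2800004d2. When the exit-time fixup runs, the low 32 bits are rewritten to (0x800004d2 & FUTEX_WAITERS) | FUTEX_OWNER_DIED = 0xc0000000, and the reproducer's final printf reports 800004d2800004d2 vs 800004d2c0000000.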