From nobody Mon Jun 8 08:30:20 2026
Date: Wed, 03 Jun 2026 14:24:50 -0000
From: "tip-bot2 for Thomas Gleixner"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: locking/core] x86/vdso: Implement __vdso_futex_robust_try_unlock()
Cc: Thomas Gleixner, "Peter Zijlstra (Intel)", andrealmeid@igalia.com,
 Uros Bizjak, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260602090535.883796247@kernel.org>
References: <20260602090535.883796247@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Message-ID: <178049669085.710.4797428561644858749.tip-bot2@tip-bot2>
Content-Type: text/plain; charset="utf-8"

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     a2274cc0091ed4fdce10fad68d08c529b8d3e7dd
Gitweb:        https://git.kernel.org/tip/a2274cc0091ed4fdce10fad68d08c529b8d3e7dd
Author:        Thomas Gleixner
AuthorDate:    Tue, 02 Jun 2026 11:10:12 +02:00
Committer:     Peter Zijlstra
CommitterDate: Wed, 03 Jun 2026 11:38:52 +02:00

x86/vdso: Implement __vdso_futex_robust_try_unlock()

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in userspace looks like this:

  1)    robust_list_set_op_pending(mutex);
  2)    robust_list_remove(mutex);

        lval = gettid();
  3)    if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4)            robust_list_clear_op_pending();
        else
  5)            sys_futex(OP,...FUTEX_ROBUST_UNLOCK);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

When the original task then exits before reaching #4, the kernel robust
list handling observes the pending op entry and tries to fix up user space.

If the newly mapped data contains the TID of the exiting thread at the
address of the mutex/futex, the kernel will set the owner died bit in that
memory and thereby corrupt unrelated data.

Provide a VDSO function which exposes the critical section window in the
VDSO symbol table. The resulting addresses are updated in the task's mm
when the VDSO is (re)map()'ed.

The core code detects when a task was interrupted within the critical
section and is about to deliver a signal. It then invokes an architecture
specific function which determines whether the pending op pointer has to be
cleared or not.
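
For illustration only, the detection step described above can be sketched
in plain C as below. The struct layout, the demo_ names and the half-open
[start, end) interpretation of the range are assumptions made for this
sketch; the core futex code is not part of this patch and may look
different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mm description of one unlock critical section. */
struct demo_cs_range {
	uintptr_t start;	/* address of the critical section start label */
	uintptr_t end;		/* address of the critical section end label */
	size_t pop_size;	/* size of the pending op pointer: 4 or 8 */
};

/*
 * Assumed semantics: the task is inside the critical section when the
 * interrupted instruction pointer is at or after start and before end.
 * At end the pointer is either already cleared or the cmpxchg failed.
 */
static bool demo_ip_in_cs(const struct demo_cs_range *r, uintptr_t ip)
{
	return ip >= r->start && ip < r->end;
}
```

Only when such a check succeeds would the architecture specific helper be
consulted to decide whether the pending op pointer has to be cleared.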

The unlock assembly sequence on 64-bit is:

        mov             %esi,%eax       // Load TID into EAX
        xor             %ecx,%ecx       // Set ECX to 0
        lock cmpxchg    %ecx,(%rdi)     // Try the TID -> 0 transition
  .Lstart:
        jnz             .Lend
        movq            %rcx,(%rdx)     // Clear list_op_pending
  .Lend:
        ret

So the decision can be simply based on the ZF state in regs->flags. The
pending op pointer is always in DX independent of the build mode
(32/64-bit) to make the pending op pointer retrieval uniform. The size of
the pointer is stored in the matching critical section range struct and
the core code retrieves it from there. So the pointer retrieval function
does not have to care. It is bit-size independent:

        return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;

There are two entry points to handle the different robust list pending op
pointer sizes:

        __vdso_futex_robust_list64_try_unlock()
        __vdso_futex_robust_list32_try_unlock()

The 32-bit VDSO provides only __vdso_futex_robust_list32_try_unlock().

The 64-bit VDSO always provides __vdso_futex_robust_list64_try_unlock() and,
when COMPAT is enabled, also the list32 variant, which is required to
support multi-size robust list pointers used by gaming emulators.

The unlock function is inspired by an idea from Mathieu Desnoyers.

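
The contract of the two entry points can be modelled in portable C11 as
below. This is a hedged sketch, not the VDSO implementation: the demo_
names are hypothetical, and because plain C cannot expose the critical
section labels it does not close the race described above. It only shows
the return value (the lock value observed by the compare-exchange, like
EAX after cmpxchg) and the difference in pending op pointer size.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of the list64 variant: clears a 64-bit pending op slot on success. */
static uint32_t demo_list64_try_unlock(_Atomic uint32_t *lock, uint32_t tid,
				       uint64_t *pop)
{
	uint32_t observed = tid;

	/* On failure 'observed' is updated to the current lock value. */
	if (atomic_compare_exchange_strong(lock, &observed, 0))
		*pop = 0;
	return observed;
}

/* Model of the list32 variant: same logic with a 32-bit pending op slot. */
static uint32_t demo_list32_try_unlock(_Atomic uint32_t *lock, uint32_t tid,
				       uint32_t *pop)
{
	uint32_t observed = tid;

	if (atomic_compare_exchange_strong(lock, &observed, 0))
		*pop = 0;
	return observed;
}
```

A caller would compare the return value with its own TID: equality means
the TID -> 0 transition happened and the pending op slot was cleared, any
other value means the slow path via sys_futex() is required.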
Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: André Almeida
Acked-by: Uros Bizjak
Link: https://lore.kernel.org/20260311185409.1988269-1-mathieu.desnoyers@efficios.com
Link: https://patch.msgid.link/20260602090535.883796247@kernel.org
---
 arch/x86/Kconfig                         |  1 +-
 arch/x86/entry/vdso/common/vfutex.c      | 71 +++++++++++++++++++++++-
 arch/x86/entry/vdso/vdso32/Makefile      |  5 +-
 arch/x86/entry/vdso/vdso32/vdso32.lds.S  |  3 +-
 arch/x86/entry/vdso/vdso32/vfutex.c      |  1 +-
 arch/x86/entry/vdso/vdso64/Makefile      |  7 +-
 arch/x86/entry/vdso/vdso64/vdso64.lds.S  |  7 ++-
 arch/x86/entry/vdso/vdso64/vdsox32.lds.S |  7 ++-
 arch/x86/entry/vdso/vdso64/vfutex.c      |  1 +-
 arch/x86/include/asm/futex_robust.h      | 19 ++++++-
 10 files changed, 117 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/entry/vdso/common/vfutex.c
 create mode 100644 arch/x86/entry/vdso/vdso32/vfutex.c
 create mode 100644 arch/x86/entry/vdso/vdso64/vfutex.c
 create mode 100644 arch/x86/include/asm/futex_robust.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1ce62a9..fdaef60 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -239,6 +239,7 @@ config X86
         select HAVE_EFFICIENT_UNALIGNED_ACCESS
         select HAVE_EISA                        if X86_32
         select HAVE_EXIT_THREAD
+        select HAVE_FUTEX_ROBUST_UNLOCK
         select HAVE_GENERIC_TIF_BITS
         select HAVE_GUP_FAST
         select HAVE_FENTRY                      if X86_64 || DYNAMIC_FTRACE
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
new file mode 100644
index 0000000..454f059
--- /dev/null
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include
+
+/*
+ * Assembly template for the try unlock functions. The basic functionality is:
+ *
+ *              mov             %esi, %eax      Move the TID into EAX
+ *              xor             %ecx, %ecx      Clear ECX
+ *              lock cmpxchgl   %ecx, (%rdi)    Attempt the TID -> 0 transition
+ * .Lcs_start:                                  Start of the critical section
+ *              jnz             .Lcs_end        If cmpxchg failed jump to the end
+ * .Lcs_success:                                Start of the success section
+ *              movq            %rcx, (%rdx)    Set the pending op pointer to 0
+ * .Lcs_end:                                    End of the critical section
+ *
+ * .Lcs_start and .Lcs_end establish the critical section range. .Lcs_success is
+ * technically not required, but there for illustration, debugging and testing.
+ *
+ * When CONFIG_COMPAT is enabled then the 64-bit VDSO provides two functions.
+ * One for the regular 64-bit sized pending operation pointer and one for a
+ * 32-bit sized pointer to support gaming emulators.
+ *
+ * The 32-bit VDSO provides only the one for 32-bit sized pointers.
+ */
+#define __stringify_1(x...)             #x
+#define __stringify(x...)               __stringify_1(x)
+
+#define LABEL(prefix, which)            __stringify(prefix##_try_unlock_cs_##which:)
+
+#define JNZ_END(prefix)                 "jnz " __stringify(prefix) "_try_unlock_cs_end\n"
+
+#define CLEAR_POPQ                      "movq   %[zero],  %a[pop]\n"
+#define CLEAR_POPL                      "movl   %k[zero], %a[pop]\n"
+
+#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop) \
+({                                                                      \
+        asm volatile (                                                  \
+                "                                               \n"     \
+                "       lock cmpxchgl   %k[zero], %a[lock]      \n"     \
+                "                                               \n"     \
+                LABEL(prefix, start)                                    \
+                "                                               \n"     \
+                JNZ_END(prefix)                                         \
+                "                                               \n"     \
+                LABEL(prefix, success)                                  \
+                "                                               \n"     \
+                clear_pop                                               \
+                "                                               \n"     \
+                LABEL(prefix, end)                                      \
+                : [tid]   "+&a" (__tid)                                 \
+                : [lock]  "D" (__lock),                                 \
+                  [pop]   "d" (__pop),                                  \
+                  [zero]  "r" (0UL)                                     \
+                : "memory"                                              \
+        );                                                              \
+        __tid;                                                          \
+})
+
+#ifdef __x86_64__
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+{
+        return futex_robust_try_unlock(__futex_list64, CLEAR_POPQ, lock, tid, pop);
+}
+#endif /* __x86_64__ */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+        return futex_robust_try_unlock(__futex_list32, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
diff --git a/arch/x86/entry/vdso/vdso32/Makefile b/arch/x86/entry/vdso/vdso32/Makefile
index ded4fc6..ab4b1f6 100644
--- a/arch/x86/entry/vdso/vdso32/Makefile
+++ b/arch/x86/entry/vdso/vdso32/Makefile
@@ -7,8 +7,9 @@
 vdsos-y                 := 32
 
 # Files to link into the vDSO:
-vobjs-y                 := note.o vclock_gettime.o vgetcpu.o
-vobjs-y                 += system_call.o sigreturn.o
+vobjs-y                                 := note.o vclock_gettime.o vgetcpu.o
+vobjs-y                                 += system_call.o sigreturn.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)     += vfutex.o
 
 # Compilation flags
 flags-y                 := -DBUILD_VDSO32 -m32 -mregparm=0
diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
index 55554f8..cee8f7f 100644
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -30,6 +30,9 @@ VERSION
                 __vdso_clock_gettime64;
                 __vdso_clock_getres_time64;
                 __vdso_getcpu;
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+                __vdso_futex_robust_list32_try_unlock;
+#endif
         };
 
         LINUX_2.5 {
diff --git a/arch/x86/entry/vdso/vdso32/vfutex.c b/arch/x86/entry/vdso/vdso32/vfutex.c
new file mode 100644
index 0000000..940a6ee
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso32/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
diff --git a/arch/x86/entry/vdso/vdso64/Makefile b/arch/x86/entry/vdso/vdso64/Makefile
index bfffaf1..7c07900 100644
--- a/arch/x86/entry/vdso/vdso64/Makefile
+++ b/arch/x86/entry/vdso/vdso64/Makefile
@@ -8,9 +8,10 @@ vdsos-y                 := 64
 vdsos-$(CONFIG_X86_X32_ABI)     += x32
 
 # Files to link into the vDSO:
-vobjs-y                         := note.o vclock_gettime.o vgetcpu.o
-vobjs-y                         += vgetrandom.o vgetrandom-chacha.o
-vobjs-$(CONFIG_X86_SGX)         += vsgx.o
+vobjs-y                                 := note.o vclock_gettime.o vgetcpu.o
+vobjs-y                                 += vgetrandom.o vgetrandom-chacha.o
+vobjs-$(CONFIG_X86_SGX)                 += vsgx.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)     += vfutex.o
 
 # Compilation flags
 flags-y                         := -DBUILD_VDSO64 -m64 -mcmodel=small
diff --git a/arch/x86/entry/vdso/vdso64/vdso64.lds.S b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
index 5ce3f2b..4a72122 100644
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -32,6 +32,13 @@ VERSION {
 #endif
                 getrandom;
                 __vdso_getrandom;
+
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+                __vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
+                __vdso_futex_robust_list32_try_unlock;
+#endif
+#endif
         local: *;
         };
 }
diff --git a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
index 3dbd20c..b917dc6 100644
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -22,6 +22,13 @@ VERSION {
                 __vdso_getcpu;
                 __vdso_time;
                 __vdso_clock_getres;
+
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+                __vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
+                __vdso_futex_robust_list32_try_unlock;
+#endif
+#endif
         local: *;
         };
 }
diff --git a/arch/x86/entry/vdso/vdso64/vfutex.c b/arch/x86/entry/vdso/vdso64/vfutex.c
new file mode 100644
index 0000000..940a6ee
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso64/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
diff --git a/arch/x86/include/asm/futex_robust.h b/arch/x86/include/asm/futex_robust.h
new file mode 100644
index 0000000..e879547
--- /dev/null
+++ b/arch/x86/include/asm/futex_robust.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_FUTEX_ROBUST_H
+#define _ASM_X86_FUTEX_ROBUST_H
+
+#include
+
+static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
+{
+        /*
+         * If ZF is set then the cmpxchg succeeded and the pending op pointer
+         * needs to be cleared.
+         */
+        return regs->flags & X86_EFLAGS_ZF ? (void __user *)regs->dx : NULL;
+}
+
+#define arch_futex_robust_unlock_get_pop(regs)  \
+        x86_futex_robust_unlock_get_pop(regs)
+
+#endif /* _ASM_X86_FUTEX_ROBUST_H */
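
As a rough userspace model of how x86_futex_robust_unlock_get_pop() could
be combined with the pointer size kept in the critical section range, the
sketch below may help. struct demo_regs, demo_get_pop() and demo_fixup()
are hypothetical stand-ins; in particular the clearing step is an
assumption about the core code, which is not part of this patch. The only
hard fact used is that ZF is bit 6 (0x40) of the x86 flags register.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_EFLAGS_ZF	0x40UL	/* ZF is bit 6 of the x86 flags register */

/* Hypothetical stand-in for the two pt_regs fields the helper looks at. */
struct demo_regs {
	unsigned long flags;
	uintptr_t dx;
};

/* Mirrors the helper: hand out the pending op pointer only when ZF is set. */
static void *demo_get_pop(const struct demo_regs *regs)
{
	return (regs->flags & DEMO_EFLAGS_ZF) ? (void *)regs->dx : NULL;
}

/*
 * Assumed core-side step: clear pop_size bytes (4 or 8) at the pending op
 * pointer. Returns 1 when the slot was cleared, 0 when nothing was done.
 */
static int demo_fixup(const struct demo_regs *regs, size_t pop_size)
{
	void *pop = demo_get_pop(regs);

	if (!pop)
		return 0;
	memset(pop, 0, pop_size);
	return 1;
}
```

Keeping the size outside of the retrieval helper is what allows the same
helper to serve both the list64 and the list32 entry points.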