From nobody Thu Apr 2 20:22:18 2026 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 868972C08AB for ; Thu, 26 Mar 2026 14:32:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774535555; cv=none; b=I1Jkdc9Uw7OcrTJzhnfb3N6K7PBPxXYRtqOmr6ODjUvMmOMl+3FJ6MCe7STLfPPuV6lAcnObuVCAmc0r1BxEtaf5+Y3qCf8wC/5vSa4+mLrMuMTFGAjKL5tQy1lXuu9+ys7wUFuQBXhbQ+9oGCg3Szc0GB7PpW+ergNco0IbEDg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774535555; c=relaxed/simple; bh=KGggsxKWWJDvxaZ4e4xHAmrO3udLwXZkKDv0ovsn68Q=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=humhrlZDDhy8q1lD3q2IQu/rBojvroZUZyXpjWeXcGXmf1u2IfqH0/uzmAjCLTERZTa55bkVOmmQ5uKUQltQMIiG/0Z9kx6je1T35V+euR0bcKloecbWtKCVeC2Dg+JDqk3Awn8pdZItNvC3WI8icQnYwFL5EDkc5IT1SxgIndY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=mzhWLpHN; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="mzhWLpHN" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Cc:To:In-Reply-To:References:Message-Id: Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date:From:Sender: Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender 
:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=qmFOeZD7OODiOJY74657oUnTl2CWU008CdCSDu/nIPo=; b=mzhWLpHNzOFu92ZSI4R3D7v+O9 IpTr9UPvxvekkXfTmyrpQRCMMt4xp5H0BlH0ZDr4Q0dgMqGCUAO8ZUy9xGUIU1TqjHJD0zTVQxM7k l1UyI61hhx9Tf4GsSPIFwyAQ3CIMlKlWIV3zMhAVse8xiD+Oi9ROPJRhZer4PiPz7tnm/uBMsOj/0 bXA2kkNvc9cxsEf94cmG23qna9BJqoFpZVunJlnMAO+T+/OlQfrv9BJHSt+frHC2zb58EHLwS9Fqm niu134kLXkQPq+KuuZvo1ZylEe0OVrCf/rkAT7KG0Vjn5xRv2eC78zTa+FjRdgj5SfW0csplZhhw2 OgaE9hZg==; Received: from [179.118.189.200] (helo=[192.168.15.100]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1w5lkt-006MiO-Mw; Thu, 26 Mar 2026 15:32:15 +0100 From: =?utf-8?q?Andr=C3=A9_Almeida?= Date: Thu, 26 Mar 2026 11:31:50 -0300 Subject: [PATCH 1/2] Documentation: futex: Add a note about robust list race condition Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260326-tonyk-vdso_test-v1-1-30a6f78c8bc3@igalia.com> References: <20260326-tonyk-vdso_test-v1-0-30a6f78c8bc3@igalia.com> In-Reply-To: <20260326-tonyk-vdso_test-v1-0-30a6f78c8bc3@igalia.com> To: Thomas Gleixner , Ingo Molnar , Peter Zijlstra , Darren Hart , Davidlohr Bueso , Mathieu Desnoyers , Sebastian Andrzej Siewior , Carlos O'Donell , Florian Weimer , Darren Hart , Arnd Bergmann , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= Cc: linux-kernel@vger.kernel.org, kernel-dev@igalia.com, =?utf-8?q?Andr=C3=A9_Almeida?= X-Mailer: b4 0.15.0 Add a note to the documentation giving a brief explanation why doing a robust futex release in userspace is racy, what should be done to avoid it and provide links to read more. 
Signed-off-by: André Almeida
---
 Documentation/locking/robust-futex-ABI.rst | 44 ++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/locking/robust-futex-ABI.rst b/Documentation/locking/robust-futex-ABI.rst
index f24904f1c16f..1808b108a58e 100644
--- a/Documentation/locking/robust-futex-ABI.rst
+++ b/Documentation/locking/robust-futex-ABI.rst
@@ -153,6 +153,9 @@ On removal:
  3) release the futex lock, and
  4) clear the 'lock_op_pending' word.
 
+Please note that the removal of a robust futex purely in userspace is
+racy. Refer to the next chapter to learn more and how to avoid this.
+
 On exit, the kernel will consider the address stored in
 'list_op_pending' and the address of each 'lock word' found by walking
 the list starting at 'head'. For each such address, if the bottom 30
@@ -182,3 +185,44 @@ any point:
 When the kernel sees a list entry whose 'lock word' doesn't have the
 current threads TID in the lower 30 bits, it does nothing with that
 entry, and goes on to the next entry.
+
+Robust release is racy
+----------------------
+
+The removal of a robust futex from the list is racy when done solely in
+userspace. Quoting Thomas Gleixner for the explanation:
+
+  The robust futex unlock mechanism is racy in respect to the clearing of the
+  robust_list_head::list_op_pending pointer because unlock and clearing the
+  pointer are not atomic. The race window is between the unlock and clearing
+  the pending op pointer. If the task is forced to exit in this window, exit
+  will access a potentially invalid pending op pointer when cleaning up the
+  robust list. That happens if another task manages to unmap the object
+  containing the lock before the cleanup, which results in an UAF. In the
+  worst case this UAF can lead to memory corruption when unrelated content
+  has been mapped to the same address by the time the access happens.
+
+A full in-depth analysis can be read at
+https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+
+To overcome that, the kernel needs to participate in the lock release operation.
+This ensures that the release happens "atomically" with regard to releasing
+the lock and removing the address from ``list_op_pending``. If a signal
+interrupts the release, the kernel also checks whether the interruption
+happened in the middle of the release operation.
+
+For the contended unlock case, where other threads are waiting for the lock
+release, there's the ``FUTEX_ROBUST_UNLOCK`` operation for the ``futex()``
+system call, which must be used with one of the following operations:
+``FUTEX_WAKE``, ``FUTEX_WAKE_BITSET`` or ``FUTEX_UNLOCK_PI``. The kernel will
+release the lock (set the futex word to zero) and clear the ``list_op_pending``
+field. Then, it will proceed with the normal wake path.
+
+For the non-contended path, there's still a race between checking the futex word
+and clearing the ``list_op_pending`` field. To solve this without the need for a
+complete system call, userspace should call the virtual syscall
+``__vdso_futex_robust_listXX_try_unlock()`` (where XX is either 32 or 64,
+depending on the size of the pointer). If the vDSO call succeeds, it means that
+it released the lock and cleared ``list_op_pending``. If it fails, that means
+that there are waiters for this lock and a call to the ``futex()`` syscall with
+``FUTEX_ROBUST_UNLOCK`` is needed.
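The release sequence described in the new documentation section can be sketched in C roughly as follows. This is an illustration only, not code from the patch: `robust_unlock()`, `try_unlock_fn`, `slow_unlock_fn` and the two `model_*` functions are hypothetical names. The fast-path signature is modelled on the selftest fixture in patch 2/2, and the success check (return value equal to the TID) is inferred from what those selftests assert. The `model_*` functions are plain userspace stand-ins that only reproduce the observable results; they do not close the race, which needs the real vDSO function and `futex()` operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#ifndef FUTEX_WAITERS
#define FUTEX_WAITERS 0x80000000u
#endif

/* Hypothetical type for __vdso_futex_robust_listXX_try_unlock(), as in the selftest fixture */
typedef uint32_t (*try_unlock_fn)(_Atomic(uint32_t) *lock, uint32_t tid, void *pop);

/* Hypothetical type for the slow path, e.g. a futex(FUTEX_ROBUST_UNLOCK | FUTEX_WAKE) wrapper */
typedef long (*slow_unlock_fn)(_Atomic(uint32_t) *lock, uint32_t tid, void *pop);

enum unlock_path { UNLOCK_FAST, UNLOCK_SLOW };

/*
 * Try the vDSO fast path first. If it reports a value other than our TID,
 * the lock has waiters and the kernel has to release it and wake them.
 */
enum unlock_path robust_unlock(_Atomic(uint32_t) *lock, uint32_t tid, void **pop,
			       try_unlock_fn fast, slow_unlock_fn slow)
{
	assert(lock && pop && fast && slow);

	if (fast(lock, tid, pop) == tid)
		return UNLOCK_FAST;

	slow(lock, tid, pop);
	return UNLOCK_SLOW;
}

/* Stand-in for the vDSO call: same results, but NOT race-free */
uint32_t model_try_unlock(_Atomic(uint32_t) *lock, uint32_t tid, void *pop)
{
	uint32_t found = tid;

	if (atomic_compare_exchange_strong(lock, &found, 0)) {
		*(void **)pop = NULL;	/* uncontended: clear the pending pointer */
		return tid;
	}

	return found;			/* contended: nothing was modified */
}

/* Stand-in for the syscall: clears the futex word and the pending pointer */
long model_slow_unlock(_Atomic(uint32_t) *lock, uint32_t tid, void *pop)
{
	(void)tid;
	atomic_store(lock, 0);
	*(void **)pop = NULL;
	return 0;			/* number of woken tasks in this model */
}
```

In a real program, `fast` would be the address resolved from the vDSO and `slow` a wrapper such as the `sys_futex_robust_unlock()` helper added in patch 2/2.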
--=20 2.53.0 From nobody Thu Apr 2 20:22:18 2026 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8691E334C1D for ; Thu, 26 Mar 2026 14:32:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.97.179.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774535554; cv=none; b=RcETaHLMmokiItzNMIaS8GFHMC8w8xm3hPUBb0HrIqx0Rh2dlApBXssI2miXaE8RAud221J0piKTerjBj+HFY5XTQQb03kGPiZovjbHWelEVtB2TN5/Nhip74q8NiWRLxg+nOtk/pVwXhtFsgCb+X/geRFBG9CTC5qJomGso2Ds= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774535554; c=relaxed/simple; bh=FogaJ+tYOygRGuJRxAYjCtWqQKPOuNgV2259k+ZR9DA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=ufpx/5mLdqjhCT4O8vw1Cvt61KkSONBDYnuNLMuPkzCvl8GlmIrjPCyPdpuaxvvkxatYGjndy9Fms5TfyMYUTNJSc7AOkVZUQrIiqi4L0Lyc+j6TY94Vl4iNkgxYff983q/QmeAqcdhZyOk+3EH0IfCyIicS+85mEbhqJF7+xcg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com; spf=pass smtp.mailfrom=igalia.com; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b=mq5ZFl3h; arc=none smtp.client-ip=213.97.179.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=igalia.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=igalia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=igalia.com header.i=@igalia.com header.b="mq5ZFl3h" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Cc:To:In-Reply-To:References:Message-Id: Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date:From:Sender: Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender 
:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=lNZrj14UAhm/E7sYSnz/ufuH1WMbgPdLiG+lia1klj0=; b=mq5ZFl3hh13QTSgfbQsIYXCioB qwDUUOkhby+IuJzMLS6SMNos/g+bICNHEz7zVYQzmsck+akbF9HPRnPenoLhUZqJmiDSyyWeoTBvY UPiqUAGxot+OydxskpKXF6zuiJfeWydkWjcPbAdOEGEujNA6D1uDV+GOHDQR4xvjY0d6mNwnblxcq hNJGPDUr6y67Lm/a3AxsTtPRV2cJ6iODtFBm0v91LgWXaO98k+ydDVFHVTySXOz/0RuTu5JaOXLOK m3LIRIX67jd48eaiVMd11ApF93+loz+q2LyNms8hZnHIdCcTYLud3IqkFU8gLLUomRz3TQbkza3aO pTBjGo2Q==; Received: from [179.118.189.200] (helo=[192.168.15.100]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1w5lkx-006MiO-H9; Thu, 26 Mar 2026 15:32:19 +0100 From: =?utf-8?q?Andr=C3=A9_Almeida?= Date: Thu, 26 Mar 2026 11:31:51 -0300 Subject: [PATCH 2/2] selftests: futex: Add tests for robust release operations Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260326-tonyk-vdso_test-v1-2-30a6f78c8bc3@igalia.com> References: <20260326-tonyk-vdso_test-v1-0-30a6f78c8bc3@igalia.com> In-Reply-To: <20260326-tonyk-vdso_test-v1-0-30a6f78c8bc3@igalia.com> To: Thomas Gleixner , Ingo Molnar , Peter Zijlstra , Darren Hart , Davidlohr Bueso , Mathieu Desnoyers , Sebastian Andrzej Siewior , Carlos O'Donell , Florian Weimer , Darren Hart , Arnd Bergmann , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= Cc: linux-kernel@vger.kernel.org, kernel-dev@igalia.com, =?utf-8?q?Andr=C3=A9_Almeida?= X-Mailer: b4 0.15.0 Add tests for __vdso_futex_robust_listXX_try_unlock() and for the futex() op FUTEX_ROBUST_UNLOCK. Test the contended and uncontended cases for the vDSO functions and all ops combinations for FUTEX_ROBUST_UNLOCK. 
Signed-off-by: André Almeida
---
 .../selftests/futex/functional/robust_list.c      | 203 ++++++++++++++++++++++
 tools/testing/selftests/futex/include/futextest.h |   3 +
 2 files changed, 206 insertions(+)

diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index e7d1254e18ca..38a3f9e9efc2 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -27,12 +27,14 @@
 #include "futextest.h"
 #include "../../kselftest_harness.h"
 
+#include
 #include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -54,6 +56,12 @@ static int get_robust_list(int pid, struct robust_list_head **head, size_t *len_
 	return syscall(SYS_get_robust_list, pid, head, len_ptr);
 }
 
+static int sys_futex_robust_unlock(_Atomic(uint32_t) *uaddr, unsigned int op, int val,
+				   void *list_op_pending, unsigned int val3)
+{
+	return syscall(SYS_futex, uaddr, op, val, NULL, list_op_pending, val3, 0);
+}
+
 /*
  * Basic lock struct, contains just the futex word and the robust list element
  * Real implementations have also a *prev to easily walk in the list
@@ -549,4 +557,199 @@ TEST(test_circular_list)
 	ksft_test_result_pass("%s\n", __func__);
 }
 
+/*
+ * Below are tests for the fix of the robust release race condition. Please read the following
+ * thread to learn more about the issue in the first place and why the following functions fix it:
+ * https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+ */
+
+/*
+ * Auxiliary code for loading the vDSO functions
+ */
+#define VDSO_SIZE 0x4000
+
+void *get_vdso_func_addr(const char *str)
+{
+	void *vdso_base = (void *) getauxval(AT_SYSINFO_EHDR), *addr;
+	Dl_info info;
+
+	if (!vdso_base) {
+		perror("Failed to get AT_SYSINFO_EHDR");
+		return NULL;
+	}
+
+	for (addr = vdso_base; addr < vdso_base + VDSO_SIZE; addr += sizeof(addr)) {
+		if (dladdr(addr, &info) == 0 || !info.dli_sname)
+			continue;
+
+		if (!strcmp(info.dli_sname, str))
+			return info.dli_saddr;
+	}
+
+	return NULL;
+}
+
+/*
+ * These are the real vDSO function signatures:
+ *
+ * __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+ * __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+ *
+ * So for the generic entry point we need to use a void pointer as the last argument
+ */
+FIXTURE(vdso_unlock)
+{
+	uint32_t (*vdso)(_Atomic(uint32_t) *lock, uint32_t tid, void *pop);
+};
+
+FIXTURE_VARIANT(vdso_unlock)
+{
+	bool is_32;
+	char func_name[];
+};
+
+FIXTURE_SETUP(vdso_unlock)
+{
+	self->vdso = get_vdso_func_addr(variant->func_name);
+
+	if (!self->vdso)
+		SKIP(return, "%s not found", variant->func_name);
+}
+
+FIXTURE_TEARDOWN(vdso_unlock) {}
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 32)
+{
+	.func_name = "__vdso_futex_robust_list32_try_unlock",
+	.is_32 = true,
+};
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 64)
+{
+	.func_name = "__vdso_futex_robust_list64_try_unlock",
+	.is_32 = false,
+};
+
+/*
+ * Test the vDSO robust_listXX_try_unlock() for the uncontended case. The virtual syscall should
+ * return the thread ID of the lock owner, the lock word must be 0 and the list_op_pending should
+ * be NULL.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_uncontended)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	struct robust_list_head head;
+	uint64_t exp = (uint64_t) NULL;
+	pid_t tid = gettid();
+	int ret;
+
+	*futex = tid;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = self->vdso(futex, tid, &head.list_op_pending);
+
+	ASSERT_EQ(ret, tid);
+	ASSERT_EQ(*futex, 0);
+
+	/* The 32-bit entry point clears only the lower 32 bits of the pointer */
+	if (variant->is_32) {
+		exp = (uint64_t)(unsigned long)&lock.list;
+		exp &= ~0xFFFFFFFFULL;
+	}
+
+	ASSERT_EQ((uint64_t)(unsigned long)head.list_op_pending, exp);
+}
+
+/*
+ * If the lock is contended, the operation fails. The return value is the value found at the
+ * futex word (tid | FUTEX_WAITERS), the futex word is not modified and the list_op_pending is
+ * not cleared.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_contended)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	struct robust_list_head head;
+	pid_t tid = gettid();
+	int ret;
+
+	*futex = tid | FUTEX_WAITERS;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = self->vdso(futex, tid, &head.list_op_pending);
+
+	ASSERT_EQ(ret, tid | FUTEX_WAITERS);
+	ASSERT_EQ(*futex, tid | FUTEX_WAITERS);
+	ASSERT_EQ(head.list_op_pending, &lock.list);
+}
+
+FIXTURE(futex_op) {};
+
+FIXTURE_VARIANT(futex_op)
+{
+	unsigned int op;
+	unsigned int val3;
+};
+
+FIXTURE_SETUP(futex_op) {}
+
+FIXTURE_TEARDOWN(futex_op) {}
+
+FIXTURE_VARIANT_ADD(futex_op, wake)
+{
+	.op = FUTEX_WAKE,
+	.val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset)
+{
+	.op = FUTEX_WAKE_BITSET,
+	.val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi)
+{
+	.op = FUTEX_UNLOCK_PI,
+	.val3 = 0,
+};
+
+/*
+ * The syscall should return the number of tasks woken (for this test, 0), clear the futex word and
+ * clear list_op_pending
+ */
+TEST_F(futex_op, test_futex_robust_unlock)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	struct robust_list_head head;
+	pid_t tid = gettid();
+	int ret;
+
+	*futex = tid | FUTEX_WAITERS;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = sys_futex_robust_unlock(futex, FUTEX_ROBUST_UNLOCK | variant->op, tid,
+				      &head.list_op_pending, variant->val3);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(*futex, 0);
+	ASSERT_EQ(head.list_op_pending, NULL);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index 3d48e9789d9f..f4d880b8e795 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -38,6 +38,9 @@ typedef volatile u_int32_t futex_t;
 #ifndef FUTEX_CMP_REQUEUE_PI
 #define FUTEX_CMP_REQUEUE_PI		12
 #endif
+#ifndef FUTEX_ROBUST_UNLOCK
+#define FUTEX_ROBUST_UNLOCK		512
+#endif
 #ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
 #define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
 					 FUTEX_PRIVATE_FLAG)
-- 
2.53.0
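A note on the expected value in test_robust_try_unlock_uncontended for the 32-bit variant: the test runs the list32 entry point against a 64-bit `list_op_pending` slot, so only part of the pointer is cleared. The small sketch below models that with plain memory copies. It assumes a little-endian machine, where the first four bytes of the slot hold the lower 32 bits, and the helper names `pending_after_clear32()` and `pending_after_clear64()` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of the 32-bit entry point: a 32-bit store of zero at the address
 * of a 64-bit pointer slot only overwrites the first four bytes of the slot.
 */
uint64_t pending_after_clear32(uint64_t pending)
{
	uint32_t zero = 0;

	memcpy(&pending, &zero, sizeof(zero));
	return pending;
}

/* Model of the 64-bit entry point: the whole slot is overwritten */
uint64_t pending_after_clear64(uint64_t pending)
{
	uint64_t zero = 0;

	memcpy(&pending, &zero, sizeof(zero));
	return pending;
}
```

On little-endian, the first helper gives the same result as `pending & ~0xFFFFFFFFULL`, which is the expression the selftest uses to compute `exp` when `variant->is_32` is set.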