From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 945353E1238; Tue, 28 Apr 2026 23:33:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419221; cv=none; b=r3g9lbYzoUB072Xke707pO46HlpkbVKOflhwm30NnYud4r7KNcv8ME7kB+hVynYMsl6TqPSv3qrzdpbDRNGCY1qP0UyvXS2Ylc7TzSzRF+s2vVZLLoEAsTDwnVlw90HDJKjcFnpTlyHRu4UuQ/giZeL/R4bDbJngKRjQfkYZ0mQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419221; c=relaxed/simple; bh=q3X1KnnfJZ02/srklaWib7iUNJBMrUyFhloVAzkIzUQ=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=dN6OGaXMhwGGzWN7k/kaQfHeUZhx/7mG02eh0ecGI3gsaHpLj0DxEgbqTv5153GuXch6rnclGXlzSsFpCX6lW16LELdQxq+FJuSWrZeyaUcAKJfpQ8LaVzrHx8Q5tcDe6l4dBNUEeoHci2pxkxS4c7xUV9MLcko6Kldp4DECOGM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VhhbH8P1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VhhbH8P1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 788CFC4AF11; Tue, 28 Apr 2026 23:33:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777419221; bh=q3X1KnnfJZ02/srklaWib7iUNJBMrUyFhloVAzkIzUQ=; h=Date:From:To:Cc:Subject:References:From; b=VhhbH8P1cmP2mjIrXrURDBFguhE9B9ClrwDfZstdNKJxYJ7dhN0JS2fg86ljsosaK 0mi8nT/DoWFrEinj/iHyqmguwZWU0DTr7Y+V7sAjELW1YnsF1uaP2DKeZDTYOxw3nc sChfejL9gQfNUdKmFCZquCmiLDBE+UQ4lKcww93W37pXf+BkSsohiG5nCoTtJfpHWB TC3y5q7jDDBb9p0TzqqjACb2Cwk2aE9zHg7OZ05ycdiSw98Zs3m5ouyPGvXd8MRltK 
4KMEfzt0wFhGLzFa50rt6IGCUnHYzKLOS4F6KACTD2CLs6JkYICd6EiDPMvr2x8VAj sPQ6BZcH7+nFQ== Date: Wed, 29 Apr 2026 01:33:37 +0200 Message-ID: <20260428224427.271566313@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 01/10] rseq: Set rseq::cpu_id_start to 0 on unregistration References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The RSEQ rework changed that to RSEQ_CPU_UNINITILIZED, which is obviously incompatible. Revert back to the original behavior. Fixes: 0f085b41880e ("rseq: Provide and use rseq_set_ids()") Reported-by: Dmitry Vyukov Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- kernel/rseq.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -236,11 +236,6 @@ static int __init rseq_debugfs_init(void } __initcall(rseq_debugfs_init); =20 -static bool rseq_set_ids(struct task_struct *t, struct rseq_ids *ids, u32 = node_id) -{ - return rseq_set_ids_get_csaddr(t, ids, node_id, NULL); -} - static bool rseq_handle_cs(struct task_struct *t, struct pt_regs *regs) { struct rseq __user *urseq =3D t->rseq.usrptr; @@ -384,19 +379,22 @@ void rseq_syscall(struct pt_regs *regs) =20 static bool rseq_reset_ids(void) { - struct rseq_ids ids =3D { - .cpu_id =3D RSEQ_CPU_ID_UNINITIALIZED, - .mm_cid =3D 0, - }; + struct rseq __user *rseq =3D current->rseq.usrptr; =20 /* * If this fails, terminate it because this leaves the kernel in * stupid state as exit to user space will try to 
fixup the ids * again. */ - if (rseq_set_ids(current, &ids, 0)) - return true; + scoped_user_rw_access(rseq, efault) { + unsafe_put_user(0, &rseq->cpu_id_start, efault); + unsafe_put_user(RSEQ_CPU_ID_UNINITIALIZED, &rseq->cpu_id, efault); + unsafe_put_user(0, &rseq->node_id, efault); + unsafe_put_user(0, &rseq->mm_cid, efault); + } + return true; =20 +efault: force_sig(SIGSEGV); return false; } From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C7A6E38B15E; Tue, 28 Apr 2026 23:33:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419227; cv=none; b=bLC8zUmtMP9c9SSFvpNVgbKihIMgR3AEifZYgO/HgiGOYx51QTDnMbTUZI3LvvkrBNhbP955GN2J6X/kxnjtgrsiX1C+uoaSvTFgYXdmjutaBHz+lruPSEgCDr91xWrjJ9i+l7oXRjqUNs+prlDr5dU9P/pJyR8GA2f9AfXFOMA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419227; c=relaxed/simple; bh=tQN6MWf/5YWMmuXTo+JR3VXnNeV3GXdK1CV+HEyv1Sw=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=K7AOzKoNPpQnwKwvkSOOklx2gwrKDTp2iJaqAScTmxMALb0YDEvYE5yO2yVFiTeuMPfoPYcHt6lcrTWOWu1CjL63WdjNzYLCIld8JPhi2t/+WiRmLPIxp3vQOYl0nAM0wVWHIWgn/MzkwDE5/rLmtk12Qe4ibBqYVbLWyFD65WQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rcAO8V9M; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rcAO8V9M" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8D8A4C2BCAF; Tue, 28 Apr 2026 23:33:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; 
s=k20201202; t=1777419227; bh=tQN6MWf/5YWMmuXTo+JR3VXnNeV3GXdK1CV+HEyv1Sw=; h=Date:From:To:Cc:Subject:References:From; b=rcAO8V9M6pHxogbzH/vEAUnm9iBn4dhcb3Qh27EtC2mnGV3PA+2AxS+FGbGa6YG92 GJqSPzRFJZifw4s56LTPGSMu8cDDZfeZePXfFxBH60Gr+UtZeDMSGenBO0l3IzIRHu ePnVt6q73yU2nXxZKkRTesQfM5vLuau3qv1Kn0QSGkspzyCoYVcD7OgmlfbFARju7x pIZL81DIiogyx+0Ej9yGY5RKA9qxYQHlzwUTdBNK58KzVstr1E4cHBljXHNtn3L6bz EAwK9DLBxhebYmf9PQ2x27fnWDGTPmb7Dwf0VtdzNXRDfAceO3gX4FdHls8Oz8iKp9 +nFRL3/atVvCw== Date: Wed, 29 Apr 2026 01:33:43 +0200 Message-ID: <20260428224427.353887714@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 02/10] rseq: Protect rseq_reset() against interrupts References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

rseq_reset() uses memset() to clear the task's rseq data. That's racy against membarrier() and preemption. Guard it with irqsave to cure this.
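To illustrate the scope-based guard pattern that this fix relies on, here is a minimal user space sketch. It is not kernel code: `irq_guard`, `fake_irqs_enabled` and `model_rseq_reset()` are hypothetical stand-ins, and the interrupt state is simulated with a plain flag. It relies on the GCC/Clang `cleanup` attribute, which is also the mechanism behind the kernel's guard() helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Sentinel mirroring RSEQ_CPU_ID_UNINITIALIZED (-1) from the rseq UAPI. */
#define MODEL_CPU_ID_UNINITIALIZED ((uint32_t)-1)

/* Simulated interrupt state; a stand-in for the real CPU IRQ flag. */
static bool fake_irqs_enabled = true;

/* Hypothetical guard object: remembers the state found at scope entry. */
struct irq_guard {
	bool was_enabled;
};

static struct irq_guard irq_guard_enter(void)
{
	struct irq_guard g = { .was_enabled = fake_irqs_enabled };

	fake_irqs_enabled = false;		/* models local_irq_save() */
	return g;
}

static void irq_guard_exit(struct irq_guard *g)
{
	fake_irqs_enabled = g->was_enabled;	/* models local_irq_restore() */
}

/* Simplified model of the per task rseq data which gets cleared. */
struct model_rseq_data {
	uint32_t cpu_id;
	uint32_t mm_cid;
	uint8_t has_rseq;
};

static void model_rseq_reset(struct model_rseq_data *d)
{
	/* The cleanup handler runs on every exit path of this scope. */
	struct irq_guard g __attribute__((cleanup(irq_guard_exit))) = irq_guard_enter();

	/* In this model nothing can interrupt the clear and the re-init. */
	assert(!fake_irqs_enabled);
	memset(d, 0, sizeof(*d));
	d->cpu_id = MODEL_CPU_ID_UNINITIALIZED;
}
```

The point of the save/restore flavour is that the previous state is restored rather than unconditionally re-enabled, so the helper is safe to call from contexts which already run with interrupts disabled.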
Fixes: faba9d250eae ("rseq: Introduce struct rseq_data") Reported-by: Dmitry Vyukov Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- include/linux/rseq.h | 1 + 1 file changed, 1 insertion(+) --- a/include/linux/rseq.h +++ b/include/linux/rseq.h @@ -119,6 +119,7 @@ static inline void rseq_virt_userspace_e =20 static inline void rseq_reset(struct task_struct *t) { + guard(irqsave)(); memset(&t->rseq, 0, sizeof(t->rseq)); t->rseq.ids.cpu_id =3D RSEQ_CPU_ID_UNINITIALIZED; } From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 843853EF0C9; Tue, 28 Apr 2026 23:33:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419233; cv=none; b=uODfOBQa/BSjr5fe2u/u8ofJ8LQ7DuIWE7j+sMxZ5glN/bfIgQC7EIimwrFR1P5BPXXcVmw9WuS1jotAlII2jvGKZS9ZmOUgRGnAcHf21uYtG/vZ3soZuyP6P00AfU+emBO5kklcOXg/w7gL7qGtYnM+aIfitUCXLLBl8gjTFWI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419233; c=relaxed/simple; bh=pXX6/6G2YvqxHO4PQJxwtcOMRv1WqdDb6PMnhuNhHx4=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=nnSFuWGUvecvos7ec2+ot6DFpDL1ch2Ns/HUpVkWMZRrG0pjO6eatNeSZ15CpffW3cIh92Z8nsI2VG+VuUh5frYbZOQiF8x+MQZ5dUG/ovqvY+FwkEOcp8qB5l5J8Ej4p1QJEuoq7d2Wq06lWDiXrdgePAv5JYKXp78p/1HlHXA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Jnsy/F5z; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Jnsy/F5z" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 
7C7FCC2BCC4; Tue, 28 Apr 2026 23:33:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777419233; bh=pXX6/6G2YvqxHO4PQJxwtcOMRv1WqdDb6PMnhuNhHx4=; h=Date:From:To:Cc:Subject:References:From; b=Jnsy/F5zzv9629ULkVTJwrMJgRAYduWoiPfMRS7VfrylJJFtFEqf2WExUdQc55m8D 3iFnY0Xwsfrl28vWm65iMBmZfiMgmbRKgYplg/TDY7IAWwnVChLh9fYlXRg1Cvd6+2 WlHKCFcdYF4aQoa2Q1Jhjq0c2mTMT9Q3CdIXkOcCpoIFT0GoABkS3kXl0xCrCN9NHW e39LGfZVrkgOzMjnvowivZmhZlaMSyJz3vRX81Of1KbojnIxDoACfnkIP/NjSny/W6 jjRVQasYjcVBKssjrYwyyqRRGbR36pMjp2nSkH3GkTqbLxeaWLQq85jrEnX5uFoXm5 9tE//ZloX9hRg== Date: Wed, 29 Apr 2026 01:33:49 +0200 Message-ID: <20260428224427.437059375@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 03/10] rseq: Dont advertise time slice extensions if disabled References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" If time slice extensions have been disabled on the kernel command line, then advertising them in RSEQ flags is wrong. Adjust the conditionals to reflect reality, fixup the misleading comments about the gap of these flags and the rseq::flags field. 
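The intended advertisement rule can be sketched as a pure function. This is a user space model under stated assumptions, not the kernel implementation: `model_rseq_flags()` and the `MODEL_*` constants are hypothetical. Only bit 4 for the AVAILABLE flag is taken from the UAPI enum touched by this patch; the position of the ENABLED bit and the value of the registration flag are assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 matches RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE_BIT in the UAPI enum. */
#define MODEL_SLICE_EXT_AVAILABLE	(1U << 4)
/* Assumed position of the ENABLED bit; it is not shown in this patch. */
#define MODEL_SLICE_EXT_ENABLED		(1U << 5)
/* Assumed value standing in for RSEQ_FLAG_SLICE_EXT_DEFAULT_ON. */
#define MODEL_FLAG_DEFAULT_ON		(1U << 1)

/*
 * built_in         models IS_ENABLED(CONFIG_RSEQ_SLICE_EXTENSION)
 * runtime_enabled  models rseq_slice_extension_enabled()
 * reg_flags        models the flags argument of the rseq() syscall
 */
static uint32_t model_rseq_flags(bool built_in, bool runtime_enabled,
				 uint32_t reg_flags)
{
	uint32_t rseqfl = 0;

	/* Nothing is advertised unless the feature is built in and enabled. */
	if (!built_in || !runtime_enabled)
		return rseqfl;

	rseqfl |= MODEL_SLICE_EXT_AVAILABLE;
	if (reg_flags & MODEL_FLAG_DEFAULT_ON)
		rseqfl |= MODEL_SLICE_EXT_ENABLED;
	return rseqfl;
}
```

With the old conditionals, the built in but command line disabled case would still have reported AVAILABLE; in this model it reports no feature bits at all.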
Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension") Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- include/uapi/linux/rseq.h | 5 ++++- kernel/rseq.c | 9 +++++---- 2 files changed, 9 insertions(+), 5 deletions(-) --- a/include/uapi/linux/rseq.h +++ b/include/uapi/linux/rseq.h @@ -28,7 +28,7 @@ enum rseq_cs_flags_bit { RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT =3D 0, RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT =3D 1, RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT =3D 2, - /* (3) Intentional gap to put new bits into a separate byte */ + /* (3) Intentional gap to keep new bits separate */ =20 /* User read only feature flags */ RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE_BIT =3D 4, @@ -161,6 +161,9 @@ struct rseq { * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE + * + * It is now used for feature status advertisement by the kernel. + * See: enum rseq_cs_flags_bit for further information. */ __u32 flags; =20 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -462,10 +462,11 @@ SYSCALL_DEFINE4(rseq, struct rseq __user return -EFAULT; =20 if (IS_ENABLED(CONFIG_RSEQ_SLICE_EXTENSION)) { - rseqfl |=3D RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE; - if (rseq_slice_extension_enabled() && - (flags & RSEQ_FLAG_SLICE_EXT_DEFAULT_ON)) - rseqfl |=3D RSEQ_CS_FLAG_SLICE_EXT_ENABLED; + if (rseq_slice_extension_enabled()) { + rseqfl |=3D RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE; + if (flags & RSEQ_FLAG_SLICE_EXT_DEFAULT_ON) + rseqfl |=3D RSEQ_CS_FLAG_SLICE_EXT_ENABLED; + } } =20 scoped_user_write_access(rseq, efault) { From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4CFDB3F0755; Tue, 28 Apr 2026 23:33:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419238; cv=none; b=lTE+IjOMDyYvODIXZY97HlYJc1g//KIqAwRv7sqM18dipJWOdDWU7GW0tD3RWPuzzXPv5wtYvBrn+C2mKzDbY9B3VKpwJe8poew5VyHuBskUyHvuo1823gI33l5X6NWYcbBeDYz6sh5T1EKqRrUvW51+SwQyNpau4LGCc/Yx9fw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419238; c=relaxed/simple; bh=DxGjH4TLyjWnbNss3J378kObrddEFdb4ue9MyBQ3Kzw=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=B6XCNOKt7b+Rm7LUwwBVfJxXsDgIkSH8aDiuLn6LXH7+JHOkzHImP3mMJhT0FsR/aAcdTezAg0pCvDLXZcKQw1jEA0Grtjz0X1gYRfsykxmdLQL+HpAUb1A+hlvPRsfPYsM9E5q+98pnPjcL3vUuAYXaBfz1j+MBhtyp5HZU0yg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=kNcS27dq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="kNcS27dq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 71C43C2BCAF; Tue, 28 Apr 2026 23:33:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777419237; bh=DxGjH4TLyjWnbNss3J378kObrddEFdb4ue9MyBQ3Kzw=; h=Date:From:To:Cc:Subject:References:From; b=kNcS27dq3m6eU1jsirtDl9fSmX6e/oTqSQiXQYnyjlgDhCQB6HpUq2/ATm30RLh2I FOFBtq3O6BB7Qfkw0+zU9VBe/Lr4gIxHIER3AtvP82mZgLD233kieXHBcNUohLF6E2 U7nWtUHUJBMLEeKlQgsw2ht0018YVdwLGmEBpxaAQWHqbha5h60Xi5HlRFeocppYJf k52xIHKj3D1YGZeo9EWb9WQbdvWl8Z0qW/YvY4nuVMoPkS2BIkD0oCaKX3WXVKt0AQ qIKEFm2mRIgN+r9NMLfuV5TwDVKW3U8fAsGwkmSp+J659BInhOVDJFNsazBDrL+ilq bgHx+Rfh1thtQ== Date: Wed, 29 Apr 2026 01:33:54 +0200 Message-ID: <20260428224427.517051752@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler 
, Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 04/10] rseq: Revert to historical performance killing behaviour References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

The recent RSEQ optimization work broke the TCMalloc abuse of the RSEQ ABI as it no longer unconditionally updates the CPU, node and mm_cid fields, which are documented as read only for user space. Due to the observed behavior of the kernel it was possible for TCMalloc to overwrite the cpu_id_start field for their own purposes and rely on the kernel to update it unconditionally after each context switch and before signal delivery.

The RSEQ ABI only guarantees that these fields are updated when the data changes, i.e. the task is migrated or the MMCID of the task changes due to switching from or to per CPU ownership mode.

The optimization work eliminated the unconditional updates and reduced them to the documented ABI guarantees, which results in a massive performance win for syscall and scheduling heavy workloads, which in turn breaks the TCMalloc expectations.

There have been several options discussed to restore the TCMalloc functionality while preserving the optimization benefits. They all end up in a series of hard to maintain workarounds, which in the worst case introduce overhead for everyone, e.g. in the scheduler.

The requirements of TCMalloc and the optimization work are diametrically opposed and the required workarounds are a maintenance burden. They end up as fragile constructs, which are blocking further optimization work and are pretty much guaranteed to cause more subtle issues down the road.

The optimization work heavily depends on the generic entry code, which is not used by all architectures yet.
So the rework preserved the original mechanism mostly unmodified to keep the support for architectures, which handle rseq in their own exit to user space loop. That code is currently optimized out by the compiler on architectures which use the generic entry code.

This allows reverting to the original behaviour by replacing the compile time constant conditions with a runtime condition where required, which disables the optimization and the dependent time slice extension feature until the runtime condition can be enabled in the RSEQ registration code on a per task basis again.

The following changes are required to restore the original behavior, which makes TCMalloc work again:

  1) Replace the compile time constant conditionals with runtime conditionals where appropriate to prevent the compiler from optimizing the legacy mode out

  2) Enforce unconditional update of IDs on context switch for the non-optimized v1 mode

  3) Enforce update of IDs in the pre signal delivery path for the non-optimized v1 mode

  4) Enforce update of IDs in the membarrier(RSEQ) IPI for the non-optimized v1 mode

  5) Make time slice and future extensions depend on optimized v2 mode

This brings back the full performance problems, but preserves the v2 optimization code and, for architectures using the generic entry code, also the TIF_RSEQ optimization, which avoids a full evaluation of the exit to user mode loop in many cases.
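The difference between the two modes on a context switch can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel code: the `model_*` names are hypothetical, `generic_entry` stands in for IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY), and the return value stands in for raising the exit to user work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced model of the rseq event state used by the scheduler hook. */
struct model_event {
	bool sched_switch;
	bool ids_changed;
	bool user_irq;
	uint8_t has_rseq;	/* 0: none, 1: legacy v1, > 1: optimized v2 */
};

/* Mirrors the idea of rseq_v2(): generic entry code plus a v2 registration. */
static bool model_rseq_v2(const struct model_event *ev, bool generic_entry)
{
	return generic_entry && ev->has_rseq > 1;
}

/* Returns true when the exit to user work would have to be raised. */
static bool model_sched_switch_event(struct model_event *ev, bool generic_entry)
{
	if (model_rseq_v2(ev, generic_entry)) {
		/* Optimized mode: only act when there is something to do. */
		bool raise = ev->user_irq || ev->ids_changed;

		if (raise)
			ev->sched_switch = true;
		return raise;
	}

	if (ev->has_rseq) {
		/* Legacy mode: force an ID update on every context switch. */
		ev->ids_changed = true;
		ev->sched_switch = true;
		return true;
	}
	return false;
}
```

A v1 registration therefore always takes the expensive path, which is what restores the behaviour TCMalloc depends on, while a v2 registration only does so after an interrupt from user space or an actual ID change.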
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending") Reported-by: Mathias Stearn Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Closes: https://lore.kernel.org/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzR= w4Zgu=3DoA@mail.gmail.com --- include/linux/rseq.h | 34 +++++++++++++++++++++++----------- include/linux/rseq_entry.h | 39 +++++++++++++++++++++++++++++---------- include/linux/rseq_types.h | 9 ++++++++- kernel/rseq.c | 42 ++++++++++++++++++++++++++++++++++------= -- kernel/sched/membarrier.c | 11 ++++++++++- 5 files changed, 104 insertions(+), 31 deletions(-) --- a/include/linux/rseq.h +++ b/include/linux/rseq.h @@ -9,6 +9,11 @@ =20 void __rseq_handle_slowpath(struct pt_regs *regs); =20 +static __always_inline bool rseq_v2(struct task_struct *t) +{ + return IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY) && likely(t->rseq.event.has_r= seq > 1); +} + /* Invoked from resume_user_mode_work() */ static inline void rseq_handle_slowpath(struct pt_regs *regs) { @@ -16,8 +21,7 @@ static inline void rseq_handle_slowpath( if (current->rseq.event.slowpath) __rseq_handle_slowpath(regs); } else { - /* '&' is intentional to spare one conditional branch */ - if (current->rseq.event.sched_switch & current->rseq.event.has_rseq) + if (current->rseq.event.sched_switch && current->rseq.event.has_rseq) __rseq_handle_slowpath(regs); } } @@ -30,9 +34,9 @@ void __rseq_signal_deliver(int sig, stru */ static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_reg= s *regs) { - if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) { - /* '&' is intentional to spare one conditional branch */ - if (current->rseq.event.has_rseq & current->rseq.event.user_irq) + if (rseq_v2(current)) { + /* has_rseq is implied in rseq_v2() */ + if (current->rseq.event.user_irq) __rseq_signal_deliver(ksig->sig, regs); } else { if (current->rseq.event.has_rseq) @@ -50,15 +54,22 @@ static __always_inline void rseq_sched_s { struct rseq_event *ev =3D &t->rseq.event; =20 - if 
(IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) { + /* + * Only apply the user_irq optimization for RSEQ ABI V2 registrations. + * Legacy users like TCMalloc rely on the original ABI V1 behaviour + * which updates IDs on every context switch. + */ + if (rseq_v2(t)) { /* - * Avoid a boat load of conditionals by using simple logic - * to determine whether NOTIFY_RESUME needs to be raised. + * Avoid a boat load of conditionals by using simple logic to + * determine whether TIF_NOTIFY_RESUME or TIF_RSEQ needs to be + * raised. * - * It's required when the CPU or MM CID has changed or - * the entry was from user space. + * It's required when the CPU or MM CID has changed or the entry + * was via interrupt from user space. ev->has_rseq does not have + * to be evaluated here because rseq_v2() implies has_rseq. */ - bool raise =3D (ev->user_irq | ev->ids_changed) & ev->has_rseq; + bool raise =3D ev->user_irq | ev->ids_changed; =20 if (raise) { ev->sched_switch =3D true; @@ -66,6 +77,7 @@ static __always_inline void rseq_sched_s } } else { if (ev->has_rseq) { + t->rseq.event.ids_changed =3D true; t->rseq.event.sched_switch =3D true; rseq_raise_notify_resume(t); } --- a/include/linux/rseq_entry.h +++ b/include/linux/rseq_entry.h @@ -111,6 +111,20 @@ static __always_inline void rseq_slice_c t->rseq.slice.state.granted =3D false; } =20 +/* + * Open coded, so it can be invoked within a user access region. + * + * This clears the user space state of the time slice extensions field onl= y when + * the task has registered the optimized RSEQ_ABI V2. Some legacy registra= tions, + * e.g. TCMalloc, have conflicting non-ABI fields in struct RSEQ, which wo= uld be + * overwritten by an unconditional write. 
+ */ +#define rseq_slice_clear_user(rseq, efault) \ +do { \ + if (rseq_slice_extension_enabled()) \ + unsafe_put_user(0U, &rseq->slice_ctrl.all, efault); \ +} while (0) + static __always_inline bool __rseq_grant_slice_extension(bool work_pending) { struct task_struct *curr =3D current; @@ -230,6 +244,7 @@ static __always_inline bool rseq_slice_e static __always_inline bool rseq_arm_slice_extension_timer(void) { return = false; } static __always_inline void rseq_slice_clear_grant(struct task_struct *t) = { } static __always_inline bool rseq_grant_slice_extension(unsigned long ti_wo= rk, unsigned long mask) { return false; } +#define rseq_slice_clear_user(rseq, efault) do { } while (0) #endif /* !CONFIG_RSEQ_SLICE_EXTENSION */ =20 bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs= , unsigned long csaddr); @@ -517,11 +532,9 @@ bool rseq_set_ids_get_csaddr(struct task if (csaddr) unsafe_get_user(*csaddr, &rseq->rseq_cs, efault); =20 - /* Open coded, so it's in the same user access region */ - if (rseq_slice_extension_enabled()) { - /* Unconditionally clear it, no point in conditionals */ - unsafe_put_user(0U, &rseq->slice_ctrl.all, efault); - } + /* RSEQ ABI V2 only operations */ + if (rseq_v2(t)) + rseq_slice_clear_user(rseq, efault); } =20 rseq_slice_clear_grant(t); @@ -612,6 +625,14 @@ static __always_inline bool rseq_exit_us * interrupts disabled */ guard(pagefault)(); + /* + * This optimization is only valid when the task registered for the + * optimized RSEQ_ABI_V2 variant. Some legacy users rely on the original + * RSEQ implementation behaviour which unconditionally updated the IDs. + * rseq_sched_switch_event() ensures that legacy registrations always + * have both sched_switch and ids_changed set, which is compatible with + * the historical TIF_NOTIFY_RESUME behaviour. 
+ */ if (likely(!t->rseq.event.ids_changed)) { struct rseq __user *rseq =3D t->rseq.usrptr; /* @@ -623,11 +644,9 @@ static __always_inline bool rseq_exit_us scoped_user_rw_access(rseq, efault) { unsafe_get_user(csaddr, &rseq->rseq_cs, efault); =20 - /* Open coded, so it's in the same user access region */ - if (rseq_slice_extension_enabled()) { - /* Unconditionally clear it, no point in conditionals */ - unsafe_put_user(0U, &rseq->slice_ctrl.all, efault); - } + /* RSEQ ABI V2 only operations */ + if (rseq_v2(t)) + rseq_slice_clear_user(rseq, efault); } =20 rseq_slice_clear_grant(t); --- a/include/linux/rseq_types.h +++ b/include/linux/rseq_types.h @@ -9,6 +9,12 @@ #ifdef CONFIG_RSEQ struct rseq; =20 +/* + * rseq_event::has_rseq contains the ABI version number so preserving it + * in AND operations requires a mask. + */ +#define RSEQ_HAS_RSEQ_VERSION_MASK 0xff + /** * struct rseq_event - Storage for rseq related event management * @all: Compound to initialize and clear the data efficiently @@ -17,7 +23,8 @@ struct rseq; * exit to user * @ids_changed: Indicator that IDs need to be updated * @user_irq: True on interrupt entry from user mode - * @has_rseq: True if the task has a rseq pointer installed + * @has_rseq: Greater than 0 if the task has a rseq pointer installed. + * Contains the RSEQ version number * @error: Compound error code for the slow path to analyze * @fatal: User space data corrupted or invalid * @slowpath: Indicator that slow path processing via TIF_NOTIFY_RESUME --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -253,11 +253,14 @@ static bool rseq_handle_cs(struct task_s static void rseq_slowpath_update_usr(struct pt_regs *regs) { /* - * Preserve rseq state and user_irq state. The generic entry code - * clears user_irq on the way out, the non-generic entry - * architectures are not having user_irq. - */ - const struct rseq_event evt_mask =3D { .has_rseq =3D true, .user_irq =3D = true, }; + * Preserve has_rseq and user_irq state. 
The generic entry code clears + * user_irq on the way out, the non-generic entry architectures are not + * setting user_irq. + */ + const struct rseq_event evt_mask =3D { + .has_rseq =3D RSEQ_HAS_RSEQ_VERSION_MASK, + .user_irq =3D true, + }; struct task_struct *t =3D current; struct rseq_ids ids; u32 node_id; @@ -330,8 +333,9 @@ void __rseq_handle_slowpath(struct pt_re void __rseq_signal_deliver(int sig, struct pt_regs *regs) { rseq_stat_inc(rseq_stats.signal); + /* - * Don't update IDs, they are handled on exit to user if + * Don't update IDs yet, they are handled on exit to user if * necessary. The important thing is to abort a critical section of * the interrupted context as after this point the instruction * pointer in @regs points to the signal handler. @@ -344,6 +348,13 @@ void __rseq_signal_deliver(int sig, stru current->rseq.event.error =3D 0; force_sigsegv(sig); } + + /* + * In legacy mode, force the update of IDs before returning to user + * space to stay compatible. + */ + if (!rseq_v2(current)) + rseq_force_update(); } =20 /* @@ -408,6 +419,7 @@ static bool rseq_reset_ids(void) SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int, flag= s, u32, sig) { u32 rseqfl =3D 0; + u8 version =3D 1; =20 if (flags & RSEQ_FLAG_UNREGISTER) { if (flags & ~RSEQ_FLAG_UNREGISTER) @@ -461,7 +473,11 @@ SYSCALL_DEFINE4(rseq, struct rseq __user if (!access_ok(rseq, rseq_len)) return -EFAULT; =20 - if (IS_ENABLED(CONFIG_RSEQ_SLICE_EXTENSION)) { + /* + * The version check effectively disables time slice extensions until the + * RSEQ ABI V2 registrations are implemented. 
+ */ + if (IS_ENABLED(CONFIG_RSEQ_SLICE_EXTENSION) && version > 1) { if (rseq_slice_extension_enabled()) { rseqfl |=3D RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE; if (flags & RSEQ_FLAG_SLICE_EXT_DEFAULT_ON) @@ -484,7 +500,15 @@ SYSCALL_DEFINE4(rseq, struct rseq __user unsafe_put_user(RSEQ_CPU_ID_UNINITIALIZED, &rseq->cpu_id, efault); unsafe_put_user(0U, &rseq->node_id, efault); unsafe_put_user(0U, &rseq->mm_cid, efault); - unsafe_put_user(0U, &rseq->slice_ctrl.all, efault); + + /* + * All fields past mm_cid are only valid for non-legacy v2 + * registrations. + */ + if (version > 1) { + if (IS_ENABLED(CONFIG_RSEQ_SLICE_EXTENSION)) + unsafe_put_user(0U, &rseq->slice_ctrl.all, efault); + } } =20 /* @@ -712,6 +736,8 @@ int rseq_slice_extension_prctl(unsigned return -ENOTSUPP; if (!current->rseq.usrptr) return -ENXIO; + if (!rseq_v2(current)) + return -ENOTSUPP; =20 /* No change? */ if (enable =3D=3D !!current->rseq.slice.state.enabled) --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -199,7 +199,16 @@ static void ipi_rseq(void *info) * is negligible. */ smp_mb(); - rseq_sched_switch_event(current); + /* + * Legacy mode requires that IDs are written and the critical section is + * evaluated. V2 optimized mode handles the critical section and IDs are + * only updated if they change as a consequence of preemption after + * return from this IPI. 
+ */ + if (rseq_v2(current)) + rseq_sched_switch_event(current); + else + rseq_force_update(); } =20 static void ipi_sync_rq_state(void *info) From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 556B33F0755; Tue, 28 Apr 2026 23:34:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419243; cv=none; b=fTLXT8akI7gI3xubxQTvVUIiEuk7PkB7ByNKj0qNBMtLpmwr3hcV3ZW+QhloDppfzMpxYCkg/UyaWxhLtLwFXxHDQE95bVHdU9W/YBU6t0NZV5rlRZNcsSEEsKYpgc+2SMed1oonQwHXFBCPUlT4bvnObOJxROu4UrL3onDHWvQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419243; c=relaxed/simple; bh=Yc/e2ydR2bgKQuaSQ9GcEXY+tPnsGOUafO5/8Wy2NSw=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=F7jktILcsdj7ROKffOSm1hRYjZM9skP+cfGrw7dIaBh7Hi4t/5uZ5oZvX8+c5AgKRxtwQZze4qYQrSIsknrkajcgmbSbGqE0unZyQLVlE+ojojrbFloB98yC64wa5u7iNB5VAIMoz+x8CvEiZM7Tge71/Xjv7tJVM5ubsoGoP+M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jW0eZcd/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jW0eZcd/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A7A42C2BCAF; Tue, 28 Apr 2026 23:34:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777419243; bh=Yc/e2ydR2bgKQuaSQ9GcEXY+tPnsGOUafO5/8Wy2NSw=; h=Date:From:To:Cc:Subject:References:From; b=jW0eZcd/h0iiRH6Jgdi4Y4pPvmDdUMAM9Rxrrari0B3o8uEewlQliigRWNnoCoV8d X4vjjlPnZr4BXNoqrsGrCA3+k+Q4KGJZQ1BbSorwoRAdVOUpmRXC5Qw7MT7lYnVEct 
qah6aL6qXAQHYVEItD7DpB7ep547UeH2mwUgLQURsFU5d+/9Vdb2HNV2aktJSV+VWE Qbh5DLDvOt3vhgiKiIysXD17i9ix2C3P0HWi9FIIU4wn5CXh99UaEGVoxdStzPwtin RxFTBLY+ujzMW1h2QLgSr+p/WdYdhLbAXnw6Exhm1dn3gDGC7VooJ2hySWA6VXmusz SAJN2f8Ll5UcQ== Date: Wed, 29 Apr 2026 01:33:59 +0200 Message-ID: <20260428224427.597838491@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 05/10] selftests/rseq: Skip tests if time slice extensions are not available References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Don't fail, skip the test if the extensions are not enabled at compile or runtime. Fixes: 830969e7821a ("selftests/rseq: Implement time slice extension test") Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- tools/testing/selftests/rseq/slice_test.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) --- a/tools/testing/selftests/rseq/slice_test.c +++ b/tools/testing/selftests/rseq/slice_test.c @@ -124,6 +124,13 @@ FIXTURE_SETUP(slice_ext) { cpu_set_t affinity; =20 + if (rseq_register_current_thread()) + SKIP(return, "RSEQ not supported\n"); + + if (prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_SET, + PR_RSEQ_SLICE_EXT_ENABLE, 0, 0)) + SKIP(return, "Time slice extension not supported\n"); + ASSERT_EQ(sched_getaffinity(0, sizeof(affinity), &affinity), 0); =20 /* Pin it on a single CPU. 
Avoid CPU 0 */ @@ -137,11 +144,6 @@ FIXTURE_SETUP(slice_ext) break; } =20 - ASSERT_EQ(rseq_register_current_thread(), 0); - - ASSERT_EQ(prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_SET, - PR_RSEQ_SLICE_EXT_ENABLE, 0, 0), 0); - self->noise_params.noise_nsecs =3D variant->noise_nsecs; self->noise_params.sleep_nsecs =3D variant->sleep_nsecs; self->noise_params.run =3D 1; From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 75D9C3FE656; Tue, 28 Apr 2026 23:34:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419248; cv=none; b=Jt5BMR8xSr/MLGIzfYglYiKFutwetVw5226/49Z8OE5pTpONXwCVPO2bM7oKmINoBu3jQHyelfPRlun7cq7lboEmK2/wysap1OSJvkb32+fC9cmcVslX/cDfJDWP8DdinIAMxuLA0HpFZPXfvmDysMTzVSoxI4uCREB5gJE29ys= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419248; c=relaxed/simple; bh=6Z4Fvlhn1E/ZUyj6g65xXQMFk9cke6Pn2IK07mAJm4s=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=F8HhWky/PJ4SHDgxRp2QFw1NnI/tcI+eOsEfjMqyEN+aZu88H3evW1zMQ+qJkXM64nCcIGDUbFsVrxuEBXfjl+ZxSYKvbPH4iFJldjYVjj7OqrzSfr4bUFS28Ogiutb6NfLV1HIncgo1bHpGYZrBMOFK8WXCpLdgZCJnNOcJuYc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DJn92r1c; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DJn92r1c" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7A92FC2BCB3; Tue, 28 Apr 2026 23:34:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; 
t=1777419248; bh=6Z4Fvlhn1E/ZUyj6g65xXQMFk9cke6Pn2IK07mAJm4s=; h=Date:From:To:Cc:Subject:References:From; b=DJn92r1cQ2NaDF2elSfTIevpQ3P9L8TsnL9Ax+ATAIJgiubxPAVFCnwAYliuxMrOl jlYD5/YrZ92jwY7omBDAT+JGzWM+CrjJeBPw8VzjQkrpUxjfVeVJ3RK34z4PBfD9Nb uAufRqM9ksLhSbQDxCZDyBBqtz8HTsKO81sFXWtzrBXgQQg41BkcGi9MTve/bJkxCz qolcQCXPFdjDNPTb2UQAH1jefZXOwL+Z+vc+aHg7Aez2N5Jjg98eHUHHRtijtJWH8y iyEmBuMM29Bt9cjbr6MlRBpR24txJfo3WDsTmkps8ABBoho8UtAF5GWBXM6iBVcG8e vCY3VueHatiSg== Date: Wed, 29 Apr 2026 01:34:04 +0200 Message-ID: <20260428224427.677889423@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 06/10] selftests/rseq: Make registration flexible for legacy and optimized mode References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" rseq_register_current_thread() either uses the glibc registered RSEQ region or registers its own region with the legacy size of 32 bytes. That worked so far, but becomes a problem when the kernel distinguishes between legacy and performance optimized behavior based on the registration size, as it does not allow testing both modes with the selftest suite. Add two arguments to the function: one to enforce that the registration does not use the libc provided registration and one to tell the registration to use the legacy size instead of the kernel advertised size. Rename it and make the original one an inline wrapper which preserves the existing behavior.
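A minimal sketch of the size selection and the libc ownership handling described above, mirroring the logic added by this patch. The helper names pick_rseq_alloc_size() and libc_owned_result() are hypothetical and exist only for illustration; the selftest implements this inline in __rseq_register_current_thread().

```c
#include <errno.h>
#include <stdbool.h>

/* Legacy allocation size of the rseq area, as used by the selftests. */
#define ORIG_RSEQ_ALLOC_SIZE 32u

/*
 * Hypothetical helper: pick the size handed to sys_rseq().
 * feature_size is the value advertised by the kernel, legacy forces
 * the original 32 byte registration.
 */
static unsigned int pick_rseq_alloc_size(unsigned int feature_size, bool legacy)
{
	if (legacy || feature_size < ORIG_RSEQ_ALLOC_SIZE)
		return ORIG_RSEQ_ALLOC_SIZE;
	return feature_size;
}

/*
 * Hypothetical helper: result when libc already owns the registration.
 * Callers which insist on their own registration get -EBUSY, everyone
 * else treats libc's ownership as success.
 */
static int libc_owned_result(bool nolibc)
{
	return nolibc ? -EBUSY : 0;
}
```

With a kernel advertised feature size of 33 bytes, the default wrapper registers 33 bytes, while a caller passing legacy=true registers 32 bytes and thereby selects legacy mode.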
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending") Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- tools/testing/selftests/rseq/rseq-abi.h | 7 ++++- tools/testing/selftests/rseq/rseq.c | 39 ++++++++++++++-------------= ----- tools/testing/selftests/rseq/rseq.h | 8 +++++- 3 files changed, 31 insertions(+), 23 deletions(-) --- a/tools/testing/selftests/rseq/rseq-abi.h +++ b/tools/testing/selftests/rseq/rseq-abi.h @@ -192,9 +192,14 @@ struct rseq_abi { struct rseq_abi_slice_ctrl slice_ctrl; =20 /* + * Place holder to push the size above 32 bytes. + */ + __u8 __reserved; + + /* * Flexible array member at end of structure, after last feature field. */ char end[]; -} __attribute__((aligned(4 * sizeof(__u64)))); +} __attribute__((aligned(256))); =20 #endif /* _RSEQ_ABI_H */ --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -56,6 +56,7 @@ ptrdiff_t rseq_offset; * unsuccessful. */ unsigned int rseq_size =3D -1U; +static unsigned int rseq_alloc_size; =20 /* Flags used during rseq registration. */ unsigned int rseq_flags; @@ -115,29 +116,17 @@ bool rseq_available(void) } } =20 -/* The rseq areas need to be at least 32 bytes. */ -static -unsigned int get_rseq_min_alloc_size(void) -{ - unsigned int alloc_size =3D rseq_size; - - if (alloc_size < ORIG_RSEQ_ALLOC_SIZE) - alloc_size =3D ORIG_RSEQ_ALLOC_SIZE; - return alloc_size; -} - /* * Return the feature size supported by the kernel. * * Depending on the value returned by getauxval(AT_RSEQ_FEATURE_SIZE): * - * 0: Return ORIG_RSEQ_FEATURE_SIZE (20) + * 0: Return ORIG_RSEQ_FEATURE_SIZE (20) * > 0: Return the value from getauxval(AT_RSEQ_FEATURE_SIZE). * * It should never return a value below ORIG_RSEQ_FEATURE_SIZE. 
*/ -static -unsigned int get_rseq_kernel_feature_size(void) +static unsigned int get_rseq_kernel_feature_size(void) { unsigned long auxv_rseq_feature_size, auxv_rseq_align; =20 @@ -152,15 +141,24 @@ unsigned int get_rseq_kernel_feature_siz return ORIG_RSEQ_FEATURE_SIZE; } =20 -int rseq_register_current_thread(void) +int __rseq_register_current_thread(bool nolibc, bool legacy) { + unsigned int size; int rc; =20 if (!rseq_ownership) { /* Treat libc's ownership as a successful registration. */ - return 0; + return nolibc ? -EBUSY : 0; } - rc =3D sys_rseq(&__rseq.abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG); + + /* The minimal allocation size is 32, which is the legacy allocation size= */ + size =3D get_rseq_kernel_feature_size(); + if (legacy || size < ORIG_RSEQ_ALLOC_SIZE) + rseq_alloc_size =3D ORIG_RSEQ_ALLOC_SIZE; + else + rseq_alloc_size =3D size; + + rc =3D sys_rseq(&__rseq.abi, rseq_alloc_size, 0, RSEQ_SIG); if (rc) { /* * After at least one thread has registered successfully @@ -179,9 +177,8 @@ int rseq_register_current_thread(void) * The first thread to register sets the rseq_size to mimic the libc * behavior. */ - if (RSEQ_READ_ONCE(rseq_size) =3D=3D 0) { - RSEQ_WRITE_ONCE(rseq_size, get_rseq_kernel_feature_size()); - } + if (RSEQ_READ_ONCE(rseq_size) =3D=3D 0) + RSEQ_WRITE_ONCE(rseq_size, size); =20 return 0; } @@ -194,7 +191,7 @@ int rseq_unregister_current_thread(void) /* Treat libc's ownership as a successful unregistration. */ return 0; } - rc =3D sys_rseq(&__rseq.abi, get_rseq_min_alloc_size(), RSEQ_ABI_FLAG_UNR= EGISTER, RSEQ_SIG); + rc =3D sys_rseq(&__rseq.abi, rseq_alloc_size, RSEQ_ABI_FLAG_UNREGISTER, R= SEQ_SIG); if (rc) return -1; return 0; --- a/tools/testing/selftests/rseq/rseq.h +++ b/tools/testing/selftests/rseq/rseq.h @@ -8,6 +8,7 @@ #ifndef RSEQ_H #define RSEQ_H =20 +#include #include #include #include @@ -142,7 +143,12 @@ static inline struct rseq_abi *rseq_get_ * succeed. 
A restartable sequence executed from a non-registered * thread will always fail. */ -int rseq_register_current_thread(void); +int __rseq_register_current_thread(bool nolibc, bool legacy); + +static inline int rseq_register_current_thread(void) +{ + return __rseq_register_current_thread(false, false); +} =20 /* * Unregister rseq for current thread. From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 78BF03FE656; Tue, 28 Apr 2026 23:34:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419253; cv=none; b=JqkjSwYRDQlGtqlMyE64wrX4TDSi5py4GkgFQwIcY0JC0HtYtALSRZdylKXNUtV2np8+p6Whpw1zgqhsu9AY358o2ntcZgZ7wzJRXU3lStOmHX7itImIveWS6cknYEbxySASrwYxoiT+giD3jezK15ggviLMkDnvr/prLtc8W6k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419253; c=relaxed/simple; bh=KUB0P71GMAxNRyp3vKQf0e41zNk0CHybFJQYtchF264=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=CPzwdcH0TRiEW+KiRiCSZlCQzCM8AcnIj74lX5o0hhgDIv1QjPYQXYJEdrqA6fzTmUgk+lTWUlm0IP9QBkgGJbJLgBt5bka+LIHlWcIIHpIFXIQYoBH7KhmMjL6cGkb+PyMDeZ9hMqCQ+AikLaLBYO1aLS72fPEHNFooVSzEL20= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fihXZBpp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fihXZBpp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C4BBFC2BCAF; Tue, 28 Apr 2026 23:34:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777419253; 
bh=KUB0P71GMAxNRyp3vKQf0e41zNk0CHybFJQYtchF264=; h=Date:From:To:Cc:Subject:References:From; b=fihXZBppNUszJdcyOfN//BpAVUbEDaPAR32Xi+EtMeWyZPG4dwPd/FK56wC6s3Bua gzsuw1q5YV94jYI9BaOgd94I5U+xvZdpwB1xPijyXoESVKO1apDSnORVk0rwjJvKXP JYfjGR5+jWCTHO4Yry2WrohDf7i1mmQLH8yCAG/gxXwXf1tYVFI06CW6jcwHCfyUob vARJT5rP3+tI3zeO3d845SEy+GmcLBlzpeK6ToHsWSCeMhlq6zR7uuInGE+G9kbJmL U2dCQmnqeXKuBXpnTOyX6uzPgg4aCtGPGWy7rEOL/yjIesFBI9/H3aauyfihZP0QEV wdK78LixJR8UQ== Date: Wed, 29 Apr 2026 01:34:10 +0200 Message-ID: <20260428224427.764705536@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 07/10] selftests/rseq: Validate legacy behavior References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The RSEQ legacy mode behavior requires that the ID fields in the rseq region are unconditionally updated on every context switch and before signal delivery even if not required by the ABI specification. To ensure that this behavior is preserved for legacy users in the future, add a test which validates that with a sleep() and a signal sent to self. Provide a run script which prevents GLIBC from registering an RSEQ region, so that the test can register its own legacy sized region.
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending") Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- tools/testing/selftests/rseq/Makefile | 4 - tools/testing/selftests/rseq/legacy_check.c | 65 ++++++++++++++++++= +++++ tools/testing/selftests/rseq/run_legacy_check.sh | 4 + 3 files changed, 71 insertions(+), 2 deletions(-) --- a/tools/testing/selftests/rseq/Makefile +++ b/tools/testing/selftests/rseq/Makefile @@ -17,11 +17,11 @@ OVERRIDE_TARGETS =3D 1 TEST_GEN_PROGS =3D basic_test basic_percpu_ops_test basic_percpu_ops_mm_ci= d_test param_test \ param_test_benchmark param_test_compare_twice param_test_mm_cid \ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \ - syscall_errors_test slice_test + syscall_errors_test slice_test legacy_check =20 TEST_GEN_PROGS_EXTENDED =3D librseq.so =20 -TEST_PROGS =3D run_param_test.sh run_syscall_errors_test.sh +TEST_PROGS =3D run_param_test.sh run_syscall_errors_test.sh run_legacy_che= ck.sh =20 TEST_FILES :=3D settings =20 --- /dev/null +++ b/tools/testing/selftests/rseq/legacy_check.c @@ -0,0 +1,65 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _GNU_SOURCE +#define _GNU_SOURCE +#endif + +#include +#include +#include +#include + +#include "rseq.h" + +#include "../kselftest_harness.h" + +FIXTURE(legacy) +{ +}; + +static int cpu_id_in_sigfn =3D -1; + +static void sigfn(int sig) +{ + struct rseq_abi *rs =3D rseq_get_abi(); + + cpu_id_in_sigfn =3D rs->cpu_id_start; +} + +FIXTURE_SETUP(legacy) +{ + int res =3D __rseq_register_current_thread(true, true); + + switch (res) { + case -ENOSYS: + SKIP(return, "RSEQ not enabled\n"); + case -EBUSY: + SKIP(return, "GLIBC owns RSEQ. 
Disable GLIBC RSEQ registration\n"); + default: + ASSERT_EQ(res, 0); + } + + ASSERT_NE(signal(SIGUSR1, sigfn), SIG_ERR); +} + +FIXTURE_TEARDOWN(legacy) +{ +} + +TEST_F(legacy, legacy_test) +{ + struct rseq_abi *rs =3D rseq_get_abi(); + + ASSERT_NE(rs, NULL); + + /* Overwrite rs::cpu_id_start */ + rs->cpu_id_start =3D -1; + sleep(1); + ASSERT_NE(rs->cpu_id_start, -1); + + rs->cpu_id_start =3D -1; + ASSERT_EQ(raise(SIGUSR1), 0); + ASSERT_NE(rs->cpu_id_start, -1); + ASSERT_NE(cpu_id_in_sigfn, -1); +} + +TEST_HARNESS_MAIN --- /dev/null +++ b/tools/testing/selftests/rseq/run_legacy_check.sh @@ -0,0 +1,4 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +GLIBC_TUNABLES=3D"${GLIBC_TUNABLES:-}:glibc.pthread.rseq=3D0" ./legacy_che= ck From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D5CB938B147; Tue, 28 Apr 2026 23:34:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419258; cv=none; b=jZRc8QeDsdPuYY8gyd+5L7cu/bHbiWGGBoxoEcJEFy254kDq/xqfnVEoSKGjBdpO/MdnOVjNK1XbMaQGUuev9rdCNH/az6ZWJqwbDFW0ZaPGY3EeoEAIfz56Opt51br8Y1Ruz+ZxQLqza2QQYxpKh5uEJexldiiW0ImQF5iyIqg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419258; c=relaxed/simple; bh=VOV//gWQkHADVLYCgIfwzZhW3FXmMvBwfAlH8Np/HUA=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=YM+WrGX1G5KFoXYDUqKH2ThPCb6ZHu79SkjjKQiw4qzbV7zzeL1k7grKv3/ZFJok8mcHbFJ/dnUlN6liv3jL6DGe8UGV0MLXCcfbk59DNqiPQz+k8HWRV9uMhrjGNuSQiSE+J/CqMjLbMUU4ltjipRXJvwRpjmFoZNYQ3P1N+e4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PXWvOoar; arc=none 
smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PXWvOoar" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E2557C2BCAF; Tue, 28 Apr 2026 23:34:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777419258; bh=VOV//gWQkHADVLYCgIfwzZhW3FXmMvBwfAlH8Np/HUA=; h=Date:From:To:Cc:Subject:References:From; b=PXWvOoarTT7mMHSp04TtzWAGxpts7zSy5mpyu647WXK1xtMmlLJoCk+U84PxoJaTo CidrQJlQMKs74UQk0+UNbPEXaYuNqML5byOgBZyFHFkIyLYpNpTt7qCkeL5TDwmGuc ++W15qofgto6pyYJKnO0e9FWyvWemeam3KNnJkWyJznZOXpXDTioldomUpwthB3m0f zC7D5O1rLA2X1hbdsHQnAO7dWdakFBaZpadBye6na75zs/zG5G8cWEOsRw2q2Ano3Z wGAzgAi3d+ulCTpI7ajwChVupvy8zhfo1RpZaQlb7wfsK358K/cUga0d1KkwnAbu8b Oi99PTEYp/77Q== Date: Wed, 29 Apr 2026 01:34:15 +0200 Message-ID: <20260428224427.845230956@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 08/10] rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The optimized RSEQ V2 mode requires that user space adheres to the ABI specification and does not modify the read-only fields cpu_id_start, cpu_id, node_id and mm_cid behind the kernel's back. While the kernel does not rely on these fields, the adherence to this is a fundamental prerequisite to allow multiple entities, e.g. 
libraries, in an application to utilize the full potential of RSEQ without stepping on each other's toes. Validate this adherence on every update of these fields. If the kernel detects that user space modified the fields, the application is forcefully terminated. Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension") Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- include/linux/rseq_entry.h | 71 +++++++++++++++++-----------------------= ----- 1 file changed, 28 insertions(+), 43 deletions(-) --- a/include/linux/rseq_entry.h +++ b/include/linux/rseq_entry.h @@ -248,7 +248,6 @@ static __always_inline bool rseq_grant_s #endif /* !CONFIG_RSEQ_SLICE_EXTENSION */ =20 bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs= , unsigned long csaddr); -bool rseq_debug_validate_ids(struct task_struct *t); =20 static __always_inline void rseq_note_user_irq_entry(void) { @@ -368,43 +367,6 @@ bool rseq_debug_update_user_cs(struct ta return false; } =20 -/* - * On debug kernels validate that user space did not mess with it if the - * debug branch is enabled. - */ -bool rseq_debug_validate_ids(struct task_struct *t) -{ - struct rseq __user *rseq =3D t->rseq.usrptr; - u32 cpu_id, uval, node_id; - - /* - * On the first exit after registering the rseq region CPU ID is - * RSEQ_CPU_ID_UNINITIALIZED and node_id in user space is 0! - */ - node_id =3D t->rseq.ids.cpu_id !=3D RSEQ_CPU_ID_UNINITIALIZED ?
- cpu_to_node(t->rseq.ids.cpu_id) : 0; - - scoped_user_read_access(rseq, efault) { - unsafe_get_user(cpu_id, &rseq->cpu_id_start, efault); - if (cpu_id !=3D t->rseq.ids.cpu_id) - goto die; - unsafe_get_user(uval, &rseq->cpu_id, efault); - if (uval !=3D cpu_id) - goto die; - unsafe_get_user(uval, &rseq->node_id, efault); - if (uval !=3D node_id) - goto die; - unsafe_get_user(uval, &rseq->mm_cid, efault); - if (uval !=3D t->rseq.ids.mm_cid) - goto die; - } - return true; -die: - t->rseq.event.fatal =3D true; -efault: - return false; -} - #endif /* RSEQ_BUILD_SLOW_PATH */ =20 /* @@ -519,12 +481,32 @@ bool rseq_set_ids_get_csaddr(struct task { struct rseq __user *rseq =3D t->rseq.usrptr; =20 - if (static_branch_unlikely(&rseq_debug_enabled)) { - if (!rseq_debug_validate_ids(t)) - return false; - } - scoped_user_rw_access(rseq, efault) { + /* Validate the R/O fields for debug and optimized mode */ + if (static_branch_unlikely(&rseq_debug_enabled) || rseq_v2(t)) { + u32 cpu_id, uval, node_id; + + /* + * On the first exit after registering the rseq region CPU ID is + * RSEQ_CPU_ID_UNINITIALIZED and node_id in user space is 0! + */ + node_id =3D t->rseq.ids.cpu_id !=3D RSEQ_CPU_ID_UNINITIALIZED ? 
+ cpu_to_node(t->rseq.ids.cpu_id) : 0; + + unsafe_get_user(cpu_id, &rseq->cpu_id_start, efault); + if (cpu_id !=3D t->rseq.ids.cpu_id) + goto die; + unsafe_get_user(uval, &rseq->cpu_id, efault); + if (uval !=3D cpu_id) + goto die; + unsafe_get_user(uval, &rseq->node_id, efault); + if (uval !=3D node_id) + goto die; + unsafe_get_user(uval, &rseq->mm_cid, efault); + if (uval !=3D t->rseq.ids.mm_cid) + goto die; + } + unsafe_put_user(ids->cpu_id, &rseq->cpu_id_start, efault); unsafe_put_user(ids->cpu_id, &rseq->cpu_id, efault); unsafe_put_user(node_id, &rseq->node_id, efault); @@ -543,6 +525,9 @@ bool rseq_set_ids_get_csaddr(struct task rseq_stat_inc(rseq_stats.ids); rseq_trace_update(t, ids); return true; + +die: + t->rseq.event.fatal =3D true; efault: return false; } From nobody Wed Jun 17 01:32:43 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF54942EEDE; Tue, 28 Apr 2026 23:34:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419264; cv=none; b=N3zpntk4kRzy43uDY+p/a/DRJb4O1UThxFfUCd65vYYIRy67vYbVkIILpVxI0lYO+Y1/RtKsE5Z8utKp5LwW2wl1gZGLvnseMu8/JT7Tz77uqNkDZTBGTqUtKRb4o06zARJ1dxxicyNskxsOMIlrj2q89XBdPf1rmh8LiCkNUFI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777419264; c=relaxed/simple; bh=o8q2gZuKW7aHfrJCUW9cA+6eBOYeytciXqIMGBUXqjk=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=JwdAGFSch7BNFiaWUzxzPn4L5WCLYHxtbJU1WZacJSn3NMkBPc/fVgx5fbB8hJZ2Z0FuoOQ1jmMRgCRK5Ewx8uvB7kRRrxequTNJVDUEz7cukL4NQ3jVoOHeqAYfbFy3G6w5/PHoeIZ1tGwVRKeJkM8jFUGikqLNX/UpzsR5GYw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org header.b=angFWXAt; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="angFWXAt" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 28830C2BCAF; Tue, 28 Apr 2026 23:34:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777419263; bh=o8q2gZuKW7aHfrJCUW9cA+6eBOYeytciXqIMGBUXqjk=; h=Date:From:To:Cc:Subject:References:From; b=angFWXAtLdtauswQRUmXHU+7IUwWCO1kHS3oEtJFfNx5H/gMid0aaEdCfh/+YUtTq YTTHIsRJ9h/bxGULdZxXhzbORu8l/U+NqCmZFF8RIleQFXG/Y1ckJe3UXrt0d/Xe3K E5hQRmRWSbmGez/9U4ifU23KpWEzW4Q/Hc/cDAK31+trcRmz78DrmgBGc6u4j1uOUX vZPekgzs4TpdG2oow6RCry6FKgKkYTCLTW638svdquv6I7Kt8+QD59dh/8gL8fCAV0 Vd3xP8FMQf90W8lYQyKPAZzd3aqiyHEVAWCGea19S571WVBLW2DtSuXaXVXs2yvCW9 3PTf1qUzQVwCA== Date: Wed, 29 Apr 2026 01:34:20 +0200 Message-ID: <20260428224427.927160119@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra , linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers , Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar , Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg Kroah-Hartman , Linus Torvalds Subject: [patch 09/10] rseq: Reenable performance optimizations conditionally References: <20260428221058.149538293@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Due to the incompatibility with TCMalloc the RSEQ optimizations and extended features (time slice extensions) have been disabled and made run-time conditional. The original RSEQ implementation, which TCMalloc depends on, registers a 32 byte region (ORIG_RSEQ_SIZE). This region has a 32 byte alignment requirement.
The extension safe newer variant exposes the kernel RSEQ feature size via getauxval(AT_RSEQ_FEATURE_SIZE) and the alignment requirement via getauxval(AT_RSEQ_ALIGN). The alignment requirement is that the registered RSEQ region is aligned to the feature size rounded up to the next power of two. The kernel currently has a feature size of 33 bytes, which means the alignment requirement is 64 bytes. The TCMalloc RSEQ region is embedded into a cache line aligned data structure starting at offset 32 bytes so that bytes 28-31 and the cpu_id_start field at bytes 32-35 form a 64-bit little endian pointer with the top-most bit (bit 63) set to check whether the kernel has overwritten cpu_id_start with an actual CPU id value, which is guaranteed to not have the top-most bit set. As this is part of their performance tuned magic, it's a pretty safe assumption that TCMalloc won't use a larger RSEQ size. This allows the kernel to treat registrations with a size greater than the original size of 32 bytes, which is the case since time slice extensions got introduced, as RSEQ ABI v2 with the following differences to the original behaviour: 1) Unconditional updates of the user read-only fields (CPU, node, MMCID) are removed. Those fields are only updated on registration, task migration and MMCID changes. 2) Unconditional evaluation of the critical section pointer is removed. It's only evaluated when user space was interrupted and was scheduled out or before delivering a signal in the interrupted context. 3) The read-only requirement of the ID fields is enforced. When the kernel detects that userspace manipulated the fields, the process is terminated. This ensures that multiple entities (libraries) can utilize RSEQ without interfering. 4) Today's extended RSEQ feature (time slice extensions) and future extensions are only enabled in the v2 enabled mode.
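As an illustration of the alignment rule and the size based mode selection described above, a minimal sketch follows. The helper names rseq_required_align() and rseq_abi_version() are hypothetical and not kernel functions; the generic_entry parameter is an assumption standing in for the kernel's build time CONFIG_GENERIC_IRQ_ENTRY check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original (v1) registration size in bytes, as stated above. */
#define ORIG_RSEQ_SIZE 32u

/*
 * Hypothetical helper: round the feature size up to the next power of
 * two, which is the alignment requirement described above. The loop is
 * capped at 2^31 so that the shift cannot wrap to zero.
 */
static uint32_t rseq_required_align(uint32_t feature_size)
{
	uint32_t align = 1;

	while (align < feature_size && align < 0x80000000u)
		align <<= 1;
	return align;
}

/*
 * Hypothetical helper: ABI version implied by the registration length.
 * Only registrations larger than the original 32 bytes select v2, and
 * only when the generic entry code is available.
 */
static unsigned int rseq_abi_version(uint32_t rseq_len, bool generic_entry)
{
	return (generic_entry && rseq_len > ORIG_RSEQ_SIZE) ? 2 : 1;
}
```

For the current feature size of 33 bytes this yields an alignment of 64 bytes and ABI v2, while a 32 byte registration stays at 32 byte alignment and ABI v1.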
Registrations with the original size of 32 bytes operate in backwards compatible legacy mode without performance improvements and extended features. Unfortunately that also affects users of older GLIBC versions which register the original size of 32 bytes and do not evaluate the kernel required size in the auxiliary vector AT_RSEQ_FEATURE_SIZE. That's the result of the lack of enforcement in the original implementation and the unwillingness of a single entity to cooperate with the larger ecosystem for many years. Implement the required registration changes by restructuring the spaghetti code and adding the size/version check. Also add documentation about the differences of legacy and optimized RSEQ V2 mode. Thanks to Mathieu for pointing out the ORIG_RSEQ_SIZE constraints! Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension") Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Dmitry Vyukov --- Documentation/userspace-api/rseq.rst | 94 ++++++++++++++++++++++ kernel/rseq.c | 144 ++++++++++++++++++++----------= ----- 2 files changed, 178 insertions(+), 60 deletions(-) --- a/Documentation/userspace-api/rseq.rst +++ b/Documentation/userspace-api/rseq.rst @@ -24,6 +24,97 @@ Quick access to CPU number, node ID Allows to implement per CPU data efficiently. Documentation is in code and selftests. :( =20 +Optimized RSEQ V2 +----------------- + +On architectures which utilize the generic entry code and generic TIF bits +the kernel supports runtime optimizations for RSEQ, which also enable +enhanced features like scheduler time slice extensions. + +To enable them a task has to register the RSEQ region with at least the +length advertised by getauxval(AT_RSEQ_FEATURE_SIZE). + +If existing binaries register with RSEQ_ORIG_SIZE (32 bytes), the kernel +keeps the legacy low performance mode enabled to fulfil the expectations +of existing users regarding the original RSEQ implementation behaviour. 
+ +The following table documents the ABI and behavioral guarantees of the +legacy and the optimized V2 mode. + +.. list-table:: RSEQ modes + :header-rows: 1 + + * - Nr + - What + + - Legacy + - Optimized V2 + + * - 1 + - The cpu_id_start, cpu_id, node_id and mm_cid fields (User mode read + only) + .. Legacy + - Updated by the kernel unconditionally after each context switch and + before signal delivery + .. Optimized V2 + - Updated by the kernel if and only if they change, i.e. if the task + is migrated or mm_cid changes + + * - 2 + - The rseq_cs critical section field + .. Legacy + - Evaluated and handled unconditionally after each context switch and + before signal delivery + .. Optimized V2 + - Evaluated and handled conditionally only when user space was + interrupted and was scheduled out or before delivering a signal in + the interrupted context. + + * - 3 + - Read only fields + .. Legacy + - No strict enforcement except in debug mode + .. Optimized V2 + - Strict enforcement + + * - 4 + - membarrier(...RSEQ) + .. Legacy + - All running threads of the process are interrupted and the ID fields + are rewritten and eventually active critical sections are aborted + before they return to user space. All threads which are scheduled + out whether voluntary or not are covered by #1/#2 above. + .. Optimized V2 + - All running threads of the process are interrupted and eventually + active critical sections are aborted before these threads return to + user space. The ID fields are only updated if changed as a + consequence of the interrupt. All threads which are scheduled out + whether voluntary or not are covered by #1/#2 above. + + * - 5 + - Time slice extensions + .. Legacy + - Not supported + .. Optimized V2 + - Supported + +The legacy mode is obviously less performant as it does unconditional +updates and critical section checks even if not strictly required by the +ABI contract. 
That can't be changed anymore as some users depend on that +observed behavior, which in turn enables them to violate the ABI and +overwrite the cpu_id_start field for their own purposes. This is obviously +discouraged as it renders RSEQ incompatible with the intended usage and +breaks the expectation of other libraries in the same application. + +The ABI compliant optimized v2 mode, which respects the read only fields, +does not require unconditional updates and therefore is way more +performant. The kernel validates the read only fields for compliance. If +user space modifies them, the process is killed. Compliant usage allows +multiple libraries in the same application to benefit from the RSEQ +functionality without disturbing each other. The ABI compliant optimized v2 +mode also enables extended RSEQ features like time slice extensions. + + Scheduler time slice extensions ------------------------------- =20 @@ -37,7 +128,8 @@ scheduled out inside of the critical sec =20 * Enabled at boot time (default is enabled) =20 - * A rseq userspace pointer has been registered for the thread + * A rseq userspace pointer has been registered for the thread in + optimized V2 mode =20 The thread has to enable the functionality via prctl(2):: =20 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -413,70 +413,23 @@ static bool rseq_reset_ids(void) /* The original rseq structure size (including padding) is 32 bytes. */ #define ORIG_RSEQ_SIZE 32 =20 -/* - * sys_rseq - setup restartable sequences for caller thread. - */ -SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int, flag= s, u32, sig) +static long rseq_register(struct rseq __user * rseq, u32 rseq_len, int fla= gs, u32 sig) { u32 rseqfl =3D 0; u8 version =3D 1; =20 - if (flags & RSEQ_FLAG_UNREGISTER) { - if (flags & ~RSEQ_FLAG_UNREGISTER) - return -EINVAL; - /* Unregister rseq for current thread. 
*/ - if (current->rseq.usrptr !=3D rseq || !current->rseq.usrptr) - return -EINVAL; - if (rseq_len !=3D current->rseq.len) - return -EINVAL; - if (current->rseq.sig !=3D sig) - return -EPERM; - if (!rseq_reset_ids()) - return -EFAULT; - rseq_reset(current); - return 0; - } - - if (unlikely(flags & ~(RSEQ_FLAG_SLICE_EXT_DEFAULT_ON))) - return -EINVAL; - - if (current->rseq.usrptr) { - /* - * If rseq is already registered, check whether - * the provided address differs from the prior - * one. - */ - if (current->rseq.usrptr !=3D rseq || rseq_len !=3D current->rseq.len) - return -EINVAL; - if (current->rseq.sig !=3D sig) - return -EPERM; - /* Already registered. */ - return -EBUSY; - } - - /* - * If there was no rseq previously registered, ensure the provided rseq - * is properly aligned, as communcated to user-space through the ELF - * auxiliary vector AT_RSEQ_ALIGN. If rseq_len is the original rseq - * size, the required alignment is the original struct rseq alignment. - * - * The rseq_len is required to be greater or equal to the original rseq - * size. In order to be valid, rseq_len is either the original rseq size, - * or large enough to contain all supported fields, as communicated to - * user-space through the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE. - */ - if (rseq_len < ORIG_RSEQ_SIZE || - (rseq_len =3D=3D ORIG_RSEQ_SIZE && !IS_ALIGNED((unsigned long)rseq, O= RIG_RSEQ_SIZE)) || - (rseq_len !=3D ORIG_RSEQ_SIZE && (!IS_ALIGNED((unsigned long)rseq, rs= eq_alloc_align()) || - rseq_len < offsetof(struct rseq, end)))) - return -EINVAL; if (!access_ok(rseq, rseq_len)) return -EFAULT; =20 /* - * The version check effectivly disables time slice extensions until the - * RSEQ ABI V2 registration are implemented. 
+	 * On architectures which use the generic IRQ entry code, a registration
+	 * with an @rseq_len greater than the original fixed v1 size (already
+	 * validated by the caller) selects the optimized v2 ABI mode, which
+	 * also enables extended RSEQ features beyond MMCID.
 	 */
+	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY) && rseq_len > ORIG_RSEQ_SIZE)
+		version = 2;
+
 	if (IS_ENABLED(CONFIG_RSEQ_SLICE_EXTENSION) && version > 1) {
 		if (rseq_slice_extension_enabled()) {
 			rseqfl |= RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE;
@@ -524,11 +477,10 @@ SYSCALL_DEFINE4(rseq, struct rseq __user
 #endif
 
 	/*
-	 * If rseq was previously inactive, and has just been
-	 * registered, ensure the cpu_id_start and cpu_id fields
-	 * are updated before returning to user-space.
+	 * Ensure the cpu_id_start and cpu_id fields are updated before
+	 * returning to user-space.
 	 */
-	current->rseq.event.has_rseq = true;
+	current->rseq.event.has_rseq = version;
 	rseq_force_update();
 	return 0;
 
@@ -536,6 +488,80 @@ SYSCALL_DEFINE4(rseq, struct rseq __user
 	return -EFAULT;
 }
 
+static long rseq_unregister(struct rseq __user * rseq, u32 rseq_len, int flags, u32 sig)
+{
+	if (flags & ~RSEQ_FLAG_UNREGISTER)
+		return -EINVAL;
+	if (current->rseq.usrptr != rseq || !current->rseq.usrptr)
+		return -EINVAL;
+	if (rseq_len != current->rseq.len)
+		return -EINVAL;
+	if (current->rseq.sig != sig)
+		return -EPERM;
+	if (!rseq_reset_ids())
+		return -EFAULT;
+	rseq_reset(current);
+	return 0;
+}
+
+static long rseq_reregister(struct rseq __user * rseq, u32 rseq_len, u32 sig)
+{
+	/*
+	 * If rseq is already registered, check whether the provided address
+	 * differs from the prior one.
+	 */
+	if (current->rseq.usrptr != rseq || rseq_len != current->rseq.len)
+		return -EINVAL;
+	if (current->rseq.sig != sig)
+		return -EPERM;
+	/* Already registered. */
+	return -EBUSY;
+}
+
+static bool rseq_length_valid(struct rseq __user *rseq, unsigned int rseq_len)
+{
+	/*
+	 * Ensure the provided rseq is properly aligned, as communicated to
+	 * user-space through the ELF auxiliary vector AT_RSEQ_ALIGN. If
+	 * rseq_len is the original rseq size, the required alignment is the
+	 * original struct rseq alignment.
+	 *
+	 * In order to be valid, rseq_len is either the original rseq size, or
+	 * large enough to contain all supported fields, as communicated to
+	 * user-space through the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE.
+	 */
+	if (rseq_len < ORIG_RSEQ_SIZE)
+		return false;
+
+	if (rseq_len == ORIG_RSEQ_SIZE)
+		return IS_ALIGNED((unsigned long)rseq, ORIG_RSEQ_SIZE);
+
+	return IS_ALIGNED((unsigned long)rseq, rseq_alloc_align()) &&
+	       rseq_len >= offsetof(struct rseq, end);
+}
+
+#define RSEQ_FLAGS_SUPPORTED	(RSEQ_FLAG_SLICE_EXT_DEFAULT_ON)
+
+/*
+ * sys_rseq - Register or unregister restartable sequences for the caller thread.
+ */
+SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int, flags, u32, sig)
+{
+	if (flags & RSEQ_FLAG_UNREGISTER)
+		return rseq_unregister(rseq, rseq_len, flags, sig);
+
+	if (unlikely(flags & ~RSEQ_FLAGS_SUPPORTED))
+		return -EINVAL;
+
+	if (current->rseq.usrptr)
+		return rseq_reregister(rseq, rseq_len, sig);
+
+	if (!rseq_length_valid(rseq, rseq_len))
+		return -EINVAL;
+
+	return rseq_register(rseq, rseq_len, flags, sig);
+}
+
 #ifdef CONFIG_RSEQ_SLICE_EXTENSION
 struct slice_timer {
 	struct hrtimer	timer;

From nobody Wed Jun 17 01:32:43 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D489F44B675;
	Tue, 28 Apr 2026 23:34:28 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256;
	d=subspace.kernel.org; s=arc-20240116; t=1777419268; cv=none;
	b=Cj7L6sQcBEYE3ONpr4g7Q4bzHT9SyElxfkh4YR0Wt8UE7djdcDrBGZ2e6Da4lk2JRBm8Y+i4HFht8WHpGAC+7bMJkI0jubgbTtocvmkDHs+HXtC4YltyAq3l4z1XXyGjaf+SqzBTgMmCnAA6XXKQD6GXn5fcKKMoxm+0RNCk++Y=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1777419268; c=relaxed/simple;
	bh=qF5P88ZpPi9v84Qdsyp2IIlMFiao/NGHMWllPD/x1bw=;
	h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version:
	 Content-Type;
	b=B5CFNnO+owZAu/TntPBFBIwwqeTef4zK7cvGSmLTaHrpR2jOkvpTEm8yErUmFlKwLDTj84N/cuQgMRD2UWXny8aexpNf8LRXwbgrl2cX/F+leKuYAvonTvTea27ZDDqn8VYQc4NHdDA1PDTV9NFLxh8OBCvzZkV18UA9pvO7eqQ=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TmlxxJEI;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TmlxxJEI"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2893CC2BCAF;
	Tue, 28 Apr 2026 23:34:27 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1777419268;
	bh=qF5P88ZpPi9v84Qdsyp2IIlMFiao/NGHMWllPD/x1bw=;
	h=Date:From:To:Cc:Subject:References:From;
	b=TmlxxJEI76GrXqYVc5oqf7nL1R9DbPA5j0WzrkVRxK2MC6SMsGXZMR97I13+BPwp5
	 C8+L7ECtGMozURoMEB8vNOKpYpbFEChtHOJPRECUVQ9PS9tV0PyZ9fYwrsqR/KviJ+
	 fNwWla1y5AN3XxVRdA6AdT1uPGaGo/mOjKSqGorfoopF/Lr93EtfDz9CE3wpWLjsis
	 qRSuFemaOOYlQA+vbHuUn8JQW9oyZjfCto6JKyUsw6HurudobURtrAt6u7VmQGxQPC
	 b+za0pPqrQngtm+TAkSrUK0qGzNJlaoLl2ZHC1hH04VhNCtyG0G1k/uId3xJhMtuHs
	 bX5+Xz9e8BJrA==
Date: Wed, 29 Apr 2026 01:34:25 +0200
Message-ID: <20260428224428.009121296@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Mathias Stearn , Dmitry Vyukov , Peter Zijlstra ,
	linux-man@vger.kernel.org, Mark Rutland , Mathieu Desnoyers ,
	Chris Kennelly , regressions@lists.linux.dev, Ingo Molnar ,
	Blake Oler , Florian Weimer , Rich Felker , Matthew Wilcox , Greg
	Kroah-Hartman , Linus Torvalds
Subject: [patch 10/10] selftests/rseq: Expand for optimized RSEQ ABI v2
References: <20260428221058.149538293@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Update the selftests so they are executed for legacy (32-byte RSEQ region)
and optimized RSEQ ABI v2 mode.

Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension")
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Vyukov
---
 tools/testing/selftests/rseq/Makefile              |    7 ++-
 tools/testing/selftests/rseq/check_optimized.c     |   17 +++++++++
 tools/testing/selftests/rseq/param_test.c          |   22 +++++++----
 tools/testing/selftests/rseq/run_param_test.sh     |   39 +++++++++++++++++++++
 tools/testing/selftests/rseq/run_timeslice_test.sh |   14 +++++++
 tools/testing/selftests/rseq/slice_test.c          |    2 -
 6 files changed, 89 insertions(+), 12 deletions(-)

--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,11 +17,11 @@ OVERRIDE_TARGETS = 1
 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
 		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
-		syscall_errors_test slice_test legacy_check
+		syscall_errors_test slice_test legacy_check check_optimized
 
 TEST_GEN_PROGS_EXTENDED = librseq.so
 
-TEST_PROGS = run_param_test.sh run_syscall_errors_test.sh run_legacy_check.sh
+TEST_PROGS = run_param_test.sh run_syscall_errors_test.sh run_legacy_check.sh run_timeslice_test.sh
 
 TEST_FILES := settings
 
@@ -62,3 +62,6 @@ include ../lib.mk
 
 $(OUTPUT)/slice_test: slice_test.c $(TEST_GEN_PROGS_EXTENDED) rseq.h rseq-*.h
 	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -o $@
+
+$(OUTPUT)/check_optimized: check_optimized.c $(TEST_GEN_PROGS_EXTENDED) rseq.h rseq-*.h
+	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -o $@
--- /dev/null
+++ b/tools/testing/selftests/rseq/check_optimized.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "rseq.h"
+
+int main(int argc, char **argv)
+{
+	if (__rseq_register_current_thread(true, false))
+		return -1;
+	return 0;
+}
--- a/tools/testing/selftests/rseq/param_test.c
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -38,7 +38,7 @@ static int opt_modulo, verbose;
 static int opt_yield, opt_signal, opt_sleep,
 		opt_disable_rseq, opt_threads = 200,
 		opt_disable_mod = 0, opt_test = 's';
-
+static bool opt_rseq_legacy;
 static long long opt_reps = 5000;
 
 static __thread __attribute__((tls_model("initial-exec")))
@@ -481,7 +481,7 @@ void *test_percpu_spinlock_thread(void *
 	long long i, reps;
 
 	if (!opt_disable_rseq && thread_data->reg &&
-	    rseq_register_current_thread())
+	    __rseq_register_current_thread(true, opt_rseq_legacy))
 		abort();
 	reps = thread_data->reps;
 	for (i = 0; i < reps; i++) {
@@ -558,7 +558,7 @@ void *test_percpu_inc_thread(void *arg)
 	long long i, reps;
 
 	if (!opt_disable_rseq && thread_data->reg &&
-	    rseq_register_current_thread())
+	    __rseq_register_current_thread(true, opt_rseq_legacy))
 		abort();
 	reps = thread_data->reps;
 	for (i = 0; i < reps; i++) {
@@ -712,7 +712,7 @@ void *test_percpu_list_thread(void *arg)
 	long long i, reps;
 	struct percpu_list *list = (struct percpu_list *)arg;
 
-	if (!opt_disable_rseq && rseq_register_current_thread())
+	if (!opt_disable_rseq && __rseq_register_current_thread(true, opt_rseq_legacy))
 		abort();
 
 	reps = opt_reps;
@@ -895,7 +895,7 @@ void *test_percpu_buffer_thread(void *ar
 	long long i, reps;
 	struct percpu_buffer *buffer = (struct percpu_buffer *)arg;
 
-	if (!opt_disable_rseq && rseq_register_current_thread())
+	if (!opt_disable_rseq && __rseq_register_current_thread(true, opt_rseq_legacy))
 		abort();
 
 	reps = opt_reps;
@@ -1105,7 +1105,7 @@ void *test_percpu_memcpy_buffer_thread(v
 	long long i, reps;
 	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
 
-	if (!opt_disable_rseq && rseq_register_current_thread())
+	if (!opt_disable_rseq && __rseq_register_current_thread(true, opt_rseq_legacy))
 		abort();
 
 	reps = opt_reps;
@@ -1258,7 +1258,7 @@ void *test_membarrier_worker_thread(void
 	const int iters = opt_reps;
 	int i;
 
-	if (rseq_register_current_thread()) {
+	if (__rseq_register_current_thread(true, opt_rseq_legacy)) {
 		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
 			errno, strerror(errno));
 		abort();
@@ -1323,7 +1323,7 @@ void *test_membarrier_manager_thread(voi
 	intptr_t expect_a = 0, expect_b = 0;
 	int cpu_a = 0, cpu_b = 0;
 
-	if (rseq_register_current_thread()) {
+	if (__rseq_register_current_thread(true, opt_rseq_legacy)) {
 		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
 			errno, strerror(errno));
 		abort();
@@ -1475,6 +1475,7 @@ static void show_usage(int argc, char **
 	printf("	[-D M] Disable rseq for each M threads\n");
 	printf("	[-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement, membarrie(r)\n");
 	printf("	[-M] Push into buffer and memcpy buffer with memory barriers.\n");
+	printf("	[-L] Test with legacy RSEQ\n");
 	printf("	[-v] Verbose output.\n");
 	printf("	[-h] Show this help.\n");
 	printf("\n");
@@ -1602,6 +1603,9 @@ int main(int argc, char **argv)
 		case 'M':
 			opt_mo = RSEQ_MO_RELEASE;
 			break;
+		case 'L':
+			opt_rseq_legacy = true;
+			break;
 		default:
 			show_usage(argc, argv);
 			goto error;
@@ -1618,7 +1622,7 @@ int main(int argc, char **argv)
 	if (set_signal_handler())
 		goto error;
 
-	if (!opt_disable_rseq && rseq_register_current_thread())
+	if (!opt_disable_rseq && __rseq_register_current_thread(true, opt_rseq_legacy))
 		goto error;
 	if (!opt_disable_rseq && !rseq_validate_cpu_id()) {
 		fprintf(stderr, "Error: cpu id getter unavailable\n");
--- a/tools/testing/selftests/rseq/run_param_test.sh
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -34,6 +34,11 @@ REPS=1000
 SLOW_REPS=100
 NR_THREADS=$((6*${NR_CPUS}))
 
+# Prevent GLIBC from registering RSEQ so the selftest can run in legacy and
+# performance optimized mode.
+GLIBC_TUNABLES="${GLIBC_TUNABLES:-}:glibc.pthread.rseq=0"
+export GLIBC_TUNABLES
+
 function do_tests()
 {
 	local i=0
@@ -103,6 +108,40 @@ function inject_blocking()
 	NR_LOOPS=
 }
 
+echo "Testing in legacy RSEQ mode"
+echo "Yield injection (25%)"
+inject_blocking -m 4 -y -L
+
+echo "Yield injection (50%)"
+inject_blocking -m 2 -y -L
+
+echo "Yield injection (100%)"
+inject_blocking -m 1 -y -L
+
+echo "Kill injection (25%)"
+inject_blocking -m 4 -k -L
+
+echo "Kill injection (50%)"
+inject_blocking -m 2 -k -L
+
+echo "Kill injection (100%)"
+inject_blocking -m 1 -k -L
+
+echo "Sleep injection (1ms, 25%)"
+inject_blocking -m 4 -s 1 -L
+
+echo "Sleep injection (1ms, 50%)"
+inject_blocking -m 2 -s 1 -L
+
+echo "Sleep injection (1ms, 100%)"
+inject_blocking -m 1 -s 1 -L
+
+./check_optimized || {
+	echo "Skipping optimized RSEQ mode test. Not supported";
+	exit 0
+}
+
+echo "Testing in optimized RSEQ mode"
 echo "Yield injection (25%)"
 inject_blocking -m 4 -y
 
--- /dev/null
+++ b/tools/testing/selftests/rseq/run_timeslice_test.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+
+# Prevent GLIBC from registering RSEQ so the selftest can run in legacy
+# and performance optimized mode.
+GLIBC_TUNABLES="${GLIBC_TUNABLES:-}:glibc.pthread.rseq=0"
+export GLIBC_TUNABLES
+
+./check_optimized || {
+	echo "Skipping optimized RSEQ mode test. Not supported";
+	exit 0
+}
+
+./slice_test
--- a/tools/testing/selftests/rseq/slice_test.c
+++ b/tools/testing/selftests/rseq/slice_test.c
@@ -124,7 +124,7 @@ FIXTURE_SETUP(slice_ext)
 {
 	cpu_set_t affinity;
 
-	if (rseq_register_current_thread())
+	if (__rseq_register_current_thread(true, false))
 		SKIP(return, "RSEQ not supported\n");
 
 	if (prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_SET,