From nobody Sat Oct 4 19:16:42 2025
Message-ID: <20250813162823.845026450@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Michael Jeanson, Mathieu Desnoyers, Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Wei Liu, Jens Axboe
McKenney" , Boqun Feng , Wei Liu , Jens Axboe Subject: [patch 01/11] rseq: Avoid pointless evaluation in __rseq_notify_resume() References: <20250813155941.014821755@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Date: Wed, 13 Aug 2025 18:29:14 +0200 (CEST) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The RSEQ critical section mechanism only clears the event mask when a critical section is registered, otherwise it is stale and collects bits. That means once a critical section is installed the first invocation of that code when TIF_NOTIFY_RESUME is set will abort the critical section, even when the TIF bit was not raised by the rseq preempt/migrate/signal helpers. This also has a performance implication because TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by quite some infrastructure. That means every invocation of __rseq_notify_resume() goes unconditionally through the heavy lifting of user space access and consistency checks even if there is no reason to do so. Keeping the stale event mask around when exiting to user space also prevents it from being utilized by the upcoming time slice extension mechanism. Avoid this by reading and clearing the event mask before doing the user space critical section access with preemption disabled, which ensures that the read and clear operation is CPU local atomic versus scheduling. This is correct as after re-enabling preemption any relevant event will set the bit again and raise TIF_NOTIFY_RESUME, which makes the user space exit code take another round of TIF bit clearing. If the event mask was non-zero, invoke the slow path. On debug kernels the slow path is invoked unconditionally and the result of the event mask evaluation is handed in. Add a exit path check after the TIF bit loop, which validates on debug kernels that the event mask is zero before exiting to user space. While at it reword the convoluted comment why the pt_regs pointer can be NULL under certain circumstances. Signed-off-by: Thomas Gleixner Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Boqun Feng --- include/linux/irq-entry-common.h | 7 ++-- include/linux/rseq.h | 10 +++++ kernel/rseq.c | 66 ++++++++++++++++++++++++++--------= ----- 3 files changed, 58 insertions(+), 25 deletions(-) --- a/include/linux/irq-entry-common.h +++ b/include/linux/irq-entry-common.h @@ -2,11 +2,12 @@ #ifndef __LINUX_IRQENTRYCOMMON_H #define __LINUX_IRQENTRYCOMMON_H =20 +#include +#include +#include #include #include -#include #include -#include #include =20 #include @@ -226,6 +227,8 @@ static __always_inline void exit_to_user =20 arch_exit_to_user_mode_prepare(regs, ti_work); =20 + rseq_exit_to_user_mode(); + /* Ensure that kernel state is sane for a return to userspace */ kmap_assert_nomap(); lockdep_assert_irqs_disabled(); --- a/include/linux/rseq.h +++ b/include/linux/rseq.h @@ -66,6 +66,14 @@ static inline void rseq_migrate(struct t rseq_set_notify_resume(t); } =20 +static __always_inline void rseq_exit_to_user_mode(void) +{ + if (IS_ENABLED(CONFIG_DEBUG_RSEQ)) { + if (WARN_ON_ONCE(current->rseq && current->rseq_event_mask)) + current->rseq_event_mask =3D 0; + } +} + /* * If parent process has a registered restartable sequences area, the * child inherits. Unregister rseq for a clone with CLONE_VM set. 
@@ -118,7 +126,7 @@ static inline void rseq_fork(struct task
 static inline void rseq_execve(struct task_struct *t)
 {
 }
-
+static inline void rseq_exit_to_user_mode(void) { }
 #endif

 #ifdef CONFIG_DEBUG_RSEQ
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -324,9 +324,9 @@ static bool rseq_warn_flags(const char *
 	return true;
 }

-static int rseq_need_restart(struct task_struct *t, u32 cs_flags)
+static int rseq_check_flags(struct task_struct *t, u32 cs_flags)
 {
-	u32 flags, event_mask;
+	u32 flags;
 	int ret;

 	if (rseq_warn_flags("rseq_cs", cs_flags))
@@ -339,17 +339,7 @@ static int rseq_need_restart(struct task

 	if (rseq_warn_flags("rseq", flags))
 		return -EINVAL;
-
-	/*
-	 * Load and clear event mask atomically with respect to
-	 * scheduler preemption and membarrier IPIs.
-	 */
-	scoped_guard(RSEQ_EVENT_GUARD) {
-		event_mask = t->rseq_event_mask;
-		t->rseq_event_mask = 0;
-	}
-
-	return !!event_mask;
+	return 0;
 }

 static int clear_rseq_cs(struct rseq __user *rseq)
@@ -380,7 +370,7 @@ static bool in_rseq_cs(unsigned long ip,
 	return ip - rseq_cs->start_ip < rseq_cs->post_commit_offset;
 }

-static int rseq_ip_fixup(struct pt_regs *regs)
+static int rseq_ip_fixup(struct pt_regs *regs, bool abort)
 {
 	unsigned long ip = instruction_pointer(regs);
 	struct task_struct *t = current;
@@ -398,9 +388,11 @@ static int rseq_ip_fixup(struct pt_regs
 	 */
 	if (!in_rseq_cs(ip, &rseq_cs))
 		return clear_rseq_cs(t->rseq);
-	ret = rseq_need_restart(t, rseq_cs.flags);
-	if (ret <= 0)
+	ret = rseq_check_flags(t, rseq_cs.flags);
+	if (ret < 0)
 		return ret;
+	if (!abort)
+		return 0;
 	ret = clear_rseq_cs(t->rseq);
 	if (ret)
 		return ret;
@@ -430,14 +422,44 @@ void __rseq_handle_notify_resume(struct
 		return;

 	/*
-	 * regs is NULL if and only if the caller is in a syscall path. Skip
-	 * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
-	 * kill a misbehaving userspace on debug kernels.
+	 * If invoked from hypervisors or IO-URING, then @regs is a NULL
+	 * pointer, so fixup cannot be done. If the syscall which led to
+	 * this invocation was invoked inside a critical section, then it
+	 * will either end up in this code again or a possible violation of
+	 * a syscall inside a critical region can only be detected by the
+	 * debug code in rseq_syscall() in a debug enabled kernel.
 	 */
 	if (regs) {
-		ret = rseq_ip_fixup(regs);
-		if (unlikely(ret < 0))
-			goto error;
+		/*
+		 * Read and clear the event mask first. If the task was not
+		 * preempted or migrated or a signal is on the way, there
+		 * is no point in doing any of the heavy lifting here on
+		 * production kernels. In that case TIF_NOTIFY_RESUME was
+		 * raised by some other functionality.
+		 *
+		 * This is correct because the read/clear operation is
+		 * guarded against scheduler preemption, which makes it CPU
+		 * local atomic. If the task is preempted right after
+		 * re-enabling preemption then TIF_NOTIFY_RESUME is set
+		 * again and this function is invoked another time _before_
+		 * the task is able to return to user mode.
+		 *
+		 * On a debug kernel, invoke the fixup code unconditionally
+		 * with the result handed in to allow the detection of
+		 * inconsistencies.
+		 */
+		u32 event_mask;
+
+		scoped_guard(RSEQ_EVENT_GUARD) {
+			event_mask = t->rseq_event_mask;
+			t->rseq_event_mask = 0;
+		}
+
+		if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event_mask) {
+			ret = rseq_ip_fixup(regs, !!event_mask);
+			if (unlikely(ret < 0))
+				goto error;
+		}
 	}
 	if (unlikely(rseq_update_cpu_node_id(t)))
 		goto error;
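[For illustration, not part of the series: a minimal standalone C model of
the fast-path gating introduced above. All names are hypothetical, and the
CPU-local atomicity that the kernel gets from disabling preemption around
the read/clear is approximated here with a C11 atomic exchange.]

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic bool rseq_event_pending;	/* set by preempt/migrate/signal */
static const bool debug_rseq;		/* stands in for CONFIG_DEBUG_RSEQ */

static void rseq_fixup(bool abort_cs)
{
	/* the user space access and consistency checks would live here */
	printf("heavy fixup, abort=%d\n", abort_cs);
}

static void notify_resume(void)
{
	/* read and clear in one step; the kernel uses a preemption guard */
	bool event = atomic_exchange(&rseq_event_pending, false);

	/* common case: TIF_NOTIFY_RESUME was raised by something else */
	if (debug_rseq || event)
		rseq_fixup(event);
}

int main(void)
{
	notify_resume();				/* no event: skips the fixup */
	atomic_store(&rseq_event_pending, true);	/* simulate a preemption */
	notify_resume();				/* event: runs the fixup */
	return 0;
}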
McKenney" , Boqun Feng , Wei Liu , Jens Axboe Subject: [patch 02/11] rseq: Condense the inline stubs References: <20250813155941.014821755@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Date: Wed, 13 Aug 2025 18:29:17 +0200 (CEST) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Scrolling over tons of pointless { } lines to find the actual code is annoying at best. Signed-off-by: Thomas Gleixner Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Boqun Feng Reviewed-by: Mathieu Desnoyers --- include/linux/rseq.h | 47 ++++++++++++----------------------------------- 1 file changed, 12 insertions(+), 35 deletions(-) --- a/include/linux/rseq.h +++ b/include/linux/rseq.h @@ -101,44 +101,21 @@ static inline void rseq_execve(struct ta t->rseq_event_mask =3D 0; } =20 -#else - -static inline void rseq_set_notify_resume(struct task_struct *t) -{ -} -static inline void rseq_handle_notify_resume(struct ksignal *ksig, - struct pt_regs *regs) -{ -} -static inline void rseq_signal_deliver(struct ksignal *ksig, - struct pt_regs *regs) -{ -} -static inline void rseq_preempt(struct task_struct *t) -{ -} -static inline void rseq_migrate(struct task_struct *t) -{ -} -static inline void rseq_fork(struct task_struct *t, unsigned long clone_fl= ags) -{ -} -static inline void rseq_execve(struct task_struct *t) -{ -} +#else /* CONFIG_RSEQ */ +static inline void rseq_set_notify_resume(struct task_struct *t) { } +static inline void rseq_handle_notify_resume(struct ksignal *ksig, struct = pt_regs *regs) { } +static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_reg= s *regs) { } +static inline void rseq_preempt(struct task_struct *t) { } +static inline void rseq_migrate(struct task_struct *t) { } +static inline void rseq_fork(struct task_struct *t, unsigned long clone_fl= ags) { } +static inline void rseq_execve(struct task_struct *t) { } static inline void rseq_exit_to_user_mode(void) { } -#endif +#endif /* !CONFIG_RSEQ */ =20 #ifdef CONFIG_DEBUG_RSEQ - void rseq_syscall(struct pt_regs *regs); - -#else - -static inline void rseq_syscall(struct pt_regs *regs) -{ -} - -#endif +#else /* CONFIG_DEBUG_RSEQ */ +static inline void rseq_syscall(struct pt_regs *regs) { } +#endif /* !CONFIG_DEBUG_RSEQ */ =20 #endif /* _LINUX_RSEQ_H */ From nobody Sat Oct 4 19:16:42 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A78412877F7 for ; Wed, 13 Aug 2025 16:29:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755102564; cv=none; b=KUWxu/XoZly1llvfeVYO8/Gm5O/1t6IWPvrkLq9wDuA4xpCcEaiUI0kcA/ACjD5n8cuHmO+4tMK8wrgQDDkMSHRzF/GO6bU7lfu2Dzjsl4q0Mrisi+Qb5jH5tO9+n62eWtJKqhUD/UIdpX3/Q0nO3mjW5UxYGGHCDRW0ARTPypU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755102564; c=relaxed/simple; bh=O+ygg52PIQiWANL8HujuilX/IklZioqj6FH+YBMvJ84=; h=Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type:Date; b=Okwy2rVmDPzlTnuwl9z/NzpJwmZ4nnKKmFk94ixsMKkCYXWMQklzokJ/jh5bsbvoSDnfTVVw8Jo9GrUuMJNX4879lsetJfH/cIyosKicPF+ymruCR0FG9nBz7YC+w8v0en5p+2xQD7oEDr7SdjUD00rzD7fV+tovkSxab/BTEpo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
From nobody Sat Oct 4 19:16:42 2025
Message-ID: <20250813162823.972744605@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Michael Jeanson, Peter Zijlstra, Mathieu Desnoyers, "Paul E. McKenney", Boqun Feng, Wei Liu, Jens Axboe
Subject: [patch 03/11] rseq: Rename rseq_syscall() to rseq_debug_syscall_exit()
References: <20250813155941.014821755@linutronix.de>
Date: Wed, 13 Aug 2025 18:29:19 +0200 (CEST)

rseq_syscall() is a debug function, which is invoked before the syscall
exit work is handled. Name it so it's clear what it does.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Mathieu Desnoyers
Cc: "Paul E. McKenney"
Cc: Boqun Feng
Reviewed-by: Mathieu Desnoyers
---
 include/linux/entry-common.h |    2 +-
 include/linux/rseq.h         |    4 ++--
 kernel/rseq.c                |    5 +++--
 3 files changed, 6 insertions(+), 5 deletions(-)

--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -162,7 +162,7 @@ static __always_inline void syscall_exit
 		local_irq_enable();
 	}

-	rseq_syscall(regs);
+	rseq_debug_syscall_exit(regs);

 	/*
 	 * Do one-time syscall specific work. If these work items are
--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -113,9 +113,9 @@ static inline void rseq_exit_to_user_mod
 #endif /* !CONFIG_RSEQ */

 #ifdef CONFIG_DEBUG_RSEQ
-void rseq_syscall(struct pt_regs *regs);
+void rseq_debug_syscall_exit(struct pt_regs *regs);
 #else /* CONFIG_DEBUG_RSEQ */
-static inline void rseq_syscall(struct pt_regs *regs) { }
+static inline void rseq_debug_syscall_exit(struct pt_regs *regs) { }
 #endif /* !CONFIG_DEBUG_RSEQ */

 #endif /* _LINUX_RSEQ_H */
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -427,7 +427,8 @@ void __rseq_handle_notify_resume(struct
 	 * this invocation was invoked inside a critical section, then it
 	 * will either end up in this code again or a possible violation of
 	 * a syscall inside a critical region can only be detected by the
-	 * debug code in rseq_syscall() in a debug enabled kernel.
+	 * debug code in rseq_debug_syscall_exit() in a debug enabled
+	 * kernel.
 	 */
 	if (regs) {
@@ -476,7 +477,7 @@ void __rseq_handle_notify_resume(struct
  * Terminate the process if a syscall is issued within a restartable
  * sequence.
  */
-void rseq_syscall(struct pt_regs *regs)
+void rseq_debug_syscall_exit(struct pt_regs *regs)
 {
 	unsigned long ip = instruction_pointer(regs);
 	struct task_struct *t = current;
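[For illustration, not part of the series: a standalone C sketch of what
the renamed debug check conceptually does on CONFIG_DEBUG_RSEQ kernels,
reusing the range test from in_rseq_cs() shown in patch 01. The names and
the abort() stand-in are hypothetical; the kernel forces a fatal signal on
the misbehaving task instead.]

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct rseq_cs_desc {
	unsigned long start_ip;
	unsigned long post_commit_offset;
};

static bool in_critical_section(unsigned long ip, const struct rseq_cs_desc *cs)
{
	/* unsigned wraparound makes this a single range check */
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* model of the check run on syscall exit on debug kernels */
static void debug_syscall_exit(unsigned long ip, const struct rseq_cs_desc *cs)
{
	if (cs && in_critical_section(ip, cs)) {
		fprintf(stderr, "syscall inside rseq critical section\n");
		abort();
	}
}

int main(void)
{
	struct rseq_cs_desc cs = { .start_ip = 0x1000, .post_commit_offset = 0x40 };

	debug_syscall_exit(0x2000, &cs);	/* outside the section: fine */
	debug_syscall_exit(0x1010, &cs);	/* inside the section: fatal */
	return 0;
}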
From nobody Sat Oct 4 19:16:42 2025
Message-ID: <20250813162824.036832236@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Michael Jeanson, Peter Zijlstra, Mathieu Desnoyers, "Paul E. McKenney", Boqun Feng, Wei Liu, Jens Axboe
Subject: [patch 04/11] rseq: Replace the pointless event mask bit fiddling
References: <20250813155941.014821755@linutronix.de>
Date: Wed, 13 Aug 2025 18:29:22 +0200 (CEST)

Since commit 0190e4198e47 ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_*
flags") the bits in task::rseq_event_mask are meaningless; setting them
individually is just extra work.

Collapse them all into a single boolean, which simplifies the code a lot.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Mathieu Desnoyers
Cc: "Paul E. McKenney"
Cc: Boqun Feng
---
 include/linux/rseq.h      |   59 +++++++++++-----------------------------------
 include/linux/sched.h     |   10 +++----
 include/uapi/linux/rseq.h |   21 +++++-----------
 kernel/rseq.c             |   26 ++++++++++++--------
 kernel/sched/core.c       |    8 +++---
 kernel/sched/membarrier.c |    8 +++---
 6 files changed, 51 insertions(+), 81 deletions(-)

--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -7,28 +7,6 @@
 #include
 #include

-#ifdef CONFIG_MEMBARRIER
-# define RSEQ_EVENT_GUARD	irq
-#else
-# define RSEQ_EVENT_GUARD	preempt
-#endif
-
-/*
- * Map the event mask on the user-space ABI enum rseq_cs_flags
- * for direct mask checks.
- */
-enum rseq_event_mask_bits {
-	RSEQ_EVENT_PREEMPT_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT,
-	RSEQ_EVENT_SIGNAL_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT,
-	RSEQ_EVENT_MIGRATE_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT,
-};
-
-enum rseq_event_mask {
-	RSEQ_EVENT_PREEMPT	= (1U << RSEQ_EVENT_PREEMPT_BIT),
-	RSEQ_EVENT_SIGNAL	= (1U << RSEQ_EVENT_SIGNAL_BIT),
-	RSEQ_EVENT_MIGRATE	= (1U << RSEQ_EVENT_MIGRATE_BIT),
-};
-
 static inline void rseq_set_notify_resume(struct task_struct *t)
 {
 	if (t->rseq)
@@ -47,30 +25,25 @@ static inline void rseq_handle_notify_re
 static inline void rseq_signal_deliver(struct ksignal *ksig,
 				       struct pt_regs *regs)
 {
-	scoped_guard(RSEQ_EVENT_GUARD)
-		__set_bit(RSEQ_EVENT_SIGNAL_BIT, &current->rseq_event_mask);
-	rseq_handle_notify_resume(ksig, regs);
-}
-
-/* rseq_preempt() requires preemption to be disabled. */
-static inline void rseq_preempt(struct task_struct *t)
-{
-	__set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
-	rseq_set_notify_resume(t);
+	if (current->rseq) {
+		current->rseq_event_pending = true;
+		__rseq_handle_notify_resume(ksig, regs);
+	}
 }

-/* rseq_migrate() requires preemption to be disabled. */
-static inline void rseq_migrate(struct task_struct *t)
+static inline void rseq_notify_event(struct task_struct *t)
 {
-	__set_bit(RSEQ_EVENT_MIGRATE_BIT, &t->rseq_event_mask);
-	rseq_set_notify_resume(t);
+	if (t->rseq) {
+		t->rseq_event_pending = true;
+		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+	}
 }

 static __always_inline void rseq_exit_to_user_mode(void)
 {
 	if (IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-		if (WARN_ON_ONCE(current->rseq && current->rseq_event_mask))
-			current->rseq_event_mask = 0;
+		if (WARN_ON_ONCE(current->rseq && current->rseq_event_pending))
+			current->rseq_event_pending = false;
 	}
 }

@@ -84,12 +57,12 @@ static inline void rseq_fork(struct task
 		t->rseq = NULL;
 		t->rseq_len = 0;
 		t->rseq_sig = 0;
-		t->rseq_event_mask = 0;
+		t->rseq_event_pending = false;
 	} else {
 		t->rseq = current->rseq;
 		t->rseq_len = current->rseq_len;
 		t->rseq_sig = current->rseq_sig;
-		t->rseq_event_mask = current->rseq_event_mask;
+		t->rseq_event_pending = current->rseq_event_pending;
 	}
 }

@@ -98,17 +71,15 @@ static inline void rseq_execve(struct ta
 	t->rseq = NULL;
 	t->rseq_len = 0;
 	t->rseq_sig = 0;
-	t->rseq_event_mask = 0;
+	t->rseq_event_pending = false;
 }

 #else /* CONFIG_RSEQ */
 static inline void rseq_set_notify_resume(struct task_struct *t) { }
 static inline void rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) { }
 static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { }
-static inline void rseq_preempt(struct task_struct *t) { }
-static inline void rseq_migrate(struct task_struct *t) { }
+static inline void rseq_notify_event(struct task_struct *t) { }
 static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) { }
-static inline void rseq_execve(struct task_struct *t) { }
 static inline void rseq_exit_to_user_mode(void) { }
 #endif /* !CONFIG_RSEQ */

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1401,14 +1401,14 @@ struct task_struct {
 #endif /* CONFIG_NUMA_BALANCING */

 #ifdef CONFIG_RSEQ
-	struct rseq __user *rseq;
-	u32 rseq_len;
-	u32 rseq_sig;
+	struct rseq __user	*rseq;
+	u32			rseq_len;
+	u32			rseq_sig;
 	/*
-	 * RmW on rseq_event_mask must be performed atomically
+	 * RmW on rseq_event_pending must be performed atomically
 	 * with respect to preemption.
 	 */
-	unsigned long rseq_event_mask;
+	bool			rseq_event_pending;
 # ifdef CONFIG_DEBUG_RSEQ
 	/*
 	 * This is a place holder to save a copy of the rseq fields for
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -114,20 +114,13 @@ struct rseq {
 	/*
 	 * Restartable sequences flags field.
 	 *
-	 * This field should only be updated by the thread which
-	 * registered this data structure. Read by the kernel.
-	 * Mainly used for single-stepping through rseq critical sections
-	 * with debuggers.
-	 *
-	 * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
-	 *   Inhibit instruction sequence block restart on preemption
-	 *   for this thread.
-	 * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
-	 *   Inhibit instruction sequence block restart on signal
-	 *   delivery for this thread.
-	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
-	 *   Inhibit instruction sequence block restart on migration for
-	 *   this thread.
+	 * This field was initially intended to allow event masking for
+	 * single-stepping through rseq critical sections with debuggers.
+	 * The kernel does not support this anymore and the relevant bits
+	 * are checked for being always false:
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
 	 */
 	__u32 flags;

--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -19,6 +19,12 @@
 #define CREATE_TRACE_POINTS
 #include

+#ifdef CONFIG_MEMBARRIER
+# define RSEQ_EVENT_GUARD	irq
+#else
+# define RSEQ_EVENT_GUARD	preempt
+#endif
+
 /* The original rseq structure size (including padding) is 32 bytes. */
 #define ORIG_RSEQ_SIZE		32

@@ -432,11 +438,11 @@ void __rseq_handle_notify_resume(struct
 	 */
 	if (regs) {
 		/*
-		 * Read and clear the event mask first. If the task was not
-		 * preempted or migrated or a signal is on the way, there
-		 * is no point in doing any of the heavy lifting here on
-		 * production kernels. In that case TIF_NOTIFY_RESUME was
-		 * raised by some other functionality.
+		 * Read and clear the event pending bit first. If the task
+		 * was not preempted or migrated or a signal is on the way,
+		 * there is no point in doing any of the heavy lifting here
+		 * on production kernels. In that case TIF_NOTIFY_RESUME
+		 * was raised by some other functionality.
 		 *
 		 * This is correct because the read/clear operation is
 		 * guarded against scheduler preemption, which makes it CPU
@@ -449,15 +455,15 @@ void __rseq_handle_notify_resume(struct
 		 * with the result handed in to allow the detection of
 		 * inconsistencies.
 		 */
-		u32 event_mask;
+		bool event;

 		scoped_guard(RSEQ_EVENT_GUARD) {
-			event_mask = t->rseq_event_mask;
-			t->rseq_event_mask = 0;
+			event = t->rseq_event_pending;
+			t->rseq_event_pending = false;
 		}

-		if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event_mask) {
-			ret = rseq_ip_fixup(regs, !!event_mask);
+		if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event) {
+			ret = rseq_ip_fixup(regs, event);
 			if (unlikely(ret < 0))
 				goto error;
 		}
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3364,7 +3364,7 @@ void set_task_cpu(struct task_struct *p,
 	if (p->sched_class->migrate_task_rq)
 		p->sched_class->migrate_task_rq(p, new_cpu);
 	p->se.nr_migrations++;
-	rseq_migrate(p);
+	rseq_notify_event(p);
 	sched_mm_cid_migrate_from(p);
 	perf_event_task_migrate(p);
 }
@@ -4795,7 +4795,7 @@ int sched_cgroup_fork(struct task_struct
 		p->sched_task_group = tg;
 	}
 #endif
-	rseq_migrate(p);
+	rseq_notify_event(p);
 	/*
 	 * We're setting the CPU for the first time, we don't migrate,
 	 * so use __set_task_cpu().
@@ -4859,7 +4859,7 @@ void wake_up_new_task(struct task_struct
 	 * as we're not fully set-up yet.
 	 */
 	p->recent_used_cpu = task_cpu(p);
-	rseq_migrate(p);
+	rseq_notify_event(p);
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), &wake_flags));
 	rq = __task_rq_lock(p, &rf);
 	update_rq_clock(rq);
@@ -5153,7 +5153,7 @@ prepare_task_switch(struct rq *rq, struc
 	kcov_prepare_switch(prev);
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
-	rseq_preempt(prev);
+	rseq_notify_event(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
 	kmap_local_sched_out();
 	prepare_task(next);
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -199,7 +199,7 @@ static void ipi_rseq(void *info)
 	 * is negligible.
 	 */
 	smp_mb();
-	rseq_preempt(current);
+	rseq_notify_event(current);
 }

 static void ipi_sync_rq_state(void *info)
@@ -407,9 +407,9 @@ static int membarrier_private_expedited(
 	 * membarrier, we will end up with some thread in the mm
 	 * running without a core sync.
 	 *
-	 * For RSEQ, don't rseq_preempt() the caller. User code
User code - * is not supposed to issue syscalls at all from inside an - * rseq critical section. + * For RSEQ, don't invoke rseq_notify_event() on the + * caller. User code is not supposed to issue syscalls at + * all from inside an rseq critical section. */ if (flags !=3D MEMBARRIER_FLAG_SYNC_CORE) { preempt_disable(); From nobody Sat Oct 4 19:16:42 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 949F828980F for ; Wed, 13 Aug 2025 16:29:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755102569; cv=none; b=g3HNITSNTQLOk86XA6DnFw6xRDIO8w38lgRu152i8dY+J2SzdLG39UOHpf0T4RwsrrsPfGGCykQag0wAu1uNSyHYWOq6g9r1omocUN+blxga2lZP/4GZtzoS2zADgEf0BuRtWGWDQT1IJuXdOoY7UQwPTO4niSsgSbDOXndQU3k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755102569; c=relaxed/simple; bh=eNVmM11/VxJ0gZJky7d2P8a3z9oDJ7H5eoAFZmT8Vgw=; h=Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type:Date; b=DbSDN7KDBg9SehHIz3PBDLIqFY1kZiY8FQE0qGIOj+Zcxqz3Dqrj4Hv2beq9cvZbUL3Be7Rip2dI2eiuVSsSKNhShiPaCJAnuY2hRS65s8+9l/iWGQL/2hoC9Sxkg11A7vbIS7dPxX+WgU12T8b94kkOtjD2vCN61hbzRCLqOh8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=ZtRUELDR; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=55Irt9Bn; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="ZtRUELDR"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="55Irt9Bn" Message-ID: <20250813162824.100212248@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1755102566; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=FcyAdnf9n089HR0oCpbdG20fsjCKd6CMKAT/PpPW5sg=; b=ZtRUELDRpk3WHqAL7H2KT9HFwbGicBTFhhc4QZNYVUoPNLqAThPCfPl4gyfMEJQe3cV90K AEO9rsIdyxIZ51IOSE5RAuUO8t+WAnqij0sI+bDrZXPNoif0rtxkq2LFp/04wpvG3CZvGu 2NygOfBNC7mooqt1aWsSnCumRmCqkFdX9E8qqEP2sdNLXwc8t7369I/t9mIfaoP2KW4PKv 7NLS35pc2JYAijLzKxrwrQ05kKzu6ZbcMeJI2WWHOMWixGquWE0p8kBmOGiO6uN9Ssk6wS yJGFb39VHQQ7OgGyviz6nMxl3U4lJiBk9lcjUJv0R1Nxz7+9ZFEjzewtm590Lg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1755102566; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=FcyAdnf9n089HR0oCpbdG20fsjCKd6CMKAT/PpPW5sg=; b=55Irt9BnJdWYV9HJvVerDTHUDvjMXADRnjyZsnZTY9IyvQ5YIdouHXUwS5Wo5z8xBFxcsh Hcxaas4g5AI70+Bw== From: Thomas Gleixner To: LKML Cc: Michael Jeanson , Mathieu Desnoyers , Peter Zijlstra , "Paul E. 
McKenney" , Boqun Feng , Wei Liu , Jens Axboe Subject: [patch 05/11] rseq: Optimize the signal delivery path References: <20250813155941.014821755@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Date: Wed, 13 Aug 2025 18:29:24 +0200 (CEST) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that the individual event mask bits are gone, there is no point in setting the event flag before invoking rseq_handle_notify_resume(). The fact that the signal pointer is not NULL indicates that there is an event. Simply drop the setting of the event bit and just fold the event in rseq_handle_notify_resume() when the signal pointer is non-NULL. Signed-off-by: Thomas Gleixner Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Boqun Feng --- include/linux/resume_user_mode.h | 2 +- include/linux/rseq.h | 9 +++------ kernel/rseq.c | 7 ++++++- 3 files changed, 10 insertions(+), 8 deletions(-) --- a/include/linux/resume_user_mode.h +++ b/include/linux/resume_user_mode.h @@ -59,7 +59,7 @@ static inline void resume_user_mode_work mem_cgroup_handle_over_high(GFP_KERNEL); blkcg_maybe_throttle_current(); =20 - rseq_handle_notify_resume(NULL, regs); + rseq_handle_notify_resume(regs); } =20 #endif /* LINUX_RESUME_USER_MODE_H */ --- a/include/linux/rseq.h +++ b/include/linux/rseq.h @@ -15,20 +15,17 @@ static inline void rseq_set_notify_resum =20 void __rseq_handle_notify_resume(struct ksignal *sig, struct pt_regs *regs= ); =20 -static inline void rseq_handle_notify_resume(struct ksignal *ksig, - struct pt_regs *regs) +static inline void rseq_handle_notify_resume(struct pt_regs *regs) { if (current->rseq) - __rseq_handle_notify_resume(ksig, regs); + __rseq_handle_notify_resume(NULL, regs); } =20 static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { - if (current->rseq) { - current->rseq_event_pending =3D true; + if (current->rseq) __rseq_handle_notify_resume(ksig, regs); - } } =20 static inline void rseq_notify_event(struct task_struct *t) --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -451,6 +451,11 @@ void __rseq_handle_notify_resume(struct * again and this function is invoked another time _before_ * the task is able to return to user mode. * + * If directly invoked from the signal delivery path, @ksig + * is not NULL and @regs are valid. The pending bit is not + * set by the caller as it can easily be folded in during + * the evaluation when @ksig !=3D NULL. + * * On a debug kernel, invoke the fixup code unconditionally * with the result handed in to allow the detection of * inconsistencies. 
@@ -458,7 +463,7 @@ void __rseq_handle_notify_resume(struct
 		bool event;

 		scoped_guard(RSEQ_EVENT_GUARD) {
-			event = t->rseq_event_pending;
+			event = t->rseq_event_pending || !!ksig;
 			t->rseq_event_pending = false;
 		}
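[For illustration, not part of the series: a standalone C sketch of the
fold. A non-NULL signal pointer is itself the event, so it is OR-ed in
during evaluation instead of being stored into the pending flag by the
caller beforehand. Names are hypothetical and the guard is omitted.]

#include <stdbool.h>
#include <stddef.h>

struct ksignal;		/* opaque stand-in for the kernel type */

static bool rseq_event_pending;

/*
 * @ksig != NULL means "invoked from signal delivery", which is an rseq
 * event by definition, so the caller no longer needs to set the flag.
 */
static bool consume_event(struct ksignal *ksig)
{
	bool event = rseq_event_pending || ksig != NULL;

	rseq_event_pending = false;
	return event;
}

int main(void)
{
	int dummy;
	struct ksignal *sig = (struct ksignal *)&dummy;	/* any non-NULL pointer */

	rseq_event_pending = false;
	return consume_event(sig) ? 0 : 1;	/* signal delivery counts as an event */
}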
McKenney" , Boqun Feng , Wei Liu , Jens Axboe Subject: [patch 06/11] rseq: Optimize exit to user space further References: <20250813155941.014821755@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Date: Wed, 13 Aug 2025 18:29:27 +0200 (CEST) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that the event pending bit management is consistent, the invocation of __rseq_handle_notify_resume() can be avoided if the event pending bit is not set. This is correct because of the following order: 1) if (TIF_NOTIFY_RESUME) 2) clear(TIF_NOTIFY_RESUME); smp_mb__after_atomic(); 3) if (event_pending) 4) __rseq_handle_notify_resume() 5) guard() 6) work =3D check_and_clear_pending(); Any new event, which hits between #1 and #2 will be visible in #3. Any new event, which hits after #2, will either be visible in #3 and therefore consumed in #6 or missed in #3. The latter is not a problem as the new event will also re-raise TIF_NOTIFY_RESUME, which will cause the calling exit loop take another round. The quick check #3 optimizes for the common case, where event_pending is false. Ignore the quick check when CONFIG_DEBUG_RSEQ is enabled to widen the test coverage. Signed-off-by: Thomas Gleixner Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Boqun Feng --- include/linux/rseq.h | 8 +++++--- kernel/rseq.c | 17 +++++++++++++---- 2 files changed, 18 insertions(+), 7 deletions(-) --- a/include/linux/rseq.h +++ b/include/linux/rseq.h @@ -17,7 +17,7 @@ void __rseq_handle_notify_resume(struct =20 static inline void rseq_handle_notify_resume(struct pt_regs *regs) { - if (current->rseq) + if (IS_ENABLED(CONFIG_DEBUG_RESQ) || READ_ONCE(current->rseq_event_pendin= g)) __rseq_handle_notify_resume(NULL, regs); } =20 @@ -30,8 +30,10 @@ static inline void rseq_signal_deliver(s =20 static inline void rseq_notify_event(struct task_struct *t) { + lockdep_assert_irqs_disabled(); + if (t->rseq) { - t->rseq_event_pending =3D true; + WRITE_ONCE(t->rseq_event_pending, true); set_tsk_thread_flag(t, TIF_NOTIFY_RESUME); } } @@ -59,7 +61,7 @@ static inline void rseq_fork(struct task t->rseq =3D current->rseq; t->rseq_len =3D current->rseq_len; t->rseq_sig =3D current->rseq_sig; - t->rseq_event_pending =3D current->rseq_event_pending; + t->rseq_event_pending =3D READ_ONCE(current->rseq_event_pending); } } =20 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -524,9 +524,17 @@ SYSCALL_DEFINE4(rseq, struct rseq __user ret =3D rseq_reset_rseq_cpu_node_id(current); if (ret) return ret; - current->rseq =3D NULL; - current->rseq_sig =3D 0; - current->rseq_len =3D 0; + + /* + * Ensure consistency of tsk::rseq and tsk::rseq_event_pending + * vs. the scheduler and the RSEQ IPIs. + */ + scoped_guard(RSEQ_EVENT_GUARD) { + current->rseq =3D NULL; + current->rseq_sig =3D 0; + current->rseq_len =3D 0; + current->rseq_event_pending =3D false; + } return 0; } =20 @@ -601,7 +609,8 @@ SYSCALL_DEFINE4(rseq, struct rseq __user * registered, ensure the cpu_id_start and cpu_id fields * are updated before returning to user-space. 
 	 */
-	rseq_set_notify_resume(current);
+	scoped_guard(irq)
+		rseq_notify_event(current);

 	return 0;
 }
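[For illustration, not part of the series: a C11 model of the ordering
argument #1-#6 above. The TIF bit and the event flag are modeled as
atomics, the fence stands in for smp_mb__after_atomic(), and the producer
is called inline here; in the kernel it runs concurrently. All names are
hypothetical.]

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool tif_notify_resume;
static atomic_bool rseq_event_pending;

/* producer side: record an event and re-raise the TIF stand-in */
static void notify_event(void)
{
	atomic_store(&rseq_event_pending, true);
	atomic_store(&tif_notify_resume, true);
}

/* consumer side: the exit-to-user loop */
static void exit_to_user_loop(void)
{
	while (atomic_load(&tif_notify_resume)) {		/* #1 */
		atomic_store(&tif_notify_resume, false);	/* #2 */
		atomic_thread_fence(memory_order_seq_cst);	/* smp_mb__after_atomic() */
		if (atomic_load(&rseq_event_pending))		/* #3: quick check */
			atomic_store(&rseq_event_pending, false); /* #4..#6 */
		/*
		 * An event hitting after #3 is missed in this round, but
		 * it also re-raised tif_notify_resume, so the loop takes
		 * another round before returning to user space.
		 */
	}
}

int main(void)
{
	notify_event();
	exit_to_user_loop();
	return atomic_load(&rseq_event_pending) ? 1 : 0;
}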
McKenney" , Boqun Feng , Wei Liu , Jens Axboe Subject: [patch 07/11] entry: Cleanup header References: <20250813155941.014821755@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Date: Wed, 13 Aug 2025 18:29:29 +0200 (CEST) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Cleanup the include ordering, kernel-doc and other trivialities before making further changes. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra --- include/linux/entry-common.h | 8 ++++---- include/linux/irq-entry-common.h | 2 ++ 2 files changed, 6 insertions(+), 4 deletions(-) --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -3,11 +3,11 @@ #define __LINUX_ENTRYCOMMON_H =20 #include +#include #include +#include #include #include -#include -#include =20 #include #include @@ -37,6 +37,7 @@ SYSCALL_WORK_SYSCALL_AUDIT | \ SYSCALL_WORK_SYSCALL_USER_DISPATCH | \ ARCH_SYSCALL_WORK_ENTER) + #define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT | \ SYSCALL_WORK_SYSCALL_TRACE | \ SYSCALL_WORK_SYSCALL_AUDIT | \ @@ -61,8 +62,7 @@ */ void syscall_enter_from_user_mode_prepare(struct pt_regs *regs); =20 -long syscall_trace_enter(struct pt_regs *regs, long syscall, - unsigned long work); +long syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long= work); =20 /** * syscall_enter_from_user_mode_work - Check and handle work before invoki= ng --- a/include/linux/irq-entry-common.h +++ b/include/linux/irq-entry-common.h @@ -68,6 +68,7 @@ static __always_inline bool arch_in_rcu_ =20 /** * enter_from_user_mode - Establish state when coming from user mode + * @regs: Pointer to currents pt_regs * * Syscall/interrupt entry disables interrupts, but user mode is traced as * interrupts enabled. Also with NO_HZ_FULL RCU might be idle. @@ -357,6 +358,7 @@ irqentry_state_t noinstr irqentry_enter( * Conditional reschedule with additional sanity checks. 
  */
 void raw_irqentry_exit_cond_resched(void);
+
 #ifdef CONFIG_PREEMPT_DYNAMIC
 #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
 #define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
McKenney" , Boqun Feng , Wei Liu , Jens Axboe Subject: [patch 08/11] entry: Distinguish between syscall and interrupt exit References: <20250813155941.014821755@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Date: Wed, 13 Aug 2025 18:29:31 +0200 (CEST) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The upcoming time slice extension mechanism needs to know whether enter_from_user_mode() is invoked from a syscall or from an interrupt because time slice extensions are only granted on exit to user more from an interrupt. Add a function argument and provide wrappers so the call sites don't end up with incomprehensible true/false arguments. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra --- include/linux/entry-common.h | 2 +- include/linux/irq-entry-common.h | 22 +++++++++++++++------- kernel/entry/common.c | 7 ++++--- 3 files changed, 20 insertions(+), 11 deletions(-) --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -172,7 +172,7 @@ static __always_inline void syscall_exit if (unlikely(work & SYSCALL_WORK_EXIT)) syscall_exit_work(regs, work); local_irq_disable_exit_to_user(); - exit_to_user_mode_prepare(regs); + syscall_exit_to_user_mode_prepare(regs); } =20 /** --- a/include/linux/irq-entry-common.h +++ b/include/linux/irq-entry-common.h @@ -197,15 +197,13 @@ static __always_inline void arch_exit_to */ void arch_do_signal_or_restart(struct pt_regs *regs); =20 -/** - * exit_to_user_mode_loop - do any pending work before leaving to user spa= ce - */ -unsigned long exit_to_user_mode_loop(struct pt_regs *regs, - unsigned long ti_work); +/* Handle pending TIF work */ +unsigned long exit_to_user_mode_loop(struct pt_regs *regs, unsigned long t= i_work, bool from_irq); =20 /** * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required * @regs: Pointer to pt_regs on entry stack + * @from_irq: Exiting to user space from an interrupt * * 1) check that interrupts are disabled * 2) call tick_nohz_user_enter_prepare() @@ -213,7 +211,7 @@ unsigned long exit_to_user_mode_loop(str * EXIT_TO_USER_MODE_WORK are set * 4) check that interrupts are still disabled */ -static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs) +static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs= , bool from_irq) { unsigned long ti_work; =20 @@ -224,7 +222,7 @@ static __always_inline void exit_to_user =20 ti_work =3D read_thread_flags(); if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK)) - ti_work =3D exit_to_user_mode_loop(regs, ti_work); + ti_work =3D exit_to_user_mode_loop(regs, ti_work, from_irq); =20 arch_exit_to_user_mode_prepare(regs, ti_work); =20 @@ -236,6 +234,16 @@ static __always_inline void exit_to_user lockdep_sys_exit(); } =20 +static __always_inline void syscall_exit_to_user_mode_prepare(struct pt_re= gs *regs) +{ + exit_to_user_mode_prepare(regs, false); +} + +static __always_inline void irqentry_exit_to_user_mode_prepare(struct pt_r= egs *regs) +{ + exit_to_user_mode_prepare(regs, true); +} + /** * exit_to_user_mode - Fixup state when exiting to user mode * --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -15,9 +15,10 @@ void __weak arch_do_signal_or_restart(st * exit_to_user_mode_loop - do any pending work before leaving to user spa= ce * @regs: Pointer to pt_regs on entry stack * @ti_work: TIF work flags as read by the caller + * @from_irq: Exiting to user space from an interrupt */ -__always_inline 
unsigned long exit_to_user_mode_loop(struct pt_regs *regs, - unsigned long ti_work) +__always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,= unsigned long ti_work, + bool from_irq) { /* * Before returning to user space ensure that all pending work @@ -70,7 +71,7 @@ noinstr void irqentry_enter_from_user_mo noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs) { instrumentation_begin(); - exit_to_user_mode_prepare(regs); + irqentry_exit_to_user_mode_prepare(regs); instrumentation_end(); exit_to_user_mode(); } From nobody Sat Oct 4 19:16:42 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D95F72989BF for ; Wed, 13 Aug 2025 16:29:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755102580; cv=none; b=dQ3zX00XKozPLlTATOaiuFaGvRlQTgGT7wZ7k6Vq3hcovW1ReMJ7RFFAXhYd0jE9gPZfkPhl71fsQmAl3/sURZS4UvndgrIMBm2RE8vYYz8sIbmhrw1JEW83GW/NjgZ1Hx67qVhBpJ9GTtGvCmWtubXFWoek9KYNdJQFOreHvSI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755102580; c=relaxed/simple; bh=5rdYN6vj3qS0kTnRTIGmNPnuloWf1LcuC+Ww+TFABNE=; h=Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type:Date; b=m+aANYytTPDk1+51tWhMUVQGu0/hVkNDVvPgtZBjDvrGFE4rT5i/ogLvaiWzM0NfKmM+UjlHLe7FWMlapFsUwgdvFUh9qcCN5TDC+iFF8NnQ9MsXa0U5LdMM+8VWj+V5hdCHakWo3jqWKMPImuYe0tbAdpoWNaVXWQOawyPyqas= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=I6ZnOv7e; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=Kk0X3F/d; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="I6ZnOv7e"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Kk0X3F/d" Message-ID: <20250813162824.356621744@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1755102576; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=fWcHtQhzCZGOeOX2ngjD4Pl0wJj/YhvxCFd7lvSBSUs=; b=I6ZnOv7ebtXulU1VvX6GdDhDwH6MIvdT6mx3VSR65EcnzAIJQkVrcl1+7KoEoYafe3Cxxh s93K8wZIDISMmEYX2s1ThhIFHWLxyntj0U37+r9JykFzmwsn0uTIofHAlwiNtuNkObIFzP AWcN13zvVYy9xRd+z1Lv6M1lJYvLNOzSf9yS4NxxldKoDWs0O6pp7/hkdTK1UZ+VCa7F0X LSWU3QLZEMvBsZsAGlrIIaEcT1aVzrk0wx5DrVAL8zWn4Vnb58XLWMEuCPpKQxaS3PaoJ3 UNe1UgBhXDdbeg8Z2eN1asO+zruUSi9Y8SG2aDP9nlJIjbhmoZNCNh6qYX9SMQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1755102576; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=fWcHtQhzCZGOeOX2ngjD4Pl0wJj/YhvxCFd7lvSBSUs=; b=Kk0X3F/dJimjxF++EoZQ4B+JeboTosbWgXF5Z9K3qwx0cHGeIZqF/e+4x9lfKbO+h4XLS2 HZz744Ns/fxMRNCA== From: Thomas 
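The wrapper pattern is easy to model outside the kernel. A minimal,
self-contained C sketch (not part of the patch; struct pt_regs is stubbed
out and printf stands in for the real TIF work loop) shows how the named
wrappers keep bare true/false literals out of the call sites:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct pt_regs;				/* opaque stand-in for the real register frame */

static void exit_to_user_mode_prepare(struct pt_regs *regs, bool from_irq)
{
	/* The real function runs the exit work loop; here it just reports */
	printf("exit work, from_irq=%d\n", from_irq);
}

/* Named wrappers encode the origin, so call sites stay self-documenting */
static inline void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
{
	exit_to_user_mode_prepare(regs, false);
}

static inline void irqentry_exit_to_user_mode_prepare(struct pt_regs *regs)
{
	exit_to_user_mode_prepare(regs, true);
}

int main(void)
{
	syscall_exit_to_user_mode_prepare(NULL);	/* syscall return path */
	irqentry_exit_to_user_mode_prepare(NULL);	/* interrupt return path */
	return 0;
}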
From nobody Sat Oct 4 19:16:42 2025
Message-ID: <20250813162824.356621744@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Michael Jeanson, Wei Liu, Jens Axboe, Peter Zijlstra, Mathieu Desnoyers, "Paul E. McKenney", Boqun Feng
Subject: [patch 09/11] entry: Provide exit_to_user_notify_resume()
References: <20250813155941.014821755@linutronix.de>
Date: Wed, 13 Aug 2025 18:29:34 +0200 (CEST)

The TIF_NOTIFY_RESUME handler of restartable sequences, like all other
functionality multiplexed on this flag, is invoked unconditionally when
TIF_NOTIFY_RESUME is set for whatever reason. The invocation is already
conditional on the rseq_event_pending bit being set, but there is further
room for improvement.

The invocation itself cannot be avoided when the event bit is set, but the
heavy lifting of accessing user space can be avoided when the exit to user
mode loop runs on behalf of a syscall, unless it's a debug kernel. So far
there is no way for the RSEQ code to distinguish that case.

Providing the information is trivial for all architectures which use the
generic entry code, but non-trivial work for all others, which is beyond
the scope of this series. Architectures which want to benefit should
finally convert their code over to the generic entry code.

To prepare for that optimization rename resume_user_mode_work() to
exit_to_user_notify_resume() and add a @from_irq argument to it, which is
supplied by the caller. Let the generic entry code and all non-entry code
users like hypervisors and IO-URING use this new function and supply the
correct information.

Any NOTIFY_RESUME work which evaluates this new argument has to make the
evaluation dependent on CONFIG_GENERIC_ENTRY because otherwise there is no
guarantee that the value is correct at all.

Signed-off-by: Thomas Gleixner
Cc: Wei Liu
Cc: Jens Axboe
Cc: Peter Zijlstra
---
 drivers/hv/mshv_common.c         |  2 +-
 include/linux/resume_user_mode.h | 38 +++++++++++++++++++++++++++-----------
 io_uring/io_uring.h              |  2 +-
 kernel/entry/common.c            |  2 +-
 kernel/entry/kvm.c               |  2 +-
 5 files changed, 31 insertions(+), 15 deletions(-)

--- a/drivers/hv/mshv_common.c
+++ b/drivers/hv/mshv_common.c
@@ -155,7 +155,7 @@ int mshv_do_pre_guest_mode_work(ulong th
 		schedule();
 
 	if (th_flags & _TIF_NOTIFY_RESUME)
-		resume_user_mode_work(NULL);
+		exit_to_user_notify_resume(NULL, false);
 
 	return 0;
 }
--- a/include/linux/resume_user_mode.h
+++ b/include/linux/resume_user_mode.h
@@ -24,21 +24,22 @@ static inline void set_notify_resume(str
 		kick_process(task);
 }
 
-
 /**
- * resume_user_mode_work - Perform work before returning to user mode
- * @regs:	user-mode registers of @current task
+ * exit_to_user_notify_resume - Perform work before returning to user mode
+ * @regs:	user-mode registers of @current task
+ * @from_irq:	If true this is a return from interrupt, if false it's
+ *		a syscall return
  *
- * This is called when %TIF_NOTIFY_RESUME has been set.  Now we are
- * about to return to user mode, and the user state in @regs can be
- * inspected or adjusted.  The caller in arch code has cleared
- * %TIF_NOTIFY_RESUME before the call.  If the flag gets set again
- * asynchronously, this will be called again before we return to
- * user mode.
+ * This is called when %TIF_NOTIFY_RESUME has been set to handle the exit
+ * to user work, which is multiplexed under this TIF bit. The bit is
+ * cleared and work is probed as pending. If the flag gets set again before
+ * exiting to user space the caller will invoke this again.
  *
- * Called without locks.
+ * Any work invoked here, which wants to make decisions on @from_irq, must
+ * make these decisions dependent on CONFIG_GENERIC_ENTRY to retain the
+ * historical behaviour of resume_user_mode_work().
  */
-static inline void resume_user_mode_work(struct pt_regs *regs)
+static inline void exit_to_user_notify_resume(struct pt_regs *regs, bool from_irq)
 {
 	clear_thread_flag(TIF_NOTIFY_RESUME);
 	/*
@@ -62,4 +63,19 @@ static inline void resume_user_mode_work
 	rseq_handle_notify_resume(regs);
 }
 
+#ifndef CONFIG_GENERIC_ENTRY
+/**
+ * resume_user_mode_work - Perform work before returning to user mode
+ * @regs:	user-mode registers of @current task
+ *
+ * This is a wrapper around exit_to_user_notify_resume() for the existing
+ * call sites in architecture code, which do not use the generic entry
+ * code.
+ */
+static inline void resume_user_mode_work(struct pt_regs *regs)
+{
+	exit_to_user_notify_resume(regs, false);
+}
+#endif
+
 #endif /* LINUX_RESUME_USER_MODE_H */
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -365,7 +365,7 @@ static inline int io_run_task_work(void)
 	if (current->flags & PF_IO_WORKER) {
 		if (test_thread_flag(TIF_NOTIFY_RESUME)) {
 			__set_current_state(TASK_RUNNING);
-			resume_user_mode_work(NULL);
+			exit_to_user_notify_resume(NULL, false);
 		}
 		if (current->io_uring) {
 			unsigned int count = 0;
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -41,7 +41,7 @@ void __weak arch_do_signal_or_restart(st
 		arch_do_signal_or_restart(regs);
 
 	if (ti_work & _TIF_NOTIFY_RESUME)
-		resume_user_mode_work(regs);
+		exit_to_user_notify_resume(regs, from_irq);
 
 	/* Architecture specific TIF work */
 	arch_exit_to_user_mode_work(regs, ti_work);
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -17,7 +17,7 @@ static int xfer_to_guest_mode_work(struc
 			schedule();
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
-			resume_user_mode_work(NULL);
+			exit_to_user_notify_resume(NULL, false);
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
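The contract around @from_irq can also be modeled in plain C. In this
illustrative sketch (not kernel code) GENERIC_ENTRY stands in for
CONFIG_GENERIC_ENTRY and the handlers are reduced to prints; the point is
that consumers only act on @from_irq when the generic entry code set it,
while the legacy resume_user_mode_work() wrapper keeps the historical
'not from interrupt' behaviour:

#include <stdbool.h>
#include <stdio.h>

#define GENERIC_ENTRY 1		/* stand-in for CONFIG_GENERIC_ENTRY */

static void rseq_handle_notify_resume(bool from_irq)
{
	/* @from_irq is only meaningful when the generic entry code set it */
	if (GENERIC_ENTRY)
		printf("rseq work, from_irq=%d\n", from_irq);
	else
		printf("rseq work, origin unknown\n");
}

static void exit_to_user_notify_resume(bool from_irq)
{
	/* ...other work multiplexed under TIF_NOTIFY_RESUME runs here... */
	rseq_handle_notify_resume(from_irq);
}

#if !GENERIC_ENTRY
/* Legacy call sites keep the historical behaviour: never from interrupt */
static void resume_user_mode_work(void)
{
	exit_to_user_notify_resume(false);
}
#endif

int main(void)
{
	exit_to_user_notify_resume(true);	/* interrupt return */
	exit_to_user_notify_resume(false);	/* syscall, guest or io_uring path */
	return 0;
}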
From nobody Sat Oct 4 19:16:42 2025
Message-ID: <20250813162824.420583910@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Michael Jeanson, Mathieu Desnoyers, Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Wei Liu, Jens Axboe
Subject: [patch 10/11] rseq: Skip fixup when returning from a syscall
References: <20250813155941.014821755@linutronix.de>
Date: Wed, 13 Aug 2025 18:29:37 +0200 (CEST)

The TIF_NOTIFY_RESUME handler of restartable sequences, like all other
functionality multiplexed on this flag, is invoked unconditionally when
TIF_NOTIFY_RESUME is set for whatever reason. The invocation is already
conditional on the rseq_event_pending bit being set, but there is further
room for improvement.

The heavy lifting of the critical section fixup can be completely avoided
when the exit to user mode loop runs on behalf of a syscall, unless it's a
debug kernel. There was no way for the RSEQ code to distinguish that case
so far.

On architectures which enable CONFIG_GENERIC_ENTRY, the information is now
available through a function argument to exit_to_user_notify_resume(),
which tells whether the invocation comes from a syscall return or from an
interrupt return.

Let the RSEQ code utilize this 'from_irq' argument when

  - CONFIG_GENERIC_ENTRY is enabled
  - CONFIG_DEBUG_RSEQ is disabled

and skip the critical section fixup when the invocation comes from a
syscall return, as illustrated by the sketch after this patch. The update
of the CPU and node IDs has to happen in both cases, so the out of line
call always has to happen when an event is pending, whether it's a syscall
return or not.

This changes the current behaviour, which just blindly fixes up the
critical section unconditionally in the syscall case. But that's a user
space problem when it invokes a syscall from within a critical section and
expects it to work. That code was clearly never tested on a debug kernel
and user space can keep the pieces.

Signed-off-by: Thomas Gleixner
Cc: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: "Paul E. McKenney"
Cc: Boqun Feng
---
 include/linux/resume_user_mode.h |  2 +-
 include/linux/rseq.h             | 12 ++++++------
 kernel/rseq.c                    | 22 +++++++++++++++++++++-
 3 files changed, 28 insertions(+), 8 deletions(-)

--- a/include/linux/resume_user_mode.h
+++ b/include/linux/resume_user_mode.h
@@ -60,7 +60,7 @@ static inline void exit_to_user_notify_r
 	mem_cgroup_handle_over_high(GFP_KERNEL);
 	blkcg_maybe_throttle_current();
 
-	rseq_handle_notify_resume(regs);
+	rseq_handle_notify_resume(regs, from_irq);
 }
 
 #ifndef CONFIG_GENERIC_ENTRY
--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -13,19 +13,19 @@ static inline void rseq_set_notify_resum
 	set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
 }
 
-void __rseq_handle_notify_resume(struct ksignal *sig, struct pt_regs *regs);
+void __rseq_handle_notify_resume(struct ksignal *sig, struct pt_regs *regs,
+				 bool from_irq);
 
-static inline void rseq_handle_notify_resume(struct pt_regs *regs)
+static inline void rseq_handle_notify_resume(struct pt_regs *regs, bool from_irq)
 {
 	if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || READ_ONCE(current->rseq_event_pending))
-		__rseq_handle_notify_resume(NULL, regs);
+		__rseq_handle_notify_resume(NULL, regs, from_irq);
 }
 
-static inline void rseq_signal_deliver(struct ksignal *ksig,
-				       struct pt_regs *regs)
+static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs)
 {
 	if (current->rseq)
-		__rseq_handle_notify_resume(ksig, regs);
+		__rseq_handle_notify_resume(ksig, regs, false);
 }
 
 static inline void rseq_notify_event(struct task_struct *t)
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -408,6 +408,22 @@ static int rseq_ip_fixup(struct pt_regs
 	return 0;
 }
 
+static inline bool rseq_ignore_event(bool from_irq, bool ksig)
+{
+	/*
+	 * On architectures which do not select CONFIG_GENERIC_ENTRY
+	 * @from_irq is not usable.
+	 */
+	if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || !IS_ENABLED(CONFIG_GENERIC_ENTRY))
+		return false;
+
+	/*
+	 * Avoid the heavy lifting when this is a return from syscall,
+	 * i.e. not from interrupt and not from signal delivery.
+	 */
+	return !from_irq && !ksig;
+}
+
 /*
  * This resume handler must always be executed between any of:
  * - preemption,
@@ -419,7 +435,8 @@ static int rseq_ip_fixup(struct pt_regs
  * respect to other threads scheduled on the same CPU, and with respect
  * to signal handlers.
  */
-void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
+void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs,
+				 bool from_irq)
 {
 	struct task_struct *t = current;
 	int ret, sig;
@@ -467,6 +484,9 @@ void __rseq_handle_notify_resume(struct
 		t->rseq_event_pending = false;
 	}
 
+	if (rseq_ignore_event(from_irq, !!ksig))
+		event = false;
+
 	if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event) {
 		ret = rseq_ip_fixup(regs, event);
 		if (unlikely(ret < 0))
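The skip condition reduces to a small truth table. A standalone C model of
rseq_ignore_event() (the config knobs become plain constants here; in the
kernel they are IS_ENABLED() checks) makes explicit that only a plain
syscall return, with no interrupt and no signal delivery involved, skips
the fixup:

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for CONFIG_DEBUG_RSEQ and CONFIG_GENERIC_ENTRY */
#define DEBUG_RSEQ	0
#define GENERIC_ENTRY	1

static bool rseq_ignore_event(bool from_irq, bool ksig)
{
	/* Debug kernels always check; @from_irq needs generic entry */
	if (DEBUG_RSEQ || !GENERIC_ENTRY)
		return false;

	/* Skip only when neither an interrupt nor a signal is involved */
	return !from_irq && !ksig;
}

int main(void)
{
	for (int irq = 0; irq <= 1; irq++) {
		for (int sig = 0; sig <= 1; sig++) {
			printf("from_irq=%d ksig=%d -> skip=%d\n",
			       irq, sig, rseq_ignore_event(irq, sig));
		}
	}
	return 0;	/* only from_irq=0 ksig=0 prints skip=1 */
}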
From nobody Sat Oct 4 19:16:42 2025
Message-ID: <20250813162824.484590626@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Michael Jeanson, Mathieu Desnoyers, Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Wei Liu, Jens Axboe
Subject: [patch 11/11] rseq: Convert to masked user access where applicable
References: <20250813155941.014821755@linutronix.de>
Date: Wed, 13 Aug 2025 18:29:39 +0200 (CEST)

Masked user access optimizes the Spectre-V1 speculation barrier on
architectures which support it. As rseq_handle_notify_resume() is
frequently invoked, the access to the critical section pointer in the rseq
ABI is a hotpath which is worth optimizing.

Also replace the clearing of the pointer with the optimized version.

Signed-off-by: Thomas Gleixner
Cc: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: "Paul E. McKenney"
Cc: Boqun Feng
---
 kernel/rseq.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -245,20 +245,9 @@ static int rseq_reset_rseq_cpu_node_id(s
 /*
  * Get the user-space pointer value stored in the 'rseq_cs' field.
  */
-static int rseq_get_rseq_cs_ptr_val(struct rseq __user *rseq, u64 *rseq_cs)
+static inline int rseq_get_rseq_cs_ptr_val(struct rseq __user *rseq, u64 *rseq_cs)
 {
-	if (!rseq_cs)
-		return -EFAULT;
-
-#ifdef CONFIG_64BIT
-	if (get_user(*rseq_cs, &rseq->rseq_cs))
-		return -EFAULT;
-#else
-	if (copy_from_user(rseq_cs, &rseq->rseq_cs, sizeof(*rseq_cs)))
-		return -EFAULT;
-#endif
-
-	return 0;
+	return get_user_masked_u64(rseq_cs, &rseq->rseq_cs);
 }
 
 /*
@@ -358,13 +347,7 @@ static int clear_rseq_cs(struct rseq __u
	 *
	 * Set rseq_cs to NULL.
	 */
-#ifdef CONFIG_64BIT
-	return put_user(0UL, &rseq->rseq_cs);
-#else
-	if (clear_user(&rseq->rseq_cs, sizeof(rseq->rseq_cs)))
-		return -EFAULT;
-	return 0;
-#endif
+	return put_user_masked_u64(0UL, &rseq->rseq_cs);
 }
 
 /*
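get_user_masked_u64() and put_user_masked_u64() come from the masked user
access infrastructure this series builds on. The underlying idea, shown
here as a rough standalone sketch and not the actual kernel
implementation, is to replace the speculation barrier after the address
check with branchless masking that turns an out-of-range pointer into one
which is guaranteed to fault; the address limit below is illustrative:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define USER_SPACE_TOP	0x00007fffffffffffULL	/* illustrative user address limit */

static inline uint64_t mask_user_ptr(uint64_t ptr)
{
	/*
	 * All-zero mask when ptr is in range, all-ones otherwise. The
	 * computation is branchless, so there is no conditional branch
	 * for the CPU to speculate past and no barrier is needed.
	 */
	uint64_t mask = (uint64_t)((int64_t)(USER_SPACE_TOP - ptr) >> 63);

	return ptr | mask;	/* out-of-range pointers become ~0 and fault */
}

int main(void)
{
	/* In-range pointer passes through unchanged */
	printf("%#" PRIx64 "\n", mask_user_ptr(0x1000));
	/* Kernel-space pointer is forced to all ones */
	printf("%#" PRIx64 "\n", mask_user_ptr(0xffff800000000000ULL));
	return 0;
}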