From: Thomas Gleixner
To: LKML
Cc: Jens Axboe, Peter Zijlstra, Mathieu Desnoyers, "Paul E. McKenney",
    Boqun Feng, Paolo Bonzini, Sean Christopherson, Wei Liu, Dexuan Cui,
    x86@kernel.org, Arnd Bergmann, Heiko Carstens, Christian Borntraeger,
    Sven Schnelle, Huacai Chen, Paul Walmsley, Palmer Dabbelt
Subject: [patch V2 06/37] rseq: Simplify the event notification
Date: Sat, 23 Aug 2025 18:39:19 +0200 (CEST)
Message-ID: <20250823161653.644902433@linutronix.de>
References: <20250823161326.635281786@linutronix.de>

Since commit 0190e4198e47 ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_*
flags") the bits in task::rseq_event_mask are meaningless and just extra
work in terms of setting them individually.

Aside from that, the only relevant point where an event has to be raised is
a context switch.
Neither the CPU nor MM CID can change without going through a context
switch.

Collapse them all into a single boolean which simplifies the code a lot and
remove the pointless invocations which have been sprinkled all over the
place for no value.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Mathieu Desnoyers
Cc: "Paul E. McKenney"
Cc: Boqun Feng
---
V2: Reduce it to the sched switch event.
---
 fs/exec.c                 |    2 -
 include/linux/rseq.h      |   66 +++++++++-------------------------------------
 include/linux/sched.h     |   10 +++---
 include/uapi/linux/rseq.h |   21 ++++----------
 kernel/rseq.c             |   28 +++++++++++--------
 kernel/sched/core.c       |    5 ---
 kernel/sched/membarrier.c |    8 ++---
 7 files changed, 48 insertions(+), 92 deletions(-)
---
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1775,7 +1775,7 @@ static int bprm_execve(struct linux_binp
 		force_fatal_sig(SIGSEGV);

 	sched_mm_cid_after_execve(current);
-	rseq_set_notify_resume(current);
+	rseq_sched_switch_event(current);
 	current->in_execve = 0;

 	return retval;
--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -3,38 +3,8 @@
 #define _LINUX_RSEQ_H

 #ifdef CONFIG_RSEQ
-
-#include
 #include

-#ifdef CONFIG_MEMBARRIER
-# define RSEQ_EVENT_GUARD irq
-#else
-# define RSEQ_EVENT_GUARD preempt
-#endif
-
-/*
- * Map the event mask on the user-space ABI enum rseq_cs_flags
- * for direct mask checks.
- */
-enum rseq_event_mask_bits {
-	RSEQ_EVENT_PREEMPT_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT,
-	RSEQ_EVENT_SIGNAL_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT,
-	RSEQ_EVENT_MIGRATE_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT,
-};
-
-enum rseq_event_mask {
-	RSEQ_EVENT_PREEMPT	= (1U << RSEQ_EVENT_PREEMPT_BIT),
-	RSEQ_EVENT_SIGNAL	= (1U << RSEQ_EVENT_SIGNAL_BIT),
-	RSEQ_EVENT_MIGRATE	= (1U << RSEQ_EVENT_MIGRATE_BIT),
-};
-
-static inline void rseq_set_notify_resume(struct task_struct *t)
-{
-	if (t->rseq)
-		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
-}
-
 void __rseq_handle_notify_resume(struct ksignal *sig, struct pt_regs *regs);

 static inline void rseq_handle_notify_resume(struct pt_regs *regs)
@@ -43,35 +13,27 @@ static inline void rseq_handle_notify_re
 		__rseq_handle_notify_resume(NULL, regs);
 }

-static inline void rseq_signal_deliver(struct ksignal *ksig,
-				       struct pt_regs *regs)
+static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs)
 {
 	if (current->rseq) {
-		scoped_guard(RSEQ_EVENT_GUARD)
-			__set_bit(RSEQ_EVENT_SIGNAL_BIT, &current->rseq_event_mask);
+		current->rseq_event_pending = true;
 		__rseq_handle_notify_resume(ksig, regs);
 	}
 }

-/* rseq_preempt() requires preemption to be disabled. */
-static inline void rseq_preempt(struct task_struct *t)
+static inline void rseq_sched_switch_event(struct task_struct *t)
 {
-	__set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
-	rseq_set_notify_resume(t);
-}
-
-/* rseq_migrate() requires preemption to be disabled. */
-static inline void rseq_migrate(struct task_struct *t)
-{
-	__set_bit(RSEQ_EVENT_MIGRATE_BIT, &t->rseq_event_mask);
-	rseq_set_notify_resume(t);
+	if (t->rseq) {
+		t->rseq_event_pending = true;
+		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+	}
 }

 static __always_inline void rseq_exit_to_user_mode(void)
 {
 	if (IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-		if (WARN_ON_ONCE(current->rseq && current->rseq_event_mask))
-			current->rseq_event_mask = 0;
+		if (WARN_ON_ONCE(current->rseq && current->rseq_event_pending))
+			current->rseq_event_pending = false;
 	}
 }

@@ -85,12 +47,12 @@ static inline void rseq_fork(struct task
 		t->rseq = NULL;
 		t->rseq_len = 0;
 		t->rseq_sig = 0;
-		t->rseq_event_mask = 0;
+		t->rseq_event_pending = false;
 	} else {
 		t->rseq = current->rseq;
 		t->rseq_len = current->rseq_len;
 		t->rseq_sig = current->rseq_sig;
-		t->rseq_event_mask = current->rseq_event_mask;
+		t->rseq_event_pending = current->rseq_event_pending;
 	}
 }

@@ -99,15 +61,13 @@ static inline void rseq_execve(struct ta
 	t->rseq = NULL;
 	t->rseq_len = 0;
 	t->rseq_sig = 0;
-	t->rseq_event_mask = 0;
+	t->rseq_event_pending = false;
 }

 #else /* CONFIG_RSEQ */
-static inline void rseq_set_notify_resume(struct task_struct *t) { }
 static inline void rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) { }
 static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { }
-static inline void rseq_preempt(struct task_struct *t) { }
-static inline void rseq_migrate(struct task_struct *t) { }
+static inline void rseq_sched_switch_event(struct task_struct *t) { }
 static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) { }
 static inline void rseq_execve(struct task_struct *t) { }
 static inline void rseq_exit_to_user_mode(void) { }
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1401,14 +1401,14 @@ struct task_struct {
 #endif /* CONFIG_NUMA_BALANCING */

 #ifdef CONFIG_RSEQ
-	struct rseq __user *rseq;
-	u32 rseq_len;
-	u32 rseq_sig;
+	struct rseq __user		*rseq;
+	u32				rseq_len;
+	u32				rseq_sig;
 	/*
-	 * RmW on rseq_event_mask must be performed atomically
+	 * RmW on rseq_event_pending must be performed atomically
 	 * with respect to preemption.
 	 */
-	unsigned long rseq_event_mask;
+	bool				rseq_event_pending;
 # ifdef CONFIG_DEBUG_RSEQ
 	/*
 	 * This is a place holder to save a copy of the rseq fields for
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -114,20 +114,13 @@ struct rseq {
 	/*
 	 * Restartable sequences flags field.
 	 *
-	 * This field should only be updated by the thread which
-	 * registered this data structure. Read by the kernel.
-	 * Mainly used for single-stepping through rseq critical sections
-	 * with debuggers.
-	 *
-	 * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
-	 *     Inhibit instruction sequence block restart on preemption
-	 *     for this thread.
-	 * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
-	 *     Inhibit instruction sequence block restart on signal
-	 *     delivery for this thread.
-	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
-	 *     Inhibit instruction sequence block restart on migration for
-	 *     this thread.
+	 * This field was initially intended to allow event masking for
+	 * single-stepping through rseq critical sections with debuggers.
+	 * The kernel does not support this anymore and the relevant bits
+	 * are checked for being always false:
+	 *	- RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+	 *	- RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+	 *	- RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
 	 */
 	__u32 flags;

--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -78,6 +78,12 @@
 #define CREATE_TRACE_POINTS
 #include

+#ifdef CONFIG_MEMBARRIER
+# define RSEQ_EVENT_GUARD	irq
+#else
+# define RSEQ_EVENT_GUARD	preempt
+#endif
+
 /* The original rseq structure size (including padding) is 32 bytes. */
 #define ORIG_RSEQ_SIZE		32

@@ -430,11 +436,11 @@ void __rseq_handle_notify_resume(struct
 	 */
 	if (regs) {
 		/*
-		 * Read and clear the event mask first. If the task was not
-		 * preempted or migrated or a signal is on the way, there
-		 * is no point in doing any of the heavy lifting here on
-		 * production kernels. In that case TIF_NOTIFY_RESUME was
-		 * raised by some other functionality.
+		 * Read and clear the event pending bit first. If the task
+		 * was not preempted or migrated or a signal is on the way,
+		 * there is no point in doing any of the heavy lifting here
+		 * on production kernels. In that case TIF_NOTIFY_RESUME
+		 * was raised by some other functionality.
 		 *
 		 * This is correct because the read/clear operation is
 		 * guarded against scheduler preemption, which makes it CPU
@@ -447,15 +453,15 @@ void __rseq_handle_notify_resume(struct
 		 * with the result handed in to allow the detection of
 		 * inconsistencies.
 		 */
-		u32 event_mask;
+		bool event;

 		scoped_guard(RSEQ_EVENT_GUARD) {
-			event_mask = t->rseq_event_mask;
-			t->rseq_event_mask = 0;
+			event = t->rseq_event_pending;
+			t->rseq_event_pending = false;
 		}

-		if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event_mask) {
-			ret = rseq_ip_fixup(regs, !!event_mask);
+		if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event) {
+			ret = rseq_ip_fixup(regs, event);
 			if (unlikely(ret < 0))
 				goto error;
 		}
@@ -584,7 +590,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user
 	 * registered, ensure the cpu_id_start and cpu_id fields
 	 * are updated before returning to user-space.
 	 */
-	rseq_set_notify_resume(current);
+	rseq_sched_switch_event(current);

 	return 0;
 }
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3364,7 +3364,6 @@ void set_task_cpu(struct task_struct *p,
 		if (p->sched_class->migrate_task_rq)
 			p->sched_class->migrate_task_rq(p, new_cpu);
 		p->se.nr_migrations++;
-		rseq_migrate(p);
 		sched_mm_cid_migrate_from(p);
 		perf_event_task_migrate(p);
 	}
@@ -4795,7 +4794,6 @@ int sched_cgroup_fork(struct task_struct
 		p->sched_task_group = tg;
 	}
 #endif
-	rseq_migrate(p);
 	/*
 	 * We're setting the CPU for the first time, we don't migrate,
 	 * so use __set_task_cpu().
@@ -4859,7 +4857,6 @@ void wake_up_new_task(struct task_struct
 	 * as we're not fully set-up yet.
 	 */
 	p->recent_used_cpu = task_cpu(p);
-	rseq_migrate(p);
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), &wake_flags));
 	rq = __task_rq_lock(p, &rf);
 	update_rq_clock(rq);
@@ -5153,7 +5150,7 @@ prepare_task_switch(struct rq *rq, struc
 	kcov_prepare_switch(prev);
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
-	rseq_preempt(prev);
+	rseq_sched_switch_event(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
 	kmap_local_sched_out();
 	prepare_task(next);
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -199,7 +199,7 @@ static void ipi_rseq(void *info)
 	 * is negligible.
 	 */
 	smp_mb();
-	rseq_preempt(current);
+	rseq_sched_switch_event(current);
 }

 static void ipi_sync_rq_state(void *info)
@@ -407,9 +407,9 @@ static int membarrier_private_expedited(
 		 * membarrier, we will end up with some thread in the mm
 		 * running without a core sync.
 		 *
-		 * For RSEQ, don't rseq_preempt() the caller. User code
-		 * is not supposed to issue syscalls at all from inside an
-		 * rseq critical section.
+		 * For RSEQ, don't invoke rseq_sched_switch_event() on the
+		 * caller. User code is not supposed to issue syscalls at
+		 * all from inside an rseq critical section.
 		 */
 		if (flags != MEMBARRIER_FLAG_SYNC_CORE) {
 			preempt_disable();
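
Illustration only, not part of the patch: a minimal user-space model of the
single pending flag which replaces the per-event bits. The names struct
model_task and model_*() are hypothetical stand-ins, not kernel API. The
irq/preempt guard around the read-and-clear step is left out, as the model
is single threaded, and the clearing of TIF_NOTIFY_RESUME, which the kernel
does outside of the rseq code, is folded into the handler for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rseq related part of struct task_struct. */
struct model_task {
	bool		registered;	/* models t->rseq != NULL */
	bool		event_pending;	/* models t->rseq_event_pending */
	bool		notify_resume;	/* models TIF_NOTIFY_RESUME */
	unsigned int	fixups;		/* counts runs of the fixup path */
};

/* Models rseq_sched_switch_event(): one flag covers preempt and migrate. */
static void model_sched_switch_event(struct model_task *t)
{
	if (t->registered) {
		t->event_pending = true;
		t->notify_resume = true;
	}
}

/*
 * Models the read-and-clear step in __rseq_handle_notify_resume().
 * Returns true when an event was pending, i.e. the fixup path was taken.
 */
static bool model_handle_notify_resume(struct model_task *t)
{
	bool event = t->event_pending;

	t->event_pending = false;
	t->notify_resume = false;	/* assumption: folded in, see above */
	if (event)
		t->fixups++;
	return event;
}

/* Models the CONFIG_DEBUG_RSEQ check in rseq_exit_to_user_mode(). */
static void model_exit_to_user_mode(const struct model_task *t)
{
	assert(!(t->registered && t->event_pending));
}
```

In this model, raising the event several times before the task returns to
user space results in a single fixup, and a task without a registered rseq
area is never marked. That is the behaviour the boolean is meant to provide.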