From: "tip-bot2 for Thomas Gleixner"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Cc: Thomas Gleixner, "Peter Zijlstra (Intel)", Ingo Molnar, Mathieu Desnoyers, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [tip: core/rseq] rseq: Rework the TIF_NOTIFY handler
Date: Tue, 04 Nov 2025 08:17:07 -0000
In-Reply-To: <20251027084307.517640811@linutronix.de>
References: <20251027084307.517640811@linutronix.de>
Message-ID: <176224422709.2601451.8867748041064816669.tip-bot2@tip-bot2>

The following commit has been merged into the core/rseq branch of tip:

Commit-ID:     e2d4f42271155045a49b89530f2c06ad8e9f1a1e
Gitweb:        https://git.kernel.org/tip/e2d4f42271155045a49b89530f2c06ad8e9f1a1e
Author:        Thomas Gleixner
AuthorDate:    Mon, 27 Oct 2025 09:45:12 +01:00
Committer:     Ingo Molnar
CommitterDate: Tue, 04 Nov 2025 08:33:54 +01:00

rseq: Rework the TIF_NOTIFY handler

Replace the whole logic with a new implementation, which is shared with
signal delivery and the upcoming exit fast path.

Unlike the original implementation, the new one ignores invocations from
KVM and io_uring, which invoke resume_user_mode_work() with the @regs
argument set to NULL. The original implementation updated the CPU/Node/MM
CID fields in that case, but that was merely a side effect: it papered
over the fact that this invocation cleared TIF_NOTIFY_RESUME, which in
turn could cause an update on return to user space to be lost. That
problem has since been addressed differently, so it is no longer required
to do the update before entering the guest.

This might be considered a user-visible change when the host thread's TLS
memory is mapped into the guest, but as that was never intentionally
supported, this abuse of kernel-internal implementation details is not
considered an ABI break.

Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Ingo Molnar
Reviewed-by: Mathieu Desnoyers
Link: https://patch.msgid.link/20251027084307.517640811@linutronix.de
---
 include/linux/rseq_entry.h | 29 ++++++++++++++-
 kernel/rseq.c              | 76 ++++++++++++++++---------------------
 2 files changed, 62 insertions(+), 43 deletions(-)
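For orientation, the CPU/node/MM CID values that rseq_update_usr() publishes
in the patch below land in each thread's registered struct rseq area, where
user space can read them directly. A minimal consumer-side sketch, assuming
glibc 2.35+ (which registers rseq at thread start and exports __rseq_offset
and __rseq_size via <sys/rseq.h>), UAPI headers that define the node_id and
mm_cid fields (Linux 6.3+), and a compiler providing
__builtin_thread_pointer():

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/rseq.h>	/* struct rseq, __rseq_offset, __rseq_size */

int main(void)
{
	struct rseq *rs;

	if (!__rseq_size) {
		fputs("rseq area not registered by libc\n", stderr);
		return 1;
	}

	/*
	 * The per-thread rseq area lives at __rseq_offset bytes from
	 * the thread pointer.
	 */
	rs = (struct rseq *)((char *)__builtin_thread_pointer() + __rseq_offset);

	/*
	 * cpu_id/node_id/mm_cid are refreshed by the kernel on the way
	 * back to user space after a context switch - exactly the job
	 * of the TIF_NOTIFY_RESUME handler reworked below.
	 */
	printf("cpu_id=%u node_id=%u mm_cid=%u\n",
	       rs->cpu_id, rs->node_id, rs->mm_cid);
	return 0;
}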
diff --git a/include/linux/rseq_entry.h b/include/linux/rseq_entry.h
index 37444e8..aa1c046 100644
--- a/include/linux/rseq_entry.h
+++ b/include/linux/rseq_entry.h
@@ -368,6 +368,35 @@ efault:
 	return false;
 }
 
+/*
+ * Update user space with new IDs and conditionally check whether the task
+ * is in a critical section.
+ */
+static rseq_inline bool rseq_update_usr(struct task_struct *t, struct pt_regs *regs,
+					struct rseq_ids *ids, u32 node_id)
+{
+	u64 csaddr;
+
+	if (!rseq_set_ids_get_csaddr(t, ids, node_id, &csaddr))
+		return false;
+
+	/*
+	 * On architectures which utilize the generic entry code this
+	 * allows to skip the critical section when the entry was not from
+	 * a user space interrupt, unless debug mode is enabled.
+	 */
+	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
+		if (!static_branch_unlikely(&rseq_debug_enabled)) {
+			if (likely(!t->rseq.event.user_irq))
+				return true;
+		}
+	}
+	if (likely(!csaddr))
+		return true;
+	/* Sigh, this really needs to do work */
+	return rseq_update_user_cs(t, regs, csaddr);
+}
+
 static __always_inline void rseq_exit_to_user_mode(void)
 {
 	struct rseq_event *ev = &current->rseq.event;
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 13faadc..148fb21 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -82,12 +82,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/rseq.h>
 
-#ifdef CONFIG_MEMBARRIER
-# define RSEQ_EVENT_GUARD	irq
-#else
-# define RSEQ_EVENT_GUARD	preempt
-#endif
-
 DEFINE_STATIC_KEY_MAYBE(CONFIG_RSEQ_DEBUG_DEFAULT_ENABLE, rseq_debug_enabled);
 
 static inline void rseq_control_debug(bool on)
@@ -239,38 +233,15 @@ efault:
 	return false;
 }
 
-/*
- * This resume handler must always be executed between any of:
- * - preemption,
- * - signal delivery,
- * and return to user-space.
- *
- * This is how we can ensure that the entire rseq critical section
- * will issue the commit instruction only if executed atomically with
- * respect to other threads scheduled on the same CPU, and with respect
- * to signal handlers.
- */
-void __rseq_handle_notify_resume(struct pt_regs *regs)
+static void rseq_slowpath_update_usr(struct pt_regs *regs)
 {
+	/* Preserve rseq state and user_irq state for exit to user */
+	const struct rseq_event evt_mask = { .has_rseq = true, .user_irq = true, };
 	struct task_struct *t = current;
 	struct rseq_ids ids;
 	u32 node_id;
 	bool event;
 
-	/*
-	 * If invoked from hypervisors before entering the guest via
-	 * resume_user_mode_work(), then @regs is a NULL pointer.
-	 *
-	 * resume_user_mode_work() clears TIF_NOTIFY_RESUME and re-raises
-	 * it before returning from the ioctl() to user space when
-	 * rseq_event.sched_switch is set.
-	 *
-	 * So it's safe to ignore here instead of pointlessly updating it
-	 * in the vcpu_run() loop.
-	 */
-	if (!regs)
-		return;
-
 	if (unlikely(t->flags & PF_EXITING))
 		return;
 
@@ -294,26 +265,45 @@ void __rseq_handle_notify_resume(struct pt_regs *regs)
 	 * with the result handed in to allow the detection of
 	 * inconsistencies.
 	 */
-	scoped_guard(RSEQ_EVENT_GUARD) {
+	scoped_guard(irq) {
 		event = t->rseq.event.sched_switch;
-		t->rseq.event.sched_switch = false;
+		t->rseq.event.all &= evt_mask.all;
 		ids.cpu_id = task_cpu(t);
 		ids.mm_cid = task_mm_cid(t);
 	}
 
-	if (!IS_ENABLED(CONFIG_DEBUG_RSEQ) && !event)
+	if (!event)
 		return;
 
-	if (!rseq_handle_cs(t, regs))
-		goto error;
-
 	node_id = cpu_to_node(ids.cpu_id);
-	if (!rseq_set_ids(t, &ids, node_id))
-		goto error;
-	return;
 
-error:
-	force_sig(SIGSEGV);
+	if (unlikely(!rseq_update_usr(t, regs, &ids, node_id))) {
+		/*
+		 * Clear the errors just in case this might survive magically, but
+		 * leave the rest intact.
+		 */
+		t->rseq.event.error = 0;
+		force_sig(SIGSEGV);
+	}
+}
+
+void __rseq_handle_notify_resume(struct pt_regs *regs)
+{
+	/*
+	 * If invoked from hypervisors before entering the guest via
+	 * resume_user_mode_work(), then @regs is a NULL pointer.
+	 *
+	 * resume_user_mode_work() clears TIF_NOTIFY_RESUME and re-raises
+	 * it before returning from the ioctl() to user space when
+	 * rseq_event.sched_switch is set.
+	 *
+	 * So it's safe to ignore here instead of pointlessly updating it
+	 * in the vcpu_run() loop.
+	 */
+	if (!regs)
+		return;
+
+	rseq_slowpath_update_usr(regs);
 }
 
 void __rseq_signal_deliver(int sig, struct pt_regs *regs)
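The registration side, for completeness: a sketch using the raw rseq()
syscall, with the signature value borrowed from the kernel selftests.
Registration arms TIF_NOTIFY_RESUME, so the handler reworked above has
already filled in the IDs by the time the syscall returns to user space.
On glibc 2.35+ run it with GLIBC_TUNABLES=glibc.pthread.rseq=0, otherwise
the call fails with EBUSY because libc registered its own area first.

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/rseq.h>	/* UAPI struct rseq, carries 32-byte alignment */

/* Signature value used by the kernel selftests; any fixed 32-bit value
 * works as long as abort handlers are prefixed with it. */
#define RSEQ_SIG	0x53053053

static __thread struct rseq rs;

int main(void)
{
	/*
	 * sys_rseq(rseq, rseq_len, flags, sig): registration raises
	 * TIF_NOTIFY_RESUME, so the notify handler populates cpu_id
	 * and friends before the syscall returns to user space.
	 */
	if (syscall(__NR_rseq, &rs, sizeof(rs), 0, RSEQ_SIG)) {
		perror("rseq");
		return 1;
	}
	printf("registered: cpu_id=%u\n", rs.cpu_id);
	return 0;
}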