From nobody Wed Sep 10 05:32:07 2025
Message-ID: <20250908225753.205700259@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Mathieu Desnoyers, Peter Zijlstra, "Paul E. McKenney", Boqun Feng,
 Jonathan Corbet, Prakash Sangappa, Madadi Vineeth Reddy,
 K Prateek Nayak, Steven Rostedt, Sebastian Andrzej Siewior,
 Arnd Bergmann, linux-arch@vger.kernel.org
Subject: [patch 10/12] rseq: Implement rseq_grant_slice_extension()
References: <20250908225709.144709889@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Date: Tue, 9 Sep 2025 01:00:12 +0200 (CEST)
Content-Type: text/plain; charset="utf-8"

Provide the actual decision function, which decides whether a time slice
extension is granted in the exit to user mode path when NEED_RESCHED is
evaluated.

The decision is made in two stages. First, an inline quick check avoids
calling into the actual decision function in the common case.
This checks whether:

  #1 the functionality is enabled

  #2 the exit is a return from interrupt to user mode

  #3 any TIF bit which causes extra work is set. That includes
     TIF_RSEQ, which means the task was already scheduled out.

The slow path, which implements the actual user space ABI, is invoked
when:

  A) #1 is true, #2 is true and #3 is false

     It checks whether user space requested a slice extension by
     setting the request bit in the rseq slice_ctrl field. If so, it
     grants the extension and stores the slice expiry time, so that
     the actual exit code can double check whether the slice is
     already exhausted before going back.

  B) #1 - #3 are true _and_ a slice extension was granted in a
     previous loop iteration

     In this case the grant is revoked.

In case the user space access faults or invalid state is detected, the
task is terminated with SIGSEGV.

Signed-off-by: Thomas Gleixner
Cc: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: "Paul E. McKenney"
Cc: Boqun Feng
---
 include/linux/rseq_entry.h | 111 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

--- a/include/linux/rseq_entry.h
+++ b/include/linux/rseq_entry.h
@@ -41,6 +41,7 @@ DECLARE_PER_CPU(struct rseq_stats, rseq_
 #ifdef CONFIG_RSEQ
 #include
 #include
+#include
 #include

 #include
@@ -110,10 +111,120 @@ static __always_inline void rseq_slice_c
 	t->rseq.slice.state.granted = false;
 }

+static __always_inline bool rseq_grant_slice_extension(bool work_pending)
+{
+	struct task_struct *curr = current;
+	union rseq_slice_state state;
+	struct rseq __user *rseq;
+	u32 usr_ctrl;
+
+	if (!rseq_slice_extension_enabled())
+		return false;
+
+	/* If not enabled or not a return from interrupt, nothing to do. */
+	state = curr->rseq.slice.state;
+	state.enabled &= curr->rseq.event.user_irq;
+	if (likely(!state.state))
+		return false;
+
+	rseq = curr->rseq.usrptr;
+	if (!user_rw_masked_begin(rseq))
+		goto die;
+
+	/*
+	 * Quick check conditions where a grant is not possible or
+	 * needs to be revoked.
+	 *
+	 * 1) Any TIF bit which needs to do extra work aside from
+	 *    rescheduling prevents a grant.
+	 *
+	 * 2) A previous rescheduling request resulted in a slice
+	 *    extension grant.
+	 */
+	if (unlikely(work_pending || state.granted)) {
+		/* Clear user control unconditionally. No point for checking */
+		unsafe_put_user(0U, &rseq->slice_ctrl, fail);
+		user_access_end();
+		rseq_slice_clear_grant(curr);
+		return false;
+	}
+
+	unsafe_get_user(usr_ctrl, &rseq->slice_ctrl, fail);
+	if (likely(!(usr_ctrl & RSEQ_SLICE_EXT_REQUEST))) {
+		user_access_end();
+		return false;
+	}
+
+	/* Grant the slice extension */
+	unsafe_put_user(RSEQ_SLICE_EXT_GRANTED, &rseq->slice_ctrl, fail);
+	user_access_end();
+
+	rseq_stat_inc(rseq_stats.s_granted);
+
+	curr->rseq.slice.state.granted = true;
+	/* Store expiry time for arming the timer on the way out */
+	curr->rseq.slice.expires = data_race(rseq_slice_ext_nsecs) + ktime_get_mono_fast_ns();
+	/*
+	 * This is racy against a remote CPU setting TIF_NEED_RESCHED in
+	 * several ways:
+	 *
+	 * 1)
+	 *	CPU0				CPU1
+	 *	clear_tsk()
+	 *					set_tsk()
+	 *	clear_preempt()
+	 *					Raise scheduler IPI on CPU0
+	 *	--> IPI
+	 *	    fold_need_resched() -> Folds correctly
+	 * 2)
+	 *	CPU0				CPU1
+	 *					set_tsk()
+	 *	clear_tsk()
+	 *	clear_preempt()
+	 *					Raise scheduler IPI on CPU0
+	 *	--> IPI
+	 *	    fold_need_resched() <- NOOP as TIF_NEED_RESCHED is false
+	 *
+	 * #1 is not any different from a regular remote reschedule as it
+	 * sets the previously not set bit and then raises the IPI which
+	 * folds it into the preempt counter
+	 *
+	 * #2 is obviously incorrect from a scheduler POV, but it's not
+	 * differently incorrect than the code below clearing the
+	 * reschedule request with the safety net of the timer.
+	 *
+	 * The important part is that the clearing is protected against the
+	 * scheduler IPI and also against any other interrupt which might
+	 * end up waking up a task and setting the bits in the middle of
+	 * the operation:
+	 *
+	 *	clear_tsk()
+	 *	---> Interrupt
+	 *	     wakeup_on_this_cpu()
+	 *		set_tsk()
+	 *		set_preempt()
+	 *	clear_preempt()
+	 *
+	 * which would be inconsistent state.
+	 */
+	scoped_guard(irq) {
+		clear_tsk_need_resched(curr);
+		clear_preempt_need_resched();
+	}
+	return true;
+
+fail:
+	user_access_end();
+die:
+	force_sig(SIGSEGV);
+	return false;
+}
+
 #else /* CONFIG_RSEQ_SLICE_EXTENSION */
 static inline bool rseq_slice_extension_enabled(void) { return false; }
 static inline bool rseq_arm_slice_extension_timer(void) { return false; }
 static inline void rseq_slice_clear_grant(struct task_struct *t) { }
+static inline bool rseq_grant_slice_extension(bool work_pending) { return false; }
 #endif /* !CONFIG_RSEQ_SLICE_EXTENSION */

 bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs, unsigned long csaddr);