From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: "Eric W. Biederman", Ben Segall, Daniel Bristot de Oliveira,
    Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman, Oleg Nesterov,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Valentin Schneider,
    Vincent Guittot, Sebastian Andrzej Siewior
Subject: [PATCH 1/2] signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT.
Date: Wed, 20 Jul 2022 17:44:34 +0200
Message-Id: <20220720154435.232749-2-bigeasy@linutronix.de>
In-Reply-To: <20220720154435.232749-1-bigeasy@linutronix.de>
References: <20220720154435.232749-1-bigeasy@linutronix.de>

Commit 53da1d9456fe7 ("fix ptrace slowness") is only a band aid around the
problem. The invocation of do_notify_parent_cldstop() wakes the parent and
makes it runnable. The scheduler then wants to replace this still-running
task with the parent. With the read_lock() acquired this is not possible
because preemption is disabled, so the switch is deferred until
read_unlock(). This scheduling point is undesired, and the commit avoids
it by disabling preemption around the unlock operation and enabling it
again before the schedule() invocation, without creating a preemption
point.
It is undesired only because the parent otherwise sleeps a cycle in
wait_task_inactive() until the traced task leaves the run-queue in
schedule(). It is not a correctness issue; the band aid merely avoids the
visible delay, which adds up over multiple invocations.
The task can still be preempted if an interrupt occurs between
preempt_enable_no_resched() and freezable_schedule(), because on the
IRQ-exit path of the interrupt scheduling _will_ happen. This is ignored
since it does not happen very often.
On PREEMPT_RT, keeping preemption disabled during the invocation of
cgroup_enter_frozen() becomes a problem because the function acquires
css_set_lock, which is a sleeping lock on PREEMPT_RT and must not be
acquired with preemption disabled.

Don't disable preemption on PREEMPT_RT. Remove the TODO regarding adding
read_unlock_no_resched(): there is no need for it, and it would cause
harm.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/signal.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2297,13 +2297,13 @@ static int ptrace_stop(int exit_code, in
 	/*
 	 * Don't want to allow preemption here, because
 	 * sys_ptrace() needs this task to be inactive.
-	 *
-	 * XXX: implement read_unlock_no_resched().
 	 */
-	preempt_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	read_unlock(&tasklist_lock);
 	cgroup_enter_frozen();
-	preempt_enable_no_resched();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable_no_resched();
 	freezable_schedule();
 	cgroup_leave_frozen(true);

From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: "Eric W. Biederman", Ben Segall, Daniel Bristot de Oliveira,
    Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman, Oleg Nesterov,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Valentin Schneider,
    Vincent Guittot, Sebastian Andrzej Siewior
Subject: [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive().
Date: Wed, 20 Jul 2022 17:44:35 +0200
Message-Id: <20220720154435.232749-3-bigeasy@linutronix.de>
In-Reply-To: <20220720154435.232749-1-bigeasy@linutronix.de>
References: <20220720154435.232749-1-bigeasy@linutronix.de>

Ptrace is using wait_task_inactive() to wait for the tracee to reach a
certain task state.
On PREEMPT_RT that state may be stored in task_struct::saved_state while
the tracee blocks on a sleeping lock and task_struct::__state is set to
TASK_RTLOCK_WAIT.
It is not possible to check only for TASK_RTLOCK_WAIT to be sure that the
task is blocked on a sleeping lock, because during wake-up (after the
sleeping lock has been acquired) the task state is set to TASK_RUNNING.
Once the task is on the CPU and has acquired the pi_lock it will reset the
state accordingly, but until then TASK_RUNNING will be observed (with the
desired state saved in saved_state).

Check also task_struct::saved_state if the desired match was not found in
task_struct::__state on PREEMPT_RT. If the state was found in
saved_state, wait until the task is idle and the state is visible in
task_struct::__state.

Signed-off-by: Sebastian Andrzej Siewior
Reviewed-by: Valentin Schneider
---
 kernel/sched/core.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3257,6 +3257,40 @@ int migrate_swap(struct task_struct *cur
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+#ifdef CONFIG_PREEMPT_RT
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+	unsigned long flags;
+	bool mismatch;
+
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	mismatch = READ_ONCE(p->__state) != match_state &&
+		   READ_ONCE(p->saved_state) != match_state;
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	return mismatch;
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+					bool *wait)
+{
+	if (READ_ONCE(p->__state) == match_state)
+		return true;
+	if (READ_ONCE(p->saved_state) != match_state)
+		return false;
+	*wait = true;
+	return true;
+}
+#else
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+	return READ_ONCE(p->__state) != match_state;
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+					bool *wait)
+{
+	return READ_ONCE(p->__state) == match_state;
+}
+#endif
+
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
@@ -3275,7 +3309,7 @@ int migrate_swap(struct task_struct *cur
  */
 unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
 {
-	int running, queued;
+	bool running, wait;
 	struct rq_flags rf;
 	unsigned long ncsw;
 	struct rq *rq;
@@ -3301,7 +3335,7 @@ unsigned long wait_task_inactive(struct
 	 * is actually now running somewhere else!
 	 */
 	while (task_running(rq, p)) {
-		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+		if (match_state && state_mismatch(p, match_state))
 			return 0;
 		cpu_relax();
 	}
@@ -3314,10 +3348,12 @@ unsigned long wait_task_inactive(struct
 	rq = task_rq_lock(p, &rf);
 	trace_sched_wait_task(p);
 	running = task_running(rq, p);
-	queued = task_on_rq_queued(p);
+	wait = task_on_rq_queued(p);
 	ncsw = 0;
-	if (!match_state || READ_ONCE(p->__state) == match_state)
+
+	if (!match_state || state_match(p, match_state, &wait))
 		ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+
 	task_rq_unlock(rq, p, &rf);
 
 	/*
@@ -3346,7 +3382,7 @@ unsigned long wait_task_inactive(struct
 	 * running right now), it's preempted, and we should
 	 * yield - it could be a while.
 	 */
-	if (unlikely(queued)) {
+	if (unlikely(wait)) {
 		ktime_t to = NSEC_PER_SEC / HZ;
 
 		set_current_state(TASK_UNINTERRUPTIBLE);