Date: Thu, 14 Jul 2022 11:35:31 -0000
From: "tip-bot2 for John Keeping"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/core: Always flush pending blk_plug
Cc: John Keeping, "Peter Zijlstra (Intel)", "Steven Rostedt (Google)",
    x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20220708162702.1758865-1-john@metanate.com>
References: <20220708162702.1758865-1-john@metanate.com>
Message-ID: <165779853104.15455.16915451211318324333.tip-bot2@tip-bot2>

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     401e4963bf45c800e3e9ea0d3a0289d738005fd4
Gitweb:        https://git.kernel.org/tip/401e4963bf45c800e3e9ea0d3a0289d738005fd4
Author:        John Keeping
AuthorDate:    Fri, 08 Jul 2022 17:27:02 +01:00
Committer:     Peter Zijlstra
CommitterDate: Wed, 13 Jul 2022 11:29:17 +02:00

sched/core: Always flush pending blk_plug

With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two
normal priority tasks (SCHED_OTHER, nice level zero):

    INFO: task kworker/u8:0:8 blocked for more than 491 seconds.
          Not tainted 5.15.49-rt46 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/u8:0 state:D stack: 0 pid: 8 ppid: 2 flags:0x00000000
    Workqueue: writeback wb_workfn (flush-7:0)
    [] (__schedule) from [] (schedule+0xdc/0x134)
    [] (schedule) from [] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174)
    [] (rt_mutex_slowlock_block.constprop.0) from [] (rt_mutex_slowlock.constprop.0+0xac/0x174)
    [] (rt_mutex_slowlock.constprop.0) from [] (fat_write_inode+0x34/0x54)
    [] (fat_write_inode) from [] (__writeback_single_inode+0x354/0x3ec)
    [] (__writeback_single_inode) from [] (writeback_sb_inodes+0x250/0x45c)
    [] (writeback_sb_inodes) from [] (__writeback_inodes_wb+0x7c/0xb8)
    [] (__writeback_inodes_wb) from [] (wb_writeback+0x2c8/0x2e4)
    [] (wb_writeback) from [] (wb_workfn+0x1a4/0x3e4)
    [] (wb_workfn) from [] (process_one_work+0x1fc/0x32c)
    [] (process_one_work) from [] (worker_thread+0x22c/0x2d8)
    [] (worker_thread) from [] (kthread+0x16c/0x178)
    [] (kthread) from [] (ret_from_fork+0x14/0x38)
    Exception stack(0xc10e3fb0 to 0xc10e3ff8)
    3fa0:                                     00000000 00000000 00000000 00000000
    3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

    INFO: task tar:2083 blocked for more than 491 seconds.
          Not tainted 5.15.49-rt46 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:tar state:D stack: 0 pid: 2083 ppid: 2082 flags:0x00000000
    [] (__schedule) from [] (schedule+0xdc/0x134)
    [] (schedule) from [] (io_schedule+0x14/0x24)
    [] (io_schedule) from [] (bit_wait_io+0xc/0x30)
    [] (bit_wait_io) from [] (__wait_on_bit_lock+0x54/0xa8)
    [] (__wait_on_bit_lock) from [] (out_of_line_wait_on_bit_lock+0x84/0xb0)
    [] (out_of_line_wait_on_bit_lock) from [] (fat_mirror_bhs+0xa0/0x144)
    [] (fat_mirror_bhs) from [] (fat_alloc_clusters+0x138/0x2a4)
    [] (fat_alloc_clusters) from [] (fat_alloc_new_dir+0x34/0x250)
    [] (fat_alloc_new_dir) from [] (vfat_mkdir+0x58/0x148)
    [] (vfat_mkdir) from [] (vfs_mkdir+0x68/0x98)
    [] (vfs_mkdir) from [] (do_mkdirat+0xb0/0xec)
    [] (do_mkdirat) from [] (ret_fast_syscall+0x0/0x1c)
    Exception stack(0xc2e1bfa8 to 0xc2e1bff0)
    bfa0:                   01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000
    bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190
    bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc

Here the kworker is waiting on msdos_sb_info::s_lock which is held by
tar which is in turn waiting for a buffer which is locked waiting to be
flushed, but this operation is plugged in the kworker.

The lock is a normal struct mutex, so tsk_is_pi_blocked() will always
return false on !RT and thus the behaviour changes for RT.

It seems that the intent here is to skip blk_flush_plug() in the case
where a non-preemptible lock (such as a spinlock) has been converted to
a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT
schedule flag. But sched_submit_work() is only called from schedule()
which is never called in this scenario, so the check can simply be
deleted.

Looking at the history of the -rt patchset, in fact this change was
present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part
of a larger patch [1] most of which was replaced by commit b4bfa3fcfe3b
("sched/core: Rework the __schedule() preempt argument").
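For readers who want to see the dependency cycle in isolation, the
scenario above can be sketched as a small userspace model. This is only
an illustration under stated assumptions, not kernel code: every
model_* name is hypothetical, the plug is reduced to an array of
pending buffers, and issuing a write is treated as completing it. The
skip_if_pi_blocked flag stands in for the tsk_is_pi_blocked() early
return that this commit removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PLUGGED 8

/* Hypothetical stand-in for a buffer head: locked until its write is issued. */
struct model_buffer {
	bool locked;
};

/* Hypothetical stand-in for a per-task blk_plug: writes queued, not issued. */
struct model_plug {
	struct model_buffer *pending[MAX_PLUGGED];
	size_t count;
};

/* Queue a write behind the plug; the buffer stays locked until the flush. */
static void model_plug_add(struct model_plug *plug, struct model_buffer *buf)
{
	assert(plug->count < MAX_PLUGGED);
	buf->locked = true;
	plug->pending[plug->count++] = buf;
}

/* Issue every plugged write; in this model that also unlocks the buffers. */
static void model_plug_flush(struct model_plug *plug)
{
	for (size_t i = 0; i < plug->count; i++)
		plug->pending[i]->locked = false;
	plug->count = 0;
}

/*
 * Model of the decision taken when a task is about to sleep.
 * skip_if_pi_blocked == true  models the old tsk_is_pi_blocked() early return.
 * skip_if_pi_blocked == false models the behaviour after this commit.
 */
static void model_submit_work(struct model_plug *plug, bool pi_blocked,
			      bool skip_if_pi_blocked)
{
	if (skip_if_pi_blocked && pi_blocked)
		return;
	model_plug_flush(plug);
}

/*
 * Replay the reported scenario: the kworker has plugged a write for a
 * buffer, tar holds s_lock and waits for that buffer, and the kworker then
 * blocks on s_lock (an rtmutex on RT, so it counts as PI-blocked).
 * Returns true when the buffer is still locked afterwards, meaning that
 * neither task can make progress.
 */
static bool model_is_stuck(bool skip_if_pi_blocked)
{
	struct model_buffer buf = { .locked = false };
	struct model_plug kworker_plug = { .count = 0 };

	model_plug_add(&kworker_plug, &buf);
	model_submit_work(&kworker_plug, true, skip_if_pi_blocked);
	return buf.locked;
}
```

Under these assumptions, model_is_stuck(true) leaves the buffer locked
(the reported hang), while model_is_stuck(false) flushes the plug before
the kworker sleeps, so tar can obtain the buffer and later release
s_lock.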
As described in [1]:

    The schedule process must distinguish between blocking on a regular
    sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
    and rwlock):
    - rwsem and mutex must flush block requests (blk_schedule_flush_plug())
      even if blocked on a lock. This can not deadlock because this also
      happens for non-RT.
      There should be a warning if the scheduling point is within a RCU read
      section.

    - spinlock and rwlock must not flush block requests. This will deadlock
      if the callback attempts to acquire a lock which is already acquired.
      Similarly to being preempted, there should be no warning if the
      scheduling point is within a RCU read section.

and with the tsk_is_pi_blocked() in the scheduler path, we hit the first
issue.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0022-locking-rtmutex-Use-custom-scheduling-function-for-s.patch?h=linux-5.10.y-rt-patches

Signed-off-by: John Keeping
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Steven Rostedt (Google)
Link: https://lkml.kernel.org/r/20220708162702.1758865-1-john@metanate.com
---
 include/linux/sched/rt.h | 8 --------
 kernel/sched/core.c      | 8 ++++++--
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index e5af028..994c256 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -39,20 +39,12 @@ static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *p)
 }
 extern void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
-	return tsk->pi_blocked_on != NULL;
-}
 #else
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 {
 	return NULL;
 }
 # define rt_mutex_adjust_pi(p)		do { } while (0)
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
-	return false;
-}
 #endif
 
 extern void normalize_rt_tasks(void);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c703d17..a463dbc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6470,8 +6470,12 @@ static inline void sched_submit_work(struct task_struct *tsk)
 			io_wq_worker_sleeping(tsk);
 	}
 
-	if (tsk_is_pi_blocked(tsk))
-		return;
+	/*
+	 * spinlock and rwlock must not flush block requests. This will
+	 * deadlock if the callback attempts to acquire a lock which is
+	 * already acquired.
+	 */
+	SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT);
 
 	/*
 	 * If we are going to sleep and we have plugged IO queued,