From nobody Mon Dec 1 22:35:41 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 007002FF149; Wed, 26 Nov 2025 20:17:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764188277; cv=none; b=WHzy9UsinfHBG2RjqIlY9b5N4YVpyom2RHqEEkasOOTljOdNFpov0En7WrwiEXv6JQiDbp7nkLZKrrWsyFhJyLRk2iAIZGDBIzA0SVuJvzyYTSXseBFvBHL8ljRjT9G8F3pKJaNyWGGd2LLxIkY3bGSNbjH/SCGTnJoyePjzxPE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764188277; c=relaxed/simple; bh=duNoWfaNjp+xMfmno0sK8fdjO/rXoov+O/Yk6pDvpo0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=fV++oZqhhb66RLpkDc9DMiSETm37tQphyDxjO1ybyslEm3piYSxglm1Xr0tCH216tF85GitC7DvOTJ3i1dBiw6aMsPxpONs9c51pMEUmSHws+yTNrbZYtqwjDXaiUpPmADLKkR2U8OArZEe2ZxJpEMxdED39nDZ+qkJC9q2cEvo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=N4NDjMPk; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="N4NDjMPk" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B5D8BC4CEF7; Wed, 26 Nov 2025 20:17:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1764188276; bh=duNoWfaNjp+xMfmno0sK8fdjO/rXoov+O/Yk6pDvpo0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=N4NDjMPkNJ79kZ77/zCOqtgcNtV9TTINVUJH/iatSHjosoWOUNkK1p50x5rXDhni0 T8lWJv8bgoNtrCZ/ZqBKFN2Qfn9gSjW3SL64YIcyqJL41Au/IXWLKLX2Ax9UDNBBhy e8gPsDwmK9Z5USZru7Ldv+9d0RCDZf28P9k+xXf7m1vbniCblouhn70e4E8XrU/Ha6 VhlhY8j6M/4bGNZr3vnHaF5cNENlNPytLnFlcjGyXUicNffTl5ugtSgskno3oDFjY+ oMRenckvKaytL05hFmUalpIXVIiX/6t5Z4HX3E4laZ47pXPO5UBqlAztDIZRB0MCQo orpoW1oX48xaQ== From: "Rafael J. Wysocki" To: Bart Van Assche Cc: Yang Yang , Jens Axboe , Pavel Machek , Len Brown , Greg Kroah-Hartman , Danilo Krummrich , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume and runtime disable Date: Wed, 26 Nov 2025 21:17:51 +0100 Message-ID: <4697314.LvFx2qVVIh@rafael.j.wysocki> Organization: Linux Kernel Development In-Reply-To: References: <20251126101636.205505-1-yang.yang@vivo.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" On Wednesday, November 26, 2025 8:34:54 PM CET Rafael J. Wysocki wrote: > On Wed, Nov 26, 2025 at 8:16=E2=80=AFPM Rafael J. Wysocki wrote: > > > > On Wed, Nov 26, 2025 at 7:06=E2=80=AFPM Bart Van Assche wrote: > > > > > > On 11/26/25 3:30 AM, Rafael J. Wysocki wrote: > > > > On Wed, Nov 26, 2025 at 11:17=E2=80=AFAM Yang Yang wrote: > > > >> T1: T2: > > > >> blk_queue_enter > > > >> blk_pm_resume_queue > > > >> pm_request_resume > > > > > > > > Shouldn't this be pm_runtime_resume() rather? > > > > > > I tried to make that change on an Android device. As a result, the > > > kernel complaint shown below appeared. My understanding is that sleep= ing > > > in atomic context can trigger a deadlock and hence is not allowed. > > > > > > [ 13.728890][ T1] WARNING: CPU: 6 PID: 1 at > > > kernel/sched/core.c:9714 __might_sleep+0x78/0x84 > > > [ 13.758800][ T1] Call trace: > > > [ 13.759027][ T1] __might_sleep+0x78/0x84 > > > [ 13.759340][ T1] __pm_runtime_resume+0x40/0xb8 > > > [ 13.759781][ T1] __bio_queue_enter+0xc0/0x1cc > > > [ 13.760153][ T1] blk_mq_submit_bio+0x884/0xadc > > > [ 13.760548][ T1] __submit_bio+0x2c8/0x49c > > > [ 13.760879][ T1] __submit_bio_noacct_mq+0x38/0x88 > > > [ 13.761242][ T1] submit_bio_noacct_nocheck+0x4fc/0x7b8 > > > [ 13.761631][ T1] submit_bio+0x214/0x4c0 > > > [ 13.761941][ T1] mpage_readahead+0x1b8/0x1fc > > > [ 13.762284][ T1] blkdev_readahead+0x18/0x28 > > > [ 13.762660][ T1] page_cache_ra_unbounded+0x310/0x4d8 > > > [ 13.763072][ T1] page_cache_ra_order+0xc0/0x5b0 > > > [ 13.763434][ T1] page_cache_sync_ra+0x17c/0x268 > > > [ 13.763782][ T1] filemap_read+0x4c4/0x12f4 > > > [ 13.764125][ T1] blkdev_read_iter+0x100/0x164 > > > [ 13.764475][ T1] vfs_read+0x188/0x348 > > > [ 13.764789][ T1] __se_sys_pread64+0x84/0xc8 > > > [ 13.765180][ T1] __arm64_sys_pread64+0x1c/0x2c > > > [ 13.765556][ T1] invoke_syscall+0x58/0xf0 > > > [ 13.765876][ T1] do_el0_svc+0x8c/0xe0 > > > [ 13.766172][ T1] el0_svc+0x50/0xd4 > > > [ 13.766583][ T1] el0t_64_sync_handler+0x20/0xf4 > > > [ 13.766932][ T1] el0t_64_sync+0x1bc/0x1c0 > > > [ 13.767294][ T1] irq event stamp: 2589614 > > > [ 13.767592][ T1] hardirqs last enabled at (2589613): > > > [] finish_lock_switch+0x70/0x108 > > > [ 13.768283][ T1] hardirqs last disabled at (2589614): > > > [] el1_dbg+0x24/0x80 > > > [ 13.768875][ T1] softirqs last enabled at (2589370): > > > [] ____do_softirq+0x10/0x20 > > > [ 13.769529][ T1] softirqs last disabled at (2589349): > > > [] ____do_softirq+0x10/0x20 > > > > > > I think that the filemap_invalidate_lock_shared() call in > > > page_cache_ra_unbounded() forbids sleeping in submit_bio(). > > > > The wait_event() macro in __bio_queue_enter() calls might_sleep() at > > the very beginning, so why would it not complain? > > > > IIUC, this is the WARN_ONCE() in __might_sleep() about the task state > > being different from TASK_RUNNING, which triggers because > > prepare_to_wait_event() changes the task state to > > TASK_UNINTERRUPTIBLE. > > > > This means that calling pm_runtime_resume() cannot be part of the > > wait_event() condition, so blk_pm_resume_queue() and the wait_event() > > macros involving it would need some rewriting. >=20 > Interestingly enough, the pm_request_resume() call in > blk_pm_resume_queue() is not even necessary in the __bio_queue_enter() > case because pm is false there and it doesn't even check > q->rpm_status. >=20 > So in fact the resume is only necessary in blk_queue_enter() if pm is non= zero. If I'm not completely in the weeds, something like the patch below should be doable. Also, I'd consider using pm_runtime_get_noresume() and pm_runtime_put_noidl= e() in blk_queue_enter() and blk_queue_exit(), respectively, in the "pm !=3D 0"= case to prevent the device from suspending while the .q_usage_counter ref is hel= d. --- block/blk-core.c | 6 +++--- block/blk-pm.h | 7 ++++--- 2 files changed, 7 insertions(+), 6 deletions(-) --- a/block/blk-core.c +++ b/block/blk-core.c @@ -309,6 +309,8 @@ int blk_queue_enter(struct request_queue if (flags & BLK_MQ_REQ_NOWAIT) return -EAGAIN; =20 + /* if necessary, resume .dev (assume success). */ + blk_pm_resume_queue(pm, q); /* * read pair of barrier in blk_freeze_queue_start(), we need to * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and @@ -318,9 +320,7 @@ int blk_queue_enter(struct request_queue */ smp_rmb(); wait_event(q->mq_freeze_wq, - (!q->mq_freeze_depth && - blk_pm_resume_queue(pm, q)) || - blk_queue_dying(q)); + !q->mq_freeze_depth || blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; } --- a/block/blk-pm.h +++ b/block/blk-pm.h @@ -10,9 +10,10 @@ static inline int blk_pm_resume_queue(co { if (!q->dev || !blk_queue_pm_only(q)) return 1; /* Nothing to do */ - if (pm && q->rpm_status !=3D RPM_SUSPENDED) - return 1; /* Request allowed */ - pm_request_resume(q->dev); + + if (pm) + pm_runtime_resume(q->dev); + return 0; } =20