From: Christian Löhle
To: Adrian Hunter, ulf.hansson@linaro.org, Linux MMC List, linux-kernel@vger.kernel.org
CC: Avri Altman
Subject: RE: [PATCHv2] mmc: core: fix race of queue reset and card removal
Date: Thu, 6 Oct 2022 14:00:56 +0000
Message-ID: <12e0a733701f419dbcbed01f0902da51@hyperstone.com>
References: <1a5810475d7a475db5e4e5130b8f455c@hyperstone.com>

Thanks Adrian for the comments and hints, implemented and submitted.
They also fix both issues.

-----Original Message-----
From: Adrian Hunter
Sent: Mittwoch, 5.
Oktober 2022 15:42
To: Christian Löhle; ulf.hansson@linaro.org; Linux MMC List; linux-kernel@vger.kernel.org
Cc: Avri Altman
Subject: Re: [PATCHv2] mmc: core: fix race of queue reset and card removal

On 4/10/22 16:13, Christian Löhle wrote:
> If a recovery is active and the card is removed do not try to switch
> back partitions. Furthermore do not reference
> mq->card which might be NULLed in the meantime.
>
> This has been observed with recovery active with CQE.
> [ 1083.510578] Unable to handle kernel NULL pointer dereference at virtual address 000000000000038c
> [ 1083.511362] Mem abort info:
> [ 1083.511626] ESR = 0x96000004
> [ 1083.511912] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 1083.512395] SET = 0, FnV = 0
> [ 1083.512681] EA = 0, S1PTW = 0
> [ 1083.512973] FSC = 0x04: level 0 translation fault
> [ 1083.513417] Data abort info:
> [ 1083.513686] ISV = 0, ISS = 0x00000004
> [ 1083.514039] CM = 0, WnR = 0
> [ 1083.514318] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000a4c3000
> [ 1083.514899] [000000000000038c] pgd=0000000000000000, p4d=0000000000000000
> [ 1083.515854] Internal error: Oops: 96000004 [#1] SMP
> [ 1083.516295] CPU: 0 PID: 153 Comm: kworker/0:2 Tainted: G W 5.18.12-g925ff1d10c99-dirty #7
> [ 1083.517127] Hardware name: Pine64 RockPro64 v2.1 (DT)
> [ 1083.517574] Workqueue: events mmc_mq_recovery_handler
> [ 1083.518032] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 1083.518645] pc : mmc_blk_reset+0x60/0x1ac
> [ 1083.519004] lr : mmc_blk_reset+0x38/0x1ac
> [ 1083.519361] sp : ffff8000100b3cd0
> [ 1083.519654] x29: ffff8000100b3cd0 x28: 0000000000000000 x27: 0000000000000000
> [ 1083.520288] x26: ffff80000b0ba000 x25: ffff0000f6e74805 x24: ffff000004c2fdc0
> [ 1083.520922] x23: ffff000014950000 x22: ffff000004c2fc18 x21: ffff00000a33c000
> [ 1083.521556] x20: 00000000ffffff85 x19: ffff000004c2fc00 x18: ffffffffffffffff
> [ 1083.522189] x17: ffff80000cd9b200 x16: ffff80000cd9b190 x15: 0000000000000006
> [ 1083.522823] x14: 0000000000000000 x13: ffff80000b0c28f0 x12: 0000000000001707
> [ 1083.523457] x11: 00000000000007ad x10: ffff80000c6c28f0 x9 : ffff80000b0c28f0
> [ 1083.524090] x8 : 00000000fffbffff x7 : 0000000000000001 x6 : 0000000000000000
> [ 1083.524723] x5 : 0000000000000000 x4 : ffff0000f6e62d30 x3 : 0000000000000000
> [ 1083.525357] x2 : 0000000000000000 x1 : ffff00000b6e0000 x0 : 0000000000000000
> [ 1083.525990] Call trace:
> [ 1083.526209] mmc_blk_reset+0x60/0x1ac
> [ 1083.526536] mmc_blk_cqe_recovery+0x8c/0xd0
> [ 1083.526908] mmc_mq_recovery_handler+0xc4/0xd0
> [ 1083.527303] process_one_work+0x23c/0x3fc
> [ 1083.527663] worker_thread+0x74/0x420
> [ 1083.527990] kthread+0xec/0xf0
> [ 1083.528264] ret_from_fork+0x10/0x20
> [ 1083.528587] Code: d50323bf d65f03c0 f94352a0 f9404000 (b9438c01)
> [ 1083.529126] ---[ end trace 0000000000000000 ]---
>
> [ 1431.677970] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 1431.678753] Mem abort info:
> [ 1431.679017] ESR = 0x96000004
> [ 1431.679303] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 1431.679786] SET = 0, FnV = 0
> [ 1431.680072] EA = 0, S1PTW = 0
> [ 1431.680366] FSC = 0x04: level 0 translation fault
> [ 1431.680810] Data abort info:
> [ 1431.681080] ISV = 0, ISS = 0x00000004
> [ 1431.681432] CM = 0, WnR = 0
> [ 1431.681712] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000bb98000
> [ 1431.682390] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> [ 1431.683393] Internal error: Oops: 96000004 [#1] SMP
> [ 1431.683841] CPU: 0 PID: 19948 Comm: kworker/0:2 Not tainted 5.18.12-gf65532578f32-dirty #16
> [ 1431.684576] Hardware name: Pine64 RockPro64 v2.1 (DT)
> [ 1431.685024] Workqueue: events mmc_mq_recovery_handler
> [ 1431.685487] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 1431.686100] pc : mmc_put_card+0x38/0x110
> [ 1431.686453] lr : mmc_mq_recovery_handler+0x98/0xd0
> [ 1431.686879] sp : ffff800015813cf0
> [ 1431.687173] x29: ffff800015813cf0 x28: 0000000000000000 x27: 0000000000000000
> [ 1431.687807] x26: ffff80000b0ba000 x25: ffff0000f6e74805 x24: ffff000013bd65c0
> [ 1431.688441] x23: ffff000013b96120 x22: ffff000013bd6418 x21: 0000000000000000
> [ 1431.689075] x20: ffff800008ed1c70 x19: ffff8000091767d8 x18: ffffffffffffffff
> [ 1431.689709] x17: 31335b1b6d375b1b x16: 6d305b1b47554245 x15: 0000000000000006
> [ 1431.690343] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [ 1431.690976] x11: ffff000013bd6570 x10: 0000000000000001 x9 : ffff80000ea69228
> [ 1431.691611] x8 : ffff80000df892c8 x7 : 0000000000000000 x6 : 0000000000000001
> [ 1431.692245] x5 : 0000000000000001 x4 : 0000000000000002 x3 : ffff80000e6feac8
> [ 1431.692879] x2 : 000000000000036e x1 : ffff800008ed1c70 x0 : 0000000000000000
> [ 1431.693513] Call trace:
> [ 1431.693732] mmc_put_card+0x38/0x110
> [ 1431.694055] mmc_mq_recovery_handler+0x98/0xd0
> [ 1431.694452] process_one_work+0x23c/0x3fc
> [ 1431.694812] worker_thread+0x74/0x420
> [ 1431.695139] kthread+0xec/0xf0
> [ 1431.695414] ret_from_fork+0x10/0x20
> [ 1431.695738] Code: f9001bf7 aa0103f6 aa0003f5 aa1403e1 (f9400017)
> [ 1431.696278] ---[ end trace 0000000000000000 ]---
>
> Signed-off-by: Christian Loehle

Thanks for finding these issues. A couple of comments below.
> ---
>  drivers/mmc/core/block.c | 4 ++--
>  drivers/mmc/core/queue.c | 5 +++--
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index ce89611a136e..0cd3a7065629 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -997,8 +997,8 @@ static int mmc_blk_reset(struct mmc_blk_data *md, struct mmc_host *host,
>
>  	md->reset_done |= type;
>  	err = mmc_hw_reset(host->card);
> -	/* Ensure we switch back to the correct partition */
> -	if (err) {
> +	/* Ensure we switch back to the correct partition on successful reset */
> +	if (!err) {

This isn't quite right. Originally, this was err != -EOPNOTSUPP so "always"
unless the reset was not attempted at all. When the -EOPNOTSUPP return value
went away, this should have become unconditional.

Also this change should be a separate patch, and have a fixes tag i.e.

Fixes: fefdd3c91e0a ("mmc: core: Drop superfluous validations in mmc_hw|sw_reset()")

>  		struct mmc_blk_data *main_md =
>  			dev_get_drvdata(&host->card->dev);
>  		int part_err;
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index fefaa901b50f..6931fa082ea7 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -137,9 +137,10 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
>  	struct mmc_queue *mq = container_of(work, struct mmc_queue,
>  					    recovery_work);
>  	struct request_queue *q = mq->queue;
> +	struct mmc_card *card = mq->card;
>  	struct mmc_host *host = mq->card->host;
>
> -	mmc_get_card(mq->card, &mq->ctx);
> +	mmc_get_card(card, &mq->ctx);
>
>  	mq->in_recovery = true;
>
> @@ -157,7 +158,7 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
>  	if (host->hsq_enabled)
>  		host->cqe_ops->cqe_recovery_finish(host);
>
> -	mmc_put_card(mq->card, &mq->ctx);
> +	mmc_put_card(card, &mq->ctx);
>
>  	blk_mq_run_hw_queues(q, true);
>  }

Please try this instead:

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 6931fa082ea7..d8d9115c51f6 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -494,6 +494,13 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
 	if (blk_queue_quiesced(q))
 		blk_mq_unquiesce_queue(q);
 
+	/*
+	 * If the recovery completes the last (and only remaining) request in
+	 * the queue, and the card has been removed, we could end up here with
+	 * the recovery not quite finished yet, so flush it.
+	 */
+	flush_work(&mq->recovery_work);
+
 	blk_mq_free_tag_set(&mq->tag_set);
 
 	/*

Hyperstone GmbH | Reichenaustr. 39a | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782
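
For anyone following the thread, here is a rough user-space analogy of why flushing the recovery work in mmc_cleanup_queue() helps. It is only a sketch under loud assumptions: it uses pthreads instead of the kernel workqueue API, pthread_join() stands in for flush_work(), and every demo_* name is hypothetical and does not exist in the kernel. The point it tries to show is that the cleanup path waits for the in-flight recovery handler before the card pointer goes away, so the handler never sees a NULL card.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct mmc_card and struct mmc_queue. */
struct demo_card {
	int users;                /* crude stand-in for get/put accounting */
};

struct demo_queue {
	struct demo_card *card;   /* analogous to mq->card */
	pthread_t recovery;       /* analogous to mq->recovery_work */
	bool recovery_started;
	int recoveries_done;
};

/* Plays the role of the recovery handler: it dereferences q->card. */
void *demo_recovery_handler(void *arg)
{
	struct demo_queue *q = arg;

	q->card->users++;         /* would crash if card were already NULL */
	q->recoveries_done++;
	q->card->users--;
	return NULL;
}

/* Queue the recovery work; returns 0 on success. */
int demo_start_recovery(struct demo_queue *q)
{
	int err = pthread_create(&q->recovery, NULL, demo_recovery_handler, q);

	if (!err)
		q->recovery_started = true;
	return err;
}

/* Cleanup path: wait for the handler first, then drop the card. */
void demo_cleanup_queue(struct demo_queue *q)
{
	if (q->recovery_started) {
		/* Analogy for flush_work(): block until the handler is done. */
		pthread_join(q->recovery, NULL);
		q->recovery_started = false;
	}
	/* After the join, no handler can still be using the card. */
	assert(q->card->users == 0);
	free(q->card);
	q->card = NULL;
}

/* Returns the number of completed recoveries, or -1 on setup failure. */
int demo_run(void)
{
	struct demo_queue q = { 0 };

	q.card = calloc(1, sizeof(*q.card));
	if (!q.card)
		return -1;
	if (demo_start_recovery(&q)) {
		free(q.card);
		return -1;
	}
	demo_cleanup_queue(&q);
	return q.card == NULL ? q.recoveries_done : -1;
}
```

Calling demo_run() from a small main() is expected to return 1 whenever allocation and thread creation succeed. Without the join in demo_cleanup_queue(), the handler could run after free(), which is roughly the shape of the NULL dereference in the traces above.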