From: Yu Kuai
To: song@kernel.org, xni@redhat.com
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v4 04/19] md: add new helpers to suspend/resume array
Date: Tue, 10 Oct 2023 23:19:43 +0800
Message-Id: <20231010151958.145896-5-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231010151958.145896-1-yukuai1@huaweicloud.com>
References: <20231010151958.145896-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Advantages of the new APIs:
 - reconfig_mutex is not required;
 - the weird logic in which suspending the array holds 'reconfig_mutex'
   for mddev_check_recovery() to update the superblock is not needed;
 - the special handling, 'pers->prepare_suspend', for raid456 is not
   needed;
 - it is safe to call them at any time once mddev is allocated, and they
   are designed to be used from the slow path where the array
   configuration is changed;
 - the new helpers are designed to be called before mddev_lock(), hence
   they also support being interrupted by the user.

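As a rough illustration of the nesting behaviour of the new helpers, the
following userspace sketch models only the counter handling. It is not kernel
code and not part of this patch: 'struct model_dev', model_init(),
model_suspend() and model_resume() are hypothetical names, a pthread mutex
stands in for 'suspend_mutex', and a boolean stands in for the killed
'active_io' reference. Waiting for in-flight io and the interruptible path are
left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mddev fields touched here. */
struct model_dev {
	pthread_mutex_t suspend_mutex;	/* models mddev->suspend_mutex */
	int suspended;			/* nesting depth, like mddev->suspended */
	bool io_frozen;			/* models a killed 'active_io' ref */
	int freeze_count;		/* how often io was actually frozen */
};

static void model_init(struct model_dev *d)
{
	pthread_mutex_init(&d->suspend_mutex, NULL);
	d->suspended = 0;
	d->io_frozen = false;
	d->freeze_count = 0;
}

/* Only the first (outermost) suspend freezes io; nested calls just count. */
static int model_suspend(struct model_dev *d)
{
	pthread_mutex_lock(&d->suspend_mutex);
	if (d->suspended == 0) {
		d->io_frozen = true;
		d->freeze_count++;
	}
	d->suspended++;
	pthread_mutex_unlock(&d->suspend_mutex);
	return 0;
}

/* Only the last (outermost) resume lets io flow again. */
static void model_resume(struct model_dev *d)
{
	pthread_mutex_lock(&d->suspend_mutex);
	assert(d->suspended > 0);	/* resume must pair with a suspend */
	d->suspended--;
	if (d->suspended == 0)
		d->io_frozen = false;
	pthread_mutex_unlock(&d->suspend_mutex);
}
```

In this model, two nested model_suspend() calls freeze io once, and io stays
frozen until the second model_resume() returns.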
Signed-off-by: Yu Kuai
---
 drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/md.h |   3 ++
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index cf929c42cdbe..201de29d913c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
 			lockdep_is_held(&mddev->reconfig_mutex));
 
 	WARN_ON_ONCE(thread && current == thread->tsk);
-	if (mddev->suspended++)
+
+	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
+	mutex_lock(&mddev->suspend_mutex);
+	if (mddev->suspended++) {
+		mutex_unlock(&mddev->suspend_mutex);
 		return;
+	}
+
 	wake_up(&mddev->sb_wait);
 	set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 	percpu_ref_kill(&mddev->active_io);
 
+	/*
+	 * TODO: cleanup 'pers->prepare_suspend after all callers are replaced
+	 * by __mddev_suspend().
+	 */
 	if (mddev->pers && mddev->pers->prepare_suspend)
 		mddev->pers->prepare_suspend(mddev);
 
@@ -459,14 +469,21 @@ void mddev_suspend(struct mddev *mddev)
 	del_timer_sync(&mddev->safemode_timer);
 	/* restrict memory reclaim I/O during raid array is suspend */
 	mddev->noio_flag = memalloc_noio_save();
+
+	mutex_unlock(&mddev->suspend_mutex);
 }
 EXPORT_SYMBOL_GPL(mddev_suspend);
 
 void mddev_resume(struct mddev *mddev)
 {
 	lockdep_assert_held(&mddev->reconfig_mutex);
-	if (--mddev->suspended)
+
+	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
+	mutex_lock(&mddev->suspend_mutex);
+	if (--mddev->suspended) {
+		mutex_unlock(&mddev->suspend_mutex);
 		return;
+	}
 
 	/* entred the memalloc scope from mddev_suspend() */
 	memalloc_noio_restore(mddev->noio_flag);
@@ -477,9 +494,89 @@ void mddev_resume(struct mddev *mddev)
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+	mutex_unlock(&mddev->suspend_mutex);
 }
 EXPORT_SYMBOL_GPL(mddev_resume);
 
+int __mddev_suspend(struct mddev *mddev, bool interruptible)
+{
+	int err = 0;
+
+	/*
+	 * hold reconfig_mutex to wait for normal io will deadlock, because
+	 * other context can't update super_block, and normal io can rely on
+	 * updating super_block.
+	 */
+	lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+	if (interruptible)
+		err = mutex_lock_interruptible(&mddev->suspend_mutex);
+	else
+		mutex_lock(&mddev->suspend_mutex);
+	if (err)
+		return err;
+
+	if (mddev->suspended) {
+		WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+		mutex_unlock(&mddev->suspend_mutex);
+		return 0;
+	}
+
+	percpu_ref_kill(&mddev->active_io);
+	if (interruptible)
+		err = wait_event_interruptible(mddev->sb_wait,
+				percpu_ref_is_zero(&mddev->active_io));
+	else
+		wait_event(mddev->sb_wait,
+				percpu_ref_is_zero(&mddev->active_io));
+	if (err) {
+		percpu_ref_resurrect(&mddev->active_io);
+		mutex_unlock(&mddev->suspend_mutex);
+		return err;
+	}
+
+	/*
+	 * For raid456, io might be waiting for reshape to make progress,
+	 * allow new reshape to start while waiting for io to be done to
+	 * prevent deadlock.
+	 */
+	WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+
+	del_timer_sync(&mddev->safemode_timer);
+	/* restrict memory reclaim I/O during raid array is suspend */
+	mddev->noio_flag = memalloc_noio_save();
+
+	mutex_unlock(&mddev->suspend_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__mddev_suspend);
+
+void __mddev_resume(struct mddev *mddev)
+{
+	lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+	mutex_lock(&mddev->suspend_mutex);
+	WRITE_ONCE(mddev->suspended, mddev->suspended - 1);
+	if (mddev->suspended) {
+		mutex_unlock(&mddev->suspend_mutex);
+		return;
+	}
+
+	/* entred the memalloc scope from __mddev_suspend() */
+	memalloc_noio_restore(mddev->noio_flag);
+
+	percpu_ref_resurrect(&mddev->active_io);
+	wake_up(&mddev->sb_wait);
+
+	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	md_wakeup_thread(mddev->thread);
+	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+	mutex_unlock(&mddev->suspend_mutex);
+}
+EXPORT_SYMBOL_GPL(__mddev_resume);
+
 /*
  * Generic flush handling for md
  */
@@ -672,6 +769,7 @@ int mddev_init(struct mddev *mddev)
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
 	mutex_init(&mddev->sync_mutex);
+	mutex_init(&mddev->suspend_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
 	INIT_LIST_HEAD(&mddev->disks);
 	INIT_LIST_HEAD(&mddev->all_mddevs);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b628c292506e..b5894dc64615 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -316,6 +316,7 @@ struct mddev {
 	unsigned long			sb_flags;
 
 	int				suspended;
+	struct mutex			suspend_mutex;
 	struct percpu_ref		active_io;
 	int				ro;
 	int				sysfs_active;	/* set when sysfs deletes
@@ -811,6 +812,8 @@ extern void md_rdev_clear(struct md_rdev *rdev);
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
 extern void mddev_suspend(struct mddev *mddev);
 extern void mddev_resume(struct mddev *mddev);
+extern int __mddev_suspend(struct mddev *mddev, bool interruptible);
+extern void __mddev_resume(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
-- 
2.39.2