From nobody Sun Feb 8 12:31:17 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7F3AEB64DA for ; Wed, 14 Jun 2023 06:19:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243087AbjFNGT1 (ORCPT ); Wed, 14 Jun 2023 02:19:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38136 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242990AbjFNGTJ (ORCPT ); Wed, 14 Jun 2023 02:19:09 -0400 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F2BF1BCA; Tue, 13 Jun 2023 23:19:08 -0700 (PDT) Received: from dggpemm500014.china.huawei.com (unknown [172.30.72.55]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4QgwCg6tnmzLqNk; Wed, 14 Jun 2023 14:15:59 +0800 (CST) Received: from localhost.localdomain (10.175.112.125) by dggpemm500014.china.huawei.com (7.185.36.153) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 14 Jun 2023 14:19:04 +0800 From: Wupeng Ma To: , CC: , , , , Wei Yang , "Michael S. Tsirkin" , Jason Wang , Pankaj Gupta , Michal Hocko , Oscar Salvador Subject: [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block Date: Wed, 14 Jun 2023 14:19:00 +0800 Message-ID: <20230614061900.3296725-2-mawupeng1@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230614061900.3296725-1-mawupeng1@huawei.com> References: <20230614061900.3296725-1-mawupeng1@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.175.112.125] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To dggpemm500014.china.huawei.com (7.185.36.153) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: David Hildenbrand virtio-mem soon wants to use offline_and_remove_memory() memory that exceeds a single Linux memory block (memory_block_size_bytes()). Let's remove that restriction. Let's remember the old state and try to restore that if anything goes wrong. While re-onlining can, in general, fail, it's highly unlikely to happen (usually only when a notifier fails to allocate memory, and these are rather rare). This will be used by virtio-mem to offline+remove memory ranges that are bigger than a single memory block - for example, with a device block size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory block size of 128MB. While we could compress the state into 2 bit, using 8 bit is much easier. This handling is similar, but different to acpi_scan_try_to_offline(): a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG optimization is still relevant - it should only apply to ZONE_NORMAL (where we have no guarantees). If relevant, we can always add it. b) acpi_scan_try_to_offline() simply onlines all memory in case something goes wrong. It doesn't restore previous online type. Let's do that, so we won't overwrite what e.g., user space configured. Reviewed-by: Wei Yang Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Pankaj Gupta Cc: Michal Hocko Cc: Oscar Salvador Cc: Wei Yang Cc: Andrew Morton Signed-off-by: David Hildenbrand Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com Signed-off-by: Michael S. Tsirkin Acked-by: Andrew Morton --- mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 89 insertions(+), 16 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f0633f9a9116..9ec9e1e67705 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1788,39 +1788,112 @@ int remove_memory(int nid, u64 start, u64 size) } EXPORT_SYMBOL_GPL(remove_memory); =20 +static int try_offline_memory_block(struct memory_block *mem, void *arg) +{ + uint8_t online_type =3D MMOP_ONLINE_KERNEL; + uint8_t **online_types =3D arg; + struct page *page; + int rc; + + /* + * Sense the online_type via the zone of the memory block. Offlining + * with multiple zones within one memory block will be rejected + * by offlining code ... so we don't care about that. + */ + page =3D pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr)); + if (page && zone_idx(page_zone(page)) =3D=3D ZONE_MOVABLE) + online_type =3D MMOP_ONLINE_MOVABLE; + + rc =3D device_offline(&mem->dev); + /* + * Default is MMOP_OFFLINE - change it only if offlining succeeded, + * so try_reonline_memory_block() can do the right thing. + */ + if (!rc) + **online_types =3D online_type; + + (*online_types)++; + /* Ignore if already offline. */ + return rc < 0 ? rc : 0; +} + +static int try_reonline_memory_block(struct memory_block *mem, void *arg) +{ + uint8_t **online_types =3D arg; + int rc; + + if (**online_types !=3D MMOP_OFFLINE) { + mem->online_type =3D **online_types; + rc =3D device_online(&mem->dev); + if (rc < 0) + pr_warn("%s: Failed to re-online memory: %d", + __func__, rc); + } + + /* Continue processing all remaining memory blocks. */ + (*online_types)++; + return 0; +} + /* - * Try to offline and remove a memory block. Might take a long time to - * finish in case memory is still in use. Primarily useful for memory devi= ces - * that logically unplugged all memory (so it's no longer in use) and want= to - * offline + remove the memory block. + * Try to offline and remove memory. Might take a long time to finish in c= ase + * memory is still in use. Primarily useful for memory devices that logica= lly + * unplugged all memory (so it's no longer in use) and want to offline + r= emove + * that memory. */ int offline_and_remove_memory(int nid, u64 start, u64 size) { - struct memory_block *mem; - int rc =3D -EINVAL; + const unsigned long mb_count =3D size / memory_block_size_bytes(); + uint8_t *online_types, *tmp; + int rc; =20 if (!IS_ALIGNED(start, memory_block_size_bytes()) || - size !=3D memory_block_size_bytes()) - return rc; + !IS_ALIGNED(size, memory_block_size_bytes()) || !size) + return -EINVAL; + + /* + * We'll remember the old online type of each memory block, so we can + * try to revert whatever we did when offlining one memory block fails + * after offlining some others succeeded. + */ + online_types =3D kmalloc_array(mb_count, sizeof(*online_types), + GFP_KERNEL); + if (!online_types) + return -ENOMEM; + /* + * Initialize all states to MMOP_OFFLINE, so when we abort processing in + * try_offline_memory_block(), we'll skip all unprocessed blocks in + * try_reonline_memory_block(). + */ + memset(online_types, MMOP_OFFLINE, mb_count); =20 lock_device_hotplug(); - mem =3D find_memory_block(__pfn_to_section(PFN_DOWN(start))); - if (mem) - rc =3D device_offline(&mem->dev); - /* Ignore if the device is already offline. */ - if (rc > 0) - rc =3D 0; + + tmp =3D online_types; + rc =3D walk_memory_blocks(start, size, &tmp, try_offline_memory_block); =20 /* - * In case we succeeded to offline the memory block, remove it. + * In case we succeeded to offline all memory, remove it. * This cannot fail as it cannot get onlined in the meantime. */ if (!rc) { rc =3D try_remove_memory(nid, start, size); - WARN_ON_ONCE(rc); + if (rc) + pr_err("%s: Failed to remove memory: %d", __func__, rc); + } + + /* + * Rollback what we did. While memory onlining might theoretically fail + * (nacked by a notifier), it barely ever happens. + */ + if (rc) { + tmp =3D online_types; + walk_memory_blocks(start, size, &tmp, + try_reonline_memory_block); } unlock_device_hotplug(); =20 + kfree(online_types); return rc; } EXPORT_SYMBOL_GPL(offline_and_remove_memory); --=20 2.25.1