From nobody Sat Feb 7 09:58:45 2026
From: Tianyou Li
To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang, Michal Hocko
Cc: linux-mm@kvack.org, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel@vger.kernel.org
Subject: [PATCH v8 1/3] mm/memory hotplug: Fix zone->contiguous always false when hotplug
Date: Tue, 20 Jan 2026 22:33:44 +0800
Message-ID: <20260120143346.1427837-2-tianyou.li@intel.com>
In-Reply-To: <20260120143346.1427837-1-tianyou.li@intel.com>
References: <20260120143346.1427837-1-tianyou.li@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Yuan Liu

set_zone_contiguous() uses __pageblock_pfn_to_page() to detect
pageblocks that either do not exist (hole) or that do not belong to the
same zone. __pageblock_pfn_to_page(), however, relies on
pfn_to_online_page(), effectively always returning NULL for memory
ranges that were not onlined yet. So when called on a
range-to-be-onlined, it indicates a memory hole to
set_zone_contiguous().

Consequently, the set_zone_contiguous() call in
move_pfn_range_to_zone(), which happens early during memory onlining,
will never detect a zone as being contiguous. Bad.

To fix the issue, move the set_zone_contiguous() call to a later stage
in memory onlining, where pfn_to_online_page() will succeed: after we
mark the memory sections to be online.

Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Cc: Michal Hocko
Reviewed-by: Nanhai Zou
Signed-off-by: Yuan Liu
Signed-off-by: Tianyou Li
---
 mm/memory_hotplug.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a63ec679d861..c8f492b5daf0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -782,8 +782,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
 			  MEMINIT_HOTPLUG, altmap, migratetype,
 			  isolate_pageblock);
-
-	set_zone_contiguous(zone);
 }
 
 struct auto_movable_stats {
@@ -1205,6 +1203,13 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	}
 
 	online_pages_range(pfn, nr_pages);
+
+	/*
+	 * Now that the ranges are indicated as online, check whether the whole
+	 * zone is contiguous.
+	 */
+	set_zone_contiguous(zone);
+
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
 	if (node_arg.nid >= 0)
-- 
2.47.1

From nobody Sat Feb 7 09:58:45 2026
From: Tianyou Li
To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang, Michal Hocko
Cc: linux-mm@kvack.org, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel@vger.kernel.org
Subject: [PATCH v8 2/3] mm/memory hotplug/unplug: Add online_memory_block_pages() and offline_memory_block_pages()
Date: Tue, 20 Jan 2026 22:33:45 +0800
Message-ID: <20260120143346.1427837-3-tianyou.li@intel.com>
In-Reply-To: <20260120143346.1427837-1-tianyou.li@intel.com>
References: <20260120143346.1427837-1-tianyou.li@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Encapsulate the
mhp_init_memmap_on_memory() and online_pages() calls into
online_memory_block_pages(). This allows a later patch to let
set_zone_contiguous() check the whole memory block range at once,
instead of checking zone contiguity separately for each sub-range.
Correspondingly, encapsulate the mhp_deinit_memmap_on_memory() and
offline_pages() calls into offline_memory_block_pages().

Tested-by: Yuan Liu
Reviewed-by: Yuan Liu
Signed-off-by: Tianyou Li
---
 drivers/base/memory.c          | 53 ++++++---------------
 include/linux/memory_hotplug.h | 18 +++++-----
 mm/memory_hotplug.c            | 65 +++++++++++++++++++++++++++++++---
 3 files changed, 80 insertions(+), 56 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 751f248ca4a8..ea4d6fbf34fd 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,31 +246,12 @@ static int memory_block_online(struct memory_block *mem)
 		nr_vmemmap_pages = mem->altmap->free;
 
 	mem_hotplug_begin();
-	if (nr_vmemmap_pages) {
-		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
-		if (ret)
-			goto out;
-	}
-
-	ret = online_pages(start_pfn + nr_vmemmap_pages,
-			   nr_pages - nr_vmemmap_pages, zone, mem->group);
-	if (ret) {
-		if (nr_vmemmap_pages)
-			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
-		goto out;
-	}
-
-	/*
-	 * Account once onlining succeeded. If the zone was unpopulated, it is
-	 * now already properly populated.
-	 */
-	if (nr_vmemmap_pages)
-		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
-					  nr_vmemmap_pages);
-
-	mem->zone = zone;
-out:
+	ret = online_memory_block_pages(start_pfn, nr_pages, nr_vmemmap_pages,
+					zone, mem->group);
+	if (!ret)
+		mem->zone = zone;
 	mem_hotplug_done();
+
 	return ret;
 }
 
@@ -295,26 +276,12 @@ static int memory_block_offline(struct memory_block *mem)
 		nr_vmemmap_pages = mem->altmap->free;
 
 	mem_hotplug_begin();
-	if (nr_vmemmap_pages)
-		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
-					  -nr_vmemmap_pages);
-
-	ret = offline_pages(start_pfn + nr_vmemmap_pages,
-			    nr_pages - nr_vmemmap_pages, mem->zone, mem->group);
-	if (ret) {
-		/* offline_pages() failed. Account back. */
-		if (nr_vmemmap_pages)
-			adjust_present_page_count(pfn_to_page(start_pfn),
-						  mem->group, nr_vmemmap_pages);
-		goto out;
-	}
-
-	if (nr_vmemmap_pages)
-		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
-
-	mem->zone = NULL;
-out:
+	ret = offline_memory_block_pages(start_pfn, nr_pages, nr_vmemmap_pages,
+					 mem->zone, mem->group);
+	if (!ret)
+		mem->zone = NULL;
 	mem_hotplug_done();
+
 	return ret;
 }
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f2f16cdd73ee..1f8d5edd820d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -106,11 +106,9 @@ extern void adjust_present_page_count(struct page *page,
 				      struct memory_group *group,
 				      long nr_pages);
 /* VM interface that may be used by firmware interface */
-extern int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
-				     struct zone *zone);
-extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages);
-extern int online_pages(unsigned long pfn, unsigned long nr_pages,
-			struct zone *zone, struct memory_group *group);
+extern int online_memory_block_pages(unsigned long start_pfn,
+		unsigned long nr_pages, unsigned long nr_vmemmap_pages,
+		struct zone *zone, struct memory_group *group);
 extern unsigned long __offline_isolated_pages(unsigned long start_pfn,
 					      unsigned long end_pfn);
 
@@ -261,8 +259,9 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
 extern void try_offline_node(int nid);
-extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
-			 struct zone *zone, struct memory_group *group);
+extern int offline_memory_block_pages(unsigned long start_pfn,
+		unsigned long nr_pages, unsigned long nr_vmemmap_pages,
+		struct zone *zone, struct memory_group *group);
 extern int remove_memory(u64 start, u64 size);
 extern void __remove_memory(u64 start, u64 size);
 extern int offline_and_remove_memory(u64 start, u64 size);
@@ -270,8 +269,9 @@ extern int offline_and_remove_memory(u64 start, u64 size);
 #else
 static inline void try_offline_node(int nid) {}
 
-static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
-				struct zone *zone, struct memory_group *group)
+static inline int offline_memory_block_pages(unsigned long start_pfn,
+		unsigned long nr_pages, unsigned long nr_vmemmap_pages,
+		struct zone *zone, struct memory_group *group)
 {
 	return -EINVAL;
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c8f492b5daf0..8793a83702c5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1085,7 +1085,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
 		group->present_kernel_pages += nr_pages;
 }
 
-int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
+static int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 			      struct zone *zone)
 {
 	unsigned long end_pfn = pfn + nr_pages;
@@ -1116,7 +1116,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 	return ret;
 }
 
-void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
+static void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
 {
 	unsigned long end_pfn = pfn + nr_pages;
 
@@ -1139,7 +1139,7 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
 /*
  * Must be called with mem_hotplug_lock in write mode.
  */
-int online_pages(unsigned long pfn, unsigned long nr_pages,
+static int online_pages(unsigned long pfn, unsigned long nr_pages,
 		 struct zone *zone, struct memory_group *group)
 {
 	struct memory_notify mem_arg = {
@@ -1254,6 +1254,37 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	return ret;
 }
 
+int online_memory_block_pages(unsigned long start_pfn,
+		unsigned long nr_pages, unsigned long nr_vmemmap_pages,
+		struct zone *zone, struct memory_group *group)
+{
+	int ret;
+
+	if (nr_vmemmap_pages) {
+		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
+		if (ret)
+			return ret;
+	}
+
+	ret = online_pages(start_pfn + nr_vmemmap_pages,
+			   nr_pages - nr_vmemmap_pages, zone, group);
+	if (ret) {
+		if (nr_vmemmap_pages)
+			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
+		return ret;
+	}
+
+	/*
+	 * Account once onlining succeeded. If the zone was unpopulated, it is
+	 * now already properly populated.
+	 */
+	if (nr_vmemmap_pages)
+		adjust_present_page_count(pfn_to_page(start_pfn), group,
+					  nr_vmemmap_pages);
+
+	return ret;
+}
+
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
 static pg_data_t *hotadd_init_pgdat(int nid)
 {
@@ -1896,7 +1927,7 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
 /*
  * Must be called with mem_hotplug_lock in write mode.
  */
-int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
+static int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		  struct zone *zone, struct memory_group *group)
 {
 	unsigned long pfn, managed_pages, system_ram_pages = 0;
@@ -2101,6 +2132,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	return ret;
 }
 
+int offline_memory_block_pages(unsigned long start_pfn,
+		unsigned long nr_pages, unsigned long nr_vmemmap_pages,
+		struct zone *zone, struct memory_group *group)
+{
+	int ret;
+
+	if (nr_vmemmap_pages)
+		adjust_present_page_count(pfn_to_page(start_pfn), group,
+					  -nr_vmemmap_pages);
+
+	ret = offline_pages(start_pfn + nr_vmemmap_pages,
+			    nr_pages - nr_vmemmap_pages, zone, group);
+	if (ret) {
+		/* offline_pages() failed. Account back. */
+		if (nr_vmemmap_pages)
+			adjust_present_page_count(pfn_to_page(start_pfn),
+						  group, nr_vmemmap_pages);
+		return ret;
+	}
+
+	if (nr_vmemmap_pages)
+		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
+
+	return ret;
+}
+
 static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 {
 	int *nid = arg;
-- 
2.47.1

From nobody Sat Feb 7 09:58:45 2026
From: Tianyou Li
To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang, Michal Hocko
Cc: linux-mm@kvack.org, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel@vger.kernel.org
Subject: [PATCH v8 3/3] mm/memory hotplug/unplug: Optimize zone->contiguous update when changing pfn range
Date: Tue, 20 Jan 2026 22:33:46 +0800
Message-ID: <20260120143346.1427837-4-tianyou.li@intel.com>
In-Reply-To: <20260120143346.1427837-1-tianyou.li@intel.com>
References: <20260120143346.1427837-1-tianyou.li@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is
invoked, zone->contiguous is recomputed by scanning the new zone's pfn
range from beginning to end, regardless of the zone's previous state.
When the zone's pfn range is large, the cost of this traversal can be
significant.

Add fast paths to quickly detect cases where the contiguous state can be
decided without scanning the new zone: if the new range does not overlap
the previous range, the zone cannot be contiguous; if the new range is
merely prepended or appended to the previous range, the previous state
can be reused; if the newly added pages cannot fill the holes of the
previous zone, the zone stays not contiguous.

The following test cases of memory hotplug for a VM [1], tested in the
environment [2], show that this optimization can significantly reduce
the memory hotplug time [3].
+---------------+------+---------------+--------------+----------------+
|               | Size | Time (before) | Time (after) | Time Reduction |
|               +------+---------------+--------------+----------------+
| Plug Memory   | 256G | 10s           | 2s           | 80%            |
|               +------+---------------+--------------+----------------+
|               | 512G | 33s           | 6s           | 81%            |
+---------------+------+---------------+--------------+----------------+

+---------------+------+---------------+--------------+----------------+
|               | Size | Time (before) | Time (after) | Time Reduction |
|               +------+---------------+--------------+----------------+
| Unplug Memory | 256G | 10s           | 2s           | 80%            |
|               +------+---------------+--------------+----------------+
|               | 512G | 34s           | 6s           | 82%            |
+---------------+------+---------------+--------------+----------------+

[1] Qemu commands to hotplug 256G/512G memory for a VM:
    object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
    device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
    qom-set vmem1 requested-size 256G/512G (Plug Memory)
    qom-set vmem1 requested-size 0G (Unplug Memory)

[2] Hardware     : Intel Icelake server
    Guest Kernel : v6.18-rc2
    Qemu         : v9.0.0
    Launch VM    :
    qemu-system-x86_64 -accel kvm -cpu host \
      -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
      -drive file=./seed.img,format=raw,if=virtio \
      -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
      -m 2G,slots=10,maxmem=2052472M \
      -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
      -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
      -nographic -machine q35 \
      -nic user,hostfwd=tcp::3000-:22
    Guest kernel auto-onlines newly added memory blocks:
    echo online > /sys/devices/system/memory/auto_online_blocks

[3] The time from typing the QEMU commands in [1] to when the output of
    'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
    memory is recognized.
Reported-by: Nanhai Zou
Reported-by: Chen Zhang
Tested-by: Yuan Liu
Reviewed-by: Tim Chen
Reviewed-by: Qiuxu Zhuo
Reviewed-by: Yu C Chen
Reviewed-by: Pan Deng
Reviewed-by: Nanhai Zou
Reviewed-by: Yuan Liu
Signed-off-by: Tianyou Li
---
 mm/internal.h       |  8 ++++-
 mm/memory_hotplug.c | 86 +++++++++++++++++++++++++++++++++++++--------
 mm/mm_init.c        | 15 ++++++--
 3 files changed, 92 insertions(+), 17 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index e430da900430..828aed5c2fef 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
 }
 
-void set_zone_contiguous(struct zone *zone);
+enum zone_contig_state {
+	ZONE_CONTIG_YES,
+	ZONE_CONTIG_NO,
+	ZONE_CONTIG_MAYBE,
+};
+
+void set_zone_contiguous(struct zone *zone, enum zone_contig_state state);
 bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
 		unsigned long nr_pages);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8793a83702c5..7b8feaca0d63 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -544,6 +544,25 @@ static void update_pgdat_span(struct pglist_data *pgdat)
 	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
 }
 
+static enum zone_contig_state zone_contig_state_after_shrinking(struct zone *zone,
+		unsigned long start_pfn, unsigned long nr_pages)
+{
+	const unsigned long end_pfn = start_pfn + nr_pages;
+
+	/*
+	 * If we cut a hole into the zone span, then the zone is
+	 * certainly not contiguous.
+	 */
+	if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone))
+		return ZONE_CONTIG_NO;
+
+	/* Removing from the start/end of the zone will not change anything. */
+	if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone))
+		return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE;
+
+	return ZONE_CONTIG_MAYBE;
+}
+
 void remove_pfn_range_from_zone(struct zone *zone,
 				unsigned long start_pfn,
 				unsigned long nr_pages)
@@ -551,6 +570,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
 	const unsigned long end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long pfn, cur_nr_pages;
+	enum zone_contig_state new_contiguous_state;
 
 	/* Poison struct pages because they are now uninitialized again. */
 	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
@@ -571,12 +591,14 @@ void remove_pfn_range_from_zone(struct zone *zone,
 	if (zone_is_zone_device(zone))
 		return;
 
+	new_contiguous_state = zone_contig_state_after_shrinking(zone, start_pfn,
+								 nr_pages);
 	clear_zone_contiguous(zone);
 
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
 
-	set_zone_contiguous(zone);
+	set_zone_contiguous(zone, new_contiguous_state);
 }
 
 /**
@@ -736,6 +758,32 @@ static inline void section_taint_zone_device(unsigned long pfn)
 }
 #endif
 
+static enum zone_contig_state zone_contig_state_after_growing(struct zone *zone,
+		unsigned long start_pfn, unsigned long nr_pages)
+{
+	const unsigned long end_pfn = start_pfn + nr_pages;
+
+	if (zone_is_empty(zone))
+		return ZONE_CONTIG_YES;
+
+	/*
+	 * If the moved pfn range does not intersect with the original zone span
+	 * the zone is surely not contiguous.
+	 */
+	if (end_pfn < zone->zone_start_pfn || start_pfn > zone_end_pfn(zone))
+		return ZONE_CONTIG_NO;
+
+	/* Adding to the start/end of the zone will not change anything. */
+	if (end_pfn == zone->zone_start_pfn || start_pfn == zone_end_pfn(zone))
+		return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO;
+
+	/* If we cannot fill the hole, the zone stays not contiguous. */
+	if (nr_pages < (zone->spanned_pages - zone->present_pages))
+		return ZONE_CONTIG_NO;
+
+	return ZONE_CONTIG_MAYBE;
+}
+
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
  * and resizing the pgdat/zone data to span the added pages. After this
@@ -1165,7 +1213,6 @@ static int online_pages(unsigned long pfn, unsigned long nr_pages,
 		     !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
 		return -EINVAL;
 
-	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_MOVABLE,
 			       true);
@@ -1203,13 +1250,6 @@ static int online_pages(unsigned long pfn, unsigned long nr_pages,
 	}
 
 	online_pages_range(pfn, nr_pages);
-
-	/*
-	 * Now that the ranges are indicated as online, check whether the whole
-	 * zone is contiguous.
-	 */
-	set_zone_contiguous(zone);
-
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
 	if (node_arg.nid >= 0)
@@ -1254,16 +1294,25 @@ static int online_pages(unsigned long pfn, unsigned long nr_pages,
 	return ret;
 }
 
-int online_memory_block_pages(unsigned long start_pfn,
-		unsigned long nr_pages, unsigned long nr_vmemmap_pages,
-		struct zone *zone, struct memory_group *group)
+int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_pages,
+		unsigned long nr_vmemmap_pages, struct zone *zone,
+		struct memory_group *group)
 {
+	const bool contiguous = zone->contiguous;
+	enum zone_contig_state new_contiguous_state;
 	int ret;
 
+	/*
+	 * Calculate the new zone contig state before move_pfn_range_to_zone()
+	 * sets the zone temporarily to non-contiguous.
+	 */
+	new_contiguous_state = zone_contig_state_after_growing(zone, start_pfn,
+							       nr_pages);
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
-			return ret;
+			goto restore_zone_contig;
 	}
 
 	ret = online_pages(start_pfn + nr_vmemmap_pages,
@@ -1271,7 +1320,7 @@ int online_memory_block_pages(unsigned long start_pfn,
 	if (ret) {
 		if (nr_vmemmap_pages)
 			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
-		return ret;
+		goto restore_zone_contig;
 	}
 
 	/*
@@ -1282,6 +1331,15 @@ int online_memory_block_pages(unsigned long start_pfn,
 		adjust_present_page_count(pfn_to_page(start_pfn), group,
 					  nr_vmemmap_pages);
 
+	/*
+	 * Now that the ranges are indicated as online, check whether the whole
+	 * zone is contiguous.
+	 */
+	set_zone_contiguous(zone, new_contiguous_state);
+	return 0;
+
+restore_zone_contig:
+	zone->contiguous = contiguous;
 	return ret;
 }
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index fc2a6f1e518f..5ed3fbd5c643 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2263,11 +2263,22 @@ void __init init_cma_pageblock(struct page *page)
 }
 #endif
 
-void set_zone_contiguous(struct zone *zone)
+void set_zone_contiguous(struct zone *zone, enum zone_contig_state state)
 {
 	unsigned long block_start_pfn = zone->zone_start_pfn;
 	unsigned long block_end_pfn;
 
+	/* We expect an earlier call to clear_zone_contiguous(). */
+	VM_WARN_ON_ONCE(zone->contiguous);
+
+	if (state == ZONE_CONTIG_YES) {
+		zone->contiguous = true;
+		return;
+	}
+
+	if (state == ZONE_CONTIG_NO)
+		return;
+
 	block_end_pfn = pageblock_end_pfn(block_start_pfn);
 	for (; block_start_pfn < zone_end_pfn(zone);
 	     block_start_pfn = block_end_pfn,
@@ -2348,7 +2359,7 @@ void __init page_alloc_init_late(void)
 		shuffle_free_memory(NODE_DATA(nid));
 
 	for_each_populated_zone(zone)
-		set_zone_contiguous(zone);
+		set_zone_contiguous(zone, ZONE_CONTIG_MAYBE);
 
 	/* Initialize page ext after all struct pages are initialized. */
 	if (deferred_struct_pages)
-- 
2.47.1