[PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range

Tianyou Li posted 2 patches 1 week, 2 days ago
[PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Posted by Tianyou Li 1 week, 2 days ago
When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
it updates zone->contiguous by checking the new zone's pfn range from
beginning to end, regardless of the previous state of the zone. When the
zone's pfn range is large, the cost of traversing the pfn range to update
zone->contiguous can be significant.

Add fast paths to quickly detect cases where the zone's contiguous state
can be determined without scanning the new zone. The cases are: if the new
range does not overlap with the previous range, the zone cannot be
contiguous; if the new range is adjacent to the previous range, only the
new range needs to be checked; if the newly added pages cannot fill the
hole in the previous zone, the zone remains not contiguous.
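
As an illustration only (a simplified, self-contained model with made-up
names, not the kernel code itself), the growing-side decision boils down
to:

    #include <stdbool.h>
    #include <stdio.h>

    enum contig { CONTIG_YES, CONTIG_NO, CONTIG_MAYBE /* full scan */ };

    /*
     * Model of the growing-side fast path: the old zone spans
     * [zstart, zend) with 'spanned'/'present' page counts and the old
     * 'contiguous' flag; [start, start + nr) is the range being added.
     */
    static enum contig after_growing(unsigned long zstart, unsigned long zend,
                                     unsigned long spanned, unsigned long present,
                                     bool contiguous,
                                     unsigned long start, unsigned long nr)
    {
        unsigned long end = start + nr;

        if (spanned == 0)                   /* empty zone */
            return CONTIG_YES;
        if (end < zstart || start > zend)   /* disjoint: creates a hole */
            return CONTIG_NO;
        if (end == zstart || start == zend) /* adjacent: keep the old state */
            return contiguous ? CONTIG_YES : CONTIG_NO;
        if (nr < spanned - present)         /* cannot fill the old hole */
            return CONTIG_NO;
        return CONTIG_MAYBE;                /* only now scan the zone */
    }

    int main(void)
    {
        /* Zone [0x100000, 0x140000) with a hole; add 0x800 pages at its end. */
        printf("%d\n", after_growing(0x100000, 0x140000, 0x40000, 0x3f000,
                                     false, 0x140000, 0x800)); /* 1 == CONTIG_NO */
        return 0;
    }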

The following memory hotplug test cases for a VM [1], run in the
environment [2], show that this optimization significantly reduces the
memory hotplug time [3].

+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
| Plug Memory    | 256G |      10s      |      2s      |       80%      |
|                +------+---------------+--------------+----------------+
|                | 512G |      33s      |      6s      |       81%      |
+----------------+------+---------------+--------------+----------------+

+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
| Unplug Memory  | 256G |      10s      |      2s      |       80%      |
|                +------+---------------+--------------+----------------+
|                | 512G |      34s      |      6s      |       82%      |
+----------------+------+---------------+--------------+----------------+

[1] Qemu commands to hotplug 256G/512G memory for a VM:
    object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
    device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
    qom-set vmem1 requested-size 256G/512G (Plug Memory)
    qom-set vmem1 requested-size 0G (Unplug Memory)

[2] Hardware     : Intel Icelake server
    Guest Kernel : v6.18-rc2
    Qemu         : v9.0.0

    Launch VM    :
    qemu-system-x86_64 -accel kvm -cpu host \
    -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
    -drive file=./seed.img,format=raw,if=virtio \
    -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
    -m 2G,slots=10,maxmem=2052472M \
    -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
    -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
    -nographic -machine q35 \
    -nic user,hostfwd=tcp::3000-:22

    Guest kernel auto-onlines newly added memory blocks:
    echo online > /sys/devices/system/memory/auto_online_blocks

[3] The time from typing the QEMU commands in [1] to when the output of
    'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
    memory is recognized.

Reported-by: Nanhai Zou <nanhai.zou@intel.com>
Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
Tested-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
Reviewed-by: Pan Deng <pan.deng@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
---
 mm/internal.h       |  8 ++++-
 mm/memory_hotplug.c | 80 ++++++++++++++++++++++++++++++++++++++-------
 mm/mm_init.c        | 15 +++++++--
 3 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index e430da900430..828aed5c2fef 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
 }
 
-void set_zone_contiguous(struct zone *zone);
+enum zone_contig_state {
+	ZONE_CONTIG_YES,
+	ZONE_CONTIG_NO,
+	ZONE_CONTIG_MAYBE,
+};
+
+void set_zone_contiguous(struct zone *zone, enum zone_contig_state state);
 bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
 			   unsigned long nr_pages);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 62d6bc8ea2dd..e7a97c9c35be 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -544,6 +544,25 @@ static void update_pgdat_span(struct pglist_data *pgdat)
 	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
 }
 
+static enum zone_contig_state zone_contig_state_after_shrinking(struct zone *zone,
+				unsigned long start_pfn, unsigned long nr_pages)
+{
+	const unsigned long end_pfn = start_pfn + nr_pages;
+
+	/*
+	 * If we cut a hole into the zone span, then the zone is
+	 * certainly not contiguous.
+	 */
+	if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone))
+		return ZONE_CONTIG_NO;
+
+	/* Removing from the start/end of the zone will not change anything. */
+	if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone))
+		return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE;
+
+	return ZONE_CONTIG_MAYBE;
+}
+
 void remove_pfn_range_from_zone(struct zone *zone,
 				      unsigned long start_pfn,
 				      unsigned long nr_pages)
@@ -551,6 +570,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
 	const unsigned long end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long pfn, cur_nr_pages;
+	enum zone_contig_state new_contiguous_state;
 
 	/* Poison struct pages because they are now uninitialized again. */
 	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
@@ -571,12 +591,14 @@ void remove_pfn_range_from_zone(struct zone *zone,
 	if (zone_is_zone_device(zone))
 		return;
 
+	new_contiguous_state = zone_contig_state_after_shrinking(zone, start_pfn,
+								 nr_pages);
 	clear_zone_contiguous(zone);
 
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
 
-	set_zone_contiguous(zone);
+	set_zone_contiguous(zone, new_contiguous_state);
 }
 
 /**
@@ -736,6 +758,32 @@ static inline void section_taint_zone_device(unsigned long pfn)
 }
 #endif
 
+static enum zone_contig_state zone_contig_state_after_growing(struct zone *zone,
+				unsigned long start_pfn, unsigned long nr_pages)
+{
+	const unsigned long end_pfn = start_pfn + nr_pages;
+
+	if (zone_is_empty(zone))
+		return ZONE_CONTIG_YES;
+
+	/*
+	 * If the moved pfn range does not intersect with the original zone span
+	 * the zone is surely not contiguous.
+	 */
+	if (end_pfn < zone->zone_start_pfn || start_pfn > zone_end_pfn(zone))
+		return ZONE_CONTIG_NO;
+
+	/* Adding to the start/end of the zone will not change anything. */
+	if (end_pfn == zone->zone_start_pfn || start_pfn == zone_end_pfn(zone))
+		return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO;
+
+	/* If we cannot fill the hole, the zone stays not contiguous. */
+	if (nr_pages < (zone->spanned_pages - zone->present_pages))
+		return ZONE_CONTIG_NO;
+
+	return ZONE_CONTIG_MAYBE;
+}
+
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
  * and resizing the pgdat/zone data to span the added pages. After this
@@ -1165,7 +1213,6 @@ static int online_pages(unsigned long pfn, unsigned long nr_pages,
 			 !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
 		return -EINVAL;
 
-
 	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_MOVABLE,
 			       true);
@@ -1203,13 +1250,6 @@ static int online_pages(unsigned long pfn, unsigned long nr_pages,
 	}
 
 	online_pages_range(pfn, nr_pages);
-
-	/*
-	 * Now that the ranges are indicated as online, check whether the whole
-	 * zone is contiguous.
-	 */
-	set_zone_contiguous(zone);
-
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
 	if (node_arg.nid >= 0)
@@ -1258,12 +1298,21 @@ static int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_p
 			unsigned long nr_vmemmap_pages, struct zone *zone,
 			struct memory_group *group)
 {
+	const bool contiguous = zone->contiguous;
+	enum zone_contig_state new_contiguous_state;
 	int ret;
 
+	/*
+	 * Calculate the new zone contig state before move_pfn_range_to_zone()
+	 * sets the zone temporarily to non-contiguous.
+	 */
+	new_contiguous_state = zone_contig_state_after_growing(zone, start_pfn,
+							       nr_pages);
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
-			return ret;
+			goto restore_zone_contig;
 	}
 
 	ret = online_pages(start_pfn + nr_vmemmap_pages,
@@ -1271,7 +1320,7 @@ static int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_p
 	if (ret) {
 		if (nr_vmemmap_pages)
 			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
-		return ret;
+		goto restore_zone_contig;
 	}
 
 	/*
@@ -1282,6 +1331,15 @@ static int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_p
 		adjust_present_page_count(pfn_to_page(start_pfn), group,
 					  nr_vmemmap_pages);
 
+	/*
+	 * Now that the ranges are indicated as online, check whether the whole
+	 * zone is contiguous.
+	 */
+	set_zone_contiguous(zone, new_contiguous_state);
+	return 0;
+
+restore_zone_contig:
+	zone->contiguous = contiguous;
 	return ret;
 }
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index fc2a6f1e518f..5ed3fbd5c643 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2263,11 +2263,22 @@ void __init init_cma_pageblock(struct page *page)
 }
 #endif
 
-void set_zone_contiguous(struct zone *zone)
+void set_zone_contiguous(struct zone *zone, enum zone_contig_state state)
 {
 	unsigned long block_start_pfn = zone->zone_start_pfn;
 	unsigned long block_end_pfn;
 
+	/* We expect an earlier call to clear_zone_contiguous(). */
+	VM_WARN_ON_ONCE(zone->contiguous);
+
+	if (state == ZONE_CONTIG_YES) {
+		zone->contiguous = true;
+		return;
+	}
+
+	if (state == ZONE_CONTIG_NO)
+		return;
+
 	block_end_pfn = pageblock_end_pfn(block_start_pfn);
 	for (; block_start_pfn < zone_end_pfn(zone);
 			block_start_pfn = block_end_pfn,
@@ -2348,7 +2359,7 @@ void __init page_alloc_init_late(void)
 		shuffle_free_memory(NODE_DATA(nid));
 
 	for_each_populated_zone(zone)
-		set_zone_contiguous(zone);
+		set_zone_contiguous(zone, ZONE_CONTIG_MAYBE);
 
 	/* Initialize page ext after all struct pages are initialized. */
 	if (deferred_struct_pages)
-- 
2.47.1
Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Posted by David Hildenbrand (Arm) 1 day, 20 hours ago
On 1/30/26 17:37, Tianyou Li wrote:
> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
> it updates zone->contiguous by checking the new zone's pfn range from
> beginning to end, regardless of the previous state of the zone. When the
> zone's pfn range is large, the cost of traversing the pfn range to update
> zone->contiguous can be significant.
> 
> Add fast paths to quickly detect cases where the zone's contiguous state
> can be determined without scanning the new zone. The cases are: if the new
> range does not overlap with the previous range, the zone cannot be
> contiguous; if the new range is adjacent to the previous range, only the
> new range needs to be checked; if the newly added pages cannot fill the
> hole in the previous zone, the zone remains not contiguous.
> 
> The following memory hotplug test cases for a VM [1], run in the
> environment [2], show that this optimization significantly reduces the
> memory hotplug time [3].
> 
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Plug Memory    | 256G |      10s      |      2s      |       80%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      33s      |      6s      |       81%      |
> +----------------+------+---------------+--------------+----------------+
> 
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Unplug Memory  | 256G |      10s      |      2s      |       80%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      34s      |      6s      |       82%      |
> +----------------+------+---------------+--------------+----------------+
> 
> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>      object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>      device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>      qom-set vmem1 requested-size 256G/512G (Plug Memory)
>      qom-set vmem1 requested-size 0G (Unplug Memory)
> 
> [2] Hardware     : Intel Icelake server
>      Guest Kernel : v6.18-rc2
>      Qemu         : v9.0.0
> 
>      Launch VM    :
>      qemu-system-x86_64 -accel kvm -cpu host \
>      -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>      -drive file=./seed.img,format=raw,if=virtio \
>      -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>      -m 2G,slots=10,maxmem=2052472M \
>      -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>      -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>      -nographic -machine q35 \
>      -nic user,hostfwd=tcp::3000-:22
> 
>      Guest kernel auto-onlines newly added memory blocks:
>      echo online > /sys/devices/system/memory/auto_online_blocks
> 
> [3] The time from typing the QEMU commands in [1] to when the output of
>      'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
>      memory is recognized.
> 
> Reported-by: Nanhai Zou <nanhai.zou@intel.com>
> Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
> Tested-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
> Reviewed-by: Pan Deng <pan.deng@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> ---

Thanks for all your work on this and sorry for being slower with
review the last month.

While I was in the shower I was thinking about how much I hate
zone->contiguous + the pageblock walking, and how we could just get
rid of it.

You know, just what you do while having a relaxing shower.


And I was wondering:

(a) in which case would we have zone_spanned_pages == zone_present_pages
and the zone *not* being contiguous? I assume this just cannot happen,
otherwise BUG.

(b) in which case would we have zone_spanned_pages != zone_present_pages
and the zone *being* contiguous? I assume in some cases where we have small
holes within a pageblock?


Reading the doc of __pageblock_pfn_to_page(), there are some weird
scenarios with holes in pageblocks.



I.e., on my notebook I have

$ cat /proc/zoneinfo  | grep -E "Node|spanned|present"
Node 0, zone      DMA
         spanned  4095
         present  3999
Node 0, zone    DMA32
         spanned  1044480
         present  439600
Node 0, zone   Normal
         spanned  7798784
         present  7798784
Node 0, zone  Movable
         spanned  0
         present  0
Node 0, zone   Device
         spanned  0
         present  0


For the most important zone regarding compaction, ZONE_NORMAL, it would be good enough.

We certainly don't care about detecting contiguity for the DMA zone. For DMA32, I would suspect
that it is not detected as contiguous either way, because the holes are just way too large?


So we could maybe do (completely untested):


 From 69093e5811b532812fde52b55a42dcb24d6e09dd Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@kernel.org>
Date: Sat, 7 Feb 2026 11:45:21 +0100
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
  include/linux/mmzone.h | 25 +++++++++++++++++++++++--
  mm/internal.h          |  8 +-------
  mm/memory_hotplug.c    | 11 +----------
  mm/mm_init.c           | 25 -------------------------
  4 files changed, 25 insertions(+), 44 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fc5d6c88d2f0..7c80df343cfd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1051,8 +1051,6 @@ struct zone {
  	bool			compact_blockskip_flush;
  #endif
  
-	bool			contiguous;
-
  	CACHELINE_PADDING(_pad3_);
  	/* Zone statistics */
  	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
@@ -1124,6 +1122,29 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
  	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
  }
  
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
+ * zone span without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Returns: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+	/*
+	 * TODO: do we care about weird races? We could protect using a
+	 * seqcount or sth. like that (zone_span_seqbegin etc).
+	 *
+	 * Concurrent hotplug is not an issue. But likely the caller must
+	 * protect against concurrent hotunplug already? We should definitely
+	 * read these values through READ_ONCE and update them through
+	 * WRITE_ONCE().
+	 */
+	return zone->spanned_pages == zone->present_pages;
+}
+
  static inline bool zone_is_initialized(const struct zone *zone)
  {
  	return zone->initialized;
diff --git a/mm/internal.h b/mm/internal.h
index f35dbcf99a86..6062f9b8ee62 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
  static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
  				unsigned long end_pfn, struct zone *zone)
  {
-	if (zone->contiguous)
+	if (zone_is_contiguous(zone))
  		return pfn_to_page(start_pfn);
  
  	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
  }
  
-void set_zone_contiguous(struct zone *zone);
  bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
  			   unsigned long nr_pages);
  
-static inline void clear_zone_contiguous(struct zone *zone)
-{
-	zone->contiguous = false;
-}
-
  extern int __isolate_free_page(struct page *page, unsigned int order);
  extern void __putback_isolated_page(struct page *page, unsigned int order,
  				    int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a63ec679d861..790a8839b5d8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
  
  	/*
  	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
-	 * we will not try to shrink the zones - which is okay as
-	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+	 * we will not try to shrink the zones.
  	 */
  	if (zone_is_zone_device(zone))
  		return;
  
-	clear_zone_contiguous(zone);
-
  	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
  	update_pgdat_span(pgdat);
-
-	set_zone_contiguous(zone);
  }
  
  /**
@@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
  	struct pglist_data *pgdat = zone->zone_pgdat;
  	int nid = pgdat->node_id;
  
-	clear_zone_contiguous(zone);
-
  	if (zone_is_empty(zone))
  		init_currently_empty_zone(zone, start_pfn, nr_pages);
  	resize_zone_range(zone, start_pfn, nr_pages);
@@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
  	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
  			 MEMINIT_HOTPLUG, altmap, migratetype,
  			 isolate_pageblock);
-
-	set_zone_contiguous(zone);
  }
  
  struct auto_movable_stats {
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2a809cd8e7fa..78115fb5808b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2263,28 +2263,6 @@ void __init init_cma_pageblock(struct page *page)
  }
  #endif
  
-void set_zone_contiguous(struct zone *zone)
-{
-	unsigned long block_start_pfn = zone->zone_start_pfn;
-	unsigned long block_end_pfn;
-
-	block_end_pfn = pageblock_end_pfn(block_start_pfn);
-	for (; block_start_pfn < zone_end_pfn(zone);
-			block_start_pfn = block_end_pfn,
-			 block_end_pfn += pageblock_nr_pages) {
-
-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
-		if (!__pageblock_pfn_to_page(block_start_pfn,
-					     block_end_pfn, zone))
-			return;
-		cond_resched();
-	}
-
-	/* We confirm that there is no hole */
-	zone->contiguous = true;
-}
-
  /*
   * Check if a PFN range intersects multiple zones on one or more
   * NUMA nodes. Specify the @nid argument if it is known that this
@@ -2347,9 +2325,6 @@ void __init page_alloc_init_late(void)
  	for_each_node_state(nid, N_MEMORY)
  		shuffle_free_memory(NODE_DATA(nid));
  
-	for_each_populated_zone(zone)
-		set_zone_contiguous(zone);
-
  	/* Initialize page ext after all struct pages are initialized. */
  	if (deferred_struct_pages)
  		page_ext_init();
-- 
2.43.0



If we wanted to cover the cases of "holes in the zone, but there is a struct page and it is
assigned to the zone", all we would have to do is manually track them (during boot only, this
cannot happen during memory hotplug) in zone->absent_pages. That value would never change.

Then we would have instead:

static inline bool zone_is_contiguous(const struct zone *zone)
{
	return zone->spanned_pages == zone->present_pages + zone->absent_pages;
}


I don't think we could just use "absent" as calculated in calculate_node_totalpages,
because I assume it could include "too many" things, not just these holes in pageblocks.


At least reading zone_absent_pages_in_node(), the value it returns likely includes:
* pages that will not have a struct page, in the case of larger holes
* mirrored_kernelcore oddities

We'd need a reliable count of "absent pages that have a struct page that belongs to this zone".

Maybe Mike knows how to easily obtain that value there, to just set zone->absent_pages.

If we really need that optimization for these cases.

-- 
Cheers,

David
Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Posted by Mike Rapoport 12 hours ago
On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
> On 1/30/26 17:37, Tianyou Li wrote:
> > When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
> > it updates zone->contiguous by checking the new zone's pfn range from
> > beginning to end, regardless of the previous state of the zone. When the
> > zone's pfn range is large, the cost of traversing the pfn range to update
> > zone->contiguous can be significant.
> > 
> > Add fast paths to quickly detect cases where the zone's contiguous state
> > can be determined without scanning the new zone. The cases are: if the new
> > range does not overlap with the previous range, the zone cannot be
> > contiguous; if the new range is adjacent to the previous range, only the
> > new range needs to be checked; if the newly added pages cannot fill the
> > hole in the previous zone, the zone remains not contiguous.
> > 
> > The following memory hotplug test cases for a VM [1], run in the
> > environment [2], show that this optimization significantly reduces the
> > memory hotplug time [3].
> > 
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Plug Memory    | 256G |      10s      |      2s      |       80%      |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G |      33s      |      6s      |       81%      |
> > +----------------+------+---------------+--------------+----------------+
> > 
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Unplug Memory  | 256G |      10s      |      2s      |       80%      |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G |      34s      |      6s      |       82%      |
> > +----------------+------+---------------+--------------+----------------+
> > 
> > [1] Qemu commands to hotplug 256G/512G memory for a VM:
> >      object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
> >      device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
> >      qom-set vmem1 requested-size 256G/512G (Plug Memory)
> >      qom-set vmem1 requested-size 0G (Unplug Memory)
> > 
> > [2] Hardware     : Intel Icelake server
> >      Guest Kernel : v6.18-rc2
> >      Qemu         : v9.0.0
> > 
> >      Launch VM    :
> >      qemu-system-x86_64 -accel kvm -cpu host \
> >      -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
> >      -drive file=./seed.img,format=raw,if=virtio \
> >      -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
> >      -m 2G,slots=10,maxmem=2052472M \
> >      -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
> >      -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
> >      -nographic -machine q35 \
> >      -nic user,hostfwd=tcp::3000-:22
> > 
> >      Guest kernel auto-onlines newly added memory blocks:
> >      echo online > /sys/devices/system/memory/auto_online_blocks
> > 
> > [3] The time from typing the QEMU commands in [1] to when the output of
> >      'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
> >      memory is recognized.
> > 
> > Reported-by: Nanhai Zou <nanhai.zou@intel.com>
> > Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
> > Tested-by: Yuan Liu <yuan1.liu@intel.com>
> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
> > Reviewed-by: Pan Deng <pan.deng@intel.com>
> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> > Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
> > Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> > ---
> 
> Thanks for all your work on this and sorry for being slower with
> review the last month.
> 
> While I was in the shower I was thinking about how much I hate
> zone->contiguous + the pageblock walking, and how we could just get
> rid of it.
> 
> You know, just what you do while having a relaxing shower.
> 
> 
> And I was wondering:
> 
> (a) in which case would we have zone_spanned_pages == zone_present_pages
> and the zone *not* being contiguous? I assume this just cannot happen,
> otherwise BUG.
> 
> (b) in which case would we have zone_spanned_pages != zone_present_pages
> and the zone *being* contiguous? I assume in some cases where we have small
> holes within a pageblock?
>
> Reading the doc of __pageblock_pfn_to_page(), there are some weird
> scenarios with holes in pageblocks.
 
It seems that "zone->contigous" is really bad name for what this thing
represents.

tl;dr I don't think zone_spanned_pages == zone_present_pages is related to
zone->contiguous at all :)

If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
check for zone->contiguous should guarantee that the entire pageblock has a
valid memory map and that the entire pageblock fits within a zone and does
not cross zone/node boundaries.

For coldplug memory the memory map is valid for every section that has
present memory, i.e. even if there is a hole in a section, its memory map
will be populated and will have struct pages.

When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
essentially checks whether the first page in a pageblock is online and
whether the first and last pages are in the zone being compacted.
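
For reference, the slow path is roughly the following (paraphrased from
mm/page_alloc.c; details may differ between kernel versions):

struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
				     unsigned long end_pfn, struct zone *zone)
{
	struct page *start_page, *end_page;

	/* end_pfn is one past the block being checked */
	end_pfn--;

	if (!pfn_valid(end_pfn))
		return NULL;

	/* the first pfn must be online, i.e. have an initialized memmap */
	start_page = pfn_to_online_page(start_pfn);
	if (!start_page)
		return NULL;

	if (page_zone(start_page) != zone)
		return NULL;

	end_page = pfn_to_page(end_pfn);

	/* the first and last page must belong to the same zone */
	if (page_zone_id(start_page) != page_zone_id(end_page))
		return NULL;

	return start_page;
}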
 
AFAIU, in the hotplug case an entire pageblock is always onlined to the
same zone, so zone->contiguous won't change after the hotplug is complete.

We might set it to false in the beginning of the hotplug to avoid scanning
offline pages, although I'm not sure if it's possible.

But at the end of hotplug we can simply restore the old value and move on.

For the coldplug case I'm also not sure it's worth the hassle; we could
just let compaction scan a few more pfns for those rare weird pageblocks
and bail out on wrong page conditions.

> I.e., on my notebook I have
> 
> $ cat /proc/zoneinfo  | grep -E "Node|spanned|present"
> Node 0, zone      DMA
>         spanned  4095
>         present  3999
> Node 0, zone    DMA32
>         spanned  1044480
>         present  439600

I suspect this one is contiguous ;-)

> Node 0, zone   Normal
>         spanned  7798784
>         present  7798784
> Node 0, zone  Movable
>         spanned  0
>         present  0
> Node 0, zone   Device
>         spanned  0
>         present  0
> 
> 
> For the most important zone regarding compaction, ZONE_NORMAL, it would be good enough.
> 
> We certainly don't care about detecting contiguity for the DMA zone. For DMA32, I would suspect
> that it is not detected as contiguous either way, because the holes are just way too large?
> 
> -- 
> Cheers,
> 
> David

-- 
Sincerely yours,
Mike.