[PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check

Swaraj Gaikwad posted 1 patch 1 week, 5 days ago
[PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check
Posted by Swaraj Gaikwad 1 week, 5 days ago
The auto_movable_can_online_movable() function currently walks all
populated zones when nid == NUMA_NO_NODE.

Since adjust_present_page_count() is called every time memory is
onlined/offlined and already updates present page counts, we can
maintain cached global statistics that are updated incrementally. This
eliminates the need to walk all zones for the NUMA_NO_NODE case.

This patch introduces a static global_auto_movable_stats structure that
caches kernel_early_pages and movable_pages counts. The cache is updated
in adjust_present_page_count() whenever pages are onlined/offlined, and
is read directly in auto_movable_can_online_movable() when
nid == NUMA_NO_NODE.
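
For illustration, the incremental-update idea described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `zone_model`, `walk_all_zones()`, `adjust_present()` and `cache_matches_walk()` are hypothetical stand-ins, not kernel code, and the model assumes every page (including memory present at boot) passes through the adjust function. It also omits any special-case accounting the in-kernel zone walk may perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: names mirror the patch, but none of this is kernel code. */
enum zone_type { ZONE_NORMAL, ZONE_MOVABLE };

struct zone_model {
	enum zone_type type;
	unsigned long present_pages;
	unsigned long present_early_pages;
};

struct auto_movable_stats {
	unsigned long kernel_early_pages;
	unsigned long movable_pages;
};

/* The cached totals, updated on every online/offline in this model. */
static struct auto_movable_stats cached_stats;

/* Reference implementation: recompute the totals by visiting every zone. */
static struct auto_movable_stats walk_all_zones(const struct zone_model *zones,
						size_t nr)
{
	struct auto_movable_stats stats = { 0, 0 };

	for (size_t i = 0; i < nr; i++) {
		if (zones[i].type == ZONE_MOVABLE)
			stats.movable_pages += zones[i].present_pages;
		else
			stats.kernel_early_pages += zones[i].present_early_pages;
	}
	return stats;
}

/* Incremental update; nr_pages is negative when memory goes offline. */
static void adjust_present(struct zone_model *zone, bool early, long nr_pages)
{
	if (early)
		zone->present_early_pages += nr_pages;
	zone->present_pages += nr_pages;

	if (zone->type == ZONE_MOVABLE)
		cached_stats.movable_pages += nr_pages;
	else if (early)
		cached_stats.kernel_early_pages += nr_pages;
}

/* The cache is only valid if it always agrees with a full walk. */
static bool cache_matches_walk(const struct zone_model *zones, size_t nr)
{
	struct auto_movable_stats walked = walk_all_zones(zones, nr);

	return walked.kernel_early_pages == cached_stats.kernel_early_pages &&
	       walked.movable_pages == cached_stats.movable_pages;
}
```

The point of the model is the invariant checked by `cache_matches_walk()`: the cache may replace the walk only if both always produce the same totals, so any page count that changes outside the adjust path would have to be folded into the cache as well.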

Testing: Built and booted the kernel successfully. Ran the memory
management test suite in tools/testing/selftests/mm/ with
./run_vmtests.sh - all tests passed.

Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
---
 mm/memory_hotplug.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 63b9d500ec6c..ba43edba8c92 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -50,6 +50,8 @@ enum {
 
 static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
 
+static struct auto_movable_stats global_auto_movable_stats;
+
 static inline unsigned long memory_block_memmap_size(void)
 {
 	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
@@ -851,9 +853,7 @@ static bool auto_movable_can_online_movable(int nid, struct memory_group *group,
 
 	/* Walk all relevant zones and collect MOVABLE vs. KERNEL stats. */
 	if (nid == NUMA_NO_NODE) {
-		/* TODO: cache values */
-		for_each_populated_zone(zone)
-			auto_movable_stats_account_zone(&stats, zone);
+		stats = global_auto_movable_stats;
 	} else {
 		for (i = 0; i < MAX_NR_ZONES; i++) {
 			pg_data_t *pgdat = NODE_DATA(nid);
@@ -1071,12 +1071,13 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
 {
 	struct zone *zone = page_zone(page);
 	const bool movable = zone_idx(zone) == ZONE_MOVABLE;
+	const bool early = early_section(__pfn_to_section(page_to_pfn(page)));
 
 	/*
 	 * We only support onlining/offlining/adding/removing of complete
 	 * memory blocks; therefore, either all is either early or hotplugged.
 	 */
-	if (early_section(__pfn_to_section(page_to_pfn(page))))
+	if (early)
 		zone->present_early_pages += nr_pages;
 	zone->present_pages += nr_pages;
 	zone->zone_pgdat->node_present_pages += nr_pages;
@@ -1085,6 +1086,12 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
 		group->present_movable_pages += nr_pages;
 	else if (group && !movable)
 		group->present_kernel_pages += nr_pages;
+
+	if (movable) {
+		global_auto_movable_stats.movable_pages += nr_pages;
+	} else if (early) {
+		global_auto_movable_stats.kernel_early_pages += nr_pages;
+	}
 }
 
 int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,

base-commit: 3cfeff1d2304237b1c14628d695a6df44daff48f
-- 
2.52.0
Re: [PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check
Posted by David Hildenbrand (Red Hat) 1 week, 4 days ago
On 12/6/25 22:25, Swaraj Gaikwad wrote:
> The auto_movable_can_online_movable() function currently walks all
> populated zones when nid == NUMA_NO_NODE,
> 
> Since adjust_present_page_count() is called every time memory is
> onlined/offlined and already updates present page counts, we can
> maintain cached global statistics that are updated incrementally. This
> eliminates the need to walk all zones for the NUMA_NO_NODE case.
> 
> This patch introduces a static global_auto_movable_stats structure that
> caches kernel_early_pages and movable_pages counts. The cache is updated
> in adjust_present_page_count() whenever pages are onlined/offlined, and
> is read directly in auto_movable_can_online_movable() when
> nid == NUMA_NO_NODE.
> 
> Testing: Built and booted the kernel successfully. Ran the memory
> management test suite in tools/testing/selftests/mm/ with
> ./run_vmtests.sh - all tests passed.
> 
> Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
> ---
>   mm/memory_hotplug.c | 15 +++++++++++----
>   1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 63b9d500ec6c..ba43edba8c92 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -50,6 +50,8 @@ enum {
>   
>   static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
>   
> +static struct auto_movable_stats global_auto_movable_stats;
> +
>   static inline unsigned long memory_block_memmap_size(void)
>   {
>   	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
> @@ -851,9 +853,7 @@ static bool auto_movable_can_online_movable(int nid, struct memory_group *group,
>   
>   	/* Walk all relevant zones and collect MOVABLE vs. KERNEL stats. */
>   	if (nid == NUMA_NO_NODE) {
> -		/* TODO: cache values */

The TODO was a bit unspecific: should have been "cache values if walking 
all zones becomes a performance problem".

Is there a performance impact, or are you able to show a performance 
difference?

-- 
Cheers

David
Re: [PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check
Posted by Swaraj Gaikwad 1 week, 4 days ago
Hi David,

Thank you for the feedback. I've conducted benchmarking to measure the
performance impact of caching the global statistics.

Test Setup:
I created a QEMU VM with the following configuration to simulate
a multi-NUMA environment:
- 8 NUMA nodes, 1GB each
- movablecore=2G kernel parameter
- Additional hotpluggable memory region via memmap=1G!0xC00000000

Benchmark Method:
I added a debugfs interface to directly invoke auto_movable_can_online_movable()
with NUMA_NO_NODE (the code path that walks all zones), and then
triggered it via: `echo 1 > /sys/kernel/debug/movable_benchmark`


Results:
Without patch (walks all zones): 2402 ns
With patch (uses cached values): 453 ns

While the absolute time difference is small in this test setup,
the improvement becomes more significant with:
- More NUMA nodes/zones in the system
- Frequent memory hotplug operations
- Systems with many populated zones
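
To put the two measurements in perspective, the arithmetic can be written out as a small C sketch. It uses only the numbers reported above (2402 ns and 453 ns, roughly a 5.3x ratio); the helper names are hypothetical, and a single-shot measurement like this one is only a rough indication.

```c
#include <assert.h>
#include <stdint.h>

/* Single-call timings reported in this mail, in nanoseconds. */
static const uint64_t walk_ns = 2402;   /* without patch: walks all zones */
static const uint64_t cached_ns = 453;  /* with patch: reads cached values */

/* Time saved by one call that uses the cache instead of the zone walk. */
static uint64_t saved_ns_per_call(void)
{
	return walk_ns - cached_ns;
}

/* Number of calls needed before the accumulated saving reaches budget_ns. */
static uint64_t calls_to_save(uint64_t budget_ns)
{
	uint64_t per_call = saved_ns_per_call();

	return (budget_ns + per_call - 1) / per_call;  /* round up */
}
```

With these figures, each call saves 1949 ns, so `calls_to_save(1000000)` shows that about 514 calls are needed to save a single millisecond in this setup.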

If this performance improvement is not considered significant enough to
justify the patch, I'm happy to send an updated patch that simply clarifies
the TODO comment to: "cache values if walking all zones becomes a
performance problem" as you suggested, for future reference.

Testing code:

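/* debugfs write handler: times one auto_movable_can_online_movable() call. */
static int benchmark_set(void *data, u64 val)
{
    ktime_t start, end;
    s64 duration;
    bool result;
    int nid = NUMA_NO_NODE;          /* exercise the global (all zones) path */
    unsigned long nr_pages = 32768;  /* 128 MiB with 4 KiB pages */

    /* Single-shot measurement; the value written to the file is ignored. */
    start = ktime_get();
    result = auto_movable_can_online_movable(nid, NULL, nr_pages);
    end = ktime_get();
    duration = ktime_to_ns(ktime_sub(end, start));

    pr_info("BENCHMARK_RESULT: Result=%d | Time=%lld ns\n", result, duration);
    return 0;
}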

QEMU Configuration:

  qemu-system-x86_64 \
    -m 8G,slots=16,maxmem=16G \
    -smp 2 \
    -netdev user,id=net0,hostfwd=tcp::10022-:22 \
    -device virtio-net-pci,netdev=net0 \
    -enable-kvm \
    -cpu host \
    -initrd "${DEFAULT_INITRD}" \
    -kernel "${DEFAULT_KERNEL}" \
    -object memory-backend-ram,id=mem0,size=1G \
    -object memory-backend-ram,id=mem1,size=1G \
    -object memory-backend-ram,id=mem2,size=1G \
    -object memory-backend-ram,id=mem3,size=1G \
    -object memory-backend-ram,id=mem4,size=1G \
    -object memory-backend-ram,id=mem5,size=1G \
    -object memory-backend-ram,id=mem6,size=1G \
    -object memory-backend-ram,id=mem7,size=1G \
    -numa node,nodeid=0,memdev=mem0 \
    -numa node,nodeid=1,memdev=mem1 \
    -numa node,nodeid=2,memdev=mem2 \
    -numa node,nodeid=3,memdev=mem3 \
    -numa node,nodeid=4,memdev=mem4 \
    -numa node,nodeid=5,memdev=mem5 \
    -numa node,nodeid=6,memdev=mem6 \
    -numa node,nodeid=7,memdev=mem7 \
    -append "loglevel=8 root=/dev/vda3 rootwait console=ttyS0 idle=poll movablecore=2G memmap=1G!0xC00000000" \
    -drive if=none,file="${DEFAULT_DISK}",format=qcow2,id=hd \
    -device virtio-blk-pci,drive=hd \
    -nographic \
    -machine q35 \
    -snapshot \
    -s

Thanks,
Swaraj
Re: [PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check
Posted by Swaraj Gaikwad 1 day, 4 hours ago
Hi David,

I’m just checking in on this patch to see if you’ve had a chance to review
the benchmark results I shared last week. To recap, the caching reduced the
execution time for the NUMA_NO_NODE case from ~2402 ns to ~453 ns in my test
environment.

Please let me know if the performance gain justifies the change in your view, or
if you would prefer I send a v2 that simply updates the TODO comment as you suggested.

Thanks,
Swaraj
Re: [PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check
Posted by David Hildenbrand (Red Hat) 1 day, 2 hours ago
On 12/18/25 07:41, Swaraj Gaikwad wrote:
> Hi David,

Hi!

> 
> I’m just checking in on this patch to see if you’ve had a chance to review
> the benchmark results I shared last week. 

Not really, still traveling and this is veeeery low priority :)

> To recap, the caching reduced the
> execution time for the NUMA_NO_NODE case from ~2402 ns to ~453 ns in my test
> environment.

Yeah, but these micro-benchmarks don't really matter ... at all. What 
would be interesting is what happens when you hotplug a lot of memory to 
a system with a lot of nodes. I suspect it won't really be a problem.

> 
> Please let me know if the performance gain justifies the change in your view,or
> if you would prefer I send a v2 that simply updates the TODO comment as you suggested.

I'd prefer if things that are not a real problem would not consume my 
bandwidth while traveling ;)

Anyhow, there was a kernel bot complaint that you are using "struct 
auto_movable_stats" before the compiler knows about it (and its size).

When you resend, make sure to better describe the "why" we are doing it. 
It cleans up the code a little, so that could be used as an argument.

-- 
Cheers

David
Re: [PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check
Posted by kernel test robot 1 week, 5 days ago
Hi Swaraj,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 3cfeff1d2304237b1c14628d695a6df44daff48f]

url:    https://github.com/intel-lab-lkp/linux/commits/Swaraj-Gaikwad/mm-memory_hotplug-Cache-auto_movable-stats-to-optimize-online-check/20251206-235737
base:   3cfeff1d2304237b1c14628d695a6df44daff48f
patch link:    https://lore.kernel.org/r/20251206212507.135503-1-swarajgaikwad1925%40gmail.com
patch subject: [PATCH] mm/memory_hotplug: Cache auto_movable stats to optimize online check
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20251207/202512071758.SKaL9BlQ-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251207/202512071758.SKaL9BlQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512071758.SKaL9BlQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> mm/memory_hotplug.c:53:34: warning: tentative definition of variable with internal linkage has incomplete non-array type 'struct auto_movable_stats' [-Wtentative-definition-incomplete-type]
      53 | static struct auto_movable_stats global_auto_movable_stats;
         |                                  ^
   mm/memory_hotplug.c:53:15: note: forward declaration of 'struct auto_movable_stats'
      53 | static struct auto_movable_stats global_auto_movable_stats;
         |               ^
   1 warning generated.


vim +53 mm/memory_hotplug.c

    52	
  > 53	static struct auto_movable_stats global_auto_movable_stats;
    54	
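
The warning above arises because the variable is defined at line 53, before the compiler has seen the full definition of `struct auto_movable_stats`. A minimal userspace sketch of the usual remedy follows, assuming the struct has only the two counters named in the patch description; the accessor `total_tracked_pages()` is hypothetical and exists only so the example uses the variable.

```c
#include <assert.h>

/*
 * Define the type first, so its size is known at the point where the
 * variable is defined. The two fields are the counters named in the
 * patch description; the real struct may differ.
 */
struct auto_movable_stats {
	unsigned long kernel_early_pages;
	unsigned long movable_pages;
};

/* With the complete type visible, this definition no longer warns. */
static struct auto_movable_stats global_auto_movable_stats;

/* Hypothetical accessor, present only so the sketch uses the variable. */
static unsigned long total_tracked_pages(void)
{
	return global_auto_movable_stats.kernel_early_pages +
	       global_auto_movable_stats.movable_pages;
}
```

In the patch itself, the equivalent change would be to place the variable definition after the existing struct definition in mm/memory_hotplug.c, rather than near the top of the file.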

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki