[PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling

David Hildenbrand (Arm) posted 14 patches 2 weeks, 6 days ago
There is a newer version of this series
Posted by David Hildenbrand (Arm) 2 weeks, 6 days ago
In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
usemap on the section with pgdat") quite some complexity to try
allocating memory for the "usemap" (storing pageblock information
per memory section) for a memory section close to the memory of the
"pgdat" of the node.

The goal was to make memory hotunplug of boot memory more likely to
succeed. That commit also added some checks for circular dependencies
between two memory sections, whereby two memory sections would contain
each others usemap, turning bot memory sections un-removable.

However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
together") started allocating the usemap for multiple memory
sections on the same node in one chunk, effectively grouping all usemap
allocations of the same node in a single memblock allocation.

We don't really give guarantees about memory hotunplug of boot memory, and
with the change in 2010, it is pretty much impossible in practice to get
any circular dependencies.

commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
pgdat") also added the comment:

	"Similarly, a pgdat can prevent a section being removed. If
	 section A contains a pgdat and section B
	 contains the usemap, both sections become inter-dependent."

Given that we don't free the pgdat anymore, that comment (and handling)
does not apply.

So let's simply remove this complexity.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 100 +---------------------------------------------------
 1 file changed, 1 insertion(+), 99 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 2a1f662245bc..b57c81e99340 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -294,102 +294,6 @@ size_t mem_section_usage_size(void)
 	return sizeof(struct mem_section_usage) + usemap_size();
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
-static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
-{
-#ifndef CONFIG_NUMA
-	VM_BUG_ON(pgdat != &contig_page_data);
-	return __pa_symbol(&contig_page_data);
-#else
-	return __pa(pgdat);
-#endif
-}
-
-static struct mem_section_usage * __init
-sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long size)
-{
-	struct mem_section_usage *usage;
-	unsigned long goal, limit;
-	int nid;
-	/*
-	 * A page may contain usemaps for other sections preventing the
-	 * page being freed and making a section unremovable while
-	 * other sections referencing the usemap remain active. Similarly,
-	 * a pgdat can prevent a section being removed. If section A
-	 * contains a pgdat and section B contains the usemap, both
-	 * sections become inter-dependent. This allocates usemaps
-	 * from the same section as the pgdat where possible to avoid
-	 * this problem.
-	 */
-	goal = pgdat_to_phys(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
-	limit = goal + (1UL << PA_SECTION_SHIFT);
-	nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
-again:
-	usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
-	if (!usage && limit) {
-		limit = MEMBLOCK_ALLOC_ACCESSIBLE;
-		goto again;
-	}
-	return usage;
-}
-
-static void __init check_usemap_section_nr(int nid,
-		struct mem_section_usage *usage)
-{
-	unsigned long usemap_snr, pgdat_snr;
-	static unsigned long old_usemap_snr;
-	static unsigned long old_pgdat_snr;
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	int usemap_nid;
-
-	/* First call */
-	if (!old_usemap_snr) {
-		old_usemap_snr = NR_MEM_SECTIONS;
-		old_pgdat_snr = NR_MEM_SECTIONS;
-	}
-
-	usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT);
-	pgdat_snr = pfn_to_section_nr(pgdat_to_phys(pgdat) >> PAGE_SHIFT);
-	if (usemap_snr == pgdat_snr)
-		return;
-
-	if (old_usemap_snr == usemap_snr && old_pgdat_snr == pgdat_snr)
-		/* skip redundant message */
-		return;
-
-	old_usemap_snr = usemap_snr;
-	old_pgdat_snr = pgdat_snr;
-
-	usemap_nid = sparse_early_nid(__nr_to_section(usemap_snr));
-	if (usemap_nid != nid) {
-		pr_info("node %d must be removed before remove section %ld\n",
-			nid, usemap_snr);
-		return;
-	}
-	/*
-	 * There is a circular dependency.
-	 * Some platforms allow un-removable section because they will just
-	 * gather other removable sections for dynamic partitioning.
-	 * Just notify un-removable section's number here.
-	 */
-	pr_info("Section %ld and %ld (node %d) have a circular dependency on usemap and pgdat allocations\n",
-		usemap_snr, pgdat_snr, nid);
-}
-#else
-static struct mem_section_usage * __init
-sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long size)
-{
-	return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id);
-}
-
-static void __init check_usemap_section_nr(int nid,
-		struct mem_section_usage *usage)
-{
-}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
-
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 unsigned long __init section_map_size(void)
 {
@@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
 				      unsigned long pnum, unsigned long flags)
 {
 	BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
-	check_usemap_section_nr(nid, sparse_usagebuf);
 	sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 			sparse_usagebuf, SECTION_IS_EARLY | flags);
 	sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
@@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
 	unsigned long size;
 
 	size = mem_section_usage_size() * map_count;
-	sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
-				NODE_DATA(nid), size);
+	sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);
 	if (!sparse_usagebuf) {
 		sparse_usagebuf_end = NULL;
 		return -ENOMEM;
-- 
2.43.0
Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
Posted by Mike Rapoport 2 weeks, 5 days ago
On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
> usemap on the section with pgdat") quite some complexity to try
> allocating memory for the "usemap" (storing pageblock information
> per memory section) for a memory section close to the memory of the
> "pgdat" of the node.
> 
> The goal was to make memory hotunplug of boot memory more likely to
> succeed. That commit also added some checks for circular dependencies
> between two memory sections, whereby two memory sections would contain
> each others usemap, turning bot memory sections un-removable.

                            ^ typo: boot
> 
> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
> together") started allocating the usemap for multiple memory
> sections on the same node in one chunk, effectively grouping all usemap
> allocations of the same node in a single memblock allocation.
> 
> We don't really give guarantees about memory hotunplug of boot memory, and
> with the change in 2010, it is pretty much impossible in practice to get
> any circular dependencies.
> 
> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
> pgdat") also added the comment:
> 
> 	"Similarly, a pgdat can prevent a section being removed. If
> 	 section A contains a pgdat and section B
> 	 contains the usemap, both sections become inter-dependent."
> 
> Given that we don't free the pgdat anymore, that comment (and handling)
> does not apply.
> 
> So let's simply remove this complexity.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse.c | 100 +---------------------------------------------------
>  1 file changed, 1 insertion(+), 99 deletions(-)

-- 
Sincerely yours,
Mike.
Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
Posted by Lorenzo Stoakes (Oracle) 2 weeks, 6 days ago
On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
> usemap on the section with pgdat") quite some complexity to try
> allocating memory for the "usemap" (storing pageblock information
> per memory section) for a memory section close to the memory of the
> "pgdat" of the node.
>
> The goal was to make memory hotunplug of boot memory more likely to
> succeed. That commit also added some checks for circular dependencies
> between two memory sections, whereby two memory sections would contain
> each others usemap, turning bot memory sections un-removable.

Typo: bot -> both. Presumably you are not talking about memory a bot of some
kind allocated :P

>
> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
> together") started allocating the usemap for multiple memory
> sections on the same node in one chunk, effectively grouping all usemap
> allocations of the same node in a single memblock allocation.
>
> We don't really give guarantees about memory hotunplug of boot memory, and
> with the change in 2010, it is pretty much impossible in practice to get
> any circular dependencies.

Pretty much impossible? :) We can probably go so far as to say impossible, no?

>
> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
> pgdat") also added the comment:
>
> 	"Similarly, a pgdat can prevent a section being removed. If
> 	 section A contains a pgdat and section B
> 	 contains the usemap, both sections become inter-dependent."
>
> Given that we don't free the pgdat anymore, that comment (and handling)
> does not apply.

Isn't pgdat synonymous with a node and that's the data structure that describes
a node right? Confusingly typedef'd from pglist_data to pg_data_t but then
referred to as pgdat because all that makes so much sense :)

But I'm confused, does a section containing a pgdat mean a section having the
pgdat data structure literally allocated in it?

A usemap is... something that tracks pageblock metadata I think right?

Anyway I'm also confused by 'given we don't free the pgdat any more', but the
comment says a 'pgdat can prevent a section being removed' rather than anything
about it being removed?

I guess it means the OTHER section could be prevented from being removed even
after it's gone.. somehow?

Anyway! I think maybe this could be clearer, somehow :)

>
> So let's simply remove this complexity.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

I think what you've done in the patch is right though, we're not doing any of
these dances after a4322e1bad91 and pgdats sitting around mean we don't really
care about where the usemap goes anyway I don't think so...

I usemap and I find myself in a place where I give you a:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

!

> ---
>  mm/sparse.c | 100 +---------------------------------------------------
>  1 file changed, 1 insertion(+), 99 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 2a1f662245bc..b57c81e99340 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -294,102 +294,6 @@ size_t mem_section_usage_size(void)
>  	return sizeof(struct mem_section_usage) + usemap_size();
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
> -static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
> -{
> -#ifndef CONFIG_NUMA
> -	VM_BUG_ON(pgdat != &contig_page_data);
> -	return __pa_symbol(&contig_page_data);
> -#else
> -	return __pa(pgdat);
> -#endif
> -}
> -
> -static struct mem_section_usage * __init
> -sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
> -					 unsigned long size)
> -{
> -	struct mem_section_usage *usage;
> -	unsigned long goal, limit;
> -	int nid;
> -	/*
> -	 * A page may contain usemaps for other sections preventing the
> -	 * page being freed and making a section unremovable while
> -	 * other sections referencing the usemap remain active. Similarly,
> -	 * a pgdat can prevent a section being removed. If section A
> -	 * contains a pgdat and section B contains the usemap, both
> -	 * sections become inter-dependent. This allocates usemaps
> -	 * from the same section as the pgdat where possible to avoid
> -	 * this problem.
> -	 */
> -	goal = pgdat_to_phys(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
> -	limit = goal + (1UL << PA_SECTION_SHIFT);
> -	nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
> -again:
> -	usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
> -	if (!usage && limit) {
> -		limit = MEMBLOCK_ALLOC_ACCESSIBLE;
> -		goto again;
> -	}
> -	return usage;
> -}
> -
> -static void __init check_usemap_section_nr(int nid,
> -		struct mem_section_usage *usage)
> -{
> -	unsigned long usemap_snr, pgdat_snr;
> -	static unsigned long old_usemap_snr;
> -	static unsigned long old_pgdat_snr;
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	int usemap_nid;
> -
> -	/* First call */
> -	if (!old_usemap_snr) {
> -		old_usemap_snr = NR_MEM_SECTIONS;
> -		old_pgdat_snr = NR_MEM_SECTIONS;
> -	}
> -
> -	usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT);
> -	pgdat_snr = pfn_to_section_nr(pgdat_to_phys(pgdat) >> PAGE_SHIFT);
> -	if (usemap_snr == pgdat_snr)
> -		return;
> -
> -	if (old_usemap_snr == usemap_snr && old_pgdat_snr == pgdat_snr)
> -		/* skip redundant message */
> -		return;
> -
> -	old_usemap_snr = usemap_snr;
> -	old_pgdat_snr = pgdat_snr;
> -
> -	usemap_nid = sparse_early_nid(__nr_to_section(usemap_snr));
> -	if (usemap_nid != nid) {
> -		pr_info("node %d must be removed before remove section %ld\n",
> -			nid, usemap_snr);
> -		return;
> -	}
> -	/*
> -	 * There is a circular dependency.
> -	 * Some platforms allow un-removable section because they will just
> -	 * gather other removable sections for dynamic partitioning.
> -	 * Just notify un-removable section's number here.
> -	 */
> -	pr_info("Section %ld and %ld (node %d) have a circular dependency on usemap and pgdat allocations\n",
> -		usemap_snr, pgdat_snr, nid);
> -}
> -#else
> -static struct mem_section_usage * __init
> -sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
> -					 unsigned long size)
> -{
> -	return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id);
> -}
> -
> -static void __init check_usemap_section_nr(int nid,
> -		struct mem_section_usage *usage)
> -{
> -}
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
> -
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  unsigned long __init section_map_size(void)
>  {
> @@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
>  				      unsigned long pnum, unsigned long flags)
>  {
>  	BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
> -	check_usemap_section_nr(nid, sparse_usagebuf);
>  	sparse_init_one_section(__nr_to_section(pnum), pnum, map,
>  			sparse_usagebuf, SECTION_IS_EARLY | flags);
>  	sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
> @@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
>  	unsigned long size;
>
>  	size = mem_section_usage_size() * map_count;
> -	sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
> -				NODE_DATA(nid), size);
> +	sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);

I guess nid here is the same node as the pgdat?

>  	if (!sparse_usagebuf) {
>  		sparse_usagebuf_end = NULL;
>  		return -ENOMEM;
> --
> 2.43.0
>

This is quite the simplification :)

Cheers, Lorenzo
Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
Posted by David Hildenbrand (Arm) 2 weeks, 3 days ago
On 3/17/26 20:48, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
>> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
>> usemap on the section with pgdat") quite some complexity to try
>> allocating memory for the "usemap" (storing pageblock information
>> per memory section) for a memory section close to the memory of the
>> "pgdat" of the node.
>>
>> The goal was to make memory hotunplug of boot memory more likely to
>> succeed. That commit also added some checks for circular dependencies
>> between two memory sections, whereby two memory sections would contain
>> each others usemap, turning bot memory sections un-removable.
> 
> Typo: bot -> both. Presumably you are not talking about memory a bot of some
> kind allocated :P
> 
>>
>> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
>> together") started allocating the usemap for multiple memory
>> sections on the same node in one chunk, effectively grouping all usemap
>> allocations of the same node in a single memblock allocation.
>>
>> We don't really give guarantees about memory hotunplug of boot memory, and
>> with the change in 2010, it is pretty much impossible in practice to get
>> any circular dependencies.
> 
> Pretty much impossible? :) We can probably go so far as to say impossible, no?

Yes.
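
To make the grouping concrete, here is a minimal userspace sketch of the
"one chunk per node, one slice per section" pattern. It is only a model
under stated assumptions: the toy_* names and TOY_USAGE_SIZE are
hypothetical, calloc() stands in for memblock_alloc_node(), and the
fixed slice size stands in for mem_section_usage_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-section record size; the kernel uses mem_section_usage_size(). */
#define TOY_USAGE_SIZE 64

struct toy_usagebuf {
	char *cur;	/* next free slice, playing the role of sparse_usagebuf */
	char *end;	/* one past the chunk, playing the role of sparse_usagebuf_end */
};

/* One allocation per node, sized for all map_count sections of that node. */
static int toy_usage_init(struct toy_usagebuf *b, unsigned long map_count)
{
	size_t size = (size_t)TOY_USAGE_SIZE * map_count;

	b->cur = calloc(1, size);	/* assumption: stands in for memblock_alloc_node() */
	if (!b->cur) {
		b->end = NULL;
		return -1;
	}
	b->end = b->cur + size;
	return 0;
}

/* Hand out the next slice of the node's chunk to one section. */
static void *toy_usage_next(struct toy_usagebuf *b)
{
	void *slice;

	assert(b->cur && b->cur < b->end);	/* mirrors the BUG_ON() in the quoted hunk */
	slice = b->cur;
	b->cur += TOY_USAGE_SIZE;
	return slice;
}
```

Because every section of a node takes its slice from the same chunk, the
usemaps of that node sit next to each other rather than being spread
across the sections they describe.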

> 
>>
>> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
>> pgdat") also added the comment:
>>
>> 	"Similarly, a pgdat can prevent a section being removed. If
>> 	 section A contains a pgdat and section B
>> 	 contains the usemap, both sections become inter-dependent."
>>
>> Given that we don't free the pgdat anymore, that comment (and handling)
>> does not apply.
> 
> Isn't pgdat synonymous with a node and that's the data structure that describes
> a node right? Confusingly typedef'd from pglist_data to pg_data_t but then
> referred to as pgdat because all that makes so much sense :)

Yeah, in general we refer to the NODE_DATA as pgdat (grep for it and
you'll be surprised).

> 
> But I'm confused, does a section containing a pgdat mean a section having the
> pgdat data structure literally allocated in it?

Yes. The "struct pglist_data" (pgdat) placed in some memory section.

> 
> A usemap is... something that tracks pageblock metadata I think right?

Yes. Essentially a large array of bytes, whereby each byte describes the
data of one pageblock (migratetype etc.)
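
As a rough illustration of that description, a userspace sketch with one
byte per pageblock could look like the following. All toy_* names and
the section/pageblock sizes are assumptions made up for the example; the
kernel keeps this data inside struct mem_section_usage and its real
layout and sizes depend on architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry for illustration only; real values depend on arch and config. */
#define TOY_PAGES_PER_SECTION	32768UL	/* e.g. 128 MiB sections with 4 KiB pages */
#define TOY_PAGEBLOCK_PAGES	512UL	/* e.g. 2 MiB pageblocks */
#define TOY_BLOCKS_PER_SECTION	(TOY_PAGES_PER_SECTION / TOY_PAGEBLOCK_PAGES)

/* Hypothetical stand-ins for the kernel's migratetypes. */
enum toy_migratetype { TOY_UNMOVABLE = 0, TOY_MOVABLE = 1, TOY_RECLAIMABLE = 2 };

/* One byte of metadata per pageblock in the section. */
struct toy_usemap {
	uint8_t block[TOY_BLOCKS_PER_SECTION];
};

/* Index of the pageblock that contains pfn, relative to its section. */
static size_t toy_block_index(unsigned long pfn)
{
	return (pfn % TOY_PAGES_PER_SECTION) / TOY_PAGEBLOCK_PAGES;
}

static void toy_set_migratetype(struct toy_usemap *u, unsigned long pfn,
				enum toy_migratetype mt)
{
	assert((unsigned int)mt <= UINT8_MAX);
	u->block[toy_block_index(pfn)] = (uint8_t)mt;
}

static enum toy_migratetype toy_get_migratetype(const struct toy_usemap *u,
						unsigned long pfn)
{
	return (enum toy_migratetype)u->block[toy_block_index(pfn)];
}
```

All pages in the same pageblock share one entry, so setting the type via
any pfn in the block is visible through every other pfn in that block.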

> 
> Anyway I'm also confused by 'given we don't free the pgdat any more', but the
> comment says a 'pgdat can prevent a section being removed' rather than anything
> about it being removed?

Well, if a pgdat resides in some memory section, the fact that it is
unmovable turns the whole memory section unremovable -> hotunplug fails.

Assuming you could free the pgdat when the node goes offline, you
would turn that memory section removable.

And I think that commit somehow assumed that the last memory section
could be removed if all it contains is the corresponding pgdat (which
was never the case).

> 
> I guess it means the OTHER section could be prevented from being removed even
> after it's gone.. somehow?
> 
> Anyway! I think maybe this could be clearer, somehow :)

I'm afraid the whole purpose of the original patch was sketchy, which is
also why I fail to even explain the original motivation clearly.

Now it's fortunately no longer required. :)

> 
>>
>> So let's simply remove this complexity.
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> I think what you've done in the patch is right though, we're not doing any of
> these dances after a4322e1bad91 and pgdats sitting around mean we don't really
> care about where the usemap goes anyway I don't think so...
> 
> I usemap and I find myself in a place where I give you a:
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> 

Thanks ;)

[...]

>> -
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>  unsigned long __init section_map_size(void)
>>  {
>> @@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
>>  				      unsigned long pnum, unsigned long flags)
>>  {
>>  	BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
>> -	check_usemap_section_nr(nid, sparse_usagebuf);
>>  	sparse_init_one_section(__nr_to_section(pnum), pnum, map,
>>  			sparse_usagebuf, SECTION_IS_EARLY | flags);
>>  	sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
>> @@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
>>  	unsigned long size;
>>
>>  	size = mem_section_usage_size() * map_count;
>> -	sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
>> -				NODE_DATA(nid), size);
>> +	sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);
> 
> I guess nid here is the same node as the pgdat?

Yes! Before, we used NODE_DATA(nid)->node_id, which is really just ... nid :)

-- 
Cheers,

David
Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
Posted by David Hildenbrand (Arm) 2 weeks, 3 days ago
>>
>> Anyway I'm also confused by 'given we don't free the pgdat any more', but the
>> comment says a 'pgdat can prevent a section being removed' rather than anything
>> about it being removed?
> 
> Well, if a pgdat resides in some memory section, the fact that it is
> unmovable turns the whole memory section unremovable -> hotunplug fails.
> 
> Assuming you could free the pgdat when the node goes offline, you
> would turn that memory section removable.
> 
> And I think that commit somehow assumed that the last memory section
> could be removed if all it contains is the corresponding pgdat (which
> was never the case).

I decided to just drop that whole comment block completely.

-- 
Cheers,

David