[PATCH 1/8] mm/memory_hotplug: fix memory hotplug locking order
Posted by Sumanth Korikkar 2 years, 1 month ago
From Documentation/core-api/memory-hotplug.rst:
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock
in write mode to serialise memory hotplug (e.g. access to global/zone
variables).

mhp_(de)init_memmap_on_memory() functions can change zone stats and
struct page content, but they are currently called without holding the
mem_hotplug_lock.

When a memory block is being offlined while kmemleak goes through each
populated zone, the following theoretical race can occur:
CPU 0:					     | CPU 1:
memory_offline()			     |
-> offline_pages()			     |
	-> mem_hotplug_begin()		     |
	   ...				     |
	-> mem_hotplug_done()		     |
					     | kmemleak_scan()
					     | -> get_online_mems()
					     |    ...
-> mhp_deinit_memmap_on_memory()	     |
  [not protected by mem_hotplug_begin/done()]|
  Marks memory section as offline,	     |   Retrieves zone_start_pfn
  poisons vmemmap struct pages and updates   |   and struct page members.
  the zone related data			     |
   					     |    ...
   					     | -> put_online_mems()
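
For reference, the reader side in kmemleak_scan() looks roughly like
this (abridged sketch, not verbatim mm/kmemleak.c):

	get_online_mems();
	for_each_populated_zone(zone) {
		unsigned long start_pfn = zone->zone_start_pfn;
		unsigned long end_pfn = zone_end_pfn(zone);
		unsigned long pfn;

		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
			/* struct page is dereferenced under the read side only */
			struct page *page = pfn_to_online_page(pfn);

			if (!page)
				continue;
			/* ... scan the page contents ... */
		}
	}
	put_online_mems();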

Fix this by ensuring mem_hotplug_lock is taken before performing
mhp_init_memmap_on_memory(). Also ensure that
mhp_deinit_memmap_on_memory() holds the lock.

online/offline_pages() are currently only called from
memory_block_online/offline(), so it is safe to move the locking there.
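
After the change, the call flow becomes (sketch):

	memory_block_online()
	  mem_hotplug_begin()
	  mhp_init_memmap_on_memory()
	  online_pages()		/* no longer takes the lock itself */
	  mem_hotplug_done()

	memory_block_offline()
	  mem_hotplug_begin()
	  offline_pages()		/* no longer takes the lock itself */
	  mhp_deinit_memmap_on_memory()
	  mem_hotplug_done()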

Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 drivers/base/memory.c | 12 +++++++++---
 mm/memory_hotplug.c   | 13 ++++++-------
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f3b9a4d0fa3b..1e9f6a1749b9 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -204,10 +204,11 @@ static int memory_block_online(struct memory_block *mem)
 	if (mem->altmap)
 		nr_vmemmap_pages = mem->altmap->free;
 
+	mem_hotplug_begin();
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
 	ret = online_pages(start_pfn + nr_vmemmap_pages,
@@ -215,7 +216,7 @@ static int memory_block_online(struct memory_block *mem)
 	if (ret) {
 		if (nr_vmemmap_pages)
 			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
-		return ret;
+		goto out;
 	}
 
 	/*
@@ -227,6 +228,8 @@ static int memory_block_online(struct memory_block *mem)
 					  nr_vmemmap_pages);
 
 	mem->zone = zone;
+out:
+	mem_hotplug_done();
 	return ret;
 }
 
@@ -247,6 +250,7 @@ static int memory_block_offline(struct memory_block *mem)
 	if (mem->altmap)
 		nr_vmemmap_pages = mem->altmap->free;
 
+	mem_hotplug_begin();
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -258,13 +262,15 @@ static int memory_block_offline(struct memory_block *mem)
 		if (nr_vmemmap_pages)
 			adjust_present_page_count(pfn_to_page(start_pfn),
 						  mem->group, nr_vmemmap_pages);
-		return ret;
+		goto out;
 	}
 
 	if (nr_vmemmap_pages)
 		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
 
 	mem->zone = NULL;
+out:
+	mem_hotplug_done();
 	return ret;
 }
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1b03f4ec6fd2..c8238fc5edcb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1129,6 +1129,9 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
 	kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
 }
 
+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 		       struct zone *zone, struct memory_group *group)
 {
@@ -1149,7 +1152,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 			 !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
 		return -EINVAL;
 
-	mem_hotplug_begin();
 
 	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
@@ -1208,7 +1210,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	writeback_set_ratelimit();
 
 	memory_notify(MEM_ONLINE, &arg);
-	mem_hotplug_done();
 	return 0;
 
 failed_addition:
@@ -1217,7 +1218,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_ONLINE, &arg);
 	remove_pfn_range_from_zone(zone, pfn, nr_pages);
-	mem_hotplug_done();
 	return ret;
 }
 
@@ -1863,6 +1863,9 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
 	return 0;
 }
 
+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
 int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 			struct zone *zone, struct memory_group *group)
 {
@@ -1885,8 +1888,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 			 !IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)))
 		return -EINVAL;
 
-	mem_hotplug_begin();
-
 	/*
 	 * Don't allow to offline memory blocks that contain holes.
 	 * Consequently, memory blocks with holes can never get onlined
@@ -2027,7 +2028,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
 	memory_notify(MEM_OFFLINE, &arg);
 	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
-	mem_hotplug_done();
 	return 0;
 
 failed_removal_isolated:
@@ -2042,7 +2042,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		 (unsigned long long) start_pfn << PAGE_SHIFT,
 		 ((unsigned long long) end_pfn << PAGE_SHIFT) - 1,
 		 reason);
-	mem_hotplug_done();
 	return ret;
 }
 
-- 
2.41.0
Re: [PATCH 1/8] mm/memory_hotplug: fix memory hotplug locking order
Posted by David Hildenbrand 2 years, 1 month ago
On 14.11.23 19:02, Sumanth Korikkar wrote:

The patch subject talks about "fixing locking order", but it's actually 
missing locking, no?

>  From Documentation/core-api/memory-hotplug.rst:
> When adding/removing/onlining/offlining memory or adding/removing
> heterogeneous/device memory, we should always hold the mem_hotplug_lock
> in write mode to serialise memory hotplug (e.g. access to global/zone
> variables).
> 
> mhp_(de)init_memmap_on_memory() functions can change zone stats and
> struct page content, but they are currently called w/o the
> mem_hotplug_lock.
> 
> When memory block is being offlined and when kmemleak goes through each
> populated zone, the following theoretical race conditions could occur:
> CPU 0:					     | CPU 1:
> memory_offline()			     |
> -> offline_pages()			     |
> 	-> mem_hotplug_begin()		     |
> 	   ...				     |
> 	-> mem_hotplug_done()		     |
> 					     | kmemleak_scan()
> 					     | -> get_online_mems()
> 					     |    ...
> -> mhp_deinit_memmap_on_memory()	     |
>    [not protected by mem_hotplug_begin/done()]|
>    Marks memory section as offline,	     |   Retrieves zone_start_pfn
>    poisons vmemmap struct pages and updates   |   and struct page members.
>    the zone related data			     |
>     					     |    ...
>     					     | -> put_online_mems()
> 
> Fix this by ensuring mem_hotplug_lock is taken before performing
> mhp_init_memmap_on_memory(). Also ensure that
> mhp_deinit_memmap_on_memory() holds the lock.

What speaks against grabbing that lock in these functions?

-- 
Cheers,

David / dhildenb
Re: [PATCH 1/8] mm/memory_hotplug: fix memory hotplug locking order
Posted by Sumanth Korikkar 2 years, 1 month ago
On Tue, Nov 14, 2023 at 07:22:33PM +0100, David Hildenbrand wrote:
> On 14.11.23 19:02, Sumanth Korikkar wrote:
> 
> The patch subject talks about "fixing locking order", but it's actually
> missing locking, no?
> 
> >  From Documentation/core-api/memory-hotplug.rst:
> > When adding/removing/onlining/offlining memory or adding/removing
> > heterogeneous/device memory, we should always hold the mem_hotplug_lock
> > in write mode to serialise memory hotplug (e.g. access to global/zone
> > variables).
> > 
> > mhp_(de)init_memmap_on_memory() functions can change zone stats and
> > struct page content, but they are currently called w/o the
> > mem_hotplug_lock.
> > 
> > When memory block is being offlined and when kmemleak goes through each
> > populated zone, the following theoretical race conditions could occur:
> > CPU 0:					     | CPU 1:
> > memory_offline()			     |
> > -> offline_pages()			     |
> > 	-> mem_hotplug_begin()		     |
> > 	   ...				     |
> > 	-> mem_hotplug_done()		     |
> > 					     | kmemleak_scan()
> > 					     | -> get_online_mems()
> > 					     |    ...
> > -> mhp_deinit_memmap_on_memory()	     |
> >    [not protected by mem_hotplug_begin/done()]|
> >    Marks memory section as offline,	     |   Retrieves zone_start_pfn
> >    poisons vmemmap struct pages and updates   |   and struct page members.
> >    the zone related data			     |
> >     					     |    ...
> >     					     | -> put_online_mems()
> > 
> > Fix this by ensuring mem_hotplug_lock is taken before performing
> > mhp_init_memmap_on_memory(). Also ensure that
> > mhp_deinit_memmap_on_memory() holds the lock.
> 
> What speaks against grabbing that lock in these functions?
>
At present, the functions online_pages() and offline_pages() acquire the
mem_hotplug_lock right at the start. However, given the necessity of
locking in mhp_(de)init_memmap_on_memory(), it would be more efficient
to consolidate the locking process by holding the mem_hotplug_lock once
in memory_block_online() and memory_block_offline().
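
In other words, after this patch memory_block_online() ends up with a
single write-lock scope around both steps, roughly (simplified sketch,
error handling trimmed):

	int ret = 0;

	mem_hotplug_begin();
	if (nr_vmemmap_pages)
		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages,
						zone);
	if (!ret)
		/* online_pages() no longer takes mem_hotplug_lock itself */
		ret = online_pages(start_pfn + nr_vmemmap_pages,
				   nr_pages - nr_vmemmap_pages, zone,
				   mem->group);
	mem_hotplug_done();
	return ret;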

Moreover, the introduction of the 'memmap on memory' feature on s390
brings a new physical memory notifier, and functions like __add_pages()
or arch_add_memory() are consistently invoked with the mem_hotplug_lock
already acquired.

Considering these factors, it seemed more natural to move the
mem_hotplug_lock into memory_block_online() and memory_block_offline(),
which was described as "fixing locking order" in the subject.
I will change the subject to "add missing locking" if it is misleading.

Would you or Oscar agree that the mhp_(de)init_memmap_on_memory()
functions need to take the lock at all?

Thanks
Re: [PATCH 1/8] mm/memory_hotplug: fix memory hotplug locking order
Posted by David Hildenbrand 2 years, 1 month ago
On 15.11.23 14:41, Sumanth Korikkar wrote:
> On Tue, Nov 14, 2023 at 07:22:33PM +0100, David Hildenbrand wrote:
>> On 14.11.23 19:02, Sumanth Korikkar wrote:
>>
>> The patch subject talks about "fixing locking order", but it's actually
>> missing locking, no?
>>
>>>   From Documentation/core-api/memory-hotplug.rst:
>>> When adding/removing/onlining/offlining memory or adding/removing
>>> heterogeneous/device memory, we should always hold the mem_hotplug_lock
>>> in write mode to serialise memory hotplug (e.g. access to global/zone
>>> variables).
>>>
>>> mhp_(de)init_memmap_on_memory() functions can change zone stats and
>>> struct page content, but they are currently called w/o the
>>> mem_hotplug_lock.
>>>
>>> When memory block is being offlined and when kmemleak goes through each
>>> populated zone, the following theoretical race conditions could occur:
>>> CPU 0:					     | CPU 1:
>>> memory_offline()			     |
>>> -> offline_pages()			     |
>>> 	-> mem_hotplug_begin()		     |
>>> 	   ...				     |
>>> 	-> mem_hotplug_done()		     |
>>> 					     | kmemleak_scan()
>>> 					     | -> get_online_mems()
>>> 					     |    ...
>>> -> mhp_deinit_memmap_on_memory()	     |
>>>     [not protected by mem_hotplug_begin/done()]|
>>>     Marks memory section as offline,	     |   Retrieves zone_start_pfn
>>>     poisons vmemmap struct pages and updates   |   and struct page members.
>>>     the zone related data			     |
>>>      					     |    ...
>>>      					     | -> put_online_mems()
>>>
>>> Fix this by ensuring mem_hotplug_lock is taken before performing
>>> mhp_init_memmap_on_memory(). Also ensure that
>>> mhp_deinit_memmap_on_memory() holds the lock.
>>
>> What speaks against grabbing that lock in these functions?
>>
> At present, the functions online_pages() and offline_pages() acquire the
> mem_hotplug_lock right at the start. However, given the necessity of
> locking in mhp_(de)init_memmap_on_memory(), it would be more efficient
> to consolidate the locking process by holding the mem_hotplug_lock once
> in memory_block_online() and memory_block_offline().

Good point; can you similarly add comments to these two functions that 
they need that lock in write mode?

-- 
Cheers,

David / dhildenb
Re: [PATCH 1/8] mm/memory_hotplug: fix memory hotplug locking order
Posted by Sumanth Korikkar 2 years, 1 month ago
On Thu, Nov 16, 2023 at 07:40:34PM +0100, David Hildenbrand wrote:
> Good point; can you similarly add comments to these two functions that they
> need that lock in write mode?

Ok, will add it.

Thanks