From Documentation/core-api/memory-hotplug.rst:
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock
in write mode to serialise memory hotplug (e.g. access to global/zone
variables).

The mhp_(de)init_memmap_on_memory() functions can change zone stats and
struct page content, but they are currently called without holding the
mem_hotplug_lock.

When a memory block is being offlined while kmemleak is scanning each
populated zone, the following theoretical race condition can occur:

CPU 0:                                        | CPU 1:
memory_offline()                              |
-> offline_pages()                            |
   -> mem_hotplug_begin()                     |
      ...                                     |
   -> mem_hotplug_done()                      |
                                              | kmemleak_scan()
                                              | -> get_online_mems()
                                              |    ...
-> mhp_deinit_memmap_on_memory()              |
   [not protected by mem_hotplug_begin/done()]|
   Marks memory section as offline,           | Retrieves zone_start_pfn
   poisons vmemmap struct pages and updates   | and struct page members.
   the zone related data                      |
                                              |    ...
                                              | -> put_online_mems()

Fix this by ensuring that the mem_hotplug_lock is held in write mode
around mhp_init_memmap_on_memory() and mhp_deinit_memmap_on_memory().
online_pages() and offline_pages() are currently only called from
memory_block_online() and memory_block_offline(), so it is safe to move
the locking there.

Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
drivers/base/memory.c | 12 +++++++++---
mm/memory_hotplug.c | 13 ++++++-------
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f3b9a4d0fa3b..1e9f6a1749b9 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -204,10 +204,11 @@ static int memory_block_online(struct memory_block *mem)
if (mem->altmap)
nr_vmemmap_pages = mem->altmap->free;
+ mem_hotplug_begin();
if (nr_vmemmap_pages) {
ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
if (ret)
- return ret;
+ goto out;
}
ret = online_pages(start_pfn + nr_vmemmap_pages,
@@ -215,7 +216,7 @@ static int memory_block_online(struct memory_block *mem)
if (ret) {
if (nr_vmemmap_pages)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
- return ret;
+ goto out;
}
/*
@@ -227,6 +228,8 @@ static int memory_block_online(struct memory_block *mem)
nr_vmemmap_pages);
mem->zone = zone;
+out:
+ mem_hotplug_done();
return ret;
}
@@ -247,6 +250,7 @@ static int memory_block_offline(struct memory_block *mem)
if (mem->altmap)
nr_vmemmap_pages = mem->altmap->free;
+ mem_hotplug_begin();
if (nr_vmemmap_pages)
adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
-nr_vmemmap_pages);
@@ -258,13 +262,15 @@ static int memory_block_offline(struct memory_block *mem)
if (nr_vmemmap_pages)
adjust_present_page_count(pfn_to_page(start_pfn),
mem->group, nr_vmemmap_pages);
- return ret;
+ goto out;
}
if (nr_vmemmap_pages)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
mem->zone = NULL;
+out:
+ mem_hotplug_done();
return ret;
}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1b03f4ec6fd2..c8238fc5edcb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1129,6 +1129,9 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
}
+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
@@ -1149,7 +1152,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
!IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
return -EINVAL;
- mem_hotplug_begin();
/* associate pfn range with the zone */
move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
@@ -1208,7 +1210,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
writeback_set_ratelimit();
memory_notify(MEM_ONLINE, &arg);
- mem_hotplug_done();
return 0;
failed_addition:
@@ -1217,7 +1218,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
(((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
memory_notify(MEM_CANCEL_ONLINE, &arg);
remove_pfn_range_from_zone(zone, pfn, nr_pages);
- mem_hotplug_done();
return ret;
}
@@ -1863,6 +1863,9 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
return 0;
}
+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
@@ -1885,8 +1888,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
!IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)))
return -EINVAL;
- mem_hotplug_begin();
-
/*
* Don't allow to offline memory blocks that contain holes.
* Consequently, memory blocks with holes can never get onlined
@@ -2027,7 +2028,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
memory_notify(MEM_OFFLINE, &arg);
remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
- mem_hotplug_done();
return 0;
failed_removal_isolated:
@@ -2042,7 +2042,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
(unsigned long long) start_pfn << PAGE_SHIFT,
((unsigned long long) end_pfn << PAGE_SHIFT) - 1,
reason);
- mem_hotplug_done();
return ret;
}
--
2.41.0

On 14.11.23 19:02, Sumanth Korikkar wrote:

The patch subject talks about "fixing locking order", but it's actually
missing locking, no?

> [... commit message quoted in full, snipped ...]
>
> Fix this by ensuring that the mem_hotplug_lock is held in write mode
> around mhp_init_memmap_on_memory() and mhp_deinit_memmap_on_memory().

What speaks against grabbing that lock in these functions?

--
Cheers,

David / dhildenb
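
[For readers following the thread: the alternative raised here, taking the
lock inside the helpers rather than in their callers, would look roughly
like the sketch below. Illustrative only, not code from this series; the
helper body is elided to a comment.]

    void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
    {
            mem_hotplug_begin();    /* take mem_hotplug_lock in write mode */
            /* ... existing vmemmap teardown and zone/section updates ... */
            mem_hotplug_done();
    }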

On Tue, Nov 14, 2023 at 07:22:33PM +0100, David Hildenbrand wrote:
> The patch subject talks about "fixing locking order", but it's actually
> missing locking, no?
>
> [... quoted commit message snipped ...]
>
> What speaks against grabbing that lock in these functions?

At present, the functions online_pages() and offline_pages() acquire the
mem_hotplug_lock right at the start. However, given the necessity of
locking in mhp_(de)init_memmap_on_memory(), it would be more efficient to
consolidate the locking by holding the mem_hotplug_lock once in
memory_block_online() and memory_block_offline().

Moreover, the introduction of the "memmap on memory" feature on s390
brings a new physical memory notifier, and functions like __add_pages()
or arch_add_memory() are consistently invoked with the mem_hotplug_lock
already acquired. Considering these factors, it seemed more natural to
take the mem_hotplug_lock in memory_block_online() and
memory_block_offline(), which was described as "fixing locking order" in
the subject. I will change the subject to "add missing locking" if that
is misleading.

Would you or Oscar agree that mhp_(de)init_memmap_on_memory() needs to
take the lock at all?

Thanks
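
[For readers following the thread: condensed, the caller-side pattern this
series arrives at is sketched below. It is simplified from the diff in the
patch above, with the surrounding setup and the present-page accounting
reduced to comments.]

    static int memory_block_online(struct memory_block *mem)
    {
            int ret = 0;

            /* ... start_pfn, nr_pages, nr_vmemmap_pages and zone set up as before ... */

            mem_hotplug_begin();    /* hold mem_hotplug_lock in write mode for the whole operation */
            if (nr_vmemmap_pages) {
                    ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
                    if (ret)
                            goto out;
            }

            ret = online_pages(start_pfn + nr_vmemmap_pages,
                               nr_pages - nr_vmemmap_pages, zone, mem->group);
            if (ret) {
                    if (nr_vmemmap_pages)
                            mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
                    goto out;
            }

            /* ... adjust_present_page_count() for the vmemmap pages, mem->zone = zone ... */
    out:
            mem_hotplug_done();     /* dropped on every return path */
            return ret;
    }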

On 15.11.23 14:41, Sumanth Korikkar wrote:
> [... earlier quote snipped ...]
>
> At present, the functions online_pages() and offline_pages() acquire the
> mem_hotplug_lock right at the start. However, given the necessity of
> locking in mhp_(de)init_memmap_on_memory(), it would be more efficient to
> consolidate the locking by holding the mem_hotplug_lock once in
> memory_block_online() and memory_block_offline().

Good point; can you similarly add comments to these two functions that
they need that lock in write mode?

--
Cheers,

David / dhildenb
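
[For readers following the thread: the annotation being requested could be
as small as the comments sketched below. This is one possible wording,
mirroring the comments the patch already adds above online_pages() and
offline_pages(); it is not part of the series as posted.]

    /* Must acquire mem_hotplug_lock in write mode. */
    static int memory_block_online(struct memory_block *mem)
    {
            /* ... */
    }

    /* Must acquire mem_hotplug_lock in write mode. */
    static int memory_block_offline(struct memory_block *mem)
    {
            /* ... */
    }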

On Thu, Nov 16, 2023 at 07:40:34PM +0100, David Hildenbrand wrote:
> Good point; can you similarly add comments to these two functions that
> they need that lock in write mode?

Ok, will add it.

Thanks