Dear Catalin and Andrew,
I hope this message finds you well.
While reviewing the implementation of kmemleak, I noticed a potential
issue where the physical memory occupied by struct page may be scanned
twice by kmemleak. To verify this hypothesis, I added some logging to
the code, and I would like to discuss my findings with you. Below are
the details of my environment, the logic I used, and the relevant log
output.
Test Platform:
Architecture: ARM64
Platform: QEMU
Branch: local branch based on v6.12
Config: defconfig with CONFIG_KMEMLEAK enabled
Problem Description:
When CONFIG_SPARSEMEM is enabled, the memory for the `struct page`
objects in each section is allocated via the sparse buffer
(`sparse_buffer_init()`) using the `memblock_alloc_*` interfaces. Since
the memblock allocation does not explicitly specify
`MEMBLOCK_ALLOC_NOLEAKTRACE`, kmemleak treats the memory as a gray
object with min_count=0. In other words, the physical memory occupied
by `struct page` is scanned as a gray object by the kmemleak scan
thread.
Additionally, kmemleak also traverses and scans every valid struct page
object in each populated zone. As a result, the physical memory
occupied by struct page may end up being scanned twice by the scan
thread.
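For context, the registration happens in memblock_alloc_range_nid() in
mm/memblock.c. The snippet below is only a paraphrase of the v6.12 logic
(not a verbatim quote): unless the caller passes
MEMBLOCK_ALLOC_NOLEAKTRACE as the range end, the block is handed to
kmemleak as a min_count=0 object.

	/* Paraphrased from memblock_alloc_range_nid(), mm/memblock.c (v6.12) */
	if (end != MEMBLOCK_ALLOC_NOLEAKTRACE)
		/*
		 * Registered with min_count = 0: never reported as a leak,
		 * but its contents are still scanned for pointers, so the
		 * memmap allocated by sparse_buffer_init() ends up on the
		 * gray list.
		 */
		kmemleak_alloc_phys(found, size, 0);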
Logging Added:
To investigate further, I added the following logs:
A log for the memory allocated for `struct page` objects through the sparse memblock buffer:
diff --git a/mm/sparse.c b/mm/sparse.c
index 3174b3ca3e9e..f2465b394344 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -362,6 +362,9 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 	 */
 	sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
 	sparsemap_buf_end = sparsemap_buf + size;
+	pr_info("sparse buf alloc VA: [0x%08lx: 0x%08lx] to PA: [0x%08lx: 0x%08lx]\n",
+		sparsemap_buf, sparsemap_buf_end,
+		virt_to_phys(sparsemap_buf), virt_to_phys(sparsemap_buf_end));
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
 	memmap_boot_pages_add(DIV_ROUND_UP(size, PAGE_SIZE));
 #endif
A log showing the mapping between vmemmap VA and PA:
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c0388b2e959d..f1a312f2a281 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -342,6 +342,8 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
 		p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
 		if (p) {
+			pr_info("==== vmemap 2M: VA:0x%08lx to PA:0x%08lx\n",
+				start, virt_to_phys(p));
 			vmemmap_set_pmd(pmd, p, node, addr, next);
 			continue;
 		} else if (altmap) {
Logs showing the memblock object and the `struct page` objects scanned by kmemleak:
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 0efa41d77bcc..a803825dcb18 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1529,7 +1529,10 @@ static void scan_object(struct kmemleak_object *object)
 				(void *)object->pointer;
 		void *end = start + object->size;
 		void *next;
-
+		if (object->flags & OBJECT_PHYS) {
+			pr_info("======= scan memblock OBJECT_PHYS:[0x%08lx:0x%08lx]\n",
+				start, end);
+		}
 		do {
 			next = min(start + MAX_SCAN_SIZE, end);
 			scan_block(start, next, object);
@@ -1677,6 +1680,8 @@ static void kmemleak_scan(void)
 	 * Struct page scanning for each node.
 	 */
 	for_each_populated_zone(zone) {
+		static int print_scan_page = 0;
+
 		unsigned long start_pfn = zone->zone_start_pfn;
 		unsigned long end_pfn = zone_end_pfn(zone);
 		unsigned long pfn;
@@ -1696,6 +1701,14 @@ static void kmemleak_scan(void)
 			/* only scan if page is in use */
 			if (page_count(page) == 0)
 				continue;
+
+
+			if (print_scan_page == 0) {
+				pr_info("======= scan page [0x%08lx:0x%08lx]\n",
+					page, page + 1);
+				print_scan_page++;
+			}
+
 			scan_block(page, page + 1, NULL);
 		}
 	}
The log output is as follows:
[ 0.000000] sparse buf alloc VA: [0xffff39253b600000: 0xffff39253f600000] to PA: [0x13b600000: 0x13f600000]
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451000000 to PA:0x13b600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451200000 to PA:0x13b800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451400000 to PA:0x13ba00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451600000 to PA:0x13bc00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451800000 to PA:0x13be00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451a00000 to PA:0x13c000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451c00000 to PA:0x13c200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451e00000 to PA:0x13c400000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452000000 to PA:0x13c600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452200000 to PA:0x13c800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452400000 to PA:0x13ca00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452600000 to PA:0x13cc00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452800000 to PA:0x13ce00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452a00000 to PA:0x13d000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452c00000 to PA:0x13d200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452e00000 to PA:0x13d400000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453000000 to PA:0x13d600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453200000 to PA:0x13d800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453400000 to PA:0x13da00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453600000 to PA:0x13dc00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453800000 to PA:0x13de00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453a00000 to PA:0x13e000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453c00000 to PA:0x13e200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453e00000 to PA:0x13e400000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454000000 to PA:0x13e600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454200000 to PA:0x13e800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454400000 to PA:0x13ea00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454600000 to PA:0x13ec00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454800000 to PA:0x13ee00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454a00000 to PA:0x13f000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454c00000 to PA:0x13f200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454e00000 to PA:0x13f400000
[ 62.480979] kmemleak: ======= scan page [0xfffffee451008400:0xfffffee451008440]
[ 62.705876] kmemleak: ======= scan memblock OBJECT_PHYS:[0xffff39253b600000:0xffff39253f600000]
scan page: vmemmap VA is [0xfffffee451008400 - 0xfffffee451008440]
scan page: real PA is [0x13b608400 - 0x13b608440]
scan OBJECT_PHYS: VA is [0xffff39253b600000 - 0xffff39253f600000]
scan OBJECT_PHYS: PA is [0x13b600000 - 0x13f600000]
The first recorded page's PA range [0x13b608400 - 0x13b608440] falls
inside the OBJECT_PHYS PA range [0x13b600000 - 0x13f600000], so that
`struct page` was scanned twice.
Could you please help confirm whether this issue exists and if it
could lead to any unintended consequences? I would appreciate any
insights or suggestions you may have.
Possible solution: specify `MEMBLOCK_ALLOC_NOLEAKTRACE` when allocating
the `struct page` memory.
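A rough, untested sketch of that idea in memmap_alloc() (mm/mm_init.c),
assuming it still uses memblock_alloc_exact_nid_raw() and
memblock_alloc_try_nid_raw() with MEMBLOCK_ALLOC_ACCESSIBLE as in v6.12:

	/*
	 * struct page memory is scanned explicitly by kmemleak via the
	 * per-zone page loop in kmemleak_scan(), so do not register it
	 * as a gray object here.
	 */
	if (exact_nid)
		ptr = memblock_alloc_exact_nid_raw(size, align, min_addr,
						   MEMBLOCK_ALLOC_NOLEAKTRACE,
						   nid);
	else
		ptr = memblock_alloc_try_nid_raw(size, align, min_addr,
						 MEMBLOCK_ALLOC_NOLEAKTRACE,
						 nid);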
Thank you for your time and assistance.
Best regards,
Guo

Hi Guo,

On Wed, Dec 25, 2024 at 04:09:12PM +0800, Weikang Guo wrote:
> Problem Description:
>
> When CONFIG_SPARSEMEM is enabled, the memory for the `struct page`
> objects in each section is allocated via the sparse buffer
> (`sparse_buffer_init()`) using the `memblock_alloc_*` interfaces. Since
> the memblock allocation does not explicitly specify
> `MEMBLOCK_ALLOC_NOLEAKTRACE`, kmemleak treats the memory as a gray
> object with min_count=0. In other words, the physical memory occupied
> by `struct page` is scanned as a gray object by the kmemleak scan
> thread.
>
> Additionally, kmemleak also traverses and scans every valid struct page
> object in each populated zone. As a result, the physical memory
> occupied by struct page may end up being scanned twice by the scan
> thread.

Yes, I can see how this happens. I don't remember how we ended up like
this, maybe kmemleak did not track memblock allocations in the early
days.

> Possible solution: specify `MEMBLOCK_ALLOC_NOLEAKTRACE` when allocating
> the `struct page` memory.

I think that's the easiest and just let kmemleak scan the mem_map
explicitly, whether it's in the linear map or vmemmap.

The way we ended up with marking 'noleaktrace' blocks in the memblock
API isn't great. This "flag" used to be called MEMBLOCK_ALLOC_KASAN and
was only used by KASAN (implying accessible). But it's not an actual
flag, just some random value passed as the 'end' argument to
memblock_alloc() and friends.

Luckily memmap_alloc() only needs MEMBLOCK_ALLOC_ACCESSIBLE, which is
implied by MEMBLOCK_ALLOC_NOLEAKTRACE (though I can't find any
documentation about this). Anyway, if you fix memmap_alloc(), please add
a comment that kmemleak scans this explicitly. Also add a comment where
we define MEMBLOCK_ALLOC_NOLEAKTRACE to state that it implies
MEMBLOCK_ALLOC_ACCESSIBLE. Ideally we should have named this
MEMBLOCK_ALLOC_ACCESSIBLE_NOLEAKTRACE but it's nearly half the
recommended line length.

Thanks.

-- 
Catalin