[PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization

Hao Ge posted 1 patch 1 week ago
There is a newer version of this series
Due to initialization ordering, page_ext is allocated and initialized
relatively late during boot. Some pages have already been allocated
and freed before page_ext becomes available, leaving their codetag
uninitialized.

A clear example is init_section_page_ext(): alloc_page_ext() calls
kmemleak_alloc(), and if the kmemleak object cache has no free objects
it falls back to the buddy allocator for a new slab. At that point
page_ext is not yet fully initialized, so these newly allocated pages
get no codetag. Such pages may later be freed via the KASAN quarantine
(see the trace below), and the warning fires because their codetag ref
was never set.

Use a global array to track pages allocated before page_ext is fully
initialized. The array size is fixed at 8192 entries, and a warning is
emitted if this limit is exceeded. When page_ext initialization
completes, mark the codetag of each tracked page as empty so that no
warning fires when the page is freed later.

The following warning is observed when this issue occurs:
[    9.582133] ------------[ cut here ]------------
[    9.582137] alloc_tag was not set
[    9.582139] WARNING: ./include/linux/alloc_tag.h:164 at __pgalloc_tag_sub+0x40f/0x550, CPU#5: systemd/1
[    9.582190] CPU: 5 UID: 0 PID: 1 Comm: systemd Not tainted 7.0.0-rc4 #1 PREEMPT(lazy)
[    9.582192] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    9.582194] RIP: 0010:__pgalloc_tag_sub+0x40f/0x550
[    9.582196] Code: 00 00 4c 29 e5 48 8b 05 1f 88 56 05 48 8d 4c ad 00 48 8d 2c c8 e9 87 fd ff ff 0f 0b 0f 0b e9 f3 fe ff ff 48 8d 3d 61 2f ed 03 <67> 48 0f b9 3a e9 b3 fd ff ff 0f 0b eb e4 e8 5e cd 14 02 4c 89 c7
[    9.582197] RSP: 0018:ffffc9000001f940 EFLAGS: 00010246
[    9.582200] RAX: dffffc0000000000 RBX: 1ffff92000003f2b RCX: 1ffff110200d806c
[    9.582201] RDX: ffff8881006c0360 RSI: 0000000000000004 RDI: ffffffff9bc7b460
[    9.582202] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff3a62324
[    9.582203] R10: ffffffff9d311923 R11: 0000000000000000 R12: ffffea0004001b00
[    9.582204] R13: 0000000000002000 R14: ffffea0000000000 R15: ffff8881006c0360
[    9.582206] FS:  00007ffbbcf2d940(0000) GS:ffff888450479000(0000) knlGS:0000000000000000
[    9.582208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.582210] CR2: 000055ee3aa260d0 CR3: 0000000148b67005 CR4: 0000000000770ef0
[    9.582211] PKRU: 55555554
[    9.582212] Call Trace:
[    9.582213]  <TASK>
[    9.582214]  ? __pfx___pgalloc_tag_sub+0x10/0x10
[    9.582216]  ? check_bytes_and_report+0x68/0x140
[    9.582219]  __free_frozen_pages+0x2e4/0x1150
[    9.582221]  ? __free_slab+0xc2/0x2b0
[    9.582224]  qlist_free_all+0x4c/0xf0
[    9.582227]  kasan_quarantine_reduce+0x15d/0x180
[    9.582229]  __kasan_slab_alloc+0x69/0x90
[    9.582232]  kmem_cache_alloc_noprof+0x14a/0x500
[    9.582234]  do_getname+0x96/0x310
[    9.582237]  do_readlinkat+0x91/0x2f0
[    9.582239]  ? __pfx_do_readlinkat+0x10/0x10
[    9.582240]  ? get_random_bytes_user+0x1df/0x2c0
[    9.582244]  __x64_sys_readlinkat+0x96/0x100
[    9.582246]  do_syscall_64+0xce/0x650
[    9.582250]  ? __x64_sys_getrandom+0x13a/0x1e0
[    9.582252]  ? __pfx___x64_sys_getrandom+0x10/0x10
[    9.582254]  ? do_syscall_64+0x114/0x650
[    9.582255]  ? ksys_read+0xfc/0x1d0
[    9.582258]  ? __pfx_ksys_read+0x10/0x10
[    9.582260]  ? do_syscall_64+0x114/0x650
[    9.582262]  ? do_syscall_64+0x114/0x650
[    9.582264]  ? __pfx_fput_close_sync+0x10/0x10
[    9.582266]  ? file_close_fd_locked+0x178/0x2a0
[    9.582268]  ? __x64_sys_faccessat2+0x96/0x100
[    9.582269]  ? __x64_sys_close+0x7d/0xd0
[    9.582271]  ? do_syscall_64+0x114/0x650
[    9.582273]  ? do_syscall_64+0x114/0x650
[    9.582275]  ? clear_bhb_loop+0x50/0xa0
[    9.582277]  ? clear_bhb_loop+0x50/0xa0
[    9.582279]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    9.582280] RIP: 0033:0x7ffbbda345ee
[    9.582282] Code: 0f 1f 40 00 48 8b 15 29 38 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 0b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fa 37 0d 00 f7 d8 64 89 01 48
[    9.582284] RSP: 002b:00007ffe2ad8de58 EFLAGS: 00000202 ORIG_RAX: 000000000000010b
[    9.582286] RAX: ffffffffffffffda RBX: 000055ee3aa25570 RCX: 00007ffbbda345ee
[    9.582287] RDX: 000055ee3aa25570 RSI: 00007ffe2ad8dee0 RDI: 00000000ffffff9c
[    9.582288] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000001001
[    9.582289] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000033
[    9.582290] R13: 00007ffe2ad8dee0 R14: 00000000ffffff9c R15: 00007ffe2ad8deb0
[    9.582292]  </TASK>
[    9.582293] ---[ end trace 0000000000000000 ]---

Fixes: 93d5440ece3c ("alloc_tag: uninline code gated by mem_alloc_profiling_key in page allocator")
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Hao Ge <hao.ge@linux.dev>
---
v2:
  - Replace spin_lock_irqsave() with atomic_try_cmpxchg() to avoid potential
    deadlock in NMI context
  - Change EARLY_ALLOC_PFN_MAX from 256 to 8192
  - Add pr_warn_once() when the limit is exceeded
  - Check ref.ct before clearing to avoid overwriting valid tags
  - Use function pointer (alloc_tag_add_early_pfn_ptr) instead of state
---
 include/linux/alloc_tag.h   |  2 +
 include/linux/pgalloc_tag.h |  2 +-
 lib/alloc_tag.c             | 92 +++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c             |  7 +++
 4 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index d40ac39bfbe8..bf226c2be2ad 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -74,6 +74,8 @@ static inline void set_codetag_empty(union codetag_ref *ref)
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+void alloc_tag_add_early_pfn(unsigned long pfn);
+
 #define ALLOC_TAG_SECTION_NAME	"alloc_tags"
 
 struct codetag_bytes {
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 38a82d65e58e..951d33362268 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -181,7 +181,7 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
 
 	if (get_page_tag_ref(page, &ref, &handle)) {
 		alloc_tag_sub_check(&ref);
-		if (ref.ct)
+		if (ref.ct && !is_codetag_empty(&ref))
 			tag = ct_to_alloc_tag(ref.ct);
 		put_page_tag_ref(handle);
 	}
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 58991ab09d84..7b1812768af9 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -6,6 +6,7 @@
 #include <linux/kallsyms.h>
 #include <linux/module.h>
 #include <linux/page_ext.h>
+#include <linux/pgalloc_tag.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_buf.h>
 #include <linux/seq_file.h>
@@ -26,6 +27,96 @@ static bool mem_profiling_support;
 
 static struct codetag_type *alloc_tag_cttype;
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+/*
+ * Track page allocations before page_ext is initialized.
+ * Some pages are allocated before page_ext becomes available, leaving
+ * their codetag uninitialized. Track these early PFNs so we can clear
+ * their codetag refs later to avoid warnings when they are freed.
+ *
+ * Early allocations include:
+ *   - Base allocations independent of CPU count
+ *   - Per-CPU allocations (e.g., CPU hotplug callbacks during smp_init,
+ *     such as trace ring buffers, scheduler per-cpu data)
+ *
+ * For simplicity, we fix the size to 8192.
+ * If insufficient, a warning will be triggered to alert the user.
+ */
+#define EARLY_ALLOC_PFN_MAX		8192
+
+static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata;
+static atomic_t early_pfn_count __initdata = ATOMIC_INIT(0);
+
+static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
+{
+	int old_idx, new_idx;
+
+	do {
+		old_idx = atomic_read(&early_pfn_count);
+		if (old_idx >= EARLY_ALLOC_PFN_MAX) {
+			pr_warn_once("Early page allocations before page_ext init exceeded EARLY_ALLOC_PFN_MAX (%d)\n",
+				      EARLY_ALLOC_PFN_MAX);
+			return;
+		}
+		new_idx = old_idx + 1;
+	} while (!atomic_try_cmpxchg(&early_pfn_count, &old_idx, new_idx));
+
+	early_pfns[old_idx] = pfn;
+}
+
+static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) __refdata =
+		__alloc_tag_add_early_pfn;
+
+void alloc_tag_add_early_pfn(unsigned long pfn)
+{
+	if (static_key_enabled(&mem_profiling_compressed))
+		return;
+
+	if (alloc_tag_add_early_pfn_ptr)
+		alloc_tag_add_early_pfn_ptr(pfn);
+}
+
+static void __init clear_early_alloc_pfn_tag_refs(void)
+{
+	unsigned int i;
+
+	for (i = 0; i < atomic_read(&early_pfn_count); i++) {
+		unsigned long pfn = early_pfns[i];
+
+		if (pfn_valid(pfn)) {
+			struct page *page = pfn_to_page(pfn);
+			union pgtag_ref_handle handle;
+			union codetag_ref ref;
+
+			if (get_page_tag_ref(page, &ref, &handle)) {
+				/*
+				 * An early-allocated page could be freed and reallocated
+				 * after its page_ext is initialized but before we clear it.
+				 * In that case, it already has a valid tag set.
+				 * We should not overwrite that valid tag with CODETAG_EMPTY.
+				 */
+				if (ref.ct) {
+					put_page_tag_ref(handle);
+					continue;
+				}
+
+				set_codetag_empty(&ref);
+				update_page_tag_ref(handle, &ref);
+				put_page_tag_ref(handle);
+			}
+		}
+
+	}
+	atomic_set(&early_pfn_count, 0);
+
+	alloc_tag_add_early_pfn_ptr = NULL;
+}
+#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
+static inline void __init clear_early_alloc_pfn_tag_refs(void) {}
+#endif
+
 #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU
 DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 EXPORT_SYMBOL(_shared_alloc_tag);
@@ -760,6 +851,7 @@ static __init bool need_page_alloc_tagging(void)
 
 static __init void init_page_alloc_tagging(void)
 {
+	clear_early_alloc_pfn_tag_refs();
 }
 
 struct page_ext_operations page_alloc_tagging_ops = {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..8f9bda04403b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
 		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
+	} else {
+		/*
+		 * page_ext is not available yet, record the pfn so we can
+		 * clear the tag ref later when page_ext is initialized.
+		 */
+		alloc_tag_add_early_pfn(page_to_pfn(page));
+		alloc_tag_set_inaccurate(current->alloc_tag);
 	}
 }
 
-- 
2.25.1
Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Suren Baghdasaryan 6 days, 14 hours ago
On Thu, Mar 26, 2026 at 7:07 AM Hao Ge <hao.ge@linux.dev> wrote:
>
> [ ... quoted commit message, oops trace, and changelog trimmed ... ]
>
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index d40ac39bfbe8..bf226c2be2ad 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -74,6 +74,8 @@ static inline void set_codetag_empty(union codetag_ref *ref)
>
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> +void alloc_tag_add_early_pfn(unsigned long pfn);

Although this works, the usual approach is to define it this way in
the header file:

#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
void alloc_tag_add_early_pfn(unsigned long pfn);
#else
static inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
#endif

> +
>  #define ALLOC_TAG_SECTION_NAME "alloc_tags"
>
>  struct codetag_bytes {
> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> index 38a82d65e58e..951d33362268 100644
> --- a/include/linux/pgalloc_tag.h
> +++ b/include/linux/pgalloc_tag.h
> @@ -181,7 +181,7 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
>
>         if (get_page_tag_ref(page, &ref, &handle)) {
>                 alloc_tag_sub_check(&ref);
> -               if (ref.ct)
> +               if (ref.ct && !is_codetag_empty(&ref))
>                         tag = ct_to_alloc_tag(ref.ct);
>                 put_page_tag_ref(handle);
>         }
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 58991ab09d84..7b1812768af9 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -6,6 +6,7 @@
>  #include <linux/kallsyms.h>
>  #include <linux/module.h>
>  #include <linux/page_ext.h>
> +#include <linux/pgalloc_tag.h>
>  #include <linux/proc_fs.h>
>  #include <linux/seq_buf.h>
>  #include <linux/seq_file.h>
> @@ -26,6 +27,96 @@ static bool mem_profiling_support;
>
>  static struct codetag_type *alloc_tag_cttype;
>
> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> +
> +/*
> + * Track page allocations before page_ext is initialized.
> + * Some pages are allocated before page_ext becomes available, leaving
> + * their codetag uninitialized. Track these early PFNs so we can clear
> + * their codetag refs later to avoid warnings when they are freed.
> + *
> + * Early allocations include:
> + *   - Base allocations independent of CPU count
> + *   - Per-CPU allocations (e.g., CPU hotplug callbacks during smp_init,
> + *     such as trace ring buffers, scheduler per-cpu data)
> + *
> + * For simplicity, we fix the size to 8192.
> + * If insufficient, a warning will be triggered to alert the user.
> + */
> +#define EARLY_ALLOC_PFN_MAX            8192
> +
> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata;
> +static atomic_t early_pfn_count __initdata = ATOMIC_INIT(0);
> +
> +static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
> +{
> +       int old_idx, new_idx;
> +
> +       do {
> +               old_idx = atomic_read(&early_pfn_count);
> +               if (old_idx >= EARLY_ALLOC_PFN_MAX) {
> +                       pr_warn_once("Early page allocations before page_ext init exceeded EARLY_ALLOC_PFN_MAX (%d)\n",
> +                                     EARLY_ALLOC_PFN_MAX);
> +                       return;
> +               }
> +               new_idx = old_idx + 1;
> +       } while (!atomic_try_cmpxchg(&early_pfn_count, &old_idx, new_idx));
> +
> +       early_pfns[old_idx] = pfn;
> +}
> +
> +static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) __refdata =
> +               __alloc_tag_add_early_pfn;

So, there is a possible race between clear_early_alloc_pfn_tag_refs()
and __alloc_tag_add_early_pfn(). I think the easiest way to resolve it
is to use RCU. It's easier to show with the code:

typedef void (*alloc_tag_add_func)(unsigned long pfn);

static alloc_tag_add_func __rcu alloc_tag_add_early_pfn_ptr __refdata =
                __alloc_tag_add_early_pfn;

void alloc_tag_add_early_pfn(unsigned long pfn)
{
        alloc_tag_add_func alloc_tag_add;

        if (static_key_enabled(&mem_profiling_compressed))
                return;

        rcu_read_lock();
        alloc_tag_add = rcu_dereference(alloc_tag_add_early_pfn_ptr);
        if (alloc_tag_add)
                alloc_tag_add(pfn);
        rcu_read_unlock();
}

static void __init clear_early_alloc_pfn_tag_refs(void)
{
        unsigned int i;

        if (static_key_enabled(&mem_profiling_compressed))
                return;

        rcu_assign_pointer(alloc_tag_add_early_pfn_ptr, NULL);
        /* Make sure we are not racing with __alloc_tag_add_early_pfn() */
        synchronize_rcu();
        ...
}

So, clear_early_alloc_pfn_tag_refs() resets
alloc_tag_add_early_pfn_ptr to NULL before starting its loop, and
alloc_tag_add_early_pfn() calls __alloc_tag_add_early_pfn() inside an
RCU read-side section. This way you know that after synchronize_rcu()
nobody is, or ever will be, executing __alloc_tag_add_early_pfn().
synchronize_rcu() can increase boot time but this happens only with
CONFIG_MEM_ALLOC_PROFILING_DEBUG, so should be acceptable.

> +
> +void alloc_tag_add_early_pfn(unsigned long pfn)
> +{
> +       if (static_key_enabled(&mem_profiling_compressed))
> +               return;
> +
> +       if (alloc_tag_add_early_pfn_ptr)
> +               alloc_tag_add_early_pfn_ptr(pfn);
> +}
> +
> +static void __init clear_early_alloc_pfn_tag_refs(void)
> +{
> +       unsigned int i;
> +

I included this in the code I suggested above but just as a reminder,
here we also need:

      if (static_key_enabled(&mem_profiling_compressed))
               return;

> +       for (i = 0; i < atomic_read(&early_pfn_count); i++) {
> +               unsigned long pfn = early_pfns[i];
> +
> +               if (pfn_valid(pfn)) {
> +                       struct page *page = pfn_to_page(pfn);
> +                       union pgtag_ref_handle handle;
> +                       union codetag_ref ref;
> +
> +                       if (get_page_tag_ref(page, &ref, &handle)) {
> +                               /*
> +                                * An early-allocated page could be freed and reallocated
> +                                * after its page_ext is initialized but before we clear it.
> +                                * In that case, it already has a valid tag set.
> +                                * We should not overwrite that valid tag with CODETAG_EMPTY.
> +                                */

You don't really solve this race here. See explanation below.

> +                               if (ref.ct) {
> +                                       put_page_tag_ref(handle);
> +                                       continue;
> +                               }
> +

Between the "if (ref.ct)" check above and the set_codetag_empty()
below, an allocation can change ref.ct to a valid reference (because
page_ext already exists at that point) and you will override it with
CODETAG_EMPTY. I think we have two options:
1. Just let that override happen and lose accounting for the racing
allocation. I think that's the preferred option, since the race is
unlikely and the extra complexity is not worth it IMO.
2. Do clear_page_tag_ref() here but atomically. Something like
clear_page_tag_ref_if_null() calling update_page_tag_ref_if_null()
which calls cmpxchg(&ref->ct, NULL, CODETAG_EMPTY).
2.  Do clear_page_tag_ref() here but atomically. Something like
clear_page_tag_ref_if_null() calling update_page_tag_ref_if_null()
which calls cmpxchg(&ref->ct, NULL, CODETAG_EMPTY).

If you agree with option #1 then please update the comment above
highlighting this smaller race and that we are ok with it.

> +                               set_codetag_empty(&ref);
> +                               update_page_tag_ref(handle, &ref);
> +                               put_page_tag_ref(handle);
> +                       }
> +               }
> +
> +       }
> +
> +       atomic_set(&early_pfn_count, 0);
> +       alloc_tag_add_early_pfn_ptr = NULL;

Once we did that RCU synchronization we don't need the above resets.
early_pfn_count won't be used anymore and alloc_tag_add_early_pfn_ptr
is already NULL.

> +}
> +#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */
> +inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
> +static inline void __init clear_early_alloc_pfn_tag_refs(void) {}
> +#endif
> +
>  #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU
>  DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
>  EXPORT_SYMBOL(_shared_alloc_tag);
> @@ -760,6 +851,7 @@ static __init bool need_page_alloc_tagging(void)
>
>  static __init void init_page_alloc_tagging(void)
>  {
> +       clear_early_alloc_pfn_tag_refs();
>  }
>
>  struct page_ext_operations page_alloc_tagging_ops = {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2d4b6f1a554e..8f9bda04403b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,

In here let's mark the normal branch as "likely":
-        if (get_page_tag_ref(page, &ref, &handle)) {
+        if (likely(get_page_tag_ref(page, &ref, &handle))) {

>                 alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
>                 update_page_tag_ref(handle, &ref);
>                 put_page_tag_ref(handle);
> +       } else {
> +               /*
> +                * page_ext is not available yet, record the pfn so we can
> +                * clear the tag ref later when page_ext is initialized.
> +                */
> +               alloc_tag_add_early_pfn(page_to_pfn(page));
> +               alloc_tag_set_inaccurate(current->alloc_tag);

Here we should be using task->alloc_tag instead of current->alloc_tag
but we also need to check that task->alloc_tag != NULL.

>         }
>  }
>
> --
> 2.25.1
>
Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Suren Baghdasaryan 6 days, 14 hours ago
On Thu, Mar 26, 2026 at 9:32 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Mar 26, 2026 at 7:07 AM Hao Ge <hao.ge@linux.dev> wrote:
> >
> > Due to initialization ordering, page_ext is allocated and initialized
> > relatively late during boot. Some pages have already been allocated
> > and freed before page_ext becomes available, leaving their codetag
> > uninitialized.
> >
> > A clear example is in init_section_page_ext(): alloc_page_ext() calls
> > kmemleak_alloc(). If the slab cache has no free objects, it falls back
> > to the buddy allocator to allocate memory. However, at this point page_ext
> > is not yet fully initialized, so these newly allocated pages have no
> > codetag set. These pages may later be reclaimed by KASAN, which causes
> > the warning to trigger when they are freed because their codetag ref is
> > still empty.
> >
> > Use a global array to track pages allocated before page_ext is fully
> > initialized. The array size is fixed at 8192 entries, and will emit
> > a warning if this limit is exceeded. When page_ext initialization
> > completes, set their codetag to empty to avoid warnings when they
> > are freed later.
> >
> > The following warning is observed when this issue occurs:
> > [    9.582133] ------------[ cut here ]------------
> > [    9.582137] alloc_tag was not set
> > [    9.582139] WARNING: ./include/linux/alloc_tag.h:164 at __pgalloc_tag_sub+0x40f/0x550, CPU#5: systemd/1
> > [    9.582190] CPU: 5 UID: 0 PID: 1 Comm: systemd Not tainted 7.0.0-rc4 #1 PREEMPT(lazy)
> > [    9.582192] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [    9.582194] RIP: 0010:__pgalloc_tag_sub+0x40f/0x550
> > [    9.582196] Code: 00 00 4c 29 e5 48 8b 05 1f 88 56 05 48 8d 4c ad 00 48 8d 2c c8 e9 87 fd ff ff 0f 0b 0f 0b e9 f3 fe ff ff 48 8d 3d 61 2f ed 03 <67> 48 0f b9 3a e9 b3 fd ff ff 0f 0b eb e4 e8 5e cd 14 02 4c 89 c7
> > [    9.582197] RSP: 0018:ffffc9000001f940 EFLAGS: 00010246
> > [    9.582200] RAX: dffffc0000000000 RBX: 1ffff92000003f2b RCX: 1ffff110200d806c
> > [    9.582201] RDX: ffff8881006c0360 RSI: 0000000000000004 RDI: ffffffff9bc7b460
> > [    9.582202] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff3a62324
> > [    9.582203] R10: ffffffff9d311923 R11: 0000000000000000 R12: ffffea0004001b00
> > [    9.582204] R13: 0000000000002000 R14: ffffea0000000000 R15: ffff8881006c0360
> > [    9.582206] FS:  00007ffbbcf2d940(0000) GS:ffff888450479000(0000) knlGS:0000000000000000
> > [    9.582208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    9.582210] CR2: 000055ee3aa260d0 CR3: 0000000148b67005 CR4: 0000000000770ef0
> > [    9.582211] PKRU: 55555554
> > [    9.582212] Call Trace:
> > [    9.582213]  <TASK>
> > [    9.582214]  ? __pfx___pgalloc_tag_sub+0x10/0x10
> > [    9.582216]  ? check_bytes_and_report+0x68/0x140
> > [    9.582219]  __free_frozen_pages+0x2e4/0x1150
> > [    9.582221]  ? __free_slab+0xc2/0x2b0
> > [    9.582224]  qlist_free_all+0x4c/0xf0
> > [    9.582227]  kasan_quarantine_reduce+0x15d/0x180
> > [    9.582229]  __kasan_slab_alloc+0x69/0x90
> > [    9.582232]  kmem_cache_alloc_noprof+0x14a/0x500
> > [    9.582234]  do_getname+0x96/0x310
> > [    9.582237]  do_readlinkat+0x91/0x2f0
> > [    9.582239]  ? __pfx_do_readlinkat+0x10/0x10
> > [    9.582240]  ? get_random_bytes_user+0x1df/0x2c0
> > [    9.582244]  __x64_sys_readlinkat+0x96/0x100
> > [    9.582246]  do_syscall_64+0xce/0x650
> > [    9.582250]  ? __x64_sys_getrandom+0x13a/0x1e0
> > [    9.582252]  ? __pfx___x64_sys_getrandom+0x10/0x10
> > [    9.582254]  ? do_syscall_64+0x114/0x650
> > [    9.582255]  ? ksys_read+0xfc/0x1d0
> > [    9.582258]  ? __pfx_ksys_read+0x10/0x10
> > [    9.582260]  ? do_syscall_64+0x114/0x650
> > [    9.582262]  ? do_syscall_64+0x114/0x650
> > [    9.582264]  ? __pfx_fput_close_sync+0x10/0x10
> > [    9.582266]  ? file_close_fd_locked+0x178/0x2a0
> > [    9.582268]  ? __x64_sys_faccessat2+0x96/0x100
> > [    9.582269]  ? __x64_sys_close+0x7d/0xd0
> > [    9.582271]  ? do_syscall_64+0x114/0x650
> > [    9.582273]  ? do_syscall_64+0x114/0x650
> > [    9.582275]  ? clear_bhb_loop+0x50/0xa0
> > [    9.582277]  ? clear_bhb_loop+0x50/0xa0
> > [    9.582279]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [    9.582280] RIP: 0033:0x7ffbbda345ee
> > [    9.582282] Code: 0f 1f 40 00 48 8b 15 29 38 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 0b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fa 37 0d 00 f7 d8 64 89 01 48
> > [    9.582284] RSP: 002b:00007ffe2ad8de58 EFLAGS: 00000202 ORIG_RAX: 000000000000010b
> > [    9.582286] RAX: ffffffffffffffda RBX: 000055ee3aa25570 RCX: 00007ffbbda345ee
> > [    9.582287] RDX: 000055ee3aa25570 RSI: 00007ffe2ad8dee0 RDI: 00000000ffffff9c
> > [    9.582288] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000001001
> > [    9.582289] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000033
> > [    9.582290] R13: 00007ffe2ad8dee0 R14: 00000000ffffff9c R15: 00007ffe2ad8deb0
> > [    9.582292]  </TASK>
> > [    9.582293] ---[ end trace 0000000000000000 ]---
> >
> > Fixes: 93d5440ece3c ("alloc_tag: uninline code gated by mem_alloc_profiling_key in page allocator")
> > Suggested-by: Suren Baghdasaryan <surenb@google.com>
> > Signed-off-by: Hao Ge <hao.ge@linux.dev>
> > ---
> > v2:
> >   - Replace spin_lock_irqsave() with atomic_try_cmpxchg() to avoid potential
> >      deadlock in NMI context
> >   - Change EARLY_ALLOC_PFN_MAX from 256 to 8192
> >   - Add pr_warn_once() when the limit is exceeded
> >   - Check ref.ct before clearing to avoid overwriting valid tags
> >   - Use function pointer (alloc_tag_add_early_pfn_ptr) instead of state
> > ---
> >  include/linux/alloc_tag.h   |  2 +
> >  include/linux/pgalloc_tag.h |  2 +-
> >  lib/alloc_tag.c             | 92 +++++++++++++++++++++++++++++++++++++
> >  mm/page_alloc.c             |  7 +++
> >  4 files changed, 102 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> > index d40ac39bfbe8..bf226c2be2ad 100644
> > --- a/include/linux/alloc_tag.h
> > +++ b/include/linux/alloc_tag.h
> > @@ -74,6 +74,8 @@ static inline void set_codetag_empty(union codetag_ref *ref)
> >
> >  #ifdef CONFIG_MEM_ALLOC_PROFILING
> >
> > +void alloc_tag_add_early_pfn(unsigned long pfn);
>
> Although this works, the usual approach is to have it defined this way in
> the header file:
>
> #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> void alloc_tag_add_early_pfn(unsigned long pfn);
> #else
> static inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
> #endif
>
> > +
> >  #define ALLOC_TAG_SECTION_NAME "alloc_tags"
> >
> >  struct codetag_bytes {
> > diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> > index 38a82d65e58e..951d33362268 100644
> > --- a/include/linux/pgalloc_tag.h
> > +++ b/include/linux/pgalloc_tag.h
> > @@ -181,7 +181,7 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
> >
> >         if (get_page_tag_ref(page, &ref, &handle)) {
> >                 alloc_tag_sub_check(&ref);
> > -               if (ref.ct)
> > +               if (ref.ct && !is_codetag_empty(&ref))
> >                         tag = ct_to_alloc_tag(ref.ct);
> >                 put_page_tag_ref(handle);
> >         }
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index 58991ab09d84..7b1812768af9 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/kallsyms.h>
> >  #include <linux/module.h>
> >  #include <linux/page_ext.h>
> > +#include <linux/pgalloc_tag.h>
> >  #include <linux/proc_fs.h>
> >  #include <linux/seq_buf.h>
> >  #include <linux/seq_file.h>
> > @@ -26,6 +27,96 @@ static bool mem_profiling_support;
> >
> >  static struct codetag_type *alloc_tag_cttype;
> >
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> > +
> > +/*
> > + * Track page allocations before page_ext is initialized.
> > + * Some pages are allocated before page_ext becomes available, leaving
> > + * their codetag uninitialized. Track these early PFNs so we can clear
> > + * their codetag refs later to avoid warnings when they are freed.
> > + *
> > + * Early allocations include:
> > + *   - Base allocations independent of CPU count
> > + *   - Per-CPU allocations (e.g., CPU hotplug callbacks during smp_init,
> > + *     such as trace ring buffers, scheduler per-cpu data)
> > + *
> > + * For simplicity, we fix the size to 8192.
> > + * If insufficient, a warning will be triggered to alert the user.
> > + */
> > +#define EARLY_ALLOC_PFN_MAX            8192

Forgot to mention that we will need to do something about this limit
using dynamic allocation. I was thinking we could allocate pages
dynamically (with a GFP flag similar to ___GFP_NO_OBJ_EXT to avoid
recursion), linking them via page->lru and then freeing them at the
end of clear_early_alloc_pfn_tag_refs(). That adds more complexity but
solves this limit problem. However all this can be done as a followup
patch.

> > +
> > +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata;
> > +static atomic_t early_pfn_count __initdata = ATOMIC_INIT(0);
> > +
> > +static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
> > +{
> > +       int old_idx, new_idx;
> > +
> > +       do {
> > +               old_idx = atomic_read(&early_pfn_count);
> > +               if (old_idx >= EARLY_ALLOC_PFN_MAX) {
> > +                       pr_warn_once("Early page allocations before page_ext init exceeded EARLY_ALLOC_PFN_MAX (%d)\n",
> > +                                     EARLY_ALLOC_PFN_MAX);
> > +                       return;
> > +               }
> > +               new_idx = old_idx + 1;
> > +       } while (!atomic_try_cmpxchg(&early_pfn_count, &old_idx, new_idx));
> > +
> > +       early_pfns[old_idx] = pfn;
> > +}
> > +
> > +static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) __refdata =
> > +               __alloc_tag_add_early_pfn;
>
> So, there is a possible race between clear_early_alloc_pfn_tag_refs()
> and __alloc_tag_add_early_pfn(). I think the easiest way to resolve
> this is using RCU. It's easier to show that with the code:
>
> typedef void (*alloc_tag_add_func)(unsigned long pfn);
>
> static alloc_tag_add_func __rcu alloc_tag_add_early_pfn_ptr __refdata =
>                 __alloc_tag_add_early_pfn;
>
> void alloc_tag_add_early_pfn(unsigned long pfn)
> {
>         alloc_tag_add_func alloc_tag_add;
>
>         if (static_key_enabled(&mem_profiling_compressed))
>                 return;
>
>         rcu_read_lock();
>         alloc_tag_add = rcu_dereference(alloc_tag_add_early_pfn_ptr);
>         if (alloc_tag_add)
>                 alloc_tag_add(pfn);
>         rcu_read_unlock();
> }
>
> static void __init clear_early_alloc_pfn_tag_refs(void)
> {
>         unsigned int i;
>
>         if (static_key_enabled(&mem_profiling_compressed))
>                 return;
>
>        rcu_assign_pointer(alloc_tag_add_early_pfn_ptr, NULL);
>         /* Make sure we are not racing with __alloc_tag_add_early_pfn() */
>         synchronize_rcu();
>         ...
> }
>
> So, clear_early_alloc_pfn_tag_refs() resets
> alloc_tag_add_early_pfn_ptr to NULL before starting its loop and
> alloc_tag_add_early_pfn() calls __alloc_tag_add_early_pfn() in RCU
> read section. This way you know that after synchronize_rcu() nobody is
> or will be executing __alloc_tag_add_early_pfn() anymore.
> synchronize_rcu() can increase boot time but this happens only with
> CONFIG_MEM_ALLOC_PROFILING_DEBUG, so should be acceptable.
>
> > +
> > +void alloc_tag_add_early_pfn(unsigned long pfn)
> > +{
> > +       if (static_key_enabled(&mem_profiling_compressed))
> > +               return;
> > +
> > +       if (alloc_tag_add_early_pfn_ptr)
> > +               alloc_tag_add_early_pfn_ptr(pfn);
> > +}
> > +
> > +static void __init clear_early_alloc_pfn_tag_refs(void)
> > +{
> > +       unsigned int i;
> > +
>
> I included this in the code I suggested above but just as a reminder,
> here we also need:
>
>       if (static_key_enabled(&mem_profiling_compressed))
>                return;
>
> > +       for (i = 0; i < atomic_read(&early_pfn_count); i++) {
> > +               unsigned long pfn = early_pfns[i];
> > +
> > +               if (pfn_valid(pfn)) {
> > +                       struct page *page = pfn_to_page(pfn);
> > +                       union pgtag_ref_handle handle;
> > +                       union codetag_ref ref;
> > +
> > +                       if (get_page_tag_ref(page, &ref, &handle)) {
> > +                               /*
> > +                                * An early-allocated page could be freed and reallocated
> > +                                * after its page_ext is initialized but before we clear it.
> > +                                * In that case, it already has a valid tag set.
> > +                                * We should not overwrite that valid tag with CODETAG_EMPTY.
> > +                                */
>
> You don't really solve this race here. See explanation below.
>
> > +                               if (ref.ct) {
> > +                                       put_page_tag_ref(handle);
> > +                                       continue;
> > +                               }
> > +
>
> Between the above "if (ref.ct)" check and below set_codetag_empty() an
> allocation can change the ref.ct value to a valid reference (because
> page_ext already exists) and you will override it with CODETAG_EMPTY.
> I think we have two options:
> 1. Just let that override happen and lose accounting for that racing
> allocation. I think that's preferred option since the race is not
> likely and extra complexity is not worth it IMO.
> 2.  Do clear_page_tag_ref() here but atomically. Something like
> clear_page_tag_ref_if_null() calling update_page_tag_ref_if_null()
> which calls cmpxchg(&ref->ct, NULL, CODETAG_EMPTY).
>
> If you agree with option #1 then please update the comment above
> highlighting this smaller race and that we are ok with it.
>
> > +                               set_codetag_empty(&ref);
> > +                               update_page_tag_ref(handle, &ref);
> > +                               put_page_tag_ref(handle);
> > +                       }
> > +               }
> > +
> > +       }
> > +
> > +       atomic_set(&early_pfn_count, 0);
> > +       alloc_tag_add_early_pfn_ptr = NULL;
>
> Once we did that RCU synchronization we don't need the above resets.
> early_pfn_count won't be used anymore and alloc_tag_add_early_pfn_ptr
> is already NULL.
>
> > +}
> > +#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */
> > +inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
> > +static inline void __init clear_early_alloc_pfn_tag_refs(void) {}
> > +#endif
> > +
> >  #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU
> >  DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> >  EXPORT_SYMBOL(_shared_alloc_tag);
> > @@ -760,6 +851,7 @@ static __init bool need_page_alloc_tagging(void)
> >
> >  static __init void init_page_alloc_tagging(void)
> >  {
> > +       clear_early_alloc_pfn_tag_refs();
> >  }
> >
> >  struct page_ext_operations page_alloc_tagging_ops = {
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 2d4b6f1a554e..8f9bda04403b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
>
> In here let's mark the normal branch as "likely":
> -        if (get_page_tag_ref(page, &ref, &handle)) {
> +        if (likely(get_page_tag_ref(page, &ref, &handle))) {
>
> >                 alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
> >                 update_page_tag_ref(handle, &ref);
> >                 put_page_tag_ref(handle);
> > +       } else {
> > +               /*
> > +                * page_ext is not available yet, record the pfn so we can
> > +                * clear the tag ref later when page_ext is initialized.
> > +                */
> > +               alloc_tag_add_early_pfn(page_to_pfn(page));
> > +               alloc_tag_set_inaccurate(current->alloc_tag);
>
> Here we should be using task->alloc_tag instead of current->alloc_tag
> but we also need to check that task->alloc_tag != NULL.
>
> >         }
> >  }
> >
> > --
> > 2.25.1
> >
Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Hao Ge 6 days, 10 hours ago
On 2026/3/27 12:39, Suren Baghdasaryan wrote:
> On Thu, Mar 26, 2026 at 9:32 PM Suren Baghdasaryan <surenb@google.com> wrote:
>> On Thu, Mar 26, 2026 at 7:07 AM Hao Ge <hao.ge@linux.dev> wrote:
>>> [...]
>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
>>> +
>>> +/*
>>> + * Track page allocations before page_ext is initialized.
>>> + * Some pages are allocated before page_ext becomes available, leaving
>>> + * their codetag uninitialized. Track these early PFNs so we can clear
>>> + * their codetag refs later to avoid warnings when they are freed.
>>> + *
>>> + * Early allocations include:
>>> + *   - Base allocations independent of CPU count
>>> + *   - Per-CPU allocations (e.g., CPU hotplug callbacks during smp_init,
>>> + *     such as trace ring buffers, scheduler per-cpu data)
>>> + *
>>> + * For simplicity, we fix the size to 8192.
>>> + * If insufficient, a warning will be triggered to alert the user.
>>> + */
>>> +#define EARLY_ALLOC_PFN_MAX            8192

Hi Suren


> Forgot to mention that we will need to do something about this limit
> using dynamic allocation. I was thinking we could allocate pages
> dynamically (with a GFP flag similar to ___GFP_NO_OBJ_EXT to avoid
> recursion), linking them via page->lru and then freeing them at the
> end of clear_early_alloc_pfn_tag_refs(). That adds more complexity but
> solves this limit problem. However all this can be done as a followup
> patch.


Yes, to be honest, I did try calling alloc_page() myself; it was
immediately obvious this would lead to infinite recursion, since
alloc_page() would hit the same code path.

I have already noted this in the code comments as a TODO item.

I'll also try to work on an implementation as a follow-up.


Thanks

Hao

>>> [...]
Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Andrew Morton 6 days, 17 hours ago
On Thu, 26 Mar 2026 22:05:54 +0800 Hao Ge <hao.ge@linux.dev> wrote:

> Due to initialization ordering, page_ext is allocated and initialized
> relatively late during boot. Some pages have already been allocated
> and freed before page_ext becomes available, leaving their codetag
> uninitialized.
> 
> [...]
> 

Thanks.  I'll queue this for review and test.

But where will I queue it?

> 
> Fixes: 93d5440ece3c ("alloc_tag: uninline code gated by mem_alloc_profiling_key in page allocator")

A year ago, so a cc:stable might be needed.

> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG

otoh, it appears that the bug only hits with
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y?  If so, I'll add that (important)
info to the changelog.

Do people use CONFIG_MEM_ALLOC_PROFILING_DEBUG much?  Is a backport
really needed?

Either way, it seems that this isn't a very urgent issue so I'm
inclined to add it to the 7.1-rc1 pile, perhaps with a cc:stable.

Please all share your thoughts with me, thanks.
Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Suren Baghdasaryan 6 days, 17 hours ago
On Thu, Mar 26, 2026 at 6:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 26 Mar 2026 22:05:54 +0800 Hao Ge <hao.ge@linux.dev> wrote:
>
> > [...]
>
> Thanks.  I'll queue this for review and test.
>
> But where will I queue it?

I don't think it's extra urgent. It is visible only when debugging
with CONFIG_MEM_ALLOC_PROFILING_DEBUG.

>
> >
> > Fixes: 93d5440ece3c ("alloc_tag: uninline code gated by mem_alloc_profiling_key in page allocator")

Hmm. I'm not sure that's the right patch. Technically the problem
exists once we introduced CONFIG_MEM_ALLOC_PROFILING_DEBUG. I'll
double-check.

>
> A year ago, so a cc:stable might be needed.
>
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
>
> otoh, it appears that the bug only hits with
> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y?  If so, I'll add that (important)
> info to the changelog.

Correct, it affects only CONFIG_MEM_ALLOC_PROFILING_DEBUG=y and only
if !mem_profiling_compressed.

>
> Do people use CONFIG_MEM_ALLOC_PROFILING_DEBUG much?  Is a backport
> really needed?

IMO backport would be good.

>
> Either way, it seems that this isn't a very urgent issue so I'm
> inclined to add it to the 7.1-rc1 pile, perhaps with a cc:stable.
>
> Please all share your thoughts with me, thanks.

I'm reviewing and testing the patch and there is a race and a couple
of smaller issues. I'll post a reply later today.
Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Hao Ge 6 days, 10 hours ago
Hi Suren


On 2026/3/27 09:19, Suren Baghdasaryan wrote:
> On Thu, Mar 26, 2026 at 6:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Thu, 26 Mar 2026 22:05:54 +0800 Hao Ge <hao.ge@linux.dev> wrote:
>>
>>> Due to initialization ordering, page_ext is allocated and initialized
>>> relatively late during boot. Some pages have already been allocated
>>> and freed before page_ext becomes available, leaving their codetag
>>> uninitialized.
>>>
>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls
>>> kmemleak_alloc(). If the slab cache has no free objects, it falls back
>>> to the buddy allocator to allocate memory. However, at this point page_ext
>>> is not yet fully initialized, so these newly allocated pages have no
>>> codetag set. These pages may later be reclaimed by KASAN, which causes
>>> the warning to trigger when they are freed because their codetag ref is
>>> still empty.
>>>
>>> Use a global array to track pages allocated before page_ext is fully
>>> initialized. The array size is fixed at 8192 entries; a warning is
>>> emitted if this limit is exceeded. When page_ext initialization
>>> completes, their codetag is set to empty to avoid warnings when they
>>> are freed later.
>>>
>> Thanks.  I'll queue this for review and test.
>>
>> But where will I queue it?
> I don't think it's extra urgent. It is visible only when debugging
> with CONFIG_MEM_ALLOC_PROFILING_DEBUG.
>
>>> Fixes: 93d5440ece3c ("alloc_tag: uninline code gated by mem_alloc_profiling_key in page allocator")
> Hmm. I'm not sure that's the right patch. Technically the problem
> exists once we introduced CONFIG_MEM_ALLOC_PROFILING_DEBUG. I'll
> double-check.

I believe this should be Fixes: dcfe378c81f72 ("lib: introduce support
for page allocation tagging").

Earlier I thought backporting this commit here would be quite involved,
but after further consideration, this is indeed the commit being fixed.


>> A year ago, so a cc:stable might be needed.
>>
>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
>> otoh, it appears that the bug only hits with
>> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y?  If so, I'll add that (important)
>> info to the changelog.
> Correct, it affects only CONFIG_MEM_ALLOC_PROFILING_DEBUG=y and only
> if !mem_profiling_compressed.
>
>> Do people use CONFIG_MEM_ALLOC_PROFILING_DEBUG much?  Is a backport
>> really needed?
> IMO backport would be good.
>
>> Either way, it seems that this isn't a very urgent issue so I'm
>> inclined to add it to the 7.1-rc1 pile, perhaps with a cc:stable.
>>
>> Please all share your thoughts with me, thanks.
> I'm reviewing and testing the patch and there is a race and a couple
> of smaller issues. I'll post a reply later today.

Thank you so much for your kind help! I really appreciate it.

Thanks

Hao

Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Andrew Morton 6 days, 17 hours ago
On Thu, 26 Mar 2026 18:19:56 -0700 Suren Baghdasaryan <surenb@google.com> wrote:

> >
> > Do people use CONFIG_MEM_ALLOC_PROFILING_DEBUG much?  Is a backport
> > really needed?
> 
> IMO backport would be good.

OK, thanks, I'll slap a cc:stable on it and keep it in the 7.1.rc1 queue.

I added

"This bug occurs when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y."

somewhere in the changelog but that's pretty lame and could be expanded
upon in a respin.

> >
> > Either way, it seems that this isn't a very urgent issue so I'm
> > inclined to add it to the 7.1-rc1 pile, perhaps with a cc:stable.
> >
> > Please all share your thoughts with me, thanks.
> 
> I'm reviewing and testing the patch and there is a race and a couple
> of smaller issues. I'll post a reply later today.

Cool.  I was going to keep it in there for mm-new testing (which is
light).  But I guess there isn't much value in this so I'll drop v2.

Sashiko had a bunch of nags, but I think you're checking that routinely?
https://sashiko.dev/#/patchset/20260326140554.191996-1-hao.ge%40linux.dev
Re: [PATCH v2] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
Posted by Suren Baghdasaryan 6 days, 17 hours ago
On Thu, Mar 26, 2026 at 6:34 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 26 Mar 2026 18:19:56 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > >
> > > Do people use CONFIG_MEM_ALLOC_PROFILING_DEBUG much?  Is a backport
> > > really needed?
> >
> > IMO backport would be good.
>
> OK, thanks, I'll slap a cc:stable on it and keep it in the 7.1.rc1 queue.
>
> I added
>
> "This bug occurs when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y."
>
> somewhere in the changelog but that's pretty lame and could be expanded
> upon in a respin.
>
> > >
> > > Either way, it seems that this isn't a very urgent issue so I'm
> > > inclined to add it to the 7.1-rc1 pile, perhaps with a cc:stable.
> > >
> > > Please all share your thoughts with me, thanks.
> >
> > I'm reviewing and testing the patch and there is a race and a couple
> > of smaller issues. I'll post a reply later today.
>
> Cool.  I was going to keep it in there for mm-new testing (which is
> light).  But I guess there isn't much value in this so I'll drop v2.
>
> Sashiko had a bunch of nags, but I think you're checking that routinely?
> https://sashiko.dev/#/patchset/20260326140554.191996-1-hao.ge%40linux.dev

Yep. Plus I finally got Sashiko to run on my local machine, so I will
give it a spin on my fixes before posting my reply.