From nobody Thu Jun 18 13:20:42 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
    hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v10 1/4] mm/page_owner: add print_mode filter
Date: Thu, 18 Jun 2026 11:57:47 +0800
Message-Id: <20260618035750.3724613-2-zhen.ni@easystack.cn>
In-Reply-To: <20260618035750.3724613-1-zhen.ni@easystack.cn>

Add a print_mode filter to page_owner that lets users choose between
printing stack traces, stack handles, or both, providing flexibility for
different debugging and analysis scenarios.

Writing one of the following commands to the page_owner debugfs file
selects the mode:
- "mode=stack" prints the stack trace for each page (default)
- "mode=handle" prints only the stack handle number
- "mode=stack_handle" prints both the stack trace and the handle

The default stack mode preserves backward compatibility with existing
usage, displaying the complete stack trace for each page allocation.

The handle mode greatly reduces output size and improves read performance
by printing only the handle number instead of the full stack trace.
Testing shows that handle mode reduces output size by ~66% (84MB vs
244MB) and improves read performance by ~4.4x compared to full stack
output. The mapping from handles to stack traces can be obtained via the
show_stacks_handles interface.

The stack_handle mode prints both stack traces and handles, making it
easier to identify pages with the same allocation pattern by comparing
handle numbers instead of large stack traces.
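To illustrate how the three modes gate the per-page output, here is a minimal userspace sketch. It is not the kernel code. The names `print_mode`, `parse_mode_token()` and `append_tail()` are hypothetical. It uses the C library's snprintf(), which signals truncation by returning the length it would have written, rather than the kernel's scnprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace mirror of the three page_owner print modes. */
enum print_mode { PRINT_STACK, PRINT_HANDLE, PRINT_STACK_HANDLE };

static const char *const mode_names[] = {
	[PRINT_STACK] = "stack",
	[PRINT_HANDLE] = "handle",
	[PRINT_STACK_HANDLE] = "stack_handle",
};

/* Parse one "mode=<name>" token; returns the mode, or -1 if unrecognised. */
static int parse_mode_token(const char *token)
{
	if (strncmp(token, "mode=", 5) != 0)
		return -1;
	for (size_t i = 0; i < sizeof(mode_names) / sizeof(mode_names[0]); i++)
		if (strcmp(token + 5, mode_names[i]) == 0)
			return (int)i;
	return -1;
}

/*
 * Append the stack and/or handle part of one record at offset ret in buf.
 * Returns the new length, or -1 if the text would not fit in count bytes.
 */
static int append_tail(char *buf, size_t count, size_t ret,
		       enum print_mode mode, const char *stack_text,
		       unsigned int handle)
{
	int n;

	assert(ret < count);	/* caller must leave room for at least the NUL */

	if (mode != PRINT_HANDLE) {	/* "stack" and "stack_handle" */
		n = snprintf(buf + ret, count - ret, "%s", stack_text);
		if (n < 0 || ret + (size_t)n >= count)
			return -1;
		ret += (size_t)n;
	}
	if (mode != PRINT_STACK) {	/* "handle" and "stack_handle" */
		n = snprintf(buf + ret, count - ret, "handle: %u\n", handle);
		if (n < 0 || ret + (size_t)n >= count)
			return -1;
		ret += (size_t)n;
	}
	return (int)ret;
}
```

In the diff, only this tail of each record depends on the mode. The "Page allocated via ..." and "PFN ..." lines before it are printed the same way in every mode.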
Example usage:
  # Using the page_owner_filter tool (recommended)
  ./page_owner_filter -m stack         # Print only stack traces (default)
  ./page_owner_filter -m handle        # Print only handles
  ./page_owner_filter -m stack_handle  # Print both stacks and handles

Sample output (handle mode):
  Page allocated via order 0, migratetype Unmovable, gfp_mask 0x1100ca, pid 1, tgid 1 (systemd), ts 123456789 ns
  PFN 0x1000 type Unmovable Block 1 type Unmovable Flags 0x3fffe800000084(referenced|lru|active|private|node=0|zone=1)
  handle: 17432583
  ...

This implementation stores per-file-descriptor filter state in
file->private_data, so each opener has an independent filter
configuration.

Signed-off-by: Zhen Ni
---
Changes in v10:
- No changes

Changes in v9:
- Add spinlock_t lock to struct page_owner_filter_state for concurrent access protection

Changes in v8:
- Fix buffer overflow by adding bounds check between stack_depot_snprint() and scnprintf()
- Fix unsafe string handling: use memdup_user_nul() instead of kmalloc_objs + strncpy_from_user()
- Fix strsep() memory corruption by saving original pointer before strsep() call
- Change format specifier from %d to %u for depot_stack_handle_t

Changes in v7:
- per-file-descriptor implementation

Changes in v6:
- Remove unnecessary braces in if/else statement (coding style)
- Use stack array (char kbuf[33]) instead of kmalloc for input buffer

Changes in v5:
- No code changes

Changes in v4:
- Change from numeric (0/1) to string-based interface ("full_stack"/"stack_handle")
- Merge infrastructure patch into this patch

Changes in v3:
- No code changes

Changes in v2:
- Renamed from 'compact mode' to 'print_mode' for better clarity
- Use enum values (0=full_stack, 1=stack_handle) instead of boolean
- Update debugfs filename from 'compact' to 'print_mode'

v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-2-zhen.ni@easystack.cn/
v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-2-zhen.ni@easystack.cn/
v7:
https://lore.kernel.org/linux-mm/20260515091942.1535677-2-zhen.ni@easystack.cn/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-2-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-2-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-2-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-2-zhen.ni@easystack.cn/
    https://lore.kernel.org/linux-mm/20260428071112.1420380-3-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-2-zhen.ni@easystack.cn/
    https://lore.kernel.org/linux-mm/20260419155540.376847-3-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-2-zhen.ni@easystack.cn/
    https://lore.kernel.org/linux-mm/20260417154638.22370-3-zhen.ni@easystack.cn/
---
 mm/page_owner.c | 129 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 123 insertions(+), 6 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8178e0be557f..7595735979bf 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -54,6 +54,23 @@ struct stack_print_ctx {
 	u8 flags;
 };
 
+enum page_owner_print_mode {
+	PAGE_OWNER_PRINT_STACK,
+	PAGE_OWNER_PRINT_HANDLE,
+	PAGE_OWNER_PRINT_STACK_HANDLE,
+};
+
+static const char * const page_owner_print_mode_strings[] = {
+	[PAGE_OWNER_PRINT_STACK] = "stack",
+	[PAGE_OWNER_PRINT_HANDLE] = "handle",
+	[PAGE_OWNER_PRINT_STACK_HANDLE] = "stack_handle",
+};
+
+struct page_owner_filter_state {
+	enum page_owner_print_mode print_mode;
+	spinlock_t lock;
+};
+
 static bool page_owner_enabled __initdata;
 DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 
@@ -547,16 +564,23 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
 static ssize_t
 print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 		struct page *page, struct page_owner *page_owner,
-		depot_stack_handle_t handle)
+		depot_stack_handle_t handle,
+		struct page_owner_filter_state *state)
 {
 	int ret, pageblock_mt, page_mt;
 	char *kbuf;
+	enum page_owner_print_mode print_mode;
+	unsigned long flags;
 
 	count = min_t(size_t, count, PAGE_SIZE);
 	kbuf = kmalloc(count, GFP_KERNEL);
 	if (!kbuf)
 		return -ENOMEM;
 
+	spin_lock_irqsave(&state->lock, flags);
+	print_mode = state->print_mode;
+	spin_unlock_irqrestore(&state->lock, flags);
+
 	ret = scnprintf(kbuf, count,
 			"Page allocated via order %u, mask %#x(%pGg), pid %d, tgid %d (%s), ts %llu ns\n",
 			page_owner->order, page_owner->gfp_mask,
@@ -575,9 +599,18 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 			migratetype_names[pageblock_mt],
 			&page->flags);
 
-	ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
-	if (ret >= count)
-		goto err;
+	if (print_mode != PAGE_OWNER_PRINT_HANDLE) {
+		ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
+		if (ret >= count)
+			goto err;
+	}
+
+	if (print_mode != PAGE_OWNER_PRINT_STACK) {
+		ret += scnprintf(kbuf + ret, count - ret, "handle: %u\n",
+				 handle);
+		if (ret >= count)
+			goto err;
+	}
 
 	if (page_owner->last_migrate_reason != -1) {
 		ret += scnprintf(kbuf + ret, count - ret,
@@ -664,6 +697,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 	depot_stack_handle_t handle;
+	struct page_owner_filter_state *state = file->private_data;
 
 	if (!static_branch_unlikely(&page_owner_inited))
 		return -EINVAL;
@@ -746,7 +780,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		page_owner_tmp = *page_owner;
 		page_ext_put(page_ext);
 		return print_page_owner(buf, count, pfn, page,
-				&page_owner_tmp, handle);
+				&page_owner_tmp, handle, state);
 ext_put_continue:
 		page_ext_put(page_ext);
 	}
@@ -847,7 +881,90 @@ static void init_early_allocated_pages(void)
 		init_pages_in_zone(zone);
 }
 
+static int page_owner_open(struct inode *inode, struct file *file)
+{
+	struct page_owner_filter_state *state;
+
+	state = kzalloc_obj(*state);
+	if (!state)
+		return -ENOMEM;
+
+	spin_lock_init(&state->lock);
+	state->print_mode = PAGE_OWNER_PRINT_STACK;
+	file->private_data = state;
+	return 0;
+}
+
+static int page_owner_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	return 0;
+}
+
+static ssize_t page_owner_write(struct file *file,
+				const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	char *kbuf;
+	char *orig;
+	char *token;
+	int ret;
+	size_t max_input_len;
+	struct page_owner_filter_state *state = file->private_data;
+	enum page_owner_print_mode new_print_mode;
+	unsigned long flags;
+
+	/*
+	 * Maximum input length for filter commands:
+	 * 32: print_mode command max length is 17 ("mode=stack_handle").
+	 */
+	max_input_len = 32;
+
+	if (count > max_input_len)
+		return -EINVAL;
+
+	kbuf = memdup_user_nul(buf, count);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+
+	orig = kbuf;
+
+	spin_lock_irqsave(&state->lock, flags);
+	new_print_mode = state->print_mode;
+	spin_unlock_irqrestore(&state->lock, flags);
+
+	while ((token = strsep(&kbuf, " \t\n")) != NULL) {
+		if (*token == '\0')
+			continue;
+
+		if (!strncmp(token, "mode=", 5)) {
+			ret = sysfs_match_string(page_owner_print_mode_strings,
+						 token + 5);
+			if (ret < 0)
+				goto out_free;
+			new_print_mode = ret;
+		} else {
+			ret = -EINVAL;
+			goto out_free;
+		}
+	}
+
+	spin_lock_irqsave(&state->lock, flags);
+	state->print_mode = new_print_mode;
+	spin_unlock_irqrestore(&state->lock, flags);
+
+	ret = count;
+
+out_free:
+	kfree(orig);
+	return ret;
+}
+
 static const struct file_operations page_owner_fops = {
+	.owner		= THIS_MODULE,
+	.open		= page_owner_open,
+	.release	= page_owner_release,
+	.write		= page_owner_write,
 	.read		= read_page_owner,
 	.llseek		= lseek_page_owner,
 };
@@ -980,7 +1097,7 @@ static int __init pageowner_init(void)
 		return 0;
 	}
 
-	debugfs_create_file("page_owner", 0400, NULL, NULL, &page_owner_fops);
+	debugfs_create_file("page_owner", 0600, NULL, NULL, &page_owner_fops);
 	dir = debugfs_create_dir("page_owner_stacks", NULL);
 	debugfs_create_file("show_stacks", 0400, dir,
 			    (void *)(STACK_PRINT_FLAG_STACK |
-- 
2.20.1

From nobody Thu Jun 18 13:20:42 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com,
    jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v10 2/4] mm/page_owner: add NUMA node filter
Date: Thu, 18 Jun 2026 11:57:48 +0800
Message-Id: <20260618035750.3724613-3-zhen.ni@easystack.cn>
In-Reply-To: <20260618035750.3724613-1-zhen.ni@easystack.cn>

Add NUMA node filtering to page_owner so that its output can be
restricted to pages on specific NUMA node(s). This is useful for
NUMA-aware memory allocation analysis and debugging.

The filter accepts flexible input formats:
- Single node: nid=0
- Multiple nodes: nid=0,2,3
- Node range: nid=0-3
- Mixed format: nid=0,2-4,7

Example usage:
  # Using the page_owner_filter tool (recommended)
  ./page_owner_filter -n 0-3
  ./page_owner_filter -m stack_handle -n 0,2-4,7

Record the node ID at allocation time in a new 'nid' member of struct
page_owner, rather than calling page_to_nid() during the lockless
iteration. page_to_nid() includes PF_POISONED_CHECK(), which may trigger
VM_BUG_ON() when it reads poisoned page->flags while the page is being
freed concurrently. Using the nid saved at allocation time avoids this
panic.

The implementation uses per-file-descriptor filter state stored in
file->private_data, so each opener has an independent filter
configuration.
The filter uses nodemask_t for efficient multi-node filtering and
nodelist_parse() for flexible input parsing. Node validity is checked
with nodes_subset() to reject nodes without memory.

Signed-off-by: Zhen Ni
---
Changes in v10:
- Add 'nid' member to struct page_owner and record it at allocation time
- Remove cond_resched() in page iteration loop (unconditional call)
- Update NUMA filter to use saved nid instead of page_to_nid()

Changes in v9:
- Add spinlock protection for NUMA filter state access
- Use memdesc_nid() instead of page_to_nid() to bypass PF_POISONED_CHECK()

Changes in v8:
- Add cond_resched() in page iteration loop to prevent RCU stalls
- Reject empty nid list to avoid enabling an empty filter
- Improve comment: "Commit all filter changes"

Changes in v7:
- per-file-descriptor implementation

Changes in v6:
- Add node validity check using nodes_subset to reject invalid node numbers
  that don't exist in the system
- Move bool filter_by_nid declaration to top of block
- Use kmalloc_objs instead of kmalloc
- Remove 100 bytes overhead

Changes in v5:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field

Changes in v4:
- Remove "-1" support, use empty string to clear filter
- Use strncpy_from_user() instead of copy_from_user()
- Add concurrency safety documentation for nid_mask access
- Rename fops to page_owner_nid_filter_fops for consistency

Changes in v3:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
  * nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
  * Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
  * 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
  * Avoids 128-byte structure copy on each iteration

Changes in v2:
- Use nodemask_t instead of int to support multiple nodes
- Implement nodelist_parse() to support flexible input formats
  * Single node: "0", "2"
  * Multiple nodes: "0,2,3"
  * Ranges: "0-3"
  * Mixed: "0,2-4,7"
- Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
- Use dynamic memory allocation (kmalloc) to handle variable-length input
- Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)

v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-3-zhen.ni@easystack.cn/
v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-3-zhen.ni@easystack.cn/
v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-3-zhen.ni@easystack.cn/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-3-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
---
 mm/page_owner.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 7595735979bf..5538d65dcdac 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -34,6 +34,7 @@ struct page_owner {
 	pid_t tgid;
 	pid_t free_pid;
 	pid_t free_tgid;
+	int nid;
 };
 
 struct stack {
@@ -68,6 +69,8 @@ static const char * const page_owner_print_mode_strings[] = {
 
 struct page_owner_filter_state {
 	enum page_owner_print_mode print_mode;
+	nodemask_t nid_filter;
+	bool nid_filter_enabled;
 	spinlock_t lock;
 };
 
@@ -268,6 +271,7 @@ static inline void __update_page_owner_handle(struct page *page,
 	struct page_ext_iter iter;
 	struct page_ext *page_ext;
 	struct page_owner *page_owner;
+	int nid = page_to_nid(page);
 
 	rcu_read_lock();
 	for_each_page_ext(page, 1 << order, page_ext, iter) {
@@ -279,6 +283,7 @@ static inline void __update_page_owner_handle(struct page *page,
 		page_owner->pid = pid;
 		page_owner->tgid = tgid;
 		page_owner->ts_nsec = ts_nsec;
+		page_owner->nid = nid;
 		strscpy(page_owner->comm, comm,
 			sizeof(page_owner->comm));
 		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
@@ -698,6 +703,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	struct page_owner *page_owner;
 	depot_stack_handle_t handle;
 	struct page_owner_filter_state *state = file->private_data;
+	unsigned long flags;
 
 	if (!static_branch_unlikely(&page_owner_inited))
 		return -EINVAL;
@@ -774,6 +780,15 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (!handle)
 			goto ext_put_continue;
 
+		spin_lock_irqsave(&state->lock, flags);
+		if (state->nid_filter_enabled) {
+			if (!node_isset(page_owner->nid, state->nid_filter)) {
+				spin_unlock_irqrestore(&state->lock, flags);
+				goto ext_put_continue;
+			}
+		}
+		spin_unlock_irqrestore(&state->lock, flags);
+
 		/* Record the next PFN to read in the file offset */
 		*ppos = pfn + 1;
 
@@ -783,6 +798,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 				&page_owner_tmp, handle, state);
 ext_put_continue:
 		page_ext_put(page_ext);
+		cond_resched();
 	}
 
 	return 0;
@@ -891,6 +907,8 @@ static int page_owner_open(struct inode *inode, struct file *file)
 
 	spin_lock_init(&state->lock);
 	state->print_mode = PAGE_OWNER_PRINT_STACK;
+	nodes_clear(state->nid_filter);
+	state->nid_filter_enabled = false;
 	file->private_data = state;
 	return 0;
 }
@@ -912,13 +930,18 @@ static ssize_t page_owner_write(struct file *file,
 	size_t max_input_len;
 	struct page_owner_filter_state *state = file->private_data;
 	enum page_owner_print_mode new_print_mode;
+	nodemask_t new_nid_filter;
+	bool new_nid_filter_enabled;
 	unsigned long flags;
 
 	/*
 	 * Maximum input length for filter commands:
-	 * 32: print_mode command max length is 17 ("mode=stack_handle").
+	 * - 32: print_mode command max length is 17 ("mode=stack_handle")
+	 *   with sufficient buffer
+	 * - 6 * MAX_NUMNODES: worst case for nid list
+	 *   Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes
 	 */
-	max_input_len = 32;
+	max_input_len = 32 + 6 * MAX_NUMNODES;
 
 	if (count > max_input_len)
 		return -EINVAL;
@@ -931,6 +954,8 @@ static ssize_t page_owner_write(struct file *file,
 
 	spin_lock_irqsave(&state->lock, flags);
 	new_print_mode = state->print_mode;
+	new_nid_filter = state->nid_filter;
+	new_nid_filter_enabled = state->nid_filter_enabled;
 	spin_unlock_irqrestore(&state->lock, flags);
 
 	while ((token = strsep(&kbuf, " \t\n")) != NULL) {
@@ -943,14 +968,37 @@ static ssize_t page_owner_write(struct file *file,
 			if (ret < 0)
 				goto out_free;
 			new_print_mode = ret;
+		} else if (!strncmp(token, "nid=", 4)) {
+			ret = nodelist_parse(token + 4, new_nid_filter);
+			if (ret < 0)
+				goto out_free;
+
+			if (nodes_empty(new_nid_filter)) {
+				ret = -EINVAL;
+				goto out_free;
+			}
+
+			/*
+			 * We want to filter memory allocations by numa nodes, so make sure
+			 * that the specified nodes have memory.
+			 */
+			if (!nodes_subset(new_nid_filter, node_states[N_MEMORY])) {
+				ret = -EINVAL;
+				goto out_free;
+			}
+
+			new_nid_filter_enabled = true;
 		} else {
 			ret = -EINVAL;
 			goto out_free;
 		}
 	}
 
+	/* Commit all filter changes */
 	spin_lock_irqsave(&state->lock, flags);
 	state->print_mode = new_print_mode;
+	state->nid_filter = new_nid_filter;
+	state->nid_filter_enabled = new_nid_filter_enabled;
 	spin_unlock_irqrestore(&state->lock, flags);
 
 	ret = count;
-- 
2.20.1

From nobody Thu Jun 18 13:20:42 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
    hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v10 3/4] tools/mm: add page_owner_filter userspace tool
Date: Thu, 18 Jun 2026 11:57:49 +0800
Message-Id: <20260618035750.3724613-4-zhen.ni@easystack.cn>
In-Reply-To: <20260618035750.3724613-1-zhen.ni@easystack.cn>

Add a userspace filtering tool for page_owner that supports per-fd
filtering with the print_mode and NUMA node filters.

Features:
- Three print modes: stack (default), handle, stack_handle
- NUMA node filtering with flexible formats (single: 0, multiple: 0,1,2,
  range: 0-3, mixed: 0,2-3)
- Per-file-descriptor filter state for independent filtering

Usage examples:
  # Filter by print mode
  ./page_owner_filter -m handle
  ./page_owner_filter -m stack_handle

  # Filter by NUMA node
  ./page_owner_filter -n 0
  ./page_owner_filter -n 0-3

  # Combined filters
  ./page_owner_filter -m stack -n 0,1,2
  ./page_owner_filter -m handle -n 0,2-3

The tool validates its inputs before sending commands to the kernel and
prints a clear error message when the kernel does not support per-fd
filtering.
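The command the tool writes to page_owner is a space-separated list of key=value filters, such as "mode=handle nid=0,2-3". It is built up one option at a time, with a bounds check after each snprintf(). A minimal sketch of that step follows. `append_filter()` is a hypothetical helper, not a function from the patch. Rejecting whitespace inside a value is an added assumption, based on the kernel side splitting the command on " \t\n".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append "key=value" to cmd, preceded by a space if
 * cmd already holds a filter. Returns 0 on success, or -1 if the value
 * contains whitespace or the result would not fit in cmd_size bytes.
 * *cmd_len is only advanced on success (cmd may hold a truncated tail
 * after an overflow failure).
 */
static int append_filter(char *cmd, size_t cmd_size, size_t *cmd_len,
			 const char *key, const char *value)
{
	int len;

	assert(*cmd_len < cmd_size);

	/* The kernel tokenizes on " \t\n", so a value must be a single token. */
	if (strpbrk(value, " \t\n") != NULL)
		return -1;

	len = snprintf(cmd + *cmd_len, cmd_size - *cmd_len, "%s%s=%s",
		       *cmd_len > 0 ? " " : "", key, value);
	if (len < 0 || *cmd_len + (size_t)len >= cmd_size)
		return -1;
	*cmd_len += (size_t)len;
	return 0;
}
```

Because the whole command goes out in a single write(), the kernel can apply every filter or none of them: it parses into local copies and commits only after all tokens are accepted.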
Signed-off-by: Zhen Ni
---
Changes in v10:
- Improve error handling: check fwrite() and fflush() return values
- Handle EPIPE correctly: treat broken pipe as success

Changes in v9:
- Fix isdigit() usage: cast to unsigned char to avoid undefined behavior with non-ASCII input
- Optimize I/O performance: replace fprintf() + fflush() in loop with fwrite() + single fflush() after loop

Changes in v8:
- Add validation to reject multiple dashes in nid list (e.g., "1-2-3")
- Fix snprintf return value handling to prevent command overflow

Changes in v7:
- New patch for userspace tool

v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-4-zhen.ni@easystack.cn/
v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-4-zhen.ni@easystack.cn/
v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-4-zhen.ni@easystack.cn/
---
 tools/mm/Makefile            |   4 +-
 tools/mm/page_owner_filter.c | 302 +++++++++++++++++++++++++++++++++++
 2 files changed, 304 insertions(+), 2 deletions(-)
 create mode 100644 tools/mm/page_owner_filter.c

diff --git a/tools/mm/Makefile b/tools/mm/Makefile
index f5725b5c23aa..858186a6eefd 100644
--- a/tools/mm/Makefile
+++ b/tools/mm/Makefile
@@ -3,7 +3,7 @@
 #
 include ../scripts/Makefile.include
 
-BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
+BUILD_TARGETS=page-types slabinfo page_owner_sort page_owner_filter thp_swap_allocator_test
 INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
 
 LIB_DIR = ../lib/api
@@ -23,7 +23,7 @@ $(LIBS):
 	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
 
 clean:
-	$(RM) page-types slabinfo page_owner_sort thp_swap_allocator_test
+	$(RM) page-types slabinfo page_owner_sort page_owner_filter thp_swap_allocator_test
 	make -C $(LIB_DIR) clean
 
 sbindir ?= /usr/sbin
diff --git a/tools/mm/page_owner_filter.c b/tools/mm/page_owner_filter.c
new file mode 100644
index 000000000000..cc5e110a7775
--- /dev/null
+++ b/tools/mm/page_owner_filter.c
@@ -0,0 +1,302 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * User-space helper to filter page_owner output per-fd
+ *
+ * Example use:
+ *   ./page_owner_filter -m handle
+ *   ./page_owner_filter -m stack_handle
+ *   ./page_owner_filter -n 0,1,2
+ *
+ * See Documentation/mm/page_owner.rst
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MAX_CMD_LEN 512
+
+static void usage(const char *prog)
+{
+	fprintf(stderr, "Usage: %s [OPTIONS]\n", prog);
+	fprintf(stderr, "\nOptions:\n");
+	fprintf(stderr, " -m, --mode MODE : print_mode (stack, handle, or stack_handle)\n");
+	fprintf(stderr, " -n, --nid NID_LIST : NUMA node IDs (comma-separated or ranges)\n");
+	fprintf(stderr, " -o, --output FILE : output file (default: stdout)\n");
+	fprintf(stderr, " -h, --help : show this help message\n");
+	fprintf(stderr, "\nExamples:\n");
+	fprintf(stderr, " %s -m stack\n", prog);
+	fprintf(stderr, " %s -m handle\n", prog);
+	fprintf(stderr, " %s -m stack_handle\n", prog);
+	fprintf(stderr, " %s -m stack -o output.txt\n", prog);
+	fprintf(stderr, " %s -n 0,1,2\n", prog);
+	fprintf(stderr, " %s -m stack -n 0\n", prog);
+}
+
+static int validate_mode(const char *mode)
+{
+	if (strcmp(mode, "stack") == 0 ||
+	    strcmp(mode, "handle") == 0 ||
+	    strcmp(mode, "stack_handle") == 0)
+		return 0;
+
+	fprintf(stderr, "Error: Invalid mode '%s'\n", mode);
+	fprintf(stderr, "Valid modes: stack, handle, stack_handle\n");
+	return -1;
+}
+
+static int validate_nid_list(const char *nid_list)
+{
+	const char *p;
+	int i = 0;
+	int has_digit = 0;
+	int in_range = 0;
+	int prev_num = 0;
+	int curr_num = 0;
+
+	if (!nid_list || strlen(nid_list) == 0)
+		return 0;
+
+	for (p = nid_list; *p; p++) {
+		if (*p == ',') {
+			if (!has_digit) {
+				fprintf(stderr, "Error: Invalid nid_list format\n");
+				return -1;
+			}
+			if (in_range && prev_num > curr_num) {
+				fprintf(stderr,
+					"Error: Invalid range %d-%d (start must be <= end)\n",
+					prev_num, curr_num);
+				return -1;
+			}
+			i = 0;
+			has_digit = 0;
+			in_range = 0;
+			prev_num = 0;
+			curr_num = 0;
+			continue;
+		}
+
+		if (*p == '-') {
+			if (!has_digit) {
+				fprintf(stderr,
+					"Error: Invalid nid_list format ");
+				fprintf(stderr,
+					"(dash without preceding number)\n");
+				return -1;
+			}
+			if (in_range) {
+				fprintf(stderr, "Error: Multiple dashes in nid_list\n");
+				return -1;
+			}
+			prev_num = curr_num;
+			curr_num = 0;
+			i = 0;
+			has_digit = 0;
+			in_range = 1;
+			continue;
+		}
+
+		if (!isdigit((unsigned char)*p)) {
+			fprintf(stderr, "Error: Invalid character '%c' in nid_list\n", *p);
+			return -1;
+		}
+
+		if (i > 5) {
+			fprintf(stderr, "Error: NID too long (max 65536)\n");
+			return -1;
+		}
+		curr_num = curr_num * 10 + (*p - '0');
+		i++;
+		has_digit = 1;
+	}
+
+	if (!has_digit) {
+		fprintf(stderr, "Error: Invalid nid_list format\n");
+		return -1;
+	}
+
+	if (in_range && prev_num > curr_num) {
+		fprintf(stderr,
+			"Error: Invalid range %d-%d (start must be <= end)\n",
+			prev_num, curr_num);
+		return -1;
+	}
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	const char *output_file = NULL;
+	char filter_cmd[MAX_CMD_LEN];
+	FILE *output = NULL;
+	int fd = -1;
+	ssize_t ret;
+	char buf[4096];
+	int opt;
+	size_t cmd_len = 0;
+
+	static struct option long_options[] = {
+		{"mode", required_argument, 0, 'm'},
+		{"nid", required_argument, 0, 'n'},
+		{"output", required_argument, 0, 'o'},
+		{"help", no_argument, 0, 'h'},
+		{0, 0, 0, 0}
+	};
+
+	filter_cmd[0] = '\0';
+
+	if (argc > 1) {
+		for (int i = 1; i < argc; i++) {
+			if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
+				usage(argv[0]);
+				return 0;
+			}
+		}
+	}
+
+	/* Check if page_owner exists and is readable */
+	if (access("/sys/kernel/debug/page_owner", F_OK) != 0) {
+		if (errno == ENOENT)
+			fprintf(stderr, "Error: /sys/kernel/debug/page_owner does not exist\n");
+		else
+			perror("Error accessing /sys/kernel/debug/page_owner");
+		fprintf(stderr, "Make sure page_owner is enabled in kernel\n");
+		return 1;
+	}
+
+	while ((opt = getopt_long(argc, argv, "m:n:o:h", long_options, NULL)) != -1) {
+		int len;
+
+		switch (opt) {
+		case 'm': {
+			const char *mode = optarg;
+
+			if (validate_mode(mode) < 0)
+				return 1;
+			len = snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+				       "%smode=%s", cmd_len > 0 ? " " : "", mode);
+			if (len < 0 || cmd_len + len >= MAX_CMD_LEN) {
+				fprintf(stderr, "Error: Command too long\n");
+				return 1;
+			}
+			cmd_len += len;
+			break;
+		}
+		case 'n': {
+			const char *nid_list = optarg;
+
+			if (validate_nid_list(nid_list) < 0)
+				return 1;
+			len = snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+				       "%snid=%s", cmd_len > 0 ? " " : "", nid_list);
+			if (len < 0 || cmd_len + len >= MAX_CMD_LEN) {
+				fprintf(stderr, "Error: Command too long\n");
+				return 1;
+			}
+			cmd_len += len;
+			break;
+		}
+		case 'o':
+			output_file = optarg;
+			break;
+		case 'h':
+			/* Already handled above */
+			break;
+		default:
+			usage(argv[0]);
+			return 1;
+		}
+	}
+
+	/* At least one filter must be specified */
+	if (cmd_len == 0) {
+		fprintf(stderr, "Error: At least one filter (-m or -n) must be specified\n\n");
+		usage(argv[0]);
+		return 1;
+	}
+
+	/* Open page_owner for read-write - this will fail if kernel doesn't support write */
+	fd = open("/sys/kernel/debug/page_owner", O_RDWR);
+	if (fd < 0) {
+		if (errno == EACCES || errno == EPERM) {
+			fprintf(stderr, "Error: /sys/kernel/debug/page_owner ");
+			fprintf(stderr, "does not support write access\n");
+			fprintf(stderr, "This kernel does not support ");
+			fprintf(stderr, "per-fd filtering.\n");
+			fprintf(stderr, "Please ensure you have a kernel with ");
+			fprintf(stderr, "per-fd filtering support.\n");
+		} else {
+			perror("Error opening /sys/kernel/debug/page_owner");
+		}
+		return 1;
+	}
+
+	if (output_file) {
+		output = fopen(output_file, "w");
+		if (!output) {
+			perror("open output file");
+			close(fd);
+			return 1;
+		}
+	} else {
+		output = stdout;
+	}
+
+	ret
=3D write(fd, filter_cmd, strlen(filter_cmd)); + + if (ret < 0) { + if (errno =3D=3D EINVAL) { + fprintf(stderr, "Error: Kernel rejected the filter command.\n"); + fprintf(stderr, "Possible causes:\n"); + fprintf(stderr, " - Kernel does not support per-fd filtering\n"); + fprintf(stderr, " - NUMA node has no memory\n"); + fprintf(stderr, " - Unknown reason\n"); + } else { + perror("write filter command"); + } + goto out; + } + + if ((size_t)ret !=3D strlen(filter_cmd)) + fprintf(stderr, "Warning: Partial write (%zd/%zu)\n", ret, strlen(filter= _cmd)); + + /* Read and display filtered output */ + ret =3D 0; + while ((ret =3D read(fd, buf, sizeof(buf))) > 0) { + size_t written =3D fwrite(buf, 1, ret, output); + + if (written !=3D (size_t)ret) { + if (errno =3D=3D EPIPE) { + /* Pipe closed, treat as success */ + ret =3D 0; + goto out; + } + perror("write output"); + ret =3D -1; + goto out; + } + } + + if (ret < 0) { + perror("read page_owner"); + goto out; + } + + if (fflush(output)) { + perror("flush output"); + ret =3D -1; + } + +out: + close(fd); + if (output !=3D stdout) + fclose(output); + return ret < 0 ? 
1 : 0; +} --=20 2.20.1 From nobody Thu Jun 18 13:20:42 2026 Received: from mail-m155115.qiye.163.com (mail-m155115.qiye.163.com [101.71.155.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB82C2877C3 for ; Thu, 18 Jun 2026 06:20:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=101.71.155.115 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781763642; cv=none; b=uMmz/6lmve9m981J9nmUHnq7HMou16YZCYeNP2wkFVmyAyAUg9pigpdlHWFQ+Xx1oYFuPgG3+b6B/sUyqEmY2qtuoCLZ3ODM1qndeYQmHHbDgtR/XZixKEvd/H4uPKsDqL0M/QzTC5j5QmaP9libm8jGA4p73VIlXYXTG7vVV50= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781763642; c=relaxed/simple; bh=IXZEh9H1VjNTAkv3LzRZh//76SKKfG6iKpbjZN/bgc8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Rngkbv2JI5shS2kck9VNvEvW29fKBM6xfUDcObk2/dHBrZWwJ7gA2MTGVW58qWAlYapni0c3shfe8YZA/BPyFIeMf4rQ260H1C+34P8J9S+bDV4F99MrutoY1LmU7eSeYMl1awlY/+hrhvk9iuYmllkDTjbZS60ozrJEkN/MRxQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn; spf=pass smtp.mailfrom=easystack.cn; arc=none smtp.client-ip=101.71.155.115 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=easystack.cn Received: from localhost.localdomain (unknown [218.94.118.90]) by smtp.qiye.163.com (Hmail) with ESMTP id 1b989993a; Thu, 18 Jun 2026 11:58:12 +0800 (GMT+08:00) From: Zhen Ni To: akpm@linux-foundation.org, vbabka@kernel.org Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhen Ni Subject: [PATCH v10 4/4] mm/page_owner: document page_owner filter Date: Thu, 18 Jun 2026 11:57:50 +0800 
Message-Id: <20260618035750.3724613-5-zhen.ni@easystack.cn>
In-Reply-To: <20260618035750.3724613-1-zhen.ni@easystack.cn>
References: <20260618035750.3724613-1-zhen.ni@easystack.cn>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add documentation for the page_owner_filter userspace tool and the
kernel-level filtering features.

Signed-off-by: Zhen Ni
---
Changes in v10:
- No changes

Changes in v9:
- No changes

Changes in v8:
- Fix Sphinx double colon warning

Changes in v7:
- Document the per-file-descriptor implementation

Changes in v6:
- No code changes

Changes in v5:
- No code changes

Changes in v4:
- Update print_mode documentation to reflect string-based interface
  * Change from "0/1" to "full_stack"/"stack_handle"
  * Add bracket notation example: "[full_stack] stack_handle"
- Update NUMA filter documentation
  * Remove "-1" example
  * Add empty string as clear method
- Fix indentation: use tabs instead of spaces in code examples

Changes in v3:
- New patch to document filter features as requested by Andrew Morton

v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-5-zhen.ni@easystack.cn/
v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-5-zhen.ni@easystack.cn/
v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-5-zhen.ni@easystack.cn/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-4-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-4-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-4-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-5-zhen.ni@easystack.cn/
---
 Documentation/mm/page_owner.rst | 77 ++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 6b12f3b007ec..383e59c42743 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -65,7 +65,14 @@ un-tracking state.
 Usage
 =====
 
-1) Build user-space helper::
+1) Build user-space helpers::
+
+To filter page_owner output:
+
+	cd tools/mm
+	make page_owner_filter
+
+To sort and analyze page_owner output:
 
 	cd tools/mm
 	make page_owner_sort
@@ -74,7 +81,11 @@ Usage
 
 3) Do the job that you want to debug.
 
-4) Analyze information from page owner::
+4) (Optional) Filter page_owner output::
+
+	./page_owner_filter -m handle -n 0,1,2 > filtered_page_owner.txt
+
+5) Analyze information from page owner::
 
 	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
 	cat stacks.txt
@@ -263,3 +274,65 @@ STANDARD FORMAT SPECIFIERS
 	f		free		whether the page has been released or not
 	st		stacktrace	stack trace of the page allocation
 	ator		allocator	memory allocator for pages
+
+Filtering page_owner output
+===========================
+
+page_owner can filter its output in the kernel before it is read, which
+reduces the amount of data that has to be processed in userspace.
+
+The page_owner_filter tool provides a convenient interface for this filtering
+capability. It supports two types of filters:
+
+1. **print_mode filter**: Controls what information is printed for each page
+   - ``stack``: Print full stack traces (default, compatible with existing usage)
+   - ``handle``: Print only stack handle numbers (much faster, smaller output)
+   - ``stack_handle``: Print both stack traces and handle numbers
+
+   The ``handle`` mode uses numeric identifiers instead of full stack traces.
+   The mapping from handles to actual stack traces can be obtained via the
+   show_stacks_handles interface.
+
+2. **NUMA node filter**: Filter pages by NUMA node ID
+   - Single node: ``-n 0``
+   - Multiple nodes: ``-n 0,1,2``
+   - Ranges: ``-n 0-3``
+   - Mixed format: ``-n 0,2-3,5``
+
+Usage examples::
+
+	# Filter by print mode
+	./page_owner_filter -m handle
+	./page_owner_filter -m stack_handle
+
+	# Filter by NUMA node
+	./page_owner_filter -n 0
+	./page_owner_filter -n 0-3
+
+	# Combined filters
+	./page_owner_filter -m stack -n 0,1,2
+	./page_owner_filter -m handle -n 0,2-3
+
+	# Save to file
+	./page_owner_filter -m handle -o filtered_output.txt
+
+The handle mode is particularly useful for monitoring and performance-critical
+scenarios, as it greatly reduces output size. In testing, handle mode reduced
+output size by ~66% (84MB vs 244MB) and improved read performance by ~4.4x
+compared to full stack output.
+
+The NUMA node filter is useful for NUMA-aware memory allocation analysis and debugging.
+
+Behind the scenes, page_owner_filter opens /sys/kernel/debug/page_owner
+read-write, writes the filter command, and then reads the filtered output.
+The filter settings are kept per file descriptor, so each open() is independent.
+
+As a result, multiple filtering operations can run concurrently without
+affecting each other. For example, different filters can run simultaneously
+in separate terminals::
+
+	# Terminal 1: Filter node 0
+	./page_owner_filter -n 0 > node0_output.txt
+
+	# Terminal 2: Filter node 1 (runs concurrently)
+	./page_owner_filter -n 1 > node1_output.txt
-- 
2.20.1