From nobody Mon May 25 00:08:04 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v8 1/4] mm/page_owner: add print_mode filter
Date: Wed, 20 May 2026 15:56:38 +0800
Message-Id:
<20260520075641.1931080-2-zhen.ni@easystack.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20260520075641.1931080-1-zhen.ni@easystack.cn>
References: <20260520075641.1931080-1-zhen.ni@easystack.cn>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a print_mode filter to page_owner that allows users to choose between
printing stack traces, stack handles, or both, providing flexibility for
different debugging and analysis scenarios.

The filter provides three modes via page_owner:
- Writing "mode=stack" prints stack traces for each page (default)
- Writing "mode=handle" prints only the handle number
- Writing "mode=stack_handle" prints both stack traces and handles

The default stack mode maintains backward compatibility with existing usage,
displaying complete stack traces for each page allocation.

The handle mode dramatically reduces log size and improves performance by
showing only the handle number instead of the full stack trace. Testing shows
handle mode reduces output size by ~66% (84MB vs 244MB) and improves read
performance by ~4.4x compared to full stack output. The mapping from handles
to actual stack traces can be obtained via the show_stacks_handles interface.

The stack_handle mode prints both stack traces and handles, making it easier
to identify pages with the same allocation pattern by comparing handle numbers
instead of comparing large stack traces.
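
For illustration only, the mode selection described above can be sketched
in plain userspace C. This is not code from the patch: parse_print_mode(),
mode_names and enum print_mode are hypothetical names, and the sketch uses
strtok() and strcmp() where the kernel handler uses strsep() and
sysfs_match_string().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace mirror of the three print modes. */
enum print_mode { MODE_STACK, MODE_HANDLE, MODE_STACK_HANDLE };

static const char *const mode_names[] = { "stack", "handle", "stack_handle" };

/*
 * Parse a whitespace-separated command such as "mode=handle\n".
 * Returns the selected mode, `fallback` when no token is present,
 * or -1 for an unknown token or mode name. The buffer is modified.
 */
static int parse_print_mode(char *cmd, int fallback)
{
	char *token;
	int mode = fallback;

	assert(cmd != NULL);

	for (token = strtok(cmd, " \t\n"); token; token = strtok(NULL, " \t\n")) {
		size_t i;
		int found = -1;

		/* Only "mode=" tokens are understood in this sketch. */
		if (strncmp(token, "mode=", 5) != 0)
			return -1;

		for (i = 0; i < sizeof(mode_names) / sizeof(mode_names[0]); i++)
			if (strcmp(token + 5, mode_names[i]) == 0)
				found = (int)i;
		if (found < 0)
			return -1;
		mode = found;
	}
	return mode;
}
```

As in the patch, an unknown token rejects the whole command, and a later
"mode=" token overrides an earlier one.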

Example usage:

  # Using the page_owner_filter tool (recommended)
  ./page_owner_filter -m stack         # Print only stack traces (default)
  ./page_owner_filter -m handle        # Print only handles
  ./page_owner_filter -m stack_handle  # Print both stack and handles

Sample output (handle mode):

  Page allocated via order 0, migratetype Unmovable, gfp_mask 0x1100ca, pid 1, tgid 1 (systemd), ts 123456789 ns
  PFN 0x1000 type Unmovable Block 1 type Unmovable Flags 0x3fffe800000084(referenced|lru|active|private|node=0|zone=1)
  handle: 17432583
  ...

This implementation uses per-file-descriptor filter state stored in
file->private_data, allowing each opener to have independent filter
configuration.

Signed-off-by: Zhen Ni
---
Changes in v8:
- Fix buffer overflow by adding bounds check between stack_depot_snprint() and scnprintf()
- Fix unsafe string handling: use memdup_user_nul() instead of kmalloc_objs + strncpy_from_user()
- Fix strsep() memory corruption by saving original pointer before strsep() call
- Change format specifier from %d to %u for depot_stack_handle_t

Changes in v7:
- per-file-descriptor implementation

Changes in v6:
- Remove unnecessary braces in if/else statement (coding style)
- Use stack array (char kbuf[33]) instead of kmalloc for input buffer

Changes in v5:
- No code changes

Changes in v4:
- Change from numeric (0/1) to string-based interface ("full_stack"/"stack_handle")
- Merge infrastructure patch into this patch

Changes in v3:
- No code changes

Changes in v2:
- Renamed from 'compact mode' to 'print_mode' for better clarity
- Use enum values (0=full_stack, 1=stack_handle) instead of boolean
- Update debugfs filename from 'compact' to 'print_mode'

v7:
https://lore.kernel.org/linux-mm/20260515091942.1535677-2-zhen.ni@easystack.cn/
v6:
https://lore.kernel.org/linux-mm/20260511033017.747781-2-zhen.ni@easystack.cn/
v5:
https://lore.kernel.org/linux-mm/20260507064643.179187-2-zhen.ni@easystack.cn/
v4:
https://lore.kernel.org/linux-mm/20260430163247.13628-2-zhen.ni@easystack.cn/
v3:
https://lore.kernel.org/linux-mm/20260428071112.1420380-2-zhen.ni@easystack.cn/
https://lore.kernel.org/linux-mm/20260428071112.1420380-3-zhen.ni@easystack.cn/
v2:
https://lore.kernel.org/linux-mm/20260419155540.376847-2-zhen.ni@easystack.cn/
https://lore.kernel.org/linux-mm/20260419155540.376847-3-zhen.ni@easystack.cn/
v1:
https://lore.kernel.org/linux-mm/20260417154638.22370-2-zhen.ni@easystack.cn/
https://lore.kernel.org/linux-mm/20260417154638.22370-3-zhen.ni@easystack.cn/
---
 mm/page_owner.c | 111 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 105 insertions(+), 6 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8178e0be557f..d0c428d6cac3 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -54,6 +54,22 @@ struct stack_print_ctx {
 	u8 flags;
 };
 
+enum page_owner_print_mode {
+	PAGE_OWNER_PRINT_STACK,
+	PAGE_OWNER_PRINT_HANDLE,
+	PAGE_OWNER_PRINT_STACK_HANDLE,
+};
+
+static const char * const page_owner_print_mode_strings[] = {
+	[PAGE_OWNER_PRINT_STACK] = "stack",
+	[PAGE_OWNER_PRINT_HANDLE] = "handle",
+	[PAGE_OWNER_PRINT_STACK_HANDLE] = "stack_handle",
+};
+
+struct page_owner_filter_state {
+	enum page_owner_print_mode print_mode;
+};
+
 static bool page_owner_enabled __initdata;
 DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 
@@ -547,7 +563,8 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
 static ssize_t
 print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 		struct page *page, struct page_owner *page_owner,
-		depot_stack_handle_t handle)
+		depot_stack_handle_t handle,
+		struct page_owner_filter_state *state)
 {
 	int ret, pageblock_mt, page_mt;
 	char *kbuf;
@@ -575,9 +592,18 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 			migratetype_names[pageblock_mt],
 			&page->flags);
 
-	ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
-	if (ret >= count)
-		goto err;
+	if (state->print_mode != PAGE_OWNER_PRINT_HANDLE) {
+		ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
+		if (ret >= count)
+			goto err;
+	}
+
+	if (state->print_mode != PAGE_OWNER_PRINT_STACK) {
+		ret += scnprintf(kbuf + ret, count - ret, "handle: %u\n",
+				 handle);
+		if (ret >= count)
+			goto err;
+	}
 
 	if (page_owner->last_migrate_reason != -1) {
 		ret += scnprintf(kbuf + ret, count - ret,
@@ -664,6 +690,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 	depot_stack_handle_t handle;
+	struct page_owner_filter_state *state = file->private_data;
 
 	if (!static_branch_unlikely(&page_owner_inited))
 		return -EINVAL;
@@ -746,7 +773,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		page_owner_tmp = *page_owner;
 		page_ext_put(page_ext);
 		return print_page_owner(buf, count, pfn, page,
-				&page_owner_tmp, handle);
+				&page_owner_tmp, handle, state);
 ext_put_continue:
 		page_ext_put(page_ext);
 	}
@@ -847,7 +874,79 @@ static void init_early_allocated_pages(void)
 		init_pages_in_zone(zone);
 }
 
+static int page_owner_open(struct inode *inode, struct file *file)
+{
+	struct page_owner_filter_state *state;
+
+	state = kzalloc_obj(*state);
+	if (!state)
+		return -ENOMEM;
+
+	state->print_mode = PAGE_OWNER_PRINT_STACK;
+	file->private_data = state;
+	return 0;
+}
+
+static int page_owner_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	return 0;
+}
+
+static ssize_t page_owner_write(struct file *file,
+				const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	char *kbuf;
+	char *orig;
+	char *token;
+	int ret;
+	size_t max_input_len;
+	struct page_owner_filter_state *state = file->private_data;
+
+	/*
+	 * Maximum input length for filter commands:
+	 * 32: print_mode command max length is 17 ("mode=stack_handle").
+	 */
+	max_input_len = 32;
+
+	if (count > max_input_len)
+		return -EINVAL;
+
+	kbuf = memdup_user_nul(buf, count);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+
+	orig = kbuf;
+
+	while ((token = strsep(&kbuf, " \t\n")) != NULL) {
+		if (*token == '\0')
+			continue;
+
+		if (!strncmp(token, "mode=", 5)) {
+			ret = sysfs_match_string(page_owner_print_mode_strings,
+						 token + 5);
+			if (ret < 0)
+				goto out_free;
+			state->print_mode = ret;
+		} else {
+			ret = -EINVAL;
+			goto out_free;
+		}
+	}
+
+	ret = count;
+
+out_free:
+	kfree(orig);
+	return ret;
+}
+
 static const struct file_operations page_owner_fops = {
+	.owner = THIS_MODULE,
+	.open = page_owner_open,
+	.release = page_owner_release,
+	.write = page_owner_write,
 	.read = read_page_owner,
 	.llseek = lseek_page_owner,
 };
@@ -980,7 +1079,7 @@ static int __init pageowner_init(void)
 		return 0;
 	}
 
-	debugfs_create_file("page_owner", 0400, NULL, NULL, &page_owner_fops);
+	debugfs_create_file("page_owner", 0600, NULL, NULL, &page_owner_fops);
 	dir = debugfs_create_dir("page_owner_stacks", NULL);
 	debugfs_create_file("show_stacks", 0400, dir,
 			    (void *)(STACK_PRINT_FLAG_STACK |
-- 
2.20.1

From nobody Mon May 25 00:08:04 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v8 2/4] mm/page_owner: add NUMA node filter
Date: Wed, 20 May 2026 15:56:39 +0800
Message-Id: <20260520075641.1931080-3-zhen.ni@easystack.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20260520075641.1931080-1-zhen.ni@easystack.cn>
References: <20260520075641.1931080-1-zhen.ni@easystack.cn>

Add NUMA node filtering functionality to page_owner to allow filtering
pages by specific NUMA node(s).
This is useful for NUMA-aware memory allocation analysis and debugging.

The filter supports flexible input formats:
- Single node: nid=0
- Multiple nodes: nid=0,2,3
- Node range: nid=0-3
- Mixed format: nid=0,2-4,7

Example usage:

  # Using the page_owner_filter tool (recommended)
  ./page_owner_filter -n 0-3
  ./page_owner_filter -m stack_handle -n 0,2-4,7

The implementation uses per-file-descriptor filter state stored in
file->private_data, allowing each opener to have independent filter
configuration. It uses nodemask_t for efficient multi-node filtering and
nodelist_parse() for flexible input parsing. Node validity is verified
using nodes_subset() to reject nodes without memory.

Signed-off-by: Zhen Ni
---
Changes in v8:
- Add cond_resched() in page iteration loop to prevent RCU stalls
- Reject empty nid list to avoid enabling an empty filter
- Improve comment: "Commit all filter changes"

Changes in v7:
- per-file-descriptor implementation

Changes in v6:
- Add node validity check using nodes_subset to reject invalid node numbers
  that don't exist in the system
- Move bool filter_by_nid declaration to top of block
- Use kmalloc_objs instead of kmalloc
- Remove 100 bytes overhead

Changes in v5:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field

Changes in v4:
- Remove "-1" support, use empty string to clear filter
- Use strncpy_from_user() instead of copy_from_user()
- Add concurrency safety documentation for nid_mask access
- Rename fops to page_owner_nid_filter_fops for consistency

Changes in v3:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
  * nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
  * Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
  * 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
  * Avoids 128-byte structure copy on each iteration

Changes in v2:
- Use nodemask_t instead of int to support multiple nodes
- Implement nodelist_parse() to support flexible input formats
  * Single node: "0", "2"
  * Multiple nodes: "0,2,3"
  * Ranges: "0-3"
  * Mixed: "0,2-4,7"
- Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
- Use dynamic memory allocation (kmalloc) to handle variable-length input
- Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)

v7:
https://lore.kernel.org/linux-mm/20260515091942.1535677-3-zhen.ni@easystack.cn/
v6:
https://lore.kernel.org/linux-mm/20260511033017.747781-3-zhen.ni@easystack.cn/
v5:
https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@easystack.cn/
v4:
https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@easystack.cn/
v3:
https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@easystack.cn/
v2:
https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
v1:
https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
---
 mm/page_owner.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index d0c428d6cac3..59cfbc64a117 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -68,6 +68,8 @@ static const char * const page_owner_print_mode_strings[] = {
 
 struct page_owner_filter_state {
 	enum page_owner_print_mode print_mode;
+	nodemask_t nid_filter;
+	bool nid_filter_enabled;
 };
 
 static bool page_owner_enabled __initdata;
@@ -767,6 +769,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (!handle)
 			goto ext_put_continue;
 
+		if (state->nid_filter_enabled) {
+			int page_nid = page_to_nid(page);
+
+			if (!node_isset(page_nid, state->nid_filter))
+				goto ext_put_continue;
+		}
+
 		/* Record the next PFN to read in the file offset */
 		*ppos = pfn + 1;
 
@@ -776,6 +785,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 				&page_owner_tmp, handle, state);
 ext_put_continue:
 		page_ext_put(page_ext);
+		if (need_resched())
+			cond_resched();
 	}
 
 	return 0;
@@ -883,6 +894,8 @@ static int page_owner_open(struct inode *inode, struct file *file)
 		return -ENOMEM;
 
 	state->print_mode = PAGE_OWNER_PRINT_STACK;
+	nodes_clear(state->nid_filter);
+	state->nid_filter_enabled = false;
 	file->private_data = state;
 	return 0;
 }
@@ -903,12 +916,18 @@ static ssize_t page_owner_write(struct file *file,
 	int ret;
 	size_t max_input_len;
 	struct page_owner_filter_state *state = file->private_data;
+	enum page_owner_print_mode new_print_mode = state->print_mode;
+	nodemask_t new_nid_filter = state->nid_filter;
+	bool new_nid_filter_enabled = state->nid_filter_enabled;
 
 	/*
 	 * Maximum input length for filter commands:
-	 * 32: print_mode command max length is 17 ("mode=stack_handle").
+	 * - 32: print_mode command max length is 17 ("mode=stack_handle")
+	 *   with sufficient buffer
+	 * - 6 * MAX_NUMNODES: worst case for nid list
+	 *   Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes
 	 */
-	max_input_len = 32;
+	max_input_len = 32 + 6 * MAX_NUMNODES;
 
 	if (count > max_input_len)
 		return -EINVAL;
@@ -928,13 +947,38 @@ static ssize_t page_owner_write(struct file *file,
 						 token + 5);
 			if (ret < 0)
 				goto out_free;
-			state->print_mode = ret;
+			new_print_mode = ret;
+		} else if (!strncmp(token, "nid=", 4)) {
+			ret = nodelist_parse(token + 4, new_nid_filter);
+			if (ret < 0)
+				goto out_free;
+
+			if (nodes_empty(new_nid_filter)) {
+				ret = -EINVAL;
+				goto out_free;
+			}
+
+			/*
+			 * We want to filter memory allocations by numa nodes, so make sure
+			 * that the specified nodes have memory.
+			 */
+			if (!nodes_subset(new_nid_filter, node_states[N_MEMORY])) {
+				ret = -EINVAL;
+				goto out_free;
+			}
+
+			new_nid_filter_enabled = true;
 		} else {
 			ret = -EINVAL;
 			goto out_free;
 		}
 	}
 
+	/* Commit all filter changes */
+	state->print_mode = new_print_mode;
+	state->nid_filter = new_nid_filter;
+	state->nid_filter_enabled = new_nid_filter_enabled;
+
 	ret = count;
 
 out_free:
-- 
2.20.1

From nobody Mon May 25 00:08:04 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v8 3/4] tools/mm: add page_owner_filter userspace tool
Date: Wed, 20 May 2026 15:56:40 +0800
Message-Id: <20260520075641.1931080-4-zhen.ni@easystack.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20260520075641.1931080-1-zhen.ni@easystack.cn>
References: <20260520075641.1931080-1-zhen.ni@easystack.cn>

Add a userspace filtering tool for page_owner that supports per-fd
filtering with print_mode and NUMA node filters.

Features:
- Three print modes: stack (default), handle, stack_handle
- NUMA node filtering with flexible formats
  (single: 0, multiple: 0,1,2, range: 0-3, mixed: 0,2-3)
- Per-file-descriptor filter state for independent filtering

Usage examples:

  # Filter by print mode
  ./page_owner_filter -m handle
  ./page_owner_filter -m stack_handle

  # Filter by NUMA node
  ./page_owner_filter -n 0
  ./page_owner_filter -n 0-3

  # Combined filters
  ./page_owner_filter -m stack -n 0,1,2
  ./page_owner_filter -m handle -n 0,2-3

The tool validates inputs before sending commands to the kernel and
provides clear error messages when the kernel does not support per-fd
filtering.
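
As a rough illustration of the node list formats accepted above, the
following standalone sketch expands a list such as "0,2-4,7" into a
bitmask. It is not the tool's validate_nid_list() and not the kernel's
nodelist_parse(): the name parse_nid_list() and the 64-node limit are
assumptions made only for this example, and unlike the tool it rejects an
empty list outright.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Expand a node list such as "0,2-4,7" into a 64-bit mask
 * (bit N set means node N is selected).
 * Returns 0 on success, -1 on malformed input, an empty list,
 * a descending range, or a node id >= 64 (limit of this sketch).
 */
static int parse_nid_list(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s != NULL && mask != NULL);

	while (*s) {
		char *end;
		unsigned long lo, hi;

		/* Every element must start with a digit. */
		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;

		/* Optional "-hi" part of a range. */
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (; lo <= hi; lo++)
			m |= UINT64_C(1) << lo;

		if (*end == ',') {
			end++;
			if (*end == '\0')
				return -1;	/* trailing comma */
		} else if (*end != '\0') {
			return -1;		/* e.g. second dash in "1-2-3" */
		}
		s = end;
	}

	if (m == 0)
		return -1;			/* empty list */
	*mask = m;
	return 0;
}
```

With this sketch, "0,2-4,7" selects nodes 0, 2, 3, 4 and 7, while "1-2-3"
and "3-1" are rejected, matching the cases the tool is described as
refusing.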

Signed-off-by: Zhen Ni
---
Changes in v8:
- Add validation to reject multiple dashes in nid list (e.g., "1-2-3")
- Fix snprintf return value handling to prevent command overflow

Changes in v7:
- New patch for userspace tool

v7:
https://lore.kernel.org/linux-mm/20260515091942.1535677-4-zhen.ni@easystack.cn/
---
 tools/mm/Makefile            |   4 +-
 tools/mm/page_owner_filter.c | 293 +++++++++++++++++++++++++++++++++++
 2 files changed, 295 insertions(+), 2 deletions(-)
 create mode 100644 tools/mm/page_owner_filter.c

diff --git a/tools/mm/Makefile b/tools/mm/Makefile
index f5725b5c23aa..858186a6eefd 100644
--- a/tools/mm/Makefile
+++ b/tools/mm/Makefile
@@ -3,7 +3,7 @@
 #
 include ../scripts/Makefile.include
 
-BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
+BUILD_TARGETS=page-types slabinfo page_owner_sort page_owner_filter thp_swap_allocator_test
 INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
 
 LIB_DIR = ../lib/api
@@ -23,7 +23,7 @@ $(LIBS):
 	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
 
 clean:
-	$(RM) page-types slabinfo page_owner_sort thp_swap_allocator_test
+	$(RM) page-types slabinfo page_owner_sort page_owner_filter thp_swap_allocator_test
 	make -C $(LIB_DIR) clean
 
 sbindir ?= /usr/sbin
diff --git a/tools/mm/page_owner_filter.c b/tools/mm/page_owner_filter.c
new file mode 100644
index 000000000000..d056b9bb626a
--- /dev/null
+++ b/tools/mm/page_owner_filter.c
@@ -0,0 +1,293 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * User-space helper to filter page_owner output per-fd
+ *
+ * Example use:
+ * ./page_owner_filter -m handle
+ * ./page_owner_filter -m stack_handle
+ * ./page_owner_filter -n 0,1,2
+ *
+ * See Documentation/mm/page_owner.rst
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MAX_CMD_LEN 512
+
+static void usage(const char *prog)
+{
+	fprintf(stderr, "Usage: %s [OPTIONS]\n", prog);
+	fprintf(stderr, "\nOptions:\n");
+	fprintf(stderr, " -m, --mode MODE : print_mode (stack, handle, or stack_handle)\n");
+	fprintf(stderr, " -n, --nid NID_LIST : NUMA node IDs (comma-separated or ranges)\n");
+	fprintf(stderr, " -o, --output FILE : output file (default: stdout)\n");
+	fprintf(stderr, " -h, --help : show this help message\n");
+	fprintf(stderr, "\nExamples:\n");
+	fprintf(stderr, " %s -m stack\n", prog);
+	fprintf(stderr, " %s -m handle\n", prog);
+	fprintf(stderr, " %s -m stack_handle\n", prog);
+	fprintf(stderr, " %s -m stack -o output.txt\n", prog);
+	fprintf(stderr, " %s -n 0,1,2\n", prog);
+	fprintf(stderr, " %s -m stack -n 0\n", prog);
+}
+
+static int validate_mode(const char *mode)
+{
+	if (strcmp(mode, "stack") == 0 ||
+	    strcmp(mode, "handle") == 0 ||
+	    strcmp(mode, "stack_handle") == 0)
+		return 0;
+
+	fprintf(stderr, "Error: Invalid mode '%s'\n", mode);
+	fprintf(stderr, "Valid modes: stack, handle, stack_handle\n");
+	return -1;
+}
+
+static int validate_nid_list(const char *nid_list)
+{
+	const char *p;
+	int i = 0;
+	int has_digit = 0;
+	int in_range = 0;
+	int prev_num = 0;
+	int curr_num = 0;
+
+	if (!nid_list || strlen(nid_list) == 0)
+		return 0;
+
+	for (p = nid_list; *p; p++) {
+		if (*p == ',') {
+			if (!has_digit) {
+				fprintf(stderr, "Error: Invalid nid_list format\n");
+				return -1;
+			}
+			if (in_range && prev_num > curr_num) {
+				fprintf(stderr,
+					"Error: Invalid range %d-%d (start must be <= end)\n",
+					prev_num, curr_num);
+				return -1;
+			}
+			i = 0;
+			has_digit = 0;
+			in_range = 0;
+			prev_num = 0;
+			curr_num = 0;
+			continue;
+		}
+
+		if (*p == '-') {
+			if (!has_digit) {
+				fprintf(stderr,
+					"Error: Invalid nid_list format ");
+				fprintf(stderr,
+					"(dash without preceding number)\n");
+				return -1;
+			}
+			if (in_range) {
+				fprintf(stderr, "Error: Multiple dashes in nid_list\n");
+				return -1;
+			}
+			prev_num = curr_num;
+			curr_num = 0;
+			i = 0;
+			has_digit = 0;
+			in_range = 1;
+			continue;
+		}
+
+		if (!isdigit(*p)) {
+			fprintf(stderr, "Error: Invalid character '%c' in nid_list\n", *p);
+			return -1;
+		}
+
+		if (i > 5) {
+			fprintf(stderr, "Error: NID too long (max 65536)\n");
+			return -1;
+		}
+		curr_num = curr_num * 10 + (*p - '0');
+		i++;
+		has_digit = 1;
+	}
+
+	if (!has_digit) {
+		fprintf(stderr, "Error: Invalid nid_list format\n");
+		return -1;
+	}
+
+	if (in_range && prev_num > curr_num) {
+		fprintf(stderr,
+			"Error: Invalid range %d-%d (start must be <= end)\n",
+			prev_num, curr_num);
+		return -1;
+	}
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	const char *output_file = NULL;
+	char filter_cmd[MAX_CMD_LEN];
+	FILE *output = NULL;
+	int fd = -1;
+	ssize_t ret;
+	char buf[4096];
+	int opt;
+	size_t cmd_len = 0;
+
+	static struct option long_options[] = {
+		{"mode", required_argument, 0, 'm'},
+		{"nid", required_argument, 0, 'n'},
+		{"output", required_argument, 0, 'o'},
+		{"help", no_argument, 0, 'h'},
+		{0, 0, 0, 0}
+	};
+
+	filter_cmd[0] = '\0';
+
+	if (argc > 1) {
+		for (int i = 1; i < argc; i++) {
+			if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
+				usage(argv[0]);
+				return 0;
+			}
+		}
+	}
+
+	/* Check if page_owner exists and is readable */
+	if (access("/sys/kernel/debug/page_owner", F_OK) != 0) {
+		if (errno == ENOENT)
+			fprintf(stderr, "Error: /sys/kernel/debug/page_owner does not exist\n");
+		else
+			perror("Error accessing /sys/kernel/debug/page_owner");
+		fprintf(stderr, "Make sure page_owner is enabled in kernel\n");
+		return 1;
+	}
+
+	while ((opt = getopt_long(argc, argv, "m:n:o:h", long_options, NULL)) != -1) {
+		int len;
+
+		switch (opt) {
+		case 'm': {
+			const char *mode = optarg;
+
+			if (validate_mode(mode) < 0)
+				return 1;
+			len = snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+				       "%smode=%s", cmd_len > 0 ? " " : "", mode);
+			if (len < 0 || cmd_len + len >= MAX_CMD_LEN) {
+				fprintf(stderr, "Error: Command too long\n");
+				return 1;
+			}
+			cmd_len += len;
+			break;
+		}
+		case 'n': {
+			const char *nid_list = optarg;
+
+			if (validate_nid_list(nid_list) < 0)
+				return 1;
+			len = snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+				       "%snid=%s", cmd_len > 0 ? " " : "", nid_list);
+			if (len < 0 || cmd_len + len >= MAX_CMD_LEN) {
+				fprintf(stderr, "Error: Command too long\n");
+				return 1;
+			}
+			cmd_len += len;
+			break;
+		}
+		case 'o':
+			output_file = optarg;
+			break;
+		case 'h':
+			/* Already handled above */
+			break;
+		default:
+			usage(argv[0]);
+			return 1;
+		}
+	}
+
+	/* At least one filter must be specified */
+	if (cmd_len == 0) {
+		fprintf(stderr, "Error: At least one filter (-m or -n) must be specified\n\n");
+		usage(argv[0]);
+		return 1;
+	}
+
+	/* Open page_owner for read-write - this will fail if kernel doesn't support write */
+	fd = open("/sys/kernel/debug/page_owner", O_RDWR);
+	if (fd < 0) {
+		if (errno == EACCES || errno == EPERM) {
+			fprintf(stderr, "Error: /sys/kernel/debug/page_owner ");
+			fprintf(stderr, "does not support write access\n");
+			fprintf(stderr, "This kernel does not support ");
+			fprintf(stderr, "per-fd filtering.\n");
+			fprintf(stderr, "Please ensure you have a kernel with ");
+			fprintf(stderr, "per-fd filtering support.\n");
+		} else {
+			perror("Error opening /sys/kernel/debug/page_owner");
+		}
+		return 1;
+	}
+
+	if (output_file) {
+		output = fopen(output_file, "w");
+		if (!output) {
+			perror("open output file");
+			close(fd);
+			return 1;
+		}
+	} else {
+		output = stdout;
+	}
+
+	ret = write(fd, filter_cmd, strlen(filter_cmd));
+
+	if (ret < 0) {
+		if (errno == EINVAL) {
+			fprintf(stderr, "Error: Kernel rejected the filter command.\n");
+			fprintf(stderr, "Possible causes:\n");
+			fprintf(stderr, " - Kernel does not support per-fd filtering\n");
+			fprintf(stderr, " - NUMA node has no memory\n");
+			fprintf(stderr, " - Unknown reason\n");
+		} else {
+			perror("write filter command");
+		}
+		close(fd);
+		if (output != stdout)
+			fclose(output);
+		return 1;
+	}
+
+	if ((size_t)ret != strlen(filter_cmd))
+		fprintf(stderr, "Warning: Partial write (%zd/%zu)\n", ret, strlen(filter_cmd));
+
+	/* Read and display filtered output */
+	while ((ret = read(fd, buf, sizeof(buf) - 1)) > 0) {
+		buf[ret] = '\0';
+		fprintf(output, "%s", buf);
+		fflush(output);
+	}
+
+	if (ret < 0) {
+		perror("read page_owner");
+		close(fd);
+		if (output != stdout)
+			fclose(output);
+		return 1;
+	}
+
+	close(fd);
+	if (output != stdout)
+		fclose(output);
+
+	return 0;
+}
-- 
2.20.1

From nobody Mon May 25 00:08:04 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v8 4/4] mm/page_owner: document page_owner filter
Date: Wed, 20 May 2026 15:56:41 +0800
Message-Id: <20260520075641.1931080-5-zhen.ni@easystack.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20260520075641.1931080-1-zhen.ni@easystack.cn>
References: <20260520075641.1931080-1-zhen.ni@easystack.cn>

Add documentation for the page_owner_filter userspace tool and
kernel-level filtering features.
Signed-off-by: Zhen Ni
---
Changes in v8:
- Fix Sphinx double colon warning

Changes in v7:
- document for per-file-descriptor implementation

Changes in v6:
- No code changes

Changes in v5:
- No code changes

Changes in v4:
- Update print_mode documentation to reflect string-based interface
  * Change from "0/1" to "full_stack"/"stack_handle"
  * Add bracket notation example: "[full_stack] stack_handle"
- Update NUMA filter documentation
  * Remove "-1" example
  * Add empty string as clear method
- Fix indentation: use tabs instead of spaces in code examples

Changes in v3:
- New patch to document filter features as requested by Andrew Morton

v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-5-zhen.ni@easystack.cn/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-4-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-4-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-4-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-5-zhen.ni@easystack.cn/
---
 Documentation/mm/page_owner.rst | 77 ++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 6b12f3b007ec..383e59c42743 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -65,7 +65,14 @@ un-tracking state.
 Usage
 =====
 
-1) Build user-space helper::
+1) Build user-space helpers::
+
+To filter page_owner output:
+
+	cd tools/mm
+	make page_owner_filter
+
+To sort and analyze page_owner output:
 
 	cd tools/mm
 	make page_owner_sort
@@ -74,7 +81,11 @@ Usage
 
 3) Do the job that you want to debug.
 
-4) Analyze information from page owner::
+4) (Optional) Filter page_owner output::
+
+	./page_owner_filter -m handle -n 0,1,2 > filtered_page_owner.txt
+
+5) Analyze information from page owner::
 
 	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
 	cat stacks.txt
@@ -263,3 +274,65 @@ STANDARD FORMAT SPECIFIERS
 	f		free		whether the page has been released or not
 	st		stacktrace	stack trace of the page allocation
 	ator		allocator	memory allocator for pages
+
+Filtering page_owner output
+============================
+
+page_owner supports filtering output at the kernel level before reading,
+which reduces the amount of data that needs to be processed in userspace.
+
+The page_owner_filter tool provides a convenient interface for this filtering
+capability. It supports two types of filters:
+
+1. **print_mode filter**: Control what information is printed for each page
+   - ``stack``: Print full stack traces (default, compatible with existing usage)
+   - ``handle``: Print only stack handle numbers (much faster, smaller output)
+   - ``stack_handle``: Print both stack traces and handle numbers
+
+   The ``handle`` mode uses numeric identifiers instead of full stack traces.
+   The mapping from handles to actual stack traces can be obtained via the
+   show_stacks_handles interface.
+
+2. **NUMA node filter**: Filter pages by NUMA node ID
+   - Supports single node: ``-n 0``
+   - Multiple nodes: ``-n 0,1,2``
+   - Ranges: ``-n 0-3``
+   - Mixed format: ``-n 0,2-3,5``
+
+Usage examples::
+
+	# Filter by print mode
+	./page_owner_filter -m handle
+	./page_owner_filter -m stack_handle
+
+	# Filter by NUMA node
+	./page_owner_filter -n 0
+	./page_owner_filter -n 0-3
+
+	# Combined filters
+	./page_owner_filter -m stack -n 0,1,2
+	./page_owner_filter -m handle -n 0,2-3
+
+	# Save to file
+	./page_owner_filter -m handle -o filtered_output.txt
+
+The handle mode is particularly useful for monitoring and performance-critical
+scenarios as it dramatically reduces output size. Testing shows handle mode can
+reduce output size by ~66% (84MB vs 244MB) and improve read performance by ~4.4x
+compared to full stack output.
+
+The NUMA node filter is useful for NUMA-aware memory allocation analysis and debugging.
+
+Behind the scenes, page_owner_filter opens /sys/kernel/debug/page_owner and
+writes filter commands before reading the filtered output. The filtering uses
+per-file-descriptor state, allowing each open() to have independent filter settings.
+
+Each file descriptor maintains its own filter state, so you can have multiple
+independent filtering operations running concurrently. For example, in different
+terminals you can run different filters simultaneously::
+
+	# Terminal 1: Filter node 0
+	./page_owner_filter -n 0 > node0_output.txt
+
+	# Terminal 2: Filter node 1 (runs concurrently)
+	./page_owner_filter -n 1 > node1_output.txt
-- 
2.20.1
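
For readers who want to drive the per-fd interface without the helper tool, the write-then-read sequence described in the documentation above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names build_filter_cmd() and read_filtered() are hypothetical and do not appear in the series, the "mode=<value> nid=<list>" command layout is inferred from the snprintf() calls in page_owner_filter and from the cover text, and the debugfs path is passed in as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build a filter command such as
 * "mode=handle nid=0,2-3". Either mode or nid_list may be NULL.
 * Returns the command length, or -1 if it does not fit in buf.
 */
int build_filter_cmd(char *buf, size_t size,
		     const char *mode, const char *nid_list)
{
	int len = 0, n;

	assert(buf && size > 0);
	buf[0] = '\0';

	if (mode) {
		n = snprintf(buf, size, "mode=%s", mode);
		if (n < 0 || (size_t)n >= size)
			return -1;
		len = n;
	}
	if (nid_list) {
		/* Separate the two filters with a single space. */
		n = snprintf(buf + len, size - len, "%snid=%s",
			     len > 0 ? " " : "", nid_list);
		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += n;
	}
	return len;
}

/*
 * Hypothetical helper: open path read-write, write the filter command,
 * then copy everything readable from the same descriptor to out.
 * path would normally be /sys/kernel/debug/page_owner.
 * Returns 0 on success, -1 on any error.
 */
int read_filtered(const char *path, const char *cmd, FILE *out)
{
	char buf[4096];
	ssize_t ret;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	/* The filter state belongs to this descriptor only. */
	ret = write(fd, cmd, strlen(cmd));
	if (ret < 0 || (size_t)ret != strlen(cmd)) {
		close(fd);
		return -1;
	}

	while ((ret = read(fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, (size_t)ret, out);

	close(fd);
	return ret < 0 ? -1 : 0;
}
```

A caller would build the command once, for example with build_filter_cmd(cmd, sizeof(cmd), "handle", "0,2-3"), and then pass it to read_filtered() together with the page_owner path and an output stream. Because the filter is tied to the descriptor, two processes doing this with different commands should not interfere with each other, which matches the concurrent-terminal example in the documentation.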