From nobody Mon Jun 8 23:56:26 2026 Received: from mail-m1973194.qiye.163.com (mail-m1973194.qiye.163.com [220.197.31.94]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D155731E83C for ; Mon, 25 May 2026 08:22:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=220.197.31.94 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779697348; cv=none; b=i38QK3xIbdCdZrCu1BSkkaHV2yK2KopvTk0ZC3Q4Fy0zbMV1DQxEpas/95SKYROTijrb87WAyXsI3b6PjdvK7dIm//Ef2ugF8dStYGQzycVpcs/SsHFxO1SATeQoQg1dQdlS6+IW7AC3/3AHLdq4QbBtvG8UvfhxFJars9OKE+I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779697348; c=relaxed/simple; bh=rb06pQ9hGjJYw6E7VWSnYmmgxmJNow4z9dFat1Vf+Kg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=LwsgHuVtayJJjUbviAc0kQoixzJdKBe5uVZpcohw+fwRMnHUa5l1FvsNHoq0rJBdpxVtIRc2UlRbYALeZNMqobvv0ZLpH9WjjYEZXvbi19bQFBtBS5WEbXm9ENVn60nTm6RTpERT4o5jKWuqDXV00/yegVtvNZXMsVJHxADLlVM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn; spf=pass smtp.mailfrom=easystack.cn; arc=none smtp.client-ip=220.197.31.94 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=easystack.cn Received: from localhost.localdomain (unknown [218.94.118.90]) by smtp.qiye.163.com (Hmail) with ESMTP id 1a8ce5df9; Mon, 25 May 2026 16:17:05 +0800 (GMT+08:00) From: Zhen Ni To: akpm@linux-foundation.org, vbabka@kernel.org Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhen Ni Subject: [PATCH v9 1/4] mm/page_owner: add print_mode filter Date: Mon, 25 May 2026 16:16:49 +0800 Message-Id: 
<20260525081652.2210206-2-zhen.ni@easystack.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20260525081652.2210206-1-zhen.ni@easystack.cn>
References: <20260525081652.2210206-1-zhen.ni@easystack.cn>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a print_mode filter to page_owner that allows users to choose between
printing stack traces, stack handles, or both, providing flexibility for
different debugging and analysis scenarios.

The filter provides three modes via page_owner:
- Writing "mode=stack" prints stack traces for each page (default)
- Writing "mode=handle" prints only the handle number
- Writing "mode=stack_handle" prints both stack traces and handles

The default stack mode maintains backward compatibility with existing
usage, displaying complete stack traces for each page allocation.

The handle mode dramatically reduces log size and improves performance by
showing only the handle number instead of the full stack trace. Testing
shows handle mode reduces output size by ~66% (84MB vs 244MB) and improves
read performance by ~4.4x compared to full stack output. The mapping from
handles to actual stack traces can be obtained via the show_stacks_handles
interface.

The stack_handle mode prints both stack traces and handles, making it
easier to identify pages with the same allocation pattern by comparing
handle numbers instead of comparing large stack traces.
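As an illustration of comparing handle numbers instead of stack traces, here is a minimal userspace sketch in C that counts how many records in a dump share one handle. It assumes each record carries a line that starts with "handle: <number>", matching the "handle: %u\n" format this patch adds; the helper name count_pages_with_handle() is hypothetical and not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the records in a page_owner dump that carry the given stack handle.
 * Only lines beginning with "handle: " are inspected; all other lines
 * (page info, flags, stack frames) are skipped.
 */
static size_t count_pages_with_handle(const char *text, unsigned int handle)
{
	static const char key[] = "handle: ";
	const size_t key_len = sizeof(key) - 1;
	const char *line = text;
	size_t count = 0;

	assert(text != NULL);

	while (*line) {
		if (strncmp(line, key, key_len) == 0) {
			char *end;
			unsigned long value = strtoul(line + key_len, &end, 10);

			/* Require at least one digit after the key. */
			if (end != line + key_len && value == handle)
				count++;
		}

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}

	return count;
}
```

A caller would read the handle-mode or stack_handle-mode output into a NUL-terminated buffer and pass it in together with the handle of interest.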
Example usage:

  # Using the page_owner_filter tool (recommended)
  ./page_owner_filter -m stack         # Print only stack traces (default)
  ./page_owner_filter -m handle        # Print only handles
  ./page_owner_filter -m stack_handle  # Print both stack and handles

Sample output (handle mode):

  Page allocated via order 0, migratetype Unmovable, gfp_mask 0x1100ca, pid 1, tgid 1 (systemd), ts 123456789 ns
  PFN 0x1000 type Unmovable Block 1 type Unmovable Flags 0x3fffe800000084(referenced|lru|active|private|node=0|zone=1)
  handle: 17432583
  ...

This implementation uses per-file-descriptor filter state stored in
file->private_data, allowing each opener to have independent filter
configuration.

Signed-off-by: Zhen Ni
---
Changes in v9:
- Add spinlock_t lock to struct page_owner_filter_state for concurrent
  access protection

Changes in v8:
- Fix buffer overflow by adding bounds check between stack_depot_snprint()
  and scnprintf()
- Fix unsafe string handling: use memdup_user_nul() instead of kmalloc_objs
  + strncpy_from_user()
- Fix strsep() memory corruption by saving original pointer before strsep()
  call
- Change format specifier from %d to %u for depot_stack_handle_t

Changes in v7:
- per-file-descriptor implementation

Changes in v6:
- Remove unnecessary braces in if/else statement (coding style)
- Use stack array (char kbuf[33]) instead of kmalloc for input buffer

Changes in v5:
- No code changes

Changes in v4:
- Change from numeric (0/1) to string-based interface
  ("full_stack"/"stack_handle")
- Merge infrastructure patch into this patch

Changes in v3:
- No code changes

Changes in v2:
- Renamed from 'compact mode' to 'print_mode' for better clarity
- Use enum values (0=full_stack, 1=stack_handle) instead of boolean
- Update debugfs filename from 'compact' to 'print_mode'

v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-2-zhen.ni@easystack.cn/
v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-2-zhen.ni@easystack.cn/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-2-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-2-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-2-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-2-zhen.ni@easystack.cn/
    https://lore.kernel.org/linux-mm/20260428071112.1420380-3-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-2-zhen.ni@easystack.cn/
    https://lore.kernel.org/linux-mm/20260419155540.376847-3-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-2-zhen.ni@easystack.cn/
    https://lore.kernel.org/linux-mm/20260417154638.22370-3-zhen.ni@easystack.cn/
---
 mm/page_owner.c | 129 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 123 insertions(+), 6 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8178e0be557f..7595735979bf 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -54,6 +54,23 @@ struct stack_print_ctx {
 	u8 flags;
 };
 
+enum page_owner_print_mode {
+	PAGE_OWNER_PRINT_STACK,
+	PAGE_OWNER_PRINT_HANDLE,
+	PAGE_OWNER_PRINT_STACK_HANDLE,
+};
+
+static const char * const page_owner_print_mode_strings[] = {
+	[PAGE_OWNER_PRINT_STACK] = "stack",
+	[PAGE_OWNER_PRINT_HANDLE] = "handle",
+	[PAGE_OWNER_PRINT_STACK_HANDLE] = "stack_handle",
+};
+
+struct page_owner_filter_state {
+	enum page_owner_print_mode print_mode;
+	spinlock_t lock;
+};
+
 static bool page_owner_enabled __initdata;
 DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 
@@ -547,16 +564,23 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
 static ssize_t
 print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 		struct page *page, struct page_owner *page_owner,
-		depot_stack_handle_t handle)
+		depot_stack_handle_t handle,
+		struct page_owner_filter_state *state)
 {
 	int ret, pageblock_mt, page_mt;
 	char *kbuf;
+	enum page_owner_print_mode print_mode;
+	unsigned long flags;
 
 	count = min_t(size_t, count, PAGE_SIZE);
 	kbuf = kmalloc(count, GFP_KERNEL);
 	if (!kbuf)
 		return -ENOMEM;
 
+	spin_lock_irqsave(&state->lock, flags);
+	print_mode = state->print_mode;
+	spin_unlock_irqrestore(&state->lock, flags);
+
 	ret = scnprintf(kbuf, count,
 			"Page allocated via order %u, mask %#x(%pGg), pid %d, tgid %d (%s), ts %llu ns\n",
 			page_owner->order, page_owner->gfp_mask,
@@ -575,9 +599,18 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 			migratetype_names[pageblock_mt],
 			&page->flags);
 
-	ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
-	if (ret >= count)
-		goto err;
+	if (print_mode != PAGE_OWNER_PRINT_HANDLE) {
+		ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
+		if (ret >= count)
+			goto err;
+	}
+
+	if (print_mode != PAGE_OWNER_PRINT_STACK) {
+		ret += scnprintf(kbuf + ret, count - ret, "handle: %u\n",
+				 handle);
+		if (ret >= count)
+			goto err;
+	}
 
 	if (page_owner->last_migrate_reason != -1) {
 		ret += scnprintf(kbuf + ret, count - ret,
@@ -664,6 +697,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 	depot_stack_handle_t handle;
+	struct page_owner_filter_state *state = file->private_data;
 
 	if (!static_branch_unlikely(&page_owner_inited))
 		return -EINVAL;
@@ -746,7 +780,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		page_owner_tmp = *page_owner;
 		page_ext_put(page_ext);
 		return print_page_owner(buf, count, pfn, page,
-				&page_owner_tmp, handle);
+				&page_owner_tmp, handle, state);
 ext_put_continue:
 		page_ext_put(page_ext);
 	}
@@ -847,7 +881,90 @@ static void init_early_allocated_pages(void)
 		init_pages_in_zone(zone);
 }
 
+static int page_owner_open(struct inode *inode, struct file *file)
+{
+	struct page_owner_filter_state *state;
+
+	state = kzalloc_obj(*state);
+	if (!state)
+		return -ENOMEM;
+
+	spin_lock_init(&state->lock);
+	state->print_mode = PAGE_OWNER_PRINT_STACK;
+	file->private_data = state;
+	return 0;
+}
+
+static int page_owner_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	return 0;
+}
+
+static ssize_t page_owner_write(struct file *file,
+				const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	char *kbuf;
+	char *orig;
+	char *token;
+	int ret;
+	size_t max_input_len;
+	struct page_owner_filter_state *state = file->private_data;
+	enum page_owner_print_mode new_print_mode;
+	unsigned long flags;
+
+	/*
+	 * Maximum input length for filter commands:
+	 * 32: print_mode command max length is 17 ("mode=stack_handle").
+	 */
+	max_input_len = 32;
+
+	if (count > max_input_len)
+		return -EINVAL;
+
+	kbuf = memdup_user_nul(buf, count);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+
+	orig = kbuf;
+
+	spin_lock_irqsave(&state->lock, flags);
+	new_print_mode = state->print_mode;
+	spin_unlock_irqrestore(&state->lock, flags);
+
+	while ((token = strsep(&kbuf, " \t\n")) != NULL) {
+		if (*token == '\0')
+			continue;
+
+		if (!strncmp(token, "mode=", 5)) {
+			ret = sysfs_match_string(page_owner_print_mode_strings,
+						 token + 5);
+			if (ret < 0)
+				goto out_free;
+			new_print_mode = ret;
+		} else {
+			ret = -EINVAL;
+			goto out_free;
+		}
+	}
+
+	spin_lock_irqsave(&state->lock, flags);
+	state->print_mode = new_print_mode;
+	spin_unlock_irqrestore(&state->lock, flags);
+
+	ret = count;
+
+out_free:
+	kfree(orig);
+	return ret;
+}
+
 static const struct file_operations page_owner_fops = {
+	.owner = THIS_MODULE,
+	.open = page_owner_open,
+	.release = page_owner_release,
+	.write = page_owner_write,
 	.read = read_page_owner,
 	.llseek = lseek_page_owner,
 };
@@ -980,7 +1097,7 @@ static int __init pageowner_init(void)
 		return 0;
 	}
 
-	debugfs_create_file("page_owner", 0400, NULL, NULL, &page_owner_fops);
+	debugfs_create_file("page_owner", 0600, NULL, NULL, &page_owner_fops);
 	dir = debugfs_create_dir("page_owner_stacks", NULL);
 	debugfs_create_file("show_stacks", 0400, dir,
 			    (void *)(STACK_PRINT_FLAG_STACK |
-- 
2.20.1

From nobody Mon Jun 8 23:56:26 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
 hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v9 2/4] mm/page_owner: add NUMA node filter
Date: Mon, 25 May 2026 16:16:50 +0800
Message-Id: <20260525081652.2210206-3-zhen.ni@easystack.cn>
In-Reply-To: <20260525081652.2210206-1-zhen.ni@easystack.cn>
References: <20260525081652.2210206-1-zhen.ni@easystack.cn>

Add NUMA node filtering functionality to page_owner to allow filtering
pages by specific NUMA node(s). This is useful for NUMA-aware memory
allocation analysis and debugging.

The filter supports flexible input formats:
- Single node: nid=0
- Multiple nodes: nid=0,2,3
- Node range: nid=0-3
- Mixed format: nid=0,2-4,7

Example usage:

  # Using the page_owner_filter tool (recommended)
  ./page_owner_filter -n 0-3
  ./page_owner_filter -m stack_handle -n 0,2-4,7

The implementation uses per-file-descriptor filter state stored in
file->private_data, allowing each opener to have independent filter
configuration. It uses nodemask_t for efficient multi-node filtering and
nodelist_parse() for flexible input parsing. Node validity is verified
using nodes_subset() to reject nodes without memory.
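The input formats listed above can be sketched in userspace C as follows. This is not the kernel's nodelist_parse(), which is the authority and may accept forms this sketch rejects; it is a simplified illustration that handles only single nodes, comma lists, and ranges. The names parse_nid_list() and SKETCH_MAX_NODES are hypothetical, and the 64-node limit exists only so the mask fits in a uint64_t.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64	/* hypothetical limit, not MAX_NUMNODES */

/*
 * Parse a node list such as "0", "0,2,3", "0-3" or "0,2-4,7" into a bitmask
 * where bit N is set when node N is selected.
 * Returns 0 on success, or -1 on an empty list, malformed input, a
 * descending range, or a node number >= SKETCH_MAX_NODES.
 */
static int parse_nid_list(const char *s, uint64_t *mask)
{
	uint64_t result = 0;

	assert(s != NULL && mask != NULL);

	if (*s == '\0')
		return -1;

	for (;;) {
		char *end;
		unsigned long first, last;

		/* Each element must start with a digit (no sign, no blanks). */
		if (!isdigit((unsigned char)*s))
			return -1;
		first = strtoul(s, &end, 10);
		last = first;
		s = end;

		/* Optional "-<last>" turns the element into a range. */
		if (*s == '-') {
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			last = strtoul(s, &end, 10);
			s = end;
		}

		if (first > last || last >= SKETCH_MAX_NODES)
			return -1;

		for (unsigned long nid = first; nid <= last; nid++)
			result |= UINT64_C(1) << nid;

		if (*s == '\0')
			break;
		if (*s != ',')
			return -1;
		s++;
	}

	*mask = result;
	return 0;
}
```

The mask is written only on success, so a caller can keep its previous filter when parsing fails, similar in spirit to how the write handler commits the new state only after every token has been accepted.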
Signed-off-by: Zhen Ni
---
Changes in v9:
- Add spinlock protection for NUMA filter state access
- Use memdesc_nid() instead of page_to_nid() to bypass PF_POISONED_CHECK()

Changes in v8:
- Add cond_resched() in page iteration loop to prevent RCU stalls
- Reject empty nid list to avoid enabling an empty filter
- Improve comment: "Commit all filter changes"

Changes in v7:
- per-file-descriptor implementation

Changes in v6:
- Add node validity check using nodes_subset to reject invalid node numbers
  that don't exist in the system
- Move bool filter_by_nid declaration to top of block
- Use kmalloc_objs instead of kmalloc
- Remove 100 bytes overhead

Changes in v5:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field

Changes in v4:
- Remove "-1" support, use empty string to clear filter
- Use strncpy_from_user() instead of copy_from_user()
- Add concurrency safety documentation for nid_mask access
- Rename fops to page_owner_nid_filter_fops for consistency

Changes in v3:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
  * nodemask_t is a large structure (128 bytes) that triggers compile-time
    asserts
  * Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
  * 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
  * Avoids 128-byte structure copy on each iteration

Changes in v2:
- Use nodemask_t instead of int to support multiple nodes
- Implement nodelist_parse() to support flexible input formats
  * Single node: "0", "2"
  * Multiple nodes: "0,2,3"
  * Ranges: "0-3"
  * Mixed: "0,2-4,7"
- Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
- Use dynamic memory allocation (kmalloc) to handle variable-length input
- Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)

v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-3-zhen.ni@easystack.cn/
v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-3-zhen.ni@easystack.cn/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-3-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
---
 mm/page_owner.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 7595735979bf..9e0fb679303f 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -68,6 +68,8 @@ static const char * const page_owner_print_mode_strings[] = {
 
 struct page_owner_filter_state {
 	enum page_owner_print_mode print_mode;
+	nodemask_t nid_filter;
+	bool nid_filter_enabled;
 	spinlock_t lock;
 };
 
@@ -698,6 +700,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	struct page_owner *page_owner;
 	depot_stack_handle_t handle;
 	struct page_owner_filter_state *state = file->private_data;
+	unsigned long flags;
 
 	if (!static_branch_unlikely(&page_owner_inited))
 		return -EINVAL;
@@ -774,6 +777,26 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (!handle)
 			goto ext_put_continue;
 
+		/*
+		 * NUMA filter: if enabled, only output pages from specified nodes.
+		 * We cannot use page_to_nid() here because it calls
+		 * PF_POISONED_CHECK() which triggers VM_BUG_ON_PGFLAGS() when
+		 * the page is in an inconsistent state during concurrent allocation
+		 * or free. Since we're iterating pages without holding the zone
+		 * lock, we need to extract nid directly from page->flags
+		 * without the poisoned check.
+		 */
+		spin_lock_irqsave(&state->lock, flags);
+		if (state->nid_filter_enabled) {
+			int page_nid = memdesc_nid(page->flags);
+
+			if (!node_isset(page_nid, state->nid_filter)) {
+				spin_unlock_irqrestore(&state->lock, flags);
+				goto ext_put_continue;
+			}
+		}
+		spin_unlock_irqrestore(&state->lock, flags);
+
 		/* Record the next PFN to read in the file offset */
 		*ppos = pfn + 1;
 
@@ -783,6 +806,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 				&page_owner_tmp, handle, state);
 ext_put_continue:
 		page_ext_put(page_ext);
+		if (need_resched())
+			cond_resched();
 	}
 
 	return 0;
@@ -891,6 +916,8 @@ static int page_owner_open(struct inode *inode, struct file *file)
 
 	spin_lock_init(&state->lock);
 	state->print_mode = PAGE_OWNER_PRINT_STACK;
+	nodes_clear(state->nid_filter);
+	state->nid_filter_enabled = false;
 	file->private_data = state;
 	return 0;
 }
@@ -912,13 +939,18 @@ static ssize_t page_owner_write(struct file *file,
 	size_t max_input_len;
 	struct page_owner_filter_state *state = file->private_data;
 	enum page_owner_print_mode new_print_mode;
+	nodemask_t new_nid_filter;
+	bool new_nid_filter_enabled;
 	unsigned long flags;
 
 	/*
 	 * Maximum input length for filter commands:
-	 * 32: print_mode command max length is 17 ("mode=stack_handle").
+	 * - 32: print_mode command max length is 17 ("mode=stack_handle")
+	 *   with sufficient buffer
+	 * - 6 * MAX_NUMNODES: worst case for nid list
+	 *   Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes
 	 */
-	max_input_len = 32;
+	max_input_len = 32 + 6 * MAX_NUMNODES;
 
 	if (count > max_input_len)
 		return -EINVAL;
@@ -931,6 +963,8 @@ static ssize_t page_owner_write(struct file *file,
 
 	spin_lock_irqsave(&state->lock, flags);
 	new_print_mode = state->print_mode;
+	new_nid_filter = state->nid_filter;
+	new_nid_filter_enabled = state->nid_filter_enabled;
 	spin_unlock_irqrestore(&state->lock, flags);
 
 	while ((token = strsep(&kbuf, " \t\n")) != NULL) {
@@ -943,14 +977,37 @@ static ssize_t page_owner_write(struct file *file,
 			if (ret < 0)
 				goto out_free;
 			new_print_mode = ret;
+		} else if (!strncmp(token, "nid=", 4)) {
+			ret = nodelist_parse(token + 4, new_nid_filter);
+			if (ret < 0)
+				goto out_free;
+
+			if (nodes_empty(new_nid_filter)) {
+				ret = -EINVAL;
+				goto out_free;
+			}
+
+			/*
+			 * We want to filter memory allocations by numa nodes, so make sure
+			 * that the specified nodes have memory.
+			 */
+			if (!nodes_subset(new_nid_filter, node_states[N_MEMORY])) {
+				ret = -EINVAL;
+				goto out_free;
+			}
+
+			new_nid_filter_enabled = true;
 		} else {
 			ret = -EINVAL;
 			goto out_free;
 		}
 	}
 
+	/* Commit all filter changes */
 	spin_lock_irqsave(&state->lock, flags);
 	state->print_mode = new_print_mode;
+	state->nid_filter = new_nid_filter;
+	state->nid_filter_enabled = new_nid_filter_enabled;
 	spin_unlock_irqrestore(&state->lock, flags);
 
 	ret = count;
-- 
2.20.1

From nobody Mon Jun 8 23:56:26 2026
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
 hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v9 3/4] tools/mm: add page_owner_filter userspace tool
Date: Mon, 25 May 2026 16:16:51 +0800
Message-Id: <20260525081652.2210206-4-zhen.ni@easystack.cn>
In-Reply-To: <20260525081652.2210206-1-zhen.ni@easystack.cn>
References: <20260525081652.2210206-1-zhen.ni@easystack.cn>

Add a userspace filtering tool for page_owner that supports per-fd
filtering with print_mode and NUMA node filters.

Features:
- Three print modes: stack (default), handle, stack_handle
- NUMA node filtering with flexible formats (single: 0, multiple: 0,1,2,
  range: 0-3, mixed: 0,2-3)
- Per-file-descriptor filter state for independent filtering

Usage examples:

  # Filter by print mode
  ./page_owner_filter -m handle
  ./page_owner_filter -m stack_handle

  # Filter by NUMA node
  ./page_owner_filter -n 0
  ./page_owner_filter -n 0-3

  # Combined filters
  ./page_owner_filter -m stack -n 0,1,2
  ./page_owner_filter -m handle -n 0,2-3

The tool validates inputs before sending commands to the kernel and
provides clear error messages when the kernel does not support per-fd
filtering.
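The command the tool sends for a combined filter can be sketched as below. The sketch follows the same snprintf() pattern and overflow check as the tool, but the helper name build_filter_cmd() and its NULL-means-omitted convention are assumptions made for illustration; the tool itself builds the string inline while parsing options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the space-separated command written to the page_owner fd, for
 * example "mode=handle nid=0,2-3". Pass NULL for a filter to omit it.
 * Returns the command length, or -1 if no filter was requested or the
 * command does not fit in the buffer.
 */
static int build_filter_cmd(char *cmd, size_t size, const char *mode,
			    const char *nid_list)
{
	size_t used = 0;
	int len;

	assert(cmd != NULL && size > 0);
	cmd[0] = '\0';

	if (mode) {
		len = snprintf(cmd + used, size - used, "mode=%s", mode);
		/* snprintf() reports the untruncated length; reject overflow. */
		if (len < 0 || used + (size_t)len >= size)
			return -1;
		used += (size_t)len;
	}

	if (nid_list) {
		len = snprintf(cmd + used, size - used, "%snid=%s",
			       used > 0 ? " " : "", nid_list);
		if (len < 0 || used + (size_t)len >= size)
			return -1;
		used += (size_t)len;
	}

	return used > 0 ? (int)used : -1;
}
```

The resulting string is what a caller would pass to write() on a descriptor opened with O_RDWR on /sys/kernel/debug/page_owner before reading the filtered records from that same descriptor.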
Signed-off-by: Zhen Ni --- Changes in v9: - Fix isdigit() usage: cast to unsigned char to avoid undefined behavior wi= th non-ASCII input - Optimize I/O performance: replace fprintf() + fflush() in loop with fwrit= e() + single fflush() after loop Changes in v8: - Add validation to reject multiple dashes in nid list (e.g., "1-2-3") - Fix snprintf return value handling to prevent command overflow Changes in v7: - New patch for userspace tool v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-4-zhen.ni@easys= tack.cn/ v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-4-zhen.ni@easys= tack.cn/ --- tools/mm/Makefile | 4 +- tools/mm/page_owner_filter.c | 292 +++++++++++++++++++++++++++++++++++ 2 files changed, 294 insertions(+), 2 deletions(-) create mode 100644 tools/mm/page_owner_filter.c diff --git a/tools/mm/Makefile b/tools/mm/Makefile index f5725b5c23aa..858186a6eefd 100644 --- a/tools/mm/Makefile +++ b/tools/mm/Makefile @@ -3,7 +3,7 @@ # include ../scripts/Makefile.include =20 -BUILD_TARGETS=3Dpage-types slabinfo page_owner_sort thp_swap_allocator_test +BUILD_TARGETS=3Dpage-types slabinfo page_owner_sort page_owner_filter thp_= swap_allocator_test INSTALL_TARGETS =3D $(BUILD_TARGETS) thpmaps =20 LIB_DIR =3D ../lib/api @@ -23,7 +23,7 @@ $(LIBS): $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) =20 clean: - $(RM) page-types slabinfo page_owner_sort thp_swap_allocator_test + $(RM) page-types slabinfo page_owner_sort page_owner_filter thp_swap_allo= cator_test make -C $(LIB_DIR) clean =20 sbindir ?=3D /usr/sbin diff --git a/tools/mm/page_owner_filter.c b/tools/mm/page_owner_filter.c new file mode 100644 index 000000000000..9c97740c557f --- /dev/null +++ b/tools/mm/page_owner_filter.c @@ -0,0 +1,292 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * User-space helper to filter page_owner output per-fd + * + * Example use: + * ./page_owner_filter -m handle + * ./page_owner_filter -m stack_handle + * ./page_owner_filter -n 0,1,2 + * + * See 
Documentation/mm/page_owner.rst + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAX_CMD_LEN 512 + +static void usage(const char *prog) +{ + fprintf(stderr, "Usage: %s [OPTIONS]\n", prog); + fprintf(stderr, "\nOptions:\n"); + fprintf(stderr, " -m, --mode MODE : print_mode (stack, handle, or s= tack_handle)\n"); + fprintf(stderr, " -n, --nid NID_LIST : NUMA node IDs (comma-separated = or ranges)\n"); + fprintf(stderr, " -o, --output FILE : output file (default: stdout)\n= "); + fprintf(stderr, " -h, --help : show this help message\n"); + fprintf(stderr, "\nExamples:\n"); + fprintf(stderr, " %s -m stack\n", prog); + fprintf(stderr, " %s -m handle\n", prog); + fprintf(stderr, " %s -m stack_handle\n", prog); + fprintf(stderr, " %s -m stack -o output.txt\n", prog); + fprintf(stderr, " %s -n 0,1,2\n", prog); + fprintf(stderr, " %s -m stack -n 0\n", prog); +} + +static int validate_mode(const char *mode) +{ + if (strcmp(mode, "stack") =3D=3D 0 || + strcmp(mode, "handle") =3D=3D 0 || + strcmp(mode, "stack_handle") =3D=3D 0) + return 0; + + fprintf(stderr, "Error: Invalid mode '%s'\n", mode); + fprintf(stderr, "Valid modes: stack, handle, stack_handle\n"); + return -1; +} + +static int validate_nid_list(const char *nid_list) +{ + const char *p; + int i =3D 0; + int has_digit =3D 0; + int in_range =3D 0; + int prev_num =3D 0; + int curr_num =3D 0; + + if (!nid_list || strlen(nid_list) =3D=3D 0) + return 0; + + for (p =3D nid_list; *p; p++) { + if (*p =3D=3D ',') { + if (!has_digit) { + fprintf(stderr, "Error: Invalid nid_list format\n"); + return -1; + } + if (in_range && prev_num > curr_num) { + fprintf(stderr, + "Error: Invalid range %d-%d (start must be <=3D end)\n", + prev_num, curr_num); + return -1; + } + i =3D 0; + has_digit =3D 0; + in_range =3D 0; + prev_num =3D 0; + curr_num =3D 0; + continue; + } + + if (*p =3D=3D '-') { + if (!has_digit) { + fprintf(stderr, + "Error: Invalid nid_list format "); + fprintf(stderr, + 
"(dash without preceding number)\n"); + return -1; + } + if (in_range) { + fprintf(stderr, "Error: Multiple dashes in nid_list\n"); + return -1; + } + prev_num =3D curr_num; + curr_num =3D 0; + i =3D 0; + has_digit =3D 0; + in_range =3D 1; + continue; + } + + if (!isdigit((unsigned char)*p)) { + fprintf(stderr, "Error: Invalid character '%c' in nid_list\n", *p); + return -1; + } + + if (i > 5) { + fprintf(stderr, "Error: NID too long (max 65536)\n"); + return -1; + } + curr_num =3D curr_num * 10 + (*p - '0'); + i++; + has_digit =3D 1; + } + + if (!has_digit) { + fprintf(stderr, "Error: Invalid nid_list format\n"); + return -1; + } + + if (in_range && prev_num > curr_num) { + fprintf(stderr, + "Error: Invalid range %d-%d (start must be <=3D end)\n", + prev_num, curr_num); + return -1; + } + + return 0; +} + +int main(int argc, char *argv[]) +{ + const char *output_file =3D NULL; + char filter_cmd[MAX_CMD_LEN]; + FILE *output =3D NULL; + int fd =3D -1; + ssize_t ret; + char buf[4096]; + int opt; + size_t cmd_len =3D 0; + + static struct option long_options[] =3D { + {"mode", required_argument, 0, 'm'}, + {"nid", required_argument, 0, 'n'}, + {"output", required_argument, 0, 'o'}, + {"help", no_argument, 0, 'h'}, + {0, 0, 0, 0} + }; + + filter_cmd[0] =3D '\0'; + + if (argc > 1) { + for (int i =3D 1; i < argc; i++) { + if (strcmp(argv[i], "-h") =3D=3D 0 || strcmp(argv[i], "--help") =3D=3D = 0) { + usage(argv[0]); + return 0; + } + } + } + + /* Check if page_owner exists and is readable */ + if (access("/sys/kernel/debug/page_owner", F_OK) !=3D 0) { + if (errno =3D=3D ENOENT) + fprintf(stderr, "Error: /sys/kernel/debug/page_owner does not exist\n"); + else + perror("Error accessing /sys/kernel/debug/page_owner"); + fprintf(stderr, "Make sure page_owner is enabled in kernel\n"); + return 1; + } + + while ((opt =3D getopt_long(argc, argv, "m:n:o:h", long_options, NULL)) != =3D -1) { + int len; + + switch (opt) { + case 'm': { + const char *mode =3D optarg; + + if 
(validate_mode(mode) < 0)
+				return 1;
+			len = snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+				       "%smode=%s", cmd_len > 0 ? " " : "", mode);
+			if (len < 0 || cmd_len + len >= MAX_CMD_LEN) {
+				fprintf(stderr, "Error: Command too long\n");
+				return 1;
+			}
+			cmd_len += len;
+			break;
+		}
+		case 'n': {
+			const char *nid_list = optarg;
+
+			if (validate_nid_list(nid_list) < 0)
+				return 1;
+			len = snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+				       "%snid=%s", cmd_len > 0 ? " " : "", nid_list);
+			if (len < 0 || cmd_len + len >= MAX_CMD_LEN) {
+				fprintf(stderr, "Error: Command too long\n");
+				return 1;
+			}
+			cmd_len += len;
+			break;
+		}
+		case 'o':
+			output_file = optarg;
+			break;
+		case 'h':
+			/* Already handled above */
+			break;
+		default:
+			usage(argv[0]);
+			return 1;
+		}
+	}
+
+	/* At least one filter must be specified */
+	if (cmd_len == 0) {
+		fprintf(stderr, "Error: At least one filter (-m or -n) must be specified\n\n");
+		usage(argv[0]);
+		return 1;
+	}
+
+	/* Open page_owner for read-write - this will fail if kernel doesn't support write */
+	fd = open("/sys/kernel/debug/page_owner", O_RDWR);
+	if (fd < 0) {
+		if (errno == EACCES || errno == EPERM) {
+			fprintf(stderr, "Error: /sys/kernel/debug/page_owner ");
+			fprintf(stderr, "does not support write access\n");
+			fprintf(stderr, "This kernel does not support ");
+			fprintf(stderr, "per-fd filtering.\n");
+			fprintf(stderr, "Please ensure you have a kernel with ");
+			fprintf(stderr, "per-fd filtering support.\n");
+		} else {
+			perror("Error opening /sys/kernel/debug/page_owner");
+		}
+		return 1;
+	}
+
+	if (output_file) {
+		output = fopen(output_file, "w");
+		if (!output) {
+			perror("open output file");
+			close(fd);
+			return 1;
+		}
+	} else {
+		output = stdout;
+	}
+
+	ret = write(fd, filter_cmd, strlen(filter_cmd));
+
+	if (ret < 0) {
+		if (errno == EINVAL) {
+			fprintf(stderr, "Error: Kernel rejected the filter command.\n");
+			fprintf(stderr, "Possible causes:\n");
+			fprintf(stderr, " - Kernel does not support per-fd filtering\n");
+			fprintf(stderr, " - NUMA node has no memory\n");
+			fprintf(stderr, " - Unknown reason\n");
+		} else {
+			perror("write filter command");
+		}
+		close(fd);
+		if (output != stdout)
+			fclose(output);
+		return 1;
+	}
+
+	if ((size_t)ret != strlen(filter_cmd))
+		fprintf(stderr, "Warning: Partial write (%zd/%zu)\n", ret, strlen(filter_cmd));
+
+	/* Read and display filtered output */
+	while ((ret = read(fd, buf, sizeof(buf))) > 0)
+		fwrite(buf, 1, ret, output);
+
+	fflush(output);
+
+	if (ret < 0) {
+		perror("read page_owner");
+		close(fd);
+		if (output != stdout)
+			fclose(output);
+		return 1;
+	}
+
+	close(fd);
+	if (output != stdout)
+		fclose(output);
+
+	return 0;
+}
-- 
2.20.1

From nobody Mon Jun 8 23:56:26 2026
Received: from mail-m32118.qiye.163.com (mail-m32118.qiye.163.com [220.197.32.118])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B91803E1CED
	for ; Mon, 25 May 2026 08:22:31 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=220.197.32.118
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1779697355; cv=none;
	b=aKKHTwOgNp4S4wMWkdhsrFBJh/nTzgP5LdOH+9L8fV9ch/kQDGahWoFShW9Oi/nXirOgQxuhMMfNLL7QqY/e/Ps9t8uPryMEzTjMfrkE/aPHpjqaGv1U+RSXYefsmpzLHJ34Y2KyQPZsnepyYZ88OoXMchYR1XnFsC1TfPRDIck=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1779697355; c=relaxed/simple;
	bh=/g3aEesQBrw9jRs2c8O8xgv8a91tf6vTnrceNx1jpmU=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	 MIME-Version;
	b=G5mIha6tysR2jPdspdsJ1A+Pb1PKHzefeJgDnQrzN+T1UjNrRVCFs+pVs9YwZ7N9mSWuj4D+m+1gEsYZhzHETJa/muaDnCZ8pT83VKhvrvGvlMpiwrKXBQ2WwB+yEGGS/6kaYJvsRNE3Fo6vwUlbP041eSGvctLVABl013ZQ8sk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=easystack.cn;
	spf=pass smtp.mailfrom=easystack.cn; arc=none smtp.client-ip=220.197.32.118
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=easystack.cn
Received: from localhost.localdomain (unknown [218.94.118.90])
	by smtp.qiye.163.com (Hmail) with ESMTP id 1a8ce5e0f;
	Mon, 25 May 2026 16:17:10 +0800 (GMT+08:00)
From: Zhen Ni
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
	hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Zhen Ni
Subject: [PATCH v9 4/4] mm/page_owner: document page_owner filter
Date: Mon, 25 May 2026 16:16:52 +0800
Message-Id: <20260525081652.2210206-5-zhen.ni@easystack.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20260525081652.2210206-1-zhen.ni@easystack.cn>
References: <20260525081652.2210206-1-zhen.ni@easystack.cn>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-HM-Tid: 0a9e5e3596f80229kunm9b83604117f150
X-HM-MType: 1
X-HM-Spam-Status: e1kfGhgUHx5ZQUpXWQgPGg8OCBgUHx5ZQUlOS1dZFg8aDwILHllBWSg2Ly
	tZV1koWUFJQjdXWRgWCB1ZQUpXWS1ZQUlXWQ8JGhUIEh9ZQVlDT09DVh1DTUNCQxpDHh1ITFYVFA
	kWGhdVGRETFhoSFyQUDg9ZV1kYEgtZQVlJSkNVQk9VSkpDVUJLWVdZFhoPEhUdFFlBWU9LSFVKS0
	lPT09IVUpLS1VKQktLWQY+
Content-Type: text/plain; charset="utf-8"

Add documentation for the page_owner_filter userspace tool and kernel-level
filtering features.

Signed-off-by: Zhen Ni
---
Changes in v9:
- No changes

Changes in v8:
- Fix Sphinx double colon warning

Changes in v7:
- document for per-file-descriptor implementation

Changes in v6:
- No code changes

Changes in v5:
- No code changes

Changes in v4:
- Update print_mode documentation to reflect string-based interface
  * Change from "0/1" to "full_stack"/"stack_handle"
  * Add bracket notation example: "[full_stack] stack_handle"
- Update NUMA filter documentation
  * Remove "-1" example
  * Add empty string as clear method
- Fix indentation: use tabs instead of spaces in code examples

Changes in v3:
- New patch to document filter features as requested by Andrew Morton

v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-5-zhen.ni@easystack.cn/
v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-5-zhen.ni@easystack.cn/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-4-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-4-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-4-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-5-zhen.ni@easystack.cn/
---
 Documentation/mm/page_owner.rst | 77 ++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 6b12f3b007ec..383e59c42743 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -65,7 +65,14 @@ un-tracking state.
 Usage
 =====
 
-1) Build user-space helper::
+1) Build user-space helpers::
+
+To filter page_owner output:
+
+	cd tools/mm
+	make page_owner_filter
+
+To sort and analyze page_owner output:
 
 	cd tools/mm
 	make page_owner_sort
@@ -74,7 +81,11 @@ Usage
 
 3) Do the job that you want to debug.
 
-4) Analyze information from page owner::
+4) (Optional) Filter page_owner output::
+
+	./page_owner_filter -m handle -n 0,1,2 > filtered_page_owner.txt
+
+5) Analyze information from page owner::
 
 	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
 	cat stacks.txt
@@ -263,3 +274,65 @@ STANDARD FORMAT SPECIFIERS
 	f		free		whether the page has been released or not
 	st		stacktrace	stack trace of the page allocation
 	ator		allocator	memory allocator for pages
+
+Filtering page_owner output
+============================
+
+page_owner supports filtering output at the kernel level before reading,
+which reduces the amount of data that needs to be processed in userspace.
+
+The page_owner_filter tool provides a convenient interface for this filtering
+capability. It supports two types of filters:
+
+1. **print_mode filter**: Control what information is printed for each page
+   - ``stack``: Print full stack traces (default, compatible with existing usage)
+   - ``handle``: Print only stack handle numbers (much faster, smaller output)
+   - ``stack_handle``: Print both stack traces and handle numbers
+
+   The ``handle`` mode uses numeric identifiers instead of full stack traces.
+   The mapping from handles to actual stack traces can be obtained via the
+   show_stacks_handles interface.
+
+2. **NUMA node filter**: Filter pages by NUMA node ID
+   - Supports single node: ``-n 0``
+   - Multiple nodes: ``-n 0,1,2``
+   - Ranges: ``-n 0-3``
+   - Mixed format: ``-n 0,2-3,5``
+
+Usage examples::
+
+	# Filter by print mode
+	./page_owner_filter -m handle
+	./page_owner_filter -m stack_handle
+
+	# Filter by NUMA node
+	./page_owner_filter -n 0
+	./page_owner_filter -n 0-3
+
+	# Combined filters
+	./page_owner_filter -m stack -n 0,1,2
+	./page_owner_filter -m handle -n 0,2-3
+
+	# Save to file
+	./page_owner_filter -m handle -o filtered_output.txt
+
+The handle mode is particularly useful for monitoring and performance-critical
+scenarios as it dramatically reduces output size. Testing shows handle mode can
+reduce output size by ~66% (84MB vs 244MB) and improve read performance by ~4.4x
+compared to full stack output.
+
+The NUMA node filter is useful for NUMA-aware memory allocation analysis and debugging.
+
+Behind the scenes, page_owner_filter opens /sys/kernel/debug/page_owner and
+writes filter commands before reading the filtered output. The filtering uses
+per-file-descriptor state, allowing each open() to have independent filter settings.
+
+Each file descriptor maintains its own filter state, so you can have multiple
+independent filtering operations running concurrently. For example, in different
+terminals you can run different filters simultaneously::
+
+	# Terminal 1: Filter node 0
+	./page_owner_filter -n 0 > node0_output.txt
+
+	# Terminal 2: Filter node 1 (runs concurrently)
+	./page_owner_filter -n 1 > node1_output.txt
-- 
2.20.1
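
Reader's note (not part of the patch): the documentation says the tool
"writes filter commands" to the page_owner file descriptor before reading.
Going by the snprintf() calls in page_owner_filter.c from this series, that
command appears to be a space-separated list of "mode=<mode>" and
"nid=<list>" tokens. The sketch below only illustrates that string
assembly; the helper names append_filter() and build_filter_cmd() are
hypothetical and do not exist in the series, and the sketch does no
validation of the mode or nid values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): append "key=value" to cmd at
 * offset len, preceded by a single space if cmd already holds a filter.
 * Returns the new string length, or -1 if the result would not fit.
 */
static int append_filter(char *cmd, size_t size, size_t len,
			 const char *key, const char *value)
{
	int n;

	assert(cmd && key && value && len < size);
	n = snprintf(cmd + len, size - len, "%s%s=%s",
		     len > 0 ? " " : "", key, value);
	if (n < 0 || len + (size_t)n >= size) {
		cmd[len] = '\0';	/* drop the truncated tail */
		return -1;
	}
	return (int)(len + (size_t)n);
}

/*
 * Hypothetical helper: build the whole command from an optional mode and
 * an optional nid list (pass NULL to omit either one).
 * Returns the command length, or -1 on overflow.
 */
static int build_filter_cmd(char *cmd, size_t size,
			    const char *mode, const char *nid_list)
{
	int len = 0;

	assert(cmd && size > 0);
	cmd[0] = '\0';
	if (mode) {
		len = append_filter(cmd, size, (size_t)len, "mode", mode);
		if (len < 0)
			return -1;
	}
	if (nid_list) {
		len = append_filter(cmd, size, (size_t)len, "nid", nid_list);
		if (len < 0)
			return -1;
	}
	return len;
}
```

With mode "handle" and nid list "0,2-3" this yields the string
"mode=handle nid=0,2-3". In the tool itself the resulting string is passed
to write() on a descriptor opened with O_RDWR, and the filtered records are
then read back from that same descriptor.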