[PATCH v2 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering

Zhen Ni posted 3 patches 2 months ago
There is a newer version of this series
mm/page_owner.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 122 insertions(+), 2 deletions(-)
Posted by Zhen Ni 2 months ago
This patch series introduces filtering capabilities to the page_owner
feature to address storage and performance challenges in production
environments.

Changes from v1:
- Renamed 'compact' to 'print_mode' with enum type for better clarity
  * PAGE_OWNER_PRINT_FULL_STACK (0): print full stack traces
  * PAGE_OWNER_PRINT_STACK_HANDLE (1): print only stack handles
- Changed NUMA filter from single node to nodelist with bitmask support
  * Uses nodelist_parse() to support "0", "0,2", "0-3", "0,2-4,7" formats
  * Uses nodemask_t internally for efficient multi-node filtering
  * Output uses %*pbl format (e.g., "0-2", "0,2-4,7")
- Improved memory handling in nid_filter_write using dynamic allocation
  * Limit: (100 + 6 * MAX_NUMNODES) to handle worst-case input
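
For readers unfamiliar with the nodelist syntax, the accepted formats
can be illustrated with a small userspace sketch. This is not the
kernel's nodelist_parse(); the function name, the 64-node cap, and the
plain uint64_t mask are assumptions made only for illustration (the
series itself uses nodemask_t and MAX_NUMNODES).

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64 /* hypothetical cap; the kernel uses MAX_NUMNODES */

/*
 * Parse "0", "0,2", "0-3" or "0,2-4,7" into a bitmask (bit n = node n).
 * Returns 0 on success, -1 on malformed input or an out-of-range node.
 * Userspace illustration only, not the kernel's nodelist_parse().
 */
static int sketch_parse_nodelist(const char *s, uint64_t *mask)
{
    uint64_t out = 0;

    assert(s && mask);
    if (*s == '\0')
        return -1;

    while (*s) {
        char *end;
        unsigned long lo, hi;

        if (!isdigit((unsigned char)*s))
            return -1;
        lo = strtoul(s, &end, 10);
        hi = lo;
        s = end;

        if (*s == '-') {            /* range form "lo-hi" */
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtoul(s, &end, 10);
            s = end;
        }

        if (lo > hi || hi >= SKETCH_MAX_NODES)
            return -1;

        for (unsigned long n = lo; n <= hi; n++)
            out |= UINT64_C(1) << n;

        if (*s == ',') {
            s++;
            if (*s == '\0')
                return -1;          /* reject a trailing comma */
        } else if (*s != '\0') {
            return -1;              /* unexpected character */
        }
    }

    *mask = out;
    return 0;
}
```

With this sketch, "0,2-4,7" yields the mask 0x9D (bits 0, 2, 3, 4 and
7). Unlike a real debugfs write handler, it does not strip the trailing
newline that echo appends.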

These changes address feedback from v1 review:
- "compact" was too vague → use descriptive enum (PAGE_OWNER_PRINT_*)
- Single node filter was limiting → use nodelist_parse() for multi-node support

Problem Statement
=================

In production environments with large memory configurations (e.g., 250GB+),
collecting page_owner information often results in files ranging from
several gigabytes to over 10GB. This creates significant challenges:

1. Storage pressure on production systems
2. Difficulty transferring large files from production environments
3. Post-processing overhead with tools/mm/page_owner_sort.c

The primary contributor to file size is redundant stack trace
information. While the kernel already deduplicates stacks via
stackdepot, page_owner looks up and prints the full stack trace for
every page record, so the same traces must be deduplicated again
during post-processing.

Additionally, in NUMA-aware environments (e.g., DPDK-based cloud
deployments where QEMU processes are bound to specific NUMA nodes),
OOM events are often node-specific rather than system-wide.
Currently, page_owner cannot filter by NUMA node, forcing users to
collect and analyze data for all nodes.

Solution
========

This patch series introduces a flexible filter infrastructure with
two initial filters:

1. **Print Mode Filter**: Outputs only stack handles instead of
   full stack traces. The handle-to-stack mapping can be retrieved
   from the existing show_stacks_handles interface. This dramatically
   reduces output size while preserving all allocation metadata.

2. **NUMA Node Filter**: Allows filtering pages by specific NUMA node(s)
   using flexible nodelist format, enabling targeted analysis of memory
   issues in NUMA-aware deployments.
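
How the two filters could combine when a record is emitted can be
sketched in userspace C as follows. Only the two enum constants and
their values come from this cover letter; the enum type name, the
struct, its fields, the helper names, and the convention that an empty
mask means "no NUMA filtering" are all assumptions for illustration,
not the code in the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Constant names and values as listed in the cover letter. */
enum page_owner_print_mode {
    PAGE_OWNER_PRINT_FULL_STACK = 0,
    PAGE_OWNER_PRINT_STACK_HANDLE = 1,
};

/* Hypothetical filter state; field names are illustrative only. */
struct sketch_filter {
    enum page_owner_print_mode print_mode;
    uint64_t nid_mask; /* bit n set => node n selected; 0 => no NUMA filter (assumed) */
};

/* Return nonzero if a page on node nid should be reported. */
static int sketch_node_selected(const struct sketch_filter *f, unsigned int nid)
{
    assert(f);
    if (f->nid_mask == 0)
        return 1;
    return nid < 64 && ((f->nid_mask >> nid) & 1);
}

/*
 * Write either the handle line or the already-formatted stack text,
 * depending on print_mode. Returns snprintf()'s result.
 */
static int sketch_emit_stack(const struct sketch_filter *f, char *buf,
                             size_t len, unsigned int handle,
                             const char *stack_text)
{
    assert(f && buf && stack_text);
    if (f->print_mode == PAGE_OWNER_PRINT_STACK_HANDLE)
        return snprintf(buf, len, "handle: %u\n", handle);
    return snprintf(buf, len, "%s", stack_text);
}
```

In this sketch a record is skipped entirely when its node is not
selected, and the per-record cost in handle mode is a single short line
regardless of stack depth.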

Implementation
==============

The series is structured as follows:

- Patch 1: Add filter infrastructure (data structures and
  debugfs directory)
- Patch 2: Implement print_mode filter
- Patch 3: Implement NUMA node filter with nodelist support

Usage Example
=============

Enable print_mode and filter for NUMA nodes 0,2-3:

    # cd /sys/kernel/debug/page_owner_filter/
    # echo 1 > print_mode
    # echo "0,2-3" > nid
    # cat /sys/kernel/debug/page_owner > page_owner.txt

Sample print_mode output (showing handles only):

    Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper),
    ts 0 ns PFN 0x40000 type Unmovable Block 512 type Unmovable
    Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
    handle: 1048577

    Page allocated via order 0, mask 0x252000(__GFP_NOWARN|
    __GFP_NORETRY|__GFP_COMP|__GFP_THISNODE), pid 0, tgid 0 (swapper),
    ts 0 ns PFN 0x40002 type Unmovable Block 512 type Unmovable
    Flags 0x23fffe0000000200(workingset|node=0|zone=0|lastcpupid=0x1ffff)
    handle: 1048577
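
A post-processing tool can pick the handle out of each record with very
little code. The helper below is a hypothetical userspace sketch (its
name is made up here); it assumes the "handle: N" line format shown in
the sample above and an unsigned 32-bit handle.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 and store the handle if line is a "handle: N" line,
 * otherwise return 0. Leading whitespace is ignored.
 */
static int sketch_parse_handle_line(const char *line, unsigned int *handle)
{
    assert(line && handle);
    return sscanf(line, " handle: %u", handle) == 1;
}
```

Records sharing a handle (both sample records above carry 1048577) can
then be grouped without comparing stack text, and each handle resolved
once against show_stacks_handles.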

Testing
=======

Tested on a system with multiple NUMA nodes. Verified that:
- Filters work independently and in combination
- Print_mode output correlates correctly with show_stacks_handles
- Default behavior (filters disabled) remains unchanged
- NUMA filter works with single node, multiple nodes, and ranges

Example test session:

    # cat print_mode
    0
    # echo "0,1-2" > nid
    # cat nid
    0-2
    # echo "0,2-3" > nid
    # cat nid
    0,2-3
    # echo 1 > print_mode
    # head -n 100 /sys/kernel/debug/page_owner
    [Shows print_mode output with handles only]
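
The session above writes "0,1-2" and reads back "0-2" because the
input is stored as a mask and printed as a canonical range list. The
following userspace sketch mimics that output style; it is not the
kernel's %*pbl implementation, and the function name and 64-bit mask
are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write the set bits of mask as a range list such as "0-2" or "0,2-3".
 * Returns the number of characters written (excluding the NUL), or -1
 * if buf is too small.
 */
static int sketch_format_nodelist(uint64_t mask, char *buf, size_t len)
{
    size_t used = 0;
    int first = 1;
    unsigned int n = 0;

    assert(buf && len > 0);
    buf[0] = '\0';

    while (n < 64) {
        unsigned int lo, hi;
        int w;

        if (!((mask >> n) & 1)) {
            n++;
            continue;
        }

        /* Extend the run of consecutive set bits starting at n. */
        lo = n;
        while (n + 1 < 64 && ((mask >> (n + 1)) & 1))
            n++;
        hi = n;
        n++;

        if (lo == hi)
            w = snprintf(buf + used, len - used, "%s%u",
                         first ? "" : ",", lo);
        else
            w = snprintf(buf + used, len - used, "%s%u-%u",
                         first ? "" : ",", lo, hi);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
        first = 0;
    }
    return (int)used;
}
```

Nodes 0, 1 and 2 (mask 0x7) format as "0-2", and nodes 0, 2 and 3
(mask 0xD) format as "0,2-3", matching the two read-backs in the
session.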

Future Enhancements
===================

The filter infrastructure is designed to be extensible. Potential
future filters could include:
- PID/TGID filtering
- Time range filtering (allocation timestamp windows)
- GFP flag filtering
- Migration type filtering

Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>

---

Zhen Ni (3):
  mm/page_owner: add filter infrastructure
  mm/page_owner: add print_mode filter
  mm/page_owner: add NUMA node filter with nodelist support

 mm/page_owner.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 122 insertions(+), 2 deletions(-)

--
2.20.1

Re: [PATCH v2 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering
Posted by Andrew Morton 1 month, 3 weeks ago
On Sun, 19 Apr 2026 23:55:37 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:

> This patch series introduces filtering capabilities to the page_owner
> feature to address storage and performance challenges in production
> environments.

AI review asks some good questions:
	https://sashiko.dev/#/patchset/20260419155540.376847-1-zhen.ni@easystack.cn

(I'd be OK with ignoring the bisection hole issues)
Re: [PATCH v2 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering
Posted by Andrew Morton 1 month, 3 weeks ago
On Sun, 19 Apr 2026 23:55:37 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:

> This patch series introduces filtering capabilities to the page_owner
> feature to address storage and performance challenges in production
> environments.

Nice patchset, thanks.  I'm glad to hear that page_owner is being
useful in such environments.

I'll add this to mm.git for some testing and review.  Meanwhile...

>  mm/page_owner.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 122 insertions(+), 2 deletions(-)

Could you please add appropriate updates to
Documentation/mm/page_owner.rst?