[PATCH v4 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering

Zhen Ni posted 3 patches 1 month, 2 weeks ago
There is a newer version of this series
Documentation/mm/page_owner.rst |  61 ++++++++++-
mm/page_owner.c                 | 180 +++++++++++++++++++++++++++++++-
2 files changed, 238 insertions(+), 3 deletions(-)
Posted by Zhen Ni 1 month, 2 weeks ago
This patch series introduces filtering capabilities to the page_owner
feature to address storage and performance challenges in production
environments.

Changes from v3:
- Change print_mode from numeric (0/1) to string-based interface
  * Use "full_stack"/"stack_handle" strings instead of numbers
  * Display current mode with bracket notation: "[full_stack] stack_handle"
- Remove "-1" support from NUMA filter
  * Use empty string to clear filter (echo > nid)
- Use strncpy_from_user() instead of copy_from_user()
- Rename nid_filter_fops to page_owner_nid_filter_fops for consistency
- Merge patch 1 (infrastructure) and patch 2 (print_mode) from v3
- Update documentation to match new interface
  * String-based examples
  * Tab indentation in code blocks

Changes from v2:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
  * nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
  * Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
  * 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
  * Avoids 128-byte structure copy on each iteration
- Add documentation for filter features (patch 3/3)

Changes from v1:
- Renamed 'compact' to 'print_mode' with enum type for better clarity
  * PAGE_OWNER_PRINT_FULL_STACK (0): print full stack traces
  * PAGE_OWNER_PRINT_STACK_HANDLE (1): print only stack handles
- Changed NUMA filter from single node to nodelist with bitmask support
  * Uses nodelist_parse() to support "0", "0,2", "0-3", "0,2-4,7" formats
  * Uses nodemask_t internally for efficient multi-node filtering
  * Output uses %*pbl format (e.g., "0-2", "0,2-4,7")
- Improved memory handling in nid_filter_write using dynamic allocation
  * Limit: (100 + 6 * MAX_NUMNODES) to handle worst-case input
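
As a worked instance of the input limit above, here is a small user-space
sketch of the arithmetic. The helper name nid_input_limit() is hypothetical,
and the value 1024 for MAX_NUMNODES is an assumption inferred from the
128-byte nodemask_t mentioned in the v2 notes (128 * 8 bits); other
configurations will give a different result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the cover letter's formula:
 * 100 bytes of slack plus 6 bytes (",NNNNN") per possible node.
 */
static size_t nid_input_limit(size_t max_numnodes)
{
	assert(max_numnodes > 0);
	return 100 + 6 * max_numnodes;
}
```

Under that assumption, nid_input_limit(1024) evaluates to 100 + 6144 = 6244
bytes, which bounds the buffer allocated for a single write to nid.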


Problem Statement
=================

In production environments with large memory configurations (e.g., 250GB+),
collecting page_owner information often results in files ranging from
several gigabytes to over 10GB. This creates significant challenges:

1. Storage pressure on production systems
2. Difficulty transferring large files from production environments
3. Post-processing overhead with tools/mm/page_owner_sort.c

The primary contributor to file size is redundant stack trace
information. While the kernel already deduplicates stacks via
stackdepot, page_owner retrieves and stores full stack traces for
each page, only to deduplicate them again during post-processing.

Additionally, in NUMA-aware environments (e.g., DPDK-based cloud
deployments where QEMU processes are bound to specific NUMA nodes),
OOM events are often node-specific rather than system-wide.
Currently, page_owner cannot filter by NUMA node, forcing users to
collect and analyze data for all nodes.

Solution
========

This patch series introduces a flexible filter infrastructure with
two initial filters:

1. **Print Mode Filter**: Outputs only stack handles instead of
   full stack traces. The handle-to-stack mapping can be retrieved
   from the existing show_stacks_handles interface. This dramatically
   reduces output size while preserving all allocation metadata.

2. **NUMA Node Filter**: Allows filtering pages by specific NUMA node(s)
   using flexible nodelist format, enabling targeted analysis of memory
   issues in NUMA-aware deployments.

Implementation
==============

The series is structured as follows:

- Patch 1: Implement print_mode filter with string-based interface
  (merges infrastructure + print_mode from v3)
- Patch 2: Implement NUMA node filter with nodelist support
- Patch 3: Document filter features
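
To illustrate the string-based print_mode interface from patch 1, the
following is a minimal user-space sketch, not the code from the series. The
names print_mode_store() and print_mode_show() and the enum are hypothetical;
only the two mode strings and the bracket notation come from this cover
letter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative enum; the series uses its own PAGE_OWNER_PRINT_* names. */
enum print_mode { PRINT_FULL_STACK, PRINT_STACK_HANDLE, PRINT_MODE_COUNT };

static const char *const mode_names[PRINT_MODE_COUNT] = {
	"full_stack", "stack_handle",
};

/*
 * Parse a written string. A trailing newline (as added by echo) is
 * ignored. Returns 0 on success, -1 if the string names no known mode,
 * in which case *mode is left untouched.
 */
static int print_mode_store(const char *buf, enum print_mode *mode)
{
	size_t len = strcspn(buf, "\n");

	for (int i = 0; i < PRINT_MODE_COUNT; i++) {
		if (strlen(mode_names[i]) == len &&
		    !strncmp(buf, mode_names[i], len)) {
			*mode = (enum print_mode)i;
			return 0;
		}
	}
	return -1;
}

/* Render every mode, with the active one wrapped in brackets. */
static void print_mode_show(enum print_mode mode, char *out, size_t size)
{
	size_t off = 0;

	assert(size > 0);
	out[0] = '\0';
	for (int i = 0; i < PRINT_MODE_COUNT && off < size; i++) {
		int active = (i == (int)mode);
		int n = snprintf(out + off, size - off, "%s%s%s%s",
				 i ? " " : "", active ? "[" : "",
				 mode_names[i], active ? "]" : "");

		if (n < 0)
			break;
		off += (size_t)n;
	}
}
```

Listing all modes on read keeps the interface self-describing, in the same
spirit as other bracketed selector files in sysfs and debugfs.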

Usage Example
=============

Select the stack_handle print mode and restrict output to NUMA nodes 0,2-3:

    # cd /sys/kernel/debug/page_owner_filter/
    # echo stack_handle > print_mode
    # echo "0,2-3" > nid
    # cat /sys/kernel/debug/page_owner > page_owner.txt

Sample output in stack_handle mode (handles only):

    Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper),
    ts 0 ns PFN 0x40000 type Unmovable Block 512 type Unmovable
    Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
    handle: 1048577

    Page allocated via order 0, mask 0x252000(__GFP_NOWARN|
    __GFP_NORETRY|__GFP_COMP|__GFP_THISNODE), pid 0, tgid 0 (swapper),
    ts 0 ns PFN 0x40002 type Unmovable Block 512 type Unmovable
    Flags 0x23fffe0000000200(workingset|node=0|zone=0|lastcpupid=0x1ffff)
    handle: 1048577
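
Because each record now ends in a single "handle: N" line, post-processing
can tally pages per stack without comparing stack text. The sketch below is
a hedged user-space illustration only: count_handle() is a hypothetical
helper, and it assumes the "handle: N" line format shown in the sample
above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the records in `text` whose "handle: N" line matches `handle`.
 * Leading spaces or tabs on a line are skipped before matching.
 */
static unsigned long count_handle(const char *text, unsigned long handle)
{
	unsigned long count = 0;
	const char *line = text;

	assert(text != NULL);
	while (*line) {
		unsigned long h;
		const char *p = line + strspn(line, " \t");

		if (sscanf(p, "handle: %lu", &h) == 1 && h == handle)
			count++;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return count;
}
```

Applied to the two sample records above, a lookup for handle 1048577 would
count both pages, and the stack itself only needs to be resolved once via
show_stacks_handles.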

Testing
=======

Tested on a system with multiple NUMA nodes. Verified that:
- Filters work independently and in combination
- Print_mode output correlates correctly with show_stacks_handles
- Default behavior (filters disabled) remains unchanged
- NUMA filter works with single node, multiple nodes, and ranges
- String-based interface works correctly ("full_stack"/"stack_handle")
- Empty string clears NUMA filter
- Code compiles without warnings or errors (allmodconfig tested)

Example test session:

    # cat print_mode
    [full_stack] stack_handle
    # echo stack_handle > print_mode
    # cat print_mode
    full_stack [stack_handle]
    # echo "0,1-2" > nid
    # cat nid
    0-2
    # echo "0,2-3" > nid
    # cat nid
    0,2-3
    # echo > nid
    # cat nid

    (empty - filter cleared)
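
The nid round trip in the session above ("0,1-2" reads back as "0-2") follows
from parsing the list into a bitmask and printing it as ranges. The following
user-space sketch illustrates that behaviour under simplifying assumptions:
it is limited to 64 nodes held in a uint64_t, and the sketch_* function names
are hypothetical. The series itself uses nodelist_parse() and the %*pbl
format on a nodemask_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_NODES 64	/* assumption: small enough for a uint64_t */

/*
 * Parse "0", "0,2", "0-3" or "0,2-4,7" into a bitmask. Empty input (or a
 * lone newline from "echo > nid") yields 0, meaning "filter cleared".
 * Returns 0 on success and -1 on malformed input; *mask is only written
 * on success.
 */
static int sketch_nodelist_parse(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s && mask);
	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= SKETCH_MAX_NODES)
			return -1;
		for (unsigned long n = lo; n <= hi; n++)
			m |= UINT64_C(1) << n;

		s = end;
		if (*s == ',') {
			s++;
			if (*s == '\0' || *s == '\n')
				return -1;	/* trailing comma */
		} else if (*s && *s != '\n') {
			return -1;
		}
	}
	*mask = m;
	return 0;
}

/* Print the mask as a comma-separated list of nodes and ranges. */
static void sketch_nodelist_format(uint64_t mask, char *out, size_t size)
{
	size_t off = 0;

	assert(out && size > 0);
	out[0] = '\0';
	for (int n = 0; n < SKETCH_MAX_NODES; n++) {
		int start, w;

		if (!((mask >> n) & 1))
			continue;
		start = n;
		while (n + 1 < SKETCH_MAX_NODES && ((mask >> (n + 1)) & 1))
			n++;
		if (start == n)
			w = snprintf(out + off, size - off, "%s%d",
				     off ? "," : "", start);
		else
			w = snprintf(out + off, size - off, "%s%d-%d",
				     off ? "," : "", start, n);
		if (w < 0 || (size_t)w >= size - off)
			return;	/* output truncated */
		off += (size_t)w;
	}
}
```

Adjacent nodes collapse into a range on output because the list is rebuilt
from the bitmask rather than echoed back from the written string.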

Future Enhancements
===================

The filter infrastructure is designed to be extensible. Potential
future filters could include:
- PID/TGID filtering
- Time range filtering (allocation timestamp windows)
- GFP flag filtering
- Migration type filtering

Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---

Zhen Ni (3):
  mm/page_owner: add print_mode filter
  mm/page_owner: add NUMA node filter with nodelist support
  mm/page_owner: document page_owner filter features

 Documentation/mm/page_owner.rst |  61 ++++++++++-
 mm/page_owner.c                 | 180 +++++++++++++++++++++++++++++++-
 2 files changed, 238 insertions(+), 3 deletions(-)

--
2.20.1
Re: [PATCH v4 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering
Posted by Andrew Morton 1 month, 2 weeks ago
On Fri,  1 May 2026 00:32:44 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:

> This patch series introduces filtering capabilities to the page_owner
> feature to address storage and performance challenges in production
> environments.

AI review asks a couple of reasonable-sounding questions:
	https://sashiko.dev/#/patchset/20260430163247.13628-1-zhen.ni@easystack.cn
Re: [PATCH v4 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering
Posted by zhen.ni 1 month, 2 weeks ago

On 2026/5/1 02:22, Andrew Morton wrote:
> On Fri,  1 May 2026 00:32:44 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:
> 
>> This patch series introduces filtering capabilities to the page_owner
>> feature to address storage and performance challenges in production
>> environments.
> 
> AI review asks a couple of reasonable-sounding questions:
> 	https://sashiko.dev/#/patchset/20260430163247.13628-1-zhen.ni@easystack.cn
> 
> 

Sashiko asks: Will this cause KCSAN splats?

While the practical impact is minimal (debugfs interface, infrequent
writes, torn reads only cause temporary debug output inconsistency),
we should properly handle this to avoid KCSAN warnings.

I'm wondering if using the __data_racy qualifier would be appropriate
here? Something like:


struct page_owner_filter {
     ...
     nodemask_t __data_racy nid_mask;
};


Sashiko asks: Is it necessary to evaluate nodes_empty(mask) inside this loop?

I'll fix this by moving the check outside the loop.
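
For readers following along without the patch, hoisting the check could look
roughly like the user-space sketch below. This is an illustration under
assumptions, not the code from the series: count_matching() is a hypothetical
name, the mask is a uint64_t standing in for nodemask_t, and page_nid[] stands
in for the per-PFN node lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the entries of page_nid[] that pass the node filter. */
static size_t count_matching(const int *page_nid, size_t npages,
			     uint64_t nid_mask)
{
	/*
	 * Evaluate "is the filter empty?" once, before iterating.
	 * An empty mask means no filtering at all.
	 */
	const bool filter_on = (nid_mask != 0);
	size_t matched = 0;

	assert(page_nid || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		int nid = page_nid[i];

		/* Node ids outside the 64-bit sketch mask never match. */
		if (filter_on &&
		    (nid < 0 || nid >= 64 || !((nid_mask >> nid) & 1)))
			continue;
		matched++;
	}
	return matched;
}
```

The per-entry work is then a single bit test, and the emptiness check no
longer runs once per PFN.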


Best regards,
Zhen
Re: [PATCH v4 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering
Posted by SeongJae Park 1 month, 2 weeks ago
On Thu, 30 Apr 2026 11:22:45 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri,  1 May 2026 00:32:44 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:
> 
> > This patch series introduces filtering capabilities to the page_owner
> > feature to address storage and performance challenges in production
> > environments.
> 
> AI review asks a couple of reasonable-sounding questions:
> 	https://sashiko.dev/#/patchset/20260430163247.13628-1-zhen.ni@easystack.cn

I like the idea of this series and am therefore willing to help review it.
That is why I added a few comments on the previous version.  Unfortunately, I
am not motivated enough to open a web browser and review the Sashiko review
on my own.  I might be willing to do that if I could read it on this mailing
list.  But that's not the case, and I'm a lazy and bad reviewer...

Even if Zhen replies saying that Sashiko's review found no real issue, unless
the reply comes with a reasonable amount of explanation and quotes from the
original Sashiko review, I might still feel that I had better double-check
Zhen's opinion.  But again, I might not feel like opening a web browser to
read the original Sashiko review.

So I will hold off on reviewing this series until I am sure the Sashiko
review found no blocker, or until I forget that there were concerning Sashiko
reviews of this series.  I just wanted to make clear why I am not continuing
to review this series, FWIW.


Thanks,
SJ