[PATCH v2 0/2] mm/readahead: batch folio insertion to improve performance
Posted by Zhiguo Zhou 2 weeks, 5 days ago
This patch series improves readahead performance by batching folio
insertions into the page cache's xarray, reducing cacheline transfers
and shortening the time spent in the critical section.

PROBLEM
=======
When the `readahead` syscall is invoked, `page_cache_ra_unbounded`
currently inserts folios into the page cache individually. Each insertion
requires acquiring and releasing the `xa_lock`, which can lead to:
1. Significant lock contention when running on multi-core systems
2. Cross-core cacheline transfers for the lock and associated data
3. Increased execution time due to frequent lock operations

These overheads become particularly noticeable in high-throughput storage
workloads where readahead is used heavily; the current per-folio insertion
pattern is sketched below.
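
For reference, a simplified sketch of that pattern (paraphrasing mainline
page_cache_ra_unbounded(), not the exact source):

	for (i = 0; i < nr_to_read; i++) {
		struct folio *folio = filemap_alloc_folio(gfp_mask, 0);

		if (!folio)
			break;
		/* acquires and releases the mapping's xa_lock internally */
		if (filemap_add_folio(mapping, folio, index + i,
				      gfp_mask) < 0) {
			folio_put(folio);
			break;
		}
	}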

SOLUTION
========
This series introduces batched folio insertion for contiguous ranges in
the page cache. The key changes are:

Patch 1/2: Refactor __filemap_add_folio to separate critical section
- Extract the core xarray insertion logic into
  __filemap_add_folio_xa_locked()
- Allow callers to control locking granularity via an 'xa_locked'
  parameter (see the sketch below)
- Maintain existing functionality while preparing for batch insertion
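
A hypothetical shape of this refactor (the function names come from the
description above; the exact signatures are assumptions, not the posted
diff):

	/* Core insertion; the caller must already hold the i_pages lock. */
	static int __filemap_add_folio_xa_locked(struct address_space *mapping,
			struct folio *folio, pgoff_t index, gfp_t gfp,
			void **shadowp)
	{
		lockdep_assert_held(&mapping->i_pages.xa_lock);
		/* ... xarray store, shadow entry handling, accounting ... */
		return 0;
	}

	int __filemap_add_folio(struct address_space *mapping,
			struct folio *folio, pgoff_t index, gfp_t gfp,
			void **shadowp, bool xa_locked)
	{
		int err;

		if (xa_locked)
			return __filemap_add_folio_xa_locked(mapping, folio,
							index, gfp, shadowp);
		xa_lock_irq(&mapping->i_pages);
		err = __filemap_add_folio_xa_locked(mapping, folio, index,
						gfp, shadowp);
		xa_unlock_irq(&mapping->i_pages);
		return err;
	}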

Patch 2/2: Batch folio insertion in page_cache_ra_unbounded
- Introduce filemap_add_folio_range() for batch insertion of folios
- Pre-allocate folios before entering the critical section
- Insert multiple folios while holding the xa_lock only once
- Update page_cache_ra_unbounded to use the new batching interface
- Fall back to inserting folios individually when memory is under
  pressure (see the sketch below)
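
A hypothetical sketch of the resulting flow in page_cache_ra_unbounded()
(filemap_add_folio_range() is named above; its signature, the fallback
logic, and MAX_BATCH are illustrative assumptions):

	struct folio *folios[MAX_BATCH];	/* MAX_BATCH: illustrative */
	unsigned int i, nr = 0;

	/* pre-allocate all folios outside the critical section */
	while (nr < nr_to_read) {
		folios[nr] = filemap_alloc_folio(gfp_mask, 0);
		if (!folios[nr])
			break;		/* memory pressure: partial batch */
		nr++;
	}

	if (nr == nr_to_read) {
		/* one xa_lock round trip covers all nr insertions */
		filemap_add_folio_range(mapping, folios, index, nr, gfp_mask);
	} else {
		/* under memory pressure, insert one at a time */
		for (i = 0; i < nr; i++)
			filemap_add_folio(mapping, folios[i], index + i,
					  gfp_mask);
	}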

PERFORMANCE RESULTS
===================
Testing was performed using RocksDB's `db_bench` (readseq workload) on a
32-vCPU Intel Ice Lake server with 256GB memory:

1. Throughput improved by 1.51x (ops/sec)
2. Latency:
   - P50: 63.9% reduction (6.15 usec → 2.22 usec)
   - P75: 42.1% reduction (13.38 usec → 7.75 usec)
   - P99: 31.4% reduction (507.95 usec → 348.54 usec)
3. IPC of page_cache_ra_unbounded (excluding lock overhead) improved by
   2.18x

TESTING DETAILS
===============
- Kernel: v6.19-rc5 (0f61b1, tip of mm.git:mm-stable on Jan 14, 2026)
- Hardware: Intel Ice Lake server, 32 vCPUs, 256GB RAM
- Workload: RocksDB db_bench readseq
- Command: ./db_bench --benchmarks=readseq,stats --use_existing_db=1
           --num_multi_db=32 --threads=32 --num=1600000 --value_size=8192
           --cache_size=16GB

IMPLEMENTATION NOTES
====================
- The existing single-folio insertion API remains unchanged for
  compatibility
- Hugetlb folio handling is preserved through the refactoring
- Error injection (BPF) support is maintained for __filemap_add_folio
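
For reference, the error-injection hook already present in mainline
mm/filemap.c, which the refactor must keep working:

	ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);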

Zhiguo Zhou (2):
  mm/filemap: refactor __filemap_add_folio to separate critical section
  mm/readahead: batch folio insertion to improve performance

 include/linux/pagemap.h |   4 +-
 mm/filemap.c            | 238 ++++++++++++++++++++++++++++------------
 mm/hugetlb.c            |   3 +-
 mm/readahead.c          | 196 ++++++++++++++++++++++++++-------
 4 files changed, 325 insertions(+), 116 deletions(-)

-- 
2.43.0

Re: [PATCH v2 0/2] mm/readahead: batch folio insertion to improve performance
Posted by Matthew Wilcox 2 weeks, 5 days ago
On Mon, Jan 19, 2026 at 06:02:57PM +0800, Zhiguo Zhou wrote:
> This patch series improves readahead performance by batching folio
> insertions into the page cache's xarray, reducing cacheline transfers
> and shortening the time spent in the critical section.

1. Don't resend patches immediately.  Wait for feedback.

2. Don't send v2 as a reply to v1.  New thread.

3. This is unutterably ugly.

4. Passing boolean parameters to functions is an antipattern.  You
never know at the caller site what 'true' or 'false' means.

5. Passing 'is_locked' is specifically an antipattern of its own.

6. You've EXPORTed a symbol that has no in-tree modular user.

7. Do you want to keep trying to do this or do you want me to do it
properly?  I don't have much patience for doing development by patch
feedback, not for something as sensitive as the page cache.
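
(Editorial illustration of points 4 and 5, not from the thread: at a
call site such as

	__filemap_add_folio(mapping, folio, index, gfp, NULL, true);

nothing says what 'true' means; a reader must look up the prototype to
learn it is 'xa_locked'. Separate entry points, e.g. a hypothetical
__filemap_add_folio_locked(), keep the intent visible at every caller.)
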
[PATCH v2 0/2] mm/readahead: Changes since v1
Posted by Zhiguo Zhou 2 weeks, 5 days ago
Hi all,

Changes since v1:
- Fixed lockdep_assert_held() usage (now passes &xa_lock)
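
The corrected assertion presumably looks like this (inferred from the
note above; 'mapping' is illustrative):

	lockdep_assert_held(&mapping->i_pages.xa_lock);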

Sorry for missing this in the v2 cover letter.

Thanks,
Zhiguo