[PATCH 0/2] mm, swap: improve cluster scan strategy

Kairui Song posted 2 patches 2 months ago
include/linux/swap.h |  1 -
mm/swapfile.c        | 68 +++++++++++++++++++++++---------------------
2 files changed, 36 insertions(+), 33 deletions(-)
Posted by Kairui Song 2 months ago
From: Kairui Song <kasong@tencent.com>

This series improves large allocation performance and reduces the
failure rate. Thorough testing showed that some design choices in the
cluster allocator could be improved.

For example, a kernel build test with make -j96, 10G ZRAM, and 64kB
mTHP enabled shows better performance and a lower failure rate:

Before: sys time: 10230.22s  64kB/swpout: 1793044  64kB/swpout_fallback: 17653
After:  sys time: 5538.3s    64kB/swpout: 1813133  64kB/swpout_fallback: 0

System time is cut in half, and the failure rate drops to zero. Larger
allocations in a hybrid workload also showed a major improvement:

512kB swap failure rate:
Before: swpout:11971  swpout_fallback:2218
After:  swpout:14606  swpout_fallback:4

2M swap failure rate:
Before: swpout:12     swpout_fallback:1578
After:  swpout:1253   swpout_fallback:15
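
As a rough way to read the counters above, the failure rate can be
computed as swpout_fallback / (swpout + swpout_fallback). This is a
minimal sketch, assuming each large swap-out attempt increments exactly
one of the two counters; the function name and the results table are
illustrative and not part of the series:

```python
def fallback_rate(swpout: int, swpout_fallback: int) -> float:
    """Fraction of large swap-out attempts that fell back to splitting.

    Assumption: every attempt increments exactly one of the two counters,
    so the total number of attempts is their sum.
    """
    attempts = swpout + swpout_fallback
    return swpout_fallback / attempts if attempts else 0.0


# (swpout, swpout_fallback) pairs copied from the numbers quoted above.
results = {
    "64kB": {"before": (1793044, 17653), "after": (1813133, 0)},
    "512kB": {"before": (11971, 2218), "after": (14606, 4)},
    "2M": {"before": (12, 1578), "after": (1253, 15)},
}

for size, runs in results.items():
    before = fallback_rate(*runs["before"])
    after = fallback_rate(*runs["after"])
    print(f"{size}: fallback rate {before:.2%} -> {after:.2%}")
```

Under that assumption, the 2M case goes from nearly every attempt
falling back to only a small fraction of attempts falling back.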

Kairui Song (2):
  mm, swap: don't scan every fragment cluster
  mm, swap: prefer nonfull over free clusters

 include/linux/swap.h |  1 -
 mm/swapfile.c        | 68 +++++++++++++++++++++++---------------------
 2 files changed, 36 insertions(+), 33 deletions(-)

-- 
2.50.1
Re: [PATCH 0/2] mm, swap: improve cluster scan strategy
Posted by Chris Li 2 months ago
On Mon, Aug 4, 2025 at 10:24 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> This series improves large allocation performance and reduces the
> failure rate. Thorough testing showed that some design choices in the
> cluster allocator could be improved.

Nit: If you send a next version of this series, please include a bit
of detail on how you get the improvement, to kick off the discussion.
Right now the cover letter just says "I have some cool changes and here
are the numbers." For example, mention limiting the fragment list
search to the first cluster.

>
> For example, a kernel build test with make -j96, 10G ZRAM, and 64kB
> mTHP enabled shows better performance and a lower failure rate:
>
> Before: sys time: 10230.22s  64kB/swpout: 1793044  64kB/swpout_fallback: 17653
> After:  sys time: 5538.3s    64kB/swpout: 1813133  64kB/swpout_fallback: 0
>
> System time is cut in half, and the failure rate drops to zero. Larger
> allocations in a hybrid workload also showed a major improvement:

That is a big improvement. Congrats.

>
> 512kB swap failure rate:
> Before: swpout:11971  swpout_fallback:2218
> After:  swpout:14606  swpout_fallback:4
>
> 2M swap failure rate:
> Before: swpout:12     swpout_fallback:1578
> After:  swpout:1253   swpout_fallback:15

The numbers look very good.

Chris