[PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster order

Chris Li posted 3 patches 1 year, 5 months ago
There is a newer version of this series
Posted by Chris Li 1 year, 5 months ago
This is the short-term solution, "swap cluster order", listed on
slide 8 of my "Swap Abstraction" discussion at the recent
LSF/MM conference.

When commit 845982eb264bc "mm: swap: allow storage of all mTHP
orders" was introduced, it only allocated mTHP swap entries
from the new empty cluster list.  This causes a fragmentation issue
reported by Barry.

https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/

The reason is that all the empty clusters have been exhausted while
there are still plenty of free swap entries in clusters that are
not 100% free.

The fix is to remember the swap allocation order in each cluster and
to keep track of a per-order nonfull cluster list for later allocation.
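
As a rough illustration of the per-order nonfull list idea, here is a
minimal userspace sketch. It is not the kernel code: struct cluster,
struct swap_dev, NR_ORDERS, CLUSTER_SLOTS and all function names are
hypothetical, and it only counts entries, assuming every entry in a
cluster has the same order.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ORDERS     10   /* assumption: number of supported orders */
#define CLUSTER_SLOTS 512  /* assumption: swap entries per cluster */

/* Hypothetical stand-in for a swap cluster, not the kernel structure. */
struct cluster {
	struct cluster *prev, *next; /* doubly linked list links */
	unsigned int used;           /* swap entries currently in use */
	int order;                   /* order this cluster last served */
	int on_list;                 /* nonzero while linked on a list */
};

struct swap_dev {
	struct cluster free_head;          /* completely empty clusters */
	struct cluster nonfull[NR_ORDERS]; /* partially used, one list per order */
};

static void list_init(struct cluster *head) { head->prev = head->next = head; }
static int list_empty(const struct cluster *head) { return head->next == head; }

static void list_add_tail(struct cluster *head, struct cluster *c)
{
	c->prev = head->prev;
	c->next = head;
	head->prev->next = c;
	head->prev = c;
	c->on_list = 1;
}

static void list_del(struct cluster *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->prev = c->next = c;
	c->on_list = 0;
}

static void swap_dev_init(struct swap_dev *d, struct cluster *cl, size_t n)
{
	list_init(&d->free_head);
	for (int i = 0; i < NR_ORDERS; i++)
		list_init(&d->nonfull[i]);
	for (size_t i = 0; i < n; i++) {
		cl[i].used = 0;
		cl[i].order = 0;
		list_add_tail(&d->free_head, &cl[i]);
	}
}

/* Reserve 1 << order entries; NULL means the caller must fall back. */
static struct cluster *alloc_entries(struct swap_dev *d, int order)
{
	struct cluster *c;

	assert(order >= 0 && order < NR_ORDERS);
	if (!list_empty(&d->free_head)) {
		c = d->free_head.next;      /* empty cluster: adopt this order */
		c->order = order;
	} else if (!list_empty(&d->nonfull[order])) {
		c = d->nonfull[order].next; /* reuse a partially used cluster */
	} else {
		return NULL;
	}
	list_del(c);
	c->used += 1u << order;
	if (c->used < CLUSTER_SLOTS)
		list_add_tail(&d->nonfull[order], c);
	return c; /* a full cluster stays off every list */
}

/* Release 1 << c->order entries and requeue the cluster. */
static void free_entries(struct swap_dev *d, struct cluster *c)
{
	assert(c->used >= (1u << c->order));
	if (c->on_list)
		list_del(c);
	c->used -= 1u << c->order;
	list_add_tail(c->used ? &d->nonfull[c->order] : &d->free_head, c);
}
```

With two clusters, an order-2 request still succeeds after the free
list is empty as long as some cluster on nonfull[2] has room, while an
order-3 request fails until a cluster becomes completely free again.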

Patch 3 of this series gives SSD swap allocation
a new code path separate from the HDD allocation. The new allocator
uses the cluster lists only and no longer scans swap_map[] globally
without the lock.

This streamlines swap allocation for SSD. The code matches the execution
flow much better.

User impact: For users that allocate and free mixed-order mTHP swap entries,
it greatly improves the success rate of mTHP swap allocation after the
initial phase.

It also performs faster when the swapfile is close to full, because the
allocator can get a nonfull cluster from a list rather than scanning
a lot of swap_map entries.

This series still lacks the swap cache reclaim feature. The reclaim
patch series is under development and testing right now, and I will post it
to the mailing list soon. For this reason, patch 3 is considered RFC and not
ready to merge.

With Barry's mthp test program V2:

Without:
$ ./thp_swap_allocator_test -a
Iteration 1: swpout inc: 32, swpout fallback inc: 192, Fallback percentage: 85.71%
Iteration 2: swpout inc: 0, swpout fallback inc: 231, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 227, Fallback percentage: 100.00%
...
Iteration 98: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 215, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%

$ ./thp_swap_allocator_test -a -s
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%

$ ./thp_swap_allocator_test -s
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%

$ ./thp_swap_allocator_test
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%

With:
$ ./thp_swap_allocator_test -a
Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 3: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
...
Iteration 98: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 99: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 100: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%

$ ./thp_swap_allocator_test -a -s
Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 3: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 4: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 5: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 6: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 7: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 8: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 9: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 10: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 11: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 12: swpout inc: 232, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 13: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 14: swpout inc: 223, swpout fallback inc: 3, Fallback percentage: 1.33%
Iteration 15: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 16: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 17: swpout inc: 212, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 18: swpout inc: 234, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 19: swpout inc: 220, swpout fallback inc: 6, Fallback percentage: 2.65%
Iteration 20: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 21: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 22: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 23: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 24: swpout inc: 232, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 25: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 26: swpout inc: 230, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 27: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 28: swpout inc: 225, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 29: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 30: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 31: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 32: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 33: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 34: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 35: swpout inc: 230, swpout fallback inc: 3, Fallback percentage: 1.29%
Iteration 36: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 37: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 38: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 39: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 40: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 41: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 42: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 43: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 44: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 45: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 46: swpout inc: 221, swpout fallback inc: 2, Fallback percentage: 0.90%
Iteration 47: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 48: swpout inc: 220, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 49: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 50: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 51: swpout inc: 224, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 52: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 53: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 54: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 55: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 56: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 57: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 58: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 59: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 60: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 61: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 62: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 63: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 64: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 65: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 66: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 67: swpout inc: 220, swpout fallback inc: 2, Fallback percentage: 0.90%
Iteration 68: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 69: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 70: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 71: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 72: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 73: swpout inc: 218, swpout fallback inc: 5, Fallback percentage: 2.24%
Iteration 74: swpout inc: 223, swpout fallback inc: 5, Fallback percentage: 2.19%
Iteration 75: swpout inc: 222, swpout fallback inc: 7, Fallback percentage: 3.06%
Iteration 76: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 77: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 78: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 79: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89%
Iteration 80: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 81: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 82: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 83: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 84: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 85: swpout inc: 213, swpout fallback inc: 1, Fallback percentage: 0.47%
Iteration 86: swpout inc: 215, swpout fallback inc: 8, Fallback percentage: 3.59%
Iteration 87: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 88: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 89: swpout inc: 222, swpout fallback inc: 6, Fallback percentage: 2.63%
Iteration 90: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 91: swpout inc: 214, swpout fallback inc: 1, Fallback percentage: 0.47%
Iteration 92: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 93: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 94: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89%
Iteration 95: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 96: swpout inc: 223, swpout fallback inc: 4, Fallback percentage: 1.76%
Iteration 97: swpout inc: 223, swpout fallback inc: 7, Fallback percentage: 3.04%
Iteration 98: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 99: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 100: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%

$ ./thp_swap_allocator_test
Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 134, swpout fallback inc: 98, Fallback percentage: 42.24%
Iteration 3: swpout inc: 72, swpout fallback inc: 154, Fallback percentage: 68.14%
Iteration 4: swpout inc: 40, swpout fallback inc: 183, Fallback percentage: 82.06%
Iteration 5: swpout inc: 27, swpout fallback inc: 199, Fallback percentage: 88.05%
Iteration 6: swpout inc: 22, swpout fallback inc: 202, Fallback percentage: 90.18%
Iteration 7: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74%
Iteration 8: swpout inc: 14, swpout fallback inc: 214, Fallback percentage: 93.86%
Iteration 9: swpout inc: 5, swpout fallback inc: 221, Fallback percentage: 97.79%
Iteration 10: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%
...
Iteration 97: swpout inc: 12, swpout fallback inc: 207, Fallback percentage: 94.52%
Iteration 98: swpout inc: 8, swpout fallback inc: 219, Fallback percentage: 96.48%
Iteration 99: swpout inc: 16, swpout fallback inc: 218, Fallback percentage: 93.16%
Iteration 100: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%

$ ./thp_swap_allocator_test -s
Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 84, swpout fallback inc: 148, Fallback percentage: 63.79%
Iteration 3: swpout inc: 39, swpout fallback inc: 195, Fallback percentage: 83.33%
Iteration 4: swpout inc: 16, swpout fallback inc: 217, Fallback percentage: 93.13%
Iteration 5: swpout inc: 11, swpout fallback inc: 214, Fallback percentage: 95.11%
Iteration 6: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%
...
Iteration 96: swpout inc: 5, swpout fallback inc: 225, Fallback percentage: 97.83%
Iteration 97: swpout inc: 2, swpout fallback inc: 215, Fallback percentage: 99.08%
Iteration 98: swpout inc: 2, swpout fallback inc: 220, Fallback percentage: 99.10%
Iteration 99: swpout inc: 4, swpout fallback inc: 222, Fallback percentage: 98.23%
Iteration 100: swpout inc: 3, swpout fallback inc: 221, Fallback percentage: 98.66%
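
For reading the numbers above: the reported percentages are consistent
with fallback / (swpout + fallback), for example 192 / (32 + 192) =
85.71%. A minimal sketch of that arithmetic follows; the function name
is hypothetical and this is not taken from the test program's source.

```c
/*
 * Share of swap-out attempts that fell back, in percent.
 * Hypothetical helper inferred from the printed output.
 */
static double fallback_percentage(unsigned long swpout_inc,
				  unsigned long fallback_inc)
{
	unsigned long total = swpout_inc + fallback_inc;

	if (total == 0)
		return 0.0; /* no attempts recorded in this iteration */
	return 100.0 * (double)fallback_inc / (double)total;
}
```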

Kernel compile under tmpfs with cgroup memory.max = 2G.
12 cores, 24 hyperthreads, 32 jobs.

HDD swap 3 runs average, 20G swap file:

Without:
user	4186.290
system	421.743
real	597.317

With:
user	4113.897
system	413.123
real	659.543

SSD swap 10 runs average, 20G swap partition:

Without:
user	4736.810
system	500.921
real	250.243
 
With:
user	4729.478
system	500.265
real	249.633

Two zram swap devices: zram0 (1.4G) and zram1 (20G).
The idea is to force zram0 to become almost full and
then overflow to zram1:

Two zram 10 runs average:

Without:
user	4600.693
system	384.105
real	238.735

With:
user	4604.502
system	382.087
real	239.063
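
To compare the "real" times above, the relative change can be computed
as (with - without) / without. This is only a small arithmetic sketch;
the helper name is hypothetical.

```c
/* Relative change of a timing, in percent; positive means slower. */
static double relative_change_percent(double without, double with)
{
	return 100.0 * (with - without) / without;
}
```

Applied to the averages listed, this gives roughly +10.4% for HDD
(597.317 -> 659.543), -0.24% for SSD (250.243 -> 249.633) and +0.14%
for the two-zram setup (238.735 -> 239.063).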

Reported-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Chris Li <chrisl@kernel.org>
---
Changes in v4:
- Remove a warning in patch 2.
- Allocate from the free cluster list before the nonfull list, reverting the v3 behavior.
- Add cluster_index and cluster_offset functions.
- Patch 3 has a new allocation path for SSD.
- HDD swap allocation does not need to consider clusters any more.
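
For readers unfamiliar with the two helpers named above, here is one
plausible shape for them, as a userspace sketch only. The changelog
gives just the names; the bodies, the simplified structs, the
nr_clusters field and the SWAPFILE_CLUSTER value of 512 are assumptions
and may differ from the actual patch.

```c
#include <assert.h>
#include <stddef.h>

#define SWAPFILE_CLUSTER 512 /* assumption: swap entries per cluster */

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct swap_cluster_info {
	unsigned int count;
};

struct swap_info_struct {
	struct swap_cluster_info *cluster_info; /* array of all clusters */
	size_t nr_clusters;                     /* hypothetical field */
};

/* Position of a cluster within the device's cluster array. */
static size_t cluster_index(const struct swap_info_struct *si,
			    const struct swap_cluster_info *ci)
{
	assert(ci >= si->cluster_info &&
	       ci < si->cluster_info + si->nr_clusters);
	return (size_t)(ci - si->cluster_info);
}

/* Swap offset of the first entry covered by a cluster. */
static size_t cluster_offset(const struct swap_info_struct *si,
			     const struct swap_cluster_info *ci)
{
	return cluster_index(si, ci) * SWAPFILE_CLUSTER;
}
```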

Changes in v3:
- Use v1 as the base.
- Rename "next" to "list" for the list field, suggested by Ying.
- Update comment for the locking rules for cluster fields and list,
  suggested by Ying.
- Allocate from the nonfull list before attempting the free list, suggested
  by Kairui.
- Link to v2: https://lore.kernel.org/r/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org

Changes in v2:
- Abandoned.
- Link to v1: https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@kernel.org

---
Chris Li (3):
      mm: swap: swap cluster switch to double link list
      mm: swap: mTHP allocate swap entries from nonfull list
      RFC: mm: swap: seperate SSD allocation from scan_swap_map_slots()

 include/linux/swap.h |  30 ++--
 mm/swapfile.c        | 490 +++++++++++++++++++++++----------------------------
 2 files changed, 238 insertions(+), 282 deletions(-)
---
base-commit: ff3a648ecb9409aff1448cf4f6aa41d78c69a3bc
change-id: 20240523-swap-allocator-1534c480ece4

Best regards,
-- 
Chris Li <chrisl@kernel.org>

Re: [PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster order
Posted by Huang, Ying 1 year, 4 months ago
Chris Li <chrisl@kernel.org> writes:

> [...]
> ---
> Changes in v4:
> - Remove a warning in patch 2.
> - Allocating from the free cluster list before the nonfull list. Revert the v3 behavior.
> - Add cluster_index and cluster_offset function.
> - Patch 3 has a new allocating path for SSD.
> - HDD swap allocation does not need to consider clusters any more.

It appears that my comments in the following emails are ignored?

https://lore.kernel.org/linux-mm/87bk3pzr5p.fsf@yhuang6-desk2.ccr.corp.intel.com/
https://lore.kernel.org/linux-mm/874j9hzqr3.fsf@yhuang6-desk2.ccr.corp.intel.com/

> changes in v3:
> - Using V1 as base.
> - Rename "next" to "list" for the list field, suggested by Ying.
> - Update comment for the locking rules for cluster fields and list,
>   suggested by Ying.
> - Allocate from the nonfull list before attempting free list, suggested
>   by Kairui.
> - Link to v2: https://lore.kernel.org/r/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org
>
> Changes in v2:
> - Abandoned.
> - Link to v1: https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@kernel.org
>
> ---
> Chris Li (3):
>       mm: swap: swap cluster switch to double link list
>       mm: swap: mTHP allocate swap entries from nonfull list
>       RFC: mm: swap: seperate SSD allocation from scan_swap_map_slots()
>
>  include/linux/swap.h |  30 ++--
>  mm/swapfile.c        | 490 +++++++++++++++++++++++----------------------------
>  2 files changed, 238 insertions(+), 282 deletions(-)
> ---
> base-commit: ff3a648ecb9409aff1448cf4f6aa41d78c69a3bc
> change-id: 20240523-swap-allocator-1534c480ece4
>

--
Best Regards,
Huang, Ying
Re: [PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster order
Posted by Chris Li 1 year, 4 months ago
On Wed, Jul 17, 2024 at 10:54 PM Huang, Ying <ying.huang@intel.com> wrote:
> > - HDD swap allocation does not need to consider clusters any more.
>
> It appears that my comments in the following emails are ignored?
>

Sorry, I missed some emails while catching up.

> https://lore.kernel.org/linux-mm/87bk3pzr5p.fsf@yhuang6-desk2.ccr.corp.intel.com/

I will reply to that. BTW, v4 already reverts to the previous SSD
behavior and allocates from a new (free) cluster before a nonfull cluster.
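
For readers following the ordering discussed here, below is a toy
model of "free cluster list first, then the per-order nonfull list".
It is only a sketch and not the code in mm/swapfile.c: the class name,
method names and the use of plain integers as cluster ids are all
assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class ToyClusterAllocator:
    """Toy model only; integers stand in for kernel cluster structures."""

    def __init__(self, free_clusters, max_order: int):
        # Completely empty clusters, usable for any order.
        self.free = deque(free_clusters)
        # One list per order of clusters that still have room.
        self.nonfull = {order: deque() for order in range(max_order + 1)}

    def pick_cluster(self, order: int) -> Optional[int]:
        """Return a cluster id for an allocation of this order, or None."""
        if self.free:                    # v4 ordering: free list first
            return self.free.popleft()
        if self.nonfull[order]:          # then the nonfull list of that order
            return self.nonfull[order][0]
        return None                      # caller would fall back to order 0

    def mark_nonfull(self, cluster: int, order: int) -> None:
        """Remember that a partially used cluster serves this order."""
        self.nonfull[order].append(cluster)


alloc = ToyClusterAllocator(free_clusters=[0, 1], max_order=4)
first = alloc.pick_cluster(2)    # taken from the free list
alloc.mark_nonfull(first, 2)
second = alloc.pick_cluster(2)   # still prefers the remaining free cluster
alloc.mark_nonfull(second, 2)
third = alloc.pick_cluster(2)    # free list empty: reuse a nonfull cluster
missing = alloc.pick_cluster(3)  # nothing tracked for order 3
print(first, second, third, missing)
```

In this model an order-3 request fails once the free list is empty even
though order-2 clusters still have room, which is the case where the
swap-out falls back.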

> https://lore.kernel.org/linux-mm/874j9hzqr3.fsf@yhuang6-desk2.ccr.corp.intel.com/

Already replied to the renaming in another email.

Chris

>
> > changes in v3:
> > - Using V1 as base.
> > - Rename "next" to "list" for the list field, suggested by Ying.
> > - Update comment for the locking rules for cluster fields and list,
> >   suggested by Ying.
> > - Allocate from the nonfull list before attempting free list, suggested
> >   by Kairui.
> > - Link to v2: https://lore.kernel.org/r/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org
> >
> > Changes in v2:
> > - Abandoned.
> > - Link to v1: https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@kernel.org
> >
> > ---
> > Chris Li (3):
> >       mm: swap: swap cluster switch to double link list
> >       mm: swap: mTHP allocate swap entries from nonfull list
> >       RFC: mm: swap: seperate SSD allocation from scan_swap_map_slots()
> >
> >  include/linux/swap.h |  30 ++--
> >  mm/swapfile.c        | 490 +++++++++++++++++++++++----------------------------
> >  2 files changed, 238 insertions(+), 282 deletions(-)
> > ---
> > base-commit: ff3a648ecb9409aff1448cf4f6aa41d78c69a3bc
> > change-id: 20240523-swap-allocator-1534c480ece4
> >
>
> --
> Best Regards,
> Huang, Ying
Re: [PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster order
Posted by Ryan Roberts 1 year, 5 months ago
On 11/07/2024 08:29, Chris Li wrote:
> This is the short term solutions "swap cluster order" listed
> in my "Swap Abstraction" discussion slice 8 in the recent
> LSF/MM conference.
> 
> When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> orders" is introduced, it only allocates the mTHP swap entries
> from the new empty cluster list.  It has a fragmentation issue
> reported by Barry.
> 
> https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
> 
> The reason is that all the empty clusters have been exhausted while
> there are plenty of free swap entries in the cluster that are
> not 100% free.
> 
> Remember the swap allocation order in the cluster.
> Keep track of the per order non full cluster list for later allocation.
> 
> The patch 3 of this series gives the swap SSD allocation
> a new separate code path from the HDD allocation. The new allocator
> use cluster list only and do not global scan swap_map[] without lock
> any more.
> 
> This streamline the swap allocation for SSD. The code matches the execution
> flow much better.
> 
> User impact: For users that allocate and free mix order mTHP swapping,
> It greatly improves the success rate of the mTHP swap allocation after the
> initial phase.
> 
> It also performs faster when the swapfile is close to full, because the
> allocator can get the non full cluster from a list rather than scanning
> a lot of swap_map entries. 
> 
> This series still lacks the swap cache reclaim feature. The reclaim series
> of patches are under development and testing right now. Will post the
> mail list soon. For this reason, the patch 3 is consider RFC and not
> ready to merge.
> 
> With Barry's mthp test program V2:
> 
> Without:
> $ ./thp_swap_allocator_test -a
> Iteration 1: swpout inc: 32, swpout fallback inc: 192, Fallback percentage: 85.71%
> Iteration 2: swpout inc: 0, swpout fallback inc: 231, Fallback percentage: 100.00%
> Iteration 3: swpout inc: 0, swpout fallback inc: 227, Fallback percentage: 100.00%
> ...
> Iteration 98: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
> Iteration 99: swpout inc: 0, swpout fallback inc: 215, Fallback percentage: 100.00%
> Iteration 100: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
> 
> $ ./thp_swap_allocator_test -a -s
> Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
> Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
> Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
> ..
> Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
> Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
> Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%
> 
> $ ./thp_swap_allocator_test -s
> Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
> Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
> Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
> ..
> Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
> Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
> Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%
> 
> $ ./thp_swap_allocator_test
> Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
> Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
> Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
> ..
> Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
> Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
> Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%
> 
> With:
> $ ./thp_swap_allocator_test -a
> Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 2: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 3: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
> ...
> Iteration 98: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 99: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 100: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
> 
> $ ./thp_swap_allocator_test -a -s
> Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 2: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 3: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 4: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 5: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 6: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 7: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 8: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 9: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 10: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 11: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 12: swpout inc: 232, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 13: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 14: swpout inc: 223, swpout fallback inc: 3, Fallback percentage: 1.33%
> Iteration 15: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 16: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 17: swpout inc: 212, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 18: swpout inc: 234, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 19: swpout inc: 220, swpout fallback inc: 6, Fallback percentage: 2.65%
> Iteration 20: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 21: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 22: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 23: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 24: swpout inc: 232, swpout fallback inc: 1, Fallback percentage: 0.43%
> Iteration 25: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 26: swpout inc: 230, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 27: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 28: swpout inc: 225, swpout fallback inc: 1, Fallback percentage: 0.44%
> Iteration 29: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
> Iteration 30: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
> Iteration 31: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 32: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 33: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 34: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
> Iteration 35: swpout inc: 230, swpout fallback inc: 3, Fallback percentage: 1.29%
> Iteration 36: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 37: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 38: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 39: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
> Iteration 40: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 41: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 42: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 43: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 44: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 45: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 46: swpout inc: 221, swpout fallback inc: 2, Fallback percentage: 0.90%
> Iteration 47: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 48: swpout inc: 220, swpout fallback inc: 1, Fallback percentage: 0.45%
> Iteration 49: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 50: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 51: swpout inc: 224, swpout fallback inc: 2, Fallback percentage: 0.88%
> Iteration 52: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 53: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 54: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 55: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 56: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
> Iteration 57: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 58: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 59: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
> Iteration 60: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
> Iteration 61: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 62: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 63: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 64: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 65: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 66: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 67: swpout inc: 220, swpout fallback inc: 2, Fallback percentage: 0.90%
> Iteration 68: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 69: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 70: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 71: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 72: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 73: swpout inc: 218, swpout fallback inc: 5, Fallback percentage: 2.24%
> Iteration 74: swpout inc: 223, swpout fallback inc: 5, Fallback percentage: 2.19%
> Iteration 75: swpout inc: 222, swpout fallback inc: 7, Fallback percentage: 3.06%
> Iteration 76: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 77: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 78: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 79: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89%
> Iteration 80: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
> Iteration 81: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 82: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 83: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
> Iteration 84: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 85: swpout inc: 213, swpout fallback inc: 1, Fallback percentage: 0.47%
> Iteration 86: swpout inc: 215, swpout fallback inc: 8, Fallback percentage: 3.59%
> Iteration 87: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
> Iteration 88: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 89: swpout inc: 222, swpout fallback inc: 6, Fallback percentage: 2.63%
> Iteration 90: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
> Iteration 91: swpout inc: 214, swpout fallback inc: 1, Fallback percentage: 0.47%
> Iteration 92: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 93: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 94: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89%
> Iteration 95: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
> Iteration 96: swpout inc: 223, swpout fallback inc: 4, Fallback percentage: 1.76%
> Iteration 97: swpout inc: 223, swpout fallback inc: 7, Fallback percentage: 3.04%
> Iteration 98: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44%
> Iteration 99: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
> Iteration 100: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%

Great results!
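
As a reading aid for the quoted log: the "Fallback percentage" column
appears to be the fallback count divided by the sum of both counters.
That formula is inferred from the printed numbers, not taken from the
source of Barry's test program, and the function name below is made up
for this sketch.

```python
def fallback_percentage(swpout: int, fallback: int) -> float:
    """Share of mTHP swap-out attempts that fell back, in percent."""
    total = swpout + fallback
    if total == 0:
        return 0.0
    return 100.0 * fallback / total


# (swpout inc, swpout fallback inc) pairs copied from the "-a -s" log above.
samples = {14: (223, 3), 19: (220, 6), 86: (215, 8)}
for iteration, (swpout, fallback) in samples.items():
    pct = fallback_percentage(swpout, fallback)
    print(f"Iteration {iteration}: {pct:.2f}%")
```

The three values reproduce the 1.33%, 2.65% and 3.59% shown for those
iterations in the log.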

> 
> $ ./thp_swap_allocator_test      
> Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 2: swpout inc: 134, swpout fallback inc: 98, Fallback percentage: 42.24%
> Iteration 3: swpout inc: 72, swpout fallback inc: 154, Fallback percentage: 68.14%
> Iteration 4: swpout inc: 40, swpout fallback inc: 183, Fallback percentage: 82.06%
> Iteration 5: swpout inc: 27, swpout fallback inc: 199, Fallback percentage: 88.05%
> Iteration 6: swpout inc: 22, swpout fallback inc: 202, Fallback percentage: 90.18%
> Iteration 7: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74%
> Iteration 8: swpout inc: 14, swpout fallback inc: 214, Fallback percentage: 93.86%
> Iteration 9: swpout inc: 5, swpout fallback inc: 221, Fallback percentage: 97.79%
> Iteration 10: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%
> ...
> Iteration 97: swpout inc: 12, swpout fallback inc: 207, Fallback percentage: 94.52%
> Iteration 98: swpout inc: 8, swpout fallback inc: 219, Fallback percentage: 96.48%
> Iteration 99: swpout inc: 16, swpout fallback inc: 218, Fallback percentage: 93.16%
> Iteration 100: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%
> 
> $ ./thp_swap_allocator_test -s
> Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
> Iteration 2: swpout inc: 84, swpout fallback inc: 148, Fallback percentage: 63.79%
> Iteration 3: swpout inc: 39, swpout fallback inc: 195, Fallback percentage: 83.33%
> Iteration 4: swpout inc: 16, swpout fallback inc: 217, Fallback percentage: 93.13%
> Iteration 5: swpout inc: 11, swpout fallback inc: 214, Fallback percentage: 95.11%
> Iteration 6: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%
> ...
> Iteration 96: swpout inc: 5, swpout fallback inc: 225, Fallback percentage: 97.83%
> Iteration 97: swpout inc: 2, swpout fallback inc: 215, Fallback percentage: 99.08%
> Iteration 98: swpout inc: 2, swpout fallback inc: 220, Fallback percentage: 99.10%
> Iteration 99: swpout inc: 4, swpout fallback inc: 222, Fallback percentage: 98.23%
> Iteration 100: swpout inc: 3, swpout fallback inc: 221, Fallback percentage: 98.66%
> 
> Kernel compile under tmpfs with cgroup memory.max = 2G.
> 12 core 24 hyperthreading, 32 jobs.
> 
> HDD swap 3 runs average, 20G swap file:
> 
> Without:
> user	4186.290
> system	421.743
> real	597.317
> 
> With:
> user	4113.897
> system	413.123
> real	659.543

If I've understood this correctly, this test is taking ~10% longer in wall time?
But your changes shouldn't affect the HDD swap path? So what's the reason for this?

I'm hoping to review this properly next week. It would be great to get this in
sooner rather than later IMHO.

Thanks,
Ryan

> 
> SSD swap 10 runs average, 20G swap partition:
> 
> Without:
> user	4736.810
> system	500.921
> real	250.243
>  
> With:
> user	4729.478
> system	500.265
> real	249.633
> 
> Two zram swap:
> zram0 1.4G zram1 20G.
> The idea is forcing the zram0 almost
> full then overflow to zram1:
> 
> Two zram 10 runs average:
> 
> Without:
> user	4600.693
> system	384.105
> real	238.735
> 
> With:
> user	4604.502
> system	382.087
> real	239.063
> 
> Reported-by: Barry Song <21cnbao@gmail.com>
> Signed-off-by: Chris Li <chrisl@kernel.org>
> ---
> Changes in v4:
> - Remove a warning in patch 2.
> - Allocating from the free cluster list before the nonfull list. Revert the v3 behavior.
> - Add cluster_index and cluster_offset function.
> - Patch 3 has a new allocating path for SSD.
> - HDD swap allocation does not need to consider clusters any more.
> 
> Changes in v3:
> - Using V1 as base.
> - Rename "next" to "list" for the list field, suggested by Ying.
> - Update comment for the locking rules for cluster fields and list,
>   suggested by Ying.
> - Allocate from the nonfull list before attempting free list, suggested
>   by Kairui.
> - Link to v2: https://lore.kernel.org/r/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org
> 
> Changes in v2:
> - Abandoned.
> - Link to v1: https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@kernel.org
> 
> ---
> Chris Li (3):
>       mm: swap: swap cluster switch to double link list
>       mm: swap: mTHP allocate swap entries from nonfull list
>       RFC: mm: swap: seperate SSD allocation from scan_swap_map_slots()
> 
>  include/linux/swap.h |  30 ++--
>  mm/swapfile.c        | 490 +++++++++++++++++++++++----------------------------
>  2 files changed, 238 insertions(+), 282 deletions(-)
> ---
> base-commit: ff3a648ecb9409aff1448cf4f6aa41d78c69a3bc
> change-id: 20240523-swap-allocator-1534c480ece4
> 
> Best regards,

Re: [PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster order
Posted by Chris Li 1 year, 5 months ago
On Thu, Jul 11, 2024 at 3:02 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> > Kernel compile under tmpfs with cgroup memory.max = 2G.
> > 12 core 24 hyperthreading, 32 jobs.
> >
> > HDD swap 3 runs average, 20G swap file:
> >
> > Without:
> > user  4186.290
> > system        421.743
> > real  597.317
> >
> > With:
> > user  4113.897
> > system        413.123
> > real  659.543
>
> If I've understood this correctly, this test is taking ~10% longer in wall time?

Most likely it is due to the high variance in the measurement and the
smaller number of samples (3 runs vs 10). Most of that wall time is
waiting for IO. It is likely just noise.
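
A rough illustration of the sample-count point, under the assumption
that runs are independent and have the same per-run standard deviation
(the per-run times are not posted in this thread, so sigma below is a
placeholder, not a measurement):

```python
import math


def standard_error(sigma: float, runs: int) -> float:
    """Standard error of the mean of `runs` independent measurements."""
    return sigma / math.sqrt(runs)


sigma = 1.0  # placeholder per-run standard deviation, arbitrary units
ratio = standard_error(sigma, 3) / standard_error(sigma, 10)
print(f"3-run average is {ratio:.2f}x noisier than a 10-run average")
```

So even with identical per-run spread, a 3-run average carries about
1.8 times the uncertainty of a 10-run average; a larger spread on HDD
would widen that further.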

> But your changes shouldn't affect HDD swap path? So what's the reason for this?

The change did affect the HDD swap path, in the sense that it no longer
needs to check si->cluster_info. That is a small gain.

The wall clock time is more than double that of SSD or zram, which
means the system spends most of its time waiting for HDD IO to complete
(wait is 98%), so there will be much higher variance for sure. At this
point the wall clock is mostly measuring the wait, not the actual work.
The system time is lower, which is good.
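
For reference, the relative changes in the quoted HDD numbers can be
recomputed as below. The figures are copied from the cover letter; the
helper name is made up for this sketch.

```python
def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, in percent of `before`."""
    return 100.0 * (after - before) / before


# HDD swap, 3-run averages in seconds: (without patches, with patches)
hdd = {
    "user": (4186.290, 4113.897),
    "system": (421.743, 413.123),
    "real": (597.317, 659.543),
}
for name, (without, with_patches) in hdd.items():
    print(f"{name}: {relative_change(without, with_patches):+.2f}%")
```

This prints about -1.73% for user, -2.04% for system and +10.42% for
real, i.e. the CPU time goes down slightly while the wall time goes up.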

I now have a dedicated machine to run the HDD swap tests. The HDD is
very slow to swap. The point of the HDD test is to be able to complete
the run without OOM. Because of the high latency of the HDD, there is
more memory pressure. It did catch some other bugs in my internal
version of the patch.

> I'm hoping to review this properly next week. It would be great to get this in
> sooner rather than later IMHO.

Thank you. This new code path is much easier to work with than the
previous mixed SSD and HDD allocation path. I was able to implement the
cluster reservation experiment in the new allocator much more quickly.

Chris
Re: [PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster order
Posted by Ryan Roberts 1 year, 5 months ago
On 11/07/2024 15:08, Chris Li wrote:
> On Thu, Jul 11, 2024 at 3:02 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>>> Kernel compile under tmpfs with cgroup memory.max = 2G.
>>> 12 core 24 hyperthreading, 32 jobs.
>>>
>>> HDD swap 3 runs average, 20G swap file:
>>>
>>> Without:
>>> user  4186.290
>>> system        421.743
>>> real  597.317
>>>
>>> With:
>>> user  4113.897
>>> system        413.123
>>> real  659.543
>>
>> If I've understood this correctly, this test is taking ~10% longer in wall time?
> 
> Most likely due to the high variance in measurement and fewer
> measuring samples 3 vs 10. Most of that wall time is waiting for IO.
> It is likely just noise.

OK, that certainly makes sense, as long as you're sure it's noise. The other
(unlikely) possibility is that somehow the HDD placement decisions are
changing, which increases waiting due to increased seek times.

> 
>> But your changes shouldn't affect HDD swap path? So what's the reason for this?
> 
> The change did affect HDD swap path in the sense that it did not need
> to check for si->cluster_info any more. A small gain there.
> 
> The wall clock time is more than double the SSD or zram. Which means
> most of the time the system is waiting for HDD IO to complete (wait is
> 98%) , there will be much higher variance for sure. At this point the
> wall clock we are measuring the wait mostly,  not the actual work. The
> system time is quicker, that is good.
> 
> I now have a dedicated machine to run the HDD swap now. The HDD is
> very very slow to swap. The point of the HDD test is being able to
> complete the run without OOM. Because of the high latency in HDD,
> there will be more memory pressure. It did catch some other bugs in my
> internal version of the patch.
> 
>> I'm hoping to review this properly next week. It would be great to get this in
>> sooner rather than later IMHO.
> 
> Thank you. This new code path is much easier to work with than the
> previous SSD and HDD mixed allocation path. I am able to implement the
> cluster reservation experiment in the new allocator much quicker.
> 
> Chris

Re: [PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster order
Posted by Chris Li 1 year, 5 months ago
On Mon, Jul 15, 2024 at 7:10 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 11/07/2024 15:08, Chris Li wrote:
> > On Thu, Jul 11, 2024 at 3:02 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >>> Kernel compile under tmpfs with cgroup memory.max = 2G.
> >>> 12 core 24 hyperthreading, 32 jobs.
> >>>
> >>> HDD swap 3 runs average, 20G swap file:
> >>>
> >>> Without:
> >>> user  4186.290
> >>> system        421.743
> >>> real  597.317
> >>>
> >>> With:
> >>> user  4113.897
> >>> system        413.123
> >>> real  659.543
> >>
> >> If I've understood this correctly, this test is taking ~10% longer in wall time?
> >
> > Most likely due to the high variance in measurement and fewer
> > measuring samples 3 vs 10. Most of that wall time is waiting for IO.
> > It is likely just noise.
>
> OK, that certainly makes sense, as long as you're sure it's noise. The other
> (unlikely) possibility is that somehow the HDD placement decisions are
> changing, which increases waiting due to increased seek times.

I certainly did not change the HDD placement; if the HDD allocation is
different from the previous code, that would be a bug.
I mostly removed the cluster code path from the HDD swap entry allocation.

I did the HDD run mostly to make sure that HDD swap can still survive a
swap stress test without crashing.

Chris