Currently, ARCH_WANTS_THP_SWAP is limited to 4K page size ARM64 kernels, but
large folios that require swapping also exist in other page size configurations
(e.g. 64K). Without this option, large folios in those kernels cannot be
swapped out.
Here we enable ARCH_WANTS_THP_SWAP for all ARM64 page sizes.
Signed-off-by: Weilin Tong <tongweilin@linux.alibaba.com>
---
arch/arm64/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 93173f0a09c7..58f7b4405f81 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -120,7 +120,7 @@ config ARM64
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANTS_EXECMEM_LATE
select ARCH_WANTS_NO_INSTR
- select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
+ select ARCH_WANTS_THP_SWAP
select ARCH_HAS_UBSAN
select ARM_AMBA
select ARM_ARCH_TIMER
--
2.43.7
On Fri, Dec 26, 2025 at 7:39 PM Weilin Tong
<tongweilin@linux.alibaba.com> wrote:
>
> Currently, ARCH_WANTS_THP_SWAP is limited to 4K page size ARM64 kernels, but
> large folios that require swapping also exist in other page size configurations
> (e.g. 64K). Without this option, large folios in those kernels cannot be
> swapped out.
>
> Here we enable ARCH_WANTS_THP_SWAP for all ARM64 page sizes.
I no longer recall why this was not enabled for sizes other than
4 KB in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64"), but
it appears to be fine, and the swap cluster size should also be
more friendly to PMD alignment.
#ifdef CONFIG_THP_SWAP
#define SWAPFILE_CLUSTER HPAGE_PMD_NR
#define swap_entry_order(order) (order)
#else
#define SWAPFILE_CLUSTER 256
#define swap_entry_order(order) 0
#endif
>
> Signed-off-by: Weilin Tong <tongweilin@linux.alibaba.com>
> ---
> arch/arm64/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 93173f0a09c7..58f7b4405f81 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -120,7 +120,7 @@ config ARM64
> select ARCH_WANT_LD_ORPHAN_WARN
> select ARCH_WANTS_EXECMEM_LATE
> select ARCH_WANTS_NO_INSTR
> - select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> + select ARCH_WANTS_THP_SWAP
> select ARCH_HAS_UBSAN
> select ARM_AMBA
> select ARM_ARCH_TIMER
> --
> 2.43.7
Thanks
Barry
On Fri, Dec 26, 2025 at 07:52:44PM +1300, Barry Song wrote:
> On Fri, Dec 26, 2025 at 7:39 PM Weilin Tong
> <tongweilin@linux.alibaba.com> wrote:
> >
> > Currently, ARCH_WANTS_THP_SWAP is limited to 4K page size ARM64 kernels, but
> > large folios that require swapping also exist in other page size configurations
> > (e.g. 64K). Without this option, large folios in those kernels cannot be
> > swapped out.
> >
> > Here we enable ARCH_WANTS_THP_SWAP for all ARM64 page sizes.
>
> I no longer recall why this was not enabled for sizes other than
> 4 KB in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64"), but
> it appears to be fine, and the swap cluster size should also be
> more friendly to PMD alignment.
You seemed to be worried about I/O latency in your original post:
https://lore.kernel.org/all/20220524071403.128644-1-21cnbao@gmail.com/
Will
On Fri, Jan 9, 2026 at 7:29 AM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Dec 26, 2025 at 07:52:44PM +1300, Barry Song wrote:
> > On Fri, Dec 26, 2025 at 7:39 PM Weilin Tong
> > <tongweilin@linux.alibaba.com> wrote:
> > >
> > > Currently, ARCH_WANTS_THP_SWAP is limited to 4K page size ARM64 kernels, but
> > > large folios that require swapping also exist in other page size configurations
> > > (e.g. 64K). Without this option, large folios in those kernels cannot be
> > > swapped out.
> > >
> > > Here we enable ARCH_WANTS_THP_SWAP for all ARM64 page sizes.
> >
> > I no longer recall why this was not enabled for sizes other than
> > 4 KB in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64"), but
> > it appears to be fine, and the swap cluster size should also be
> > more friendly to PMD alignment.
>
> You seemed to be worried about I/O latency in your original post:
>
> https://lore.kernel.org/all/20220524071403.128644-1-21cnbao@gmail.com/
Will, thanks for pointing this out! With a 16KB page size, a PMD
covers 32MB; with 64KB pages, a PMD covers 512MB. So, Weilin, are
we ready to wait for 32MB or 512MB to be written out before
memory can be reclaimed? By splitting, we can reclaim memory
earlier while only part of it has been swapped out.
While splitting down to order-0 is not ideal, splitting to a
relatively larger order appears to strike a balance between I/O
latency and swap performance. Anyway, I don't know :-)
Thanks
Barry
On 2026/1/9 07:11, Barry Song wrote:
> On Fri, Jan 9, 2026 at 7:29 AM Will Deacon <will@kernel.org> wrote:
>> On Fri, Dec 26, 2025 at 07:52:44PM +1300, Barry Song wrote:
>>> On Fri, Dec 26, 2025 at 7:39 PM Weilin Tong
>>> <tongweilin@linux.alibaba.com> wrote:
>>>> Currently, ARCH_WANTS_THP_SWAP is limited to 4K page size ARM64 kernels, but
>>>> large folios that require swapping also exist in other page size configurations
>>>> (e.g. 64K). Without this option, large folios in those kernels cannot be
>>>> swapped out.
>>>>
>>>> Here we enable ARCH_WANTS_THP_SWAP for all ARM64 page sizes.
>>> I no longer recall why this was not enabled for sizes other than
>>> 4 KB in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64"), but
>>> it appears to be fine, and the swap cluster size should also be
>>> more friendly to PMD alignment.
>> You seemed to be worried about I/O latency in your original post:
>>
>> https://lore.kernel.org/all/20220524071403.128644-1-21cnbao@gmail.com/
> Will, thanks for pointing this out! With a 16KB page size, a PMD
> covers 32MB; with 64KB pages, a PMD covers 512MB. So, Weilin, are
> we ready to wait for 32MB or 512MB to be written out before
> memory can be reclaimed? By splitting, we can reclaim memory
> earlier while only part of it has been swapped out.
I get your point. In our production environments, which run 64K page
size kernels, we only enable mTHP sizes of 2M and below, so swapping
out as a whole is the better approach. Alternatively, we could make
SWAPFILE_CLUSTER configurable per architecture.
I will run some tests to evaluate this concern.
Thanks a lot.
> While splitting down to order-0 is not ideal, splitting to a
> relatively larger order appears to strike a balance between I/O
> latency and swap performance. Anyway, I don't know :-)
>
> Thanks
> Barry
On Fri, Jan 9, 2026 at 4:32 PM Weilin Tong <tongweilin@linux.alibaba.com> wrote:
>
>
> On 2026/1/9 07:11, Barry Song wrote:
> > On Fri, Jan 9, 2026 at 7:29 AM Will Deacon <will@kernel.org> wrote:
> >> On Fri, Dec 26, 2025 at 07:52:44PM +1300, Barry Song wrote:
> >>> On Fri, Dec 26, 2025 at 7:39 PM Weilin Tong
> >>> <tongweilin@linux.alibaba.com> wrote:
> >>>> Currently, ARCH_WANTS_THP_SWAP is limited to 4K page size ARM64 kernels, but
> >>>> large folios that require swapping also exist in other page size configurations
> >>>> (e.g. 64K). Without this option, large folios in those kernels cannot be
> >>>> swapped out.
> >>>>
> >>>> Here we enable ARCH_WANTS_THP_SWAP for all ARM64 page sizes.
> >>> I no longer recall why this was not enabled for sizes other than
> >>> 4 KB in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64"), but
> >>> it appears to be fine, and the swap cluster size should also be
> >>> more friendly to PMD alignment.
> >> You seemed to be worried about I/O latency in your original post:
> >>
> >> https://lore.kernel.org/all/20220524071403.128644-1-21cnbao@gmail.com/
> > Will, thanks for pointing this out! With a 16KB page size, a PMD
> > covers 32MB; with 64KB pages, a PMD covers 512MB. So, Weilin, are
> > we ready to wait for 32MB or 512MB to be written out before
> > memory can be reclaimed? By splitting, we can reclaim memory
> > earlier while only part of it has been swapped out.
>
> I get your point. In our production environments, which run 64K page
> size kernels, we only enable mTHP sizes of 2M and below,

If mTHP is enabled only for sizes below 2 MB, the patch makes
perfect sense. However, the problem is that we do not know how
others configure their systems.

>
> so swapping out as a whole is the better approach. Alternatively, we
> could make SWAPFILE_CLUSTER configurable per architecture.
Even for 512 MB or 32 MB PMD folios, it would be perfectly fine
for SWAPFILE_CLUSTER to match the PMD folio size, given the
assumption that the swap table should be PAGE_SIZE.
>
> I will do some tests of this concern.
Right. It would be helpful to have some test data—for example,
with larger folios like 16 MB, 32 MB, or 64 MB—to see what
happens when memory reclamation kicks in.
One possible option is to call
split_huge_page_to_list_to_order(&folio->page, list,
get_order(SZ_2M));
for paging out. But this looks rather ugly :-)
On the other hand, if users configure mTHP to, for example,
128 MB, swapping out and reclaiming the entire 128 MB folio
could actually help with memory de-fragmentation.
So perhaps users should tolerate the I/O latency in this
case?
>
> Thanks a lot.
>
> > While splitting down to order-0 is not ideal, splitting to a
> > relatively larger order appears to strike a balance between I/O
> > latency and swap performance. Anyway, I don't know :-)
> >
Thanks
Barry
On 2025/12/26 14:52, Barry Song wrote:
> On Fri, Dec 26, 2025 at 7:39 PM Weilin Tong
> <tongweilin@linux.alibaba.com> wrote:
>> Currently, ARCH_WANTS_THP_SWAP is limited to 4K page size ARM64 kernels, but
>> large folios that require swapping also exist in other page size configurations
>> (e.g. 64K). Without this option, large folios in those kernels cannot be
>> swapped out.
>>
>> Here we enable ARCH_WANTS_THP_SWAP for all ARM64 page sizes.
> I no longer recall why this was not enabled for sizes other than
> 4 KB in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64"), but
> it appears to be fine, and the swap cluster size should also be
> more friendly to PMD alignment.
>
>
> #ifdef CONFIG_THP_SWAP
> #define SWAPFILE_CLUSTER HPAGE_PMD_NR
> #define swap_entry_order(order) (order)
> #else
> #define SWAPFILE_CLUSTER 256
> #define swap_entry_order(order) 0
> #endif
Thank you very much for taking the time to review this patch during the
holiday. Wishing you a happy holiday as well!
I appreciate you pointing out this optimization. We initially noticed
the issue because, on ARM64 kernels with 64K page size, large folios
used in shmem cannot be swapped out as a whole during shmem_writeout()
due to the config limitation and are forced to split instead, which is
something we wanted to avoid.
It seems that this change will enable better swap behavior for large
folios.
Thank you again for your feedback!
>> Signed-off-by: Weilin Tong <tongweilin@linux.alibaba.com>
>> ---
>> arch/arm64/Kconfig | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 93173f0a09c7..58f7b4405f81 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -120,7 +120,7 @@ config ARM64
>> select ARCH_WANT_LD_ORPHAN_WARN
>> select ARCH_WANTS_EXECMEM_LATE
>> select ARCH_WANTS_NO_INSTR
>> - select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
>> + select ARCH_WANTS_THP_SWAP
>> select ARCH_HAS_UBSAN
>> select ARM_AMBA
>> select ARM_ARCH_TIMER
>> --
>> 2.43.7
> Thanks
> Barry