[v1] mm: support large folios swap-in

[PATCH RFC 0/6] mm: support large folios swap-in

Posted by Barry Song 2 years ago

On an embedded system like Android, more than half of anon memory is actually
in swap devices such as zRAM. For example, while an app is switched to back-
ground, its most memory might be swapped-out.

Now we have mTHP features, unfortunately, if we don't support large folios
swap-in, once those large folios are swapped-out, we immediately lose the 
performance gain we can get through large folios and hardware optimization
such as CONT-PTE.

In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
before swap-out, if some memory were normal pages, but when swapping in, we
can also swap-in them as large folios. But this might require I/O happen at
some random places in swap devices. So we limit the large folios swap-in to
those areas which were large folios before swapping-out, aka, swaps are also
contiguous in hardware. On the other hand, in OPPO's product, we've deployed
anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
compress and decompress large folios as a whole, which help improve compression
ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
save large objects whose original size are 64KiB for example. So it is also a
better choice for us to only swap-in large folios for those compressed large
objects as a large folio can be decompressed all together.

Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
with MTE" to this series as it might help review.

[1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@arm.com/
[2] OnePlusOSS / android_kernel_oneplus_sm8550 
https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11

Barry Song (2):
  arm64: mm: swap: support THP_SWAP on hardware with MTE
  mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()

Chuanhua Han (4):
  mm: swap: introduce swap_nr_free() for batched swap_free()
  mm: swap: make should_try_to_free_swap() support large-folio
  mm: support large folios swapin as a whole
  mm: madvise: don't split mTHP for MADV_PAGEOUT

 arch/arm64/include/asm/pgtable.h |  21 ++----
 arch/arm64/mm/mteswap.c          |  42 ++++++++++++
 include/asm-generic/tlb.h        |  10 +++
 include/linux/huge_mm.h          |  12 ----
 include/linux/pgtable.h          |  62 ++++++++++++++++-
 include/linux/swap.h             |   6 ++
 mm/madvise.c                     |  48 ++++++++++++++
 mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
 mm/page_io.c                     |   2 +-
 mm/rmap.c                        |   5 +-
 mm/swap_slots.c                  |   2 +-
 mm/swapfile.c                    |  29 ++++++++
 12 files changed, 301 insertions(+), 48 deletions(-)

-- 
2.34.1

Re: [PATCH RFC 0/6] mm: support large folios swap-in

Posted by Ryan Roberts 2 years ago

On 18/01/2024 11:10, Barry Song wrote:
> On an embedded system like Android, more than half of anon memory is actually
> in swap devices such as zRAM. For example, while an app is switched to back-
> ground, its most memory might be swapped-out.
> 
> Now we have mTHP features, unfortunately, if we don't support large folios
> swap-in, once those large folios are swapped-out, we immediately lose the 
> performance gain we can get through large folios and hardware optimization
> such as CONT-PTE.
> 
> In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
> before swap-out, if some memory were normal pages, but when swapping in, we
> can also swap-in them as large folios. 

I think this could also violate MADV_NOHUGEPAGE; if the application has
requested that we do not create a THP, then we had better not; it could cause a
correctness issue in some circumstances. You would need to pay attention to this
vma flag if taking this approach.

> But this might require I/O happen at
> some random places in swap devices. So we limit the large folios swap-in to
> those areas which were large folios before swapping-out, aka, swaps are also
> contiguous in hardware. 

In fact, even this may not be sufficient; it's possible that a contiguous set of
base pages (small folios) were allocated to a virtual mapping and all swapped
out together - they would likely end up contiguous in the swap file, but should
not be swapped back in as a single folio because of this (same reasoning applies
to cluster of smaller THPs that you mistake for a larger THP, etc).

So you will need to check what THP sizes are enabled and check the VMA
suitability regardless; Perhaps you are already doing this - I haven't looked at
the code yet.

I'll aim to review the code in the next couple of weeks.

Thanks,
Ryan

> On the other hand, in OPPO's product, we've deployed
> anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
> compress and decompress large folios as a whole, which help improve compression
> ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
> save large objects whose original size are 64KiB for example. So it is also a
> better choice for us to only swap-in large folios for those compressed large
> objects as a large folio can be decompressed all together.
> 
> Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
> with MTE" to this series as it might help review.
> 
> [1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
> https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@arm.com/
> [2] OnePlusOSS / android_kernel_oneplus_sm8550 
> https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
> 
> Barry Song (2):
>   arm64: mm: swap: support THP_SWAP on hardware with MTE
>   mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
> 
> Chuanhua Han (4):
>   mm: swap: introduce swap_nr_free() for batched swap_free()
>   mm: swap: make should_try_to_free_swap() support large-folio
>   mm: support large folios swapin as a whole
>   mm: madvise: don't split mTHP for MADV_PAGEOUT
> 
>  arch/arm64/include/asm/pgtable.h |  21 ++----
>  arch/arm64/mm/mteswap.c          |  42 ++++++++++++
>  include/asm-generic/tlb.h        |  10 +++
>  include/linux/huge_mm.h          |  12 ----
>  include/linux/pgtable.h          |  62 ++++++++++++++++-
>  include/linux/swap.h             |   6 ++
>  mm/madvise.c                     |  48 ++++++++++++++
>  mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
>  mm/page_io.c                     |   2 +-
>  mm/rmap.c                        |   5 +-
>  mm/swap_slots.c                  |   2 +-
>  mm/swapfile.c                    |  29 ++++++++
>  12 files changed, 301 insertions(+), 48 deletions(-)
>

Re: [PATCH RFC 0/6] mm: support large folios swap-in

Posted by Barry Song 2 years ago

On Thu, Jan 18, 2024 at 11:25 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 18/01/2024 11:10, Barry Song wrote:
> > On an embedded system like Android, more than half of anon memory is actually
> > in swap devices such as zRAM. For example, while an app is switched to back-
> > ground, its most memory might be swapped-out.
> >
> > Now we have mTHP features, unfortunately, if we don't support large folios
> > swap-in, once those large folios are swapped-out, we immediately lose the
> > performance gain we can get through large folios and hardware optimization
> > such as CONT-PTE.
> >
> > In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
> > before swap-out, if some memory were normal pages, but when swapping in, we
> > can also swap-in them as large folios.
>
> I think this could also violate MADV_NOHUGEPAGE; if the application has
> requested that we do not create a THP, then we had better not; it could cause a
> correctness issue in some circumstances. You would need to pay attention to this
> vma flag if taking this approach.
>
> > But this might require I/O happen at
> > some random places in swap devices. So we limit the large folios swap-in to
> > those areas which were large folios before swapping-out, aka, swaps are also
> > contiguous in hardware.
>
> In fact, even this may not be sufficient; it's possible that a contiguous set of
> base pages (small folios) were allocated to a virtual mapping and all swapped
> out together - they would likely end up contiguous in the swap file, but should
> not be swapped back in as a single folio because of this (same reasoning applies
> to cluster of smaller THPs that you mistake for a larger THP, etc).
>
> So you will need to check what THP sizes are enabled and check the VMA
> suitability regardless; Perhaps you are already doing this - I haven't looked at
> the code yet.

we are actually re-using your alloc_anon_folio() by adding a parameter
to make it
support both do_anon_page and do_swap_page,

-static struct folio *alloc_anon_folio(struct vm_fault *vmf)
+static struct folio *alloc_anon_folio(struct vm_fault *vmf,
+      bool (*pte_range_check)(pte_t *, int))
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
  struct vm_area_struct *vma = vmf->vma;
@@ -4190,7 +4270,7 @@ static struct folio *alloc_anon_folio(struct
vm_fault *vmf)
  order = highest_order(orders);
  while (orders) {
  addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
- if (pte_range_none(pte + pte_index(addr), 1 << order))
+ if (pte_range_check(pte + pte_index(addr), 1 << order))
  break;
  order = next_order(&orders, order);
  }
@@ -4269,7 +4349,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
  if (unlikely(anon_vma_prepare(vma)))
  goto oom;
  /* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
- folio = alloc_anon_folio(vmf);
+ folio = alloc_anon_folio(vmf, pte_range_none);
  if (IS_ERR(folio))
  return 0;
  if (!folio)
--

I assume this has checked everything?

>
> I'll aim to review the code in the next couple of weeks.

nice, thanks!

>
> Thanks,
> Ryan
>
> > On the other hand, in OPPO's product, we've deployed
> > anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
> > compress and decompress large folios as a whole, which help improve compression
> > ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
> > save large objects whose original size are 64KiB for example. So it is also a
> > better choice for us to only swap-in large folios for those compressed large
> > objects as a large folio can be decompressed all together.
> >
> > Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
> > with MTE" to this series as it might help review.
> >
> > [1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
> > https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@arm.com/
> > [2] OnePlusOSS / android_kernel_oneplus_sm8550
> > https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
> >
> > Barry Song (2):
> >   arm64: mm: swap: support THP_SWAP on hardware with MTE
> >   mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
> >
> > Chuanhua Han (4):
> >   mm: swap: introduce swap_nr_free() for batched swap_free()
> >   mm: swap: make should_try_to_free_swap() support large-folio
> >   mm: support large folios swapin as a whole
> >   mm: madvise: don't split mTHP for MADV_PAGEOUT
> >
> >  arch/arm64/include/asm/pgtable.h |  21 ++----
> >  arch/arm64/mm/mteswap.c          |  42 ++++++++++++
> >  include/asm-generic/tlb.h        |  10 +++
> >  include/linux/huge_mm.h          |  12 ----
> >  include/linux/pgtable.h          |  62 ++++++++++++++++-
> >  include/linux/swap.h             |   6 ++
> >  mm/madvise.c                     |  48 ++++++++++++++
> >  mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
> >  mm/page_io.c                     |   2 +-
> >  mm/rmap.c                        |   5 +-
> >  mm/swap_slots.c                  |   2 +-
> >  mm/swapfile.c                    |  29 ++++++++
> >  12 files changed, 301 insertions(+), 48 deletions(-)
> >
>

Thanks
Barry

Re: [PATCH RFC 0/6] mm: support large folios swap-in

Posted by Ryan Roberts 2 years ago

On 18/01/2024 23:54, Barry Song wrote:
> On Thu, Jan 18, 2024 at 11:25 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 18/01/2024 11:10, Barry Song wrote:
>>> On an embedded system like Android, more than half of anon memory is actually
>>> in swap devices such as zRAM. For example, while an app is switched to back-
>>> ground, its most memory might be swapped-out.
>>>
>>> Now we have mTHP features, unfortunately, if we don't support large folios
>>> swap-in, once those large folios are swapped-out, we immediately lose the
>>> performance gain we can get through large folios and hardware optimization
>>> such as CONT-PTE.
>>>
>>> In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
>>> before swap-out, if some memory were normal pages, but when swapping in, we
>>> can also swap-in them as large folios.
>>
>> I think this could also violate MADV_NOHUGEPAGE; if the application has
>> requested that we do not create a THP, then we had better not; it could cause a
>> correctness issue in some circumstances. You would need to pay attention to this
>> vma flag if taking this approach.
>>
>>> But this might require I/O happen at
>>> some random places in swap devices. So we limit the large folios swap-in to
>>> those areas which were large folios before swapping-out, aka, swaps are also
>>> contiguous in hardware.
>>
>> In fact, even this may not be sufficient; it's possible that a contiguous set of
>> base pages (small folios) were allocated to a virtual mapping and all swapped
>> out together - they would likely end up contiguous in the swap file, but should
>> not be swapped back in as a single folio because of this (same reasoning applies
>> to cluster of smaller THPs that you mistake for a larger THP, etc).
>>
>> So you will need to check what THP sizes are enabled and check the VMA
>> suitability regardless; Perhaps you are already doing this - I haven't looked at
>> the code yet.
> 
> we are actually re-using your alloc_anon_folio() by adding a parameter
> to make it
> support both do_anon_page and do_swap_page,
> 
> -static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> +static struct folio *alloc_anon_folio(struct vm_fault *vmf,
> +      bool (*pte_range_check)(pte_t *, int))
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   struct vm_area_struct *vma = vmf->vma;
> @@ -4190,7 +4270,7 @@ static struct folio *alloc_anon_folio(struct
> vm_fault *vmf)
>   order = highest_order(orders);
>   while (orders) {
>   addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> - if (pte_range_none(pte + pte_index(addr), 1 << order))
> + if (pte_range_check(pte + pte_index(addr), 1 << order))
>   break;
>   order = next_order(&orders, order);
>   }
> @@ -4269,7 +4349,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>   if (unlikely(anon_vma_prepare(vma)))
>   goto oom;
>   /* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
> - folio = alloc_anon_folio(vmf);
> + folio = alloc_anon_folio(vmf, pte_range_none);
>   if (IS_ERR(folio))
>   return 0;
>   if (!folio)
> --
> 
> I assume this has checked everything?

Ahh yes, very good. In that case you can disregard what I said; its already covered.

I notice that this series appears as a reply to my series. I'm not sure what the
normal convention is, but I expect more people would see it if you posted it as
its own thread?


> 
>>
>> I'll aim to review the code in the next couple of weeks.
> 
> nice, thanks!
> 
>>
>> Thanks,
>> Ryan
>>
>>> On the other hand, in OPPO's product, we've deployed
>>> anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
>>> compress and decompress large folios as a whole, which help improve compression
>>> ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
>>> save large objects whose original size are 64KiB for example. So it is also a
>>> better choice for us to only swap-in large folios for those compressed large
>>> objects as a large folio can be decompressed all together.
>>>
>>> Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
>>> with MTE" to this series as it might help review.
>>>
>>> [1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
>>> https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@arm.com/
>>> [2] OnePlusOSS / android_kernel_oneplus_sm8550
>>> https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
>>>
>>> Barry Song (2):
>>>   arm64: mm: swap: support THP_SWAP on hardware with MTE
>>>   mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
>>>
>>> Chuanhua Han (4):
>>>   mm: swap: introduce swap_nr_free() for batched swap_free()
>>>   mm: swap: make should_try_to_free_swap() support large-folio
>>>   mm: support large folios swapin as a whole
>>>   mm: madvise: don't split mTHP for MADV_PAGEOUT
>>>
>>>  arch/arm64/include/asm/pgtable.h |  21 ++----
>>>  arch/arm64/mm/mteswap.c          |  42 ++++++++++++
>>>  include/asm-generic/tlb.h        |  10 +++
>>>  include/linux/huge_mm.h          |  12 ----
>>>  include/linux/pgtable.h          |  62 ++++++++++++++++-
>>>  include/linux/swap.h             |   6 ++
>>>  mm/madvise.c                     |  48 ++++++++++++++
>>>  mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
>>>  mm/page_io.c                     |   2 +-
>>>  mm/rmap.c                        |   5 +-
>>>  mm/swap_slots.c                  |   2 +-
>>>  mm/swapfile.c                    |  29 ++++++++
>>>  12 files changed, 301 insertions(+), 48 deletions(-)
>>>
>>
> 
> Thanks
> Barry

Re: [PATCH RFC 0/6] mm: support large folios swap-in

Posted by Barry Song 2 years ago

On Fri, Jan 19, 2024 at 9:25 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 18/01/2024 23:54, Barry Song wrote:
> > On Thu, Jan 18, 2024 at 11:25 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 18/01/2024 11:10, Barry Song wrote:
> >>> On an embedded system like Android, more than half of anon memory is actually
> >>> in swap devices such as zRAM. For example, while an app is switched to back-
> >>> ground, its most memory might be swapped-out.
> >>>
> >>> Now we have mTHP features, unfortunately, if we don't support large folios
> >>> swap-in, once those large folios are swapped-out, we immediately lose the
> >>> performance gain we can get through large folios and hardware optimization
> >>> such as CONT-PTE.
> >>>
> >>> In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
> >>> before swap-out, if some memory were normal pages, but when swapping in, we
> >>> can also swap-in them as large folios.
> >>
> >> I think this could also violate MADV_NOHUGEPAGE; if the application has
> >> requested that we do not create a THP, then we had better not; it could cause a
> >> correctness issue in some circumstances. You would need to pay attention to this
> >> vma flag if taking this approach.
> >>
> >>> But this might require I/O happen at
> >>> some random places in swap devices. So we limit the large folios swap-in to
> >>> those areas which were large folios before swapping-out, aka, swaps are also
> >>> contiguous in hardware.
> >>
> >> In fact, even this may not be sufficient; it's possible that a contiguous set of
> >> base pages (small folios) were allocated to a virtual mapping and all swapped
> >> out together - they would likely end up contiguous in the swap file, but should
> >> not be swapped back in as a single folio because of this (same reasoning applies
> >> to cluster of smaller THPs that you mistake for a larger THP, etc).
> >>
> >> So you will need to check what THP sizes are enabled and check the VMA
> >> suitability regardless; Perhaps you are already doing this - I haven't looked at
> >> the code yet.
> >
> > we are actually re-using your alloc_anon_folio() by adding a parameter
> > to make it
> > support both do_anon_page and do_swap_page,
> >
> > -static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> > +static struct folio *alloc_anon_folio(struct vm_fault *vmf,
> > +      bool (*pte_range_check)(pte_t *, int))
> >  {
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >   struct vm_area_struct *vma = vmf->vma;
> > @@ -4190,7 +4270,7 @@ static struct folio *alloc_anon_folio(struct
> > vm_fault *vmf)
> >   order = highest_order(orders);
> >   while (orders) {
> >   addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> > - if (pte_range_none(pte + pte_index(addr), 1 << order))
> > + if (pte_range_check(pte + pte_index(addr), 1 << order))
> >   break;
> >   order = next_order(&orders, order);
> >   }
> > @@ -4269,7 +4349,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> >   if (unlikely(anon_vma_prepare(vma)))
> >   goto oom;
> >   /* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
> > - folio = alloc_anon_folio(vmf);
> > + folio = alloc_anon_folio(vmf, pte_range_none);
> >   if (IS_ERR(folio))
> >   return 0;
> >   if (!folio)
> > --
> >
> > I assume this has checked everything?
>
> Ahh yes, very good. In that case you can disregard what I said; its already covered.
>
> I notice that this series appears as a reply to my series. I'm not sure what the
> normal convention is, but I expect more people would see it if you posted it as
> its own thread?

yes. i was replying your series because we were using your series to
swapout large folios
without splitting. in v2,  i will send it as a new thread.

>
>
> >
> >>
> >> I'll aim to review the code in the next couple of weeks.
> >
> > nice, thanks!
> >
> >>
> >> Thanks,
> >> Ryan
> >>
> >>> On the other hand, in OPPO's product, we've deployed
> >>> anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
> >>> compress and decompress large folios as a whole, which help improve compression
> >>> ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
> >>> save large objects whose original size are 64KiB for example. So it is also a
> >>> better choice for us to only swap-in large folios for those compressed large
> >>> objects as a large folio can be decompressed all together.
> >>>
> >>> Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
> >>> with MTE" to this series as it might help review.
> >>>
> >>> [1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
> >>> https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@arm.com/
> >>> [2] OnePlusOSS / android_kernel_oneplus_sm8550
> >>> https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
> >>>
> >>> Barry Song (2):
> >>>   arm64: mm: swap: support THP_SWAP on hardware with MTE
> >>>   mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
> >>>
> >>> Chuanhua Han (4):
> >>>   mm: swap: introduce swap_nr_free() for batched swap_free()
> >>>   mm: swap: make should_try_to_free_swap() support large-folio
> >>>   mm: support large folios swapin as a whole
> >>>   mm: madvise: don't split mTHP for MADV_PAGEOUT
> >>>
> >>>  arch/arm64/include/asm/pgtable.h |  21 ++----
> >>>  arch/arm64/mm/mteswap.c          |  42 ++++++++++++
> >>>  include/asm-generic/tlb.h        |  10 +++
> >>>  include/linux/huge_mm.h          |  12 ----
> >>>  include/linux/pgtable.h          |  62 ++++++++++++++++-
> >>>  include/linux/swap.h             |   6 ++
> >>>  mm/madvise.c                     |  48 ++++++++++++++
> >>>  mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
> >>>  mm/page_io.c                     |   2 +-
> >>>  mm/rmap.c                        |   5 +-
> >>>  mm/swap_slots.c                  |   2 +-
> >>>  mm/swapfile.c                    |  29 ++++++++
> >>>  12 files changed, 301 insertions(+), 48 deletions(-)
> >>>
> >>
> >
 Thanks
 Barry
>

Re: [PATCH RFC 0/6] mm: support large folios swap-in

Posted by Huang, Ying 2 years ago

Barry Song <21cnbao@gmail.com> writes:

> On an embedded system like Android, more than half of anon memory is actually
> in swap devices such as zRAM. For example, while an app is switched to back-
> ground, its most memory might be swapped-out.
>
> Now we have mTHP features, unfortunately, if we don't support large folios
> swap-in, once those large folios are swapped-out, we immediately lose the 
> performance gain we can get through large folios and hardware optimization
> such as CONT-PTE.
>
> In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
> before swap-out, if some memory were normal pages, but when swapping in, we
> can also swap-in them as large folios. But this might require I/O happen at
> some random places in swap devices. So we limit the large folios swap-in to
> those areas which were large folios before swapping-out, aka, swaps are also
> contiguous in hardware. On the other hand, in OPPO's product, we've deployed
> anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
> compress and decompress large folios as a whole, which help improve compression
> ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
> save large objects whose original size are 64KiB for example. So it is also a
> better choice for us to only swap-in large folios for those compressed large
> objects as a large folio can be decompressed all together.

Another possibility is to combine large folios swap-in with VMA based
swap-in readahead.  If we will swap-in readahead several pages based on
VMA, we can swap-in a large folio instead.

I think that it is similar as allocating large file folios for file
readahead (TBH, I haven't check file large folios allocation code).

--
Best Regards,
Huang, Ying

> Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
> with MTE" to this series as it might help review.
>
> [1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
> https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@arm.com/
> [2] OnePlusOSS / android_kernel_oneplus_sm8550 
> https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
>
> Barry Song (2):
>   arm64: mm: swap: support THP_SWAP on hardware with MTE
>   mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
>
> Chuanhua Han (4):
>   mm: swap: introduce swap_nr_free() for batched swap_free()
>   mm: swap: make should_try_to_free_swap() support large-folio
>   mm: support large folios swapin as a whole
>   mm: madvise: don't split mTHP for MADV_PAGEOUT
>
>  arch/arm64/include/asm/pgtable.h |  21 ++----
>  arch/arm64/mm/mteswap.c          |  42 ++++++++++++
>  include/asm-generic/tlb.h        |  10 +++
>  include/linux/huge_mm.h          |  12 ----
>  include/linux/pgtable.h          |  62 ++++++++++++++++-
>  include/linux/swap.h             |   6 ++
>  mm/madvise.c                     |  48 ++++++++++++++
>  mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
>  mm/page_io.c                     |   2 +-
>  mm/rmap.c                        |   5 +-
>  mm/swap_slots.c                  |   2 +-
>  mm/swapfile.c                    |  29 ++++++++
>  12 files changed, 301 insertions(+), 48 deletions(-)