[RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio

Barry Song posted 5 patches 1 year, 9 months ago
Posted by Barry Song 1 year, 9 months ago
From: Chuanhua Han <hanchuanhua@oppo.com>

should_try_to_free_swap() works on the assumption that swap-in is always done
at normal page granularity, i.e., folio_nr_pages() == 1. To support large folio
swap-in, this patch removes that assumption.

Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Chris Li <chrisl@kernel.org>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index abd4f33d62c9..e0d34d705e07 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
 	 * reference only in case it's likely that we'll be the exlusive user.
 	 */
 	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
-		folio_ref_count(folio) == 2;
+		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
 }
 
 static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
-- 
2.34.1
Re: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio
Posted by Ryan Roberts 1 year, 9 months ago
On 04/03/2024 08:13, Barry Song wrote:
> From: Chuanhua Han <hanchuanhua@oppo.com>
> 
> should_try_to_free_swap() works on the assumption that swap-in is always done
> at normal page granularity, i.e., folio_nr_pages() == 1. To support large folio
> swap-in, this patch removes that assumption.
> 
> Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Acked-by: Chris Li <chrisl@kernel.org>
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index abd4f33d62c9..e0d34d705e07 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
>  	 * reference only in case it's likely that we'll be the exlusive user.
>  	 */
>  	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
> -		folio_ref_count(folio) == 2;
> +		folio_ref_count(folio) == (1 + folio_nr_pages(folio));

I don't think this is correct; one reference has just been added to the folio in
do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by
allocating. If it came from the swapcache, it could be a large folio, because we
swapped out a large folio and never removed it from swapcache. But in that case,
others may have partially mapped it, so the refcount could legitimately equal
the number of pages while still not being exclusively mapped.

I'm guessing this logic is trying to estimate when we are likely exclusive so
that we remove from swapcache (release ref) and can then reuse rather than CoW
the folio? The main CoW path currently CoWs page-by-page even for large folios,
and with Barry's recent patch, even the last page gets copied. So not sure what
this change is really trying to achieve?


>  }
>  
>  static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
Re: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio
Posted by Chuanhua Han 1 year, 9 months ago
hi, Ryan Roberts

On 2024/3/12 20:34, Ryan Roberts wrote:
> On 04/03/2024 08:13, Barry Song wrote:
>> From: Chuanhua Han <hanchuanhua@oppo.com>
>>
>> should_try_to_free_swap() works on the assumption that swap-in is always done
>> at normal page granularity, i.e., folio_nr_pages() == 1. To support large folio
>> swap-in, this patch removes that assumption.
>>
>> Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
>> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>> Acked-by: Chris Li <chrisl@kernel.org>
>> ---
>>  mm/memory.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index abd4f33d62c9..e0d34d705e07 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
>>  	 * reference only in case it's likely that we'll be the exlusive user.
>>  	 */
>>  	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
>> -		folio_ref_count(folio) == 2;
>> +		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
> I don't think this is correct; one reference has just been added to the folio in
> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by
> allocating. If it came from the swapcache, it could be a large folio, because we
> swapped out a large folio and never removed it from swapcache. But in that case,
> others may have partially mapped it, so the refcount could legitimately equal
> the number of pages while still not being exclusively mapped.
>
> I'm guessing this logic is trying to estimate when we are likely exclusive so
> that we remove from swapcache (release ref) and can then reuse rather than CoW
> the folio? The main CoW path currently CoWs page-by-page even for large folios,
> and with Barry's recent patch, even the last page gets copied. So not sure what
> this change is really trying to achieve?
>
First, if it is a large folio in the swap cache, then its refcount is at
least folio_nr_pages(folio).

For example, in the add_to_swap_cache() path:

int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
                        gfp_t gfp, void **shadowp)
{
        struct address_space *address_space = swap_address_space(entry);
        pgoff_t idx = swp_offset(entry);
        XA_STATE_ORDER(xas, &address_space->i_pages, idx,
                       folio_order(folio));
        unsigned long i, nr = folio_nr_pages(folio); <---
        void *old;
        ...
        folio_ref_add(folio, nr); <---
        folio_set_swapcache(folio);
        ...
}

Then in the do_swap_page() path:

        if (should_try_to_free_swap(folio, vma, vmf->flags))
                folio_free_swap(folio);

Only a folio in the swap cache will call folio_free_swap() to delete it
from the swap cache, so I feel this patch is necessary!? 😁

>>  }
>>  
>>  static vm_fault_t pte_marker_clear(struct vm_fault *vmf)

Thanks,

Chuanhua

Re: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio
Posted by Ryan Roberts 1 year, 9 months ago
On 13/03/2024 02:21, Chuanhua Han wrote:
> hi, Ryan Roberts
> 
> On 2024/3/12 20:34, Ryan Roberts wrote:
>> On 04/03/2024 08:13, Barry Song wrote:
>>> From: Chuanhua Han <hanchuanhua@oppo.com>
>>>
>>> should_try_to_free_swap() works on the assumption that swap-in is always done
>>> at normal page granularity, i.e., folio_nr_pages() == 1. To support large folio
>>> swap-in, this patch removes that assumption.
>>>
>>> Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
>>> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> Acked-by: Chris Li <chrisl@kernel.org>
>>> ---
>>>  mm/memory.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index abd4f33d62c9..e0d34d705e07 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
>>>  	 * reference only in case it's likely that we'll be the exlusive user.
>>>  	 */
>>>  	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
>>> -		folio_ref_count(folio) == 2;
>>> +		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
>> I don't think this is correct; one reference has just been added to the folio in
>> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by
>> allocating. If it came from the swapcache, it could be a large folio, because we
>> swapped out a large folio and never removed it from swapcache. But in that case,
>> others may have partially mapped it, so the refcount could legitimately equal
>> the number of pages while still not being exclusively mapped.
>>
>> I'm guessing this logic is trying to estimate when we are likely exclusive so
>> that we remove from swapcache (release ref) and can then reuse rather than CoW
>> the folio? The main CoW path currently CoWs page-by-page even for large folios,
>> and with Barry's recent patch, even the last page gets copied. So not sure what
>> this change is really trying to achieve?
>>
> First, if it is a large folio in the swap cache, then its refcount is at
> least folio_nr_pages(folio):

Ahh! Sorry, I had it backwards - I was thinking there would be 1 ref for the
swap cache and 1 ref per page taken by do_swap_page(), whereas it is the other
way around. I understand now. On this basis:

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> 
> 
> For example, in add_to_swap_cache path:
> 
> int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
>                         gfp_t gfp, void **shadowp)
> {
>         struct address_space *address_space = swap_address_space(entry);
>         pgoff_t idx = swp_offset(entry);
>         XA_STATE_ORDER(xas, &address_space->i_pages, idx,
>                        folio_order(folio));
>         unsigned long i, nr = folio_nr_pages(folio); <---
>         void *old;
>         ...
>         folio_ref_add(folio, nr); <---
>         folio_set_swapcache(folio);
>         ...
> }
> 
> 
> Then in the do_swap_page() path:
> 
>         if (should_try_to_free_swap(folio, vma, vmf->flags))
>                 folio_free_swap(folio);
> 
> Only a folio in the swap cache will call folio_free_swap() to delete it
> from the swap cache, so I feel this patch is necessary!? 😁
> 
>>>  }
>>>  
>>>  static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
> 
> Thanks,
> 
> Chuanhua
> 

Re: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio
Posted by Chuanhua Han 1 year, 9 months ago
Hi Ryan,
On Wed, 13 Mar 2024 at 17:10, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 13/03/2024 02:21, Chuanhua Han wrote:
> > hi, Ryan Roberts
> >
> > On 2024/3/12 20:34, Ryan Roberts wrote:
> >> On 04/03/2024 08:13, Barry Song wrote:
> >>> From: Chuanhua Han <hanchuanhua@oppo.com>
> >>>
> >>> should_try_to_free_swap() works on the assumption that swap-in is always done
> >>> at normal page granularity, i.e., folio_nr_pages() == 1. To support large folio
> >>> swap-in, this patch removes that assumption.
> >>>
> >>> Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
> >>> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>> Acked-by: Chris Li <chrisl@kernel.org>
> >>> ---
> >>>  mm/memory.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/memory.c b/mm/memory.c
> >>> index abd4f33d62c9..e0d34d705e07 100644
> >>> --- a/mm/memory.c
> >>> +++ b/mm/memory.c
> >>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
> >>>      * reference only in case it's likely that we'll be the exlusive user.
> >>>      */
> >>>     return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
> >>> -           folio_ref_count(folio) == 2;
> >>> +           folio_ref_count(folio) == (1 + folio_nr_pages(folio));
> >> I don't think this is correct; one reference has just been added to the folio in
> >> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by
> >> allocating. If it came from the swapcache, it could be a large folio, because we
> >> swapped out a large folio and never removed it from swapcache. But in that case,
> >> others may have partially mapped it, so the refcount could legitimately equal
> >> the number of pages while still not being exclusively mapped.
> >>
> >> I'm guessing this logic is trying to estimate when we are likely exclusive so
> >> that we remove from swapcache (release ref) and can then reuse rather than CoW
> >> the folio? The main CoW path currently CoWs page-by-page even for large folios,
> >> and with Barry's recent patch, even the last page gets copied. So not sure what
> >> this change is really trying to achieve?
> >>
> > First, if it is a large folio in the swap cache, then its refcount is at
> > least folio_nr_pages(folio):
>
> Ahh! Sorry, I had it backwards - was thinking there would be 1 ref for the swap
> cache, and you were assuming 1 ref per page taken by do_swap_page(). I
> understand now. On this basis:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

Thank you for your review!
>
> >
> >
> > For example, in add_to_swap_cache path:
> >
> > int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
> >                         gfp_t gfp, void **shadowp)
> > {
> >         struct address_space *address_space = swap_address_space(entry);
> >         pgoff_t idx = swp_offset(entry);
> >         XA_STATE_ORDER(xas, &address_space->i_pages, idx,
> >                        folio_order(folio));
> >         unsigned long i, nr = folio_nr_pages(folio); <---
> >         void *old;
> >         ...
> >         folio_ref_add(folio, nr); <---
> >         folio_set_swapcache(folio);
> >         ...
> > }
> >
> >
> > Then in the do_swap_page() path:
> > 
> >         if (should_try_to_free_swap(folio, vma, vmf->flags))
> >                 folio_free_swap(folio);
> > 
> > Only a folio in the swap cache will call folio_free_swap() to delete it
> > from the swap cache, so I feel this patch is necessary!? 😁
> >
> >>>  }
> >>>
> >>>  static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
> >
> > Thanks,
> >
> > Chuanhua
> >
>
>
Thanks,
Chuanhua