[PATCHv2 3/5] mm/rmap: mlock large folios in try_to_unmap_one()

Posted by Kiryl Shutsemau 1 week, 5 days ago
From: Kiryl Shutsemau <kas@kernel.org>

Currently, try_to_unmap_one() only tries to mlock small folios.

Use logic similar to folio_referenced_one() to mlock large folios:
only do this for fully mapped folios and under page table lock that
protects all page table entries.

Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
 mm/rmap.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 3d0235f332de..482e6504fa88 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1870,6 +1870,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	unsigned long nr_pages = 1, end_addr;
 	unsigned long pfn;
 	unsigned long hsz = 0;
+	int ptes = 0;
 
 	/*
 	 * When racing against e.g. zap_pte_range() on another cpu,
@@ -1910,10 +1911,26 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		 */
 		if (!(flags & TTU_IGNORE_MLOCK) &&
 		    (vma->vm_flags & VM_LOCKED)) {
+			ptes++;
+			ret = false;
+
+			/* Only mlock fully mapped pages */
+			if (pvmw.pte && ptes != pvmw.nr_pages)
+				continue;
+
+			/*
+			 * All PTEs must be protected by the page table lock
+			 * in order to mlock the folio.
+			 *
+			 * If a page table boundary has been crossed, the
+			 * current PTL only protects part of the PTEs.
+			 */
+			if (pvmw.flags & PVMW_PGTABLE_CROSSSED)
+				goto walk_done;
+
 			/* Restore the mlock which got missed */
-			if (!folio_test_large(folio))
-				mlock_vma_folio(folio, vma);
-			goto walk_abort;
+			mlock_vma_folio(folio, vma);
+			goto walk_done;
 		}
 
 		if (!pvmw.pte) {
-- 
2.50.1
Re: [PATCHv2 3/5] mm/rmap: mlock large folios in try_to_unmap_one()
Posted by Shakeel Butt 1 week, 5 days ago
On Fri, Sep 19, 2025 at 01:40:34PM +0100, Kiryl Shutsemau wrote:
> [...]
>  		if (!(flags & TTU_IGNORE_MLOCK) &&
>  		    (vma->vm_flags & VM_LOCKED)) {
> +			ptes++;
> +			ret = false;
> +
> +			/* Only mlock fully mapped pages */
> +			if (pvmw.pte && ptes != pvmw.nr_pages)
> +				continue;
> +
> +			/*
> +			 * All PTEs must be protected by the page table lock
> +			 * in order to mlock the folio.
> +			 *
> +			 * If a page table boundary has been crossed, the
> +			 * current PTL only protects part of the PTEs.
> +			 */
> +			if (pvmw.flags & PVMW_PGTABLE_CROSSSED)
> +				goto walk_done;

Should it be goto walk_abort?

> +
>  			/* Restore the mlock which got missed */
> -			if (!folio_test_large(folio))
> -				mlock_vma_folio(folio, vma);
> -			goto walk_abort;
> +			mlock_vma_folio(folio, vma);
> +			goto walk_done;

Here too?

>  		}
>  
>  		if (!pvmw.pte) {
> -- 
> 2.50.1
>
Re: [PATCHv2 3/5] mm/rmap: mlock large folios in try_to_unmap_one()
Posted by Kiryl Shutsemau 1 week, 2 days ago
On Fri, Sep 19, 2025 at 02:27:40PM -0700, Shakeel Butt wrote:
> On Fri, Sep 19, 2025 at 01:40:34PM +0100, Kiryl Shutsemau wrote:
> > [...]
> > +			if (pvmw.flags & PVMW_PGTABLE_CROSSSED)
> > +				goto walk_done;
> 
> Should it be goto walk_abort?

I already have to set ret to false above to make it work for partially
mapped large folios. So walk_done is enough here.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 3/5] mm/rmap: mlock large folios in try_to_unmap_one()
Posted by Shakeel Butt 1 week, 2 days ago
On Mon, Sep 22, 2025 at 10:51:35AM +0100, Kiryl Shutsemau wrote:
> On Fri, Sep 19, 2025 at 02:27:40PM -0700, Shakeel Butt wrote:
> > On Fri, Sep 19, 2025 at 01:40:34PM +0100, Kiryl Shutsemau wrote:
> > > [...]
> > > +			if (pvmw.flags & PVMW_PGTABLE_CROSSSED)
> > > +				goto walk_done;
> > 
> > Should it be goto walk_abort?
> 
> I already have to set ret to false above to make it work for partially
> mapped large folios. So walk_done is enough here.

Indeed, I missed that. What do you think about adding a comment where
ret is set to false? Everywhere else we jump to abort for scenarios
that need to break the rmap walk loop, but here we need to keep
counting the PTEs for mlock handling. Anyway, it's just a nit.

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>