[RFC PATCH 3/3] mm/mincore: avoid touching the PTL

Kairui Song posted 3 patches 1 month, 4 weeks ago
[RFC PATCH 3/3] mm/mincore: avoid touching the PTL
Posted by Kairui Song 1 month, 4 weeks ago
From: Kairui Song <kasong@tencent.com>

mincore only interested in the existence of a page, which is a
changing state by nature, locking and making it stable is not needed.
And now neither mincore_page or mincore_swap requires PTL, this PTL
locking can be dropped.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/mincore.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 1ac53acac239..cc4460aba1f9 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -153,13 +153,13 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		goto out;
 	}
 
-	ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	ptep = pte_offset_map(pmd, addr);
 	if (!ptep) {
 		walk->action = ACTION_AGAIN;
 		return 0;
 	}
 	for (; addr != end; ptep += step, addr += step * PAGE_SIZE) {
-		pte_t pte = ptep_get(ptep);
+		pte_t pte = ptep_get_lockless(ptep);
 
 		step = 1;
 		/* We need to do cache lookup too for pte markers */
@@ -192,7 +192,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		}
 		vec += step;
 	}
-	pte_unmap_unlock(ptep - 1, ptl);
+	pte_unmap(ptep - 1);
 out:
 	walk->private += nr;
 	cond_resched();
-- 
2.50.1
Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
Posted by David Hildenbrand 1 month, 3 weeks ago
On 07.08.25 17:27, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> mincore only interested in the existence of a page, which is a
> changing state by nature, locking and making it stable is not needed.
> And now neither mincore_page or mincore_swap requires PTL, this PTL
> locking can be dropped.
> 
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>   mm/mincore.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 1ac53acac239..cc4460aba1f9 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -153,13 +153,13 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>   		goto out;
>   	}
>   
> -	ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> +	ptep = pte_offset_map(pmd, addr);

I agree with Jann, that if we move away from the PTL, we better have a 
very good reason (+performance numbers).

-- 
Cheers,

David / dhildenb
Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
Posted by Jann Horn 1 month, 4 weeks ago
On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> mincore only interested in the existence of a page, which is a
> changing state by nature, locking and making it stable is not needed.
> And now neither mincore_page or mincore_swap requires PTL, this PTL
> locking can be dropped.

This means you can race such that you end up looking at an unrelated
page of another process, right? And your patch intentionally allows
that to happen in order to make mincore() faster?
Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
Posted by Kairui Song 1 month, 4 weeks ago
On Fri, Aug 8, 2025 at 12:06 AM Jann Horn <jannh@google.com> wrote:
>
> On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> > mincore only interested in the existence of a page, which is a
> > changing state by nature, locking and making it stable is not needed.
> > And now neither mincore_page or mincore_swap requires PTL, this PTL
> > locking can be dropped.
>
> This means you can race such that you end up looking at an unrelated
> page of another process, right?

I was thinking If the PTE is gone, it will make mincore go check the
page cache, but even if we hold the PTL here, the next mincore call
(if called soon enough) could check the page cache using the same
address. And it never checks any actual page if the PTE is not none.

Perhaps you mean that it's now doing the page / swap cache lookup
without holding PTL so if the PTE changed, then the lookup could be
using an invalidated index, and may find an unrelated page.

A changing PTE also means the mincore return value is changing, and if
called earlier or later by a little bit, the result of that address
could be opposite, and mincore only checks if the page existed,
it's hard to say the returned value is a false positive / negative?

Or could this introduce a new security issue?

> And your patch intentionally allows that to happen in order to make mincore() faster?

When doing a clean up (patch 1) I noticed and didn't understand why we
need a PTL here. It will no longer block others and go faster as we
remove one lock, I can drop this one if we are not comfortable with
it.
Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
Posted by Jann Horn 1 month, 4 weeks ago
On Thu, Aug 7, 2025 at 7:28 PM Kairui Song <ryncsn@gmail.com> wrote:
> On Fri, Aug 8, 2025 at 12:06 AM Jann Horn <jannh@google.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> > > mincore only interested in the existence of a page, which is a
> > > changing state by nature, locking and making it stable is not needed.
> > > And now neither mincore_page or mincore_swap requires PTL, this PTL
> > > locking can be dropped.
> >
> > This means you can race such that you end up looking at an unrelated
> > page of another process, right?
>
> I was thinking If the PTE is gone, it will make mincore go check the
> page cache, but even if we hold the PTL here, the next mincore call
> (if called soon enough) could check the page cache using the same
> address. And it never checks any actual page if the PTE is not none.
>
> Perhaps you mean that it's now doing the page / swap cache lookup
> without holding PTL so if the PTE changed, then the lookup could be
> using an invalidated index, and may find an unrelated page.

Yes, that's what I meant.

> A changing PTE also means the mincore return value is changing, and if
> called earlier or later by a little bit, the result of that address
> could be opposite, and mincore only checks if the page existed,
> it's hard to say the returned value is a false positive / negative?
>
> Or could this introduce a new security issue?

I don't have specific security concerns here; but this is a change
that trades accuracy and simplicity for performance.

> > And your patch intentionally allows that to happen in order to make mincore() faster?
>
> When doing a clean up (patch 1) I noticed and didn't understand why we
> need a PTL here. It will no longer block others and go faster as we
> remove one lock, I can drop this one if we are not comfortable with
> it.

If you had a specific performance concern here, I think we could
consider changing this, but in my view it would sort of be breaking
the locking rules (by using a swap index that is not guaranteed to be
kept alive) and would need an explanatory comment explaining the
tradeoff.

Since you only wrote the patch because you thought the lock was
unnecessary, I'd prefer it if you drop this patch.
Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
Posted by Kairui Song 1 month, 4 weeks ago
On Fri, Aug 8, 2025 at 1:45 AM Jann Horn <jannh@google.com> wrote:
> On Thu, Aug 7, 2025 at 7:28 PM Kairui Song <ryncsn@gmail.com> wrote:
> > On Fri, Aug 8, 2025 at 12:06 AM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> > > > mincore only interested in the existence of a page, which is a
> > > > changing state by nature, locking and making it stable is not needed.
> > > > And now neither mincore_page or mincore_swap requires PTL, this PTL
> > > > locking can be dropped.
> > >
> > > This means you can race such that you end up looking at an unrelated
> > > page of another process, right?
> >
> > I was thinking If the PTE is gone, it will make mincore go check the
> > page cache, but even if we hold the PTL here, the next mincore call
> > (if called soon enough) could check the page cache using the same
> > address. And it never checks any actual page if the PTE is not none.
> >
> > Perhaps you mean that it's now doing the page / swap cache lookup
> > without holding PTL so if the PTE changed, then the lookup could be
> > using an invalidated index, and may find an unrelated page.
>
> Yes, that's what I meant.
>
> > A changing PTE also means the mincore return value is changing, and if
> > called earlier or later by a little bit, the result of that address
> > could be opposite, and mincore only checks if the page existed,
> > it's hard to say the returned value is a false positive / negative?
> >
> > Or could this introduce a new security issue?
>
> I don't have specific security concerns here; but this is a change
> that trades accuracy and simplicity for performance.
>
> > > And your patch intentionally allows that to happen in order to make mincore() faster?
> >
> > When doing a clean up (patch 1) I noticed and didn't understand why we
> > need a PTL here. It will no longer block others and go faster as we
> > remove one lock, I can drop this one if we are not comfortable with
> > it.
>
> If you had a specific performance concern here, I think we could
> consider changing this, but in my view it would sort of be breaking
> the locking rules (by using a swap index that is not guaranteed to be
> kept alive) and would need an explanatory comment explaining the
> tradeoff.

Thanks for the explanation.

From the swap side, get_swap_device also ensures the offset is still
in the valid lookup range so the worst thing is a very rare inaccurate
value.
PTE change will mean the page is being swapped in/out or zapped, so if
the mincore is called by like a jitter earlier / later, the result
changes. So I thought it hard to define the accuracy in such a case
considering the timing.

>
> Since you only wrote the patch because you thought the lock was
> unnecessary, I'd prefer it if you drop this patch.

Understandable, I can update and keep patch 1 and 2, which improves
the performance and clean it up without causing any potential accuracy
issues.