[PATCH v13 57/70] mm/mlock: use vma iterator and maple state instead of vma linked list

Liam Howlett posted 70 patches 3 years, 7 months ago
There is a newer version of this series
[PATCH v13 57/70] mm/mlock: use vma iterator and maple state instead of vma linked list
Posted by Liam Howlett 3 years, 7 months ago
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Handle overflow checking in count_mm_mlocked_page_nr() differently.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
---
 mm/mlock.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index b14e929084cc..43d19a1f28eb 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -471,6 +471,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
 	unsigned long nstart, end, tmp;
 	struct vm_area_struct *vma, *prev;
 	int error;
+	MA_STATE(mas, &current->mm->mm_mt, start, start);
 
 	VM_BUG_ON(offset_in_page(start));
 	VM_BUG_ON(len != PAGE_ALIGN(len));
@@ -479,13 +480,14 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
 		return -EINVAL;
 	if (end == start)
 		return 0;
-	vma = find_vma(current->mm, start);
-	if (!vma || vma->vm_start > start)
+	vma = mas_walk(&mas);
+	if (!vma)
 		return -ENOMEM;
 
-	prev = vma->vm_prev;
 	if (start > vma->vm_start)
 		prev = vma;
+	else
+		prev = mas_prev(&mas, 0);
 
 	for (nstart = start ; ; ) {
 		vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
@@ -505,7 +507,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
 		if (nstart >= end)
 			break;
 
-		vma = prev->vm_next;
+		vma = find_vma(prev->vm_mm, prev->vm_end);
 		if (!vma || vma->vm_start != nstart) {
 			error = -ENOMEM;
 			break;
@@ -526,24 +528,23 @@ static unsigned long count_mm_mlocked_page_nr(struct mm_struct *mm,
 {
 	struct vm_area_struct *vma;
 	unsigned long count = 0;
+	unsigned long end;
+	VMA_ITERATOR(vmi, mm, start);
 
 	if (mm == NULL)
 		mm = current->mm;
 
-	vma = find_vma(mm, start);
-	if (vma == NULL)
-		return 0;
-
-	for (; vma ; vma = vma->vm_next) {
-		if (start >= vma->vm_end)
-			continue;
-		if (start + len <=  vma->vm_start)
-			break;
+	/* Don't overflow past ULONG_MAX */
+	if (unlikely(ULONG_MAX - len < start))
+		end = ULONG_MAX;
+	else
+		end = start + len;
+	for_each_vma_range(vmi, vma, end) {
 		if (vma->vm_flags & VM_LOCKED) {
 			if (start > vma->vm_start)
 				count -= (start - vma->vm_start);
-			if (start + len < vma->vm_end) {
-				count += start + len - vma->vm_start;
+			if (end < vma->vm_end) {
+				count += end - vma->vm_start;
 				break;
 			}
 			count += vma->vm_end - vma->vm_start;
@@ -659,6 +660,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
  */
 static int apply_mlockall_flags(int flags)
 {
+	MA_STATE(mas, &current->mm->mm_mt, 0, 0);
 	struct vm_area_struct *vma, *prev = NULL;
 	vm_flags_t to_add = 0;
 
@@ -679,7 +681,7 @@ static int apply_mlockall_flags(int flags)
 			to_add |= VM_LOCKONFAULT;
 	}
 
-	for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
+	mas_for_each(&mas, vma, ULONG_MAX) {
 		vm_flags_t newflags;
 
 		newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
@@ -687,6 +689,7 @@ static int apply_mlockall_flags(int flags)
 
 		/* Ignore errors */
 		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
+		mas_pause(&mas);
 		cond_resched();
 	}
 out:
-- 
2.35.1
Re: [PATCH v13 57/70] mm/mlock: use vma iterator and maple state instead of vma linked list
Posted by Mark Brown 3 years, 7 months ago
On Mon, Aug 22, 2022 at 03:06:30PM +0000, Liam Howlett wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Handle overflow checking in count_mm_mlocked_page_nr() differently.

Our QA team found that since next-20220823 we're seeing a couple of test
failures in the check_mmap_options kselftest on arm64 platforms with MTE
that aren't present in mainline:

# # FAIL: mprotect not ignoring clear PROT_MTE property
# not ok 21 Check clear PROT_MTE flags with private mapping, sync error mode and mmap memory
# # FAIL: mprotect not ignoring clear PROT_MTE property
# not ok 22 Check clear PROT_MTE flags with private mapping and sync error mode and mmap/mprotect memory

I bisected this using qemu[1] which landed on 4ceb4bca479d41a
("mm/mprotect: use maple tree navigation instead of vma linked list"),
though I'm not 100% sure I trust the specific identification of the
commit I'm pretty confident it's at the very least in this series.  I've
not done any analysis of the failure beyond getting this bisect result.

[1] qemu -smp cpus=4 -cpu max -machine virt,gic-version=3,mte=on
Re: [PATCH v13 57/70] mm/mlock: use vma iterator and maple state instead of vma linked list
Posted by Liam Howlett 3 years, 7 months ago
* Mark Brown <broonie@kernel.org> [220824 20:34]:
> On Mon, Aug 22, 2022 at 03:06:30PM +0000, Liam Howlett wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > Handle overflow checking in count_mm_mlocked_page_nr() differently.
> 
> Our QA team found that since next-20220823 we're seeing a couple of test
> failures in the check_mmap_options kselftest on arm64 platforms with MTE
> that aren't present in mainline:
> 
> # # FAIL: mprotect not ignoring clear PROT_MTE property
> # not ok 21 Check clear PROT_MTE flags with private mapping, sync error mode and mmap memory
> # # FAIL: mprotect not ignoring clear PROT_MTE property
> # not ok 22 Check clear PROT_MTE flags with private mapping and sync error mode and mmap/mprotect memory

Thanks.

> 
> I bisected this using qemu[1] which landed on 4ceb4bca479d41a
> ("mm/mprotect: use maple tree navigation instead of vma linked list"),
> though I'm not 100% sure I trust the specific identification of the
> commit I'm pretty confident it's at the very least in this series.  I've
> not done any analysis of the failure beyond getting this bisect result.
> 
> [1] qemu -smp cpus=4 -cpu max -machine virt,gic-version=3,mte=on

This helps a lot.  I think your bisect is accurate:

...
        struct mmu_gather tlb;
+       MA_STATE(mas, &current->mm->mm_mt, start, start);
 
        start = untagged_addr(start);
...

It looks like I search against the tagged address.  I should initialize
the state to 0 and mas_set(&mas, start) after untagging the address.

I'll send out a patch once I have recreated and verified this is the
issue.

Cheers,
Liam
Re: [PATCH v13 57/70] mm/mlock: use vma iterator and maple state instead of vma linked list
Posted by Catalin Marinas 3 years, 7 months ago
On Thu, Aug 25, 2022 at 01:21:01PM +0000, Liam Howlett wrote:
> * Mark Brown <broonie@kernel.org> [220824 20:34]:
> > On Mon, Aug 22, 2022 at 03:06:30PM +0000, Liam Howlett wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > 
> > > Handle overflow checking in count_mm_mlocked_page_nr() differently.
> > 
> > Our QA team found that since next-20220823 we're seeing a couple of test
> > failures in the check_mmap_options kselftest on arm64 platforms with MTE
> > that aren't present in mainline:
> > 
> > # # FAIL: mprotect not ignoring clear PROT_MTE property
> > # not ok 21 Check clear PROT_MTE flags with private mapping, sync error mode and mmap memory
> > # # FAIL: mprotect not ignoring clear PROT_MTE property
> > # not ok 22 Check clear PROT_MTE flags with private mapping and sync error mode and mmap/mprotect memory
> 
> Thanks.
> 
> > I bisected this using qemu[1] which landed on 4ceb4bca479d41a
> > ("mm/mprotect: use maple tree navigation instead of vma linked list"),
> > though I'm not 100% sure I trust the specific identification of the
> > commit I'm pretty confident it's at the very least in this series.  I've
> > not done any analysis of the failure beyond getting this bisect result.
> > 
> > [1] qemu -smp cpus=4 -cpu max -machine virt,gic-version=3,mte=on
> 
> This helps a lot.  I think your bisect is accurate:
> 
> ...
>         struct mmu_gather tlb;
> +       MA_STATE(mas, &current->mm->mm_mt, start, start);
>  
>         start = untagged_addr(start);
> ...
> 
> It looks like I search against the tagged address.  I should initialize
> the state to 0 and mas_set(&mas, start) after untagging the address.
> 
> I'll send out a patch once I have recreated and verified this is the
> issue.

Thanks. I did a quick test and untagging start seems to fix the issue (I
was wondering why mprotect() returned -ENOMEM when failing).

-- 
Catalin
Re: [PATCH v13 57/70] mm/mlock: use vma iterator and maple state instead of vma linked list
Posted by Liam Howlett 3 years, 7 months ago
* Catalin Marinas <catalin.marinas@arm.com> [220825 11:20]:
> On Thu, Aug 25, 2022 at 01:21:01PM +0000, Liam Howlett wrote:
> > * Mark Brown <broonie@kernel.org> [220824 20:34]:
> > > On Mon, Aug 22, 2022 at 03:06:30PM +0000, Liam Howlett wrote:
> > > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > > 
> > > > Handle overflow checking in count_mm_mlocked_page_nr() differently.
> > > 
> > > Our QA team found that since next-20220823 we're seeing a couple of test
> > > failures in the check_mmap_options kselftest on arm64 platforms with MTE
> > > that aren't present in mainline:
> > > 
> > > # # FAIL: mprotect not ignoring clear PROT_MTE property
> > > # not ok 21 Check clear PROT_MTE flags with private mapping, sync error mode and mmap memory
> > > # # FAIL: mprotect not ignoring clear PROT_MTE property
> > > # not ok 22 Check clear PROT_MTE flags with private mapping and sync error mode and mmap/mprotect memory
> > 
> > Thanks.
> > 
> > > I bisected this using qemu[1] which landed on 4ceb4bca479d41a
> > > ("mm/mprotect: use maple tree navigation instead of vma linked list"),
> > > though I'm not 100% sure I trust the specific identification of the
> > > commit I'm pretty confident it's at the very least in this series.  I've
> > > not done any analysis of the failure beyond getting this bisect result.
> > > 
> > > [1] qemu -smp cpus=4 -cpu max -machine virt,gic-version=3,mte=on
> > 
> > This helps a lot.  I think your bisect is accurate:
> > 
> > ...
> >         struct mmu_gather tlb;
> > +       MA_STATE(mas, &current->mm->mm_mt, start, start);
> >  
> >         start = untagged_addr(start);
> > ...
> > 
> > It looks like I search against the tagged address.  I should initialize
> > the state to 0 and mas_set(&mas, start) after untagging the address.
> > 
> > I'll send out a patch once I have recreated and verified this is the
> > issue.
> 
> Thanks. I did a quick test and untagging start seems to fix the issue (I
> was wondering why mprotect() returned -ENOMEM when failing).
> 

Thanks Catalin for testing this.

I can confirm this fixes test 21 and 22 above:
ok 21 Check clear PROT_MTE flags with private mapping, sync error mode
and mmap memory
ok 22 Check clear PROT_MTE flags with private mapping and sync error
mode and mmap/mprotect memory
# Totals: pass:22 fail:0 xfail:0 xpass:0 skip:0 error:0

I will send out an update patch shortly.

Regards,
Liam