[PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries

Sasha Levin posted 1 patch 3 months, 1 week ago
[PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 3 months, 1 week ago
When handling non-swap entries in move_pages_pte(), the error handling
for entries that are NOT migration entries fails to unmap the page table
entries before jumping to the error handling label.

This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
triggers a WARNING in kunmap_local_indexed() because the kmap stack is
corrupted.

Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
  WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
  Call trace:
    kunmap_local_indexed from move_pages+0x964/0x19f4
    move_pages from userfaultfd_ioctl+0x129c/0x2144
    userfaultfd_ioctl from sys_ioctl+0x558/0xd24

The issue was introduced with the UFFDIO_MOVE feature but became more
frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
path more commonly executed during userfaultfd operations.

Fix this by ensuring PTEs are properly unmapped in all non-swap entry
paths before jumping to the error handling label, not just for migration
entries.

Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/userfaultfd.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8253978ee0fb1..7c298e9cbc18f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 
 		entry = pte_to_swp_entry(orig_src_pte);
 		if (non_swap_entry(entry)) {
+			pte_unmap(src_pte);
+			pte_unmap(dst_pte);
+			src_pte = dst_pte = NULL;
 			if (is_migration_entry(entry)) {
-				pte_unmap(src_pte);
-				pte_unmap(dst_pte);
-				src_pte = dst_pte = NULL;
 				migration_entry_wait(mm, src_pmd, src_addr);
 				err = -EAGAIN;
-			} else
+			} else {
 				err = -EFAULT;
+			}
 			goto out;
 		}
 
-- 
2.39.5
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Andrew Morton 3 months, 1 week ago
On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:

> When handling non-swap entries in move_pages_pte(), the error handling
> for entries that are NOT migration entries fails to unmap the page table
> entries before jumping to the error handling label.
> 
> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
> corrupted.
> 
> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>   Call trace:
>     kunmap_local_indexed from move_pages+0x964/0x19f4
>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
> 
> The issue was introduced with the UFFDIO_MOVE feature but became more
> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
> path more commonly executed during userfaultfd operations.
> 
> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
> paths before jumping to the error handling label, not just for migration
> entries.

I don't get it.

> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>  
>  		entry = pte_to_swp_entry(orig_src_pte);
>  		if (non_swap_entry(entry)) {
> +			pte_unmap(src_pte);
> +			pte_unmap(dst_pte);
> +			src_pte = dst_pte = NULL;
>  			if (is_migration_entry(entry)) {
> -				pte_unmap(src_pte);
> -				pte_unmap(dst_pte);
> -				src_pte = dst_pte = NULL;
>  				migration_entry_wait(mm, src_pmd, src_addr);
>  				err = -EAGAIN;
> -			} else
> +			} else {
>  				err = -EFAULT;
> +			}
>  			goto out;

where we have

out:
	...
	if (dst_pte)
		pte_unmap(dst_pte);
	if (src_pte)
		pte_unmap(src_pte);
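
To make the objection concrete, here is a minimal user-space sketch of the
control flow, not the kernel code itself. The names model_pte_map(),
model_pte_unmap() and model_move_pages_pte() are hypothetical stand-ins
invented for illustration; only the shape of the early-unmap branch and the
NULL-guarded cleanup at "out:" is taken from the snippets quoted above. It
assumes each map is paired with exactly one unmap and simply counts live
mappings on return.

```c
#include <stddef.h>

/* Hypothetical counter standing in for the PTE map/unmap bookkeeping. */
struct pte_model {
	int mapped;	/* number of mappings that are currently live */
};

static int *model_pte_map(struct pte_model *m, int *storage)
{
	m->mapped++;
	return storage;
}

static void model_pte_unmap(struct pte_model *m, int *pte)
{
	(void)pte;
	m->mapped--;
}

/*
 * Models the non-swap-entry path. If unmap_early_for_all is 0, only
 * migration entries unmap early (the code before the patch); if it is 1,
 * every non-swap entry unmaps early (the code after the patch).
 * Returns the number of mappings still live when the function exits.
 */
static int model_move_pages_pte(int is_migration, int unmap_early_for_all)
{
	struct pte_model m = { 0 };
	int dst_storage = 0, src_storage = 0;
	int *dst_pte = model_pte_map(&m, &dst_storage);
	int *src_pte = model_pte_map(&m, &src_storage);

	if (is_migration || unmap_early_for_all) {
		model_pte_unmap(&m, src_pte);
		model_pte_unmap(&m, dst_pte);
		src_pte = dst_pte = NULL;
	}

	/* out: anything not unmapped early is unmapped here */
	if (dst_pte)
		model_pte_unmap(&m, dst_pte);
	if (src_pte)
		model_pte_unmap(&m, src_pte);

	return m.mapped;
}
```

In this model all four combinations of the two arguments return 0, so the
map/unmap count is balanced with or without the patch. What does differ
between the two paths is the order of the unmap calls.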
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by David Hildenbrand 3 months ago
On 01.07.25 02:57, Andrew Morton wrote:
> On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
> 
>> When handling non-swap entries in move_pages_pte(), the error handling
>> for entries that are NOT migration entries fails to unmap the page table
>> entries before jumping to the error handling label.
>>
>> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>> corrupted.
>>
>> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>    Call trace:
>>      kunmap_local_indexed from move_pages+0x964/0x19f4
>>      move_pages from userfaultfd_ioctl+0x129c/0x2144
>>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>
>> The issue was introduced with the UFFDIO_MOVE feature but became more
>> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>> path more commonly executed during userfaultfd operations.
>>
>> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>> paths before jumping to the error handling label, not just for migration
>> entries.
> 
> I don't get it.
> 
>> --- a/mm/userfaultfd.c
>> +++ b/mm/userfaultfd.c
>> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>   
>>   		entry = pte_to_swp_entry(orig_src_pte);
>>   		if (non_swap_entry(entry)) {
>> +			pte_unmap(src_pte);
>> +			pte_unmap(dst_pte);
>> +			src_pte = dst_pte = NULL;
>>   			if (is_migration_entry(entry)) {
>> -				pte_unmap(src_pte);
>> -				pte_unmap(dst_pte);
>> -				src_pte = dst_pte = NULL;
>>   				migration_entry_wait(mm, src_pmd, src_addr);
>>   				err = -EAGAIN;
>> -			} else
>> +			} else {
>>   				err = -EFAULT;
>> +			}
>>   			goto out;
> 
> where we have
> 
> out:
> 	...
> 	if (dst_pte)
> 		pte_unmap(dst_pte);
> 	if (src_pte)
> 		pte_unmap(src_pte);

AI slop?

-- 
Cheers,

David / dhildenb
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 3 months ago
On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>On 01.07.25 02:57, Andrew Morton wrote:
>>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>
>>>When handling non-swap entries in move_pages_pte(), the error handling
>>>for entries that are NOT migration entries fails to unmap the page table
>>>entries before jumping to the error handling label.
>>>
>>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>corrupted.
>>>
>>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>   Call trace:
>>>     kunmap_local_indexed from move_pages+0x964/0x19f4
>>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>
>>>The issue was introduced with the UFFDIO_MOVE feature but became more
>>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>path more commonly executed during userfaultfd operations.
>>>
>>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>paths before jumping to the error handling label, not just for migration
>>>entries.
>>
>>I don't get it.
>>
>>>--- a/mm/userfaultfd.c
>>>+++ b/mm/userfaultfd.c
>>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>  		entry = pte_to_swp_entry(orig_src_pte);
>>>  		if (non_swap_entry(entry)) {
>>>+			pte_unmap(src_pte);
>>>+			pte_unmap(dst_pte);
>>>+			src_pte = dst_pte = NULL;
>>>  			if (is_migration_entry(entry)) {
>>>-				pte_unmap(src_pte);
>>>-				pte_unmap(dst_pte);
>>>-				src_pte = dst_pte = NULL;
>>>  				migration_entry_wait(mm, src_pmd, src_addr);
>>>  				err = -EAGAIN;
>>>-			} else
>>>+			} else {
>>>  				err = -EFAULT;
>>>+			}
>>>  			goto out;
>>
>>where we have
>>
>>out:
>>	...
>>	if (dst_pte)
>>		pte_unmap(dst_pte);
>>	if (src_pte)
>>		pte_unmap(src_pte);
>
>AI slop?

Nah, this one is sadly all me :(

I was trying to resolve some of the issues found with linus-next on
LKFT, and misunderstood the code. Funnily enough, I thought that the
change above "fixed" it by making the warnings go away, but it is clearly
the wrong thing to do, so I went back to the drawing board...

If you're curious, here's the issue: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.13-rc7-43418-g558c6dd4d863/testrun/29030370/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmhighmem-kunmap_local_indexed/details/

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by David Hildenbrand 3 months ago
On 08.07.25 17:33, Sasha Levin wrote:
> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>> On 01.07.25 02:57, Andrew Morton wrote:
>>> On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>
>>>> When handling non-swap entries in move_pages_pte(), the error handling
>>>> for entries that are NOT migration entries fails to unmap the page table
>>>> entries before jumping to the error handling label.
>>>>
>>>> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>> corrupted.
>>>>
>>>> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>    Call trace:
>>>>      kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>      move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>
>>>> The issue was introduced with the UFFDIO_MOVE feature but became more
>>>> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>> path more commonly executed during userfaultfd operations.
>>>>
>>>> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>> paths before jumping to the error handling label, not just for migration
>>>> entries.
>>>
>>> I don't get it.
>>>
>>>> --- a/mm/userfaultfd.c
>>>> +++ b/mm/userfaultfd.c
>>>> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>   		entry = pte_to_swp_entry(orig_src_pte);
>>>>   		if (non_swap_entry(entry)) {
>>>> +			pte_unmap(src_pte);
>>>> +			pte_unmap(dst_pte);
>>>> +			src_pte = dst_pte = NULL;
>>>>   			if (is_migration_entry(entry)) {
>>>> -				pte_unmap(src_pte);
>>>> -				pte_unmap(dst_pte);
>>>> -				src_pte = dst_pte = NULL;
>>>>   				migration_entry_wait(mm, src_pmd, src_addr);
>>>>   				err = -EAGAIN;
>>>> -			} else
>>>> +			} else {
>>>>   				err = -EFAULT;
>>>> +			}
>>>>   			goto out;
>>>
>>> where we have
>>>
>>> out:
>>> 	...
>>> 	if (dst_pte)
>>> 		pte_unmap(dst_pte);
>>> 	if (src_pte)
>>> 		pte_unmap(src_pte);
>>
>> AI slop?
> 
> Nah, this one is sadly all me :(

Haha, sorry :P

-- 
Cheers,

David / dhildenb
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months, 1 week ago
On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>On 08.07.25 17:33, Sasha Levin wrote:
>>On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>On 01.07.25 02:57, Andrew Morton wrote:
>>>>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>
>>>>>When handling non-swap entries in move_pages_pte(), the error handling
>>>>>for entries that are NOT migration entries fails to unmap the page table
>>>>>entries before jumping to the error handling label.
>>>>>
>>>>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>corrupted.
>>>>>
>>>>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>   Call trace:
>>>>>     kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>
>>>>>The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>path more commonly executed during userfaultfd operations.
>>>>>
>>>>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>paths before jumping to the error handling label, not just for migration
>>>>>entries.
>>>>
>>>>I don't get it.
>>>>
>>>>>--- a/mm/userfaultfd.c
>>>>>+++ b/mm/userfaultfd.c
>>>>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>  		entry = pte_to_swp_entry(orig_src_pte);
>>>>>  		if (non_swap_entry(entry)) {
>>>>>+			pte_unmap(src_pte);
>>>>>+			pte_unmap(dst_pte);
>>>>>+			src_pte = dst_pte = NULL;
>>>>>  			if (is_migration_entry(entry)) {
>>>>>-				pte_unmap(src_pte);
>>>>>-				pte_unmap(dst_pte);
>>>>>-				src_pte = dst_pte = NULL;
>>>>>  				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>  				err = -EAGAIN;
>>>>>-			} else
>>>>>+			} else {
>>>>>  				err = -EFAULT;
>>>>>+			}
>>>>>  			goto out;
>>>>
>>>>where we have
>>>>
>>>>out:
>>>>	...
>>>>	if (dst_pte)
>>>>		pte_unmap(dst_pte);
>>>>	if (src_pte)
>>>>		pte_unmap(src_pte);
>>>
>>>AI slop?
>>
>>Nah, this one is sadly all me :(
>
>Haha, sorry :P

So as I was getting nowhere with this, I asked AI to help me :)

If you're not interested in reading LLM generated code, feel free to
stop reading now...

After it went over the logs, and after a few prompts to point it the
right way, it ended up generating a patch (below) that made sense and
fixed the warning that LKFT was able to trigger.

If anyone who's more familiar with the code than me (and the AI) agrees
with the patch and wants to throw in their Reviewed-by, I'll send out the
patch.

If the below patch is completely bogus then I'm sorry and I'll buy you a
beer at LPC :)


 From 70f7eae079a5203857b96d6c64bb72b0f566d4de Mon Sep 17 00:00:00 2001
From: Sasha Levin <sashal@kernel.org>
Date: Wed, 30 Jul 2025 20:41:54 -0400
Subject: [PATCH] mm/userfaultfd: fix kmap_local LIFO ordering for
  CONFIG_HIGHPTE

With CONFIG_HIGHPTE on 32-bit ARM, move_pages_pte() maps PTE pages using
kmap_local_page(), which requires unmapping in Last-In-First-Out order.

The current code maps dst_pte first, then src_pte, but unmaps them in
the same order (dst_pte, src_pte), violating the LIFO requirement.
This causes the warning in kunmap_local_indexed():

   WARNING: CPU: 0 PID: 604 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
   addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)

Fix this by reversing the unmap order to respect LIFO ordering.

This issue follows the same pattern as similar fixes:
- commit eca6828403b8 ("crypto: skcipher - fix mismatch between mapping and unmapping order")
- commit 8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")

Both of which addressed the same fundamental requirement that kmap_local
operations must follow LIFO ordering.

Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Co-developed-by: Claude claude-opus-4-20250514
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
  mm/userfaultfd.c | 9 +++++++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8253978ee0fb..bf7a57ea71e0 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1453,10 +1453,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
  		folio_unlock(src_folio);
  		folio_put(src_folio);
  	}
-	if (dst_pte)
-		pte_unmap(dst_pte);
+	/*
+	 * Unmap in reverse order (LIFO) to maintain proper kmap_local
+	 * index ordering when CONFIG_HIGHPTE is enabled. We mapped dst_pte
+	 * first, then src_pte, so we must unmap src_pte first, then dst_pte.
+	 */
  	if (src_pte)
  		pte_unmap(src_pte);
+	if (dst_pte)
+		pte_unmap(dst_pte);
  	mmu_notifier_invalidate_range_end(&range);
  	if (si)
  		put_swap_device(si);


-- 
Thanks,
Sasha
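
For readers unfamiliar with the LIFO rule the commit message relies on, the
following is a rough user-space sketch under simplifying assumptions. It
models the kmap_local slots as a small stack and counts a "warning" whenever
an unmap is handed an address that is not the top slot, loosely mirroring the
"addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)" check quoted in the warning
above. All names (model_kmap_local(), model_kunmap_local(),
model_unmap_warnings()) are hypothetical and are not kernel APIs; the real
implementation in mm/highmem.c also deals with per-CPU state and
architecture-specific indexing, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SLOTS 4

struct kmap_model {
	int idx;	/* number of live mappings (stack depth) */
	int warnings;	/* unmaps whose address was not the top slot */
};

/* One byte per slot; the address of a byte stands in for a mapping. */
static char model_window[MODEL_SLOTS];

static void *model_kmap_local(struct kmap_model *m)
{
	assert(m->idx < MODEL_SLOTS);
	return &model_window[m->idx++];
}

static void model_kunmap_local(struct kmap_model *m, void *addr)
{
	assert(m->idx > 0);
	/* The address being unmapped must match the most recent mapping. */
	if (addr != (void *)&model_window[m->idx - 1])
		m->warnings++;
	m->idx--;
}

/*
 * Maps dst_pte first and src_pte second, as the commit message describes,
 * then unmaps either in reverse (lifo != 0) or in mapping order (lifo == 0).
 * Returns the number of warnings the model recorded.
 */
static int model_unmap_warnings(int lifo)
{
	struct kmap_model m = { 0, 0 };
	void *dst_pte = model_kmap_local(&m);
	void *src_pte = model_kmap_local(&m);

	if (lifo) {
		model_kunmap_local(&m, src_pte);
		model_kunmap_local(&m, dst_pte);
	} else {
		model_kunmap_local(&m, dst_pte);
		model_kunmap_local(&m, src_pte);
	}
	return m.warnings;
}
```

In this model, model_unmap_warnings(1) records no warnings, while
model_unmap_warnings(0) records one for each unmap, because neither address
matches the top of the stack at the time it is released.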
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by David Hildenbrand 2 months, 1 week ago
On 31.07.25 14:37, Sasha Levin wrote:
> On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>> On 08.07.25 17:33, Sasha Levin wrote:
>>> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>> On 01.07.25 02:57, Andrew Morton wrote:
>>>>> On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>>
>>>>>> When handling non-swap entries in move_pages_pte(), the error handling
>>>>>> for entries that are NOT migration entries fails to unmap the page table
>>>>>> entries before jumping to the error handling label.
>>>>>>
>>>>>> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>> corrupted.
>>>>>>
>>>>>> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>>    Call trace:
>>>>>>      kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>>      move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>>
>>>>>> The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>> path more commonly executed during userfaultfd operations.
>>>>>>
>>>>>> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>> paths before jumping to the error handling label, not just for migration
>>>>>> entries.
>>>>>
>>>>> I don't get it.
>>>>>
>>>>>> --- a/mm/userfaultfd.c
>>>>>> +++ b/mm/userfaultfd.c
>>>>>> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>>   		entry = pte_to_swp_entry(orig_src_pte);
>>>>>>   		if (non_swap_entry(entry)) {
>>>>>> +			pte_unmap(src_pte);
>>>>>> +			pte_unmap(dst_pte);
>>>>>> +			src_pte = dst_pte = NULL;
>>>>>>   			if (is_migration_entry(entry)) {
>>>>>> -				pte_unmap(src_pte);
>>>>>> -				pte_unmap(dst_pte);
>>>>>> -				src_pte = dst_pte = NULL;
>>>>>>   				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>>   				err = -EAGAIN;
>>>>>> -			} else
>>>>>> +			} else {
>>>>>>   				err = -EFAULT;
>>>>>> +			}
>>>>>>   			goto out;
>>>>>
>>>>> where we have
>>>>>
>>>>> out:
>>>>> 	...
>>>>> 	if (dst_pte)
>>>>> 		pte_unmap(dst_pte);
>>>>> 	if (src_pte)
>>>>> 		pte_unmap(src_pte);
>>>>
>>>> AI slop?
>>>
>>> Nah, this one is sadly all me :(
>>
>> Haha, sorry :P
> 
> So as I was getting nowhere with this, I asked AI to help me :)
> 
> If you're not interested in reading LLM generated code, feel free to
> stop reading now...
> 
> After it went over the logs, and a few prompts to point it the right
> way, it ended up generating a patch (below) that made sense, and fixed
> the warning that LKFT was being able to trigger.
> 
> If anyone who's more familiar with the code than me (and the AI) agrees
> with the patch and ways to throw their Reviewed-by, I'll send out the
> patch.

Seems to check out for me. In particular, our pte_unmap() calls everywhere
else in that function (and in mremap.c:move_ptes) are ordered properly.

Even if it did not fix the issue, it would be a cleanup :)

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months, 1 week ago
On Thu, Jul 31, 2025 at 02:56:25PM +0200, David Hildenbrand wrote:
>On 31.07.25 14:37, Sasha Levin wrote:
>>On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>>>On 08.07.25 17:33, Sasha Levin wrote:
>>>>On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>>>On 01.07.25 02:57, Andrew Morton wrote:
>>>>>>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>>>
>>>>>>>When handling non-swap entries in move_pages_pte(), the error handling
>>>>>>>for entries that are NOT migration entries fails to unmap the page table
>>>>>>>entries before jumping to the error handling label.
>>>>>>>
>>>>>>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>>>corrupted.
>>>>>>>
>>>>>>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>>>   Call trace:
>>>>>>>     kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>>>
>>>>>>>The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>>>path more commonly executed during userfaultfd operations.
>>>>>>>
>>>>>>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>>>paths before jumping to the error handling label, not just for migration
>>>>>>>entries.
>>>>>>
>>>>>>I don't get it.
>>>>>>
>>>>>>>--- a/mm/userfaultfd.c
>>>>>>>+++ b/mm/userfaultfd.c
>>>>>>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>>>  		entry = pte_to_swp_entry(orig_src_pte);
>>>>>>>  		if (non_swap_entry(entry)) {
>>>>>>>+			pte_unmap(src_pte);
>>>>>>>+			pte_unmap(dst_pte);
>>>>>>>+			src_pte = dst_pte = NULL;
>>>>>>>  			if (is_migration_entry(entry)) {
>>>>>>>-				pte_unmap(src_pte);
>>>>>>>-				pte_unmap(dst_pte);
>>>>>>>-				src_pte = dst_pte = NULL;
>>>>>>>  				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>>>  				err = -EAGAIN;
>>>>>>>-			} else
>>>>>>>+			} else {
>>>>>>>  				err = -EFAULT;
>>>>>>>+			}
>>>>>>>  			goto out;
>>>>>>
>>>>>>where we have
>>>>>>
>>>>>>out:
>>>>>>	...
>>>>>>	if (dst_pte)
>>>>>>		pte_unmap(dst_pte);
>>>>>>	if (src_pte)
>>>>>>		pte_unmap(src_pte);
>>>>>
>>>>>AI slop?
>>>>
>>>>Nah, this one is sadly all me :(
>>>
>>>Haha, sorry :P
>>
>>So as I was getting nowhere with this, I asked AI to help me :)
>>
>>If you're not interested in reading LLM generated code, feel free to
>>stop reading now...
>>
>>After it went over the logs, and a few prompts to point it the right
>>way, it ended up generating a patch (below) that made sense, and fixed
>>the warning that LKFT was being able to trigger.
>>
>>If anyone who's more familiar with the code than me (and the AI) agrees
>>with the patch and ways to throw their Reviewed-by, I'll send out the
>>patch.
>
>Seems to check out for me. In particular, out pte_unmap() everywhere 
>else in that function (and mremap.c:move_ptes) are ordered properly.
>
>Even if it would not fix the issue, it would be a cleanup :)
>
>Acked-by: David Hildenbrand <david@redhat.com>

David, I ended up LLM-generating a .cocci script to detect this type of
issue, and it found a similar issue in
arch/loongarch/mm/init.c.

Would you be open to reviewing both the .cocci script as well as the
loongarch fix?

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by David Hildenbrand 2 months, 1 week ago
On 01.08.25 15:26, Sasha Levin wrote:
> On Thu, Jul 31, 2025 at 02:56:25PM +0200, David Hildenbrand wrote:
>> On 31.07.25 14:37, Sasha Levin wrote:
>>> On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>>>> On 08.07.25 17:33, Sasha Levin wrote:
>>>>> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>>>> On 01.07.25 02:57, Andrew Morton wrote:
>>>>>>> On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>>>>
>>>>>>>> When handling non-swap entries in move_pages_pte(), the error handling
>>>>>>>> for entries that are NOT migration entries fails to unmap the page table
>>>>>>>> entries before jumping to the error handling label.
>>>>>>>>
>>>>>>>> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>>>> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>>>> corrupted.
>>>>>>>>
>>>>>>>> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>>>>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>>>>    Call trace:
>>>>>>>>      kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>>>>      move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>>>>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>>>>
>>>>>>>> The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>>>> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>>>> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>>>> path more commonly executed during userfaultfd operations.
>>>>>>>>
>>>>>>>> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>>>> paths before jumping to the error handling label, not just for migration
>>>>>>>> entries.
>>>>>>>
>>>>>>> I don't get it.
>>>>>>>
>>>>>>>> --- a/mm/userfaultfd.c
>>>>>>>> +++ b/mm/userfaultfd.c
>>>>>>>> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>>>>   		entry = pte_to_swp_entry(orig_src_pte);
>>>>>>>>   		if (non_swap_entry(entry)) {
>>>>>>>> +			pte_unmap(src_pte);
>>>>>>>> +			pte_unmap(dst_pte);
>>>>>>>> +			src_pte = dst_pte = NULL;
>>>>>>>>   			if (is_migration_entry(entry)) {
>>>>>>>> -				pte_unmap(src_pte);
>>>>>>>> -				pte_unmap(dst_pte);
>>>>>>>> -				src_pte = dst_pte = NULL;
>>>>>>>>   				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>>>>   				err = -EAGAIN;
>>>>>>>> -			} else
>>>>>>>> +			} else {
>>>>>>>>   				err = -EFAULT;
>>>>>>>> +			}
>>>>>>>>   			goto out;
>>>>>>>
>>>>>>> where we have
>>>>>>>
>>>>>>> out:
>>>>>>> 	...
>>>>>>> 	if (dst_pte)
>>>>>>> 		pte_unmap(dst_pte);
>>>>>>> 	if (src_pte)
>>>>>>> 		pte_unmap(src_pte);
>>>>>>
>>>>>> AI slop?
>>>>>
>>>>> Nah, this one is sadly all me :(
>>>>
>>>> Haha, sorry :P
>>>
>>> So as I was getting nowhere with this, I asked AI to help me :)
>>>
>>> If you're not interested in reading LLM generated code, feel free to
>>> stop reading now...
>>>
>>> After it went over the logs, and a few prompts to point it the right
>>> way, it ended up generating a patch (below) that made sense, and fixed
>>> the warning that LKFT was being able to trigger.
>>>
>>> If anyone who's more familiar with the code than me (and the AI) agrees
>>> with the patch and ways to throw their Reviewed-by, I'll send out the
>>> patch.
>>
>> Seems to check out for me. In particular, out pte_unmap() everywhere
>> else in that function (and mremap.c:move_ptes) are ordered properly.
>>
>> Even if it would not fix the issue, it would be a cleanup :)
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
> 
> David, I ended up LLM generating a .cocci script to detect this type of
> issues, and it ended up detecting a similar issue in
> arch/loongarch/mm/init.c.

Does loongarch have these kmap_local restrictions?

> 
> Would you be open to reviewing both the .cocci script as well as the
> loongarch fix?

Sure, if it's prechecked by you no problem.

-- 
Cheers,

David / dhildenb
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by David Hildenbrand 2 months, 1 week ago
On 01.08.25 16:06, David Hildenbrand wrote:
> On 01.08.25 15:26, Sasha Levin wrote:
>> On Thu, Jul 31, 2025 at 02:56:25PM +0200, David Hildenbrand wrote:
>>> On 31.07.25 14:37, Sasha Levin wrote:
>>>> On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>>>>> On 08.07.25 17:33, Sasha Levin wrote:
>>>>>> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>>>>> On 01.07.25 02:57, Andrew Morton wrote:
>>>>>>>> On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>>>>>
>>>>>>>>> When handling non-swap entries in move_pages_pte(), the error handling
>>>>>>>>> for entries that are NOT migration entries fails to unmap the page table
>>>>>>>>> entries before jumping to the error handling label.
>>>>>>>>>
>>>>>>>>> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>>>>> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>>>>> corrupted.
>>>>>>>>>
>>>>>>>>> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>>>>>     WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>>>>>     Call trace:
>>>>>>>>>       kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>>>>>       move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>>>>>       userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>>>>>
>>>>>>>>> The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>>>>> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>>>>> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>>>>> path more commonly executed during userfaultfd operations.
>>>>>>>>>
>>>>>>>>> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>>>>> paths before jumping to the error handling label, not just for migration
>>>>>>>>> entries.
>>>>>>>>
>>>>>>>> I don't get it.
>>>>>>>>
>>>>>>>>> --- a/mm/userfaultfd.c
>>>>>>>>> +++ b/mm/userfaultfd.c
>>>>>>>>> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>>>>>    		entry = pte_to_swp_entry(orig_src_pte);
>>>>>>>>>    		if (non_swap_entry(entry)) {
>>>>>>>>> +			pte_unmap(src_pte);
>>>>>>>>> +			pte_unmap(dst_pte);
>>>>>>>>> +			src_pte = dst_pte = NULL;
>>>>>>>>>    			if (is_migration_entry(entry)) {
>>>>>>>>> -				pte_unmap(src_pte);
>>>>>>>>> -				pte_unmap(dst_pte);
>>>>>>>>> -				src_pte = dst_pte = NULL;
>>>>>>>>>    				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>>>>>    				err = -EAGAIN;
>>>>>>>>> -			} else
>>>>>>>>> +			} else {
>>>>>>>>>    				err = -EFAULT;
>>>>>>>>> +			}
>>>>>>>>>    			goto out;
>>>>>>>>
>>>>>>>> where we have
>>>>>>>>
>>>>>>>> out:
>>>>>>>> 	...
>>>>>>>> 	if (dst_pte)
>>>>>>>> 		pte_unmap(dst_pte);
>>>>>>>> 	if (src_pte)
>>>>>>>> 		pte_unmap(src_pte);
>>>>>>>
>>>>>>> AI slop?
>>>>>>
>>>>>> Nah, this one is sadly all me :(
>>>>>
>>>>> Haha, sorry :P
>>>>
>>>> So as I was getting nowhere with this, I asked AI to help me :)
>>>>
>>>> If you're not interested in reading LLM generated code, feel free to
>>>> stop reading now...
>>>>
>>>> After it went over the logs, and a few prompts to point it the right
>>>> way, it ended up generating a patch (below) that made sense, and fixed
>>>> the warning that LKFT was being able to trigger.
>>>>
>>>> If anyone who's more familiar with the code than me (and the AI) agrees
>>>> with the patch and ways to throw their Reviewed-by, I'll send out the
>>>> patch.
>>>
>>> Seems to check out for me. In particular, out pte_unmap() everywhere
>>> else in that function (and mremap.c:move_ptes) are ordered properly.
>>>
>>> Even if it would not fix the issue, it would be a cleanup :)
>>>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>
>> David, I ended up LLM generating a .cocci script to detect this type of
>> issues, and it ended up detecting a similar issue in
>> arch/loongarch/mm/init.c.
> 
> Does loongarch have these kmap_local restrictions?

loongarch doesn't use HIGHMEM, so it probably doesn't matter. Could be 
considered a cleanup, though.

-- 
Cheers,

David / dhildenb
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months, 1 week ago
On Fri, Aug 01, 2025 at 04:13:32PM +0200, David Hildenbrand wrote:
>On 01.08.25 16:06, David Hildenbrand wrote:
>>On 01.08.25 15:26, Sasha Levin wrote:
>>>On Thu, Jul 31, 2025 at 02:56:25PM +0200, David Hildenbrand wrote:
>>>>On 31.07.25 14:37, Sasha Levin wrote:
>>>>>On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>>>>>>On 08.07.25 17:33, Sasha Levin wrote:
>>>>>>>On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>>>>>>On 01.07.25 02:57, Andrew Morton wrote:
>>>>>>>>>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>>>>>>
>>>>>>>>>>When handling non-swap entries in move_pages_pte(), the error handling
>>>>>>>>>>for entries that are NOT migration entries fails to unmap the page table
>>>>>>>>>>entries before jumping to the error handling label.
>>>>>>>>>>
>>>>>>>>>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>>>>>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>>>>>>corrupted.
>>>>>>>>>>
>>>>>>>>>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>>>>>>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>>>>>>    Call trace:
>>>>>>>>>>      kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>>>>>>      move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>>>>>>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>>>>>>
>>>>>>>>>>The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>>>>>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>>>>>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>>>>>>path more commonly executed during userfaultfd operations.
>>>>>>>>>>
>>>>>>>>>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>>>>>>paths before jumping to the error handling label, not just for migration
>>>>>>>>>>entries.
>>>>>>>>>
>>>>>>>>>I don't get it.
>>>>>>>>>
>>>>>>>>>>--- a/mm/userfaultfd.c
>>>>>>>>>>+++ b/mm/userfaultfd.c
>>>>>>>>>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>>>>>>   		entry = pte_to_swp_entry(orig_src_pte);
>>>>>>>>>>   		if (non_swap_entry(entry)) {
>>>>>>>>>>+			pte_unmap(src_pte);
>>>>>>>>>>+			pte_unmap(dst_pte);
>>>>>>>>>>+			src_pte = dst_pte = NULL;
>>>>>>>>>>   			if (is_migration_entry(entry)) {
>>>>>>>>>>-				pte_unmap(src_pte);
>>>>>>>>>>-				pte_unmap(dst_pte);
>>>>>>>>>>-				src_pte = dst_pte = NULL;
>>>>>>>>>>   				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>>>>>>   				err = -EAGAIN;
>>>>>>>>>>-			} else
>>>>>>>>>>+			} else {
>>>>>>>>>>   				err = -EFAULT;
>>>>>>>>>>+			}
>>>>>>>>>>   			goto out;
>>>>>>>>>
>>>>>>>>>where we have
>>>>>>>>>
>>>>>>>>>out:
>>>>>>>>>	...
>>>>>>>>>	if (dst_pte)
>>>>>>>>>		pte_unmap(dst_pte);
>>>>>>>>>	if (src_pte)
>>>>>>>>>		pte_unmap(src_pte);
>>>>>>>>
>>>>>>>>AI slop?
>>>>>>>
>>>>>>>Nah, this one is sadly all me :(
>>>>>>
>>>>>>Haha, sorry :P
>>>>>
>>>>>So as I was getting nowhere with this, I asked AI to help me :)
>>>>>
>>>>>If you're not interested in reading LLM generated code, feel free to
>>>>>stop reading now...
>>>>>
>>>>>After it went over the logs, and a few prompts to point it the right
>>>>>way, it ended up generating a patch (below) that made sense, and fixed
>>>>>the warning that LKFT was being able to trigger.
>>>>>
>>>>>If anyone who's more familiar with the code than me (and the AI) agrees
>>>>>with the patch and wants to throw their Reviewed-by, I'll send out the
>>>>>patch.
>>>>
>>>>Seems to check out for me. In particular, out pte_unmap() everywhere
>>>>else in that function (and mremap.c:move_ptes) are ordered properly.
>>>>
>>>>Even if it would not fix the issue, it would be a cleanup :)
>>>>
>>>>Acked-by: David Hildenbrand <david@redhat.com>
>>>
>>>David, I ended up LLM generating a .cocci script to detect this type of
>>>issues, and it ended up detecting a similar issue in
>>>arch/loongarch/mm/init.c.
>>
>>Does loongarch have these kmap_local restrictions?
>
>loongarch doesn't use HIGHMEM, so it probably doesn't matter. Could be 
>considered a cleanup, though.

Yup, it's just a cleanup for loongarch.

It was the only other place besides mm/userfaultfd.c that had that
inversion, so keeping the tree warning clear will make it easier to spot
newly introduced issues in the future.
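
As a rough illustration of the property being checked (this is not the
.cocci script, only a small userspace model with hypothetical names such
as model_first_lifo_violation()), a trace of map/unmap events can be
replayed against a stack and the first out-of-order unmap reported:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEPTH 32	/* hypothetical nesting limit for this model */

enum model_op { MODEL_MAP, MODEL_UNMAP };

struct model_event {
	enum model_op op;
	int id;			/* identifies which mapping the event refers to */
};

/*
 * Replay the events against a stack. Return the index of the first unmap
 * that does not match the most recent still-active map, or -1 if every
 * unmap happens in reverse (LIFO) order.
 */
static int model_first_lifo_violation(const struct model_event *ev, size_t n)
{
	int stack[MODEL_MAX_DEPTH];
	size_t depth = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].op == MODEL_MAP) {
			assert(depth < MODEL_MAX_DEPTH);
			stack[depth++] = ev[i].id;
		} else {
			if (depth == 0 || stack[depth - 1] != ev[i].id)
				return (int)i;
			depth--;
		}
	}
	return -1;
}
```

Feeding it "map 1, map 2, unmap 1, unmap 2" flags the third event, while
"map 1, map 2, unmap 2, unmap 1" passes.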

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months, 1 week ago
On Fri, Aug 01, 2025 at 04:06:14PM +0200, David Hildenbrand wrote:
>On 01.08.25 15:26, Sasha Levin wrote:
>>On Thu, Jul 31, 2025 at 02:56:25PM +0200, David Hildenbrand wrote:
>>>On 31.07.25 14:37, Sasha Levin wrote:
>>>>On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>>>>>On 08.07.25 17:33, Sasha Levin wrote:
>>>>>>On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>>>>>On 01.07.25 02:57, Andrew Morton wrote:
>>>>>>>>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>>>>>
>>>>>>>>>When handling non-swap entries in move_pages_pte(), the error handling
>>>>>>>>>for entries that are NOT migration entries fails to unmap the page table
>>>>>>>>>entries before jumping to the error handling label.
>>>>>>>>>
>>>>>>>>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>>>>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>>>>>corrupted.
>>>>>>>>>
>>>>>>>>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>>>>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>>>>>   Call trace:
>>>>>>>>>     kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>>>>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>>>>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>>>>>
>>>>>>>>>The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>>>>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>>>>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>>>>>path more commonly executed during userfaultfd operations.
>>>>>>>>>
>>>>>>>>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>>>>>paths before jumping to the error handling label, not just for migration
>>>>>>>>>entries.
>>>>>>>>
>>>>>>>>I don't get it.
>>>>>>>>
>>>>>>>>>--- a/mm/userfaultfd.c
>>>>>>>>>+++ b/mm/userfaultfd.c
>>>>>>>>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>>>>>  		entry = pte_to_swp_entry(orig_src_pte);
>>>>>>>>>  		if (non_swap_entry(entry)) {
>>>>>>>>>+			pte_unmap(src_pte);
>>>>>>>>>+			pte_unmap(dst_pte);
>>>>>>>>>+			src_pte = dst_pte = NULL;
>>>>>>>>>  			if (is_migration_entry(entry)) {
>>>>>>>>>-				pte_unmap(src_pte);
>>>>>>>>>-				pte_unmap(dst_pte);
>>>>>>>>>-				src_pte = dst_pte = NULL;
>>>>>>>>>  				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>>>>>  				err = -EAGAIN;
>>>>>>>>>-			} else
>>>>>>>>>+			} else {
>>>>>>>>>  				err = -EFAULT;
>>>>>>>>>+			}
>>>>>>>>>  			goto out;
>>>>>>>>
>>>>>>>>where we have
>>>>>>>>
>>>>>>>>out:
>>>>>>>>	...
>>>>>>>>	if (dst_pte)
>>>>>>>>		pte_unmap(dst_pte);
>>>>>>>>	if (src_pte)
>>>>>>>>		pte_unmap(src_pte);
>>>>>>>
>>>>>>>AI slop?
>>>>>>
>>>>>>Nah, this one is sadly all me :(
>>>>>
>>>>>Haha, sorry :P
>>>>
>>>>So as I was getting nowhere with this, I asked AI to help me :)
>>>>
>>>>If you're not interested in reading LLM generated code, feel free to
>>>>stop reading now...
>>>>
>>>>After it went over the logs, and a few prompts to point it the right
>>>>way, it ended up generating a patch (below) that made sense, and fixed
>>>>the warning that LKFT was being able to trigger.
>>>>
>>>>If anyone who's more familiar with the code than me (and the AI) agrees
>>>>with the patch and wants to throw their Reviewed-by, I'll send out the
>>>>patch.
>>>
>>>Seems to check out for me. In particular, out pte_unmap() everywhere
>>>else in that function (and mremap.c:move_ptes) are ordered properly.
>>>
>>>Even if it would not fix the issue, it would be a cleanup :)
>>>
>>>Acked-by: David Hildenbrand <david@redhat.com>
>>
>>David, I ended up LLM generating a .cocci script to detect this type of
>>issues, and it ended up detecting a similar issue in
>>arch/loongarch/mm/init.c.
>
>Does loongarch have these kmap_local restrictions?
>
>>
>>Would you be open to reviewing both the .cocci script as well as the
>>loongarch fix?
>
>Sure, if it's prechecked by you no problem.

Yup. Though I definitely learned a thing or two about Coccinelle patches
during this experiment.

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months ago
On Fri, Aug 01, 2025 at 10:29:17AM -0400, Sasha Levin wrote:
>On Fri, Aug 01, 2025 at 04:06:14PM +0200, David Hildenbrand wrote:
>>Sure, if it's prechecked by you no problem.
>
>Yup. Though I definitely learned a thing or two about Coccinelle patches
>during this experiment.

Apologies if it isn't the case, but the two patches were attached to
the previous mail and I suspect they might have been missed :)

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by David Hildenbrand 2 months ago
On 07.08.25 21:51, Sasha Levin wrote:
> On Fri, Aug 01, 2025 at 10:29:17AM -0400, Sasha Levin wrote:
>> On Fri, Aug 01, 2025 at 04:06:14PM +0200, David Hildenbrand wrote:
>>> Sure, if it's prechecked by you no problem.
>>
>> Yup. Though I definitely learned a thing or two about Coccinelle patches
>> during this experiment.
> 
> Apologies if it isn't the case, but the two patches were attached to
> the previous mail and I suspect they might have been missed :)

Whoops, not used to reviewing attachments. I'll focus on the loongarch patch.

 From a547687db03ecfe13ddc74e452357df78f880255 Mon Sep 17 00:00:00 2001
From: Sasha Levin <sashal@kernel.org>
Date: Fri, 1 Aug 2025 09:17:04 -0400
Subject: [PATCH 2/2] LoongArch: fix kmap_local_page() LIFO ordering in
  copy_user_highpage()

The current implementation violates kmap_local_page()'s LIFO ordering
requirement by unmapping the pages in the same order they were mapped.

This was introduced by commit 477a0ebec101 ("LoongArch: Replace
kmap_atomic() with kmap_local_page() in copy_user_highpage()") when
converting from kmap_atomic() to kmap_local_page(). The original code
correctly unmapped in reverse order, but the conversion swapped the
mapping order while keeping the unmapping order unchanged, resulting
in a LIFO violation.

kmap_local_page() requires unmapping to be done in reverse order
(Last-In-First-Out). Currently we map vfrom and then vto, but unmap
vfrom and then vto, which is incorrect. This patch corrects it to
unmap vto first, then vfrom.

This issue was detected by the kmap_local_lifo.cocci semantic patch.

Fixes: 477a0ebec101 ("LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()")
Co-developed-by: Claude claude-opus-4-20250514
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
  arch/loongarch/mm/init.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index c3e4586a7975..01c43f455486 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -47,8 +47,8 @@ void copy_user_highpage(struct page *to, struct page *from,
  	vfrom = kmap_local_page(from);
  	vto = kmap_local_page(to);
  	copy_page(vto, vfrom);
-	kunmap_local(vfrom);
  	kunmap_local(vto);
+	kunmap_local(vfrom);
  	/* Make sure this page is cleared on other CPU's too before using it */
  	smp_wmb();
  }
-- 
2.39.5


So, loongarch neither supports

a) Highmem

nor

b) ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP, disabling DEBUG_KMAP_LOCAL_FORCE_MAP

Consequently __kmap_local_page_prot() will not do anything:

	if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
		return page_address(page);


So there isn't anything to fix here and the whole patch subject+description should be
rewritten to focus on this being purely a cleanup -- unless I am missing
something important.
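
To make that concrete, here is a minimal userspace sketch of the
short-circuit, under the assumption that only the quoted check matters.
The types and names (model_page, model_kmap_local_page(), force_map) are
hypothetical stand-ins, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool highmem;		/* stands in for PageHighMem(page) */
	void *lowmem_addr;	/* stands in for page_address(page) */
};

struct model_result {
	void *addr;
	bool used_slot;		/* true if a stack slot would be consumed */
};

/*
 * force_map stands in for IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP).
 * When it is off and the page is not highmem, the direct address is
 * returned and no slot is taken, so unmap ordering has nothing to act on.
 */
static struct model_result
model_kmap_local_page(const struct model_page *page, bool force_map)
{
	/* addr stays NULL on the slow path; the model does not map anything */
	struct model_result r = { .addr = NULL, .used_slot = true };

	assert(page != NULL);
	if (!force_map && !page->highmem) {
		r.addr = page->lowmem_addr;
		r.used_slot = false;
	}
	return r;
}
```

With neither option available, every call takes the early-return branch,
which is why the reordering cannot change behaviour there.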

Also, please reduce the description to the absolute minimum, nobody wants to read the
same thing 4 times using slightly different words.

"LIFO ordering", "LIFO ordering", "unmapped in reverse order ... LIFO violation" ...
"reverse order (Last-In-First-Out)"


More importantly: are the LIFO semantics clearly documented somewhere? I read
Documentation/mm/highmem.rst

   Nesting kmap_local_page() and kmap_atomic() mappings is allowed to a certain
   extent (up to KMAP_TYPE_NR) but their invocations have to be strictly ordered
   because the map implementation is stack based. See kmap_local_page() kdocs
   (included in the "Functions" section) for details on how to manage nested
   mappings.

and that kind-of spells that out (strictly ordered -> stack based). I think one could
have clarified that a bit further.

Also, I would expect this to be mentioned in the docs of the relevant kmap functions,
and the pte map / unmap functions.

Did I miss that part or could we extend the function docs to spell that out?

-- 
Cheers,

David / dhildenb
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months ago
On Fri, Aug 08, 2025 at 10:02:08AM +0200, David Hildenbrand wrote:
>On 07.08.25 21:51, Sasha Levin wrote:
>>On Fri, Aug 01, 2025 at 10:29:17AM -0400, Sasha Levin wrote:
>>>On Fri, Aug 01, 2025 at 04:06:14PM +0200, David Hildenbrand wrote:
>>>>Sure, if it's prechecked by you no problem.
>>>
>>>Yup. Though I definitely learned a thing or two about Coccinelle patches
>>>during this experiment.
>>
>>Apologies if it isn't the case, but the two patches were attached to
>>the previous mail and I suspect they might have been missed :)
>
>Whoops, not used to reviewing attachments. I'll focus on the loongarch patch.

Thank you for the review!

>From a547687db03ecfe13ddc74e452357df78f880255 Mon Sep 17 00:00:00 2001
>From: Sasha Levin <sashal@kernel.org>
>Date: Fri, 1 Aug 2025 09:17:04 -0400
>Subject: [PATCH 2/2] LoongArch: fix kmap_local_page() LIFO ordering in
> copy_user_highpage()
>
>The current implementation violates kmap_local_page()'s LIFO ordering
>requirement by unmapping the pages in the same order they were mapped.
>
>This was introduced by commit 477a0ebec101 ("LoongArch: Replace
>kmap_atomic() with kmap_local_page() in copy_user_highpage()") when
>converting from kmap_atomic() to kmap_local_page(). The original code
>correctly unmapped in reverse order, but the conversion swapped the
>mapping order while keeping the unmapping order unchanged, resulting
>in a LIFO violation.
>
>kmap_local_page() requires unmapping to be done in reverse order
>(Last-In-First-Out). Currently we map vfrom and then vto, but unmap
>vfrom and then vto, which is incorrect. This patch corrects it to
>unmap vto first, then vfrom.
>
>This issue was detected by the kmap_local_lifo.cocci semantic patch.
>
>Fixes: 477a0ebec101 ("LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()")
>Co-developed-by: Claude claude-opus-4-20250514
>Signed-off-by: Sasha Levin <sashal@kernel.org>
>---
> arch/loongarch/mm/init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
>index c3e4586a7975..01c43f455486 100644
>--- a/arch/loongarch/mm/init.c
>+++ b/arch/loongarch/mm/init.c
>@@ -47,8 +47,8 @@ void copy_user_highpage(struct page *to, struct page *from,
> 	vfrom = kmap_local_page(from);
> 	vto = kmap_local_page(to);
> 	copy_page(vto, vfrom);
>-	kunmap_local(vfrom);
> 	kunmap_local(vto);
>+	kunmap_local(vfrom);
> 	/* Make sure this page is cleared on other CPU's too before using it */
> 	smp_wmb();
> }
>-- 
>2.39.5
>
>
>So, loongarch neither supports
>
>a) Highmem
>
>nor
>
>b) ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP, disabling DEBUG_KMAP_LOCAL_FORCE_MAP
>
>Consequently __kmap_local_page_prot() will not do anything:
>
>	if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>		return page_address(page);
>
>
>So there isn't anything to fix here and the whole patch subject+description should be
>rewritten to focus on this being purely a cleanup -- unless I am missing
>something important.
>
>Also, please reduce the description to the absolute minimum, nobody wants to read the
>same thing 4 times using slightly different words.
>
>"LIFO ordering", "LIFO ordering", "unmapped in reverse order ... LIFO violation" ...
>"reverse order (Last-In-First-Out)"

How about something like:

     LoongArch: clean up kmap_local_page() usage in copy_user_highpage()
     
     Unmap kmap_local_page() mappings in reverse order to follow the
     function's LIFO specification. While LoongArch doesn't support
     highmem and these operations are no-ops, code should still adhere
     to the API requirements.
     
     Detected by kmap_local_lifo.cocci.
     
     Fixes: 477a0ebec101 ("LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()")
     Signed-off-by: Sasha Levin <sashal@kernel.org>

>More importantly: are the LIFO semantics clearly documented somewhere? I read
>Documentation/mm/highmem.rst
>
>  Nesting kmap_local_page() and kmap_atomic() mappings is allowed to a certain
>  extent (up to KMAP_TYPE_NR) but their invocations have to be strictly ordered
>  because the map implementation is stack based. See kmap_local_page() kdocs
>  (included in the "Functions" section) for details on how to manage nested
>  mappings.
>
>and that kind-of spells that out (strictly ordered -> stack based). I think one could
>have clarified that a bit further.
>
>Also, I would expect this to be mentioned in the docs of the relevant kmap functions,
>and the pte map / unmap functions.
>
>Did I miss that part or could we extend the function docs to spell that out?

The docs for kmap_local_page() seem to cover it better, and give the
concrete example we're trying to fix here:

  * Requires careful handling when nesting multiple mappings because the map
  * management is stack based. The unmap has to be in the reverse order of
  * the map operation:
  *
  * addr1 = kmap_local_page(page1);
  * addr2 = kmap_local_page(page2);
  * ...
  * kunmap_local(addr2);
  * kunmap_local(addr1);
  *
  * Unmapping addr1 before addr2 is invalid and causes malfunction.
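
For anyone who wants to see why the order matters, below is a simplified
userspace model of stack-based map management. It is only a sketch: the
names (model_kmap_local(), model_kunmap_local(), MODEL_KMAP_MAX) are
made up, and the real code works on per-CPU fixmap slots rather than an
array of pointers. The assumption is that an unmap always releases the
top slot and complains when the address does not belong to it:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_KMAP_MAX 16	/* hypothetical depth limit for this model */

struct model_kmap_stack {
	const void *slot[MODEL_KMAP_MAX];
	size_t depth;
	unsigned int violations;	/* counts unmaps that missed the top slot */
};

/* Push a mapping; in this model the "address" is just the page pointer. */
static const void *model_kmap_local(struct model_kmap_stack *s, const void *page)
{
	assert(s->depth < MODEL_KMAP_MAX);
	s->slot[s->depth++] = page;
	return page;
}

/* Pop the top slot, recording a violation if addr is not what sits there. */
static void model_kunmap_local(struct model_kmap_stack *s, const void *addr)
{
	assert(s->depth > 0);
	if (s->slot[s->depth - 1] != addr)
		s->violations++;
	s->depth--;
}

/* Map two pages, then unmap them in reverse order or in mapping order. */
static unsigned int model_nested_unmap(int reverse_order)
{
	static const char page1[1], page2[1];
	struct model_kmap_stack s = { .depth = 0, .violations = 0 };
	const void *addr1 = model_kmap_local(&s, page1);
	const void *addr2 = model_kmap_local(&s, page2);

	if (reverse_order) {
		model_kunmap_local(&s, addr2);
		model_kunmap_local(&s, addr1);
	} else {
		model_kunmap_local(&s, addr1);
		model_kunmap_local(&s, addr2);
	}
	return s.violations;
}
```

In this model, model_nested_unmap(1) records no violations, while
model_nested_unmap(0) records one for each of the two unmaps, since
neither address matches the slot being released.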

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months, 1 week ago
On Thu, Jul 31, 2025 at 02:56:25PM +0200, David Hildenbrand wrote:
>On 31.07.25 14:37, Sasha Levin wrote:
>>On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
>>>On 08.07.25 17:33, Sasha Levin wrote:
>>>>On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>>>>>On 01.07.25 02:57, Andrew Morton wrote:
>>>>>>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>>>>>
>>>>>>>When handling non-swap entries in move_pages_pte(), the error handling
>>>>>>>for entries that are NOT migration entries fails to unmap the page table
>>>>>>>entries before jumping to the error handling label.
>>>>>>>
>>>>>>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>>>>>>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>>>>>>>corrupted.
>>>>>>>
>>>>>>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>>>>>>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>>>>>>>   Call trace:
>>>>>>>     kunmap_local_indexed from move_pages+0x964/0x19f4
>>>>>>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>>>>>>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>>>>>>>
>>>>>>>The issue was introduced with the UFFDIO_MOVE feature but became more
>>>>>>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>>>>>>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>>>>>>>path more commonly executed during userfaultfd operations.
>>>>>>>
>>>>>>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>>>>>>>paths before jumping to the error handling label, not just for migration
>>>>>>>entries.
>>>>>>
>>>>>>I don't get it.
>>>>>>
>>>>>>>--- a/mm/userfaultfd.c
>>>>>>>+++ b/mm/userfaultfd.c
>>>>>>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>>>>>>>  		entry = pte_to_swp_entry(orig_src_pte);
>>>>>>>  		if (non_swap_entry(entry)) {
>>>>>>>+			pte_unmap(src_pte);
>>>>>>>+			pte_unmap(dst_pte);
>>>>>>>+			src_pte = dst_pte = NULL;
>>>>>>>  			if (is_migration_entry(entry)) {
>>>>>>>-				pte_unmap(src_pte);
>>>>>>>-				pte_unmap(dst_pte);
>>>>>>>-				src_pte = dst_pte = NULL;
>>>>>>>  				migration_entry_wait(mm, src_pmd, src_addr);
>>>>>>>  				err = -EAGAIN;
>>>>>>>-			} else
>>>>>>>+			} else {
>>>>>>>  				err = -EFAULT;
>>>>>>>+			}
>>>>>>>  			goto out;
>>>>>>
>>>>>>where we have
>>>>>>
>>>>>>out:
>>>>>>	...
>>>>>>	if (dst_pte)
>>>>>>		pte_unmap(dst_pte);
>>>>>>	if (src_pte)
>>>>>>		pte_unmap(src_pte);
>>>>>
>>>>>AI slop?
>>>>
>>>>Nah, this one is sadly all me :(
>>>
>>>Haha, sorry :P
>>
>>So as I was getting nowhere with this, I asked AI to help me :)
>>
>>If you're not interested in reading LLM generated code, feel free to
>>stop reading now...
>>
>>After it went over the logs, and a few prompts to point it the right
>>way, it ended up generating a patch (below) that made sense, and fixed
>>the warning that LKFT was being able to trigger.
>>
>>If anyone who's more familiar with the code than me (and the AI) agrees
>>with the patch and wants to throw their Reviewed-by, I'll send out the
>>patch.
>
>Seems to check out for me. In particular, out pte_unmap() everywhere 
>else in that function (and mremap.c:move_ptes) are ordered properly.
>
>Even if it would not fix the issue, it would be a cleanup :)
>
>Acked-by: David Hildenbrand <david@redhat.com>

Thanks for the review!

I'll send this patch out properly.

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Suren Baghdasaryan 2 months, 1 week ago
On Thu, Jul 31, 2025 at 5:56 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 31.07.25 14:37, Sasha Levin wrote:
> > On Tue, Jul 08, 2025 at 05:42:16PM +0200, David Hildenbrand wrote:
> >> On 08.07.25 17:33, Sasha Levin wrote:
> >>> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
> >>>> On 01.07.25 02:57, Andrew Morton wrote:
> >>>>> On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
> >>>>>
> >>>>>> When handling non-swap entries in move_pages_pte(), the error handling
> >>>>>> for entries that are NOT migration entries fails to unmap the page table
> >>>>>> entries before jumping to the error handling label.
> >>>>>>
> >>>>>> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
> >>>>>> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
> >>>>>> corrupted.
> >>>>>>
> >>>>>> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
> >>>>>>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
> >>>>>>    Call trace:
> >>>>>>      kunmap_local_indexed from move_pages+0x964/0x19f4
> >>>>>>      move_pages from userfaultfd_ioctl+0x129c/0x2144
> >>>>>>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
> >>>>>>
> >>>>>> The issue was introduced with the UFFDIO_MOVE feature but became more
> >>>>>> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
> >>>>>> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
> >>>>>> path more commonly executed during userfaultfd operations.
> >>>>>>
> >>>>>> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
> >>>>>> paths before jumping to the error handling label, not just for migration
> >>>>>> entries.
> >>>>>
> >>>>> I don't get it.
> >>>>>
> >>>>>> --- a/mm/userfaultfd.c
> >>>>>> +++ b/mm/userfaultfd.c
> >>>>>> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >>>>>>                  entry = pte_to_swp_entry(orig_src_pte);
> >>>>>>                  if (non_swap_entry(entry)) {
> >>>>>> +                        pte_unmap(src_pte);
> >>>>>> +                        pte_unmap(dst_pte);
> >>>>>> +                        src_pte = dst_pte = NULL;
> >>>>>>                          if (is_migration_entry(entry)) {
> >>>>>> -                                pte_unmap(src_pte);
> >>>>>> -                                pte_unmap(dst_pte);
> >>>>>> -                                src_pte = dst_pte = NULL;
> >>>>>>                                  migration_entry_wait(mm, src_pmd, src_addr);
> >>>>>>                                  err = -EAGAIN;
> >>>>>> -                        } else
> >>>>>> +                        } else {
> >>>>>>                                  err = -EFAULT;
> >>>>>> +                        }
> >>>>>>                          goto out;
> >>>>>
> >>>>> where we have
> >>>>>
> >>>>> out:
> >>>>>   ...
> >>>>>   if (dst_pte)
> >>>>>           pte_unmap(dst_pte);
> >>>>>   if (src_pte)
> >>>>>           pte_unmap(src_pte);
> >>>>
> >>>> AI slop?
> >>>
> >>> Nah, this one is sadly all me :(
> >>
> >> Haha, sorry :P
> >
> > So as I was getting nowhere with this, I asked AI to help me :)
> >
> > If you're not interested in reading LLM generated code, feel free to
> > stop reading now...
> >
> > After it went over the logs, and a few prompts to point it the right
> > way, it ended up generating a patch (below) that made sense, and fixed
> > the warning that LKFT was being able to trigger.
> >
> > If anyone who's more familiar with the code than me (and the AI) agrees
> > with the patch and wants to throw their Reviewed-by, I'll send out the
> > patch.
>
> Seems to check out for me. In particular, out pte_unmap() everywhere
> else in that function (and mremap.c:move_ptes) are ordered properly.
>
> Even if it would not fix the issue, it would be a cleanup :)
>
> Acked-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks for the fix!

>
> --
> Cheers,
>
> David / dhildenb
>
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Suren Baghdasaryan 3 months ago
On Tue, Jul 8, 2025 at 8:33 AM Sasha Levin <sashal@kernel.org> wrote:
>
> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
> >On 01.07.25 02:57, Andrew Morton wrote:
> >>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
> >>
> >>>When handling non-swap entries in move_pages_pte(), the error handling
> >>>for entries that are NOT migration entries fails to unmap the page table
> >>>entries before jumping to the error handling label.
> >>>
> >>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
> >>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
> >>>corrupted.
> >>>
> >>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
> >>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
> >>>   Call trace:
> >>>     kunmap_local_indexed from move_pages+0x964/0x19f4
> >>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
> >>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
> >>>
> >>>The issue was introduced with the UFFDIO_MOVE feature but became more
> >>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
> >>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
> >>>path more commonly executed during userfaultfd operations.
> >>>
> >>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
> >>>paths before jumping to the error handling label, not just for migration
> >>>entries.
> >>
> >>I don't get it.
> >>
> >>>--- a/mm/userfaultfd.c
> >>>+++ b/mm/userfaultfd.c
> >>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >>>             entry = pte_to_swp_entry(orig_src_pte);
> >>>             if (non_swap_entry(entry)) {
> >>>+                    pte_unmap(src_pte);
> >>>+                    pte_unmap(dst_pte);
> >>>+                    src_pte = dst_pte = NULL;
> >>>                     if (is_migration_entry(entry)) {
> >>>-                            pte_unmap(src_pte);
> >>>-                            pte_unmap(dst_pte);
> >>>-                            src_pte = dst_pte = NULL;
> >>>                             migration_entry_wait(mm, src_pmd, src_addr);
> >>>                             err = -EAGAIN;
> >>>-                    } else
> >>>+                    } else {
> >>>                             err = -EFAULT;
> >>>+                    }
> >>>                     goto out;
> >>
> >>where we have
> >>
> >>out:
> >>      ...
> >>      if (dst_pte)
> >>              pte_unmap(dst_pte);
> >>      if (src_pte)
> >>              pte_unmap(src_pte);
> >
> >AI slop?
>
> Nah, this one is sadly all me :(
>
> I was trying to resolve some of the issues found with linus-next on
> LKFT, and misunderstood the code. Funny enough, I thought that the
> change above "fixed" it by making the warnings go away, but it clearly is
> the wrong thing to do, so I went back to the drawing board...
>
> If you're curious, here's the issue: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.13-rc7-43418-g558c6dd4d863/testrun/29030370/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmhighmem-kunmap_local_indexed/details/

Any way to symbolize that Call trace? I can't find build artefacts to
extract vmlinux image...

>
> --
> Thanks,
> Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 3 months ago
On Tue, Jul 08, 2025 at 08:39:47AM -0700, Suren Baghdasaryan wrote:
>On Tue, Jul 8, 2025 at 8:33 AM Sasha Levin <sashal@kernel.org> wrote:
>>
>> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>> >On 01.07.25 02:57, Andrew Morton wrote:
>> >>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>> >>
>> >>>When handling non-swap entries in move_pages_pte(), the error handling
>> >>>for entries that are NOT migration entries fails to unmap the page table
>> >>>entries before jumping to the error handling label.
>> >>>
>> >>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>> >>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>> >>>corrupted.
>> >>>
>> >>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>> >>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>> >>>   Call trace:
>> >>>     kunmap_local_indexed from move_pages+0x964/0x19f4
>> >>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>> >>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>> >>>
>> >>>The issue was introduced with the UFFDIO_MOVE feature but became more
>> >>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>> >>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>> >>>path more commonly executed during userfaultfd operations.
>> >>>
>> >>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>> >>>paths before jumping to the error handling label, not just for migration
>> >>>entries.
>> >>
>> >>I don't get it.
>> >>
>> >>>--- a/mm/userfaultfd.c
>> >>>+++ b/mm/userfaultfd.c
>> >>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>> >>>             entry = pte_to_swp_entry(orig_src_pte);
>> >>>             if (non_swap_entry(entry)) {
>> >>>+                    pte_unmap(src_pte);
>> >>>+                    pte_unmap(dst_pte);
>> >>>+                    src_pte = dst_pte = NULL;
>> >>>                     if (is_migration_entry(entry)) {
>> >>>-                            pte_unmap(src_pte);
>> >>>-                            pte_unmap(dst_pte);
>> >>>-                            src_pte = dst_pte = NULL;
>> >>>                             migration_entry_wait(mm, src_pmd, src_addr);
>> >>>                             err = -EAGAIN;
>> >>>-                    } else
>> >>>+                    } else {
>> >>>                             err = -EFAULT;
>> >>>+                    }
>> >>>                     goto out;
>> >>
>> >>where we have
>> >>
>> >>out:
>> >>      ...
>> >>      if (dst_pte)
>> >>              pte_unmap(dst_pte);
>> >>      if (src_pte)
>> >>              pte_unmap(src_pte);
>> >
>> >AI slop?
>>
>> Nah, this one is sadly all me :(
>>
>> I was trying to resolve some of the issues found with linus-next on
>> LKFT, and misunderstood the code. Funny enough, I thought that the
>> change above "fixed" it by making the warnings go away, but clearly is
>> the wrong thing to do so I went back to the drawing table...
>>
>> If you're curious, here's the issue: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.13-rc7-43418-g558c6dd4d863/testrun/29030370/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmhighmem-kunmap_local_indexed/details/
>
>Any way to symbolize that Call trace? I can't find build artefacts to
>extract vmlinux image...

The build artifacts are at
https://storage.tuxsuite.com/public/linaro/lkft/builds/2zSrTao2x4P640QKIx18JUuFdc1/
but I couldn't get it to do the right thing. I'm guessing that I need
some magical arm32 toolchain bits that I don't carry:

cat tr.txt | ./scripts/decode_stacktrace.sh vmlinux
<4>[   38.566145] ------------[ cut here ]------------
<4>[ 38.566392] WARNING: CPU: 1 PID: 637 at mm/highmem.c:622 kunmap_local_indexed+0x198/0x1a4
<4>[   38.569398] Modules linked in: nfnetlink ip_tables x_tables
<4>[   38.570481] CPU: 1 UID: 0 PID: 637 Comm: uffd-unit-tests Not tainted 6.16.0-rc4 #1 NONE
<4>[   38.570815] Hardware name: Generic DT based system
<4>[   38.571073] Call trace:
<4>[ 38.571239] unwind_backtrace from show_stack (arch/arm64/kernel/stacktrace.c:465)
<4>[ 38.571602] show_stack from dump_stack_lvl (lib/dump_stack.c:118 (discriminator 1))
<4>[ 38.571805] dump_stack_lvl from __warn (kernel/panic.c:791)
<4>[ 38.572002] __warn from warn_slowpath_fmt+0xa8/0x174
<4>[ 38.572290] warn_slowpath_fmt from kunmap_local_indexed+0x198/0x1a4
<4>[ 38.572520] kunmap_local_indexed from move_pages_pte+0xc40/0xf48
<4>[ 38.572970] move_pages_pte from move_pages+0x428/0x5bc
<4>[ 38.573189] move_pages from userfaultfd_ioctl+0x900/0x1ec0
<4>[ 38.573376] userfaultfd_ioctl from sys_ioctl+0xd24/0xd90
<4>[ 38.573581] sys_ioctl from ret_fast_syscall+0x0/0x5c
<4>[   38.573810] Exception stack(0xf9d69fa8 to 0xf9d69ff0)
<4>[   38.574546] 9fa0:                   00001000 00000005 00000005 c028aa05 b2d3ecd8 b2d3ecc8
<4>[   38.574919] 9fc0: 00001000 00000005 b2d3ece0 00000036 b2d3ed84 b2d3ed50 b2d3ed7c b2d3ed58
<4>[   38.575131] 9fe0: 00000036 b2d3ecb0 b6df1861 b6d5f736
<4>[   38.575511] ---[ end trace 0000000000000000 ]---

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Suren Baghdasaryan 3 months ago
On Tue, Jul 8, 2025 at 8:57 AM Sasha Levin <sashal@kernel.org> wrote:
>
> On Tue, Jul 08, 2025 at 08:39:47AM -0700, Suren Baghdasaryan wrote:
> >On Tue, Jul 8, 2025 at 8:33 AM Sasha Levin <sashal@kernel.org> wrote:
> >>
> >> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
> >> >On 01.07.25 02:57, Andrew Morton wrote:
> >> >>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
> >> >>
> >> >>>When handling non-swap entries in move_pages_pte(), the error handling
> >> >>>for entries that are NOT migration entries fails to unmap the page table
> >> >>>entries before jumping to the error handling label.
> >> >>>
> >> >>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
> >> >>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
> >> >>>corrupted.
> >> >>>
> >> >>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
> >> >>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
> >> >>>   Call trace:
> >> >>>     kunmap_local_indexed from move_pages+0x964/0x19f4
> >> >>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
> >> >>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
> >> >>>
> >> >>>The issue was introduced with the UFFDIO_MOVE feature but became more
> >> >>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
> >> >>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
> >> >>>path more commonly executed during userfaultfd operations.
> >> >>>
> >> >>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
> >> >>>paths before jumping to the error handling label, not just for migration
> >> >>>entries.
> >> >>
> >> >>I don't get it.
> >> >>
> >> >>>--- a/mm/userfaultfd.c
> >> >>>+++ b/mm/userfaultfd.c
> >> >>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >> >>>             entry = pte_to_swp_entry(orig_src_pte);
> >> >>>             if (non_swap_entry(entry)) {
> >> >>>+                    pte_unmap(src_pte);
> >> >>>+                    pte_unmap(dst_pte);
> >> >>>+                    src_pte = dst_pte = NULL;
> >> >>>                     if (is_migration_entry(entry)) {
> >> >>>-                            pte_unmap(src_pte);
> >> >>>-                            pte_unmap(dst_pte);
> >> >>>-                            src_pte = dst_pte = NULL;
> >> >>>                             migration_entry_wait(mm, src_pmd, src_addr);
> >> >>>                             err = -EAGAIN;
> >> >>>-                    } else
> >> >>>+                    } else {
> >> >>>                             err = -EFAULT;
> >> >>>+                    }
> >> >>>                     goto out;
> >> >>
> >> >>where we have
> >> >>
> >> >>out:
> >> >>      ...
> >> >>      if (dst_pte)
> >> >>              pte_unmap(dst_pte);
> >> >>      if (src_pte)
> >> >>              pte_unmap(src_pte);
> >> >
> >> >AI slop?
> >>
> >> Nah, this one is sadly all me :(
> >>
> >> I was trying to resolve some of the issues found with linus-next on
> >> LKFT, and misunderstood the code. Funny enough, I thought that the
> >> change above "fixed" it by making the warnings go away, but clearly is
> >> the wrong thing to do so I went back to the drawing table...
> >>
> >> If you're curious, here's the issue: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.13-rc7-43418-g558c6dd4d863/testrun/29030370/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmhighmem-kunmap_local_indexed/details/
> >
> >Any way to symbolize that Call trace? I can't find build artefacts to
> >extract vmlinux image...
>
> The build artifacts are at
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2zSrTao2x4P640QKIx18JUuFdc1/
> but I couldn't get it to do the right thing. I'm guessing that I need
> some magical arm32 toolchain bits that I don't carry:
>
> cat tr.txt | ./scripts/decode_stacktrace.sh vmlinux
> <4>[   38.566145] ------------[ cut here ]------------
> <4>[ 38.566392] WARNING: CPU: 1 PID: 637 at mm/highmem.c:622 kunmap_local_indexed+0x198/0x1a4
> <4>[   38.569398] Modules linked in: nfnetlink ip_tables x_tables
> <4>[   38.570481] CPU: 1 UID: 0 PID: 637 Comm: uffd-unit-tests Not tainted 6.16.0-rc4 #1 NONE
> <4>[   38.570815] Hardware name: Generic DT based system
> <4>[   38.571073] Call trace:
> <4>[ 38.571239] unwind_backtrace from show_stack (arch/arm64/kernel/stacktrace.c:465)
> <4>[ 38.571602] show_stack from dump_stack_lvl (lib/dump_stack.c:118 (discriminator 1))
> <4>[ 38.571805] dump_stack_lvl from __warn (kernel/panic.c:791)
> <4>[ 38.572002] __warn from warn_slowpath_fmt+0xa8/0x174
> <4>[ 38.572290] warn_slowpath_fmt from kunmap_local_indexed+0x198/0x1a4
> <4>[ 38.572520] kunmap_local_indexed from move_pages_pte+0xc40/0xf48
> <4>[ 38.572970] move_pages_pte from move_pages+0x428/0x5bc
> <4>[ 38.573189] move_pages from userfaultfd_ioctl+0x900/0x1ec0
> <4>[ 38.573376] userfaultfd_ioctl from sys_ioctl+0xd24/0xd90
> <4>[ 38.573581] sys_ioctl from ret_fast_syscall+0x0/0x5c
> <4>[   38.573810] Exception stack(0xf9d69fa8 to 0xf9d69ff0)
> <4>[   38.574546] 9fa0:                   00001000 00000005 00000005 c028aa05 b2d3ecd8 b2d3ecc8
> <4>[   38.574919] 9fc0: 00001000 00000005 b2d3ece0 00000036 b2d3ed84 b2d3ed50 b2d3ed7c b2d3ed58
> <4>[   38.575131] 9fe0: 00000036 b2d3ecb0 b6df1861 b6d5f736
> <4>[   38.575511] ---[ end trace 0000000000000000 ]---

Ah, I know what's going on. 6.13.rc7 which is used in this test does
not have my fix 927e926d72d9 ("userfaultfd: fix PTE unmapping
stack-allocated PTE copies") (see
https://elixir.bootlin.com/linux/v6.13.7/source/mm/userfaultfd.c#L1284).
It was backported into 6.13.rc8. So, it tries to unmap a copy of a
mapped PTE, which will fail when CONFIG_HIGHPTE is enabled. So, it
makes sense that it is failing on arm32.
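
To illustrate the failure mode described above, here is a minimal userspace
sketch in C. It is not kernel code: toy_kmap, toy_pte_map() and
toy_pte_unmap() are hypothetical stand-ins. The sketch assumes only that, with
CONFIG_HIGHPTE, a PTE page is reached through a temporary mapping and that the
unmap call must be handed the pointer that mapping returned. The address of a
stack copy of the PTE matches no live mapping, which the toy counts as a
warning.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_KMAP_SLOTS 4

/* Hypothetical PTE value type; the real pte_t is architecture-specific. */
typedef struct { unsigned long val; } toy_pte_t;

/* Toy model of a per-CPU stack of temporary mappings. */
struct toy_kmap {
	const toy_pte_t *slot[TOY_KMAP_SLOTS];
	int depth;     /* number of live mappings */
	int warnings;  /* unmap calls that did not match the newest mapping */
};

/* Record a mapping of one page-table entry and return the mapped pointer. */
static toy_pte_t *toy_pte_map(struct toy_kmap *km, toy_pte_t *entry)
{
	if (km->depth == TOY_KMAP_SLOTS)
		return NULL;
	km->slot[km->depth++] = entry;
	return entry;
}

/* Release the newest mapping; count a warning if ptep is not that mapping. */
static void toy_pte_unmap(struct toy_kmap *km, const toy_pte_t *ptep)
{
	if (km->depth == 0 || km->slot[km->depth - 1] != ptep) {
		km->warnings++;
		return;
	}
	km->depth--;
}

/* Correct pattern: read a copy of the PTE, but unmap the mapped pointer. */
static int toy_unmap_mapped_pointer(void)
{
	struct toy_kmap km = { .depth = 0 };
	toy_pte_t table_entry = { 0x1234 };
	toy_pte_t *ptep = toy_pte_map(&km, &table_entry);
	toy_pte_t copy;

	assert(ptep != NULL);
	copy = *ptep;
	(void)copy;
	toy_pte_unmap(&km, ptep);
	return km.warnings;
}

/* Buggy pattern: the address of the stack copy is passed to unmap. */
static int toy_unmap_stack_copy(void)
{
	struct toy_kmap km = { .depth = 0 };
	toy_pte_t table_entry = { 0x1234 };
	toy_pte_t *ptep = toy_pte_map(&km, &table_entry);
	toy_pte_t copy;

	assert(ptep != NULL);
	copy = *ptep;
	toy_pte_unmap(&km, &copy);  /* not the pointer toy_pte_map() returned */
	return km.warnings;
}
```

In this model toy_unmap_mapped_pointer() reports no warnings, while
toy_unmap_stack_copy() reports one and leaves the mapping live. Without
CONFIG_HIGHPTE the PTE page needs no temporary mapping, so a mistake of this
kind would be far less likely to show up.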

>
> --
> Thanks,
> Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Sasha Levin 2 months, 1 week ago
On Tue, Jul 08, 2025 at 09:34:48AM -0700, Suren Baghdasaryan wrote:
>On Tue, Jul 8, 2025 at 8:57 AM Sasha Levin <sashal@kernel.org> wrote:
>>
>> On Tue, Jul 08, 2025 at 08:39:47AM -0700, Suren Baghdasaryan wrote:
>> >On Tue, Jul 8, 2025 at 8:33 AM Sasha Levin <sashal@kernel.org> wrote:
>> >>
>> >> On Tue, Jul 08, 2025 at 05:10:44PM +0200, David Hildenbrand wrote:
>> >> >On 01.07.25 02:57, Andrew Morton wrote:
>> >> >>On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
>> >> >>
>> >> >>>When handling non-swap entries in move_pages_pte(), the error handling
>> >> >>>for entries that are NOT migration entries fails to unmap the page table
>> >> >>>entries before jumping to the error handling label.
>> >> >>>
>> >> >>>This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
>> >> >>>triggers a WARNING in kunmap_local_indexed() because the kmap stack is
>> >> >>>corrupted.
>> >> >>>
>> >> >>>Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>> >> >>>   WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>> >> >>>   Call trace:
>> >> >>>     kunmap_local_indexed from move_pages+0x964/0x19f4
>> >> >>>     move_pages from userfaultfd_ioctl+0x129c/0x2144
>> >> >>>     userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>> >> >>>
>> >> >>>The issue was introduced with the UFFDIO_MOVE feature but became more
>> >> >>>frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
>> >> >>>PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
>> >> >>>path more commonly executed during userfaultfd operations.
>> >> >>>
>> >> >>>Fix this by ensuring PTEs are properly unmapped in all non-swap entry
>> >> >>>paths before jumping to the error handling label, not just for migration
>> >> >>>entries.
>> >> >>
>> >> >>I don't get it.
>> >> >>
>> >> >>>--- a/mm/userfaultfd.c
>> >> >>>+++ b/mm/userfaultfd.c
>> >> >>>@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>> >> >>>             entry = pte_to_swp_entry(orig_src_pte);
>> >> >>>             if (non_swap_entry(entry)) {
>> >> >>>+                    pte_unmap(src_pte);
>> >> >>>+                    pte_unmap(dst_pte);
>> >> >>>+                    src_pte = dst_pte = NULL;
>> >> >>>                     if (is_migration_entry(entry)) {
>> >> >>>-                            pte_unmap(src_pte);
>> >> >>>-                            pte_unmap(dst_pte);
>> >> >>>-                            src_pte = dst_pte = NULL;
>> >> >>>                             migration_entry_wait(mm, src_pmd, src_addr);
>> >> >>>                             err = -EAGAIN;
>> >> >>>-                    } else
>> >> >>>+                    } else {
>> >> >>>                             err = -EFAULT;
>> >> >>>+                    }
>> >> >>>                     goto out;
>> >> >>
>> >> >>where we have
>> >> >>
>> >> >>out:
>> >> >>      ...
>> >> >>      if (dst_pte)
>> >> >>              pte_unmap(dst_pte);
>> >> >>      if (src_pte)
>> >> >>              pte_unmap(src_pte);
>> >> >
>> >> >AI slop?
>> >>
>> >> Nah, this one is sadly all me :(
>> >>
>> >> I was trying to resolve some of the issues found with linus-next on
>> >> LKFT, and misunderstood the code. Funny enough, I thought that the
>> >> change above "fixed" it by making the warnings go away, but clearly is
>> >> the wrong thing to do so I went back to the drawing table...
>> >>
>> >> If you're curious, here's the issue: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.13-rc7-43418-g558c6dd4d863/testrun/29030370/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmhighmem-kunmap_local_indexed/details/
>> >
>> >Any way to symbolize that Call trace? I can't find build artefacts to
>> >extract vmlinux image...
>>
>> The build artifacts are at
>> https://storage.tuxsuite.com/public/linaro/lkft/builds/2zSrTao2x4P640QKIx18JUuFdc1/
>> but I couldn't get it to do the right thing. I'm guessing that I need
>> some magical arm32 toolchain bits that I don't carry:
>>
>> cat tr.txt | ./scripts/decode_stacktrace.sh vmlinux
>> <4>[   38.566145] ------------[ cut here ]------------
>> <4>[ 38.566392] WARNING: CPU: 1 PID: 637 at mm/highmem.c:622 kunmap_local_indexed+0x198/0x1a4
>> <4>[   38.569398] Modules linked in: nfnetlink ip_tables x_tables
>> <4>[   38.570481] CPU: 1 UID: 0 PID: 637 Comm: uffd-unit-tests Not tainted 6.16.0-rc4 #1 NONE
>> <4>[   38.570815] Hardware name: Generic DT based system
>> <4>[   38.571073] Call trace:
>> <4>[ 38.571239] unwind_backtrace from show_stack (arch/arm64/kernel/stacktrace.c:465)
>> <4>[ 38.571602] show_stack from dump_stack_lvl (lib/dump_stack.c:118 (discriminator 1))
>> <4>[ 38.571805] dump_stack_lvl from __warn (kernel/panic.c:791)
>> <4>[ 38.572002] __warn from warn_slowpath_fmt+0xa8/0x174
>> <4>[ 38.572290] warn_slowpath_fmt from kunmap_local_indexed+0x198/0x1a4
>> <4>[ 38.572520] kunmap_local_indexed from move_pages_pte+0xc40/0xf48
>> <4>[ 38.572970] move_pages_pte from move_pages+0x428/0x5bc
>> <4>[ 38.573189] move_pages from userfaultfd_ioctl+0x900/0x1ec0
>> <4>[ 38.573376] userfaultfd_ioctl from sys_ioctl+0xd24/0xd90
>> <4>[ 38.573581] sys_ioctl from ret_fast_syscall+0x0/0x5c
>> <4>[   38.573810] Exception stack(0xf9d69fa8 to 0xf9d69ff0)
>> <4>[   38.574546] 9fa0:                   00001000 00000005 00000005 c028aa05 b2d3ecd8 b2d3ecc8
>> <4>[   38.574919] 9fc0: 00001000 00000005 b2d3ece0 00000036 b2d3ed84 b2d3ed50 b2d3ed7c b2d3ed58
>> <4>[   38.575131] 9fe0: 00000036 b2d3ecb0 b6df1861 b6d5f736
>> <4>[   38.575511] ---[ end trace 0000000000000000 ]---
>
>Ah, I know what's going on. 6.13.rc7 which is used in this test does
>not have my fix 927e926d72d9 ("userfaultfd: fix PTE unmapping
>stack-allocated PTE copies") (see
>https://elixir.bootlin.com/linux/v6.13.7/source/mm/userfaultfd.c#L1284).
>It was backported into 6.13.rc8. So, it tries to unmap a copy of a
>mapped PTE, which will fail when CONFIG_HIGHPTE is enabled. So, it
>makes sense that it is failing on arm32.

Sorry, I missed this.

The tree only identifies itself as 6.13-rc7; in practice it's a much newer
version, since it merges in PRs from the ML.

The issue still reproduced even on v6.16, with 927e926d72d9 applied.

I've sent out https://lore.kernel.org/all/aItjffoR7molh3QF@lappy/ which
fixed the issue for me.

-- 
Thanks,
Sasha
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Suren Baghdasaryan 3 months ago
On Tue, Jul 8, 2025 at 8:10 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 01.07.25 02:57, Andrew Morton wrote:
> > On Sun, 29 Jun 2025 23:19:58 -0400 Sasha Levin <sashal@kernel.org> wrote:
> >
> >> When handling non-swap entries in move_pages_pte(), the error handling
> >> for entries that are NOT migration entries fails to unmap the page table
> >> entries before jumping to the error handling label.
> >>
> >> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
> >> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
> >> corrupted.
> >>
> >> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
> >>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
> >>    Call trace:
> >>      kunmap_local_indexed from move_pages+0x964/0x19f4
> >>      move_pages from userfaultfd_ioctl+0x129c/0x2144
> >>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
> >>
> >> The issue was introduced with the UFFDIO_MOVE feature but became more
> >> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
> >> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
> >> path more commonly executed during userfaultfd operations.
> >>
> >> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
> >> paths before jumping to the error handling label, not just for migration
> >> entries.
> >
> > I don't get it.
> >
> >> --- a/mm/userfaultfd.c
> >> +++ b/mm/userfaultfd.c
> >> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >>
> >>              entry = pte_to_swp_entry(orig_src_pte);
> >>              if (non_swap_entry(entry)) {
> >> +                    pte_unmap(src_pte);
> >> +                    pte_unmap(dst_pte);
> >> +                    src_pte = dst_pte = NULL;
> >>                      if (is_migration_entry(entry)) {
> >> -                            pte_unmap(src_pte);
> >> -                            pte_unmap(dst_pte);
> >> -                            src_pte = dst_pte = NULL;
> >>                              migration_entry_wait(mm, src_pmd, src_addr);
> >>                              err = -EAGAIN;
> >> -                    } else
> >> +                    } else {
> >>                              err = -EFAULT;
> >> +                    }
> >>                      goto out;
> >
> > where we have
> >
> > out:
> >       ...
> >       if (dst_pte)
> >               pte_unmap(dst_pte);
> >       if (src_pte)
> >               pte_unmap(src_pte);
>
> AI slop?

Hmm, but there is even a Call trace in the report. I wonder if the
issue is somewhere else?

>
> --
> Cheers,
>
> David / dhildenb
>
Re: [PATCH] mm/userfaultfd: fix missing PTE unmap for non-migration entries
Posted by Dev Jain 3 months, 1 week ago
On 30/06/25 8:49 am, Sasha Levin wrote:
> When handling non-swap entries in move_pages_pte(), the error handling
> for entries that are NOT migration entries fails to unmap the page table
> entries before jumping to the error handling label.
>
> This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
> triggers a WARNING in kunmap_local_indexed() because the kmap stack is
> corrupted.
>
> Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
>    WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
>    Call trace:
>      kunmap_local_indexed from move_pages+0x964/0x19f4
>      move_pages from userfaultfd_ioctl+0x129c/0x2144
>      userfaultfd_ioctl from sys_ioctl+0x558/0xd24
>
> The issue was introduced with the UFFDIO_MOVE feature but became more
> frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
> PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
> path more commonly executed during userfaultfd operations.
>
> Fix this by ensuring PTEs are properly unmapped in all non-swap entry
> paths before jumping to the error handling label, not just for migration
> entries.
>
> Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   mm/userfaultfd.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 8253978ee0fb1..7c298e9cbc18f 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>   
>   		entry = pte_to_swp_entry(orig_src_pte);
>   		if (non_swap_entry(entry)) {
> +			pte_unmap(src_pte);
> +			pte_unmap(dst_pte);
> +			src_pte = dst_pte = NULL;
>   			if (is_migration_entry(entry)) {
> -				pte_unmap(src_pte);
> -				pte_unmap(dst_pte);
> -				src_pte = dst_pte = NULL;
>   				migration_entry_wait(mm, src_pmd, src_addr);
>   				err = -EAGAIN;
> -			} else
> +			} else {
>   				err = -EFAULT;
> +			}
>   			goto out;

Won't the out label take care of the unmapping? I think CONFIG_HIGHPTE
is involved in the explanation.
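
The question above can be checked with a small userspace model of the quoted
hunk and the out label. This is only a sketch: toy_pte_unmap(),
toy_before_patch() and toy_after_patch() are hypothetical names, the
pte_unmap() calls are replaced by a logger, and migration_entry_wait() is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_entry { TOY_MIGRATION, TOY_OTHER_NON_SWAP };

/* Records which PTE was unmapped, in call order: 's' = src, 'd' = dst. */
struct toy_log {
	char order[8];
	size_t n;
};

static void toy_pte_unmap(struct toy_log *log, char which)
{
	assert(log->n + 1 < sizeof(log->order));
	log->order[log->n++] = which;
	log->order[log->n] = '\0';
}

/* Non-swap branch as it was before the patch, then the out: label. */
static int toy_before_patch(enum toy_entry entry, struct toy_log *log)
{
	int src_tok = 0, dst_tok = 0;
	int *src_pte = &src_tok, *dst_pte = &dst_tok;  /* both "mapped" */
	int err;

	if (entry == TOY_MIGRATION) {
		toy_pte_unmap(log, 's');
		toy_pte_unmap(log, 'd');
		src_pte = dst_pte = NULL;
		err = -EAGAIN;
	} else {
		err = -EFAULT;
	}

	/* out: unmap whatever is still mapped, dst first as quoted */
	if (dst_pte)
		toy_pte_unmap(log, 'd');
	if (src_pte)
		toy_pte_unmap(log, 's');
	return err;
}

/* Non-swap branch with the patch applied, then the out: label. */
static int toy_after_patch(enum toy_entry entry, struct toy_log *log)
{
	int src_tok = 0, dst_tok = 0;
	int *src_pte = &src_tok, *dst_pte = &dst_tok;  /* both "mapped" */
	int err;

	toy_pte_unmap(log, 's');
	toy_pte_unmap(log, 'd');
	src_pte = dst_pte = NULL;
	err = (entry == TOY_MIGRATION) ? -EAGAIN : -EFAULT;

	/* out: both pointers are NULL here, so nothing more is unmapped */
	if (dst_pte)
		toy_pte_unmap(log, 'd');
	if (src_pte)
		toy_pte_unmap(log, 's');
	return err;
}
```

Under this model both versions unmap each PTE exactly once and return the same
error codes, so the out label does take care of the unmapping. The only
difference on the non-migration path is the order: the patched branch unmaps
src then dst, while the out label unmaps dst then src. Whether that order
matters depends on the order in which the two PTEs were mapped, which the
quoted hunk does not show; kmap_local mappings are documented as needing to be
released in the reverse order of mapping.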

>   		}
>