[PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count

Jane Chu posted 1 patch 3 weeks, 2 days ago
There is a newer version of this series
mm/hugetlb.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
[PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Posted by Jane Chu 3 weeks, 2 days ago
commit 59d9094df3d79 introduced ->pt_share_count dedicated to
hugetlb PMD share count tracking, but omitted fixing
copy_hugetlb_page_range(), leaving the function relying on
page_count() for tracking that no longer works.

When lazy page table copy for hugetlb is disabled (commit bcd51a3c679d),
fork()'ing with hugetlb PMD sharing quickly lockup -

[  239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
[  239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
[  239.446631] Call Trace:
[  239.446633]  <TASK>
[  239.446636]  _raw_spin_lock+0x3f/0x60
[  239.446639]  copy_hugetlb_page_range+0x258/0xb50
[  239.446645]  copy_page_range+0x22b/0x2c0
[  239.446651]  dup_mmap+0x3e2/0x770
[  239.446654]  dup_mm.constprop.0+0x5e/0x230
[  239.446657]  copy_process+0xd17/0x1760
[  239.446660]  kernel_clone+0xc0/0x3e0
[  239.446661]  __do_sys_clone+0x65/0xa0
[  239.446664]  do_syscall_64+0x82/0x930
[  239.446668]  ? count_memcg_events+0xd2/0x190
[  239.446671]  ? syscall_trace_enter+0x14e/0x1f0
[  239.446676]  ? syscall_exit_work+0x118/0x150
[  239.446677]  ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
[  239.446681]  ? clear_bhb_loop+0x30/0x80
[  239.446684]  ? clear_bhb_loop+0x30/0x80
[  239.446686]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

There are two options to resolve the potential latent issue:
  1. remove the PMD sharing awareness from copy_hugetlb_page_range(),
  2. fix it.
This patch opts for the second option.

Fixes: 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
count")

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/hugetlb.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b4c718..8ca5b4f7805f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5594,18 +5594,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			break;
 		}
 
-		/*
-		 * If the pagetables are shared don't copy or take references.
-		 *
-		 * dst_pte == src_pte is the common case of src/dest sharing.
-		 * However, src could have 'unshared' and dst shares with
-		 * another vma. So page_count of ptep page is checked instead
-		 * to reliably determine whether pte is shared.
-		 */
-		if (page_count(virt_to_page(dst_pte)) > 1) {
+#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
+		/* If the pagetables are shared don't copy or take references. */
+		if (ptdesc_pmd_pts_count(virt_to_ptdesc(dst_pte)) > 0) {
 			addr |= last_addr_mask;
 			continue;
 		}
+#endif
 
 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
 		src_ptl = huge_pte_lockptr(h, src, src_pte);
-- 
2.43.5
Re: [PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Posted by David Hildenbrand 3 weeks, 1 day ago
On 09.09.25 20:43, Jane Chu wrote:
> commit 59d9094df3d79 introduced ->pt_share_count dedicated to
> hugetlb PMD share count tracking, but omitted fixing
> copy_hugetlb_page_range(), leaving the function relying on
> page_count() for tracking that no longer works.
> 
> When lazy page table copy for hugetlb is disabled (commit bcd51a3c679d),
> fork()'ing with hugetlb PMD sharing quickly lockup -
> 
> [  239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
> [  239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
> [  239.446631] Call Trace:
> [  239.446633]  <TASK>
> [  239.446636]  _raw_spin_lock+0x3f/0x60
> [  239.446639]  copy_hugetlb_page_range+0x258/0xb50
> [  239.446645]  copy_page_range+0x22b/0x2c0
> [  239.446651]  dup_mmap+0x3e2/0x770
> [  239.446654]  dup_mm.constprop.0+0x5e/0x230
> [  239.446657]  copy_process+0xd17/0x1760
> [  239.446660]  kernel_clone+0xc0/0x3e0
> [  239.446661]  __do_sys_clone+0x65/0xa0
> [  239.446664]  do_syscall_64+0x82/0x930
> [  239.446668]  ? count_memcg_events+0xd2/0x190
> [  239.446671]  ? syscall_trace_enter+0x14e/0x1f0
> [  239.446676]  ? syscall_exit_work+0x118/0x150
> [  239.446677]  ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
> [  239.446681]  ? clear_bhb_loop+0x30/0x80
> [  239.446684]  ? clear_bhb_loop+0x30/0x80
> [  239.446686]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> There are two options to resolve the potential latent issue:
>    1. remove the PMD sharing awareness from copy_hugetlb_page_range(),
>    2. fix it.
> This patch opts for the second option.
> 
> Fixes: 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
> count")
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>   mm/hugetlb.c | 13 ++++---------
>   1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b4c718..8ca5b4f7805f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5594,18 +5594,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   			break;
>   		}
>   
> -		/*
> -		 * If the pagetables are shared don't copy or take references.
> -		 *
> -		 * dst_pte == src_pte is the common case of src/dest sharing.
> -		 * However, src could have 'unshared' and dst shares with
> -		 * another vma. So page_count of ptep page is checked instead
> -		 * to reliably determine whether pte is shared.
> -		 */
> -		if (page_count(virt_to_page(dst_pte)) > 1) {
> +#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
> +		/* If the pagetables are shared don't copy or take references. */

Why remove so much of the original comment?

> +		if (ptdesc_pmd_pts_count(virt_to_ptdesc(dst_pte)) > 0) {
>   			addr |= last_addr_mask;
>   			continue;
>   		}

LGTM, thanks!

-- 
Cheers

David / dhildenb
Re: [PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Posted by jane.chu@oracle.com 3 weeks ago
On 9/9/2025 11:45 PM, David Hildenbrand wrote:
[..]
>> -        /*
>> -         * If the pagetables are shared don't copy or take references.
>> -         *
>> -         * dst_pte == src_pte is the common case of src/dest sharing.
>> -         * However, src could have 'unshared' and dst shares with
>> -         * another vma. So page_count of ptep page is checked instead
>> -         * to reliably determine whether pte is shared.
>> -         */
>> -        if (page_count(virt_to_page(dst_pte)) > 1) {
>> +#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
>> +        /* If the pagetables are shared don't copy or take 
>> references. */
> 
> Why remove so much of the original comment?

Because, this part of checking has already advanced from the "dst_pte == 
src_pte" to "page_count() > 1" to ->pt_share_count > 0, it seems cleaner 
to just keep an one liner comment.
That said, if you feel the comments should be kept, I'd be happy to 
restore them with a bit revision.

> 
>> +        if (ptdesc_pmd_pts_count(virt_to_ptdesc(dst_pte)) > 0) {
>>               addr |= last_addr_mask;
>>               continue;
>>           }
> 
> LGTM, thanks!

Thanks!
-jane

> 


Re: [PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Posted by David Hildenbrand 2 weeks, 6 days ago
On 11.09.25 21:54, jane.chu@oracle.com wrote:
> 
> On 9/9/2025 11:45 PM, David Hildenbrand wrote:
> [..]
>>> -        /*
>>> -         * If the pagetables are shared don't copy or take references.
>>> -         *
>>> -         * dst_pte == src_pte is the common case of src/dest sharing.
>>> -         * However, src could have 'unshared' and dst shares with
>>> -         * another vma. So page_count of ptep page is checked instead
>>> -         * to reliably determine whether pte is shared.
>>> -         */
>>> -        if (page_count(virt_to_page(dst_pte)) > 1) {
>>> +#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
>>> +        /* If the pagetables are shared don't copy or take
>>> references. */
>>
>> Why remove so much of the original comment?
> 
> Because, this part of checking has already advanced from the "dst_pte ==
> src_pte" to "page_count() > 1" to ->pt_share_count > 0, it seems cleaner
> to just keep an one liner comment.
> That said, if you feel the comments should be kept, I'd be happy to
> restore them with a bit revision.

Well, the comment explains why checking the pte pointers is insufficient 
and why there is a corner case where the pointers differ but we still 
want to unshare. :)

But yeah, I agree that reading the code it's clear: if dst is already 
shared, just don't do anything.

I would probably rephrase the comment to something simpler like

"/* If the pagetables are shared, there is nothing to do. */

If you resend, please add a comment to the patch description like "While 
at it, simplify the comment, the details are not actually relevant anymore".

-- 
Cheers

David / dhildenb

Re: [PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Posted by jane.chu@oracle.com 2 weeks, 3 days ago
On 9/12/2025 12:31 AM, David Hildenbrand wrote:
> On 11.09.25 21:54, jane.chu@oracle.com wrote:
>>
>> On 9/9/2025 11:45 PM, David Hildenbrand wrote:
>> [..]
>>>> -        /*
>>>> -         * If the pagetables are shared don't copy or take references.
>>>> -         *
>>>> -         * dst_pte == src_pte is the common case of src/dest sharing.
>>>> -         * However, src could have 'unshared' and dst shares with
>>>> -         * another vma. So page_count of ptep page is checked instead
>>>> -         * to reliably determine whether pte is shared.
>>>> -         */
>>>> -        if (page_count(virt_to_page(dst_pte)) > 1) {
>>>> +#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
>>>> +        /* If the pagetables are shared don't copy or take
>>>> references. */
>>>
>>> Why remove so much of the original comment?
>>
>> Because, this part of checking has already advanced from the "dst_pte ==
>> src_pte" to "page_count() > 1" to ->pt_share_count > 0, it seems cleaner
>> to just keep an one liner comment.
>> That said, if you feel the comments should be kept, I'd be happy to
>> restore them with a bit revision.
> 
> Well, the comment explains why checking the pte pointers is insufficient 
> and why there is a corner case where the pointers differ but we still 
> want to unshare. :)
> 
> But yeah, I agree that reading the code it's clear: if dst is already 
> shared, just don't do anything.
> 
> I would probably rephrase the comment to something simpler like
> 
> "/* If the pagetables are shared, there is nothing to do. */
> 
> If you resend, please add a comment to the patch description like "While 
> at it, simplify the comment, the details are not actually relevant 
> anymore".
> 

Will do, thanks!

-jane

Re: [PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Posted by Harry Yoo 3 weeks, 1 day ago
On Tue, Sep 09, 2025 at 12:43:57PM -0600, Jane Chu wrote:
> commit 59d9094df3d79 introduced ->pt_share_count dedicated to

nit: the format should be:
  commit <sha1> ("summary")
?

> hugetlb PMD share count tracking, but omitted fixing
> copy_hugetlb_page_range(), leaving the function relying on
> page_count() for tracking that no longer works.
> 
> When lazy page table copy for hugetlb is disabled (commit bcd51a3c679d),

same here.

> fork()'ing with hugetlb PMD sharing quickly lockup -
> 
> [  239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
> [  239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
> [  239.446631] Call Trace:
> [  239.446633]  <TASK>
> [  239.446636]  _raw_spin_lock+0x3f/0x60
> [  239.446639]  copy_hugetlb_page_range+0x258/0xb50
> [  239.446645]  copy_page_range+0x22b/0x2c0
> [  239.446651]  dup_mmap+0x3e2/0x770
> [  239.446654]  dup_mm.constprop.0+0x5e/0x230
> [  239.446657]  copy_process+0xd17/0x1760
> [  239.446660]  kernel_clone+0xc0/0x3e0
> [  239.446661]  __do_sys_clone+0x65/0xa0
> [  239.446664]  do_syscall_64+0x82/0x930
> [  239.446668]  ? count_memcg_events+0xd2/0x190
> [  239.446671]  ? syscall_trace_enter+0x14e/0x1f0
> [  239.446676]  ? syscall_exit_work+0x118/0x150
> [  239.446677]  ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
> [  239.446681]  ? clear_bhb_loop+0x30/0x80
> [  239.446684]  ? clear_bhb_loop+0x30/0x80
> [  239.446686]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> There are two options to resolve the potential latent issue:
>   1. remove the PMD sharing awareness from copy_hugetlb_page_range(),
>   2. fix it.
> This patch opts for the second option.
> 
> Fixes: 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
> count")

nit: we don't add newline even when Fixes: tag is line is over 75 characters?
 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---

The change in general looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

+Cc Jann Horn who backported the commit 59d9094df3d79.

Elaborating a little bit why it doesn't need to be backported:
TL;DR: One needs to backport commit 3aa4ed8040e15 to pre-v6.0 kernels,
or revert commit bcd51a3c679d to trigger this.

commit 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared count")
is introduced in 6.13 and backported to 5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y.

As lazy page table copy is enabled in v6.0 by commit bcd51a3c679d
("hugetlb: lazy page table copies in fork()"), the bug is not triggered
because shared hugetlb VMAs go through lazy page table copy logic
(vma_needs_copy() returns false) or they can't share page tables
(uffd_disable_huge_pmd_share() returns false).

They shouldn't have anon_vma, VM_PFNMAP, VM_MIXEDMAP and UFFD-WP VMA
cannot share page tables - so either vma_needs_copy() return false, or
page tables cannot be shared.

And before commit 3aa4ed8040e15 ("mm/hugetlb: make detecting shared pte
more reliable") introduced in v6.1, copy_hugetlb_page_range() doesn't check
refcount to determine whether the page table is shared. 

>  mm/hugetlb.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b4c718..8ca5b4f7805f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5594,18 +5594,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  			break;
>  		}
>  
> -		/*
> -		 * If the pagetables are shared don't copy or take references.
> -		 *
> -		 * dst_pte == src_pte is the common case of src/dest sharing.
> -		 * However, src could have 'unshared' and dst shares with
> -		 * another vma. So page_count of ptep page is checked instead
> -		 * to reliably determine whether pte is shared.
> -		 */
> -		if (page_count(virt_to_page(dst_pte)) > 1) {
> +#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
> +		/* If the pagetables are shared don't copy or take references. */
> +		if (ptdesc_pmd_pts_count(virt_to_ptdesc(dst_pte)) > 0) {
>  			addr |= last_addr_mask;
>  			continue;
>  		}
> +#endif
>  
>  		dst_ptl = huge_pte_lock(h, dst, dst_pte);
>  		src_ptl = huge_pte_lockptr(h, src, src_pte);
> -- 
> 2.43.5
> 

-- 
Cheers,
Harry / Hyeonggon
Re: [PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Posted by jane.chu@oracle.com 3 weeks, 1 day ago
On 9/9/2025 6:14 PM, Harry Yoo wrote:
> On Tue, Sep 09, 2025 at 12:43:57PM -0600, Jane Chu wrote:
>> commit 59d9094df3d79 introduced ->pt_share_count dedicated to
> 
> nit: the format should be:
>    commit <sha1> ("summary")
> ?
> 
>> hugetlb PMD share count tracking, but omitted fixing
>> copy_hugetlb_page_range(), leaving the function relying on
>> page_count() for tracking that no longer works.
>>
>> When lazy page table copy for hugetlb is disabled (commit bcd51a3c679d),
> 
> same here.
> 
>> fork()'ing with hugetlb PMD sharing quickly lockup -
>>
>> [  239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
>> [  239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
>> [  239.446631] Call Trace:
>> [  239.446633]  <TASK>
>> [  239.446636]  _raw_spin_lock+0x3f/0x60
>> [  239.446639]  copy_hugetlb_page_range+0x258/0xb50
>> [  239.446645]  copy_page_range+0x22b/0x2c0
>> [  239.446651]  dup_mmap+0x3e2/0x770
>> [  239.446654]  dup_mm.constprop.0+0x5e/0x230
>> [  239.446657]  copy_process+0xd17/0x1760
>> [  239.446660]  kernel_clone+0xc0/0x3e0
>> [  239.446661]  __do_sys_clone+0x65/0xa0
>> [  239.446664]  do_syscall_64+0x82/0x930
>> [  239.446668]  ? count_memcg_events+0xd2/0x190
>> [  239.446671]  ? syscall_trace_enter+0x14e/0x1f0
>> [  239.446676]  ? syscall_exit_work+0x118/0x150
>> [  239.446677]  ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
>> [  239.446681]  ? clear_bhb_loop+0x30/0x80
>> [  239.446684]  ? clear_bhb_loop+0x30/0x80
>> [  239.446686]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> There are two options to resolve the potential latent issue:
>>    1. remove the PMD sharing awareness from copy_hugetlb_page_range(),
>>    2. fix it.
>> This patch opts for the second option.
>>
>> Fixes: 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
>> count")
> 
> nit: we don't add newline even when Fixes: tag is line is over 75 characters?
>   
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
> 
> The change in general looks good to me,
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> 
> +Cc Jann Horn who backported the commit 59d9094df3d79.

Thanks Harry!  Will fix the commit message and send v2.>
> Elaborating a little bit why it doesn't need to be backported:
> TL;DR: One needs to backport commit 3aa4ed8040e15 to pre-v6.0 kernels,
> or revert commit bcd51a3c679d to trigger this.
> 
> commit 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared count")
> is introduced in 6.13 and backported to 5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y.
> 
> As lazy page table copy is enabled in v6.0 by commit bcd51a3c679d
> ("hugetlb: lazy page table copies in fork()"), the bug is not triggered
> because shared hugetlb VMAs go through lazy page table copy logic
> (vma_needs_copy() returns false) or they can't share page tables
> (uffd_disable_huge_pmd_share() returns false).
> 
> They shouldn't have anon_vma, VM_PFNMAP, VM_MIXEDMAP and UFFD-WP VMA
> cannot share page tables - so either vma_needs_copy() return false, or
> page tables cannot be shared.
> 
> And before commit 3aa4ed8040e15 ("mm/hugetlb: make detecting shared pte
> more reliable") introduced in v6.1, copy_hugetlb_page_range() doesn't check
> refcount to determine whether the page table is shared.

Yes, it's the combination of
   v6.1: 3aa4ed8040e15 mm/hugetlb: make detecting shared pte more reliable
   v6.13: 59d9094df3d79 mm: hugetlb: independent PMD page table shared count
without
   v6.0: bcd51a3c679d hugetlb: lazy page table copies in fork()
that is problematic.

Since the problematic combination doesn't exist in any upstream release,
no need to backport.  The patch is for a potential latent bug.

thanks,
-jane
   >
>>   mm/hugetlb.c | 13 ++++---------
>>   1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 753f99b4c718..8ca5b4f7805f 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -5594,18 +5594,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>>   			break;
>>   		}
>>   
>> -		/*
>> -		 * If the pagetables are shared don't copy or take references.
>> -		 *
>> -		 * dst_pte == src_pte is the common case of src/dest sharing.
>> -		 * However, src could have 'unshared' and dst shares with
>> -		 * another vma. So page_count of ptep page is checked instead
>> -		 * to reliably determine whether pte is shared.
>> -		 */
>> -		if (page_count(virt_to_page(dst_pte)) > 1) {
>> +#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
>> +		/* If the pagetables are shared don't copy or take references. */
>> +		if (ptdesc_pmd_pts_count(virt_to_ptdesc(dst_pte)) > 0) {
>>   			addr |= last_addr_mask;
>>   			continue;
>>   		}
>> +#endif
>>   
>>   		dst_ptl = huge_pte_lock(h, dst, dst_pte);
>>   		src_ptl = huge_pte_lockptr(h, src, src_pte);
>> -- 
>> 2.43.5
>>
>