mm: hugetlb: avoid soft lockup when mprotect to large memory area

[v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area

Posted by Yang Shi 4 months, 1 week ago

When calling mprotect() on a large hugetlb memory area in our customer's
workload (~300GB hugetlb memory), a soft lockup was observed:

watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [t2_new_sysv:126916]

CPU: 98 PID: 126916 Comm: t2_new_sysv Kdump: loaded Not tainted 6.17-rc7
Hardware name: GIGACOMPUTING R2A3-T40-AAV1/Jefferson CIO, BIOS 5.4.4.1 07/15/2025
pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mte_clear_page_tags+0x14/0x24
lr : mte_sync_tags+0x1c0/0x240
sp : ffff80003150bb80
x29: ffff80003150bb80 x28: ffff00739e9705a8 x27: 0000ffd2d6a00000
x26: 0000ff8e4bc00000 x25: 00e80046cde00f45 x24: 0000000000022458
x23: 0000000000000000 x22: 0000000000000004 x21: 000000011b380000
x20: ffff000000000000 x19: 000000011b379f40 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc875e0aa5e2c
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : fffffc01ce7a5c00 x4 : 00000000046cde00 x3 : fffffc0000000000
x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff0046cde7c000

Call trace:
  mte_clear_page_tags+0x14/0x24
  set_huge_pte_at+0x25c/0x280
  hugetlb_change_protection+0x220/0x430
  change_protection+0x5c/0x8c
  mprotect_fixup+0x10c/0x294
  do_mprotect_pkey.constprop.0+0x2e0/0x3d4
  __arm64_sys_mprotect+0x24/0x44
  invoke_syscall+0x50/0x160
  el0_svc_common+0x48/0x144
  do_el0_svc+0x30/0xe0
  el0_svc+0x30/0xf0
  el0t_64_sync_handler+0xc4/0x148
  el0t_64_sync+0x1a4/0x1a8

The soft lockup is not triggered with THP or base pages because
cond_resched() is called for each PMD-sized range.

Although the soft lockup was triggered by MTE, it is not MTE specific.
Any other processing that takes a long time in the loop may trigger a
soft lockup too.

So add cond_resched() for hugetlb to avoid soft lockup.

Fixes: 8f860591ffb2 ("[PATCH] Enable mprotect on huge pages")
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
v2: - Made the subject and commit message less MTE specific and fixed
      the fixes tag.
    - Collected all R-bs and A-bs.

 mm/hugetlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cb5c4e79e0b8..fe6606d91b31 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7242,6 +7242,8 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 						psize);
 		}
 		spin_unlock(ptl);
+
+		cond_resched();
 	}
 	/*
 	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
-- 
2.47.0
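
The shape of the change in the hunk above can be illustrated outside the
kernel. The following is a minimal userspace sketch, not the kernel code:
toy_lock(), toy_unlock(), update_entry() and change_all() are hypothetical
names introduced only for illustration, and sched_yield() is an assumed
stand-in for cond_resched(). The two differ: sched_yield() always gives up
the CPU, whereas cond_resched() only reschedules when a reschedule is
pending. What the sketch does share with the patch is the placement of the
yield point: once per iteration, after the lock has been dropped.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy spinlock; a hypothetical stand-in for the page table lock (ptl). */
static void toy_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		; /* busy-wait until the holder clears the flag */
}

static void toy_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Hypothetical per-entry work, standing in for the protection update. */
static void update_entry(unsigned long *entry)
{
	*entry |= 1UL;
}

/* Visit every entry; returns how many entries were updated. */
long change_all(unsigned long *entries, size_t n, atomic_flag *lock)
{
	long changed = 0;

	assert(entries != NULL && lock != NULL);

	for (size_t i = 0; i < n; i++) {
		toy_lock(lock);
		update_entry(&entries[i]);
		changed++;
		toy_unlock(lock);

		/*
		 * Yield only after the lock is dropped, once per entry.
		 * sched_yield() is a userspace stand-in for cond_resched().
		 */
		sched_yield();
	}
	return changed;
}
```

Yielding while the spinlock is still held would leave other contenders
spinning, which is why the yield point sits after the unlock rather than
inside the locked region.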

Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area

Posted by Dev Jain 4 months, 1 week ago

On 30/09/25 1:54 am, Yang Shi wrote:
> [...]

Reviewed-by: Dev Jain <dev.jain@arm.com>

Does it make sense to also do cond_resched() in the huge_pmd_unshare() branch?
That also amounts to clearing a page. And I can see for example, zap_huge_pmd()
and change_huge_pmd() consume a cond_resched().

Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area

Posted by Yang Shi 4 months, 1 week ago


On 9/29/25 10:26 PM, Dev Jain wrote:
>
> On 30/09/25 1:54 am, Yang Shi wrote:
>> [...]
>
> Reviewed-by: Dev Jain <dev.jain@arm.com>

Thank you.

>
> Does it make sense to also do cond_resched() in the huge_pmd_unshare() 
> branch?
> That also amounts to clearing a page. And I can see for example, 
> zap_huge_pmd()
> and change_huge_pmd() consume a cond_resched().

Thanks for raising this. I did think about it, but I couldn't convince 
myself because shared PMDs should not be that common IMHO (if I'm wrong, 
please feel free to correct me). At least the PMD can't be shared if the 
memory is tagged, IIRC. So I'd like to keep the patch minimal for now and 
defer adding cond_resched() until it is hit by some real-life workload.

Yang

Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area

Posted by Dev Jain 4 months, 1 week ago

On 30/09/25 11:38 pm, Yang Shi wrote:
>
>
> On 9/29/25 10:26 PM, Dev Jain wrote:
>>
>> On 30/09/25 1:54 am, Yang Shi wrote:
>>> [...]
>>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>
> Thank you.
>
>>
>> Does it make sense to also do cond_resched() in the 
>> huge_pmd_unshare() branch?
>> That also amounts to clearing a page. And I can see for example, 
>> zap_huge_pmd()
>> and change_huge_pmd() consume a cond_resched().
>
> Thanks for raising this. I did think about it. But I didn't convince 
> myself because shared pmd should be not that common IMHO (If I'm 
> wrong, please feel free to correct me). At least PMD can't be shared 
> if the memory is tagged IIRC. So I'd like to keep the patch minimal 
> for now and defer adding cond_resched() until it is hit by some real 
> life workload.

If we have large swathes of hugetlb memory like in your workload, and it 
is MAP_SHARED, then there should be high chances of sharing the PMD. 
Although, I incorrectly observed that we are clearing a page there - we 
are only clearing the pud entry which is 8 bytes. So yes a soft lockup 
should be highly unlikely. But since cond_resched() is cheap (I assume 
this is the case since it is liberally sprinkled all over the codebase) 
I think we should be consistent. Probably not an immediate concern and 
not a matter of this patch.


>
> Yang
>
>

Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area

Posted by David Hildenbrand 4 months, 1 week ago

On 01.10.25 06:23, Dev Jain wrote:
> 
> On 30/09/25 11:38 pm, Yang Shi wrote:
>>
>>
>> On 9/29/25 10:26 PM, Dev Jain wrote:
>>>
>>> On 30/09/25 1:54 am, Yang Shi wrote:
>>>> [...]
>>>
>>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>
>> Thank you.
>>
>>>
>>> Does it make sense to also do cond_resched() in the
>>> huge_pmd_unshare() branch?
>>> That also amounts to clearing a page. And I can see for example,
>>> zap_huge_pmd()
>>> and change_huge_pmd() consume a cond_resched().
>>
>> Thanks for raising this. I did think about it. But I didn't convince
>> myself because shared pmd should be not that common IMHO (If I'm
>> wrong, please feel free to correct me). At least PMD can't be shared
>> if the memory is tagged IIRC. So I'd like to keep the patch minimal
>> for now and defer adding cond_resched() until it is hit by some real
>> life workload.
> 
> If we have large swathes of hugetlb memory like in your workload, and it
> is MAP_SHARED, then there should be high chances of sharing the PMD.
> Although, I incorrectly
> 
> observed that we are clearing a page there - we are only clearing the
> pud entry which is 8 bytes. So yes a soft lockup should be highly
> unlikely. But since cond_resched()
> 
> is cheap (I assume this is the case since it is liberally sprinkled all
> over the codebase) I think we should be consistent. Probably not an
> immediate concern and not a matter

Right, that's one of the cases where we might just want to wait either 
until it is reported or until hugetlb is finally removed in a couple of 
decades ;)

-- 
Cheers

David / dhildenb

Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area

Posted by Christoph Lameter (Ampere) 4 months, 1 week ago

On Tue, 30 Sep 2025, Yang Shi wrote:

> > Does it make sense to also do cond_resched() in the huge_pmd_unshare()
> > branch?
> > That also amounts to clearing a page. And I can see for example,
> > zap_huge_pmd()
> > and change_huge_pmd() consume a cond_resched().
>
> Thanks for raising this. I did think about it. But I didn't convince myself
> because shared pmd should be not that common IMHO (If I'm wrong, please feel
> free to correct me). At least PMD can't be shared if the memory is tagged
> IIRC. So I'd like to keep the patch minimal for now and defer adding
> cond_resched() until it is hit by some real life workload.

It would be good to send out a second patch that covers the other cases
for discussion.