[PATCH v2 02/14] arm: adjust_pte() use pte_offset_map_rw_nolock()

Posted by Qi Zheng 1 year, 5 months ago, as patch 02 of a 14-patch
series. There is a newer version of this series.
In do_adjust_pte(), we may modify the pte entry. At that point, the
write lock of mmap_lock is not held and no pte_same() check is
performed after the PTL is taken, so the corresponding pmd entry may
have been modified concurrently. Therefore, to ensure the stability
of the pmd entry, use pte_offset_map_rw_nolock() instead of
pte_offset_map_nolock(), and perform a pmd_same() check after the
PTL is held.
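
The resulting pattern is roughly the following (a minimal sketch using
the helpers this series relies on; adjust_pte() itself must use the
open-coded do_pte_lock()/do_pte_unlock() rather than plain
spin_lock()/spin_unlock()):

again:
	pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
	if (!pte)
		return 0;

	spin_lock(ptl);
	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
		/* The PTE page was freed or replaced under us: retry. */
		spin_unlock(ptl);
		pte_unmap(pte);
		goto again;
	}

	/* The pmd entry is now stable for as long as the PTL is held,
	 * so it is safe to read and modify the pte entry here. */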

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/arm/mm/fault-armv.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 831793cd6ff94..de6c7d8a2ddfc 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
+	pmd_t pmdval;
 	int ret;
 
 	pgd = pgd_offset(vma->vm_mm, address);
@@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	if (pmd_none_or_clear_bad(pmd))
 		return 0;
 
+again:
 	/*
 	 * This is called while another page table is mapped, so we
 	 * must use the nested version.  This also means we need to
 	 * open-code the spin-locking.
 	 */
-	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
+	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
 	if (!pte)
 		return 0;
 
 	do_pte_lock(ptl);
+	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		do_pte_unlock(ptl);
+		pte_unmap(pte);
+		goto again;
+	}
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
 
-- 
2.20.1
Re: [PATCH v2 02/14] arm: adjust_pte() use pte_offset_map_rw_nolock()
Posted by David Hildenbrand 1 year, 5 months ago
On 22.08.24 09:13, Qi Zheng wrote:
> In do_adjust_pte(), we may modify the pte entry. At that point, the
> write lock of mmap_lock is not held and no pte_same() check is
> performed after the PTL is taken, so the corresponding pmd entry may
> have been modified concurrently. Therefore, to ensure the stability
> of the pmd entry, use pte_offset_map_rw_nolock() instead of
> pte_offset_map_nolock(), and perform a pmd_same() check after the
> PTL is held.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>   arch/arm/mm/fault-armv.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 831793cd6ff94..de6c7d8a2ddfc 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>   	pud_t *pud;
>   	pmd_t *pmd;
>   	pte_t *pte;
> +	pmd_t pmdval;
>   	int ret;
>   
>   	pgd = pgd_offset(vma->vm_mm, address);
> @@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>   	if (pmd_none_or_clear_bad(pmd))
>   		return 0;
>   
> +again:
>   	/*
>   	 * This is called while another page table is mapped, so we
>   	 * must use the nested version.  This also means we need to
>   	 * open-code the spin-locking.
>   	 */
> -	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
> +	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
>   	if (!pte)
>   		return 0;
>   
>   	do_pte_lock(ptl);
> +	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> +		do_pte_unlock(ptl);
> +		pte_unmap(pte);
> +		goto again;
> +	}
>   
>   	ret = do_adjust_pte(vma, address, pfn, pte);
>   

Looks correct to me, but I wonder why the missing pmd_same() check has
not been an issue so far ... any experts? Is THP on
__LINUX_ARM_ARCH__ < 6 not really used/possible?

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 02/14] arm: adjust_pte() use pte_offset_map_rw_nolock()
Posted by Muchun Song 1 year, 5 months ago

On 2024/8/26 23:26, David Hildenbrand wrote:
> On 22.08.24 09:13, Qi Zheng wrote:
>> In do_adjust_pte(), we may modify the pte entry. At that point, the
>> write lock of mmap_lock is not held and no pte_same() check is
>> performed after the PTL is taken, so the corresponding pmd entry may
>> have been modified concurrently. Therefore, to ensure the stability
>> of the pmd entry, use pte_offset_map_rw_nolock() instead of
>> pte_offset_map_nolock(), and perform a pmd_same() check after the
>> PTL is held.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

>> ---
>>   arch/arm/mm/fault-armv.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
>> index 831793cd6ff94..de6c7d8a2ddfc 100644
>> --- a/arch/arm/mm/fault-armv.c
>> +++ b/arch/arm/mm/fault-armv.c
>> @@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>  	pud_t *pud;
>>  	pmd_t *pmd;
>>  	pte_t *pte;
>> +	pmd_t pmdval;
>>  	int ret;
>>
>>  	pgd = pgd_offset(vma->vm_mm, address);
>> @@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>  	if (pmd_none_or_clear_bad(pmd))
>>  		return 0;
>>
>> +again:
>>  	/*
>>  	 * This is called while another page table is mapped, so we
>>  	 * must use the nested version.  This also means we need to
>>  	 * open-code the spin-locking.
>>  	 */
>> -	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
>> +	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
>>  	if (!pte)
>>  		return 0;
>>
>>  	do_pte_lock(ptl);
>> +	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>> +		do_pte_unlock(ptl);
>> +		pte_unmap(pte);
>> +		goto again;
>> +	}
>>
>>  	ret = do_adjust_pte(vma, address, pfn, pte);
>
> Looks correct to me, but I wonder why the missing pmd_same() check
> has not been an issue so far ... any experts? Is THP on
> __LINUX_ARM_ARCH__ < 6 not really used/possible?

I think it is because such processors do not support THP.

TRANSPARENT_HUGEPAGE depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE, which
in turn depends on ARM_LPAE (see the select line quoted after the
Kconfig excerpt below). However, the Kconfig says ARM_LPAE is only
supported on ARMv7 processors.

config ARM_LPAE
          bool "Support for the Large Physical Address Extension"
          depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
                  !CPU_32v4 && !CPU_32v3
          select PHYS_ADDR_T_64BIT
          select SWIOTLB
          help
            Say Y if you have an ARMv7 processor supporting the LPAE page
            table format and you would like to access memory beyond the
            4GB limit. The resulting kernel image will not run on
            processors without the LPA extension.

            If unsure, say N.
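
For completeness, the other link in the chain is the corresponding
select in arch/arm/Kconfig (quoted from memory, so the exact line may
differ between kernel versions):

	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARM_LPAE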

Thanks.
>
> Acked-by: David Hildenbrand <david@redhat.com>
>