[PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud

Liu Shixin posted 2 patches 3 years, 4 months ago
There is a newer version of this series
[PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Posted by Liu Shixin 3 years, 4 months ago
The page table check trigger BUG_ON() unexpectedly when split hugepage:

 ------------[ cut here ]------------
 kernel BUG at mm/page_table_check.c:119!
 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Modules linked in:
 CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
 Hardware name: linux,dummy-virt (DT)
 pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : page_table_check_set.isra.0+0x398/0x468
 lr : page_table_check_set.isra.0+0x1c0/0x468
[...]
 Call trace:
  page_table_check_set.isra.0+0x398/0x468
  __page_table_check_pte_set+0x160/0x1c0
  __split_huge_pmd_locked+0x900/0x1648
  __split_huge_pmd+0x28c/0x3b8
  unmap_page_range+0x428/0x858
  unmap_single_vma+0xf4/0x1c8
  zap_page_range+0x2b0/0x410
  madvise_vma_behavior+0xc44/0xe78
  do_madvise+0x280/0x698
  __arm64_sys_madvise+0x90/0xe8
  invoke_syscall.constprop.0+0xdc/0x1d8
  do_el0_svc+0xf4/0x3f8
  el0_svc+0x58/0x120
  el0t_64_sync_handler+0xb8/0xc0
  el0t_64_sync+0x19c/0x1a0
[...]

On arm64, pmd_present() will return true even if the pmd is invalid. So
in pmdp_invalidate() the file_map_count will not only decrease once but
also increase once. Then in set_pte_at(), the file_map_count increase
again, and so trigger BUG_ON() unexpectedly.

Fix this problem by adding pmd_valid() in pmd_user_accessible_page().
Moreover, add pud_valid() for pud_user_accessible_page() too.

Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 arch/arm64/include/asm/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index edf6625ce965..56e178de75e7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -863,12 +863,12 @@ static inline bool pte_user_accessible_page(pte_t pte)
 
 static inline bool pmd_user_accessible_page(pmd_t pmd)
 {
-	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
+	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
 }
 
 static inline bool pud_user_accessible_page(pud_t pud)
 {
-	return pud_leaf(pud) && pud_user(pud);
+	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);
 }
 #endif
 
-- 
2.25.1
Re: [PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Posted by Mark Rutland 3 years, 4 months ago
On Wed, Nov 16, 2022 at 04:38:11PM +0800, Liu Shixin wrote:
> The page table check trigger BUG_ON() unexpectedly when split hugepage:
> 
>  ------------[ cut here ]------------
>  kernel BUG at mm/page_table_check.c:119!
>  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>  Dumping ftrace buffer:
>     (ftrace buffer empty)
>  Modules linked in:
>  CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>  Hardware name: linux,dummy-virt (DT)
>  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : page_table_check_set.isra.0+0x398/0x468
>  lr : page_table_check_set.isra.0+0x1c0/0x468
> [...]
>  Call trace:
>   page_table_check_set.isra.0+0x398/0x468
>   __page_table_check_pte_set+0x160/0x1c0
>   __split_huge_pmd_locked+0x900/0x1648
>   __split_huge_pmd+0x28c/0x3b8
>   unmap_page_range+0x428/0x858
>   unmap_single_vma+0xf4/0x1c8
>   zap_page_range+0x2b0/0x410
>   madvise_vma_behavior+0xc44/0xe78
>   do_madvise+0x280/0x698
>   __arm64_sys_madvise+0x90/0xe8
>   invoke_syscall.constprop.0+0xdc/0x1d8
>   do_el0_svc+0xf4/0x3f8
>   el0_svc+0x58/0x120
>   el0t_64_sync_handler+0xb8/0xc0
>   el0t_64_sync+0x19c/0x1a0
> [...]
> 
> On arm64, pmd_present() will return true even if the pmd is invalid. So
> in pmdp_invalidate() the file_map_count will not only decrease once but
> also increase once. Then in set_pte_at(), the file_map_count increase
> again, and so trigger BUG_ON() unexpectedly.

It's not clear to me how pmd_present() relates to p?d_user_accessible_page()
below. How are they related? (or is this a copy-paste error)?

> Fix this problem by adding pmd_valid() in pmd_user_accessible_page().
> Moreover, add pud_valid() for pud_user_accessible_page() too.
> 
> Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
> Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index edf6625ce965..56e178de75e7 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -863,12 +863,12 @@ static inline bool pte_user_accessible_page(pte_t pte)
>  
>  static inline bool pmd_user_accessible_page(pmd_t pmd)
>  {
> -	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> +	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
>  }
>  
>  static inline bool pud_user_accessible_page(pud_t pud)
>  {
> -	return pud_leaf(pud) && pud_user(pud);
> +	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);

I think these p?d_valid() checks should be first for clarity, since the other
bits aren't necessarily meaningful for !valid entries.

Thanks,
Mark.
Re: [PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Posted by Liu Shixin 3 years, 4 months ago

On 2022/11/16 23:52, Mark Rutland wrote:
> On Wed, Nov 16, 2022 at 04:38:11PM +0800, Liu Shixin wrote:
>> The page table check trigger BUG_ON() unexpectedly when split hugepage:
>>
>>  ------------[ cut here ]------------
>>  kernel BUG at mm/page_table_check.c:119!
>>  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>  Dumping ftrace buffer:
>>     (ftrace buffer empty)
>>  Modules linked in:
>>  CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>>  Hardware name: linux,dummy-virt (DT)
>>  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>  pc : page_table_check_set.isra.0+0x398/0x468
>>  lr : page_table_check_set.isra.0+0x1c0/0x468
>> [...]
>>  Call trace:
>>   page_table_check_set.isra.0+0x398/0x468
>>   __page_table_check_pte_set+0x160/0x1c0
>>   __split_huge_pmd_locked+0x900/0x1648
>>   __split_huge_pmd+0x28c/0x3b8
>>   unmap_page_range+0x428/0x858
>>   unmap_single_vma+0xf4/0x1c8
>>   zap_page_range+0x2b0/0x410
>>   madvise_vma_behavior+0xc44/0xe78
>>   do_madvise+0x280/0x698
>>   __arm64_sys_madvise+0x90/0xe8
>>   invoke_syscall.constprop.0+0xdc/0x1d8
>>   do_el0_svc+0xf4/0x3f8
>>   el0_svc+0x58/0x120
>>   el0t_64_sync_handler+0xb8/0xc0
>>   el0t_64_sync+0x19c/0x1a0
>> [...]
>>
>> On arm64, pmd_present() will return true even if the pmd is invalid. So
>> in pmdp_invalidate() the file_map_count will not only decrease once but
>> also increase once. Then in set_pte_at(), the file_map_count increase
>> again, and so trigger BUG_ON() unexpectedly.
> It's not clear to me how pmd_present() relates to p?d_user_accessible_page()
> below. How are they related? (or is this a copy-paste error)?
Yes, should be pmd_leaf(). In the previous patch, pmd_present() has already replaced with pmd_leaf().
Thanks for your careful discovery. Will fix in next version.
>> Fix this problem by adding pmd_valid() in pmd_user_accessible_page().
>> Moreover, add pud_valid() for pud_user_accessible_page() too.
>>
>> Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
>> Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>> ---
>>  arch/arm64/include/asm/pgtable.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index edf6625ce965..56e178de75e7 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -863,12 +863,12 @@ static inline bool pte_user_accessible_page(pte_t pte)
>>  
>>  static inline bool pmd_user_accessible_page(pmd_t pmd)
>>  {
>> -	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
>> +	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
>>  }
>>  
>>  static inline bool pud_user_accessible_page(pud_t pud)
>>  {
>> -	return pud_leaf(pud) && pud_user(pud);
>> +	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);
> I think these p?d_valid() checks should be first for clarity, since the other
> bits aren't necessarily meaningful for !valid entries.
Thanks for your advice.
>
> Thanks,
> Mark.
>
> .
>
Re: [PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Posted by Pasha Tatashin 3 years, 4 months ago
On Wed, Nov 16, 2022 at 2:51 AM Liu Shixin <liushixin2@huawei.com> wrote:
>
> The page table check trigger BUG_ON() unexpectedly when split hugepage:
>
>  ------------[ cut here ]------------
>  kernel BUG at mm/page_table_check.c:119!
>  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>  Dumping ftrace buffer:
>     (ftrace buffer empty)
>  Modules linked in:
>  CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>  Hardware name: linux,dummy-virt (DT)
>  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : page_table_check_set.isra.0+0x398/0x468
>  lr : page_table_check_set.isra.0+0x1c0/0x468
> [...]
>  Call trace:
>   page_table_check_set.isra.0+0x398/0x468
>   __page_table_check_pte_set+0x160/0x1c0
>   __split_huge_pmd_locked+0x900/0x1648
>   __split_huge_pmd+0x28c/0x3b8
>   unmap_page_range+0x428/0x858
>   unmap_single_vma+0xf4/0x1c8
>   zap_page_range+0x2b0/0x410
>   madvise_vma_behavior+0xc44/0xe78
>   do_madvise+0x280/0x698
>   __arm64_sys_madvise+0x90/0xe8
>   invoke_syscall.constprop.0+0xdc/0x1d8
>   do_el0_svc+0xf4/0x3f8
>   el0_svc+0x58/0x120
>   el0t_64_sync_handler+0xb8/0xc0
>   el0t_64_sync+0x19c/0x1a0
> [...]
>
> On arm64, pmd_present() will return true even if the pmd is invalid.
> in pmdp_invalidate() the file_map_count will not only decrease once but
> also increase once. Then in set_pte_at(), the file_map_count increase
> again, and so trigger BUG_ON() unexpectedly.

>  static inline bool pmd_user_accessible_page(pmd_t pmd)
>  {
> -       return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> +       return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
>  }
>
>  static inline bool pud_user_accessible_page(pud_t pud)
>  {
> -       return pud_leaf(pud) && pud_user(pud);
> +       return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);

This looks closer to x86 where the check is directly: pmd_val(pmd) &
_PAGE_PRESENT, without PTE_PROT_NONE that is part of pmd_present()

Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Thanks,
Pasha
Re: [PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Posted by David Hildenbrand 3 years, 4 months ago
On 16.11.22 09:38, Liu Shixin wrote:
> The page table check trigger BUG_ON() unexpectedly when split hugepage:
> 
>   ------------[ cut here ]------------
>   kernel BUG at mm/page_table_check.c:119!
>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>   Dumping ftrace buffer:
>      (ftrace buffer empty)
>   Modules linked in:
>   CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>   Hardware name: linux,dummy-virt (DT)
>   pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : page_table_check_set.isra.0+0x398/0x468
>   lr : page_table_check_set.isra.0+0x1c0/0x468
> [...]
>   Call trace:
>    page_table_check_set.isra.0+0x398/0x468
>    __page_table_check_pte_set+0x160/0x1c0
>    __split_huge_pmd_locked+0x900/0x1648
>    __split_huge_pmd+0x28c/0x3b8
>    unmap_page_range+0x428/0x858
>    unmap_single_vma+0xf4/0x1c8
>    zap_page_range+0x2b0/0x410
>    madvise_vma_behavior+0xc44/0xe78
>    do_madvise+0x280/0x698
>    __arm64_sys_madvise+0x90/0xe8
>    invoke_syscall.constprop.0+0xdc/0x1d8
>    do_el0_svc+0xf4/0x3f8
>    el0_svc+0x58/0x120
>    el0t_64_sync_handler+0xb8/0xc0
>    el0t_64_sync+0x19c/0x1a0
> [...]
> 
> On arm64, pmd_present() will return true even if the pmd is invalid.

I assume that's because of the pmd_present_invalid() check.

... I wonder why that behavior was chosen. Sounds error-prone to me.

Fix LGHTM, but I am not an arm64 pgtable expert.

-- 
Thanks,

David / dhildenb
Re: [PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Posted by Mark Rutland 3 years, 4 months ago
On Wed, Nov 16, 2022 at 10:08:27AM +0100, David Hildenbrand wrote:
> On 16.11.22 09:38, Liu Shixin wrote:
> > The page table check trigger BUG_ON() unexpectedly when split hugepage:
> > 
> >   ------------[ cut here ]------------
> >   kernel BUG at mm/page_table_check.c:119!
> >   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >   Dumping ftrace buffer:
> >      (ftrace buffer empty)
> >   Modules linked in:
> >   CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
> >   Hardware name: linux,dummy-virt (DT)
> >   pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >   pc : page_table_check_set.isra.0+0x398/0x468
> >   lr : page_table_check_set.isra.0+0x1c0/0x468
> > [...]
> >   Call trace:
> >    page_table_check_set.isra.0+0x398/0x468
> >    __page_table_check_pte_set+0x160/0x1c0
> >    __split_huge_pmd_locked+0x900/0x1648
> >    __split_huge_pmd+0x28c/0x3b8
> >    unmap_page_range+0x428/0x858
> >    unmap_single_vma+0xf4/0x1c8
> >    zap_page_range+0x2b0/0x410
> >    madvise_vma_behavior+0xc44/0xe78
> >    do_madvise+0x280/0x698
> >    __arm64_sys_madvise+0x90/0xe8
> >    invoke_syscall.constprop.0+0xdc/0x1d8
> >    do_el0_svc+0xf4/0x3f8
> >    el0_svc+0x58/0x120
> >    el0t_64_sync_handler+0xb8/0xc0
> >    el0t_64_sync+0x19c/0x1a0
> > [...]
> > 
> > On arm64, pmd_present() will return true even if the pmd is invalid.
> 
> I assume that's because of the pmd_present_invalid() check.
> 
> ... I wonder why that behavior was chosen. Sounds error-prone to me.

That seems to be down to commit:

  b65399f6111b03df ("arm64/mm: Change THP helpers to comply with generic MM semantics")

... apparently because Andrea Arcangelli said this was necessary in:

  https://lore.kernel.org/lkml/20181017020930.GN30832@redhat.com/

... but that does see to contradict what's said in:

  Documentation/mm/arch_pgtable_helpers.rst

... which just says:

  pmd_present  Tests a valid mapped PMD 

... and it's not clear to me why this *only* applies to the PMD level.

Anshuman?

Thanks,
Mark.
Re: [PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Posted by Anshuman Khandual 3 years, 4 months ago

On 11/16/22 21:16, Mark Rutland wrote:
> On Wed, Nov 16, 2022 at 10:08:27AM +0100, David Hildenbrand wrote:
>> On 16.11.22 09:38, Liu Shixin wrote:
>>> The page table check trigger BUG_ON() unexpectedly when split hugepage:
>>>
>>>   ------------[ cut here ]------------
>>>   kernel BUG at mm/page_table_check.c:119!
>>>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>>   Dumping ftrace buffer:
>>>      (ftrace buffer empty)
>>>   Modules linked in:
>>>   CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>>>   Hardware name: linux,dummy-virt (DT)
>>>   pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>   pc : page_table_check_set.isra.0+0x398/0x468
>>>   lr : page_table_check_set.isra.0+0x1c0/0x468
>>> [...]
>>>   Call trace:
>>>    page_table_check_set.isra.0+0x398/0x468
>>>    __page_table_check_pte_set+0x160/0x1c0
>>>    __split_huge_pmd_locked+0x900/0x1648
>>>    __split_huge_pmd+0x28c/0x3b8
>>>    unmap_page_range+0x428/0x858
>>>    unmap_single_vma+0xf4/0x1c8
>>>    zap_page_range+0x2b0/0x410
>>>    madvise_vma_behavior+0xc44/0xe78
>>>    do_madvise+0x280/0x698
>>>    __arm64_sys_madvise+0x90/0xe8
>>>    invoke_syscall.constprop.0+0xdc/0x1d8
>>>    do_el0_svc+0xf4/0x3f8
>>>    el0_svc+0x58/0x120
>>>    el0t_64_sync_handler+0xb8/0xc0
>>>    el0t_64_sync+0x19c/0x1a0
>>> [...]
>>>
>>> On arm64, pmd_present() will return true even if the pmd is invalid.
>>
>> I assume that's because of the pmd_present_invalid() check.
>>
>> ... I wonder why that behavior was chosen. Sounds error-prone to me.
> 
> That seems to be down to commit:
> 
>   b65399f6111b03df ("arm64/mm: Change THP helpers to comply with generic MM semantics")
> 
> ... apparently because Andrea Arcangelli said this was necessary in:
> 
>   https://lore.kernel.org/lkml/20181017020930.GN30832@redhat.com/
> 
> ... but that does see to contradict what's said in:
> 
>   Documentation/mm/arch_pgtable_helpers.rst
> 
> ... which just says:
> 
>   pmd_present  Tests a valid mapped PMD

It should be as follows instead, will update. Not sure about PUD level though,
where anon THP is not supported (AFAIK).

+---------------------------+--------------------------------------------------+
| pmd_present               | Tests if pmd_page() points to valid memory page  |
+---------------------------+--------------------------------------------------+

> 
> ... and it's not clear to me why this *only* applies to the PMD level.
> 
> Anshuman?

Because THP is supported at PMD level. As Andrea had explained earlier, pmd_present()
should return positive if pmd_page() on the entry points to valid memory irrespective
of whether the entry is valid/mapped or not. That is the semantics expected in generic
THP during PMD split, collapse, migration etc and other memory code walking past such
PMD entries. That was my understanding.