LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot
even if the underlying entry points to invalid_pte_table (indicating no
mapping). Callers like smaps_hugetlb_range() fetch this invalid entry
value (the address of invalid_pte_table) via this pointer.
The generic is_swap_pte() check then incorrectly identifies this address
as a swap entry on LoongArch, because it satisfies the !pte_present()
&& !pte_none() conditions. This misinterpretation, combined with a
coincidental match by is_migration_entry() on the address bits, leads
to kernel crashes in pfn_swap_entry_to_page().
Fix this at the architecture level by modifying huge_pte_offset() to
check the PMD entry's content using pmd_none() before returning. If the
entry is none (i.e., it points to invalid_pte_table), return NULL
instead of the pointer to the slot.
Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Ming Wang <wangming01@loongson.cn>
---
arch/loongarch/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
index e4068906143b..cea84d7f2b91 100644
--- a/arch/loongarch/mm/hugetlbpage.c
+++ b/arch/loongarch/mm/hugetlbpage.c
@@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
pmd = pmd_offset(pud, addr);
}
}
- return (pte_t *) pmd;
+ return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd;
}
uint64_t pmd_to_entrylo(unsigned long pmd_val)
--
2.43.0
Applied, thanks.

Huacai

On Thu, Apr 24, 2025 at 4:30 PM Ming Wang <wangming01@loongson.cn> wrote:
> [...]
On Thu, Apr 24, 2025 at 04:30:37PM +0800, Ming Wang wrote:
> LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot
> even if the underlying entry points to invalid_pte_table (indicating no
> mapping). Callers like smaps_hugetlb_range() fetch this invalid entry
> value (the address of invalid_pte_table) via this pointer.
So it's in smaps_hugetlb_range() context, then..
>
> The generic is_swap_pte() check then incorrectly identifies this address
> as a swap entry on LoongArch, because it satisfies the !pte_present()
> && !pte_none() conditions. This misinterpretation, combined with a
> coincidental match by is_migration_entry() on the address bits, leads
> to kernel crashes in pfn_swap_entry_to_page().
.. you mentioned !pte_none() is checked. Could you explain why it's
pte_none() rather than huge_pte_none()? As I saw loongarch has exactly
this..
static inline int huge_pte_none(pte_t pte)
{
	unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
	return !val || (val == (unsigned long)invalid_pte_table);
}
I'm not familiar with loongarch's invalid_pte_table, but I would expect at
least for now in hugetlb specific paths it should be reported as none even
without this patch.
One more thing to mention is the kernel (AFAIU) is trying to move away
from hugetlb specific pgtable parsing to generic pgtable walkers. It means
it could happen at some point where the kernel walks the hugetlb pgtables
using normal pgtable APIs. I'm not yet sure what would happen then, but
maybe at some point the invalid_pte_table check is needed in pte_none() too
for loongarch.
Thanks,
>
> Fix this at the architecture level by modifying huge_pte_offset() to
> check the PMD entry's content using pmd_none() before returning. If the
> entry is none (i.e., it points to invalid_pte_table), return NULL
> instead of the pointer to the slot.
>
> Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> Signed-off-by: Ming Wang <wangming01@loongson.cn>
> ---
> arch/loongarch/mm/hugetlbpage.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
> index e4068906143b..cea84d7f2b91 100644
> --- a/arch/loongarch/mm/hugetlbpage.c
> +++ b/arch/loongarch/mm/hugetlbpage.c
> @@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
> pmd = pmd_offset(pud, addr);
> }
> }
> - return (pte_t *) pmd;
> + return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd;
> }
>
> uint64_t pmd_to_entrylo(unsigned long pmd_val)
> --
> 2.43.0
>
--
Peter Xu
Hi Peter Xu,
Thanks for taking a look and raising these important points!
On 4/25/25 02:48, Peter Xu wrote:
> On Thu, Apr 24, 2025 at 04:30:37PM +0800, Ming Wang wrote:
>> LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot
>> even if the underlying entry points to invalid_pte_table (indicating no
>> mapping). Callers like smaps_hugetlb_range() fetch this invalid entry
>> value (the address of invalid_pte_table) via this pointer.
>
> So it's in smaps_hugetlb_range() context, then..
The root cause involves several steps:
1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
(or relevant level) entry is often populated by the kernel during mmap()
with a non-present entry pointing to the architecture's invalid_pte_table
On the affected LoongArch system, this address was observed to
be 0x90000000031e4000.
2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
this entry.
3. The generic is_swap_pte() macro checks `!pte_present() && !pte_none()`.
The entry (invalid_pte_table address) is not present. Crucially,
the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
returns false because the invalid_pte_table address is non-zero.
Therefore, is_swap_pte() incorrectly returns true.
4. The code enters the `else if (is_swap_pte(...))` block.
5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
pattern coincidence in the invalid_pte_table address on LoongArch,
the embedded generic `is_migration_entry()` check happens to return
true (misinterpreting parts of the address as a migration type).
6. This leads to a call to pfn_swap_entry_to_page() with the bogus
swap entry derived from the invalid table address.
7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
unrelated struct page, checks its lock status (unlocked), and hits
the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>
>>
>> The generic is_swap_pte() check then incorrectly identifies this address
>> as a swap entry on LoongArch, because it satisfies the !pte_present()
>> && !pte_none() conditions. This misinterpretation, combined with a
>> coincidental match by is_migration_entry() on the address bits, leads
>> to kernel crashes in pfn_swap_entry_to_page().
>
> .. you mentioned !pte_none() is checked. Could you explain why it's
> pte_none() rather than huge_pte_none()? As I saw loongarch has exactly
> this..
>
> static inline int huge_pte_none(pte_t pte)
> {
> 	unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
> 	return !val || (val == (unsigned long)invalid_pte_table);
> }
>
> I'm not familiar with loongarch's invalid_pte_table, but I would expect at
> least for now in hugetlb specific paths it should be reported as none even
> without this patch.
>
> One more thing to mention is the kernel (AFAIU) is trying to move away
> from hugetlb specific pgtable parsing to generic pgtable walkers. It means
> it could happen at some point where the kernel walks the hugetlb pgtables
> using normal pgtable APIs. I'm not yet sure what would happen then, but
> maybe at some point the invalid_pte_table check is needed in pte_none() too
> for loongarch.
>
> Thanks,
You asked why the check involves pte_none() rather than huge_pte_none(), given that LoongArch
provides the latter which correctly identifies the invalid_pte_table address.
That's a great question, and the crux seems to be in how the generic code path works. The crash
originates within smaps_hugetlb_range() after the generic `is_swap_pte(ptent)` macro returns true.
Looking at the definition of `is_swap_pte()` (in include/linux/mm.h or similar), it typically
expands to `!pte_present(pte) && !pte_none(pte)`.
Critically, even though `smaps_hugetlb_range()` deals with HugeTLB entries (often PMDs cast to pte_t),
the generic `is_swap_pte()` macro itself, when expanded, calls the **generic `pte_none()` macro**, not
the specialized `huge_pte_none()`.
LoongArch's generic `pte_none()` macro is defined as:
`#define pte_none(pte) (!(pte_val(pte) & ~_PAGE_GLOBAL))`
This definition does *not* check for the `invalid_pte_table` address and thus returns false for it,
leading to `is_swap_pte()` incorrectly returning true.
So, while LoongArch does provide `huge_pte_none()` which *could* correctly identify the state, it's not
actually invoked in the code path triggered by `is_swap_pte()` within `smaps_hugetlb_range()`.
This is why modifying `huge_pte_offset()` seems necessary and appropriate at the architecture level.
By returning NULL when the underlying PMD entry is none (checked using the correct `pmd_none()`, which *does*
check for invalid_pte_table on LoongArch), we prevent the invalid pointer and its problematic value from reaching
`smaps_hugetlb_range()` and subsequently fooling the generic `is_swap_pte()` check that uses the generic `pte_none()`.
Regarding your point about generic page table walkers possibly needing `pte_none()` itself to handle `invalid_pte_table`
in the future – I understand the concern. That might indeed be a separate, future enhancement needed for LoongArch's
generic page table support. However, the current patch addresses the immediate crash within the existing hugetlb-specific
walker (`smaps_hugetlb_range`) by stopping the problematic value at the source (`huge_pte_offset`), which seems like a
necessary and correct fix for the present issue.
Does this explanation clarify the interaction between the generic macros and the arch-specific helpers in this context?
Thanks,
Ming
>
>>
>> Fix this at the architecture level by modifying huge_pte_offset() to
>> check the PMD entry's content using pmd_none() before returning. If the
>> entry is none (i.e., it points to invalid_pte_table), return NULL
>> instead of the pointer to the slot.
>>
>> Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>> Signed-off-by: Ming Wang <wangming01@loongson.cn>
>> ---
>> arch/loongarch/mm/hugetlbpage.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
>> index e4068906143b..cea84d7f2b91 100644
>> --- a/arch/loongarch/mm/hugetlbpage.c
>> +++ b/arch/loongarch/mm/hugetlbpage.c
>> @@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
>> pmd = pmd_offset(pud, addr);
>> }
>> }
>> - return (pte_t *) pmd;
>> + return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd;
>> }
>>
>> uint64_t pmd_to_entrylo(unsigned long pmd_val)
>> --
>> 2.43.0
>>
>
On Fri, Apr 25, 2025 at 10:47:24AM +0800, Ming Wang wrote:
> Hi Peter Xu,
Hi, Ming,
[...]
> You asked why the check involves pte_none() rather than huge_pte_none(), given that LoongArch
> provides the latter which correctly identifies the invalid_pte_table address.
>
> That's a great question, and the crux seems to be in how the generic code path works. The crash
> originates within smaps_hugetlb_range() after the generic `is_swap_pte(ptent)` macro returns true.
> Looking at the definition of `is_swap_pte()` (in include/linux/mm.h or similar), it typically
> expands to `!pte_present(pte) && !pte_none(pte)`.
>
> Critically, even though `smaps_hugetlb_range()` deals with HugeTLB entries (often PMDs cast to pte_t),
> the generic `is_swap_pte()` macro itself, when expanded, calls the **generic `pte_none()` macro**, not
> the specialized `huge_pte_none()`.
>
> LoongArch's generic `pte_none()` macro is defined as:
> `#define pte_none(pte) (!(pte_val(pte) & ~_PAGE_GLOBAL))`
> This definition does *not* check for the `invalid_pte_table` address and thus returns false for it,
> leading to `is_swap_pte()` incorrectly returning true.
>
> So, while LoongArch does provide `huge_pte_none()` which *could* correctly identify the state, it's not
> actually invoked in the code path triggered by `is_swap_pte()` within `smaps_hugetlb_range()`.
>
> This is why modifying `huge_pte_offset()` seems necessary and appropriate at the architecture level.
> By returning NULL when the underlying PMD entry is none (checked using the correct `pmd_none()`, which *does*
> check for invalid_pte_table on LoongArch), we prevent the invalid pointer and its problematic value from reaching
> `smaps_hugetlb_range()` and subsequently fooling the generic `is_swap_pte()` check that uses the generic `pte_none()`.
>
> Regarding your point about generic page table walkers possibly needing `pte_none()` itself to handle `invalid_pte_table`
> in the future – I understand the concern. That might indeed be a separate, future enhancement needed for LoongArch's
> generic page table support. However, the current patch addresses the immediate crash within the existing hugetlb-specific
> walker (`smaps_hugetlb_range`) by stopping the problematic value at the source (`huge_pte_offset`), which seems like a
> necessary and correct fix for the present issue.
>
> Does this explanation clarify the interaction between the generic macros and the arch-specific helpers in this context?
I see what's off here - I'm looking at Andrew's latest mm-unstable, which
contains your other fix already:
commit 2f46598ca15065ff7efac3dba466899608bfc659
Author: Ming Wang <wangming01@loongson.cn>
Date: Wed Apr 23 09:03:59 2025 +0800
smaps: fix crash in smaps_hugetlb_range for non-present hugetlb entries
So we're reading different code base..
Looks like the generic mm code used is_swap_pte() in multiple occurrences on
hugetlb ptes already. Besides the smaps code you mentioned, I at least also see
page_vma_mapped_walk -> check_pte also does it.
I'm not sure what's the best way to fix this, and whether it means
is_swap_pte() should work on hugetlb pte_t's for all archs. However you're
right that if that's the case, your current patch can fix all of them by
fixing huge_pte_offset() in loongarch's impl. So it looks like at least
the simplest fix.
Acked-by: Peter Xu <peterx@redhat.com>
Maybe you want to ping the other patch to drop that in mm-unstable too, if
that's not your intention to merge.
Thanks,
--
Peter Xu