When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
file with MAP_PRIVATE, the kernel might crash inside pfn_swap_entry_to_page.
This occurs on LoongArch under specific conditions.
The root cause involves several steps:
1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
(or relevant level) entry is often populated by the kernel during mmap()
with a non-present entry pointing to the architecture's invalid_pte_table.
On the affected LoongArch system, this address was observed to
be 0x90000000031e4000.
2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
this entry.
3. The generic is_swap_pte() macro checks `!pte_present() && !pte_none()`.
The entry (invalid_pte_table address) is not present. Crucially,
the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
returns false because the invalid_pte_table address is non-zero.
Therefore, is_swap_pte() incorrectly returns true.
4. The code enters the `else if (is_swap_pte(...))` block.
5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
pattern coincidence in the invalid_pte_table address on LoongArch,
the embedded generic `is_migration_entry()` check happens to return
true (misinterpreting parts of the address as a migration type).
6. This leads to a call to pfn_swap_entry_to_page() with the bogus
swap entry derived from the invalid table address.
7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
unrelated struct page, checks its lock status (unlocked), and hits
the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
The `else if` block appears intended to handle potential migration
entries, as indicated by the inner `is_pfn_swap_entry()` check. The
issue arises because the outer `is_swap_pte()` check incorrectly
matches the invalid table pointer case on LoongArch.
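To make the misfire in steps 3-5 concrete, here is a minimal userspace
sketch of the two generic checks. The bit positions chosen below for
_PAGE_GLOBAL and _PAGE_PRESENT are illustrative assumptions, not the
authoritative LoongArch layout; the outcome does not depend on them,
because the invalid_pte_table pointer is page-aligned and so all
low-order flag bits are clear anyway.

```c
#include <stdio.h>

#define _PAGE_GLOBAL	(1UL << 6)	/* assumed bit position */
#define _PAGE_PRESENT	(1UL << 7)	/* assumed bit position */

/* Simplified: ignores _PAGE_PROTNONE, which is also clear here. */
static int pte_present(unsigned long pteval)
{
	return !!(pteval & _PAGE_PRESENT);
}

/* The generic check quoted above: everything but the global bit is zero. */
static int pte_none(unsigned long pteval)
{
	return !(pteval & ~_PAGE_GLOBAL);
}

/* Mirrors include/linux/swapops.h: neither none nor present. */
static int is_swap_pte(unsigned long pteval)
{
	return !pte_none(pteval) && !pte_present(pteval);
}

int main(void)
{
	unsigned long entry = 0x90000000031e4000UL; /* invalid_pte_table, per the vmcore */

	printf("pte_present = %d\n", pte_present(entry)); /* 0 */
	printf("pte_none    = %d\n", pte_none(entry));    /* 0: value is non-zero */
	printf("is_swap_pte = %d\n", is_swap_pte(entry)); /* 1: misclassified */
	return 0;
}
```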
This patch fixes the issue by changing the condition in
smaps_hugetlb_range() from the broad `is_swap_pte()` to the specific
`is_hugetlb_entry_migration()`.
The `is_hugetlb_entry_migration()` helper function correctly handles this
by first checking `huge_pte_none()`. Architectures like LoongArch can
provide an override for `huge_pte_none()` that specifically recognizes
the `invalid_pte_table` address as a "none" state for HugeTLB entries.
This ensures `is_hugetlb_entry_migration()` returns false for the invalid
entry, preventing the code from entering the faulty block.
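For reference, the helper in mm/hugetlb.c has roughly the following
shape (paraphrased from memory, not verbatim; check the tree for the
exact form). The huge_pte_none() early-out is what filters the invalid
entry:

```c
/* Paraphrased sketch of mm/hugetlb.c:is_hugetlb_entry_migration() */
bool is_hugetlb_entry_migration(pte_t pte)
{
	swp_entry_t swp;

	/*
	 * With LoongArch's huge_pte_none() override, an entry pointing at
	 * invalid_pte_table is classified as "none" and we return early,
	 * never treating it as a swap/migration entry.
	 */
	if (huge_pte_none(pte) || pte_present(pte))
		return false;
	swp = pte_to_swp_entry(pte);
	return is_migration_entry(swp);
}
```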
This change makes the code reflect the likely original intent (handling
migration) more accurately and leverages architecture-specific helpers
(`huge_pte_none`) to correctly interpret special PTE/PMD values in the
HugeTLB context, fixing the crash on LoongArch without altering the
generic is_swap_pte() behavior.
Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Ming Wang <wangming01@loongson.cn>
---
fs/proc/task_mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 994cde10e3f4..95a0093ae87c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1027,7 +1027,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 	if (pte_present(ptent)) {
 		folio = page_folio(pte_page(ptent));
 		present = true;
-	} else if (is_swap_pte(ptent)) {
+	} else if (is_hugetlb_entry_migration(ptent)) {
 		swp_entry_t swpent = pte_to_swp_entry(ptent);
 
 		if (is_pfn_swap_entry(swpent))
--
2.43.0
On 23.04.25 03:03, Ming Wang wrote:
> When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
> file with MAP_PRIVATE, the kernel might crash inside pfn_swap_entry_to_page.
> This occurs on LoongArch under specific conditions.
>
> The root cause involves several steps:
> 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
> (or relevant level) entry is often populated by the kernel during mmap()
> with a non-present entry pointing to the architecture's invalid_pte_table
> On the affected LoongArch system, this address was observed to
> be 0x90000000031e4000.
> 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
> this entry.
> 3. The generic is_swap_pte() macro checks `!pte_present() && !pte_none()`.
> The entry (invalid_pte_table address) is not present. Crucially,
> the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
> returns false because the invalid_pte_table address is non-zero.
> Therefore, is_swap_pte() incorrectly returns true.
> 4. The code enters the `else if (is_swap_pte(...))` block.
> 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
> pattern coincidence in the invalid_pte_table address on LoongArch,
> the embedded generic `is_migration_entry()` check happens to return
> true (misinterpreting parts of the address as a migration type).
> 6. This leads to a call to pfn_swap_entry_to_page() with the bogus
> swap entry derived from the invalid table address.
> 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
> unrelated struct page, checks its lock status (unlocked), and hits
> the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>
> The original code's intent in the `else if` block seems aimed at handling
> potential migration entries, as indicated by the inner `is_pfn_swap_entry()`
> check. The issue arises because the outer `is_swap_pte()` check incorrectly
> includes the invalid table pointer case on LoongArch.

This has a big loongarch smell to it.

If we end up passing !pte_present() && !pte_none(), then loongarch must
be fixed to filter out these weird non-present entries.

is_swap_pte() must not succeed on something that is not an actual swap pte.

--
Cheers,

David / dhildenb
On 4/23/25 15:07, David Hildenbrand wrote:
> On 23.04.25 03:03, Ming Wang wrote:
>> When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
>> file with MAP_PRIVATE, the kernel might crash inside
>> pfn_swap_entry_to_page.
>> This occurs on LoongArch under specific conditions.
>>
>> The root cause involves several steps:
>> 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
>> (or relevant level) entry is often populated by the kernel during
>> mmap()
>> with a non-present entry pointing to the architecture's
>> invalid_pte_table
>> On the affected LoongArch system, this address was observed to
>> be 0x90000000031e4000.
>> 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
>> this entry.
>> 3. The generic is_swap_pte() macro checks `!pte_present() && !
>> pte_none()`.
>> The entry (invalid_pte_table address) is not present. Crucially,
>> the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
>> returns false because the invalid_pte_table address is non-zero.
>> Therefore, is_swap_pte() incorrectly returns true.
>> 4. The code enters the `else if (is_swap_pte(...))` block.
>> 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
>> pattern coincidence in the invalid_pte_table address on LoongArch,
>> the embedded generic `is_migration_entry()` check happens to return
>> true (misinterpreting parts of the address as a migration type).
>> 6. This leads to a call to pfn_swap_entry_to_page() with the bogus
>> swap entry derived from the invalid table address.
>> 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
>> unrelated struct page, checks its lock status (unlocked), and hits
>> the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>>
>> The original code's intent in the `else if` block seems aimed at handling
>> potential migration entries, as indicated by the inner
>> `is_pfn_swap_entry()`
>> check. The issue arises because the outer `is_swap_pte()` check
>> incorrectly
>> includes the invalid table pointer case on LoongArch.
>
> This has a big loongarch smell to it.
>
> If we end up passing !pte_present() && !pte_none(), then loongarch must
> be fixed to filter out these weird non-present entries.
>
> is_swap_pte() must not succeed on something that is not an actual swap pte.
>
Hi David,
Thanks a lot for your feedback and insightful analysis!
You're absolutely right: the core issue here stems from how the generic
is_swap_pte() macro interacts with the specific value of
invalid_pte_table (or the equivalent invalid table entries for PMD) on
the LoongArch architecture. I agree that this has a strong LoongArch
characteristic.
On the affected LoongArch system, the address used for invalid_pte_table
(observed as 0x90000000031e4000 in the vmcore) happens to satisfy both
!pte_present() and !pte_none() conditions. This is because:
1. It lacks the _PAGE_PRESENT and _PAGE_PROTNONE bits (correct for an
invalid entry).
2. The generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
returns false, as the address value itself is non-zero and doesn't match
the all-zero (except global bit) pattern.
This causes is_swap_pte() to incorrectly return true for these
non-mapped, initial entries set up during mmap().
The reason my proposed patch changes the condition in
smaps_hugetlb_range() from is_swap_pte(ptent) to
is_hugetlb_entry_migration(pte) is precisely to leverage an
**architecture-level filtering mechanism**, as you suggested LoongArch
should provide.
This works because is_hugetlb_entry_migration() internally calls
`huge_pte_none()`. LoongArch **already provides** an
architecture-specific override for huge_pte_none() (via
`__HAVE_ARCH_HUGE_PTE_NONE`), which is defined as follows in
arch/loongarch/include/asm/pgtable.h:
```c
static inline int huge_pte_none(pte_t pte)
{
	unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;

	/* Check for all zeros (except global) OR if it points to invalid_pte_table */
	return !val || (val == (unsigned long)invalid_pte_table);
}
```
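Plugging the vmcore value into that override shows the filtering
directly. A standalone sketch follows; the _PAGE_GLOBAL position is an
assumption, and any low bit gives the same result because the pointer
is page-aligned:

```c
#include <stdio.h>

#define _PAGE_GLOBAL (1UL << 6)	/* assumed position, as in the earlier sketch */

static const unsigned long invalid_pte_table = 0x90000000031e4000UL;

/* Mirrors the LoongArch override quoted above. */
static int huge_pte_none(unsigned long pteval)
{
	unsigned long val = pteval & ~_PAGE_GLOBAL;

	return !val || (val == invalid_pte_table);
}

int main(void)
{
	/* The mmap()-time entry is the invalid_pte_table pointer itself,
	 * so it is reported as "none" and never reaches pte_to_swp_entry(). */
	printf("huge_pte_none = %d\n", huge_pte_none(invalid_pte_table)); /* 1 */
	return 0;
}
```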
On 23.04.25 10:14, Ming Wang wrote:
>
>
> On 4/23/25 15:07, David Hildenbrand wrote:
>> On 23.04.25 03:03, Ming Wang wrote:
>>> When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
>>> file with MAP_PRIVATE, the kernel might crash inside
>>> pfn_swap_entry_to_page.
>>> This occurs on LoongArch under specific conditions.
>>>
>>> The root cause involves several steps:
>>> 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
>>> (or relevant level) entry is often populated by the kernel during
>>> mmap()
>>> with a non-present entry pointing to the architecture's
>>> invalid_pte_table
>>> On the affected LoongArch system, this address was observed to
>>> be 0x90000000031e4000.
>>> 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
>>> this entry.
>>> 3. The generic is_swap_pte() macro checks `!pte_present() && !
>>> pte_none()`.
>>> The entry (invalid_pte_table address) is not present. Crucially,
>>> the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
>>> returns false because the invalid_pte_table address is non-zero.
>>> Therefore, is_swap_pte() incorrectly returns true.
>>> 4. The code enters the `else if (is_swap_pte(...))` block.
>>> 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
>>> pattern coincidence in the invalid_pte_table address on LoongArch,
>>> the embedded generic `is_migration_entry()` check happens to return
>>> true (misinterpreting parts of the address as a migration type).
>>> 6. This leads to a call to pfn_swap_entry_to_page() with the bogus
>>> swap entry derived from the invalid table address.
>>> 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
>>> unrelated struct page, checks its lock status (unlocked), and hits
>>> the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>>>
>>> The original code's intent in the `else if` block seems aimed at handling
>>> potential migration entries, as indicated by the inner
>>> `is_pfn_swap_entry()`
>>> check. The issue arises because the outer `is_swap_pte()` check
>>> incorrectly
>>> includes the invalid table pointer case on LoongArch.
>>
>> This has a big loongarch smell to it.
>>
>> If we end up passing !pte_present() && !pte_none(), then loongarch must
>> be fixed to filter out these weird non-present entries.
>>
>> is_swap_pte() must not succeed on something that is not an actual swap pte.
>>
>
> Hi David,
>
> Thanks a lot for your feedback and insightful analysis!
>
> You're absolutely right, the core issue here stems from how the generic
> is_swap_pte() macro interacts with the specific value of
> invalid_pte_table (or the equivalent invalid table entries for PMD) on
> the LoongArch architecture. I agree that this has a strong LoongArch
> characteristic.
>
> On the affected LoongArch system, the address used for invalid_pte_table
> (observed as 0x90000000031e4000 in the vmcore) happens to satisfy both
> !pte_present() and !pte_none() conditions. This is because:
> 1. It lacks the _PAGE_PRESENT and _PAGE_PROTNONE bits (correct for an
> invalid entry).
> 2. The generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
> returns false, as the address value itself is non-zero and doesn't match
> the all-zero (except global bit) pattern.
> This causes is_swap_pte() to incorrectly return true for these
> non-mapped, initial entries set up during mmap().
>
> The reason my proposed patch changes the condition in
> smaps_hugetlb_range() from is_swap_pte(ptent) to
> is_hugetlb_entry_migration(pte) is precisely to leverage an
> **architecture-level filtering mechanism**, as you suggested LoongArch
> should provide.
>
> This works because is_hugetlb_entry_migration() internally calls
> `huge_pte_none()`. LoongArch **already provides** an
> architecture-specific override for huge_pte_none() (via
> `__HAVE_ARCH_HUGE_PTE_NONE`), which is defined as follows in
> arch/loongarch/include/asm/pgtable.h:
>
> ```
> static inline int huge_pte_none(pte_t pte)
> {
> unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
> /* Check for all zeros (except global) OR if it points to
> invalid_pte_table */
> return !val || (val == (unsigned long)invalid_pte_table);
> }
> ```
There is now an alternative fix on the list, right?
https://lore.kernel.org/loongarch/20250424083037.2226732-1-wangming01@loongson.cn/T/#u
--
Cheers,
David / dhildenb
On Thu, Apr 24, 2025 at 8:21 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 23.04.25 10:14, Ming Wang wrote:
> >
> >
> > On 4/23/25 15:07, David Hildenbrand wrote:
> >> On 23.04.25 03:03, Ming Wang wrote:
> >>> When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
> >>> file with MAP_PRIVATE, the kernel might crash inside
> >>> pfn_swap_entry_to_page.
> >>> This occurs on LoongArch under specific conditions.
> >>>
> >>> The root cause involves several steps:
> >>> 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
> >>> (or relevant level) entry is often populated by the kernel during
> >>> mmap()
> >>> with a non-present entry pointing to the architecture's
> >>> invalid_pte_table
> >>> On the affected LoongArch system, this address was observed to
> >>> be 0x90000000031e4000.
> >>> 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
> >>> this entry.
> >>> 3. The generic is_swap_pte() macro checks `!pte_present() && !
> >>> pte_none()`.
> >>> The entry (invalid_pte_table address) is not present. Crucially,
> >>> the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
> >>> returns false because the invalid_pte_table address is non-zero.
> >>> Therefore, is_swap_pte() incorrectly returns true.
> >>> 4. The code enters the `else if (is_swap_pte(...))` block.
> >>> 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
> >>> pattern coincidence in the invalid_pte_table address on LoongArch,
> >>> the embedded generic `is_migration_entry()` check happens to return
> >>> true (misinterpreting parts of the address as a migration type).
> >>> 6. This leads to a call to pfn_swap_entry_to_page() with the bogus
> >>> swap entry derived from the invalid table address.
> >>> 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
> >>> unrelated struct page, checks its lock status (unlocked), and hits
> >>> the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
> >>>
> >>> The original code's intent in the `else if` block seems aimed at handling
> >>> potential migration entries, as indicated by the inner
> >>> `is_pfn_swap_entry()`
> >>> check. The issue arises because the outer `is_swap_pte()` check
> >>> incorrectly
> >>> includes the invalid table pointer case on LoongArch.
> >>
> >> This has a big loongarch smell to it.
> >>
> >> If we end up passing !pte_present() && !pte_none(), then loongarch must
> >> be fixed to filter out these weird non-present entries.
> >>
> >> is_swap_pte() must not succeed on something that is not an actual swap pte.
> >>
> >
> > Hi David,
> >
> > Thanks a lot for your feedback and insightful analysis!
> >
> > You're absolutely right, the core issue here stems from how the generic
> > is_swap_pte() macro interacts with the specific value of
> > invalid_pte_table (or the equivalent invalid table entries for PMD) on
> > the LoongArch architecture. I agree that this has a strong LoongArch
> > characteristic.
> >
> > On the affected LoongArch system, the address used for invalid_pte_table
> > (observed as 0x90000000031e4000 in the vmcore) happens to satisfy both
> > !pte_present() and !pte_none() conditions. This is because:
> > 1. It lacks the _PAGE_PRESENT and _PAGE_PROTNONE bits (correct for an
> > invalid entry).
> > 2. The generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
> > returns false, as the address value itself is non-zero and doesn't match
> > the all-zero (except global bit) pattern.
> > This causes is_swap_pte() to incorrectly return true for these
> > non-mapped, initial entries set up during mmap().
> >
> > The reason my proposed patch changes the condition in
> > smaps_hugetlb_range() from is_swap_pte(ptent) to
> > is_hugetlb_entry_migration(pte) is precisely to leverage an
> > **architecture-level filtering mechanism**, as you suggested LoongArch
> > should provide.
> >
> > This works because is_hugetlb_entry_migration() internally calls
> > `huge_pte_none()`. LoongArch **already provides** an
> > architecture-specific override for huge_pte_none() (via
> > `__HAVE_ARCH_HUGE_PTE_NONE`), which is defined as follows in
> > arch/loongarch/include/asm/pgtable.h:
> >
> > ```
> > static inline int huge_pte_none(pte_t pte)
> > {
> > unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
> > /* Check for all zeros (except global) OR if it points to
> > invalid_pte_table */
> > return !val || (val == (unsigned long)invalid_pte_table);
> > }
> > ```
>
> There is now an alternative fix on the list, right?
>
> https://lore.kernel.org/loongarch/20250424083037.2226732-1-wangming01@loongson.cn/T/#u
Yes, that one is better.
Huacai
>
> --
> Cheers,
>
> David / dhildenb
>
On 24.04.25 14:36, Huacai Chen wrote:
> On Thu, Apr 24, 2025 at 8:21 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 23.04.25 10:14, Ming Wang wrote:
>>>
>>>
>>> On 4/23/25 15:07, David Hildenbrand wrote:
>>>> On 23.04.25 03:03, Ming Wang wrote:
>>>>> When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
>>>>> file with MAP_PRIVATE, the kernel might crash inside
>>>>> pfn_swap_entry_to_page.
>>>>> This occurs on LoongArch under specific conditions.
>>>>>
>>>>> The root cause involves several steps:
>>>>> 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
>>>>> (or relevant level) entry is often populated by the kernel during
>>>>> mmap()
>>>>> with a non-present entry pointing to the architecture's
>>>>> invalid_pte_table
>>>>> On the affected LoongArch system, this address was observed to
>>>>> be 0x90000000031e4000.
>>>>> 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
>>>>> this entry.
>>>>> 3. The generic is_swap_pte() macro checks `!pte_present() && !
>>>>> pte_none()`.
>>>>> The entry (invalid_pte_table address) is not present. Crucially,
>>>>> the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
>>>>> returns false because the invalid_pte_table address is non-zero.
>>>>> Therefore, is_swap_pte() incorrectly returns true.
>>>>> 4. The code enters the `else if (is_swap_pte(...))` block.
>>>>> 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
>>>>> pattern coincidence in the invalid_pte_table address on LoongArch,
>>>>> the embedded generic `is_migration_entry()` check happens to return
>>>>> true (misinterpreting parts of the address as a migration type).
>>>>> 6. This leads to a call to pfn_swap_entry_to_page() with the bogus
>>>>> swap entry derived from the invalid table address.
>>>>> 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
>>>>> unrelated struct page, checks its lock status (unlocked), and hits
>>>>> the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>>>>>
>>>>> The original code's intent in the `else if` block seems aimed at handling
>>>>> potential migration entries, as indicated by the inner
>>>>> `is_pfn_swap_entry()`
>>>>> check. The issue arises because the outer `is_swap_pte()` check
>>>>> incorrectly
>>>>> includes the invalid table pointer case on LoongArch.
>>>>
>>>> This has a big loongarch smell to it.
>>>>
>>>> If we end up passing !pte_present() && !pte_none(), then loongarch must
>>>> be fixed to filter out these weird non-present entries.
>>>>
>>>> is_swap_pte() must not succeed on something that is not an actual swap pte.
>>>>
>>>
>>> Hi David,
>>>
>>> Thanks a lot for your feedback and insightful analysis!
>>>
>>> You're absolutely right, the core issue here stems from how the generic
>>> is_swap_pte() macro interacts with the specific value of
>>> invalid_pte_table (or the equivalent invalid table entries for PMD) on
>>> the LoongArch architecture. I agree that this has a strong LoongArch
>>> characteristic.
>>>
>>> On the affected LoongArch system, the address used for invalid_pte_table
>>> (observed as 0x90000000031e4000 in the vmcore) happens to satisfy both
>>> !pte_present() and !pte_none() conditions. This is because:
>>> 1. It lacks the _PAGE_PRESENT and _PAGE_PROTNONE bits (correct for an
>>> invalid entry).
>>> 2. The generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
>>> returns false, as the address value itself is non-zero and doesn't match
>>> the all-zero (except global bit) pattern.
>>> This causes is_swap_pte() to incorrectly return true for these
>>> non-mapped, initial entries set up during mmap().
>>>
>>> The reason my proposed patch changes the condition in
>>> smaps_hugetlb_range() from is_swap_pte(ptent) to
>>> is_hugetlb_entry_migration(pte) is precisely to leverage an
>>> **architecture-level filtering mechanism**, as you suggested LoongArch
>>> should provide.
>>>
>>> This works because is_hugetlb_entry_migration() internally calls
>>> `huge_pte_none()`. LoongArch **already provides** an
>>> architecture-specific override for huge_pte_none() (via
>>> `__HAVE_ARCH_HUGE_PTE_NONE`), which is defined as follows in
>>> arch/loongarch/include/asm/pgtable.h:
>>>
>>> ```
>>> static inline int huge_pte_none(pte_t pte)
>>> {
>>> unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
>>> /* Check for all zeros (except global) OR if it points to
>>> invalid_pte_table */
>>> return !val || (val == (unsigned long)invalid_pte_table);
>>> }
>>> ```
>>
>> There is now an alternative fix on the list, right?
>>
>> https://lore.kernel.org/loongarch/20250424083037.2226732-1-wangming01@loongson.cn/T/#u
> Yes, that one is better.
We do now have page table walkers that walk hugetlb tables without any
hugetlb specifics.
Examples are GUP and folio_walk_start().
I assume these will be working as expected, because they would be
checking pmd_none() / pmd_present() natively, correct?
--
Cheers,
David / dhildenb
Hi David, Huacai,
On 4/24/25 20:39, David Hildenbrand wrote:
> On 24.04.25 14:36, Huacai Chen wrote:
>> On Thu, Apr 24, 2025 at 8:21 PM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 23.04.25 10:14, Ming Wang wrote:
>>>>
>>>>
>>>> On 4/23/25 15:07, David Hildenbrand wrote:
>>>>> On 23.04.25 03:03, Ming Wang wrote:
>>>>>> When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
>>>>>> file with MAP_PRIVATE, the kernel might crash inside
>>>>>> pfn_swap_entry_to_page.
>>>>>> This occurs on LoongArch under specific conditions.
>>>>>>
>>>>>> The root cause involves several steps:
>>>>>> 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
>>>>>> (or relevant level) entry is often populated by the kernel during
>>>>>> mmap()
>>>>>> with a non-present entry pointing to the architecture's
>>>>>> invalid_pte_table
>>>>>> On the affected LoongArch system, this address was observed to
>>>>>> be 0x90000000031e4000.
>>>>>> 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
>>>>>> this entry.
>>>>>> 3. The generic is_swap_pte() macro checks `!pte_present() && !
>>>>>> pte_none()`.
>>>>>> The entry (invalid_pte_table address) is not present. Crucially,
>>>>>> the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
>>>>>> returns false because the invalid_pte_table address is non-zero.
>>>>>> Therefore, is_swap_pte() incorrectly returns true.
>>>>>> 4. The code enters the `else if (is_swap_pte(...))` block.
>>>>>> 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
>>>>>> pattern coincidence in the invalid_pte_table address on LoongArch,
>>>>>> the embedded generic `is_migration_entry()` check happens to return
>>>>>> true (misinterpreting parts of the address as a migration type).
>>>>>> 6. This leads to a call to pfn_swap_entry_to_page() with the bogus
>>>>>> swap entry derived from the invalid table address.
>>>>>> 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
>>>>>> unrelated struct page, checks its lock status (unlocked), and hits
>>>>>> the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>>>>>>
>>>>>> The original code's intent in the `else if` block seems aimed at handling
>>>>>> potential migration entries, as indicated by the inner
>>>>>> `is_pfn_swap_entry()`
>>>>>> check. The issue arises because the outer `is_swap_pte()` check
>>>>>> incorrectly
>>>>>> includes the invalid table pointer case on LoongArch.
>>>>>
>>>>> This has a big loongarch smell to it.
>>>>>
>>>>> If we end up passing !pte_present() && !pte_none(), then loongarch must
>>>>> be fixed to filter out these weird non-present entries.
>>>>>
>>>>> is_swap_pte() must not succeed on something that is not an actual swap pte.
>>>>>
>>>>
>>>> Hi David,
>>>>
>>>> Thanks a lot for your feedback and insightful analysis!
>>>>
>>>> You're absolutely right, the core issue here stems from how the generic
>>>> is_swap_pte() macro interacts with the specific value of
>>>> invalid_pte_table (or the equivalent invalid table entries for PMD) on
>>>> the LoongArch architecture. I agree that this has a strong LoongArch
>>>> characteristic.
>>>>
>>>> On the affected LoongArch system, the address used for invalid_pte_table
>>>> (observed as 0x90000000031e4000 in the vmcore) happens to satisfy both
>>>> !pte_present() and !pte_none() conditions. This is because:
>>>> 1. It lacks the _PAGE_PRESENT and _PAGE_PROTNONE bits (correct for an
>>>> invalid entry).
>>>> 2. The generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
>>>> returns false, as the address value itself is non-zero and doesn't match
>>>> the all-zero (except global bit) pattern.
>>>> This causes is_swap_pte() to incorrectly return true for these
>>>> non-mapped, initial entries set up during mmap().
>>>>
>>>> The reason my proposed patch changes the condition in
>>>> smaps_hugetlb_range() from is_swap_pte(ptent) to
>>>> is_hugetlb_entry_migration(pte) is precisely to leverage an
>>>> **architecture-level filtering mechanism**, as you suggested LoongArch
>>>> should provide.
>>>>
>>>> This works because is_hugetlb_entry_migration() internally calls
>>>> `huge_pte_none()`. LoongArch **already provides** an
>>>> architecture-specific override for huge_pte_none() (via
>>>> `__HAVE_ARCH_HUGE_PTE_NONE`), which is defined as follows in
>>>> arch/loongarch/include/asm/pgtable.h:
>>>>
>>>> ```
>>>> static inline int huge_pte_none(pte_t pte)
>>>> {
>>>> unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
>>>> /* Check for all zeros (except global) OR if it points to
>>>> invalid_pte_table */
>>>> return !val || (val == (unsigned long)invalid_pte_table);
>>>> }
>>>> ```
>>>
>>> There is now an alternative fix on the list, right?
>>>
>>> https://lore.kernel.org/loongarch/20250424083037.2226732-1-wangming01@loongson.cn/T/#u
>> Yes, that one is better.
>
> We do now have page table walkers that walk hugetlb tables without any hugetlb specifics.
>
> Examples are GUP and folio_walk_start().
>
> I assume these will be working as expected, because they would be checking pmd_none() / pmd_present() natively, correct?
>
Thanks for the clarification, David. Your point about generic page table walkers like GUP and folio_walk_start()
relying on native pmd_none()/pmd_present() checks makes perfect sense.
Therefore, I'll withdraw the patch modifying smaps_hugetlb_range(). We should proceed with the alternative fix
at the LoongArch architecture level.
Thanks again for guiding this towards the correct architectural solution!
Best regards,
Ming
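The reason the native checks are safe on LoongArch is that its
MIPS-derived page-table helpers compare directly against the invalid
tables. A paraphrased sketch of the expected definition follows (the
exact upstream form may differ):

```c
/* Paraphrased: an empty LoongArch PMD points at invalid_pte_table
 * rather than being all-zero, and pmd_none() tests exactly that. */
static inline int pmd_none(pmd_t pmd)
{
	return pmd_val(pmd) == (unsigned long)invalid_pte_table;
}
```

A walker that tests pmd_none() first therefore skips the mmap()-time
entry before any swap-entry interpretation can happen.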
Hi, Wangming,
On Wed, Apr 23, 2025 at 9:04 AM Ming Wang <wangming01@loongson.cn> wrote:
>
> When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
> file with MAP_PRIVATE, the kernel might crash inside pfn_swap_entry_to_page.
> This occurs on LoongArch under specific conditions.
>
> The root cause involves several steps:
> 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
> (or relevant level) entry is often populated by the kernel during mmap()
> with a non-present entry pointing to the architecture's invalid_pte_table
> On the affected LoongArch system, this address was observed to
> be 0x90000000031e4000.
> 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
> this entry.
> 3. The generic is_swap_pte() macro checks `!pte_present() && !pte_none()`.
> The entry (invalid_pte_table address) is not present. Crucially,
> the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
> returns false because the invalid_pte_table address is non-zero.
> Therefore, is_swap_pte() incorrectly returns true.
> 4. The code enters the `else if (is_swap_pte(...))` block.
> 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
> pattern coincidence in the invalid_pte_table address on LoongArch,
> the embedded generic `is_migration_entry()` check happens to return
> true (misinterpreting parts of the address as a migration type).
> 6. This leads to a call to pfn_swap_entry_to_page() with the bogus
> swap entry derived from the invalid table address.
> 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
> unrelated struct page, checks its lock status (unlocked), and hits
> the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>
> The original code's intent in the `else if` block seems aimed at handling
> potential migration entries, as indicated by the inner `is_pfn_swap_entry()`
> check. The issue arises because the outer `is_swap_pte()` check incorrectly
> includes the invalid table pointer case on LoongArch.
>
> This patch fixes the issue by changing the condition in
> smaps_hugetlb_range() from the broad `is_swap_pte()` to the specific
> `is_hugetlb_entry_migration()`.
>
> The `is_hugetlb_entry_migration()` helper function correctly handles this
> by first checking `huge_pte_none()`. Architectures like LoongArch can
> provide an override for `huge_pte_none()` that specifically recognizes
> the `invalid_pte_table` address as a "none" state for HugeTLB entries.
> This ensures `is_hugetlb_entry_migration()` returns false for the invalid
> entry, preventing the code from entering the faulty block.
>
> This change makes the code reflect the likely original intent (handling
> migration) more accurately and leverages architecture-specific helpers
> (`huge_pte_none`) to correctly interpret special PTE/PMD values in the
> HugeTLB context, fixing the crash on LoongArch without altering the
> generic is_swap_pte() behavior.
>
> Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
> Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> Signed-off-by: Ming Wang <wangming01@loongson.cn>
> ---
> fs/proc/task_mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 994cde10e3f4..95a0093ae87c 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1027,7 +1027,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
> if (pte_present(ptent)) {
> folio = page_folio(pte_page(ptent));
> present = true;
> - } else if (is_swap_pte(ptent)) {
> + } else if (is_hugetlb_entry_migration(ptent)) {
Other functions in this file, such as pagemap_hugetlb_category(), may
need similar modifications.
Huacai
> swp_entry_t swpent = pte_to_swp_entry(ptent);
>
> if (is_pfn_swap_entry(swpent))
> --
> 2.43.0
>
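For context on Huacai's remark above: pagemap_hugetlb_category() follows
the same present/swap split as smaps_hugetlb_range(), so it is exposed
to the same misclassification. A paraphrased sketch of the pattern (not
the verbatim upstream code):

```c
/* Paraphrased shape of fs/proc/task_mmu.c:pagemap_hugetlb_category() */
static unsigned long pagemap_hugetlb_category(pte_t pte)
{
	unsigned long categories = PAGE_IS_HUGE;

	if (pte_present(pte)) {
		categories |= PAGE_IS_PRESENT;
		/* ... */
	} else if (is_swap_pte(pte)) {	/* same over-broad check as smaps */
		categories |= PAGE_IS_SWAPPED;
		/* ... */
	}
	return categories;
}
```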