Add a vmstat counter to track PTE allocation failures during PMD split.
This enables monitoring of split failures due to memory pressure after
the lazy PTE page table allocation change.

The counter is incremented in three places:
- __split_huge_pmd(): Main entry point for splitting a PMD
- try_to_unmap_one(): When reclaim needs to split a PMD-mapped THP
- try_to_migrate_one(): When migration needs to split a PMD-mapped THP

Visible via /proc/vmstat as thp_split_pmd_pte_alloc_failed.

Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
include/linux/vm_event_item.h | 1 +
mm/huge_memory.c | 1 +
mm/rmap.c | 3 +++
mm/vmstat.c | 1 +
4 files changed, 6 insertions(+)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 22a139f82d75f..827c9a8c251de 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -111,6 +111,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
THP_DEFERRED_SPLIT_PAGE,
THP_UNDERUSED_SPLIT_PAGE,
THP_SPLIT_PMD,
+ THP_SPLIT_PMD_PTE_ALLOC_FAILED,
THP_SCAN_EXCEED_NONE_PTE,
THP_SCAN_EXCEED_SWAP_PTE,
THP_SCAN_EXCEED_SHARED_PTE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4c9a8d89fc8aa..8d7c9f67f8a1d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3332,6 +3332,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
if (vma_is_anonymous(vma) && !arch_needs_pgtable_deposit()) {
pgtable = pte_alloc_one(vma->vm_mm);
if (!pgtable) {
+ count_vm_event(THP_SPLIT_PMD_PTE_ALLOC_FAILED);
mmu_notifier_invalidate_range_end(&range);
return;
}
diff --git a/mm/rmap.c b/mm/rmap.c
index c6ff23fc12944..5c4afedb29d5a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2070,8 +2070,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
pgtable_t pgtable = prealloc_pte;
prealloc_pte = NULL;
+
if (!arch_needs_pgtable_deposit() && !pgtable &&
vma_is_anonymous(vma)) {
+ count_vm_event(THP_SPLIT_PMD_PTE_ALLOC_FAILED);
page_vma_mapped_walk_done(&pvmw);
ret = false;
break;
@@ -2474,6 +2476,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
prealloc_pte = NULL;
if (!arch_needs_pgtable_deposit() && !pgtable &&
vma_is_anonymous(vma)) {
+ count_vm_event(THP_SPLIT_PMD_PTE_ALLOC_FAILED);
page_vma_mapped_walk_done(&pvmw);
ret = false;
break;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 99270713e0c13..473edfa624a41 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1408,6 +1408,7 @@ const char * const vmstat_text[] = {
[I(THP_DEFERRED_SPLIT_PAGE)] = "thp_deferred_split_page",
[I(THP_UNDERUSED_SPLIT_PAGE)] = "thp_underused_split_page",
[I(THP_SPLIT_PMD)] = "thp_split_pmd",
+ [I(THP_SPLIT_PMD_PTE_ALLOC_FAILED)] = "thp_split_pmd_pte_alloc_failed",
[I(THP_SCAN_EXCEED_NONE_PTE)] = "thp_scan_exceed_none_pte",
[I(THP_SCAN_EXCEED_SWAP_PTE)] = "thp_scan_exceed_swap_pte",
[I(THP_SCAN_EXCEED_SHARED_PTE)] = "thp_scan_exceed_share_pte",
--
2.47.3
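
For checking the counter from a test program, here is a minimal
user-space sketch. It is not part of the patch. It assumes the counter
keeps the name thp_split_pmd_pte_alloc_failed as posted, and the helper
names vmstat_lookup() and vmstat_read() are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" line in a /proc/vmstat-style stream.
 * Returns 0 and stores the value in *out on success, -1 if the
 * counter is absent (for example, on a kernel without this patch).
 */
int vmstat_lookup(FILE *f, const char *name, unsigned long long *out)
{
	char key[128];
	unsigned long long val;

	rewind(f);
	while (fscanf(f, "%127s %llu", key, &val) == 2) {
		if (strcmp(key, name) == 0) {
			*out = val;
			return 0;
		}
	}
	return -1;
}

/* Convenience wrapper that opens the real /proc/vmstat. */
int vmstat_read(const char *name, unsigned long long *out)
{
	FILE *f = fopen("/proc/vmstat", "r");
	int ret;

	if (!f)
		return -1;
	ret = vmstat_lookup(f, name, out);
	fclose(f);
	return ret;
}
```

A test could call vmstat_read("thp_split_pmd_pte_alloc_failed", &v)
before and after a workload and compare the two values. A return of -1
most likely means the running kernel does not export the counter.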
On 2/11/26 13:49, Usama Arif wrote:
> Add a vmstat counter to track PTE allocation failures during PMD split.
> This enables monitoring of split failures due to memory pressure after
> the lazy PTE page table allocation change.
>
> The counter is incremented in three places:
> - __split_huge_pmd(): Main entry point for splitting a PMD
> - try_to_unmap_one(): When reclaim needs to split a PMD-mapped THP
> - try_to_migrate_one(): When migration needs to split a PMD-mapped THP
>
> Visible via /proc/vmstat as thp_split_pmd_pte_alloc_failed.
>
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> include/linux/vm_event_item.h | 1 +
> mm/huge_memory.c | 1 +
> mm/rmap.c | 3 +++
> mm/vmstat.c | 1 +
> 4 files changed, 6 insertions(+)
>
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 22a139f82d75f..827c9a8c251de 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -111,6 +111,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> THP_DEFERRED_SPLIT_PAGE,
> THP_UNDERUSED_SPLIT_PAGE,
> THP_SPLIT_PMD,
> + THP_SPLIT_PMD_PTE_ALLOC_FAILED,
Probably sufficient to call this THP_SPLIT_PMD_FAILED and count any
(future) failures (if any) as well.

It's a shame that we called a remapping a "split" and keep causing
confusion.
--
Cheers,
David
On 11/02/2026 13:27, David Hildenbrand (Arm) wrote:
> On 2/11/26 13:49, Usama Arif wrote:
>> Add a vmstat counter to track PTE allocation failures during PMD split.
>> This enables monitoring of split failures due to memory pressure after
>> the lazy PTE page table allocation change.
>>
>> The counter is incremented in three places:
>> - __split_huge_pmd(): Main entry point for splitting a PMD
>> - try_to_unmap_one(): When reclaim needs to split a PMD-mapped THP
>> - try_to_migrate_one(): When migration needs to split a PMD-mapped THP
>>
>> Visible via /proc/vmstat as thp_split_pmd_pte_alloc_failed.
>>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>> include/linux/vm_event_item.h | 1 +
>> mm/huge_memory.c | 1 +
>> mm/rmap.c | 3 +++
>> mm/vmstat.c | 1 +
>> 4 files changed, 6 insertions(+)
>>
>> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>> index 22a139f82d75f..827c9a8c251de 100644
>> --- a/include/linux/vm_event_item.h
>> +++ b/include/linux/vm_event_item.h
>> @@ -111,6 +111,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>> THP_DEFERRED_SPLIT_PAGE,
>> THP_UNDERUSED_SPLIT_PAGE,
>> THP_SPLIT_PMD,
>> + THP_SPLIT_PMD_PTE_ALLOC_FAILED,
>
> Probably sufficient to call this THP_SPLIT_PMD_FAILED and count any (future) failures (if any) as well.
>
Makes sense. This was just a patch I was using for testing and I wanted to share it.
It was always 0 as I couldn't get the split to fail :) But I can rename it to THP_SPLIT_PMD_FAILED
as suggested, and we can use it for future split failures (hopefully none).

> It's a shame that we called a remapping a "split" and keep causing confusion.
>
On 2/11/26 14:31, Usama Arif wrote:
>
>
> On 11/02/2026 13:27, David Hildenbrand (Arm) wrote:
>> On 2/11/26 13:49, Usama Arif wrote:
>>> Add a vmstat counter to track PTE allocation failures during PMD split.
>>> This enables monitoring of split failures due to memory pressure after
>>> the lazy PTE page table allocation change.
>>>
>>> The counter is incremented in three places:
>>> - __split_huge_pmd(): Main entry point for splitting a PMD
>>> - try_to_unmap_one(): When reclaim needs to split a PMD-mapped THP
>>> - try_to_migrate_one(): When migration needs to split a PMD-mapped THP
>>>
>>> Visible via /proc/vmstat as thp_split_pmd_pte_alloc_failed.
>>>
>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>> ---
>>> include/linux/vm_event_item.h | 1 +
>>> mm/huge_memory.c | 1 +
>>> mm/rmap.c | 3 +++
>>> mm/vmstat.c | 1 +
>>> 4 files changed, 6 insertions(+)
>>>
>>> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>>> index 22a139f82d75f..827c9a8c251de 100644
>>> --- a/include/linux/vm_event_item.h
>>> +++ b/include/linux/vm_event_item.h
>>> @@ -111,6 +111,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>>> THP_DEFERRED_SPLIT_PAGE,
>>> THP_UNDERUSED_SPLIT_PAGE,
>>> THP_SPLIT_PMD,
>>> + THP_SPLIT_PMD_PTE_ALLOC_FAILED,
>>
>> Probably sufficient to call this THP_SPLIT_PMD_FAILED and count any (future) failures (if any) as well.
>>
>
> Makes sense. This was just a patch I was using for testing and I wanted to share it.
> It was always 0 as I couldn't get the split to fail :) But I can rename it to THP_SPLIT_PMD_FAILED
> as suggested, and we can use it for future split failures (hopefully none).

Btw, you can use the allocation fault injection framework to find weird
issues, if you haven't heard of that yet.
--
Cheers,
David
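
One way to try that suggestion is sketched below. It assumes the
framework meant here is fail_page_alloc (the thread does not say), a
kernel built with CONFIG_FAIL_PAGE_ALLOC and
CONFIG_FAULT_INJECTION_DEBUG_FS, and debugfs mounted at
/sys/kernel/debug. The knob names are the ones described in
Documentation/fault-injection/fault-injection.rst. The helper names
set_knob() and configure_fail_page_alloc() and the chosen values are
illustrative only.

```c
#include <stdio.h>

/* Write one value to <dir>/<knob>. Returns 0 on success, -1 on error. */
int set_knob(const char *dir, const char *knob, const char *value)
{
	char path[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, knob);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* Write errors may only show up when the stream is flushed. */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Configure page allocation failures for opted-in tasks only.
 * Page table pages are order-0 and allocated with a sleepable GFP
 * mask, so min-order is lowered to 0 and ignore-gfp-wait is turned
 * off (assumption: the defaults would otherwise skip them).
 */
int configure_fail_page_alloc(const char *dir)
{
	static const char *const knobs[][2] = {
		{ "task-filter", "Y" },      /* only tasks with make-it-fail */
		{ "times", "-1" },           /* no limit on failure count */
		{ "interval", "1" },
		{ "min-order", "0" },
		{ "ignore-gfp-wait", "N" },
		{ "probability", "10" },     /* percent; set last to arm */
	};
	size_t i;

	for (i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++)
		if (set_knob(dir, knobs[i][0], knobs[i][1]) != 0)
			return -1;
	return 0;
}
```

As root, a test would call
configure_fail_page_alloc("/sys/kernel/debug/fail_page_alloc") and then
write 1 to /proc/self/make-it-fail before exercising the split path.
With task-filter set, only allocations made in that task's context are
candidates for failure, which keeps the rest of the system usable.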
On 11/02/2026 13:38, David Hildenbrand (Arm) wrote:
> On 2/11/26 14:31, Usama Arif wrote:
>>
>>
>> On 11/02/2026 13:27, David Hildenbrand (Arm) wrote:
>>> On 2/11/26 13:49, Usama Arif wrote:
>>>> Add a vmstat counter to track PTE allocation failures during PMD split.
>>>> This enables monitoring of split failures due to memory pressure after
>>>> the lazy PTE page table allocation change.
>>>>
>>>> The counter is incremented in three places:
>>>> - __split_huge_pmd(): Main entry point for splitting a PMD
>>>> - try_to_unmap_one(): When reclaim needs to split a PMD-mapped THP
>>>> - try_to_migrate_one(): When migration needs to split a PMD-mapped THP
>>>>
>>>> Visible via /proc/vmstat as thp_split_pmd_pte_alloc_failed.
>>>>
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>> include/linux/vm_event_item.h | 1 +
>>>> mm/huge_memory.c | 1 +
>>>> mm/rmap.c | 3 +++
>>>> mm/vmstat.c | 1 +
>>>> 4 files changed, 6 insertions(+)
>>>>
>>>> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>>>> index 22a139f82d75f..827c9a8c251de 100644
>>>> --- a/include/linux/vm_event_item.h
>>>> +++ b/include/linux/vm_event_item.h
>>>> @@ -111,6 +111,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>>>> THP_DEFERRED_SPLIT_PAGE,
>>>> THP_UNDERUSED_SPLIT_PAGE,
>>>> THP_SPLIT_PMD,
>>>> + THP_SPLIT_PMD_PTE_ALLOC_FAILED,
>>>
>>> Probably sufficient to call this THP_SPLIT_PMD_FAILED and count any (future) failures (if any) as well.
>>>
>>
>> Makes sense. This was just a patch I was using for testing and I wanted to share it.
>> It was always 0 as I couldn't get the split to fail :) But I can rename it to THP_SPLIT_PMD_FAILED
>> as suggested, and we can use it for future split failures (hopefully none).
>
> Btw, you can use the allocation fault injection framework to find weird issues, if you haven't heard of that yet.
>
This looks very interesting, thanks! Let me have a look.

On 2/11/26 14:31, Usama Arif wrote:
>
>
> On 11/02/2026 13:27, David Hildenbrand (Arm) wrote:
>> On 2/11/26 13:49, Usama Arif wrote:
>>> Add a vmstat counter to track PTE allocation failures during PMD split.
>>> This enables monitoring of split failures due to memory pressure after
>>> the lazy PTE page table allocation change.
>>>
>>> The counter is incremented in three places:
>>> - __split_huge_pmd(): Main entry point for splitting a PMD
>>> - try_to_unmap_one(): When reclaim needs to split a PMD-mapped THP
>>> - try_to_migrate_one(): When migration needs to split a PMD-mapped THP
>>>
>>> Visible via /proc/vmstat as thp_split_pmd_pte_alloc_failed.
>>>
>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>> ---
>>> include/linux/vm_event_item.h | 1 +
>>> mm/huge_memory.c | 1 +
>>> mm/rmap.c | 3 +++
>>> mm/vmstat.c | 1 +
>>> 4 files changed, 6 insertions(+)
>>>
>>> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>>> index 22a139f82d75f..827c9a8c251de 100644
>>> --- a/include/linux/vm_event_item.h
>>> +++ b/include/linux/vm_event_item.h
>>> @@ -111,6 +111,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>>> THP_DEFERRED_SPLIT_PAGE,
>>> THP_UNDERUSED_SPLIT_PAGE,
>>> THP_SPLIT_PMD,
>>> + THP_SPLIT_PMD_PTE_ALLOC_FAILED,
>>
>> Probably sufficient to call this THP_SPLIT_PMD_FAILED and count any (future) failures (if any) as well.
>>
>
> Makes sense. This was just a patch I was using for testing and I wanted to share it.
> It was always 0 as I couldn't get the split to fail :) But I can rename it to THP_SPLIT_PMD_FAILED
> as suggested, and we can use it for future split failures (hopefully none).

I guess it might be reasonable to have because I am sure it will fail at
some point and maybe provoke weird issues we didn't think of. In that
case, having an indication that splitting failed at some point might be
reasonable.
--
Cheers,
David
On 11/02/2026 13:36, David Hildenbrand (Arm) wrote:
> On 2/11/26 14:31, Usama Arif wrote:
>>
>>
>> On 11/02/2026 13:27, David Hildenbrand (Arm) wrote:
>>> On 2/11/26 13:49, Usama Arif wrote:
>>>> Add a vmstat counter to track PTE allocation failures during PMD split.
>>>> This enables monitoring of split failures due to memory pressure after
>>>> the lazy PTE page table allocation change.
>>>>
>>>> The counter is incremented in three places:
>>>> - __split_huge_pmd(): Main entry point for splitting a PMD
>>>> - try_to_unmap_one(): When reclaim needs to split a PMD-mapped THP
>>>> - try_to_migrate_one(): When migration needs to split a PMD-mapped THP
>>>>
>>>> Visible via /proc/vmstat as thp_split_pmd_pte_alloc_failed.
>>>>
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>> include/linux/vm_event_item.h | 1 +
>>>> mm/huge_memory.c | 1 +
>>>> mm/rmap.c | 3 +++
>>>> mm/vmstat.c | 1 +
>>>> 4 files changed, 6 insertions(+)
>>>>
>>>> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>>>> index 22a139f82d75f..827c9a8c251de 100644
>>>> --- a/include/linux/vm_event_item.h
>>>> +++ b/include/linux/vm_event_item.h
>>>> @@ -111,6 +111,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>>>> THP_DEFERRED_SPLIT_PAGE,
>>>> THP_UNDERUSED_SPLIT_PAGE,
>>>> THP_SPLIT_PMD,
>>>> + THP_SPLIT_PMD_PTE_ALLOC_FAILED,
>>>
>>> Probably sufficient to call this THP_SPLIT_PMD_FAILED and count any (future) failures (if any) as well.
>>>
>>
>> Makes sense. This was just a patch I was using for testing and I wanted to share it.
>> It was always 0 as I couldn't get the split to fail :) But I can rename it to THP_SPLIT_PMD_FAILED
>> as suggested, and we can use it for future split failures (hopefully none).
>
> I guess it might be reasonable to have because I am sure it will fail at some point and maybe provoke weird issues we didn't think of. In that case, having an indication that splitting failed at some point might be reasonable.
>
ack