[v3 07/24] mm: thp: retry on split failure in change_pmd_range()

Usama Arif posted 24 patches 6 days, 16 hours ago
Posted by Usama Arif 6 days, 16 hours ago
change_pmd_range() splits a huge PMD when mprotect() targets a sub-PMD
range or when VMA flags require per-PTE protection bits that can't be
represented at PMD granularity.

If pte_alloc_one() fails inside __split_huge_pmd(), the huge PMD remains
intact. Without this change, change_pte_range() would return -EAGAIN
because pte_offset_map_lock() returns NULL for a huge PMD, sending the
code back to the 'again' label to retry the split without ever calling
cond_resched().

Now that __split_huge_pmd() returns an error code, handle it explicitly:
yield the CPU with cond_resched() and retry via goto again, giving other
tasks a chance to free memory.

Returning the error all the way up to change_protection_range() would
not work: part of the range would already carry the new protections
while the rest would not, with no easy way to roll back the entries
already modified (or the splits already done). __split_huge_pmd() only
requires an order-0 allocation, so failure is extremely unlikely.

Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
 mm/mprotect.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 110d47a36d4bb..e39e96963da8b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -477,7 +477,16 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		if (pmd_is_huge(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
-				__split_huge_pmd(vma, pmd, addr, false);
+				ret = __split_huge_pmd(vma, pmd, addr, false);
+				if (ret) {
+					/*
+					 * Yield and retry. Other tasks
+					 * may free memory while we
+					 * reschedule.
+					 */
+					cond_resched();
+					goto again;
+				}
 				/*
 				 * For file-backed, the pmd could have been
 				 * cleared; make sure pmd populated if
-- 
2.52.0
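
The yield-and-retry pattern described in the commit message can be modeled
in userspace. The following is a minimal sketch, not kernel code: struct
fake_pmd, struct fake_allocator, fake_split_huge_pmd() and
split_with_retry() are hypothetical names invented for illustration, and
sched_yield() stands in for cond_resched().

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical stand-in for a PMD: either still huge or already split. */
struct fake_pmd {
	int is_huge;
};

/* Hypothetical allocator state: the next failures_left allocations fail. */
struct fake_allocator {
	int failures_left;
};

/*
 * Model of a split that needs one small allocation. On failure the
 * mapping is left untouched and -ENOMEM is returned.
 */
static int fake_split_huge_pmd(struct fake_pmd *pmd, struct fake_allocator *a)
{
	if (a->failures_left > 0) {
		a->failures_left--;
		return -ENOMEM;
	}
	pmd->is_huge = 0;
	return 0;
}

/*
 * Retry the split until it succeeds, yielding the CPU between attempts.
 * Returns the number of attempts made.
 */
static int split_with_retry(struct fake_pmd *pmd, struct fake_allocator *a)
{
	int attempts = 0;

again:
	attempts++;
	if (pmd->is_huge && fake_split_huge_pmd(pmd, a)) {
		sched_yield();	/* userspace stand-in for cond_resched() */
		goto again;
	}
	assert(!pmd->is_huge);
	return attempts;
}
```

As in the patch, the loop in this model has no upper bound: it relies on
the allocation eventually succeeding.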

Re: [v3 07/24] mm: thp: retry on split failure in change_pmd_range()
Posted by Kiryl Shutsemau 3 days, 4 hours ago
On Thu, Mar 26, 2026 at 07:08:49PM -0700, Usama Arif wrote:
> change_pmd_range() splits a huge PMD when mprotect() targets a sub-PMD
> range or when VMA flags require per-PTE protection bits that can't be
> represented at PMD granularity.
> 
> If pte_alloc_one() fails inside __split_huge_pmd(), the huge PMD remains
> intact. Without this change, change_pte_range() would return -EAGAIN
> because pte_offset_map_lock() returns NULL for a huge PMD, sending the
> code back to the 'again' label to retry the split—without ever calling
> cond_resched().
> 
> Now that __split_huge_pmd() returns an error code, handle it explicitly:
> yield the CPU with cond_resched() and retry via goto again, giving other
> tasks a chance to free memory.
> 
> Trying to return an error all the way to change_protection_range would
> not work as it would leave a memory range with new protections, and
> others unchanged, with no easy way to roll back the already modified
> entries (and previous splits). __split_huge_pmd only requires an
> order-0 allocation and is extremely unlikely to fail.

I think this is the wrong approach. We need to split page tables
upfront, before descending into change_protection() and making
irreversible changes.

Conceptually, it should be similar to vma_adjust_trans_huge() in the VMA
split/merge paths.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
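
One way to read the review comment above is as a two-phase design: do
every operation that can fail first, then make the changes that cannot be
undone. The following userspace model is a sketch under that assumption,
not kernel code and not a proposed implementation. All names in it
(struct fake_entry, struct fake_allocator, fake_split(), presplit_range(),
apply_prot(), change_range_two_phase()) are hypothetical, and for
simplicity it splits every huge entry in the range rather than only those
that need it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one PMD-sized mapping and its protection. */
struct fake_entry {
	int is_huge;
	int prot;
};

/* Hypothetical allocator: budget allocations succeed, then all fail. */
struct fake_allocator {
	int budget;
};

/* Splitting needs one allocation; on failure the entry is untouched. */
static int fake_split(struct fake_entry *e, struct fake_allocator *a)
{
	if (a->budget == 0)
		return -ENOMEM;
	a->budget--;
	e->is_huge = 0;
	return 0;
}

/* Phase 1: may fail, but never touches protections. */
static int presplit_range(struct fake_entry *e, size_t n,
			  struct fake_allocator *a)
{
	for (size_t i = 0; i < n; i++) {
		if (e[i].is_huge) {
			int ret = fake_split(&e[i], a);

			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Phase 2: no allocation, so it cannot fail part-way through. */
static void apply_prot(struct fake_entry *e, size_t n, int prot)
{
	for (size_t i = 0; i < n; i++) {
		assert(!e[i].is_huge);
		e[i].prot = prot;
	}
}

/* Either every entry gets the new protection or none does. */
static int change_range_two_phase(struct fake_entry *e, size_t n,
				  struct fake_allocator *a, int prot)
{
	int ret = presplit_range(e, n, a);

	if (ret)
		return ret;
	apply_prot(e, n, prot);
	return 0;
}
```

In this model a failed call may leave some entries split, but no
protection has changed, so the error can be returned to the caller
without a rollback.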