mm/hugetlb: restore reservation on error in hugetlb_mfill_atomic_pte() resubmission path

[PATCH v2] mm/hugetlb: restore reservation on error in hugetlb_mfill_atomic_pte() resubmission path

Posted by David Carlier 5 days, 1 hour ago

When the resubmission path in hugetlb_mfill_atomic_pte() allocates a new
hugetlb folio via alloc_hugetlb_folio(), a VMA reservation is consumed.
If copy_user_large_folio() subsequently fails (e.g. -EHWPOISON when the
source page is hwpoisoned), folio_put() restores the global hugetlb pool
count through free_huge_folio(), but the per-VMA reservation map entry
is left marked consumed.

User-visible effect: on a UFFDIO_COPY into a private hugetlb VMA where
the resubmission path's copy fails, the reservation for that address is
leaked from the VMA's reserve map. A subsequent fault at the same
address takes the no-reservation path, and under hugetlb pool pressure
the task is SIGBUSed at an address it had previously reserved. One map
entry is leaked per occurrence.

Add the missing restore_reserve_on_error() call before folio_put(),
matching the first-attempt error path which already handles this
correctly.

Fixes: 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
Cc: <stable@vger.kernel.org>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
v2:
  - Add user-visible effects paragraph in changelog (per akpm,
    required for Cc: stable).
  - Correct Fixes: tag to 1cb9dc4b475c (per Muchun) -- the failing
    arm only exists since copy_user_large_folio() became int-returning.

Andrew, please drop the v1 currently queued as 270157aef0d1 in
mm-unstable.

v1: https://lore.kernel.org/all/20260322052120.14021-1-devnexen@gmail.com/

 mm/hugetlb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4b80b167cc9c..c6dee98840db 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6270,6 +6270,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 		folio_put(*foliop);
 		*foliop = NULL;
 		if (ret) {
+			restore_reserve_on_error(h, dst_vma, dst_addr, folio);
 			folio_put(folio);
 			goto out;
 		}
-- 
2.53.0

Re: [PATCH v2] mm/hugetlb: restore reservation on error in hugetlb_mfill_atomic_pte() resubmission path

Posted by Muchun Song 4 days, 21 hours ago


> On May 20, 2026, at 07:05, David Carlier <devnexen@gmail.com> wrote:
> 
> When the resubmission path in hugetlb_mfill_atomic_pte() allocates a new
> hugetlb folio via alloc_hugetlb_folio(), a VMA reservation is consumed.
> If copy_user_large_folio() subsequently fails (e.g. -EHWPOISON when the
> source page is hwpoisoned), folio_put() restores the global hugetlb pool
> count through free_huge_folio(), but the per-VMA reservation map entry
> is left marked consumed.
> 
> User-visible effect: on a UFFDIO_COPY into a private hugetlb VMA where
> the resubmission path's copy fails, the reservation for that address is
> leaked from the VMA's reserve map. A subsequent fault at the same
> address takes the no-reservation path, and under hugetlb pool pressure
> the task is SIGBUSed at an address it had previously reserved. One map
> entry is leaked per occurrence.
> 
> Add the missing restore_reserve_on_error() call before folio_put(),
> matching the first-attempt error path which already handles this
> correctly.
> 
> Fixes: 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> v2:
>  - Add user-visible effects paragraph in changelog (per akpm,
>    required for Cc: stable).
>  - Correct Fixes: tag to 1cb9dc4b475c (per Muchun) -- the failing
>    arm only exists since copy_user_large_folio() became int-returning.
> 
> Andrew, please drop the v1 currently queued as 270157aef0d1 in
> mm-unstable.
> 
> v1: https://lore.kernel.org/all/20260322052120.14021-1-devnexen@gmail.com/
> 
> mm/hugetlb.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4b80b167cc9c..c6dee98840db 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6270,6 +6270,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
>  		folio_put(*foliop);
>  		*foliop = NULL;
>  		if (ret) {
> +			restore_reserve_on_error(h, dst_vma, dst_addr, folio);

I think you should fix the same problem in copy_hugetlb_page_range()
within this patch as well since both are introduced by the same commit.

Muchun,
Thanks.

>  			folio_put(folio);
>  			goto out;
>  		}
> -- 
> 2.53.0
>

[PATCH v3] mm/hugetlb: restore reservation on error in hugetlb folio copy paths

Posted by David Carlier 4 days, 19 hours ago

Two sites in mm/hugetlb.c allocate a hugetlb folio via
alloc_hugetlb_folio() (consuming a VMA reservation) and then call
copy_user_large_folio(), which became int-returning in commit
1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage
copy-on-write faults") and can now fail (e.g. -EHWPOISON on a
hwpoisoned source page). On the failure path, folio_put() restores
the global hugetlb pool count through free_huge_folio(), but the
per-VMA reservation map entry is left marked consumed:

  - hugetlb_mfill_atomic_pte() resubmission path (UFFDIO_COPY)
  - copy_hugetlb_page_range() fork-time CoW path when
    hugetlb_try_dup_anon_rmap() fails (rare: pinned hugetlb anon
    folio under fork)

User-visible effect: on UFFDIO_COPY into a private hugetlb VMA where
the resubmission copy fails, the reservation for that address is
leaked from the VMA's reserve map. A subsequent fault at the same
address takes the no-reservation path, and under hugetlb pool
pressure the task is SIGBUSed at an address it had previously
reserved. The fork-time CoW path leaks the same way in the child
VMA's reserve map, though it requires the much rarer combination
of pinned hugetlb anon page + hwpoisoned source.

Add the missing restore_reserve_on_error() call before folio_put()
on both error paths.

Fixes: 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
Cc: <stable@vger.kernel.org>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
v3:
  - Fold the copy_hugetlb_page_range() sibling fix into this patch
    (per Muchun) -- same Fixes commit, same fix pattern, single
    backport unit for stable.
  - Reworded changelog to cover both sites.

v2: https://lore.kernel.org/all/20260519230503.121293-1-devnexen@gmail.com/
v1: https://lore.kernel.org/all/20260322052120.14021-1-devnexen@gmail.com/

 mm/hugetlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4b80b167cc9c..ba7c3ed96835 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4974,6 +4974,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 							    addr, dst_vma);
 				folio_put(pte_folio);
 				if (ret) {
+					restore_reserve_on_error(h, dst_vma, addr, new_folio);
 					folio_put(new_folio);
 					break;
 				}
@@ -6270,6 +6271,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 		folio_put(*foliop);
 		*foliop = NULL;
 		if (ret) {
+			restore_reserve_on_error(h, dst_vma, dst_addr, folio);
 			folio_put(folio);
 			goto out;
 		}
-- 
2.53.0

Re: [PATCH v3] mm/hugetlb: restore reservation on error in hugetlb folio copy paths

Posted by Muchun Song 4 days, 18 hours ago


> On May 20, 2026, at 12:49, David Carlier <devnexen@gmail.com> wrote:
> 
> Two sites in mm/hugetlb.c allocate a hugetlb folio via
> alloc_hugetlb_folio() (consuming a VMA reservation) and then call
> copy_user_large_folio(), which became int-returning in commit
> 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage
> copy-on-write faults") and can now fail (e.g. -EHWPOISON on a
> hwpoisoned source page). On the failure path, folio_put() restores
> the global hugetlb pool count through free_huge_folio(), but the
> per-VMA reservation map entry is left marked consumed:
> 
>  - hugetlb_mfill_atomic_pte() resubmission path (UFFDIO_COPY)
>  - copy_hugetlb_page_range() fork-time CoW path when
>    hugetlb_try_dup_anon_rmap() fails (rare: pinned hugetlb anon
>    folio under fork)
> 
> User-visible effect: on UFFDIO_COPY into a private hugetlb VMA where
> the resubmission copy fails, the reservation for that address is
> leaked from the VMA's reserve map. A subsequent fault at the same
> address takes the no-reservation path, and under hugetlb pool
> pressure the task is SIGBUSed at an address it had previously
> reserved. The fork-time CoW path leaks the same way in the child
> VMA's reserve map, though it requires the much rarer combination
> of pinned hugetlb anon page + hwpoisoned source.
> 
> Add the missing restore_reserve_on_error() call before folio_put()
> on both error paths.
> 
> Fixes: 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Carlier <devnexen@gmail.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.