mm/huge_memory.c | 3 ++- mm/memory-failure.c | 6 ++++-- 2 files changed, 6 insertions(+), 3 deletions(-)
Handling a memory failure pointing inside a huge page requires splitting
the page. The splitting logic uses a mechanism, implemented in
migrate.c:try_to_map_unused_to_zeropage(), that inspects contents of
individual pages to find zero-filled pages. The read access to the
contents may cause a new, synchronous exception like an x86 Machine
Check, delivered before the initial memory_failure() finishes, ending
in a crash.
Luckily memory_failure() already sets the has_hwpoisoned flag on the
folio right before try_to_split_thp_page(). Don't enable the shared
zeropage mechanism (RMP_USE_SHARED_ZEROPAGE flag) down in
__split_unmapped_folio() when the original folio has has_hwpoisoned.
Note: we're disabling a potentially useful feature, some of the
individual pages that aren't poisoned might be zero-filled. One
argument for not trying to add a mechanism to maybe re-scan them later,
apart from code cost, is that the owning process is likely being
killed and the memory released.
Signed-off-by: Andrew Zaborowski <balrogg+code@gmail.com>
---
mm/huge_memory.c | 3 ++-
mm/memory-failure.c | 6 ++++--
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c38a95e9f0..1568f0308b9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3588,6 +3588,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
struct list_head *list, bool uniform_split)
{
struct deferred_split *ds_queue = get_deferred_split_queue(folio);
+ bool has_hwpoisoned = folio_test_has_hwpoisoned(folio);
XA_STATE(xas, &folio->mapping->i_pages, folio->index);
struct folio *end_folio = folio_next(folio);
bool is_anon = folio_test_anon(folio);
@@ -3858,7 +3859,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
if (nr_shmem_dropped)
shmem_uncharge(mapping->host, nr_shmem_dropped);
- if (!ret && is_anon)
+ if (!ret && is_anon && !has_hwpoisoned)
remap_flags = RMP_USE_SHARED_ZEROPAGE;
remap_page(folio, 1 << order, remap_flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fc30ca4804b..2d755493de9 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2352,8 +2352,10 @@ int memory_failure(unsigned long pfn, int flags)
* otherwise it may race with THP split.
* And the flag can't be set in get_hwpoison_page() since
* it is called by soft offline too and it is just called
- * for !MF_COUNT_INCREASED. So here seems to be the best
- * place.
+ * for !MF_COUNT_INCREASED.
+ * It also tells __split_unmapped_folio() to not bother
+ * using the shared zeropage -- the all-zeros check would
+ * consume the poison. So here seems to be the best place.
*
* Don't need care about the above error handling paths for
* get_hwpoison_page() since they handle either free page
--
2.45.2
On 11.09.25 04:14, Andrew Zaborowski wrote: > Handling a memory failure pointing inside a huge page requires splitting > the page. The splitting logic uses a mechanism, implemented in > migrate.c:try_to_map_unused_to_zeropage(), that inspects contents of > individual pages to find zero-filled pages. The read access to the > contents may cause a new, synchronous exception like an x86 Machine > Check, delivered before the initial memory_failure() finishes, ending > in a crash. > > Luckily memory_failure() already sets the has_hwpoisoned flag on the > folio right before try_to_split_thp_page(). Don't enable the shared > zeropage mechanism (RMP_USE_SHARED_ZEROPAGE flag) down in > __split_unmapped_folio() when the original folio has has_hwpoisoned. > > Note: we're disabling a potentially useful feature, some of the > individual pages that aren't poisoned might be zero-filled. One > argument for not trying to add a mechanism to maybe re-scan them later, > apart from code cost, is that the owning process is likely being > killed and the memory released. > > Signed-off-by: Andrew Zaborowski <balrogg+code@gmail.com> > --- I would suggest just checking whether the page (PageHWPoison()) is poisoned before doing the check for zero. If set, just treat it as non-zero. No need to stop the split. You'll have to do that in two locations. No need to mess with RMP_USE_SHARED_ZEROPAGE -- Cheers David / dhildenb
On 10 Sep 2025, at 22:14, Andrew Zaborowski wrote: > Handling a memory failure pointing inside a huge page requires splitting > the page. The splitting logic uses a mechanism, implemented in > migrate.c:try_to_map_unused_to_zeropage(), that inspects contents of > individual pages to find zero-filled pages. The read access to the > contents may cause a new, synchronous exception like an x86 Machine > Check, delivered before the initial memory_failure() finishes, ending > in a crash. > > Luckily memory_failure() already sets the has_hwpoisoned flag on the > folio right before try_to_split_thp_page(). Don't enable the shared > zeropage mechanism (RMP_USE_SHARED_ZEROPAGE flag) down in > __split_unmapped_folio() when the original folio has has_hwpoisoned. > > Note: we're disabling a potentially useful feature, some of the > individual pages that aren't poisoned might be zero-filled. One > argument for not trying to add a mechanism to maybe re-scan them later, > apart from code cost, is that the owning process is likely being > killed and the memory released. Sounds reasonable to me. > > Signed-off-by: Andrew Zaborowski <balrogg+code@gmail.com> > --- > mm/huge_memory.c | 3 ++- > mm/memory-failure.c | 6 ++++-- > 2 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 9c38a95e9f0..1568f0308b9 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -3588,6 +3588,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order, > struct list_head *list, bool uniform_split) > { > struct deferred_split *ds_queue = get_deferred_split_queue(folio); > + bool has_hwpoisoned = folio_test_has_hwpoisoned(folio); The state needs to be stored here because __split_unmapped_folio() clears the flag. Maybe add a comment here to prevent people from “optimizing” it by calling folio_test_has_hwpoisoned(folio) in the code below. (I wanted to until I checked the definition of folio_test_has_hwpoisoned()) > XA_STATE(xas, &folio->mapping->i_pages, folio->index); > struct folio *end_folio = folio_next(folio); > bool is_anon = folio_test_anon(folio); > @@ -3858,7 +3859,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order, > if (nr_shmem_dropped) > shmem_uncharge(mapping->host, nr_shmem_dropped); > > - if (!ret && is_anon) > + if (!ret && is_anon && !has_hwpoisoned) > remap_flags = RMP_USE_SHARED_ZEROPAGE; > remap_page(folio, 1 << order, remap_flags); > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index fc30ca4804b..2d755493de9 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -2352,8 +2352,10 @@ int memory_failure(unsigned long pfn, int flags) > * otherwise it may race with THP split. > * And the flag can't be set in get_hwpoison_page() since > * it is called by soft offline too and it is just called > - * for !MF_COUNT_INCREASED. So here seems to be the best > - * place. > + * for !MF_COUNT_INCREASED. > + * It also tells __split_unmapped_folio() to not bother s/__split_unmapped_folio/__folio_split/, since remap_page() is called in __folio_split(). > + * using the shared zeropage -- the all-zeros check would > + * consume the poison. So here seems to be the best place. > * > * Don't need care about the above error handling paths for > * get_hwpoison_page() since they handle either free page > -- > 2.45.2 Otherwise, Acked-by: Zi Yan <ziy@nvidia.com> Best Regards, Yan, Zi
On Thu, Sep 11, 2025 at 12:11 PM Zi Yan <ziy@nvidia.com> wrote: > > On 10 Sep 2025, at 22:14, Andrew Zaborowski wrote: > > > Handling a memory failure pointing inside a huge page requires splitting > > the page. The splitting logic uses a mechanism, implemented in > > migrate.c:try_to_map_unused_to_zeropage(), that inspects contents of > > individual pages to find zero-filled pages. The read access to the > > contents may cause a new, synchronous exception like an x86 Machine > > Check, delivered before the initial memory_failure() finishes, ending > > in a crash. > > > > Luckily memory_failure() already sets the has_hwpoisoned flag on the > > folio right before try_to_split_thp_page(). Don't enable the shared > > zeropage mechanism (RMP_USE_SHARED_ZEROPAGE flag) down in > > __split_unmapped_folio() when the original folio has has_hwpoisoned. Nit: s/__split_unmapped_folio/__folio_split/ As Zi mentioned, remap_page() is called in __folio_split() ;) > > > > Note: we're disabling a potentially useful feature, some of the > > individual pages that aren't poisoned might be zero-filled. One > > argument for not trying to add a mechanism to maybe re-scan them later, > > apart from code cost, is that the owning process is likely being > > killed and the memory released. > > Sounds reasonable to me. Makes sense to me as well! > > > > > Signed-off-by: Andrew Zaborowski <balrogg+code@gmail.com> > > --- > > mm/huge_memory.c | 3 ++- > > mm/memory-failure.c | 6 ++++-- > > 2 files changed, 6 insertions(+), 3 deletions(-) > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index 9c38a95e9f0..1568f0308b9 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -3588,6 +3588,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order, > > struct list_head *list, bool uniform_split) > > { > > struct deferred_split *ds_queue = get_deferred_split_queue(folio); > > + bool has_hwpoisoned = folio_test_has_hwpoisoned(folio); > > The state needs to be stored here because __split_unmapped_folio() > clears the flag. Maybe add a comment here to prevent people > from “optimizing” it by calling folio_test_has_hwpoisoned(folio) > in the code below. > > (I wanted to until I checked the definition of folio_test_has_hwpoisoned()) folio_test_has_hwpoisoned() requires a large folio. That is safe in this context, since this path is only ever called for large folios. Cheers, Lance > > > XA_STATE(xas, &folio->mapping->i_pages, folio->index); > > struct folio *end_folio = folio_next(folio); > > bool is_anon = folio_test_anon(folio); > > @@ -3858,7 +3859,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order, > > if (nr_shmem_dropped) > > shmem_uncharge(mapping->host, nr_shmem_dropped); > > > > - if (!ret && is_anon) > > + if (!ret && is_anon && !has_hwpoisoned) > > remap_flags = RMP_USE_SHARED_ZEROPAGE; > > remap_page(folio, 1 << order, remap_flags); > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > > index fc30ca4804b..2d755493de9 100644 > > --- a/mm/memory-failure.c > > +++ b/mm/memory-failure.c > > @@ -2352,8 +2352,10 @@ int memory_failure(unsigned long pfn, int flags) > > * otherwise it may race with THP split. > > * And the flag can't be set in get_hwpoison_page() since > > * it is called by soft offline too and it is just called > > - * for !MF_COUNT_INCREASED. So here seems to be the best > > - * place. > > + * for !MF_COUNT_INCREASED. > > + * It also tells __split_unmapped_folio() to not bother > > s/__split_unmapped_folio/__folio_split/, since remap_page() is > called in __folio_split(). > > > + * using the shared zeropage -- the all-zeros check would > > + * consume the poison. So here seems to be the best place. > > * > > * Don't need care about the above error handling paths for > > * get_hwpoison_page() since they handle either free page > > -- > > 2.45.2 > > Otherwise, Acked-by: Zi Yan <ziy@nvidia.com> > > Best Regards, > Yan, Zi >
© 2016 - 2025 Red Hat, Inc.