From: Kairui Song <kasong@tencent.com>
Instead of checking the poison flag only in the fast swap cache lookup
path, always check the poison flag after locking a swap cache folio.

There are two reasons to do so.

First, until it is locked, the folio is unstable and can be removed
from the swap cache at any time, so it is entirely possible that the
folio is no longer the backing folio of the swap entry and is instead
an unrelated poisoned folio. We might then mistakenly kill the faulting
process.

Second, it is possible, and even common, for the slow swap-in path
(swapin_readahead) to bring in a cached folio. That cached folio could
be poisoned, too. Checking the poison flag only in the fast path will
miss such folios.

The race window in the first case is tiny, so it is very unlikely to
happen in practice. While at it, also add an unlikely() annotation.
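
With that, the relevant part of do_swap_page() ends up looking roughly
like the following (a simplified sketch, not the exact resulting code):

        folio = swap_cache_get_folio(entry);    /* fast path lookup */
        if (!folio)
                folio = swapin_readahead(...);  /* slow path, may also return a cached folio */

        ret |= folio_lock_or_retry(folio, vmf); /* folio is only stable once locked */
        page = folio_file_page(folio, swp_offset(entry));

        if (swapcache) {
                /* re-check that the locked folio still backs this swap entry */
                if (unlikely(!folio_test_swapcache(folio) ||
                             page_swap_entry(page).val != entry.val))
                        goto out_page;
                /* only now is the poison flag known to belong to our page */
                if (unlikely(PageHWPoison(page))) {
                        ret = VM_FAULT_HWPOISON;
                        goto out_page;
                }
        }
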
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/memory.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 10ef528a5f44..94a5928e8ace 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4661,10 +4661,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
                 goto out;
 
         folio = swap_cache_get_folio(entry);
-        if (folio) {
+        if (folio)
                 swap_update_readahead(folio, vma, vmf->address);
-                page = folio_file_page(folio, swp_offset(entry));
-        }
         swapcache = folio;
 
         if (!folio) {
@@ -4735,20 +4733,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
                 ret = VM_FAULT_MAJOR;
                 count_vm_event(PGMAJFAULT);
                 count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
-                page = folio_file_page(folio, swp_offset(entry));
-        } else if (PageHWPoison(page)) {
-                /*
-                 * hwpoisoned dirty swapcache pages are kept for killing
-                 * owner processes (which may be unknown at hwpoison time)
-                 */
-                ret = VM_FAULT_HWPOISON;
-                goto out_release;
         }
 
         ret |= folio_lock_or_retry(folio, vmf);
         if (ret & VM_FAULT_RETRY)
                 goto out_release;
 
+        page = folio_file_page(folio, swp_offset(entry));
         if (swapcache) {
                 /*
                  * Make sure folio_free_swap() or swapoff did not release the
@@ -4761,6 +4752,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
                           page_swap_entry(page).val != entry.val))
                         goto out_page;
 
+                if (unlikely(PageHWPoison(page))) {
+                        /*
+                         * hwpoisoned dirty swapcache pages are kept for killing
+                         * owner processes (which may be unknown at hwpoison time)
+                         */
+                        ret = VM_FAULT_HWPOISON;
+                        goto out_page;
+                }
+
                 /*
                  * KSM sometimes has to copy on read faults, for example, if
                  * folio->index of non-ksm folios would be nonlinear inside the
--
2.51.0
On 05.09.25 21:13, Kairui Song wrote:
> [...]

LGTM, but I was wondering whether we just want to check that even when
we just allocated a fresh folio for simplicity. The check is cheap ...

--
Cheers

David / dhildenb
On Mon, Sep 8, 2025 at 8:40 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 05.09.25 21:13, Kairui Song wrote:
> > [...]
>
> LGTM, but I was wondering whether we just want to check that even when

Thanks for checking the patch.

> we just allocated a fresh folio for simplicity. The check is cheap ...

Maybe not for now? This patch expects folio_test_swapcache to filter
out potentially irrelevant folios, so moving the check before that is
in theory not correct. And the folio_test_swapcache check won't work
for a freshly allocated folio here...

I'm planning to remove the whole `if (swapcache)` check in phase 2, as
all swapin will go through the swap cache. By that time all checks will
always be applied, and the simplification will be done in a cleaner way.
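
In other words, roughly (a simplified sketch, not the exact code):

        swapcache = folio;      /* NULL if nothing was found in the swap cache */
        if (!folio) {
                /* the path that allocates a fresh folio leaves swapcache == NULL,
                 * so the validation block below is skipped entirely */
        }
        ...
        if (swapcache) {
                /* 1) folio_test_swapcache() + swap entry check: make sure the
                 *    locked folio still backs this entry */
                /* 2) only then PageHWPoison(): a fresh folio was never in the
                 *    swap cache, so step 1 cannot validate it */
        }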
On 09.09.25 16:54, Kairui Song wrote:
> On Mon, Sep 8, 2025 at 8:40 PM David Hildenbrand <david@redhat.com> wrote:
>> [...]
>> LGTM, but I was wondering whether we just want to check that even when
>> we just allocated a fresh folio for simplicity. The check is cheap ...
>
> Maybe not for now? This patch expects folio_test_swapcache to filter
> out potentially irrelevant folios, so moving the check before that is
> in theory not correct. And the folio_test_swapcache check won't work
> for a freshly allocated folio here...
>
> I'm planning to remove the whole `if (swapcache)` check in phase 2, as
> all swapin will go through the swap cache. By that time all checks will
> always be applied, and the simplification will be done in a cleaner way.

Fair enough :)

--
Cheers

David / dhildenb
Separating this patch out makes it easier to read for me. Thank you.
The last V1 mixed diff was very messy. My last attempt missed that
out_page will unlock the HWPoison page as well. Now it is obviously
correct to me.

Acked-by: Chris Li <chrisl@kernel.org>

Chris

On Fri, Sep 5, 2025 at 12:14 PM Kairui Song <ryncsn@gmail.com> wrote:
> [...]
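
For reference, the error labels in do_swap_page() roughly fall through
like this (abridged from my reading of mm/memory.c, details may vary
between kernel versions), which is why taking out_page matters: it
unlocks the folio before the reference is dropped, while jumping to
out_release alone would leave the poisoned folio locked:

        out_page:
                folio_unlock(folio);
        out_release:
                folio_put(folio);
                ...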