Working on another generic page table walker that tries to avoid
special-casing hugetlb, I found a page table locking issue with hugetlb
folios that are not mapped using a single PMD/PUD.

For some hugetlb folio sizes, GUP will take different page table locks
when walking the page tables than hugetlb when modifying the page
tables.

I did not actually try reproducing an issue, but looking at
follow_pmd_mask(), where we might be rereading a PMD value multiple
times, it's rather clear that concurrent modifications are rather
unpleasant.

In follow_page_pte() we might be better off in that regard -- ptep_get()
does a READ_ONCE() -- but who knows what else could happen concurrently
in some weird corner cases (e.g., a hugetlb folio getting unmapped and
freed).

Did some basic sanity testing with various hugetlb sizes on x86-64 and
arm64. Maybe I'll find some time to actually write a simple reproducer
in the coming weeks, so this wouldn't have to stay all-theoretical.

Only v6.10 is affected, so #1 can simply be backported as a prereq
patch along with the real fix.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>

David Hildenbrand (2):
  mm: let pte_lockptr() consume a pte_t pointer
  mm/hugetlb: fix hugetlb vs. core-mm PT locking

 include/linux/hugetlb.h | 25 ++++++++++++++++++++++---
 include/linux/mm.h      |  7 ++++---
 mm/khugepaged.c         | 21 +++++++++++++++------
 mm/pgtable-generic.c    |  4 ++--
 4 files changed, 43 insertions(+), 14 deletions(-)

base-commit: cca1345bd26a67fc61a92ff0c6d81766c259e522
--
2.45.2
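[To make the mismatch concrete: in v6.10, hugetlb's huge_pte_lockptr()
only distinguishes "exactly PMD-sized" from "everything else", while
core-mm walkers such as GUP pick the lock for the page table level they
are actually touching, so cont-PTE and cont-PMD hugetlb sizes end up
taking different locks on the two paths. The following is a rough
sketch of the direction of the fix, written as a diff against the v6.10
helper -- an approximation for illustration, not the literal patch. It
also shows why #1 is a prereq: pte_lockptr() previously consumed a
pmd_t pointer, but hugetlb only has the pte_t pointer at hand.]

 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
 					   struct mm_struct *mm, pte_t *pte)
 {
-	if (huge_page_size(h) == PMD_SIZE)
-		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
-	return &mm->page_table_lock;
+	const unsigned long size = huge_page_size(h);
+
+	/*
+	 * Use the same lock a core-mm page table walker would use for
+	 * the page table level this entry lives in.
+	 */
+	if (size >= PUD_SIZE)		/* PUD-mapped and larger */
+		return pud_lockptr(mm, (pud_t *) pte);
+	if (size >= PMD_SIZE)		/* PMD- and cont-PMD-mapped */
+		return pmd_lockptr(mm, (pmd_t *) pte);
+	return pte_lockptr(mm, pte);	/* cont-PTE-mapped */
 }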
On Thu, 25 Jul 2024 20:39:53 +0200 David Hildenbrand <david@redhat.com> wrote:

> Working on another generic page table walker that tries to avoid
> special-casing hugetlb, I found a page table locking issue with hugetlb
> folios that are not mapped using a single PMD/PUD.
>
> For some hugetlb folio sizes, GUP will take different page table locks
> when walking the page tables than hugetlb when modifying the page
> tables.
>
> I did not actually try reproducing an issue, but looking at
> follow_pmd_mask(), where we might be rereading a PMD value multiple
> times, it's rather clear that concurrent modifications are rather
> unpleasant.
>
> In follow_page_pte() we might be better off in that regard -- ptep_get()
> does a READ_ONCE() -- but who knows what else could happen concurrently
> in some weird corner cases (e.g., a hugetlb folio getting unmapped and
> freed).
>
> Did some basic sanity testing with various hugetlb sizes on x86-64 and
> arm64. Maybe I'll find some time to actually write a simple reproducer
> in the coming weeks, so this wouldn't have to stay all-theoretical.

When can we be confident that this change is merge-worthy?

> Only v6.10 is affected, so #1 can simply be backported as a prereq
> patch along with the real fix.

I'll add the same Fixes: to [1/2], and cc:stable.
On 25.07.24 22:41, Andrew Morton wrote:
> On Thu, 25 Jul 2024 20:39:53 +0200 David Hildenbrand <david@redhat.com> wrote:
>
>> Working on another generic page table walker that tries to avoid
>> special-casing hugetlb, I found a page table locking issue with hugetlb
>> folios that are not mapped using a single PMD/PUD.
>>
>> For some hugetlb folio sizes, GUP will take different page table locks
>> when walking the page tables than hugetlb when modifying the page
>> tables.
>>
>> I did not actually try reproducing an issue, but looking at
>> follow_pmd_mask(), where we might be rereading a PMD value multiple
>> times, it's rather clear that concurrent modifications are rather
>> unpleasant.
>>
>> In follow_page_pte() we might be better off in that regard -- ptep_get()
>> does a READ_ONCE() -- but who knows what else could happen concurrently
>> in some weird corner cases (e.g., a hugetlb folio getting unmapped and
>> freed).
>>
>> Did some basic sanity testing with various hugetlb sizes on x86-64 and
>> arm64. Maybe I'll find some time to actually write a simple reproducer
>> in the coming weeks, so this wouldn't have to stay all-theoretical.
>
> When can we be confident that this change is merge-worthy?

I'm convinced that it is the right thing to do, but I don't think we
have to rush this.

As Baolin notes, we fixed the same issue in the past, unfortunately
also without a reproducer IIUC. I'll try to reproduce the race, but I'm
not 100% sure I'll manage to do so.

So it's certainly merge-worthy after it has had a bit of exposure to
-next, but there's no need to rush this upstream.

Thanks!

--
Cheers,

David / dhildenb
On 26.07.24 11:19, David Hildenbrand wrote:
> On 25.07.24 22:41, Andrew Morton wrote:
>> On Thu, 25 Jul 2024 20:39:53 +0200 David Hildenbrand <david@redhat.com> wrote:
>>
>>> Working on another generic page table walker that tries to avoid
>>> special-casing hugetlb, I found a page table locking issue with hugetlb
>>> folios that are not mapped using a single PMD/PUD.
>>>
>>> For some hugetlb folio sizes, GUP will take different page table locks
>>> when walking the page tables than hugetlb when modifying the page
>>> tables.
>>>
>>> I did not actually try reproducing an issue, but looking at
>>> follow_pmd_mask(), where we might be rereading a PMD value multiple
>>> times, it's rather clear that concurrent modifications are rather
>>> unpleasant.
>>>
>>> In follow_page_pte() we might be better off in that regard -- ptep_get()
>>> does a READ_ONCE() -- but who knows what else could happen concurrently
>>> in some weird corner cases (e.g., a hugetlb folio getting unmapped and
>>> freed).
>>>
>>> Did some basic sanity testing with various hugetlb sizes on x86-64 and
>>> arm64. Maybe I'll find some time to actually write a simple reproducer
>>> in the coming weeks, so this wouldn't have to stay all-theoretical.
>>
>> When can we be confident that this change is merge-worthy?
>
> I'm convinced that it is the right thing to do, but I don't think we
> have to rush this.
>
> As Baolin notes, we fixed the same issue in the past, unfortunately
> also without a reproducer IIUC. I'll try to reproduce the race, but I'm
> not 100% sure I'll manage to do so.

Okay, running this series against the reproducer I pulled out of my
magic hat, it does seem to properly fix the issue.

--
Cheers,

David / dhildenb
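[For the curious, a reproducer for this class of race could take
roughly the following shape. This skeleton is hypothetical -- it is not
the actual test referenced above -- and assumes an arm64 kernel with 4K
base pages and 64K cont-PTE hugetlb pages reserved: one thread keeps a
GUP user busy via O_DIRECT I/O while the main thread keeps modifying
the hugetlb PTEs by zapping and refaulting the folio.]

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_HUGE_64KB
#define MAP_HUGE_64KB	(16 << 26)	/* log2(64K) << MAP_HUGE_SHIFT */
#endif

#define LEN	(64UL << 10)		/* one 64K cont-PTE hugetlb folio */

static char *buf;

/* Thread A: O_DIRECT I/O repeatedly pins the buffer via GUP. */
static void *gup_thread(void *arg)
{
	/* Needs a filesystem that supports O_DIRECT (not tmpfs). */
	int fd = open("scratch.bin", O_CREAT | O_RDWR | O_DIRECT, 0600);

	while (1)
		pwrite(fd, buf, LEN, 0);
	return NULL;
}

int main(void)
{
	pthread_t t;

	/* Requires 64K hugetlb pages to be reserved on this system. */
	buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_64KB,
		   -1, 0);
	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0x5a, LEN);

	pthread_create(&t, NULL, gup_thread, NULL);

	/* Thread B: keep zapping and refaulting the hugetlb PTEs. */
	while (1) {
		madvise(buf, LEN, MADV_DONTNEED);
		buf[0] = 1;
	}
}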