arch/s390/include/asm/page.h | 2 + arch/s390/kernel/uv.c | 112 ++++++++++++++++++++++------------- arch/s390/mm/gmap.c | 4 +- arch/s390/mm/hugetlbpage.c | 8 +-- 4 files changed, 79 insertions(+), 47 deletions(-)
On my journey to remove page_mapcount(), I got hooked up on other folio cleanups that Willy most certainly will enjoy. This series removes the s390x usage of: * page_mapcount() [patches WIP] * page_has_private() [have patches to remove it] ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail pages of large folios). Further, one "easy" fix upfront. ... unfortunately there is one other issue I spotted that I am not tackling in this series, because I am not 100% sure what we want to do: the usage of page_ref_freeze()/folio_ref_freeze() in make_folio_secure() is unsafe. :( In make_folio_secure(), we're holding the folio lock, the mmap lock and the PT lock. So we are protected against concurrent fork(), zap, GUP, swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should also block concurrent GUP-fast very reliably. But if the folio is mapped into multiple page tables, we could see concurrent zapping of the folio, a pagecache folios could get mapped/ accessed concurrent, we could see fork() sharing the page in another process, GUP ... trying to adjust the folio refcount while we froze it. Very bad. For anonymous folios, it would likely be sufficient to check that folio_mapcount() == 1. For pagecache folios, that's insufficient, likely we would have to lock the pagecache. To handle folios mapped into multiple page tables, we would have to do what split_huge_page_to_list_to_order() does (temporary migration entries). So it's a bit more involved, and I'll have to leave that to s390x folks to figure out. There are othe reasonable cleanups I think, but I'll have to focus on other stuff. Compile tested, but not runtime tested, I'll appreiate some testing help from people with UV access and experience. Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Thomas Huth <thuth@redhat.com> David Hildenbrand (5): s390/uv: don't call wait_on_page_writeback() without a reference s390/uv: convert gmap_make_secure() to work on folios s390/uv: convert PG_arch_1 users to only work on small folios s390/uv: update PG_arch_1 comment s390/hugetlb: convert PG_arch_1 code to work on folio->flags arch/s390/include/asm/page.h | 2 + arch/s390/kernel/uv.c | 112 ++++++++++++++++++++++------------- arch/s390/mm/gmap.c | 4 +- arch/s390/mm/hugetlbpage.c | 8 +-- 4 files changed, 79 insertions(+), 47 deletions(-) -- 2.44.0
On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote: > On my journey to remove page_mapcount(), I got hooked up on other folio > cleanups that Willy most certainly will enjoy. > > This series removes the s390x usage of: > * page_mapcount() [patches WIP] > * page_has_private() [have patches to remove it] > > ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail > pages of large folios). > > Further, one "easy" fix upfront. Looks like you didn't see: https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@infradead.org/ > ... unfortunately there is one other issue I spotted that I am not > tackling in this series, because I am not 100% sure what we want to > do: the usage of page_ref_freeze()/folio_ref_freeze() in > make_folio_secure() is unsafe. :( > > In make_folio_secure(), we're holding the folio lock, the mmap lock and > the PT lock. So we are protected against concurrent fork(), zap, GUP, > swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should > also block concurrent GUP-fast very reliably. > > But if the folio is mapped into multiple page tables, we could see > concurrent zapping of the folio, a pagecache folios could get mapped/ > accessed concurrent, we could see fork() sharing the page in another > process, GUP ... trying to adjust the folio refcount while we froze it. > Very bad. Hmmm. Why is that not then a problem for, eg, splitting or migrating? Is it because they unmap first and then try to freeze? > For anonymous folios, it would likely be sufficient to check that > folio_mapcount() == 1. For pagecache folios, that's insufficient, likely > we would have to lock the pagecache. To handle folios mapped into > multiple page tables, we would have to do what > split_huge_page_to_list_to_order() does (temporary migration entries). > > So it's a bit more involved, and I'll have to leave that to s390x folks to > figure out. There are othe reasonable cleanups I think, but I'll have to > focus on other stuff. > > Compile tested, but not runtime tested, I'll appreiate some testing help > from people with UV access and experience. > > Cc: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Heiko Carstens <hca@linux.ibm.com> > Cc: Vasily Gorbik <gor@linux.ibm.com> > Cc: Alexander Gordeev <agordeev@linux.ibm.com> > Cc: Christian Borntraeger <borntraeger@linux.ibm.com> > Cc: Sven Schnelle <svens@linux.ibm.com> > Cc: Janosch Frank <frankja@linux.ibm.com> > Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Thomas Huth <thuth@redhat.com> > > David Hildenbrand (5): > s390/uv: don't call wait_on_page_writeback() without a reference > s390/uv: convert gmap_make_secure() to work on folios > s390/uv: convert PG_arch_1 users to only work on small folios > s390/uv: update PG_arch_1 comment > s390/hugetlb: convert PG_arch_1 code to work on folio->flags > > arch/s390/include/asm/page.h | 2 + > arch/s390/kernel/uv.c | 112 ++++++++++++++++++++++------------- > arch/s390/mm/gmap.c | 4 +- > arch/s390/mm/hugetlbpage.c | 8 +-- > 4 files changed, 79 insertions(+), 47 deletions(-) > > -- > 2.44.0 >
On 05.04.24 05:42, Matthew Wilcox wrote: > On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote: >> On my journey to remove page_mapcount(), I got hooked up on other folio >> cleanups that Willy most certainly will enjoy. >> >> This series removes the s390x usage of: >> * page_mapcount() [patches WIP] >> * page_has_private() [have patches to remove it] >> >> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail >> pages of large folios). >> >> Further, one "easy" fix upfront. > > Looks like you didn't see: > > https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@infradead.org/ Yes, I only skimmed linux-mm. I think s390x certainly wants to handle PTE-mapped THP in that code, I think there are ways to trigger that, we're just mostly lucky that it doesn't happen in the common case. But thinking about it, the current page_mapcount() based check could not possibly have worked for them and rejected any PTE-mapped THP. So I can just base my changes on top of yours (we might want to get the first fix in ahead of time). > >> ... unfortunately there is one other issue I spotted that I am not >> tackling in this series, because I am not 100% sure what we want to >> do: the usage of page_ref_freeze()/folio_ref_freeze() in >> make_folio_secure() is unsafe. :( >> >> In make_folio_secure(), we're holding the folio lock, the mmap lock and >> the PT lock. So we are protected against concurrent fork(), zap, GUP, >> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should >> also block concurrent GUP-fast very reliably. >> >> But if the folio is mapped into multiple page tables, we could see >> concurrent zapping of the folio, a pagecache folios could get mapped/ >> accessed concurrent, we could see fork() sharing the page in another >> process, GUP ... trying to adjust the folio refcount while we froze it. >> Very bad. > > Hmmm. Why is that not then a problem for, eg, splitting or migrating? > Is it because they unmap first and then try to freeze? Yes, exactly. Using mapcount in combination with ref freezing is problematic. Except maybe for anonymous folios with mapcount=1, while holding a bunch of locks to stop anybody from stumbling over that. -- Cheers, David / dhildenb
© 2016 - 2026 Red Hat, Inc.