From patch #3:
"
Currently, starting a PV VM on an iomap-based filesystem with large
folio support, such as XFS, will not work. We'll be stuck in
unpack_one()->gmap_make_secure(), because we can't seem to make progress
splitting the large folio.
The problem is that we require a writable PTE but a writable PTE under such
filesystems will imply a dirty folio.
So whenever we have a writable PTE, we'll have a dirty folio, and dirty
iomap folios cannot currently get split, because
split_folio()->split_huge_page_to_list_to_order()->filemap_release_folio()
will fail in iomap_release_folio().
So we will not make any progress splitting such large folios.
"
Let's first fix one related problem during unpack, and then handle such
folios by triggering writeback and immediately retrying the split.
This makes it work on XFS with large folios again.
Long-term, we should cleanly support splitting such folios even
without writeback, but that's a bit harder to implement and not a quick
fix.
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Sebastian Mitterle <smitterl@redhat.com>
David Hildenbrand (3):
s390/uv: don't return 0 from make_hva_secure() if the operation was
not successful
s390/uv: always return 0 from s390_wiggle_split_folio() if successful
s390/uv: improve splitting of large folios that cannot be split while
dirty
arch/s390/kernel/uv.c | 85 ++++++++++++++++++++++++++++++++++++-------
1 file changed, 72 insertions(+), 13 deletions(-)
base-commit: 088d13246a4672bc03aec664675138e3f5bff68c
--
2.49.0
On Fri, 16 May 2025 14:39:43 +0200
David Hildenbrand <david@redhat.com> wrote:

> From patch #3:
> [...]
> This makes it work on XFS with large folios again.
> [...]

picked for 6.16, I think it will survive the CI without issues, since
I assume you tested this thoroughly
On 16.05.25 19:17, Claudio Imbrenda wrote:
> On Fri, 16 May 2025 14:39:43 +0200
> David Hildenbrand <david@redhat.com> wrote:
> [...]
>
> picked for 6.16, I think it will survive the CI without issues, since
> I assume you tested this thoroughly

I did test what was known to be broken, but our QE did not run a bigger
test on it. So giving it some soaking time + waiting for a bit for more
review might be a good idea!

-- 
Cheers,

David / dhildenb
On Fri, 16 May 2025 14:39:43 +0200
David Hildenbrand <david@redhat.com> wrote:

> From patch #3:
> [...]

yet another layer of duck tape

I really dislike the current interaction between secure execution and
I/O, I hope I can get a cleaner solution as soon as possible

meanwhile, let's keep the boat afloat; whole series:

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

David: thanks for fixing this mess!
On 16.05.25 19:07, Claudio Imbrenda wrote:
> [...]
> yet another layer of duck tape
>
> I really dislike the current interaction between secure execution and
> I/O, I hope I can get a cleaner solution as soon as possible

I'll be more than happy to review such a series -- hoping we can just
support large folios naturally :)

> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Thanks!

> David: thanks for fixing this mess!

NP; I had a prototype of patch #3 for a long time. But after rebasing on
top of your work I saw these weird validation errors and just couldn't
find the issue. And I only saw them with patch #3 on ordinary pagecache
folios, not with shmem, which severely confused me.

... gave it another try today after ~1 month and almost immediately
spotted the issue. Some things just need time :)

-- 
Cheers,

David / dhildenb