During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the problematic
bit.

For s390 KVM this currently means that a VM backed by a file located on
filesystem with large folio support can crash when KVM tries accessing
the problematic page, because the readahead logic might decide to use
a PMD-sized THP and faulting it into the page tables will install a
PMD mapping, something that s390 KVM cannot tolerate.

This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.

Fix it by respecting the ways to disable THPs when deciding whether we
can install a PMD mapping. khugepaged should already be taking care of
not collapsing if THPs are effectively disabled for the hw/process/vma.

An earlier patch was tested by Thomas Huth, this one still needs to
be retested; sending it out already.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>

David Hildenbrand (1):
  mm: don't install PMD mappings when THPs are disabled by the
    hw/process/vma

Kefeng Wang (1):
  mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()

 include/linux/huge_mm.h | 18 ++++++++++++++++++
 mm/huge_memory.c        | 13 +------------
 mm/memory.c             |  9 +++++++++
 mm/shmem.c              |  7 +------
 4 files changed, 29 insertions(+), 18 deletions(-)

-- 
2.46.1
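For readers following along without the patches at hand, the decision
described in the cover letter can be sketched as a small userspace model.
This is not the kernel code from the series: only the helper names
vma_thp_disabled() and thp_disabled_by_hw() come from the shortlog above.
The signatures, the struct mm_model / struct vma_model types, the flag
values, the hw_thp_unsupported variable and choose_mapping() are all
assumptions made for illustration, as is the fallback to PTE mappings
when a PMD mapping is refused.

```c
#include <stdbool.h>

/*
 * Placeholder bit values for this userspace model only. The kernel has
 * flags with these names, but their real definitions differ.
 */
#define VM_NOHUGEPAGE   (1UL << 0)  /* THP disabled for one VMA */
#define MMF_DISABLE_THP (1UL << 0)  /* THP disabled for the whole process */

/* Hypothetical stand-ins for the kernel's mm and VMA structures. */
struct mm_model {
    unsigned long flags;
};

struct vma_model {
    unsigned long vm_flags;
    struct mm_model *mm;
};

/* Hypothetical switch: set when the hardware cannot do PMD mappings. */
static bool hw_thp_unsupported;

/* Assumed shape of the helper named in the shortlog. */
static inline bool thp_disabled_by_hw(void)
{
    return hw_thp_unsupported;
}

/* Assumed shape: THP is off if either the VMA or the process says so. */
static inline bool vma_thp_disabled(const struct vma_model *vma,
                                    unsigned long vm_flags)
{
    return (vm_flags & VM_NOHUGEPAGE) ||
           (vma->mm->flags & MMF_DISABLE_THP);
}

enum map_result { MAP_WITH_PTES, MAP_WITH_PMD };

/*
 * Hypothetical fault-path decision: even when the pagecache already
 * holds a PMD-sized folio, refuse the PMD mapping if THPs are disabled
 * by the hw/process/vma and map with PTEs instead.
 */
static inline enum map_result choose_mapping(const struct vma_model *vma,
                                             bool folio_is_pmd_sized)
{
    if (!folio_is_pmd_sized)
        return MAP_WITH_PTES;
    if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
        return MAP_WITH_PTES;
    return MAP_WITH_PMD;
}
```

In this model the folio size never changes; only the way it is mapped
does. That mirrors the point of the cover letter: the large folio may
already be in the pagecache, so the check has to sit where the mapping
is installed.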
On 11/10/2024 12.24, David Hildenbrand wrote:
> During testing, it was found that we can get PMD mappings in processes
> where THP (and more precisely, PMD mappings) are supposed to be disabled.
> While it works as expected for anon+shmem, the pagecache is the problematic
> bit.
>
> For s390 KVM this currently means that a VM backed by a file located on
> filesystem with large folio support can crash when KVM tries accessing
> the problematic page, because the readahead logic might decide to use
> a PMD-sized THP and faulting it into the page tables will install a
> PMD mapping, something that s390 KVM cannot tolerate.
>
> This might also be a problem with HW that does not support PMD mappings,
> but I did not try reproducing it.
>
> Fix it by respecting the ways to disable THPs when deciding whether we
> can install a PMD mapping. khugepaged should already be taking care of
> not collapsing if THPs are effectively disabled for the hw/process/vma.
>
> An earlier patch was tested by Thomas Huth, this one still needs to
> be retested; sending it out already.

I just finished testing your new version of these patches here, and I can
confirm that they are fixing the problem that I was facing, so:

Tested-by: Thomas Huth <thuth@redhat.com>

FWIW, the problem can be reproduced by running a KVM guest on an s390x host
like this:

qemu-system-s390x -accel kvm -nographic -m 4G -d guest_errors \
   -M s390-ccw-virtio,memory-backend=mem-machine_mem \
   -object memory-backend-file,size=4294967296,prealloc=true,mem-path=$HOME/myfile,share=true,id=mem-machine_mem

Without the fix, the guest crashes immediately before being able to execute
the first instruction. With the fix applied, you can still see the first
messages of the guest firmware, indicating that the guest started successfully.

Thank you very much for the fix, David!

  Thomas
On 11.10.24 13:39, Thomas Huth wrote:
> On 11/10/2024 12.24, David Hildenbrand wrote:
>> During testing, it was found that we can get PMD mappings in processes
>> where THP (and more precisely, PMD mappings) are supposed to be disabled.
>> While it works as expected for anon+shmem, the pagecache is the problematic
>> bit.
>>
>> For s390 KVM this currently means that a VM backed by a file located on
>> filesystem with large folio support can crash when KVM tries accessing
>> the problematic page, because the readahead logic might decide to use
>> a PMD-sized THP and faulting it into the page tables will install a
>> PMD mapping, something that s390 KVM cannot tolerate.
>>
>> This might also be a problem with HW that does not support PMD mappings,
>> but I did not try reproducing it.
>>
>> Fix it by respecting the ways to disable THPs when deciding whether we
>> can install a PMD mapping. khugepaged should already be taking care of
>> not collapsing if THPs are effectively disabled for the hw/process/vma.
>>
>> An earlier patch was tested by Thomas Huth, this one still needs to
>> be retested; sending it out already.
>
> I just finished testing your new version of these patches here, and I can
> confirm that they are fixing the problem that I was facing, so:
>
> Tested-by: Thomas Huth <thuth@redhat.com>
>
> FWIW, the problem can be reproduced by running a KVM guest on an s390x host
> like this:
>
> qemu-system-s390x -accel kvm -nographic -m 4G -d guest_errors \
>    -M s390-ccw-virtio,memory-backend=mem-machine_mem \
>    -object
> memory-backend-file,size=4294967296,prealloc=true,mem-path=$HOME/myfile,share=true,id=mem-machine_mem
>
> Without the fix, the guest crashes immediately before being able to execute
> the first instruction. With the fix applied, you can still see the first
> messages of the guest firmware, indicating that the guest started successfully.
>
> Thank you very much for the fix, David!

Thanks for the quick test, Thomas!

-- 
Cheers,

David / dhildenb