[PATCH v1 0/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma

David Hildenbrand posted 2 patches 1 month, 2 weeks ago
Posted by David Hildenbrand 1 month, 2 weeks ago
During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the problematic
bit.

For s390 KVM this currently means that a VM backed by a file located on
a filesystem with large folio support can crash when KVM tries accessing
the problematic page, because the readahead logic might decide to use
a PMD-sized THP and faulting it into the page tables will install a
PMD mapping, something that s390 KVM cannot tolerate.

This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.

Fix it by respecting the ways to disable THPs when deciding whether we
can install a PMD mapping. khugepaged should already be taking care of
not collapsing if THPs are effectively disabled for the hw/process/vma.
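
To illustrate the intended check, here is a minimal userspace sketch, not
the actual kernel code. All model_* names, the structs and the flag value
are hypothetical stand-ins; only the helper names vma_thp_disabled() and
thp_disabled_by_hw() come from this series. It assumes that a refused PMD
mapping simply leaves the large folio to be mapped with PTEs instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the kernel's real types. */
#define MODEL_VM_NOHUGEPAGE 0x1UL	/* models a VMA that opted out of THP */

struct model_mm {
	bool thp_disabled;		/* models a process that opted out of THP */
};

struct model_vma {
	unsigned long vm_flags;
	const struct model_mm *mm;
};

/* Models hardware/firmware that cannot handle PMD-sized mappings. */
bool model_hw_thp_unsupported;

/* Models thp_disabled_by_hw(): THP disabled for the whole system. */
bool model_thp_disabled_by_hw(void)
{
	return model_hw_thp_unsupported;
}

/* Models vma_thp_disabled(): THP disabled for this VMA or its process. */
bool model_vma_thp_disabled(const struct model_vma *vma)
{
	assert(vma && vma->mm);
	return (vma->vm_flags & MODEL_VM_NOHUGEPAGE) || vma->mm->thp_disabled;
}

/* Returns true only if none of the three ways to disable THP applies. */
bool model_may_install_pmd(const struct model_vma *vma)
{
	if (model_thp_disabled_by_hw() || model_vma_thp_disabled(vma))
		return false;	/* assumption: caller maps with PTEs instead */
	return true;
}
```

The point of the sketch is that the decision is made when the mapping is
installed, independent of how large the folio in the pagecache already is.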

An earlier patch was tested by Thomas Huth; this one still needs to
be retested, but I am sending it out already.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>

David Hildenbrand (1):
  mm: don't install PMD mappings when THPs are disabled by the
    hw/process/vma

Kefeng Wang (1):
  mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()

 include/linux/huge_mm.h | 18 ++++++++++++++++++
 mm/huge_memory.c        | 13 +------------
 mm/memory.c             |  9 +++++++++
 mm/shmem.c              |  7 +------
 4 files changed, 29 insertions(+), 18 deletions(-)

-- 
2.46.1
Re: [PATCH v1 0/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
Posted by Thomas Huth 1 month, 2 weeks ago
On 11/10/2024 12.24, David Hildenbrand wrote:
> During testing, it was found that we can get PMD mappings in processes
> where THP (and more precisely, PMD mappings) are supposed to be disabled.
> While it works as expected for anon+shmem, the pagecache is the problematic
> bit.
> 
> For s390 KVM this currently means that a VM backed by a file located on
> a filesystem with large folio support can crash when KVM tries accessing
> the problematic page, because the readahead logic might decide to use
> a PMD-sized THP and faulting it into the page tables will install a
> PMD mapping, something that s390 KVM cannot tolerate.
> 
> This might also be a problem with HW that does not support PMD mappings,
> but I did not try reproducing it.
> 
> Fix it by respecting the ways to disable THPs when deciding whether we
> can install a PMD mapping. khugepaged should already be taking care of
> not collapsing if THPs are effectively disabled for the hw/process/vma.
> 
> An earlier patch was tested by Thomas Huth, this one still needs to
> be retested; sending it out already.

I just finished testing your new version of these patches here, and I can 
confirm that they are fixing the problem that I was facing, so:

Tested-by: Thomas Huth <thuth@redhat.com>

FWIW, the problem can be reproduced by running a KVM guest on an s390x host
like this:

qemu-system-s390x -accel kvm -nographic -m 4G -d guest_errors \
   -M s390-ccw-virtio,memory-backend=mem-machine_mem \
   -object memory-backend-file,size=4294967296,prealloc=true,mem-path=$HOME/myfile,share=true,id=mem-machine_mem

Without the fix, the guest crashes immediately before being able to execute
the first instruction. With the fix applied, you can still see the first
messages of the guest firmware, indicating that the guest started successfully.

Thank you very much for the fix, David!

  Thomas
Re: [PATCH v1 0/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
Posted by David Hildenbrand 1 month, 2 weeks ago
On 11.10.24 13:39, Thomas Huth wrote:
> On 11/10/2024 12.24, David Hildenbrand wrote:
>> During testing, it was found that we can get PMD mappings in processes
>> where THP (and more precisely, PMD mappings) are supposed to be disabled.
>> While it works as expected for anon+shmem, the pagecache is the problematic
>> bit.
>>
>> For s390 KVM this currently means that a VM backed by a file located on
>> a filesystem with large folio support can crash when KVM tries accessing
>> the problematic page, because the readahead logic might decide to use
>> a PMD-sized THP and faulting it into the page tables will install a
>> PMD mapping, something that s390 KVM cannot tolerate.
>>
>> This might also be a problem with HW that does not support PMD mappings,
>> but I did not try reproducing it.
>>
>> Fix it by respecting the ways to disable THPs when deciding whether we
>> can install a PMD mapping. khugepaged should already be taking care of
>> not collapsing if THPs are effectively disabled for the hw/process/vma.
>>
>> An earlier patch was tested by Thomas Huth, this one still needs to
>> be retested; sending it out already.
> 
> I just finished testing your new version of these patches here, and I can
> confirm that they are fixing the problem that I was facing, so:
> 
> Tested-by: Thomas Huth <thuth@redhat.com>
> 
> FWIW, the problem can be reproduced by running a KVM guest on an s390x host
> like this:
> 
> qemu-system-s390x -accel kvm -nographic -m 4G -d guest_errors \
>     -M s390-ccw-virtio,memory-backend=mem-machine_mem \
>     -object memory-backend-file,size=4294967296,prealloc=true,mem-path=$HOME/myfile,share=true,id=mem-machine_mem
> 
> Without the fix, the guest crashes immediately before being able to execute
> the first instruction. With the fix applied, you can still see the first
> messages of the guest firmware, indicating that the guest started successfully.
> 
> Thank you very much for the fix, David!

Thanks for the quick test, Thomas!

-- 
Cheers,

David / dhildenb