#syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next

guest_memfd VMAs don't need to be merged, especially now, since guest_memfd
only supports PAGE_SIZE folios.

Set VM_DONTEXPAND on guest_memfd VMAs.

In addition, this disables khugepaged from operating on guest_memfd folios,
which may result in unintended merging of guest_memfd folios.

Change-Id: I5867edcb66b075b54b25260afd22a198aee76df1
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
virt/kvm/guest_memfd.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fdaea3422c30..3d4ac461c28b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -480,6 +480,12 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
 		return -EINVAL;
 	}
 
+	/*
+	 * Disable VMA merging - guest_memfd VMAs should be
+	 * static. This also stops khugepaged from operating on
+	 * guest_memfd VMAs and folios.
+	 */
+	vm_flags_set(vma, VM_DONTEXPAND);
 	vma->vm_ops = &kvm_gmem_vm_ops;
 
 	return 0;
--
2.53.0.rc2.204.g2597b5adb4-goog

Ackerley Tng <ackerleytng@google.com> writes:

> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>
> guest_memfd VMAs don't need to be merged, especially now, since guest_memfd
> only supports PAGE_SIZE folios.
>
> Set VM_DONTEXPAND on guest_memfd VMAs.
>

Local tests and syzbot agree that this fixes the issue identified. :)

I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
mapping/folio collapsing before submitting a full patch series.

David, Michael, Vishal, what do you think of the choice of setting
VM_DONTEXPAND to disable khugepaged?

+ For 4K guest_memfd, there's really nothing to expand
+ For THP and HugeTLB guest_memfd (future), we actually don't want
  expansion of the VMAs.

IIUC setting VM_DONTEXPAND doesn't affect mremap() as long as the
remapping does not involve expansion.

> In addition, this disables khugepaged from operating on guest_memfd folios,
> which may result in unintended merging of guest_memfd folios.
>
> Change-Id: I5867edcb66b075b54b25260afd22a198aee76df1
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  virt/kvm/guest_memfd.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index fdaea3422c30..3d4ac461c28b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -480,6 +480,12 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>  		return -EINVAL;
>  	}
>
> +	/*
> +	 * Disable VMA merging - guest_memfd VMAs should be
> +	 * static. This also stops khugepaged from operating on
> +	 * guest_memfd VMAs and folios.
> +	 */
> +	vm_flags_set(vma, VM_DONTEXPAND);
>  	vma->vm_ops = &kvm_gmem_vm_ops;
>
>  	return 0;
> --
> 2.53.0.rc2.204.g2597b5adb4-goog
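
For readers who want to exercise the madvise(MADV_COLLAPSE) path mentioned above, here is a minimal userspace sketch. It is not part of the patch. The helper name try_collapse() is hypothetical, and the fallback value 25 for MADV_COLLAPSE is an assumption taken from the generic uapi headers, used only when the libc headers predate the flag. To target guest_memfd, the caller would pass an address range obtained by mmap()ing a guest_memfd file descriptor, which is not shown here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* MADV_COLLAPSE is a fairly recent advice value; older libc headers may lack it. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* assumed value from the generic uapi headers */
#endif

/*
 * Hypothetical test helper (not from the patch): ask the kernel to
 * synchronously collapse [addr, addr + len) into huge pages.
 * Returns 0 on success or a negative errno value on failure.
 */
static int try_collapse(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_COLLAPSE) == 0)
		return 0;
	return -errno;
}
```

A failure is not necessarily a bug: the call can legitimately return an error when the kernel does not support the advice or when the range is not eligible for collapse.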

On Wed, Feb 04, 2026, Ackerley Tng wrote:
> Ackerley Tng <ackerleytng@google.com> writes:
>
> > #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
> >
> > guest_memfd VMAs don't need to be merged,

Why not?  There are benefits to merging VMAs that have nothing to do with folios.
E.g. map 1GiB of guest_memfd with 512*512 4KiB VMAs, and then it becomes quite
desirable to merge all of those VMAs into one.

Creating _hugepages_ doesn't add value, but that's not the same thing as merging
VMAs.

> > especially now, since guest_memfd only supports PAGE_SIZE folios.
> >
> > Set VM_DONTEXPAND on guest_memfd VMAs.
>
> Local tests and syzbot agree that this fixes the issue identified. :)
>
> I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
> mapping/folio collapsing before submitting a full patch series.
>
> David, Michael, Vishal, what do you think of the choice of setting
> VM_DONTEXPAND to disable khugepaged?

I'm not one of the above, but for me it feels very much like treating a symptom
and not fixing the underlying cause.

It seems like what KVM should do is not block one path that triggers hugepage
processing, but instead flat out disallow creating hugepages.  Unfortunately,
AFAICT, there's no existing way to prevent madvise() from clearing VM_NOHUGEPAGE,
so we can't simply force that flag.

I'd prefer not to special case guest_memfd, a la devdax, but I also want to address
this head-on, not by removing a tangentially related trigger.

> + For 4K guest_memfd, there's really nothing to expand
> + For THP and HugeTLB guest_memfd (future), we actually don't want
>   expansion of the VMAs.
>
> IIUC setting VM_DONTEXPAND doesn't affect mremap() as long as the
> remapping does not involve expansion.
>
> > In addition, this disables khugepaged from operating on guest_memfd folios,
> > which may result in unintended merging of guest_memfd folios.
> >
> > Change-Id: I5867edcb66b075b54b25260afd22a198aee76df1
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > ---
> >  virt/kvm/guest_memfd.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index fdaea3422c30..3d4ac461c28b 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -480,6 +480,12 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> >  		return -EINVAL;
> >  	}
> >
> > +	/*
> > +	 * Disable VMA merging - guest_memfd VMAs should be
> > +	 * static. This also stops khugepaged from operating on
> > +	 * guest_memfd VMAs and folios.
> > +	 */
> > +	vm_flags_set(vma, VM_DONTEXPAND);
> >  	vma->vm_ops = &kvm_gmem_vm_ops;
> >
> >  	return 0;
> > --
> > 2.53.0.rc2.204.g2597b5adb4-goog
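
The "512*512 4KiB VMAs" figure in the reply above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the 4 KiB page size comes from the example in the thread, and ASSUMED_MAX_MAP_COUNT reflects the commonly documented default of the vm.max_map_count sysctl, which a given system may override.

```c
#include <stdint.h>

/* Page size used by the example in the thread (4 KiB). */
#define EXAMPLE_PAGE_SIZE 4096ULL

/* Assumption: commonly documented default of the vm.max_map_count sysctl. */
#define ASSUMED_MAX_MAP_COUNT 65530ULL

/* Number of single-page VMAs needed to cover `bytes` if nothing merges. */
static inline uint64_t unmerged_vma_count(uint64_t bytes)
{
	return bytes / EXAMPLE_PAGE_SIZE;
}

/* Total bytes covered by `count` single-page VMAs. */
static inline uint64_t covered_bytes(uint64_t count)
{
	return count * EXAMPLE_PAGE_SIZE;
}
```

With these helpers, unmerged_vma_count(1ULL << 30) is 262144, which equals 512 * 512 and is well above the assumed per-process mapping limit. That is one concrete reason merging adjacent VMAs is valuable independently of folio size.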

On 2/4/26 22:37, Sean Christopherson wrote:
> On Wed, Feb 04, 2026, Ackerley Tng wrote:
>> Ackerley Tng <ackerleytng@google.com> writes:
>>
>>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>>
>>> guest_memfd VMAs don't need to be merged,
>
> Why not?  There are benefits to merging VMAs that have nothing to do with folios.
> E.g. map 1GiB of guest_memfd with 512*512 4KiB VMAs, and then it becomes quite
> desirable to merge all of those VMAs into one.
>
> Creating _hugepages_ doesn't add value, but that's not the same thing as merging
> VMAs.
>
>>> especially now, since guest_memfd only supports PAGE_SIZE folios.
>>>
>>> Set VM_DONTEXPAND on guest_memfd VMAs.
>>
>> Local tests and syzbot agree that this fixes the issue identified. :)
>>
>> I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
>> mapping/folio collapsing before submitting a full patch series.
>>
>> David, Michael, Vishal, what do you think of the choice of setting
>> VM_DONTEXPAND to disable khugepaged?
>
> I'm not one of the above, but for me it feels very much like treating a symptom
> and not fixing the underlying cause.

And you are spot-on :)

>
> It seems like what KVM should do is not block one path that triggers hugepage
> processing, but instead flat out disallow creating hugepages.  Unfortunately,
> AFAICT, there's no existing way to prevent madvise() from clearing VM_NOHUGEPAGE,
> so we can't simply force that flag.
>
> I'd prefer not to special case guest_memfd, a la devdax, but I also want to address
> this head-on, not by removing a tangentially related trigger.

VM_NOHUGEPAGE also smells like the wrong thing. This is a file limitation.

!thp_vma_allowable_order() must take care of that somehow down in
__thp_vma_allowable_orders(), by checking the file.

Likely the file_thp_enabled() check is the culprit with
CONFIG_READ_ONLY_THP_FOR_FS?

Maybe we need a flag to say "even not CONFIG_READ_ONLY_THP_FOR_FS".

I wonder how we handle that for secretmem. Too late for me, going to bed :)

-- 
Cheers,

David

"David Hildenbrand (arm)" <david@kernel.org> writes:

> On 2/4/26 22:37, Sean Christopherson wrote:
>> On Wed, Feb 04, 2026, Ackerley Tng wrote:
>>> Ackerley Tng <ackerleytng@google.com> writes:
>>>
>>>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>>>
>>>> guest_memfd VMAs don't need to be merged,
>>
>> Why not?  There are benefits to merging VMAs that have nothing to do with folios.
>> E.g. map 1GiB of guest_memfd with 512*512 4KiB VMAs, and then it becomes quite
>> desirable to merge all of those VMAs into one.
>>

I didn't realise VM_DONTEXPAND's no expansion policy extends to the case
where adjacent VMAs with the same flags, etc. automatically merge. Since
VM_DONTEXPAND blocks this kind of expansion, I agree VM_DONTEXPAND is
not great.

>> Creating _hugepages_ doesn't add value, but that's not the same thing as merging
>> VMAs.
>>
>>>> especially now, since guest_memfd only supports PAGE_SIZE folios.
>>>>
>>>> Set VM_DONTEXPAND on guest_memfd VMAs.
>>>
>>> Local tests and syzbot agree that this fixes the issue identified. :)
>>>
>>> I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
>>> mapping/folio collapsing before submitting a full patch series.
>>>
>>> David, Michael, Vishal, what do you think of the choice of setting
>>> VM_DONTEXPAND to disable khugepaged?
>>
>> I'm not one of the above, but for me it feels very much like treating a symptom

Was going to find some solution before getting to you to save you some
time :)

>> and not fixing the underlying cause.
>
> And you are spot-on :)
>
>>
>> It seems like what KVM should do is not block one path that triggers hugepage
>> processing, but instead flat out disallow creating hugepages.  Unfortunately,

__filemap_get_folio_mpol(), which we use in kvm_gmem_get_folio(), looks
up mapping_min_folio_order() to determine what order to allocate. I
think we could lock that down to always use order 0.

I tried that here [1] but in this case khugepaged allocates new folios
for guest_memfd (and others) directly in collapse_file(), explicitly
specifying PMD_ORDER. I took a look and wasn't able to find a central
callback/ops to catch all fs allocations.

[1] https://lore.kernel.org/all/6982553e.a00a0220.34fa92.0009.GAE@google.com/

>> AFAICT, there's no existing way to prevent madvise() from clearing VM_NOHUGEPAGE,
>> so we can't simply force that flag.
>>
>> I'd prefer not to special case guest_memfd, a la devdax, but I also want to address
>> this head-on, not by removing a tangentially related trigger.
>
> VM_NOHUGEPAGE also smells like the wrong thing. This is a file limitation.
>
> !thp_vma_allowable_order() must take care of that somehow down in
> __thp_vma_allowable_orders(), by checking the file.
>
> Likely the file_thp_enabled() check is the culprit with
> CONFIG_READ_ONLY_THP_FOR_FS?
>
> Maybe we need a flag to say "even not CONFIG_READ_ONLY_THP_FOR_FS".
>
> I wonder how we handle that for secretmem. Too late for me, going to bed :)
>

Let me look deeper into this. Thanks!

> --
> Cheers,
>
> David
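
The point about two separate allocation paths can be pictured with a toy model. The sketch below is not kernel code: every name in it is hypothetical, and TOY_PMD_ORDER assumes 4 KiB base pages with 2 MiB PMD mappings. It only mirrors the behavior described in the reply above, where the lookup path derives its order from the mapping's limits while the collapse path passes a PMD-sized order explicitly.

```c
/* Toy stand-ins; none of these are kernel definitions. */
#define TOY_PMD_ORDER 9U	/* assumes 2 MiB PMD mappings over 4 KiB pages */

struct toy_mapping {
	unsigned int min_order;	/* models the mapping's minimum folio order */
	unsigned int max_order;	/* models the mapping's maximum folio order */
};

/* Lookup-style path: the order is derived from the mapping's limits. */
static unsigned int toy_lookup_alloc_order(const struct toy_mapping *m)
{
	return m->min_order;
}

/* Collapse-style path: the caller picks the order; the limits go unused. */
static unsigned int toy_collapse_alloc_order(const struct toy_mapping *m)
{
	(void)m;
	return TOY_PMD_ORDER;
}
```

In this model, pinning both limits to 0 makes the lookup path return order 0, yet the collapse path still returns 9. That matches the observation that restricting the mapping's folio order alone does not stop khugepaged from allocating PMD-sized folios.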

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com

Tested on:

commit:         0499add8 Merge tag 'kvm-x86-fixes-6.19-rc1' of https:/..
git tree:       git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
console output: https://syzkaller.appspot.com/x/log.txt?x=1778a402580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3aec2f7e1730a8eb
dashboard link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
patch:          https://syzkaller.appspot.com/x/patch.diff?x=13b847fa580000

Note: testing is done by a robot and is best-effort only.