Hi Sean, Paolo, Oliver, + others,

Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
Sean. I realize it has been 6 months since the v2; I hope that isn't an
issue.

I am working on the QEMU side of the changes as I get time. Let me know
if it's important for me to send those patches out for this series to be
merged.

Be aware that this series will have non-trivial conflicts with Fuad's
user mapping support for guest_memfd series[1]. For example, for the
arm64 change he is making, the newly introduced gmem_abort() would need
to be enlightened to handle KVM Userfault exits.

Changelog:
v2[2]->v3:
- Pull in Sean's changes to genericize struct kvm_page_fault and use it
  for arm64. Many of these patches now have Sean's SoB.
- Pull in Sean's small rename and squashing of the main patches.
- Add kvm_arch_userfault_enabled() in place of calling
  kvm_arch_flush_shadow_memslot() directly from generic code.
- Pull in Xin Li's documentation section number fix for
  KVM_CAP_ARM_WRITABLE_IMP_ID_REGS[3].
v1[4]->v2:
- For arm64, no longer zap stage 2 when disabling KVM_MEM_USERFAULT
  (thanks Oliver).
- Fix the userfault_bitmap validation and casts (thanks kernel test
  robot).
- Fix _Atomic cast for the userfault bitmap in the selftest (thanks
  kernel test robot).
- Pick up Reviewed-by on doc changes (thanks Bagas).

Below is the cover letter from v1, mostly unchanged:

Please see the RFC[5] for the problem description. In summary,
guest_memfd VMs have no mechanism for doing post-copy live migration.
KVM Userfault provides such a mechanism.

There is a second problem that KVM Userfault solves: userfaultfd-based
post-copy doesn't scale very well. KVM Userfault when used with
userfaultfd can scale much better in the common case that most post-copy
demand fetches are a result of vCPU access violations. This is a
continuation of the solution Anish was working on[6].
This aspect of KVM Userfault is important for userfaultfd-based live
migration when scaling up to hundreds of vCPUs with ~30us network latency
for a PAGE_SIZE demand-fetch.

The implementation in this series differs from the RFC[5]. It adds:
1. a new memslot flag, KVM_MEM_USERFAULT,
2. a new parameter, userfault_bitmap, in struct kvm_memory_slot,
3. a new flag for the KVM_EXIT_MEMORY_FAULT exit reason,
   KVM_MEMORY_EXIT_FLAG_USERFAULT,
4. a new KVM capability, KVM_CAP_USERFAULT.

KVM Userfault does not attempt to catch KVM's own accesses to guest
memory. That is left up to userfaultfd.

When enabling KVM_MEM_USERFAULT for a memslot, the second-stage mappings
are zapped, and new faults will check `userfault_bitmap` to see if the
fault should exit to userspace.

When KVM_MEM_USERFAULT is enabled, only PAGE_SIZE mappings are
permitted.

When disabling KVM_MEM_USERFAULT, huge mappings are reconstructed in the
same way as when dirty logging is disabled. So on x86, huge mappings will
be reconstructed, but on arm64, they won't be.

KVM Userfault is not compatible with async page faults. Nikita has
proposed a new implementation of async page faults that is more
userspace-driven and *is* compatible with KVM Userfault[7].

See v1 for more performance details[4]. They are unchanged in this
version.

This series is based on the latest kvm-x86/next.
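As a rough illustration of the bitmap protocol described above, here is a minimal, self-contained C simulation. It does not call KVM at all. It assumes that a set bit in userfault_bitmap means "a vCPU fault on this page must exit to userspace" and that userspace clears a bit only after it has populated the page; a plain atomic array stands in for the memslot's bitmap, and the names fault_should_exit(), resolve_userfault(), vcpu_read() and run_demo(), as well as the tiny page size, are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define NR_PAGES 128      /* hypothetical memslot size, in pages */
#define SIM_PAGE_SIZE 64  /* hypothetical tiny "page" to keep the demo small */
#define BITMAP_LONGS ((NR_PAGES + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in for the memslot's userfault_bitmap (assumed semantics: a set
 * bit means a vCPU fault on that page must exit to userspace). */
static _Atomic unsigned long userfault_bitmap[BITMAP_LONGS];

static unsigned char source[NR_PAGES][SIM_PAGE_SIZE]; /* migration source */
static unsigned char guest[NR_PAGES][SIM_PAGE_SIZE];  /* destination memory */
static unsigned nr_exits;

/* Userspace, when enabling the flag: nothing has been fetched yet. */
static void mark_all_missing(void)
{
	for (size_t i = 0; i < BITMAP_LONGS; i++)
		atomic_store(&userfault_bitmap[i], ~0UL);
}

/* Models the check made on a vCPU fault: should this fault exit? */
static bool fault_should_exit(size_t page)
{
	unsigned long word = atomic_load(&userfault_bitmap[page / BITS_PER_LONG]);

	return word & (1UL << (page % BITS_PER_LONG));
}

/* Userspace, after a userfault exit: populate the page first, then clear
 * its bit so that the retried access may map the now-valid page. */
static void resolve_userfault(size_t page)
{
	assert(page < NR_PAGES);
	memcpy(guest[page], source[page], SIM_PAGE_SIZE);
	atomic_fetch_and(&userfault_bitmap[page / BITS_PER_LONG],
			 ~(1UL << (page % BITS_PER_LONG)));
}

/* Simulated guest read: "exit" to userspace while the page's bit is set,
 * then retry the access. */
static unsigned char vcpu_read(size_t page, size_t off)
{
	while (fault_should_exit(page)) {
		nr_exits++;
		resolve_userfault(page);
	}
	return guest[page][off];
}

/* Touches page 5 twice and page 70 once; returns the number of exits and
 * stores the sum of the bytes read in *sum. */
static unsigned run_demo(unsigned *sum)
{
	for (size_t p = 0; p < NR_PAGES; p++)
		memset(source[p], (int)p, SIM_PAGE_SIZE);
	memset(guest, 0, sizeof(guest));
	nr_exits = 0;
	mark_all_missing();

	*sum = vcpu_read(5, 0);
	*sum += vcpu_read(5, 10);  /* bit already cleared: no exit */
	*sum += vcpu_read(70, 3);  /* lives in a different bitmap word */
	return nr_exits;
}
```

In this model, run_demo() sees two exits, one for page 5 and one for page 70. The second read of page 5 finds its bit already cleared. The ordering inside resolve_userfault() is the point of the sketch: clearing the bit before copying the data would let a racing access map a page that has not been fetched yet.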
[1]: https://lore.kernel.org/kvm/20250611133330.1514028-1-tabba@google.com/
[2]: https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com/
[3]: https://lore.kernel.org/kvm/20250414165146.2279450-1-xin@zytor.com/
[4]: https://lore.kernel.org/kvm/20241204191349.1730936-1-jthoughton@google.com/
[5]: https://lore.kernel.org/kvm/20240710234222.2333120-1-jthoughton@google.com/
[6]: https://lore.kernel.org/all/20240215235405.368539-1-amoorthy@google.com/
[7]: https://lore.kernel.org/kvm/20241118123948.4796-1-kalyazin@amazon.com/#t

James Houghton (11):
  KVM: Add common infrastructure for KVM Userfaults
  KVM: x86: Add support for KVM userfault exits
  KVM: arm64: Add support for KVM userfault exits
  KVM: Enable and advertise support for KVM userfault exits
  KVM: selftests: Fix vm_mem_region_set_flags docstring
  KVM: selftests: Fix prefault_mem logic
  KVM: selftests: Add va_start/end into uffd_desc
  KVM: selftests: Add KVM Userfault mode to demand_paging_test
  KVM: selftests: Inform set_memory_region_test of KVM_MEM_USERFAULT
  KVM: selftests: Add KVM_MEM_USERFAULT + guest_memfd toggle tests
  KVM: Documentation: Add KVM_CAP_USERFAULT and KVM_MEM_USERFAULT
    details

Sean Christopherson (3):
  KVM: x86/mmu: Move "struct kvm_page_fault" definition to
    asm/kvm_host.h
  KVM: arm64: Add "struct kvm_page_fault" to gather common fault
    variables
  KVM: arm64: x86: Require "struct kvm_page_fault" for memory fault
    exits

Xin Li (Intel) (1):
  KVM: Documentation: Fix section number for
    KVM_CAP_ARM_WRITABLE_IMP_ID_REGS

 Documentation/virt/kvm/api.rst                |  35 ++++-
 arch/arm64/include/asm/kvm_host.h             |   9 ++
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/mmu.c                          |  48 +++---
 arch/x86/include/asm/kvm_host.h               |  68 +++++++-
 arch/x86/kvm/Kconfig                          |   1 +
 arch/x86/kvm/mmu/mmu.c                        |  13 +-
 arch/x86/kvm/mmu/mmu_internal.h               |  77 +---------
 arch/x86/kvm/x86.c                            |  27 ++--
 include/linux/kvm_host.h                      |  49 +++++-
 include/uapi/linux/kvm.h                      |   6 +-
 .../selftests/kvm/demand_paging_test.c        | 145 ++++++++++++++++--
 .../testing/selftests/kvm/include/kvm_util.h  |   5 +
 .../selftests/kvm/include/userfaultfd_util.h  |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  42 ++++-
 .../selftests/kvm/lib/userfaultfd_util.c      |   2 +
 .../selftests/kvm/set_memory_region_test.c    |  33 ++++
 virt/kvm/Kconfig                              |   3 +
 virt/kvm/kvm_main.c                           |  57 ++++++-
 19 files changed, 489 insertions(+), 134 deletions(-)


base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
-- 
2.50.0.rc2.692.g299adb8693-goog
On Wed, Jun 18, 2025 at 04:24:09AM +0000, James Houghton wrote:
> Hi Sean, Paolo, Oliver, + others,
>
> Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
> Sean. I realize it has been 6 months since the v2; I hope that isn't an
> issue.

Not one bit. The only thing I look for in patch frequency is the urgency
with which the author wants to get something in.

> I am working on the QEMU side of the changes as I get time. Let me know
> if it's important for me to send those patches out for this series to be
> merged.

It'd be good to know we have line of sight on a functional implementation
here, i.e. uffd-based handling of non-vCPU accesses. I'm not expecting
surprises here, but patches always speak louder than words. Don't want to
block the kernel pieces if that's a time sink though.

And FWIW, besides the nitpicking I'm quite happy with the way this is
shaping up.

> Be aware that this series will have non-trivial conflicts with Fuad's
> user mapping support for guest_memfd series[1]. For example, for the
> arm64 change he is making, the newly introduced gmem_abort() would need
> to be enlightened to handle KVM Userfault exits.

Appreciate the heads up!

Thanks,
Oliver
On 18/06/2025 05:24, James Houghton wrote:
> Hi Sean, Paolo, Oliver, + others,
>
> Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
> Sean. I realize it has been 6 months since the v2; I hope that isn't an
> issue.
>
> I am working on the QEMU side of the changes as I get time. Let me know
> if it's important for me to send those patches out for this series to be
> merged.

Hi Sean and others,

Are there any blockers for merging this series? We would like to use
the functionality in Firecracker for restoring guest_memfd-backed VMs
from snapshots via UFFD [1]. [2] is a Firecracker feature branch that
builds on top of KVM userfault, along with direct map removal [3], write
syscall [4] and UFFD support [5] in guest_memfd (currently in discussion
with MM at [6]) series.

Thanks,
Nikita

[1]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
[2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3]: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk
[4]: https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com
[5]: https://lore.kernel.org/kvm/20250404154352.23078-1-kalyazin@amazon.com
[6]: https://lore.kernel.org/linux-mm/20250627154655.2085903-1-peterx@redhat.com

> Be aware that this series will have non-trivial conflicts with Fuad's
> user mapping support for guest_memfd series[1]. For example, for the
> arm64 change he is making, the newly introduced gmem_abort() would need
> to be enlightened to handle KVM Userfault exits.
>
> Changelog:
> v2[2]->v3:
> - Pull in Sean's changes to genericize struct kvm_page_fault and use it
>   for arm64. Many of these patches now have Sean's SoB.
> - Pull in Sean's small rename and squashing of the main patches.
> - Add kvm_arch_userfault_enabled() in place of calling
>   kvm_arch_flush_shadow_memslot() directly from generic code.
On Thu, Sep 4, 2025 at 9:43 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>
> On 18/06/2025 05:24, James Houghton wrote:
> > Hi Sean, Paolo, Oliver, + others,
> >
> > Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
> > Sean. I realize it has been 6 months since the v2; I hope that isn't an
> > issue.
> >
> > I am working on the QEMU side of the changes as I get time. Let me know
> > if it's important for me to send those patches out for this series to be
> > merged.
>
> Hi Sean and others,
>
> Are there any blockers for merging this series? We would like to use
> the functionality in Firecracker for restoring guest_memfd-backed VMs
> from snapshots via UFFD [1]. [2] is a Firecracker feature branch that
> builds on top of KVM userfault, along with direct map removal [3], write
> syscall [4] and UFFD support [5] in guest_memfd (currently in discussion
> with MM at [6]) series.

Glad to hear that you need this series. :)

I am on the hook to get some QEMU patches to demonstrate that KVM
Userfault can work well with it. I'll try to get that done ASAP now
that you've expressed interest. The firecracker patches are a nice
demonstration that this could work too... (I wish the VMM I work on
was open-source).

I think the current "blocker" is the kvm_page_fault stuff[*]; KVM
Userfault will be the first user of this API. I'll review that series
in the next few days. I'm pretty sure Sean doesn't have any conceptual
issues with KVM Userfault as implemented in this series.
[*]: https://lore.kernel.org/linux-arm-kernel/20250821210042.3451147-1-seanjc@google.com/

>
> Thanks,
> Nikita
>
> [1]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
> [2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
> [3]: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk
> [4]: https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com
> [5]: https://lore.kernel.org/kvm/20250404154352.23078-1-kalyazin@amazon.com
> [6]: https://lore.kernel.org/linux-mm/20250627154655.2085903-1-peterx@redhat.com
On Thu, Sep 04, 2025, James Houghton wrote:
> On Thu, Sep 4, 2025 at 9:43 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
> > Are there any blockers for merging this series? We would like to use
> > the functionality in Firecracker for restoring guest_memfd-backed VMs
> > from snapshots via UFFD [1]. [2] is a Firecracker feature branch that
> > builds on top of KVM userfault, along with direct map removal [3], write
> > syscall [4] and UFFD support [5] in guest_memfd (currently in discussion
> > with MM at [6]) series.
>
> Glad to hear that you need this series. :)

Likewise (though I had slightly-advanced warning from Patrick that Firecracker
wants KVM Userfault). The main reason I haven't pushed harder on this series is
that I didn't think anyone wanted to use it within the next ~year.

> I am on the hook to get some QEMU patches to demonstrate that KVM
> Userfault can work well with it. I'll try to get that done ASAP now
> that you've expressed interest. The firecracker patches are a nice
> demonstration that this could work too... (I wish the VMM I work on
> was open-source).
>
> I think the current "blocker" is the kvm_page_fault stuff[*]; KVM
> Userfault will be the first user of this API. I'll review that series
> in the next few days. I'm pretty sure Sean doesn't have any conceptual
> issues with KVM Userfault as implemented in this series.

Yep, Oliver and I (and anyone else that has an opinion) just need to align on the
interface for arch-neutral code. I think that's mostly on me to spin a v2, and
maybe to show how it all looks when integrated with the userfault stuff.
On 05/09/2025 13:27, Sean Christopherson wrote:
> On Thu, Sep 04, 2025, James Houghton wrote:
>> On Thu, Sep 4, 2025 at 9:43 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>>> Are there any blockers for merging this series? We would like to use
>>> the functionality in Firecracker for restoring guest_memfd-backed VMs
>>> from snapshots via UFFD [1]. [2] is a Firecracker feature branch that
>>> builds on top of KVM userfault, along with direct map removal [3], write
>>> syscall [4] and UFFD support [5] in guest_memfd (currently in discussion
>>> with MM at [6]) series.
>>
>> Glad to hear that you need this series. :)
>
> Likewise (though I had slightly-advanced warning from Patrick that Firecracker
> wants KVM Userfault). The main reason I haven't pushed harder on this series is
> that I didn't think anyone wanted to use it within the next ~year.
>
>> I am on the hook to get some QEMU patches to demonstrate that KVM
>> Userfault can work well with it. I'll try to get that done ASAP now
>> that you've expressed interest. The firecracker patches are a nice
>> demonstration that this could work too... (I wish the VMM I work on
>> was open-source).
>>
>> I think the current "blocker" is the kvm_page_fault stuff[*]; KVM
>> Userfault will be the first user of this API. I'll review that series
>> in the next few days. I'm pretty sure Sean doesn't have any conceptual
>> issues with KVM Userfault as implemented in this series.
>
> Yep, Oliver and I (and anyone else that has an opinion) just need to align on the
> interface for arch-neutral code. I think that's mostly on me to spin a v2, and
> maybe to show how it all looks when integrated with the userfault stuff.

Sounds good, thanks. Do you think you'll be having time to work on the
v2 soonish? Is defining and implementing the interface a strict
prerequisite for this series?