[PATCH v3 00/15] KVM: Introduce KVM Userfault

James Houghton posted 15 patches 3 months, 3 weeks ago
[PATCH v3 00/15] KVM: Introduce KVM Userfault
Posted by James Houghton 3 months, 3 weeks ago
Hi Sean, Paolo, Oliver, + others,

Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
Sean. I realize it has been 6 months since the v2; I hope that isn't an
issue.

I am working on the QEMU side of the changes as I get time. Let me know
if it's important for me to send those patches out for this series to be
merged.

Be aware that this series will have non-trivial conflicts with Fuad's
user mapping support for guest_memfd series[1]. For example, for the
arm64 change he is making, the newly introduced gmem_abort() would need
to be enlightened to handle KVM Userfault exits.

Changelog:
v2[2]->v3:
- Pull in Sean's changes to genericize struct kvm_page_fault and use it
  for arm64. Many of these patches now have Sean's SoB.
- Pull in Sean's small rename and squashing of the main patches.
- Add kvm_arch_userfault_enabled() in place of calling
  kvm_arch_flush_shadow_memslot() directly from generic code.
- Pull in Xin Li's documentation section number fix for
  KVM_CAP_ARM_WRITABLE_IMP_ID_REGS[3].
v1[4]->v2:
- For arm64, no longer zap stage 2 when disabling KVM_MEM_USERFAULT
  (thanks Oliver).
- Fix the userfault_bitmap validation and casts (thanks kernel test
  robot).
- Fix _Atomic cast for the userfault bitmap in the selftest (thanks
  kernel test robot).
- Pick up Reviewed-by on doc changes (thanks Bagas).

Below is the cover letter from v1, mostly unchanged:

Please see the RFC[5] for the problem description. In summary,
guest_memfd VMs have no mechanism for doing post-copy live migration.
KVM Userfault provides such a mechanism.

There is a second problem that KVM Userfault solves: userfaultfd-based
post-copy doesn't scale very well. When used alongside userfaultfd, KVM
Userfault can scale much better in the common case where most post-copy
demand fetches result from vCPU access violations. This is a
continuation of the solution Anish was working on[6]. This aspect of
KVM Userfault is important for userfaultfd-based live migration when
scaling up to hundreds of vCPUs with ~30us network latency for a
PAGE_SIZE demand-fetch.

The implementation in this series differs from the RFC[5]. It adds:
 1. a new memslot flag, KVM_MEM_USERFAULT,
 2. a new parameter, userfault_bitmap, in struct kvm_memory_slot,
 3. a new KVM_EXIT_MEMORY_FAULT flag, KVM_MEMORY_EXIT_FLAG_USERFAULT,
 4. a new KVM capability, KVM_CAP_USERFAULT.
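The userfault_bitmap in item 2 implies one bit per page of the memslot. Below is a minimal sketch of how userspace might size and initialize such a bitmap at the start of post-copy. It assumes a 4 KiB page size, `unsigned long` words, and that a set bit means "not yet fetched, exit to userspace"; that polarity is an assumption here. The helper names are hypothetical and not part of the series.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed page size for this sketch; real code would query it. */
#define SKETCH_PAGE_SIZE 4096UL
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: number of words needed for one bit per page. */
static size_t userfault_bitmap_longs(uint64_t memslot_size)
{
	uint64_t npages = memslot_size / SKETCH_PAGE_SIZE;

	return (npages + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
}

/*
 * Hypothetical helper: allocate a bitmap with every page marked as
 * "not yet fetched" (all bits set, assumed polarity). Trailing padding
 * bits past the last page are also set, which this sketch ignores.
 */
static unsigned long *alloc_userfault_bitmap(uint64_t memslot_size)
{
	size_t n = userfault_bitmap_longs(memslot_size);
	unsigned long *bitmap = malloc(n * sizeof(*bitmap));

	if (bitmap)
		memset(bitmap, 0xff, n * sizeof(*bitmap));
	return bitmap;
}
```

Rounding up to whole words keeps the last partial word addressable, so a bit index derived from any page in the slot stays in bounds.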

KVM Userfault does not attempt to catch KVM's own accesses to guest
memory. That is left up to userfaultfd.
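Because vCPU faults arrive as KVM Userfault exits while KVM's own accesses arrive through userfaultfd, a VMM would plausibly route both to a single "page fetched" step. The sketch below illustrates that step under stated assumptions and is not code from this series. `sketch_complete_fetch` and `fill_hva` are hypothetical: `fill_hva` is assumed to be an alias mapping of the memslot that is not registered with userfaultfd, and the userfaultfd wake-up (for example UFFDIO_CONTINUE or UFFDIO_WAKE) is omitted. The page is filled before its bit is cleared. This ordering is presumably required, since a vCPU that observes a cleared bit may map the page immediately.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096UL
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical completion step for one demand-fetched page.
 * fill_hva: assumed alias mapping of the memslot, not uffd-registered.
 * bitmap:   userfault bitmap, set bit = "not yet fetched" (assumed).
 */
static void sketch_complete_fetch(unsigned char *fill_hva,
				  _Atomic unsigned long *bitmap,
				  uint64_t page_idx,
				  const unsigned char *fetched)
{
	unsigned long mask = 1UL << (page_idx % BITS_PER_ULONG);

	/* 1. Install the fetched contents first. */
	memcpy(fill_hva + page_idx * SKETCH_PAGE_SIZE, fetched,
	       SKETCH_PAGE_SIZE);

	/*
	 * 2. Then clear the bit. Release ordering makes the copy above
	 * visible to any thread that observes the cleared bit.
	 */
	atomic_fetch_and_explicit(&bitmap[page_idx / BITS_PER_ULONG], ~mask,
				  memory_order_release);
}
```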

When enabling KVM_MEM_USERFAULT for a memslot, the second-stage mappings
are zapped, and new faults will check `userfault_bitmap` to see if the
fault should exit to userspace.
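The per-fault check described above can be modeled in a few lines. This is a simplified userspace simulation, not KVM's implementation: `struct sketch_memslot` and `sketch_fault_should_exit` are hypothetical, the `userfault_enabled` field stands in for KVM_MEM_USERFAULT, and a set bit is assumed to mean "exit to userspace".

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, heavily simplified memslot for illustration only. */
struct sketch_memslot {
	uint64_t base_gfn;
	uint64_t npages;
	bool userfault_enabled;                /* models KVM_MEM_USERFAULT */
	const unsigned long *userfault_bitmap; /* one bit per page */
};

/* Returns true if a fault on @gfn should exit to userspace. */
static bool sketch_fault_should_exit(const struct sketch_memslot *slot,
				     uint64_t gfn)
{
	uint64_t idx;

	if (!slot->userfault_enabled)
		return false;

	idx = gfn - slot->base_gfn;
	/* Defensive; a fault resolved to this slot is always in range. */
	if (gfn < slot->base_gfn || idx >= slot->npages)
		return false;

	return (slot->userfault_bitmap[idx / BITS_PER_ULONG] >>
		(idx % BITS_PER_ULONG)) & 1UL;
}
```

Zapping the second-stage mappings when the flag is enabled matters because already-installed mappings would never fault, and so would never reach a check like this one.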

When KVM_MEM_USERFAULT is enabled, only PAGE_SIZE mappings are
permitted.

When KVM_MEM_USERFAULT is disabled, huge mappings are handled the same
way as when dirty logging is disabled: on x86 they are reconstructed,
but on arm64 they are not.
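The mapping-size rules in the two paragraphs above reduce to a small policy. Below is a sketch with hypothetical helper names and an assumed 4 KiB PAGE_SIZE. It models only the rules stated in this cover letter, not how x86 or arm64 actually compute mapping levels.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PAGE_SIZE for this sketch. */
#define SKETCH_PAGE_SIZE 4096UL

enum sketch_arch { SKETCH_X86, SKETCH_ARM64 };

/*
 * While KVM_MEM_USERFAULT is set, every page is tracked by its own
 * bitmap bit, so only PAGE_SIZE mappings are permitted; otherwise the
 * usual (host-determined) mapping size applies.
 */
static uint64_t sketch_max_mapping_size(bool userfault_enabled,
					uint64_t host_mapping_size)
{
	return userfault_enabled ? SKETCH_PAGE_SIZE : host_mapping_size;
}

/* Mirrors dirty-log disabling: x86 rebuilds huge mappings, arm64 does not. */
static bool sketch_rebuilds_huge_on_disable(enum sketch_arch arch)
{
	return arch == SKETCH_X86;
}
```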

KVM Userfault is not compatible with async page faults. Nikita has
proposed a new, more userspace-driven implementation of async page
faults that *is* compatible with KVM Userfault[7].

See v1 for more performance details[4]. They are unchanged in this
version.

This series is based on the latest kvm-x86/next.

[1]: https://lore.kernel.org/kvm/20250611133330.1514028-1-tabba@google.com/
[2]: https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com/
[3]: https://lore.kernel.org/kvm/20250414165146.2279450-1-xin@zytor.com/
[4]: https://lore.kernel.org/kvm/20241204191349.1730936-1-jthoughton@google.com/
[5]: https://lore.kernel.org/kvm/20240710234222.2333120-1-jthoughton@google.com/
[6]: https://lore.kernel.org/all/20240215235405.368539-1-amoorthy@google.com/
[7]: https://lore.kernel.org/kvm/20241118123948.4796-1-kalyazin@amazon.com/#t

James Houghton (11):
  KVM: Add common infrastructure for KVM Userfaults
  KVM: x86: Add support for KVM userfault exits
  KVM: arm64: Add support for KVM userfault exits
  KVM: Enable and advertise support for KVM userfault exits
  KVM: selftests: Fix vm_mem_region_set_flags docstring
  KVM: selftests: Fix prefault_mem logic
  KVM: selftests: Add va_start/end into uffd_desc
  KVM: selftests: Add KVM Userfault mode to demand_paging_test
  KVM: selftests: Inform set_memory_region_test of KVM_MEM_USERFAULT
  KVM: selftests: Add KVM_MEM_USERFAULT + guest_memfd toggle tests
  KVM: Documentation: Add KVM_CAP_USERFAULT and KVM_MEM_USERFAULT
    details

Sean Christopherson (3):
  KVM: x86/mmu: Move "struct kvm_page_fault" definition to
    asm/kvm_host.h
  KVM: arm64: Add "struct kvm_page_fault" to gather common fault
    variables
  KVM: arm64: x86: Require "struct kvm_page_fault" for memory fault
    exits

Xin Li (Intel) (1):
  KVM: Documentation: Fix section number for
    KVM_CAP_ARM_WRITABLE_IMP_ID_REGS

 Documentation/virt/kvm/api.rst                |  35 ++++-
 arch/arm64/include/asm/kvm_host.h             |   9 ++
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/mmu.c                          |  48 +++---
 arch/x86/include/asm/kvm_host.h               |  68 +++++++-
 arch/x86/kvm/Kconfig                          |   1 +
 arch/x86/kvm/mmu/mmu.c                        |  13 +-
 arch/x86/kvm/mmu/mmu_internal.h               |  77 +---------
 arch/x86/kvm/x86.c                            |  27 ++--
 include/linux/kvm_host.h                      |  49 +++++-
 include/uapi/linux/kvm.h                      |   6 +-
 .../selftests/kvm/demand_paging_test.c        | 145 ++++++++++++++++--
 .../testing/selftests/kvm/include/kvm_util.h  |   5 +
 .../selftests/kvm/include/userfaultfd_util.h  |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  42 ++++-
 .../selftests/kvm/lib/userfaultfd_util.c      |   2 +
 .../selftests/kvm/set_memory_region_test.c    |  33 ++++
 virt/kvm/Kconfig                              |   3 +
 virt/kvm/kvm_main.c                           |  57 ++++++-
 19 files changed, 489 insertions(+), 134 deletions(-)


base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
-- 
2.50.0.rc2.692.g299adb8693-goog
Re: [PATCH v3 00/15] KVM: Introduce KVM Userfault
Posted by Oliver Upton 3 months, 3 weeks ago
On Wed, Jun 18, 2025 at 04:24:09AM +0000, James Houghton wrote:
> Hi Sean, Paolo, Oliver, + others,
> 
> Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
> Sean. I realize it has been 6 months since the v2; I hope that isn't an
> issue.

Not one bit. The only thing I look for in patch frequency is the urgency
with which the author wants to get something in.

> I am working on the QEMU side of the changes as I get time. Let me know
> if it's important for me to send those patches out for this series to be
> merged.

It'd be good to know we have line of sight on a functional
implementation here, i.e. uffd-based handling of non-vCPU accesses. I'm
not expecting surprises here, but patches always speak louder than
words.

Don't want to block the kernel pieces if that's a time sink though. And
FWIW, besides the nitpicking I'm quite happy with the way this is
shaping up.

> Be aware that this series will have non-trivial conflicts with Fuad's
> user mapping support for guest_memfd series[1]. For example, for the
> arm64 change he is making, the newly introduced gmem_abort() would need
> to be enlightened to handle KVM Userfault exits.

Appreciate the heads up!

Thanks,
Oliver
Re: [PATCH v3 00/15] KVM: Introduce KVM Userfault
Posted by Nikita Kalyazin 1 month ago

On 18/06/2025 05:24, James Houghton wrote:
> Hi Sean, Paolo, Oliver, + others,
> 
> Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
> Sean. I realize it has been 6 months since the v2; I hope that isn't an
> issue.
> 
> I am working on the QEMU side of the changes as I get time. Let me know
> if it's important for me to send those patches out for this series to be
> merged.

Hi Sean and others,

Are there any blockers for merging this series?  We would like to use 
the functionality in Firecracker for restoring guest_memfd-backed VMs 
from snapshots via UFFD [1].  [2] is a Firecracker feature branch that 
builds on top of KVM userfault, along with direct map removal [3], write 
syscall [4] and UFFD support [5] in guest_memfd (currently in discussion 
with MM at [6]) series.

Thanks,
Nikita

[1]: 
https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
[2]: 
https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3]: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk
[4]: https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com
[5]: https://lore.kernel.org/kvm/20250404154352.23078-1-kalyazin@amazon.com
[6]: 
https://lore.kernel.org/linux-mm/20250627154655.2085903-1-peterx@redhat.com

Re: [PATCH v3 00/15] KVM: Introduce KVM Userfault
Posted by James Houghton 1 month ago
On Thu, Sep 4, 2025 at 9:43 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>
>
>
> On 18/06/2025 05:24, James Houghton wrote:
> > Hi Sean, Paolo, Oliver, + others,
> >
> > Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
> > Sean. I realize it has been 6 months since the v2; I hope that isn't an
> > issue.
> >
> > I am working on the QEMU side of the changes as I get time. Let me know
> > if it's important for me to send those patches out for this series to be
> > merged.
>
> Hi Sean and others,
>
> Are there any blockers for merging this series?  We would like to use
> the functionality in Firecracker for restoring guest_memfd-backed VMs
> from snapshots via UFFD [1].  [2] is a Firecracker feature branch that
> builds on top of KVM userfault, along with direct map removal [3], write
> syscall [4] and UFFD support [5] in guest_memfd (currently in discussion
> with MM at [6]) series.

Glad to hear that you need this series. :)

I am on the hook to get some QEMU patches to demonstrate that KVM
Userfault can work well with it. I'll try to get that done ASAP now
that you've expressed interest. The firecracker patches are a nice
demonstration that this could work too... (I wish the VMM I work on
was open-source).

I think the current "blocker" is the kvm_page_fault stuff[*]; KVM
Userfault will be the first user of this API. I'll review that series
in the next few days. I'm pretty sure Sean doesn't have any conceptual
issues with KVM Userfault as implemented in this series.

[*]: https://lore.kernel.org/linux-arm-kernel/20250821210042.3451147-1-seanjc@google.com/

>
> Thanks,
> Nikita
>
> [1]:
> https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
> [2]:
> https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
> [3]: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk
> [4]: https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com
> [5]: https://lore.kernel.org/kvm/20250404154352.23078-1-kalyazin@amazon.com
> [6]:
> https://lore.kernel.org/linux-mm/20250627154655.2085903-1-peterx@redhat.com
Re: [PATCH v3 00/15] KVM: Introduce KVM Userfault
Posted by Sean Christopherson 1 month ago
On Thu, Sep 04, 2025, James Houghton wrote:
> On Thu, Sep 4, 2025 at 9:43 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
> > Are there any blockers for merging this series?  We would like to use
> > the functionality in Firecracker for restoring guest_memfd-backed VMs
> > from snapshots via UFFD [1].  [2] is a Firecracker feature branch that
> > builds on top of KVM userfault, along with direct map removal [3], write
> > syscall [4] and UFFD support [5] in guest_memfd (currently in discussion
> > with MM at [6]) series.
> 
> Glad to hear that you need this series. :)

Likewise (though I had slightly-advanced warning from Patrick that Firecracker
wants KVM Userfault).  The main reason I haven't pushed harder on this series is
that I didn't think anyone wanted to use it within the next ~year.

> I am on the hook to get some QEMU patches to demonstrate that KVM
> Userfault can work well with it. I'll try to get that done ASAP now
> that you've expressed interest. The firecracker patches are a nice
> demonstration that this could work too... (I wish the VMM I work on
> was open-source).
> 
> I think the current "blocker" is the kvm_page_fault stuff[*]; KVM
> Userfault will be the first user of this API. I'll review that series
> in the next few days. I'm pretty sure Sean doesn't have any conceptual
> issues with KVM Userfault as implemented in this series.

Yep, Oliver and I (and anyone else that has an opinion) just need to align on the
interface for arch-neutral code.  I think that's mostly on me to spin a v2, and
maybe to show how it all looks when integrated with the userfault stuff.
Re: [PATCH v3 00/15] KVM: Introduce KVM Userfault
Posted by Nikita Kalyazin 2 weeks, 1 day ago

On 05/09/2025 13:27, Sean Christopherson wrote:
> On Thu, Sep 04, 2025, James Houghton wrote:
>> On Thu, Sep 4, 2025 at 9:43 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>>> Are there any blockers for merging this series?  We would like to use
>>> the functionality in Firecracker for restoring guest_memfd-backed VMs
>>> from snapshots via UFFD [1].  [2] is a Firecracker feature branch that
>>> builds on top of KVM userfault, along with direct map removal [3], write
>>> syscall [4] and UFFD support [5] in guest_memfd (currently in discussion
>>> with MM at [6]) series.
>>
>> Glad to hear that you need this series. :)
> 
> Likewise (though I had slightly-advanced warning from Patrick that Firecracker
> wants KVM Userfault).  The main reason I haven't pushed harder on this series is
> that I didn't think anyone wanted to use it within the next ~year.
> 
>> I am on the hook to get some QEMU patches to demonstrate that KVM
>> Userfault can work well with it. I'll try to get that done ASAP now
>> that you've expressed interest. The firecracker patches are a nice
>> demonstration that this could work too... (I wish the VMM I work on
>> was open-source).
>>
>> I think the current "blocker" is the kvm_page_fault stuff[*]; KVM
>> Userfault will be the first user of this API. I'll review that series
>> in the next few days. I'm pretty sure Sean doesn't have any conceptual
>> issues with KVM Userfault as implemented in this series.
> 
> Yep, Oliver and I (and anyone else that has an opinion) just need to align on the
> interface for arch-neutral code.  I think that's mostly on me to spin a v2, and
> maybe to show how it all looks when integrated with the userfault stuff.

Sounds good, thanks.  Do you think you'll have time to work on the
v2 soon?  Is defining and implementing the interface a strict
prerequisite for this series?