[PATCH RFC 00/12] guest_memfd: support in-place memory conversion

Michael Roth posted 12 patches 2 days, 16 hours ago
accel/kvm/kvm-all.c                                | 286 +++++++++++--
accel/stubs/kvm-stub.c                             |   9 +-
backends/confidential-guest-support.c              |  25 ++
backends/hostmem-guest-memfd.c                     |  93 +++++
backends/meson.build                               |   1 +
include/standard-headers/drm/drm_fourcc.h          |  28 +-
include/standard-headers/linux/const.h             |  18 +
include/standard-headers/linux/ethtool.h           |  28 +-
include/standard-headers/linux/input-event-codes.h |  13 +
include/standard-headers/linux/pci_regs.h          |  71 +++-
include/standard-headers/linux/typelimits.h        |   8 +
include/standard-headers/linux/virtio_ring.h       |   5 +-
include/standard-headers/linux/virtio_rtc.h        | 237 +++++++++++
include/standard-headers/linux/vmclock-abi.h       |  20 +
include/system/confidential-guest-support.h        |  14 +
include/system/hostmem.h                           |   1 +
include/system/kvm.h                               |   3 +-
include/system/memory.h                            |   8 +-
linux-headers/asm-arm64/kvm.h                      |   1 +
linux-headers/asm-arm64/unistd_64.h                |   1 +
linux-headers/asm-generic/unistd.h                 |   5 +-
linux-headers/asm-loongarch/kvm.h                  |   5 +
linux-headers/asm-loongarch/kvm_para.h             |   1 +
linux-headers/asm-loongarch/unistd_64.h            |   2 +
linux-headers/asm-mips/unistd_n32.h                |   1 +
linux-headers/asm-mips/unistd_n64.h                |   1 +
linux-headers/asm-mips/unistd_o32.h                |   1 +
linux-headers/asm-powerpc/unistd_32.h              |   1 +
linux-headers/asm-powerpc/unistd_64.h              |   1 +
linux-headers/asm-riscv/kvm.h                      |  11 +-
linux-headers/asm-riscv/ptrace.h                   |  37 ++
linux-headers/asm-riscv/unistd_32.h                |   1 +
linux-headers/asm-riscv/unistd_64.h                |   1 +
linux-headers/asm-s390/unistd_32.h                 | 446 ---------------------
linux-headers/asm-s390/unistd_64.h                 |   1 +
linux-headers/asm-x86/kvm.h                        |  21 +-
linux-headers/asm-x86/unistd_32.h                  |   1 +
linux-headers/asm-x86/unistd_64.h                  |   1 +
linux-headers/asm-x86/unistd_x32.h                 |   1 +
linux-headers/linux/const.h                        |  18 +
linux-headers/linux/iommufd.h                      |  48 +++
linux-headers/linux/kvm.h                          |  62 ++-
linux-headers/linux/mshv.h                         |   4 +-
linux-headers/linux/psp-sev.h                      |   2 +-
linux-headers/linux/stddef.h                       |   4 +
linux-headers/linux/vduse.h                        |  85 +++-
linux-headers/linux/vfio.h                         |  30 +-
qapi/qom.json                                      |  35 +-
qemu-options.hx                                    |   5 +
system/memory.c                                    |  22 +-
system/physmem.c                                   |  50 ++-
target/i386/sev.c                                  |  12 +-
52 files changed, 1253 insertions(+), 533 deletions(-)
create mode 100644 backends/hostmem-guest-memfd.c
create mode 100644 include/standard-headers/linux/typelimits.h
create mode 100644 include/standard-headers/linux/virtio_rtc.h
delete mode 100644 linux-headers/asm-s390/unistd_32.h
[PATCH RFC 00/12] guest_memfd: support in-place memory conversion
Posted by Michael Roth 2 days, 16 hours ago
This patchset is also available at:

  https://github.com/amdese/qemu/commits/snp-inplace-rfc1

which is in turn based on the following series:

  [PATCH 0/4] "guest_memfd: Fix handling for conversions of MMIO ranges"
  https://lists.gnu.org/archive/html/qemu-devel/2026-05/msg07547.html


OVERVIEW
--------

This series adds guest_memfd support for in-place conversion of memory
between private/shared, and enables it for SEV-SNP guests. It is based
on recently-added kernel support for mmap()-able guest_memfd
instances[1], which allow it to be used for shared memory, and the
following patchset[2], which adds additional guest_memfd interfaces to
allow it to be used to perform in-place conversion:

  "[PATCH v7 00/42] guest_memfd: In-place conversion support"
  https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/

That series also introduces a new 'vm_memory_attributes' KVM
module option, which sets whether memory attributes are tracked
VM-wide by KVM (vm_memory_attributes=1: the existing 'legacy' mode),
or per-guest_memfd instance (vm_memory_attributes=0: the new mode
which allows for in-place conversion). The latter is intended to
eventually deprecate the legacy mode, at which point in-place
conversion would become the primarily-supported mode.


MOTIVATION
----------

Today, SEV-SNP guests (and other CoCo VM types using guest_memfd) keep
shared and private memory on separate physical backings: a userspace
memory-backend object for shared pages, and a kernel-allocated
guest_memfd file descriptor for private pages. KVM_SET_MEMORY_ATTRIBUTES
flips which backing the guest sees for a given GPA range, and the old
backing is typically discarded / hole-punched on conversion to avoid
doubled memory usage.

That model works, but has a number of downsides that impact certain
use-cases:

  - Each conversion involves discarding pages on one side and faulting
    them in on the other, which incurs allocation overheads in the
    host kernel for every conversion.

  - Some use-cases, like pKVM[3], rely on memory isolation rather than
    encryption and rely on in-place conversion to pass through things
    like secured framebuffer memory without needing to bounce data
    through separate shared/private HPAs, which would introduce
    unacceptable latency for that sort of workload.

  - Hugetlb support[4] for guest_memfd will rely on it, since things like
    1GB hugepages with a mix of shared/private sub-ranges would generally
    require 2 1GB hugetlb pages to remain available to handle shared vs.
    private accesses, which quickly causes doubling of guest memory usage.

Recent kernel work[2] makes guest_memfd mmap()-able and lets the *same*
physical pages be used for both shared and private states for a given
GPA range, allowing the above pitfalls to be naturally avoided.

This series wires that support up in QEMU.


DESIGN
------

A new dedicated memory backend, memory-backend-guest-memfd, allocates
its memory via a guest_memfd file descriptor obtained from KVM with
the GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED flags. The fd
is mmap()ed so userspace can access pages directly while they are in
the shared state. For a normal/non-confidential VM, this backend can
be used in a similar fashion as the existing memory-backend-memfd.
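
As a rough illustration of the allocation path described above (a
minimal sketch, not code from this series), the following shows how a
guest_memfd could be requested with both flags and then mapped. The
helper names gmem_create_shared() and gmem_map() are hypothetical. The
fallback flag values are an assumption for older uapi headers and should
be checked against the kernel headers actually in use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Fallbacks for older uapi headers; values are an assumption. */
#ifndef GUEST_MEMFD_FLAG_MMAP
#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
#endif
#ifndef GUEST_MEMFD_FLAG_INIT_SHARED
#define GUEST_MEMFD_FLAG_INIT_SHARED (1ULL << 1)
#endif

#ifdef KVM_CREATE_GUEST_MEMFD
/* Ask KVM for a guest_memfd that is mmap()-able and starts out shared. */
int gmem_create_shared(int vm_fd, uint64_t size)
{
    struct kvm_create_guest_memfd args = {
        .size = size,
        .flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED,
    };

    /* Returns the new fd on success, -1 with errno set on failure. */
    return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
}
#endif

/* Map the fd so userspace can access pages while they are shared. */
void *gmem_map(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return p;
}
```

The mapping is MAP_SHARED so that userspace stores land in the same
pages the guest sees. Nothing in gmem_map() is specific to guest_memfd,
so it can be exercised against any mmap()-able fd when no KVM VM is
available.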

For confidential VMs, a new 'convert-in-place' flag is added to switch
on in-place conversion support. When running in this mode, the user
*MUST* use memory-backend-guest-memfd for backing guest RAM. A new
RAM_GUEST_MEMFD_SHARED RAMBlock flag is added to track/enforce the
dependency. Additionally, QEMU is modified to use mmap()-able
guest_memfd and set this flag for other cases where it allocates RAM
internally. As a result, block->fd will generally always be a guest_memfd,
and when RAM_GUEST_MEMFD_SHARED is set, that block->fd will be
qemu_dup()'d as the FD handle for private memory as well (which is
currently what block->guest_memfd points to). This allows the prior
non-in-place handling around block->guest_memfd to be kept mostly
unchanged.
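
The fd relationship described above can be sketched roughly as follows.
This is an illustration only: struct sketch_block, the
SKETCH_RAM_GUEST_MEMFD_SHARED bit and both helpers are hypothetical
stand-ins for the real RAMBlock fields and flag, and plain dup() is used
where QEMU would go through qemu_dup().

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the RAM_GUEST_MEMFD_SHARED RAMBlock flag. */
#define SKETCH_RAM_GUEST_MEMFD_SHARED (1u << 0)

/* Hypothetical stand-in for the relevant RAMBlock fields. */
struct sketch_block {
    uint32_t flags;
    int fd;          /* mmap()-able handle used for shared accesses */
    int guest_memfd; /* handle handed to KVM for private memory */
};

/*
 * For in-place blocks, derive the private handle from the shared one so
 * that both refer to the same guest_memfd inode.
 * Returns 0 on success, -1 on error.
 */
int sketch_block_init_private_fd(struct sketch_block *b)
{
    if (!(b->flags & SKETCH_RAM_GUEST_MEMFD_SHARED)) {
        return 0; /* legacy path: guest_memfd is allocated separately */
    }
    b->guest_memfd = dup(b->fd);
    return b->guest_memfd < 0 ? -1 : 0;
}

/* Check that two fds refer to the same underlying inode. */
int sketch_same_inode(int a, int b)
{
    struct stat sa, sb;

    if (fstat(a, &sa) != 0 || fstat(b, &sb) != 0) {
        return 0;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Keeping a second descriptor rather than reusing block->fd directly means
code that already closes or passes around block->guest_memfd does not
need to know whether in-place conversion is active.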

When running with convert-in-place=true, shared/private conversions
are no longer handled directly by KVM, but instead by a new guest_memfd
ioctl, KVM_SET_MEMORY_ATTRIBUTES2, which purposely provides similar
naming/implementation to the KVM_SET_MEMORY_ATTRIBUTES KVM ioctl that
it replaces. This series adds handling to route conversion requests to
the appropriate ioctls based on whether or not in-place conversion is
enabled.
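
The routing decision can be pictured with a small sketch like the one
below. All names here are hypothetical and the ioctl calls themselves are
omitted, since the argument layout of KVM_SET_MEMORY_ATTRIBUTES2 is
defined by the kernel series and not reproduced here. The sketch only
shows which fd a conversion request would be issued against.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_convert_path {
    SKETCH_CONVERT_VM_ATTRS,   /* KVM_SET_MEMORY_ATTRIBUTES on the VM fd */
    SKETCH_CONVERT_GMEM_ATTRS, /* KVM_SET_MEMORY_ATTRIBUTES2 on a guest_memfd */
};

struct sketch_convert_target {
    enum sketch_convert_path path;
    int fd; /* fd the ioctl would be issued against */
};

/* Pick the ioctl flavor and target fd for one conversion request. */
struct sketch_convert_target
sketch_pick_convert_target(bool convert_in_place, int vm_fd,
                           int section_gmem_fd)
{
    struct sketch_convert_target t;

    if (convert_in_place) {
        /* Attributes are tracked per guest_memfd instance. */
        t.path = SKETCH_CONVERT_GMEM_ATTRS;
        t.fd = section_gmem_fd;
    } else {
        /* Legacy mode: attributes are tracked VM-wide by KVM. */
        t.path = SKETCH_CONVERT_VM_ATTRS;
        t.fd = vm_fd;
    }
    return t;
}
```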

Since guest_memfd ioctls need to be called against the specific
guest_memfd inode associated with each memory slot/region, some
refactoring is needed to handle conversions on a per-section basis. Much of
that is inherited from the bugfix series this patchset is based on top
of, which adds the initial logic for handling multiple sections within
a range that gets heavily re-used here.
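
To illustrate why per-section handling is needed, here is a minimal
sketch (hypothetical types and names, not the actual QEMU logic) that
splits one GPA range into per-guest_memfd chunks. It assumes sections do
not overlap, skips parts of the range not covered by any section, and
does not guard against address overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest_memfd-backed section. */
struct sketch_section {
    uint64_t gpa;       /* guest-physical start */
    uint64_t size;      /* length in bytes */
    int gmem_fd;        /* guest_memfd backing this section */
    uint64_t fd_offset; /* offset of the section start within gmem_fd */
};

/* One conversion request against a single guest_memfd. */
struct sketch_chunk {
    int gmem_fd;
    uint64_t fd_offset;
    uint64_t len;
};

/*
 * Split [gpa, gpa + len) into per-section chunks.
 * Returns the number of chunks written, or -1 if 'out' is too small.
 */
int sketch_split_range(const struct sketch_section *secs, size_t nsecs,
                       uint64_t gpa, uint64_t len,
                       struct sketch_chunk *out, size_t max)
{
    uint64_t end = gpa + len;
    size_t n = 0;

    for (size_t i = 0; i < nsecs; i++) {
        uint64_t s_start = secs[i].gpa;
        uint64_t s_end = s_start + secs[i].size;
        uint64_t lo = gpa > s_start ? gpa : s_start;
        uint64_t hi = end < s_end ? end : s_end;

        if (lo >= hi) {
            continue; /* no overlap with this section */
        }
        if (n == max) {
            return -1; /* caller's buffer is too small */
        }
        out[n].gmem_fd = secs[i].gmem_fd;
        out[n].fd_offset = secs[i].fd_offset + (lo - s_start);
        out[n].len = hi - lo;
        n++;
    }
    return (int)n;
}
```

Each resulting chunk names exactly one guest_memfd, so a conversion that
straddles two memory regions turns into two separate requests, one per
backing file.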


USAGE
-----

After applying this series against a kernel with the RFC patches above
present, an SEV-SNP guest can be started with in-place conversion via:

    qemu-system-x86_64 \
        -machine q35,confidential-guest-support=sev0,memory-backend=ram0 \
        -object memory-backend-guest-memfd,id=ram0,size=8G,share=on \
        -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,\
                convert-in-place=on \
        ...

The new memory-backend-guest-memfd can also be used by normal VMs:

    qemu-system-x86_64 \
        -machine q35,memory-backend=ram0 \
        -object memory-backend-guest-memfd,id=ram0,size=8G,share=on \
        ...

At the moment this is mainly useful for testing, but in the future there may
be more use-cases around using guest_memfd as a general-purpose backend
for non-confidential VMs, so it is intended to work in this manner as
well.


NOTES/TODO
----------

  - the CPR handling to support resetting of confidential VMs is
    currently disabled when in-place conversion is enabled.
  - TDX testing would be great. In theory it can be enabled with this
    series (similarly to the top patch), but I'm not sure if there are
    other special requirements before we can switch it on.
  - kernel patches are still in-flight, but fairly mature at this point
    and nearing upstream.


REFERENCES
----------

[1] https://lore.kernel.org/kvm/20250729225455.670324-1-seanjc@google.com/
[2] https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/
[3] https://www.youtube.com/watch?v=MMfAGNW9RVg
[4] 1GB hugetlb v2


Thoughts, feedback, and testing are very much appreciated.

Thanks,

Mike


----------------------------------------------------------------
Michael Roth (12):
      accel/kvm: Decouple guest_memfd checks from memory attribute checks
      hostmem: Introduce dedicated memory backend for guest_memfd
      linux-headers: Update headers for v7 of in-place conversion kernel support
      accel/kvm: Add CGS option to control in-place conversion support
      system/memory: Re-use memory-backend-guest-memfd inode for private memory
      system/memory: Default to guest_memfd for RAM for in-place conversion
      accel/kvm: Move post-conversion updates to a separate helper
      accel/kvm: Re-order attribute notifications for in-place conversion
      accel/kvm: Support shared/private conversions via guest_memfd ioctls
      accel/kvm: Don't default to private attributes for in-place conversion
      i386/sev: Update SNP_LAUNCH_UPDATE for in-place conversion
      i386/sev: Allow in-place conversion for SEV-SNP guests

Re: [PATCH RFC 00/12] guest_memfd: support in-place memory conversion
Posted by Xiaoyao Li 2 days, 11 hours ago
On 5/28/2026 8:03 AM, Michael Roth wrote:
> This patchset is also available at:
> 
>    https://github.com/amdese/qemu/commits/snp-inplace-rfc1
> 
> which is in turn based on the following series:
> 
>    [PATCH 0/4] "guest_memfd: Fix handling for conversions of MMIO ranges"
>    https://lists.gnu.org/archive/html/qemu-devel/2026-05/msg07547.html
> 
> 
> OVERVIEW
> --------
> 
> This series adds guest_memfd support for in-place conversion of memory
> between private/shared, and enables it for SEV-SNP guests. It is based
> on recently-added kernel support for mmap()-able guest_memfd
> instances[1], which allow it to be used for shared memory, and the
> following patchset[2], which adds additional guest_memfd interfaces to
> allow it to be used to perform in-place conversion:
> 
>    "[PATCH v7 00/42] guest_memfd: In-place conversion support"
>    https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/
> 
> That series also introduces a new 'vm_memory_attributes' KVM
> module option, which sets whether memory attributes are tracked
> VM-wide by KVM (vm_memory_attributes=1: the existing 'legacy' mode),
> or per-guest_memfd instance (vm_memory_attributes=0: the new mode
> which allows for in-place conversion). The latter is intended to
> eventually deprecate the legacy mode, at which point in-place
> conversion would become the primarily-supported mode.
> 
> 
> MOTIVATION
> ----------
> 
> Today, SEV-SNP guests (and other CoCo VM types using guest_memfd) keep
> shared and private memory on separate physical backings: a userspace
> memory-backend object for shared pages, and a kernel-allocated
> guest_memfd file descriptor for private pages. KVM_SET_MEMORY_ATTRIBUTES
> flips which backing the guest sees for a given GPA range, and the old
> backing is typically discarded / hole-punched on conversion to avoid
> doubled memory usage.
> 
> That model works, but has a number of downsides that impact certain
> use-cases:
> 
>    - Each conversion involves discarding pages on one side and faulting
>      them in on the other, which incurs allocation overheads in the
>      host kernel for every conversion.
> 
>    - Some use-cases, like pKVM[3], rely on memory isolation rather than
>      encryption and rely on in-place conversion to pass through things
>      like secured framebuffer memory without needing to bounce data
>      through separate shared/private HPAs, which would introduce
>      unacceptable latency for that sort of workload.
> 
>    - Hugetlb support[4] for guest_memfd will rely on it, since things like
>      1GB hugepages with a mix of shared/private sub-ranges would generally
>      require 2 1GB hugetlb pages to remain available to handle shared vs.
>      private accesses, which quickly causes doubling of guest memory usage.
> 
> Recent kernel work[2] makes guest_memfd mmap()-able and lets the *same*
> physical pages be used for both shared and private states for a given
> GPA range, allowing the above pitfalls to be naturally avoided.
> 
> This series wires that support up in QEMU.

+ Peter,

Peter had the series[*] to enable the mmap() of guest memfd and allow it
to serve as unencrypted memory for VMs. I believe there are some
overlapping efforts.

[*] 
https://lore.kernel.org/qemu-devel/20251215205203.1185099-1-peterx@redhat.com/

> 
> DESIGN
> ------
> 
> A new dedicated memory backend, memory-backend-guest-memfd, allocates
> its memory via a guest_memfd file descriptor obtained from KVM with
> the GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED flags. 

Some quick feedback:

The design choice from Peter's series was to extend the current
hostmem-memfd backend to support guest-memfd instead of adding a new
dedicated backend.
I think we need to evaluate the pros and cons of each approach, and make
a choice.

(I will go read the other part later and provide more feedback)