On systems that support shared guest memory, write() is useful, for
example, for populating the initial image. Even though the same can
also be achieved by mapping the memory into userspace and memcpying
into it, write() is the more performant option because it does not need
to set up user page tables and does not cause a page fault for every
page like memcpy would. Note that memcpy cannot be accelerated via
MADV_POPULATE_WRITE, as it relies on GUP and is not supported by
guest_memfd.

Populating 512MiB of guest_memfd on an x86 machine:
 - via memcpy: 436 ms
 - via write:  202 ms (-54%)

Only PAGE_ALIGNED offset and len are allowed. Even though non-aligned
writes are technically possible, when in-place conversion support is
implemented [1], the restriction makes handling of mixed shared/private
huge pages simpler. write() will only be allowed to populate shared
pages.

When direct map removal is implemented [2]:
 - write() will not be allowed to access pages that have already
   been removed from the direct map
 - on completion, write() will remove the populated pages from the
   direct map

While it is technically possible to implement the read() syscall on
systems with shared guest memory, it is not supported as there is
currently no use case for it.

[1] https://lore.kernel.org/kvm/cover.1760731772.git.ackerleytng@google.com
[2] https://lore.kernel.org/kvm/20250924151101.2225820-1-patrick.roy@campus.lmu.de

Nikita Kalyazin (2):
  KVM: guest_memfd: add generic population via write
  KVM: selftests: update guest_memfd write tests

 Documentation/virt/kvm/api.rst                |  2 +
 include/linux/kvm_host.h                      |  2 +-
 include/uapi/linux/kvm.h                      |  1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 58 +++++++++++++++++--
 virt/kvm/guest_memfd.c                        | 52 +++++++++++++++++
 5 files changed, 108 insertions(+), 7 deletions(-)

base-commit: 8a4821412cf2c1429fffa07c012dd150f2edf78c
--
2.50.1
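For illustration only, a userspace caller of the interface described in the cover letter might look roughly like the sketch below. It is not taken from the series: `range_is_page_aligned()` and `populate_via_write()` are hypothetical helper names, the fd is assumed to be a guest_memfd that was created with write support enabled, and the use of pwrite() (rather than lseek() plus write()) is an assumption that should be checked against the selftest. The alignment check mirrors the stated rule that only page-aligned offset and len are accepted.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if offset and len are both multiples of page_size. */
static bool range_is_page_aligned(off_t offset, size_t len, long page_size)
{
	return page_size > 0 && offset >= 0 &&
	       (offset % page_size) == 0 &&
	       (len % (size_t)page_size) == 0;
}

/*
 * Hypothetical helper: copy len bytes from src into fd, starting at offset.
 * Rejects non-page-aligned requests up front with EINVAL, matching the
 * restriction described in the cover letter.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int populate_via_write(int fd, const void *src, size_t len, off_t offset)
{
	long page_size = sysconf(_SC_PAGESIZE);
	const uint8_t *p = src;

	if (!range_is_page_aligned(offset, len, page_size)) {
		errno = EINVAL;
		return -1;
	}

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data moved, retry */
			return -1;
		}
		if (n == 0) {
			errno = EIO;		/* no progress, give up instead of spinning */
			return -1;
		}
		/*
		 * Assumption: a short write can simply be resumed from where it
		 * stopped. How guest_memfd treats a remainder that is no longer
		 * page aligned is not stated in the cover letter.
		 */
		p += n;
		offset += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Because the helper only relies on pwrite(), it can be exercised against a regular temporary file; that checks the alignment and retry logic but says nothing about guest_memfd behaviour or the timings quoted above.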
On 14/11/2025 15:18, Kalyazin, Nikita wrote:
> On systems that support shared guest memory, write() is useful, for
> example, for population of the initial image. Even though the same can
> also be achieved via userspace mapping and memcpying from userspace,
> write() provides a more performant option because it does not need to
> set user page tables and it does not cause a page fault for every page
> like memcpy would. Note that memcpy cannot be accelerated via
> MADV_POPULATE_WRITE as it is not supported by guest_memfd and relies on
> GUP.
>
> Populating 512MiB of guest_memfd on a x86 machine:
> - via memcpy: 436 ms
> - via write: 202 ms (-54%)
>
> Only PAGE_ALIGNED offset and len are allowed. Even though non-aligned
> writes are technically possible, when in-place conversion support is
> implemented [1], the restriction makes handling of mixed shared/private
> huge pages simpler. write() will only be allowed to populate shared
> pages.
>
> When direct map removal is implemented [2]
> - write() will not be allowed to access pages that have already
> been removed from direct map
> - on completion, write() will remove the populated pages from
> direct map
>
> While it is technically possible to implement read() syscall on systems
> with shared guest memory, it is not supported as there is currently no
> use case for it.
>
> [1]
> https://lore.kernel.org/kvm/cover.1760731772.git.ackerleytng@google.com
> [2]
> https://lore.kernel.org/kvm/20250924151101.2225820-1-patrick.roy@campus.lmu.de
I failed to include links to previous versions:
v7:
- Sean: add GUEST_MEMFD_FLAG_WRITE and documentation for it
- Ackerley: only allow PAGE_ALIGNED offset and len
- Sean/Ackerley: formatting fixes
v6:
- https://lore.kernel.org/kvm/20251020161352.69257-1-kalyazin@amazon.com
- Make write support conditional on mmap support instead of relying on
  the up-to-date flag to decide whether writing to a page is allowed
- James: Remove dependencies on folio_test_large
- James: Remove page alignment restriction
- James: Formatting fixes
v5:
- https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com
- Replace the call to the unexported filemap_remove_folio with
  zeroing the bytes that could not be copied
- Fix checkpatch findings
v4:
- https://lore.kernel.org/kvm/20250828153049.3922-1-kalyazin@amazon.com
- Switch from implementing the write callback to write_iter
- Remove conditional compilation
v3:
- https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com
- David/Mike D: Only compile support for the write syscall if
  CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.
v2:
- https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com
- Switch from an ioctl to the write syscall to implement population
v1:
- https://lore.kernel.org/kvm/20241024095429.54052-1-kalyazin@amazon.com
>
> Nikita Kalyazin (2):
> KVM: guest_memfd: add generic population via write
> KVM: selftests: update guest_memfd write tests
>
> Documentation/virt/kvm/api.rst | 2 +
> include/linux/kvm_host.h | 2 +-
> include/uapi/linux/kvm.h | 1 +
> .../testing/selftests/kvm/guest_memfd_test.c | 58 +++++++++++++++++--
> virt/kvm/guest_memfd.c | 52 +++++++++++++++++
> 5 files changed, 108 insertions(+), 7 deletions(-)
>
>
> base-commit: 8a4821412cf2c1429fffa07c012dd150f2edf78c
> --
> 2.50.1
>