This is RFC v5 of guest_memfd in-place conversion support.
Until now, guest_memfd has only supported using an entire inode's worth
of memory as either all-shared or all-private. CoCo VMs may request that
guest memory be converted between private and shared states, and the only
way to support that currently is for the userspace VMM to provide two
sources of backing memory from completely different areas of physical
memory.
pKVM has a use case for in-place sharing: the guest and host may be
cooperating on the same data, and since pKVM doesn't protect data through
encryption, copying that data between different areas of physical memory
as part of a conversion would be unnecessary work.
This series also serves as a foundation for guest_memfd huge page
support. Currently, guest_memfd only supports PAGE_SIZE pages, so when two
sources of backing memory are used, the userspace VMM can keep the total
memory used steady by punching out the pages that are not in use. Once
huge pages are available in guest_memfd, punching out pages to keep a
VM's total memory usage steady would introduce lots of fragmentation,
even if the backing memory source supports hole punching within a huge
page.
In-place conversion avoids this fragmentation by allowing the same
physical memory to be used for both shared and private memory, with
guest_memfd tracking the shared/private status of every page at per-page
granularity.
The central principle, which guest_memfd continues to uphold, is that no
guest-private page is mappable into host userspace. The entire
guest_memfd can be mmap()ed in host userspace, but accesses to
guest-private pages (as tracked by guest_memfd) result in a SIGBUS.
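To make the per-page tracking and the SIGBUS rule concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not the guest_memfd implementation: `struct gmem_model`, `gmem_set_range()` and `gmem_host_fault_ok()` are hypothetical names, and the 4 KiB page size and fixed 16-page file are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of per-page shared/private tracking. Not kernel
 * code: names, page size and file size are illustrative assumptions.
 */
#define MODEL_PAGE_SIZE 0x1000UL
#define MODEL_NR_PAGES  16

enum gmem_attr { GMEM_SHARED = 0, GMEM_PRIVATE = 1 };

struct gmem_model {
	enum gmem_attr attr[MODEL_NR_PAGES];	/* one entry per page */
};

/* Set [offset, offset + size) to @attr; both must be page aligned. */
static void gmem_set_range(struct gmem_model *g, uint64_t offset,
			   uint64_t size, enum gmem_attr attr)
{
	assert(offset % MODEL_PAGE_SIZE == 0 && size % MODEL_PAGE_SIZE == 0);
	assert((offset + size) / MODEL_PAGE_SIZE <= MODEL_NR_PAGES);

	for (uint64_t i = offset / MODEL_PAGE_SIZE;
	     i < (offset + size) / MODEL_PAGE_SIZE; i++)
		g->attr[i] = attr;
}

/*
 * Host userspace fault on the mmap()ed file: only shared pages may be
 * mapped. Returns false where the real kernel would deliver SIGBUS.
 */
static bool gmem_host_fault_ok(const struct gmem_model *g, uint64_t offset)
{
	assert(offset / MODEL_PAGE_SIZE < MODEL_NR_PAGES);
	return g->attr[offset / MODEL_PAGE_SIZE] == GMEM_SHARED;
}
```

Zero-initializing the model starts every page as shared; converting a range then flips exactly the covered pages, so a fault inside a private page is refused while its neighbors remain accessible.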
This series introduces a guest_memfd ioctl (not a KVM, VM or vCPU ioctl,
but a guest_memfd ioctl) that allows userspace to set memory attributes
(shared/private) directly through the guest_memfd. This is the
appropriate interface because shared/private-ness is a property of
memory, so the request should be sent directly to the memory provider:
guest_memfd.
Tested with both CONFIG_KVM_VM_MEMORY_ATTRIBUTES enabled and disabled:
+ tools/testing/selftests/kvm/guest_memfd_test.c
+ tools/testing/selftests/kvm/pre_fault_memory_test.c
+ tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
+ tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+ tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
+ tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
Updates for this revision:
+ For TDX and SNP, PRESERVE is supported only for to_private conversions,
and only before the VM is finalized.
+ This allows PRESERVE to be used as part of the VM memory
loading/encryption flow
+ Only support PRESERVE for to_private conversions (to_shared on
populated memory on TDX would cause zeroing)
+ Relaxed constraints for SNP and TDX to allow NULL to be passed as the
source address.
+ Dropped KVM_CAP_MEMORY_ATTRIBUTES2. KVM_CAP_MEMORY_ATTRIBUTES reports
attributes supported by the KVM_SET_MEMORY_ATTRIBUTES VM ioctl, and
KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES reports attributes supported by the
KVM_SET_MEMORY_ATTRIBUTES2 guest_memfd ioctl.
+ KVM_SET_MEMORY_ATTRIBUTES2 is not supported by the VM ioctl
+ Resolved the locking issue when kvm_gmem_get_attribute() is called from
kvm_mmu_zap_collapsible_spte() by bugging the VM. guest_memfd memslots
don't support dirty tracking, so the problematic locking is not on a
reachable code path.
+ Moved guest_memfd_conversions_test.c so that it is only compiled and
run on x86, since it depends heavily on KVM_X86_SW_PROTECTED_VM as a
testing vehicle.
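The PRESERVE rule in the first update item can be sketched as a support check. This is a hypothetical model: the enums, `check_preserve()` and the choice of `-EOPNOTSUPP` as the error are assumptions for illustration, and pKVM is grouped with SW_PROTECTED VMs only because the discussion further down this thread describes both as supporting PRESERVE in either direction.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical enums; not the real uAPI names or values. */
enum vm_type { VM_SW_PROTECTED, VM_PKVM, VM_SNP, VM_TDX };
enum conv_dir { TO_PRIVATE, TO_SHARED };

/*
 * Model of the v5 rule: SW_PROTECTED (and pKVM) accept PRESERVE in both
 * directions; SNP and TDX accept it only for to_private conversions
 * before the VM is finalized. The error code is an assumption.
 */
static int check_preserve(enum vm_type type, enum conv_dir dir, bool finalized)
{
	switch (type) {
	case VM_SW_PROTECTED:
	case VM_PKVM:
		return 0;
	case VM_SNP:
	case VM_TDX:
		return (dir == TO_PRIVATE && !finalized) ? 0 : -EOPNOTSUPP;
	}
	return -EINVAL;
}
```

Keeping the rule in one predicate makes the pre-finalize loading/encryption flow explicit: a to_private PRESERVE conversion followed by populate is accepted, and everything else on SNP/TDX is refused.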
TODOs
+ Perhaps further clarify PRESERVE flag: [8]
+ Resolve issue where guest_memfd_conversions_test, which uses the
kselftest framework, doesn't perform teardown on assertion
failure. Please see proposal at [9]
+ Test with TDX selftests. We're in the process of rebasing TDX selftests
on this series and will post updates when that's tested.
I would like feedback on:
+ Content modes: 0 (MODE_UNSPECIFIED), ZERO, and PRESERVE. Is that all
good, or does anyone think there is a use case for something else?
+ Should the content modes apply even if no attribute changes are required?
+ See notes added in "KVM: guest_memfd: Apply content modes while
setting memory attributes"
+ Possibly related: should setting attributes be allowed if some
sub-range requested already has the requested attribute?
+ How should checking for support of, and applying, the various content
modes be structured? I used overridable weak functions as defaults for
architectures that haven't defined support, and defined overrides for
x86 to show how I think it would work. For CoCo platforms, I only
implemented TDX for illustration purposes and might need help with the
other platforms. Should I have used kvm_x86_ops? I tried, and found
myself defining lots of boilerplate.
+ Using private_mem_conversions_test.sh to run
private_mem_conversions_test with different options. If this makes
sense, I'll adjust the Makefile so that private_mem_conversions_test is
only run via the script.
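The weak-function structure asked about in the content-mode item can be illustrated with a small C sketch. The function name `arch_gmem_supports_content_mode()`, the enum, and the conservative default are hypothetical; the point is only the linkage pattern: a weak generic definition that an architecture replaces by providing an ordinary (strong) definition of the same symbol in its own object file. The kernel spells the attribute `__weak`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical content-mode values, for illustration only. */
enum content_mode { MODE_UNSPECIFIED = 0, MODE_ZERO, MODE_PRESERVE };

/*
 * Generic default for architectures that have not defined support.
 * Because the symbol is weak, an arch object file (e.g. x86) can define
 * the same function normally and the linker will use that definition.
 */
__attribute__((weak)) bool arch_gmem_supports_content_mode(enum content_mode mode)
{
	/* Assumed conservative default: only "no guarantees" is supported. */
	return mode == MODE_UNSPECIFIED;
}

/* Generic caller: rejects modes the architecture does not support. */
static int gmem_check_content_mode(enum content_mode mode)
{
	return arch_gmem_supports_content_mode(mode) ? 0 : -EOPNOTSUPP;
}
```

With only this file linked, the weak default is used. The trade-off versus kvm_x86_ops is that no ops-table entry or static call is needed, but the override is selected at link time rather than per vendor module.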
This series is based on kvm/next, and here's the tree for your convenience:
https://github.com/googleprodkernel/linux-cc/commits/guest_memfd-inplace-conversion-v5
Older series:
+ RFCv4 is at [7]
+ RFCv3 is at [6]
+ RFCv2 is at [5]
+ RFCv1 is at [4]
+ Previous versions of this feature, part of other series, are available at
[1][2][3].
[1] https://lore.kernel.org/all/bd163de3118b626d1005aa88e71ef2fb72f0be0f.1726009989.git.ackerleytng@google.com/
[2] https://lore.kernel.org/all/20250117163001.2326672-6-tabba@google.com/
[3] https://lore.kernel.org/all/b784326e9ccae6a08388f1bf39db70a2204bdc51.1747264138.git.ackerleytng@google.com/
[4] https://lore.kernel.org/all/cover.1760731772.git.ackerleytng@google.com/T/
[5] https://lore.kernel.org/all/cover.1770071243.git.ackerleytng@google.com/T/
[6] https://lore.kernel.org/r/20260313-gmem-inplace-conversion-v3-0-5fc12a70ec89@google.com/T/
[7] https://lore.kernel.org/all/20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@google.com/T/
[8] https://lore.kernel.org/all/CAEvNRgGbMhkX310CkFY_M5x-zod=BDTiuznrZ0XvFPUK7weL1A@mail.gmail.com/
[9] https://lore.kernel.org/all/20260414-selftest-global-metadata-v1-0-fd223922bc57@google.com/T/
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
Ackerley Tng (34):
KVM: x86/mmu: Bug the VM if gmem attributes are queried to determine max mapping level
KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes
KVM: guest_memfd: Only prepare folios for private pages
KVM: Move kvm_supported_mem_attributes() to kvm_host.h
KVM: guest_memfd: Add basic support for KVM_SET_MEMORY_ATTRIBUTES2
KVM: guest_memfd: Ensure pages are not in use before conversion
KVM: guest_memfd: Call arch invalidate hooks on conversion
KVM: guest_memfd: Return early if range already has requested attributes
KVM: guest_memfd: Advertise KVM_SET_MEMORY_ATTRIBUTES2 ioctl
KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check
KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release()
KVM: guest_memfd: Determine invalidation filter from memory attributes
KVM: guest_memfd: Introduce default handlers for content modes
KVM: guest_memfd: Apply content modes while setting memory attributes
KVM: x86: Support SW_PROTECTED_VM in applying content modes
KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION
KVM: x86: Support SNP and TDX applying content modes
KVM: x86: Bug CoCo VM on page fault before finalizing
KVM: Add CAP to enumerate supported SET_MEMORY_ATTRIBUTES2 flags
KVM: selftests: Test basic single-page conversion flow
KVM: selftests: Test conversion flow when INIT_SHARED
KVM: selftests: Test conversion precision in guest_memfd
KVM: selftests: Test conversion before allocation
KVM: selftests: Convert with allocated folios in different layouts
KVM: selftests: Test that truncation does not change shared/private status
KVM: selftests: Test conversion with elevated page refcount
KVM: selftests: Test that conversion to private does not support ZERO
KVM: selftests: Support checking that data not equal expected
KVM: selftests: Test that not specifying a conversion flag scrambles memory contents
KVM: selftests: Reset shared memory after hole-punching
KVM: selftests: Provide function to look up guest_memfd details from gpa
KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe
KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd
KVM: selftests: Add script to exercise private_mem_conversions_test
Michael Roth (1):
KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE
Sean Christopherson (18):
KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings
KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES
KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined
KVM: Stub in ability to disable per-VM memory attribute tracking
KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86
KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes
KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs
KVM: selftests: Create gmem fd before "regular" fd when adding memslot
KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset}
KVM: selftests: Add support for mmap() on guest_memfd in core library
KVM: selftests: Add selftests global for guest memory attributes capability
KVM: selftests: Add helpers for calling ioctls on guest_memfd
KVM: selftests: Test that shared/private status is consistent across processes
KVM: selftests: Provide common function to set memory attributes
KVM: selftests: Check fd/flags provided to mmap() when setting up memslot
KVM: selftests: Update pre-fault test to work with per-guest_memfd attributes
KVM: selftests: Update private memory exits test to work with per-gmem attributes
Documentation/virt/kvm/api.rst | 139 ++++-
.../virt/kvm/x86/amd-memory-encryption.rst | 19 +-
Documentation/virt/kvm/x86/intel-tdx.rst | 4 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 15 +-
arch/x86/kvm/mmu/mmu.c | 20 +-
arch/x86/kvm/svm/sev.c | 18 +-
arch/x86/kvm/vmx/tdx.c | 8 +-
arch/x86/kvm/x86.c | 145 ++++-
include/linux/kvm_host.h | 74 ++-
include/trace/events/kvm.h | 4 +-
include/uapi/linux/kvm.h | 21 +
mm/swap.c | 2 +
tools/testing/selftests/kvm/Makefile.kvm | 5 +
tools/testing/selftests/kvm/include/kvm_util.h | 141 ++++-
tools/testing/selftests/kvm/include/test_util.h | 34 +-
.../selftests/kvm/kvm_has_gmem_attributes.c | 17 +
tools/testing/selftests/kvm/lib/kvm_util.c | 130 +++--
tools/testing/selftests/kvm/lib/test_util.c | 7 -
tools/testing/selftests/kvm/lib/x86/sev.c | 2 +-
.../testing/selftests/kvm/pre_fault_memory_test.c | 4 +-
.../kvm/x86/guest_memfd_conversions_test.c | 552 +++++++++++++++++++
.../kvm/x86/private_mem_conversions_test.c | 55 +-
.../kvm/x86/private_mem_conversions_test.sh | 128 +++++
.../selftests/kvm/x86/private_mem_kvm_exits_test.c | 38 +-
virt/kvm/Kconfig | 3 +-
virt/kvm/guest_memfd.c | 591 ++++++++++++++++++++-
virt/kvm/kvm_main.c | 87 ++-
28 files changed, 2075 insertions(+), 190 deletions(-)
---
base-commit: 39f1c201b93f4ff71631bac72cff6eb155f976a4
change-id: 20260225-gmem-inplace-conversion-bd0dbd39753a
Best regards,
--
Ackerley Tng <ackerleytng@google.com>
On Tue, Apr 28, 2026 at 04:24:55PM -0700, Ackerley Tng via B4 Relay wrote:
> [...snip...]
>
> TODOs
>
> + Perhaps further clarify PRESERVE flag: [8]
I made a super-long-winded reply to that thread, but to summarize:
The PRESERVE flag has different enumeration/behavior/enforcement for
pre-launch vs. post-launch, and similar considerations might come into
play for other flags. To make it easier to enumerate which flags are
available pre-launch vs. post-launch, maybe we could have 2 capabilities
instead of 1:
KVM_CAP_MEMORY_ATTRIBUTES2_PRE_LAUNCH_FLAGS
KVM_CAP_MEMORY_ATTRIBUTES2_FLAGS
where SNP/TDX would only advertise PRESERVE for PRE_LAUNCH, and pKVM, I
guess, would enumerate it for both (or maybe just POST_LAUNCH?).
That lets us keep the flag definitions more straightforward, while still
allowing userspace to easily enumerate exactly what should be available
at pre- vs. post-launch time, and gives us some flexibility to detail
variations in behavior between the 2 phases without documenting
edge cases in terms of VM types.
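A minimal sketch of how userspace might consume the two proposed capabilities. The capability names come from the proposal above, but the flag bit values, `cap_pre_launch_flags()`, `cap_post_launch_flags()` and `flag_available()` are hypothetical stand-ins for querying them, and reporting ZERO in both phases is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; not the real uAPI values. */
#define ATTR2_FLAG_ZERO     (1ULL << 0)
#define ATTR2_FLAG_PRESERVE (1ULL << 1)

enum vm_type { VM_SW_PROTECTED, VM_PKVM, VM_SNP, VM_TDX };

/* Stand-in for KVM_CAP_MEMORY_ATTRIBUTES2_PRE_LAUNCH_FLAGS. */
static uint64_t cap_pre_launch_flags(enum vm_type type)
{
	(void)type;	/* every type in this sketch allows both pre-launch */
	return ATTR2_FLAG_ZERO | ATTR2_FLAG_PRESERVE;
}

/* Stand-in for KVM_CAP_MEMORY_ATTRIBUTES2_FLAGS (post-launch). */
static uint64_t cap_post_launch_flags(enum vm_type type)
{
	switch (type) {
	case VM_SNP:
	case VM_TDX:
		return ATTR2_FLAG_ZERO;	/* PRESERVE only pre-launch */
	default:
		return ATTR2_FLAG_ZERO | ATTR2_FLAG_PRESERVE;
	}
}

/* Userspace checks the capability that matches the VM's current phase. */
static bool flag_available(enum vm_type type, bool launched, uint64_t flag)
{
	uint64_t mask = launched ? cap_post_launch_flags(type)
				 : cap_pre_launch_flags(type);

	return (mask & flag) == flag;
}
```

The flag definitions themselves stay VM-type agnostic; the per-phase masks carry the variation.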
> + Resolve issue where guest_memfd_conversions_test, which uses the
> kselftest framework, doesn't perform teardown on assertion
> failure. Please see proposal at [9]
> + Test with TDX selftests. We're in the process of rebasing TDX selftests
> on this series and will post updates when that's tested.
>
> I would like feedback on:
>
> + Content modes: 0 (MODE_UNSPECIFIED), ZERO, and PRESERVE. Is that all
> good, or does anyone think there is a use case for something else?
> + Should the content modes apply even if no attribute changes are required?
> + See notes added in "KVM: guest_memfd: Apply content modes while
> setting memory attributes"
Looking at the example you have there:
+ Note: These content modes apply to the entire requested range, not
+ just the parts of the range that underwent conversion. For example, if
+ this was the initial state:
+
+ * [0x0000, 0x1000): shared
+ * [0x1000, 0x2000): private
+ * [0x2000, 0x3000): shared
+ and range [0x0000, 0x3000) was set to shared, the content mode would
+ apply to all memory in [0x0000, 0x3000), not just the range that
+ underwent conversion [0x1000, 0x2000).
Userspace would be aware of whether the range contains pages that were
already set to private, so if it really wants to set just the
[0x1000, 0x2000) range to shared with the appropriate content mode, it
is fully able to do so by issuing the ioctl for that specific range.
If it issues the ioctl for the entire range instead, skipping parts of
that range would defy normal expectations and cause confusion, and I'm
not sure we gain anything useful in exchange for that potential
confusion.
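The semantics in the quoted note can be modeled in a few lines of C. This is a hypothetical sketch (`struct model` and `set_shared_zero()` are illustrative names, and only the ZERO mode is modeled): issuing the request for [0x0000, 0x3000) zeroes all three pages, including the two that were already shared, while issuing it for [0x1000, 0x2000) leaves the neighboring pages untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PG       0x1000UL
#define NR_PAGES 3

enum gmem_attr { SHARED, PRIVATE };

/* Hypothetical model: per-page attribute plus the page contents. */
struct model {
	enum gmem_attr attr[NR_PAGES];
	uint8_t data[NR_PAGES][PG];	/* one row of bytes per page */
};

/*
 * Set [start, end) to shared with the ZERO content mode. The mode is
 * applied to every page in the requested range, including pages that
 * were already shared, as described in the quoted note.
 */
static void set_shared_zero(struct model *m, uint64_t start, uint64_t end)
{
	assert(start % PG == 0 && end % PG == 0 && end <= NR_PAGES * PG);

	for (uint64_t i = start / PG; i < end / PG; i++) {
		m->attr[i] = SHARED;
		memset(m->data[i], 0, PG);
	}
}
```

Under this model the caller fully controls which pages are zeroed, simply by choosing the range it passes.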
> + Possibly related: should setting attributes be allowed if some
> sub-range requested already has the requested attribute?
As it is now, userspace has that capability (it can use finer-grained
ranges if it doesn't want to re-issue unnecessary/unwanted conversions),
similar to the above. And KVM internally will just issue
kvm_arch_gmem_prepare() calls so that architecture-specific handling can
deal with this case (e.g. SNP's sev_gmem_prepare() already checks
whether the page is already assigned in the RMP table and just skips it
if so). So I don't think we really gain anything but added complexity if
we try to make gmem more selective about it.
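On the userspace side, "use finer-grained ranges" amounts to splitting a request into the runs of pages that actually need conversion. A hypothetical C sketch, assuming the VMM keeps its own per-page record of attributes; `runs_needing_conversion()` and `struct range` are illustrative names, not part of any KVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG 0x1000UL

enum gmem_attr { SHARED, PRIVATE };

struct range { uint64_t start, end; };	/* [start, end) in bytes */

/*
 * Split [start, end) into maximal runs of pages whose current attribute
 * differs from @target, so the VMM can issue one ioctl per run and leave
 * already-converted pages alone. @attr is the VMM's own per-page record.
 * Returns the number of runs written to @out (at most @max_out).
 */
static size_t runs_needing_conversion(const enum gmem_attr *attr,
				      uint64_t start, uint64_t end,
				      enum gmem_attr target,
				      struct range *out, size_t max_out)
{
	size_t n = 0;

	for (uint64_t i = start / PG; i < end / PG; i++) {
		if (attr[i] == target)
			continue;
		/* Extend the previous run if this page directly follows it. */
		if (n && out[n - 1].end == i * PG) {
			out[n - 1].end += PG;
		} else {
			assert(n < max_out);
			out[n++] = (struct range){ i * PG, (i + 1) * PG };
		}
	}
	return n;
}
```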
-Mike
Michael Roth <michael.roth@amd.com> writes:
>
> [...snip...]
>
> I made a super-long-winded reply to that thread, but to summarize:
>
> PRESERVE flag has different enumeration/behavior/enforcement for pre-launch
> vs. post-launch, and similar considerations might come into play for
> other flags, so to make it easier to enumerate what flags are available
> for pre-launch/post-launch, maybe we could have 2 capabilities instead
> of 1:
>
> KVM_CAP_MEMORY_ATTRIBUTES2_PRE_LAUNCH_FLAGS
> KVM_CAP_MEMORY_ATTRIBUTES2_FLAGS
>
> where SNP/TDX would only advertise PRESERVE for PRE_LAUNCH, and pKVM I
> guess would enumerate it for both (or maybe just POST_LAUNCH?)
>
> That lets us keep the flags definitions more straightforward but still
> allows userspace to easily enumerate what exactly should be available at
> pre vs. post launch time, and give us some flexibility to detail
> variations in behavior between the 2 phases without documenting
> edge-cases in terms of VM types.
>
Oops, Michael, I only read this after today's meeting.
Sean, we also discussed this topic at the guest_memfd biweekly today. I
brought it up because IMO the interface is starting to get a little
awkward, though I'm struggling to put the awkwardness into words.
Here are some awkward points:
For PRESERVE, even though it is (now) defined to mean that what the host
writes will be readable by the guest, it only works for both to-private
and to-shared conversions on KVM_X86_SW_PROTECTED_VMs and pKVM. That's
because guest_memfd doesn't actually invoke encryption during the
conversion. For TDX and SNP, encryption can only be done before the VM
is finalized, through vendor-specific ioctls that go through
kvm_gmem_populate() to load memory into the guest.
For ZERO, api.rst states that ZERO is not supported for to-private
conversions. The rationale was that guest_memfd/KVM can zero the memory,
but whether the guest sees zeros later is really a matter of the
contract between the guest and the vendor's trusted firmware.
Another awkward point is that ZERO was meant to enable an optimization
for TDX, since the firmware zeroes memory, but the firmware actually
only zeroes memory when the page is unmapped from Secure EPTs.
guest_memfd (for now) doesn't track whether the page was unmapped from
Secure EPTs as part of the conversion, so guest_memfd can't assume it
was mapped before the conversion request. To uphold the ZERO contract
with userspace, guest_memfd applies zeroing for TDX anyway.
Summarizing from guest_memfd biweekly today:
David suggested enumerating the combinations, something like
`SHARED_ZERO` and friends (since ZERO is not supported for to-private),
and Michael then brought up the other axis of pre/post launch. IIRC
there might be yet another axis, since pKVM would need to determine
dynamically whether a to_shared conversion can be permitted for the
range being converted, based on whether the guest had requested a
to_shared conversion.
I think this might just result in too many flags, and could paint us
into a corner if more options get supported later.
I spent even more time thinking about this today. I get that we want a
consistent contract with userspace, but can we scope the contract
differently? What if we scope it as "what KVM guarantees the content
will look like after guest_memfd updates attributes"? This is a smaller
contract, since it doesn't promise anything about what the guest sees.
Running this through a few examples:
+ Pre-finalize, SNP, to-private, PRESERVE: guest_memfd guarantees that
after setting memory attributes, the contents of the pages will not
change. The contents are then ready for populate. What populate does
to the memory is another contract between SNP and the guest that is
out of scope of guest_memfd's contract.
+ Post-finalize, SNP, to-private, PRESERVE: guest_memfd guarantees that
after setting memory attributes, the contents of the pages will not
change. SNP's contract with the guest does not, though. After the page
gets faulted in, the guest sees scrambled data. This may be a
meaningless operation now, but it leaves the door open so perhaps we
could have an SNP-specific ioctl in future where step 1 is to set
memory attributes within guest_memfd to private and step 2 is to
encrypt in place.
+ pKVM, to-private, PRESERVE: guest_memfd guarantees that after setting
memory attributes, the contents of the pages don't change. Separately,
pKVM doesn't do encryption, so the pKVM guest reads the same contents
the host wrote. The distinction from the current state is that
guest_memfd doesn't guarantee that the pKVM guest will see the same
content the host wrote, since that's a separate contract between the
pKVM guest and pKVM.
+ Post-finalize, TDX, to-shared, ZERO: guest_memfd guarantees that the
contents of the pages will be zeroed in the process of updating
guest_memfd attributes. Host userspace reads zeros after faulting the
pages in because guest_memfd zeroed them as part of the conversion to
shared. A future optimization is possible where guest_memfd only zeroes
the pages that were not mapped in Secure EPTs, since (this version of)
TDX zeroes memory when unmapping from Secure EPTs.
+ Post-finalize, TDX, to-shared, PRESERVE: -EOPNOTSUPP. guest_memfd is
unable to guarantee that the process of setting memory attributes will
not change memory contents. The process of setting memory attributes
requires unmapping from Secure EPTs, which will zero the memory. (In
future, if we want to relax this, we could permit this if nothing in
the requested range was mapped in Secure EPTs)
+ Post-finalize, SNP, to-shared, PRESERVE: guest_memfd guarantees that
after setting memory attributes, the contents of the pages will not
change. For SNP, unmapping doesn't change memory contents? The guest
reads garbage, and that's a separate contract between SNP and the
guest. In the guest_memfd contract, guest_memfd PRESERVEs the memory
contents in the process of setting memory attributes, and can fulfil
that.
+ Post-finalize, TDX, to-private, ZERO: guest_memfd zeroes the shared
memory before updating the attributes to be private, because it
promised to. If this memory gets faulted in to Secure EPTs, TDX
firmware zeros it again, because that's TDX's contract with the
guest. I can't see any benefit to userspace in using this combination,
but the guest_memfd contract and implementation are simple.
TLDR:
+ PRESERVE == guarantee that the process of setting memory attributes
doesn't change memory contents.
+ implementation == do nothing in most cases, except -EOPNOTSUPP for
to-shared on TDX, since unmapping is a required part of setting
memory attributes to shared, and a TDX side effect of unmapping
is zeroing memory.
+ ZERO == guarantee that the process of setting memory attributes zeroes
memory contents.
+ implementation == memset(zero) in most cases. For TDX, a future
optimization exists, where memset() can be skipped for pages that
were mapped in Secure EPTs before conversion.
+ UNSPECIFIED == no guarantees.
+ implementation == guest_memfd does nothing explicitly about memory
contents. The implementation is pretty much the same as PRESERVE
except guest_memfd won't take into account vendor-specific side
effects of the process of conversion. The exception is the test
vehicle KVM_X86_SW_PROTECTED_VMS, where memory is scrambled.
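
As a sketch of the TLDR above (not the series' implementation), the
per-mode behaviour can be written as one decision function. The enum
and function names are hypothetical, and the 0xa5 fill only stands in
for whatever scrambling the SW-protected test vehicle does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names modelling the proposal; not KVM's uAPI. */
enum content_mode { MODE_UNSPECIFIED, MODE_ZERO, MODE_PRESERVE };
enum vm_flavour { VM_SNP, VM_TDX, VM_PKVM, VM_SW_PROTECTED };

/*
 * What guest_memfd does to page contents while updating attributes.
 * Returns 0 if the mode's guarantee holds, or -EOPNOTSUPP if it cannot
 * be honoured.
 */
static int apply_content_mode(enum vm_flavour vm, bool to_shared,
			      enum content_mode mode,
			      unsigned char *buf, size_t len)
{
	switch (mode) {
	case MODE_PRESERVE:
		/* TDX zeroes memory when unmapping from Secure EPTs. */
		if (vm == VM_TDX && to_shared)
			return -EOPNOTSUPP;
		return 0;		/* contents left as they are */
	case MODE_ZERO:
		memset(buf, 0, len);	/* TDX could skip pages it already zeroed */
		return 0;
	case MODE_UNSPECIFIED:
	default:
		/* No guarantee; only the SW-protected test vehicle scrambles. */
		if (vm == VM_SW_PROTECTED)
			memset(buf, 0xa5, len);	/* stand-in for scrambling */
		return 0;
	}
}
```

Keeping PRESERVE a pure "do nothing unless impossible" path is what
makes the contract small: vendor contracts with the guest (encryption,
firmware zeroing on fault-in) stay outside this function.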
>>
>> [...snip...]
>>
>
> Looking at the example you have there:
>
> + Note: These content modes apply to the entire requested range, not
> + just the parts of the range that underwent conversion. For example, if
> + this was the initial state:
> +
> + * [0x0000, 0x1000): shared
> + * [0x1000, 0x2000): private
> + * [0x2000, 0x3000): shared
> + and range [0x0000, 0x3000) was set to shared, the content mode would
> + apply to all memory in [0x0000, 0x3000), not just the range that
> + underwent conversion [0x1000, 0x2000).
>
> Userspace would be aware of whether the range contains pages that were
> already set to private, so if it really wants to set just the
> [0x1000, 0x2000) range to shared with appropriate content mode, it is
> fully able to do so by just issuing the ioctl for that specific range.
> If it attempts to issue it for the entire range, it only seems like it
> would defy normal expectations and cause confusion to skip ranges, and
> I'm not sure it gains us anything useful in exchange for that potential
> confusion.
>
Great that we're aligned here :) There were no complaints at the
guest_memfd biweekly meeting today either :)
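
To make the quoted example concrete, here is a toy model (hypothetical
names, plain arrays in place of guest_memfd) in which setting
[0x0000, 0x3000) to shared with ZERO zeroes all three pages, including
the two that were already shared. Userspace that only wants
[0x1000, 0x2000) affected simply asks for that range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE	0x1000
#define TOY_NR_PAGES	3

/* Hypothetical toy model: one attribute and one buffer per page. */
struct toy_gmem {
	bool private[TOY_NR_PAGES];
	unsigned char data[TOY_NR_PAGES][TOY_PAGE_SIZE];
};

/*
 * Set [start, end) to shared with the ZERO content mode. The mode is
 * applied to every page in the requested range, not only to pages
 * whose attribute actually changed.
 */
static void toy_set_shared_zero(struct toy_gmem *g, size_t start, size_t end)
{
	for (size_t off = start; off < end; off += TOY_PAGE_SIZE) {
		size_t idx = off / TOY_PAGE_SIZE;

		g->private[idx] = false;
		memset(g->data[idx], 0, TOY_PAGE_SIZE);
	}
}
```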
>>
>> [...snip...]
>>
Ackerley Tng <ackerleytng@google.com> writes:

>
> [...snip...]
>
>
> TLDR:
>
> + PRESERVE == guarantee that the process of setting memory attributes
> doesn't change memory contents.
> + implementation == do nothing in most cases, except -EOPNOTSUPP for
> to-shared on TDX, since unmapping is a required part of setting
> memory attributes to private, and a TDX side effect of unmapping
> is zeroing memory,

-EOPNOTSUPP will only be for TDX, not SNP.

> + ZERO == guarantee that the process of setting memory attributes zeroes
> memory contents.
> + implementation == memset(zero) in most cases. For TDX, a future
> optimization exists, where memset() can be skipped for pages that
> were mapped in Secure EPTs before conversion
> + UNSPECIFIED == no guarantees
> + implementation == guest_memfd does nothing explicitly about memory
> contents. The implementation is pretty much the same as PRESERVE
> except guest_memfd won't take into account vendor-specific side
> effects of the process of conversion. Except for the test vehicle
> KVM_X86_SW_PROTECTED_VMS, where memory is scrambled.
>

Found another use case internally for pre-finalize, SNP, to-shared,
PRESERVE, which works with the above smaller scope.

During SNP_LAUNCH_UPDATE, when inserting a CPUID page, the firmware
checks that the CPUID values would not lead to an insecure guest
state. If they would, SNP_LAUNCH_UPDATE fails with an error and the
page remains shared in the RMP table.

Here's the proposed flow in the userspace VMM:

1. Load CPUID in shared guest_memfd memory
2. SET_MEMORY_ATTRIBUTES(PRIVATE, PRESERVE)
3. SNP_LAUNCH_UPDATE => get error since CPUID was insecure
4. SET_MEMORY_ATTRIBUTES(SHARED, PRESERVE)
5. Read the corrected CPUID from shared guest_memfd memory, error out
   if the VMM disagrees
6. SET_MEMORY_ATTRIBUTES(PRIVATE, PRESERVE)
7. SNP_LAUNCH_UPDATE => successful, since CPUID is now corrected

Does that seem ok?

>>>
>>> [...snip...]
>>>
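
A toy simulation of the seven-step flow above, assuming (for
illustration only) that the firmware writes corrected values back into
the page when it rejects them, which is how steps 5 and 7 read. Every
name is hypothetical and nothing here calls real KVM or SEV ioctls. The
point is that PRESERVE lets the page contents survive steps 2, 4 and 6
unchanged.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state for one CPUID page. */
struct toy_page {
	bool gmem_private;	/* guest_memfd attribute */
	bool rmp_private;	/* RMP state as seen by firmware */
	uint32_t cpuid_value;	/* stand-in for the CPUID table contents */
};

#define TOY_SAFE_VALUE 0x10u	/* the only value the toy firmware accepts */

/* Toy SET_MEMORY_ATTRIBUTES(..., PRESERVE): contents are not touched. */
static void toy_set_attrs_preserve(struct toy_page *p, bool private)
{
	p->gmem_private = private;
}

/* Toy SNP_LAUNCH_UPDATE: corrects bad values in place and fails. */
static int toy_snp_launch_update_cpuid(struct toy_page *p)
{
	if (p->cpuid_value != TOY_SAFE_VALUE) {
		p->cpuid_value = TOY_SAFE_VALUE;
		p->rmp_private = false;		/* page stays shared in the RMP */
		return -EINVAL;
	}
	p->rmp_private = true;
	return 0;
}

/* Example VMM policy: accept whatever the firmware corrected. */
static bool toy_vmm_accepts_any(uint32_t corrected)
{
	(void)corrected;
	return true;
}

static int toy_load_cpuid_page(struct toy_page *p, uint32_t vmm_value,
			       bool (*vmm_accepts)(uint32_t corrected))
{
	int ret;

	p->cpuid_value = vmm_value;		/* 1. load into shared memory */
	toy_set_attrs_preserve(p, true);	/* 2. PRIVATE, PRESERVE */
	ret = toy_snp_launch_update_cpuid(p);	/* 3. may fail */
	if (!ret)
		return 0;

	toy_set_attrs_preserve(p, false);	/* 4. SHARED, PRESERVE */
	if (!vmm_accepts(p->cpuid_value))	/* 5. inspect corrections */
		return ret;

	toy_set_attrs_preserve(p, true);	/* 6. PRIVATE, PRESERVE */
	return toy_snp_launch_update_cpuid(p);	/* 7. expected to succeed */
}
```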
On Tue, Apr 28, 2026, Ackerley Tng wrote:
> This is RFC v5 of guest_memfd in-place conversion support.

...

> TODOs
>
> + Perhaps further clarify PRESERVE flag: [8]
> + Resolve issue where guest_memfd_conversions_test, which uses the
> kselftest framework, doesn't perform teardown on assertion
> failure. Please see proposal at [9]
> + Test with TDX selftests. We're in the process of rebasing TDX selftests
> on this series and will post updates when that's tested.

Why exactly is this still RFC? The TODOs here don't strike me as things
that would make this RFC. Blockers for merge, yes/maybe/probably, but at
a glance, it feels like we've moved beyond RFC for the code itself.
With these POC patches, I was able to test the set memory
attributes/conversion ioctls with SNP.
The content modes work, and PRESERVE can be used before the SNP VM
is finalized. SNP_LAUNCH_UPDATE can accept 0 for the source address,
and the SNP VM runs fine. :)
Ackerley Tng (6):
KVM: selftests: Initialize guest_memfd with INIT_SHARED
KVM: selftests: Use guest_memfd memory contents in-place for SNP
launch update
KVM: selftests: Make guest_code_xsave more friendly
KVM: selftests: Allow specifying CoCo-privateness while mapping a page
KVM: selftests: Test conversions for SNP
KVM: selftests: Test content modes ZERO and PRESERVE for SNP
.../selftests/kvm/include/x86/processor.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +-
.../testing/selftests/kvm/lib/x86/processor.c | 13 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 9 +-
.../selftests/kvm/x86/sev_smoke_test.c | 255 +++++++++++++++++-
5 files changed, 271 insertions(+), 20 deletions(-)
--
2.54.0.545.g6539524ca2-goog
Initialize guest_memfd with INIT_SHARED for VM types that require
guest_memfd.
Memory in the first memslot is used by the selftest framework to load
code, page tables, interrupt descriptor tables, and basically everything
the selftest needs to run. The selftest framework sets all of these up
assuming that the memory in the memslot can be written to from the
host. Align with that behavior by initializing guest_memfd as shared so
that all the writes from the host are permitted.
guest_memfd memory can later be marked private if necessary by CoCo
platform-specific initialization functions.
Suggested-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 216d6e037153c..3811aef8c98cd 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -483,8 +483,10 @@ struct kvm_vm *__vm_create(struct vm_shape shape, u32 nr_runnable_vcpus,
{
u64 nr_pages = vm_nr_pages_required(shape.mode, nr_runnable_vcpus,
nr_extra_pages);
+ enum vm_mem_backing_src_type src_type;
struct userspace_mem_region *slot0;
struct kvm_vm *vm;
+ u64 gmem_flags;
int i, flags;
kvm_set_files_rlimit(nr_runnable_vcpus);
@@ -502,7 +504,15 @@ struct kvm_vm *__vm_create(struct vm_shape shape, u32 nr_runnable_vcpus,
if (is_guest_memfd_required(shape))
flags |= KVM_MEM_GUEST_MEMFD;
- vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, flags);
+ gmem_flags = 0;
+ src_type = VM_MEM_SRC_ANONYMOUS;
+ if (is_guest_memfd_required(shape) && kvm_has_gmem_attributes) {
+ src_type = VM_MEM_SRC_SHMEM;
+ gmem_flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
+ }
+
+ vm_mem_add(vm, src_type, 0, 0, nr_pages, flags, -1, 0, gmem_flags);
+
for (i = 0; i < NR_MEM_REGIONS; i++)
vm->memslots[i] = 0;
--
2.54.0.545.g6539524ca2-goog
Update the SEV-SNP launch update flow to utilize guest_memfd in-place
conversion.
Include the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE flag when setting memory
attributes to private. This is permitted before the SNP VM is finalized.
In snp_launch_update_data, pass 0 as the host virtual address. This
instructs the kernel to perform the launch update using the guest_memfd
backing the guest physical address rather than a userspace-provided
buffer.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/lib/x86/sev.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/sev.c b/tools/testing/selftests/kvm/lib/x86/sev.c
index d0205b3299e0b..72b2935871fe4 100644
--- a/tools/testing/selftests/kvm/lib/x86/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86/sev.c
@@ -32,13 +32,14 @@ static void encrypt_region(struct kvm_vm *vm, struct userspace_mem_region *regio
const u64 size = (j - i + 1) * vm->page_size;
const u64 offset = (i - lowest_page_in_region) * vm->page_size;
- if (private)
- vm_mem_set_private(vm, gpa_base + offset, size, 0);
+ if (private) {
+ vm_mem_set_private(vm, gpa_base + offset, size,
+ KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE);
+ }
if (is_sev_snp_vm(vm))
snp_launch_update_data(vm, gpa_base + offset,
- (u64)addr_gpa2hva(vm, gpa_base + offset),
- size, page_type);
+ 0, size, page_type);
else
sev_launch_update_data(vm, gpa_base + offset, size);
--
2.54.0.545.g6539524ca2-goog
The original implementation of guest_code_xsave makes a jmp to
guest_sev_es_code in inline assembly. When code that uses guest_sev_es_code
is removed, guest_sev_es_code will be optimized out, leading to a linking
error since guest_code_xsave still tries to jmp to guest_sev_es_code.
Rewrite guest_code_xsave() to instead make a call, in C, to
guest_sev_es_code(), so that usage of guest_sev_es_code() is made known to
the compiler.
This rewriting also gives a name to the xsave inline assembly, improving
readability.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
.../selftests/kvm/x86/sev_smoke_test.c | 24 +++++++++++++------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/sev_smoke_test.c b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
index 1a49ee3915864..8b859adf4cf6f 100644
--- a/tools/testing/selftests/kvm/x86/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
@@ -80,13 +80,23 @@ static void guest_sev_code(void)
GUEST_DONE();
}
-/* Stash state passed via VMSA before any compiled code runs. */
-extern void guest_code_xsave(void);
-asm("guest_code_xsave:\n"
- "mov $" __stringify(XFEATURE_MASK_X87_AVX) ", %eax\n"
- "xor %edx, %edx\n"
- "xsave (%rdi)\n"
- "jmp guest_sev_es_code");
+static void xsave_all_registers(void *addr)
+{
+ __asm__ __volatile__(
+ "mov $" __stringify(XFEATURE_MASK_X87_AVX) ", %%eax\n"
+ "xor %%edx, %%edx\n"
+ "xsave (%0)"
+ :
+ : "r"(addr)
+ : "eax", "edx", "memory"
+ );
+}
+
+static void guest_code_xsave(void *vmsa_gva)
+{
+ xsave_all_registers(vmsa_gva);
+ guest_sev_es_code();
+}
static void compare_xsave(u8 *from_host, u8 *from_guest)
{
--
2.54.0.545.g6539524ca2-goog
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 ++
tools/testing/selftests/kvm/lib/x86/processor.c | 13 ++++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 77f576ee7789d..683f21452db58 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1507,6 +1507,8 @@ enum pg_level {
void tdp_mmu_init(struct kvm_vm *vm, int pgtable_levels,
struct pte_masks *pte_masks);
+void ___virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
+ gpa_t gpa, int level, bool private);
void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
gpa_t gpa, int level);
void virt_map_level(struct kvm_vm *vm, gva_t gva, gpa_t gpa,
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b51467d70f6e7..02781194f51a2 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -256,8 +256,8 @@ static u64 *virt_create_upper_pte(struct kvm_vm *vm,
return pte;
}
-void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
- gpa_t gpa, int level)
+void ___virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
+ gpa_t gpa, int level, bool private)
{
const u64 pg_size = PG_LEVEL_SIZE(level);
u64 *pte = &mmu->pgd;
@@ -309,12 +309,19 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
* Neither SEV nor TDX supports shared page tables, so only the final
* leaf PTE needs manually set the C/S-bit.
*/
- if (vm_is_gpa_protected(vm, gpa))
+ if (private)
*pte |= PTE_C_BIT_MASK(mmu);
else
*pte |= PTE_S_BIT_MASK(mmu);
}
+void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
+ gpa_t gpa, int level)
+{
+ ___virt_pg_map(vm, mmu, gva, gpa, level,
+ vm_is_gpa_protected(vm, gpa));
+}
+
void virt_arch_pg_map(struct kvm_vm *vm, gva_t gva, gpa_t gpa)
{
__virt_pg_map(vm, &vm->mmu, gva, gpa, PG_LEVEL_4K);
--
2.54.0.545.g6539524ca2-goog
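
The reason for the explicit private flag in ___virt_pg_map() is that
test_conversion() maps one GPA at two GVAs, once shared
(test_shared_gva) and once private (test_private_gva), which a flag
derived from vm_is_gpa_protected() cannot express. A minimal sketch of
the leaf-PTE choice, assuming illustrative bit positions and that a VM
type uses only one of the two masks (the other being 0); the TOY_*
names and toy_leaf_pte() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative bit positions only: the real SEV C-bit position comes
 * from CPUID 0x8000001F, and the TDX S-bit depends on the GPA width.
 */
#define TOY_SEV_C_BIT	(1ULL << 51)
#define TOY_TDX_S_BIT	(1ULL << 47)
#define TOY_PTE_PRESENT	1ULL

/*
 * Build a 4K leaf PTE. The caller decides whether the mapping is
 * private: private sets the C-bit mask, shared sets the S-bit mask.
 */
static uint64_t toy_leaf_pte(uint64_t gpa, bool private,
			     uint64_t c_mask, uint64_t s_mask)
{
	uint64_t pte = (gpa & ~0xfffULL) | TOY_PTE_PRESENT;

	return pte | (private ? c_mask : s_mask);
}
```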
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
.../selftests/kvm/x86/sev_smoke_test.c | 190 +++++++++++++++++-
1 file changed, 185 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/sev_smoke_test.c b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
index 8b859adf4cf6f..86f17e59e9392 100644
--- a/tools/testing/selftests/kvm/x86/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
@@ -253,17 +253,197 @@ static void test_sev_smoke(void *guest, u32 type, u64 policy)
}
}
+#define GHCB_MSR_REG_GPA_REQ 0x012
+#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)((v) & GENMASK_ULL(51, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_REG_GPA_REQ)
+
+#define GHCB_MSR_REG_GPA_RESP 0x013
+#define GHCB_MSR_REG_GPA_RESP_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
+
+#define GHCB_DATA_LOW 12
+#define GHCB_MSR_INFO_MASK (BIT_ULL(GHCB_DATA_LOW) - 1)
+#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
+
+/*
+ * SNP Page State Change Operation
+ *
+ * GHCBData[55:52] - Page operation:
+ * 0x0001 Page assignment, Private
+ * 0x0002 Page assignment, Shared
+ */
+enum psc_op {
+ SNP_PAGE_STATE_PRIVATE = 1,
+ SNP_PAGE_STATE_SHARED,
+};
+
+#define GHCB_MSR_PSC_REQ 0x014
+#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
+ /* GHCBData[55:52] */ \
+ (((u64)((op) & 0xf) << 52) | \
+ /* GHCBData[51:12] */ \
+ ((u64)((gfn) & GENMASK_ULL(39, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_PSC_REQ)
+
+#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_RESP_VAL(val) \
+ /* GHCBData[63:32] */ \
+ (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
+
+static u64 ghcb_gpa;
+static void snp_register_ghcb(void)
+{
+ u64 ghcb_pfn = ghcb_gpa >> PAGE_SHIFT;
+ u64 val;
+
+ GUEST_ASSERT(ghcb_gpa);
+
+ wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_REG_GPA_REQ_VAL(ghcb_pfn));
+ vmgexit();
+
+ val = rdmsr(MSR_AMD64_SEV_ES_GHCB);
+ GUEST_ASSERT_EQ(GHCB_RESP_CODE(val), GHCB_MSR_REG_GPA_RESP);
+ GUEST_ASSERT_EQ(GHCB_MSR_REG_GPA_RESP_VAL(val), ghcb_pfn);
+}
+
+static void snp_page_state_change(u64 gpa, enum psc_op op)
+{
+ u64 val;
+
+ wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_PSC_REQ_GFN(gpa >> PAGE_SHIFT, op));
+ vmgexit();
+
+ val = rdmsr(MSR_AMD64_SEV_ES_GHCB);
+ GUEST_ASSERT_EQ(GHCB_RESP_CODE(val), GHCB_MSR_PSC_RESP);
+ GUEST_ASSERT_EQ(GHCB_MSR_PSC_RESP_VAL(val), 0);
+}
+
+#define RMP_PG_SIZE_4K 0
+static inline void pvalidate(void *vaddr, bool validate)
+{
+ bool no_rmpupdate;
+ int rc;
+
+ /* "pvalidate" mnemonic support in binutils 2.36 and newer */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
+ : "=@ccc"(no_rmpupdate), "=a"(rc)
+ : "a"(vaddr), "c"(RMP_PG_SIZE_4K), "d"(validate)
+ : "memory", "cc");
+
+ GUEST_ASSERT(!no_rmpupdate);
+ GUEST_ASSERT_EQ(rc, 0);
+}
+
+#define CONVERSION_TEST_VALUE_SHARED_1 0xab
+#define CONVERSION_TEST_VALUE_SHARED_2 0xcd
+#define CONVERSION_TEST_VALUE_PRIVATE 0xef
+#define CONVERSION_TEST_VALUE_SHARED_3 0xbc
+static void guest_code_conversion(u8 *test_shared_gva, u8 *test_private_gva, u64 test_gpa)
+{
+ snp_register_ghcb();
+
+ GUEST_ASSERT_EQ(READ_ONCE(*test_shared_gva), CONVERSION_TEST_VALUE_SHARED_1);
+ WRITE_ONCE(*test_shared_gva, CONVERSION_TEST_VALUE_SHARED_2);
+
+ snp_page_state_change(test_gpa, SNP_PAGE_STATE_PRIVATE);
+ pvalidate(test_private_gva, true);
+
+ WRITE_ONCE(*test_private_gva, CONVERSION_TEST_VALUE_PRIVATE);
+ GUEST_ASSERT_EQ(READ_ONCE(*test_private_gva), CONVERSION_TEST_VALUE_PRIVATE);
+
+ pvalidate(test_private_gva, false);
+ snp_page_state_change(test_gpa, SNP_PAGE_STATE_SHARED);
+
+ WRITE_ONCE(*test_shared_gva, CONVERSION_TEST_VALUE_SHARED_3);
+
+ wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_TERM_REQ);
+ vmgexit();
+}
+
+static void test_conversion(u64 policy)
+{
+ gva_t test_private_gva;
+ gva_t test_shared_gva;
+ struct kvm_vcpu *vcpu;
+ gva_t ghcb_gva;
+ gpa_t test_gpa;
+ struct kvm_vm *vm;
+ void *ghcb_hva;
+ void *test_hva;
+
+ vm = vm_sev_create_with_one_vcpu(KVM_X86_SNP_VM, guest_code_conversion, &vcpu);
+
+ ghcb_gva = vm_alloc_shared(vm, PAGE_SIZE, KVM_UTIL_MIN_VADDR,
+ MEM_REGION_TEST_DATA);
+ ghcb_hva = addr_gva2hva(vm, ghcb_gva);
+ ghcb_gpa = addr_gva2gpa(vm, ghcb_gva);
+ sync_global_to_guest(vm, ghcb_gpa);
+
+ test_shared_gva = vm_alloc_shared(vm, PAGE_SIZE, KVM_UTIL_MIN_VADDR,
+ MEM_REGION_TEST_DATA);
+ test_hva = addr_gva2hva(vm, test_shared_gva);
+ test_gpa = addr_gva2gpa(vm, test_shared_gva);
+
+ test_private_gva = vm_unused_gva_gap(vm, PAGE_SIZE, KVM_UTIL_MIN_VADDR);
+ ___virt_pg_map(vm, &vm->mmu, test_private_gva, test_gpa, PG_LEVEL_4K, true);
+
+ vcpu_args_set(vcpu, 3, test_shared_gva, test_private_gva, test_gpa);
+
+ vm_sev_launch(vm, policy, NULL);
+
+ WRITE_ONCE(*(u8 *)test_hva, CONVERSION_TEST_VALUE_SHARED_1);
+
+ fprintf(stderr, "ghcb_hva=%p ghcb_gpa=%lx ghcb_gva=%lx\n", ghcb_hva, ghcb_gpa, ghcb_gva);
+ fprintf(stderr, "test_hva=%p test_gpa=%lx test_private_gva=%lx test_shared_gva=%lx\n", test_hva, test_gpa, test_private_gva, test_shared_gva);
+
+ vcpu_run(vcpu);
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_HYPERCALL);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.nr, KVM_HC_MAP_GPA_RANGE);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[0], test_gpa);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[1], 1);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[2], KVM_MAP_GPA_RANGE_ENCRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
+
+ vm_mem_set_private(vm, test_gpa, PAGE_SIZE, KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED);
+
+ vcpu_run(vcpu);
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_HYPERCALL);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.nr, KVM_HC_MAP_GPA_RANGE);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[0], test_gpa);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[1], 1);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[2], KVM_MAP_GPA_RANGE_DECRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
+
+ vm_mem_set_shared(vm, test_gpa, PAGE_SIZE, KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED);
+
+ vcpu_run(vcpu);
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SYSTEM_EVENT);
+ TEST_ASSERT_EQ(vcpu->run->system_event.type, KVM_SYSTEM_EVENT_SEV_TERM);
+ TEST_ASSERT_EQ(vcpu->run->system_event.ndata, 1);
+ TEST_ASSERT_EQ(vcpu->run->system_event.data[0], GHCB_MSR_TERM_REQ);
+
+ TEST_ASSERT_EQ(*(u8 *)test_hva, CONVERSION_TEST_VALUE_SHARED_3);
+}
+
int main(int argc, char *argv[])
{
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SEV));
- test_sev_smoke(guest_sev_code, KVM_X86_SEV_VM, 0);
+ // test_sev_smoke(guest_sev_code, KVM_X86_SEV_VM, 0);
- if (kvm_cpu_has(X86_FEATURE_SEV_ES))
- test_sev_smoke(guest_sev_es_code, KVM_X86_SEV_ES_VM, SEV_POLICY_ES);
+ // if (kvm_cpu_has(X86_FEATURE_SEV_ES))
+ // test_sev_smoke(guest_sev_es_code, KVM_X86_SEV_ES_VM, SEV_POLICY_ES);
- if (kvm_cpu_has(X86_FEATURE_SEV_SNP))
- test_sev_smoke(guest_snp_code, KVM_X86_SNP_VM, snp_default_policy());
+ if (kvm_cpu_has(X86_FEATURE_SEV_SNP)) {
+ test_conversion(snp_default_policy());
+ // test_sev_smoke(guest_snp_code, KVM_X86_SNP_VM, snp_default_policy());
+ }
return 0;
}
--
2.54.0.545.g6539524ca2-goog
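
For reference, a host-side re-implementation of the MSR-protocol Page
State Change encoding defined by the GHCB_MSR_PSC_* macros in the patch
above, with worked values. The helper names and TOY_GENMASK_ULL are
hypothetical; the bit layout follows the comments in the patch.

```c
#include <stdint.h>

#define TOY_GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

#define GHCB_MSR_PSC_REQ	0x014
#define GHCB_MSR_PSC_RESP	0x015
#define GHCB_MSR_INFO_MASK	((1ULL << 12) - 1)

enum { PSC_OP_PRIVATE = 1, PSC_OP_SHARED = 2 };

/* GHCBData[55:52] = op, [51:12] = GFN, [11:0] = request code. */
static uint64_t psc_req(uint64_t gfn, unsigned int op)
{
	return ((uint64_t)(op & 0xf) << 52) |
	       ((gfn & TOY_GENMASK_ULL(39, 0)) << 12) |
	       GHCB_MSR_PSC_REQ;
}

/* GHCBData[11:0] = response code. */
static unsigned int psc_resp_code(uint64_t val)
{
	return val & GHCB_MSR_INFO_MASK;
}

/* GHCBData[63:32] = error code, 0 on success. */
static uint32_t psc_resp_error(uint64_t val)
{
	return (uint32_t)(val >> 32);
}
```

For example, converting GFN 0x12345 to shared encodes as
0x0020000012345014, and a GFN wider than 40 bits is silently truncated
by the mask, just as in the macro.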
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
.../selftests/kvm/x86/sev_smoke_test.c | 47 +++++++++++++++++--
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/sev_smoke_test.c b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
index 86f17e59e9392..7a91a113c4fb7 100644
--- a/tools/testing/selftests/kvm/x86/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
@@ -365,7 +365,26 @@ static void guest_code_conversion(u8 *test_shared_gva, u8 *test_private_gva, u64
vmgexit();
}
-static void test_conversion(u64 policy)
+static void vm_set_memory_attributes_expect_error(struct kvm_vm *vm, u64 gpa,
+ size_t size, u64 attributes,
+ u64 flags, int expected_errno)
+{
+ loff_t error_offset = -1;
+ size_t len_ignored;
+ loff_t offset;
+ int gmem_fd;
+ int ret;
+
+ gmem_fd = kvm_gpa_to_guest_memfd(vm, gpa, &offset, &len_ignored);
+ ret = __gmem_set_memory_attributes(gmem_fd, offset, size, attributes,
+ &error_offset, flags);
+
+ TEST_ASSERT_EQ(ret, -1);
+ TEST_ASSERT_EQ(offset, error_offset);
+ TEST_ASSERT_EQ(errno, expected_errno);
+}
+
+static void test_conversion(u64 policy, u64 content_mode)
{
gva_t test_private_gva;
gva_t test_shared_gva;
@@ -409,6 +428,21 @@ static void test_conversion(u64 policy)
TEST_ASSERT_EQ(vcpu->run->hypercall.args[1], 1);
TEST_ASSERT_EQ(vcpu->run->hypercall.args[2], KVM_MAP_GPA_RANGE_ENCRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
+ /* ZERO when setting memory attributes to private is never supported. */
+ vm_set_memory_attributes_expect_error(vm, test_gpa, PAGE_SIZE,
+ KVM_MEMORY_ATTRIBUTE_PRIVATE,
+ KVM_SET_MEMORY_ATTRIBUTES2_ZERO,
+ EOPNOTSUPP);
+
+ /* PRESERVE is not supported for SNP once the VM is finalized. */
+ vm_set_memory_attributes_expect_error(vm, test_gpa, PAGE_SIZE, 0,
+ KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE,
+ EOPNOTSUPP);
+ vm_set_memory_attributes_expect_error(vm, test_gpa, PAGE_SIZE,
+ KVM_MEMORY_ATTRIBUTE_PRIVATE,
+ KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE,
+ EOPNOTSUPP);
+
vm_mem_set_private(vm, test_gpa, PAGE_SIZE, KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED);
vcpu_run(vcpu);
@@ -419,7 +453,12 @@ static void test_conversion(u64 policy)
TEST_ASSERT_EQ(vcpu->run->hypercall.args[1], 1);
TEST_ASSERT_EQ(vcpu->run->hypercall.args[2], KVM_MAP_GPA_RANGE_DECRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
- vm_mem_set_shared(vm, test_gpa, PAGE_SIZE, KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED);
+ vm_mem_set_shared(vm, test_gpa, PAGE_SIZE, content_mode);
+
+ if (content_mode == KVM_SET_MEMORY_ATTRIBUTES2_ZERO)
+ TEST_ASSERT_EQ(READ_ONCE(*(u8 *)test_hva), 0);
+ else
+ fprintf(stderr, "test_hva contents = %x\n", READ_ONCE(*(u8 *)test_hva));
vcpu_run(vcpu);
@@ -441,7 +480,9 @@ int main(int argc, char *argv[])
// test_sev_smoke(guest_sev_es_code, KVM_X86_SEV_ES_VM, SEV_POLICY_ES);
if (kvm_cpu_has(X86_FEATURE_SEV_SNP)) {
- test_conversion(snp_default_policy());
+ test_conversion(snp_default_policy(), KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED);
+ test_conversion(snp_default_policy(), KVM_SET_MEMORY_ATTRIBUTES2_ZERO);
+
// test_sev_smoke(guest_snp_code, KVM_X86_SNP_VM, snp_default_policy());
}
--
2.54.0.545.g6539524ca2-goog
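
The ZERO check in the last patch only reads the first byte of the page,
while the ZERO contract discussed earlier in the thread covers the
entire requested range. A stricter check could scan the whole page;
buf_is_zeroed() below is a hypothetical helper, not part of the
selftest library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every byte of the buffer is zero. */
static bool buf_is_zeroed(const volatile uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i])
			return false;
	}
	return true;
}
```

In test_conversion() it could be used as
`TEST_ASSERT(buf_is_zeroed(test_hva, PAGE_SIZE), "test page not fully zeroed");`.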