[RFC PATCH v2 00/21] QEMU gmem implementation

Xiaoyao Li posted 21 patches 7 months, 2 weeks ago
[RFC PATCH v2 00/21] QEMU gmem implementation
Posted by Xiaoyao Li 7 months, 2 weeks ago
It's the v2 RFC of enabling KVM gmem[1] as the backend for private
memory.

For confidential computing, KVM provides the gmem/guest_mem interfaces for
userspace, like QEMU, to allocate user-inaccessible private memory. This
series adds gmem support to QEMU's RAMBlock so that each RAMBlock can have
both hva-based shared memory and gmem_fd-based private memory. QEMU does
the shared/private conversion on KVM_EXIT_MEMORY_FAULT and discards the
backing of the now-unused side.

The chosen design adds a "private" property to the host memory backend.
If the "private" property is set, QEMU allocates/creates KVM gmem when
initializing the RAMBlock of the memory backend.
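
Concretely, the allocation boils down to a KVM_CREATE_GUEST_MEMFD call at
RAMBlock setup time, roughly like the sketch below. Again this is
illustrative rather than the patch code; the ioctl and struct come from the
kernel gmem series [1], and the RAMBlock field name is made up here.

    /* illustrative sketch, not the patch code */
    static int ram_block_add_kvm_gmem(RAMBlock *rb, Error **errp)
    {
        struct kvm_create_guest_memfd gmem = {
            .size  = rb->max_length,
            .flags = 0,  /* e.g. KVM_GUEST_MEMFD_ALLOW_HUGEPAGE, see open #2 */
        };
        int fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &gmem);

        if (fd < 0) {
            error_setg_errno(errp, -fd, "cannot create KVM gmem");
            return -1;
        }

        rb->gmem_fd = fd;  /* field name is illustrative */
        return 0;
    }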

This series also introduces the first user of KVM gmem,
KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
can be created with 

  $qemu -object sw-protected-vm,id=sp-vm0 \
	-object memory-backend-ram,id=mem0,size=1G,private=on \
	-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
	...

Unfortunately this patch series fails to boot OVMF at a very early
stage due to a triple fault, because KVM doesn't support emulating string IO
to private memory.

This version still leaves some opens to be discussed:
1. whether we need the "private" property to be user-settable?

   It seems unnecessary because the VM type determines it. If the VM is
   a confidential guest, then the RAM of the guest must be able to be
   mapped as private, i.e., have a KVM gmem backend. So QEMU can
   determine the value of the "private" property automatically based on
   the VM type.

   This also aligns with the board-internal MemoryRegions that need a
   KVM gmem backend, e.g., TDX requires OVMF to be mapped as private
   memory, so the BIOS memory region needs a KVM gmem fd associated.
   QEMU will no doubt do that internally and automatically.

2. hugepage support.

   KVM gmem can be allocated from hugetlbfs. How does QEMU determine
   when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? The
   easiest solution is to create KVM gmem with
   KVM_GUEST_MEMFD_ALLOW_HUGEPAGE only when the memory backend is a
   HostMemoryBackendFile backed by hugetlbfs (a possible heuristic is
   sketched after this list).

3. What is KVM_X86_SW_PROTECTED_VM going to look like, and do we need it?

   This series implements KVM_X86_SW_PROTECTED_VM because it is introduced
   together with gmem on the KVM side and is supposed to be the first user
   that requires KVM gmem. However, the implementation is incomplete and
   the definition of how KVM_X86_SW_PROTECTED_VM works is still missing.
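
For open #2, one possible heuristic (illustrative and untested, not part of
this series) is to derive the gmem flag from the backend's page size; the
helper names below exist in current QEMU, the flag comes from the kernel
gmem series [1]:

    /* illustrative sketch for open #2 */
    static uint64_t kvm_gmem_flags_for_backend(HostMemoryBackend *backend)
    {
        /* a hugetlbfs-backed file backend reports a page size > base size */
        size_t page_size = host_memory_backend_pagesize(backend);

        return page_size > qemu_real_host_page_size() ?
               KVM_GUEST_MEMFD_ALLOW_HUGEPAGE : 0;
    }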

Any other ideas/opens/questions are welcome.

Besides, the QEMU TDX implementation is based on this series to provide
private gmem for TD private memory; it can be found at [2].
It works with the corresponding KVM [3] to boot a TDX guest.

[1] https://lore.kernel.org/all/20230718234512.1690985-1-seanjc@google.com/
[2] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream
[3] https://github.com/intel/tdx/tree/kvm-upstream-2023.07.27-v6.5-rc2-workaround

===
Changes since RFC v1:
- Implement KVM_X86_SW_PROTECTED_VM with the confidential-guest-support
  interface;
- Rename memory_region_can_be_private() to memory_region_has_gmem_fd();
- Allocate the KVM gmem fd when creating/initializing the memory backend by
  introducing the RAM_KVM_GMEM flag (see the sketch below);
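
A rough sketch of that last point, assuming backend-side wiring similar to
today's hostmem-ram.c; the "private" field and the RAM_KVM_GMEM flag come
from this series, everything else is illustrative:

    /* illustrative: in the host memory backend's alloc path */
    g_autofree char *name = host_memory_backend_get_name(backend);
    uint32_t ram_flags = backend->share ? RAM_SHARED : 0;

    if (backend->private) {
        ram_flags |= RAM_KVM_GMEM;  /* the RAMBlock gets a KVM gmem fd */
    }
    memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
                                           name, backend->size, ram_flags,
                                           errp);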


Chao Peng (3):
  RAMBlock: Add support of KVM private gmem
  kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  kvm: handle KVM_EXIT_MEMORY_FAULT

Isaku Yamahata (4):
  HostMem: Add private property and associate it with RAM_KVM_GMEM
  trace/kvm: Add trace for page conversion between shared and private
  pci-host/q35: Move PAM initialization above SMRAM initialization
  q35: Introduce smm_ranges property for q35-pci-host

Xiaoyao Li (14):
  *** HACK *** linux-headers: Update headers to pull in gmem APIs
  memory: Introduce memory_region_has_gmem_fd()
  i386: Add support for sw-protected-vm object
  i386/pc: Drop pc_machine_kvm_type()
  target/i386: Implement mc->kvm_type() to get VM type
  target/i386: Introduce kvm_confidential_guest_init()
  i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protected-vm
    specific functions
  kvm: Introduce support for memory_attributes
  kvm/memory: Introduce the infrastructure to set the default
    shared/private value
  i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM
  physmem: replace function name with __func__ in
    ram_block_discard_range()
  physmem: extract ram_block_discard_range_fd() from
    ram_block_discard_range()
  physmem: Introduce ram_block_convert_range()
  i386: Disable SMM mode for X86_SW_PROTECTED_VM

 accel/kvm/kvm-all.c               | 180 ++++++++++++++++++++-
 accel/kvm/trace-events            |   4 +-
 backends/hostmem-file.c           |   1 +
 backends/hostmem-memfd.c          |   1 +
 backends/hostmem-ram.c            |   1 +
 backends/hostmem.c                |  18 +++
 hw/i386/pc.c                      |   5 -
 hw/i386/pc_q35.c                  |   3 +-
 hw/i386/x86.c                     |  12 ++
 hw/pci-host/q35.c                 |  61 ++++---
 include/exec/cpu-common.h         |   2 +
 include/exec/memory.h             |  20 +++
 include/exec/ramblock.h           |   1 +
 include/hw/i386/pc.h              |   4 +-
 include/hw/i386/x86.h             |   1 +
 include/hw/pci-host/q35.h         |   1 +
 include/sysemu/hostmem.h          |   2 +-
 include/sysemu/kvm.h              |   5 +
 include/sysemu/kvm_int.h          |   2 +
 linux-headers/asm-x86/kvm.h       |   3 +
 linux-headers/linux/kvm.h         |  50 ++++++
 qapi/qom.json                     |   5 +
 softmmu/memory.c                  |  18 +++
 softmmu/physmem.c                 | 256 ++++++++++++++++++------------
 target/i386/kvm/kvm.c             |  43 ++++-
 target/i386/kvm/kvm_i386.h        |   1 +
 target/i386/kvm/meson.build       |   1 +
 target/i386/kvm/sw-protected-vm.c |  71 +++++++++
 target/i386/kvm/sw-protected-vm.h |  19 +++
 target/i386/sev.c                 |   1 -
 target/i386/sev.h                 |   2 +
 31 files changed, 648 insertions(+), 146 deletions(-)
 create mode 100644 target/i386/kvm/sw-protected-vm.c
 create mode 100644 target/i386/kvm/sw-protected-vm.h

-- 
2.34.1
Re: [RFC PATCH v2 00/21] QEMU gmem implementation
Posted by David Hildenbrand 7 months, 2 weeks ago
On 14.09.23 05:50, Xiaoyao Li wrote:
> It's the v2 RFC of enabling KVM gmem[1] as the backend for private
> memory.
> 
> For confidential-computing, KVM provides gmem/guest_mem interfaces for
> userspace, like QEMU, to allocate user-inaccessible private memory. This
> series aims to add gmem support in QEMU's RAMBlock so that each RAM can
> have both hva-based shared memory and gmem_fd based private memory. QEMU
> does the shared-private conversion on KVM_EXIT_MEMORY_FAULT and discards the
> memory.
> 
> It chooses the design that adds "private" property to host memory backend.
> If "private" property is set, QEMU will allocate/create KVM gmem when
> initialize the RAMBlock of the memory backend.
> 
> This series also introduces the first user of kvm gmem,
> KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
> can be created with
> 
>    $qemu -object sw-protected-vm,id=sp-vm0 \
> 	-object memory-backend-ram,id=mem0,size=1G,private=on \
> 	-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
> 	...
> 
> Unfortunately this patch series fails the boot of OVMF at very early
> stage due to triple fault, because KVM doesn't support emulating string IO
> to private memory.

Is support being added? Or have we figured out what it would take to 
make it work?

How does this interact with other features (memory ballooning, virtiofs, 
vfio/mdev/...)?

> 
> This version still leaves some opens to be discussed:
> 1. whether we need "private" property to be user-settable?
> 
>     It seems unnecessary because vm-type is determined. If the VM is
>     confidential-guest, then the RAM of the guest must be able to be
>     mapped as private, i.e., have kvm gmem backend. So QEMU can
>     determine the value of "private" property automatically based on vm
>     type.
> 
>     This also aligns with the board internal MemoryRegion that needs to
>     have kvm gmem backend, e.g., TDX requires OVMF to act as private
>     memory so bios memory region needs to have kvm gmem fd associated.
>     QEMU no doubt will do it internally automatically.

Would it make sense to have some regions without "private" semantics? 
Like NVDIMMs?

> 
> 2. hugepage support.
> 
>     KVM gmem can be allocated from hugetlbfs. How does QEMU determine
>     when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>     easiest solution is create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>     only when memory backend is HostMemoryBackendFile of hugetlbfs.

Good question.

Probably "if the memory backend uses huge pages, also use huge pages for 
the private gmem" makes sense.

... but it becomes a mess with preallocation ... which is what people 
should actually be using with hugetlb. And eventual double 
memory consumption ... but maybe that's all been taken care of already?

Probably it's best to leave hugetlb support as future work and start 
with something minimal.

> 
> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we need it?
> 

Why implement it when you have to ask others for a motivation? ;)

Personally, I'm not sure if it is really useful, especially in this state.

>     This series implements KVM_X86_SW_PROTECTED_VM because it's introduced
>     with gmem together on KVM side and it's supposed to be the first user
>     who requires KVM gmem. However the implementation is incomplete and
>     there lacks the definition of how KVM_X86_SW_PROTECTED_VM works.

Then it should not be included in this series such that you can make 
progress with the gmem implementation for TDX guests instead?

-- 
Cheers,

David / dhildenb
Re: [RFC PATCH v2 00/21] QEMU gmem implementation
Posted by Xiaoyao Li 7 months, 2 weeks ago
On 9/14/2023 9:09 PM, David Hildenbrand wrote:
> On 14.09.23 05:50, Xiaoyao Li wrote:
>> It's the v2 RFC of enabling KVM gmem[1] as the backend for private
>> memory.
>>
>> For confidential-computing, KVM provides gmem/guest_mem interfaces for
>> userspace, like QEMU, to allocate user-inaccessible private memory. This
>> series aims to add gmem support in QEMU's RAMBlock so that each RAM can
>> have both hva-based shared memory and gmem_fd based private memory. QEMU
>> does the shared-private conversion on KVM_EXIT_MEMORY_FAULT and discards the
>> memory.
>>
>> It chooses the design that adds "private" property to host memory backend.
>> If "private" property is set, QEMU will allocate/create KVM gmem when
>> initialize the RAMBlock of the memory backend.
>>
>> This series also introduces the first user of kvm gmem,
>> KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
>> can be created with
>>
>>    $qemu -object sw-protected-vm,id=sp-vm0 \
>>     -object memory-backend-ram,id=mem0,size=1G,private=on \
>>     -machine 
>> q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
>>     ...
>>
>> Unfortunately this patch series fails the boot of OVMF at very early
>> stage due to triple fault, because KVM doesn't support emulating 
>> string IO
>> to private memory.
> 
> Is support being added? Or have we figured out what it would take to 
> make it work?

Hi David,

I only reply to the questions that weren't covered by Sean's reply.

> How does this interact with other features (memory ballooning, virtiofs, 
> vfio/mdev/...)?

I need time to learn them before I can answer it.

>>
>> This version still leaves some opens to be discussed:
>> 1. whether we need "private" property to be user-settable?
>>
>>     It seems unnecessary because vm-type is determined. If the VM is
>>     confidential-guest, then the RAM of the guest must be able to be
>>     mapped as private, i.e., have kvm gmem backend. So QEMU can
>>     determine the value of "private" property automatically based on vm
>>     type.
>>
>>     This also aligns with the board internal MemoryRegion that needs to
>>     have kvm gmem backend, e.g., TDX requires OVMF to act as private
>>     memory so bios memory region needs to have kvm gmem fd associated.
>>     QEMU no doubt will do it internally automatically.
> 
> Would it make sense to have some regions without "private" semantics? 
> Like NVDIMMs?

Of course it can have regions without "private" semantics.

Whether a region needs a "private" backend depends on the definition of
the VM type. E.g., for TDX:
  - all the RAM needs to be able to be mapped as private, so it needs
    private gmem;
  - TDVF (OVMF) code must be mapped as private, so it needs private gmem;
  - MMIO regions need to be shared for TDX 1.0, so they don't need
    private gmem.

>>
>> 2. hugepage support.
>>
>>     KVM gmem can be allocated from hugetlbfs. How does QEMU determine
>>     when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>>     easiest solution is create KVM gmem with 
>> KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>>     only when memory backend is HostMemoryBackendFile of hugetlbfs.
> 
> Good question.
> 
> Probably "if the memory backend uses huge pages, also use huge pages for 
> the private gmem" makes sense.
> 
> ... but it becomes a mess with preallocation ... which is what people 
> should actually be using with hugetlb. And eventual double 
> memory-consumption ... but maybe that's all been taken care of already?
> 
> Probably it's best to leave hugetlb support as future work and start 
> with something minimal.
> 

As Sean replied, I had some misunderstanding of 
KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. If it's for THP, I think we can allow it 
for every gmem.

As for hugetlb, we can leave it as future work.


Re: [RFC PATCH v2 00/21] QEMU gmem implementation
Posted by David Hildenbrand 7 months, 1 week ago
>>>
>>> This version still leaves some opens to be discussed:
>>> 1. whether we need "private" property to be user-settable?
>>>
>>>      It seems unnecessary because vm-type is determined. If the VM is
>>>      confidential-guest, then the RAM of the guest must be able to be
>>>      mapped as private, i.e., have kvm gmem backend. So QEMU can
>>>      determine the value of "private" property automatically based on vm
>>>      type.
>>>
>>>      This also aligns with the board internal MemoryRegion that needs to
>>>      have kvm gmem backend, e.g., TDX requires OVMF to act as private
>>>      memory so bios memory region needs to have kvm gmem fd associated.
>>>      QEMU no doubt will do it internally automatically.
>>
>> Would it make sense to have some regions without "private" semantics?
>> Like NVDIMMs?
> 
> Of course it can have regions without "private" semantics.

I meant "RAM memory regions on such a special VM". Does it make sense to 
have !private there? I assume "for now, not".

>>>
>>> 2. hugepage support.
>>>
>>>      KVM gmem can be allocated from hugetlbfs. How does QEMU determine
>>>      when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>>>      easiest solution is create KVM gmem with
>>> KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>>>      only when memory backend is HostMemoryBackendFile of hugetlbfs.
>>
>> Good question.
>>
>> Probably "if the memory backend uses huge pages, also use huge pages for
>> the private gmem" makes sense.
>>
>> ... but it becomes a mess with preallocation ... which is what people
>> should actually be using with hugetlb. And eventual double
>> memory-consumption ... but maybe that's all been taken care of already?
>>
>> Probably it's best to leave hugetlb support as future work and start
>> with something minimal.
>>
> 
> As Sean replied, I had some misunderstanding of
> KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. If it's for THP, I think we can allow it
> for every gmem.

Right, just like we do a MADV_HUGEPAGE rather blindly on all memory.

-- 
Cheers,

David / dhildenb


Re: [RFC PATCH v2 00/21] QEMU gmem implementation
Posted by Sean Christopherson 7 months, 2 weeks ago
On Thu, Sep 14, 2023, David Hildenbrand wrote:
> On 14.09.23 05:50, Xiaoyao Li wrote:
> > It's the v2 RFC of enabling KVM gmem[1] as the backend for private
> > memory.
> > 
> > For confidential-computing, KVM provides gmem/guest_mem interfaces for
> > userspace, like QEMU, to allocate user-inaccessible private memory. This
> > series aims to add gmem support in QEMU's RAMBlock so that each RAM can
> > have both hva-based shared memory and gmem_fd based private memory. QEMU
> > does the shared-private conversion on KVM_EXIT_MEMORY_FAULT and discards the
> > memory.
> > 
> > It chooses the design that adds "private" property to host memory backend.
> > If "private" property is set, QEMU will allocate/create KVM gmem when
> > initialize the RAMBlock of the memory backend.
> > 
> > This series also introduces the first user of kvm gmem,
> > KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
> > can be created with
> > 
> >    $qemu -object sw-protected-vm,id=sp-vm0 \
> > 	-object memory-backend-ram,id=mem0,size=1G,private=on \
> > 	-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
> > 	...
> > 
> > Unfortunately this patch series fails the boot of OVMF at very early
> > stage due to triple fault, because KVM doesn't support emulating string IO
> > to private memory.
> 
> Is support being added? Or have we figured out what it would take to make it
> work?

Hrm, this isn't something I've thought deeply about.  The issue is that anything
that reaches any form of copy_{from,to}_user() will go kablooie because KVM will
always try to read/write the shared mappings.  The best case scenario is that the
shared mapping is invalid and the uaccess faults.  The worst case scenario is
that KVM read/writes the wrong memory and sends the guest into the weeds.  Eww.

And we (well, at least I) definitely want to support this so that gmem can be
used for "regular" VMs, i.e. for VMs where userspace is in the TCB, but for which
userspace doesn't have access to guest memory by default.

It shouldn't be too hard to support.  It's easy enough to wire up the hook
(thankfully there aren't _that_ many sites), and gmem only supports struct page at
the moment so we go straight to kmap.  E.g. something like this

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 54480655bcce..b500b0ce5ce3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3291,12 +3291,15 @@ static int next_segment(unsigned long len, int offset)
                return len;
 }
 
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-                                void *data, int offset, int len)
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+                                gfn_t gfn, void *data, int offset, int len)
 {
        int r;
        unsigned long addr;
 
+       if (kvm_mem_is_private(kvm, gfn))
+               return kvm_gmem_read(slot, gfn, data, offset, len);
+
        addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
        if (kvm_is_error_hva(addr))
                return -EFAULT;
@@ -3309,9 +3312,8 @@ static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
                        int len)
 {
-       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
-
-       return __kvm_read_guest_page(slot, gfn, data, offset, len);
+       return __kvm_read_guest_page(kvm, gfn_to_memslot(kvm, gfn), gfn, data,
+                                    offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
 
@@ -3320,7 +3322,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
        struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-       return __kvm_read_guest_page(slot, gfn, data, offset, len);
+       return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
 
> > 2. hugepage support.
> > 
> >     KVM gmem can be allocated from hugetlbfs. How does QEMU determine

Not yet it can't.  gmem only supports THP, hugetlbfs is a future thing, if it's
ever supported.  I wouldn't be at all surprised if we end up going down a slightly
different route and don't use hugetlbfs directly.

> >     when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
> >     easiest solution is create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
> >     only when memory backend is HostMemoryBackendFile of hugetlbfs.
> 
> Good question.
> 
> Probably "if the memory backend uses huge pages, also use huge pages for the
> private gmem" makes sense.
> 
> ... but it becomes a mess with preallocation ... which is what people should
> actually be using with hugetlb. And eventual double memory consumption ...
> but maybe that's all been taken care of already?
> 
> Probably it's best to leave hugetlb support as future work and start with
> something minimal.
> 
> > 
> > 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we need it?
> > 
> 
> Why implement it when you have to ask others for a motivation? ;)
> 
> Personally, I'm not sure if it is really useful, especially in this state.

Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development vehicle,
e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging gmem guests
isn't brutally painful, etc.

Longer term, I have aspirations of being able to back most VMs with gmem, but
that's going to require quite a bit more work, e.g. gmem needs to be mappable
(when hardware allows it) so that gmem doesn't all but require double mapping,
KVM obviously needs to be able to read/write gmem, etc.

The value proposition is that having a guest-first memory type will allow KVM to
optimize and harden gmem in ways that wouldn't be feasible for a more generic
memory implementation.  E.g. memory isn't mapped into host userspace by default
(makes it harder to accidentally corrupt the guest), the guest can have *larger*
mappings than host userspace, guest memory can be served from a dedicated pool
(similar-ish to hugetlb), the pool can be omitted from the direct map, etc.

> >     This series implements KVM_X86_SW_PROTECTED_VM because it's introduced
> >     with gmem together on KVM side and it's supposed to be the first user
> >     who requires KVM gmem. However the implementation is incomplete and
> >     there lacks the definition of how KVM_X86_SW_PROTECTED_VM works.
> 
> Then it should not be included in this series such that you can make
> progress with the gmem implementation for TDX guests instead?
Re: [RFC PATCH v2 00/21] QEMU gmem implementation
Posted by David Hildenbrand 7 months, 1 week ago
>>> 2. hugepage support.
>>>
>>>      KVM gmem can be allocated from hugetlbfs. How does QEMU determine
> 
> Not yet it can't.  gmem only supports THP, hugetlbfs is a future thing, if it's
> ever supported.  I wouldn't be at all surprised if we end up going down a slightly
> different route and don't use hugetlbfs directly.

Agreed. Certainly future work.

> 
>>>      when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>>>      easiest solution is create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>>>      only when memory backend is HostMemoryBackendFile of hugetlbfs.
>>
>> Good question.
>>
>> Probably "if the memory backend uses huge pages, also use huge pages for the
>> private gmem" makes sense.
>>
>> ... but it becomes a mess with preallocation ... which is what people should
>> actually be using with hugetlb. And eventual double memory consumption ...
>> but maybe that's all been taken care of already?
>>
>> Probably it's best to leave hugetlb support as future work and start with
>> something minimal.
>>
>>>
>>> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we need it?
>>>
>>
>> Why implement it when you have to ask others for a motivation? ;)
>>
>> Personally, I'm not sure if it is really useful, especially in this state.
> 
> Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development vehicle,
> e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging gmem guests
> isn't brutally painful, etc.
> 
> Longer term, I have aspirations of being able to back most VMs with gmem, but
> that's going to require quite a bit more work, e.g. gmem needs to be mappable
> (when hardware allows it) so that gmem doesn't all but require double mapping,
> KVM obviously needs to be able to read/write gmem, etc.
> 
> The value proposition is that having a guest-first memory type will allow KVM to
> optimize and harden gmem in ways that wouldn't be feasible for a more generic
> memory implementation.  E.g. memory isn't mapped into host userspace by default
> (makes it harder to accidentally corrupt the guest), the guest can have *larger*
> mappings than host userspace, guest memory can be served from a dedicated pool
> (similar-ish to hugetlb), the pool can be omitted from the direct map, etc.
>
Thanks for that information. Personally, I don't believe "to back most 
VMs with gmem", but that's a different discussion.

As a development vehicle to get TDX up and running it might be very 
valuable indeed. But it doesn't necessarily have to be merged in QEMU 
for that case -- especially in a semi-finished form.

-- 
Cheers,

David / dhildenb
Re: [RFC PATCH v2 00/21] QEMU gmem implementation
Posted by Xiaoyao Li 7 months, 1 week ago
On 9/21/2023 5:11 PM, David Hildenbrand wrote:
>>>> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we 
>>>> need it?
>>>>
>>>
>>> Why implement it when you have to ask others for a motivation? 😉
>>>
>>> Personally, I'm not sure if it is really useful, especially in this 
>>> state.
>>
>> Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development 
>> vehicle,
>> e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging 
>> gmem guests
>> isn't brutally painful, etc.
>>
>> Longer term, I have aspirations of being able to back most VMs with 
>> gmem, but
>> that's going to require quite a bit more work, e.g. gmem needs to be 
>> mappable
>> (when hardware allows it) so that gmem doesn't all but require double 
>> mapping,
>> KVM obviously needs to be able to read/write gmem, etc.
>>
>> The value proposition is that having a guest-first memory type will 
>> allow KVM to
>> optimize and harden gmem in ways that wouldn't be feasible for a more 
>> generic
>> memory implementation.  E.g. memory isn't mapped into host userspace 
>> by default
>> (makes it harder to accidentally corrupt the guest), the guest can 
>> have *larger*
>> mappings than host userspace, guest memory can be served from a 
>> dedicated pool
>> (similar-ish to hugetlb), the pool can be omitted from the direct map, 
>> etc.
>>
> Thanks for that information. Personally, I don't believe "to back most 
> VMs with gmem", but that's a different discussion.
> 
> As a development vehicle to get TDX up and running it might be very 
> valuable indeed. But it doesn't necessarily have to be merged in QEMU 
> for that case -- especially in a semi-finished form.

It's true and I agree with it. I'll drop the KVM_X86_SW_PROTECTED_VM
part in the next version.

How would you like this series to proceed in the next version? Only the
patches for gmem support without a user, or together with the next QEMU
TDX series?

Re: [RFC PATCH v2 00/21] QEMU gmem implementation
Posted by David Hildenbrand 7 months, 1 week ago
On 22.09.23 09:03, Xiaoyao Li wrote:
> On 9/21/2023 5:11 PM, David Hildenbrand wrote:
>>>>> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we
>>>>> need it?
>>>>>
>>>>
>>>> Why implement it when you have to ask others for a motivation? 😉
>>>>
>>>> Personally, I'm not sure if it is really useful, especially in this
>>>> state.
>>>
>>> Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development
>>> vehicle,
>>> e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging
>>> gmem guests
>>> isn't brutally painful, etc.
>>>
>>> Longer term, I have aspirations of being able to back most VMs with
>>> gmem, but
>>> that's going to require quite a bit more work, e.g. gmem needs to be
>>> mappable
>>> (when hardware allows it) so that gmem doesn't all but require double
>>> mapping,
>>> KVM obviously needs to be able to read/write gmem, etc.
>>>
>>> The value proposition is that having a guest-first memory type will
>>> allow KVM to
>>> optimize and harden gmem in ways that wouldn't be feasible for a more
>>> generic
>>> memory implementation.  E.g. memory isn't mapped into host userspace
>>> by default
>>> (makes it harder to accidentally corrupt the guest), the guest can
>>> have *larger*
>>> mappings than host userspace, guest memory can be served from a
>>> dedicated pool
>>> (similar-ish to hugetlb), the pool can be omitted from the direct map,
>>> etc.
>>>
>> Thanks for that information. Personally, I don't believe "to back most
>> VMs with gmem", but that's a different discussion.
>>
>> As a development vehicle to get TDX up and running it might be very
>> valuable indeed. But it doesn't necessarily have to be merged in QEMU
>> for that case -- especially in a semi-finished form.
> 
> It's true and I agree with it. I'll drop the KVM_X86_SW_PROTECTED_VM
> part in next version.
> 
> How would you like this series to proceed in next version? only the
> patches of gmem support without a user? or together with next QEMU TDX
> series?

Makes sense to me. GMEM series can be a prereq for QEMU TDX.

-- 
Cheers,

David / dhildenb