[v3] Add AMD Secure Nested Paging (SEV-SNP) support

[PATCH v3 07/49] HostMem: Add mechanism to opt in kvm guest memfd via MachineState

Posted by Michael Roth 1 year, 10 months ago

From: Xiaoyao Li <xiaoyao.li@intel.com>

Add a new member "guest_memfd" to memory backends. When it's set
to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
guest_memfd will be allocated during RAMBlock allocation.

Memory backend's @guest_memfd is wired with @require_guest_memfd
field of MachineState. It avoid looking up the machine in phymem.c.

MachineState::require_guest_memfd is supposed to be set by any VMs
that requires KVM guest memfd as private memory, e.g., TDX VM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
Changes in v4:
 - rename "require_guest_memfd" to "guest_memfd" in struct
   HostMemoryBackend;	(David Hildenbrand)
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 backends/hostmem-file.c  | 1 +
 backends/hostmem-memfd.c | 1 +
 backends/hostmem-ram.c   | 1 +
 backends/hostmem.c       | 1 +
 hw/core/machine.c        | 5 +++++
 include/hw/boards.h      | 2 ++
 include/sysemu/hostmem.h | 1 +
 7 files changed, 12 insertions(+)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index ac3e433cbd..3c69db7946 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -85,6 +85,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
     ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     ram_flags |= RAM_NAMED_FILE;
     return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 3923ea9364..745ead0034 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
                                           backend->size, ram_flags, fd, 0, errp);
 }
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index d121249f0f..f7d81af783 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -30,6 +30,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
                                                   name, backend->size,
                                                   ram_flags, errp);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 81a72ce40b..eb9682b4a8 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -277,6 +277,7 @@ static void host_memory_backend_init(Object *obj)
     /* TODO: convert access to globals to compat properties */
     backend->merge = machine_mem_merge(machine);
     backend->dump = machine_dump_guest_core(machine);
+    backend->guest_memfd = machine_require_guest_memfd(machine);
     backend->reserve = true;
     backend->prealloc_threads = machine->smp.cpus;
 }
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 37ede0e7d4..73ce9da835 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1198,6 +1198,11 @@ bool machine_mem_merge(MachineState *machine)
     return machine->mem_merge;
 }
 
+bool machine_require_guest_memfd(MachineState *machine)
+{
+    return machine->require_guest_memfd;
+}
+
 static char *cpu_slot_to_string(const CPUArchId *cpu)
 {
     GString *s = g_string_new(NULL);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 8b8f6d5c00..44c2a4e1ec 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -36,6 +36,7 @@ bool machine_usb(MachineState *machine);
 int machine_phandle_start(MachineState *machine);
 bool machine_dump_guest_core(MachineState *machine);
 bool machine_mem_merge(MachineState *machine);
+bool machine_require_guest_memfd(MachineState *machine);
 HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
 void machine_set_cpu_numa_node(MachineState *machine,
                                const CpuInstanceProperties *props,
@@ -370,6 +371,7 @@ struct MachineState {
     char *dt_compatible;
     bool dump_guest_core;
     bool mem_merge;
+    bool require_guest_memfd;
     bool usb;
     bool usb_disabled;
     char *firmware;
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 0e411aaa29..04b884bf42 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -74,6 +74,7 @@ struct HostMemoryBackend {
     uint64_t size;
     bool merge, dump, use_canonical_path;
     bool prealloc, is_mapped, share, reserve;
+    bool guest_memfd;
     uint32_t prealloc_threads;
     ThreadContext *prealloc_context;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
-- 
2.25.1

Re: [PATCH v3 07/49] HostMem: Add mechanism to opt in kvm guest memfd via MachineState

Posted by Peter Xu 1 year ago

On Wed, Mar 20, 2024 at 03:39:03AM -0500, Michael Roth wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Add a new member "guest_memfd" to memory backends. When it's set
> to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
> guest_memfd will be allocated during RAMBlock allocation.
> 
> Memory backend's @guest_memfd is wired with @require_guest_memfd
> field of MachineState. It avoid looking up the machine in phymem.c.
> 
> MachineState::require_guest_memfd is supposed to be set by any VMs
> that requires KVM guest memfd as private memory, e.g., TDX VM.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> ---
> Changes in v4:
>  - rename "require_guest_memfd" to "guest_memfd" in struct
>    HostMemoryBackend;	(David Hildenbrand)
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  backends/hostmem-file.c  | 1 +
>  backends/hostmem-memfd.c | 1 +
>  backends/hostmem-ram.c   | 1 +
>  backends/hostmem.c       | 1 +
>  hw/core/machine.c        | 5 +++++
>  include/hw/boards.h      | 2 ++
>  include/sysemu/hostmem.h | 1 +
>  7 files changed, 12 insertions(+)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index ac3e433cbd..3c69db7946 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -85,6 +85,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>      ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
>      ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
>      ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
> +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
>      ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
>      ram_flags |= RAM_NAMED_FILE;
>      return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
> diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
> index 3923ea9364..745ead0034 100644
> --- a/backends/hostmem-memfd.c
> +++ b/backends/hostmem-memfd.c
> @@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>      name = host_memory_backend_get_name(backend);
>      ram_flags = backend->share ? RAM_SHARED : 0;
>      ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
> +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
>      return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
>                                            backend->size, ram_flags, fd, 0, errp);
>  }
> diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
> index d121249f0f..f7d81af783 100644
> --- a/backends/hostmem-ram.c
> +++ b/backends/hostmem-ram.c
> @@ -30,6 +30,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>      name = host_memory_backend_get_name(backend);
>      ram_flags = backend->share ? RAM_SHARED : 0;
>      ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
> +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
>      return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
>                                                    name, backend->size,
>                                                    ram_flags, errp);

These change look a bit confusing to me, as I don't see how gmemfd can be
used with either file or ram typed memory backends..

When specified gmemfd=on with those, IIUC it'll allocate both the memory
(ramblock->host) and gmemfd, but without using ->host.  Meanwhile AFAIU the
ramblock->host will start to conflict with gmemfd in the future when it
might be able to be mapp-able (having valid ->host).

I have a local fix for this (and actually more than below.. but starting
from it), I'm not sure whether I overlooked something, but from reading the
cover letter it's only using memfd backend which makes perfect sense to me
so far.  I also don't know the planning of coco patches merging so I don't
think even if valid this is urgent - I don't want to mess up on merging
plans..  but still want to collect some comments on whether it's valid:

===8<===

From edfdf019ab01e99fb4ff417e30bb3692b4e3b922 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 21 Jan 2025 12:31:19 -0500
Subject: [PATCH] hostmem: Disallow guest memfd for FILE or RAM typed backends

Guest memfd has very special semantics, which by default doesn't have a
path at all, meanwhile it won't proactively allocate anonymous memory.

Currently:

  - memory-backend-file: it is about creating a memory object based on a
  path in the file system.  It doesn't apply to gmemfd.

  - memory-backend-ram: it is about (mostly) trying to allocate anonymous
  memories from the system (private or shared).  It also doesn't apply to
  gmemfd.

Forbid the two types of memory backends to gmemfd, but only allow
memory-backend-memfd for it as of now.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 backends/hostmem-file.c | 8 +++++++-
 backends/hostmem-ram.c  | 7 ++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 46321fda84..c94cf8441b 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -52,11 +52,18 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         error_setg(errp, "can't create backend with size 0");
         return false;
     }
+
     if (!fb->mem_path) {
         error_setg(errp, "mem-path property not set");
         return false;
     }
 
+    if (backend->guest_memfd) {
+        error_setg(errp, "File backends do not support guest memfd. "
+                   "Please use memfd backend");
+        return false;
+    }
+
     switch (fb->rom) {
     case ON_OFF_AUTO_AUTO:
         /* Traditionally, opening the file readonly always resulted in ROM. */
@@ -86,7 +93,6 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
     ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
-    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     ram_flags |= RAM_NAMED_FILE;
     return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 39aac6bf35..8125be217c 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -27,10 +27,15 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         return false;
     }
 
+    if (backend->guest_memfd) {
+        error_setg(errp, "File backends do not support guest memfd. "
+                   "Please use memfd backend");
+        return false;
+    }
+
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
-    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
                                                   name, backend->size,
                                                   ram_flags, errp);
-- 
2.47.0
===8<===

Thanks,

-- 
Peter Xu

Re: [PATCH v3 07/49] HostMem: Add mechanism to opt in kvm guest memfd via MachineState

Posted by David Hildenbrand 1 year ago

On 21.01.25 18:39, Peter Xu wrote:
> On Wed, Mar 20, 2024 at 03:39:03AM -0500, Michael Roth wrote:
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> Add a new member "guest_memfd" to memory backends. When it's set
>> to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
>> guest_memfd will be allocated during RAMBlock allocation.
>>
>> Memory backend's @guest_memfd is wired with @require_guest_memfd
>> field of MachineState. It avoid looking up the machine in phymem.c.
>>
>> MachineState::require_guest_memfd is supposed to be set by any VMs
>> that requires KVM guest memfd as private memory, e.g., TDX VM.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> ---
>> Changes in v4:
>>   - rename "require_guest_memfd" to "guest_memfd" in struct
>>     HostMemoryBackend;	(David Hildenbrand)
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   backends/hostmem-file.c  | 1 +
>>   backends/hostmem-memfd.c | 1 +
>>   backends/hostmem-ram.c   | 1 +
>>   backends/hostmem.c       | 1 +
>>   hw/core/machine.c        | 5 +++++
>>   include/hw/boards.h      | 2 ++
>>   include/sysemu/hostmem.h | 1 +
>>   7 files changed, 12 insertions(+)
>>
>> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
>> index ac3e433cbd..3c69db7946 100644
>> --- a/backends/hostmem-file.c
>> +++ b/backends/hostmem-file.c
>> @@ -85,6 +85,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>       ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
>>       ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
>>       ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>> +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
>>       ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
>>       ram_flags |= RAM_NAMED_FILE;
>>       return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
>> diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
>> index 3923ea9364..745ead0034 100644
>> --- a/backends/hostmem-memfd.c
>> +++ b/backends/hostmem-memfd.c
>> @@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>       name = host_memory_backend_get_name(backend);
>>       ram_flags = backend->share ? RAM_SHARED : 0;
>>       ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>> +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
>>       return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
>>                                             backend->size, ram_flags, fd, 0, errp);
>>   }
>> diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
>> index d121249f0f..f7d81af783 100644
>> --- a/backends/hostmem-ram.c
>> +++ b/backends/hostmem-ram.c
>> @@ -30,6 +30,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>       name = host_memory_backend_get_name(backend);
>>       ram_flags = backend->share ? RAM_SHARED : 0;
>>       ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>> +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
>>       return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
>>                                                     name, backend->size,
>>                                                     ram_flags, errp);
> 
> These change look a bit confusing to me, as I don't see how gmemfd can be
> used with either file or ram typed memory backends..

I recall that the following should work:

"private" memory will come from guest_memfd, "shared" (as in, accessible 
by the host) will come from anonymous memory.

This "anon" memory cannot be "shared" with other processes, but 
virtio-kernel etc. can just use it.

To "share" the memory with other processes, we'd need memfd/file.

> 
> When specified gmemfd=on with those, IIUC it'll allocate both the memory
> (ramblock->host) and gmemfd, but without using ->host.  Meanwhile AFAIU the
> ramblock->host will start to conflict with gmemfd in the future when it
> might be able to be mapp-able (having valid ->host).

These will require a new guest_memfd memory backend (I recall that was 
discussed a couple of times).

> 
> I have a local fix for this (and actually more than below.. but starting
> from it), I'm not sure whether I overlooked something, but from reading the
> cover letter it's only using memfd backend which makes perfect sense to me
> so far. 

Does the anon+guest_memfd combination not work or are you speculating 
about the usability (which I hopefully addressed above).

-- 
Cheers,

David / dhildenb

Re: [PATCH v3 07/49] HostMem: Add mechanism to opt in kvm guest memfd via MachineState

Posted by Peter Xu 1 year ago

On Tue, Jan 21, 2025 at 07:24:29PM +0100, David Hildenbrand wrote:
> On 21.01.25 18:39, Peter Xu wrote:
> > On Wed, Mar 20, 2024 at 03:39:03AM -0500, Michael Roth wrote:
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > 
> > > Add a new member "guest_memfd" to memory backends. When it's set
> > > to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
> > > guest_memfd will be allocated during RAMBlock allocation.
> > > 
> > > Memory backend's @guest_memfd is wired with @require_guest_memfd
> > > field of MachineState. It avoid looking up the machine in phymem.c.
> > > 
> > > MachineState::require_guest_memfd is supposed to be set by any VMs
> > > that requires KVM guest memfd as private memory, e.g., TDX VM.
> > > 
> > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > ---
> > > Changes in v4:
> > >   - rename "require_guest_memfd" to "guest_memfd" in struct
> > >     HostMemoryBackend;	(David Hildenbrand)
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > >   backends/hostmem-file.c  | 1 +
> > >   backends/hostmem-memfd.c | 1 +
> > >   backends/hostmem-ram.c   | 1 +
> > >   backends/hostmem.c       | 1 +
> > >   hw/core/machine.c        | 5 +++++
> > >   include/hw/boards.h      | 2 ++
> > >   include/sysemu/hostmem.h | 1 +
> > >   7 files changed, 12 insertions(+)
> > > 
> > > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > > index ac3e433cbd..3c69db7946 100644
> > > --- a/backends/hostmem-file.c
> > > +++ b/backends/hostmem-file.c
> > > @@ -85,6 +85,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> > >       ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
> > >       ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
> > >       ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
> > > +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
> > >       ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
> > >       ram_flags |= RAM_NAMED_FILE;
> > >       return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
> > > diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
> > > index 3923ea9364..745ead0034 100644
> > > --- a/backends/hostmem-memfd.c
> > > +++ b/backends/hostmem-memfd.c
> > > @@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> > >       name = host_memory_backend_get_name(backend);
> > >       ram_flags = backend->share ? RAM_SHARED : 0;
> > >       ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
> > > +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
> > >       return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
> > >                                             backend->size, ram_flags, fd, 0, errp);
> > >   }
> > > diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
> > > index d121249f0f..f7d81af783 100644
> > > --- a/backends/hostmem-ram.c
> > > +++ b/backends/hostmem-ram.c
> > > @@ -30,6 +30,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> > >       name = host_memory_backend_get_name(backend);
> > >       ram_flags = backend->share ? RAM_SHARED : 0;
> > >       ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
> > > +    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
> > >       return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
> > >                                                     name, backend->size,
> > >                                                     ram_flags, errp);
> > 
> > These change look a bit confusing to me, as I don't see how gmemfd can be
> > used with either file or ram typed memory backends..
> 
> I recall that the following should work:
> 
> "private" memory will come from guest_memfd, "shared" (as in, accessible by
> the host) will come from anonymous memory.
> 
> This "anon" memory cannot be "shared" with other processes, but
> virtio-kernel etc. can just use it.
> 
> To "share" the memory with other processes, we'd need memfd/file.

Ah OK, thanks David.  Is this the planned long term solution for
vhost-kernel?

I wonder what happens if vhost tries to DMA to a region that is private
with this setup.

AFAIU, it'll try to DMA to the fake address of ramblock->host that is
pointing to by the memory backend (either anon, shmem, file, etc.).  The
ideal case IIUC is it should crash QEMU because it's trying to access an
illegal page which is private. But if with this model, it won't crash but
silently populate some page in the non-gmemfd backend.

Is that expected?

> 
> > 
> > When specified gmemfd=on with those, IIUC it'll allocate both the memory
> > (ramblock->host) and gmemfd, but without using ->host.  Meanwhile AFAIU the
> > ramblock->host will start to conflict with gmemfd in the future when it
> > might be able to be mapp-able (having valid ->host).
> 
> These will require a new guest_memfd memory backend (I recall that was
> discussed a couple of times).

Do you know if anyone is working on this one?

> 
> > 
> > I have a local fix for this (and actually more than below.. but starting
> > from it), I'm not sure whether I overlooked something, but from reading the
> > cover letter it's only using memfd backend which makes perfect sense to me
> > so far.
> 
> Does the anon+guest_memfd combination not work or are you speculating about
> the usability (which I hopefully addressed above).

IIUC, if with above solution and with how QEMU interacts memory convertions
right now, at least hugetlb pages will suffer from double allocation, as
kvm_convert_memory() won't free hugetlb pages even if converted to private.

It sounds like also doable (and also preferrable..) that for each of the VM
we always stich with pages in the gmemfd page cache, no matter if it's
shared or private.  For private, we could zap all pgtables and sigbus any
faults afterwards.  I thought that was always the plan, but I could lose
many latest informations..

Thanks,

-- 
Peter Xu

Re: [PATCH v3 07/49] HostMem: Add mechanism to opt in kvm guest memfd via MachineState

Posted by David Hildenbrand 1 year ago

>> This "anon" memory cannot be "shared" with other processes, but
>> virtio-kernel etc. can just use it.
>>
>> To "share" the memory with other processes, we'd need memfd/file.
> 
> Ah OK, thanks David.  Is this the planned long term solution for
> vhost-kernel?

I think the basic idea was that the memory backend defines how the 
"non-private" memory is backed, which is the same just like for any 
other non-CC VM.

The "private" memory always comes from guest_memfd.

So for the time being using anon+guest_memfd coresponds to "just a 
simple VM".

Long-term I expect that we use guest_memfd for shared+private, and use 
in-place conversion. Access to "private" memory using the mmap() will 
result in a SIGBUS.

 > > I wonder what happens if vhost tries to DMA to a region that is private
> with this setup.
 > > AFAIU, it'll try to DMA to the fake address of ramblock->host that is
> pointing to by the memory backend (either anon, shmem, file, etc.).  The
> ideal case IIUC is it should crash QEMU because it's trying to access an
> illegal page which is private. But if with this model, it won't crash but
> silently populate some page in the non-gmemfd backend.
> 
> Is that expected?

Yes, it's all just a big mmap() which will populate memory on access -- 
independent of using anon/file/memfd.

Similar to virtio-mem, long-term we'd want a mechanism to check/enforce 
that some memory in there will not be populated on access from QEMU 
(well, and vhost-user processes ...).

In memory_get_xlat_addr() we perform such checks, but it's only used for 
iommu. vhost-kernel likely has no such checks, just like vhost-user etc 
does not.

> 
>>
>>>
>>> When specified gmemfd=on with those, IIUC it'll allocate both the memory
>>> (ramblock->host) and gmemfd, but without using ->host.  Meanwhile AFAIU the
>>> ramblock->host will start to conflict with gmemfd in the future when it
>>> might be able to be mapp-able (having valid ->host).
>>
>> These will require a new guest_memfd memory backend (I recall that was
>> discussed a couple of times).
> 
> Do you know if anyone is working on this one?

So far my understanding is that Google that does shared+private 
guest_memfd kernel part won't be working on QEMU patches. I raised that 
to our management recently, that this would be a good project for RH to 
focus on.

I am not aware of real implementations of the guest_memfd backend (yet).

> 
>>
>>>
>>> I have a local fix for this (and actually more than below.. but starting
>>> from it), I'm not sure whether I overlooked something, but from reading the
>>> cover letter it's only using memfd backend which makes perfect sense to me
>>> so far.
>>
>> Does the anon+guest_memfd combination not work or are you speculating about
>> the usability (which I hopefully addressed above).
> 
> IIUC, if with above solution and with how QEMU interacts memory convertions
> right now, at least hugetlb pages will suffer from double allocation, as
> kvm_convert_memory() won't free hugetlb pages even if converted to private.

Yes, that's why I'm invested in teaching guest_memfd in-place conversion 
alongside huge page support (which fortunately Google engineers are 
doing great work on).

> 
> It sounds like also doable (and also preferrable..) that for each of the VM
> we always stich with pages in the gmemfd page cache, no matter if it's
> shared or private.  For private, we could zap all pgtables and sigbus any
> faults afterwards.  I thought that was always the plan, but I could lose
> many latest informations..

Yes, with the guest_memfd backend (shared+private) that's the plan: 
SIGBUS on invalid access.


-- 
Cheers,

David / dhildenb

Re: [PATCH v3 07/49] HostMem: Add mechanism to opt in kvm guest memfd via MachineState

Posted by Peter Xu 1 year ago

On Tue, Jan 21, 2025 at 09:41:55PM +0100, David Hildenbrand wrote:
> So far my understanding is that Google that does shared+private guest_memfd
> kernel part won't be working on QEMU patches. I raised that to our
> management recently, that this would be a good project for RH to focus on.
> 
> I am not aware of real implementations of the guest_memfd backend (yet).

I see, thanks, those are pretty useful information to me.

I think I have a rough picture now.  Since I've already had some patches
going that direction, I'll give it a shot.  I'll keep you updated if I get
something.

-- 
Peter Xu

Re: [PATCH v3 07/49] HostMem: Add mechanism to opt in kvm guest memfd via MachineState

Posted by David Hildenbrand 1 year ago

On 21.01.25 21:59, Peter Xu wrote:
> On Tue, Jan 21, 2025 at 09:41:55PM +0100, David Hildenbrand wrote:
>> So far my understanding is that Google that does shared+private guest_memfd
>> kernel part won't be working on QEMU patches. I raised that to our
>> management recently, that this would be a good project for RH to focus on.
>>
>> I am not aware of real implementations of the guest_memfd backend (yet).
> 
> I see, thanks, those are pretty useful information to me.
> 
> I think I have a rough picture now.  Since I've already had some patches
> going that direction, I'll give it a shot.  I'll keep you updated if I get
> something.

Nice!

-- 
Cheers,

David / dhildenb