As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
uncoordinated discard") highlighted, some subsystems like VFIO may
disable ram block discard. However, guest_memfd relies on the discard
operation to perform page conversion between private and shared memory.
This can lead to stale IOMMU mappings when a hardware device is
assigned to a confidential VM via shared memory. To address this, it is
crucial to ensure that systems like VFIO refresh their IOMMU mappings.

RamDiscardManager is an existing concept (used by virtio-mem) to adjust
VFIO mappings in relation to VM page assignment. Effectively, a page
conversion is similar to hot-removing a page in one mode and adding it
back in the other. Therefore, similar actions are required for page
conversion events. Introduce RamDiscardManager to guest_memfd to
facilitate this process.

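For reference, a RamDiscardManager consumer such as VFIO hooks into the
manager of a MemoryRegion roughly as sketched below. This is a
simplified illustration using the existing listener API; the vfio_like_*
names are placeholders, not functions from this series:

  static void vfio_like_notify_discard(RamDiscardListener *rdl,
                                       MemoryRegionSection *section)
  {
      /* Unmap the IOMMU mappings covering the now-discarded (private) range. */
  }

  static int vfio_like_notify_populate(RamDiscardListener *rdl,
                                       MemoryRegionSection *section)
  {
      /* (Re)create the IOMMU mappings for the now-populated (shared) range. */
      return 0;
  }

  static void vfio_like_register(MemoryRegionSection *section)
  {
      RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
      RamDiscardListener *rdl = g_new0(RamDiscardListener, 1);

      ram_discard_listener_init(rdl, vfio_like_notify_populate,
                                vfio_like_notify_discard, true);
      ram_discard_manager_register_listener(rdm, rdl, section);
  }
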
Since guest_memfd is not an object, it cannot directly implement the
RamDiscardManager interface. One potential approach is to implement it
in HostMemoryBackend, but this is not appropriate because guest_memfd
is per RAMBlock: some RAMBlocks have a memory backend while others do
not. In particular, the ones created via
memory_region_init_ram_guest_memfd(), such as the virtual BIOS, do not.

To manage the RAMBlocks with guest_memfd, define a new object named
MemoryAttributeManager to implement the RamDiscardManager interface. The
object stores guest_memfd information such as the shared_bitmap, and
handles page conversion notifications. The name MemoryAttributeManager
is chosen to keep it generic. The term "Memory" encompasses not only RAM
but also private MMIO in TEE I/O, which might rely on this
object/interface to handle page conversion events in the future. The
term "Attribute" allows for the management of various attributes beyond
shared and private. For instance, it could support scenarios where
discarded vs. populated and shared vs. private states co-exist, such as
supporting virtio-mem or something similar in the future.

In the current context, MemoryAttributeManager treats the discarded
state as private and the populated state as shared. Memory state is
tracked at host page size granularity, as the minimum memory conversion
size can be one page per request. Additionally, VFIO expects the DMA
mapping for a specific iova to be mapped and unmapped with the same
granularity. Confidential VMs may perform partial conversions, e.g. a
conversion of a small region within a larger one. To prevent such
invalid cases, and until a cut_mapping operation is introduced, all
operations are performed with 4K granularity.

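A minimal sketch of the intended usage (illustrative only; the caller
side is presumably wired up elsewhere in the series, and "rb" stands for
the guest_memfd-backed RAMBlock being set up):

  MemoryAttributeManager *mgr =
      MEMORY_ATTRIBUTE_MANAGER(object_new(TYPE_MEMORY_ATTRIBUTE_MANAGER));

  /* Attach the manager to the RAMBlock's MemoryRegion. */
  if (memory_attribute_manager_realize(mgr, rb->mr)) {
      /* handle the error */
      object_unref(OBJECT(mgr));
  }

  /* ... page conversions update the shared_bitmap and notify listeners ... */

  /* Teardown mirrors the setup. */
  memory_attribute_manager_unrealize(mgr);
  object_unref(OBJECT(mgr));
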
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v3:
- Some rename (bitmap_size->shared_bitmap_size,
first_one/zero_bit->first_bit, etc.)
- Change shared_bitmap_size from uint32_t to unsigned
- Return mgr->mr->ram_block->page_size in get_block_size()
- Move set_ram_discard_manager() up to avoid a g_free() in failure
case.
- Add const for the memory_attribute_manager_get_block_size()
- Unify the ReplayRamPopulate and ReplayRamDiscard and related
callback.
Changes in v2:
- Rename the object name to MemoryAttributeManager
- Rename the bitmap to shared_bitmap to make it more clear.
- Remove block_size field and get it from a helper. In future, we
can get the page_size from RAMBlock if necessary.
- Remove the unnecessary "struct" before GuestMemfdReplayData
- Remove the unnecessary g_free() for the bitmap
- Add error reporting when the callback fails for a
  populated/discarded section.
- Move the realize()/unrealize() definition to this patch.
---
include/system/memory-attribute-manager.h | 42 ++++
system/memory-attribute-manager.c | 283 ++++++++++++++++++++++
system/meson.build | 1 +
3 files changed, 326 insertions(+)
create mode 100644 include/system/memory-attribute-manager.h
create mode 100644 system/memory-attribute-manager.c
diff --git a/include/system/memory-attribute-manager.h b/include/system/memory-attribute-manager.h
new file mode 100644
index 0000000000..23375a14b8
--- /dev/null
+++ b/include/system/memory-attribute-manager.h
@@ -0,0 +1,42 @@
+/*
+ * QEMU memory attribute manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ * Chenyi Qiang <chenyi.qiang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#ifndef SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
+#define SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
+
+#include "system/hostmem.h"
+
+#define TYPE_MEMORY_ATTRIBUTE_MANAGER "memory-attribute-manager"
+
+OBJECT_DECLARE_TYPE(MemoryAttributeManager, MemoryAttributeManagerClass, MEMORY_ATTRIBUTE_MANAGER)
+
+struct MemoryAttributeManager {
+    Object parent;
+
+    MemoryRegion *mr;
+
+    /* 1-setting of the bit represents the memory is populated (shared) */
+    unsigned shared_bitmap_size;
+    unsigned long *shared_bitmap;
+
+    QLIST_HEAD(, RamDiscardListener) rdl_list;
+};
+
+struct MemoryAttributeManagerClass {
+    ObjectClass parent_class;
+};
+
+int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr);
+void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr);
+
+#endif
diff --git a/system/memory-attribute-manager.c b/system/memory-attribute-manager.c
new file mode 100644
index 0000000000..7c3789cf49
--- /dev/null
+++ b/system/memory-attribute-manager.c
@@ -0,0 +1,283 @@
+/*
+ * QEMU memory attribute manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ * Chenyi Qiang <chenyi.qiang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "exec/ramblock.h"
+#include "system/memory-attribute-manager.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(MemoryAttributeManager,
+                                   memory_attribute_manager,
+                                   MEMORY_ATTRIBUTE_MANAGER,
+                                   OBJECT,
+                                   { TYPE_RAM_DISCARD_MANAGER },
+                                   { })
+
+static size_t memory_attribute_manager_get_block_size(const MemoryAttributeManager *mgr)
+{
+    /*
+     * Page conversions are requested with at least 4K size and 4K alignment,
+     * so use the host page size as the granularity to track memory attributes.
+     */
+    g_assert(mgr && mgr->mr && mgr->mr->ram_block);
+    g_assert(mgr->mr->ram_block->page_size == qemu_real_host_page_size());
+    return mgr->mr->ram_block->page_size;
+}
+
+
+static bool memory_attribute_rdm_is_populated(const RamDiscardManager *rdm,
+                                              const MemoryRegionSection *section)
+{
+    const MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    uint64_t first_bit = section->offset_within_region / block_size;
+    uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1;
+    unsigned long first_discard_bit;
+
+    first_discard_bit = find_next_zero_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
+    return first_discard_bit > last_bit;
+}
+
+typedef int (*memory_attribute_section_cb)(MemoryRegionSection *s, void *arg);
+
+static int memory_attribute_notify_populate_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    return rdl->notify_populate(rdl, section);
+}
+
+static int memory_attribute_notify_discard_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    rdl->notify_discard(rdl, section);
+
+    return 0;
+}
+
+static int memory_attribute_for_each_populated_section(const MemoryAttributeManager *mgr,
+                                                        MemoryRegionSection *section,
+                                                        void *arg,
+                                                        memory_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size, first_bit);
+
+    while (first_bit < mgr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                  last_bit + 2);
+    }
+
+    return ret;
+}
+
+static int memory_attribute_for_each_discarded_section(const MemoryAttributeManager *mgr,
+                                                        MemoryRegionSection *section,
+                                                        void *arg,
+                                                        memory_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = memory_attribute_manager_get_block_size(mgr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                   first_bit);
+
+    while (first_bit < mgr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                 first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
+                                       last_bit + 2);
+    }
+
+    return ret;
+}
+
+static uint64_t memory_attribute_rdm_get_min_granularity(const RamDiscardManager *rdm,
+                                                         const MemoryRegion *mr)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+
+    g_assert(mr == mgr->mr);
+    return memory_attribute_manager_get_block_size(mgr);
+}
+
+static void memory_attribute_rdm_register_listener(RamDiscardManager *rdm,
+                                                    RamDiscardListener *rdl,
+                                                    MemoryRegionSection *section)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    int ret;
+
+    g_assert(section->mr == mgr->mr);
+    rdl->section = memory_region_section_new_copy(section);
+
+    QLIST_INSERT_HEAD(&mgr->rdl_list, rdl, next);
+
+    ret = memory_attribute_for_each_populated_section(mgr, section, rdl,
+                                                      memory_attribute_notify_populate_cb);
+    if (ret) {
+        error_report("%s: Failed to register RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void memory_attribute_rdm_unregister_listener(RamDiscardManager *rdm,
+                                                      RamDiscardListener *rdl)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    int ret;
+
+    g_assert(rdl->section);
+    g_assert(rdl->section->mr == mgr->mr);
+
+    ret = memory_attribute_for_each_populated_section(mgr, rdl->section, rdl,
+                                                      memory_attribute_notify_discard_cb);
+    if (ret) {
+        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+
+    memory_region_section_free_copy(rdl->section);
+    rdl->section = NULL;
+    QLIST_REMOVE(rdl, next);
+
+}
+
+typedef struct MemoryAttributeReplayData {
+    ReplayRamStateChange fn;
+    void *opaque;
+} MemoryAttributeReplayData;
+
+static int memory_attribute_rdm_replay_cb(MemoryRegionSection *section, void *arg)
+{
+    MemoryAttributeReplayData *data = arg;
+
+    return data->fn(section, data->opaque);
+}
+
+static int memory_attribute_rdm_replay_populated(const RamDiscardManager *rdm,
+                                                 MemoryRegionSection *section,
+                                                 ReplayRamStateChange replay_fn,
+                                                 void *opaque)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == mgr->mr);
+    return memory_attribute_for_each_populated_section(mgr, section, &data,
+                                                       memory_attribute_rdm_replay_cb);
+}
+
+static int memory_attribute_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                                 MemoryRegionSection *section,
+                                                 ReplayRamStateChange replay_fn,
+                                                 void *opaque)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
+    MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == mgr->mr);
+    return memory_attribute_for_each_discarded_section(mgr, section, &data,
+                                                       memory_attribute_rdm_replay_cb);
+}
+
+int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr)
+{
+    uint64_t shared_bitmap_size;
+    const int block_size = qemu_real_host_page_size();
+    int ret;
+
+    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
+
+    mgr->mr = mr;
+    ret = memory_region_set_ram_discard_manager(mgr->mr, RAM_DISCARD_MANAGER(mgr));
+    if (ret) {
+        return ret;
+    }
+    mgr->shared_bitmap_size = shared_bitmap_size;
+    mgr->shared_bitmap = bitmap_new(shared_bitmap_size);
+
+    return ret;
+}
+
+void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr)
+{
+    g_free(mgr->shared_bitmap);
+    memory_region_set_ram_discard_manager(mgr->mr, NULL);
+}
+
+static void memory_attribute_manager_init(Object *obj)
+{
+    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(obj);
+
+    QLIST_INIT(&mgr->rdl_list);
+}
+
+static void memory_attribute_manager_finalize(Object *obj)
+{
+}
+
+static void memory_attribute_manager_class_init(ObjectClass *oc, void *data)
+{
+    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
+
+    rdmc->get_min_granularity = memory_attribute_rdm_get_min_granularity;
+    rdmc->register_listener = memory_attribute_rdm_register_listener;
+    rdmc->unregister_listener = memory_attribute_rdm_unregister_listener;
+    rdmc->is_populated = memory_attribute_rdm_is_populated;
+    rdmc->replay_populated = memory_attribute_rdm_replay_populated;
+    rdmc->replay_discarded = memory_attribute_rdm_replay_discarded;
+}
diff --git a/system/meson.build b/system/meson.build
index 4952f4b2c7..ab07ff1442 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
'dirtylimit.c',
'dma-helpers.c',
'globals.c',
+  'memory-attribute-manager.c',
'memory_mapping.c',
'qdev-monitor.c',
'qtest.c',
--
2.43.5
On 3/10/2025 9:18 AM, Chenyi Qiang wrote:
> As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
> uncoordinated discard") highlighted, some subsystems like VFIO may
> disable ram block discard. However, guest_memfd relies on the discard
> operation to perform page conversion between private and shared memory.
[...]
> To manage the RAMBlocks with guest_memfd, define a new object named
> MemoryAttributeManager to implement the RamDiscardManager interface. The

Shouldn't this be the other way around? 'MemoryAttributeManager' should
be an interface and RamDiscardManager a type of it, an implementation?

MemoryAttributeManager has data like 'shared_bitmap' etc.; that
information can also be used by the other implementation types?

Or maybe I am getting it entirely wrong.

Thanks,
Pankaj

[...]
> +static void memory_attribute_manager_class_init(ObjectClass *oc, void *data)
> +{
> +    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
> +
> +    rdmc->get_min_granularity = memory_attribute_rdm_get_min_granularity;
> +    rdmc->register_listener = memory_attribute_rdm_register_listener;
> +    rdmc->unregister_listener = memory_attribute_rdm_unregister_listener;
> +    rdmc->is_populated = memory_attribute_rdm_is_populated;
> +    rdmc->replay_populated = memory_attribute_rdm_replay_populated;
> +    rdmc->replay_discarded = memory_attribute_rdm_replay_discarded;
> +}

Would this initialization be for
On 3/14/2025 8:11 PM, Gupta, Pankaj wrote:
> On 3/10/2025 9:18 AM, Chenyi Qiang wrote:
>> [...]
>> To manage the RAMBlocks with guest_memfd, define a new object named
>> MemoryAttributeManager to implement the RamDiscardManager interface. The
>
> Shouldn't this be the other way around? 'MemoryAttributeManager' should
> be an interface and RamDiscardManager a type of it, an implementation?

We want to use 'MemoryAttributeManager' to represent RAMBlock to
implement the RamDiscardManager interface callbacks, because RAMBlock is
not an object. It includes some metadata of guest_memfd like
shared_bitmap at the same time.

I don't see how to make 'MemoryAttributeManager' an interface and
RamDiscardManager a type of it. Can you elaborate a little bit? I think
at least we need someone to implement the RamDiscardManager interface.

> MemoryAttributeManager has data like 'shared_bitmap' etc.; that
> information can also be used by the other implementation types?

Shared_bitmap is just the metadata of guest_memfd. Other subsystems may
consult its status via the RamDiscardManager interface, e.g.
ram_discard_manager_is_populated().

> Or maybe I am getting it entirely wrong.
>
> Thanks,
> Pankaj
On 17.03.25 03:54, Chenyi Qiang wrote:
> On 3/14/2025 8:11 PM, Gupta, Pankaj wrote:
>> On 3/10/2025 9:18 AM, Chenyi Qiang wrote:
>>> [...]
>>> To manage the RAMBlocks with guest_memfd, define a new object named
>>> MemoryAttributeManager to implement the RamDiscardManager interface. The
>>
>> Shouldn't this be the other way around? 'MemoryAttributeManager' should
>> be an interface and RamDiscardManager a type of it, an implementation?
>
> We want to use 'MemoryAttributeManager' to represent RAMBlock to
> implement the RamDiscardManager interface callbacks, because RAMBlock is
> not an object. It includes some metadata of guest_memfd like
> shared_bitmap at the same time.
>
> I don't see how to make 'MemoryAttributeManager' an interface and
> RamDiscardManager a type of it. Can you elaborate a little bit? I think
> at least we need someone to implement the RamDiscardManager interface.

shared <-> private is translated (abstracted) to "populated <->
discarded", which makes sense. The other way around would be wrong.

It's going to be interesting once we have more logical states, for
example supporting virtio-mem for confidential VMs.

Then we'd have "shared+populated, private+populated, shared+discarded,
private+discarded". Not sure if this could simply be achieved by
allowing multiple RamDiscardManagers that are effectively chained, or if
we'd want a different interface.

--
Cheers,

David / dhildenb
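For illustration only, the four combined states above could be modeled
as two orthogonal per-block flags; this is a hypothetical sketch, not
existing QEMU code:

  /* Hypothetical: two independent state axes tracked per block. */
  #define MEM_BLOCK_POPULATED (1u << 0)   /* populated vs. discarded */
  #define MEM_BLOCK_SHARED    (1u << 1)   /* shared vs. private */

  /* "private+populated" is then MEM_BLOCK_POPULATED without MEM_BLOCK_SHARED. */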
On 3/17/2025 11:36 AM, David Hildenbrand wrote:
> On 17.03.25 03:54, Chenyi Qiang wrote:
>> [...]
>> I don't see how to make 'MemoryAttributeManager' an interface and
>> RamDiscardManager a type of it. Can you elaborate a little bit? I think
>> at least we need someone to implement the RamDiscardManager interface.
>
> shared <-> private is translated (abstracted) to "populated <->
> discarded", which makes sense. The other way around would be wrong.
>
> It's going to be interesting once we have more logical states, for
> example supporting virtio-mem for confidential VMs.
>
> Then we'd have "shared+populated, private+populated, shared+discarded,
> private+discarded". Not sure if this could simply be achieved by
> allowing multiple RamDiscardManagers that are effectively chained, or if
> we'd want a different interface.

Exactly! In any case, a generic manager (parent class) would make more
sense, one that can work on different operations/states implemented in
child classes (which can be chained as well).

Best regards,
Pankaj
On 3/18/2025 1:01 AM, Gupta, Pankaj wrote:
> On 3/17/2025 11:36 AM, David Hildenbrand wrote:
>> [...]
>> It's going to be interesting once we have more logical states, for
>> example supporting virtio-mem for confidential VMs.
>>
>> Then we'd have "shared+populated, private+populated, shared+discarded,
>> private+discarded". Not sure if this could simply be achieved by
>> allowing multiple RamDiscardManagers that are effectively chained, or
>> if we'd want a different interface.
>
> Exactly! In any case, a generic manager (parent class) would make more
> sense, one that can work on different operations/states implemented in
> child classes (which can be chained as well).

Ah, we are talking about the generic state management. Sorry for my slow
reaction.

So we need to:
1. Define a generic manager Interface, e.g.
   MemoryStateManager/GenericStateManager.
2. Make RamDiscardManager the child of MemoryStateManager, which manages
   the states of populated and discarded.
3. Define a new child manager Interface PrivateSharedManager, which
   manages the states of private and shared.
4. Define a new object ConfidentialMemoryAttribute to implement the
   PrivateSharedManager interface.
(Feel free to rename the above Interface/Object.)

Is my understanding correct?

> Best regards,
> Pankaj
>>> shared <-> private is translated (abstracted) to "populated <->
>>> discarded", which makes sense. The other way around would be wrong.
>>>
>>> It's going to be interesting once we have more logical states, for
>>> example supporting virtio-mem for confidential VMs.
>>>
>>> Then we'd have "shared+populated, private+populated, shared+discard,
>>> private+discarded". Not sure if this could simply be achieved by
>>> allowing multiple RamDiscardManager that are effectively chained, or
>>> if we'd want a different interface.
>>
>> Exactly! In any case generic manager (parent class) would make more
>> sense that can work on different operations/states implemented in child
>> classes (can be chained as well).
>
> Ah, we are talking about the generic state management. Sorry for my slow
> reaction.
>
> So we need to
> 1. Define a generic manager Interface, e.g.
> MemoryStateManager/GenericStateManager.
> 2. Make RamDiscardManager the child of MemoryStateManager which manages
> the state of populated and discarded.
> 3. Define a new child manager Interface PrivateSharedManager which
> manages the state of private and shared.
> 4. Define a new object ConfidentialMemoryAttribute to implement the
> PrivateSharedManager interface.
> (Welcome to rename the above Interface/Object)
>
> Is my understanding correct?

Yes, in that direction. Where 'RamDiscardManager' and
'PrivateSharedManager' are both children of 'GenericStateManager'.

Depending on the listeners registered, the corresponding handlers can be
called.

Best regards,
Pankaj
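[For illustration, the four steps and the confirmation above might
translate into QOM types along the following lines. The type names and
strings are the placeholders used in this thread, the callbacks are
elided, and it assumes QOM permits an interface type to name another
interface as its parent -- a sketch, not the eventual patch.]

#include "qemu/osdep.h"
#include "qom/object.h"

/* Placeholder type names for this sketch only. */
#define TYPE_GENERIC_STATE_MANAGER          "generic-state-manager"
#define TYPE_RAM_DISCARD_MANAGER            "ram-discard-manager"
#define TYPE_PRIVATE_SHARED_MANAGER         "private-shared-manager"
#define TYPE_CONFIDENTIAL_MEMORY_ATTRIBUTE  "confidential-memory-attribute"

/* Step 1: the generic parent interface (callbacks elided here). */
typedef struct GenericStateManagerClass {
    InterfaceClass parent_class;
    /* get_min_granularity/is_state_set/replay/register_listener/... */
} GenericStateManagerClass;

/* Step 2: RamDiscardManager re-parented under the generic interface,
 * still meaning populated <-> discarded. */
typedef struct RamDiscardManagerClass {
    GenericStateManagerClass parent_class;
} RamDiscardManagerClass;

/* Step 3: a sibling interface meaning shared <-> private. */
typedef struct PrivateSharedManagerClass {
    GenericStateManagerClass parent_class;
} PrivateSharedManagerClass;

/* Step 4: the per-RAMBlock object implementing the private/shared side. */
typedef struct ConfidentialMemoryAttribute {
    Object parent_obj;
    /* guest_memfd metadata such as the shared_bitmap would live here */
} ConfidentialMemoryAttribute;

static const TypeInfo sketch_types[] = {
    {
        .name = TYPE_GENERIC_STATE_MANAGER,
        .parent = TYPE_INTERFACE,
        .class_size = sizeof(GenericStateManagerClass),
    },
    {
        .name = TYPE_RAM_DISCARD_MANAGER,
        .parent = TYPE_GENERIC_STATE_MANAGER,
        .class_size = sizeof(RamDiscardManagerClass),
    },
    {
        .name = TYPE_PRIVATE_SHARED_MANAGER,
        .parent = TYPE_GENERIC_STATE_MANAGER,
        .class_size = sizeof(PrivateSharedManagerClass),
    },
    {
        .name = TYPE_CONFIDENTIAL_MEMORY_ATTRIBUTE,
        .parent = TYPE_OBJECT,
        .instance_size = sizeof(ConfidentialMemoryAttribute),
        .interfaces = (InterfaceInfo[]) {
            { TYPE_PRIVATE_SHARED_MANAGER },
            { }
        },
    },
};

DEFINE_TYPES(sketch_types)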
On 3/19/2025 4:55 PM, Gupta, Pankaj wrote:
> Yes, in that direction. Where 'RamDiscardManager' and
> 'PrivateSharedManager' are both children of 'GenericStateManager'.
>
> Depending on the listeners registered, the corresponding handlers can be
> called.

Yes, it would be more generic and extensible for the future.

Do we need to introduce this framework change directly? Or keep the
current structure (abstract private/shared as discard/populated) and add
the generic manager only when there is a real use case, such as virtio-mem
for confidential VMs?

> Best regards,
> Pankaj
On 3/19/2025 12:23 PM, Chenyi Qiang wrote:
>> Yes, in that direction. Where 'RamDiscardManager' and
>> 'PrivateSharedManager' are both children of 'GenericStateManager'.
>>
>> Depending on the listeners registered, the corresponding handlers can
>> be called.
>
> Yes, it would be more generic and extensible for the future.
>
> Do we need to introduce this framework change directly? Or keep the
> current structure (abstract private/shared as discard/populated) and add
> the generic manager only when there is a real use case, such as
> virtio-mem for confidential VMs?

Yes, maybe to start with we should add the new (discard/populated) changes
on top of the new framework.

In future, the current framework can be extended for in-place
private/shared conversion (if it requires userspace help) and for
virtio-mem-like interfaces. The important thing is to have a proper
hierarchy with the base bits in place.

Thanks,
Pankaj
On 3/19/2025 7:56 PM, Gupta, Pankaj wrote:
>> Do we need to introduce this framework change directly? Or keep the
>> current structure (abstract private/shared as discard/populated) and
>> add the generic manager only when there is a real use case, such as
>> virtio-mem for confidential VMs?
>
> Yes, maybe to start with we should add the new (discard/populated)
> changes on top of the new framework.
>
> In future, the current framework can be extended for in-place
> private/shared conversion (if it requires userspace help) and for
> virtio-mem-like interfaces. The important thing is to have a proper
> hierarchy with the base bits in place.

Thanks. Then I will follow this direction.

To abstract the common parent class, what I can think of is to have it
manage a pair of opposite states (state set and state clear, like populate
and discard) and define similar common callbacks such as notify_set() and
notify_clear(), as long as we don't need it to manage more than two states
in the future. Otherwise I may define a stub parent class.

> Thanks,
> Pankaj
On 3/20/2025 11:15 AM, Chenyi Qiang wrote:
> Thanks. Then I will follow this direction.
>
> To abstract the common parent class, what I can think of is to have it
> manage a pair of opposite states (state set and state clear, like
> populate and discard) and define similar common callbacks such as
> notify_set() and notify_clear(), as long as we don't need it to manage
> more than two states in the future. Otherwise I may define a stub parent
> class.
Hi Pankaj & David,

How about defining them like:

---

For the class definition, extract all the callbacks into the parent class:

typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);

struct GenericStateManagerClass {
    /* private */
    InterfaceClass parent_class;

    uint64_t (*get_min_granularity)(const GenericStateManager *gsm,
                                    const MemoryRegion *mr);
    bool (*is_state_set)(const GenericStateManager *gsm,
                         const MemoryRegionSection *section);
    int (*replay_state_set)(const GenericStateManager *gsm,
                            MemoryRegionSection *section,
                            ReplayStateChange replay_fn, void *opaque);
    int (*replay_state_clear)(const GenericStateManager *gsm,
                              MemoryRegionSection *section,
                              ReplayStateChange replay_fn, void *opaque);
    void (*register_listener)(GenericStateManager *gsm, void *listener,
                              MemoryRegionSection *section);
    void (*unregister_listener)(GenericStateManager *gsm, void *listener);
};

struct RamDiscardManagerClass {
    /* private */
    GenericStateManagerClass parent_class;
};

struct PrivateSharedManagerClass {
    /* private */
    GenericStateManagerClass parent_class;
};

---

For the listeners, embed the generic listener into the specific listeners:

typedef int (*NotifyStateSet)(void *listener, MemoryRegionSection *section);
typedef int (*NotifyStateClear)(void *listener, MemoryRegionSection *section);

struct StateChangeListener {
    NotifyStateSet notify_state_set;
    NotifyStateClear notify_state_clear;
    MemoryRegionSection *section;
};

struct RamDiscardListener {
    struct StateChangeListener scl;

    bool double_discard_supported;

    QLIST_ENTRY(RamDiscardListener) next;
};

struct PrivateSharedListener {
    struct StateChangeListener scl;

    QLIST_ENTRY(PrivateSharedListener) next;
};

---

For the helpers,

- define the generic helpers:

void generic_state_manager_register_listener(GenericStateManager *gsm,
                                             void *listener,
                                             MemoryRegionSection *s)
{
    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);

    g_assert(gsmc->register_listener);
    gsmc->register_listener(gsm, listener, s);
}

- keep RamDiscardManager-specific helpers for compatibility if necessary:

void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                           RamDiscardListener *rdl,
                                           MemoryRegionSection *section)
{
    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(rdm);
    GenericStateManager *gsm = GENERIC_STATE_MANAGER(rdm);

    g_assert(gsmc->register_listener);
    gsmc->register_listener(gsm, (void *)rdl, section);
}
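[To show how an existing RamDiscardListener user could sit on top of the
structs proposed above, here is a rough usage sketch. The
ram_discard_listener_init() helper and the vfio_* callbacks below are
hypothetical, illustrating how a VFIO-style listener might wire its
handlers through the embedded StateChangeListener and register via the
generic helper; nothing here is settled API.]

/* "state set" maps to "populated" for a RamDiscardListener: map DMA. */
static int vfio_state_set(void *listener, MemoryRegionSection *section)
{
    return 0; /* would call vfio_dma_map() for the section in real code */
}

/* "state clear" maps to "discarded": unmap the DMA range. */
static int vfio_state_clear(void *listener, MemoryRegionSection *section)
{
    return 0; /* would call vfio_dma_unmap() for the section */
}

static void ram_discard_listener_init(RamDiscardListener *rdl,
                                      NotifyStateSet populate_fn,
                                      NotifyStateClear discard_fn,
                                      bool double_discard_supported)
{
    rdl->scl.notify_state_set = populate_fn;
    rdl->scl.notify_state_clear = discard_fn;
    rdl->double_discard_supported = double_discard_supported;
}

static void vfio_register_listener(GenericStateManager *gsm,
                                   RamDiscardListener *rdl,
                                   MemoryRegionSection *section)
{
    ram_discard_listener_init(rdl, vfio_state_set, vfio_state_clear, true);
    /*
     * The same call works whether gsm is backed by a RamDiscardManager
     * or by a PrivateSharedManager underneath.
     */
    generic_state_manager_register_listener(gsm, rdl, section);
}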