This is the v4 series of the shared device assignment support.

Compared with the v3 series, the main changes are:
- Introduced a new GenericStateManager parent class, so that the existing
  RamDiscardManager and the new PrivateSharedManager can be its child
  classes and manage different states.
- Changed the name of MemoryAttributeManager to RamBlockAttribute to
  distinguish it from the XXXManager interfaces while still using it to
  manage guest_memfd information. Meanwhile, use it to implement
  PrivateSharedManager instead of RamDiscardManager to distinguish the
  populate/discard states from the shared/private states.
- Moved the attribute change operations into a listener so that both the
  attribute change and IOMMU pins can be invoked in listener callbacks.
- Added priority listener support in PrivateSharedListener so that the
  attribute change listener and the VFIO listener can be triggered in the
  expected order to comply with the in-place conversion requirement.
- v3: https://lore.kernel.org/qemu-devel/20250310081837.13123-1-chenyi.qiang@intel.com/

The overview of this series:
- Patch 1-3: Preparation patches. These include function exposure and
  some definition changes to return values.
- Patch 4: Introduce a generic state change parent class with
  RamDiscardManager as its child class. This paves the way to introduce
  new child classes to manage other memory states.
- Patch 5-6: Introduce a new child class, PrivateSharedManager, to
  manage the private and shared states. Also add VFIO support for this
  new interface to coordinate RAM discard support.
- Patch 7-9: Introduce a new object to implement the
  PrivateSharedManager interface and a callback to notify of
  shared/private state changes. Store it in RAMBlocks and register it in
  the target MemoryRegion so that the object can notify page conversion
  events to other systems.
- Patch 10-11: Move the state change handling into a
  PrivateSharedListener so that it can be invoked together with the VFIO
  listener by the state_change() call.
- Patch 12: To comply with in-place conversion, introduce priority
  listener support so that the attribute change and IOMMU pin can follow
  the expected order.
- Patch 13: Unlock the coordinated discard so that shared device
  assignment (VFIO) can work with guest_memfd.

More small changes or details can be found in the individual patches.

---
Original cover letter with minor changes related to new parent class:

Background
==========
Confidential VMs have two classes of memory: shared and private memory.
Shared memory is accessible from the host/VMM while private memory is
not. Confidential VMs can decide which memory is shared/private and
convert memory between shared/private at runtime.

"guest_memfd" is a new kind of fd whose primary goal is to serve guest
private memory. In the current implementation, shared memory is allocated
with normal methods (e.g. mmap or fallocate) while private memory is
allocated from guest_memfd. When a VM performs memory conversions, QEMU
frees pages via madvise() or via PUNCH_HOLE on memfd or guest_memfd from
one side, and allocates new pages from the other side. This will cause a
stale IOMMU mapping issue mentioned in [1] when we try to enable shared
device assignment in confidential VMs.
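The stale-mapping hazard that [1] describes can be reduced to a toy model in a few lines of standalone C (illustrative names only, not QEMU code): a DMA mapping records the host address of the shared page at map time, so a discard-and-reallocate conversion cycle leaves the entry pointing at the old page.

```c
#include <assert.h>

/* Toy bump allocator: every new "page" gets a fresh host address. */
static unsigned long next_page = 0x1000;
static unsigned long alloc_page(void)
{
    return next_page += 0x1000;
}

/* One IOMMU entry: the host address captured when VFIO pinned the page. */
typedef struct {
    unsigned long host_addr;
    int mapped;
} IommuEntry;

static void iommu_map(IommuEntry *e, unsigned long shared_page)
{
    e->host_addr = shared_page;
    e->mapped = 1;
}

/* shared -> private -> shared without telling the IOMMU: the shared page
 * is discarded (think PUNCH_HOLE) and a different page backs the guest
 * address afterwards, while the IOMMU entry is never updated. */
static unsigned long convert_cycle(unsigned long shared_page)
{
    (void)shared_page;      /* discarded */
    return alloc_page();    /* freshly allocated shared page */
}
```

Because nothing notifies the "IOMMU" of the conversion, a DMA issued after the cycle would target the stale host address rather than the newly allocated page.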

Solution
========
The key to enabling shared device assignment is to update the IOMMU mappings
on page conversion. RamDiscardManager, an existing interface currently
utilized by virtio-mem, offers a means to modify IOMMU mappings in
accordance with VM page assignment. Although the required operations in
VFIO for page conversion are similar to those for memory plug/unplug, the
private/shared states are different from the discarded/populated states.
We want a mechanism similar to RamDiscardManager, but one used to manage
the private and shared states.

This series introduces a new abstract parent class to manage a pair of
opposite states, with RamDiscardManager as its child managing the
populate/discard states, and introduces a new child class,
PrivateSharedManager, which can utilize the same infrastructure to
notify VFIO of page conversions.

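The pair-of-states idea can be sketched in standalone C (hypothetical names; the real series builds this on QEMU's QOM interfaces): a generic manager keeps one state bit per block and fans transitions out to a listener, regardless of whether the pair is labeled populate/discard or shared/private.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 8

typedef void (*StateNotify)(int block, int new_state, void *opaque);

/* Generic two-state manager: 0/1 per block, listener called on flips. */
typedef struct {
    unsigned char state[NBLOCKS];
    StateNotify notify;     /* e.g. a VFIO map/unmap hook */
    void *opaque;
} GenericStateManager;

static void gsm_set_state(GenericStateManager *m, int block, int new_state)
{
    if (m->state[block] != new_state) {
        m->state[block] = new_state;
        if (m->notify) {
            m->notify(block, new_state, m->opaque);
        }
    }
}

/* Sample listener counting "map" (to 1) and "unmap" (to 0) events. */
typedef struct {
    int maps;
    int unmaps;
} Counter;

static void count_notify(int block, int new_state, void *opaque)
{
    Counter *c = opaque;
    (void)block;
    if (new_state) {
        c->maps++;
    } else {
        c->unmaps++;
    }
}
```

Only real transitions reach the listener, mirroring how a state manager only notifies VFIO when a block actually changes state.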
Relationship with in-place page conversion
==========================================
To support 1G pages for guest_memfd [2], the current direction is to
allow mmap() of guest_memfd to userspace so that both private and shared
memory can use the same physical pages as the backend. This in-place page
conversion design eliminates the need to discard pages during shared/private
conversions. However, device assignment will still be blocked because the
in-place page conversion will reject the conversion when the page is pinned
by VFIO.

To address this, the key difference lies in the sequence of VFIO map/unmap
operations and the page conversion. It can be adjusted to achieve
unmap-before-conversion-to-private and map-after-conversion-to-shared,
ensuring compatibility with guest_memfd.
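The required ordering can be sketched as follows (standalone toy model; the `Page` type, `page_convert()` and the pin flag are illustrative, not QEMU/VFIO APIs):

```c
#include <assert.h>

/* Toy model of the ordering constraint: a pinned page cannot become
 * private, so the pin is dropped (VFIO unmap) before a conversion to
 * private, and taken (VFIO map) after a conversion to shared. */
enum { PAGE_PRIVATE, PAGE_SHARED };

typedef struct {
    int state;
    int pinned;
} Page;

static int page_make_private(Page *p)
{
    if (p->pinned) {
        return -1;              /* in-place conversion rejects pinned pages */
    }
    p->state = PAGE_PRIVATE;
    return 0;
}

static int page_convert(Page *p, int target)
{
    if (target == PAGE_PRIVATE) {
        p->pinned = 0;              /* unmap before conversion-to-private */
        return page_make_private(p);
    }
    p->state = PAGE_SHARED;         /* convert first ... */
    p->pinned = 1;                  /* ... then map after conversion-to-shared */
    return 0;
}
```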

Limitation
==========
One limitation is that VFIO expects the DMA mapping for a specific IOVA
to be mapped and unmapped with the same granularity. The guest may
perform partial conversions, such as converting a small region within a
larger region. To prevent such invalid cases, all operations are
performed with 4K granularity. This could be optimized once the
cut_mapping operation [3] is introduced in the future: we can always
perform a split-before-unmap if partial conversions happen. If the split
succeeds, the unmap will succeed and be atomic. If the split fails, the
unmap process fails.
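Under this scheme, every request is validated against 4K granularity before being handled as individual 4K blocks. A minimal sketch of such a check (hypothetical helper, not part of this series):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K 4096u

/* Hypothetical granularity check: a conversion request is only accepted
 * if it is 4K-aligned, and is then handled as individual 4K blocks so
 * that each VFIO unmap matches the granularity of the original map.
 * Returns the number of 4K blocks, or -1 for an invalid request. */
static int count_4k_blocks(uint64_t offset, uint64_t size)
{
    if (size == 0 || ((offset | size) & (PAGE_4K - 1))) {
        return -1;
    }
    return (int)(size / PAGE_4K);
}
```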

Testing
=======
This patch series is tested based on the TDX patches available at:
KVM: https://github.com/intel/tdx/tree/kvm-coco-queue-snapshot/kvm-coco-queue-snapshot-20250322
(with the HEAD commit reverted)
QEMU: https://github.com/intel-staging/qemu-tdx/tree/tdx-upstream-snapshot-2025-04-07

To facilitate shared device assignment with the NIC, employ the legacy
type1 VFIO with the QEMU command:

qemu-system-x86_64 [...]
...

Following the bootup of the TD guest, the guest's IP address becomes
visible, and iperf is able to successfully send and receive data.

Related link
============
[1] https://lore.kernel.org/qemu-devel/20240423150951.41600-54-pbonzini@redhat.com/
[2] https://lore.kernel.org/lkml/cover.1726009989.git.ackerleytng@google.com/
[3] https://lore.kernel.org/linux-iommu/7-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/

Chenyi Qiang (13):
  memory: Export a helper to get intersection of a MemoryRegionSection
    with a given range
  memory: Change memory_region_set_ram_discard_manager() to return the
    result
  memory: Unify the definition of ReplayRamPopulate() and
    ReplayRamDiscard()
  memory: Introduce generic state change parent class for
    RamDiscardManager
  memory: Introduce PrivateSharedManager Interface as child of
    GenericStateManager
  vfio: Add the support for PrivateSharedManager Interface
  ram-block-attribute: Introduce RamBlockAttribute to manage RAMBlock
    with guest_memfd
  ram-block-attribute: Introduce a callback to notify shared/private
    state changes
  memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks
  memory: Change NotifyStateClear() definition to return the result
  KVM: Introduce CVMPrivateSharedListener for attribute changes during
    page conversions
  ram-block-attribute: Add priority listener support for
    PrivateSharedListener
  RAMBlock: Make guest_memfd require coordinate discard

 accel/kvm/kvm-all.c                         |  81 +++-
 hw/vfio/common.c                            | 131 +++++-
 hw/vfio/container-base.c                    |   1 +
 hw/virtio/virtio-mem.c                      | 168 +++----
 include/exec/memory.h                       | 407 ++++++++++------
 include/exec/ramblock.h                     |  25 +
 include/hw/vfio/vfio-container-base.h       |  10 +
 include/system/confidential-guest-support.h |  10 +
 migration/ram.c                             |  21 +-
 system/memory.c                             | 137 ++++--
 system/memory_mapping.c                     |   6 +-
 system/meson.build                          |   1 +
 system/physmem.c                            |  20 +-
 system/ram-block-attribute.c                | 495 ++++++++++++++++++++
 target/i386/kvm/tdx.c                       |   1 +
 target/i386/sev.c                           |   1 +
 16 files changed, 1192 insertions(+), 323 deletions(-)
 create mode 100644 system/ram-block-attribute.c

--
2.43.5
Rename the helper to memory_region_section_intersect_range() to make it
more generic. Meanwhile, define the @end as Int128 and replace the
related operations with Int128_* format since the helper is exported as
a wider API.

Suggested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- No change.

Changes in v3:
- No change

Changes in v2:
- Make memory_region_section_intersect_range() an inline function.
- Add Reviewed-by from David
- Define the @end as Int128 and use the related Int128_* ops as a wider
  API (Alexey)
---
 hw/virtio/virtio-mem.c | 32 +++++---------------------------
 include/exec/memory.h  | 27 +++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
...
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+static inline bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                                         uint64_t offset,
+                                                         uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    Int128 end = int128_min(int128_add(int128_make64(s->offset_within_region), s->size),
+                            int128_add(int128_make64(offset), int128_make64(size)));
+
+    if (int128_le(end, int128_make64(start))) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_sub(end, int128_make64(start));
+    return true;
+}
+
 /**
  * memory_region_init: Initialize a memory region
  *
--
2.43.5
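For reference, the clamping logic of the helper above can be exercised standalone. This sketch mirrors it with plain uint64_t arithmetic in place of Int128 and a simplified stand-in for MemoryRegionSection (both are assumptions made for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for MemoryRegionSection: offset and size only. */
typedef struct {
    uint64_t offset_within_region;
    uint64_t size;
} Section;

/* Clamp the section to [offset, offset + size); mirrors the logic of
 * memory_region_section_intersect_range() without Int128 overflow
 * protection. Returns 0 if the intersection is empty. */
static int section_intersect_range(Section *s, uint64_t offset, uint64_t size)
{
    uint64_t start = s->offset_within_region > offset ?
                     s->offset_within_region : offset;
    uint64_t end1 = s->offset_within_region + s->size;
    uint64_t end2 = offset + size;
    uint64_t end = end1 < end2 ? end1 : end2;

    if (end <= start) {
        return 0;   /* empty intersection */
    }
    s->offset_within_region = start;
    s->size = end - start;
    return 1;
}
```

The Int128 arithmetic in the real helper exists precisely because `offset + size` can overflow a uint64_t once the helper is exported as a wider API.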
Modify memory_region_set_ram_discard_manager() to return an error if a
RamDiscardManager is already set in the MemoryRegion. The caller must
handle this failure, such as having virtio-mem undo its actions and fail
the realize() process. Opportunistically move the call earlier to avoid
complex error handling.

This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem. After
ram_block_coordinated_discard_require(true) unlocks all
RamDiscardManager instances, only one instance is allowed to be set for
a MemoryRegion at present.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- No change.

Changes in v3:
- Move set_ram_discard_manager() up to avoid a g_free()
- Clean up set_ram_discard_manager() definition

Changes in v2:
- Newly added.
---
 hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
 include/exec/memory.h  |  6 +++---
 system/memory.c        | 10 +++++++---
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }

+    /*
+     * Set ourselves as RamDiscardManager before the plug handler maps the
+     * memory region and exposes it via an address space.
+     */
+    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
+                                              RAM_DISCARD_MANAGER(vmem))) {
+        error_setg(errp, "Failed to set RamDiscardManager");
+        ram_block_coordinated_discard_require(false);
+        return;
+    }
+
     /*
      * We don't know at this point whether shared RAM is migrated using
      * QEMU or migrated using the file content. "x-ignore-shared" will be
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
     vmem->system_reset->vmem = vmem;
     qemu_register_resettable(obj);
-
-    /*
-     * Set ourselves as RamDiscardManager before the plug handler maps the
-     * memory region and exposes it via an address space.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                          RAM_DISCARD_MANAGER(vmem));
 }

 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);

-    /*
-     * The unplug handler unmapped the memory region, it cannot be
-     * found via an address space anymore. Unset ourselves.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
-
     qemu_unregister_resettable(OBJECT(vmem->system_reset));
     object_unref(OBJECT(vmem->system_reset));

@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
+    /*
+     * The unplug handler unmapped the memory region, it cannot be
+     * found via an address space anymore. Unset ourselves.
+     */
+    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }

diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
  *
  * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
  * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned.
+ * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
  *
  * @mr: the #MemoryRegion
  * @rdm: #RamDiscardManager to set
  */
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm);
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm);

 /**
  * memory_region_find: translate an address/size relative to a
diff --git a/system/memory.c b/system/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -XXX,XX +XXX,XX @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
     return mr->rdm;
 }

-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm)
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm)
 {
     g_assert(memory_region_is_ram(mr));
-    g_assert(!rdm || !mr->rdm);
+    if (mr->rdm && rdm) {
+        return -EBUSY;
+    }
+
     mr->rdm = rdm;
+    return 0;
 }

 uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
--
2.43.5
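The single-manager contract introduced above can be illustrated with a standalone sketch (the `Region` type and helper name are hypothetical; the real code operates on MemoryRegion and likewise returns -EBUSY):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Standalone model of the new contract: a region holds at most one
 * manager, a second set attempt fails with -EBUSY, and passing NULL
 * unsets the current manager. */
typedef struct {
    void *rdm;
} Region;

static int region_set_manager(Region *r, void *rdm)
{
    if (r->rdm && rdm) {
        return -EBUSY;
    }
    r->rdm = rdm;
    return 0;
}
```

Returning an error instead of asserting lets a caller such as virtio-mem fail its realize() path gracefully when another manager already claimed the region.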
Update the ReplayRamDiscard() function to return the result, and unify
ReplayRamPopulate() and ReplayRamDiscard() into ReplayStateChange() at
the same time due to their identical definitions. This unification
simplifies related structures, such as VirtIOMEMReplayData, which makes
them cleaner and more maintainable.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Modify the commit message. We won't use the Replay() operation when
  doing the attribute change as in v3.

Changes in v3:
- Newly added.
---
 hw/virtio/virtio-mem.c | 20 ++++++++++----------
 include/exec/memory.h  | 31 ++++++++++++++++---------------
 migration/ram.c        |  5 +++--
 system/memory.c        | 12 ++++++------
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -XXX,XX +XXX,XX @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
 }

 struct VirtIOMEMReplayData {
-    void *fn;
+    ReplayStateChange fn;
     void *opaque;
 };

@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;

-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return data->fn(s, data->opaque);
 }

 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
-                                           ReplayRamPopulate replay_fn,
+                                           ReplayStateChange replay_fn,
                                            void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;

-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    data->fn(s, data->opaque);
     return 0;
 }

-static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
-                                            MemoryRegionSection *s,
-                                            ReplayRamDiscard replay_fn,
-                                            void *opaque)
+static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                           MemoryRegionSection *s,
+                                           ReplayStateChange replay_fn,
+                                           void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     struct VirtIOMEMReplayData data = {
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
     };

     g_assert(s->mr == &vmem->memdev->mr);
-    virtio_mem_for_each_unplugged_section(vmem, s, &data,
-                                          virtio_mem_rdm_replay_discarded_cb);
+    return virtio_mem_for_each_unplugged_section(vmem, s, &data,
+                                                 virtio_mem_rdm_replay_discarded_cb);
 }

 static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
     rdl->double_discard_supported = double_discard_supported;
 }

-typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
-typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
+typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);

 /*
  * RamDiscardManagerClass:
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
     /**
      * @replay_populated:
      *
-     * Call the #ReplayRamPopulate callback for all populated parts within the
+     * Call the #ReplayStateChange callback for all populated parts within the
      * #MemoryRegionSection via the #RamDiscardManager.
      *
      * In case any call fails, no further calls are made.
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamPopulate callback
+     * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
      *
      * Returns 0 on success, or a negative error if any notification failed.
      */
     int (*replay_populated)(const RamDiscardManager *rdm,
                             MemoryRegionSection *section,
-                            ReplayRamPopulate replay_fn, void *opaque);
+                            ReplayStateChange replay_fn, void *opaque);

     /**
      * @replay_discarded:
      *
-     * Call the #ReplayRamDiscard callback for all discarded parts within the
+     * Call the #ReplayStateChange callback for all discarded parts within the
      * #MemoryRegionSection via the #RamDiscardManager.
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamDiscard callback
+     * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
+     *
+     * Returns 0 on success, or a negative error if any notification failed.
      */
-    void (*replay_discarded)(const RamDiscardManager *rdm,
-                             MemoryRegionSection *section,
-                             ReplayRamDiscard replay_fn, void *opaque);
+    int (*replay_discarded)(const RamDiscardManager *rdm,
+                            MemoryRegionSection *section,
+                            ReplayStateChange replay_fn, void *opaque);

     /**
      * @register_listener:
@@ -XXX,XX +XXX,XX @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,

 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayStateChange replay_fn,
                                          void *opaque);

-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque);
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayStateChange replay_fn,
+                                         void *opaque);

 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                            RamDiscardListener *rdl,
diff --git a/migration/ram.c b/migration/ram.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -XXX,XX +XXX,XX @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     return ret;
 }

-static void dirty_bitmap_clear_section(MemoryRegionSection *section,
-                                       void *opaque)
+static int dirty_bitmap_clear_section(MemoryRegionSection *section,
+                                      void *opaque)
 {
     const hwaddr offset = section->offset_within_region;
     const hwaddr size = int128_get64(section->size);
176
@@ -XXX,XX +XXX,XX @@ static void dirty_bitmap_clear_section(MemoryRegionSection *section,
177
}
178
*cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages);
179
bitmap_clear(rb->bmap, start, npages);
180
+ return 0;
181
}
182
183
/*
184
diff --git a/system/memory.c b/system/memory.c
185
index XXXXXXX..XXXXXXX 100644
186
--- a/system/memory.c
187
+++ b/system/memory.c
188
@@ -XXX,XX +XXX,XX @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
189
190
int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
191
MemoryRegionSection *section,
192
- ReplayRamPopulate replay_fn,
193
+ ReplayStateChange replay_fn,
194
void *opaque)
195
{
196
RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
197
@@ -XXX,XX +XXX,XX @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
198
return rdmc->replay_populated(rdm, section, replay_fn, opaque);
199
}
200
201
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
202
- MemoryRegionSection *section,
203
- ReplayRamDiscard replay_fn,
204
- void *opaque)
205
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
206
+ MemoryRegionSection *section,
207
+ ReplayStateChange replay_fn,
208
+ void *opaque)
209
{
210
RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
211
212
g_assert(rdmc->replay_discarded);
213
- rdmc->replay_discarded(rdm, section, replay_fn, opaque);
214
+ return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
215
}
216
217
void ram_discard_manager_register_listener(RamDiscardManager *rdm,
218
--
219
2.43.5
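The return-type change in the patch above (replay_discarded and dirty_bitmap_clear_section now return int instead of void) is what lets a failing replay callback abort the walk and propagate its error. A stand-alone sketch of that contract, not QEMU code (the names replay_all and fail_at_two are illustrative only):

```c
#include <assert.h>

/* Miniature of the ReplayStateChange contract: 0 on success, negative on error. */
typedef int (*ReplayStateChange)(int section_index, void *opaque);

/* Replay over every section, stopping at the first callback that fails. */
static int replay_all(int nr_sections, ReplayStateChange fn, void *opaque)
{
    for (int i = 0; i < nr_sections; i++) {
        int ret = fn(i, opaque);
        if (ret) {
            return ret; /* propagate instead of discarding the error */
        }
    }
    return 0;
}

/* Example callback: counts visits and rejects section 2. */
static int fail_at_two(int idx, void *opaque)
{
    int *visited = opaque;
    (*visited)++;
    return idx == 2 ? -1 : 0;
}
```

With a void-returning callback, the walk could only continue blindly; here `replay_all(5, fail_at_two, &visited)` stops after three calls and returns -1.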
v3:

For each ram_discard_manager helper, add a new argument 'is_private' to
indicate the request attribute. If is_private is true, the operation
targets the private range in the section. For example,
replay_populate(true) will replay the populate operation on the private
part of the MemoryRegionSection, while replay_populate(false) will
replay population on the shared part.

This helps to distinguish between the states of private/shared and
discarded/populated. It is essential for guest_memfd_manager, which uses
the RamDiscardManager interface but can't treat private memory as
discarded memory. This is because it does not align with the expectation
of current RamDiscardManager users (e.g. live migration), who expect
that discarded memory is hot-removed and can be skipped when processing
guest memory. Treating private memory as discarded also won't work in
the future if live migration needs to handle, i.e. migrate, private
memory.

The user of the helper needs to figure out which attribute to
manipulate. For the legacy VM case, use is_private=false by default; the
private attribute is only valid in a guest_memfd-based VM.

Opportunistically rename guest_memfd_for_each_{discarded,
populated}_section() to guest_memfd_for_each_{private, shared}_section()
to distinguish between private/shared and discarded/populated at the
same time.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 hw/vfio/common.c             |  22 ++++++--
 hw/virtio/virtio-mem.c       |  23 ++++----
 include/exec/memory.h        |  23 ++++--
 migration/ram.c              |  14 ++---
 system/guest-memfd-manager.c | 106 +++++++++++++++++++++++------------
 system/memory.c              |  13 +++--
 system/memory_mapping.c      |   4 +-
 7 files changed, 135 insertions(+), 70 deletions(-)

v4:

RamDiscardManager is an interface used by virtio-mem to adjust VFIO
mappings in relation to VM page assignment. It manages the populated and
discarded state of RAM. To accommodate future scenarios for managing RAM
states, such as private and shared states in confidential VMs, the
existing RamDiscardManager interface needs to be generalized.

Introduce a parent class, GenericStateManager, to manage a pair of
opposite states, with RamDiscardManager as its child. The changes
include:
- Define a new abstract class GenericStateManager.
- Extract six callbacks into GenericStateManagerClass and allow the
  child classes to inherit them.
- Modify RamDiscardManager-related helpers to use the
  GenericStateManager ones.
- Define a generic StateChangeListener to extract fields from the
  RamDiscardManager listener, which allows future listeners to embed it
  and avoid duplication.
- Change the users of RamDiscardManager (virtio-mem, migration, etc.) to
  use the GenericStateManager helpers.

This provides a more flexible and reusable framework for RAM state
management, facilitating future enhancements and use cases.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Newly added.
---
 hw/vfio/common.c        |  30 ++--
 hw/virtio/virtio-mem.c  |  95 ++++++------
 include/exec/memory.h   | 313 ++++++++++++++++++++++------------------
 migration/ram.c         |  16 +-
 system/memory.c         | 106 ++++++------
 system/memory_mapping.c |   6 +-
 6 files changed, 310 insertions(+), 256 deletions(-)
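The parent/child split described in the commit message above (a GenericStateManagerClass callback table managing one pair of opposite states, with RamDiscardManager interpreting the pair as populated/discarded) can be modeled outside QEMU with plain structs. This is a toy sketch only; the names are borrowed from the patch but everything else (Section, count_bytes, demo) is hypothetical:

```c
#include <stddef.h>

/* Hypothetical stand-in for MemoryRegionSection: just a size. */
typedef struct Section { unsigned long size; } Section;

typedef int (*ReplayStateChange)(Section *section, void *opaque);

/*
 * Parent "class": one callback table covers a pair of opposite states
 * (set/clear); a child decides what the pair means.
 */
typedef struct GenericStateManagerClass {
    int (*replay_on_state_set)(Section *s, ReplayStateChange fn, void *opaque);
    int (*replay_on_state_clear)(Section *s, ReplayStateChange fn, void *opaque);
} GenericStateManagerClass;

/* Child: "set" means populated, "clear" means discarded. */
static int rdm_replay_populated(Section *s, ReplayStateChange fn, void *opaque)
{
    return fn(s, opaque);
}

static int rdm_replay_discarded(Section *s, ReplayStateChange fn, void *opaque)
{
    return fn(s, opaque);
}

/* Example replay callback: totals the bytes visited. */
static int count_bytes(Section *s, void *opaque)
{
    *(unsigned long *)opaque += s->size;
    return 0;
}

/* Walk both states of one 4 KiB section and return the byte total. */
static unsigned long demo(void)
{
    GenericStateManagerClass rdm = {
        .replay_on_state_set = rdm_replay_populated,
        .replay_on_state_clear = rdm_replay_discarded,
    };
    Section sec = { .size = 4096 };
    unsigned long total = 0;

    rdm.replay_on_state_set(&sec, count_bytes, &total);
    rdm.replay_on_state_clear(&sec, count_bytes, &total);
    return total;
}
```

A future PrivateSharedManager child could reuse the same table with "set" meaning shared and "clear" meaning private, which is the point of hoisting the callbacks into the parent.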
v3:

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -XXX,XX +XXX,XX @@ out:
 }
 
 static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
-                                            MemoryRegionSection *section)
+                                            MemoryRegionSection *section,
+                                            bool is_private)
 {
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
@@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
     const hwaddr iova = section->offset_within_address_space;
     int ret;
 
+    if (is_private) {
+        /* Not support discard private memory yet. */
+        return;
+    }
+
     /* Unmap with a single call. */
     ret = vfio_container_dma_unmap(bcontainer, iova, size, NULL);
     if (ret) {
@@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
 }
 
 static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
-                                            MemoryRegionSection *section)
+                                            MemoryRegionSection *section,
+                                            bool is_private)
 {
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
@@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
     void *vaddr;
     int ret;
 
+    if (is_private) {
+        /* Not support discard private memory yet. */
+        return 0;
+    }
+
     /*
      * Map in (aligned within memory region) minimum granularity, so we can
      * unmap in minimum granularity later.
@@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                                vaddr, section->readonly);
         if (ret) {
             /* Rollback */
-            vfio_ram_discard_notify_discard(rdl, section);
+            vfio_ram_discard_notify_discard(rdl, section, false);
             return ret;
         }
     }
@@ -XXX,XX +XXX,XX @@ out:
 }
 
 static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
-                                             void *opaque)
+                                             bool is_private, void *opaque)
 {
     const hwaddr size = int128_get64(section->size);
     const hwaddr iova = section->offset_within_address_space;
@@ -XXX,XX +XXX,XX @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
      * We only want/can synchronize the bitmap for actually mapped parts -
      * which correspond to populated parts. Replay all populated parts.
      */
-    return ram_discard_manager_replay_populated(rdm, section,
+    return ram_discard_manager_replay_populated(rdm, section, false,
                                                 vfio_ram_discard_get_dirty_bitmap,
                                                 &vrdl);
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
 {
     RamDiscardListener *rdl = arg;
 
-    return rdl->notify_populate(rdl, s);
+    return rdl->notify_populate(rdl, s, false);
 }
 
 static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg)
 {
     RamDiscardListener *rdl = arg;
 
-    rdl->notify_discard(rdl, s);
+    rdl->notify_discard(rdl, s, false);
     return 0;
 }
 
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        rdl->notify_discard(rdl, &tmp);
+        rdl->notify_discard(rdl, &tmp, false);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        ret = rdl->notify_populate(rdl, &tmp);
+        ret = rdl->notify_populate(rdl, &tmp, false);
         if (ret) {
             break;
         }
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
-            rdl2->notify_discard(rdl2, &tmp);
+            rdl2->notify_discard(rdl2, &tmp, false);
         }
     }
     return ret;
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            rdl->notify_discard(rdl, rdl->section, false);
         } else {
             virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
                                                 virtio_mem_notify_discard_cb);
@@ -XXX,XX +XXX,XX @@ static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm,
 }
 
 static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
-                                        const MemoryRegionSection *s)
+                                        const MemoryRegionSection *s,
+                                        bool is_private)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     uint64_t start_gpa = vmem->addr + s->offset_within_region;
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return ((ReplayRamPopulate)data->fn)(s, false, data->opaque);
 }
 
 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
+                                           bool is_private,
                                            ReplayRamPopulate replay_fn,
                                            void *opaque)
 {
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    ((ReplayRamDiscard)data->fn)(s, false, data->opaque);
     return 0;
 }
 
 static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
                                             MemoryRegionSection *s,
+                                            bool is_private,
                                             ReplayRamDiscard replay_fn,
                                             void *opaque)
 {
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm,
     g_assert(rdl->section->mr == &vmem->memdev->mr);
     if (vmem->size) {
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            rdl->notify_discard(rdl, rdl->section, false);
         } else {
             virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
                                                 virtio_mem_notify_discard_cb);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ struct IOMMUMemoryRegionClass {
 
 typedef struct RamDiscardListener RamDiscardListener;
 typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
-                                 MemoryRegionSection *section);

v4:

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -XXX,XX +XXX,XX @@ out:
     rcu_read_unlock();
 }
 
-static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
+static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
                                             MemoryRegionSection *section)
 {
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
     VFIOContainerBase *bcontainer = vrdl->bcontainer;
@@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
     }
 }
 
-static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
+static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
                                             MemoryRegionSection *section)
 {
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
     VFIOContainerBase *bcontainer = vrdl->bcontainer;
@@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                                vaddr, section->readonly);
         if (ret) {
             /* Rollback */
-            vfio_ram_discard_notify_discard(rdl, section);
+            vfio_ram_discard_notify_discard(scl, section);
             return ret;
         }
     }
@@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
 static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                MemoryRegionSection *section)
 {
-    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     VFIORamDiscardListener *vrdl;
+    RamDiscardListener *rdl;
 
     /* Ignore some corner cases not relevant in practice. */
     g_assert(QEMU_IS_ALIGNED(section->offset_within_region, TARGET_PAGE_SIZE));
@@ -XXX,XX +XXX,XX @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
     vrdl->mr = section->mr;
     vrdl->offset_within_address_space = section->offset_within_address_space;
     vrdl->size = int128_get64(section->size);
-    vrdl->granularity = ram_discard_manager_get_min_granularity(rdm,
-                                                                section->mr);
+    vrdl->granularity = generic_state_manager_get_min_granularity(gsm,
+                                                                  section->mr);
 
     g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity));
     g_assert(bcontainer->pgsizes &&
              vrdl->granularity >= 1ULL << ctz64(bcontainer->pgsizes));
 
-    ram_discard_listener_init(&vrdl->listener,
+    rdl = &vrdl->listener;
+    ram_discard_listener_init(rdl,
                               vfio_ram_discard_notify_populate,
                               vfio_ram_discard_notify_discard, true);
-    ram_discard_manager_register_listener(rdm, &vrdl->listener, section);
+    generic_state_manager_register_listener(gsm, &rdl->scl, section);
     QLIST_INSERT_HEAD(&bcontainer->vrdl_list, vrdl, next);
 
     /*
@@ -XXX,XX +XXX,XX @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
 static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                  MemoryRegionSection *section)
 {
-    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     VFIORamDiscardListener *vrdl = NULL;
+    RamDiscardListener *rdl;
 
     QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
         if (vrdl->mr == section->mr &&
@@ -XXX,XX +XXX,XX @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
         hw_error("vfio: Trying to unregister missing RAM discard listener");
     }
 
-    ram_discard_manager_unregister_listener(rdm, &vrdl->listener);
+    rdl = &vrdl->listener;
+    generic_state_manager_unregister_listener(gsm, &rdl->scl);
     QLIST_REMOVE(vrdl, next);
     g_free(vrdl);
 }
@@ -XXX,XX +XXX,XX @@ static int
 vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
                                             MemoryRegionSection *section)
 {
-    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     VFIORamDiscardListener *vrdl = NULL;
 
     QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
@@ -XXX,XX +XXX,XX @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
      * We only want/can synchronize the bitmap for actually mapped parts -
      * which correspond to populated parts. Replay all populated parts.
      */
-    return ram_discard_manager_replay_populated(rdm, section,
+    return generic_state_manager_replay_on_state_set(gsm, section,
                                                 vfio_ram_discard_get_dirty_bitmap,
                                                 &vrdl);
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
 
 static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
 {
-    RamDiscardListener *rdl = arg;
+    StateChangeListener *scl = arg;
 
-    return rdl->notify_populate(rdl, s);
+    return scl->notify_to_state_set(scl, s);
 }
 
 static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg)
 {
-    RamDiscardListener *rdl = arg;
+    StateChangeListener *scl = arg;
 
-    rdl->notify_discard(rdl, s);
+    scl->notify_to_state_clear(scl, s);
     return 0;
 }
 
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     RamDiscardListener *rdl;
 
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
-        MemoryRegionSection tmp = *rdl->section;
+        StateChangeListener *scl = &rdl->scl;
+        MemoryRegionSection tmp = *scl->section;
 
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        rdl->notify_discard(rdl, &tmp);
+        scl->notify_to_state_clear(scl, &tmp);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     int ret = 0;
 
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
-        MemoryRegionSection tmp = *rdl->section;
+        StateChangeListener *scl = &rdl->scl;
+        MemoryRegionSection tmp = *scl->section;
 
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        ret = rdl->notify_populate(rdl, &tmp);
+        ret = scl->notify_to_state_set(scl, &tmp);
         if (ret) {
             break;
         }
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     if (ret) {
         /* Notify all already-notified listeners. */
         QLIST_FOREACH(rdl2, &vmem->rdl_list, next) {
-            MemoryRegionSection tmp = *rdl2->section;
+            StateChangeListener *scl2 = &rdl2->scl;
+            MemoryRegionSection tmp = *scl2->section;
 
             if (rdl2 == rdl) {
                 break;
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
-            rdl2->notify_discard(rdl2, &tmp);
+            scl2->notify_to_state_clear(scl2, &tmp);
         }
     }
     return ret;
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
     }
 
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
+        StateChangeListener *scl = &rdl->scl;
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            scl->notify_to_state_clear(scl, scl->section);
         } else {
-            virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+            virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                                 virtio_mem_notify_discard_cb);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
      * Set ourselves as RamDiscardManager before the plug handler maps the
      * memory region and exposes it via an address space.
      */
-    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                              RAM_DISCARD_MANAGER(vmem))) {
+    if (memory_region_set_generic_state_manager(&vmem->memdev->mr,
+                                                GENERIC_STATE_MANAGER(vmem))) {
         error_setg(errp, "Failed to set RamDiscardManager");
         ram_block_coordinated_discard_require(false);
         return;
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_unrealize(DeviceState *dev)
      * The unplug handler unmapped the memory region, it cannot be
      * found via an address space anymore. Unset ourselves.
      */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
+    memory_region_set_generic_state_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }
 
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_post_load_bitmap(VirtIOMEM *vmem)
      * into an address space. Replay, now that we updated the bitmap.
      */
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
-        ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+        StateChangeListener *scl = &rdl->scl;
+        ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                                   virtio_mem_notify_populate_cb);
         if (ret) {
             return ret;
@@ -XXX,XX +XXX,XX @@ static const Property virtio_mem_properties[] = {
                      dynamic_memslots, false),
 };
 
-static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm,
+static uint64_t virtio_mem_rdm_get_min_granularity(const GenericStateManager *gsm,
                                                    const MemoryRegion *mr)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
 
     g_assert(mr == &vmem->memdev->mr);
     return vmem->block_size;
 }
 
-static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
+static bool virtio_mem_rdm_is_populated(const GenericStateManager *gsm,
                                         const MemoryRegionSection *s)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
     uint64_t start_gpa = vmem->addr + s->offset_within_region;
     uint64_t end_gpa = start_gpa + int128_get64(s->size);
 
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
     return data->fn(s, data->opaque);
 }
 
-static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
+static int virtio_mem_rdm_replay_populated(const GenericStateManager *gsm,
                                            MemoryRegionSection *s,
                                            ReplayStateChange replay_fn,
                                            void *opaque)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
     struct VirtIOMEMReplayData data = {
         .fn = replay_fn,
         .opaque = opaque,
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
     return 0;
 }
 
-static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+static int virtio_mem_rdm_replay_discarded(const GenericStateManager *gsm,
                                            MemoryRegionSection *s,
                                            ReplayStateChange replay_fn,
                                            void *opaque)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
     struct VirtIOMEMReplayData data = {
         .fn = replay_fn,
         .opaque = opaque,
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
                                                 virtio_mem_rdm_replay_discarded_cb);
 }
 
-static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
-                                             RamDiscardListener *rdl,
+static void virtio_mem_rdm_register_listener(GenericStateManager *gsm,
+                                             StateChangeListener *scl,
                                              MemoryRegionSection *s)
 {
-    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    VirtIOMEM *vmem = VIRTIO_MEM(gsm);
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     int ret;
 
     g_assert(s->mr == &vmem->memdev->mr);
-    rdl->section = memory_region_section_new_copy(s);
+    scl->section = memory_region_section_new_copy(s);
 
     QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next);
-    ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+    ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                               virtio_mem_notify_populate_cb);
     if (ret) {
         error_report("%s: Replaying plugged ranges failed: %s", __func__,
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
     }
 }
 
-static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm,
-                                               RamDiscardListener *rdl)
+static void virtio_mem_rdm_unregister_listener(GenericStateManager *gsm,
+                                               StateChangeListener *scl)
 {
-    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    VirtIOMEM *vmem = VIRTIO_MEM(gsm);
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
 
-    g_assert(rdl->section->mr == &vmem->memdev->mr);
+    g_assert(scl->section->mr == &vmem->memdev->mr);
     if (vmem->size) {
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            scl->notify_to_state_clear(scl, scl->section);
         } else {
-            virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+            virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                                 virtio_mem_notify_discard_cb);
         }
     }
 
-    memory_region_section_free_copy(rdl->section);
-    rdl->section = NULL;
+    memory_region_section_free_copy(scl->section);
+    scl->section = NULL;
     QLIST_REMOVE(rdl, next);
 }
 
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
     VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass);
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(klass);
 
     device_class_set_props(dc, virtio_mem_properties);
     dc->vmsd = &vmstate_virtio_mem;
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
     vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier;
     vmc->unplug_request_check = virtio_mem_unplug_request_check;
 
-    rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
-    rdmc->is_populated = virtio_mem_rdm_is_populated;
-    rdmc->replay_populated = virtio_mem_rdm_replay_populated;
-    rdmc->replay_discarded = virtio_mem_rdm_replay_discarded;
-    rdmc->register_listener = virtio_mem_rdm_register_listener;
-    rdmc->unregister_listener = virtio_mem_rdm_unregister_listener;
+    gsmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
+    gsmc->is_state_set = virtio_mem_rdm_is_populated;
+    gsmc->replay_on_state_set = virtio_mem_rdm_replay_populated;
+    gsmc->replay_on_state_clear = virtio_mem_rdm_replay_discarded;
+    gsmc->register_listener = virtio_mem_rdm_register_listener;
+    gsmc->unregister_listener = virtio_mem_rdm_unregister_listener;
 }
 
 static const TypeInfo virtio_mem_info = {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass;
 DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass,
                      IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION)
 
+#define TYPE_GENERIC_STATE_MANAGER "generic-state-manager"
+typedef struct GenericStateManagerClass GenericStateManagerClass;
+typedef struct GenericStateManager GenericStateManager;
+DECLARE_OBJ_CHECKERS(GenericStateManager, GenericStateManagerClass,
+                     GENERIC_STATE_MANAGER, TYPE_GENERIC_STATE_MANAGER)
+
 #define TYPE_RAM_DISCARD_MANAGER "ram-discard-manager"
 typedef struct RamDiscardManagerClass RamDiscardManagerClass;
 typedef struct RamDiscardManager RamDiscardManager;
@@ -XXX,XX +XXX,XX @@ struct IOMMUMemoryRegionClass {
     int (*num_indexes)(IOMMUMemoryRegion *iommu);
 };
 
-typedef struct RamDiscardListener RamDiscardListener;
-typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
-                                 MemoryRegionSection *section);
-typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
+typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
+
+typedef struct StateChangeListener StateChangeListener;
+typedef int (*NotifyStateSet)(StateChangeListener *scl,
+                              MemoryRegionSection *section);
+typedef void (*NotifyStateClear)(StateChangeListener *scl,
                                  MemoryRegionSection *section);
 
-struct RamDiscardListener {
+struct StateChangeListener {
     /*
-     * @notify_populate:
+     * @notify_to_state_set:
      *
-     * Notification that previously discarded memory is about to get populated.
-     * Listeners are able to object. If any listener objects, already
-     * successfully notified listeners are notified about a discard again.
+     * Notification that previously state clear part is about to be set.
      *
-     * @rdl: the #RamDiscardListener getting notified
-     * @section: the #MemoryRegionSection to get populated. The section
+     * @scl: the #StateChangeListener getting notified
+     * @section: the #MemoryRegionSection to be state-set. The section
      * is aligned within the memory region to the minimum granularity
      * unless it would exceed the registered section.
      *
      * Returns 0 on success. If the notification is rejected by the listener,
      * an error is returned.
      */
-    NotifyRamPopulate notify_populate;
+    NotifyStateSet notify_to_state_set;
 
     /*
-     * @notify_discard:
+     * @notify_to_state_clear:
      *
-     * Notification that previously populated memory was discarded successfully
-     * and listeners should drop all references to such memory and prevent
-     * new population (e.g., unmap).
+     * Notification that previously state set part is about to be cleared
      *
-     * @rdl: the #RamDiscardListener getting notified
-     * @section: the #MemoryRegionSection to get populated. The section
+     * @scl: the #StateChangeListener getting notified
+     * @section: the #MemoryRegionSection to be state-cleared. The section
      * is aligned within the memory region to the minimum granularity
      * unless it would exceed the registered section.
-     */
-    NotifyRamDiscard notify_discard;
-
-    /*
-     * @double_discard_supported:
      *
-     * The listener suppors getting @notify_discard notifications that span
-     * already discarded parts.
+     * Returns 0 on success. If the notification is rejected by the listener,
+     * an error is returned.
      */
-    bool double_discard_supported;
+    NotifyStateClear notify_to_state_clear;
 
     MemoryRegionSection *section;
-    QLIST_ENTRY(RamDiscardListener) next;
 };
 
-static inline void ram_discard_listener_init(RamDiscardListener *rdl,
-                                             NotifyRamPopulate populate_fn,
-                                             NotifyRamDiscard discard_fn,
-                                             bool double_discard_supported)
-{
-    rdl->notify_populate = populate_fn;
-    rdl->notify_discard = discard_fn;
-    rdl->double_discard_supported = double_discard_supported;
-}
-
-typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
-
 /*
- * RamDiscardManagerClass:
- *
- * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
- * regions are currently populated to be used/accessed by the VM, notifying
- * after parts were discarded (freeing up memory) and before parts will be
- * populated (consuming memory), to be used/accessed by the VM.
- *
- * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
- * #MemoryRegion isn't mapped into an address space yet (either directly
- * or via an alias); it cannot change while the #MemoryRegion is
- * mapped into an address space.
+ * GenericStateManagerClass:
  *
- * The #RamDiscardManager is intended to be used by technologies that are
- * incompatible with discarding of RAM (e.g., VFIO, which may pin all
- * memory inside a #MemoryRegion), and require proper coordination to only
- * map the currently populated parts, to hinder parts that are expected to
526
- * remain discarded from silently getting populated and consuming memory.
527
- * Technologies that support discarding of RAM don't have to bother and can
528
- * simply map the whole #MemoryRegion.
529
- *
530
- * An example #RamDiscardManager is virtio-mem, which logically (un)plugs
531
- * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
532
- * Logically unplugging memory consists of discarding RAM. The VM agreed to not
533
- * access unplugged (discarded) memory - especially via DMA. virtio-mem will
534
- * properly coordinate with listeners before memory is plugged (populated),
535
- * and after memory is unplugged (discarded).
536
+ * A #GenericStateManager is a common interface used to manage the state of
537
+ * a #MemoryRegion. The managed states is a pair of opposite states, such as
538
+ * populated and discarded, or private and shared. It is abstract as set and
539
+ * clear in below callbacks, and the actual state is managed by the
540
+ * implementation.
541
*
542
- * Listeners are called in multiples of the minimum granularity (unless it
543
- * would exceed the registered range) and changes are aligned to the minimum
544
- * granularity within the #MemoryRegion. Listeners have to prepare for memory
545
- * becoming discarded in a different granularity than it was populated and the
546
- * other way around.
547
*/
548
-struct RamDiscardManagerClass {
549
+struct GenericStateManagerClass {
550
/* private */
551
InterfaceClass parent_class;
552
553
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
554
* @get_min_granularity:
555
*
556
* Get the minimum granularity in which listeners will get notified
557
- * about changes within the #MemoryRegion via the #RamDiscardManager.
558
+ * about changes within the #MemoryRegion via the #GenericStateManager.
559
*
560
- * @rdm: the #RamDiscardManager
561
+ * @gsm: the #GenericStateManager
562
* @mr: the #MemoryRegion
563
*
564
* Returns the minimum granularity.
565
*/
566
- uint64_t (*get_min_granularity)(const RamDiscardManager *rdm,
567
+ uint64_t (*get_min_granularity)(const GenericStateManager *gsm,
568
const MemoryRegion *mr);
569
570
/**
571
- * @is_populated:
572
+ * @is_state_set:
573
*
574
- * Check whether the given #MemoryRegionSection is completely populated
575
- * (i.e., no parts are currently discarded) via the #RamDiscardManager.
576
- * There are no alignment requirements.
577
+ * Check whether the given #MemoryRegionSection state is set.
578
+ * via the #GenericStateManager.
579
*
580
- * @rdm: the #RamDiscardManager
581
+ * @gsm: the #GenericStateManager
582
* @section: the #MemoryRegionSection
583
*
584
- * Returns whether the given range is completely populated.
585
+ * Returns whether the given range is completely set.
586
*/
587
- bool (*is_populated)(const RamDiscardManager *rdm,
588
+ bool (*is_state_set)(const GenericStateManager *gsm,
589
const MemoryRegionSection *section);
590
591
/**
592
- * @replay_populated:
593
+ * @replay_on_state_set:
594
*
595
- * Call the #ReplayStateChange callback for all populated parts within the
596
- * #MemoryRegionSection via the #RamDiscardManager.
597
+ * Call the #ReplayStateChange callback for all state set parts within the
598
+ * #MemoryRegionSection via the #GenericStateManager.
599
*
600
* In case any call fails, no further calls are made.
601
*
602
- * @rdm: the #RamDiscardManager
603
+ * @gsm: the #GenericStateManager
604
* @section: the #MemoryRegionSection
605
* @replay_fn: the #ReplayStateChange callback
606
* @opaque: pointer to forward to the callback
607
*
608
* Returns 0 on success, or a negative error if any notification failed.
609
*/
610
- int (*replay_populated)(const RamDiscardManager *rdm,
611
- MemoryRegionSection *section,
612
- ReplayStateChange replay_fn, void *opaque);
613
+ int (*replay_on_state_set)(const GenericStateManager *gsm,
614
+ MemoryRegionSection *section,
615
+ ReplayStateChange replay_fn, void *opaque);
616
617
/**
618
- * @replay_discarded:
619
+ * @replay_on_state_clear:
620
*
621
- * Call the #ReplayStateChange callback for all discarded parts within the
622
- * #MemoryRegionSection via the #RamDiscardManager.
623
+ * Call the #ReplayStateChange callback for all state clear parts within the
624
+ * #MemoryRegionSection via the #GenericStateManager.
625
+ *
626
+ * In case any call fails, no further calls are made.
627
*
628
- * @rdm: the #RamDiscardManager
629
+ * @gsm: the #GenericStateManager
630
* @section: the #MemoryRegionSection
631
* @replay_fn: the #ReplayStateChange callback
632
* @opaque: pointer to forward to the callback
633
*
634
* Returns 0 on success, or a negative error if any notification failed.
635
*/
636
- int (*replay_discarded)(const RamDiscardManager *rdm,
637
- MemoryRegionSection *section,
638
- ReplayStateChange replay_fn, void *opaque);
639
+ int (*replay_on_state_clear)(const GenericStateManager *gsm,
227
+ MemoryRegionSection *section,
640
+ MemoryRegionSection *section,
228
+ bool is_private);
641
+ ReplayStateChange replay_fn, void *opaque);
229
typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
642
230
- MemoryRegionSection *section);
643
/**
231
+ MemoryRegionSection *section,
644
* @register_listener:
232
+ bool is_private);
645
*
233
646
- * Register a #RamDiscardListener for the given #MemoryRegionSection and
234
struct RamDiscardListener {
647
- * immediately notify the #RamDiscardListener about all populated parts
235
/*
648
- * within the #MemoryRegionSection via the #RamDiscardManager.
236
@@ -XXX,XX +XXX,XX @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
649
+ * Register a #StateChangeListener for the given #MemoryRegionSection and
237
rdl->double_discard_supported = double_discard_supported;
650
+ * immediately notify the #StateChangeListener about all state-set parts
238
}
651
+ * within the #MemoryRegionSection via the #GenericStateManager.
239
652
*
240
-typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
653
* In case any notification fails, no further notifications are triggered
241
-typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
654
* and an error is logged.
242
+typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, bool is_private, void *opaque);
655
*
243
+typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, bool is_private, void *opaque);
656
- * @rdm: the #RamDiscardManager
244
657
- * @rdl: the #RamDiscardListener
245
/*
658
+ * @rdm: the #GenericStateManager
246
* RamDiscardManagerClass:
659
+ * @rdl: the #StateChangeListener
247
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
248
*
249
* @rdm: the #RamDiscardManager
250
* @section: the #MemoryRegionSection
660
* @section: the #MemoryRegionSection
251
+ * @is_private: the attribute of the request section
661
*/
252
*
662
- void (*register_listener)(RamDiscardManager *rdm,
253
* Returns whether the given range is completely populated.
663
- RamDiscardListener *rdl,
254
*/
664
+ void (*register_listener)(GenericStateManager *gsm,
255
bool (*is_populated)(const RamDiscardManager *rdm,
665
+ StateChangeListener *scl,
256
- const MemoryRegionSection *section);
666
MemoryRegionSection *section);
257
+ const MemoryRegionSection *section,
258
+ bool is_private);
259
667
260
/**
668
/**
261
* @replay_populated:
669
* @unregister_listener:
262
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
670
*
263
*
671
- * Unregister a previously registered #RamDiscardListener via the
264
* @rdm: the #RamDiscardManager
672
- * #RamDiscardManager after notifying the #RamDiscardListener about all
265
* @section: the #MemoryRegionSection
673
- * populated parts becoming unpopulated within the registered
266
+ * @is_private: the attribute of the populated parts
674
+ * Unregister a previously registered #StateChangeListener via the
267
* @replay_fn: the #ReplayRamPopulate callback
675
+ * #GenericStateManager after notifying the #StateChangeListener about all
268
* @opaque: pointer to forward to the callback
676
+ * state-set parts becoming state-cleared within the registered
269
*
677
* #MemoryRegionSection.
270
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
678
*
271
*/
679
- * @rdm: the #RamDiscardManager
272
int (*replay_populated)(const RamDiscardManager *rdm,
680
- * @rdl: the #RamDiscardListener
273
MemoryRegionSection *section,
681
+ * @rdm: the #GenericStateManager
274
+ bool is_private,
682
+ * @rdl: the #StateChangeListener
275
ReplayRamPopulate replay_fn, void *opaque);
683
*/
276
684
- void (*unregister_listener)(RamDiscardManager *rdm,
277
/**
685
- RamDiscardListener *rdl);
278
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
686
+ void (*unregister_listener)(GenericStateManager *gsm,
279
*
687
+ StateChangeListener *scl);
280
* @rdm: the #RamDiscardManager
688
};
281
* @section: the #MemoryRegionSection
689
282
+ * @is_private: the attribute of the discarded parts
690
-uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
283
* @replay_fn: the #ReplayRamDiscard callback
691
- const MemoryRegion *mr);
284
* @opaque: pointer to forward to the callback
692
+uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
285
*/
693
+ const MemoryRegion *mr);
286
void (*replay_discarded)(const RamDiscardManager *rdm,
694
287
MemoryRegionSection *section,
695
-bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
288
+ bool is_private,
289
ReplayRamDiscard replay_fn, void *opaque);
290
291
/**
292
@@ -XXX,XX +XXX,XX @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
293
const MemoryRegion *mr);
294
295
bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
296
- const MemoryRegionSection *section);
696
- const MemoryRegionSection *section);
297
+ const MemoryRegionSection *section,
697
+bool generic_state_manager_is_state_set(const GenericStateManager *gsm,
298
+ bool is_private);
698
+ const MemoryRegionSection *section);
299
699
300
int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
700
-int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
301
MemoryRegionSection *section,
701
- MemoryRegionSection *section,
302
+ bool is_private,
702
- ReplayStateChange replay_fn,
303
ReplayRamPopulate replay_fn,
703
- void *opaque);
304
void *opaque);
704
+int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm,
305
705
+ MemoryRegionSection *section,
306
void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
706
+ ReplayStateChange replay_fn,
307
MemoryRegionSection *section,
707
+ void *opaque);
308
+ bool is_private,
708
309
ReplayRamDiscard replay_fn,
709
-int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
310
void *opaque);
710
- MemoryRegionSection *section,
311
711
- ReplayStateChange replay_fn,
712
- void *opaque);
713
+int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm,
714
+ MemoryRegionSection *section,
715
+ ReplayStateChange replay_fn,
716
+ void *opaque);
717
718
-void ram_discard_manager_register_listener(RamDiscardManager *rdm,
719
- RamDiscardListener *rdl,
720
- MemoryRegionSection *section);
721
+void generic_state_manager_register_listener(GenericStateManager *gsm,
722
+ StateChangeListener *scl,
723
+ MemoryRegionSection *section);
724
725
-void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
726
- RamDiscardListener *rdl);
727
+void generic_state_manager_unregister_listener(GenericStateManager *gsm,
728
+ StateChangeListener *scl);
729
+
730
+typedef struct RamDiscardListener RamDiscardListener;
731
+
732
+struct RamDiscardListener {
733
+ struct StateChangeListener scl;
734
+
735
+ /*
736
+ * @double_discard_supported:
737
+ *
738
+ * The listener suppors getting @notify_discard notifications that span
739
+ * already discarded parts.
740
+ */
741
+ bool double_discard_supported;
742
+
743
+ QLIST_ENTRY(RamDiscardListener) next;
744
+};
745
+
746
+static inline void ram_discard_listener_init(RamDiscardListener *rdl,
747
+ NotifyStateSet populate_fn,
748
+ NotifyStateClear discard_fn,
749
+ bool double_discard_supported)
750
+{
751
+ rdl->scl.notify_to_state_set = populate_fn;
752
+ rdl->scl.notify_to_state_clear = discard_fn;
753
+ rdl->double_discard_supported = double_discard_supported;
754
+}
755
+
756
+/*
757
+ * RamDiscardManagerClass:
758
+ *
759
+ * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
760
+ * regions are currently populated to be used/accessed by the VM, notifying
761
+ * after parts were discarded (freeing up memory) and before parts will be
762
+ * populated (consuming memory), to be used/accessed by the VM.
763
+ *
764
+ * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
765
+ * #MemoryRegion isn't mapped into an address space yet (either directly
766
+ * or via an alias); it cannot change while the #MemoryRegion is
767
+ * mapped into an address space.
768
+ *
769
+ * The #RamDiscardManager is intended to be used by technologies that are
770
+ * incompatible with discarding of RAM (e.g., VFIO, which may pin all
771
+ * memory inside a #MemoryRegion), and require proper coordination to only
772
+ * map the currently populated parts, to hinder parts that are expected to
773
+ * remain discarded from silently getting populated and consuming memory.
774
+ * Technologies that support discarding of RAM don't have to bother and can
775
+ * simply map the whole #MemoryRegion.
776
+ *
777
+ * An example #RamDiscardManager is virtio-mem, which logically (un)plugs
778
+ * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
779
+ * Logically unplugging memory consists of discarding RAM. The VM agreed to not
780
+ * access unplugged (discarded) memory - especially via DMA. virtio-mem will
781
+ * properly coordinate with listeners before memory is plugged (populated),
782
+ * and after memory is unplugged (discarded).
783
+ *
784
+ * Listeners are called in multiples of the minimum granularity (unless it
785
+ * would exceed the registered range) and changes are aligned to the minimum
786
+ * granularity within the #MemoryRegion. Listeners have to prepare for memory
787
+ * becoming discarded in a different granularity than it was populated and the
788
+ * other way around.
789
+ */
790
+struct RamDiscardManagerClass {
791
+ /* private */
792
+ GenericStateManagerClass parent_class;
793
+};
794
795
/**
796
* memory_get_xlat_addr: Extract addresses from a TLB entry
797
@@ -XXX,XX +XXX,XX @@ struct MemoryRegion {
798
const char *name;
799
unsigned ioeventfd_nb;
800
MemoryRegionIoeventfd *ioeventfds;
801
- RamDiscardManager *rdm; /* Only for RAM */
802
+ GenericStateManager *gsm; /* Only for RAM */
803
804
/* For devices designed to perform re-entrant IO into their own IO MRs */
805
bool disable_reentrancy_guard;
806
@@ -XXX,XX +XXX,XX @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
807
bool memory_region_is_mapped(MemoryRegion *mr);
808
809
/**
810
- * memory_region_get_ram_discard_manager: get the #RamDiscardManager for a
811
+ * memory_region_get_generic_state_manager: get the #GenericStateManager for a
812
* #MemoryRegion
813
*
814
- * The #RamDiscardManager cannot change while a memory region is mapped.
815
+ * The #GenericStateManager cannot change while a memory region is mapped.
816
*
817
* @mr: the #MemoryRegion
818
*/
819
-RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr);
820
+GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr);
821
822
/**
823
- * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a
824
- * #RamDiscardManager assigned
825
+ * memory_region_set_generic_state_manager: set the #GenericStateManager for a
826
+ * #MemoryRegion
827
+ *
828
+ * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
829
+ * that does not cover RAM, or a #MemoryRegion that already has a
830
+ * #GenericStateManager assigned. Return 0 if the gsm is set successfully.
831
*
832
* @mr: the #MemoryRegion
833
+ * @gsm: #GenericStateManager to set
834
*/
835
-static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
836
-{
837
- return !!memory_region_get_ram_discard_manager(mr);
838
-}
839
+int memory_region_set_generic_state_manager(MemoryRegion *mr,
840
+ GenericStateManager *gsm);
841
842
/**
843
- * memory_region_set_ram_discard_manager: set the #RamDiscardManager for a
844
- * #MemoryRegion
845
- *
846
- * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
847
- * that does not cover RAM, or a #MemoryRegion that already has a
848
- * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
849
+ * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a
850
+ * #RamDiscardManager assigned
851
*
852
* @mr: the #MemoryRegion
853
- * @rdm: #RamDiscardManager to set
854
*/
855
-int memory_region_set_ram_discard_manager(MemoryRegion *mr,
856
- RamDiscardManager *rdm);
857
+bool memory_region_has_ram_discard_manager(MemoryRegion *mr);
858
859
/**
860
* memory_region_find: translate an address/size relative to a
312
diff --git a/migration/ram.c b/migration/ram.c
861
diff --git a/migration/ram.c b/migration/ram.c
313
index XXXXXXX..XXXXXXX 100644
862
index XXXXXXX..XXXXXXX 100644
314
--- a/migration/ram.c
863
--- a/migration/ram.c
315
+++ b/migration/ram.c
864
+++ b/migration/ram.c
316
@@ -XXX,XX +XXX,XX @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
317
}
318
319
static void dirty_bitmap_clear_section(MemoryRegionSection *section,
320
- void *opaque)
321
+ bool is_private, void *opaque)
322
{
323
const hwaddr offset = section->offset_within_region;
324
const hwaddr size = int128_get64(section->size);
325
@@ -XXX,XX +XXX,XX @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
865
@@ -XXX,XX +XXX,XX @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
866
uint64_t cleared_bits = 0;
867
868
if (rb->mr && rb->bmap && memory_region_has_ram_discard_manager(rb->mr)) {
869
- RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
870
+ GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
871
MemoryRegionSection section = {
872
.mr = rb->mr,
873
.offset_within_region = 0,
326
.size = int128_make64(qemu_ram_get_used_length(rb)),
874
.size = int128_make64(qemu_ram_get_used_length(rb)),
327
};
875
};
328
876
329
- ram_discard_manager_replay_discarded(rdm, &section,
877
- ram_discard_manager_replay_discarded(rdm, &section,
330
+ ram_discard_manager_replay_discarded(rdm, &section, false,
878
+ generic_state_manager_replay_on_state_clear(gsm, &section,
331
dirty_bitmap_clear_section,
879
dirty_bitmap_clear_section,
332
&cleared_bits);
880
&cleared_bits);
333
}
881
}
334
@@ -XXX,XX +XXX,XX @@ bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start)
882
@@ -XXX,XX +XXX,XX @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
883
bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start)
884
{
885
if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
886
- RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
887
+ GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
888
MemoryRegionSection section = {
889
.mr = rb->mr,
890
.offset_within_region = start,
335
.size = int128_make64(qemu_ram_pagesize(rb)),
891
.size = int128_make64(qemu_ram_pagesize(rb)),
336
};
892
};
337
893
338
- return !ram_discard_manager_is_populated(rdm, &section);
894
- return !ram_discard_manager_is_populated(rdm, &section);
339
+ return !ram_discard_manager_is_populated(rdm, &section, false);
895
+ return !generic_state_manager_is_state_set(gsm, &section);
340
}
896
}
341
return false;
897
return false;
342
}
898
}
343
@@ -XXX,XX +XXX,XX @@ static inline void populate_read_range(RAMBlock *block, ram_addr_t offset,
344
}
345
346
static inline int populate_read_section(MemoryRegionSection *section,
347
- void *opaque)
348
+ bool is_private, void *opaque)
349
{
350
const hwaddr size = int128_get64(section->size);
351
hwaddr offset = section->offset_within_region;
352
@@ -XXX,XX +XXX,XX @@ static void ram_block_populate_read(RAMBlock *rb)
899
@@ -XXX,XX +XXX,XX @@ static void ram_block_populate_read(RAMBlock *rb)
900
* Note: The result is only stable while migrating (precopy/postcopy).
901
*/
902
if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
903
- RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
904
+ GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
905
MemoryRegionSection section = {
906
.mr = rb->mr,
907
.offset_within_region = 0,
353
.size = rb->mr->size,
908
.size = rb->mr->size,
354
};
909
};
355
910
356
- ram_discard_manager_replay_populated(rdm, &section,
911
- ram_discard_manager_replay_populated(rdm, &section,
357
+ ram_discard_manager_replay_populated(rdm, &section, false,
912
+ generic_state_manager_replay_on_state_set(gsm, &section,
358
populate_read_section, NULL);
913
populate_read_section, NULL);
359
} else {
914
} else {
360
populate_read_range(rb, 0, rb->used_length);
915
populate_read_range(rb, 0, rb->used_length);
361
@@ -XXX,XX +XXX,XX @@ void ram_write_tracking_prepare(void)
362
}
363
364
static inline int uffd_protect_section(MemoryRegionSection *section,
365
- void *opaque)
366
+ bool is_private, void *opaque)
367
{
368
const hwaddr size = int128_get64(section->size);
369
const hwaddr offset = section->offset_within_region;
370
@@ -XXX,XX +XXX,XX @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd)
916
@@ -XXX,XX +XXX,XX @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd)
917
918
/* See ram_block_populate_read() */
919
if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
920
- RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
921
+ GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
922
MemoryRegionSection section = {
923
.mr = rb->mr,
924
.offset_within_region = 0,
371
.size = rb->mr->size,
925
.size = rb->mr->size,
372
};
926
};
373
927
374
- return ram_discard_manager_replay_populated(rdm, &section,
928
- return ram_discard_manager_replay_populated(rdm, &section,
375
+ return ram_discard_manager_replay_populated(rdm, &section, false,
929
+ return generic_state_manager_replay_on_state_set(gsm, &section,
376
uffd_protect_section,
930
uffd_protect_section,
377
(void *)(uintptr_t)uffd_fd);
931
(void *)(uintptr_t)uffd_fd);
378
}
932
}
379
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
380
index XXXXXXX..XXXXXXX 100644
381
--- a/system/guest-memfd-manager.c
382
+++ b/system/guest-memfd-manager.c
383
@@ -XXX,XX +XXX,XX @@ OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager,
384
{ })
385
386
static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm,
387
- const MemoryRegionSection *section)
388
+ const MemoryRegionSection *section,
389
+ bool is_private)
390
{
391
const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
392
uint64_t first_bit = section->offset_within_region / gmm->block_size;
393
uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1;
394
unsigned long first_discard_bit;
395
396
- first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
397
+ if (is_private) {
398
+ /* Check if the private section is populated */
399
+ first_discard_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit);
400
+ } else {
401
+ /* Check if the shared section is populated */
402
+ first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
403
+ }
404
+
405
return first_discard_bit > last_bit;
406
}
407
408
-typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg);
409
+typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, bool is_private,
410
+ void *arg);
411
412
-static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg)
413
+static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, bool is_private,
414
+ void *arg)
415
{
416
RamDiscardListener *rdl = arg;
417
418
- return rdl->notify_populate(rdl, section);
419
+ return rdl->notify_populate(rdl, section, is_private);
420
}
421
422
-static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg)
423
+static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, bool is_private,
424
+ void *arg)
425
{
426
RamDiscardListener *rdl = arg;
427
428
- rdl->notify_discard(rdl, section);
429
+ rdl->notify_discard(rdl, section, is_private);
430
431
return 0;
432
}
433
434
-static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm,
435
- MemoryRegionSection *section,
436
- void *arg,
437
- guest_memfd_section_cb cb)
438
+static int guest_memfd_for_each_shared_section(const GuestMemfdManager *gmm,
439
+ MemoryRegionSection *section,
440
+ bool is_private,
441
+ void *arg,
442
+ guest_memfd_section_cb cb)
443
{
444
unsigned long first_one_bit, last_one_bit;
445
uint64_t offset, size;
446
@@ -XXX,XX +XXX,XX @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm,
447
break;
448
}
449
450
- ret = cb(&tmp, arg);
451
+ ret = cb(&tmp, is_private, arg);
452
if (ret) {
453
break;
454
}
455
@@ -XXX,XX +XXX,XX @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm,
456
return ret;
457
}
458
459
-static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm,
460
- MemoryRegionSection *section,
461
- void *arg,
462
- guest_memfd_section_cb cb)
463
+static int guest_memfd_for_each_private_section(const GuestMemfdManager *gmm,
464
+ MemoryRegionSection *section,
465
+ bool is_private,
466
+ void *arg,
467
+ guest_memfd_section_cb cb)
468
{
469
unsigned long first_zero_bit, last_zero_bit;
470
uint64_t offset, size;
471
@@ -XXX,XX +XXX,XX @@ static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm,
472
break;
473
}
474
475
- ret = cb(&tmp, arg);
476
+ ret = cb(&tmp, is_private, arg);
477
if (ret) {
478
break;
479
}
480
@@ -XXX,XX +XXX,XX @@ static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm,
481
482
QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next);
483
484
- ret = guest_memfd_for_each_populated_section(gmm, section, rdl,
485
- guest_memfd_notify_populate_cb);
486
+ /* Populate shared part */
487
+ ret = guest_memfd_for_each_shared_section(gmm, section, false, rdl,
488
+ guest_memfd_notify_populate_cb);
489
if (ret) {
490
error_report("%s: Failed to register RAM discard listener: %s", __func__,
491
strerror(-ret));
492
@@ -XXX,XX +XXX,XX @@ static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm,
493
g_assert(rdl->section);
494
g_assert(rdl->section->mr == gmm->mr);
495
496
- ret = guest_memfd_for_each_populated_section(gmm, rdl->section, rdl,
497
- guest_memfd_notify_discard_cb);
498
+ /* Discard shared part */
499
+ ret = guest_memfd_for_each_shared_section(gmm, rdl->section, false, rdl,
500
+ guest_memfd_notify_discard_cb);
501
if (ret) {
502
error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
503
strerror(-ret));
504
@@ -XXX,XX +XXX,XX @@ typedef struct GuestMemfdReplayData {
505
void *opaque;
506
} GuestMemfdReplayData;
507
508
-static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg)
509
+static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section,
510
+ bool is_private, void *arg)
511
{
512
struct GuestMemfdReplayData *data = arg;
513
ReplayRamPopulate replay_fn = data->fn;
514
515
- return replay_fn(section, data->opaque);
516
+ return replay_fn(section, is_private, data->opaque);
517
}
518
519
static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm,
                                              MemoryRegionSection *section,
+                                             bool is_private,
                                              ReplayRamPopulate replay_fn,
                                              void *opaque)
 {
@@ -XXX,XX +XXX,XX @@ static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm,
     struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque };
 
     g_assert(section->mr == gmm->mr);
-    return guest_memfd_for_each_populated_section(gmm, section, &data,
-                                                  guest_memfd_rdm_replay_populated_cb);
+    if (is_private) {
+        /* Replay populate on private section */
+        return guest_memfd_for_each_private_section(gmm, section, is_private, &data,
+                                                    guest_memfd_rdm_replay_populated_cb);
+    } else {
+        /* Replay populate on shared section */
+        return guest_memfd_for_each_shared_section(gmm, section, is_private, &data,
+                                                   guest_memfd_rdm_replay_populated_cb);
+    }
 }
 
-static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg)
+static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section,
+                                               bool is_private, void *arg)
 {
     struct GuestMemfdReplayData *data = arg;
     ReplayRamDiscard replay_fn = data->fn;
 
-    replay_fn(section, data->opaque);
+    replay_fn(section, is_private, data->opaque);
 
     return 0;
 }
 
 static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm,
                                              MemoryRegionSection *section,
+                                             bool is_private,
                                              ReplayRamDiscard replay_fn,
                                              void *opaque)
 {
@@ -XXX,XX +XXX,XX @@ static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm,
     struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque };
 
     g_assert(section->mr == gmm->mr);
-    guest_memfd_for_each_discarded_section(gmm, section, &data,
-                                           guest_memfd_rdm_replay_discarded_cb);
+
+    if (is_private) {
+        /* Replay discard on private section */
+        guest_memfd_for_each_private_section(gmm, section, is_private, &data,
+                                             guest_memfd_rdm_replay_discarded_cb);
+    } else {
+        /* Replay discard on shared section */
+        guest_memfd_for_each_shared_section(gmm, section, is_private, &data,
+                                            guest_memfd_rdm_replay_discarded_cb);
+    }
 }
 
 static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm,
@@ -XXX,XX +XXX,XX @@ static void guest_memfd_notify_discard(GuestMemfdManager *gmm,
             continue;
         }
 
-        guest_memfd_for_each_populated_section(gmm, &tmp, rdl,
-                                               guest_memfd_notify_discard_cb);
+        /* For current shared section, notify to discard shared parts */
+        guest_memfd_for_each_shared_section(gmm, &tmp, false, rdl,
+                                            guest_memfd_notify_discard_cb);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm,
             continue;
         }
 
-        ret = guest_memfd_for_each_discarded_section(gmm, &tmp, rdl,
-                                                     guest_memfd_notify_populate_cb);
+        /* For current private section, notify to populate the shared parts */
+        ret = guest_memfd_for_each_private_section(gmm, &tmp, false, rdl,
+                                                   guest_memfd_notify_populate_cb);
        if (ret) {
            break;
        }
@@ -XXX,XX +XXX,XX @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm,
             continue;
         }
 
-        guest_memfd_for_each_discarded_section(gmm, &tmp, rdl2,
-                                               guest_memfd_notify_discard_cb);
+        guest_memfd_for_each_private_section(gmm, &tmp, false, rdl2,
+                                             guest_memfd_notify_discard_cb);
     }
 }
 return ret;
diff --git a/system/memory.c b/system/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -XXX,XX +XXX,XX @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
     return imrc->num_indexes(iommu_mr);
 }
 
-RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
+GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr)
 {
     if (!memory_region_is_ram(mr)) {
         return NULL;
     }
-    return mr->rdm;
+    return mr->gsm;
 }
 
-int memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                          RamDiscardManager *rdm)
+int memory_region_set_generic_state_manager(MemoryRegion *mr,
+                                            GenericStateManager *gsm)
 {
     g_assert(memory_region_is_ram(mr));
-    if (mr->rdm && rdm) {
+    if (mr->gsm && gsm) {
         return -EBUSY;
     }
 
-    mr->rdm = rdm;
+    mr->gsm = gsm;
     return 0;
 }
 
-uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
-                                                 const MemoryRegion *mr)
+bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    if (!memory_region_is_ram(mr) ||
+        !object_dynamic_cast(OBJECT(mr->gsm), TYPE_RAM_DISCARD_MANAGER)) {
+        return false;
+    }
+
+    return true;
+}
+
+uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
+                                                   const MemoryRegion *mr)
+{
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->get_min_granularity);
-    return rdmc->get_min_granularity(rdm, mr);
+    g_assert(gsmc->get_min_granularity);
+    return gsmc->get_min_granularity(gsm, mr);
 }
 
-bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
-                                      const MemoryRegionSection *section)
+bool generic_state_manager_is_state_set(const GenericStateManager *gsm,
+                                        const MemoryRegionSection *section)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->is_populated);
-    return rdmc->is_populated(rdm, section);
+    g_assert(gsmc->is_state_set);
+    return gsmc->is_state_set(gsm, section);
 }
 
-int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque)
+int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm,
+                                              MemoryRegionSection *section,
+                                              ReplayStateChange replay_fn,
+                                              void *opaque)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->replay_populated);
-    return rdmc->replay_populated(rdm, section, replay_fn, opaque);
+    g_assert(gsmc->replay_on_state_set);
+    return gsmc->replay_on_state_set(gsm, section, replay_fn, opaque);
 }
 
-int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque)
+int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm,
+                                                MemoryRegionSection *section,
+                                                ReplayStateChange replay_fn,
+                                                void *opaque)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->replay_discarded);
-    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+    g_assert(gsmc->replay_on_state_clear);
+    return gsmc->replay_on_state_clear(gsm, section, replay_fn, opaque);
 }
 
-void ram_discard_manager_register_listener(RamDiscardManager *rdm,
-                                           RamDiscardListener *rdl,
-                                           MemoryRegionSection *section)
+void generic_state_manager_register_listener(GenericStateManager *gsm,
+                                             StateChangeListener *scl,
+                                             MemoryRegionSection *section)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->register_listener);
-    rdmc->register_listener(rdm, rdl, section);
+    g_assert(gsmc->register_listener);
+    gsmc->register_listener(gsm, scl, section);
 }
 
-void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
-                                             RamDiscardListener *rdl)
+void generic_state_manager_unregister_listener(GenericStateManager *gsm,
+                                               StateChangeListener *scl)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->unregister_listener);
-    rdmc->unregister_listener(rdm, rdl);
+    g_assert(gsmc->unregister_listener);
+    gsmc->unregister_listener(gsm, scl);
 }
 
 /* Called with rcu_read_lock held. */
@@ -XXX,XX +XXX,XX @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
         error_setg(errp, "iommu map to non memory area %" HWADDR_PRIx "", xlat);
         return false;
     } else if (memory_region_has_ram_discard_manager(mr)) {
-        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(mr);
         MemoryRegionSection tmp = {
             .mr = mr,
             .offset_within_region = xlat,
@@ -XXX,XX +XXX,XX @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
          * Disallow that. vmstate priorities make sure any RamDiscardManager
          * were already restored before IOMMUs are restored.
          */
-        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
+        if (!generic_state_manager_is_state_set(gsm, &tmp)) {
             error_setg(errp, "iommu map to discarded memory (e.g., unplugged"
                        " via virtio-mem): %" HWADDR_PRIx "",
                        iotlb->translated_addr);
@@ -XXX,XX +XXX,XX @@ static const TypeInfo iommu_memory_region_info = {
     .abstract = true,
 };
 
-static const TypeInfo ram_discard_manager_info = {
+static const TypeInfo generic_state_manager_info = {
     .parent = TYPE_INTERFACE,
+    .name = TYPE_GENERIC_STATE_MANAGER,
+    .class_size = sizeof(GenericStateManagerClass),
+    .abstract = true,
+};
+
+static const TypeInfo ram_discard_manager_info = {
+    .parent = TYPE_GENERIC_STATE_MANAGER,
     .name = TYPE_RAM_DISCARD_MANAGER,
     .class_size = sizeof(RamDiscardManagerClass),
 };
@@ -XXX,XX +XXX,XX @@ static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
     type_register_static(&iommu_memory_region_info);
+    type_register_static(&generic_state_manager_info);
     type_register_static(&ram_discard_manager_info);
 }
 
diff --git a/system/memory_mapping.c b/system/memory_mapping.c
index XXXXXXX..XXXXXXX 100644
--- a/system/memory_mapping.c
+++ b/system/memory_mapping.c
@@ -XXX,XX +XXX,XX @@ static void guest_phys_blocks_region_add(MemoryListener *listener,
 
     /* for special sparse regions, only add populated parts */
     if (memory_region_has_ram_discard_manager(section->mr)) {
-        RamDiscardManager *rdm;
-
-        rdm = memory_region_get_ram_discard_manager(section->mr);
-        ram_discard_manager_replay_populated(rdm, section,
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+        generic_state_manager_replay_on_state_set(gsm, section,
                                              guest_phys_ram_populate_cb, g);
         return;
     }
-- 
2.43.5
To manage the private and shared RAM states in confidential VMs,
introduce a new PrivateSharedManager class as a child of
GenericStateManager, which inherits the six interface callbacks. With a
different interface type, it can be distinguished from the
RamDiscardManager object and provides the flexibility to address
specific requirements of confidential VMs in the future.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Newly added.
---
 include/exec/memory.h | 44 +++++++++++++++++++++++++++++++++++++++++--
 system/memory.c       | 17 +++++++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ typedef struct RamDiscardManager RamDiscardManager;
 DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass,
                      RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER);
 
+#define TYPE_PRIVATE_SHARED_MANAGER "private-shared-manager"
+typedef struct PrivateSharedManagerClass PrivateSharedManagerClass;
+typedef struct PrivateSharedManager PrivateSharedManager;
+DECLARE_OBJ_CHECKERS(PrivateSharedManager, PrivateSharedManagerClass,
+                     PRIVATE_SHARED_MANAGER, TYPE_PRIVATE_SHARED_MANAGER)
+
 #ifdef CONFIG_FUZZ
 void fuzz_dma_read_cb(size_t addr,
                       size_t len,
@@ -XXX,XX +XXX,XX @@ void generic_state_manager_register_listener(GenericStateManager *gsm,
 void generic_state_manager_unregister_listener(GenericStateManager *gsm,
                                                StateChangeListener *scl);
 
+static inline void state_change_listener_init(StateChangeListener *scl,
+                                              NotifyStateSet state_set_fn,
+                                              NotifyStateClear state_clear_fn)
+{
+    scl->notify_to_state_set = state_set_fn;
+    scl->notify_to_state_clear = state_clear_fn;
+}
+
 typedef struct RamDiscardListener RamDiscardListener;
 
 struct RamDiscardListener {
@@ -XXX,XX +XXX,XX @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
                                              NotifyStateClear discard_fn,
                                              bool double_discard_supported)
 {
-    rdl->scl.notify_to_state_set = populate_fn;
-    rdl->scl.notify_to_state_clear = discard_fn;
+    state_change_listener_init(&rdl->scl, populate_fn, discard_fn);
     rdl->double_discard_supported = double_discard_supported;
 }
 
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
     GenericStateManagerClass parent_class;
 };
 
+typedef struct PrivateSharedListener PrivateSharedListener;
+struct PrivateSharedListener {
+    struct StateChangeListener scl;
+
+    QLIST_ENTRY(PrivateSharedListener) next;
+};
+
+struct PrivateSharedManagerClass {
+    /* private */
+    GenericStateManagerClass parent_class;
+};
+
+static inline void private_shared_listener_init(PrivateSharedListener *psl,
+                                               NotifyStateSet populate_fn,
+                                               NotifyStateClear discard_fn)
+{
+    state_change_listener_init(&psl->scl, populate_fn, discard_fn);
+}
+
 /**
  * memory_get_xlat_addr: Extract addresses from a TLB entry
  *
@@ -XXX,XX +XXX,XX @@ int memory_region_set_generic_state_manager(MemoryRegion *mr,
  */
 bool memory_region_has_ram_discard_manager(MemoryRegion *mr);
 
+/**
+ * memory_region_has_private_shared_manager: check whether a #MemoryRegion has a
+ * #PrivateSharedManager assigned
+ *
+ * @mr: the #MemoryRegion
+ */
+bool memory_region_has_private_shared_manager(MemoryRegion *mr);
+
 /**
  * memory_region_find: translate an address/size relative to a
  * MemoryRegion into a #MemoryRegionSection.
diff --git a/system/memory.c b/system/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -XXX,XX +XXX,XX @@ bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
     return true;
 }
 
+bool memory_region_has_private_shared_manager(MemoryRegion *mr)
+{
+    if (!memory_region_is_ram(mr) ||
+        !object_dynamic_cast(OBJECT(mr->gsm), TYPE_PRIVATE_SHARED_MANAGER)) {
+        return false;
+    }
+
+    return true;
+}
+
 uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
                                                    const MemoryRegion *mr)
 {
@@ -XXX,XX +XXX,XX @@ static const TypeInfo ram_discard_manager_info = {
     .class_size = sizeof(RamDiscardManagerClass),
 };
 
+static const TypeInfo private_shared_manager_info = {
+    .parent = TYPE_GENERIC_STATE_MANAGER,
+    .name = TYPE_PRIVATE_SHARED_MANAGER,
+    .class_size = sizeof(PrivateSharedManagerClass),
+};
+
 static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
     type_register_static(&iommu_memory_region_info);
     type_register_static(&generic_state_manager_info);
     type_register_static(&ram_discard_manager_info);
+    type_register_static(&private_shared_manager_info);
 }
 
 type_init(memory_register_types)
-- 
2.43.5
Subsystems like VFIO previously disabled ram block discard and only
allowed coordinated discarding via RamDiscardManager. However,
guest_memfd in confidential VMs relies on discard operations for page
conversion between private and shared memory. This can lead to a stale
IOMMU mapping issue when assigning a hardware device to a confidential
VM via shared memory. With the introduction of the PrivateSharedManager
interface, which manages private and shared states and is distinct from
RamDiscardManager, include PrivateSharedManager in coordinated RAM
discard and add the related support in VFIO.

Currently, migration support for confidential VMs is not available, so
vfio_sync_dirty_bitmap() handling for PrivateSharedListener can be
ignored. The register/unregister of PrivateSharedListener is necessary
during vfio_listener_region_add/del(). The listener callbacks are
similar between RamDiscardListener and PrivateSharedListener, allowing
the common parts to be extracted opportunistically.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Newly added.
---
 hw/vfio/common.c                      | 104 +++++++++++++++++++++++---
 hw/vfio/container-base.c              |   1 +
 include/hw/vfio/vfio-container-base.h |  10 +++
 3 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -XXX,XX +XXX,XX @@ out:
     rcu_read_unlock();
 }
 
-static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
-                                            MemoryRegionSection *section)
+static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
+                                                    MemoryRegionSection *section)
 {
-    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
-    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
-                                                listener);
-    VFIOContainerBase *bcontainer = vrdl->bcontainer;
     const hwaddr size = int128_get64(section->size);
     const hwaddr iova = section->offset_within_address_space;
     int ret;
@@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
     }
 }
 
-static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
+static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
                                             MemoryRegionSection *section)
 {
     RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
-    VFIOContainerBase *bcontainer = vrdl->bcontainer;
+    vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
+}
+
+static void vfio_private_shared_notify_to_private(StateChangeListener *scl,
+                                                  MemoryRegionSection *section)
+{
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
+                                                   listener);
+    vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
+}
+
+static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer,
+                                                 MemoryRegionSection *section,
+                                                 uint64_t granularity)
+{
     const hwaddr end = section->offset_within_region +
                        int128_get64(section->size);
     hwaddr start, next, iova;
@@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
      * unmap in minimum granularity later.
      */
     for (start = section->offset_within_region; start < end; start = next) {
-        next = ROUND_UP(start + 1, vrdl->granularity);
+        next = ROUND_UP(start + 1, granularity);
         next = MIN(next, end);
 
         iova = start - section->offset_within_region +
@@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
                                     vaddr, section->readonly);
         if (ret) {
             /* Rollback */
-            vfio_ram_discard_notify_discard(scl, section);
+            vfio_state_change_notify_to_state_clear(bcontainer, section);
             return ret;
         }
     }
     return 0;
 }
 
+static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
+                                            MemoryRegionSection *section)
+{
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
+    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
+                                                listener);
+    return vfio_state_change_notify_to_state_set(vrdl->bcontainer, section,
+                                                 vrdl->granularity);
+}
+
+static int vfio_private_shared_notify_to_shared(StateChangeListener *scl,
+                                                MemoryRegionSection *section)
+{
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
+                                                   listener);
+    return vfio_state_change_notify_to_state_set(vpsl->bcontainer, section,
+                                                 vpsl->granularity);
+}
+
 static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                MemoryRegionSection *section)
 {
@@ -XXX,XX +XXX,XX @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
     }
 }
 
+static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
+                                                  MemoryRegionSection *section)
+{
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+    VFIOPrivateSharedListener *vpsl;
+    PrivateSharedListener *psl;
+
+    vpsl = g_new0(VFIOPrivateSharedListener, 1);
+    vpsl->bcontainer = bcontainer;
+    vpsl->mr = section->mr;
+    vpsl->offset_within_address_space = section->offset_within_address_space;
+    vpsl->granularity = generic_state_manager_get_min_granularity(gsm,
+                                                                  section->mr);
+
+    psl = &vpsl->listener;
+    private_shared_listener_init(psl, vfio_private_shared_notify_to_shared,
+                                 vfio_private_shared_notify_to_private);
+    generic_state_manager_register_listener(gsm, &psl->scl, section);
+    QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
+}
+
 static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                  MemoryRegionSection *section)
 {
@@ -XXX,XX +XXX,XX @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
     g_free(vrdl);
 }
 
+static void vfio_unregister_private_shared_listener(VFIOContainerBase *bcontainer,
+                                                    MemoryRegionSection *section)
+{
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+    VFIOPrivateSharedListener *vpsl = NULL;
+    PrivateSharedListener *psl;
+
+    QLIST_FOREACH(vpsl, &bcontainer->vpsl_list, next) {
+        if (vpsl->mr == section->mr &&
+            vpsl->offset_within_address_space ==
+            section->offset_within_address_space) {
+            break;
+        }
+    }
+
+    if (!vpsl) {
+        hw_error("vfio: Trying to unregister missing private shared listener");
+    }
+
+    psl = &vpsl->listener;
+    generic_state_manager_unregister_listener(gsm, &psl->scl);
+    QLIST_REMOVE(vpsl, next);
+    g_free(vpsl);
+}
+
 static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
 {
     MemoryRegion *mr = section->mr;
@@ -XXX,XX +XXX,XX @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (memory_region_has_ram_discard_manager(section->mr)) {
         vfio_register_ram_discard_listener(bcontainer, section);
         return;
+    } else if (memory_region_has_private_shared_manager(section->mr)) {
+        vfio_register_private_shared_listener(bcontainer, section);
+        return;
     }
 
     vaddr = memory_region_get_ram_ptr(section->mr) +
@@ -XXX,XX +XXX,XX @@ static void vfio_listener_region_del(MemoryListener *listener,
         vfio_unregister_ram_discard_listener(bcontainer, section);
         /* Unregistering will trigger an unmap. */
         try_unmap = false;
+    } else if (memory_region_has_private_shared_manager(section->mr)) {
+        vfio_unregister_private_shared_listener(bcontainer, section);
+        /* Unregistering will trigger an unmap. */
+        try_unmap = false;
     }
 
     if (try_unmap) {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -XXX,XX +XXX,XX @@ static void vfio_container_instance_init(Object *obj)
     bcontainer->iova_ranges = NULL;
     QLIST_INIT(&bcontainer->giommu_list);
     QLIST_INIT(&bcontainer->vrdl_list);
+    QLIST_INIT(&bcontainer->vpsl_list);
 }
 
 static const TypeInfo types[] = {
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -XXX,XX +XXX,XX @@ typedef struct VFIOContainerBase {
     bool dirty_pages_started; /* Protected by BQL */
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
+    QLIST_HEAD(, VFIOPrivateSharedListener) vpsl_list;
     QLIST_ENTRY(VFIOContainerBase) next;
     QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
@@ -XXX,XX +XXX,XX @@ typedef struct VFIORamDiscardListener {
     QLIST_ENTRY(VFIORamDiscardListener) next;
 } VFIORamDiscardListener;
 
+typedef struct VFIOPrivateSharedListener {
+    VFIOContainerBase *bcontainer;
+    MemoryRegion *mr;
+    hwaddr offset_within_address_space;
+    uint64_t granularity;
+    PrivateSharedListener listener;
+    QLIST_ENTRY(VFIOPrivateSharedListener) next;
+} VFIOPrivateSharedListener;
+
 int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            hwaddr iova, ram_addr_t size,
                            void *vaddr, bool readonly);
-- 
2.43.5
Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, potentially leading to
a stale IOMMU mapping issue when assigning hardware devices to
confidential VMs via shared memory. To address this, it is crucial to
ensure that systems like VFIO refresh their IOMMU mappings.

PrivateSharedManager is introduced to manage private and shared states in
confidential VMs, similar to RamDiscardManager, which supports
coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
guest_memfd can facilitate the adjustment of VFIO mappings in response
to page conversion events.

Since guest_memfd is not an object, it cannot directly implement the
PrivateSharedManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not. Notably, virtual BIOS
RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
backend.

To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttribute to implement the PrivateSharedManager interface. This
object stores guest_memfd information such as shared_bitmap, and handles
page conversion notification. The memory state is tracked at the host
page size granularity, as the minimum memory conversion size can be one
page per request. Additionally, VFIO expects the DMA mapping for a
specific iova to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as conversions on
small regions within larger regions. To prevent invalid cases and until
cut_mapping operation support is available, all operations are performed
with 4K granularity.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Change the name from memory-attribute-manager to
      ram-block-attribute.
    - Implement the newly-introduced PrivateSharedManager instead of
      RamDiscardManager and change related commit message.
    - Define the new object in ramblock.h instead of adding a new file.

Changes in v3:
    - Some renames (bitmap_size->shared_bitmap_size,
      first_one/zero_bit->first_bit, etc.)
    - Change shared_bitmap_size from uint32_t to unsigned
    - Return mgr->mr->ram_block->page_size in get_block_size()
    - Move set_ram_discard_manager() up to avoid a g_free() in the
      failure case.
    - Add const for the memory_attribute_manager_get_block_size()
    - Unify the ReplayRamPopulate and ReplayRamDiscard and related
      callback.

Changes in v2:
    - Rename the object name to MemoryAttributeManager
    - Rename the bitmap to shared_bitmap to make it clearer.
    - Remove the block_size field and get it from a helper. In future,
      we can get the page_size from RAMBlock if necessary.
    - Remove the unnecessary "struct" before GuestMemfdReplayData
    - Remove the unnecessary g_free() for the bitmap
    - Add some error reporting when the callback fails for a
      populated/discarded section.
    - Move the realize()/unrealize() definition to this patch.
---
 include/exec/ramblock.h      |  24 +++
 system/meson.build           |   1 +
 system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
 3 files changed, 307 insertions(+)
 create mode 100644 system/ram-block-attribute.c

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -XXX,XX +XXX,XX @@
 #include "cpu-common.h"
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
+#include "system/hostmem.h"
+
+#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
+#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
82
+OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)
83
84
struct RAMBlock {
85
struct rcu_head rcu;
86
@@ -XXX,XX +XXX,XX @@ struct RAMBlock {
87
*/
88
ram_addr_t postcopy_length;
89
};
90
+
91
+struct RamBlockAttribute {
92
+ Object parent;
93
+
94
+ MemoryRegion *mr;
95
+
96
+ /* 1-setting of the bit represents the memory is populated (shared) */
97
+ unsigned shared_bitmap_size;
98
+ unsigned long *shared_bitmap;
99
+
100
+ QLIST_HEAD(, PrivateSharedListener) psl_list;
101
+};
102
+
103
+struct RamBlockAttributeClass {
104
+ ObjectClass parent_class;
105
+};
106
+
107
+int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr);
108
+void ram_block_attribute_unrealize(RamBlockAttribute *attr);
109
+
110
#endif
111
#endif
112
diff --git a/system/meson.build b/system/meson.build
113
index XXXXXXX..XXXXXXX 100644
114
--- a/system/meson.build
115
+++ b/system/meson.build
116
@@ -XXX,XX +XXX,XX @@ system_ss.add(files(
117
'dirtylimit.c',
118
'dma-helpers.c',
119
'globals.c',
120
+ 'ram-block-attribute.c',
121
'memory_mapping.c',
122
'qdev-monitor.c',
123
'qtest.c',
124
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/system/ram-block-attribute.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU ram block attribute
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang <chenyi.qiang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "exec/ramblock.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute,
+                                   ram_block_attribute,
+                                   RAM_BLOCK_ATTRIBUTE,
+                                   OBJECT,
+                                   { TYPE_PRIVATE_SHARED_MANAGER },
+                                   { })
+
+static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
+{
+    /*
+     * Because page conversion is performed at a minimum size of 4K and
+     * must be 4K-aligned, use the host page size as the granularity to
+     * track the memory attribute.
+     */
+    g_assert(attr && attr->mr && attr->mr->ram_block);
+    g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size());
+    return attr->mr->ram_block->page_size;
+}
+
+static bool ram_block_attribute_psm_is_shared(const GenericStateManager *gsm,
+                                              const MemoryRegionSection *section)
+{
+    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    uint64_t first_bit = section->offset_within_region / block_size;
+    uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1;
+    unsigned long first_discard_bit;
+
+    first_discard_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit);
+    return first_discard_bit > last_bit;
+}
+
+typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s, void *arg);
+
+static int ram_block_attribute_notify_shared_cb(MemoryRegionSection *section, void *arg)
+{
+    StateChangeListener *scl = arg;
+
+    return scl->notify_to_state_set(scl, section);
+}
+
+static int ram_block_attribute_notify_private_cb(MemoryRegionSection *section, void *arg)
+{
+    StateChangeListener *scl = arg;
+
+    scl->notify_to_state_clear(scl, section);
+    return 0;
+}
+
+static int ram_block_attribute_for_each_shared_section(const RamBlockAttribute *attr,
+                                                       MemoryRegionSection *section,
+                                                       void *arg,
+                                                       ram_block_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, first_bit);
+
+    while (first_bit < attr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                  last_bit + 2);
+    }
+
+    return ret;
+}
+
+static int ram_block_attribute_for_each_private_section(const RamBlockAttribute *attr,
+                                                        MemoryRegionSection *section,
+                                                        void *arg,
+                                                        ram_block_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                   first_bit);
+
+    while (first_bit < attr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                 first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                       last_bit + 2);
+    }
+
+    return ret;
+}
+
+static uint64_t ram_block_attribute_psm_get_min_granularity(const GenericStateManager *gsm,
+                                                            const MemoryRegion *mr)
+{
+    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+
+    g_assert(mr == attr->mr);
+    return ram_block_attribute_get_block_size(attr);
+}
+
+static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
+                                                      StateChangeListener *scl,
+                                                      MemoryRegionSection *section)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    int ret;
+
+    g_assert(section->mr == attr->mr);
+    scl->section = memory_region_section_new_copy(section);
+
+    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
+
+    ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
+                                                      ram_block_attribute_notify_shared_cb);
+    if (ret) {
+        error_report("%s: Failed to register RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
+                                                        StateChangeListener *scl)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    int ret;
+
+    g_assert(scl->section);
+    g_assert(scl->section->mr == attr->mr);
+
+    ret = ram_block_attribute_for_each_shared_section(attr, scl->section, scl,
+                                                      ram_block_attribute_notify_private_cb);
+    if (ret) {
+        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+
+    memory_region_section_free_copy(scl->section);
+    scl->section = NULL;
+    QLIST_REMOVE(psl, next);
+}
+
+typedef struct RamBlockAttributeReplayData {
+    ReplayStateChange fn;
+    void *opaque;
+} RamBlockAttributeReplayData;
+
+static int ram_block_attribute_psm_replay_cb(MemoryRegionSection *section, void *arg)
+{
+    RamBlockAttributeReplayData *data = arg;
+
+    return data->fn(section, data->opaque);
+}
+
+static int ram_block_attribute_psm_replay_on_shared(const GenericStateManager *gsm,
+                                                    MemoryRegionSection *section,
+                                                    ReplayStateChange replay_fn,
+                                                    void *opaque)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == attr->mr);
+    return ram_block_attribute_for_each_shared_section(attr, section, &data,
+                                                       ram_block_attribute_psm_replay_cb);
+}
+
+static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *gsm,
+                                                     MemoryRegionSection *section,
+                                                     ReplayStateChange replay_fn,
+                                                     void *opaque)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == attr->mr);
+    return ram_block_attribute_for_each_private_section(attr, section, &data,
+                                                        ram_block_attribute_psm_replay_cb);
+}
+
+int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
+{
+    uint64_t shared_bitmap_size;
+    const int block_size = qemu_real_host_page_size();
+    int ret;
+
+    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
+
+    attr->mr = mr;
+    ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr));
+    if (ret) {
+        return ret;
+    }
+    attr->shared_bitmap_size = shared_bitmap_size;
+    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
+
+    return ret;
+}
+
+void ram_block_attribute_unrealize(RamBlockAttribute *attr)
+{
+    g_free(attr->shared_bitmap);
+    memory_region_set_generic_state_manager(attr->mr, NULL);
+}
+
+static void ram_block_attribute_init(Object *obj)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
+
+    QLIST_INIT(&attr->psl_list);
+}
+
+static void ram_block_attribute_finalize(Object *obj)
+{
+}
+
+static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
+{
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
+
+    gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity;
+    gsmc->register_listener = ram_block_attribute_psm_register_listener;
+    gsmc->unregister_listener = ram_block_attribute_psm_unregister_listener;
+    gsmc->is_state_set = ram_block_attribute_psm_is_shared;
+    gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared;
+    gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private;
+}
--
2.43.5
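As an aside for reviewers, the shared_bitmap bookkeeping used by the patch above (one bit per host page; a set bit means shared/populated, a clear bit means private) can be modeled in a few lines of standalone C. This is an illustrative sketch only, with invented names (`PAGE_SIZE`, `set_range`, `is_range_shared`), not QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define NPAGES    64   /* one 64-bit word of bitmap for simplicity */

static unsigned long shared_bitmap;

/* Convert a page-aligned range to shared (bit set) or private (bit clear). */
static void set_range(uint64_t offset, uint64_t size, bool shared)
{
    uint64_t first = offset / PAGE_SIZE;
    uint64_t npages = size / PAGE_SIZE;

    for (uint64_t i = first; i < first + npages; i++) {
        if (shared) {
            shared_bitmap |= 1UL << i;
        } else {
            shared_bitmap &= ~(1UL << i);
        }
    }
}

/*
 * A range counts as shared only if every bit in it is set, mirroring the
 * is_shared check above: the first zero bit must lie beyond the range.
 */
static bool is_range_shared(uint64_t offset, uint64_t size)
{
    uint64_t first = offset / PAGE_SIZE;
    uint64_t last = first + size / PAGE_SIZE - 1;

    for (uint64_t i = first; i <= last; i++) {
        if (!(shared_bitmap & (1UL << i))) {
            return false;
        }
    }
    return true;
}
```

A partial conversion back to private makes the enclosing range mixed, which is exactly why the series falls back to per-block handling for partially converted ranges.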
A new state_change() callback is introduced in PrivateSharedManagerClass
to efficiently notify all registered PrivateSharedListeners, including
VFIO listeners, about memory conversion events in guest_memfd. The VFIO
listener can dynamically DMA map/unmap shared pages based on conversion
types:
- For conversions from shared to private, the VFIO system ensures the
  discarding of shared mapping from the IOMMU.
- For conversions from private to shared, it triggers the population of
  the shared mapping into the IOMMU.

Additionally, special conversion requests are handled as follows:
- If a conversion request is made for a page already in the desired
  state, the helper simply returns success.
- For requests involving a range partially in the desired state, only
  the necessary segments are converted, ensuring efficient compliance
  with the request. In this case, fall back to "1 block at a time"
  handling so that the range passed to notify_to_private/shared() is
  always in the desired state.
- If a conversion request is declined by other systems, such as a
  failure from VFIO during notify_to_shared(), the helper rolls back the
  request to maintain consistency. As for notify_to_private() handling,
  failure in VFIO is unexpected, so no error check is performed.

Note that the bitmap status is updated before the callbacks are invoked,
allowing listeners to handle memory based on the latest status.

Opportunistically, introduce a helper to trigger the state_change()
callback of the class.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Add the state_change() callback in PrivateSharedManagerClass
      instead of the RamBlockAttribute.

Changes in v3:
    - Move the bitmap update before notifier callbacks.
    - Call the notifier callbacks directly in notify_discard/populate()
      with the expectation that the request memory range is in the
      desired attribute.
    - For the case where only a partial range is in the desired status,
      handle the range with block_size granularity for ease of rollback
      (https://lore.kernel.org/qemu-devel/812768d7-a02d-4b29-95f3-fb7a125cf54e@redhat.com/)

Changes in v2:
    - Do the alignment changes due to the rename to MemoryAttributeManager
    - Move the state_change() helper definition in this patch.
---
 include/exec/memory.h        |   7 ++
 system/memory.c              |  10 ++
 system/ram-block-attribute.c | 191 +++++++++++++++++++++++++++++++++++
 3 files changed, 208 insertions(+)
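The rollback behavior described above can be sketched in standalone C: listeners are notified in order, and on failure the already-notified listeners are walked again and reverted, stopping when the failing listener is reached. The names here (`Listener`, `state_change_to_shared`, and so on) are invented for illustration and are not the QEMU API:

```c
#include <stddef.h>

typedef struct {
    int notified;   /* set by the "to shared" callback */
    int fail;       /* make this listener's callback fail */
} Listener;

static int notify_to_shared(Listener *l)
{
    if (l->fail) {
        return -1;
    }
    l->notified = 1;
    return 0;
}

static void notify_to_private(Listener *l)
{
    l->notified = 0;
}

/* Notify every listener; on failure, undo only the ones already notified. */
static int state_change_to_shared(Listener *listeners, size_t n)
{
    size_t i, j;
    int ret = 0;

    for (i = 0; i < n; i++) {
        ret = notify_to_shared(&listeners[i]);
        if (ret) {
            break;
        }
    }
    if (ret) {
        /* Roll back all already-notified listeners. */
        for (j = 0; j < i; j++) {
            notify_to_private(&listeners[j]);
        }
    }
    return ret;
}
```

As in the patch, the reverse direction (to private) is not rolled back, since listener failure on discard is unexpected.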
diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ struct PrivateSharedListener {
 struct PrivateSharedManagerClass {
     /* private */
     GenericStateManagerClass parent_class;
+
+    int (*state_change)(PrivateSharedManager *mgr, uint64_t offset, uint64_t size,
+                        bool to_private);
 };
 
 static inline void private_shared_listener_init(PrivateSharedListener *psl,
@@ -XXX,XX +XXX,XX @@ static inline void private_shared_listener_init(PrivateSharedListener *psl,
     state_change_listener_init(&psl->scl, populate_fn, discard_fn);
 }
 
+int private_shared_manager_state_change(PrivateSharedManager *mgr,
+                                        uint64_t offset, uint64_t size,
+                                        bool to_private);
+
 /**
  * memory_get_xlat_addr: Extract addresses from a TLB entry
  *
diff --git a/system/memory.c b/system/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -XXX,XX +XXX,XX @@ void generic_state_manager_unregister_listener(GenericStateManager *gsm,
     gsmc->unregister_listener(gsm, scl);
 }
 
+int private_shared_manager_state_change(PrivateSharedManager *mgr,
+                                        uint64_t offset, uint64_t size,
+                                        bool to_private)
+{
+    PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_GET_CLASS(mgr);
+
+    g_assert(psmc->state_change);
+    return psmc->state_change(mgr, offset, size, to_private);
+}
+
 /* Called with rcu_read_lock held. */
 bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                           ram_addr_t *ram_addr, bool *read_only,
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
index XXXXXXX..XXXXXXX 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *
                                     ram_block_attribute_psm_replay_cb);
 }
 
+static bool ram_block_attribute_is_valid_range(RamBlockAttribute *attr,
+                                               uint64_t offset, uint64_t size)
+{
+    MemoryRegion *mr = attr->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    int block_size = ram_block_attribute_get_block_size(attr);
+
+    if (!QEMU_IS_ALIGNED(offset, block_size)) {
+        return false;
+    }
+    if (offset + size < offset || !size) {
+        return false;
+    }
+    if (offset >= region_size || offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
+                                                  uint64_t offset, uint64_t size)
+{
+    PrivateSharedListener *psl;
+
+    QLIST_FOREACH(psl, &attr->psl_list, next) {
+        StateChangeListener *scl = &psl->scl;
+        MemoryRegionSection tmp = *scl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        scl->notify_to_state_clear(scl, &tmp);
+    }
+}
+
+static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    PrivateSharedListener *psl, *psl2;
+    int ret = 0;
+
+    QLIST_FOREACH(psl, &attr->psl_list, next) {
+        StateChangeListener *scl = &psl->scl;
+        MemoryRegionSection tmp = *scl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        ret = scl->notify_to_state_set(scl, &tmp);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Notify all already-notified listeners. */
+        QLIST_FOREACH(psl2, &attr->psl_list, next) {
+            StateChangeListener *scl2 = &psl2->scl;
+            MemoryRegionSection tmp = *scl2->section;
+
+            if (psl == psl2) {
+                break;
+            }
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+                continue;
+            }
+            scl2->notify_to_state_clear(scl2, &tmp);
+        }
+    }
+    return ret;
+}
+
+static bool ram_block_attribute_is_range_shared(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long last_bit = first_bit + (size / block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
191
+ /* We fake a shorter bitmap to avoid searching too far. */
134
+ found_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
192
+ found_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit);
135
+ return found_bit > last_bit;
193
+ return found_bit > last_bit;
136
+}
194
+}
137
+
195
+
138
+static bool guest_memfd_is_range_discarded(GuestMemfdManager *gmm,
196
+static bool ram_block_attribute_is_range_private(RamBlockAttribute *attr,
139
+ uint64_t offset, uint64_t size)
197
+ uint64_t offset, uint64_t size)
140
+{
198
+{
141
+ const unsigned long first_bit = offset / gmm->block_size;
199
+ const int block_size = ram_block_attribute_get_block_size(attr);
142
+ const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1;
200
+ const unsigned long first_bit = offset / block_size;
201
+ const unsigned long last_bit = first_bit + (size / block_size) - 1;
143
+ unsigned long found_bit;
202
+ unsigned long found_bit;
144
+
203
+
145
+ /* We fake a shorter bitmap to avoid searching too far. */
204
+ /* We fake a shorter bitmap to avoid searching too far. */
146
+ found_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit);
205
+ found_bit = find_next_bit(attr->shared_bitmap, last_bit + 1, first_bit);
147
+ return found_bit > last_bit;
206
+ return found_bit > last_bit;
148
+}
207
+}
149
+
208
+
150
+static int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset,
209
+static int ram_block_attribute_psm_state_change(PrivateSharedManager *mgr, uint64_t offset,
151
+ uint64_t size, bool shared_to_private)
210
+ uint64_t size, bool to_private)
152
+{
211
+{
212
+ RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(mgr);
213
+ const int block_size = ram_block_attribute_get_block_size(attr);
214
+ const unsigned long first_bit = offset / block_size;
215
+ const unsigned long nbits = size / block_size;
216
+ const uint64_t end = offset + size;
217
+ unsigned long bit;
218
+ uint64_t cur;
153
+ int ret = 0;
219
+ int ret = 0;
154
+
220
+
155
+ if (!guest_memfd_is_valid_range(gmm, offset, size)) {
221
+ if (!ram_block_attribute_is_valid_range(attr, offset, size)) {
156
+ error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
222
+ error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
157
+ __func__, offset, size);
223
+ __func__, offset, size);
158
+ return -1;
224
+ return -1;
159
+ }
225
+ }
160
+
226
+
161
+ if ((shared_to_private && guest_memfd_is_range_discarded(gmm, offset, size)) ||
227
+ if (to_private) {
162
+ (!shared_to_private && guest_memfd_is_range_populated(gmm, offset, size))) {
228
+ if (ram_block_attribute_is_range_private(attr, offset, size)) {
163
+ return 0;
229
+ /* Already private */
164
+ }
230
+ } else if (!ram_block_attribute_is_range_shared(attr, offset, size)) {
165
+
231
+ /* Unexpected mixture: process individual blocks */
166
+ if (shared_to_private) {
232
+ for (cur = offset; cur < end; cur += block_size) {
167
+ guest_memfd_notify_discard(gmm, offset, size);
233
+ bit = cur / block_size;
234
+ if (!test_bit(bit, attr->shared_bitmap)) {
235
+ continue;
236
+ }
237
+ clear_bit(bit, attr->shared_bitmap);
238
+ ram_block_attribute_notify_to_private(attr, cur, block_size);
239
+ }
240
+ } else {
241
+ /* Completely shared */
242
+ bitmap_clear(attr->shared_bitmap, first_bit, nbits);
243
+ ram_block_attribute_notify_to_private(attr, offset, size);
244
+ }
168
+ } else {
245
+ } else {
169
+ ret = guest_memfd_notify_populate(gmm, offset, size);
246
+ if (ram_block_attribute_is_range_shared(attr, offset, size)) {
170
+ }
247
+ /* Already shared */
171
+
248
+ } else if (!ram_block_attribute_is_range_private(attr, offset, size)) {
172
+ if (!ret) {
249
+ /* Unexpected mixture: process individual blocks */
173
+ unsigned long first_bit = offset / gmm->block_size;
250
+ unsigned long *modified_bitmap = bitmap_new(nbits);
174
+ unsigned long nbits = size / gmm->block_size;
251
+
175
+
252
+ for (cur = offset; cur < end; cur += block_size) {
176
+ g_assert((first_bit + nbits) <= gmm->bitmap_size);
253
+ bit = cur / block_size;
177
+
254
+ if (test_bit(bit, attr->shared_bitmap)) {
178
+ if (shared_to_private) {
255
+ continue;
179
+ bitmap_clear(gmm->bitmap, first_bit, nbits);
256
+ }
257
+ set_bit(bit, attr->shared_bitmap);
258
+ ret = ram_block_attribute_notify_to_shared(attr, cur, block_size);
259
+ if (!ret) {
260
+ set_bit(bit - first_bit, modified_bitmap);
261
+ continue;
262
+ }
263
+ clear_bit(bit, attr->shared_bitmap);
264
+ break;
265
+ }
266
+
267
+ if (ret) {
268
+ /*
269
+ * Very unexpected: something went wrong. Revert to the old
270
+ * state, marking only the blocks as private that we converted
271
+ * to shared.
272
+ */
273
+ for (cur = offset; cur < end; cur += block_size) {
274
+ bit = cur / block_size;
275
+ if (!test_bit(bit - first_bit, modified_bitmap)) {
276
+ continue;
277
+ }
278
+ assert(test_bit(bit, attr->shared_bitmap));
279
+ clear_bit(bit, attr->shared_bitmap);
280
+ ram_block_attribute_notify_to_private(attr, cur, block_size);
281
+ }
282
+ }
283
+ g_free(modified_bitmap);
180
+ } else {
284
+ } else {
181
+ bitmap_set(gmm->bitmap, first_bit, nbits);
285
+ /* Complete private */
182
+ }
286
+ bitmap_set(attr->shared_bitmap, first_bit, nbits);
183
+
287
+ ret = ram_block_attribute_notify_to_shared(attr, offset, size);
184
+ return 0;
288
+ if (ret) {
289
+ bitmap_clear(attr->shared_bitmap, first_bit, nbits);
290
+ }
291
+ }
185
+ }
292
+ }
186
+
293
+
187
+ return ret;
294
+ return ret;
188
+}
295
+}
189
+
296
+
190
static void guest_memfd_manager_init(Object *obj)
297
int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
191
{
298
{
192
GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
299
uint64_t shared_bitmap_size;
193
@@ -XXX,XX +XXX,XX @@ static void guest_memfd_manager_finalize(Object *obj)
300
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_finalize(Object *obj)
194
301
static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
195
static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
196
{
302
{
197
+ GuestMemfdManagerClass *gmmc = GUEST_MEMFD_MANAGER_CLASS(oc);
303
GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
198
RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
304
+ PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_CLASS(oc);
199
305
200
+ gmmc->state_change = guest_memfd_state_change;
306
gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity;
201
+
307
gsmc->register_listener = ram_block_attribute_psm_register_listener;
202
rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
308
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
203
rdmc->register_listener = guest_memfd_rdm_register_listener;
309
gsmc->is_state_set = ram_block_attribute_psm_is_shared;
204
rdmc->unregister_listener = guest_memfd_rdm_unregister_listener;
310
gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared;
311
gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private;
312
+ psmc->state_change = ram_block_attribute_psm_state_change;
313
}
205
--
314
--
206
2.43.5
315
2.43.5
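The mixed-state to-shared path above only rolls back the blocks this call actually converted, which it tracks in a scratch `modified_bitmap`. The bookkeeping can be sketched in plain C with toy 32-bit bitmaps and a deliberately failing notifier standing in for the listener chain; all names and the failure injection are illustrative, not QEMU APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per block: set = shared, clear = private (as in shared_bitmap). */
static uint32_t shared_bitmap;
/* Blocks flipped by the current conversion call only. */
static uint32_t modified_bitmap;

static bool test_bit32(uint32_t map, int i) { return map & (1u << i); }
static void set_bit32(uint32_t *map, int i) { *map |= 1u << i; }
static void clear_bit32(uint32_t *map, int i) { *map &= ~(1u << i); }

/* Stand-in for notifying listeners; block 5 fails to exercise rollback. */
static int notify_to_shared(int block) { return block == 5 ? -1 : 0; }

static int convert_to_shared(int first, int last)
{
    int ret = 0;

    modified_bitmap = 0;
    for (int b = first; b <= last; b++) {
        if (test_bit32(shared_bitmap, b)) {
            continue;                       /* already shared, skip */
        }
        set_bit32(&shared_bitmap, b);
        ret = notify_to_shared(b);
        if (!ret) {
            set_bit32(&modified_bitmap, b); /* remember what we changed */
            continue;
        }
        clear_bit32(&shared_bitmap, b);     /* failed block stays private */
        break;
    }
    if (ret) {
        /* Revert only the blocks this call converted to shared. */
        for (int b = first; b <= last; b++) {
            if (test_bit32(modified_bitmap, b)) {
                clear_bit32(&shared_bitmap, b);
            }
        }
    }
    return ret;
}
```

After a failed conversion, blocks that were already shared before the call keep their state; only this call's partial progress is undone, matching the revert loop in the patch.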
A new field, ram_block_attribute, is introduced in RAMBlock to link to a
RamBlockAttribute object. This change centralizes all guest_memfd state
information (such as fd and shared_bitmap) within a RAMBlock,
simplifying management.

The realize()/unrealize() helpers are used to initialize/uninitialize
the RamBlockAttribute object. The object is registered/unregistered in
the target RAMBlock's MemoryRegion when creating guest_memfd.

Additionally, use the private_shared_manager_state_change() helper to
notify the registered PrivateSharedListener of these changes.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Remove the replay operations for attribute changes, which will be
  handled in a listener in the following patches.
- Add a comment in the error path of realize() to remind the
  future development of the unified error path.

Changes in v3:
- Use ram_discard_manager_replay_populated/discarded() to set the
  memory attribute and add the undo support if state_change()
  failed.
- Didn't add Reviewed-by from Alexey due to the new changes in this
  commit.

Changes in v2:
- Introduce a new field memory_attribute_manager in RAMBlock.
- Move the state_change() handling during page conversion in this patch.
- Undo what we did if it fails to set.
- Change the order of close(guest_memfd) and memory_attribute_manager cleanup.
---
 accel/kvm/kvm-all.c     |  9 +++++++++
 include/exec/ramblock.h |  1 +
 system/physmem.c        | 16 ++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
+    ret = private_shared_manager_state_change(PRIVATE_SHARED_MANAGER(mr->gsm),
+                                              offset, size, to_private);
+    if (ret) {
+        error_report("Failed to notify the listener of the state change of "
+                     "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
+                     start, size, to_private ? "private" : "shared");
+        goto out_unref;
+    }
+
     if (to_private) {
         if (rb->page_size != qemu_real_host_page_size()) {
             /*
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -XXX,XX +XXX,XX @@ struct RAMBlock {
     int fd;
     uint64_t fd_offset;
     int guest_memfd;
+    RamBlockAttribute *ram_block_attribute;
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
+
+        new_block->ram_block_attribute = RAM_BLOCK_ATTRIBUTE(object_new(TYPE_RAM_BLOCK_ATTRIBUTE));
+        if (ram_block_attribute_realize(new_block->ram_block_attribute, new_block->mr)) {
+            error_setg(errp, "Failed to realize ram block attribute");
+            /*
+             * The error path could be unified if the rest of ram_block_add() ever
+             * develops a need to check for errors.
+             */
+            object_unref(OBJECT(new_block->ram_block_attribute));
+            close(new_block->guest_memfd);
+            ram_block_discard_require(false);
+            qemu_mutex_unlock_ramlist();
+            goto out_free;
+        }
     }
 
     ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
@@ -XXX,XX +XXX,XX @@ static void reclaim_ramblock(RAMBlock *block)
     }
 
     if (block->guest_memfd >= 0) {
+        ram_block_attribute_unrealize(block->ram_block_attribute);
+        object_unref(OBJECT(block->ram_block_attribute));
         close(block->guest_memfd);
         ram_block_discard_require(false);
     }
--
2.43.5
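The realize() helper above sizes a shared bitmap at one bit per host-page-sized block of the region. A minimal sketch of that arithmetic, assuming a 4 KiB block size and a long-granular allocation like QEMU's bitmap_new(); the helper names here are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096ULL                              /* assumed host page size */
#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Number of bits needed: one per block, region rounded up to a block. */
static uint64_t shared_bitmap_bits(uint64_t region_size)
{
    return ROUND_UP(region_size, BLOCK_SIZE) / BLOCK_SIZE;
}

/* Bytes actually allocated when the bits are stored in 64-bit longs. */
static size_t shared_bitmap_bytes(uint64_t bits)
{
    return (size_t)((bits + 63) / 64) * sizeof(uint64_t);
}
```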
So that the caller can check the result of the NotifyStateClear() handler
if the operation fails.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Newly added.
---
 hw/vfio/common.c      | 18 ++++++++++--------
 include/exec/memory.h |  4 ++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -XXX,XX +XXX,XX @@ out:
     rcu_read_unlock();
 }
 
-static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
-                                                    MemoryRegionSection *section)
+static int vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
+                                                   MemoryRegionSection *section)
 {
     const hwaddr size = int128_get64(section->size);
     const hwaddr iova = section->offset_within_address_space;
@@ -XXX,XX +XXX,XX @@ static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontaine
         error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
                      strerror(-ret));
     }
+
+    return ret;
 }
 
-static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
-                                            MemoryRegionSection *section)
+static int vfio_ram_discard_notify_discard(StateChangeListener *scl,
+                                           MemoryRegionSection *section)
 {
     RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
-    vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
+    return vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
 }
 
-static void vfio_private_shared_notify_to_private(StateChangeListener *scl,
-                                                  MemoryRegionSection *section)
+static int vfio_private_shared_notify_to_private(StateChangeListener *scl,
+                                                 MemoryRegionSection *section)
 {
     PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
     VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
                                                    listener);
-    vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
+    return vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
 }
 
 static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
 typedef struct StateChangeListener StateChangeListener;
 typedef int (*NotifyStateSet)(StateChangeListener *scl,
                               MemoryRegionSection *section);
-typedef void (*NotifyStateClear)(StateChangeListener *scl,
-                                 MemoryRegionSection *section);
+typedef int (*NotifyStateClear)(StateChangeListener *scl,
+                                MemoryRegionSection *section);
 
 struct StateChangeListener {
     /*
--
2.43.5
With the introduction of the RamBlockAttribute object to manage
RAMBlocks with guest_memfd and the implementation of the
PrivateSharedManager interface to convey page conversion events, it is
more elegant to move attribute changes into a PrivateSharedListener.

The PrivateSharedListener is registered/unregistered for each memory
region section during kvm_region_add/del(), and listeners are stored in
a CVMPrivateSharedListener list for easy management. The listener
handler performs attribute changes upon receiving notifications from
private_shared_manager_state_change() calls. With this change, the
state change operations in kvm_convert_memory() can be removed.

Note that after moving attribute changes into a listener, errors can be
returned in ram_block_attribute_notify_to_private() if attribute changes
fail in corner cases (e.g. -ENOMEM). Since there is currently no rollback
operation for the to_private case, an assert is used to prevent the
guest from continuing with a partially changed attribute state.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Newly added.
---
 accel/kvm/kvm-all.c                         | 73 ++++++++++++++++++---
 include/system/confidential-guest-support.h | 10 +++
 system/ram-block-attribute.c                | 17 ++++-
 target/i386/kvm/tdx.c                       |  1 +
 target/i386/sev.c                           |  1 +
 5 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@
 #include "kvm-cpus.h"
 #include "system/dirtylimit.h"
 #include "qemu/range.h"
+#include "system/confidential-guest-support.h"
 
 #include "hw/boards.h"
 #include "system/stats.h"
@@ -XXX,XX +XXX,XX @@ static int kvm_dirty_ring_init(KVMState *s)
     return 0;
 }
 
+static int kvm_private_shared_notify(StateChangeListener *scl,
+                                     MemoryRegionSection *section,
+                                     bool to_private)
+{
+    hwaddr start = section->offset_within_address_space;
+    hwaddr size = int128_get64(section->size);
+
+    if (to_private) {
+        return kvm_set_memory_attributes_private(start, size);
+    } else {
+        return kvm_set_memory_attributes_shared(start, size);
+    }
+}
+
+static int kvm_private_shared_notify_to_shared(StateChangeListener *scl,
+                                               MemoryRegionSection *section)
+{
+    return kvm_private_shared_notify(scl, section, false);
+}
+
+static int kvm_private_shared_notify_to_private(StateChangeListener *scl,
+                                                MemoryRegionSection *section)
+{
+    return kvm_private_shared_notify(scl, section, true);
+}
+
 static void kvm_region_add(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs;
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     KVMMemoryUpdate *update;
+    CVMPrivateSharedListener *cpsl;
+    PrivateSharedListener *psl;
+
 
     update = g_new0(KVMMemoryUpdate, 1);
     update->section = *section;
 
     QSIMPLEQ_INSERT_TAIL(&kml->transaction_add, update, next);
+
+    if (!memory_region_has_guest_memfd(section->mr) || !gsm) {
+        return;
+    }
+
+    cpsl = g_new0(CVMPrivateSharedListener, 1);
+    cpsl->mr = section->mr;
+    cpsl->offset_within_address_space = section->offset_within_address_space;
+    cpsl->granularity = generic_state_manager_get_min_granularity(gsm, section->mr);
+    psl = &cpsl->listener;
+    QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next);
+    private_shared_listener_init(psl, kvm_private_shared_notify_to_shared,
+                                 kvm_private_shared_notify_to_private);
+    generic_state_manager_register_listener(gsm, &psl->scl, section);
 }
 
 static void kvm_region_del(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs;
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     KVMMemoryUpdate *update;
+    CVMPrivateSharedListener *cpsl;
+    PrivateSharedListener *psl;
 
     update = g_new0(KVMMemoryUpdate, 1);
     update->section = *section;
 
     QSIMPLEQ_INSERT_TAIL(&kml->transaction_del, update, next);
+    if (!memory_region_has_guest_memfd(section->mr) || !gsm) {
+        return;
+    }
+
+    QLIST_FOREACH(cpsl, &cgs->cvm_private_shared_list, next) {
+        if (cpsl->mr == section->mr &&
+            cpsl->offset_within_address_space == section->offset_within_address_space) {
+            psl = &cpsl->listener;
+            generic_state_manager_unregister_listener(gsm, &psl->scl);
+            QLIST_REMOVE(cpsl, next);
+            g_free(cpsl);
+            break;
+        }
+    }
 }
 
 static void kvm_region_commit(MemoryListener *listener)
@@ -XXX,XX +XXX,XX @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
         goto out_unref;
     }
 
-    if (to_private) {
-        ret = kvm_set_memory_attributes_private(start, size);
-    } else {
-        ret = kvm_set_memory_attributes_shared(start, size);
-    }
-    if (ret) {
-        goto out_unref;
-    }
-
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h
index XXXXXXX..XXXXXXX 100644
--- a/include/system/confidential-guest-support.h
+++ b/include/system/confidential-guest-support.h
@@ -XXX,XX +XXX,XX @@
 #endif
 
 #include "qom/object.h"
+#include "exec/memory.h"
 
 #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
 OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
                     ConfidentialGuestSupportClass,
                     CONFIDENTIAL_GUEST_SUPPORT)
 
+typedef struct CVMPrivateSharedListener {
+    MemoryRegion *mr;
+    hwaddr offset_within_address_space;
+    uint64_t granularity;
+    PrivateSharedListener listener;
+    QLIST_ENTRY(CVMPrivateSharedListener) next;
+} CVMPrivateSharedListener;
 
 struct ConfidentialGuestSupport {
     Object parent;
@@ -XXX,XX +XXX,XX @@ struct ConfidentialGuestSupport {
      */
     bool require_guest_memfd;
 
+    QLIST_HEAD(, CVMPrivateSharedListener) cvm_private_shared_list;
+
     /*
      * ready: flag set by CGS initialization code once it's ready to
      * start executing instructions in a potentially-secure
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
index XXXXXXX..XXXXXXX 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
                                                   uint64_t offset, uint64_t size)
 {
     PrivateSharedListener *psl;
+    int ret;
 
     QLIST_FOREACH(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        scl->notify_to_state_clear(scl, &tmp);
+        /*
+         * No undo operation for the state_clear() callback failure at present.
+         * Expect the state_clear() callback to always succeed.
+         */
+        ret = scl->notify_to_state_clear(scl, &tmp);
+        g_assert(!ret);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
                                                 uint64_t offset, uint64_t size)
 {
     PrivateSharedListener *psl, *psl2;
-    int ret = 0;
+    int ret = 0, ret2 = 0;
 
     QLIST_FOREACH(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
@@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        scl2->notify_to_state_clear(scl2, &tmp);
+        /*
+         * No undo operation for the state_clear() callback failure at present.
+         * Expect the state_clear() callback to always succeed.
+         */
+        ret2 = scl2->notify_to_state_clear(scl2, &tmp);
+        g_assert(!ret2);
     }
 }
 return ret;
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -XXX,XX +XXX,XX @@ static void tdx_guest_init(Object *obj)
     qemu_mutex_init(&tdx->lock);
 
     cgs->require_guest_memfd = true;
+    QLIST_INIT(&cgs->cvm_private_shared_list);
     tdx->attributes = TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
 
     object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes,
diff --git a/target/i386/sev.c b/target/i386/sev.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -XXX,XX +XXX,XX @@ sev_snp_guest_instance_init(Object *obj)
     SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj);
 
     cgs->require_guest_memfd = true;
+    QLIST_INIT(&cgs->cvm_private_shared_list);
 
     /* default init/start/finish params for kvm */
     sev_snp_guest->kvm_start_conf.policy = DEFAULT_SEV_SNP_POLICY;
--
2.43.5
In-place page conversion requires operations to follow a specific
sequence: unmap-before-conversion-to-private and
map-after-conversion-to-shared. Currently, both attribute changes and
VFIO DMA map/unmap operations are handled by PrivateSharedListeners,
so they need to be invoked in a specific order.

For private-to-shared conversion:
- Change the attribute to shared.
- VFIO populates the shared mappings into the IOMMU.
- Restore the attribute if the operation fails.

For shared-to-private conversion:
- VFIO discards the shared mapping from the IOMMU.
- Change the attribute to private.

To facilitate this sequence, priority support is added to
PrivateSharedListener so that listeners are stored in a deterministic
order based on priority. A tail queue is used to store listeners,
allowing traversal in either direction.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Newly added.
---
 accel/kvm/kvm-all.c          |  3 ++-
 hw/vfio/common.c             |  3 ++-
 include/exec/memory.h        | 19 +++++++++++++++++--
 include/exec/ramblock.h      |  2 +-
 system/ram-block-attribute.c | 23 +++++++++++++++++------
 5 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static void kvm_region_add(MemoryListener *listener,
     psl = &cpsl->listener;
     QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next);
     private_shared_listener_init(psl, kvm_private_shared_notify_to_shared,
-                                 kvm_private_shared_notify_to_private);
+                                 kvm_private_shared_notify_to_private,
+                                 PRIVATE_SHARED_LISTENER_PRIORITY_MIN);
     generic_state_manager_register_listener(gsm, &psl->scl, section);
 }
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -XXX,XX +XXX,XX @@ static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
 
     psl = &vpsl->listener;
     private_shared_listener_init(psl, vfio_private_shared_notify_to_shared,
-                                 vfio_private_shared_notify_to_private);
+                                 vfio_private_shared_notify_to_private,
+                                 PRIVATE_SHARED_LISTENER_PRIORITY_COMMON);
     generic_state_manager_register_listener(gsm, &psl->scl, section);
     QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass {
     GenericStateManagerClass parent_class;
 };
 
+#define PRIVATE_SHARED_LISTENER_PRIORITY_MIN 0
+#define PRIVATE_SHARED_LISTENER_PRIORITY_COMMON 10
+
 typedef struct PrivateSharedListener PrivateSharedListener;
 struct PrivateSharedListener {
     struct StateChangeListener scl;
 
-    QLIST_ENTRY(PrivateSharedListener) next;
+    /*
+     * @priority:
+     *
+     * Governs the order in which private shared listeners are invoked.
+     * Lower priorities are invoked earlier.
+     * The listener priority makes it possible to undo the effects of
+     * previous listeners in reverse order when a callback fails.
+     */
+    int priority;
+
+    QTAILQ_ENTRY(PrivateSharedListener) next;
 };
 
 struct PrivateSharedManagerClass {
@@ -XXX,XX +XXX,XX @@ struct PrivateSharedManagerClass {
 
 static inline void private_shared_listener_init(PrivateSharedListener *psl,
                                                 NotifyStateSet populate_fn,
-                                                NotifyStateClear discard_fn)
+                                                NotifyStateClear discard_fn,
+                                                int priority)
 {
     state_change_listener_init(&psl->scl, populate_fn, discard_fn);
+    psl->priority = priority;
 }
 
 int private_shared_manager_state_change(PrivateSharedManager *mgr,
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -XXX,XX +XXX,XX @@ struct RamBlockAttribute {
     unsigned shared_bitmap_size;
     unsigned long *shared_bitmap;
 
-    QLIST_HEAD(, PrivateSharedListener) psl_list;
+    QTAILQ_HEAD(, PrivateSharedListener) psl_list;
 };
 
 struct RamBlockAttributeClass {
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
index XXXXXXX..XXXXXXX 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
 {
     RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
     PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    PrivateSharedListener *other = NULL;
     int ret;
 
     g_assert(section->mr == attr->mr);
     scl->section = memory_region_section_new_copy(section);
 
-    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
+    if (QTAILQ_EMPTY(&attr->psl_list) ||
+        psl->priority >= QTAILQ_LAST(&attr->psl_list)->priority) {
+        QTAILQ_INSERT_TAIL(&attr->psl_list, psl, next);
+    } else {
+        QTAILQ_FOREACH(other, &attr->psl_list, next) {
+            if (psl->priority < other->priority) {
+                break;
+            }
+        }
+        QTAILQ_INSERT_BEFORE(other, psl, next);
+    }
 
     ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
                                                       ram_block_attribute_notify_shared_cb);
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm
 
     memory_region_section_free_copy(scl->section);
     scl->section = NULL;
-    QLIST_REMOVE(psl, next);
+    QTAILQ_REMOVE(&attr->psl_list, psl, next);
 }
 
 typedef struct RamBlockAttributeReplayData {
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
     PrivateSharedListener *psl;
     int ret;
 
-    QLIST_FOREACH(psl, &attr->psl_list, next) {
+    QTAILQ_FOREACH_REVERSE(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
         MemoryRegionSection tmp = *scl->section;
 
@@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
     PrivateSharedListener *psl, *psl2;
     int ret = 0, ret2 = 0;
 
-    QLIST_FOREACH(psl, &attr->psl_list, next) {
+    QTAILQ_FOREACH(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
         MemoryRegionSection tmp = *scl->section;
 
@@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
 
     if (ret) {
         /* Notify all already-notified listeners. */
-        QLIST_FOREACH(psl2, &attr->psl_list, next) {
+        QTAILQ_FOREACH(psl2, &attr->psl_list, next) {
             StateChangeListener *scl2 = &psl2->scl;
             MemoryRegionSection tmp = *scl2->section;
 
@@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_init(Object *obj)
 {
     RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
 
-    QLIST_INIT(&attr->psl_list);
+    QTAILQ_INIT(&attr->psl_list);
 }
 
 static void ram_block_attribute_finalize(Object *obj)
-- 
2.43.5
As guest_memfd is now managed by ram_block_attribute with
PrivateSharedManager, only block uncoordinated discard.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Modify commit message (RamDiscardManager->PrivateSharedManager).

Changes in v3:
- No change.

Changes in v2:
- Change the ram_block_discard_require(false) to
  ram_block_coordinated_discard_require(false).
---
 system/physmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
...
-    ret = ram_block_discard_require(true);
+    ret = ram_block_coordinated_discard_require(true);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
                          "cannot set up private guest memory: discard currently blocked");
@@ -XXX,XX +XXX,XX @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
          */
         object_unref(OBJECT(new_block->ram_block_attribute));
         close(new_block->guest_memfd);
-        ram_block_discard_require(false);
+        ram_block_coordinated_discard_require(false);
         qemu_mutex_unlock_ramlist();
         goto out_free;
     }
@@ -XXX,XX +XXX,XX @@ static void reclaim_ramblock(RAMBlock *block)
         ram_block_attribute_unrealize(block->ram_block_attribute);
         object_unref(OBJECT(block->ram_block_attribute));
         close(block->guest_memfd);
-        ram_block_discard_require(false);
+        ram_block_coordinated_discard_require(false);
     }
 
     g_free(block);
-- 
2.43.5