This is the v4 series of the shared device assignment support.

Compared with the v3 series, the main changes are:

- Introduced a new GenericStateManager parent class, so that the existing
  RamDiscardManager and the new PrivateSharedManager can be its child
  classes and manage different states.
- Changed the name of MemoryAttributeManager to RamBlockAttribute to
  distinguish it from the XXXManager interfaces while still using it to
  manage guest_memfd information. Meanwhile, use it to implement
  PrivateSharedManager instead of RamDiscardManager to distinguish the
  states of populate/discard and shared/private.
- Moved the attribute change operations into a listener so that both the
  attribute change and IOMMU pins can be invoked in listener callbacks.
- Added priority listener support in PrivateSharedListener so that the
  attribute change listener and VFIO listener can be triggered in the
  expected order to comply with the in-place conversion requirement.
- v3: https://lore.kernel.org/qemu-devel/20250310081837.13123-1-chenyi.qiang@intel.com/

The overview of this series:
- Patch 1-3: preparation patches. These include function exposure and
  some definition changes to return values.
- Patch 4: Introduce a generic state change parent class with
  RamDiscardManager as its child class. This paves the way to introduce
  new child classes to manage other memory states.
- Patch 5-6: Introduce a new child class, PrivateSharedManager, to
  manage the private and shared states, and add VFIO support for this
  new interface to coordinate RAM discard support.
- Patch 7-9: Introduce a new object to implement the
  PrivateSharedManager interface and a callback to notify of
  shared/private state changes. Store the object in RAMBlocks and
  register it in the target MemoryRegion so that it can notify page
  conversion events to other systems.
- Patch 10-11: Move the state change handling into a
  PrivateSharedListener so that it can be invoked together with the
  VFIO listener by the state_change() call.
- Patch 12: To comply with in-place conversion, introduce priority
  listener support so that the attribute change and IOMMU pin can
  follow the expected order.
- Patch 13: Unlock the coordinated discard so that shared device
  assignment (VFIO) can work with guest_memfd.

More small changes or details can be found in the individual patches.

---
Original cover letter with minor changes related to the new parent class:

Background
==========
Confidential VMs have two classes of memory: shared and private memory.
Shared memory is accessible from the host/VMM while private memory is
not. Confidential VMs can decide which memory is shared/private and
convert memory between shared/private at runtime.

"guest_memfd" is a new kind of fd whose primary goal is to serve guest
private memory. In the current implementation, shared memory is
allocated with normal methods (e.g. mmap or fallocate) while private
memory is allocated from guest_memfd. When a VM performs memory
conversions, QEMU frees pages via madvise() or via PUNCH_HOLE on memfd
or guest_memfd from one side, and allocates new pages from the other
side. This causes the stale IOMMU mapping issue mentioned in [1] when
we try to enable shared device assignment in confidential VMs.

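The stale-mapping issue referenced above (see [1]) can be sketched with a toy model. The types and helpers below are purely illustrative, not QEMU or VFIO code: VFIO pins and maps a shared page, the guest converts the page to private and the shared page is discarded, but nothing tells VFIO, so a later DMA still goes through the old IOMMU entry even though a different host page now backs the re-shared GPA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one IOMMU translation entry. */
typedef struct {
    bool     mapped;
    uint64_t host_page;   /* host address the IOMMU translates to */
} IommuEntry;

static uint64_t next_host_page = 0x1000;

/* Each shared-page allocation returns a fresh host page. */
static uint64_t alloc_shared_page(void)
{
    uint64_t p = next_host_page;
    next_host_page += 0x1000;
    return p;
}

/* The problematic sequence, with no unmap step in between. */
static uint64_t stale_dma_target(IommuEntry *e)
{
    uint64_t first = alloc_shared_page();   /* 1. allocate shared page     */
    e->mapped = true;                       /*    VFIO maps and pins it    */
    e->host_page = first;

    /* 2. convert shared->private, 3. discard the shared page:
     * nothing notifies VFIO, so the entry is left untouched (stale). */

    uint64_t second = alloc_shared_page();  /* 4-5. re-share, new backing  */
    (void)second;

    /* 6. DMA goes through the IOMMU: it still hits the old host page. */
    return e->host_page;
}
```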
Solution
========
The key to enabling shared device assignment is to update the IOMMU
mappings on page conversion. RamDiscardManager, an existing interface
currently utilized by virtio-mem, offers a means to modify IOMMU
mappings in accordance with VM page assignment. Although the required
operations in VFIO for page conversion are similar to those for memory
plug/unplug, the private/shared states are different from the
discarded/populated states. We want a mechanism similar to
RamDiscardManager but used to manage the state of private and shared
memory.

This series introduces a new abstract parent class to manage a pair of
opposite states, with RamDiscardManager as a child class managing the
populate/discard states, and introduces a new child class,
PrivateSharedManager, which can utilize the same infrastructure to
notify VFIO of page conversions.

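The pairing idea can be illustrated with a minimal sketch. The names and layout below are hypothetical stand-ins, not the actual QOM classes added by this series: a generic manager tracks one bit per page and notifies listeners (e.g. VFIO) of transitions, while the meaning of the set/clear states is supplied by the concrete child (populated/discarded for RamDiscardManager, shared/private for PrivateSharedManager).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct GenericStateManager GenericStateManager;

/* Generic pair-of-opposite-states manager (illustrative only). */
struct GenericStateManager {
    uint64_t bitmap;               /* 1 bit per page; meaning set by child */
    const char *set_state_name;    /* e.g. "populated" or "shared"  */
    const char *clear_state_name;  /* e.g. "discarded" or "private" */
    /* Listener hooks, e.g. VFIO map/unmap on state change. */
    void (*notify_set)(GenericStateManager *gsm, size_t page);
    void (*notify_clear)(GenericStateManager *gsm, size_t page);
};

static void gsm_set(GenericStateManager *gsm, size_t page)
{
    gsm->bitmap |= (UINT64_C(1) << page);
    if (gsm->notify_set) {
        gsm->notify_set(gsm, page);
    }
}

static void gsm_clear(GenericStateManager *gsm, size_t page)
{
    gsm->bitmap &= ~(UINT64_C(1) << page);
    if (gsm->notify_clear) {
        gsm->notify_clear(gsm, page);
    }
}

static bool gsm_is_set(const GenericStateManager *gsm, size_t page)
{
    return gsm->bitmap & (UINT64_C(1) << page);
}
```

The point of the shared parent is that the bitmap bookkeeping and listener plumbing are written once, while each child only decides what the two states mean.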
Relationship with in-place page conversion
==========================================
To support 1G pages for guest_memfd [2], the current direction is to
allow mmap() of guest_memfd to userspace so that both private and
shared memory can use the same physical pages as the backend. This
in-place page conversion design eliminates the need to discard pages
during shared/private conversions. However, device assignment will
still be blocked because the in-place page conversion will reject the
conversion when the page is pinned by VFIO.

To address this, the key difference lies in the sequence of the VFIO
map/unmap operations and the page conversion. This series can be
adjusted to achieve unmap-before-conversion-to-private and
map-after-conversion-to-shared, ensuring compatibility with
guest_memfd.
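The required ordering can be sketched as follows. This is a hypothetical model of the listener-ordering constraint, not QEMU's actual listener API: the VFIO unmap must run before a conversion to private (so the page is no longer pinned when the conversion is attempted), and the VFIO map must run after a conversion to shared.

```c
#include <assert.h>
#include <string.h>

/* Record the order in which the "listeners" fire. */
static const char *event_log[4];
static int n_events;

static void log_event(const char *e) { event_log[n_events++] = e; }

static void vfio_unmap(void)       { log_event("vfio_unmap"); }
static void vfio_map(void)         { log_event("vfio_map"); }
static void set_attr_private(void) { log_event("attr_private"); }
static void set_attr_shared(void)  { log_event("attr_shared"); }

static void convert_to_private(void)
{
    vfio_unmap();        /* unmap-before-conversion-to-private */
    set_attr_private();  /* conversion now succeeds: page is unpinned */
}

static void convert_to_shared(void)
{
    set_attr_shared();   /* conversion first ...                */
    vfio_map();          /* ... map-after-conversion-to-shared  */
}
```

This is what the priority-listener support in this series is for: it lets the attribute-change listener and the VFIO listener fire in the expected order for each direction.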

Limitation
==========
One limitation is that VFIO expects the DMA mapping for a specific IOVA
to be mapped and unmapped with the same granularity. The guest may
perform partial conversions, such as converting a small region within a
larger region. To prevent such invalid cases, all operations are
performed with 4K granularity. This could be optimized once the
cut_mapping operation [3] is introduced in the future: we can always
perform a split-before-unmap when a partial conversion happens. If the
split succeeds, the unmap will succeed and be atomic; if the split
fails, the unmap fails.
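The 4K-granularity workaround can be sketched as below. `unmap_one_4k()` is an illustrative stand-in for a per-page VFIO unmap, not a real ioctl wrapper: because every mapping is established in 4K units, a partial conversion (say, one 4K page inside a 2M range) never asks VFIO to unmap a subrange of a larger mapping.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K 0x1000ULL

/* Stand-in for one per-page VFIO DMA unmap; returns the bytes unmapped. */
static uint64_t unmap_one_4k(uint64_t iova)
{
    (void)iova;  /* a real implementation would unmap [iova, iova + 4K) */
    return PAGE_4K;
}

/* Unmap an arbitrary 4K-aligned converted range page by page. */
static uint64_t unmap_converted_range(uint64_t iova, uint64_t size)
{
    uint64_t done = 0;
    for (uint64_t off = 0; off < size; off += PAGE_4K) {
        done += unmap_one_4k(iova + off);
    }
    return done;
}
```

The cost is one operation per 4K page; the future cut_mapping (split-before-unmap) path would let larger mappings be kept and split only when a partial conversion actually happens.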

Testing
=======
This patch series is tested based on the TDX patches available at:
KVM: https://github.com/intel/tdx/tree/kvm-coco-queue-snapshot/kvm-coco-queue-snapshot-20250322
(with the revert of the HEAD commit)
QEMU: https://github.com/intel-staging/qemu-tdx/tree/tdx-upstream-snapshot-2025-04-07

To facilitate shared device assignment with the NIC, employ the legacy
type1 VFIO with the QEMU command:

qemu-system-x86_64 [...]
...
Following the bootup of the TD guest, the guest's IP address becomes
visible, and iperf is able to successfully send and receive data.

Related link
============
[1] https://lore.kernel.org/qemu-devel/20240423150951.41600-54-pbonzini@redhat.com/
[2] https://lore.kernel.org/lkml/cover.1726009989.git.ackerleytng@google.com/
[3] https://lore.kernel.org/linux-iommu/7-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/

Chenyi Qiang (13):
  memory: Export a helper to get intersection of a MemoryRegionSection
    with a given range
  memory: Change memory_region_set_ram_discard_manager() to return the
    result
  memory: Unify the definition of ReplayRamPopulate() and
    ReplayRamDiscard()
  memory: Introduce generic state change parent class for
    RamDiscardManager
  memory: Introduce PrivateSharedManager Interface as child of
    GenericStateManager
  vfio: Add the support for PrivateSharedManager Interface
  ram-block-attribute: Introduce RamBlockAttribute to manage RAMBlock
    with guest_memfd
  ram-block-attribute: Introduce a callback to notify shared/private
    state changes
  memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks
  memory: Change NotifyStateClear() definition to return the result
  KVM: Introduce CVMPrivateSharedListener for attribute changes during
    page conversions
  ram-block-attribute: Add priority listener support for
    PrivateSharedListener
  RAMBlock: Make guest_memfd require coordinate discard

 accel/kvm/kvm-all.c                         |  81 +++-
 hw/vfio/common.c                            | 131 +++++-
 hw/vfio/container-base.c                    |   1 +
 hw/virtio/virtio-mem.c                      | 168 +++----
 include/exec/memory.h                       | 407 ++++++++++------
 include/exec/ramblock.h                     |  25 +
 include/hw/vfio/vfio-container-base.h       |  10 +
 include/system/confidential-guest-support.h |  10 +
 migration/ram.c                             |  21 +-
 system/memory.c                             | 137 ++++--
 system/memory_mapping.c                     |   6 +-
 system/meson.build                          |   1 +
 system/physmem.c                            |  20 +-
 system/ram-block-attribute.c                | 495 ++++++++++++++++++++
 target/i386/kvm/tdx.c                       |   1 +
 target/i386/sev.c                           |   1 +
 16 files changed, 1192 insertions(+), 323 deletions(-)
 create mode 100644 system/ram-block-attribute.c

-- 
2.43.5
Rename the helper to memory_region_section_intersect_range() to make it
more generic. Meanwhile, define the @end as Int128 and replace the
related operations with the int128_* API since the helper is exported
as a wider API.

Suggested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- No change.

Changes in v3:
- No change.

Changes in v2:
- Make memory_region_section_intersect_range() an inline function.
- Add Reviewed-by from David.
- Define the @end as Int128 and use the related int128_* ops as a wider
  API (Alexey).
---
 hw/virtio/virtio-mem.c | 32 +++++---------------------------
 include/exec/memory.h  | 27 +++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
...
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+static inline bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                                         uint64_t offset, uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    Int128 end = int128_min(int128_add(int128_make64(s->offset_within_region), s->size),
+                            int128_add(int128_make64(offset), int128_make64(size)));
+
+    if (int128_le(end, int128_make64(start))) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_sub(end, int128_make64(start));
+    return true;
+}
+
 /**
  * memory_region_init: Initialize a memory region
  *
-- 
2.43.5
New patch
=========
Modify memory_region_set_ram_discard_manager() to return an error if a
RamDiscardManager is already set in the MemoryRegion. The caller must
handle this failure, such as having virtio-mem undo its actions and
fail the realize() process. Opportunistically move the call earlier to
avoid complex error handling.

This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem. After
ram_block_coordinated_discard_require(true) unlocks all
RamDiscardManager instances, only one instance is allowed to be set for
a MemoryRegion at present.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- No change.

Changes in v3:
- Move set_ram_discard_manager() up to avoid a g_free()
- Clean up set_ram_discard_manager() definition

Changes in v2:
- Newly added.
---
 hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
 include/exec/memory.h  |  6 +++---
 system/memory.c        | 10 +++++++---
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /*
+     * Set ourselves as RamDiscardManager before the plug handler maps the
+     * memory region and exposes it via an address space.
+     */
+    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
+                                              RAM_DISCARD_MANAGER(vmem))) {
+        error_setg(errp, "Failed to set RamDiscardManager");
+        ram_block_coordinated_discard_require(false);
+        return;
+    }
+
     /*
      * We don't know at this point whether shared RAM is migrated using
      * QEMU or migrated using the file content. "x-ignore-shared" will be
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
     vmem->system_reset->vmem = vmem;
     qemu_register_resettable(obj);
-
-    /*
-     * Set ourselves as RamDiscardManager before the plug handler maps the
-     * memory region and exposes it via an address space.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                          RAM_DISCARD_MANAGER(vmem));
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
-    /*
-     * The unplug handler unmapped the memory region, it cannot be
-     * found via an address space anymore. Unset ourselves.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
-
     qemu_unregister_resettable(OBJECT(vmem->system_reset));
     object_unref(OBJECT(vmem->system_reset));
 
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
+    /*
+     * The unplug handler unmapped the memory region, it cannot be
+     * found via an address space anymore. Unset ourselves.
+     */
+    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -XXX,XX +XXX,XX @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
  *
  * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
  * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned.
+ * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
  *
  * @mr: the #MemoryRegion
  * @rdm: #RamDiscardManager to set
  */
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm);
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm);
 
 /**
  * memory_region_find: translate an address/size relative to a
diff --git a/system/memory.c b/system/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -XXX,XX +XXX,XX @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
     return mr->rdm;
 }
 
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm)
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm)
 {
     g_assert(memory_region_is_ram(mr));
-    g_assert(!rdm || !mr->rdm);
+    if (mr->rdm && rdm) {
+        return -EBUSY;
+    }
+
     mr->rdm = rdm;
+    return 0;
 }
 
 uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
-- 
2.43.5
New patch
=========
Update the ReplayRamDiscard() function to return the result, and unify
ReplayRamPopulate() and ReplayRamDiscard() into ReplayStateChange() at
the same time due to their identical definitions. This unification
simplifies related structures, such as VirtIOMEMReplayData, making the
code cleaner and more maintainable.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
- Modify the commit message. We won't use the Replay() operation when
  doing the attribute change like v3.

Changes in v3:
- Newly added.
---
 hw/virtio/virtio-mem.c | 20 ++++++++++----------
 include/exec/memory.h  | 31 ++++++++++++++++---------------
 migration/ram.c        |  5 +++--
 system/memory.c        | 12 ++++++------
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -XXX,XX +XXX,XX @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
 }
 
 struct VirtIOMEMReplayData {
-    void *fn;
+    ReplayStateChange fn;
     void *opaque;
 };
 
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return data->fn(s, data->opaque);
 }
 
 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
-                                           ReplayRamPopulate replay_fn,
+                                           ReplayStateChange replay_fn,
                                            void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
@@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    data->fn(s, data->opaque);
     return 0;
 }
 
-static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
-                                            MemoryRegionSection *s,
-                                            ReplayRamDiscard replay_fn,
-                                            void *opaque)
+static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                           MemoryRegionSection *s,
+                                           ReplayStateChange replay_fn,
+                                           void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     struct VirtIOMEMReplayData data = {
@@ -XXX,XX +XXX,XX @@ static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
     };
 
     g_assert(s->mr == &vmem->memdev->mr);
-    virtio_mem_for_each_unplugged_section(vmem, s, &data,
-                                          virtio_mem_rdm_replay_discarded_cb);
76 | + return virtio_mem_for_each_unplugged_section(vmem, s, &data, | ||
77 | + virtio_mem_rdm_replay_discarded_cb); | ||
78 | } | ||
79 | |||
80 | static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm, | ||
81 | diff --git a/include/exec/memory.h b/include/exec/memory.h | ||
82 | index XXXXXXX..XXXXXXX 100644 | ||
83 | --- a/include/exec/memory.h | ||
84 | +++ b/include/exec/memory.h | ||
85 | @@ -XXX,XX +XXX,XX @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl, | ||
86 | rdl->double_discard_supported = double_discard_supported; | ||
87 | } | ||
88 | |||
89 | -typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque); | ||
90 | -typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque); | ||
91 | +typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque); | ||
92 | |||
93 | /* | ||
94 | * RamDiscardManagerClass: | ||
95 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | ||
96 | /** | ||
97 | * @replay_populated: | ||
98 | * | ||
99 | - * Call the #ReplayRamPopulate callback for all populated parts within the | ||
100 | + * Call the #ReplayStateChange callback for all populated parts within the | ||
101 | * #MemoryRegionSection via the #RamDiscardManager. | ||
102 | * | ||
103 | * In case any call fails, no further calls are made. | ||
104 | * | ||
105 | * @rdm: the #RamDiscardManager | ||
106 | * @section: the #MemoryRegionSection | ||
107 | - * @replay_fn: the #ReplayRamPopulate callback | ||
108 | + * @replay_fn: the #ReplayStateChange callback | ||
109 | * @opaque: pointer to forward to the callback | ||
110 | * | ||
111 | * Returns 0 on success, or a negative error if any notification failed. | ||
112 | */ | ||
113 | int (*replay_populated)(const RamDiscardManager *rdm, | ||
114 | MemoryRegionSection *section, | ||
115 | - ReplayRamPopulate replay_fn, void *opaque); | ||
116 | + ReplayStateChange replay_fn, void *opaque); | ||
117 | |||
118 | /** | ||
119 | * @replay_discarded: | ||
120 | * | ||
121 | - * Call the #ReplayRamDiscard callback for all discarded parts within the | ||
122 | + * Call the #ReplayStateChange callback for all discarded parts within the | ||
123 | * #MemoryRegionSection via the #RamDiscardManager. | ||
124 | * | ||
125 | * @rdm: the #RamDiscardManager | ||
126 | * @section: the #MemoryRegionSection | ||
127 | - * @replay_fn: the #ReplayRamDiscard callback | ||
128 | + * @replay_fn: the #ReplayStateChange callback | ||
129 | * @opaque: pointer to forward to the callback | ||
130 | + * | ||
131 | + * Returns 0 on success, or a negative error if any notification failed. | ||
132 | */ | ||
133 | - void (*replay_discarded)(const RamDiscardManager *rdm, | ||
134 | - MemoryRegionSection *section, | ||
135 | - ReplayRamDiscard replay_fn, void *opaque); | ||
136 | + int (*replay_discarded)(const RamDiscardManager *rdm, | ||
137 | + MemoryRegionSection *section, | ||
138 | + ReplayStateChange replay_fn, void *opaque); | ||
139 | |||
140 | /** | ||
141 | * @register_listener: | ||
142 | @@ -XXX,XX +XXX,XX @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, | ||
143 | |||
144 | int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, | ||
145 | MemoryRegionSection *section, | ||
146 | - ReplayRamPopulate replay_fn, | ||
147 | + ReplayStateChange replay_fn, | ||
148 | void *opaque); | ||
149 | |||
150 | -void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, | ||
151 | - MemoryRegionSection *section, | ||
152 | - ReplayRamDiscard replay_fn, | ||
153 | - void *opaque); | ||
154 | +int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, | ||
155 | + MemoryRegionSection *section, | ||
156 | + ReplayStateChange replay_fn, | ||
157 | + void *opaque); | ||
158 | |||
159 | void ram_discard_manager_register_listener(RamDiscardManager *rdm, | ||
160 | RamDiscardListener *rdl, | ||
161 | diff --git a/migration/ram.c b/migration/ram.c | ||
162 | index XXXXXXX..XXXXXXX 100644 | ||
163 | --- a/migration/ram.c | ||
164 | +++ b/migration/ram.c | ||
165 | @@ -XXX,XX +XXX,XX @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs, | ||
166 | return ret; | ||
167 | } | ||
168 | |||
169 | -static void dirty_bitmap_clear_section(MemoryRegionSection *section, | ||
170 | - void *opaque) | ||
171 | +static int dirty_bitmap_clear_section(MemoryRegionSection *section, | ||
172 | + void *opaque) | ||
173 | { | ||
174 | const hwaddr offset = section->offset_within_region; | ||
175 | const hwaddr size = int128_get64(section->size); | ||
176 | @@ -XXX,XX +XXX,XX @@ static void dirty_bitmap_clear_section(MemoryRegionSection *section, | ||
177 | } | ||
178 | *cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages); | ||
179 | bitmap_clear(rb->bmap, start, npages); | ||
180 | + return 0; | ||
181 | } | ||
182 | |||
183 | /* | ||
184 | diff --git a/system/memory.c b/system/memory.c | ||
185 | index XXXXXXX..XXXXXXX 100644 | ||
186 | --- a/system/memory.c | ||
187 | +++ b/system/memory.c | ||
188 | @@ -XXX,XX +XXX,XX @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, | ||
189 | |||
190 | int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, | ||
191 | MemoryRegionSection *section, | ||
192 | - ReplayRamPopulate replay_fn, | ||
193 | + ReplayStateChange replay_fn, | ||
194 | void *opaque) | ||
195 | { | ||
196 | RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | ||
197 | @@ -XXX,XX +XXX,XX @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, | ||
198 | return rdmc->replay_populated(rdm, section, replay_fn, opaque); | ||
199 | } | ||
200 | |||
201 | -void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, | ||
202 | - MemoryRegionSection *section, | ||
203 | - ReplayRamDiscard replay_fn, | ||
204 | - void *opaque) | ||
205 | +int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, | ||
206 | + MemoryRegionSection *section, | ||
207 | + ReplayStateChange replay_fn, | ||
208 | + void *opaque) | ||
209 | { | ||
210 | RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | ||
211 | |||
212 | g_assert(rdmc->replay_discarded); | ||
213 | - rdmc->replay_discarded(rdm, section, replay_fn, opaque); | ||
214 | + return rdmc->replay_discarded(rdm, section, replay_fn, opaque); | ||
215 | } | ||
216 | |||
217 | void ram_discard_manager_register_listener(RamDiscardManager *rdm, | ||
218 | -- | ||
219 | 2.43.5 | ||
1 | For each ram_discard_manager helper, add a new argument 'is_private' to | 1 | RamDiscardManager is an interface used by virtio-mem to adjust VFIO |
---|---|---|---|
2 | indicate the request attribute. If is_private is true, the operation | 2 | mappings in relation to VM page assignment. It manages the populated |
3 | targets the private range in the section. For example, | 3 | and discarded states of RAM. To accommodate future scenarios for |
4 | replay_populate(true) will replay the populate operation on the private | 4 | managing RAM states, such as private and shared states in confidential |
5 | part of the MemoryRegionSection, while replay_populate(false) will | 5 | VMs, the existing RamDiscardManager interface needs to be generalized. |
6 | replay it on the shared part. | ||
7 | 6 | ||
8 | This helps to distinguish between the states of private/shared and | 7 | Introduce a parent class, GenericStateManager, to manage a pair of |
9 | discarded/populated. It is essential for guest_memfd_manager, which uses | 8 | opposite states with RamDiscardManager as its child. The changes include |
10 | the RamDiscardManager interface but can't treat private memory as | 9 | - Define a new abstract class GenericStateManager. |
11 | discarded memory, because that does not align with the expectation of | 10 | - Extract six callbacks into GenericStateManagerClass and allow the child |
12 | current RamDiscardManager users (e.g. live migration), who expect that | 11 | classes to inherit them. |
13 | discarded memory is hot-removed and can be skipped when processing guest | 12 | - Modify RamDiscardManager-related helpers to use GenericStateManager |
14 | memory. Treating private memory as discarded won't work in the future if | 13 | ones. |
15 | live migration needs to handle private memory. For example, live | 14 | - Define a generic StateChangeListener to extract fields from the |
16 | migration needs to migrate private memory. | 15 | RamDiscardListener, which allows future listeners to embed it |
16 | and avoid duplication. | ||
17 | - Switch the users of RamDiscardManager (virtio-mem, migration, etc.) | ||
18 | to the GenericStateManager helpers. | ||
17 | 19 | ||
18 | The user of the helper needs to figure out which attribute to | 20 | It can provide a more flexible and reusable framework for RAM state |
19 | manipulate. For the legacy VM case, use is_private=false by default; the | 21 | management, facilitating future enhancements and use cases. |
20 | private attribute is only valid in a guest_memfd-based VM. | ||
21 | |||
22 | Opportunistically rename the guest_memfd_for_each_{discarded, | ||
23 | populated}_section() to guest_memfd_for_each_{private, shared}_section() | ||
24 | to distinguish between private/shared and discarded/populated at the | ||
25 | same time. | ||
26 | 22 | ||
27 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | 23 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> |
28 | --- | 24 | --- |
29 | hw/vfio/common.c | 22 ++++++-- | 25 | Changes in v4: |
30 | hw/virtio/virtio-mem.c | 23 ++++---- | 26 | - Newly added. |
31 | include/exec/memory.h | 23 ++++++-- | 27 | --- |
32 | migration/ram.c | 14 ++--- | 28 | hw/vfio/common.c | 30 ++-- |
33 | system/guest-memfd-manager.c | 106 +++++++++++++++++++++++------------ | 29 | hw/virtio/virtio-mem.c | 95 ++++++------ |
34 | system/memory.c | 13 +++-- | 30 | include/exec/memory.h | 313 ++++++++++++++++++++++------------------ |
35 | system/memory_mapping.c | 4 +- | 31 | migration/ram.c | 16 +- |
36 | 7 files changed, 135 insertions(+), 70 deletions(-) | 32 | system/memory.c | 106 ++++++++------ |
33 | system/memory_mapping.c | 6 +- | ||
34 | 6 files changed, 310 insertions(+), 256 deletions(-) | ||
37 | 35 | ||
38 | diff --git a/hw/vfio/common.c b/hw/vfio/common.c | 36 | diff --git a/hw/vfio/common.c b/hw/vfio/common.c |
39 | index XXXXXXX..XXXXXXX 100644 | 37 | index XXXXXXX..XXXXXXX 100644 |
40 | --- a/hw/vfio/common.c | 38 | --- a/hw/vfio/common.c |
41 | +++ b/hw/vfio/common.c | 39 | +++ b/hw/vfio/common.c |
42 | @@ -XXX,XX +XXX,XX @@ out: | 40 | @@ -XXX,XX +XXX,XX @@ out: |
43 | } | 41 | rcu_read_unlock(); |
44 | 42 | } | |
45 | static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, | 43 | |
46 | - MemoryRegionSection *section) | 44 | -static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, |
47 | + MemoryRegionSection *section, | 45 | +static void vfio_ram_discard_notify_discard(StateChangeListener *scl, |
48 | + bool is_private) | 46 | MemoryRegionSection *section) |
49 | { | 47 | { |
48 | + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); | ||
50 | VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, | 49 | VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, |
51 | listener); | 50 | listener); |
51 | VFIOContainerBase *bcontainer = vrdl->bcontainer; | ||
52 | @@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, | 52 | @@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, |
53 | const hwaddr iova = section->offset_within_address_space; | 53 | } |
54 | int ret; | 54 | } |
55 | 55 | ||
56 | + if (is_private) { | 56 | -static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, |
57 | + /* Not support discard private memory yet. */ | 57 | +static int vfio_ram_discard_notify_populate(StateChangeListener *scl, |
58 | + return; | 58 | MemoryRegionSection *section) |
59 | + } | 59 | { |
60 | + | 60 | + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); |
61 | /* Unmap with a single call. */ | ||
62 | ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL); | ||
63 | if (ret) { | ||
64 | @@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, | ||
65 | } | ||
66 | |||
67 | static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, | ||
68 | - MemoryRegionSection *section) | ||
69 | + MemoryRegionSection *section, | ||
70 | + bool is_private) | ||
71 | { | ||
72 | VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, | 61 | VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, |
73 | listener); | 62 | listener); |
74 | @@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, | 63 | VFIOContainerBase *bcontainer = vrdl->bcontainer; |
75 | void *vaddr; | ||
76 | int ret; | ||
77 | |||
78 | + if (is_private) { | ||
79 | + /* Not support discard private memory yet. */ | ||
80 | + return 0; | ||
81 | + } | ||
82 | + | ||
83 | /* | ||
84 | * Map in (aligned within memory region) minimum granularity, so we can | ||
85 | * unmap in minimum granularity later. | ||
86 | @@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, | 64 | @@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, |
87 | vaddr, section->readonly); | 65 | vaddr, section->readonly); |
88 | if (ret) { | 66 | if (ret) { |
89 | /* Rollback */ | 67 | /* Rollback */ |
90 | - vfio_ram_discard_notify_discard(rdl, section); | 68 | - vfio_ram_discard_notify_discard(rdl, section); |
91 | + vfio_ram_discard_notify_discard(rdl, section, false); | 69 | + vfio_ram_discard_notify_discard(scl, section); |
92 | return ret; | 70 | return ret; |
93 | } | 71 | } |
94 | } | 72 | } |
95 | @@ -XXX,XX +XXX,XX @@ out: | 73 | @@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, |
96 | } | 74 | static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, |
97 | 75 | MemoryRegionSection *section) | |
98 | static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section, | 76 | { |
99 | - void *opaque) | 77 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); |
100 | + bool is_private, void *opaque) | 78 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); |
101 | { | 79 | VFIORamDiscardListener *vrdl; |
102 | const hwaddr size = int128_get64(section->size); | 80 | + RamDiscardListener *rdl; |
103 | const hwaddr iova = section->offset_within_address_space; | 81 | |
82 | /* Ignore some corner cases not relevant in practice. */ | ||
83 | g_assert(QEMU_IS_ALIGNED(section->offset_within_region, TARGET_PAGE_SIZE)); | ||
84 | @@ -XXX,XX +XXX,XX @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
85 | vrdl->mr = section->mr; | ||
86 | vrdl->offset_within_address_space = section->offset_within_address_space; | ||
87 | vrdl->size = int128_get64(section->size); | ||
88 | - vrdl->granularity = ram_discard_manager_get_min_granularity(rdm, | ||
89 | - section->mr); | ||
90 | + vrdl->granularity = generic_state_manager_get_min_granularity(gsm, | ||
91 | + section->mr); | ||
92 | |||
93 | g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity)); | ||
94 | g_assert(bcontainer->pgsizes && | ||
95 | vrdl->granularity >= 1ULL << ctz64(bcontainer->pgsizes)); | ||
96 | |||
97 | - ram_discard_listener_init(&vrdl->listener, | ||
98 | + rdl = &vrdl->listener; | ||
99 | + ram_discard_listener_init(rdl, | ||
100 | vfio_ram_discard_notify_populate, | ||
101 | vfio_ram_discard_notify_discard, true); | ||
102 | - ram_discard_manager_register_listener(rdm, &vrdl->listener, section); | ||
103 | + generic_state_manager_register_listener(gsm, &rdl->scl, section); | ||
104 | QLIST_INSERT_HEAD(&bcontainer->vrdl_list, vrdl, next); | ||
105 | |||
106 | /* | ||
107 | @@ -XXX,XX +XXX,XX @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
108 | static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
109 | MemoryRegionSection *section) | ||
110 | { | ||
111 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); | ||
112 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); | ||
113 | VFIORamDiscardListener *vrdl = NULL; | ||
114 | + RamDiscardListener *rdl; | ||
115 | |||
116 | QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) { | ||
117 | if (vrdl->mr == section->mr && | ||
118 | @@ -XXX,XX +XXX,XX @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
119 | hw_error("vfio: Trying to unregister missing RAM discard listener"); | ||
120 | } | ||
121 | |||
122 | - ram_discard_manager_unregister_listener(rdm, &vrdl->listener); | ||
123 | + rdl = &vrdl->listener; | ||
124 | + generic_state_manager_unregister_listener(gsm, &rdl->scl); | ||
125 | QLIST_REMOVE(vrdl, next); | ||
126 | g_free(vrdl); | ||
127 | } | ||
128 | @@ -XXX,XX +XXX,XX @@ static int | ||
129 | vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer, | ||
130 | MemoryRegionSection *section) | ||
131 | { | ||
132 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); | ||
133 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); | ||
134 | VFIORamDiscardListener *vrdl = NULL; | ||
135 | |||
136 | QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) { | ||
104 | @@ -XXX,XX +XXX,XX @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer, | 137 | @@ -XXX,XX +XXX,XX @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer, |
105 | * We only want/can synchronize the bitmap for actually mapped parts - | 138 | * We only want/can synchronize the bitmap for actually mapped parts - |
106 | * which correspond to populated parts. Replay all populated parts. | 139 | * which correspond to populated parts. Replay all populated parts. |
107 | */ | 140 | */ |
108 | - return ram_discard_manager_replay_populated(rdm, section, | 141 | - return ram_discard_manager_replay_populated(rdm, section, |
109 | + return ram_discard_manager_replay_populated(rdm, section, false, | 142 | + return generic_state_manager_replay_on_state_set(gsm, section, |
110 | vfio_ram_discard_get_dirty_bitmap, | 143 | vfio_ram_discard_get_dirty_bitmap, |
111 | &vrdl); | 144 | &vrdl); |
112 | } | 145 | } |
113 | diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c | 146 | diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c |
114 | index XXXXXXX..XXXXXXX 100644 | 147 | index XXXXXXX..XXXXXXX 100644 |
115 | --- a/hw/virtio/virtio-mem.c | 148 | --- a/hw/virtio/virtio-mem.c |
116 | +++ b/hw/virtio/virtio-mem.c | 149 | +++ b/hw/virtio/virtio-mem.c |
117 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg) | 150 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem, |
118 | { | 151 | |
119 | RamDiscardListener *rdl = arg; | 152 | static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg) |
153 | { | ||
154 | - RamDiscardListener *rdl = arg; | ||
155 | + StateChangeListener *scl = arg; | ||
120 | 156 | ||
121 | - return rdl->notify_populate(rdl, s); | 157 | - return rdl->notify_populate(rdl, s); |
122 | + return rdl->notify_populate(rdl, s, false); | 158 | + return scl->notify_to_state_set(scl, s); |
123 | } | 159 | } |
124 | 160 | ||
125 | static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg) | 161 | static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg) |
126 | { | 162 | { |
127 | RamDiscardListener *rdl = arg; | 163 | - RamDiscardListener *rdl = arg; |
164 | + StateChangeListener *scl = arg; | ||
128 | 165 | ||
129 | - rdl->notify_discard(rdl, s); | 166 | - rdl->notify_discard(rdl, s); |
130 | + rdl->notify_discard(rdl, s, false); | 167 | + scl->notify_to_state_clear(scl, s); |
131 | return 0; | 168 | return 0; |
132 | } | 169 | } |
133 | 170 | ||
134 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset, | 171 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset, |
172 | RamDiscardListener *rdl; | ||
173 | |||
174 | QLIST_FOREACH(rdl, &vmem->rdl_list, next) { | ||
175 | - MemoryRegionSection tmp = *rdl->section; | ||
176 | + StateChangeListener *scl = &rdl->scl; | ||
177 | + MemoryRegionSection tmp = *scl->section; | ||
178 | |||
135 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 179 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
136 | continue; | 180 | continue; |
137 | } | 181 | } |
138 | - rdl->notify_discard(rdl, &tmp); | 182 | - rdl->notify_discard(rdl, &tmp); |
139 | + rdl->notify_discard(rdl, &tmp, false); | 183 | + scl->notify_to_state_clear(scl, &tmp); |
140 | } | 184 | } |
141 | } | 185 | } |
142 | 186 | ||
143 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, | 187 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, |
188 | int ret = 0; | ||
189 | |||
190 | QLIST_FOREACH(rdl, &vmem->rdl_list, next) { | ||
191 | - MemoryRegionSection tmp = *rdl->section; | ||
192 | + StateChangeListener *scl = &rdl->scl; | ||
193 | + MemoryRegionSection tmp = *scl->section; | ||
194 | |||
144 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 195 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
145 | continue; | 196 | continue; |
146 | } | 197 | } |
147 | - ret = rdl->notify_populate(rdl, &tmp); | 198 | - ret = rdl->notify_populate(rdl, &tmp); |
148 | + ret = rdl->notify_populate(rdl, &tmp, false); | 199 | + ret = scl->notify_to_state_set(scl, &tmp); |
149 | if (ret) { | 200 | if (ret) { |
150 | break; | 201 | break; |
151 | } | 202 | } |
203 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, | ||
204 | if (ret) { | ||
205 | /* Notify all already-notified listeners. */ | ||
206 | QLIST_FOREACH(rdl2, &vmem->rdl_list, next) { | ||
207 | - MemoryRegionSection tmp = *rdl2->section; | ||
208 | + StateChangeListener *scl2 = &rdl2->scl; | ||
209 | + MemoryRegionSection tmp = *scl2->section; | ||
210 | |||
211 | if (rdl2 == rdl) { | ||
212 | break; | ||
152 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, | 213 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, |
153 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 214 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
154 | continue; | 215 | continue; |
155 | } | 216 | } |
156 | - rdl2->notify_discard(rdl2, &tmp); | 217 | - rdl2->notify_discard(rdl2, &tmp); |
157 | + rdl2->notify_discard(rdl2, &tmp, false); | 218 | + scl2->notify_to_state_clear(scl2, &tmp); |
158 | } | 219 | } |
159 | } | 220 | } |
160 | return ret; | 221 | return ret; |
161 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem) | 222 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem) |
223 | } | ||
162 | 224 | ||
163 | QLIST_FOREACH(rdl, &vmem->rdl_list, next) { | 225 | QLIST_FOREACH(rdl, &vmem->rdl_list, next) { |
226 | + StateChangeListener *scl = &rdl->scl; | ||
164 | if (rdl->double_discard_supported) { | 227 | if (rdl->double_discard_supported) { |
165 | - rdl->notify_discard(rdl, rdl->section); | 228 | - rdl->notify_discard(rdl, rdl->section); |
166 | + rdl->notify_discard(rdl, rdl->section, false); | 229 | + scl->notify_to_state_clear(scl, scl->section); |
167 | } else { | 230 | } else { |
168 | virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, | 231 | - virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, |
232 | + virtio_mem_for_each_plugged_section(vmem, scl->section, scl, | ||
169 | virtio_mem_notify_discard_cb); | 233 | virtio_mem_notify_discard_cb); |
170 | @@ -XXX,XX +XXX,XX @@ static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm, | 234 | } |
171 | } | 235 | } |
172 | 236 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) | |
173 | static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm, | 237 | * Set ourselves as RamDiscardManager before the plug handler maps the |
174 | - const MemoryRegionSection *s) | 238 | * memory region and exposes it via an address space. |
175 | + const MemoryRegionSection *s, | 239 | */ |
176 | + bool is_private) | 240 | - if (memory_region_set_ram_discard_manager(&vmem->memdev->mr, |
177 | { | 241 | - RAM_DISCARD_MANAGER(vmem))) { |
178 | const VirtIOMEM *vmem = VIRTIO_MEM(rdm); | 242 | + if (memory_region_set_generic_state_manager(&vmem->memdev->mr, |
243 | + GENERIC_STATE_MANAGER(vmem))) { | ||
244 | error_setg(errp, "Failed to set RamDiscardManager"); | ||
245 | ram_block_coordinated_discard_require(false); | ||
246 | return; | ||
247 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_device_unrealize(DeviceState *dev) | ||
248 | * The unplug handler unmapped the memory region, it cannot be | ||
249 | * found via an address space anymore. Unset ourselves. | ||
250 | */ | ||
251 | - memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL); | ||
252 | + memory_region_set_generic_state_manager(&vmem->memdev->mr, NULL); | ||
253 | ram_block_coordinated_discard_require(false); | ||
254 | } | ||
255 | |||
256 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_post_load_bitmap(VirtIOMEM *vmem) | ||
257 | * into an address space. Replay, now that we updated the bitmap. | ||
258 | */ | ||
259 | QLIST_FOREACH(rdl, &vmem->rdl_list, next) { | ||
260 | - ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, | ||
261 | + StateChangeListener *scl = &rdl->scl; | ||
262 | + ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl, | ||
263 | virtio_mem_notify_populate_cb); | ||
264 | if (ret) { | ||
265 | return ret; | ||
266 | @@ -XXX,XX +XXX,XX @@ static const Property virtio_mem_properties[] = { | ||
267 | dynamic_memslots, false), | ||
268 | }; | ||
269 | |||
270 | -static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm, | ||
271 | +static uint64_t virtio_mem_rdm_get_min_granularity(const GenericStateManager *gsm, | ||
272 | const MemoryRegion *mr) | ||
273 | { | ||
274 | - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); | ||
275 | + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); | ||
276 | |||
277 | g_assert(mr == &vmem->memdev->mr); | ||
278 | return vmem->block_size; | ||
279 | } | ||
280 | |||
281 | -static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm, | ||
282 | +static bool virtio_mem_rdm_is_populated(const GenericStateManager *gsm, | ||
283 | const MemoryRegionSection *s) | ||
284 | { | ||
285 | - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); | ||
286 | + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); | ||
179 | uint64_t start_gpa = vmem->addr + s->offset_within_region; | 287 | uint64_t start_gpa = vmem->addr + s->offset_within_region; |
288 | uint64_t end_gpa = start_gpa + int128_get64(s->size); | ||
289 | |||
180 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg) | 290 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg) |
181 | { | 291 | return data->fn(s, data->opaque); |
182 | struct VirtIOMEMReplayData *data = arg; | 292 | } |
183 | 293 | ||
184 | - return ((ReplayRamPopulate)data->fn)(s, data->opaque); | 294 | -static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm, |
185 | + return ((ReplayRamPopulate)data->fn)(s, false, data->opaque); | 295 | +static int virtio_mem_rdm_replay_populated(const GenericStateManager *gsm, |
186 | } | ||
187 | |||
188 | static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm, | ||
189 | MemoryRegionSection *s, | 296 | MemoryRegionSection *s, |
190 | + bool is_private, | 297 | ReplayStateChange replay_fn, |
191 | ReplayRamPopulate replay_fn, | ||
192 | void *opaque) | 298 | void *opaque) |
193 | { | 299 | { |
300 | - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); | ||
301 | + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); | ||
302 | struct VirtIOMEMReplayData data = { | ||
303 | .fn = replay_fn, | ||
304 | .opaque = opaque, | ||
194 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s, | 305 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s, |
195 | { | ||
196 | struct VirtIOMEMReplayData *data = arg; | ||
197 | |||
198 | - ((ReplayRamDiscard)data->fn)(s, data->opaque); | ||
199 | + ((ReplayRamDiscard)data->fn)(s, false, data->opaque); | ||
200 | return 0; | 306 | return 0; |
201 | } | 307 | } |
202 | 308 | ||
203 | static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm, | 309 | -static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm, |
204 | MemoryRegionSection *s, | 310 | +static int virtio_mem_rdm_replay_discarded(const GenericStateManager *gsm, |
205 | + bool is_private, | 311 | MemoryRegionSection *s, |
206 | ReplayRamDiscard replay_fn, | 312 | ReplayStateChange replay_fn, |
207 | void *opaque) | 313 | void *opaque) |
208 | { | 314 | { |
209 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm, | 315 | - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); |
210 | g_assert(rdl->section->mr == &vmem->memdev->mr); | 316 | + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); |
317 | struct VirtIOMEMReplayData data = { | ||
318 | .fn = replay_fn, | ||
319 | .opaque = opaque, | ||
320 | @@ -XXX,XX +XXX,XX @@ static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm, | ||
321 | virtio_mem_rdm_replay_discarded_cb); | ||
322 | } | ||
323 | |||
324 | -static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm, | ||
325 | - RamDiscardListener *rdl, | ||
326 | +static void virtio_mem_rdm_register_listener(GenericStateManager *gsm, | ||
327 | + StateChangeListener *scl, | ||
328 | MemoryRegionSection *s) | ||
329 | { | ||
330 | - VirtIOMEM *vmem = VIRTIO_MEM(rdm); | ||
331 | + VirtIOMEM *vmem = VIRTIO_MEM(gsm); | ||
332 | + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); | ||
333 | int ret; | ||
334 | |||
335 | g_assert(s->mr == &vmem->memdev->mr); | ||
336 | - rdl->section = memory_region_section_new_copy(s); | ||
337 | + scl->section = memory_region_section_new_copy(s); | ||
338 | |||
339 | QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next); | ||
340 | - ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, | ||
341 | + ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl, | ||
342 | virtio_mem_notify_populate_cb); | ||
343 | if (ret) { | ||
344 | error_report("%s: Replaying plugged ranges failed: %s", __func__, | ||
345 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm, | ||
346 | } | ||
347 | } | ||
348 | |||
349 | -static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm, | ||
350 | - RamDiscardListener *rdl) | ||
351 | +static void virtio_mem_rdm_unregister_listener(GenericStateManager *gsm, | ||
352 | + StateChangeListener *scl) | ||
353 | { | ||
354 | - VirtIOMEM *vmem = VIRTIO_MEM(rdm); | ||
355 | + VirtIOMEM *vmem = VIRTIO_MEM(gsm); | ||
356 | + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); | ||
357 | |||
358 | - g_assert(rdl->section->mr == &vmem->memdev->mr); | ||
359 | + g_assert(scl->section->mr == &vmem->memdev->mr); | ||
211 | if (vmem->size) { | 360 | if (vmem->size) { |
212 | if (rdl->double_discard_supported) { | 361 | if (rdl->double_discard_supported) { |
213 | - rdl->notify_discard(rdl, rdl->section); | 362 | - rdl->notify_discard(rdl, rdl->section); |
214 | + rdl->notify_discard(rdl, rdl->section, false); | 363 | + scl->notify_to_state_clear(scl, scl->section); |
215 | } else { | 364 | } else { |
216 | virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, | 365 | - virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, |
366 | + virtio_mem_for_each_plugged_section(vmem, scl->section, scl, | ||
217 | virtio_mem_notify_discard_cb); | 367 | virtio_mem_notify_discard_cb); |
368 | } | ||
369 | } | ||
370 | |||
371 | - memory_region_section_free_copy(rdl->section); | ||
372 | - rdl->section = NULL; | ||
373 | + memory_region_section_free_copy(scl->section); | ||
374 | + scl->section = NULL; | ||
375 | QLIST_REMOVE(rdl, next); | ||
376 | } | ||
377 | |||
378 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_class_init(ObjectClass *klass, void *data) | ||
379 | DeviceClass *dc = DEVICE_CLASS(klass); | ||
380 | VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass); | ||
381 | VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass); | ||
382 | - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass); | ||
383 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(klass); | ||
384 | |||
385 | device_class_set_props(dc, virtio_mem_properties); | ||
386 | dc->vmsd = &vmstate_virtio_mem; | ||
387 | @@ -XXX,XX +XXX,XX @@ static void virtio_mem_class_init(ObjectClass *klass, void *data) | ||
388 | vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier; | ||
389 | vmc->unplug_request_check = virtio_mem_unplug_request_check; | ||
390 | |||
391 | - rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity; | ||
392 | - rdmc->is_populated = virtio_mem_rdm_is_populated; | ||
393 | - rdmc->replay_populated = virtio_mem_rdm_replay_populated; | ||
394 | - rdmc->replay_discarded = virtio_mem_rdm_replay_discarded; | ||
395 | - rdmc->register_listener = virtio_mem_rdm_register_listener; | ||
396 | - rdmc->unregister_listener = virtio_mem_rdm_unregister_listener; | ||
397 | + gsmc->get_min_granularity = virtio_mem_rdm_get_min_granularity; | ||
398 | + gsmc->is_state_set = virtio_mem_rdm_is_populated; | ||
399 | + gsmc->replay_on_state_set = virtio_mem_rdm_replay_populated; | ||
400 | + gsmc->replay_on_state_clear = virtio_mem_rdm_replay_discarded; | ||
401 | + gsmc->register_listener = virtio_mem_rdm_register_listener; | ||
402 | + gsmc->unregister_listener = virtio_mem_rdm_unregister_listener; | ||
403 | } | ||
404 | |||
405 | static const TypeInfo virtio_mem_info = { | ||
218 | diff --git a/include/exec/memory.h b/include/exec/memory.h | 406 | diff --git a/include/exec/memory.h b/include/exec/memory.h |
219 | index XXXXXXX..XXXXXXX 100644 | 407 | index XXXXXXX..XXXXXXX 100644 |
220 | --- a/include/exec/memory.h | 408 | --- a/include/exec/memory.h |
221 | +++ b/include/exec/memory.h | 409 | +++ b/include/exec/memory.h |
410 | @@ -XXX,XX +XXX,XX @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass; | ||
411 | DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass, | ||
412 | IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION) | ||
413 | |||
414 | +#define TYPE_GENERIC_STATE_MANAGER "generic-state-manager" | ||
415 | +typedef struct GenericStateManagerClass GenericStateManagerClass; | ||
416 | +typedef struct GenericStateManager GenericStateManager; | ||
417 | +DECLARE_OBJ_CHECKERS(GenericStateManager, GenericStateManagerClass, | ||
418 | + GENERIC_STATE_MANAGER, TYPE_GENERIC_STATE_MANAGER) | ||
419 | + | ||
420 | #define TYPE_RAM_DISCARD_MANAGER "ram-discard-manager" | ||
421 | typedef struct RamDiscardManagerClass RamDiscardManagerClass; | ||
422 | typedef struct RamDiscardManager RamDiscardManager; | ||
222 | @@ -XXX,XX +XXX,XX @@ struct IOMMUMemoryRegionClass { | 423 | @@ -XXX,XX +XXX,XX @@ struct IOMMUMemoryRegionClass { |
223 | 424 | int (*num_indexes)(IOMMUMemoryRegion *iommu); | |
224 | typedef struct RamDiscardListener RamDiscardListener; | 425 | }; |
225 | typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl, | 426 | |
427 | -typedef struct RamDiscardListener RamDiscardListener; | ||
428 | -typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl, | ||
226 | - MemoryRegionSection *section); | 429 | - MemoryRegionSection *section); |
430 | -typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl, | ||
431 | +typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque); | ||
432 | + | ||
433 | +typedef struct StateChangeListener StateChangeListener; | ||
434 | +typedef int (*NotifyStateSet)(StateChangeListener *scl, | ||
435 | + MemoryRegionSection *section); | ||
436 | +typedef void (*NotifyStateClear)(StateChangeListener *scl, | ||
437 | MemoryRegionSection *section); | ||
438 | |||
439 | -struct RamDiscardListener { | ||
440 | +struct StateChangeListener { | ||
441 | /* | ||
442 | - * @notify_populate: | ||
443 | + * @notify_to_state_set: | ||
444 | * | ||
445 | - * Notification that previously discarded memory is about to get populated. | ||
446 | - * Listeners are able to object. If any listener objects, already | ||
447 | - * successfully notified listeners are notified about a discard again. | ||
448 | + * Notification that a previously state-cleared part is about to be set. | ||
449 | * | ||
450 | - * @rdl: the #RamDiscardListener getting notified | ||
451 | - * @section: the #MemoryRegionSection to get populated. The section | ||
452 | + * @scl: the #StateChangeListener getting notified | ||
453 | + * @section: the #MemoryRegionSection to be state-set. The section | ||
454 | * is aligned within the memory region to the minimum granularity | ||
455 | * unless it would exceed the registered section. | ||
456 | * | ||
457 | * Returns 0 on success. If the notification is rejected by the listener, | ||
458 | * an error is returned. | ||
459 | */ | ||
460 | - NotifyRamPopulate notify_populate; | ||
461 | + NotifyStateSet notify_to_state_set; | ||
462 | |||
463 | /* | ||
464 | - * @notify_discard: | ||
465 | + * @notify_to_state_clear: | ||
466 | * | ||
467 | - * Notification that previously populated memory was discarded successfully | ||
468 | - * and listeners should drop all references to such memory and prevent | ||
469 | - * new population (e.g., unmap). | ||
470 | + * Notification that a previously state-set part is about to be cleared. | ||
471 | * | ||
472 | - * @rdl: the #RamDiscardListener getting notified | ||
473 | - * @section: the #MemoryRegionSection to get populated. The section | ||
474 | + * @scl: the #StateChangeListener getting notified | ||
475 | + * @section: the #MemoryRegionSection to be state-cleared. The section | ||
476 | * is aligned within the memory region to the minimum granularity | ||
477 | * unless it would exceed the registered section. | ||
478 | - */ | ||
479 | - NotifyRamDiscard notify_discard; | ||
480 | - | ||
481 | - /* | ||
482 | - * @double_discard_supported: | ||
483 | * | ||
484 | - * The listener suppors getting @notify_discard notifications that span | ||
485 | - * already discarded parts. | ||
486 | + * Returns 0 on success. If the notification is rejected by the listener, | ||
487 | + * an error is returned. | ||
488 | */ | ||
489 | - bool double_discard_supported; | ||
490 | + NotifyStateClear notify_to_state_clear; | ||
491 | |||
492 | MemoryRegionSection *section; | ||
493 | - QLIST_ENTRY(RamDiscardListener) next; | ||
494 | }; | ||
495 | |||
496 | -static inline void ram_discard_listener_init(RamDiscardListener *rdl, | ||
497 | - NotifyRamPopulate populate_fn, | ||
498 | - NotifyRamDiscard discard_fn, | ||
499 | - bool double_discard_supported) | ||
500 | -{ | ||
501 | - rdl->notify_populate = populate_fn; | ||
502 | - rdl->notify_discard = discard_fn; | ||
503 | - rdl->double_discard_supported = double_discard_supported; | ||
504 | -} | ||
505 | - | ||
506 | -typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque); | ||
507 | - | ||
508 | /* | ||
509 | - * RamDiscardManagerClass: | ||
510 | - * | ||
511 | - * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion | ||
512 | - * regions are currently populated to be used/accessed by the VM, notifying | ||
513 | - * after parts were discarded (freeing up memory) and before parts will be | ||
514 | - * populated (consuming memory), to be used/accessed by the VM. | ||
515 | - * | ||
516 | - * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the | ||
517 | - * #MemoryRegion isn't mapped into an address space yet (either directly | ||
518 | - * or via an alias); it cannot change while the #MemoryRegion is | ||
519 | - * mapped into an address space. | ||
520 | + * GenericStateManagerClass: | ||
521 | * | ||
522 | - * The #RamDiscardManager is intended to be used by technologies that are | ||
523 | - * incompatible with discarding of RAM (e.g., VFIO, which may pin all | ||
524 | - * memory inside a #MemoryRegion), and require proper coordination to only | ||
525 | - * map the currently populated parts, to hinder parts that are expected to | ||
526 | - * remain discarded from silently getting populated and consuming memory. | ||
527 | - * Technologies that support discarding of RAM don't have to bother and can | ||
528 | - * simply map the whole #MemoryRegion. | ||
529 | - * | ||
530 | - * An example #RamDiscardManager is virtio-mem, which logically (un)plugs | ||
531 | - * memory within an assigned RAM #MemoryRegion, coordinated with the VM. | ||
532 | - * Logically unplugging memory consists of discarding RAM. The VM agreed to not | ||
533 | - * access unplugged (discarded) memory - especially via DMA. virtio-mem will | ||
534 | - * properly coordinate with listeners before memory is plugged (populated), | ||
535 | - * and after memory is unplugged (discarded). | ||
536 | + * A #GenericStateManager is a common interface used to manage the state of | ||
537 | + * a #MemoryRegion. The managed state is a pair of opposite states, such as | ||
538 | + * populated and discarded, or private and shared. It is abstracted as set and | ||
539 | + * clear in the callbacks below, and the actual state is managed by the | ||
540 | + * implementation. | ||
541 | * | ||
542 | - * Listeners are called in multiples of the minimum granularity (unless it | ||
543 | - * would exceed the registered range) and changes are aligned to the minimum | ||
544 | - * granularity within the #MemoryRegion. Listeners have to prepare for memory | ||
545 | - * becoming discarded in a different granularity than it was populated and the | ||
546 | - * other way around. | ||
547 | */ | ||
548 | -struct RamDiscardManagerClass { | ||
549 | +struct GenericStateManagerClass { | ||
550 | /* private */ | ||
551 | InterfaceClass parent_class; | ||
552 | |||
553 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | ||
554 | * @get_min_granularity: | ||
555 | * | ||
556 | * Get the minimum granularity in which listeners will get notified | ||
557 | - * about changes within the #MemoryRegion via the #RamDiscardManager. | ||
558 | + * about changes within the #MemoryRegion via the #GenericStateManager. | ||
559 | * | ||
560 | - * @rdm: the #RamDiscardManager | ||
561 | + * @gsm: the #GenericStateManager | ||
562 | * @mr: the #MemoryRegion | ||
563 | * | ||
564 | * Returns the minimum granularity. | ||
565 | */ | ||
566 | - uint64_t (*get_min_granularity)(const RamDiscardManager *rdm, | ||
567 | + uint64_t (*get_min_granularity)(const GenericStateManager *gsm, | ||
568 | const MemoryRegion *mr); | ||
569 | |||
570 | /** | ||
571 | - * @is_populated: | ||
572 | + * @is_state_set: | ||
573 | * | ||
574 | - * Check whether the given #MemoryRegionSection is completely populated | ||
575 | - * (i.e., no parts are currently discarded) via the #RamDiscardManager. | ||
576 | - * There are no alignment requirements. | ||
577 | + * Check whether the given #MemoryRegionSection state is set | ||
578 | + * via the #GenericStateManager. | ||
579 | * | ||
580 | - * @rdm: the #RamDiscardManager | ||
581 | + * @gsm: the #GenericStateManager | ||
582 | * @section: the #MemoryRegionSection | ||
583 | * | ||
584 | - * Returns whether the given range is completely populated. | ||
585 | + * Returns whether the given range is completely set. | ||
586 | */ | ||
587 | - bool (*is_populated)(const RamDiscardManager *rdm, | ||
588 | + bool (*is_state_set)(const GenericStateManager *gsm, | ||
589 | const MemoryRegionSection *section); | ||
590 | |||
591 | /** | ||
592 | - * @replay_populated: | ||
593 | + * @replay_on_state_set: | ||
594 | * | ||
595 | - * Call the #ReplayStateChange callback for all populated parts within the | ||
596 | - * #MemoryRegionSection via the #RamDiscardManager. | ||
597 | + * Call the #ReplayStateChange callback for all state-set parts within the | ||
598 | + * #MemoryRegionSection via the #GenericStateManager. | ||
599 | * | ||
600 | * In case any call fails, no further calls are made. | ||
601 | * | ||
602 | - * @rdm: the #RamDiscardManager | ||
603 | + * @gsm: the #GenericStateManager | ||
604 | * @section: the #MemoryRegionSection | ||
605 | * @replay_fn: the #ReplayStateChange callback | ||
606 | * @opaque: pointer to forward to the callback | ||
607 | * | ||
608 | * Returns 0 on success, or a negative error if any notification failed. | ||
609 | */ | ||
610 | - int (*replay_populated)(const RamDiscardManager *rdm, | ||
611 | - MemoryRegionSection *section, | ||
612 | - ReplayStateChange replay_fn, void *opaque); | ||
613 | + int (*replay_on_state_set)(const GenericStateManager *gsm, | ||
614 | + MemoryRegionSection *section, | ||
615 | + ReplayStateChange replay_fn, void *opaque); | ||
616 | |||
617 | /** | ||
618 | - * @replay_discarded: | ||
619 | + * @replay_on_state_clear: | ||
620 | * | ||
621 | - * Call the #ReplayStateChange callback for all discarded parts within the | ||
622 | - * #MemoryRegionSection via the #RamDiscardManager. | ||
623 | + * Call the #ReplayStateChange callback for all state-cleared parts within the | ||
624 | + * #MemoryRegionSection via the #GenericStateManager. | ||
625 | + * | ||
626 | + * In case any call fails, no further calls are made. | ||
627 | * | ||
628 | - * @rdm: the #RamDiscardManager | ||
629 | + * @gsm: the #GenericStateManager | ||
630 | * @section: the #MemoryRegionSection | ||
631 | * @replay_fn: the #ReplayStateChange callback | ||
632 | * @opaque: pointer to forward to the callback | ||
633 | * | ||
634 | * Returns 0 on success, or a negative error if any notification failed. | ||
635 | */ | ||
636 | - int (*replay_discarded)(const RamDiscardManager *rdm, | ||
637 | - MemoryRegionSection *section, | ||
638 | - ReplayStateChange replay_fn, void *opaque); | ||
639 | + int (*replay_on_state_clear)(const GenericStateManager *gsm, | ||
227 | + MemoryRegionSection *section, | 640 | + MemoryRegionSection *section, |
228 | + bool is_private); | 641 | + ReplayStateChange replay_fn, void *opaque); |
229 | typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl, | 642 | |
230 | - MemoryRegionSection *section); | 643 | /** |
231 | + MemoryRegionSection *section, | 644 | * @register_listener: |
232 | + bool is_private); | 645 | * |
233 | 646 | - * Register a #RamDiscardListener for the given #MemoryRegionSection and | |
234 | struct RamDiscardListener { | 647 | - * immediately notify the #RamDiscardListener about all populated parts |
235 | /* | 648 | - * within the #MemoryRegionSection via the #RamDiscardManager. |
236 | @@ -XXX,XX +XXX,XX @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl, | 649 | + * Register a #StateChangeListener for the given #MemoryRegionSection and |
237 | rdl->double_discard_supported = double_discard_supported; | 650 | + * immediately notify the #StateChangeListener about all state-set parts |
238 | } | 651 | + * within the #MemoryRegionSection via the #GenericStateManager. |
239 | 652 | * | |
240 | -typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque); | 653 | * In case any notification fails, no further notifications are triggered |
241 | -typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque); | 654 | * and an error is logged. |
242 | +typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, bool is_private, void *opaque); | 655 | * |
243 | +typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, bool is_private, void *opaque); | 656 | - * @rdm: the #RamDiscardManager |
244 | 657 | - * @rdl: the #RamDiscardListener | |
245 | /* | 658 | + * @gsm: the #GenericStateManager |
246 | * RamDiscardManagerClass: | 659 | + * @scl: the #StateChangeListener |
247 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | ||
248 | * | ||
249 | * @rdm: the #RamDiscardManager | ||
250 | * @section: the #MemoryRegionSection | 660 | * @section: the #MemoryRegionSection |
251 | + * @is_private: the attribute of the request section | 661 | */ |
252 | * | 662 | - void (*register_listener)(RamDiscardManager *rdm, |
253 | * Returns whether the given range is completely populated. | 663 | - RamDiscardListener *rdl, |
254 | */ | 664 | + void (*register_listener)(GenericStateManager *gsm, |
255 | bool (*is_populated)(const RamDiscardManager *rdm, | 665 | + StateChangeListener *scl, |
256 | - const MemoryRegionSection *section); | 666 | MemoryRegionSection *section); |
257 | + const MemoryRegionSection *section, | ||
258 | + bool is_private); | ||
259 | 667 | ||
260 | /** | 668 | /** |
261 | * @replay_populated: | 669 | * @unregister_listener: |
262 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | 670 | * |
263 | * | 671 | - * Unregister a previously registered #RamDiscardListener via the |
264 | * @rdm: the #RamDiscardManager | 672 | - * #RamDiscardManager after notifying the #RamDiscardListener about all |
265 | * @section: the #MemoryRegionSection | 673 | - * populated parts becoming unpopulated within the registered |
266 | + * @is_private: the attribute of the populated parts | 674 | + * Unregister a previously registered #StateChangeListener via the |
267 | * @replay_fn: the #ReplayRamPopulate callback | 675 | + * #GenericStateManager after notifying the #StateChangeListener about all |
268 | * @opaque: pointer to forward to the callback | 676 | + * state-set parts becoming state-cleared within the registered |
269 | * | 677 | * #MemoryRegionSection. |
270 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | 678 | * |
271 | */ | 679 | - * @rdm: the #RamDiscardManager |
272 | int (*replay_populated)(const RamDiscardManager *rdm, | 680 | - * @rdl: the #RamDiscardListener |
273 | MemoryRegionSection *section, | 681 | + * @gsm: the #GenericStateManager |
274 | + bool is_private, | 682 | + * @scl: the #StateChangeListener |
275 | ReplayRamPopulate replay_fn, void *opaque); | 683 | */ |
276 | 684 | - void (*unregister_listener)(RamDiscardManager *rdm, | |
277 | /** | 685 | - RamDiscardListener *rdl); |
278 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | 686 | + void (*unregister_listener)(GenericStateManager *gsm, |
279 | * | 687 | + StateChangeListener *scl); |
280 | * @rdm: the #RamDiscardManager | 688 | }; |
281 | * @section: the #MemoryRegionSection | 689 | |
282 | + * @is_private: the attribute of the discarded parts | 690 | -uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, |
283 | * @replay_fn: the #ReplayRamDiscard callback | 691 | - const MemoryRegion *mr); |
284 | * @opaque: pointer to forward to the callback | 692 | +uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm, |
285 | */ | 693 | + const MemoryRegion *mr); |
286 | void (*replay_discarded)(const RamDiscardManager *rdm, | 694 | |
287 | MemoryRegionSection *section, | 695 | -bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, |
288 | + bool is_private, | ||
289 | ReplayRamDiscard replay_fn, void *opaque); | ||
290 | |||
291 | /** | ||
292 | @@ -XXX,XX +XXX,XX @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, | ||
293 | const MemoryRegion *mr); | ||
294 | |||
295 | bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, | ||
296 | - const MemoryRegionSection *section); | 696 | - const MemoryRegionSection *section); |
297 | + const MemoryRegionSection *section, | 697 | +bool generic_state_manager_is_state_set(const GenericStateManager *gsm, |
298 | + bool is_private); | 698 | + const MemoryRegionSection *section); |
299 | 699 | ||
300 | int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, | 700 | -int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, |
301 | MemoryRegionSection *section, | 701 | - MemoryRegionSection *section, |
302 | + bool is_private, | 702 | - ReplayStateChange replay_fn, |
303 | ReplayRamPopulate replay_fn, | 703 | - void *opaque); |
304 | void *opaque); | 704 | +int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm, |
305 | 705 | + MemoryRegionSection *section, | |
306 | void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, | 706 | + ReplayStateChange replay_fn, |
307 | MemoryRegionSection *section, | 707 | + void *opaque); |
308 | + bool is_private, | 708 | |
309 | ReplayRamDiscard replay_fn, | 709 | -int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, |
310 | void *opaque); | 710 | - MemoryRegionSection *section, |
311 | 711 | - ReplayStateChange replay_fn, | |
712 | - void *opaque); | ||
713 | +int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm, | ||
714 | + MemoryRegionSection *section, | ||
715 | + ReplayStateChange replay_fn, | ||
716 | + void *opaque); | ||
717 | |||
718 | -void ram_discard_manager_register_listener(RamDiscardManager *rdm, | ||
719 | - RamDiscardListener *rdl, | ||
720 | - MemoryRegionSection *section); | ||
721 | +void generic_state_manager_register_listener(GenericStateManager *gsm, | ||
722 | + StateChangeListener *scl, | ||
723 | + MemoryRegionSection *section); | ||
724 | |||
725 | -void ram_discard_manager_unregister_listener(RamDiscardManager *rdm, | ||
726 | - RamDiscardListener *rdl); | ||
727 | +void generic_state_manager_unregister_listener(GenericStateManager *gsm, | ||
728 | + StateChangeListener *scl); | ||
729 | + | ||
730 | +typedef struct RamDiscardListener RamDiscardListener; | ||
731 | + | ||
732 | +struct RamDiscardListener { | ||
733 | + struct StateChangeListener scl; | ||
734 | + | ||
735 | + /* | ||
736 | + * @double_discard_supported: | ||
737 | + * | ||
738 | + * The listener supports getting @notify_discard notifications that span | ||
739 | + * already discarded parts. | ||
740 | + */ | ||
741 | + bool double_discard_supported; | ||
742 | + | ||
743 | + QLIST_ENTRY(RamDiscardListener) next; | ||
744 | +}; | ||
745 | + | ||
746 | +static inline void ram_discard_listener_init(RamDiscardListener *rdl, | ||
747 | + NotifyStateSet populate_fn, | ||
748 | + NotifyStateClear discard_fn, | ||
749 | + bool double_discard_supported) | ||
750 | +{ | ||
751 | + rdl->scl.notify_to_state_set = populate_fn; | ||
752 | + rdl->scl.notify_to_state_clear = discard_fn; | ||
753 | + rdl->double_discard_supported = double_discard_supported; | ||
754 | +} | ||
755 | + | ||
756 | +/* | ||
757 | + * RamDiscardManagerClass: | ||
758 | + * | ||
759 | + * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion | ||
760 | + * regions are currently populated to be used/accessed by the VM, notifying | ||
761 | + * after parts were discarded (freeing up memory) and before parts will be | ||
762 | + * populated (consuming memory), to be used/accessed by the VM. | ||
763 | + * | ||
764 | + * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the | ||
765 | + * #MemoryRegion isn't mapped into an address space yet (either directly | ||
766 | + * or via an alias); it cannot change while the #MemoryRegion is | ||
767 | + * mapped into an address space. | ||
768 | + * | ||
769 | + * The #RamDiscardManager is intended to be used by technologies that are | ||
770 | + * incompatible with discarding of RAM (e.g., VFIO, which may pin all | ||
771 | + * memory inside a #MemoryRegion), and require proper coordination to only | ||
772 | + * map the currently populated parts, to hinder parts that are expected to | ||
773 | + * remain discarded from silently getting populated and consuming memory. | ||
774 | + * Technologies that support discarding of RAM don't have to bother and can | ||
775 | + * simply map the whole #MemoryRegion. | ||
776 | + * | ||
777 | + * An example #RamDiscardManager is virtio-mem, which logically (un)plugs | ||
778 | + * memory within an assigned RAM #MemoryRegion, coordinated with the VM. | ||
779 | + * Logically unplugging memory consists of discarding RAM. The VM agreed to not | ||
780 | + * access unplugged (discarded) memory - especially via DMA. virtio-mem will | ||
781 | + * properly coordinate with listeners before memory is plugged (populated), | ||
782 | + * and after memory is unplugged (discarded). | ||
783 | + * | ||
784 | + * Listeners are called in multiples of the minimum granularity (unless it | ||
785 | + * would exceed the registered range) and changes are aligned to the minimum | ||
786 | + * granularity within the #MemoryRegion. Listeners have to prepare for memory | ||
787 | + * becoming discarded in a different granularity than it was populated and the | ||
788 | + * other way around. | ||
789 | + */ | ||
790 | +struct RamDiscardManagerClass { | ||
791 | + /* private */ | ||
792 | + GenericStateManagerClass parent_class; | ||
793 | +}; | ||
794 | |||
795 | /** | ||
796 | * memory_get_xlat_addr: Extract addresses from a TLB entry | ||
797 | @@ -XXX,XX +XXX,XX @@ struct MemoryRegion { | ||
798 | const char *name; | ||
799 | unsigned ioeventfd_nb; | ||
800 | MemoryRegionIoeventfd *ioeventfds; | ||
801 | - RamDiscardManager *rdm; /* Only for RAM */ | ||
802 | + GenericStateManager *gsm; /* Only for RAM */ | ||
803 | |||
804 | /* For devices designed to perform re-entrant IO into their own IO MRs */ | ||
805 | bool disable_reentrancy_guard; | ||
806 | @@ -XXX,XX +XXX,XX @@ bool memory_region_present(MemoryRegion *container, hwaddr addr); | ||
807 | bool memory_region_is_mapped(MemoryRegion *mr); | ||
808 | |||
809 | /** | ||
810 | - * memory_region_get_ram_discard_manager: get the #RamDiscardManager for a | ||
811 | + * memory_region_get_generic_state_manager: get the #GenericStateManager for a | ||
812 | * #MemoryRegion | ||
813 | * | ||
814 | - * The #RamDiscardManager cannot change while a memory region is mapped. | ||
815 | + * The #GenericStateManager cannot change while a memory region is mapped. | ||
816 | * | ||
817 | * @mr: the #MemoryRegion | ||
818 | */ | ||
819 | -RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr); | ||
820 | +GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr); | ||
821 | |||
822 | /** | ||
823 | - * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a | ||
824 | - * #RamDiscardManager assigned | ||
825 | + * memory_region_set_generic_state_manager: set the #GenericStateManager for a | ||
826 | + * #MemoryRegion | ||
827 | + * | ||
828 | + * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion | ||
829 | + * that does not cover RAM, or a #MemoryRegion that already has a | ||
830 | + * #GenericStateManager assigned. Returns 0 if the gsm is set successfully. | ||
831 | * | ||
832 | * @mr: the #MemoryRegion | ||
833 | + * @gsm: #GenericStateManager to set | ||
834 | */ | ||
835 | -static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr) | ||
836 | -{ | ||
837 | - return !!memory_region_get_ram_discard_manager(mr); | ||
838 | -} | ||
839 | +int memory_region_set_generic_state_manager(MemoryRegion *mr, | ||
840 | + GenericStateManager *gsm); | ||
841 | |||
842 | /** | ||
843 | - * memory_region_set_ram_discard_manager: set the #RamDiscardManager for a | ||
844 | - * #MemoryRegion | ||
845 | - * | ||
846 | - * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion | ||
847 | - * that does not cover RAM, or a #MemoryRegion that already has a | ||
848 | - * #RamDiscardManager assigned. Return 0 if the rdm is set successfully. | ||
849 | + * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a | ||
850 | + * #RamDiscardManager assigned | ||
851 | * | ||
852 | * @mr: the #MemoryRegion | ||
853 | - * @rdm: #RamDiscardManager to set | ||
854 | */ | ||
855 | -int memory_region_set_ram_discard_manager(MemoryRegion *mr, | ||
856 | - RamDiscardManager *rdm); | ||
857 | +bool memory_region_has_ram_discard_manager(MemoryRegion *mr); | ||
858 | |||
859 | /** | ||
860 | * memory_region_find: translate an address/size relative to a | ||
312 | diff --git a/migration/ram.c b/migration/ram.c | 861 | diff --git a/migration/ram.c b/migration/ram.c |
313 | index XXXXXXX..XXXXXXX 100644 | 862 | index XXXXXXX..XXXXXXX 100644 |
314 | --- a/migration/ram.c | 863 | --- a/migration/ram.c |
315 | +++ b/migration/ram.c | 864 | +++ b/migration/ram.c |
316 | @@ -XXX,XX +XXX,XX @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs, | ||
317 | } | ||
318 | |||
319 | static void dirty_bitmap_clear_section(MemoryRegionSection *section, | ||
320 | - void *opaque) | ||
321 | + bool is_private, void *opaque) | ||
322 | { | ||
323 | const hwaddr offset = section->offset_within_region; | ||
324 | const hwaddr size = int128_get64(section->size); | ||
325 | @@ -XXX,XX +XXX,XX @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) | 865 | @@ -XXX,XX +XXX,XX @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) |
866 | uint64_t cleared_bits = 0; | ||
867 | |||
868 | if (rb->mr && rb->bmap && memory_region_has_ram_discard_manager(rb->mr)) { | ||
869 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); | ||
870 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); | ||
871 | MemoryRegionSection section = { | ||
872 | .mr = rb->mr, | ||
873 | .offset_within_region = 0, | ||
326 | .size = int128_make64(qemu_ram_get_used_length(rb)), | 874 | .size = int128_make64(qemu_ram_get_used_length(rb)), |
327 | }; | 875 | }; |
328 | 876 | ||
329 | - ram_discard_manager_replay_discarded(rdm, &section, | 877 | - ram_discard_manager_replay_discarded(rdm, &section, |
330 | + ram_discard_manager_replay_discarded(rdm, &section, false, | 878 | + generic_state_manager_replay_on_state_clear(gsm, &section, |
331 | dirty_bitmap_clear_section, | 879 | dirty_bitmap_clear_section, |
332 | &cleared_bits); | 880 | &cleared_bits); |
333 | } | 881 | } |
334 | @@ -XXX,XX +XXX,XX @@ bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start) | 882 | @@ -XXX,XX +XXX,XX @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) |
883 | bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start) | ||
884 | { | ||
885 | if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { | ||
886 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); | ||
887 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); | ||
888 | MemoryRegionSection section = { | ||
889 | .mr = rb->mr, | ||
890 | .offset_within_region = start, | ||
335 | .size = int128_make64(qemu_ram_pagesize(rb)), | 891 | .size = int128_make64(qemu_ram_pagesize(rb)), |
336 | }; | 892 | }; |
337 | 893 | ||
338 | - return !ram_discard_manager_is_populated(rdm, &section); | 894 | - return !ram_discard_manager_is_populated(rdm, &section); |
339 | + return !ram_discard_manager_is_populated(rdm, &section, false); | 895 | + return !generic_state_manager_is_state_set(gsm, &section); |
340 | } | 896 | } |
341 | return false; | 897 | return false; |
342 | } | 898 | } |
343 | @@ -XXX,XX +XXX,XX @@ static inline void populate_read_range(RAMBlock *block, ram_addr_t offset, | ||
344 | } | ||
345 | |||
346 | static inline int populate_read_section(MemoryRegionSection *section, | ||
347 | - void *opaque) | ||
348 | + bool is_private, void *opaque) | ||
349 | { | ||
350 | const hwaddr size = int128_get64(section->size); | ||
351 | hwaddr offset = section->offset_within_region; | ||
352 | @@ -XXX,XX +XXX,XX @@ static void ram_block_populate_read(RAMBlock *rb) | 899 | @@ -XXX,XX +XXX,XX @@ static void ram_block_populate_read(RAMBlock *rb) |
900 | * Note: The result is only stable while migrating (precopy/postcopy). | ||
901 | */ | ||
902 | if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { | ||
903 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); | ||
904 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); | ||
905 | MemoryRegionSection section = { | ||
906 | .mr = rb->mr, | ||
907 | .offset_within_region = 0, | ||
353 | .size = rb->mr->size, | 908 | .size = rb->mr->size, |
354 | }; | 909 | }; |
355 | 910 | ||
356 | - ram_discard_manager_replay_populated(rdm, &section, | 911 | - ram_discard_manager_replay_populated(rdm, &section, |
357 | + ram_discard_manager_replay_populated(rdm, &section, false, | 912 | + generic_state_manager_replay_on_state_set(gsm, &section, |
358 | populate_read_section, NULL); | 913 | populate_read_section, NULL); |
359 | } else { | 914 | } else { |
360 | populate_read_range(rb, 0, rb->used_length); | 915 | populate_read_range(rb, 0, rb->used_length); |
361 | @@ -XXX,XX +XXX,XX @@ void ram_write_tracking_prepare(void) | ||
362 | } | ||
363 | |||
364 | static inline int uffd_protect_section(MemoryRegionSection *section, | ||
365 | - void *opaque) | ||
366 | + bool is_private, void *opaque) | ||
367 | { | ||
368 | const hwaddr size = int128_get64(section->size); | ||
369 | const hwaddr offset = section->offset_within_region; | ||
370 | @@ -XXX,XX +XXX,XX @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd) | 916 | @@ -XXX,XX +XXX,XX @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd) |
917 | |||
918 | /* See ram_block_populate_read() */ | ||
919 | if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { | ||
920 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); | ||
921 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); | ||
922 | MemoryRegionSection section = { | ||
923 | .mr = rb->mr, | ||
924 | .offset_within_region = 0, | ||
371 | .size = rb->mr->size, | 925 | .size = rb->mr->size, |
372 | }; | 926 | }; |
373 | 927 | ||
374 | - return ram_discard_manager_replay_populated(rdm, &section, | 928 | - return ram_discard_manager_replay_populated(rdm, &section, |
375 | + return ram_discard_manager_replay_populated(rdm, &section, false, | 929 | + return generic_state_manager_replay_on_state_set(gsm, &section, |
376 | uffd_protect_section, | 930 | uffd_protect_section, |
377 | (void *)(uintptr_t)uffd_fd); | 931 | (void *)(uintptr_t)uffd_fd); |
378 | } | 932 | } |
379 | diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c | ||
380 | index XXXXXXX..XXXXXXX 100644 | ||
381 | --- a/system/guest-memfd-manager.c | ||
382 | +++ b/system/guest-memfd-manager.c | ||
383 | @@ -XXX,XX +XXX,XX @@ OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager, | ||
384 | { }) | ||
385 | |||
386 | static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm, | ||
387 | - const MemoryRegionSection *section) | ||
388 | + const MemoryRegionSection *section, | ||
389 | + bool is_private) | ||
390 | { | ||
391 | const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); | ||
392 | uint64_t first_bit = section->offset_within_region / gmm->block_size; | ||
393 | uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1; | ||
394 | unsigned long first_discard_bit; | ||
395 | |||
396 | - first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit); | ||
397 | + if (is_private) { | ||
398 | + /* Check if the private section is populated */ | ||
399 | + first_discard_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit); | ||
400 | + } else { | ||
401 | + /* Check if the shared section is populated */ | ||
402 | + first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit); | ||
403 | + } | ||
404 | + | ||
405 | return first_discard_bit > last_bit; | ||
406 | } | ||
407 | |||
408 | -typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg); | ||
409 | +typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, bool is_private, | ||
410 | + void *arg); | ||
411 | |||
412 | -static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg) | ||
413 | +static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, bool is_private, | ||
414 | + void *arg) | ||
415 | { | ||
416 | RamDiscardListener *rdl = arg; | ||
417 | |||
418 | - return rdl->notify_populate(rdl, section); | ||
419 | + return rdl->notify_populate(rdl, section, is_private); | ||
420 | } | ||
421 | |||
422 | -static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg) | ||
423 | +static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, bool is_private, | ||
424 | + void *arg) | ||
425 | { | ||
426 | RamDiscardListener *rdl = arg; | ||
427 | |||
428 | - rdl->notify_discard(rdl, section); | ||
429 | + rdl->notify_discard(rdl, section, is_private); | ||
430 | |||
431 | return 0; | ||
432 | } | ||
433 | |||
434 | -static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, | ||
435 | - MemoryRegionSection *section, | ||
436 | - void *arg, | ||
437 | - guest_memfd_section_cb cb) | ||
438 | +static int guest_memfd_for_each_shared_section(const GuestMemfdManager *gmm, | ||
439 | + MemoryRegionSection *section, | ||
440 | + bool is_private, | ||
441 | + void *arg, | ||
442 | + guest_memfd_section_cb cb) | ||
443 | { | ||
444 | unsigned long first_one_bit, last_one_bit; | ||
445 | uint64_t offset, size; | ||
446 | @@ -XXX,XX +XXX,XX @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, | ||
447 | break; | ||
448 | } | ||
449 | |||
450 | - ret = cb(&tmp, arg); | ||
451 | + ret = cb(&tmp, is_private, arg); | ||
452 | if (ret) { | ||
453 | break; | ||
454 | } | ||
455 | @@ -XXX,XX +XXX,XX @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, | ||
456 | return ret; | ||
457 | } | ||
458 | |||
459 | -static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm, | ||
460 | - MemoryRegionSection *section, | ||
461 | - void *arg, | ||
462 | - guest_memfd_section_cb cb) | ||
463 | +static int guest_memfd_for_each_private_section(const GuestMemfdManager *gmm, | ||
464 | + MemoryRegionSection *section, | ||
465 | + bool is_private, | ||
466 | + void *arg, | ||
467 | + guest_memfd_section_cb cb) | ||
468 | { | ||
469 | unsigned long first_zero_bit, last_zero_bit; | ||
470 | uint64_t offset, size; | ||
471 | @@ -XXX,XX +XXX,XX @@ static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm, | ||
472 | break; | ||
473 | } | ||
474 | |||
475 | - ret = cb(&tmp, arg); | ||
476 | + ret = cb(&tmp, is_private, arg); | ||
477 | if (ret) { | ||
478 | break; | ||
479 | } | ||
480 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm, | ||
481 | |||
482 | QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next); | ||
483 | |||
484 | - ret = guest_memfd_for_each_populated_section(gmm, section, rdl, | ||
485 | - guest_memfd_notify_populate_cb); | ||
486 | + /* Populate shared part */ | ||
487 | + ret = guest_memfd_for_each_shared_section(gmm, section, false, rdl, | ||
488 | + guest_memfd_notify_populate_cb); | ||
489 | if (ret) { | ||
490 | error_report("%s: Failed to register RAM discard listener: %s", __func__, | ||
491 | strerror(-ret)); | ||
492 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm, | ||
493 | g_assert(rdl->section); | ||
494 | g_assert(rdl->section->mr == gmm->mr); | ||
495 | |||
496 | - ret = guest_memfd_for_each_populated_section(gmm, rdl->section, rdl, | ||
497 | - guest_memfd_notify_discard_cb); | ||
498 | + /* Discard shared part */ | ||
499 | + ret = guest_memfd_for_each_shared_section(gmm, rdl->section, false, rdl, | ||
500 | + guest_memfd_notify_discard_cb); | ||
501 | if (ret) { | ||
502 | error_report("%s: Failed to unregister RAM discard listener: %s", __func__, | ||
503 | strerror(-ret)); | ||
504 | @@ -XXX,XX +XXX,XX @@ typedef struct GuestMemfdReplayData { | ||
505 | void *opaque; | ||
506 | } GuestMemfdReplayData; | ||
507 | |||
508 | -static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg) | ||
509 | +static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, | ||
510 | + bool is_private, void *arg) | ||
511 | { | ||
512 | struct GuestMemfdReplayData *data = arg; | ||
513 | ReplayRamPopulate replay_fn = data->fn; | ||
514 | |||
515 | - return replay_fn(section, data->opaque); | ||
516 | + return replay_fn(section, is_private, data->opaque); | ||
517 | } | ||
518 | |||
519 | static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, | ||
520 | MemoryRegionSection *section, | ||
521 | + bool is_private, | ||
522 | ReplayRamPopulate replay_fn, | ||
523 | void *opaque) | ||
524 | { | ||
525 | @@ -XXX,XX +XXX,XX @@ static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, | ||
526 | struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; | ||
527 | |||
528 | g_assert(section->mr == gmm->mr); | ||
529 | - return guest_memfd_for_each_populated_section(gmm, section, &data, | ||
530 | - guest_memfd_rdm_replay_populated_cb); | ||
531 | + if (is_private) { | ||
532 | + /* Replay populate on private section */ | ||
533 | + return guest_memfd_for_each_private_section(gmm, section, is_private, &data, | ||
534 | + guest_memfd_rdm_replay_populated_cb); | ||
535 | + } else { | ||
536 | + /* Replay populate on shared section */ | ||
537 | + return guest_memfd_for_each_shared_section(gmm, section, is_private, &data, | ||
538 | + guest_memfd_rdm_replay_populated_cb); | ||
539 | + } | ||
540 | } | ||
541 | |||
542 | -static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg) | ||
543 | +static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, | ||
544 | + bool is_private, void *arg) | ||
545 | { | ||
546 | struct GuestMemfdReplayData *data = arg; | ||
547 | ReplayRamDiscard replay_fn = data->fn; | ||
548 | |||
549 | - replay_fn(section, data->opaque); | ||
550 | + replay_fn(section, is_private, data->opaque); | ||
551 | |||
552 | return 0; | ||
553 | } | ||
554 | |||
555 | static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, | ||
556 | MemoryRegionSection *section, | ||
557 | + bool is_private, | ||
558 | ReplayRamDiscard replay_fn, | ||
559 | void *opaque) | ||
560 | { | ||
561 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, | ||
562 | struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; | ||
563 | |||
564 | g_assert(section->mr == gmm->mr); | ||
565 | - guest_memfd_for_each_discarded_section(gmm, section, &data, | ||
566 | - guest_memfd_rdm_replay_discarded_cb); | ||
567 | + | ||
568 | + if (is_private) { | ||
569 | + /* Replay discard on private section */ | ||
570 | + guest_memfd_for_each_private_section(gmm, section, is_private, &data, | ||
571 | + guest_memfd_rdm_replay_discarded_cb); | ||
572 | + } else { | ||
573 | + /* Replay discard on shared section */ | ||
574 | + guest_memfd_for_each_shared_section(gmm, section, is_private, &data, | ||
575 | + guest_memfd_rdm_replay_discarded_cb); | ||
576 | + } | ||
577 | } | ||
578 | |||
579 | static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm, | ||
580 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_notify_discard(GuestMemfdManager *gmm, | ||
581 | continue; | ||
582 | } | ||
583 | |||
584 | - guest_memfd_for_each_populated_section(gmm, &tmp, rdl, | ||
585 | - guest_memfd_notify_discard_cb); | ||
586 | + /* For current shared section, notify to discard shared parts */ | ||
587 | + guest_memfd_for_each_shared_section(gmm, &tmp, false, rdl, | ||
588 | + guest_memfd_notify_discard_cb); | ||
589 | } | ||
590 | } | ||
591 | |||
592 | @@ -XXX,XX +XXX,XX @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm, | ||
593 | continue; | ||
594 | } | ||
595 | |||
596 | - ret = guest_memfd_for_each_discarded_section(gmm, &tmp, rdl, | ||
597 | - guest_memfd_notify_populate_cb); | ||
598 | + /* For current private section, notify to populate the shared parts */ | ||
599 | + ret = guest_memfd_for_each_private_section(gmm, &tmp, false, rdl, | ||
600 | + guest_memfd_notify_populate_cb); | ||
601 | if (ret) { | ||
602 | break; | ||
603 | } | ||
604 | @@ -XXX,XX +XXX,XX @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm, | ||
605 | continue; | ||
606 | } | ||
607 | |||
608 | - guest_memfd_for_each_discarded_section(gmm, &tmp, rdl2, | ||
609 | - guest_memfd_notify_discard_cb); | ||
610 | + guest_memfd_for_each_private_section(gmm, &tmp, false, rdl2, | ||
611 | + guest_memfd_notify_discard_cb); | ||
612 | } | ||
613 | } | ||
614 | return ret; | ||
615 | diff --git a/system/memory.c b/system/memory.c | 933 | diff --git a/system/memory.c b/system/memory.c |
616 | index XXXXXXX..XXXXXXX 100644 | 934 | index XXXXXXX..XXXXXXX 100644 |
617 | --- a/system/memory.c | 935 | --- a/system/memory.c |
618 | +++ b/system/memory.c | 936 | +++ b/system/memory.c |
619 | @@ -XXX,XX +XXX,XX @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, | 937 | @@ -XXX,XX +XXX,XX @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr) |
620 | } | 938 | return imrc->num_indexes(iommu_mr); |
621 | 939 | } | |
622 | bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, | 940 | |
941 | -RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr) | ||
942 | +GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr) | ||
943 | { | ||
944 | if (!memory_region_is_ram(mr)) { | ||
945 | return NULL; | ||
946 | } | ||
947 | - return mr->rdm; | ||
948 | + return mr->gsm; | ||
949 | } | ||
950 | |||
951 | -int memory_region_set_ram_discard_manager(MemoryRegion *mr, | ||
952 | - RamDiscardManager *rdm) | ||
953 | +int memory_region_set_generic_state_manager(MemoryRegion *mr, | ||
954 | + GenericStateManager *gsm) | ||
955 | { | ||
956 | g_assert(memory_region_is_ram(mr)); | ||
957 | - if (mr->rdm && rdm) { | ||
958 | + if (mr->gsm && gsm) { | ||
959 | return -EBUSY; | ||
960 | } | ||
961 | |||
962 | - mr->rdm = rdm; | ||
963 | + mr->gsm = gsm; | ||
964 | return 0; | ||
965 | } | ||
966 | |||
967 | -uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, | ||
968 | - const MemoryRegion *mr) | ||
969 | +bool memory_region_has_ram_discard_manager(MemoryRegion *mr) | ||
970 | { | ||
971 | - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | ||
972 | + if (!memory_region_is_ram(mr) || | ||
973 | + !object_dynamic_cast(OBJECT(mr->gsm), TYPE_RAM_DISCARD_MANAGER)) { | ||
974 | + return false; | ||
975 | + } | ||
976 | + | ||
977 | + return true; | ||
978 | +} | ||
979 | + | ||
980 | +uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm, | ||
981 | + const MemoryRegion *mr) | ||
982 | +{ | ||
983 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); | ||
984 | |||
985 | - g_assert(rdmc->get_min_granularity); | ||
986 | - return rdmc->get_min_granularity(rdm, mr); | ||
987 | + g_assert(gsmc->get_min_granularity); | ||
988 | + return gsmc->get_min_granularity(gsm, mr); | ||
989 | } | ||
990 | |||
991 | -bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, | ||
623 | - const MemoryRegionSection *section) | 992 | - const MemoryRegionSection *section) |
624 | + const MemoryRegionSection *section, | 993 | +bool generic_state_manager_is_state_set(const GenericStateManager *gsm, |
625 | + bool is_private) | 994 | + const MemoryRegionSection *section) |
626 | { | 995 | { |
627 | RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | 996 | - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); |
628 | 997 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); | |
629 | g_assert(rdmc->is_populated); | 998 | |
999 | - g_assert(rdmc->is_populated); | ||
630 | - return rdmc->is_populated(rdm, section); | 1000 | - return rdmc->is_populated(rdm, section); |
631 | + return rdmc->is_populated(rdm, section, is_private); | 1001 | + g_assert(gsmc->is_state_set); |
632 | } | 1002 | + return gsmc->is_state_set(gsm, section); |
633 | 1003 | } | |
634 | int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, | 1004 | |
635 | MemoryRegionSection *section, | 1005 | -int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, |
636 | + bool is_private, | 1006 | - MemoryRegionSection *section, |
637 | ReplayRamPopulate replay_fn, | 1007 | - ReplayStateChange replay_fn, |
638 | void *opaque) | 1008 | - void *opaque) |
639 | { | 1009 | +int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm, |
640 | RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | 1010 | + MemoryRegionSection *section, |
641 | 1011 | + ReplayStateChange replay_fn, | |
642 | g_assert(rdmc->replay_populated); | 1012 | + void *opaque) |
1013 | { | ||
1014 | - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | ||
1015 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); | ||
1016 | |||
1017 | - g_assert(rdmc->replay_populated); | ||
643 | - return rdmc->replay_populated(rdm, section, replay_fn, opaque); | 1018 | - return rdmc->replay_populated(rdm, section, replay_fn, opaque); |
644 | + return rdmc->replay_populated(rdm, section, is_private, replay_fn, opaque); | 1019 | + g_assert(gsmc->replay_on_state_set); |
645 | } | 1020 | + return gsmc->replay_on_state_set(gsm, section, replay_fn, opaque); |
646 | 1021 | } | |
647 | void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, | 1022 | |
648 | MemoryRegionSection *section, | 1023 | -int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, |
649 | + bool is_private, | 1024 | - MemoryRegionSection *section, |
650 | ReplayRamDiscard replay_fn, | 1025 | - ReplayStateChange replay_fn, |
651 | void *opaque) | 1026 | - void *opaque) |
652 | { | 1027 | +int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm, |
653 | RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | 1028 | + MemoryRegionSection *section, |
654 | 1029 | + ReplayStateChange replay_fn, | |
655 | g_assert(rdmc->replay_discarded); | 1030 | + void *opaque) |
656 | - rdmc->replay_discarded(rdm, section, replay_fn, opaque); | 1031 | { |
657 | + rdmc->replay_discarded(rdm, section, is_private, replay_fn, opaque); | 1032 | - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); |
658 | } | 1033 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); |
659 | 1034 | ||
660 | void ram_discard_manager_register_listener(RamDiscardManager *rdm, | 1035 | - g_assert(rdmc->replay_discarded); |
1036 | - return rdmc->replay_discarded(rdm, section, replay_fn, opaque); | ||
1037 | + g_assert(gsmc->replay_on_state_clear); | ||
1038 | + return gsmc->replay_on_state_clear(gsm, section, replay_fn, opaque); | ||
1039 | } | ||
1040 | |||
1041 | -void ram_discard_manager_register_listener(RamDiscardManager *rdm, | ||
1042 | - RamDiscardListener *rdl, | ||
1043 | - MemoryRegionSection *section) | ||
1044 | +void generic_state_manager_register_listener(GenericStateManager *gsm, | ||
1045 | + StateChangeListener *scl, | ||
1046 | + MemoryRegionSection *section) | ||
1047 | { | ||
1048 | - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | ||
1049 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); | ||
1050 | |||
1051 | - g_assert(rdmc->register_listener); | ||
1052 | - rdmc->register_listener(rdm, rdl, section); | ||
1053 | + g_assert(gsmc->register_listener); | ||
1054 | + gsmc->register_listener(gsm, scl, section); | ||
1055 | } | ||
1056 | |||
1057 | -void ram_discard_manager_unregister_listener(RamDiscardManager *rdm, | ||
1058 | - RamDiscardListener *rdl) | ||
1059 | +void generic_state_manager_unregister_listener(GenericStateManager *gsm, | ||
1060 | + StateChangeListener *scl) | ||
1061 | { | ||
1062 | - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); | ||
1063 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); | ||
1064 | |||
1065 | - g_assert(rdmc->unregister_listener); | ||
1066 | - rdmc->unregister_listener(rdm, rdl); | ||
1067 | + g_assert(gsmc->unregister_listener); | ||
1068 | + gsmc->unregister_listener(gsm, scl); | ||
1069 | } | ||
1070 | |||
1071 | /* Called with rcu_read_lock held. */ | ||
1072 | @@ -XXX,XX +XXX,XX @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, | ||
1073 | error_setg(errp, "iommu map to non memory area %" HWADDR_PRIx "", xlat); | ||
1074 | return false; | ||
1075 | } else if (memory_region_has_ram_discard_manager(mr)) { | ||
1076 | - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr); | ||
1077 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(mr); | ||
1078 | MemoryRegionSection tmp = { | ||
1079 | .mr = mr, | ||
1080 | .offset_within_region = xlat, | ||
661 | @@ -XXX,XX +XXX,XX @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, | 1081 | @@ -XXX,XX +XXX,XX @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, |
662 | * Disallow that. vmstate priorities make sure any RamDiscardManager | 1082 | * Disallow that. vmstate priorities make sure any RamDiscardManager |
663 | * were already restored before IOMMUs are restored. | 1083 | * were already restored before IOMMUs are restored. |
664 | */ | 1084 | */ |
665 | - if (!ram_discard_manager_is_populated(rdm, &tmp)) { | 1085 | - if (!ram_discard_manager_is_populated(rdm, &tmp)) { |
666 | + if (!ram_discard_manager_is_populated(rdm, &tmp, false)) { | 1086 | + if (!generic_state_manager_is_state_set(gsm, &tmp)) { |
667 | error_setg(errp, "iommu map to discarded memory (e.g., unplugged" | 1087 | error_setg(errp, "iommu map to discarded memory (e.g., unplugged" |
668 | " via virtio-mem): %" HWADDR_PRIx "", | 1088 | " via virtio-mem): %" HWADDR_PRIx "", |
669 | iotlb->translated_addr); | 1089 | iotlb->translated_addr); |
1090 | @@ -XXX,XX +XXX,XX @@ static const TypeInfo iommu_memory_region_info = { | ||
1091 | .abstract = true, | ||
1092 | }; | ||
1093 | |||
1094 | -static const TypeInfo ram_discard_manager_info = { | ||
1095 | +static const TypeInfo generic_state_manager_info = { | ||
1096 | .parent = TYPE_INTERFACE, | ||
1097 | + .name = TYPE_GENERIC_STATE_MANAGER, | ||
1098 | + .class_size = sizeof(GenericStateManagerClass), | ||
1099 | + .abstract = true, | ||
1100 | +}; | ||
1101 | + | ||
1102 | +static const TypeInfo ram_discard_manager_info = { | ||
1103 | + .parent = TYPE_GENERIC_STATE_MANAGER, | ||
1104 | .name = TYPE_RAM_DISCARD_MANAGER, | ||
1105 | .class_size = sizeof(RamDiscardManagerClass), | ||
1106 | }; | ||
1107 | @@ -XXX,XX +XXX,XX @@ static void memory_register_types(void) | ||
1108 | { | ||
1109 | type_register_static(&memory_region_info); | ||
1110 | type_register_static(&iommu_memory_region_info); | ||
1111 | + type_register_static(&generic_state_manager_info); | ||
1112 | type_register_static(&ram_discard_manager_info); | ||
1113 | } | ||
1114 | |||
670 | diff --git a/system/memory_mapping.c b/system/memory_mapping.c | 1115 | diff --git a/system/memory_mapping.c b/system/memory_mapping.c |
671 | index XXXXXXX..XXXXXXX 100644 | 1116 | index XXXXXXX..XXXXXXX 100644 |
672 | --- a/system/memory_mapping.c | 1117 | --- a/system/memory_mapping.c |
673 | +++ b/system/memory_mapping.c | 1118 | +++ b/system/memory_mapping.c |
674 | @@ -XXX,XX +XXX,XX @@ static void guest_phys_block_add_section(GuestPhysListener *g, | ||
675 | } | ||
676 | |||
677 | static int guest_phys_ram_populate_cb(MemoryRegionSection *section, | ||
678 | - void *opaque) | ||
679 | + bool is_private, void *opaque) | ||
680 | { | ||
681 | GuestPhysListener *g = opaque; | ||
682 | |||
683 | @@ -XXX,XX +XXX,XX @@ static void guest_phys_blocks_region_add(MemoryListener *listener, | 1119 | @@ -XXX,XX +XXX,XX @@ static void guest_phys_blocks_region_add(MemoryListener *listener, |
684 | RamDiscardManager *rdm; | 1120 | |
685 | 1121 | /* for special sparse regions, only add populated parts */ | |
686 | rdm = memory_region_get_ram_discard_manager(section->mr); | 1122 | if (memory_region_has_ram_discard_manager(section->mr)) { |
1123 | - RamDiscardManager *rdm; | ||
1124 | - | ||
1125 | - rdm = memory_region_get_ram_discard_manager(section->mr); | ||
687 | - ram_discard_manager_replay_populated(rdm, section, | 1126 | - ram_discard_manager_replay_populated(rdm, section, |
688 | + ram_discard_manager_replay_populated(rdm, section, false, | 1127 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); |
1128 | + generic_state_manager_replay_on_state_set(gsm, section, | ||
689 | guest_phys_ram_populate_cb, g); | 1129 | guest_phys_ram_populate_cb, g); |
690 | return; | 1130 | return; |
691 | } | 1131 | } |
692 | -- | 1132 | -- |
693 | 2.43.5 | 1133 | 2.43.5 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | To manage the private and shared RAM states in confidential VMs, | ||
2 | introduce a new PrivateSharedManager class as a child of | ||
3 | GenericStateManager, which inherits the six interface callbacks. Its | ||
4 | distinct interface type distinguishes it from the RamDiscardManager | ||
5 | object and provides the flexibility to address requirements specific | ||
6 | to confidential VMs in the future. | ||
1 | 7 | ||
8 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | ||
9 | --- | ||
10 | Changes in v4: | ||
11 | - Newly added. | ||
12 | --- | ||
13 | include/exec/memory.h | 44 +++++++++++++++++++++++++++++++++++++++++-- | ||
14 | system/memory.c | 17 +++++++++++++++++ | ||
15 | 2 files changed, 59 insertions(+), 2 deletions(-) | ||
16 | |||
17 | diff --git a/include/exec/memory.h b/include/exec/memory.h | ||
18 | index XXXXXXX..XXXXXXX 100644 | ||
19 | --- a/include/exec/memory.h | ||
20 | +++ b/include/exec/memory.h | ||
21 | @@ -XXX,XX +XXX,XX @@ typedef struct RamDiscardManager RamDiscardManager; | ||
22 | DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass, | ||
23 | RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER); | ||
24 | |||
25 | +#define TYPE_PRIVATE_SHARED_MANAGER "private-shared-manager" | ||
26 | +typedef struct PrivateSharedManagerClass PrivateSharedManagerClass; | ||
27 | +typedef struct PrivateSharedManager PrivateSharedManager; | ||
28 | +DECLARE_OBJ_CHECKERS(PrivateSharedManager, PrivateSharedManagerClass, | ||
29 | + PRIVATE_SHARED_MANAGER, TYPE_PRIVATE_SHARED_MANAGER) | ||
30 | + | ||
31 | #ifdef CONFIG_FUZZ | ||
32 | void fuzz_dma_read_cb(size_t addr, | ||
33 | size_t len, | ||
34 | @@ -XXX,XX +XXX,XX @@ void generic_state_manager_register_listener(GenericStateManager *gsm, | ||
35 | void generic_state_manager_unregister_listener(GenericStateManager *gsm, | ||
36 | StateChangeListener *scl); | ||
37 | |||
38 | +static inline void state_change_listener_init(StateChangeListener *scl, | ||
39 | + NotifyStateSet state_set_fn, | ||
40 | + NotifyStateClear state_clear_fn) | ||
41 | +{ | ||
42 | + scl->notify_to_state_set = state_set_fn; | ||
43 | + scl->notify_to_state_clear = state_clear_fn; | ||
44 | +} | ||
45 | + | ||
46 | typedef struct RamDiscardListener RamDiscardListener; | ||
47 | |||
48 | struct RamDiscardListener { | ||
49 | @@ -XXX,XX +XXX,XX @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl, | ||
50 | NotifyStateClear discard_fn, | ||
51 | bool double_discard_supported) | ||
52 | { | ||
53 | - rdl->scl.notify_to_state_set = populate_fn; | ||
54 | - rdl->scl.notify_to_state_clear = discard_fn; | ||
55 | + state_change_listener_init(&rdl->scl, populate_fn, discard_fn); | ||
56 | rdl->double_discard_supported = double_discard_supported; | ||
57 | } | ||
58 | |||
59 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | ||
60 | GenericStateManagerClass parent_class; | ||
61 | }; | ||
62 | |||
63 | +typedef struct PrivateSharedListener PrivateSharedListener; | ||
64 | +struct PrivateSharedListener { | ||
65 | + struct StateChangeListener scl; | ||
66 | + | ||
67 | + QLIST_ENTRY(PrivateSharedListener) next; | ||
68 | +}; | ||
69 | + | ||
70 | +struct PrivateSharedManagerClass { | ||
71 | + /* private */ | ||
72 | + GenericStateManagerClass parent_class; | ||
73 | +}; | ||
74 | + | ||
75 | +static inline void private_shared_listener_init(PrivateSharedListener *psl, | ||
76 | + NotifyStateSet populate_fn, | ||
77 | + NotifyStateClear discard_fn) | ||
78 | +{ | ||
79 | + state_change_listener_init(&psl->scl, populate_fn, discard_fn); | ||
80 | +} | ||
81 | + | ||
82 | /** | ||
83 | * memory_get_xlat_addr: Extract addresses from a TLB entry | ||
84 | * | ||
85 | @@ -XXX,XX +XXX,XX @@ int memory_region_set_generic_state_manager(MemoryRegion *mr, | ||
86 | */ | ||
87 | bool memory_region_has_ram_discard_manager(MemoryRegion *mr); | ||
88 | |||
89 | +/** | ||
90 | + * memory_region_has_private_shared_manager: check whether a #MemoryRegion has a | ||
91 | + * #PrivateSharedManager assigned | ||
92 | + * | ||
93 | + * @mr: the #MemoryRegion | ||
94 | + */ | ||
95 | +bool memory_region_has_private_shared_manager(MemoryRegion *mr); | ||
96 | + | ||
97 | /** | ||
98 | * memory_region_find: translate an address/size relative to a | ||
99 | * MemoryRegion into a #MemoryRegionSection. | ||
100 | diff --git a/system/memory.c b/system/memory.c | ||
101 | index XXXXXXX..XXXXXXX 100644 | ||
102 | --- a/system/memory.c | ||
103 | +++ b/system/memory.c | ||
104 | @@ -XXX,XX +XXX,XX @@ bool memory_region_has_ram_discard_manager(MemoryRegion *mr) | ||
105 | return true; | ||
106 | } | ||
107 | |||
108 | +bool memory_region_has_private_shared_manager(MemoryRegion *mr) | ||
109 | +{ | ||
110 | + if (!memory_region_is_ram(mr) || | ||
111 | + !object_dynamic_cast(OBJECT(mr->gsm), TYPE_PRIVATE_SHARED_MANAGER)) { | ||
112 | + return false; | ||
113 | + } | ||
114 | + | ||
115 | + return true; | ||
116 | +} | ||
117 | + | ||
118 | uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm, | ||
119 | const MemoryRegion *mr) | ||
120 | { | ||
121 | @@ -XXX,XX +XXX,XX @@ static const TypeInfo ram_discard_manager_info = { | ||
122 | .class_size = sizeof(RamDiscardManagerClass), | ||
123 | }; | ||
124 | |||
125 | +static const TypeInfo private_shared_manager_info = { | ||
126 | + .parent = TYPE_GENERIC_STATE_MANAGER, | ||
127 | + .name = TYPE_PRIVATE_SHARED_MANAGER, | ||
128 | + .class_size = sizeof(PrivateSharedManagerClass), | ||
129 | +}; | ||
130 | + | ||
131 | static void memory_register_types(void) | ||
132 | { | ||
133 | type_register_static(&memory_region_info); | ||
134 | type_register_static(&iommu_memory_region_info); | ||
135 | type_register_static(&generic_state_manager_info); | ||
136 | type_register_static(&ram_discard_manager_info); | ||
137 | + type_register_static(&private_shared_manager_info); | ||
138 | } | ||
139 | |||
140 | type_init(memory_register_types) | ||
141 | -- | ||
142 | 2.43.5 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | Subsystems like VFIO previously disabled ram block discard and only | ||
2 | allowed coordinated discarding via RamDiscardManager. However, | ||
3 | guest_memfd in confidential VMs relies on discard operations for page | ||
4 | conversion between private and shared memory. This can lead to a stale | ||
5 | IOMMU mapping issue when assigning a hardware device to a confidential | ||
6 | VM via shared memory. Now that the new PrivateSharedManager interface, | ||
7 | distinct from RamDiscardManager, manages private and shared states, | ||
8 | include PrivateSharedManager in coordinated RAM discard and add the | ||
9 | related support in VFIO. | ||
1 | 10 | ||
11 | Currently, migration support for confidential VMs is not available, so | ||
12 | vfio_sync_dirty_bitmap() handling for PrivateSharedListener can be | ||
13 | ignored. Registering/unregistering the PrivateSharedListener is | ||
14 | necessary during vfio_listener_region_add/del(). The listener callbacks | ||
15 | are similar between RamDiscardListener and PrivateSharedListener, so | ||
16 | the common parts are extracted opportunistically. | ||
17 | |||
18 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | ||
19 | --- | ||
20 | Changes in v4 | ||
21 | - Newly added. | ||
22 | --- | ||
23 | hw/vfio/common.c | 104 +++++++++++++++++++++++--- | ||
24 | hw/vfio/container-base.c | 1 + | ||
25 | include/hw/vfio/vfio-container-base.h | 10 +++ | ||
26 | 3 files changed, 105 insertions(+), 10 deletions(-) | ||
27 | |||
28 | diff --git a/hw/vfio/common.c b/hw/vfio/common.c | ||
29 | index XXXXXXX..XXXXXXX 100644 | ||
30 | --- a/hw/vfio/common.c | ||
31 | +++ b/hw/vfio/common.c | ||
32 | @@ -XXX,XX +XXX,XX @@ out: | ||
33 | rcu_read_unlock(); | ||
34 | } | ||
35 | |||
36 | -static void vfio_ram_discard_notify_discard(StateChangeListener *scl, | ||
37 | - MemoryRegionSection *section) | ||
38 | +static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer, | ||
39 | + MemoryRegionSection *section) | ||
40 | { | ||
41 | - RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); | ||
42 | - VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, | ||
43 | - listener); | ||
44 | - VFIOContainerBase *bcontainer = vrdl->bcontainer; | ||
45 | const hwaddr size = int128_get64(section->size); | ||
46 | const hwaddr iova = section->offset_within_address_space; | ||
47 | int ret; | ||
48 | @@ -XXX,XX +XXX,XX @@ static void vfio_ram_discard_notify_discard(StateChangeListener *scl, | ||
49 | } | ||
50 | } | ||
51 | |||
52 | -static int vfio_ram_discard_notify_populate(StateChangeListener *scl, | ||
53 | +static void vfio_ram_discard_notify_discard(StateChangeListener *scl, | ||
54 | MemoryRegionSection *section) | ||
55 | { | ||
56 | RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); | ||
57 | VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, | ||
58 | listener); | ||
59 | - VFIOContainerBase *bcontainer = vrdl->bcontainer; | ||
60 | + vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section); | ||
61 | +} | ||
62 | + | ||
63 | +static void vfio_private_shared_notify_to_private(StateChangeListener *scl, | ||
64 | + MemoryRegionSection *section) | ||
65 | +{ | ||
66 | + PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); | ||
67 | + VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener, | ||
68 | + listener); | ||
69 | + vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section); | ||
70 | +} | ||
71 | + | ||
72 | +static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer, | ||
73 | + MemoryRegionSection *section, | ||
74 | + uint64_t granularity) | ||
75 | +{ | ||
76 | const hwaddr end = section->offset_within_region + | ||
77 | int128_get64(section->size); | ||
78 | hwaddr start, next, iova; | ||
79 | @@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl, | ||
80 | * unmap in minimum granularity later. | ||
81 | */ | ||
82 | for (start = section->offset_within_region; start < end; start = next) { | ||
83 | - next = ROUND_UP(start + 1, vrdl->granularity); | ||
84 | + next = ROUND_UP(start + 1, granularity); | ||
85 | next = MIN(next, end); | ||
86 | |||
87 | iova = start - section->offset_within_region + | ||
88 | @@ -XXX,XX +XXX,XX @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl, | ||
89 | vaddr, section->readonly); | ||
90 | if (ret) { | ||
91 | /* Rollback */ | ||
92 | - vfio_ram_discard_notify_discard(scl, section); | ||
93 | + vfio_state_change_notify_to_state_clear(bcontainer, section); | ||
94 | return ret; | ||
95 | } | ||
96 | } | ||
97 | return 0; | ||
98 | } | ||
99 | |||
100 | +static int vfio_ram_discard_notify_populate(StateChangeListener *scl, | ||
101 | + MemoryRegionSection *section) | ||
102 | +{ | ||
103 | + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); | ||
104 | + VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, | ||
105 | + listener); | ||
106 | + return vfio_state_change_notify_to_state_set(vrdl->bcontainer, section, | ||
107 | + vrdl->granularity); | ||
108 | +} | ||
109 | + | ||
110 | +static int vfio_private_shared_notify_to_shared(StateChangeListener *scl, | ||
111 | + MemoryRegionSection *section) | ||
112 | +{ | ||
113 | + PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); | ||
114 | + VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener, | ||
115 | + listener); | ||
116 | + return vfio_state_change_notify_to_state_set(vpsl->bcontainer, section, | ||
117 | + vpsl->granularity); | ||
118 | +} | ||
119 | + | ||
120 | static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
121 | MemoryRegionSection *section) | ||
122 | { | ||
123 | @@ -XXX,XX +XXX,XX @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
124 | } | ||
125 | } | ||
126 | |||
127 | +static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer, | ||
128 | + MemoryRegionSection *section) | ||
129 | +{ | ||
130 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); | ||
131 | + VFIOPrivateSharedListener *vpsl; | ||
132 | + PrivateSharedListener *psl; | ||
133 | + | ||
134 | + vpsl = g_new0(VFIOPrivateSharedListener, 1); | ||
135 | + vpsl->bcontainer = bcontainer; | ||
136 | + vpsl->mr = section->mr; | ||
137 | + vpsl->offset_within_address_space = section->offset_within_address_space; | ||
138 | + vpsl->granularity = generic_state_manager_get_min_granularity(gsm, | ||
139 | + section->mr); | ||
140 | + | ||
141 | + psl = &vpsl->listener; | ||
142 | + private_shared_listener_init(psl, vfio_private_shared_notify_to_shared, | ||
143 | + vfio_private_shared_notify_to_private); | ||
144 | + generic_state_manager_register_listener(gsm, &psl->scl, section); | ||
145 | + QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next); | ||
146 | +} | ||
147 | + | ||
148 | static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
149 | MemoryRegionSection *section) | ||
150 | { | ||
151 | @@ -XXX,XX +XXX,XX @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer, | ||
152 | g_free(vrdl); | ||
153 | } | ||
154 | |||
155 | +static void vfio_unregister_private_shared_listener(VFIOContainerBase *bcontainer, | ||
156 | + MemoryRegionSection *section) | ||
157 | +{ | ||
158 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); | ||
159 | + VFIOPrivateSharedListener *vpsl = NULL; | ||
160 | + PrivateSharedListener *psl; | ||
161 | + | ||
162 | + QLIST_FOREACH(vpsl, &bcontainer->vpsl_list, next) { | ||
163 | + if (vpsl->mr == section->mr && | ||
164 | + vpsl->offset_within_address_space == | ||
165 | + section->offset_within_address_space) { | ||
166 | + break; | ||
167 | + } | ||
168 | + } | ||
169 | + | ||
170 | + if (!vpsl) { | ||
171 | + hw_error("vfio: Trying to unregister missing private shared listener"); | ||
172 | + } | ||
173 | + | ||
174 | + psl = &vpsl->listener; | ||
175 | + generic_state_manager_unregister_listener(gsm, &psl->scl); | ||
176 | + QLIST_REMOVE(vpsl, next); | ||
177 | + g_free(vpsl); | ||
178 | +} | ||
179 | + | ||
180 | static bool vfio_known_safe_misalignment(MemoryRegionSection *section) | ||
181 | { | ||
182 | MemoryRegion *mr = section->mr; | ||
183 | @@ -XXX,XX +XXX,XX @@ static void vfio_listener_region_add(MemoryListener *listener, | ||
184 | if (memory_region_has_ram_discard_manager(section->mr)) { | ||
185 | vfio_register_ram_discard_listener(bcontainer, section); | ||
186 | return; | ||
187 | + } else if (memory_region_has_private_shared_manager(section->mr)) { | ||
188 | + vfio_register_private_shared_listener(bcontainer, section); | ||
189 | + return; | ||
190 | } | ||
191 | |||
192 | vaddr = memory_region_get_ram_ptr(section->mr) + | ||
193 | @@ -XXX,XX +XXX,XX @@ static void vfio_listener_region_del(MemoryListener *listener, | ||
194 | vfio_unregister_ram_discard_listener(bcontainer, section); | ||
195 | /* Unregistering will trigger an unmap. */ | ||
196 | try_unmap = false; | ||
197 | + } else if (memory_region_has_private_shared_manager(section->mr)) { | ||
198 | + vfio_unregister_private_shared_listener(bcontainer, section); | ||
199 | + /* Unregistering will trigger an unmap. */ | ||
200 | + try_unmap = false; | ||
201 | } | ||
202 | |||
203 | if (try_unmap) { | ||
204 | diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c | ||
205 | index XXXXXXX..XXXXXXX 100644 | ||
206 | --- a/hw/vfio/container-base.c | ||
207 | +++ b/hw/vfio/container-base.c | ||
208 | @@ -XXX,XX +XXX,XX @@ static void vfio_container_instance_init(Object *obj) | ||
209 | bcontainer->iova_ranges = NULL; | ||
210 | QLIST_INIT(&bcontainer->giommu_list); | ||
211 | QLIST_INIT(&bcontainer->vrdl_list); | ||
212 | + QLIST_INIT(&bcontainer->vpsl_list); | ||
213 | } | ||
214 | |||
215 | static const TypeInfo types[] = { | ||
216 | diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h | ||
217 | index XXXXXXX..XXXXXXX 100644 | ||
218 | --- a/include/hw/vfio/vfio-container-base.h | ||
219 | +++ b/include/hw/vfio/vfio-container-base.h | ||
220 | @@ -XXX,XX +XXX,XX @@ typedef struct VFIOContainerBase { | ||
221 | bool dirty_pages_started; /* Protected by BQL */ | ||
222 | QLIST_HEAD(, VFIOGuestIOMMU) giommu_list; | ||
223 | QLIST_HEAD(, VFIORamDiscardListener) vrdl_list; | ||
224 | + QLIST_HEAD(, VFIOPrivateSharedListener) vpsl_list; | ||
225 | QLIST_ENTRY(VFIOContainerBase) next; | ||
226 | QLIST_HEAD(, VFIODevice) device_list; | ||
227 | GList *iova_ranges; | ||
228 | @@ -XXX,XX +XXX,XX @@ typedef struct VFIORamDiscardListener { | ||
229 | QLIST_ENTRY(VFIORamDiscardListener) next; | ||
230 | } VFIORamDiscardListener; | ||
231 | |||
232 | +typedef struct VFIOPrivateSharedListener { | ||
233 | + VFIOContainerBase *bcontainer; | ||
234 | + MemoryRegion *mr; | ||
235 | + hwaddr offset_within_address_space; | ||
236 | + uint64_t granularity; | ||
237 | + PrivateSharedListener listener; | ||
238 | + QLIST_ENTRY(VFIOPrivateSharedListener) next; | ||
239 | +} VFIOPrivateSharedListener; | ||
240 | + | ||
241 | int vfio_container_dma_map(VFIOContainerBase *bcontainer, | ||
242 | hwaddr iova, ram_addr_t size, | ||
243 | void *vaddr, bool readonly); | ||
244 | -- | ||
245 | 2.43.5 | ||
1 | As the commit 852f0048f3 ("RAMBlock: make guest_memfd require | 1 | Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated |
---|---|---|---|
2 | uncoordinated discard") highlighted, some subsystems like VFIO might | 2 | discard") highlighted that subsystems like VFIO may disable RAM block |
3 | disable ram block discard. However, guest_memfd relies on the discard | 3 | discard. However, guest_memfd relies on discard operations for page |
4 | operation to perform page conversion between private and shared memory. | 4 | conversion between private and shared memory, potentially leading to |
5 | This can lead to stale IOMMU mapping issue when assigning a hardware | 5 | a stale IOMMU mapping issue when assigning hardware devices to
6 | device to a confidential VM via shared memory (unprotected memory | 6 | confidential VMs via shared memory. To address this, it is crucial to |
7 | pages). Blocking shared page discard can solve this problem, but it | 7 | ensure systems like VFIO refresh its IOMMU mappings. |
8 | could cause guests to consume twice the memory with VFIO, which is not | 8 | |
9 | acceptable in some cases. An alternative solution is to convey other | 9 | PrivateSharedManager is introduced to manage private and shared states in |
10 | systems like VFIO to refresh its outdated IOMMU mappings. | 10 | confidential VMs, similar to RamDiscardManager, which supports |
11 | 11 | coordinated RAM discard in VFIO. Integrating PrivateSharedManager with | |
12 | RamDiscardManager is an existing concept (used by virtio-mem) to adjust | 12 | guest_memfd can facilitate the adjustment of VFIO mappings in response |
13 | VFIO mappings in relation to VM page assignment. Effectively page | 13 | to page conversion events. |
14 | conversion is similar to hot-removing a page in one mode and adding it | 14 | |
15 | back in the other, so the similar work that needs to happen in response | 15 | Since guest_memfd is not an object, it cannot directly implement the |
16 | to virtio-mem changes needs to happen for page conversion events. | 16 | PrivateSharedManager interface. Implementing it in HostMemoryBackend is |
17 | Introduce the RamDiscardManager to guest_memfd to achieve it. | 17 | not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks |
18 | 18 | have a memory backend while others do not. Notably, virtual BIOS | |
19 | However, guest_memfd is not an object so it cannot directly implement | 19 | RAMBlocks using memory_region_init_ram_guest_memfd() do not have a |
20 | the RamDiscardManager interface. | 20 | backend. |
21 | 21 | ||
22 | One solution is to implement the interface in HostMemoryBackend. Any | 22 | To manage RAMBlocks with guest_memfd, define a new object named |
23 | guest_memfd-backed host memory backend can register itself in the target | 23 | RamBlockAttribute to implement the RamDiscardManager interface. This |
24 | MemoryRegion. However, this solution doesn't cover the scenario where a | 24 | object stores guest_memfd information such as shared_bitmap, and handles |
25 | guest_memfd MemoryRegion doesn't belong to the HostMemoryBackend, e.g. | 25 | page conversion notification. The memory state is tracked at the host |
26 | the virtual BIOS MemoryRegion. | 26 | page size granularity, as the minimum memory conversion size can be one |
27 | 27 | page per request. Additionally, VFIO expects the DMA mapping for a | |
28 | Thus, choose the second option, i.e. define an object type named | 28 | specific iova to be mapped and unmapped with the same granularity. |
29 | guest_memfd_manager with RamDiscardManager interface. Upon creation of | 29 | Confidential VMs may perform partial conversions, such as conversions on |
30 | guest_memfd, a new guest_memfd_manager object can be instantiated and | 30 | small regions within larger regions. To prevent invalid cases and until |
31 | registered to the managed guest_memfd MemoryRegion to handle the page | 31 | cut_mapping operation support is available, all operations are performed |
32 | conversion events. | 32 | with 4K granularity. |
33 | |||
34 | In the context of guest_memfd, the discarded state signifies that the | ||
35 | page is private, while the populated state indicated that the page is | ||
36 | shared. The state of the memory is tracked at the granularity of the | ||
37 | host page size (i.e. block_size), as the minimum conversion size can be | ||
38 | one page per request. | ||
39 | |||
40 | In addition, VFIO expects the DMA mapping for a specific iova to be | ||
41 | mapped and unmapped with the same granularity. However, the confidential | ||
42 | VMs may do partial conversion, e.g. conversion happens on a small region | ||
43 | within a large region. To prevent such invalid cases and before any | ||
44 | potential optimization comes out, all operations are performed with 4K | ||
45 | granularity. | ||
46 | 33 | ||
47 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | 34 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> |
48 | --- | 35 | --- |
49 | include/sysemu/guest-memfd-manager.h | 46 +++++ | 36 | Changes in v4: |
50 | system/guest-memfd-manager.c | 250 +++++++++++++++++++++++++++ | 37 | - Change the name from memory-attribute-manager to |
51 | system/meson.build | 1 + | 38 | ram-block-attribute. |
52 | 3 files changed, 297 insertions(+) | 39 | - Implement the newly-introduced PrivateSharedManager instead of |
53 | create mode 100644 include/sysemu/guest-memfd-manager.h | 40 | RamDiscardManager and change related commit message. |
54 | create mode 100644 system/guest-memfd-manager.c | 41 | - Define the new object in ramblock.h instead of adding a new file. |
55 | 42 | ||
56 | diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h | 43 | Changes in v3: |
44 | - Some rename (bitmap_size->shared_bitmap_size, | ||
45 | first_one/zero_bit->first_bit, etc.) | ||
46 | - Change shared_bitmap_size from uint32_t to unsigned | ||
47 | - Return mgr->mr->ram_block->page_size in get_block_size() | ||
48 | - Move set_ram_discard_manager() up to avoid a g_free() in failure | ||
49 | case. | ||
50 | - Add const for the memory_attribute_manager_get_block_size() | ||
51 | - Unify the ReplayRamPopulate and ReplayRamDiscard and related | ||
52 | callback. | ||
53 | |||
54 | Changes in v2: | ||
55 | - Rename the object name to MemoryAttributeManager | ||
56 | - Rename the bitmap to shared_bitmap to make it clearer. | ||
57 | - Remove block_size field and get it from a helper. In future, we | ||
58 | can get the page_size from RAMBlock if necessary. | ||
59 | - Remove the unncessary "struct" before GuestMemfdReplayData | ||
60 | - Remove the unnecessary "struct" before GuestMemfdReplayData | ||
61 | - Remove the unnecessary g_free() for the bitmap | ||
62 | populated/discarded section. | ||
63 | - Move the realize()/unrealize() definition to this patch. | ||
64 | --- | ||
65 | include/exec/ramblock.h | 24 +++ | ||
66 | system/meson.build | 1 + | ||
67 | system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++ | ||
68 | 3 files changed, 307 insertions(+) | ||
69 | create mode 100644 system/ram-block-attribute.c | ||
70 | |||
71 | diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h | ||
72 | index XXXXXXX..XXXXXXX 100644 | ||
73 | --- a/include/exec/ramblock.h | ||
74 | +++ b/include/exec/ramblock.h | ||
75 | @@ -XXX,XX +XXX,XX @@ | ||
76 | #include "cpu-common.h" | ||
77 | #include "qemu/rcu.h" | ||
78 | #include "exec/ramlist.h" | ||
79 | +#include "system/hostmem.h" | ||
80 | + | ||
81 | +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute" | ||
82 | +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE) | ||
83 | |||
84 | struct RAMBlock { | ||
85 | struct rcu_head rcu; | ||
86 | @@ -XXX,XX +XXX,XX @@ struct RAMBlock { | ||
87 | */ | ||
88 | ram_addr_t postcopy_length; | ||
89 | }; | ||
90 | + | ||
91 | +struct RamBlockAttribute { | ||
92 | + Object parent; | ||
93 | + | ||
94 | + MemoryRegion *mr; | ||
95 | + | ||
96 | + /* 1-setting of the bit represents the memory is populated (shared) */ | ||
97 | + unsigned shared_bitmap_size; | ||
98 | + unsigned long *shared_bitmap; | ||
99 | + | ||
100 | + QLIST_HEAD(, PrivateSharedListener) psl_list; | ||
101 | +}; | ||
102 | + | ||
103 | +struct RamBlockAttributeClass { | ||
104 | + ObjectClass parent_class; | ||
105 | +}; | ||
106 | + | ||
107 | +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr); | ||
108 | +void ram_block_attribute_unrealize(RamBlockAttribute *attr); | ||
109 | + | ||
110 | #endif | ||
111 | #endif | ||
112 | diff --git a/system/meson.build b/system/meson.build | ||
113 | index XXXXXXX..XXXXXXX 100644 | ||
114 | --- a/system/meson.build | ||
115 | +++ b/system/meson.build | ||
116 | @@ -XXX,XX +XXX,XX @@ system_ss.add(files( | ||
117 | 'dirtylimit.c', | ||
118 | 'dma-helpers.c', | ||
119 | 'globals.c', | ||
120 | + 'ram-block-attribute.c', | ||
121 | 'memory_mapping.c', | ||
122 | 'qdev-monitor.c', | ||
123 | 'qtest.c', | ||
124 | diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c | ||
57 | new file mode 100644 | 125 | new file mode 100644 |
58 | index XXXXXXX..XXXXXXX | 126 | index XXXXXXX..XXXXXXX |
59 | --- /dev/null | 127 | --- /dev/null |
60 | +++ b/include/sysemu/guest-memfd-manager.h | 128 | +++ b/system/ram-block-attribute.c |
61 | @@ -XXX,XX +XXX,XX @@ | 129 | @@ -XXX,XX +XXX,XX @@ |
62 | +/* | 130 | +/* |
63 | + * QEMU guest memfd manager | 131 | + * QEMU ram block attribute |
64 | + * | 132 | + * |
65 | + * Copyright Intel | 133 | + * Copyright Intel |
66 | + * | 134 | + * |
67 | + * Author: | 135 | + * Author: |
68 | + * Chenyi Qiang <chenyi.qiang@intel.com> | 136 | + * Chenyi Qiang <chenyi.qiang@intel.com> |
69 | + * | 137 | + * |
70 | + * This work is licensed under the terms of the GNU GPL, version 2 or later. | 138 | + * This work is licensed under the terms of the GNU GPL, version 2 or later. |
71 | + * See the COPYING file in the top-level directory | 139 | + * See the COPYING file in the top-level directory |
72 | + * | 140 | + * |
73 | + */ | 141 | + */ |
74 | + | 142 | + |
75 | +#ifndef SYSEMU_GUEST_MEMFD_MANAGER_H | ||
76 | +#define SYSEMU_GUEST_MEMFD_MANAGER_H | ||
77 | + | ||
78 | +#include "sysemu/hostmem.h" | ||
79 | + | ||
80 | +#define TYPE_GUEST_MEMFD_MANAGER "guest-memfd-manager" | ||
81 | + | ||
82 | +OBJECT_DECLARE_TYPE(GuestMemfdManager, GuestMemfdManagerClass, GUEST_MEMFD_MANAGER) | ||
83 | + | ||
84 | +struct GuestMemfdManager { | ||
85 | + Object parent; | ||
86 | + | ||
87 | + /* Managed memory region. */ | ||
88 | + MemoryRegion *mr; | ||
89 | + | ||
90 | + /* | ||
91 | + * 1-setting of the bit represents the memory is populated (shared). | ||
92 | + */ | ||
93 | + int32_t bitmap_size; | ||
94 | + unsigned long *bitmap; | ||
95 | + | ||
96 | + /* block size and alignment */ | ||
97 | + uint64_t block_size; | ||
98 | + | ||
99 | + /* listeners to notify on populate/discard activity. */ | ||
100 | + QLIST_HEAD(, RamDiscardListener) rdl_list; | ||
101 | +}; | ||
102 | + | ||
103 | +struct GuestMemfdManagerClass { | ||
104 | + ObjectClass parent_class; | ||
105 | +}; | ||
106 | + | ||
107 | +#endif | ||
108 | diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c | ||
109 | new file mode 100644 | ||
110 | index XXXXXXX..XXXXXXX | ||
111 | --- /dev/null | ||
112 | +++ b/system/guest-memfd-manager.c | ||
113 | @@ -XXX,XX +XXX,XX @@ | ||
114 | +/* | ||
115 | + * QEMU guest memfd manager | ||
116 | + * | ||
117 | + * Copyright Intel | ||
118 | + * | ||
119 | + * Author: | ||
120 | + * Chenyi Qiang <chenyi.qiang@intel.com> | ||
121 | + * | ||
122 | + * This work is licensed under the terms of the GNU GPL, version 2 or later. | ||
123 | + * See the COPYING file in the top-level directory | ||
124 | + * | ||
125 | + */ | ||
126 | + | ||
127 | +#include "qemu/osdep.h" | 143 | +#include "qemu/osdep.h" |
128 | +#include "qemu/error-report.h" | 144 | +#include "qemu/error-report.h" |
129 | +#include "sysemu/guest-memfd-manager.h" | 145 | +#include "exec/ramblock.h" |
130 | + | 146 | + |
131 | +OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager, | 147 | +OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute, |
132 | + guest_memfd_manager, | 148 | + ram_block_attribute, |
133 | + GUEST_MEMFD_MANAGER, | 149 | + RAM_BLOCK_ATTRIBUTE, |
134 | + OBJECT, | 150 | + OBJECT, |
135 | + { TYPE_RAM_DISCARD_MANAGER }, | 151 | + { TYPE_PRIVATE_SHARED_MANAGER }, |
136 | + { }) | 152 | + { }) |
137 | + | 153 | + |
138 | +static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm, | 154 | +static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr) |
139 | + const MemoryRegionSection *section) | 155 | +{ |
140 | +{ | 156 | + /* |
141 | + const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); | 157 | + * Page conversion is requested in at least 4K-sized, 4K-aligned chunks,
142 | + uint64_t first_bit = section->offset_within_region / gmm->block_size; | 158 | + * so use the host page size as the granularity to track the memory attribute.
143 | + uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1; | 159 | + */ |
160 | + g_assert(attr && attr->mr && attr->mr->ram_block); | ||
161 | + g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size()); | ||
162 | + return attr->mr->ram_block->page_size; | ||
163 | +} | ||
164 | + | ||
165 | + | ||
166 | +static bool ram_block_attribute_psm_is_shared(const GenericStateManager *gsm, | ||
167 | + const MemoryRegionSection *section) | ||
168 | +{ | ||
169 | + const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); | ||
170 | + const int block_size = ram_block_attribute_get_block_size(attr); | ||
171 | + uint64_t first_bit = section->offset_within_region / block_size; | ||
172 | + uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1; | ||
144 | + unsigned long first_discard_bit; | 173 | + unsigned long first_discard_bit; |
145 | + | 174 | + |
146 | + first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit); | 175 | + first_discard_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit); |
147 | + return first_discard_bit > last_bit; | 176 | + return first_discard_bit > last_bit; |
148 | +} | 177 | +} |
149 | + | 178 | + |
150 | +typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg); | 179 | +typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s, void *arg); |
151 | + | 180 | + |
152 | +static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg) | 181 | +static int ram_block_attribute_notify_shared_cb(MemoryRegionSection *section, void *arg) |
153 | +{ | 182 | +{ |
154 | + RamDiscardListener *rdl = arg; | 183 | + StateChangeListener *scl = arg; |
155 | + | 184 | + |
156 | + return rdl->notify_populate(rdl, section); | 185 | + return scl->notify_to_state_set(scl, section); |
157 | +} | 186 | +} |
158 | + | 187 | + |
159 | +static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg) | 188 | +static int ram_block_attribute_notify_private_cb(MemoryRegionSection *section, void *arg) |
160 | +{ | 189 | +{ |
161 | + RamDiscardListener *rdl = arg; | 190 | + StateChangeListener *scl = arg; |
162 | + | 191 | + |
163 | + rdl->notify_discard(rdl, section); | 192 | + scl->notify_to_state_clear(scl, section); |
164 | + | ||
165 | + return 0; | 193 | + return 0; |
166 | +} | 194 | +} |
167 | + | 195 | + |
168 | +static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, | 196 | +static int ram_block_attribute_for_each_shared_section(const RamBlockAttribute *attr, |
169 | + MemoryRegionSection *section, | 197 | + MemoryRegionSection *section, |
170 | + void *arg, | 198 | + void *arg, |
171 | + guest_memfd_section_cb cb) | 199 | + ram_block_attribute_section_cb cb) |
172 | +{ | 200 | +{ |
173 | + unsigned long first_one_bit, last_one_bit; | 201 | + unsigned long first_bit, last_bit; |
174 | + uint64_t offset, size; | 202 | + uint64_t offset, size; |
203 | + const int block_size = ram_block_attribute_get_block_size(attr); | ||
175 | + int ret = 0; | 204 | + int ret = 0; |
176 | + | 205 | + |
177 | + first_one_bit = section->offset_within_region / gmm->block_size; | 206 | + first_bit = section->offset_within_region / block_size; |
178 | + first_one_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size, first_one_bit); | 207 | + first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, first_bit); |
179 | + | 208 | + |
180 | + while (first_one_bit < gmm->bitmap_size) { | 209 | + while (first_bit < attr->shared_bitmap_size) { |
181 | + MemoryRegionSection tmp = *section; | 210 | + MemoryRegionSection tmp = *section; |
182 | + | 211 | + |
183 | + offset = first_one_bit * gmm->block_size; | 212 | + offset = first_bit * block_size; |
184 | + last_one_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size, | 213 | + last_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size, |
185 | + first_one_bit + 1) - 1; | 214 | + first_bit + 1) - 1; |
186 | + size = (last_one_bit - first_one_bit + 1) * gmm->block_size; | 215 | + size = (last_bit - first_bit + 1) * block_size; |
187 | + | 216 | + |
188 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 217 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
189 | + break; | 218 | + break; |
190 | + } | 219 | + } |
191 | + | 220 | + |
192 | + ret = cb(&tmp, arg); | 221 | + ret = cb(&tmp, arg); |
193 | + if (ret) { | 222 | + if (ret) { |
223 | + error_report("%s: Failed to notify private shared listener: %s", __func__,
224 | + strerror(-ret)); | ||
194 | + break; | 225 | + break; |
195 | + } | 226 | + } |
196 | + | 227 | + |
197 | + first_one_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size, | 228 | + first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, |
198 | + last_one_bit + 2); | 229 | + last_bit + 2); |
199 | + } | 230 | + } |
200 | + | 231 | + |
201 | + return ret; | 232 | + return ret; |
202 | +} | 233 | +} |
203 | + | 234 | + |
204 | +static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm, | 235 | +static int ram_block_attribute_for_each_private_section(const RamBlockAttribute *attr, |
205 | + MemoryRegionSection *section, | 236 | + MemoryRegionSection *section, |
206 | + void *arg, | 237 | + void *arg, |
207 | + guest_memfd_section_cb cb) | 238 | + ram_block_attribute_section_cb cb) |
208 | +{ | 239 | +{ |
209 | + unsigned long first_zero_bit, last_zero_bit; | 240 | + unsigned long first_bit, last_bit; |
210 | + uint64_t offset, size; | 241 | + uint64_t offset, size; |
242 | + const int block_size = ram_block_attribute_get_block_size(attr); | ||
211 | + int ret = 0; | 243 | + int ret = 0; |
212 | + | 244 | + |
213 | + first_zero_bit = section->offset_within_region / gmm->block_size; | 245 | + first_bit = section->offset_within_region / block_size; |
214 | + first_zero_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size, | 246 | + first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size, |
215 | + first_zero_bit); | 247 | + first_bit); |
216 | + | 248 | + |
217 | + while (first_zero_bit < gmm->bitmap_size) { | 249 | + while (first_bit < attr->shared_bitmap_size) { |
218 | + MemoryRegionSection tmp = *section; | 250 | + MemoryRegionSection tmp = *section; |
219 | + | 251 | + |
220 | + offset = first_zero_bit * gmm->block_size; | 252 | + offset = first_bit * block_size; |
221 | + last_zero_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size, | 253 | + last_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, |
222 | + first_zero_bit + 1) - 1; | 254 | + first_bit + 1) - 1; |
223 | + size = (last_zero_bit - first_zero_bit + 1) * gmm->block_size; | 255 | + size = (last_bit - first_bit + 1) * block_size; |
224 | + | 256 | + |
225 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 257 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
226 | + break; | 258 | + break; |
227 | + } | 259 | + } |
228 | + | 260 | + |
229 | + ret = cb(&tmp, arg); | 261 | + ret = cb(&tmp, arg); |
230 | + if (ret) { | 262 | + if (ret) { |
263 | + error_report("%s: Failed to notify RAM discard listener: %s", __func__, | ||
264 | + strerror(-ret)); | ||
231 | + break; | 265 | + break; |
232 | + } | 266 | + } |
233 | + | 267 | + |
234 | + first_zero_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size, | 268 | + first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size, |
235 | + last_zero_bit + 2); | 269 | + last_bit + 2); |
236 | + } | 270 | + } |
237 | + | 271 | + |
238 | + return ret; | 272 | + return ret; |
239 | +} | 273 | +} |
240 | + | 274 | + |
241 | +static uint64_t guest_memfd_rdm_get_min_granularity(const RamDiscardManager *rdm, | 275 | +static uint64_t ram_block_attribute_psm_get_min_granularity(const GenericStateManager *gsm, |
242 | + const MemoryRegion *mr) | 276 | + const MemoryRegion *mr) |
243 | +{ | 277 | +{ |
244 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); | 278 | + const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); |
245 | + | 279 | + |
246 | + g_assert(mr == gmm->mr); | 280 | + g_assert(mr == attr->mr); |
247 | + return gmm->block_size; | 281 | + return ram_block_attribute_get_block_size(attr); |
248 | +} | 282 | +} |
249 | + | 283 | + |
250 | +static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm, | 284 | +static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm, |
251 | + RamDiscardListener *rdl, | 285 | + StateChangeListener *scl, |
252 | + MemoryRegionSection *section) | 286 | + MemoryRegionSection *section) |
253 | +{ | 287 | +{ |
254 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); | 288 | + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); |
289 | + PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); | ||
255 | + int ret; | 290 | + int ret; |
256 | + | 291 | + |
257 | + g_assert(section->mr == gmm->mr); | 292 | + g_assert(section->mr == attr->mr); |
258 | + rdl->section = memory_region_section_new_copy(section); | 293 | + scl->section = memory_region_section_new_copy(section); |
259 | + | 294 | + |
260 | + QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next); | 295 | + QLIST_INSERT_HEAD(&attr->psl_list, psl, next); |
261 | + | 296 | + |
262 | + ret = guest_memfd_for_each_populated_section(gmm, section, rdl, | 297 | + ret = ram_block_attribute_for_each_shared_section(attr, section, scl, |
263 | + guest_memfd_notify_populate_cb); | 298 | + ram_block_attribute_notify_shared_cb); |
264 | + if (ret) { | 299 | + if (ret) { |
265 | + error_report("%s: Failed to register RAM discard listener: %s", __func__, | 300 | + error_report("%s: Failed to register RAM discard listener: %s", __func__, |
266 | + strerror(-ret)); | 301 | + strerror(-ret)); |
267 | + } | 302 | + } |
268 | +} | 303 | +} |
269 | + | 304 | + |
270 | +static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm, | 305 | +static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm, |
271 | + RamDiscardListener *rdl) | 306 | + StateChangeListener *scl) |
272 | +{ | 307 | +{ |
273 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); | 308 | + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); |
309 | + PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); | ||
274 | + int ret; | 310 | + int ret; |
275 | + | 311 | + |
276 | + g_assert(rdl->section); | 312 | + g_assert(scl->section); |
277 | + g_assert(rdl->section->mr == gmm->mr); | 313 | + g_assert(scl->section->mr == attr->mr); |
278 | + | 314 | + |
279 | + ret = guest_memfd_for_each_populated_section(gmm, rdl->section, rdl, | 315 | + ret = ram_block_attribute_for_each_shared_section(attr, scl->section, scl, |
280 | + guest_memfd_notify_discard_cb); | 316 | + ram_block_attribute_notify_private_cb); |
281 | + if (ret) { | 317 | + if (ret) { |
282 | + error_report("%s: Failed to unregister RAM discard listener: %s", __func__, | 318 | + error_report("%s: Failed to unregister RAM discard listener: %s", __func__, |
283 | + strerror(-ret)); | 319 | + strerror(-ret)); |
284 | + } | 320 | + } |
285 | + | 321 | + |
286 | + memory_region_section_free_copy(rdl->section); | 322 | + memory_region_section_free_copy(scl->section); |
287 | + rdl->section = NULL; | 323 | + scl->section = NULL; |
288 | + QLIST_REMOVE(rdl, next); | 324 | + QLIST_REMOVE(psl, next); |
289 | + | 325 | +} |
290 | +} | 326 | + |
291 | + | 327 | +typedef struct RamBlockAttributeReplayData { |
292 | +typedef struct GuestMemfdReplayData { | 328 | + ReplayStateChange fn; |
293 | + void *fn; | ||
294 | + void *opaque; | 329 | + void *opaque; |
295 | +} GuestMemfdReplayData; | 330 | +} RamBlockAttributeReplayData; |
296 | + | 331 | + |
297 | +static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg) | 332 | +static int ram_block_attribute_psm_replay_cb(MemoryRegionSection *section, void *arg) |
298 | +{ | 333 | +{ |
299 | + struct GuestMemfdReplayData *data = arg; | 334 | + RamBlockAttributeReplayData *data = arg; |
300 | + ReplayRamPopulate replay_fn = data->fn; | 335 | + |
301 | + | 336 | + return data->fn(section, data->opaque); |
302 | + return replay_fn(section, data->opaque); | 337 | +} |
303 | +} | 338 | + |
304 | + | 339 | +static int ram_block_attribute_psm_replay_on_shared(const GenericStateManager *gsm, |
305 | +static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, | 340 | + MemoryRegionSection *section, |
306 | + MemoryRegionSection *section, | 341 | + ReplayStateChange replay_fn, |
307 | + ReplayRamPopulate replay_fn, | 342 | + void *opaque) |
308 | + void *opaque) | 343 | +{ |
309 | +{ | 344 | + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); |
310 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); | 345 | + RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque }; |
311 | + struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; | 346 | + |
312 | + | 347 | + g_assert(section->mr == attr->mr); |
313 | + g_assert(section->mr == gmm->mr); | 348 | + return ram_block_attribute_for_each_shared_section(attr, section, &data, |
314 | + return guest_memfd_for_each_populated_section(gmm, section, &data, | 349 | + ram_block_attribute_psm_replay_cb); |
315 | + guest_memfd_rdm_replay_populated_cb); | 350 | +} |
316 | +} | 351 | + |
317 | + | 352 | +static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *gsm, |
318 | +static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg) | 353 | + MemoryRegionSection *section, |
319 | +{ | 354 | + ReplayStateChange replay_fn, |
320 | + struct GuestMemfdReplayData *data = arg; | 355 | + void *opaque) |
321 | + ReplayRamDiscard replay_fn = data->fn; | 356 | +{ |
322 | + | 357 | + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); |
323 | + replay_fn(section, data->opaque); | 358 | + RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque }; |
324 | + | 359 | + |
325 | + return 0; | 360 | + g_assert(section->mr == attr->mr); |
326 | +} | 361 | + return ram_block_attribute_for_each_private_section(attr, section, &data, |
327 | + | 362 | + ram_block_attribute_psm_replay_cb); |
328 | +static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, | 363 | +} |
329 | + MemoryRegionSection *section, | 364 | + |
330 | + ReplayRamDiscard replay_fn, | 365 | +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr) |
331 | + void *opaque) | 366 | +{ |
332 | +{ | 367 | + uint64_t shared_bitmap_size; |
333 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); | 368 | + const int block_size = qemu_real_host_page_size(); |
334 | + struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; | 369 | + int ret; |
335 | + | 370 | + |
336 | + g_assert(section->mr == gmm->mr); | 371 | + shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size; |
337 | + guest_memfd_for_each_discarded_section(gmm, section, &data, | 372 | + |
338 | + guest_memfd_rdm_replay_discarded_cb); | 373 | + attr->mr = mr; |
339 | +} | 374 | + ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr)); |
340 | + | 375 | + if (ret) { |
341 | +static void guest_memfd_manager_init(Object *obj) | 376 | + return ret; |
342 | +{ | 377 | + } |
343 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj); | 378 | + attr->shared_bitmap_size = shared_bitmap_size; |
344 | + | 379 | + attr->shared_bitmap = bitmap_new(shared_bitmap_size); |
345 | + QLIST_INIT(&gmm->rdl_list); | 380 | + |
346 | +} | 381 | + return ret; |
347 | + | 382 | +} |
348 | +static void guest_memfd_manager_finalize(Object *obj) | 383 | + |
349 | +{ | 384 | +void ram_block_attribute_unrealize(RamBlockAttribute *attr) |
350 | + g_free(GUEST_MEMFD_MANAGER(obj)->bitmap); | 385 | +{ |
351 | +} | 386 | + g_free(attr->shared_bitmap); |
352 | + | 387 | + memory_region_set_generic_state_manager(attr->mr, NULL); |
353 | +static void guest_memfd_manager_class_init(ObjectClass *oc, void *data) | 388 | +} |
354 | +{ | 389 | + |
355 | + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc); | 390 | +static void ram_block_attribute_init(Object *obj) |
356 | + | 391 | +{ |
357 | + rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity; | 392 | + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj); |
358 | + rdmc->register_listener = guest_memfd_rdm_register_listener; | 393 | + |
359 | + rdmc->unregister_listener = guest_memfd_rdm_unregister_listener; | 394 | + QLIST_INIT(&attr->psl_list); |
360 | + rdmc->is_populated = guest_memfd_rdm_is_populated; | 395 | +} |
361 | + rdmc->replay_populated = guest_memfd_rdm_replay_populated; | 396 | + |
362 | + rdmc->replay_discarded = guest_memfd_rdm_replay_discarded; | 397 | +static void ram_block_attribute_finalize(Object *obj) |
363 | +} | 398 | +{ |
364 | diff --git a/system/meson.build b/system/meson.build | 399 | +} |
365 | index XXXXXXX..XXXXXXX 100644 | 400 | + |
366 | --- a/system/meson.build | 401 | +static void ram_block_attribute_class_init(ObjectClass *oc, void *data) |
367 | +++ b/system/meson.build | 402 | +{ |
368 | @@ -XXX,XX +XXX,XX @@ system_ss.add(files( | 403 | + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc); |
369 | 'dirtylimit.c', | 404 | + |
370 | 'dma-helpers.c', | 405 | + gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity; |
371 | 'globals.c', | 406 | + gsmc->register_listener = ram_block_attribute_psm_register_listener; |
372 | + 'guest-memfd-manager.c', | 407 | + gsmc->unregister_listener = ram_block_attribute_psm_unregister_listener; |
373 | 'memory_mapping.c', | 408 | + gsmc->is_state_set = ram_block_attribute_psm_is_shared; |
374 | 'qdev-monitor.c', | 409 | + gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared; |
375 | 'qtest.c', | 410 | + gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private; |
411 | +} | ||
376 | -- | 412 | -- |
377 | 2.43.5 | 413 | 2.43.5 |
1 | Introduce a new state_change() callback in GuestMemfdManagerClass to | 1 | A new state_change() callback is introduced in PrivateSharedManagerClass |
---|---|---|---|
2 | efficiently notify all registered RamDiscardListeners, including VFIO | 2 | to efficiently notify all registered PrivateSharedListeners, including |
3 | listeners about the memory conversion events in guest_memfd. The | 3 | VFIO listeners, about memory conversion events in guest_memfd. The VFIO |
4 | existing VFIO listener can dynamically DMA map/unmap the shared pages | 4 | listener can dynamically DMA map/unmap shared pages based on conversion |
5 | based on conversion types: | 5 | types: |
6 | - For conversions from shared to private, the VFIO system ensures the | 6 | - For conversions from shared to private, the VFIO system ensures the |
7 | discarding of shared mapping from the IOMMU. | 7 | discarding of shared mapping from the IOMMU. |
8 | - For conversions from private to shared, it triggers the population of | 8 | - For conversions from private to shared, it triggers the population of |
9 | the shared mapping into the IOMMU. | 9 | the shared mapping into the IOMMU. |
10 | 10 | ||
11 | Additionally, there could be some special conversion requests: | 11 | Additionally, special conversion requests are handled as follows: |
12 | - When a conversion request is made for a page already in the desired | 12 | - If a conversion request is made for a page already in the desired |
13 | state, the helper simply returns success. | 13 | state, the helper simply returns success. |
14 | - For requests involving a range partially in the desired state, only | 14 | - For requests involving a range partially in the desired state, only |
15 | the necessary segments are converted, ensuring the entire range | 15 | the necessary segments are converted, ensuring efficient compliance |
16 | complies with the request efficiently. | 16 | with the request. In this case, fall back to "1 block at a time" |
17 | - In scenarios where a conversion request is declined by other systems, | 17 | handling so that the range passed to notify_to_private/shared() is |
18 | such as a failure from VFIO during notify_populate(), the helper will | 18 | always in the desired state. |
19 | roll back the request, maintaining consistency. | 19 | - If a conversion request is declined by other systems, such as a |
20 | failure from VFIO during notify_to_shared(), the helper rolls back the | ||
21 | request to maintain consistency. As for notify_to_private() handling, | ||
22 | failure in VFIO is unexpected, so no error check is performed. | ||
23 | |||
24 | Note that the bitmap status is updated before callbacks, allowing | ||
25 | listeners to handle memory based on the latest status. | ||
26 | |||
27 | Opportunistically, introduce a helper to trigger the state_change() | ||
28 | callback of the class. | ||
20 | 29 | ||
21 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | 30 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> |
22 | --- | 31 | --- |
23 | include/sysemu/guest-memfd-manager.h | 3 + | 32 | Changes in v4: |
24 | system/guest-memfd-manager.c | 144 +++++++++++++++++++++++++++ | 33 | - Add the state_change() callback in PrivateSharedManagerClass |
25 | 2 files changed, 147 insertions(+) | 34 | instead of the RamBlockAttribute. |
26 | 35 | ||
27 | diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h | 36 | Changes in v3: |
37 | - Move the bitmap update before notifier callbacks. | ||
38 | - Call the notifier callbacks directly in notify_discard/populate() | ||
39 | with the expectation that the request memory range is in the | ||
40 | desired attribute. | ||
41 | - For the case that only partial range in the desire status, handle | ||
42 | the range with block_size granularity for ease of rollback | ||
43 | (https://lore.kernel.org/qemu-devel/812768d7-a02d-4b29-95f3-fb7a125cf54e@redhat.com/) | ||
44 | |||
45 | Changes in v2: | ||
46 | - Do the alignment changes due to the rename to MemoryAttributeManager | ||
47 | - Move the state_change() helper definition in this patch. | ||
48 | --- | ||
49 | include/exec/memory.h | 7 ++ | ||
50 | system/memory.c | 10 ++ | ||
51 | system/ram-block-attribute.c | 191 +++++++++++++++++++++++++++++++++++ | ||
52 | 3 files changed, 208 insertions(+) | ||
53 | |||
54 | diff --git a/include/exec/memory.h b/include/exec/memory.h | ||
28 | index XXXXXXX..XXXXXXX 100644 | 55 | index XXXXXXX..XXXXXXX 100644 |
29 | --- a/include/sysemu/guest-memfd-manager.h | 56 | --- a/include/exec/memory.h |
30 | +++ b/include/sysemu/guest-memfd-manager.h | 57 | +++ b/include/exec/memory.h |
31 | @@ -XXX,XX +XXX,XX @@ struct GuestMemfdManager { | 58 | @@ -XXX,XX +XXX,XX @@ struct PrivateSharedListener { |
32 | 59 | struct PrivateSharedManagerClass { | |
33 | struct GuestMemfdManagerClass { | 60 | /* private */ |
34 | ObjectClass parent_class; | 61 | GenericStateManagerClass parent_class; |
35 | + | 62 | + |
36 | + int (*state_change)(GuestMemfdManager *gmm, uint64_t offset, uint64_t size, | 63 | + int (*state_change)(PrivateSharedManager *mgr, uint64_t offset, uint64_t size, |
37 | + bool shared_to_private); | 64 | + bool to_private); |
38 | }; | 65 | }; |
39 | 66 | ||
40 | #endif | 67 | static inline void private_shared_listener_init(PrivateSharedListener *psl, |
41 | diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c | 68 | @@ -XXX,XX +XXX,XX @@ static inline void private_shared_listener_init(PrivateSharedListener *psl, |
69 | state_change_listener_init(&psl->scl, populate_fn, discard_fn); | ||
70 | } | ||
71 | |||
72 | +int private_shared_manager_state_change(PrivateSharedManager *mgr, | ||
73 | + uint64_t offset, uint64_t size, | ||
74 | + bool to_private); | ||
75 | + | ||
76 | /** | ||
77 | * memory_get_xlat_addr: Extract addresses from a TLB entry | ||
78 | * | ||
79 | diff --git a/system/memory.c b/system/memory.c | ||
42 | index XXXXXXX..XXXXXXX 100644 | 80 | index XXXXXXX..XXXXXXX 100644 |
43 | --- a/system/guest-memfd-manager.c | 81 | --- a/system/memory.c |
44 | +++ b/system/guest-memfd-manager.c | 82 | +++ b/system/memory.c |
45 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, | 83 | @@ -XXX,XX +XXX,XX @@ void generic_state_manager_unregister_listener(GenericStateManager *gsm, |
46 | guest_memfd_rdm_replay_discarded_cb); | 84 | gsmc->unregister_listener(gsm, scl); |
47 | } | 85 | } |
48 | 86 | ||
49 | +static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm, | 87 | +int private_shared_manager_state_change(PrivateSharedManager *mgr, |
50 | + uint64_t offset, uint64_t size) | 88 | + uint64_t offset, uint64_t size, |
51 | +{ | 89 | + bool to_private) |
52 | + MemoryRegion *mr = gmm->mr; | 90 | +{ |
91 | + PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_GET_CLASS(mgr); | ||
92 | + | ||
93 | + g_assert(psmc->state_change); | ||
94 | + return psmc->state_change(mgr, offset, size, to_private); | ||
95 | +} | ||
96 | + | ||
97 | /* Called with rcu_read_lock held. */ | ||
98 | bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, | ||
99 | ram_addr_t *ram_addr, bool *read_only, | ||
100 | diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c | ||
101 | index XXXXXXX..XXXXXXX 100644 | ||
102 | --- a/system/ram-block-attribute.c | ||
103 | +++ b/system/ram-block-attribute.c | ||
104 | @@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_psm_replay_on_private(const GenericStateManager * | ||
105 | ram_block_attribute_psm_replay_cb); | ||
106 | } | ||
107 | |||
108 | +static bool ram_block_attribute_is_valid_range(RamBlockAttribute *attr, | ||
109 | + uint64_t offset, uint64_t size) | ||
110 | +{ | ||
111 | + MemoryRegion *mr = attr->mr; | ||
53 | + | 112 | + |
54 | + g_assert(mr); | 113 | + g_assert(mr); |
55 | + | 114 | + |
56 | + uint64_t region_size = memory_region_size(mr); | 115 | + uint64_t region_size = memory_region_size(mr); |
57 | + if (!QEMU_IS_ALIGNED(offset, gmm->block_size)) { | 116 | + int block_size = ram_block_attribute_get_block_size(attr); |
117 | + | ||
118 | + if (!QEMU_IS_ALIGNED(offset, block_size)) { | ||
58 | + return false; | 119 | + return false; |
59 | + } | 120 | + } |
60 | + if (offset + size < offset || !size) { | 121 | + if (offset + size < offset || !size) { |
61 | + return false; | 122 | + return false; |
62 | + } | 123 | + } |
63 | + if (offset >= region_size || offset + size > region_size) { | 124 | + if (offset >= region_size || offset + size > region_size) { |
64 | + return false; | 125 | + return false; |
65 | + } | 126 | + } |
66 | + return true; | 127 | + return true; |
67 | +} | 128 | +} |
68 | + | 129 | + |
69 | +static void guest_memfd_notify_discard(GuestMemfdManager *gmm, | 130 | +static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, |
70 | + uint64_t offset, uint64_t size) | 131 | + uint64_t offset, uint64_t size) |
71 | +{ | 132 | +{ |
72 | + RamDiscardListener *rdl; | 133 | + PrivateSharedListener *psl; |
73 | + | 134 | + |
74 | + QLIST_FOREACH(rdl, &gmm->rdl_list, next) { | 135 | + QLIST_FOREACH(psl, &attr->psl_list, next) { |
75 | + MemoryRegionSection tmp = *rdl->section; | 136 | + StateChangeListener *scl = &psl->scl; |
137 | + MemoryRegionSection tmp = *scl->section; | ||
76 | + | 138 | + |
77 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 139 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
78 | + continue; | 140 | + continue; |
79 | + } | 141 | + } |
80 | + | 142 | + scl->notify_to_state_clear(scl, &tmp); |
81 | + guest_memfd_for_each_populated_section(gmm, &tmp, rdl, | 143 | + } |
82 | + guest_memfd_notify_discard_cb); | 144 | +} |
83 | + } | 145 | + |
84 | +} | 146 | +static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, |
85 | + | 147 | + uint64_t offset, uint64_t size) |
86 | + | 148 | +{ |
87 | +static int guest_memfd_notify_populate(GuestMemfdManager *gmm, | 149 | + PrivateSharedListener *psl, *psl2; |
88 | + uint64_t offset, uint64_t size) | ||
89 | +{ | ||
90 | + RamDiscardListener *rdl, *rdl2; | ||
91 | + int ret = 0; | 150 | + int ret = 0; |
92 | + | 151 | + |
93 | + QLIST_FOREACH(rdl, &gmm->rdl_list, next) { | 152 | + QLIST_FOREACH(psl, &attr->psl_list, next) { |
94 | + MemoryRegionSection tmp = *rdl->section; | 153 | + StateChangeListener *scl = &psl->scl; |
154 | + MemoryRegionSection tmp = *scl->section; | ||
95 | + | 155 | + |
96 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 156 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
97 | + continue; | 157 | + continue; |
98 | + } | 158 | + } |
99 | + | 159 | + ret = scl->notify_to_state_set(scl, &tmp); |
100 | + ret = guest_memfd_for_each_discarded_section(gmm, &tmp, rdl, | ||
101 | + guest_memfd_notify_populate_cb); | ||
102 | + if (ret) { | 160 | + if (ret) { |
103 | + break; | 161 | + break; |
104 | + } | 162 | + } |
105 | + } | 163 | + } |
106 | + | 164 | + |
107 | + if (ret) { | 165 | + if (ret) { |
108 | + /* Notify all already-notified listeners. */ | 166 | + /* Notify all already-notified listeners. */ |
109 | + QLIST_FOREACH(rdl2, &gmm->rdl_list, next) { | 167 | + QLIST_FOREACH(psl2, &attr->psl_list, next) { |
110 | + MemoryRegionSection tmp = *rdl2->section; | 168 | + StateChangeListener *scl2 = &psl2->scl; |
111 | + | 169 | + MemoryRegionSection tmp = *scl2->section; |
112 | + if (rdl2 == rdl) { | 170 | + |
171 | + if (psl == psl2) { | ||
113 | + break; | 172 | + break; |
114 | + } | 173 | + } |
115 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { | 174 | + if (!memory_region_section_intersect_range(&tmp, offset, size)) { |
116 | + continue; | 175 | + continue; |
117 | + } | 176 | + } |
118 | + | 177 | + scl2->notify_to_state_clear(scl2, &tmp); |
119 | + guest_memfd_for_each_discarded_section(gmm, &tmp, rdl2, | ||
120 | + guest_memfd_notify_discard_cb); | ||
121 | + } | 178 | + } |
122 | + } | 179 | + } |
123 | + return ret; | 180 | + return ret; |
124 | +} | 181 | +} |
125 | + | 182 | + |
126 | +static bool guest_memfd_is_range_populated(GuestMemfdManager *gmm, | 183 | +static bool ram_block_attribute_is_range_shared(RamBlockAttribute *attr, |
127 | + uint64_t offset, uint64_t size) | 184 | + uint64_t offset, uint64_t size) |
128 | +{ | 185 | +{ |
129 | + const unsigned long first_bit = offset / gmm->block_size; | 186 | + const int block_size = ram_block_attribute_get_block_size(attr); |
130 | + const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1; | 187 | + const unsigned long first_bit = offset / block_size; |
188 | + const unsigned long last_bit = first_bit + (size / block_size) - 1; | ||
131 | + unsigned long found_bit; | 189 | + unsigned long found_bit; |
132 | + | 190 | + |
133 | + /* We fake a shorter bitmap to avoid searching too far. */ | 191 | + /* We fake a shorter bitmap to avoid searching too far. */ |
134 | + found_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit); | 192 | + found_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit); |
135 | + return found_bit > last_bit; | 193 | + return found_bit > last_bit; |
136 | +} | 194 | +} |
137 | + | 195 | + |
138 | +static bool guest_memfd_is_range_discarded(GuestMemfdManager *gmm, | 196 | +static bool ram_block_attribute_is_range_private(RamBlockAttribute *attr, |
139 | + uint64_t offset, uint64_t size) | 197 | + uint64_t offset, uint64_t size) |
140 | +{ | 198 | +{ |
141 | + const unsigned long first_bit = offset / gmm->block_size; | 199 | + const int block_size = ram_block_attribute_get_block_size(attr); |
142 | + const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1; | 200 | + const unsigned long first_bit = offset / block_size; |
201 | + const unsigned long last_bit = first_bit + (size / block_size) - 1; | ||
143 | + unsigned long found_bit; | 202 | + unsigned long found_bit; |
144 | + | 203 | + |
145 | + /* We fake a shorter bitmap to avoid searching too far. */ | 204 | + /* We fake a shorter bitmap to avoid searching too far. */ |
146 | + found_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit); | 205 | + found_bit = find_next_bit(attr->shared_bitmap, last_bit + 1, first_bit); |
147 | + return found_bit > last_bit; | 206 | + return found_bit > last_bit; |
148 | +} | 207 | +} |
149 | + | 208 | + |
150 | +static int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset, | 209 | +static int ram_block_attribute_psm_state_change(PrivateSharedManager *mgr, uint64_t offset, |
151 | + uint64_t size, bool shared_to_private) | 210 | + uint64_t size, bool to_private) |
152 | +{ | 211 | +{ |
212 | + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(mgr); | ||
213 | + const int block_size = ram_block_attribute_get_block_size(attr); | ||
214 | + const unsigned long first_bit = offset / block_size; | ||
215 | + const unsigned long nbits = size / block_size; | ||
216 | + const uint64_t end = offset + size; | ||
217 | + unsigned long bit; | ||
218 | + uint64_t cur; | ||
153 | + int ret = 0; | 219 | + int ret = 0; |
154 | + | 220 | + |
155 | + if (!guest_memfd_is_valid_range(gmm, offset, size)) { | 221 | + if (!ram_block_attribute_is_valid_range(attr, offset, size)) { |
156 | + error_report("%s, invalid range: offset 0x%lx, size 0x%lx", | 222 | + error_report("%s, invalid range: offset 0x%lx, size 0x%lx", |
157 | + __func__, offset, size); | 223 | + __func__, offset, size); |
158 | + return -1; | 224 | + return -1; |
159 | + } | 225 | + } |
160 | + | 226 | + |
161 | + if ((shared_to_private && guest_memfd_is_range_discarded(gmm, offset, size)) || | 227 | + if (to_private) { |
162 | + (!shared_to_private && guest_memfd_is_range_populated(gmm, offset, size))) { | 228 | + if (ram_block_attribute_is_range_private(attr, offset, size)) { |
163 | + return 0; | 229 | + /* Already private */ |
164 | + } | 230 | + } else if (!ram_block_attribute_is_range_shared(attr, offset, size)) { |
165 | + | 231 | + /* Unexpected mixture: process individual blocks */ |
166 | + if (shared_to_private) { | 232 | + for (cur = offset; cur < end; cur += block_size) { |
167 | + guest_memfd_notify_discard(gmm, offset, size); | 233 | + bit = cur / block_size; |
234 | + if (!test_bit(bit, attr->shared_bitmap)) { | ||
235 | + continue; | ||
236 | + } | ||
237 | + clear_bit(bit, attr->shared_bitmap); | ||
238 | + ram_block_attribute_notify_to_private(attr, cur, block_size); | ||
239 | + } | ||
240 | + } else { | ||
241 | + /* Completely shared */ | ||
242 | + bitmap_clear(attr->shared_bitmap, first_bit, nbits); | ||
243 | + ram_block_attribute_notify_to_private(attr, offset, size); | ||
244 | + } | ||
168 | + } else { | 245 | + } else { |
169 | + ret = guest_memfd_notify_populate(gmm, offset, size); | 246 | + if (ram_block_attribute_is_range_shared(attr, offset, size)) { |
170 | + } | 247 | + /* Already shared */ |
171 | + | 248 | + } else if (!ram_block_attribute_is_range_private(attr, offset, size)) { |
172 | + if (!ret) { | 249 | + /* Unexpected mixture: process individual blocks */ |
173 | + unsigned long first_bit = offset / gmm->block_size; | 250 | + unsigned long *modified_bitmap = bitmap_new(nbits); |
174 | + unsigned long nbits = size / gmm->block_size; | 251 | + |
175 | + | 252 | + for (cur = offset; cur < end; cur += block_size) { |
176 | + g_assert((first_bit + nbits) <= gmm->bitmap_size); | 253 | + bit = cur / block_size; |
177 | + | 254 | + if (test_bit(bit, attr->shared_bitmap)) { |
178 | + if (shared_to_private) { | 255 | + continue; |
179 | + bitmap_clear(gmm->bitmap, first_bit, nbits); | 256 | + } |
257 | + set_bit(bit, attr->shared_bitmap); | ||
258 | + ret = ram_block_attribute_notify_to_shared(attr, cur, block_size); | ||
259 | + if (!ret) { | ||
260 | + set_bit(bit - first_bit, modified_bitmap); | ||
261 | + continue; | ||
262 | + } | ||
263 | + clear_bit(bit, attr->shared_bitmap); | ||
264 | + break; | ||
265 | + } | ||
266 | + | ||
267 | + if (ret) { | ||
268 | + /* | ||
269 | + * Very unexpected: something went wrong. Revert to the old | ||
270 | + * state, marking only the blocks as private that we converted | ||
271 | + * to shared. | ||
272 | + */ | ||
273 | + for (cur = offset; cur < end; cur += block_size) { | ||
274 | + bit = cur / block_size; | ||
275 | + if (!test_bit(bit - first_bit, modified_bitmap)) { | ||
276 | + continue; | ||
277 | + } | ||
278 | + assert(test_bit(bit, attr->shared_bitmap)); | ||
279 | + clear_bit(bit, attr->shared_bitmap); | ||
280 | + ram_block_attribute_notify_to_private(attr, cur, block_size); | ||
281 | + } | ||
282 | + } | ||
283 | + g_free(modified_bitmap); | ||
180 | + } else { | 284 | + } else { |
181 | + bitmap_set(gmm->bitmap, first_bit, nbits); | 285 | + /* Completely private */ |
182 | + } | 286 | + bitmap_set(attr->shared_bitmap, first_bit, nbits); |
183 | + | 287 | + ret = ram_block_attribute_notify_to_shared(attr, offset, size); |
184 | + return 0; | 288 | + if (ret) { |
289 | + bitmap_clear(attr->shared_bitmap, first_bit, nbits); | ||
290 | + } | ||
291 | + } | ||
185 | + } | 292 | + } |
186 | + | 293 | + |
187 | + return ret; | 294 | + return ret; |
188 | +} | 295 | +} |
189 | + | 296 | + |
190 | static void guest_memfd_manager_init(Object *obj) | 297 | int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr) |
191 | { | 298 | { |
192 | GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj); | 299 | uint64_t shared_bitmap_size; |
193 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_manager_finalize(Object *obj) | 300 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_finalize(Object *obj) |
194 | 301 | static void ram_block_attribute_class_init(ObjectClass *oc, void *data) | |
195 | static void guest_memfd_manager_class_init(ObjectClass *oc, void *data) | ||
196 | { | 302 | { |
197 | + GuestMemfdManagerClass *gmmc = GUEST_MEMFD_MANAGER_CLASS(oc); | 303 | GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc); |
198 | RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc); | 304 | + PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_CLASS(oc); |
199 | 305 | ||
200 | + gmmc->state_change = guest_memfd_state_change; | 306 | gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity; |
201 | + | 307 | gsmc->register_listener = ram_block_attribute_psm_register_listener; |
202 | rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity; | 308 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_class_init(ObjectClass *oc, void *data) |
203 | rdmc->register_listener = guest_memfd_rdm_register_listener; | 309 | gsmc->is_state_set = ram_block_attribute_psm_is_shared; |
204 | rdmc->unregister_listener = guest_memfd_rdm_unregister_listener; | 310 | gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared; |
311 | gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private; | ||
312 | + psmc->state_change = ram_block_attribute_psm_state_change; | ||
313 | } | ||
205 | -- | 314 | -- |
206 | 2.43.5 | 315 | 2.43.5 | diff view generated by jsdifflib |
1 | Introduce the realize()/unrealize() callbacks to initialize/uninitialize | 1 | A new field, ram_block_attribute, is introduced in RAMBlock to link to a |
---|---|---|---|
2 | the new guest_memfd_manager object and register/unregister it in the | 2 | RamBlockAttribute object. This change centralizes all guest_memfd state |
3 | target MemoryRegion. | 3 | information (such as fd and shared_bitmap) within a RAMBlock, |
4 | simplifying management. | ||
4 | 5 | ||
5 | Guest_memfd was initially set to shared until the commit bd3bcf6962 | 6 | The realize()/unrealize() helpers are used to initialize/uninitialize |
6 | ("kvm/memory: Make memory type private by default if it has guest memfd | 7 | the RamBlockAttribute object. The object is registered/unregistered in |
7 | backend"). To align with this change, the default state in | 8 | the target RAMBlock's MemoryRegion when creating guest_memfd. |
8 | guest_memfd_manager is set to private. (The bitmap is cleared to 0). | 9 | |
9 | Additionally, setting the default to private can also reduce the | 10 | Additionally, use the private_shared_manager_state_change() helper to |
10 | overhead of mapping shared pages into IOMMU by VFIO during the bootup stage. | 11 | notify the registered PrivateSharedListener of these changes. |
11 | 12 | ||
12 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | 13 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> |
13 | --- | 14 | --- |
14 | include/sysemu/guest-memfd-manager.h | 27 +++++++++++++++++++++++++++ | 15 | Changes in v4: |
15 | system/guest-memfd-manager.c | 28 +++++++++++++++++++++++++++- | 16 | - Remove the replay operations for attribute changes which will be |
16 | system/physmem.c | 7 +++++++ | 17 | handled in a listener in the following patches. |
17 | 3 files changed, 61 insertions(+), 1 deletion(-) | 18 | - Add a comment in the error path of realize() as a reminder for the |
19 | future development of the unified error path. | ||
18 | 20 | ||
19 | diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h | 21 | Changes in v3: |
22 | - Use ram_discard_manager_reply_populated/discarded() to set the | ||
23 | memory attribute and add the undo support if state_change() | ||
24 | failed. | ||
25 | - Didn't add Reviewed-by from Alexey due to the new changes in this | ||
26 | commit. | ||
27 | |||
28 | Changes in v2: | ||
29 | - Introduce a new field memory_attribute_manager in RAMBlock. | ||
30 | - Move the state_change() handling during page conversion in this patch. | ||
31 | - Undo what we did if it fails to set. | ||
32 | - Change the order of close(guest_memfd) and memory_attribute_manager cleanup. | ||
33 | --- | ||
34 | accel/kvm/kvm-all.c | 9 +++++++++ | ||
35 | include/exec/ramblock.h | 1 + | ||
36 | system/physmem.c | 16 ++++++++++++++++ | ||
37 | 3 files changed, 26 insertions(+) | ||
38 | |||
39 | diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c | ||
20 | index XXXXXXX..XXXXXXX 100644 | 40 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/include/sysemu/guest-memfd-manager.h | 41 | --- a/accel/kvm/kvm-all.c |
22 | +++ b/include/sysemu/guest-memfd-manager.h | 42 | +++ b/accel/kvm/kvm-all.c |
23 | @@ -XXX,XX +XXX,XX @@ struct GuestMemfdManager { | 43 | @@ -XXX,XX +XXX,XX @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) |
24 | struct GuestMemfdManagerClass { | 44 | addr = memory_region_get_ram_ptr(mr) + section.offset_within_region; |
25 | ObjectClass parent_class; | 45 | rb = qemu_ram_block_from_host(addr, false, &offset); |
26 | 46 | ||
27 | + void (*realize)(GuestMemfdManager *gmm, MemoryRegion *mr, uint64_t region_size); | 47 | + ret = private_shared_manager_state_change(PRIVATE_SHARED_MANAGER(mr->gsm), |
28 | + void (*unrealize)(GuestMemfdManager *gmm); | 48 | + offset, size, to_private); |
29 | int (*state_change)(GuestMemfdManager *gmm, uint64_t offset, uint64_t size, | 49 | + if (ret) { |
30 | bool shared_to_private); | 50 | + error_report("Failed to notify the listener the state change of " |
31 | }; | 51 | + "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s", |
32 | @@ -XXX,XX +XXX,XX @@ static inline int guest_memfd_manager_state_change(GuestMemfdManager *gmm, uint6 | 52 | + start, size, to_private ? "private" : "shared"); |
33 | return 0; | 53 | + goto out_unref; |
34 | } | 54 | + } |
35 | |||
36 | +static inline void guest_memfd_manager_realize(GuestMemfdManager *gmm, | ||
37 | + MemoryRegion *mr, uint64_t region_size) | ||
38 | +{ | ||
39 | + GuestMemfdManagerClass *klass; | ||
40 | + | 55 | + |
41 | + g_assert(gmm); | 56 | if (to_private) { |
42 | + klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm); | 57 | if (rb->page_size != qemu_real_host_page_size()) { |
43 | + | 58 | /* |
44 | + if (klass->realize) { | 59 | diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h |
45 | + klass->realize(gmm, mr, region_size); | ||
46 | + } | ||
47 | +} | ||
48 | + | ||
49 | +static inline void guest_memfd_manager_unrealize(GuestMemfdManager *gmm) | ||
50 | +{ | ||
51 | + GuestMemfdManagerClass *klass; | ||
52 | + | ||
53 | + g_assert(gmm); | ||
54 | + klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm); | ||
55 | + | ||
56 | + if (klass->unrealize) { | ||
57 | + klass->unrealize(gmm); | ||
58 | + } | ||
59 | +} | ||
60 | + | ||
61 | #endif | ||
62 | diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c | ||
63 | index XXXXXXX..XXXXXXX 100644 | 60 | index XXXXXXX..XXXXXXX 100644 |
64 | --- a/system/guest-memfd-manager.c | 61 | --- a/include/exec/ramblock.h |
65 | +++ b/system/guest-memfd-manager.c | 62 | +++ b/include/exec/ramblock.h |
66 | @@ -XXX,XX +XXX,XX @@ static int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset, | 63 | @@ -XXX,XX +XXX,XX @@ struct RAMBlock { |
67 | return ret; | 64 | int fd; |
68 | } | 65 | uint64_t fd_offset; |
69 | 66 | int guest_memfd; | |
70 | +static void guest_memfd_manager_realizefn(GuestMemfdManager *gmm, MemoryRegion *mr, | 67 | + RamBlockAttribute *ram_block_attribute; |
71 | + uint64_t region_size) | 68 | size_t page_size; |
72 | +{ | 69 | /* dirty bitmap used during migration */ |
73 | + uint64_t bitmap_size; | 70 | unsigned long *bmap; |
74 | + | ||
75 | + gmm->block_size = qemu_real_host_page_size(); | ||
76 | + bitmap_size = ROUND_UP(region_size, gmm->block_size) / gmm->block_size; | ||
77 | + | ||
78 | + gmm->mr = mr; | ||
79 | + gmm->bitmap_size = bitmap_size; | ||
80 | + gmm->bitmap = bitmap_new(bitmap_size); | ||
81 | + | ||
82 | + memory_region_set_ram_discard_manager(gmm->mr, RAM_DISCARD_MANAGER(gmm)); | ||
83 | +} | ||
84 | + | ||
85 | +static void guest_memfd_manager_unrealizefn(GuestMemfdManager *gmm) | ||
86 | +{ | ||
87 | + memory_region_set_ram_discard_manager(gmm->mr, NULL); | ||
88 | + | ||
89 | + g_free(gmm->bitmap); | ||
90 | + gmm->bitmap = NULL; | ||
91 | + gmm->bitmap_size = 0; | ||
92 | + gmm->mr = NULL; | ||
93 | +} | ||
94 | + | ||
95 | static void guest_memfd_manager_init(Object *obj) | ||
96 | { | ||
97 | GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj); | ||
98 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_manager_init(Object *obj) | ||
99 | |||
100 | static void guest_memfd_manager_finalize(Object *obj) | ||
101 | { | ||
102 | - g_free(GUEST_MEMFD_MANAGER(obj)->bitmap); | ||
103 | } | ||
104 | |||
105 | static void guest_memfd_manager_class_init(ObjectClass *oc, void *data) | ||
106 | @@ -XXX,XX +XXX,XX @@ static void guest_memfd_manager_class_init(ObjectClass *oc, void *data) | ||
107 | RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc); | ||
108 | |||
109 | gmmc->state_change = guest_memfd_state_change; | ||
110 | + gmmc->realize = guest_memfd_manager_realizefn; | ||
111 | + gmmc->unrealize = guest_memfd_manager_unrealizefn; | ||
112 | |||
113 | rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity; | ||
114 | rdmc->register_listener = guest_memfd_rdm_register_listener; | ||
115 | diff --git a/system/physmem.c b/system/physmem.c | 71 | diff --git a/system/physmem.c b/system/physmem.c |
116 | index XXXXXXX..XXXXXXX 100644 | 72 | index XXXXXXX..XXXXXXX 100644 |
117 | --- a/system/physmem.c | 73 | --- a/system/physmem.c |
118 | +++ b/system/physmem.c | 74 | +++ b/system/physmem.c |
119 | @@ -XXX,XX +XXX,XX @@ | ||
120 | #include "sysemu/hostmem.h" | ||
121 | #include "sysemu/hw_accel.h" | ||
122 | #include "sysemu/xen-mapcache.h" | ||
123 | +#include "sysemu/guest-memfd-manager.h" | ||
124 | #include "trace.h" | ||
125 | |||
126 | #ifdef CONFIG_FALLOCATE_PUNCH_HOLE | ||
127 | @@ -XXX,XX +XXX,XX @@ static void ram_block_add(RAMBlock *new_block, Error **errp) | 75 | @@ -XXX,XX +XXX,XX @@ static void ram_block_add(RAMBlock *new_block, Error **errp) |
128 | qemu_mutex_unlock_ramlist(); | 76 | qemu_mutex_unlock_ramlist(); |
129 | goto out_free; | 77 | goto out_free; |
130 | } | 78 | } |
131 | + | 79 | + |
132 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(object_new(TYPE_GUEST_MEMFD_MANAGER)); | 80 | + new_block->ram_block_attribute = RAM_BLOCK_ATTRIBUTE(object_new(TYPE_RAM_BLOCK_ATTRIBUTE)); |
133 | + guest_memfd_manager_realize(gmm, new_block->mr, new_block->mr->size); | 81 | + if (ram_block_attribute_realize(new_block->ram_block_attribute, new_block->mr)) { |
82 | + error_setg(errp, "Failed to realize ram block attribute"); | ||
83 | + /* | ||
84 | + * The error path could be unified if the rest of ram_block_add() ever | ||
85 | + * develops a need to check for errors. | ||
86 | + */ | ||
87 | + object_unref(OBJECT(new_block->ram_block_attribute)); | ||
88 | + close(new_block->guest_memfd); | ||
89 | + ram_block_discard_require(false); | ||
90 | + qemu_mutex_unlock_ramlist(); | ||
91 | + goto out_free; | ||
92 | + } | ||
134 | } | 93 | } |
135 | 94 | ||
136 | ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS; | 95 | ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS; |
137 | @@ -XXX,XX +XXX,XX @@ static void reclaim_ramblock(RAMBlock *block) | 96 | @@ -XXX,XX +XXX,XX @@ static void reclaim_ramblock(RAMBlock *block) |
97 | } | ||
138 | 98 | ||
139 | if (block->guest_memfd >= 0) { | 99 | if (block->guest_memfd >= 0) { |
100 | + ram_block_attribute_unrealize(block->ram_block_attribute); | ||
101 | + object_unref(OBJECT(block->ram_block_attribute)); | ||
140 | close(block->guest_memfd); | 102 | close(block->guest_memfd); |
141 | + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(block->mr->rdm); | ||
142 | + guest_memfd_manager_unrealize(gmm); | ||
143 | + object_unref(OBJECT(gmm)); | ||
144 | ram_block_discard_require(false); | 103 | ram_block_discard_require(false); |
145 | } | 104 | } |
146 | |||
147 | -- | 105 | -- |
148 | 2.43.5 | 106 | 2.43.5 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | So that the caller can check the result of the NotifyStateClear() handler if | ||
2 | the operation fails. | ||
1 | 3 | ||
4 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | ||
5 | --- | ||
6 | Changes in v4: | ||
7 | - Newly added. | ||
8 | --- | ||
9 | hw/vfio/common.c | 18 ++++++++++-------- | ||
10 | include/exec/memory.h | 4 ++-- | ||
11 | 2 files changed, 12 insertions(+), 10 deletions(-) | ||
12 | |||
13 | diff --git a/hw/vfio/common.c b/hw/vfio/common.c | ||
14 | index XXXXXXX..XXXXXXX 100644 | ||
15 | --- a/hw/vfio/common.c | ||
16 | +++ b/hw/vfio/common.c | ||
17 | @@ -XXX,XX +XXX,XX @@ out: | ||
18 | rcu_read_unlock(); | ||
19 | } | ||
20 | |||
21 | -static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer, | ||
22 | - MemoryRegionSection *section) | ||
23 | +static int vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer, | ||
24 | + MemoryRegionSection *section) | ||
25 | { | ||
26 | const hwaddr size = int128_get64(section->size); | ||
27 | const hwaddr iova = section->offset_within_address_space; | ||
28 | @@ -XXX,XX +XXX,XX @@ static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontaine | ||
29 | error_report("%s: vfio_container_dma_unmap() failed: %s", __func__, | ||
30 | strerror(-ret)); | ||
31 | } | ||
32 | + | ||
33 | + return ret; | ||
34 | } | ||
35 | |||
36 | -static void vfio_ram_discard_notify_discard(StateChangeListener *scl, | ||
37 | - MemoryRegionSection *section) | ||
38 | +static int vfio_ram_discard_notify_discard(StateChangeListener *scl, | ||
39 | + MemoryRegionSection *section) | ||
40 | { | ||
41 | RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); | ||
42 | VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, | ||
43 | listener); | ||
44 | - vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section); | ||
45 | + return vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section); | ||
46 | } | ||
47 | |||
48 | -static void vfio_private_shared_notify_to_private(StateChangeListener *scl, | ||
49 | - MemoryRegionSection *section) | ||
50 | +static int vfio_private_shared_notify_to_private(StateChangeListener *scl, | ||
51 | + MemoryRegionSection *section) | ||
52 | { | ||
53 | PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); | ||
54 | VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener, | ||
55 | listener); | ||
56 | - vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section); | ||
57 | + return vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section); | ||
58 | } | ||
59 | |||
60 | static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer, | ||
61 | diff --git a/include/exec/memory.h b/include/exec/memory.h | ||
62 | index XXXXXXX..XXXXXXX 100644 | ||
63 | --- a/include/exec/memory.h | ||
64 | +++ b/include/exec/memory.h | ||
65 | @@ -XXX,XX +XXX,XX @@ typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque); | ||
66 | typedef struct StateChangeListener StateChangeListener; | ||
67 | typedef int (*NotifyStateSet)(StateChangeListener *scl, | ||
68 | MemoryRegionSection *section); | ||
69 | -typedef void (*NotifyStateClear)(StateChangeListener *scl, | ||
70 | - MemoryRegionSection *section); | ||
71 | +typedef int (*NotifyStateClear)(StateChangeListener *scl, | ||
72 | + MemoryRegionSection *section); | ||
73 | |||
74 | struct StateChangeListener { | ||
75 | /* | ||
76 | -- | ||
77 | 2.43.5 | diff view generated by jsdifflib |
1 | Introduce a helper to trigger the state_change() callback of the class. | 1 | With the introduction of the RamBlockAttribute object to manage |
---|---|---|---|
2 | Once exit to userspace to convert the page from private to shared or | 2 | RAMBlocks with guest_memfd and the implementation of |
3 | vice versa at runtime, notify the event via the helper so that other | 3 | the PrivateSharedManager interface to convey page conversion events, it is |
4 | registered subsystems like VFIO can be notified. | 4 | more elegant to move attribute changes into a PrivateSharedListener. |
5 | |||
6 | The PrivateSharedListener is registered/unregistered for each memory | ||
7 | region section during kvm_region_add/del(), and listeners are stored in | ||
8 | a CVMPrivateSharedListener list for easy management. The listener | ||
9 | handler performs attribute changes upon receiving notifications from | ||
10 | private_shared_manager_state_change() calls. With this change, the | ||
11 | state change operations in kvm_convert_memory() can be removed. | ||
12 | |||
13 | Note that after moving attribute changes into a listener, errors can be | ||
14 | returned in ram_block_attribute_notify_to_private() if attribute changes | ||
15 | fail in corner cases (e.g. -ENOMEM). Since there is currently no rollback | ||
16 | operation for the to_private case, an assert is used to prevent the | ||
17 | guest from continuing with a partially changed attribute state. | ||
5 | 18 | ||
6 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | 19 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> |
7 | --- | 20 | --- |
8 | accel/kvm/kvm-all.c | 4 ++++ | 21 | Changes in v4: |
9 | include/sysemu/guest-memfd-manager.h | 15 +++++++++++++++ | 22 | - Newly added. |
10 | 2 files changed, 19 insertions(+) | 23 | --- |
24 | accel/kvm/kvm-all.c | 73 ++++++++++++++++++--- | ||
25 | include/system/confidential-guest-support.h | 10 +++ | ||
26 | system/ram-block-attribute.c | 17 ++++- | ||
27 | target/i386/kvm/tdx.c | 1 + | ||
28 | target/i386/sev.c | 1 + | ||
29 | 5 files changed, 90 insertions(+), 12 deletions(-) | ||
11 | 30 | ||
12 | diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c | 31 | diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c |
13 | index XXXXXXX..XXXXXXX 100644 | 32 | index XXXXXXX..XXXXXXX 100644 |
14 | --- a/accel/kvm/kvm-all.c | 33 | --- a/accel/kvm/kvm-all.c |
15 | +++ b/accel/kvm/kvm-all.c | 34 | +++ b/accel/kvm/kvm-all.c |
16 | @@ -XXX,XX +XXX,XX @@ | 35 | @@ -XXX,XX +XXX,XX @@ |
17 | #include "kvm-cpus.h" | 36 | #include "kvm-cpus.h" |
18 | #include "sysemu/dirtylimit.h" | 37 | #include "system/dirtylimit.h" |
19 | #include "qemu/range.h" | 38 | #include "qemu/range.h" |
20 | +#include "sysemu/guest-memfd-manager.h" | 39 | +#include "system/confidential-guest-support.h" |
21 | 40 | ||
22 | #include "hw/boards.h" | 41 | #include "hw/boards.h" |
23 | #include "sysemu/stats.h" | 42 | #include "system/stats.h" |
43 | @@ -XXX,XX +XXX,XX @@ static int kvm_dirty_ring_init(KVMState *s) | ||
44 | return 0; | ||
45 | } | ||
46 | |||
47 | +static int kvm_private_shared_notify(StateChangeListener *scl, | ||
48 | + MemoryRegionSection *section, | ||
49 | + bool to_private) | ||
50 | +{ | ||
51 | + hwaddr start = section->offset_within_address_space; | ||
52 | + hwaddr size = section->size; | ||
53 | + | ||
54 | + if (to_private) { | ||
55 | + return kvm_set_memory_attributes_private(start, size); | ||
56 | + } else { | ||
57 | + return kvm_set_memory_attributes_shared(start, size); | ||
58 | + } | ||
59 | +} | ||
60 | + | ||
61 | +static int kvm_private_shared_notify_to_shared(StateChangeListener *scl, | ||
62 | + MemoryRegionSection *section) | ||
63 | +{ | ||
64 | + return kvm_private_shared_notify(scl, section, false); | ||
65 | +} | ||
66 | + | ||
67 | +static int kvm_private_shared_notify_to_private(StateChangeListener *scl, | ||
68 | + MemoryRegionSection *section) | ||
69 | +{ | ||
70 | + return kvm_private_shared_notify(scl, section, true); | ||
71 | +} | ||
72 | + | ||
73 | static void kvm_region_add(MemoryListener *listener, | ||
74 | MemoryRegionSection *section) | ||
75 | { | ||
76 | KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener); | ||
77 | + ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs; | ||
78 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); | ||
79 | KVMMemoryUpdate *update; | ||
80 | + CVMPrivateSharedListener *cpsl; | ||
81 | + PrivateSharedListener *psl; | ||
82 | + | ||
83 | |||
84 | update = g_new0(KVMMemoryUpdate, 1); | ||
85 | update->section = *section; | ||
86 | |||
87 | QSIMPLEQ_INSERT_TAIL(&kml->transaction_add, update, next); | ||
88 | + | ||
89 | + if (!memory_region_has_guest_memfd(section->mr) || !gsm) { | ||
90 | + return; | ||
91 | + } | ||
92 | + | ||
93 | + cpsl = g_new0(CVMPrivateSharedListener, 1); | ||
94 | + cpsl->mr = section->mr; | ||
95 | + cpsl->offset_within_address_space = section->offset_within_address_space; | ||
96 | + cpsl->granularity = generic_state_manager_get_min_granularity(gsm, section->mr); | ||
97 | + psl = &cpsl->listener; | ||
98 | + QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next); | ||
99 | + private_shared_listener_init(psl, kvm_private_shared_notify_to_shared, | ||
100 | + kvm_private_shared_notify_to_private); | ||
101 | + generic_state_manager_register_listener(gsm, &psl->scl, section); | ||
102 | } | ||
103 | |||
104 | static void kvm_region_del(MemoryListener *listener, | ||
105 | MemoryRegionSection *section) | ||
106 | { | ||
107 | KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener); | ||
108 | + ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs; | ||
109 | + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); | ||
110 | KVMMemoryUpdate *update; | ||
111 | + CVMPrivateSharedListener *cpsl; | ||
112 | + PrivateSharedListener *psl; | ||
113 | |||
114 | update = g_new0(KVMMemoryUpdate, 1); | ||
115 | update->section = *section; | ||
116 | |||
117 | QSIMPLEQ_INSERT_TAIL(&kml->transaction_del, update, next); | ||
118 | + if (!memory_region_has_guest_memfd(section->mr) || !gsm) { | ||
119 | + return; | ||
120 | + } | ||
121 | + | ||
122 | + QLIST_FOREACH(cpsl, &cgs->cvm_private_shared_list, next) { | ||
123 | + if (cpsl->mr == section->mr && | ||
124 | + cpsl->offset_within_address_space == section->offset_within_address_space) { | ||
125 | + psl = &cpsl->listener; | ||
126 | + generic_state_manager_unregister_listener(gsm, &psl->scl); | ||
127 | + QLIST_REMOVE(cpsl, next); | ||
128 | + g_free(cpsl); | ||
129 | + break; | ||
130 | + } | ||
131 | + } | ||
132 | } | ||
133 | |||
134 | static void kvm_region_commit(MemoryListener *listener) | ||
24 | @@ -XXX,XX +XXX,XX @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) | 135 | @@ -XXX,XX +XXX,XX @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) |
136 | goto out_unref; | ||
137 | } | ||
138 | |||
139 | - if (to_private) { | ||
140 | - ret = kvm_set_memory_attributes_private(start, size); | ||
141 | - } else { | ||
142 | - ret = kvm_set_memory_attributes_shared(start, size); | ||
143 | - } | ||
144 | - if (ret) { | ||
145 | - goto out_unref; | ||
146 | - } | ||
147 | - | ||
25 | addr = memory_region_get_ram_ptr(mr) + section.offset_within_region; | 148 | addr = memory_region_get_ram_ptr(mr) + section.offset_within_region; |
26 | rb = qemu_ram_block_from_host(addr, false, &offset); | 149 | rb = qemu_ram_block_from_host(addr, false, &offset); |
27 | 150 | ||
28 | + guest_memfd_manager_state_change(GUEST_MEMFD_MANAGER(mr->rdm), offset, | 151 | diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h |
29 | + size, to_private); | 152 | index XXXXXXX..XXXXXXX 100644 |
30 | + | 153 | --- a/include/system/confidential-guest-support.h |
31 | if (to_private) { | 154 | +++ b/include/system/confidential-guest-support.h |
32 | if (rb->page_size != qemu_real_host_page_size()) { | 155 | @@ -XXX,XX +XXX,XX @@ |
33 | /* | ||
34 | diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h | ||
35 | index XXXXXXX..XXXXXXX 100644 | ||
36 | --- a/include/sysemu/guest-memfd-manager.h | ||
37 | +++ b/include/sysemu/guest-memfd-manager.h | ||
38 | @@ -XXX,XX +XXX,XX @@ struct GuestMemfdManagerClass { | ||
39 | bool shared_to_private); | ||
40 | }; | ||
41 | |||
42 | +static inline int guest_memfd_manager_state_change(GuestMemfdManager *gmm, uint64_t offset, | ||
43 | + uint64_t size, bool shared_to_private) | ||
44 | +{ | ||
45 | + GuestMemfdManagerClass *klass; | ||
46 | + | ||
47 | + g_assert(gmm); | ||
48 | + klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm); | ||
49 | + | ||
50 | + if (klass->state_change) { | ||
51 | + return klass->state_change(gmm, offset, size, shared_to_private); | ||
52 | + } | ||
53 | + | ||
54 | + return 0; | ||
55 | +} | ||
56 | + | ||
57 | #endif | 156 | #endif |
157 | |||
158 | #include "qom/object.h" | ||
159 | +#include "exec/memory.h" | ||
160 | |||
161 | #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support" | ||
162 | OBJECT_DECLARE_TYPE(ConfidentialGuestSupport, | ||
163 | ConfidentialGuestSupportClass, | ||
164 | CONFIDENTIAL_GUEST_SUPPORT) | ||
165 | |||
166 | +typedef struct CVMPrivateSharedListener { | ||
167 | + MemoryRegion *mr; | ||
168 | + hwaddr offset_within_address_space; | ||
169 | + uint64_t granularity; | ||
170 | + PrivateSharedListener listener; | ||
171 | + QLIST_ENTRY(CVMPrivateSharedListener) next; | ||
172 | +} CVMPrivateSharedListener; | ||
173 | |||
174 | struct ConfidentialGuestSupport { | ||
175 | Object parent; | ||
176 | @@ -XXX,XX +XXX,XX @@ struct ConfidentialGuestSupport { | ||
177 | */ | ||
178 | bool require_guest_memfd; | ||
179 | |||
180 | + QLIST_HEAD(, CVMPrivateSharedListener) cvm_private_shared_list; | ||
181 | + | ||
182 | /* | ||
183 | * ready: flag set by CGS initialization code once it's ready to | ||
184 | * start executing instructions in a potentially-secure | ||
185 | diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c | ||
186 | index XXXXXXX..XXXXXXX 100644 | ||
187 | --- a/system/ram-block-attribute.c | ||
188 | +++ b/system/ram-block-attribute.c | ||
189 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, | ||
190 | uint64_t offset, uint64_t size) | ||
191 | { | ||
192 | PrivateSharedListener *psl; | ||
193 | + int ret; | ||
194 | |||
195 | QLIST_FOREACH(psl, &attr->psl_list, next) { | ||
196 | StateChangeListener *scl = &psl->scl; | ||
197 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, | ||
198 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { | ||
199 | continue; | ||
200 | } | ||
201 | - scl->notify_to_state_clear(scl, &tmp); | ||
202 | + /* | ||
203 | + * No undo operation for the state_clear() callback failure at present. | ||
204 | + * Expect the state_clear() callback to always succeed. | ||
205 | + */ | ||
206 | + ret = scl->notify_to_state_clear(scl, &tmp); | ||
207 | + g_assert(!ret); | ||
208 | } | ||
209 | } | ||
210 | |||
211 | @@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, | ||
212 | uint64_t offset, uint64_t size) | ||
213 | { | ||
214 | PrivateSharedListener *psl, *psl2; | ||
215 | - int ret = 0; | ||
216 | + int ret = 0, ret2 = 0; | ||
217 | |||
218 | QLIST_FOREACH(psl, &attr->psl_list, next) { | ||
219 | StateChangeListener *scl = &psl->scl; | ||
220 | @@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, | ||
221 | if (!memory_region_section_intersect_range(&tmp, offset, size)) { | ||
222 | continue; | ||
223 | } | ||
224 | - scl2->notify_to_state_clear(scl2, &tmp); | ||
225 | + /* | ||
226 | + * No undo operation for the state_clear() callback failure at present. | ||
227 | + * Expect the state_clear() callback to always succeed. | ||
228 | + */ | ||
229 | + ret2 = scl2->notify_to_state_clear(scl2, &tmp); | ||
230 | + g_assert(!ret2); | ||
231 | } | ||
232 | } | ||
233 | return ret; | ||
234 | diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c | ||
235 | index XXXXXXX..XXXXXXX 100644 | ||
236 | --- a/target/i386/kvm/tdx.c | ||
237 | +++ b/target/i386/kvm/tdx.c | ||
238 | @@ -XXX,XX +XXX,XX @@ static void tdx_guest_init(Object *obj) | ||
239 | qemu_mutex_init(&tdx->lock); | ||
240 | |||
241 | cgs->require_guest_memfd = true; | ||
242 | + QLIST_INIT(&cgs->cvm_private_shared_list); | ||
243 | tdx->attributes = TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE; | ||
244 | |||
245 | object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes, | ||
246 | diff --git a/target/i386/sev.c b/target/i386/sev.c | ||
247 | index XXXXXXX..XXXXXXX 100644 | ||
248 | --- a/target/i386/sev.c | ||
249 | +++ b/target/i386/sev.c | ||
250 | @@ -XXX,XX +XXX,XX @@ sev_snp_guest_instance_init(Object *obj) | ||
251 | SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj); | ||
252 | |||
253 | cgs->require_guest_memfd = true; | ||
254 | + QLIST_INIT(&cgs->cvm_private_shared_list); | ||
255 | |||
256 | /* default init/start/finish params for kvm */ | ||
257 | sev_snp_guest->kvm_start_conf.policy = DEFAULT_SEV_SNP_POLICY; | ||
58 | -- | 258 | -- |
59 | 2.43.5 | 259 | 2.43.5 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | In-place page conversion requires operations to follow a specific | ||
2 | sequence: unmap-before-conversion-to-private and | ||
3 | map-after-conversion-to-shared. Currently, both attribute changes and | ||
4 | VFIO DMA map/unmap operations are handled by PrivateSharedListeners, | ||
5 | so they need to be invoked in a specific order. | ||
1 | 6 | ||
7 | For private to shared conversion: | ||
8 | - Change attribute to shared. | ||
9 | - VFIO populates the shared mappings into the IOMMU. | ||
10 | - Restore attribute if the operation fails. | ||
11 | |||
12 | For shared to private conversion: | ||
13 | - VFIO discards shared mapping from the IOMMU. | ||
14 | - Change attribute to private. | ||
15 | |||
16 | To facilitate this sequence, priority support is added to | ||
17 | PrivateSharedListener so that listeners are stored in a deterministic | ||
18 | order based on priority. A tail queue is used to store listeners, | ||
19 | allowing traversal in either direction. | ||
20 | |||
21 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | ||
22 | --- | ||
23 | Changes in v4: | ||
24 | - Newly added. | ||
25 | --- | ||
26 | accel/kvm/kvm-all.c | 3 ++- | ||
27 | hw/vfio/common.c | 3 ++- | ||
28 | include/exec/memory.h | 19 +++++++++++++++++-- | ||
29 | include/exec/ramblock.h | 2 +- | ||
30 | system/ram-block-attribute.c | 23 +++++++++++++++++------ | ||
31 | 5 files changed, 39 insertions(+), 11 deletions(-) | ||
32 | |||
33 | diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c | ||
34 | index XXXXXXX..XXXXXXX 100644 | ||
35 | --- a/accel/kvm/kvm-all.c | ||
36 | +++ b/accel/kvm/kvm-all.c | ||
37 | @@ -XXX,XX +XXX,XX @@ static void kvm_region_add(MemoryListener *listener, | ||
38 | psl = &cpsl->listener; | ||
39 | QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next); | ||
40 | private_shared_listener_init(psl, kvm_private_shared_notify_to_shared, | ||
41 | - kvm_private_shared_notify_to_private); | ||
42 | + kvm_private_shared_notify_to_private, | ||
43 | + PRIVATE_SHARED_LISTENER_PRIORITY_MIN); | ||
44 | generic_state_manager_register_listener(gsm, &psl->scl, section); | ||
45 | } | ||
46 | |||
47 | diff --git a/hw/vfio/common.c b/hw/vfio/common.c | ||
48 | index XXXXXXX..XXXXXXX 100644 | ||
49 | --- a/hw/vfio/common.c | ||
50 | +++ b/hw/vfio/common.c | ||
51 | @@ -XXX,XX +XXX,XX @@ static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer, | ||
52 | |||
53 | psl = &vpsl->listener; | ||
54 | private_shared_listener_init(psl, vfio_private_shared_notify_to_shared, | ||
55 | - vfio_private_shared_notify_to_private); | ||
56 | + vfio_private_shared_notify_to_private, | ||
57 | + PRIVATE_SHARED_LISTENER_PRIORITY_COMMON); | ||
58 | generic_state_manager_register_listener(gsm, &psl->scl, section); | ||
59 | QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next); | ||
60 | } | ||
61 | diff --git a/include/exec/memory.h b/include/exec/memory.h | ||
62 | index XXXXXXX..XXXXXXX 100644 | ||
63 | --- a/include/exec/memory.h | ||
64 | +++ b/include/exec/memory.h | ||
65 | @@ -XXX,XX +XXX,XX @@ struct RamDiscardManagerClass { | ||
66 | GenericStateManagerClass parent_class; | ||
67 | }; | ||
68 | |||
69 | +#define PRIVATE_SHARED_LISTENER_PRIORITY_MIN 0 | ||
70 | +#define PRIVATE_SHARED_LISTENER_PRIORITY_COMMON 10 | ||
71 | + | ||
72 | typedef struct PrivateSharedListener PrivateSharedListener; | ||
73 | struct PrivateSharedListener { | ||
74 | struct StateChangeListener scl; | ||
75 | |||
76 | - QLIST_ENTRY(PrivateSharedListener) next; | ||
77 | + /* | ||
78 | + * @priority: | ||
79 | + * | ||
80 | + * Governs the order in which private shared listeners are invoked. | ||
81 | + * Lower priorities are invoked earlier. | ||
82 | + * Walking the list in reverse priority order allows a failure path to | ||
83 | + * undo the effects of previously notified listeners. | ||
84 | + */ | ||
85 | + int priority; | ||
86 | + | ||
87 | + QTAILQ_ENTRY(PrivateSharedListener) next; | ||
88 | }; | ||
89 | |||
90 | struct PrivateSharedManagerClass { | ||
91 | @@ -XXX,XX +XXX,XX @@ struct PrivateSharedManagerClass { | ||
92 | |||
93 | static inline void private_shared_listener_init(PrivateSharedListener *psl, | ||
94 | NotifyStateSet populate_fn, | ||
95 | - NotifyStateClear discard_fn) | ||
96 | + NotifyStateClear discard_fn, | ||
97 | + int priority) | ||
98 | { | ||
99 | state_change_listener_init(&psl->scl, populate_fn, discard_fn); | ||
100 | + psl->priority = priority; | ||
101 | } | ||
102 | |||
103 | int private_shared_manager_state_change(PrivateSharedManager *mgr, | ||
104 | diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h | ||
105 | index XXXXXXX..XXXXXXX 100644 | ||
106 | --- a/include/exec/ramblock.h | ||
107 | +++ b/include/exec/ramblock.h | ||
108 | @@ -XXX,XX +XXX,XX @@ struct RamBlockAttribute { | ||
109 | unsigned shared_bitmap_size; | ||
110 | unsigned long *shared_bitmap; | ||
111 | |||
112 | - QLIST_HEAD(, PrivateSharedListener) psl_list; | ||
113 | + QTAILQ_HEAD(, PrivateSharedListener) psl_list; | ||
114 | }; | ||
115 | |||
116 | struct RamBlockAttributeClass { | ||
117 | diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c | ||
118 | index XXXXXXX..XXXXXXX 100644 | ||
119 | --- a/system/ram-block-attribute.c | ||
120 | +++ b/system/ram-block-attribute.c | ||
121 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm, | ||
122 | { | ||
123 | RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); | ||
124 | PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); | ||
125 | + PrivateSharedListener *other = NULL; | ||
126 | int ret; | ||
127 | |||
128 | g_assert(section->mr == attr->mr); | ||
129 | scl->section = memory_region_section_new_copy(section); | ||
130 | |||
131 | - QLIST_INSERT_HEAD(&attr->psl_list, psl, next); | ||
132 | + if (QTAILQ_EMPTY(&attr->psl_list) || | ||
133 | + psl->priority >= QTAILQ_LAST(&attr->psl_list)->priority) { | ||
134 | + QTAILQ_INSERT_TAIL(&attr->psl_list, psl, next); | ||
135 | + } else { | ||
136 | + QTAILQ_FOREACH(other, &attr->psl_list, next) { | ||
137 | + if (psl->priority < other->priority) { | ||
138 | + break; | ||
139 | + } | ||
140 | + } | ||
141 | + QTAILQ_INSERT_BEFORE(other, psl, next); | ||
142 | + } | ||
143 | |||
144 | ret = ram_block_attribute_for_each_shared_section(attr, section, scl, | ||
145 | ram_block_attribute_notify_shared_cb); | ||
146 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm | ||
147 | |||
148 | memory_region_section_free_copy(scl->section); | ||
149 | scl->section = NULL; | ||
150 | - QLIST_REMOVE(psl, next); | ||
151 | + QTAILQ_REMOVE(&attr->psl_list, psl, next); | ||
152 | } | ||
153 | |||
154 | typedef struct RamBlockAttributeReplayData { | ||
155 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, | ||
156 | PrivateSharedListener *psl; | ||
157 | int ret; | ||
158 | |||
159 | - QLIST_FOREACH(psl, &attr->psl_list, next) { | ||
160 | + QTAILQ_FOREACH_REVERSE(psl, &attr->psl_list, next) { | ||
161 | StateChangeListener *scl = &psl->scl; | ||
162 | MemoryRegionSection tmp = *scl->section; | ||
163 | |||
164 | @@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, | ||
165 | PrivateSharedListener *psl, *psl2; | ||
166 | int ret = 0, ret2 = 0; | ||
167 | |||
168 | - QLIST_FOREACH(psl, &attr->psl_list, next) { | ||
169 | + QTAILQ_FOREACH(psl, &attr->psl_list, next) { | ||
170 | StateChangeListener *scl = &psl->scl; | ||
171 | MemoryRegionSection tmp = *scl->section; | ||
172 | |||
173 | @@ -XXX,XX +XXX,XX @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, | ||
174 | |||
175 | if (ret) { | ||
176 | /* Notify all already-notified listeners. */ | ||
177 | - QLIST_FOREACH(psl2, &attr->psl_list, next) { | ||
178 | + QTAILQ_FOREACH(psl2, &attr->psl_list, next) { | ||
179 | StateChangeListener *scl2 = &psl2->scl; | ||
180 | MemoryRegionSection tmp = *scl2->section; | ||
181 | |||
182 | @@ -XXX,XX +XXX,XX @@ static void ram_block_attribute_init(Object *obj) | ||
183 | { | ||
184 | RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj); | ||
185 | |||
186 | - QLIST_INIT(&attr->psl_list); | ||
187 | + QTAILQ_INIT(&attr->psl_list); | ||
188 | } | ||
189 | |||
190 | static void ram_block_attribute_finalize(Object *obj) | ||
191 | -- | ||
192 | 2.43.5 | ||
1 | As guest_memfd is now managed by guest_memfd_manager with | 1 | As guest_memfd is now managed by ram_block_attribute with |
---|---|---|---|
2 | RamDiscardManager, only block uncoordinated discard. | 2 | PrivateSharedManager, only block uncoordinated discard. |
3 | 3 | ||
4 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> | 4 | Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> |
5 | --- | 5 | --- |
6 | system/physmem.c | 2 +- | 6 | Changes in v4: |
7 | 1 file changed, 1 insertion(+), 1 deletion(-) | 7 | - Modify commit message (RamDiscardManager->PrivateSharedManager). |
8 | |||
9 | Changes in v3: | ||
10 | - No change. | ||
11 | |||
12 | Changes in v2: | ||
13 | - Change the ram_block_discard_require(false) to | ||
14 | ram_block_coordinated_discard_require(false). | ||
15 | --- | ||
16 | system/physmem.c | 6 +++--- | ||
17 | 1 file changed, 3 insertions(+), 3 deletions(-) | ||
8 | 18 | ||
9 | diff --git a/system/physmem.c b/system/physmem.c | 19 | diff --git a/system/physmem.c b/system/physmem.c |
10 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
11 | --- a/system/physmem.c | 21 | --- a/system/physmem.c |
12 | +++ b/system/physmem.c | 22 | +++ b/system/physmem.c |
... | ... | ||
17 | - ret = ram_block_discard_require(true); | 27 | - ret = ram_block_discard_require(true); |
18 | + ret = ram_block_coordinated_discard_require(true); | 28 | + ret = ram_block_coordinated_discard_require(true); |
19 | if (ret < 0) { | 29 | if (ret < 0) { |
20 | error_setg_errno(errp, -ret, | 30 | error_setg_errno(errp, -ret, |
21 | "cannot set up private guest memory: discard currently blocked"); | 31 | "cannot set up private guest memory: discard currently blocked"); |
32 | @@ -XXX,XX +XXX,XX @@ static void ram_block_add(RAMBlock *new_block, Error **errp) | ||
33 | */ | ||
34 | object_unref(OBJECT(new_block->ram_block_attribute)); | ||
35 | close(new_block->guest_memfd); | ||
36 | - ram_block_discard_require(false); | ||
37 | + ram_block_coordinated_discard_require(false); | ||
38 | qemu_mutex_unlock_ramlist(); | ||
39 | goto out_free; | ||
40 | } | ||
41 | @@ -XXX,XX +XXX,XX @@ static void reclaim_ramblock(RAMBlock *block) | ||
42 | ram_block_attribute_unrealize(block->ram_block_attribute); | ||
43 | object_unref(OBJECT(block->ram_block_attribute)); | ||
44 | close(block->guest_memfd); | ||
45 | - ram_block_discard_require(false); | ||
46 | + ram_block_coordinated_discard_require(false); | ||
47 | } | ||
48 | |||
49 | g_free(block); | ||
22 | -- | 50 | -- |
23 | 2.43.5 | 51 | 2.43.5 |