When guest_memfd is used for both shared/private memory, converting
pages to shared may require kvm_arch_gmem_invalidate() to be issued to
return the pages to an architecturally-defined "shared" state if the
pages were previously allocated and transitioned to a private state via
kvm_arch_gmem_prepare().

Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
when converting ranges in the filemap to a shared state.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b77cdccd340e..f27e1f3962bb 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
         struct maple_tree *mt;

         mt = &kvm_gmem_private(inode)->shareability;
+
+        /*
+         * If a folio has been allocated then it was possibly in a private
+         * state prior to conversion. Ensure arch invalidations are issued
+         * to return the folio to a normal/shared state as defined by the
+         * architecture before tracking it as shared in gmem.
+         */
+        if (m == SHAREABILITY_ALL) {
+                pgoff_t idx;
+
+                for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
+                        struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
+
+                        if (!IS_ERR(folio)) {
+                                kvm_arch_gmem_invalidate(folio_pfn(folio),
+                                                         folio_pfn(folio) + folio_nr_pages(folio));
+                                folio_unlock(folio);
+                                folio_put(folio);
+                        }
+                }
+        }
+
         return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
 }

--
2.25.1
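[Editorial note: for readers who prefer resulting code over a diff, below is a sketch of
kvm_gmem_shareability_apply() as it would read with the hunk applied. Only the body mirrors
the patch; the parameter list and the types behind "work" and "m" are not visible in the
hunk and are assumed/illustrative here.]

static int kvm_gmem_shareability_apply(struct inode *inode,
                                       struct conversion_work *work, /* assumed type */
                                       int m /* shareability value, assumed type */)
{
        struct maple_tree *mt;

        mt = &kvm_gmem_private(inode)->shareability;

        /*
         * If a folio has been allocated then it was possibly in a private
         * state prior to conversion. Ensure arch invalidations are issued
         * to return the folio to a normal/shared state as defined by the
         * architecture before tracking it as shared in gmem.
         */
        if (m == SHAREABILITY_ALL) {
                pgoff_t idx;

                for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
                        /*
                         * filemap_lock_folio() returns the locked folio with a
                         * reference held, or ERR_PTR(-ENOENT) when nothing has
                         * been allocated at this index, so ranges that were
                         * never faulted in are skipped entirely.
                         */
                        struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);

                        if (!IS_ERR(folio)) {
                                kvm_arch_gmem_invalidate(folio_pfn(folio),
                                                         folio_pfn(folio) + folio_nr_pages(folio));
                                folio_unlock(folio);
                                folio_put(folio);
                        }
                }
        }

        return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
}

The ERR_PTR check is what limits the invalidation to folios that were actually allocated,
and hence may have been transitioned to a private state via kvm_arch_gmem_prepare().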
On Thu, Jun 12, 2025 at 5:56 PM Michael Roth <michael.roth@amd.com> wrote:
>
> When guest_memfd is used for both shared/private memory, converting
> pages to shared may require kvm_arch_gmem_invalidate() to be issued to
> return the pages to an architecturally-defined "shared" state if the
> pages were previously allocated and transitioned to a private state via
> kvm_arch_gmem_prepare().
>
> Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
> when converting ranges in the filemap to a shared state.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b77cdccd340e..f27e1f3962bb 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
>          struct maple_tree *mt;
>
>          mt = &kvm_gmem_private(inode)->shareability;
> +
> +        /*
> +         * If a folio has been allocated then it was possibly in a private
> +         * state prior to conversion. Ensure arch invalidations are issued
> +         * to return the folio to a normal/shared state as defined by the
> +         * architecture before tracking it as shared in gmem.
> +         */
> +        if (m == SHAREABILITY_ALL) {
> +                pgoff_t idx;
> +
> +                for (idx = work->start; idx < work->start + work->nr_pages; idx++) {

It is redundant to enter this loop for VM variants that don't need
this loop e.g. for pKVM/TDX. I think KVM can dictate a set of rules
(based on VM type) that guest_memfd will follow for memory management
when it's created, e.g. something like:
1) needs pfn invalidation
2) needs zeroing on shared faults
3) needs zeroing on allocation

> +                        struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
> +
> +                        if (!IS_ERR(folio)) {
> +                                kvm_arch_gmem_invalidate(folio_pfn(folio),
> +                                                         folio_pfn(folio) + folio_nr_pages(folio));
> +                                folio_unlock(folio);
> +                                folio_put(folio);
> +                        }
> +                }
> +        }
> +
>          return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
>  }
>
> --
> 2.25.1
>
On Tue, Jul 15, 2025 at 06:20:09AM -0700, Vishal Annapurve wrote:
> On Thu, Jun 12, 2025 at 5:56 PM Michael Roth <michael.roth@amd.com> wrote:
> >
> > When guest_memfd is used for both shared/private memory, converting
> > pages to shared may require kvm_arch_gmem_invalidate() to be issued to
> > return the pages to an architecturally-defined "shared" state if the
> > pages were previously allocated and transitioned to a private state via
> > kvm_arch_gmem_prepare().
> >
> > Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
> > when converting ranges in the filemap to a shared state.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index b77cdccd340e..f27e1f3962bb 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
> >          struct maple_tree *mt;
> >
> >          mt = &kvm_gmem_private(inode)->shareability;
> > +
> > +        /*
> > +         * If a folio has been allocated then it was possibly in a private
> > +         * state prior to conversion. Ensure arch invalidations are issued
> > +         * to return the folio to a normal/shared state as defined by the
> > +         * architecture before tracking it as shared in gmem.
> > +         */
> > +        if (m == SHAREABILITY_ALL) {
> > +                pgoff_t idx;
> > +
> > +                for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
>
> It is redundant to enter this loop for VM variants that don't need
> this loop e.g. for pKVM/TDX. I think KVM can dictate a set of rules
> (based on VM type) that guest_memfd will follow for memory management
> when it's created, e.g. something like:
> 1) needs pfn invalidation
> 2) needs zeroing on shared faults
> 3) needs zeroing on allocation

Makes sense. Maybe internal/reserved GUEST_MEMFD_FLAG_*'s that can be passed
to kvm_gmem_create()?

-Mike

>
> > +                        struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
> > +
> > +                        if (!IS_ERR(folio)) {
> > +                                kvm_arch_gmem_invalidate(folio_pfn(folio),
> > +                                                         folio_pfn(folio) + folio_nr_pages(folio));
> > +                                folio_unlock(folio);
> > +                                folio_put(folio);
> > +                        }
> > +                }
> > +        }
> > +
> >          return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
> >  }
> >
> > --
> > 2.25.1
> >
On Tue, Jul 15, 2025 at 3:56 PM Michael Roth <michael.roth@amd.com> wrote:
> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > index b77cdccd340e..f27e1f3962bb 100644
> > > --- a/virt/kvm/guest_memfd.c
> > > +++ b/virt/kvm/guest_memfd.c
> > > @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
> > >          struct maple_tree *mt;
> > >
> > >          mt = &kvm_gmem_private(inode)->shareability;
> > > +
> > > +        /*
> > > +         * If a folio has been allocated then it was possibly in a private
> > > +         * state prior to conversion. Ensure arch invalidations are issued
> > > +         * to return the folio to a normal/shared state as defined by the
> > > +         * architecture before tracking it as shared in gmem.
> > > +         */
> > > +        if (m == SHAREABILITY_ALL) {
> > > +                pgoff_t idx;
> > > +
> > > +                for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
> >
> > It is redundant to enter this loop for VM variants that don't need
> > this loop e.g. for pKVM/TDX. I think KVM can dictate a set of rules
> > (based on VM type) that guest_memfd will follow for memory management
> > when it's created, e.g. something like:
> > 1) needs pfn invalidation
> > 2) needs zeroing on shared faults
> > 3) needs zeroing on allocation
>
> Makes sense. Maybe internal/reserved GUEST_MEMFD_FLAG_*'s that can be passed
> to kvm_gmem_create()?

Yeah, a set of internal flags in addition to what is passed by user
space looks good to me. i.e. Something like:

-int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
+int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args, u64 kvm_flags)

>
> -Mike
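[Editorial note: to make the flags idea above concrete, here is a minimal sketch of how such
kernel-internal guest_memfd flags could be wired up. Every name below (the flag macros, the
flags field in the gmem inode private data, and the per-VM-type helper) is an illustrative
assumption, not an existing symbol in the series.]

/* Hypothetical kernel-internal guest_memfd flags, kept out of the uAPI bit range. */
#define GUEST_MEMFD_FLAG_ARCH_INVALIDATE        BIT_ULL(32) /* needs pfn invalidation */
#define GUEST_MEMFD_FLAG_ZERO_ON_SHARED_FAULT   BIT_ULL(33) /* needs zeroing on shared faults */
#define GUEST_MEMFD_FLAG_ZERO_ON_ALLOC          BIT_ULL(34) /* needs zeroing on allocation */

/*
 * Chosen per VM type when the gmem instance is created and passed alongside the
 * userspace args, e.g. via the extra kvm_flags parameter suggested above.
 * kvm_gmem_needs_arch_invalidate() is a placeholder predicate, not a real helper.
 */
static u64 kvm_gmem_internal_flags(struct kvm *kvm)
{
        u64 flags = 0;

        /* e.g. SNP-style VMs need arch/RMP invalidation on private->shared conversion. */
        if (kvm_gmem_needs_arch_invalidate(kvm))
                flags |= GUEST_MEMFD_FLAG_ARCH_INVALIDATE;

        /* pKVM/TDX-style VMs would simply leave the bit clear, skipping the loop. */
        return flags;
}

The conversion path could then gate the invalidation loop on the stored flags
(assuming the gmem private data grows a 'flags' field to remember them):

        if (m == SHAREABILITY_ALL &&
            (kvm_gmem_private(inode)->flags & GUEST_MEMFD_FLAG_ARCH_INVALIDATE)) {
                /* walk the filemap and call kvm_arch_gmem_invalidate() as in the patch */
        }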