When guest_memfd is used for both shared/private memory, converting
pages to shared may require kvm_arch_gmem_invalidate() to be issued to
return the pages to an architecturally-defined "shared" state if the
pages were previously allocated and transitioned to a private state via
kvm_arch_gmem_prepare().
Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
when converting ranges in the filemap to a shared state.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b77cdccd340e..f27e1f3962bb 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
 	struct maple_tree *mt;

 	mt = &kvm_gmem_private(inode)->shareability;
+
+	/*
+	 * If a folio has been allocated then it was possibly in a private
+	 * state prior to conversion. Ensure arch invalidations are issued
+	 * to return the folio to a normal/shared state as defined by the
+	 * architecture before tracking it as shared in gmem.
+	 */
+	if (m == SHAREABILITY_ALL) {
+		pgoff_t idx;
+
+		for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
+			struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
+
+			if (!IS_ERR(folio)) {
+				kvm_arch_gmem_invalidate(folio_pfn(folio),
+							 folio_pfn(folio) + folio_nr_pages(folio));
+				folio_unlock(folio);
+				folio_put(folio);
+			}
+		}
+	}
+
 	return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
 }
--
2.25.1
On Thu, Jun 12, 2025 at 5:56 PM Michael Roth <michael.roth@amd.com> wrote:
>
> When guest_memfd is used for both shared/private memory, converting
> pages to shared may require kvm_arch_gmem_invalidate() to be issued to
> return the pages to an architecturally-defined "shared" state if the
> pages were previously allocated and transitioned to a private state via
> kvm_arch_gmem_prepare().
>
> Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
> when converting ranges in the filemap to a shared state.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b77cdccd340e..f27e1f3962bb 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
> struct maple_tree *mt;
>
> mt = &kvm_gmem_private(inode)->shareability;
> +
> + /*
> + * If a folio has been allocated then it was possibly in a private
> + * state prior to conversion. Ensure arch invalidations are issued
> + * to return the folio to a normal/shared state as defined by the
> + * architecture before tracking it as shared in gmem.
> + */
> + if (m == SHAREABILITY_ALL) {
> + pgoff_t idx;
> +
> + for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
It is redundant to enter this loop for VM variants that don't need it,
e.g. pKVM/TDX. I think KVM can dictate a set of rules
(based on VM type) that guest_memfd will follow for memory management
when it's created, e.g. something like:
1) needs pfn invalidation
2) needs zeroing on shared faults
3) needs zeroing on allocation
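A rough sketch of how that could look (the GMEM_PROP_* names and the
"props" field below are made up purely for illustration, not existing
code):

/* Hypothetical per-inode properties KVM could set at gmem creation time */
#define GMEM_PROP_NEEDS_PFN_INVALIDATE	BIT(0)	/* e.g. SEV-SNP RMP updates */
#define GMEM_PROP_ZERO_ON_SHARED_FAULT	BIT(1)
#define GMEM_PROP_ZERO_ON_ALLOC		BIT(2)

/*
 * kvm_gmem_shareability_apply() could then skip the loop entirely for
 * VM types that never set the property:
 */
if (m == SHAREABILITY_ALL &&
    (kvm_gmem_private(inode)->props & GMEM_PROP_NEEDS_PFN_INVALIDATE)) {
	/* per-folio kvm_arch_gmem_invalidate() loop as in this patch */
}

That way pKVM/TDX-style VMs would never even do the filemap lookups.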
> + struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
> +
> + if (!IS_ERR(folio)) {
> + kvm_arch_gmem_invalidate(folio_pfn(folio),
> + folio_pfn(folio) + folio_nr_pages(folio));
> + folio_unlock(folio);
> + folio_put(folio);
> + }
> + }
> + }
> +
> return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
> }
>
> --
> 2.25.1
>
On Tue, Jul 15, 2025 at 06:20:09AM -0700, Vishal Annapurve wrote:
> On Thu, Jun 12, 2025 at 5:56 PM Michael Roth <michael.roth@amd.com> wrote:
> >
> > When guest_memfd is used for both shared/private memory, converting
> > pages to shared may require kvm_arch_gmem_invalidate() to be issued to
> > return the pages to an architecturally-defined "shared" state if the
> > pages were previously allocated and transitioned to a private state via
> > kvm_arch_gmem_prepare().
> >
> > Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
> > when converting ranges in the filemap to a shared state.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> > virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index b77cdccd340e..f27e1f3962bb 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
> > struct maple_tree *mt;
> >
> > mt = &kvm_gmem_private(inode)->shareability;
> > +
> > + /*
> > + * If a folio has been allocated then it was possibly in a private
> > + * state prior to conversion. Ensure arch invalidations are issued
> > + * to return the folio to a normal/shared state as defined by the
> > + * architecture before tracking it as shared in gmem.
> > + */
> > + if (m == SHAREABILITY_ALL) {
> > + pgoff_t idx;
> > +
> > + for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
>
> It is redundant to enter this loop for VM variants that don't need it,
> e.g. pKVM/TDX. I think KVM can dictate a set of rules
> (based on VM type) that guest_memfd will follow for memory management
> when it's created, e.g. something like:
> 1) needs pfn invalidation
> 2) needs zeroing on shared faults
> 3) needs zeroing on allocation
Makes sense. Maybe internal/reserved GUEST_MEMFD_FLAG_*'s that can be passed
to kvm_gmem_create()?
-Mike
>
> > + struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
> > +
> > + if (!IS_ERR(folio)) {
> > + kvm_arch_gmem_invalidate(folio_pfn(folio),
> > + folio_pfn(folio) + folio_nr_pages(folio));
> > + folio_unlock(folio);
> > + folio_put(folio);
> > + }
> > + }
> > + }
> > +
> > return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
> > }
> >
> > --
> > 2.25.1
> >
On Tue, Jul 15, 2025 at 3:56 PM Michael Roth <michael.roth@amd.com> wrote:
> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > index b77cdccd340e..f27e1f3962bb 100644
> > > --- a/virt/kvm/guest_memfd.c
> > > +++ b/virt/kvm/guest_memfd.c
> > > @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
> > > struct maple_tree *mt;
> > >
> > > mt = &kvm_gmem_private(inode)->shareability;
> > > +
> > > + /*
> > > + * If a folio has been allocated then it was possibly in a private
> > > + * state prior to conversion. Ensure arch invalidations are issued
> > > + * to return the folio to a normal/shared state as defined by the
> > > + * architecture before tracking it as shared in gmem.
> > > + */
> > > + if (m == SHAREABILITY_ALL) {
> > > + pgoff_t idx;
> > > +
> > > + for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
> >
> > It is redundant to enter this loop for VM variants that don't need it,
> > e.g. pKVM/TDX. I think KVM can dictate a set of rules
> > (based on VM type) that guest_memfd will follow for memory management
> > when it's created, e.g. something like:
> > 1) needs pfn invalidation
> > 2) needs zeroing on shared faults
> > 3) needs zeroing on allocation
>
> Makes sense. Maybe internal/reserved GUEST_MEMFD_FLAG_*'s that can be passed
> to kvm_gmem_create()?
Yeah, a set of internal flags in addition to what is passed by user
space looks good to me, i.e. something like:
-int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
+int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args, u64 kvm_flags)
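For illustration, KVM could then derive kvm_flags from the VM type
before calling kvm_gmem_create(); the helper and flag name below are
hypothetical, only meant to show the shape:

	u64 kvm_flags = 0;

	/* e.g. SNP wants kvm_arch_gmem_invalidate() on private->shared conversion */
	if (kvm_arch_gmem_needs_invalidate(kvm))
		kvm_flags |= GUEST_MEMFD_FLAG_NEEDS_INVALIDATE;

	r = kvm_gmem_create(kvm, args, kvm_flags);

guest_memfd would stash those flags on the inode at creation time and
consult them later instead of checking the VM type directly.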
>
> -Mike