[PATCH RFC v1 3/5] KVM: guest_memfd: Call arch invalidation hooks when converting to shared

Michael Roth posted 5 patches 3 months, 4 weeks ago
Posted by Michael Roth 3 months, 4 weeks ago
When guest_memfd is used for both shared/private memory, converting
pages to shared may require kvm_arch_gmem_invalidate() to be issued to
return the pages to an architecturally-defined "shared" state if the
pages were previously allocated and transitioned to a private state via
kvm_arch_gmem_prepare().

Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
when converting ranges in the filemap to a shared state.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b77cdccd340e..f27e1f3962bb 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
 	struct maple_tree *mt;
 
 	mt = &kvm_gmem_private(inode)->shareability;
+
+	/*
+	 * If a folio has been allocated then it was possibly in a private
+	 * state prior to conversion. Ensure arch invalidations are issued
+	 * to return the folio to a normal/shared state as defined by the
+	 * architecture before tracking it as shared in gmem.
+	 */
+	if (m == SHAREABILITY_ALL) {
+		pgoff_t idx;
+
+		for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
+			struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
+
+			if (!IS_ERR(folio)) {
+				kvm_arch_gmem_invalidate(folio_pfn(folio),
+							 folio_pfn(folio) + folio_nr_pages(folio));
+				folio_unlock(folio);
+				folio_put(folio);
+			}
+		}
+	}
+
 	return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
 }
 
-- 
2.25.1
Re: [PATCH RFC v1 3/5] KVM: guest_memfd: Call arch invalidation hooks when converting to shared
Posted by Vishal Annapurve 2 months, 3 weeks ago
On Thu, Jun 12, 2025 at 5:56 PM Michael Roth <michael.roth@amd.com> wrote:
>
> When guest_memfd is used for both shared/private memory, converting
> pages to shared may require kvm_arch_gmem_invalidate() to be issued to
> return the pages to an architecturally-defined "shared" state if the
> pages were previously allocated and transitioned to a private state via
> kvm_arch_gmem_prepare().
>
> Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
> when converting ranges in the filemap to a shared state.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b77cdccd340e..f27e1f3962bb 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
>         struct maple_tree *mt;
>
>         mt = &kvm_gmem_private(inode)->shareability;
> +
> +       /*
> +        * If a folio has been allocated then it was possibly in a private
> +        * state prior to conversion. Ensure arch invalidations are issued
> +        * to return the folio to a normal/shared state as defined by the
> +        * architecture before tracking it as shared in gmem.
> +        */
> +       if (m == SHAREABILITY_ALL) {
> +               pgoff_t idx;
> +
> +               for (idx = work->start; idx < work->start + work->nr_pages; idx++) {

It is redundant to enter this loop for VM variants that don't need
this loop e.g. for pKVM/TDX. I think KVM can dictate a set of rules
(based on VM type) that guest_memfd will follow for memory management
when it's created, e.g. something like:
1) needs pfn invalidation
2) needs zeroing on shared faults
3) needs zeroing on allocation
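
The rule set above could be modeled roughly as follows. This is a
minimal, self-contained user-space sketch, not kernel code: the
GMEM_RULE_* bits, enum vm_kind and gmem_rules_for() are hypothetical
names invented for illustration. Only the pKVM/TDX "no invalidation
loop" case comes from the comment above; the SNP entry is an assumption
inferred from the patch's motivation, and the zeroing rules are left
unassigned because the thread does not say which VM types need them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rule bits; illustrative names, not kernel definitions. */
#define GMEM_RULE_PFN_INVALIDATION     (UINT64_C(1) << 0)
#define GMEM_RULE_ZERO_ON_SHARED_FAULT (UINT64_C(1) << 1)
#define GMEM_RULE_ZERO_ON_ALLOC        (UINT64_C(1) << 2)

/* Hypothetical VM kinds used only by this sketch. */
enum vm_kind {
	VM_KIND_DEFAULT,
	VM_KIND_SNP,
	VM_KIND_TDX,
	VM_KIND_PKVM,
	VM_KIND_COUNT
};

/*
 * Pick the rule set once, when the guest_memfd is created, so later
 * paths only test a bit instead of re-deriving it from the VM type.
 */
static inline uint64_t gmem_rules_for(enum vm_kind kind)
{
	assert((unsigned int)kind < (unsigned int)VM_KIND_COUNT);

	switch (kind) {
	case VM_KIND_SNP:
		/* Assumption: SNP wants arch invalidation on conversion. */
		return GMEM_RULE_PFN_INVALIDATION;
	case VM_KIND_TDX:
	case VM_KIND_PKVM:
		/* Per the comment above: no invalidation loop needed. */
		return 0;
	default:
		return 0;
	}
}

/* True if the conversion path should walk the range and invalidate. */
static inline bool gmem_needs_pfn_invalidation(uint64_t rules)
{
	return (rules & GMEM_RULE_PFN_INVALIDATION) != 0;
}
```

With something like this, the conversion path would check
gmem_needs_pfn_invalidation() once before the loop rather than entering
it unconditionally.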

> +                       struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
> +
> +                       if (!IS_ERR(folio)) {
> +                               kvm_arch_gmem_invalidate(folio_pfn(folio),
> +                                                        folio_pfn(folio) + folio_nr_pages(folio));
> +                               folio_unlock(folio);
> +                               folio_put(folio);
> +                       }
> +               }
> +       }
> +
>         return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
>  }
>
> --
> 2.25.1
>
Re: [PATCH RFC v1 3/5] KVM: guest_memfd: Call arch invalidation hooks when converting to shared
Posted by Michael Roth 2 months, 3 weeks ago
On Tue, Jul 15, 2025 at 06:20:09AM -0700, Vishal Annapurve wrote:
> On Thu, Jun 12, 2025 at 5:56 PM Michael Roth <michael.roth@amd.com> wrote:
> >
> > When guest_memfd is used for both shared/private memory, converting
> > pages to shared may require kvm_arch_gmem_invalidate() to be issued to
> > return the pages to an architecturally-defined "shared" state if the
> > pages were previously allocated and transitioned to a private state via
> > kvm_arch_gmem_prepare().
> >
> > Handle this by issuing the appropriate kvm_arch_gmem_invalidate() calls
> > when converting ranges in the filemap to a shared state.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  virt/kvm/guest_memfd.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index b77cdccd340e..f27e1f3962bb 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
> >         struct maple_tree *mt;
> >
> >         mt = &kvm_gmem_private(inode)->shareability;
> > +
> > +       /*
> > +        * If a folio has been allocated then it was possibly in a private
> > +        * state prior to conversion. Ensure arch invalidations are issued
> > +        * to return the folio to a normal/shared state as defined by the
> > +        * architecture before tracking it as shared in gmem.
> > +        */
> > +       if (m == SHAREABILITY_ALL) {
> > +               pgoff_t idx;
> > +
> > +               for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
> 
> It is redundant to enter this loop for VM variants that don't need
> this loop e.g. for pKVM/TDX. I think KVM can dictate a set of rules
> (based on VM type) that guest_memfd will follow for memory management
> when it's created, e.g. something like:
> 1) needs pfn invalidation
> 2) needs zeroing on shared faults
> 3) needs zeroing on allocation

Makes sense. Maybe internal/reserved GUEST_MEMFD_FLAG_*'s that can be passed
to kvm_gmem_create()?

-Mike

> 
> > +                       struct folio *folio = filemap_lock_folio(inode->i_mapping, idx);
> > +
> > +                       if (!IS_ERR(folio)) {
> > +                               kvm_arch_gmem_invalidate(folio_pfn(folio),
> > +                                                        folio_pfn(folio) + folio_nr_pages(folio));
> > +                               folio_unlock(folio);
> > +                               folio_put(folio);
> > +                       }
> > +               }
> > +       }
> > +
> >         return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
> >  }
> >
> > --
> > 2.25.1
> >
Re: [PATCH RFC v1 3/5] KVM: guest_memfd: Call arch invalidation hooks when converting to shared
Posted by Vishal Annapurve 2 months, 3 weeks ago
On Tue, Jul 15, 2025 at 3:56 PM Michael Roth <michael.roth@amd.com> wrote:
> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > index b77cdccd340e..f27e1f3962bb 100644
> > > --- a/virt/kvm/guest_memfd.c
> > > +++ b/virt/kvm/guest_memfd.c
> > > @@ -203,6 +203,28 @@ static int kvm_gmem_shareability_apply(struct inode *inode,
> > >         struct maple_tree *mt;
> > >
> > >         mt = &kvm_gmem_private(inode)->shareability;
> > > +
> > > +       /*
> > > +        * If a folio has been allocated then it was possibly in a private
> > > +        * state prior to conversion. Ensure arch invalidations are issued
> > > +        * to return the folio to a normal/shared state as defined by the
> > > +        * architecture before tracking it as shared in gmem.
> > > +        */
> > > +       if (m == SHAREABILITY_ALL) {
> > > +               pgoff_t idx;
> > > +
> > > +               for (idx = work->start; idx < work->start + work->nr_pages; idx++) {
> >
> > It is redundant to enter this loop for VM variants that don't need
> > this loop e.g. for pKVM/TDX. I think KVM can dictate a set of rules
> > (based on VM type) that guest_memfd will follow for memory management
> > when it's created, e.g. something like:
> > 1) needs pfn invalidation
> > 2) needs zeroing on shared faults
> > 3) needs zeroing on allocation
>
> Makes sense. Maybe internal/reserved GUEST_MEMFD_FLAG_*'s that can be passed
> to kvm_gmem_create()?

Yeah, a set of internal flags in addition to what is passed by user
space looks good to me, i.e. something like:

-int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
+int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args,
+		    u64 kvm_flags)
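
To show how a separate kvm_flags argument might be consumed, here is a
small user-space model, offered only as a sketch. Everything in it is
hypothetical: struct gmem_model, GMEM_KVM_FLAG_NEEDS_INVALIDATION and
both helpers are invented for illustration, and the counter stands in
for the kvm_arch_gmem_invalidate() calls made by the patch. The point
is only that internal flags are stored apart from the user-supplied
ones and that conversion to shared skips the loop when the flag is
clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal flag, never exposed to userspace. */
#define GMEM_KVM_FLAG_NEEDS_INVALIDATION (UINT64_C(1) << 0)

/* Toy stand-in for per-inode guest_memfd state. */
struct gmem_model {
	uint64_t user_flags;         /* flags passed in by userspace */
	uint64_t kvm_flags;          /* flags chosen internally by KVM */
	unsigned long invalidations; /* modeled arch invalidation calls */
};

/* Models creation: record both flag sets separately. */
static inline void gmem_model_create(struct gmem_model *g,
				     uint64_t user_flags, uint64_t kvm_flags)
{
	assert(g != NULL);
	g->user_flags = user_flags;
	g->kvm_flags = kvm_flags;
	g->invalidations = 0;
}

/*
 * Models converting [start, start + nr_pages) to shared.
 * Returns how many page indices the invalidation loop visited.
 */
static inline unsigned long
gmem_model_convert_to_shared(struct gmem_model *g, unsigned long start,
			     unsigned long nr_pages)
{
	unsigned long idx, visited = 0;

	assert(g != NULL);

	/* Skip the walk entirely for VM types that do not need it. */
	if (!(g->kvm_flags & GMEM_KVM_FLAG_NEEDS_INVALIDATION))
		return 0;

	for (idx = start; idx < start + nr_pages; idx++) {
		g->invalidations++; /* stand-in for the arch hook */
		visited++;
	}
	return visited;
}
```

Keeping kvm_flags in its own field means user-visible flag validation
stays unchanged, which seems to be the intent of passing it as a
separate argument.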

>
> -Mike