[RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes

[RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Ackerley Tng 3 months, 3 weeks ago
From: Sean Christopherson <seanjc@google.com>

Implement kvm_gmem_get_memory_attributes() for guest_memfd to allow the KVM
core and architecture code to query per-GFN memory attributes.

kvm_gmem_get_memory_attributes() finds the memory slot for a given GFN and
queries the guest_memfd file's attributes to determine whether the page is
marked as private.

If vm_memory_attributes is not enabled, there is no shared/private tracking
at the VM level. Install the guest_memfd implementation whenever guest_memfd
is enabled so that guest_memfd gets a chance to report attributes.

guest_memfd should look up attributes even for memslots that are not
gmem-only, since gmem now tracks attributes whether or not mmap() is enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/guest_memfd.c   | 29 +++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |  3 +++
 3 files changed, 34 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 512febf47c265..b8418cc5851f1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2543,6 +2543,8 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 					 struct kvm_gfn_range *range);
 #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
 
+unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn);
+
 #ifdef CONFIG_KVM_GUEST_MEMFD
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 26cec833766c3..f62facc3ab776 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -518,6 +518,35 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
+unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+
+	/*
+	 * If this gfn has no associated memslot, there's no chance of the gfn
+	 * being backed by private memory, since guest_memfd must be used for
+	 * private memory, and guest_memfd must be associated with some memslot.
+	 */
+	if (!slot)
+		return 0;
+
+	CLASS(gmem_get_file, file)(slot);
+	if (!file)
+		return false;
+
+	/*
+	 * Don't take the filemap invalidation lock, as temporarily acquiring
+	 * that lock wouldn't provide any meaningful protection.  The caller
+	 * _must_ protect consumption of private vs. shared by checking
+	 * mmu_invalidate_retry_gfn() under mmu_lock.
+	 */
+	guard(rcu)();
+
+	return kvm_gmem_get_attributes(file_inode(file),
+				       kvm_gmem_get_index(slot, gfn));
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_get_memory_attributes);
+
 static struct file_operations kvm_gmem_fops = {
 	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c29770dfa7c8..c73ebdb73070e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2660,6 +2660,9 @@ static void kvm_init_memory_attributes(void)
 	if (vm_memory_attributes)
 		static_call_update(__kvm_get_memory_attributes,
 				   kvm_get_vm_memory_attributes);
+	else if (IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
+		static_call_update(__kvm_get_memory_attributes,
+				   kvm_gmem_get_memory_attributes);
 	else
 		static_call_update(__kvm_get_memory_attributes,
 				   (void *)__static_call_return0);
-- 
2.51.0.858.gf9c4a03a3a-goog
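
For concreteness, a minimal sketch of how a consumer such as kvm_mem_is_private()
would use the attribute word returned by this hook, assuming kvm_get_memory_attributes()
is the wrapper around the __kvm_get_memory_attributes static call wired up in
kvm_init_memory_attributes() above; the exact Kconfig guard shown here is an assumption
and may not match the series:

/*
 * Illustrative sketch only (not part of the patch): the private/shared
 * decision is just the KVM_MEMORY_ATTRIBUTE_PRIVATE bit of the attribute
 * word returned through the static call.
 */
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
	return IS_ENABLED(CONFIG_KVM_GUEST_MEMFD) &&
	       (kvm_get_memory_attributes(kvm, gfn) &
		KVM_MEMORY_ATTRIBUTE_PRIVATE);
}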
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Alexey Kardashevskiy 3 weeks, 5 days ago
On 18/10/25 07:11, Ackerley Tng wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Implement kvm_gmem_get_memory_attributes() for guest_memfd to allow the KVM
> core and architecture code to query per-GFN memory attributes.
> 
> kvm_gmem_get_memory_attributes() finds the memory slot for a given GFN and
> queries the guest_memfd file's attributes to determine whether the page is
> marked as private.
> 
> If vm_memory_attributes is not enabled, there is no shared/private tracking
> at the VM level. Install the guest_memfd implementation whenever guest_memfd
> is enabled so that guest_memfd gets a chance to report attributes.
> 
> guest_memfd should look up attributes even for memslots that are not
> gmem-only, since gmem now tracks attributes whether or not mmap() is enabled.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>   include/linux/kvm_host.h |  2 ++
>   virt/kvm/guest_memfd.c   | 29 +++++++++++++++++++++++++++++
>   virt/kvm/kvm_main.c      |  3 +++
>   3 files changed, 34 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 512febf47c265..b8418cc5851f1 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2543,6 +2543,8 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>   					 struct kvm_gfn_range *range);
>   #endif /* CONFIG_KVM_VM_MEMORY_ATTRIBUTES */
>   
> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn);
> +
>   #ifdef CONFIG_KVM_GUEST_MEMFD
>   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>   		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 26cec833766c3..f62facc3ab776 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -518,6 +518,35 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>   	return 0;
>   }
>   
> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> +{
> +	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> +
> +	/*
> +	 * If this gfn has no associated memslot, there's no chance of the gfn
> +	 * being backed by private memory, since guest_memfd must be used for
> +	 * private memory, and guest_memfd must be associated with some memslot.
> +	 */
> +	if (!slot)
> +		return 0;
> +
> +	CLASS(gmem_get_file, file)(slot);
> +	if (!file)
> +		return false;
> +
> +	/*
> +	 * Don't take the filemap invalidation lock, as temporarily acquiring
> +	 * that lock wouldn't provide any meaningful protection.  The caller
> +	 * _must_ protect consumption of private vs. shared by checking
> +	 * mmu_invalidate_retry_gfn() under mmu_lock.
> +	 */
> +	guard(rcu)();
> +
> +	return kvm_gmem_get_attributes(file_inode(file),
> +				       kvm_gmem_get_index(slot, gfn));
> +}
> +EXPORT_SYMBOL_GPL(kvm_gmem_get_memory_attributes);
> +
>   static struct file_operations kvm_gmem_fops = {
>   	.mmap		= kvm_gmem_mmap,
>   	.open		= generic_file_open,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6c29770dfa7c8..c73ebdb73070e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2660,6 +2660,9 @@ static void kvm_init_memory_attributes(void)
>   	if (vm_memory_attributes)
>   		static_call_update(__kvm_get_memory_attributes,
>   				   kvm_get_vm_memory_attributes);
> +	else if (IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
> +		static_call_update(__kvm_get_memory_attributes,
> +				   kvm_gmem_get_memory_attributes);
>   	else
>   		static_call_update(__kvm_get_memory_attributes,
>   				   (void *)__static_call_return0);


I am trying to make it work with TEE-IO where fd of VFIO MMIO is a dmabuf fd while the rest (guest RAM) is gmemfd. The above suggests that if there is gmemfd - then the memory attributes are handled by gmemfd which is... expected?

The problem at hand is that kvm_mmu_faultin_pfn() fails at "if (fault->is_private != kvm_mem_is_private(kvm, fault->gfn))" and marking MMIO as private using kvm_vm_ioctl_set_mem_attributes() does not work as kvm_gmem_get_memory_attributes() fails on dmabuf fds.
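
For reference, the check in question looks roughly like this in kvm_mmu_faultin_pfn()
(simplified from arch/x86/kvm/mmu/mmu.c; the surrounding context varies by kernel
version):

	/*
	 * Simplified sketch: if the guest's view of the fault (private vs.
	 * shared) disagrees with what kvm_mem_is_private() reports, KVM exits
	 * to userspace with a memory fault instead of mapping the page.
	 */
	if (fault->is_private != kvm_mem_is_private(kvm, fault->gfn)) {
		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
		return -EFAULT;
	}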

I worked around this like below but wonder what is the proper way? Thanks,


@@ -768,13 +768,13 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
  	 */
  	if (!slot)
  		return 0;
  
  	CLASS(gmem_get_file, file)(slot);
  	if (!file)
-		return false;
+		return kvm_get_vm_memory_attributes(kvm, gfn);
  
  	/*
  	 * Don't take the filemap invalidation lock, as temporarily acquiring
  	 * that lock wouldn't provide any meaningful protection.  The caller
  	 * _must_ protect consumption of private vs. shared by checking
  	 * mmu_invalidate_retry_gfn() under mmu_lock.



-- 
Alexey
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Ackerley Tng 1 week, 6 days ago
Alexey Kardashevskiy <aik@amd.com> writes:

>
> [...snip...]
>
>

Thanks for bringing this up!

> I am trying to make it work with TEE-IO where fd of VFIO MMIO is a dmabuf fd while the rest (guest RAM) is gmemfd. The above suggests that if there is gmemfd - then the memory attributes are handled by gmemfd which is... expected?
>

I think this is not expected.

IIUC MMIO guest physical addresses don't have an associated memslot, but
if you managed to get to that line in kvm_gmem_get_memory_attributes(),
then there is an associated memslot (slot != NULL)?

Either way, guest_memfd shouldn't store attributes for guest physical
addresses that don't belong to some guest_memfd memslot.

I think we need a broader discussion for this on where to store memory
attributes for MMIO addresses.

I think we should at least have line of sight to storing memory
attributes for MMIO addresses, in case we want to design something else,
since we're putting vm_memory_attributes on a deprecation path with this
series.

Sean, what do you think?

Alexey, shall we discuss this at either the upcoming PUCK or guest_memfd
biweekly session?

> The problem at hand is that kvm_mmu_faultin_pfn() fails at "if (fault->is_private != kvm_mem_is_private(kvm, fault->gfn))" and marking MMIO as private using kvm_vm_ioctl_set_mem_attributes() does not work as kvm_gmem_get_memory_attributes() fails on dmabuf fds.
>
> I worked around this like below but wonder what is the proper way? Thanks,
>
>
> @@ -768,13 +768,13 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>   	 */
>   	if (!slot)
>   		return 0;
>
>   	CLASS(gmem_get_file, file)(slot);
>   	if (!file)
> -		return false;
> +		return kvm_get_vm_memory_attributes(kvm, gfn);
>
>   	/*
>   	 * Don't take the filemap invalidation lock, as temporarily acquiring
>   	 * that lock wouldn't provide any meaningful protection.  The caller
>   	 * _must_ protect consumption of private vs. shared by checking
>   	 * mmu_invalidate_retry_gfn() under mmu_lock.
>
>
>
> --
> Alexey
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Jason Gunthorpe 1 week, 6 days ago
On Wed, Jan 28, 2026 at 01:47:50PM -0800, Ackerley Tng wrote:
> Alexey Kardashevskiy <aik@amd.com> writes:
> 
> >
> > [...snip...]
> >
> >
> 
> Thanks for bringing this up!
> 
> > I am trying to make it work with TEE-IO where fd of VFIO MMIO is a dmabuf fd while the rest (guest RAM) is gmemfd. The above suggests that if there is gmemfd - then the memory attributes are handled by gmemfd which is... expected?
> >
> 
> I think this is not expected.
> 
> IIUC MMIO guest physical addresses don't have an associated memslot, but
> if you managed to get to that line in kvm_gmem_get_memory_attributes(),
> then there is an associated memslot (slot != NULL)?

I think they should have a memslot, shouldn't they? I imagine creating
a memslot from a FD and the FD can be memfd, guestmemfd, dmabuf, etc,
etc ?

> Either way, guest_memfd shouldn't store attributes for guest physical
> addresses that don't belong to some guest_memfd memslot.
> 
> I think we need a broader discussion for this on where to store memory
> attributes for MMIO addresses.
> 
> I think we should at least have line of sight to storing memory
> attributes for MMIO addresses, in case we want to design something else,
> since we're putting vm_memory_attributes on a deprecation path with this
> series.

I don't know where you want to store them in KVM long term, but they
need to come from the dmabuf itself (probably via a struct
p2pdma_provider) and currently it is OK to assume all DMABUFs are
uncachable MMIO that is safe for the VM to convert into "write
combining" (eg Normal-NC on ARM)

Jason
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Sean Christopherson 1 week, 6 days ago
On Wed, Jan 28, 2026, Jason Gunthorpe wrote:
> On Wed, Jan 28, 2026 at 01:47:50PM -0800, Ackerley Tng wrote:
> > Alexey Kardashevskiy <aik@amd.com> writes:
> > 
> > >
> > > [...snip...]
> > >
> > >
> > 
> > Thanks for bringing this up!
> > 
> > > I am trying to make it work with TEE-IO where fd of VFIO MMIO is a dmabuf
> > > fd while the rest (guest RAM) is gmemfd. The above suggests that if there
> > > is gmemfd - then the memory attributes are handled by gmemfd which is...
> > > expected?
> > >
> > 
> > I think this is not expected.
> > 
> > IIUC MMIO guest physical addresses don't have an associated memslot, but
> > if you managed to get to that line in kvm_gmem_get_memory_attributes(),
> > then there is an associated memslot (slot != NULL)?
> 
> I think they should have a memslot, shouldn't they? I imagine creating
> a memslot from a FD and the FD can be memfd, guestmemfd, dmabuf, etc,
> etc ?

Yeah, there are two flavors of MMIO for KVM guests.  Emulated MMIO, which is
what Ackerley is thinking of, and "host" MMIO (for lack of a better term), which
is what I assume "fd of VFIO MMIO" is referring to.

Emulated MMIO does NOT have memslots[*].  There are some wrinkles and technical
exceptions, e.g. read-only memslots for emulating option ROMs, but by and large,
lack of a memslot means Emulated MMIO.

Host MMIO isn't something KVM really cares about, in the sense that, for the most
part, it's "just another memslot".  KVM x86 does need to identify host MMIO for
vendor specific reasons, e.g. to ensure UC memory stays UC when using EPT (MTRRs
are ignored), to create shared mappings when SME is enabled, and to mitigate the
lovely MMIO Stale Data vulnerability.

But those Host MMIO edge cases are almost entirely contained to make_spte() (see
the kvm_is_mmio_pfn() calls).  And so the vast, vast majority of "MMIO" code in
KVM is dealing with Emulated MMIO, and when most people talk about MMIO in KVM,
they're also talking about Emulated MMIO.

> > Either way, guest_memfd shouldn't store attributes for guest physical
> > addresses that don't belong to some guest_memfd memslot.
> > 
> > I think we need a broader discussion for this on where to store memory
> > attributes for MMIO addresses.
> > 
> > I think we should at least have line of sight to storing memory
> > attributes for MMIO addresses, in case we want to design something else,
> > since we're putting vm_memory_attributes on a deprecation path with this
> > series.
> 
> I don't know where you want to store them in KVM long term, but they
> need to come from the dmabuf itself (probably via a struct
> p2pdma_provider) and currently it is OK to assume all DMABUFs are
> uncachable MMIO that is safe for the VM to convert into "write
> combining" (eg Normal-NC on ARM)

+1.  For guest_memfd, we initially defined per-VM memory attributes to track
private vs. shared.  But as Ackerley noted, we are in the process of deprecating
that support, e.g. by making it incompatible with various guest_memfd features,
in favor of having each guest_memfd instance track the state of a given page.

The original guest_memfd design was that it would _only_ hold private pages, and
so tracking private vs. shared in guest_memfd didn't make any sense.  As we've
pivoted to in-place conversion, tracking private vs. shared in the guest_memfd
has basically become mandatory.  We could maaaaaybe make it work with per-VM
attributes, but it would be insanely complex.

For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
is all or nothing, and can never change, then the only entity that can track that
info is the owner of the dmabuf.  And even if the private vs. shared attributes
are constant, tracking it external to KVM makes sense, because then the provider
can simply hardcode %true/%false.

As for _how_ to do that, no matter where the attributes are stored, we're going
to have to teach KVM to play nice with a non-guest_memfd provider of private
memory.
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Xu Yilun 1 week ago
> +1.  For guest_memfd, we initially defined per-VM memory attributes to track
> private vs. shared.  But as Ackerley noted, we are in the process of deprecating
> that support, e.g. by making it incompatible with various guest_memfd features,
> in favor of having each guest_memfd instance track the state of a given page.
> 
> The original guest_memfd design was that it would _only_ hold private pages, and
> so tracking private vs. shared in guest_memfd didn't make any sense.  As we've
> pivoted to in-place conversion, tracking private vs. shared in the guest_memfd
> has basically become mandatory.  We could maaaaaybe make it work with per-VM
> attributes, but it would be insanely complex.
> 
> For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
> is all or nothing, and can never change, then the only entity that can track that
> info is the owner of the dmabuf.  And even if the private vs. shared attributes
> are constant, tracking it external to KVM makes sense, because then the provider
> can simply hardcode %true/%false.  

For CoCo-VM and Tee-IO, I'm wondering if host or KVM has to maintain
the private/shared attribute for "assigned MMIO". I'm not naming them
"host MMIO" cause unlike RAM host never needs to access them, either in
private manner or shared manner.

Traditionally, host maps these MMIOs only because KVM needs HVA->HPA
mapping to find pfn and setup KVM MMU. Now that we have an FD-based
approach, with a dmabuf fd the host no longer needs that mapping. Does that
give confidence that KVM only needs to set up the MMU for this type of MMIO
as private/shared according to the guest's intention (i.e.
fault->is_private)?

We don't need to track private/shared in VFIO MMIO dmabuf, only to keep
them unmappable.

> 
> As for _how_ to do that, no matter where the attributes are stored, we're going
> to have to teach KVM to play nice with a non-guest_memfd provider of private
> memory.
>
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Jason Gunthorpe 1 week ago
On Tue, Feb 03, 2026 at 05:56:37PM +0800, Xu Yilun wrote:
> > +1.  For guest_memfd, we initially defined per-VM memory attributes to track
> > private vs. shared.  But as Ackerley noted, we are in the process of deprecating
> > that support, e.g. by making it incompatible with various guest_memfd features,
> > in favor of having each guest_memfd instance track the state of a given page.
> > 
> > The original guest_memfd design was that it would _only_ hold private pages, and
> > so tracking private vs. shared in guest_memfd didn't make any sense.  As we've
> > pivoted to in-place conversion, tracking private vs. shared in the guest_memfd
> > has basically become mandatory.  We could maaaaaybe make it work with per-VM
> > attributes, but it would be insanely complex.
> > 
> > For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
> > is all or nothing, and can never change, then the only entity that can track that
> > info is the owner of the dmabuf.  And even if the private vs. shared attributes
> > are constant, tracking it external to KVM makes sense, because then the provider
> > can simply hardcode %true/%false.  
> 
> For CoCo-VM and Tee-IO, I'm wondering if host or KVM has to maintain
> the private/shared attribute for "assigned MMIO". I'm not naming them
> "host MMIO" cause unlike RAM host never needs to access them, either in
> private manner or shared manner.
> 
> Traditionally, host maps these MMIOs only because KVM needs HVA->HPA
> mapping to find pfn and setup KVM MMU.

This is not actually completely true, the host mapping still ends up
being used by KVM if it happens to trap and emulate a MMIO touching
instruction.

It really shouldn't do this, but there is a whole set of complex
machinery in KVM and qemu to handle this case.

For example if the MSI-X window is not properly aligned then you have
some MMIO that is trapped and must be reflected to real HW.

So the sharable parts of the BAR should still end up being mmaped into
userspace, I think.

Which means we need VFIO to know what they are, and hopefully it is
just static based on the TDISP reports..

Jason
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Xu Yilun 6 days, 23 hours ago
On Tue, Feb 03, 2026 at 02:16:18PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 03, 2026 at 05:56:37PM +0800, Xu Yilun wrote:
> > > +1.  For guest_memfd, we initially defined per-VM memory attributes to track
> > > private vs. shared.  But as Ackerley noted, we are in the process of deprecating
> > > that support, e.g. by making it incompatible with various guest_memfd features,
> > > in favor of having each guest_memfd instance track the state of a given page.
> > > 
> > > The original guest_memfd design was that it would _only_ hold private pages, and
> > > so tracking private vs. shared in guest_memfd didn't make any sense.  As we've
> > > pivoted to in-place conversion, tracking private vs. shared in the guest_memfd
> > > has basically become mandatory.  We could maaaaaybe make it work with per-VM
> > > attributes, but it would be insanely complex.
> > > 
> > > For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
> > > is all or nothing, and can never change, then the only entity that can track that
> > > info is the owner of the dmabuf.  And even if the private vs. shared attributes
> > > are constant, tracking it external to KVM makes sense, because then the provider
> > > can simply hardcode %true/%false.  
> > 
> > For CoCo-VM and Tee-IO, I'm wondering if host or KVM has to maintain
> > the private/shared attribute for "assigned MMIO". I'm not naming them
> > "host MMIO" cause unlike RAM host never needs to access them, either in
> > private manner or shared manner.
> > 
> > Traditionally, host maps these MMIOs only because KVM needs HVA->HPA
> > mapping to find pfn and setup KVM MMU.
> 
> This is not actually completely true, the host mapping still ends up
> being used by KVM if it happens to trap and emulate a MMIO touching
> instruction.
> 
> It really shouldn't do this, but there is a whole set of complex
> machinery in KVM and qemu to handle this case.
> 
> For example if the MSI-X window is not properly aligned then you have
> some MMIO that is trapped and must be reflected to real HW.

In this case, the affected pages are not assigned MMIOs and KVM won't
import them. Mapping them is just OK.

> 
> So the sharable parts of the BAR should still end up being mmaped into
> userspace, I think.

This does mean we can't make VFIO totally unmappable. But VFIO can still
try to create unmappable dmabufs for assigned MMIO regions, fail dmabuf
creation or fail mmap() based on the addresses.

> 
> Which means we need VFIO to know what they are, and hopefully it is
> just static based on the TDISP reports..

I don't think VMM need to check TDISP report. The only special thing is
the MSI-X mixed pages which can be figured out by standard PCI
discovery.

Seems this doesn't impact the idea that KVM needs no private/shared
indication from VFIO, as long as VFIO keeps exported dmabufs unmapped.
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Jason Gunthorpe 6 days, 15 hours ago
On Wed, Feb 04, 2026 at 12:43:16PM +0800, Xu Yilun wrote:
> > Which means we need VFIO to know what they are, and hopefully it is
> > just static based on the TDISP reports..
> 
> I don't think VMM need to check TDISP report. The only special thing is
> the MSI-X mixed pages which can be figured out by standard PCI
> discovery.

Either that or follow along with the guest's choices on
shared/private.

We can't let VFIO mmap a private MMIO page, so it has to know which
pages are private at any moment, and it can't guess.

Jason
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Xu Yilun 5 days, 20 hours ago
On Wed, Feb 04, 2026 at 08:47:15AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 04, 2026 at 12:43:16PM +0800, Xu Yilun wrote:
> > > Which means we need VFIO to know what they are, and hopefully it is
> > > just static based on the TDISP reports..
> > 
> > I don't think VMM need to check TDISP report. The only special thing is
> > the MSI-X mixed pages which can be figured out by standard PCI
> > discovery.
> 
> Either that or follow along with the guest's choices on
> shared/private.
> 
> We can't let VFIO mmap a private MMIO page, so it has to know which
> pages are private at any moment, and it can't guess.

No, we could only let VFIO mmap MMIO pages that need emulation (like this
MSI-X mixed page). MMIOs in such a page cannot be assigned to the guest, so
there is no way to convert them to private.

We don't allow VFIO to mmap any assigned MMIO pages, no matter whether they
will be private or shared. They are assigned to the guest, so the host
doesn't touch them. Does that make sense?

> 
> Jason
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Jason Gunthorpe 1 week, 6 days ago
On Wed, Jan 28, 2026 at 05:03:27PM -0800, Sean Christopherson wrote:

> For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
> is all or nothing, and can never change, then the only entity that can track that
> info is the owner of the dmabuf.  And even if the private vs. shared attributes
> are constant, tracking it external to KVM makes sense, because then the provider
> can simply hardcode %true/%false.

Oh my I had not given that bit any thought. My remarks were just about
normal non-CC systems.

So MMIO starts out shared, and then converts to private when the guest
triggers it. It is not all or nothing, there are permanent shared
holes in the MMIO ranges too.

Beyond that I don't know what people are thinking.

Clearly VFIO has to revoke and disable the DMABUF once any of it
becomes private. VFIO will somehow have to know when it changes modes
from the TSM subsystem.

I guess we could have a special channel for KVM to learn the
shared/private page by page from VFIO as some kind of "aware of CC"
importer.

I suppose AMD needs to mangle the RMP when it changes, and KVM has to
do that.

I forget what ARM does, but I seem to recall there is a call to create
a vPCI function and that is what stuffs the S2? So maybe KVM isn't
even involved? (IIRC people were talking that something else would
call the vPCI function but I haven't seen patches)

No idea what x86 does beyond it has to unmap all the MMIO otherwise
the machine crashes :P

Oh man, what a horrible mess to even contemplate. I'm going to bed.

Jason
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Alexey Kardashevskiy 1 week, 1 day ago
On 29/1/26 12:16, Jason Gunthorpe wrote:
> On Wed, Jan 28, 2026 at 05:03:27PM -0800, Sean Christopherson wrote:
> 
>> For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
>> is all or nothing, and can never change, then the only entity that can track that
>> info is the owner of the dmabuf.  And even if the private vs. shared attributes
>> are constant, tracking it external to KVM makes sense, because then the provider
>> can simply hardcode %true/%false.
> 
> Oh my I had not given that bit any thought. My remarks were just about
> normal non-CC systems.
> 
> So MMIO starts out shared, and then converts to private when the guest
> triggers it. It is not all or nothing, there are permanent shared
> holes in the MMIO ranges too.
> 
> Beyond that I don't know what people are thinking.
> 
> Clearly VFIO has to revoke and disable the DMABUF once any of it
> becomes private.

huh? Private MMIO still has to be mapped in the NPT (well, on AMD). It is the userspace mapping which we do not want^wneed, and we avoid it by using dmabuf.

> VFIO will somehow have to know when it changes modes
> from the TSM subsystem.
> 
> I guess we could have a special channel for KVM to learn the
> shared/private page by page from VFIO as some kind of "aware of CC"
> importer.

Yilun is doing something like that in (there must be a newer version somewhere)
https://lore.kernel.org/all/20250529053513.1592088-1-yilun.xu@linux.intel.com/


> I suppose AMD needs to mangle the RMP when it changes, and KVM has to
> do that.

True.

> I forget what ARM does, but I seem to recall there is a call to create
> a vPCI function and that is what stuffs the S2? So maybe KVM isn't
> even involved? (IIRC people were talking that something else would
> call the vPCI function but I haven't seen patches)
> 
> No idea what x86 does beyond it has to unmap all the MMIO otherwise
> the machine crashes :P

When it is in the hypervisor area, there is no "x86" :)

The "AMD x86" does not crash if there are mappings which won't work, it faults/fences when these are accessed.
  
> Oh man, what a horrible mess to even contemplate. I'm going to bed.
>
> 
> Jason

-- 
Alexey
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Jason Gunthorpe 1 week ago
On Tue, Feb 03, 2026 at 12:07:46PM +1100, Alexey Kardashevskiy wrote:
> On 29/1/26 12:16, Jason Gunthorpe wrote:
> > On Wed, Jan 28, 2026 at 05:03:27PM -0800, Sean Christopherson wrote:
> > 
> > > For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
> > > is all or nothing, and can never change, then the only entity that can track that
> > > info is the owner of the dmabuf.  And even if the private vs. shared attributes
> > > are constant, tracking it external to KVM makes sense, because then the provider
> > > can simply hardcode %true/%false.
> > 
> > Oh my I had not given that bit any thought. My remarks were just about
> > normal non-CC systems.
> > 
> > So MMIO starts out shared, and then converts to private when the guest
> > triggers it. It is not all or nothing, there are permanent shared
> > holes in the MMIO ranges too.
> > 
> > Beyond that I don't know what people are thinking.
> > 
> > Clearly VFIO has to revoke and disable the DMABUF once any of it
> > becomes private.
> 
> huh? Private MMIO still has to be mapped in the NPT (well, on
> AMD). It is the userspace mapping which we do not want^wneed, and we
> avoid it by using dmabuf.

Well, we don't know what the DMABUF got imported into, so the non-KVM
importers using the shared mapping certainly have to drop it.

How exactly to make that happen is going to be interesting..

Jason
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Quentin Perret 1 week, 5 days ago
Hi all,

On Wednesday 28 Jan 2026 at 21:16:18 (-0400), Jason Gunthorpe wrote:
> On Wed, Jan 28, 2026 at 05:03:27PM -0800, Sean Christopherson wrote:
> 
> > For a dmabuf fd, the story is the same as guest_memfd.  Unless private vs. shared
> > is all or nothing, and can never change, then the only entity that can track that
> > info is the owner of the dmabuf.  And even if the private vs. shared attributes
> > are constant, tracking it external to KVM makes sense, because then the provider
> > can simply hardcode %true/%false.
> 
> Oh my I had not given that bit any thought. My remarks were just about
> normal non-CC systems.
> 
> So MMIO starts out shared, and then converts to private when the guest
> triggers it. It is not all or nothing, there are permanent shared
> holes in the MMIO ranges too.
> 
> Beyond that I don't know what people are thinking.
> 
> Clearly VFIO has to revoke and disable the DMABUF once any of it
> becomes private. VFIO will somehow have to know when it changes modes
> from the TSM subsystem.
> 
> I guess we could have a special channel for KVM to learn the
> shared/private page by page from VFIO as some kind of "aware of CC"
> importer.

Slightly out of my depth, but I figured I should jump in this discussion
nonetheless; turns out dmabuf vs CoCo is a hot topic for pKVM[*], so
please bear with me :)

It occurred to me that lazily faulting a dmabuf page by page into a
guest isn't particularly useful, because the entire dmabuf is 'paged in'
by construction on the host side (regardless of whether that dmabuf is
backed by memory or MMIO). There is a weird edge case where a memslot
may not cover an entire dmabuf, but perhaps we could simply say 'don't
do that'. Faulting-in the entire dmabuf in one go on the first guest
access would be good for performance, but it doesn't really solve any of
the problems you've listed above.

A not-fully-thought-through-and-possibly-ridiculous idea that crossed
my mind some time ago was to make KVM itself a proper dmabuf
importer. You'd essentially see a guest as a 'device' (probably with an
actual struct dev representing it), and the stage-2 MMU in front of it
as its IOMMU. That could potentially allow KVM to implement dma_map_ops
for that guest 'device' by mapping/unmapping pages into its stage-2 and
such. And in order to get KVM to import a dmabuf, host userspace would
have to pass a dmabuf fd to SET_USER_MEMORY_REGION2, at which point KVM
could check properties about the dmabuf before proceeding with the
import. We could set different expectations about the properties we
want for CoCo vs non-CoCo guests at that level (and yes this could
include having KVM use a special channel with the exporter to check
that).
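
As a point of reference, this is roughly what today's fd-based binding looks like
from userspace with a guest_memfd; a dmabuf-fd flavour of the same uapi, as suggested
above, is hypothetical, and the bind_gmem_slot() helper name is made up for
illustration:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Sketch of KVM_SET_USER_MEMORY_REGION2 usage with a guest_memfd bound to a
 * memslot.  A dmabuf-backed variant would need new uapi.
 */
static int bind_gmem_slot(int vm_fd, int gmem_fd, __u64 gpa, __u64 size)
{
	struct kvm_userspace_memory_region2 region = {
		.slot			= 0,
		.flags			= KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr	= gpa,
		.memory_size		= size,
		.guest_memfd		= gmem_fd,
		.guest_memfd_offset	= 0,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}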

That has the nice benefit of having a clear KVM-level API to transition
an entire dmabuf fd to 'private' in one go in the CoCo case. And in the
non-CoCo case, we avoid the unnecessary lazy faulting of the dmabuf.

It gets really funny when a CoCo guest decides to share back a subset of
that dmabuf with the host, and I'm still wrapping my head around how
we'd make that work, but at this point I'm ready to be told how all the
above already doesn't work and that I should go back to the peanut
gallery :-)

Cheers,
Quentin

[*] https://www.youtube.com/watch?v=zaBxoyRepzA&list=PLW3ep1uCIRfxwmllXTOA2txfDWN6vUOHp&index=35
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Jason Gunthorpe 1 week, 5 days ago
On Thu, Jan 29, 2026 at 11:10:12AM +0000, Quentin Perret wrote:

> A not-fully-thought-through-and-possibly-ridiculous idea that crossed
> my mind some time ago was to make KVM itself a proper dmabuf
> importer. 

AFAIK this is already the plan. Since Intel cannot tolerate having the
private MMIO mapped into a VMA *at all* there is no other choice.

Since Intel has to build it it I figured everyone would want to use it
because it is probably going to be much faster than reading VMAs.

Especially in the modern world of MMIO BARs in the 512GB range.

> You'd essentially see a guest as a 'device' (probably with an
> actual struct dev representing it), and the stage-2 MMU in front of it
> as its IOMMU. That could potentially allow KVM to implement dma_map_ops
> for that guest 'device' by mapping/unmapping pages into its stage-2 and
> such. 

The plan isn't something so wild..

https://github.com/jgunthorpe/linux/commits/dmabuf_map_type/

The "Physical Address List" mapping type will let KVM just get a
normal phys_addr_t list and do its normal stuff with it. No need for
hacky DMA API things.

Probably what will be hard for KVM is that it gets the entire 512GB in
one shot and will have to chop it up to install the whole thing into
the PTE sizes available in the S2. I don't think it even has logic
like that right now??

> It gets really funny when a CoCo guest decides to share back a subset of
> that dmabuf with the host, and I'm still wrapping my head around how
> we'd make that work, but at this point I'm ready to be told how all the
> above already doesn't work and that I should go back to the peanut
> gallery :-)

Oh, I don't actually know how that ends up working but I suppose it
could be meaningfully done :\

Jason
Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Posted by Quentin Perret 1 week, 5 days ago
On Thursday 29 Jan 2026 at 09:42:45 (-0400), Jason Gunthorpe wrote:
> On Thu, Jan 29, 2026 at 11:10:12AM +0000, Quentin Perret wrote:
> 
> > A not-fully-thought-through-and-possibly-ridiculous idea that crossed
> > my mind some time ago was to make KVM itself a proper dmabuf
> > importer. 
> 
> AFAIK this is already the plan. Since Intel cannot tolerate having the
> private MMIO mapped into a VMA *at all* there is no other choice.
> 
> Since Intel has to build it, I figured everyone would want to use it
> because it is probably going to be much faster than reading VMAs.

Ack.

> Especially in the modern world of MMIO BARs in the 512GB range.
> 
> > You'd essentially see a guest as a 'device' (probably with an
> > actual struct dev representing it), and the stage-2 MMU in front of it
> > as its IOMMU. That could potentially allow KVM to implement dma_map_ops
> > for that guest 'device' by mapping/unmapping pages into its stage-2 and
> > such. 
> 
> The plan isn't something so wild..

I'll take that as a compliment ;-)

Not dying on that hill, but it didn't feel _that_ horrible after
thinking about it for a little while. From the host's PoV, a guest is
just another thing that can address memory, which has its own address
space and a page-table that we control in front. If you squint hard
enough it doesn't look _that_ different from a device from that angle.
Oh well.

> https://github.com/jgunthorpe/linux/commits/dmabuf_map_type/
> 
> The "Physical Address List" mapping type will let KVM just get a
> normal phys_addr_t list and do its normal stuff with it. No need for
> hacky DMA API things.

Thanks, I'll read up.

> Probably what will be hard for KVM is that it gets the entire 512GB in
> one shot and will have to chop it up to install the whole thing into
> the PTE sizes available in the S2. I don't think it even has logic
> like that right now??

The closest thing I can think of is the KVM_PRE_FAULT_MEMORY stuff in
the KVM API that forces it to fault in an arbitrary range of guest
IPA space. There should at least be bits of infrastructure that can be
re-used for that I guess.
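
For anyone unfamiliar with it, KVM_PRE_FAULT_MEMORY is a per-vCPU ioctl that takes a
GPA range; a rough userspace sketch (the pre_fault_range() helper name is made up for
illustration):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Sketch of KVM_PRE_FAULT_MEMORY: populate the stage-2 mapping for
 * [gpa, gpa + size) up front.  On partial progress the kernel updates
 * gpa/size to reflect the remaining range.
 */
static int pre_fault_range(int vcpu_fd, __u64 gpa, __u64 size)
{
	struct kvm_pre_fault_memory range = {
		.gpa	= gpa,
		.size	= size,
		.flags	= 0,
	};

	return ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range);
}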

> > It gets really funny when a CoCo guest decides to share back a subset of
> > that dmabuf with the host, and I'm still wrapping my head around how
> > we'd make that work, but at this point I'm ready to be told how all the
> > above already doesn't work and that I should go back to the peanut
> > gallery :-)
> 
> Oh, I don't actually know how that ends up working but I suppose it
> could be meaningfully done :\

For mobile/pKVM we'll want to use dmabufs for more than just passing
MMIO to guests FWIW; they'll likely be used for memory in certain cases
too. There are examples in the KVM Forum talk I linked in the previous
email, but being able to feed guests with dmabuf-backed memory regions
is very helpful. That's useful to e.g. get physically contiguous memory
allocated from a CMA-backed dmabuf heap on systems that don't tolerate
scattered private memory well (either for functional or performance
reasons). I certainly wish we could ignore this type of hardware, but we
don't have that luxury sadly.

In cases like that, we certainly expect that the guest will be sharing
back parts of memory it's been given (at least a swiotlb bounce buffer
so it can do virtio etc), and that may very well be in the middle of a
dmabuf-backed memslot. In fact the guest has no clue what is backing
its memory region, so we can't really expect it _not_ to do that :/