[PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map

Roy, Patrick posted 12 patches 1 month ago
There is a newer version of this series
[PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
Posted by Roy, Patrick 1 month ago
Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
ioctl. When set, guest_memfd folios will be removed from the direct map
after preparation, with direct map entries only restored when the folios
are freed.

To ensure these folios do not end up in places where the kernel cannot
deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.

Add KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP to let userspace discover whether
guest_memfd supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP. Support depends on
guest_memfd itself being supported, but also on whether KVM can
manipulate the direct map at page granularity at all (possible most of
the time, just arm64 is a notable outlier where its impossible if the
direct map has been setup using hugepages, as arm64 cannot break these
apart due to break-before-make semantics).

Note that this flag causes removal of direct map entries for all
guest_memfd folios independent of whether they are "shared" or "private"
(although current guest_memfd only supports either all folios in the
"shared" state, or all folios in the "private" state if
GUEST_MEMFD_FLAG_MMAP is not set). The usecase for removing direct map
entries of also the shared parts of guest_memfd are a special type of
non-CoCo VM where, host userspace is trusted to have access to all of
guest memory, but where Spectre-style transient execution attacks
through the host kernel's direct map should still be mitigated.  In this
setup, KVM retains access to guest memory via userspace mappings of
guest_memfd, which are reflected back into KVM's memslots via
userspace_addr. This is needed for things like MMIO emulation on x86_64
to work.

Do not perform TLB flushes after direct map manipulations. This is
because TLB flushes resulted in a up to 40x elongation of page faults in
guest_memfd (scaling with the number of CPU cores), or a 5x elongation
of memory population. TLB flushes are not needed for functional
correctness (the virt->phys mapping technically stays "correct",  the
kernel should simply to not it for a while). On the other hand, it means
that the desired protection from Spectre-style attacks is not perfect,
as an attacker could try to prevent a stale TLB entry from getting
evicted, keeping it alive until the page it refers to is used by the
guest for some sensitive data, and then targeting it using a
spectre-gadget.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
 include/linux/kvm_host.h          |  7 +++++++
 include/uapi/linux/kvm.h          |  2 ++
 virt/kvm/guest_memfd.c            | 29 +++++++++++++++++++++++++----
 virt/kvm/kvm_main.c               |  5 +++++
 5 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2f2394cce24e..0bfd8e5fd9de 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -19,6 +19,7 @@
 #include <linux/maple_tree.h>
 #include <linux/percpu.h>
 #include <linux/psci.h>
+#include <linux/set_memory.h>
 #include <asm/arch_gicv3.h>
 #include <asm/barrier.h>
 #include <asm/cpufeature.h>
@@ -1706,5 +1707,16 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
 void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
 void check_feature_map(void);
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+static inline bool kvm_arch_gmem_supports_no_direct_map(void)
+{
+	/*
+	 * Without FWB, direct map access is needed in kvm_pgtable_stage2_map(),
+	 * as it calls dcache_clean_inval_poc().
+	 */
+	return can_set_direct_map() && cpus_have_final_cap(ARM64_HAS_STAGE2_FWB);
+}
+#define kvm_arch_gmem_supports_no_direct_map kvm_arch_gmem_supports_no_direct_map
+#endif /* CONFIG_KVM_GUEST_MEMFD */
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8b47891adca1..37553848e078 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -36,6 +36,7 @@
 #include <linux/rbtree.h>
 #include <linux/xarray.h>
 #include <asm/signal.h>
+#include <linux/set_memory.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -731,6 +732,12 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 bool kvm_arch_supports_gmem_mmap(struct kvm *kvm);
 #endif
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+#ifndef kvm_arch_gmem_supports_no_direct_map
+#define kvm_arch_gmem_supports_no_direct_map can_set_direct_map
+#endif
+#endif /* CONFIG_KVM_GUEST_MEMFD */
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6efa98a57ec1..33c8e8946019 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -963,6 +963,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
 #define KVM_CAP_GUEST_MEMFD_MMAP 244
+#define KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP 245
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1600,6 +1601,7 @@ struct kvm_memory_attributes {
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
 #define GUEST_MEMFD_FLAG_MMAP	(1ULL << 0)
+#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 1)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 9ec4c45e3cf2..e3696880405c 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#include <linux/set_memory.h>
 
 #include "kvm_mm.h"
 
@@ -42,8 +43,18 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+static bool kvm_gmem_test_no_direct_map(struct inode *inode)
+{
+	return ((unsigned long) inode->i_private) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+}
+
 static inline void kvm_gmem_mark_prepared(struct folio *folio)
 {
+	struct inode *inode = folio_inode(folio);
+
+	if (kvm_gmem_test_no_direct_map(inode))
+		set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);
+
 	folio_mark_uptodate(folio);
 }
 
@@ -429,25 +440,29 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 	return MF_DELAYED;
 }
 
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 static void kvm_gmem_free_folio(struct address_space *mapping,
 				struct folio *folio)
 {
 	struct page *page = folio_page(folio, 0);
+
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	kvm_pfn_t pfn = page_to_pfn(page);
 	int order = folio_order(folio);
+#endif
 
+	if (kvm_gmem_test_no_direct_map(mapping->host))
+		WARN_ON_ONCE(set_direct_map_valid_noflush(page, folio_nr_pages(folio), true));
+
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
-}
 #endif
+}
 
 static const struct address_space_operations kvm_gmem_aops = {
 	.dirty_folio = noop_dirty_folio,
 	.migrate_folio	= kvm_gmem_migrate_folio,
 	.error_remove_folio = kvm_gmem_error_folio,
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	.free_folio = kvm_gmem_free_folio,
-#endif
 };
 
 static int kvm_gmem_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
@@ -504,6 +519,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
+	if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
+		mapping_set_no_direct_map(inode->i_mapping);
+
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
@@ -528,6 +546,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	if (kvm_arch_supports_gmem_mmap(kvm))
 		valid_flags |= GUEST_MEMFD_FLAG_MMAP;
 
+	if (kvm_arch_gmem_supports_no_direct_map())
+		valid_flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 18f29ef93543..0dbfd17e1191 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -65,6 +65,7 @@
 #include <trace/events/kvm.h>
 
 #include <linux/kvm_dirty_ring.h>
+#include <linux/set_memory.h>
 
 
 /* Worst case buffer size needed for holding an integer. */
@@ -4916,6 +4917,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return kvm_supported_mem_attributes(kvm);
 #endif
 #ifdef CONFIG_KVM_GUEST_MEMFD
+	case KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP:
+		if (!can_set_direct_map())
+			return false;
+		fallthrough;
 	case KVM_CAP_GUEST_MEMFD:
 		return 1;
 	case KVM_CAP_GUEST_MEMFD_MMAP:
-- 
2.50.1

Re: [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
Posted by Mike Rapoport 1 month ago
On Thu, Aug 28, 2025 at 09:39:21AM +0000, Roy, Patrick wrote:
> Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
> ioctl. When set, guest_memfd folios will be removed from the direct map
> after preparation, with direct map entries only restored when the folios
> are freed.
> 
> To ensure these folios do not end up in places where the kernel cannot
> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
> address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
> 
> Add KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP to let userspace discover whether
> guest_memfd supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP. Support depends on
> guest_memfd itself being supported, but also on whether KVM can
> manipulate the direct map at page granularity at all (possible most of
> the time, just arm64 is a notable outlier where its impossible if the
> direct map has been setup using hugepages, as arm64 cannot break these
> apart due to break-before-make semantics).

There's also powerpc that does not select ARCH_HAS_SET_DIRECT_MAP
 
> Note that this flag causes removal of direct map entries for all
> guest_memfd folios independent of whether they are "shared" or "private"
> (although current guest_memfd only supports either all folios in the
> "shared" state, or all folios in the "private" state if
> GUEST_MEMFD_FLAG_MMAP is not set). The usecase for removing direct map
> entries of also the shared parts of guest_memfd are a special type of
> non-CoCo VM where, host userspace is trusted to have access to all of
> guest memory, but where Spectre-style transient execution attacks
> through the host kernel's direct map should still be mitigated.  In this
> setup, KVM retains access to guest memory via userspace mappings of
> guest_memfd, which are reflected back into KVM's memslots via
> userspace_addr. This is needed for things like MMIO emulation on x86_64
> to work.
> 
> Do not perform TLB flushes after direct map manipulations. This is
> because TLB flushes resulted in a up to 40x elongation of page faults in
> guest_memfd (scaling with the number of CPU cores), or a 5x elongation
> of memory population. TLB flushes are not needed for functional
> correctness (the virt->phys mapping technically stays "correct",  the
> kernel should simply to not it for a while). On the other hand, it means

                          ^ not use it?

> that the desired protection from Spectre-style attacks is not perfect,
> as an attacker could try to prevent a stale TLB entry from getting
> evicted, keeping it alive until the page it refers to is used by the
> guest for some sensitive data, and then targeting it using a
> spectre-gadget.
> 
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>  arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
>  include/linux/kvm_host.h          |  7 +++++++
>  include/uapi/linux/kvm.h          |  2 ++
>  virt/kvm/guest_memfd.c            | 29 +++++++++++++++++++++++++----
>  virt/kvm/kvm_main.c               |  5 +++++
>  5 files changed, 51 insertions(+), 4 deletions(-)

...
 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 9ec4c45e3cf2..e3696880405c 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
>  #include <linux/kvm_host.h>
>  #include <linux/pagemap.h>
>  #include <linux/anon_inodes.h>
> +#include <linux/set_memory.h>
>  
>  #include "kvm_mm.h"
>  
> @@ -42,8 +43,18 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
>  	return 0;
>  }
>  
> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
> +{
> +	return ((unsigned long) inode->i_private) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
> +}
> +
>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>  {
> +	struct inode *inode = folio_inode(folio);
> +
> +	if (kvm_gmem_test_no_direct_map(inode))
> +		set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);

This may fail to split large mapping in the direct map. Why not move this
to kvm_gmem_prepare_folio() where you can handle returned error?

I think that using set_direct_map_invalid_noflush() here and
set_direct_map_default_noflush() in kvm_gmem_free_folio() better is
clearer and makes it more obvious that here the folio is removed from the
direct map and when freed it's direct mapping is restored.

This requires to export two symbols in patch 2, but I think it's worth it.

>  	folio_mark_uptodate(folio);
>  }
>  
> @@ -429,25 +440,29 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>  	return MF_DELAYED;
>  }
>  
> -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  static void kvm_gmem_free_folio(struct address_space *mapping,
>  				struct folio *folio)
>  {
>  	struct page *page = folio_page(folio, 0);
> +
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	kvm_pfn_t pfn = page_to_pfn(page);
>  	int order = folio_order(folio);
> +#endif
>  
> +	if (kvm_gmem_test_no_direct_map(mapping->host))
> +		WARN_ON_ONCE(set_direct_map_valid_noflush(page, folio_nr_pages(folio), true));

I don't think it can fail here. The direct map was split when you removed
the folio so here it will merely update the prot bits.

> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
> -}
>  #endif
> +}

Instead of moving #ifdefs into kvm_gmem_free_folio() it's better to add, say,
kvm_gmem_invalidate() and move ifdefery there or even better have a static
inline stub for !CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE case.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 18f29ef93543..0dbfd17e1191 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -65,6 +65,7 @@
>  #include <trace/events/kvm.h>
>  
>  #include <linux/kvm_dirty_ring.h>
> +#include <linux/set_memory.h>
>  
>  
>  /* Worst case buffer size needed for holding an integer. */
> @@ -4916,6 +4917,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  		return kvm_supported_mem_attributes(kvm);
>  #endif
>  #ifdef CONFIG_KVM_GUEST_MEMFD
> +	case KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP:
> +		if (!can_set_direct_map())

Shouldn't this check with kvm_arch_gmem_supports_no_direct_map()?

> +			return false;
> +		fallthrough;
>  	case KVM_CAP_GUEST_MEMFD:
>  		return 1;
>  	case KVM_CAP_GUEST_MEMFD_MMAP:
> -- 
> 2.50.1
> 

-- 
Sincerely yours,
Mike.
Re: [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
Posted by Roy, Patrick 1 month ago
On Thu, 2025-08-28 at 15:54 +0100, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 09:39:21AM +0000, Roy, Patrick wrote:
>> Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
>> ioctl. When set, guest_memfd folios will be removed from the direct map
>> after preparation, with direct map entries only restored when the folios
>> are freed.
>>
>> To ensure these folios do not end up in places where the kernel cannot
>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>> address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
>>
>> Add KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP to let userspace discover whether
>> guest_memfd supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP. Support depends on
>> guest_memfd itself being supported, but also on whether KVM can
>> manipulate the direct map at page granularity at all (possible most of
>> the time, just arm64 is a notable outlier where its impossible if the
>> direct map has been setup using hugepages, as arm64 cannot break these
>> apart due to break-before-make semantics).
> 
> There's also powerpc that does not select ARCH_HAS_SET_DIRECT_MAP

Ah, thanks! Although powerpc also doesnt support guest_memfd in the first
place, but will mention.

>> Note that this flag causes removal of direct map entries for all
>> guest_memfd folios independent of whether they are "shared" or "private"
>> (although current guest_memfd only supports either all folios in the
>> "shared" state, or all folios in the "private" state if
>> GUEST_MEMFD_FLAG_MMAP is not set). The usecase for removing direct map
>> entries of also the shared parts of guest_memfd are a special type of
>> non-CoCo VM where, host userspace is trusted to have access to all of
>> guest memory, but where Spectre-style transient execution attacks
>> through the host kernel's direct map should still be mitigated.  In this
>> setup, KVM retains access to guest memory via userspace mappings of
>> guest_memfd, which are reflected back into KVM's memslots via
>> userspace_addr. This is needed for things like MMIO emulation on x86_64
>> to work.
>>
>> Do not perform TLB flushes after direct map manipulations. This is
>> because TLB flushes resulted in a up to 40x elongation of page faults in
>> guest_memfd (scaling with the number of CPU cores), or a 5x elongation
>> of memory population. TLB flushes are not needed for functional
>> correctness (the virt->phys mapping technically stays "correct",  the
>> kernel should simply to not it for a while). On the other hand, it means
> 
>                           ^ not use it?

Yup, thanks!

>> that the desired protection from Spectre-style attacks is not perfect,
>> as an attacker could try to prevent a stale TLB entry from getting
>> evicted, keeping it alive until the page it refers to is used by the
>> guest for some sensitive data, and then targeting it using a
>> spectre-gadget.
>>
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>> ---
>>  arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
>>  include/linux/kvm_host.h          |  7 +++++++
>>  include/uapi/linux/kvm.h          |  2 ++
>>  virt/kvm/guest_memfd.c            | 29 +++++++++++++++++++++++++----
>>  virt/kvm/kvm_main.c               |  5 +++++
>>  5 files changed, 51 insertions(+), 4 deletions(-)
> 
> ...
> 
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 9ec4c45e3cf2..e3696880405c 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -4,6 +4,7 @@
>>  #include <linux/kvm_host.h>
>>  #include <linux/pagemap.h>
>>  #include <linux/anon_inodes.h>
>> +#include <linux/set_memory.h>
>>
>>  #include "kvm_mm.h"
>>
>> @@ -42,8 +43,18 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
>>       return 0;
>>  }
>>
>> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
>> +{
>> +     return ((unsigned long) inode->i_private) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
>> +}
>> +
>>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>  {
>> +     struct inode *inode = folio_inode(folio);
>> +
>> +     if (kvm_gmem_test_no_direct_map(inode))
>> +             set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);
> 
> This may fail to split large mapping in the direct map. Why not move this
> to kvm_gmem_prepare_folio() where you can handle returned error?

Argh, yeah, got that the wrong way around. Will update the error handling.

> I think that using set_direct_map_invalid_noflush() here and
> set_direct_map_default_noflush() in kvm_gmem_free_folio() better is
> clearer and makes it more obvious that here the folio is removed from the
> direct map and when freed it's direct mapping is restored.
> 
> This requires to export two symbols in patch 2, but I think it's worth it.

Mh, but set_direct_map_[default|invalid]_noflush() only take a single struct
page * argument, so they'd either need to gain a npages argument, or we add yet
more functions to set_memory.h.  Do you still think that's worth it? 

>>       folio_mark_uptodate(folio);
>>  }
>>
>> @@ -429,25 +440,29 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>>       return MF_DELAYED;
>>  }
>>
>> -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>>  static void kvm_gmem_free_folio(struct address_space *mapping,
>>                               struct folio *folio)
>>  {
>>       struct page *page = folio_page(folio, 0);
>> +
>> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>>       kvm_pfn_t pfn = page_to_pfn(page);
>>       int order = folio_order(folio);
>> +#endif
>>
>> +     if (kvm_gmem_test_no_direct_map(mapping->host))
>> +             WARN_ON_ONCE(set_direct_map_valid_noflush(page, folio_nr_pages(folio), true));
> 
> I don't think it can fail here. The direct map was split when you removed
> the folio so here it will merely update the prot bits.

Yup, will drop this WARN_ON_ONCE.

>> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>>       kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
>> -}
>>  #endif
>> +}
> 
> Instead of moving #ifdefs into kvm_gmem_free_folio() it's better to add, say,
> kvm_gmem_invalidate() and move ifdefery there or even better have a static
> inline stub for !CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE case.

Ack, will do the latter

>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 18f29ef93543..0dbfd17e1191 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -65,6 +65,7 @@
>>  #include <trace/events/kvm.h>
>>
>>  #include <linux/kvm_dirty_ring.h>
>> +#include <linux/set_memory.h>
>>
>>
>>  /* Worst case buffer size needed for holding an integer. */
>> @@ -4916,6 +4917,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>>               return kvm_supported_mem_attributes(kvm);
>>  #endif
>>  #ifdef CONFIG_KVM_GUEST_MEMFD
>> +     case KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP:
>> +             if (!can_set_direct_map())
> 
> Shouldn't this check with kvm_arch_gmem_supports_no_direct_map()?

Absolutely, thanks for catching!

>> +                     return false;
>> +             fallthrough;
>>       case KVM_CAP_GUEST_MEMFD:
>>               return 1;
>>       case KVM_CAP_GUEST_MEMFD_MMAP:
>> --
>> 2.50.1
>>
> 
> --
> Sincerely yours,
> Mike.

Best,
Patrick
Re: [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
Posted by Mike Rapoport 1 month ago
On Mon, Sep 01, 2025 at 02:22:22PM +0000, Roy, Patrick wrote:
> On Thu, 2025-08-28 at 15:54 +0100, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 09:39:21AM +0000, Roy, Patrick wrote:
> >
> >>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
> >>  {
> >> +     struct inode *inode = folio_inode(folio);
> >> +
> >> +     if (kvm_gmem_test_no_direct_map(inode))
> >> +             set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);
> > 
> > This may fail to split large mapping in the direct map. Why not move this
> > to kvm_gmem_prepare_folio() where you can handle returned error?
> 
> Argh, yeah, got that the wrong way around. Will update the error handling.
> 
> > I think that using set_direct_map_invalid_noflush() here and
> > set_direct_map_default_noflush() in kvm_gmem_free_folio() better is
> > clearer and makes it more obvious that here the folio is removed from the
> > direct map and when freed it's direct mapping is restored.
> > 
> > This requires to export two symbols in patch 2, but I think it's worth it.
> 
> Mh, but set_direct_map_[default|invalid]_noflush() only take a single struct
> page * argument, so they'd either need to gain a npages argument, or we add yet
> more functions to set_memory.h.  Do you still think that's worth it? 

Ah, right, misremembered that. Let's keep set_direct_map_valid_noflush().
 
-- 
Sincerely yours,
Mike.