From: Ankit Agrawal <ankita@nvidia.com>
Generalizing S2 setting from DEVICE_nGnRE to NormalNc for non PCI
devices may be problematic. E.g. GICv2 vCPU interface, which is
effectively a shared peripheral, can allow a guest to affect another
guest's interrupt distribution. The issue may be solved by limiting
the relaxation to mappings that have a user VMA. Still there is
insufficient information and uncertainity in the behavior of
non PCI drivers.
Add a new flag VM_ALLOW_ANY_UNCACHED to indicate KVM that the device
is WC capable and these S2 changes can be extended to it. KVM can use
this flag to activate the code.
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
include/linux/mm.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..59576e56c58b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -391,6 +391,20 @@ extern unsigned int kobjsize(const void *objp);
# define VM_UFFD_MINOR VM_NONE
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+/*
+ * This flag is used to connect VFIO to arch specific KVM code. It
+ * indicates that the memory under this VMA is safe for use with any
+ * non-cachable memory type inside KVM. Some VFIO devices, on some
+ * platforms, are thought to be unsafe and can cause machine crashes
+ * if KVM does not lock down the memory type.
+ */
+#ifdef CONFIG_64BIT
+#define VM_ALLOW_ANY_UNCACHED_BIT 39
+#define VM_ALLOW_ANY_UNCACHED BIT(VM_ALLOW_ANY_UNCACHED_BIT)
+#else
+#define VM_ALLOW_ANY_UNCACHED VM_NONE
+#endif
+
/* Bits set in the VMA until the stack is in its final location */
#define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ | VM_STACK_EARLY)
--
2.34.1
On 11.02.24 18:47, ankita@nvidia.com wrote: > From: Ankit Agrawal <ankita@nvidia.com> > > Generalizing S2 setting from DEVICE_nGnRE to NormalNc for non PCI > devices may be problematic. E.g. GICv2 vCPU interface, which is > effectively a shared peripheral, can allow a guest to affect another > guest's interrupt distribution. The issue may be solved by limiting > the relaxation to mappings that have a user VMA. Still there is > insufficient information and uncertainity in the behavior of s/uncertainity/uncertainty/ > non PCI drivers. > > Add a new flag VM_ALLOW_ANY_UNCACHED to indicate KVM that the device > is WC capable and these S2 changes can be extended to it. KVM can use > this flag to activate the code. > MM people will stumble only over this commit at some point, looking for details. It might make sense to add a bit more details on the underlying problem (user space tables vs. stage-1 vs. stage-2) and why we want to have a different mapping in user space compared to stage-1. Then, describe that the VMA flag was found to be the simplest and cleanest way to communicate this information from VFIO to KVM. > Suggested-by: Catalin Marinas <catalin.marinas@arm.com> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Ankit Agrawal <ankita@nvidia.com> > --- > include/linux/mm.h | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index f5a97dec5169..59576e56c58b 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -391,6 +391,20 @@ extern unsigned int kobjsize(const void *objp); > # define VM_UFFD_MINOR VM_NONE > #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ > > +/* > + * This flag is used to connect VFIO to arch specific KVM code. It > + * indicates that the memory under this VMA is safe for use with any > + * non-cachable memory type inside KVM. Some VFIO devices, on some > + * platforms, are thought to be unsafe and can cause machine crashes > + * if KVM does not lock down the memory type. > + */ > +#ifdef CONFIG_64BIT > +#define VM_ALLOW_ANY_UNCACHED_BIT 39 > +#define VM_ALLOW_ANY_UNCACHED BIT(VM_ALLOW_ANY_UNCACHED_BIT) > +#else > +#define VM_ALLOW_ANY_UNCACHED VM_NONE > +#endif > + > /* Bits set in the VMA until the stack is in its final location */ > #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ | VM_STACK_EARLY) > It's not perfect (very VFIO <-> KVM specific right now, VMA flags feel a bit wrong), but it certainly easier and cleaner than any alternatives I could think of. Acked-by: David Hildenbrand <david@redhat.com> -- Cheers, David / dhildenb
© 2016 - 2025 Red Hat, Inc.