[PATCH RFC 3/3] x86/mm: Simplify/correct l1_disallow_mask()

Andrew Cooper posted 3 patches 2 months ago
[PATCH RFC 3/3] x86/mm: Simplify/correct l1_disallow_mask()
Posted by Andrew Cooper 2 months ago
l1_disallow_mask() yields L1_DISALLOW_MASK with PAGE_CACHE_ATTRS conditionally
permitted.  First, rewrite it as a plain function.

Next, correct some dubious behaviours.

The use of is_pv_domain() is tautological; l1_disallow_mask() is only used in
the PV pagetable code, and it return true for system domains as well.

The use of has_arch_pdevs() is wonky; by making the use of cache attributes
dependent on the instantanious property of whether any devices is assigned,
means that a guest could have created a legal PTE which will fail validation
at a later point when the device has been removed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

RFC.  I've not tested this, and I doubt it will work to start with owing to
the removal of the dom_io special case which IIRC dom0 uses to map arbitrary
MMIO.

Furthermore, the rangeset_is_empty() calls have the same problem that
has_arch_pdevs() has; they're not invariants on the domain.  Also, VMs
wanting/needing encrypted memory need fully working cacheability irrespective
of device assignment.

I expect the way we actually want to fix this is to have a CDF flag for
allowing reduced cahcebility, and for this expression to simplify to just:

    if ( d->options & XEN_DOMCTL_CDF_any_cacheability )
        disallow &= ~PAGE_CACHE_ATTRS;

which is simpler still.

For the current form, bloat-o-meter reports:

  add/remove: 1/0 grow/shrink: 1/2 up/down: 75/-280 (-205)
  Function                                     old     new   delta
  l1_disallow_mask                               -      74     +74
  mod_l1_entry.cold                             55      56      +1
  get_page_from_l1e                           1271    1167    -104
  mod_l1_entry                                1860    1684    -176

which is an even bigger improvement than simply not duplicating the logic.
---
 xen/arch/x86/mm.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 95795567f2a5..31937319c057 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -162,13 +162,17 @@ static uint32_t __ro_after_init base_disallow_mask;
 
 #define L4_DISALLOW_MASK (base_disallow_mask)
 
-#define l1_disallow_mask(d)                                     \
-    (((d) != dom_io) &&                                         \
-     (rangeset_is_empty((d)->iomem_caps) &&                     \
-      rangeset_is_empty((d)->arch.ioport_caps) &&               \
-      !has_arch_pdevs(d) &&                                     \
-      is_pv_domain(d)) ?                                        \
-     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+static uint32_t l1_disallow_mask(const struct domain *d)
+{
+    uint32_t disallow = L1_DISALLOW_MASK;
+
+    if ( (d->options & XEN_DOMCTL_CDF_iommu) ||
+         !rangeset_is_empty(d->iomem_caps) ||
+         !rangeset_is_empty(d->arch.ioport_caps) )
+        disallow &= ~PAGE_CACHE_ATTRS;
+
+    return disallow;
+}
 
 static s8 __read_mostly opt_mmio_relax;
 
-- 
2.39.2


Re: [PATCH RFC 3/3] x86/mm: Simplify/correct l1_disallow_mask()
Posted by Jan Beulich 2 months ago
On 17.07.2024 18:14, Andrew Cooper wrote:
> l1_disallow_mask() yields L1_DISALLOW_MASK with PAGE_CACHE_ATTRS conditionally
> permitted.  First, rewrite it as a plain function.
> 
> Next, correct some dubious behaviours.
> 
> The use of is_pv_domain() is tautological; l1_disallow_mask() is only used in
> the PV pagetable code, and it return true for system domains as well.

Well, only quite. What you say is entirely true for mod_l1_entry(), but sadly
not for get_page_from_l1e(): That's also used by shadow_get_page_from_l1e(),
i.e. theoretically from a PVH Dom0 running in shadow mode. I don't think I
can spot anywhere we'd reject that combination.

> The use of has_arch_pdevs() is wonky; by making the use of cache attributes
> dependent on the instantanious property of whether any devices is assigned,
> means that a guest could have created a legal PTE which will fail validation
> at a later point when the device has been removed.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> RFC.  I've not tested this, and I doubt it will work to start with owing to
> the removal of the dom_io special case which IIRC dom0 uses to map arbitrary
> MMIO.

I think that check has been dead for a long time. DomIO can't be the page
table owner; it can only be the owner of a page to be mapped. The check
likely was needed only before the proper splitting of which domain involved
plays which role. Dropping this may then want to be done as a separate
change, but it could of course remain here as long as properly explained in
the description. Maybe for the time being we want to retain an assertion to
this effect.

> Furthermore, the rangeset_is_empty() calls have the same problem that
> has_arch_pdevs() has; they're not invariants on the domain.  Also, VMs
> wanting/needing encrypted memory need fully working cacheability irrespective
> of device assignment.
> 
> I expect the way we actually want to fix this is to have a CDF flag for
> allowing reduced cahcebility, and for this expression to simplify to just:
> 
>     if ( d->options & XEN_DOMCTL_CDF_any_cacheability )
>         disallow &= ~PAGE_CACHE_ATTRS;
> 
> which is simpler still.

Perhaps. Or refuse altering the two rangesets post-creation for domains
without IOMMUs ("post-creation" here including the establishing of any
mapping on behalf of the domain).

Jan