The change addressing the shadow related build issue noticed by Andrew
went in already. The build breakage goes beyond this specific
combination though - PV_SHIM_EXCLUSIVE plus HVM is similarly an issue.
This is what the 1st patch tries to take care of, in a shape already
noticed on irc to be controversial. I'm submitting the change
nevertheless because for the moment there looks to be a majority in
favor of going this route. One argument not voiced there yet: what good
does it do to allow a user to enable HVM when, on the resulting
hypervisor, they still can't run HVM guests (the hypervisor still being
a dedicated PV shim one)? On top of this, the alternative approach is
likely going to get ugly.

The shadow related adjustments are here merely because the desire to
make them was noticed in the context of the patch which has already
gone in.

1: don't permit HVM and PV_SHIM_EXCLUSIVE at the same time
2: refactor shadow_vram_{get,put}_l1e()
3: sh_{make,destroy}_monitor_table() are "even more" HVM-only

Jan
This combination doesn't really make sense (and there likely are more);
in particular, even if the code built with both options set, HVM guests
wouldn't work (and I think one wouldn't be able to create one in the
first place). The alternative here would be some presumably intrusive
#ifdef-ary to get this combination to actually build (but still not
work) again.

Fixes: 8b5b49ceb3d9 ("x86: don't include domctl and alike in shim-exclusive builds")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
v2: Restore lost default setting.

--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -23,7 +23,7 @@ config X86
 	select HAS_PDX
 	select HAS_SCHED_GRANULARITY
 	select HAS_UBSAN
-	select HAS_VPCI if !PV_SHIM_EXCLUSIVE && HVM
+	select HAS_VPCI if HVM
 	select NEEDS_LIBELF
 	select NUMA
@@ -90,8 +90,9 @@ config PV_LINEAR_PT
 	  If unsure, say Y.
 
 config HVM
-	def_bool !PV_SHIM_EXCLUSIVE
-	prompt "HVM support"
+	bool "HVM support"
+	depends on !PV_SHIM_EXCLUSIVE
+	default y
 	---help---
 	  Interfaces to support HVM domains.  HVM domains require hardware
 	  virtualisation extensions (e.g. Intel VT-x, AMD SVM), but can boot
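As a minimal sketch of the breakage class the commit message describes
(the function names below are made up for illustration; this is not the
actual Xen code): with both options enabled, HVM-only translation units
can still reference helpers whose definitions shim-exclusive builds
compile out, so the build fails at link time unless #ifdef-ary spreads
into the HVM side.

/* domctl.c - definitions compiled out in shim-exclusive builds */
#ifndef CONFIG_PV_SHIM_EXCLUSIVE
int do_domctl_helper(void)   /* hypothetical name */
{
    return 0;
}
#endif

/* hvm.c - built whenever CONFIG_HVM=y */
#ifdef CONFIG_HVM
int hvm_create_helper(void)  /* hypothetical name */
{
    /*
     * With CONFIG_HVM=y and CONFIG_PV_SHIM_EXCLUSIVE=y this call has
     * no definition to link against - hence resolving the conflict at
     * the Kconfig level ("depends on !PV_SHIM_EXCLUSIVE") rather than
     * with per-call-site #ifdef-ary.
     */
    return do_domctl_helper();
}
#endif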
By passing the functions an MFN and flags, only a single instance of
each is needed; they were pretty large for being inline functions
anyway.

While moving the code, also adjust coding style and add const where
sensible / possible.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -903,6 +903,104 @@ int shadow_track_dirty_vram(struct domai
     return rc;
 }
 
+void shadow_vram_get_mfn(mfn_t mfn, unsigned int l1f,
+                         mfn_t sl1mfn, const void *sl1e,
+                         const struct domain *d)
+{
+    unsigned long gfn;
+    struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
+
+    ASSERT(is_hvm_domain(d));
+
+    if ( !dirty_vram /* tracking disabled? */ ||
+         !(l1f & _PAGE_RW) /* read-only mapping? */ ||
+         !mfn_valid(mfn) /* mfn can be invalid in mmio_direct */ )
+        return;
+
+    gfn = gfn_x(mfn_to_gfn(d, mfn));
+    /* Page sharing not supported on shadow PTs */
+    BUG_ON(SHARED_M2P(gfn));
+
+    if ( (gfn >= dirty_vram->begin_pfn) && (gfn < dirty_vram->end_pfn) )
+    {
+        unsigned long i = gfn - dirty_vram->begin_pfn;
+        const struct page_info *page = mfn_to_page(mfn);
+
+        if ( (page->u.inuse.type_info & PGT_count_mask) == 1 )
+            /* Initial guest reference, record it */
+            dirty_vram->sl1ma[i] = mfn_to_maddr(sl1mfn) |
+                                   PAGE_OFFSET(sl1e);
+    }
+}
+
+void shadow_vram_put_mfn(mfn_t mfn, unsigned int l1f,
+                         mfn_t sl1mfn, const void *sl1e,
+                         const struct domain *d)
+{
+    unsigned long gfn;
+    struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
+
+    ASSERT(is_hvm_domain(d));
+
+    if ( !dirty_vram /* tracking disabled? */ ||
+         !(l1f & _PAGE_RW) /* read-only mapping? */ ||
+         !mfn_valid(mfn) /* mfn can be invalid in mmio_direct */ )
+        return;
+
+    gfn = gfn_x(mfn_to_gfn(d, mfn));
+    /* Page sharing not supported on shadow PTs */
+    BUG_ON(SHARED_M2P(gfn));
+
+    if ( (gfn >= dirty_vram->begin_pfn) && (gfn < dirty_vram->end_pfn) )
+    {
+        unsigned long i = gfn - dirty_vram->begin_pfn;
+        const struct page_info *page = mfn_to_page(mfn);
+        bool dirty = false;
+        paddr_t sl1ma = mfn_to_maddr(sl1mfn) | PAGE_OFFSET(sl1e);
+
+        if ( (page->u.inuse.type_info & PGT_count_mask) == 1 )
+        {
+            /* Last reference */
+            if ( dirty_vram->sl1ma[i] == INVALID_PADDR )
+            {
+                /* We didn't know it was that one, let's say it is dirty */
+                dirty = true;
+            }
+            else
+            {
+                ASSERT(dirty_vram->sl1ma[i] == sl1ma);
+                dirty_vram->sl1ma[i] = INVALID_PADDR;
+                if ( l1f & _PAGE_DIRTY )
+                    dirty = true;
+            }
+        }
+        else
+        {
+            /* We had more than one reference, just consider the page dirty. */
+            dirty = true;
+            /* Check that it's not the one we recorded. */
+            if ( dirty_vram->sl1ma[i] == sl1ma )
+            {
+                /* Too bad, we remembered the wrong one... */
+                dirty_vram->sl1ma[i] = INVALID_PADDR;
+            }
+            else
+            {
+                /*
+                 * Ok, our recorded sl1e is still pointing to this page, let's
+                 * just hope it will remain.
+                 */
+            }
+        }
+
+        if ( dirty )
+        {
+            dirty_vram->dirty_bitmap[i / 8] |= 1 << (i % 8);
+            dirty_vram->last_dirty = NOW();
+        }
+    }
+}
+
 /*
  * Local variables:
  * mode: C
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1047,107 +1047,6 @@ static int shadow_set_l2e(struct domain
     return flags;
 }
 
-static inline void shadow_vram_get_l1e(shadow_l1e_t new_sl1e,
-                                       shadow_l1e_t *sl1e,
-                                       mfn_t sl1mfn,
-                                       struct domain *d)
-{
-#ifdef CONFIG_HVM
-    mfn_t mfn = shadow_l1e_get_mfn(new_sl1e);
-    int flags = shadow_l1e_get_flags(new_sl1e);
-    unsigned long gfn;
-    struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
-
-    if ( !is_hvm_domain(d) || !dirty_vram /* tracking disabled? */
-         || !(flags & _PAGE_RW) /* read-only mapping? */
-         || !mfn_valid(mfn) ) /* mfn can be invalid in mmio_direct */
-        return;
-
-    gfn = gfn_x(mfn_to_gfn(d, mfn));
-    /* Page sharing not supported on shadow PTs */
-    BUG_ON(SHARED_M2P(gfn));
-
-    if ( (gfn >= dirty_vram->begin_pfn) && (gfn < dirty_vram->end_pfn) )
-    {
-        unsigned long i = gfn - dirty_vram->begin_pfn;
-        struct page_info *page = mfn_to_page(mfn);
-
-        if ( (page->u.inuse.type_info & PGT_count_mask) == 1 )
-            /* Initial guest reference, record it */
-            dirty_vram->sl1ma[i] = mfn_to_maddr(sl1mfn)
-                | ((unsigned long)sl1e & ~PAGE_MASK);
-    }
-#endif
-}
-
-static inline void shadow_vram_put_l1e(shadow_l1e_t old_sl1e,
-                                       shadow_l1e_t *sl1e,
-                                       mfn_t sl1mfn,
-                                       struct domain *d)
-{
-#ifdef CONFIG_HVM
-    mfn_t mfn = shadow_l1e_get_mfn(old_sl1e);
-    int flags = shadow_l1e_get_flags(old_sl1e);
-    unsigned long gfn;
-    struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
-
-    if ( !is_hvm_domain(d) || !dirty_vram /* tracking disabled? */
-         || !(flags & _PAGE_RW) /* read-only mapping? */
-         || !mfn_valid(mfn) ) /* mfn can be invalid in mmio_direct */
-        return;
-
-    gfn = gfn_x(mfn_to_gfn(d, mfn));
-    /* Page sharing not supported on shadow PTs */
-    BUG_ON(SHARED_M2P(gfn));
-
-    if ( (gfn >= dirty_vram->begin_pfn) && (gfn < dirty_vram->end_pfn) )
-    {
-        unsigned long i = gfn - dirty_vram->begin_pfn;
-        struct page_info *page = mfn_to_page(mfn);
-        int dirty = 0;
-        paddr_t sl1ma = mfn_to_maddr(sl1mfn)
-            | ((unsigned long)sl1e & ~PAGE_MASK);
-
-        if ( (page->u.inuse.type_info & PGT_count_mask) == 1 )
-        {
-            /* Last reference */
-            if ( dirty_vram->sl1ma[i] == INVALID_PADDR ) {
-                /* We didn't know it was that one, let's say it is dirty */
-                dirty = 1;
-            }
-            else
-            {
-                ASSERT(dirty_vram->sl1ma[i] == sl1ma);
-                dirty_vram->sl1ma[i] = INVALID_PADDR;
-                if ( flags & _PAGE_DIRTY )
-                    dirty = 1;
-            }
-        }
-        else
-        {
-            /* We had more than one reference, just consider the page dirty. */
-            dirty = 1;
-            /* Check that it's not the one we recorded. */
-            if ( dirty_vram->sl1ma[i] == sl1ma )
-            {
-                /* Too bad, we remembered the wrong one... */
-                dirty_vram->sl1ma[i] = INVALID_PADDR;
-            }
-            else
-            {
-                /* Ok, our recorded sl1e is still pointing to this page, let's
-                 * just hope it will remain. */
-            }
-        }
-        if ( dirty )
-        {
-            dirty_vram->dirty_bitmap[i / 8] |= 1 << (i % 8);
-            dirty_vram->last_dirty = NOW();
-        }
-    }
-#endif
-}
-
 static int shadow_set_l1e(struct domain *d,
                           shadow_l1e_t *sl1e,
                           shadow_l1e_t new_sl1e,
@@ -1156,6 +1055,7 @@ static int shadow_set_l1e(struct domain
 {
     int flags = 0;
     shadow_l1e_t old_sl1e;
+    unsigned int old_sl1f;
 #if SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC
     mfn_t new_gmfn = shadow_l1e_get_mfn(new_sl1e);
 #endif
@@ -1194,7 +1094,9 @@ static int shadow_set_l1e(struct domain
             new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
             /* fall through */
         case 0:
-            shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
+            shadow_vram_get_mfn(shadow_l1e_get_mfn(new_sl1e),
+                                shadow_l1e_get_flags(new_sl1e),
+                                sl1mfn, sl1e, d);
             break;
         }
 #undef PAGE_FLIPPABLE
@@ -1205,20 +1107,19 @@ static int shadow_set_l1e(struct domain
     shadow_write_entries(sl1e, &new_sl1e, 1, sl1mfn);
     flags |= SHADOW_SET_CHANGED;
 
-    if ( (shadow_l1e_get_flags(old_sl1e) & _PAGE_PRESENT)
-         && !sh_l1e_is_magic(old_sl1e) )
+    old_sl1f = shadow_l1e_get_flags(old_sl1e);
+    if ( (old_sl1f & _PAGE_PRESENT) && !sh_l1e_is_magic(old_sl1e) &&
+         shadow_mode_refcounts(d) )
     {
        /* We lost a reference to an old mfn. */
        /* N.B. Unlike higher-level sets, never need an extra flush
         * when writing an l1e.  Because it points to the same guest frame
         * as the guest l1e did, it's the guest's responsibility to
         * trigger a flush later. */
-        if ( shadow_mode_refcounts(d) )
-        {
-            shadow_vram_put_l1e(old_sl1e, sl1e, sl1mfn, d);
-            shadow_put_page_from_l1e(old_sl1e, d);
-            TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_PUT_REF);
-        }
+        shadow_vram_put_mfn(shadow_l1e_get_mfn(old_sl1e), old_sl1f,
+                            sl1mfn, sl1e, d);
+        shadow_put_page_from_l1e(old_sl1e, d);
+        TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_PUT_REF);
     }
     return flags;
 }
@@ -1944,9 +1845,12 @@ void sh_destroy_l1_shadow(struct domain
         /* Decrement refcounts of all the old entries */
         mfn_t sl1mfn = smfn;
         SHADOW_FOREACH_L1E(sl1mfn, sl1e, 0, 0, {
-            if ( (shadow_l1e_get_flags(*sl1e) & _PAGE_PRESENT)
-                 && !sh_l1e_is_magic(*sl1e) ) {
-                shadow_vram_put_l1e(*sl1e, sl1e, sl1mfn, d);
+            unsigned int sl1f = shadow_l1e_get_flags(*sl1e);
+
+            if ( (sl1f & _PAGE_PRESENT) && !sh_l1e_is_magic(*sl1e) )
+            {
+                shadow_vram_put_mfn(shadow_l1e_get_mfn(*sl1e), sl1f,
+                                    sl1mfn, sl1e, d);
                 shadow_put_page_from_l1e(*sl1e, d);
             }
         });
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -410,6 +410,14 @@ void shadow_update_paging_modes(struct v
  * With user_only == 1, unhooks only the user-mode mappings. */
 void shadow_unhook_mappings(struct domain *d, mfn_t smfn, int user_only);
 
+/* VRAM dirty tracking helpers. */
+void shadow_vram_get_mfn(mfn_t mfn, unsigned int l1f,
+                         mfn_t sl1mfn, const void *sl1e,
+                         const struct domain *d);
+void shadow_vram_put_mfn(mfn_t mfn, unsigned int l1f,
+                         mfn_t sl1mfn, const void *sl1e,
+                         const struct domain *d);
+
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
 /* Allow a shadowed page to go out of sync */
 int sh_unsync(struct vcpu *v, mfn_t gmfn);
On Mon, Oct 19, 2020 at 10:44:31AM +0200, Jan Beulich wrote:
> By passing the functions an MFN and flags, only a single instance of
> each is needed; they were pretty large for being inline functions
> anyway.
>
> While moving the code, also adjust coding style and add const where
> sensible / possible.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
At 10:44 +0200 on 19 Oct (1603104271), Jan Beulich wrote:
> By passing the functions an MFN and flags, only a single instance of
> each is needed; they were pretty large for being inline functions
> anyway.
>
> While moving the code, also adjust coding style and add const where
> sensible / possible.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Tim Deegan <tim@xen.org>
With them depending on just the number of shadow levels, there's no need
for more than one instance of them, and hence no need for any hook (IOW
452219e24648 ["x86/shadow: monitor table is HVM-only"] didn't go quite
far enough). Move the functions to hvm.c while dropping the dead
is_pv_32bit_domain() code paths.

While moving the code, replace a stale comment reference to
sh_install_xen_entries_in_l4(). Doing so made me notice the function
also didn't have its prototype dropped in 8d7b633adab7 ("x86/mm:
Consolidate all Xen L4 slot writing into init_xen_l4_slots()"), which
gets done here as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.
---
TBD: In principle both functions could have their first parameter
     constified. In fact, "destroy" doesn't depend on the vCPU at all
     and hence could be passed a struct domain *. Not sure whether such
     an asymmetry would be acceptable.
     In principle "make" would also not need passing of the number of
     shadow levels (can be derived from v), which would result in yet
     another asymmetry. If these asymmetries were acceptable, "make"
     could then also update v->arch.hvm.monitor_table, instead of doing
     so at both call sites.
TBD: Collides with Andrew's "xen/x86: Fix memory leak in vcpu_create()
     error path".

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2467,7 +2467,9 @@ static void sh_update_paging_modes(struc
 
         if ( pagetable_is_null(v->arch.hvm.monitor_table) )
         {
-            mfn_t mmfn = v->arch.paging.mode->shadow.make_monitor_table(v);
+            mfn_t mmfn = sh_make_monitor_table(
+                             v, v->arch.paging.mode->shadow.shadow_levels);
+
             v->arch.hvm.monitor_table = pagetable_from_mfn(mmfn);
             make_cr3(v, mmfn);
             hvm_update_host_cr3(v);
@@ -2504,7 +2506,8 @@ static void sh_update_paging_modes(struc
 
             old_mfn = pagetable_get_mfn(v->arch.hvm.monitor_table);
             v->arch.hvm.monitor_table = pagetable_null();
-            new_mfn = v->arch.paging.mode->shadow.make_monitor_table(v);
+            new_mfn = sh_make_monitor_table(
+                          v, v->arch.paging.mode->shadow.shadow_levels);
             v->arch.hvm.monitor_table = pagetable_from_mfn(new_mfn);
             SHADOW_PRINTK("new monitor table %"PRI_mfn "\n",
                           mfn_x(new_mfn));
@@ -2516,7 +2519,8 @@ static void sh_update_paging_modes(struc
             if ( v == current )
                 write_ptbase(v);
             hvm_update_host_cr3(v);
-            old_mode->shadow.destroy_monitor_table(v, old_mfn);
+            sh_destroy_monitor_table(v, old_mfn,
+                                     old_mode->shadow.shadow_levels);
         }
     }
 
@@ -2801,7 +2805,9 @@ void shadow_teardown(struct domain *d, b
                 mfn_t mfn = pagetable_get_mfn(v->arch.hvm.monitor_table);
 
                 if ( mfn_valid(mfn) && (mfn_x(mfn) != 0) )
-                    v->arch.paging.mode->shadow.destroy_monitor_table(v, mfn);
+                    sh_destroy_monitor_table(
+                        v, mfn,
+                        v->arch.paging.mode->shadow.shadow_levels);
                 v->arch.hvm.monitor_table = pagetable_null();
             }
 #endif /* CONFIG_HVM */
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -691,6 +691,88 @@ static void sh_emulate_unmap_dest(struct
     atomic_inc(&v->domain->arch.paging.shadow.gtable_dirty_version);
 }
 
+mfn_t sh_make_monitor_table(struct vcpu *v, unsigned int shadow_levels)
+{
+    struct domain *d = v->domain;
+    mfn_t m4mfn;
+    l4_pgentry_t *l4e;
+
+    ASSERT(!pagetable_get_pfn(v->arch.hvm.monitor_table));
+
+    /* Guarantee we can get the memory we need */
+    shadow_prealloc(d, SH_type_monitor_table, CONFIG_PAGING_LEVELS);
+    m4mfn = shadow_alloc(d, SH_type_monitor_table, 0);
+    mfn_to_page(m4mfn)->shadow_flags = 4;
+
+    l4e = map_domain_page(m4mfn);
+
+    /*
+     * Create a self-linear mapping, but no shadow-linear mapping.  A
+     * shadow-linear mapping will either be inserted below when creating
+     * lower level monitor tables, or later in sh_update_cr3().
+     */
+    init_xen_l4_slots(l4e, m4mfn, d, INVALID_MFN, false);
+
+    if ( shadow_levels < 4 )
+    {
+        mfn_t m3mfn, m2mfn;
+        l3_pgentry_t *l3e;
+
+        /*
+         * Install an l3 table and an l2 table that will hold the shadow
+         * linear map entries. This overrides the empty entry that was
+         * installed by init_xen_l4_slots().
+         */
+        m3mfn = shadow_alloc(d, SH_type_monitor_table, 0);
+        mfn_to_page(m3mfn)->shadow_flags = 3;
+        l4e[l4_table_offset(SH_LINEAR_PT_VIRT_START)]
+            = l4e_from_mfn(m3mfn, __PAGE_HYPERVISOR_RW);
+
+        m2mfn = shadow_alloc(d, SH_type_monitor_table, 0);
+        mfn_to_page(m2mfn)->shadow_flags = 2;
+        l3e = map_domain_page(m3mfn);
+        l3e[0] = l3e_from_mfn(m2mfn, __PAGE_HYPERVISOR_RW);
+        unmap_domain_page(l3e);
+    }
+
+    unmap_domain_page(l4e);
+
+    return m4mfn;
+}
+
+void sh_destroy_monitor_table(struct vcpu *v, mfn_t mmfn,
+                              unsigned int shadow_levels)
+{
+    struct domain *d = v->domain;
+
+    ASSERT(mfn_to_page(mmfn)->u.sh.type == SH_type_monitor_table);
+
+    if ( shadow_levels < 4 )
+    {
+        mfn_t m3mfn;
+        l4_pgentry_t *l4e = map_domain_page(mmfn);
+        l3_pgentry_t *l3e;
+        unsigned int linear_slot = l4_table_offset(SH_LINEAR_PT_VIRT_START);
+
+        /*
+         * Need to destroy the l3 and l2 monitor pages used
+         * for the linear map.
+         */
+        ASSERT(l4e_get_flags(l4e[linear_slot]) & _PAGE_PRESENT);
+        m3mfn = l4e_get_mfn(l4e[linear_slot]);
+        l3e = map_domain_page(m3mfn);
+        ASSERT(l3e_get_flags(l3e[0]) & _PAGE_PRESENT);
+        shadow_free(d, l3e_get_mfn(l3e[0]));
+        unmap_domain_page(l3e);
+        shadow_free(d, m3mfn);
+
+        unmap_domain_page(l4e);
+    }
+
+    /* Put the memory back in the pool */
+    shadow_free(d, mmfn);
+}
+
 /**************************************************************************/
 /* VRAM dirty tracking support */
 int shadow_track_dirty_vram(struct domain *d,
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1405,84 +1405,6 @@ make_fl1_shadow(struct domain *d, gfn_t
 }
 
-#if SHADOW_PAGING_LEVELS == GUEST_PAGING_LEVELS && defined(CONFIG_HVM)
-mfn_t
-sh_make_monitor_table(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-
-    ASSERT(pagetable_get_pfn(v->arch.hvm.monitor_table) == 0);
-
-    /* Guarantee we can get the memory we need */
-    shadow_prealloc(d, SH_type_monitor_table, CONFIG_PAGING_LEVELS);
-
-    {
-        mfn_t m4mfn;
-        l4_pgentry_t *l4e;
-
-        m4mfn = shadow_alloc(d, SH_type_monitor_table, 0);
-        mfn_to_page(m4mfn)->shadow_flags = 4;
-
-        l4e = map_domain_page(m4mfn);
-
-        /*
-         * Create a self-linear mapping, but no shadow-linear mapping.  A
-         * shadow-linear mapping will either be inserted below when creating
-         * lower level monitor tables, or later in sh_update_cr3().
-         */
-        init_xen_l4_slots(l4e, m4mfn, d, INVALID_MFN, false);
-
-#if SHADOW_PAGING_LEVELS < 4
-        {
-            mfn_t m3mfn, m2mfn;
-            l3_pgentry_t *l3e;
-            /* Install an l3 table and an l2 table that will hold the shadow
-             * linear map entries. This overrides the linear map entry that
-             * was installed by sh_install_xen_entries_in_l4. */
-
-            m3mfn = shadow_alloc(d, SH_type_monitor_table, 0);
-            mfn_to_page(m3mfn)->shadow_flags = 3;
-            l4e[shadow_l4_table_offset(SH_LINEAR_PT_VIRT_START)]
-                = l4e_from_mfn(m3mfn, __PAGE_HYPERVISOR_RW);
-
-            m2mfn = shadow_alloc(d, SH_type_monitor_table, 0);
-            mfn_to_page(m2mfn)->shadow_flags = 2;
-            l3e = map_domain_page(m3mfn);
-            l3e[0] = l3e_from_mfn(m2mfn, __PAGE_HYPERVISOR_RW);
-            unmap_domain_page(l3e);
-
-            if ( is_pv_32bit_domain(d) )
-            {
-                l2_pgentry_t *l2t;
-
-                /* For 32-bit PV guests, we need to map the 32-bit Xen
-                 * area into its usual VAs in the monitor tables */
-                m3mfn = shadow_alloc(d, SH_type_monitor_table, 0);
-                mfn_to_page(m3mfn)->shadow_flags = 3;
-                l4e[0] = l4e_from_mfn(m3mfn, __PAGE_HYPERVISOR_RW);
-
-                m2mfn = shadow_alloc(d, SH_type_monitor_table, 0);
-                mfn_to_page(m2mfn)->shadow_flags = 2;
-                l3e = map_domain_page(m3mfn);
-                l3e[3] = l3e_from_mfn(m2mfn, _PAGE_PRESENT);
-
-                l2t = map_domain_page(m2mfn);
-                init_xen_pae_l2_slots(l2t, d);
-                unmap_domain_page(l2t);
-
-                unmap_domain_page(l3e);
-            }
-
-        }
-#endif /* SHADOW_PAGING_LEVELS < 4 */
-
-        unmap_domain_page(l4e);
-
-        return m4mfn;
-    }
-}
-#endif /* SHADOW_PAGING_LEVELS == GUEST_PAGING_LEVELS */
-
 /**************************************************************************/
 /* These functions also take a virtual address and return the level-N
  * shadow table mfn and entry, but they create the shadow pagetables if
@@ -1860,50 +1782,6 @@ void sh_destroy_l1_shadow(struct domain
     shadow_free(d, smfn);
 }
 
-#if SHADOW_PAGING_LEVELS == GUEST_PAGING_LEVELS && defined(CONFIG_HVM)
-void sh_destroy_monitor_table(struct vcpu *v, mfn_t mmfn)
-{
-    struct domain *d = v->domain;
-    ASSERT(mfn_to_page(mmfn)->u.sh.type == SH_type_monitor_table);
-
-#if SHADOW_PAGING_LEVELS != 4
-    {
-        mfn_t m3mfn;
-        l4_pgentry_t *l4e = map_domain_page(mmfn);
-        l3_pgentry_t *l3e;
-        int linear_slot = shadow_l4_table_offset(SH_LINEAR_PT_VIRT_START);
-
-        /* Need to destroy the l3 and l2 monitor pages used
-         * for the linear map */
-        ASSERT(l4e_get_flags(l4e[linear_slot]) & _PAGE_PRESENT);
-        m3mfn = l4e_get_mfn(l4e[linear_slot]);
-        l3e = map_domain_page(m3mfn);
-        ASSERT(l3e_get_flags(l3e[0]) & _PAGE_PRESENT);
-        shadow_free(d, l3e_get_mfn(l3e[0]));
-        unmap_domain_page(l3e);
-        shadow_free(d, m3mfn);
-
-        if ( is_pv_32bit_domain(d) )
-        {
-            /* Need to destroy the l3 and l2 monitor pages that map the
-             * Xen VAs at 3GB-4GB */
-            ASSERT(l4e_get_flags(l4e[0]) & _PAGE_PRESENT);
-            m3mfn = l4e_get_mfn(l4e[0]);
-            l3e = map_domain_page(m3mfn);
-            ASSERT(l3e_get_flags(l3e[3]) & _PAGE_PRESENT);
-            shadow_free(d, l3e_get_mfn(l3e[3]));
-            unmap_domain_page(l3e);
-            shadow_free(d, m3mfn);
-        }
-        unmap_domain_page(l4e);
-    }
-#endif
-
-    /* Put the memory back in the pool */
-    shadow_free(d, mmfn);
-}
-#endif
-
 /**************************************************************************/
 /* Functions to destroy non-Xen mappings in a pagetable hierarchy.
  * These are called from common code when we are running out of shadow
@@ -4705,8 +4583,6 @@ const struct paging_mode sh_paging_mode
     .shadow.cmpxchg_guest_entry    = sh_cmpxchg_guest_entry,
 #endif
 #ifdef CONFIG_HVM
-    .shadow.make_monitor_table     = sh_make_monitor_table,
-    .shadow.destroy_monitor_table  = sh_destroy_monitor_table,
 #if SHADOW_OPTIMIZATIONS & SHOPT_WRITABLE_HEURISTIC
     .shadow.guess_wrmap            = sh_guess_wrmap,
 #endif
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -366,9 +366,6 @@ void sh_set_toplevel_shadow(struct vcpu
                                            mfn_t gmfn,
                                            uint32_t shadow_type));
 
-/* Install the xen mappings in various flavours of shadow */
-void sh_install_xen_entries_in_l4(struct domain *, mfn_t gl4mfn, mfn_t sl4mfn);
-
 /* Update the shadows in response to a pagetable write from Xen */
 int sh_validate_guest_entry(struct vcpu *v, mfn_t gmfn, void *entry, u32 size);
 
@@ -410,6 +407,14 @@ void shadow_update_paging_modes(struct v
  * With user_only == 1, unhooks only the user-mode mappings. */
 void shadow_unhook_mappings(struct domain *d, mfn_t smfn, int user_only);
 
+/*
+ * sh_{make,destroy}_monitor_table() depend only on the number of shadow
+ * levels.
+ */
+mfn_t sh_make_monitor_table(struct vcpu *v, unsigned int shadow_levels);
+void sh_destroy_monitor_table(struct vcpu *v, mfn_t mmfn,
+                              unsigned int shadow_levels);
+
 /* VRAM dirty tracking helpers. */
 void shadow_vram_get_mfn(mfn_t mfn, unsigned int l1f,
                          mfn_t sl1mfn, const void *sl1e,
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -262,15 +262,6 @@ static inline shadow_l4e_t shadow_l4e_fr
 #define sh_rm_write_access_from_sl1p INTERNAL_NAME(sh_rm_write_access_from_sl1p)
 #endif
 
-/* sh_make_monitor_table depends only on the number of shadow levels */
-#define sh_make_monitor_table \
-        SHADOW_SH_NAME(sh_make_monitor_table, SHADOW_PAGING_LEVELS)
-#define sh_destroy_monitor_table \
-        SHADOW_SH_NAME(sh_destroy_monitor_table, SHADOW_PAGING_LEVELS)
-
-mfn_t sh_make_monitor_table(struct vcpu *v);
-void sh_destroy_monitor_table(struct vcpu *v, mfn_t mmfn);
-
 #if SHADOW_PAGING_LEVELS == 3
 #define MFN_FITS_IN_HVM_CR3(_MFN) !(mfn_x(_MFN) >> 20)
 #endif
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -107,8 +107,6 @@ struct shadow_paging_mode {
                                             mfn_t gmfn);
 #endif
 #ifdef CONFIG_HVM
-    mfn_t         (*make_monitor_table    )(struct vcpu *v);
-    void          (*destroy_monitor_table )(struct vcpu *v, mfn_t mmfn);
     int           (*guess_wrmap           )(struct vcpu *v,
                                             unsigned long vaddr, mfn_t gmfn);
     void          (*pagetable_dying       )(paddr_t gpa);
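The resulting caller pattern, condensed into a standalone sketch: the
stub types below stand in for Xen's struct vcpu and mfn_t, allocation is
faked, and the names are made up; it only shows how the level count now
flows from the paging mode into the single make/destroy pair rather than
through per-level hooks.

typedef unsigned long mfn_t;   /* stub for Xen's mfn_t */

struct shadow_mode_stub { unsigned int shadow_levels; };
struct vcpu_stub {
    const struct shadow_mode_stub *mode;
    mfn_t monitor_table;
};

static mfn_t sh_make_monitor_table_stub(struct vcpu_stub *v,
                                        unsigned int shadow_levels)
{
    (void)v;
    /* The real function allocates an l4, plus an l3/l2 pair when
     * shadow_levels < 4; here we just return a fake MFN. */
    return 0x1000 + shadow_levels;
}

static void sh_destroy_monitor_table_stub(struct vcpu_stub *v, mfn_t mmfn,
                                          unsigned int shadow_levels)
{
    (void)v; (void)mmfn; (void)shadow_levels;   /* frees in the real code */
}

/* Mirrors the mode-switch path in sh_update_paging_modes() above. */
static void repoint_monitor_table(struct vcpu_stub *v,
                                  const struct shadow_mode_stub *old_mode)
{
    mfn_t old_mfn = v->monitor_table;

    /* The level count comes from the (new) paging mode... */
    v->monitor_table = sh_make_monitor_table_stub(v, v->mode->shadow_levels);

    /* ...while teardown uses the old mode's count. */
    sh_destroy_monitor_table_stub(v, old_mfn, old_mode->shadow_levels);
}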
On Mon, Oct 19, 2020 at 10:45:00AM +0200, Jan Beulich wrote:
> With them depending on just the number of shadow levels, there's no need
> for more than one instance of them, and hence no need for any hook (IOW
> 452219e24648 ["x86/shadow: monitor table is HVM-only"] didn't go quite
> far enough). Move the functions to hvm.c while dropping the dead
> is_pv_32bit_domain() code paths.
>
> While moving the code, replace a stale comment reference to
> sh_install_xen_entries_in_l4(). Doing so made me notice the function
> also didn't have its prototype dropped in 8d7b633adab7 ("x86/mm:
> Consolidate all Xen L4 slot writing into init_xen_l4_slots()"), which
> gets done here as well.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v3: New.
> ---
> TBD: In principle both functions could have their first parameter
>      constified. In fact, "destroy" doesn't depend on the vCPU at all
>      and hence could be passed a struct domain *. Not sure whether such
>      an asymmetry would be acceptable.
>      In principle "make" would also not need passing of the number of
>      shadow levels (can be derived from v), which would result in yet
>      another asymmetry.

I'm not especially fussed either way - having const v would be good IMO.

Thanks, Roger.
At 10:45 +0200 on 19 Oct (1603104300), Jan Beulich wrote:
> With them depending on just the number of shadow levels, there's no need
> for more than one instance of them, and hence no need for any hook (IOW
> 452219e24648 ["x86/shadow: monitor table is HVM-only"] didn't go quite
> far enough). Move the functions to hvm.c while dropping the dead
> is_pv_32bit_domain() code paths.
>
> While moving the code, replace a stale comment reference to
> sh_install_xen_entries_in_l4(). Doing so made me notice the function
> also didn't have its prototype dropped in 8d7b633adab7 ("x86/mm:
> Consolidate all Xen L4 slot writing into init_xen_l4_slots()"), which
> gets done here as well.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Tim Deegan <tim@xen.org>

> TBD: In principle both functions could have their first parameter
>      constified. In fact, "destroy" doesn't depend on the vCPU at all
>      and hence could be passed a struct domain *. Not sure whether such
>      an asymmetry would be acceptable.
>      In principle "make" would also not need passing of the number of
>      shadow levels (can be derived from v), which would result in yet
>      another asymmetry.
>      If these asymmetries were acceptable, "make" could then also update
>      v->arch.hvm.monitor_table, instead of doing so at both call sites.

Feel free to add consts, but please don't change the function parameters
any more than that. I would rather keep them as a matched pair, and leave
the hvm.monitor_table updates in the caller, where it's easier to see why
they're not symmetrical.

Cheers,

Tim.
Tim,

On 19.10.2020 10:42, Jan Beulich wrote:
> The change addressing the shadow related build issue noticed by
> Andrew went in already. The build breakage goes beyond this
> specific combination though - PV_SHIM_EXCLUSIVE plus HVM is
> similarly an issue. This is what the 1st patch tries to take care
> of, in a shape already noticed on irc to be controversial. I'm
> submitting the change nevertheless because for the moment there
> looks to be a majority in favor of going this route. One argument
> not voiced there yet: what good does it do to allow a user to
> enable HVM when, on the resulting hypervisor, they still can't
> run HVM guests (the hypervisor still being a dedicated PV shim
> one)? On top of this, the alternative approach is likely going to
> get ugly.
>
> The shadow related adjustments are here merely because the desire
> to make them was noticed in the context of the patch which has
> already gone in.
>
> 1: don't permit HVM and PV_SHIM_EXCLUSIVE at the same time
> 2: refactor shadow_vram_{get,put}_l1e()
> 3: sh_{make,destroy}_monitor_table() are "even more" HVM-only

unless you tell me otherwise I'm intending to commit the latter two
with Roger's acks some time next week. Of course it would still be
nice to know your view on the first of the TBDs in patch 3 (which
may result in further changes, in a follow-up).

Jan
At 14:40 +0100 on 29 Oct (1603982415), Jan Beulich wrote:
> Tim,
>
> unless you tell me otherwise I'm intending to commit the latter
> two with Roger's acks some time next week. Of course it would
> still be nice to know your view on the first of the TBDs in
> patch 3 (which may result in further changes, in a follow-up).

Ack, sorry for the dropped patches, and thank you for the ping. I've
now replied to everything that I think is waiting for my review.

Cheers,

Tim.