[PATCH 0/5] x86/p2m: hook adjustments
Posted by Jan Beulich 3 years, 5 months ago
This started out with me getting confused by the two write_p2m_entry()
hooks we have - there really ought to be no more than one, or, if two
were absolutely needed, they imo ought to have distinct names. Other
adjustment opportunities (and I hope they're improvements) were found
while getting rid of that one unnecessary layer of indirect calls.

1: p2m: paging_write_p2m_entry() is a private function
2: p2m: collapse the two ->write_p2m_entry() hooks
3: p2m: suppress audit_p2m hook when possible
4: HAP: move nested-P2M flush calculations out of locked region
5: [RFC] p2m: split write_p2m_entry() hook

Jan

[PATCH 1/5] x86/p2m: paging_write_p2m_entry() is a private function
Posted by Jan Beulich 3 years, 5 months ago
As it gets installed by p2m_pt_init(), it doesn't need to live in
paging.c. The function working in terms of l1_pgentry_t even further
indicates its non-paging-generic nature. Move it and drop its
paging_ prefix, not adding any new one now that it's static.

This then also makes more obvious that in the EPT case we wouldn't
risk mistakenly calling through the NULL hook pointer.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -108,6 +108,31 @@ static unsigned long p2m_type_to_flags(c
     }
 }
 
+/*
+ * Atomically write a P2M entry and update the paging-assistance state
+ * appropriately.
+ * Arguments: the domain in question, the GFN whose mapping is being updated,
+ * a pointer to the entry to be written, the MFN in which the entry resides,
+ * the new contents of the entry, and the level in the p2m tree at which
+ * we are writing.
+ */
+static int write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
+                           l1_pgentry_t *p, l1_pgentry_t new,
+                           unsigned int level)
+{
+    struct domain *d = p2m->domain;
+    struct vcpu *v = current;
+    int rc = 0;
+
+    if ( v->domain != d )
+        v = d->vcpu ? d->vcpu[0] : NULL;
+    if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) )
+        rc = paging_get_hostmode(v)->write_p2m_entry(p2m, gfn, p, new, level);
+    else
+        safe_write_pte(p, new);
+
+    return rc;
+}
 
 // Find the next level's P2M entry, checking for out-of-range gfn's...
 // Returns NULL on error.
@@ -594,7 +619,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
         entry_content.l1 = l3e_content.l3;
 
         rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 3);
-        /* NB: paging_write_p2m_entry() handles tlb flushes properly */
+        /* NB: write_p2m_entry() handles tlb flushes properly */
         if ( rc )
             goto out;
     }
@@ -631,7 +656,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
 
         /* level 1 entry */
         rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
-        /* NB: paging_write_p2m_entry() handles tlb flushes properly */
+        /* NB: write_p2m_entry() handles tlb flushes properly */
         if ( rc )
             goto out;
     }
@@ -666,7 +691,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
         entry_content.l1 = l2e_content.l2;
 
         rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 2);
-        /* NB: paging_write_p2m_entry() handles tlb flushes properly */
+        /* NB: write_p2m_entry() handles tlb flushes properly */
         if ( rc )
             goto out;
     }
@@ -1107,7 +1132,7 @@ void p2m_pt_init(struct p2m_domain *p2m)
     p2m->recalc = do_recalc;
     p2m->change_entry_type_global = p2m_pt_change_entry_type_global;
     p2m->change_entry_type_range = p2m_pt_change_entry_type_range;
-    p2m->write_p2m_entry = paging_write_p2m_entry;
+    p2m->write_p2m_entry = write_p2m_entry;
 #if P2M_AUDIT
     p2m->audit_p2m = p2m_pt_audit_p2m;
 #else
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -941,27 +941,7 @@ void paging_update_nestedmode(struct vcp
         v->arch.paging.nestedmode = NULL;
     hvm_asid_flush_vcpu(v);
 }
-#endif
 
-int paging_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-                           l1_pgentry_t *p, l1_pgentry_t new,
-                           unsigned int level)
-{
-    struct domain *d = p2m->domain;
-    struct vcpu *v = current;
-    int rc = 0;
-
-    if ( v->domain != d )
-        v = d->vcpu ? d->vcpu[0] : NULL;
-    if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v) != NULL) )
-        rc = paging_get_hostmode(v)->write_p2m_entry(p2m, gfn, p, new, level);
-    else
-        safe_write_pte(p, new);
-
-    return rc;
-}
-
-#ifdef CONFIG_HVM
 int __init paging_set_allocation(struct domain *d, unsigned int pages,
                                  bool *preempted)
 {
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -371,18 +371,6 @@ static inline void safe_write_pte(l1_pge
     *p = new;
 }
 
-/* Atomically write a P2M entry and update the paging-assistance state 
- * appropriately. 
- * Arguments: the domain in question, the GFN whose mapping is being updated, 
- * a pointer to the entry to be written, the MFN in which the entry resides, 
- * the new contents of the entry, and the level in the p2m tree at which 
- * we are writing. */
-struct p2m_domain;
-
-int paging_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-                           l1_pgentry_t *p, l1_pgentry_t new,
-                           unsigned int level);
-
 /*
  * Called from the guest to indicate that the a process is being
  * torn down and its pagetables will soon be discarded.


Re: [PATCH 1/5] x86/p2m: paging_write_p2m_entry() is a private function
Posted by Roger Pau Monné 3 years, 5 months ago
On Wed, Oct 28, 2020 at 10:22:04AM +0100, Jan Beulich wrote:
> As it gets installed by p2m_pt_init(), it doesn't need to live in
> paging.c. The function working in terms of l1_pgentry_t even further
> indicates its non-paging-generic nature. Move it and drop its
> paging_ prefix, not adding any new one now that it's static.
> 
> This then also makes more obvious that in the EPT case we wouldn't
> risk mistakenly calling through the NULL hook pointer.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> 
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -108,6 +108,31 @@ static unsigned long p2m_type_to_flags(c
>      }
>  }
>  
> +/*
> + * Atomically write a P2M entry and update the paging-assistance state
> + * appropriately.
> + * Arguments: the domain in question, the GFN whose mapping is being updated,
> + * a pointer to the entry to be written, the MFN in which the entry resides,
> + * the new contents of the entry, and the level in the p2m tree at which
> + * we are writing.
> + */
> +static int write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> +                           l1_pgentry_t *p, l1_pgentry_t new,
> +                           unsigned int level)
> +{
> +    struct domain *d = p2m->domain;
> +    struct vcpu *v = current;

I think you could constify both?

Thanks, Roger.

Re: [PATCH 1/5] x86/p2m: paging_write_p2m_entry() is a private function
Posted by Jan Beulich 3 years, 5 months ago
On 10.11.2020 11:27, Roger Pau Monné wrote:
> On Wed, Oct 28, 2020 at 10:22:04AM +0100, Jan Beulich wrote:
>> As it gets installed by p2m_pt_init(), it doesn't need to live in
>> paging.c. The function working in terms of l1_pgentry_t even further
>> indicates its non-paging-generic nature. Move it and drop its
>> paging_ prefix, not adding any new one now that it's static.
>>
>> This then also makes more obvious that in the EPT case we wouldn't
>> risk mistakenly calling through the NULL hook pointer.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -108,6 +108,31 @@ static unsigned long p2m_type_to_flags(c
>>      }
>>  }
>>  
>> +/*
>> + * Atomically write a P2M entry and update the paging-assistance state
>> + * appropriately.
>> + * Arguments: the domain in question, the GFN whose mapping is being updated,
>> + * a pointer to the entry to be written, the MFN in which the entry resides,
>> + * the new contents of the entry, and the level in the p2m tree at which
>> + * we are writing.
>> + */
>> +static int write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>> +                           l1_pgentry_t *p, l1_pgentry_t new,
>> +                           unsigned int level)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    struct vcpu *v = current;
> 
> I think you could constify both?

For v it looks like I could. For d a subsequent patch would then
need to undo it, so I'd prefer to keep it this way here.

Jan

[PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks
Posted by Jan Beulich 3 years, 5 months ago
The struct paging_mode instances get set to the same functions
regardless of mode by both HAP and shadow code, hence there's no point
having this hook there. The hook also doesn't need moving elsewhere - we
can directly use struct p2m_domain's. This merely requires (from a
strictly formal pov; in practice this may not even be needed) making
sure we don't end up using safe_write_pte() for nested P2Ms.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Like for the possibly unnecessary p2m_is_nestedp2m() I'm not really sure
the paging_get_hostmode() check there is still needed either. But I
didn't want to alter more aspects than necessary here.

Of course with the p2m_is_nestedp2m() check there and with all three of
{hap,nestedp2m,shadow}_write_p2m_entry() now globally accessible, it's
certainly an option to do away with the indirect call there altogether.
In fact we may even be able to go further and fold the three functions:
They're relatively similar, and this would "seamlessly" address the
apparent bug of nestedp2m_write_p2m_entry() not making use of
p2m_entry_modify().
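
To illustrate the folding idea (a rough sketch only, not part of this
series): a merged handler might look like the below. The
write_p2m_entry_folded() name is invented; everything else uses existing
identifiers, but note that sh_unshadow_for_p2m_change() is static to
shadow code today, which is part of the layering concern. Shadow's
FAST_FAULT_PATH handling and the nested-P2M flush are omitted for
brevity.

static int write_p2m_entry_folded(struct p2m_domain *p2m, unsigned long gfn,
                                  l1_pgentry_t *p, l1_pgentry_t new,
                                  unsigned int level)
{
    struct domain *d = p2m->domain;
    unsigned int oflags;
    int rc;

    paging_lock(d);

    oflags = l1e_get_flags(*p);

    /* Shadow-only step: unshadow the old mapping before it changes. */
    if ( paging_mode_shadow(d) && d->arch.paging.shadow.total_pages )
        sh_unshadow_for_p2m_change(d, gfn, p, new, level);

    /* Common step - this is what nestedp2m_write_p2m_entry() lacks today. */
    rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
                          p2m_flags_to_type(oflags), l1e_get_mfn(new),
                          l1e_get_mfn(*p), level);
    if ( rc )
    {
        paging_unlock(d);
        return rc;
    }

    safe_write_pte(p, new);

    /* Nested P2Ms track their own dirty mask; host ones use the domain's. */
    if ( oflags & _PAGE_PRESENT )
        guest_flush_tlb_mask(d, p2m_is_nestedp2m(p2m) ? p2m->dirty_cpumask
                                                      : d->dirty_cpumask);

    paging_unlock(d);

    return 0;
}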

--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -823,6 +823,11 @@ hap_write_p2m_entry(struct p2m_domain *p
     return 0;
 }
 
+void hap_p2m_init(struct p2m_domain *p2m)
+{
+    p2m->write_p2m_entry = hap_write_p2m_entry;
+}
+
 static unsigned long hap_gva_to_gfn_real_mode(
     struct vcpu *v, struct p2m_domain *p2m, unsigned long gva, uint32_t *pfec)
 {
@@ -846,7 +851,6 @@ static const struct paging_mode hap_pagi
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_real_mode,
     .update_cr3             = hap_update_cr3,
     .update_paging_modes    = hap_update_paging_modes,
-    .write_p2m_entry        = hap_write_p2m_entry,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 1
 };
@@ -858,7 +862,6 @@ static const struct paging_mode hap_pagi
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_2_levels,
     .update_cr3             = hap_update_cr3,
     .update_paging_modes    = hap_update_paging_modes,
-    .write_p2m_entry        = hap_write_p2m_entry,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 2
 };
@@ -870,7 +873,6 @@ static const struct paging_mode hap_pagi
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_3_levels,
     .update_cr3             = hap_update_cr3,
     .update_paging_modes    = hap_update_paging_modes,
-    .write_p2m_entry        = hap_write_p2m_entry,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 3
 };
@@ -882,7 +884,6 @@ static const struct paging_mode hap_pagi
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_4_levels,
     .update_cr3             = hap_update_cr3,
     .update_paging_modes    = hap_update_paging_modes,
-    .write_p2m_entry        = hap_write_p2m_entry,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 4
 };
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -126,8 +126,9 @@ static int write_p2m_entry(struct p2m_do
 
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
-    if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) )
-        rc = paging_get_hostmode(v)->write_p2m_entry(p2m, gfn, p, new, level);
+    if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
+         p2m_is_nestedp2m(p2m) )
+        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
     else
         safe_write_pte(p, new);
 
@@ -209,7 +210,7 @@ p2m_next_level(struct p2m_domain *p2m, v
 
         new_entry = l1e_from_mfn(mfn, P2M_BASE_FLAGS | _PAGE_RW);
 
-        rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, new_entry, level + 1);
+        rc = write_p2m_entry(p2m, gfn, p2m_entry, new_entry, level + 1);
         if ( rc )
             goto error;
     }
@@ -251,7 +252,7 @@ p2m_next_level(struct p2m_domain *p2m, v
         {
             new_entry = l1e_from_pfn(pfn | (i << ((level - 1) * PAGETABLE_ORDER)),
                                      flags);
-            rc = p2m->write_p2m_entry(p2m, gfn, l1_entry + i, new_entry, level);
+            rc = write_p2m_entry(p2m, gfn, l1_entry + i, new_entry, level);
             if ( rc )
             {
                 unmap_domain_page(l1_entry);
@@ -262,8 +263,7 @@ p2m_next_level(struct p2m_domain *p2m, v
         unmap_domain_page(l1_entry);
 
         new_entry = l1e_from_mfn(mfn, P2M_BASE_FLAGS | _PAGE_RW);
-        rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, new_entry,
-                                  level + 1);
+        rc = write_p2m_entry(p2m, gfn, p2m_entry, new_entry, level + 1);
         if ( rc )
             goto error;
     }
@@ -335,7 +335,7 @@ static int p2m_pt_set_recalc_range(struc
             if ( (l1e_get_flags(e) & _PAGE_PRESENT) && !needs_recalc(l1, e) )
             {
                 set_recalc(l1, e);
-                err = p2m->write_p2m_entry(p2m, first_gfn, pent, e, level);
+                err = write_p2m_entry(p2m, first_gfn, pent, e, level);
                 if ( err )
                 {
                     ASSERT_UNREACHABLE();
@@ -412,8 +412,8 @@ static int do_recalc(struct p2m_domain *
                      !needs_recalc(l1, ent) )
                 {
                     set_recalc(l1, ent);
-                    err = p2m->write_p2m_entry(p2m, gfn - remainder, &ptab[i],
-                                               ent, level);
+                    err = write_p2m_entry(p2m, gfn - remainder, &ptab[i], ent,
+                                          level);
                     if ( err )
                     {
                         ASSERT_UNREACHABLE();
@@ -426,7 +426,7 @@ static int do_recalc(struct p2m_domain *
             if ( !err )
             {
                 clear_recalc(l1, e);
-                err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
+                err = write_p2m_entry(p2m, gfn, pent, e, level + 1);
                 ASSERT(!err);
 
                 recalc_done = true;
@@ -474,7 +474,7 @@ static int do_recalc(struct p2m_domain *
         }
         else
             clear_recalc(l1, e);
-        err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
+        err = write_p2m_entry(p2m, gfn, pent, e, level + 1);
         ASSERT(!err);
 
         recalc_done = true;
@@ -618,7 +618,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
 
-        rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 3);
+        rc = write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 3);
         /* NB: write_p2m_entry() handles tlb flushes properly */
         if ( rc )
             goto out;
@@ -655,7 +655,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
             entry_content = l1e_empty();
 
         /* level 1 entry */
-        rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
+        rc = write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
         /* NB: write_p2m_entry() handles tlb flushes properly */
         if ( rc )
             goto out;
@@ -690,7 +690,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
             : l2e_empty();
         entry_content.l1 = l2e_content.l2;
 
-        rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 2);
+        rc = write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 2);
         /* NB: write_p2m_entry() handles tlb flushes properly */
         if ( rc )
             goto out;
@@ -914,7 +914,7 @@ static void p2m_pt_change_entry_type_glo
             int rc;
 
             set_recalc(l1, e);
-            rc = p2m->write_p2m_entry(p2m, gfn, &tab[i], e, 4);
+            rc = write_p2m_entry(p2m, gfn, &tab[i], e, 4);
             if ( rc )
             {
                 ASSERT_UNREACHABLE();
@@ -1132,7 +1132,13 @@ void p2m_pt_init(struct p2m_domain *p2m)
     p2m->recalc = do_recalc;
     p2m->change_entry_type_global = p2m_pt_change_entry_type_global;
     p2m->change_entry_type_range = p2m_pt_change_entry_type_range;
-    p2m->write_p2m_entry = write_p2m_entry;
+
+    /* Still too early to use paging_mode_hap(). */
+    if ( hap_enabled(p2m->domain) )
+        hap_p2m_init(p2m);
+    else if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
+        shadow_p2m_init(p2m);
+
 #if P2M_AUDIT
     p2m->audit_p2m = p2m_pt_audit_p2m;
 #else
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3137,7 +3137,7 @@ static void sh_unshadow_for_p2m_change(s
     }
 }
 
-int
+static int
 shadow_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
                        l1_pgentry_t *p, l1_pgentry_t new,
                        unsigned int level)
@@ -3183,6 +3183,11 @@ shadow_write_p2m_entry(struct p2m_domain
     return 0;
 }
 
+void shadow_p2m_init(struct p2m_domain *p2m)
+{
+    p2m->write_p2m_entry = shadow_write_p2m_entry;
+}
+
 /**************************************************************************/
 /* Log-dirty mode support */
 
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4792,7 +4792,6 @@ const struct paging_mode sh_paging_mode
     .gva_to_gfn                    = sh_gva_to_gfn,
     .update_cr3                    = sh_update_cr3,
     .update_paging_modes           = shadow_update_paging_modes,
-    .write_p2m_entry               = shadow_write_p2m_entry,
     .flush_tlb                     = shadow_flush_tlb,
     .guest_levels                  = GUEST_PAGING_LEVELS,
     .shadow.detach_old_tables      = sh_detach_old_tables,
--- a/xen/arch/x86/mm/shadow/none.c
+++ b/xen/arch/x86/mm/shadow/none.c
@@ -60,21 +60,12 @@ static void _update_paging_modes(struct
     ASSERT_UNREACHABLE();
 }
 
-static int _write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-                            l1_pgentry_t *p, l1_pgentry_t new,
-                            unsigned int level)
-{
-    ASSERT_UNREACHABLE();
-    return -EOPNOTSUPP;
-}
-
 static const struct paging_mode sh_paging_none = {
     .page_fault                    = _page_fault,
     .invlpg                        = _invlpg,
     .gva_to_gfn                    = _gva_to_gfn,
     .update_cr3                    = _update_cr3,
     .update_paging_modes           = _update_paging_modes,
-    .write_p2m_entry               = _write_p2m_entry,
 };
 
 void shadow_vcpu_init(struct vcpu *v)
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -390,11 +390,6 @@ static inline int sh_remove_write_access
 }
 #endif
 
-/* Functions that atomically write PT/P2M entries and update state */
-int shadow_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-                           l1_pgentry_t *p, l1_pgentry_t new,
-                           unsigned int level);
-
 /* Functions that atomically write PV guest PT entries */
 void sh_write_guest_entry(struct vcpu *v, intpte_t *p, intpte_t new,
                           mfn_t gmfn);
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -836,6 +836,9 @@ void p2m_flush_nestedp2m(struct domain *
 /* Flushes the np2m specified by np2m_base (if it exists) */
 void np2m_flush_base(struct vcpu *v, unsigned long np2m_base);
 
+void hap_p2m_init(struct p2m_domain *p2m);
+void shadow_p2m_init(struct p2m_domain *p2m);
+
 int nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
     l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
 
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -141,10 +141,6 @@ struct paging_mode {
     void          (*update_cr3            )(struct vcpu *v, int do_locking,
                                             bool noflush);
     void          (*update_paging_modes   )(struct vcpu *v);
-    int           (*write_p2m_entry       )(struct p2m_domain *p2m,
-                                            unsigned long gfn,
-                                            l1_pgentry_t *p, l1_pgentry_t new,
-                                            unsigned int level);
     bool          (*flush_tlb             )(bool (*flush_vcpu)(void *ctxt,
                                                                struct vcpu *v),
                                             void *ctxt);


Re: [PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks
Posted by Tim Deegan 3 years, 5 months ago
At 10:22 +0100 on 28 Oct (1603880578), Jan Beulich wrote:
> The struct paging_mode instances get set to the same functions
> regardless of mode by both HAP and shadow code, hence there's no point
> having this hook there. The hook also doesn't need moving elsewhere - we
> can directly use struct p2m_domain's. This merely requires (from a
> strictly formal pov; in practice this may not even be needed) making
> sure we don't end up using safe_write_pte() for nested P2Ms.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Tim Deegan <tim@xen.org>

Re: [PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks
Posted by Roger Pau Monné 3 years, 5 months ago
On Wed, Oct 28, 2020 at 10:22:58AM +0100, Jan Beulich wrote:
> The struct paging_mode instances get set to the same functions
> regardless of mode by both HAP and shadow code, hence there's no point
> having this hook there. The hook also doesn't need moving elsewhere - we
> can directly use struct p2m_domain's. This merely requires (from a
> strictly formal pov; in practice this may not even be needed) making
> sure we don't end up using safe_write_pte() for nested P2Ms.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Like for the possibly unnecessary p2m_is_nestedp2m() I'm not really sure
> the paging_get_hostmode() check there is still needed either. But I
> didn't want to alter more aspects than necessary here.
> 
> Of course with the p2m_is_nestedp2m() check there and with all three of
> {hap,nestedp2m,shadow}_write_p2m_entry() now globally accessible, it's
> certainly an option to do away with the indirect call there altogether.
> In fact we may even be able to go further and fold the three functions:
> They're relatively similar, and this would "seamlessly" address the
> apparent bug of nestedp2m_write_p2m_entry() not making use of
> p2m_entry_modify().
> 
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -823,6 +823,11 @@ hap_write_p2m_entry(struct p2m_domain *p
>      return 0;
>  }
>  
> +void hap_p2m_init(struct p2m_domain *p2m)
> +{
> +    p2m->write_p2m_entry = hap_write_p2m_entry;
> +}
> +
>  static unsigned long hap_gva_to_gfn_real_mode(
>      struct vcpu *v, struct p2m_domain *p2m, unsigned long gva, uint32_t *pfec)
>  {
> [...]
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -1132,7 +1132,13 @@ void p2m_pt_init(struct p2m_domain *p2m)
>      p2m->recalc = do_recalc;
>      p2m->change_entry_type_global = p2m_pt_change_entry_type_global;
>      p2m->change_entry_type_range = p2m_pt_change_entry_type_range;
> -    p2m->write_p2m_entry = write_p2m_entry;
> +
> +    /* Still too early to use paging_mode_hap(). */
> +    if ( hap_enabled(p2m->domain) )
> +        hap_p2m_init(p2m);
> +    else if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
> +        shadow_p2m_init(p2m);

There's already some logic in p2m_initialise that checks for
hap_enabled for EPT specific initialization. Do you think you could
move this there so that it's more contained?

I think having the initialization condition sprinkled all over the
different functions makes the logic more complicated to follow.

Also, should hap_p2m_init be limited to HAP and PT, as opposed to HAP
and EPT which doesn't use the helper AFAICT?

Maybe it would be clearer to unify shadow_write_p2m_entry with
hap_write_p2m_entry and call it p2m_pt_write_p2m_entry to match the
rest of the p2m PT helpers?

Thanks, Roger.

Re: [PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks
Posted by Jan Beulich 3 years, 5 months ago
On 10.11.2020 12:06, Roger Pau Monné wrote:
> On Wed, Oct 28, 2020 at 10:22:58AM +0100, Jan Beulich wrote:
>> @@ -1132,7 +1132,13 @@ void p2m_pt_init(struct p2m_domain *p2m)
>>      p2m->recalc = do_recalc;
>>      p2m->change_entry_type_global = p2m_pt_change_entry_type_global;
>>      p2m->change_entry_type_range = p2m_pt_change_entry_type_range;
>> -    p2m->write_p2m_entry = write_p2m_entry;
>> +
>> +    /* Still too early to use paging_mode_hap(). */
>> +    if ( hap_enabled(p2m->domain) )
>> +        hap_p2m_init(p2m);
>> +    else if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
>> +        shadow_p2m_init(p2m);
> 
> There's already some logic in p2m_initialise that checks for
> hap_enabled for EPT specific initialization. Do you think you could
> move this there so that it's more contained?
> 
> I think having the initialization condition sprinkled all over the
> different functions makes the logic more complicated to follow.
> 
> Also, should hap_p2m_init be limited to HAP and PT, as opposed to HAP
> and EPT which doesn't use the helper AFAICT?

It is limited to HAP and PT - we're in p2m_pt_init() here. This is
also why I don't want to move it to e.g. p2m_initialise(), as that
would be the wrong layer.

> Maybe it would be clearer to unify shadow_write_p2m_entry with
> hap_write_p2m_entry and call it p2m_pt_write_p2m_entry to match the
> rest of the p2m PT helpers?

This looks to go along the lines of what I'd put up as a post-
commit-message remark in "x86/p2m: collapse the two
->write_p2m_entry() hooks". The nested handler is perhaps the
bigger problem with such merging, plus it would feel a little like
a layering violation (which is why I did put up the question
instead of doing it right away).

Jan

Re: [PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks
Posted by Roger Pau Monné 3 years, 5 months ago
On Tue, Nov 10, 2020 at 02:51:11PM +0100, Jan Beulich wrote:
> On 10.11.2020 12:06, Roger Pau Monné wrote:
> > On Wed, Oct 28, 2020 at 10:22:58AM +0100, Jan Beulich wrote:
> >> @@ -1132,7 +1132,13 @@ void p2m_pt_init(struct p2m_domain *p2m)
> >>      p2m->recalc = do_recalc;
> >>      p2m->change_entry_type_global = p2m_pt_change_entry_type_global;
> >>      p2m->change_entry_type_range = p2m_pt_change_entry_type_range;
> >> -    p2m->write_p2m_entry = write_p2m_entry;
> >> +
> >> +    /* Still too early to use paging_mode_hap(). */
> >> +    if ( hap_enabled(p2m->domain) )
> >> +        hap_p2m_init(p2m);
> >> +    else if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
> >> +        shadow_p2m_init(p2m);
> > 
> > There's already some logic in p2m_initialise that checks for
> > hap_enabled for EPT specific initialization. Do you think you could
> > move this there so that it's more contained?
> > 
> > I think having the initialization condition sprinkled all over the
> > different functions makes the logic more complicated to follow.
> > 
> > Also, should hap_p2m_init be limited to HAP and PT, as opposed to HAP
> > and EPT which doesn't use the helper AFAICT?
> 
> It is limited to HAP and PT - we're in p2m_pt_init() here. This is
> also why I don't want to move it to e.g. p2m_initialise(), as that
> would be the wrong layer.

All those sub-initializations make the code slightly harder to follow,
but I guess it's fine if we want to keep it layered in this way.

> > Maybe it would be clearer to unify shadow_write_p2m_entry with
> > hap_write_p2m_entry and call it p2m_pt_write_p2m_entry to match the
> > rest of the p2m PT helpers?
> 
> This looks to go along the lines of what I'd put up as a post-
> commit-message remark in "x86/p2m: collapse the two
> ->write_p2m_entry() hooks". The nested handler is perhaps the
> bigger problem with such merging, plus it would feel a little like
> a layering violation (which is why I did put up the question
> instead of doing it right away).

Right, we could look into it on further patches:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

[PATCH 3/5] x86/p2m: suppress audit_p2m hook when possible
Posted by Jan Beulich 3 years, 5 months ago
When P2M_AUDIT is false, it's unused, so instead of having a dangling
NULL pointer sit there, omit the field altogether.

Instead of adding "#if P2M_AUDIT && defined(CONFIG_HVM)" in even more
places, fold the latter part right into the definition of P2M_AUDIT.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
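
To illustrate the effect of the fold at a use site (sketch only; the
definitions mirror the p2m.h hunk below, the struct excerpt stands in
for struct p2m_domain):

#ifndef NDEBUG
#define P2M_AUDIT     defined(CONFIG_HVM)
#else
#define P2M_AUDIT     0
#endif

struct p2m_domain {
    /* ... */
#if P2M_AUDIT   /* previously: #if P2M_AUDIT && defined(CONFIG_HVM) */
    long (*audit_p2m)(struct p2m_domain *p2m);
#endif
    /* ... */
};

One caveat: a macro expanding to defined() is formally undefined
behaviour per the C standard, and clang diagnoses it - presumably the
clang-related fix mentioned for v2 later in the thread.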

--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1012,7 +1012,7 @@ long arch_do_domctl(
         break;
 #endif
 
-#if P2M_AUDIT && defined(CONFIG_HVM)
+#if P2M_AUDIT
     case XEN_DOMCTL_audit_p2m:
         if ( d == currd )
             ret = -EPERM;
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1260,7 +1260,9 @@ int ept_p2m_init(struct p2m_domain *p2m)
     p2m->change_entry_type_global = ept_change_entry_type_global;
     p2m->change_entry_type_range = ept_change_entry_type_range;
     p2m->memory_type_changed = ept_memory_type_changed;
+#if P2M_AUDIT
     p2m->audit_p2m = NULL;
+#endif
     p2m->tlb_flush = ept_tlb_flush;
 
     /* Set the memory type used when accessing EPT paging structures. */
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -971,8 +971,8 @@ static int p2m_pt_change_entry_type_rang
     return err;
 }
 
-#if P2M_AUDIT && defined(CONFIG_HVM)
-long p2m_pt_audit_p2m(struct p2m_domain *p2m)
+#if P2M_AUDIT
+static long p2m_pt_audit_p2m(struct p2m_domain *p2m)
 {
     unsigned long entry_count = 0, pmbad = 0;
     unsigned long mfn, gfn, m2pfn;
@@ -1120,8 +1120,6 @@ long p2m_pt_audit_p2m(struct p2m_domain
 
     return pmbad;
 }
-#else
-# define p2m_pt_audit_p2m NULL
 #endif /* P2M_AUDIT */
 
 /* Set up the p2m function pointers for pagetable format */
@@ -1141,8 +1139,6 @@ void p2m_pt_init(struct p2m_domain *p2m)
 
 #if P2M_AUDIT
     p2m->audit_p2m = p2m_pt_audit_p2m;
-#else
-    p2m->audit_p2m = NULL;
 #endif
 }
 
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2435,7 +2435,7 @@ int p2m_altp2m_propagate_change(struct d
 
 /*** Audit ***/
 
-#if P2M_AUDIT && defined(CONFIG_HVM)
+#if P2M_AUDIT
 void audit_p2m(struct domain *d,
                uint64_t *orphans,
                 uint64_t *m2p_bad,
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -31,6 +31,14 @@
 #include <asm/mem_sharing.h>
 #include <asm/page.h>    /* for pagetable_t */
 
+/* Debugging and auditing of the P2M code? */
+#ifndef NDEBUG
+#define P2M_AUDIT     defined(CONFIG_HVM)
+#else
+#define P2M_AUDIT     0
+#endif
+#define P2M_DEBUGGING 0
+
 extern bool_t opt_hap_1gb, opt_hap_2mb;
 
 /*
@@ -268,7 +276,9 @@ struct p2m_domain {
     int                (*write_p2m_entry)(struct p2m_domain *p2m,
                                           unsigned long gfn, l1_pgentry_t *p,
                                           l1_pgentry_t new, unsigned int level);
+#if P2M_AUDIT
     long               (*audit_p2m)(struct p2m_domain *p2m);
+#endif
 
     /*
      * P2M updates may require TLBs to be flushed (invalidated).
@@ -758,14 +768,6 @@ extern void p2m_pt_init(struct p2m_domai
 void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, mfn_t *mfn,
                      p2m_query_t q, uint32_t *pfec);
 
-/* Debugging and auditing of the P2M code? */
-#ifndef NDEBUG
-#define P2M_AUDIT     1
-#else
-#define P2M_AUDIT     0
-#endif
-#define P2M_DEBUGGING 0
-
 #if P2M_AUDIT
 extern void audit_p2m(struct domain *d,
                       uint64_t *orphans,


Re: [PATCH 3/5] x86/p2m: suppress audit_p2m hook when possible
Posted by Roger Pau Monné 3 years, 5 months ago
On Wed, Oct 28, 2020 at 10:23:42AM +0100, Jan Beulich wrote:
> When P2M_AUDIT is false, it's unused, so instead of having a dangling
> NULL pointer sit there, omit the field altogether.
> 
> Instead of adding "#if P2M_AUDIT && defined(CONFIG_HVM)" in even more
> places, fold the latter part right into the definition of P2M_AUDIT.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

Re: [PATCH 3/5] x86/p2m: suppress audit_p2m hook when possible
Posted by Jan Beulich 3 years, 5 months ago
On 10.11.2020 12:30, Roger Pau Monné wrote:
> On Wed, Oct 28, 2020 at 10:23:42AM +0100, Jan Beulich wrote:
>> When P2M_AUDIT is false, it's unused, so instead of having a dangling
>> NULL pointer sit there, omit the field altogether.
>>
>> Instead of adding "#if P2M_AUDIT && defined(CONFIG_HVM)" in even more
>> places, fold the latter part right into the definition of P2M_AUDIT.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks. Since I see you keep replying to v1, are you aware of
there being v2? For this particular patch there actually is a
clang related fix in v2, which may be of interest to you.

Jan

Re: [PATCH 3/5] x86/p2m: suppress audit_p2m hook when possible
Posted by Roger Pau Monné 3 years, 5 months ago
On Tue, Nov 10, 2020 at 02:21:43PM +0100, Jan Beulich wrote:
> On 10.11.2020 12:30, Roger Pau Monné wrote:
> > On Wed, Oct 28, 2020 at 10:23:42AM +0100, Jan Beulich wrote:
> >> When P2M_AUDIT is false, it's unused, so instead of having a dangling
> >> NULL pointer sit there, omit the field altogether.
> >>
> >> Instead of adding "#if P2M_AUDIT && defined(CONFIG_HVM)" in even more
> >> places, fold the latter part right into the definition of P2M_AUDIT.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Thanks. Since I see you keep replying to v1, are you aware of
> there being v2? For this particular patch there actually is a
> clang related fix in v2, which may be of interest to you.

Urg, no sorry. I was just going over my mail queue and didn't realize
there was a v2. Will take a look.

Roger.

[PATCH 4/5] x86/HAP: move nested-P2M flush calculations out of locked region
Posted by Jan Beulich 3 years, 5 months ago
By latching the old MFN into a local variable, these calculations no
longer depend on anything but local variables. Hence the point in time
at which they get performed doesn't matter, so they can be moved past
the locked region.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
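
The pattern, reduced to a self-contained sketch (all names below are
invented for illustration, none are Xen code): latch, under the lock,
whatever the post-unlock decision needs, then make the decision on
local copies only.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared;

static void notify_change(int old, int new)
{
    printf("changed: %d -> %d\n", old, new);   /* stand-in side effect */
}

void update(int new_val)
{
    int old;

    pthread_mutex_lock(&lock);
    old = shared;            /* latch while the lock is held... */
    shared = new_val;
    pthread_mutex_unlock(&lock);

    /* ...so the check can move past the locked region */
    if ( old != new_val )
        notify_change(old, new_val);
}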

--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -780,7 +780,7 @@ hap_write_p2m_entry(struct p2m_domain *p
 {
     struct domain *d = p2m->domain;
     uint32_t old_flags;
-    bool_t flush_nestedp2m = 0;
+    mfn_t omfn;
     int rc;
 
     /* We know always use the host p2m here, regardless if the vcpu
@@ -790,21 +790,11 @@ hap_write_p2m_entry(struct p2m_domain *p
 
     paging_lock(d);
     old_flags = l1e_get_flags(*p);
-
-    if ( nestedhvm_enabled(d) && (old_flags & _PAGE_PRESENT) 
-         && !p2m_get_hostp2m(d)->defer_nested_flush ) {
-        /* We are replacing a valid entry so we need to flush nested p2ms,
-         * unless the only change is an increase in access rights. */
-        mfn_t omfn = l1e_get_mfn(*p);
-        mfn_t nmfn = l1e_get_mfn(new);
-
-        flush_nestedp2m = !(mfn_eq(omfn, nmfn)
-            && perms_strictly_increased(old_flags, l1e_get_flags(new)) );
-    }
+    omfn = l1e_get_mfn(*p);
 
     rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
                           p2m_flags_to_type(old_flags), l1e_get_mfn(new),
-                          l1e_get_mfn(*p), level);
+                          omfn, level);
     if ( rc )
     {
         paging_unlock(d);
@@ -817,7 +807,14 @@ hap_write_p2m_entry(struct p2m_domain *p
 
     paging_unlock(d);
 
-    if ( flush_nestedp2m )
+    if ( nestedhvm_enabled(d) && (old_flags & _PAGE_PRESENT) &&
+         !p2m_get_hostp2m(d)->defer_nested_flush &&
+         /*
+          * We are replacing a valid entry so we need to flush nested p2ms,
+          * unless the only change is an increase in access rights.
+          */
+         (!mfn_eq(omfn, l1e_get_mfn(new)) ||
+          !perms_strictly_increased(old_flags, l1e_get_flags(new))) )
         p2m_flush_nestedp2m(d);
 
     return 0;


Re: [PATCH 4/5] x86/HAP: move nested-P2M flush calculations out of locked region
Posted by Roger Pau Monné 3 years, 5 months ago
On Wed, Oct 28, 2020 at 10:24:12AM +0100, Jan Beulich wrote:
> By latching the old MFN into a local variable, these calculations no
> longer depend on anything but local variables. Hence the point in time
> at which they get performed doesn't matter, so they can be moved past
> the locked region.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

[PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Jan Beulich 3 years, 5 months ago
Fair parts of the present handlers are identical; in fact
nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
common parts right into write_p2m_entry(), splitting the hooks into a
"pre" one (needed just by shadow code) and a "post" one.

For the common parts moved I think that the p2m_flush_nestedp2m() is,
at least from an abstract perspective, also applicable in the shadow
case. Hence it doesn't get a 3rd hook put in place.

The initial comment that was in hap_write_p2m_entry() gets dropped: Its
placement was bogus, and looking back at the commit introducing it
(dd6de3ab9985 "Implement Nested-on-Nested") I can't see what use of a
p2m it was meant to be associated with, either.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: This is effectively the alternative to the suggestion in an earlier
     patch that we might do away with the hook altogether. Of course a
     hybrid approach would also be possible, by using direct calls here
     instead of splitting the hook into two.
---
I'm unsure whether p2m_init_nestedp2m() zapping the "pre" hook is
actually correct, or whether previously the sh_unshadow_for_p2m_change()
invocation was wrongly skipped.
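
For reference, the hook assignments this results in (summarising the
diffs below):

/*
 * p2m instance    write_p2m_entry_pre       write_p2m_entry_post
 * HAP host p2m    NULL                      hap_write_p2m_entry_post
 * shadow          sh_write_p2m_entry_pre    sh_write_p2m_entry_post
 *                                           (NULL without SHOPT_FAST_FAULT_PATH)
 * nested p2m      NULL                      nestedp2m_write_p2m_entry_post
 */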

--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -774,55 +774,18 @@ static void hap_update_paging_modes(stru
     put_gfn(d, cr3_gfn);
 }
 
-static int
-hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
-                    l1_pgentry_t new, unsigned int level)
+static void
+hap_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
 {
     struct domain *d = p2m->domain;
-    uint32_t old_flags;
-    mfn_t omfn;
-    int rc;
 
-    /* We know always use the host p2m here, regardless if the vcpu
-     * is in host or guest mode. The vcpu can be in guest mode by
-     * a hypercall which passes a domain and chooses mostly the first
-     * vcpu. */
-
-    paging_lock(d);
-    old_flags = l1e_get_flags(*p);
-    omfn = l1e_get_mfn(*p);
-
-    rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
-                          p2m_flags_to_type(old_flags), l1e_get_mfn(new),
-                          omfn, level);
-    if ( rc )
-    {
-        paging_unlock(d);
-        return rc;
-    }
-
-    safe_write_pte(p, new);
-    if ( old_flags & _PAGE_PRESENT )
+    if ( oflags & _PAGE_PRESENT )
         guest_flush_tlb_mask(d, d->dirty_cpumask);
-
-    paging_unlock(d);
-
-    if ( nestedhvm_enabled(d) && (old_flags & _PAGE_PRESENT) &&
-         !p2m_get_hostp2m(d)->defer_nested_flush &&
-         /*
-          * We are replacing a valid entry so we need to flush nested p2ms,
-          * unless the only change is an increase in access rights.
-          */
-         (!mfn_eq(omfn, l1e_get_mfn(new)) ||
-          !perms_strictly_increased(old_flags, l1e_get_flags(new))) )
-        p2m_flush_nestedp2m(d);
-
-    return 0;
 }
 
 void hap_p2m_init(struct p2m_domain *p2m)
 {
-    p2m->write_p2m_entry = hap_write_p2m_entry;
+    p2m->write_p2m_entry_post = hap_write_p2m_entry_post;
 }
 
 static unsigned long hap_gva_to_gfn_real_mode(
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -71,24 +71,11 @@
 /*        NESTED VIRT P2M FUNCTIONS         */
 /********************************************/
 
-int
-nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
+void
+nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
 {
-    struct domain *d = p2m->domain;
-    uint32_t old_flags;
-
-    paging_lock(d);
-
-    old_flags = l1e_get_flags(*p);
-    safe_write_pte(p, new);
-
-    if (old_flags & _PAGE_PRESENT)
-        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
-
-    paging_unlock(d);
-
-    return 0;
+    if ( oflags & _PAGE_PRESENT )
+        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);
 }
 
 /********************************************/
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
 {
     struct domain *d = p2m->domain;
     struct vcpu *v = current;
-    int rc = 0;
 
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
     if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
          p2m_is_nestedp2m(p2m) )
-        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
+    {
+        unsigned int oflags;
+        mfn_t omfn;
+        int rc;
+
+        paging_lock(d);
+
+        if ( p2m->write_p2m_entry_pre )
+            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
+
+        oflags = l1e_get_flags(*p);
+        omfn = l1e_get_mfn(*p);
+
+        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
+                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
+                              omfn, level);
+        if ( rc )
+        {
+            paging_unlock(d);
+            return rc;
+        }
+
+        safe_write_pte(p, new);
+
+        if ( p2m->write_p2m_entry_post )
+            p2m->write_p2m_entry_post(p2m, oflags);
+
+        paging_unlock(d);
+
+        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
+             (oflags & _PAGE_PRESENT) &&
+             !p2m_get_hostp2m(d)->defer_nested_flush &&
+             /*
+              * We are replacing a valid entry so we need to flush nested p2ms,
+              * unless the only change is an increase in access rights.
+              */
+             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
+              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
+            p2m_flush_nestedp2m(d);
+    }
     else
         safe_write_pte(p, new);
 
-    return rc;
+    return 0;
 }
 
 // Find the next level's P2M entry, checking for out-of-range gfn's...
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -198,7 +198,8 @@ static int p2m_init_nestedp2m(struct dom
             return -ENOMEM;
         }
         p2m->p2m_class = p2m_nested;
-        p2m->write_p2m_entry = nestedp2m_write_p2m_entry;
+        p2m->write_p2m_entry_pre = NULL;
+        p2m->write_p2m_entry_post = nestedp2m_write_p2m_entry_post;
         list_add(&p2m->np2m_list, &p2m_get_hostp2m(d)->np2m_list);
     }
 
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3137,34 +3137,22 @@ static void sh_unshadow_for_p2m_change(s
     }
 }
 
-static int
-shadow_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-                       l1_pgentry_t *p, l1_pgentry_t new,
-                       unsigned int level)
+static void
+sh_write_p2m_entry_pre(struct domain *d, unsigned long gfn, l1_pgentry_t *p,
+                       l1_pgentry_t new, unsigned int level)
 {
-    struct domain *d = p2m->domain;
-    int rc;
-
-    paging_lock(d);
-
     /* If there are any shadows, update them.  But if shadow_teardown()
      * has already been called then it's not safe to try. */
     if ( likely(d->arch.paging.shadow.total_pages != 0) )
          sh_unshadow_for_p2m_change(d, gfn, p, new, level);
-
-    rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
-                          p2m_flags_to_type(l1e_get_flags(*p)),
-                          l1e_get_mfn(new), l1e_get_mfn(*p), level);
-    if ( rc )
-    {
-        paging_unlock(d);
-        return rc;
-    }
-
-    /* Update the entry with new content */
-    safe_write_pte(p, new);
+}
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_FAST_FAULT_PATH)
+static void
+sh_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
+{
+    struct domain *d = p2m->domain;
+
     /* If we're doing FAST_FAULT_PATH, then shadow mode may have
        cached the fact that this is an mmio region in the shadow
        page tables.  Blow the tables away to remove the cache.
@@ -3176,16 +3164,15 @@ shadow_write_p2m_entry(struct p2m_domain
         shadow_blow_tables(d);
         d->arch.paging.shadow.has_fast_mmio_entries = false;
     }
-#endif
-
-    paging_unlock(d);
-
-    return 0;
 }
+#else
+# define sh_write_p2m_entry_post NULL
+#endif
 
 void shadow_p2m_init(struct p2m_domain *p2m)
 {
-    p2m->write_p2m_entry = shadow_write_p2m_entry;
+    p2m->write_p2m_entry_pre  = sh_write_p2m_entry_pre;
+    p2m->write_p2m_entry_post = sh_write_p2m_entry_post;
 }
 
 /**************************************************************************/
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -272,10 +272,13 @@ struct p2m_domain {
                                                   unsigned long first_gfn,
                                                   unsigned long last_gfn);
     void               (*memory_type_changed)(struct p2m_domain *p2m);
-    
-    int                (*write_p2m_entry)(struct p2m_domain *p2m,
-                                          unsigned long gfn, l1_pgentry_t *p,
-                                          l1_pgentry_t new, unsigned int level);
+    void               (*write_p2m_entry_pre)(struct domain *d,
+                                              unsigned long gfn,
+                                              l1_pgentry_t *p,
+                                              l1_pgentry_t new,
+                                              unsigned int level);
+    void               (*write_p2m_entry_post)(struct p2m_domain *p2m,
+                                               unsigned int oflags);
 #if P2M_AUDIT
     long               (*audit_p2m)(struct p2m_domain *p2m);
 #endif
@@ -472,7 +475,7 @@ void __put_gfn(struct p2m_domain *p2m, u
  *
  * This is also used in the shadow code whenever the paging lock is
  * held -- in those cases, the caller is protected against concurrent
- * p2m updates by the fact that shadow_write_p2m_entry() also takes
+ * p2m updates by the fact that write_p2m_entry() also takes
  * the paging lock.
  *
  * Note that an unlocked accessor only makes sense for a "query" lookup.
@@ -841,8 +844,8 @@ void np2m_flush_base(struct vcpu *v, uns
 void hap_p2m_init(struct p2m_domain *p2m);
 void shadow_p2m_init(struct p2m_domain *p2m);
 
-int nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
+void nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m,
+                                    unsigned int oflags);
 
 /*
  * Alternate p2m: shadow p2m tables used for alternate memory views


Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Tim Deegan 3 years, 5 months ago
At 10:24 +0100 on 28 Oct (1603880693), Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back at the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see what use of a
> p2m it was meant to be associated with, either.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

This seems like a good approach to me.  I'm happy with the shadow
parts but am not confident enough on nested p2m any more to have an
opinion on that side. 

Acked-by: Tim Deegan <tim@xen.org>


Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Roger Pau Monné 3 years, 5 months ago
On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back at the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see what use of a
> p2m it was meant to be associated with, either.

Is there any performance implication of moving from one hook to two
hooks? Since this shouldn't be a hot path I assume it's fine.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: This is effectively the alternative to the suggestion in an earlier
>      patch that we might do away with the hook altogether. Of course a
>      hybrid approach would also be possible, by using direct calls here
>      instead of splitting the hook into two.
> ---
> I'm unsure whether p2m_init_nestedp2m() zapping the "pre" hook is
> actually correct, or whether previously the sh_unshadow_for_p2m_change()
> invocation was wrongly skipped.
> 
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -774,55 +774,18 @@ static void hap_update_paging_modes(stru
>      put_gfn(d, cr3_gfn);
>  }
>  
> -static int
> -hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
> -                    l1_pgentry_t new, unsigned int level)
> +static void
> +hap_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
>  {
>      struct domain *d = p2m->domain;
> -    uint32_t old_flags;
> -    mfn_t omfn;
> -    int rc;
>  
> -    /* We know always use the host p2m here, regardless if the vcpu
> -     * is in host or guest mode. The vcpu can be in guest mode by
> -     * a hypercall which passes a domain and chooses mostly the first
> -     * vcpu. */
> -
> -    paging_lock(d);
> -    old_flags = l1e_get_flags(*p);
> -    omfn = l1e_get_mfn(*p);
> -
> -    rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> -                          p2m_flags_to_type(old_flags), l1e_get_mfn(new),
> -                          omfn, level);
> -    if ( rc )
> -    {
> -        paging_unlock(d);
> -        return rc;
> -    }
> -
> -    safe_write_pte(p, new);
> -    if ( old_flags & _PAGE_PRESENT )
> +    if ( oflags & _PAGE_PRESENT )
>          guest_flush_tlb_mask(d, d->dirty_cpumask);
> -
> -    paging_unlock(d);
> -
> -    if ( nestedhvm_enabled(d) && (old_flags & _PAGE_PRESENT) &&
> -         !p2m_get_hostp2m(d)->defer_nested_flush &&
> -         /*
> -          * We are replacing a valid entry so we need to flush nested p2ms,
> -          * unless the only change is an increase in access rights.
> -          */
> -         (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> -          !perms_strictly_increased(old_flags, l1e_get_flags(new))) )
> -        p2m_flush_nestedp2m(d);
> -
> -    return 0;
>  }
>  
>  void hap_p2m_init(struct p2m_domain *p2m)
>  {
> -    p2m->write_p2m_entry = hap_write_p2m_entry;
> +    p2m->write_p2m_entry_post = hap_write_p2m_entry_post;
>  }
>  
>  static unsigned long hap_gva_to_gfn_real_mode(
> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -71,24 +71,11 @@
>  /*        NESTED VIRT P2M FUNCTIONS         */
>  /********************************************/
>  
> -int
> -nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> -    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
> +void
> +nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
>  {
> -    struct domain *d = p2m->domain;
> -    uint32_t old_flags;
> -
> -    paging_lock(d);
> -
> -    old_flags = l1e_get_flags(*p);
> -    safe_write_pte(p, new);
> -
> -    if (old_flags & _PAGE_PRESENT)
> -        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
> -
> -    paging_unlock(d);
> -
> -    return 0;
> +    if ( oflags & _PAGE_PRESENT )
> +        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);
>  }

This is a verbatim copy of hap_write_p2m_entry_post. I assume there's
a reason why we need both, but I'm missing it.

>  
>  /********************************************/
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>  {
>      struct domain *d = p2m->domain;
>      struct vcpu *v = current;
> -    int rc = 0;
>  
>      if ( v->domain != d )
>          v = d->vcpu ? d->vcpu[0] : NULL;
>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>           p2m_is_nestedp2m(p2m) )
> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
> +    {
> +        unsigned int oflags;
> +        mfn_t omfn;
> +        int rc;
> +
> +        paging_lock(d);
> +
> +        if ( p2m->write_p2m_entry_pre )
> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
> +
> +        oflags = l1e_get_flags(*p);
> +        omfn = l1e_get_mfn(*p);
> +
> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
> +                              omfn, level);
> +        if ( rc )
> +        {
> +            paging_unlock(d);
> +            return rc;
> +        }
> +
> +        safe_write_pte(p, new);
> +
> +        if ( p2m->write_p2m_entry_post )
> +            p2m->write_p2m_entry_post(p2m, oflags);
> +
> +        paging_unlock(d);
> +
> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
> +             (oflags & _PAGE_PRESENT) &&
> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
> +             /*
> +              * We are replacing a valid entry so we need to flush nested p2ms,
> +              * unless the only change is an increase in access rights.
> +              */
> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
> +            p2m_flush_nestedp2m(d);

It feels slightly weird to have a nested p2m post hook, and yet have
nested-specific code here.

Have you considered if the post hook could be moved outside of the
locked region, so that we could put this chunk there in the nested p2m
case?

Thanks, Roger.

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Jan Beulich 3 years, 5 months ago
On 10.11.2020 14:59, Roger Pau Monné wrote:
> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
>> Fair parts of the present handlers are identical; in fact
>> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
>> common parts right into write_p2m_entry(), splitting the hooks into a
>> "pre" one (needed just by shadow code) and a "post" one.
>>
>> For the common parts moved I think that the p2m_flush_nestedp2m() is,
>> at least from an abstract perspective, also applicable in the shadow
>> case. Hence it doesn't get a 3rd hook put in place.
>>
>> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
>> placement was bogus, and looking back at the commit introducing it
>> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
>> of a p2m it was meant to be associated with.
> 
> Is there any performance implication of moving from one hook to two
> hooks? Since this shouldn't be a hot path I assume it's fine.

Well, first of all, just a couple of patches ago two indirect
calls were folded into one, so it's at least not getting worse
compared to where we started from. And then both HAP and nested
install just one of the two hooks.

As per the remark in an earlier patch, referred to ...

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> RFC: This is effectively the alternative to the suggestion in an earlier
>>      patch that we might do away with the hook altogether. Of course a
>>      hybrid approach would also be possible, by using direct calls here
>>      instead of splitting the hook into two.

... here, there is the option of doing away with the indirect
calls altogether, but - as said earlier - at the price of at
least coming close to a layering violation.

>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>> @@ -71,24 +71,11 @@
>>  /*        NESTED VIRT P2M FUNCTIONS         */
>>  /********************************************/
>>  
>> -int
>> -nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>> -    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
>> +void
>> +nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
>>  {
>> -    struct domain *d = p2m->domain;
>> -    uint32_t old_flags;
>> -
>> -    paging_lock(d);
>> -
>> -    old_flags = l1e_get_flags(*p);
>> -    safe_write_pte(p, new);
>> -
>> -    if (old_flags & _PAGE_PRESENT)
>> -        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
>> -
>> -    paging_unlock(d);
>> -
>> -    return 0;
>> +    if ( oflags & _PAGE_PRESENT )
>> +        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);
>>  }
> 
> This is a verbatim copy of hap_write_p2m_entry_post. I assume there's
> a reason why we need both, but I'm missing it.

Only almost, since HAP has

    if ( oflags & _PAGE_PRESENT )
        guest_flush_tlb_mask(d, d->dirty_cpumask);

instead (i.e. they differ in which dirty_cpumask gets used).
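
The nested variant, for comparison (as per the hunk above), uses

    if ( oflags & _PAGE_PRESENT )
        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);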

>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>>  {
>>      struct domain *d = p2m->domain;
>>      struct vcpu *v = current;
>> -    int rc = 0;
>>  
>>      if ( v->domain != d )
>>          v = d->vcpu ? d->vcpu[0] : NULL;
>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>>           p2m_is_nestedp2m(p2m) )
>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
>> +    {
>> +        unsigned int oflags;
>> +        mfn_t omfn;
>> +        int rc;
>> +
>> +        paging_lock(d);
>> +
>> +        if ( p2m->write_p2m_entry_pre )
>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
>> +
>> +        oflags = l1e_get_flags(*p);
>> +        omfn = l1e_get_mfn(*p);
>> +
>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
>> +                              omfn, level);
>> +        if ( rc )
>> +        {
>> +            paging_unlock(d);
>> +            return rc;
>> +        }
>> +
>> +        safe_write_pte(p, new);
>> +
>> +        if ( p2m->write_p2m_entry_post )
>> +            p2m->write_p2m_entry_post(p2m, oflags);
>> +
>> +        paging_unlock(d);
>> +
>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
>> +             (oflags & _PAGE_PRESENT) &&
>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
>> +             /*
>> +              * We are replacing a valid entry so we need to flush nested p2ms,
>> +              * unless the only change is an increase in access rights.
>> +              */
>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
>> +            p2m_flush_nestedp2m(d);
> 
> It feels slightly weird to have a nested p2m post hook, and yet have
> nested-specific code here.
> 
> Have you considered if the post hook could be moved outside of the
> locked region, so that we could put this chunk there in the nested p2m
> case?

Yes, I did, but I don't think the post hook can be moved out. The
only alternative therefore would be a 3rd hook. And this hook would
then need to be installed on the host p2m for nested guests, as
opposed to nestedp2m_write_p2m_entry_post, which gets installed in
the nested p2m-s. As said in the description, the main reason I
decided against a 3rd hook is that I suppose the code here isn't
HAP-specific (while prior to this patch it was).
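
Just to illustrate (hypothetical sketch, not part of the patch - the
name and the extra parameters are made up), such a 3rd hook would come
out roughly as

    /* Installed on the host p2m of nested guests only: */
    static void
    hap_nested_flush_post(struct p2m_domain *p2m, unsigned int oflags,
                          l1_pgentry_t new, mfn_t omfn)
    {
        struct domain *d = p2m->domain;

        /* Same condition as in the common write_p2m_entry() code. */
        if ( (oflags & _PAGE_PRESENT) &&
             !p2m_get_hostp2m(d)->defer_nested_flush &&
             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
            p2m_flush_nestedp2m(d);
    }

with the nestedhvm_enabled() and p2m_is_nestedp2m() checks then being
implicit in where the hook gets installed.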

Jan

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Roger Pau Monné 3 years, 5 months ago
On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
> On 10.11.2020 14:59, Roger Pau Monné wrote:
> > On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> >> --- a/xen/arch/x86/mm/hap/nested_hap.c
> >> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> >> @@ -71,24 +71,11 @@
> >>  /*        NESTED VIRT P2M FUNCTIONS         */
> >>  /********************************************/
> >>  
> >> -int
> >> -nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> >> -    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
> >> +void
> >> +nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
> >>  {
> >> -    struct domain *d = p2m->domain;
> >> -    uint32_t old_flags;
> >> -
> >> -    paging_lock(d);
> >> -
> >> -    old_flags = l1e_get_flags(*p);
> >> -    safe_write_pte(p, new);
> >> -
> >> -    if (old_flags & _PAGE_PRESENT)
> >> -        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
> >> -
> >> -    paging_unlock(d);
> >> -
> >> -    return 0;
> >> +    if ( oflags & _PAGE_PRESENT )
> >> +        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);
> >>  }
> > 
> > This is a verbatim copy of hap_write_p2m_entry_post. I assume there's
> > a reason why we need both, but I'm missing it.
> 
> Only almost, since HAP has
> 
>     if ( oflags & _PAGE_PRESENT )
>         guest_flush_tlb_mask(d, d->dirty_cpumask);
> 
> instead (i.e. they differ in which dirty_cpumask gets used).

Sorry, missed that bit.

> >> --- a/xen/arch/x86/mm/p2m-pt.c
> >> +++ b/xen/arch/x86/mm/p2m-pt.c
> >> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
> >>  {
> >>      struct domain *d = p2m->domain;
> >>      struct vcpu *v = current;
> >> -    int rc = 0;
> >>  
> >>      if ( v->domain != d )
> >>          v = d->vcpu ? d->vcpu[0] : NULL;
> >>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
> >>           p2m_is_nestedp2m(p2m) )
> >> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
> >> +    {
> >> +        unsigned int oflags;
> >> +        mfn_t omfn;
> >> +        int rc;
> >> +
> >> +        paging_lock(d);
> >> +
> >> +        if ( p2m->write_p2m_entry_pre )
> >> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
> >> +
> >> +        oflags = l1e_get_flags(*p);
> >> +        omfn = l1e_get_mfn(*p);
> >> +
> >> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> >> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
> >> +                              omfn, level);
> >> +        if ( rc )
> >> +        {
> >> +            paging_unlock(d);
> >> +            return rc;
> >> +        }
> >> +
> >> +        safe_write_pte(p, new);
> >> +
> >> +        if ( p2m->write_p2m_entry_post )
> >> +            p2m->write_p2m_entry_post(p2m, oflags);
> >> +
> >> +        paging_unlock(d);
> >> +
> >> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
> >> +             (oflags & _PAGE_PRESENT) &&
> >> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
> >> +             /*
> >> +              * We are replacing a valid entry so we need to flush nested p2ms,
> >> +              * unless the only change is an increase in access rights.
> >> +              */
> >> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> >> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
> >> +            p2m_flush_nestedp2m(d);
> > 
> > It feels slightly weird to have a nested p2m post hook, and yet have
> > nested-specific code here.
> > 
> > Have you considered if the post hook could be moved outside of the
> > locked region, so that we could put this chunk there in the nested p2m
> > case?
> 
> Yes, I did, but I don't think the post hook can be moved out. The
> only alternative therefore would be a 3rd hook. And this hook would
> then need to be installed on the host p2m for nested guests, as
> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
> the nested p2m-s. As said in the description, the main reason I
> decided against a 3rd hook is that I suppose the code here isn't
> HAP-specific (while prior to this patch it was).

I'm not convinced the guest TLB flush needs to be performed while
holding the paging lock. The point of such flush is to invalidate any
intermediate guest visible translations that might now be invalid as a
result of the p2m change, but the paging lock doesn't affect the guest
in any way.

It's true that the dirty_cpumask might change, but I think we only
care that when returning from the function there are no stale cache
entries that contain the now invalid translation, and this can be
achieved equally by doing the flush outside of the locked region.
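
In code terms (rough sketch of the HAP case only, with the error
handling omitted) that would amount to

    paging_lock(d);
    oflags = l1e_get_flags(*p);
    /* ... p2m_entry_modify() ... */
    safe_write_pte(p, new);
    paging_unlock(d);

    /* Flush done outside of the locked region: */
    if ( oflags & _PAGE_PRESENT )
        guest_flush_tlb_mask(d, d->dirty_cpumask);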

Roger.

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Jan Beulich 3 years, 5 months ago
On 11.11.2020 13:17, Roger Pau Monné wrote:
> On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
>> On 10.11.2020 14:59, Roger Pau Monné wrote:
>>> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/mm/p2m-pt.c
>>>> +++ b/xen/arch/x86/mm/p2m-pt.c
>>>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>>>>  {
>>>>      struct domain *d = p2m->domain;
>>>>      struct vcpu *v = current;
>>>> -    int rc = 0;
>>>>  
>>>>      if ( v->domain != d )
>>>>          v = d->vcpu ? d->vcpu[0] : NULL;
>>>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>>>>           p2m_is_nestedp2m(p2m) )
>>>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
>>>> +    {
>>>> +        unsigned int oflags;
>>>> +        mfn_t omfn;
>>>> +        int rc;
>>>> +
>>>> +        paging_lock(d);
>>>> +
>>>> +        if ( p2m->write_p2m_entry_pre )
>>>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
>>>> +
>>>> +        oflags = l1e_get_flags(*p);
>>>> +        omfn = l1e_get_mfn(*p);
>>>> +
>>>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
>>>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
>>>> +                              omfn, level);
>>>> +        if ( rc )
>>>> +        {
>>>> +            paging_unlock(d);
>>>> +            return rc;
>>>> +        }
>>>> +
>>>> +        safe_write_pte(p, new);
>>>> +
>>>> +        if ( p2m->write_p2m_entry_post )
>>>> +            p2m->write_p2m_entry_post(p2m, oflags);
>>>> +
>>>> +        paging_unlock(d);
>>>> +
>>>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
>>>> +             (oflags & _PAGE_PRESENT) &&
>>>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
>>>> +             /*
>>>> +              * We are replacing a valid entry so we need to flush nested p2ms,
>>>> +              * unless the only change is an increase in access rights.
>>>> +              */
>>>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
>>>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
>>>> +            p2m_flush_nestedp2m(d);
>>>
>>> It feel slightly weird to have a nested p2m hook post, and yet have
>>> nested specific code here.
>>>
>>> Have you considered if the post hook could be moved outside of the
>>> locked region, so that we could put this chunk there in the nested p2m
>>> case?
>>
>> Yes, I did, but I don't think the post hook can be moved out. The
>> only alternative therefore would be a 3rd hook. And this hook would
>> then need to be installed on the host p2m for nested guests, as
>> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
>> the nested p2m-s. As said in the description, the main reason I
>> decided against a 3rd hook is that I suppose the code here isn't
>> HAP-specific (while prior to this patch it was).
> 
> I'm not convinced the guest TLB flush needs to be performed while
> holding the paging lock. The point of such flush is to invalidate any
> intermediate guest visible translations that might now be invalid as a
> result of the p2m change, but the paging lock doesn't affect the guest
> in any way.
> 
> It's true that the dirty_cpumask might change, but I think we only
> care that when returning from the function there are no stale cache
> entries that contain the now invalid translation, and this can be
> achieved equally by doing the flush outside of the locked region.

I agree with all this. If only it was merely about TLB flushes. In
the shadow case, shadow_blow_all_tables() gets invoked, and that
one - looking at the other call sites - wants the paging lock held.

Additionally moving the stuff out of the locked region wouldn't
allow us to achieve the goal of moving the nested flush into the
hook, unless we introduced further hook handlers to be installed
on the host p2m-s of nested guests.

Jan

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Roger Pau Monné 3 years, 5 months ago
On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
> On 11.11.2020 13:17, Roger Pau Monné wrote:
> > On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
> >> On 10.11.2020 14:59, Roger Pau Monné wrote:
> >>> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> >>>> --- a/xen/arch/x86/mm/p2m-pt.c
> >>>> +++ b/xen/arch/x86/mm/p2m-pt.c
> >>>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
> >>>>  {
> >>>>      struct domain *d = p2m->domain;
> >>>>      struct vcpu *v = current;
> >>>> -    int rc = 0;
> >>>>  
> >>>>      if ( v->domain != d )
> >>>>          v = d->vcpu ? d->vcpu[0] : NULL;
> >>>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
> >>>>           p2m_is_nestedp2m(p2m) )
> >>>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
> >>>> +    {
> >>>> +        unsigned int oflags;
> >>>> +        mfn_t omfn;
> >>>> +        int rc;
> >>>> +
> >>>> +        paging_lock(d);
> >>>> +
> >>>> +        if ( p2m->write_p2m_entry_pre )
> >>>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
> >>>> +
> >>>> +        oflags = l1e_get_flags(*p);
> >>>> +        omfn = l1e_get_mfn(*p);
> >>>> +
> >>>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> >>>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
> >>>> +                              omfn, level);
> >>>> +        if ( rc )
> >>>> +        {
> >>>> +            paging_unlock(d);
> >>>> +            return rc;
> >>>> +        }
> >>>> +
> >>>> +        safe_write_pte(p, new);
> >>>> +
> >>>> +        if ( p2m->write_p2m_entry_post )
> >>>> +            p2m->write_p2m_entry_post(p2m, oflags);
> >>>> +
> >>>> +        paging_unlock(d);
> >>>> +
> >>>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
> >>>> +             (oflags & _PAGE_PRESENT) &&
> >>>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
> >>>> +             /*
> >>>> +              * We are replacing a valid entry so we need to flush nested p2ms,
> >>>> +              * unless the only change is an increase in access rights.
> >>>> +              */
> >>>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> >>>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
> >>>> +            p2m_flush_nestedp2m(d);
> >>>
> >>> It feels slightly weird to have a nested p2m post hook, and yet have
> >>> nested-specific code here.
> >>>
> >>> Have you considered if the post hook could be moved outside of the
> >>> locked region, so that we could put this chunk there in the nested p2m
> >>> case?
> >>
> >> Yes, I did, but I don't think the post hook can be moved out. The
> >> only alternative therefore would be a 3rd hook. And this hook would
> >> then need to be installed on the host p2m for nested guests, as
> >> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
> >> the nested p2m-s. As said in the description, the main reason I
> >> decided against a 3rd hook is that I suppose the code here isn't
> >> HAP-specific (while prior to this patch it was).
> > 
> > I'm not convinced the guest TLB flush needs to be performed while
> > holding the paging lock. The point of such flush is to invalidate any
> > intermediate guest visible translations that might now be invalid as a
> > result of the p2m change, but the paging lock doesn't affect the guest
> > in any way.
> > 
> > It's true that the dirty_cpumask might change, but I think we only
> > care that when returning from the function there are no stale cache
> > entries that contain the now invalid translation, and this can be
> > achieved equally by doing the flush outside of the locked region.
> 
> I agree with all this. If only it was merely about TLB flushes. In
> the shadow case, shadow_blow_all_tables() gets invoked, and that
> one - looking at the other call sites - wants the paging lock held.

You got me confused here, I think you meant shadow_blow_tables?

The post hook for shadow could pick the lock again, as I don't think
the removal of the tables needs to be strictly done inside of the same
locked region?

Something to consider from a performance PoV.
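
I.e. something along these lines (sketch only; the sh_ name is made
up):

    /* Invoked after paging_unlock(), re-taking the lock itself: */
    static void
    sh_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
    {
        struct domain *d = p2m->domain;

        paging_lock(d);
        shadow_blow_tables(d);
        paging_unlock(d);
    }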

> Additionally moving the stuff out of the locked region wouldn't
> allow us to achieve the goal of moving the nested flush into the
> hook, unless we introduced further hook handlers to be installed
> on the host p2m-s of nested guests.

Right, or else we would need to add that chunk in the
non-nestedp2m hook as well?

Maybe you could join both the nested and non-nested hooks and use a
different dirty bitmap for the flush?

Thanks, Roger.

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Jan Beulich 3 years, 5 months ago
On 12.11.2020 14:07, Roger Pau Monné wrote:
> On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
>> On 11.11.2020 13:17, Roger Pau Monné wrote:
>>> On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
>>>> On 10.11.2020 14:59, Roger Pau Monné wrote:
>>>>> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
>>>>>> --- a/xen/arch/x86/mm/p2m-pt.c
>>>>>> +++ b/xen/arch/x86/mm/p2m-pt.c
>>>>>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>>>>>>  {
>>>>>>      struct domain *d = p2m->domain;
>>>>>>      struct vcpu *v = current;
>>>>>> -    int rc = 0;
>>>>>>  
>>>>>>      if ( v->domain != d )
>>>>>>          v = d->vcpu ? d->vcpu[0] : NULL;
>>>>>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>>>>>>           p2m_is_nestedp2m(p2m) )
>>>>>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
>>>>>> +    {
>>>>>> +        unsigned int oflags;
>>>>>> +        mfn_t omfn;
>>>>>> +        int rc;
>>>>>> +
>>>>>> +        paging_lock(d);
>>>>>> +
>>>>>> +        if ( p2m->write_p2m_entry_pre )
>>>>>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
>>>>>> +
>>>>>> +        oflags = l1e_get_flags(*p);
>>>>>> +        omfn = l1e_get_mfn(*p);
>>>>>> +
>>>>>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
>>>>>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
>>>>>> +                              omfn, level);
>>>>>> +        if ( rc )
>>>>>> +        {
>>>>>> +            paging_unlock(d);
>>>>>> +            return rc;
>>>>>> +        }
>>>>>> +
>>>>>> +        safe_write_pte(p, new);
>>>>>> +
>>>>>> +        if ( p2m->write_p2m_entry_post )
>>>>>> +            p2m->write_p2m_entry_post(p2m, oflags);
>>>>>> +
>>>>>> +        paging_unlock(d);
>>>>>> +
>>>>>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
>>>>>> +             (oflags & _PAGE_PRESENT) &&
>>>>>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
>>>>>> +             /*
>>>>>> +              * We are replacing a valid entry so we need to flush nested p2ms,
>>>>>> +              * unless the only change is an increase in access rights.
>>>>>> +              */
>>>>>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
>>>>>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
>>>>>> +            p2m_flush_nestedp2m(d);
>>>>>
>>>>> It feels slightly weird to have a nested p2m post hook, and yet have
>>>>> nested-specific code here.
>>>>>
>>>>> Have you considered if the post hook could be moved outside of the
>>>>> locked region, so that we could put this chunk there in the nested p2m
>>>>> case?
>>>>
>>>> Yes, I did, but I don't think the post hook can be moved out. The
>>>> only alternative therefore would be a 3rd hook. And this hook would
>>>> then need to be installed on the host p2m for nested guests, as
>>>> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
>>>> the nested p2m-s. As said in the description, the main reason I
>>>> decided against a 3rd hook is that I suppose the code here isn't
>>>> HAP-specific (while prior to this patch it was).
>>>
>>> I'm not convinced the guest TLB flush needs to be performed while
>>> holding the paging lock. The point of such flush is to invalidate any
>>> intermediate guest visible translations that might now be invalid as a
>>> result of the p2m change, but the paging lock doesn't affect the guest
>>> in any way.
>>>
>>> It's true that the dirty_cpumask might change, but I think we only
>>> care that when returning from the function there are no stale cache
>>> entries that contain the now invalid translation, and this can be
>>> achieved equally by doing the flush outside of the locked region.
>>
>> I agree with all this. If only it was merely about TLB flushes. In
>> the shadow case, shadow_blow_all_tables() gets invoked, and that
>> one - looking at the other call sites - wants the paging lock held.
> 
> You got me confused here, I think you meant shadow_blow_tables?

Oh, yes, sorry - copy-and-paste from the wrong source.

> The post hook for shadow could pick the lock again, as I don't think
> the removal of the tables needs to be strictly done inside of the same
> locked region?

I think it does, or else a use of the now stale tables may occur
before they get blown away. Tim?

> Something to consider from a performance PoV.
> 
>> Additionally moving the stuff out of the locked region wouldn't
>> allow us to achieve the goal of moving the nested flush into the
>> hook, unless we introduced further hook handlers to be installed
>> on the host p2m-s of nested guests.
> 
> Right, or else we would need to add that chunk in the
> non-nestedp2m hook as well?

I'm a little confused by the question: If we wanted to move this
into the hook functions, it would need to be both hap's and
shadow's, i.e. _only_ the non-nested ones. IOW it could then
stay in hap's and be duplicated into shadow's. Avoiding the
duplication _and_ keeping it outside the locked region is why I
moved it into the common logic (provided, of course, I'm right
with my understanding of it also being needed in the shadow
case; else it could stay in the hap function alone), of which
the 2nd aspect would go away if the hook invocation itself lived
outside the locked region. But the duplication of this would
still concern me ...

> Maybe you could join both the nested and non-nested hooks and use a
> different dirty bitmap for the flush?

What would this gain us? Extra conditionals in a hook, when the
hook exists (indirectly) to avoid having endless conditionals in the
common logic?
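
That is, a joined hook would presumably come out as (hypothetical
sketch)

    static void
    write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
    {
        if ( oflags & _PAGE_PRESENT )
            /* This conditional is the one being objected to: */
            guest_flush_tlb_mask(p2m->domain,
                                 p2m_is_nestedp2m(p2m)
                                 ? p2m->dirty_cpumask
                                 : p2m->domain->dirty_cpumask);
    }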

Jan

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Tim Deegan 3 years, 5 months ago
At 15:04 +0100 on 12 Nov (1605193496), Jan Beulich wrote:
> On 12.11.2020 14:07, Roger Pau Monné wrote:
> > On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
> >> I agree with all this. If only it was merely about TLB flushes. In
> >> the shadow case, shadow_blow_all_tables() gets invoked, and that
> >> one - looking at the other call sites - wants the paging lock held.
[...]
> > The post hook for shadow could pick the lock again, as I don't think
> > the removal of the tables needs to be strictly done inside of the same
> > locked region?
> 
> I think it does, or else a use of the now stale tables may occur
> before they got blown away. Tim?

Is this the call to shadow_blow_tables() in the write_p2m_entry path?
I think it would be safe to drop and re-take the paging lock there as
long as the call happens before the write is considered to have
finished.

But it would not be a useful performance improvement - any update that
takes this path is going to be very slow regardless.  So unless you
have another pressing reason to split it up, I would be inclined to
leave it as it is.  That way it's easier to see that the locking is
correct.

Cheers,

Tim.

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Jan Beulich 3 years, 5 months ago
On 12.11.2020 18:52, Tim Deegan wrote:
> At 15:04 +0100 on 12 Nov (1605193496), Jan Beulich wrote:
>> On 12.11.2020 14:07, Roger Pau Monné wrote:
>>> On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
>>>> I agree with all this. If only it was merely about TLB flushes. In
>>>> the shadow case, shadow_blow_all_tables() gets invoked, and that
>>>> one - looking at the other call sites - wants the paging lock held.
> [...]
>>> The post hook for shadow could pick the lock again, as I don't think
>>> the removal of the tables needs to be strictly done inside of the same
>>> locked region?
>>
> >> I think it does, or else a use of the now stale tables may occur
> >> before they get blown away. Tim?
> 
> Is this the call to shadow_blow_tables() in the write_p2m_entry path?

Yes.

> I think it would be safe to drop and re-take the paging lock there as
> long as the call happens before the write is considered to have
> finished.
> 
> But it would not be a useful performance improvement - any update that
> takes this path is going to be very slow regardless.  So unless you
> have another pressing reason to split it up, I would be inclined to
> leave it as it is.  That way it's easier to see that the locking is
> correct.

Thanks for the clarification.

Roger - what's your view at this point?

Jan

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Roger Pau Monné 3 years, 5 months ago
On Fri, Nov 13, 2020 at 10:52:58AM +0100, Jan Beulich wrote:
> On 12.11.2020 18:52, Tim Deegan wrote:
> > At 15:04 +0100 on 12 Nov (1605193496), Jan Beulich wrote:
> >> On 12.11.2020 14:07, Roger Pau Monné wrote:
> >>> On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
> >>>> I agree with all this. If only it was merely about TLB flushes. In
> >>>> the shadow case, shadow_blow_all_tables() gets invoked, and that
> >>>> one - looking at the other call sites - wants the paging lock held.
> > [...]
> >>> The post hook for shadow could pick the lock again, as I don't think
> >>> the removal of the tables needs to be strictly done inside of the same
> >>> locked region?
> >>
> >> I think it does, or else a use of the now stale tables may occur
> >> before they get blown away. Tim?
> > 
> > Is this the call to shadow_blow_tables() in the write_p2m_entry path?
> 
> Yes.
> 
> > I think it would be safe to drop and re-take the paging lock there as
> > long as the call happens before the write is considered to have
> > finished.
> > 
> > But it would not be a useful performance improvement - any update that
> > takes this path is going to be very slow regardless.  So unless you
> > have another pressing reason to split it up, I would be inclined to
> > leave it as it is.  That way it's easier to see that the locking is
> > correct.
> 
> Thanks for the clarification.
> 
> Roger - what's your view at this point?

So my main concern was not really about making this path faster -
after all this shouldn't be a hot path anyway - but rather about
reducing the region protected by the paging lock, since it's a global
lock that's quite contended. In the HAP case we could move the flush
outside of the locked region, thus reducing the time the lock is held.

Anyway, seeing there's not much consensus on this aspect, leaving it
as-is is no worse than what's currently in there.

Roger.

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
Posted by Roger Pau Monné 3 years, 5 months ago
On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back at the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
> of a p2m it was meant to be associated with.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.