[PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Andrew Cooper 3 years, 6 months ago
With MMU_UPDATE, a PV guest can make changes to higher level pagetables.  This
is safe from Xen's point of view (as the update only affects guest mappings),
and the guest is required to flush suitably after making updates.

However, Xen's use of linear pagetables (UPDATE_VA_MAPPING, GNTTABOP_map,
writeable pagetables, etc.) is an implementation detail outside of the
API/ABI.

Changes in the paging structure require invalidations in the linear pagetable
range for subsequent accesses into the linear pagetables to access non-stale
mappings.  Xen must provide suitable flushing to prevent intermixed guest
actions from accidentally accessing/modifying the wrong pagetable.

For all L2 and higher modifications, flush the full TLB.  (This could in
principle be an order 39 flush starting at LINEAR_PT_VIRT_START, but no such
mechanism exists in practice.)

As this combines with sync_guest for XPTI L4 "shadowing", replace the
sync_guest boolean with flush_flags and accumulate flags.  The sync_guest case
now always needs to flush, there is no point trying to exclude the current CPU
from the flush mask.  Use pt_owner->dirty_cpumask directly.

This is XSA-286.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

A couple of minor points.

 * PV guests can create global mappings.  I can't see any safe way to relax
   FLUSH_TLB_GLOBAL to just FLUSH_TLB.

 * Performance tests are still ongoing, but so far this is faring better than
   the embargoed alternative.
---
 xen/arch/x86/mm.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 918ee2bbe3..a6a7fcb56c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3883,11 +3883,10 @@ long do_mmu_update(
     void *va = NULL;
     unsigned long gpfn, gmfn;
     struct page_info *page;
-    unsigned int cmd, i = 0, done = 0, pt_dom;
+    unsigned int cmd, i = 0, done = 0, pt_dom, flush_flags = 0;
     struct vcpu *curr = current, *v = curr;
     struct domain *d = v->domain, *pt_owner = d, *pg_owner;
     mfn_t map_mfn = INVALID_MFN, mfn;
-    bool sync_guest = false;
     uint32_t xsm_needed = 0;
     uint32_t xsm_checked = 0;
     int rc = put_old_guest_table(curr);
@@ -4037,6 +4036,8 @@ long do_mmu_update(
                         break;
                     rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    if ( !rc )
+                        flush_flags |= FLUSH_TLB_GLOBAL;
                     break;
 
                 case PGT_l3_page_table:
@@ -4044,6 +4045,8 @@ long do_mmu_update(
                         break;
                     rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    if ( !rc )
+                        flush_flags |= FLUSH_TLB_GLOBAL;
                     break;
 
                 case PGT_l4_page_table:
@@ -4051,6 +4054,8 @@ long do_mmu_update(
                         break;
                     rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    if ( !rc )
+                        flush_flags |= FLUSH_TLB_GLOBAL;
                     if ( !rc && pt_owner->arch.pv.xpti )
                     {
                         bool local_in_use = false;
@@ -4071,7 +4076,7 @@ long do_mmu_update(
                              (1 + !!(page->u.inuse.type_info & PGT_pinned) +
                               mfn_eq(pagetable_get_mfn(curr->arch.guest_table_user),
                                      mfn) + local_in_use) )
-                            sync_guest = true;
+                            flush_flags |= FLUSH_ROOT_PGTBL;
                     }
                     break;
 
@@ -4173,19 +4178,13 @@ long do_mmu_update(
     if ( va )
         unmap_domain_page(va);
 
-    if ( sync_guest )
-    {
-        /*
-         * Force other vCPU-s of the affected guest to pick up L4 entry
-         * changes (if any).
-         */
-        unsigned int cpu = smp_processor_id();
-        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
-
-        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
-        if ( !cpumask_empty(mask) )
-            flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
-    }
+    /*
+     * Flush TLBs if an L2 or higher was changed (invalidates the structure of
+     * the linear pagetables), or an L4 in use by other CPUs was made (needs
+     * to resync the XPTI copy of the table).
+     */
+    if ( flush_flags )
+        flush_mask(pt_owner->dirty_cpumask, flush_flags);
 
     perfc_add(num_page_updates, i);
 
-- 
2.11.0


Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Andrew Cooper 3 years, 6 months ago
On 20/10/2020 16:24, Andrew Cooper wrote:
> [...]
> A couple of minor points.
>
>  * PV guests can create global mappings.  I can't reason any safe way to relax
>    FLUSH_TLB_GLOBAL to just FLUSH_TLB.

Sorry - forgot one of the points here.

We could in principle relax the flush entirely if we know that we're editing
from a not-present to a present entry, but plumbing this up through
mod_l?_entry() isn't trivial, and it's also not obvious how much of an
optimisation it would be in practice.

~Andrew



Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Jan Beulich 3 years, 6 months ago
On 20.10.2020 17:24, Andrew Cooper wrote:
> A couple of minor points.
> 
>  * PV guests can create global mappings.  I can't reason any safe way to relax
>    FLUSH_TLB_GLOBAL to just FLUSH_TLB.

We only care about the guest's view here, and from that view we only
care about non-leaf entries. Non-leaf entries can't be global,
and luckily (for now at least) the G bit also doesn't get a
different meaning assigned in non-leaf entries.

Jan

Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Jan Beulich 3 years, 6 months ago
On 20.10.2020 17:24, Andrew Cooper wrote:
> [...]
> As this combines with sync_guest for XPTI L4 "shadowing", replace the
> sync_guest boolean with flush_flags and accumulate flags.  The sync_guest case
> now always needs to flush, there is no point trying to exclude the current CPU
> from the flush mask.  Use pt_owner->dirty_cpumask directly.

Why is there no point? There's no need for the FLUSH_ROOT_PGTBL
part of the flushing on the local CPU. The draft you had sent
earlier looked better in this regard.

Jan

Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Andrew Cooper 3 years, 6 months ago
On 20/10/2020 16:48, Jan Beulich wrote:
> On 20.10.2020 17:24, Andrew Cooper wrote:
>> [...]
>> As this combines with sync_guest for XPTI L4 "shadowing", replace the
>> sync_guest boolean with flush_flags and accumulate flags.  The sync_guest case
>> now always needs to flush, there is no point trying to exclude the current CPU
>> from the flush mask.  Use pt_owner->dirty_cpumask directly.
> Why is there no point? There's no need for the FLUSH_ROOT_PGTBL
> part of the flushing on the local CPU. The draft you had sent
> earlier looked better in this regard.

This was the area which broke.  It is to do with a subtle difference in
the scope of L4 updates.

ROOT_PGTBL needs to resync current (if in use), and be broadcast if
other references to the pages are found.

The TLB flush needs to be broadcast to the whole domain dirty mask, as
we can't (easily) know if the update was part of the current structure.

~Andrew

Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Andrew Cooper 3 years, 6 months ago
On 20/10/2020 17:20, Andrew Cooper wrote:
> On 20/10/2020 16:48, Jan Beulich wrote:
>> On 20.10.2020 17:24, Andrew Cooper wrote:
>>> [...]
>> Why is there no point? There's no need for the FLUSH_ROOT_PGTBL
>> part of the flushing on the local CPU. The draft you had sent
>> earlier looked better in this regard.
> This was the area which broke.  It is to do with subtle difference in
> the scope of L4 updates.
>
> ROOT_PGTBL needs to resync current (if in use), and be broadcasted if
> other references to the pages are found.
>
> The TLB flush needs to be broadcast to the whole domain dirty mask, as
> we can't (easily) know if the update was part of the current structure.

Actually - we can know whether an L4 update needs flushing locally or
not, in exactly the same way as the sync logic currently works.

However, unlike the opencoded get_cpu_info()->root_pgt_changed = true,
we can't just flush locally for free.

This is quite awkward to express.

~Andrew

Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Andrew Cooper 3 years, 6 months ago
On 20/10/2020 18:10, Andrew Cooper wrote:
> On 20/10/2020 17:20, Andrew Cooper wrote:
>> On 20/10/2020 16:48, Jan Beulich wrote:
>>> On 20.10.2020 17:24, Andrew Cooper wrote:
>>>> [...]
>>> Why is there no point? There's no need for the FLUSH_ROOT_PGTBL
>>> part of the flushing on the local CPU. The draft you had sent
>>> earlier looked better in this regard.
>> This was the area which broke.  It is to do with subtle difference in
>> the scope of L4 updates.
>>
>> ROOT_PGTBL needs to resync current (if in use), and be broadcasted if
>> other references to the pages are found.
>>
>> The TLB flush needs to be broadcast to the whole domain dirty mask, as
>> we can't (easily) know if the update was part of the current structure.
> Actually - we can know whether an L4 update needs flushing locally or
> not, in exactly the same way as the sync logic currently works.
>
> However, unlike the opencoded get_cpu_info()->root_pgt_changed = true,
> we can't just flush locally for free.
>
> This is quite awkward to express.

And not safe.  Flushes may accumulate from multiple levels in a batch,
and pt_owner may not be equal to current.

I stand by the version submitted as the security fix.

~Andrew

Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Jan Beulich 3 years, 6 months ago
On 20.10.2020 20:46, Andrew Cooper wrote:
> On 20/10/2020 18:10, Andrew Cooper wrote:
>> On 20/10/2020 17:20, Andrew Cooper wrote:
>>> On 20/10/2020 16:48, Jan Beulich wrote:
>>>> On 20.10.2020 17:24, Andrew Cooper wrote:
>>>>> [...]
>>>> Why is there no point? There's no need for the FLUSH_ROOT_PGTBL
>>>> part of the flushing on the local CPU. The draft you had sent
>>>> earlier looked better in this regard.
>>> This was the area which broke.  It is to do with subtle difference in
>>> the scope of L4 updates.
>>>
>>> ROOT_PGTBL needs to resync current (if in use), and be broadcasted if
>>> other references to the pages are found.
>>>
>>> The TLB flush needs to be broadcast to the whole domain dirty mask, as
>>> we can't (easily) know if the update was part of the current structure.
>> Actually - we can know whether an L4 update needs flushing locally or
>> not, in exactly the same way as the sync logic currently works.
>>
>> However, unlike the opencoded get_cpu_info()->root_pgt_changed = true,
>> we can't just flush locally for free.
>>
>> This is quite awkward to express.
> 
> And not safe.  Flushes may accumulate from multiple levels in a batch,
> and pt_owner may not be equal to current.

I'm not questioning the TLB flush - this needs to be the scope you
use (but just FLUSH_TLB as per my earlier reply). I'm questioning
the extra ROOT_PGTBL sync (meaning changes to levels other than L4
don't matter), which is redundant with the explicit setting right
after the call to mod_l4_entry(). But I guess since now you need
to issue _some_ flush_mask() for the local CPU anyway, perhaps
it's rather the explicit setting of ->root_pgt_changed which wants
dropping?

(If pt_owner != current->domain, then pt_owner->dirty_cpumask
can't have smp_processor_id()'s bit set, and hence there was no
reduction in scope in this case anyway. Similarly in this case
local_in_use is necessarily false, as page tables can't be
shared between domains.)

Taking both adjustments together:
- all L[234] changes require FLUSH_TLB on dirty CPUs of
  pt_owner including the local CPU
- the converted sync_guest continues to require
  FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL on remote dirty CPUs of
  pt_owner
This difference, I think, still warrants treating the local CPU
specially, as the global flush continues to be unnecessary there.
Whether the local CPU's ->root_pgt_changed gets set via
flush_local() or explicitly is then a pretty benign secondary
aspect.

Jan

Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Andrew Cooper 3 years, 6 months ago
On 21/10/2020 07:55, Jan Beulich wrote:
> On 20.10.2020 20:46, Andrew Cooper wrote:
>> On 20/10/2020 18:10, Andrew Cooper wrote:
>>> On 20/10/2020 17:20, Andrew Cooper wrote:
>>>> On 20/10/2020 16:48, Jan Beulich wrote:
>>>>> On 20.10.2020 17:24, Andrew Cooper wrote:
>>>>>> [...]
>>>>> Why is there no point? There's no need for the FLUSH_ROOT_PGTBL
>>>>> part of the flushing on the local CPU. The draft you had sent
>>>>> earlier looked better in this regard.
>>>> This was the area which broke.  It is to do with subtle difference in
>>>> the scope of L4 updates.
>>>>
>>>> ROOT_PGTBL needs to resync current (if in use), and be broadcasted if
>>>> other references to the pages are found.
>>>>
>>>> The TLB flush needs to be broadcast to the whole domain dirty mask, as
>>>> we can't (easily) know if the update was part of the current structure.
>>> Actually - we can know whether an L4 update needs flushing locally or
>>> not, in exactly the same way as the sync logic currently works.
>>>
>>> However, unlike the opencoded get_cpu_info()->root_pgt_changed = true,
>>> we can't just flush locally for free.
>>>
>>> This is quite awkward to express.
>> And not safe.  Flushes may accumulate from multiple levels in a batch,
>> and pt_owner may not be equal to current.
> I'm not questioning the TLB flush - this needs to be the scope you
> use (but just FLUSH_TLB as per my earlier reply). I'm questioning
> the extra ROOT_PGTBL sync (meaning changes to levels other than L4
> don't matter), which is redundant with the explicit setting right
> after the call to mod_l4_entry(). But I guess since now you need
> to issue _some_ flush_mask() for the local CPU anyway, perhaps
> it's rather the explicit setting of ->root_pgt_changed which wants
> dropping?

No.  That was the delta which delayed posting in the first place.  Dom0
crashes very early without it.

The problem, as I said, is the asymmetry.

As dom0 is booting, the "only one use of this L4" logic kicks in, and
skips setting ROOT_PGTBL, which then causes the flush_mask() not to
flush on the local CPU either.

~Andrew

Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Posted by Jan Beulich 3 years, 6 months ago
On 21.10.2020 12:01, Andrew Cooper wrote:
> On 21/10/2020 07:55, Jan Beulich wrote:
>> On 20.10.2020 20:46, Andrew Cooper wrote:
>>> On 20/10/2020 18:10, Andrew Cooper wrote:
>>>> On 20/10/2020 17:20, Andrew Cooper wrote:
>>>>> On 20/10/2020 16:48, Jan Beulich wrote:
>>>>>> On 20.10.2020 17:24, Andrew Cooper wrote:
>>>>>>> [...]
>>>>>> Why is there no point? There's no need for the FLUSH_ROOT_PGTBL
>>>>>> part of the flushing on the local CPU. The draft you had sent
>>>>>> earlier looked better in this regard.
>>>>> This was the area which broke.  It is to do with subtle difference in
>>>>> the scope of L4 updates.
>>>>>
>>>>> ROOT_PGTBL needs to resync current (if in use), and be broadcasted if
>>>>> other references to the pages are found.
>>>>>
>>>>> The TLB flush needs to be broadcast to the whole domain dirty mask, as
>>>>> we can't (easily) know if the update was part of the current structure.
>>>> Actually - we can know whether an L4 update needs flushing locally or
>>>> not, in exactly the same way as the sync logic currently works.
>>>>
>>>> However, unlike the opencoded get_cpu_info()->root_pgt_changed = true,
>>>> we can't just flush locally for free.
>>>>
>>>> This is quite awkward to express.
>>> And not safe.  Flushes may accumulate from multiple levels in a batch,
>>> and pt_owner may not be equal to current.
>> I'm not questioning the TLB flush - this needs to be the scope you
>> use (but just FLUSH_TLB as per my earlier reply). I'm questioning
>> the extra ROOT_PGTBL sync (meaning changes to levels other than L4
>> don't matter), which is redundant with the explicit setting right
>> after the call to mod_l4_entry(). But I guess since now you need
>> to issue _some_ flush_mask() for the local CPU anyway, perhaps
>> it's rather the explicit setting of ->root_pgt_changed which wants
>> dropping?
> 
> No.  That was the delta which delayed posting in the first place.  Dom0
> crashes very early without it.
> 
> The problem, as I said, is the asymmetry.
> 
> As dom0 is booting, the "only one use of this L4" logic kicks in, and
> skips setting ROOT_PGTBL, which then causes the flush_mask() not to
> flush on the local CPU either.

Ah, I think I finally got it. This asymmetry wants expressing then
in two different sets of flush flags (not sure whether two different
variables are needed), since - as per other replies - the local CPU
has different requirements anyway.

- all CPUs need FLUSH_TLB
- the local CPU may additionally need FLUSH_ROOT_PGTBL
- other CPUs may additionally (or instead, if you like) need
  FLUSH_ROOT_PGTBL | FLUSH_TLB_GLOBAL

But then of course the local CPU can as well have its
->root_pgt_changed updated directly - there's no difference whether
it gets done this way or by passing FLUSH_ROOT_PGTBL to flush_local().

Jan