[PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Roger Pau Monne 3 years, 9 months ago
Commit e9aca9470ed86 introduced a regression when avoiding sending
IPIs for certain flush operations. Xen's page fault handler
(spurious_page_fault) relies on blocking interrupts in order to
prevent TLB flush IPIs from being handled, and thus to prevent other
CPUs from removing page table pages. Switching to assisted flushing
avoided such IPIs, and can thus result in page table pages being
removed (and possibly re-used) while __page_fault_type is being
executed.

Force some of the TLB flushes to use IPIs, thus avoiding the assisted
TLB flush. The selected flushes are the page type change (when
switching from a page table type to a different one, i.e. a page that
has been removed as a page table) and page allocation. This sadly has
a negative performance impact on the pvshim, as fewer assisted flushes
can be used.

Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
using an IPI (flush_tlb_mask_sync). Note that the flag is only
meaningfully defined when the hypervisor supports PV mode, as
otherwise translated domains are in charge of their page tables and
won't share page tables with Xen, thus not influencing the result of
page walks performed by the spurious fault handler.

Note the flag is not defined on Arm, and the introduced helper is just
a dummy alias to the existing flush_tlb_mask.

Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
It's my understanding that not doing such IPI flushes could lead to
the page tables read by __page_fault_type being modified by a third
party, which could make them point to mfns outside the scope of the
guest's allowed physical memory addresses. However those accesses
would be limited to __page_fault_type, and hence the main worry would
be that a guest could make it read from a physical memory region that
has side effects?

Given that pvshim doesn't support passthrough devices, I'm not aware
of any such region being part of the physmap in the specific pvshim
case, so I wonder if it would be fine to remove the restriction for
the pvshim-only case?
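
As a reference for the race described above, this is roughly the
pattern in the fault handler that the fix restores (simplified from
spurious_page_fault() in xen/arch/x86/traps.c):

    static enum pf_type spurious_page_fault(unsigned long addr,
                                            const struct cpu_user_regs *regs)
    {
        unsigned long flags;
        enum pf_type pf_type;

        /*
         * Disabling interrupts prevents TLB flush IPIs from being
         * handled, and hence prevents page table pages from being
         * freed (and re-used) under our feet during the walk. An
         * assisted flush completes without any IPI, and so does not
         * provide this serialization.
         */
        local_irq_save(flags);
        pf_type = __page_fault_type(addr, regs);
        local_irq_restore(flags);

        return pf_type;
    }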
---
 xen/arch/x86/mm.c              | 12 +++++++++++-
 xen/common/memory.c            |  2 +-
 xen/common/page_alloc.c        |  2 +-
 xen/include/asm-arm/flushtlb.h |  1 +
 xen/include/asm-x86/flushtlb.h | 14 ++++++++++++++
 xen/include/xen/mm.h           |  8 ++++++--
 6 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c294092f93..0be6b38769 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2894,7 +2894,17 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                       ((nx & PGT_type_mask) == PGT_writable_page)) )
                 {
                     perfc_incr(need_flush_tlb_flush);
-                    flush_tlb_mask(mask);
+                    if ( (x & PGT_type_mask) &&
+                         (x & PGT_type_mask) <= PGT_l4_page_table )
+                        /*
+                         * If page was a page table make sure the flush is
+                         * performed using an IPI in order to avoid changing
+                         * the type of a page table page under the feet of
+                         * spurious_page_fault.
+                         */
+                        flush_tlb_mask_sync(mask);
+                    else
+                        flush_tlb_mask(mask);
                 }
 
                 /* We lose existing type and validity. */
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..0d224d6675 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -278,7 +278,7 @@ static void populate_physmap(struct memop_args *a)
 
 out:
     if ( need_tlbflush )
-        filtered_flush_tlb_mask(tlbflush_timestamp);
+        filtered_flush_tlb_mask(tlbflush_timestamp, false);
 
     if ( a->memflags & MEMF_no_icache_flush )
         invalidate_icache();
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 10b7aeca48..765f8d8e78 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1060,7 +1060,7 @@ static struct page_info *alloc_heap_pages(
     }
 
     if ( need_tlbflush )
-        filtered_flush_tlb_mask(tlbflush_timestamp);
+        filtered_flush_tlb_mask(tlbflush_timestamp, true);
 
     return pg;
 }
diff --git a/xen/include/asm-arm/flushtlb.h b/xen/include/asm-arm/flushtlb.h
index ab1aae5c90..7ae0543885 100644
--- a/xen/include/asm-arm/flushtlb.h
+++ b/xen/include/asm-arm/flushtlb.h
@@ -27,6 +27,7 @@ static inline void page_set_tlbflush_timestamp(struct page_info *page)
 
 /* Flush specified CPUs' TLBs */
 void flush_tlb_mask(const cpumask_t *mask);
+#define flush_tlb_mask_sync flush_tlb_mask
 
 /*
  * Flush a range of VA's hypervisor mappings from the TLB of the local
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 8639427cce..3650a822ac 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -126,6 +126,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 #else
 #define FLUSH_HVM_ASID_CORE 0
 #endif
+#if CONFIG_PV
+/* Force an IPI to be sent */
+# define FLUSH_FORCE_IPI 0x8000
+#else
+# define FLUSH_FORCE_IPI 0
+#endif
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
@@ -148,6 +154,14 @@ void flush_area_mask(const cpumask_t *, const void *va, unsigned int flags);
 /* Flush specified CPUs' TLBs */
 #define flush_tlb_mask(mask)                    \
     flush_mask(mask, FLUSH_TLB)
+/*
+ * Flush specified CPUs' TLBs and force the usage of an IPI to do so. This is
+ * required for certain operations that rely on page tables themselves not
+ * being freed and reused when interrupts are blocked, as the flush IPI won't
+ * be fulfilled until exiting from that critical region.
+ */
+#define flush_tlb_mask_sync(mask)               \
+    flush_mask(mask, FLUSH_TLB | FLUSH_FORCE_IPI)
 #define flush_tlb_one_mask(mask,v)              \
     flush_area_mask(mask, (const void *)(v), FLUSH_TLB|FLUSH_ORDER(0))
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 9b62087be1..f360166f00 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -639,7 +639,8 @@ static inline void accumulate_tlbflush(bool *need_tlbflush,
     }
 }
 
-static inline void filtered_flush_tlb_mask(uint32_t tlbflush_timestamp)
+static inline void filtered_flush_tlb_mask(uint32_t tlbflush_timestamp,
+                                           bool sync)
 {
     cpumask_t mask;
 
@@ -648,7 +649,10 @@ static inline void filtered_flush_tlb_mask(uint32_t tlbflush_timestamp)
     if ( !cpumask_empty(&mask) )
     {
         perfc_incr(need_flush_tlb_flush);
-        flush_tlb_mask(&mask);
+        if ( sync )
+            flush_tlb_mask_sync(&mask);
+        else
+            flush_tlb_mask(&mask);
     }
 }
 
-- 
2.26.2


Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Jan Beulich 3 years, 9 months ago
On 18.06.2020 18:04, Roger Pau Monne wrote:
> Commit e9aca9470ed86 introduced a regression when avoiding sending
> IPIs for certain flush operations. Xen's page fault handler
> (spurious_page_fault) relies on blocking interrupts in order to
> prevent TLB flush IPIs from being handled, and thus to prevent other
> CPUs from removing page table pages. Switching to assisted flushing
> avoided such IPIs, and can thus result in page table pages being
> removed (and possibly re-used) while __page_fault_type is being
> executed.
> 
> Force some of the TLB flushes to use IPIs, thus avoiding the assisted
> TLB flush. The selected flushes are the page type change (when
> switching from a page table type to a different one, i.e. a page that
> has been removed as a page table) and page allocation. This sadly has
> a negative performance impact on the pvshim, as fewer assisted flushes
> can be used.
> 
> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
> using an IPI (flush_tlb_mask_sync). Note that the flag is only
> meaningfully defined when the hypervisor supports PV mode, as
> otherwise translated domains are in charge of their page tables and
> won't share page tables with Xen, thus not influencing the result of
> page walks performed by the spurious fault handler.

Is this true for shadow mode? If a page shadowing a guest one was
given back quickly enough to the allocator and then re-used, I think
the same situation could in principle arise.

> Note the flag is not defined on Arm, and the introduced helper is just
> a dummy alias to the existing flush_tlb_mask.
> 
> Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> It's my understanding that not doing such IPI flushes could lead to
> the page tables read by __page_fault_type being modified by a third
> party, which could make them point to mfns outside the scope of the
> guest's allowed physical memory addresses. However those accesses
> would be limited to __page_fault_type, and hence the main worry would
> be that a guest could make it read from a physical memory region that
> has side effects?

I don't think so, no - the memory could be changed such that the
PTEs are invalid altogether (like having reserved bits set). Consider
for example the case of reading an MFN out of such a PTE that's larger
than the physical address width supported by the CPU. Afaict
map_domain_page() will happily install a respective page table entry,
but we'd get a reserved-bit #PF upon reading from that mapping.

> ---
>  xen/arch/x86/mm.c              | 12 +++++++++++-
>  xen/common/memory.c            |  2 +-
>  xen/common/page_alloc.c        |  2 +-
>  xen/include/asm-arm/flushtlb.h |  1 +
>  xen/include/asm-x86/flushtlb.h | 14 ++++++++++++++
>  xen/include/xen/mm.h           |  8 ++++++--
>  6 files changed, 34 insertions(+), 5 deletions(-)

Not finding a consumer of the new flag, my first reaction was to
ask whether there's code missing somewhere. Having looked at
flush_area_mask() another time I now understand the intended
behavior results because the extra flag no longer allows
hypervisor_flush_tlb() to be entered. I think that's something
that's worth calling out in the description, or perhaps even in
the comment next to the #define.
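
For reference, a condensed sketch of that gating (reconstructed from
the 4.14-era flush_area_mask(), not verbatim):

    /*
     * The assisted path is only taken when no flag outside the plain
     * TLB-flush set is present; FLUSH_FORCE_IPI (0x8000) is outside
     * that set, so flush_tlb_mask_sync() falls through to the IPI
     * path below.
     */
    if ( cpu_has_hypervisor &&
         !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
                     FLUSH_ORDER_MASK)) &&
         !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
        return;                        /* assisted flush, no IPI sent */

    /* Otherwise fall through to sending flush IPIs to the CPUs in @mask. */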

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2894,7 +2894,17 @@ static int _get_page_type(struct page_info *page, unsigned long type,
>                        ((nx & PGT_type_mask) == PGT_writable_page)) )
>                  {
>                      perfc_incr(need_flush_tlb_flush);
> -                    flush_tlb_mask(mask);
> +                    if ( (x & PGT_type_mask) &&
> +                         (x & PGT_type_mask) <= PGT_l4_page_table )

With there being 5-level page tables around the corner, I think
we ought to get used to using PGT_root_page_table (or alike)
whenever possible, to avoid having to touch such code when
adding support for the new paging mode.

> --- a/xen/include/asm-x86/flushtlb.h
> +++ b/xen/include/asm-x86/flushtlb.h
> @@ -126,6 +126,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
>  #else
>  #define FLUSH_HVM_ASID_CORE 0
>  #endif
> +#if CONFIG_PV

#ifdef

> +/* Force an IPI to be sent */
> +# define FLUSH_FORCE_IPI 0x8000
> +#else
> +# define FLUSH_FORCE_IPI 0
> +#endif

If my shadow mode concern above is unwarranted, this overhead could
also be avoided if there's no PV domain at all in the system.
Perhaps an improvement not for now, but for the future ...

Jan

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Roger Pau Monné 3 years, 9 months ago
On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
> On 18.06.2020 18:04, Roger Pau Monne wrote:
> > [...]
> > Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
> > using an IPI (flush_tlb_mask_sync). Note that the flag is only
> > meaningfully defined when the hypervisor supports PV mode, as
> > otherwise translated domains are in charge of their page tables and
> > won't share page tables with Xen, thus not influencing the result of
> > page walks performed by the spurious fault handler.
> 
> Is this true for shadow mode? If a page shadowing a guest one was
> given back quickly enough to the allocator and then re-used, I think
> the same situation could in principle arise.

Hm, I think it's not applicable to HVM shadow mode at least, because
CR3 is switched as part of vmentry/vmexit, and the page tables are not
shared between Xen and the guest, so there's no way for a HVM shadow
guest to modify the page-tables while Xen is walking them in
spurious_page_fault (note spurious_page_fault is only called when the
fault happens in non-guest context).

> > Note the flag is not defined on Arm, and the introduced helper is just
> > a dummy alias to the existing flush_tlb_mask.
> > 
> > Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
> > Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > It's my understanding that not doing such IPI flushes could lead to
> > the pages tables being read by __page_fault_type being modified by a
> > third party, which could make them point to other mfns out of the
> > scope of the guest allowed physical memory addresses. However those
> > accesses would be limited to __page_fault_type, and hence the main
> > worry would be that a guest could make it point to read from a
> > physical memory region that has side effects?
> 
> I don't think so, no - the memory could be changed such that the
> PTEs are invalid altogether (like having reserved bits set). Consider
> for example the case of reading an MFN out of such a PTE that's larger
> than the physical address width supported by the CPU. Afaict
> map_domain_page() will happily install a respective page table entry,
> but we'd get a reserved-bit #PF upon reading from that mapping.

So there are no hazards from executing __page_fault_type against a
page-table that could be modified by a user?

> > ---
> >  xen/arch/x86/mm.c              | 12 +++++++++++-
> >  xen/common/memory.c            |  2 +-
> >  xen/common/page_alloc.c        |  2 +-
> >  xen/include/asm-arm/flushtlb.h |  1 +
> >  xen/include/asm-x86/flushtlb.h | 14 ++++++++++++++
> >  xen/include/xen/mm.h           |  8 ++++++--
> >  6 files changed, 34 insertions(+), 5 deletions(-)
> 
> Not finding a consumer of the new flag, my first reaction was to
> ask whether there's code missing somewhere. Having looked at
> flush_area_mask() another time I now understand the intended
> behavior results because the extra flag no longer allows
> hypervisor_flush_tlb() to be entered. I think that's something
> that's worth calling out in the description, or perhaps even in
> the comment next to the #define.

Oh right, the condition to use assisted flush is not actually changed
in flush_area_mask, since setting any flag bit outside the plain
TLB-flush set prevents using it.

> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2894,7 +2894,17 @@ static int _get_page_type(struct page_info *page, unsigned long type,
> >                        ((nx & PGT_type_mask) == PGT_writable_page)) )
> >                  {
> >                      perfc_incr(need_flush_tlb_flush);
> > -                    flush_tlb_mask(mask);
> > +                    if ( (x & PGT_type_mask) &&
> > +                         (x & PGT_type_mask) <= PGT_l4_page_table )
> 
> With there being 5-level page tables around the corner, I think
> we ought to get used to using PGT_root_page_table (or alike)
> whenever possible, to avoid having to touch such code when
> adding support for the new paging mode.
> 
> > --- a/xen/include/asm-x86/flushtlb.h
> > +++ b/xen/include/asm-x86/flushtlb.h
> > @@ -126,6 +126,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
> >  #else
> >  #define FLUSH_HVM_ASID_CORE 0
> >  #endif
> > +#if CONFIG_PV
> 
> #ifdef
> 
> > +/* Force an IPI to be sent */
> > +# define FLUSH_FORCE_IPI 0x8000
> > +#else
> > +# define FLUSH_FORCE_IPI 0
> > +#endif
> 
> If my shadow mode concern above is unwarranted, this overhead could
> also be avoided if there's no PV domain at all in the system.
> Perhaps an improvement not for now, but for the future ...

Hm, right, I guess it would be possible to turn FLUSH_FORCE_IPI into a
dynamic flag.
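
Something along these lines, perhaps (purely illustrative; a
pv_domain_count counter doesn't exist in the tree and would need to
be maintained on domain creation/destruction):

    /* Hypothetical sketch: make the forced-IPI flag dynamic, so that
     * the assisted path is only disabled while a PV domain exists. */
    static inline unsigned int flush_force_ipi_flag(void)
    {
        return atomic_read(&pv_domain_count) ? FLUSH_FORCE_IPI : 0;
    }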

Thanks, Roger.

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Jan Beulich 3 years, 9 months ago
On 22.06.2020 11:31, Roger Pau Monné wrote:
> On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
>> On 18.06.2020 18:04, Roger Pau Monne wrote:
>>> [...]
>>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
>>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
>>> meaningfully defined when the hypervisor supports PV mode, as
>>> otherwise translated domains are in charge of their page tables and
>>> won't share page tables with Xen, thus not influencing the result of
>>> page walks performed by the spurious fault handler.
>>
>> Is this true for shadow mode? If a page shadowing a guest one was
>> given back quickly enough to the allocator and then re-used, I think
>> the same situation could in principle arise.
> 
> Hm, I think it's not applicable to HVM shadow mode at least, because
> CR3 is switched as part of vmentry/vmexit, and the page tables are not
> shared between Xen and the guest, so there's no way for a HVM shadow
> guest to modify the page-tables while Xen is walking them in
> spurious_page_fault (note spurious_page_fault is only called when the
> fault happens in non-guest context).

I'm afraid I disagree, because of shadow's use of "linear page tables".

>>> Note the flag is not defined on Arm, and the introduced helper is just
>>> a dummy alias to the existing flush_tlb_mask.
>>>
>>> Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> It's my understanding that not doing such IPI flushes could lead to
>>> the page tables read by __page_fault_type being modified by a third
>>> party, which could make them point to mfns outside the scope of the
>>> guest's allowed physical memory addresses. However those accesses
>>> would be limited to __page_fault_type, and hence the main worry would
>>> be that a guest could make it read from a physical memory region that
>>> has side effects?
>>
>> I don't think so, no - the memory could be changed such that the
>> PTEs are invalid altogether (like having reserved bits set). Consider
>> for example the case of reading an MFN out of such a PTE that's larger
>> than the physical address width supported by the CPU. Afaict
>> map_domain_page() will happily install a respective page table entry,
>> but we'd get a reserved-bit #PF upon reading from that mapping.
> 
> So there are no hazards from executing __page_fault_type against a
> page-table that could be modified by a user?

There are - I realize only now that the way I started my earlier
reply was ambiguous. I meant to negate your "only with side effects"
way of thinking.

Jan

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Roger Pau Monné 3 years, 9 months ago
On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote:
> On 22.06.2020 11:31, Roger Pau Monné wrote:
> > On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
> >> On 18.06.2020 18:04, Roger Pau Monne wrote:
> >>> [...]
> >>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
> >>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
> >>> meaningfully defined when the hypervisor supports PV mode, as
> >>> otherwise translated domains are in charge of their page tables and
> >>> won't share page tables with Xen, thus not influencing the result of
> >>> page walks performed by the spurious fault handler.
> >>
> >> Is this true for shadow mode? If a page shadowing a guest one was
> >> given back quickly enough to the allocator and then re-used, I think
> >> the same situation could in principle arise.
> > 
> > Hm, I think it's not applicable to HVM shadow mode at least, because
> > CR3 is switched as part of vmentry/vmexit, and the page tables are not
> > shared between Xen and the guest, so there's no way for a HVM shadow
> > guest to modify the page-tables while Xen is walking them in
> > spurious_page_fault (note spurious_page_fault is only called when the
> > fault happens in non-guest context).
> 
> I'm afraid I disagree, because of shadow's use of "linear page tables".

You will have to bear with me, but I don't follow.

Could you provide some pointers at how/where the shadow (I assume
guest controlled) "linear page tables" are used while in Xen
context?

do_page_fault will only call spurious_page_fault (and thus attempt a
page walk) when the fault happens in non-guest context, yet on HVM all
accesses to guest memory are performed using __hvm_copy, which doesn't
load any guest page tables into CR3, but instead performs a software
walk of the paging structures in order to find and map the destination
address into the hypervisor page tables and perform the read or copy.
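
I.e. the access path is roughly the following (a simplified sketch of
the idea only, not the verbatim __hvm_copy code; the wrapper name is
made up and most error handling is omitted):

    static int copy_from_guest_linear(struct vcpu *v, void *buf,
                                      unsigned long gva, unsigned int len)
    {
        uint32_t pfec = PFEC_page_present;
        /* Software walk of the guest paging structures; no CR3 load. */
        unsigned long gfn = paging_gva_to_gfn(v, gva, &pfec);
        struct page_info *page;
        void *p;

        if ( gfn == gfn_x(INVALID_GFN) )
            return -EFAULT;

        page = get_page_from_gfn(v->domain, gfn, NULL, P2M_ALLOC);
        if ( !page )
            return -EFAULT;

        /* Map the target frame into Xen's own page tables and copy
         * (assumes the range doesn't cross a page boundary). */
        p = __map_domain_page(page);
        memcpy(buf, p + (gva & ~PAGE_MASK), len);
        unmap_domain_page(p);
        put_page(page);

        return 0;
    }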

> >>> Note the flag is not defined on Arm, and the introduced helper is just
> >>> a dummy alias to the existing flush_tlb_mask.
> >>>
> >>> Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
> >>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> It's my understanding that not doing such IPI flushes could lead to
> >>> the page tables read by __page_fault_type being modified by a third
> >>> party, which could make them point to mfns outside the scope of the
> >>> guest's allowed physical memory addresses. However those accesses
> >>> would be limited to __page_fault_type, and hence the main worry would
> >>> be that a guest could make it read from a physical memory region that
> >>> has side effects?
> >>
> >> I don't think so, no - the memory could be changed such that the
> >> PTEs are invalid altogether (like having reserved bits set). Consider
> >> for example the case of reading an MFN out of such a PTE that's larger
> >> than the physical address width supported by the CPU. Afaict
> >> map_domain_page() will happily install a respective page table entry,
> >> but we'd get a reserved-bit #PF upon reading from that mapping.
> > 
> > So there are no hazards from executing __page_fault_type against a
> > page-table that could be modified by a user?
> 
> There are - I realize only now that the way I started my earlier
> reply was ambiguous. I meant to negate your "only with side effects"
> way of thinking.

Ack.

Thanks, Roger.

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Jan Beulich 3 years, 9 months ago
On 22.06.2020 15:24, Roger Pau Monné wrote:
> On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote:
>> On 22.06.2020 11:31, Roger Pau Monné wrote:
>>> On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
>>>> On 18.06.2020 18:04, Roger Pau Monne wrote:
>>>>> [...]
>>>>
>>>> Is this true for shadow mode? If a page shadowing a guest one was
>>>> given back quickly enough to the allocator and then re-used, I think
>>>> the same situation could in principle arise.
>>>
>>> Hm, I think it's not applicable to HVM shadow mode at least, because
>>> CR3 is switched as part of vmentry/vmexit, and the page tables are not
>>> shared between Xen and the guest, so there's no way for a HVM shadow
>>> guest to modify the page-tables while Xen is walking them in
>>> spurious_page_fault (note spurious_page_fault is only called when the
>>> fault happens in non-guest context).
>>
>> I'm afraid I disagree, because of shadow's use of "linear page tables".
> 
> You will have to bear with me, but I don't follow.
> 
> Could you provide some pointers at how/where the shadow (I assume
> guest controlled) "linear page tables" are used while in Xen
> context?

See config.h:

/* Slot 258: linear page table (guest table). */
#define LINEAR_PT_VIRT_START    (PML4_ADDR(258))
#define LINEAR_PT_VIRT_END      (LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
/* Slot 259: linear page table (shadow table). */
#define SH_LINEAR_PT_VIRT_START (PML4_ADDR(259))
#define SH_LINEAR_PT_VIRT_END   (SH_LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)

These linear page tables exist in the Xen page tables at basically
all times as long as a shadow guest's vCPU is in context. They're
there to limit the overhead of accessing guest page tables and
their shadows from inside Xen.
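
As a (simplified) illustration of why that matters: with a guest root
installed at slot 258, guest PTEs become readable as plain data, along
the lines of (the real accessors are the linear_l[1234]_table macros;
this sketch glosses over the exact masking):

    #define linear_pg_table ((l1_pgentry_t *)LINEAR_PT_VIRT_START)

    /* Read the guest L1 entry covering @va: the hardware 4-level walk
     * through slot 258 resolves to the guest's L1 page as data. */
    static inline l1_pgentry_t read_guest_l1e(unsigned long va)
    {
        return linear_pg_table[(va & VADDR_MASK) >> PAGE_SHIFT];
    }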

Jan

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Roger Pau Monné 3 years, 9 months ago
On Mon, Jun 22, 2020 at 03:51:24PM +0200, Jan Beulich wrote:
> On 22.06.2020 15:24, Roger Pau Monné wrote:
> > On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote:
> >> On 22.06.2020 11:31, Roger Pau Monné wrote:
> >>> On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
> >>>> On 18.06.2020 18:04, Roger Pau Monne wrote:
> >>>>> Commit e9aca9470ed86 introduced a regression when avoiding sending
> >>>>> IPIs for certain flush operations. Xen page fault handler
> >>>>> (spurious_page_fault) relies on blocking interrupts in order to
> >>>>> prevent handling TLB flush IPIs and thus preventing other CPUs from
> >>>>> removing page tables pages. Switching to assisted flushing avoided such
> >>>>> IPIs, and thus can result in pages belonging to the page tables being
> >>>>> removed (and possibly re-used) while __page_fault_type is being
> >>>>> executed.
> >>>>>
> >>>>> Force some of the TLB flushes to use IPIs, thus avoiding the assisted
> >>>>> TLB flush. Those selected flushes are the page type change (when
> >>>>> switching from a page table type to a different one, ie: a page that
> >>>>> has been removed as a page table) and page allocation. This sadly has
> >>>>> a negative performance impact on the pvshim, as less assisted flushes
> >>>>> can be used.
> >>>>>
> >>>>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
> >>>>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
> >>>>> meaningfully defined when the hypervisor supports PV mode, as
> >>>>> otherwise translated domains are in charge of their page tables and
> >>>>> won't share page tables with Xen, thus not influencing the result of
> >>>>> page walks performed by the spurious fault handler.
> >>>>
> >>>> Is this true for shadow mode? If a page shadowing a guest one was
> >>>> given back quickly enough to the allocator and then re-used, I think
> >>>> the same situation could in principle arise.
> >>>
> >>> Hm, I think it's not applicable to HVM shadow mode at least, because
> >>> CR3 is switched as part of vmentry/vmexit, and the page tables are not
> >>> shared between Xen and the guest, so there's no way for a HVM shadow
> >>> guest to modify the page-tables while Xen is walking them in
> >>> spurious_page_fault (note spurious_page_fault is only called when the
> >>> fault happens in non-guest context).
> >>
> >> I'm afraid I disagree, because of shadow's use of "linear page tables".
> > 
> > You will have to bear with me, but I don't follow.
> > 
> > Could you provide some pointers at how/where the shadow (I assume
> > guest controlled) "linear page tables" are used while in Xen
> > context?
> 
> See config.h:
> 
> /* Slot 258: linear page table (guest table). */
> #define LINEAR_PT_VIRT_START    (PML4_ADDR(258))
> #define LINEAR_PT_VIRT_END      (LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
> /* Slot 259: linear page table (shadow table). */
> #define SH_LINEAR_PT_VIRT_START (PML4_ADDR(259))
> #define SH_LINEAR_PT_VIRT_END   (SH_LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
> 
> These linear page tables exist in the Xen page tables at basically
> all times as long as a shadow guest's vCPU is in context. They're
> there to limit the overhead of accessing guest page tables and
> their shadows from inside Xen.

Oh, I have to admit I know very little about all this, and I'm not
able to find a description of how this is to be used.

I think the shadow linear page tables should be per-pCPU, and hence
they cannot be modified by the guest while a spurious page fault is
being processed? (since the vCPU running on the pCPU is in Xen
context).

Thanks, Roger.

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Jan Beulich 3 years, 9 months ago
On 22.06.2020 16:56, Roger Pau Monné wrote:
> On Mon, Jun 22, 2020 at 03:51:24PM +0200, Jan Beulich wrote:
>> On 22.06.2020 15:24, Roger Pau Monné wrote:
>>> On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote:
>>>> On 22.06.2020 11:31, Roger Pau Monné wrote:
>>>>> On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
>>>>>> On 18.06.2020 18:04, Roger Pau Monne wrote:
>>>>>>> Commit e9aca9470ed86 introduced a regression when avoiding sending
>>>>>>> IPIs for certain flush operations. Xen page fault handler
>>>>>>> (spurious_page_fault) relies on blocking interrupts in order to
>>>>>>> prevent handling TLB flush IPIs and thus preventing other CPUs from
>>>>>>> removing page tables pages. Switching to assisted flushing avoided such
>>>>>>> IPIs, and thus can result in pages belonging to the page tables being
>>>>>>> removed (and possibly re-used) while __page_fault_type is being
>>>>>>> executed.
>>>>>>>
>>>>>>> Force some of the TLB flushes to use IPIs, thus avoiding the assisted
>>>>>>> TLB flush. Those selected flushes are the page type change (when
>>>>>>> switching from a page table type to a different one, ie: a page that
>>>>>>> has been removed as a page table) and page allocation. This sadly has
>>>>>>> a negative performance impact on the pvshim, as less assisted flushes
>>>>>>> can be used.
>>>>>>>
>>>>>>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
>>>>>>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
>>>>>>> meaningfully defined when the hypervisor supports PV mode, as
>>>>>>> otherwise translated domains are in charge of their page tables and
>>>>>>> won't share page tables with Xen, thus not influencing the result of
>>>>>>> page walks performed by the spurious fault handler.
>>>>>>
>>>>>> Is this true for shadow mode? If a page shadowing a guest one was
>>>>>> given back quickly enough to the allocator and then re-used, I think
>>>>>> the same situation could in principle arise.
>>>>>
>>>>> Hm, I think it's not applicable to HVM shadow mode at least, because
>>>>> CR3 is switched as part of vmentry/vmexit, and the page tables are not
>>>>> shared between Xen and the guest, so there's no way for a HVM shadow
>>>>> guest to modify the page-tables while Xen is walking them in
>>>>> spurious_page_fault (note spurious_page_fault is only called when the
>>>>> fault happens in non-guest context).
>>>>
>>>> I'm afraid I disagree, because of shadow's use of "linear page tables".
>>>
>>> You will have to bear with me, but I don't follow.
>>>
>>> Could you provide some pointers at how/where the shadow (I assume
>>> guest controlled) "linear page tables" are used while in Xen
>>> context?
>>
>> See config.h:
>>
>> /* Slot 258: linear page table (guest table). */
>> #define LINEAR_PT_VIRT_START    (PML4_ADDR(258))
>> #define LINEAR_PT_VIRT_END      (LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
>> /* Slot 259: linear page table (shadow table). */
>> #define SH_LINEAR_PT_VIRT_START (PML4_ADDR(259))
>> #define SH_LINEAR_PT_VIRT_END   (SH_LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
>>
>> These linear page tables exist in the Xen page tables at basically
>> all times as long as a shadow guest's vCPU is in context. They're
>> there to limit the overhead of accessing guest page tables and
>> their shadows from inside Xen.
> 
> Oh, I have to admit I know very little about all this, and I'm not
> able to find a description of how this is to be used.
> 
> I think the shadow linear page tables should be per-pCPU, and hence
> they cannot be modified by the guest while a spurious page fault is
> being processed? (since the vCPU running on the pCPU is in Xen
> context).

A guest would have for some linear address e.g.

vCR3 -> G4 -> G3 -> G2 -> G1

visible to some random set of its CPUs (i.e. the same CR3 can be in
use by multiple vCPU-s). This will then have shadows like this:

pCR3 -> S4 -> S3 -> S2 -> S1

The G4 page gets hooked up into LINEAR_PT (i.e. becomes an L3 page)
for all vCPU-s that have this very CR3 active. Same goes for S4 and
SH_LINEAR_PT respectively. Afaik shadows of guest page tables also
don't get created on a per-pCPU basis - a page table either has a
shadow, or it doesn't.

Hence afaict page tables mapped through these facilities (and
reachable while in Xen) can very well be modified "behind our backs".

Jan

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Roger Pau Monné 3 years, 9 months ago
On Mon, Jun 22, 2020 at 05:30:11PM +0200, Jan Beulich wrote:
> On 22.06.2020 16:56, Roger Pau Monné wrote:
> > On Mon, Jun 22, 2020 at 03:51:24PM +0200, Jan Beulich wrote:
> >> On 22.06.2020 15:24, Roger Pau Monné wrote:
> >>> On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote:
> >>>> On 22.06.2020 11:31, Roger Pau Monné wrote:
> >>>>> On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
> >>>>>> On 18.06.2020 18:04, Roger Pau Monne wrote:
> >>>>>>> Commit e9aca9470ed86 introduced a regression when avoiding sending
> >>>>>>> IPIs for certain flush operations. Xen page fault handler
> >>>>>>> (spurious_page_fault) relies on blocking interrupts in order to
> >>>>>>> prevent handling TLB flush IPIs and thus preventing other CPUs from
> >>>>>>> removing page tables pages. Switching to assisted flushing avoided such
> >>>>>>> IPIs, and thus can result in pages belonging to the page tables being
> >>>>>>> removed (and possibly re-used) while __page_fault_type is being
> >>>>>>> executed.
> >>>>>>>
> >>>>>>> Force some of the TLB flushes to use IPIs, thus avoiding the assisted
> >>>>>>> TLB flush. Those selected flushes are the page type change (when
> >>>>>>> switching from a page table type to a different one, ie: a page that
> >>>>>>> has been removed as a page table) and page allocation. This sadly has
> >>>>>>> a negative performance impact on the pvshim, as less assisted flushes
> >>>>>>> can be used.
> >>>>>>>
> >>>>>>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
> >>>>>>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
> >>>>>>> meaningfully defined when the hypervisor supports PV mode, as
> >>>>>>> otherwise translated domains are in charge of their page tables and
> >>>>>>> won't share page tables with Xen, thus not influencing the result of
> >>>>>>> page walks performed by the spurious fault handler.
> >>>>>>
> >>>>>> Is this true for shadow mode? If a page shadowing a guest one was
> >>>>>> given back quickly enough to the allocator and then re-used, I think
> >>>>>> the same situation could in principle arise.
> >>>>>
> >>>>> Hm, I think it's not applicable to HVM shadow mode at least, because
> >>>>> CR3 is switched as part of vmentry/vmexit, and the page tables are not
> >>>>> shared between Xen and the guest, so there's no way for a HVM shadow
> >>>>> guest to modify the page-tables while Xen is walking them in
> >>>>> spurious_page_fault (note spurious_page_fault is only called when the
> >>>>> fault happens in non-guest context).
> >>>>
> >>>> I'm afraid I disagree, because of shadow's use of "linear page tables".
> >>>
> >>> You will have to bear with me, but I don't follow.
> >>>
> >>> Could you provide some pointers at how/where the shadow (I assume
> >>> guest controlled) "linear page tables" are used while in Xen
> >>> context?
> >>
> >> See config.h:
> >>
> >> /* Slot 258: linear page table (guest table). */
> >> #define LINEAR_PT_VIRT_START    (PML4_ADDR(258))
> >> #define LINEAR_PT_VIRT_END      (LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
> >> /* Slot 259: linear page table (shadow table). */
> >> #define SH_LINEAR_PT_VIRT_START (PML4_ADDR(259))
> >> #define SH_LINEAR_PT_VIRT_END   (SH_LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
> >>
> >> These linear page tables exist in the Xen page tables at basically
> >> all times as long as a shadow guest's vCPU is in context. They're
> >> there to limit the overhead of accessing guest page tables and
> >> their shadows from inside Xen.
> > 
> > Oh, I have to admit I know very little about all this, and I'm not
> > able to find a description of how this is to be used.
> > 
> > I think the shadow linear page tables should be per-pCPU, and hence
> > they cannot be modified by the guest while a spurious page fault is
> > being processed? (since the vCPU running on the pCPU is in Xen
> > context).
> 
> A guest would have for some linear address e.g.
> 
> vCR3 -> G4 -> G3 -> G2 -> G1
> 
> visible to some random set of its CPUs (i.e. the same CR3 can be in
> use by multiple vCPU-s). This will then have shadows like this:
> 
> pCR3 -> S4 -> S3 -> S2 -> S1
> 
> The G4 page gets hooked up into LINEAR_PT (i.e. becomes an L3 page)
> for all vCPU-s that have this very CR3 active. Same goes for S4 and
> SH_LINEAR_PT respectively. Afaik shadows of guest page tables also
> don't get created on a per-pCPU basis - a page table either has a
> shadow, or it doesn't.
> 
> Hence afaict page tables mapped through these facilities (and
> reachable while in Xen) can very well be modified "behind our backs".

Oh, I see. I'm still slightly puzzled because __hvm_copy seems to
always map the guest page, and sh_gva_to_gfn also seems to just walk
the guest page tables using the software-implemented page table
walker, and returns a gfn (not an mfn). I was expecting __hvm_copy to
somehow make use of those shadow page tables when performing accesses
to guest memory?

Is shadow code using some of this internally then? Sorry, I'm having
trouble with all this, but I would like to fully understand.

Thanks, Roger.

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
Posted by Jan Beulich 3 years, 9 months ago
On 23.06.2020 11:25, Roger Pau Monné wrote:
> On Mon, Jun 22, 2020 at 05:30:11PM +0200, Jan Beulich wrote:
>> On 22.06.2020 16:56, Roger Pau Monné wrote:
>>> On Mon, Jun 22, 2020 at 03:51:24PM +0200, Jan Beulich wrote:
>>>> On 22.06.2020 15:24, Roger Pau Monné wrote:
>>>>> On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote:
>>>>>> On 22.06.2020 11:31, Roger Pau Monné wrote:
>>>>>>> On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
>>>>>>>> On 18.06.2020 18:04, Roger Pau Monne wrote:
>>>>>>>>> Commit e9aca9470ed86 introduced a regression when avoiding sending
>>>>>>>>> IPIs for certain flush operations. Xen page fault handler
>>>>>>>>> (spurious_page_fault) relies on blocking interrupts in order to
>>>>>>>>> prevent handling TLB flush IPIs and thus preventing other CPUs from
>>>>>>>>> removing page tables pages. Switching to assisted flushing avoided such
>>>>>>>>> IPIs, and thus can result in pages belonging to the page tables being
>>>>>>>>> removed (and possibly re-used) while __page_fault_type is being
>>>>>>>>> executed.
>>>>>>>>>
>>>>>>>>> Force some of the TLB flushes to use IPIs, thus avoiding the assisted
>>>>>>>>> TLB flush. Those selected flushes are the page type change (when
>>>>>>>>> switching from a page table type to a different one, ie: a page that
>>>>>>>>> has been removed as a page table) and page allocation. This sadly has
>>>>>>>>> a negative performance impact on the pvshim, as less assisted flushes
>>>>>>>>> can be used.
>>>>>>>>>
>>>>>>>>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
>>>>>>>>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
>>>>>>>>> meaningfully defined when the hypervisor supports PV mode, as
>>>>>>>>> otherwise translated domains are in charge of their page tables and
>>>>>>>>> won't share page tables with Xen, thus not influencing the result of
>>>>>>>>> page walks performed by the spurious fault handler.
>>>>>>>>
>>>>>>>> Is this true for shadow mode? If a page shadowing a guest one was
>>>>>>>> given back quickly enough to the allocator and then re-used, I think
>>>>>>>> the same situation could in principle arise.
>>>>>>>
>>>>>>> Hm, I think it's not applicable to HVM shadow mode at least, because
>>>>>>> CR3 is switched as part of vmentry/vmexit, and the page tables are not
>>>>>>> shared between Xen and the guest, so there's no way for a HVM shadow
>>>>>>> guest to modify the page-tables while Xen is walking them in
>>>>>>> spurious_page_fault (note spurious_page_fault is only called when the
>>>>>>> fault happens in non-guest context).
>>>>>>
>>>>>> I'm afraid I disagree, because of shadow's use of "linear page tables".
>>>>>
>>>>> You will have to bear with me, but I don't follow.
>>>>>
>>>>> Could you provide some pointers at how/where the shadow (I assume
>>>>> guest controlled) "linear page tables" are used while in Xen
>>>>> context?
>>>>
>>>> See config.h:
>>>>
>>>> /* Slot 258: linear page table (guest table). */
>>>> #define LINEAR_PT_VIRT_START    (PML4_ADDR(258))
>>>> #define LINEAR_PT_VIRT_END      (LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
>>>> /* Slot 259: linear page table (shadow table). */
>>>> #define SH_LINEAR_PT_VIRT_START (PML4_ADDR(259))
>>>> #define SH_LINEAR_PT_VIRT_END   (SH_LINEAR_PT_VIRT_START + PML4_ENTRY_BYTES)
>>>>
>>>> These linear page tables exist in the Xen page tables at basically
>>>> all times as long as a shadow guest's vCPU is in context. They're
>>>> there to limit the overhead of accessing guest page tables and
>>>> their shadows from inside Xen.
>>>
>>> Oh, I have to admit I know very little about all this, and I'm not
>>> able to find a description of how this is to be used.
>>>
>>> I think the shadow linear page tables should be per-pCPU, and hence
>>> they cannot be modified by the guest while a spurious page fault is
>>> being processed? (since the vCPU running on the pCPU is in Xen
>>> context).
>>
>> A guest would have for some linear address e.g.
>>
>> vCR3 -> G4 -> G3 -> G2 -> G1
>>
>> visible to some random set of its CPUs (i.e. the same CR3 can be in
>> use by multiple vCPU-s). This will then have shadows like this:
>>
>> pCR3 -> S4 -> S3 -> S2 -> S1
>>
>> The G4 page gets hooked up into LINEAR_PT (i.e. becomes an L3 page)
>> for all vCPU-s that have this very CR3 active. Same goes for S4 and
>> SH_LINEAR_PT respectively. Afaik shadows of guest page tables also
>> don't get created on a per-pCPU basis - a page table either has a
>> shadow, or it doesn't.
>>
>> Hence afaict page tables mapped through these facilities (and
>> reachable while in Xen) can very well be modified "behind our backs".
> 
> Oh, I see. I'm still slightly puzzled because __hvm_copy seems to
> always map the guest page, and sh_gva_to_gfn also seems to just walk
> the guest page tables using the software-implemented page table
> walker, and returns a gfn (not an mfn). I was expecting __hvm_copy to
> somehow make use of those shadow page tables when performing accesses
> to guest memory?

No, that's using the guest page table walker plus the p2m table
instead.

> Is shadow code using some of this internally then?

Yes. Just grep the shadow code for linear_l[1234]_table to find
the uses.

Jan