[PATCH] x86/hvm: set 'ipat' in EPT for special pages

Paul Durrant posted 1 patch 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20200731104644.20906-1-paul@xen.org
xen/arch/x86/hvm/mtrr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

[PATCH] x86/hvm: set 'ipat' in EPT for special pages

Posted by Paul Durrant 1 week ago
From: Paul Durrant <pdurrant@amazon.com>

All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
when PV drivers running in a guest populate the BAR space of the Xen Platform
PCI Device with pages such as the Shared Info page or Grant Table pages,
accesses to these pages will be cachable.

However, should IOMMU mappings be enabled for the guest then these
accesses become uncachable. This has a substantial negative effect on I/O
throughput of PV devices. Arguably PV drivers should not be using BAR space to
host the Shared Info and Grant Table pages but it is currently commonplace for
them to do this and so this problem needs mitigation. Hence this patch makes
sure the 'ipat' bit is set for any special page regardless of where in GFN
space it is mapped.

NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
      that there is any similar mitigation possible for AMD NPT. Downstreams
      such as Citrix XenServer have been carrying a patch similar to this for
      several releases though.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/mtrr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 511c3be1c8..3ad813ed15 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -830,7 +830,8 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
         return MTRR_TYPE_UNCACHABLE;
     }
 
-    if ( !is_iommu_enabled(d) && !cache_flush_permitted(d) )
+    if ( (!is_iommu_enabled(d) && !cache_flush_permitted(d)) ||
+         is_special_page(mfn_to_page(mfn)) )
     {
         *ipat = 1;
         return MTRR_TYPE_WRBACK;
-- 
2.20.1
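
For readers following the thread, the effect of the changed condition can be sketched as a small standalone model. This is not the Xen code: `struct toy_state`, `toy_entry_emt()` and the `TOY_DEFER_TO_GUEST` sentinel are hypothetical names invented for illustration, and the three booleans merely stand in for the results of is_iommu_enabled(d), cache_flush_permitted(d) and is_special_page(mfn_to_page(mfn)). Only the two MTRR type values are the standard x86 encodings.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard x86 MTRR memory type encodings. */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRBACK     6

/* Hypothetical sentinel: the real function carries on and derives the
 * type from guest and host settings; the toy model stops here. */
#define TOY_DEFER_TO_GUEST   (-1)

/* Hypothetical stand-in for the state the real predicates consult. */
struct toy_state {
    bool iommu_enabled;         /* models is_iommu_enabled(d) */
    bool cache_flush_permitted; /* models cache_flush_permitted(d) */
    bool special_page;          /* models is_special_page(mfn_to_page(mfn)) */
};

/*
 * Toy version of the patched branch only. With patched == false it
 * evaluates the old condition, with patched == true the new one.
 */
static int toy_entry_emt(const struct toy_state *s, bool patched, bool *ipat)
{
    bool force_wb = !s->iommu_enabled && !s->cache_flush_permitted;

    if ( patched )
        force_wb = force_wb || s->special_page;

    if ( force_wb )
    {
        *ipat = true;           /* EPT ignores the guest PAT */
        return MTRR_TYPE_WRBACK;
    }

    *ipat = false;
    return TOY_DEFER_TO_GUEST;
}
```

Under these assumptions, a special page in a guest with IOMMU mappings enabled falls through to the guest-derived type with the old condition, whereas the new condition forces write-back with 'ipat' set. Guests without IOMMU mappings or cache flush permission see no change.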


Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages

Posted by Andrew Cooper 1 week ago
On 31/07/2020 11:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
> map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> when PV drivers running in a guest populate the BAR space of the Xen Platform
> PCI Device with pages such as the Shared Info page or Grant Table pages,
> accesses to these pages will be cachable.
>
> However, should IOMMU mappings be enabled for the guest then these
> accesses become uncachable. This has a substantial negative effect on I/O
> throughput of PV devices. Arguably PV drivers should not be using BAR space to
> host the Shared Info and Grant Table pages but it is currently commonplace for
> them to do this and so this problem needs mitigation. Hence this patch makes
> sure the 'ipat' bit is set for any special page regardless of where in GFN
> space it is mapped.
>
> NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
>       that there is any similar mitigation possible for AMD NPT. Downstreams
>       such as Citrix XenServer have been carrying a patch similar to this for
>       several releases though.

https://github.com/xenserver/xen.pg/blob/XS-8.2.x/master/xen-override-caching-cp-26562.patch

(Yay for internal ticket references escaping into the wild.)


However, it is very important to be aware that this is just papering
over the problem, and it will cease to function as soon as we get MKTME
support.  When we hit that point, iPAT cannot be used, as it will cause
data corruption in guests.

The only correct way to fix this is to not (mis)use BAR space for RAM
mappings.

~Andrew

RE: [PATCH] x86/hvm: set 'ipat' in EPT for special pages

Posted by Paul Durrant 1 week ago
> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 31 July 2020 12:21
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Jan Beulich <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger
> Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
> 
> On 31/07/2020 11:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
> > map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> > when PV drivers running in a guest populate the BAR space of the Xen Platform
> > PCI Device with pages such as the Shared Info page or Grant Table pages,
> > accesses to these pages will be cachable.
> >
> > However, should IOMMU mappings be enabled for the guest then these
> > accesses become uncachable. This has a substantial negative effect on I/O
> > throughput of PV devices. Arguably PV drivers should not be using BAR space to
> > host the Shared Info and Grant Table pages but it is currently commonplace for
> > them to do this and so this problem needs mitigation. Hence this patch makes
> > sure the 'ipat' bit is set for any special page regardless of where in GFN
> > space it is mapped.
> >
> > NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
> >       that there is any similar mitigation possible for AMD NPT. Downstreams
> >       such as Citrix XenServer have been carrying a patch similar to this for
> >       several releases though.
> 
> https://github.com/xenserver/xen.pg/blob/XS-8.2.x/master/xen-override-caching-cp-26562.patch
> 
> (Yay for internal ticket references escaping into the wild.)
> 

:-)

> 
> However, it is very important to be aware that this is just papering
> over the problem, and it will cease to function as soon as we get MKTME
> support.  When we hit that point, iPAT cannot be used, as it will cause
> data corruption in guests.
> 
> The only correct way to fix this is to not (mis)use BAR space for RAM
> mappings.
> 

Oh yes, this is only a mitigation. I believe Roger is working on a mechanism
for guests to query for non-populated RAM space, which would be suitable for
use by PV drivers.

  Paul

> ~Andrew


Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages

Posted by Jan Beulich 1 week ago
On 31.07.2020 12:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
> map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> when PV drivers running in a guest populate the BAR space of the Xen Platform
> PCI Device with pages such as the Shared Info page or Grant Table pages,
> accesses to these pages will be cachable.
> 
> However, should IOMMU mappings be enabled for the guest then these
> accesses become uncachable. This has a substantial negative effect on I/O
> throughput of PV devices. Arguably PV drivers should not be using BAR space to
> host the Shared Info and Grant Table pages but it is currently commonplace for
> them to do this and so this problem needs mitigation. Hence this patch makes
> sure the 'ipat' bit is set for any special page regardless of where in GFN
> space it is mapped.
> 
> NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
>       that there is any similar mitigation possible for AMD NPT. Downstreams
>       such as Citrix XenServer have been carrying a patch similar to this for
>       several releases though.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

However, ...

> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -830,7 +830,8 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
>          return MTRR_TYPE_UNCACHABLE;
>      }
>  
> -    if ( !is_iommu_enabled(d) && !cache_flush_permitted(d) )
> +    if ( (!is_iommu_enabled(d) && !cache_flush_permitted(d)) ||
> +         is_special_page(mfn_to_page(mfn)) )
>      {
>          *ipat = 1;
>          return MTRR_TYPE_WRBACK;

... shouldn't we leverage this (right away?) to do away with the
APIC access page special case a few lines up from here? It is my
understanding that vmx_alloc_vlapic_mapping() uses
set_mmio_p2m_entry() just in order to get ipat set on the resulting
EPT entry. Yet with the allocation using MEMF_no_refcount, this will
now be the result even if no artificial MMIO mapping was created.

Jan

RE: [PATCH] x86/hvm: set 'ipat' in EPT for special pages

Posted by Paul Durrant 1 week ago
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 31 July 2020 12:15
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
> 
> On 31.07.2020 12:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
> > map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> > when PV drivers running in a guest populate the BAR space of the Xen Platform
> > PCI Device with pages such as the Shared Info page or Grant Table pages,
> > accesses to these pages will be cachable.
> >
> > However, should IOMMU mappings be enabled for the guest then these
> > accesses become uncachable. This has a substantial negative effect on I/O
> > throughput of PV devices. Arguably PV drivers should not be using BAR space to
> > host the Shared Info and Grant Table pages but it is currently commonplace for
> > them to do this and so this problem needs mitigation. Hence this patch makes
> > sure the 'ipat' bit is set for any special page regardless of where in GFN
> > space it is mapped.
> >
> > NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
> >       that there is any similar mitigation possible for AMD NPT. Downstreams
> >       such as Citrix XenServer have been carrying a patch similar to this for
> >       several releases though.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> However, ...
> 
> > --- a/xen/arch/x86/hvm/mtrr.c
> > +++ b/xen/arch/x86/hvm/mtrr.c
> > @@ -830,7 +830,8 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
> >          return MTRR_TYPE_UNCACHABLE;
> >      }
> >
> > -    if ( !is_iommu_enabled(d) && !cache_flush_permitted(d) )
> > +    if ( (!is_iommu_enabled(d) && !cache_flush_permitted(d)) ||
> > +         is_special_page(mfn_to_page(mfn)) )
> >      {
> >          *ipat = 1;
> >          return MTRR_TYPE_WRBACK;
> 
> ... shouldn't we leverage this (right away?) to do away with the
> APIC access page special case a few lines up from here? It is my
> understanding that vmx_alloc_vlapic_mapping() uses
> set_mmio_p2m_entry() just in order to get ipat set on the resulting
> EPT entry. Yet with the allocation using MEMF_no_refcount, this will
> now be the result even if no artificial MMIO mapping was created.

That's a good point. Best handled by a separate patch, I think, so I'll
re-send this with your R-b plus a follow-up patch as a v2.

  Paul

> 
> Jan