[v1] x86: mirror compat argument translation area for 32-bit PV

[PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Jan Beulich 3 years, 9 months ago

Now that we guard the entire Xen VA space against speculative abuse
through hypervisor accesses to guest memory, the argument translation
area's VA also needs to live outside this range, at least for 32-bit PV
guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
uniformly.

While this could be conditionalized upon CONFIG_PV32 &&
CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
keeps the code more legible imo.

Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
                (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
                 l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
     }
+
+    /* Slot 511: Per-domain mappings mirror. */
+    if ( !is_pv_64bit_domain(d) )
+        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
+            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
 }
 
 bool fill_ro_mpt(mfn_t mfn)
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -159,11 +159,11 @@ extern unsigned char boot_edid_info[128]
  *    1:1 direct mapping of all physical memory.
 #endif
  *  0xffff880000000000 - 0xffffffffffffffff [120TB,             PML4:272-511]
- *    PV: Guest-defined use.
+ *    PV (64-bit): Guest-defined use.
  *  0xffff880000000000 - 0xffffff7fffffffff [119.5TB,           PML4:272-510]
  *    HVM/idle: continuation of 1:1 mapping
  *  0xffffff8000000000 - 0xffffffffffffffff [512GB, 2^39 bytes  PML4:511]
- *    HVM/idle: unused
+ *    HVM / 32-bit PV: Secondary per-domain mappings.
  *
  * Compatibility guest area layout:
  *  0x0000000000000000 - 0x00000000f57fffff [3928MB,            PML4:0]
@@ -242,6 +242,9 @@ extern unsigned char boot_edid_info[128]
 #endif
 #define DIRECTMAP_VIRT_END      (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)
 
+/* Slot 511: secondary per-domain mappings (for compat xlat area accesses). */
+#define PERDOMAIN2_VIRT_START   (PML4_ADDR(511))
+
 #ifndef __ASSEMBLY__
 
 #ifdef CONFIG_PV32
--- a/xen/include/asm-x86/x86_64/uaccess.h
+++ b/xen/include/asm-x86/x86_64/uaccess.h
@@ -1,7 +1,17 @@
 #ifndef __X86_64_UACCESS_H
 #define __X86_64_UACCESS_H
 
-#define COMPAT_ARG_XLAT_VIRT_BASE ((void *)ARG_XLAT_START(current))
+/*
+ * With CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS (apparent) PV guest accesses
+ * are prohibited to touch the Xen private VA range.  The compat argument
+ * translation area, therefore, can't live within this range.  Domains
+ * (potentially) in need of argument translation (32-bit PV, possibly HVM) get
+ * a secondary mapping installed, which needs to be used for such accesses in
+ * the PV case, and will also be used for HVM to avoid extra conditionals.
+ */
+#define COMPAT_ARG_XLAT_VIRT_BASE ((void *)ARG_XLAT_START(current) + \
+                                   (PERDOMAIN2_VIRT_START - \
+                                    PERDOMAIN_VIRT_START))
 #define COMPAT_ARG_XLAT_SIZE      (2*PAGE_SIZE)
 struct vcpu;
 int setup_compat_arg_xlat(struct vcpu *v);

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Roger Pau Monné 3 years, 9 months ago

On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.

Since you are double mapping the per-domain virtual area, won't it
make more sense to map it just once outside of the Xen virtual space
area? (so it's always using PML4_ADDR(511))

Is there anything concerning in the per-domain area that should be
protected against speculative accesses?

Thanks, Roger.

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Jan Beulich 3 years, 9 months ago

On 22.02.2021 12:35, Roger Pau Monné wrote:
> On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
>> Now that we guard the entire Xen VA space against speculative abuse
>> through hypervisor accesses to guest memory, the argument translation
>> area's VA also needs to live outside this range, at least for 32-bit PV
>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>> uniformly.
> 
> Since you are double mapping the per-domain virtual area, won't it
> make more sense to map it just once outside of the Xen virtual space
> area? (so it's always using PML4_ADDR(511))

This would then require conditionals in paths using other parts of
the per-domain mappings for 64-bit PV, as the same range is under
guest control there.

> Is there anything concerning in the per-domain area that should be
> protected against speculative accesses?

First of all this is an unrelated question - I'm not changing what
gets accessed there, but only through which addresses these accesses
happen. What lives there are GDT/LDT mappings, map cache, and the
argument translation area. The guest has no control (or very limited
when considering GDT/LDT one) over the accesses made to this space.

Jan

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Andrew Cooper 3 years, 9 months ago

On 22/02/2021 10:27, Jan Beulich wrote:
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.
>
> While this could be conditionalized upon CONFIG_PV32 &&
> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
> keeps the code more legible imo.
>
> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>      }
> +
> +    /* Slot 511: Per-domain mappings mirror. */
> +    if ( !is_pv_64bit_domain(d) )
> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);

This virtual address is inside the extended directmap.  You're going to
need to rearrange more things than just this, to make it safe.

While largely a theoretical risk as far as the directmap goes, there is
now a rather higher risk of colliding with the ERR_PTR() range.  Its bad
enough this infrastructure is inherently unsafe with 64bit PV guests,

~Andrew

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Jan Beulich 3 years, 9 months ago

On 22.02.2021 15:14, Andrew Cooper wrote:
> On 22/02/2021 10:27, Jan Beulich wrote:
>> Now that we guard the entire Xen VA space against speculative abuse
>> through hypervisor accesses to guest memory, the argument translation
>> area's VA also needs to live outside this range, at least for 32-bit PV
>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>> uniformly.
>>
>> While this could be conditionalized upon CONFIG_PV32 &&
>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>> keeps the code more legible imo.
>>
>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>      }
>> +
>> +    /* Slot 511: Per-domain mappings mirror. */
>> +    if ( !is_pv_64bit_domain(d) )
>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
> 
> This virtual address is inside the extended directmap.

No. That one covers only the range excluding the last L4 slot.

>  You're going to
> need to rearrange more things than just this, to make it safe.

I specifically picked that entry because I don't think further
arrangements are needed.

> While largely a theoretical risk as far as the directmap goes, there is
> now a rather higher risk of colliding with the ERR_PTR() range.  Its bad
> enough this infrastructure is inherently unsafe with 64bit PV guests,

The ERR_PTR() range is still _far_ away from the sub-ranges we
use in the per-domain area.

Jan

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Andrew Cooper 3 years, 9 months ago

On 22/02/2021 14:22, Jan Beulich wrote:
> On 22.02.2021 15:14, Andrew Cooper wrote:
>> On 22/02/2021 10:27, Jan Beulich wrote:
>>> Now that we guard the entire Xen VA space against speculative abuse
>>> through hypervisor accesses to guest memory, the argument translation
>>> area's VA also needs to live outside this range, at least for 32-bit PV
>>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>>> uniformly.
>>>
>>> While this could be conditionalized upon CONFIG_PV32 &&
>>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>>> keeps the code more legible imo.
>>>
>>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>>      }
>>> +
>>> +    /* Slot 511: Per-domain mappings mirror. */
>>> +    if ( !is_pv_64bit_domain(d) )
>>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
>> This virtual address is inside the extended directmap.
> No. That one covers only the range excluding the last L4 slot.
>
>>   You're going to
>> need to rearrange more things than just this, to make it safe.
> I specifically picked that entry because I don't think further
> arrangements are needed.

map_domain_page() will blindly hand this virtual address if an
appropriate mfn is passed, because there are no suitability checks.

The error handling isn't great, but at least any attempt to use that
pointer would fault, which is now no longer the case.

LA57 machines can have RAM or NVDIMMs in a range which will tickle this
bug.  In fact, they can have MFNs which would wrap around 0 into guest
space.

~Andrew

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Jan Beulich 3 years, 9 months ago

On 22.02.2021 17:47, Andrew Cooper wrote:
> On 22/02/2021 14:22, Jan Beulich wrote:
>> On 22.02.2021 15:14, Andrew Cooper wrote:
>>> On 22/02/2021 10:27, Jan Beulich wrote:
>>>> Now that we guard the entire Xen VA space against speculative abuse
>>>> through hypervisor accesses to guest memory, the argument translation
>>>> area's VA also needs to live outside this range, at least for 32-bit PV
>>>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>>>> uniformly.
>>>>
>>>> While this could be conditionalized upon CONFIG_PV32 &&
>>>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>>>> keeps the code more legible imo.
>>>>
>>>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>>>      }
>>>> +
>>>> +    /* Slot 511: Per-domain mappings mirror. */
>>>> +    if ( !is_pv_64bit_domain(d) )
>>>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>>>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
>>> This virtual address is inside the extended directmap.
>> No. That one covers only the range excluding the last L4 slot.
>>
>>>   You're going to
>>> need to rearrange more things than just this, to make it safe.
>> I specifically picked that entry because I don't think further
>> arrangements are needed.
> 
> map_domain_page() will blindly hand this virtual address if an
> appropriate mfn is passed, because there are no suitability checks.
> 
> The error handling isn't great, but at least any attempt to use that
> pointer would fault, which is now no longer the case.
> 
> LA57 machines can have RAM or NVDIMMs in a range which will tickle this
> bug.  In fact, they can have MFNs which would wrap around 0 into guest
> space.

This latter fact would be a far worse problem than accesses through
the last L4 entry, populated or not. However, I don't really follow
your concern: There are ample cases where functions assume to be
passed sane arguments. A pretty good (imo) comparison here is with
mfn_to_page(), which also will assume a sane MFN (i.e. one with a
representable (in frame_table[]) value. If there was a bug, it
would be either the caller taking an MFN out of thin air, or us
introducing MFNs we can't cover in any of direct map, frame table,
or M2P. But afaict there is guarding against the latter (look for
the "Ignoring inaccessible memory range" loge messages in setup.c).

In any event - imo any such bug would need fixing there, rather
than being an argument against the change here.

Also, besides your objection going quite a bit too far for my taste,
I miss any form of an alternative suggestion. Do you want the mirror
range to be put below the canonical boundary? Taking in mind your
wrapping consideration, about _any_ VA would be unsuitable.

Jan

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Andrew Cooper 3 years, 9 months ago

On 23/02/2021 07:13, Jan Beulich wrote:
> On 22.02.2021 17:47, Andrew Cooper wrote:
>> On 22/02/2021 14:22, Jan Beulich wrote:
>>> On 22.02.2021 15:14, Andrew Cooper wrote:
>>>> On 22/02/2021 10:27, Jan Beulich wrote:
>>>>> Now that we guard the entire Xen VA space against speculative abuse
>>>>> through hypervisor accesses to guest memory, the argument translation
>>>>> area's VA also needs to live outside this range, at least for 32-bit PV
>>>>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>>>>> uniformly.
>>>>>
>>>>> While this could be conditionalized upon CONFIG_PV32 &&
>>>>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>>>>> keeps the code more legible imo.
>>>>>
>>>>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> --- a/xen/arch/x86/mm.c
>>>>> +++ b/xen/arch/x86/mm.c
>>>>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>>>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>>>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>>>>      }
>>>>> +
>>>>> +    /* Slot 511: Per-domain mappings mirror. */
>>>>> +    if ( !is_pv_64bit_domain(d) )
>>>>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>>>>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
>>>> This virtual address is inside the extended directmap.
>>> No. That one covers only the range excluding the last L4 slot.
>>>
>>>>   You're going to
>>>> need to rearrange more things than just this, to make it safe.
>>> I specifically picked that entry because I don't think further
>>> arrangements are needed.
>> map_domain_page() will blindly hand this virtual address if an
>> appropriate mfn is passed, because there are no suitability checks.
>>
>> The error handling isn't great, but at least any attempt to use that
>> pointer would fault, which is now no longer the case.
>>
>> LA57 machines can have RAM or NVDIMMs in a range which will tickle this
>> bug.  In fact, they can have MFNs which would wrap around 0 into guest
>> space.
> This latter fact would be a far worse problem than accesses through
> the last L4 entry, populated or not. However, I don't really follow
> your concern: There are ample cases where functions assume to be
> passed sane arguments. A pretty good (imo) comparison here is with
> mfn_to_page(), which also will assume a sane MFN (i.e. one with a
> representable (in frame_table[]) value. If there was a bug, it
> would be either the caller taking an MFN out of thin air, or us
> introducing MFNs we can't cover in any of direct map, frame table,
> or M2P. But afaict there is guarding against the latter (look for
> the "Ignoring inaccessible memory range" loge messages in setup.c).

I'm not trying to say that this patch has introduced the bug.

But we should absolutely have defence in depth where appropriate.  I
don't mind if it is an unrelated change, but 4.15 does start trying to
introduce support for IceLake and I think this qualifies as a reasonable
precaution to add.

> In any event - imo any such bug would need fixing there, rather
> than being an argument against the change here.
>
> Also, besides your objection going quite a bit too far for my taste,
> I miss any form of an alternative suggestion. Do you want the mirror
> range to be put below the canonical boundary? Taking in mind your
> wrapping consideration, about _any_ VA would be unsuitable.

Honestly, I want the XLAT area to disappear entirely.  This is partly
PTSD from the acquire_resource fixes, but was an opinion held from
before that series as well, and I'm convinced that the result without
XLAT will be clearer and faster code.

Obviously, that's not an option at this point in 4.15.


I'd forgotten that the lower half of the address space was available,
and I do prefer that idea.  We don't need to put everything adjacent
together in the upper half.

~Andrew

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Roger Pau Monné 3 years, 9 months ago

On Mon, Feb 22, 2021 at 04:47:38PM +0000, Andrew Cooper wrote:
> On 22/02/2021 14:22, Jan Beulich wrote:
> > On 22.02.2021 15:14, Andrew Cooper wrote:
> >> On 22/02/2021 10:27, Jan Beulich wrote:
> >>> Now that we guard the entire Xen VA space against speculative abuse
> >>> through hypervisor accesses to guest memory, the argument translation
> >>> area's VA also needs to live outside this range, at least for 32-bit PV
> >>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> >>> uniformly.
> >>>
> >>> While this could be conditionalized upon CONFIG_PV32 &&
> >>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
> >>> keeps the code more legible imo.
> >>>
> >>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>>
> >>> --- a/xen/arch/x86/mm.c
> >>> +++ b/xen/arch/x86/mm.c
> >>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
> >>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
> >>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
> >>>      }
> >>> +
> >>> +    /* Slot 511: Per-domain mappings mirror. */
> >>> +    if ( !is_pv_64bit_domain(d) )
> >>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
> >>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
> >> This virtual address is inside the extended directmap.
> > No. That one covers only the range excluding the last L4 slot.
> >
> >>   You're going to
> >> need to rearrange more things than just this, to make it safe.
> > I specifically picked that entry because I don't think further
> > arrangements are needed.
> 
> map_domain_page() will blindly hand this virtual address if an
> appropriate mfn is passed, because there are no suitability checks.
> 
> The error handling isn't great, but at least any attempt to use that
> pointer would fault, which is now no longer the case.

AFAICT map_domain_page will never populate the error page virtual
address, as the slot end (MAPCACHE_VIRT_END) is way lower than
-MAX_ERRNO?

We could add:

BUILD_BUG_ON((PERDOMAIN_VIRT_SLOT(PERDOMAIN_SLOTS) - 1) >= (unsigned long)-MAX_ERRNO);

For safety somewhere.

Roger.

Re: [PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV

Posted by Ian Jackson 3 years, 9 months ago

Jan Beulich writes ("[PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV"):
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.
> 
> While this could be conditionalized upon CONFIG_PV32 &&
> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
> keeps the code more legible imo.
> 
> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Despite the fact that this is trying to fix the current breakage, I
would still want to see a full maintainer review.

Thanks,
Ian.