Now that we guard the entire Xen VA space against speculative abuse
through hypervisor accesses to guest memory, the argument translation
area's VA also needs to live outside this range, at least for 32-bit PV
guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
uniformly.

While this could be conditionalized upon CONFIG_PV32 &&
CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
keeps the code more legible imo.

Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
                (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
                 l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
     }
+
+    /* Slot 511: Per-domain mappings mirror. */
+    if ( !is_pv_64bit_domain(d) )
+        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
+            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
 }

 bool fill_ro_mpt(mfn_t mfn)
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -159,11 +159,11 @@ extern unsigned char boot_edid_info[128]
  *   1:1 direct mapping of all physical memory.
 #endif
  * 0xffff880000000000 - 0xffffffffffffffff [120TB, PML4:272-511]
- *   PV: Guest-defined use.
+ *   PV (64-bit): Guest-defined use.
  * 0xffff880000000000 - 0xffffff7fffffffff [119.5TB, PML4:272-510]
  *   HVM/idle: continuation of 1:1 mapping
  * 0xffffff8000000000 - 0xffffffffffffffff [512GB, 2^39 bytes PML4:511]
- *   HVM/idle: unused
+ *   HVM / 32-bit PV: Secondary per-domain mappings.
  *
  * Compatibility guest area layout:
  * 0x0000000000000000 - 0x00000000f57fffff [3928MB, PML4:0]
@@ -242,6 +242,9 @@ extern unsigned char boot_edid_info[128]
 #endif
 #define DIRECTMAP_VIRT_END      (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)

+/* Slot 511: secondary per-domain mappings (for compat xlat area accesses). */
+#define PERDOMAIN2_VIRT_START   (PML4_ADDR(511))
+
 #ifndef __ASSEMBLY__

 #ifdef CONFIG_PV32
--- a/xen/include/asm-x86/x86_64/uaccess.h
+++ b/xen/include/asm-x86/x86_64/uaccess.h
@@ -1,7 +1,17 @@
 #ifndef __X86_64_UACCESS_H
 #define __X86_64_UACCESS_H

-#define COMPAT_ARG_XLAT_VIRT_BASE ((void *)ARG_XLAT_START(current))
+/*
+ * With CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, (apparent) PV guest accesses
+ * are prohibited from touching the Xen private VA range. The compat argument
+ * translation area, therefore, can't live within this range. Domains
+ * (potentially) in need of argument translation (32-bit PV, possibly HVM) get
+ * a secondary mapping installed, which needs to be used for such accesses in
+ * the PV case, and will also be used for HVM to avoid extra conditionals.
+ */
+#define COMPAT_ARG_XLAT_VIRT_BASE ((void *)ARG_XLAT_START(current) + \
+                                   (PERDOMAIN2_VIRT_START - \
+                                    PERDOMAIN_VIRT_START))
 #define COMPAT_ARG_XLAT_SIZE (2*PAGE_SIZE)

 struct vcpu;
 int setup_compat_arg_xlat(struct vcpu *v);
On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.

Since you are double mapping the per-domain virtual area, won't it make
more sense to map it just once outside of the Xen virtual space area?
(so it's always using PML4_ADDR(511))

Is there anything concerning in the per-domain area that should be
protected against speculative accesses?

Thanks, Roger.
On 22.02.2021 12:35, Roger Pau Monné wrote:
> On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
>> [...]
>
> Since you are double mapping the per-domain virtual area, won't it
> make more sense to map it just once outside of the Xen virtual space
> area? (so it's always using PML4_ADDR(511))

This would then require conditionals in paths using other parts of the
per-domain mappings for 64-bit PV, as the same range is under guest
control there.

> Is there anything concerning in the per-domain area that should be
> protected against speculative accesses?

First of all this is an unrelated question - I'm not changing what gets
accessed there, but only through which addresses these accesses happen.
What lives there are GDT/LDT mappings, map cache, and the argument
translation area. The guest has no control (or only very limited
control, in the GDT/LDT case) over the accesses made to this space.

Jan
On 22/02/2021 10:27, Jan Beulich wrote:
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.
>
> [...]
>
> +    /* Slot 511: Per-domain mappings mirror. */
> +    if ( !is_pv_64bit_domain(d) )
> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);

This virtual address is inside the extended directmap. You're going to
need to rearrange more things than just this, to make it safe.

While largely a theoretical risk as far as the directmap goes, there is
now a rather higher risk of colliding with the ERR_PTR() range. It's bad
enough that this infrastructure is inherently unsafe with 64-bit PV
guests.

~Andrew
On 22.02.2021 15:14, Andrew Cooper wrote:
> On 22/02/2021 10:27, Jan Beulich wrote:
>> [...]
>> +    /* Slot 511: Per-domain mappings mirror. */
>> +    if ( !is_pv_64bit_domain(d) )
>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
>
> This virtual address is inside the extended directmap.

No. That one covers only the range excluding the last L4 slot.

> You're going to
> need to rearrange more things than just this, to make it safe.

I specifically picked that entry because I don't think further
arrangements are needed.

> While largely a theoretical risk as far as the directmap goes, there is
> now a rather higher risk of colliding with the ERR_PTR() range.

The ERR_PTR() range is still _far_ away from the sub-ranges we use in
the per-domain area.

Jan
On 22/02/2021 14:22, Jan Beulich wrote:
> On 22.02.2021 15:14, Andrew Cooper wrote:
>> [...]
>> This virtual address is inside the extended directmap.
>
> No. That one covers only the range excluding the last L4 slot.
>
>> You're going to
>> need to rearrange more things than just this, to make it safe.
>
> I specifically picked that entry because I don't think further
> arrangements are needed.

map_domain_page() will blindly hand back this virtual address if an
appropriate mfn is passed, because there are no suitability checks.

The error handling isn't great, but at least any attempt to use that
pointer would fault, which is now no longer the case.

LA57 machines can have RAM or NVDIMMs in a range which will tickle this
bug. In fact, they can have MFNs which would wrap around 0 into guest
space.

~Andrew
On 22.02.2021 17:47, Andrew Cooper wrote:
> On 22/02/2021 14:22, Jan Beulich wrote:
>> [...]
>> I specifically picked that entry because I don't think further
>> arrangements are needed.
>
> map_domain_page() will blindly hand back this virtual address if an
> appropriate mfn is passed, because there are no suitability checks.
>
> The error handling isn't great, but at least any attempt to use that
> pointer would fault, which is now no longer the case.
>
> LA57 machines can have RAM or NVDIMMs in a range which will tickle this
> bug. In fact, they can have MFNs which would wrap around 0 into guest
> space.

This latter fact would be a far worse problem than accesses through the
last L4 entry, populated or not. However, I don't really follow your
concern: There are ample cases where functions assume to be passed sane
arguments. A pretty good (imo) comparison here is with mfn_to_page(),
which also assumes a sane MFN (i.e. one with a value representable in
frame_table[]). If there was a bug, it would be either the caller taking
an MFN out of thin air, or us introducing MFNs we can't cover in any of
direct map, frame table, or M2P. But afaict there is guarding against
the latter (look for the "Ignoring inaccessible memory range" log
messages in setup.c).

In any event - imo any such bug would need fixing there, rather than
being an argument against the change here.

Also, besides your objection going quite a bit too far for my taste, I
miss any form of an alternative suggestion. Do you want the mirror range
to be put below the canonical boundary? Taking your wrapping
consideration into account, about _any_ VA would be unsuitable.

Jan
On 23/02/2021 07:13, Jan Beulich wrote:
> On 22.02.2021 17:47, Andrew Cooper wrote:
>> [...]
>> LA57 machines can have RAM or NVDIMMs in a range which will tickle this
>> bug. In fact, they can have MFNs which would wrap around 0 into guest
>> space.
>
> This latter fact would be a far worse problem than accesses through
> the last L4 entry, populated or not. However, I don't really follow
> your concern: There are ample cases where functions assume to be
> passed sane arguments. [...] But afaict there is guarding against
> the latter (look for the "Ignoring inaccessible memory range" log
> messages in setup.c).

I'm not trying to say that this patch has introduced the bug. But we
should absolutely have defence in depth where appropriate.

I don't mind if it is an unrelated change, but 4.15 does start trying to
introduce support for IceLake, and I think this qualifies as a
reasonable precaution to add.

> In any event - imo any such bug would need fixing there, rather
> than being an argument against the change here.
>
> Also, besides your objection going quite a bit too far for my taste,
> I miss any form of an alternative suggestion. Do you want the mirror
> range to be put below the canonical boundary? Taking in mind your
> wrapping consideration, about _any_ VA would be unsuitable.

Honestly, I want the XLAT area to disappear entirely. This is partly
PTSD from the acquire_resource fixes, but was an opinion held from
before that series as well, and I'm convinced that the result without
XLAT will be clearer and faster code. Obviously, that's not an option
at this point in 4.15.

I'd forgotten that the lower half of the address space was available,
and I do prefer that idea. We don't need to put everything adjacent
together in the upper half.

~Andrew
On Mon, Feb 22, 2021 at 04:47:38PM +0000, Andrew Cooper wrote:
> On 22/02/2021 14:22, Jan Beulich wrote:
>> [...]
>> I specifically picked that entry because I don't think further
>> arrangements are needed.
>
> map_domain_page() will blindly hand this virtual address if an
> appropriate mfn is passed, because there are no suitability checks.
>
> The error handling isn't great, but at least any attempt to use that
> pointer would fault, which is now no longer the case.

AFAICT map_domain_page will never populate the error page virtual
address, as the slot end (MAPCACHE_VIRT_END) is way lower than
-MAX_ERRNO? We could add:

BUILD_BUG_ON((PERDOMAIN_VIRT_SLOT(PERDOMAIN_SLOTS) - 1) >=
             (unsigned long)-MAX_ERRNO);

somewhere, for safety.

Roger.
Jan Beulich writes ("[PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV"):
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.
>
> [...]
>
> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Despite the fact that this is trying to fix the current breakage, I
would still want to see a full maintainer review.

Thanks, Ian.