Addressing a few assorted aspects I've noticed during the investigations /
reviews.

1: mod_l<N>_entry() have no need to use __copy_from_user()
2: rename and tidy create_pae_xen_mappings()
3: re-order a few conditionals

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
mod_l1_entry()'s need to do so went away with commit 2d0557c5cb ("x86:
Fold page_info lock into type_info"), and the other three never had such
a need, at least going back as far as 3.2.0. Replace the uses by newly
introduced l<N>e_access_once().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Use ACCESS_ONCE() clones.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2124,13 +2124,10 @@ static int mod_l1_entry(l1_pgentry_t *pl
                         struct vcpu *pt_vcpu, struct domain *pg_dom)
 {
     bool preserve_ad = (cmd == MMU_PT_UPDATE_PRESERVE_AD);
-    l1_pgentry_t ol1e;
+    l1_pgentry_t ol1e = l1e_access_once(*pl1e);
     struct domain *pt_dom = pt_vcpu->domain;
     int rc = 0;
 
-    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
-        return -EFAULT;
-
     ASSERT(!paging_mode_refcounts(pt_dom));
 
     if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
@@ -2248,8 +2245,7 @@ static int mod_l2_entry(l2_pgentry_t *pl
         return -EPERM;
     }
 
-    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
-        return -EFAULT;
+    ol2e = l2e_access_once(*pl2e);
 
     if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
     {
@@ -2311,8 +2307,7 @@ static int mod_l3_entry(l3_pgentry_t *pl
     if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
         return -EINVAL;
 
-    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
-        return -EFAULT;
+    ol3e = l3e_access_once(*pl3e);
 
     if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
     {
@@ -2378,8 +2373,7 @@ static int mod_l4_entry(l4_pgentry_t *pl
         return -EINVAL;
     }
 
-    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
-        return -EFAULT;
+    ol4e = l4e_access_once(*pl4e);
 
     if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
     {
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -55,6 +55,16 @@
 #define l4e_write(l4ep, l4e) \
     pte_write(&l4e_get_intpte(*(l4ep)), l4e_get_intpte(l4e))
 
+/* Type-correct ACCESS_ONCE() wrappers for PTE accesses. */
+#define l1e_access_once(l1e) (*container_of(&ACCESS_ONCE(l1e_get_intpte(l1e)), \
+                                            volatile l1_pgentry_t, l1))
+#define l2e_access_once(l2e) (*container_of(&ACCESS_ONCE(l2e_get_intpte(l2e)), \
+                                            volatile l2_pgentry_t, l2))
+#define l3e_access_once(l3e) (*container_of(&ACCESS_ONCE(l3e_get_intpte(l3e)), \
+                                            volatile l3_pgentry_t, l3))
+#define l4e_access_once(l4e) (*container_of(&ACCESS_ONCE(l4e_get_intpte(l4e)), \
+                                            volatile l4_pgentry_t, l4))
+
 /* Get direct integer representation of a pte's contents (intpte_t). */
 #define l1e_get_intpte(x)         ((x).l1)
 #define l2e_get_intpte(x)         ((x).l2)
On 06/01/2020 15:34, Jan Beulich wrote:
> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -55,6 +55,16 @@
>  #define l4e_write(l4ep, l4e) \
>      pte_write(&l4e_get_intpte(*(l4ep)), l4e_get_intpte(l4e))
>
> +/* Type-correct ACCESS_ONCE() wrappers for PTE accesses. */
> +#define l1e_access_once(l1e) (*container_of(&ACCESS_ONCE(l1e_get_intpte(l1e)), \
> +                                            volatile l1_pgentry_t, l1))
> +#define l2e_access_once(l2e) (*container_of(&ACCESS_ONCE(l2e_get_intpte(l2e)), \
> +                                            volatile l2_pgentry_t, l2))
> +#define l3e_access_once(l3e) (*container_of(&ACCESS_ONCE(l3e_get_intpte(l3e)), \
> +                                            volatile l3_pgentry_t, l3))
> +#define l4e_access_once(l4e) (*container_of(&ACCESS_ONCE(l4e_get_intpte(l4e)), \
> +                                            volatile l4_pgentry_t, l4))

What's wrong with l?e_read_atomic() which already exist, and are already
used elsewhere?

If nothing, then Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
to save another round of posting.
On 06.01.2020 17:09, Andrew Cooper wrote:
> On 06/01/2020 15:34, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/page.h
>> +++ b/xen/include/asm-x86/page.h
>> @@ -55,6 +55,16 @@
>>  #define l4e_write(l4ep, l4e) \
>>      pte_write(&l4e_get_intpte(*(l4ep)), l4e_get_intpte(l4e))
>>
>> +/* Type-correct ACCESS_ONCE() wrappers for PTE accesses. */
>> +#define l1e_access_once(l1e) (*container_of(&ACCESS_ONCE(l1e_get_intpte(l1e)), \
>> +                                            volatile l1_pgentry_t, l1))
>> +#define l2e_access_once(l2e) (*container_of(&ACCESS_ONCE(l2e_get_intpte(l2e)), \
>> +                                            volatile l2_pgentry_t, l2))
>> +#define l3e_access_once(l3e) (*container_of(&ACCESS_ONCE(l3e_get_intpte(l3e)), \
>> +                                            volatile l3_pgentry_t, l3))
>> +#define l4e_access_once(l4e) (*container_of(&ACCESS_ONCE(l4e_get_intpte(l4e)), \
>> +                                            volatile l4_pgentry_t, l4))
>
> What's wrong with l?e_read_atomic() which already exist, and are already
> used elsewhere?

I did consider going that route, but predicted you would object to its
use here: Iirc you've previously voiced opinion in this direction
(perhaps not on the page table accessors themselves but the underlying
{read,write}_u<N>_atomic()).

> If nothing, then Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> to save another round of posting.

Let's get the above clarified - I'll be happy to switch to
l<N>e_read_atomic() if that's fine by you.

Jan
On 06/01/2020 16:18, Jan Beulich wrote:
> On 06.01.2020 17:09, Andrew Cooper wrote:
>> On 06/01/2020 15:34, Jan Beulich wrote:
>>> --- a/xen/include/asm-x86/page.h
>>> +++ b/xen/include/asm-x86/page.h
>>> @@ -55,6 +55,16 @@
>>>  #define l4e_write(l4ep, l4e) \
>>>      pte_write(&l4e_get_intpte(*(l4ep)), l4e_get_intpte(l4e))
>>>
>>> +/* Type-correct ACCESS_ONCE() wrappers for PTE accesses. */
>>> +#define l1e_access_once(l1e) (*container_of(&ACCESS_ONCE(l1e_get_intpte(l1e)), \
>>> +                                            volatile l1_pgentry_t, l1))
>>> +#define l2e_access_once(l2e) (*container_of(&ACCESS_ONCE(l2e_get_intpte(l2e)), \
>>> +                                            volatile l2_pgentry_t, l2))
>>> +#define l3e_access_once(l3e) (*container_of(&ACCESS_ONCE(l3e_get_intpte(l3e)), \
>>> +                                            volatile l3_pgentry_t, l3))
>>> +#define l4e_access_once(l4e) (*container_of(&ACCESS_ONCE(l4e_get_intpte(l4e)), \
>>> +                                            volatile l4_pgentry_t, l4))
>> What's wrong with l?e_read_atomic() which already exist, and are already
>> used elsewhere?
> I did consider going that route, but predicted you would object to its
> use here: Iirc you've previously voiced opinion in this direction
> (perhaps not on the page table accessors themselves but the underlying
> {read,write}_u<N>_atomic()).
>
>> If nothing, then Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> to save another round of posting.
> Let's get the above clarified - I'll be happy to switch to
> l<N>e_read_atomic() if that's fine by you.

I'd definitely prefer reusing l?e_read_atomic() than introducing a new
set of helpers.

I have two key issues with the general _atomic() infrastructure.

First, the term is overloaded for two very different things. We use it
both for "don't subdivide this read/write" and "use a locked update",
where the latter is what is expected by the name "atomic".

Second, and specifically with {read,write}_u<N>_atomic(), it is their
construction using macros which makes them impossible to locate in the
source code. This can trivially be fixed by not using macros.

(If it were up to me, the use of ## would be disallowed in general,
because it does very little but to obfuscate code.)

~Andrew
After dad74b0f9e ("i386: fix handling of Xen entries in final L2 page
table") and the removal of 32-bit support the function doesn't modify
state anymore, and hence its name has been misleading. Change its name,
constify parameters and a local variable, and make it return bool.

Also drop the call to it from mod_l3_entry(): The function explicitly
disallows 32-bit domains to modify slot 3. This way we also won't
re-check slot 3 when a slot other than slot 3 changes. Doing so has
needlessly disallowed making some L2 table recursively link back to an
L2 used in some L3's 3rd slot, as we check for the type ref count to be
1. (Note that allowing dynamic changes of L3 entries in the way we do is
bogus anyway, as that's not how L3s behave in the native and EPT cases:
They get re-evaluated only upon CR3 reloads. NPT is different in this
regard.)

As a result of this we no longer need to play games to get at the start
of the L3 table.

Additionally move the single remaining call site, allowing to drop one
is_pv_32bit_domain() invocation and a _PAGE_PRESENT check (in the
function itself) as well as to exit the loop early (remaining entries
have all ben set to empty just ahead of this loop).

Further move a BUG_ON() such that in the common case its condition
wouldn't need evaluating.

Finally, since we're at it, move init_xen_pae_l2_slots() next to the
renamed function, as they really belong together (in fact
init_xen_pae_l2_slots() was [indirectly] broken out of this function).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Refine description. Drop an ASSERT(). Add a comment ahead of the
    function.
---
We could go further here and delete the function altogether: There are
no linear mappings in a PGT_pae_xen_l2 table anymore (this was on 32-bit
only). The corresponding conditional in mod_l3_entry() could then go
away as well (or, more precisely, would need to be replaced by correct
handling of 3rd slot updates). This would mean that a 32-bit guest
functioning on new Xen may fail to work on older (possibly 32-bit) Xen.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1414,23 +1414,22 @@ static int promote_l1_table(struct page_
     return ret;
 }
 
-static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+/*
+ * Note: The checks performed by this function are just to enforce a
+ * legacy restriction necessary on 32-bit hosts. There's not much point in
+ * relaxing (dropping) this though, as 32-bit guests would still need to
+ * conform to the original restrictions in order to be able to run on (old)
+ * 32-bit Xen.
+ */
+static bool pae_xen_mappings_check(const struct domain *d,
+                                   const l3_pgentry_t *pl3e)
 {
-    struct page_info *page;
-    l3_pgentry_t l3e3;
-
-    if ( !is_pv_32bit_domain(d) )
-        return 1;
-
-    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
-
-    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
-    l3e3 = pl3e[3];
-    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
-    {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
-        return 0;
-    }
+    /*
+     * 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist,
+     * which our caller has already verified.
+     */
+    l3_pgentry_t l3e3 = pl3e[3];
+    const struct page_info *page = l3e_get_page(l3e3);
 
     /*
      * The Xen-private mappings include linear mappings. The L2 thus cannot
@@ -1441,17 +1440,24 @@ static int create_pae_xen_mappings(struc
      * a. promote_l3_table() calls this function and this check will fail
      * b. mod_l3_entry() disallows updates to slot 3 in an existing table
      */
-    page = l3e_get_page(l3e3);
     BUG_ON(page->u.inuse.type_info & PGT_pinned);
-    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
     BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
     if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
     {
+        BUG_ON(!(page->u.inuse.type_info & PGT_count_mask));
         gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
-        return 0;
+        return false;
     }
 
-    return 1;
+    return true;
+}
+
+void init_xen_pae_l2_slots(l2_pgentry_t *l2t, const struct domain *d)
+{
+    memcpy(&l2t[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
+           &compat_idle_pg_table_l2[
+               l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
+           COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*l2t));
 }
 
 static int promote_l2_table(struct page_info *page, unsigned long type)
@@ -1592,6 +1598,16 @@ static int promote_l3_table(struct page_
                 l3e_get_mfn(l3e),
                 PGT_l2_page_table | PGT_pae_xen_l2, d,
                 partial_flags | PTF_preemptible | PTF_retain_ref_on_restart);
+
+            if ( !rc )
+            {
+                if ( pae_xen_mappings_check(d, pl3e) )
+                {
+                    pl3e[i] = adjust_guest_l3e(l3e, d);
+                    break;
+                }
+                rc = -EINVAL;
+            }
         }
         else if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
         {
@@ -1621,8 +1637,6 @@ static int promote_l3_table(struct page_
             pl3e[i] = adjust_guest_l3e(l3e, d);
     }
 
-    if ( !rc && !create_pae_xen_mappings(d, pl3e) )
-        rc = -EINVAL;
     if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
     {
         gdprintk(XENLOG_WARNING,
@@ -1663,14 +1677,6 @@ static int promote_l3_table(struct page_
     unmap_domain_page(pl3e);
     return rc;
 }
-
-void init_xen_pae_l2_slots(l2_pgentry_t *l2t, const struct domain *d)
-{
-    memcpy(&l2t[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
-           &compat_idle_pg_table_l2[
-               l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
-           COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*l2t));
-}
 #endif /* CONFIG_PV */
 
 /*
@@ -2347,10 +2353,6 @@ static int mod_l3_entry(l3_pgentry_t *pl
         return -EFAULT;
     }
 
-    if ( likely(rc == 0) )
-        if ( !create_pae_xen_mappings(d, pl3e) )
-            BUG();
-
     put_page_from_l3e(ol3e, mfn, PTF_defer);
     return rc;
 }
On 06/01/2020 15:35, Jan Beulich wrote:
> After dad74b0f9e ("i386: fix handling of Xen entries in final L2 page
> table") and the removal of 32-bit support the function doesn't modify
> state anymore, and hence its name has been misleading. Change its name,
> constify parameters and a local variable, and make it return bool.
>
> Also drop the call to it from mod_l3_entry(): The function explicitly
> disallows 32-bit domains to modify slot 3. This way we also won't
> re-check slot 3 when a slot other than slot 3 changes. Doing so has
> needlessly disallowed making some L2 table recursively link back to an
> L2 used in some L3's 3rd slot, as we check for the type ref count to be
> 1. (Note that allowing dynamic changes of L3 entries in the way we do is
> bogus anyway, as that's not how L3s behave in the native and EPT cases:
> They get re-evaluated only upon CR3 reloads. NPT is different in this
> regard.)
>
> As a result of this we no longer need to play games to get at the start
> of the L3 table.
>
> Additionally move the single remaining call site, allowing to drop one
> is_pv_32bit_domain() invocation and a _PAGE_PRESENT check (in the
> function itself) as well as to exit the loop early (remaining entries
> have all ben set to empty just ahead of this loop).

been.

> Further move a BUG_ON() such that in the common case its condition
> wouldn't need evaluating.
>
> Finally, since we're at it, move init_xen_pae_l2_slots() next to the
> renamed function, as they really belong together (in fact
> init_xen_pae_l2_slots() was [indirectly] broken out of this function).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
is_{hvm,pv}_*() can be expensive now, so where possible evaluate cheaper
conditions first.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.
---
I couldn't really decide whether to drop the two involved unlikely().

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1588,7 +1588,7 @@ static int promote_l3_table(struct page_
         if ( i > page->nr_validated_ptes && hypercall_preempt_check() )
             rc = -EINTR;
-        else if ( is_pv_32bit_domain(d) && (i == 3) )
+        else if ( i == 3 && is_pv_32bit_domain(d) )
         {
             if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) ||
                  (l3e_get_flags(l3e) & l3_disallow_mask(d)) )
                 rc = -EINVAL;
@@ -2310,7 +2310,7 @@ static int mod_l3_entry(l3_pgentry_t *pl
      * Disallow updates to final L3 slot. It contains Xen mappings, and it
      * would be a pain to ensure they remain continuously valid throughout.
      */
-    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
+    if ( pgentry_ptr_to_slot(pl3e) >= 3 && is_pv_32bit_domain(d) )
         return -EINVAL;
 
     ol3e = l3e_access_once(*pl3e);
@@ -2470,7 +2470,7 @@ static int cleanup_page_mappings(struct
 {
     struct domain *d = page_get_owner(page);
 
-    if ( d && is_pv_domain(d) && unlikely(need_iommu_pt_sync(d)) )
+    if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
     {
         int rc2 = iommu_legacy_unmap(d, _dfn(mfn), PAGE_ORDER_4K);
 
@@ -2984,7 +2984,7 @@ static int _get_page_type(struct page_in
         /* Special pages should not be accessible from devices. */
         struct domain *d = page_get_owner(page);
 
-        if ( d && is_pv_domain(d) && unlikely(need_iommu_pt_sync(d)) )
+        if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
         {
             mfn_t mfn = page_to_mfn(page);
 
On 06/01/2020 15:35, Jan Beulich wrote:
> is_{hvm,pv}_*() can be expensive now, so where possible evaluate cheaper
> conditions first.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
> v2: New.
> ---
> I couldn't really decide whether to drop the two involved unlikely().

Personally, I don't think we should have any likely/unlikely annotations
at all. They are difficult for humans to reason about (especially when
you're in a nested clause of an annotated condition) - several of them
are(/were) wrong, and plenty are dubious.

People who actually care should be using PGO. This is yet another
toolchain feature I'm hoping that we will get "for free" from Anthony's
work to switch to using kbuild.

If we were to delete all likely/unlikely annotations, and someone could
then measure a performance improvement from reinserting some of them,
then perhaps it would be ok to keep a few around, but my gut feeling is
that the compiler can do a better job in general than humans can.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel