Especially the latter three patches provide only small possible gains,
from all I can tell. I nevertheless wanted to put up the entire set for
consideration.

1: consistently inline {,un}adjust_guest_l<N>e()
2: fold redundant calls to adjust_guest_l<N>e()
3: _PAGE_RW changes may take fast path of mod_l[234]_entry()
4: restrict TLB flushing after mod_l[234]_entry()
5: avoid TLB flushing after mod_l3_entry()

Jan
Commit 8a74707a7c ("x86/nospec: Use always_inline to fix code gen for
evaluate_nospec") converted inline to always_inline for
adjust_guest_l[134]e(), but left adjust_guest_l2e() and
unadjust_guest_l3e() alone without saying why these two would differ in
the needed / wanted treatment. Adjust these two as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Actually I question the need for always_inline here, for two reasons:

1) All that's guarded are updates to local variables, depending on
merely the values held by these local variables (really: function
arguments) on input. As a result it would look to me as if we wanted
evaluate_nospec()-free variants of is_pv{,_32bit}_domain() and alike,
to be used e.g. here.

2) These functions don't act as predicates, and hence the concern
expressed in said commit doesn't apply here: Callers wouldn't observe
misplaced LFENCEs, as they don't use the results of these helpers for
further (direct) code flow control.
---
So far I've observed only clang to not inline the two functions when
just "inline" is in place.

--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -99,8 +99,8 @@ static always_inline l1_pgentry_t adjust
     return l1e;
 }
 
-static inline l2_pgentry_t adjust_guest_l2e(l2_pgentry_t l2e,
-                                            const struct domain *d)
+static always_inline l2_pgentry_t adjust_guest_l2e(l2_pgentry_t l2e,
+                                                   const struct domain *d)
 {
     if ( likely(l2e_get_flags(l2e) & _PAGE_PRESENT) &&
          likely(!is_pv_32bit_domain(d)) )
@@ -119,8 +119,8 @@ static always_inline l3_pgentry_t adjust
     return l3e;
 }
 
-static inline l3_pgentry_t unadjust_guest_l3e(l3_pgentry_t l3e,
-                                              const struct domain *d)
+static always_inline l3_pgentry_t unadjust_guest_l3e(l3_pgentry_t l3e,
+                                                     const struct domain *d)
 {
     if ( unlikely(is_pv_32bit_domain(d)) &&
          likely(l3e_get_flags(l3e) & _PAGE_PRESENT) )
On Tue, Nov 03, 2020 at 11:56:16AM +0100, Jan Beulich wrote:
> Commit 8a74707a7c ("x86/nospec: Use always_inline to fix code gen for
> evaluate_nospec") converted inline to always_inline for
> adjust_guest_l[134]e(), but left adjust_guest_l2e() and
> unadjust_guest_l3e() alone without saying why these two would differ in
> the needed / wanted treatment. Adjust these two as well.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

I'm also unsure why those were left without always_inline. Maybe it's
because of the way they are used? Turning them into always_inline
shouldn't do any damage, so I think it's fine.

Thanks, Roger.
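[Editor's illustration — not part of the thread.] The effect of the conversion can be sketched outside Xen with a minimal, self-contained model. The macro mirrors the common `always_inline` idiom; the simplified entry type, the flag values, and the helper name `adjust_guest_l2e_model` are invented for illustration and are not Xen's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Common definition of always_inline: ask the compiler to inline even
 * when its cost heuristics (e.g. clang's, as Jan observed) would not. */
#define always_inline inline __attribute__((__always_inline__))

#define _PAGE_PRESENT 0x001U
#define _PAGE_USER    0x004U

typedef struct { uint64_t l2; } l2_pgentry_t;

/* Simplified stand-in for adjust_guest_l2e(): force _PAGE_USER onto
 * present entries of 64-bit PV guests.  With always_inline the 32-bit
 * check is folded into every caller rather than left as a call. */
static always_inline l2_pgentry_t adjust_guest_l2e_model(l2_pgentry_t l2e,
                                                         int is_32bit)
{
    if ( (l2e.l2 & _PAGE_PRESENT) && !is_32bit )
        l2e.l2 |= _PAGE_USER;
    return l2e;
}
```

The behavior is unchanged relative to plain `inline`; only the generated code differs, which is the whole point of the patch.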
At least from an abstract perspective it is quite odd for us to compare
adjusted old and unadjusted new page table entries when determining
whether the fast path can be used. This is largely benign because
FASTPATH_FLAG_WHITELIST covers most of the flags which the adjustments
may set, and the flags getting set don't affect the outcome of
get_page_from_l<N>e(). There's one exception: 32-bit L3 entries get
_PAGE_RW set, but get_page_from_l3e() doesn't allow linear page tables
to be created at this level for such guests. Apart from this _PAGE_RW
is unused by get_page_from_l<N>e() (for N > 1), and hence forcing the
bit on early has no functional effect.

The main reason for the change, however, is that adjust_guest_l<N>e()
aren't exactly cheap - both in terms of pure code size and because each
one has at least one evaluate_nospec() by way of containing
is_pv_32bit_domain() conditionals.

Call the functions once ahead of the fast path checks, instead of twice
after.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2199,10 +2199,11 @@ static int mod_l1_entry(l1_pgentry_t *pl
         nl1e = l1e_from_page(page, l1e_get_flags(nl1e));
     }
 
+    nl1e = adjust_guest_l1e(nl1e, pt_dom);
+
     /* Fast path for sufficiently-similar mappings. */
     if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
     {
-        nl1e = adjust_guest_l1e(nl1e, pt_dom);
         rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
                           preserve_ad);
         if ( page )
@@ -2227,7 +2228,6 @@ static int mod_l1_entry(l1_pgentry_t *pl
         if ( page )
             put_page(page);
 
-        nl1e = adjust_guest_l1e(nl1e, pt_dom);
         if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
                                     preserve_ad)) )
         {
@@ -2279,10 +2279,11 @@ static int mod_l2_entry(l2_pgentry_t *pl
         return -EINVAL;
     }
 
+    nl2e = adjust_guest_l2e(nl2e, d);
+
     /* Fast path for sufficiently-similar mappings. */
     if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
     {
-        nl2e = adjust_guest_l2e(nl2e, d);
         if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, mfn, vcpu, preserve_ad) )
             return 0;
         return -EBUSY;
@@ -2291,7 +2292,6 @@ static int mod_l2_entry(l2_pgentry_t *pl
     if ( unlikely((rc = get_page_from_l2e(nl2e, mfn, d, 0)) < 0) )
         return rc;
 
-    nl2e = adjust_guest_l2e(nl2e, d);
     if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, mfn, vcpu,
                                 preserve_ad)) )
     {
@@ -2341,10 +2341,11 @@ static int mod_l3_entry(l3_pgentry_t *pl
         return -EINVAL;
     }
 
+    nl3e = adjust_guest_l3e(nl3e, d);
+
     /* Fast path for sufficiently-similar mappings. */
     if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
     {
-        nl3e = adjust_guest_l3e(nl3e, d);
         rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, mfn, vcpu, preserve_ad);
         return rc ? 0 : -EFAULT;
     }
@@ -2354,7 +2355,6 @@ static int mod_l3_entry(l3_pgentry_t *pl
         return rc;
     rc = 0;
 
-    nl3e = adjust_guest_l3e(nl3e, d);
     if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, mfn, vcpu,
                                 preserve_ad)) )
     {
@@ -2403,10 +2403,11 @@ static int mod_l4_entry(l4_pgentry_t *pl
         return -EINVAL;
     }
 
+    nl4e = adjust_guest_l4e(nl4e, d);
+
     /* Fast path for sufficiently-similar mappings. */
     if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
     {
-        nl4e = adjust_guest_l4e(nl4e, d);
         rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, mfn, vcpu, preserve_ad);
         return rc ? 0 : -EFAULT;
     }
@@ -2416,7 +2417,6 @@ static int mod_l4_entry(l4_pgentry_t *pl
         return rc;
     rc = 0;
 
-    nl4e = adjust_guest_l4e(nl4e, d);
     if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, mfn, vcpu,
                                 preserve_ad)) )
     {
On Tue, Nov 03, 2020 at 11:56:44AM +0100, Jan Beulich wrote:
> At least from an abstract perspective it is quite odd for us to compare
> adjusted old and unadjusted new page table entries when determining
> whether the fast path can be used. This is largely benign because
> FASTPATH_FLAG_WHITELIST covers most of the flags which the adjustments
> may set, and the flags getting set don't affect the outcome of
> get_page_from_l<N>e(). There's one exception: 32-bit L3 entries get
> _PAGE_RW set, but get_page_from_l3e() doesn't allow linear page tables
> to be created at this level for such guests. Apart from this _PAGE_RW
> is unused by get_page_from_l<N>e() (for N > 1), and hence forcing the
> bit on early has no functional effect.
>
> The main reason for the change, however, is that adjust_guest_l<N>e()
> aren't exactly cheap - both in terms of pure code size and because each
> one has at least one evaluate_nospec() by way of containing
> is_pv_32bit_domain() conditionals.
>
> Call the functions once ahead of the fast path checks, instead of twice
> after.

I guess part of the reasoning for doing it that way is because you can
avoid the adjust_guest_l1e in the slow path if get_page_from_l1e fails?

In any case, adjust_guest_l1e was only called once from either the fast
or the slow paths.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
On 11.01.2021 11:36, Roger Pau Monné wrote:
> On Tue, Nov 03, 2020 at 11:56:44AM +0100, Jan Beulich wrote:
>> At least from an abstract perspective it is quite odd for us to compare
>> adjusted old and unadjusted new page table entries when determining
>> whether the fast path can be used. This is largely benign because
>> FASTPATH_FLAG_WHITELIST covers most of the flags which the adjustments
>> may set, and the flags getting set don't affect the outcome of
>> get_page_from_l<N>e(). There's one exception: 32-bit L3 entries get
>> _PAGE_RW set, but get_page_from_l3e() doesn't allow linear page tables
>> to be created at this level for such guests. Apart from this _PAGE_RW
>> is unused by get_page_from_l<N>e() (for N > 1), and hence forcing the
>> bit on early has no functional effect.
>>
>> The main reason for the change, however, is that adjust_guest_l<N>e()
>> aren't exactly cheap - both in terms of pure code size and because each
>> one has at least one evaluate_nospec() by way of containing
>> is_pv_32bit_domain() conditionals.
>>
>> Call the functions once ahead of the fast path checks, instead of twice
>> after.
>
> I guess part of the reasoning for doing it that way is because you can
> avoid the adjust_guest_l1e in the slow path if get_page_from_l1e
> fails?

Possibly, but this is specifically the not generally interesting code
path (in particular not one where I'd consider performance relevant).

> In any case, adjust_guest_l1e was only called once from either the
> fast or the slow paths.
>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan
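[Editor's illustration — not part of the thread.] The hoisting patch 2 performs can be modeled in isolation: the adjustment runs exactly once, before the fast-path comparison, so both paths install the same (adjusted) entry. Everything below — the flag values, the trimmed-down whitelist, and the `mod_entry()` skeleton — is an illustrative sketch, not Xen code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PRESENT  0x001ULL
#define _PAGE_RW       0x002ULL
#define _PAGE_USER     0x004ULL
#define _PAGE_ACCESSED 0x020ULL
#define _PAGE_DIRTY    0x040ULL

/* Stand-in for FASTPATH_FLAG_WHITELIST (a subset of the real mask). */
#define WHITELIST (_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_RW | _PAGE_USER)

/* Stand-in for adjust_guest_l<N>e(): force _PAGE_USER for 64-bit PV. */
static uint64_t adjust(uint64_t e, bool is_32bit)
{
    if ( (e & _PAGE_PRESENT) && !is_32bit )
        e |= _PAGE_USER;
    return e;
}

/* Stand-in for l<N>e_has_changed(): do o and n differ in mask bits? */
static bool has_changed(uint64_t o, uint64_t n, uint64_t mask)
{
    return ((o ^ n) & mask) != 0;
}

/* Shape of mod_l<N>_entry() after the patch: adjust once up front, then
 * compare adjusted-old against adjusted-new.  Returns true when the
 * fast path was taken. */
static bool mod_entry(uint64_t *pe, uint64_t ne, bool is_32bit)
{
    uint64_t oe = *pe;

    ne = adjust(ne, is_32bit);
    if ( !has_changed(oe, ne, ~WHITELIST) )
    {
        *pe = ne;       /* fast path: no re-validation needed */
        return true;
    }
    *pe = ne;           /* slow path: (re)validation would go here */
    return false;
}
```

Before the patch, `adjust()` would have been called once on each of the two paths, after the comparison; hoisting it removes the duplication and makes the comparison operate on like-for-like entries.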
The only time _PAGE_RW matters when validating an L2 or higher entry is
when a linear page table is tried to be installed. Therefore when we
disallow such at build time, we can allow _PAGE_RW changes to take the
fast paths there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2140,6 +2140,18 @@ static void l3t_unlock(struct page_info
     (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
      _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
 
+/*
+ * PDE flags that a guest may change without re-validating the PDE.
+ * All other bits affect translation, caching, or Xen's safety. When guest
+ * created linear page tables aren't allowed, intermediate page tables may
+ * have _PAGE_RW altered without this requiring re-validation.
+ */
+#ifndef CONFIG_PV_LINEAR_PT
+# define FASTPATH_PDE_FLAG_WHITELIST (FASTPATH_FLAG_WHITELIST | _PAGE_RW)
+#else
+# define FASTPATH_PDE_FLAG_WHITELIST FASTPATH_FLAG_WHITELIST
+#endif
+
 /* Update the L1 entry at pl1e to new value nl1e. */
 static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
                         mfn_t gl1mfn, unsigned int cmd,
@@ -2282,7 +2294,7 @@ static int mod_l2_entry(l2_pgentry_t *pl
     nl2e = adjust_guest_l2e(nl2e, d);
 
     /* Fast path for sufficiently-similar mappings. */
-    if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
+    if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_PDE_FLAG_WHITELIST) )
     {
         if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, mfn, vcpu, preserve_ad) )
             return 0;
@@ -2344,7 +2356,7 @@ static int mod_l3_entry(l3_pgentry_t *pl
     nl3e = adjust_guest_l3e(nl3e, d);
 
     /* Fast path for sufficiently-similar mappings. */
-    if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
+    if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_PDE_FLAG_WHITELIST) )
     {
         rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, mfn, vcpu, preserve_ad);
         return rc ? 0 : -EFAULT;
@@ -2406,7 +2418,7 @@ static int mod_l4_entry(l4_pgentry_t *pl
     nl4e = adjust_guest_l4e(nl4e, d);
 
     /* Fast path for sufficiently-similar mappings. */
-    if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
+    if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_PDE_FLAG_WHITELIST) )
     {
         rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, mfn, vcpu, preserve_ad);
         return rc ? 0 : -EFAULT;
On Tue, Nov 03, 2020 at 11:57:10AM +0100, Jan Beulich wrote:
> The only time _PAGE_RW matters when validating an L2 or higher entry is
> when a linear page table is tried to be installed. Therefore when we
> disallow such at build time, we can allow _PAGE_RW changes to take the
> fast paths there.

I think it would be helpful to note why PDEs with linear page tables
aren't allowed to have the RW bit set, likely here and in the comment
below.

Thanks, Roger.
On 11.01.2021 12:08, Roger Pau Monné wrote:
> On Tue, Nov 03, 2020 at 11:57:10AM +0100, Jan Beulich wrote:
>> The only time _PAGE_RW matters when validating an L2 or higher entry is
>> when a linear page table is tried to be installed. Therefore when we
>> disallow such at build time, we can allow _PAGE_RW changes to take the
>> fast paths there.
>
> I think it would be helpful to note why PDEs with linear page tables
> aren't allowed to have the RW bit set, likely here and in the comment
> below.

I've changed the description to

"The only time _PAGE_RW matters when validating an L2 or higher entry is
when a linear page table is tried to be installed (see the comment ahead
of define_get_linear_pagetable()). Therefore when we disallow such at
build time, we can allow _PAGE_RW changes to take the fast paths there."

Considering there already is a code comment explaining this, I'm less
convinced of also editing the comment. An option might be to move up the
#define-s next to define_get_linear_pagetable(), but imo this should
then involve moving FASTPATH_FLAG_WHITELIST as well.

Jan
On Mon, Jan 11, 2021 at 02:31:10PM +0100, Jan Beulich wrote:
> On 11.01.2021 12:08, Roger Pau Monné wrote:
>> On Tue, Nov 03, 2020 at 11:57:10AM +0100, Jan Beulich wrote:
>>> The only time _PAGE_RW matters when validating an L2 or higher entry is
>>> when a linear page table is tried to be installed. Therefore when we
>>> disallow such at build time, we can allow _PAGE_RW changes to take the
>>> fast paths there.
>>
>> I think it would be helpful to note why PDEs with linear page tables
>> aren't allowed to have the RW bit set, likely here and in the comment
>> below.
>
> I've changed the description to
>
> "The only time _PAGE_RW matters when validating an L2 or higher entry is
> when a linear page table is tried to be installed (see the comment ahead
> of define_get_linear_pagetable()). Therefore when we disallow such at
> build time, we can allow _PAGE_RW changes to take the fast paths there."
>
> Considering there already is a code comment explaining this, I'm
> less convinced of also editing the comment. An option might be to
> move up the #define-s next to define_get_linear_pagetable(), but
> imo this should then involve moving FASTPATH_FLAG_WHITELIST as
> well.

Referencing the comment in define_get_linear_pagetable in the commit
message seems fine to me:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
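[Editor's illustration — not part of the thread.] The whitelist mechanics under discussion boil down to an XOR-and-mask test. The stand-alone sketch below (with invented flag values and a trimmed-down whitelist) shows why adding _PAGE_RW to the PDE whitelist lets an RW-only change take the fast path for L2+ entries while L1 entries keep the stricter mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PRESENT  0x001ULL
#define _PAGE_RW       0x002ULL
#define _PAGE_ACCESSED 0x020ULL
#define _PAGE_DIRTY    0x040ULL

/* Trimmed-down stand-ins for the real masks in xen/arch/x86/mm.c. */
#define FASTPATH_FLAG_WHITELIST     (_PAGE_ACCESSED | _PAGE_DIRTY)
#define FASTPATH_PDE_FLAG_WHITELIST (FASTPATH_FLAG_WHITELIST | _PAGE_RW)

/* Stand-in for l<N>e_has_changed(): the entries "have changed" iff they
 * differ in any bit of mask, i.e. in any non-whitelisted bit when mask
 * is the complement of a whitelist. */
static bool has_changed(uint64_t o, uint64_t n, uint64_t mask)
{
    return ((o ^ n) & mask) != 0;
}
```

With `CONFIG_PV_LINEAR_PT` disabled, _PAGE_RW becomes irrelevant to validation of L2+ entries, so folding it into the mask widens the fast path without changing what gets validated.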
Just like we avoid to invoke remote root pt flushes when all uses of an
L4 table can be accounted for locally, the same can be done for all of
L[234] for the linear pt flush when the table is a "free floating" one,
i.e. it is pinned but not hooked up anywhere. While this situation
doesn't occur very often, it can be observed.

Since this breaks one of the implications of the XSA-286 fix, drop the
flush_root_pt_local variable again and set ->root_pgt_changed directly,
just like it was before that change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While adjusting the big comment that was added for XSA-286 I wondered
why it talks about the "construction of 32bit PV guests". How are 64-bit
PV guests different in this regard?

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3903,8 +3903,7 @@ long do_mmu_update(
     struct vcpu *curr = current, *v = curr;
     struct domain *d = v->domain, *pt_owner = d, *pg_owner;
     mfn_t map_mfn = INVALID_MFN, mfn;
-    bool flush_linear_pt = false, flush_root_pt_local = false,
-         flush_root_pt_others = false;
+    bool flush_linear_pt = false, flush_root_pt_others = false;
     uint32_t xsm_needed = 0;
     uint32_t xsm_checked = 0;
     int rc = put_old_guest_table(curr);
@@ -4054,7 +4053,9 @@ long do_mmu_update(
                     break;
                 rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
                                   cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                if ( !rc )
+                if ( !rc &&
+                     (page->u.inuse.type_info & PGT_count_mask) >
+                     1 + !!(page->u.inuse.type_info & PGT_pinned) )
                     flush_linear_pt = true;
                 break;
@@ -4063,7 +4064,9 @@ long do_mmu_update(
                     break;
                 rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
                                   cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                if ( !rc )
+                if ( !rc &&
+                     (page->u.inuse.type_info & PGT_count_mask) >
+                     1 + !!(page->u.inuse.type_info & PGT_pinned) )
                     flush_linear_pt = true;
                 break;
@@ -4072,7 +4075,9 @@ long do_mmu_update(
                     break;
                 rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
                                   cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                if ( !rc )
+                if ( !rc &&
+                     (page->u.inuse.type_info & PGT_count_mask) >
+                     1 + !!(page->u.inuse.type_info & PGT_pinned) )
                     flush_linear_pt = true;
                 if ( !rc && pt_owner->arch.pv.xpti )
                 {
@@ -4082,7 +4087,7 @@ long do_mmu_update(
                                  mfn) )
                     {
                         local_in_use = true;
-                        flush_root_pt_local = true;
+                        get_cpu_info()->root_pgt_changed = true;
                     }
 
                     /*
@@ -4199,8 +4204,8 @@ long do_mmu_update(
     /*
      * Perform required TLB maintenance.
      *
-     * This logic currently depend on flush_linear_pt being a superset of the
-     * flush_root_pt_* conditions.
+     * This logic currently depends on flush_linear_pt being a superset of the
+     * flush_root_pt_others condition.
      *
      * pt_owner may not be current->domain. This may occur during
      * construction of 32bit PV guests, or debugging of PV guests. The
@@ -4219,7 +4224,7 @@ long do_mmu_update(
      * pt_owner->dirty_cpumask), and/or all *other* dirty CPUs as there are
      * references we can't account for locally.
      */
-    if ( flush_linear_pt /* || flush_root_pt_local || flush_root_pt_others */ )
+    if ( flush_linear_pt /* || flush_root_pt_others */ )
     {
         unsigned int cpu = smp_processor_id();
         cpumask_t *mask = pt_owner->dirty_cpumask;
@@ -4236,12 +4241,8 @@ long do_mmu_update(
             cpumask_copy(mask, pt_owner->dirty_cpumask);
             __cpumask_clear_cpu(cpu, mask);
 
-            flush_local(FLUSH_TLB |
-                        (flush_root_pt_local ? FLUSH_ROOT_PGTBL : 0));
+            flush_local(FLUSH_TLB);
         }
-        else
-            /* Sanity check. flush_root_pt_local implies local cpu is dirty. */
-            ASSERT(!flush_root_pt_local);
 
         /* Flush the remote dirty CPUs. Does not include the local CPU. */
         if ( !cpumask_empty(mask) )
@@ -4249,8 +4250,8 @@ long do_mmu_update(
                         (flush_root_pt_others ? FLUSH_ROOT_PGTBL : 0));
     }
     else
-        /* Sanity check. flush_root_pt_* implies flush_linear_pt. */
-        ASSERT(!flush_root_pt_local && !flush_root_pt_others);
+        /* Sanity check. flush_root_pt_others implies flush_linear_pt. */
+        ASSERT(!flush_root_pt_others);
 
     perfc_add(num_page_updates, i);
On 03/11/2020 10:57, Jan Beulich wrote:
> Just like we avoid to invoke remote root pt flushes when all uses of an
> L4 table can be accounted for locally, the same can be done for all of
> L[234] for the linear pt flush when the table is a "free floating" one,
> i.e. it is pinned but not hooked up anywhere. While this situation
> doesn't occur very often, it can be observed.
>
> Since this breaks one of the implications of the XSA-286 fix, drop the
> flush_root_pt_local variable again and set ->root_pgt_changed directly,
> just like it was before that change.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> While adjusting the big comment that was added for XSA-286 I wondered
> why it talks about the "construction of 32bit PV guests". How are 64-bit
> PV guests different in this regard?

Because the sole caller is move_l3_below_4G() for 32bit PV guests which
don't support folded CR3's.

It's not impossible that future changes to PV construction might change
this, but it is highly unlikely in practice.

~Andrew
On Tue, Nov 03, 2020 at 11:57:33AM +0100, Jan Beulich wrote:
> Just like we avoid to invoke remote root pt flushes when all uses of an
> L4 table can be accounted for locally, the same can be done for all of
> L[234] for the linear pt flush when the table is a "free floating" one,
> i.e. it is pinned but not hooked up anywhere. While this situation
> doesn't occur very often, it can be observed.
>
> Since this breaks one of the implications of the XSA-286 fix, drop the
> flush_root_pt_local variable again and set ->root_pgt_changed directly,
> just like it was before that change.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

LGTM, so:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

It would be good however if Andrew can give his opinion also, since he
was the one to introduce such logic, and it's not trivial.

> ---
> While adjusting the big comment that was added for XSA-286 I wondered
> why it talks about the "construction of 32bit PV guests". How are 64-bit
> PV guests different in this regard?
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -3903,8 +3903,7 @@ long do_mmu_update(
>      struct vcpu *curr = current, *v = curr;
>      struct domain *d = v->domain, *pt_owner = d, *pg_owner;
>      mfn_t map_mfn = INVALID_MFN, mfn;
> -    bool flush_linear_pt = false, flush_root_pt_local = false,
> -         flush_root_pt_others = false;
> +    bool flush_linear_pt = false, flush_root_pt_others = false;
>      uint32_t xsm_needed = 0;
>      uint32_t xsm_checked = 0;
>      int rc = put_old_guest_table(curr);
> @@ -4054,7 +4053,9 @@ long do_mmu_update(
>                      break;
>                  rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
>                                    cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
> -                if ( !rc )
> +                if ( !rc &&
> +                     (page->u.inuse.type_info & PGT_count_mask) >
> +                     1 + !!(page->u.inuse.type_info & PGT_pinned) )

I think adding a macro to encapsulate the added condition would make
the code clearer, maybe PAGETABLE_IN_USE, _LOADED or _ACTIVE?

Thanks, Roger.
On 11.01.2021 14:00, Roger Pau Monné wrote:
> On Tue, Nov 03, 2020 at 11:57:33AM +0100, Jan Beulich wrote:
>> Just like we avoid to invoke remote root pt flushes when all uses of an
>> L4 table can be accounted for locally, the same can be done for all of
>> L[234] for the linear pt flush when the table is a "free floating" one,
>> i.e. it is pinned but not hooked up anywhere. While this situation
>> doesn't occur very often, it can be observed.
>>
>> Since this breaks one of the implications of the XSA-286 fix, drop the
>> flush_root_pt_local variable again and set ->root_pgt_changed directly,
>> just like it was before that change.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> LGTM, so:
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> It would be good however if Andrew can give his opinion also, since he
> was the one to introduce such logic, and it's not trivial.

I can certainly wait some, also to hopefully get a comment on

>> ---
>> While adjusting the big comment that was added for XSA-286 I wondered
>> why it talks about the "construction of 32bit PV guests". How are 64-bit
>> PV guests different in this regard?

... this, but his email troubles make it hard to judge for how long to
wait.

>> @@ -4054,7 +4053,9 @@ long do_mmu_update(
>>                      break;
>>                  rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
>>                                    cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> -                if ( !rc )
>> +                if ( !rc &&
>> +                     (page->u.inuse.type_info & PGT_count_mask) >
>> +                     1 + !!(page->u.inuse.type_info & PGT_pinned) )
>
> I think adding a macro to encapsulate the added condition would make
> the code clearer, maybe PAGETABLE_IN_USE, _LOADED or _ACTIVE?

A macro or inline function may certainly be nice (indeed I did consider
adding one), but I think here more than in some other cases it is
crucial that the name properly reflect what it does. Unfortunately none
of the ones you suggest nor any of the ones I did consider will meet
this requirement. In particular, the check is about "is there any
_other_ use". Hence I went with the not very fortunate redundant
open-coding.

Jan
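[Editor's illustration — not part of the thread.] For reference, the helper Roger suggests (which the patch deliberately open-codes) might look like the sketch below. The function name is invented, and the bit values are illustrative rather than Xen's actual `type_info` layout:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins: low bits hold the type reference count, one
 * flag marks the page table as pinned.  Not Xen's real bit layout. */
#define PGT_count_mask 0x0000ffffUL
#define PGT_pinned     0x00010000UL

/* Hypothetical helper for the open-coded condition: does the page table
 * have any use beyond the reference we're updating through, plus its
 * pin reference (if pinned)?  Only then might some other CPU hold a
 * stale linear-pt mapping that needs flushing. */
static bool pt_has_other_users(unsigned long type_info)
{
    return (type_info & PGT_count_mask) >
           1 + !!(type_info & PGT_pinned);
}
```

A "free floating" table — pinned but not hooked up anywhere — has exactly the pin reference plus the one being used for the update, so the helper returns false and the flush can be skipped.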
On 11.01.2021 14:22, Jan Beulich wrote:
> On 11.01.2021 14:00, Roger Pau Monné wrote:
>> On Tue, Nov 03, 2020 at 11:57:33AM +0100, Jan Beulich wrote:
>>> Just like we avoid to invoke remote root pt flushes when all uses of an
>>> L4 table can be accounted for locally, the same can be done for all of
>>> L[234] for the linear pt flush when the table is a "free floating" one,
>>> i.e. it is pinned but not hooked up anywhere. While this situation
>>> doesn't occur very often, it can be observed.
>>>
>>> Since this breaks one of the implications of the XSA-286 fix, drop the
>>> flush_root_pt_local variable again and set ->root_pgt_changed directly,
>>> just like it was before that change.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> LGTM, so:
>>
>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Thanks.
>
>> It would be good however if Andrew can give his opinion also, since he
>> was the one to introduce such logic, and it's not trivial.
>
> I can certainly wait some, also to hopefully get a comment on
>
>>> ---
>>> While adjusting the big comment that was added for XSA-286 I wondered
>>> why it talks about the "construction of 32bit PV guests". How are 64-bit
>>> PV guests different in this regard?
>
> ... this, but his email troubles make it hard to judge for how long to
> wait.

About like for the other one just pinged, I intend to commit this one
together with the other two still pending ones in this series once the
tree is fully open again, unless I hear otherwise by then. All three
have been acked by Roger.

Jan
32-bit guests may not depend upon the side effect of using ordinary
4-level paging when running on a 64-bit hypervisor. For L3 entry updates
to take effect, they have to use a CR3 reload. Therefore there's no need
to issue a paging structure invalidating TLB flush in this case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4066,7 +4066,8 @@ long do_mmu_update(
                                   cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
                 if ( !rc &&
                      (page->u.inuse.type_info & PGT_count_mask) >
-                     1 + !!(page->u.inuse.type_info & PGT_pinned) )
+                     1 + !!(page->u.inuse.type_info & PGT_pinned) &&
+                     !is_pv_32bit_domain(pt_owner) )
                     flush_linear_pt = true;
                 break;
On Tue, Nov 03, 2020 at 11:58:16AM +0100, Jan Beulich wrote:
> 32-bit guests may not depend upon the side effect of using ordinary
> 4-level paging when running on a 64-bit hypervisor. For L3 entry updates
> to take effect, they have to use a CR3 reload. Therefore there's no need
> to issue a paging structure invalidating TLB flush in this case.

I assume it's fine for the Xen linear page tables to likely be out of
sync during the window between the entry update and the CR3 reload?

I wonder, won't something similar also apply to 64bit and L4 entries?

Thanks, Roger.
On 11.01.2021 15:23, Roger Pau Monné wrote:
> On Tue, Nov 03, 2020 at 11:58:16AM +0100, Jan Beulich wrote:
>> 32-bit guests may not depend upon the side effect of using ordinary
>> 4-level paging when running on a 64-bit hypervisor. For L3 entry updates
>> to take effect, they have to use a CR3 reload. Therefore there's no need
>> to issue a paging structure invalidating TLB flush in this case.
>
> I assume it's fine for the Xen linear page tables to likely be out of
> sync during the window between the entry update and the CR3 reload?

Yes, because ...

> I wonder, won't something similar also apply to 64bit and L4 entries?

... unlike 64-bit paging, PAE paging special cases the treatment of the
4 top level table entries. On bare metal they get loaded by the CPU upon
CR3 load, not when walking page tables.

Jan
On Mon, Jan 11, 2021 at 03:28:23PM +0100, Jan Beulich wrote:
> On 11.01.2021 15:23, Roger Pau Monné wrote:
>> On Tue, Nov 03, 2020 at 11:58:16AM +0100, Jan Beulich wrote:
>>> 32-bit guests may not depend upon the side effect of using ordinary
>>> 4-level paging when running on a 64-bit hypervisor. For L3 entry updates
>>> to take effect, they have to use a CR3 reload. Therefore there's no need
>>> to issue a paging structure invalidating TLB flush in this case.
>>
>> I assume it's fine for the Xen linear page tables to likely be out of
>> sync during the window between the entry update and the CR3 reload?
>
> Yes, because ...
>
>> I wonder, won't something similar also apply to 64bit and L4 entries?
>
> ... unlike 64-bit paging, PAE paging special cases the treatment of the
> 4 top level table entries. On bare metal they get loaded by the CPU upon
> CR3 load, not when walking page tables.

I wouldn't mind having this added to the commit message. In any case:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
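[Editor's illustration — not part of the thread.] Jan's point about PAE can be captured in a toy model: under PAE the four top-level entries (PDPTEs) are latched into the CPU at CR3-load time, so a memory update to them is invisible to walks until the next reload, whereas 4-level paging reads the top level from memory on every walk that misses the caches. All names below are invented for illustration; this is not Xen code:

```c
#include <assert.h>
#include <stdint.h>

#define NTOP 4

struct cpu_model {
    uint64_t *cr3;          /* in-memory top-level table */
    uint64_t  pdpte[NTOP];  /* PAE: copies latched at CR3 load */
};

/* Loading CR3 latches the top-level entries (PAE behavior). */
static void load_cr3(struct cpu_model *c, uint64_t *table)
{
    c->cr3 = table;
    for ( unsigned int i = 0; i < NTOP; i++ )
        c->pdpte[i] = table[i];
}

/* PAE walk consults the latched registers... */
static uint64_t walk_top_pae(const struct cpu_model *c, unsigned int i)
{
    return c->pdpte[i];
}

/* ...while a 4-level walk reads the table in memory. */
static uint64_t walk_top_4level(const struct cpu_model *c, unsigned int i)
{
    return c->cr3[i];
}
```

This is why a 32-bit PV guest must reload CR3 for an L3 update to take effect, and why Xen can then skip the paging-structure-invalidating TLB flush for such updates.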
On 03.11.2020 11:54, Jan Beulich wrote:
> Especially the latter three patches provide only small possible
> gains, from all I can tell. I nevertheless wanted to put up the
> entire set for consideration.
>
> 1: consistently inline {,un}adjust_guest_l<N>e()
> 2: fold redundant calls to adjust_guest_l<N>e()
> 3: _PAGE_RW changes may take fast path of mod_l[234]_entry()
> 4: restrict TLB flushing after mod_l[234]_entry()
> 5: avoid TLB flushing after mod_l3_entry()

Ping?

Jan